Demo of the Analysis Re-synthesis System "PROSODY ver1.1"
ANALYSIS RE-SYNTHESIS SYSTEM
"PROSODY ver1.1"
This page presents a brief demonstration of the Analysis Re-synthesis
System "PROSODY ver1.1". Compared with "PROSODY ver1.0", the quality of
the synthesized speech has been greatly improved. I would very much
appreciate any comments you send me by e-mail.
An overview of the system is given here.
A Japanese version of this page is here.
What is "Analysis Re-synthesis"?
"Analysis Re-synthesis" is a speech synthesis technique in which the
prosodic features and segmental features contained in speech are first
extracted (analysis), and the inverse of the analysis is then performed
(synthesis). If these two kinds of features are modified before the
inverse processing (synthesis), many different kinds of speech can be
obtained from a single original utterance.
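The page does not describe how PROSODY itself performs the analysis and the synthesis. As a rough illustration of the general idea only, the Python sketch below uses a classic LPC vocoder, which is a swapped-in technique and not necessarily PROSODY's method: for each frame, an autocorrelation pitch estimate stands in for the prosodic feature (F0) and an LPC all-pole envelope stands in for the segmental feature, and synthesis drives each frame's filter with a pulse train. The frame sizes, LPC order, pitch range, and all function names (`analyze`, `synthesize`, `synthetic_vowel`) are assumptions, and every frame is treated as voiced.

```python
import math

FS = 10_000   # sampling rate in Hz (matches the 10 kHz demo files)
FRAME = 400   # analysis window, 40 ms (assumption)
HOP = 100     # frame shift, 10 ms (assumption)
ORDER = 12    # LPC order (assumption)
WIN = [0.54 - 0.46 * math.cos(2 * math.pi * i / (FRAME - 1)) for i in range(FRAME)]
WIN_ENERGY = sum(w * w for w in WIN)  # used to undo the window's energy loss


def autocorr(x, max_lag):
    """Biased autocorrelation r[0..max_lag]."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion: LPC polynomial a (a[0] = 1) and prediction-error energy."""
    a = [1.0] + [0.0] * order
    err = r[0] + 1e-12  # tiny floor avoids dividing by zero on silent frames
    for i in range(1, order + 1):
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def estimate_f0(frame, fmin=60.0, fmax=400.0):
    """Toy pitch tracker: lag of the largest autocorrelation value in [fmin, fmax]."""
    lo, hi = int(FS / fmax), int(FS / fmin)
    r = autocorr(frame, hi)
    return FS / max(range(lo, hi + 1), key=lambda k: r[k])


def analyze(x):
    """Analysis: per frame, F0 (prosodic), LPC envelope (segmental), residual power."""
    frames = []
    for start in range(0, len(x) - FRAME + 1, HOP):
        seg = x[start:start + FRAME]
        a, err = levinson(autocorr([s * w for s, w in zip(seg, WIN)], ORDER), ORDER)
        frames.append((estimate_f0(seg), a, err / WIN_ENERGY))
    return frames


def synthesize(frames, f0_scale=1.0, rate=1.0):
    """Synthesis: pulse train at the scaled F0 through each frame's all-pole filter.

    rate > 1 shortens every frame (faster speech); F0 and envelopes are unaffected.
    """
    hop_out = max(1, round(HOP / rate))
    y, mem, phase = [], [0.0] * ORDER, 1.0  # mem holds y[n-1] ... y[n-ORDER]
    for f0, a, power in frames:
        f0_new = f0 * f0_scale
        gain = math.sqrt(power * FS / f0_new)  # unit pulses once per period
        for _ in range(hop_out):
            phase += f0_new / FS
            pulse = 0.0
            if phase >= 1.0:
                phase -= 1.0
                pulse = 1.0
            out = gain * pulse - sum(a[j + 1] * mem[j] for j in range(ORDER))
            mem = [out] + mem[:-1]
            y.append(out)
    return y


def synthetic_vowel(f0, dur=0.5):
    """Hypothetical test input: pulse train at f0 through one resonance at 1 kHz."""
    r, theta = 0.9, 2 * math.pi * 1000 / FS
    b1, b2 = 2 * r * math.cos(theta), -r * r
    period = round(FS / f0)
    y1 = y2 = 0.0
    out = []
    for n in range(int(FS * dur)):
        y = (1.0 if n % period == 0 else 0.0) + b1 * y1 + b2 * y2
        y1, y2 = y, y1
        out.append(y)
    return out


if __name__ == "__main__":
    frames = analyze(synthetic_vowel(120.0))
    higher = synthesize(frames, f0_scale=1.5)  # same envelopes, F0 raised by half
    faster = synthesize(frames, rate=2.0)      # same F0 and envelopes, half the duration
    print(len(frames), len(higher), len(faster))  # prints: 47 4700 2350
```

Because F0 and the envelopes are stored separately, `f0_scale` changes the pitch without touching the envelopes, and `rate` changes the duration by lengthening or shortening each frame's output. A practical system would also need voiced/unvoiced decisions and a richer excitation model than a bare pulse train.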
If both kinds of features could be completely and exactly defined and
extracted, the speech synthesized after parameter modification would
retain high quality. In practice, however, this is very hard to
achieve....(^_^;)
This system can manipulate two acoustic parameters: fundamental
frequency (F0) and speaking rate.
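Once F0 has been extracted as a frame-by-frame contour, both manipulations can be expressed as operations on that contour. The sketch below is a hypothetical helper (`modify_prosody` is not part of PROSODY): it shifts F0 by a number of semitones and resamples the contour with linear interpolation to change the speaking rate. It assumes a continuous contour with no zeros marking unvoiced frames.

```python
def modify_prosody(f0_track, semitones=0.0, rate=1.0):
    """Shift a frame-level F0 contour by `semitones` and time-scale it by `rate`.

    rate > 1 yields fewer frames (faster speech); rate < 1 yields more (slower).
    Assumes a continuous contour (no zeros marking unvoiced frames).
    """
    ratio = 2.0 ** (semitones / 12.0)  # 12 semitones = one octave = factor 2
    n_in = len(f0_track)
    if n_in < 2:
        return [f * ratio for f in f0_track]
    n_out = round((n_in - 1) / rate) + 1
    out = []
    for m in range(n_out):
        pos = min(m * rate, n_in - 1)  # position on the original frame axis
        i = min(int(pos), n_in - 2)
        frac = pos - i
        out.append((f0_track[i] * (1 - frac) + f0_track[i + 1] * frac) * ratio)
    return out
```

For example, `modify_prosody([100.0, 110.0, 120.0, 130.0, 140.0], rate=2.0)` returns `[100.0, 120.0, 140.0]`: the same melodic shape in half the number of frames.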
This tool was designed to produce speech stimuli for perceptual
experiments, and it will be made available by FTP in the near(?)
future.
Demo with Japanese speech materials (ver1.1)
The demo above shows that there is little or no difference between the
spectral envelopes of the original speech and those of the synthesized
speech materials. To obtain the original and synthesized speech above
(16 bit, 10 kHz sampling), please click
here.
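The figures compare the spectral envelopes visually, and the page does not say how the envelopes were computed. One common way to quantify how small the difference is, offered here as an assumption rather than the author's procedure, is the log-spectral distance between the LPC envelopes of time-aligned frames. The sketch assumes an LPC polynomial `a` (with `a[0] = 1`) and a gain are already available for each frame; the function names are hypothetical.

```python
import cmath
import math


def lpc_envelope_db(a, gain=1.0, n_points=129):
    """All-pole envelope 20*log10(gain / |A(e^jw)|) at n_points frequencies in [0, pi]."""
    env = []
    for k in range(n_points):
        w = math.pi * k / (n_points - 1)
        A = sum(c * cmath.exp(-1j * w * j) for j, c in enumerate(a))
        env.append(20.0 * math.log10(gain / abs(A)))
    return env


def log_spectral_distance(env1_db, env2_db):
    """RMS difference in dB between two envelopes sampled on the same frequency grid."""
    if len(env1_db) != len(env2_db):
        raise ValueError("envelopes must share a frequency grid")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(env1_db, env2_db)) / len(env1_db))
```

Averaging this distance over all aligned frame pairs gives a single dB figure; values near 0 dB mean the envelopes are practically identical.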
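The page gives the sample format (16 bit, 10 kHz) but not the file container or the byte order. Assuming the files are either WAV or headerless raw PCM, a stdlib-only reader might look like the sketch below. `read_wav_pcm16` and `read_raw_pcm16` are hypothetical names, and the byte order of raw files is deliberately left for the caller to state, since the page does not specify it.

```python
import struct
import wave


def read_wav_pcm16(path, expected_rate=10_000):
    """Read a mono 16-bit WAV file as floats in [-1, 1), checking the sampling rate."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("expected mono 16-bit samples")
        if wf.getframerate() != expected_rate:
            raise ValueError(f"expected {expected_rate} Hz, got {wf.getframerate()} Hz")
        n = wf.getnframes()
        raw = wf.readframes(n)
    return [s / 32768.0 for s in struct.unpack(f"<{n}h", raw)]  # WAV data is little-endian


def read_raw_pcm16(path, big_endian):
    """Read headerless 16-bit PCM; the byte order is an assumption the caller must supply."""
    with open(path, "rb") as f:
        data = f.read()
    n = len(data) // 2
    fmt = (">" if big_endian else "<") + f"{n}h"
    return [s / 32768.0 for s in struct.unpack(fmt, data[: 2 * n])]
```

If a raw file sounds like loud noise when played back, the byte order guess is probably wrong.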
Demo with American English speech materials (ver1.0)
!!! Currently NOT AVAILABLE !!!
In the pitch-pattern figures, the dots ('+' marks) represent F0 values
extracted from the original speech, and the solid curves are the pitch
patterns used for re-synthesis.
In Pitch Modification-0 for both the Japanese and the English
materials, the solid curve was drawn manually so that it would fit the
original F0 values ('+') very closely.
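A manually drawn pitch pattern like the solid curve in Pitch Modification-0 can be represented by a few hand-placed control points. The sketch below assumes the curve is piecewise linear in log F0 between (time, Hz) points and scores how well it fits the extracted '+' values as an RMS deviation in semitones. This representation and the names `pattern_at` and `rms_semitones` are assumptions, not taken from PROSODY.

```python
import bisect
import math


def pattern_at(points, t):
    """Evaluate a pitch pattern given as (time_s, f0_hz) control points sorted by time.

    Between points the curve is linear in log F0; outside them it is held flat.
    """
    times = [p[0] for p in points]
    if t <= times[0]:
        return points[0][1]
    if t >= times[-1]:
        return points[-1][1]
    k = bisect.bisect_right(times, t)
    (t0, fa), (t1, fb) = points[k - 1], points[k]
    w = (t - t0) / (t1 - t0)
    return math.exp((1 - w) * math.log(fa) + w * math.log(fb))


def rms_semitones(points, observed):
    """RMS deviation in semitones of observed (time_s, f0_hz) values from the pattern."""
    errs = [12.0 * math.log2(f / pattern_at(points, t)) for t, f in observed]
    return math.sqrt(sum(e * e for e in errs) / len(errs))
```

For instance, with control points `[(0.0, 100.0), (1.0, 200.0)]` the pattern at 0.5 s is about 141.4 Hz (the geometric mean of the endpoints), and `rms_semitones` returns 0 when every dot lies exactly on the curve.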
Try listening to each re-synthesized speech sample above while your
eyes follow the pitch pattern used to produce it.
If you have any comments, please e-mail them to
mine@tutics.tut.ac.jp
Dept. of Information and Computer Sciences,
Toyohashi Univ. of Tech.
Nobuaki MINEMATSU (mine@tutics.tut.ac.jp)