Demo. of Analysis Re-synthesis system "PROSODY ver1.1"

ANALYSIS RE-SYNTHESIS SYSTEM
"PROSODY ver1.1"

This page presents a brief demonstration of the Analysis Re-synthesis System "PROSODY ver1.1". Compared with "PROSODY ver1.0", the quality of the synthesized speech has been greatly improved. I would very much appreciate it if you sent me your comments by e-mail.
An overview of the system is shown here.

The Japanese version is here.

What is "Analysis Re-synthesis"?

"Analysis Re-synthesis" is a speech synthesis technique in which the prosodic features and the segmental features contained in speech are first extracted (analysis), and the inverse of the analysis is then carried out (synthesis). If these two kinds of features are modified before the inverse processing (synthesis), many different kinds of speech can be obtained from a single original utterance.
If both kinds of features could be defined and extracted completely, the speech synthesized after parameter modification would be of high quality. In practice, this is very hard to achieve.... `(^_^;)`
This system can manipulate two acoustic parameters: fundamental frequency (F0) and speaking rate.
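
The analysis, modification, re-synthesis flow described above can be sketched in the parameter domain. The following is only an illustrative sketch and not the PROSODY implementation: the `Frame` type, the convention that `f0_hz == 0.0` marks an unvoiced frame, and the functions `scale_f0` and `change_duration` are all assumptions introduced here. The analysis step and the waveform re-synthesis step are omitted.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Frame:
    """One analysis frame (hypothetical representation)."""
    f0_hz: float                 # prosodic feature; 0.0 = unvoiced (assumed convention)
    envelope: Tuple[float, ...]  # stand-in for the segmental (spectral) features


def scale_f0(frames: List[Frame], factor: float) -> List[Frame]:
    """Multiply F0 in every frame; the envelope is left untouched."""
    return [replace(fr, f0_hz=fr.f0_hz * factor) for fr in frames]


def change_duration(frames: List[Frame], factor: float) -> List[Frame]:
    """Stretch (factor > 1) or compress (factor < 1) the frame sequence
    by repeating or skipping frames (nearest-earlier-frame resampling)."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not frames:
        return []
    n_out = max(1, round(len(frames) * factor))
    last = len(frames) - 1
    return [frames[min(last, int(i / factor))] for i in range(n_out)]
```

Because only `f0_hz` or the frame timing changes, the envelope values pass through unmodified, which is the property the spectrum envelope figures in the demos below are meant to illustrate.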

This tool was designed to produce speech stimuli for perceptual experiments. It will be made available by ftp in the near(?) future.

Demo. with Japanese speech materials (ver 1.1)

Original Speech(25Kb) -> Analysis -> F0 & Spectrum Envelope of Orig. Speech
Analysis
-> F0 Modification-0
-> Re-synthesis(25Kb)
-> Spectrum Envelope of Synth. Speech-0
Analysis
-> F0 Modification-1
-> Re-synthesis(25Kb)
-> Spectrum Envelope of Synth. Speech-1
Analysis
-> F0 Modification-2
-> Re-synthesis(25Kb)
-> Spectrum Envelope of Synth. Speech-2
Analysis
-> F0 Modification-3
-> Re-synthesis(25Kb)
-> Spectrum Envelope of Synth. Speech-3

The above demo clearly shows that there is little or no difference between the spectrum envelope of the original speech and those of the synthesized speech materials. If you would like the above original/synthesized speech as 16-bit data sampled at 10 kHz, please click here.

Original Speech with flat F0 pattern(28Kb)
-> Analysis -> F0 & Spectrum Envelope of Orig. Speech
Analysis
-> Manual Construction of Adequate Accents(Solid Curve)
-> Re-synthesis(28Kb)
-> Spectrum Envelope of Synth. Speech

The above demo clearly shows that there is little or no difference between the spectrum envelope of the original speech and that of the synthesized speech material. If you would like the above original/synthesized speech as 16-bit data sampled at 10 kHz, please click here.

Original Speech with flat F0 pattern(28Kb)
-> Analysis -> F0 & Spectrum Envelope of Orig. Speech
Analysis
-> Manual Construction of Adequate Accents(Solid Curve)
-> Re-synthesis(28Kb)
-> Spectrum Envelope of Synth. Speech

The above demo clearly shows that there is little or no difference between the spectrum envelope of the original speech and that of the synthesized speech material. If you would like the above original/synthesized speech as 16-bit data sampled at 10 kHz, please click here.

Original Speech(24Kb)
->Analysis->Speech Rate Modification(Duration X 0.8)
->Re-synthesis(19Kb)
Original Speech(23Kb)
->Analysis->Speech Rate Modification(Duration X 1.0)
->Re-synthesis(23Kb)
Original Speech(22Kb)
->Analysis->Speech Rate Modification(Duration X 1.2)
->Re-synthesis(27Kb)
Original Speech(23Kb)
->Analysis->Speech Rate Modification(Duration X 1.5)
->Re-synthesis(34Kb)
Original Speech(22Kb)
->Analysis->Speech Rate Modification(Duration X 1.8)
->Re-synthesis(39Kb)

If you would like the above original/synthesized speech as 16-bit data sampled at 10 kHz, please click here.
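
The file sizes listed for the speaking rate demo can be checked against the duration factors. The sketch below assumes that the original and re-synthesized demo files share the same sample rate and sample width, so that file size is proportional to duration; the names `DEMO_FILES`, `expected_size_kb`, and `max_deviation_kb` are introduced here for illustration only.

```python
# (original size in Kb, duration factor, re-synthesized size in Kb),
# copied from the speaking rate demo list above.
DEMO_FILES = [
    (24, 0.8, 19),
    (23, 1.0, 23),
    (22, 1.2, 27),
    (23, 1.5, 34),
    (22, 1.8, 39),
]


def expected_size_kb(original_kb: float, factor: float) -> float:
    """Size predicted if file size scales linearly with duration (assumption)."""
    return original_kb * factor


def max_deviation_kb(files) -> float:
    """Largest gap between the predicted and the listed re-synthesized size."""
    return max(abs(expected_size_kb(orig, f) - resynth) for orig, f, resynth in files)


if __name__ == "__main__":
    for orig, factor, resynth in DEMO_FILES:
        print(f"x{factor}: predicted {expected_size_kb(orig, factor):.1f} Kb, listed {resynth} Kb")
```

For the five pairs listed, the predicted and listed sizes agree to within 1 Kb, which is consistent with the duration of the re-synthesized speech following the requested factor.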

Demo. with American English speech materials (ver 1.0)
!!! Currently, NOT AVAILABLE !!!

Original Speech(audio file 17Kbyte)
Analysis-> Pitch Modification-0->Re-synthesis(audio file 17Kbyte)
Analysis-> Pitch Modification-1->Re-synthesis(audio file 17Kbyte)
Analysis-> Pitch Modification-2->Re-synthesis(audio file 17Kbyte)
Analysis-> Pitch Modification-3->Re-synthesis(audio file 17Kbyte)
Analysis->Speech Rate Modification (x 0.8)->Re-synthesis(audio file 14Kbyte)
Analysis->Speech Rate Modification (x 1.2)->Re-synthesis(audio file 20Kbyte)
In the pitch pattern figures, the dots ('+' marks) represent F0 values extracted from the original speech, and the solid curves are the pitch patterns used for re-synthesis.
In Pitch Modification-0, for both Japanese and English, the solid curve was drawn manually so that it fits the original F0 values ('+') very closely.
Try listening to each re-synthesized sample above while following, with your eyes, the pitch pattern used for that re-synthesis.
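
One way to express how closely a drawn curve follows the extracted F0 values is sketched below. This is not how the curves on this page were produced (they were drawn by hand): the piecewise-linear curve model, the control-point format, and the functions `interpolate` and `rms_error_hz` are assumptions made for illustration.

```python
import math
from bisect import bisect_right
from typing import List, Tuple

Point = Tuple[float, float]  # (time in seconds, F0 in Hz)


def interpolate(control_points: List[Point], t: float) -> float:
    """Value of a piecewise-linear curve through time-sorted control points.
    Outside the covered time range, the nearest end value is held."""
    times = [p[0] for p in control_points]
    if t <= times[0]:
        return control_points[0][1]
    if t >= times[-1]:
        return control_points[-1][1]
    i = bisect_right(times, t)
    (t0, v0), (t1, v1) = control_points[i - 1], control_points[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def rms_error_hz(control_points: List[Point], measured: List[Point]) -> float:
    """Root-mean-square distance between the curve and the measured F0 points."""
    squared = [(interpolate(control_points, t) - f0) ** 2 for t, f0 in measured]
    return math.sqrt(sum(squared) / len(squared))
```

Only voiced frames would be passed as `measured`, since unvoiced frames have no F0 value to compare against.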

If you have any comments, please e-mail them to mine@tutics.tut.ac.jp

Dept. of Information and Computer Sciences, Toyohashi Univ. of Tech.
Nobuaki MINEMATSU( mine@tutics.tut.ac.jp )