Statistical Voice Conversion Based on Noisy Channel Model

Daisuke Saito (The University of Tokyo, Japan)

Shinji Watanabe (NTT Corporation, Japan)

Atsushi Nakamura (NTT Corporation, Japan)

Nobuaki Minematsu (The University of Tokyo, Japan)

This page provides speech samples generated by our proposed method.

Experimental Conditions

Same as condition C in Section IV of the paper
Spectral conversion: statistical voice conversion methods based on GMM
Prosodic conversion:

F0: simple linear transformation in Equation 31 in the paper
Duration: not converted
Aperiodic components: fixed to -30 dB at all frequencies
Power coefficients: simple linear transformation in Equation 31 in the paper

Source speaker: one male speaker from ATR Japanese speech database [30]

MSH: sample(1) [ASYN], sample(2) [ASYN]

Training data for joint density model: 1 sentence-pair
Training data for speaker model: 50 sentences
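
The "simple linear transformation" listed above for F0 can be illustrated with a minimal sketch. Equation 31 is not reproduced on this page, so the form used here is an assumption: the mean-variance mapping on log F0 that is common in voice conversion work. The function names and the statistics in the usage example are hypothetical and are not taken from the paper.

```python
import math


def linear_transform(x, mu_src, sigma_src, mu_tgt, sigma_tgt):
    """Map x so that source mean/std are moved to target mean/std.

    Assumed form (not confirmed against Equation 31 in the paper):
        y = mu_tgt + (sigma_tgt / sigma_src) * (x - mu_src)
    """
    return mu_tgt + (sigma_tgt / sigma_src) * (x - mu_src)


def convert_f0(f0_src, src_stats, tgt_stats):
    """Convert an F0 contour in Hz, frame by frame.

    src_stats and tgt_stats are (mean, std) of log F0 for each speaker.
    Frames with F0 <= 0 are treated as unvoiced and left at 0.
    Working in the log domain is an assumption of this sketch.
    """
    converted = []
    for f0 in f0_src:
        if f0 <= 0.0:
            converted.append(0.0)  # unvoiced frame, nothing to convert
        else:
            log_f0 = linear_transform(math.log(f0), *src_stats, *tgt_stats)
            converted.append(math.exp(log_f0))
    return converted


# Usage with made-up statistics (hypothetical values, for illustration only).
src_stats = (math.log(120.0), 0.2)  # source speaker: mean and std of log F0
tgt_stats = (math.log(150.0), 0.2)  # target speaker: mean and std of log F0
contour = convert_f0([0.0, 120.0, 130.0], src_stats, tgt_stats)
```

The same mean-variance mapping could be applied to the power coefficients, since the conditions above list the same equation for both; whether the paper applies it in the log domain there is not stated on this page.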

Speech Samples

MLVC: converted speech by maximum likelihood parameter generation [26]
NCMVC w/o Delta: converted speech by proposed method without dynamic features
NCMVC w/ Delta: converted speech by proposed method with dynamic features
ASYN: analysis-synthesized speech of the target speaker

	MLVC	NCMVC w/o Delta	NCMVC w/ Delta	ASYN
MSH to MMY (1)	sample	sample	sample	sample
MSH to MMY (2)	sample	sample	sample	sample
MSH to MTK (1)	sample	sample	sample	sample
MSH to MTK (2)	sample	sample	sample	sample

Daisuke Saito (dsk_saito@gavo.t.u-tokyo.ac.jp)