Statistical Voice Conversion Based on Noisy Channel Model
Daisuke Saito (The University of Tokyo, Japan)
Shinji Watanabe (NTT Corporation, Japan)
Atsushi Nakamura (NTT Corporation, Japan)
Nobuaki Minematsu (The University of Tokyo, Japan)
This page provides speech samples generated by our proposed method.
Experimental Conditions
- The same as condition C in Section IV in the paper
- Spectral conversion: statistical voice conversion methods based on GMM
- Prosodic conversion:
  - F0: simple linear transformation in Equation 31 in the paper
  - Duration: not converted
- Aperiodic components: fixed to -30 dB at all frequencies
- Power coefficients: simple linear transformation in Equation 31 in the paper
- Source speaker: one male speaker from ATR Japanese speech database [30]
- Training data for joint density model: 1 sentence-pair
- Training data for speaker model: 50 sentences
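
The prosodic and aperiodicity settings listed above can be illustrated with a minimal sketch. Equation 31 is not reproduced on this page, so the code below assumes a commonly used form of such a linear transformation: mean and standard deviation matching in the log-F0 domain. Whether the paper's equation has exactly this form is an assumption. The function names `convert_f0` and `fixed_aperiodicity`, and all parameter names, are hypothetical and chosen only for illustration.

```python
import math


def convert_f0(f0_hz, src_mean, src_std, tgt_mean, tgt_std):
    """Linearly map log-F0 so source statistics match target statistics.

    ASSUMPTION: mean/variance matching in the log domain; the exact form of
    Equation 31 in the paper is not given on this page.
    Means and standard deviations are statistics of log-F0 (natural log).
    Unvoiced frames, marked here by F0 <= 0, are passed through as 0.
    """
    scale = tgt_std / src_std
    converted = []
    for f0 in f0_hz:
        if f0 <= 0.0:
            converted.append(0.0)  # keep unvoiced frames unvoiced
        else:
            log_f0 = math.log(f0)
            converted.append(math.exp(tgt_mean + scale * (log_f0 - src_mean)))
    return converted


def fixed_aperiodicity(n_frames, n_bins, level_db=-30.0):
    """Return a frame-by-frequency grid with every bin set to level_db."""
    return [[level_db] * n_bins for _ in range(n_frames)]


# Illustrative values only, not taken from the paper.
f0_out = convert_f0([0.0, 100.0, 120.0], math.log(100.0), 0.2, math.log(200.0), 0.2)
ap = fixed_aperiodicity(n_frames=3, n_bins=4)
```

Duration is not converted, so the number of frames is unchanged between input and output in this sketch.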
Speech Samples
- MLVC
  - converted speech by maximum likelihood parameter generation [26]
- NCMVC w/o Delta
  - converted speech by proposed method without dynamic features
- NCMVC w/ Delta
  - converted speech by proposed method with dynamic features
- ASYN
  - analysis-synthesized speech of the target speaker
Daisuke Saito (dsk_saito@gavo.t.u-tokyo.ac.jp)