Brazilian Portuguese speech synthesis based on HMM
New version (with post-filter)
- Characteristics
  - Training database: 221 utterances
  - Phone set: 40 models
    - Vowels: a, a~, E, e, e~, O, o, o~, u, u~, i, i~
    - Semi-vowels: j, j~, w, w~
    - Consonants: d, b, p, t, k, g, v, f, z, s, S, Z, m, n, J, L, l, r, R, X, tS, dZ
  - Inclusion of the following contextual factors:
    - Phones: the phone before the preceding one and the phone after the succeeding one (quinphone context)
    - Position of the syllable in the phrase
    - Vowel of the syllable
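- Speech samples
  - Sample 1: "Isto representa um teste para o sintetizador em português do Brasil." ("This represents a test for the Brazilian Portuguese synthesizer.") [simple excitation]
  - Sample 3: "Prefiro ser essa metamorfose ambulante do que ter aquela velha opinião formada sobre tudo." ("I would rather be this walking metamorphosis than have that old, fixed opinion about everything.") [simple excitation]
  - Sample 5: "Debaixo dos caracóis dos seus cabelos, tanta história pra contar dum mundo tão distante, e o sorriso e a vontade de ficar mais um instante." ("Beneath the curls of your hair, so many stories to tell of a world so far away, and the smile and the wish to stay a moment longer.") [simple excitation]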
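The phone part of the contextual factors listed above (two phones on each side of the current one) can be sketched as follows. This is a minimal illustration, not the labeling tool used for this system: the padding symbol `sil`, the function names and the `LL^L-C+R=RR` formatting (the phone notation used in HTS demo labels) are assumptions, and the syllable-level factors are omitted.

```python
from typing import List, Tuple

PAD = "sil"  # hypothetical silence/padding symbol at utterance edges


def quinphone_contexts(phones: List[str]) -> List[Tuple[str, ...]]:
    """Return (LL, L, C, R, RR) for every phone in the sequence."""
    padded = [PAD, PAD] + phones + [PAD, PAD]
    # Slide a five-phone window centred on each original phone.
    return [tuple(padded[i - 2:i + 3]) for i in range(2, len(padded) - 2)]


def hts_style_label(ctx: Tuple[str, ...]) -> str:
    """Format one context in an HTS-like 'LL^L-C+R=RR' notation (assumed format)."""
    ll, l, c, r, rr = ctx
    return f"{ll}^{l}-{c}+{r}={rr}"


if __name__ == "__main__":
    # Hypothetical transcription of "isto" using the phone symbols on this page.
    for ctx in quinphone_contexts(["i", "s", "t", "u"]):
        print(hts_style_label(ctx))
```

For the four-phone example, the first label is `sil^sil-i+s=t` and the last is `s^t-u+sil=sil`.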
Old version
- Characteristics
  - Training database: 160 utterances
  - Vowels: a, a~, E, e, e~, O, o, o~, u, u~, i, i~
  - Diphthongs: aj, aw, ej, ew, oj, ow, wa
  - Consonants: d, b, p, t, k, g, v, f, z, s, S, Z, m, n, J, L, l, r, R, X
- Speech samples
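The old and new Brazilian Portuguese phone sets can be compared directly as sets of symbols. The sketch below copies the symbols listed on this page and assumes, without confirmation from the page, that each old diphthong corresponds to a vowel plus a semivowel of the new set (the helper `split_diphthong` is hypothetical). The listed symbols total 39 (old) and 38 (new), so the 40 models of the new version presumably also include non-speech units such as silence, which are not listed.

```python
# Phone symbols copied from the lists on this page.
OLD_VOWELS = "a a~ E e e~ O o o~ u u~ i i~".split()
OLD_DIPHTHONGS = "aj aw ej ew oj ow wa".split()
OLD_CONSONANTS = "d b p t k g v f z s S Z m n J L l r R X".split()

NEW_VOWELS = OLD_VOWELS  # the vowel list is identical in both versions
NEW_SEMIVOWELS = "j j~ w w~".split()
NEW_CONSONANTS = OLD_CONSONANTS + ["tS", "dZ"]

old_set = set(OLD_VOWELS + OLD_DIPHTHONGS + OLD_CONSONANTS)
new_set = set(NEW_VOWELS + NEW_SEMIVOWELS + NEW_CONSONANTS)


def split_diphthong(symbol):
    """Assumed mapping: split a two-letter diphthong into its two component symbols."""
    return [symbol[:1], symbol[1:]]


if __name__ == "__main__":
    print(len(old_set), len(new_set))  # 39 38
    print(sorted(new_set - old_set))   # symbols added in the new version
    print(sorted(old_set - new_set))   # the old diphthong units
    for d in OLD_DIPHTHONGS:
        print(d, "->", " ".join(split_diphthong(d)))
```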
European Portuguese speech synthesis based on HMM (implemented jointly with Prof. Maria João Barros during her one-month visit to Kitamura-Tokuda Lab)
- Characteristics
  - Training database: 104 utterances, 21 minutes including silence regions
  - Phone set: 41 models
    - Vowels: A, a, A~, E, e, @, e~, O, o, o~, u, u~, i, i~
    - Semi-vowels: j, j~, w, w~
    - Consonants: d, b, p, t, k, g, v, f, z, s, S, Z, m, n, J, L, l, l~, r, R
  - Differences in contextual factors from the Brazilian Portuguese version:
    - Absence of part-of-speech information
    - Inclusion of information about full stops and exclamation marks
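- Speech samples
  - Sample 1: "Isto representa um teste para o sintetizador em português europeu." ("This represents a test for the European Portuguese synthesizer.") [simple excitation]
  - Sample 3: "Prefiro ser essa metamorfose ambulante do que ter aquela velha opinião formada sobre tudo." ("I would rather be this walking metamorphosis than have that old, fixed opinion about everything.") [simple excitation]
  - Sample 5: "Debaixo dos caracóis dos seus cabelos, há tanta história pra contar dum mundo tão distante, e o sorriso e a vontade de ficar mais um instante." ("Beneath the curls of your hair, there are so many stories to tell of a world so far away, and the smile and the wish to stay a moment longer.") [simple excitation]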
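The samples above are labeled "simple excitation". In HMM-based synthesis this term usually denotes the basic source model: a periodic pulse train at the generated F0 in voiced frames and white Gaussian noise in unvoiced frames, which is then passed through the spectral synthesis filter. A minimal sketch of that excitation follows; the 16 kHz sampling rate, the 80-sample frame shift and the unit-power pulse scaling are assumptions rather than settings reported here, and the synthesis filter is omitted.

```python
import math
import random


def simple_excitation(f0_per_frame, fs=16000, frame_shift=80, seed=0):
    """Pulse train in voiced frames, white Gaussian noise in unvoiced frames.

    f0_per_frame: one F0 value (Hz) per frame; f0 <= 0 marks an unvoiced frame.
    fs and frame_shift are assumed values (80 samples = 5 ms at 16 kHz).
    """
    rng = random.Random(seed)
    out = []
    phase = 0.0  # samples elapsed since the last pulse, carried across voiced frames
    for f0 in f0_per_frame:
        if f0 > 0:
            period = fs / f0  # pitch period in samples
            for _ in range(frame_shift):
                phase += 1.0
                if phase >= period:
                    phase -= period
                    # One pulse of energy `period` per period gives unit average power.
                    out.append(math.sqrt(period))
                else:
                    out.append(0.0)
        else:
            phase = 0.0
            out.extend(rng.gauss(0.0, 1.0) for _ in range(frame_shift))
    return out


if __name__ == "__main__":
    exc = simple_excitation([200.0, 200.0, 0.0])
    print(len(exc), sum(1 for x in exc[:160] if x != 0.0))
```

At 200 Hz and 16 kHz the pitch period is exactly 80 samples, so each of the two voiced frames contains one pulse.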
On the application of mixed-excitation to HMM-based TTS
Phonetic vocoding at 265 bps with mixed-excitation
CELP speech coding at 4 kbps
Some standardized speech coders
- Original speech at 8 kHz.
- Decoded speech - G.711 PCM 64 kbps.
- Decoded speech - G.721 ADPCM 32 kbps.
- Decoded speech - FS-1016 CELP 4.8 kbps.
- Decoded speech - MELP 2.4 kbps.
- Decoded speech - FS-1015 LPC-10 2.4 kbps.
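One way to read the coder list above is in bits per input sample: at 8 kHz, 64 kbps G.711 spends 8 bits on each sample and 32 kbps G.721 spends 4, whereas FS-1016 CELP at 4.8 kbps and MELP at 2.4 kbps average 0.6 and 0.3 bits per sample, which is possible only because they transmit model parameters rather than individual samples. The sketch below assumes all coders operate on the 8 kHz input; the helper names are illustrative, and LPC-10 is left out.

```python
SAMPLE_RATE_HZ = 8000   # narrowband input, as for the original speech above
REFERENCE_BPS = 64_000  # G.711 PCM rate used as the reference

CODER_BPS = {
    "G.711 PCM": 64_000,
    "G.721 ADPCM": 32_000,
    "FS-1016 CELP": 4_800,
    "MELP": 2_400,
}


def bits_per_sample(bps, fs=SAMPLE_RATE_HZ):
    """Average number of bits spent per input sample."""
    return bps / fs


def compression_ratio(bps, reference=REFERENCE_BPS):
    """How many times fewer bits than the 64 kbps G.711 stream."""
    return reference / bps


if __name__ == "__main__":
    for name, bps in CODER_BPS.items():
        print(f"{name:<13} {bps:>6} bps  {bits_per_sample(bps):5.2f} bits/sample  "
              f"{compression_ratio(bps):5.1f}:1")
```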
Demonstration for Eurospeech 2003