The NITech text-to-speech system for the Blizzard Challenge 2017

Kei Sawada, Kei Hashimoto, Keiichiro Oura, and Keiichi Tokuda,
"The NITech text-to-speech system for the Blizzard Challenge 2017,"
Blizzard Challenge 2017 Workshop

Comparison of acoustic models

HMM: HMM-based speech synthesis system (HTS)
DNN: H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks," ICASSP 2013, pp. 7962-7966
DNN_Trj and DNN_Trj_GV: K. Hashimoto, K. Oura, Y. Nankaku, and K. Tokuda, "Trajectory training considering global variance for speech synthesis based on neural networks," ICASSP 2016, pp. 5600-5604
MDN_Trj_GV: K. Sawada, K. Hashimoto, K. Oura, and K. Tokuda, "The NITech text-to-speech system for the Blizzard Challenge 2017," Blizzard Challenge 2017 Workshop

Samples for each utterance below, synthesized with the HMM, DNN, DNN_Trj, DNN_Trj_GV, and MDN_Trj_GV models:
Hamlet_00001_00007
Hamlet_00001_00008
Hamlet_00001_00009
Hamlet_00001_00010
Hamlet_00001_00011
Hamlet_00001_00012
Hamlet_00001_00013
Hamlet_00001_00014
Hamlet_00001_00015
Hamlet_00001_00016
Hamlet_00001_00017
Hamlet_00001_00018
Hamlet_00001_00019
Hamlet_00001_00020
Hamlet_00001_00021
Hamlet_00001_00022
Hamlet_00001_00023

Phrase adaptation

Columns: input text; phrase in the training corpus; natural speech in the training corpus; adapted synthetic speech.
Input text: "I must tell Hamlet."
Phrases in the training corpus used for adaptation:
Zero vector (average phrase)
"Come and see the friendly lion!"
"Who's been sitting in my chair?"
"I must tell the King." (the highest-similarity phrase)

Synthesized speech samples (submitted)

Hamlet (Usborne Publishing)