The NITech text-to-speech system for the Blizzard Challenge 2017

Kei Sawada, Kei Hashimoto, Keiichiro Oura, and Keiichi Tokuda,
"The NITech text-to-speech system for the Blizzard Challenge 2017,"
Blizzard Challenge 2017 Workshop [paper] [slide] [link]

Comparison of acoustic models

HMM: HMM-based speech synthesis system (HTS) [link]
DNN: H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks," ICASSP 2013, pp.7962-7966 [link]
DNN_Trj and DNN_Trj_GV: K. Hashimoto, K. Oura, Y. Nankaku, and K. Tokuda, "Trajectory training considering global variance for speech synthesis based on neural networks," ICASSP 2016, pp.5600-5604 [link]
MDN_Trj_GV: K. Sawada, K. Hashimoto, K. Oura, and K. Tokuda, "The NITech text-to-speech system for the Blizzard Challenge 2017," Blizzard Challenge 2017 Workshop [link]

Sentence	HMM	DNN	DNN_Trj	DNN_Trj_GV	MDN_Trj_GV
Hamlet_00001_00007
Hamlet_00001_00008
Hamlet_00001_00009
Hamlet_00001_00010
Hamlet_00001_00011
Hamlet_00001_00012
Hamlet_00001_00013
Hamlet_00001_00014
Hamlet_00001_00015
Hamlet_00001_00016
Hamlet_00001_00017
Hamlet_00001_00018
Hamlet_00001_00019
Hamlet_00001_00020
Hamlet_00001_00021
Hamlet_00001_00022
Hamlet_00001_00023

Phrase adaptation

Input text	Phrase in the training corpus	Natural speech in the training corpus	Adapted synthetic speech
"I must tell Hamlet."	Zero vector (average phrase)
	Come and see the friendly lion!
	"Who's been sitting in my chair?"
	"I must tell the King." (the highest similarity phrase)

Synthesized speech samples (submitted)

Hamlet (Usborne Publishing) [picture 1] [picture 2] [picture 3]
