Tomoki Toda

Graduate School of Engineering
Nagoya Institute of Technology
Gokiso-cho, Showa-ku, Nagoya-shi, Aichi 466-8555, JAPAN
E-mail: tomoki@ics.nitech.ac.jp
Web: http://kt-lab.ics.nitech.ac.jp/~tomoki/index_e.html

Research interests

Tomoki Toda is interested in speech and acoustic processing, in particular speech synthesis. His research topics include speech analysis, speech synthesis, and speech conversion. His goal is to realize high-quality, flexible speech synthesis.
Keywords: speech signal processing, statistical model, feature mapping, articulatory parameter and unit selection

Education

Apr. 1995 - Mar. 1999
- School of Engineering, Nagoya University, Japan
  - B.E. degree in Electrical and Electronic Engineering and Information Engineering, 1999
Apr. 1999 - Mar. 2003
- Graduate School of Information Science, Nara Institute of Science and Technology, Japan
  - M.E. degree in engineering, 2001
  - Ph.D. degree in engineering, 2003

Professional Experience

Mar. 2001 - Mar. 2003
- Advanced Telecommunications Research Institute International, Spoken Language Translation Research Laboratories (ATR-SLT), Japan
  - Intern Researcher
Apr. 2003 - Sep. 2003
- ATR-SLT, Japan
  - Visiting Researcher
Apr. 2003 - Present (until Mar. 2005)
- Japan Society for the Promotion of Science (JSPS)
  - Research Fellow (Affiliation: Nagoya Institute of Technology)
Oct. 2003 - Sep. 2004
- Language Technologies Institute, Carnegie Mellon University, USA
  - Visiting Researcher
Oct. 2004 - Present
- ATR-SLT, Japan
  - Visiting Researcher

Publications

Journal Papers
1. T. Toda, H. Kawai, M. Tsuzaki, K. Shikano, ``An Evaluation of Cost Functions Capturing Both Total and Local Degradation of Naturalness for Segment Selection in Concatenative Speech Synthesis,'' Speech Communication. (Conditionally accepted).
2. K. Adachi, T. Toda, H. Kawanami, H. Saruwatari, K. Shikano, ``Designing Target Cost Function Based on Prosody of Speech Database,'' IEICE Transactions, Vol. E88-D, No. 3, pp. 519-524, Mar. 2005.
3. T. Masuda, T. Toda, H. Kawanami, H. Saruwatari, K. Shikano, ``Speech Databases with Various Prosody and Its Evaluation on Speech Rate,'' IEICE Transactions in Japanese, Vol. J87-D-II, No. 2, pp. 447-455, Feb. 2004.
4. T. Toda, H. Kawai, M. Tsuzaki, K. Shikano, ``A Segment Selection Algorithm for Japanese Concatenative Speech Synthesis Based on Both Phoneme Unit and Diphone Unit,'' IEICE Transactions in Japanese, Vol. J85-D-II, No. 12, pp. 1760-1770, Dec. 2002.
5. M. Mashimo, T. Toda, H. Kawanami, K. Shikano, N. Campbell, ``Cross-language Voice Conversion Evaluation Using Bilingual Databases,'' IPSJ Journal, Vol. 43, No. 7, pp. 2177-2185, July 2002.
6. T. Toda, J. Lu, H. Saruwatari, K. Shikano, ``Voice Conversion Algorithm Based on Gaussian Mixture Model with Dynamic Frequency Warping,'' IEICE Transactions in Japanese, Vol. J84-D-II, No. 10, pp. 2181-2189, Oct. 2001.
7. T. Toda, H. Banno, S. Kajita, K. Takeda, F. Itakura, K. Shikano, ``Improvement of STRAIGHT Method under Noisy Conditions Based on Lateral Inhibitive Weighting,'' IEICE Transactions in Japanese, Vol. J83-D-II, No. 11, pp. 2180-2189, Nov. 2000.
International Conferences
1. T. Toda, A.W. Black, K. Tokuda, ``Spectral Conversion Based on Maximum Likelihood Estimation Considering Global Variance of Converted Parameter,'' Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP2005), Vol. 1, pp. 9-12, Philadelphia, USA, Mar. 2005.
2. T. Toda, A.W. Black, K. Tokuda, ``Acoustic-to-Articulatory Inversion Mapping with Gaussian Mixture Model,'' Proc. International Conference on Spoken Language Processing (ICSLP2004), pp. 1129-1132, Jeju, Korea, Oct. 2004.
3. T. Toda, A.W. Black, K. Tokuda, ``Mapping from Articulatory Movements to Vocal Tract Spectrum with Gaussian Mixture Model for Articulatory Speech Synthesis,'' Proc. 5th ISCA Speech Synthesis Workshop (SSW5), pp. 31-36, Pittsburgh, USA, June 2004.
4. H. Kawai, T. Toda, J. Ni, M. Tsuzaki, K. Tokuda, ``XIMERA: A New TTS from ATR Based on Corpus-Based Technologies,'' Proc. 5th ISCA Speech Synthesis Workshop (SSW5), pp. 179-184, Pittsburgh, USA, June 2004.
5. K. Adachi, T. Toda, H. Kawanami, H. Saruwatari, K. Shikano, ``Perceptual Evaluation of Quality Deterioration Owing to Prosody Modification,'' Proc. 4th International Conference on Language Resources and Evaluation (LREC2004), pp. 2159-2162, Lisbon, Portugal, May 2004.
6. T. Toda, H. Kawai, M. Tsuzaki, ``Optimizing Sub-Cost Functions for Segment Selection Based on Perceptual Evaluations in Concatenative Speech Synthesis,'' Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP2004), pp. 657-660, Montreal, Canada, May 2004.
7. H. Kawai, T. Toda, ``An Evaluation of Automatic Phone Segmentation for Concatenative Speech Synthesis,'' Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP2004), pp. 677-680, Montreal, Canada, May 2004.
8. T. Toda, H. Kawai, M. Tsuzaki, ``Optimizing Integrated Cost Function for Segment Selection in Concatenative Speech Synthesis Based on Perceptual Evaluations,'' Proc. European Conference on Speech Communication and Technology (EUROSPEECH2003), pp. 297-300, Geneva, Switzerland, Sep. 2003.
9. T. Shiraishi, T. Toda, H. Kawanami, H. Saruwatari, K. Shikano, ``Simple Designing Methods of Corpus-Based Visual Speech Synthesis,'' Proc. European Conference on Speech Communication and Technology (EUROSPEECH2003), pp. 2241-2244, Geneva, Switzerland, Sep. 2003.
10. H. Kawanami, Y. Iwami, T. Toda, H. Saruwatari, K. Shikano, ``GMM-based Voice Conversion Applied to Emotional Speech Synthesis,'' Proc. European Conference on Speech Communication and Technology (EUROSPEECH2003), pp. 2401-2404, Geneva, Switzerland, Sep. 2003.
11. T. Toda, H. Kawai, M. Tsuzaki, K. Shikano, ``Segment Selection Considering Local Degradation of Naturalness in Concatenative Speech Synthesis,'' Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP2003), pp. 696-699, Hong Kong, China, Apr. 2003.
12. M. Mashimo, T. Toda, H. Kawanami, H. Kashioka, K. Shikano, N. Campbell, ``Evaluation of Cross-language Voice Conversion Using Bilingual And Non-bilingual Databases,'' Proc. International Conference on Spoken Language Processing (ICSLP2002), pp. 293-296, Denver, USA, Sep. 2002.
13. H. Kawanami, T. Masuda, T. Toda, K. Shikano, ``Designing Japanese Speech Database Covering Wide Range in Prosody,'' Proc. International Conference on Spoken Language Processing (ICSLP2002), pp. 2425-2428, Denver, USA, Sep. 2002.
14. T. Toda, H. Kawai, M. Tsuzaki, K. Shikano, ``Perceptual Evaluation of Cost for Segment Selection in Concatenative Speech Synthesis,'' Proc. IEEE 2002 Workshop on Speech Synthesis, Santa Monica, USA, Sep. 2002.
15. H. Kawanami, T. Masuda, T. Toda, K. Shikano, ``Designing Speech Database with Prosodic Variety for Expressive TTS System,'' Proc. International Conference on Language Resources and Evaluation (LREC2002), pp. 2039-2042, Las Palmas, Spain, May 2002.
16. T. Toda, H. Kawai, M. Tsuzaki, K. Shikano, ``Unit Selection Algorithm for Japanese Speech Synthesis Based on Both Phoneme Unit And Diphone Unit,'' Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP2002), pp. 465-468, Orlando, USA, May 2002.
17. T. Toda, H. Saruwatari, K. Shikano, ``High Quality Voice Conversion Based on Gaussian Mixture Model with Dynamic Frequency Warping,'' Proc. European Conference on Speech Communication and Technology (EUROSPEECH2001), pp. 349-352, Aalborg, Denmark, Sep. 2001.
18. M. Mashimo, T. Toda, K. Shikano, N. Campbell, ``Evaluation of Cross-language Voice Conversion Based on GMM And STRAIGHT,'' Proc. European Conference on Speech Communication and Technology (EUROSPEECH2001), pp. 361-364, Aalborg, Denmark, Sep. 2001.
19. T. Toda, H. Saruwatari, K. Shikano, ``Voice Conversion Algorithm Based on Gaussian Mixture Model with Dynamic Frequency Warping of STRAIGHT Spectrum,'' Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP2001), pp. 841-844, Salt Lake City, USA, May 2001.
20. T. Toda, J. Lu, H. Saruwatari, K. Shikano, ``STRAIGHT-based Voice Conversion Algorithm Based on Gaussian Mixture Model,'' Proc. International Conference on Spoken Language Processing (ICSLP2000), pp. 279-282, Beijing, China, Oct. 2000.
21. T. Toda, J. Lu, S. Nakamura, K. Shikano, ``Voice Conversion Algorithm Based on Gaussian Mixture Model Applied to STRAIGHT,'' Proc. The Seventh Western Pacific Regional Acoustics Conference (WESTPRAC VII), pp. 169-172, Kumamoto, Japan, Oct. 2000.
Others
1. T. Toda, ``Overview of Voice Conversion,'' 5th ISCA Speech Synthesis Workshop (SSW5), Tutorial, Pittsburgh, USA, June 2004.
2. T. Toda, H. Kawai, M. Tsuzaki, K. Shikano, ``Optimizing Segment Selection for High-Quality Text-to-Speech,'' ATR Technical Report, TR-SLT-0033, Unpublished report, Mar. 2003.
3. H. Kawanami, Y. Iwami, T. Toda, K. Shikano, ``Synthesizing Emotional Speech Using Voice Conversion Technique Based on GMM with DFW and Its Evaluation,'' IEEE 2002 Workshop on Speech Synthesis, Demo presentation, Santa Monica, USA, Sep. 2002.

Award

TELECOM System Technology Award for Student from the Telecommunications Advancement Foundation in 2003

Memberships

Institute of Electronics, Information and Communication Engineers of Japan (IEICE)
The Acoustical Society of Japan (ASJ)
International Speech Communication Association (ISCA)
