Software

The Tokuda & Lee laboratory develops software for promoting speech and image research and releases it to the public. This software is used for research at various organizations and companies.

Software

HMM-based Speech Synthesis System toolkit: HTS


HTS is a fundamental toolkit for speech synthesis that has been adopted by many research laboratories and companies (Microsoft, IBM, etc.).

Open-source large vocabulary continuous speech recognition engine: Julius


Julius is an open-source, high-performance, general-purpose large vocabulary continuous speech recognition engine for research and development of speech recognition systems. It can perform near real-time decoding with vocabularies of tens of thousands of words on most current PCs. It is highly general and can be adapted to a wide range of applications by swapping modules such as the pronunciation dictionary, the language model, and the acoustic model. Its functions are also provided as a library, so it can be embedded in applications.
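As a sketch of how those swappable modules are specified, a minimal Julius configuration (jconf) file might look like the following. The file names are placeholders for illustration, not files distributed with Julius:

```
## Hypothetical jconf file; model and dictionary names are placeholders.
-h     hmmdefs         # acoustic model (HTK-format HMM definitions)
-hlist allophones      # HMM list mapping logical to physical models
-d     ngram.bingram   # binary N-gram language model
-v     words.htkdic    # pronunciation dictionary
-input mic             # recognize speech from the microphone
```

Such a file is typically passed to the engine with `julius -C main.jconf`; replacing the dictionary and model entries retargets the engine to a different task without changing the engine itself.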

Speech signal processing toolkit: SPTK


SPTK is a suite of command-line tools for signal processing and data processing in acoustic analysis.
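For illustration, a typical SPTK analysis is built by piping its small tools together. The following sketch extracts mel-cepstral coefficients from 16-bit raw audio sampled at 16 kHz; the file names are placeholders, and the exact options may differ between SPTK versions:

```shell
# Hypothetical example: 24th-order mel-cepstral analysis with SPTK tools.
x2x +sf < speech.raw |           # convert 16-bit integers to floats
  frame -l 400 -p 80 |           # 25 ms frames with a 5 ms shift at 16 kHz
  window -l 400 -L 512 |         # window each frame and zero-pad to 512
  mgcep -l 512 -m 24 -a 0.42 \
  > speech.mgc                   # mel-generalized cepstral coefficients
```

Because each tool reads and writes a simple binary stream, stages can be inserted or replaced independently.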

Speech synthesis engine: hts_engine


hts_engine is software that synthesizes speech using models trained with HTS. It is released to the public under a BSD license.
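As a usage sketch, synthesizing a waveform from a full-context label file with the hts_engine command-line tool might look like the following; the voice and label file names are placeholders:

```shell
# Hypothetical example: synthesize speech from an HTS-trained voice.
# nitech_ja.htsvoice and input.lab are placeholder file names.
hts_engine -m nitech_ja.htsvoice -ow output.wav input.lab
```

Here the voice file packages the acoustic models learned by HTS, and the label file describes the utterance to be synthesized.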

Anthropomorphic spoken dialogue agent: Galatea


This is an open-source, license-free software toolkit for building anthropomorphic spoken dialogue agents. It is the product of a project in which speech, language, and image researchers from more than ten universities in Japan participated. In this toolkit, HTS and Julius, both developed in this laboratory, are used for the speech waveform generation module and the speech recognition module, respectively.

System

Nitech Campus Information Guidance System "Meityan" (Japanese)


This speech-based information guidance system is installed on the ground floor of Building 2 at the Nagoya Institute of Technology. Please try talking to it when you visit.

Database

Multimodal speech database for research: M2TINIT (Japanese)


M2TINIT is a multimodal database in which Japanese speech and lip motion video were recorded simultaneously. It was developed and released to the public by the Takao Kobayashi laboratory (Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology) and the Kitamura & Tokuda laboratory (Department of Computer Science, Nagoya Institute of Technology; now the Tokuda & Lee laboratory) to promote multimodal speech research. It has been used in research on the generation of speech and lip motion and on bimodal speech recognition.




