Takenori Yoshimura

Name

Takenori Yoshimura

Society

Acoustical Society of Japan
IEEE Signal Processing Society

Experience

Apr. 2021 -- Present
Researcher at Techno-Speech, Inc.
Apr. 2020 -- Present
Researcher at Nagoya Institute of Technology
Apr. 2019 -- Present
Engineer at Human Dataware Lab, Co., Ltd.
Apr. 2019 -- Mar. 2021
Engineer at TARVO, Inc.
Oct. 2018 -- Present
Visiting Assistant Professor in Speech and Language Processing Laboratory at Nagoya Institute of Technology
Oct. 2018 -- Mar. 2020
Researcher in Institutes of Innovation for Future Society at Nagoya University
Jan. 2017 -- Mar. 2017
Intern at NTT Communications Science Laboratories
Sep. 2015 -- Feb. 2016
Visiting Researcher at University of Edinburgh

Education

Apr. 2015 -- Sep. 2018
Department of Scientific and Engineering Simulation, Nagoya Institute of Technology (Ph.D.)
Apr. 2013 -- Mar. 2015
Department of Scientific and Engineering Simulation, Nagoya Institute of Technology (Master)
Apr. 2011 -- Mar. 2013
Department of Computer Science, Nagoya Institute of Technology (Bachelor)
Apr. 2006 -- Mar. 2011
Department of Electrical and Computer Engineering, Gifu National College of Technology

Award

Nov. 2020 - DCASE2020 Challenge Task 2 Jury's Award
Dec. 2019 - The 13th IEEE Signal Processing Society Japan Student Journal Paper Award
Mar. 2018 - The 16th Student Presentation Award of the Acoustical Society of Japan

Journal

Yoshihiko Nankaku, Takato Fujimoto, Takenori Yoshimura, Shinji Takaki, Kei Hashimoto, Keiichiro Oura, and Keiichi Tokuda,
"Deep hidden semi-Markov model-based speech synthesis,"
IEEE Access, vol. 14, pp. 58495-58514, April 2026. (peer reviewed)
(DOI: 10.1109/ACCESS.2026.3683761)
Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda,
"Mel-cepstrum-based quantization noise shaping applied to neural-network-based speech waveform synthesis,"
IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 7, pp. 1173-1180, July 2018. (peer reviewed)
(DOI: 10.1109/TASLP.2018.2818408)
Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda,
"Simultaneous optimization of multiple tree-based factor analyzed HMM for speech synthesis,"
IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 9, pp. 1532-1541, September 2017. (peer reviewed)
(DOI: 10.1109/TASLP.2017.2721219)

International Conference

Takenori Yoshimura, Shinji Takaki, Kazuhiro Nakamura, Keiichiro Oura, Takato Fujimoto, Kei Hashimoto, Yoshihiko Nankaku, and Keiichi Tokuda,
"SSLZip: Simple autoencoding for enhancing self-supervised speech representations in speech generation,"
Proceedings of 13th ISCA Speech Synthesis Workshop, pp. 117-122, Leeuwarden, Netherlands, August 2025. (peer reviewed, poster)
Takenori Yoshimura, Takato Fujimoto, Keiichiro Oura, and Keiichi Tokuda,
"SPTK4: An open-source software toolkit for speech signal processing,"
Proceedings of 12th ISCA Speech Synthesis Workshop, pp. 211-217, Grenoble, France, August 2023. (peer reviewed, poster)
Takenori Yoshimura, Shinji Takaki, Kazuhiro Nakamura, Keiichiro Oura, Yukiya Hono, Kei Hashimoto, Yoshihiko Nankaku, and Keiichi Tokuda,
"Embedding a differentiable mel-cepstral synthesis filter to a neural speech synthesis system,"
Proceedings of ICASSP 2023, pp. 1-5, Rhodes Island, Greece, June 2023. (peer reviewed, oral)
Ibuki Kuroyanagi, Tomoki Hayashi, Yusuke Adachi, Takenori Yoshimura, Kazuya Takeda, and Tomoki Toda,
"An ensemble approach to anomalous sound detection based on conformer-based autoencoder and binary classifier incorporated with metric learning,"
Proceedings of DCASE 2021, pp. 110-114, November 2021. (peer reviewed, video)
Tomoki Hayashi, Takenori Yoshimura, Masaya Inuzuka, Ibuki Kuroyanagi, and Osamu Segawa,
"Spontaneous speech summarization: Transformers all the way through,"
Proceedings of EUSIPCO 2021, pp. 456-460, August 2021. (peer reviewed, video)
Takenori Yoshimura, Tomoki Hayashi, Kazuya Takeda, and Shinji Watanabe,
"End-to-end automatic speech recognition integrated with CTC-based voice activity detection,"
Proceedings of ICASSP 2020, pp. 6999-7003, May 2020. (peer reviewed, video)
Tomoki Hayashi, Ryuichi Yamamoto, Katsuki Inoue, Takenori Yoshimura, Shinji Watanabe, Tomoki Toda, Kazuya Takeda, Yu Zhang, and Xu Tan,
"ESPnet-TTS: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit,"
Proceedings of ICASSP 2020, pp. 7654-7658, May 2020. (peer reviewed, video)
Shigeki Karita, Nanxin Chen, Tomoki Hayashi, Takaaki Hori, Hirofumi Inaguma, Ziyan Jiang, Masao Someki, Nelson Enrique Yalta Soplin, Ryuichi Yamamoto, Xiaofei Wang, Shinji Watanabe, Takenori Yoshimura, and Wangyou Zhang,
"A comparative study on Transformer vs RNN in speech applications,"
Proceedings of ASRU 2019, pp. 449-456, Sentosa, Singapore, December 2019.
Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda,
"Speaker-dependent WaveNet-based delay-free ADPCM speech coding,"
Proceedings of ICASSP 2019, pp. 7145-7149, Brighton, UK, May 2019. (peer reviewed, poster)
Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda,
"WaveNet-based zero-delay lossless speech coding,"
Proceedings of SLT 2018, pp. 153-158, Athens, Greece, December 2018. (peer reviewed, poster)
Takenori Yoshimura, Natsumi Koike, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda,
"Discriminative feature extraction based on sequential variational autoencoder for speaker recognition,"
Proceedings of APSIPA ASC 2018, pp. 1742-1746, Hawaii, USA, November 2018. (peer reviewed, poster)
Takato Fujimoto, Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda,
"Speech synthesis using WaveNet vocoder based on periodic/aperiodic decomposition,"
Proceedings of APSIPA ASC 2018, pp. 644-648, Hawaii, USA, November 2018. (peer reviewed, oral)
Kei Sawada, Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda,
"The NITech text-to-speech system for the Blizzard Challenge 2018,"
The Blizzard Challenge 2018 workshop, Hyderabad, India, September 2018. (non-peer reviewed, oral)
Jumpei Niwa, Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda,
"Statistical voice conversion based on WaveNet,"
Proceedings of ICASSP 2018, pp. 5289-5293, Calgary, Canada, April 2018. (peer reviewed, poster)
Amelia J. Gully, Takenori Yoshimura, Damian T. Murphy, Kei Hashimoto, Yoshihiko Nankaku, and Keiichi Tokuda,
"Articulatory text-to-speech synthesis using the digital waveguide mesh driven by a deep neural network,"
Proceedings of Interspeech 2017, pp. 234-238, Stockholm, Sweden, August 2017. (peer reviewed, oral)
Takenori Yoshimura, Gustav Eje Henter, Oliver Watts, Mirjam Wester, Junichi Yamagishi, and Keiichi Tokuda,
"A hierarchical predictor of synthetic speech naturalness using neural networks,"
Proceedings of Interspeech 2016, pp. 342-346, San Francisco, USA, September 2016. (peer reviewed, poster)
Takenori Yoshimura, Kei Hashimoto, Yoshihiko Nankaku, and Keiichi Tokuda,
"Simultaneous optimization of multiple tree structures for factor analyzed HMM-based speech synthesis,"
Proceedings of Interspeech 2015, pp. 1196-1200, Dresden, Germany, September 2015. (peer reviewed, oral)
Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda,
"Cross-lingual speaker adaptation based on factor analysis using bilingual speech data for HMM-based speech synthesis,"
Proceedings of 8th ISCA Speech Synthesis Workshop, pp. 297-302, Barcelona, Spain, September 2013. (peer reviewed, poster)

Technical Report

Ibuki Kuroyanagi, Tomoki Hayashi, Yusuke Adachi, Takenori Yoshimura, Kazuya Takeda, and Tomoki Toda,
"Anomalous sound detection with ensemble of autoencoder and binary classification approaches,"
DCASE2021 Challenge, July 2021.
Tomoki Hayashi, Takenori Yoshimura, and Yusuke Adachi,
"Conformer-based ID-aware autoencoder for unsupervised anomalous sound detection,"
DCASE2020 Challenge, July 2020.
Jumpei Niwa, Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda,
"A study on voice conversion based on WaveNet,"
IEICE Technical Report, vol. 117, no. 393, pp. 99-104, Tokyo, Japan, January 2018. (non-peer reviewed, oral)
Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda,
"Mel-cepstrum based quantization noise shaping applied to speech synthesis based on WaveNet,"
IEICE Technical Report, vol. 117, no. 393, pp. 93-98, Tokyo, Japan, January 2018. (non-peer reviewed, oral)

Preprint

Tomoki Hayashi, Ryuichi Yamamoto, Takenori Yoshimura, Peter Wu, Jiatong Shi, Takaaki Saeki, Yooncheol Ju, Yusuke Yasuda, Shinnosuke Takamichi, and Shinji Watanabe,
"ESPnet2-TTS: Extending the edge of TTS research,"
arXiv preprint, October 2021.
Yoshihiko Nankaku, Kenta Sumiya, Takenori Yoshimura, Shinji Takaki, Kei Hashimoto, Keiichiro Oura, and Keiichi Tokuda,
"Neural sequence-to-sequence speech synthesis using a hidden semi-Markov model based structured attention mechanism,"
arXiv preprint, August 2021.

Talk

"Introduction to SPTK: A toolkit for speech signal processing,"
Voice of Wellness 2025,
Nara, Japan, September 2025.

Domestic Conference

Yuta Imamura, Yukiya Hono, Takenori Yoshimura, Kei Hashimoto, Yoshihiko Nankaku, and Keiichi Tokuda,
"Neural vocoder embedding a mel-cepstrum synthesis filter with a structure separating periodic and aperiodic components,"
Proceedings of ASJ 2025 Autumn Meeting, 2-1-2, pp. 1177-1180, Sendai, Japan, September 2025. (non-peer reviewed, oral)
Motohiro Kunda, Takato Fujimoto, Yukiya Hono, Takenori Yoshimura, Kei Hashimoto, Yoshihiko Nankaku, and Keiichi Tokuda,
"Frame-level neural vocoder utilizing phase information of periodic signals,"
Proceedings of ASJ 2025 Spring Meeting, 1-2-8, pp. 905-908, Saitama, Japan, March 2025. (non-peer reviewed, oral)
Takenori Yoshimura, Shinji Takaki, Kazuhiro Nakamura, Keiichiro Oura, Yukiya Hono, Kei Hashimoto, Yoshihiko Nankaku, and Keiichi Tokuda,
"Embedding a differentiable mel-cepstral synthesis filter to an end-to-end speech synthesis system,"
Proceedings of ASJ 2022 Autumn Meeting, 1-8-17, pp. 1585-1588, Sapporo, Japan, September 2022. (non-peer reviewed, oral)
Kazumasa Sasaki, Takenori Yoshimura, Shinji Takaki, Kei Hashimoto, Yoshihiko Nankaku, and Keiichi Tokuda,
"Neural vocoders which can control voice characteristics, average pitch and speaking rate,"
Proceedings of ASJ 2022 Spring Meeting, 1-3-18, pp. 935-938, Online, March 2022. (non-peer reviewed)
Kenta Sumiya, Takenori Yoshimura, Shinji Takaki, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda,
"Sequence-to-sequence speech synthesis using a hidden semi-Markov model based structured attention mechanism,"
Proceedings of ASJ 2021 Spring Meeting, 3-2-23, pp. 943-946, Online, March 2021. (non-peer reviewed)
Tomoki Hayashi, Ryuichi Yamamoto, Katsuki Inoue, Takenori Yoshimura, Kazuya Takeda, Tomoki Toda, and Shinji Watanabe,
"ESPnet-TTS: A toolkit to accelerate research on end-to-end speech synthesis,"
Proceedings of ASJ 2020 Spring Meeting, 1-2-7, pp. 1267-1268, Saitama, Japan, March 2020. (non-peer reviewed)
Takenori Yoshimura, Natsumi Koike, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda,
"Feature extraction based on sequential variational autoencoder for speaker recognition,"
Proceedings of ASJ 2018 Autumn Meeting, 3-2-6, pp. 1341-1344, Oita, Japan, September 2018. (non-peer reviewed, oral)
Takato Fujimoto, Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda,
"Periodic/aperiodic decomposition based speech synthesis using WaveNet vocoder,"
Proceedings of ASJ 2018 Autumn Meeting, 2-4-1, pp. 1125-1126, Oita, Japan, September 2018. (non-peer reviewed, oral)
Kei Sawada, Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda,
"Overview of the NITech text-to-speech system for the Blizzard Challenge 2018,"
Proceedings of ASJ 2018 Autumn Meeting, 1-4-5, pp. 1091-1094, Oita, Japan, September 2018. (non-peer reviewed, oral)
Jumpei Niwa, Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda,
"WaveNet-based voice conversion,"
Proceedings of ASJ 2017 Autumn Meeting, 1-8-15, pp. 207-208, Matsuyama, Japan, September 2017. (non-peer reviewed, oral)
Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda,
"Mel-cepstrum based quantization noise shaping applied for WaveNet,"
Proceedings of ASJ 2017 Autumn Meeting, 1-8-8, pp. 193-194, Matsuyama, Japan, September 2017. (non-peer reviewed, oral)
Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda,
"A design of voice recording software tool to effectively collect speech data based on crowdsourcing,"
Proceedings of ASJ 2016 Spring Meeting, 1-R-29, pp. 307-308, Yokohama, Japan, March 2016. (non-peer reviewed, poster)
Takenori Yoshimura, Kei Hashimoto, Yoshihiko Nankaku, and Keiichi Tokuda,
"A clustering technique for factor analyzed HMM-based speech synthesis,"
Proceedings of ASJ 2014 Autumn Meeting, 2-7-4, pp. 239-240, Sapporo, Japan, September 2014. (non-peer reviewed, oral)
Takenori Yoshimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda,
"Cross-lingual speaker adaptation based on factor analysis using bilingual speech data,"
Proceedings of ASJ 2013 Spring Meeting, 1-7-3, pp. 267-268, Tokyo, Japan, March 2013. (non-peer reviewed, oral)

Thesis

Takenori Yoshimura,
"Acoustic and waveform modeling for statistical speech synthesis,"
Doctoral Dissertation, Nagoya Institute of Technology, August 2018.
Takenori Yoshimura,
"Context clustering with multiple tree structures for factor analyzed HMM-based speech synthesis,"
Master's Thesis, Nagoya Institute of Technology, February 2015.
Takenori Yoshimura,
"Cross-lingual speaker adaptation based on factor analysis for HMM-based speech synthesis,"
Bachelor's Thesis, Nagoya Institute of Technology, February 2013.

Projects

Apr. 2015 -- Present
SPTK, SPTK4, diffsptk
Feb. 2019 -- Oct. 2025
ESPnet
Apr. 2015 -- Mar. 2021
HTS
Apr. 2015 -- Mar. 2017
JST CREST uDialogue

Patent

Hirokazu Kameoka, Takenori Yoshimura, "Signal analyzer, method, and program,"
JP2019070775

Contact Information

Address
Nagoya Institute of Technology Bldg.4 #527, Gokiso-cho, Showa-ku, Nagoya, 466-8555, JAPAN
Phone
+81-52-735-7552
E-mail
takenori at sp.nitech.ac.jp
Please do not send software questions to me directly; use the mailing list instead.
