PeriodCodec

PeriodCodec: A Pitch-Controllable Neural Audio Codec Using Periodic Signals for Singing Voice Synthesis

*Masato Takagi, Miku Nishihara, Yukiya Hono, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda

Nagoya Institute of Technology, Japan

Abstract

Neural audio codecs (NACs) have attracted considerable attention in the field of text-to-speech. However, previous methods do not offer a mechanism for explicitly controlling the fundamental frequency (F0), which makes them unsuitable for singing voice synthesis. To overcome this limitation, we propose a NAC that can control F0 by introducing explicit periodic signals into the decoder. This architecture enables direct manipulation of F0 during the synthesis process. Experimental results show that our proposed method achieves F0 control and improves synthesis quality compared to previous methods. Furthermore, we show that including singing voices in the training dataset improves both F0 controllability and the quality of singing voices, enabling the construction of a NAC suitable for singing voice synthesis tasks.
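
To make the idea of an explicit periodic signal concrete, the following is a minimal sketch of one common way to turn a frame-level F0 contour into a sample-level sine signal by phase accumulation, followed by a naive downsampler. It is an illustration under stated assumptions, not the generator used in PeriodCodec: the function names, the 24 kHz sample rate, the hop size of 240 samples, and the window-averaging downsampler are all hypothetical choices made for this example.

```python
import math


def periodic_signal(f0_frames, sample_rate=24000, hop_size=240):
    """Build a sine signal from a frame-level F0 contour in Hz.

    Assumption: an F0 value of 0 marks an unvoiced frame, which is
    rendered as silence. Sample rate and hop size are illustrative.
    """
    signal = []
    phase = 0.0
    for f0 in f0_frames:
        for _ in range(hop_size):
            if f0 > 0:
                # Accumulate phase so the sine stays continuous across frames.
                phase = (phase + 2.0 * math.pi * f0 / sample_rate) % (2.0 * math.pi)
                signal.append(math.sin(phase))
            else:
                signal.append(0.0)
    return signal


def downsample(signal, factor):
    """Reduce the time resolution by averaging non-overlapping windows.

    Hypothetical stand-in for a downsampler that matches the periodic
    signal to a lower-rate decoder stage.
    """
    return [
        sum(signal[i:i + factor]) / factor
        for i in range(0, len(signal) - factor + 1, factor)
    ]
```

In a sketch like this, scaling every value in `f0_frames` by a constant (for example, doubling it to shift up one octave) changes the period of the generated signal directly, which is the kind of explicit F0 manipulation the abstract describes.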

Methods

Name	Explicit Periodic Signal Generator & Downsampler (Proposed)	Pitch Predictor with Gradient Reversal Layer (Further Experiments)	Training with singing voice data, GTSinger (Further Experiments)
Base	-	-	-
Base+GT	-	-	✅
Period	✅	-	-
Period+GT	✅	-	✅
Period-GRL	✅	✅	-
Period-GRL+GT	✅	✅	✅
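
The naming scheme of the six systems can also be written down as a small configuration table, which may help when selecting samples to compare. This is only a sketch: the class name `SystemConfig`, its field names, and the helper `systems_with` are hypothetical and were introduced for this example; only the system names and the components they enable come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemConfig:
    # Explicit periodic signal generator and downsampler (the proposed component).
    periodic_signal: bool = False
    # Pitch predictor with a gradient reversal layer (further experiment).
    pitch_predictor_grl: bool = False
    # Singing voice data (GTSinger) included in training (further experiment).
    train_with_gtsinger: bool = False


SYSTEMS = {
    "Base": SystemConfig(),
    "Base+GT": SystemConfig(train_with_gtsinger=True),
    "Period": SystemConfig(periodic_signal=True),
    "Period+GT": SystemConfig(periodic_signal=True, train_with_gtsinger=True),
    "Period-GRL": SystemConfig(periodic_signal=True, pitch_predictor_grl=True),
    "Period-GRL+GT": SystemConfig(
        periodic_signal=True, pitch_predictor_grl=True, train_with_gtsinger=True
    ),
}


def systems_with(**flags):
    """Return the names of systems whose configuration matches all given flags."""
    return [
        name
        for name, cfg in SYSTEMS.items()
        if all(getattr(cfg, key) == value for key, value in flags.items())
    ]
```

For example, `systems_with(train_with_gtsinger=True)` lists the three "+GT" systems, and `systems_with(periodic_signal=False)` lists the two baselines without the periodic signal input.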