PeriodCodec: A Pitch-Controllable Neural Audio Codec Using Periodic Signals for Singing Voice Synthesis
*Masato Takagi, Miku Nishihara, Yukiya Hono, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda
Nagoya Institute of Technology, Japan
Neural audio codecs (NACs) have attracted considerable attention in the field of text-to-speech. However, previous methods do not offer a mechanism for explicitly controlling the fundamental frequency (F0), so they are not suitable for singing voice synthesis. To overcome this limitation, we propose a NAC that can control F0 by introducing explicit periodic signals into the decoder. This architecture enables direct manipulation of F0 during synthesis. Experimental results show that the proposed method achieves F0 control and improves synthesis quality compared to previous methods. Furthermore, we show that including singing voices in the training dataset improves both F0 controllability and singing voice quality, enabling the construction of a NAC suitable for singing voice synthesis tasks.
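To make the idea of an explicit periodic signal concrete, the following is a minimal sketch of how a sine-based signal could be generated from a frame-level F0 contour. It is not the generator used in PeriodCodec: the function names `periodic_signal` and `shift_f0`, the sine waveform, the sample rate, and the hop size are all assumptions chosen for illustration, and the downsampling stage is not covered.

```python
import math


def periodic_signal(f0_frames, sample_rate=24000, hop_size=240):
    """Build a sample-level sine signal from a frame-level F0 contour in Hz.

    Hypothetical helper: a frame with F0 == 0 is treated as unvoiced and
    produces silence. sample_rate and hop_size are assumed values.
    """
    signal = []
    phase = 0.0
    for f0 in f0_frames:
        for _ in range(hop_size):
            if f0 > 0:
                # Accumulate phase so the sine stays continuous across frames.
                phase = (phase + 2.0 * math.pi * f0 / sample_rate) % (2.0 * math.pi)
                signal.append(math.sin(phase))
            else:
                signal.append(0.0)
    return signal


def shift_f0(f0_frames, semitones):
    """Transpose the voiced frames of an F0 contour by a number of semitones."""
    ratio = 2.0 ** (semitones / 12.0)
    return [f0 * ratio if f0 > 0 else 0.0 for f0 in f0_frames]


# Usage: transpose a contour one octave up, then regenerate the signal.
contour = [220.0, 220.0, 0.0, 247.0]
shifted = periodic_signal(shift_f0(contour, 12))
```

Under these assumptions, pitch control amounts to editing the F0 contour before the signal is generated, so a decoder conditioned on that signal would receive the new pitch directly.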
| Name | Proposed | Further Experiments | |
|---|---|---|---|
| | Explicit Periodic Signal Generator & Downsampler | Pitch Predictor with Gradient Reversal Layer | Training with singing voice data (GTSinger) |
| Base | | | |
| Base+GT | | | ✅ |
| Period | ✅ | | |
| Period+GT | ✅ | | ✅ |
| Period-GRL | ✅ | ✅ | |
| Period-GRL+GT | ✅ | ✅ | ✅ |
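For readers unfamiliar with the gradient reversal layer (GRL) named in the table, the sketch below shows the general technique in plain Python: the forward pass is the identity, and the backward pass negates and scales the incoming gradient. This is a generic illustration, not the PeriodCodec implementation. The class name `GradientReversal` and the scale `lam` are hypothetical, and in practice a GRL would be written as a custom autograd function in a deep learning framework. That the pitch predictor is trained adversarially, to discourage the latent representation from carrying pitch information, is an assumption based on how GRLs are commonly used.

```python
class GradientReversal:
    """Generic gradient reversal layer (illustrative, framework-free sketch)."""

    def __init__(self, lam=1.0):
        # lam is a hypothetical scale applied to the reversed gradient.
        self.lam = lam

    def forward(self, x):
        # Identity: features pass through unchanged to the pitch predictor.
        return list(x)

    def backward(self, grad_output):
        # Reverse the sign, so layers before the GRL are pushed to make
        # the predictor's task harder rather than easier.
        return [-self.lam * g for g in grad_output]


# Usage: features are untouched going forward, gradients flip going back.
grl = GradientReversal(lam=0.5)
features = grl.forward([0.2, -1.0, 3.0])
upstream_grad = grl.backward([1.0, -2.0, 4.0])
```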
Audio samples: Natural (Ground Truth), Base, Base+GT, Period, Period+GT, Period-GRL, Period-GRL+GT
Audio samples: Natural, Period+GT, Period+GT-GRL