Mon-1-5-2 FeatherWave: An efficient high-fidelity neural vocoder with multi-band linear prediction

Qiao Tian (Tencent), Zewang Zhang (Tencent), Heng Lu (Tencent), Ling-Hui Chen (Tencent) and Shan Liu (Tencent)

Abstract: In this paper, we propose FeatherWave, a variant of the WaveRNN vocoder that combines multi-band signal processing with linear predictive coding. LPCNet, a recently proposed neural vocoder that exploits the linear predictive characteristics of the speech signal within the WaveRNN architecture, can generate high-quality speech faster than real time on a single CPU core. However, LPCNet is still not efficient enough for online speech generation tasks. To address this issue, we adopt multi-band linear predictive coding for the WaveRNN vocoder. The multi-band method enables the model to generate several speech samples in parallel at each step, which significantly improves the efficiency of speech synthesis. The proposed model with 4 sub-bands needs less than 1.6 GFLOPS for speech generation. In our experiments, it generates 24 kHz high-fidelity audio 9x faster than real time on a single CPU, which is much faster than the LPCNet vocoder. Furthermore, our subjective listening test shows that FeatherWave generates speech of better quality than LPCNet.
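The efficiency claim in the abstract can be made concrete with a little arithmetic. The sketch below is not the authors' code. It assumes that each autoregressive step emits exactly one sample per sub-band, so a model with N sub-bands needs 1/N as many sequential steps as a full-band model. The function names are hypothetical, and only the figures 24 kHz, 4 sub-bands, and 9x come from the abstract.

```python
def autoregressive_steps_per_second(sample_rate_hz: int, num_bands: int) -> float:
    """Sequential steps needed per second of audio.

    Assumption (not stated in the abstract): every step produces one
    sample for each sub-band, so the step count shrinks by num_bands.
    """
    return sample_rate_hz / num_bands


def seconds_to_synthesize(audio_seconds: float, speedup_over_real_time: float) -> float:
    """Wall-clock time needed to generate the given duration of audio."""
    return audio_seconds / speedup_over_real_time


SAMPLE_RATE_HZ = 24_000  # from the abstract: 24 kHz audio
NUM_BANDS = 4            # from the abstract: 4 sub-bands
SPEEDUP = 9.0            # from the abstract: 9x faster than real time

full_band_steps = autoregressive_steps_per_second(SAMPLE_RATE_HZ, 1)
multi_band_steps = autoregressive_steps_per_second(SAMPLE_RATE_HZ, NUM_BANDS)
budget_ms = 1000.0 * seconds_to_synthesize(1.0, SPEEDUP)

print(f"full-band steps per second of audio:  {full_band_steps:.0f}")
print(f"4-band steps per second of audio:     {multi_band_steps:.0f}")
print(f"time to generate 1 s of audio at 9x:  {budget_ms:.1f} ms")
```

Under this assumption, one second of 24 kHz audio takes 6,000 sequential steps instead of 24,000, and a 9x real-time factor corresponds to roughly 111 ms of compute per second of generated speech.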
