Thu-1-1-5 StrawNet: Self-Training WaveNet for TTS in Low-Data Regimes

Manish Sharma (Google), Tom Kenter (Google UK) and Robert Clark (Google, UK)

Abstract: Recently, WaveNet has become a popular choice of neural network to synthesize speech audio. Autoregressive WaveNet is capable of producing high-fidelity audio, but is too slow for real-time synthesis. As a remedy, Parallel WaveNet was proposed, which can produce audio faster than real time through distillation of an autoregressive teacher into a feedforward student network. A shortcoming of this approach, however, is that a large amount of recorded speech data is required to produce high-quality student models, and this data is not always available. In this paper, we propose StrawNet: a self-training approach to train a Parallel WaveNet. Self-training is performed using the synthetic examples generated by the autoregressive WaveNet teacher. We show that, in low-data regimes, training on high-fidelity synthetic data from an autoregressive teacher model is superior to training the student model on (far fewer) examples of recorded speech. We compare StrawNet to a baseline Parallel WaveNet, using both side-by-side tests and Mean Opinion Score evaluations. To our knowledge, synthetic speech has not been used to train neural text-to-speech before.
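
The data flow of the self-training idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: a least-squares line fit stands in for both the autoregressive teacher and the feedforward student, and the names `fit_line` and `self_train` are hypothetical. The abstract does not say whether recorded examples are mixed into the student's training set, so this sketch assumes the student sees synthetic examples only.

```python
import random


def fit_line(pairs):
    """Least-squares fit of y = a*x + b. A toy stand-in for model training."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    a = cov_xy / var_x
    return a, mean_y - a * mean_x


def self_train(recorded, unlabeled_inputs):
    """Toy self-training loop (assumed structure, not the paper's code)."""
    # Step 1: fit the teacher on the small set of recorded examples.
    teacher = fit_line(recorded)
    # Step 2: let the teacher label a much larger set of inputs,
    # producing the synthetic training set.
    synthetic = [(x, teacher[0] * x + teacher[1]) for x in unlabeled_inputs]
    # Step 3: fit the student on the synthetic examples only (an assumption).
    student = fit_line(synthetic)
    return teacher, student, synthetic


rng = random.Random(0)
# Low-data regime: only four noisy "recorded" examples.
recorded = [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)) for x in (0.0, 1.0, 2.0, 3.0)]
# Inputs without recordings are cheap to obtain in bulk.
unlabeled_inputs = [rng.uniform(0.0, 3.0) for _ in range(200)]
teacher, student, synthetic = self_train(recorded, unlabeled_inputs)
```

In this toy setting the student simply recovers the teacher's parameters, because both models belong to the same family. In the setting the abstract describes, teacher and student are different architectures, and the point of the synthetic data is to give the student far more training examples than the recorded set provides.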
