Wed-2-8-10 SCADA: Stochastic, Consistent and Adversarial Data Augmentation to Improve ASR

Gary Wang (Simon Fraser University), Andrew Rosenberg (Google LLC), Zhehuai Chen (Google), Yu Zhang (Google), Bhuvana Ramabhadran (Google) and Pedro Moreno (Google)

Abstract: Recent developments in data augmentation have brought substantial gains in automatic speech recognition (ASR). Parallel developments in augmentation policy search in the computer vision domain have shown improvements in model performance and robustness. In addition, recent developments in semi-supervised learning have shown that consistency measures are crucial for performance and robustness. In this work, we demonstrate that combining augmentation policies with consistency measures and model regularization can greatly improve speech recognition performance. Using the Librispeech task, we show that: 1) symmetric consistency measures such as the Jensen-Shannon Divergence provide 11% relative improvements in ASR performance; 2) augmented adversarial inputs using Virtual Adversarial Noise (VAT) provide an 8.9% relative win; and 3) random sampling from arbitrary combinations of augmentation policies yields the best policy. These contributions result in an overall reduction in Word Error Rate (WER) of 18% relative on the Librispeech task presented in this paper.
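
As a rough illustration of the symmetric consistency measure named in the abstract, the sketch below computes the Jensen-Shannon divergence between two sets of per-frame output distributions, such as a model's predictions on two differently augmented views of the same utterance. It is a minimal NumPy sketch under assumptions, not the authors' implementation: the function names, the example distributions, and the use of a batch mean as the loss are all hypothetical.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) along the last axis; eps guards against log(0)."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric in p and q, bounded by log(2)."""
    m = 0.5 * (p + q)  # mixture of the two distributions
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical posteriors over 3 output tokens for 2 frames,
# from two augmented views of the same input.
p = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
q = np.array([[0.6, 0.3, 0.1], [0.2, 0.6, 0.2]])

# Assumed reduction: average the per-frame divergences into one scalar.
loss = js_divergence(p, q).mean()
```

Unlike the KL divergence, this measure gives the same value when the two views are swapped, which is the symmetry property the abstract refers to.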
