Mon-2-11-7 Unsupervised Regularization-Based Adaptive Training for Speech Recognition

Fenglin Ding (University of Science and Technology of China), Wu Guo (University of Science and Technology of China), Bin Gu (University of Science and Technology of China), Zhenhua Ling (University of Science and Technology of China) and Jun Du (University of Science and Technology of China)

Abstract: In this paper, we propose two novel regularization-based speaker adaptive training approaches for connectionist temporal classification (CTC) based speech recognition. The first method is center loss (CL) regularization, which penalizes the distances between the embeddings of different speakers and a single shared center. The second method is speaker variance loss (SVL) regularization, in which we directly minimize the inter-class variance among speakers during model training. Both methods train an adaptive model on the fly by adding a regularization term to the training loss function. Experiments on the AISHELL-1 Mandarin recognition task show that both methods effectively adapt the CTC model without requiring any specific fine-tuning or additional complexity, achieving character error rate improvements of up to 8.1% and 8.6%, respectively, over the speaker-independent (SI) model.
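The two regularizers described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: `emb` is an (N, D) array of hidden-layer embeddings with one row per frame or utterance, `speaker_ids` labels each row with its speaker, the SVL global mean is the unweighted mean of the per-speaker means, and the functions `center_loss`, `update_center`, `speaker_variance_loss` and `total_loss`, as well as the weight `lam` and the center step size `alpha`, are hypothetical names and values chosen for illustration. The CTC loss itself is treated as a given scalar.

```python
import numpy as np


def center_loss(emb, center):
    """CL sketch: half the mean squared distance from every embedding
    (regardless of speaker) to one shared center of shape (D,)."""
    diff = emb - center
    return 0.5 * np.mean(np.sum(diff ** 2, axis=1))


def update_center(emb, center, alpha=0.5):
    """Hypothetical running update of the single center: move it a
    fraction alpha toward the current batch mean (assumption)."""
    return center - alpha * (center - emb.mean(axis=0))


def speaker_variance_loss(emb, speaker_ids):
    """SVL sketch: between-speaker (inter-class) variance, computed as the
    mean squared distance of each speaker's mean embedding from the
    unweighted mean of all speaker means (assumption)."""
    speakers = np.unique(speaker_ids)
    means = np.stack([emb[speaker_ids == s].mean(axis=0) for s in speakers])
    global_mean = means.mean(axis=0)
    return np.mean(np.sum((means - global_mean) ** 2, axis=1))


def total_loss(ctc_loss, reg_loss, lam=0.1):
    """Training objective: CTC loss plus a weighted regularization term.
    lam is a hypothetical weight; the abstract does not give its value."""
    return ctc_loss + lam * reg_loss
```

For example, with four 2-D embeddings `[[0, 0], [2, 0], [0, 2], [2, 2]]`, speaker labels `[0, 0, 1, 1]` and a center at `[1, 1]`, both regularizers evaluate to 1.0. In this sketch, driving either term toward zero pushes the embeddings of different speakers together, which is one way to read the abstract's goal of speaker-adaptive training without a separate fine-tuning stage.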
