Mon-3-7-5 Adaptive Speaker Normalization for CTC-Based Speech Recognition

Fenglin Ding (University of Science and Technology of China), Wu Guo (University of Science and Technology of China), Bin Gu (University of Science and Technology of China), Zhenhua Ling (University of Science and Technology of China) and Jun Du (University of Science and Technology of China)
Abstract: In this paper, we propose a new speaker normalization technique for acoustic model adaptation in connectionist temporal classification (CTC)-based automatic speech recognition. In the proposed method, the mean and variance of each activation at the input of a hidden layer are first estimated at the speaker level. We then normalize each speaker's representation independently so that it follows a standard normal distribution. Furthermore, we propose using an auxiliary network to dynamically generate the scaling and shifting parameters of speaker normalization, and we introduce an attention mechanism to improve performance. Experiments are conducted on the public Chinese dataset AISHELL-1. The proposed methods are highly effective in adapting the CTC model, achieving up to a 17.5% character error rate improvement over the speaker-independent (SI) model.
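The speaker-level normalization step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `speaker_normalize`, the `eps` stabilizer, and the array layout (frames by hidden units) are assumptions made for illustration. The auxiliary network and the attention mechanism are not modeled here; the scaling and shifting parameters `gamma` and `beta` are simply passed in as optional inputs.

```python
import numpy as np

def speaker_normalize(acts, speaker_ids, gamma=None, beta=None, eps=1e-5):
    """Normalize hidden-layer inputs per speaker (illustrative sketch).

    acts:        array of shape (num_frames, num_units)
    speaker_ids: array of shape (num_frames,), one speaker label per frame
    gamma, beta: optional scaling and shifting parameters; in the paper they
                 come from an auxiliary network, which is NOT modeled here
    eps:         small constant for numerical stability (an assumption)
    """
    acts = np.asarray(acts, dtype=np.float64)
    speaker_ids = np.asarray(speaker_ids)
    out = np.empty_like(acts)

    for spk in np.unique(speaker_ids):
        mask = speaker_ids == spk
        # Estimate the mean and variance of each activation over this
        # speaker's frames only.
        mean = acts[mask].mean(axis=0)
        var = acts[mask].var(axis=0)
        # Map this speaker's activations to zero mean and unit variance.
        out[mask] = (acts[mask] - mean) / np.sqrt(var + eps)

    # Optionally apply the scaling and shifting parameters.
    if gamma is not None:
        out = out * gamma
    if beta is not None:
        out = out + beta
    return out
```

Calling `speaker_normalize(acts, speaker_ids)` without `gamma` and `beta` returns activations whose per-speaker mean is approximately zero and whose per-speaker variance is approximately one for every hidden unit.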