Huaxin Wu (iFlytek Research, iFlytek Co., Ltd.), Genshun Wan (University of Science and Technology of China), and Jia Pan (University of Science and Technology of China)
The performance of automatic speech recognition systems can be improved by speaker adaptive training (SAT), which adapts an acoustic model to compensate for the mismatch between training and testing conditions. Speaker code learning is one effective approach to SAT: it learns a set of speaker-dependent codes together with a speaker-independent acoustic model in order to remove speaker variation. Conventionally, the speaker-dependent codes and the speaker-independent acoustic model are jointly optimized, which can make it difficult to decouple the speaker code from the acoustic model. In this paper, we treat speaker-code-based SAT as a meta-learning task, in which the acoustic model is regarded as meta-knowledge and the speaker code as task-specific knowledge. Experiments on the Switchboard task show that our method not only learns a good speaker code but also improves the performance of the acoustic model even when no speaker code is used.
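The split between meta-knowledge and task-specific knowledge can be illustrated with a small sketch. The abstract does not specify the meta-learning algorithm, so the code below assumes a first-order, MAML-style scheme on a toy linear model with synthetic data: an inner loop fits only the speaker code with the shared model frozen, and an outer loop updates the shared model on held-out data from the same speaker. All names (`adapt_code`, `meta_step`, `run_demo`), dimensions, and learning rates are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np

def predict(W, V, x, code):
    """Toy 'acoustic model': shared weights W, V plus a per-speaker code."""
    return x @ W.T + code @ V.T

def loss(W, V, x, y, code):
    """Mean squared error over frames."""
    err = predict(W, V, x, code) - y
    return float(np.mean(np.sum(err ** 2, axis=1)))

def adapt_code(W, V, x, y, steps=5, lr=0.1):
    """Inner loop (task-specific): fit only the speaker code, model frozen."""
    code = np.zeros(V.shape[1])
    for _ in range(steps):
        err = predict(W, V, x, code) - y
        code -= lr * 2.0 * np.mean(err @ V, axis=0)
    return code

def meta_step(W, V, speakers, lr=0.05):
    """Outer loop (meta-knowledge): update the shared model on query data.

    First-order approximation (an assumption of this sketch): the adapted
    code is treated as a constant when differentiating with respect to W, V.
    """
    grad_W = np.zeros_like(W)
    grad_V = np.zeros_like(V)
    for x_sup, y_sup, x_qry, y_qry in speakers:
        code = adapt_code(W, V, x_sup, y_sup)
        err = predict(W, V, x_qry, code) - y_qry
        grad_W += 2.0 * err.T @ x_qry / len(x_qry)
        grad_V += 2.0 * np.outer(err.mean(axis=0), code)
    n = len(speakers)
    return W - lr * grad_W / n, V - lr * grad_V / n

def make_speakers(rng, n_speakers=8, n=20, d_in=4, d_out=3, d_code=2):
    """Synthetic data: each speaker adds its own offset to a shared mapping."""
    W_true = rng.normal(size=(d_out, d_in))
    V_true = rng.normal(size=(d_out, d_code))
    speakers = []
    for _ in range(n_speakers):
        c_true = 0.5 * rng.normal(size=d_code)
        x = rng.normal(size=(2 * n, d_in))
        y = x @ W_true.T + c_true @ V_true.T
        # First half is the support set, second half the query set.
        speakers.append((x[:n], y[:n], x[n:], y[n:]))
    return speakers

def mean_query_loss(W, V, speakers):
    """Average query loss after adapting a code for each speaker."""
    total = 0.0
    for x_sup, y_sup, x_qry, y_qry in speakers:
        code = adapt_code(W, V, x_sup, y_sup)
        total += loss(W, V, x_qry, y_qry, code)
    return total / len(speakers)

def run_demo(seed=0, meta_steps=100):
    """Return the mean query loss before and after meta-training."""
    rng = np.random.default_rng(seed)
    speakers = make_speakers(rng)
    W = np.zeros((3, 4))
    V = 0.1 * rng.normal(size=(3, 2))  # nonzero so the code receives gradient
    before = mean_query_loss(W, V, speakers)
    for _ in range(meta_steps):
        W, V = meta_step(W, V, speakers)
    after = mean_query_loss(W, V, speakers)
    return before, after
```

Calling `run_demo()` returns the mean query loss before and after meta-training on the toy data. Because the code is always re-estimated from scratch in the inner loop, the shared parameters are never fitted jointly with a stored per-speaker code, which is one plausible way to read the decoupling the abstract describes.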