Thu-3-10-3 Improving Transformer-based Speech Recognition With Unsupervised Pre-training and Multi-task Semantic Knowledge Learning

Song Li (Xiamen University), Lin Li (Xiamen University), Qingyang Hong (Xiamen University) and Lingling Liu (Xiamen University)
Abstract: Recently, Transformer-based end-to-end speech recognition systems have become the state of the art. However, one prominent problem with current end-to-end speech recognition systems is that an extensive amount of paired data is required to achieve good recognition performance. To address this issue, we propose two unsupervised pre-training strategies, one for the encoder and one for the decoder of the Transformer, which make full use of unpaired data for training. In addition, we propose a new semi-supervised fine-tuning method, named multi-task semantic knowledge learning, to strengthen the Transformer's ability to learn semantic knowledge and thereby improve system performance. With our proposed methods we achieve the best CER on the AISHELL-1 test set, 5.9%, which outperforms the best previous end-to-end model by 10.6% relative CER. Moreover, relative CER reductions of 20.3% and 17.8% are obtained on low-resource Mandarin and English data sets, respectively.
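As a rough illustration of the multi-task fine-tuning idea named in the abstract, the sketch below combines a standard ASR cross-entropy loss with an auxiliary semantic loss on shared decoder states. The abstract does not specify the objectives or their weighting, so the separate heads (asr_head, sem_head), the token-level semantic targets, and the interpolation weight alpha are all illustrative assumptions, not the paper's method.

```python
import torch
import torch.nn as nn

# Illustrative sizes; not taken from the paper.
vocab_size, d_model, batch, tgt_len = 4000, 256, 8, 20

# Stand-in for the Transformer decoder's hidden states on a labelled batch.
decoder_hidden = torch.randn(batch, tgt_len, d_model, requires_grad=True)

asr_head = nn.Linear(d_model, vocab_size)  # predicts transcript tokens
sem_head = nn.Linear(d_model, vocab_size)  # hypothetical semantic-task head

asr_targets = torch.randint(0, vocab_size, (batch, tgt_len))
sem_targets = torch.randint(0, vocab_size, (batch, tgt_len))

ce = nn.CrossEntropyLoss()
asr_loss = ce(asr_head(decoder_hidden).reshape(-1, vocab_size),
              asr_targets.reshape(-1))
sem_loss = ce(sem_head(decoder_hidden).reshape(-1, vocab_size),
              sem_targets.reshape(-1))

alpha = 0.3                         # assumed task weight, not from the paper
loss = asr_loss + alpha * sem_loss  # joint multi-task objective
loss.backward()
```

In a real setup the two losses would typically be computed from different views of the input (e.g., the semantic task on masked token sequences), with alpha tuned on a development set.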