Cross/Multi-Lingual and Code-Switched Speech Recognition

Mon-3-1-3 Large-Scale End-to-End Multilingual Speech Recognition and Language Identification with Multi-Task Learning

Wenxin Hou (Tokyo Institute of Technology), Yue Dong (Tokyo Institute of Technology), Bairong Zhuang (Tokyo Institute of Technology), Longfei Yang (Tokyo Institute of Technology), Jiatong Shi (Johns Hopkins University) and Takahiro Shinozaki (Tokyo Institute of Technology)
Abstract: In this paper, we report a large-scale, end-to-end, language-independent multilingual model for joint automatic speech recognition (ASR) and language identification (LID). The model adopts a hybrid CTC/attention architecture and achieves a word error rate (WER) of 52.8 and a LID accuracy of 93.5 on 42 languages with around 5,000 hours of training data. We also compare the effects of using a subword-level versus a character-level vocabulary for large-scale multilingual tasks. Furthermore, we transfer the pre-trained model to 14 low-resource languages. Results show that the pre-trained model significantly outperforms non-pre-trained baselines on both language-specific and multilingual low-resource ASR tasks, reducing WER by 28.1% and 11.4%, respectively.
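
The hybrid CTC/attention objective mentioned in the abstract interpolates a frame-level CTC loss with an attention-decoder cross-entropy loss. Below is a minimal PyTorch sketch of that interpolation only; the tensor shapes, the weight lam = 0.3, and the random dummy encoder/decoder outputs are illustrative assumptions, not the authors' actual model or settings.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
B, T, U, V = 2, 50, 12, 100   # batch, encoder frames, target length, vocab size
lam = 0.3                     # CTC weight; a common choice, not the paper's value

# Dummy outputs standing in for a real encoder/decoder pair.
ctc_logp = torch.randn(T, B, V).log_softmax(-1)  # frame-level log-probs, (T, B, V)
att_logits = torch.randn(B, U, V)                # decoder logits, (B, U, V)
targets = torch.randint(1, V, (B, U))            # label IDs (0 reserved for blank)
feat_lens = torch.full((B,), T)
tgt_lens = torch.full((B,), U)

# CTC branch: alignment-free loss over the frame-level posteriors.
loss_ctc = nn.CTCLoss(blank=0, zero_infinity=True)(
    ctc_logp, targets, feat_lens, tgt_lens)

# Attention branch: cross-entropy over the decoder's label predictions.
loss_att = nn.CrossEntropyLoss()(att_logits.reshape(-1, V), targets.reshape(-1))

# Hybrid CTC/attention multi-task objective: interpolate the two losses.
loss = lam * loss_ctc + (1 - lam) * loss_att
print(float(loss))
```

Joint LID in such models is commonly realized by prepending a language-ID token (e.g. "[en]") to each target sequence, so the decoder's first prediction identifies the language before emitting the transcript; whether the paper uses exactly this scheme is not stated in the abstract.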