Cross/Multi-Lingual and Code-Switched Speech Recognition

Mon-3-1-9 A 43 Language Multilingual Punctuation Prediction Neural Network Model

Xinxing Li (Microsoft China) and Edward Lin (Microsoft China)
Abstract: Punctuation prediction is a critical component for speech recognition readability and speech translation segmentation. When multiple languages must be supported, traditional monolingual neural network models for punctuation prediction can be costly to manage and may not produce the best accuracy. In this paper, we investigate multilingual Long Short-Term Memory (LSTM) modeling using Byte Pair Encoding (BPE) for punctuation prediction to support 43 languages across 69 countries. Our findings show that a single multilingual BPE-based model can achieve similar or even better performance than separate monolingual word-based models by benefiting from information shared across languages. On an in-domain news article test set, the multilingual model achieves an average F1-score of 80.2%, while on out-of-domain speech recognition text it achieves an F1-score of 73.5%. We also show that the shared information helps when fine-tuning for low-resource languages.