Thu-3-5-4 Lattice-Free Maximum Mutual Information Training of Multilingual Speech Recognition Systems

Srikanth Madikeri (Idiap Research Institute), Banriskhem Kayang Khonglah (Idiap Research Institute), Sibo Tong (Idiap Research Institute), Petr Motlicek (Idiap Research Institute), Herve Bourlard (Idiap Research Institute & EPFL) and Dan Povey (Xiaomi, Inc.)

Abstract: Multilingual acoustic model training combines data from multiple languages to train an automatic speech recognition system. Such a system is beneficial when training data for a target language is limited. Lattice-Free Maximum Mutual Information (LF-MMI) training performs sequence discrimination by introducing competing hypotheses through a denominator graph in the cost function. The standard approach to training a multilingual model with LF-MMI is to combine the acoustic units from all languages and use a common denominator graph. The resulting model is either used as a feature extractor to train an acoustic model for the target language or directly fine-tuned. In this work, we propose a scalable approach to training the multilingual acoustic model using a typical multitask network for the LF-MMI framework. A set of language-dependent denominator graphs is used to compute the cost function. The proposed approach is evaluated under typical multilingual ASR tasks using GlobalPhone and BABEL datasets. Relative improvements of up to 13.2% in WER are obtained when compared to the corresponding monolingual LF-MMI baselines. The implementation is made available as part of the Kaldi speech recognition toolkit.
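
To make the idea of language-dependent denominator graphs concrete, the following is a toy sketch and not the Kaldi implementation. It assumes each utterance carries per-frame log-scores from the output layer of its own language, and that the cost is the numerator log-score minus the log-score of the denominator graph selected by the utterance's language. The graph format, the function names (`forward_logprob`, `lfmmi_objective`), the language labels, and all numbers are hypothetical and chosen only for illustration.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_logprob(graph, frame_scores):
    """Total log-score of all paths through `graph` (forward algorithm).

    graph: dict with "start" (state id), "final" (set of state ids) and
           "arcs", a list of (src, dst, unit, log_weight) tuples.
    frame_scores: one dict per frame, mapping unit -> network log-score.
    """
    alpha = {graph["start"]: 0.0}
    for scores in frame_scores:
        incoming = {}
        for src, dst, unit, log_w in graph["arcs"]:
            if src in alpha:
                incoming.setdefault(dst, []).append(alpha[src] + log_w + scores[unit])
        alpha = {state: logsumexp(vals) for state, vals in incoming.items()}
    finals = [v for s, v in alpha.items() if s in graph["final"]]
    return logsumexp(finals) if finals else -math.inf


def lfmmi_objective(batch, den_graphs):
    """Sum over utterances of (numerator - denominator) log-scores.

    The denominator graph is looked up by the utterance's language, so each
    language competes only against hypotheses built from its own units.
    """
    total = 0.0
    for utt in batch:
        num_ll = forward_logprob(utt["num_graph"], utt["scores"])
        den_ll = forward_logprob(den_graphs[utt["lang"]], utt["scores"])
        total += num_ll - den_ll
    return total


HALF = math.log(0.5)  # toy arc weight shared by numerator and denominator graphs

# Hypothetical one-state denominator graphs: any unit sequence is allowed.
den_graphs = {
    "lang_a": {"start": 0, "final": {0}, "arcs": [(0, 0, "a", HALF), (0, 0, "b", HALF)]},
    "lang_b": {"start": 0, "final": {0}, "arcs": [(0, 0, "x", HALF), (0, 0, "y", HALF)]},
}

# Hypothetical batch: numerator graphs are linear chains over the reference units.
batch = [
    {
        "lang": "lang_a",
        "scores": [{"a": math.log(0.9), "b": math.log(0.1)},
                   {"a": math.log(0.2), "b": math.log(0.8)}],
        "num_graph": {"start": 0, "final": {2},
                      "arcs": [(0, 1, "a", HALF), (1, 2, "b", HALF)]},
    },
    {
        "lang": "lang_b",
        "scores": [{"x": math.log(0.6), "y": math.log(0.4)}],
        "num_graph": {"start": 0, "final": {1}, "arcs": [(0, 1, "x", HALF)]},
    },
]

objective = lfmmi_objective(batch, den_graphs)
print(f"toy LF-MMI objective: {objective:.4f}")
```

In this toy setup the numerator paths are a subset of the denominator paths with the same arc weights, so the objective is never positive; training would push it towards zero. Adding a language under these assumptions only requires one more entry in `den_graphs`, which is one way to read the scalability claim.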

