Thu-3-5-2 Transliteration Based Data Augmentation for Training Multilingual ASR Acoustic Models in Low Resource Settings

Samuel Thomas (IBM Research AI), Kartik Audhkhasi (IBM Research), and Brian Kingsbury (IBM Research)

Abstract: Multilingual acoustic models are often used to build automatic speech recognition (ASR) systems for low-resource languages. We propose a novel data augmentation technique to improve the performance of an end-to-end (E2E) multilingual acoustic model by transliterating data into the various languages that are part of the multilingual training set. Along with two metrics for data selection, this technique can also improve the recognition performance of the model on unsupervised and cross-lingual data. On a set of four low-resource languages, we show that word error rates (WER) can be reduced by up to 12% and 5% relative, compared to monolingual and multilingual baselines, respectively. We also demonstrate how a multilingual network constructed within this framework can be extended to a new training language. With the proposed methods, the new model has relative WER reductions of up to 24% and 13% compared to monolingual and multilingual baselines, respectively.
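The abstract reports its gains as relative WER reductions. As a reading aid, the following minimal sketch shows how WER and a relative reduction are conventionally computed. It is not the authors' scoring code; the function names and the numbers in the usage lines are hypothetical and chosen only for illustration.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        curr = [i]
        for j, h in enumerate(hyp_words, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance divided by reference length."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def relative_reduction(baseline_wer, new_wer):
    """Fraction by which new_wer improves on baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical example values, not results from the paper.
example_wer = wer("the cat sat on the mat", "the cat sat on mat")
example_gain = relative_reduction(0.50, 0.44)
```

Under this convention, a drop from a hypothetical 50% WER to 44% WER is a 12% relative reduction, even though the absolute difference is 6 percentage points.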

