Speech Synthesis: Multilingual and Cross-Lingual Approaches

Wed-2-11-7 On Improving Code Mixed Speech Synthesis with Mixlingual Grapheme-to-Phoneme Model

Shubham Bansal (Microsoft), Arijit Mukherjee (Microsoft), Sandeepkumar Satpal (Microsoft) and Rupesh Mehta (Microsoft)
Abstract: Regional entities often occur in code-mixed text in non-native Roman script, and synthesizing them with the correct pronunciation and accent is a challenging problem. English grapheme-to-phoneme (G2P) rules fail for such entities because of orthographic errors and phonological differences between English and the regional languages. The traditional approach to this problem involves language identification, followed by transliteration of the regional entities to their native language and then passing them through a native G2P. In this work, we simplify this module-based architecture by learning an end-to-end mixlingual G2P in a multi-task setting. Also, rather than mapping the output phone sequences from our mixlingual G2P to the English phoneset or using a "shared" phoneset, we use polyglot data and a "separated" phoneset to train a mixlingual synthesizer that improves the synthesized voice accent for regional entities. We use Hindi-English as the code-mixed scenario and show absolute incremental gains of up to 28% in pronunciation accuracy and a 0.9 gain in "overall impression" mean opinion score (MOS) over a standard English monolingual text-to-speech (TTS) system.
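For orientation, the traditional modular pipeline that the abstract contrasts against (language identification, then transliteration, then a native G2P) can be sketched roughly as follows. This is a toy illustration and not the authors' system: the lookup tables, phone symbols, and function names are all hypothetical, and the language-prefixed phone labels are only one assumed reading of what a "separated" phoneset looks like.

```python
# Toy sketch of a modular pipeline: language ID -> transliteration -> native G2P.
# All tables, phone symbols, and function names are hypothetical placeholders.

# Romanized Hindi entity -> Devanagari (toy transliteration table).
ROMAN_TO_DEVANAGARI = {"dilli": "दिल्ली", "chowk": "चौक"}

# Toy per-language lexicons standing in for real G2P models.
HINDI_G2P = {"दिल्ली": ["d", "i", "l", "l", "ii"], "चौक": ["ch", "au", "k"]}
ENGLISH_G2P = {"go": ["g", "ow"], "to": ["t", "uw"]}


def identify_language(token: str) -> str:
    """Stand-in language identifier: a token is Hindi if the toy table knows it."""
    return "hi" if token in ROMAN_TO_DEVANAGARI else "en"


def modular_g2p(text: str) -> list:
    """Run each token through the three stages and tag its phones by language."""
    phones = []
    for token in text.lower().split():
        lang = identify_language(token)
        if lang == "hi":
            native = ROMAN_TO_DEVANAGARI[token]   # transliteration stage
            token_phones = HINDI_G2P[native]      # native G2P stage
        else:
            # Unknown English words fall back to letter-by-letter spelling.
            token_phones = ENGLISH_G2P.get(token, list(token))
        # Language-prefixed labels keep the two phone inventories distinct
        # (assumed interpretation of a "separated" phoneset).
        phones.extend(f"{lang}_{p}" for p in token_phones)
    return phones


print(modular_g2p("go to dilli chowk"))
```

An end-to-end mixlingual G2P, as proposed in the abstract, would replace the three hand-wired stages inside `modular_g2p` with a single learned model that maps the mixed-script text directly to phones.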