Speech Synthesis: Multilingual and Cross-Lingual Approaches

Wed-2-11-6 Tone Learning in Low-Resource Bilingual TTS

Ruolan Liu (Samsung Research China-Beijing (SRC-B)), Xue Wen (Samsung Research China-Beijing (SRC-B)), Chunhui Lu (Samsung Research China-Beijing (SRC-B)) and Xiao Chen (Samsung Research China-Beijing (SRC-B))
Abstract: We present a system for low-resource multi-speaker cross-lingual text-to-speech synthesis. In particular, we train with monolingual English and Mandarin speakers and synthesize every speaker in both languages. The Mandarin training data is limited to 15 minutes of speech by a female Mandarin speaker. We identify accent carry-over and mispronunciation in the low-resource language as two major challenges in this scenario, and address them with tone preservation mechanisms and data augmentation, respectively. We apply these techniques to a recent strong multilingual baseline and achieve higher ratings in intelligibility and target accent, but slightly lower ratings in cross-lingual speaker similarity.