Speech Synthesis: Multilingual and Cross-Lingual Approaches

Wed-2-11-3 Dynamic Soft Windowing and Language Dependent Style Token for Code-Switching End-to-End Speech Synthesis

Ruibo Fu (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Jianhua Tao (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Zhengqi Wen (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Jiangyan Yi (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Chunyu Qiang (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences) and Tao Wang (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences)
Abstract: Most current end-to-end speech synthesis systems assume that the input text is in a single language. However, code-switching, in which speakers switch between languages within the same utterance, occurs frequently in everyday speech, and building a large mixed-language speech database is difficult and uneconomical. In this paper, both a windowing technique and style token modeling are designed for code-switching end-to-end speech synthesis. To improve the consistency of speaking style in bilingual situations, a dynamic attention reweighting soft windowing mechanism is proposed to ensure smooth transitions during code-switching, in contrast to conventional windowing techniques that use fixed constraints. To compensate for the shortage of mixed-language training data, a language dependent style token is designed for cross-language multi-speaker acoustic modeling, where both Mandarin and English monolingual data serve as the extended training data set. Attention gating is proposed to adjust the style token dynamically based on the language and the attended context information. Experimental results show that the proposed methods improve intelligibility, naturalness and similarity.
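The abstract does not give the formulation of the soft windowing mechanism, so the following is only an illustrative sketch of the general idea of reweighting attention with a window that moves with the decoder, as opposed to a fixed constraint. The Gaussian window shape, the function names (`soft_window`, `expected_position`, `decode_step`) and the `width` and `step` parameters are all assumptions made for this example and are not taken from the paper.

```python
import math


def soft_window(attention, center, width):
    """Reweight attention with a Gaussian window around `center`, then renormalize.

    Assumption: a Gaussian window is used purely for illustration; the paper's
    actual reweighting function is not stated in the abstract.
    """
    weights = [
        a * math.exp(-0.5 * ((i - center) / width) ** 2)
        for i, a in enumerate(attention)
    ]
    total = sum(weights)
    if total == 0.0:
        # Nothing survives the window: fall back to the unmodified attention.
        return list(attention)
    return [w / total for w in weights]


def expected_position(attention):
    """Attention-weighted mean index over the encoder positions."""
    return sum(i * a for i, a in enumerate(attention))


def decode_step(raw_attention, prev_attention, width=2.0, step=1.0):
    """One hypothetical decoder step.

    The window center is derived from the previous step's attention, so the
    window follows the alignment instead of sitting at a fixed offset.
    """
    center = expected_position(prev_attention) + step
    return soft_window(raw_attention, center, width)


# Usage: previous attention sits on position 0, raw attention is uniform.
prev = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
raw = [1.0 / 6.0] * 6
reweighted = decode_step(raw, prev)
```

In this toy case the window is centered one position ahead of the previous alignment, so the reweighted distribution favors nearby encoder positions while distant ones are damped rather than masked out entirely.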