Thu-1-11-4 Bi-level Speaker Supervision for One-shot Speech Synthesis

Tao Wang (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Jianhua Tao (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Ruibo Fu (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Jiangyan Yi (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Zhengqi Wen (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences) and Chunyu Qiang (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences)
Abstract: The gap between the speaker characteristics of reference speech and synthesized speech remains a challenging problem in one-shot speech synthesis. In this paper, we propose a bi-level speaker supervision framework that closes this gap by supervising the synthesized speech at both the speaker feature level and the speaker identity level. Speaker feature extraction and speaker identity reconstruction are integrated into an end-to-end speech synthesis network: supervision at the speaker feature level narrows the gap in speaker characteristics, while supervision at the speaker identity level preserves identity information. This framework ensures that the synthesized speech has speaker characteristics similar to those of the original speech while keeping different speakers distinguishable. Additionally, to reduce the influence of speech content on speaker feature extraction, we propose a text-independent reference encoder (ti-reference encoder) module to extract speaker features. Experiments on the LibriTTS dataset show that our model can generate speech similar to that of the target speaker. Furthermore, we demonstrate that the model learns meaningful speaker representations through bi-level speaker supervision and the ti-reference encoder module.
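As a rough illustration of how two levels of speaker supervision could be combined into a single training objective, the sketch below adds a feature-level term to an identity-level term. The abstract does not specify the loss functions, so everything here is an assumption made for illustration: the cosine distance between reference and synthesized speaker embeddings, the cross-entropy over speaker logits, the weighting parameters, and all function names are hypothetical and are not taken from the paper.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def cross_entropy(logits, target_index):
    """Softmax cross-entropy for one example, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


def bi_level_speaker_loss(ref_embedding, syn_embedding,
                          speaker_logits, speaker_id,
                          feature_weight=1.0, identity_weight=1.0):
    """Hypothetical combined loss (assumed form, not the paper's).

    Feature level: pull the synthesized speech's speaker embedding
    toward the reference speech's embedding.
    Identity level: require the synthesized speech to be classified
    as the correct speaker.
    """
    feature_loss = cosine_distance(ref_embedding, syn_embedding)
    identity_loss = cross_entropy(speaker_logits, speaker_id)
    return feature_weight * feature_loss + identity_weight * identity_loss


# Toy values: two-dimensional embeddings and two candidate speakers.
matched = bi_level_speaker_loss([1.0, 0.0], [1.0, 0.0], [0.0, 0.0], 0)
mismatched = bi_level_speaker_loss([1.0, 0.0], [0.0, 1.0], [0.0, 0.0], 0)
```

In this toy setup the mismatched embeddings give a larger loss than the matched ones, which is the behavior a feature-level term is meant to provide; in a real system the embeddings and logits would come from learned encoder and classifier networks.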