Wed-3-10-4 Nonparallel Emotional Speech Conversion Using VAE-GAN

Yuexin Cao (Ping An Technology), Zheng-Chen Liu (University of Science and Technology of China), Minchuan Chen (Ping An Technology), Jun Ma (Ping An Technology), Shaojun Wang (Ping An Technology) and Jing Xiao (Ping An Technology)
Abstract: This paper proposes a nonparallel emotional speech conversion (ESC) method based on a Variational AutoEncoder-Generative Adversarial Network (VAE-GAN). Emotional speech conversion aims to transform speech from a source emotion to a target emotion without changing the speaker’s identity or linguistic content. In this work, an encoder is trained to extract content-related representations from acoustic features. Emotion-related representations are extracted in a supervised manner. The transformation between emotion-related representations from different domains is then learned using an improved cycle-consistent Generative Adversarial Network (CycleGAN). Finally, emotion conversion is performed by extracting and recombining the content-related representations of the source speech and the emotion-related representations of the target emotion. Subjective evaluation experiments show that the proposed method outperforms the baseline in terms of voice quality and emotion conversion ability.
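The two ideas the abstract leans on, a cycle-consistent mapping between emotion domains and the recombination of content and emotion representations, can be illustrated with a toy NumPy sketch. This is not the authors' model: the function names (`cycle_consistency_loss`, `convert`), the L1 form of the cycle loss, the concatenation used to recombine the codes, and the stand-in mappings `g_st`, `g_ts` and `decoder` are all assumptions made for illustration, and the abstract does not specify what the "improved" CycleGAN changes.

```python
import numpy as np


def cycle_consistency_loss(z_src, z_tgt, g_st, g_ts):
    """L1 cycle loss over emotion codes (assumed form, standard CycleGAN style).

    g_st maps source-emotion codes to the target domain, g_ts maps back.
    """
    # source -> target -> source should reproduce the source codes
    src_cycle = g_ts(g_st(z_src))
    # target -> source -> target should reproduce the target codes
    tgt_cycle = g_st(g_ts(z_tgt))
    return np.mean(np.abs(src_cycle - z_src)) + np.mean(np.abs(tgt_cycle - z_tgt))


def convert(content_code, emotion_code_src, g_st, decoder):
    """Recombine source content with an emotion code mapped to the target domain.

    Concatenation is a hypothetical choice; the abstract only says the two
    representations are recombined.
    """
    emotion_code_tgt = g_st(emotion_code_src)
    return decoder(np.concatenate([content_code, emotion_code_tgt], axis=-1))


# Toy stand-ins: an invertible affine map and its inverse, identity decoder.
g_st = lambda z: 2.0 * z + 1.0
g_ts = lambda z: (z - 1.0) / 2.0
decoder = lambda h: h

rng = np.random.default_rng(0)
z_src = rng.normal(size=(4, 2))    # emotion codes, source domain
z_tgt = rng.normal(size=(4, 2))    # emotion codes, target domain
content = rng.normal(size=(4, 3))  # content codes of the source speech

loss = cycle_consistency_loss(z_src, z_tgt, g_st, g_ts)
converted = convert(content, z_src, g_st, decoder)
```

Because the toy mappings are exact inverses, the cycle loss is numerically zero; in a trained system the two generators would be neural networks and the loss would only be driven toward zero alongside the adversarial terms.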