Wed-3-10-9 Simultaneous Conversion of Speaker Identity and Emotion Based on Multiple-Domain Adaptive RBM

Takuya Kishida (The University of Electro-Communications), Shin Tsukamoto (The University of Electro-Communications) and Toru Nakashika (The University of Electro-Communications)

Abstract: In this paper, we propose a multiple-domain adaptive restricted Boltzmann machine (MDARBM) for simultaneous conversion of speaker identity and emotion. This study is motivated by the assumption that explicitly representing multiple domains of speech (e.g., speaker identity, emotion, accent) in a single model helps reduce interference from the other domains while the model learns one domain’s characteristics. The MDARBM decomposes the visible-hidden connections of an RBM into domain-specific factors and a domain-independent factor, making it adaptable to multiple domains of speech. By switching the domain-specific factors from the source speaker and emotion to the target ones, the model can convert both simultaneously. Experimental results showed that, within the simultaneous conversion framework, conversion in one domain was improved by jointly modeling the other. In a two-domain conversion task, the MDARBM outperformed a combination of ARBMs independently trained with speaker-identity and emotion units.
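
The factor-switching idea in the abstract can be illustrated with a toy sketch. The abstract does not give the exact factorization, so everything below is an assumption made for illustration: the effective weights are taken to be a product of a speaker-specific matrix, an emotion-specific matrix, and a shared matrix; the visible units are treated as real-valued with unit variance; the parameters are random rather than trained; and the names ToyMultiDomainRBM, encode, decode and convert are hypothetical. It is not the authors' implementation.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def near_identity(n, rng, scale=0.1):
    """Identity matrix plus small noise, standing in for a learned domain factor."""
    m = random_matrix(n, n, rng, scale)
    for i in range(n):
        m[i][i] += 1.0
    return m


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ToyMultiDomainRBM:
    """Hypothetical sketch: W(speaker, emotion) = A_speaker * B_emotion * W_shared."""

    def __init__(self, n_visible, n_hidden, n_speakers, n_emotions, seed=0):
        rng = random.Random(seed)
        # Domain-independent factor (assumed shape: visible x hidden).
        self.shared = random_matrix(n_visible, n_hidden, rng)
        # Domain-specific factors, one per speaker and one per emotion (assumed form).
        self.speaker = [near_identity(n_visible, rng) for _ in range(n_speakers)]
        self.emotion = [near_identity(n_visible, rng) for _ in range(n_emotions)]
        self.visible_bias = [0.0] * n_visible
        self.hidden_bias = [0.0] * n_hidden

    def weights(self, speaker, emotion):
        return matmul(self.speaker[speaker], matmul(self.emotion[emotion], self.shared))

    def encode(self, v, speaker, emotion):
        """Hidden activation probabilities given visible features and source factors."""
        w = self.weights(speaker, emotion)
        return [
            sigmoid(c + sum(w[i][j] * v[i] for i in range(len(v))))
            for j, c in enumerate(self.hidden_bias)
        ]

    def decode(self, h, speaker, emotion):
        """Mean of the visible units given hidden activations and target factors."""
        w = self.weights(speaker, emotion)
        return [
            b + sum(w[i][j] * h[j] for j in range(len(h)))
            for i, b in enumerate(self.visible_bias)
        ]

    def convert(self, v, source, target):
        """Encode with the source (speaker, emotion) pair, decode with the target pair."""
        h = self.encode(v, *source)
        return self.decode(h, *target)


model = ToyMultiDomainRBM(n_visible=4, n_hidden=3, n_speakers=2, n_emotions=2)
features = [0.5, -0.2, 0.1, 0.3]  # made-up acoustic feature vector
converted = model.convert(features, source=(0, 0), target=(1, 1))
print(converted)
```

Here both the speaker index and the emotion index change in one pass, which is what "simultaneous" refers to; changing only one of the two indices in target would correspond to single-domain conversion under the same assumed factorization.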
