Mon-S&T-2-6 Real-time, Full-band, Online DNN-based Voice Conversion System Using A Single CPU

Takaaki Saeki (University of Tokyo), Yuki Saito (University of Tokyo), Shinnosuke Takamichi (University of Tokyo), and Hiroshi Saruwatari (University of Tokyo)
Abstract: We present a real-time, full-band, online voice conversion (VC) system that uses a single CPU. For practical applications, VC must be high quality and able to perform real-time, online conversion with few computational resources. Our system achieves this by combining non-linear conversion using a deep neural network with short-tap, sub-band filtering. We evaluate our system and demonstrate that it 1) achieves an estimated complexity of around 2.5 GFLOPS and a measured real-time factor (RTF) of around 0.5 on a single CPU and 2) attains converted speech with a mean opinion score (MOS) of 3.4 out of 5.0 for naturalness.
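To make the RTF figure concrete, here is a minimal sketch that assumes the common definition of real-time factor as processing time divided by the duration of the audio processed. The abstract does not spell out its measurement procedure, so the function name and the numbers below are hypothetical and purely illustrative.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (assumed RTF definition)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Hypothetical numbers, not taken from the paper:
# 5 seconds of computation to convert 10 seconds of speech.
rtf = real_time_factor(5.0, 10.0)

# Under this definition, a value below 1.0 means conversion finishes
# faster than the audio plays back.
keeps_up_with_input = rtf < 1.0
```

Under this definition, an RTF of around 0.5 means the system needs roughly half as much time to convert a stretch of speech as that speech lasts.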