Speech Synthesis: Text Processing, Data and Evaluation

Tue-1-7-6 Deep Learning Based Assessment of Synthetic Speech Naturalness

Gabriel Mittag (Technische Universität Berlin) and Sebastian Möller (Quality and Usability Lab, TU Berlin)
Abstract: In this paper, we present a new objective prediction model for synthetic speech naturalness. It can be used to evaluate text-to-speech (TTS) or voice conversion systems and works independently of the language. The model is trained end-to-end and is based on a CNN-LSTM network that has previously been shown to give good results for speech quality estimation. We trained and tested the model on 16 different datasets, including data from the Blizzard Challenge and the Voice Conversion Challenge. Further, we show that the reliability of deep learning-based naturalness prediction can be improved by transfer learning from speech quality prediction models that are trained on objective POLQA scores. The proposed model is publicly available and can, for example, be used to evaluate different TTS system configurations.