Yu Tsao (The Research Center for Information Technology Innovation (CITI), Academia Sinica), Fei Chen (Department of Electrical and Electronic Engineering, Southern University of Science and Technology)
Abstract:
Although recent work has demonstrated the effectiveness of deep-learning-based models for the speech enhancement (SE) task, several directions are worth exploring to further improve SE performance. One direction is to derive a better objective function to replace the conventional mean-squared-error-based one for training deep-learning-based models. In this tutorial, we first present several well-known intelligibility evaluation metrics and then present the theory and implementation details of SE systems trained with metric-based objective functions. The effectiveness of these objective functions is confirmed by improved standardized objective metric scores and subjective listening test scores, as well as higher automatic speech recognition accuracy.