Ahmet E. Bulut (Center for Robust Speech Systems, University of Texas at Dallas) and Kazuhito Koishida (Microsoft Corporation)
Reverberation of the speech signal, caused by reflections off physical obstacles, is one of the main difficulties in speech processing, alongside non-stationary background noise. In this study we explore single-channel speech dereverberation based on deep neural networks (DNNs) and compare its performance against state-of-the-art systems. We propose a convolutional neural network (CNN) auto-encoder architecture with skip connections, aimed at real-time, low-latency applications. The proposed system is evaluated on the REVERB challenge dataset, which includes both simulated and real reverberant speech samples. Our experimental results show that the proposed system outperforms, on the challenge evaluation dataset, a baseline system that uses a DNN-based weighted prediction error (WPE) algorithm. We also extend the comparison to state-of-the-art systems in terms of cepstral distance (CD), log-likelihood ratio (LLR), speech-to-reverberation modulation energy ratio (SRMR), and frequency-weighted segmental signal-to-noise ratio (FWSegSNR), and achieve comparable results. Finally, we perform a latency analysis of the proposed system and examine the trade-off between processing time and performance.