Maximilian Strake (Technische Universität Braunschweig, Institute for Communications Technology), Bruno Defraene (Goodix Technology (Belgium) BV), Kristoff Fluyt (Goodix Technology (Belgium) BV), Wouter Tirry (Goodix Technology (Belgium) BV), and Tim Fingscheidt (Technische Universität Braunschweig, Institute for Communications Technology)
The Interspeech 2020 Deep Noise Suppression (DNS) Challenge focuses on evaluating low-latency single-channel speech enhancement algorithms under realistic test conditions. Our contribution to the challenge is a method for joint dereverberation and denoising based on complex spectral mask estimation using a fully convolutional recurrent network (FCRN), which relies on a convolutional LSTM layer for temporal modeling. Since the effects of reverberation and noise on perceived speech quality can differ notably, we propose a multi-target loss that controls the relative weight placed on dereverberation and denoising. In the crowdsourced subjective P.808 listening test conducted by the DNS Challenge organizers, the proposed method shows a significant overall improvement of 0.43 MOS points over the DNS Challenge baseline and ranks among the top three submissions in both the real-time and non-real-time tracks of the challenge.
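As a rough illustration of the two ideas named in the abstract, the sketch below applies a complex spectral mask to a noisy spectrum and combines two error terms into a weighted multi-target loss. The abstract does not give the actual loss formulation, so everything here is an assumption made for illustration: the function names `apply_complex_mask` and `multi_target_loss`, the weight `alpha`, the choice of mean squared error, and the choice of the two targets (a dry clean spectrum and a reverberant but noise-free spectrum) are hypothetical and may differ from the authors' method.

```python
import numpy as np


def apply_complex_mask(noisy_spec, mask):
    """Apply a complex-valued mask to a complex noisy spectrum.

    Element-wise complex multiplication changes both magnitude and phase,
    which is what distinguishes a complex mask from a real-valued one.
    """
    return mask * noisy_spec


def multi_target_loss(est_spec, dry_target, reverberant_target, alpha=0.5):
    """Hypothetical weighted sum of two mean squared errors.

    alpha = 1.0 trains only toward the dry (dereverberated and denoised)
    target; alpha = 0.0 trains only toward the reverberant, noise-free
    target. This weighting scheme is an assumption, not the paper's loss.
    """
    loss_dry = np.mean(np.abs(est_spec - dry_target) ** 2)
    loss_rev = np.mean(np.abs(est_spec - reverberant_target) ** 2)
    return alpha * loss_dry + (1.0 - alpha) * loss_rev


# Toy example: 2 frames x 3 frequency bins of complex spectra.
noisy = np.ones((2, 3), dtype=np.complex128)
mask = np.full((2, 3), 0.5 + 0.5j)
estimate = apply_complex_mask(noisy, mask)

dry = estimate.copy()          # estimate matches the dry target exactly
reverberant = estimate + 1.0   # estimate is off by 1 in every bin
loss = multi_target_loss(estimate, dry, reverberant, alpha=0.25)
```

In a sketch like this, moving `alpha` toward 1 puts more weight on removing reverberation as well as noise, while moving it toward 0 emphasizes denoising alone; in practice the mask would be predicted by the network rather than fixed.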