Ashutosh Pandey (Department of Computer Science and Engineering, The Ohio State University) and DeLiang Wang (Ohio State University)
Abstract:
It was recently revealed that deep learning based speech enhancement systems do not generalize to untrained corpora in low signal-to-noise ratio (SNR) conditions, mainly due to the channel mismatch between trained and untrained corpora. In this study, we investigate techniques to improve the cross-corpus generalization of complex spectrogram enhancement. First, we propose a long short-term memory (LSTM) network for complex spectral mapping. Evaluated on untrained noises and corpora, the proposed network substantially outperforms a state-of-the-art gated convolutional recurrent network (GCRN). Next, we examine the importance of the training corpus for cross-corpus generalization. We find that a training corpus containing utterances with different channels can significantly improve performance on untrained corpora. Finally, we observe that using a smaller frame shift in the short-time Fourier transform (STFT) is a simple but highly effective technique for improving cross-corpus generalization.
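
To make the role of the frame shift concrete, the following is a minimal sketch of how a complex STFT splits into the real and imaginary parts used in complex spectral mapping, and how a smaller frame shift increases the number of frames per utterance. The 16 kHz sampling rate, 320-sample frame, Hann window, the two frame shifts (160 and 64 samples), and the helper name `complex_stft` are all illustrative assumptions; they are not values or code taken from this abstract.

```python
import numpy as np


def complex_stft(signal, frame_len, frame_shift):
    """Hypothetical helper: return (real, imag) STFT parts, each (frames, bins)."""
    window = np.hanning(frame_len)  # assumed window; the abstract does not specify one
    # Number of full frames that fit in the signal (no padding).
    n_frames = 1 + (len(signal) - frame_len) // frame_shift
    frames = np.stack([
        signal[i * frame_shift: i * frame_shift + frame_len] * window
        for i in range(n_frames)
    ])
    # One-sided FFT of each windowed frame: frame_len // 2 + 1 frequency bins.
    spec = np.fft.rfft(frames, axis=1)
    return spec.real, spec.imag


sample_rate = 16000  # assumed sampling rate
frame_len = 320      # assumed 20 ms frame
signal = np.random.default_rng(0).standard_normal(sample_rate)  # 1 s of noise

for frame_shift in (160, 64):  # assumed shifts: 10 ms and 4 ms
    real, imag = complex_stft(signal, frame_len, frame_shift)
    overlap = 1 - frame_shift / frame_len
    print(f"shift={frame_shift}: {real.shape[0]} frames, "
          f"{real.shape[1]} bins, overlap={overlap:.0%}")
```

Under these assumptions, the smaller shift yields more frames per second and greater overlap between adjacent frames, so a sequence model such as an LSTM processes a longer, more finely sampled input for the same utterance.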