Mon-1-8-2 Double Adversarial Network based Monaural Speech Enhancement for Robust Speech Recognition

Zhihao Du (Harbin Institute of Technology), Jiqing Han (Harbin Institute of Technology) and Xueliang Zhang (Inner Mongolia University)

Abstract: To improve the noise robustness of automatic speech recognition (ASR), generative adversarial network (GAN) based enhancement methods are employed as front-end processing. These methods comprise a single adversarial process between an enhancement model and a discriminator. In this single adversarial process, the discriminator is encouraged to find differences between enhanced and clean speech, but the distribution of clean speech is ignored. In this paper, we propose a double adversarial network (DAN) that adds another adversarial generation process (AGP), which forces the discriminator not only to find the differences but also to model the distribution. Furthermore, a functional mean square error (f-MSE) is proposed to utilize the representations learned by the discriminator. Experimental results reveal that AGP and f-MSE, both absent from previous GAN-based methods, are crucial for enhancement performance on the ASR task. Specifically, our DAN achieves a 13.00% relative word error rate improvement over noisy speech on the CHiME-2 test set, significantly outperforming several recent GAN-based enhancement methods.
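The abstract does not define f-MSE or give absolute error rates, so the following is only a rough sketch under stated assumptions. It assumes f-MSE can be read as a mean square error computed on discriminator representations of the enhanced and clean signals rather than on the signals themselves. The names `functional_mse`, `toy_features`, and `relative_wer_improvement` are hypothetical, `toy_features` is a stand-in for a discriminator hidden layer, and the WER values in the usage lines are illustrative numbers, not results from the paper.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def mse(a: Vector, b: Vector) -> float:
    """Plain mean square error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def functional_mse(features: Callable[[Vector], Vector],
                   enhanced: Vector, clean: Vector) -> float:
    """Assumed reading of f-MSE: MSE between feature representations.

    `features` stands in for the discriminator's learned representation;
    the paper's actual formulation may differ.
    """
    return mse(features(enhanced), features(clean))


def toy_features(signal: Vector) -> List[float]:
    """Hypothetical feature map: sums of adjacent sample pairs."""
    return [signal[i] + signal[i + 1] for i in range(len(signal) - 1)]


def relative_wer_improvement(wer_noisy: float, wer_enhanced: float) -> float:
    """Relative WER improvement: (noisy - enhanced) / noisy."""
    if wer_noisy <= 0:
        raise ValueError("baseline WER must be positive")
    return (wer_noisy - wer_enhanced) / wer_noisy


# Illustrative usage with made-up values.
clean = [0.0, 1.0, 0.0, -1.0]
enhanced = [0.1, 0.9, 0.1, -0.9]
loss = functional_mse(toy_features, enhanced, clean)
gain = relative_wer_improvement(20.0, 17.4)
```

With these made-up error rates, `gain` works out to roughly 0.13, which is how a figure such as the reported 13.00% relative improvement is computed from a baseline and an enhanced word error rate.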

