Wed-3-7-3 Self-supervised Adversarial Multi-task Learning for Vocoder-based Monaural Speech Enhancement

Zhihao Du (Harbin Institute of Technology), Ming Lei (Machine Intelligence Technology, Alibaba Group), Jiqing Han (Harbin Institute of Technology) and Shiliang Zhang (Machine Intelligence Technology, Alibaba Group)

Abstract: In our previous study, we introduced a neural vocoder into monaural speech enhancement, in which a flow-based generative vocoder synthesizes speech waveforms from the Mel power spectra enhanced by a denoising autoencoder. This vocoder-based enhancement method outperforms several state-of-the-art models on a speaker-dependent dataset. However, we find a large gap between its enhancement performance on trained and untrained noises. Therefore, in this paper, we propose self-supervised adversarial multi-task learning (SAMLE) to improve the noise generalization ability. In addition, we evaluate the speaker dependence of vocoder-based methods, which is important for real-life applications. Experimental results show that the proposed SAMLE further improves the enhancement performance on both trained and untrained noises, resulting in better noise generalization. Moreover, we find that vocoder-based enhancement methods can be speaker-independent through large-scale training.

