Wed-2-4-2 On Synthesis for Supervised Monaural Speech Separation in Time Domain

Jingjing Chen (School of Computer Science and Communication Engineering, Jiangsu University), Qirong Mao (School of Computer Science and Communication Engineering, Jiangsu University; Jiangsu Key Laboratory of Security Tech. for Industrial Cyberspace) and Dong Liu (School of Computer Science and Communication Engineering, Jiangsu University)
Abstract: Time-domain approaches to speech separation have recently achieved great success. However, the sources they separate usually contain artifacts (broadband noise), especially when the mixture itself is noisy. In this paper, we incorporate a synthesis approach into time-domain speech separation to deal with this broadband noise in the separated sources; it can be used seamlessly in a speech separation system in a 'plug-and-play' way. By directly learning an estimate of each source in the encoded domain, the synthesis approach reduces artifacts in the estimated speech and improves separation performance. Extensive experiments on different state-of-the-art models show that the synthesis approach is able to handle noisy mixtures and is better suited to noisy speech separation. On a new benchmark noisy dataset, the synthesis approach obtains a 0.97 dB (10.1% relative) SDR improvement, with gains on various other metrics, at no extra computation cost.
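The difference between estimating a mask and directly estimating each source in the encoded domain can be illustrated with a small sketch. This is only one reading of the abstract, not the authors' implementation: the encoder, decoder, and separator are random matrices standing in for learned networks, the sigmoid masking baseline is an assumption about what the synthesis approach replaces, and all names and sizes (`N`, `L`, `C`, `enc`, `dec`, `sep`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: N basis filters, frame length L, C sources, T frames.
N, L, C, T = 16, 8, 2, 10
frames = rng.standard_normal((T, L))   # framed mixture waveform (toy data)

# Random stand-ins for learned components (assumption, not the paper's models).
enc = rng.standard_normal((L, N))      # encoder: waveform frame -> encoded domain
dec = rng.standard_normal((N, L))      # decoder: encoded domain -> waveform frame
sep = rng.standard_normal((N, C * N))  # separator: one output vector per source

w = np.maximum(frames @ enc, 0.0)      # encoded mixture, shape (T, N), non-negative
h = (w @ sep).reshape(T, C, N)         # raw separator output, shape (T, C, N)

# Masking (assumed baseline): squash the output to [0, 1] and scale the
# encoded mixture, so each estimate is bounded by the noisy mixture itself.
masks = 1.0 / (1.0 + np.exp(-h))
masked = masks * w[:, None, :]

# Synthesis (as read from the abstract): treat the separator output as the
# encoded-domain estimate of each source, with no multiplication by the mixture.
synth = h

est_mask = masked @ dec                # decoded frames, shape (T, C, L)
est_synth = synth @ dec                # decoded frames, shape (T, C, L)
```

Both paths feed the same decoder and produce outputs of the same shape, which is consistent with the abstract's 'plug-and-play' description; only the way the encoded-domain estimate is formed differs.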