Home
About

About the Conference Welcome from the Chair Conference Committees Area Chairs Organizers ISCA
Calls

Papers Surveys Satellite Workshops Tutorials Show & Tell Special Sessions & Challenges Areas & Topics Important Dates
Authors

Author Resources Submission Policy ISCA Ethics Paper Submission Presentation Guidelines
Program

Program at a Glance Technical Program Presentation Videos Presentation Guidelines Keynotes Satellite Workshops Tutorials Special Sessions & Challenges Show & Tell
Student Information

Student Events Travel Grants
Venue & Travel

Conference Venue & Accommodations Transportations Visa About Shanghai
Registration

Registration Overview & Fees ISCA Membership ISCA Code of Conduct Online Registration
Sponsorships & Exhibition

Sponsors Virtual Booth Satellite Events Acknowledgement
Contact

Contact Us

Targeted Source Separation

Monday, October 26, 21:45-22:45 (GMT+8)

Mon-3-11-10 Speaker-Aware Monaural Speech Separation

Jiahao Xu (The University of Sydney), Kun Hu (The University of Sydney), Chang Xu (The University of Sydney), Duc Chung Tran (Computing Fundamental Department, FPT University) and Zhiyong Wang (The University of Sydney)

Abstract: Predicting time-frequency (T-F) masks and applying them to mixture signals has been used successfully for speech separation. However, existing studies have not made good use of a speaker's identity context when inferring the masks. In this paper, we propose a novel speaker-aware monaural speech separation model. We first devise an encoder that disentangles speaker identity information, supervised by an auxiliary speaker verification task. We then develop a spectrogram masking network that predicts a mask for each speaker; the masks are applied to the mixture signal to reconstruct the source signals. Experimental results on two mixture datasets derived from WSJ0 show that the proposed model outperforms existing models in different separation scenarios.
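The masking step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' model: the STFT settings (`N_FFT`, `HOP`), the sqrt-Hann window, the helper names (`stft`, `istft`, `oracle_ratio_masks`, `apply_masks`) and the choice to reuse the mixture phase are all assumptions made for illustration. Oracle ideal ratio masks, computed from the known clean sources, stand in for the masks that the paper's speaker-aware network would predict. The sketch only shows how per-speaker T-F masks are multiplied with the mixture spectrogram and inverted back to source signals.

```python
import numpy as np

# Hypothetical settings: the abstract does not give STFT parameters.
N_FFT, HOP = 256, 128  # this sketch assumes N_FFT == 2 * HOP (50% overlap)


def _window(n_fft=N_FFT):
    # Periodic sqrt-Hann: analysis x synthesis = Hann, which sums to 1 at 50% overlap.
    return np.sqrt(np.hanning(n_fft + 1)[:-1])


def stft(x, n_fft=N_FFT, hop=HOP):
    """Return the complex STFT of x with shape (frames, n_fft // 2 + 1)."""
    n_frames = -(-len(x) // hop) + 1  # ceil(len(x) / hop) + 1 frames
    padded = np.zeros((n_frames + 1) * hop)
    padded[hop:hop + len(x)] = x  # leading zeros so every sample lies in two frames
    frames = np.stack([padded[k * hop:k * hop + n_fft] for k in range(n_frames)])
    return np.fft.rfft(frames * _window(n_fft), axis=1)


def istft(spec, length, n_fft=N_FFT, hop=HOP):
    """Invert stft() by windowed overlap-add and trim back to `length` samples."""
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * _window(n_fft)
    out = np.zeros((spec.shape[0] + 1) * hop)
    for k, frame in enumerate(frames):
        out[k * hop:k * hop + n_fft] += frame
    return out[hop:hop + length]


def oracle_ratio_masks(source_specs, eps=1e-8):
    """Ideal ratio masks |S_i| / sum_j |S_j|: a stand-in for network-predicted masks."""
    mags = np.abs(np.stack(source_specs))
    return mags / (mags.sum(axis=0) + eps)


def apply_masks(mix_spec, masks, length):
    """Multiply each T-F mask with the mixture STFT (keeping mixture phase) and invert."""
    return [istft(mask * mix_spec, length) for mask in masks]


# Usage: two synthetic "speakers" (pure tones) mixed together at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
s1, s2 = np.sin(2 * np.pi * 500 * t), 0.5 * np.sin(2 * np.pi * 2000 * t)
mix = s1 + s2
masks = oracle_ratio_masks([stft(s1), stft(s2)])
est1, est2 = apply_masks(stft(mix), masks, len(mix))
print(np.corrcoef(est1, s1)[0, 1], np.corrcoef(est2, s2)[0, 1])
```

Because the ratio masks sum to (almost) one in every T-F bin, the separated estimates add back up to the mixture. At inference time the clean sources are unavailable, so a learned masking network, conditioned on the speaker identity information from the encoder as the abstract describes, would take the place of `oracle_ratio_masks`.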
