Wed-3-2-9 Detecting and Counting Overlapping Speakers in Distant Speech Scenarios

Samuele Cornell (Università Politecnica delle Marche), Maurizio Omologo (Fondazione Bruno Kessler - irst), Stefano Squartini (Università Politecnica delle Marche) and Emmanuel Vincent (Inria)
Abstract: We consider the problem of detecting speech activity and counting overlapping speakers in distant-microphone recordings. We treat supervised Voice Activity Detection (VAD), Overlapped Speech Detection (OSD), joint VAD+OSD, and speaker counting as instances of a general Overlapped Speech Detection and Counting (OSDC) task, and we design a Temporal Convolutional Network (TCN) based method to address it. We show that TCNs significantly outperform state-of-the-art methods on two real-world distant speech datasets. In particular, for OSD, our best architecture obtains 29.1% and 25.5% absolute improvement in Average Precision over previous techniques on the AMI and CHiME-6 datasets, respectively. Furthermore, we find that generalization for joint VAD+OSD improves when a speaker counting objective is used rather than a VAD+OSD objective. We also study the effectiveness of forced-alignment-based labeling and data augmentation, and show that both can improve OSD performance.
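To illustrate how VAD, OSD, joint VAD+OSD, and speaker counting can be viewed as instances of one task, the sketch below derives all four label sequences from a single per-frame speaker count. It is only an illustration under assumptions: the function name `count_to_labels`, the `max_count` cap, and the class encoding (0 = silence, 1 = one speaker, 2 = overlap) are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def count_to_labels(
    counts: List[int], max_count: int = 2
) -> Tuple[List[int], List[int], List[int], List[int]]:
    """Derive per-frame labels for four tasks from per-frame speaker counts.

    Assumed (hypothetical) encodings:
      VAD:      1 if at least one speaker is active, else 0
      OSD:      1 if at least two speakers are active, else 0
      VAD+OSD:  0 = silence, 1 = single speaker, 2 = overlap
      counting: speaker count clipped to max_count
    """
    vad = [int(c >= 1) for c in counts]
    osd = [int(c >= 2) for c in counts]
    joint = [min(c, 2) for c in counts]
    counting = [min(c, max_count) for c in counts]
    return vad, osd, joint, counting


# Toy per-frame speaker counts (made up for illustration, not real data).
counts = [0, 1, 1, 2, 3, 1, 0]
vad, osd, joint, counting = count_to_labels(counts, max_count=3)
```

In this encoding the VAD, OSD, and joint labels are all obtained by thresholding or clipping the count, which is one way to see why a counting objective can also serve the joint VAD+OSD task.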