Wed-3-2-4 Audio-Visual Multi-Speaker Tracking Based On the GLMB Framework

Xinyuan Qian (National University of Singapore) and Shoufeng Lin (National University of Singapore)
Abstract: Multi-speaker tracking using both audio and video modalities is a key task in human-robot interaction and video conferencing. Because audio and video signals are complementary, fusing them makes tracking more robust to noise and outliers than uni-modal approaches. However, online tracking of multiple speakers via audio-visual fusion remains an open challenge, especially when the number of targets is not known in advance. In this paper, we propose a Generalized Labeled Multi-Bernoulli (GLMB)-based framework that jointly estimates the number of targets and their respective states online. Experimental results on the AV16.3 dataset demonstrate the effectiveness of the proposed method.