Home
About

About the Conference Welcome from the Chair Conference Committees Area Chairs Organizers ISCA
Calls

Papers Surveys Satellite Workshops Tutorials Show & Tell Special Sessions & Challenges Areas & Topics Important Dates
Authors

Author Resources Submission Policy ISCA Ethics Paper Submission Presentation Guidelines
Program

Program at a Glance Technical Program Presentation Videos Presentation Guidelines Keynotes Satellite Workshops Tutorials Special Sessions & Challenges Show & Tell
Student Information

Student Events Travel Grants
Venue & Travel

Conference Venue & Accommodations Transportations Visa About Shanghai
Registration

Registration Overview & Fees ISCA Membership ISCA Code of Conduct Online Registration
Sponsorships & Exhibition

Sponsors Virtual Booth Satellite Events Acknowledgement
Contact

Contact Us

Program

Program at a Glance

Technical Program

Presentation Videos

Presentation Guidelines

Satellite Workshops

Special Sessions & Challenges

Speaker Diarization

Position: Home > Program > Technical Program > Monday 19:15-20:15(GMT+8), October 26 > Speaker Diarization >

Mon-1-7-7 Spot the conversation: speaker diarisation in the wild

Joon Son Chung(University of Oxford), Jaesung Huh(Naver Corporation), Arsha Nagrani(University of Oxford), Triantafyllos Afouras(University of Oxford) and Andrew Zisserman(University of Oxford)

Abstract: The goal of this paper is speaker diarisation of videos collected `in the wild'. We make three key contributions. First, we propose an automatic audio-visual diarisation method for YouTube videos. Our method consists of active speaker detection using audio-visual methods and speaker verification using self-enrolled speaker models. Second, we integrate our method into a semi-automatic dataset creation pipeline which significantly reduces the number of hours required to annotate videos with diarisation labels. Finally, we use this pipeline to create a large-scale diarisation dataset called VoxConverse, collected from `in the wild' videos, which we will release publicly to the research community. Our dataset consists of overlapping speech, a large and diverse speaker pool, and challenging background conditions.

Paper

prev Mon-1-7-6 Deep Self-Supervised Hierarchical Clustering for Speaker Diarization

next No More

About

About the Conference

Welcome from the Chair

Conference Committees

Calls

Satellite Workshops

Special Sessions & Challenges

Important Dates

Program

Program at a Glance

Technical Program

Presentation Videos

Presentation Guidelines

Satellite Workshops

Special Sessions & Challenges

Student Information

Venue & Travel

Conference Venue & Accommodations

Transportations

Sponsorships & Exhibition

Satellite Events

Acknowledgement