Neural Models for Speaker Diarization in the Context of Speech Recognition

Kyu J. Han (ASAPP Inc.), Tae Jin Park (University of Southern California), Dimitrios Dimitriadis (Microsoft, WA)

Abstract: Speaker diarization is an essential component of speech applications in multi-speaker settings. Spoken utterances need to be attributed to speaker-specific classes, with or without prior knowledge of the speakers' identities or profiles. Initially, speaker diarization technologies were developed as standalone processes that required little context from the other components of a given speech application. As speech recognition technology has become more accessible, there is an emerging trend of treating speaker diarization as an integral part of an overall speech recognition application, one that benefits from the speech recognition output to improve speaker diarization accuracy. More recently, joint model training for speaker diarization and speech recognition has been investigated in an attempt to consolidate the training objectives and enhance overall performance. In this tutorial, we will give an overview of the development of speaker diarization in the era of deep learning, present recent approaches to speaker diarization in the context of speech recognition, and share industry perspectives on speaker diarization and its challenges. Finally, we will provide insights into future directions for speaker diarization as part of context-aware interactive systems.
