Wed-1-7-8 Open-set Short Utterance Forensic Speaker Verification using Teacher-Student Network with Explicit Inductive Bias

Mufan Sang (University of Texas at Dallas), Wei Xia (University of Texas at Dallas), and John H.L. Hansen (Univ. of Texas at Dallas; CRSS - Center for Robust Speech Systems)

Abstract: In forensic applications, often only small naturalistic datasets are available, consisting of short utterances recorded in complex or unknown acoustic environments. In this study, we propose a pipeline solution to improve speaker verification on a small, actual forensic field dataset. By leveraging large-scale out-of-domain datasets, we propose a knowledge-distillation-based objective function for teacher-student learning and apply it to short-utterance forensic speaker verification. The objective function jointly considers speaker classification loss, Kullback-Leibler divergence, and similarity of embeddings. To make the trained deep speaker embedding network robust on a small target dataset, we introduce a novel strategy that fine-tunes the pre-trained student model towards the forensic target domain, using the pre-trained model both as the fine-tuning starting point and as a reference in regularization. The proposed approaches are evaluated on the 1st 48-UTD forensic corpus, a newly established naturalistic dataset from actual homicide investigations consisting of short utterances recorded in uncontrolled conditions. We show that the proposed objective function efficiently improves the performance of teacher-student learning on short utterances, and that our fine-tuning strategy outperforms the commonly used weight decay method by providing an explicit inductive bias towards the pre-trained model.
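As a rough illustration of the two ideas in the abstract, the sketch below combines a classification loss, a Kullback-Leibler term, and an embedding-similarity term into one objective, and contrasts plain weight decay with a penalty that pulls the weights towards a pre-trained reference. This is not the authors' implementation: the abstract does not give the exact formulation, so the function names, the weights `alpha` and `beta`, the `temperature` parameter, the use of cosine distance for embedding similarity, and the squared L2 form of the reference penalty are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Speaker classification loss for one utterance."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cosine_distance(a, b):
    """1 - cosine similarity (assumed similarity measure for embeddings)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def distillation_loss(student_logits, teacher_logits, student_emb, teacher_emb,
                      label, alpha=1.0, beta=1.0, temperature=1.0):
    """Hypothetical combined objective: CE + alpha * KL + beta * embedding distance."""
    ce = cross_entropy(student_logits, label)
    kl = kl_divergence(softmax(teacher_logits, temperature),
                       softmax(student_logits, temperature))
    sim = cosine_distance(student_emb, teacher_emb)
    return ce + alpha * kl + beta * sim


def weight_decay_penalty(weights, lam):
    """Standard weight decay: pulls the weights towards zero."""
    return 0.5 * lam * sum(w * w for w in weights)


def reference_penalty(weights, pretrained, lam):
    """Assumed explicit inductive bias: pulls the weights towards the pre-trained values."""
    return 0.5 * lam * sum((w - w0) ** 2 for w, w0 in zip(weights, pretrained))
```

Under these assumptions, the reference penalty is zero when the fine-tuned weights equal the pre-trained ones, whereas weight decay is zero only when the weights themselves are zero, which is the sense in which the former encodes a bias towards the pre-trained model.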


