Wed-2-12-3 Segment-level Effects of Gender, Nationality and Emotion Information on Text-independent Speaker Verification

Kai Li (Japan Advanced Institute of Science and Technology), Masato Akagi (Japan Advanced Institute of Science and Technology), Yibo Wu (Tianjin University) and Jianwu Dang (JAIST)

Abstract: Speaker embeddings extracted from neural networks (NNs) achieve excellent performance on general speaker verification (SV) tasks. However, most current SV systems are trained with speaker labels only, so interactions between different types of domain information reduce SV prediction accuracy. To overcome this weakness and improve SV performance, four SV systems are proposed that use gender, nationality, and emotion information to add constraints in the NN training stage. Specifically, three multitask learning (MTL)-based systems, namely multitask gender (MTG), multitask nationality (MTN), and multitask gender and nationality (MTGN), are used to enhance the learning of gender and nationality information. A domain adversarial training (DAT)-based system, emotion domain adversarial training (EDAT), is used to suppress the learning of emotion information. Experimental results indicate that encouraging the learning of gender and nationality information and suppressing the learning of emotion information both improve SV performance. Finally, the proposed systems achieved relative improvements in equal error rate (EER) of 16.4% and 22.9% for the MTL- and DAT-based systems, respectively.
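A hedged sketch of the two training strategies, written as the weighted objectives that MTL and DAT commonly use. Here $\theta_f$ denotes a shared embedding network and $\theta_s$, $\theta_g$, $\theta_n$, $\theta_e$ the speaker, gender, nationality, and emotion classifier heads; these symbols, the loss names, and the weights $\lambda_g$, $\lambda_n$, $\lambda_e$ are illustrative assumptions, since the abstract does not give the exact formulation or weight values. MTGN adds both auxiliary losses (MTG would keep only the gender term and MTN only the nationality term). EDAT follows the standard domain-adversarial min-max game: the emotion head minimizes its loss while the shared network is pushed to maximize it, which a gradient reversal layer typically implements by acting as the identity in the forward pass and multiplying gradients by $-\lambda_e$ in the backward pass.

```latex
% MTL (e.g. MTGN): auxiliary tasks encourage gender and nationality information
\mathcal{L}_{\text{MTGN}} = \mathcal{L}_{\text{spk}}(\theta_f, \theta_s)
  + \lambda_g \, \mathcal{L}_{\text{gen}}(\theta_f, \theta_g)
  + \lambda_n \, \mathcal{L}_{\text{nat}}(\theta_f, \theta_n)

% DAT (e.g. EDAT): adversarial emotion branch suppresses emotion information
E(\theta_f, \theta_s, \theta_e) = \mathcal{L}_{\text{spk}}(\theta_f, \theta_s)
  - \lambda_e \, \mathcal{L}_{\text{emo}}(\theta_f, \theta_e),
\qquad
(\hat{\theta}_f, \hat{\theta}_s) = \arg\min_{\theta_f, \theta_s} E,
\qquad
\hat{\theta}_e = \arg\max_{\theta_e} E
```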
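The results are reported as relative improvements in equal error rate (EER), the operating point where the false acceptance and false rejection rates are equal. As a minimal sketch, assuming a simple sweep over observed scores (not necessarily the scoring tool the authors used), EER and relative EER reduction can be computed as below. The function names `compute_eer` and `relative_improvement` and the toy scores are hypothetical and are not the paper's data.

```python
from typing import Sequence


def compute_eer(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate the equal error rate by sweeping candidate thresholds.

    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    A trial is accepted when its score >= threshold.
    """
    targets = [s for s, y in zip(scores, labels) if y == 1]
    nontargets = [s for s, y in zip(scores, labels) if y == 0]
    if not targets or not nontargets:
        raise ValueError("need at least one target and one non-target trial")

    best_gap, best_eer = float("inf"), 1.0
    # Candidate thresholds: every observed score, plus one above the maximum
    # so the "reject everything" operating point is also considered.
    candidates = sorted(set(scores)) + [max(scores) + 1.0]
    for thr in candidates:
        far = sum(s >= thr for s in nontargets) / len(nontargets)  # false acceptance rate
        frr = sum(s < thr for s in targets) / len(targets)  # false rejection rate
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where FAR and FRR are closest; report their mean.
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


def relative_improvement(baseline_eer: float, new_eer: float) -> float:
    """Relative EER reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


if __name__ == "__main__":
    # Hypothetical toy trials: four target and four non-target scores.
    toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
    toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
    print(compute_eer(toy_scores, toy_labels))  # 0.25
    print(relative_improvement(5.0, 4.0))  # 20.0
```

For example, a hypothetical drop in EER from 5.0% to 4.0% is a 20% relative improvement; the 16.4% and 22.9% figures in the abstract are expressed in this relative sense, not as absolute EER differences.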
