Mon-1-11-3 On the Usage of Multi-feature Integration for Speaker Verification and Language Identification

Zheng Li (Xiamen University), Miao Zhao (Xiamen University), Jing Li (Xiamen University), Lin Li (Xiamen University) and Qingyang Hong (Xiamen University)

Abstract: In this paper, we study the technology of multiple acoustic feature integration for the applications of Automatic Speaker Verification (ASV) and Language Identification (LID). In contrast to score level fusion, a common method for integrating subsystems built upon various acoustic features, we explore a new integration strategy, which integrates multiple acoustic features based on the x-vector framework. The frame level, statistics pooling level, segment level, and embedding level integrations are investigated in this study. Our results indicate that frame level integration of multiple acoustic features achieves the best performance in both speaker and language recognition tasks, and the multi-feature integration strategy can be generalized in both classification tasks. Furthermore, we introduce a time-restricted attention mechanism into the frame level integration structure to further improve the performance of multi-feature integration. The experiments are conducted on VoxCeleb 1 for ASV and AP-OLR-17 for LID, and we achieve 28% and 19% relative improvement in terms of Equal Error Rate (EER) in ASV and LID tasks, respectively.
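
To make the difference between two of the integration points named in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes two hypothetical acoustic feature streams (`feat_a`, `feat_b`) computed over the same frames; the feature dimensions, function names, and the use of plain concatenation are all assumptions for illustration. In an actual x-vector network, learned layers sit between the input and the pooling layer, which is what makes the two strategies behave differently; here they carry the same values in a different order.

```python
import numpy as np

def stats_pooling(frames):
    """Mean and standard deviation over the time axis: (T, D) -> (2*D,)."""
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

def frame_level_integration(feat_a, feat_b):
    """Concatenate two streams frame by frame: (T, Da) + (T, Db) -> (T, Da+Db)."""
    if feat_a.shape[0] != feat_b.shape[0]:
        raise ValueError("both streams must have the same number of frames")
    return np.concatenate([feat_a, feat_b], axis=1)

def pooling_level_integration(feat_a, feat_b):
    """Pool each stream separately, then concatenate the resulting statistics."""
    return np.concatenate([stats_pooling(feat_a), stats_pooling(feat_b)])

# Hypothetical inputs: 200 frames, 23-dim and 40-dim features (assumed sizes).
rng = np.random.default_rng(0)
feat_a = rng.normal(size=(200, 23))
feat_b = rng.normal(size=(200, 40))

fused_frames = frame_level_integration(feat_a, feat_b)
frame_level_stats = stats_pooling(fused_frames)
pooling_level_stats = pooling_level_integration(feat_a, feat_b)

print(fused_frames.shape)         # (200, 63)
print(frame_level_stats.shape)    # (126,)
print(pooling_level_stats.shape)  # (126,)
```

With frame level integration, every subsequent layer of the network sees both feature types jointly at each frame, whereas pooling level integration only combines them after each stream has been summarized over time.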

