Mon-SS-2-6-1 Improving X-vector and PLDA for Text-dependent Speaker Verification

Zhuxin Chen (NetEase Games AI Lab) and Yue Lin (NetEase Games AI Lab)

Abstract: Recently, the pipeline consisting of an x-vector speaker embedding front-end and a Probabilistic Linear Discriminant Analysis (PLDA) back-end has achieved state-of-the-art results in text-independent speaker verification. In this paper, we further improve the performance of x-vector and PLDA based systems for text-dependent speaker verification by exploring the choice of layer used to produce the embedding and by modifying the back-end training strategies. In particular, we show that x-vector based embeddings, specifically the standard deviation statistics in the pooling layer, contain information related to both speaker characteristics and spoken content. Accordingly, we modify the back-end training labels to use both the speaker-id and the phrase-id. A correlation-alignment-based PLDA adaptation is also adopted to make use of text-independent labeled data during back-end training. Experimental results on the SDSVC 2020 dataset show that our proposed methods achieve significant performance improvements over the x-vector and HMM-based i-vector baselines.
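The two back-end ideas mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `joint_labels` and `coral_align` are hypothetical, the regularizer `eps` is an assumed value, and the alignment shown is the generic feature-level correlation alignment (whiten with the source covariance, re-color with the target covariance), which may differ from the PLDA adaptation variant used in the paper.

```python
import numpy as np

def joint_labels(speaker_ids, phrase_ids):
    """Map each (speaker, phrase) pair to one integer class label (hypothetical helper)."""
    pairs = list(zip(speaker_ids, phrase_ids))
    index = {pair: i for i, pair in enumerate(sorted(set(pairs)))}
    return np.array([index[p] for p in pairs])

def _sym_power(mat, power):
    """Power of a symmetric positive-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * vals ** power) @ vecs.T

def coral_align(source, target, eps=1e-3):
    """Generic correlation alignment: make source embeddings match the
    mean and covariance of target embeddings. eps is an assumed regularizer."""
    dim = source.shape[1]
    cov_s = np.cov(source, rowvar=False) + eps * np.eye(dim)
    cov_t = np.cov(target, rowvar=False) + eps * np.eye(dim)
    centered = source - source.mean(axis=0)
    # Whiten with the source covariance, then re-color with the target covariance.
    aligned = centered @ _sym_power(cov_s, -0.5) @ _sym_power(cov_t, 0.5)
    return aligned + target.mean(axis=0)
```

In this sketch, `source` would hold embeddings from the text-independent labeled data and `target` embeddings from the text-dependent domain; the aligned embeddings, together with their labels, could then be pooled into back-end training.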
