Jiawen Kang (Tsinghua University), Ruiqi Liu (China University of Mining & Technology, Beijing), Lantian Li (Tsinghua University), Dong Wang (Tsinghua University) and Thomas Fang Zheng (CSLT, Tsinghua University)
Domain generalization remains a critical problem for speaker recognition, even with state-of-the-art architectures based on deep neural networks. For example, a model trained on read speech may largely fail when applied to singing or movie scenarios. In this paper, we propose a domain-invariant projection to improve the generalizability of speaker vectors. This projection is a simple neural net trained following the Model-Agnostic Meta-Learning (MAML) principle, where the objective is to classify speakers in one domain after the projection has been updated with speech data from another domain. We tested the proposed method on CNCeleb, a new dataset consisting of single-speaker multi-condition (SSMC) data. The results demonstrated that the MAML-based domain-invariant projection can produce more generalizable speaker vectors and effectively improve performance in unseen domains.
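The MAML objective described above (update on one domain, then classify speakers in another) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a linear projection `P` followed by a linear softmax speaker classifier `C`, synthetic speaker vectors with a per-domain offset, and the first-order MAML approximation (FOMAML), which drops the second-order terms of full MAML. All function names (`loss_and_grads`, `fomaml_step`, `make_domain`) and hyperparameters are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss_and_grads(params, x, y):
    """Cross-entropy of a (hypothetical) linear projection P followed by a linear speaker classifier C."""
    P, C = params["P"], params["C"]
    h = x @ P                      # projected speaker vectors, shape (n, d_proj)
    logits = h @ C                 # speaker logits, shape (n, n_spk)
    p = softmax(logits)
    n = x.shape[0]
    loss = -np.mean(np.log(p[np.arange(n), y] + 1e-12))
    d_logits = p.copy()
    d_logits[np.arange(n), y] -= 1.0
    d_logits /= n                  # gradient of the mean loss w.r.t. the logits
    grad_C = h.T @ d_logits
    grad_P = x.T @ (d_logits @ C.T)
    return loss, {"P": grad_P, "C": grad_C}

def fomaml_step(params, domain_a, domain_b, inner_lr=0.01, outer_lr=0.01):
    """One first-order MAML step: adapt on domain A, evaluate and update on domain B."""
    xa, ya = domain_a
    xb, yb = domain_b
    # Inner update: adapt a copy of the parameters with domain-A data.
    _, grads_a = loss_and_grads(params, xa, ya)
    adapted = {k: params[k] - inner_lr * grads_a[k] for k in params}
    # Outer objective: classify domain-B speakers with the adapted parameters.
    loss_b, grads_b = loss_and_grads(adapted, xb, yb)
    # First-order approximation: apply the domain-B gradient (taken at the
    # adapted point) directly to the original parameters.
    new_params = {k: params[k] - outer_lr * grads_b[k] for k in params}
    return new_params, loss_b

# Synthetic setup (assumption): the same speakers observed in two domains,
# each domain adding its own constant offset to every speaker vector.
rng = np.random.default_rng(0)
n_spk, d_in, d_proj = 5, 16, 8
centers = rng.normal(size=(n_spk, d_in))

def make_domain(shift, n_per=20):
    y = np.repeat(np.arange(n_spk), n_per)
    x = centers[y] + 0.3 * rng.normal(size=(len(y), d_in)) + shift
    return x, y

dom_a = make_domain(0.5 * rng.normal(size=d_in))
dom_b = make_domain(0.5 * rng.normal(size=d_in))
params = {
    "P": rng.normal(size=(d_in, d_proj)) / np.sqrt(d_in),
    "C": rng.normal(size=(d_proj, n_spk)) / np.sqrt(d_proj),
}

initial_loss, _ = loss_and_grads(params, *dom_b)
for step in range(300):
    # Alternate which domain plays the inner (adaptation) and outer (evaluation) role.
    if step % 2 == 0:
        params, _ = fomaml_step(params, dom_a, dom_b)
    else:
        params, _ = fomaml_step(params, dom_b, dom_a)
final_loss, _ = loss_and_grads(params, *dom_b)
print(f"domain-B loss: {initial_loss:.3f} -> {final_loss:.3f}")
```

Alternating the roles of the two domains means the shared parameters are only rewarded when an update computed on one domain also helps speaker classification on the other, which is the intuition behind a domain-invariant projection. Full MAML would differentiate through the inner update; an autodiff framework is the usual way to obtain those second-order terms.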