Dao Zhou(Tianjin University), Longbiao Wang(Tianjin University), Kong Aik Lee(Biometrics Research Laboratories, NEC Corporation), Yibo Wu(Tianjin University), Meng Liu(Tianjin University), Jianwu Dang(JAIST) and Jianguo Wei(Tianjin University)
We propose a dynamic-margin softmax loss for the training of deep speaker embedding neural network. Our proposal is inspired by the additive-margin softmax (AM-Softmax) loss reported earlier. In AM-Softmax loss, a constant margin is used for all training samples. However, the angle between the feature vector and the ground-truth class center is rarely the same for all samples. Furthermore, the angle also changes during training. Thus, it is more reasonable to set a dynamic margin for each training sample. In this paper, we propose to dynamically set the margin of each training sample commensurate with the cosine angle of that sample, hence, the name dynamic-additive-margin softmax (DAM-Softmax) loss. More specifically, the smaller the cosine angle is, the larger the margin between the training sample and the corresponding class in the feature space should be to promote intra-class compactness. Experimental results show that the proposed DAM-Softmax loss achieves state-of-the-art performance on the VoxCeleb dataset by 1.94% in equal error rate (EER). In addition, our method also outperforms AM-Softmax loss when evaluated on the Speakers in the Wild (SITW) corpus.