Wed-SS-2-3-3 The DKU Speech Activity Detection and Speaker Identification Systems for Fearless Steps Challenge Phase-02

Qingjian Lin (SEIT, Sun Yat-sen University), Tingle Li (Duke Kunshan University) and Ming Li (Duke Kunshan University)
Abstract: This paper describes the systems developed by the DKU team for the Fearless Steps Challenge Phase-02 competition. For the Speech Activity Detection task, we start with a Long Short-Term Memory (LSTM) system and then apply the ResNet-LSTM improvement. Our ResNet-LSTM system reduces the DCF by about 38% relative to the LSTM baseline. We also discuss system performance when additional training corpora are included; the lowest DCF on the Eval Set, 1.406%, is obtained with system pre-training. For the Speaker Identification task, we employ the Deep ResNet vector system, which receives a variable-length feature sequence and directly generates speaker posteriors. Pre-training with VoxCeleb is also considered, and our best-performing system achieves a Top-5 accuracy of 92.393% on the Eval Set.