Mon-1-11-5 Releasing a toolkit and comparing the performance of language embeddings across various spoken language identification datasets

Matias Lindgren (Aalto University), Tommi Jauhiainen (University of Helsinki) and Mikko Kurimo (Aalto University)

Abstract: In this paper, we propose a software toolkit for easier end-to-end training of deep-learning-based spoken language identification models across several speech datasets. We apply our toolkit to implement three baseline models, one speaker recognition model, and three x-vector architecture variations, which are trained on three datasets previously used in spoken language identification experiments. All models are trained separately on each dataset (closed task) and on a combination of all datasets (open task), after which we compare whether open task training yields better language embeddings. We begin by training all models end-to-end as discriminative classifiers of spectral features, labeled by language. Then, we extract language embedding vectors from the trained end-to-end models, train separate Gaussian Naive Bayes classifiers on the vectors, and compare which model provides the best language embeddings for the back-end classifier. Our experiments show that the open task condition leads to improved language identification performance on only one of the datasets. In addition, we discovered that increasing x-vector model robustness with random frequency channel dropout significantly reduces its end-to-end classification performance on the test set, while not affecting the back-end classification performance of its embeddings. Finally, we note that two baseline models consistently outperformed all other models.
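
The back-end step described in the abstract (fitting a Gaussian Naive Bayes classifier on language embedding vectors) can be illustrated with a minimal sketch. This is not the toolkit's own code: the function names, the variance floor, the two-dimensional toy embeddings, and the language labels "fin" and "swe" are all assumptions made for illustration. Real embeddings would come from a trained end-to-end model and have far more dimensions.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(embeddings, labels, var_floor=1e-6):
    """Estimate a log prior and per-dimension mean and variance for each language."""
    by_lang = defaultdict(list)
    for vec, lang in zip(embeddings, labels):
        by_lang[lang].append(vec)
    total = len(embeddings)
    model = {}
    for lang, vecs in by_lang.items():
        n = len(vecs)
        dims = range(len(vecs[0]))
        means = [sum(v[d] for v in vecs) / n for d in dims]
        # var_floor (an assumed value) keeps variances strictly positive
        variances = [
            sum((v[d] - means[d]) ** 2 for v in vecs) / n + var_floor
            for d in dims
        ]
        model[lang] = (math.log(n / total), means, variances)
    return model

def log_scores(model, vec):
    """Return the unnormalized log posterior of each language for one embedding."""
    scores = {}
    for lang, (log_prior, means, variances) in model.items():
        score = log_prior
        for x, m, v in zip(vec, means, variances):
            # Log density of an independent univariate Gaussian per dimension
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        scores[lang] = score
    return scores

def predict(model, vec):
    """Pick the language with the highest log posterior."""
    scores = log_scores(model, vec)
    return max(scores, key=scores.get)

# Hypothetical toy embeddings standing in for vectors extracted from a trained model
train_embeddings = [
    [0.9, 0.1], [1.1, -0.1], [1.0, 0.2],
    [-1.0, 0.9], [-0.8, 1.1], [-1.2, 1.0],
]
train_labels = ["fin", "fin", "fin", "swe", "swe", "swe"]

model = fit_gaussian_nb(train_embeddings, train_labels)
print(predict(model, [0.95, 0.0]))
print(predict(model, [-0.9, 1.0]))
```

Because the back-end treats every embedding dimension as an independent Gaussian, it has no hyperparameters beyond the variance floor. That makes it a convenient fixed yardstick for comparing embeddings from different front-end models.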

