The INTERSPEECH 2020 Computational Paralinguistics ChallengE (ComParE)

Wed-SS-1-4-7 Ensembling End-to-End Deep Models for Computational Paralinguistics Tasks: ComParE 2020 Mask and Breathing Sub-Challenges

Maxim Markitantov (St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences), Denis Dresvyanskiy (Ulm University), Danila Mamontov (Ulm University), Heysem Kaya (Department of Information and Computing Sciences, Utrecht University), Wolfgang Minker (Ulm University) and Alexey Karpov (St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences)
Abstract: This paper describes deep learning approaches for the Mask and Breathing Sub-Challenges (SCs) of the INTERSPEECH 2020 Computational Paralinguistics Challenge. Motivated by the outstanding performance of state-of-the-art end-to-end (E2E) approaches, we explore and compare the effectiveness of different deep Convolutional Neural Network (CNN) architectures on raw data, log-Mel spectrograms, and Mel-Frequency Cepstral Coefficients. We apply a transfer learning approach to improve model efficiency and convergence speed. In the Mask SC, we conduct experiments with several pretrained CNN architectures on log-Mel spectrograms, as well as Support Vector Machines on baseline features. For the Breathing SC, we propose an ensemble deep learning system that exploits E2E learning and sequence prediction. The E2E model is based on a 1D CNN operating on raw speech signals, coupled with Long Short-Term Memory layers for sequence modeling. The second model works with log-Mel features and is based on a pretrained 2D CNN stacked with Gated Recurrent Unit layers. To increase the performance of our models in both SCs, we use ensembles of the best deep neural models obtained from N-fold cross-validation on the combined challenge training and development datasets. Our results markedly outperform the challenge test set baselines in both SCs.
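The abstract does not give implementation details of the ensembling step; a minimal sketch of mean ensembling over the per-fold models from N-fold cross-validation might look as follows. The function name and the shape of the prediction arrays are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def ensemble_predictions(fold_preds):
    """Average the outputs of N fold models (simple mean ensembling).

    fold_preds: list of arrays, one per cross-validation fold model,
    each of shape (n_samples, n_outputs).
    Returns the averaged predictions of shape (n_samples, n_outputs).
    """
    # Stack to (n_folds, n_samples, n_outputs), then average over folds
    stacked = np.stack(fold_preds, axis=0)
    return stacked.mean(axis=0)

# Hypothetical example: four fold models, three samples, one output each
preds = [np.full((3, 1), v) for v in (0.2, 0.4, 0.6, 0.8)]
avg = ensemble_predictions(preds)  # every entry is the mean, 0.5
```

More elaborate schemes (e.g. weighting folds by development-set score) follow the same pattern; the unweighted mean is the simplest baseline.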