Tue-1-1-3 Investigating Self-supervised Pre-training for End-to-end Speech Translation

Ha Nguyen (LIG - Grenoble Alpes University, LIA - Avignon University), Fethi Bougares (LIUM - Le Mans Université), Natalia Tomashenko (LIA, University of Avignon), Yannick Estève (LIA - Avignon University) and Laurent Besacier (LIG)

Abstract: Self-supervised learning from raw speech has proven beneficial for improving automatic speech recognition (ASR). We investigate here its impact on end-to-end automatic speech translation (AST) performance. We use a contrastive predictive coding (CPC) model pre-trained on unlabeled speech as a feature extractor for a downstream AST task. We show that self-supervised pre-training is particularly efficient in low-resource settings and that fine-tuning CPC models on the AST training data further improves performance. Even in higher-resource settings, ensembling AST models trained with filter-bank and CPC representations leads to near state-of-the-art models without using any ASR pre-training. This might be particularly beneficial when one needs to develop a system that translates from speech in a language with poorly standardized orthography or even from speech in an unwritten language.

