Wed-1-10-9 Now you’re speaking my language: Visual language identification

Triantafyllos Afouras (University of Oxford), Joon Son Chung (University of Oxford) and Andrew Zisserman (University of Oxford)
Abstract: The goal of this work is to train models that can identify a spoken language just by interpreting the speaker's lip movements. To this end, we collect a large-scale multilingual audio-visual speech dataset with language labels, starting from TEDx talks downloaded from YouTube. Our contributions are the following: (i) we show that models can learn to discriminate among 14 different languages using only visual speech information; (ii) we compare different designs for sequence modelling and utterance-level aggregation in order to determine the best architecture for this task; (iii) we investigate the factors that contribute discriminative cues and show that our model indeed solves the problem by finding temporal patterns in mouth movements, not by exploiting spurious correlations. We demonstrate this further by evaluating our models on challenging examples from bilingual speakers.
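The pipeline the abstract describes (per-frame visual features, a sequence-modelling stage, utterance-level aggregation, and a 14-way language classifier) can be sketched as follows. This is an illustrative toy in NumPy, not the paper's architecture: the moving-average "sequence model", mean pooling, random classifier weights, and the names `temporal_model`, `aggregate`, and `predict_language` are all assumptions for demonstration.

```python
import numpy as np

NUM_LANGUAGES = 14  # number of languages discriminated in the paper
FEAT_DIM = 512      # hypothetical per-frame visual feature size

rng = np.random.default_rng(0)

def temporal_model(frames):
    """Toy sequence model: a 1-D moving average over time, standing in
    for the learned sequence-modelling stage compared in the paper."""
    k = 3
    padded = np.pad(frames, ((k // 2, k // 2), (0, 0)), mode="edge")
    return np.stack([padded[t:t + k].mean(axis=0)
                     for t in range(frames.shape[0])])

def aggregate(features):
    """Utterance-level aggregation: mean pooling over the time axis."""
    return features.mean(axis=0)

# Hypothetical classifier weights; a trained model would learn these.
W = rng.standard_normal((FEAT_DIM, NUM_LANGUAGES)) * 0.01

def predict_language(lip_frames):
    """lip_frames: (T, FEAT_DIM) array of per-frame visual speech features.
    Returns the index of the predicted language."""
    pooled = aggregate(temporal_model(lip_frames))
    logits = pooled @ W
    return int(np.argmax(logits))

frames = rng.standard_normal((75, FEAT_DIM))  # ~3 s of video at 25 fps
lang_id = predict_language(frames)
```

In the paper the interesting design choices are precisely the two stages stubbed out here: which sequence model to use over the frame features and how to aggregate them into a single utterance-level representation before classification.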