Speech Translation and Multilingual/Multimodal Learning

Tue-1-1-6 Evaluating and Optimizing Prosodic Alignment for Automatic Dubbing

Marcello Federico (Amazon AI), Yogesh Virkar (Amazon), Robert Enyedi (Amazon) and Roberto Barra-Chicote (Amazon)
Abstract: Automatic dubbing aims at replacing all speech contained in a video with speech in a different language, so that the result sounds and looks as natural as the original. Hence, in addition to conveying the same content as the original utterance (which is the typical objective of speech translation), dubbed speech should ideally also match its duration, the lip movements and gestures in the video, the timbre, emotion and prosody of the speaker, and finally the background noise and reverberation of the environment. In this paper, after describing our dubbing architecture, we focus on recent progress on the prosodic alignment component, which aims at synchronizing the translated transcript with the original utterances. We present empirical results for English-to-Italian dubbing on a publicly available collection of TED Talks. Our new prosodic alignment model, which allows for small relaxations in synchronicity, is shown to significantly improve both prosodic alignment accuracy and overall subjective dubbing quality over previous work.