Mon-2-5-9 Joint prediction of punctuation and disfluency in speech transcripts

Binghuai Lin (Tencent Technology Co., Ltd) and Liyuan Wang (Tencent Technology Co., Ltd)
Abstract: Spoken language transcripts generated by automatic speech recognition (ASR) often contain a large proportion of disfluencies and lack punctuation symbols. Punctuation restoration and disfluency removal can facilitate downstream tasks such as machine translation, information extraction and syntactic analysis [1]. Various studies have shown the mutual influence of these two tasks and have therefore modeled them in a multi-task learning (MTL) framework [2, 3], which learns general representations in shared layers and separate representations in task-specific layers. However, task dependencies are normally ignored in the task-specific layers. To model these dependencies, we propose an attention-based structure in the task-specific layers of the MTL framework, incorporating pretrained BERT (a state-of-the-art NLP model) [4]. Experimental results on the English IWSLT dataset and the Switchboard dataset show that the proposed architecture outperforms both separate modeling methods and traditional MTL methods.
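The architecture described above can be sketched as follows. This is a minimal, hypothetical PyTorch illustration, not the authors' implementation: a toy transformer encoder stands in for pretrained BERT as the shared layers, each task gets its own projection, and a cross-task multi-head attention layer lets each task-specific head attend to the other task's representation, which is one plausible reading of "modeling task dependencies in the task-specific layers". All dimensions, label counts and layer choices here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class JointPunctDisfluencyModel(nn.Module):
    """Hypothetical MTL sketch: a shared encoder (stand-in for BERT)
    feeds two task-specific heads, and cross-task attention lets each
    head attend to the other task's features."""

    def __init__(self, vocab_size=1000, d_model=64, n_punct=4, n_disfl=2):
        super().__init__()
        # Shared layers (pretrained BERT would replace this toy encoder)
        self.embed = nn.Embedding(vocab_size, d_model)
        self.shared = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=1)
        # Task-specific projections
        self.punct_proj = nn.Linear(d_model, d_model)
        self.disfl_proj = nn.Linear(d_model, d_model)
        # Cross-task attention: queries from one task, keys/values from the other
        self.punct_attn = nn.MultiheadAttention(d_model, 4, batch_first=True)
        self.disfl_attn = nn.MultiheadAttention(d_model, 4, batch_first=True)
        # Per-token classifiers (punctuation labels / fluent-vs-disfluent)
        self.punct_out = nn.Linear(d_model, n_punct)
        self.disfl_out = nn.Linear(d_model, n_disfl)

    def forward(self, tokens):
        h = self.shared(self.embed(tokens))   # shared representation
        hp = self.punct_proj(h)               # punctuation-specific features
        hd = self.disfl_proj(h)               # disfluency-specific features
        # Model task dependencies via cross-task attention
        hp_ctx, _ = self.punct_attn(hp, hd, hd)
        hd_ctx, _ = self.disfl_attn(hd, hp, hp)
        # Residual combination, then per-token label logits for each task
        return self.punct_out(hp + hp_ctx), self.disfl_out(hd + hd_ctx)
```

Given a batch of token IDs of shape `(batch, seq_len)`, the model returns two per-token logit tensors, one per task, so both objectives can be optimized jointly with a summed cross-entropy loss.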