Mon-2-5-9 Joint prediction of punctuation and disfluency in speech transcripts

Binghuai Lin (Tencent Technology Co., Ltd) and Liyuan Wang (Tencent Technology Co., Ltd)
Abstract: Spoken language transcripts generated by automatic speech recognition (ASR) often contain a large proportion of disfluencies and lack punctuation symbols. Punctuation restoration and disfluency removal can facilitate downstream tasks such as machine translation, information extraction and syntactic analysis [1]. Various studies have shown the mutual influence of these two tasks and have therefore modeled them in a multi-task learning (MTL) framework [2, 3], which learns general representations in shared layers and separate representations in task-specific layers. However, task dependencies are normally ignored in the task-specific layers. To model these dependencies, we propose an attention-based structure in the task-specific layers of the MTL framework, incorporating pretrained BERT (a state-of-the-art NLP model) [4]. Experimental results on the English IWSLT dataset and the Switchboard dataset show that the proposed architecture outperforms both separate modeling methods and traditional MTL methods.
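The architecture described above can be sketched as follows. This is a minimal, hypothetical PyTorch illustration, not the authors' implementation: a toy transformer encoder stands in for pretrained BERT as the shared layers, each task gets its own projection, and a cross-task multi-head attention layer lets each task-specific head attend to the other task's representation, which is one plausible reading of "modeling task dependencies in the task-specific layers". All dimensions, label counts and layer choices here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class JointPunctDisfluencyModel(nn.Module):
    """Hypothetical MTL sketch: a shared encoder (stand-in for BERT)
    feeds two task-specific heads, and cross-task attention lets each
    head attend to the other task's features."""

    def __init__(self, vocab_size=1000, d_model=64, n_punct=4, n_disfl=2):
        super().__init__()
        # Shared layers (pretrained BERT would replace this toy encoder)
        self.embed = nn.Embedding(vocab_size, d_model)
        self.shared = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=1)
        # Task-specific projections
        self.punct_proj = nn.Linear(d_model, d_model)
        self.disfl_proj = nn.Linear(d_model, d_model)
        # Cross-task attention: queries from one task, keys/values from the other
        self.punct_attn = nn.MultiheadAttention(d_model, 4, batch_first=True)
        self.disfl_attn = nn.MultiheadAttention(d_model, 4, batch_first=True)
        # Per-token classifiers (punctuation labels / fluent-vs-disfluent)
        self.punct_out = nn.Linear(d_model, n_punct)
        self.disfl_out = nn.Linear(d_model, n_disfl)

    def forward(self, tokens):
        h = self.shared(self.embed(tokens))   # shared representation
        hp = self.punct_proj(h)               # punctuation-specific features
        hd = self.disfl_proj(h)               # disfluency-specific features
        # Model task dependencies via cross-task attention
        hp_ctx, _ = self.punct_attn(hp, hd, hd)
        hd_ctx, _ = self.disfl_attn(hd, hp, hp)
        # Residual combination, then per-token label logits for each task
        return self.punct_out(hp + hp_ctx), self.disfl_out(hd + hd_ctx)
```

Given a batch of token IDs of shape `(batch, seq_len)`, the model returns two per-token logit tensors, one per task, so both objectives can be optimized jointly with a summed cross-entropy loss.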