Thu-3-8-2 Punctuation Prediction in Spontaneous Conversations: Can We Mitigate ASR Errors with Retrofitted Word Embeddings?

Lukasz Augustyniak (Wroclaw University of Science and Technology), Piotr Szymański (Avaya Inc. / Wrocław University of Technology), Mikolaj Morzy (Poznan University of Technology), Piotr Żelasko (Johns Hopkins University), Adrian Szymczak (AVAYA), Jan Mizgajski (AVAYA), Yishay Carmiel (AVAYA) and Najim Dehak (Johns Hopkins University)
Abstract: Automatic Speech Recognition (ASR) systems introduce word errors, which often confuse punctuation prediction models and make punctuation restoration a challenging task. These errors usually take the form of homophones (words that share an exact or almost exact pronunciation but differ in meaning) and oronyms (homophones that consist of multiple words). We show how retrofitting word embeddings on domain-specific data can mitigate ASR errors. Our main contribution is a method for better alignment of homophone embeddings, and the validation of this method on the punctuation prediction task. We record an absolute improvement in punctuation prediction accuracy of between 6.2% (for question marks) and 9% (for periods) compared with the state-of-the-art model.
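The abstract does not spell out the alignment procedure, so the following is only a minimal sketch of generic graph-based retrofitting (in the style of Faruqui et al., 2015), not the authors' method. It assumes a hypothetical homophone lexicon, `neighbours`, and toy two-dimensional vectors. Each linked word is pulled toward the average of its homophones while staying anchored to its original vector.

```python
def retrofit(embeddings, neighbours, iterations=10, alpha=1.0):
    """Generic retrofitting sketch; NOT the method from the paper.

    embeddings: dict mapping word -> list of floats (left unmodified).
    neighbours: dict mapping word -> list of linked words
                (here: a hypothetical homophone lexicon).
    alpha:      weight that anchors a word to its original vector.
    """
    original = {w: list(v) for w, v in embeddings.items()}
    fitted = {w: list(v) for w, v in embeddings.items()}
    for _ in range(iterations):
        for word, linked in neighbours.items():
            linked = [n for n in linked if n in fitted and n != word]
            if word not in fitted or not linked:
                continue
            beta = 1.0 / len(linked)  # neighbour weights sum to 1
            fitted[word] = [
                (alpha * original[word][d]
                 + sum(beta * fitted[n][d] for n in linked)) / (alpha + 1.0)
                for d in range(len(original[word]))
            ]
    return fitted


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


# Toy vectors and a toy homophone pair (illustrative assumptions only).
embeddings = {
    "their": [1.0, 0.0],
    "there": [0.0, 1.0],
    "cat": [-1.0, -1.0],
}
neighbours = {"their": ["there"], "there": ["their"]}

fitted = retrofit(embeddings, neighbours)
before = sq_dist(embeddings["their"], embeddings["there"])
after = sq_dist(fitted["their"], fitted["there"])
```

In this toy setup the homophone pair ends up closer together (`after` is smaller than `before`), while the unlinked word `cat` keeps its original vector. A downstream punctuation model would then see more similar inputs when the ASR confuses one homophone for the other.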