Thu-2-8-5 Utterance confidence measure for end-to-end speech recognition with applications to on-device-server hybrid ASR

Ankur Kumar (Samsung Research India Bangalore), Dhananjaya Gowda (Samsung Research), Sachin Singh (SRIB), Abhinav Garg (Samsung Research), Shatrughan Singh (SRIB) and Chanwoo Kim (Samsung Research)

Abstract: In this paper, we present techniques to compute a confidence score for the predictions made by an end-to-end speech recognition model. Our proposed neural confidence measure (NCM) is trained as a binary classifier that accepts or rejects an end-to-end speech recognition result. We incorporate features from the encoder, the decoder, and the attention block of the attention-based end-to-end speech recognition model, which improves the NCM significantly. We observe that using information from multiple beams further improves performance. As a case study of this NCM, we consider an application of the utterance-level confidence score in a distributed speech recognition environment with two or more speech recognition systems running on different platforms with varying resource capabilities. We show that around 57% of the computation on a resource-rich high-end platform (e.g., a cloud platform) can be saved without sacrificing accuracy compared to the high-end-only solution. Around 70-80% of the computation can be saved if we allow a relative word error rate degradation of 5-10% with respect to the high-end solution.
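The trade-off described in the abstract can be illustrated with a minimal sketch of confidence-based routing: an utterance stays on the low-resource system when its confidence score clears a threshold, and is sent to the high-end system otherwise. This is not the authors' implementation. The `Utterance` fields, the `route` function, the threshold value, and the toy numbers are all hypothetical, and the fraction of utterances kept on device is used only as a rough proxy for the computation saved on the high-end platform.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Utterance:
    """Toy record for one utterance (all fields are illustrative assumptions)."""
    confidence: float     # hypothetical NCM score in [0, 1] for the on-device result
    ondevice_errors: int  # word errors in the on-device hypothesis
    server_errors: int    # word errors in the high-end (server) hypothesis
    num_words: int        # number of words in the reference transcript


def route(utterances: List[Utterance], threshold: float) -> Tuple[float, float]:
    """Accept the on-device result when confidence >= threshold, else use the server.

    Returns (fraction of utterances kept on device, overall word error rate).
    """
    kept = 0
    errors = 0
    words = 0
    for u in utterances:
        if u.confidence >= threshold:
            kept += 1
            errors += u.ondevice_errors  # on-device result accepted
        else:
            errors += u.server_errors    # rejected: fall back to the server result
        words += u.num_words
    return kept / len(utterances), errors / words


# Made-up data, chosen only to show the shape of the trade-off.
utts = [
    Utterance(0.95, 0, 0, 10),
    Utterance(0.80, 1, 1, 10),
    Utterance(0.40, 4, 1, 10),
    Utterance(0.10, 6, 2, 10),
]

for threshold in (1.1, 0.5, 0.0):  # server-only, hybrid, on-device-only
    kept_fraction, wer = route(utts, threshold)
    print(f"threshold={threshold:.1f} kept_on_device={kept_fraction:.2f} wer={wer:.3f}")
```

In this toy data, a threshold of 0.5 keeps half of the utterances on device at the same word error rate as the server-only setting, while lowering the threshold further keeps more utterances on device at the cost of a higher word error rate.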


