Ankur Kumar (Samsung Research India Bangalore), Dhananjaya Gowda (Samsung Research), Sachin Singh (SRIB), Abhinav Garg (Samsung Research), Shatrughan Singh (SRIB), and Chanwoo Kim (Samsung Research)
In this paper, we present techniques to compute a confidence score for the predictions made by an end-to-end speech recognition model. Our proposed neural confidence measure (NCM) is trained as a binary classifier that accepts or rejects an end-to-end speech recognition result. We incorporate features from the encoder, the decoder, and the attention block of an attention-based end-to-end speech recognition model, which improves the NCM significantly. We observe that using information from multiple beams further improves performance. As a case study, we apply the utterance-level confidence score in a distributed speech recognition environment in which two or more speech recognition systems run on different platforms with varying resource capabilities. We show that around 57% of the computation on a resource-rich high-end platform (e.g., a cloud platform) can be saved without sacrificing accuracy compared to the high-end-only solution. Around 70-80% of the computation can be saved if we allow a relative word error rate degradation of 5-10% with respect to the high-end solution.
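The distributed use case can be illustrated with a minimal sketch of confidence-based routing. It assumes a two-tier setup in which a low-resource recognizer returns a transcript together with an utterance-level confidence score, and only low-confidence utterances are forwarded to the high-end recognizer. The names `Hypothesis`, `route`, `on_device`, `server`, and `threshold` are hypothetical and chosen for illustration; the fraction of utterances kept on the low-resource side is used here only as a rough proxy for computation saved on the high-end platform.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Hypothesis:
    """Recognition result from the low-resource system (hypothetical type)."""
    text: str
    confidence: float  # utterance-level score in [0, 1], e.g. an NCM-style output


def route(
    utterances: List[str],
    on_device: Callable[[str], Hypothesis],
    server: Callable[[str], str],
    threshold: float,
) -> Tuple[List[str], float]:
    """Accept the low-resource result when its confidence clears the threshold,
    otherwise fall back to the high-end recognizer.

    Returns the final transcripts and the fraction of utterances that never
    reached the high-end system (an assumed proxy for computation saved).
    """
    results: List[str] = []
    accepted = 0
    for utt in utterances:
        hyp = on_device(utt)
        if hyp.confidence >= threshold:
            # Accept: the high-end recognizer is not invoked for this utterance.
            results.append(hyp.text)
            accepted += 1
        else:
            # Reject: re-decode on the high-end recognizer.
            results.append(server(utt))
    saved = accepted / len(utterances) if utterances else 0.0
    return results, saved
```

In such a setup, raising `threshold` sends more utterances to the high-end recognizer, trading saved computation for accuracy; lowering it does the opposite.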