Evaluation of Speech Technology Systems and Methods for Resource Construction and Annotation

Mon-2-3-6 Confidence measures in encoder-decoder models for speech recognition

Alejandro Woodward (Universitat Politècnica de Catalunya), Clara Bonnín (Vilynx), David Varas (Vilynx), Issey Masuda (Vilynx), Elisenda Bou-Balust (Vilynx) and Juan Carlos Riveiro (Vilynx)
Abstract: Recent improvements in Automatic Speech Recognition (ASR) systems have enabled the growth of myriad applications such as voice assistants, intent detection, keyword extraction and sentiment analysis. These applications, now widely used in industry, are very sensitive to the errors generated by ASR systems. This sensitivity could be mitigated by attaching a reliable confidence measure to the predicted output. This work presents a novel method that uses internal neural features of a frozen ASR model to train an independent neural network to predict a softmax temperature value. This value is computed at each decoder time step and multiplied by the logits in order to redistribute the output probabilities. The resulting softmax values corresponding to the predicted tokens constitute a more reliable confidence measure. This work also studies the effect of teacher forcing on the training of the proposed temperature prediction module. The output confidence estimation shows an improvement of -25.78% in EER and +7.59% in AUC-ROC with respect to the unaltered softmax values of the predicted tokens, evaluated on a proprietary dataset consisting of News and Entertainment videos.
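The rescaling step described in the abstract can be illustrated with a minimal sketch. It assumes the per-step value is a positive scalar multiplied into the logits before the softmax, as the abstract states. In the paper this value is predicted by a separate network from internal features of the frozen ASR model; here the values are hand-picked placeholders, and the function names `softmax` and `token_confidences` are hypothetical, not taken from the authors' code.

```python
import math


def softmax(logits, scale=1.0):
    """Numerically stable softmax of scale * logits."""
    scaled = [scale * x for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def token_confidences(step_logits, step_scales):
    """Return (predicted token index, rescaled confidence) per decoder step.

    step_logits: one list of logits per decoder time step.
    step_scales: one positive scalar per step (placeholder for the value
                 that the temperature prediction module would output).
    """
    results = []
    for logits, scale in zip(step_logits, step_scales):
        # A positive scale does not change which token has the largest logit.
        token = max(range(len(logits)), key=logits.__getitem__)
        probs = softmax(logits, scale)
        results.append((token, probs[token]))
    return results


# Toy logits for two decoder steps over a three-token vocabulary.
step_logits = [[4.0, 1.0, 0.5], [2.0, 1.8, 0.1]]

# Baseline: unaltered softmax (scale of 1.0 at every step).
baseline = token_confidences(step_logits, [1.0, 1.0])

# Rescaled: the second step is assumed to be less reliable, so a
# hand-picked scale below 1.0 flattens its distribution.
rescaled = token_confidences(step_logits, [1.0, 0.5])
```

Because the scale is positive, the predicted token at each step is unchanged and only the probability assigned to it moves, which is why the rescaled value can be used as a confidence score without altering the transcript.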