Evaluation of Speech Technology Systems and Methods for Resource Construction and Annotation

Session time: Monday 20:30-21:30 (GMT+8), October 26

Mon-2-3-6 Confidence measures in encoder-decoder models for speech recognition

Alejandro Woodward (Universitat Politècnica de Catalunya), Clara Bonnín (Vilynx), Daivid Varas (Vilynx), Issey Masuda (Vilynx), Elisenda Bou-Balust (Vilynx) and Juan Carlos Riveiro (Vilynx)

Abstract: Recent improvements in Automatic Speech Recognition (ASR) systems have enabled the growth of myriad applications such as voice assistants, intent detection, keyword extraction and sentiment analysis. These applications, which are now widely used in industry, are very sensitive to the errors generated by ASR systems. This sensitivity could be mitigated by attaching a reliable confidence measure to the predicted output. This work presents a novel method that uses internal neural features of a frozen ASR model to train an independent neural network to predict a softmax temperature value. This value is computed at each decoder time step and multiplied by the logits in order to redistribute the output probabilities. The resulting softmax values corresponding to the predicted tokens constitute a more reliable confidence measure. This work also studies the effect of teacher forcing on the training of the proposed temperature prediction module. Relative to the unaltered softmax values of the predicted tokens, the output confidence estimation shows an improvement of -25.78% in EER and +7.59% in AUC-ROC, evaluated on a proprietary dataset consisting of News and Entertainment videos.
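
The rescaling step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`softmax`, `scaled_confidence`, `sequence_confidences`) are hypothetical, and the per-step `scale` is passed in by hand, whereas in the paper it would be predicted by a separate network from internal features of the frozen ASR model. The sketch only shows how multiplying the logits by a positive value redistributes the softmax probabilities while leaving the predicted token unchanged.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the logits of one decoder step."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_confidence(logits: Sequence[float], scale: float) -> Tuple[int, float]:
    """Return (predicted token index, confidence) after rescaling the logits.

    `scale` is a stand-in (assumption) for the value that the auxiliary
    network would predict at this decoder step. Following the abstract,
    it multiplies the logits before the softmax.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    probs = softmax([scale * x for x in logits])
    # A positive scale preserves the ordering of the logits, so the
    # predicted token is the same as for the unscaled softmax.
    token = max(range(len(probs)), key=probs.__getitem__)
    return token, probs[token]


def sequence_confidences(
    step_logits: Sequence[Sequence[float]],
    step_scales: Sequence[float],
) -> List[Tuple[int, float]]:
    """Apply the rescaling independently at every decoder time step."""
    return [scaled_confidence(l, s) for l, s in zip(step_logits, step_scales)]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]  # toy logits for a three-token vocabulary
    for scale in (0.5, 1.0, 2.0):
        token, conf = scaled_confidence(logits, scale)
        print(f"scale={scale}: token={token}, confidence={conf:.3f}")
```

In this toy setting a scale below 1 flattens the distribution and lowers the confidence of the predicted token, while a scale above 1 sharpens it. A scale of 1 reproduces the unaltered softmax value that the abstract uses as its baseline.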
