
Evaluation of Speech Technology Systems and Methods for Resource Construction and Annotation

Monday, October 26, 20:30-21:30 (GMT+8)

Mon-2-3-3 Statistical Testing on ASR Performance via Blockwise Bootstrap

Zhe Liu (Facebook, Inc.) and Fuchun Peng (Facebook)

Abstract: A common question raised in automatic speech recognition (ASR) evaluations is how reliable an observed word error rate (WER) improvement between two ASR systems is. Statistical hypothesis testing and confidence intervals (CIs) can be used to tell whether such an improvement is real or due only to random chance. The bootstrap resampling method, which is intuitive and easy to use, has been popular for this kind of significance analysis. However, it fails to handle dependent data, which are prevalent in the speech world; for example, ASR performance on utterances from the same speaker could be correlated. In this paper, we present a blockwise bootstrap approach: it divides the evaluation utterances into nonoverlapping blocks and resamples these blocks instead of the original data. We show that the resulting variance estimator of the absolute WER difference between two ASR systems is consistent under mild conditions. We also demonstrate the validity of the blockwise bootstrap method on both synthetic and real-world speech data.
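The sketch below illustrates the general idea of a blockwise bootstrap for a WER difference. It is not the authors' implementation. It assumes that per-utterance error counts (substitutions + deletions + insertions) for systems A and B and the reference word count are already available, and that each utterance carries a block label supplied by the user (here a hypothetical `block_id`, for example a speaker ID, so that correlated utterances share a block). The function names `wer_diff` and `blockwise_bootstrap` and the simple percentile interval are illustrative choices, not taken from the paper.

```python
import random
from collections import defaultdict
from statistics import variance


def wer_diff(utts):
    """Absolute (not relative) WER difference, WER_A - WER_B, pooled over utterances.

    Each utterance is a tuple (block_id, errors_a, errors_b, ref_words);
    block_id is a hypothetical grouping label such as a speaker ID.
    Assumes the total reference word count is nonzero.
    """
    total_a = sum(u[1] for u in utts)
    total_b = sum(u[2] for u in utts)
    total_words = sum(u[3] for u in utts)
    return (total_a - total_b) / total_words


def blockwise_bootstrap(utts, num_replicates=1000, alpha=0.05, seed=0):
    """Return (point estimate, bootstrap variance, percentile CI) for wer_diff.

    Utterances are grouped into nonoverlapping blocks by block_id, and whole
    blocks (not single utterances) are resampled with replacement, so any
    dependence between utterances inside a block is preserved.
    """
    blocks = defaultdict(list)
    for u in utts:
        blocks[u[0]].append(u)
    block_list = list(blocks.values())

    rng = random.Random(seed)
    stats = []
    for _ in range(num_replicates):
        # Draw as many blocks as the original data has, with replacement
        sample = rng.choices(block_list, k=len(block_list))
        stats.append(wer_diff([u for block in sample for u in block]))

    # Simple percentile interval from the sorted bootstrap replicates
    stats.sort()
    k = len(stats)
    lo = stats[int((alpha / 2) * (k - 1))]
    hi = stats[int((1 - alpha / 2) * (k - 1))]
    return wer_diff(utts), variance(stats), (lo, hi)
```

Putting every utterance in its own block recovers the ordinary utterance-level bootstrap, which is exactly the case the abstract warns about when utterances are correlated. Under the assumptions above, an interval that excludes zero suggests the observed WER difference is unlikely to be due to chance alone.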


