Mon-S&T-2-8 End-to-End Deep Learning Speech Recognition Model for Silent Speech Challenge

Naoki Kimura (University of Tokyo), Zixiong Su (University of Tokyo), Takaaki Saeki (University of Tokyo)
Abstract: This work is the first attempt to apply an end-to-end, deep neural network-based automatic speech recognition (ASR) pipeline to the Silent Speech Challenge (SSC) dataset, which contains synchronized ultrasound and lip images captured while a single speaker read the TIMIT corpus without uttering audible sounds. Previous silent speech research on the SSC dataset has adapted established ASR methods, with some modifications, to visual speech recognition. In this work, we tested a state-of-the-art ASR method on the SSC dataset using the End-to-End Speech Processing Toolkit, ESPnet. The experimental results show that this end-to-end method, combined with SpecAugment, achieved a character error rate (CER) of 10.1% and a word error rate (WER) of 20.5%, suggesting that performance could be improved further with additional data collection.
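The reported figures are standard ASR evaluation metrics. As a point of reference only (not taken from the paper or from ESPnet), the sketch below shows how CER and WER are typically computed: the Levenshtein edit distance between a hypothesis and a reference transcript, normalized by the reference length, at the character level for CER and the word level for WER.

    # Hedged sketch: standard CER/WER computation via Levenshtein distance.
    # The example sentence is a TIMIT prompt; the hypothesis is invented
    # purely for illustration and does not come from the paper.

    def edit_distance(ref, hyp):
        """Levenshtein distance between two sequences (rolling 1-D DP row)."""
        d = list(range(len(hyp) + 1))
        for i, r in enumerate(ref, start=1):
            prev, d[0] = d[0], i
            for j, h in enumerate(hyp, start=1):
                cur = d[j]
                d[j] = min(d[j] + 1,          # deletion (ref symbol dropped)
                           d[j - 1] + 1,      # insertion (extra hyp symbol)
                           prev + (r != h))   # substitution or match
                prev = cur
        return d[-1]

    def cer(ref, hyp):
        """Character error rate: edits over reference characters."""
        return edit_distance(list(ref), list(hyp)) / len(ref)

    def wer(ref, hyp):
        """Word error rate: edits over reference words."""
        return edit_distance(ref.split(), hyp.split()) / len(ref.split())

    if __name__ == "__main__":
        ref = "she had your dark suit in greasy wash water all year"
        hyp = "she had your dark suit in greasy wash water all here"
        print(f"CER: {cer(ref, hyp):.3f}, WER: {wer(ref, hyp):.3f}")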