Thu-3-6-4 Dysarthric Speech Recognition Based on Deep Metric Learning

Yuki Takashima (Kobe University), Ryoichi Takashima (Kobe University), Tetsuya Takiguchi (Kobe University), and Yasuo Ariki (Kobe University)

Abstract: We present an automatic speech recognition (ASR) system for a person with an articulation disorder resulting from athetoid cerebral palsy. Because the utterances of people with this disorder are often unstable or unclear, speech recognition systems have difficulty recognizing their speech. For example, their speech style often fluctuates greatly even when they repeat the same sentence, so their speech tends to vary widely even within a single recognition class. To alleviate this intra-class variation problem, we propose an ASR system based on deep metric learning. The system learns an embedded representation in which the distance between input utterances of the same class is small, while the distance between input utterances of different classes is large. This makes it easier for the ASR system to distinguish dysarthric speech. Experimental results show that our proposed approach using deep metric learning consistently improves word-recognition accuracy. We also evaluate the combination of our proposed method with transfer learning from unimpaired speech, which is intended to alleviate the low-resource problem associated with impaired speech.
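
The embedding objective described in the abstract can be illustrated with a generic triplet loss. This is a minimal sketch only: the abstract does not say which metric-learning loss the authors use, so the triplet formulation, the Euclidean distance, the margin value, and the names `euclidean` and `triplet_loss` are all assumptions made for illustration, and the toy 2-D vectors stand in for learned utterance embeddings.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet loss (an assumed stand-in, not the paper's confirmed loss).

    The loss is zero once the same-class distance is smaller than the
    different-class distance by at least `margin`; otherwise it is positive.
    """
    d_pos = euclidean(anchor, positive)  # distance to a same-class utterance
    d_neg = euclidean(anchor, negative)  # distance to a different-class utterance
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings (hypothetical values, not real speech features).
anchor = [0.0, 0.0]
same_class = [0.0, 1.0]   # close to the anchor
other_class = [3.0, 4.0]  # far from the anchor

# Well-separated classes: no penalty.
loss_separated = triplet_loss(anchor, same_class, other_class)

# Roles swapped, so the "same-class" sample is the far one: large penalty.
loss_confused = triplet_loss(anchor, other_class, same_class)
```

Minimizing a loss of this shape over many triplets pulls utterances of the same class together and pushes utterances of different classes apart, which is the property the abstract attributes to the learned representation.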
