Yuki Takashima (Kobe University), Ryoichi Takashima (Kobe University), Tetsuya Takiguchi (Kobe University), and Yasuo Ariki (Kobe University)
We present in this paper an automatic speech recognition (ASR) system for a person with an articulation disorder resulting from athetoid cerebral palsy. Because such utterances are often unstable or unclear, speech recognition systems have difficulty recognizing the speech of those with this disorder. For example, the speaking style often fluctuates greatly even when the same sentence is repeated, so the speech exhibits large variation within each recognition class. To alleviate this intra-class variation problem, we propose an ASR system based on deep metric learning. This system learns an embedded representation in which the distance between input utterances of the same class is small, while the distance between utterances of different classes is large. As a result, our method makes it easier for the ASR system to distinguish dysarthric speech. Experimental results show that our proposed approach using deep metric learning consistently improves word-recognition accuracy. Moreover, we also evaluate the combination of our proposed method with transfer learning from unimpaired speech to alleviate the low-resource problem associated with impaired speech.
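The metric-learning objective described above (small distances within a class, large distances across classes) can be illustrated with a minimal sketch of a contrastive loss over utterance embeddings. This is an illustrative assumption, not necessarily the exact loss used in the paper, and the toy embedding vectors below are invented for the example.

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, same_class, margin=1.0):
    """Contrastive loss on one pair of utterance embeddings.

    Same-class pairs are penalized by their squared distance (pulled
    together); different-class pairs are penalized only when they are
    closer than `margin` (pushed apart).
    """
    d = np.linalg.norm(emb_a - emb_b)
    if same_class:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2

# Toy 2-D embeddings: two utterances of the same word, one of another word.
u1 = np.array([0.10, 0.20])
u2 = np.array([0.15, 0.18])   # same word, nearby embedding -> small loss
u3 = np.array([1.50, -0.90])  # different word, beyond the margin -> zero loss

print(contrastive_loss(u1, u2, same_class=True))
print(contrastive_loss(u1, u3, same_class=False))
```

In a full system, `emb_a` and `emb_b` would come from a deep network trained end to end, so that minimizing this loss shapes the embedding space in which the recognizer operates.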