Yilin Pan(University of Sheffield), Bahman Mirheidari(Department of Computer Science, University of Sheffield), Zehai Tu(Department of Computer Science, University of Sheffield), Ronan O'Malley(Sheffield Institute for Translational Neuroscience (SITraN)), Traci Walker(Department of Human Communication Sciences), Annalena Venneri(Department of Neuroscience, Royal Hallamshire Hospital), Markus Reuber(Academic Neurology Unit, Royal Hallamshire Hospital), Daniel Blackburn(Sheffield Institute for Translational Neuroscience (SITraN)) and Heidi Christensen(University of Sheffield)
Speech-based automatic approaches for detecting neurodegenerative disorders (ND) and mild cognitive impairment (MCI) have received increasing attention recently, as they are non-invasive and potentially more sensitive than current pen-and-paper tests. The performance of such systems is highly dependent on the choice of features in the classification pipeline. For acoustic features in particular, arriving at a consensus on a best feature set has proven challenging. This paper explores using deep neural networks to extract features directly from the speech signal as a solution to this. Compared with hand-crafted features, more information is present in the raw waveform, but the feature extraction process becomes more complex and less interpretable, which is often undesirable in medical domains. Using a SincNet as a first layer allows for some analysis of the learned features. We propose and evaluate the Sinc-CLA (with SincNet, Convolutional, Long Short-Term Memory and Attention layers) as a task-driven acoustic feature extractor for classifying MCI, ND and healthy controls (HC). Experiments are carried out on an in-house dataset. Compared with popular hand-crafted feature sets, the learned task-driven features achieve superior classification accuracy. The filters of the SincNet are inspected, and acoustic differences between HC, MCI and ND are found.
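The interpretability of the SincNet first layer comes from its parametrization: each filter is a band-pass kernel defined by only two learnable cutoff frequencies, so inspecting the learned cutoffs reveals which frequency bands the classifier relies on. A minimal sketch of how one such kernel is constructed (the specific kernel length, sampling rate and cutoff values below are illustrative assumptions, not values from the paper):

```python
import math

def sinc(x):
    # normalized sinc: sin(pi*x)/(pi*x), with sinc(0) = 1
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def sincnet_kernel(f1, f2, length=101, fs=16000):
    """Band-pass kernel parametrized only by cutoffs f1 < f2 (in Hz),
    following the SincNet formulation:
        g[n] = 2*f2*sinc(2*f2*n) - 2*f1*sinc(2*f1*n)
    (difference of two low-pass sinc filters), Hamming-windowed.
    During training only f1 and f2 are updated, not the taps."""
    f1n, f2n = f1 / fs, f2 / fs      # normalized cutoff frequencies
    half = length // 2
    kernel = []
    for i in range(length):
        n = i - half                 # time index centered on 0
        g = 2 * f2n * sinc(2 * f2n * n) - 2 * f1n * sinc(2 * f1n * n)
        # Hamming window to reduce side lobes of the truncated sinc
        w = 0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
        kernel.append(g * w)
    return kernel

# Example: a 300-3000 Hz band-pass filter (hypothetical learned cutoffs)
k = sincnet_kernel(300.0, 3000.0)
```

Because the kernel is symmetric and peaks at its center tap, plotting the learned (f1, f2) pairs per filter is enough to compare the frequency bands emphasized for HC, MCI and ND.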