Shahla Farzana (University of Illinois Chicago) and Natalie Parde (University of Illinois Chicago)
The Mini-Mental State Examination (MMSE) is a common standardized screening mechanism employed to help diagnose dementia and assess its progression and severity. The assessment is generally administered by trained clinicians, which can be time-consuming and costly. An intriguing and scalable alternative is to detect the same types of changes in cognitive function by automatically monitoring individuals' memory and language abilities in their conversational narratives. We address this problem by predicting clinical MMSE scores using a set of 628 verbal and non-verbal features extracted from the transcripts of 108 speech samples from the ADReSS Challenge dataset. We achieve a Root Mean Squared Error (RMSE) of 4.34, a 29.3% decrease relative to the existing performance benchmark. We also explore the performance impacts of acoustic versus linguistic features and find that our acoustic-only model achieves an RMSE of 6.42, whereas our text-only models achieve much lower RMSE scores (minimum RMSE=4.34), providing strong support for the inclusion of linguistic features in future MMSE score prediction models. Our best-performing model leverages a selection of many linguistic and non-linguistic feature types, demonstrating that MMSE score prediction is a rich problem that is best addressed using input from multiple perspectives.
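Since RMSE is the evaluation metric reported throughout, a minimal sketch of how it is computed for this task may help; the score values below are hypothetical and chosen only to illustrate the 0-30 MMSE scale, not taken from the paper's data.

```python
import math

def rmse(predicted, actual):
    """Root Mean Squared Error between predicted and clinical MMSE scores."""
    assert len(predicted) == len(actual) and len(actual) > 0
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    )

# Hypothetical model predictions vs. clinician-assigned MMSE scores (0-30 scale)
pred = [24.1, 18.7, 29.0, 12.5]
gold = [25, 17, 30, 14]
print(round(rmse(pred, gold), 3))
```

Lower values indicate predictions closer to the clinical scores; a model that always predicted the exact clinical score would achieve an RMSE of 0.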