Wed-SS-2-7-4 Speech Spectrogram Estimation from Intracranial Brain Activity using a Quantization Approach

Miguel Angrick(University of Bremen), Christian Herff(Maastricht University), Garett Johnson(Old Dominion University), Jerry Shih(UC San Diego Health), Dean Krusienski(Virginia Commonwealth University) and Tanja Schultz(University of Bremen)
Abstract: Direct synthesis of acoustic speech from intracranial brain activity could provide an intuitive and natural means of communication for speech-impaired users. In previous studies we have used logarithmic Mel-scaled speech spectrograms (logMels) as an intermediate representation in the decoding from electrocorticographic (ECoG) recordings to an audible waveform. Mel-scaled speech spectrograms have a long tradition in acoustic speech processing and speech synthesis applications. In the past, we relied on regression approaches to find a mapping from brain activity to logMel spectral coefficients, due to the continuous feature space. However, regression outputs are unbounded, so neuronal fluctuations in brain activity may result in abnormally high amplitudes in the synthesized acoustic speech signal. To mitigate these issues, we propose two methods for quantizing the power values to discretize the feature space of logarithmic Mel-scaled spectral coefficients, based on the median and the logistic formula, respectively, which reduce the complexity and restrict the number of intervals. We evaluate the practicality in a proof-of-concept with one participant through a simple classification based on linear discriminant analysis and compare the resulting waveform with the original speech. Reconstructed spectrograms achieve Pearson correlation coefficients with a mean of r=0.5 ± 0.11 in a 5-fold cross-validation.
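A minimal sketch of the two quantization ideas the abstract describes: binarizing each logMel coefficient at its median, and squashing values through a logistic (sigmoid) function into a bounded range that is then split into a fixed number of intervals. The exact parameters and interval counts are assumptions for illustration, not the paper's configuration.

```python
import numpy as np

def median_quantize(logmel):
    """Binarize each spectral coefficient by its per-coefficient median.

    logmel: (frames, coeffs) array of log-Mel power values.
    Returns integer labels in {0, 1}, one per time-frequency bin.
    """
    med = np.median(logmel, axis=0, keepdims=True)
    return (logmel > med).astype(np.int64)

def logistic_quantize(logmel, n_levels=8):
    """Map each coefficient through a logistic function into (0, 1),
    then split that bounded range into n_levels equal-width intervals.

    Standardizing per coefficient (illustrative choice) centers the
    sigmoid so the intervals are used roughly evenly.
    """
    mu = logmel.mean(axis=0, keepdims=True)
    sd = logmel.std(axis=0, keepdims=True) + 1e-8
    squashed = 1.0 / (1.0 + np.exp(-(logmel - mu) / sd))  # values in (0, 1)
    # Clip the index so squashed values near 1.0 fall in the top interval.
    return np.minimum((squashed * n_levels).astype(np.int64), n_levels - 1)

def dequantize(levels, n_levels, lo, hi):
    """Map interval indices back to representative power values
    (interval midpoints) for waveform resynthesis."""
    width = (hi - lo) / n_levels
    return lo + (levels + 0.5) * width
```

The payoff of quantization is that the unbounded regression target becomes a small, closed label set, so a classifier such as linear discriminant analysis can predict it directly and a single noisy frame of brain activity cannot produce an arbitrarily loud output.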