Wed-2-10-4 Speech rate task-specific representation learning from acoustic-articulatory data

Renuka Mannem(Indian Institute of Science), Himajyothi Rajamahendravarapu(Rajiv Gandhi University of Knowledge Technologies, Kadapa), Aravind Illa(PhD Student, Indian Institute of Science, Bangalore) and Prasanta Ghosh(Assistant Professor, EE, IISc)
Abstract: In this work, speech rate is estimated using task-specific representations learned from acoustic-articulatory data, in contrast to generic representations which may not be optimal for speech rate estimation. 1-D convolutional filters are used to learn speech rate-specific acoustic representations from the raw speech. A convolutional dense neural network (CDNN) is used to estimate the speech rate from the learned representations. In practice, articulatory data is not directly available; thus, we use Acoustic-to-Articulatory Inversion (AAI) to derive articulatory representations from acoustics. However, these pseudo-articulatory representations are also generic and not optimized for any task. To learn speech rate-specific pseudo-articulatory representations, we propose joint training of a BLSTM-based AAI and a CDNN using a weighted loss function that combines the losses corresponding to speech rate estimation and articulatory prediction. The proposed model yields an improvement in speech rate estimation of ~18.5% in terms of Pearson correlation coefficient (CC) compared to the baseline CDNN model with generic articulatory representations as inputs. To utilize complementary information from articulatory features, we further perform experiments concatenating task-specific acoustic and pseudo-articulatory representations, which yield a further improvement in CC of ~2.5% compared to the baseline CDNN model.
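The weighted loss used for the joint AAI/CDNN training can be sketched as below. This is a minimal illustration, not the authors' implementation: the function names, the use of mean squared error for both terms, and the weighting parameter `alpha` are all assumptions for illustration.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error between prediction and target arrays."""
    pred, target = np.asarray(pred, dtype=float), np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))

def joint_loss(rate_pred, rate_true, artic_pred, artic_true, alpha=0.5):
    """Weighted sum of the speech rate estimation loss and the
    articulatory prediction loss, in the spirit of the abstract's
    joint training objective. `alpha` is an assumed hyperparameter
    trading off the two tasks; both loss terms here are MSE, which
    is an assumption for this sketch."""
    return alpha * mse(rate_pred, rate_true) + (1.0 - alpha) * mse(artic_pred, artic_true)
```

With `alpha=1.0` the objective reduces to pure speech rate estimation; with `alpha=0.0` it reduces to standard AAI training, so intermediate values let the learned pseudo-articulatory representations serve both tasks.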