Tue-1-4-10 Identifying Important Time-frequency Locations in Continuous Speech Utterances

Hassan Salami Kavaki (The Graduate Center, CUNY, New York) and Michael Mandel (Brooklyn College, CUNY, New York)

Abstract: Human listeners use specific cues to recognize speech, and recent experiments have shown that certain time-frequency regions of individual utterances are more important to their correct identification than others. A model that could identify such cues or regions from clean speech would facilitate speech recognition and speech enhancement by focusing on those important regions. In this paper, we therefore present a model that predicts the regions of individual utterances that are important to an automatic speech recognition (ASR) "listener" by learning to add as much noise as possible to these utterances while still permitting the ASR to correctly identify them. This work uses a continuous speech recognizer to recognize multi-word utterances and builds on our previous work, which performed the same process for an isolated word recognizer. Our experimental results indicate that the model can apply noise to obscure 90.5% of the spectrogram while leaving recognition performance nearly unchanged.
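The idea of adding as much noise as possible while keeping the utterance recognizable can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the additive mixing rule, the 0.5 threshold, and the form of the objective are all assumptions made for illustration, and the ASR loss is a placeholder number rather than the output of a real recognizer.

```python
import numpy as np


def apply_masked_noise(clean_mag, noise_mag, mask):
    """Add noise to a magnitude spectrogram wherever the mask is low.

    mask is in [0, 1]: 1 leaves a time-frequency point clean,
    0 adds the full noise magnitude (assumed mixing rule).
    """
    return clean_mag + (1.0 - mask) * noise_mag


def obscured_fraction(mask, threshold=0.5):
    """Fraction of time-frequency points whose mask falls below the threshold."""
    return float(np.mean(mask < threshold))


def objective(asr_loss, mask, weight=1.0):
    """Hypothetical training objective (lower is better).

    It trades off recognition error against the share of the
    spectrogram that is left clean.
    """
    return asr_loss + weight * float(np.mean(mask))


# Toy example: a 4 x 5 spectrogram with only two points left clean.
rng = np.random.default_rng(0)
clean = np.abs(rng.normal(size=(4, 5)))
noise = np.ones((4, 5))
mask = np.zeros((4, 5))
mask[1, 2] = 1.0
mask[3, 0] = 1.0

noisy = apply_masked_noise(clean, noise, mask)
frac = obscured_fraction(mask)  # 18 of 20 points are obscured
loss = objective(asr_loss=0.1, mask=mask)  # 0.1 is a placeholder ASR loss
```

In this sketch, a mask that keeps only a few points clean yields a high obscured fraction, which is the kind of quantity the 90.5% figure in the abstract describes; how the mask is actually predicted and trained is described in the paper itself.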

