Viet Anh Trinh (The Graduate Center, CUNY, New York, USA) and Michael Mandel (Brooklyn College, CUNY)
Abstract:
This paper proposes a metric, which we call the structured saliency benchmark (SSBM), to evaluate importance maps computed for automatic speech recognizers on individual utterances. These maps indicate the time-frequency points of an utterance that are most important for correct recognition of a target word. Our evaluation technique is suitable not only for standard classification tasks but also for structured prediction tasks such as sequence-to-sequence models. Additionally, we use this approach to compare the importance maps created by our previously introduced technique, which uses “bubble noise” to identify important points through correlation, with those of a baseline approach based on smoothed speech energy and forced alignment. On 100 sentences from the AMI corpus, our results show that the bubble analysis approach is better than this baseline at identifying important speech regions.
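To illustrate what "identifying important points through correlation" can mean, here is a minimal sketch, not the authors' implementation. It assumes that each noisy mixture has a bubble mask flattened into a list of per-point values in [0, 1] (how much speech is revealed at each time-frequency point), and that the recognizer's result on the target word is recorded as a boolean. Each point's importance is then the Pearson (point-biserial) correlation between its mask value and recognition correctness across mixtures. The function name `bubble_importance` and this data layout are hypothetical, and the SSBM metric itself is not shown.

```python
import math


def bubble_importance(masks, correct):
    """Hypothetical sketch: per-point correlation between bubble masks and correctness.

    masks:   list of N lists, each holding P flattened time-frequency mask values.
    correct: list of N booleans, True if the target word was recognized.
    Returns a list of P correlations; points whose mask never varies get 0.0.
    """
    n = len(masks)
    if n != len(correct) or n < 2:
        raise ValueError("need at least two mixtures and one label per mixture")

    # Center the binary correctness labels once; they are shared by every point.
    y = [1.0 if c else 0.0 for c in correct]
    y_mean = sum(y) / n
    y_dev = [v - y_mean for v in y]
    y_ss = sum(d * d for d in y_dev)

    importance = []
    for p in range(len(masks[0])):
        # Mask values at this time-frequency point across all mixtures.
        x = [m[p] for m in masks]
        x_mean = sum(x) / n
        x_dev = [v - x_mean for v in x]
        x_ss = sum(d * d for d in x_dev)
        cov = sum(a * b for a, b in zip(x_dev, y_dev))
        denom = math.sqrt(x_ss * y_ss)
        # Pearson correlation; define it as 0.0 when either variable is constant.
        importance.append(cov / denom if denom > 0 else 0.0)
    return importance


# Toy example: point 0 is revealed exactly when the word is recognized,
# point 1 is unrelated to recognition.
masks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
correct = [True, False, True, False]
print(bubble_importance(masks, correct))  # [1.0, 0.0]
```

A high positive value marks a point whose audibility tends to coincide with correct recognition, which is the intuition behind treating it as important.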