Tue-1-5-5 An alternative to MFCCs for ASR

Pegah Ghahremani (Johns Hopkins University), Hossein Hadian (Department of Computer Engineering, Sharif University of Technology, Tehran, Iran), Sanjeev Khudanpur (Johns Hopkins University), Hynek Hermansky (Johns Hopkins University) and Dan Povey (Johns Hopkins University)

Abstract: The Mel scale is the most commonly used frequency warping function for extracting features for automatic speech recognition (ASR) and is known to be quite effective. However, it was not specifically designed for ASR acoustic models based on deep neural networks (DNNs). In this study, we introduce a frequency warping function that is a modified version of the Mel scale. This warping function has two parameters, and we use it to propose a new set of features called modified Mel-frequency cepstral coefficients (MFCC), which use cosine-shaped filters. The filter bandwidths are computed using a new function. Evaluating the proposed features on a variety of ASR data sets, we see consistent improvements over regular MFCCs and (log) Mel filter bank energies.
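
For orientation, the conventional Mel warping that the abstract takes as its starting point can be sketched as below. This is only the standard Mel scale in its common natural-log form, mel = 1127 ln(1 + f/700), not the paper's modified two-parameter function, whose form and parameters are not given in this abstract. The function names and the choice of evenly spaced filter centers are assumptions made for illustration.

```python
import math

def hz_to_mel(f_hz):
    """Standard Mel warping (natural-log form); not the paper's modified scale."""
    return 1127.0 * math.log(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)

def mel_center_frequencies(num_bins, low_hz, high_hz):
    """Filter center frequencies in Hz, evenly spaced on the Mel axis.

    Assumed illustrative layout: num_bins centers strictly between
    low_hz and high_hz, as in a conventional Mel filter bank.
    """
    low_mel = hz_to_mel(low_hz)
    high_mel = hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (num_bins + 1)
    return [mel_to_hz(low_mel + (i + 1) * step) for i in range(num_bins)]

# Example: 23 filter centers between 20 Hz and 8000 Hz.
centers = mel_center_frequencies(23, 20.0, 8000.0)
```

Because the centers are evenly spaced in Mel rather than in Hz, they are denser at low frequencies and sparser at high frequencies. A modified warping function would change this spacing.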


