Gurunath Reddy M (Indian Institute of Technology, Kharagpur, India), K Sreenivasa Rao (Professor) and Partha Pratim Das (Department of Computer Science & Engineering, IIT Kharagpur)
Electroglottography is a non-invasive technique for recording vocal fold activity across the larynx; the recorded waveform is called the EGG signal. The EGG is a clean signal free from vocal tract resonances, and the parameters extracted from it find many applications in clinical and speech processing technology. In this paper, we propose a classification-based approach to detect a significant parameter of the EGG, the glottal closure instant (GCI). We train deep convolutional neural networks (CNNs) to predict whether a frame of samples contains a GCI location. The GCI location within the frame is then obtained by exploiting its distinctive manifestation in the first-order derivative of the EGG. We train several CNN models to determine which input feature representation detects the GCI location most effectively. Further, we train and evaluate the models on a multi-speaker dataset to identify and eliminate any speaker bias. We also show that the GCI identification rate improves significantly when the model is trained jointly on the EGG and its derivative (dEGG). The deep models are trained using manually annotated GCI markers obtained from the dEGG as reference. Objective evaluation measures confirm that the proposed method performs comparably to or better than traditional signal processing GCI detection methods.
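The second stage described above, locating the GCI inside a frame from the first-order derivative, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the frame length, the two-channel layout of the joint EGG and dEGG input, and the assumption that the GCI appears as the strongest negative dEGG peak (which depends on the recording polarity) are all hypothetical choices made for illustration. The frame is assumed to have already been flagged by a classifier as containing a GCI.

```python
import numpy as np


def degg(egg):
    """First-order difference of the EGG signal, same length as the input."""
    egg = np.asarray(egg, dtype=float)
    # prepend the first sample so the output keeps the input length
    return np.diff(egg, prepend=egg[0])


def locate_gci_in_frame(egg, start, frame_len, polarity=-1):
    """Return the sample index of the strongest dEGG peak inside one frame.

    polarity=-1 assumes the GCI shows up as a negative dEGG peak;
    use polarity=+1 for recordings with the opposite sign convention.
    """
    d = degg(egg)[start:start + frame_len]
    return start + int(np.argmax(polarity * d))


def joint_input(egg, start, frame_len):
    """Stack EGG and dEGG frames as a 2-channel array (hypothetical layout)."""
    egg = np.asarray(egg, dtype=float)
    frame = egg[start:start + frame_len]
    dframe = degg(egg)[start:start + frame_len]
    return np.stack([frame, dframe])  # shape: (2, frame_len)
```

For example, on a synthetic signal with a single abrupt drop, `locate_gci_in_frame` returns the index of that drop, and `joint_input` yields an array of shape `(2, frame_len)` that could be fed to a model trained on both representations.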