Mon-2-8-4 Memory Controlled Sequential Self Attention for Sound Recognition

Arjun Pankajakshan (Queen Mary University of London), Helen L. Bear (Queen Mary University of London), Vinod Subramanian (Queen Mary University of London) and Emmanouil Benetos (Queen Mary University of London)

Abstract: In this paper, we investigate the importance of the extent of memory in sequential self attention for sound recognition. We propose to use a memory controlled sequential self attention mechanism on top of a convolutional recurrent neural network (CRNN) model for polyphonic sound event detection (SED). Experiments on the URBAN-SED dataset demonstrate the impact of the extent of memory on the sound recognition performance of the SED model with self attention. We extend the proposed idea with a multi-head self attention mechanism in which each attention head processes the audio embedding with its own explicit attention width value. The proposed use of memory controlled sequential self attention offers a way to induce relations among frames of sound event tokens. We show that our memory controlled self attention model achieves an event-based F-score of 33.92% on the URBAN-SED dataset, outperforming the F-score of 20.10% obtained by the model without self attention.
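
To make the idea of an attention width concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the scaled dot-product scoring, the symmetric window around each frame, the concatenation of head outputs, and the names `memory_controlled_self_attention`, `multi_head`, `frames`, and `width` are all assumptions made for illustration. The abstract states only that the extent of memory is controlled and that each head uses an explicit attention width.

```python
import math


def memory_controlled_self_attention(frames, width):
    """Self attention where frame t only attends to frames within `width` steps.

    frames: list of equal-length lists of floats (one embedding per time frame).
    width:  hypothetical attention width, counted in frames on each side of t.
    Assumption: scaled dot-product scores with no learned projections.
    """
    n = len(frames)
    attended = []
    for t in range(n):
        lo = max(0, t - width)          # first frame inside the memory window
        hi = min(n, t + width + 1)      # one past the last frame in the window
        dim = len(frames[t])
        scores = [
            sum(a * b for a, b in zip(frames[t], frames[s])) / math.sqrt(dim)
            for s in range(lo, hi)
        ]
        # Numerically stable softmax over the window only.
        peak = max(scores)
        exps = [math.exp(x - peak) for x in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the frames inside the window.
        context = [
            sum(w * frames[s][k] for w, s in zip(weights, range(lo, hi)))
            for k in range(dim)
        ]
        attended.append(context)
    return attended


def multi_head(frames, widths):
    """Run one head per attention width and concatenate the outputs per frame.

    Assumption: heads are combined by simple concatenation.
    """
    heads = [memory_controlled_self_attention(frames, w) for w in widths]
    return [sum((head[t] for head in heads), []) for t in range(len(frames))]
```

With `width=0` each frame attends only to itself, so the output equals the input. Larger widths let each frame draw on more of its temporal context, which is the quantity the abstract refers to as the extent of memory.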

