Mon-2-8-3 Contrastive Predictive Coding of Audio with an Adversary

Luyu Wang (DeepMind), Kazuya Kawakami (DeepMind) and Aaron van den Oord (DeepMind)

Abstract: With the vast amount of audio data available, powerful sound representations can be learned with self-supervised methods even in the absence of explicit annotations. In this work we investigate learning general audio representations directly from raw signals using the Contrastive Predictive Coding objective. We further extend it by leveraging ideas from adversarial machine learning to produce additive perturbations that effectively make the learning harder, so that the predictive tasks are not distracted by trivial details. We also look at the effects of different design choices for the objective, including the nonlinear similarity measure and the way the negatives are drawn. Combining these contributions, our models considerably outperform previous spectrogram-based unsupervised methods. On AudioSet we observe a relative improvement of 14% in mean average precision over the state of the art while using half the training data.
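
For readers unfamiliar with the objective named in the abstract, the following is a minimal sketch of a contrastive (InfoNCE-style) loss of the kind used in Contrastive Predictive Coding. It is illustrative only and not the authors' implementation: the function names `dot` and `info_nce_loss` are hypothetical, the plain dot-product score is an assumption (the abstract says the paper studies nonlinear similarity measures), and the adversarial perturbation step is not shown.

```python
import math


def dot(u, v):
    # Dot-product similarity; an assumed stand-in for the paper's similarity measure.
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(prediction, positive, negatives):
    """Cross-entropy of identifying the positive among all candidates.

    prediction: predicted future representation (list of floats)
    positive:   the true future representation
    negatives:  representations drawn from other positions or clips
    """
    # Score the positive first, then every negative.
    scores = [dot(prediction, positive)] + [dot(prediction, n) for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(scores)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    # Negative log-probability assigned to the positive candidate.
    return log_norm - scores[0]


negatives = [[0.0, 1.0], [-1.0, 0.0]]
aligned = info_nce_loss([1.0, 0.0], [1.0, 0.0], negatives)
misaligned = info_nce_loss([0.0, 1.0], [1.0, 0.0], negatives)
```

In this toy setup the loss is lower when the prediction matches the positive (`aligned`) than when it matches a negative (`misaligned`). How the negatives are drawn, which the abstract lists as a design choice, determines what goes into `negatives`.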
