Panagiotis Tzirakis (Imperial College London), Alexander Shiarella (Imperial College London), Robert Ewers (Imperial College London) and Björn Schuller (University of Augsburg / Imperial College London)
Ecologists use auditory data for a variety of purposes, including identifying species ranges, estimating population sizes, and studying behaviour. Autonomous recording units (ARUs) allow audio to be collected over wider areas than traditional sampling methods, and can provide more consistent sampling. The result is an abundance of audio data, far more than scientists with the appropriate taxonomic skills can analyse.
In this paper, we address the divide between academic machine learning research on animal vocalisation classifiers and the application of such classifiers to conservation efforts. As a unique case study, we build a Bornean gibbon call detection system by first manually annotating existing data and then comparing audio analysis toolkits, including end-to-end and bag-of-audio-words modelling. Finally, we propose a deep architecture that outperforms the other approaches with respect to unweighted average recall (UAR). The code is available at: https://github.com/glam-imperial/Bornean-Gibbons-Call-Detection
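Unweighted average recall is the mean of the per-class recalls, so every class counts equally no matter how many examples it has. It is the same quantity as macro-averaged recall. It suits call detection, where target calls can be much rarer than background audio and plain accuracy can therefore look high even for a detector that never fires. The sketch below is illustrative only. The function name and the labels (0 = no call, 1 = gibbon call) are hypothetical, and the class proportions are invented for the example. They are not taken from the annotated data.

```python
def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls (macro-averaged recall).

    Each class present in y_true contributes equally, regardless of how
    many examples it has. Labels here are hypothetical: 0 = no call,
    1 = gibbon call.
    """
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        # Number of examples whose true label is class c.
        total = sum(1 for t in y_true if t == c)
        # Number of those examples the detector also labelled as c.
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Invented toy data: 8 background segments, 2 call segments.
y_true = [0] * 8 + [1] * 2
# A detector that always predicts "no call".
y_pred = [0] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                                   # 0.8
print(unweighted_average_recall(y_true, y_pred))  # 0.5
```

In this example the trivial detector reaches 80% accuracy but only chance-level UAR (0.5 for two classes), because it recalls none of the calls.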