Prasanth Parasu (University of New South Wales), Julien Epps (School of Electrical Engineering and Telecommunications, UNSW Australia), Kaavya Sriskandaraja (The University of New South Wales) and Gajan Suthokumar (The University of New South Wales)
Abstract:
Current approaches to Voice Presentation Attack (VPA) detection have largely focused on spoofing detection within a single database and/or attack type. However, for practical Presentation Attack Detection (PAD) systems to be adopted by industry, they must be able to generalise to detect diverse and previously unseen VPAs. Inspired by successful aspects of deep learning systems for image classification, such as the introduction of residual mappings through shortcut connections, this paper proposes a novel Light-ResNet architecture that provides good generalisation across databases and attack types. The introduction of skip connections within residual modules enables the training of deeper spoofing classifiers that can leverage more useful discriminative information learned in the hidden layers, while still generalising well under mismatched conditions. Utilising the wide variety of databases available for VPA research, this paper also proposes a set of generalisation evaluations that a practical PAD system should be able to meet: generalising within a database, generalising across databases within an attack type, and generalising across all spoofing classes. Evaluations on the ASVspoof 2015, BTAS 2016 (replay) and ASVspoof 2017 V2.0 databases show that the proposed Light-ResNet architecture generalises consistently across these diverse tasks, outperforming the CQCC-GMM and Attentive Filtering Network (AFN) baselines.
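
The residual mapping mentioned in the abstract can be illustrated with a minimal sketch. This is not the Light-ResNet architecture itself, whose layer types and sizes the abstract does not specify. The dense layers, the ReLU activations, and the names `dense`, `relu` and `residual_block` are assumptions chosen only to show how a shortcut connection adds the block input back onto the learned mapping F(x).

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def dense(v, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def residual_block(x, w1, b1, w2, b2):
    """Compute relu(F(x) + x), where F is two dense layers (assumed form).

    The shortcut connection requires F(x) to have the same size as x.
    """
    hidden = relu(dense(x, w1, b1))
    f = dense(hidden, w2, b2)
    # Shortcut (skip) connection: add the unmodified input to F(x).
    return relu([fi + xi for fi, xi in zip(f, x)])


# With all-zero weights, F(x) is zero and the block passes its
# (non-negative) input through unchanged.
dim = 4
zero_w = [[0.0] * dim for _ in range(dim)]
zero_b = [0.0] * dim
x = [0.5, 1.0, 2.0, 3.0]
y = residual_block(x, zero_w, zero_b, zero_w, zero_b)
```

Because the block only has to learn a correction to the identity, stacking many such blocks tends to be easier to optimise than stacking plain layers, which is the property the abstract relies on when it refers to training deeper classifiers.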