Ewan Dunbar (Université Paris Diderot), Julien Karadayi (ENS Ulm), Mathieu Bernard (ENS Ulm), Xuan-Nga Cao (LSCP - EHESS / ENS / PSL Research University / CNRS / INRIA), Robin Algayres (Ecole Normale Supérieure / PSL / Inria), Lucas Ondel (Brno University of Technology), Laurent Besacier (LIG), Sakriani Sakti (Nara Institute of Science and Technology (NAIST) / RIKEN AIP) and Emmanuel Dupoux (Ecole des Hautes Etudes en Sciences Sociales)
We present the Zero Resource Speech Challenge 2020, which aims at learning speech representations from raw audio signals without any labels. It combines the data sets and metrics from two previous benchmarks (2017 and 2019) and features two tasks that tap into two levels of speech representation. The first task is to discover low bit-rate subword representations that optimize the quality of speech synthesis; the second is to discover word-like units from unsegmented raw speech. We present the results of the twenty submitted models and discuss the implications of the main findings for unsupervised speech learning.