Juliette Millet (LLF, Université de Paris and CoML team, LSCP, ENS Paris) and Ewan Dunbar (Université Paris Diderot)
In this paper, we present a dataset and methods for comparing speech processing models and humans on a phone discrimination task. We provide Perceptimatic, an open dataset consisting of French and English speech stimuli, together with the results of 91 English- and 93 French-speaking listeners. The stimuli test a wide range of French and English contrasts and are extracted directly from corpora of natural running read speech used in the 2017 Zero Resource Speech Challenge. We provide a method for comparing humans' perceptual space with models' representational spaces, and we apply it to models previously submitted to the challenge, as well as to several reference systems. We show that the topline used for the challenge, an HMM-GMM phone recognition system, while discriminating phones well, does not produce a representational space close to humans' perceptual space.