Thu-3-7-8 Vector Quantized Temporally-Aware Correspondence Sparse Autoencoders for Zero-resource Acoustic Unit Discovery

Batuhan Gundogdu (Bogazici University), Bolaji Yusuf (Bogazici University), Mansur Yesilbursa (Bogazici University) and Murat Saraclar (Bogazici University)

Abstract: A recent task posed by the Zerospeech challenge is the unsupervised learning of the basic acoustic units of an unknown language. Previously, we introduced recurrent sparse autoencoders fine-tuned with corresponding speech segments obtained by unsupervised term discovery. In that system, clustering was performed on the intermediate layer, whose nodes represent the acoustic unit assignments. In this paper, we extend the system by incorporating vector quantization and an adaptation of winner-take-all networks, so that symbol continuity can be enforced by excitatory and inhibitory weights along the temporal axis. Furthermore, we use speaker information in speaker-adversarial training of the encoder. The ABX discriminability and low bitrate results of the proposed approach on the Zerospeech 2020 challenge demonstrate the effect of the enhanced encoding continuity brought by the temporal-awareness and sparsity techniques proposed in this work.
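The abstract mentions vector quantization and symbol continuity without showing what they look like on frame-level data. The sketch below is a generic illustration, not the authors' implementation: it assigns each frame to its nearest codebook vector and then collapses consecutive repeats into (symbol, duration) pairs, which is why longer runs of the same symbol mean fewer distinct units to transmit. The codebook, the frames, and all function names are hypothetical values chosen for the example.

```python
from itertools import groupby


def nearest_codeword(frame, codebook):
    """Return the index of the codebook vector closest to `frame`
    under squared Euclidean distance."""
    def sq_dist(code):
        return sum((f - c) ** 2 for f, c in zip(frame, code))
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k]))


def quantize(frames, codebook):
    """Map every frame to the index of its nearest codeword."""
    return [nearest_codeword(frame, codebook) for frame in frames]


def collapse_runs(symbols):
    """Merge consecutive identical symbols into (symbol, run_length) pairs."""
    return [(sym, len(list(run))) for sym, run in groupby(symbols)]


# Hypothetical toy data: three 2-D codewords and five 2-D frames.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
frames = [[0.1, -0.1], [0.2, 0.1], [0.9, 1.2], [1.1, 0.8], [0.1, 1.9]]

symbols = quantize(frames, codebook)  # one symbol index per frame
units = collapse_runs(symbols)        # (symbol, duration in frames)
print(symbols)
print(units)
```

In this toy setting the quantizer operates on fixed vectors; in a learned system the frames would be encoder outputs and the codebook would be trained, and any temporal smoothing would happen before or during the assignment step rather than as a post-hoc run merge.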

