Wed-2-6-7 Neural Speech Completion

Kazuki Tsunematsu (Nara Institute of Science and Technology (NAIST)), Johanes Effendi (Nara Institute of Science and Technology (NAIST) / RIKEN AIP), Sakriani Sakti (Nara Institute of Science and Technology (NAIST) / RIKEN AIP) and Satoshi Nakamura (Nara Institute of Science and Technology and RIKEN AIP Center)

Abstract: During a conversation, humans often predict the end of a sentence before the other person has finished it. In contrast, most current automatic speech recognition systems remain limited to passively recognizing what is being said. However, applications such as voice search, simultaneous speech translation, and spoken language communication may require a system that not only recognizes what has been said but also predicts what will be said. This paper proposes a speech completion system based on deep learning and discusses its construction in text-to-text, speech-to-text, and speech-to-speech frameworks. We evaluate our system on domain-specific sentences with synthesized speech utterances that are only 25%, 50%, or 75% complete. Our proposed systems provide more natural suggestions than the Bidirectional Encoder Representations from Transformers (BERT) language representation model.
