Tue-1-8-7 LVCSR with Transformer Language Models

Eugen Beck (RWTH Aachen University), Ralf Schlüter (RWTH Aachen University), Hermann Ney (RWTH Aachen University)
Abstract: Neural network language models (LMs) based on self-attention have recently outperformed the previous state of the art, LSTM LMs. Today, Transformer LMs are often used as a postprocessing step in lattice or n-best list rescoring. In this work, the main focus is on using them in one-pass recognition. We show that a simple reduction of redundant computations in batched self-attention yields a 15% reduction in overall real-time factor (RTF) on a well-tuned system. We also show that, with proper initialization, the layer normalization inside the residual blocks can be removed, further increasing forwarding speed. All of this is done under the constraint of staying close to the state of the art in terms of word error rate (5.4% on LibriSpeech test-other) while achieving a real-time factor of around 1. Last but not least, we present an approach to speed up classic push-forward rescoring by mixing it with n-best list rescoring to better utilize the inherent parallelizability of Transformer language models, cutting the time needed for rescoring in half.
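
The abstract does not spell out how the redundant computations in batched self-attention are avoided; a common way to achieve this in one-pass decoding is to cache each layer's keys and values for the already-scored history so that only the newly appended token is projected and attended. The sketch below illustrates that idea only; the class name `CachedSelfAttention` and the single-head, per-hypothesis setup are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: key/value caching for incremental self-attention, one common way to
# avoid recomputing attention over the already-scored history in one-pass recognition.
# Names and structure are illustrative; the paper's exact batching scheme is not
# described in the abstract.
import math
import torch
import torch.nn as nn


class CachedSelfAttention(nn.Module):
    """Single-head self-attention that reuses cached keys/values per hypothesis."""

    def __init__(self, d_model: int):
        super().__init__()
        self.d_model = d_model
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)

    def forward(self, x_new, cache=None):
        # x_new: (batch, 1, d_model) -- only the newly appended token per hypothesis.
        q = self.q_proj(x_new)
        k_new = self.k_proj(x_new)
        v_new = self.v_proj(x_new)
        if cache is not None:
            # Reuse the history's keys/values instead of recomputing them each step.
            k = torch.cat([cache["k"], k_new], dim=1)
            v = torch.cat([cache["v"], v_new], dim=1)
        else:
            k, v = k_new, v_new
        scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(self.d_model)
        out = torch.matmul(torch.softmax(scores, dim=-1), v)
        return out, {"k": k, "v": v}


if __name__ == "__main__":
    attn = CachedSelfAttention(d_model=8)
    cache = None
    # Score a 4-token hypothesis one token at a time; in batched decoding the cache
    # would hold one history per active hypothesis.
    for _ in range(4):
        token = torch.randn(1, 1, 8)
        out, cache = attn(token, cache)
    print(out.shape, cache["k"].shape)  # torch.Size([1, 1, 8]) torch.Size([1, 4, 8])
```

Because only the new query attends to the cached keys and values, no causal mask is needed at decode time, and the per-step cost grows linearly rather than quadratically with the hypothesis length.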