Tue-1-8-7 LVCSR with Transformer Language Models

Eugen Beck (RWTH Aachen University), Ralf Schlüter (RWTH Aachen University), Hermann Ney (RWTH Aachen University)
Abstract: Neural network language models (LMs) based on self-attention have recently outperformed the previous state of the art, LSTM LMs. Today, Transformer LMs are often used as a postprocessing step in lattice or n-best list rescoring. In this work, the main focus is on using them in one-pass recognition. We show that a simple reduction of redundant computations in batched self-attention yields a 15% reduction in overall real-time factor (RTF) on a well-tuned system. We also show that, with proper initialization, the layer normalization inside the residual blocks can be removed, further increasing forwarding speed. All of this is done under the constraint of staying close to the state of the art in terms of word error rate (5.4% on LibriSpeech test-other) while achieving a real-time factor of around 1. Last but not least, we present an approach to speed up classic push-forward rescoring by mixing it with n-best list rescoring to better utilize the inherent parallelizability of Transformer language models, cutting the time needed for rescoring in half.
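
The abstract does not spell out how the redundant computations in batched self-attention are avoided; a common way to achieve this in one-pass decoding is to cache each layer's keys and values for the already-scored history so that only the newly appended token is projected and attended. The sketch below illustrates that idea only; the class name `CachedSelfAttention` and the single-head, per-hypothesis setup are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: key/value caching for incremental self-attention, one common way to
# avoid recomputing attention over the already-scored history in one-pass recognition.
# Names and structure are illustrative; the paper's exact batching scheme is not
# described in the abstract.
import math
import torch
import torch.nn as nn


class CachedSelfAttention(nn.Module):
    """Single-head self-attention that reuses cached keys/values per hypothesis."""

    def __init__(self, d_model: int):
        super().__init__()
        self.d_model = d_model
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)

    def forward(self, x_new, cache=None):
        # x_new: (batch, 1, d_model) -- only the newly appended token per hypothesis.
        q = self.q_proj(x_new)
        k_new = self.k_proj(x_new)
        v_new = self.v_proj(x_new)
        if cache is not None:
            # Reuse the history's keys/values instead of recomputing them each step.
            k = torch.cat([cache["k"], k_new], dim=1)
            v = torch.cat([cache["v"], v_new], dim=1)
        else:
            k, v = k_new, v_new
        scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(self.d_model)
        out = torch.matmul(torch.softmax(scores, dim=-1), v)
        return out, {"k": k, "v": v}


if __name__ == "__main__":
    attn = CachedSelfAttention(d_model=8)
    cache = None
    # Score a 4-token hypothesis one token at a time; in batched decoding the cache
    # would hold one history per active hypothesis.
    for _ in range(4):
        token = torch.randn(1, 1, 8)
        out, cache = attn(token, cache)
    print(out.shape, cache["k"].shape)  # torch.Size([1, 1, 8]) torch.Size([1, 4, 8])
```

Because only the new query attends to the cached keys and values, no causal mask is needed at decode time, and the per-step cost grows linearly rather than quadratically with the hypothesis length.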