Mon-1-1-5 Compressing LSTM Networks with Hierarchical Coarse-Grain Sparsity

Deepak Kadetotad (Arizona State University / Starkey Hearing Technologies), Jian Meng (Arizona State University), Visar Berisha (Arizona State University), Chaitali Chakrabarti (Arizona State University) and Jae-sun Seo (Arizona State University)
Abstract: The long short-term memory (LSTM) network is one of the most widely used recurrent neural networks (RNNs) for automatic speech recognition (ASR), but it requires millions of parameters. This makes it prohibitive for memory-constrained hardware accelerators, as the storage demand forces a heavy dependence on off-chip memory, which becomes a bottleneck for latency and power. In this paper, we propose a new LSTM training technique based on hierarchical coarse-grain sparsity (HCGS), which enforces hierarchical structured sparsity by randomly dropping static block-wise connections between layers. HCGS maintains the same hierarchical structured sparsity throughout training and inference, which aids acceleration and storage reduction for both training and inference hardware systems. We also jointly optimize in-training low-precision quantization with HCGS-based structured sparsity on 2-/3-layer LSTM networks for the TIMIT and TED-LIUM corpora. With 16X structured compression and 6-bit weight precision, we achieve a phoneme error rate (PER) of 16.9% on TIMIT and a word error rate (WER) of 18.9% on TED-LIUM, showing the best trade-off between error rate and LSTM memory compression compared to prior work.
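To make the idea of hierarchical block-wise sparsity concrete, the sketch below builds a static two-level block mask and applies it to a weight matrix. This is only an illustration under assumed settings (NumPy, 64x64 coarse blocks, 16x16 fine blocks, a 1/4 keep fraction at each level, and per-block-row random selection); it is not the authors' implementation, and the actual HCGS configuration in the paper may differ.

```python
import numpy as np

def hcgs_mask(rows, cols, coarse_block, fine_block,
              coarse_keep=0.25, fine_keep=0.25, seed=0):
    """Two-level block mask in the spirit of HCGS (illustrative only).

    For every row of coarse blocks, a fixed fraction of coarse blocks is
    kept at random; inside each kept coarse block the same is done with
    finer sub-blocks. With both keep fractions at 1/4 the mask retains
    1/16 of the weights, i.e. 16x structured compression. The mask is
    generated once and stays fixed for training and inference.
    """
    rng = np.random.default_rng(seed)
    mask = np.zeros((rows, cols), dtype=np.float32)
    n_cb_cols = cols // coarse_block                      # coarse blocks per row
    keep_cb = max(1, int(round(coarse_keep * n_cb_cols)))
    n_fb_cols = coarse_block // fine_block                # fine blocks per coarse block row
    keep_fb = max(1, int(round(fine_keep * n_fb_cols)))

    for r in range(0, rows, coarse_block):
        # pick which coarse blocks survive in this block row
        for cb in rng.choice(n_cb_cols, size=keep_cb, replace=False):
            c = cb * coarse_block
            for fr in range(r, r + coarse_block, fine_block):
                # pick surviving fine blocks inside the kept coarse block
                for fb in rng.choice(n_fb_cols, size=keep_fb, replace=False):
                    fc = c + fb * fine_block
                    mask[fr:fr + fine_block, fc:fc + fine_block] = 1.0
    return mask

# Example: a 512x512 recurrent weight matrix, 64x64 coarse blocks,
# 16x16 fine blocks, 1/4 kept at each level -> 16x compression.
mask = hcgs_mask(512, 512, coarse_block=64, fine_block=16)
weights = np.random.randn(512, 512).astype(np.float32)
sparse_weights = weights * mask          # mask applied at every forward pass
print("kept fraction:", mask.mean())     # 0.0625 == 1/16
```

Because the surviving connections are whole blocks chosen before training and never changed, the mask can be stored compactly as block indices, which is what makes this form of sparsity amenable to hardware acceleration.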