Wed-3-9-5 Quantization Aware Training with Absolute-Cosine Regularization for Automatic Speech Recognition

Hieu Nguyen (Amazon.com), Anastasios Alexandridis (Amazon.com), and Athanasios Mouchtaris (Amazon.com)
Abstract: Compression and quantization are important for neural networks in general and Automatic Speech Recognition (ASR) systems in particular, especially when they operate in real time on resource-constrained devices. Using fewer bits for the model weights makes the model much smaller and reduces inference time significantly, at the cost of degraded performance. Such degradation can potentially be addressed by so-called quantization-aware training (QAT). Existing QAT methods mostly account for quantization in forward propagation, while ignoring the quantization loss in gradient calculation during back-propagation. In this work, we introduce a novel QAT scheme based on absolute-cosine regularization (ACosR), which enforces a prior, quantization-friendly distribution on the model weights. We apply this novel approach to the ASR task, assuming a recurrent neural network transducer (RNN-T) architecture. The results show little to no degradation between floating-point, 8-bit, and 6-bit ACosR models. Weight distributions further confirm that in-training weights lie very close to the quantization levels when ACosR is applied.
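The abstract does not give the exact functional form of the ACosR penalty. Purely as an illustrative sketch (not the authors' implementation), the PyTorch snippet below shows one way a periodic, absolute-cosine-style regularizer could pull weights toward a uniform quantization grid: the penalty is minimized when a weight sits on an integer multiple of an assumed step size `delta`. The names `acos_regularizer`, `delta`, and `lambda_reg` are assumptions introduced only for this example.

```python
import torch

def acos_regularizer(weights: torch.Tensor, delta: float) -> torch.Tensor:
    # Assumed form: 1 - |cos(pi * w / delta)| is zero whenever w is an
    # integer multiple of delta (a uniform quantization grid) and grows
    # as w moves between grid points, nudging weights toward the grid.
    return (1.0 - torch.abs(torch.cos(torch.pi * weights / delta))).mean()

def total_loss(task_loss: torch.Tensor, model: torch.nn.Module,
               delta: float, lambda_reg: float) -> torch.Tensor:
    # Assumed usage: the penalty is summed over all parameters and added
    # to the ordinary task loss (e.g., the RNN-T loss) with weight lambda_reg.
    reg = sum(acos_regularizer(p, delta) for p in model.parameters())
    return task_loss + lambda_reg * reg
```

In practice the step size would be tied to the target bit width (e.g., derived per layer for 8-bit or 6-bit quantization), but that mapping is not specified in the abstract and is left out of this sketch.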