Wed-3-9-6 Streaming on-device end-to-end ASR system for privacy-sensitive voice-typing

Abhinav Garg (Samsung Electronics), Gowtham Vadisetti (Samsung Electronics), Sichen Jin (Samsung Electronics), Dhananjaya Gowda (Samsung Research), Aditya Jayasimha (Samsung Electronics), Youngho Han (Samsung Electronics), Jiyeon Kim (Samsung Electronics), Junmo Park (Samsung Electronics), Kwangyoun Kim (Samsung Research), Sooyeon Kim (Samsung Electronics), Youngyoon Lee (Samsung Electronics), Kyungbo Min (Samsung Electronics) and Chanwoo Kim (Samsung Research)

Abstract: In this paper, we present our streaming on-device end-to-end speech recognition solution for a privacy-sensitive voice-typing application, which primarily involves typing users' private details and passwords. We highlight challenges specific to the voice-typing scenario in the Korean language and propose solutions to these problems within the framework of a streaming attention-based speech recognition system. Important challenges in voice-typing include the choice of output units, the coupling of multiple characters into longer byte-pair encoded units, and the lack of sufficient training data. In addition to customizing a high-accuracy open-domain streaming speech recognition model for voice-typing applications, we retain the model's performance on open-domain tasks without significant degradation. We also explore domain biasing using shallow fusion with a weighted finite state transducer (WFST). We obtain approximately 13% relative word error rate (WER) improvement on our internal Korean voice-typing dataset without a WFST and about 30% additional WER improvement with WFST fusion.
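
The shallow fusion mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the common formulation in which the ASR log-probability of a hypothesis is added to a weighted score from a biasing graph, and it rescores complete hypotheses for brevity, whereas a streaming system would typically apply the same combination at each beam-search step. The names `ToyBiasGraph`, `shallow_fusion_score`, `rescore`, and the example tokens and weights are all hypothetical, and the dictionary-based graph is only a toy stand-in for a real WFST.

```python
import math


def shallow_fusion_score(asr_logprob, bias_logprob, fusion_weight):
    """Combine ASR and biasing scores in the log domain (assumed formulation)."""
    return asr_logprob + fusion_weight * bias_logprob


class ToyBiasGraph:
    """Hypothetical stand-in for a WFST.

    arcs maps (state, token) -> (next_state, log_weight). Tokens with no
    matching arc receive a backoff penalty and reset the graph to state 0.
    """

    def __init__(self, arcs, backoff_logprob):
        self.arcs = arcs
        self.backoff_logprob = backoff_logprob

    def score(self, tokens):
        state, total = 0, 0.0
        for token in tokens:
            if (state, token) in self.arcs:
                state, weight = self.arcs[(state, token)]
            else:
                state, weight = 0, self.backoff_logprob
            total += weight
        return total


def rescore(hypotheses, graph, fusion_weight):
    """Return the token sequence with the best fused score.

    hypotheses is a list of (tokens, asr_logprob) pairs.
    """
    scored = [
        (shallow_fusion_score(logprob, graph.score(tokens), fusion_weight), tokens)
        for tokens, logprob in hypotheses
    ]
    return max(scored, key=lambda pair: pair[0])[1]


# Illustrative values only: the graph favors the in-domain phrase "at sign".
EXAMPLE_GRAPH = ToyBiasGraph(
    arcs={(0, "at"): (1, math.log(0.9)), (1, "sign"): (0, math.log(0.9))},
    backoff_logprob=math.log(0.1),
)
EXAMPLE_HYPS = [(["at", "sign"], -2.0), (["add", "sign"], -1.8)]

best_with_bias = rescore(EXAMPLE_HYPS, EXAMPLE_GRAPH, fusion_weight=0.5)
best_without_bias = rescore(EXAMPLE_HYPS, EXAMPLE_GRAPH, fusion_weight=0.0)
```

With a fusion weight of zero the ranking depends on the ASR scores alone; a positive weight lets the biasing graph promote the in-domain hypothesis even when its ASR score is slightly lower.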

