Atsunori Ogawa(NTT Communication Science Laboratories), Naohiro Tawara(NTT Communication Science Laboratories) and Marc Delcroix(NTT Communication Science Laboratories)
To improve the performance of automatic speech recognition (ASR) for a specific domain, it is essential to train a language model (LM) using text data of the target domain. In this study, we propose a method to transfer the domain of a large amount of source data to the target domain and augment the data to train a target domain-specific LM. The proposed method consists of two steps, which use a bidirectional long short-term memory (BLSTM)-based word replacing model and a target domain-adapted LSTMLM, respectively. Based on the learned domain-specific wordings, the word replacing model converts a given source domain sentence to a confusion network (CN) that includes a variety of target domain candidate word sequences. Then, the LSTMLM selects a target domain sentence from the CN by evaluating its grammatical correctness based on decoding scores. In experiments using lecture and conversational speech corpora as the source and target domain data sets, we confirmed that the proposed LM data augmentation method improves the target conversational speech recognition performance of a hybrid ASR system using an n-gram LM and the performance of N-best rescoring using an LSTMLM.