Xueli Jia (Ping An Technology (Shenzhen) Co., Ltd.), Jianzong Wang (Ping An Technology (Shenzhen) Co., Ltd.), Zhiyong Zhang (PingAn Tech.), Ning Cheng (Ping An Technology (Shenzhen) Co., Ltd.) and Jing Xiao (Ping An Technology)
Abstract:
End-to-end Spoken Language Understanding (SLU) models are made increasingly large and complex to achieve state-of-the-art accuracy. However, increased model complexity also carries a higher risk of over-fitting, a major challenge in SLU due to the limited amount of available data.
In this paper, we propose an attention-based SLU model together with three encoder-enhancement strategies to overcome the data-sparsity challenge. The first strategy uses transfer learning to improve the feature-extraction capability of the encoder: the encoder is pre-trained on ASR-annotated data, and the SLU model is then fine-tuned on a small amount of target-labelled data. The second strategy adopts multitask learning, integrating ASR as an auxiliary task that shares the underlying encoder, to improve robustness and generalization. The third strategy, using
Component Fusion, incorporates a BERT model to boost the capability of the decoder in the auxiliary network; this further reduces the risk of over-fitting and indirectly augments the underlying encoder. Experiments on the FluentAI
dataset show that the cross-language transfer-learning and multitask strategies yield improvements of up to 4.52% and 3.89%, respectively, over the baseline.
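The multitask idea of the second strategy can be sketched as a joint objective in which the auxiliary ASR loss regularizes the shared encoder. This is a minimal illustrative sketch, not the paper's implementation; the function name and the weighting scheme are assumptions.

```python
# Hypothetical sketch of a multitask objective with a shared encoder:
# the SLU (primary) loss is combined with a weighted ASR (auxiliary) loss.
# Because both task heads sit on top of the same encoder, gradients from
# the ASR term also update the encoder, acting as a regularizer.

def multitask_loss(slu_loss: float, asr_loss: float, asr_weight: float = 0.5) -> float:
    """Joint loss = primary SLU loss + asr_weight * auxiliary ASR loss."""
    return slu_loss + asr_weight * asr_loss

# Example: combine per-batch losses from the two heads.
joint = multitask_loss(slu_loss=1.2, asr_loss=0.8, asr_weight=0.5)
print(round(joint, 6))
```

The weight on the auxiliary term controls how strongly ASR supervision shapes the shared encoder relative to the target SLU task.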