Thu-1-1-8 Bunched LPCNet: Vocoder for Low-cost Neural Text-To-Speech Systems

Ravichander Vipperla (Samsung AI Centre), Sangjun Park (Samsung Research), Kihyun Choo (Samsung Research), Samin Ishtiaq (Samsung AI Center), Kyoungbo Min (Samsung Research), Sourav Bhattacharya (Samsung AI Center), Abhinav Mehrotra (Samsung AI Center), Alberto Gil Couto Pimentel Ramos (Samsung AI Center) and Nicholas D. Lane (Samsung AI Center)

Abstract: LPCNet is an efficient vocoder that combines linear prediction and deep neural network modules to keep the computational complexity low. In this work, we present two techniques to further reduce its complexity, aiming for a low-cost LPCNet vocoder-based neural Text-to-Speech (TTS) system. These techniques are: 1) sample-bunching, which allows LPCNet to generate more than one audio sample per inference; and 2) bit-bunching, which reduces the computations in the final layer of LPCNet. With the proposed bunching techniques, LPCNet, in conjunction with a Deep Convolutional TTS (DCTTS) acoustic model, runs 2.19x faster than the baseline on a mobile device, with a decrease of less than 0.1 in TTS mean opinion score (MOS).
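
The arithmetic behind the two bunching ideas can be illustrated with a minimal sketch. It is not the authors' implementation: the 16 kHz sample rate, the 8-bit sample width, the 5/3 bit split, and all function names are assumptions chosen for illustration, and the abstract does not specify how bit-bunching is realised. The sketch assumes that splitting a sample into high and low bits lets the final layer predict two small distributions instead of one large one.

```python
def inferences_per_second(sample_rate_hz: int, bunch_size: int) -> float:
    """Network inferences needed per second of audio when each
    inference emits `bunch_size` samples (sample-bunching)."""
    return sample_rate_hz / bunch_size


def split_sample(value: int, low_bits: int) -> tuple[int, int]:
    """Split a quantised sample into (high, low) bit fields.
    Assumed illustration of bit-bunching, not the paper's exact scheme."""
    high = value >> low_bits
    low = value & ((1 << low_bits) - 1)
    return high, low


def output_width(total_bits: int, low_bits: int) -> int:
    """Number of output classes in the final layer.
    low_bits == 0 means no split: one distribution over 2**total_bits values.
    Otherwise two distributions: 2**high_bits + 2**low_bits values."""
    if low_bits == 0:
        return 1 << total_bits
    return (1 << (total_bits - low_bits)) + (1 << low_bits)


if __name__ == "__main__":
    # Assumed 16 kHz audio: bunching 2 samples halves the inference count.
    print(inferences_per_second(16000, 1), inferences_per_second(16000, 2))
    # Assumed 8-bit samples split into 5 high bits and 3 low bits.
    print(output_width(8, 0), output_width(8, 3))
    print(split_sample(0b10110101, 3))
```

Under these assumed numbers, the final layer would output 40 values instead of 256, and the network would run 8000 times per second of audio instead of 16000. The actual savings reported in the abstract (2.19x) come from the authors' measurements on a mobile device, not from this arithmetic.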
