Kyubyong Park (Kakao Brain) and Seanie Lee (KAIST)
Abstract:
Conversion of Chinese graphemes to phonemes (G2P) is
an essential component in Mandarin Chinese Text-To-Speech
(TTS) systems. One of the biggest challenges in Chinese
G2P conversion is how to disambiguate the pronunciation of
polyphones, i.e., characters that have multiple pronunciations. Although
many academic efforts have been made to address it, to date
there has been no open dataset that can serve as a standard
benchmark for fair comparison. In addition, most
of the reported systems are hard for researchers and
practitioners to use when they want to convert Chinese text into pinyin at
their convenience. Motivated by these gaps, in this work we introduce
a new benchmark dataset that consists of 99,000+ sentences
for Chinese polyphone disambiguation. We train a simple
Bi-LSTM model on it and find that it outperforms existing
G2P systems and performs slightly below a pre-trained
Chinese BERT model. Finally, we package our project and share it on
PyPI.
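To make the polyphone disambiguation task concrete, the sketch below shows its input and output. Given a sentence and the index of a polyphonic character, the goal is to predict one reading from that character's candidate pronunciations. This is a toy rule-based illustration, not the authors' Bi-LSTM, which learns to use context from the 99,000+ labeled sentences. The candidate table, the context rules, the tone-number notation, and the function name `disambiguate` are all hypothetical and were written for illustration only. None of them are drawn from the dataset or the released package.

```python
from typing import Dict, List, Tuple

# Hypothetical candidate readings for two common polyphones (tone numbers appended).
CANDIDATES: Dict[str, List[str]] = {
    "行": ["xing2", "hang2"],
    "长": ["chang2", "zhang3"],
}

# Hypothetical context rules: (polyphone, neighbouring char, offset) -> reading.
# Offset -1 means the neighbour precedes the polyphone; +1 means it follows.
CONTEXT_RULES: Dict[Tuple[str, str, int], str] = {
    ("行", "银", -1): "hang2",   # 银行 yinhang, "bank"
    ("行", "走", 1): "xing2",    # 行走 xingzou, "to walk"
    ("长", "大", 1): "zhang3",   # 长大 zhangda, "to grow up"
    ("长", "很", -1): "chang2",  # 很长 hen chang, "very long"
}


def disambiguate(sentence: str, index: int) -> str:
    """Predict the reading of the polyphone at sentence[index] (toy sketch)."""
    char = sentence[index]
    if char not in CANDIDATES:
        raise ValueError(f"{char!r} is not a known polyphone")
    # Check the immediate left and right neighbours against the rule table.
    for offset in (-1, 1):
        j = index + offset
        if 0 <= j < len(sentence):
            reading = CONTEXT_RULES.get((char, sentence[j], offset))
            if reading is not None:
                return reading
    # No rule fired: fall back to the first candidate (assumed most frequent).
    return CANDIDATES[char][0]


if __name__ == "__main__":
    print(disambiguate("我去银行", 3))  # 行 preceded by 银
    print(disambiguate("他长大了", 1))  # 长 followed by 大
```

A fixed table like this cannot cover the open-ended contexts found in real text. That limitation is why learned models, such as the Bi-LSTM and BERT systems compared in this work, are used instead.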