Charl van Heerden (Saigen (Pty) Ltd), Simone Wills (Saigen (Pty) Ltd), Pieter Uys (Saigen (Pty) Ltd) and Etienne Barnard (Saigen (Pty) Ltd)
Abstract:
Different language modeling approaches are evaluated on two under-resourced, agglutinative South African languages: Sesotho and isiZulu. Because of their respective orthographies, the two languages present different challenges to language modeling: isiZulu is written conjunctively, whereas Sesotho is written disjunctively. Two subword modeling approaches are evaluated and shown to reduce the out-of-vocabulary (OOV) rate for isiZulu. For Sesotho, a multi-word approach is evaluated for improving automatic speech recognition (ASR) accuracy, with limited success. Recurrent neural network (RNN) language models are also evaluated and shown to improve ASR accuracy slightly, despite the relatively small text corpora available.