Thu-3-8-9 Language Modeling for Speech Analytics in Under-Resourced Languages

Charl van Heerden (Saigen (Pty) Ltd), Simone Wills (Saigen (Pty) Ltd), Pieter Uys (Saigen (Pty) Ltd) and Etienne Barnard (Saigen (Pty) Ltd)
Abstract: Different language modeling approaches are evaluated on two under-resourced, agglutinative South African languages: Sesotho and isiZulu. The two languages present different challenges to language modeling because of their respective orthographies: isiZulu is written conjunctively, whereas Sesotho is written disjunctively. Two subword modeling approaches are evaluated and shown to be useful for reducing the OOV rate for isiZulu. For Sesotho, a multi-word approach is evaluated for improving ASR accuracy, with limited success. RNNs are also evaluated and shown to slightly improve ASR accuracy, despite the relatively small text corpora.