Mon-1-10-6 CUCHILD: A Large-Scale Cantonese Corpus of Child Speech for Phonology and Articulation Assessment

Si-Ioi Ng (The Chinese University of Hong Kong), Cymie Wing-Yee Ng (The Chinese University of Hong Kong), Jiarui Wang (The Chinese University of Hong Kong), Tan Lee (The Chinese University of Hong Kong), Kathy Yuet-Sheung Lee (The Chinese University of Hong Kong) and Michael Chi-Fai Tong (The Chinese University of Hong Kong)
Abstract: This paper describes the design and development of CUCHILD, a large-scale Cantonese corpus of child speech. The corpus contains spoken words collected from 1,986 child speakers aged 3 to 6 years. The speech materials include 130 words of 1 to 4 syllables in length. The speakers include both typically developing (TD) children and children with speech disorders. The corpus is intended to support scientific and clinical research, as well as technology development related to child speech assessment. The design of the corpus, including word selection, participant recruitment, the data acquisition process, and data pre-processing, is described in detail. The results of acoustical analysis are presented to illustrate the properties of child speech. Potential applications of the corpus in automatic speech recognition, phonological error detection and speaker diarization are also discussed.