Wed-SS-2-7-1 Combining Audio and Brain Activity for Predicting Speech Quality

Ivan Halim Parmonangan (Nara Institute of Science and Technology), Hiroki Tanaka (Nara Institute of Science and Technology), Sakriani Sakti (Nara Institute of Science and Technology (NAIST) / RIKEN AIP) and Satoshi Nakamura (Nara Institute of Science and Technology)
Abstract: Because the perceived audio quality of synthesized speech may determine a system's market success, quality evaluations are critical. Audio quality is usually evaluated either subjectively or objectively. Because subjective approaches are costly and time-consuming, they have generally been replaced by faster, more cost-efficient objective approaches. The primary downside of objective approaches is that they lack the human influence factors that are crucial for deriving the subjective perception of quality. These factors cannot be observed directly, but they are manifested in individual brain activity. Thus, we combined predictions from single-subject electroencephalograph (EEG) information and audio features to improve predictions of the overall quality of synthesized speech. Our results show that by combining the outputs of the audio and EEG models, a very simple neural network can surpass the performance of the single-modal approach.
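To illustrate the idea of combining per-modality predictions, here is a minimal sketch of late fusion. It is not the authors' implementation: the abstract does not specify the network architecture, so a single linear neuron trained by gradient descent stands in for the "very simple neural network". The function names (fit_fusion, predict_fusion, mse) are hypothetical, and the data are synthetic, with each "model" simulated as the true quality score plus independent noise.

```python
import numpy as np


def fit_fusion(audio_pred, eeg_pred, target, lr=0.1, epochs=2000):
    """Fit one linear neuron mapping two per-model predictions to one score."""
    X = np.column_stack([audio_pred, eeg_pred]).astype(float)
    y = np.asarray(target, dtype=float)
    # Standardize the inputs so plain gradient descent converges reliably.
    mean, std = X.mean(axis=0), X.std(axis=0)
    Z = (X - mean) / std
    w, b = np.zeros(2), 0.0
    for _ in range(epochs):
        err = Z @ w + b - y                    # residual of the current output
        w -= lr * 2.0 * (Z.T @ err) / len(y)   # gradient of mean squared error
        b -= lr * 2.0 * err.mean()
    return {"w": w, "b": b, "mean": mean, "std": std}


def predict_fusion(model, audio_pred, eeg_pred):
    """Apply the fitted neuron to new audio and EEG predictions."""
    X = np.column_stack([audio_pred, eeg_pred]).astype(float)
    Z = (X - model["mean"]) / model["std"]
    return Z @ model["w"] + model["b"]


def mse(pred, target):
    """Mean squared error between predictions and targets."""
    return float(np.mean((np.asarray(pred) - np.asarray(target)) ** 2))


# Synthetic stand-in data (NOT from the paper): quality scores on a 1-5 scale,
# with each single-modality prediction corrupted by independent noise.
rng = np.random.default_rng(0)
true_quality = rng.uniform(1.0, 5.0, size=400)
audio_pred = true_quality + rng.normal(0.0, 0.5, size=400)
eeg_pred = true_quality + rng.normal(0.0, 0.5, size=400)

# Fit on the first 300 stimuli, evaluate on the remaining 100.
train, test = slice(0, 300), slice(300, 400)
model = fit_fusion(audio_pred[train], eeg_pred[train], true_quality[train])
fused = predict_fusion(model, audio_pred[test], eeg_pred[test])

audio_mse = mse(audio_pred[test], true_quality[test])
eeg_mse = mse(eeg_pred[test], true_quality[test])
fused_mse = mse(fused, true_quality[test])
```

In this toy setting the fused error falls below either single-modality error because the two noise sources are independent, so combining them cancels part of the noise. Whether real audio and EEG predictors have errors this complementary is an empirical question that the sketch does not address.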