Thu-2-9-9 Discriminative Method to Extract Coarse Prosodic Structure and Its Application for Statistical Phrase/Accent Command Estimation

Yuma Shirahata (The University of Tokyo), Daisuke Saito (The University of Tokyo) and Nobuaki Minematsu (The University of Tokyo)
Abstract: This paper introduces a method for extracting coarse prosodic structure from F0 contours using a discriminative approach such as DNNs, and applies the method to parameter estimation for the Fujisaki model. Conventional methods for estimating Fujisaki model parameters adopt generative approaches, in which estimation is treated as an inverse problem of the generation process. Recent developments in discriminative approaches, by contrast, make it possible to treat the problem directly. Because precise parameter labels for the Fujisaki model are expensive to obtain, this study introduces a discriminative approach by focusing on the similarities between the acoustic realization of prosodic structure in F0 contours and the sentence structure of the read text. In the proposed method, the sentence structure obtained from the text serves as the labels for the discriminative model, and the model estimates the coarse prosodic structure. Finally, this structure is refined using a conventional parameter estimation framework. Experimental results demonstrate that the proposed method improves estimation accuracy by 18% in terms of detection rate without using any auxiliary features at inference.
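For readers unfamiliar with the generation process that the conventional methods invert, the following is a minimal sketch of the standard Fujisaki model, in which log F0 is the sum of a baseline, phrase components (impulse responses), and accent components (step responses). It is not the authors' estimation method. The function names, the command tuples, and the default constants (alpha = 3.0, beta = 20.0, gamma = 0.9) are assumptions chosen for illustration.

```python
import math


def phrase_response(t, alpha=3.0):
    """Phrase control impulse response: alpha^2 * t * exp(-alpha * t) for t >= 0."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=20.0, gamma=0.9):
    """Accent control step response: min(1 - (1 + beta*t) * exp(-beta*t), gamma) for t >= 0."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_log_f0(times, base_f0, phrase_cmds, accent_cmds):
    """Return ln F0 at each time in `times` (seconds).

    phrase_cmds: list of (onset, magnitude) pairs (hypothetical layout).
    accent_cmds: list of (onset, offset, amplitude) triples (hypothetical layout).
    """
    contour = []
    for t in times:
        value = math.log(base_f0)  # baseline component ln Fb
        for onset, magnitude in phrase_cmds:
            value += magnitude * phrase_response(t - onset)
        for onset, offset, amplitude in accent_cmds:
            value += amplitude * (accent_response(t - onset) - accent_response(t - offset))
        contour.append(value)
    return contour


# Illustrative commands only; these values are not taken from the paper.
times = [i * 0.01 for i in range(100)]
log_f0 = fujisaki_log_f0(times, 120.0, [(0.1, 0.5)], [(0.3, 0.6, 0.4)])
```

Parameter estimation in the generative setting amounts to recovering `phrase_cmds` and `accent_cmds` from an observed contour, which is the inverse of this forward computation.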