Ioannis Douros (Université de Lorraine, CNRS, Inria, LORIA, Inserm, IADI, F-54000 Nancy, France), Ajinkya Kulkarni (Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France), Crysanthi Dourou (School of ECE, National Technical University of Athens, Athens 15773, Greece), Yu Xie (Department of Neurology, Zhongnan Hospital of Wuhan University, Wuhan 430071), Jacques Felblinger (Université de Lorraine, INSERM 1433, CIC-IT, CHRU de Nancy, F-54000 Nancy, France), Karyna Isaieva (IADI, Université de Lorraine, INSERM U1254), Pierre-André Vuissoz (Université de Lorraine, INSERM U1254, IADI, F-54000 Nancy, France) and Yves Laprie (LORIA/CNRS)
Abstract:
In this work, we present an algorithm for synthesising
pseudo real-time MRI (rtMRI) data of the vocal tract. rtMRI data on the midsagittal
plane were used to synthesise a target consonant-vowel
(CV) sequence using only a silence frame of the target speaker. For this
purpose, several single-speaker models were created. The inputs
of the algorithm are a silence frame from each of the training and target
speakers and the rtMRI data of the target CV. An image transformation
is computed from each CV frame to the next one,
creating a set of transformations that describe the dynamics of
the CV production. Another image transformation is computed
from the silence frame of the training speaker to the silence frame of
the target speaker and is used to adapt the set of transformations
computed previously to the target speaker. The adapted set of
transformations is applied to the silence frame of the target speaker to
synthesise their CV pseudo rtMRI data. Synthesised images
from multiple single-speaker models are frame-aligned and then
averaged to create the final synthesised images. Synthesised
images are compared with the original ones using image
cross-correlation. Results show good agreement between
the synthesised and the original images.
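
The last two steps described above (averaging frame-aligned images from several single-speaker models, then comparing them with the originals by cross-correlation) can be illustrated with a minimal sketch. It assumes the sequences are already frame-aligned and stored as NumPy arrays of shape (frames, height, width), and it assumes the zero-normalised form of cross-correlation, which the abstract does not specify. The function names are hypothetical and are not taken from the authors' implementation.

```python
import numpy as np


def normalised_cross_correlation(a, b):
    """Zero-normalised cross-correlation of two same-sized images, in [-1, 1].

    Assumption: the abstract only says "image cross-correlation"; the
    zero-normalised variant is used here for illustration.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    if denom == 0:
        # At least one image is constant; the correlation is undefined.
        return 0.0
    return float((a * b).sum() / denom)


def average_aligned_sequences(sequences):
    """Average several frame-aligned sequences of shape (frames, H, W).

    Assumption: each element of `sequences` is the output of one
    single-speaker model, already aligned frame by frame.
    """
    stack = np.stack([np.asarray(s, dtype=float) for s in sequences], axis=0)
    return stack.mean(axis=0)


def score_sequence(synthesised, original):
    """Per-frame cross-correlation between synthesised and original frames."""
    return [
        normalised_cross_correlation(s, o)
        for s, o in zip(synthesised, original)
    ]
```

A score close to 1 for a frame indicates that the synthesised and original images have very similar intensity patterns. The score is invariant to a positive linear rescaling of image intensity.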