John Mendonca (INESC-ID/Instituto Superior Técnico), Francisco Teixeira (INESC-ID/Instituto Superior Técnico, Universidade de Lisboa), Isabel Trancoso (INESC-ID/IST Univ. Lisbon) and Alberto Abad (INESC-ID/IST)
This paper presents our contribution to the INTERSPEECH 2020 Breathing Sub-challenge. Beyond the main goal of the challenge, which is to automatically predict, from conversational speech, the breath signals recorded by respiratory belts, we analyse both the original and the predicted signals in an attempt to overcome the main pitfalls of the proposed systems. In particular, we identify the subsets of the most irregular belt signals, which yield the worst performance as measured by the Pearson correlation coefficient, and show how they affect the results obtained by both the baseline end-to-end system and variants such as a Bidirectional LSTM. The performance of the bidirectional architecture indicates that future information is also relevant when predicting breathing patterns.
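As an illustration of the evaluation metric named above, the following is a minimal sketch of the Pearson correlation coefficient between a belt signal and a prediction. The function name `pearson` and the synthetic signals are assumptions made for this example only; they are not taken from the challenge baseline or from our systems.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0.0:
        raise ValueError("constant sequence: correlation is undefined")
    return num / den

# Hypothetical data: a slow sinusoid standing in for a belt signal, and a
# "prediction" that is a scaled and offset copy of it.
belt = [math.sin(2 * math.pi * 0.25 * t / 25) for t in range(500)]
pred = [0.5 * v + 0.1 for v in belt]
r = pearson(belt, pred)
```

Because the coefficient is invariant to positive scaling and to offset, it rewards predictions that follow the shape of the belt signal rather than its absolute amplitude.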
We also study the information retained by the AM-FM decomposition of the speech signal for this task, showing that the AM component significantly outperforms the FM component in all experiments but fails to surpass the prediction results obtained with the original speech signal.
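The abstract does not state how the AM-FM decomposition is computed. One common construction, shown here only as a sketch under that assumption, takes the analytic signal of the input and reads the AM component as its magnitude and the FM component as its instantaneous frequency. The names `analytic_signal` and `am_fm`, the sampling rate, and the test tone are all hypothetical; in practice a library routine such as `scipy.signal.hilbert` would replace the hand-written transform.

```python
import cmath
import math

def analytic_signal(x):
    """Analytic signal of a real sequence via a naive O(n^2) DFT (short demos only)."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist for even n), double positive bins, zero negative bins.
    gain = [0.0] * n
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
    for k in range(1, (n + 1) // 2):
        gain[k] = 2.0
    return [
        sum(gain[k] * spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)) / n
        for t in range(n)
    ]

def am_fm(x, fs):
    """Return (envelope, instantaneous frequency in Hz) of a real sequence."""
    z = analytic_signal(x)
    am = [abs(v) for v in z]
    # Phase increment between consecutive samples, converted to Hz.
    fm = [cmath.phase(b * a.conjugate()) * fs / (2 * math.pi)
          for a, b in zip(z, z[1:])]
    return am, fm

# Hypothetical test tone: a 20 Hz carrier with a 2 Hz amplitude modulation.
fs = 200
x = [(1 + 0.5 * math.cos(2 * math.pi * 2 * t / fs)) * math.cos(2 * math.pi * 20 * t / fs)
     for t in range(fs)]
am, fm = am_fm(x, fs)
```

For this tone the envelope recovers the 2 Hz modulation and the instantaneous frequency stays at the 20 Hz carrier, which is the separation the AM and FM components are meant to provide.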
Finally, we validate the system’s performance in video-conferencing conditions using data augmentation, and compare clinically relevant parameters, such as breathing rate, computed from both the original belt signals and the ones predicted from the simulated video-conferencing signals.
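To make the breathing-rate comparison concrete, the sketch below estimates breaths per minute by counting upward zero crossings of the mean-removed signal. This is an illustrative estimator, not necessarily the one used in our experiments; the function name `breathing_rate_bpm`, the 25 Hz sampling rate, and the synthetic signals are assumptions. Real belt signals are noisy and would normally need smoothing before such counting.

```python
import math

def breathing_rate_bpm(signal, fs):
    """Estimate breaths per minute from upward zero crossings of the centred signal."""
    mean = sum(signal) / len(signal)
    centred = [v - mean for v in signal]
    # One upward crossing (negative to non-negative) per breathing cycle.
    crossings = sum(1 for a, b in zip(centred, centred[1:]) if a < 0.0 <= b)
    duration_min = len(signal) / fs / 60.0
    return crossings / duration_min

# Hypothetical data: one minute at an assumed 25 Hz, breathing at 0.25 Hz.
fs = 25
belt = [math.sin(2 * math.pi * 0.25 * t / fs + 0.3) for t in range(60 * fs)]
# A "predicted" signal with a different amplitude and offset but the same rhythm.
pred = [0.4 * v - 0.2 for v in belt]

rate_belt = breathing_rate_bpm(belt, fs)
rate_pred = breathing_rate_bpm(pred, fs)
```

Comparing `rate_belt` with `rate_pred` mirrors the comparison described above: a prediction can differ in amplitude from the belt signal and still yield the same breathing rate.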