Tatsuya Kawahara (Kyoto University, Japan), Kristiina Jokinen (AI Research Center AIST Tokyo Waterfront, Japan)
While smartphone assistants and smart speakers are prevalent and expectations for social communicative robots are high, spoken language interaction with such robots has not yet been effectively deployed. This tutorial gives an overview of the issues and challenges in integrating natural multimodal dialogue processing into social robots. We first outline dialogue tasks and interfaces suitable for robots, in comparison with conventional dialogue systems and virtual agents. We then review challenges and approaches in the component technologies, including ASR, TTS, SLU, and dialogue management, with a focus on human-robot interaction. Issues related to multimodal processing are also addressed: in particular, we review non-verbal processing, including gaze and gesture, for facilitating turn-taking, timing backchannels, and signaling trouble in interaction. Finally, we briefly discuss open questions concerning architectures that integrate spoken dialogue systems and human-robot interaction.