Mon-1-9-8 Using Speaker-Aligned Graph Memory Block in Multimodally Attentive Emotion Recognition Network

Jeng-Lin Li (Department of Electrical Engineering, National Tsing Hua University) and Chi-Chun Lee (Department of Electrical Engineering, National Tsing Hua University)

Abstract: Multimodal emotion sensing modules are increasingly being integrated into human-centered technologies. Despite recent advances in deep architectures that improve recognition performance, the inability to handle individual differences in expressive cues remains a major hurdle for real-world applications. In this work, we propose a Speaker-aligned Graph Memory Network (SaGMN) that leverages speaker embeddings learned from a large speaker verification network to characterize such individual differences across speakers. Specifically, the gated memory block is jointly optimized with a speaker graph encoder, which aligns samples with similar vocal characteristics while effectively enlarging the discrimination across emotion classes. We evaluate our multimodal emotion recognition network on the CMU-MOSEI database and achieve state-of-the-art performance of 65.1% UAR and 74.7% F1 score. Further visualization experiments demonstrate the effect of speaker space alignment with the use of graph memory blocks.
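
The abstract reports results in UAR. As a minimal sketch, assuming the standard definition of UAR (unweighted average recall, the mean of per-class recalls), the metric can be computed as follows. The function name and the toy labels are hypothetical and only illustrate the metric; the exact evaluation protocol used on CMU-MOSEI is not stated in the abstract.

```python
from collections import defaultdict

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally
    regardless of how many samples it has."""
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

# Toy labels, made up for illustration (not taken from CMU-MOSEI).
y_true = ["happy", "happy", "happy", "happy", "sad", "sad"]
y_pred = ["happy", "happy", "happy", "sad", "sad", "happy"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

On imbalanced label sets, UAR and plain accuracy can differ, which is one reason emotion recognition work often reports UAR.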
