Wed-2-6-9 Multimodal Sign Language Recognition via Temporal Deformable Convolutional Sequence Learning

Katerina Papadimitriou (Electrical and Computer Engineering Department, University of Thessaly) and Gerasimos Potamianos (Electrical and Computer Engineering Department, University of Thessaly)

Abstract: In this paper we address the challenging problem of sign language recognition (SLR) from videos, introducing an end-to-end deep learning approach that relies on the fusion of a number of spatio-temporal feature streams, as well as a fully convolutional encoder-decoder for prediction. Specifically, we examine the contribution of optical flow, human skeletal features, as well as appearance features of handshapes and mouthing, in conjunction with a temporal deformable convolutional attention-based encoder-decoder for SLR. To our knowledge, this is the first use in this task of a fully convolutional multi-step attention-based encoder-decoder employing temporal deformable convolutional block structures. We conduct experiments on three sign language datasets and compare our approach to existing state-of-the-art SLR methods, demonstrating its superiority.

