Open Access

ARTICLE

A Novel Action Transformer Network for Hybrid Multimodal Sign Language Recognition

Sameena Javaid*, Safdar Rizvi
Department of Computer Sciences, School of Engineering and Applied Sciences, Bahria University, Karachi Campus, Karachi, Pakistan
* Corresponding Author: Sameena Javaid. Email:

Computers, Materials & Continua 2023, 74(1), 523-537. https://doi.org/10.32604/cmc.2023.031924

Received 30 April 2022; Accepted 22 June 2022; Issue published 22 September 2022

Abstract

Sign language fills the communication gap for people with hearing and speaking ailments. It includes both visual modalities, manual gestures consisting of movements of hands, and non-manual gestures incorporating body movements including head, facial expressions, eyes, shoulder shrugging, etc. Previously both gestures have been detected; identifying separately may have better accuracy, but much communicational information is lost. A proper sign language mechanism is needed to detect manual and non-manual gestures to convey the appropriate detailed message to others. Our novel proposed system contributes as Sign Language Action Transformer Network (SLATN), localizing hand, body, and facial gestures in video sequences. Here we are expending a Transformer-style structural design as a “base network” to extract features from a spatiotemporal domain. The model impulsively learns to track individual persons and their action context in multiple frames. Furthermore, a “head network” emphasizes hand movement and facial expression simultaneously, which is often crucial to understanding sign language, using its attention mechanism for creating tight bounding boxes around classified gestures. The model’s work is later compared with the traditional identification methods of activity recognition. It not only works faster but achieves better accuracy as well. The model achieves overall 82.66% testing accuracy with a very considerable performance of computation with 94.13 Giga-Floating Point Operations per Second (G-FLOPS). Another contribution is a newly created dataset of Pakistan Sign Language for Manual and Non-Manual (PkSLMNM) gestures.

Keywords

Sign language; gesture recognition; manual signs; non-manual signs; action transformer network

Cite This Article

S. Javaid and S. Rizvi, "A novel action transformer network for hybrid multimodal sign language recognition," Computers, Materials & Continua, vol. 74, no.1, pp. 523–537, 2023.



This work is licensed under a Creative Commons Attribution 4.0 International License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
  • 180

    View

  • 145

    Download

  • 0

    Like

Share Link

WeChat scan