Open Access
ARTICLE
HgaNets: Fusion of Visual Data and Skeletal Heatmap for Human Gesture Action Recognition
1 School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
2 School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
* Corresponding Author: Xiaolong Xu. Email:
(This article belongs to the Special Issue: Machine Vision Detection and Intelligent Recognition)
Computers, Materials & Continua 2024, 79(1), 1089-1103. https://doi.org/10.32604/cmc.2024.047861
Received 20 November 2023; Accepted 04 March 2024; Issue published 25 April 2024
Abstract
Recognition of human gesture actions is challenging due to the complex patterns in both visual and skeletal features. Existing gesture action recognition (GAR) methods typically analyze either visual or skeletal data alone, failing to meet the demands of various scenarios, while multi-modal approaches lack the versatility to efficiently process both uniform and disparate input patterns. Thus, this paper proposes an attention-enhanced pseudo-3D residual model, called HgaNets, to address the GAR problem. The model comprises two independent components designed for modeling visual RGB (red, green and blue) images and 3D skeletal heatmaps, respectively. Each component consists of two main parts: 1) a multi-dimensional attention module that captures important spatial, temporal and feature information in human gestures; 2) a spatiotemporal convolution module that uses pseudo-3D residual convolution to characterize the spatiotemporal features of gestures. The output weights of the two components are then fused to generate the recognition results. Finally, we conducted experiments on four datasets to assess the efficiency of the proposed model. The accuracy reaches 85.40%, 91.91%, 94.70%, and 95.30% on the four datasets, respectively, with an inference time of 0.54 s and 2.74 M parameters. These findings highlight that the proposed model outperforms other existing approaches in terms of recognition accuracy.
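The abstract states that the output weights of the RGB stream and the skeletal-heatmap stream are fused to produce the recognition result. A common way to realize such late fusion is a weighted sum of the per-class scores from the two streams; the sketch below illustrates this under that assumption (the fusion weight `alpha`, the function names, and the example logits are hypothetical, not taken from the paper).

```python
import math

def softmax(logits):
    """Convert raw per-class logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_streams(rgb_logits, heatmap_logits, alpha=0.5):
    """Late fusion of a two-stream gesture recognizer (illustrative sketch).

    rgb_logits / heatmap_logits: per-class scores from the RGB-image and
    skeletal-heatmap components, respectively. `alpha` is a hypothetical
    fusion weight; the paper's exact fusion scheme may differ.
    Returns (predicted class index, fused class probabilities).
    """
    rgb = softmax(rgb_logits)
    hmp = softmax(heatmap_logits)
    fused = [alpha * r + (1.0 - alpha) * h for r, h in zip(rgb, hmp)]
    pred = max(range(len(fused)), key=fused.__getitem__)
    return pred, fused

# Toy example: both streams favor class 0, so the fused prediction is class 0.
pred, fused = fuse_streams([2.0, 0.5, 0.1], [1.5, 0.2, 0.3])
```

Because each stream is normalized before fusion, the fused scores remain a valid probability distribution, and `alpha` can be tuned to favor whichever modality is more reliable for a given dataset.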
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.