Open Access
ARTICLE
HgaNets: Fusion of Visual Data and Skeletal Heatmap for Human Gesture Action Recognition
1 School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
2 School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
* Corresponding Author: Xiaolong Xu. Email:
(This article belongs to the Special Issue: Machine Vision Detection and Intelligent Recognition)
Computers, Materials & Continua 2024, 79(1), 1089-1103. https://doi.org/10.32604/cmc.2024.047861
Received 20 November 2023; Accepted 04 March 2024; Issue published 25 April 2024
Abstract
Recognition of human gesture actions is challenging due to the complex patterns in both visual and skeletal features. Existing gesture action recognition (GAR) methods typically analyze either visual or skeletal data alone, failing to meet the demands of various scenarios, while multi-modal approaches lack the versatility to efficiently process both uniform and disparate input patterns. Thus, this paper proposes an attention-enhanced pseudo-3D residual model, called HgaNets, to address the GAR problem. The model comprises two independent components designed for modeling visual RGB (red, green and blue) images and 3D skeletal heatmaps, respectively. Each component consists of two main parts: 1) a multi-dimensional attention module that captures important spatial, temporal and feature information in human gestures; 2) a spatiotemporal convolution module that uses pseudo-3D residual convolution to characterize the spatiotemporal features of gestures. The output weights of the two components are then fused to generate the recognition results. Finally, we conducted experiments on four datasets to assess the efficiency of the proposed model. The accuracy reaches 85.40%, 91.91%, 94.70%, and 95.30% on the four datasets, respectively, with an inference time of 0.54 s and 2.74 M parameters. These findings highlight that the proposed model outperforms other existing approaches in terms of recognition accuracy.
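The abstract states that the output weights of the RGB stream and the skeletal-heatmap stream are fused to produce the recognition result. A common way to realize such late fusion is a weighted sum of the per-class scores from the two streams; the sketch below illustrates this under that assumption (the fusion weight `alpha`, the function names, and the example logits are hypothetical, not taken from the paper).

```python
import math

def softmax(logits):
    """Convert raw per-class logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_streams(rgb_logits, heatmap_logits, alpha=0.5):
    """Late fusion of a two-stream gesture recognizer (illustrative sketch).

    rgb_logits / heatmap_logits: per-class scores from the RGB-image and
    skeletal-heatmap components, respectively. `alpha` is a hypothetical
    fusion weight; the paper's exact fusion scheme may differ.
    Returns (predicted class index, fused class probabilities).
    """
    rgb = softmax(rgb_logits)
    hmp = softmax(heatmap_logits)
    fused = [alpha * r + (1.0 - alpha) * h for r, h in zip(rgb, hmp)]
    pred = max(range(len(fused)), key=fused.__getitem__)
    return pred, fused

# Toy example: both streams favor class 0, so the fused prediction is class 0.
pred, fused = fuse_streams([2.0, 0.5, 0.1], [1.5, 0.2, 0.3])
```

Because each stream is normalized before fusion, the fused scores remain a valid probability distribution, and `alpha` can be tuned to favor whichever modality is more reliable for a given dataset.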
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.