Open Access
ARTICLE
3-Dimensional Bag of Visual Words Framework on Action Recognition
Shiqi Wang1, Yimin Yang1, *, Ruizhong Wei1, Qingming Jonathan Wu2
1 Department of Computer Science, Lakehead University, Thunder Bay, Canada.
2 Department of Electrical and Computer Engineering, University of Windsor, Windsor, Canada.
* Corresponding Author: Yimin Yang. Email: .
Computers, Materials & Continua 2020, 63(3), 1081-1091. https://doi.org/10.32604/cmc.2020.09648
Received 13 January 2020; Accepted 26 March 2020; Issue published 30 April 2020
Abstract
Human motion recognition plays a crucial role in the video analysis
framework. However, a given video may contain various kinds of noise, such as an unstable
background and redundant actions that differ completely from the key actions. Such noise
poses a great challenge to human motion recognition. To solve this problem,
we propose a new method based on the 3-Dimensional (3D) Bag of Visual Words
(BoVW) framework. Our method consists of two parts. The first is a video action
feature extractor, which identifies key actions by analyzing action features. The second
is a video action encoder, which applies a pre-trained deep 3D CNN model to the action
characteristics of a given video to obtain expressive coding information. A classifier
with subnetwork nodes then performs the final classification. Extensive experiments
demonstrate that our method performs well on complex video analysis tasks.
Our approach achieves state-of-the-art performance on the datasets of UCF101 (85.3%)
and HMDB51 (54.5%).
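The abstract describes a pipeline of a clip-level feature extractor, a visual-word encoder, and a final classifier. The paper's exact components are not given here, so the following is only a minimal sketch of a generic BoVW encoding step under stated assumptions: random arrays stand in for 3D-CNN clip descriptors, a naive k-means learns the visual-word codebook, and each video is encoded as a normalized histogram of word assignments. All names (`build_codebook`, `encode`) and the parameter choices are illustrative, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_codebook(descriptors, k=8, iters=10):
    """Naive k-means over all clip descriptors to learn k visual words."""
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest center
        labels = np.argmin(((descriptors[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers

def encode(video_descriptors, centers):
    """Encode one video as a normalized histogram over the visual words."""
    labels = np.argmin(
        ((video_descriptors[:, None] - centers[None]) ** 2).sum(-1), axis=1
    )
    hist = np.bincount(labels, minlength=len(centers)).astype(float)
    return hist / hist.sum()

# Stand-in for 3D-CNN features: each video yields (n_clips, dim) descriptors.
videos = [rng.normal(size=(20, 16)) for _ in range(5)]
codebook = build_codebook(np.vstack(videos), k=8)
encodings = np.stack([encode(v, codebook) for v in videos])
print(encodings.shape)  # one 8-bin histogram per video
```

The resulting fixed-length histograms could then be fed to any classifier (the paper uses one with subnetwork nodes); the key property of the BoVW step is that videos of different lengths map to vectors of the same dimension.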