Open Access
ARTICLE
Multi-Modality Video Representation for Action Recognition
Chao Zhu1, Yike Wang1, Dongbing Pu1, Miao Qi1,*, Hui Sun2,*, Lei Tan3,*
1 College of Information Science and Technology, Northeast Normal University, Changchun, 130117, China
2 Institute for Intelligent Elderlycare, College of Humanities and Sciences of Northeast Normal University, Changchun, 130117, China
3 College of Humanities and Information, Changchun University of Technology, Changchun, 130012, China
* Corresponding Author: Miao Qi. Email:
Journal on Big Data 2020, 2(3), 95-104. https://doi.org/10.32604/jbd.2020.010431
Received 20 May 2020; Accepted 10 September 2020; Issue published 13 October 2020
Abstract
Nowadays, action recognition is widely applied in many fields.
However, an action is hard to define with single-modality information. The difference
between image recognition and action recognition is that action recognition
needs more modalities to depict one action, such as appearance,
motion and dynamic information. Because the state of an action evolves over
time, motion information must be considered when representing an
action. Most current methods define an action by spatial information and
motion information. There are two key elements in current action recognition
methods: spatial information, obtained by sampling sparsely from the video frame
sequence, and motion content, mostly represented by optical flow
computed on consecutive video frames. However, the association between them in
current methods is weak. Therefore, to strengthen this association, this paper
presents a new architecture consisting of three streams to obtain multi-modality
information. The advantages of our network are: (a) we propose a new sampling
approach that samples evenly over the video sequence to acquire appearance
information; (b) we employ ResNet101 to extract high-level, discriminative
features; (c) we propose a three-stream architecture to capture temporal, spatial
and dynamic information. Experimental results on the UCF101 dataset demonstrate that
our method outperforms previous methods.
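The even-sampling idea in (a) can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the function name and the choice of the segment midpoint are assumptions; the abstract only states that frames are sampled evenly over the video sequence.

```python
def sample_evenly(num_frames: int, num_segments: int) -> list[int]:
    """Pick one frame index from each of num_segments equal segments
    of a video with num_frames frames (here: the segment midpoint).

    When the clip is shorter than the number of segments, indices
    are repeated cyclically so the output length is fixed.
    """
    if num_segments <= 0:
        raise ValueError("num_segments must be positive")
    if num_frames < num_segments:
        return [i % num_frames for i in range(num_segments)]
    seg_len = num_frames / num_segments
    return [int(i * seg_len + seg_len / 2) for i in range(num_segments)]
```

For example, a 100-frame clip split into 5 segments yields the midpoint indices 10, 30, 50, 70 and 90, so the sampled frames cover the whole clip instead of clustering at one end as sparse random sampling can.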
Keywords
Cite This Article
C. Zhu, Y. Wang, D. Pu, M. Qi, H. Sun et al., "Multi-modality video representation for action recognition," Journal on Big Data, vol. 2, no. 3, pp. 95–104, 2020. https://doi.org/10.32604/jbd.2020.010431