Multi-Modality Video Representation for Action Recognition

Chao Zhu; Yike Wang; Dongbing Pu; Miao Qi; Hui Sun; Lei Tan

doi:10.32604/jbd.2020.010431

Open Access

ARTICLE

Chao Zhu¹, Yike Wang¹, Dongbing Pu¹, Miao Qi¹,*, Hui Sun²,*, Lei Tan³,*

1 College of Information Science and Technology, Northeast Normal University, Changchun, 130117, China
2 Institute for Intelligent Elderlycare, College of Humanities and Sciences of Northeast Normal University, Changchun, 130117, China
3 College of Humanities and Information, Changchun University of Technology, Changchun, 130012, China

* Corresponding Author: Miao Qi.

Journal on Big Data 2020, 2(3), 95-104. https://doi.org/10.32604/jbd.2020.010431

Received 20 May 2020; Accepted 10 September 2020; Issue published 13 October 2020

Abstract

Nowadays, action recognition is widely applied in many fields. However, an action is hard to characterize with information from a single modality. Unlike image recognition, action recognition needs information from several modalities to depict an action, such as appearance, motion, and dynamic information. Because the state of an action evolves over time, motion information must be considered when representing it. Most current methods define an action by spatial and motion information. These methods rely on two key elements: spatial information, obtained by sparsely sampling the sequence of video frames, and motion content, mostly represented by optical flow computed on consecutive frames. However, the relationship between these two elements in current methods is weak. To strengthen this association, this paper presents a new architecture consisting of three streams that capture multi-modality information. The contributions of our network are: (a) a new sampling approach that samples evenly over the video sequence to acquire appearance information; (b) the use of ResNet101 to extract high-level, discriminative features; and (c) a three-stream architecture that captures temporal, spatial, and dynamic information. Experimental results on the UCF101 dataset show that our method outperforms previous methods.
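The abstract names two mechanisms without detailing them: evenly sampling frames across the whole video and combining the outputs of three streams. The sketch below illustrates one common way to realize each, under stated assumptions. It is not the authors' implementation: taking the centre frame of equal-length segments and fusing streams by a weighted sum of class scores are assumed choices, and the function names and stream weights are hypothetical. The CNN backbones (e.g. ResNet101) and optical-flow computation are omitted.

```python
from typing import List, Sequence


def even_sample_indices(num_frames: int, num_samples: int) -> List[int]:
    """Pick num_samples frame indices spread evenly over the whole video.

    Assumption: the video is split into num_samples equal-length segments
    and the centre frame of each segment is taken (deterministic).
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    segment = num_frames / num_samples
    # Centre of segment k is (k + 0.5) * segment; int() floors it to an index,
    # and min() guards the last index against rounding past the end.
    return [min(int((k + 0.5) * segment), num_frames - 1) for k in range(num_samples)]


def fuse_scores(stream_scores: Sequence[Sequence[float]],
                weights: Sequence[float]) -> List[float]:
    """Late fusion (assumed): weighted sum of per-class scores over streams."""
    if len(stream_scores) != len(weights):
        raise ValueError("need one weight per stream")
    num_classes = len(stream_scores[0])
    fused = [0.0] * num_classes
    for scores, w in zip(stream_scores, weights):
        if len(scores) != num_classes:
            raise ValueError("all streams must score the same classes")
        for c, s in enumerate(scores):
            fused[c] += w * s
    return fused


def predict(stream_scores: Sequence[Sequence[float]],
            weights: Sequence[float]) -> int:
    """Return the index of the class with the highest fused score."""
    fused = fuse_scores(stream_scores, weights)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    print(even_sample_indices(100, 4))  # [12, 37, 62, 87]
    # Hypothetical class scores from the spatial, temporal and dynamic streams.
    spatial = [0.7, 0.2, 0.1]
    temporal = [0.3, 0.6, 0.1]
    dynamic = [0.2, 0.5, 0.3]
    print(predict([spatial, temporal, dynamic], [1.0, 1.5, 1.0]))  # 1
```

Unlike sparse random sampling, the indices here always cover the full video, so the appearance stream sees every stage of the action. In the usage example, the more heavily weighted temporal stream tips the decision toward class 1 even though the spatial stream alone prefers class 0.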

Keywords

Action recognition; dynamic; appearance; spatial; motion; ResNet101; UCF101

Cite This Article

APA Style

Zhu, C., Wang, Y., Pu, D., Qi, M., Sun, H., & Tan, L. (2020). Multi-Modality Video Representation for Action Recognition. Journal on Big Data, 2(3), 95–104. https://doi.org/10.32604/jbd.2020.010431

Vancouver Style

Zhu C, Wang Y, Pu D, Qi M, Sun H, Tan L. Multi-Modality Video Representation for Action Recognition. J Big Data. 2020;2(3):95–104. https://doi.org/10.32604/jbd.2020.010431

IEEE Style

C. Zhu, Y. Wang, D. Pu, M. Qi, H. Sun, and L. Tan, “Multi-Modality Video Representation for Action Recognition,” J. Big Data, vol. 2, no. 3, pp. 95–104, 2020. https://doi.org/10.32604/jbd.2020.010431

Copyright © 2020 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.