Open Access


Criss-Cross Attention Based Auto Encoder for Video Anomaly Event Detection

Jiaqi Wang1, Jie Zhang2, Genlin Ji2,*, Bo Sheng3

1 School of Mathematical Sciences, Nanjing Normal University, Nanjing, 210023, China
2 School of Computer and Electronic Information, Nanjing Normal University, Nanjing, 210023, China
3 Department of Computer Science, University of Massachusetts Boston, Boston, 02125, USA

* Corresponding Author: Genlin Ji.

Intelligent Automation & Soft Computing 2022, 34(3), 1629-1642.


Surveillance applications generate enormous volumes of video data, and analyzing it manually incurs a huge labor cost. Reconstruction-based convolutional autoencoders have achieved great success in video anomaly detection because they can detect abnormal events automatically. Because anomaly samples are difficult to collect and annotate, these approaches learn normal patterns from normal data only, in an unsupervised way. However, convolutional autoencoders are limited in global feature extraction by the local receptive field of convolutional kernels. Moreover, 2-dimensional convolution cannot capture temporal information, although videos change over time. In this paper, we propose a Criss-Cross attention based AutoEncoder (CCAE) for capturing global visual features of sequential video frames. The method uses a Criss-Cross attention based encoder to extract global appearance features. Another Criss-Cross attention module is embedded into a bi-directional convolutional long short-term memory in the hidden layer to explore global temporal features between frames. A decoder then fuses the global appearance and temporal features and reconstructs the frames. We perform extensive experiments on two public datasets, UCSD Ped2 and CUHK Avenue. The experimental results demonstrate that CCAE achieves superior detection accuracy compared with other video anomaly detection approaches.
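The abstract does not spell out the internals of the Criss-Cross attention module, so the following is only a minimal NumPy sketch of the general idea, under stated assumptions: every spatial position attends to the positions in its own row and column, and the aggregated result is added back to the input. The function name `criss_cross_attention`, the projection matrices `wq`, `wk`, `wv`, and the scalar `gamma` are hypothetical and are not taken from the paper.

```python
import numpy as np

def criss_cross_attention(x, wq, wk, wv, gamma=1.0):
    """Illustrative criss-cross attention over one feature map.

    x  : (C, H, W) feature map
    wq : (Cr, C) query projection   (hypothetical parameter)
    wk : (Cr, C) key projection     (hypothetical parameter)
    wv : (C, C)  value projection   (hypothetical parameter)
    """
    c, h, w = x.shape
    # A 1x1 convolution is a per-pixel linear map over the channel axis.
    q = np.einsum("rc,chw->rhw", wq, x)
    k = np.einsum("rc,chw->rhw", wk, x)
    v = np.einsum("dc,chw->dhw", wv, x)

    # Affinity of position (h, w) with every position in the same column.
    e_col = np.einsum("rhw,riw->hwi", q, k)   # (H, W, H)
    # Affinity of position (h, w) with every position in the same row.
    e_row = np.einsum("rhw,rhj->hwj", q, k)   # (H, W, W)

    # The position itself lies on both its row and its column;
    # mask it in the column branch so it is counted only once.
    idx = np.arange(h)
    e_col[idx, :, idx] = -np.inf

    # Joint softmax over the H + W - 1 positions on the criss-cross path.
    e = np.concatenate([e_col, e_row], axis=-1)
    e = e - e.max(axis=-1, keepdims=True)
    a = np.exp(e)
    a /= a.sum(axis=-1, keepdims=True)
    a_col, a_row = a[..., :h], a[..., h:]

    # Aggregate values along the column and the row, then add the residual.
    out = (np.einsum("hwi,diw->dhw", a_col, v)
           + np.einsum("hwj,dhj->dhw", a_row, v))
    return gamma * out + x
```

A single pass of this kind gathers context only from one row and one column per position, which is cheaper than attending to all H × W positions. How CCAE stacks or combines such modules in the encoder and in the bi-directional ConvLSTM is described in the paper itself, not in this sketch.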


Cite This Article

J. Wang, J. Zhang, G. Ji and B. Sheng, "Criss-cross attention based auto encoder for video anomaly event detection," Intelligent Automation & Soft Computing, vol. 34, no. 3, pp. 1629–1642, 2022.

This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.