A Recurrent Neural Network for Multimodal Anomaly Detection by Using Spatio-Temporal Audio-Visual Data

Sameema Tariq; Ata-Ur-Rehman; Maria Abubakar; Waseem Iqbal; Hatoon Alsagri; Yousef Alduraywish; Haya Abdullah

doi:10.32604/cmc.2024.055787

Open Access

ARTICLE


Sameema Tariq^1, Ata-Ur-Rehman^2,3, Maria Abubakar^2, Waseem Iqbal^4, Hatoon S. Alsagri^5, Yousef A. Alduraywish^5, Haya Abdullah A. Alhakbani^5,*

1 Department of Electrical Engineering, University of Engineering and Technology, Lahore, 54890, Pakistan
2 Department of Electrical Engineering, National University of Sciences and Technology, Islamabad, 24090, Pakistan
3 Department of Business and Computing, Ravensbourne University London, London, SE10 0EW, England
4 Electrical and Computer Engineering Department, College of Engineering, Sultan Qaboos University, Muscat, 123, Oman
5 College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, 11673, Saudi Arabia

* Corresponding Author: Haya Abdullah A. Alhakbani.

Computers, Materials & Continua 2024, 81(2), 2493-2515. https://doi.org/10.32604/cmc.2024.055787

Received 07 July 2024; Accepted 19 September 2024; Issue published 18 November 2024

Abstract

In video surveillance, anomaly detection requires training machine learning models on spatio-temporal video sequences. However, video-only data is sometimes insufficient to detect all abnormal activities accurately. We therefore propose a novel audio-visual spatio-temporal autoencoder designed specifically for anomaly detection in video surveillance, which uses audio data alongside video data. This paper presents a multi-modal recurrent neural network for anomaly detection that combines separate spatial and temporal autoencoders to exploit both the spatial and temporal features of audio-visual data. Training is conducted on normal data only, while testing is performed on both normal and anomalous data; as a result, the model reconstructs normal sequences with low error and anomalous sequences with high error, and this reconstruction error is used to assign each sequence an anomaly score. The anomaly scores from the audio and video models are combined using a late fusion technique, and a deep model of dense layers is trained to produce a decisive score indicating whether a sequence is normal or anomalous. The model’s performance is evaluated on the University of California, San Diego Pedestrian 2 (UCSD PED 2), University of Minnesota (UMN), and Tampere University of Technology (TUT) Rare Sound Events datasets using six evaluation metrics. Compared with state-of-the-art methods, it achieves a high Area Under the Curve (AUC) and a low Equal Error Rate (EER): an AUC of 93.1 and an EER of 8.1 on the UCSD PED 2 dataset, and an AUC of 94.9 and an EER of 5.9 on the UMN dataset. The evaluations show that the joint results of the combined audio-visual model outperform those of the separate models, highlighting the advantage of the proposed multi-modal approach.
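The scoring pipeline summarized in the abstract (reconstruction error as an anomaly score, late fusion of the audio and video scores, evaluation by AUC and EER) can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation. The squared L2 error, the min-max normalization, and the fixed-weight average used in place of the paper's trained dense-layer fusion model are all assumptions, and every function name is hypothetical. The autoencoders themselves are out of scope: the sketch starts from their outputs.

```python
"""Sketch: reconstruction-error scoring, late fusion, and AUC/EER evaluation.

Assumptions (not taken from the paper): errors are squared L2 distances,
scores are min-max normalized, and fusion is a fixed weighted average
rather than the trained dense-layer model the paper describes.
"""
from typing import List, Sequence


def reconstruction_errors(frames: Sequence[Sequence[float]],
                          recons: Sequence[Sequence[float]]) -> List[float]:
    """Squared L2 distance between each input and its reconstruction."""
    return [sum((x - y) ** 2 for x, y in zip(f, r)) for f, r in zip(frames, recons)]


def anomaly_scores(errors: Sequence[float]) -> List[float]:
    """Min-max normalize errors to [0, 1]; higher means more anomalous.

    A regularity score is the complement, 1 - anomaly score.
    """
    lo, hi = min(errors), max(errors)
    if hi == lo:
        return [0.0 for _ in errors]
    return [(e - lo) / (hi - lo) for e in errors]


def late_fusion(video: Sequence[float], audio: Sequence[float],
                w_video: float = 0.5) -> List[float]:
    """Hypothetical stand-in for learned fusion: weighted average of scores."""
    return [w_video * v + (1.0 - w_video) * a for v, a in zip(video, audio)]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that an anomaly (label 1) outscores a
    normal sample (label 0); ties count as half. Needs both classes."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def eer(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate equal error rate: sweep thresholds over the observed
    scores and return the mean of FPR and FNR where they are closest."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_gap, best_rate = float("inf"), 1.0
    for t in sorted(set(scores)):
        # A sample is flagged as anomalous when its score is >= t.
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        fpr, fnr = fp / n_neg, fn / n_pos
        if abs(fpr - fnr) < best_gap:
            best_gap, best_rate = abs(fpr - fnr), (fpr + fnr) / 2
    return best_rate


if __name__ == "__main__":
    # Toy per-sequence errors: 4 normal sequences (0) and 2 anomalous ones (1).
    labels = [0, 0, 0, 0, 1, 1]
    video = anomaly_scores([1.0, 2.0, 1.5, 2.5, 9.0, 2.2])
    audio = anomaly_scores([3.0, 2.0, 2.5, 3.5, 2.8, 10.0])
    fused = late_fusion(video, audio)
    print("video AUC", auc(video, labels))
    print("audio AUC", auc(audio, labels))
    print("fused AUC", auc(fused, labels), "EER", eer(fused, labels))
```

On this invented toy data, each modality alone ranks one anomaly below some normal sequences (video AUC 0.875, audio AUC 0.75), while the fused score separates the two classes completely. This mirrors, in miniature, the motivation for combining audio and video.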

Keywords

Acoustic-visual anomaly detection; sequence-to-sequence autoencoder; reconstruction error; late fusion; regularity score

Cite This Article

APA Style

Tariq, S., Rehman, A., Abubakar, M., Iqbal, W., Alsagri, H.S. et al. (2024). A Recurrent Neural Network for Multimodal Anomaly Detection by Using Spatio-Temporal Audio-Visual Data. Computers, Materials & Continua, 81(2), 2493–2515. https://doi.org/10.32604/cmc.2024.055787

Vancouver Style

Tariq S, Rehman A, Abubakar M, Iqbal W, Alsagri HS, Alduraywish YA, et al. A Recurrent Neural Network for Multimodal Anomaly Detection by Using Spatio-Temporal Audio-Visual Data. Comput Mater Contin. 2024;81(2):2493–2515. https://doi.org/10.32604/cmc.2024.055787

IEEE Style

S. Tariq et al., “A Recurrent Neural Network for Multimodal Anomaly Detection by Using Spatio-Temporal Audio-Visual Data,” Comput. Mater. Contin., vol. 81, no. 2, pp. 2493–2515, 2024. https://doi.org/10.32604/cmc.2024.055787


Copyright © 2024 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.