Open Access

ARTICLE

Event-Driven Attention Network: A Cross-Modal Framework for Efficient Image-Text Retrieval in Mass Gathering Events

Kamil Yasen1,#, Heyan Jin2,#, Sijie Yang2, Li Zhan2, Xuyang Zhang2, Ke Qin1,3, Ye Li2,3,*

1 School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China
2 School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China
3 Kashi Institute of Electronics and Information Industry, Kashi, 844508, China

* Corresponding Author: Ye Li
# Both Kamil Yasen and Heyan Jin contributed equally to this work

Computers, Materials & Continua 2025, 83(2), 3277-3301. https://doi.org/10.32604/cmc.2025.061037

Abstract

Research on mass gathering events is critical for ensuring public security and maintaining social order. However, most existing work focuses on crowd behavior analysis tasks such as anomaly detection and crowd counting, and research on mass gathering behaviors remains relatively scarce. We believe real-time detection and monitoring of mass gathering behaviors are essential for mitigating potential security risks and emergencies. It is therefore imperative to develop a method that accurately identifies and localizes mass gatherings before disasters occur, enabling prompt and effective responses. To address this problem, we propose an innovative Event-Driven Attention Network (EDAN), which, for the first time, achieves image-text matching in the scenario of mass gathering events with good results. Traditional image-text retrieval methods based on global alignment struggle to capture the local details within complex scenes, limiting retrieval accuracy. Local alignment-based methods are more effective at extracting detailed features, but they frequently process raw textual features directly, which often contain ambiguities and redundant information that diminish retrieval efficiency and degrade model performance. To overcome these challenges, EDAN introduces an Event-Driven Attention Module that adaptively focuses attention on image regions or textual words relevant to the event type. By calculating the semantic distance between event labels and textual content, this module significantly reduces computational complexity and enhances retrieval efficiency. To validate the effectiveness of EDAN, we construct a dedicated multimodal dataset tailored for the analysis of mass gathering events, providing a reliable foundation for subsequent studies. Comparative experiments with other methods on this dataset demonstrate the effectiveness of EDAN: in the image-to-text retrieval task, EDAN achieves the best performance on the R@5 metric, while in the text-to-image retrieval task it performs best on both the R@5 and R@10 metrics. EDAN also achieves the best overall performance on the Rsum metric. Finally, ablation studies further verify the effectiveness of the Event-Driven Attention Module.
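The abstract describes attention guided by the semantic distance between an event label and textual content. A minimal sketch of that idea is shown below; the function name, the use of cosine similarity, and the softmax weighting are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def event_driven_attention(event_emb, word_embs):
    """Weight word features by their similarity to an event-label embedding.

    event_emb: (d,) embedding of the event label (e.g., "rally", "concert")
    word_embs: (n, d) embeddings of the words in a caption
    Returns attention weights (n,) and the attended text feature (d,).
    """
    # Cosine similarity between the event label and each word embedding
    sims = word_embs @ event_emb / (
        np.linalg.norm(word_embs, axis=1) * np.linalg.norm(event_emb) + 1e-8
    )
    # Softmax turns similarities into a distribution of attention weights,
    # emphasizing event-relevant words and suppressing redundant ones
    w = np.exp(sims - sims.max())
    w /= w.sum()
    # Attended feature: similarity-weighted average of the word embeddings
    return w, w @ word_embs

# Toy example with random embeddings
rng = np.random.default_rng(0)
weights, feat = event_driven_attention(rng.normal(size=4), rng.normal(size=(5, 4)))
print(round(weights.sum(), 6))  # → 1.0 (weights form a distribution)
```

In the paper's setting, the same weighting would be applied symmetrically to image regions, so that only event-relevant regions and words enter the cross-modal alignment.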

Keywords

Mass gathering events; image-text retrieval; attention mechanism

Cite This Article

APA Style
Yasen, K., Jin, H., Yang, S., Zhan, L., Zhang, X. et al. (2025). Event-Driven Attention Network: A Cross-Modal Framework for Efficient Image-Text Retrieval in Mass Gathering Events. Computers, Materials & Continua, 83(2), 3277–3301. https://doi.org/10.32604/cmc.2025.061037
Vancouver Style
Yasen K, Jin H, Yang S, Zhan L, Zhang X, Qin K, et al. Event-Driven Attention Network: A Cross-Modal Framework for Efficient Image-Text Retrieval in Mass Gathering Events. Comput Mater Contin. 2025;83(2):3277–3301. https://doi.org/10.32604/cmc.2025.061037
IEEE Style
K. Yasen et al., “Event-Driven Attention Network: A Cross-Modal Framework for Efficient Image-Text Retrieval in Mass Gathering Events,” Comput. Mater. Contin., vol. 83, no. 2, pp. 3277–3301, 2025. https://doi.org/10.32604/cmc.2025.061037



cc Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.