TECMH: Transformer-Based Cross-Modal Hashing For Fine-Grained Image-Text Retrieval

Li, Qiqi; Ma, Longfei; Jiang, Zheng; Li, Mingyong; Jin, Bo

doi:10.32604/cmc.2023.037463

Open Access

ARTICLE

by Qiqi Li¹, Longfei Ma¹, Zheng Jiang¹, Mingyong Li¹,*, Bo Jin²

1 College of Computer and Information Science, Chongqing Normal University, Chongqing, 401331, China
2 Institute of Systems and Robotics (ISR), Department of Electrical and Computer Engineering (DEEC), University of Coimbra, Coimbra, Portugal

* Corresponding Author: Mingyong Li. Email: email

Computers, Materials & Continua 2023, 75(2), 3713-3728. https://doi.org/10.32604/cmc.2023.037463

Received 04 November 2022; Accepted 30 January 2023; Issue published 31 March 2023

Abstract

In recent years, cross-modal hash retrieval has become a popular research field because of its high efficiency and low storage cost. Cross-modal retrieval technology can be applied to search engines, cross-modal medical processing, and other areas. Most existing methods follow a multi-label matching paradigm to complete retrieval tasks. However, such methods do not exploit the fine-grained information in multi-modal data, which may lead to sub-optimal results. To prevent cross-modal matching from degenerating into label matching, this paper proposes an end-to-end fine-grained cross-modal hash retrieval method that focuses on the fine-grained semantic information of multi-modal data. First, the method refines the image features and, instead of representing text by multiple labels, encodes the text with BERT. Second, it uses the inference capabilities of the transformer encoder to generate global fine-grained features. Finally, to better evaluate the fine-grained model, this paper uses datasets from the image-text matching field instead of traditional label-matching datasets. Experiments on the Microsoft COCO (MS-COCO) and Flickr30K datasets compare the method with previous classical methods, and the results show that it achieves better results than these methods in cross-modal hash retrieval.
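Cross-modal hashing owes its efficiency and low storage cost to compact binary codes: once images and texts are mapped into a shared space, each embedding is binarized and retrieval reduces to ranking candidates by Hamming distance. The Python sketch below illustrates only this generic retrieval step, assuming both modalities already produce K-dimensional real-valued embeddings. It is not TECMH's network or training objective, and the function names and toy vectors are hypothetical.

```python
from typing import List, Sequence


def to_hash_code(features: Sequence[float]) -> List[int]:
    """Binarize a real-valued embedding into a {-1, +1} hash code via sign().

    Zero is mapped to +1 here; this tie-breaking rule is an assumption.
    """
    return [1 if x >= 0 else -1 for x in features]


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Hamming distance between two {-1, +1} codes of equal length K.

    For such codes, d_H(a, b) = (K - <a, b>) / 2, so the distance can be
    computed from an inner product.
    """
    if len(a) != len(b):
        raise ValueError("hash codes must have the same length")
    inner = sum(x * y for x, y in zip(a, b))
    return (len(a) - inner) // 2


def rank_by_hamming(query: Sequence[int], database: Sequence[Sequence[int]]) -> List[int]:
    """Return database indices ordered by increasing Hamming distance to the query.

    sorted() is stable, so ties keep their original database order.
    """
    return sorted(range(len(database)), key=lambda i: hamming_distance(query, database[i]))


# Toy example (hypothetical 4-bit embeddings): one image query against three texts.
image_code = to_hash_code([0.8, -0.3, 0.1, -0.9])  # [1, -1, 1, -1]
text_codes = [
    to_hash_code([0.5, -0.2, 0.4, -0.1]),   # [1, -1, 1, -1]
    to_hash_code([-0.7, 0.6, -0.2, 0.3]),   # [-1, 1, -1, 1]
    to_hash_code([0.9, 0.1, 0.3, -0.8]),    # [1, 1, 1, -1]
]
ranking = rank_by_hamming(image_code, text_codes)  # [0, 2, 1]
```

Computing the distance from an inner product keeps the sketch dependency-free; a production system would typically pack the bits into machine words and use XOR followed by a popcount, which yields the same distances far more cheaply.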

Keywords

Deep learning; cross-modal retrieval; hash learning; transformer

Cite This Article

APA Style

Li, Q., Ma, L., Jiang, Z., Li, M., & Jin, B. (2023). TECMH: transformer-based cross-modal hashing for fine-grained image-text retrieval. Computers, Materials & Continua, 75(2), 3713-3728. https://doi.org/10.32604/cmc.2023.037463

Vancouver Style

Li Q, Ma L, Jiang Z, Li M, Jin B. TECMH: transformer-based cross-modal hashing for fine-grained image-text retrieval. Comput Mater Contin. 2023;75(2):3713-3728. https://doi.org/10.32604/cmc.2023.037463

IEEE Style

Q. Li, L. Ma, Z. Jiang, M. Li, and B. Jin, “TECMH: Transformer-Based Cross-Modal Hashing For Fine-Grained Image-Text Retrieval,” Comput. Mater. Contin., vol. 75, no. 2, pp. 3713-3728, 2023. https://doi.org/10.32604/cmc.2023.037463

Copyright © 2023 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.