Search Results (2)
  • Open Access

    ARTICLE

    TECMH: Transformer-Based Cross-Modal Hashing For Fine-Grained Image-Text Retrieval

    Qiqi Li1, Longfei Ma1, Zheng Jiang1, Mingyong Li1,*, Bo Jin2

    CMC-Computers, Materials & Continua, Vol.75, No.2, pp. 3713-3728, 2023, DOI:10.32604/cmc.2023.037463 - 31 March 2023

    Abstract: In recent years, cross-modal hash retrieval has become a popular research field because of its advantages of high efficiency and low storage. Cross-modal retrieval technology can be applied to search engines, cross-modal medical processing, etc. The existing main method is to use a multi-label matching paradigm to finish the retrieval tasks. However, such methods do not use fine-grained information in the multi-modal data, which may lead to sub-optimal results. To avoid cross-modal matching turning into label matching, this paper proposes an end-to-end fine-grained cross-modal hash retrieval method, which can focus more on the fine-grained semantic…
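
The efficiency and storage advantages mentioned in the abstract above come from comparing short binary codes instead of real-valued feature vectors. The following is a minimal sketch of Hamming-distance ranking in general, not the TECMH method itself; the 8-bit codes, the item names, and both helper functions are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def hamming_distance(code_a: int, code_b: int) -> int:
    """Count the bits that differ between two hash codes stored as integers."""
    # XOR leaves a 1 wherever the two codes disagree; count those 1 bits.
    return bin(code_a ^ code_b).count("1")


def rank_by_hamming(query_code: int, database: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return (item_id, distance) pairs sorted from nearest to farthest."""
    scored = [
        (item_id, hamming_distance(query_code, code))
        for item_id, code in database.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical 8-bit image codes. A real system would learn much longer
# codes (for example 16 to 128 bits) from the images and texts themselves.
image_codes = {
    "img_cat": 0b10110010,
    "img_dog": 0b10010110,
    "img_car": 0b01001101,
}

# Hypothetical code produced for a text query in the same Hamming space.
text_query_code = 0b10110011

ranking = rank_by_hamming(text_query_code, image_codes)
for item_id, distance in ranking:
    print(item_id, distance)
```

Each item is stored as a single small integer, and each comparison is one XOR followed by a bit count, which is why hash-based retrieval is cheap in both memory and time.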

  • Open Access

    ARTICLE

    ViT2CMH: Vision Transformer Cross-Modal Hashing for Fine-Grained Vision-Text Retrieval

    Mingyong Li, Qiqi Li, Zheng Jiang, Yan Ma*

    Computer Systems Science and Engineering, Vol.46, No.2, pp. 1401-1414, 2023, DOI:10.32604/csse.2023.034757 - 09 February 2023

    Abstract: In recent years, the development of deep learning has further improved hash retrieval technology. Most of the existing hashing methods currently use Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to process image and text information, respectively. This makes images or texts subject to local constraints, and inherent label matching cannot capture fine-grained information, often leading to suboptimal results. Driven by the development of the transformer model, we propose a framework called ViT2CMH mainly based on the Vision Transformer to handle deep Cross-modal Hashing tasks rather than CNNs or RNNs. Specifically, we use a…
