Mingyong Li, Qiqi Li, Zheng Jiang, Yan Ma*
Computer Systems Science and Engineering, Vol.46, No.2, pp. 1401-1414, 2023, DOI:10.32604/csse.2023.034757
- 09 February 2023
Abstract In recent years, the development of deep learning has further improved hash retrieval technology. Most of the existing hashing methods currently use Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to process image and text information, respectively. This makes images or texts subject to local constraints, and inherent label matching cannot capture fine-grained information, often leading to suboptimal results. Driven by the development of the transformer model, we propose a framework called ViT2CMH mainly based on the Vision Transformer to handle deep Cross-modal Hashing tasks rather than CNNs or RNNs. Specifically, we use a More >