Open Access

ARTICLE

CMMCAN: Lightweight Feature Extraction and Matching Network for Endoscopic Images Based on Adaptive Attention

Nannan Chong1,2,*, Fan Yang1

1 School of Electronic and Information Engineering, Hebei University of Technology, Tianjin, 300401, China
2 School of Information and Intelligence Engineering, Tianjin Renai College, Tianjin, 301636, China

* Corresponding Author: Nannan Chong. Email: email

(This article belongs to the Special Issue: Multimodal Learning in Image Processing)

Computers, Materials & Continua 2024, 80(2), 2761-2783. https://doi.org/10.32604/cmc.2024.052217

Abstract

In minimally invasive surgery, endoscopes or laparoscopes equipped with miniature cameras and tools enter the human body through small incisions or natural cavities for therapeutic purposes. In clinical operating environments, however, endoscopic images often suffer from low texture, uneven illumination, and non-rigid structures, all of which hinder feature observation and extraction. The resulting missing feature points can severely impair surgical navigation or clinical diagnosis, leading to treatment and postoperative recovery issues for patients. To address these challenges, this paper introduces, for the first time, a Cross-Channel Multi-Modal Adaptive Spatial Feature Fusion (ASFF) module based on the lightweight architecture of EfficientViT. Additionally, a novel lightweight feature extraction and matching network based on an attention mechanism, CMMCAN, is proposed. The network uses a dual-branch Siamese architecture to dynamically adjust attention weights for cross-modal information from grayscale images and optical flow images. It extracts static and dynamic features ranging from low-level to high-level and from local to global, ensuring robust feature extraction across different widths, noise levels, and blur scenarios. Global and local matching are performed through a multi-level cascaded attention mechanism, with cross-channel attention introduced to extract low-level and high-level features simultaneously. Extensive ablation experiments and comparative studies are conducted on the HyperKvasir, EAD, M2caiSeg, CVC-ClinicDB, and UCL synthetic datasets. Experimental results demonstrate that the proposed network improves upon the baseline EfficientViT-B3 model by 75.4% in accuracy (Acc), while also enhancing runtime performance and storage efficiency. Compared with the complex DenseDescriptor feature extraction network, the difference in Acc is less than 7.22%, and the IoU results on specific datasets outperform those of complex dense models. Furthermore, the method increases the F1 score by 33.2% and accelerates runtime by 70.2%. Notably, CMMCAN is faster than the lightweight models it is compared against, and its feature extraction and matching performance is comparable to that of existing complex models while offering higher speed and better cost-effectiveness.

Keywords


Cite This Article

APA Style
Chong, N., Yang, F. (2024). CMMCAN: lightweight feature extraction and matching network for endoscopic images based on adaptive attention. Computers, Materials & Continua, 80(2), 2761-2783. https://doi.org/10.32604/cmc.2024.052217
Vancouver Style
Chong N, Yang F. CMMCAN: lightweight feature extraction and matching network for endoscopic images based on adaptive attention. Comput Mater Contin. 2024;80(2):2761-2783. https://doi.org/10.32604/cmc.2024.052217
IEEE Style
N. Chong and F. Yang, “CMMCAN: Lightweight Feature Extraction and Matching Network for Endoscopic Images Based on Adaptive Attention,” Comput. Mater. Contin., vol. 80, no. 2, pp. 2761-2783, 2024. https://doi.org/10.32604/cmc.2024.052217



Copyright © 2024 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.