Omar Alqahtani, Mohamed Ghouse*, Asfia Sabahath, Omer Bin Hussain, Arshiya Begum
CMC-Computers, Materials & Continua, Vol.83, No.2, pp. 2221-2244, 2025, DOI:10.32604/cmc.2025.061977
16 April 2025
Abstract: This paper introduces a novel method for medical image retrieval and classification that integrates a multi-scale encoding mechanism with Vision Transformer (ViT) architectures and a dynamic multi-loss function. The multi-scale encoding significantly enhances the model's ability to capture both fine-grained and global features, while the dynamic loss function adapts during training to optimize both classification accuracy and retrieval performance. Our approach was evaluated on the ISIC-2018 and ChestX-ray14 datasets, yielding notable improvements. Specifically, on the ISIC-2018 dataset, our method achieves an F1-Score improvement of +4.84% compared to the standard ViT, with a precision increase of +5.46%…
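The abstract does not say how the dynamic multi-loss is formulated. As a hedged illustration only, the sketch below combines a classification term (softmax cross-entropy) and a retrieval term (triplet margin loss) using homoscedastic uncertainty weighting (Kendall et al., 2018), one established way to let loss weights adapt during training. This is a swapped-in technique, not necessarily the authors' method. The function and class names, the margin of 0.2 and the learning rate are assumptions. In a real model the log-variances would be trained by backpropagation together with the network; here only they are updated against fixed loss values so that the weight adaptation is easy to see.

```python
import math
from typing import List, Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one sample (classification term)."""
    m = max(logits)  # subtract the max logit for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 0.2) -> float:
    """Triplet margin loss on embeddings (retrieval term), squared Euclidean distance."""
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, d_pos - d_neg + margin)


class UncertaintyWeightedLoss:
    """Total loss = sum_i exp(-s_i) * L_i + s_i, where s_i = log(sigma_i^2) is learned.

    The effective weight of task i is exp(-s_i); the +s_i term stops every
    weight from collapsing to zero.
    """

    def __init__(self, num_tasks: int, lr: float = 0.1):
        self.log_vars = [0.0] * num_tasks  # s_i = 0, so every weight starts at 1
        self.lr = lr

    def weights(self) -> List[float]:
        return [math.exp(-s) for s in self.log_vars]

    def total(self, losses: Sequence[float]) -> float:
        return sum(math.exp(-s) * L + s for s, L in zip(self.log_vars, losses))

    def step(self, losses: Sequence[float]) -> None:
        # Gradient of the total w.r.t. s_i is 1 - exp(-s_i) * L_i.
        # Plain gradient descent on s_i; assumes each L_i > 0, otherwise s_i drifts to -inf.
        for i, (s, L) in enumerate(zip(self.log_vars, losses)):
            self.log_vars[i] = s - self.lr * (1.0 - math.exp(-s) * L)


# Toy usage with hypothetical logits and embeddings.
ce = cross_entropy([2.0, 0.5, -1.0], label=0)
tl = triplet_loss([0.0, 1.0], [0.1, 0.9], [0.3, 0.8], margin=0.2)
combiner = UncertaintyWeightedLoss(num_tasks=2)
initial_total = combiner.total([ce, tl])
for _ in range(500):
    combiner.step([ce, tl])
final_weights = combiner.weights()
```

With the losses held fixed, each s_i converges to log(L_i), so each weight tends to 1 / L_i: the term with the smaller loss receives the larger weight. During real training the losses change every batch, so the weights keep adapting. That adaptive behavior is the property the abstract attributes to its dynamic loss.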