Syed Tasnimul Karim Ayon1,#, Farhan Md. Siraj1,#, Jia Uddin2,*
CMC-Computers, Materials & Continua, Vol.82, No.1, pp. 499-520, 2025, DOI:10.32604/cmc.2025.058361
- 03 January 2025
Abstract: This study investigates the application of Learnable Memory Vision Transformers (LMViT) for detecting metal surface flaws, comparing their performance with traditional CNNs, specifically ResNet18 and ResNet50, as well as other transformer-based models including Token to Token ViT, ViT without memory, and Parallel ViT. Leveraging a widely used steel surface defect dataset, the research applies data augmentation and t-distributed stochastic neighbor embedding (t-SNE) to enhance feature extraction and understanding. These techniques mitigated overfitting, stabilized training, and improved generalization capabilities. The LMViT model achieved a test accuracy of 97.22%, significantly outperforming ResNet18 (88.89%) and ResNet50 (88.90%), as well…