Steel Surface Defect Detection Using Learnable Memory Vision Transformer

Syed Tasnimul; Farhan Siraj; Jia Uddin

doi:10.32604/cmc.2025.058361

Open Access

ARTICLE

Syed Tasnimul Karim Ayon^1,#, Farhan Md. Siraj^1,#, Jia Uddin^2,*

1 Department of Computer Science and Engineering, BRAC University, Dhaka, 1212, Bangladesh
2 Department of AI and Big Data, Endicott College, Woosong University, Daejeon, 34606, Republic of Korea

* Corresponding Author: Jia Uddin. Email: email
# These authors contributed equally to this work

(This article belongs to the Special Issue: Advancements in Machine Fault Diagnosis and Prognosis: Data-Driven Approaches and Autonomous Systems)

Computers, Materials & Continua 2025, 82(1), 499-520. https://doi.org/10.32604/cmc.2025.058361

Received 10 September 2024; Accepted 17 December 2024; Issue published 03 January 2025

Abstract

This study investigates the application of the Learnable Memory Vision Transformer (LMViT) to metal surface defect detection, comparing its performance with traditional CNNs, specifically ResNet18 and ResNet50, and with other transformer-based models: Token to Token ViT, ViT without memory, and Parallel ViT. Using a widely used steel surface defect dataset, the research applies data augmentation and t-distributed stochastic neighbor embedding (t-SNE) to enhance feature extraction and understanding. These techniques mitigated overfitting, stabilized training, and improved generalization. The LMViT model achieved a test accuracy of 97.22%, outperforming ResNet18 (88.89%) and ResNet50 (88.90%), as well as Token to Token ViT (88.46%), ViT without memory (87.18%), and Parallel ViT (91.03%). LMViT also showed superior training and validation performance, attaining a validation accuracy of 98.2%, compared with 91.0% for ResNet18, 96.0% for ResNet50, and 89.12%, 87.51%, and 91.21% for Token to Token ViT, ViT without memory, and Parallel ViT, respectively. The findings highlight LMViT’s ability to capture long-range dependencies in images, an area where CNNs struggle because they rely on local receptive fields and hierarchical feature extraction. The other transformer-based models also capture complex features better than the CNNs, with LMViT excelling particularly at detecting subtle and complex defects, which is critical for maintaining product quality and operational efficiency in industrial applications. For instance, the LMViT model successfully identified fine scratches and minor surface irregularities that CNNs often misclassify.
This study not only demonstrates LMViT’s potential for real-world defect detection but also underscores the promise of other transformer-based architectures, such as Token to Token ViT, ViT without memory, and Parallel ViT, in industrial scenarios where complex spatial relationships are key. Future research may focus on improving LMViT’s computational efficiency for deployment in real-time quality control systems.
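
The abstract does not describe how the learnable memory is wired into the transformer. As a rough illustration of the general idea only, the sketch below shows one common formulation, assumed here rather than taken from the article: extra memory tokens are appended to the keys and values, so every patch token can attend to them as well as to every other patch, regardless of spatial distance. The function name `attend_with_memory`, the token sizes, and the use of identity query/key/value projections are all simplifying assumptions, and the memory is randomly initialized where a real model would train it.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend_with_memory(patches, memory):
    """Single-head dot-product attention with extra memory tokens.

    Assumption: queries come from patch tokens only, while keys and values
    are the patch tokens followed by the memory tokens. Projections are
    omitted (identity) to keep the sketch short.
    """
    dim = len(patches[0])
    context = patches + memory  # n_patches + n_memory rows
    scale = 1.0 / math.sqrt(dim)
    outputs, weights = [], []
    for query in patches:
        scores = [scale * sum(a * b for a, b in zip(query, key)) for key in context]
        w = softmax(scores)
        # Each output is a weighted average of all context rows.
        out = [sum(w[j] * context[j][d] for j in range(len(context)))
               for d in range(dim)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


random.seed(0)
dim, n_patches, n_memory = 8, 4, 2  # hypothetical sizes
patches = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_patches)]
# In a trained model these would be learned parameters, not random draws.
memory = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_memory)]

outputs, weights = attend_with_memory(patches, memory)
print(len(outputs), len(outputs[0]), len(weights[0]))
```

Each attention row spans all patches plus the memory tokens, which is the property that lets a single layer relate distant image regions, in contrast to a convolution whose receptive field is local.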

Keywords

Learnable Memory Vision Transformer (LMViT); Convolutional Neural Networks (CNN); metal surface defect detection; deep learning; computer vision; image classification; learnable memory; gradient clipping; label smoothing; t-SNE visualization
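
Two of the keywords, label smoothing and gradient clipping, name standard training regularizers. The following is a minimal sketch of what each one computes, not the authors' training code: the smoothing factor `epsilon`, the clipping threshold `max_norm`, and the class count used in the example are hypothetical values chosen for illustration.

```python
import math


def smooth_labels(true_class, num_classes, epsilon=0.1):
    """Return a smoothed target distribution.

    A one-hot target is mixed with a uniform distribution: every class gets
    epsilon / num_classes, and the true class gets the remaining mass.
    """
    off_value = epsilon / num_classes
    target = [off_value] * num_classes
    target[true_class] += 1.0 - epsilon
    return target


def clip_by_global_norm(grads, max_norm):
    """Rescale a flat list of gradients so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm or norm == 0.0:
        return list(grads)  # already within the limit; leave unchanged
    scale = max_norm / norm
    return [g * scale for g in grads]


# Hypothetical example: 6 defect classes, true class index 2.
target = smooth_labels(true_class=2, num_classes=6, epsilon=0.1)
clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
print(target)
print(clipped)
```

Label smoothing keeps the model from becoming overconfident on the training labels, and clipping bounds the size of each update step, which is consistent with the abstract's remarks about reduced overfitting and more stable training.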

Cite This Article

APA Style

Ayon, S.T.K., Siraj, F.M., Uddin, J. (2025). Steel Surface Defect Detection Using Learnable Memory Vision Transformer. Computers, Materials & Continua, 82(1), 499–520. https://doi.org/10.32604/cmc.2025.058361

Vancouver Style

Ayon STK, Siraj FM, Uddin J. Steel Surface Defect Detection Using Learnable Memory Vision Transformer. Comput Mater Contin. 2025;82(1):499–520. https://doi.org/10.32604/cmc.2025.058361

IEEE Style

S. T. K. Ayon, F. M. Siraj, and J. Uddin, “Steel Surface Defect Detection Using Learnable Memory Vision Transformer,” Comput. Mater. Contin., vol. 82, no. 1, pp. 499–520, 2025. https://doi.org/10.32604/cmc.2025.058361

Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
