Open Access

ARTICLE

Token Masked Pose Transformers Are Efficient Learners

Xinyi Song1, Haixiang Zhang1,*, Shaohua Li2

1 School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou, 310018, China
2 College of Artificial Intelligence, Nankai University, Tianjin, 300350, China

* Corresponding Author: Haixiang Zhang.

(This article belongs to the Special Issue: Computer Vision and Image Processing: Feature Selection, Image Enhancement and Recognition)

Computers, Materials & Continua 2025, 83(2), 2735-2750. https://doi.org/10.32604/cmc.2025.059006

Abstract

In recent years, Transformers have achieved remarkable results in computer vision: their built-in attention layers effectively model global dependencies in images by converting image features into tokens. However, Transformers often incur high computational costs when processing large-scale image data, which limits their feasibility in real-time applications. To address this issue, we propose Token Masked Pose Transformers (TMPose), an efficient Transformer network for pose estimation. The network applies semantic-level masking to tokens and employs three different masking strategies to reduce computational complexity while preserving model performance. Experimental results show that TMPose reduces computational complexity by 61.1% on the COCO validation dataset with negligible loss in accuracy, and its performance on the MPII dataset is also competitive. This research not only maintains the accuracy of pose estimation but also significantly reduces the demand for computational resources, providing new directions for further studies in this field. Code is available at: (accessed on 9 January 2025).
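The abstract describes masking tokens before the Transformer layers so that fewer tokens enter the quadratic attention computation. The snippet below is a minimal, illustrative sketch of this general idea in PyTorch, using a simple random keep/drop strategy; the class name, `mask_ratio` parameter, and the random strategy are assumptions for illustration and are not the paper's semantic-level masking or its three specific strategies.

```python
import torch
import torch.nn as nn


class RandomTokenMasking(nn.Module):
    """Illustrative token masking: keep only a subset of tokens before attention.

    Attention cost grows quadratically with the number of tokens, so dropping
    a fraction of them reduces compute. This random strategy is a placeholder,
    not TMPose's actual semantic-level masking.
    """

    def __init__(self, mask_ratio: float = 0.6):
        super().__init__()
        self.mask_ratio = mask_ratio  # fraction of tokens to drop (assumed value)

    def forward(self, tokens: torch.Tensor):
        # tokens: (batch, num_tokens, dim)
        b, n, d = tokens.shape
        num_keep = max(1, int(n * (1.0 - self.mask_ratio)))
        # Random per-token scores decide which tokens survive in each sample.
        scores = torch.rand(b, n, device=tokens.device)
        keep_idx = scores.topk(num_keep, dim=1).indices            # (b, num_keep)
        gather_idx = keep_idx.unsqueeze(-1).expand(-1, -1, d)      # (b, num_keep, d)
        visible = torch.gather(tokens, dim=1, index=gather_idx)    # surviving tokens
        return visible, keep_idx


if __name__ == "__main__":
    # e.g., 256 patch tokens of dimension 192 from a pose backbone (hypothetical sizes)
    x = torch.randn(2, 256, 192)
    masker = RandomTokenMasking(mask_ratio=0.6)
    visible, idx = masker(x)
    print(visible.shape)  # torch.Size([2, 102, 192]): ~60% fewer tokens enter attention
```

The surviving tokens would then be fed to the Transformer encoder, and the kept indices allow the predictions to be scattered back to their original spatial positions for heatmap-based keypoint decoding.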

Keywords

Pattern recognition; image processing; neural network; pose transformer

Cite This Article

APA Style
Song, X., Zhang, H., & Li, S. (2025). Token Masked Pose Transformers Are Efficient Learners. Computers, Materials & Continua, 83(2), 2735–2750. https://doi.org/10.32604/cmc.2025.059006
Vancouver Style
Song X, Zhang H, Li S. Token Masked Pose Transformers Are Efficient Learners. Comput Mater Contin. 2025;83(2):2735–2750. https://doi.org/10.32604/cmc.2025.059006
IEEE Style
X. Song, H. Zhang, and S. Li, “Token Masked Pose Transformers Are Efficient Learners,” Comput. Mater. Contin., vol. 83, no. 2, pp. 2735–2750, 2025. https://doi.org/10.32604/cmc.2025.059006

Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.