Open Access

ARTICLE

Token Masked Pose Transformers Are Efficient Learners

Xinyi Song1, Haixiang Zhang1,*, Shaohua Li2

1 School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou, 310018, China
2 College of Artificial Intelligence, Nankai University, Tianjin, 300350, China

* Corresponding Author: Haixiang Zhang.

(This article belongs to the Special Issue: Computer Vision and Image Processing: Feature Selection, Image Enhancement and Recognition)

Computers, Materials & Continua 2025, 83(2), 2735-2750. https://doi.org/10.32604/cmc.2025.059006

Abstract

In recent years, Transformers have achieved remarkable results in computer vision: their built-in attention layers effectively model global dependencies in images by converting image features into tokens. However, Transformers often incur high computational costs when processing large-scale image data, which limits their feasibility in real-time applications. To address this issue, we propose Token Masked Pose Transformers (TMPose), an efficient Transformer network for pose estimation. The network applies semantic-level masking to tokens and employs three different masking strategies to reduce computational complexity while preserving model performance. Experimental results show that TMPose reduces computational complexity by 61.1% on the COCO validation dataset with negligible loss in accuracy, and its performance on the MPII dataset is also competitive. This research not only maintains the accuracy of pose estimation but also significantly reduces the demand for computational resources, providing new directions for further studies in this field. Code is available at: (accessed on 9 January 2025).
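The abstract describes masking tokens before the Transformer layers so that fewer tokens enter the quadratic attention computation. The snippet below is a minimal, illustrative sketch of this general idea in PyTorch, using a simple random keep/drop strategy; the class name, `mask_ratio` parameter, and the random strategy are assumptions for illustration and are not the paper's semantic-level masking or its three specific strategies.

```python
import torch
import torch.nn as nn


class RandomTokenMasking(nn.Module):
    """Illustrative token masking: keep only a subset of tokens before attention.

    Attention cost grows quadratically with the number of tokens, so dropping
    a fraction of them reduces compute. This random strategy is a placeholder,
    not TMPose's actual semantic-level masking.
    """

    def __init__(self, mask_ratio: float = 0.6):
        super().__init__()
        self.mask_ratio = mask_ratio  # fraction of tokens to drop (assumed value)

    def forward(self, tokens: torch.Tensor):
        # tokens: (batch, num_tokens, dim)
        b, n, d = tokens.shape
        num_keep = max(1, int(n * (1.0 - self.mask_ratio)))
        # Random per-token scores decide which tokens survive in each sample.
        scores = torch.rand(b, n, device=tokens.device)
        keep_idx = scores.topk(num_keep, dim=1).indices            # (b, num_keep)
        gather_idx = keep_idx.unsqueeze(-1).expand(-1, -1, d)      # (b, num_keep, d)
        visible = torch.gather(tokens, dim=1, index=gather_idx)    # surviving tokens
        return visible, keep_idx


if __name__ == "__main__":
    # e.g., 256 patch tokens of dimension 192 from a pose backbone (hypothetical sizes)
    x = torch.randn(2, 256, 192)
    masker = RandomTokenMasking(mask_ratio=0.6)
    visible, idx = masker(x)
    print(visible.shape)  # torch.Size([2, 102, 192]): ~60% fewer tokens enter attention
```

The surviving tokens would then be fed to the Transformer encoder, and the kept indices allow the predictions to be scattered back to their original spatial positions for heatmap-based keypoint decoding.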

Keywords

Pattern recognition; image processing; neural network; pose transformer

Cite This Article

APA Style
Song, X., Zhang, H., & Li, S. (2025). Token Masked Pose Transformers Are Efficient Learners. Computers, Materials & Continua, 83(2), 2735–2750. https://doi.org/10.32604/cmc.2025.059006
Vancouver Style
Song X, Zhang H, Li S. Token Masked Pose Transformers Are Efficient Learners. Comput Mater Contin. 2025;83(2):2735–2750. https://doi.org/10.32604/cmc.2025.059006
IEEE Style
X. Song, H. Zhang, and S. Li, “Token Masked Pose Transformers Are Efficient Learners,” Comput. Mater. Contin., vol. 83, no. 2, pp. 2735–2750, 2025. https://doi.org/10.32604/cmc.2025.059006

Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.