
Recognition Tasks with Transformers

Submission Deadline: 01 July 2024 (closed)

Guest Editors

Prof. Huimin Lu, Kyushu Institute of Technology, Japan.
Prof. Jihua Zhu, Xi’an Jiaotong University, China.
Dr. Xing Xu, University of Electronic Science and Technology of China, China.
Dr. Yuchao Zheng, Kyushu Institute of Technology, Japan.

Summary

Pattern recognition (PR) is changing rapidly with advances in transformer-based methods, which can reveal new insights from complex data and enable more efficient, cost-effective solutions that support human endeavors. Moreover, the comparatively simple architecture of transformers allows diverse modalities (such as point clouds, images, videos, text, and speech) to be processed with similar building blocks, fostering robust and adaptable pattern recognition solutions. These architectures are increasingly applied in fields such as natural language processing, system identification, speech recognition, and image recognition. The rising demand for sophisticated recognition solutions across multiple industries motivates researchers to explore innovative techniques that leverage the potential of transformers.

However, recognition tasks with transformers present various challenges, including extracting robust features, improving model interpretability, ensuring recognition robustness, handling dynamic environments, accommodating variations in scale and viewpoint, and devising real-time, scalable solutions. Addressing these challenges requires innovative approaches and a deep understanding of the underlying principles in order to advance the state of the art in pattern recognition research.

The objective of this special issue is to establish a forum for researchers to share their most recent discoveries and advancements in recognition tasks using transformer-based methodologies. Overcoming the challenges above calls for new methods and theoretical insights that push the boundaries of transformer-based recognition. We invite original research papers that tackle fundamental problems within this domain. Additionally, we welcome survey papers that assess the current state of progress and challenges in recognition tasks employing transformer architectures. Researchers from computer science, engineering, mathematics, physics, and related disciplines are encouraged to submit their findings. This interdisciplinary approach will foster the exchange of ideas and stimulate the evolution of state-of-the-art recognition technologies, ultimately driving progress in pattern recognition and its practical applications. Topics of particular interest include, but are not limited to:

• Transformer-based real-time recognition applications in areas like 2D/3D object detection, video analytics, and action identification.

• Transformer-centric approaches for complex vision problems, such as recognition, panoptic segmentation, multi-object tracking, and pose estimation.

• Transformer-centric approaches for fundamental vision challenges, such as image super-resolution, de-blurring, de-raining, de-noising and colorization.

• Advanced transformer architectures designed specifically for the handling of spatial (image) and temporal (video) data.

• Transformer models tailored for the processing of 3D data types, such as volumetric, mesh, and point-cloud data.

• Fine-tuning methodologies for transformer models that improve performance in recognition tasks.

• Optimization of multi-modal recognition through the innovative use of transformers.

• Unsupervised, weakly supervised, and semi-supervised learning with transformer models.

• Construction of interpretability methods based on the transformer attention mechanism.

• Multi-modal transformer models designed to foster better understanding of image-text interactions.


Keywords

Pattern Recognition, Artificial Intelligence, Deep Learning, Transfer Learning, Feature Engineering, Representation Learning, Knowledge Learning, Interpretable Machine Learning, Multimodal Machine Learning, Visual Transformer, Optimization Policy.

Published Papers


  • Open Access

    ARTICLE

    A Real-Time Semantic Segmentation Method Based on Transformer for Autonomous Driving

    Weiyu Hao, Jingyi Wang, Huimin Lu
    CMC-Computers, Materials & Continua, Vol.81, No.3, pp. 4419-4433, 2024, DOI:10.32604/cmc.2024.055478
    (This article belongs to the Special Issue: Recognition Tasks with Transformers)
    Abstract While traditional Convolutional Neural Network (CNN)-based semantic segmentation methods have proven effective, they often encounter significant computational challenges due to the requirement for dense pixel-level predictions, which complicates real-time implementation. To address this, we introduce an advanced real-time semantic segmentation strategy specifically designed for autonomous driving, utilizing the capabilities of Visual Transformers. By leveraging the self-attention mechanism inherent in Visual Transformers, our method enhances global contextual awareness, refining the representation of each pixel in relation to the overall scene. This enhancement is critical for quickly and accurately interpreting the complex elements within driving scenarios, a fundamental…

  • Open Access

    ARTICLE

    Masked Autoencoders as Single Object Tracking Learners

    Chunjuan Bo, Xin Chen, Junxing Zhang
    CMC-Computers, Materials & Continua, Vol.80, No.1, pp. 1105-1122, 2024, DOI:10.32604/cmc.2024.052329
    (This article belongs to the Special Issue: Recognition Tasks with Transformers)
    Abstract Significant advancements have been witnessed in visual tracking applications leveraging ViT in recent years, mainly due to the formidable modeling capabilities of Vision Transformer (ViT). However, the strong performance of such trackers heavily relies on ViT models pretrained for long periods, limiting more flexible model designs for tracking tasks. To address this issue, we propose an efficient unsupervised ViT pretraining method for the tracking task based on masked autoencoders, called TrackMAE. During pretraining, we employ two shared-parameter ViTs, serving as the appearance encoder and motion encoder, respectively. The appearance encoder encodes randomly masked image data,…

  • Open Access

    ARTICLE

    SMSTracker: A Self-Calibration Multi-Head Self-Attention Transformer for Visual Object Tracking

    Zhongyang Wang, Hu Zhu, Feng Liu
    CMC-Computers, Materials & Continua, Vol.80, No.1, pp. 605-623, 2024, DOI:10.32604/cmc.2024.050959
    (This article belongs to the Special Issue: Recognition Tasks with Transformers)
    Abstract Visual object tracking plays a crucial role in computer vision. In recent years, researchers have proposed various methods to achieve high-performance object tracking. Among these, methods based on Transformers have become a research hotspot due to their ability to globally model and contextualize information. However, current Transformer-based object tracking methods still face challenges such as low tracking accuracy and the presence of redundant feature information. In this paper, we introduce self-calibration multi-head self-attention Transformer (SMSTracker) as a solution to these challenges. It employs a hybrid tensor decomposition self-organizing multi-head self-attention transformer mechanism, which not only…

  • Open Access

    ARTICLE

    A U-Shaped Network-Based Grid Tagging Model for Chinese Named Entity Recognition

    Yan Xiang, Xuedong Zhao, Junjun Guo, Zhiliang Shi, Enbang Chen, Xiaobo Zhang
    CMC-Computers, Materials & Continua, Vol.79, No.3, pp. 4149-4167, 2024, DOI:10.32604/cmc.2024.050229
    (This article belongs to the Special Issue: Recognition Tasks with Transformers)
    Abstract Chinese named entity recognition (CNER) has received widespread attention as an important task of Chinese information extraction. Most previous research has focused on individually studying flat CNER, overlapped CNER, or discontinuous CNER. However, a unified CNER is often needed in real-world scenarios. Recent studies have shown that grid tagging-based methods based on character-pair relationship classification hold great potential for achieving unified NER. Nevertheless, how to enrich Chinese character-pair grid representations and capture deeper dependencies between character pairs to improve entity recognition performance remains an unresolved challenge. In this study, we enhance the character-pair grid representation…

  • Open Access

    ARTICLE

    Multi-Branch High-Dimensional Guided Transformer-Based 3D Human Posture Estimation

    Xianhua Li, Haohao Yu, Shuoyu Tian, Fengtao Lin, Usama Masood
    CMC-Computers, Materials & Continua, Vol.78, No.3, pp. 3551-3564, 2024, DOI:10.32604/cmc.2024.047336
    (This article belongs to the Special Issue: Recognition Tasks with Transformers)
    Abstract The human pose is estimated using a transformer-based, multi-branch, high-dimensional guided three-dimensional (3D) method that takes into account self-occlusion, ill-posedness, and a lack of depth data in per-frame 3D posture estimation when mapping from two-dimensional (2D) to 3D. Firstly, by examining the relationship between the movements of different bones in the human body, four virtual skeletons are proposed to enhance the cyclic constraints of limb joints. Then, multiple parameters describing the skeleton are fused and projected into a high-dimensional space. Utilizing a multi-branch network, motion features between bones and overall motion…

  • Open Access

    ARTICLE

    Network Configuration Entity Extraction Method Based on Transformer with Multi-Head Attention Mechanism

    Yang Yang, Zhenying Qu, Zefan Yan, Zhipeng Gao, Ti Wang
    CMC-Computers, Materials & Continua, Vol.78, No.1, pp. 735-757, 2024, DOI:10.32604/cmc.2023.045807
    (This article belongs to the Special Issue: Recognition Tasks with Transformers)
    Abstract Nowadays, ensuring the quality of network services has become increasingly vital. Experts are turning to knowledge graph technology, with a significant emphasis on entity extraction in the identification of device configurations. This research paper presents a novel entity extraction method that leverages a combination of active learning and attention mechanisms. Initially, an improved active learning approach is employed to select the most valuable unlabeled samples, which are subsequently submitted for expert labeling. This approach successfully addresses the problems of isolated points and sample redundancy within the network configuration sample set. Then the labeled samples are…
