Special Issues
Table of Contents

Multimodal Learning in Image Processing

Submission Deadline: 31 July 2024

Guest Editors

Prof. Shuai Liu, Hunan Normal University, China
Prof. Gautam Srivastava, Brandon University, Canada

Summary

Multimodal image segmentation and recognition is a significant and challenging research field. With the rapid development of information technology, multimodal target information is captured by different kinds of sensors, such as optical, infrared, and radar sensors. Consequently, how to effectively fuse and utilize these multimodal data, with their differing features and information content, has become a key issue.


Multimodal learning, as a powerful framework for data learning and fusion, is able to learn fused features for complex data processing. In multimodal image processing, deep learning methods extract different features from multiple sensors, and information fusion methods then combine these features according to their contribution to target recognition. This overcomes major challenges faced by classical methods; however, many issues still await solutions, such as fusion strategies for multimodal data, cognitive distortion caused by data imbalance, and one/few-shot models driven by small samples.
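The extract-then-fuse pipeline described above can be sketched in a few lines. The following is a minimal illustrative example, not the method of any paper in this issue: the per-modality "extractors" are stand-in random projections, and the contribution scores are assumed inputs that a real system would learn.

```python
import numpy as np

def extract_features(modality_data: np.ndarray) -> np.ndarray:
    """Stand-in for a per-modality deep feature extractor.

    Here just a fixed random projection into a shared 8-D feature space,
    so that features from different sensors become comparable.
    """
    rng = np.random.default_rng(0)
    W = rng.standard_normal((modality_data.shape[-1], 8))
    return np.tanh(modality_data @ W)

def weighted_fusion(features: list, scores: np.ndarray) -> np.ndarray:
    """Combine per-modality features using softmax-normalized
    contribution scores (one score per modality)."""
    weights = np.exp(scores) / np.exp(scores).sum()
    return sum(w * f for w, f in zip(weights, features))

# Toy measurements standing in for optical, infrared, and radar data.
optical = np.ones((1, 4))
infrared = np.full((1, 4), 0.5)
radar = np.zeros((1, 4))

feats = [extract_features(x) for x in (optical, infrared, radar)]
# Hypothetical scores: optical contributes most, radar least.
fused = weighted_fusion(feats, scores=np.array([2.0, 1.0, 0.5]))
print(fused.shape)  # (1, 8)
```

A real system would replace the random projections with trained network branches and learn the contribution scores end to end, e.g. via an attention module, but the fusion step itself keeps this weighted-sum shape.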


This Special Issue therefore focuses on the methods and applications of multimodal learning in image processing, aiming to explore innovative methods and technologies that solve existing problems. Respected experts, scholars, and researchers are invited to share their latest research achievements and practical experience in this field, to promote the development of multimodal image recognition, improve classification and recognition accuracy, and provide reliable solutions for practical applications.


We sincerely invite researchers from academia and industry to submit original research papers, review articles, and technical reports, to jointly explore the methods and applications of multimodal learning in image processing, solve existing problems, and promote further development in this field.


Keywords

Image processing; multimodal fusion; deep learning; classification; recognition

Published Papers


  • Open Access

    ARTICLE

    Target Detection on Water Surfaces Using Fusion of Camera and LiDAR Based Information

    Yongguo Li, Yuanrong Wang, Jia Xie, Caiyin Xu, Kun Zhang
    CMC-Computers, Materials & Continua, DOI:10.32604/cmc.2024.051426
    (This article belongs to the Special Issue: Multimodal Learning in Image Processing)
    Abstract To address the challenges of missed detections in water surface target detection using solely visual algorithms in unmanned surface vehicle (USV) perception, this paper proposes a method based on the fusion of visual and LiDAR point-cloud projection for water surface target detection. Firstly, the visual recognition component employs an improved YOLOv7 algorithm based on a self-built dataset for the detection of water surface targets. This algorithm modifies the original YOLOv7 architecture to a Slim-Neck structure, addressing the problem of excessive redundant information during feature extraction in the original YOLOv7 network model. Simultaneously, this modification simplifies…

  • Open Access

    ARTICLE

    BDPartNet: Feature Decoupling and Reconstruction Fusion Network for Infrared and Visible Image

    Xuejie Wang, Jianxun Zhang, Ye Tao, Xiaoli Yuan, Yifan Guo
    CMC-Computers, Materials & Continua, Vol.79, No.3, pp. 4621-4639, 2024, DOI:10.32604/cmc.2024.051556
    (This article belongs to the Special Issue: Multimodal Learning in Image Processing)
    Abstract While single-modal visible light images or infrared images provide limited information, infrared light captures significant thermal radiation data, whereas visible light excels in presenting detailed texture information. Combining images obtained from both modalities allows for leveraging their respective strengths and mitigating individual limitations, resulting in high-quality images with enhanced contrast and rich texture details. Such capabilities hold promising applications in advanced visual tasks including target detection, instance segmentation, military surveillance, pedestrian detection, among others. This paper introduces a novel approach, a dual-branch decomposition fusion network based on AutoEncoder (AE), which decomposes multi-modal features into intensity…

  • Open Access

    ARTICLE

    Fine-Grained Ship Recognition Based on Visible and Near-Infrared Multimodal Remote Sensing Images: Dataset, Methodology and Evaluation

    Shiwen Song, Rui Zhang, Min Hu, Feiyao Huang
    CMC-Computers, Materials & Continua, Vol.79, No.3, pp. 5243-5271, 2024, DOI:10.32604/cmc.2024.050879
    (This article belongs to the Special Issue: Multimodal Learning in Image Processing)
    Abstract Fine-grained recognition of ships based on remote sensing images is crucial to safeguarding maritime rights and interests and maintaining national security. Currently, with the emergence of massive high-resolution multi-modality images, the use of multi-modality images for fine-grained recognition has become a promising technology. Fine-grained recognition of multi-modality images imposes higher requirements on the dataset samples. The key to the problem is how to extract and fuse the complementary features of multi-modality images to obtain more discriminative fusion features. The attention mechanism helps the model to pinpoint the key information in the image, resulting in a…

  • Open Access

    ARTICLE

    Vehicle Abnormal Behavior Detection Based on Dense Block and Soft Thresholding

    Yuanyao Lu, Wei Chen, Zhanhe Yu, Jingxuan Wang, Chaochao Yang
    CMC-Computers, Materials & Continua, Vol.79, No.3, pp. 5051-5066, 2024, DOI:10.32604/cmc.2024.050865
    (This article belongs to the Special Issue: Multimodal Learning in Image Processing)
    Abstract With the rapid advancement of social economies, intelligent transportation systems are gaining increasing attention. Central to these systems is the detection of abnormal vehicle behavior, which remains a critical challenge due to the complexity of urban roadways and the variability of external conditions. Current research on detecting abnormal traffic behaviors is still nascent, with significant room for improvement in recognition accuracy. To address this, this research has developed a new model for recognizing abnormal traffic behaviors. This model employs the R3D network as its core architecture, incorporating a dense block to facilitate feature reuse. This…

  • Open Access

    ARTICLE

    Attention-Enhanced Voice Portrait Model Using Generative Adversarial Network

    Jingyi Mao, Yuchen Zhou, Yifan Wang, Junyu Li, Ziqing Liu, Fanliang Bu
    CMC-Computers, Materials & Continua, Vol.79, No.1, pp. 837-855, 2024, DOI:10.32604/cmc.2024.048703
    (This article belongs to the Special Issue: Multimodal Learning in Image Processing)
    Abstract Voice portrait technology has explored and established the relationship between speakers’ voices and their facial features, aiming to generate corresponding facial characteristics by providing the voice of an unknown speaker. Due to its powerful advantages in image generation, Generative Adversarial Networks (GANs) have now been widely applied across various fields. The existing Voice2Face methods for voice portraits are primarily based on GANs trained on voice-face paired datasets. However, voice portrait models solely constructed on GANs face limitations in image generation quality and struggle to maintain facial similarity. Additionally, the training process is relatively unstable, thereby…

  • Open Access

    ARTICLE

    BCCLR: A Skeleton-Based Action Recognition with Graph Convolutional Network Combining Behavior Dependence and Context Clues

    Yunhe Wang, Yuxin Xia, Shuai Liu
    CMC-Computers, Materials & Continua, Vol.78, No.3, pp. 4489-4507, 2024, DOI:10.32604/cmc.2024.048813
    (This article belongs to the Special Issue: Multimodal Learning in Image Processing)
    Abstract In recent years, skeleton-based action recognition has made great achievements in Computer Vision. A graph convolutional network (GCN) is effective for action recognition, modelling the human skeleton as a spatio-temporal graph. Most GCNs define the graph topology by physical relations of the human joints. However, this predefined graph ignores the spatial relationship between non-adjacent joint pairs in special actions and the behavior dependence between joint pairs, resulting in a low recognition rate for specific actions with implicit correlation between joint pairs. In addition, existing methods ignore the trend correlation between adjacent frames within an action…
