Special Issues

Multimodal Learning in Image Processing

Submission Deadline: 31 July 2024 (closed)

Guest Editors

Prof. Shuai Liu, Hunan Normal University, China
Prof. Gautam Srivastava, Brandon University, Canada

Summary

Multimodal image segmentation and recognition is a significant and challenging research field. With the rapid development of information technology, multimodal target information is captured by different kinds of sensors, such as optical, infrared, and radar sensors. How to effectively fuse and utilize these multimodal data, with their differing features and information content, has therefore become a key issue.


Multimodal learning, as a powerful paradigm for data learning and fusion, can learn fused features for complex data processing. In multimodal image processing, deep learning methods extract different features from multiple sensors, and information fusion methods then combine these features according to their contribution to target recognition. This addresses major challenges faced by classical methods; however, many issues still await solutions, such as fusion strategies for multimodal data, cognitive distortion caused by data imbalance, and one/few-shot models driven by small samples.
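
To make the fusion strategy described above concrete, the following is a minimal, illustrative sketch of contribution-weighted feature fusion in PyTorch. It is not drawn from any of the papers listed below; the module name, feature dimensions, and two-modality setup are assumptions made for the example.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Hypothetical sketch: fuse per-modality features, weighting each
    modality by a learned score of its contribution to recognition."""

    def __init__(self, feat_dim: int):
        super().__init__()
        # One scalar contribution score per modality feature vector.
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        # feats: one (batch, feat_dim) tensor per sensor/modality.
        stacked = torch.stack(feats, dim=1)                  # (batch, M, feat_dim)
        weights = torch.softmax(self.score(stacked), dim=1)  # (batch, M, 1)
        return (weights * stacked).sum(dim=1)                # (batch, feat_dim)

# Usage: fuse features produced by separate optical and infrared encoders.
optical = torch.randn(8, 256)   # e.g., CNN features of an optical image
infrared = torch.randn(8, 256)  # e.g., CNN features of an infrared image
fused = AttentionFusion(feat_dim=256)([optical, infrared])
print(fused.shape)  # torch.Size([8, 256])
```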


Accordingly, this special issue focuses on the methods and applications of multimodal learning in image processing, aiming to explore innovative methods and technologies that solve existing problems. Respected experts, scholars, and researchers are invited to share their latest research achievements and practical experience in this field, which can promote the development of multimodal image recognition, improve classification and recognition accuracy, and provide reliable solutions for practical applications.


We sincerely invite researchers from academia and industry to submit original research papers, review articles, and technical reports to jointly explore the methods and applications of multimodal learning in image processing, solve existing problems, and promote further development in this field.


Keywords

Image processing; multimodal fusion; deep learning; classification; recognition

Published Papers


  • Open Access

    ARTICLE

    Border Sensitive Knowledge Distillation for Rice Panicle Detection in UAV Images

    Anitha Ramachandran, Sendhil Kumar K.S.
    CMC-Computers, Materials & Continua, Vol.81, No.1, pp. 827-842, 2024, DOI:10.32604/cmc.2024.054768
    (This article belongs to the Special Issue: Multimodal Learning in Image Processing)
    Abstract Research on panicle detection is one of the most important aspects of paddy phenotypic analysis. A phenotyping method that uses unmanned aerial vehicles can be an excellent alternative to field-based methods. Nevertheless, it entails many other challenges, including different illuminations, panicle sizes, shape distortions, partial occlusions, and complex backgrounds. Object detection algorithms are directly affected by these factors. This work proposes a model for detecting panicles called Border Sensitive Knowledge Distillation (BSKD). It is designed to prioritize the preservation of knowledge in border areas through the use of feature distillation. Our feature-based knowledge distillation method…

  • Open Access

    ARTICLE

    Semantic Segmentation and YOLO Detector over Aerial Vehicle Images

    Asifa Mehmood Qureshi, Abdul Haleem Butt, Abdulwahab Alazeb, Naif Al Mudawi, Mohammad Alonazi, Nouf Abdullah Almujally, Ahmad Jalal, Hui Liu
    CMC-Computers, Materials & Continua, Vol.80, No.2, pp. 3315-3332, 2024, DOI:10.32604/cmc.2024.052582
    (This article belongs to the Special Issue: Multimodal Learning in Image Processing)
    Abstract Intelligent vehicle tracking and detection are crucial tasks in the realm of highway management. However, vehicles come in a range of sizes, which makes them challenging to detect, affecting the traffic monitoring system’s overall accuracy. Deep learning is considered to be an efficient method for object detection in vision-based systems. In this paper, we propose a vision-based vehicle detection and tracking system based on a You Only Look Once version 5 (YOLOv5) detector combined with a segmentation technique. The model consists of six steps. In the first step, all the extracted traffic sequence images are subjected…

  • Open Access

    ARTICLE

    CMMCAN: Lightweight Feature Extraction and Matching Network for Endoscopic Images Based on Adaptive Attention

    Nannan Chong, Fan Yang
    CMC-Computers, Materials & Continua, Vol.80, No.2, pp. 2761-2783, 2024, DOI:10.32604/cmc.2024.052217
    (This article belongs to the Special Issue: Multimodal Learning in Image Processing)
    Abstract In minimally invasive surgery, endoscopes or laparoscopes equipped with miniature cameras and tools are used to enter the human body for therapeutic purposes through small incisions or natural cavities. However, in clinical operating environments, endoscopic images often suffer from challenges such as low texture, uneven illumination, and non-rigid structures, which affect feature observation and extraction. This can severely impact surgical navigation or clinical diagnosis due to missing feature points in endoscopic images, leading to treatment and postoperative recovery issues for patients. To address these challenges, this paper introduces, for the first time, a Cross-Channel Multi-Modal…

  • Open Access

    ARTICLE

    An Enhanced GAN for Image Generation

    Chunwei Tian, Haoyang Gao, Pengwei Wang, Bob Zhang
    CMC-Computers, Materials & Continua, Vol.80, No.1, pp. 105-118, 2024, DOI:10.32604/cmc.2024.052097
    (This article belongs to the Special Issue: Multimodal Learning in Image Processing)
    Abstract Generative adversarial networks (GANs) with gaming abilities have been widely applied in image generation. However, gamistic generators and discriminators may reduce the robustness of the obtained GANs in image generation under varying scenes. Enhancing the relation of hierarchical information in a generation network and enlarging differences of different network architectures can facilitate more structural information to improve the generation effect for image generation. In this paper, we propose an enhanced GAN via improving a generator for image generation (EIGGAN). EIGGAN applies a spatial attention to a generator to extract salient information to enhance the truthfulness…

  • Open Access

    ARTICLE

    Target Detection on Water Surfaces Using Fusion of Camera and LiDAR Based Information

    Yongguo Li, Yuanrong Wang, Jia Xie, Caiyin Xu, Kun Zhang
    CMC-Computers, Materials & Continua, Vol.80, No.1, pp. 467-486, 2024, DOI:10.32604/cmc.2024.051426
    (This article belongs to the Special Issue: Multimodal Learning in Image Processing)
    Abstract To address the challenges of missed detections in water surface target detection using solely visual algorithms in unmanned surface vehicle (USV) perception, this paper proposes a method based on the fusion of visual and LiDAR point-cloud projection for water surface target detection. Firstly, the visual recognition component employs an improved YOLOv7 algorithm based on a self-built dataset for the detection of water surface targets. This algorithm modifies the original YOLOv7 architecture to a Slim-Neck structure, addressing the problem of excessive redundant information during feature extraction in the original YOLOv7 network model. Simultaneously, this modification simplifies…

  • Open Access

    ARTICLE

    A Novel 3D Gait Model for Subject Identification Robust against Carrying and Dressing Variations

    Jian Luo, Bo Xu, Tardi Tjahjadi, Jian Yi
    CMC-Computers, Materials & Continua, Vol.80, No.1, pp. 235-261, 2024, DOI:10.32604/cmc.2024.050018
    (This article belongs to the Special Issue: Multimodal Learning in Image Processing)
    Abstract Subject identification via the subject’s gait is challenging due to variations in the subject’s carrying and dressing conditions in real-life scenes. This paper proposes a novel targeted 3-dimensional (3D) gait model (3DGait) represented by a set of interpretable 3DGait descriptors based on a 3D parametric body model. The 3DGait descriptors are utilised as invariant gait features in the 3DGait recognition method to address object carrying and dressing. The 3DGait recognition method involves 2-dimensional (2D) to 3DGait data learning based on 3D virtual samples, a semantic gait parameter estimation Long Short-Term Memory (LSTM) network (3D-SGPE-LSTM), a feature fusion…

  • Open Access

    ARTICLE

    UNet Based on Multi-Object Segmentation and Convolution Neural Network for Object Recognition

    Nouf Abdullah Almujally, Bisma Riaz Chughtai, Naif Al Mudawi, Abdulwahab Alazeb, Asaad Algarni, Hamdan A. Alzahrani, Jeongmin Park
    CMC-Computers, Materials & Continua, Vol.80, No.1, pp. 1563-1580, 2024, DOI:10.32604/cmc.2024.049333
    (This article belongs to the Special Issue: Multimodal Learning in Image Processing)
    Abstract The recent advancements in vision technology have had a significant impact on our ability to identify multiple objects and understand complex scenes. Various technologies, such as augmented reality-driven scene integration, robotic navigation, autonomous driving, and guided tour systems, heavily rely on this type of scene comprehension. This paper presents a novel segmentation approach based on the UNet network model, aimed at recognizing multiple objects within an image. The methodology begins with the acquisition and preprocessing of the image, followed by segmentation using the fine-tuned UNet architecture. Afterward, we use an annotation tool to accurately label…

  • Open Access

    ARTICLE

    BDPartNet: Feature Decoupling and Reconstruction Fusion Network for Infrared and Visible Image

    Xuejie Wang, Jianxun Zhang, Ye Tao, Xiaoli Yuan, Yifan Guo
    CMC-Computers, Materials & Continua, Vol.79, No.3, pp. 4621-4639, 2024, DOI:10.32604/cmc.2024.051556
    (This article belongs to the Special Issue: Multimodal Learning in Image Processing)
    Abstract While single-modal visible light images or infrared images provide limited information, infrared light captures significant thermal radiation data, whereas visible light excels in presenting detailed texture information. Combining images obtained from both modalities allows for leveraging their respective strengths and mitigating individual limitations, resulting in high-quality images with enhanced contrast and rich texture details. Such capabilities hold promising applications in advanced visual tasks including target detection, instance segmentation, military surveillance, pedestrian detection, among others. This paper introduces a novel approach, a dual-branch decomposition fusion network based on AutoEncoder (AE), which decomposes multi-modal features into intensity…

  • Open Access

    ARTICLE

    Fine-Grained Ship Recognition Based on Visible and Near-Infrared Multimodal Remote Sensing Images: Dataset, Methodology and Evaluation

    Shiwen Song, Rui Zhang, Min Hu, Feiyao Huang
    CMC-Computers, Materials & Continua, Vol.79, No.3, pp. 5243-5271, 2024, DOI:10.32604/cmc.2024.050879
    (This article belongs to the Special Issue: Multimodal Learning in Image Processing)
    Abstract Fine-grained recognition of ships based on remote sensing images is crucial to safeguarding maritime rights and interests and maintaining national security. Currently, with the emergence of massive high-resolution multi-modality images, the use of multi-modality images for fine-grained recognition has become a promising technology. Fine-grained recognition of multi-modality images imposes higher requirements on the dataset samples. The key to the problem is how to extract and fuse the complementary features of multi-modality images to obtain more discriminative fusion features. The attention mechanism helps the model to pinpoint the key information in the image, resulting in a…

  • Open Access

    ARTICLE

    Vehicle Abnormal Behavior Detection Based on Dense Block and Soft Thresholding

    Yuanyao Lu, Wei Chen, Zhanhe Yu, Jingxuan Wang, Chaochao Yang
    CMC-Computers, Materials & Continua, Vol.79, No.3, pp. 5051-5066, 2024, DOI:10.32604/cmc.2024.050865
    (This article belongs to the Special Issue: Multimodal Learning in Image Processing)
    Abstract With the rapid advancement of social economies, intelligent transportation systems are gaining increasing attention. Central to these systems is the detection of abnormal vehicle behavior, which remains a critical challenge due to the complexity of urban roadways and the variability of external conditions. Current research on detecting abnormal traffic behaviors is still nascent, with significant room for improvement in recognition accuracy. To address this, this research has developed a new model for recognizing abnormal traffic behaviors. This model employs the R3D network as its core architecture, incorporating a dense block to facilitate feature reuse. This…

  • Open Access

    ARTICLE

    Attention-Enhanced Voice Portrait Model Using Generative Adversarial Network

    Jingyi Mao, Yuchen Zhou, Yifan Wang, Junyu Li, Ziqing Liu, Fanliang Bu
    CMC-Computers, Materials & Continua, Vol.79, No.1, pp. 837-855, 2024, DOI:10.32604/cmc.2024.048703
    (This article belongs to the Special Issue: Multimodal Learning in Image Processing)
    Abstract Voice portrait technology has explored and established the relationship between speakers’ voices and their facial features, aiming to generate corresponding facial characteristics by providing the voice of an unknown speaker. Due to its powerful advantages in image generation, Generative Adversarial Networks (GANs) have now been widely applied across various fields. The existing Voice2Face methods for voice portraits are primarily based on GANs trained on voice-face paired datasets. However, voice portrait models solely constructed on GANs face limitations in image generation quality and struggle to maintain facial similarity. Additionally, the training process is relatively unstable, thereby…

  • Open Access

    ARTICLE

    BCCLR: A Skeleton-Based Action Recognition with Graph Convolutional Network Combining Behavior Dependence and Context Clues

    Yunhe Wang, Yuxin Xia, Shuai Liu
    CMC-Computers, Materials & Continua, Vol.78, No.3, pp. 4489-4507, 2024, DOI:10.32604/cmc.2024.048813
    (This article belongs to the Special Issue: Multimodal Learning in Image Processing)
    Abstract In recent years, skeleton-based action recognition has made great achievements in Computer Vision. A graph convolutional network (GCN) is effective for action recognition, modelling the human skeleton as a spatio-temporal graph. Most GCNs define the graph topology by physical relations of the human joints. However, this predefined graph ignores the spatial relationship between non-adjacent joint pairs in special actions and the behavior dependence between joint pairs, resulting in a low recognition rate for specific actions with implicit correlation between joint pairs. In addition, existing methods ignore the trend correlation between adjacent frames within an action…
