Special Issues
Table of Contents

Multimodal Learning in Image Processing

Submission Deadline: 31 July 2024

Guest Editors

Prof. Shuai Liu, Hunan Normal University, China
Prof. Gautam Srivastava, Brandon University, Canada

Summary

Multimodal image segmentation and recognition is a significant and challenging research field. With the rapid development of information technology, multimodal target information is captured by different kinds of sensors, such as optical, infrared, and radar sensors. Consequently, how to effectively fuse and utilize these multimodal data, with their differing features and information content, has become a key issue.


Multimodal learning, as a powerful framework for data learning and fusion, is able to learn fused features for complex data processing. In multimodal image processing, deep learning methods extract different features from multiple sensors, and information fusion methods then combine these features according to their contribution to target recognition. This overcomes major challenges faced by classical methods; however, many issues still await solutions, such as fusion strategies for multimodal data, cognitive distortion caused by data imbalance, and one/few-shot models driven by small samples.
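The extract-then-fuse pipeline described above can be sketched in a few lines. The following is a minimal illustrative example, not the method of any paper in this issue: the per-modality "extractors" are stand-in random projections, and the contribution scores are assumed inputs that a real system would learn.

```python
import numpy as np

def extract_features(modality_data: np.ndarray) -> np.ndarray:
    """Stand-in for a per-modality deep feature extractor.

    Here just a fixed random projection into a shared 8-D feature space,
    so that features from different sensors become comparable.
    """
    rng = np.random.default_rng(0)
    W = rng.standard_normal((modality_data.shape[-1], 8))
    return np.tanh(modality_data @ W)

def weighted_fusion(features: list, scores: np.ndarray) -> np.ndarray:
    """Combine per-modality features using softmax-normalized
    contribution scores (one score per modality)."""
    weights = np.exp(scores) / np.exp(scores).sum()
    return sum(w * f for w, f in zip(weights, features))

# Toy measurements standing in for optical, infrared, and radar data.
optical = np.ones((1, 4))
infrared = np.full((1, 4), 0.5)
radar = np.zeros((1, 4))

feats = [extract_features(x) for x in (optical, infrared, radar)]
# Hypothetical scores: optical contributes most, radar least.
fused = weighted_fusion(feats, scores=np.array([2.0, 1.0, 0.5]))
print(fused.shape)  # (1, 8)
```

A real system would replace the random projections with trained network branches and learn the contribution scores end to end, e.g. via an attention module, but the fusion step itself keeps this weighted-sum shape.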


This Special Issue therefore focuses on the methods and applications of multimodal learning in image processing, aiming to explore innovative methods and technologies that solve existing problems. Respected experts, scholars, and researchers are invited to share their latest research achievements and practical experience in this field, to promote the development of multimodal image recognition, improve classification and recognition accuracy, and provide reliable solutions for practical applications.


We sincerely invite researchers from academia and industry to submit original research papers, review articles, and technical reports, to jointly explore the methods and applications of multimodal learning in image processing, solve existing problems, and promote further development in this field.


Keywords

Image processing; multimodal fusion; deep learning; classification; recognition

Published Papers


  • Open Access

    ARTICLE

    Target Detection on Water Surfaces Using Fusion of Camera and LiDAR Based Information

    Yongguo Li, Yuanrong Wang, Jia Xie, Caiyin Xu, Kun Zhang
    CMC-Computers, Materials & Continua, DOI:10.32604/cmc.2024.051426
    (This article belongs to the Special Issue: Multimodal Learning in Image Processing)
    Abstract To address the challenges of missed detections in water surface target detection using solely visual algorithms in unmanned surface vehicle (USV) perception, this paper proposes a method based on the fusion of visual and LiDAR point-cloud projection for water surface target detection. Firstly, the visual recognition component employs an improved YOLOv7 algorithm based on a self-built dataset for the detection of water surface targets. This algorithm modifies the original YOLOv7 architecture to a Slim-Neck structure, addressing the problem of excessive redundant information during feature extraction in the original YOLOv7 network model. Simultaneously, this modification simplifies…

  • Open Access

    ARTICLE

    BDPartNet: Feature Decoupling and Reconstruction Fusion Network for Infrared and Visible Image

    Xuejie Wang, Jianxun Zhang, Ye Tao, Xiaoli Yuan, Yifan Guo
    CMC-Computers, Materials & Continua, Vol.79, No.3, pp. 4621-4639, 2024, DOI:10.32604/cmc.2024.051556
    (This article belongs to the Special Issue: Multimodal Learning in Image Processing)
    Abstract While single-modal visible light images or infrared images provide limited information, infrared light captures significant thermal radiation data, whereas visible light excels in presenting detailed texture information. Combining images obtained from both modalities allows for leveraging their respective strengths and mitigating individual limitations, resulting in high-quality images with enhanced contrast and rich texture details. Such capabilities hold promising applications in advanced visual tasks including target detection, instance segmentation, military surveillance, pedestrian detection, among others. This paper introduces a novel approach, a dual-branch decomposition fusion network based on AutoEncoder (AE), which decomposes multi-modal features into intensity…

  • Open Access

    ARTICLE

    Fine-Grained Ship Recognition Based on Visible and Near-Infrared Multimodal Remote Sensing Images: Dataset, Methodology and Evaluation

    Shiwen Song, Rui Zhang, Min Hu, Feiyao Huang
    CMC-Computers, Materials & Continua, Vol.79, No.3, pp. 5243-5271, 2024, DOI:10.32604/cmc.2024.050879
    (This article belongs to the Special Issue: Multimodal Learning in Image Processing)
    Abstract Fine-grained recognition of ships based on remote sensing images is crucial to safeguarding maritime rights and interests and maintaining national security. Currently, with the emergence of massive high-resolution multi-modality images, the use of multi-modality images for fine-grained recognition has become a promising technology. Fine-grained recognition of multi-modality images imposes higher requirements on the dataset samples. The key to the problem is how to extract and fuse the complementary features of multi-modality images to obtain more discriminative fusion features. The attention mechanism helps the model to pinpoint the key information in the image, resulting in a…

  • Open Access

    ARTICLE

    Vehicle Abnormal Behavior Detection Based on Dense Block and Soft Thresholding

    Yuanyao Lu, Wei Chen, Zhanhe Yu, Jingxuan Wang, Chaochao Yang
    CMC-Computers, Materials & Continua, Vol.79, No.3, pp. 5051-5066, 2024, DOI:10.32604/cmc.2024.050865
    (This article belongs to the Special Issue: Multimodal Learning in Image Processing)
    Abstract With the rapid advancement of social economies, intelligent transportation systems are gaining increasing attention. Central to these systems is the detection of abnormal vehicle behavior, which remains a critical challenge due to the complexity of urban roadways and the variability of external conditions. Current research on detecting abnormal traffic behaviors is still nascent, with significant room for improvement in recognition accuracy. To address this, this research has developed a new model for recognizing abnormal traffic behaviors. This model employs the R3D network as its core architecture, incorporating a dense block to facilitate feature reuse. This…

  • Open Access

    ARTICLE

    Attention-Enhanced Voice Portrait Model Using Generative Adversarial Network

    Jingyi Mao, Yuchen Zhou, Yifan Wang, Junyu Li, Ziqing Liu, Fanliang Bu
    CMC-Computers, Materials & Continua, Vol.79, No.1, pp. 837-855, 2024, DOI:10.32604/cmc.2024.048703
    (This article belongs to the Special Issue: Multimodal Learning in Image Processing)
    Abstract Voice portrait technology has explored and established the relationship between speakers’ voices and their facial features, aiming to generate corresponding facial characteristics by providing the voice of an unknown speaker. Due to its powerful advantages in image generation, Generative Adversarial Networks (GANs) have now been widely applied across various fields. The existing Voice2Face methods for voice portraits are primarily based on GANs trained on voice-face paired datasets. However, voice portrait models solely constructed on GANs face limitations in image generation quality and struggle to maintain facial similarity. Additionally, the training process is relatively unstable, thereby…

  • Open Access

    ARTICLE

    BCCLR: A Skeleton-Based Action Recognition with Graph Convolutional Network Combining Behavior Dependence and Context Clues

    Yunhe Wang, Yuxin Xia, Shuai Liu
    CMC-Computers, Materials & Continua, Vol.78, No.3, pp. 4489-4507, 2024, DOI:10.32604/cmc.2024.048813
    (This article belongs to the Special Issue: Multimodal Learning in Image Processing)
    Abstract In recent years, skeleton-based action recognition has made great achievements in Computer Vision. A graph convolutional network (GCN) is effective for action recognition, modelling the human skeleton as a spatio-temporal graph. Most GCNs define the graph topology by physical relations of the human joints. However, this predefined graph ignores the spatial relationship between non-adjacent joint pairs in special actions and the behavior dependence between joint pairs, resulting in a low recognition rate for specific actions with implicit correlation between joint pairs. In addition, existing methods ignore the trend correlation between adjacent frames within an action…
