Open Access

ARTICLE

Point-Based Fusion for Multimodal 3D Detection in Autonomous Driving

Xinxin Liu, Bin Ye*

School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China

* Corresponding Author: Bin Ye. Email: email

(This article belongs to the Special Issue: Advanced Machine Learning and Artificial Intelligence in Engineering Applications)

Computer Systems Science and Engineering 2025, 49(1), 287–300. https://doi.org/10.32604/csse.2025.061655

Abstract

In the broader field of mechanical technology, and particularly in the context of self-driving vehicles, cameras and Light Detection and Ranging (LiDAR) sensors provide complementary modalities with significant potential for sensor fusion. However, directly merging multi-sensor data through point projection often incurs information loss due to quantization, and reconciling the differing data formats of multiple sensors remains a persistent challenge. To address these issues, we propose a new fusion method that leverages continuous convolution, point pooling, and a learned Multilayer Perceptron (MLP) to achieve superior detection performance. Our approach fuses the segmentation mask with raw LiDAR points rather than with projected points, thereby avoiding quantization loss. Likewise, when retrieving semantic information from images via point cloud projection, we apply linear interpolation on upsampled image feature maps to mitigate quantization error. Nearest-neighbor search and continuous convolution allow us to fuse data of different formats seamlessly, and we further integrate pooling and aggregation operations, conceptual extensions of convolution designed to reconcile the inherent disparities among these data representations. The detection network operates in two stages: the first stage generates preliminary proposals and segmentation features; the second stage refines the fusion results together with the segmentation mask to yield the final prediction. Notably, the image network serves solely to provide semantic information that enhances the point cloud features. Extensive experiments on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) dataset demonstrate the effectiveness of our approach, which achieves both high precision and robust performance in 3D object detection.
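The core steps described above (projecting raw LiDAR points into the image plane, bilinearly interpolating an upsampled image feature map at continuous pixel coordinates to avoid quantization loss, and aggregating nearest-neighbor features through a learned MLP in a continuous-convolution style) can be illustrated with a short PyTorch sketch. This is a minimal illustration under our own assumptions, not the authors' implementation: the function name fuse_point_features, the toy single-layer MLP, and the KITTI-style 3x4 projection matrix P are all hypothetical.

```python
import torch
import torch.nn.functional as F

def fuse_point_features(points, img_feats, P, k=4, mlp=None):
    """Hypothetical sketch of point-based camera-LiDAR fusion.

    points:    (N, 3) raw LiDAR points (not voxelized or projected grids)
    img_feats: (C, H, W) upsampled image feature map
    P:         (3, 4) camera projection matrix (KITTI-style calibration)
    """
    N = points.shape[0]
    # Project raw LiDAR points into the image plane (homogeneous coordinates).
    hom = torch.cat([points, points.new_ones(N, 1)], dim=1)   # (N, 4)
    uvw = hom @ P.T                                           # (N, 3)
    uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)             # (N, 2) pixel coords

    # Bilinear interpolation at continuous (u, v) locations mitigates the
    # quantization loss that rounding to integer pixels would introduce.
    C, H, W = img_feats.shape
    grid = uv.clone()
    grid[:, 0] = uv[:, 0] / (W - 1) * 2 - 1                   # normalize x to [-1, 1]
    grid[:, 1] = uv[:, 1] / (H - 1) * 2 - 1                   # normalize y to [-1, 1]
    sampled = F.grid_sample(
        img_feats[None], grid[None, None],                    # output: (1, C, 1, N)
        mode="bilinear", align_corners=True,
    )[0, :, 0].T                                              # (N, C) per-point image features

    # Continuous-convolution-style aggregation: for each point, gather its
    # k nearest LiDAR neighbors, encode relative geometry together with the
    # neighbors' image features via an MLP, then max-pool over neighbors.
    d = torch.cdist(points, points)                           # (N, N) pairwise distances
    knn = d.topk(k + 1, largest=False).indices[:, 1:]         # (N, k), skip self
    offsets = points[knn] - points[:, None, :]                # (N, k, 3) relative geometry
    neigh = sampled[knn]                                      # (N, k, C) neighbor features
    if mlp is None:
        mlp = torch.nn.Linear(C + 3, C)                       # toy stand-in for the learned MLP
    fused = mlp(torch.cat([neigh, offsets], dim=-1))          # (N, k, C)
    return fused.max(dim=1).values                            # (N, C) pooled per-point features
```

In the two-stage detector described in the abstract, fused per-point features of this kind would supplement the LiDAR stream before proposal refinement; the sketch only illustrates the interpolation and continuous-convolution ideas, not the full network.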

Keywords

Autonomous driving; 3D object detection; multi-sensor fusion; deep learning

Cite This Article

APA Style
Liu, X., & Ye, B. (2025). Point-based fusion for multimodal 3D detection in autonomous driving. Computer Systems Science and Engineering, 49(1), 287–300. https://doi.org/10.32604/csse.2025.061655
Vancouver Style
Liu X, Ye B. Point-based fusion for multimodal 3D detection in autonomous driving. Comput Syst Sci Eng. 2025;49(1):287–300. https://doi.org/10.32604/csse.2025.061655
IEEE Style
X. Liu and B. Ye, “Point-Based Fusion for Multimodal 3D Detection in Autonomous Driving,” Comput. Syst. Sci. Eng., vol. 49, no. 1, pp. 287–300, 2025. https://doi.org/10.32604/csse.2025.061655



Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.