Open Access

ARTICLE

MMDistill: Multi-Modal BEV Distillation Framework for Multi-View 3D Object Detection

by Tianzhe Jiao, Yuming Chen, Zhe Zhang, Chaopeng Guo, Jie Song*

Software College, Northeastern University, Shenyang, 110819, China

* Corresponding Author: Jie Song.

Computers, Materials & Continua 2024, 81(3), 4307-4325. https://doi.org/10.32604/cmc.2024.058238

Abstract

Multi-modal 3D object detection has achieved remarkable progress, but its high cost and low efficiency often limit its use in practical industrial production. Multi-view camera-based methods offer a feasible low-cost alternative. However, camera data lacks geometric depth information, so achieving high accuracy from camera data alone is challenging. This paper proposes a multi-modal Bird's-Eye-View (BEV) distillation framework (MMDistill) to balance cost and accuracy. MMDistill is a two-stage distillation framework built on teacher and student models that learns cross-modal knowledge and generates multi-modal features. It improves the performance of unimodal detectors without introducing additional cost during inference. Specifically, our method bridges the cross-modal gap caused by the heterogeneity between data modalities. We further propose a Light Detection and Ranging (LiDAR)-guided geometric compensation module, which helps the student model obtain effective geometric features and reduces the gap between modalities. Our method generally requires fewer computational resources and offers faster inference than traditional multi-modal models, which allows multi-modal technology to be applied more widely in practical scenarios. Experiments on the nuScenes dataset validate the effectiveness and superiority of MMDistill, which improves mean Average Precision (mAP) by 4.1% and nuScenes Detection Score (NDS) by 4.6% over the baseline detector. We also present detailed ablation studies to validate our method.

Cite This Article

APA Style
Jiao, T., Chen, Y., Zhang, Z., Guo, C., & Song, J. (2024). MMDistill: Multi-modal BEV distillation framework for multi-view 3D object detection. Computers, Materials & Continua, 81(3), 4307-4325. https://doi.org/10.32604/cmc.2024.058238
Vancouver Style
Jiao T, Chen Y, Zhang Z, Guo C, Song J. MMDistill: multi-modal BEV distillation framework for multi-view 3D object detection. Comput Mater Contin. 2024;81(3):4307-4325. https://doi.org/10.32604/cmc.2024.058238
IEEE Style
T. Jiao, Y. Chen, Z. Zhang, C. Guo, and J. Song, “MMDistill: Multi-Modal BEV Distillation Framework for Multi-View 3D Object Detection,” Comput. Mater. Contin., vol. 81, no. 3, pp. 4307-4325, 2024. https://doi.org/10.32604/cmc.2024.058238



Copyright © 2024 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.