Open Access
ARTICLE
MMDistill: Multi-Modal BEV Distillation Framework for Multi-View 3D Object Detection
Software College, Northeastern University, Shenyang, 110819, China
* Corresponding Author: Jie Song. Email:
Computers, Materials & Continua 2024, 81(3), 4307-4325. https://doi.org/10.32604/cmc.2024.058238
Received 08 September 2024; Accepted 11 November 2024; Issue published 19 December 2024
Abstract
Multi-modal 3D object detection has achieved remarkable progress, but its deployment in practical industrial production is often limited by high cost and low efficiency. Multi-view camera-based methods provide a feasible alternative due to their low cost. However, camera data lacks geometric depth, and achieving high accuracy with camera data alone is challenging. This paper proposes a multi-modal Bird's-Eye-View (BEV) distillation framework (MMDistill) to strike a trade-off between accuracy and cost. MMDistill is a carefully crafted two-stage distillation framework based on teacher and student models for learning cross-modal knowledge and generating multi-modal features. It can improve the performance of unimodal detectors without introducing additional costs during inference. Specifically, our method effectively bridges the cross-modal gap caused by data heterogeneity. Furthermore, we propose a Light Detection and Ranging (LiDAR)-guided geometric compensation module, which assists the student model in obtaining effective geometric features and reduces the gap between different modalities. Our proposed method generally requires fewer computational resources and offers faster inference than traditional multi-modal models. This advancement enables multi-modal technology to be applied more widely in practical scenarios. Through experiments, we validate the effectiveness and superiority of MMDistill on the nuScenes dataset, achieving an improvement of 4.1% mean Average Precision (mAP) and 4.6% NuScenes Detection Score (NDS) over the baseline detector. In addition, we also present detailed ablation studies to validate our method.
Keywords
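To make the teacher-student idea in the abstract concrete, the sketch below shows a minimal BEV feature distillation loss in PyTorch: a camera-only student is trained to mimic the BEV features of a LiDAR-based teacher. This is an illustrative sketch, not the authors' implementation; the channel counts, BEV grid size, and the 1x1 adapter are hypothetical placeholders.

```python
# Minimal sketch of BEV feature distillation (assumed setup, not MMDistill's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class BEVFeatureDistillLoss(nn.Module):
    """Aligns student BEV features to detached teacher BEV features."""

    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        # A 1x1 conv adapts the student's channel count to the teacher's.
        self.adapter = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_bev: torch.Tensor, teacher_bev: torch.Tensor) -> torch.Tensor:
        # Teacher features act as frozen targets; gradients flow only to the student.
        aligned = self.adapter(student_bev)
        return F.mse_loss(aligned, teacher_bev.detach())


# Example usage with hypothetical BEV grids of size 128x128.
student_bev = torch.randn(2, 80, 128, 128)   # camera-based student BEV features
teacher_bev = torch.randn(2, 256, 128, 128)  # LiDAR-based teacher BEV features
loss = BEVFeatureDistillLoss(80, 256)(student_bev, teacher_bev)
```

Because the teacher is used only to produce distillation targets during training, it can be discarded at inference time, which is how the framework avoids adding inference cost to the camera-only student.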
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.