Research Advances on 3D Object Detection Method Based on Visual Information and LiDAR for Intelligent Driving

doi:10.3969/j.issn.1006-2475.2025.05.013

Abstract

3D object detection based on visual information and LiDAR is one of the key technologies in intelligent driving perception and plays a crucial role in understanding complex driving scenarios. Because of the inherent limitations of any single sensor and the complexity of multi-modal data, achieving high-quality 3D object detection is not straightforward: it requires considering many factors, including data heterogeneity and optimization. Current research mainly focuses on data fusion that exploits the complementarity of single-modal data. To support further research in 3D object detection, this paper first reviews 3D object detection methods based on visual information and on LiDAR, and then reviews methods based on LiDAR-camera fusion from the perspectives of temporal fusion and stage-wise fusion. In addition, commonly used datasets and evaluation metrics are introduced, followed by performance comparisons of different network architectures on these datasets, and the advantages and limitations of the different network types are analyzed accordingly. Finally, the challenges facing 3D object detection based on visual information and LiDAR are presented together with possible solutions.

Key words: intelligent driving, deep learning, multi-modal fusion, 3D object detection

CLC Number: TP751.1

WEI Yunsong1, 2, LI Jiaqiang1, 2, HE Chao1, 2, 3, YU Haisheng1, 2, CHEN Yanlin1, 2, ZHAO Longqing1, 2, WEI Rongkun1, 2. Research Advances on 3D Object Detection Method Based on Visual Information and LiDAR for Intelligent Driving [J]. Computer and Modernization, 2025, 0(05): 91-102.


