Vehicle Target Detection Algorithm Based on YOLO v4

Abstract

Abstract: To address low detection accuracy on occluded targets and poor detection of small targets in vehicle detection, this paper proposes YOLO v4-ASC, an improved object detection algorithm based on YOLO v4. A convolutional block attention module (CBAM) is added at the tail of the backbone feature extraction network to strengthen the model's feature representation. The loss function is modified to speed up convergence, and a combined Adam+SGDM optimization method replaces the original SGDM optimizer to further improve detection performance. In addition, the K-Means clustering algorithm is used to optimize the prior (anchor) box sizes, and the car, truck, and bus categories in the traffic scene dataset are merged into a single vehicle category, which reduces the task to a binary classification problem. Experimental results show that, while maintaining the detection speed of the original algorithm, YOLO v4-ASC achieves an AP of 70.05% and an F1-score of 71%, improvements of 9.92 and 9 percentage points, respectively, over the original YOLO v4.

Key words: YOLO v4, model optimization, convolutional block attention module
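
The anchor optimization mentioned in the abstract can be illustrated with a small sketch. The abstract does not state which distance the K-Means step uses, so the code below assumes the 1 - IoU distance on (width, height) pairs that is common in the YOLO family; the function names `iou_wh` and `kmeans_anchors` and the sample boxes are hypothetical, not taken from the paper.

```python
import random


def iou_wh(box, centroid):
    """IoU of two (width, height) boxes aligned at the same corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (width, height) pairs into k anchors.

    Assumption: distance is 1 - IoU, so the nearest centroid is the one
    with the highest IoU. Anchors are returned sorted by area.
    """
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)  # initialise from k distinct list entries
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            clusters[best].append(box)
        new_centroids = []
        for i, members in enumerate(clusters):
            if not members:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[i])
                continue
            mean_w = sum(m[0] for m in members) / len(members)
            mean_h = sum(m[1] for m in members) / len(members)
            new_centroids.append((mean_w, mean_h))
        if new_centroids == centroids:  # assignments have stabilised
            break
        centroids = new_centroids
    return sorted(centroids, key=lambda c: c[0] * c[1])


# Hypothetical ground-truth box sizes in pixels (small, medium, large vehicles).
sample_boxes = [
    (10, 12), (12, 10), (11, 11),
    (50, 40), (48, 42), (52, 38),
    (200, 150), (190, 160), (210, 140),
]
print(kmeans_anchors(sample_boxes, k=3))
```

YOLO v4 uses nine anchors spread over three detection scales, so in practice `k=9` would be run over the box sizes of the full training set. The result depends on the random initialisation, so several seeds are usually compared.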

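For reference, the reported gains imply the baseline YOLO v4 figures on the same data. The F1-score is the harmonic mean of precision and recall; the symbols P and R below are notation introduced here, not taken from the abstract.

```latex
\[
F_1 = \frac{2PR}{P + R}
\]
\[
\mathrm{AP}_{\text{YOLO v4}} = 70.05\% - 9.92\% = 60.13\%, \qquad
F_{1,\,\text{YOLO v4}} = 71\% - 9\% = 62\%
\]
```
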
YIN Yuan-qi, XU Yuan, XING Yuan-xin. Vehicle Target Detection Algorithm Based on YOLO v4[J]. Computer and Modernization, 2022, 0(07): 8-14.

