A Survey of Research on Target Detection Algorithms Based on Deep Learning

doi:10.3969/j.issn.1006-2475.2020.05.011

Abstract

Abstract: Traditional target detection algorithms rely mainly on hand-crafted features to detect objects. Hand-crafted features are mostly designed for specific kinds of objects: some are suited to edge detection, others to texture detection, so they do not generalize. In recent years, deep learning has developed rapidly and has brought major progress to computer vision tasks such as image classification, target detection, and semantic image segmentation. As a feature learning method, deep learning can automatically learn useful features of the target, which avoids manual feature extraction while maintaining good detection performance. This paper first reviews research progress on deep-learning-based target detection algorithms, then summarizes common difficulties in target detection and their solutions, and finally discusses possible directions for future development.

Key words: target detection, deep learning, computer vision

CLC number: TP183

CAO Yan, LI Huan, WANG Tian-bao. A Survey of Research on Target Detection Algorithms Based on Deep Learning[J]. Computer and Modernization, 2020, 0(05): 63-.

