Received: 2019-06-26
Online: 2020-05-20
Published: 2020-05-21
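About the authors:
CAO Yan (b. 1993), female, from Guang'an, Sichuan, master's student; research interests: deep learning, image processing; E-mail: 726377694@qq.com. LI Huan (b. 1995), male, from Hengyang, Hunan, master's student; research interests: deep learning, image processing; E-mail: 1603420591@qq.com. Corresponding author: WANG Tian-bao (b. 1967), male, from Jiange, Sichuan, professor, master's degree; research interests: wireless communication technology and applications, network communication and information security; E-mail: wangtianbao@cuit.edu.cn.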
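Abstract: Traditional object detection algorithms rely mainly on hand-crafted features to detect objects. Hand-crafted features are usually designed for particular kinds of targets: some suit edge detection, others suit texture detection, so they do not generalize. In recent years deep learning has developed rapidly and has brought major advances in computer vision tasks such as image classification, object detection, and semantic image segmentation. As a feature-learning method, deep learning automatically learns useful features of the target, which avoids manual feature extraction while still delivering good detection performance. This paper first reviews research progress on deep-learning-based object detection algorithms, then summarizes common difficulties in object detection and the measures used to address them, and finally discusses possible directions for the future development of object detection algorithms.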
CAO Yan, LI Huan, WANG Tian-bao. A Survey of Research on Target Detection Algorithms Based on Deep Learning[J]. Computer and Modernization, doi: 10.3969/j.issn.1006-2475.2020.05.011.