Computer and Modernization ›› 2023, Vol. 0 ›› Issue (09): 1-9. doi: 10.3969/j.issn.1006-2475.2023.09.001
• Artificial Intelligence •
Online:
2023-09-28
Published:
2023-10-10
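About the authors:
SHEN Jia-wei (1998-), male, from Xinghua, Jiangsu, master's student, research interests: machine learning, building intelligence, E-mail: 1968431836@qq.com; LU Yi-ming (1990-), male, from Suzhou, Jiangsu, assistant lecturer, master's degree, research interests: artificial intelligence, quantitative trading; CHEN Xiao-yi (1997-), female, from Tai'an, Shandong, master's student, research interests: artificial intelligence, machine learning; QIAN Mei-ling (1999-), female, from Taizhou, Jiangsu, master's student, research interests: artificial intelligence, machine learning; LU Wei-zhong (1964-), male, from Suzhou, Jiangsu, associate professor, CCF member, master's degree, research interests: artificial intelligence, machine learning.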
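Abstract: Capturing and recognizing human actions by combining computer vision with video feature extraction is currently an active research topic, with many application scenarios in intelligent video surveillance, human-computer interaction in smart homes, and other fields. Human behavior detection algorithms based on traditional methods have drawbacks such as relying on too many data samples and being susceptible to environmental noise, which lowers their precision, whereas steadily developing deep learning techniques are gradually showing their advantages and can address these problems well. On this basis, this paper first introduces some commonly used behavior recognition datasets and, building on them, analyzes the current state of research on deep-learning-based human behavior recognition and detection; it then describes common human behavior recognition and detection methods and their recognition pipelines; finally, it summarizes the performance and open problems of existing behavior recognition and detection methods and discusses directions for future development.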
SHEN Jia-wei, LU Yi-ming, CHEN Xiao-yi, QIAN Mei-ling, LU Wei-zhong. Review of Research on Human Behavior Detection Methods Based on Deep Learning[J]. Computer and Modernization, 2023, 0(09): 1-9.