Review of Research on Human Behavior Detection Methods Based on Deep Learning

doi:10.3969/j.issn.1006-2475.2023.09.001

Abstract

Human behavior recognition is a long-standing research topic in computer vision and video understanding, and it is widely applied in areas such as intelligent video surveillance and human-computer interaction in smart homes. Traditional human behavior detection algorithms depend on large numbers of data samples and are susceptible to environmental noise, whereas evolving deep learning techniques are gradually showing their advantages and offer a good solution to these problems. This paper first introduces several commonly used behavior recognition datasets and analyzes the current state of research on deep-learning-based human behavior recognition. It then describes the basic pipeline of behavior recognition and the commonly used recognition methods. Finally, it summarizes the performance and open problems of existing behavior recognition methods and discusses future development directions.

Key words: deep learning, human behavior recognition, smart surveillance, behavior dataset

CLC Number: TP311

SHEN Jia-wei, LU Yi-ming, CHEN Xiao-yi, QIAN Mei-ling, LU Wei-zhong. Review of Research on Human Behavior Detection Methods Based on Deep Learning[J]. Computer and Modernization, 2023, 0(09): 1-9.

