Human Action Recognition Algorithm Based on Spatiotemporal Motion Model

doi:10.3969/j.issn.1006-2475.2025.05.014

Abstract

With the rapid development of human-computer interaction technology, efficient and accurate human action recognition has shown great application potential in fields such as virtual reality and intelligent surveillance. However, because human actions are complex and diverse, traditional recognition methods are limited. To address this, we propose a human action recognition algorithm that integrates a spatial-temporal motion model with deep learning. The method converts a depth video sequence into multi-angle depth video sequences by rotating the coordinate system, and uses an adaptive temporal model to segment each sequence into several sub-actions. Accumulating the regions of the depth images whose energy changes sharply between adjacent frames yields motion energy maps, and accumulating the regions whose energy changes little yields static energy maps; together these are referred to as the Spatial-Temporal Motion Model (STMM). A multi-channel convolutional neural network extracts dynamic and static features from the STMM, and Spatial Pyramid Histogram of Oriented Gradients (SPHOG) features extracted from the STMM complement the features of the multi-channel convolutional neural network. Adaptive moment estimation adjusts the learning rate of each parameter during training, which improves the efficiency and stability of model training, and L2 norm regularization reduces model complexity and helps prevent overfitting. Finally, a fully connected neural network classifies the actions, achieving a high recognition rate on public datasets. The experimental results show that the proposed algorithm, which integrates the spatial-temporal motion model with deep learning, is effective.
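
The accumulation of motion and static energy maps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `energy_maps`, the fixed `threshold`, the binary (count-based) accumulation, and the rule that a static pixel must be foreground (depth greater than zero) are all assumptions made for illustration. The abstract does not specify the energy measure, and the coordinate rotation and adaptive temporal segmentation are omitted here.

```python
import numpy as np


def energy_maps(frames, threshold):
    """Accumulate motion and static energy maps from one depth sequence.

    frames:    array-like of shape (T, H, W) holding depth values,
               with 0 assumed to mean background (an assumption).
    threshold: hypothetical cutoff separating "large" from "small"
               change between adjacent frames.
    Returns (motion_map, static_map), each of shape (H, W).
    """
    frames = np.asarray(frames, dtype=np.float64)
    if frames.ndim != 3 or frames.shape[0] < 2:
        raise ValueError("frames must have shape (T, H, W) with T >= 2")

    # Absolute depth change between each pair of adjacent frames: (T-1, H, W).
    diffs = np.abs(np.diff(frames, axis=0))

    # Pixels whose change exceeds the threshold count as moving.
    moving = diffs > threshold

    # Pixels that belong to the body (non-zero depth) but barely change
    # count as static.
    foreground = frames[1:] > 0
    static = ~moving & foreground

    # Accumulate over time: each map counts how often a pixel was
    # moving or static across the sequence.
    motion_map = moving.sum(axis=0)
    static_map = static.sum(axis=0)
    return motion_map, static_map


if __name__ == "__main__":
    # Tiny 3-frame, 1x3 sequence: background, a still pixel, a moving pixel.
    demo = np.array([[[0, 5, 5]], [[0, 5, 9]], [[0, 5, 5]]])
    motion, static = energy_maps(demo, threshold=2)
    print(motion.tolist(), static.tolist())
```

In this sketch, each sub-action and each rotated view would produce its own pair of maps, which could then be stacked as input channels for a multi-channel network. That stacking is likewise an assumption about how the pieces fit together.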

Keywords: motion energy map, static energy map, spatial-temporal motion model, multi-channel convolutional neural network, adaptive moment estimation, L2 norm regularization

CLC Number: TP391

XU Haining, WANG Yankun, FAN Yong, LUO Lina, GUO Jing. Human Action Recognition Algorithm Based on Spatiotemporal Motion Model[J]. Computer and Modernization, 2025(05): 103-110.
