Computer and Modernization ›› 2025, Vol. 0 ›› Issue (05): 103-110. doi: 10.3969/j.issn.1006-2475.2025.05.014


Human Action Recognition Algorithm Based on Spatiotemporal Motion Model

  

(1. Digital Consultation Centre, China Communications Construction Group Design Institute Co., Ltd., Zhengzhou 450052, China; 2. Institute of Internet of Things, Shenzhen Polytechnic University, Shenzhen 518055, China; 3. Key Laboratory of Urban Land Resources Monitoring and Simulation, MNR, Shenzhen 518034, China; 4. Institute of Artificial Intelligence, Shenzhen Polytechnic University, Shenzhen 518055, China; 5. Henan Polytechnic, Zhengzhou 450046, China)
Online: 2025-05-29 Published: 2025-05-29

Abstract: With the rapid development of human-computer interaction technology, efficient and accurate human action recognition has shown considerable application potential in fields such as virtual reality and intelligent surveillance. However, because human actions are complex and diverse, traditional recognition methods have limitations. To address this, we propose a human action recognition algorithm that integrates a spatio-temporal motion model with deep learning. The method converts a depth video sequence into multi-angle depth video sequences by rotating the coordinate system, and uses an adaptive temporal model to segment each sequence into several sub-actions. Regions of the depth images with large energy changes between adjacent frames are accumulated to form motion energy maps, and regions with small energy changes are accumulated to form static energy maps; together these are referred to as the Spatial-Temporal Motion Model (STMM). A multi-channel convolutional neural network extracts dynamic and static features from the STMM, and Spatial Pyramid Histogram of Oriented Gradients (SPHOG) features extracted from the STMM complement the features of the network. Adaptive moment estimation is used to adjust the learning rate of each parameter during training, which improves training efficiency and stability, and L2 norm regularization is used to reduce model complexity and prevent overfitting. Finally, a fully connected neural network classifies the actions, achieving a high recognition rate on public datasets. The experimental results demonstrate that the proposed algorithm, which integrates the spatio-temporal motion model with deep learning, is effective.
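The accumulation of motion and static energy maps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `energy_maps`, the fixed `threshold`, the use of per-pixel frame counts as "energy", and the rule that a static pixel must have non-zero depth are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def energy_maps(frames, threshold):
    """Sketch of motion/static energy map accumulation (illustrative only).

    frames:    depth sequence of shape (T, H, W); 0 is assumed to mean
               "no depth reading" (an assumption, not from the paper).
    threshold: hypothetical cutoff separating large from small changes.
    Returns (motion_map, static_map), each of shape (H, W).
    """
    frames = np.asarray(frames, dtype=np.float64)

    # Absolute depth change between adjacent frames, shape (T-1, H, W).
    diffs = np.abs(np.diff(frames, axis=0))

    # Pixels whose change between adjacent frames exceeds the cutoff.
    moving = diffs > threshold

    # Pixels that carry a depth value in the later frame of each pair.
    occupied = frames[1:] > 0

    # Motion energy: count, per pixel, the frame pairs with a large change.
    motion_map = moving.sum(axis=0)

    # Static energy: count, per pixel, the frame pairs where the pixel is
    # occupied but changed little (assumed definition).
    static_map = (occupied & ~moving).sum(axis=0)

    return motion_map, static_map
```

In the pipeline the abstract describes, maps of this kind would be computed for each sub-action and each rotated view before being passed to the multi-channel network; the sketch covers only the accumulation step for a single sequence.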

Key words: motion energy map, static energy map, spatial-temporal motion model, multi-channel convolutional neural network, adaptive moment estimation, L2 norm regularization
