Computer and Modernization ›› 2020, Vol. 0 ›› Issue (07): 97-103. doi: 10.3969/j.issn.1006-2475.2020.07.019


Video Action Recognition in Complex Background Based on Deep Learning

  

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China;
  2. MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, Nanjing 211106, China;
  3. Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 211106, China
  • Online: 2020-07-06 Published: 2020-07-15

Abstract: Recognizing human actions in videos has broad application prospects and great potential economic value. However, the accuracy of video action recognition is affected by factors such as swaying, background changes, camera shake, and moving shadows. To reduce the influence of such complex backgrounds, we propose the non-local temporal segment network (NLTSNet). NLTSNet builds on the temporal segment network and adds non-local modules to its ResNet backbone in order to capture the non-local spatial and temporal information contained in video clips. To further improve the network's robustness against stationary cluttered backgrounds, we integrate optical flow into the non-local module. Finally, we adopt a learnable ensemble network to fuse the predictions from the appearance and temporal modalities. Extensive experiments on the TDAP dataset show that, in complex backgrounds, the proposed method recognizes human actions more accurately than several state-of-the-art methods, without increasing time complexity.
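To make the idea of a non-local module concrete, the following is a minimal sketch of a generic non-local (self-attention) operation over space-time positions. It is an illustration under simplifying assumptions, not the NLTSNet module itself: the learned projection layers that such modules normally use are omitted, the optical-flow integration described in the abstract is not modeled, and the names `softmax`, `dot`, and `non_local` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def non_local(features):
    """Apply a simplified non-local operation with a residual connection.

    `features` is a list of N feature vectors, one per space-time position.
    Each output is y_i = x_i + sum_j softmax_j(x_i . x_j) * x_j, so every
    position aggregates information from all other positions.
    Assumption: no learned projections are applied (illustration only).
    """
    out = []
    for x_i in features:
        # Pairwise similarity of position i to every position j.
        weights = softmax([dot(x_i, x_j) for x_j in features])
        # Weighted sum of all positions' features.
        agg = [
            sum(w * x_j[d] for w, x_j in zip(weights, features))
            for d in range(len(x_i))
        ]
        # Residual connection keeps the original feature.
        out.append([a + b for a, b in zip(x_i, agg)])
    return out
```

Because each output position is a weighted sum over every input position, the response at one location can depend on distant frames and regions rather than only on a local neighborhood.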

Key words: action recognition, non-local module, temporal segment network, complex background, self-attention
