Action Recognition Technology Based on Improved C3D Neural Network

doi:10.3969/j.issn.1006-2475.2019.03.007

Abstract

Although the C3D convolutional neural network proposed by Facebook achieves good accuracy in video action recognition, its speed leaves considerable room for improvement, and the trained model is too large to deploy on mobile devices. Exploiting the fact that small convolution kernels reduce the number of parameters, this paper optimizes the existing network structure and proposes a new action recognition scheme that decomposes the 3×3×3 convolution kernel commonly used in the original C3D network into a depthwise convolution and a pointwise convolution (1×1×1 kernel). The network is trained and tested on the UCF101 and ActivityNet datasets. The results show that, compared with the original C3D network, the improved network is 2.4% more accurate and 12.9% faster, and its model size is compressed to 25.8% of the original.
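To illustrate why the decomposition shrinks the model, the following is a minimal sketch that counts the weights of one standard 3×3×3 convolution layer against a depthwise 3×3×3 convolution followed by a 1×1×1 pointwise convolution. The function names and the channel widths (64 in, 128 out) are hypothetical and chosen only for illustration; they are not taken from the paper, and biases are ignored.

```python
def standard_conv3d_params(c_in, c_out, k=3):
    """Weights in a standard k*k*k 3D convolution (bias ignored)."""
    return c_in * c_out * k ** 3


def separable_conv3d_params(c_in, c_out, k=3):
    """Weights in a depthwise k*k*k convolution followed by a
    1*1*1 pointwise convolution (bias ignored)."""
    depthwise = c_in * k ** 3   # one k*k*k filter per input channel
    pointwise = c_in * c_out    # 1*1*1 kernel that mixes channels
    return depthwise + pointwise


# Hypothetical channel widths, not taken from the paper.
c_in, c_out = 64, 128
std = standard_conv3d_params(c_in, c_out)
sep = separable_conv3d_params(c_in, c_out)
ratio = sep / std  # equals 1/c_out + 1/k**3

print(f"standard:  {std} weights")
print(f"separable: {sep} weights")
print(f"ratio:     {ratio:.4f}")
```

The per-layer ratio works out to 1/c_out + 1/27 for a 3×3×3 kernel. It applies only to a single convolution layer, so it should not be expected to match the whole-model compression figure reported above, which depends on the rest of the network.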

Key words: action recognition, convolution decomposition, recognition speed, model compression

CLC Number: TP391

LIAO Xiao-dong, JIA Xiao-xia. Action Recognition Technology Based on Improved C3D Neural Network[J]. Computer and Modernization, doi: 10.3969/j.issn.1006-2475.2019.03.007.

