[1] RUMELHART D E, HINTON G E, WILLIAMS R J. Learning representations by back-propagating errors[J]. Nature, 1986,323:533-536.
[2] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]// Proceedings of the Advances in Neural Information Processing Systems. 2012:1097-1105.
[3] XU K, BA J, KIROS R, et al. Show, attend and tell: Neural image caption generation with visual attention[C]// Proceedings of the International Conference on Machine Learning. 2015:2048-2057.
[4] BUADES A, COLL B, MOREL J M. A non-local algorithm for image denoising[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2005,2:60-65.
[5] WANG X, GIRSHICK R, GUPTA A, et al. Non-local neural networks[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018:7794-7803.
[6] WOLD S, ESBENSEN K, GELADI P. Principal component analysis[J]. Chemometrics and Intelligent Laboratory Systems, 1987,2(1-3):37-52.
[7] BURGES C J C. A tutorial on support vector machines for pattern recognition[J]. Data Mining and Knowledge Discovery, 1998,2(2):121-167.
[8] FREUND Y, SCHAPIRE R E. Experiments with a new boosting algorithm[C]// Proceedings of the International Conference on Machine Learning. 1996,96:148-156.
[9] LOWE D G. Distinctive image features from scale-invariant keypoints[J]. International Journal of Computer Vision, 2004,60(2):91-110.
[10] SCOVANNER P, ALI S, SHAH M. A 3-dimensional SIFT descriptor and its application to action recognition[C]// Proceedings of the 15th ACM International Conference on Multimedia. ACM, 2007:357-360.
[11] DALAL N, TRIGGS B. Histograms of oriented gradients for human detection[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2005:886-893.
[12] KLÄSER A, MARSZAŁEK M, SCHMID C. A spatio-temporal descriptor based on 3D-gradients[C]// Proceedings of the 19th British Machine Vision Conference (BMVC 2008). British Machine Vision Association, 2008. DOI: 10.5244/C.22.99.
[13] CHAUDHRY R, RAVICHANDRAN A, HAGER G, et al. Histograms of oriented optical flow and Binet-Cauchy kernels on nonlinear dynamical systems for the recognition of human actions[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2009:1932-1939.
[14] WANG H, KLÄSER A, SCHMID C, et al. Dense trajectories and motion boundary descriptors for action recognition[J]. International Journal of Computer Vision, 2013,103(1):60-79.
[15] WANG H, KLÄSER A, SCHMID C, et al. Action recognition by dense trajectories[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2011:3169-3176.
[16] WANG H, SCHMID C. Action recognition with improved trajectories[C]// Proceedings of the IEEE International Conference on Computer Vision. 2013:3551-3558.
[17] YANG J, JIANG Y G, HAUPTMANN A G, et al. Evaluating bag-of-visual-words representations in scene classification[C]// Proceedings of the International Workshop on Multimedia Information Retrieval. ACM, 2007:197-206.
[18] SÁNCHEZ J, PERRONNIN F, MENSINK T, et al. Image classification with the Fisher vector: Theory and practice[J]. International Journal of Computer Vision, 2013,105(3):222-245.
[19] JÉGOU H, DOUZE M, SCHMID C, et al. Aggregating local descriptors into a compact image representation[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2010:3304-3311.
[20] SIMONYAN K, ZISSERMAN A. Two-stream convolutional networks for action recognition in videos[C]// Proceedings of the Advances in Neural Information Processing Systems. 2014:568-576.
[21] FEICHTENHOFER C, PINZ A, ZISSERMAN A. Convolutional two-stream network fusion for video action recognition[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016:1933-1941.
[22] WANG L M, QIAO Y, TANG X O. Action recognition with trajectory-pooled deep-convolutional descriptors[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015:4305-4314.
[23] TRAN D, BOURDEV L, FERGUS R, et al. Learning spatiotemporal features with 3D convolutional networks[C]// Proceedings of the IEEE International Conference on Computer Vision. 2015:4489-4497.
[24] SUN L, JIA K, YEUNG D Y, et al. Human action recognition using factorized spatio-temporal convolutional networks[C]// Proceedings of the IEEE International Conference on Computer Vision. 2015:4597-4605.
[25] TRAN D, WANG H, TORRESANI L, et al. A closer look at spatiotemporal convolutions for action recognition[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018:6450-6459.
[26] LEA C, FLYNN M D, VIDAL R, et al. Temporal convolutional networks for action segmentation and detection[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017:156-165.
[27] QIU Z F, YAO T, MEI T. Learning spatio-temporal representation with pseudo-3D residual networks[C]// Proceedings of the IEEE International Conference on Computer Vision. 2017:5533-5541.
[28] CARREIRA J, ZISSERMAN A. Quo vadis, action recognition? A new model and the Kinetics dataset[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017:6299-6308.
[29] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997,9(8):1735-1780.
[30] CHUNG J, GULCEHRE C, CHO K H, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling[J]. arXiv preprint arXiv:1412.3555, 2014.
[31] DONAHUE J, HENDRICKS L A, GUADARRAMA S, et al. Long-term recurrent convolutional networks for visual recognition and description[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015:2625-2634.
[32] WANG L M, XIONG Y J, WANG Z, et al. Temporal segment networks: Towards good practices for deep action recognition[C]// Proceedings of the European Conference on Computer Vision. 2016:20-36.
[33] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016:770-778.
[34] KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks[J]. arXiv preprint arXiv:1609.02907, 2016.
[35] ZACH C, POCK T, BISCHOF H. A duality based approach for realtime TV-L1 optical flow[C]// Proceedings of the Joint Pattern Recognition Symposium. Springer, 2007:214-223.
[36] IOFFE S, SZEGEDY C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[C]// Proceedings of the International Conference on Machine Learning. 2015:448-456.
[37] SRIVASTAVA N, HINTON G, KRIZHEVSKY A, et al. Dropout: A simple way to prevent neural networks from overfitting[J]. The Journal of Machine Learning Research, 2014,15(1):1929-1958.