[1] |
RUMELHART D E, HINTON G E, WILLIAMS R J. Learning representations by back-propagating errors[J]. Nature, 1986,323:533-536.
|
[2] |
KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]// Proceedings of the Advances in Neural Information Processing Systems. 2012:1097-1105.
|
[3] |
XU K, BA J, KIROS R, et al. Show, attend and tell: Neural image caption generation with visual attention[C]// Proceedings of the International Conference on Machine Learning. 2015:2048-2057.
|
[4] |
BUADES A, COLL B, MOREL J M. A non-local algorithm for image denoising[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2005,2:60-65.
|
[5] |
WANG X, GIRSHICK R, GUPTA A, et al. Non-local neural networks[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018:7794-7803.
|
[6] |
WOLD S, ESBENSEN K, GELADI P. Principal component analysis[J]. Chemometrics and Intelligent Laboratory Systems, 1987,2(1-3):37-52.
|
[7] |
BURGES C J C. A tutorial on support vector machines for pattern recognition[J]. Data Mining and Knowledge Discovery, 1998,2(2):121-167.
|
[8] |
FREUND Y, SCHAPIRE R E. Experiments with a new boosting algorithm[C]// Proceedings of the International Conference on Machine Learning. 1996,96:148-156.
|
[9] |
LOWE D G. Distinctive image features from scale-invariant keypoints[J]. International Journal of Computer Vision, 2004,60(2):91-110.
|
[10] |
SCOVANNER P, ALI S, SHAH M. A 3-dimensional SIFT descriptor and its application to action recognition[C]// Proceedings of the 15th ACM International Conference on Multimedia. ACM, 2007:357-360.
|
[11] |
DALAL N, TRIGGS B. Histograms of oriented gradients for human detection[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2005:886-893.
|
[12] |
KLÄSER A, MARSZAŁEK M, SCHMID C. A spatio-temporal descriptor based on 3D-gradients[C]// Proceedings of the 19th British Machine Vision Conference (BMVC 2008). British Machine Vision Association, 2008. DOI: 10.5244/C.22.99.
|
[13] |
CHAUDHRY R, RAVICHANDRAN A, HAGER G, et al. Histograms of oriented optical flow and Binet-Cauchy kernels on nonlinear dynamical systems for the recognition of human actions[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2009:1932-1939.
|
[14] |
WANG H, KLÄSER A, SCHMID C, et al. Dense trajectories and motion boundary descriptors for action recognition[J]. International Journal of Computer Vision, 2013,103(1):60-79.
|
[15] |
WANG H, KLÄSER A, SCHMID C, et al. Action recognition by dense trajectories[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2011:3169-3176.
|
[16] |
WANG H, SCHMID C. Action recognition with improved trajectories[C]// Proceedings of the IEEE International Conference on Computer Vision. 2013:3551-3558.
|
[17] |
YANG J, JIANG Y G, HAUPTMANN A G, et al. Evaluating bag-of-visual-words representations in scene classification[C]// Proceedings of the International Workshop on Multimedia Information Retrieval. ACM, 2007:197-206.
|
[18] |
SÁNCHEZ J, PERRONNIN F, MENSINK T, et al. Image classification with the Fisher vector: Theory and practice[J]. International Journal of Computer Vision, 2013,105(3):222-245.
|
[19] |
JÉGOU H, DOUZE M, SCHMID C, et al. Aggregating local descriptors into a compact image representation[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2010:3304-3311.
|
[20] |
SIMONYAN K, ZISSERMAN A. Two-stream convolutional networks for action recognition in videos[C]// Proceedings of the Advances in Neural Information Processing Systems. 2014:568-576.
|
[21] |
FEICHTENHOFER C, PINZ A, ZISSERMAN A. Convolutional two-stream network fusion for video action recognition[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016:1933-1941.
|
[22] |
WANG L M, QIAO Y, TANG X O. Action recognition with trajectory-pooled deep-convolutional descriptors[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015:4305-4314.
|
[23] |
TRAN D, BOURDEV L, FERGUS R, et al. Learning spatiotemporal features with 3D convolutional networks[C]// Proceedings of the IEEE International Conference on Computer Vision. 2015:4489-4497.
|
[24] |
SUN L, JIA K, YEUNG D Y, et al. Human action recognition using factorized spatio-temporal convolutional networks[C]// Proceedings of the IEEE International Conference on Computer Vision. 2015:4597-4605.
|
[25] |
TRAN D, WANG H, TORRESANI L, et al. A closer look at spatiotemporal convolutions for action recognition[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018:6450-6459.
|
[26] |
LEA C, FLYNN M D, VIDAL R, et al. Temporal convolutional networks for action segmentation and detection[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017:156-165.
|
[27] |
QIU Z F, YAO T, MEI T. Learning spatio-temporal representation with pseudo-3D residual networks[C]// Proceedings of the IEEE International Conference on Computer Vision. 2017:5533-5541.
|
[28] |
CARREIRA J, ZISSERMAN A. Quo vadis, action recognition? A new model and the Kinetics dataset[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017:6299-6308.
|
[29] |
HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997,9(8):1735-1780.
|
[30] |
CHUNG J, GULCEHRE C, CHO K H, et al. Empirical evaluation of gated recurrent neural networks on sequence modeling[J]. arXiv preprint arXiv:1412.3555, 2014.
|
[31] |
DONAHUE J, HENDRICKS L A, GUADARRAMA S, et al. Long-term recurrent convolutional networks for visual recognition and description[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015:2625-2634.
|
[32] |
WANG L M, XIONG Y J, WANG Z, et al. Temporal segment networks: Towards good practices for deep action recognition[C]// Proceedings of the European Conference on Computer Vision. 2016:20-36.
|
[33] |
HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016:770-778.
|
[34] |
KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks[J]. arXiv preprint arXiv:1609.02907, 2016.
|
[35] |
ZACH C, POCK T, BISCHOF H. A duality based approach for realtime TV-L1 optical flow[C]// Proceedings of the Joint Pattern Recognition Symposium. Springer, 2007:214-223.
|
[36] |
IOFFE S, SZEGEDY C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[C]// Proceedings of the International Conference on Machine Learning. 2015:448-456.
|
[37] |
SRIVASTAVA N, HINTON G, KRIZHEVSKY A, et al. Dropout: A simple way to prevent neural networks from overfitting[J]. The Journal of Machine Learning Research, 2014,15(1):1929-1958.
|