[1] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]// Proceedings of the 2012 International Conference on Neural Information Processing Systems. 2012:1097-1105.
[2] DENG J, DONG W, SOCHER R, et al. ImageNet: A large-scale hierarchical image database[C]// Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition. 2009:248-255.
[3] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[C]// Proceedings of the 2015 International Conference on Learning Representations. 2015:212-219.
[4] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. 2016:770-778.
[5] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: Transformers for image recognition at scale[C]// Proceedings of the 2021 International Conference on Learning Representations. 2021:321-326.
[6] RIQUELME C, PUIGCERVER J, MUSTAFA B, et al. Scaling vision with sparse mixture of experts[C]// Proceedings of the 2021 International Conference on Neural Information Processing Systems. 2021:1097-1105.
[7] LIU Z, LIN Y T, CAO Y, et al. Swin transformer: Hierarchical vision transformer using shifted windows[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. 2021:9992-10002.
[8] STRUDEL R, GARCIA R, LAPTEV I, et al. Segmenter: Transformer for semantic segmentation[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. 2021:7242-7252.
[9] TAY Y, DEHGHANI M, BAHRI D, et al. Efficient transformers: A survey[J]. arXiv preprint arXiv:2009.06732, 2020.
[10] WANG S N, LI B Z, KHABSA M, et al. Linformer: Self-attention with linear complexity[J]. arXiv preprint arXiv:2006.04768, 2020.
[11] CHOROMANSKI K, LIKHOSHERSTOV V, DOHAN D, et al. Rethinking attention with performers[C]// Proceedings of the 2021 International Conference on Learning Representations. 2021:3-7.
[12] KITAEV N, KAISER L, LEVSKAYA A. Reformer: The efficient transformer[C]// Proceedings of the 2020 International Conference on Learning Representations. 2020:1-7.
[13] CHILD R, GRAY S, RADFORD A, et al. Generating long sequences with sparse transformers[J]. arXiv preprint arXiv:1904.10509, 2019.
[14] BHOJANAPALLI S, YUN C, RAWAT A S, et al. Low-rank bottleneck in multi-head attention models[C]// Proceedings of the 2020 International Conference on Machine Learning. 2020:864-873.
[15] YU X Y, LIU T L, WANG X C, et al. On compressing deep models by low rank and sparse decomposition[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. 2017:67-76.
[16] WU J X, LENG C, WANG Y H, et al. Quantized convolutional neural networks for mobile devices[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. 2016:4820-4828.
[17] HINTON G, VINYALS O, DEAN J. Distilling the knowledge in a neural network[J]. arXiv preprint arXiv:1503.02531, 2015.
[18] HANSON S J, PRATT L Y. Comparing biases for minimal network construction with back-propagation[C]// Proceedings of the 1st International Conference on Neural Information Processing Systems. 1988:177-185.
[19] WU H P, XIAO B, CODELLA N, et al. CvT: Introducing convolutions to vision transformers[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. 2021:22-31.
[20] BELTAGY I, PETERS M E, COHAN A. Longformer: The long-document transformer[J]. arXiv preprint arXiv:2004.05150, 2020.
[21] TAY Y, BAHRI D, METZLER D, et al. Synthesizer: Rethinking self-attention for transformer models[C]// Proceedings of the 38th International Conference on Machine Learning. 2021:10183-10192.
[22] DAI Z H, YANG Z L, YANG Y M, et al. Transformer-XL: Attentive language models beyond a fixed-length context[J]. arXiv preprint arXiv:1901.02860, 2019.
[23] RAE J W, POTAPENKO A, JAYAKUMAR S M, et al. Compressive transformers for long-range sequence modelling[C]// Proceedings of the 2020 International Conference on Learning Representations. 2020:26-30.
[24] TAI C, XIAO T, ZHANG Y, et al. Convolutional neural networks with low-rank regularization[J]. arXiv preprint arXiv:1511.06067, 2015.
[25] GONG Y C, LIU L, YANG M, et al. Compressing deep convolutional networks using vector quantization[J]. arXiv preprint arXiv:1412.6115, 2014.
[26] VANHOUCKE V, SENIOR A, MAO M Z. Improving the speed of neural networks on CPUs[EB/OL]. (2011-12-15)[2021-10-02]. https://research.google/pubs/pub37631/.
[27] ROMERO A, BALLAS N, KAHOU S E, et al. FitNets: Hints for thin deep nets[J]. arXiv preprint arXiv:1412.6550, 2014.
[28] MOZER M C, JORDAN M I, PETSCHE T. Advances in Neural Information Processing Systems 9[M]. MIT Press, 1997.
[29] HASSIBI B, STORK D G. Second order derivatives for network pruning: Optimal brain surgeon[C]// Proceedings of the 1992 International Conference on Neural Information Processing Systems. 1992:164-171.
[30] SRINIVAS S, BABU R V. Data-free parameter pruning for deep neural networks[C]// Proceedings of the 2015 British Machine Vision Conference. 2015. DOI: 10.5244/C.29.31.
[31] DENG L. The MNIST database of handwritten digit images for machine learning research [best of the Web][J]. IEEE Signal Processing Magazine, 2012,29(6):141-142.
[32] KRIZHEVSKY A. Learning multiple layers of features from tiny images[R]. University of Toronto, 2009.
[33] GRIFFIN G, HOLUB A, PERONA P. Caltech-256 object category dataset[EB/OL]. (2007-12-15)[2021-12-15]. https://authors.library.caltech.edu/7694/.
[34] SEHWAG V, WANG S Q, MITTAL P, et al. HYDRA: Pruning adversarially robust neural networks[C]// Proceedings of the 2020 International Conference on Neural Information Processing Systems. 2020:97-105.
[35] LIU N, MA X L, XU Z Y, et al. AutoCompress: An automatic DNN structured pruning framework for ultra-high compression rates[C]// Proceedings of the 2020 AAAI Conference on Artificial Intelligence. 2020:4876-4883.
[36] SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: Visual explanations from deep networks via gradient-based localization[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. 2017:618-626.