[1] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]// Proceedings of the 2012 International Conference on Neural Information Processing Systems. 2012:1097-1105.
[2] DENG J, DONG W, SOCHER R, et al. ImageNet: A large-scale hierarchical image database[C]// Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition. 2009:248-255.
[3] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[C]// Proceedings of the 2015 International Conference on Learning Representations. 2015:212-219.
[4] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. 2016:770-778.
[5] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: Transformers for image recognition at scale[C]// Proceedings of the 2021 International Conference on Learning Representations. 2021:321-326.
[6] RIQUELME C, PUIGCERVER J, MUSTAFA B, et al. Scaling vision with sparse mixture of experts[C]// Proceedings of the 2021 International Conference on Neural Information Processing Systems. 2021:1097-1105.
[7] LIU Z, LIN Y T, CAO Y, et al. Swin transformer: Hierarchical vision transformer using shifted windows[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. 2021:9992-10002.
[8] STRUDEL R, GARCIA R, LAPTEV I, et al. Segmenter: Transformer for semantic segmentation[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. 2021:7242-7252.
[9] TAY Y, DEHGHANI M, BAHRI D, et al. Efficient transformers: A survey[J]. arXiv preprint arXiv:2009.06732, 2020.
[10] WANG S N, LI B Z, KHABSA M, et al. Linformer: Self-attention with linear complexity[J]. arXiv preprint arXiv:2006.04768, 2020.
[11] CHOROMANSKI K, LIKHOSHERSTOV V, DOHAN D, et al. Rethinking attention with performers[C]// Proceedings of the 2021 International Conference on Learning Representations. 2021:3-7.
[12] KITAEV N, KAISER L, LEVSKAYA A. Reformer: The efficient transformer[C]// Proceedings of the 2020 International Conference on Learning Representations. 2020:1-7.
[13] CHILD R, GRAY S, RADFORD A, et al. Generating long sequences with sparse transformers[J]. arXiv preprint arXiv:1904.10509, 2019.
[14] BHOJANAPALLI S, YUN C, RAWAT A S, et al. Low-rank bottleneck in multi-head attention models[C]// Proceedings of the 2020 International Conference on Machine Learning. 2020:864-873.
[15] YU X Y, LIU T L, WANG X C, et al. On compressing deep models by low rank and sparse decomposition[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. 2017:67-76.
[16] WU J X, LENG C, WANG Y H, et al. Quantized convolutional neural networks for mobile devices[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. 2016:4820-4828.
[17] HINTON G, VINYALS O, DEAN J. Distilling the knowledge in a neural network[J]. arXiv preprint arXiv:1503.02531, 2015.
[18] HANSON S J, PRATT L Y. Comparing biases for minimal network construction with back-propagation[C]// Proceedings of the 1st International Conference on Neural Information Processing Systems. 1988:177-185.
[19] WU H P, XIAO B, CODELLA N, et al. CvT: Introducing convolutions to vision transformers[C]// Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision. 2021:22-31.
[20] BELTAGY I, PETERS M E, COHAN A. Longformer: The long-document transformer[J]. arXiv preprint arXiv:2004.05150, 2020.
[21] TAY Y, BAHRI D, METZLER D, et al. Synthesizer: Rethinking self-attention for transformer models[C]// Proceedings of the 38th International Conference on Machine Learning. 2021:10183-10192.
[22] DAI Z H, YANG Z L, YANG Y M, et al. Transformer-XL: Attentive language models beyond a fixed-length context[J]. arXiv preprint arXiv:1901.02860, 2019.
[23] RAE J W, POTAPENKO A, JAYAKUMAR S M, et al. Compressive transformers for long-range sequence modelling[C]// Proceedings of the 2020 International Conference on Learning Representations. 2020:26-30.
[24] TAI C, XIAO T, ZHANG Y, et al. Convolutional neural networks with low-rank regularization[J]. arXiv preprint arXiv:1511.06067, 2015.
[25] GONG Y C, LIU L, YANG M, et al. Compressing deep convolutional networks using vector quantization[J]. arXiv preprint arXiv:1412.6115, 2014.
[26] VANHOUCKE V, SENIOR A, MAO M Z. Improving the speed of neural networks on CPUs[EB/OL]. (2011-12-15)[2021-10-02]. https://research.google/pubs/pub37631/.
[27] ROMERO A, BALLAS N, KAHOU S E, et al. FitNets: Hints for thin deep nets[J]. arXiv preprint arXiv:1412.6550, 2014.
[28] MOZER M C, JORDAN M I, PETSCHE T. Advances in Neural Information Processing Systems 9[M]. Morgan Kaufmann Publishers, 1997.
[29] HASSIBI B, STORK D G. Second order derivatives for network pruning: Optimal brain surgeon[C]// Proceedings of the 1992 International Conference on Neural Information Processing Systems. 1992:164-171.
[30] SRINIVAS S, BABU R V. Data-free parameter pruning for deep neural networks[C]// Proceedings of the 2015 British Machine Vision Conference. 2015. DOI: 10.5244/C.29.31.
[31] DENG L. The MNIST database of handwritten digit images for machine learning research [best of the Web][J]. IEEE Signal Processing Magazine, 2012,29(6):141-142.
[32] KRIZHEVSKY A. Learning multiple layers of features from tiny images[R]. University of Toronto, 2009.
[33] GRIFFIN G, HOLUB A, PERONA P. Caltech-256 object category dataset[EB/OL]. (2007-12-15)[2021-12-15]. https://authors.library.caltech.edu/7694/?ref=https://githubhelp.com.
[34] SEHWAG V, WANG S Q, MITTAL P, et al. HYDRA: Pruning adversarially robust neural networks[C]// Proceedings of the 2020 International Conference on Neural Information Processing Systems. 2020:97-105.
[35] LIU N, MA X L, XU Z Y, et al. AutoCompress: An automatic DNN structured pruning framework for ultra-high compression rates[C]// Proceedings of the 2020 AAAI Conference on Artificial Intelligence. 2020:4876-4883.
[36] SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: Visual explanations from deep networks via gradient-based localization[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. 2017:618-626.