Computer and Modernization ›› 2021, Vol. 0 ›› Issue (12): 13-18.

• Algorithm Design and Analysis •

Channel Pruning of Convolutional Neural Network Based on Transfer Learning

  1. (North China Institute of Computing Technology, Beijing 100083, China)
  • Online: 2021-12-24  Published: 2021-12-24
  • About the author: FENG Jingxiang (born 1997), male, from Harbin, Heilongjiang, is a master's student; his research interests include deep neural networks and model compression. E-mail: qingkong10568@163.com.
  • Funding:
    National Natural Science Foundation of China (U19B2019)



Abstract: Convolutional neural networks are widely used in fields such as computer vision, but their large parameter counts and heavy computational cost leave many edge devices unable to meet their storage and computing requirements. To ease edge deployment, a transfer learning strategy is introduced to improve the sparsification stage of the channel pruning method based on the scaling factors of BN layers. The effects of different levels of transfer on the degree of sparsification and on the tolerance of the channel pruning threshold are compared, and experiments designed from a neural architecture search (NAS) viewpoint explore the accuracy-preserving limit of pruning and the convergence of the iterated structures. The results show that, compared with the original model, the channel pruning algorithm with transfer learning reduces the number of parameters by 89.1% and the model storage size by 89.3% with an accuracy loss of no more than 0.10; compared with the original pruning method, it raises the pruning threshold from 0.85 to 0.97, reducing parameters by a further 42.6%. The experiments show that introducing the transfer strategy makes sufficient sparsification easier to achieve, widens the tolerance for selecting the channel pruning threshold, and yields a higher compression rate; in the iterative-pruning architecture search process, it also provides a more efficient starting point, helping the iteration converge quickly to a locally optimal network structure in the search space.
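The channel pruning method the abstract builds on ranks the learned BN scaling factors (gamma) of all channels globally and removes those that fall below a percentile threshold. The following is a minimal pure-Python sketch of that selection step only; the layer names and gamma values are hypothetical, not taken from the paper:

```python
# Sketch of BN scaling-factor channel selection (Network Slimming style).
# All layer names and gamma values below are invented for illustration.

def select_channels(bn_gammas, prune_ratio):
    """Rank the |gamma| values of all BN layers globally, set the threshold
    at the given prune ratio, and return the surviving channel indices
    per layer (channels with |gamma| above the threshold)."""
    all_gammas = sorted(g for layer in bn_gammas.values() for g in layer)
    cut = min(int(len(all_gammas) * prune_ratio), len(all_gammas) - 1)
    threshold = all_gammas[cut]
    kept = {name: [i for i, g in enumerate(layer) if g > threshold]
            for name, layer in bn_gammas.items()}
    return threshold, kept

# Toy |gamma| values after sparsity training: most are driven toward zero,
# so even an aggressive prune ratio keeps the informative channels.
gammas = {
    "conv1.bn": [0.91, 0.002, 0.45, 0.001, 0.30],
    "conv2.bn": [0.003, 0.88, 0.004, 0.52, 0.001],
}

for ratio in (0.6, 0.8):
    thr, kept = select_channels(gammas, ratio)
    n_kept = sum(len(v) for v in kept.values())
    print(f"prune ratio {ratio}: threshold {thr:.3f}, kept {n_kept}/10 channels")
```

Note that at the higher ratio the second toy layer loses all of its channels, which is exactly why the tolerance of the threshold matters: the paper's claim is that sparsity training initialized by transfer learning pushes uninformative gammas closer to zero, so the global threshold can be raised (0.85 to 0.97 in the experiments) without emptying layers or losing accuracy.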

Key words: convolutional neural network, transfer learning, channel pruning, neural architecture search