Computer and Modernization ›› 2022, Vol. 0 ›› Issue (04): 38-44.

• Image Processing •

Wheat Image Recognition Based on Global Self-attention

  

  1. (College of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China)
  • Online: 2022-05-07  Published: 2022-05-07
  • About the authors: HE Chenxi (b. 1996), male, from Xinyang, Henan, M.S. candidate; research interests: computer vision, E-mail: 1824413744@qq.com. Corresponding author: WANG Zhengyong (b. 1969), female, from Chengdu, Sichuan, associate professor, master's supervisor, Ph.D.; research interests: image processing and pattern recognition, communication and information processing, computer vision, E-mail: wangzheny@scu.edu.cn. QING Linbo (b. 1982), male, from Chengdu, Sichuan, associate professor, doctoral supervisor, Ph.D.; research interests: multimedia communication and information systems, artificial intelligence and computer vision, embedded systems, E-mail: qing_lb@scu.edu.cn. HE Xiaohai (b. 1964), male, from Mianyang, Sichuan, professor, doctoral supervisor, Ph.D.; research interests: image processing, pattern recognition, image communication, E-mail: nic5602@scu.edu.cn. WU Xiaoqiang (b. 1971), male, from Chengdu, Sichuan, senior engineer, M.S.; research interests: computer applications and pattern recognition, E-mail: 2396480@qq.com.
  • Supported by:
    National Natural Science Foundation of China (61871278)

Abstract: In real-world application scenarios, identifying wheat diseases and pests through image recognition is highly challenging. Compared with previous methods based purely on convolutional neural networks (CNN), converting wheat images into a sequence of visual words and recognizing wheat from a global perspective is more feasible and practical. The Convolutional Visual Transformers (CVT) approach to wheat recognition works in two stages. First, the two feature maps generated by a two-branch CNN are combined by Attentional Selective Fusion (ASF): ASF fuses the two features with global-local attention to capture discriminative information and projects the result into a sequence of visual words. Second, inspired by the success of Transformers in natural language processing, global self-attention is used to model the relationships among these visual words. Compared with the classical classification networks LeNet-5, ResNet-18, VGG-16 and EfficientNet, CVT achieves a higher recognition rate, and the method shows good generalization ability.

Key words: wheat recognition, global-local attention, Transformer, global self-attention
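
A minimal sketch of the two-stage pipeline described in the abstract is given below (PyTorch). It assumes ResNet-18 stems for the two CNN branches, a sigmoid-gated global-local fusion for ASF, and a standard Transformer encoder for the global self-attention stage; the module names (AttentionalSelectiveFusion, CVTSketch), backbones, and hyperparameters are illustrative assumptions for exposition, not the authors' exact implementation.

# Sketch: two-branch CNN features -> ASF (global-local attention fusion)
# -> sequence of visual words -> Transformer encoder (global self-attention)
# -> classification head. Assumes PyTorch >= 1.9 and torchvision >= 0.13.
import torch
import torch.nn as nn
import torchvision.models as models


class AttentionalSelectiveFusion(nn.Module):
    """Fuse two feature maps with global-local attention (assumed design)."""

    def __init__(self, channels):
        super().__init__()
        # Local attention: 1x1 convolutions applied at every spatial position.
        self.local_att = nn.Sequential(
            nn.Conv2d(channels, channels // 4, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1),
        )
        # Global attention: spatially pooled context through the same bottleneck.
        self.global_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1),
        )

    def forward(self, feat_a, feat_b):
        summed = feat_a + feat_b
        # Sigmoid gate built from local + global context selects between branches.
        weight = torch.sigmoid(self.local_att(summed) + self.global_att(summed))
        return weight * feat_a + (1.0 - weight) * feat_b


class CVTSketch(nn.Module):
    """Two-branch CNN + ASF + global self-attention classifier (sketch)."""

    def __init__(self, num_classes, embed_dim=256, depth=4, num_heads=8):
        super().__init__()

        # Two CNN branches; ResNet-18 stems are assumed here for illustration.
        def make_branch():
            backbone = models.resnet18(weights=None)
            return nn.Sequential(*list(backbone.children())[:-2])  # drop pool/fc

        self.branch_a = make_branch()
        self.branch_b = make_branch()
        self.asf = AttentionalSelectiveFusion(512)
        # Project the fused feature map into "visual words" (one token per position);
        # positional embeddings are omitted for brevity.
        self.proj = nn.Conv2d(512, embed_dim, kernel_size=1)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        # A standard Transformer encoder supplies the global self-attention.
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        fused = self.asf(self.branch_a(x), self.branch_b(x))   # (B, 512, H, W)
        tokens = self.proj(fused).flatten(2).transpose(1, 2)   # (B, H*W, D)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1)
        encoded = self.encoder(tokens)                          # global self-attention
        return self.head(encoded[:, 0])                         # classify from CLS token


if __name__ == "__main__":
    model = CVTSketch(num_classes=5)           # number of wheat classes is assumed
    logits = model(torch.randn(2, 3, 224, 224))
    print(logits.shape)                         # torch.Size([2, 5])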