Spam Recognition Method Based on BiGRU-Attention-CNN Model

Abstract

E-mail is an important communication tool, but spam continues to disrupt people's daily work and lives. Improving spam detection in both speed and accuracy therefore has significant research and practical value. Bidirectional gated recurrent units (BiGRU) and convolutional neural networks (CNN) are widely used in text classification, and combining them exploits both BiGRU's ability to capture contextual dependencies and CNN's ability to extract features. For spam recognition, however, certain specific words in an e-mail also need to be taken into account. This paper therefore proposes a spam recognition method based on a BiGRU-Attention-CNN model to improve detection accuracy. The model first converts the e-mail text into feature vectors and learns a sequential representation with a BiGRU, then applies an attention mechanism (Attention) that assigns greater weight to specific words. The output of the attention layer is fed into a CNN, where convolution, pooling, and a fully connected layer produce the final classification result. Tested on the Trec06c e-mail dataset and compared with other models, the proposed model achieves better results, reaching a final accuracy of 91.62%.
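The abstract describes the pipeline only at a high level. The following is a minimal NumPy sketch of a single forward pass under stated assumptions: randomly initialised weights stand in for trained parameters and for pretrained word vectors; the attention step is a common additive form that rescales each BiGRU state (keeping the sequence so the CNN can convolve over it) rather than the authors' exact formulation; and all class names and dimensions (`d_emb`, `d_h`, `n_filters`, `width`) are hypothetical. It illustrates how data flows between the stages; it is not the authors' implementation and performs no training.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()


class GRU:
    """One-direction GRU with Cho et al. (2014) gating; random weights, illustration only."""

    def __init__(self, d_in, d_h, rng):
        # Slice 0: update gate z, slice 1: reset gate r, slice 2: candidate state
        self.W = rng.normal(0.0, 0.1, (3, d_h, d_in))
        self.U = rng.normal(0.0, 0.1, (3, d_h, d_h))
        self.b = np.zeros((3, d_h))
        self.d_h = d_h

    def run(self, xs):
        h = np.zeros(self.d_h)
        states = []
        for x in xs:
            z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])
            r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])
            cand = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])
            h = z * h + (1.0 - z) * cand
            states.append(h)
        return np.stack(states)  # (T, d_h)


class BiGRUAttentionCNN:
    """Hypothetical forward pass: embedding -> BiGRU -> attention -> CNN -> softmax."""

    def __init__(self, vocab_size, d_emb=16, d_h=8, n_filters=4, width=3, seed=0):
        rng = np.random.default_rng(seed)
        # Random stand-in for pretrained word vectors (assumption)
        self.emb = rng.normal(0.0, 0.1, (vocab_size, d_emb))
        self.fwd = GRU(d_emb, d_h, rng)
        self.bwd = GRU(d_emb, d_h, rng)
        d = 2 * d_h
        self.W_a = rng.normal(0.0, 0.1, (d, d))   # attention projection
        self.b_a = np.zeros(d)
        self.v_a = rng.normal(0.0, 0.1, d)        # attention context vector
        self.width = width
        self.W_c = rng.normal(0.0, 0.1, (n_filters, width * d))  # 1-D conv filters
        self.b_c = np.zeros(n_filters)
        self.W_o = rng.normal(0.0, 0.1, (2, n_filters))  # fully connected: ham/spam
        self.b_o = np.zeros(2)

    def forward(self, token_ids):
        if len(token_ids) < self.width:
            raise ValueError("e-mail shorter than the convolution width")
        x = self.emb[token_ids]                          # (T, d_emb)
        h_fwd = self.fwd.run(x)                          # left-to-right states
        h_bwd = self.bwd.run(x[::-1])[::-1]              # right-to-left, re-aligned
        h = np.concatenate([h_fwd, h_bwd], axis=1)       # (T, 2*d_h)
        # Additive attention: one scalar score per word, normalised by softmax
        scores = np.tanh(h @ self.W_a.T + self.b_a) @ self.v_a
        alpha = softmax(scores)                          # (T,)
        weighted = h * alpha[:, None]                    # re-weighted sequence (T, 2*d_h)
        # Convolution over sliding windows of `width` consecutive words
        n_win = weighted.shape[0] - self.width + 1
        windows = np.stack([weighted[t:t + self.width].ravel() for t in range(n_win)])
        conv = np.maximum(0.0, windows @ self.W_c.T + self.b_c)  # ReLU feature maps
        pooled = conv.max(axis=0)                        # max pooling over time
        probs = softmax(self.W_o @ pooled + self.b_o)    # [P(ham), P(spam)]
        return probs, alpha


if __name__ == "__main__":
    model = BiGRUAttentionCNN(vocab_size=100)
    probs, alpha = model.forward([5, 17, 42, 8, 99, 3])
    print("P(ham), P(spam):", probs)
    print("attention weights:", alpha)
```

Because `alpha` is a probability distribution over the input words, inspecting it after training is one way to see which words the model emphasises; with the random weights used here the weights carry no meaning, and the ham/spam ordering of the output is an arbitrary choice of this sketch.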

Key words: BiGRU, attention, CNN, spam recognition

ZHAO Yu-xuan, HU Huai-xiang. Spam Recognition Method Based on BiGRU-Attention-CNN Model[J]. Computer and Modernization, 2021(4): 122-126.
