Computer and Modernization ›› 2021, Vol. 0 ›› Issue (04): 122-126.

• Information Security •

Spam Recognition Method Based on BiGRU-Attention-CNN Model

  1. (Basic Department 1, North China Institute of Computing Technology, Beijing 100083, China)
  • Online: 2021-04-22 Published: 2021-04-25
  • About the authors: Zhao Yuxuan (赵宇轩, b. 1997), male, from Changchun, Jilin; master's student; research interests: network security and computer architecture; E-mail: nsczyx@gmail.com. Hu Huaixiang (胡怀湘, b. 1965), male; research fellow; research interests: computer architecture and network storage; E-mail: huaixianghu@163.com.

Abstract: E-mail is an important communication tool, but spam continues to disrupt people's daily work and life. Continuously improving spam detection techniques, and raising both the speed and the accuracy of detection, is therefore of research and practical significance. The bidirectional gated recurrent unit (BiGRU) and the convolutional neural network (CNN) are both widely used in text classification, and combining them exploits BiGRU's ability to capture contextual dependencies together with CNN's ability to extract features. Spam detection, however, also depends on certain specific words in an email. This paper therefore proposes a spam detection method based on a BiGRU-Attention-CNN model to improve detection accuracy. The model first converts the email text into feature vectors and learns over the sequence with a BiGRU. An attention mechanism then assigns greater weight to specific words, and the output of the attention layer is fed into a CNN, where convolution, pooling, and a fully connected layer produce the classification result. In experiments on the Trec06c email dataset, the model outperforms the other models it is compared with and reaches a final accuracy of 91.62%.

Key words: bidirectional gated recurrent unit (BiGRU), attention mechanism, convolutional neural network (CNN), spam recognition
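
The abstract gives the order of the stages (BiGRU, attention, convolution, pooling, fully connected layer) but not their exact formulation. The following is a minimal Python sketch of the stages after the BiGRU, not the authors' implementation. It makes several assumptions that are not in the paper: the BiGRU outputs are passed in as ready-made vectors, attention scores are a dot product with a hypothetical learned vector `query`, the attention layer rescales each time step and keeps the sequence length, and the CNN has a single filter followed by global max pooling and one sigmoid output unit.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_reweight(hidden_states, query):
    """Scale each time step by its attention weight.

    hidden_states: one vector per token, standing in for BiGRU outputs.
    query: hypothetical learned scoring vector (an assumption of this sketch).
    The sequence length is preserved so a convolution can slide over it.
    """
    scores = [sum(x * q for x, q in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    reweighted = [[w * x for x in h] for w, h in zip(weights, hidden_states)]
    return weights, reweighted


def conv1d_relu(sequence, kernel, bias=0.0):
    """Valid 1-D convolution over time with one filter, followed by ReLU.

    kernel: list of vectors, one per position in the window.
    Requires len(sequence) >= len(kernel).
    """
    width = len(kernel)
    outputs = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        total = bias + sum(
            x * k for vec, kvec in zip(window, kernel) for x, k in zip(vec, kvec)
        )
        outputs.append(max(0.0, total))
    return outputs


def spam_probability(hidden_states, query, kernel, fc_weight, fc_bias):
    """Attention -> convolution -> global max pooling -> sigmoid unit."""
    _, reweighted = attention_reweight(hidden_states, query)
    pooled = max(conv1d_relu(reweighted, kernel))  # global max pooling
    return 1.0 / (1.0 + math.exp(-(fc_weight * pooled + fc_bias)))


# Toy input: three tokens with two-dimensional hidden states (made-up values).
hidden = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]
weights, _ = attention_reweight(hidden, query=[1.0, 1.0])
prob = spam_probability(
    hidden, query=[1.0, 1.0], kernel=[[1.0, 1.0], [1.0, 1.0]],
    fc_weight=1.0, fc_bias=0.0,
)
```

In the toy input the second token has the largest score, so it receives the largest attention weight and dominates the convolution windows that contain it. In a trained model, `query`, `kernel`, `fc_weight`, and `fc_bias` would be learned rather than fixed by hand.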