Computer and Modernization ›› 2010, Vol. 1 ›› Issue (10): 125-128. doi: 10.3969/j.issn.1006-2475.2010.10.033

• Networks and Communications •

An Improved Bayesian Mail Filtering Algorithm

XIA Chao (夏超), XU De-hua (徐德华)

  1. College of Economics and Management, Tongji University, Shanghai 200092, China
  • Received: 2010-08-12 Online: 2010-10-21 Published: 2010-10-21

Abstract: Bayesian filtering is one of the most widely used anti-spam techniques. Because the two kinds of misclassification cost the recipient differently, the paper introduces a factor λ as the final classification criterion: a message is judged to be spam only if the evidence that it is spam is λ times the evidence that it is legitimate mail. To further improve classification quality, a genetic algorithm is used to distinguish the differing importance of feature words in the message body and in the subject line. Finally, the improved algorithm is validated on real mail samples; the results show that Bayesian filtering combined with genetic-algorithm optimization effectively improves the quality of mail classification.

Key words: Bayesian filtering, anti-spam, genetic algorithm
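
The decision rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the formulas, so the naive Bayes model, the Laplace smoothing, the function names (`train_filter`, `log_score`, `classify_spam`), and the default values of `lam`, `w_subject` and `w_body` are all assumptions made for illustration. The two weights stand in for the kind of parameters a genetic algorithm might tune; the search itself is not shown.

```python
import math
from collections import Counter


def train_filter(messages):
    """Count field-tagged tokens per class.

    messages: iterable of (subject_tokens, body_tokens, is_spam).
    Assumes at least one spam and one legitimate message are present.
    """
    counts = {True: Counter(), False: Counter()}
    n_docs = {True: 0, False: 0}
    for subject, body, label in messages:
        n_docs[label] += 1
        # Tag each token with its field so subject and body words
        # can be weighted separately at classification time.
        counts[label].update(("subject", t) for t in subject)
        counts[label].update(("body", t) for t in body)
    vocab = set(counts[True]) | set(counts[False])
    return counts, n_docs, vocab


def log_score(model, subject, body, label, w_subject, w_body):
    """Weighted naive Bayes log-score of one class (hypothetical weighting)."""
    counts, n_docs, vocab = model
    total = sum(counts[label].values())
    score = math.log(n_docs[label] / (n_docs[True] + n_docs[False]))
    for field, tokens, weight in (("subject", subject, w_subject),
                                  ("body", body, w_body)):
        for t in tokens:
            # Laplace smoothing; the extra 1 reserves mass for unseen tokens.
            p = (counts[label][(field, t)] + 1) / (total + len(vocab) + 1)
            score += weight * math.log(p)
    return score


def classify_spam(model, subject, body, lam=9.0, w_subject=2.0, w_body=1.0):
    """Return True only if the spam score is at least lam times the ham score."""
    spam = log_score(model, subject, body, True, w_subject, w_body)
    ham = log_score(model, subject, body, False, w_subject, w_body)
    # score(spam) >= lam * score(ham)  <=>  log difference >= log(lam)
    return spam - ham >= math.log(lam)


if __name__ == "__main__":
    training = [
        (["free", "prize"], ["win", "money", "now"], True),
        (["cheap", "prize"], ["win", "cash", "free"], True),
        (["meeting", "agenda"], ["project", "schedule", "review"], False),
        (["lunch"], ["see", "you", "at", "noon"], False),
    ]
    model = train_filter(training)
    print(classify_spam(model, ["free", "prize"], ["win", "money"]))
    print(classify_spam(model, ["meeting"], ["project", "review"]))
```

Raising `lam` makes the filter more conservative about labelling mail as spam, which reflects the abstract's point that losing a legitimate message costs more than letting a spam message through.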

CLC Number: