Computer and Modernization (计算机与现代化)

• Network and Communication

A Spam Filtering Algorithm Based on Conditional Entropy

    (1. Bohai University, Jinzhou 121000, Liaoning, China; 2. Shenyang University, Shenyang 110044, Liaoning, China)
  • Received: 2013-10-16 Online: 2014-02-14 Published: 2014-02-14
  • About the authors: Zhai Junchang (翟军昌, b. 1978), male, from Donggang, Liaoning; lecturer at Bohai University and doctoral candidate; research interest: machine learning. Che Weiwei (车伟伟, b. 1980), female, from Dandong, Liaoning; associate professor at Shenyang University, Ph.D.; research interest: quantized control.
  • Supported by:
    National Natural Science Foundation of China (61104106)

Abstract: To address the misclassification of legitimate email by spam filters, this paper proposes an improved spam filtering algorithm. The algorithm refines the conditional entropy estimate used in computing information gain and combines it with the minimum-risk Bayesian decision method. Experiments are carried out on an English corpus, and the algorithm is evaluated by recall and accuracy. The results show that the improved method strengthens the filter's ability to recognize legitimate email and reduces how often legitimate email is misjudged as spam, which lowers the loss to users.

Key words: spam, information gain, conditional entropy, minimum risk
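
For readers unfamiliar with the two ingredients named in the abstract, the following is a minimal sketch of the textbook versions: information gain computed as H(C) - H(C|T) for a binary term-presence feature, and a minimum-risk Bayes decision between "spam" and "ham". It is not the paper's improved conditional entropy estimate, which the abstract does not spell out. The function names, the count-based probability estimates, and the 9:1 cost ratio are all illustrative assumptions.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(spam_with, ham_with, spam_without, ham_without):
    """IG(C; T) = H(C) - H(C | T) for a binary term-presence feature T.

    Arguments are message counts (hypothetical inputs): spam/ham messages
    that contain the term, and spam/ham messages that do not.
    """
    n_with = spam_with + ham_with
    n_without = spam_without + ham_without
    total = n_with + n_without
    p_spam = (spam_with + spam_without) / total
    h_class = entropy([p_spam, 1 - p_spam])

    # Conditional entropy: class entropy within each term condition,
    # weighted by how often that condition occurs.
    h_cond = 0.0
    for n_spam, n_cond in ((spam_with, n_with), (spam_without, n_without)):
        if n_cond:
            q = n_spam / n_cond
            h_cond += (n_cond / total) * entropy([q, 1 - q])
    return h_class - h_cond


def min_risk_label(p_spam, cost_ham_as_spam=9.0, cost_spam_as_ham=1.0):
    """Choose the label with the lower expected loss.

    The default costs are an assumed example in which blocking a
    legitimate message is nine times worse than letting spam through.
    """
    risk_if_spam = cost_ham_as_spam * (1 - p_spam)  # loss if labeled spam
    risk_if_ham = cost_spam_as_ham * p_spam         # loss if labeled ham
    return "spam" if risk_if_spam < risk_if_ham else "ham"
```

With these assumed costs, a message is labeled spam only when its posterior spam probability exceeds 0.9, which is how a minimum-risk rule trades some spam recall for fewer misjudged legitimate messages.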
