Computer and Modernization (计算机与现代化)

• Algorithm Design and Analysis

Improvement of Bayes Classification Algorithm Based on Text Filtering

  1. (The PLA Information Engineering University, Zhengzhou, Henan 450000, China)
  • Received: 2016-03-22  Online: 2016-09-12  Published: 2016-09-13
  • About the authors: Lu Jinquan (born 1991), male, from Linfen, Shanxi; master's student at the PLA Information Engineering University; research interest: information security. Xu Kaiyong (born 1963), male, research fellow, Ph.D.; research interests: information security and trusted computing. Dai Leyu (born 1990), male, teaching assistant; research interests: information security and cryptographic coprocessors.

Abstract: Traditional Bayes classification algorithms cannot meet the text-filtering demands of today's complex networks. This paper therefore proposes a Multi Word-Bayes (MWB) classification algorithm. First, MWB adopts the Term Frequency-Inverse Document Frequency (TF-IDF) feature-weighting idea. This fixes a weakness of the traditional algorithm, which considers only word frequency and ignores relationships between texts. Second, MWB treats the relationships between words as an important reference for classification. This overcomes the traditional algorithm's neglect of semantic analysis during classifier training. Experimental results show that MWB achieves better classification performance in spam text filtering.

Key words: Bayes classification algorithm; TF-IDF; semantic analysis; text filtering
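The abstract does not give MWB's formulas. The following Python sketch therefore only illustrates, under stated assumptions, the two generic ideas it names; it is not the authors' algorithm.

The sketch makes two changes to a standard multinomial naive Bayes classifier:

- Class-conditional word mass is TF-IDF weighted: P(t|c) = (Σ_{d∈c} tf(t,d)·idf(t) + α) / (Σ_{t'} Σ_{d∈c} tf(t',d)·idf(t') + α|V|). It uses a smoothed idf(t) = ln((1+N)/(1+df(t))) + 1.
- Adjacent word pairs are added as extra features. This is a crude, hypothetical stand-in for the word-to-word relations MWB uses.

The names `features` and `TfidfNaiveBayes`, the smoothing constant `alpha`, the IDF variant and the toy data are all assumptions. Input text is assumed to be already segmented into words.

```python
import math
from collections import Counter, defaultdict


def features(tokens, use_pairs=True):
    """Count unigrams and, optionally, adjacent word pairs.

    The word-pair feature is a hypothetical stand-in for word-to-word
    relations; it is not MWB's definition.
    """
    feats = list(tokens)
    if use_pairs:
        feats += [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return Counter(feats)


class TfidfNaiveBayes:
    """Multinomial naive Bayes with TF-IDF weighted class-conditional counts."""

    def __init__(self, alpha=1.0, use_pairs=True):
        self.alpha = alpha          # additive (Laplace) smoothing constant
        self.use_pairs = use_pairs

    def fit(self, docs, labels):
        counts = [features(d, self.use_pairs) for d in docs]
        n_docs = len(counts)
        df = Counter()
        for c in counts:
            df.update(c.keys())     # document frequency: each term once per document
        # Smoothed IDF (assumed variant); always >= 1, so weights stay positive
        self.idf = {t: math.log((1 + n_docs) / (1 + n)) + 1 for t, n in df.items()}
        self.vocab = set(df)

        class_docs = Counter(labels)
        weight = defaultdict(Counter)   # class -> term -> sum of tf * idf
        for c, y in zip(counts, labels):
            for t, tf in c.items():
                weight[y][t] += tf * self.idf[t]

        self.log_prior = {y: math.log(n / n_docs) for y, n in class_docs.items()}
        self.log_lik = {}
        for y in class_docs:
            w = weight[y]
            total = sum(w.values()) + self.alpha * len(self.vocab)
            self.log_lik[y] = {t: math.log((w[t] + self.alpha) / total) for t in self.vocab}
        return self

    def predict(self, tokens):
        # Score with raw term frequencies; terms unseen in training are ignored
        c = features(tokens, self.use_pairs)
        scores = {
            y: prior + sum(tf * self.log_lik[y][t] for t, tf in c.items() if t in self.vocab)
            for y, prior in self.log_prior.items()
        }
        return max(scores, key=scores.get)


# Toy, pre-segmented training data (invented for illustration only)
train = [
    (["free", "prize", "click", "now"], "spam"),
    (["win", "free", "money", "now"], "spam"),
    (["meeting", "at", "noon", "tomorrow"], "ham"),
    (["project", "report", "due", "tomorrow"], "ham"),
]
docs, labels = zip(*train)
clf = TfidfNaiveBayes().fit(list(docs), list(labels))
print(clf.predict(["free", "money", "now"]))          # spam
print(clf.predict(["report", "meeting", "tomorrow"]))  # ham
```

The sketch applies IDF weighting only to the training counts and scores a test document with raw term frequencies. This is one design choice among several; weighting the test document by IDF as well is an equally plausible variant.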
