Computer and Modernization ›› 2012, Vol. 203 ›› Issue (7): 120-123. doi: 10.3969/j.issn.1006-2475.2012.07.032

• Network and Communication •

Research on E-mail Classification Based on Dynamic Characteristics Library

MU Jun-peng (穆俊鹏)1, DONG Kui-feng (董魁锋)2, ZHANG Ming (张明)2

  1. Shanghai Publishing and Printing College, Shanghai 200093, China; 2. College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
  • Received: 2012-02-24 Revised: 1900-01-01 Online: 2012-08-10 Published: 2012-08-10

Abstract: As e-mail classification technology develops, organizing and managing mail more effectively requires extracting dynamic characteristics (features) from a constantly changing body of messages and classifying the messages according to those characteristics. This paper approaches the problem from the dynamic characteristics of e-mail: the authors write a mail client program, use the ICTCLAS tool from the Chinese Academy of Sciences to segment Chinese mail text accurately, compute the feature weights with an improved TF-IDF algorithm, and run simulation experiments on the results with the WEKA data mining tool. The experimental results show that classifying e-mail by its dynamic characteristics is feasible and that, to a certain extent, the method classifies mail reasonably and effectively.

Key words: dynamic characteristics, mail classification, Chinese word segmentation, TF-IDF, WEKA, data mining
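
The feature-weighting step named in the abstract can be illustrated with a minimal sketch. This is the plain textbook TF-IDF, not the paper's improved variant, which the abstract does not specify. The function name `tfidf_weights` and the sample messages are hypothetical, and the input is assumed to be already segmented into tokens (for Chinese mail, for example by a tool such as ICTCLAS).

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Textbook TF-IDF over pre-segmented documents (lists of tokens).

    Assumption: this is the standard formulation, shown for illustration;
    it is not the improved algorithm used in the paper.
    """
    n_docs = len(docs)

    # Document frequency: how many documents contain each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        # tf = relative frequency in the document, idf = log(N / df).
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical messages; the tokens stand in for segmenter output.
mails = [
    ["meeting", "notice", "meeting", "time"],
    ["invoice", "notice", "payment"],
    ["meeting", "minutes"],
]
weights = tfidf_weights(mails)
```

Each message becomes a mapping from term to weight. A term that appears in every message gets weight zero, so only terms that distinguish messages contribute. Vectors of this kind could then be exported to a tool such as WEKA for the classification experiments.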

CLC Number: