Computer and Modernization, 2014, Vol. 0, Issue (9): 6-9. doi: 10.3969/j.issn.10062475.2014.09.002


Research on Text Categorization Based on Improved TFIDF Algorithm

  

  1. School of Economics and Management, Tongji University, Shanghai 200092, China
  Received: 2014-06-25; Online: 2014-10-10; Published: 2014-11-04

Abstract: Text categorization is widely applied in information retrieval, email filtering, Web page classification, personalized recommendation and other fields, and it has attracted extensive attention from scholars ever since the concept was first proposed. Researchers have adopted many methods for text classification, and TFIDF is one of the most commonly used algorithms for calculating the weights of feature items. However, the traditional TFIDF algorithm ignores how feature items are distributed within classes and among classes, so many items with little discriminating power receive high weights. To improve the traditional TFIDF algorithm, this paper modifies the calculation of IDF by adding factors that reflect the distribution of feature items within classes and among classes. In the experiment, the improved TFIDF algorithm was applied to text categorization, and the classification results verified that the improved algorithm is effective.
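The abstract does not state the modified IDF formula, so the Python sketch below only illustrates the general idea under explicit assumptions. `traditional_tfidf` computes the conventional weight w(t, d) = tf(t, d) × log(N / n_t), where N is the number of documents and n_t the number of documents containing term t. `class_aware_idf` is a hypothetical variant chosen for illustration, not the authors' method: it multiplies the ordinary IDF by an inter-class factor a / (a + b) (how strongly the term concentrates in one class) and an intra-class factor a / n_c (how widely the term is spread inside that class). All function names, the factors, and the toy corpus are assumptions.

```python
import math
from collections import Counter

# A labelled corpus is a list of (tokens, class_label) pairs (hypothetical format).
Corpus = list[tuple[list[str], str]]


def traditional_tfidf(tokens: list[str], docs: Corpus) -> dict[str, float]:
    """Classic weight: tf(t, d) * log(N / n_t), n_t = number of docs containing t."""
    n = len(docs)
    weights = {}
    for term, tf in Counter(tokens).items():
        n_t = sum(1 for toks, _ in docs if term in toks)
        # Terms never seen in the corpus get weight 0 to avoid division by zero.
        weights[term] = tf * math.log(n / n_t) if n_t else 0.0
    return weights


def class_aware_idf(term: str, label: str, docs: Corpus) -> float:
    """Hypothetical IDF variant (an illustration, NOT the paper's formula).

    a   = docs of class `label` that contain `term`
    b   = docs of other classes that contain `term`
    n_c = docs of class `label`
    idf' = log(N / (a + b)) * a / (a + b) * a / n_c
    """
    n = len(docs)
    a = sum(1 for toks, c in docs if c == label and term in toks)
    b = sum(1 for toks, c in docs if c != label and term in toks)
    n_c = sum(1 for _, c in docs if c == label)
    if a == 0:
        return 0.0  # the term never occurs in this class
    inter_class = a / (a + b)  # near 1 when the term concentrates in `label`
    intra_class = a / n_c      # near 1 when the term appears throughout `label`
    return math.log(n / (a + b)) * inter_class * intra_class


def class_aware_tfidf(tokens: list[str], label: str, docs: Corpus) -> dict[str, float]:
    """tf(t, d) * idf'(t, c) for a document d known to belong to class c."""
    return {t: tf * class_aware_idf(t, label, docs) for t, tf in Counter(tokens).items()}


if __name__ == "__main__":
    # Toy corpus, invented purely for illustration.
    corpus: Corpus = [
        (["goal", "match", "team"], "sport"),
        (["goal", "team", "coach"], "sport"),
        (["stock", "market", "team"], "finance"),
        (["stock", "price", "market"], "finance"),
    ]
    doc, label = corpus[1]
    print(traditional_tfidf(doc, corpus))
    print(class_aware_tfidf(doc, label, corpus))
```

In this toy corpus "team" occurs in both classes, so the class-aware weight demotes it relative to "goal" more strongly than traditional TFIDF does (a ratio of about 0.28 versus 0.42). This is the kind of effect the abstract describes for items with little discriminating power, although the paper's actual factors may differ.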

 

Key words: TFIDF algorithm, feature item selection, text categorization
