Computer and Modernization


Research on Improvement of Feature Weights in Text Classification

  

  1. (School of Computer Science and Engineering, Xian Technological University, Xian 710021, China)
  • Received: 2017-09-29   Online: 2018-03-08   Published: 2018-03-09

Abstract: To overcome the shortcomings of the traditional TF-IDF (Term Frequency-Inverse Document Frequency) algorithm, an improved algorithm, TF-IDF-dist, is proposed that makes use of the distribution of feature words. Experimental results show that the improved algorithm raises the F1 value by 3.2% on average across different feature dimensions. With different feature selection algorithms, the F1 value increases by 2.75%, and TF-IDF-dist adapts better to imbalanced datasets. These results demonstrate the validity of the algorithm for text classification.
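
For orientation, the traditional TF-IDF weighting that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration of the classic scheme only, assuming tf = count / document length and idf = log(N / df); the function name `tf_idf` and the toy corpus are hypothetical, and the distribution-based TF-IDF-dist formula is not given in this abstract, so it is not reproduced here.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Classic weighting (an assumption, not the paper's TF-IDF-dist):
    tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, already tokenized.
docs = [
    ["stock", "market", "stock"],
    ["football", "match"],
    ["market", "match", "report"],
]
weights = tf_idf(docs)
```

In this sketch the weight of a term depends only on its count within a document and on how many documents contain it; nothing in the formula looks at how the term is spread across classes.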

Key words: machine learning, text classification, feature weights, TF-IDF
