Computer and Modernization


An Improved Feature Selection Algorithm Based on Category Distinguished Words

  

  1. (School of Computer and Electronic Information, Guangxi University, Nanning 530004, China)
  • Received: 2018-09-06 Online: 2019-04-08 Published: 2019-04-10

Abstract: The traditional category distinguished words (CDW) feature selection algorithm uses inter-class dispersion degree and intra-class importance degree as its combined metrics, but ignores the fact that the two indicators often contribute unequally to the feature scoring function, which reduces feature selection efficiency to some extent. This paper proposes an improved CDW feature selection algorithm that incorporates a balance factor (ICDW). During feature selection, the value of the balance factor is tuned to adjust the contribution weights of the two indicators to the feature scoring function, yielding more efficient feature selection. In text categorization experiments using the Naive Bayes classification algorithm, ICDW not only outperforms CDW but also exceeds expected cross entropy (ECE), information gain (IG) and the chi-square statistic (CHI), which are commonly used for feature selection.
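
The role of the balance factor can be illustrated with a minimal Python sketch. The abstract does not give the scoring formula, so everything below is an assumption made for illustration only: the function names icdw_scores and select_features, the parameter name balance, the use of the standard deviation of per-class document rates as inter-class dispersion, the use of the highest per-class document rate as intra-class importance, and the linear blend of the two. It is not the authors' definition of CDW or ICDW.

```python
import math
from collections import Counter


def icdw_scores(docs, labels, balance=0.5):
    """Score terms by a weighted blend of two illustrative indicators.

    docs: list of token lists; labels: class label per document.
    balance in [0, 1] is the (assumed) balance factor: 1.0 uses only
    dispersion, 0.0 uses only importance.
    """
    classes = sorted(set(labels))
    n_docs = Counter(labels)
    # Per-class document frequency of each term.
    df = {c: Counter() for c in classes}
    for tokens, label in zip(docs, labels):
        df[label].update(set(tokens))
    vocab = set().union(*df.values())

    scores = {}
    for term in vocab:
        # Fraction of each class's documents that contain the term.
        rates = [df[c][term] / n_docs[c] for c in classes]
        mean = sum(rates) / len(rates)
        # Assumed inter-class dispersion: std. deviation of the rates.
        dispersion = math.sqrt(sum((r - mean) ** 2 for r in rates) / len(rates))
        # Assumed intra-class importance: highest within-class rate.
        importance = max(rates)
        # Assumed combination: linear blend weighted by the balance factor.
        scores[term] = balance * dispersion + (1 - balance) * importance
    return scores


def select_features(docs, labels, k, balance=0.5):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    scores = icdw_scores(docs, labels, balance)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


docs = [
    ["goal", "match", "the"],
    ["goal", "team", "the"],
    ["stock", "market", "the"],
    ["stock", "price", "the"],
]
labels = ["sport", "sport", "finance", "finance"]
top_terms = select_features(docs, labels, k=2, balance=0.5)
```

In this toy corpus a word such as "the" appears in every class, so its dispersion is zero and it only scores through the importance term. Raising balance toward 1.0 pushes such class-neutral words down the ranking, which is the kind of trade-off the balance factor is meant to expose. In practice the value would be chosen by validating classification accuracy over a range of settings.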

Key words: text categorization, feature selection, balance factor, category distinguished words
