Computer and Modernization


An Improved CHI Text Feature Selection Algorithm

  

  1. College of Electronic Information and Control Engineering, Beijing University of Technology, Beijing 100124, China
  • Received: 2016-04-12 Online: 2016-11-15 Published: 2016-11-23

Abstract: Feature selection is an important step in text classification. The CHI statistic is a classical feature selection method, but it has several defects. To address them, two changes are made. First, to take into account both the document frequency and the word frequency of a term, a word frequency factor and the variance among classes are introduced into the CHI algorithm. Second, to exclude terms that rarely appear in the specified class but appear frequently in other classes, and to reduce the error caused by manually selecting a scaling factor, an adaptive scaling factor is introduced into the CHI algorithm. The results show that the improved CHI feature selection algorithm outperforms the CHI statistics algorithm on an unbalanced corpus.
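For orientation, the classical CHI statistic that the abstract takes as its baseline can be sketched as below. This is only the standard textbook form computed from a 2x2 term/class contingency table; it does not reproduce the word frequency factor, the variance among classes, or the adaptive scaling factor, none of which are specified in the abstract. The function names `chi_square` and `correlation_sign` and the example counts are illustrative assumptions, not taken from the paper.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Classical CHI statistic for one (term, class) pair.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that do not contain the term
    d: documents outside the class that do not contain the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (an empty row or column): no evidence either way.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def correlation_sign(a: int, b: int, c: int, d: int) -> int:
    """Return +1, 0, or -1 for the direction of the term/class association.

    Hypothetical helper: the squared term in chi_square discards this sign.
    """
    diff = a * d - b * c
    return (diff > 0) - (diff < 0)


# A term concentrated in the class, and a term concentrated outside it
# (made-up counts for illustration only).
in_class_term = (40, 10, 10, 40)
out_of_class_term = (10, 40, 40, 10)

print(chi_square(*in_class_term), correlation_sign(*in_class_term))
print(chi_square(*out_of_class_term), correlation_sign(*out_of_class_term))
```

Both example terms receive the same CHI score even though the second one mostly appears outside the class, which is the kind of term the abstract says the improved algorithm aims to exclude. The sketch also uses only document counts, so it ignores how often a term occurs within a document.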

Key words: CHI statistics, word frequency factor, variance among classes, adaptive scaling factor
