Computer and Modernization ›› 2015, Vol. 0 ›› Issue (3): 48-51, 56. doi: 10.3969/j.issn.1006-2475.2015.03.010


An Improved K-means Optimization Approach for Text Clustering

  

  1. School of Foreign Language, Changshu Institute of Technology, Changshu 215500, China
  • Received: 2014-12-09 Online: 2015-03-23 Published: 2015-03-26

Abstract: This paper proposes K-meansSC, an improved k-means optimization approach for text clustering. The documents to be clustered are first word-segmented, and a set of main entries (terms) is extracted from them. Document feature vectors are then represented with a Boolean function and with a TFIDF function, and the strengths and weaknesses of the two representations are compared. A support matrix and a confidence matrix are built from the entry set, and a similarity formula is defined on them. This formula is compared in detail with other distance formulas in terms of iteration count and error-function behaviour under different numbers of clusters. Experimental results show that, under certain conditions, representing document vectors with the TFIDF function can effectively improve processing efficiency and clustering effectiveness.
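The abstract does not give the K-meansSC formulas, so the following is only a minimal sketch of the building blocks it names, under stated assumptions: a textbook Boolean representation, a common TF-IDF variant (relative term frequency times log(N/df)), and the standard association-rule definitions of support and confidence between terms. The function names and the toy corpus are hypothetical, and the paper's actual similarity formula is not reproduced here.

```python
import math
from collections import Counter


def boolean_vector(tokens, vocab):
    """Boolean representation: 1.0 if the term occurs in the document, else 0.0."""
    present = set(tokens)
    return [1.0 if term in present else 0.0 for term in vocab]


def tfidf_vectors(docs, vocab):
    """TF-IDF representation (assumed variant: relative tf * log(N / df))."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append([
            (counts[t] / total) * math.log(n / df[t]) if df[t] else 0.0
            for t in vocab
        ])
    return vectors


def support_confidence(docs, vocab):
    """Term-by-term support and confidence matrices (standard definitions).

    support[(a, b)]    = fraction of documents containing both a and b
    confidence[(a, b)] = support[(a, b)] / support[(a, a)]
    """
    n = len(docs)
    sets = [set(d) for d in docs]
    support = {
        (a, b): sum(1 for s in sets if a in s and b in s) / n
        for a in vocab for b in vocab
    }
    confidence = {
        (a, b): support[(a, b)] / support[(a, a)] if support[(a, a)] else 0.0
        for a in vocab for b in vocab
    }
    return support, confidence


# Toy, already-segmented corpus (hypothetical data, for illustration only).
docs = [
    ["text", "cluster", "text"],
    ["cluster", "kmeans"],
    ["text", "kmeans"],
]
vocab = sorted({term for doc in docs for term in doc})

bool_vec = boolean_vector(docs[0], vocab)
tfidf = tfidf_vectors(docs, vocab)
support, confidence = support_confidence(docs, vocab)
```

The Boolean vector records only whether a term is present, while the TF-IDF vector also weights repeated and rarer terms more heavily, which is the kind of difference the comparison in the abstract turns on.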

Key words: k-means, similarity, text clustering, support, confidence
