Computer and Modernization


An Improved Text Clustering Algorithm Based on Latent Semantic Indexing

  

  1. (Department of Information Engineering, University for Science & Technology Zhengzhou, Zhengzhou 450064, China)
  • Received: 2014-04-21 Online: 2014-07-16 Published: 2014-07-17

Abstract: This paper presents an improved text clustering algorithm that applies latent semantic indexing (LSI) to the traditional self-organizing map (SOM) algorithm. Text feature vectors are represented using LSI, which mines the semantic relationships hidden among the words in the text, removes the correlation among words, and thereby reduces the dimension of the feature vectors. The algorithm also addresses a limitation of the traditional SOM algorithm so that the number of clusters can be determined accurately. Experimental results show that the algorithm achieves better clustering quality and requires less clustering time.
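
The dimension-reduction step described in the abstract can be illustrated with a minimal sketch. It assumes LSI is realized as a plain truncated singular value decomposition of a term-document matrix, which is the standard formulation; the abstract does not give the paper's term weighting, its choice of k, or its SOM modifications, so the function name lsi_reduce, the toy matrix, and k = 2 are all hypothetical.

```python
import numpy as np

def lsi_reduce(term_doc, k):
    """Project documents into a k-dimensional latent semantic space.

    term_doc: (n_terms, n_docs) array of term weights (raw counts here).
    Returns an (n_docs, k) array with one row per document.
    """
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep only the k largest singular values; each document becomes a k-vector.
    return (np.diag(s[:k]) @ Vt[:k, :]).T

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy corpus: rows are terms, columns are documents.
# Documents 0 and 1 share terms 0-2; documents 2 and 3 share terms 3-4.
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)

docs_k = lsi_reduce(A, k=2)          # 5-dimensional documents reduced to 2
sim_same = cosine(docs_k[0], docs_k[1])   # documents on the same topic
sim_diff = cosine(docs_k[0], docs_k[2])   # documents on different topics
```

In this sketch the reduced vectors, rather than the raw term vectors, would be the input to the clustering stage (a SOM in the paper); documents that share vocabulary end up close together in the reduced space even though each is described by fewer features.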

Key words: text clustering, latent semantic indexing, self-organizing maps

CLC Number: