Computer and Modernization, 2021, Vol. 0, Issue (05): 59-65.


Feature Weighted CLSVSM

1. School of Mathematical Sciences, Shanxi University, Taiyuan 030006, China
Online: 2021-06-03; Published: 2021-06-03

Abstract: How rationally and effectively document information is represented as space vectors has a significant impact on text clustering and retrieval results. The Co-occurrence Latent Semantic Vector Space Model (CLSVSM) mines the latent semantic information carried by the co-occurrence of document feature words and improves document clustering performance. Building on CLSVSM, this paper first introduces word frequency information, then uses it as a weight to assign the co-occurrence strength in CLSVSM, and finally constructs a feature weighted CLSVSM. On Chinese data, feature weighted CLSVSM clusters as follows: compared with CLSVSM and the Word2vec text model, its F value is higher by nearly 2.4% and 5.2%, respectively; compared with 90%CLSVSM_K and the Word2vec text model, its entropy is lower by nearly 3.1% and 9.0%, respectively; and it also clusters better than the word frequency CLSVSM and TF-IDF models. On English data, its clustering performance is similar to that of the other models. The stability of feature weighted CLSVSM still needs improvement, because it is limited by how completely the keyword frequency information is expressed.
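The construction and the evaluation measures can be illustrated with a minimal Python sketch. The abstract does not define CLSVSM's co-occurrence dimensions or the exact weighting function, so the following are assumptions and not the paper's method: each vector component corresponds to an unordered pair of feature words, two words co-occur when both appear in the same document, the unweighted co-occurrence strength is 1, and the hypothetical frequency-weighted strength is min(tf_a, tf_b). The sketch also scores a clustering with an F value and entropy, assuming the commonly used class-size-weighted best-match F-measure and cluster-size-weighted entropy in base 2; the paper's definitions may differ. The function names `cooccurrence_vector`, `f_measure` and `cluster_entropy` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_vector(tokens, vocab, weighted=True):
    """Represent a tokenized document over all unordered pairs of feature words.

    ASSUMPTIONS (hypothetical, not taken from the paper): two feature words
    co-occur when both appear in the same document; the unweighted strength
    is 1, and the frequency-weighted strength is min(tf_a, tf_b).
    """
    tf = Counter(t for t in tokens if t in vocab)  # term frequency of feature words only
    vec = []
    for a, b in combinations(sorted(vocab), 2):     # fixed pair order across documents
        if tf[a] and tf[b]:                          # both words occur, so they co-occur
            vec.append(min(tf[a], tf[b]) if weighted else 1)
        else:
            vec.append(0)
    return vec


def f_measure(labels, clusters):
    """Class-size-weighted best-match F-measure (higher is better)."""
    n = len(labels)
    class_sizes = Counter(labels)
    cluster_sizes = Counter(clusters)
    joint = Counter(zip(labels, clusters))           # n_ck: documents of class c in cluster k
    total = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for k, n_k in cluster_sizes.items():
            n_ck = joint[(c, k)]
            if n_ck:
                precision, recall = n_ck / n_k, n_ck / n_c
                best = max(best, 2 * precision * recall / (precision + recall))
        total += n_c / n * best                      # weight each class by its size
    return total


def cluster_entropy(labels, clusters):
    """Cluster-size-weighted entropy of class labels, base 2 (lower is better)."""
    n = len(labels)
    cluster_sizes = Counter(clusters)
    joint = Counter(zip(labels, clusters))
    total = 0.0
    for k, n_k in cluster_sizes.items():
        # class distribution inside cluster k (only nonzero counts, so log2 is safe)
        probs = [n_ck / n_k for (_, kk), n_ck in joint.items() if kk == k]
        total += n_k / n * -sum(p * math.log2(p) for p in probs)
    return total


if __name__ == "__main__":
    doc = ["cat", "dog", "cat", "dog", "dog", "bird"]
    vocab = {"bird", "cat", "dog"}
    print(cooccurrence_vector(doc, vocab))                  # [1, 1, 2]
    print(cooccurrence_vector(doc, vocab, weighted=False))  # [1, 1, 1]
    labels, clusters = ["a", "a", "a", "b"], [0, 0, 1, 1]
    print(f_measure(labels, clusters))        # ≈ 0.767
    print(cluster_entropy(labels, clusters))  # 0.5
```

In this toy document the weighted vector separates the frequently repeated pair (cat, dog) from the pairs involving bird, which the binary version cannot. A higher F value and a lower entropy both indicate better clustering, which is the direction of the improvements reported in the abstract.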

Key words: CLSVSM, feature weighted, TF-IDF, clustering