Computer and Modernization ›› 2021, Vol. 0 ›› Issue (05): 66-72.


Penalized Matrix Decomposition Based on CLSVSM and Its Application in Text Topic Clustering

  

  1. (School of Mathematical Sciences, Shanxi University, Taiyuan 030006, China)
  • Online: 2021-06-03   Published: 2021-06-03

Abstract: A sound representation of text information is important for text topic clustering and retrieval. To address the high dimensionality of text representation models, penalized matrix decomposition (PMD) is studied on the basis of the co-occurrence latent semantic vector space model (CLSVSM). PMD imposes sparsity constraints on the vectors to extract core features, from which the original data are reconstructed. Using co-occurrence analysis theory together with PMD, the semantic information between features is mined further and a semantic kernel function (PMD_K) is constructed. The proposed methods are applied to text topic clustering, and the experimental results show that PMD and PMD_K cluster clearly better than the other methods compared. For example, the F value of the PMD_K method is 21.9% higher than that of the earlier 95%CLSVSM_K method. Combining PMD with the text representation model improves both the efficiency and the accuracy of text topic clustering and avoids complex computation on high-dimensional matrices.
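
To illustrate the sparsity-constrained factorization described in the abstract, the following is a minimal sketch of a rank-one PMD step in its commonly used form: maximize u^T X v subject to unit L2 norms and L1 bounds on u and v, solved by alternating soft-thresholding with a bisection search for the threshold. The abstract does not state which PMD variant, bounds, or initialization the paper uses, so the function names, the toy document-term matrix, the bounds c_u and c_v, and the uniform starting vector are all assumptions made for illustration. CLSVSM construction and the PMD_K kernel are not shown.

```python
import math

def soft_threshold(a, delta):
    # Shrink every entry toward zero by delta; entries smaller than delta become 0.
    return [math.copysign(max(abs(x) - delta, 0.0), x) for x in a]

def l1(a):
    return sum(abs(x) for x in a)

def l2(a):
    return math.sqrt(sum(x * x for x in a))

def unit_with_l1_bound(a, c, n_bisect=60):
    # Return S(a, delta) / ||S(a, delta)||_2 with delta >= 0 chosen by bisection
    # so that the L1 norm of the result is at most c (c >= 1 is assumed).
    def normalized(delta):
        s = soft_threshold(a, delta)
        n = l2(s)
        return [x / n for x in s] if n > 0 else None

    best = normalized(0.0)
    if best is None:
        return list(a)              # zero input: nothing to normalize
    if l1(best) <= c:
        return best                 # bound already satisfied, no shrinkage needed
    lo, hi = 0.0, max(abs(x) for x in a)
    for _ in range(n_bisect):
        mid = (lo + hi) / 2.0
        cand = normalized(mid)
        if cand is None or l1(cand) <= c:
            hi = mid
            if cand is not None:
                best = cand         # feasible candidate with the smallest delta so far
        else:
            lo = mid
    # If no feasible threshold was found (exact ties with a small c),
    # the unthresholded direction is returned as a fallback.
    return best

def matvec(X, v):
    return [sum(x * vj for x, vj in zip(row, v)) for row in X]

def rmatvec(X, u):
    return [sum(X[i][j] * u[i] for i in range(len(X))) for j in range(len(X[0]))]

def pmd_rank1(X, c_u, c_v, n_iter=200, tol=1e-10):
    # Alternate the u- and v-updates; the uniform start for v is an assumption.
    p = len(X[0])
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        u = unit_with_l1_bound(matvec(X, v), c_u)
        v_new = unit_with_l1_bound(rmatvec(X, u), c_v)
        change = l2([a - b for a, b in zip(v_new, v)])
        v = v_new
        if change < tol:
            break
    d = sum(ui * xi for ui, xi in zip(u, matvec(X, v)))  # d = u^T X v
    return d, u, v

# Hypothetical toy matrix: rows are documents, columns are term weights.
X = [
    [3.0, 2.0, 0.0, 0.0, 1.0],
    [2.0, 3.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 3.0, 0.0],
    [0.0, 1.0, 3.0, 4.0, 0.0],
]
d, u, v = pmd_rank1(X, c_u=2.0, c_v=1.5)
print("d =", round(d, 4))
print("sparse term loadings v =", [round(x, 4) for x in v])
```

A smaller c_v forces more entries of v to zero, so fewer terms carry the factor. Further factors would typically be obtained by subtracting d * u * v^T from X and repeating the step on the residual.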

Key words: CLSVSM (Co-occurrence Latent Semantic Vector Space Model), PMD, semantic kernel function, text topic clustering