Computer and Modernization


A Similarity Algorithm for Chinese Text Based on Semantics

  

  1. College of Information Engineering, Xiangtan University, Xiangtan 411105, China;
  2. Key Laboratory of Intelligent Computing and Information Processing (Xiangtan University), Ministry of Education, Xiangtan 411105, China
Received: 2015-01-30; Online: 2015-04-27; Published: 2015-04-29

Abstract:

This paper computes the semantic similarity of words with HowNet and measures text similarity through the similarity of extracted text keywords. After word segmentation and stop-word filtering, each word is weighted by combining its part of speech, word frequency, and paragraph frequency, and these weights are used to select the keywords of the text. The similarity of two texts is then computed from the similarities between their keywords. A significance analysis of the experimental results shows that the proposed algorithm achieves higher accuracy than both the traditional semantic-based algorithm and the vector space model (VSM) algorithm.
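The keyword-weighting and text-similarity steps described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the weighting or aggregation formulas, so the product of POS weight, normalized term frequency, and normalized paragraph frequency is an assumption, as are the stop-word list and the POS weight values. The HowNet word similarity is replaced by a hypothetical `exact_match_sim` placeholder, and the text similarity is assumed to be the mean of two directed, weight-averaged best-match scores. Segmentation and POS tagging are assumed to have been done beforehand.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# A paragraph is a list of (word, part-of-speech) pairs. Word segmentation
# and POS tagging are assumed to be done beforehand by a Chinese segmenter.
Paragraph = List[Tuple[str, str]]
WordSim = Callable[[str, str], float]

# HYPOTHETICAL stop-word list and POS weights, chosen only for illustration.
STOP_WORDS = {"的", "了", "是", "在"}
POS_WEIGHTS = {"n": 1.0, "v": 0.8, "a": 0.6}
DEFAULT_POS_WEIGHT = 0.3


def extract_keywords(paragraphs: List[Paragraph], top_k: int = 10) -> Dict[str, float]:
    """Weight each non-stop word and return the top_k words with their weights.

    ASSUMED formula: weight = POS weight * (term frequency / total words)
                              * (paragraphs containing word / total paragraphs).
    """
    tf: Counter = Counter()
    pf: Counter = Counter()
    pos_of: Dict[str, str] = {}
    for para in paragraphs:
        seen = set()
        for word, pos in para:
            if word in STOP_WORDS:
                continue  # stop-word filtering
            tf[word] += 1
            pos_of.setdefault(word, pos)  # keep the first POS tag seen
            seen.add(word)
        pf.update(seen)  # count each word at most once per paragraph
    total_words = sum(tf.values()) or 1
    n_paras = len(paragraphs) or 1
    weights = {
        w: POS_WEIGHTS.get(pos_of[w], DEFAULT_POS_WEIGHT)
        * (tf[w] / total_words)
        * (pf[w] / n_paras)
        for w in tf
    }
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:top_k])


def exact_match_sim(a: str, b: str) -> float:
    """PLACEHOLDER for a HowNet-based word similarity in [0, 1]."""
    return 1.0 if a == b else 0.0


def directed_sim(ka: Dict[str, float], kb: Dict[str, float], word_sim: WordSim) -> float:
    """Weighted average, over keywords of A, of each keyword's best match in B."""
    total = sum(ka.values())
    if total == 0 or not kb:
        return 0.0
    score = sum(w * max(word_sim(a, b) for b in kb) for a, w in ka.items())
    return score / total


def text_similarity(ka: Dict[str, float], kb: Dict[str, float],
                    word_sim: WordSim = exact_match_sim) -> float:
    """ASSUMED symmetric combination: mean of the two directed scores."""
    return (directed_sim(ka, kb, word_sim) + directed_sim(kb, ka, word_sim)) / 2


if __name__ == "__main__":
    doc_a = [[("文本", "n"), ("相似度", "n"), ("的", "u")], [("计算", "v"), ("文本", "n")]]
    doc_b = [[("文本", "n"), ("计算", "v")], [("语义", "n")]]
    ka, kb = extract_keywords(doc_a), extract_keywords(doc_b)
    print(round(text_similarity(ka, kb), 3))  # → 0.735
```

With the exact-match placeholder, the score reduces to a weighted keyword overlap. Passing a HowNet-based function as `word_sim` lets semantically related but non-identical keywords contribute, which is the point of the best-match design.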

Key words: text similarity, semantic, HowNet, keywords, paragraph frequency