Computer and Modernization

Text Similarity Calculation Method Based on Levenshtein and TFRSF

  

  (1. School of Information Science and Technology, Northeast Normal University, Changchun 130117, China;
   2. Key Laboratory of Intelligent Information Processing in Jilin Universities, Changchun 130117, China)
  • Online: 2018-04-28   Published: 2018-05-02

Abstract: Collecting personal information from social networks makes it possible to build a profile of a person that covers curriculum vitae, life, hobbies, friends, and other attributes. However, many users on different social networks share the same name. To resolve this name ambiguity, we calculate the similarity of user information to decide whether two accounts belong to the same person. If the same descriptive information appears in a different order in two documents, the computer may misjudge the match. To solve this problem, the Levenshtein and TFRSF methods are used to calculate edit distance and word frequency, which together determine whether two attribute values are the same. The experimental results show that the proposed text similarity method improves the evaluation metrics, with precision, recall, and F1 all above 87%.
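The reordering problem described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define TFRSF, so a plain term-frequency cosine is used here as a stand-in, and the function names (`levenshtein`, `edit_similarity`, `term_frequency_cosine`) and the whitespace tokenization are assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def term_frequency_cosine(a: str, b: str) -> float:
    """Order-insensitive similarity over word counts.

    Assumption: a stand-in for the word-frequency side of the method,
    NOT the TFRSF weighting used in the paper.
    """
    ta, tb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ta[t] * tb[t] for t in ta)
    norm = (sqrt(sum(v * v for v in ta.values()))
            * sqrt(sum(v * v for v in tb.values())))
    return dot / norm if norm else 0.0


# Two attribute values with the same words in a different order.
x = "Northeast Normal University Changchun"
y = "Changchun Northeast Normal University"
print(edit_similarity(x, y))        # penalized by the reordering
print(term_frequency_cosine(x, y))  # unaffected by word order
```

In this sketch the edit-distance score drops when the words are reordered, while the word-frequency score does not, which is one plausible reason to combine a character-level measure with a frequency-based one. How the paper actually weights and combines the two scores is not stated in the abstract.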

Key words: personal information, social network, Levenshtein, TFRSF, similarity

CLC Number: