Computer and Modernization ›› 2015, Vol. 0 ›› Issue (3): 20-25. doi: 10.3969/j.issn.1006-2475.2015.03.005

Calculating Similarity of XML Documents by Weighted Pq-gram Algorithm

  

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
  • Received: 2014-11-20 Online: 2015-03-23 Published: 2015-03-26

Abstract: Clustering is an important technique for managing XML documents efficiently, and computing the similarity between XML documents is its pivotal step. The pq-gram algorithm is an efficient method for computing this similarity. However, it ignores the fact that the nodes of an XML document are ordered. Building on the pq-gram algorithm, the weighted pq-gram algorithm assigns a weight to each node according to the structural characteristics of XML documents, derives a weight for each pq-gram from the weights of its nodes, and then applies these weights when computing the similarity of XML documents. Experimental results show that the weighted pq-gram algorithm better reflects the contribution of individual nodes to the similarity calculation and improves the precision of the similarity calculation for XML documents.

Key words: XML documents, similarity calculation, pq-gram, weight
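
The idea summarized in the abstract can be illustrated with a minimal sketch. It builds a pq-gram profile for an XML tree (each pq-gram is a stem of p ancestor labels plus a base of q consecutive child labels, padded with a dummy label) and compares two profiles as weighted bags. The abstract does not give the weighting formulas, so everything about the weights here is an assumption made for illustration only: the depth-based `node_weight` function, the choice to give each pq-gram the weight of its anchor node, and the names `weighted_profile` and `weighted_similarity` are all hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NULL = "*"  # dummy label used to pad stems and bases


def weighted_profile(root, p=2, q=3, node_weight=lambda depth: 1.0 / (depth + 1)):
    """Map each pq-gram label tuple to the total weight of its occurrences.

    ASSUMPTION: a pq-gram inherits the weight of its anchor node, and the
    node weight depends only on depth. The paper's actual scheme may differ.
    """
    profile = Counter()

    def visit(node, ancestors, depth):
        # Stem: the anchor node preceded by its p-1 nearest ancestors.
        stem = (ancestors + (node.tag,))[-p:]
        weight = node_weight(depth)
        children = [child.tag for child in node]
        if not children:
            # A leaf contributes one pq-gram whose base is all dummy labels.
            profile[stem + (NULL,) * q] += weight
        else:
            # Slide a window of width q over the padded child sequence.
            padded = [NULL] * (q - 1) + children + [NULL] * (q - 1)
            for i in range(len(padded) - q + 1):
                profile[stem + tuple(padded[i:i + q])] += weight
            for child in node:
                visit(child, stem, depth + 1)

    visit(root, (NULL,) * p, 0)
    return profile


def weighted_similarity(xml_a, xml_b, p=2, q=3,
                        node_weight=lambda depth: 1.0 / (depth + 1)):
    """Weighted bag similarity in [0, 1]: 2 * shared weight / total weight."""
    pa = weighted_profile(ET.fromstring(xml_a), p, q, node_weight)
    pb = weighted_profile(ET.fromstring(xml_b), p, q, node_weight)
    shared = sum(min(pa[g], pb[g]) for g in pa.keys() & pb.keys())
    total = sum(pa.values()) + sum(pb.values())
    return 2.0 * shared / total


if __name__ == "__main__":
    doc_a = "<book><title/><author/><year/></book>"
    doc_b = "<book><title/><author/><price/></book>"
    print(weighted_similarity(doc_a, doc_b))
```

With a constant weight (`node_weight=lambda depth: 1.0`) the measure reduces to the unweighted pq-gram similarity, that is, twice the size of the bag intersection divided by the combined size of the two profiles, which makes it easy to compare the weighted and unweighted variants on the same documents.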
