Computer and Modernization


An Efficient XML Pattern Matching Algorithm for Supporting Wildcard Query

  

  1. (College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China)

  • Received: 2015-11-20 Online: 2016-04-14 Published: 2018-09-30

Abstract:

In XML query languages, wildcard queries containing “*” can effectively meet certain special query requirements. In the big data era, however, as XML files grow in size and structural complexity, existing algorithms that support wildcard queries need large amounts of memory to parse an XML file, and they also require many single-path matching operations and caching of local results. To address this, we propose a new XML pattern matching algorithm named WTwigList, which efficiently evaluates twig patterns containing wildcards. First, the hierarchical relationships of the wildcards in the query pattern are processed to reduce unnecessary wildcard matching. Then the XML file is parsed as a data stream, and local Extended Dewey encoding is performed. After a filtering operation, an ordered list of leaf-node encodings is obtained, and the matching results are derived from matching operations on these lists. Experimental results on both real-life and synthetic datasets demonstrate that WTwigList improves query efficiency and has advantages in space efficiency, and that it handles the parent-child (P-C) relationship quickly and accurately.
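The general idea of labelling elements while streaming and then answering a wildcard path query from the labels can be illustrated with a small sketch. This is not WTwigList and it does not implement Extended Dewey encoding: it assumes plain Dewey labels (the position of each element among its siblings, concatenated from the root), keeps the tag path next to each label instead of deriving it from the label, and handles only a single parent-child path pattern rather than a full twig. The names dewey_stream, matches, find_matches and is_parent are hypothetical and chosen for illustration only.

```python
import io
import xml.etree.ElementTree as ET


def dewey_stream(xml_text):
    """Yield (dewey_label, tag_path) for every element, in document order.

    The document is consumed as a stream of start/end events, so only the
    currently open root-to-node path is tracked by this function.
    """
    labels = []         # Dewey components of the currently open elements
    tags = []           # tags of the currently open elements
    child_counts = [0]  # child_counts[-1]: children seen so far under the open element
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            child_counts[-1] += 1
            labels.append(child_counts[-1])
            tags.append(elem.tag)
            child_counts.append(0)
            yield tuple(labels), tuple(tags)
        else:
            labels.pop()
            tags.pop()
            child_counts.pop()
            elem.clear()  # drop content that is no longer needed


def matches(pattern, tag_path):
    """True if the last len(pattern) tags match the pattern; '*' matches any tag."""
    if len(tag_path) < len(pattern):
        return False
    tail = tag_path[-len(pattern):]
    return all(p == "*" or p == t for p, t in zip(pattern, tail))


def find_matches(xml_text, pattern):
    """Return the Dewey labels of elements that end a match of the pattern."""
    return [label for label, path in dewey_stream(xml_text) if matches(pattern, path)]


def is_parent(a, b):
    """Parent-child test on Dewey labels: a is the parent of b."""
    return len(b) == len(a) + 1 and b[:-1] == a


if __name__ == "__main__":
    doc = ("<lib><book><info><title/></info><title/></book>"
           "<book><meta><title/></meta></book></lib>")
    # Pattern corresponds to the path book/*/title with parent-child steps only.
    for label in find_matches(doc, ["book", "*", "title"]):
        print(".".join(map(str, label)))
```

In this sketch the wildcard step costs nothing beyond a tag comparison that always succeeds, and the parent-child relationship between two labelled nodes reduces to a prefix check on their labels, which is the property that Dewey-style encodings are generally used for.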

Key words: wildcard query, stream data processing, Extended Dewey Encoding, XML pattern matching

CLC Number: