Efficient Filtering of XML Documents with XPath Expression

Feb 05, 2024

SLIDE 1

Efficient Filtering of XML Documents with XPath Expression

Authors: Chee-Yong Chan, Pascal Felber, Minos Garofalakis, Rajeev Rastogi
Bell Laboratories, Lucent Technologies
{cychan,pascal,minos,rastogi}@research.bell-labs.com

Speaker: Lam-Son LE, LamSon.Le@epfl.ch
EPFL, I&C Doctoral School, WS 2002/2003, Distributed Information Processing

SLIDE 2

Outline

Introduction

– publish/subscribe systems, “bags of words” vs. XPath language

Background

– XPE-tree, unordered/ordered matching

XPE Decompositions and Matchings

– substring/minimal/simple decomposition, substring-tree

The XTrie Indexing Scheme

– substring table, trie
– matching algorithm

Evaluation

– comparison with XFilter

SLIDE 3

Introduction

Selective data dissemination

– publishers selectively deliver data to subscribers

Simple matching scheme: “bags of words”

Emergence of XML data

– XPath as the filter-specification language (XPE = XPath expression)
– Retrieval problem: given a collection P of XPEs and an input XML document D, find the subset of XPEs in P that match D

XTrie: an index based on XPath expressions

XTrie filters XML documents efficiently

– indexes a set of substrings rather than individual elements
– supports both ordered and unordered matching

SLIDE 4

Background (1/3)

XML documents as trees

– a root element; sub-elements can be nested to any depth
– level(root) = 1, and level(d) = level(d’) + 1 if d’ is the parent of d

XPath expressions (XPEs)

– “/”: parent/child operator
– “//”: ancestor/descendant operator
– “*”: wildcard operator
– “[”, “]”: delimit a predicate
– example: p = //a//b[*/c]/d
– 2 kinds of patterns: path patterns and tree patterns
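The level definition above can be sketched with a SAX handler that keeps a depth counter while the document is parsed in pre-order. This is only an illustration: the class name LevelHandler, the helper element_levels and the sample document are hypothetical and do not come from the slides.

```python
import xml.sax


class LevelHandler(xml.sax.ContentHandler):
    """Records (element name, level) pairs in the order start tags are seen."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # level of the innermost element currently open
        self.levels = []    # (name, level) pairs in pre-order

    def startElement(self, name, attrs):
        # level(d) = level(parent) + 1, so the root element gets level 1.
        self.depth += 1
        self.levels.append((name, self.depth))

    def endElement(self, name):
        self.depth -= 1


def element_levels(xml_text):
    """Parse xml_text and return the level of every element."""
    handler = LevelHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.levels


if __name__ == "__main__":
    # Hypothetical document, not the tree D used in the slides.
    print(element_levels("<a><b><c/></b><d/></a>"))
```

The counter is all the state needed here because SAX delivers start and end tags strictly nested.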

SLIDE 5

Background (2/3)

XPE-tree

– predicate expressions give rise to branches of the tree
– the XPE-tree is ordered if the elements of the XPE are meant to be matched in order
– each node ti of the XPE-tree has a relative level:

relLevel(ti) = [k, ∞] (a range) if ti is prefixed with “//” followed by (k-1) “*”

relLevel(ti) = [k, k] (a precise value) if ti is prefixed with “/” followed by (k-1) “*”
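A minimal sketch of the relLevel rule, assuming node labels are written as in the deck's figures (for example //a or /*/c). The function name rel_level and the regular expression are illustrative choices, not part of the original material.

```python
import math
import re


def rel_level(step):
    """Return (low, high) for one XPE-tree node label such as '//a' or '/*/c'."""
    # Assumed label shape: an axis ('/' or '//'), zero or more '*/' wildcards,
    # then a single element name.
    m = re.fullmatch(r"(//|/)((?:\*/)*)([A-Za-z_][\w.-]*)", step)
    if m is None:
        raise ValueError(f"unsupported step: {step!r}")
    axis, wildcards, _name = m.groups()
    k = wildcards.count("*") + 1   # (k-1) wildcards precede the name
    return (k, math.inf) if axis == "//" else (k, k)


if __name__ == "__main__":
    for label in ("//a", "//b", "/*/c", "/d"):
        print(label, rel_level(label))
```

Using math.inf for the open upper bound keeps later range checks to a plain comparison.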

SLIDE 6

Background (3/3)

Unordered matching

– a set of document nodes whose names match the nodes of the XPE-tree
– the level differences between matched nodes agree with the relative levels

Ordered matching is stronger: the order of elements in the XPE-tree is also taken into account

Matching example

– p = //a//b[*/c]/d
– {a2, b4, c6, d7} is an ordered matching of D to p

[Figure: XPE-tree T of p, with nodes //a [1,∞], //b [1,∞], /*/c [2,2] and /d [1,1], next to XML tree D with nodes b1, a2, b3, b4, e5, c6, d7, f8, c9, b10]
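The level condition of a matching can be sketched as a check over the XPE-tree. The relative levels below are the ones given for p = //a//b[*/c]/d; the document levels passed in are assumed values chosen for illustration, since the slides do not list the level of each node of D. The check covers level differences only; names and ancestor relationships would need to be verified separately.

```python
import math

# XPE-tree of p = //a//b[*/c]/d: node -> (parent, relative level range)
XPE_TREE = {
    "a": (None, (1, math.inf)),
    "b": ("a", (1, math.inf)),
    "c": ("b", (2, 2)),
    "d": ("b", (1, 1)),
}


def satisfies_levels(xpe_tree, levels):
    """levels maps each XPE-tree node to the level of the document node chosen for it."""
    for node, (parent, (low, high)) in xpe_tree.items():
        # Assumption: the root of the XPE-tree is measured from a virtual
        # level 0 above the document root.
        base = 0 if parent is None else levels[parent]
        if not low <= levels[node] - base <= high:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical levels for the document nodes matched to a, b, c, d.
    print(satisfies_levels(XPE_TREE, {"a": 2, "b": 4, "c": 6, "d": 5}))
    print(satisfies_levels(XPE_TREE, {"a": 2, "b": 4, "c": 5, "d": 5}))
```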

SLIDE 7

XPE Decompositions (1/3)

Substring of an XPE

– a sequence of consecutive nodes of the XPE-tree, each joined to the next by “/”
– example: p = /a/b[c/d//e][g//e/f]//*/*/e/f; possible substrings: abg, bcd, ef, b

Substring decomposition: a set of substrings that covers all nodes in the XPE-tree

Minimal decomposition: no substring is a prefix of another

– advantage: the substrings are as long as possible, which lowers the probability that any one of them is found and matched
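For a predicate-free XPE (a path pattern), the longest possible substrings can be sketched as the maximal runs of element names joined by “/”, cut at every “//” and at every wildcard. This is a simplified illustration under that assumption and does not handle predicates; the function name path_substrings is hypothetical. The two inputs in the usage part are p1 and p4 from the deck's Example 2.

```python
import re


def path_substrings(xpe):
    """Split a predicate-free XPE into maximal runs of names joined by '/'."""
    steps = re.findall(r"(//|/)([^/\[\]]+)", xpe)
    substrings, current = [], []
    for axis, name in steps:
        if axis == "//" or name == "*":
            # A descendant step or a wildcard ends the current run.
            if current:
                substrings.append(current)
            current = []
        if name != "*":
            current.append(name)
    if current:
        substrings.append(current)
    # Names are concatenated as in the slides (single-letter element names).
    return ["".join(run) for run in substrings]


if __name__ == "__main__":
    print(path_substrings("//a/a/b/c//a/b"))
    print(path_substrings("//c/b//c/d//*/d"))
```

Concatenating names without a separator follows the deck's notation and would be ambiguous for multi-letter element names.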

SLIDE 8

XPE Decompositions (2/3)

Simple decomposition: the minimal decomposition plus one substring for each branching node

Substring-tree: its nodes are the substrings of the simple decomposition; a substring is the parent of another if

– it is a prefix of the child, or
– its last element is the parent node of the first element of the child substring

Relative level is extended to substrings

– computed from the relative levels of the elements by which the given substring differs from its parent

SLIDE 9

XPE Decompositions (3/3)

Example for p = /a/b[c/d//e][g//e/f]//*/*/e/f

[Figure: the XPE-tree of p annotated with its minimal decomposition and with its simple decomposition, and the resulting substring-tree with nodes ab, abcd, e, ef, abg, ef]

SLIDE 10

Matching with Substrings (1/2)

A substring matches a node in the XML document if its last element matches that node

Typically, XML documents are parsed in pre-order (SAX parser), so substrings should also be ordered by a pre-order traversal of the substring-tree

Partial matching: a matching for all consecutive substrings from the first up to the given substring

Complete matching: a partial matching for the final substring

Subtree-matching: partial matchings found at all descendants of the given substring

Redundant matching: a subtree-matching already found at some earlier node in the XML document

SLIDE 11

Matching with Substrings (2/2)

Again, p = //a//b[*/c]/d

– s1 = a, s2 = b, s3 = c, s4 = bd
– the matchings at c9 and b10 are redundant

[Figure: substring-tree of p, with s1 = a [1,∞], s2 = b [1,∞], s3 = c [2,2] and s4 = bd [1,1], next to XML tree D (nodes b1, a2, b3, b4, e5, c6, d7, f8, c9, b10) annotated with the substring matched at each node]

SLIDE 12

XTrie Indexing Scheme (1/2)

The XTrie index is built for a set of XPEs

– derive the simple decomposition of every XPE
– associate each substring with its relative level

It consists of 2 data structures

– Trie T: a tree whose edges are labeled with element names of the XML documents
– Substring-Table ST: each row represents one substring
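The trie part can be sketched with nested dictionaries, one edge per element name. This shows only the basic trie shape; any extra per-node information that XTrie keeps, and the links from trie nodes to rows of ST, are left out. The helper names build_trie and count_nodes are hypothetical, and the input is the set of distinct substrings of the four XPEs in the deck's Example 2.

```python
def build_trie(substrings):
    """Build a trie as nested dicts; each edge is labeled with one element name."""
    root = {}
    for substring in substrings:
        node = root
        for name in substring:
            # Reuse the existing child for this name, or create a new one.
            node = node.setdefault(name, {})
    return root


def count_nodes(node):
    """Count the nodes of the trie, including the root."""
    return 1 + sum(count_nodes(child) for child in node.values())


if __name__ == "__main__":
    # Single-letter element names, so a string is a sequence of names.
    substrings = ["aabc", "ab", "abce", "bcd", "abc", "d", "bc", "cb", "cd"]
    trie = build_trie(substrings)
    print(sorted(trie))        # element names on the edges leaving the root
    print(count_nodes(trie))   # total number of trie nodes
```

Shared prefixes such as ab, abc and abce end up on one path, which is why indexing substrings in a trie lets several XPEs share work during filtering.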

SLIDE 13

XTrie Indexing Scheme (2/2)

Example 2

p1 = //a/a/b/c//a/b
p2 = /a/b[c/e]//b/c/d
p3 = /a/b[c//d]//b/c
p4 = //c/b//c/d//*/d

[Figure: trie T, whose edges are labeled with the element names a to e, and substring table ST with columns Parent row, Relative Level, Rank, Number of children, Next row. ST rows by index: 1 aabc, 2 ab (p1); 3 ab, 4 abce, 5 bcd (p2); 6 ab, 7 abc, 8 d, 9 bc (p3); 10 cb, 11 cd, 12 d (p4)]

SLIDE 14

XTrie Matching Algorithm (1/2)

Based on SAX: the algorithm is notified whenever an element name is parsed

Requires an additional 2-dimensional array B of size <number of rows in ST> × <maximum level of the XML document>

B[s, l] is

– initialized to 0 at the beginning
– incremented by 1 when a non-redundant matching of s at level l is found
– reset to 0 when the end-tag at level l is parsed

An XPE p matches the XML document if B[rs, l] = m + 1 for some level l, where

– rs is the root substring in the substring-tree of p
– m is the number of child substrings of rs
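The bookkeeping on B can be sketched as a small class. It covers only what the bullets state (initialization, increment, reset, and the acceptance test) and assumes the caller decides when a matching is non-redundant; the trie traversal that detects matchings is not shown. The class and method names are hypothetical.

```python
class MatchCounters:
    """Counter array B[s][l], indexed by ST row s and document level l (both 1-based)."""

    def __init__(self, num_substrings, max_level):
        # Every entry starts at 0; index 0 is unused in both dimensions.
        self.B = [[0] * (max_level + 1) for _ in range(num_substrings + 1)]

    def record_match(self, s, level):
        # Assumption: called only for non-redundant matchings of s at this level.
        self.B[s][level] += 1

    def end_tag(self, level):
        # An end-tag at this level resets the counters of all rows for that level.
        for row in self.B:
            row[level] = 0

    def xpe_matches(self, root_substring, num_children):
        # p matches if B[rs, l] = m + 1 for some level l.
        return any(count == num_children + 1 for count in self.B[root_substring])


if __name__ == "__main__":
    counters = MatchCounters(num_substrings=4, max_level=10)
    counters.record_match(1, 2)
    counters.record_match(1, 2)
    print(counters.xpe_matches(root_substring=1, num_children=1))
    counters.end_tag(2)
    print(counters.xpe_matches(root_substring=1, num_children=1))
```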

SLIDE 15

XTrie Matching Algorithm (2/2)

Again, p = //a//b[*/c]/d

Substring table ST

Index  Substring  Parent row  Relative Level  Rank  Number of children  Next row
1      a          -           [1,∞]           1     1                   -
2      b          1           [1,∞]           1     2                   -
3      c          2           [2,2]           1     -                   -
4      bd         2           [1,1]           2     -                   -

[Figure: trie T for the substrings a, b, c, bd, next to XML tree D with nodes b1, a2, b3, b4, e5, c6, d7, f8, c9, b10]

SLIDE 16

Evaluation

Comparison with XFilter (which uses a hash table on single element names)

[Plot: filtering time (ms) for varying P (L=20, pw=0.1, pd=0.1, pb=0)]

[Plot: filtering time (ms) for varying document length (P=100k, L=20, pw=0.1, pd=0.1, pb=0)]
