Efficient Filtering of XML Documents with XPath Expression



SLIDE 1

Efficient Filtering of XML Documents with XPath Expression

Authors: Chee-Yong Chan, Pascal Felber, Minos Garofalakis, Rajeev Rastogi
Bell Laboratories, Lucent Technologies
{cychan,pascal,minos,rastogi}@research.bell-labs.com

Speaker: Lam-Son LE (LamSon.Le@epfl.ch)
EPFL, I&C Doctoral School, WS 2002/2003
Distributed Information Processing

SLIDE 2

Outline

  • Introduction

– publish/subscribe systems, “bags of words” vs. XPath language

  • Background

– XPE-tree, unordered/ordered matching

  • XPE Decompositions and Matchings

– substring/minimal/simple decomposition, substring-tree

  • The XTrie Indexing Scheme

– substring table, trie
– matching algorithm

  • Evaluation

– comparison with XFilter

SLIDE 3

Introduction

  • Selective data dissemination

– publishers selectively deliver data to subscribers

  • Simple matching scheme: “bags of words”
  • XML data emergence

– XPath as the filter-specification language (XPE = XPath expression)
– Retrieval problem: given a collection P of XPEs and an input XML document D, find the subset of XPEs in P that match D

  • XTrie: an index built on XPath expressions
  • XTrie efficiently filters XML documents

– indexes on sets of substrings rather than on individual elements
– supports both ordered and unordered matching

SLIDE 4

Background (1/3)

  • XML documents as trees

– a root element; sub-elements can be nested to any depth
– level(root) = 1, level(d) = level(d’) + 1 if d’ is the parent of d

  • XPath expressions (XPEs)

– “/”: parent/child operator
– “//”: ancestor/descendant operator
– “*”: wildcard operator
– “[”, “]”: delimit a predicate
– example: p = //a//b[*/c]/d
– 2 kinds of pattern: path pattern and tree pattern

SLIDE 5

Background (2/3)

  • XPE-tree

– predicate expressions give rise to branches of the tree
– the XPE-tree is ordered if the elements in the XPE are required to appear in order
– each node ti of the XPE-tree has a relative level:

  • relLevel(ti) = [k, ∞] (a range) if ti is prefixed with “//” followed by (k-1) “*”
  • relLevel(ti) = [k, k] (a precise value) if ti is prefixed with “/” followed by (k-1) “*”
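The two relLevel rules can be illustrated with a minimal Python sketch. It handles only predicate-free XPEs; the function name `rel_levels` and the regular-expression tokenizer are assumptions made for illustration and do not come from the slides.

```python
import math
import re

def rel_levels(xpe):
    """Return [(name, (lo, hi)), ...] for a predicate-free XPE.

    Hypothetical helper: it applies the two rules literally, counting the
    "*" steps that precede each named node and checking whether a "//"
    occurred since the previous named node.
    """
    # Each step is an axis ("//" or "/") followed by a name or "*".
    steps = re.findall(r"(//|/)([A-Za-z_][\w.-]*|\*)", xpe)
    result = []
    wildcards = 0      # "*" steps seen since the last named node
    unbounded = False  # True if "//" was seen since the last named node
    for axis, name in steps:
        if axis == "//":
            unbounded = True
        if name == "*":
            wildcards += 1
            continue
        k = wildcards + 1
        result.append((name, (k, math.inf if unbounded else k)))
        wildcards, unbounded = 0, False
    return result

if __name__ == "__main__":
    # One root-to-leaf path of p = //a//b[*/c]/d, written without the predicate.
    print(rel_levels("//a//b/*/c"))
```

For the path `//a//b/*/c` the sketch assigns a and b the range [1, ∞] and c the precise value [2, 2].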

SLIDE 6

Background (3/3)

  • Unordered matching

– a set of document nodes whose names match the nodes of the XPE-tree
– the level differences between matched nodes fall within the relative levels

  • Ordered matching is stronger: the order of elements in the XPE-tree is also taken into account
  • Matching example

– p = //a//b[*/c]/d
– {a2, b4, c6, d7} is an ordered matching of D to p

[Figure: XPE-tree T with nodes //a [1,∞], //b [1,∞], /*/c [2,2], /d [1,1], next to XML tree D with nodes b1, a2, b3, b4, e5, c6, d7, f8, c9, b10]

SLIDE 7

XPE Decompositions (1/3)

  • Substring of an XPE

– a concatenation of consecutive nodes separated by “/”
– example: p = /a/b[c/d//e][g//e/f]//*/*/e/f; possible substrings: abg, bcd, ef, b

  • Substring decomposition: a set of substrings that covers all nodes in the XPE-tree
  • Minimal decomposition: no substring is a prefix of another

– advantage: substrings are as long as possible, so each one is less likely to be found and matched

SLIDE 8

XPE Decompositions (2/3)

  • Simple decomposition: add a substring for each branching node to the minimal decomposition
  • Substring-tree: its nodes are the substrings of the simple decomposition; a substring is the parent of another if

– it is a prefix of the child, or
– the last element of the parent substring is the parent node of the first element of the child substring

  • Relative level is extended to substrings

– computed from the relative levels of the elements that lie between the given substring and its parent
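For an XPE without predicates there are no branching nodes, so the simple decomposition coincides with the minimal one and the substring-tree is a chain. Under that restriction, the decomposition and the relative level of each substring can be sketched as follows. The function name `decompose_path` is hypothetical, and substrings are written by concatenating element names, as the slides do with single-letter names.

```python
import math
import re

def decompose_path(xpe):
    """Split a predicate-free XPE into (substring, (lo, hi)) pairs.

    Illustrative sketch only: a new substring starts at every "//" or "*"
    step, and the relative level counts the steps from the last element of
    the previous substring to the last element of the current one.
    """
    steps = re.findall(r"(//|/)([A-Za-z_][\w.-]*|\*)", xpe)
    out = []
    names = []         # element names of the substring being built
    dist = 0           # steps since the end of the previous substring
    unbounded = False  # True if a "//" lies within those steps

    def emit():
        nonlocal names, dist, unbounded
        out.append(("".join(names), (dist, math.inf if unbounded else dist)))
        names, dist, unbounded = [], 0, False

    for axis, name in steps:
        if (axis == "//" or name == "*") and names:
            emit()  # close the current run of "/"-connected names
        dist += 1
        if axis == "//":
            unbounded = True
        if name != "*":
            names.append(name)
    if names:
        emit()
    return out

if __name__ == "__main__":
    print(decompose_path("//a/a/b/c/*/a/b"))
    print(decompose_path("//c/b//c/d/*/*/d"))
```

Run on `//a/a/b/c/*/a/b`, the sketch yields aabc with level [4, ∞] followed by ab with level [3, 3]. XPEs with predicates need a tree-shaped variant, which this sketch does not attempt.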

SLIDE 9

XPE Decompositions (3/3)

  • Example for p = /a/b[c/d//e][g//e/f]//*/*/e/f

[Figure: the XPE-tree of p (nodes /a, /b, /c, /d, //e, /g, //e, /f, /*/*/e, /f) shown twice, once marked with the minimal decomposition and once with the simple decomposition, followed by the substring-tree over ab, abcd, e, ef, abg, ef]

SLIDE 10

Matching with Substrings (1/2)

  • A substring matches a node in the XML document if its last element matches that node
  • XML documents are typically parsed in pre-order (SAX parser), so substrings are also ordered by a pre-order traversal of the substring-tree
  • Partial matching: a matching of all consecutive substrings from the first up to the given substring
  • Complete matching: a partial matching for the final substring
  • Subtree-matching: partial matchings found for all descendants of the given substring
  • Redundant matching: a matching for which a subtree-matching was already found at some earlier node in the XML document
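The first two bullets can be made concrete with Python's standard `xml.sax` module. The sketch below is not the XTrie algorithm: it checks every substring against the current root-to-node path by brute force, only to show where substring matches arise during a pre-order parse. The class name `SubstringSpotter`, the helper `spot` and the sample document are illustrative assumptions.

```python
import xml.sax

class SubstringSpotter(xml.sax.ContentHandler):
    """Record which substrings end at each element, in pre-order."""

    def __init__(self, substrings):
        super().__init__()
        # Single-letter element names are assumed, so "bd" means b/d.
        self.substrings = [tuple(s) for s in substrings]
        self.path = []   # element names from the root to the current node
        self.hits = []   # (substring, element name, level)

    def startElement(self, name, attrs):
        self.path.append(name)
        level = len(self.path)  # level(root) = 1
        for s in self.substrings:
            # The substring matches here if it is a suffix of the path.
            if tuple(self.path[-len(s):]) == s:
                self.hits.append(("".join(s), name, level))

    def endElement(self, name):
        self.path.pop()

def spot(xml_text, substrings):
    handler = SubstringSpotter(substrings)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.hits

if __name__ == "__main__":
    # Hypothetical document, not the tree D from the slides.
    for hit in spot("<a><b><c/><d/></b></a>", ["a", "b", "c", "bd"]):
        print(hit)
```

Each hit is only a candidate: whether it is part of a partial, complete or redundant matching depends on the relative levels and on the matches found earlier.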

SLIDE 11

Matching with Substrings (2/2)

  • Again, p = //a//b[*/c]/d

– s1 = a, s2 = b, s3 = c, s4 = bd
– the matchings at c9 and b10 are redundant

[Figure: substring-tree with s1 = a [1,∞], s2 = b [1,∞], s3 = c [2,2], s4 = bd [1,1], next to XML tree D (nodes b1, a2, b3, b4, e5, c6, d7, f8, c9, b10) annotated with the substrings (s1) to (s4) matched at its nodes]

SLIDE 12

XTrie Indexing Scheme (1/2)

  • The XTrie index is built for a set of XPEs

– derive the simple decomposition of every XPE
– associate each substring with its relative level

  • It consists of 2 data structures

– Trie T: a tree whose edges are labeled with element names from the XML document
– Substring-Table ST: each row represents one substring
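A rough Python sketch of the two structures, assuming the decompositions are already available as lists of substrings. The names `build_index` and `count_nodes` are hypothetical, the trie is a plain nested dictionary, and each table row keeps only the substring and a link to the next row holding the same substring. The other columns of ST (parent row, relative level, rank, number of children) are left out.

```python
def build_index(decompositions):
    """Build a trie and a simplified substring table.

    decompositions: one list of substrings per XPE; single-letter element
    names are assumed, so iterating over a substring yields its elements.
    """
    trie = {}       # nested dicts: element name -> child node
    table = []      # rows of the substring table, numbered from 1
    last_row = {}   # substring -> most recent row that holds it
    for xpe_id, substrings in enumerate(decompositions, start=1):
        for s in substrings:
            node = trie
            for name in s:
                node = node.setdefault(name, {})  # follow or create the edge
            row = {"row": len(table) + 1, "xpe": xpe_id,
                   "substring": s, "next": None}
            if s in last_row:
                # Chain rows that share the same substring.
                last_row[s]["next"] = row["row"]
            last_row[s] = row
            table.append(row)
    return trie, table

def count_nodes(node):
    """Number of trie nodes, counting the root."""
    return 1 + sum(count_nodes(child) for child in node.values())

if __name__ == "__main__":
    # Decompositions of the four XPEs used as Example 2 in these slides.
    decomps = [["aabc", "ab"], ["ab", "abce", "bcd"],
               ["ab", "abc", "d", "bc"], ["cb", "cd", "d"]]
    trie, table = build_index(decomps)
    print(count_nodes(trie))
    for row in table:
        print(row["row"], row["substring"], row["next"])
```

Sharing one trie across all XPEs means a substring such as ab, which occurs in three of the four expressions, is stored once in the trie and reached through a chain of table rows.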

SLIDE 13

XTrie Indexing Scheme (2/2)

Example 2

p1 = //a/a/b/c/*/a/b
p2 = /a/b[c/e]/*/b/c/d
p3 = /a/b[c/*/d]//b/c
p4 = //c/b//c/d/*/*/d

Substring-Table ST (cells left empty on the slide are shown as “-”)

Index  Substring  Parent row  Relative level  Rank  Number of children  Next row
1      aabc       -           [4, ∞]          1     1                   -
2      ab         1           [3, 3]          1     -                   3
3      ab         -           [2, 2]          1     2                   6
4      abce       3           [2, 2]          1     -                   -
5      bcd        3           [4, 4]          2     -                   -
6      ab         -           [2, 2]          1     2                   -
7      abc        6           [1, 1]          1     1                   -
8      d          7           [2, 2]          1     -                   12
9      bc         6           [2, ∞]          2     -                   -
10     cb         -           [2, ∞]          1     1                   -
11     cd         10          [2, ∞]          1     1                   -
12     d          11          [3, 3]          1     -                   -

[Figure: trie T over these substrings, with nodes numbered 1 to 15, edges labeled with the element names a to e, and a pair of numbers attached to each node]

SLIDE 14

XTrie Matching Algorithm (1/2)

  • Based on SAX: the algorithm is notified whenever an element name is parsed
  • Requires an additional 2-dimensional array B of size <number of rows in ST> × <maximum level of XML document>
  • B[s, l] is

– initialized to 0 at the beginning
– incremented by 1 when a non-redundant matching of s at level l is found
– reset to 0 when the end-tag at level l is parsed

  • An XPE p matches the XML document if B[rs, l] = m + 1 for some level l, where

– rs is the root substring in the substring-tree of p
– m is the number of child substrings of rs
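The bookkeeping on B can be written down directly from these bullets. The sketch below models only the array and the final test; deciding whether a matching is non-redundant, and when a counter should be incremented, is left to the caller. The class name `MatchTable` and its method names are hypothetical.

```python
class MatchTable:
    """Array B[s, l] indexed by substring-table row s and document level l."""

    def __init__(self, num_rows, max_level):
        # Row 0 and level 0 are unused so that indices start at 1.
        self.B = [[0] * (max_level + 1) for _ in range(num_rows + 1)]

    def record(self, s, level):
        """Called when a non-redundant matching of row s is found at level."""
        self.B[s][level] += 1

    def end_tag(self, level):
        """Called when the end-tag of an element at this level is parsed."""
        for row in self.B:
            row[level] = 0

    def is_match(self, root_row, num_children):
        """True if B[rs, l] = m + 1 for some level l."""
        return any(v == num_children + 1 for v in self.B[root_row][1:])

if __name__ == "__main__":
    table = MatchTable(num_rows=4, max_level=10)
    for _ in range(3):
        table.record(2, 4)           # three increments for row 2 at level 4
    print(table.is_match(2, 2))      # row 2 with m = 2 child substrings
    table.end_tag(4)
    print(table.is_match(2, 2))
```

Resetting a whole level on every end-tag keeps the sketch short; a real implementation would presumably touch only the rows that were updated at that level.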

SLIDE 15

XTrie Matching Algorithm (2/2)

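Again, p = //a//b[*/c]/d

Substring-Table ST (cells left empty on the slide are shown as “-”)

Index  Substring  Parent row  Relative level  Rank  Number of children  Next row
1      a          -           [1, ∞]          1     1                   -
2      b          1           [1, ∞]          1     2                   -
3      c          2           [2, 2]          1     -                   -
4      bd         2           [1, 1]          2     -                   -

[Figure: trie T with nodes numbered 1 to 5, edges labeled a, b, c, d, and a pair of numbers attached to each node; XML tree D with nodes b1, a2, b3, b4, e5, c6, d7, f8, c9, b10]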

SLIDE 16

Evaluation

  • In comparison with XFilter (which uses a hash table on single element names)

[Figure: filtering time (ms) when varying P (L=20, pw=0.1, pd=0.1, pb=0)]

[Figure: filtering time (ms) when varying document length (P=100k, L=20, pw=0.1, pd=0.1, pb=0)]

SLIDE 17

Thank you! Questions?