XPath Evaluation in Linear Time, Mikołaj Bojańczyk, Paweł Parys, Warsaw University

SLIDE 1

XPath Evaluation in Linear Time

Mikołaj Bojańczyk, Paweł Parys Warsaw University

SLIDE 2

Goal: find the nodes in an XML document d that satisfy an XPath unary query q.

We consider a fragment of XPath called FOXPath.

Previous algorithms:

– exponential in the document size
– quadratic in the document size (Benedikt, Koch)

We give two algorithms:

– linear in the document size: O(2^|q|·|d|)
– good combined complexity: O(|q|·|d|·log(|d|))

SLIDE 14

XML Document

<document>
  <team name="Borussia">
    <player name="Kuba"></player>
    <player name="Frei"></player>
  </team>
  <team name="Schalke">
    <player name="Kuranyi"></player>
  </team>
  <team name="Poland">
    <player name="Kuba"></player>
    <player name="Boruc"></player>
  </team>
</document>

Figure callouts: "document node, i.e. opening tag" and "attribute name" (twice).

XPath query: “select teams that share a player with another team”

child[player]@name = sibling[team]/child[player]@name
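To make the meaning of this query concrete, here is a minimal Python sketch that evaluates the English query directly on the example document by comparing every team against every other team. It is a hand-written check of this one query, not the algorithm from the talk, and the function name `teams_sharing_a_player` is a hypothetical helper introduced only for illustration.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<document>
  <team name="Borussia">
    <player name="Kuba"></player>
    <player name="Frei"></player>
  </team>
  <team name="Schalke">
    <player name="Kuranyi"></player>
  </team>
  <team name="Poland">
    <player name="Kuba"></player>
    <player name="Boruc"></player>
  </team>
</document>"""


def teams_sharing_a_player(xml_text):
    """Return names of teams that have a player name in common with another team.

    Hypothetical helper: a pairwise (quadratic) comparison of teams.
    """
    root = ET.fromstring(xml_text)
    teams = root.findall("team")
    # Player names of each team, in document order of the teams.
    players = [{p.get("name") for p in t.findall("player")} for t in teams]
    selected = []
    for i, team in enumerate(teams):
        # Compare team i against every other team j.
        if any(players[i] & players[j] for j in range(len(teams)) if j != i):
            selected.append(team.get("name"))
    return selected


print(teams_sharing_a_player(DOCUMENT))  # ['Borussia', 'Poland']
```

The pairwise loop is the kind of repeated work that the rest of the talk sets out to avoid.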

SLIDE 21

FOXPath

Programs - select node pairs.

  • child, parent, next-sibling, prev-sibling, descendant, etc.
  • any regular expression on programs is a program, e.g. child*
  • if t is a test, then [t] is a program that selects (x,x) if node x satisfies t

Tests - select single nodes.

  • any tag name a is a test that selects nodes with this tag.
  • boolean operations: or, and, not
  • if p,q are programs, and a,b attribute names, then p@a=q@b and p@a ≠ q@b are tests.

A node x is selected by p@a=q@b if there are some nodes y and z such that

  • the pair (x,y) is selected by p,
  • the pair (x,z) is selected by q,
  • the attribute values y@a and z@b are the same.

Figure: p leads from x to y, q leads from x to z.

Theorem. Let t be a FOXPath test and d an XML document. The set of nodes in d selected by t can be computed in time O(|d|·2^|t|) as well as in time O(|d|·log(|d|)·|t|^2).
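The definitions of programs and tests can be read as a small interpreter. The sketch below is a naive reference evaluator for the semantics only; it makes no attempt at the linear-time bound of the theorem. All names (`Node`, `seq`, `star`, `check`, `tag`, `attr_eq`, `team`) are hypothetical, and `sibling` is assumed to mean "any other child of the same parent".

```python
from typing import Callable, Dict, List, Optional


class Node:
    """A document node: tag name, attributes, ordered children."""

    def __init__(self, tag: str, attrs: Optional[Dict[str, str]] = None,
                 children: Optional[List["Node"]] = None) -> None:
        self.tag = tag
        self.attrs = attrs or {}
        self.children = children or []
        self.parent: Optional["Node"] = None
        for c in self.children:
            c.parent = self


# A program maps x to all y such that the pair (x, y) is selected.
Program = Callable[[Node], List[Node]]
# A test decides whether a single node is selected.
Test = Callable[[Node], bool]


def child(x: Node) -> List[Node]:
    return list(x.children)


def sibling(x: Node) -> List[Node]:
    # Assumption: "sibling" means any other child of the same parent.
    return [] if x.parent is None else [s for s in x.parent.children if s is not x]


def seq(p: Program, q: Program) -> Program:
    """Composition p/q: first follow p, then q."""
    return lambda x: [z for y in p(x) for z in q(y)]


def star(p: Program) -> Program:
    """Kleene star p*: zero or more repetitions of p."""
    def run(x: Node) -> List[Node]:
        seen, order, stack = {id(x)}, [x], [x]
        while stack:
            for y in p(stack.pop()):
                if id(y) not in seen:
                    seen.add(id(y))
                    order.append(y)
                    stack.append(y)
        return order
    return run


def check(t: Test) -> Program:
    """The program [t]: selects (x, x) if x satisfies t."""
    return lambda x: [x] if t(x) else []


def tag(name: str) -> Test:
    return lambda x: x.tag == name


def attr_eq(p: Program, a: str, q: Program, b: str) -> Test:
    """The test p@a = q@b: some y via p and z via q with y@a equal to z@b."""
    def test(x: Node) -> bool:
        left = {y.attrs[a] for y in p(x) if a in y.attrs}
        right = {z.attrs[b] for z in q(x) if b in z.attrs}
        return not left.isdisjoint(right)
    return test


def team(name: str, *players: str) -> Node:
    return Node("team", {"name": name}, [Node("player", {"name": n}) for n in players])


root = Node("document", {}, [team("Borussia", "Kuba", "Frei"),
                             team("Schalke", "Kuranyi"),
                             team("Poland", "Kuba", "Boruc")])

# child[player]@name = sibling[team]/child[player]@name
players = seq(child, check(tag("player")))
other_teams_players = seq(seq(sibling, check(tag("team"))), players)
shares_player = attr_eq(players, "name", other_teams_players, "name")

print([t.attrs["name"] for t in root.children if shares_player(t)])
# ['Borussia', 'Poland']
```

Each call to the test recomputes the node sets reached by p and q, so evaluating it at every node can take quadratic time; this is the behaviour the theorem improves on.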

SLIDE 34

High level overview

Find nodes that satisfy p@a = q@a.

  • 1. decompose trees into classes (class = set of nodes with the same value of attribute a)
  • 2. for each class, find nodes that are witnessed by that class

Figure labels: p@a, q@a.

Goal: avoid repetition

– do a constant number of operations per node
– or at least logarithmic

Using Simon decompositions, a fancy algebraic result
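Step 1 of the overview can be sketched in a few lines of Python. This only illustrates the grouping into classes, assuming nodes are given as (identifier, attribute value) pairs in document order. The function name `decompose_into_classes` and the input encoding are hypothetical, and step 2, the hard part, is not shown.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Optional, Tuple


def decompose_into_classes(
    nodes: Iterable[Tuple[Hashable, Optional[str]]]
) -> Dict[str, List[Hashable]]:
    """Group node identifiers by the value of attribute a.

    Nodes without the attribute (value None) belong to no class.
    One dictionary operation per node, so linear expected time.
    """
    classes: Dict[str, List[Hashable]] = defaultdict(list)
    for node_id, value in nodes:
        if value is not None:
            classes[value].append(node_id)
    return dict(classes)


# Player nodes of the example document, numbered in document order.
players = [(0, "Kuba"), (1, "Frei"), (2, "Kuranyi"), (3, "Kuba"), (4, "Boruc")]
print(decompose_into_classes(players))
# {'Kuba': [0, 3], 'Frei': [1], 'Kuranyi': [2], 'Boruc': [4]}
```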

SLIDE 35

What is the Simon decomposition?

SLIDE 47

L a regular word language. Do a linear time precomputation on w = a1 a2 ... an. For any infix, membership ai...aj ∈ L can be computed in time log n.

A an automaton recognizing L, and Q its state space. Each word u induces a transformation on states, Q → Q.

Figure: the word a b b a a b b a, split into halves (½), quarters (¼) and eighths (⅛), with an infix from position i to position j marked.

Big news: Simon decomposition does this with constant depth!
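The halving picture (½, ¼, ⅛) can be sketched as a balanced binary tree that stores, for each block of the word, the state transformation it induces. The following Python sketch is the standard segment-tree technique with logarithmic query time; it is not the Simon decomposition. The class `InfixOracle`, its methods, and the 0-based inclusive indexing are assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

Transformation = Tuple[int, ...]  # t[q] = state reached from state q


def compose(f: Transformation, g: Transformation) -> Transformation:
    """Transformation of the word uv, given f for u and g for v."""
    return tuple(g[f[q]] for q in range(len(f)))


class InfixOracle:
    """Linear-time precomputation, then O(log n) membership tests for infixes."""

    def __init__(self, word: str, delta: Dict[str, Transformation], num_states: int) -> None:
        self.identity: Transformation = tuple(range(num_states))
        self.size = 1
        while self.size < max(len(word), 1):
            self.size *= 2
        # Leaves hold single letters; unused leaves hold the identity.
        self.tree = [self.identity] * (2 * self.size)
        for pos, letter in enumerate(word):
            self.tree[self.size + pos] = delta[letter]
        # Each inner node holds the transformation of its whole block.
        for node in range(self.size - 1, 0, -1):
            self.tree[node] = compose(self.tree[2 * node], self.tree[2 * node + 1])

    def transformation(self, i: int, j: int) -> Transformation:
        """Transformation induced by word[i..j] (0-based, inclusive)."""
        left, right = self.identity, self.identity
        lo, hi = i + self.size, j + self.size + 1
        while lo < hi:
            if lo & 1:
                left = compose(left, self.tree[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                right = compose(self.tree[hi], right)
            lo //= 2
            hi //= 2
        return compose(left, right)

    def accepts(self, i: int, j: int, initial: int, accepting: Set[int]) -> bool:
        return self.transformation(i, j)[initial] in accepting


# Example automaton: L = words with an even number of a's.
EVEN_A = {"a": (1, 0), "b": (0, 1)}
oracle = InfixOracle("abbaabba", EVEN_A, num_states=2)
print(oracle.accepts(0, 7, initial=0, accepting={0}))  # True  (four a's)
print(oracle.accepts(0, 0, initial=0, accepting={0}))  # False (one a)
```

A query touches at most two blocks per level of the tree, which is where the log n comes from. The slide's claim is that a Simon decomposition replaces this logarithmic depth by a constant that depends only on the automaton.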

SLIDE 48

Back to XPath evaluation...

SLIDE 58

abaabaabbabababbababbabbababbababbabbbababbbabbababbabbbabaababbabba

To simplify, consider a special case of XPath:

  • words not trees
  • a test p@a = q@a
  • programs p,q have no nested tests, except label tests
  • p only goes left, q only goes right
  • each attribute value appears exactly twice

Figure labels: p, q.

  • Problem. Fix a set of tag names Σ and regular word languages L, K ⊆ Σ*.
  • Input. A word a1 ··· an ∈ Σ* and a matching E ⊆ {1, ..., n}^2.
  • Output. Set of nodes x such that, for some match (y, z) ∈ E, w[y..x] ∈ L and w[x..z] ∈ K.

SLIDE 65

Three algorithms for the problem stated above (word w = a1 ··· an, matching E, languages L and K):

Naive algorithm. For every match (y, z) ∈ E, find nodes x such that w[y..x] ∈ L and w[x..z] ∈ K. Time O(n^2).

Divide and conquer dynamic algorithm. Solves the problem in time O(n log(n)).

An algorithm that uses the Simon decomposition. Solves the problem in time O(n).
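The naive algorithm can be written out as follows. This is a sketch under stated assumptions: L and K are given as deterministic automata, positions are 0-based, and the names `DFA` and `naive_select` are hypothetical. For each match it scans the whole stretch between y and z, which gives the quadratic bound when the stretches are long.

```python
from typing import Dict, FrozenSet, Iterable, NamedTuple, Set, Tuple


class DFA(NamedTuple):
    delta: Dict[str, Tuple[int, ...]]  # delta[letter][state] = next state
    initial: int
    accepting: FrozenSet[int]


def naive_select(word: str, matching: Iterable[Tuple[int, int]],
                 dfa_l: DFA, dfa_k: DFA) -> Set[int]:
    """Positions x with some (y, z) in the matching such that
    word[y..x] is in L and word[x..z] is in K (inclusive, 0-based)."""
    selected: Set[int] = set()
    for y, z in matching:
        # Forward pass: in_l[x - y] is True iff word[y..x] is in L.
        state = dfa_l.initial
        in_l = []
        for x in range(y, z + 1):
            state = dfa_l.delta[word[x]][state]
            in_l.append(state in dfa_l.accepting)
        # Backward pass: good = states from which word[x..z] leads to acceptance.
        good = set(dfa_k.accepting)
        for x in range(z, y - 1, -1):
            step = dfa_k.delta[word[x]]
            good = {q for q in range(len(step)) if step[q] in good}
            if in_l[x - y] and dfa_k.initial in good:
                selected.add(x)
    return selected


# L = even number of b's, K = words ending with a.
EVEN_B = DFA({"a": (0, 1), "b": (1, 0)}, 0, frozenset({0}))
ENDS_A = DFA({"a": (1, 1), "b": (0, 0)}, 0, frozenset({1}))
print(sorted(naive_select("abba", [(0, 3)], EVEN_B, ENDS_A)))  # [0, 2, 3]
```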

SLIDE 73

Divide and conquer dynamic algorithm.

For every match (y, z) ∈ E, find nodes x such that w[y..x] ∈ L and w[x..z] ∈ K, but only do logarithmically many operations each time.

SLIDE 80

Summary

– We evaluate XPath queries with linear time data complexity, improving previous quadratic algorithms. (The constant is exponential in the query, because we use semigroup theory.)
– Works for both unary and binary queries
– Semigroups are a good tool for efficient query evaluation

Future work

– Preliminary results indicate that semigroups can be avoided, and the constant becomes polynomial in the query.
– We want to investigate more of XPath, and other languages

SLIDE 85

Let A be an automaton with state space Q.

Simon Theorem. For fixed A, there is a splitting depth K, such that every word can be split in depth K down to single letters.

Two rules for splitting words.

Rule 1. Split into two parts:

abaabbbababbbabba | bbabbbabbbabbaba

Rule 2. Split into many parts, each with the same transformation:

abaab | bbababb | babba | bba | bbbabb | babba | ba
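The side condition of Rule 2 is easy to check mechanically. The sketch below computes the state transformation induced by each part and tests whether all parts agree. The automaton used in the example (parity of the letter a) is an assumption chosen for illustration, since the slide does not say which automaton its example split refers to; the function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Transformation = Tuple[int, ...]  # t[q] = state reached from state q


def transformation(word: str, delta: Dict[str, Transformation],
                   num_states: int) -> Transformation:
    """State transformation induced by reading the whole word."""
    current: Transformation = tuple(range(num_states))
    for letter in word:
        step = delta[letter]
        current = tuple(step[q] for q in current)
    return current


def is_rule2_split(parts: Sequence[str], delta: Dict[str, Transformation],
                   num_states: int) -> bool:
    """True if every part induces the same transformation (Rule 2 as stated above)."""
    images = {transformation(p, delta, num_states) for p in parts}
    return len(parts) > 0 and len(images) == 1


# Assumed example automaton: two states tracking the parity of a's.
PARITY_A = {"a": (1, 0), "b": (0, 1)}
print(is_rule2_split(["ab", "ba", "bab"], PARITY_A, 2))  # True: each part has one a
print(is_rule2_split(["ab", "aa"], PARITY_A, 2))         # False: one a versus two
```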