
Regular XPath: Algebra, Logic and Automata

Balder ten Cate Szeged, 1 October 2006

Balder ten Cate Regular XPath: Algebra, Logic and Automata (1/19)


The topic of this talk

This talk is about languages for describing binary relations in trees. Working with binary relations means that:

  • instead of sentences, we use formulas with two free variables φ(x, y);
  • our tree walking automata can start and finish their walk anywhere in the tree, not necessarily at the root.

The motivation comes from XML. Specifically, we are interested in the XML path language Regular XPath. We would like to characterize this language in terms of logic and/or automata.


XML documents

For present purposes, XML documents are finite, unranked, sibling-ordered, node-labelled trees. So, an XML document is a tuple T = (N, R↓, R→, V) where

  • N is the set of nodes,
  • R↓ and R→ are the ‘child’ and ‘next sibling’ relations, and
  • V : N → Σ is the labelling function, assigning to each node its label.
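As an illustration only, the following Python sketch builds the tuple (N, R↓, R→, V) from a nested `(label, [children])` description. The input format and the name `build_tree` are our own assumptions, not part of the talk; nodes are numbered in depth-first, left-to-right order.

```python
def build_tree(spec):
    """Turn a nested (label, [children]) spec into (N, R_down, R_right, V).

    Nodes are integers numbered in depth-first, left-to-right order.
    """
    V = {}                        # V : N -> Sigma, the labelling function
    R_down, R_right = set(), set()

    def visit(t):
        label, children = t
        n = len(V)                # next unused node id
        V[n] = label
        prev = None
        for c in children:
            m = visit(c)
            R_down.add((n, m))            # n is the parent of m
            if prev is not None:
                R_right.add((prev, m))    # m is the next sibling of prev
            prev = m
        return n

    visit(spec)
    N = set(V)
    return N, R_down, R_right, V


# A library with two books; the first book has two authors.
doc = ("lib", [("book", [("author", []), ("author", [])]),
               ("book", [("author", [])])])
N, R_down, R_right, V = build_tree(doc)
print(sorted(R_down))   # [(0, 1), (0, 4), (1, 2), (1, 3), (4, 5)]
print(sorted(R_right))  # [(1, 4), (2, 3)]
```

Keeping R↓ and R→ as explicit sets of pairs makes relational operations such as composition and closure easy to write, at the cost of efficiency.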


Syntax of Regular XPath

Regular XPath has two types of expressions:

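  • path expressions α ::= ↑ | ↓ | ← | → | . | α/β | α ∪ β | α∗ | α[φ]
  • node expressions φ ::= p | ¬φ | φ ∧ ψ | ⟨α⟩

Path expressions define binary relations: ↑, ↓, ← and → move to the parent, a child, the previous sibling and the next sibling, respectively, and . stays at the current node. When applied to a given “context node”, a path expression yields a set of nodes. Node expressions define sets of nodes. We use /α as shorthand for ↑∗[¬⟨↑⟩]/α.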
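The shorthand /α can be checked on a tiny example: ↑∗ relates each node to all of its ancestors-or-self, and the filter [¬⟨↑⟩] keeps only the node without a parent, so every node ends up paired with the root. In this sketch the four-node tree is hypothetical and is given directly by its child relation; `compose`, `star` and `domain` are our own helper names.

```python
def compose(r, s):
    """Relational composition: pairs (a, c) with (a, b) in r and (b, c) in s."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

def star(r, nodes):
    """Reflexive transitive closure of r over the given node set."""
    closure = {(n, n) for n in nodes}
    while True:
        new = compose(closure, r) - closure
        if not new:
            return closure
        closure |= new

def domain(r):
    return {a for (a, _) in r}

# Small tree: 0 has children 1 and 2, and 2 has child 3.
N = {0, 1, 2, 3}
R_down = {(0, 1), (0, 2), (2, 3)}
R_up = {(m, n) for (n, m) in R_down}          # the parent relation, for ↑

not_has_parent = N - domain(R_up)             # nodes satisfying ¬⟨↑⟩
to_root = {(a, b) for (a, b) in star(R_up, N) if b in not_has_parent}
print(sorted(to_root))  # [(0, 0), (1, 0), (2, 0), (3, 0)]
```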


Semantics of Regular XPath

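For a document M = (N, R↓, R→, V), with R↑ and R← the converses of R↓ and R→:

$$
\begin{aligned}
[\![a]\!]^M &= R_a \quad \text{for } a \in \{\downarrow, \uparrow, \leftarrow, \rightarrow\} \\
[\![\,.\,]\!]^M &= \{(n, n) \mid n \in N\} \\
[\![\alpha/\beta]\!]^M &= \{(n, k) \mid \exists m\,\big((n, m) \in [\![\alpha]\!]^M \text{ and } (m, k) \in [\![\beta]\!]^M\big)\} \\
[\![\alpha \cup \beta]\!]^M &= [\![\alpha]\!]^M \cup [\![\beta]\!]^M \\
[\![\alpha^*]\!]^M &= \text{the reflexive transitive closure of } [\![\alpha]\!]^M \\
[\![\alpha[\varphi]]\!]^M &= \{(n, m) \in [\![\alpha]\!]^M \mid m \in [\![\varphi]\!]^M\} \\
[\![p]\!]^M &= \{n \in N \mid V(n) = p\} \\
[\![\varphi \wedge \psi]\!]^M &= [\![\varphi]\!]^M \cap [\![\psi]\!]^M \\
[\![\neg\varphi]\!]^M &= N \setminus [\![\varphi]\!]^M \\
[\![\langle\alpha\rangle]\!]^M &= \{n \mid \exists m\,(n, m) \in [\![\alpha]\!]^M\} \quad \text{(the domain of } [\![\alpha]\!]^M\text{)}
\end{aligned}
$$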
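The clauses above translate almost line by line into a naive evaluator. The sketch below rests on our own assumptions: expressions are encoded as tuples (tags such as "seq", "filter" and "some" are hypothetical names), every path expression is computed as an explicit set of node pairs, and no attempt is made at efficiency.

```python
def compose(r, s):
    """[[α/β]]: pairs (n, k) with (n, m) in r and (m, k) in s for some m."""
    return {(n, k) for (n, m) in r for (m2, k) in s if m == m2}

def star(r, nodes):
    """[[α*]]: reflexive transitive closure of r over the node set."""
    closure = {(n, n) for n in nodes}
    while True:
        new = compose(closure, r) - closure
        if not new:
            return closure
        closure |= new

def path(e, T):
    """Evaluate a path expression e on T = (N, R_down, R_right, V)."""
    N, R_down, R_right, V = T
    op = e[0]
    if op == "down":
        return set(R_down)
    if op == "up":
        return {(m, n) for (n, m) in R_down}
    if op == "right":
        return set(R_right)
    if op == "left":
        return {(m, n) for (n, m) in R_right}
    if op == "self":
        return {(n, n) for n in N}
    if op == "seq":
        return compose(path(e[1], T), path(e[2], T))
    if op == "union":
        return path(e[1], T) | path(e[2], T)
    if op == "star":
        return star(path(e[1], T), N)
    if op == "filter":                      # α[φ]
        keep = node(e[2], T)
        return {(n, m) for (n, m) in path(e[1], T) if m in keep}
    raise ValueError(f"unknown path operator: {op!r}")

def node(f, T):
    """Evaluate a node expression f on T, returning a set of nodes."""
    N, R_down, R_right, V = T
    op = f[0]
    if op == "label":                       # p
        return {n for n in N if V[n] == f[1]}
    if op == "not":
        return N - node(f[1], T)
    if op == "and":
        return node(f[1], T) & node(f[2], T)
    if op == "some":                        # ⟨α⟩: the domain of [[α]]
        return {n for (n, _) in path(f[1], T)}
    raise ValueError(f"unknown node operator: {op!r}")

# Root labelled a with children labelled b, c, b (in that order).
T = ({0, 1, 2, 3}, {(0, 1), (0, 2), (0, 3)}, {(1, 2), (2, 3)},
     {0: "a", 1: "b", 2: "c", 3: "b"})
print(sorted(path(("filter", ("down",), ("label", "b")), T)))  # [(0, 1), (0, 3)]
print(sorted(node(("not", ("some", ("right",))), T)))           # [0, 3]
```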


An example

“Go to the next book that has at least two authors.” In Regular XPath: (→[¬twoauthorbook])∗/→[twoauthorbook], where twoauthorbook stands for book ∧ ⟨↓[author]/→⁺[author]⟩ (α⁺ abbreviates α/α∗).
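A quick sanity check of this query, sketched in Python on a small hand-built library document (the document and the helper names are hypothetical). The relation →⁺ is computed as →/→∗, and twoauthorbook as the set of book nodes in the domain of ↓[author]/→⁺[author].

```python
def compose(r, s):
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

def star(r, nodes):
    """Reflexive transitive closure of r."""
    closure = {(n, n) for n in nodes}
    while True:
        new = compose(closure, r) - closure
        if not new:
            return closure
        closure |= new

def filt(r, keep):
    """α[φ]: keep only the pairs whose target satisfies φ."""
    return {(n, m) for (n, m) in r if m in keep}

# lib(book(author), book(author, author), article, book(author, title, author)),
# with nodes numbered in depth-first order.
V = {0: "lib", 1: "book", 2: "author", 3: "book", 4: "author", 5: "author",
     6: "article", 7: "book", 8: "author", 9: "title", 10: "author"}
N = set(V)
R_down = {(0, 1), (0, 3), (0, 6), (0, 7), (1, 2), (3, 4), (3, 5),
          (7, 8), (7, 9), (7, 10)}
R_right = {(1, 3), (3, 6), (6, 7), (4, 5), (8, 9), (9, 10)}

def label(p):
    return {n for n in N if V[n] == p}

right_plus = compose(R_right, star(R_right, N))                  # →⁺
inner = compose(filt(R_down, label("author")), filt(right_plus, label("author")))
twoauthorbook = label("book") & {n for (n, _) in inner}          # book ∧ ⟨…⟩

query = compose(star(filt(R_right, N - twoauthorbook), N),
                filt(R_right, twoauthorbook))
print(sorted(twoauthorbook))  # [3, 7]
print(sorted(query))          # [(1, 3), (3, 7), (6, 7)]
```

Note that from node 3 the query skips the article (node 6) and lands on node 7, the next sibling book with two authors.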


Another example

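The following can be expressed in Regular XPath: “Go to the root if it has an even number of descendants, otherwise retrieve nothing.” To see this, note the following.

  • Let (α while φ) be shorthand for (.[φ]/α)∗.
  • Let suc be shorthand for ↓[¬⟨←⟩] ∪ .[¬⟨↓⟩]/(↑ while ¬⟨→⟩)/→ (the successor in depth-first left-to-right ordering).

Then /(suc/suc)∗[¬⟨↓⟩]/(↑ while ¬⟨→⟩)[¬⟨↑⟩] expresses the intended query: (suc/suc)∗ takes an even number of steps from the root, and the only leaf from which one can climb back to the root without passing a node that has a next sibling is the last node in depth-first order, which lies exactly as many suc steps from the root as the root has descendants.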


One more example

Can we express the following query in Regular XPath? “Go to any node with an even number of descendants.” The previous trick does not work: during the depth-first traversal we might accidentally leave the subtree. Yes, it is possible, because we can count:

  • the (parity of the) number of nodes in the entire tree,
  • the (parity of the) number of ancestors of a node n,
  • the (parity of the) number of nodes before n in the df order,
  • the (parity of the) number of nodes after n in the df order (not counting the descendants).

With a loop operator things would be much easier: ⟦loop(α)⟧ = {w | (w, w) ∈ ⟦α⟧}. Using loop, we could express the query as follows: /↓∗[loop((suc/suc)∗[¬⟨↓⟩]/(↑ while ¬⟨→⟩))].


The main question

What is the expressive power of Regular XPath? I.e., which binary relations are definable by path expressions?


An educated guess

What we know: FO ⊆ Regular XPath ⊆ FO + TC¹ (the first inclusion follows from Marx, PODS’04).

A natural conjecture: Regular XPath ≡ FO + TC¹ (after all, Regular XPath has a transitive closure operator!). We managed to prove a result along these lines only by extending the language with loop.


More about loop

loop provides a weak form of path intersection: loop(α) is equivalent to the node expression ⟨α ∩ .⟩. We denote the extension of Regular XPath with loop by Regular XPath≈. Adding loop does not affect the complexity:

  • query evaluation can still be performed in PTime;
  • query containment can still be solved in ExpTime.


Main result

Let FO + TC¹ₙₚ be the extension of first-order logic with transitive closure over formulas with exactly two free variables. FO + TC¹ₙₚ differs from FO + TC¹: the latter has transitive closure over formulas with two designated free variables plus possibly other free variables.

Main result: Regular XPath≈ ≡ FO + TC¹ₙₚ.

More precisely, Regular XPath≈ path expressions define the same binary relations as FO + TC¹ₙₚ formulas with two free variables.

Corollary: Regular XPath≈ is closed under path intersection and complementation (because FO + TC¹ₙₚ formulas with two free variables are closed under conjunction and negation).


Proof sketch

Difficult direction: FO + TC¹ₙₚ ⊆ Regular XPath≈.

Step 1. Restrict attention to binary branching trees. On such trees, → can be written as .[⟨→⟩]/↑/↓[⟨←⟩], and likewise for ←. This helps reduce the number of cases.
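The identity for → in Step 1 can be checked mechanically: in a tree where every node has at most two children, a node with a next sibling reaches it by going to its parent and then down to the child that has a previous sibling. The Python sketch below tests this on randomly generated binary trees; the generator and function names are our own.

```python
import random

def compose(r, s):
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

def random_binary_tree(size, seed):
    """Random tree in which every node has at most two children."""
    rng = random.Random(seed)
    children = {0: []}
    for m in range(1, size):
        parent = rng.choice([n for n in children if len(children[n]) < 2])
        children[parent].append(m)        # children are ordered left to right
        children[m] = []
    R_down = {(n, m) for n, cs in children.items() for m in cs}
    R_right = {(cs[0], cs[1]) for cs in children.values() if len(cs) == 2}
    return set(children), R_down, R_right

def right_via_parent(N, R_down, R_right):
    """.[⟨→⟩]/↑/↓[⟨←⟩] evaluated as a relation."""
    has_next = {(a, a) for (a, _) in R_right}                        # .[⟨→⟩]
    up = {(m, n) for (n, m) in R_down}                               # ↑
    has_prev = {b for (_, b) in R_right}                             # ⟨←⟩
    to_right_child = {(n, m) for (n, m) in R_down if m in has_prev}  # ↓[⟨←⟩]
    return compose(compose(has_next, up), to_right_child)

N, R_down, R_right = random_binary_tree(15, seed=0)
print(right_via_parent(N, R_down, R_right) == R_right)   # True
```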

Step 2. A normal form: a path expression is separated if it is a union of expressions of the form α/β, with α walking upwards and β downwards in the tree.

Step 3. The translation itself, from formulas φ(x, y) ∈ FO + TC¹ₙₚ to separated Regular XPath≈ path expressions. To enable an inductive translation, we use conjunctive tree queries over path expressions. The crucial lemma shows that separated expressions are closed under ∗.


Separated path expressions are closed under ∗

Separated normal form. A path expression is separated if it is of the form ⋃ᵢ (αᵢ/βᵢ), with each αᵢ walking upwards and each βᵢ downwards in the tree (but allowing arbitrary tests).

Example: how to separate (↑[p]/↑[q]/↓[r])∗? Answer: (↑[p]/(↑[p ∧ q ∧ ⟨↓[r]⟩])∗/↑[q]/↓[r]) ∪ ., where the union with . covers the case of zero iterations.

The general case: use loop to cut all detours short.
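The separation in the example can be tested on random trees. In this sketch (our own helper names; the random tree model is an arbitrary choice), p, q and r are treated as arbitrary node sets, since the identity does not depend on them being labels; both sides are computed as relations and compared.

```python
import random

def compose(r, s):
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}

def star(r, nodes):
    closure = {(n, n) for n in nodes}
    while True:
        new = compose(closure, r) - closure
        if not new:
            return closure
        closure |= new

def filt(rel, keep):
    """α[φ]: keep the pairs whose target lies in the node set `keep`."""
    return {(a, b) for (a, b) in rel if b in keep}

def separation_holds(size, seed):
    rng = random.Random(seed)
    N = set(range(size))
    R_down = {(rng.randrange(m), m) for m in range(1, size)}   # random parent per node
    up = {(m, n) for (n, m) in R_down}
    # p, q, r as random node sets (an arbitrary choice for testing)
    p, q, r = ({n for n in N if rng.random() < 0.6} for _ in range(3))
    # Left-hand side: (↑[p]/↑[q]/↓[r])*
    lhs = star(compose(compose(filt(up, p), filt(up, q)), filt(R_down, r)), N)
    # Right-hand side: ↑[p]/(↑[p ∧ q ∧ ⟨↓[r]⟩])*/↑[q]/↓[r] ∪ .
    has_r_child = {a for (a, _) in filt(R_down, r)}             # ⟨↓[r]⟩
    middle = star(filt(up, p & q & has_r_child), N)
    rhs = compose(compose(compose(filt(up, p), middle), filt(up, q)), filt(R_down, r))
    rhs |= {(n, n) for n in N}                                  # ∪ .
    return lhs == rhs

print(all(separation_holds(12, seed) for seed in range(30)))   # True
```

The right-hand side only ever moves up and then takes a single step down, which is what makes it separated; the tests p ∧ q ∧ ⟨↓[r]⟩ on the intermediate ancestors replace the down-and-up detours of the original loop.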


Summary of results

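  • µRegular XPath ≡ $\mathrm{MSO}[\downarrow,\rightarrow]$
    ⊇ (1)
  • Regular XPath≈ ≡ $\mathrm{FO}+\mathrm{TC}^1_{np}[\downarrow,\rightarrow]$
  • ∗-positive Regular XPath≈ ≡ $\mathrm{FO}+\mathrm{posTC}^1_{np}[\downarrow,\rightarrow]$
  • Conditional XPath ≡ $\mathrm{FO}[\downarrow^*,\rightarrow^*]$ (2)
  • Core XPath ≡ $\mathrm{FO}^2[\downarrow,\rightarrow,\downarrow^*,\rightarrow^*]$ (3)

(1) Bojańczyk et al. (ICALP 2006)
(2) Marx (PODS 2004)
(3) Marx & De Rijke (SIGMOD Record 2005), only for absolute path expressions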

Balder ten Cate Regular XPath: Algebra, Logic and Automata (15/19)

slide-61
SLIDE 61

Questions

Is Regular XPath strictly contained in Regular XPath≈? (That is, does loop really contribute to the expressive power of Regular XPath≈?)

Is $\mathrm{FO}+(\mathrm{pos})\mathrm{TC}^1_{np}$ strictly contained in $\mathrm{FO}+(\mathrm{pos})\mathrm{TC}^1$?

Is $\mathrm{FO}+\mathrm{TC}^1_{np}$ strictly contained in MSO?

Does Regular XPath or Regular XPath≈ admit an automata-theoretic characterization, and, if so, can we use it to answer questions such as the above?

  • Partial result: a characterization for the ∗-positive fragment.

Balder ten Cate Regular XPath: Algebra, Logic and Automata (16/19)

slide-66
SLIDE 66

Pebble tree walking automata

Let’s consider (non-deterministic) pebble tree walking automata with the following types of pebbles:

  • Weak pebbles can only be lifted when the automaton is visiting the node carrying the pebble.
  • Strong pebbles can be lifted from anywhere.
  • Non-inspectable weak pebbles are weak pebbles that cannot be inspected. Note: the automaton might not know for sure whether lifting is a valid move! It may crash.
  • Return pebbles can be lifted from anywhere, with the side effect that the automaton moves to the node carrying the pebble. Furthermore, these pebbles cannot be inspected.

We use these automata to accept paths: a run may start at any node of the tree and finish at any node, so each automaton defines a binary relation on nodes.
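The lifting rules for the four pebble types can be summarised as a small state-update function. The sketch below is a hypothetical Python model, not part of the talk: `Kind`, `Crash`, `can_inspect` and `lift` are illustrative names, and it assumes the usual stack discipline for pebbles (only the most recently dropped pebble may be lifted), which this slide does not state. Non-determinism is not modelled; a `Crash` simply ends the current run.

```python
from enum import Enum


class Kind(Enum):
    WEAK = "weak"
    STRONG = "strong"
    NON_INSPECTABLE_WEAK = "non-inspectable weak"
    RETURN = "return"


class Crash(Exception):
    """The run dies: the automaton attempted an invalid move."""


def can_inspect(kind):
    """Weak and strong pebbles are visible to the automaton; the other two kinds are not."""
    return kind in (Kind.WEAK, Kind.STRONG)


def lift(kind, head, stack):
    """Lift the most recently dropped pebble and return the new head position.

    head  -- the node the automaton is currently visiting
    stack -- nodes where pebbles were dropped, most recent last (assumed LIFO)
    """
    if not stack:
        raise Crash("no pebble to lift")
    pebble_node = stack[-1]
    if kind in (Kind.WEAK, Kind.NON_INSPECTABLE_WEAK) and head != pebble_node:
        # A non-inspectable weak pebble cannot be seen, so the automaton may
        # guess wrongly that it is standing on it; the run then crashes.
        raise Crash("weak pebbles can only be lifted at the node carrying them")
    stack.pop()
    if kind is Kind.RETURN:
        return pebble_node  # side effect: the head jumps to the pebble's node
    return head


stack = [3]
print(lift(Kind.RETURN, 7, stack), stack)  # prints: 3 []
```

Lifting a strong pebble leaves the head where it is, while lifting a return pebble from the same position moves the head back to the pebble's node.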

Balder ten Cate Regular XPath: Algebra, Logic and Automata (17/19)

slide-72
SLIDE 72

Some partial results

Thm: ∗-positive Regular XPath≈ (hence also $\mathrm{FO}+\mathrm{posTC}^1_{np}$) can define the same binary relations as tree walking automata (twa) with non-inspectable weak pebbles. (This extends a result of Goris and Marx ’05.)

Thm (Engelfriet and Hoogeboom ’05): ∗-positive $\mathrm{FO}+\mathrm{TC}^1$ can define the same binary relations as twa with strong pebbles.

Thm (Bojańczyk et al. ’06): Weak pebbles suffice.

Call a Regular XPath expression positive if the only negations it uses are atomic ones, of the form ¬↑, ¬↓, ¬← and ¬→ (i.e., tests for having no parent, no children, no previous sibling, or no next sibling).

Thm: Positive Regular XPath can define the same binary relations as twa with return pebbles.
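The syntactic condition for positivity is easy to check mechanically. The following Python sketch uses a hypothetical abstract syntax for Regular XPath (the class names are illustrative, not from the talk). A node test such as the [→] in .[→] is represented by `Exists`, so the only negations accepted are `Not(Exists(Step(axis)))`; ∗ may occur anywhere.

```python
from dataclasses import dataclass


# Path expressions (hypothetical abstract syntax)
@dataclass(frozen=True)
class Step:      # a single axis step: "↑", "↓", "←" or "→"
    axis: str

@dataclass(frozen=True)
class Dot:       # the identity path "."
    pass

@dataclass(frozen=True)
class Seq:       # α/β
    left: object
    right: object

@dataclass(frozen=True)
class Alt:       # α ∪ β
    left: object
    right: object

@dataclass(frozen=True)
class Star:      # α∗
    body: object

@dataclass(frozen=True)
class Filter:    # α[φ]
    path: object
    test: object


# Node expressions
@dataclass(frozen=True)
class Label:     # atomic test p
    name: str

@dataclass(frozen=True)
class Exists:    # "there is an α-path from here", written [α] in a filter
    path: object

@dataclass(frozen=True)
class Not:       # ¬φ
    arg: object

@dataclass(frozen=True)
class And:       # φ ∧ ψ
    left: object
    right: object


def is_positive(e):
    """True iff every negation in e has the atomic form ¬↑, ¬↓, ¬← or ¬→."""
    if isinstance(e, Not):
        return isinstance(e.arg, Exists) and isinstance(e.arg.path, Step)
    if isinstance(e, (Step, Dot, Label)):
        return True
    if isinstance(e, (Seq, Alt, And)):
        return is_positive(e.left) and is_positive(e.right)
    if isinstance(e, Star):
        return is_positive(e.body)
    if isinstance(e, Filter):
        return is_positive(e.path) and is_positive(e.test)
    if isinstance(e, Exists):
        return is_positive(e.path)
    raise TypeError(f"unknown expression: {e!r}")


# ↓[¬↓] (children that are leaves) is positive; ↓[¬p] is not.
print(is_positive(Filter(Step("↓"), Not(Exists(Step("↓"))))))  # True
print(is_positive(Filter(Step("↓"), Not(Label("p")))))          # False
```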

Balder ten Cate Regular XPath: Algebra, Logic and Automata (18/19)

slide-78
SLIDE 78

That’s all.

Balder ten Cate Regular XPath: Algebra, Logic and Automata (19/19)