Exercise 11: Graph Databases and Path Queries

Database Theory 2020-07-06 Maximilian Marx, David Carral

Exercise 1

  • Exercise. It was explained in the lecture that RDF and Property Graph can encode the same graph structures. How could we encode arbitrary hypergraphs (relational databases) in RDF? RDF can be considered as a synonym for “labelled directed graph” here – the technical details of the RDF standard are not important for this exercise.

Solution.
◮ Let $G$ be some labelled hypergraph.
◮ We construct $G_{\mathrm{RDF}}$ by reifying hyperedges: for every $p$-labelled hyperedge $\varphi = p(t_1, t_2, \dots, t_\ell)$ in $G$, we add to $G_{\mathrm{RDF}}$
  ◮ labels $p_1, p_2, \dots, p_\ell$;
  ◮ a fresh vertex $v_\varphi$; and
  ◮ edges $p_1(v_\varphi, t_1), p_2(v_\varphi, t_2), \dots, p_\ell(v_\varphi, t_\ell)$.
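The reification step can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: hyperedges are given as `(label, terms)` pairs, the fresh vertex is named `_:v0`, `_:v1`, ... and the position labels are written `p_1`, `p_2`, ...; the example fact is made up.

```python
def reify(hyperedges):
    """Encode each hyperedge p(t1, ..., tl) as l binary edges leaving a fresh vertex."""
    triples = set()
    for index, (label, terms) in enumerate(hyperedges):
        vertex = f"_:v{index}"  # hypothetical name for the fresh vertex v_phi
        for position, term in enumerate(terms, start=1):
            # edge p_position(v_phi, t_position), stored as (source, label, target)
            triples.add((vertex, f"{label}_{position}", term))
    return triples


# A single ternary fact, invented for illustration.
example_edges = [("teaches", ("marx", "dbtheory", "2020"))]
example_graph = reify(example_edges)
```

Each tuple of arity ℓ becomes ℓ triples that share the same source vertex, so the original tuple can be recovered by grouping triples on that vertex.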

Exercise 2.

  • Exercise. Can the following Datalog programs be encoded using a C2RPQ? In each case, give a suitable C2RPQ or explain why there is none.

  • 1. The “Same generation” Datalog program from the lecture:

$S(x, x) \leftarrow \mathit{human}(x)$
$S(x, y) \leftarrow \mathit{parent}(x, w) \wedge S(v, w) \wedge \mathit{parent}(y, v)$

Solution. 1.

◮ $S$ matches paths of the form $\mathit{parent}^n \circ \mathit{human} \circ (\mathit{parent}^-)^n$ with $n \geq 0$: $n$ $\mathit{parent}$ edges forwards, a node satisfying $\mathit{human}$, then $n$ $\mathit{parent}$ edges backwards.
◮ This is not a regular language, and hence cannot be expressed as a 2RPQ.
◮ Since the length of a matched path is not accessible in a C2RPQ, this cannot be expressed as a C2RPQ either.

  • 2. Ancestors born in the same city:

$\mathit{AncCity}(x, y, x', y') \leftarrow \mathit{parent}(x, x') \wedge \mathit{bornIn}(x, y) \wedge \mathit{bornIn}(x', y')$
$\mathit{AncCity}(x, y, x'', y'') \leftarrow \mathit{AncCity}(x, y, x', y') \wedge \mathit{AncCity}(x', y', x'', y'')$
$\mathit{Query}(x, x', y) \leftarrow \mathit{AncCity}(x, y, x', y)$

Solution. 2. The following C2RPQ expresses Query: $(\mathit{parent} \circ \mathit{parent}^*)(x, x') \wedge \mathit{bornIn}(x, y) \wedge \mathit{bornIn}(x', y)$. (The Datalog rules additionally require every intermediate ancestor to have some $\mathit{bornIn}$ edge, so the two agree on databases in which every person has a birthplace.)
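A minimal sketch of how this C2RPQ could be evaluated over edge sets, assuming the graph is given as two Python sets of pairs (`parent` and `born_in`; the names and the sample data are illustrative and not from the slides). The path atom is evaluated as a transitive closure, and the two `bornIn` atoms are joined on the shared variable `y`.

```python
def transitive_closure(edges):
    """All pairs connected by one or more edges, i.e. (e ∘ e*)."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in edges:
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def ancestors_same_city(parent, born_in):
    """Answers (x, x', y) of (parent ∘ parent*)(x, x') ∧ bornIn(x, y) ∧ bornIn(x', y)."""
    reach = transitive_closure(parent)
    return {
        (x, x2, y)
        for (x, x2) in reach
        for (person, y) in born_in
        if person == x and (x2, y) in born_in
    }


# Invented sample data: ann's grandparent cat was born in the same city as ann.
sample_parent = {("ann", "bob"), ("bob", "cat")}
sample_born_in = {("ann", "dresden"), ("bob", "leipzig"), ("cat", "dresden")}
sample_answers = ancestors_same_city(sample_parent, sample_born_in)
```

Only the endpoints of the `parent` path are joined with `bornIn`; the intermediate node `bob` is never inspected, which is exactly what makes the query expressible as a C2RPQ.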

  • 3. Ancestors of Dresden-based family lines:

$\mathit{DDAnc}(x, y) \leftarrow \mathit{parent}(x, y) \wedge \mathit{bornIn}(x, \mathit{dresden}) \wedge \mathit{bornIn}(y, \mathit{dresden})$
$\mathit{DDAnc}(x, z) \leftarrow \mathit{DDAnc}(x, y) \wedge \mathit{parent}(y, z) \wedge \mathit{bornIn}(z, \mathit{dresden})$

Solution. 3.

◮ $\mathit{DDAnc}$ matches $\mathit{parent}$ paths on which every node has a $\mathit{bornIn}$-connection to $\mathit{dresden}$.
◮ This is not expressible as a 2RPQ: a path that checks the birthplace by stepping to $\mathit{dresden}$ and back need not return to the node it started from, since $(\mathit{bornIn} \circ \mathit{bornIn}^-)(x, y)$ will generally be true for $x \neq y$.
◮ Since the intermediate nodes on a matched path are not accessible in a C2RPQ, this is also not expressible as a C2RPQ.
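To see why the intermediate nodes matter, the two DDAnc rules can be evaluated by a naive fixpoint iteration. The sketch below assumes the database is given as sets of pairs; the function name, the `city` parameter and the sample data are illustrative choices, not part of the exercise.

```python
def ddanc(parent, born_in, city="dresden"):
    """Least fixpoint of the two DDAnc rules over the given edge sets."""
    local = {person for (person, place) in born_in if place == city}
    # Base rule: a parent edge whose two endpoints were both born in the city.
    result = {(x, y) for (x, y) in parent if x in local and y in local}
    while True:
        # Recursive rule: extend a derived pair by one parent edge to a local node.
        new = {
            (x, z)
            for (x, y) in result
            for (y2, z) in parent
            if y == y2 and z in local
        }
        if new <= result:
            return result
        result |= new


# Invented chain a -> b -> c -> d in which only c was born elsewhere.
chain_parent = {("a", "b"), ("b", "c"), ("c", "d")}
chain_born_in = {("a", "dresden"), ("b", "dresden"), ("c", "leipzig"), ("d", "dresden")}
chain_result = ddanc(chain_parent, chain_born_in)
```

Here `a` and `d` are both born in Dresden and connected by a `parent` path, yet `("a", "d")` is not derived because the intermediate node `c` fails the test. A query that only sees the endpoints of the path cannot make that distinction.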

Exercise 3.

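  • Exercise. Consider the method for checking RPQ containment as sketched on slide “Containment for RPQs” in the lecture. Explain the procedure and the resulting complexity bounds in your own words. How could one construct the required automaton “on the fly”?

Solution.
◮ Let $E$, $E'$ be regular expressions.
◮ Construct NFAs $N$ and $N'$ deciding $L(E)$ and $L(E')$, by induction on the structure of the expression:
  ◮ If $E = \ell \in L$ is a label, then $N$ is the following NFA: [diagram: start state $i$, final state $f$, one transition $i \to f$ labelled $\ell$]
  ◮ If $E = (E_1 \circ E_2)$, and $N_1$ and $N_2$ are NFAs deciding $E_1$ and $E_2$, then $N$ is the following NFA: [diagram: start state $i$, then $i_{N_1} \dots f_{N_1}$, then $i_{N_2} \dots f_{N_2}$, then $f$, linked by three $\varepsilon$-transitions, presumably $i \to i_{N_1}$, $f_{N_1} \to i_{N_2}$ and $f_{N_2} \to f$]
  ◮ If $E = (E_1 + E_2)$, and $N_1$ and $N_2$ are NFAs deciding $E_1$ and $E_2$, then $N$ is the following NFA: [diagram: start state $i$, the two branches $i_{N_1} \dots f_{N_1}$ and $i_{N_2} \dots f_{N_2}$, final state $f$, with four $\varepsilon$-transitions, presumably $i \to i_{N_1}$, $i \to i_{N_2}$, $f_{N_1} \to f$ and $f_{N_2} \to f$]
  ◮ If $E = E_1^*$ and $N_1$ is an NFA deciding $E_1$, then $N$ is the following NFA: [diagram: start state $i$, $i_{N_1} \dots f_{N_1}$, final state $f$, with four $\varepsilon$-transitions; in the usual construction these are $i \to i_{N_1}$, $f_{N_1} \to f$, $i \to f$ and $f_{N_1} \to i_{N_1}$]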
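The inductive construction can be sketched as a small recursive function. The encoding below is an assumption made for illustration: regular expressions are nested tuples such as `("concat", e1, e2)`, states are integers, and an NFA is a triple `(initial, final, transitions)`. The ε-transitions follow the usual Thompson-style wiring, which may differ in detail from the diagrams on the slides.

```python
from itertools import count

EPS = None  # marker for an epsilon transition


def build_nfa(expr):
    """Return (initial, final, transitions) for a regex given as nested tuples."""
    ids = count()
    transitions = []  # entries are (source, symbol, target)

    def build(e):
        i, f = next(ids), next(ids)
        kind = e[0]
        if kind == "label":
            transitions.append((i, e[1], f))
        elif kind == "concat":
            i1, f1 = build(e[1])
            i2, f2 = build(e[2])
            transitions.extend([(i, EPS, i1), (f1, EPS, i2), (f2, EPS, f)])
        elif kind == "union":
            i1, f1 = build(e[1])
            i2, f2 = build(e[2])
            transitions.extend([(i, EPS, i1), (i, EPS, i2), (f1, EPS, f), (f2, EPS, f)])
        elif kind == "star":
            i1, f1 = build(e[1])
            transitions.extend([(i, EPS, i1), (f1, EPS, f), (i, EPS, f), (f1, EPS, i1)])
        else:
            raise ValueError(f"unknown expression kind: {kind}")
        return i, f

    initial, final = build(expr)
    return initial, final, transitions


def accepts(nfa, word):
    """Simulate the NFA on a sequence of labels, following epsilon moves."""
    initial, final, transitions = nfa

    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            s = stack.pop()
            for (src, sym, dst) in transitions:
                if src == s and sym is EPS and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({initial})
    for symbol in word:
        moved = {dst for (src, sym, dst) in transitions if src in current and sym == symbol}
        current = closure(moved)
    return final in current


# a* ∘ b as nested tuples
a_star_b = build_nfa(("concat", ("star", ("label", "a")), ("label", "b")))
```

Every case adds two states and at most four transitions, so the NFA is linear in the size of the expression.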

Solution (continued).
◮ Given the NFAs $N$ and $N'$ deciding $L(E)$ and $L(E')$, use the powerset construction to obtain equivalent (but exponentially large) DFAs $D$ and $D'$.
◮ Let $\overline{D'}$ be the DFA obtained from $D'$ by making all accepting states reject, and vice versa. Then $w \in L(\overline{D'})$ iff $w \notin L(D')$.
◮ Construct the product automaton $\hat{D}$ of $D$ and $\overline{D'}$, which is polynomially large in the sizes of $D$ and $\overline{D'}$; then $\hat{D}$ decides $L(E) \cap \overline{L(E')}$.
◮ $E \sqsubseteq E'$ iff $L(\hat{D})$ is empty: if there is $w \in L(\hat{D})$, then $w \in L(E)$ but $w \notin L(E')$.
◮ $L(\hat{D})$ is empty iff no accepting state is reachable from the initial state.
◮ Reachability on directed graphs can be checked in nondeterministic logarithmic space.
◮ Since the state graph of $\hat{D}$ is exponentially large, we can decide emptiness in nondeterministic polynomial space.
◮ On the fly: a state of $\hat{D}$ is a pair of sets of NFA states, which fits in polynomial space, and its successors can be computed directly from the transitions of $N$ and $N'$, so $\hat{D}$ never has to be stored as a whole.
◮ Because of Savitch’s Theorem (NPSpace = PSpace), we can thus decide containment in PSpace.
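The on-the-fly idea can be sketched as follows, under some assumptions that are not from the slides: an NFA is a triple `(initial, final, transitions)` with `None` marking ε-transitions, and the search is a plain breadth-first search. Note that the `visited` set makes this sketch use exponential space in the worst case; the PSpace bound relies on guessing the path nondeterministically instead. What the code does illustrate is that product states are generated only when needed.

```python
from collections import deque


def eps_closure(states, transitions):
    """All states reachable from `states` via epsilon (None-labelled) transitions."""
    seen, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for (src, sym, dst) in transitions:
            if src == s and sym is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return frozenset(seen)


def step(states, symbol, transitions):
    """Successor of a powerset state on `symbol`, computed from the NFA alone."""
    moved = {dst for (src, sym, dst) in transitions if src in states and sym == symbol}
    return eps_closure(moved, transitions)


def contained(nfa1, nfa2, alphabet):
    """True iff L(nfa1) is a subset of L(nfa2)."""
    (i1, f1, t1), (i2, f2, t2) = nfa1, nfa2
    start = (eps_closure({i1}, t1), eps_closure({i2}, t2))
    queue, visited = deque([start]), {start}
    while queue:
        s1, s2 = queue.popleft()
        # Accepting in the product: accepted by nfa1 but rejected by nfa2.
        if f1 in s1 and f2 not in s2:
            return False
        for symbol in alphabet:
            nxt = (step(s1, symbol, t1), step(s2, symbol, t2))
            # An empty first component can never accept, so it is pruned.
            if nxt[0] and nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return True


# Hand-written NFAs: state 0 is initial, state 1 is final.
nfa_a_star = (0, 1, [(0, "a", 0), (0, None, 1)])
nfa_ab_star = (0, 1, [(0, "a", 0), (0, "b", 0), (0, None, 1)])
```

A product state is just a pair of frozensets, and `step` derives its successors directly from the two NFAs, so neither DFA nor the product automaton is ever materialised up front.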

Exercise 4.

  • Exercise. Give an example of a binary C2RPQ that cannot be expressed as a 2RPQ.

By a linear binary C2RPQ we mean a C2RPQ of the form $\exists x_{k_1}, \dots, x_{k_m}.\ R_1(x_1, x_2) \wedge R_2(x_2, x_3) \wedge \dots \wedge R_{n-1}(x_{n-1}, x_n)$, where each $R_i(x_i, x_{i+1})$ is an atom or a 2RPQ, and the $x_{k_j}$ are among the variables that occur in the query. Can every linear binary C2RPQ be expressed by a 2RPQ? Explain your answer.

Solution.
◮ Consider, e.g., $\exists x.\ a(x, x) \wedge b(x, x)$, which cannot be expressed as a 2RPQ.
◮ Note that this is not a linear C2RPQ.
◮ Indeed, most linear binary C2RPQs can be expressed by a 2RPQ:
  ◮ Every atom $p(x_i, x_{i+1})$ in the query can be viewed as an RPQ with label $p$.
  ◮ Since every 2RPQ in the query starts at the endpoint of the previous 2RPQ, the conjunctions can be replaced by composition.
  ◮ Thus, $\exists x_2, \dots, x_{n-1}.\ (R_1 \circ R_2 \circ \dots \circ R_{n-1})(x_1, x_n)$ is an equivalent 2RPQ.
◮ But in a 2RPQ, we lose access to $x_2, \dots, x_{n-1}$, so the translation only works when none of these intermediate variables is an answer variable.
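The step from conjunction to composition can be checked on a small example. The sketch below assumes each $R_i$ has already been evaluated to a set of pairs; both functions and the sample relations are illustrative only. One function joins the atoms while keeping every intermediate variable and projects at the end, the other composes the relations directly.

```python
def compose(r, s):
    """Relational composition: pairs (x, z) with r(x, y) and s(y, z) for some y."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}


def eval_by_composition(relations):
    """Evaluate (R1 ∘ R2 ∘ ... ∘ Rn-1)(x1, xn)."""
    result = relations[0]
    for rel in relations[1:]:
        result = compose(result, rel)
    return result


def eval_by_join(relations):
    """Evaluate the conjunction, keeping all x_i, then project onto (x1, xn)."""
    chains = {(x, y) for (x, y) in relations[0]}
    for rel in relations[1:]:
        chains = {chain + (z,) for chain in chains for (y, z) in rel if chain[-1] == y}
    return {(chain[0], chain[-1]) for chain in chains}


# Invented relations over integer nodes.
r1 = {(1, 2), (1, 3)}
r2 = {(2, 4), (3, 5)}
r3 = {(4, 6)}
```

`eval_by_join` still has the whole chain available before projecting, which is what a C2RPQ with intermediate answer variables would need; `eval_by_composition` only ever sees the endpoints.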

Exercise 5.

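  • Exercise. Give an example of a Datalog query that contains both of the following (and maybe also other) rules

$\mathit{Query}(x, z) \leftarrow p_a(x, y) \wedge p_b(y, z)$
$\mathit{Query}(x, z) \leftarrow p_a(x, x') \wedge \mathit{Query}(x', z') \wedge p_b(z', z)$

and that can be expressed as a C2RPQ.

Solution.
◮ On their own, the two rules match paths of the form $a^n b^n$ with $n \geq 1$, which is not a regular language.
◮ We add rules so that all non-empty paths over $a$ and $b$ match, in particular all paths of the form $a^n b^m$; this is a regular language:

$p_{(a+b)^+}(x, y) \leftarrow p_a(x, y)$
$p_{(a+b)^+}(x, y) \leftarrow p_b(x, y)$
$p_{(a+b)^+}(x, y) \leftarrow p_{(a+b)^+}(x, z) \wedge p_a(z, y)$
$p_{(a+b)^+}(x, y) \leftarrow p_{(a+b)^+}(x, z) \wedge p_b(z, y)$
$\mathit{Query}(x, y) \leftarrow p_{(a+b)^+}(x, y)$

◮ The resulting program is equivalent to the C2RPQ $((a + b) \circ (a + b)^*)(x, y)$: the added rules derive only non-empty paths, so the empty path admitted by $(a + b)^*$ is excluded, and the two given rules derive nothing beyond what the added rules already cover.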
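The effect of the added rules can be checked with a naive fixpoint evaluation. The sketch below assumes `pa` and `pb` are sets of pairs; the function names and sample edges are illustrative. The first function evaluates only the two given rules, the second evaluates the added reachability rules.

```python
def original_query(pa, pb):
    """Least fixpoint of the two given rules (paths of the form a^n b^n, n >= 1)."""
    query = set()
    while True:
        new = {(x, z) for (x, y) in pa for (y2, z) in pb if y == y2}
        new |= {
            (x, z)
            for (x, x1) in pa
            for (x2, z1) in query
            if x1 == x2
            for (z2, z) in pb
            if z1 == z2
        }
        if new <= query:
            return query
        query |= new


def extended_query(pa, pb):
    """Least fixpoint of the added rules: all non-empty paths over a and b."""
    edges = pa | pb
    reach = set(edges)
    while True:
        new = {(x, y) for (x, z) in reach for (z2, y) in edges if z == z2}
        if new <= reach:
            return reach
        reach |= new


# Invented path 1 -a-> 2 -a-> 3 -b-> 4 -b-> 5.
sample_pa = {(1, 2), (2, 3)}
sample_pb = {(3, 4), (4, 5)}
```

On the sample path the original rules derive only the balanced pairs `(2, 4)` and `(1, 5)`, while the extended program also derives pairs such as `(1, 3)`. Every answer of the original rules is also an answer of the extended rules, so in the combined program the two given rules are redundant.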
