On the optimization of recursive relational queries. DIG Seminar - - PowerPoint PPT Presentation

on the optimization of recursive relational queries
SMART_READER_LITE
LIVE PREVIEW

On the optimization of recursive relational queries. DIG Seminar - - PowerPoint PPT Presentation

On the optimization of recursive relational queries. DIG Seminar Louis Jachiet Centre de Recherche en Informatique, Signal et Automatique de Lille 1 The relational algebra Generalities The relational algebra [Cod70] a set of base


slide-1
SLIDE 1

On the optimization of recursive relational queries.

DIG Seminar

Louis Jachiet

Centre de Recherche en Informatique, Signal et Automatique de Lille

1

slide-2
SLIDE 2

The relational algebra

slide-3
SLIDE 3

Generalities

The relational algebra [Cod70]

  • a set of base relations

the tables in SQL

  • combined through operators

union, projection, filter, join, etc.

  • operates on named tuples

2

slide-4
SLIDE 4

Syntax

ϕ ::= term | X relation variable | |c → v| constant | ∅ empty set | ϕ1 ∪ ϕ2 union | ϕ1 ⊲ ⊳ ϕ2 join | ϕ1 ⊲ ϕ2 antijoin | σfilter (ϕ) filter | ρb

a (ϕ)

rename | πP (ϕ) projection

Figure 1: Syntax of the relational algebra

3

slide-5
SLIDE 5

Examples

T from to Lille Paris Lille Saclay Paris Grenoble Paris Saclay Saclay Grenoble πto (T) to Paris Saclay Grenoble

4

slide-6
SLIDE 6

Examples

T from to Lille Paris Lille Saclay Paris Grenoble Paris Saclay Saclay Grenoble ρstep

to

(T) from step Lille Paris Lille Saclay Paris Grenoble Paris Saclay Saclay Grenoble

4

slide-7
SLIDE 7

Examples

T from to Lille Paris Lille Saclay Paris Grenoble Paris Saclay Saclay Grenoble ρstep

to

(T) ⊲ ⊳ ρstep

from (T)

from step to Lille Paris Saclay Lille Paris Grenoble Lille Saclay Grenoble Paris Saclay Grenoble

4

slide-8
SLIDE 8

Examples

T from to Lille Paris Lille Saclay Paris Grenoble Paris Saclay Saclay Grenoble σfrom=Lille (T) from to Lille Paris Lille Saclay

4

slide-9
SLIDE 9

Examples

T from to Lille Paris Lille Saclay Paris Grenoble Paris Saclay Saclay Grenoble σfrom=Paris (T) ∪ σfrom=Lille (T) from to Lille Paris Lille Saclay Paris Saclay Paris Grenoble

4

slide-10
SLIDE 10

Examples

T from to Lille Paris Lille Saclay Paris Grenoble Paris Saclay Saclay Grenoble πto (T) ⊲ σfrom=Lille (T) to Grenoble

4

slide-11
SLIDE 11

Recursive relational algebra

slide-12
SLIDE 12

Syntax

ϕ ::= term | X relation variable | |c → v| constant | ∅ empty set | ϕ1 ∪ ϕ2 union | ϕ1 ⊲ ⊳ ϕ2 join | ϕ1 ⊲ ϕ2 antijoin | σfilter (ϕ) filtering | ρb

a (ϕ)

rename | πc1,...,cn (ϕ) projection

Figure 2: Syntax of our relational algebra

5

slide-13
SLIDE 13

Syntax

ϕ ::= term | X relation variable | |c → v| constant | ∅ empty set | ϕ1 ∪ ϕ2 union | ϕ1 ⊲ ⊳ ϕ2 join | ϕ1 ⊲ ϕ2 antijoin | σfilter (ϕ) filtering | ρb

a (ϕ)

rename | ˜ πc (ϕ) anti-projection

Figure 2: Syntax of our relational algebra

5

slide-14
SLIDE 14

Syntax

ϕ ::= term | X relation variable | |c → v| constant | ∅ empty set | ϕ1 ∪ ϕ2 union | ϕ1 ⊲ ⊳ ϕ2 join | ϕ1 ⊲ ϕ2 antijoin | σfilter (ϕ) filtering | ρb

a (ϕ)

rename | ˜ πc (ϕ) anti-projection | βb

a (ϕ)

duplication

Figure 2: Syntax of our relational algebra

5

slide-15
SLIDE 15

Syntax

ϕ ::= term | X relation variable | |c → v| constant | ∅ empty set | ϕ1 ∪ ϕ2 union | ϕ1 ⊲ ⊳ ϕ2 join | ϕ1 ⊲ ϕ2 antijoin | σfilter (ϕ) filtering | ρb

a (ϕ)

rename | ˜ πc (ϕ) anti-projection | βb

a (ϕ)

duplication | µ(X = ϕ) fixpoint

Figure 2: Syntax of our relational algebra

5

slide-16
SLIDE 16

Differences

Anti-projection Remove a column ˜ πd1 (. . . ˜ πdk (ϕ) . . . ) = πc1...cn (ϕ)

6

slide-17
SLIDE 17

Differences

Anti-projection Remove a column ˜ πd1 (. . . ˜ πdk (ϕ) . . . ) = πc1...cn (ϕ) Duplication Copy a column βb

a (ϕ) = σa=b

  • ϕ ⊲

⊳ ρb

a (ϕ)

  • 6
slide-18
SLIDE 18

Differences

Anti-projection Remove a column ˜ πd1 (. . . ˜ πdk (ϕ) . . . ) = πc1...cn (ϕ) Duplication Copy a column βb

a (ϕ) = σa=b

  • ϕ ⊲

⊳ ρb

a (ϕ)

  • Fixpoints

Compute the least fixpoint of a function S → ϕ[X/S] µ(X = ϕ)V = limn→∞Un U0 = ∅ Un+1 = Un ∪ ϕV [X/Un]

6

slide-19
SLIDE 19

Examples

T from to Lille Paris Lille Saclay Paris Grenoble Paris Saclay Saclay Grenoble ˜ πfrom (T) to Paris Saclay Grenoble

7

slide-20
SLIDE 20

Examples

T from to Lille Paris Lille Saclay Paris Grenoble Paris Saclay Saclay Grenoble βto

from (˜

πto (T)) from to Lille Lille Saclay Saclay Paris Paris

7

slide-21
SLIDE 21

Examples

T + = µ(X = T ∪ T/X) T from to Lille Paris Paris Saclay Saclay Lyon Lyon Grenoble from to

8

slide-22
SLIDE 22

Examples

T + = µ(X = T ∪ ˜ πs (ρs

to (X) ⊲

⊳ ρs

from (T)))

T from to Lille Paris Paris Saclay Saclay Lyon Lyon Grenoble from to Lille Paris Paris Saclay Saclay Lyon Lyon Grenoble

8

slide-23
SLIDE 23

Examples

T + = µ(X = T ∪ ˜ πs (ρs

to (X) ⊲

⊳ ρs

from (T)))

T from to Lille Paris Paris Saclay Saclay Lyon Lyon Grenoble from to Lille Paris Paris Saclay Saclay Lyon Lyon Grenoble Lille Saclay Paris Lyon Saclay Grenoble

8

slide-24
SLIDE 24

Examples

T + = µ(X = T ∪ ˜ πs (ρs

to (X) ⊲

⊳ ρs

from (T)))

T from to Lille Paris Paris Saclay Saclay Lyon Lyon Grenoble from to Lille Paris Paris Saclay Saclay Lyon Lyon Grenoble Lille Saclay Paris Lyon Saclay Grenoble Lille Lyon Paris Grenoble

8

slide-25
SLIDE 25

Examples

T + = µ(X = T ∪ ˜ πs (ρs

to (X) ⊲

⊳ ρs

from (T)))

T from to Lille Paris Paris Saclay Saclay Lyon Lyon Grenoble from to Lille Paris Paris Saclay Saclay Lyon Lyon Grenoble Lille Saclay Paris Lyon Saclay Grenoble Lille Lyon Paris Grenoble Lille Grenoble

8

slide-26
SLIDE 26

Limitations

Limitations on fixpoints

  • recursive variables must appear positively

No µ(X = R ⊲ X)

9

slide-27
SLIDE 27

Limitations

Limitations on fixpoints

  • recursive variables must appear positively

No µ(X = R ⊲ X)

  • no join between recursive terms

No µ(X = X ⋊ ⋉ X)

9

slide-28
SLIDE 28

Limitations

Limitations on fixpoints

  • recursive variables must appear positively

No µ(X = R ⊲ X)

  • no join between recursive terms

No µ(X = X ⋊ ⋉ X)

  • no mutually recursive fixpoints

No µ(X = µ(Y = X ∪ Y ))

9

slide-29
SLIDE 29

Limitations

Limitations on fixpoints

  • recursive variables must appear positively

No µ(X = R ⊲ X)

  • no join between recursive terms

No µ(X = X ⋊ ⋉ X)

  • no mutually recursive fixpoints

No µ(X = µ(Y = X ∪ Y )) → corresponds to linear datalog! → superset of WITH RECURSIVE in SQL!

9

slide-30
SLIDE 30

Performance of recursive queries

slide-31
SLIDE 31

An example

:Lille :TGV/:Bus∗ ?o

10

slide-32
SLIDE 32

An example

:Lille :TGV/:Bus∗ ?o

L

10

slide-33
SLIDE 33

An example

:Lille :TGV/:Bus∗ ?o

L

10

slide-34
SLIDE 34

An example

:Lille :TGV/:Bus∗ ?o

L

10

slide-35
SLIDE 35

An example

:Lille :TGV/:Bus∗ ?o

L

10

slide-36
SLIDE 36

An example

:Lille :TGV/:Bus∗ ?o

L

10

slide-37
SLIDE 37

An example

:Lille :TGV/:Bus∗ ?o

L

10

slide-38
SLIDE 38

An example

:Lille :TGV/:Bus∗ ?o

L

10

slide-39
SLIDE 39

An example

:Lille :TGV/:Bus∗ ?o

L

10

slide-40
SLIDE 40

An example

:Lille :TGV/:Bus∗ ?o

L

10

slide-41
SLIDE 41

An example

:Lille :TGV/:Bus∗ ?o

L

10

slide-42
SLIDE 42

An example

:Lille :TGV/:Bus∗ ?o

L

10

slide-43
SLIDE 43

An example

:Lille :TGV/:Bus∗ ?o

L

10

slide-44
SLIDE 44

An example

:Lille :TGV/:Bus∗ ?o

L

10

slide-45
SLIDE 45

An example

:Lille :TGV/:Bus∗ ?o

L

10

slide-46
SLIDE 46

An example

:Lille :TGV/:Bus∗ ?o

L

10

slide-47
SLIDE 47

An example

:L :TGV/:Bus∗ ?o

11

slide-48
SLIDE 48

An example

:L :TGV/:Bus∗ ?o ˜ π?s (σ?s=:L (:TGV/µ(X = βo

s (AllNodes) ∪ X/:Bus))) 11

slide-49
SLIDE 49

An example

:L :TGV/:Bus∗ ?o ˜ π?s (σ?s=:L (:TGV/µ(X = βo

s (AllNodes) ∪ X/:Bus)))

˜ π?s (σ?s=:L (:TGV/µ(X = βo

s (AllNodes) ∪ :Bus/X))) 11

slide-50
SLIDE 50

Benchmarking

102 103 104 105 106 10−2 10−1 100 101 number of nodes n Time (s) Postgres SQLite Virtuoso ARQ1 ARQ2 DLV1 DLV2 Ramsdell2 Ramsdell1 Vlog1 Vlog2

12

slide-51
SLIDE 51

Benchmarking

Logicblox

  • Materialization of all intermediate predicate
  • ... except for “on demand” predicate
  • therefore manual optimization

13

slide-52
SLIDE 52

Rewrite rules

Rewrite rules for fixpoints

  • pushing filters?

σfilter (µ(X = ϕ)) ? = µ(X = σfilter (ϕ))

14

slide-53
SLIDE 53

Rewrite rules

Rewrite rules for fixpoints

  • pushing filters?

σfilter (µ(X = ϕ)) ? = µ(X = σfilter (ϕ))

  • pushing joins?

ψ ⊲ ⊳ µ(X = ϕ) ? = µ(X = ψ ⊲ ⊳ ϕ)

14

slide-54
SLIDE 54

Rewrite rules

Rewrite rules for fixpoints

  • pushing filters?

σfilter (µ(X = ϕ)) ? = µ(X = σfilter (ϕ))

  • pushing joins?

ψ ⊲ ⊳ µ(X = ϕ) ? = µ(X = ψ ⊲ ⊳ ϕ)

  • pushing antijoins?

µ(X = ϕ) ⊲ ψ ? = µ(X = ϕ ⊲ ψ)

14

slide-55
SLIDE 55

Rewrite rules

Rewrite rules for fixpoints

  • pushing filters?

σfilter (µ(X = ϕ)) ? = µ(X = σfilter (ϕ))

  • pushing joins?

ψ ⊲ ⊳ µ(X = ϕ) ? = µ(X = ψ ⊲ ⊳ ϕ)

  • pushing antijoins?

µ(X = ϕ) ⊲ ψ ? = µ(X = ϕ ⊲ ψ)

  • pushing anti-projections?

˜ πp (µ(X = ϕ)) ? = µ(X = ˜ πp (ϕ))

14

slide-56
SLIDE 56

Rewrite rules

Rewrite rules for fixpoints

  • pushing filters?

σfilter (µ(X = ϕ)) ? = µ(X = σfilter (ϕ))

  • pushing joins?

ψ ⊲ ⊳ µ(X = ϕ) ? = µ(X = ψ ⊲ ⊳ ϕ)

  • pushing antijoins?

µ(X = ϕ) ⊲ ψ ? = µ(X = ϕ ⊲ ψ)

  • pushing anti-projections?

˜ πp (µ(X = ϕ)) ? = µ(X = ˜ πp (ϕ))

  • combine fixpoints?

µ(X = ψ ∪ κ) ⊲ ⊳ µ(X = ϕ ∪ ξ) ? = µ(X = ψ ⊲ ⊳ ϕ ∪ ξ ∪ κ)

14

slide-57
SLIDE 57

Rewrite rules

Rewrite rules for fixpoints

  • Reverse fixpoints?

14

slide-58
SLIDE 58

An example

:L :TGV/:Bus∗ ?o

15

slide-59
SLIDE 59

An example

:L :TGV/:Bus∗ ?o ˜ π?s (σ?s=:L (:TGV/µ(X = βo

s (AllNodes) ∪ X/:Bus))) 15

slide-60
SLIDE 60

An example

:L :TGV/:Bus∗ ?o ˜ π?s (σ?s=:L (:TGV/µ(X = βo

s (AllNodes) ∪ X/:Bus)))

˜ π?s (σ?s=:L (µ(X = :TGV/βo

s (AllNodes) ∪ X/:Bus))) 15

slide-61
SLIDE 61

An example

:L :TGV/:Bus∗ ?o ˜ π?s (σ?s=:L (:TGV/µ(X = βo

s (AllNodes) ∪ X/:Bus)))

˜ π?s (σ?s=:L (µ(X = :TGV/βo

s (AllNodes) ∪ X/:Bus)))

˜ π?s (σ?s=:L (µ(X = :TGV ∪ X/:Bus)))

15

slide-62
SLIDE 62

An example

:L :TGV/:Bus∗ ?o ˜ π?s (σ?s=:L (:TGV/µ(X = βo

s (AllNodes) ∪ X/:Bus)))

˜ π?s (σ?s=:L (µ(X = :TGV/βo

s (AllNodes) ∪ X/:Bus)))

˜ π?s (σ?s=:L (µ(X = :TGV ∪ X/:Bus))) ˜ π?s (µ(X = σ?s=:L (:TGV) ∪ X/:Bus))

15

slide-63
SLIDE 63

An example

:L :TGV/:Bus∗ ?o ˜ π?s (σ?s=:L (:TGV/µ(X = βo

s (AllNodes) ∪ X/:Bus)))

˜ π?s (σ?s=:L (µ(X = :TGV/βo

s (AllNodes) ∪ X/:Bus)))

˜ π?s (σ?s=:L (µ(X = :TGV ∪ X/:Bus))) ˜ π?s (µ(X = σ?s=:L (:TGV) ∪ X/:Bus)) µ(X = ˜ π?s (σ?s=:L (:TGV)) ∪ X/:Bus)

15

slide-64
SLIDE 64

Other methods of optimization

Datalog? No combination of fixpoints Automata based techniques? Step by Step and no conjunction (a/b/c)+ vs ((a/b)/c)+ vs ((a/b/c))+ Special joins (RDF-3X ferari)? Compute efficiently A ⊲ ⊳ (B)∗ but same problem... Waveguide Efficient on a single RPQ but cannot optimize across RPQ.

16

slide-65
SLIDE 65

Theoretical framework

slide-66
SLIDE 66

Decomposition

Decomposed fixpoints Given a fixpoint µ(X = ϕ) it can be rewritten to µ(X = ϕcon ∪ ϕrec) with:

  • ϕcon constant, i.e. ϕconV [X/∅] = ϕconV [X/S]
  • ϕrec recursive, i.e. ϕconV [X/∅] = ∅

17

slide-67
SLIDE 67

Lineage

Linearity of fixpoints Given a fixpoint µ(X = ϕ): ϕV [X/S] = ϕV [X/∅]

  • w∈S

ϕV [X/{w}]

18

slide-68
SLIDE 68

Lineage

Linearity of fixpoints Given a fixpoint µ(X = ϕ): ϕV [X/S] = ϕV [X/∅]

  • w∈S

ϕV [X/{w}] Lineage For each m ∈ Ui+1 \ Ui we can find w ∈ Ui such that m ∈ f (w) with f (w) = ϕV [X/{w}] \ ϕV [X/∅]. ∅ u u1 f w w1 w2 f w3 w4 f f f

18

slide-69
SLIDE 69

Examples

∅ LYS/GRE SAC/LYS SAC/GRE PAR/SAC PAR/LYS PAR/GRE LIL/PAR LIL/SAC LIL/LYO LIL/GRE T + = µ(X = T ∪ X/T) T from to Lille Paris Paris Saclay Saclay Lyon Lyon Grenoble

19

slide-70
SLIDE 70

Invariant

How the elements of f (w) depend on w?

20

slide-71
SLIDE 71

Invariant

How the elements of f (w) depend on w? Stabilizers For each w, m ∈ f (w) and c ∈ stab(ϕ): m(c) = w(c).

20

slide-72
SLIDE 72

Invariant

How the elements of f (w) depend on w? Stabilizers For each w, m ∈ f (w) and c ∈ stab(ϕ): m(c) = w(c). ∅ u u1 f w w1 w2 f w3 w4 f f f σfilter (µ(X = ϕ)) = µ(X = σfilter (ϕ)) when filter operates on stab(ϕ)

20

slide-73
SLIDE 73

Examples

∅ LYS/GRE SAC/LYS SAC/GRE PAR/SAC PAR/LYS PAR/GRE LIL/PAR LIL/SAC LIL/LYO LIL/GRE σfrom=Lille

  • T +

= µ(X = σfrom=Lille (T) ∪ X/T) σto=Lille

  • T +

= µ(X = σfrom=Lille (T) ∪ X/T)

21

slide-74
SLIDE 74

Invariant

How the elements of f (w) depend on w? Added columns For each c ∈ add(ϕ): f (w) ⊲ ⊳ |c → v| = f (w ⊲ ⊳ |c → v|) ∅ u u1 f w w1 w2 f w3 w4 f f f ψ ⊲ ⊳ µ(X = ϕ) = µ(X = ψ ⊲ ⊳ ϕ) when sort(ψ) ⊆ stab(ϕ) and sort(ψ) ⊆ add(ϕ) ∪ sort(µ(X = ϕ))

22

slide-75
SLIDE 75

Rewrite rules

Rewrite rules for fixpoints

  • pushing filters

σfilter (µ(X = ϕ)) ? = µ(X = σfilter (ϕ))

  • pushing joins

ψ ⊲ ⊳ µ(X = ϕ) ? = µ(X = ψ ⊲ ⊳ ϕ)

  • pushing antijoins

µ(X = ϕ) ⊲ ψ ? = µ(X = ϕ ⊲ ψ)

  • pushing anti-projections

˜ πp (µ(X = ϕ)) ? = µ(X = ˜ πp (ϕ))

  • combine fixpoints

µ(X = ψ ∪ κ) ⊲ ⊳ µ(X = ϕ ∪ ξ) ? = µ(X = ψ ⊲ ⊳ ϕ ∪ ξ ∪ κ)

23

slide-76
SLIDE 76

Streams

slide-77
SLIDE 77

Streams

Streams are one-way communication channels R S

24

slide-78
SLIDE 78

Streams

Streams are one-way communication channels R S

24

slide-79
SLIDE 79

Streams

Streams are one-way communication channels R S

24

slide-80
SLIDE 80

Streams

Streams are one-way communication channels R S

24

slide-81
SLIDE 81

Streams

Streams are one-way communication channels R S end

24

slide-82
SLIDE 82

Streams: a a good abstraction for iterative distributed execu- tion

  • no order of messages

25

slide-83
SLIDE 83

Streams: a a good abstraction for iterative distributed execu- tion

  • no order of messages
  • not necessarily a DAG

25

slide-84
SLIDE 84

Streams: a a good abstraction for iterative distributed execu- tion

  • no order of messages
  • not necessarily a DAG
  • fast single machine communication and slow inter-machine

communication

25

slide-85
SLIDE 85

Streams: a a good abstraction for iterative distributed execu- tion

  • no order of messages
  • not necessarily a DAG
  • fast single machine communication and slow inter-machine

communication

  • partial typing of message content (for fast serialization)

25

slide-86
SLIDE 86

Execution of µ-algebra terms with streams

start Xn . . . X1 I2 I1

ϕ

  • ut

26

slide-87
SLIDE 87

Streams

Streams for µ(X = X/T ∪ T)

X/T T X ∪

  • ut

27

slide-88
SLIDE 88

Benchmarking

slide-89
SLIDE 89

Q2

102 103 104 105 106 107 108 10−2 10−1 100 101 102 Number of nodes Time (s) Prototype Ramsdell DLV Vlog Postgres MariaDB

Figure 4: ?a (P1+)/(P5+) ?b.

28

slide-90
SLIDE 90

Q3

102 103 104 105 106 107 108 10−2 10−1 100 101 102 Number of nodes Time (s) Prototype Ramsdell DLV Vlog Postgres MariaDB

Figure 5: ?a (P1+)/P2 ?b . ?b P3 + ?c.

29

slide-91
SLIDE 91

Q7

102 103 104 105 106 107 108 10−2 10−1 100 101 102 Number of nodes Time (s) Prototype Ramsdell DLV Vlog Postgres MariaDB

Figure 6: N0 P1/(P2+) ?a

30

slide-92
SLIDE 92

Q10

102 103 104 105 106 107 108 10−2 10−1 100 101 102 Number of nodes Time (s) Prototype Ramsdell DLV Vlog Postgres MariaDB

Figure 7: ?a (P4+)/(P5+)/(P3+) ?b

31

slide-93
SLIDE 93

Why recursive queries?

  • Evaluate property paths
  • Evaluate general recursive queries
  • OBDA without rewriting nor materialization

32

slide-94
SLIDE 94

What am I doing now?

33

slide-95
SLIDE 95

Questions?

34

slide-96
SLIDE 96

Edgar F Codd. A relational model of data for large shared data banks. Communications of the ACM, 13(6):377–387, 1970.

slide-97
SLIDE 97

Semantics

slide-98
SLIDE 98

Semantics

ϕ1 ⊲ ⊳ ϕ2V = {m1 + m2 | m1 ∈ ϕ1V ∧ m2 ∈ ϕ2V ∧ m1 ∼ m2} ϕ1 ∪ ϕ2V = ϕ1V ∪ ϕ2V ϕ1 ⊲ ϕ2V = {m ∈ ϕ1V | ∀m′ ∈ ϕ2V ¬(m′ ∼ m)} ˜ πa (ϕ)V =

  • {c → v ∈ m | c = a}
  • m ∈ ϕV
  • XV

= V (X) βb

a (ϕ)V

=

  • {c → v ∈ m | c = b} ∪ {b → v | a → v ∈ m}
  • m ∈ ϕV
  • σfilter (ϕ)V

= {m | m ∈ ϕV ∧ filter(m) = ⊤} µ(X = ϕ)V = XV [X/U∞], U0 = ∅, Ui+1 = Ui ∪ ϕV [X/Ui]