SLIDE 1 On the optimization of recursive relational queries.
DIG Seminar
Louis Jachiet
Centre de Recherche en Informatique, Signal et Automatique de Lille
1
SLIDE 2
The relational algebra
SLIDE 3 Generalities
The relational algebra [Cod70]
the tables in SQL
- combined through operators
union, projection, filter, join, etc.
2
SLIDE 4
Syntax
ϕ ::= term | X relation variable | |c → v| constant | ∅ empty set | ϕ1 ∪ ϕ2 union | ϕ1 ⊲ ⊳ ϕ2 join | ϕ1 ⊲ ϕ2 antijoin | σfilter (ϕ) filter | ρb
a (ϕ)
rename | πP (ϕ) projection
Figure 1: Syntax of the relational algebra
3
SLIDE 5
Examples
T from to Lille Paris Lille Saclay Paris Grenoble Paris Saclay Saclay Grenoble πto (T) to Paris Saclay Grenoble
4
SLIDE 6
Examples
T from to Lille Paris Lille Saclay Paris Grenoble Paris Saclay Saclay Grenoble ρstep
to
(T) from step Lille Paris Lille Saclay Paris Grenoble Paris Saclay Saclay Grenoble
4
SLIDE 7
Examples
T from to Lille Paris Lille Saclay Paris Grenoble Paris Saclay Saclay Grenoble ρstep
to
(T) ⊲ ⊳ ρstep
from (T)
from step to Lille Paris Saclay Lille Paris Grenoble Lille Saclay Grenoble Paris Saclay Grenoble
4
SLIDE 8
Examples
T from to Lille Paris Lille Saclay Paris Grenoble Paris Saclay Saclay Grenoble σfrom=Lille (T) from to Lille Paris Lille Saclay
4
SLIDE 9
Examples
T from to Lille Paris Lille Saclay Paris Grenoble Paris Saclay Saclay Grenoble σfrom=Paris (T) ∪ σfrom=Lille (T) from to Lille Paris Lille Saclay Paris Saclay Paris Grenoble
4
SLIDE 10
Examples
T from to Lille Paris Lille Saclay Paris Grenoble Paris Saclay Saclay Grenoble πto (T) ⊲ σfrom=Lille (T) to Grenoble
4
SLIDE 11
Recursive relational algebra
SLIDE 12
Syntax
ϕ ::= term | X relation variable | |c → v| constant | ∅ empty set | ϕ1 ∪ ϕ2 union | ϕ1 ⊲ ⊳ ϕ2 join | ϕ1 ⊲ ϕ2 antijoin | σfilter (ϕ) filtering | ρb
a (ϕ)
rename | πc1,...,cn (ϕ) projection
Figure 2: Syntax of our relational algebra
5
SLIDE 13
Syntax
ϕ ::= term | X relation variable | |c → v| constant | ∅ empty set | ϕ1 ∪ ϕ2 union | ϕ1 ⊲ ⊳ ϕ2 join | ϕ1 ⊲ ϕ2 antijoin | σfilter (ϕ) filtering | ρb
a (ϕ)
rename | ˜ πc (ϕ) anti-projection
Figure 2: Syntax of our relational algebra
5
SLIDE 14
Syntax
ϕ ::= term | X relation variable | |c → v| constant | ∅ empty set | ϕ1 ∪ ϕ2 union | ϕ1 ⊲ ⊳ ϕ2 join | ϕ1 ⊲ ϕ2 antijoin | σfilter (ϕ) filtering | ρb
a (ϕ)
rename | ˜ πc (ϕ) anti-projection | βb
a (ϕ)
duplication
Figure 2: Syntax of our relational algebra
5
SLIDE 15
Syntax
ϕ ::= term | X relation variable | |c → v| constant | ∅ empty set | ϕ1 ∪ ϕ2 union | ϕ1 ⊲ ⊳ ϕ2 join | ϕ1 ⊲ ϕ2 antijoin | σfilter (ϕ) filtering | ρb
a (ϕ)
rename | ˜ πc (ϕ) anti-projection | βb
a (ϕ)
duplication | µ(X = ϕ) fixpoint
Figure 2: Syntax of our relational algebra
5
SLIDE 16
Differences
Anti-projection Remove a column ˜ πd1 (. . . ˜ πdk (ϕ) . . . ) = πc1...cn (ϕ)
6
SLIDE 17 Differences
Anti-projection Remove a column ˜ πd1 (. . . ˜ πdk (ϕ) . . . ) = πc1...cn (ϕ) Duplication Copy a column βb
a (ϕ) = σa=b
⊳ ρb
a (ϕ)
SLIDE 18 Differences
Anti-projection Remove a column ˜ πd1 (. . . ˜ πdk (ϕ) . . . ) = πc1...cn (ϕ) Duplication Copy a column βb
a (ϕ) = σa=b
⊳ ρb
a (ϕ)
Compute the least fixpoint of a function S → ϕ[X/S] µ(X = ϕ)V = limn→∞Un U0 = ∅ Un+1 = Un ∪ ϕV [X/Un]
6
SLIDE 19
Examples
T from to Lille Paris Lille Saclay Paris Grenoble Paris Saclay Saclay Grenoble ˜ πfrom (T) to Paris Saclay Grenoble
7
SLIDE 20
Examples
T from to Lille Paris Lille Saclay Paris Grenoble Paris Saclay Saclay Grenoble βto
from (˜
πto (T)) from to Lille Lille Saclay Saclay Paris Paris
7
SLIDE 21
Examples
T + = µ(X = T ∪ T/X) T from to Lille Paris Paris Saclay Saclay Lyon Lyon Grenoble from to
8
SLIDE 22
Examples
T + = µ(X = T ∪ ˜ πs (ρs
to (X) ⊲
⊳ ρs
from (T)))
T from to Lille Paris Paris Saclay Saclay Lyon Lyon Grenoble from to Lille Paris Paris Saclay Saclay Lyon Lyon Grenoble
8
SLIDE 23
Examples
T + = µ(X = T ∪ ˜ πs (ρs
to (X) ⊲
⊳ ρs
from (T)))
T from to Lille Paris Paris Saclay Saclay Lyon Lyon Grenoble from to Lille Paris Paris Saclay Saclay Lyon Lyon Grenoble Lille Saclay Paris Lyon Saclay Grenoble
8
SLIDE 24
Examples
T + = µ(X = T ∪ ˜ πs (ρs
to (X) ⊲
⊳ ρs
from (T)))
T from to Lille Paris Paris Saclay Saclay Lyon Lyon Grenoble from to Lille Paris Paris Saclay Saclay Lyon Lyon Grenoble Lille Saclay Paris Lyon Saclay Grenoble Lille Lyon Paris Grenoble
8
SLIDE 25
Examples
T + = µ(X = T ∪ ˜ πs (ρs
to (X) ⊲
⊳ ρs
from (T)))
T from to Lille Paris Paris Saclay Saclay Lyon Lyon Grenoble from to Lille Paris Paris Saclay Saclay Lyon Lyon Grenoble Lille Saclay Paris Lyon Saclay Grenoble Lille Lyon Paris Grenoble Lille Grenoble
8
SLIDE 26 Limitations
Limitations on fixpoints
- recursive variables must appear positively
No µ(X = R ⊲ X)
9
SLIDE 27 Limitations
Limitations on fixpoints
- recursive variables must appear positively
No µ(X = R ⊲ X)
- no join between recursive terms
No µ(X = X ⋊ ⋉ X)
9
SLIDE 28 Limitations
Limitations on fixpoints
- recursive variables must appear positively
No µ(X = R ⊲ X)
- no join between recursive terms
No µ(X = X ⋊ ⋉ X)
- no mutually recursive fixpoints
No µ(X = µ(Y = X ∪ Y ))
9
SLIDE 29 Limitations
Limitations on fixpoints
- recursive variables must appear positively
No µ(X = R ⊲ X)
- no join between recursive terms
No µ(X = X ⋊ ⋉ X)
- no mutually recursive fixpoints
No µ(X = µ(Y = X ∪ Y )) → corresponds to linear datalog! → superset of WITH RECURSIVE in SQL!
9
SLIDE 30
Performance of recursive queries
SLIDE 31
An example
:Lille :TGV/:Bus∗ ?o
10
SLIDE 32
An example
:Lille :TGV/:Bus∗ ?o
L
10
SLIDE 33
An example
:Lille :TGV/:Bus∗ ?o
L
10
SLIDE 34
An example
:Lille :TGV/:Bus∗ ?o
L
10
SLIDE 35
An example
:Lille :TGV/:Bus∗ ?o
L
10
SLIDE 36
An example
:Lille :TGV/:Bus∗ ?o
L
10
SLIDE 37
An example
:Lille :TGV/:Bus∗ ?o
L
10
SLIDE 38
An example
:Lille :TGV/:Bus∗ ?o
L
10
SLIDE 39
An example
:Lille :TGV/:Bus∗ ?o
L
10
SLIDE 40
An example
:Lille :TGV/:Bus∗ ?o
L
10
SLIDE 41
An example
:Lille :TGV/:Bus∗ ?o
L
10
SLIDE 42
An example
:Lille :TGV/:Bus∗ ?o
L
10
SLIDE 43
An example
:Lille :TGV/:Bus∗ ?o
L
10
SLIDE 44
An example
:Lille :TGV/:Bus∗ ?o
L
10
SLIDE 45
An example
:Lille :TGV/:Bus∗ ?o
L
10
SLIDE 46
An example
:Lille :TGV/:Bus∗ ?o
L
10
SLIDE 47
An example
:L :TGV/:Bus∗ ?o
11
SLIDE 48
An example
:L :TGV/:Bus∗ ?o ˜ π?s (σ?s=:L (:TGV/µ(X = βo
s (AllNodes) ∪ X/:Bus))) 11
SLIDE 49
An example
:L :TGV/:Bus∗ ?o ˜ π?s (σ?s=:L (:TGV/µ(X = βo
s (AllNodes) ∪ X/:Bus)))
˜ π?s (σ?s=:L (:TGV/µ(X = βo
s (AllNodes) ∪ :Bus/X))) 11
SLIDE 50 Benchmarking
102 103 104 105 106 10−2 10−1 100 101 number of nodes n Time (s) Postgres SQLite Virtuoso ARQ1 ARQ2 DLV1 DLV2 Ramsdell2 Ramsdell1 Vlog1 Vlog2
∗
12
SLIDE 51 Benchmarking
Logicblox
- Materialization of all intermediate predicate
- ... except for “on demand” predicate
- therefore manual optimization
13
SLIDE 52 Rewrite rules
Rewrite rules for fixpoints
σfilter (µ(X = ϕ)) ? = µ(X = σfilter (ϕ))
14
SLIDE 53 Rewrite rules
Rewrite rules for fixpoints
σfilter (µ(X = ϕ)) ? = µ(X = σfilter (ϕ))
ψ ⊲ ⊳ µ(X = ϕ) ? = µ(X = ψ ⊲ ⊳ ϕ)
14
SLIDE 54 Rewrite rules
Rewrite rules for fixpoints
σfilter (µ(X = ϕ)) ? = µ(X = σfilter (ϕ))
ψ ⊲ ⊳ µ(X = ϕ) ? = µ(X = ψ ⊲ ⊳ ϕ)
µ(X = ϕ) ⊲ ψ ? = µ(X = ϕ ⊲ ψ)
14
SLIDE 55 Rewrite rules
Rewrite rules for fixpoints
σfilter (µ(X = ϕ)) ? = µ(X = σfilter (ϕ))
ψ ⊲ ⊳ µ(X = ϕ) ? = µ(X = ψ ⊲ ⊳ ϕ)
µ(X = ϕ) ⊲ ψ ? = µ(X = ϕ ⊲ ψ)
- pushing anti-projections?
˜ πp (µ(X = ϕ)) ? = µ(X = ˜ πp (ϕ))
14
SLIDE 56 Rewrite rules
Rewrite rules for fixpoints
σfilter (µ(X = ϕ)) ? = µ(X = σfilter (ϕ))
ψ ⊲ ⊳ µ(X = ϕ) ? = µ(X = ψ ⊲ ⊳ ϕ)
µ(X = ϕ) ⊲ ψ ? = µ(X = ϕ ⊲ ψ)
- pushing anti-projections?
˜ πp (µ(X = ϕ)) ? = µ(X = ˜ πp (ϕ))
µ(X = ψ ∪ κ) ⊲ ⊳ µ(X = ϕ ∪ ξ) ? = µ(X = ψ ⊲ ⊳ ϕ ∪ ξ ∪ κ)
14
SLIDE 57 Rewrite rules
Rewrite rules for fixpoints
14
SLIDE 58
An example
:L :TGV/:Bus∗ ?o
15
SLIDE 59
An example
:L :TGV/:Bus∗ ?o ˜ π?s (σ?s=:L (:TGV/µ(X = βo
s (AllNodes) ∪ X/:Bus))) 15
SLIDE 60
An example
:L :TGV/:Bus∗ ?o ˜ π?s (σ?s=:L (:TGV/µ(X = βo
s (AllNodes) ∪ X/:Bus)))
˜ π?s (σ?s=:L (µ(X = :TGV/βo
s (AllNodes) ∪ X/:Bus))) 15
SLIDE 61
An example
:L :TGV/:Bus∗ ?o ˜ π?s (σ?s=:L (:TGV/µ(X = βo
s (AllNodes) ∪ X/:Bus)))
˜ π?s (σ?s=:L (µ(X = :TGV/βo
s (AllNodes) ∪ X/:Bus)))
˜ π?s (σ?s=:L (µ(X = :TGV ∪ X/:Bus)))
15
SLIDE 62
An example
:L :TGV/:Bus∗ ?o ˜ π?s (σ?s=:L (:TGV/µ(X = βo
s (AllNodes) ∪ X/:Bus)))
˜ π?s (σ?s=:L (µ(X = :TGV/βo
s (AllNodes) ∪ X/:Bus)))
˜ π?s (σ?s=:L (µ(X = :TGV ∪ X/:Bus))) ˜ π?s (µ(X = σ?s=:L (:TGV) ∪ X/:Bus))
15
SLIDE 63
An example
:L :TGV/:Bus∗ ?o ˜ π?s (σ?s=:L (:TGV/µ(X = βo
s (AllNodes) ∪ X/:Bus)))
˜ π?s (σ?s=:L (µ(X = :TGV/βo
s (AllNodes) ∪ X/:Bus)))
˜ π?s (σ?s=:L (µ(X = :TGV ∪ X/:Bus))) ˜ π?s (µ(X = σ?s=:L (:TGV) ∪ X/:Bus)) µ(X = ˜ π?s (σ?s=:L (:TGV)) ∪ X/:Bus)
15
SLIDE 64
Other methods of optimization
Datalog? No combination of fixpoints Automata based techniques? Step by Step and no conjunction (a/b/c)+ vs ((a/b)/c)+ vs ((a/b/c))+ Special joins (RDF-3X ferari)? Compute efficiently A ⊲ ⊳ (B)∗ but same problem... Waveguide Efficient on a single RPQ but cannot optimize across RPQ.
16
SLIDE 65
Theoretical framework
SLIDE 66 Decomposition
Decomposed fixpoints Given a fixpoint µ(X = ϕ) it can be rewritten to µ(X = ϕcon ∪ ϕrec) with:
- ϕcon constant, i.e. ϕconV [X/∅] = ϕconV [X/S]
- ϕrec recursive, i.e. ϕconV [X/∅] = ∅
17
SLIDE 67 Lineage
Linearity of fixpoints Given a fixpoint µ(X = ϕ): ϕV [X/S] = ϕV [X/∅]
ϕV [X/{w}]
18
SLIDE 68 Lineage
Linearity of fixpoints Given a fixpoint µ(X = ϕ): ϕV [X/S] = ϕV [X/∅]
ϕV [X/{w}] Lineage For each m ∈ Ui+1 \ Ui we can find w ∈ Ui such that m ∈ f (w) with f (w) = ϕV [X/{w}] \ ϕV [X/∅]. ∅ u u1 f w w1 w2 f w3 w4 f f f
18
SLIDE 69
Examples
∅ LYS/GRE SAC/LYS SAC/GRE PAR/SAC PAR/LYS PAR/GRE LIL/PAR LIL/SAC LIL/LYO LIL/GRE T + = µ(X = T ∪ X/T) T from to Lille Paris Paris Saclay Saclay Lyon Lyon Grenoble
19
SLIDE 70
Invariant
How the elements of f (w) depend on w?
20
SLIDE 71
Invariant
How the elements of f (w) depend on w? Stabilizers For each w, m ∈ f (w) and c ∈ stab(ϕ): m(c) = w(c).
20
SLIDE 72
Invariant
How the elements of f (w) depend on w? Stabilizers For each w, m ∈ f (w) and c ∈ stab(ϕ): m(c) = w(c). ∅ u u1 f w w1 w2 f w3 w4 f f f σfilter (µ(X = ϕ)) = µ(X = σfilter (ϕ)) when filter operates on stab(ϕ)
20
SLIDE 73 Examples
∅ LYS/GRE SAC/LYS SAC/GRE PAR/SAC PAR/LYS PAR/GRE LIL/PAR LIL/SAC LIL/LYO LIL/GRE σfrom=Lille
= µ(X = σfrom=Lille (T) ∪ X/T) σto=Lille
= µ(X = σfrom=Lille (T) ∪ X/T)
21
SLIDE 74
Invariant
How the elements of f (w) depend on w? Added columns For each c ∈ add(ϕ): f (w) ⊲ ⊳ |c → v| = f (w ⊲ ⊳ |c → v|) ∅ u u1 f w w1 w2 f w3 w4 f f f ψ ⊲ ⊳ µ(X = ϕ) = µ(X = ψ ⊲ ⊳ ϕ) when sort(ψ) ⊆ stab(ϕ) and sort(ψ) ⊆ add(ϕ) ∪ sort(µ(X = ϕ))
22
SLIDE 75 Rewrite rules
Rewrite rules for fixpoints
σfilter (µ(X = ϕ)) ? = µ(X = σfilter (ϕ))
ψ ⊲ ⊳ µ(X = ϕ) ? = µ(X = ψ ⊲ ⊳ ϕ)
µ(X = ϕ) ⊲ ψ ? = µ(X = ϕ ⊲ ψ)
˜ πp (µ(X = ϕ)) ? = µ(X = ˜ πp (ϕ))
µ(X = ψ ∪ κ) ⊲ ⊳ µ(X = ϕ ∪ ξ) ? = µ(X = ψ ⊲ ⊳ ϕ ∪ ξ ∪ κ)
23
SLIDE 76
Streams
SLIDE 77
Streams
Streams are one-way communication channels R S
24
SLIDE 78
Streams
Streams are one-way communication channels R S
24
SLIDE 79
Streams
Streams are one-way communication channels R S
24
SLIDE 80
Streams
Streams are one-way communication channels R S
24
SLIDE 81
Streams
Streams are one-way communication channels R S end
24
SLIDE 82 Streams: a a good abstraction for iterative distributed execu- tion
25
SLIDE 83 Streams: a a good abstraction for iterative distributed execu- tion
- no order of messages
- not necessarily a DAG
25
SLIDE 84 Streams: a a good abstraction for iterative distributed execu- tion
- no order of messages
- not necessarily a DAG
- fast single machine communication and slow inter-machine
communication
25
SLIDE 85 Streams: a a good abstraction for iterative distributed execu- tion
- no order of messages
- not necessarily a DAG
- fast single machine communication and slow inter-machine
communication
- partial typing of message content (for fast serialization)
25
SLIDE 86 Execution of µ-algebra terms with streams
start Xn . . . X1 I2 I1
ϕ
26
SLIDE 87 Streams
Streams for µ(X = X/T ∪ T)
X/T T X ∪
27
SLIDE 88
Benchmarking
SLIDE 89
Q2
102 103 104 105 106 107 108 10−2 10−1 100 101 102 Number of nodes Time (s) Prototype Ramsdell DLV Vlog Postgres MariaDB
Figure 4: ?a (P1+)/(P5+) ?b.
28
SLIDE 90
Q3
102 103 104 105 106 107 108 10−2 10−1 100 101 102 Number of nodes Time (s) Prototype Ramsdell DLV Vlog Postgres MariaDB
Figure 5: ?a (P1+)/P2 ?b . ?b P3 + ?c.
29
SLIDE 91
Q7
102 103 104 105 106 107 108 10−2 10−1 100 101 102 Number of nodes Time (s) Prototype Ramsdell DLV Vlog Postgres MariaDB
Figure 6: N0 P1/(P2+) ?a
30
SLIDE 92
Q10
102 103 104 105 106 107 108 10−2 10−1 100 101 102 Number of nodes Time (s) Prototype Ramsdell DLV Vlog Postgres MariaDB
Figure 7: ?a (P4+)/(P5+)/(P3+) ?b
31
SLIDE 93 Why recursive queries?
- Evaluate property paths
- Evaluate general recursive queries
- OBDA without rewriting nor materialization
32
SLIDE 94
What am I doing now?
33
SLIDE 95
Questions?
34
SLIDE 96
Edgar F Codd. A relational model of data for large shared data banks. Communications of the ACM, 13(6):377–387, 1970.
SLIDE 97
Semantics
SLIDE 98 Semantics
ϕ1 ⊲ ⊳ ϕ2V = {m1 + m2 | m1 ∈ ϕ1V ∧ m2 ∈ ϕ2V ∧ m1 ∼ m2} ϕ1 ∪ ϕ2V = ϕ1V ∪ ϕ2V ϕ1 ⊲ ϕ2V = {m ∈ ϕ1V | ∀m′ ∈ ϕ2V ¬(m′ ∼ m)} ˜ πa (ϕ)V =
- {c → v ∈ m | c = a}
- m ∈ ϕV
- XV
= V (X) βb
a (ϕ)V
=
- {c → v ∈ m | c = b} ∪ {b → v | a → v ∈ m}
- m ∈ ϕV
- σfilter (ϕ)V
= {m | m ∈ ϕV ∧ filter(m) = ⊤} µ(X = ϕ)V = XV [X/U∞], U0 = ∅, Ui+1 = Ui ∪ ϕV [X/Ui]