Joins Aggregates Optimization https://fdbresearch.github.io Dan - - PowerPoint PPT Presentation

joins aggregates optimization
SMART_READER_LITE
LIVE PREVIEW

Joins Aggregates Optimization https://fdbresearch.github.io Dan - - PowerPoint PPT Presentation

Joins Aggregates Optimization https://fdbresearch.github.io Dan Olteanu PhD Open School University of Warsaw November 22, 2018 1 / 99 Acknowledgements Some work reported in this course has been done in the context of the FDB


slide-1
SLIDE 1

1 / 99

Joins → Aggregates → Optimization

https://fdbresearch.github.io Dan Olteanu

PhD Open School University of Warsaw November 22, 2018

slide-2
SLIDE 2

Acknowledgements

Some work reported in this course has been done in the context of the FDB project, LogicBlox, and RelationalAI by Zavodn´ y, Schleich, Kara, Nikolic, Zhang, Ciucanu, and Olteanu (Oxford) Abo Khamis and Ngo (RelationalAI), Nguyen (U. Michigan) Some of the following slides are derived from presentations by Abo Khamis (optimization diagrams) Kara (covers, IVMǫ, and many graphics) Ngo (functional aggregate queries) Schleich (performance and quizzes) Lastly, Kara and Schleich proofread the slides. I would like to thank them for their support!

2 / 99

slide-3
SLIDE 3

Goal of This Course

Introduction to a principled approach to in-database computation This course starts where mainstream databases courses finish.

Part 1: Joins

◮ Basic building blocks in query languages. Studied extensively. ◮ Systematic study of redundancy in the computation and representation of join results [OZ12,OZ15,KO18] ◮ Worst-case optimal join algorithms [NPRR12,NRR13,V14,OZ15,ANS17]

Part 2: Aggregates Part 3: Optimization

3 / 99

slide-4
SLIDE 4

Introduction by Examples Decompositions and Variable Orders Size Bounds for Join Results Worst-Case Optimal Join Algorithms Further Work and References Quiz

4 / 99

Outline of Part 1: Joins

slide-5
SLIDE 5

Join Queries

Q(A1 ∪ · · · ∪ An)

  • head

= R1(A1), . . . , Rn(An)

  • body

Query variables: A1 ∪ · · · ∪ An. All variables in the body occur in the head. Relational atoms: R1, . . . , Rn Natural join: Same variable occurs in different relational atoms Examples of bodies of queries used in the following slides:

Path: O(customer, day, dish), D(dish, item), I(item, price) Path: R1(A, B), R2(B, C), R3(C, D) Acyclic: R(A, B, C), S(A, B, D), T(A, E), U(E, F). Triangle: R1(A, B), R2(A, C), R3(B, C) Loop: R(A1, A2), S(A2, A3), T(A3, A4), W (A4, A1)

5 / 99

slide-6
SLIDE 6

Join Example: Itemized Customer Orders

Orders (O for short) customer day dish Elise Monday burger Elise Friday burger Steve Friday hotdog Joe Friday hotdog Dish (D for short) dish item burger patty burger

  • nion

burger bun hotdog bun hotdog

  • nion

hotdog sausage Items (I for short) item price patty 6

  • nion

2 bun 2 sausage 4

Consider the natural join of the above relations:

O(customer, day, dish), D(dish, item), I(item, price) customer day dish item price Elise Monday burger patty 6 Elise Monday burger

  • nion

2 Elise Monday burger bun 2 Elise Friday burger patty 6 Elise Friday burger

  • nion

2 Elise Friday burger bun 2 . . . . . . . . . . . . . . .

6 / 99

slide-7
SLIDE 7

Join Example: Listing the Triangles in the Database

R1 R2 R3 R1(A, B), R2(A, C), R3(B, C) A B a0 b0 a0 . . . a0 bm a1 b0 . . . b0 am b0 A C a0 c0 a0 . . . a0 cm a1 c0 . . . c0 am c0 B C b0 c0 b0 . . . b0 cm b1 c0 . . . c0 bm c0 A B C a0 b0 c0 a0 b0 . . . a0 b0 cm a0 b1 c0 a0 . . . c0 a0 bm c0 a1 b0 c0 . . . b0 c0 a1 b0 c0

7 / 99

slide-8
SLIDE 8

Introduction by Examples Decompositions and Variable Orders Size Bounds for Join Results Worst-Case Optimal Join Algorithms Further Work and References Quiz

8 / 99

Outline of Part 1: Joins

slide-9
SLIDE 9

Join Hypergraphs

We associate a (multi)hypergraph H = (V, E) with every join query Q Each variable in Q is a node in V The set of variables of each relation symbol in Q is a (hyper)edge in E Example: Triangle query R1(A, B), R2(A, C), R3(B, C) R1 R2 R3 A B C V = {A, B, C} E = {{A, B}, {A, C}, {B, C}}

9 / 99

slide-10
SLIDE 10

Join Hypergraphs

We associate a (multi)hypergraph H = (V, E) with every join query Q Each variable in Q is a node in V The set of variables of each relation symbol in Q is a (hyper)edge in E Example: Order query O(cust, day, dish), D(dish, item), I(item, price) O D I dish day item cust price V = {cust, day, dish, item, price} E = {{cust, day, dish}, {dish, item}, {item, price}}

10 / 99

slide-11
SLIDE 11

Hypertree Decompositions

Definition[GLS99]: A (hypertree) decomposition T of the hypergraph (V, E) of a query Q is a pair (T, χ), where T is a tree χ is a function mapping each node in T to a subset of V called bag. Properties of a decomposition T : Coverage: ∀e ∈ E, there must be a node t ∈ T such that e ⊆ χ(t). Connectivity: ∀v ∈ V, {t | t ∈ T, v ∈ χ(t)} forms a connected subtree.

The hypergraph of the query R1(A, B), R2(B, C), R3(C, D) A hypertree decomposition A B C D A, B B, C C, D

11 / 99

slide-12
SLIDE 12

Hypertree Decompositions

Definition[GLS99]: A (hypertree) decomposition T of the hypergraph (V, E) of a query Q is a pair (T, χ), where T is a tree χ is a function mapping each node in T to a subset of V called bag. Properties of the decomposition T : Coverage: ∀e ∈ E, there must be a node t ∈ T such that e ⊆ χ(t). Connectivity: ∀v ∈ V, {t | t ∈ T, v ∈ χ(t)} forms a connected subtree.

The hypergraph of the triangle query R1(A, B), R2(A, C), R3(B, C) A hypertree decomposition R1 R2 R3 A B C A, B, C

12 / 99

slide-13
SLIDE 13

Variable Orders

Definition[OZ15]: A variable order ∆ for a query Q is a pair (F, key), where F is a rooted forest with one node per variable in Q key is a function mapping each variable A to a subset of its ancestor variables in F. Properties of a variable order ∆ for Q: For each relation symbol, its variables lie along the same root-to-leaf path in F. For any such variables A and B, A ∈ key(B) if A is an ancestor of B. For every child B of A, key(B) ⊆ key(A) ∪ {A}. Possible variable orders for the path query R1(A, B), R2(B, C), R3(C, D):

A B C D key(A) = ∅ key(B) = {A} key(C) = {B} key(D) = {C} B A C D key(B) = ∅ key(C) = {B} key(D) = {C} key(A) = {B}

13 / 99

slide-14
SLIDE 14

Variable Orders

Definition[OZ15]: A variable order ∆ for a query Q is a pair (F, key), where F is a rooted forest with one node per variable in Q key is a function mapping each variable A to a subset of its ancestor variables in F. Properties of a variable order ∆ for Q: For each relation symbol, its variables lie along the same root-to-leaf path in F. For any such variables A and B, A ∈ key(B) if A is an ancestor of B. For every child B of A, key(B) ⊆ key(A) ∪ {A}. Possible variable orders for the triangle query R1(A, B), R2(A, C), R3(B, C):

A B C key(A) = ∅ key(B) = {A} key(C) = {A, B} B A C key(B) = ∅ key(A) = {B} key(C) = {A, B} C B A key(C) = ∅ key(B) = {C} key(A) = {B, C}

14 / 99

slide-15
SLIDE 15

Hypertree Decompositions ⇔ Variable Orders

From variable order ∆ to hypertree decomposition T : [OZ15] For each node A in ∆, create a bag key(A) ∪ {A}. The bag for A is connected to the bags for its children and parent. Optionally, remove redundant bags Example: Triangle query R1(A, B), R2(A, C), R3(B, C)

A B C key(A) = ∅ key(B) = {A} key(C) = {A, B} ⇒ A A, B A, B, C ⇒ A, B, C

15 / 99

slide-16
SLIDE 16

Hypertree Decompositions ⇔ Variable Orders

From variable order ∆ to hypertree decomposition T : [OZ15] For each node A in ∆, create a bag key(A) ∪ {A}. The bag for A is connected to the bags for its children and parent. Optionally, remove redundant bags Example: Path query R1(A, B), R2(B, C), R3(C, D)

A B C D key(A) = ∅ key(B) = {A} key(C) = {B} key(D) = {C} ⇒ A A, B B, C C, D ⇒ A, B B, C C, D

16 / 99

slide-17
SLIDE 17

Hypertree Decompositions ⇔ Variable Orders

From hypertree decomposition T to variable order ∆: [OZ15] Create a node A in ∆ for a variable A in the top bag in T Recurse with T where A is removed from all bags in T . If top bag empty, then recurse independently on each of its child bags and create children of A in ∆ Update key for each variable at each step. Example: Triangle query R1(A, B), R2(A, C), R3(B, C)

A, B, C

17 / 99

slide-18
SLIDE 18

Hypertree Decompositions ⇔ Variable Orders

From hypertree decomposition T to variable order ∆: [OZ15] Create a node A in ∆ for a variable A in the top bag in T Recurse with T where A is removed from all bags in T . If top bag empty, then recurse independently on each of its child bags and create children of A in ∆ Update key for each variable at each step. Example: Triangle query R1(A, B), R2(A, C), R3(B, C)

Step 1: A is removed from T and inserted into ∆ A,B, C ⇒ A key(A) = ∅

18 / 99

slide-19
SLIDE 19

Hypertree Decompositions ⇔ Variable Orders

From hypertree decomposition T to variable order ∆: [OZ15] Create a node A in ∆ for a variable A in the top bag in T Recurse with T where A is removed from all bags in T . If top bag empty, then recurse independently on each of its child bags and create children of A in ∆ Update key for each variable at each step. Example: Triangle query R1(A, B), R2(A, C), R3(B, C)

Step 2: B is removed from T and inserted into ∆ A, B,C ⇒ A B key(A) = ∅ key(B) = {A}

19 / 99

slide-20
SLIDE 20

Hypertree Decompositions ⇔ Variable Orders

From hypertree decomposition T to variable order ∆: [OZ15] Create a node A in ∆ for a variable A in the top bag in T Recurse with T where A is removed from all bags in T . If top bag empty, then recurse independently on each of its child bags and create children of A in ∆ Update key for each variable at each step. Example: Triangle query R1(A, B), R2(A, C), R3(B, C)

Step 3: C is removed from T and inserted into ∆ A, B, C ⇒ A B C key(A) = ∅ key(B) = {A} key(C) = {A, B}

20 / 99

slide-21
SLIDE 21

Hypertree Decompositions ⇔ Variable Orders

From hypertree decomposition T to variable order ∆: [OZ15] Create a node A in ∆ for a variable A in the top bag in T Recurse with T where A is removed from all bags in T . If top bag empty, then recurse independently on each of its child bags and create children of A in ∆ Update key for each variable at each step. Example: Path query R1(A, B), R2(B, C), R3(C, D)

A, B B, C C, D

21 / 99

slide-22
SLIDE 22

Hypertree Decompositions ⇔ Variable Orders

From hypertree decomposition T to variable order ∆: [OZ15] Create a node A in ∆ for a variable A in the top bag in T Recurse with T where A is removed from all bags in T . If top bag empty, then recurse independently on each of its child bags and create children of A in ∆ Update key for each variable at each step. Example: Path query R1(A, B), R2(B, C), R3(C, D)

Step 1: A is removed from T and inserted into ∆ A,B B, C C, D ⇒ A key(A) = ∅

22 / 99

slide-23
SLIDE 23

Hypertree Decompositions ⇔ Variable Orders

From hypertree decomposition T to variable order ∆: [OZ15] Create a node A in ∆ for a variable A in the top bag in T Recurse with T where A is removed from all bags in T . If top bag empty, then recurse independently on each of its child bags and create children of A in ∆ Update key for each variable at each step. Example: Path query R1(A, B), R2(B, C), R3(C, D)

Step 2: B is removed from T and inserted into ∆ A, B B,C C, D ⇒ A B key(A) = ∅ key(B) = {A}

23 / 99

slide-24
SLIDE 24

Hypertree Decompositions ⇔ Variable Orders

From hypertree decomposition T to variable order ∆: [OZ15] Create a node A in ∆ for a variable A in the top bag in T Recurse with T where A is removed from all bags in T . If top bag empty, then recurse independently on each of its child bags and create children of A in ∆ Update key for each variable at each step. Example: Path query R1(A, B), R2(B, C), R3(C, D)

Step 3: C is removed from T and inserted into ∆ A, B B, C C,D ⇒ A B C key(A) = ∅ key(B) = {A} key(C) = {B}

24 / 99

slide-25
SLIDE 25

Hypertree Decompositions ⇔ Variable Orders

From hypertree decomposition T to variable order ∆: [OZ15] Create a node A in ∆ for a variable A in the top bag in T Recurse with T where A is removed from all bags in T . If top bag empty, then recurse independently on each of its child bags and create children of A in ∆ Update key for each variable at each step. Example: Path query R1(A, B), R2(B, C), R3(C, D)

Step 4: D is removed from T and inserted into ∆ A, B B, C C, D ⇒ A B C D key(A) = ∅ key(B) = {A} key(C) = {B} key(D) = {C}

25 / 99

slide-26
SLIDE 26

Introduction by Examples Decompositions and Variable Orders Size Bounds for Join Results Worst-Case Optimal Join Algorithms Further Work and References Quiz

26 / 99

Outline of Part 1: Joins

slide-27
SLIDE 27

How Can We Bound the Size of the Join Result?

Example: the path query R1(A, B), R2(B, C), R3(C, D) Assumption: All relations have size N. The query result is included in the result of R1(A, B), R3(C, D)

◮ Its size is upper bounded by N2 = |R1| × |R3| ◮ All variables are ”covered” by the relations R1 and R3

There are databases for which the result size is at least N2

◮ Let R1 = [N] × {1}, R2 = {1} × [N], R3 = [N] × {1}.

27 / 99

slide-28
SLIDE 28

How Can We Bound the Size of the Join Result?

Example: the path query R1(A, B), R2(B, C), R3(C, D) Assumption: All relations have size N. The query result is included in the result of R1(A, B), R3(C, D)

◮ Its size is upper bounded by N2 = |R1| × |R3| ◮ All variables are ”covered” by the relations R1 and R3

There are databases for which the result size is at least N2

◮ Let R1 = [N] × {1}, R2 = {1} × [N], R3 = [N] × {1}.

Conclusion: Size of the query result is Θ(N2) for some input classes

27 / 99

slide-29
SLIDE 29

How Can We Bound the Size of the Join Result?

Example: the triangle query R1(A, B), R2(A, C), R3(B, C) Assumption: All relations have size N. The query result is included in the result of R1(A, B), R3(B, C)

◮ Its size is upper bounded by N2 = |R1| × |R3| ◮ All variables are ”covered” by the relations R1 and R3

There are databases for which the result size is at least N

◮ Let R1 = [N] × {1}, R2 = [N] × {1}, R3 ⊇ {(1, 1)}

28 / 99

slide-30
SLIDE 30

How Can We Bound the Size of the Join Result?

Example: the triangle query R1(A, B), R2(A, C), R3(B, C) Assumption: All relations have size N. The query result is included in the result of R1(A, B), R3(B, C)

◮ Its size is upper bounded by N2 = |R1| × |R3| ◮ All variables are ”covered” by the relations R1 and R3

There are databases for which the result size is at least N

◮ Let R1 = [N] × {1}, R2 = [N] × {1}, R3 ⊇ {(1, 1)}

Conclusion: Size gap between the N2 upper bound and the N lower bound! Question: Can we close this gap and give tight size bounds?

28 / 99

slide-31
SLIDE 31

Edge Covers and Independent Sets

We can generalize the previous examples as follows: For the size upper bound: Cover all nodes (variables) by k edges (relations) ⇒ size ≤ Nk. This is an edge cover of the query hypergraph! For the size lower bound: m independent nodes ⇒ construct database such that size ≥ Nm. This is an independent set of the query hypergraph! maxm = |IndependentSet(Q)| ≤ |EdgeCover(Q)| = mink maxm and mink do not necessarily meet! Can we further refine this analysis?

29 / 99

slide-32
SLIDE 32

The Fractional Edge Cover Number ρ∗(Q)

The two bounds meet if we take their fractional versions [AGM08] Fractional edge cover of Q with weight k ⇒ size ≤ Nk. Fractional independent set with weight m ⇒ size ≥ Nm. By duality of linear programming: maxm = |FractionalIndependentSet(Q)| = |FractionalEdgeCover(Q)| = mink This is the fractional edge cover number ρ∗(Q)! For query Q and database of size N, the query result has size O(Nρ∗(Q)).

30 / 99

slide-33
SLIDE 33

The Fractional Edge Cover Number ρ∗(Q)

For a join query Q(A1 ∪ · · · ∪ An) = R1(A1), . . . , Rn(An), ρ∗(Q) is the cost of an optimal solution to the linear program: minimize

  • i∈[n] xRi

subject to

  • i:edge Ri covers node A xRi ≥ 1 ∀A ∈

j∈[n] Aj,

xRi ≥ 0 ∀i ∈ [n]. xRi is the weight of edge (relation) Ri in the hypergraph of Q Each node (variable) has to be covered by edges with sum of weights ≥ 1 In the integer program variant for the edge cover, xRi ∈ {0, 1}

31 / 99

slide-34
SLIDE 34

Example: Compute the Fractional Edge Cover (1/3)

Consider the join query Q: R(A, B, C), S(A, B, D), T(A, E), U(E, F).

A B C D E F R S U T

The three edges R, S, U can cover all nodes. FractionalEdgeCover(Q) ≤ 3 Each node C, D, and F must be covered by a distinct edge. FractionalIndependentSet(Q) ≥ 3 ⇒ ρ∗(Q) = 3 ⇒ Size ≤ N3 and for some inputs is Θ(N3).

32 / 99

slide-35
SLIDE 35

Example: Compute the Fractional Edge Cover (2/3)

Consider the triangle query: R1(A, B), R2(A, C), R3(B, C). R1 R2 R3 A B C minimize xR1 + xR2 + xR3 subject to A : xR1 + xR2 ≥ 1 B : xR1 + xR3 ≥ 1 C : xR2 + xR3 ≥ 1 xR1 ≥ 0 xR2 ≥ 0 xR3 ≥ 0 Our previous size upper bound was N2: This is obtained by setting any two of xR1, xR2, xR3 to 1. What is the fractional edge cover number for the triangle query?

33 / 99

slide-36
SLIDE 36

Example: Compute the Fractional Edge Cover (2/3)

Consider the triangle query: R1(A, B), R2(A, C), R3(B, C). R1 R2 R3 A B C minimize xR1 + xR2 + xR3 subject to A : xR1 + xR2 ≥ 1 B : xR1 + xR3 ≥ 1 C : xR2 + xR3 ≥ 1 xR1 ≥ 0 xR2 ≥ 0 xR3 ≥ 0 Our previous size upper bound was N2: This is obtained by setting any two of xR1, xR2, xR3 to 1. What is the fractional edge cover number for the triangle query? We can do better: xR1 = xR2 = xR3 = 1/2. Then, ρ∗ = 3/2. Lower bound reaches N3/2 for R1 = R2 = R3 = [ √ N] × [ √ N].

33 / 99

slide-37
SLIDE 37

Example: Compute the Fractional Edge Cover (3/3)

Consider the (4-cycle) join: R(A1, A2), S(A2, A3), T(A3, A4), W (A4, A1). The linear program for its fractional edge cover number:

R T W S A1 A2 A3 A4

minimize xR + xS + xT + xW subject to A1 : xR + xW ≥ 1 A2 : xR + xS ≥ 1 A3 : xS + xT ≥ 1 A4 : xT + xW ≥ 1 xR ≥ 0 xS ≥ 0 xT ≥ 0 xW ≥ 0 Possible solution: xR = xT = 1. Another solution: xS = xW = 1. Then, ρ∗ = 2. Lower bound reaches N2 for R = T = [N] × {1} and S = W = {1} × [N].

34 / 99

slide-38
SLIDE 38

Historical Note on the Fractional Edge Cover Number

Tight size bounds via ρ∗ have been known from earlier works in other contexts: (special case) Loomis-Whitney inequality [LW49] (general case) number of occurrences of a subgraph in a graph [A81] generalization of Loomis-Whitney that subsumes the AGM bound [BT95] Recent insightful travel through the history of this result [H18]

35 / 99

slide-39
SLIDE 39

Refinement under Cardinality Constraints

Common case in practice: Relations have different sizes Small-size projections of relations may be added to the join query Recall the linear program for computing the fractional edge cover number ρ∗(Q) of a join query Q(A1 ∪ · · · ∪ An) = R1(A1), . . . , Rn(An): minimize

  • i∈[n] xRi

subject to

  • i:edge Ri covers node A xRi ≥ 1 ∀A ∈

j∈[n] Aj,

xRi ≥ 0 ∀i ∈ [n].

36 / 99

slide-40
SLIDE 40

Refinement under Cardinality Constraints

Common case in practice: Relations have different sizes Small-size projections of relations may be added to the join query Add relation sizes into the linear program that computes the result size of a join query Q(A1 ∪ · · · ∪ An) = R1(A1), . . . , Rn(An): minimize N

  • i∈[n] xRi

subject to

  • i:edge Ri covers node A xRi ≥ 1 ∀A ∈

j∈[n] Aj,

xRi ≥ 0 ∀i ∈ [n]. Assumption: All relations have the same size N.

37 / 99

slide-41
SLIDE 41

Refinement under Cardinality Constraints

Common case in practice: Relations have different sizes Small-size projections of relations may be added to the join query Add relation sizes into the linear program that computes the result size of a join query Q(A1 ∪ · · · ∪ An) = R1(A1), . . . , Rn(An): minimize

  • i∈[n] Nxi

subject to

  • i:edge Ri covers node A xRi ≥ 1 ∀A ∈

j∈[n] Aj,

xRi ≥ 0 ∀i ∈ [n]. Assumption: All relations have the same size N.

38 / 99

slide-42
SLIDE 42

Refinement under Cardinality Constraints

Common case in practice: Relations have different sizes Small-size projections of relations may be added to the join query Add relation sizes into the linear program that computes the result size of a join query Q(A1 ∪ · · · ∪ An) = R1(A1), . . . , Rn(An): minimize

  • i∈[n] Nxi

i

subject to

  • i:edge Ri covers node A xRi ≥ 1 ∀A ∈

j∈[n] Aj,

xRi ≥ 0 ∀i ∈ [n]. Assumption: Relation Ri has size Ni, ∀i ∈ [n].

39 / 99

slide-43
SLIDE 43

Size Bounds for Factorized Representations

  • f Join Results

40 / 99

slide-44
SLIDE 44

Recall the Itemized Customer Orders Example

Orders (O for short) customer day dish Elise Monday burger Elise Friday burger Steve Friday hotdog Joe Friday hotdog Dish (D for short) dish item burger patty burger

  • nion

burger bun hotdog bun hotdog

  • nion

hotdog sausage Items (I for short) item price patty 6

  • nion

2 bun 2 sausage 4

Consider the natural join of the above relations:

O(customer, day, dish), D(dish, item), I(item, price) customer day dish item price Elise Monday burger patty 6 Elise Monday burger

  • nion

2 Elise Monday burger bun 2 Elise Friday burger patty 6 Elise Friday burger

  • nion

2 Elise Friday burger bun 2 . . . . . . . . . . . . . . .

41 / 99

slide-45
SLIDE 45

Factor Out Common Data Blocks

O(customer, day, dish), D(dish, item), I(item, price) customer day dish item price Elise Monday burger patty 6 Elise Monday burger

  • nion

2 Elise Monday burger bun 2 Elise Friday burger patty 6 Elise Friday burger

  • nion

2 Elise Friday burger bun 2 . . . . . . . . . . . . . . .

The listing representation of the above query result is:

Elise × Monday × burger × patty × 6 ∪ Elise × Monday × burger ×

  • nion

× 2 ∪ Elise × Monday × burger × bun × 2 ∪ Elise × Friday × burger × patty × 6 ∪ Elise × Friday × burger ×

  • nion

× 2 ∪ Elise × Friday × burger × bun × 2 ∪ . . .

It uses relational product (×), union (∪), and data (singleton relations). The attribute names are not shown to avoid clutter.

42 / 99

slide-46
SLIDE 46

This is How A Factorized Join Looks Like!

∪ burger hotdog × × ∪ bun onion sausage × × × ∪ ∪ ∪ 2 2 4 ∪ Friday × ∪ Joe Steve ∪ patty bun onion × × × ∪ ∪ ∪ 6 2 2 ∪ Friday × ∪ Elise Monday × ∪ Elise dish day item customer price

Var order Factorized representation of the join result There are several algebraically equivalent factorized representations defined: by distributivity of product over union and their commutativity; as groundings of variable orders.

43 / 99

slide-47
SLIDE 47

.. Now with Further Compression using Caching

∪ burger hotdog × × ∪ sausage bunonion × × × ∪ 4 ∪ Friday × ∪ Joe Steve ∪ patty bun onion × × × ∪ ∪ ∪ 6 2 2 ∪ Friday × ∪ Elise Monday × ∪ Elise dish ∅ day {dish} item {dish} customer {dish, day} price {item}

Observation: price is under item, which is under dish, but only depends on item, .. so the same price appears under an item regardless of the dish. Idea: Cache price for a specific item and avoid repetition!

44 / 99

slide-48
SLIDE 48

Same Data, Different Factorization

∪ Monday Friday × × ∪ ∪ Elise × ∪ burger × ∪ pattybunonion × × × ∪ ∪ ∪ 6 2 2 Elise × ∪ burger × ∪ pattybunonion × × × ∪ ∪ ∪ 6 2 2 Joe × ∪ hotdog × ∪ bun onion sausage × × × ∪ ∪ ∪ 2 2 4 Steve × ∪ hotdog × ∪ bun onion sausage × × × ∪ ∪ ∪ 2 2 4 day customer dish item price

45 / 99

slide-49
SLIDE 49

.. and Further Compressed using Caching

∪ Monday Friday × × ∪ ∪ Elise × ∪ burger × ∪ pattybunonion × × × ∪ ∪ ∪ 6 2 2 Elise × ∪ burger × Joe × ∪ hotdog × ∪ bun onion sausage × × × ∪ 4 Steve × ∪ hotdog × day ∅ customer {day} dish {customer, day} item {dish} price {item}

46 / 99

slide-50
SLIDE 50

Which factorization should we choose?

The size of a factorization is the number of its values. Example: F1 =

  • 1 ∪ · · · ∪ n
  • ×
  • 1 ∪ · · · ∪ m
  • F2 =1 × 1 ∪ · · · ∪ 1 × m

∪ · · · ∪ n × 1 ∪ · · · ∪ n × m. F1 is factorized, F2 is a listing representation F1 ≡ F2 BUT |F1| = m + n ≪ |F2| = m ∗ n. How much space does factorization save over the listing representation?

47 / 99

slide-51
SLIDE 51

Size Bounds for Join Results

Given a join query Q, for any database of size N, the join result admits a listing representation of size O(Nρ∗(Q)). [LW49,A81,BT95,AGM08]

48 / 99

slide-52
SLIDE 52

Size Bounds for Join Results

Given a join query Q, for any database of size N, the join result admits a listing representation of size O(Nρ∗(Q)). [LW49,A81,BT95,AGM08] a factorization without caching of size O(Ns(Q)). [OZ12]

48 / 99

slide-53
SLIDE 53

Size Bounds for Join Results

Given a join query Q, for any database of size N, the join result admits a listing representation of size O(Nρ∗(Q)). [LW49,A81,BT95,AGM08] a factorization without caching of size O(Ns(Q)). [OZ12] a factorization with caching of size O(Nfhtw(Q)). [OZ15]

48 / 99

slide-54
SLIDE 54

Size Bounds for Join Results

Given a join query Q, for any database of size N, the join result admits a listing representation of size O(Nρ∗(Q)). [LW49,A81,BT95,AGM08] a factorization without caching of size O(Ns(Q)). [OZ12] a factorization with caching of size O(Nfhtw(Q)). [OZ15] 1 ≤ fhtw(Q) ≤

  • up to log |Q|

s(Q) ≤

  • up to |Q|

ρ∗(Q) ≤ |Q| |Q| is the number of relations in Q ρ∗(Q) is the fractional edge cover number of Q s(Q) is the factorization width of Q fhtw(Q) is the fractional hypertree width of Q [M10]

48 / 99

slide-55
SLIDE 55

Size Bounds for Join Results

Given a join query Q, for any database of size N, the join result admits a listing representation of size O(Nρ∗(Q)). [LW49,A81,BT95,AGM08] a factorization without caching of size O(Ns(Q)). [OZ12] a factorization with caching of size O(Nfhtw(Q)). [OZ15] These size bounds are asymptotically tight! Best possible size bounds for factorized representations over variable

  • rders of Q and for listing representation, but not database optimal!

There exists arbitrarily large databases for which

◮ the listing representation has size Ω(Nρ∗(Q)) ◮ the factorization with/without caching over any variable order of Q has size Ω(Ns(Q)) and Ω(Nfhtw(Q)) respectively.

49 / 99

slide-56
SLIDE 56

Example: The Factorization Width s

A B C D E F R S U T A B C D E F

The structure of the factorization over the above variable order ∆:

  • a∈A
  • a ×
  • b∈B
  • b ×

c∈C

c

  • ×

d∈D

d

  • ×
  • e∈E
  • e ×

f ∈F

f

  • The number of values for a variable is dictated by the number of valid tuples of

values for its ancestors in ∆: One value f for each tuple (a, e, f ) in the join result. Size of factorization = sum of sizes of results of subqueries along paths.

50 / 99

slide-57
SLIDE 57

Example: The Factorization Width s

A B C D E F R S U T A B C D E F

The factorization width for ∆ is the largest ρ∗ over subqueries defined by root-to-leaf paths in ∆ s(Q) is the minimum factorization width over all variable orders of Q In our example: Path A–E–F has fractional edge cover number 2. ⇒ The number of F-values is ≤ N2, but can be ∼ N2. All other root-to-leaf paths have fractional edge cover number 1. ⇒ The number of other values is ≤ N. s(Q) = 2 ⇒ Factorization size is O(N2) Recall that ρ∗(Q) = 3 ⇒ Listing representation size is O(N3)

51 / 99

slide-58
SLIDE 58

Example: The Fractional Hypertree Width fhtw

Idea: Avoid repeating identical expressions, store them once and use pointers.

A B C D E F R S U T A B C D E F key(A) = ∅ key(B) = {A} key(C) = {A, B} key(D) = {A, B} key(E) = {A} key(F) = {E}

  • a∈A
  • a × · · · ×
  • e∈E
  • e ×

f ∈F

f

  • Observation:

Variable F only depends on E and not on A: key(F) = {E} A value e maps to the same union

(e,f )∈Uf regardless of its pairings

with A-values. ⇒ Define Ue =

(e,f )∈Uf once for each value e and reuse it

52 / 99

slide-59
SLIDE 59

Example: The Fractional Hypertree Width fhtw

Idea: Avoid repeating identical expressions, store them once and use pointers.

A B C D E F R S U T A B C D E F key(A) = ∅ key(B) = {A} key(C) = {A, B} key(D) = {A, B} key(E) = {A} key(F) = {E}

A factorization with caching would be:

  • a∈A
  • a × · · · ×
  • e∈E
  • e × Ue
  • ;
  • Ue =
  • (e,f )∈U

f

  • fhtw for ∆ is the largest ρ∗(Qkey(X)∪{X}) over subqueries Qkey(X)∪{X}

defined by the variables key(X) ∪ {X} for each variable X in ∆ fhtw(Q) is the minimum fhtw over all variable orders of Q In our example: fhtw(Q) = 1 < s(Q) = 2 < ρ∗(Q) = 3.

53 / 99

slide-60
SLIDE 60

Alternative Characterizations of fhtw

The fractional hypertree width fhtw has been originally defined for hypertree decompositions. [M10] Given a join query Q. Let T be the set of hypertree decompositions of the hypergraph of Q. fhtw(Q) = min(T,χ)∈T maxn∈T ρ∗(Qχ(n))

54 / 99

slide-61
SLIDE 61

Alternative Characterizations of fhtw

The fractional hypertree width fhtw has been originally defined for hypertree decompositions. [M10] Given a join query Q. Let T be the set of hypertree decompositions of the hypergraph of Q. fhtw(Q) = min(T,χ)∈T maxn∈T ρ∗(Qχ(n)) Alternative characterization of the fractional hypertree width fhtw using the mapping between hypertree decompositions and variable orders [OZ15] Given a join query Q. Let VO be the set of variable orders of Q. fhtw(Q) = min(F,key)∈VO maxv∈F ρ∗(Qkey(v)∪{v})

54 / 99

slide-62
SLIDE 62

Compression by Factorization in Practice

55 / 99

slide-63
SLIDE 63

Compression Contest: Factorized vs. Zipped Relations

1 10 100 1 2 4 8 16 32 Compression ratio Database Scale Tabular/Factorized Tabular/Gzip(Tabular) Factorized/Gzip(Factorized)

Result of query Orders ✶ Dish ✶ Items [BKOZ13] Tabular = listing representation in CSV text format Gzip (compression level 6) outputs binary format Factorized representation in text format (each digit takes one character) Observations: Gzip does not exploit distant repetitions! Factorizations can be arbitrarily more succinct than gzipped relations. Gzipping factorizations improves the compression by 3x.

56 / 99

slide-64
SLIDE 64

Factorization Gains in Practice (1/4)

Retailer dataset used for LogicBlox analytics Relations: Inventory (84M), Sales (1.5M), Clearance (368K), Promotions (183K), Census (1K), Location (1K). Compression factors (caching not used):

◮ 26.61x for natural join of Inventory, Census, Location. ◮ 159.59x for natural join of Inventory, Sales, Clearance, Promotions

57 / 99

slide-65
SLIDE 65

Factorization Gains in Practice (2/4)

LastFM public dataset Relations: UserArtists (93K), UserFriends (25K), TaggedArtists (186K). Compression factors:

◮ 143.54x for joining two copies of Userartists and Userfriends With caching: 982.86x ◮ 253.34x when also joining on TaggedArtists ◮ 2.53x/ 3.04x/ 924.46x for triangle/4-clique/bowtie query on UserFriends ◮ 9213.51x/ 552Kx/ ≥86Mx for versions of triangle/4-clique/bowtie queries with copies for UserArtists for each UserFriend copy

58 / 99

slide-66
SLIDE 66

Factorization Gains in Practice (3/4)

Twitter public dataset Relation: Follower-Followee (1M) Compression factors:

◮ 2.69x for triangle query ◮ 3.48x for 4-clique query ◮ 4918.73x for bowtie query

59 / 99

slide-67
SLIDE 67

Factorization Gains in Practice (4/4)

Yelp Dataset Challenge Relations: Business (174K), User (1.3M), Review (5.2M), Category(667K), Attribute (1.3M) Compression factors:

◮ 39.43x for natural join of Business, User, Review, Attribute (with caching) ◮ 185.87x for natural join of Business, User, Review, Attribute, Category (with caching)

60 / 99

slide-68
SLIDE 68

Introduction by Examples Decompositions and Variable Orders Size Bounds for Join Results Worst-Case Optimal Join Algorithms Further Work and References Quiz

61 / 99

Outline of Part 1: Joins

slide-69
SLIDE 69

How Fast Can We Compute Join Results?

Given a join query Q, for any database of size N, the join result can be computed in time O(Nρ∗(Q)) as listing representation [NPRR12,V14] O(Ns(Q)) as factorization without caching [OZ15] O(Nfhtw(Q)) as factorization with caching [OZ15] These upper bounds essentially follow the succinctness gap. They are: worst-case optimal (modulo log N) within the given representation model with respect to data complexity

◮ additional quadratic factor in the number of variables and linear factor in the number of relations in Q

62 / 99

slide-70
SLIDE 70

Example: Computing the Factorized Join Result with FDB

Our join: O(customer, day, dish), D(dish, item), I(item, price) can be grounded to a factorized representation as follows:

Variable Order

dish ∅ day {dish} item {dish} customer {dish, day} price {item}

FDB execution plan

  • O( , ,dish),D(dish, )dish

×

  • O( ,day,dish)day

×

  • O(customer,day,dish)customer
  • D(dish,item)item

×

  • I(item,price)price

63 / 99

slide-71
SLIDE 71

Example: Computing the Factorized Join Result with FDB

  • O( , ,dish),D(dish, )dish

×

  • O( ,day,dish)day
  • D(dish,item)item

×

  • O(c,day,dish)c

×

  • I(item,p)p

64 / 99

slide-72
SLIDE 72

Example: Computing the Factorized Join Result with FDB

  • O( , ,dish),D(dish, )dish

×

  • O( ,day,dish)day
  • D(dish,item)item

×

  • O(c,day,dish)c

×

  • I(item,p)p

64 / 99

slide-73
SLIDE 73

Example: Computing the Factorized Join Result with FDB

∪ burger hotdog × ×

  • O( ,day,burger)day
  • D(burger,item)item

×

  • O(c,day,burger)c

×

  • I(item,p)p
  • D(hotdog,item)item
  • O( ,day,hotdog)day

×

  • I(item,p)p

×

  • O(c,day,hotdog)c

64 / 99

slide-74
SLIDE 74

Example: Computing the Factorized Join Result with FDB

∪ burger hotdog × ×

  • O( ,day,burger)day
  • D(burger,item)item

×

  • O(c,day,burger)c

×

  • I(item,p)p
  • D(hotdog,item)item
  • O( ,day,hotdog)day

×

  • I(item,p)p

×

  • O(c,day,hotdog)c

64 / 99

slide-75
SLIDE 75

Example: Computing the Factorized Join Result with FDB

∪ burger hotdog × × ∪ Monday Friday × ×

  • D(burger,item)item

×

  • I(item,p)p
  • D(hotdog,item)item
  • O( ,day,hotdog)day

×

  • I(item,p)p

×

  • O(c,day,hotdog)c
  • O(c,monday,burger)c
  • O(c,friday,burger)c

64 / 99

slide-76
SLIDE 76

Example: Computing the Factorized Join Result with FDB

∪ burger hotdog × × ∪ Monday Friday × ×

  • D(burger,item)item

×

  • I(item,p)p
  • D(hotdog,item)item
  • O( ,day,hotdog)day

×

  • I(item,p)p

×

  • O(c,day,hotdog)c
  • O(c,monday,burger)c
  • O(c,friday,burger)c

64 / 99

slide-77
SLIDE 77

Example: Computing the Factorized Join Result with FDB

∪ burger hotdog × × ∪ Monday Friday × × ∪ Elise

  • D(burger,item)item

×

  • I(item,p)p
  • D(hotdog,item)item
  • O( ,day,hotdog)day

×

  • I(item,p)p

×

  • O(c,day,hotdog)c
  • O(c,friday,burger)c

64 / 99

slide-78
SLIDE 78

Example: Computing the Factorized Join Result with FDB

∪ burger hotdog × × ∪ Monday Friday × × ∪ Elise

  • D(burger,item)item

×

  • I(item,p)p
  • D(hotdog,item)item
  • O( ,day,hotdog)day

×

  • I(item,p)p

×

  • O(c,day,hotdog)c
  • O(c,friday,burger)c

64 / 99

slide-79
SLIDE 79

Example: Computing the Factorized Join Result with FDB

∪ burger hotdog × × ∪ Monday Friday × × ∪ Elise ∪ Elise

  • D(burger,item)item

×

  • I(item,p)p
  • D(hotdog,item)item
  • O( ,day,hotdog)day

×

  • I(item,p)p

×

  • O(c,day,hotdog)c

64 / 99

slide-80
SLIDE 80

Example: Computing the Factorized Join Result with FDB

∪ burger hotdog × × ∪ Monday Friday × × ∪ Elise ∪ Elise

  • D(burger,item)item

×

  • I(item,p)p
  • D(hotdog,item)item
  • O( ,day,hotdog)day

×

  • I(item,p)p

×

  • O(c,day,hotdog)c

64 / 99

slide-81
SLIDE 81

Example: Computing the Factorized Join Result with FDB

∪ burger hotdog × × ∪ Monday Friday × × ∪ Elise ∪ Elise ∪ patty bun

  • nion

× × ×

  • D(hotdog,item)item
  • O( ,day,hotdog)day

×

  • I(item,p)p

×

  • O(c,day,hotdog)c
  • I(patty,p)p
  • I(bun,p)p
  • I(onion,p)p

64 / 99

slide-82
SLIDE 82

Example: Computing the Factorized Join Result with FDB

∪ burger hotdog × × ∪ Monday Friday × × ∪ Elise ∪ Elise ∪ patty bun

  • nion

× × ×

  • D(hotdog,item)item
  • O( ,day,hotdog)day

×

  • I(item,p)p

×

  • O(c,day,hotdog)c
  • I(patty,p)p
  • I(bun,p)p
  • I(onion,p)p

64 / 99

slide-83
SLIDE 83

Example: Computing the Factorized Join Result with FDB

∪ burger hotdog × × ∪ Monday Friday × × ∪ Elise ∪ Elise ∪ patty bun

  • nion

× × × ∪ 6 cache!

  • D(hotdog,item)item
  • O( ,day,hotdog)day

×

  • I(item,p)p

×

  • O(c,day,hotdog)c
  • I(bun,p)p
  • I(onion,p)p

price depends on item, but not on dish. Cache prices for specific items!

64 / 99

slide-84
SLIDE 84

Example: Computing the Factorized Join Result with FDB

∪ burger hotdog × × ∪ Monday Friday × × ∪ Elise ∪ Elise ∪ patty bun

  • nion

× × × ∪ 6 cache!

  • D(hotdog,item)item
  • O( ,day,hotdog)day

×

  • I(item,p)p

×

  • O(c,day,hotdog)c
  • I(bun,p)p
  • I(onion,p)p

price depends on item, but not on dish. Cache prices for specific items!

64 / 99

slide-85
SLIDE 85

Example: Computing the Factorized Join Result with FDB

∪ burger hotdog × × ∪ Monday Friday × × ∪ Elise ∪ Elise ∪ patty bun

  • nion

× × × ∪ 6 cache! ∪ 2 cache!

  • D(hotdog,item)item
  • O( ,day,hotdog)day

×

  • I(item,p)p

×

  • O(c,day,hotdog)c
  • I(onion,p)p

price depends on item, but not on dish. Cache prices for specific items!

64 / 99

slide-86
SLIDE 86

Example: Computing the Factorized Join Result with FDB

∪ burger hotdog × × ∪ Monday Friday × × ∪ Elise ∪ Elise ∪ patty bun

  • nion

× × × ∪ 6 cache! ∪ 2 cache!

  • D(hotdog,item)item
  • O( ,day,hotdog)day

×

  • I(item,p)p

×

  • O(c,day,hotdog)c
  • I(onion,p)p

price depends on item, but not on dish. Cache prices for specific items!

64 / 99

slide-87
SLIDE 87

Example: Computing the Factorized Join Result with FDB

∪ burger hotdog × × ∪ Monday Friday × × ∪ Elise ∪ Elise ∪ patty bun

  • nion

× × × ∪ 6 cache! ∪ 2 cache! ∪ 2 cache!

  • D(hotdog,item)item
  • O( ,day,hotdog)day

×

  • I(item,p)p

×

  • O(c,day,hotdog)c

price depends on item, but not on dish. Cache prices for specific items!

64 / 99

slide-88
SLIDE 88

Example: Computing the Factorized Join Result with FDB

∪ burger hotdog × × ∪ Monday Friday × × ∪ Elise ∪ Elise ∪ patty bun

  • nion

× × × ∪ 6 cache! ∪ 2 cache! ∪ 2 cache!

  • D(hotdog,item)item
  • O( ,day,hotdog)day

×

  • I(item,p)p

×

  • O(c,day,hotdog)c

price depends on item, but not on dish. Cache prices for specific items!

64 / 99

slide-89
SLIDE 89

Example: Computing the Factorized Join Result with FDB

∪ burger hotdog × × ∪ Monday Friday × × ∪ Elise ∪ Elise ∪ patty bun

  • nion

× × × ∪ 6 cache! ∪ 2 cache! ∪ 2 cache! ∪ bun onion sausage × × ×

  • O( ,day,hotdog)day

×

  • O(c,day,hotdog)c
  • I(bun,p)p
  • I(onion,p)p
  • I(sausage,p)p

price depends on item, but not on dish. Cache prices for specific items!

64 / 99

slide-90
SLIDE 90

Example: Computing the Factorized Join Result with FDB

∪ burger hotdog × × ∪ Monday Friday × × ∪ Elise ∪ Elise ∪ patty bun

  • nion

× × × ∪ 6 cache! ∪ 2 cache! ∪ 2 cache! ∪ bun onion sausage × × ×

  • O( ,day,hotdog)day

×

  • O(c,day,hotdog)c
  • I(bun,p)p
  • I(onion,p)p
  • I(sausage,p)p

price depends on item, but not on dish. Cache prices for specific items!

64 / 99

slide-91
SLIDE 91

Example: Computing the Factorized Join Result with FDB

∪ burger hotdog × × ∪ Monday Friday × × ∪ Elise ∪ Elise ∪ patty bun

  • nion

× × × ∪ 6 cache! ∪ 2 cache! ∪ 2 cache! ∪ bun onion sausage × × ×

  • O( ,day,hotdog)day

×

  • O(c,day,hotdog)c
  • I(onion,p)p
  • I(sausage,p)p

price depends on item, but not on dish. Cache prices for specific items! Reuse cached prices for specific items!

64 / 99

slide-92
SLIDE 92

Example: Computing the Factorized Join Result with FDB

∪ burger hotdog × × ∪ Monday Friday × × ∪ Elise ∪ Elise ∪ patty bun

  • nion

× × × ∪ 6 cache! ∪ 2 cache! ∪ 2 cache! ∪ bun onion sausage × × ×

  • O( ,day,hotdog)day

×

  • O(c,day,hotdog)c
  • I(onion,p)p
  • I(sausage,p)p

price depends on item, but not on dish. Cache prices for specific items! Reuse cached prices for specific items!

64 / 99

slide-93
SLIDE 93

Example: Computing the Factorized Join Result with FDB

∪ burger hotdog × × ∪ Monday Friday × × ∪ Elise ∪ Elise ∪ patty bun

  • nion

× × × ∪ 6 cache! ∪ 2 cache! ∪ 2 cache! ∪ bun onion sausage × × ×

  • O( ,day,hotdog)day

×

  • O(c,day,hotdog)c
  • I(sausage,p)p

price depends on item, but not on dish. Cache prices for specific items! Reuse cached prices for specific items!

64 / 99

slide-94
SLIDE 94

Example: Computing the Factorized Join Result with FDB

∪ burger hotdog × × ∪ Monday Friday × × ∪ Elise ∪ Elise ∪ patty bun

  • nion

× × × ∪ 6 cache! ∪ 2 cache! ∪ 2 cache! ∪ bun onion sausage × × ×

  • O( ,day,hotdog)day

×

  • O(c,day,hotdog)c
  • I(sausage,p)p

price depends on item, but not on dish. Cache prices for specific items! Reuse cached prices for specific items!

64 / 99

slide-95
SLIDE 95

Example: Computing the Factorized Join Result with FDB

∪ burger hotdog × × ∪ Monday Friday × × ∪ Elise ∪ Elise ∪ patty bun

  • nion

× × × ∪ 6 cache! ∪ 2 cache! ∪ 2 cache! ∪ bun onion sausage × × × ∪ 4

  • O( ,day,hotdog)day

×

  • O(c,day,hotdog)c

price depends on item, but not on dish. Cache prices for specific items! Reuse cached prices for specific items!

64 / 99

slide-96
SLIDE 96

Example: Computing the Factorized Join Result with FDB

∪ burger hotdog × × ∪ Monday Friday × × ∪ Elise ∪ Elise ∪ patty bun

  • nion

× × × ∪ 6 cache! ∪ 2 cache! ∪ 2 cache! ∪ bun onion sausage × × × ∪ 4

  • O( ,day,hotdog)day

×

  • O(c,day,hotdog)c

price depends on item, but not on dish. Cache prices for specific items! Reuse cached prices for specific items!

64 / 99

slide-97
SLIDE 97

Example: Computing the Factorized Join Result with FDB

∪ burger hotdog × × ∪ Monday Friday × × ∪ Elise ∪ Elise ∪ patty bun

  • nion

× × × ∪ 6 cache! ∪ 2 cache! ∪ 2 cache! ∪ bun onion sausage × × × ∪ 4 ∪ Friday ×

  • O(c,friday,hotdog)c

price depends on item, but not on dish. Cache prices for specific items! Reuse cached prices for specific items!

64 / 99

slide-98
SLIDE 98

Example: Computing the Factorized Join Result with FDB

∪ burger hotdog × × ∪ Monday Friday × × ∪ Elise ∪ Elise ∪ patty bun

  • nion

× × × ∪ 6 cache! ∪ 2 cache! ∪ 2 cache! ∪ bun onion sausage × × × ∪ 4 ∪ Friday ×

  • O(c,friday,hotdog)c

price depends on item, but not on dish. Cache prices for specific items! Reuse cached prices for specific items!

64 / 99

slide-99
SLIDE 99

Example: Computing the Factorized Join Result with FDB

∪ burger hotdog × × ∪ Monday Friday × × ∪ Elise ∪ Elise ∪ patty bun

  • nion

× × × ∪ 6 cache! ∪ 2 cache! ∪ 2 cache! ∪ bun onion sausage × × × ∪ 4 ∪ Friday × ∪ Joe Steve

price depends on item, but not on dish. Cache prices for specific items! Reuse cached prices for specific items!

64 / 99

slide-100
SLIDE 100

Example: Computing the Factorized Join Result with FDB

  • O( , ,dish),D(dish, )dish

×

  • O( ,day,dish)day

×

  • O(customer,day,dish)customer
  • D(dish,item)item

×

  • I(item,price)price

Relations are sorted following any topological order of the variable order The intersection of relations O and D on dish takes time O(Nmin log(Nmax/Nmin)), where Nm = m(|πdishO|, |πdishD|). The remaining operations are lookups in the relations, where we first fix the dish value and then the day and item values.

65 / 99

slide-101
SLIDE 101

LeapFrog TrieJoin Algorithm

Much acclaimed worst-case optimal join algorithm used by LogicBlox [V14] Computes a listing representation of the join result ⇒ It does not exploit factorization ≈ Glorified multi-way sort-merge join with an efficient list intersection Several generalizations, e.g., Tetris, Minesweeper, and PANDA [NRR13,ANS17] LeapFrog TrieJoin is a special case of FDB, where the input variable order ∆ is a path (i.e., no branching) for each variable A, key(A) consists of all ancestors of A in ∆. (i.e., no caching)

66 / 99

slide-102
SLIDE 102

Example: Computing the Full Join Result

The listing representation of the result of our join: O(customer, day, dish), D(dish, item), I(item, price) can be computed by FDB using any total variable order.

Variable order

day customer dish item price

FDB execution plan

  • O( ,day, )day

×

  • O(customer,day, )customer

×

  • O(customer,day,dish),D(dish, )dish

×

  • D(dish,item),I(item, )item

×

  • I(item,price)price

67 / 99

slide-103
SLIDE 103

Example: Computing the Full Join Result

  • O( ,day, )day

×

  • O(customer,day, )customer

×

  • O(customer,day,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p

68 / 99

slide-104
SLIDE 104

Example: Computing the Full Join Result

  • O( ,day, )day

×

  • O(customer,day, )customer

×

  • O(customer,day,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p

68 / 99

slide-105
SLIDE 105

Example: Computing the Full Join Result

∪ Monday Friday × ×

  • O(customer,Monday, )customer

×

  • O(customer,Monday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • O(customer,Friday, )customer

×

  • O(customer,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p

68 / 99

slide-106
SLIDE 106

Example: Computing the Full Join Result

∪ Monday Friday × ×

  • O(customer,Monday, )customer

×

  • O(customer,Monday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • O(customer,Friday, )customer

×

  • O(customer,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p

68 / 99

slide-107
SLIDE 107

Example: Computing the Full Join Result

∪ Monday Friday × × ∪ Elise ×

  • O(customer,Friday, )customer

×

  • O(customer,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • O(Elise,Monday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p

68 / 99

slide-108
SLIDE 108

Example: Computing the Full Join Result

∪ Monday Friday × × ∪ Elise ×

  • O(customer,Friday, )customer

×

  • O(customer,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • O(Elise,Monday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p

68 / 99

slide-109
SLIDE 109

Example: Computing the Full Join Result

∪ Monday Friday × × ∪ Elise × ∪ burger ×

  • O(customer,Friday, )customer

×

  • O(customer,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • D(burger,i),I(i, )i

×

  • I(i,p)p

68 / 99

slide-110
SLIDE 110

Example: Computing the Full Join Result

∪ Monday Friday × × ∪ Elise × ∪ burger ×

  • O(customer,Friday, )customer

×

  • O(customer,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • D(burger,i),I(i, )i

×

  • I(i,p)p

68 / 99

slide-111
SLIDE 111

Example: Computing the Full Join Result

∪ Monday Friday × × ∪ Elise × ∪ burger × ∪ patty bun

  • nion

× × ×

  • O(customer,Friday, )customer

×

  • O(customer,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • I(patty,p)p
  • I(bun,p)p
  • I(onion,p)p

68 / 99

slide-112
SLIDE 112

Example: Computing the Full Join Result

∪ Monday Friday × × ∪ Elise × ∪ burger × ∪ patty bun

  • nion

× × ×

  • O(customer,Friday, )customer

×

  • O(customer,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • I(patty,p)p
  • I(bun,p)p
  • I(onion,p)p

68 / 99

slide-113
SLIDE 113

Example: Computing the Full Join Result

∪ Monday Friday × × ∪ Elise × ∪ burger × ∪ patty bun

  • nion

× × × ∪ 6

  • O(customer,Friday, )customer

×

  • O(customer,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • I(bun,p)p
  • I(onion,p)p

68 / 99

slide-114
SLIDE 114

Example: Computing the Full Join Result

∪ Monday Friday × × ∪ Elise × ∪ burger × ∪ patty bun

  • nion

× × × ∪ 6

  • O(customer,Friday, )customer

×

  • O(customer,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • I(bun,p)p
  • I(onion,p)p

68 / 99

slide-115
SLIDE 115

Example: Computing the Full Join Result

∪ Monday Friday × × ∪ Elise × ∪ burger × ∪ patty bun

  • nion

× × × ∪ 6 ∪ 2

  • O(customer,Friday, )customer

×

  • O(customer,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • I(onion,p)p

68 / 99

slide-116
SLIDE 116

Example: Computing the Full Join Result

∪ Monday Friday × × ∪ Elise × ∪ burger × ∪ patty bun

  • nion

× × × ∪ 6 ∪ 2

  • O(customer,Friday, )customer

×

  • O(customer,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • I(onion,p)p

68 / 99

slide-117
SLIDE 117

Example: Computing the Full Join Result

∪ Monday Friday × × ∪ Elise × ∪ burger × ∪ patty bun

  • nion

× × × ∪ 6 ∪ 2 ∪ 2

  • O(customer,Friday, )customer

×

  • O(customer,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p

68 / 99

slide-118
SLIDE 118

Example: Computing the Full Join Result

∪ Monday Friday × × ∪ Elise × ∪ burger × ∪ patty bun

  • nion

× × × ∪ 6 ∪ 2 ∪ 2

  • O(customer,Friday, )customer

×

  • O(customer,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p

68 / 99

slide-119
SLIDE 119

Example: Computing the Full Join Result

∪ Monday Friday × × ∪ Elise × ∪ burger × ∪ patty bun

  • nion

× × × ∪ 6 ∪ 2 ∪ 2 ∪ Elise × Joe × Steve ×

  • O(Elise,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • O(Joe,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • O(Steve,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p

68 / 99

slide-120
SLIDE 120

Example: Computing the Full Join Result

∪ Monday Friday × × ∪ Elise × ∪ burger × ∪ patty bun

  • nion

× × × ∪ 6 ∪ 2 ∪ 2 ∪ Elise × Joe × Steve ×

  • O(Elise,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • O(Joe,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • O(Steve,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p

68 / 99

slide-121
SLIDE 121

Example: Computing the Full Join Result

∪ Monday Friday × × ∪ Elise × ∪ burger × ∪ patty bun

  • nion

× × × ∪ 6 ∪ 2 ∪ 2 ∪ Elise × Joe × Steve × ∪ burger ×

  • O(Joe,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • O(Steve,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • D(burger,i),I(i, )i

×

  • I(i,p)p

68 / 99

slide-122
SLIDE 122

Example: Computing the Full Join Result

∪ Monday Friday × × ∪ Elise × ∪ burger × ∪ patty bun

  • nion

× × × ∪ 6 ∪ 2 ∪ 2 ∪ Elise × Joe × Steve × ∪ burger ×

  • O(Joe,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • O(Steve,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • D(burger,i),I(i, )i

×

  • I(i,p)p

68 / 99

slide-123
SLIDE 123

Example: Computing the Full Join Result

∪ Monday Friday × × ∪ Elise × ∪ burger × ∪ patty bun

  • nion

× × × ∪ 6 ∪ 2 ∪ 2 ∪ Elise × Joe × Steve × ∪ burger × ∪ patty bun onion × × ×

  • O(Joe,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • O(Steve,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • I(patty,p)p
  • I(bun,p)p
  • I(onion,p)p

68 / 99

slide-124
SLIDE 124

Example: Computing the Full Join Result

∪ Monday Friday × × ∪ Elise × ∪ burger × ∪ patty bun

  • nion

× × × ∪ 6 ∪ 2 ∪ 2 ∪ Elise × Joe × Steve × ∪ burger × ∪ patty bun onion × × ×

  • O(Joe,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • O(Steve,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • I(patty,p)p
  • I(bun,p)p
  • I(onion,p)p

68 / 99

slide-125
SLIDE 125

Example: Computing the Full Join Result

∪ Monday Friday × × ∪ Elise × ∪ burger × ∪ patty bun

  • nion

× × × ∪ 6 ∪ 2 ∪ 2 ∪ Elise × Joe × Steve × ∪ burger × ∪ patty bun onion × × × ∪ 6

  • O(Joe,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • O(Steve,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • I(bun,p)p
  • I(onion,p)p

68 / 99

slide-126
SLIDE 126

Example: Computing the Full Join Result

∪ Monday Friday × × ∪ Elise × ∪ burger × ∪ patty bun

  • nion

× × × ∪ 6 ∪ 2 ∪ 2 ∪ Elise × Joe × Steve × ∪ burger × ∪ patty bun onion × × × ∪ 6

  • O(Joe,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • O(Steve,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • I(bun,p)p
  • I(onion,p)p

68 / 99

slide-127
SLIDE 127

Example: Computing the Full Join Result

∪ Monday Friday × × ∪ Elise × ∪ burger × ∪ patty bun

  • nion

× × × ∪ 6 ∪ 2 ∪ 2 ∪ Elise × Joe × Steve × ∪ burger × ∪ patty bun onion × × × ∪ 6 ∪ 2

  • O(Joe,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • O(Steve,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • I(onion,p)p

68 / 99

slide-128
SLIDE 128

Example: Computing the Full Join Result

∪ Monday Friday × × ∪ Elise × ∪ burger × ∪ patty bun

  • nion

× × × ∪ 6 ∪ 2 ∪ 2 ∪ Elise × Joe × Steve × ∪ burger × ∪ patty bun onion × × × ∪ 6 ∪ 2

  • O(Joe,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • O(Steve,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • I(onion,p)p

68 / 99

slide-129
SLIDE 129

Example: Computing the Full Join Result

∪ Monday Friday × × ∪ Elise × ∪ burger × ∪ patty bun

  • nion

× × × ∪ 6 ∪ 2 ∪ 2 ∪ Elise × Joe × Steve × ∪ burger × ∪ patty bun onion × × × ∪ 6 ∪ 2 ∪ 2

  • O(Joe,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p
  • O(Steve,Friday,dish),D(dish, )dish

×

  • D(dish,i),I(i, )i

×

  • I(i,p)p

68 / 99

slide-130
SLIDE 130

Example: Computing the Full Join Result

∪ Monday Friday × × ∪ Elise × ∪ burger × ∪ patty bun

  • nion

× × × ∪ 6 ∪ 2 ∪ 2 ∪ Elise × Joe × Steve × ∪ burger × ∪ patty bun onion × × × ∪ 6 ∪ 2 ∪ 2 ∪ hotdog × ∪ bun onionsausage × × × ∪ 2 ∪ 2 ∪ 4 ∪ hotdog × ∪ bun onionsausage × × × ∪ 2 ∪ 2 ∪ 4

68 / 99

slide-131
SLIDE 131

Experiment: Factorized vs. Listing Computation

Retailer (3B) LastFM (5.8M) Join Factorization 169M 316K Size Listing 3.6B 591M (values) Compression 21.4× 1870.7× Join FDB 30 10 Time PostgreSQL 217 61 (sec) Speedup 7× 6.1×

10-1 100 101 102 103 104 105 106 2 4 6 8 10 12 14 16 10-1 100 101 102 103 104 105 106 Size (x106 values, logscale) Join time (seconds, logscale) Scale Factor (s), Housing (8M for s=16) Size (List) Join time (List) Size (Fact) Join time (Fact) 10-1 100 101 102 103 104 105 2 4 6 8 10 12 14 16 Compression/speedup (logscale) Scale Factor (s), Housing (8M for s=16) Compression ratio Join speedup

Both FDB and PostgreSQL list the records in the results of the join queries.

69 / 99

slide-132
SLIDE 132

Introduction by Examples Decompositions and Variable Orders Size Bounds for Join Results Worst-Case Optimal Join Algorithms Further Work and References Quiz

70 / 99

Outline of Part 1: Joins

slide-133
SLIDE 133

Relevant Work not Covered in the Course

Widths, results sizes, and join computation under functional dependencies [GLVV12,ANS16,GT17,ANS17] Adaptive join processing with lower complexity [AYZ97,ANS17]

◮ We exemplify this next with the 4-cycle join [AYZ97]

Covers: Relational counterpart of factorized representation [KO18]

71 / 99

slide-134
SLIDE 134

Recall the (4-cycle) Join

Q(A1, A2, A3, A4) = R(A1, A2), S(A2, A3), T(A3, A4), W (A4, A1). The linear program for its fractional edge cover number:

R T W S A1 A2 A3 A4

minimize xR + xS + xT + xW subject to A1 : xR + xW ≥ 1 A2 : xR + xS ≥ 1 A3 : xS + xT ≥ 1 A4 : xT + xW ≥ 1 xR ≥ 0 xS ≥ 0 xT ≥ 0 xW ≥ 0 Solutions: xR = xT = 1 or xS = xW = 1. Then, ρ∗ = 2. Also, fhtw = 2. Lower bound Ω(N2) obtained for R(A1, A2) = T(A3, A4) = [N] × {1} and S(A2, A3) = W (A4, A1) = {1} × [N] The variables A1 and A3 get values [N] The variable A2 and A4 get value {1}

72 / 99

slide-135
SLIDE 135

Can We Do The Boolean 4-Cycle Join Faster?

Q() = R(A1, A2), S(A2, A3), T(A3, A4), W (A4, A1).

R T W S A1 A2 A3 A4

We can use one of the two decompositions: T1 :

B1

  • {A1, A2, A3} −

B2

  • {A1, A3, A4}

T2 : {A4, A1, A2}

  • B3

− {A2, A3, A4}

  • B4

Lower-bound: A1 and A3 get values [N] and A2 and A4 get value {1}. Use T1: R(A1, A2), S(A2, A3)

  • N·N=N2

cover B1, T(A3, A4), W (A4, A1)

  • N·N=N2

cover B2

73 / 99

slide-136
SLIDE 136

Can We Do The Boolean 4-Cycle Join Faster?

Q() = R(A1, A2), S(A2, A3), T(A3, A4), W (A4, A1).

R T W S A1 A2 A3 A4

We can use one of the two decompositions: T1 :

B1

  • {A1, A2, A3} −

B2

  • {A1, A3, A4}

T2 : {A4, A1, A2}

  • B3

− {A2, A3, A4}

  • B4

Lower-bound: A1 and A3 get values [N] and A2 and A4 get value {1}. Use T1: R(A1, A2), S(A2, A3)

  • N·N=N2

cover B1, T(A3, A4), W (A4, A1)

  • N·N=N2

cover B2 Use T2: R(A1, A2), W (A4, A1)

  • N

cover B3, S(A2, A3), T(A3, A4)

  • N

cover B4

73 / 99

slide-137
SLIDE 137

Can We Do The Boolean 4-Cycle Join Faster?

Q() = R(A1, A2), S(A2, A3), T(A3, A4), W (A4, A1).

R T W S A1 A2 A3 A4

We can use one of the two decompositions: T1 :

B1

  • {A1, A2, A3} −

B2

  • {A1, A3, A4}

T2 : {A4, A1, A2}

  • B3

− {A2, A3, A4}

  • B4

Lower-bound: A1 and A3 get values [N] and A2 and A4 get value {1}. Use T1: R(A1, A2), S(A2, A3)

  • N·N=N2

cover B1, T(A3, A4), W (A4, A1)

  • N·N=N2

cover B2 Use T2: R(A1, A2), W (A4, A1)

  • N

cover B3, S(A2, A3), T(A3, A4)

  • N

cover B4 Idea: Why not use different decompositions for different classes of input databases or even for different partitions of a relation?

73 / 99

slide-138
SLIDE 138

Light and Heavy Values

Fix ǫ ∈ [0, 1]. A value a of variable A in relation R is: HEAVY if |σA=a(R)| ≥ Nǫ LIGHT if |σA=a(R)| < Nǫ

74 / 99

slide-139
SLIDE 139

Light and Heavy Values

Fix ǫ ∈ [0, 1]. A value a of variable A in relation R is: HEAVY if |σA=a(R)| ≥ Nǫ LIGHT if |σA=a(R)| < Nǫ Partition R(A1, A2) and T(A3, A4) into heavy and light parts: R = {(a1, a2) ∈ R | a1 is heavy}

  • Rh

∪ {(a1, a2) ∈ R | a1 is light}

  • Rl

T = {(a3, a4) ∈ T | a3 is heavy}

  • Th

∪ {(a3, a4) ∈ T | a3 is light}

  • Tl

74 / 99

slide-140
SLIDE 140

Evaluation of the 4-Cycle Boolean Query in O(N3/2)

Q() = R(A1, A2), S(A2, A3), T(A3, A4), W (A4, A1) Recall the two decompositions: T1 :

B1

  • {A1, A2, A3} −

B2

  • {A1, A3, A4}

T2 :

B3

  • {A4, A1, A2} −

B4

  • {A2, A3, A4}

We rewrite Q as Q() = Q1() ∪ Q2() ∪ Q3(), where Q1() = Rh(A1, A2), S(A2, A3), T(A3, A4), W (A4, A1) Q2() = Rl(A1, A2), S(A2, A3), Th(A3, A4), W (A4, A1) Q3() = Rl(A1, A2), S(A2, A3), Tl(A3, A4), W (A4, A1)

75 / 99

slide-141
SLIDE 141

Evaluation of the 4-Cycle Boolean Query in O(N3/2)

Q() = R(A1, A2), S(A2, A3), T(A3, A4), W (A4, A1) Recall the two decompositions: T1 :

B1

  • {A1, A2, A3} −

B2

  • {A1, A3, A4}

T2 :

B3

  • {A4, A1, A2} −

B4

  • {A2, A3, A4}

We evaluate Q1() = Rh(A1, A2), S(A2, A3), T(A3, A4), W (A4, A1) using T1: πA1Rh(A1), S(A2, A3)

  • N1−ǫ·N=N2−ǫ

covers B1, πA1Rh(A1), T(A3, A4)

  • N1−ǫ·N=N2−ǫ

covers B2 For ǫ = 1/2, the time to compute Q1 is N3/2.

76 / 99

slide-142
SLIDE 142

Evaluation of the 4-Cycle Boolean Query in O(N3/2)

Q() = R(A1, A2), S(A2, A3), T(A3, A4), W (A4, A1) Recall the two decompositions: T1 :

B1

  • {A1, A2, A3} −

B2

  • {A1, A3, A4}

T2 :

B3

  • {A4, A1, A2} −

B4

  • {A2, A3, A4}

We evaluate Q2() = Rl(A1, A2), S(A2, A3), Th(A3, A4), W (A4, A1) using T1: πA3Th(A3), Rl(A1, A2)

  • N1−ǫ·N=N2−ǫ

covers B1, πA3Th(A3), W (A1, A4)

  • N1−ǫ·N=N2−ǫ

covers B2 For ǫ = 1/2, the time to compute Q2 is N3/2.

77 / 99

slide-143
SLIDE 143

Evaluation of the 4-Cycle Boolean Query in O(N3/2)

Q() = R(A1, A2), S(A2, A3), T(A3, A4), W (A4, A1) Recall the two decompositions: T1 :

B1

  • {A1, A2, A3} −

B2

  • {A1, A3, A4}

T2 :

B3

  • {A4, A1, A2} −

B4

  • {A2, A3, A4}

We evaluate Q3() = Rl(A1, A2), S(A2, A3), Tl(A3, A4), W (A4, A1) using T2: W (A4, A1), Rl(A1, A2)

  • N·Nǫ=N1+ǫ

covers B1, S(A2, A3), Tl(A3, A4)

  • N·Nǫ=N1+ǫ

covers B2 For ǫ = 1/2, the time to compute Q3 is N3/2.

78 / 99

slide-144
SLIDE 144

Covers: Relational Counterparts of Factorizations

Factorized representations are not relational :(

◮ This makes it difficult to integrate them into relational data systems

Covers of Query Results [KO17]

◮ Relations that are lossless representations of query results, yet are as succinct as factorized representations ◮ For a join query Q and any database of size N, a cover has size O(Nfhtw(Q)) and can be computed in time O(Nfhtw(Q))

How to get a cover?

◮ Construct a hypertree decomposition of the query ◮ Project query result onto the bags of the hypertree decomposition ◮ Construct on these projections the hypergraph of the query result ◮ Take a minimal edge cover of this hypergraph

79 / 99

slide-145
SLIDE 145

Recall the Itemized Customer Orders Example

Orders (O for short) customer day dish Elise Monday burger Elise Friday burger Steve Friday hotdog Joe Friday hotdog Dish (D for short) dish item burger patty burger

  • nion

burger bun hotdog bun hotdog

  • nion

hotdog sausage Items (I for short) item price patty 6

  • nion

2 bun 2 sausage 4

customer,day,dish dish,item item,price

O(customer, day, dish), D(dish, item), I(item, price) customer day dish item price Elise Monday burger patty 6 Elise Monday burger

  • nion

2 Elise Monday burger bun 2 Elise Friday burger patty 6 Elise Friday burger

  • nion

2 Elise Friday burger bun 2 . . . . . . . . . . . . . . .

80 / 99

slide-146
SLIDE 146

The Hypergraph of the Query Result

Elise Monday burger Elise Friday burger

customer,day,dish dish,item item,price

O(customer, day, dish), D(dish, item), I(item, price) customer day dish item price Elise Monday burger patty 6 Elise Monday burger

  • nion

2 Elise Monday burger bun 2 Elise Friday burger patty 6 Elise Friday burger

  • nion

2 Elise Friday burger bun 2 . . . . . . . . . . . . . . .

81 / 99

slide-147
SLIDE 147

The Hypergraph of the Query Result

Elise Monday burger Elise Friday burger burger patty burger

  • nion

burger bun

customer,day,dish dish,item item,price

O(customer, day, dish), D(dish, item), I(item, price) customer day dish item price Elise Monday burger patty 6 Elise Monday burger

  • nion

2 Elise Monday burger bun 2 Elise Friday burger patty 6 Elise Friday burger

  • nion

2 Elise Friday burger bun 2 . . . . . . . . . . . . . . .

82 / 99

slide-148
SLIDE 148

The Hypergraph of the Query Result

Elise Monday burger Elise Friday burger burger patty burger

  • nion

burger bun patty 6

  • nion

2 bun 2

customer,day,dish dish,item item,price

O(customer, day, dish), D(dish, item), I(item, price) customer day dish item price Elise Monday burger patty 6 Elise Monday burger

  • nion

2 Elise Monday burger bun 2 Elise Friday burger patty 6 Elise Friday burger

  • nion

2 Elise Friday burger bun 2 . . . . . . . . . . . . . . .

83 / 99

slide-149
SLIDE 149

The Hypergraph of the Query Result

Elise Monday burger Elise Friday burger burger patty burger

  • nion

burger bun patty 6

  • nion

2 bun 2

customer,day,dish dish,item item,price

O(customer, day, dish), D(dish, item), I(item, price) customer day dish item price Elise Monday burger patty 6 Elise Monday burger

  • nion

2 Elise Monday burger bun 2 Elise Friday burger patty 6 Elise Friday burger

  • nion

2 Elise Friday burger bun 2 . . . . . . . . . . . . . . .

84 / 99

slide-150
SLIDE 150

The Hypergraph of the Query Result

Elise Monday burger Elise Friday burger burger patty burger

  • nion

burger bun patty 6

  • nion

2 bun 2

customer,day,dish dish,item item,price

O(customer, day, dish), D(dish, item), I(item, price) customer day dish item price Elise Monday burger patty 6 Elise Monday burger

  • nion

2 Elise Monday burger bun 2 Elise Friday burger patty 6 Elise Friday burger

  • nion

2 Elise Friday burger bun 2 . . . . . . . . . . . . . . .

85 / 99

slide-151
SLIDE 151

The Hypergraph of the Query Result

Elise Monday burger Elise Friday burger burger patty burger

  • nion

burger bun patty 6

  • nion

2 bun 2

customer,day,dish dish,item item,price

O(customer, day, dish), D(dish, item), I(item, price) customer day dish item price Elise Monday burger patty 6 Elise Monday burger

  • nion

2 Elise Monday burger bun 2 Elise Friday burger patty 6 Elise Friday burger

  • nion

2 Elise Friday burger bun 2 . . . . . . . . . . . . . . .

86 / 99

slide-152
SLIDE 152

The Hypergraph of the Query Result

Elise Monday burger Elise Friday burger burger patty burger

  • nion

burger bun patty 6

  • nion

2 bun 2

customer,day,dish dish,item item,price

O(customer, day, dish), D(dish, item), I(item, price) customer day dish item price Elise Monday burger patty 6 Elise Monday burger

  • nion

2 Elise Monday burger bun 2 Elise Friday burger patty 6 Elise Friday burger

  • nion

2 Elise Friday burger bun 2 . . . . . . . . . . . . . . .

87 / 99

slide-153
SLIDE 153

The Hypergraph of the Query Result

Elise Monday burger Elise Friday burger burger patty burger

  • nion

burger bun patty 6

  • nion

2 bun 2

customer,day,dish dish,item item,price

O(customer, day, dish), D(dish, item), I(item, price) customer day dish item price Elise Monday burger patty 6 Elise Monday burger

  • nion

2 Elise Monday burger bun 2 Elise Friday burger patty 6 Elise Friday burger

  • nion

2 Elise Friday burger bun 2 . . . . . . . . . . . . . . .

88 / 99

slide-154
SLIDE 154

The Hypergraph of the Query Result

Elise Monday burger Elise Friday burger burger patty burger

  • nion

burger bun patty 6

  • nion

2 bun 2

customer,day,dish dish,item item,price

O(customer, day, dish), D(dish, item), I(item, price) customer day dish item price Elise Monday burger patty 6 Elise Monday burger

  • nion

2 Elise Monday burger bun 2 Elise Friday burger patty 6 Elise Friday burger

  • nion

2 Elise Friday burger bun 2 . . . . . . . . . . . . . . .

89 / 99

slide-155
SLIDE 155

A Minimal Edge Cover of the Hypergraph

Elise Monday burger Elise Friday burger burger patty burger

  • nion

burger bun patty 6

  • nion

2 bun 2

customer,day,dish dish,item item,price

O(customer, day, dish), D(dish, item), I(item, price) customer day dish item price Elise Monday burger patty 6 Elise Monday burger

  • nion

2 Elise Monday burger bun 2 Elise Friday burger patty 6 Elise Friday burger

  • nion

2 Elise Friday burger bun 2 . . . . . . . . . . . . . . .

90 / 99

slide-156
SLIDE 156

A Cover of (a part of) the Query Result

O(customer, day, dish), D(dish, item), I(item, price) customer day dish item price Elise Monday burger patty 6 Elise Friday burger

  • nion

2 Elise Friday burger bun 2

customer,day,dish dish,item item,price

O(customer, day, dish), D(dish, item), I(item, price) customer day dish item price Elise Monday burger patty 6 Elise Monday burger

  • nion

2 Elise Monday burger bun 2 Elise Friday burger patty 6 Elise Friday burger

  • nion

2 Elise Friday burger bun 2 . . . . . . . . . . . . . . .

91 / 99

slide-157
SLIDE 157

References

LW49 An inequality related to the isoperimetric inequality. Loomis, Whitney. In Bull. Amer. Math. Soc., 55 (1949). https://www.ams.org/journals/bull/1949-55-10/ A81 On the number of subgraphs of prescribed type of graphs with a given number

  • f edges.
  • Alon. In Israel J. Math., 38 (1981).

https://link.springer.com/content/pdf/10.1007/BF02761855.pdf BT95 Projections of bodies and hereditary properties of hypergraphs. Bolloba´ as, Thomason. In Bull. London Math. Soc., 27 (1995). https://pdfs.semanticscholar.org/02c2/ 9f48e698ccbe7854be8012439c535453634f.pdf AYZ97 Finding and counting given length cycles. Alon, Yuster, Zwick. In Algorithmica 17, 3 (1997). https://m.tau.ac.il/~nogaa/PDFS/ayz97.pdf GLS99 Hypertree decompositions and tractable queries. Gottlob, Leone, Scarcello. In PODS 1999. https://arxiv.org/abs/cs/9812022 AGM08 Size bounds and query plans for relational joins. Atserias, Grohe, Marx. In FOCS 2008 and SIAM J. Comput., 42(4) 2013. http://epubs.siam.org/doi/10.1137/110859440

92 / 99

slide-158
SLIDE 158

References

M10 Approximating fractional hypertree width.

  • Marx. In ACM TALG 2010.

http://dl.acm.org/citation.cfm?id=1721845 NPRR12 Worst-case optimal join algorithms: [extended abstract] Ngo, Porat, R´ e, Rudra. In PODS 2012. http://dl.acm.org/citation.cfm?id=2213565 OZ12 Factorised representations of query results: size bounds and readability. Olteanu, Zavodny. In ICDT 2012. http://dl.acm.org/citation.cfm?doid=2274576.2274607 Also https://arxiv.org/abs/1104.0867, April 2011. GLVV12 Size and treewidth bounds for conjunctive queries. Gottlob, Lee, Valiant, Valiant. In J. ACM, 59 (2012). https://www.cs.ox.ac.uk/files/5024/GLVV_7_11_conjqueries_jacm.pdf NRR13 Skew Strikes Back: New Developments in the Theory of Join Algorithms. Ngo, R´ e, Rudra. In SIGMOD Rec. 2013. https://arxiv.org/abs/1310.3314 V14 Triejoin: A Simple, Worst-Case Optimal Join Algorithm.

  • Veldhuizen. In ICDT 2014.

http://openproceedings.org/ICDT/2014/paper_13.pdf

93 / 99

slide-159
SLIDE 159

References

OZ15 Size Bounds for Factorised Representations of Query Results. Olteanu, Zavodny. In ACM TODS 2015 (submitted July 2013). http://dl.acm.org/citation.cfm?doid=2656335 ANS16 Computing join queries with functional dependencies. Abo Khamis, Ngo, Suciu. In PODS 2017. https://arxiv.org/abs/1604.00111 GT17 Entropy Bounds for Conjunctive Queries with Functional Dependencies. Gogacz, Torunczyk. In ICDT 2017. http://drops.dagstuhl.de/opus/volltexte/2017/7047/ ANS17 What do Shannon-type inequalities, submodular width, and disjunctive Datalog have to do with one another? Abo Khamis, Ngo, Suciu. In PODS 2017. https://arxiv.org/abs/1612.02503 KO18 Covers of Query Results. Kara, Olteanu. In ICDT 2018. https://arxiv.org/abs/1709.01600 N18 Worst-Case Optimal Join Algorithms: Techniques, Results, and Open Problems.

  • Ngo. In PODS 2018.

https://arxiv.org/abs/1803.09930

94 / 99

slide-160
SLIDE 160

Introduction by Examples Decompositions and Variable Orders Size Bounds for Join Results Worst-Case Optimal Join Algorithms Further Work and References Quiz

95 / 99

Outline of Part 1: Joins

slide-161
SLIDE 161

QUIZ on Joins (1/4)

For each of the following queries, please show the following:

  • 1. A hypertree decomposition and an equivalent variable order
  • 2. The fractional edge cover number and the fractional hypertree width

Path Query of length n: Pn(X1, . . . , Xn+1) = R1(X1, X2), R2(X2, X3), R3(X3, X4), . . . , Rn(Xn, Xn+1).

96 / 99

slide-162
SLIDE 162

QUIZ on Joins (2/4)

For each of the following queries, please show the following:

  • 1. A hypertree decomposition and an equivalent variable order
  • 2. The fractional edge cover number and the fractional hypertree width

Loop Query of length n: Ln(X1, . . . , Xn+1) =R1(X1, X2), R2(X2, X3), R3(X3, X4), . . . , Rn(Xn, X1).

97 / 99

slide-163
SLIDE 163

QUIZ on Joins (3/4)

For each of the following queries, please show the following:

  • 1. A hypertree decomposition and an equivalent variable order
  • 2. The fractional edge cover number and the fractional hypertree width

Bowtie Query: Q⊲

⊳(A, B, C, D, E) = R1(A, C), R2(A, B), R3(B, C), R4(C, E), R5(E, D), R6(C, D).

A E B D C R2 R5 R3 R6 R1 R4

98 / 99

slide-164
SLIDE 164

QUIZ on Joins (4/4)

For each of the following queries, please show the following:

  • 1. A hypertree decomposition and an equivalent variable order
  • 2. The fractional edge cover number and the fractional hypertree width

Loomis-Whitney Queries of length n: A LWn query has n variables X1, . . . , Xn and n relation symbols such that for every i ∈ [n] the relation symbol Ri has variables {X1, . . . , Xn} − {Xi}: LWn(X1, . . . , Xn) = R1(X2, . . . , Xn), . . . , Ri(X1, . . . , Xi−1, Xi+1, . . . , Xn), . . . , Rn(X1, . . . , Xn−1) LWn captures the Loomis–Whitney inequality: Estimate the ”size” of a d-dimensional set by the sizes of its (d − 1)-dimensional projections. LW3 is the triangle query.

99 / 99