Joins Aggregates Optimization https://fdbresearch.github.io Dan - - PowerPoint PPT Presentation

joins aggregates optimization
SMART_READER_LITE
LIVE PREVIEW

Joins Aggregates Optimization https://fdbresearch.github.io Dan - - PowerPoint PPT Presentation

Joins Aggregates Optimization https://fdbresearch.github.io Dan Olteanu PhD Open School University of Warsaw November 23, 2018 1 / 78 Acknowledgements Some work reported in this course has been done in the context of the FDB


slide-1
SLIDE 1

1 / 78

Joins → Aggregates → Optimization

https://fdbresearch.github.io Dan Olteanu

PhD Open School University of Warsaw November 23, 2018

slide-2
SLIDE 2

Acknowledgements

Some work reported in this course has been done in the context of the FDB project, LogicBlox, and RelationalAI by Zavodn´ y, Schleich, Kara, Nikolic, Zhang, Ciucanu, and Olteanu (Oxford) Abo Khamis and Ngo (RelationalAI), Nguyen (U. Michigan) Some of the following slides are derived from presentations by Abo Khamis (optimization diagrams) Kara (covers, IVMǫ, and many graphics) Ngo (functional aggregate queries) Schleich (performance and quizzes) Lastly, Kara and Schleich proofread the slides. I would like to thank them for their support!

2 / 78

slide-3
SLIDE 3

Goal of This Course

Introduction to a principled approach to in-database computation This course starts where mainstream databases courses finish. Part 1: Joins

Part 2: Aggregates

◮ Important functionality of DB query languages and essential for applications ◮ Natural generalization of aggregates over joins can express problems across Computer Science in, e.g., DB, logic, probabilistic graphical models [ANR16] ◮ Algorithm with lowest known computational complexity [BKOZ13,ANR16] ◮ Aggregates under data updates [NO18,KNNOZ19]

Part 3: Optimization

3 / 78

slide-4
SLIDE 4

4 / 78

Outline of Part 2: Aggregates

slide-5
SLIDE 5

FAQ: Functional Aggregate Queries [ANR16]

We use the following notation (i ∈ [n] = {1, . . . , n}): Xi are variables, xi are values in discrete domain Dom(Xi) x = (x1, . . . , xn) ∈ Dom(X1) × · · · × Dom(Xn) For any S ⊆ [n], xS = (xi)i∈S ∈

  • i∈S

Dom(Xi) e.g. x{2,5,8} = (x2, x5, x8) ∈ Dom(X2) × Dom(X5) × Dom(X8)

5 / 78

slide-6
SLIDE 6

Functional Aggregate Query: The Problem

FAQ-expression ϕ(x3) =

x1

  • x2 maxx4 ψ1,2,4ψ2,3ψ1,3ψ1,4

ϕ(X3) ψ124(X1, X2, X4) ψ23(X2, X3) ψ13(X1, X3) ψ14(X1, X4)

6 / 78

slide-7
SLIDE 7

Functional Aggregate Query: The Input

FAQ-expression ϕ(x3) =

x1

  • x2 maxx4 ψ1,2,4ψ2,3ψ1,3ψ1,4

ϕ(X3) ψ124(X1, X2, X4) ψ23(X2, X3) ψ13(X1, X3) ψ14(X1, X4) All functions have the same range D

n = 4

n variables X1, . . . , Xn a multi-hypergraph H = (V, E)

◮ Each vertex is a variable (notation overload: V = [n]) ◮ To each hyperedge S ∈ E there corresponds a factor ψS ψS :

  • i∈S

Dom(Xi) → D

7 / 78

slide-8
SLIDE 8

Functional Aggregate Query: The Input

FAQ-expression ϕ(x3) =

x1

  • x2 maxx4 ψ1,2,4ψ2,3ψ1,3ψ1,4

ϕ(X3) ψ124(X1, X2, X4) ψ23(X2, X3) ψ13(X1, X3) ψ14(X1, X4) All functions have the same range D

V = {1, 2, 3, 4} E = {{1, 4}, {1, 3}, {2, 3}, {1, 2, 4}}

n variables X1, . . . , Xn a multi-hypergraph H = (V, E)

◮ Each vertex is a variable (notation overload: V = [n]) ◮ To each hyperedge S ∈ E there corresponds a factor ψS ψS :

  • i∈S

Dom(Xi) → D

7 / 78

slide-9
SLIDE 9

Functional Aggregate Query: The Input

FAQ-expression ϕ(x3) =

x1

  • x2 maxx4 ψ1,2,4ψ2,3ψ1,3ψ1,4

ϕ(X3) ψ124(X1, X2, X4) ψ23(X2, X3) ψ13(X1, X3) ψ14(X1, X4) All functions have the same range D

V = {1, 2, 3, 4} E = {{1, 4}, {1, 3}, {2, 3}, {1, 2, 4}} V = {1, 2, 3, 4}

n variables X1, . . . , Xn a multi-hypergraph H = (V, E)

◮ Each vertex is a variable (notation overload: V = [n]) ◮ To each hyperedge S ∈ E there corresponds a factor ψS ψS :

  • i∈S

Dom(Xi) → D

7 / 78

slide-10
SLIDE 10

Functional Aggregate Query: The Input

FAQ-expression ϕ(x3) =

x1

  • x2 maxx4 ψ1,2,4ψ2,3ψ1,3ψ1,4

ϕ(X3) ψ124(X1, X2, X4) ψ23(X2, X3) ψ13(X1, X3) ψ14(X1, X4) All functions have the same range D

V = {1, 2, 3, 4} E = {{1, 4}, {1, 3}, {2, 3}, {1, 2, 4}} E = {{1, 4}, {1, 3}, {2, 3}, {1, 2, 4}}

n variables X1, . . . , Xn a multi-hypergraph H = (V, E)

◮ Each vertex is a variable (notation overload: V = [n]) ◮ To each hyperedge S ∈ E there corresponds a factor ψS ψS :

  • i∈S

Dom(Xi) → D

7 / 78

slide-11
SLIDE 11

Functional Aggregate Query: The Input

FAQ-expression ϕ(x3) =

x1

  • x2 maxx4 ψ1,2,4ψ2,3ψ1,3ψ1,4

ϕ(X3) ψ124(X1, X2, X4) ψ23(X2, X3) ψ13(X1, X3) ψ14(X1, X4) All functions have the same range D

V = {1, 2, 3, 4} E = {{1, 4}, {1, 3}, {2, 3}, {1, 2, 4}}

n variables X1, . . . , Xn a multi-hypergraph H = (V, E)

◮ Each vertex is a variable (notation overload: V = [n]) ◮ To each hyperedge S ∈ E there corresponds a factor ψS ψS :

  • i∈S

Dom(Xi) → D R+, {true, false}, {0, 1}, 2U, etc.

7 / 78

slide-12
SLIDE 12

Functional Aggregate Query: The Input

FAQ-expression ϕ(x3) =

x1

  • x2 maxx4 ψ1,2,4ψ2,3ψ1,3ψ1,4

ϕ(X3) ψ124(X1, X2, X4) ψ23(X2, X3) ψ13(X1, X3) ψ14(X1, X4) All functions have the same range D

F = {3}

n variables X1, . . . , Xn a multi-hypergraph H = (V, E)

◮ Each vertex is a variable (notation overload: V = [n]) ◮ To each hyperedge S ∈ E there corresponds a factor ψS ψS :

  • i∈S

Dom(Xi) → D R+, {true, false}, {0, 1}, 2U, etc.

a set F ⊆ V of free variables (wlog, F = [f ] = {1, . . . , f })

7 / 78

slide-13
SLIDE 13

Functional Aggregate Query: The Output

FAQ-expression ϕ(x3) =

x1

  • x2 maxx4 ψ1,2,4ψ2,3ψ1,3ψ1,4

ϕ(X3) ψ124(X1, X2, X4) ψ23(X2, X3) ψ13(X1, X3) ψ14(X1, X4) All functions have the same range D

Compute the function ϕ :

i∈F Dom(Xi) → D.

8 / 78

slide-14
SLIDE 14

Functional Aggregate Query: The Output

FAQ-expression ϕ(x3) =

x1

  • x2 maxx4 ψ1,2,4ψ2,3ψ1,3ψ1,4

ϕ(X3) ψ124(X1, X2, X4) ψ23(X2, X3) ψ13(X1, X3) ψ14(X1, X4) All functions have the same range D

Compute the function ϕ :

i∈F Dom(Xi) → D.

ϕ defined by the FAQ-expression ϕ(x[f ]) = (f +1)

xf +1∈Dom(Xf +1)

· · · (n−1)

xn−1∈Dom(Xn−1)

(n)

xn∈Dom(Xn)

  • S∈E

ψS(xS)

8 / 78

slide-15
SLIDE 15

Functional Aggregate Query: The Output

FAQ-expression ϕ(x3) =

x1

  • x2 maxx4 ψ1,2,4ψ2,3ψ1,3ψ1,4

ϕ(X3) ψ124(X1, X2, X4) ψ23(X2, X3) ψ13(X1, X3) ψ14(X1, X4) All functions have the same range D

Compute the function ϕ :

i∈F Dom(Xi) → D.

ϕ defined by the FAQ-expression ϕ(x[f ]) = (f +1)

xf +1∈Dom(Xf +1)

· · · (n−1)

xn−1∈Dom(Xn−1)

(n)

xn∈Dom(Xn)

  • S∈E

ψS(xS) For each (i)

8 / 78

slide-16
SLIDE 16

Functional Aggregate Query: The Output

FAQ-expression ϕ(x3) =

x1

  • x2 maxx4 ψ1,2,4ψ2,3ψ1,3ψ1,4

ϕ(X3) ψ124(X1, X2, X4) ψ23(X2, X3) ψ13(X1, X3) ψ14(X1, X4) All functions have the same range D

Compute the function ϕ :

i∈F Dom(Xi) → D.

ϕ defined by the FAQ-expression ϕ(x[f ]) = (f +1)

xf +1∈Dom(Xf +1)

· · · (n−1)

xn−1∈Dom(Xn−1)

(n)

xn∈Dom(Xn)

  • S∈E

ψS(xS) For each (i)

◮ Either

  • D, (i),

is a commutative semiring

8 / 78

slide-17
SLIDE 17

Functional Aggregate Query: The Output

FAQ-expression ϕ(x3) =

x1

  • x2 maxx4 ψ1,2,4ψ2,3ψ1,3ψ1,4

ϕ(X3) ψ124(X1, X2, X4) ψ23(X2, X3) ψ13(X1, X3) ψ14(X1, X4) All functions have the same range D

Compute the function ϕ :

i∈F Dom(Xi) → D.

ϕ defined by the FAQ-expression ϕ(x[f ]) = (f +1)

xf +1∈Dom(Xf +1)

· · · (n−1)

xn−1∈Dom(Xn−1)

(n)

xn∈Dom(Xn)

  • S∈E

ψS(xS) For each (i)

◮ Either

  • D, (i),

is a commutative semiring ◮ Or (i) =

8 / 78

slide-18
SLIDE 18

Commutative Semirings

(D, ⊕, ⊗) is a commutative semiring when (D, ⊕) is a commutative monoid with identity element 0:

◮ (a ⊕ b) ⊕ c = a ⊕ (b ⊕ c) ◮ 0 ⊕ a = a ⊕ 0 = a ◮ a ⊕ b = b ⊕ a

(D, ⊗) is a commutative monoid with identity element 1:

◮ (a ⊗ b) ⊗ c = a ⊗ (b ⊗ c) ◮ 1 ⊗ a = a ⊗ 1 = a ◮ a ⊕ b = b ⊕ a

Multiplication distributes over addition:

◮ a ⊗ (b ⊕ c) = (a ⊗ b) ⊕ (a ⊗ c)

Multiplication by 0 annihilates D:

◮ 0 ⊗ a = a ⊗ 0 = 0

9 / 78

slide-19
SLIDE 19

Commutative Semirings

(D, ⊕, ⊗) is a commutative semiring when (D, ⊕) is a commutative monoid with identity element 0:

◮ (a ⊕ b) ⊕ c = a ⊕ (b ⊕ c) ◮ 0 ⊕ a = a ⊕ 0 = a ◮ a ⊕ b = b ⊕ a

(D, ⊗) is a commutative monoid with identity element 1:

◮ (a ⊗ b) ⊗ c = a ⊗ (b ⊗ c) ◮ 1 ⊗ a = a ⊗ 1 = a ◮ a ⊕ b = b ⊕ a

Multiplication distributes over addition:

◮ a ⊗ (b ⊕ c) = (a ⊗ b) ⊕ (a ⊗ c)

Multiplication by 0 annihilates D:

◮ 0 ⊗ a = a ⊗ 0 = 0

Common examples (there are many more!) Boolean ({true, false}, ∨, ∧) sum-product (R, +, ×) max-product (R+, max, ×) set (2U, ∪, ∩)

9 / 78

slide-20
SLIDE 20

SumProduct ⊂ FAQ

Problem (SumProduct)

Given a commutative semiring (D, ⊕, ⊗), compute the function ϕ(x1, . . . , xf ) =

  • xf +1
  • xf +2

· · ·

  • xn
  • S∈E

ψS(xS) For ⊕ = + and ⊗ = ∗, ϕ can be expressed in SQL as: SELECT x1, . . . , xf , SUM(R1.val ∗ · · · ∗ Rn.val) FROM R1 NATURAL JOIN . . . Rn GROUP BY x1, . . . , xf ; where each function ψi over variables XS is encoded as a relation Ri over XS and an additional variable val to account for the values of ψi. This formulation is equivalent to: SumProduct [D99] Marginalize a Product Function [AM00]

10 / 78

slide-21
SLIDE 21

Many examples for SumProduct

({true, false}, ∨, ∧)

◮ Constraint satisfaction problems ◮ Boolean conjunctive query evaluation ◮ SAT ◮ k-colorability ◮ etc.

(U, ∪, ∩)

◮ Conjunctive query evaluation

(R, +, ×)

◮ Permanent ◮ DFT ◮ Inference in probabilistic graphical models ◮ #CSP ◮ Matrix chain multiplication ◮ Aggregates in DB

(R+, max, ×)

◮ MAP queries in probabilistic graphical models

11 / 78

slide-22
SLIDE 22

SumProduct Example 1: Boolean Query Evaluation

Boolean Conjunctive Queries: Boolean query Φ with set rels(Φ) of relation symbols Each relation symbol R ∈ rels(Φ) has variables vars(R) Φ = ∃X1 . . . ∃Xn :

  • R∈rels(Φ)

R(vars(R)) FAQ encoding: φ =

  • x
  • S∈E

ψS(xS), where φ has the hypergraph (V, E) with V =

R∈rels(Φ) vars(R) and E = {vars(R) | R ∈ rels(Φ)}

For each S ∈ E, there is a factor ψS such that ψS(xS) = (xS ∈ R)

12 / 78

slide-23
SLIDE 23

SumProduct Example 2: Matrix Chain Multiplication

Compute the product A = A1 · · · An of n matrices Each matrix Ai is over field F and has dimensions pi × pi+1 FAQ encoding: We use n + 1 variables X1, . . . , Xn+1 with domains Dom(Xi) = [pi] Each matrix Ai can be viewed as a function of two variables: ψi,i+1 : Dom(Xi) × Dom(Xi+1) → F, where ψi,i+1(xi, xi+1) = (Ai)xi xi+1 The problem is now to compute the FAQ expression φ(x1, xn+1) =

  • x2∈Dom(X2)

· · ·

  • xn∈Dom(Xn)
  • i∈[n]

ψi,i+1(xi, xi+1).

13 / 78

slide-24
SLIDE 24

SumProduct Example 3: Queries in Graphical Models

Discrete undirected graphical model represented by a hypergraph (V, E) V = {X1, . . . , Xn} consists of n discrete random variables There is a factor ψS :

i∈S Dom(Xi) → R+ for each edge S ∈ E

FAQ expression to compute the marginal Maximum A Posteriori estimates: φ(x1, . . . , xf ) = max

xf +1∈Dom(Xf +1) · · ·

max

xn∈Dom(Xn)

  • S∈E

ψS(xS) FAQ expression to compute the marginal distribution of variables X1, . . . , Xf : φ(x1, . . . , xf ) =

  • xf +1∈Dom(Xf +1)

· · ·

  • xn∈Dom(Xn)
  • S∈E

ψS(xS) For conditional distributions prob(XA | XB = xB), we set XB to xB.

14 / 78

slide-25
SLIDE 25

15 / 78

Outline of Part 2: Aggregates

slide-26
SLIDE 26

How to compute a SumProduct FAQ ϕ

Find a variable order for ϕ Compute ϕ by eliminating variables in the given order This is a dynamic programming algorithm. Two flavours:

◮ FDB: Top-down with memoization (caching) [BKOZ13] We exemplify two variants:

  • 1. Compute the factorized join and the aggregates in one pass over the factorization
  • 2. Translate the factorized computation into relational queries

◮ InsideOut: Bottom-up with indicator projections [ANR16]

The complexity is given by the width of the variable order: Given a database of size N, an FAQ ϕ, a variable order for ϕ with width w, ϕ can be computed in time O(Nw + |OUT|), where |OUT| is the output size.

16 / 78

slide-27
SLIDE 27

Finding a Variable Order for a SumProduct FAQ ϕ

First attempt: Same variable order ∆ as for the join part of ϕ One wrinkle: What if not all variables are free?

17 / 78

slide-28
SLIDE 28

Finding a Variable Order for a SumProduct FAQ ϕ

First attempt: Same variable order ∆ as for the join part of ϕ One wrinkle: What if not all variables are free? The free variables sit above the bound variables in ∆. [BKOZ13,OZ15] Equivalent constraint for hypertree decompositions: [ANR16] Take a hypertree decomposition for the join part of ϕ such that the bags with the free variables form a connected subtree. Implication on complexity: The width for ϕ is at least the width for its join part ⇒ ϕ may be more expensive than its join part if it has free variables This new width is called the FAQ-width in the literature [ANR16]

17 / 78

slide-29
SLIDE 29

Computing COUNT over Factorized Join using FDB

∪ burger hotdog × × ∪ sausage bunonion × × × ∪ 4 ∪ Friday × ∪ Joe Steve ∪ patty bun onion × × × ∪ ∪ ∪ 6 2 2 ∪ Friday × ∪ Elise Monday × ∪ Elise dish ∅ day {dish} item {dish} customer {dish, day} price {item}

ϕ =

... O(customer, day, dish) · D(dish, item) · I(item, price)

In SQL: SELECT COUNT(*) FROM O NATURAL JOIN .. I; We change the semiring to (N, +, ∗):

◮ values → 1 ∪ → + × → ∗

18 / 78

slide-30
SLIDE 30

Computing COUNT over Factorized Join using FDB

+ 1 1 ∗ ∗ + 1 1 1 ∗ ∗ ∗ + 1 + 1 ∗ + 1 1 + 1 1 1 ∗ ∗ ∗ + + + 1 1 1 + 1 ∗ + 1 1 ∗ + 1 dish ∅ day {dish} item {dish} customer {dish, day} price {item}

12 6 6 2 3 1 1 1 1 1 3 2 1 2

ϕ =

... O(customer, day, dish) · D(dish, item) · I(item, price)

In SQL: SELECT COUNT(*) FROM O NATURAL JOIN .. I; We change the semiring to (N, +, ∗):

◮ values → 1 ∪ → + × → ∗

19 / 78

slide-31
SLIDE 31

Computing SUM over Factorized Join using FDB

∪ burger hotdog × × ∪ sausage bunonion × × × ∪ 4 ∪ Friday × ∪ Joe Steve ∪ patty bun onion × × × ∪ ∪ ∪ 6 2 2 ∪ Friday × ∪ Elise Monday × ∪ Elise dish ∅ day {dish} item {dish} customer {dish, day} price {item}

ϕ =

... f (dish)·price·O(customer, day, dish)·D(dish, item)·I(item, price)

In SQL: SELECT SUM(f (dish) * price) FROM O NATURAL JOIN .. I;

◮ Assume there is a function f that turns dish into reals or indicator vectors. ◮ All values except for dish & price → 1, ∪ → +, × → ∗.

20 / 78

slide-32
SLIDE 32

Computing SUM over Factorized Join using FDB

+ f (burger) f (hotdog) ∗ ∗ + 1 1 1 ∗ ∗ ∗ + 4 + 1 ∗ + 1 1 + 1 1 1 ∗ ∗ ∗ + + + 6 2 2 + 1 ∗ + 1 1 ∗ + 1 dish ∅ day {dish} item {dish} customer {dish, day} price {item}

20∗f (burger)+16∗f (hotdog) 16 20 2 10 1 1 6 2 2 8 2 4 2

ϕ =

... f (dish)·price·O(customer, day, dish)·D(dish, item)·I(item, price)

In SQL: SELECT SUM(f (dish) * price) FROM O NATURAL JOIN .. I;

◮ Assume there is a function f that turns dish into reals or indicator vectors. ◮ All values except for dish & price → 1, ∪ → +, × → ∗.

21 / 78

slide-33
SLIDE 33

Computing SUM over Factorized Join using FDB

+ f (burger) f (hotdog) ∗ ∗ + 1 1 1 ∗ ∗ ∗ + 4 + 1 ∗ + 1 1 + 1 1 1 ∗ ∗ ∗ + + + 6 2 2 + 1 ∗ + 1 1 ∗ + 1 dish ∅ day {dish} item {dish} customer {dish, day} price {item}

20∗f (burger)+16∗f (hotdog) 16 20 2 10 1 1 6 2 2 8 2 4 2

If f turns dish into indicator vectors: ϕ(dish) =

... price · O(customer, day, dish) · D(dish, item) · I(item, price)

In SQL: SELECT dish, SUM(price) FROM O NATURAL JOIN..I GROUP BY dish;

22 / 78

slide-34
SLIDE 34

To Compute or Not To Compute the Factorized Join

Aggregates can be computed without materializing the factorized join [OZ15,OS16,ANNOS18a+b] The factorized join becomes the trace of the aggregate computation This is called factorized aggregate computation

23 / 78

slide-35
SLIDE 35

Example of Factorized Computation via Query Rewriting

The 4-path count query Q4 on a graph with 4 copies of the edge relation E: Q4() =

  • a,b,c,d,e

V1(a) · E1(a, b) · E2(b, c) · E3(c, d) · E4(d, e) · V2(e)

  • J(a,b,c,d,e)

C B D A E V1 V2 E1 E4 E2 E3 C B D A E key(A) = {B} key(E) = {D} key(B) = key(D) = {C} key(C) = ∅

24 / 78

slide-36
SLIDE 36

Example of Factorized Computation via Query Rewriting

The 4-path count query Q4 on a graph with 4 copies of the edge relation E: Q4() =

  • a,b,c,d,e

V1(a) · E1(a, b) · E2(b, c) · E3(c, d) · E4(d, e) · V2(e)

  • J(a,b,c,d,e)

C B D A E V1 V2 E1 E4 E2 E3 C B D A E key(A) = {B} key(E) = {D} key(B) = key(D) = {C} key(C) = ∅

Sizes for listing/factorized representations of the result of the join J of Q4 ρ∗(J) = 3 ⇒ listing representation has size O(|E|3). fhtw(J) = 1 ⇒ factorization with caching has size O(|E|).

24 / 78

slide-37
SLIDE 37

Example of Factorized Computation via Query Rewriting

We would like to compute Q4: in O(|E|) time (no free variables, so use best variable order) using optimized queries that are derived from the variable order of Q4 without materializing the factorized join J

25 / 78

slide-38
SLIDE 38

Example of Factorized Computation via Query Rewriting

V1 V2 E1 E4 E2 E3 C B D A E ⇒

26 / 78

slide-39
SLIDE 39

Example of Factorized Computation via Query Rewriting

V1 V2 E1 E4 E2 E3 C B D A E ⇒ C B D E U1 E2 V2 E3 E4 ⇒ U1(b) =

  • a

V1(a) · E1(b, a)

26 / 78

slide-40
SLIDE 40

Example of Factorized Computation via Query Rewriting

V1 V2 E1 E4 E2 E3 C B D A E ⇒ C B D E U1 E2 V2 E3 E4 ⇒ C D E U2 E3 V2 E4 ⇒ U1(b) =

  • a

V1(a) · E1(b, a) U2(c) =

  • b

E2(c, b) · U1(b)

26 / 78

slide-41
SLIDE 41

Example of Factorized Computation via Query Rewriting

V1 V2 E1 E4 E2 E3 C B D A E ⇒ C B D E U1 E2 V2 E3 E4 ⇒ C D E U2 E3 V2 E4 ⇒ C D U2 E3 U3 ⇒ U1(b) =

  • a

V1(a) · E1(b, a) U2(c) =

  • b

E2(c, b) · U1(b) U3(d) =

  • e

V2(e) · E4(d, e)

26 / 78

slide-42
SLIDE 42

Example of Factorized Computation via Query Rewriting

V1 V2 E1 E4 E2 E3 C B D A E ⇒ C B D E U1 E2 V2 E3 E4 ⇒ C D E U2 E3 V2 E4 ⇒ C D U2 E3 U3 ⇒ C U2 U4 ⇒ U1(b) =

  • a

V1(a) · E1(b, a) U2(c) =

  • b

E2(c, b) · U1(b) U3(d) =

  • e

V2(e) · E4(d, e) U4(c) =

  • d

E3(c, d) · U3(d)

26 / 78

slide-43
SLIDE 43

Example of Factorized Computation via Query Rewriting

V1 V2 E1 E4 E2 E3 C B D A E ⇒ C B D E U1 E2 V2 E3 E4 ⇒ C D E U2 E3 V2 E4 ⇒ C D U2 E3 U3 ⇒ C U2 U4 ⇒ U5 U1(b) =

  • a

V1(a) · E1(b, a) U2(c) =

  • b

E2(c, b) · U1(b) U3(d) =

  • e

V2(e) · E4(d, e) U4(c) =

  • d

E3(c, d) · U3(d) U5 =

  • c

U2(c) · U4(c)

26 / 78

slide-44
SLIDE 44

Example of Factorized Computation via Query Rewriting

V1 V2 E1 E4 E2 E3 C B D A E ⇒ C B D E U1 E2 V2 E3 E4 ⇒ C D E U2 E3 V2 E4 ⇒ C D U2 E3 U3 ⇒ C U2 U4 ⇒ U5

This computation strategy corresponds to the following query rewriting:

  • a,b,c,d,e

V1(a) · E1(b, a) · E2(c, b) · E3(c, d) · E4(d, e) · V2(e) =

  • c
  • b

E2(c, b) ·

  • a

V1(a) · E1(b, a)

  • ·
  • d

E3(c, d) ·

  • e

E4(d, e) · V2(e)

  • 27 / 78
slide-45
SLIDE 45

Example of FAQ Computation using InsideOut

ϕ(x1, x2, x4) =

  • x3,x5,x6,x7,x8

ψ1(x1, x2, x3)·ψ2(x2, x4, x5)·ψ3(x4, x5, x6)·ψ4(x6, x8)·ψ5(x5, x7)

X2 X1 X3 X4 X5 X7 X6 X8 ψ1 ψ2 ψ3 ψ4 ψ5 X2 X1 X3 X4 X5 X7 X6 X8 key(X2) = ∅ key(X1) = {X2} key(X3) = {X1, X2} key(X4) = {X2} key(X5) = {X2, X4} key(X6) = {X4, X5} key(X8) = {X6} key(X7) = {X5}

28 / 78

slide-46
SLIDE 46

Example of FAQ Computation using InsideOut

ϕ(x1, x2, x4) =

  • x3,x5,x6,x7,x8

ψ1(x1, x2, x3)·ψ2(x2, x4, x5)·ψ3(x4, x5, x6)·ψ4(x6, x8)·ψ5(x5, x7)

X2 X1 X3 X4 X5 X7 X6 X8 ψ1 ψ2 ψ3 ψ4 ψ5 X2 X1 X3 X4 X5 X7 X6 X8 key(X2) = ∅ key(X1) = {X2} key(X3) = {X1, X2} key(X4) = {X2} key(X5) = {X2, X4} key(X6) = {X4, X5} key(X8) = {X6} key(X7) = {X5}

ρ∗(ϕ) = 4, s(ϕ) = 2, fhtw(ϕ) = 1. The above variable order ∆ has the free variables x1, x2, x4 on top of the others and fhtw(∆) = 1. The query result has size: O(N) when factorized; O(N2) when listed

28 / 78

slide-47
SLIDE 47

Example of FAQ Computation using InsideOut

X2 X1 X3 X4 X5 X7 X6 X8

ϕ(x1, x2, x4) =

  • x3,x5,x6,x7,x8

ψ1(x1, x2, x3) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ4(x6, x8) · ψ5(x5, x7)

29 / 78

slide-48
SLIDE 48

Example of FAQ Computation using InsideOut

X2 X1 X3 X4 X5 X7 X6 X8

ϕ(x1, x2, x4) =

  • x3,x5,x6,x7,x8

ψ1(x1, x2, x3) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ4(x6, x8) · ψ5(x5, x7) ϕ(x1, x2, x4) =

  • x5,x6,x7,x8

x3

ψ1(x1, x2, x3)

  • ψ6(x1,x2)
  • · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ4(x6, x8) · ψ5(x5, x7)

30 / 78

slide-49
SLIDE 49

Example of FAQ Computation using InsideOut

X2 X1 X3 X4 X5 X7 X6 X8 ⇒ X2 X1 X4 X5 X7 X6 X8

ϕ(x1, x2, x4) =

  • x3,x5,x6,x7,x8

ψ1(x1, x2, x3) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ4(x6, x8) · ψ5(x5, x7) ϕ(x1, x2, x4) =

  • x5,x6,x7,x8

ψ6(x1, x2) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ4(x6, x8) · ψ5(x5, x7)

  • O(N)

31 / 78

slide-50
SLIDE 50

Example of FAQ Computation using InsideOut

X2 X1 X3 X4 X5 X7 X6 X8 ⇒ X2 X1 X4 X5 X7 X6 X8

ϕ(x1, x2, x4) =

  • x3,x5,x6,x7,x8

ψ1(x1, x2, x3) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ4(x6, x8) · ψ5(x5, x7) ϕ(x1, x2, x4) =

  • x5,x6,x7,x8

ψ6(x1, x2) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ4(x6, x8) · ψ5(x5, x7)

  • O(N)

ϕ(x1, x2, x4) =

  • x5,x6,x7

ψ6(x1, x2) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) ·

x8

ψ4(x6, x8)

  • ψ7(x6)
  • · ψ5(x5, x7)

32 / 78

slide-51
SLIDE 51

Example of FAQ Computation using InsideOut

X2 X1 X3 X4 X5 X7 X6 X8 ⇒ X2 X1 X4 X5 X7 X6 X8 ⇒ X2 X1 X4 X5 X7 X6

ϕ(x1, x2, x4) =

  • x3,x5,x6,x7,x8

ψ1(x1, x2, x3) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ4(x6, x8) · ψ5(x5, x7) ϕ(x1, x2, x4) =

  • x5,x6,x7,x8

ψ6(x1, x2) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ4(x6, x8) · ψ5(x5, x7)

  • O(N)

ϕ(x1, x2, x4) =

  • x5,x6,x7

ψ6(x1, x2) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ7(x6) · ψ5(x5, x7)

  • O(N)

33 / 78

slide-52
SLIDE 52

Example of FAQ Computation using InsideOut

X2 X1 X3 X4 X5 X7 X6 X8 ⇒ X2 X1 X4 X5 X7 X6 X8 ⇒ X2 X1 X4 X5 X7 X6

ϕ(x1, x2, x4) =

  • x3,x5,x6,x7,x8

ψ1(x1, x2, x3) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ4(x6, x8) · ψ5(x5, x7) ϕ(x1, x2, x4) =

  • x5,x6,x7,x8

ψ6(x1, x2) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ4(x6, x8) · ψ5(x5, x7)

  • O(N)

ϕ(x1, x2, x4) =

  • x5,x6,x7

ψ6(x1, x2) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ7(x6) · ψ5(x5, x7)

  • O(N)

ϕ(x1, x2, x4) =

  • x5,x6

ψ6(x1, x2) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ7(x6) ·

x7

ψ5(x5, x7)

  • ψ8(x5)
  • 34 / 78
slide-53
SLIDE 53

Example of FAQ Computation using InsideOut

X2 X1 X3 X4 X5 X7 X6 X8 ⇒ X2 X1 X4 X5 X7 X6 X8 ⇒ X2 X1 X4 X5 X7 X6 ⇒ X2 X1 X4 X5 X6

ϕ(x1, x2, x4) =

  • x3,x5,x6,x7,x8

ψ1(x1, x2, x3) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ4(x6, x8) · ψ5(x5, x7) ϕ(x1, x2, x4) =

  • x5,x6,x7,x8

ψ6(x1, x2) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ4(x6, x8) · ψ5(x5, x7)

  • O(N)

ϕ(x1, x2, x4) =

  • x5,x6,x7

ψ6(x1, x2) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ7(x6) · ψ5(x5, x7)

  • O(N)

ϕ(x1, x2, x4) =

  • x5,x6

ψ6(x1, x2) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ7(x6) · ψ8(x5)

  • O(N)

35 / 78

slide-54
SLIDE 54

Example of FAQ Computation using InsideOut

X2 X1 X3 X4 X5 X7 X6 X8 ⇒ X2 X1 X4 X5 X7 X6 X8 ⇒ X2 X1 X4 X5 X7 X6 ⇒ X2 X1 X4 X5 X6

ϕ(x1, x2, x4) =

  • x3,x5,x6,x7,x8

ψ1(x1, x2, x3) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ4(x6, x8) · ψ5(x5, x7) ϕ(x1, x2, x4) =

  • x5,x6,x7,x8

ψ6(x1, x2) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ4(x6, x8) · ψ5(x5, x7)

  • O(N)

ϕ(x1, x2, x4) =

  • x5,x6,x7

ψ6(x1, x2) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ7(x6) · ψ5(x5, x7)

  • O(N)

ϕ(x1, x2, x4) =

  • x5,x6

ψ6(x1, x2) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ7(x6) · ψ8(x5)

  • O(N)

ϕ(x1, x2, x4) =

  • x5

ψ6(x1, x2) · ψ2(x2, x4, x5) ·

x6

ψ3(x4, x5, x6) · ψ7(x6)

  • ψ9(x4,x5)
  • · ψ8(x5)

36 / 78

slide-55
SLIDE 55

Example of FAQ Computation using InsideOut

X2 X1 X3 X4 X5 X7 X6 X8 ⇒ X2 X1 X4 X5 X7 X6 X8 ⇒ X2 X1 X4 X5 X7 X6 ⇒ X2 X1 X4 X5 X6 ⇒ X2 X1 X4 X5

ϕ(x1, x2, x4) =

  • x3,x5,x6,x7,x8

ψ1(x1, x2, x3) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ4(x6, x8) · ψ5(x5, x7) ϕ(x1, x2, x4) =

  • x5,x6,x7,x8

ψ6(x1, x2) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ4(x6, x8) · ψ5(x5, x7)

  • O(N)

ϕ(x1, x2, x4) =

  • x5,x6,x7

ψ6(x1, x2) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ7(x6) · ψ5(x5, x7)

  • O(N)

ϕ(x1, x2, x4) =

  • x5,x6

ψ6(x1, x2) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ7(x6) · ψ8(x5)

  • O(N)

ϕ(x1, x2, x4) =

  • x5

ψ6(x1, x2) · ψ2(x2, x4, x5) · ψ9(x4, x5) · ψ8(x5)

  • O(N)

37 / 78

slide-56
SLIDE 56

Example of FAQ Computation using InsideOut

X2 X1 X3 X4 X5 X7 X6 X8 ⇒ X2 X1 X4 X5 X7 X6 X8 ⇒ X2 X1 X4 X5 X7 X6 ⇒ X2 X1 X4 X5 X6 ⇒ X2 X1 X4 X5

ϕ(x1, x2, x4) =

  • x3,x5,x6,x7,x8

ψ1(x1, x2, x3) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ4(x6, x8) · ψ5(x5, x7) ϕ(x1, x2, x4) =

  • x5,x6,x7,x8

ψ6(x1, x2) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ4(x6, x8) · ψ5(x5, x7)

  • O(N)

ϕ(x1, x2, x4) =

  • x5,x6,x7

ψ6(x1, x2) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ7(x6) · ψ5(x5, x7)

  • O(N)

ϕ(x1, x2, x4) =

  • x5,x6

ψ6(x1, x2) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ7(x6) · ψ8(x5)

  • O(N)

ϕ(x1, x2, x4) =

  • x5

ψ6(x1, x2) · ψ2(x2, x4, x5) · ψ9(x4, x5) · ψ8(x5)

  • O(N)

ϕ(x1, x2, x4) = ψ6(x1, x2) ·

x5

ψ2(x2, x4, x5) · ψ9(x4, x5) · ψ8(x5)

  • ψ10(x2,x4)
  • 38 / 78
slide-57
SLIDE 57

Example of FAQ Computation using InsideOut

X2 X1 X3 X4 X5 X7 X6 X8 ⇒ X2 X1 X4 X5 X7 X6 X8 ⇒ X2 X1 X4 X5 X7 X6 ⇒ X2 X1 X4 X5 X6 ⇒ X2 X1 X4 X5 ⇒ X2 X1 X4

ϕ(x1, x2, x4) =

  • x3,x5,x6,x7,x8

ψ1(x1, x2, x3) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ4(x6, x8) · ψ5(x5, x7) ϕ(x1, x2, x4) =

  • x5,x6,x7,x8

ψ6(x1, x2) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ4(x6, x8) · ψ5(x5, x7)

  • O(N)

ϕ(x1, x2, x4) =

  • x5,x6,x7

ψ6(x1, x2) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ7(x6) · ψ5(x5, x7)

  • O(N)

ϕ(x1, x2, x4) =

  • x5,x6

ψ6(x1, x2) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ7(x6) · ψ8(x5)

  • O(N)

ϕ(x1, x2, x4) =

  • x5

ψ6(x1, x2) · ψ2(x2, x4, x5) · ψ9(x4, x5) · ψ8(x5)

  • O(N)

ϕ(x1, x2, x4) = ψ6(x1, x2) · ψ10(x2, x4)

  • O(N)

39 / 78

slide-58
SLIDE 58

Example of FAQ Computation with Indicator Projections

ϕ(x1) =

  • x2,x3,x4,x5

ψ1(x1, x2)·ψ2(x2, x3)·ψ3(x3, x1)·ψ4(x1, x4)·ψ5(x4, x5)·ψ6(x5, x1)

X1 X2 X3 X4 X5 ψ1 ψ2 ψ3 ψ6 ψ5 ψ4 X1 X2 X3 X4 X5 key(X1) = ∅ key(X2) = {X1} key(X3) = {X1, X2} key(X4) = {X1} key(X5) = {X1, X4}

40 / 78

slide-59
SLIDE 59

Example of FAQ Computation with Indicator Projections

ϕ(x1) =

  • x2,x3,x4,x5

ψ1(x1, x2)·ψ2(x2, x3)·ψ3(x3, x1)·ψ4(x1, x4)·ψ5(x4, x5)·ψ6(x5, x1)

X1 X2 X3 X4 X5 ψ1 ψ2 ψ3 ψ6 ψ5 ψ4 X1 X2 X3 X4 X5 key(X1) = ∅ key(X2) = {X1} key(X3) = {X1, X2} key(X4) = {X1} key(X5) = {X1, X4}

ρ∗(ϕ) = 2.5, s(ϕ) = 1.5, fhtw(ϕ) = 1.5. The above variable order ∆ has the free variable x1 on top of the others and fhtw(∆) = 1.5. The (unary) query result has size O(N) when factorized or listed.

40 / 78

slide-60
SLIDE 60

Example of FAQ Computation with Indicator Projections

X1 X2 X3 X4 X5

ϕ(x1) =

  • x2,x3,x4,x5

ψ1(x1, x2) · ψ2(x2, x3) · ψ3(x3, x1) · ψ4(x1, x4) · ψ5(x4, x5) · ψ6(x5, x1)

41 / 78

slide-61
SLIDE 61

Example of FAQ Computation with Indicator Projections

X1 X2 X3 X4 X5

ϕ(x1) =

  • x2,x3,x4,x5

ψ1(x1, x2) · ψ2(x2, x3) · ψ3(x3, x1) · ψ4(x1, x4) · ψ5(x4, x5) · ψ6(x5, x1) ϕ(x1) =

  • x2,x4,x5

ψ1(x1, x2) ·

x3

ψ′

1(x1, x2) · ψ2(x2, x3) · ψ3(x3, x1) · ψ′ 4(x1) · ψ′ 6(x1)

  • ψ7(x1,x2)
  • ·

ψ4(x1, x4) · ψ5(x4, x5) · ψ6(x5, x1)

ψ′

1 is an indicator projection of ψ1 (similarly, ψ′ 4 and ψ′ 6):

It has the same support as ψ1, i.e., same tuples (x1, x2) ψ′

1(x1, x2) = 1 even in case ψ1(x1, x2) = 1 and ψ1(x1, x2) = 0

42 / 78

slide-62
SLIDE 62

Example of FAQ Computation with Indicator Projections

X1 X2 X3 X4 X5 ⇒ X1 X2 X4 X5

ϕ(x1) =

  • x2,x3,x4,x5

ψ1(x1, x2) · ψ2(x2, x3) · ψ3(x3, x1) · ψ4(x1, x4) · ψ5(x4, x5) · ψ6(x5, x1) ϕ(x1) =

  • x2,x4,x5

ψ1(x1, x2) · ψ7(x1, x2) · ψ4(x1, x4) · ψ5(x4, x5) · ψ6(x5, x1)

  • O(N1.5)

43 / 78

slide-63
SLIDE 63

Example of FAQ Computation with Indicator Projections

X1 X2 X3 X4 X5 ⇒ X1 X2 X4 X5

ϕ(x1) =

  • x2,x3,x4,x5

ψ1(x1, x2) · ψ2(x2, x3) · ψ3(x3, x1) · ψ4(x1, x4) · ψ5(x4, x5) · ψ6(x5, x1) ϕ(x1) =

  • x2,x4,x5

ψ1(x1, x2) · ψ7(x1, x2) · ψ4(x1, x4) · ψ5(x4, x5) · ψ6(x5, x1)

  • O(N1.5)

ϕ(x1) =

  • x4,x5

x2

ψ1(x1, x2) · ψ7(x1, x2) · ψ′

4(x1) · ψ′ 6(x1)

  • ψ8(x1)
  • ·

ψ4(x1, x4) · ψ5(x4, x5) · ψ6(x5, x1)

The indicator projections ψ′

4 and ψ′ 6 are redundant here, as they were already

used for computing φ7.

44 / 78

slide-64
SLIDE 64

Example of FAQ Computation with Indicator Projections

X1 X2 X3 X4 X5 ⇒ X1 X2 X4 X5 ⇒ X1 X4 X5

ϕ(x1) =

  • x2,x3,x4,x5

ψ1(x1, x2) · ψ2(x2, x3) · ψ3(x3, x1) · ψ4(x1, x4) · ψ5(x4, x5) · ψ6(x5, x1) ϕ(x1) =

  • x2,x4,x5

ψ1(x1, x2) · ψ7(x1, x2) · ψ4(x1, x4) · ψ5(x4, x5) · ψ6(x5, x1)

  • O(N1.5)

ϕ(x1) =

  • x4,x5

ψ8(x1) · ψ4(x1, x4) · ψ5(x4, x5) · ψ6(x5, x1)

  • O(N)

45 / 78

slide-65
SLIDE 65

Example of FAQ Computation with Indicator Projections

X1 X2 X3 X4 X5 ⇒ X1 X2 X4 X5 ⇒ X1 X4 X5

ϕ(x1) =

  • x2,x3,x4,x5

ψ1(x1, x2) · ψ2(x2, x3) · ψ3(x3, x1) · ψ4(x1, x4) · ψ5(x4, x5) · ψ6(x5, x1) ϕ(x1) =

  • x2,x4,x5

ψ1(x1, x2) · ψ7(x1, x2) · ψ4(x1, x4) · ψ5(x4, x5) · ψ6(x5, x1)

  • O(N1.5)

ϕ(x1) =

  • x4,x5

ψ8(x1) · ψ4(x1, x4) · ψ5(x4, x5) · ψ6(x5, x1)

  • O(N)

ϕ(x1) =

  • x4

ψ8(x1) · ψ4(x1, x4) ·

x5

ψ′

8(x1) · ψ′ 4(x1, x4) · ψ5(x4, x5) · ψ6(x5, x1)

  • ψ9(x1,x4)
  • 46 / 78
slide-66
SLIDE 66

Example of FAQ Computation with Indicator Projections

X1 X2 X3 X4 X5 ⇒ X1 X2 X4 X5 ⇒ X1 X4 X5 ⇒ X1 X4

ϕ(x1) =

  • x2,x3,x4,x5

ψ1(x1, x2) · ψ2(x2, x3) · ψ3(x3, x1) · ψ4(x1, x4) · ψ5(x4, x5) · ψ6(x5, x1) ϕ(x1) =

  • x2,x4,x5

ψ1(x1, x2) · ψ7(x1, x2) · ψ4(x1, x4) · ψ5(x4, x5) · ψ6(x5, x1)

  • O(N1.5)

ϕ(x1) =

  • x4,x5

ψ8(x1) · ψ4(x1, x4) · ψ5(x4, x5) · ψ6(x5, x1)

  • O(N)

ϕ(x1) =

  • x4

ψ8(x1) · ψ4(x1, x4) · ψ9(x1, x4)

  • O(N1.5)

47 / 78

slide-67
SLIDE 67

Example of FAQ Computation with Indicator Projections

X1 X2 X3 X4 X5 ⇒ X1 X2 X4 X5 ⇒ X1 X4 X5 ⇒ X1 X4

ϕ(x1) =

  • x2,x3,x4,x5

ψ1(x1, x2) · ψ2(x2, x3) · ψ3(x3, x1) · ψ4(x1, x4) · ψ5(x4, x5) · ψ6(x5, x1) ϕ(x1) =

  • x2,x4,x5

ψ1(x1, x2) · ψ7(x1, x2) · ψ4(x1, x4) · ψ5(x4, x5) · ψ6(x5, x1)

  • O(N1.5)

ϕ(x1) =

  • x4,x5

ψ8(x1) · ψ4(x1, x4) · ψ5(x4, x5) · ψ6(x5, x1)

  • O(N)

ϕ(x1) =

  • x4

ψ8(x1) · ψ4(x1, x4) · ψ9(x1, x4)

  • O(N1.5)

ϕ(x1) = ψ8(x1) ·

x4

ψ′

8(x1) · ψ4(x1, x4) · ψ9(x1, x4)

  • ψ10(x1)
  • 48 / 78
slide-68
SLIDE 68

Example of FAQ Computation with Indicator Projections

X1 X2 X3 X4 X5 ⇒ X1 X2 X4 X5 ⇒ X1 X4 X5 ⇒ X1 X4 ⇒ X1

ϕ(x1) =

  • x2,x3,x4,x5

ψ1(x1, x2) · ψ2(x2, x3) · ψ3(x3, x1) · ψ4(x1, x4) · ψ5(x4, x5) · ψ6(x5, x1) ϕ(x1) =

  • x2,x4,x5

ψ1(x1, x2) · ψ7(x1, x2) · ψ4(x1, x4) · ψ5(x4, x5) · ψ6(x5, x1)

  • O(N1.5)

ϕ(x1) =

  • x4,x5

ψ8(x1) · ψ4(x1, x4) · ψ5(x4, x5) · ψ6(x5, x1)

  • O(N)

ϕ(x1) =

  • x4

ψ8(x1) · ψ4(x1, x4) · ψ9(x1, x4)

  • O(N1.5)

ϕ(x1) = ψ8(x1) · ψ10(x1)

  • O(N)

49 / 78

slide-69
SLIDE 69

Is Factorized Aggregate Computation Practical?

A glimpse at performance experiments [ANNOS18b]

Retailer dataset (records) excerpt (17M) full (86M) PostgreSQL computing the join 50.63 sec 216.56 sec Aggregates for a linear regression model FDB computing join+aggregates 25.51 sec 380.31 sec Number of aggregates (scalar+group-by) 595+2,418 595+145k Aggregates for a polynomial regression model FDB computing join+aggregates 132.43 sec 1,819.80 sec Number of aggregates (scalar+group-by) 158k+742k 158k+37M

In this experiment: FDB only used one core of a commodity machine For both PostgreSQL and FDB, the dataset was entirely in memory The aggregates represent gradients (or parts thereof) used for learning degree 1 and 2 polynomial regression models

50 / 78

slide-70
SLIDE 70

51 / 78

Outline of Part 2: Aggregates

slide-71
SLIDE 71

Problem Setting

Maintain the triangle count Q under single-tuple updates to R, S, and T! R T S A B C Q counts the number of tuples in the join of R, S, and T. Q =

a,b,c R(a, b) · S(b, c) · T(c, a)

52 / 78

slide-72
SLIDE 72

Updates to the Triangle Count

R A B a1 b1 2 a2 b1 3 S B C b1 c1 2 b1 c2 1 T C A c1 a1 1 c2 a1 3 c2 a2 3

53 / 78

slide-73
SLIDE 73

Updates to the Triangle Count

R A B a1 b1 2 a2 b1 3 S B C b1 c1 2 b1 c2 1 T C A c1 a1 1 c2 a1 3 c2 a2 3 R · S · T A B C a1 b1 c2 2 · 2 · 1 = 4

53 / 78

slide-74
SLIDE 74

Updates to the Triangle Count

R A B a1 b1 2 a2 b1 3 S B C b1 c1 2 b1 c2 1 T C A c1 a1 1 c2 a1 3 c2 a2 3 R · S · T A B C a1 b1 c2 2 · 2 · 1 = 4 a1 b1 c2 2 · 1 · 3 = 6 a2 b1 c3 3 · 1 · 3 = 9

53 / 78

slide-75
SLIDE 75

Updates to the Triangle Count

R A B a1 b1 2 a2 b1 3 S B C b1 c1 2 b1 c2 1 T C A c1 a1 1 c2 a1 3 c2 a2 3 R · S · T A B C a1 b1 c2 2 · 2 · 1 = 4 a1 b1 c2 2 · 1 · 3 = 6 a2 b1 c3 3 · 1 · 3 = 9 Q(D) ∅ ( ) 4 + 6 + 9 = 19

53 / 78

slide-76
SLIDE 76

Updates to the Triangle Count

R A B a1 b1 2 a2 b1 3 S B C b1 c1 2 b1 c2 1 T C A c1 a1 1 c2 a1 3 c2 a2 3 R · S · T A B C a1 b1 c2 2 · 2 · 1 = 4 a1 b1 c2 2 · 1 · 3 = 6 a2 b1 c3 3 · 1 · 3 = 9 δR = {(a2, b1) → −2} A B a2 b1 −2 Q(D) ∅ ( ) 4 + 6 + 9 = 19

53 / 78

slide-77
SLIDE 77

Updates to the Triangle Count

R A B a1 b1 2 a2 b1 3 S B C b1 c1 2 b1 c2 1 T C A c1 a1 1 c2 a1 3 c2 a2 3 R · S · T A B C a1 b1 c2 2 · 2 · 1 = 4 a1 b1 c2 2 · 1 · 3 = 6 a2 b1 c3 3 · 1 · 3 = 9 δR = {(a2, b1) → −2} A B a2 b1 −2 Q(D) ∅ ( ) 4 + 6 + 9 = 19

53 / 78

slide-78
SLIDE 78

Updates to the Triangle Count

R A B a1 b1 2 a2 b1 1 S B C b1 c1 2 b1 c2 1 T C A c1 a1 1 c2 a1 3 c2 a2 3 R · S · T A B C a1 b1 c2 2 · 2 · 1 = 4 a1 b1 c2 2 · 1 · 3 = 6 a2 b1 c3 3 · 1 · 3 = 9 δR = {(a2, b1) → −2} A B a2 b1 −2 Q(D) ∅ ( ) 4 + 6 + 9 = 19

53 / 78

slide-79
SLIDE 79

Updates to the Triangle Count

R A B a1 b1 2 a2 b1 1 S B C b1 c1 2 b1 c2 1 T C A c1 a1 1 c2 a1 3 c2 a2 3 R · S · T A B C a1 b1 c2 2 · 2 · 1 = 4 a1 b1 c2 2 · 1 · 3 = 6 a2 b1 c3 3 · 1 · 3 = 9 δR = {(a2, b1) → −2} A B a2 b1 −2 Q(D) ∅ ( ) 4 + 6 + 9 = 19

53 / 78

slide-80
SLIDE 80

Updates to the Triangle Count

R A B a1 b1 2 a2 b1 1 S B C b1 c1 2 b1 c2 1 T C A c1 a1 1 c2 a1 3 c2 a2 3 R · S · T A B C a1 b1 c2 2 · 2 · 1 = 4 a1 b1 c2 2 · 1 · 3 = 6 a2 b1 c3 1 · 1 · 3 = 3 δR = {(a2, b1) → −2} A B a2 b1 −2 Q(D) ∅ ( ) 4 + 6 + 9 = 19

53 / 78

slide-81
SLIDE 81

Updates to the Triangle Count

R A B a1 b1 2 a2 b1 1 S B C b1 c1 2 b1 c2 1 T C A c1 a1 1 c2 a1 3 c2 a2 3 R · S · T A B C a1 b1 c2 2 · 2 · 1 = 4 a1 b1 c2 2 · 1 · 3 = 6 a2 b1 c3 1 · 1 · 3 = 3 δR = {(a2, b1) → −2} A B a2 b1 −2 Q(D) ∅ ( ) 4 + 6 + 3 = 13

53 / 78

slide-82
SLIDE 82

Data Updates need the Additive Inverse

Data updates can be inserts (tuples with positive multiplicity) and deletes (tuples with negative multiplicity): Semirings are enough if we only want inserts or no updates Recall that FAQs use commutative semirings (D, ⊕, ⊗): (D, ⊕) is a commutative monoid with identity element 0:

◮ (a ⊕ b) ⊕ c = a ⊕ (b ⊕ c) ◮ 0 ⊕ a = a ⊕ 0 = a ◮ a ⊕ b = b ⊕ a

(D, ⊗) is a commutative monoid with identity element 1:

◮ (a ⊗ b) ⊗ c = a ⊗ (b ⊗ c) ◮ 1 ⊗ a = a ⊗ 1 = a ◮ a ⊕ b = b ⊕ a

Multiplication distributes over addition:

◮ a ⊗ (b ⊕ c) = (a ⊗ b) ⊕ (a ⊗ c)

Multiplication by 0 annihilates D:

◮ 0 ⊗ a = a ⊗ 0 = 0

54 / 78

slide-83
SLIDE 83

From Semirings to Rings

We need a commutative ring (D, ⊕, ⊗) if we want to support deletes as well: (D, ⊕) is an abelian group with identity element 0:

◮ (a ⊕ b) ⊕ c = a ⊕ (b ⊕ c) ◮ 0 ⊕ a = a ⊕ 0 = a ◮ a ⊕ b = b ⊕ a ◮ ∃ − a ∈ D : a ⊕ (−a) = (−a) ⊕ a = 0

(D, ⊗) is a commutative monoid with identity element 1:

◮ (a ⊗ b) ⊗ c = a ⊗ (b ⊗ c) ◮ 1 ⊗ a = a ⊗ 1 = a ◮ a ⊕ b = b ⊕ a

Multiplication distributes over addition:

◮ a ⊗ (b ⊕ c) = (a ⊗ b) ⊕ (a ⊗ c)

Multiplication by 0 annihilates D:

◮ 0 ⊗ a = a ⊗ 0 = 0

Examples: Z, Q, R, C, Rn, polynomial ring. We used the ring (Z, +, ∗) in our previous example.

55 / 78

slide-84
SLIDE 84

The Maintenance Problem

database D0

single-tuple update

D1

single-tuple update

D2

single-tuple update

auxiliary data structure A0 A1

maintain

A2

maintain

triangle count Q(D0) Q(D1)

maintain

Q(D2)

maintain

Given a current database D and a single-tuple update, what are the time and space complexities for maintaining Q(D)?

56 / 78

slide-85
SLIDE 85

Much Ado about Triangles

The Triangle Query Served as Milestone in Many Fields Worst-case optimal join algorithms [Algorithmica 1997, SIGMOD R. 2013] Parallel query evaluation [Found. & Trends DB 2018] Randomized approximation in static settings [FOCS 2015] Randomized approximation in data streams

[SODA 2002, COCOON 2005, PODS 2006, PODS 2016, Theor. Comput. Sci. 2017]

Investigation of Answering Queries under Updates Theoretical developments [PODS 2017, ICDT 2018] Systems developments [F. & T. DB 2012, VLDB J. 2014, SIGMOD 2017, 2018] Lower bounds [STOC 2015, ICM 2018]

57 / 78

slide-86
SLIDE 86

Na¨ ıve Maintenance

“Compute from scratch!” δR = {(a′, b′) → m}

  • a,b,c
  • R(a, b) + δR(a, b)
  • newR
  • · S(b, c) · T(c, a)

=

  • a,b,c newR(a, b) · S(b, c) · T(c, a)

Maintenance Complexity Time: O(|D|1.5) using worst-case optimal join algorithms Space: O(|D|) to store input relations

58 / 78

slide-87
SLIDE 87

Classical Incremental View Maintenance [CY12]

“Compute the difference!” δR = {(a′, b′) → m}

  • a,b,c
  • R(a, b) + δR(a, b)
  • · S(b, c) · T(c, a)

=

  • a,b,c R(a, b) · S(b, c) · T(c, a)

+ δR(a′, b′) ·

c S(b′, c) · T(c, a′)

Maintenance Complexity Time: O(|D|) to intersect C-values from S and T Space: O(|D|) to store input relations

59 / 78

slide-88
SLIDE 88

Factorized Incremental View Maintenance [NO18]

“Compute the difference by using pre-materialized views!” δR = {(a′, b′) → m} Pre-materialize VST(b, a) =

c S(b, c) · T(c, a)!

  • a,b,c
  • R(a, b) + δR(a, b)
  • · S(b, c) · T(c, a)

=

  • a,b,c R(a, b) · S(b, c) · T(c, a)

+ δR(a′, b′) · VST(b′, a′) Maintenance Complexity Time for updates to R: O(1) to look up in VST Time for updates to S and T: O(|D|) to maintain VST Space: O(|D|2) to store input relations and VST

60 / 78

slide-89
SLIDE 89

Closing the Complexity Gap

Complexity bounds for the maintenance of the triangle count

Known Upper Bound

Maintenance Time: O(|D|) Space: O(|D|)

Lower Bound

Amortized maintenance time: not O(|D|0.5−γ) for any γ > 0 (under reasonable complexity theoretic assumptions)

61 / 78

slide-90
SLIDE 90

Closing the Complexity Gap

Complexity bounds for the maintenance of the triangle count

Known Upper Bound

Maintenance Time: O(|D|) Space: O(|D|) Can the triangle count be maintained in sublinear time?

Lower Bound

Amortized maintenance time: not O(|D|0.5−γ) for any γ > 0 (under reasonable complexity theoretic assumptions)

61 / 78

slide-91
SLIDE 91

Closing the Complexity Gap

Complexity bounds for the maintenance of the triangle count

Known Upper Bound

Maintenance Time: O(|D|) Space: O(|D|) Can the triangle count be maintained in sublinear time? Yes! IVMε [KNNOZ19] Amortized maintenance time: O(|D|0.5) This is worst-case optimal!

Lower Bound

Amortized maintenance time: not O(|D|0.5−γ) for any γ > 0 (under reasonable complexity theoretic assumptions)

61 / 78

slide-92
SLIDE 92

IVMε Exhibits a Time-Space Tradeoff [KNNOZ19]

Given ε ∈ [0, 1] and a database D, IVMε maintains the triangle count with O(|D|max{ε,1−ε}) amortized update time O(|D|1+min{ε,1−ε}) space O(|D|3/2) preprocessing time O(1) answer time.

0.5 1 O(|D|0.5) O(|D|) O(|D|1.5) ε Space Update Time complexity worst-case optimality ε = 0.5

Known maintenance approaches are recovered by IVMε.

62 / 78

slide-93
SLIDE 93

Main Ideas in IVMε

Compute the difference like in classical IVM! Materialize views like in Factorized IVM! New ingredient: Use adaptive processing based on data skew! = ⇒ Treat heavy values differently from light values!

63 / 78

slide-94
SLIDE 94

Quick Look inside IVMε

Partition R into a light part RL = {t ∈ R | |σA=t.A| < |D|ε}, a heavy part RH = R\RL!

R A B · · a b1 . . . . . . a bn · · · · a′ b′

1

. . . . . . . . . . . . a′ b′

m

· ·

light part

RL A B . . . . . .

heavy part

RH A B . . . . . .

n < |D|ε m ≥ |D|ε

64 / 78

slide-95
SLIDE 95

Quick Look inside IVMε

Partition R into a light part RL = {t ∈ R | |σA=t.A| < |D|ε}, a heavy part RH = R\RL!

R A B · · a b1 . . . . . . a bn · · · · a′ b′

1

. . . . . . . . . . . . a′ b′

m

· ·

light part

RL A B . . . . . .

heavy part

RH A B . . . . . .

n < |D|ε m ≥ |D|ε Derived Bounds for all A-values a: |σA=aRL| < |D|ε |πARH| ≤ |D|1−ε

64 / 78

slide-96
SLIDE 96

Quick Look inside IVMε

Partition R into a light part RL = {t ∈ R | |σA=t.A| < |D|ε}, a heavy part RH = R\RL!

R A B · · a b1 . . . . . . a bn · · · · a′ b′

1

. . . . . . . . . . . . a′ b′

m

· ·

light part

RL A B . . . . . .

heavy part

RH A B . . . . . .

n < |D|ε m ≥ |D|ε Derived Bounds for all A-values a: |σA=aRL| < |D|ε |πARH| ≤ |D|1−ε Likewise, partition S = SL ∪ SH based on B, and T = TL ∪ TH based on C!

64 / 78

slide-97
SLIDE 97

Quick Look inside IVMε

Partition R into a light part RL = {t ∈ R | |σA=t.A| < |D|ε}, a heavy part RH = R\RL!

R A B · · a b1 . . . . . . a bn · · · · a′ b′

1

. . . . . . . . . . . . a′ b′

m

· ·

light part

RL A B . . . . . .

heavy part

RH A B . . . . . .

n < |D|ε m ≥ |D|ε Derived Bounds for all A-values a: |σA=aRL| < |D|ε |πARH| ≤ |D|1−ε Likewise, partition S = SL ∪ SH based on B, and T = TL ∪ TH based on C! Q is the sum of skew-aware views RU(a, b) · SV (b, c) · TW (c, a) with U, V , W ∈ {L, H}.

64 / 78

slide-98
SLIDE 98

Adaptive Maintenance Strategy

Given an update δR∗ = {(a′, b′) → m}, compute the difference for each skew-aware view using different strategies: Skew-aware View Evaluation from left to right Time

  • a,b,c

R∗(a, b) · SL(b, c) · TL(c, a) δR∗(a′, b′) ·

c

SL(b′, c) · TL(c, a′) O(|D|ε)

65 / 78

slide-99
SLIDE 99

Adaptive Maintenance Strategy

Given an update δR∗ = {(a′, b′) → m}, compute the difference for each skew-aware view using different strategies: Skew-aware View Evaluation from left to right Time

  • a,b,c

R∗(a, b) · SL(b, c) · TL(c, a) δR∗(a′, b′) ·

c

SL(b′, c) · TL(c, a′) O(|D|ε)

  • a,b,c

R∗(a, b) · SH(b, c) · TH(c, a) δR∗(a′, b′) ·

c

TH(c, a′) · SH(b′, c) O(|D|1−ε)

65 / 78

slide-100
SLIDE 100

Adaptive Maintenance Strategy

Given an update δR∗ = {(a′, b′) → m}, compute the difference for each skew-aware view using different strategies: Skew-aware View Evaluation from left to right Time

  • a,b,c

R∗(a, b) · SL(b, c) · TL(c, a) δR∗(a′, b′) ·

c

SL(b′, c) · TL(c, a′) O(|D|ε)

  • a,b,c

R∗(a, b) · SH(b, c) · TH(c, a) δR∗(a′, b′) ·

c

TH(c, a′) · SH(b′, c) O(|D|1−ε) δR∗(a′, b′) ·

c

SL(b′, c) · TH(c, a′) O(|D|ε)

  • a,b,c

R∗(a, b) · SL(b, c) · TH(c, a)

  • r

δR∗(a′, b′) ·

c

TH(c, a′) · SL(b′, c) O(|D|1−ε)

65 / 78

slide-101
SLIDE 101

Adaptive Maintenance Strategy

Given an update δR∗ = {(a′, b′) → m}, compute the difference for each skew-aware view using different strategies: Skew-aware View Evaluation from left to right Time

  • a,b,c

R∗(a, b) · SL(b, c) · TL(c, a) δR∗(a′, b′) ·

c

SL(b′, c) · TL(c, a′) O(|D|ε)

  • a,b,c

R∗(a, b) · SH(b, c) · TH(c, a) δR∗(a′, b′) ·

c

TH(c, a′) · SH(b′, c) O(|D|1−ε) δR∗(a′, b′) ·

c

SL(b′, c) · TH(c, a′) O(|D|ε)

  • a,b,c

R∗(a, b) · SL(b, c) · TH(c, a)

  • r

δR∗(a′, b′) ·

c

TH(c, a′) · SL(b′, c) O(|D|1−ε)

  • a,b,c

R∗(a, b) · SH(b, c) · TL(c, a) δR∗(a′, b′) · VST(b′, a′) O(1)

65 / 78

slide-102
SLIDE 102

Adaptive Maintenance Strategy

Given an update δR∗ = {(a′, b′) → m}, compute the difference for each skew-aware view using different strategies: Skew-aware View Evaluation from left to right Time

  • a,b,c

R∗(a, b) · SL(b, c) · TL(c, a) δR∗(a′, b′) ·

c

SL(b′, c) · TL(c, a′) O(|D|ε)

  • a,b,c

R∗(a, b) · SH(b, c) · TH(c, a) δR∗(a′, b′) ·

c

TH(c, a′) · SH(b′, c) O(|D|1−ε) δR∗(a′, b′) ·

c

SL(b′, c) · TH(c, a′) O(|D|ε)

  • a,b,c

R∗(a, b) · SL(b, c) · TH(c, a)

  • r

δR∗(a′, b′) ·

c

TH(c, a′) · SL(b′, c) O(|D|1−ε)

  • a,b,c

R∗(a, b) · SH(b, c) · TL(c, a) δR∗(a′, b′) · VST(b′, a′) O(1) Overall update time: O(|D|max{ε,1−ε})

65 / 78

slide-103
SLIDE 103

Materialized Auxiliary Views

VRS(a, c) =

b

RH(a, b) · SL(b, c) VST(b, a) =

c

SH(b, c) · TL(c, a) VTR(c, b) =

a

TH(c, a) · RL(a, b) Maintenance of VRS(a, c) =

b

RH(a, b) · SL(b, c) Update Compute the difference for VRS Time δRH = {(a′, b′) → m} δRH(a′, b′) · SL(b′, c) O(|D|ε) δSL = {(b′, c′) → m} δSL(b′, c′) · RH(a, b′) O(|D|1−ε)

66 / 78

slide-104
SLIDE 104

Materialized Auxiliary Views

VRS(a, c) =

b

RH(a, b) · SL(b, c) VST(b, a) =

c

SH(b, c) · TL(c, a) VTR(c, b) =

a

TH(c, a) · RL(a, b) Maintenance of VRS(a, c) =

b

RH(a, b) · SL(b, c) Update Compute the difference for VRS Time δRH = {(a′, b′) → m} δRH(a′, b′) · SL(b′, c) O(|D|ε) δSL = {(b′, c′) → m} δSL(b′, c′) · RH(a, b′) O(|D|1−ε) Size of VRS(a, c) =

b

RH(a, b) · SL(b, c) |VRS(a, c)| ≤ |RH| · maxb{|SL(b, c)|} = O(|D|1+ε) |VRS(a, c)| ≤ |SL| · maxb{|RH(a, b)|} = O(|D|1+(1−ε))

66 / 78

slide-105
SLIDE 105

Materialized Auxiliary Views

VRS(a, c) =

b

RH(a, b) · SL(b, c) VST(b, a) =

c

SH(b, c) · TL(c, a) VTR(c, b) =

a

TH(c, a) · RL(a, b) Maintenance of VRS(a, c) =

b

RH(a, b) · SL(b, c) Update Compute the difference for VRS Time δRH = {(a′, b′) → m} δRH(a′, b′) · SL(b′, c) O(|D|ε) δSL = {(b′, c′) → m} δSL(b′, c′) · RH(a, b′) O(|D|1−ε) Size of VRS(a, c) =

b

RH(a, b) · SL(b, c) |VRS(a, c)| ≤ |RH| · maxb{|SL(b, c)|} = O(|D|1+ε) |VRS(a, c)| ≤ |SL| · maxb{|RH(a, b)|} = O(|D|1+(1−ε)) Overall: Update Time O(|D|max{ε,1−ε}) and Space O(|D|1+min{ε,1−ε})

66 / 78

slide-106
SLIDE 106

Rebalancing Partitions

Full details available in the paper [KNNOZ19] Updates can change the frequencies of values and the heavy/light threshold! This may require rebalancing of partitions: = ⇒ Minor rebalancing: Transfer tuples from one to the other part of the same relation! = ⇒ Major rebalancing: Recompute partitions and views from scratch! Both forms of rebalancing require superlinear time. The rebalancing times amortize over sequences of updates.

67 / 78

slide-107
SLIDE 107

Lower Bound for Maintaining the Triangle Count

The lower bound already holds for the Boolean Triangle Detection Problem, which is a special case of the Triangle Count. For any γ > 0, there is no algorithm that incrementally maintains the Triangle Detection Problem with amortized update time answer time O(|D|

1 2 −γ)

O(|D|1−γ) unless the Online Vector-Matrix-Vector Multiplication (OuMv) Conjecture fails.

68 / 78

slide-108
SLIDE 108

Lower Bound for Maintaining the Triangle Count

The lower bound already holds for the Boolean Triangle Detection Problem, which is a special case of the Triangle Count. For any γ > 0, there is no algorithm that incrementally maintains the Triangle Detection Problem with amortized update time answer time O(|D|

1 2 −γ)

O(|D|1−γ) unless the Online Vector-Matrix-Vector Multiplication (OuMv) Conjecture fails. The OuMv Problem Input: An n × n Boolean matrix M and n pairs (u1, v1), . . . , (un, vn) of Boolean column-vectors of size n arriving one after the other. Goal: After seeing each pair (ur, vr), output uT

r Mvr

The OuMv Conjecture [HKNS15] For any γ > 0, there is no algorithm that solves the OuMv Problem in time O(n3−γ).

68 / 78

slide-109
SLIDE 109

Proof Sketch (1/2)

Assume there is an algorithm A maintaining Triangle Detection with amortized update time answer time O(|D|

1 2 −γ)

O(|D|1−γ) for some γ > 0. Goal: Design an algorithm B using algorithm A as oracle that solves OuMv in subcubic time. = ⇒ Contradicts the OuMv Conjecture!

69 / 78

slide-110
SLIDE 110

Proof Sketch (1/2)

Assume there is an algorithm A maintaining Triangle Detection with amortized update time answer time O(|D|

1 2 −γ)

O(|D|1−γ) for some γ > 0. Goal: Design an algorithm B using algorithm A as oracle that solves OuMv in subcubic time. = ⇒ Contradicts the OuMv Conjecture! Algorithm B Use relation S to encode the matrix M. In each round r ∈ [n]:

◮ Use relations R and T to encode ur and vr, respectively, such that

uT

r Mvr = 1

if and only if R ✶ S ✶ T = ∅

◮ Check whether R ✶ S ✶ T contains a triangle.

69 / 78

slide-111
SLIDE 111

Proof Sketch (2/2)

Algorithm B in more detail: (1) Insert at most n2 tuples into S such that S(B, C) = {(i, j) | M(i, j) = 1}

70 / 78

slide-112
SLIDE 112

Proof Sketch (2/2)

Algorithm B in more detail: (1) Insert at most n2 tuples into S such that S(B, C) = {(i, j) | M(i, j) = 1} (2) In each round r ∈ [n]:

◮ Delete all tuples in R(A, B) and T(C, A)

70 / 78

slide-113
SLIDE 113

Proof Sketch (2/2)

Algorithm B in more detail: (1) Insert at most n2 tuples into S such that S(B, C) = {(i, j) | M(i, j) = 1} (2) In each round r ∈ [n]:

◮ Delete all tuples in R(A, B) and T(C, A) ◮ Insert at most n tuples into R(A, B) and T(C, A) such that

R = {(a, i) | ur(i) = 1} and T = {(i, a) | vr(i) = 1} for a constant a

70 / 78

slide-114
SLIDE 114

Proof Sketch (2/2)

Algorithm B in more detail: (1) Insert at most n2 tuples into S such that S(B, C) = {(i, j) | M(i, j) = 1} (2) In each round r ∈ [n]:

◮ Delete all tuples in R(A, B) and T(C, A) ◮ Insert at most n tuples into R(A, B) and T(C, A) such that

R = {(a, i) | ur(i) = 1} and T = {(i, a) | vr(i) = 1} for a constant a

◮ Check R ✶ S ✶ T = ∅: This holds iff uT

r Mvr = 1

iff ∃i, j ∈ [n] with ur(i) = 1, M(i, j) = 1, and vr(j) = 1

70 / 78

slide-115
SLIDE 115

Proof Sketch (2/2)

Algorithm B in more detail: (1) Insert at most n2 tuples into S such that S(B, C) = {(i, j) | M(i, j) = 1} (2) In each round r ∈ [n]:

◮ Delete all tuples in R(A, B) and T(C, A) ◮ Insert at most n tuples into R(A, B) and T(C, A) such that

R = {(a, i) | ur(i) = 1} and T = {(i, a) | vr(i) = 1} for a constant a

◮ Check R ✶ S ✶ T = ∅: This holds iff uT

r Mvr = 1

iff ∃i, j ∈ [n] with ur(i) = 1, M(i, j) = 1, and vr(j) = 1

Time analysis: The database size is O(n2).

70 / 78

slide-116
SLIDE 116

Proof Sketch (2/2)

Algorithm B in more detail: (1) Insert at most n2 tuples into S such that S(B, C) = {(i, j) | M(i, j) = 1} (2) In each round r ∈ [n]:

◮ Delete all tuples in R(A, B) and T(C, A) ◮ Insert at most n tuples into R(A, B) and T(C, A) such that

R = {(a, i) | ur(i) = 1} and T = {(i, a) | vr(i) = 1} for a constant a

◮ Check R ✶ S ✶ T = ∅: This holds iff uT

r Mvr = 1

iff ∃i, j ∈ [n] with ur(i) = 1, M(i, j) = 1, and vr(j) = 1

Time analysis: The database size is O(n2). Time in (1): O( n2

  • #updates

· (n2)

1 2 −γ

  • update time

) = O(n2 · n1−2γ) = O(n3−2γ)

70 / 78

slide-117
SLIDE 117

Proof Sketch (2/2)

Algorithm B in more detail: (1) Insert at most n2 tuples into S such that S(B, C) = {(i, j) | M(i, j) = 1} (2) In each round r ∈ [n]:

◮ Delete all tuples in R(A, B) and T(C, A) ◮ Insert at most n tuples into R(A, B) and T(C, A) such that

R = {(a, i) | ur(i) = 1} and T = {(i, a) | vr(i) = 1} for a constant a

◮ Check R ✶ S ✶ T = ∅: This holds iff uT

r Mvr = 1

iff ∃i, j ∈ [n] with ur(i) = 1, M(i, j) = 1, and vr(j) = 1

Time analysis: The database size is O(n2). Time in (1): O( n2

  • #updates

· (n2)

1 2 −γ

  • update time

) = O(n2 · n1−2γ) = O(n3−2γ) Time for each round in (2): O( 4n

  • #updates

· (n2)

1 2 −γ

  • update time

+ (n2)1−γ

answer time

) = O(n2−2γ)

70 / 78

slide-118
SLIDE 118

Proof Sketch (2/2)

Algorithm B in more detail: (1) Insert at most n2 tuples into S such that S(B, C) = {(i, j) | M(i, j) = 1} (2) In each round r ∈ [n]:

◮ Delete all tuples in R(A, B) and T(C, A) ◮ Insert at most n tuples into R(A, B) and T(C, A) such that

R = {(a, i) | ur(i) = 1} and T = {(i, a) | vr(i) = 1} for a constant a

◮ Check R ✶ S ✶ T = ∅: This holds iff uT

r Mvr = 1

iff ∃i, j ∈ [n] with ur(i) = 1, M(i, j) = 1, and vr(j) = 1

Time analysis: The database size is O(n2). Time in (1): O( n2

  • #updates

· (n2)

1 2 −γ

  • update time

) = O(n2 · n1−2γ) = O(n3−2γ) Time for each round in (2): O( 4n

  • #updates

· (n2)

1 2 −γ

  • update time

+ (n2)1−γ

answer time

) = O(n2−2γ) Time for n rounds in (2): O(n · n2−2γ) = O(n3−2γ)

70 / 78

slide-119
SLIDE 119

Proof Sketch (2/2)

Algorithm B in more detail: (1) Insert at most n2 tuples into S such that S(B, C) = {(i, j) | M(i, j) = 1} (2) In each round r ∈ [n]:

◮ Delete all tuples in R(A, B) and T(C, A) ◮ Insert at most n tuples into R(A, B) and T(C, A) such that

R = {(a, i) | ur(i) = 1} and T = {(i, a) | vr(i) = 1} for a constant a

◮ Check R ✶ S ✶ T = ∅: This holds iff uT

r Mvr = 1

iff ∃i, j ∈ [n] with ur(i) = 1, M(i, j) = 1, and vr(j) = 1

Time analysis: The database size is O(n2). Time in (1): O( n2

  • #updates

· (n2)

1 2 −γ

  • update time

) = O(n2 · n1−2γ) = O(n3−2γ) Time for each round in (2): O( 4n

  • #updates

· (n2)

1 2 −γ

  • update time

+ (n2)1−γ

answer time

) = O(n2−2γ) Time for n rounds in (2): O(n · n2−2γ) = O(n3−2γ) Overall time: subcubic! O(n3−2γ)

70 / 78

slide-120
SLIDE 120

Notes on the OuMv Conjecture

The hardness of many dynamic problems is based on Online Matrix-vector multiplication problem (OMv) OuMv is at least as hard as OMv Examples: [HKNS15] source-target reachability source-target shortest path (in unweighted graphs) transitive closure

71 / 78

slide-121
SLIDE 121

72 / 78

Outline of Part 2: Aggregates

slide-122
SLIDE 122

References

D99 Bucket Elimination: A Unifying Framework for Reasoning.

  • Dechter. In Artif. Intell. 113(1-2): 41-85 (1999)

https://www.ics.uci.edu/~dechter/publications/r48b.pdf AM00 The generalized distributive law. Aji, McEliece. In IEEE Trans. Information Theory 46(2): 325-343 (2000) https://ieeexplore.ieee.org/document/825794/ CY12 Materialized Views. Chirkova, Yang. In Foundations & Trends DB, 4(4):295–405, 2012. https://ieeexplore.ieee.org/document/8187272?arnumber=8187272 BKOZ13 Aggregation and Ordering in Factorised Databases. Bakibayev, Kocisky, Olteanu, Zavodny. In PVLDB 2013. https://arxiv.org/abs/1307.0441 HKNS15 Unifying and Strengthening Hardness for Dynamic Problems via the Online Matrix-Vector Multiplication Conjecture. Henzinger, Krinninger, Nanongkai, Saranurak. In STOC 2015. https://arxiv.org/abs/1511.06773 ANR16 FAQ: Questions Asked Frequently. Abo Khamis, Ngo, Rudra. In PODS 2016. https://arxiv.org/abs/1504.04044

73 / 78

slide-123
SLIDE 123

References

OS16 Factorized Databases. Olteanu, Schleich. In SIGMOD Record 45(2): 5-16 (2016). https://dl.acm.org/citation.cfm?doid=3003665.3003667 NO18 Incremental View Maintenance with Triple Lock Factorization Benefits. Nikolic, Olteanu. In SIGMOD 2018. https://arxiv.org/abs/1703.07484 ANNOS18b AC/DC: In-Database Learning Thunderstruck. Abo Khamis, Ngo, Nguyen, Olteanu, Schleich. In DEEM@SIGMOD 2018. https://arxiv.org/abs/1803.07480 KNNOZ19 Counting triangles under updates in worst-case optimal time. Kara, Ngo, Nikolic, Olteanu, Zhang. In ICDT 2019. http://arxiv.org/abs/1804.02780

74 / 78

slide-124
SLIDE 124

75 / 78

Outline of Part 2: Aggregates

slide-125
SLIDE 125

QUIZ on Aggregates (1/3)

For each of of the following functional aggregate queries:

  • 1. Give a hypertree decomposition and variable order.
  • 2. If you were to compute it as stated below (with all sums done after the

products), what would be its time complexity? (Assume all functions have the same size.)

  • 3. Is there an equivalent rewriting of ϕ that would allow for quadratic or even

linear time complexity? The n-hop query: ϕ(x1, xn+1) =

  • x2,...,xn

ψ1(X1, X2) · ψ2(X2, X3) · ψ3(X3, X4) · . . . · ψn(Xn, Xn+1).

76 / 78

slide-126
SLIDE 126

QUIZ on Aggregates (2/3)

For each of of the following functional aggregate queries:

  • 1. Give a hypertree decomposition and variable order.
  • 2. If you were to compute it as stated below (with all sums done after the

products), what would be its time complexity? Assume all functions have the same size.

  • 3. Is there an equivalent rewriting of ϕ that would allow for quadratic or even

linear time complexity? Query: ϕ =

  • a
  • b
  • c
  • f
  • d
  • e

ψ1(a, b) · ψ2(a, c) · ψ3(c, d) · ψ4(b, c, e) · ψ5(e, f ).

77 / 78

slide-127
SLIDE 127

QUIZ on Aggregates (3/3)

Give the update time and necessary space for the maintenance of the following FAQs as a function of the database size and the heavy/light threshold parameter ǫ ∈ [0, 1]: ϕ1 =

a,b,c,d R(a, b) · S(b, c) · T(c, d)

ϕ2 =

a,b,c,d R(a, b) · S(b, c) · T(b, d)

ϕ3 =

a,b,c,d R(a, b) · S(b, c) · T(c, d) · W (d, a)

78 / 78