Multi-join Query Evaluation on Big Data – Lecture 3
Dan Suciu, March 2015


SLIDE 1

Algorithm Lower Bound Equivalence Summary

Multi-join Query Evaluation on Big Data Lecture 3

Dan Suciu March, 2015

Dan Suciu Multi-Joins – Lecture 3 March, 2015 1 / 26

SLIDE 2

Multi-join Query Evaluation – Outline

Part 1: Optimal Sequential Algorithms.
Part 2: Lower Bounds for Parallel Algorithms.
Part 3: Optimal Parallel Algorithms.
Part 4: Data Skew.

SLIDE 3

Summary so far

Q(x) = R1(x1), ..., Rℓ(xℓ),  |R1| = m1, ..., |Rℓ| = mℓ

Sequential world. Cost: the output size of Q.
Upper bound: AGM(Q) = m^{ρ∗}. Fractional edge cover.
Lower bound (tightness): fractional vertex packing.
Generic-join algorithm.

Parallel world. Cost: communication; 1-round, skew-free, equal cardinalities.
Lower bound: m/p^{1/τ∗}. Fractional edge packing.
Upper bound: fractional vertex cover.
HyperCube algorithm.


SLIDE 6

Outline of Lecture 3

HyperCube algorithm for arbitrary cardinalities.
Lower bound formula for arbitrary cardinalities.
Proof that the two are equal.
Summary.
We will consider only databases without skew.

SLIDE 7

Why Databases without Skew Matter

In practice, skewed values are detected and treated separately; the cost should be a function of the degree of skew.
Example: the join Q(x,y,z) = R(x,y), S(y,z).
Without skew: L = m/p (the common case).
With skew, as bad as a cartesian product: L ≥ m/p^{1/2}.
In general, for any query Q:
Without skew: L = m/p^{1/τ∗}.
With skew: L ≥ m/p^{1/ρ∗} (Lecture 2).

SLIDE 8

Review of the HyperCube Algorithm

Afrati and Ullman described in EDBT'2010 an algorithm for computing any conjunctive query in one MapReduce job; it is the same as a one-round algorithm in the MPC model. Later it was called the Shares algorithm. Beame, Koutris, and Suciu analyzed in PODS'2013 and PODS'2014 the parameters of the algorithm and called it the HyperCube algorithm. We will use this name.


SLIDE 10

Review of the HyperCube Algorithm

Q(x) = R1(x1), ..., Rℓ(xℓ),  |R1| = m1, ..., |Rℓ| = mℓ. Compute Q on p servers.

Organize the p servers into a hypercube: [p] = [p1] × ⋯ × [pk]. The numbers p1, ..., pk are called shares.
Choose k independent hash functions h1, ..., hk, one per variable.
Round 1: each server sends each tuple Rj(xj1, xj2, ...) to every server whose coordinates in dimensions j1, j2, ... are hj1(xj1), hj2(xj2), ..., broadcasting along the missing dimensions. Then each server computes Q on its local data.
Problem: compute the shares p1, ..., pk.
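The routing rule above can be sketched in code. A minimal Python sketch for the triangle query Q(x,y,z) = R(x,y), S(y,z), T(z,x); the 2×2×2 cube, the hash function, and all names are illustrative assumptions, not from the slides:

```python
import itertools

P = {"x": 2, "y": 2, "z": 2}          # shares: a 2x2x2 cube, p = 8 servers

def h(var, val):                      # stand-in for the independent hash functions
    return hash((var, val)) % P[var]

def destinations(rel_vars, tup):
    """Servers that receive a tuple: hash on the relation's own variables,
    broadcast along the missing dimensions of the cube."""
    axes = []
    for var in ("x", "y", "z"):
        if var in rel_vars:
            axes.append([h(var, tup[rel_vars.index(var)])])
        else:
            axes.append(range(P[var]))    # broadcast dimension
    return set(itertools.product(*axes))

# An R(x,y)-tuple (1,2) is replicated along the z-dimension: p_z = 2 copies.
assert len(destinations(("x", "y"), (1, 2))) == 2
# Matching R, S, T tuples for (x,y,z) = (1,2,3) meet on one common server.
common = (destinations(("x", "y"), (1, 2))
          & destinations(("y", "z"), (2, 3))
          & destinations(("z", "x"), (3, 1)))
assert common == {(h("x", 1), h("y", 2), h("z", 3))}
```

Each relation is hashed on its own variables only, so its replication factor is the product of the shares of the dimensions it misses.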


SLIDE 13

The HyperCube Algorithm – Computing the Shares

Q(x) = R1(x1), ..., Rℓ(xℓ),  |R1| = m1, ..., |Rℓ| = mℓ

The Shares Problem: find shares p1, ..., pk such that ∏i pi = p and the load is minimized. The number of tuples that a server receives from Rj is:

mj / ∏_{i: xi ∈ Rj} pi

[Afrati&Ullman'10] Optimize L = ∑j mj / ∏_{i: xi ∈ Rj} pi. Non-linear.

[Beame'14] Optimize L = maxj mj / ∏_{i: xi ∈ Rj} pi:

The Shares Problem:
minimize L
p1 · p2 ⋯ pk ≤ p
∀j: L ≥ mj / ∏_{i: xi ∈ Rj} pi

We will show that this is equivalent to a linear optimization problem.


SLIDE 18

E-Shares: A Linear Optimization Problem

Q(x) = R1(x1), ..., Rℓ(xℓ),  |R1| = m1, ..., |Rℓ| = mℓ. Optimization problem: find shares p1, ..., pk. Taking logp of every quantity turns the Shares Problem into a linear program:

The Shares Problem:
minimize L
p1 · p2 ⋯ pk ≤ p
∀j: L ≥ mj / ∏_{i: xi ∈ Rj} pi

The E-Shares Linear Problem (each value replaced by its logp):
shares p1, ..., pk ↦ e1, ..., ek;  sizes m1, ..., mℓ ↦ µ1, ..., µℓ;  load L ↦ λ
minimize λ
−e1 − e2 − ... − ek ≥ −1
∀j: λ + ∑_{i: xi ∈ Rj} ei ≥ µj

Optimal shares: p1 = p^{e1∗}, ..., pk = p^{ek∗}. Optimal load: L = p^{λ∗}.
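As a sanity check on the Shares problem, one can grid-search the share exponents for the triangle query instead of solving the LP. This brute-force sketch is not the lecture's method; the grid resolution and all names are assumptions:

```python
import itertools

def load_exponent(e, mu):
    """lambda = max_j (mu_j - sum of e_i over R_j's variables), for the
    triangle query R(x,y), S(y,z), T(z,x) with e = (e_x, e_y, e_z)."""
    ex, ey, ez = e
    return max(mu[0] - ex - ey, mu[1] - ey - ez, mu[2] - ez - ex)

def best_shares(mu, steps=100):
    """Grid-search exponents with e_x + e_y + e_z = 1 (use the full budget)."""
    best = None
    for i, j in itertools.product(range(steps + 1), repeat=2):
        ex, ey = i / steps, j / steps
        ez = 1.0 - ex - ey
        if ez < 0:
            continue
        lam = load_exponent((ex, ey, ez), mu)
        if best is None or lam < best[0]:
            best = (lam, (ex, ey, ez))
    return best

# Equal cardinalities mu_j = 1 (m_j = p): lambda* = 1/3, i.e. shares
# p^{1/3} each, so L = p^{1/3} = m / p^{2/3} = m / p^{1/tau*}, tau* = 3/2.
lam, e = best_shares((1.0, 1.0, 1.0))
assert abs(lam - 1/3) < 0.01
```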

SLIDE 19

Discussion

For equal cardinalities, L = m/p^{1/τ∗}: the speedup is given by the optimal fractional edge packing. What is the speedup now? The E-Shares formula L = p^{λ∗} is not insightful, since λ∗ depends on µ1, ..., µℓ. Goal: analyze how L depends on p (the speedup) and on the cardinalities m1, ..., mℓ.

SLIDE 20

Review: Fractional Vertex Cover / Edge Packing

Hypergraph of Q = R1(x1), ..., Rℓ(xℓ): nodes x1, ..., xk; edges R1, ..., Rℓ. A vertex cover is a set of nodes such that every edge contains a node in the set.

Definition: A fractional vertex cover of Q is a sequence v1 ≥ 0, ..., vk ≥ 0 such that ∀j: ∑_{i: xi ∈ Rj} vi ≥ 1.

An edge packing is a set of edges with no vertex in common.

Definition: A fractional edge packing of Q is a sequence u1 ≥ 0, ..., uℓ ≥ 0 such that ∀i: ∑_{j: xi ∈ Rj} uj ≤ 1.

By LP duality: min_v ∑i vi = max_u ∑j uj = τ∗.
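The two definitions can be checked mechanically. A small sketch for the triangle query; the dictionaries and names are illustrative:

```python
# Triangle query: nodes x, y, z; edges R = {x,y}, S = {y,z}, T = {z,x}.
EDGES = {"R": {"x", "y"}, "S": {"y", "z"}, "T": {"z", "x"}}
NODES = {"x", "y", "z"}

def is_vertex_cover(v):   # v: weight per node; every edge must be covered
    return all(sum(v[i] for i in e) >= 1 for e in EDGES.values())

def is_edge_packing(u):   # u: weight per edge; no node may be overloaded
    return all(sum(u[j] for j, e in EDGES.items() if i in e) <= 1
               for i in NODES)

# All-halves is both a fractional vertex cover and a fractional edge
# packing, and the two objectives meet at tau* = 3/2, as duality predicts.
v = {i: 0.5 for i in NODES}
u = {j: 0.5 for j in EDGES}
assert is_vertex_cover(v) and is_edge_packing(u)
assert sum(v.values()) == sum(u.values()) == 1.5
```

Matching objective values certify optimality of both solutions by weak duality.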


SLIDE 22

Adding v0

We add a variable v0 to the fractional vertex cover LP, which becomes the new objective:

Fractional Vertex Cover:
minimize ∑i vi
∀j: ∑_{i: xi ∈ Rj} vi ≥ 1

Fractional Vertex Cover with v0:
minimize v0
−∑i vi ≥ −v0
∀j: ∑_{i: xi ∈ Rj} vi ≥ 1

SLIDE 23

Checking the Equal-Cardinalities Case

When m1 = ... = mℓ = m, we know L = m/p^{1/τ∗}. What does E-Shares do? With µ1 = ... = µℓ = µ:

E-Shares LP:
minimize λ
−∑i ei ≥ −1
∀j: ∑_{i: xi ∈ Rj} ei ≥ µ − λ

Fractional Vertex Cover with v0:
minimize v0
−∑i vi ≥ −v0
∀j: ∑_{i: xi ∈ Rj} vi ≥ 1

The following are 1-1 inverse mappings between feasible solutions:

(λ, e1, ..., ek) ↦ (v0 = 1/(µ−λ), v1 = e1/(µ−λ), ..., vk = ek/(µ−λ))
(λ = µ − 1/v0, e1 = v1/v0, ..., ek = vk/v0) ↤ (v0, v1, ..., vk)

The optimal load of E-Shares is L = p^{λ∗} = p^µ / p^{1/v0∗} = m/p^{1/τ∗}, the same as before.

SLIDE 24

Goal: Analyze L for Arbitrary Cardinalities

To analyze the load L for arbitrary cardinalities m1, ..., mℓ, we first give a lower bound formula. This is a closed formula, based on fractional edge packings. As with the AGM bound, I will first give a simple intuition behind the lower bound, based on cartesian products. Next steps: (1) prove that the formula is a lower bound for any algorithm: this generalizes the proof from Lecture 2 and is omitted; (2) prove that the formula equals E-Shares: we will use duality.

SLIDE 25

Optimal Load L for Cartesian Products

Problem: find the optimal load for a cartesian product: Q(x,y) = R(x), S(y), |R| = m1, |S| = m2.
Solution: organize the servers in a matrix [p] = [p1] × [p2]. Send R(x) → (h1(x), ∗) and S(y) → (∗, h2(y)).

[Figure: a p1 × p2 grid of servers; R(x) is hashed to row h1(x), S(y) to column h2(y); the pair (x, y) meets at server (h1(x), h2(y)).]

Load: L = m1/p1 + m2/p2 ≥ 2 √(m1 m2 / p) = optimal.
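The inequality above is AM-GM: m1/p1 + m2/p2 ≥ 2√(m1 m2/p) when p1 · p2 = p, with equality at p1 = √(p · m1/m2). A quick numeric sketch; the sizes are made up:

```python
import math

def load(m1, m2, p1, p):
    """Load of the matrix scheme with row share p1 (so column share p/p1)."""
    return m1 / p1 + m2 / (p / p1)

m1, m2, p = 10_000, 40_000, 100
best = min(load(m1, m2, p1, p) for p1 in range(1, p + 1))
amgm = 2 * math.sqrt(m1 * m2 / p)     # = 4000 here
assert best >= amgm - 1e-9            # the AM-GM bound holds for every split
assert abs(best - amgm) < 1e-9        # attained at p1 = sqrt(p*m1/m2) = 5
```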


SLIDE 28

Side Note: Criticism of the MapReduce Model

The load of a cartesian product is m/p^{1/2}: the data is replicated p^{1/2} times. Ullman [ACM Crossroads'12] described the drug-interaction problem using MapReduce: compute a cartesian product, then apply a UDF to all pairs. In MapReduce, p corresponds to the number of reducers, which the programmer can set at will; the recommendation was to use as many reducers as possible. The "obvious" solution, p = m^2 reducers, means the entire data is replicated m times! It failed dramatically. Lesson: the MapReduce model hides the true number of servers p, which makes it difficult to design and analyze algorithms.

SLIDE 29

Optimal Load L for Cartesian Products

Proposition. Any algorithm computing R × S has load ≥ √(m1 m2 / p).

Proof. Let L be the load. Any single server can report ≤ L^2 pairs (x, y), so the p servers can report ≤ p L^2 pairs. Hence p L^2 ≥ m1 m2.

Proposition. Any algorithm computing R1 × R2 × ⋯ × Rk has load ≥ k (m1 m2 ⋯ mk / p)^{1/k}.

SLIDE 30

Optimal Load L for Cartesian Products

Let Q(x) = R1(x1), ..., Rℓ(xℓ).

Theorem. Let Rj1, ..., Rjk be an edge packing. Any algorithm for Q has load L ≥ k (mj1 mj2 ⋯ mjk / p)^{1/k}.

Proof. Any algorithm that computes Q correctly must send every tuple of Rj1 × ... × Rjk to some server. Suppose not: no server receives the tuple t = (xi1, xi2, ..., xik). Modify the other relations by adding new tuples that make t part of a query answer: the algorithm fails to find it.

SLIDE 31

The Load of the HyperCube Algorithm

Denote L(u) = (m1^{u1} ⋯ mℓ^{uℓ} / p)^{1/u0}, where u0 = u1 + ... + uℓ, and Llower = max_u L(u).

Theorem (Beame'14).
(1) Let u be any fractional edge packing. Any algorithm computing Q has load Ω(L(u)), even on databases of partial matchings (a subset Rj ⊆ [n]^r of size mj that is a matching up to renaming of values).
(2) The optimal load of the HyperCube algorithm is Llower. Hence it is optimal.

The proof of (1) generalizes our previous proof for matchings (omitted). We will prove (2) today, but first let's discuss the formula. Problem: prove that argmax_u L(u) is a vertex of the edge-packing polytope.



SLIDE 37

Example

Q(x,y,z) = R(x,y), S(y,z), T(z,x),  L(u) = (m1^{u1} m2^{u2} m3^{u3} / p)^{1/(u1+u2+u3)}:

u               L(u)
(1/2,1/2,1/2)   (m1 m2 m3)^{1/3} / p^{2/3}
(1,0,0)         m1/p
(0,1,0)         m2/p
(0,0,1)         m3/p
(0,0,0)         0 (why? L(u) ≤ max(m1, m2, m3)/p^{1/(u1+u2+u3)} → 0 as u1+u2+u3 → 0)

Llower = max_u L(u). Suppose m1 < m2 = m3; then Llower = max(m3/p, (m1 m2 m3)^{1/3}/p^{2/3}):
p ≤ m3/m1: Llower = m3/p, linear speedup. HyperCube: compute S ⋈ T, broadcast R.
p > m3/m1: Llower = (m1 m2 m3)^{1/3}/p^{2/3}, speedup decreased to 1/p^{2/3}.
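The table and the regime switch can be reproduced numerically. A sketch that evaluates L(u) at the packing vertices listed on the slide; the cardinalities are made up, with m1 < m2 = m3:

```python
def L(u, m, p):
    """L(u) = (m1^u1 * m2^u2 * m3^u3 / p)^(1/u0), with L = 0 at u = (0,0,0)."""
    u0 = sum(u)
    if u0 == 0:
        return 0.0                     # the (0,0,0) row of the table
    prod = 1.0
    for uj, mj in zip(u, m):
        prod *= mj ** uj
    return (prod / p) ** (1.0 / u0)

VERTICES = [(0.5, 0.5, 0.5), (1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 0, 0)]

def Llower(m, p):
    return max(L(u, m, p) for u in VERTICES)

m = (100, 10_000, 10_000)              # m1 < m2 = m3; threshold m3/m1 = 100
# p below the threshold: (0,0,1) wins, Llower = m3/p (linear speedup).
assert abs(Llower(m, 10) - 10_000 / 10) < 1e-6
# p above the threshold: (1/2,1/2,1/2) wins, Llower = (m1 m2 m3)^{1/3}/p^{2/3}.
expect = (100 * 10_000 * 10_000) ** (1 / 3) / 1000 ** (2 / 3)
assert abs(Llower(m, 1000) - expect) < 1e-6
```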

SLIDE 38

Properties of HyperCube – Informal

HyperCube takes advantage of unequal cardinalities m1, ..., mℓ: it allocates fewer shares to the smaller relations, or even broadcasts them. When p increases, the utility of the smaller relations diminishes: the speedup decreases, eventually reaching 1/p^{1/τ∗}.


SLIDE 42

Properties of HyperCube – Formal

L(u) = (m1^{u1} ⋯ mℓ^{uℓ} / p)^{1/u0},  u0 = ∑j uj,  Llower = max_u L(u),  u∗ = argmax_u L(u). The speedup is 1/p^{1/u0∗}.

As p increases, u0∗ increases too. (Proof: if u0 < u0′, then L(u′) − L(u) = a/p^{1/u0′} − b/p^{1/u0} is strictly increasing in p.)
Meaning: the speedup deteriorates, eventually reaching 1/p^{1/τ∗}.

If mj < Llower, then uj∗ = 0. (Proof: as a function of uj alone, L(u) = exp((a·uj + b)/(c·uj + d)) with a, b, c, d > 0; since Llower = L(u∗) > lim_{uj→∞} L(u) = mj, it is strictly decreasing on (0, ∞).)
Meaning: relations smaller than the optimal load are broadcast.

Let mj0 = maxj mj. If mj < mj0/p, then uj∗ = 0 (since mj0/p ≤ Llower).
Meaning: relations smaller than a 1/p fraction of some other relation are broadcast.

SLIDE 43

Proof of Equivalence

Q(x) = R1(x1), ..., Rℓ(xℓ),  |R1| = m1, ..., |Rℓ| = mℓ. Recall E-Shares:

minimize λ
−∑i ei ≥ −1
∀j: λ + ∑_{i: xi ∈ Rj} ei ≥ µj

where µ1 = logp m1, ..., µℓ = logp mℓ. The shares and load of HyperCube: p1 = p^{e1∗}, ..., pk = p^{ek∗}, and L = p^{λ∗}.

SLIDE 44

Proof of Equivalence

Llower = max_u (m1^{u1} ⋯ mℓ^{uℓ} / p)^{1/u0},  u0 = ∑j uj

Apply logp, and relax to ∑j uj ≤ u0 w.l.o.g. (why?):

Log-Llower:
maximize (µ1 u1 + ... + µℓ uℓ − 1)/u0
u1 + ... + uℓ ≤ u0
∀i: ∑_{j: xi ∈ Rj} uj ≤ 1

where µ1 = logp m1, ..., µℓ = logp mℓ. This is not a linear program.

SLIDE 45

Proof of Equivalence

E-Shares:
minimize λ
−∑i ei ≥ −1
∀j: λ + ∑_{i: xi ∈ Rj} ei ≥ µj

Dual E-Shares:
maximize µ1 f1 + ... + µℓ fℓ − f0
f1 + ... + fℓ ≤ 1
∀i: ∑_{j: xi ∈ Rj} fj ≤ f0

Log-Llower:
maximize (µ1 u1 + ... + µℓ uℓ − 1)/u0
u1 + ... + uℓ ≤ u0
∀i: ∑_{j: xi ∈ Rj} uj ≤ 1

E-Shares and Dual E-Shares are duals: their optimal values are equal. Dual E-Shares and Log-Llower have the same optimal value, by the following mapping between feasible solutions:

(f0, f1, ..., fℓ) ↦ (u0 = 1/f0, u1 = f1/f0, ..., uℓ = fℓ/f0)
(f0 = 1/u0, f1 = u1/u0, ..., fℓ = uℓ/u0) ↤ (u0, u1, ..., uℓ)
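The mapping between Dual E-Shares and Log-Llower preserves the objective, since (∑j µj uj − 1)/u0 = ∑j µj fj − f0 under u0 = 1/f0 and uj = fj/f0. A tiny numeric sketch; the feasible point and the µj values are made up:

```python
def dual_obj(f0, fs, mus):             # Dual E-Shares objective
    return sum(m * f for m, f in zip(mus, fs)) - f0

def log_llower_obj(u0, us, mus):       # Log-Llower objective
    return (sum(m * u for m, u in zip(mus, us)) - 1) / u0

mus = (0.9, 1.0, 1.0)                  # mu_j = log_p m_j (illustrative)
f0, fs = 2.0, (0.2, 0.3, 0.5)          # satisfies f1 + f2 + f3 <= 1
u0, us = 1 / f0, tuple(f / f0 for f in fs)

# Both sides evaluate to the same objective value under the mapping.
assert abs(dual_obj(f0, fs, mus) - log_llower_obj(u0, us, mus)) < 1e-9
```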

SLIDE 46

Summary of Lecture 3

The roles of Primal/Dual:

▸ The primal LP describes the HyperCube parallel algorithm; Fractional Vertex Cover.
▸ The dual LP describes the Lower Bound for parallel evaluation; Fractional Edge Packing.

AGM bound versus Lower Bound for parallel evaluation:

▸ Vertex packing / Edge cover (ρ∗) versus Vertex cover / Edge packing (τ∗).
▸ General (skewed) databases versus skew-free databases.
▸ Both formulas generalize simple observations about cartesian products.

Skewed data: will be discussed in Lecture 4 (mostly open problems). Multiple rounds: will be discussed briefly in Lecture 4 (almost entirely open).
