Massively Parallel Communication and Query Evaluation
Paul Beame
U. of Washington
Based on joint work with Paraschos Koutris and Dan Suciu [PODS 13], [PODS 14]
Massively Parallel Systems
MapReduce [Dean, Ghemawat 2004]
– 1 round can simulate data streams on symmetric functions, using Savitch-like small space simulation
– Exact computation of frequency moments in 2 rounds of MapReduce

– For n^(1-ε) processors and n^(1-ε) storage per processor, O(t) rounds can simulate t PRAM steps, so O(log^k n) rounds can simulate NC^k
– Minimum spanning trees and connectivity on dense graphs in 2 rounds of MapReduce
– Generalization of parameters, sharper simulations, sorting and computational geometry applications [Goodrich, Sitchinava, Zhang 2011]

– Upper bounds for database join queries [Afrati, Ullman 2010]
– Upper and lower bounds for finding triangles, matrix multiplication, finding neighboring strings [Afrati, Sarma, Salihoglu, Ullman 2012]
[Figure: Input (size = N) distributed over Server 1 ... Server p; Step 1, Step 2, Step 3, each across Server 1 ... Server p]
[WZ12], [PVZ12], [WZ13], [BEOPV13], ...
communication bound possible is ≤ N
– Since N/p ≤ L ≤ N, the range of variation in L is a factor p^ε for 0 ≤ ε ≤ 1
– still interesting theoretical/practical questions – many open questions
– e.g. PointerJumping, i.e., st-connectivity in out-degree 1 graphs.
exactly once in each coordinate.
R1 = X   Y      R2 = Y   Z      R3 = Z   X      C3 = X   Y   Z
     a1  b3          b1  c2          c1  a2          a3  b2  c3
     a2  b1          b2  c3          c2  a1
     a3  b2          b3  c1          c3  a3

Algorithm 1: For each server 1 ≤ u ≤ p:
  Input: n/p tuples from each of R1, R2, R3
  Step 1: send R1(x,y) to server (y mod p)
          send R2(y,z) to server (y mod p)
  Step 2: join R1(x,y) with R2(y,z)
          send [R1(x,y),R2(y,z)] to server (z mod p)
          send R3(z,x) to server (z mod p)
  Output: join [R1(x,y),R2(y,z)] with R3(z,x’)

Load: O(n/p) tuples (i.e. ε = 0)
Number of rounds: r = 2
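The two rounds of Algorithm 1 can be simulated on a single machine to see where each tuple travels. The following is a minimal sketch, not the authors' implementation: the function name `two_round_triangles` is hypothetical, Python's built-in `hash` stands in for the "mod p" routing on the slide, and the reported loads count tuples received per server in each round.

```python
from collections import defaultdict

def two_round_triangles(R1, R2, R3, p):
    """Simulate the two-round join on p servers (illustrative sketch).

    Returns (set of triangles (x, y, z), (max load round 1, max load round 2)).
    """
    # Round 1: route R1(x,y) and R2(y,z) by the shared variable y.
    round1 = [([], []) for _ in range(p)]
    for x, y in R1:
        round1[hash(y) % p][0].append((x, y))
    for y, z in R2:
        round1[hash(y) % p][1].append((y, z))
    load1 = max(len(a) + len(b) for a, b in round1)

    # Round 2: each server joins its R1 and R2 fragments locally,
    # then routes the joined pairs and R3(z,x) by z.
    round2 = [([], []) for _ in range(p)]
    for r1_part, r2_part in round1:
        z_by_y = defaultdict(list)
        for y, z in r2_part:
            z_by_y[y].append(z)
        for x, y in r1_part:
            for z in z_by_y.get(y, ()):
                round2[hash(z) % p][0].append((x, y, z))
    for z, x in R3:
        round2[hash(z) % p][1].append((z, x))
    load2 = max(len(a) + len(b) for a, b in round2)

    # Output: join the intermediate pairs with R3 on (z, x).
    triangles = set()
    for paths, r3_part in round2:
        r3_set = set(r3_part)
        for x, y, z in paths:
            if (z, x) in r3_set:
                triangles.add((x, y, z))
    return triangles, (load1, load2)

# The instance from the slide.
R1 = [("a1", "b3"), ("a2", "b1"), ("a3", "b2")]
R2 = [("b1", "c2"), ("b2", "c3"), ("b3", "c1")]
R3 = [("c1", "a2"), ("c2", "a1"), ("c3", "a3")]
print(two_round_triangles(R1, R2, R3, p=3)[0])  # {('a3', 'b2', 'c3')}
```

The intermediate result sent in round 2 is the full join of R1 and R2, so the O(n/p) load relies on that join staying small, as it does on this instance where every value appears once per column.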
R1 = X   Y      R2 = Y   Z      R3 = Z   X      C3 = X   Y   Z
     a1  b3          b1  c2          c1  a2          a3  b2  c3
     a2  b1          b2  c3          c2  a1
     a3  b2          b3  c1          c3  a3

Algorithm 2: Servers form a cube: [p] ≅ [p^(1/3)] × [p^(1/3)] × [p^(1/3)]
For each server 1 ≤ u ≤ p:
  Step 1: Choose random hash functions h1, h2, h3
          send R1(x,y) to servers (h1(x) mod p^(1/3), h2(y) mod p^(1/3), *)
          send R2(y,z) to servers (*, h2(y) mod p^(1/3), h3(z) mod p^(1/3))
          send R3(z,x) to servers (h1(x) mod p^(1/3), *, h3(z) mod p^(1/3))
  Output: all triangles R1(x,y), R2(y,z), R3(z,x)

Load: O(n/p × p^(1/3)) tuples (ε = 1/3)
Number of rounds: r = 1
[Ganguly’92, Afrati’10]
[Figure: the p servers arranged as a cube of side p^(1/3); a server is identified by its coordinates (i,j,k)]
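The one-round cube routing of Algorithm 2 can be sketched the same way. This is an illustration under stated assumptions rather than a reference implementation: `hypercube_triangles` and its `side` parameter (playing the role of p^(1/3)) are hypothetical names, and the "random hash functions" are approximated by salting Python's built-in `hash`.

```python
import random
from collections import defaultdict

def hypercube_triangles(R1, R2, R3, side, seed=0):
    """Simulate the one-round cube algorithm on side**3 servers (sketch).

    Returns (set of triangles (x, y, z), max number of tuples at any server).
    """
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(3)]

    def h(dim, value):
        # Salted hash standing in for the random hash function of one dimension.
        return hash((salts[dim], value)) % side

    servers = defaultdict(lambda: (set(), set(), set()))
    # Each tuple fixes two coordinates and is replicated along the third ("*").
    for x, y in R1:
        for k in range(side):
            servers[(h(0, x), h(1, y), k)][0].add((x, y))
    for y, z in R2:
        for i in range(side):
            servers[(i, h(1, y), h(2, z))][1].add((y, z))
    for z, x in R3:
        for j in range(side):
            servers[(h(0, x), j, h(2, z))][2].add((z, x))

    load = max((len(a) + len(b) + len(c) for a, b, c in servers.values()), default=0)

    # Every server joins whatever it received; the triangle (x, y, z)
    # is found at server (h1(x), h2(y), h3(z)).
    triangles = set()
    for r1, r2, r3 in list(servers.values()):
        z_by_y = defaultdict(list)
        for y, z in r2:
            z_by_y[y].append(z)
        for x, y in r1:
            for z in z_by_y.get(y, ()):
                if (z, x) in r3:
                    triangles.add((x, y, z))
    return triangles, load

# The instance from the slide, on p = 8 servers (side = 2).
S1 = [("a1", "b3"), ("a2", "b1"), ("a3", "b2")]
S2 = [("b1", "c2"), ("b2", "c3"), ("b3", "c1")]
S3 = [("c1", "a2"), ("c2", "a1"), ("c3", "a3")]
print(hypercube_triangles(S1, S2, S3, side=2)[0])  # {('a3', 'b2', 'c3')}
```

Each tuple is copied to `side` servers, which is the p^(1/3) replication factor behind the stated load; in exchange, no second round is needed.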
τ*(C_k) = k/2
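The value k/2 can be checked by exhibiting matching primal and dual solutions. The sketch below assumes that τ* denotes the fractional vertex cover number of the query hypergraph (the slides do not define it at this point) and that C_k is the cycle on k ≥ 3 variables; the function name is hypothetical. Weight 1/2 on every variable is a feasible cover, weight 1/2 on every edge is a feasible packing, and by LP duality the optimum lies between the two values.

```python
from fractions import Fraction

def cycle_cover_and_packing(k):
    """Return (cover value, packing value) for the all-1/2 solutions on C_k, k >= 3."""
    half = Fraction(1, 2)
    edges = [(i, (i + 1) % k) for i in range(k)]
    cover = [half] * k      # one weight per variable (vertex)
    packing = [half] * k    # one weight per relation (edge)

    # Cover feasibility: the two endpoints of every edge carry total weight >= 1.
    assert all(cover[a] + cover[b] >= 1 for a, b in edges)
    # Packing feasibility: the edges touching any vertex carry total weight <= 1.
    for v in range(k):
        assert sum(packing[e] for e, (a, b) in enumerate(edges) if v in (a, b)) <= 1

    return sum(cover), sum(packing)

for k in (3, 4, 5):
    print(k, cycle_cover_and_packing(k))
```

Since both values equal k/2, neither solution can be improved, which gives τ*(C_k) = k/2 under the assumption above.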
[Figure: relations R1, R2, ..., Rk distributed over Server 1 ... Server p; Step 1, Step 2, Step 3]
[Figure: Server u receives messages msg1, msg2, ..., msgk from R1, R2, ..., Rk]
msg = (msg1,…,msgk)   I = (R1,…,Rk)
number τ*(Q) to obtain the bound
Given: (Query) hypergraph $H$ with hyper-edges $S_j \subseteq [k]$ for all $j \in [\ell]$
For all $a \in [n]^k$ write $a_j$ for the projection of $a$ on the coordinates of $S_j$
Variables $w_j(a_j)$ for each $a_j \in [n]^{S_j}$
Then for any fractional edge cover $u = (u_1, u_2, \dots, u_\ell)$ of $H$

\[
\sum_{a \in [n]^k} \prod_{j=1}^{\ell} w_j(a_j)
\;\le\;
\prod_{j=1}^{\ell} \Bigl( \sum_{a_j \in [n]^{S_j}} w_j(a_j)^{1/u_j} \Bigr)^{u_j}
\]

Apply with $w_j(a_j)$ = Probability (over the input distribution) that the processor learns that $a_j \in R_j$
LHS = Expected number of answer tuples the processor learns
RHS = Product of independent quantities based on what a processor learns about each relation
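For the triangle query the inequality above takes a concrete form that is easy to test numerically. With three binary hyper-edges over (x,y), (y,z), (z,x) and the fractional edge cover (1/2, 1/2, 1/2), it reads: the sum over x,y,z of w1(x,y)·w2(y,z)·w3(z,x) is at most the product of the three square roots of the sums of squared weights. The sketch below evaluates both sides for nonnegative weights; the function name and the matrix representation of the weights are assumptions made for illustration.

```python
import itertools
import math
import random

def triangle_inequality_sides(w1, w2, w3, n):
    """Return (LHS, RHS) of the inequality for the triangle with cover (1/2, 1/2, 1/2).

    w1[x][y], w2[y][z], w3[z][x] are nonnegative weights on [n] x [n].
    """
    lhs = sum(
        w1[x][y] * w2[y][z] * w3[z][x]
        for x, y, z in itertools.product(range(n), repeat=3)
    )
    # With u_j = 1/2: (sum of w_j ** (1/u_j)) ** u_j = sqrt(sum of w_j ** 2).
    rhs = math.prod(
        math.sqrt(sum(v * v for row in w for v in row)) for w in (w1, w2, w3)
    )
    return lhs, rhs

# Random probabilities as weights, in the spirit of "probability that a tuple is learned".
rng = random.Random(0)
n = 5
w1, w2, w3 = ([[rng.random() for _ in range(n)] for _ in range(n)] for _ in range(3))
lhs, rhs = triangle_inequality_sides(w1, w2, w3, n)
print(lhs <= rhs)
```

A numeric check like this does not prove anything, but it makes the roles of the two sides visible: the left side couples the three relations through shared variables, while the right side factors into one term per relation.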
① Join tuple: [R1(a,b)R2(b,c)] ② Or known join tuple containing it
Proof ideas: Inductively eliminate 2nd round until algorithm reduced to 1 round:
round that are larger than kδ - give away full extension of each to an answer for Q (reduces n)
n/(pβ+1)
– All lower bounds known depend on the need to produce multiple outputs
– Intermediate results may be very large even when few answers
– Some depth 2 circuit lower bounds are known for arbitrary gates