Tutorial: Message Passing Communication Model
David Woodruff IBM Almaden
k-party Number-In-Hand Model
[Figure: players P1, P2, P3, P4, …, Pk; player Pi holds input xi]
– Point-to-point communication
– Protocol transcript determines who speaks next
[Figure: coordinator C connected to players P1, P2, P3, …, Pk; player Pi holds input xi]
Convenient to introduce a “coordinator” C who may
– All communication goes through the coordinator
– This changes the communication by at most a factor of 2 (plus one word per message)
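The factor-of-2 overhead can be checked with a small cost model. This is a minimal sketch, not taken from the talk: the `Message` type, the function names, and the 8-bit word size are hypothetical, and the extra word is assumed to name the intended recipient.

```python
from dataclasses import dataclass

WORD_BITS = 8  # assumed size of the one extra word per message (hypothetical)


@dataclass
class Message:
    sender: int
    receiver: int
    payload_bits: int


def point_to_point_cost(messages):
    """Total bits when players talk to each other directly."""
    return sum(m.payload_bits for m in messages)


def coordinator_cost(messages, word_bits=WORD_BITS):
    """Total bits when every message is relayed through the coordinator C."""
    total = 0
    for m in messages:
        total += m.payload_bits + word_bits  # sender -> C: payload plus one word
        total += m.payload_bits              # C -> receiver: payload forwarded
    return total


msgs = [Message(1, 2, 10), Message(2, 3, 4), Message(3, 1, 7)]
direct = point_to_point_cost(msgs)
routed = coordinator_cost(msgs)
# Routed cost is exactly twice the direct cost plus one word per message.
assert routed == 2 * direct + WORD_BITS * len(msgs)
```

In this model the relayed cost is never more than twice the direct cost plus one word per message, matching the bullet above.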
– For speed
– Just doesn’t fit on one device
– Communication very power-intensive
– Bandwidth limitations
– Continuously monitor a statistic of distributed data
– Don’t want to keep sending all data to one place
For a function f:
– Communication cost: sum of all individual message lengths, maximized over all inputs and random coins
– Correctness: on every set of inputs, the protocol fails in computing f with probability < 1/3
Some well-studied problems
– $\ell_p$-norm: $(\sum_j |x_j|^p)^{1/p}$
– Number of distinct values (focus of this talk)
$x_i$
$S \subseteq [n]$
$T \subseteq [n]$
What is the randomized communication cost as a function of k, ε, and n? Note that understanding the dependence on ε is critical, e.g., ε < .01
distinct elements using $O(1/\varepsilon^2 + \log n)$ bits of space
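To illustrate how a small mergeable summary turns into a k-party protocol, here is a minimal sketch using a k-minimum-values (KMV) sketch. This is a simpler, swapped-in technique, not the space-optimal streaming algorithm the slide refers to, and it does not achieve the stated bit bound (each stored hash here is 64 bits). The names `hash01`, `kmv_sketch`, `merge`, `estimate` and the parameter `m` are hypothetical.

```python
import hashlib


def hash01(item):
    """Deterministically map an item to a pseudo-random point in [0, 1)."""
    digest = hashlib.sha256(str(item).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def kmv_sketch(items, m):
    """Player side: keep the m smallest distinct hash values of the local items."""
    return sorted({hash01(x) for x in items})[:m]


def merge(sketches, m):
    """Coordinator side: the m smallest hash values over all players' sketches."""
    return sorted(set().union(*sketches))[:m]


def estimate(sketch, m):
    """Exact count if fewer than m hashes were seen, else (m - 1) / (m-th smallest)."""
    if len(sketch) < m:
        return len(sketch)
    return (m - 1) / sketch[m - 1]


# Three players with overlapping inputs; each sends one sketch to the coordinator.
players = [[1, 2, 3], [3, 4], [5, 1]]
merged = merge([kmv_sketch(p, 8) for p in players], 8)
assert estimate(merged, 8) == 5
```

Each player sends one message of at most m hash values, so the communication is k messages of roughly the sketch size; the relative error of the estimate shrinks roughly like 1/sqrt(m).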
– Start with a simpler problem: GAP-THRESHOLD
– Player $P_i$ holds a bit $Z_i$
– $Z_i$ are i.i.d. Bernoulli(1/2)
– Decide if $\sum_{i=1}^k Z_i > k/2 + k^{1/2}$ or $\sum_{i=1}^k Z_i < k/2 - k^{1/2}$
– Otherwise don’t care (distributional problem)
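The GAP-THRESHOLD promise can be written down as a reference function that sees all bits at once. This is a sketch for intuition only: the return labels 1, 0, and `None` (for the don't-care region) and the function names are assumptions, not part of the talk.

```python
import math
import random


def gap_threshold(bits):
    """Return 1 if the sum is above the gap, 0 if below, None if don't care."""
    k = len(bits)
    total = sum(bits)
    gap = math.sqrt(k)
    if total > k / 2 + gap:
        return 1
    if total < k / 2 - gap:
        return 0
    return None


def sample_instance(k, rng):
    """Draw k i.i.d. Bernoulli(1/2) bits, one per player."""
    return [rng.randint(0, 1) for _ in range(k)]


rng = random.Random(0)
instance = sample_instance(100, rng)
answer = gap_threshold(instance)  # 1, 0, or None depending on the sample
```

A protocol only has to be correct on instances where this function returns 0 or 1.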
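Rectangle property: if inputs (x,y) and (a,b) lead to the same transcript $\tau$, then so do (x,b) and (a,y)
$\Pr[\text{seeing a transcript } \tau \text{ given inputs } a, b] = p_{a,\tau} \cdot q_{b,\tau}$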
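The rectangle property can be checked exhaustively on a toy example. The two-message deterministic protocol below is hypothetical and chosen only for illustration; the check groups all input pairs by transcript and confirms each group is a product set.

```python
from itertools import product


def transcript(x, y):
    """Toy protocol: Alice sends x's low bit; Bob replies with a bit of y
    selected by Alice's message."""
    a_msg = x & 1
    b_msg = (y >> a_msg) & 1
    return (a_msg, b_msg)


def inputs_by_transcript(domain):
    """Group every input pair (x, y) by the transcript it produces."""
    groups = {}
    for x, y in product(domain, repeat=2):
        groups.setdefault(transcript(x, y), set()).add((x, y))
    return groups


def is_rectangle(pairs):
    """True if the set of pairs equals (its x-projection) x (its y-projection)."""
    xs = {x for x, _ in pairs}
    ys = {y for _, y in pairs}
    return pairs == set(product(xs, ys))


groups = inputs_by_transcript(range(4))
all_rectangles = all(is_rectangle(g) for g in groups.values())
```

For randomized protocols the same structure shows up as the product form of the transcript probability in the line above.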
$Z_1, Z_2, \ldots, Z_k$ are independent conditioned on $\tau$
principle
a combinatorial rectangle: $S = S_1 \times S_2 \times \cdots \times S_k$ where $S_i \subseteq \{0,1\}$
in $S$, they are still independent!
[Figure: coordinator C and players P1, P2, P3, …, Pk; player Pi holds bit Zi]
$\sum_{i=1}^k Z_i > k/2 + k^{1/2}$ or $\sum_{i=1}^k Z_i < k/2 - k^{1/2}$
$Z_i$ to be 0 or 1, and the remaining $Z_i$ to be Bernoulli(1/2)
$|E[\sum_{i=1}^k Z_i \mid \tau] - k/2| < 100 k^{1/2}$
Otherwise, since $\mathrm{Var}[\sum_{i=1}^k Z_i \mid \tau] < k$ for any $\tau$, by Chebyshev’s inequality, w.pr. > 1/2, $|\sum_{i=1}^k Z_i - k/2| > 50 k^{1/2}$, contradicting concentration
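The Chebyshev step can be spelled out as follows. This is a worked sketch under the assumption that the "< k" bound above is a bound on the conditional variance; $X$ is shorthand (not in the slides) for $\sum_{i=1}^k Z_i$.

```latex
% Conditioned on a transcript \tau, with Var[X | \tau] < k:
\Pr\left[\,|X - E[X \mid \tau]| \ge 50 k^{1/2} \;\middle|\; \tau\right]
  \le \frac{\mathrm{Var}[X \mid \tau]}{(50 k^{1/2})^2}
  < \frac{k}{2500\,k} = \frac{1}{2500}.

% So if |E[X | \tau] - k/2| \ge 100 k^{1/2}, then with probability
% greater than 1 - 1/2500 > 1/2 (conditioned on \tau):
|X - k/2| \ge 100 k^{1/2} - 50 k^{1/2} = 50 k^{1/2}.

% Unconditionally, the Z_i are i.i.d. Bernoulli(1/2), so Var[X] = k/4 and
\Pr\left[\,|X - k/2| \ge 50 k^{1/2}\right]
  \le \frac{k/4}{2500\,k} = \frac{1}{10000}.
```

Comparing the last two displays, transcripts whose conditional mean is that far from $k/2$ can have total probability less than 1/5000, which is the sense in which a large deviation of the conditional mean contradicts concentration.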
If the communication is o(k), then w.pr. 1-o(1) over the transcript $\tau$, for a 1-o(1) fraction of the indices i, $Z_i \mid \tau$ is Bernoulli(1/2)
W.pr. .001 over the $Z_i \mid \tau$: $E[\sum_{i=1}^k Z_i \mid \tau] - \sum_{i=1}^k (Z_i \mid \tau) > 100 k^{1/2}$
W.pr. .001 over the $Z_i \mid \tau$: $\sum_{i=1}^k (Z_i \mid \tau) - E[\sum_{i=1}^k Z_i \mid \tau] > 100 k^{1/2}$
$\sum_{i=1}^k Z_i > \beta k + (\beta k)^{1/2}$ or $\sum_{i=1}^k Z_i < \beta k - (\beta k)^{1/2}$
generalizes: any successful protocol must satisfy: $\Pr_\tau[\text{for } 1-o(1) \text{ fraction of indices } i,\ H(Z_i \mid \tau) = o(1)] > 2/3$
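[Figure: coordinator C, players P1, P2, P3, …, Pk with sets T1, T2, T3, …, Tk, and a set S; DISJ is evaluated on S and each Ti]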
give $P_i$ a random set $T_i$ so that $\mathrm{DISJ}(S,T_i) = 0$
$\sum_{i=1}^k \mathrm{DISJ}(S,T_i) > k/2 + k^{1/2}$ or $\sum_{i=1}^k \mathrm{DISJ}(S,T_i) < k/2 - k^{1/2}$?
$\sum_{i=1}^k Z_i > k/2 + k^{1/2}$ or $\sum_{i=1}^k Z_i < k/2 - k^{1/2}$
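The composition of the threshold problem with DISJ can be sketched as a reference computation. The convention that DISJ returns 1 when the two sets share an item is an assumption made here for illustration (the slides do not spell it out), and the function names are hypothetical.

```python
import math


def disj(s, t):
    """Assumed convention: 1 if the sets share an item, 0 if they are disjoint."""
    return 0 if s.isdisjoint(t) else 1


def threshold_of_disj(s, player_sets):
    """Set Z_i = DISJ(S, T_i) and apply the gap threshold to the sum of the Z_i.

    Returns 1 above the gap, 0 below it, None in the don't-care region.
    """
    z = [disj(s, t) for t in player_sets]
    k = len(z)
    total = sum(z)
    if total > k / 2 + math.sqrt(k):
        return 1
    if total < k / 2 - math.sqrt(k):
        return 0
    return None


S = {1, 2, 3}
# 16 players: 14 hold sets that hit S, 2 hold sets that miss it.
T = [{1, 10 + i} for i in range(14)] + [{100}, {200}]
result = threshold_of_disj(S, T)  # 14 > 16/2 + 4, so the sum is above the gap
```

Each bit $Z_i$ is now hidden behind a 2-player instance, which is what lets a per-bit information bound be turned into a per-player communication bound.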
$\Pr_\tau[\text{for } 1-o(1) \text{ fraction of indices } i,\ H(Z_i \mid \tau) = o(1)] > 2/3$
coordinator C and $P_i$
$P_j$ is sent, for $j \ne i$, can be simulated locally!
probability, a contradiction!
$\sum_{i=1}^k Z_i > \beta k + (\beta k)^{1/2}$ or $\sum_{i=1}^k Z_i < \beta k - (\beta k)^{1/2}$
Set probability $\beta$ of intersection to be $1/(4k\varepsilon^2)$
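Plugging this choice of $\beta$ into the two thresholds shows the scale of the gap. The arithmetic below is a worked sketch; the closing interpretation is a reading of the numbers, not a statement from the slides.

```latex
\beta = \frac{1}{4k\varepsilon^2}
\quad\Longrightarrow\quad
\beta k = \frac{1}{4\varepsilon^2},
\qquad
(\beta k)^{1/2} = \frac{1}{2\varepsilon}.

\beta k \pm (\beta k)^{1/2}
  = \frac{1}{4\varepsilon^2} \pm \frac{1}{2\varepsilon}
  = \frac{1}{4\varepsilon^2}\,(1 \pm 2\varepsilon).
```

So the two cases differ from the mean $\beta k$ by a relative factor of $1 \pm 2\varepsilon$, which is presumably the scale at which an $\varepsilon$-accurate estimate has to tell them apart.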
– If DISJ(S,Ti) = 1 and DISJ(S,Tj) = 1, they typically add different items
$\sum_{i=1}^k Z_i$
– This time to a 2-player Equality problem (details omitted)
– Each player has a column of A
– Is the number of rows with at least one 1 larger than n/2?
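The question about A can be stated as a reference computation over all columns at once. This is a centralized sketch for clarity, not a communication protocol; the function names and the list-of-columns representation are assumptions.

```python
def rows_with_a_one(columns):
    """columns[j] is player j's column of A, given as a list of n bits."""
    n = len(columns[0])
    return sum(1 for i in range(n) if any(col[i] for col in columns))


def more_than_half_rows_nonzero(columns):
    """True if the number of rows containing at least one 1 exceeds n/2."""
    n = len(columns[0])
    return rows_with_a_one(columns) > n / 2


# Three players, n = 4 rows: rows 0, 1, 2 each contain a 1, row 3 does not.
A_columns = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 0, 1, 0]]
answer = more_than_half_rows_nonzero(A_columns)
```

The difficulty in the k-party setting is that no single player sees a whole row, only one entry of each.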
probability at least 2/3
$\Omega(kn)$ lower bound for connectivity and bipartiteness without edge duplications
elements
invertible
$T_1 \cap T_2 \cap \cdots \cap T_k = \emptyset$?
– random half of the items appear in all sets except a random one
– random half the items independently occur in each $T_i$
– with probability 1/2, make a random item occur in each $T_i$
can't easily communicate this to the players
but can't easily communicate this to the coordinator
[Figure: 0/1 matrix with random rows; each player has a column]
communication in topologies other than star topology
– Obtain bounds that depend on 1-median of the network
– Only players at a subset of nodes have an input
– Communication cost depends on Steiner tree cost
communication via the distinct elements problem
– Statistical problems ($\ell_p$ norms)
– Graph problems
– Linear algebra problems
– Rounds vs. communication
– Connections to other models, e.g., MapReduce
– Topology-sensitive problems