Robust Lower Bounds for Communication and Stream Computation
- Amit Chakrabarti, Dartmouth College
- Graham Cormode, AT&T Labs
- Andrew McGregor, UC San Diego
Communication Complexity
- Goal: Evaluate f(x1, ... , xn) when input is split among p players:
- How much communication is required to evaluate f?
- Consider randomized, blackboard, one-way, multi-round, ...
- How important is the split?
- Is f hard for many splits or only hard for a few bad splits?
- Previous work on worst and best partitions.
- [Aho, Ullman, Yannakakis ’83] [Papadimitriou, Sipser ’84]
- Consider random partitions:
- Define error probability over coin flips and random split.
[Figure: input x1, ..., xn split among p = 3 players: x1 ... x10 | x11 ... x20 | x21 ... x30]
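As a toy illustration of the random-partition model (a sketch we add here, not code from the talk; the helper name random_partition is ours): each token is assigned to one of the p players uniformly at random, and a protocol's error is measured over both its coin flips and this random split.

```python
import random

def random_partition(x, p, rng=random.Random(0)):
    """Assign each token x[i] to one of p players uniformly at random."""
    shares = [[] for _ in range(p)]
    for i, token in enumerate(x):
        shares[rng.randrange(p)].append((i, token))  # a player also learns the index i
    return shares

# Example: the n = 30 tokens from the figure split among p = 3 players.
shares = random_partition(list(range(1, 31)), p=3)
```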
Stream Computation
- Goal: Evaluate f(x1, ... , xn) given sequential access:
- How much working memory is required to evaluate f?
- Consider randomized, approximate, multi-pass, etc.
- Random-order streams: Assume f is order-invariant:
- Upper Bounds: e.g., stream of i.i.d. samples.
- Lower Bounds: is a “hard” problem hard in practice?
- [Munro, Paterson ’78] [Demaine, López-Ortiz, Munro ’02]
[Guha, McGregor ’06, ’07a, ’07b] [Chakrabarti, Jayram, Patrascu ’08]
- Random-partition-CC bounds give random-order bounds
[Morris ’78] [Munro, Paterson ’78] [Flajolet, Martin ’85] [Alon, Matias, Szegedy ’96] [Henzinger, Raghavan, Rajagopalan ’98]
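To illustrate why random order can help (a toy sketch under our own simplifying assumptions, not an algorithm from the talk): in a randomly ordered stream any prefix behaves like a uniform sample of the data, so even a short prefix already yields a reasonable approximate median.

```python
import random
import statistics

def approx_median_from_prefix(stream, prefix_len):
    """In a random-order stream, the first prefix_len items behave like a
    uniform random sample, so their median approximates the true median."""
    prefix = []
    for i, x in enumerate(stream):
        if i >= prefix_len:
            break
        prefix.append(x)
    return statistics.median(prefix)

data = list(range(10**5))
random.shuffle(data)  # random arrival order
estimate = approx_median_from_prefix(iter(data), prefix_len=1000)
```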
Results
- t-party Set-Disjointness: Any protocol for a Θ(t²)-player random partition requires Ω(n/t) bits of communication. ∴ A 2-approximation of the kth frequency moment requires Ω(n^{1-3/k}) space.
- Median: Any p-round protocol for a p-player random partition requires Ω(m^{f(p)}) communication, where f(p) = 1/3^p. ∴ A polylog(m)-space algorithm requires Ω(log log m) passes.
- Gap-Hamming: Any one-way protocol for a 2-player random partition requires Ω(n) bits of communication. ∴ A (1+ε)-approximation of F0 or entropy requires Ω(ε^{-2}) space.
- Index: Any one-way protocol for a 2-player random partition (with duplicates) requires Ω(n) bits of communication. ∴ Connectivity of a graph G=(V, E) requires Ω(|V|) space.
The Challenge...
- Naive reduction from fixed-partition-CC:
- 1. Players determine random partition, send necessary data.
- 2. Simulate protocol on random partition.
- Problem: Seems to require too much communication.
- Consider random input and public coins:
- Issue #1: Need independence of input and partition.
- Issue #2: Generalize information statistics techniques.
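A rough sketch of the cost of step 1 (our own toy accounting, just to make the problem concrete): once the public coin fixes the random partition, almost every token must be forwarded to a different player, so the redistribution alone already costs on the order of n tokens, which dwarfs the Ω(n/t)-type bounds we are after.

```python
import random

def naive_redistribution_cost(n, p, token_bits=1, rng=random.Random(0)):
    """Bits sent if, for a fixed-partition instance held by player 0, every token
    whose random owner differs from its current owner must be forwarded."""
    forwarded = sum(1 for _ in range(n) if rng.randrange(p) != 0)
    return forwarded * token_bits

n, p = 10**6, 100
cost = naive_redistribution_cost(n, p)  # roughly n * (1 - 1/p) bits, i.e. essentially n
target = n // p                         # the kind of bound we actually hope to prove
```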
a) Disjointness b) Selection
Multi-Party Set-Disjointness
- Instance: a t × n binary matrix X; define DISJ_{n,t}(X) = ⋁_i AND_t(x_{1,i}, ..., x_{t,i}).
- Unique intersection promise: each column has weight 0, 1, or t, and at most one column has weight t.
- Thm: Ω(n/t) bound if the t players each get a row.
- [Kalyanasundaram, Schnitger ’92] [Razborov ’92]
- [Chakrabarti, Khot, Sun ’03] [Bar-Yossef, Jayram, Kumar, Sivakumar ’04]
- Thm: Ω(n/t) bound for a random partition among Θ(t²) players.
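A direct transcription of the definition above into code (a sketch; the plain list-of-rows encoding of X is just ours):

```python
def disj(X):
    """DISJ_{n,t}(X): 1 iff some column of the t x n matrix X is all ones."""
    t, n = len(X), len(X[0])
    return int(any(all(X[j][i] for j in range(t)) for i in range(n)))

def unique_intersection(X):
    """Promise: every column has weight 0, 1, or t, and at most one has weight t."""
    t, n = len(X), len(X[0])
    weights = [sum(X[j][i] for j in range(t)) for i in range(n)]
    return all(w in (0, 1, t) for w in weights) and sum(w == t for w in weights) <= 1
```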
Generalize Information Statistics Approach...
- [Chakrabarti, Shi, Wirth, Yao ’01] [Chakrabarti, Khot, Sun ’03] [Bar-Yossef, Jayram, Kumar, Sivakumar ’04]
- Π(X) is the transcript of a δ-error protocol on a random input X ~ µ.
- Information Cost: icost(Π) = I(X : Π(X))
- Lower bounds the length of the protocol.
- Amenable to direct-sum results...
- Step 1: icost(Π) ≥ Σ_j I(X^j : Π(X)), where X^j is the j-th column of the matrix X.
- Step 2: I(X^j : Π(X)) ≥ icost(Π′), where Π′ is the “best” δ-error protocol for AND_t.
- Step 3: icost(Π′) ≥ Ω(1/t), assuming Π′ is a private-coin, one-way protocol.
- Necessary Generalization (for the random-partition setting):
- Step 1: Condition “icost” on the public coins.
- Step 2: Π′ is now the best (δ + Birthday(t, p))-error protocol.
- Step 3: Generalize the result to public-coin protocols.
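Putting the three steps together for the fixed partition gives the bound (a sketch of the chain of inequalities, constants suppressed; the generalization above adapts each link to the random-partition setting):

|Π| ≥ icost(Π) ≥ Σ_j I(X^j : Π(X)) ≥ n · icost(Π′) ≥ n · Ω(1/t) = Ω(n/t).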
Frequency Moments
- Define: F_k(S) = Σ_i (frequency of i in S)^k.
- Reduction from set-disjointness [Alon, Matias, Szegedy ’99]: each player j contributes {i : x_{j,i} = 1} to the stream S; then F_k(S) ≥ t^k if DISJ_{n,t}(X) = 1 and F_k(S) ≤ n if DISJ_{n,t}(X) = 0.
- Thm: Ω(n^{1-3/k}) space bound for random-order streams.
- Proof: Set t^k = 2n to prove Ω(n^{1-1/k}) total communication.
- Per-message communication is Ω(n^{1-1/k}/p) = Ω(n^{1-3/k}), since p = Θ(t²) = Θ(n^{2/k}).
- Open Problem: Ω(n^{1-2/k}) bound for random order?
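A toy check of the reduction (our own small parameters, not from the paper): with t^k ≥ 2n, any 2-approximation of F_k separates the two cases.

```python
def f_k(stream, k):
    """F_k = sum over distinct items of (frequency of the item)^k."""
    freq = {}
    for x in stream:
        freq[x] = freq.get(x, 0) + 1
    return sum(c ** k for c in freq.values())

def stream_from(X):
    """Each player j contributes item i once for every x_{j,i} = 1."""
    return [i for row in X for i, bit in enumerate(row) if bit]

t, n, k = 4, 16, 3  # t^k = 64 >= 2n = 32
disjoint  = [[1 if i == j else 0 for i in range(n)] for j in range(t)]           # all columns have weight <= 1
intersect = [[1 if i in (j, n - 1) else 0 for i in range(n)] for j in range(t)]  # last column has weight t

assert f_k(stream_from(disjoint), k) <= n        # DISJ = 0  =>  F_k <= n
assert f_k(stream_from(intersect), k) >= t ** k  # DISJ = 1  =>  F_k >= t^k
```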
a) Disjointness b) Selection
Selection in Streams
- Find median of stream of m values in polylog(m) space.
- Thm: For adversarial-order streams, Θ(lg m / lg lg m) passes are necessary and sufficient.
- [Munro, Paterson ’78] [Guha, McGregor ’07a]
- Thm: For random-order streams, Θ(lg lg m) passes are necessary and sufficient.
- [Guha, McGregor ’06] [Chakrabarti, Jayram, Patrascu ’08]
- Our result: Using random-partition-CC techniques we get
simpler and tighter pass/space trade-offs...
Tree Pointer Jumping (TPJ)...
- Instance: a function f on the nodes of a (p+1)-level, t-ary tree:
- if v is an internal node, f maps v to a child of v; if v is a leaf, f maps v to {0, 1}.
- Goal: Compute f(f(... f(v_root) ...)).
- Thm: With p players, if the ith player knows f(v) whenever level(v) = i, any p-round protocol requires Ω(t) communication.
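A literal evaluation of a TPJ instance (a sketch; the children-list encoding of the tree is just one hypothetical representation):

```python
def eval_tpj(children, f, root):
    """Follow the pointers from the root: at an internal node v, f[v] is the index
    of the child to move to; at a leaf, f[v] is the answer bit in {0, 1}."""
    v = root
    while children[v]:          # internal node
        v = children[v][f[v]]
    return f[v]                 # leaf bit

# Tiny example: root 0 with two leaf children 1 and 2.
children = {0: [1, 2], 1: [], 2: []}
f = {0: 1, 1: 0, 2: 1}          # root points to its second child (node 2), whose bit is 1
assert eval_tpj(children, f, 0) == 1
```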
Reduction from TPJ to Median...
- With each node v associate two values α(v) < β(v) such that α(v) < α(u) < β(u) < β(v) for every descendant u of v.
- For each node, generate multiple copies of α(v) and β(v) such that the median of all the values corresponds to the TPJ solution.
- The relationship between t and the number of copies determines the bound.
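One concrete way to realize the nesting property (a sketch; this particular DFS entry/exit numbering is our choice, not necessarily the paper's): α(v) is assigned when v is first visited and β(v) when its subtree is finished, which gives α(v) < α(u) < β(u) < β(v) for every descendant u.

```python
def nested_values(children, root):
    """Assign alpha(v) < beta(v) so that each node's interval strictly contains
    the intervals of all of its descendants (DFS entry/exit numbering)."""
    alpha, beta, counter = {}, {}, [0]

    def dfs(v):
        alpha[v] = counter[0]; counter[0] += 1
        for c in children.get(v, []):
            dfs(c)
        beta[v] = counter[0]; counter[0] += 1

    dfs(root)
    return alpha, beta

# Example: a root with two leaf children gets (0, 5); the children get (1, 2) and (3, 4).
alpha, beta = nested_values({0: [1, 2]}, root=0)
```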
Simulating Random-Partition Protocol...
- Consider a node v where f(v) is known to Bob.
- Creating an instance of random-partition median finding:
- 1) Using the public coin, the players determine a partition of the tokens and set half of them to α and half to β.
- 2) Bob “fixes” the balance of the tokens under his control.
- Thm: The partition looks random if the total number of tokens is greater than (max bias)². Hence, m = exp(2p lg t).
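A quick Monte Carlo illustration of the intuition behind the theorem (our own simulation, not part of the proof): when N tokens are assigned to α or β uniformly at random, the typical imbalance is only about √N, so once the number of tokens exceeds the square of the maximum bias Bob must introduce, his "fix" is statistically invisible.

```python
import random
import statistics

def typical_imbalance(num_tokens, trials=500, rng=random.Random(0)):
    """Estimate E[|#alpha - #beta|] when each token is alpha or beta with prob 1/2."""
    gaps = []
    for _ in range(trials):
        alphas = sum(rng.randrange(2) for _ in range(num_tokens))
        gaps.append(abs(2 * alphas - num_tokens))
    return statistics.mean(gaps)

for n_tokens in (100, 400, 1600):
    print(n_tokens, typical_imbalance(n_tokens))  # grows roughly like sqrt(n_tokens)
```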
Summary
- Introduced the notion of robust lower bounds.
- Tight communication bounds for disjointness, indexing, and gap-hamming, and an improved selection bound.
- Data stream bounds including frequency moments, connectivity, entropy, F0, quantile estimation, ...
- Many open problems... Thanks!