Robust Lower Bounds for Communication and Stream Computation - PowerPoint PPT Presentation



SLIDE 1
  • Amit Chakrabarti Dartmouth College
  • Graham Cormode AT&T Labs
  • Andrew McGregor UC San Diego

“Robust” Lower Bounds

for Communication and Stream Computation

SLIDE 2

Communication Complexity


SLIDE 6
  • Goal: Evaluate f(x1, ... , xn) when input is split among p players:
  • How much communication is required to evaluate f?
  • Consider randomized, blackboard, one-way, multi-round, ...
  • How important is the split?
  • Is f hard for many splits or only hard for a few bad splits?
  • Previous work on worst and best partitions.
  • [Aho, Ullman, Yannakakis ’83] [Papadimitriou, Sipser ’84]
  • Consider random partitions:
  • Define error probability over coin flips and random split.

x1 ... x10 | x11 ... x20 | x21 ... x30

Communication Complexity
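As a toy illustration of the model (my sketch, not from the talk; the function name is hypothetical): p players each holding a contiguous chunk of x can evaluate f(x) = Σᵢ xᵢ in the blackboard model by posting partial sums, so the cost scales with p and the message size rather than with n.

```python
# Toy sketch of the blackboard model (illustration only): p players
# each hold a contiguous chunk of x and evaluate f(x) = sum(x) by
# posting partial sums to a shared blackboard.

def blackboard_sum(x, p):
    n = len(x)
    chunks = [x[i * n // p:(i + 1) * n // p] for i in range(p)]
    blackboard = []   # messages visible to every player
    bits = 0          # total communication, in bits
    for chunk in chunks:
        msg = sum(chunk)              # this player's partial sum
        blackboard.append(msg)
        bits += max(1, msg.bit_length())
    return sum(blackboard), bits

total, cost = blackboard_sum(list(range(30)), p=3)
print(total, cost)   # 435 22
```

Here the split into chunks plays the role of the partition; a random partition would assign each xᵢ to a uniformly chosen player instead.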


SLIDE 11
  • Goal: Evaluate f(x1, ... , xn) given sequential access:
  • How much working memory is required to evaluate f?
  • Consider randomized, approximate, multi-pass, etc.
  • Random-order streams: Assume f is order-invariant:
  • Upper Bounds: e.g., stream of i.i.d. samples.
  • Lower Bounds: is a “hard” problem hard in practice?
  • [Munro, Paterson ’78] [Demaine, López-Ortiz, Munro ’02]

[Guha, McGregor ’06, ’07a, ’07b] [Chakrabarti, Jayram, Patrascu ’08]

  • Random-partition-CC bounds give random-order bounds

x1 x2 x3 x4 x5 ... ... xn

[Morris ’78] [Munro, Paterson ’78] [Flajolet, Martin ’85] [Alon, Matias, Szegedy ’96] [Henzinger, Raghavan, Rajagopalan ’98]

Stream Computation
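As a concrete instance of the small-memory theme, here is a sketch of the approximate counter from [Morris ’78] (my reconstruction, not code from the talk): store only an exponent c, increment it with probability 2⁻ᶜ, and report 2ᶜ − 1, so a stream of n events is counted in roughly O(log log n) bits.

```python
import random

# Sketch of Morris's approximate counter [Morris '78] (reconstruction,
# not from the talk): only the exponent c is stored, so the memory is
# O(log log n) bits for a stream of n events.

def morris_count(n_events):
    c = 0
    for _ in range(n_events):
        if random.random() < 2.0 ** -c:
            c += 1
    return 2 ** c - 1   # unbiased estimator of n_events

random.seed(0)
avg = sum(morris_count(10_000) for _ in range(400)) / 400
print(avg)   # close to 10000 on average
```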

SLIDE 12

Results

  • t-party Set-Disjointness: Any protocol for Θ(t²)-player random-partition requires Ω(n/t) bits of communication. ∴ 2-approx. for kth freq. moments requires Ω(n^{1-3/k}) space.
  • Median: Any p-round protocol for p-player random-partition requires Ω(m^{f(p)}) where f(p) = 1/3^p. ∴ Polylog(m)-space algorithms require Ω(log log m) passes.
  • Gap-Hamming: Any one-way protocol for 2-player random-partition requires Ω(n) bits communicated. ∴ (1+ε)-approx. for F0 or entropy requires Ω(ε^{-2}) space.
  • Index: Any one-way protocol for 2-player random-partition (with duplicates) requires Ω(n) bits communicated. ∴ Connectivity of a graph G=(V, E) requires Ω(|V|) space.


SLIDE 17
  • Naive reduction from fixed-partition-CC:
  • 1. Players determine random partition, send necessary data.
  • 2. Simulate protocol on random partition.
  • Problem: Seems to require too much communication.
  • Consider random input and public coins:
  • Issue #1: Need independence of input and partition.
  • Issue #2: Generalize information statistics techniques.

[figure: a stream of values randomly split among the players]

The Challenge...

SLIDE 18

a) Disjointness b) Selection



SLIDE 23

Multi-Party Set-Disjointness

  • Instance: a t × n 0/1 matrix X, and define
    DISJ_{n,t}(X) = ⋁_i AND_t(x_{1,i}, ... , x_{t,i})
  • Unique intersection: Each column has weight 0, 1, or t, and at most one column has weight t.
  • Thm: Ω(n/t) bound if t players each get a row.
  • [Kalyanasundaram, Schnitger ’92] [Razborov ’92]
  • [Chakrabarti, Khot, Sun ’03] [Bar-Yossef, Jayram, Kumar, Sivakumar ’04]
  • Thm: Ω(n/t) bound for random partition among Θ(t²) players.

[figure: t × n 0/1 matrix with a unique all-ones column]
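The definition can be checked directly; a minimal reference implementation (my illustration with hypothetical names, not from the talk):

```python
# Reference check for DISJ_{n,t} (illustration only): X is a t x n
# 0/1 matrix and the answer is 1 iff some column is all ones, i.e.
# OR_i AND_t(x_{1,i}, ..., x_{t,i}).

def disj(X):
    t, n = len(X), len(X[0])
    return int(any(all(X[j][i] for j in range(t)) for i in range(n)))

# Instances satisfying the unique-intersection promise (t = 3, n = 4):
X0 = [[1, 0, 0, 0],
      [0, 1, 0, 0],
      [0, 0, 1, 0]]          # all columns have weight 0 or 1 -> answer 0
X1 = [[1, 0, 1, 0],
      [0, 1, 1, 0],
      [0, 0, 1, 0]]          # column 2 has weight t -> answer 1
print(disj(X0), disj(X1))    # 0 1
```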

SLIDE 30

  • Π(X) is the transcript of a δ-error protocol on random input X ~ µ.
  • Information Cost: icost(Π) = I(X : Π(X))
  • Lower-bounds the length of the protocol
  • Amenable to direct-sum results...
  • Necessary Generalization:
  • Step 1: Condition “icost” on the public coins.
  • Step 2: Π′ is the best (δ + Birthday(t,p))-error protocol.
  • Step 3: Generalize the result to public-coin protocols.

Step 1: icost(Π) ≥ Σ_j I(X_j : Π(X)), where X_j is the j-th column of the matrix X

Step 2: I(X_j : Π(X)) ≥ icost(Π′), where Π′ is the “best” δ-error protocol for AND_t

Step 3: icost(Π′) ≥ Ω(1/t), assuming Π′ is a private-coin, one-way protocol

Generalize Information Statistics Approach...

  • [Chakrabarti, Shi, Wirth, Yao ’01] [Chakrabarti, Khot, Sun ’03] [Bar-Yossef, Jayram, Kumar, Sivakumar ’04]
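To make the information cost concrete, here is a toy computation (my example, not from the talk) for the trivial one-way protocol for AND₂ in which Alice simply sends her bit: the transcript carries exactly one bit of information about the input.

```python
import math
from collections import Counter

# Toy icost computation (illustration only): icost(Pi) = I(X : Pi(X)).
# Protocol for AND_2: Alice holds x1, Bob holds x2, Alice sends x1.

def mutual_information(pairs):
    """I(X; M) in bits, from a list of equiprobable (x, m) pairs."""
    n = len(pairs)
    pxm = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    pm = Counter(m for _, m in pairs)
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (pm[m] / n)))
               for (x, m), c in pxm.items())

# X = (x1, x2) uniform on {0,1}^2; transcript Pi(X) = x1.
samples = [((x1, x2), x1) for x1 in (0, 1) for x2 in (0, 1)]
print(mutual_information(samples))   # 1.0
```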
SLIDE 31

Frequency Moments


SLIDE 35

  • Define: F_k(S) = Σ_i (freq. of i)^k
  • Reduction from set-disjointness: [Alon, Matias, Szegedy ’99]
    S = multiset union over players j of {i : x_{ij} = 1}
    F_k(S) ≥ t^k if DISJ_{n,t}(X) = 1
    F_k(S) ≤ n if DISJ_{n,t}(X) = 0
  • Thm: Ω(n^{1-3/k}) space bound for random-order streams.
  • Proof: Set t^k = 2n to prove Ω(n^{1-1/k}) total communication.
  • Per-message communication is Ω(n^{1-1/k}/p) = Ω(n^{1-3/k}).
  • Open Problem: Ω(n^{1-2/k}) bound for random order?

Frequency Moments
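The gap in the [Alon, Matias, Szegedy ’99] reduction can be checked directly; a small sketch with hypothetical names (take t = 4, k = 3, n = 16, so t^k = 64 > n):

```python
from collections import Counter

# Sketch of the AMS reduction's gap (illustration only): build the
# multiset S from a DISJ instance X, where player j contributes item i
# whenever X[j][i] = 1, and compare F_k(S) in the two cases.

def f_k(stream, k):
    return sum(c ** k for c in Counter(stream).values())

def stream_from(X):
    return [i for row in X for i, bit in enumerate(row) if bit]

t, n, k = 4, 16, 3   # t**k = 64 > n = 16 separates the two cases
intersecting = [[0] * n for _ in range(t)]
pairwise_disjoint = [[0] * n for _ in range(t)]
for j in range(t):
    intersecting[j][0] = 1            # every player holds item 0 -> DISJ = 1
    pairwise_disjoint[j][j + 1] = 1   # all-distinct items       -> DISJ = 0

print(f_k(stream_from(intersecting), k))       # 64  (>= t**k)
print(f_k(stream_from(pairwise_disjoint), k))  # 4   (<= n)
```

So a 2-approximation of F_k distinguishes the two cases, which is how the communication bound transfers to streaming space.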

SLIDE 36

a) Disjointness b) Selection

SLIDE 37

Selection in Streams


SLIDE 41

  • Find the median of a stream of m values in polylog(m) space.
  • Thm: For adversarial-order streams, Θ(lg m / lg lg m) passes
  • [Munro, Paterson ’78] [Guha, McGregor ’07a]
  • Thm: For random-order streams, Θ(lg lg m) passes
  • [Guha, McGregor ’06] [Chakrabarti, Jayram, Patrascu ’08]
  • Our result: Using random-partition-CC techniques we get simpler and tighter pass/space trade-offs...

Selection in Streams
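For intuition about multi-pass selection, here is a deliberately simplified sketch (my illustration, not the [Munro, Paterson ’78] algorithm): quickselect spread over passes, keeping a candidate interval and O(1) values per pass, with O(log m) rounds in expectation.

```python
import random

# Simplified multi-pass selection (illustration only): maintain a
# candidate interval (lo, hi) that contains the answer, sample a pivot
# from it in one pass, count the pivot's rank in two more passes, and
# narrow the interval; O(log m) rounds expected, O(1) values stored.

def select_multipass(stream_fn, rank):
    lo, hi = float('-inf'), float('inf')   # invariant: answer lies in (lo, hi)
    while True:
        pivot, seen = None, 0
        for x in stream_fn():              # pass 1: reservoir-sample a pivot
            if lo < x < hi:
                seen += 1
                if random.randrange(seen) == 0:
                    pivot = x
        lt = sum(1 for x in stream_fn() if x < pivot)    # pass 2
        eq = sum(1 for x in stream_fn() if x == pivot)   # pass 3
        if rank < lt:
            hi = pivot
        elif rank < lt + eq:
            return pivot                   # pivot has the target rank
        else:
            lo = pivot

random.seed(1)
data = random.sample(range(1000), 101)
print(select_multipass(lambda: iter(data), rank=50) == sorted(data)[50])  # True
```

The point of the lower bounds on this slide is that no amount of cleverness gets polylog space below Θ(lg m / lg lg m) passes for adversarial order, or Θ(lg lg m) for random order.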

SLIDE 47

  • Instance: A function f on the nodes of a (p+1)-level, t-ary tree:
    if v is an internal node, f maps v to a child of v;
    if v is a leaf, f maps v to {0,1}.
  • Goal: Compute f(f(... f(v_root) ...)).
  • Thm: With p players, where the ith player knows f(v) whenever level(v) = i:
    any p-round protocol requires Ω(t) communication.

[figure: example tree with the f-value of each node]

  • Tree Pointer Jumping (TPJ)...
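A reference evaluation of TPJ (hypothetical encoding, not from the talk): represent f as a dict in which internal nodes map to a child's name and leaves map to 0/1, then follow the pointers from the root.

```python
# Reference evaluation of TPJ (hypothetical encoding): follow the
# pointers from the root until a leaf's 0/1 value is reached.

def tpj(f, root):
    v = root
    while isinstance(f[v], str):   # internal node: jump to a child
        v = f[v]
    return f[v]                    # leaf value in {0, 1}

# A 3-level (p = 2), binary (t = 2) instance:
f = {
    'root': 'r1',                  # root points to its child r1
    'r0': 'r00', 'r1': 'r10',      # level-1 nodes point to leaves
    'r00': 0, 'r01': 1,
    'r10': 1, 'r11': 0,
}
print(tpj(f, 'root'))   # 1
```

The hardness comes from the information split: the player holding level i's pointers cannot know which level-i node matters until the earlier levels have been resolved.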

SLIDE 57

  • With each node v associate two values α(v) < β(v) such that α(v) < α(u) < β(u) < β(v) for any descendant u of v.
  • For each node: Generate multiple copies of α(v) and β(v) such that the median of the values corresponds to the TPJ solution.
  • The relationship between t and the number of copies determines the bound.

Reduction from TPJ to Median...

[figure: example stream of α and β values]
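The nested pairs of values (labelled α(v) < β(v) here) can be produced by a pre/post-order DFS numbering; a minimal sketch with hypothetical names:

```python
# Sketch of the nested-interval assignment (illustration only): a
# pre/post-order DFS numbering gives each node v values
# alpha(v) < beta(v) with alpha(v) < alpha(u) < beta(u) < beta(v)
# for every descendant u of v.

def assign_intervals(children, root):
    alpha, beta, clock = {}, {}, [0]
    def dfs(v):
        alpha[v] = clock[0]; clock[0] += 1   # pre-order number
        for u in children.get(v, []):
            dfs(u)
        beta[v] = clock[0]; clock[0] += 1    # post-order number
    dfs(root)
    return alpha, beta

children = {'root': ['a', 'b'], 'a': ['a0', 'a1'], 'b': ['b0', 'b1']}
alpha, beta = assign_intervals(children, 'root')
print(alpha['root'], alpha['a'], beta['a'], beta['root'])   # 0 1 6 13
```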


SLIDE 66

  • Consider a node v where f(v) is known to Bob.
  • Creating an instance of random-partition median finding:
  • 1) Using the public coin, the players determine a partition of the tokens and set half to α and half to β.
  • 2) Bob “fixes” the balance of the tokens under his control.
  • Thm: The partition looks random if the total number of tokens is greater than (max bias)². Hence, m = exp(2p lg t).

Simulating Random-Partition Protocol...
SLIDE 67

Summary

  • Introduced the notion of robust lower bounds.
  • Tight communication bounds for disjointness, indexing, and gap-hamming, and an improved selection bound.
  • Data-stream bounds including frequency moments, connectivity, entropy, F0, quantile estimation, ...
  • Many open problems...

Thanks!