Data Streams: Random Order & Multiple Passes (2009 Barbados)



slide-1
SLIDE 1

Data Streams: Random Order & Multiple Passes

2009 Barbados Workshop on Computational Complexity Andrew McGregor

slide-3
SLIDE 3

Introduction

Random Order Streams:

◮ Average-case analysis: the data is worst-case but the order is random.
◮ Lower bounds are more useful than in the adversarial case.
◮ Streams ordered randomly: e.g., space-efficient sampling

Multiple Pass Streams:

◮ How much extra power do you get with a few extra passes?
◮ With external data, it’s easier to access data sequentially.

slide-8
SLIDE 8

Pass-Space Trade-Offs

Problem

Given a stream of n values from [n], what is the smallest value that does not appear in the stream? You have p passes over the data.

◮ Version 1: All values appear exactly once except for the missing value: $\tilde{\Theta}(1)$ space.
◮ Version 2: All values less than the smallest missing value appear exactly once: $\tilde{\Theta}(n^{1/p})$ space.
◮ Version 3: General problem: $\tilde{\Theta}(n/p)$ space.

Other trade-offs: Find a length-k increasing subsequence, given that one exists: $\tilde{\Theta}(k^{1+1/(2^p-1)})$ space [Liben-Nowell et al. ’06, Guha, McGregor ’08]
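The upper bounds for Versions 1 and 3 can be illustrated with a small sketch. This is not taken from the talk: the function names are hypothetical, values are assumed to lie in 1..n, and the stream is assumed to be a re-iterable sequence so that each `for` loop over it counts as one pass.

```python
def missing_value_unique(stream, n):
    """Version 1: one pass, a single O(log n)-bit counter.

    Assumes every value in 1..n appears exactly once except one missing value.
    """
    total = n * (n + 1) // 2
    for v in stream:
        total -= v
    return total


def smallest_missing_general(stream, n, p):
    """Version 3: p passes, a bitmap of about n/p bits per pass.

    Pass k only tracks values in the k-th block of [n]; the first block with
    a gap contains the answer. Returns None if every value in 1..n appears.
    """
    block = -(-n // p)  # ceil(n / p)
    for k in range(p):
        lo = k * block + 1
        hi = min(n, lo + block - 1)
        if lo > hi:
            break
        seen = [False] * (hi - lo + 1)
        for v in stream:  # one sequential pass over the data
            if lo <= v <= hi:
                seen[v - lo] = True
        for offset, present in enumerate(seen):
            if not present:
                return lo + offset
    return None
```

The general version trades passes for space directly: doubling p halves the bitmap, which matches the n/p shape of the bound up to logarithmic factors.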

slide-11
SLIDE 11

Random Order Streams

Problem

Given m values from [n], find the median in polylog(m, n) space.

Approximate Median (i.e., one with rank m/2 ± t) in One Pass:

◮ Adversarial: $\tilde{\Theta}(m)$-approx [Greenwald, Khanna ’01]
◮ Random: $\tilde{O}(m^{1/2})$-approx [Guha, McGregor ’06]

Exact Median in Multiple Passes:

◮ Adversarial: $\Theta(\log m / \log\log m)$ passes [Munro, Paterson ’78, Guha, McGregor ’07]
◮ Random: $\Theta(\log\log m)$ passes [Guha, McGregor ’06, Chakrabarti, Jayram, Patrascu ’08, Chakrabarti, Cormode, McGregor ’08]

slide-12
SLIDE 12

Outline

◮ Selection
  ◮ Adversarial Order
  ◮ Random Order
◮ Frequency Moments
◮ Hamming Distance

slide-20
SLIDE 20

Algorithms for Median in Adversarial-Order Stream

Theorem (Adversarial Order)

Can find an element of rank $m/2 \pm \epsilon m$ in one pass and $\tilde{O}(\epsilon^{-1})$ space. Can find the median in $O(\log m / \log\log m)$ passes and $\tilde{O}(1)$ space.

◮ Already seen one-pass result:
  ◮ Can find elements with rank $i\epsilon m \pm \epsilon m$ for $i \in [\epsilon^{-1}]$
◮ For multiple-pass result:
  ◮ In pass 1, use the one-pass algorithm with $\epsilon = 1/\log m$ to find a and b such that
    $\mathrm{rank}(a) = \frac{m}{2} - \frac{2m}{\log m} \pm \frac{m}{\log m}$ and $\mathrm{rank}(b) = \frac{m}{2} + \frac{2m}{\log m} \pm \frac{m}{\log m}$
  ◮ In pass 2, compute rank(a) and rank(b)
  ◮ Recurse on elements in the range (a, b).
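The bracket-and-recurse structure can be sketched as follows. This is only an illustration under stated assumptions: a uniform reservoir sample stands in for the one-pass quantile summary (so the pass and space bounds of the theorem are not claimed), `data` is assumed to be re-iterable, and the name `multipass_select` and its parameters are hypothetical. The exact rank counts in the second pass keep the answer correct whatever the sample looks like.

```python
import math
import random


def multipass_select(data, k, sample_size=64, seed=0):
    """Return the k-th smallest element (1-indexed) of a re-iterable `data`."""
    rng = random.Random(seed)
    lo, hi = -math.inf, math.inf  # the answer lies strictly inside (lo, hi)
    below = 0                     # number of elements <= lo
    while True:
        # Pass 1: reservoir-sample the elements inside (lo, hi).
        sample, count = [], 0
        for x in data:
            if lo < x < hi:
                count += 1
                if len(sample) < sample_size:
                    sample.append(x)
                else:
                    slot = rng.randrange(count)
                    if slot < sample_size:
                        sample[slot] = x
        sample.sort()
        r = k - below             # target rank within (lo, hi)
        if count <= sample_size:  # the whole range fits in memory
            return sample[r - 1]
        # Pick a and b from the sample so that they probably bracket rank r.
        pos = (r - 1) * len(sample) // count
        gap = max(1, len(sample) // 8)
        a = sample[max(0, pos - gap)]
        b = sample[min(len(sample) - 1, pos + gap)]
        # Pass 2: exact ranks of a and b among the elements inside (lo, hi).
        less_a = eq_a = less_b = eq_b = 0
        for x in data:
            if lo < x < hi:
                less_a += x < a
                eq_a += x == a
                less_b += x < b
                eq_b += x == b
        # Recurse on whichever sub-range contains rank r.
        if r <= less_a:
            hi = a
        elif r <= less_a + eq_a:
            return a
        elif r <= less_b:
            lo, hi, below = a, b, below + less_a + eq_a
        elif r <= less_b + eq_b:
            return b
        else:
            lo, below = b, below + less_b + eq_b
```

Each round removes at least one of a and b from the active range, so the loop terminates; how quickly the range shrinks depends on the quality of the pass-1 summary.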

SLIDE 29

One Pass Lower Bound

Theorem

Finding an element of rank $m/2 \pm m^{\delta}$ in 1 pass requires $\Omega(m^{1-\delta})$ space.

◮ INDEX reduction: Alice has $x \in \{0,1\}^t$, Bob has $j \in [t]$
◮ Alice constructs $A = \{2i + x_i : i \in [t]\}$
◮ Bob constructs B = {t − j copies of 0, j − 1 copies of 2t + 2}
◮ Median of the 2t − 1 values is $2j + x_j$
◮ ∴ Exact median requires Ω(t) = Ω(m) space.
◮ For approximate result, duplicate each element $2m^{\delta} + 1$ times.
◮ ∴ Approx median requires $\Omega(t) = \Omega(m/m^{\delta})$ space.

Exercise

Prove that an algorithm that doesn’t know m in advance requires Ω(m) space to find the median, even when the data comes in sorted order.
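The exact-median reduction from INDEX is easy to check by brute force. The sketch below builds the two multisets and reads the bit back from the median; the function names are hypothetical and j is 1-indexed, as on the slide.

```python
def median_instance(x, j):
    """Build Alice's and Bob's multisets for the INDEX input (x, j)."""
    t = len(x)
    alice = [2 * i + x[i - 1] for i in range(1, t + 1)]  # A = {2i + x_i}
    bob = [0] * (t - j) + [2 * t + 2] * (j - 1)          # B pads both sides
    return alice, bob


def recover_bit(x, j):
    """Recover x_j from the median of the 2t - 1 combined values."""
    alice, bob = median_instance(x, j)
    values = sorted(alice + bob)
    median = values[len(values) // 2]  # element of rank t
    return median - 2 * j
```

For example, with x = [1, 0, 0, 1, 1] and j = 2, the combined sorted values are 0, 0, 0, 3, 4, 6, 9, 11, 12, whose median 4 = 2j + x_j encodes x_2 = 0.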

SLIDE 34

Two Pass Lower Bound

Theorem

Finding the median in 2 passes requires $\Omega(m^{1/2})$ space.

◮ “2-level INDEX” reduction: Alice has $x^1, \ldots, x^t \in \{0,1\}^t$, Bob has $y \in [t]^t$, Charlie has $i \in [t]$. To determine $x^i_j$ where $j = y_i$ after two rounds requires Ω(t) bits of communication. [Nisan, Wigderson ’91]
◮ For $i \in [t]$, appropriate players construct
    $A_i = \{2j + x^i_j : j \in [t]\} + o_i$ where $o_i = B(i-1)$
    $B_i = \{t - y_i \text{ copies of } 0 \text{ and } y_i - 1 \text{ copies of } B\} + o_i$
    $C = \{t - i \text{ copies of } 0 \text{ and } i - 1 \text{ copies of } B o_t\}$
◮ Median of the $O(t^2)$ values is $o_i + 2j + x^i_j$ where $j = y_i$
◮ ∴ Exact median requires $\Omega(t) = \Omega(m^{1/2})$ space.

slide-43
SLIDE 43

Random Order Algorithms

Theorem

Can find an element of rank $m/2 \pm \tilde{O}(\sqrt{m})$ in one pass and $\tilde{O}(1)$ space. Can find the median in $O(\log\log m)$ passes and $\tilde{O}(1)$ space.

◮ One-pass result:
  ◮ Split the stream into O(log m) segments of length O(m / log m)
  ◮ At start of i-th segment: we think $\mathrm{rank}(a_i) < m/2 < \mathrm{rank}(b_i)$.
  ◮ Let c be the first element in the segment with $a_i < c < b_i$
  ◮ In the rest of the segment, estimate rank(c) by $\tilde{r}$
  ◮ If $\tilde{r} = m/2 \pm \tilde{O}(\sqrt{m})$ return c, otherwise:
    $(a_{i+1}, b_{i+1}) = (a_i, c)$ if $\tilde{r} > m/2$, and $(c, b_i)$ if $\tilde{r} < m/2$
◮ For multiple-pass result: Recurse with care!
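A rough sketch of the one-pass procedure is given below. It follows the segment structure of the bullets but makes several simplifying assumptions that are not from the talk: m is known in advance, the tolerance sqrt(m log m) is a stand-in for the Õ(√m) term, the rank estimate simply rescales the count of smaller elements seen in the rest of the segment, and the best candidate so far is returned if none is accepted. The name `random_order_median` is hypothetical, and no approximation guarantee is claimed for this simplified version.

```python
import itertools
import math


def random_order_median(stream, m):
    """One pass over a randomly ordered stream of known length m >= 1."""
    num_segments = max(1, int(math.log2(m)))
    seg_len = max(2, m // num_segments)
    tolerance = math.sqrt(m * max(1.0, math.log2(m)))
    lo, hi = -math.inf, math.inf      # current bracket (a_i, b_i)
    best, best_err = None, math.inf   # fallback if nothing is accepted
    it = iter(stream)
    while True:
        cand, seen, smaller, empty = None, 0, 0, True
        for x in itertools.islice(it, seg_len):  # read the next segment
            empty = False
            if cand is None:
                if lo < x < hi:
                    cand = x          # first element inside the bracket
            else:
                seen += 1
                smaller += x < cand   # used to estimate rank(cand)
        if empty:
            return best
        if cand is not None and best is None:
            best = cand
        if cand is None or seen == 0:
            continue
        est = m * smaller / seen      # estimated rank of cand
        err = abs(est - m / 2)
        if err < best_err:
            best, best_err = cand, err
        if err <= tolerance:
            return cand
        if est > m / 2:
            hi = cand                 # median is probably below cand
        else:
            lo = cand                 # median is probably above cand
```

Only the bracket, one candidate, and a few counters are stored at any time, which is the point of the construction: the random order lets the rest of a segment act as a fresh sample for estimating the rank of c.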

slide-50
SLIDE 50

Random Order One Pass Lower Bound

Theorem

Finding the median in 1 pass requires $\Omega(m^{1/2})$ space.

◮ INDEX reduction: Alice has $x \in \{0,1\}^t$, Bob has $j \in [t]$. Solving the problem requires Ω(t) communication even when $x \in_R \{0,1\}^t$.
◮ For some constant c > 0, define:
    $A = \{2i + x_i : i \in [t]\}$
    B = {$ct^2 + t - j$ copies of 0 and $ct^2 + j - 1$ copies of 2t + 2}
◮ Alice and Bob simulate the algorithm on a random permutation of A ∪ B. Alice determines the 1st half and Bob determines the 2nd half:
  ◮ Alice assumes j = t/2: Bob “fixes” the balance.
  ◮ Bob guesses values of $x_i$ if $2i + x_i$ appears in his half.
◮ Choosing large c ensures the ordering is sufficiently random.

slide-55
SLIDE 55

Frequency Moments

Problem

Given m elements from [n], find a $(1+\epsilon)$-approximation of $F_k = \sum_{i \in [n]} f_i^k$ with probability $1 - \delta$, where $f_i$ is the frequency of item i.

Theorem (Chakrabarti et al. ’03, Indyk, Woodruff ’05)

$\tilde{\Theta}_{\epsilon,\delta}(n^{1-2/k})$ space when the stream is in adversarial order.

Theorem (Andoni et al. ’08)

$\tilde{\Omega}(n^{1-2.5/k})$ space is necessary when the stream is in random order.

Rumor has it that this has been tightened to $\Omega(n^{1-2/k})$ . . .
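To make the definition of $F_k$ concrete, here is a reference computation that stores every frequency. It is deliberately not a small-space streaming algorithm; it only pins down the quantity the theorems are about. The function name is hypothetical.

```python
from collections import Counter


def frequency_moment(stream, k):
    """Exact F_k = sum_i f_i^k, using space linear in the number of distinct items."""
    counts = Counter(stream)
    if k == 0:
        return len(counts)  # F_0 is the number of distinct items
    return sum(f ** k for f in counts.values())
```

For the stream 1, 2, 1, 3, 1, 2 the frequencies are 3, 2, 1, so $F_0 = 3$, $F_1 = 6$ (the stream length), and $F_2 = 9 + 4 + 1 = 14$.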

slide-59
SLIDE 59

Adversarial Order Lower Bound

◮ t-DISJ reduction: t sets $S_1, \ldots, S_t \subset [n]$ of size n/t. Are the sets pairwise disjoint, or does there exist a common element?
◮ If the i-th player has $S_i$, t-DISJ requires $\tilde{\Omega}(n/t)$ communication. [Bar-Yossef et al. ’02, Chakrabarti et al. ’03]
◮ Let $S = \cup_{i \in [t]} S_i$. If $t^k > 2n$,
    $(F_k(S) \le n) \Rightarrow$ (t-DISJ(S) = “disjoint”)
    $(F_k(S) \ge 2n) \Rightarrow$ (t-DISJ(S) = “common element”)
◮ A 1-pass, s-space algorithm that 2-approximates $F_k$ gives a ts-space algorithm that solves $(2n)^{1/k}$-DISJ
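The gap in $F_k$ between the two kinds of instance can be checked on a small example. The sketch below is illustrative only: the instance builders and their names are hypothetical, the universe is taken as 0..n-1 for convenience, and the "common element" instance shares exactly one element across all t sets.

```python
from collections import Counter


def fk_of_union(sets, k):
    """F_k of the stream obtained by concatenating the players' sets."""
    counts = Counter(v for s in sets for v in s)
    return sum(f ** k for f in counts.values())


def disjoint_instance(n, t):
    """t pairwise-disjoint sets of size n/t."""
    size = n // t
    return [set(range(i * size, (i + 1) * size)) for i in range(t)]


def common_instance(n, t):
    """t sets of size n/t that all contain element 0 and are otherwise disjoint."""
    size = n // t
    sets, nxt = [], 1
    for _ in range(t):
        sets.append({0} | set(range(nxt, nxt + size - 1)))
        nxt += size - 1
    return sets
```

With n = 64, t = 4 and k = 4 we have $t^k = 256 > 2n$: the disjoint instance gives $F_4 = 64 = n$, while the shared element alone contributes $4^4 = 256$, so a 2-approximation of $F_k$ tells the two cases apart.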

slide-66
SLIDE 66

Random Order Lower Bound

Theorem (Andoni et al. ’08)

$\tilde{\Omega}(n^{1-3/k})$ space necessary for random order stream.

◮ t-DISJ reduction: $S_1, \ldots, S_t \subset [n']$ of size $n'/t$.
◮ Using public random bits, players pick a random stream S from $[2n]^n$, a random map $f : [n] \to [n]$, and random permutations $\pi_i$
◮ Player i computes the string $\sigma(f(S_i))$
◮ Players embed the strings in S at random locations:
  ◮ If the embeddings of two strings overlap, abort the algorithm.
  ◮ Probability of aborting is sufficiently small if $n' = n^{1-1/k}$

Extending these ideas gives $\tilde{\Omega}(n^{1-2/k})$.

slide-71
SLIDE 71

Hamming Distance Lower Bound

Problem

Alice knows $x \in \{0,1\}^n$ and Bob knows $y \in \{0,1\}^n$. Want to estimate the Hamming distance up to $\pm o(\sqrt{n})$ with probability 9/10.

Theorem (Woodruff 2004, Jayram et al. 2008)

Any one-way protocol requires Ω(n) bits of communication.

Theorem (Brody, Chakrabarti last week)

Any O(1)-round protocol requires Ω(n) bits of communication.

Corollary

Any O(1)-pass algorithm that $(1+\epsilon)$-approximates $F_0$ or $F_2$ requires $\Omega(\epsilon^{-2})$ space.

slide-77
SLIDE 77

One-Pass Lower Bound (1/2)

◮ Reduction from the INDEX problem: Alice knows $z \in \{0,1\}^t$ and Bob knows $j \in [t]$. Let’s assume |z| = t/2 and this is odd.
◮ Alice and Bob pick $r \in_R \{-1,1\}^t$ using public random bits.
◮ Alice computes $\mathrm{sgn}(r \cdot z)$ and Bob computes $\mathrm{sgn}(r_j)$
◮ Claim: For some constant c > 0,
    $P[\mathrm{sgn}(r \cdot z) = \mathrm{sgn}(r_j)] = 1/2$ if $z_j = 0$, and $1/2 + c/\sqrt{t}$ if $z_j = 1$
◮ Repeat n = O(t) times to construct
    $x_i = I[\mathrm{sgn}(r \cdot z) = +]$ and $y_i = I[\mathrm{sgn}(r_j) = +]$
◮ With probability 9/10, for some constants $c_1 < c_2$,
    $z_j = 0 \Rightarrow \Delta(x, y) \ge n/2 - c_1\sqrt{n}$
    $z_j = 1 \Rightarrow \Delta(x, y) \le n/2 - c_2\sqrt{n}$
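The bias in the claim can be verified exactly for small t by enumerating every sign vector r. The sketch below is an illustration, not part of the talk; the function names are hypothetical and j is 0-indexed in the code.

```python
from fractions import Fraction
from itertools import product


def sgn(v):
    """Sign of v as -1, 0 or +1."""
    return (v > 0) - (v < 0)


def agreement_probability(z, j):
    """Exact P[sgn(r . z) == sgn(r_j)] for r uniform in {-1, +1}^t."""
    t = len(z)
    agree = 0
    for r in product((-1, 1), repeat=t):
        dot = sum(ri * zi for ri, zi in zip(r, z))
        agree += sgn(dot) == sgn(r[j])
    return Fraction(agree, 2 ** t)
```

With t = 6 and z = [1, 1, 1, 0, 0, 0] (so |z| = t/2 = 3 is odd), the probability is exactly 1/2 for an index with $z_j = 0$ and 3/4 for an index with $z_j = 1$; the excess over 1/2 shrinks like $1/\sqrt{t}$ as t grows.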

slide-83
SLIDE 83

One-Pass Lower Bound (2/2)

Claim

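For some constant c > 0, $P[\mathrm{sgn}(r \cdot z) = \mathrm{sgn}(r_j)] = 1/2$ if $z_j = 0$, and $1/2 + c/\sqrt{t}$ if $z_j = 1$

◮ If $z_j = 0$ then $\mathrm{sgn}(r \cdot z)$ and $\mathrm{sgn}(r_j)$ are independent.
◮ If $z_j = 1$, let $s = r \cdot z - r_j$ and $A = \{\mathrm{sgn}(r \cdot z) = \mathrm{sgn}(r_j)\}$:
    $P[A] = P[A \mid s = 0]\, P[s = 0] + P[A \mid s \ne 0]\, P[s \ne 0]$
    $P[A \mid s = 0] = 1$ and $P[A \mid s \ne 0] = 1/2$
    $P[s = 0] = 2c/\sqrt{t}$ for some constant c > 0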
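For completeness, the three facts in the last bullet combine into the claimed bias as follows. Here $q$ is shorthand (not on the slide) for $P[s = 0]$; since $s$ is a sum of $t/2 - 1$ independent random signs, $q = \Theta(1/\sqrt{t})$.

```latex
\begin{align*}
P[A] &= P[A \mid s = 0]\, q + P[A \mid s \neq 0]\,(1 - q) \\
     &= 1 \cdot q + \tfrac{1}{2}(1 - q) \\
     &= \tfrac{1}{2} + \tfrac{q}{2}
      = \tfrac{1}{2} + \Theta\!\left(\tfrac{1}{\sqrt{t}}\right).
\end{align*}
```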

slide-84
SLIDE 84

Summary: We looked at some nice problems, our curiosity is piqued, and now we want to start finding more problems to solve. Thanks!