Algorithms for Big Data (VIII)

Chihao Zhang

Shanghai Jiao Tong University

Nov. 8, 2019

Review

Last week, we learnt a few graph streaming algorithms. Recall that we have the following simple algorithm for counting triangles.

Let f = (f_T)_{T ∈ ([n] choose 3)} be the vector where, for T = {x, y, z},

f_T = |{{x, y}, {x, z}, {y, z}} ∩ E|.

The algorithm simply returns F_0 − 1.5F_1 + 0.5F_2, where F_i = ‖f‖_i^i.


We can expand F_0 − 1.5F_1 + 0.5F_2 as

∑_{T ∈ ([n] choose 3)} (0.5 f_T² − 1.5 f_T + 1[f_T ≠ 0]).

The "polynomial" f(x) = 0.5x² − 1.5x + 1[x ≠ 0] satisfies

▶ f(0) = f(1) = f(2) = 0;
▶ f(3) = 1.

So each term contributes 1 exactly when the triple T spans a triangle, and the sum counts triangles. The multiplicative error of the algorithm is unbounded! I leave the analysis of the algorithm as an exercise.
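The following minimal Python sketch (my illustration, not part of the slides; the small graph below is made up) evaluates the estimator term by term and shows that it equals the number of triangles:

from itertools import combinations

# Evaluate sum over all triples T of 0.5*f_T^2 - 1.5*f_T + 1[f_T != 0],
# where f_T is the number of edges of E inside the triple T.
def triangle_estimator(n, edges):
    E = {frozenset(e) for e in edges}
    total = 0.0
    for T in combinations(range(n), 3):
        f_T = sum(1 for pair in combinations(T, 2) if frozenset(pair) in E)
        # The "polynomial" vanishes for f_T in {0, 1, 2} and equals 1 at f_T = 3.
        total += 0.5 * f_T**2 - 1.5 * f_T + (1 if f_T != 0 else 0)
    return total

# A 4-cycle plus the chord {0, 2}: exactly two triangles.
print(triangle_estimator(4, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]))  # 2.0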


Communication complexity

Suppose we want to compute some function f(x, y), where x ∈ {0, 1}^a and y ∈ {0, 1}^b. Alice has x and Bob has y, and they collaborate to compute f. The complexity is measured by the number of bits communicated between the two. We consider the one-way communication model, possibly with public random coins.

Example: Equality

Consider the function

f(x, y) = EQ(x, y) = 1 if x = y; 0 otherwise,

where x, y ∈ {0, 1}^n.

The one-way complexity of EQ is n. This can be shown by a simple counting argument: if the number of bits sent by Alice is less than n, then she can send at most 2^1 + 2^2 + · · · + 2^{n−1} = 2^n − 2 distinct messages. By the pigeonhole principle, two different strings x and x′ share the same message. Bob can then use y = x to fool the algorithm (the protocol must answer identically on (x, x) and (x′, x)), a contradiction.

Randomness in communication

We can design a more efficient protocol for EQ by tossing coins. We treat x and y as two integers in {0, …, 2^n − 1}.

▶ Alice picks a random prime p ∈ [n², 2n²].
▶ She sends (p, x mod p) to Bob.
▶ Bob outputs 1 if y mod p = x mod p, and outputs 0 otherwise.

If x = y, the algorithm is always correct. If x ≠ y, the algorithm is wrong only if x ≡ y (mod p). The number of primes between n² and 2n² is Θ(n²/log n) (prime number theorem), while at most n primes q satisfy x ≡ y (mod q), since 0 < |x − y| < 2^n has fewer than n prime factors. Hence the error probability is O(log n / n), and the message costs only O(log n) bits.
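Here is a runnable sketch of this protocol in Python (my own illustration; the naive primality test is only there to keep the snippet self-contained):

import random

def is_prime(m):
    # Naive trial division; fine for primes up to 2n^2 in a demo.
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def alice_message(x, n):
    # Alice: sample a uniform prime p in [n^2, 2n^2], send (p, x mod p).
    while True:
        p = random.randint(n * n, 2 * n * n)
        if is_prime(p):
            return (p, x % p)

def bob_output(y, message):
    # Bob: answer 1 iff his residue matches Alice's fingerprint.
    p, fingerprint = message
    return 1 if y % p == fingerprint else 0

n = 64
x = random.getrandbits(n)
print(bob_output(x, alice_message(x, n)))      # equal inputs: always 1
print(bob_output(x ^ 1, alice_message(x, n)))  # distinct inputs: 0 except w.p. O(log n / n)

The message (p, x mod p) consists of two numbers below 2n², i.e. O(log n) bits, versus the n bits needed by any deterministic protocol.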

Disjointness

The function DISJ(x, y) tests whether the two sets represented by x and y share a common element. Formally,

DISJ(x, y) = 1 if ⟨x, y⟩ > 0; 0 otherwise,

where x, y ∈ {0, 1}^n.

The same argument as for EQ shows that computing DISJ deterministically requires n bits of one-way communication.

How about randomized protocols? Unlike EQ, the power of randomness does not help much here…

Lower bound for DISJ

Theorem
Any randomized protocol for DISJ that is correct with probability at least 2/3 needs Ω(n) bits of one-way communication.

We prove this for a special case of DISJ, the problem INDEX, so the lower bound is in fact stronger. (An INDEX instance is a DISJ instance in which Bob's set is the singleton {i}.)

INDEX: Alice holds a string x ∈ {0, 1}^n, Bob holds an index i ∈ [n], and INDEX(x, i) = x_i.

Yao's principle

The main tool we will use to derive the lower bound is Yao's principle.

Lemma
If there exists some distribution D over {0, 1}^a × {0, 1}^b such that any deterministic one-way communication protocol P with

Pr_{(x,y)∼D}[P is wrong on (x, y)] ≤ ε

costs at least k bits, then any randomized one-way protocol with error at most ε on any input also costs at least k bits of one-way communication.

Lower bound for INDEX

By Yao's principle, we only need to construct a distribution D over {0, 1}^n × [n] so that any protocol of cost o(n) outputs the correct answer with probability less than 7/8. We let D be the uniform distribution over {0, 1}^n × [n].

Assume there exists a protocol P that uses at most 0.1n bits of one-way communication. Namely, Alice holds a function f : {0, 1}^n → {0, 1}^{0.1n}; on input x, she sends f(x) to Bob. Upon receiving f(x), Bob (holding i) outputs some bit y(f(x))_i. We collect the outputs for all possible i ∈ [n] as a vector y(f(x)) ∈ {0, 1}^n. The algorithm is correct if x_i = y(f(x))_i.

Therefore, we only need to upper bound Pr_{(x,i)∼D}[x_i = y(f(x))_i], where both f : {0, 1}^n → {0, 1}^{0.1n} and y : {0, 1}^{0.1n} → {0, 1}^n are fixed!

Since i is uniform in [n], for any two strings x, y ∈ {0, 1}^n,

Pr_{i∈[n]}[x_i ≠ y_i] = d_H(x, y) / n,

where d_H denotes Hamming distance.

Let S = y(f({0, 1}^n)) ⊆ {0, 1}^n be a set of size at most 2^{0.1n}. Since x is uniform in {0, 1}^n, we only need to show that most x ∈ {0, 1}^n satisfy d_H(x, S) ≥ n/4. This is true since

|Ball(S, n/4)| ≤ 2^{0.1n} · ∑_{j=0}^{n/4} (n choose j) ≤ n · 2^{0.95n} = o(2^n).

For every such far x, the protocol answers correctly with probability at most 3/4 over the choice of i, so overall correctness falls below 7/8, as the calculation below spells out.
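As a worked step (my reconstruction of the calculation the slides leave implicit, in the notation above):

\[
\Pr_{(x,i)\sim D}\bigl[x_i = y(f(x))_i\bigr]
\;\le\; \Pr_x\!\left[d_H(x, S) < \tfrac{n}{4}\right] + \frac{3}{4}
\;\le\; \frac{n \cdot 2^{0.95n}}{2^n} + \frac{3}{4} \;<\; \frac{7}{8}
\]

for sufficiently large n, contradicting the assumed success probability of the protocol.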

Lower bound for F∞

Our motivation for introducing the communication model is to prove lower bounds for streaming problems. For example, we can use the lower bound for DISJ to derive a lower bound for estimating F∞.

Theorem
Any randomized algorithm to estimate F∞ within error ε = 0.2 requires Ω(n) bits of memory.

Proof.
Treat x and y as the streams {i ∈ [n] | x_i = 1} and {i ∈ [n] | y_i = 1}, respectively. Alice runs the streaming algorithm on her stream and sends its memory state to Bob, who continues the run on his stream. On the concatenated stream, F∞ = 2 if ⟨x, y⟩ > 0 and F∞ ≤ 1 otherwise, and an estimate within ε = 0.2 distinguishes the two cases. The memory state thus serves as a one-way protocol for DISJ, so it must consist of Ω(n) bits.

A general paradigm

The previous proof illustrates a general paradigm for proving streaming lower bounds from communication lower bounds: a streaming algorithm that computes f(x, y) using s bits of memory yields a protocol that computes f(x, y) using at most s bits of one-way communication. We will see more applications next time! A toy simulation of this reduction appears below.
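The sketch below (my illustration; the exact-counting tracker is a stand-in for a genuine small-space streaming algorithm) shows how Alice and Bob turn any streaming algorithm into a one-way protocol by shipping its memory state:

import pickle

class FInfinityTracker:
    # Exact F_infinity tracker: the dictionary is the algorithm's entire memory.
    def __init__(self):
        self.counts = {}
    def process(self, item):
        self.counts[item] = self.counts.get(item, 0) + 1
    def estimate(self):
        return max(self.counts.values(), default=0)

def alice(x_set):
    alg = FInfinityTracker()
    for i in sorted(x_set):          # Alice streams her elements
        alg.process(i)
    return pickle.dumps(alg)         # the message is the memory state

def bob(message, y_set):
    alg = pickle.loads(message)      # Bob resumes from Alice's state
    for i in sorted(y_set):
        alg.process(i)
    return 1 if alg.estimate() >= 2 else 0   # F_inf = 2 iff the sets intersect

print(bob(alice({1, 3, 5}), {3, 4}))  # 1: the sets intersect
print(bob(alice({1, 3, 5}), {2, 4}))  # 0: the sets are disjoint

The message length equals the algorithm's memory footprint, so an s-bit streaming algorithm for F∞ would give an s-bit one-way protocol for DISJ, forcing s = Ω(n).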