
Algorithms for Big Data (I)

Chihao Zhang

Shanghai Jiao Tong University

Sept. 20, 2019

Algorithms for Big Data (I) 1/19

Course Information

▶ Instructor: Chihao Zhang
▶ Course Homepage: http://chihaozhang.com/teaching/BDA2019fall
▶ Time: Every Friday, 12:55 - 15:40
▶ Office hour: Every Monday, 18:00 - 20:00
▶ Grading: Homework 60%, Final Exam 40%
▶ Prerequisites: Algorithms, Basic Probability Theory

Algorithms for Big Data (I) 2/19

Algorithms

We learnt many efficient algorithms before…

▶ Dijkstra's algorithm, Floyd's algorithm, the Blossom algorithm…
▶ These algorithms cost polynomial time.

What if the input is too large to store? Throw some of it away: sublinear space algorithms.

Algorithms for Big Data (I) 3/19

A programmer for routers

A router has limited memory, but needs to process large data… The router can monitor the ids of devices connecting to it, for example the stream

23, 38, 45, 28, 11, 10, 36, 17, 2, 23, 40, 23, 18, 24, 31, 3, 48, 25, 43, 14, 21, 17, 46

▶ How many numbers?
▶ How many distinct numbers?
▶ What is the most frequent number?

Algorithms for Big Data (I) 4/19

Streaming Model

The input is a sequence σ = ⟨a_1, a_2, …, a_m⟩ where each a_i ∈ [n]. One can process the input stream using at most s bits of memory. We say the algorithm is sublinear if s = o(min{m, n}). We can ask:

▶ How many numbers are there (what is m)?
▶ How many distinct numbers?
▶ What is the median of σ?
▶ What is the most frequent number?
▶ …

Algorithms for Big Data (I) 5/19

How many numbers?

We can maintain a counter k: whenever one reads a number a_i, let k = k + 1.

How many bits of memory are needed? ⌈log_2 m⌉. Can this be improved to o(log m)? Impossible (why?). It becomes possible if we allow approximation: for every ε > 0, compute a number m̂ such that 1 − ε ≤ m̂/m ≤ 1 + ε.

Algorithms for Big Data (I) 6/19
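The exact counter described above is tiny, but writing it out makes the memory accounting concrete. A minimal Python sketch (illustration only; the function name is my own choice, not from the slides):

```python
def count_exact(stream):
    """Exact one-pass counter: k <- k + 1 for every element read.

    The only state kept is the integer k, which takes about log2(m)
    bits for a stream of length m -- a single pass, but not sublinear
    space once we insist on exactness.
    """
    k = 0
    for _ in stream:
        k += 1
    return k
```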

Morris' algorithm

Algorithm: Morris' Algorithm for Counting Elements
Init: a variable X ← 0.
On input y: increase X with probability 2^(−X).
Output: output m̂ = 2^X − 1.

▶ This is a randomized algorithm.
▶ Therefore we look at the expectation of its output.

Algorithms for Big Data (I) 7/19
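Morris' algorithm is short enough to run directly. The following Python sketch implements the pseudocode above (the function name and test stream are my own choices, not from the slides):

```python
import random

def morris_count(stream, rng=None):
    """Morris' approximate counter.

    Keeps only the variable X and increments it with probability
    2^-X, so the state fits in O(log log m) bits; the returned
    estimate is m_hat = 2^X - 1.
    """
    rng = rng or random.Random()
    x = 0
    for _ in stream:
        if rng.random() < 2.0 ** (-x):
            x += 1
    return 2 ** x - 1
```

A single run is noisy, but averaged over many independent runs the output is close to the true stream length m, which matches the unbiasedness E[m̂] = m proved next.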

Analysis

The output m̂ is a random variable; we prove that its expectation E[m̂] = m by induction on m.

Since X = 1 when m = 1, we have E[m̂] = 1. Assume the claim holds for smaller m, and let X_i denote the value of X after processing the i-th input.

Algorithms for Big Data (I) 8/19

Analysis (cont'd)

E[m̂] = E[2^(X_m)] − 1
      = ∑_{i=0}^{m} Pr[X_m = i] · 2^i − 1
      = ∑_{i=0}^{m} ( Pr[X_{m−1} = i] · (1 − 2^{−i}) + Pr[X_{m−1} = i−1] · 2^{1−i} ) · 2^i − 1
      = ∑_{i=0}^{m−1} Pr[X_{m−1} = i] · (2^i + 1) − 1
      = E[2^(X_{m−1})] = m    (induction hypothesis)

Algorithms for Big Data (I) 9/19

It is now clear that Morris' algorithm is an unbiased estimator for m. It uses approximately O(log log m) bits of memory.

However, for a practical randomized algorithm, we further require its output to concentrate around its expectation. That is, we want to establish a concentration inequality of the form

Pr[|m̂ − m| > εm] ≤ δ,   for ε, δ > 0.

For fixed ε, the smaller δ is, the better the algorithm will be.

Algorithms for Big Data (I) 10/19

Concentration

We need some probabilistic tools to establish the concentration inequality.

Markov's inequality

For every nonnegative random variable X and every a > 0, it holds that Pr[X ≥ a] ≤ E[X]/a.

Chebyshev's inequality

For every random variable X and every a > 0, it holds that Pr[|X − E[X]| ≥ a] ≤ Var[X]/a².

Algorithms for Big Data (I) 11/19

Concentration (cont'd)

In order to apply Chebyshev's inequality, we have to compute the variance of m̂.

Lemma: E[(2^(X_m))²] = (3/2)m² + (3/2)m + 1.

We can prove the claim using an induction argument similar to our proof for the expectation. Therefore,

Var[m̂] = E[m̂²] − E[m̂]² = E[(2^(X_m) − 1)²] − m² ≤ m²/2.

Algorithms for Big Data (I) 12/19

Applying Chebyshev's inequality, we obtain for every ε > 0,

Pr[|m̂ − m| ≥ εm] ≤ 1/(2ε²).

Can we improve the concentration? Two common tricks work here.

Algorithms for Big Data (I) 13/19

Averaging trick

Chebyshev's inequality tells us that we can improve the concentration by reducing the variance. Note that the variance satisfies

▶ Var[a · X] = a² · Var[X];
▶ Var[X + Y] = Var[X] + Var[Y] for independent X and Y.

We can independently run Morris' algorithm t times in parallel, and let the outputs be m̂_1, …, m̂_t. The final output is m̂* := (1/t) ∑_{i=1}^{t} m̂_i. Applying Chebyshev's inequality to m̂*:

Pr[|m̂* − m| ≥ εm] ≤ 1/(2tε²).

Algorithms for Big Data (I) 14/19
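The averaging trick only needs t independent copies of the counter; running them "in parallel" over the same stream can be simulated sequentially, since only the distribution of the estimate matters. A Python sketch (function names and parameters are illustrative):

```python
import random

def morris_x(stream_len, rng):
    # One Morris counter run over a stream of the given length.
    x = 0
    for _ in range(stream_len):
        if rng.random() < 2.0 ** (-x):
            x += 1
    return x

def morris_averaged(stream_len, t, rng):
    """Averaging trick: mean of t independent Morris estimates.

    Averaging divides the variance by t, so Chebyshev gives
    Pr[|m* - m| >= eps*m] <= 1/(2*t*eps^2).
    """
    return sum(2 ** morris_x(stream_len, rng) - 1 for _ in range(t)) / t
```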

For t ≥ 1/(2ε²δ), we have

Pr[|m̂* − m| ≥ εm] ≤ δ.

Our algorithm uses O(log log n / (ε²δ)) bits of memory: a trade-off between the quality of the randomized algorithm and its memory consumption.

Algorithms for Big Data (I) 15/19

The Median trick

We choose t = 3/(2ε²) in the previous algorithm. Independently run the algorithm s times in parallel, and let the outputs be m̂*_1, m̂*_2, …, m̂*_s. It holds that for every i = 1, …, s,

Pr[|m̂*_i − m| ≥ εm] ≤ 1/3.

Output the median m̂** of m̂*_1, …, m̂*_s.

Algorithms for Big Data (I) 16/19
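The median trick composes directly with the averaged estimator. A self-contained Python sketch (function names are illustrative; the choice t = 3/(2ε²) is the one from the slide):

```python
import random
import statistics

def _morris_x(stream_len, rng):
    # One Morris counter run over a stream of the given length.
    x = 0
    for _ in range(stream_len):
        if rng.random() < 2.0 ** (-x):
            x += 1
    return x

def _morris_avg(stream_len, t, rng):
    # Averaging trick: with t = 3/(2*eps^2), each copy is bad w.p. <= 1/3.
    return sum(2 ** _morris_x(stream_len, rng) - 1 for _ in range(t)) / t

def morris_median(stream_len, t, s, rng):
    """Median trick: the median of s averaged estimates is bad only
    when at least half the copies are bad, which the Chernoff bound
    makes exponentially unlikely in s."""
    return statistics.median(_morris_avg(stream_len, t, rng) for _ in range(s))
```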

Chernoff bound

Let X_1, …, X_n be independent random variables with X_i ∈ [0, 1] for every i = 1, …, n. Let X = ∑_{i=1}^{n} X_i. Then for every 0 < ε < 1, it holds that

Pr[|X − E[X]| > ε · E[X]] ≤ 2 exp(−ε² · E[X] / 3).

Algorithms for Big Data (I) 17/19

Analysis of the median trick

For every i = 1, …, s, let Y_i be the indicator of the (good) event |m̂*_i − m| < ε · m. Then Y := ∑_{i=1}^{s} Y_i satisfies E[Y] ≥ (2/3)s.

If the median m̂** is bad (namely |m̂** − m| ≥ ε · m), then at least half of the m̂*_i's are bad. Equivalently, Y ≤ (1/2)s.

By the Chernoff bound,

Pr[|Y − E[Y]| ≥ (1/6)s] ≤ 2 exp(−s/108).

Algorithms for Big Data (I) 18/19

Therefore, for t = O(1/ε²) and s = O(log(1/δ)), we have

Pr[|m̂** − m| ≥ εm] < δ.

We use O((1/ε²) · log(1/δ) · log log n) bits of memory.

Algorithms for Big Data (I) 19/19