Algorithms for Big Data (I)
Chihao Zhang, Shanghai Jiao Tong University
Sept. 20, 2019

Course Information
Course Homepage: http://chihaozhang.com/teaching/BDA2019fall
Time: Every Friday, 12:55 - 15:40
Office …


A programmer for routers
A router has limited memory, but needs to process large data…
The router can monitor the ids of devices connecting to it, e.g. the stream 43, 14, 21, 17, 46, …
▶ How many numbers?
▶ How many distinct numbers?
▶ What is the most frequent number?

Streaming Model
The input is a sequence σ = ⟨a_1, a_2, …, a_m⟩ where each a_i ∈ [n].
One can process the input stream using at most s bits of memory.
We say the algorithm is sublinear if s = o(min{m, n}).
We can ask:
▶ How many numbers (what is m)?
▶ How many distinct numbers?
▶ What is the median of σ?
▶ What is the most frequent number?
▶ …

How many numbers?
We can maintain a counter k. Whenever one reads a number a_i, let k = k + 1.
How many bits of memory are needed? log_2 m.
Can this be improved to o(log m)? Impossible exactly (why?). Possible if we allow approximation:
For every ε > 0, compute a number m̂ such that (1 − ε)·m ≤ m̂ ≤ (1 + ε)·m.

Morris' algorithm

Algorithm: Morris' Algorithm for Counting Elements
Init: A variable X ← 0.
On input y: increase X by 1 with probability 2^{−X}.
Output: output m̂ = 2^X − 1.

▶ This is a randomized algorithm.
▶ Therefore we look at the expectation of its output.
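A minimal Python sketch of this pseudocode (the function name `morris` and the list-based stream are illustrative choices, not from the slides):

```python
import random

def morris(stream):
    """Morris' counter: increment X with probability 2^-X for every
    stream element, and return the estimate 2^X - 1 of the length m."""
    x = 0
    for _ in stream:
        if random.random() < 2.0 ** (-x):
            x += 1
    return 2 ** x - 1

# Example: estimate the length of a stream of 10,000 device ids.
print(morris([random.randrange(50) for _ in range(10_000)]))
```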

Analysis
The output m̂ is a random variable; we prove that its expectation E[m̂] = m by induction on m.
Since X = 1 when m = 1, we have E[m̂] = 1.
Assume the claim holds for smaller m, and let X_i denote the value of X after processing the i-th input.

Analysis (cont'd)

E[m̂] = E[2^{X_m} − 1]
     = ∑_{i≥0} Pr[X_m = i] · 2^i − 1
     = ∑_{i≥0} ( Pr[X_{m−1} = i] · (1 − 2^{−i}) + Pr[X_{m−1} = i − 1] · 2^{1−i} ) · 2^i − 1
     = ∑_{i≥0} Pr[X_{m−1} = i] · (2^i + 1) − 1
     = E[2^{X_{m−1}}]
     = m.                (induction hypothesis)
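As a quick sanity check of the unbiasedness claim, here is a small sketch (assuming the `morris` function above; the numbers are illustrative):

```python
import statistics

m = 10_000
stream = list(range(m))
# The average of many independent runs should be close to m, since E[m̂] = m.
print(statistics.mean(morris(stream) for _ in range(1_000)))
```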

It is now clear that Morris' algorithm is an unbiased estimator for m.
It uses approximately O(log log m) bits of memory.
However, for a practical randomized algorithm, we further require its output to concentrate on the expectation.
That is, we want to establish a concentration inequality of the form
    Pr[|m̂ − m| > ε·m] ≤ δ,    for ε, δ > 0.
For fixed ε, the smaller δ is, the better the algorithm will be.

Concentration
We need some probabilistic tools to establish the concentration inequality.

Markov's inequality
For every nonnegative random variable X and every a > 0, it holds that
    Pr[X ≥ a] ≤ E[X] / a.

Chebyshev's inequality
For every random variable X and every a > 0, it holds that
    Pr[|X − E[X]| ≥ a] ≤ Var[X] / a².

Concentration (cont'd)
In order to apply Chebyshev's inequality, we have to compute the variance of m̂.

Lemma. E[(2^{X_m})²] = (3/2)·m² + (3/2)·m + 1.

We can prove the lemma using an induction argument similar to our proof for the expectation.
Therefore,
    Var[m̂] = E[(2^{X_m} − 1)²] − E[m̂]² = E[(2^{X_m} − 1)²] − m² ≤ m²/2.

Applying Chebyshev's inequality with a = εm, and using Var[m̂] ≤ m²/2, we obtain for every ε > 0,
    Pr[|m̂ − m| ≥ ε·m] ≤ Var[m̂] / (εm)² ≤ 1 / (2ε²).
Can we improve the concentration? Two common tricks work here.

Averaging trick
Chebyshev's inequality tells us that we can improve the concentration by reducing the variance.
Note that the variance satisfies:
▶ Var[a·X] = a²·Var[X];
▶ Var[X + Y] = Var[X] + Var[Y] for independent X and Y.
We can independently run Morris' algorithm t times in parallel, and let the outputs be m̂_1, …, m̂_t.
The final output is m̂* := (1/t)·∑_{i=1}^{t} m̂_i.
Apply Chebyshev's inequality to m̂*:
    Pr[|m̂* − m| ≥ ε·m] ≤ 1 / (t·2ε²).
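A hedged sketch of the averaging trick (the function name is illustrative; t is chosen by the caller, and the t counters are maintained together in a single pass):

```python
import random

def morris_avg(stream, t):
    """Averaging trick: maintain t independent Morris counters in one pass
    over the stream and output their average; this divides the variance by t."""
    xs = [0] * t
    for _ in stream:
        for i in range(t):
            if random.random() < 2.0 ** (-xs[i]):
                xs[i] += 1
    return sum(2 ** x - 1 for x in xs) / t
```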

For t ≥ 1/(2ε²δ), we have
    Pr[|m̂* − m| ≥ ε·m] ≤ δ.
Our algorithm uses O((log log m)/(ε²δ)) bits of memory.
This is a trade-off between the quality of the randomized algorithm and the consumption of memory space.

The Median trick
We choose t = 3/(2ε²) in the previous algorithm.
Independently run the algorithm s times in parallel, and let the outputs be m̂*_1, m̂*_2, …, m̂*_s.
It holds that for every i = 1, …, s,
    Pr[|m̂*_i − m| ≥ ε·m] ≤ 1/3.
Output the median of m̂*_1, …, m̂*_s (=: m̂**).
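A sketch of the median trick layered on the averaging sketch above (the function name and the way ε and s are passed are my own):

```python
import math
import statistics

def morris_median(stream, eps, s):
    """Median trick: run s independent copies of the averaged estimator,
    each averaging t = ceil(3/(2*eps^2)) Morris counters, and output the median."""
    t = math.ceil(3 / (2 * eps ** 2))
    # relies on morris_avg from the averaging-trick sketch above
    return statistics.median(morris_avg(stream, t) for _ in range(s))
```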

Chernoff bound
Chernoff bound. Let X_1, …, X_n be independent random variables with X_i ∈ [0, 1] for every i = 1, …, n. Let X = ∑_{i=1}^{n} X_i. Then for every 0 < ε < 1, it holds that
    Pr[|X − E[X]| > ε·E[X]] ≤ 2·exp(−ε²·E[X]/3).
