Median in Random Order Streams Lecture 17 March 26, 2019 Chandra

CS 498ABD: Algorithms for Big Data, Spring 2019
Median in Random Order Streams
Lecture 17, March 26, 2019
Chandra (UIUC) CS498ABD 1 Spring 2019 1 / 16

Quantiles and Selection
Input: a stream of numbers $x_1, x_2, \ldots, x_n$ (or elements from a total order) and an integer $k$.
Selection: find the (approximately) rank-$k$ element of the input.
Quantile summary: a compact data structure that supports approximate selection queries.
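As a reference point for what the streaming algorithms approximate, here is a minimal sketch of exact selection together with one common reading of "approximate rank $k$": an answer whose rank is within $\epsilon n$ of $k$. The helper names `rank`, `select_exact` and `is_eps_approx` are hypothetical, and the $\pm\epsilon n$ definition is an assumption, since the slide does not spell it out. Note that exact selection here stores the whole input, which is precisely what a quantile summary avoids.

```python
def rank(xs, v):
    """Rank of v in xs: the number of elements <= v (assumes distinct values)."""
    return sum(1 for x in xs if x <= v)


def select_exact(xs, k):
    """Exact rank-k element (1-indexed); stores and sorts the whole input."""
    return sorted(xs)[k - 1]


def is_eps_approx(xs, v, k, eps):
    """Assumed definition: v is an eps-approximate rank-k answer if its rank
    differs from k by at most eps * n."""
    return abs(rank(xs, v) - k) <= eps * len(xs)


xs = [7, 2, 9, 4, 1, 8, 3, 6, 5, 10]
print(select_exact(xs, 5))           # 5
print(is_eps_approx(xs, 6, 5, 0.1))  # True: rank 6 is within 0.1 * 10 = 1 of 5
print(is_eps_approx(xs, 8, 5, 0.1))  # False: rank 8 is 3 away from 5
```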

Summary of previous lecture
Randomized: pick $\Theta(\frac{1}{\epsilon}\log(1/\delta))$ elements. With probability $1 - \delta$ this gives an $\epsilon$-approximate quantile summary.
Deterministic: an $\epsilon$-approximate quantile summary using $O(\frac{1}{\epsilon}\log^2 n)$ elements, which can be improved to $O(\frac{1}{\epsilon}\log n)$ elements.
Exact selection: $O(n^{1/p}\log n)$ memory with $p$ passes. In particular, the median in 2 passes with $O(\sqrt{n}\log n)$ memory.

Random order streams
Question: can we improve bounds/algorithms if we move beyond the worst case?
Two models:
Elements $x_1, x_2, \ldots, x_n$ are chosen i.i.d. from some probability distribution, for instance each $x_i \in [0, 1]$.
Elements $x_1, x_2, \ldots, x_n$ are chosen adversarially, but the stream is a uniformly random permutation of them.
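To make the two models concrete, here is a minimal sketch. The helper names are hypothetical, and the uniform distribution on $[0, 1)$ merely stands in for "some probability distribution". In the first model the values themselves are random. In the second, the adversary chooses the values (possibly a worst-case input such as sorted order) and only the arrival order is randomized.

```python
import random


def iid_stream(n, rng):
    """Model 1 (illustration): each x_i drawn i.i.d. uniformly from [0, 1)."""
    return [rng.random() for _ in range(n)]


def random_order_stream(values, rng):
    """Model 2: the adversary fixes the values; only the arrival order is
    random (a uniformly random permutation of the given values)."""
    stream = list(values)
    rng.shuffle(stream)
    return stream


rng = random.Random(0)
adversarial = [1, 2, 3, 4, 5, 6, 7, 8]  # e.g. a sorted, worst-case input
print(random_order_stream(adversarial, rng))
print(iid_stream(3, rng))
```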

Median in random order streams [Munro-Paterson 1980]
Theorem: If the stream is in random order, the median can be computed in one pass with $O(\sqrt{n}\log n)$ memory, with high probability. More generally, $p$ passes suffice with memory $O(n^{1/(2p)}\log n)$.

Munro-Paterson algorithm
Given a space parameter $s$, the algorithm stores a set $S$ of $s$ elements that are consecutive (in sorted order) among the elements seen so far in the stream.
It maintains two counters $\ell$ and $h$:
$\ell$ is the number of elements seen so far that are less than $\min S$;
$h$ is the number of elements seen so far that are greater than $\max S$.
It tries to keep $\ell$ and $h$ balanced.

Munro-Paterson algorithm
MP-Median(s):
  Store the first s elements of the stream in S
  ℓ = h = 0
  While (stream is not empty) do
    x is the new element
    If (x > max S) then h = h + 1
    Else If (x < min S) then ℓ = ℓ + 1
    Else
      Insert x into S
      If h > ℓ: discard min S from S and ℓ = ℓ + 1
      Else: discard max S from S and h = h + 1
  endWhile
  If 1 ≤ n/2 − ℓ ≤ s then output the (n/2 − ℓ)-ranked element of S
  Else output FAIL
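A minimal Python sketch of MP-Median(s), assuming distinct values and a stream with at least $s \ge 1$ elements; `mp_median` is a hypothetical name. It keeps $S$ as a sorted list, so discarding $\min S$ with `pop(0)` costs $O(s)$ (a balanced search tree would avoid that). Rather than requiring $n$ up front, it recovers $n$ at the end as $\ell + s + h$, since every element is either in $S$ or was counted exactly once. As on the slide, rank $n/2$ (rounded down) is taken as the median.

```python
import bisect
import itertools
import random


def mp_median(stream, s):
    """One-pass Munro-Paterson median following MP-Median(s).

    Assumes distinct values and at least s >= 1 elements. Returns the element
    of rank n // 2 (1-indexed), or None to signal FAIL.
    """
    it = iter(stream)
    S = sorted(itertools.islice(it, s))  # the first s elements, kept sorted
    low = high = 0                       # the counters l and h
    for x in it:
        if x > S[-1]:                    # x > max S
            high += 1
        elif x < S[0]:                   # x < min S
            low += 1
        else:
            bisect.insort(S, x)          # insert x into S (now s + 1 elements)
            if high > low:
                S.pop(0)                 # discard min S
                low += 1
            else:
                S.pop()                  # discard max S
                high += 1
    n = low + len(S) + high              # every element is in S or counted once
    r = n // 2 - low                     # rank of the median inside S (1-indexed)
    return S[r - 1] if 1 <= r <= len(S) else None


rng = random.Random(42)
n = 10_000
xs = list(range(1, n + 1))
rng.shuffle(xs)
print(mp_median(xs, 1000))               # the true median, 5000, w.h.p.
print(mp_median(range(1, n + 1), 1000))  # None: sorted order makes it FAIL
```

On the same values in sorted order every new element exceeds $\max S$, so $\ell$ stays 0, $n/2 - \ell = 5000 > s$, and the sketch reports FAIL, which is why the random-order assumption matters.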

Example
σ = 1, 2, 3, 4, 5, 6, 7, 9, 10 and s = 3.
σ = 10, 19, 1, 23, 15, 11, 14, 16, 3, 7 and s = 3.
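To follow the examples by hand, the sketch below replays the update rule and records $(S, \ell, h)$ after every element; `mp_trace` is a hypothetical helper that assumes distinct values. For the second stream it ends with $S = \{11, 14, 15\}$, $\ell = 4$, $h = 3$, so $n/2 - \ell = 5 - 4 = 1$ and the output is $11$, which is indeed the 5th smallest of the ten values. For the first (sorted) stream every new element exceeds $\max S$, so $\ell$ stays 0, $h$ reaches 6, and the algorithm outputs FAIL.

```python
import bisect


def mp_trace(stream, s):
    """Replay MP-Median(s), recording (sorted S, l, h) after each element.
    Assumes distinct values and len(stream) >= s."""
    S = sorted(stream[:s])
    low = high = 0
    states = [(list(S), low, high)]
    for x in stream[s:]:
        if x > S[-1]:
            high += 1
        elif x < S[0]:
            low += 1
        else:
            bisect.insort(S, x)
            if high > low:
                S.pop(0)   # discard min S
                low += 1
            else:
                S.pop()    # discard max S
                high += 1
        states.append((list(S), low, high))
    return states


sigma = [10, 19, 1, 23, 15, 11, 14, 16, 3, 7]
for S, low, high in mp_trace(sigma, 3):
    print(S, "l =", low, "h =", high)
```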

Analysis
Theorem: If $s = \Omega(\sqrt{n}\log n)$ and the stream is in random order, then the algorithm outputs the median with high probability.

Recall: Random walk on the line
Start at the origin 0. At each step, move left one unit with probability 1/2 and right one unit with probability 1/2. After $n$ steps, how far are we from the origin?
At time $i$ let $X_i$ be $-1$ if the move is to the left and $+1$ if it is to the right. Let $Y_n$ be the position at time $n$, so $Y_n = \sum_{i=1}^{n} X_i$.
$E[Y_n] = 0$ and $\mathrm{Var}(Y_n) = \sum_{i=1}^{n} \mathrm{Var}(X_i) = n$.
By Chebyshev: $\Pr[|Y_n| \ge t\sqrt{n}] \le 1/t^2$.
By Chernoff: $\Pr[|Y_n| \ge t\sqrt{n}] \le 2\exp(-t^2/2)$.
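A quick Monte Carlo check of these bounds, as a sketch; `tail_estimate` is a hypothetical helper that simulates the walk directly. For $n = 400$ and $t = 2$ the empirical frequency comes out well below both bounds, a reminder that Chebyshev and Chernoff are guarantees, not estimates.

```python
import math
import random


def tail_estimate(n, t, trials, rng):
    """Monte Carlo estimate of Pr[|Y_n| >= t * sqrt(n)] for a simple +/-1 walk."""
    hits = 0
    for _ in range(trials):
        y = sum(1 if rng.random() < 0.5 else -1 for _ in range(n))
        if abs(y) >= t * math.sqrt(n):
            hits += 1
    return hits / trials


n, t = 400, 2.0
est = tail_estimate(n, t, trials=2000, rng=random.Random(0))
print(f"empirical  {est:.3f}")
print(f"Chebyshev  {1 / t**2:.3f}")                 # 0.250
print(f"Chernoff   {2 * math.exp(-t**2 / 2):.3f}")  # 0.271
```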

Analysis
Let $H_i$ and $L_i$ be random variables for the values of $h$ and $\ell$ after seeing $i$ items of the random-order stream, and let $D_i = H_i - L_i$.
Observation: the algorithm fails only if $|D_n| \ge s - 1$.
We will instead analyse the probability that $|D_i| \ge s - 1$ at any $i$.
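The observation follows from a short count. After the whole stream is processed (assuming $n \ge s$ and, as on the slide, $n$ even), each of the $n$ elements is either in $S$ or was counted exactly once in $\ell$ or $h$, so:

```latex
\begin{aligned}
L_n + s + H_n &= n, \qquad D_n = H_n - L_n \\
\Rightarrow\quad \frac{n}{2} - L_n &= \frac{(L_n + s + H_n) - 2L_n}{2} = \frac{s + D_n}{2} \\
1 \le \frac{n}{2} - L_n \le s &\iff 2 - s \le D_n \le s .
\end{aligned}
```

Hence the output rank is valid whenever $|D_n| \le s - 2$, i.e. the algorithm can fail only if $|D_n| \ge s - 1$.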

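Analysis
Lemma: Suppose $0 < D_i < s - 1$. Then $\Pr[D_{i+1} = D_i + 1] = (H_i + 1)/(H_i + L_i + s + 1) \le 1/2$.
Lemma: Suppose $D_i < 0$ and $|D_i| < s - 1$. Then $\Pr[D_{i+1} = D_i - 1] = (L_i + 1)/(H_i + L_i + s + 1) \le 1/2$.
(For $i \ge s$, the $(i+1)$-th element is equally likely to fall into any of the $i + 1 = H_i + L_i + s + 1$ gaps among the elements seen so far. When $D_i > 0$, an element landing inside $S$ causes $\min S$ to be discarded, so $D$ grows only if the element lands in one of the $H_i + 1$ gaps above $\max S$; the case $D_i < 0$ is symmetric.)
Thus the process behaves better than a random walk on the line (the formal proof is technical), and with high probability $|D_i| \le c\sqrt{n}\log n$ for all $i$. Hence if $s > c\sqrt{n}\log n$, the algorithm succeeds with high probability.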
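The transition probabilities above can be turned into a small simulation of the counters alone, without storing any elements. This sketch assumes distinct values and a uniformly random arrival order, under which the next element falls into each of the $i + 1$ gaps with equal probability; `simulate_drift` is a hypothetical helper that returns $\max_i |D_i|$ so it can be compared with $s$ and with the $\sqrt{n}$ scale of a random walk.

```python
import math
import random


def simulate_drift(n, s, rng):
    """Simulate (H_i, L_i) for MP-Median(s) on a random-order stream of n
    distinct values and return max_i |D_i|, where D_i = H_i - L_i.

    With i >= s elements seen, the next one lands in one of i + 1 equally
    likely gaps: H + 1 above max S, L + 1 below min S, s - 1 inside S.
    """
    high = low = 0
    worst = 0
    for i in range(s, n):                  # invariant: i == high + low + s
        gap = rng.randrange(i + 1)
        if gap < high + 1:                 # lands above max S
            high += 1
        elif gap < high + 1 + low + 1:     # lands below min S
            low += 1
        elif high > low:                   # lands inside S: discard min S
            low += 1
        else:                              # lands inside S: discard max S
            high += 1
        worst = max(worst, abs(high - low))
    return worst


n, s = 10_000, 1_000
worst = simulate_drift(n, s, random.Random(7))
print(worst, "vs s - 1 =", s - 1, "and sqrt(n) =", math.isqrt(n))
```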

Other results on selection in random order streams
[Munro-Paterson] extend the analysis beyond $p = 1$ and show that $\Theta(n^{1/(2p)}\log n)$ memory suffices for $p$ passes (with high probability). Note that for adversarial streams one needs $\Theta(n^{1/p})$ memory.
[Guha-McGregor] show that $O(\log\log n)$ passes suffice for exact selection in random order streams.

Part I
Secretary Problem

Secretary Problem
A stream of numbers $x_1, x_2, \ldots, x_n$ (values/rankings of items or people).
Goal: select the largest number. This is easy if we can store the maximum number seen so far.
Online setting: when a number is seen we must irrevocably decide whether to select it, and we may select only one.
Extensively studied, with applications to auction design and other areas.
In the worst case no guarantee is possible. What about random arrival order?

Algorithm
Assume $n$ is known.
LearnAndPick(θ):
  Let y be the maximum number seen among the first θn numbers.
  Pick z, the first number larger than y in the remaining stream.
Question: Assume the numbers arrive in random order. What is a lower bound on the probability that the algorithm picks the largest element?
Observation: Let $a$ be the largest and $b$ the second largest element. The algorithm picks $a$ if $b$ is among the first $\theta n$ numbers and $a$ is in the remaining stream.
If $\theta = 1/2$, each of these events occurs with probability roughly 1/2, so both occur with probability roughly 1/4.
Optimal strategy: $\theta = 1/e$, and the probability of picking the largest number tends to $1/e$. This requires a more careful calculation.
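The lower bound and the $1/e$ claim can be checked numerically. In this sketch the function names are hypothetical, values are assumed distinct, and $\theta n$ is rounded down to $k = \lfloor \theta n \rfloor$. Summing over the position $j$ of the maximum gives $\Pr[\text{success}] = \frac{k}{n}\sum_{j=k+1}^{n}\frac{1}{j-1}$, which is roughly $\theta\ln(1/\theta)$ and is maximized near $\theta = 1/e$, where it is close to $1/e$.

```python
import math
import random


def learn_and_pick(stream, theta):
    """LearnAndPick(theta): observe the first floor(theta * n) numbers, then
    pick the first later number larger than all of them (None if none is)."""
    k = math.floor(theta * len(stream))
    y = max(stream[:k], default=-math.inf)
    for z in stream[k:]:
        if z > y:
            return z
    return None


def success_probability(n, k):
    """Exact Pr[pick the maximum] when the first k >= 1 numbers are only observed:
    the maximum is at position j > k w.p. 1/n, and it is picked iff the best of
    the first j - 1 numbers lies in the first k, which has probability k/(j - 1)."""
    return sum(k / (j - 1) for j in range(k + 1, n + 1)) / n


def estimate(n, theta, trials, rng):
    """Monte Carlo estimate over uniformly random arrival orders of 0..n-1."""
    items = list(range(n))
    wins = 0
    for _ in range(trials):
        rng.shuffle(items)
        wins += learn_and_pick(items, theta) == n - 1
    return wins / trials


n = 100
print(success_probability(n, n // 2))          # theta = 1/2: about 0.35, above 1/4
best_k = max(range(1, n), key=lambda k: success_probability(n, k))
print(best_k, success_probability(n, best_k))  # k near n/e, probability near 1/e
print(estimate(n, 0.5, 5000, random.Random(3)))
```

For $\theta = 1/2$ the exact value is noticeably above the $1/4$ lower bound, since the algorithm can also succeed when $b$ arrives late but after $a$.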
