Median in Random Order Streams, Lecture 17, March 26, 2019, Chandra



SLIDE 1

CS 498ABD: Algorithms for Big Data, Spring 2019

Median in Random Order Streams

Lecture 17

March 26, 2019

Chandra (UIUC) CS498ABD 1 Spring 2019 1 / 16

SLIDE 2

Quantiles and Selection

Input: stream of numbers $x_1, x_2, \ldots, x_n$ (or elements from a total order) and an integer $k$.

Selection: (approximate) rank-$k$ element in the input.

Quantile summary: a compact data structure that allows approximate selection queries.

SLIDE 3

Summary of previous lecture

Randomized: pick $\Theta(\frac{1}{\epsilon} \log(1/\delta))$ elements. With probability at least $1 - \delta$ the sample provides an $\epsilon$-approximate quantile summary.

Deterministic: $\epsilon$-approximate quantile summary using $O(\frac{1}{\epsilon} \log^2 n)$ elements, which can be improved to $O(\frac{1}{\epsilon} \log n)$ elements.

Exact selection: $O(n^{1/p} \log n)$ memory with $p$ passes. Median in 2 passes with $O(\sqrt{n} \log n)$ memory.

SLIDE 5

Random order streams

Question: Can we improve bounds/algorithms if we move beyond worst case?

Two models:

  • Elements $x_1, x_2, \ldots, x_n$ chosen iid from some probability distribution. For instance each $x_i \in [0, 1]$.
  • Elements $x_1, x_2, \ldots, x_n$ chosen adversarially, but the stream is a uniformly random permutation of the elements.
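The two input models can be made concrete with a small sketch. The function names `iid_stream` and `random_order_stream` are hypothetical, and the uniform distribution on [0, 1] is only one possible choice for the first model (the slide does not fix a distribution).

```python
import random


def iid_stream(n, rng):
    """Model 1: n values drawn independently from one distribution.

    Assumption: uniform on [0, 1); any other distribution would do.
    """
    return [rng.random() for _ in range(n)]


def random_order_stream(values, rng):
    """Model 2: the values are fixed (possibly by an adversary);
    only their arrival order is a uniformly random permutation."""
    stream = list(values)
    rng.shuffle(stream)
    return stream


if __name__ == "__main__":
    rng = random.Random(2019)  # fixed seed only for reproducibility
    print(iid_stream(5, rng))
    print(random_order_stream([10, 19, 1, 23, 15], rng))
```

In the second model the multiset of values is unchanged by the shuffle, which is why worst-case inputs can still be analysed probabilistically.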

SLIDE 6

Median in random order streams

[Munro-Paterson 1980]

Theorem

Median in $O(\sqrt{n} \log n)$ memory in one pass with high probability if the stream is in random order. More generally, in $p$ passes with memory $O(n^{1/(2p)} \log n)$.

SLIDE 7

Munro-Paterson algorithm

Given a space parameter $s$, the algorithm stores a set $S$ of $s$ elements that are consecutive in the sorted order of the elements seen so far in the stream.

Maintains counters $\ell$ and $h$:

  • $\ell$ is the number of elements seen so far that are less than $\min S$
  • $h$ is the number of elements seen so far that are more than $\max S$

Tries to keep $\ell$ and $h$ balanced.

SLIDE 8

Munro-Paterson algorithm

MP-Median(s):
    Store the first s elements of the stream in S
    ℓ = h = 0
    While (stream is not empty) do
        x is new element
        If (x > max S) then h = h + 1
        Else If (x < min S) then ℓ = ℓ + 1
        Else
            Insert x into S
            If h > ℓ discard min S from S and ℓ = ℓ + 1
            Else discard max S from S and h = h + 1
    endWhile
    If 1 ≤ n/2 − ℓ ≤ s then
        Output n/2 − ℓ ranked element from S
    Else output FAIL
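The pseudocode above can be sketched in Python roughly as follows. This is an illustrative version, not the lecture's own code: the names `mp_median`, `window`, `low` and `high` are hypothetical, FAIL is represented by `None`, $n$ is counted while streaming, and the median rank is taken to be $\lceil n/2 \rceil$ (which equals $n/2$ for even $n$). It assumes $s \ge 1$.

```python
from bisect import insort
from itertools import islice


def mp_median(stream, s):
    """One-pass MP-Median sketch; returns the median or None on FAIL."""
    it = iter(stream)
    window = sorted(islice(it, s))  # S: kept sorted, at most s elements
    n = len(window)
    low = high = 0  # the counters l and h from the slides
    for x in it:
        n += 1
        if x > window[-1]:
            high += 1
        elif x < window[0]:
            low += 1
        else:
            insort(window, x)  # S temporarily holds s + 1 elements
            if high > low:
                window.pop(0)  # discard min S (O(s) on a list; fine for a sketch)
                low += 1
            else:
                window.pop()  # discard max S
                high += 1
    rank = (n + 1) // 2 - low  # rank of the median inside S (1-based)
    if 1 <= rank <= len(window):
        return window[rank - 1]
    return None  # FAIL: the median fell outside the stored window


if __name__ == "__main__":
    print(mp_median([10, 19, 1, 23, 15, 11, 14, 16, 3, 7], 3))
    print(mp_median([1, 2, 3, 4, 5, 6, 7, 9, 10], 3))
```

On the stream 10, 19, 1, 23, 15, 11, 14, 16, 3, 7 with $s = 3$ the sketch returns 11, the 5th smallest of the 10 elements. On the sorted stream 1, 2, ..., 7, 9, 10 every later element lands above the window, so $h$ grows unchecked and the result is FAIL.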

SLIDE 9

Example

$\sigma = 1, 2, 3, 4, 5, 6, 7, 9, 10$ and $s = 3$.

$\sigma = 10, 19, 1, 23, 15, 11, 14, 16, 3, 7$ and $s = 3$.

SLIDE 10

Analysis

Theorem

If $s = \Omega(\sqrt{n} \log n)$ and the stream is in random order, then the algorithm outputs the median with high probability.

SLIDE 12

Recall: Random walk on the line

Start at origin 0. At each step move left one unit with probability 1/2 and move right one unit with probability 1/2. After $n$ steps how far from the origin?

At time $i$ let $X_i$ be $-1$ if the move is to the left and $1$ if the move is to the right. Let $Y_n$ be the position at time $n$:

$$Y_n = \sum_{i=1}^{n} X_i$$

$$E[Y_n] = 0 \quad \text{and} \quad \mathrm{Var}(Y_n) = \sum_{i=1}^{n} \mathrm{Var}(X_i) = n$$

By Chebyshev: $\Pr[|Y_n| \ge t\sqrt{n}] \le 1/t^2$

By Chernoff: $\Pr[|Y_n| \ge t\sqrt{n}] \le 2\exp(-t^2/2)$.
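To get a feel for how loose the two bounds are, the tail probability can be computed exactly for moderate $n$, since $Y_n = 2k - n$ when exactly $k$ of the $n$ steps go right. The sketch below does this with binomial coefficients; the function names are hypothetical and the choice $n = 400$ is arbitrary.

```python
from math import comb, exp, sqrt


def walk_tail_probability(n, t):
    """Exact Pr[|Y_n| >= t * sqrt(n)] for the +1/-1 random walk."""
    threshold = t * sqrt(n)
    # k = number of right moves, so the final position is 2k - n.
    favourable = sum(
        comb(n, k) for k in range(n + 1) if abs(2 * k - n) >= threshold
    )
    return favourable / 2 ** n


def chebyshev_bound(t):
    """Chebyshev tail bound 1/t^2, capped at 1."""
    return min(1.0, 1.0 / (t * t))


def chernoff_bound(t):
    """Chernoff tail bound 2 exp(-t^2 / 2), capped at 1."""
    return min(1.0, 2.0 * exp(-t * t / 2.0))


if __name__ == "__main__":
    n = 400
    for t in (1, 2, 3, 4):
        print(t, walk_tail_probability(n, t), chebyshev_bound(t), chernoff_bound(t))
```

For small $t$ Chebyshev is the tighter of the two, while the Chernoff bound decays exponentially and wins once $t$ is moderately large.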

SLIDE 15

Analysis

Let $H_i$ and $L_i$ be random variables for the values of $h$ and $\ell$ after seeing $i$ items in the random stream.

Let $D_i = H_i - L_i$.

Observation: the algorithm fails only if $|D_n| \ge s - 1$.

Will instead analyse the probability that $|D_i| \ge s - 1$ at any $i$.

SLIDE 16

Analysis

Lemma

Suppose $D_i = H_i - L_i \ge 0$ and $D_i < s - 1$. Then $\Pr[D_{i+1} = D_i + 1] = H_i/(H_i + s + L_i) \le 1/2$.

Lemma

Suppose $D_i = H_i - L_i < 0$ and $|D_i| < s - 1$. Then $\Pr[D_{i+1} = D_i - 1] = L_i/(H_i + s + L_i) \le 1/2$.

Thus, the process behaves better than a random walk on the line (the formal proof is technical) and with high probability $|D_i| \le c\sqrt{n} \log n$ for all $i$. Thus if $s > c\sqrt{n} \log n$ then the algorithm succeeds with high probability.
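The claim that $|D_i|$ stays small on random order streams can be checked empirically. The sketch below runs the same bookkeeping as MP-Median and records $\max_i |D_i|$. The function name `max_imbalance` is hypothetical, it assumes $s \ge 1$, and the demo takes $c = 1$ purely for illustration; the analysis does not pin down the constant.

```python
import math
import random
from bisect import insort
from itertools import islice


def max_imbalance(stream, s):
    """Return (max_i |D_i|, final l, final h) for the MP-Median counters."""
    it = iter(stream)
    window = sorted(islice(it, s))
    low = high = 0
    worst = 0
    for x in it:
        if x > window[-1]:
            high += 1
        elif x < window[0]:
            low += 1
        else:
            insort(window, x)
            if high > low:
                window.pop(0)  # discard min S
                low += 1
            else:
                window.pop()  # discard max S
                high += 1
        worst = max(worst, abs(high - low))
    return worst, low, high


if __name__ == "__main__":
    n = 10_000
    s = int(math.sqrt(n) * math.log(n))  # c = 1, an illustrative assumption
    rng = random.Random(17)  # fixed seed only for reproducibility
    stream = list(range(n))
    rng.shuffle(stream)
    worst, low, high = max_imbalance(stream, s)
    print("max |D_i| =", worst, " sqrt(n) =", math.sqrt(n), " s =", s)
```

A sorted stream is the opposite extreme: every element after the first $s$ lands above the window, so the imbalance grows to $n - s$.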

SLIDE 17

Other results on selection in random order streams

[Munro-Paterson] extend the analysis for $p = 1$ and show that $\Theta(n^{1/(2p)} \log n)$ memory is sufficient for $p$ passes (with high probability). Note that for an adversarial stream one needs $\Theta(n^{1/p})$ memory.

[Guha-McGregor] show that $O(\log \log n)$ passes are sufficient for exact selection in random order streams using polylogarithmic memory.

SLIDE 18

Part I Secretary Problem

SLIDE 21

Secretary Problem

Stream of numbers $x_1, x_2, \ldots, x_n$ (value/ranking of items/people).

  • Want to select the largest number.
  • Easy if we can store the maximum number.
  • Online setting: have to make a single irrevocable decision when each number is seen.
  • Extensively studied, with applications to auction design etc.
  • In the worst case no guarantees are possible. What about random arrival order?

SLIDE 26

Algorithm

Assume n is known.

LearnAndPick(θ):
    Let y be the max number seen in the first θn numbers
    Pick z, the first number larger than y in the remaining stream

Question: Assume numbers are in random order. What is a lower bound on the probability that the algorithm will pick the largest element?

Observation: Let $a$ be the largest and $b$ the second largest. The algorithm will pick $a$ if $b$ is in the first $\theta n$ numbers and $a$ is in the residual stream. If $\theta = 1/2$ then each event occurs with probability roughly 1/2, and hence the algorithm picks $a$ with probability at least roughly 1/4.

Optimal strategy: $\theta = 1/e$, and the probability of picking the largest number is $1/e$. This requires a more careful calculation.
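The more careful calculation can be sketched as follows. If the largest element sits at position $i > \theta n$ (probability $1/n$), it is picked exactly when the largest of the first $i - 1$ elements lies among the first $\theta n$, which happens with probability $\theta n/(i-1)$. Summing over $i$ gives $\frac{\theta n}{n} \sum_{i > \theta n} \frac{1}{i-1} \approx \theta \ln(1/\theta)$, which is maximised at $\theta = 1/e$. The Python below evaluates the exact sum; the function names are hypothetical and `cutoff` stands for $\lfloor \theta n \rfloor$.

```python
import math


def learn_and_pick(values, cutoff):
    """LearnAndPick with the learning phase given as a count (cutoff = theta * n).

    Returns the picked value, or None if nothing beats the learning-phase max.
    """
    y = max(values[:cutoff], default=float("-inf"))
    for z in values[cutoff:]:
        if z > y:
            return z
    return None


def exact_success_probability(n, cutoff):
    """Probability that LearnAndPick picks the largest of n distinct values
    arriving in uniformly random order."""
    if cutoff == 0:
        return 1.0 / n  # with no learning phase the first element is picked
    # Max at position i (1-based) wins iff the best of the first i - 1
    # elements was already seen during the learning phase.
    return (cutoff / n) * sum(1.0 / (i - 1) for i in range(cutoff + 1, n + 1))


if __name__ == "__main__":
    n = 1000
    for theta in (0.25, 0.5, 1 / math.e, 0.75):
        cutoff = int(theta * n)
        print(round(theta, 3), exact_success_probability(n, cutoff))
```

For $\theta = 1/2$ the exact value is close to $\frac{1}{2} \ln 2 \approx 0.35$, noticeably better than the 1/4 lower bound from the observation above.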
