Randomized Sampling - Anil Maheshwari - PowerPoint PPT Presentation



SLIDE 1

Randomized Sampling Anil Maheshwari Input Sampling algorithm Problems Sorting in Parallel Selection

Randomized Sampling

Anil Maheshwari anil@scs.carleton.ca

School of Computer Science Carleton University

SLIDE 2

Outline

1. Input
2. Sampling algorithm
3. Problems
4. Sorting in Parallel
5. Selection

SLIDE 3

Problem

Input: A set $S$ of $n$ distinct numbers. Let the elements of $S$ be $x_1 < x_2 < \cdots < x_n$.

Let $R = \{y_1 < y_2 < \cdots < y_{|R|}\} \subseteq S$.

The R-partition of $S \setminus R$ into $|R| + 1$ (open) subsets is:
$S_0 = \{x \in S : x < y_1\}$,
$S_i = \{x \in S : y_i < x < y_{i+1}\}$ for $i = 1, 2, \ldots, |R| - 1$, and
$S_{|R|} = \{x \in S : x > y_{|R|}\}$.

Problem: Find a good partition.
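As a concrete illustration of the R-partition defined above, here is a minimal Python sketch. The function name `r_partition` and the use of `bisect` are choices made for this example and do not come from the slides.

```python
from bisect import bisect_left


def r_partition(S, R):
    """Split S minus R into the |R| + 1 open intervals S_0, ..., S_|R|.

    S and R hold distinct numbers, and R is assumed to be a subset of S.
    """
    ys = sorted(R)
    sample = set(ys)
    intervals = [[] for _ in range(len(ys) + 1)]
    for x in sorted(S):
        if x in sample:
            continue  # elements of R belong to no open interval
        # bisect_left counts the sample elements smaller than x,
        # which is the index i of the interval S_i that contains x.
        intervals[bisect_left(ys, x)].append(x)
    return intervals


print(r_partition(range(1, 11), [3, 7]))
# [[1, 2], [4, 5, 6], [8, 9, 10]]
```

With S = {1, ..., 10} and R = {3, 7}, the three open intervals are the elements below 3, strictly between 3 and 7, and above 7.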

SLIDE 4

r-Sampling Algorithm

Fix an integer $r$ with $1 < r < n$.

Algorithm RANDOMSAMPLE(S, r)
  p = r/n; R = ∅;
  for each x ∈ S do
    with probability p, add x to R
  endfor;
  Sort the elements of R;
  Compute the open intervals S_0, S_1, ..., S_|R|;
  Return(R, S_0, S_1, ..., S_|R|)

r-sampling is good if
1. $1 \le |R| \le 2r$, and
2. for each $i$ with $0 \le i \le |R|$, the open interval $S_i$ contains at most $\frac{2n \ln r}{r}$ elements of $S$.
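The algorithm and the two conditions for a good sample can be sketched in Python as follows. This is an illustrative sketch, not the author's implementation: the names `random_sample` and `is_good`, the `rng` parameter, and the fixed seed are assumptions introduced here.

```python
import math
import random
from bisect import bisect_left


def random_sample(S, r, rng=random):
    """RANDOMSAMPLE(S, r): keep each element independently with probability r/n."""
    S = list(S)
    n = len(S)
    p = r / n
    R = sorted(x for x in S if rng.random() < p)
    chosen = set(R)
    intervals = [[] for _ in range(len(R) + 1)]
    for x in S:
        if x not in chosen:
            # Index of the open interval = number of sample elements below x.
            intervals[bisect_left(R, x)].append(x)
    return R, intervals


def is_good(R, intervals, n, r):
    """Check both conditions: 1 <= |R| <= 2r and every |S_i| <= 2n ln(r) / r."""
    limit = 2 * n * math.log(r) / r
    return 1 <= len(R) <= 2 * r and all(len(Si) <= limit for Si in intervals)


rng = random.Random(1)  # fixed seed (an assumption) so the run is repeatable
R, intervals = random_sample(range(1000), 50, rng)
print(len(R), max(len(Si) for Si in intervals), is_good(R, intervals, 1000, 50))
```

The printed triple shows the sample size, the largest open interval, and whether this particular sample is good for n = 1000 and r = 50.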

SLIDE 5

Good Sample

A good sample R is

1. non-empty,
2. at most twice as large as the sample size we are aiming for, and
3. such that the elements of S \ R are approximately evenly distributed over the open intervals (except for the ln r factor).

SLIDE 6

Problem 1

Compute the expected size E(|R|) of the set R.

Algorithm RANDOMSAMPLE(S, r)
  p = r/n; R = ∅;
  for each x ∈ S do
    with probability p, add x to R
  endfor;
  Sort the elements of R;
  Compute the open intervals S_0, S_1, ..., S_|R|;
  Return(R, S_0, S_1, ..., S_|R|)
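One possible route, sketched here rather than taken from the slides, uses linearity of expectation. The indicator $X_x$ (a name introduced for this sketch) is 1 when element $x$ is added to $R$ and 0 otherwise.

```latex
\[
E(|R|) = \sum_{x \in S} E(X_x) = \sum_{x \in S} \Pr(x \in R) = n \cdot p = n \cdot \frac{r}{n} = r .
\]
```

So the parameter $r$ is the sample size the algorithm aims for on average.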

SLIDE 7

Problem 2

Prove that $\Pr(R = \emptyset) \le e^{-r}$. (Hint: Recall that $1 - z \le e^{-z}$ for all real numbers $z$.)

Algorithm RANDOMSAMPLE(S, r)
  p = r/n; R = ∅;
  for each x ∈ S do
    with probability p, add x to R
  endfor;
  Sort the elements of R;
  Compute the open intervals S_0, S_1, ..., S_|R|;
  Return(R, S_0, S_1, ..., S_|R|)
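A sketch of one argument that follows the hint: $R$ is empty exactly when none of the $n$ independent choices adds its element, and the hint is applied with $z = r/n$.

```latex
\[
\Pr(R = \emptyset) = (1 - p)^n = \left(1 - \frac{r}{n}\right)^{n}
  \le \left(e^{-r/n}\right)^{n} = e^{-r} .
\]
```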

SLIDE 8

Problem 3

Use the Chernoff bound to show that $\Pr(|R| > 2r) \le e^{-r/3}$.

Let $X_1, \ldots, X_n$ be i.i.d. 0-1 random variables, and let $X = \sum_{i=1}^{n} X_i$. Chernoff bounds estimate the probability that $X$ deviates from $E[X]$ by a factor of $(1 \pm \epsilon)$, for $0 \le \epsilon \le 1$:

$\Pr(X \ge (1 + \epsilon)E[X]) \le \exp(-\epsilon^2 E[X]/3)$
$\Pr(X \le (1 - \epsilon)E[X]) \le \exp(-\epsilon^2 E[X]/2)$
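One way to instantiate the upper-tail bound, given as a sketch: let $X_i$ indicate that $x_i$ is added to $R$, so that $X = |R|$ and $E[X] = np = r$, and take $\epsilon = 1$.

```latex
\[
\Pr(|R| > 2r) \le \Pr\bigl(X \ge (1 + 1)\,E[X]\bigr)
  \le \exp\!\left(-\frac{1^2 \cdot r}{3}\right) = e^{-r/3} .
\]
```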

SLIDE 9

Problem 4

Consider the sorted sequence $x_1 < x_2 < \cdots < x_n$ of the elements of $S$. Let $k$ be an integer that divides $n$. Partition $S$ into $n/k$ subsets $B_1, B_2, \ldots, B_{n/k}$, each containing $k$ elements: $B_1$ contains $x_1, \ldots, x_k$; $B_2$ contains $x_{k+1}, \ldots, x_{2k}$; etc.

Think of the $B_i$'s as buckets. Bucket $B_i$ is empty if $B_i \cap R = \emptyset$.

Argue that the following is true: If each bucket is non-empty, then each open interval contains at most $2k$ elements of $S$.
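Before proving the claim, it can be checked exhaustively on a small instance. The sketch below assumes S = {1, ..., n}, which loses no generality because only the ranks of the elements matter. The function names are introduced for this example.

```python
from itertools import combinations


def max_interval_size(n, R):
    """Size of the largest open interval of {1..n} minus R (R sorted)."""
    bounds = [0] + list(R) + [n + 1]  # sentinels just outside 1..n
    return max(b - a - 1 for a, b in zip(bounds, bounds[1:]))


def every_bucket_hit(n, k, R):
    """True if each bucket {jk+1, ..., (j+1)k} contains an element of R."""
    hit = {(y - 1) // k for y in R}
    return len(hit) == n // k


def claim_holds(n, k):
    """Try every subset R of {1..n}; look for a counterexample to the claim."""
    for size in range(n + 1):
        for R in combinations(range(1, n + 1), size):
            if every_bucket_hit(n, k, R) and max_interval_size(n, R) > 2 * k:
                return False
    return True


print(claim_holds(12, 3))  # True
```

The check suggests why the claim holds: when every bucket contains a sample element, two consecutive sample elements lie in the same bucket or in adjacent buckets, so the open interval between them spans at most two buckets.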

SLIDE 10

Problem 5

Prove the following: $\Pr(\text{each bucket is non-empty}) \ge 1 - \frac{n}{k}(1 - p)^k$
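A sketch of one possible proof: a fixed bucket is empty only if all $k$ of its elements are left out of $R$, and a union bound over the $n/k$ buckets finishes the argument.

```latex
\begin{align*}
\Pr(B_i \cap R = \emptyset) &= (1 - p)^k, \\
\Pr(\text{some bucket is empty}) &\le \sum_{i=1}^{n/k} \Pr(B_i \cap R = \emptyset)
  = \frac{n}{k}(1 - p)^k, \\
\Pr(\text{each bucket is non-empty}) &\ge 1 - \frac{n}{k}(1 - p)^k .
\end{align*}
```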

SLIDE 11

Problem 6

Show that $\Pr(\text{each open interval contains at most } 2k \text{ elements of } S) \ge 1 - \frac{n}{k}(1 - p)^k$

SLIDE 12

Problem 7

Recall that $p = r/n$. Let $k = \frac{n \ln r}{r}$. Prove that $\Pr(\text{at least one open interval contains more than } \frac{2n \ln r}{r} \text{ elements of } S) \le \frac{1}{\ln r}$
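A sketch of the calculation, ignoring any rounding needed to make $k$ an integer that divides $n$: by Problem 6 the probability in question is at most $\frac{n}{k}(1 - p)^k$. With this choice of $k$ we have $pk = \ln r$ and $n/k = r/\ln r$, and the inequality $1 - z \le e^{-z}$ gives:

```latex
\[
\frac{n}{k}(1 - p)^k \le \frac{n}{k}\, e^{-pk}
  = \frac{r}{\ln r}\, e^{-\ln r}
  = \frac{r}{\ln r} \cdot \frac{1}{r}
  = \frac{1}{\ln r} .
\]
```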

SLIDE 13

Problem 8

Show that $\Pr(\text{the sample } R \text{ is bad}) \le e^{-r} + e^{-r/3} + \frac{1}{\ln r}$

r-sampling is good if
1. $1 \le |R| \le 2r$, and
2. for each $i$ with $0 \le i \le |R|$, the open interval $S_i$ contains at most $\frac{2n \ln r}{r}$ elements of $S$.

SLIDE 14

Problem 9

Show that if $r$ is chosen sufficiently large, then $\Pr(\text{the sample } R \text{ is good}) \ge \frac{1}{2}$

r-sampling is good if
1. $1 \le |R| \le 2r$, and
2. for each $i$ with $0 \le i \le |R|$, the open interval $S_i$ contains at most $\frac{2n \ln r}{r}$ elements of $S$.
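To get a feel for "sufficiently large", the bound from Problem 8 can be evaluated numerically. The sketch below searches for the smallest integer r at which the bound on the failure probability drops to 1/2 or below. The function name `bad_sample_bound` is introduced here for illustration.

```python
import math


def bad_sample_bound(r):
    """Upper bound on Pr(the sample is bad) from Problem 8."""
    return math.exp(-r) + math.exp(-r / 3) + 1 / math.log(r)


# Each term decreases for r > 1, so the first r that works keeps working.
r = 2
while bad_sample_bound(r) > 0.5:
    r += 1
print(r, round(bad_sample_bound(r), 3))  # 10 0.47
```

Under this bound, any integer r of at least 10 gives a good sample with probability at least 1/2. The term 1/ln r dominates, so the bound improves only slowly as r grows.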

SLIDE 15

Application I (Optional)

Leslie G. Valiant, A Bridging Model for Parallel Computation, Communications of the ACM 33(8): 103-111 (1990).

3-attribute Bulk-Synchronous Parallel Computer:

1. Components: Processor/Memory
2. Router: Point-to-point messages between components
3. Synchronization mechanism for all components after every "L" units of time

SLIDE 16

BSP Computation

Computation proceeds in supersteps. In a superstep, each component

  • first receives messages,
  • performs local computation, and
  • prepares messages for transmission for the next superstep.

Routers realize h-relations: each component sends and receives at most h messages.

SLIDE 17

BSP Sort (Sketch)

Assume $\rho$ processors. Processor $P_i$ holds $n/\rho$ data items, denoted by $S_i$, at the start.

1. Each processor selects each of its elements independently with probability $r/n$. (Let the selected elements at processor $i$ be $R_i$.)
2. Route all selected elements to all the processors.
3. Let $R = R_1 \cup R_2 \cup \cdots \cup R_\rho$. Each processor sorts the items in $R$ and partitions its set $S_i$.
4. ...
5. Each processor $P_i$ routes the partitions appropriately.
6. Each processor sorts its items.
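The data flow of this sketch can be simulated sequentially in Python. The code below is only an illustration under stated assumptions: it does not reconstruct the elided step 4, it models routing by appending to shared buckets, and it leaves open how the $|R| + 1$ buckets are assigned to the $\rho$ processors. The name `bsp_sort_sketch` is hypothetical.

```python
import random
from bisect import bisect_left


def bsp_sort_sketch(local_sets, r, rng=random):
    """Sequentially simulate the sampling-based sort on distinct items."""
    n = sum(len(Si) for Si in local_sets)
    p = r / n
    # Step 1: every processor samples each of its items with probability r/n.
    samples = [[x for x in Si if rng.random() < p] for Si in local_sets]
    # Steps 2-3: after all-to-all routing, every processor knows the sorted R.
    R = sorted(x for Ri in samples for x in Ri)
    chosen = set(R)
    # Step 3 (cont.): each processor splits its non-sample items by R.
    buckets = [[] for _ in range(len(R) + 1)]
    for Si in local_sets:
        for x in Si:
            if x not in chosen:
                # Step 5: routing is modelled by appending to a shared bucket.
                buckets[bisect_left(R, x)].append(x)
    # Step 6: each bucket is sorted locally; sample elements sit in between.
    out = []
    for j, bucket in enumerate(buckets):
        out.extend(sorted(bucket))
        if j < len(R):
            out.append(R[j])
    return out


data = random.Random(3).sample(range(10000), 400)
local_sets = [data[i::8] for i in range(8)]  # 8 simulated processors
print(bsp_sort_sketch(local_sets, 16, random.Random(4)) == sorted(data))
```

Because bucket j holds exactly the items of the open interval $S_j$, concatenating the sorted buckets with the sample elements in between yields the sorted order. A good sample keeps every bucket small, which is what bounds the local work and the message volume.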

SLIDE 18

BSP Sort Analysis

Think of how random sampling is used. What is $r$ in terms of $\rho$ and $n$? How are the quantities $\rho$, $k$, and $r$ connected? Left as an exercise (needs some BSP background).

Correctness: Why does it sort?
Number of supersteps: $O(1)$.
Work done in each superstep: if $n \gg \rho$, $O\left(\frac{n}{\rho} \log \rho\right)$.
h-relation: if $n \gg \rho$, $h = O\left(\frac{n}{\rho} \log \rho\right)$ suffices.

SLIDE 19

Application II (Optional)

Robert Floyd and Ronald Rivest, Expected Time Bounds for Selection, Communications of the ACM 18(3): 165–172 (1975).

Rajeev Raman, Random Sampling Techniques in Parallel Computation, IPPS/SPDP'98 Workshop, Lecture Notes in Computer Science 1388: 351–360 (1998).

SLIDE 20

General Framework

Two main paradigms:

Partitioning: Partition into independent subproblems that can be solved in parallel (e.g., BSP Sort).

Pruning: Use an inefficient algorithm on a small random sample of the input, and use that solution to reduce the size of the problem. How?

1. Obtain a random sample R of instance I of the problem.
2. Use an inefficient algorithm to solve R.
3. Use the solution of R to discard irrelevant parts of I, and obtain the reduced problem I′.
4. Use an inefficient algorithm to solve I′.

SLIDE 21

Selection Problem

Input: A set I of $n$ distinct values and an integer $k$, $1 \le k \le n$.
Output: The median element of I.
Standard methods: Sort and report the $n/2$-th element, $O(n \log n)$, or use the selection algorithm, $O(n)$.

Algorithm SELECTION-BY-PRUNING(I)

  1. Choose $n^{3/4}$ elements of I uniformly at random to form the random sample R
  2. Sort R
  3. Let l (resp. r) be the element of rank $|R|/2 - \sqrt{n}$ (resp. $|R|/2 + \sqrt{n}$) in R
  4. Let I′ ⊆ I be all elements of I between l and r
  5. Let $n_l$ be the number of elements of I that are < l
  6. Sort I′
  7. Return the element of rank $n/2 - n_l$ of I′ as the median element
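The seven steps can be sketched in Python as below. Several details are assumptions made for this example and are not in the slides: the median is taken to be the element of rank ceil(n/2), ranks are rounded and clamped for small n, "between l and r" is read as inclusive, and a full sort is used as a fallback in the unlikely case that the median was pruned away.

```python
import math
import random


def median_by_pruning(items, rng=random):
    """Return the element of rank ceil(n/2) among n >= 1 distinct items."""
    items = list(items)
    n = len(items)
    target = (n + 1) // 2  # 1-indexed rank treated as "the median" here
    # Steps 1-2: sample about n^(3/4) elements and sort the sample.
    m = max(1, min(n, round(n ** 0.75)))
    R = sorted(rng.sample(items, m))
    # Step 3: take l and r about sqrt(n) ranks either side of the sample median.
    width = math.isqrt(n)
    lo_idx, hi_idx = m // 2 - width, m // 2 + width
    l = R[lo_idx] if lo_idx >= 0 else -math.inf
    r = R[hi_idx] if hi_idx < m else math.inf
    # Steps 4-6: count the elements below l, keep those in [l, r], and sort them.
    n_l = sum(1 for x in items if x < l)
    kept = sorted(x for x in items if l <= x <= r)
    # Step 7: the median has rank target - n_l among the kept elements.
    rank = target - n_l
    if 1 <= rank <= len(kept):
        return kept[rank - 1]
    return sorted(items)[target - 1]  # rare fallback: median was pruned away


rng = random.Random(7)
data = rng.sample(range(100000), 1001)
print(median_by_pruning(data, rng) == sorted(data)[500])
```

The fallback makes the sketch always correct; the random sample only affects how often the cheap path is taken and how large the kept set is.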

SLIDE 22

Remarks

1. Sorting R takes time $O(n^{3/4} \log n) = O(n)$, if $n^{1/4} \ge \log n$.
2. Exercise: Between two consecutive elements of R, on average there are approximately $n^{1/4}$ elements of I.
3. We have $2\sqrt{n}$ samples between l and r.
4. The expected number of elements of I between l and r is $2 n^{1/2} \cdot n^{1/4} = 2n^{3/4}$.
5. Thus, $E[|I'|] = O(n^{3/4})$.
6. This ensures a run-time of $O(n)$.
7. For correctness: estimate the probability that the median element is not between l and r. Using Chernoff bounds, it can be shown to be exponentially small.
