Randomized Sampling Anil Maheshwari Input Sampling algorithm Problems Sorting in Parallel Selection
Randomized Sampling
Anil Maheshwari
anil@scs.carleton.ca
School of Computer Science
Carleton University
Outline
1. Input
2. Sampling algorithm
3. Problems
4. Sorting in Parallel
5. Selection
Problem
Input: A set S of n distinct numbers. Let the elements of S be $x_1 < x_2 < \cdots < x_n$.
Let $R = \{y_1 < y_2 < \cdots < y_{|R|}\} \subseteq S$.
The R-partition of $S \setminus R$ into $|R| + 1$ (open) subsets is:
  $S_0 = \{x \in S : x < y_1\}$,
  $S_i = \{x \in S : y_i < x < y_{i+1}\}$ for $i = 1, 2, \ldots, |R| - 1$, and
  $S_{|R|} = \{x \in S : x > y_{|R|}\}$.
Problem: Find a good partition.
r-Sampling Algorithm
Fix an integer r with 1 < r < n
Algorithm RANDOMSAMPLE(S, r)
  p = r/n; R = ∅;
  for each x ∈ S do
    with probability p, add x to R
  endfor;
  Sort the elements of R;
  Compute the open intervals $S_0, S_1, \ldots, S_{|R|}$;
  Return(R, $S_0, S_1, \ldots, S_{|R|}$)
An r-sampling is good if
1. $1 \le |R| \le 2r$, and
2. for each i with $0 \le i \le |R|$, the open interval $S_i$ contains at most $\frac{2n \ln r}{r}$ elements of S.
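The sampling step and the two conditions on |R| and on the interval sizes can be sketched in Python as follows. This is a minimal sketch, not the author's implementation: the function names `random_sample` and `is_good`, the `rng` parameter, and the set lookup used to build the intervals are assumptions made for illustration.

```python
import math
import random


def random_sample(S, r, rng=random):
    """RANDOMSAMPLE(S, r): keep each element independently with probability r/n."""
    n = len(S)
    p = r / n
    R = sorted(x for x in S if rng.random() < p)
    # Open intervals S_0, ..., S_|R|: the elements of S \ R, grouped by the
    # pair of consecutive sample elements they fall between.
    intervals = [[] for _ in range(len(R) + 1)]
    chosen = set(R)
    i = 0
    for x in sorted(S):
        if x in chosen:
            i += 1  # x is the i-th sample element; later elements go to S_i
        else:
            intervals[i].append(x)
    return R, intervals


def is_good(S, r, R, intervals):
    """Check both conditions: 1 <= |R| <= 2r and every interval <= 2n ln(r)/r."""
    n = len(S)
    limit = 2 * n * math.log(r) / r
    return 1 <= len(R) <= 2 * r and all(len(Si) <= limit for Si in intervals)
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which is convenient when estimating how often a sample turns out good.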
Good Sample
A good sample R is
1. Non-empty,
2. At most twice as large as the sample size we are aiming for, and
3. Elements of S \ R are approximately evenly distributed over the open intervals (except for the ln r factor).
Problem 1
Compute the expected size E(|R|) of the set R.
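One possible route, sketched here rather than taken from the slides, is to write |R| as a sum of indicator variables and use linearity of expectation. The symbol $X_x$ (equal to 1 if x is added to R and 0 otherwise) is introduced only for this sketch.

```latex
|R| = \sum_{x \in S} X_x, \qquad
E(|R|) = \sum_{x \in S} \Pr(x \in R) = n \cdot p = n \cdot \frac{r}{n} = r .
```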
Problem 2
Prove that $\Pr(R = \emptyset) \le e^{-r}$. (Hint: Recall that $1 - z \le e^{-z}$ for all real numbers z.)
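A sketch of how the hint might be used, assuming the n coin flips in RANDOMSAMPLE are independent: R is empty exactly when every element is rejected, and the hint is applied with z = p = r/n.

```latex
\Pr(R = \emptyset) = (1 - p)^n \le \left(e^{-p}\right)^n = e^{-pn} = e^{-r}.
```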
Problem 3
Use the Chernoff bound to show that $\Pr(|R| > 2r) \le e^{-r/3}$.
Let $X_1, \ldots, X_n$ be i.i.d. 0-1 random variables, and let $X = \sum_{i=1}^{n} X_i$. Chernoff bounds estimate the probability that X deviates from $E[X]$ by a factor of $(1 \pm \epsilon)$, for $0 \le \epsilon \le 1$:
$P(X \ge (1 + \epsilon) E[X]) \le \exp(-\epsilon^2 E[X] / 3)$
$P(X \le (1 - \epsilon) E[X]) \le \exp(-\epsilon^2 E[X] / 2)$
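As a hedged sketch of how the upper-tail bound applies to the sample: take X = |R|, with $X_i$ indicating whether $x_i$ was added to R, so that E[X] = np = r, and choose $\epsilon = 1$. These choices are one natural instantiation, not something stated on the slide.

```latex
\Pr(|R| > 2r) \le \Pr\bigl(X \ge (1 + 1)\,E[X]\bigr)
\le \exp\!\left(-\frac{1^2 \cdot r}{3}\right) = e^{-r/3}.
```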
Problem 4
Consider the sorted sequence $x_1 < x_2 < \cdots < x_n$ of the elements of S. Let k be an integer that divides n. Partition S into n/k subsets $B_1, B_2, \ldots, B_{n/k}$, each containing k elements: $B_1$ contains $x_1, \ldots, x_k$; $B_2$ contains $x_{k+1}, \ldots, x_{2k}$; etc. Think of the $B_i$'s as buckets. Bucket $B_i$ is empty if $B_i \cap R = \emptyset$.
Argue that the following is true: If each bucket is non-empty, then each open interval contains at most 2k elements of S.
Problem 5
Prove the following: $\Pr(\text{each bucket is non-empty}) \ge 1 - \frac{n}{k}(1 - p)^k$
Problem 6
Show that $\Pr(\text{each open interval contains at most } 2k \text{ elements of } S) \ge 1 - \frac{n}{k}(1 - p)^k$
Problem 7
Recall that p = r/n. Let $k = \frac{n \ln r}{r}$. Prove that $\Pr(\text{at least one open interval contains more than } \frac{2n \ln r}{r} \text{ elements of } S) \le \frac{1}{\ln r}$
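The key calculation can be sketched as follows, starting from the failure probability $\frac{n}{k}(1-p)^k$ of Problem 6 and substituting p = r/n and $k = \frac{n \ln r}{r}$. The sketch assumes k is an integer dividing n (rounding is ignored) and uses $1 - z \le e^{-z}$.

```latex
\frac{n}{k}(1-p)^k \le \frac{n}{k}\, e^{-pk}
= \frac{r}{\ln r}\, e^{-\frac{r}{n} \cdot \frac{n \ln r}{r}}
= \frac{r}{\ln r}\, e^{-\ln r}
= \frac{r}{\ln r} \cdot \frac{1}{r}
= \frac{1}{\ln r}.
```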
Problem 8
Show that $\Pr(\text{the sample } R \text{ is bad}) \le e^{-r} + e^{-r/3} + \frac{1}{\ln r}$
Problem 9
Show that if r is chosen sufficiently large, then $\Pr(\text{the sample } R \text{ is good}) \ge \frac{1}{2}$
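To get a feel for what "sufficiently large" means, the bound of Problem 8 can be evaluated numerically. The sketch below is only an illustration: the function names are hypothetical, and the bound is merely sufficient, so a smaller r may already give a good sample with probability at least 1/2.

```python
import math


def bad_sample_bound(r):
    """Upper bound on Pr(sample is bad) from Problem 8: e^-r + e^(-r/3) + 1/ln r."""
    return math.exp(-r) + math.exp(-r / 3) + 1 / math.log(r)


def smallest_r_with_bound_at_most(target=0.5, start=2):
    """Smallest integer r >= start whose bound is <= target (linear scan)."""
    r = start
    while bad_sample_bound(r) > target:
        r += 1
    return r


print(smallest_r_with_bound_at_most())  # prints 10
```

Under this bound, r = 9 still gives a value slightly above 1/2, while r = 10 gives roughly 0.47.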
Application I (Optional)
Leslie G. Valiant, A Bridging Model for Parallel Computation, Communications of the ACM 33(8): 103-111 (1990).
3-attribute Bulk-Synchronous Parallel Computer:
1. Components: Processor/Memory
2. Router: Point-to-Point messages between components
3. Synchronization mechanism for all components after every "L" units of time
BSP Computation
Computation is expressed in terms of supersteps. In a superstep, each component
- first receives messages,
- performs local computation, and
- prepares messages for transmission in the next superstep.
Routers realize h-relations: each component sends and receives at most h messages.
BSP Sort (Sketch)
Assume ρ processors. At the start, processor $P_i$ holds n/ρ data items, denoted by $S_i$.
1. Each processor selects each of its elements independently with probability r/n. (Let $R_i$ denote the elements selected at processor i.)
2. Route all selected elements to all the processors.
3. Let $R = R_1 \cup R_2 \cup \cdots \cup R_\rho$. Each processor sorts the items in R and partitions its set $S_i$.
4. . . .
5. Each processor $P_i$ routes the partitions appropriately.
6. Each processor sorts its items.
BSP Sort Analysis
Think of how random sampling is used. What is r in terms of ρ and n? How are the quantities ρ, k, and r connected? Left as an exercise (needs some BSP background).
Correctness: Why does it sort?
Number of supersteps: O(1)
Work done in each superstep: if $n \gg \rho$, $O(\frac{n}{\rho} \log \rho)$.
h-relation: if $n \gg \rho$, $h = O(\frac{n}{\rho} \log \rho)$ suffices.
Application II (Optional)
Robert Floyd and Ronald Rivest, Expected Time Bounds for Selection, Communications of the ACM 18(3):165–172 (1975).
Rajeev Raman, Random Sampling Techniques in Parallel Computation, IPPS/SPDP'98 Workshop, Lecture Notes in Computer Science 1388:351–360 (1998).
General Framework
Two main paradigms:
Partitioning: Partition into independent subproblems that can be solved in parallel (e.g. BSP Sort).
Pruning: Use an inefficient algorithm on a small random sample of the input, and use that solution to reduce the size of the problem.
How?
1. Obtain a random sample R of instance I of the problem.
2. Use an inefficient algorithm to solve R.
3. Use the solution of R to discard irrelevant parts of I, and obtain the reduced problem I′.
4. Use the inefficient algorithm to solve I′.
Selection Problem
Input: A set I of n distinct values and an integer k, 1 ≤ k ≤ n.
Output: The median element of I.
Standard methods: Sort and report the n/2-th element in O(n log n) time, or use the selection algorithm in O(n) time.
Algorithm SELECTION-BY-PRUNING(I)
1. Choose $n^{3/4}$ elements of I uniformly at random to form the random sample R
2. Sort R
3. Let l (resp. r) be the element of rank $|R|/2 - \sqrt{n}$ (resp. $|R|/2 + \sqrt{n}$) in R
4. Let $I' \subseteq I$ be all elements of I between l and r
5. Let $n_l$ be the number of elements of I that are < l
6. Sort I′
7. Return the element of rank $\frac{n}{2} - n_l$ of I′ as the median element
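The seven steps can be sketched in Python as below. Several details are assumptions made for this sketch rather than part of the slides: the function name `median_by_pruning`, sampling without replacement, rounding of $n^{3/4}$ and $\sqrt{n}$, treating "between l and r" as inclusive, taking the median to be the element of rank ⌈n/2⌉, and the fallback to a full sort when the median falls outside [l, r].

```python
import math
import random


def median_by_pruning(items, rng=random):
    """Median (1-based rank ceil(n/2)) of a non-empty list of distinct values."""
    n = len(items)
    target = (n + 1) // 2                    # 1-based rank of the median
    m = max(1, round(n ** 0.75))             # sample size n^(3/4)
    R = sorted(rng.sample(items, m))         # steps 1-2: sample, then sort
    d = math.isqrt(n)                        # sqrt(n), rounded down
    lo_rank = m // 2 - d                     # step 3: 1-based ranks in R
    hi_rank = m // 2 + d
    l = R[lo_rank - 1] if lo_rank >= 1 else -math.inf
    r = R[hi_rank - 1] if hi_rank <= m else math.inf
    pruned = sorted(x for x in items if l <= x <= r)   # steps 4 and 6
    n_l = sum(1 for x in items if x < l)               # step 5
    idx = target - n_l                       # step 7: 1-based rank within I'
    if 1 <= idx <= len(pruned):
        return pruned[idx - 1]
    # Unlucky sample: the median is not between l and r, so sort everything.
    return sorted(items)[target - 1]
```

With the fallback the function always returns the correct element; the random sample only affects how much work is done.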
Remarks
1. Sorting R takes time $O(n^{3/4} \log n) = O(n)$, if $n^{1/4} \ge \log n$.
2. Exercise: Between two consecutive elements of R, on average there are approx. $n^{1/4}$ elements of I.
3. We have $2\sqrt{n}$ samples between l and r.
4. The expected number of elements of I between l and r is $2 n^{1/2} \cdot n^{1/4} = 2n^{3/4}$.
5. Thus, $E[|I'|] = O(n^{3/4})$.
6. This ensures a run-time of O(n).
7. For correctness: Estimate the probability that the median element is not between l and r. Using Chernoff bounds, it can be shown that this probability is exponentially small.