SLIDE 1

Forty years of Quicksort and Quickselect: a personal view

Conrado Martínez

Univ. Politècnica de Catalunya, Spain
SLIDE 4

Introduction

  • Quicksort and quickselect were invented in the early sixties by C.A.R. Hoare (Hoare, 1961; Hoare, 1962)
  • They are simple, elegant, beautiful and practical solutions to two basic problems of Computer Science: sorting and selection
  • They are prime examples of the divide-and-conquer principle
SLIDE 5

Quicksort

    void quicksort(vector<Elem>& A, int i, int j) {
        if (i < j) {
            int p = get_pivot(A, i, j);  // index of the chosen pivot
            swap(A[p], A[i]);            // partition expects the pivot in A[i]
            int k;
            partition(A, i, j, k);
            // now A[i..k-1] <= A[k] <= A[k+1..j]
            quicksort(A, i, k - 1);
            quicksort(A, k + 1, j);
        }
    }
SLIDE 6

Quickselect

    Elem quickselect(vector<Elem>& A, int i, int j, int m) {
        if (i >= j) return A[i];
        int p = get_pivot(A, i, j, m);  // the pivot choice may depend on the sought rank m
        swap(A[p], A[i]);               // partition expects the pivot in A[i]
        int k;
        partition(A, i, j, k);
        if (m < k) return quickselect(A, i, k - 1, m);
        else if (m > k) return quickselect(A, k + 1, j, m);
        else return A[k];
    }
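SLIDE 7

Partition

    void partition(vector<Elem>& A, int i, int j, int& k) {
        int l = i;
        int u = j + 1;
        Elem pv = A[i];  // the pivot sits in the first position
        for ( ; ; ) {
            // the guard l < j keeps the scan inside A[i..j] when pv is the largest element
            do ++l; while (l < j && A[l] < pv);
            do --u; while (A[u] > pv);  // stops at A[i] == pv at the latest
            if (l >= u) break;
            swap(A[l], A[u]);
        }
        swap(A[i], A[u]);  // move the pivot to its final position
        k = u;
    }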
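SLIDE 8

Partition

[Figure: state of the array A[i..j] during partitioning. The pivot pv sits at position i; the elements already scanned from the left (up to l) are < pv, the elements already scanned from the right (from u up to j) are > pv, and the elements in between (???) are still unexamined.]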
SLIDE 11

The Recurrences for Average Costs

Probability that the selected pivot is the $k$-th of $n$ elements: $\pi_{n,k}$

Average number of comparisons $Q_n$ to sort $n$ elements:

$$Q_n = n - 1 + \sum_{k=1}^{n} \pi_{n,k} \cdot (Q_{k-1} + Q_{n-k})$$

Average number of comparisons $C_{n,m}$ to select the $m$-th out of $n$:

$$C_{n,m} = n - 1 + \sum_{k=m+1}^{n} \pi_{n,k} \cdot C_{k-1,m} + \sum_{k=1}^{m-1} \pi_{n,k} \cdot C_{n-k,m-k}$$
SLIDE 13

Quicksort: The Average Cost

For the standard variant, $\pi_{n,k} = 1/n$.

Average number of comparisons $Q_n$ to sort $n$ elements (Hoare, 1962):

$$Q_n = 2(n+1)H_n - 4n = 2n \ln n + (2\gamma - 4)n + 2\ln n + O(1),$$

where $H_n = \sum_{1 \le k \le n} 1/k = \ln n + \gamma + O(1/n)$ is the $n$-th harmonic number and $\gamma = 0.577\ldots$ is Euler's constant.
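
The closed form for $Q_n$ can be checked numerically against the recurrence with $\pi_{n,k} = 1/n$. The following is a minimal sketch; the function names are hypothetical and not taken from the talk.

```cpp
#include <cassert>
#include <vector>

// Q_0..Q_nmax from the recurrence with pi_{n,k} = 1/n:
//   Q_n = n - 1 + (1/n) * sum_{k=1..n} (Q_{k-1} + Q_{n-k})
//       = n - 1 + (2/n) * (Q_0 + ... + Q_{n-1}).
std::vector<double> quicksort_avg_by_recurrence(int nmax) {
    assert(nmax >= 0);
    std::vector<double> Q(nmax + 1, 0.0);
    double prefix = 0.0;  // running sum Q_0 + ... + Q_{n-1}
    for (int n = 1; n <= nmax; ++n) {
        prefix += Q[n - 1];
        Q[n] = (n - 1) + 2.0 * prefix / n;
    }
    return Q;
}

// Closed form 2(n+1)H_n - 4n, with H_n summed directly.
double quicksort_avg_closed_form(int n) {
    assert(n >= 0);
    double H = 0.0;
    for (int k = 1; k <= n; ++k) H += 1.0 / k;
    return 2.0 * (n + 1) * H - 4.0 * n;
}
```

Comparing `quicksort_avg_by_recurrence(200)[n]` with `quicksort_avg_closed_form(n)` for each `n` should show agreement up to floating-point rounding.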
SLIDE 15

Quickselect: The Average Cost

Average number of comparisons $C_{n,m}$ to select the $m$-th out of $n$ elements (Knuth, 1971):

$$C_{n,m} = 2\bigl(n + 3 + (n+1)H_n - (n+3-m)H_{n+1-m} - (m+2)H_m\bigr)$$

This is $\Theta(n)$ for any $m$, $1 \le m \le n$. In particular,

$$m_0(\alpha) = \lim_{n\to\infty,\, m/n\to\alpha} \frac{C_{n,m}}{n} = 2 + 2 \cdot H(\alpha), \qquad H(x) = -(x\ln x + (1-x)\ln(1-x)),$$

with $0 \le \alpha \le 1$. The maximum is at $\alpha = 1/2$, where $m_0(1/2) = 2 + 2\ln 2 = 3.386\ldots$; the mean value is $\overline{m}_0 = 3$.
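
Knuth's formula can likewise be checked against the quickselect recurrence with $\pi_{n,k} = 1/n$. This is a small sketch with hypothetical function names; the cubic-time loop is only meant for small $n$.

```cpp
#include <cassert>
#include <vector>

// C[n][m] for 1 <= m <= n from the recurrence with pi_{n,k} = 1/n.
std::vector<std::vector<double>> quickselect_avg_by_recurrence(int nmax) {
    assert(nmax >= 0);
    std::vector<std::vector<double>> C(nmax + 1, std::vector<double>(nmax + 1, 0.0));
    for (int n = 1; n <= nmax; ++n) {
        for (int m = 1; m <= n; ++m) {
            double sum = 0.0;
            for (int k = m + 1; k <= n; ++k) sum += C[k - 1][m];      // pivot rank k > m: continue on the left
            for (int k = 1; k <= m - 1; ++k) sum += C[n - k][m - k];  // pivot rank k < m: continue on the right
            C[n][m] = (n - 1) + sum / n;
        }
    }
    return C;
}

// n-th harmonic number, summed directly.
double harmonic(int n) {
    double H = 0.0;
    for (int k = 1; k <= n; ++k) H += 1.0 / k;
    return H;
}

// Knuth's closed form 2(n + 3 + (n+1)H_n - (n+3-m)H_{n+1-m} - (m+2)H_m).
double quickselect_avg_closed_form(int n, int m) {
    assert(1 <= m && m <= n);
    return 2.0 * (n + 3 + (n + 1) * harmonic(n)
                  - (n + 3 - m) * harmonic(n + 1 - m)
                  - (m + 2) * harmonic(m));
}
```

Dividing `quickselect_avg_closed_form(n, m)` by `n` for large `n` and fixed `m/n` gives a numerical view of the limit $m_0(\alpha)$.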
SLIDE 19

Improving Quicksort and Quickselect

  • Apply general techniques: recursion removal, loop unwrapping, ...
  • Reorder recursive calls to quicksort
  • Switch to a simpler algorithm for small subfiles
  • Use samples to select better pivots
SLIDE 23

Small Subfiles

  • It is well known (Sedgewick, 1975) that, for quicksort, it pays to stop the recursion for subarrays of size $\le n_0$ and use insertion sort instead
  • The optimal choice for $n_0$ is around 20 to 25 elements
  • Alternatively, one might do nothing with small subfiles and perform a single pass of insertion sort over the whole file
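
The first option can be sketched as follows. This is an illustration under stated assumptions, not the code from the talk: `Elem` is taken to be `int`, the pivot is simply the first element, and the names `kCutoff`, `insertion_sort`, `partition_first` and `quicksort_cutoff` are hypothetical. The cutoff value 20 is picked from the range quoted above.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Elem = int;            // assumption: any type with < and > would do
constexpr int kCutoff = 20;  // plays the role of n0

// Straight insertion sort of A[i..j]; a no-op for empty or one-element ranges.
void insertion_sort(std::vector<Elem>& A, int i, int j) {
    for (int r = i + 1; r <= j; ++r) {
        Elem x = A[r];
        int q = r - 1;
        while (q >= i && A[q] > x) { A[q + 1] = A[q]; --q; }
        A[q + 1] = x;
    }
}

// Partition A[i..j] (i < j) around pv = A[i]; returns the pivot's final position.
int partition_first(std::vector<Elem>& A, int i, int j) {
    assert(i < j);
    Elem pv = A[i];
    int l = i, u = j + 1;
    for (;;) {
        do ++l; while (l < j && A[l] < pv);  // guard keeps l inside the range
        do --u; while (A[u] > pv);           // stops at A[i] at the latest
        if (l >= u) break;
        std::swap(A[l], A[u]);
    }
    std::swap(A[i], A[u]);
    return u;
}

// Quicksort that hands subarrays of size <= kCutoff to insertion sort.
void quicksort_cutoff(std::vector<Elem>& A, int i, int j) {
    if (j - i + 1 <= kCutoff) { insertion_sort(A, i, j); return; }
    int k = partition_first(A, i, j);
    quicksort_cutoff(A, i, k - 1);
    quicksort_cutoff(A, k + 1, j);
}
```

The second option mentioned above would instead return immediately for small subarrays and call `insertion_sort(A, 0, n - 1)` once at the very end.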
SLIDE 25

Small Subfiles

  • Cutting off recursion also yields benefits for quickselect
  • In (Martínez, Panario, Viola, 2002) we investigate different ways of selecting within small subfiles and how they affect the average total cost: selection, insertion sort, optimized selection
SLIDE 26

Small Subfiles

We have now

$$C_{n,m} = \begin{cases} t_{n,m} + \sum_{k=m+1}^{n} \pi_{n,k} \cdot C_{k-1,m} + \sum_{k=1}^{m-1} \pi_{n,k} \cdot C_{n-k,m-k}, & \text{if } n > n_0, \\ b_{n,m}, & \text{if } n \le n_0. \end{cases}$$
SLIDE 28

Small Subfiles

Let

$$C(z,u) = \sum_{n \ge 0} \sum_{1 \le m \le n} C_{n,m} z^n u^m.$$

It can be shown that

$$C(z,u) = C_{n_0}(z,u) + \frac{\int_0^z (1-t)(1-ut)\,\frac{\partial T(t,u)}{\partial t}\,dt}{(1-z)(1-uz)},$$

where $T(z,u) = \sum_{n \ge 0} \sum_{1 \le m \le n} t_{n,m} z^n u^m$ and $C_{n_0}(z,u)$ is the only part depending on the $b_{n,m}$'s and $n_0$.
SLIDE 30

Small Subfiles

  • In order to determine the optimal choice for $n_0$ we need only compute $[z^n u^m]\,C_{n_0}(z,u)$
  • We assume $t_{n,m} = \alpha n + \beta + \gamma/(n-1)$ and

    $$b_{n,m} = K_1 n^2 + K_2 n + K_3 m^2 + K_4 m + K_5 mn + K_6 + K_7 g^2 + K_8 g + K_9 gn,$$

    where $g \equiv \min\{m, n-m+1\}$, to study the best choice for $n_0$ as a function of $\alpha$, $\beta$, $\gamma$ and the $K_i$'s
SLIDE 33

Small Subfiles

  • Using insertion sort with $n_0 \le 10$ reduces the average cost; the optimal choice for $n_0$ is 5
  • Selection (we locate the minimum, then the second minimum, etc.) reduces the average cost if $n_0 \le 11$; the optimum $n_0$ is 6
  • Optimized selection (looks for the $m$-th from the minimum or the maximum, whichever is closer) yields improved average performance if $n_0 \le 22$; the optimum $n_0$ is 11
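
A sketch of what "optimized selection" on a small subfile could look like, assuming `Elem` is `int` and that `m` is an absolute position inside `A[i..j]`. The name `select_small` is hypothetical, and the paper's actual routine may differ in detail.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Elem = int;  // assumption: any type with < would do

// Returns the element that would occupy position m (i <= m <= j) if A[i..j]
// were sorted. Works upward from the minimum or downward from the maximum,
// whichever end is closer to m.
Elem select_small(std::vector<Elem>& A, int i, int j, int m) {
    assert(i <= m && m <= j);
    if (m - i <= j - m) {
        // closer to the minimum: place the smallest, second smallest, ... up to position m
        for (int r = i; r <= m; ++r) {
            int best = r;
            for (int q = r + 1; q <= j; ++q)
                if (A[q] < A[best]) best = q;
            std::swap(A[r], A[best]);
        }
    } else {
        // closer to the maximum: place the largest, second largest, ... down to position m
        for (int r = j; r >= m; --r) {
            int best = r;
            for (int q = i; q < r; ++q)
                if (A[best] < A[q]) best = q;
            std::swap(A[r], A[best]);
        }
    }
    return A[m];
}
```

Plain "selection" in the sense of the second bullet corresponds to always taking the first branch.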
SLIDE 35

Median-of-three

  • In quicksort with median-of-three, the pivot of each recursive stage is selected as the median of a sample of three elements (Singleton, 1969)
  • This reduces the probability of uneven partitions, which lead to the quadratic worst case
SLIDE 37

Median-of-three

We have in this case

$$\pi_{n,k} = \frac{(k-1)(n-k)}{\binom{n}{3}}$$

The average number of comparisons $Q_n$ is (Sedgewick, 1975)

$$Q_n = \frac{12}{7}\, n \log n + O(n),$$

roughly 14.3% less than standard quicksort.
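
Two quick checks on this slide, assuming that $\log$ denotes the natural logarithm, as in the leading term $2n\ln n$ of standard quicksort. The first confirms that the $\pi_{n,k}$ form a probability distribution; the second recovers the 14.3% figure from the ratio of leading coefficients.

```latex
\begin{align*}
\sum_{k=1}^{n} (k-1)(n-k) &= \binom{n}{3}
  \quad\Longrightarrow\quad \sum_{k=1}^{n} \pi_{n,k} = 1, \\
1 - \frac{12/7}{2} &= \frac{1}{7} \approx 0.143 .
\end{align*}
```

The first identity counts 3-element subsets of $\{1,\dots,n\}$ by their middle element $k$: there are $k-1$ choices below it and $n-k$ above it.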
SLIDE 39

Median-of-three

To study quickselect with median-of-three, in (Kirschenhofer, Martínez, Prodinger, 1997) we use bivariate generating functions

$$C(z,u) = \sum_{n \ge 0} \sum_{1 \le m \le n} C_{n,m} z^n u^m$$

The recurrences translate into second-order differential equations of hypergeometric type

$$x(1-x)y'' + (c - (1+a+b)x)y' - aby = 0$$
SLIDE 42

Median-of-three

We compute explicit solutions for comparisons and for passes; from there, one has to extract (painfully ;-)) the coefficients.

For instance, for the average number of passes we get

$$P_{n,m} = \frac{24}{35}H_n + \frac{18}{35}H_m + \frac{18}{35}H_{n+1-m} + O(1)$$

And for the average number of comparisons

$$C_{n,m} = 2n + \frac{72}{35}H_n - \frac{156}{35}H_m - \frac{156}{35}H_{n+1-m} + 3m - \frac{3(m-1)(m-2)}{n} + O(1)$$
SLIDE 43

Median-of-three

An important particular case is $m = \lceil n/2 \rceil$ (the median), where the average number of comparisons is

$$\frac{11}{4}\, n + o(n)$$

Compare to $(2 + 2\ln 2)\, n + o(n)$ for standard quickselect.
SLIDE 44

Median-of-three

In general,

$$m_1(\alpha) = \lim_{n\to\infty,\, m/n\to\alpha} \frac{C_{n,m}}{n} = 2 + 3 \cdot \alpha \cdot (1-\alpha)$$

with $0 \le \alpha \le 1$. The mean value is $\overline{m}_1 = 5/2$; compare to $3n + o(n)$ comparisons for standard quickselect on random ranks.
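
The mean values quoted for both variants follow by integrating the limit functions over $\alpha \in [0,1]$; the short computation below is a worked check, using $H(x) = -(x\ln x + (1-x)\ln(1-x))$ as defined for standard quickselect.

```latex
\begin{align*}
\int_0^1 x \ln x \, dx &= -\frac{1}{4}
  \;\Longrightarrow\; \int_0^1 H(\alpha)\, d\alpha = \frac{1}{2}, \\
\overline{m}_0 &= \int_0^1 \bigl(2 + 2H(\alpha)\bigr)\, d\alpha = 2 + 2\cdot\frac{1}{2} = 3, \\
\overline{m}_1 &= \int_0^1 \bigl(2 + 3\alpha(1-\alpha)\bigr)\, d\alpha = 2 + 3\cdot\frac{1}{6} = \frac{5}{2}, \\
m_1(1/2) &= 2 + 3\cdot\frac{1}{4} = \frac{11}{4}.
\end{align*}
```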
SLIDE 46

Optimal Sampling

  • In (Martínez, Roura, 2001) we study what happens if we use samples of size $s = 2t + 1$ to pick the pivots, where $t = t(n)$ may depend on $n$
  • The comparisons needed to pick the pivots have to be taken into account:

    $$Q_n = n - 1 + \Theta(s) + \sum_{k=1}^{n} \pi_{n,k} \cdot (Q_{k-1} + Q_{n-k})$$
SLIDE 50

Optimal Sampling

  • Traditional techniques to solve recurrences cannot be used here
  • We make extensive use of the continuous master theorem (Roura, 1997)
  • We also study the cost of quickselect when the rank of the sought element is random
  • Total cost: # of comparisons + $\xi$ · # of exchanges
SLIDE 51

Optimal Sampling

Theorem 1. If we use samples of size $s$, with $s = o(n)$ and $s = \omega(1)$, then the average total cost $Q_n$ of quicksort is

$$Q_n = (1 + \xi/4)\, n \log_2 n + o(n \log n)$$

and the average total cost $C_n$ of quickselect to find an element of given random rank is

$$C_n = 2(1 + \xi/4)\, n + o(n)$$
SLIDE 52

Optimal Sampling

Theorem 2. Let $s^* = 2t^* + 1$ denote the optimal sample size that minimizes the average total cost of quickselect; assume the average total cost of the algorithm to pick the medians from the samples is $\beta s + o(s)$. Then

$$t^* = \frac{1}{2\sqrt{\beta}} \cdot \sqrt{n} + o(\sqrt{n})$$
SLIDE 53

Optimal Sampling

Theorem 3. Let $s^* = 2t^* + 1$ denote the optimal sample size that minimizes the average number of comparisons made by quicksort. Then

$$t^* = \sqrt{\frac{1}{\beta}\left(\frac{4 - \xi(2\ln 2 - 1)}{8\ln 2}\right)} \cdot \sqrt{n} + o(\sqrt{n})$$

if $\xi < \tau = 4/(2\ln 2 - 1) \approx 10.3548$
SLIDE 54

Optimal Sampling

[Figure: optimal sample size (Theorem 3) vs. exact values]
SLIDE 56

Optimal Sampling

  • If exchanges are expensive ($\xi \ge \tau$) we have to use fixed-size samples and pick the median (not optimal) or pick the $(\psi \cdot s)$-th element of a sample of size $\Theta(\sqrt{n})$
  • If the position of the pivot is close to either end of the array, then few exchanges are necessary on that stage, but a poor partition leads to more recursive steps. This trade-off is relevant if exchanges are very expensive
SLIDE 59

Optimal Sampling

  • The variance of quickselect when $s = s(n) \to \infty$ is

    $$V_n = \Theta\left(\max\left\{\frac{n^2}{s},\; n \cdot s\right\}\right)$$

  • The best choice is $s = \Theta(\sqrt{n})$; then $V_n = \Theta(n^{3/2})$ and there is concentration in probability
  • We conjecture this type of result holds for quicksort too
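
The choice $s = \Theta(\sqrt{n})$ comes from balancing the two terms inside the maximum: $n^2/s$ decreases with $s$ while $n \cdot s$ increases, so the maximum is smallest (up to constant factors) where they meet.

```latex
\begin{align*}
\frac{n^2}{s} = n \cdot s
  \;\Longleftrightarrow\; s^2 = n
  \;\Longleftrightarrow\; s = \sqrt{n},
\qquad\text{and then}\qquad
V_n = \Theta\!\left(n \cdot \sqrt{n}\right) = \Theta\!\left(n^{3/2}\right).
\end{align*}
```

Since the mean is $\Theta(n)$, the standard deviation $\Theta(n^{3/4})$ is of smaller order than the mean, which is what yields concentration in probability via Chebyshev's inequality.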
SLIDE 62

Adaptive Sampling

  • In (Martínez, Panario, Viola, 2004) we study choosing pivots with relative rank in the sample close to $\alpha = m/n$
  • In general: $r(\alpha)$ = rank of the pivot within the sample, when selecting the $m$-th out of $n$ elements and $\alpha = m/n$
  • Divide $[0,1]$ into $\ell$ intervals with endpoints $0 = a_0 < a_1 < a_2 < \cdots < a_\ell = 1$ and let $r_k$ denote the value of $r(\alpha)$ for $\alpha$ in the $k$-th interval
SLIDE 66

Adaptive Sampling

  • For median-of-$(2t+1)$: $\ell = 1$ and $r_1 = t + 1$
  • For proportion-from-$s$: $\ell = s$, $a_k = k/s$ and $r_k = k$
  • “Proportion-from”-like strategies: $\ell = s$ and $r_k = k$, but the endpoints of the intervals $a_k \ne k/s$
  • A sampling strategy is symmetric if $r(\alpha) = s + 1 - r(1 - \alpha)$
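
The strategies above differ only in the interval endpoints $a_k$ and the ranks $r_k$, so one lookup routine covers all of them. A minimal sketch, assuming the $k$-th interval is closed on the right (which matches the split $[0,1/2]$, $(1/2,1]$ used for proportion-from-2); the function name and argument layout are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// r(alpha): rank within the sample of the element to use as pivot.
// a holds the endpoints a_0 = 0 < a_1 < ... < a_l = 1 (l + 1 values),
// r holds r_1..r_l (l values); alpha in (a_{k-1}, a_k] maps to r_k.
int pivot_rank_in_sample(double alpha,
                         const std::vector<double>& a,
                         const std::vector<int>& r) {
    assert(!r.empty() && a.size() == r.size() + 1);
    assert(0.0 <= alpha && alpha <= 1.0);
    for (std::size_t k = 1; k < a.size(); ++k)
        if (alpha <= a[k]) return r[k - 1];
    return r.back();  // only reached if a.back() < 1 because of rounding
}
```

For example, median-of-three is `a = {0, 1}`, `r = {2}`; proportion-from-3 is `a = {0, 1/3, 2/3, 1}`, `r = {1, 2, 3}`; and a strategy with endpoints `{0, ν, 1 - ν, 1}` keeps the same ranks but moves the two inner endpoints.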
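SLIDE 67

Adaptive Sampling

Theorem 4. Let $f(\alpha) = \lim_{n\to\infty,\, m/n\to\alpha} \frac{C_{n,m}}{n}$. Then

$$f(\alpha) = 1 + \frac{s!}{(r(\alpha)-1)!\,(s-r(\alpha))!} \times \left[ \int_{\alpha}^{1} f\!\left(\frac{\alpha}{x}\right) x^{r(\alpha)} (1-x)^{s-r(\alpha)}\,dx + \int_{0}^{\alpha} f\!\left(\frac{\alpha-x}{1-x}\right) x^{r(\alpha)-1} (1-x)^{s+1-r(\alpha)}\,dx \right].$$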
SLIDE 69

Adaptive Sampling: Proportion-from-2

  • Here $f(\alpha)$ is composed of two “pieces” $f_1$ and $f_2$ for the intervals $[0, 1/2]$ and $(1/2, 1]$
  • Because of symmetry we need only solve for $f_1$:

    $$f_1(x) = a\left((x-1)\ln(1-x) + \frac{x^3}{6} + \frac{x^2}{2} - x\right) - b(1 + H(x)) + cx + d.$$
SLIDE 73

Adaptive Sampling: Proportion-from-2

  • The maximum is at $\alpha = 1/2$. There $f(1/2) = 3.112\ldots$
  • Proportion-from-2 beats standard quickselect: $f(\alpha) \le m_0(\alpha)$
  • Proportion-from-2 beats median-of-three in some regions: $f(\alpha) \le m_1(\alpha)$ if $\alpha \le 0.140\ldots$ or $\alpha \ge 0.860\ldots$
  • The grand-average: $C_n = 2.598 \cdot n + o(n)$
SLIDE 74

Adaptive Sampling: Proportion-from-2

[Figure: $m_0(\alpha)$, $f(\alpha)$ and $m_1(\alpha)$ plotted against $\alpha \in [0, 1]$]
SLIDE 75

Adaptive Sampling: Proportion-from-3

For proportion-from-3,

$$f_1(x) = -C_0(1 + H(x)) + C_1 + C_2 x + C_3 K_1(x) + C_4 K_2(x),$$
$$f_2(x) = -C_5(1 + H(x)) + C_6 x(1-x) + C_7,$$

with

$$K_1(x) = \cos(\sqrt{2}\ln x) \cdot \sum_{n \ge 0} A_n x^{n+4} + \sin(\sqrt{2}\ln x) \cdot \sum_{n \ge 0} B_n x^{n+4},$$
$$K_2(x) = \sin(\sqrt{2}\ln x) \cdot \sum_{n \ge 0} A_n x^{n+4} - \cos(\sqrt{2}\ln x) \cdot \sum_{n \ge 0} B_n x^{n+4}.$$
SLIDE 79

Adaptive Sampling: Proportion-from-3

  • Two maxima at $\alpha = 1/3$ and $\alpha = 2/3$. There $f(1/3) = f(2/3) = 2.883\ldots$
  • The median is not the most difficult rank: $f(1/2) = 2.723\ldots$
  • Proportion-from-3 beats median-of-three in some regions: $f(\alpha) \le m_1(\alpha)$ if $\alpha \le 0.201\ldots$, $\alpha \ge 0.798\ldots$ or $1/3 < \alpha < 2/3$
  • The grand-average: $C_n = 2.421 \cdot n + o(n)$
SLIDE 81

Adaptive Sampling: Batfind

[Figure: $f(\alpha)$ and $m_1(\alpha)$ plotted against $\alpha \in [0, 1]$]
SLIDE 85

Adaptive Sampling: ν-find

  • Like proportion-from-3, but $a_1 = \nu$ and $a_2 = 1 - \nu$
  • Same differential equation, same $f_i$'s, with $C_i = C_i(\nu)$
  • If $\nu \to 0$ then $f_\nu \to m_1$ (median-of-three)
  • If $\nu \to 1/2$ then $f_\nu$ is similar to proportion-from-2, but it is not the same
SLIDE 86

Adaptive Sampling: ν-find

Theorem 5. There exists a value $\nu^*$, namely $\nu^* = 0.182\ldots$, such that for any $\nu$, $0 < \nu < 1/2$, and any $\alpha$,

$$f_{\nu^*}(\alpha) \le f_\nu(\alpha).$$

Furthermore, $\nu^*$ is the unique value of $\nu$ such that $f_\nu$ is continuous, i.e.,

$$f_{\nu^*,1}(\nu^*) = f_{\nu^*,2}(\nu^*).$$
SLIDE 88

Adaptive Sampling: ν-find

  • Obviously, the value $\nu^*$ minimizes the maximum $f_{\nu^*}(1/2) = 2.659\ldots$ and the mean $\overline{f}_{\nu^*} = 2.342\ldots$
  • If $\nu > \tilde{\nu} = 0.268\ldots$ then $f_\nu$ has two absolute maxima at $\alpha = \nu$ and $\alpha = 1 - \nu$; otherwise there is one absolute maximum at $\alpha = 1/2$
SLIDE 91

Adaptive Sampling: ν-find

  • If $\nu \le \nu' = 0.404\ldots$ then ν-find beats median-of-3 on average ranks: $\overline{f}_\nu \le 5/2$
  • If $\nu \le \nu'_m = 0.364\ldots$ then ν-find beats median-of-3 to find the median: $f_\nu(1/2) \le 11/4$
  • If $\nu \le \nu' = 0.219\ldots$ then ν-find beats median-of-3 for all ranks: $f_\nu(\alpha) \le m_1(\alpha)$
SLIDE 92

Adaptive Sampling: ν-find

[Figure: $f_\nu(1/2)$, $f_{1,\nu}(\nu)$, $f_{2,\nu}(\nu)$ and $m_1(\nu)$ plotted against $\nu$, with $\nu^*$, $\nu'$, $\tilde{\nu}$ and $\nu'_m$ marked on the horizontal axis]
SLIDE 93

Adaptive Sampling: proportion-from-s

Theorem 6. Let $f^{(s)}(\alpha) = \lim_{n\to\infty,\, m/n\to\alpha} \frac{C_{n,m}}{n}$ when using samples of size $s$. Then for any adaptive sampling strategy such that $\lim_{s\to\infty} r(\alpha)/s = \alpha$,

$$f^{(\infty)}(\alpha) = \lim_{s\to\infty} f^{(s)}(\alpha) = 1 + \min(\alpha, 1-\alpha).$$
SLIDE 96

Partial Sort

  • Partial sort: Given an array $A$ of $n$ elements, return the $m$ smallest elements in $A$ in ascending order
  • Heapsort-based partial sort: Build a heap, extract the minimum $m$ times; the cost is $\Theta(n + m \log n)$
  • “Quickselsort”: find the $m$-th with quickselect, then quicksort the $m - 1$ elements to its left; the cost is $\Theta(n + m \log m)$
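
Both baseline approaches can be sketched with standard library building blocks. This is an illustration only: `std::nth_element` and `std::sort` stand in for quickselect and quicksort (library implementations are typically introspective variants, not the plain algorithms analysed here), `Elem` is assumed to be `int`, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

using Elem = int;  // assumption: any type with < would do

// Heap-based partial sort: build a min-heap in Theta(n), then extract the
// minimum m times at Theta(log n) each. Works on a copy and returns the result.
std::vector<Elem> heap_partial_sort(std::vector<Elem> A, int m) {
    assert(0 <= m && m <= static_cast<int>(A.size()));
    std::make_heap(A.begin(), A.end(), std::greater<Elem>());
    std::vector<Elem> out;
    out.reserve(m);
    for (int t = 0; t < m; ++t) {
        std::pop_heap(A.begin(), A.end(), std::greater<Elem>());  // minimum moves to the back
        out.push_back(A.back());
        A.pop_back();
    }
    return out;
}

// "Quickselsort": put the m-th smallest at position m - 1 with a selection
// step, then sort the m - 1 elements to its left. Afterwards A[0..m-1] holds
// the m smallest elements in ascending order.
void quickselsort(std::vector<Elem>& A, int m) {
    assert(0 <= m && m <= static_cast<int>(A.size()));
    if (m == 0) return;
    std::nth_element(A.begin(), A.begin() + (m - 1), A.end());
    std::sort(A.begin(), A.begin() + (m - 1));
}
```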
SLIDE 97

Partial Quicksort

    void partial_quicksort(vector<Elem>& A, int i, int j, int m) {
        if (i < j) {
            int p = get_pivot(A, i, j);
            swap(A[p], A[i]);  // partition expects the pivot in A[i]
            int k;
            partition(A, i, j, k);
            partial_quicksort(A, i, k - 1, m);
            // the right part matters only if the pivot lies before position m - 1
            if (k < m - 1)
                partial_quicksort(A, k + 1, j, m);
        }
    }
SLIDE 99

Partial Quicksort

Average number of comparisons $P_{n,m}$ to sort the $m$ smallest elements:

$$P_{n,m} = n - 1 + \sum_{k=m+1}^{n} \pi_{n,k} \cdot P_{k-1,m} + \sum_{k=1}^{m} \pi_{n,k} \cdot (P_{k-1,k-1} + P_{n-k,m-k})$$

But $P_{n,n} = Q_n = 2(n+1)H_n - 4n$!
SLIDE 101

Partial Quicksort

The recurrence for $P_{n,m}$ is the same as for quickselect, but the toll function is

$$n - 1 + \sum_{0 \le k < m} \pi_{n,k+1}\, Q_k$$

For $\pi_{n,k} = 1/n$, the solution is

$$P_{n,m} = 2n + 2(n+1)H_n - 2(n+3-m)H_{n+1-m} - 6m + 6$$
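
The closed form for $P_{n,m}$ can be checked against the partial quicksort recurrence with $\pi_{n,k} = 1/n$. A minimal sketch with hypothetical function names; it assumes $P_{n,0} = 0$, which matches the code, where the right recursive call is skipped once the pivot lands at position $m - 1$.

```cpp
#include <cassert>
#include <vector>

// P[n][m] for 0 <= m <= n from the recurrence with pi_{n,k} = 1/n; P[n][0] = 0.
std::vector<std::vector<double>> partial_quicksort_avg_by_recurrence(int nmax) {
    assert(nmax >= 0);
    std::vector<std::vector<double>> P(nmax + 1, std::vector<double>(nmax + 1, 0.0));
    for (int n = 1; n <= nmax; ++n) {
        for (int m = 1; m <= n; ++m) {
            double sum = 0.0;
            for (int k = m + 1; k <= n; ++k) sum += P[k - 1][m];  // pivot beyond the m smallest
            for (int k = 1; k <= m; ++k)                          // pivot among the m smallest
                sum += P[k - 1][k - 1] + P[n - k][m - k];
            P[n][m] = (n - 1) + sum / n;
        }
    }
    return P;
}

// n-th harmonic number, summed directly.
double harmonic(int n) {
    double H = 0.0;
    for (int k = 1; k <= n; ++k) H += 1.0 / k;
    return H;
}

// Closed form 2n + 2(n+1)H_n - 2(n+3-m)H_{n+1-m} - 6m + 6, for 1 <= m <= n.
double partial_quicksort_avg_closed_form(int n, int m) {
    assert(1 <= m && m <= n);
    return 2.0 * n + 2.0 * (n + 1) * harmonic(n)
           - 2.0 * (n + 3 - m) * harmonic(n + 1 - m) - 6.0 * m + 6.0;
}
```

Setting `m = n` in the closed form reduces it to $2(n+1)H_n - 4n$, the quicksort average, as the previous slide points out.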
SLIDE 104

Partial Quicksort

  • Partial quicksort makes $2m - 4H_m + 2$ fewer comparisons than “quickselsort”
  • It makes $m/3 - 5H_m/6 + 1/2$ fewer exchanges than “quickselsort”
  • Why? Is there a short, intuitive explanation?
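
The comparison count can at least be verified from the closed forms given earlier, assuming that “quickselsort” costs $C_{n,m} + Q_{m-1}$ comparisons on average under the standard variant $\pi_{n,k} = 1/n$. This checks the arithmetic; it does not by itself supply the intuitive explanation asked for.

```latex
\begin{align*}
C_{n,m} + Q_{m-1}
  &= 2\bigl(n + 3 + (n+1)H_n - (n+3-m)H_{n+1-m} - (m+2)H_m\bigr)
     + 2mH_{m-1} - 4(m-1) \\
  &= 2n + 2(n+1)H_n - 2(n+3-m)H_{n+1-m} - 4H_m - 4m + 8
     \qquad \text{(using } H_{m-1} = H_m - 1/m\text{)}, \\
P_{n,m}
  &= 2n + 2(n+1)H_n - 2(n+3-m)H_{n+1-m} - 6m + 6, \\
C_{n,m} + Q_{m-1} - P_{n,m}
  &= (-4H_m - 4m + 8) - (-6m + 6) = 2m - 4H_m + 2 .
\end{align*}
```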