403: Algorithms and Data Structures Quicksort Fall 2016 UAlbany - - PowerPoint PPT Presentation



slide-1
SLIDE 1

403: Algorithms and Data Structures Quicksort

Fall 2016 UAlbany Computer Science

Some slides borrowed from David Luebke

slide-2
SLIDE 2

So far: Sorting

Algorithm    Time                         Space
Insertion    O(n^2)                       in-place
Merge        O(n log n)                   2nd array to merge
Heapsort     O(n log n)                   in-place
Quicksort    from O(n log n) to O(n^2)    in-place

Next: Quicksort

– very good in practice (small constants)
– Quadratic time is rare

slide-3
SLIDE 3

Quicksort

  • Another divide-and-conquer algorithm

– DIVIDE: The array A[p..r] is partitioned into two non-empty subarrays A[p..q] and A[q+1..r]

  • Invariant: All elements in A[p..q] are less than or equal to all elements in A[q+1..r]

– CONQUER: The subarrays are recursively sorted by calls to quicksort
– COMBINE: Unlike merge sort, no combining step: the two sorted subarrays already form a sorted array

slide-4
SLIDE 4

Quicksort Code

Quicksort(A, p, r)
{
    if (p < r)
    {
        q = Partition(A, p, r);
        Quicksort(A, p, q);
        Quicksort(A, q+1, r);
    }
}

slide-5
SLIDE 5

Partition

  • Clearly, all the action takes place in the partition() function

– Rearranges the subarray in place
– End result:

  • Two subarrays
  • All values in first subarray ≤ all values in second

– Returns the index of the “pivot” element separating the two subarrays

  • How do you suppose we implement this?
slide-6
SLIDE 6

Partition In Words

  • Partition(A, p, r):

– Select an element to act as the “pivot” (which?)
– Grow two regions, A[p..i] and A[j..r]

  • All elements in A[p..i] <= pivot
  • All elements in A[j..r] >= pivot

– Increment i until A[i] >= pivot
– Decrement j until A[j] <= pivot
– Swap A[i] and A[j]
– Repeat until i >= j
– Return j

Note: slightly different from book’s partition()

slide-7
SLIDE 7

Partition Code

Partition(A, p, r)
    x = A[p];                           // choose pivot x
    i = p - 1;
    j = r + 1;
    while (TRUE)
        repeat j--; until A[j] <= x;    // scan looking for element at most x
        repeat i++; until A[i] >= x;    // scan looking for element at least x
        if (i < j)
            Swap(A, i, j);              // when we find such elements, exchange them
        else
            return j;

Illustrate on A = {4,5,9,7,2,13,6,3};
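The partition and quicksort pseudocode can be tried out directly. The following is a minimal Python sketch, assuming 0-based inclusive indices (the slides use 1-based indices); the default arguments of quicksort() are a convenience added here, not part of the slides.

```python
def partition(a, p, r):
    """Partition a[p..r] (inclusive) around x = a[p], as in the slides' pseudocode.

    Returns j such that every element of a[p..j] is <= every element of a[j+1..r].
    """
    x = a[p]
    i = p - 1
    j = r + 1
    while True:
        # repeat j--; until A[j] <= x
        j -= 1
        while a[j] > x:
            j -= 1
        # repeat i++; until A[i] >= x
        i += 1
        while a[i] < x:
            i += 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j


def quicksort(a, p=0, r=None):
    """Sort a[p..r] in place."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = partition(a, p, r)
        quicksort(a, p, q)
        quicksort(a, q + 1, r)


data = [4, 5, 9, 7, 2, 13, 6, 3]
quicksort(data)
print(data)
```

Note that both recursive calls include index q on the left side (p..q), which matches this partition's contract: j is a split point, not necessarily the final position of the pivot.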

slide-8
SLIDE 8

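Example

Assume all elements are distinct. Pivot = 4. Goal: elements <= x on the left, elements >= x on the right.

4 5 9 7 2 13 6 3    i=0, j=9 (start)
3 5 9 7 2 13 6 4    after swap at i=1, j=8
3 2 9 7 5 13 6 4    after swap at i=2, j=5
3 2 9 7 5 13 6 4    scans stop at i=3, j=2; i>j: DONE (return j=2)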

slide-9
SLIDE 9

Partition Code

Partition(A, p, r)
    x = A[p];
    i = p - 1;
    j = r + 1;
    while (TRUE)
        repeat j--; until A[j] <= x;
        repeat i++; until A[i] >= x;
        if (i < j)
            Swap(A, i, j);
        else
            return j;

What is the running time of partition()?

partition() runs in O(n) time

  • O(1) at each element: skip or swap
  • Linear in the size of the array

slide-10
SLIDE 10

Back to Quicksort

Quicksort(A, p, r)
    if (p < r)
        q = Partition(A, p, r);
        Quicksort(A, p, q);
        Quicksort(A, q+1, r);

A = 3 9 5 7

Qsort(A,1,4)
    Part(A,1,4) Returns: 1            A = 3 9 5 7
    Qsort(A,1,1)
    Qsort(A,2,4)
        Part(A,2,4) Returns: 3        A = 3 7 5 9
        Qsort(A,2,3)
            Part(A,2,3) Returns: 2    A = 3 5 7 9
            Qsort(A,2,2)
            Qsort(A,3,3)
        Qsort(A,4,4)

slide-11
SLIDE 11

Analyzing Quicksort

  • What will be a bad case for the algorithm?

– Partition is always unbalanced

  • What will be the best case for the algorithm?

– Partition is perfectly balanced

  • Which is more likely?

– The latter, by far, except...

  • Will any particular input elicit the worst case?

– Yes: Already-sorted input

slide-12
SLIDE 12

Analyzing Quicksort: Balanced splits

  • In the balanced split case:

T(n) = 2T(n/2) + Θ(n)

  • What does this work out to?

T(n) = Θ(n lg n)

Take home: A good balance is important

slide-13
SLIDE 13

Analyzing Quicksort: Sorted case

  • Sorted case:

T(1) = Θ(1)
T(n) = T(n - 1) + Θ(n)
by substitution…
T(n) = T(1) + nΘ(n)

  • Works out to

T(n) = Θ(n^2)

2 3 6 7 10 13 14 16

First call: j will decrease to 1 (n steps)
Second: j will decrease to 2 (n-1 steps)
…
n + (n-1) + (n-2) + … = Θ(n^2)
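The quadratic growth on sorted input can be observed by counting how many times i and j move. The sketch below is an illustration only: count_scan_steps() is a hypothetical helper, indices are 0-based, and the partition is the one from the slides with a counter added.

```python
import random


def count_scan_steps(values):
    """Sort a copy of values with the slides' quicksort; return how often i or j moved."""
    a = list(values)
    steps = 0

    def partition(p, r):
        nonlocal steps
        x = a[p]
        i, j = p - 1, r + 1
        while True:
            j -= 1
            steps += 1
            while a[j] > x:
                j -= 1
                steps += 1
            i += 1
            steps += 1
            while a[i] < x:
                i += 1
                steps += 1
            if i < j:
                a[i], a[j] = a[j], a[i]
            else:
                return j

    def quicksort(p, r):
        if p < r:
            q = partition(p, r)
            quicksort(p, q)
            quicksort(q + 1, r)

    quicksort(0, len(a) - 1)
    return steps


n = 200
sorted_input = list(range(n))
shuffled_input = sorted_input[:]
random.Random(0).shuffle(shuffled_input)

print("sorted:  ", count_scan_steps(sorted_input))
print("shuffled:", count_scan_steps(shuffled_input))
```

On sorted input each call peels off a single element, so the recursion depth is n; keep n well below Python's default recursion limit when experimenting.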

slide-14
SLIDE 14

Is sorted really the worst case?

  • Argue formally that things cannot get worse
  • A formal argument with general split
  • Assume that every split results in two arrays

– Size q – Size n-q

  • T(n) = max_{1<=q<=n-1} [T(q) + T(n-q)] + O(n)

– where T(1) = O(1)

  • Show that T(n) = O(n^2)

IT CANNOT GET WORSE
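One way the substitution argument can go is sketched below. The constants c and d are assumptions introduced for this sketch: d bounds the O(n) term as at most dn, and the guess is T(n) ≤ cn^2.

```latex
% Assumption: the O(n) term is at most d n for some constant d > 0; guess T(m) <= c m^2 for m < n.
\begin{align*}
T(n) &\le \max_{1 \le q \le n-1} \left[ c q^2 + c (n-q)^2 \right] + d n \\
     &= c \left[ 1 + (n-1)^2 \right] + d n
        && \text{convex in } q \text{, so the maximum is at } q = 1 \text{ or } q = n-1 \\
     &= c n^2 - 2c(n-1) + d n \\
     &\le c n^2
        && \text{whenever } 2c(n-1) \ge d n \text{, e.g. } c \ge d \text{ and } n \ge 2
\end{align*}
```

The maximum sits at the most unbalanced split, which is why the 1 : n-1 split produced by sorted input is as bad as it gets.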

slide-15
SLIDE 15

Average behavior: Intuition

  • Worst case: assumes 1:n-1 split

– rare in practice

  • The O(n log n) behavior occurs even if the split is, say, 10%:90%

  • If all splits are equally likely

– 1:n-1, 2:n-2 … n-1:1
– then on average, we will not get a very tall tree
– details in extra slides at the end (not required)

slide-16
SLIDE 16

Avoiding the O(n^2) case

  • The real liability of quicksort is that it runs in O(n^2) on already-sorted input

  • Solutions

– Randomize the input array
– Pick a random pivot element
– Choose 3 elements and take median for pivot

  • How will these solve the problem?

– By ensuring that no particular input can be chosen to make quicksort run in O(n^2) time
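The "pick a random pivot element" option can be sketched as follows. This is one possible arrangement, not necessarily the one intended in the course: a randomly chosen element is swapped into position p, and the same partition as before then runs unchanged. The names randomized_partition, randomized_quicksort and the rng parameter are assumptions for this sketch; indices are 0-based.

```python
import random


def randomized_partition(a, p, r, rng=random):
    """Swap a random element of a[p..r] into a[p], then partition as in the slides."""
    k = rng.randint(p, r)  # randint includes both endpoints
    a[p], a[k] = a[k], a[p]
    x = a[p]
    i, j = p - 1, r + 1
    while True:
        j -= 1
        while a[j] > x:
            j -= 1
        i += 1
        while a[i] < x:
            i += 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j


def randomized_quicksort(a, p=0, r=None, rng=random):
    """Sort a[p..r] in place using a random pivot at every level."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = randomized_partition(a, p, r, rng)
        randomized_quicksort(a, p, q, rng)
        randomized_quicksort(a, q + 1, r, rng)


already_sorted = list(range(500))
randomized_quicksort(already_sorted, rng=random.Random(1))
print(already_sorted[:5], already_sorted[-5:])
```

With a random pivot, already-sorted input is no longer special: the bad splits now depend on the random choices rather than on the order of the input.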

slide-17
SLIDE 17

Other Improvements (lower constants)

  • When a subarray is small (say smaller than 5), switch to a simple sorting procedure, say insertion sort, instead of Quicksort

– why does this help?

  • Pick more than one pivot

– Partitions the array in more than 2 parts
– Smaller number of comparisons (1.9 n log n vs 2 n log n) and overall better performance in practice
– Details: Kushagra et al. “Multi-Pivot Quicksort: Theory and Experiments”, SIAM, 2013
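The small-subarray cutoff from the first bullet might look like the sketch below. CUTOFF = 5 follows the slide's "say smaller than 5"; the function names and 0-based indexing are assumptions for illustration, and the multi-pivot variant is not shown.

```python
CUTOFF = 5  # assumed threshold, from the slide's "say smaller than 5"


def insertion_sort(a, p, r):
    """Sort a[p..r] (inclusive) in place by insertion."""
    for k in range(p + 1, r + 1):
        key = a[k]
        m = k - 1
        while m >= p and a[m] > key:
            a[m + 1] = a[m]
            m -= 1
        a[m + 1] = key


def partition(a, p, r):
    """Partition from the slides: pivot x = a[p], returns split index j."""
    x = a[p]
    i, j = p - 1, r + 1
    while True:
        j -= 1
        while a[j] > x:
            j -= 1
        i += 1
        while a[i] < x:
            i += 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j


def hybrid_quicksort(a, p=0, r=None):
    """Quicksort that hands subarrays shorter than CUTOFF to insertion sort."""
    if r is None:
        r = len(a) - 1
    if r - p + 1 < CUTOFF:
        insertion_sort(a, p, r)
    else:
        q = partition(a, p, r)
        hybrid_quicksort(a, p, q)
        hybrid_quicksort(a, q + 1, r)


values = [12, 4, 9, 1, 15, 7, 3, 8, 2, 11, 6]
hybrid_quicksort(values)
print(values)
```

The asymptotic cost is unchanged; the cutoff only avoids the call and partition overhead on tiny subarrays, where insertion sort's simple inner loop is cheap.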

slide-18
SLIDE 18

Announcements

  • Read through Chapter 7
  • HW2 due on Wednesday
slide-19
SLIDE 19

Extra slides*

  • Average case rigorous analysis follows
  • This is advanced material (will not appear in HWs and exam)

slide-20
SLIDE 20

Analyzing Quicksort: Average Case

  • Assuming random input, average-case running time is much closer to O(n lg n) than O(n^2)

  • First, a more intuitive explanation/example:

– Suppose that partition() always produces a 9-to-1 split. This looks quite unbalanced!
– The recurrence is thus: T(n) = T(9n/10) + T(n/10) + n
– How deep will the recursion go?

Use n instead of O(n) for convenience (how?)
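A rough answer to the depth question, as a sketch: the largest subproblem shrinks by a factor of 9/10 per level, and each level of the recursion tree does at most n work in total (an assumption that follows from the subproblems at one level being disjoint).

```latex
% Assumption: every level of the recursion tree costs at most n in total.
\begin{align*}
\text{depth} &\le \log_{10/9} n = \frac{\lg n}{\lg (10/9)} \approx 6.58 \lg n \\
T(n) &\le n \cdot \left( \log_{10/9} n + 1 \right) = O(n \lg n)
\end{align*}
```

So even a consistently lopsided 9-to-1 split only changes the constant factor, not the O(n lg n) growth.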

slide-21
SLIDE 21

Analyzing Quicksort: Average Case

  • Intuitively, a real-life run of quicksort will produce a mix of “bad” and “good” splits

– Randomly distributed among the recursion tree
– Pretend for intuition that they alternate between best-case (n/2 : n/2) and worst-case (n-1 : 1)
– What happens if we bad-split root node, then good-split the resulting size (n-1) node?

slide-22
SLIDE 22

Analyzing Quicksort: Average Case

  • Intuitively, a real-life run of quicksort will produce a mix of “bad” and “good” splits

– Randomly distributed among the recursion tree
– Pretend for intuition that they alternate between best-case (n/2 : n/2) and worst-case (n-1 : 1)
– What happens if we bad-split root node, then good-split the resulting size (n-1) node?

  • We fail English
slide-23
SLIDE 23

Analyzing Quicksort: Average Case

  • Intuitively, a real-life run of quicksort will produce a mix of “bad” and “good” splits

– Randomly distributed among the recursion tree
– Pretend for intuition that they alternate between best-case (n/2 : n/2) and worst-case (n-1 : 1)
– What happens if we bad-split root node, then good-split the resulting size (n-1) node?

  • We end up with three subarrays, size 1, (n-1)/2, (n-1)/2
  • Combined cost of splits = n + n - 1 = 2n - 1 = O(n)
  • No worse than if we had good-split the root node!
slide-24
SLIDE 24

Analyzing Quicksort: Average Case

  • Intuitively, the O(n) cost of a bad split (or 2 or 3 bad splits) can be absorbed into the O(n) cost of each good split

  • Thus running time of alternating bad and good splits is still O(n lg n), with slightly higher constants

  • How can we be more rigorous?
slide-25
SLIDE 25

Analyzing Quicksort: Average Case

  • For simplicity, assume:

– All inputs distinct (no repeats)
– Slightly different partition() procedure

  • partition around a random element, which is not included in subarrays
  • all splits (0:n-1, 1:n-2, 2:n-3, … , n-1:0) equally likely

  • What is the probability of a particular split happening?

  • Answer: 1/n
slide-26
SLIDE 26

Analyzing Quicksort: Average Case

  • So partition generates splits (0:n-1, 1:n-2, 2:n-3, … , n-2:1, n-1:0) each with probability 1/n

  • If T(n) is the expected running time,

\[
T(n) = \frac{1}{n} \sum_{k=0}^{n-1} \left[ T(k) + T(n-1-k) \right] + \Theta(n)
\]

  • What is each term under the summation for?
  • What is the Θ(n) term for?

slide-27
SLIDE 27

Analyzing Quicksort: Average Case

  • So…

\[
T(n) = \frac{1}{n} \sum_{k=0}^{n-1} \left[ T(k) + T(n-1-k) \right] + \Theta(n)
     = \frac{2}{n} \sum_{k=0}^{n-1} T(k) + \Theta(n)
\]

– Note: this is just like the book’s recurrence (p166), except that the summation starts with k=0
– We’ll take care of that in a second

Write it on the board
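Before solving the recurrence by hand, it can help to tabulate it numerically. The sketch below makes two simplifying assumptions that are not in the slides: the Θ(n) term is taken to be exactly n, and T(0) = 0. expected_cost() is a hypothetical helper name.

```python
import math


def expected_cost(n_max):
    """Return [T(0), ..., T(n_max)] for T(n) = (2/n) * sum_{k<n} T(k) + n, with T(0) = 0."""
    t = [0.0]
    prefix = 0.0  # running value of T(0) + ... + T(n-1)
    for n in range(1, n_max + 1):
        t.append(2.0 * prefix / n + n)
        prefix += t[n]
    return t


t = expected_cost(100000)
for n in (10, 100, 1000, 10000, 100000):
    # ratio of T(n) to n lg n; a bounded ratio is consistent with T(n) = O(n lg n)
    print(n, round(t[n] / (n * math.log2(n)), 3))
```

A ratio that stays bounded as n grows is what the guess T(n) = O(n lg n) predicts; the table is evidence for the guess, not a proof.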

slide-28
SLIDE 28

Analyzing Quicksort: Average Case

  • We can solve this recurrence using the dreaded substitution method

– Guess the answer
– Assume that the inductive hypothesis holds
– Substitute it in for some value < n
– Prove that it follows for n

slide-29
SLIDE 29

Analyzing Quicksort: Average Case

  • We can solve this recurrence using the dreaded substitution method

– Guess the answer

  • What’s the answer?

– Assume that the inductive hypothesis holds
– Substitute it in for some value < n
– Prove that it follows for n

slide-30
SLIDE 30

Analyzing Quicksort: Average Case

  • We can solve this recurrence using the dreaded substitution method

– Guess the answer

  • T(n) = O(n lg n)

– Assume that the inductive hypothesis holds
– Substitute it in for some value < n
– Prove that it follows for n

slide-31
SLIDE 31

Analyzing Quicksort: Average Case

  • We can solve this recurrence using the dreaded substitution method

– Guess the answer

  • T(n) = O(n lg n)

– Assume that the inductive hypothesis holds

  • What’s the inductive hypothesis?

– Substitute it in for some value < n
– Prove that it follows for n

slide-32
SLIDE 32

Analyzing Quicksort: Average Case

  • We can solve this recurrence using the dreaded substitution method

– Guess the answer

  • T(n) = O(n lg n)

– Assume that the inductive hypothesis holds

  • T(n) ≤ an lg n + b for some constants a and b

– Substitute it in for some value < n
– Prove that it follows for n

slide-33
SLIDE 33

Analyzing Quicksort: Average Case

  • We can solve this recurrence using the dreaded substitution method

– Guess the answer

  • T(n) = O(n lg n)

– Assume that the inductive hypothesis holds

  • T(n) ≤ an lg n + b for some constants a and b

– Substitute it in for some value < n

  • What value?

– Prove that it follows for n

slide-34
SLIDE 34

Analyzing Quicksort: Average Case

  • We can solve this recurrence using the dreaded substitution method

– Guess the answer

  • T(n) = O(n lg n)

– Assume that the inductive hypothesis holds

  • T(n) ≤ an lg n + b for some constants a and b

– Substitute it in for some value < n

  • The value k in the recurrence

– Prove that it follows for n

slide-35
SLIDE 35

Analyzing Quicksort: Average Case

  • We can solve this recurrence using the dreaded substitution method

– Guess the answer

  • T(n) = O(n lg n)

– Assume that the inductive hypothesis holds

  • T(n) ≤ an lg n + b for some constants a and b

– Substitute it in for some value < n

  • The value k in the recurrence

– Prove that it follows for n

  • Grind through it…
slide-36
SLIDE 36

Analyzing Quicksort: Average Case

\[
\begin{aligned}
T(n) &= \frac{2}{n} \sum_{k=0}^{n-1} T(k) + \Theta(n)
  && \text{The recurrence to be solved} \\
&\le \frac{2}{n} \sum_{k=0}^{n-1} \left( ak \lg k + b \right) + \Theta(n)
  && \text{Plug in inductive hypothesis} \\
&\le \frac{2}{n} \left[ b + \sum_{k=1}^{n-1} \left( ak \lg k + b \right) \right] + \Theta(n)
  && \text{Expand out the k=0 case} \\
&= \frac{2}{n} \sum_{k=1}^{n-1} \left( ak \lg k + b \right) + \frac{2b}{n} + \Theta(n) \\
&= \frac{2}{n} \sum_{k=1}^{n-1} \left( ak \lg k + b \right) + \Theta(n)
  && \text{2b/n is just a constant, so fold it into } \Theta(n)
\end{aligned}
\]

Note: leaving the same recurrence as the book

slide-37
SLIDE 37

Analyzing Quicksort: Average Case

\[
\begin{aligned}
T(n) &\le \frac{2}{n} \sum_{k=1}^{n-1} \left( ak \lg k + b \right) + \Theta(n)
  && \text{The recurrence to be solved} \\
&= \frac{2}{n} \sum_{k=1}^{n-1} ak \lg k + \frac{2}{n} \sum_{k=1}^{n-1} b + \Theta(n)
  && \text{Distribute the summation} \\
&= \frac{2a}{n} \sum_{k=1}^{n-1} k \lg k + \frac{2b}{n}(n-1) + \Theta(n)
  && \text{Evaluate the summation: } b + b + \dots + b = b(n-1) \\
&\le \frac{2a}{n} \sum_{k=1}^{n-1} k \lg k + 2b + \Theta(n)
  && \text{Since } n-1 < n,\ 2b(n-1)/n < 2b
\end{aligned}
\]

The summation of k lg k gets its own set of slides later

slide-38
SLIDE 38

Analyzing Quicksort: Average Case

\[
\begin{aligned}
T(n) &\le \frac{2a}{n} \sum_{k=1}^{n-1} k \lg k + 2b + \Theta(n)
  && \text{The recurrence to be solved} \\
&\le \frac{2a}{n} \left( \frac{1}{2} n^2 \lg n - \frac{1}{8} n^2 \right) + 2b + \Theta(n)
  && \text{What the hell? We'll prove this later} \\
&= an \lg n - \frac{a}{4} n + 2b + \Theta(n)
  && \text{Distribute the (2a/n) term} \\
&= an \lg n + b + \left( \Theta(n) + b - \frac{a}{4} n \right)
  && \text{Remember, our goal is to get } T(n) \le an \lg n + b \\
&\le an \lg n + b
  && \text{Pick a large enough that an/4 dominates } \Theta(n) + b
\end{aligned}
\]

slide-39
SLIDE 39

Analyzing Quicksort: Average Case

  • So T(n) ≤ an lg n + b for certain a and b

– Thus the induction holds
– Thus T(n) = O(n lg n)
– Thus quicksort runs in O(n lg n) time on average (phew!)

  • Oh yeah, the summation…
slide-40
SLIDE 40

Tightly Bounding The Key Summation

\[
\begin{aligned}
\sum_{k=1}^{n-1} k \lg k
&= \sum_{k=1}^{\lceil n/2 \rceil - 1} k \lg k + \sum_{k=\lceil n/2 \rceil}^{n-1} k \lg k
  && \text{Split the summation for a tighter bound} \\
&\le \sum_{k=1}^{\lceil n/2 \rceil - 1} k \lg k + \sum_{k=\lceil n/2 \rceil}^{n-1} k \lg n
  && \text{The lg k in the second term is bounded by lg n} \\
&= \sum_{k=1}^{\lceil n/2 \rceil - 1} k \lg k + \lg n \sum_{k=\lceil n/2 \rceil}^{n-1} k
  && \text{Move the lg n outside the summation}
\end{aligned}
\]

slide-41
SLIDE 41

Tightly Bounding The Key Summation

\[
\begin{aligned}
\sum_{k=1}^{n-1} k \lg k
&\le \sum_{k=1}^{\lceil n/2 \rceil - 1} k \lg k + \lg n \sum_{k=\lceil n/2 \rceil}^{n-1} k
  && \text{The summation bound so far} \\
&\le \sum_{k=1}^{\lceil n/2 \rceil - 1} k \lg (n/2) + \lg n \sum_{k=\lceil n/2 \rceil}^{n-1} k
  && \text{The lg k in the first term is bounded by lg n/2} \\
&= \sum_{k=1}^{\lceil n/2 \rceil - 1} k (\lg n - 1) + \lg n \sum_{k=\lceil n/2 \rceil}^{n-1} k
  && \text{lg n/2 = lg n - 1} \\
&= (\lg n - 1) \sum_{k=1}^{\lceil n/2 \rceil - 1} k + \lg n \sum_{k=\lceil n/2 \rceil}^{n-1} k
  && \text{Move (lg n - 1) outside the summation}
\end{aligned}
\]

slide-42
SLIDE 42

Tightly Bounding The Key Summation

\[
\begin{aligned}
\sum_{k=1}^{n-1} k \lg k
&\le (\lg n - 1) \sum_{k=1}^{\lceil n/2 \rceil - 1} k + \lg n \sum_{k=\lceil n/2 \rceil}^{n-1} k
  && \text{The summation bound so far} \\
&= \lg n \sum_{k=1}^{\lceil n/2 \rceil - 1} k - \sum_{k=1}^{\lceil n/2 \rceil - 1} k + \lg n \sum_{k=\lceil n/2 \rceil}^{n-1} k
  && \text{Distribute the (lg n - 1)} \\
&= \lg n \sum_{k=1}^{n-1} k - \sum_{k=1}^{\lceil n/2 \rceil - 1} k
  && \text{The two lg n summations cover adjacent ranges; combine them} \\
&= \lg n \left( \frac{(n-1)n}{2} \right) - \sum_{k=1}^{\lceil n/2 \rceil - 1} k
  && \text{The Gaussian series}
\end{aligned}
\]

slide-43
SLIDE 43

Tightly Bounding The Key Summation

\[
\begin{aligned}
\sum_{k=1}^{n-1} k \lg k
&\le \left( \frac{(n-1)n}{2} \right) \lg n - \sum_{k=1}^{\lceil n/2 \rceil - 1} k
  && \text{The summation bound so far} \\
&\le \frac{1}{2} \left[ n(n-1) \right] \lg n - \sum_{k=1}^{n/2 - 1} k
  && \text{Rearrange first term, place upper bound on second} \\
&\le \frac{1}{2} \left[ n(n-1) \right] \lg n - \frac{1}{2} \left( \frac{n}{2} \right) \left( \frac{n}{2} - 1 \right)
  && \text{Gaussian series} \\
&\le \frac{1}{2} \left( n^2 \lg n - n \lg n \right) - \frac{1}{8} n^2 + \frac{n}{4}
  && \text{Multiply it all out}
\end{aligned}
\]

slide-44
SLIDE 44

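Tightly Bounding The Key Summation

\[
\begin{aligned}
\sum_{k=1}^{n-1} k \lg k
&\le \frac{1}{2} \left( n^2 \lg n - n \lg n \right) - \frac{1}{8} n^2 + \frac{n}{4} \\
&\le \frac{1}{2} n^2 \lg n - \frac{1}{8} n^2 \quad \text{when } n \ge 2
\end{aligned}
\]

Done!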
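The final bound can be sanity-checked numerically for small n. The sketch below is only a spot check over a finite range, not a substitute for the derivation; k_lg_k_sum() and claimed_bound() are hypothetical helper names.

```python
import math


def k_lg_k_sum(n):
    """Left-hand side: sum of k * lg k for k = 1 .. n-1."""
    return sum(k * math.log2(k) for k in range(1, n))


def claimed_bound(n):
    """Right-hand side: (1/2) n^2 lg n - (1/8) n^2."""
    return 0.5 * n * n * math.log2(n) - n * n / 8.0


for n in (2, 4, 16, 256, 1024):
    print(n, round(k_lg_k_sum(n), 2), round(claimed_bound(n), 2))
```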