Mergesort and Quicksort
LAST: Binary search
TODAY: Divide and conquer
- mergesort and quicksort
- recursion
- randomness
NEXT: Part II of course
- data structures

Recall: Complexity of binary search
- Worst case: O(log n)
- Best case: O(1)
Review
Algorithm        Complexity
Linear search    O(n)
Binary search    O(log n)

Why do we get a logarithmic speed-up in moving from linear search to binary search?
Review
Algorithm        Complexity
Linear search    O(n)
Selection sort   O(n²)
Binary search    O(log n)
In Practice …
Suppose that Google sorts 10⁹ pages, and examining each page takes 10⁻⁹ seconds. How long does it take to sort all the pages using selection sort?

Algorithm        Complexity
Linear search    O(n)
Selection sort   O(n²)
Binary search    O(log n)

(10⁹)² × 10⁻⁹ = 10⁹ seconds, i.e. more than 30 years
Warm-up exercise
[diagram: an array of n elements split into two halves of n/2 elements each; sorting each half yields two sorted halves]

Suppose we sort each half separately, and then combine them with an O(n) algorithm.

If we used an O(n²) algorithm for sorting, for an input of size n, how many steps would it take to sort the two halves?
- sorting the first half: _________ steps
- sorting the second half: _________ steps
Doing less work for sorting
size of input    work (number of steps)
n                n²
n/2              n²/4

[diagram: split the array into two halves of n/2 elements and sort each half, yielding two sorted halves]
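The table above can be made precise. With a quadratic sorting algorithm and a combine step costing at most $c\,n$ steps (here $c$ is an assumed constant), sorting the two halves separately takes:

```latex
2 \left(\frac{n}{2}\right)^{2} + c\,n \;=\; \frac{n^{2}}{2} + c\,n
% versus about n^2 steps for sorting the whole array at once:
% roughly half the work.
```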
Divide and conquer for sorting
Divide and conquer

[diagram: split the array of n elements into two halves of n/2 elements, sort each half, then merge the two sorted halves into one sorted array]
Toward implementation
[diagram: array segment A[lo, hi) with midpoint mid]

void selection_sort(int[] A, int lo, int hi)
//@requires 0 <= lo && lo <= hi && hi <= \length(A);
//@ensures is_sorted(A, lo, hi);
;

void sort(int[] A, int lo, int hi)
//@requires 0 <= lo && lo <= hi && hi <= \length(A);
//@ensures is_sorted(A, lo, hi);
{
  int mid = lo + (hi - lo)/2;
  //@assert …
  selection_sort(A, lo, mid);
  //@assert is_sorted(A, lo, mid);
  selection_sort(A, mid, hi);
  //@assert is_sorted(A, mid, hi);
  …
}
[diagram: array segment A[lo, hi) with midpoint mid]

Are the function calls safe?

void selection_sort(int[] A, int lo, int hi)
//@requires 0 <= lo && lo <= hi && hi <= \length(A);
//@ensures is_sorted(A, lo, hi);
;

void sort(int[] A, int lo, int hi)
//@requires 0 <= lo && lo <= hi && hi <= \length(A);
//@ensures is_sorted(A, lo, hi);
{
  int mid = lo + (hi - lo)/2;
  //@assert lo <= mid && mid <= hi;
  selection_sort(A, lo, mid);
  //@assert is_sorted(A, lo, mid);
  selection_sort(A, mid, hi);
  //@assert is_sorted(A, mid, hi);
  …
}
[diagram: array segment A[lo, hi) with midpoint mid]

void selection_sort(int[] A, int lo, int hi)
//@requires 0 <= lo && lo <= hi && hi <= \length(A);
//@ensures is_sorted(A, lo, hi);
;

void sort(int[] A, int lo, int hi)
//@requires 0 <= lo && lo <= hi && hi <= \length(A);
//@ensures is_sorted(A, lo, hi);
{
  int mid = lo + (hi - lo)/2;
  //@assert lo <= mid && mid <= hi;
  selection_sort(A, lo, mid);
  //@assert is_sorted(A, lo, mid);
  selection_sort(A, mid, hi);
  //@assert is_sorted(A, mid, hi);
  …
}

void merge(int[] A, int lo, int mid, int hi)
//@requires 0 <= lo && lo <= mid && mid <= hi && hi <= \length(A);
//@requires is_sorted(A, lo, mid) && is_sorted(A, mid, hi);
//@ensures is_sorted(A, lo, hi);
[diagram: array segment A[lo, hi) with midpoint mid]

void selection_sort(int[] A, int lo, int hi)
//@requires 0 <= lo && lo <= hi && hi <= \length(A);
//@ensures is_sorted(A, lo, hi);
;

void sort(int[] A, int lo, int hi)
//@requires 0 <= lo && lo <= hi && hi <= \length(A);
//@ensures is_sorted(A, lo, hi);
{
  int mid = lo + (hi - lo)/2;
  //@assert lo <= mid && mid <= hi;
  selection_sort(A, lo, mid);
  //@assert is_sorted(A, lo, mid);
  selection_sort(A, mid, hi);
  //@assert is_sorted(A, mid, hi);
  merge(A, lo, mid, hi);
  //@assert is_sorted(A, lo, hi);
}

void merge(int[] A, int lo, int mid, int hi)
//@requires 0 <= lo && lo <= mid && mid <= hi && hi <= \length(A);
//@requires is_sorted(A, lo, mid) && is_sorted(A, mid, hi);
//@ensures is_sorted(A, lo, hi);
Suppose merge is O(n). What is the complexity of sort?

O(n²) + O(n) = O(n²)
Mergesort
Some observations
void selection_sort(int[] A, int lo, int hi)
//@requires 0 <= lo && lo <= hi && hi <= \length(A);
//@ensures is_sorted(A, lo, hi);
;

void sort(int[] A, int lo, int hi)
//@requires 0 <= lo && lo <= hi && hi <= \length(A);
//@ensures is_sorted(A, lo, hi);
{
  int mid = lo + (hi - lo)/2;
  //@assert lo <= mid && mid <= hi;
  selection_sort(A, lo, mid);
  selection_sort(A, mid, hi);
  merge(A, lo, mid, hi);
}
same contracts
We can use sort instead of selection_sort recursively!
Recursive function
void sort(int[] A, int lo, int hi)
//@requires 0 <= lo && lo <= hi && hi <= \length(A);
//@ensures is_sorted(A, lo, hi);
{
  int mid = lo + (hi - lo)/2;
  //@assert lo <= mid && mid <= hi;
  sort(A, lo, mid);
  sort(A, mid, hi);
  merge(A, lo, mid, hi);
  //@assert is_sorted(A, lo, hi);
}
Recursive merge sort
void merge(int[] A, int lo, int mid, int hi)
//@requires 0 <= lo && lo <= mid && mid <= hi && hi <= \length(A);
//@requires is_sorted(A, lo, mid) && is_sorted(A, mid, hi);
//@ensures is_sorted(A, lo, hi);
;
How can we reason about correctness of recursive code?
void mergesort(int[] A, int lo, int hi)
//@requires 0 <= lo && lo <= hi && hi <= \length(A);
//@ensures is_sorted(A, lo, hi);
{
  int mid = lo + (hi - lo)/2;
  //@assert lo <= mid && mid <= hi;
  mergesort(A, lo, mid);
  mergesort(A, mid, hi);
  merge(A, lo, mid, hi);
}
A problem?
void merge(int[] A, int lo, int mid, int hi)
//@requires 0 <= lo && lo <= mid && mid <= hi && hi <= \length(A);
//@requires is_sorted(A, lo, mid) && is_sorted(A, mid, hi);
//@ensures is_sorted(A, lo, hi);
;

void mergesort(int[] A, int lo, int hi)
//@requires 0 <= lo && lo <= hi && hi <= \length(A);
//@ensures is_sorted(A, lo, hi);
{
  int mid = lo + (hi - lo)/2;
  //@assert lo <= mid && mid <= hi;
  mergesort(A, lo, mid);
  mergesort(A, mid, hi);
  merge(A, lo, mid, hi);
}

When hi - lo <= 1, mid equals lo, so mergesort(A, mid, hi) is called on the very same range: the recursion never terminates.
Adding a base case
void mergesort(int[] A, int lo, int hi)
//@requires 0 <= lo && lo <= hi && hi <= \length(A);
//@ensures is_sorted(A, lo, hi);
{
  if (hi - lo <= 1) return;
  int mid = lo + (hi - lo)/2;
  //@assert lo <= mid && mid <= hi;
  mergesort(A, lo, mid);
  //@assert is_sorted(A, lo, mid);
  mergesort(A, mid, hi);
  //@assert is_sorted(A, mid, hi);
  merge(A, lo, mid, hi);
}
Adding a base case
void mergesort(int[] A, int lo, int hi)
//@requires 0 <= lo && lo <= hi && hi <= \length(A);
//@ensures is_sorted(A, lo, hi);
{
  if (hi - lo <= 1) return;
  int mid = lo + (hi - lo)/2;
  //@assert lo < mid && mid < hi;
  mergesort(A, lo, mid);
  //@assert is_sorted(A, lo, mid);
  mergesort(A, mid, hi);
  //@assert is_sorted(A, mid, hi);
  merge(A, lo, mid, hi);
}
Complexity
n arrays of size 1
  ↓ merge
n/2 arrays of size 2
  ↓ merge
  …
2 arrays of size n/2
  ↓ merge
1 array of size n
How many levels are there?
n arrays of size 1, then n/2 arrays of size 2, …, then 2 arrays of size n/2, finally 1 array of size n.

The merging done at each level costs O(n) in total.
How many levels are there?

There are log n levels, since the array size doubles at each level from 1 up to n: n arrays of size 1, then n/2 arrays of size 2, …, then 2 arrays of size n/2, finally 1 array of size n.

Each level costs O(n) of merging, so the total is O(n log n).
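The level-by-level picture can also be summarized as a recurrence. Writing $c\,n$ for the total cost of merging at one level ($c$ an assumed constant):

```latex
T(1) = c, \qquad T(n) = 2\,T\!\left(\frac{n}{2}\right) + c\,n
% Unrolling for n = 2^k gives k = \log_2 n levels, each costing c n:
T(n) = c\,n \log_2 n + c\,n \in O(n \log n)
```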
Recall
Suppose that Google sorts 10⁹ pages, and examining each page takes 10⁻⁹ seconds. How long does it take to sort all the pages using selection sort?

Algorithm        Complexity
Linear search    O(n)
Selection sort   O(n²)
Binary search    O(log n)

(10⁹)² × 10⁻⁹ = 10⁹ seconds, i.e. more than 30 years
In Practice …
Suppose that Google sorts 10⁹ pages, and examining each page takes 10⁻⁹ seconds. How long does it take to sort all the pages using an O(n log n) sorting algorithm?

Algorithm        Complexity
Linear search    O(n)
Selection sort   O(n²)
Binary search    O(log n)
Merge sort       O(n log n)

10⁹ × log(10⁹) × 10⁻⁹ = log 10⁹ ≈ 30 seconds
Quicksort
Abstract view (like merge sort):

partition: pick a pivot x and rearrange A[lo, hi) so that x ends up at some index p with
- A[p] ≥ every element of A[lo, p)   (smaller elements on the left)
- A[p] ≤ every element of A[p+1, hi) (larger elements on the right)

sort parts: recursively sort A[lo, p) and A[p+1, hi)

[diagram: array A[lo, hi) with pivot x at index p, smaller elements in A[lo, p), larger elements in A[p+1, hi)]
Example
int partition(int[] A, int lo, int hi)
//@requires 0 <= lo && lo < hi && hi <= \length(A);
//@ensures lo <= \result && \result < hi;
//@ensures ge_seg(A[\result], A, lo, \result);
//@ensures le_seg(A[\result], A, \result+1, hi);
;

void quicksort(int[] A, int lo, int hi)
//@requires 0 <= lo && lo <= hi && hi <= \length(A);
//@ensures is_sorted(A, lo, hi);
{
  if (hi - lo <= 1) return;
}
int partition(int[] A, int lo, int hi)
//@requires 0 <= lo && lo < hi && hi <= \length(A);
//@ensures lo <= \result && \result < hi;
//@ensures ge_seg(A[\result], A, lo, \result);
//@ensures le_seg(A[\result], A, \result+1, hi);
;

void quicksort(int[] A, int lo, int hi)
//@requires 0 <= lo && lo <= hi && hi <= \length(A);
//@ensures is_sorted(A, lo, hi);
{
  if (hi - lo <= 1) return;
  int p = ________________;
  //@assert lo <= p && p < hi;
  //@assert ge_seg(A[p], A, lo, p) && le_seg(A[p], A, p+1, hi);
  _____________________;
  //@assert is_sorted(A, lo, p);
  _____________________;
  //@assert is_sorted(A, p+1, hi);
  //@assert is_sorted(A, lo, hi);
}
int partition(int[] A, int lo, int hi)
//@requires 0 <= lo && lo < hi && hi <= \length(A);
//@ensures lo <= \result && \result < hi;
//@ensures ge_seg(A[\result], A, lo, \result);
//@ensures le_seg(A[\result], A, \result+1, hi);
;

void quicksort(int[] A, int lo, int hi)
//@requires 0 <= lo && lo <= hi && hi <= \length(A);
//@ensures is_sorted(A, lo, hi);
{
  if (hi - lo <= 1) return;
  int p = partition(A, lo, hi);
  //@assert lo <= p && p < hi;
  //@assert ge_seg(A[p], A, lo, p) && le_seg(A[p], A, p+1, hi);
  quicksort(A, lo, p);
  //@assert is_sorted(A, lo, p);
  quicksort(A, p+1, hi);
  //@assert is_sorted(A, p+1, hi);
  //@assert is_sorted(A, lo, hi);
}
Correctness of quicksort
- A[p] ≥ A[lo, p)        postcondition of partition
- A[p] ≤ A[p+1, hi)      postcondition of partition
- A[lo, p) is sorted     postcondition of the first recursive call
- A[p+1, hi) is sorted   postcondition of the second recursive call
Choice of pivot

- Best: partition always chooses the median as pivot
  - Cost: O(n log n)
  - Impractical
- Worst: partition always returns the index of a minimal element
  - Degenerates into selection sort
  - O(n²)
- In practice
  - Always return a fixed index, or a random index, …
  - Small probability that we hit the worst case O(n²)
  - Average case O(n log n)
Comparing sorting algorithms
                 worst-case    average-case   in-place?
selection sort   O(n²)         O(n²)          Yes
merge sort       O(n log n)    O(n log n)     No
quicksort        O(n²)         O(n log n)     Yes
Bonus slides
Comparing sorting algorithms
                 worst-case    average-case   in-place?   stable?
selection sort   O(n²)         O(n²)          Yes         No
merge sort       O(n log n)    O(n log n)     No          Yes
quicksort        O(n²)         O(n log n)     Yes         No
Practical consequences
Credit: Algorithm Design, Kleinberg and Tardos (table); UC Berkeley CS 61B, Hug (slide)