COMP 3403 Algorithm Analysis, Part 3 (Chapters 6 & 7)
Jim Diamond, CAR 409, Jodrey School of Computer Science, Acadia University

Chapter 6
Transform-and-Conquer
Transform and Conquer
- Idea: somehow transform the given instance, or any instance of a
problem, to something simpler or something already solved
- Three major variations:
– instance simplification: transform a problem instance to a simpler
or more convenient instance of the same problem
– representation change: transform a problem instance to a different representation of the same instance
– problem reduction: transform a problem instance to an instance of a different problem, for which a solution technique is already known
- The concept of problem reduction is well-known in various areas of
mathematics, and figures heavily in the study of NP-completeness
Presorting
- Many problems involving lists are easier when the list is sorted
– computing the median (selection problem)
– checking if all elements are distinct (element uniqueness)
- Also:
– topological sorting helps to solve some problems on dags
– presorting is used in many geometric algorithms
- Note: if sorting is more expensive than another solution to the original
problem, it makes little or no sense to do this transformation
Presorting Example: Checking For Uniqueness
- Suppose you have a list of numbers and want to confirm that all
numbers are unique
– the brute-force approach compares each pair of elements; this is Θ(n^2)
GEQ: What is the EXACT answer?
- Instead, consider presorting the list
– after sorting, any equal elements are adjacent, so a single scan detects duplicates
- Time complexity?
Tunique(n) = Tsort(n) + Tscan(n) ∈ Θ(n lg n) + Θ(n) = Θ(n lg n)
- Other possibilities?
– linear time?
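As a concrete illustration, here is a minimal Python sketch of the presort-then-scan algorithm (the function name is mine, not from the slides):

```python
def all_unique(a):
    """Return True iff all elements of a are distinct."""
    s = sorted(a)                  # Theta(n lg n) comparison sort
    for i in range(len(s) - 1):    # Theta(n) scan: duplicates are now adjacent
        if s[i] == s[i + 1]:
            return False
    return True
```

For the "linear time?" question: storing the elements in a hash set gives expected O(n) time, at the cost of extra space.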
Searching
- Searching is a ubiquitous problem in computer science
- Example: use grep to find a matching string in a text file or text stream
- Example: use a web search engine to find 345,678 matches to a query
- Example: in a database use a command like
select <stuff> from <some table> where <some condition>;
to ask the database to search for some data
- Q: how much time does grep take to search through an input of n
bytes?
– Θ(n), i.e., linear in the input size
– even if the search stops upon the first match, you still might need to process the entire file
- While linear time is generally a good time complexity for a problem, in
the case of searching that may not be acceptable
Binary Trees
- A binary tree is a tree in which each node has at most two children
(the left child and the right child)
Binary Search Trees (also called “ordered binary trees”)
- A binary search tree is a binary tree in which
– all children in the left subtree of a given node N have values less than N’s value
– all children in the right subtree of a given node N have values greater than N’s value
(figure: an example binary search tree containing the keys 6, 3, 1, 5, 8, 9)
- A binary search tree is said to be balanced if, for every pair of leaves,
the lengths of the paths from the root to these leaves differ by at most 1
- Q: how can we efficiently build a balanced search tree?
Binary Search Trees: 2
- If we use a straightforward technique for building a binary search tree,
some input sequences create “deep” trees
– e.g., inserting the keys 2, 3, 5, . . . in increasing order
– 2 would be at the root, 3 its only child, 5 the only child of 3, and so on
– with n items in this sequence, the maximum depth would be n − 1
- Q: how many “probes” to find an item which is in the tree?
A: on average, about n/2
- Q: how many “probes” to determine an item is not in the tree?
A: on average, about n/2
- Not much better than an unsorted list!
- Solution: balance the tree
Balanced Search Trees
- There are various ways of balancing a search tree
– AVL trees: binary trees where the subtree heights at any node differ by at most 1
– if an addition or deletion unbalances the tree, some rotations are done to re-balance the tree
– Red-black trees: binary trees where, at any node, the height of one subtree can be at most twice the height of the other
- 2–3 trees: all leaves are at the same depth, but each internal node can
have either 2 or 3 children
2–3 Tree Node Insertion
2–3 Tree: Analysis
- Q: how good is a 2–3 tree? (That is, how quickly can we search?)
- A: it depends on the exact shape of the tree
- Consider the case of an n-key tree of height h where all internal nodes
have degree 2; we must have
n = 1 + 2 + 4 + · · · + 2^h = 2^(h+1) − 1
Therefore h = log2(n + 1) − 1
- Now consider the case of an n-key tree of height h where all internal
nodes have degree 3; we must have
n = 2 + 6 + 18 + · · · + 2 · 3^h = 2(1 + 3 + 9 + · · · + 3^h) = 3^(h+1) − 1
Therefore h = log3(n + 1) − 1
- These provide upper and lower bounds for any 2–3 tree with n keys, so
that
log3(n + 1) − 1 ≤ h ≤ log2(n + 1) − 1
- Therefore insertion, deletion and searching are all Θ(log n) in both the
average case and the worst case
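The two node-count formulas in the derivation above can be checked numerically with a small Python sketch (function names are mine):

```python
def keys_if_all_degree_2(h):
    # 1 + 2 + ... + 2^h keys: each degree-2 node holds one key
    return sum(2**i for i in range(h + 1))

def keys_if_all_degree_3(h):
    # 2 + 6 + ... + 2*3^h keys: each degree-3 node holds two keys
    return sum(2 * 3**i for i in range(h + 1))
```

Both sums agree with the closed forms 2^(h+1) − 1 and 3^(h+1) − 1 used above.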
Heaps
- A priority queue is an abstract data type which supports the following
operations:
– find the largest element (“find max”)
– delete the largest element (“delete max”)
– insert a new element
- A heap is a data structure which implements a priority queue
– it is an implicit data structure — no pointers/links/etc are used
- A heap is a balanced binary tree in which the deepest leaves are all to
the left of the other leaves
- Note that the children of the node at array index i are at indices 2i and
2i + 1 (using 1-based array indexing)
Heaps: 2
- In a (max-)heap, the values stored in the children of a node N are
no larger than the value stored in N
- This is similar to a binary search tree
– but in the case of a heap there is no specified relationship between the values in the two child nodes
- Note that a left child can have a value greater or less than that of its
sibling
Heap Properties (Max Heap)
- For any given n, there is only 1 binary tree which has the right shape to
be a heap
- The largest element in a heap is always at the root
- For any node in a heap, that node with its left and right subtrees is also
a heap
- To represent a heap in an array, just write down the elements top to
bottom, left to right
Constructing a Heap (“Heapify”)
- Insert the elements in the array (or the binary tree) in the order they are
received
- Then, starting at the bottom level of non-leaves, if a parent’s value is
smaller than that of a child, swap the parent with its larger child, and
repeat (“sift down”) until the heap property holds
Heapify Pseudo-Code
HeapBottomUp(H[1..n])
// Construct a max-heap from the elements of the given array
// using the bottom-up algorithm.
// Input: an array H[1..n] of orderable items
// Output: a max-heap H[1..n]
for i = floor(n/2) downto 1 do     // i is root of current sub-tree
    k = i                          // index of sub-tree being checked
    v = H[k]                       // save this node’s value
    heap = false
    while not heap and 2 * k <= n do
        j = 2 * k
        if j < n                   // if there are two children
            if H[j] < H[j + 1]     // pick the larger value
                j++
        if v >= H[j]               // is this node’s value bigger?
            heap = true            // if so we are done
        else
            H[k] = H[j]            // else "bubble" child up
            k = j
    H[k] = v                       // put the saved value in the empty slot
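The pseudocode translates almost directly into Python; this sketch keeps the 1-based indexing by leaving H[0] unused (the function name is mine):

```python
def heap_bottom_up(h):
    """Bottom-up max-heap construction; h[0] is unused, so the children
    of index k sit at 2k and 2k + 1, matching the pseudocode."""
    n = len(h) - 1
    for i in range(n // 2, 0, -1):      # i is root of current sub-tree
        k, v = i, h[i]                  # save this node's value
        heap = False
        while not heap and 2 * k <= n:
            j = 2 * k
            if j < n and h[j] < h[j + 1]:   # pick the larger child
                j += 1
            if v >= h[j]:
                heap = True
            else:
                h[k] = h[j]             # "bubble" the child up
                k = j
        h[k] = v                        # put saved value in the empty slot
    return h
```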
Efficiency of Heapify
- Assume the tree is full; i.e., n = 2^k − 1 for some k
– define h = ⌊lg n⌋ to be the height of the tree
– in this case, h = k − 1
- In the worst case, every time we examine an internal node it will need to
be moved to a leaf
- Thus a value at level i in the tree will be compared to 2(h − i) other
values (note: the root is at level 0)
- The total number of comparisons is therefore
Cworst(n) = Σ_{i=0}^{h−1} (# keys at level i) · 2(h − i) = Σ_{i=0}^{h−1} 2(h − i) · 2^i = 2(n − lg(n + 1))
Heapsort Using A Max-Heap
- We can sort an array A of n items as follows:
– build a max-heap from A[1 .. n]
– for i = n downto 2 do
– swap A[1] with A[i]
– fix the heap: swap the new root with its largest child until A[1 .. i − 1] is a valid heap
- A[1 .. n] is now sorted in increasing order
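The two stages above can be sketched in Python (again with 1-based indexing, A[0] unused; function names are mine):

```python
def sift_down(a, k, end):
    """Restore the max-heap property for the subtree rooted at k in a[1..end]."""
    v = a[k]
    while 2 * k <= end:
        j = 2 * k
        if j < end and a[j] < a[j + 1]:  # pick the larger child
            j += 1
        if v >= a[j]:
            break
        a[k] = a[j]
        k = j
    a[k] = v

def heapsort(a):
    """In-place heapsort of a[1..n]; a[0] is unused padding."""
    n = len(a) - 1
    for i in range(n // 2, 0, -1):       # stage 1: build the heap, Theta(n)
        sift_down(a, i, n)
    for i in range(n, 1, -1):            # stage 2: n - 1 root removals
        a[1], a[i] = a[i], a[1]          # move current max to its final slot
        sift_down(a, 1, i - 1)
    return a
```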
Analysis of Heapsort
- Let h = ⌊lg n⌋, where n is the number of items to be sorted
- Stage 1: Build heap for a given list of n keys
worst-case:
C(n) = Σ_{i=0}^{h−1} 2(h − i) · 2^i = 2(n − lg(n + 1)) ∈ Θ(n)
- Stage 2: Repeat operation of root removal n − 1 times (fix heap)
worst-case
C(n) = Σ_{i=1}^{n−1} 2 lg i ∈ Θ(n lg n)
Both worst-case and average-case efficiency: Θ(n lg n)
- In-place: yes
- Stable: no (e.g., two equal keys 1 1 may end up in the opposite order)
Using a Heap as a Priority Queue
- The “find max” operation can be done in constant time
- The heapsort algorithm uses the “delete max” operation, whose cost is
Θ(lg n) for a heap of n elements
- The “insert” operation can be done by inserting the new element in the
first array location following the heap, and then (repeatedly) comparing it with its parent, until the heap condition is satisfied
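That insertion procedure ("sift up") looks like this in Python (1-based indexing, h[0] unused; the function name is mine):

```python
def heap_insert(h, v):
    """Insert v into max-heap h by appending it after the last element
    and repeatedly swapping it with its parent while the parent is smaller."""
    h.append(v)
    k = len(h) - 1
    while k > 1 and h[k // 2] < h[k]:    # parent at k//2 is smaller: swap up
        h[k], h[k // 2] = h[k // 2], h[k]
        k //= 2
    return h
```

Since the heap is balanced, at most ⌊lg n⌋ + 1 comparisons are needed, so insertion is O(lg n).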
- Other data structures can implement a priority queue
– e.g., invent a data structure which allows insertion in O(1) time, but requires O(n) time for “find max” or to “delete max”
– e.g., invent a data structure which implements “find max” and “delete max” in O(1) time, but requires O(n) time for insertion
GEQs
Problem Reduction
- This variation of transform-and-conquer solves a problem P by
transforming it into a different problem P ′ for which an algorithm is already available
- To be of practical value, the combined time of the transformation and
solving P ′ should be smaller than solving P as given by another method
- Example:
Suppose you want the least common multiple lcm(m, n) of two numbers
– you could compute the prime factorization of the two numbers, compare the exponents, and figure out the lcm
– e.g., lcm(24, 18): 24 = 2^3 · 3^1, 18 = 2^1 · 3^2;
one number (24) has three 2’s,
one number (18) has two 3’s,
and there are no other prime factors, so the LCM is 2^3 · 3^2 = 72
– instead, note that lcm(m, n) = m · n / gcd(m, n) and recall that we have a “nice” algorithm to compute gcd()
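The reduction of lcm to gcd is a one-liner once Euclid's algorithm is in hand; a minimal Python sketch:

```python
def gcd(m, n):
    """Euclid's algorithm: gcd(m, n) = gcd(n, m mod n)."""
    while n:
        m, n = n, m % n
    return m

def lcm(m, n):
    """Problem reduction: lcm computed via the known gcd algorithm."""
    return m * n // gcd(m, n)
```

The transformation costs essentially nothing, and gcd() runs in O(lg min(m, n)) divisions, far cheaper than factoring.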
Problem Reduction: Counting Paths in Graphs
- Suppose we have a graph G and we wish to calculate the number of
walks (“paths”, according to the textbook author) of length k > 0 from
one vertex to another
- It can be shown that if A is the adjacency matrix for G, then the number
of length-k walks (“paths”) from vertex i to vertex j is given by (A^k)_{i,j}
- We already know how to do matrix multiplication, so. . .
- Q: is there a better algorithm?
Recall that the usual mat-mult algorithm is Θ(n^3)
– Q’: what about high powers of A?
– A’: can we use fast powering algorithms on matrices?
GEQ!
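A sketch of the fast-powering idea applied to matrices (repeated squaring, so only Θ(lg k) multiplications; function names are mine):

```python
def mat_mult(A, B):
    """Plain Theta(n^3) matrix multiplication for square matrices."""
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def mat_power(A, k):
    """Compute A^k by repeated squaring: Theta(lg k) multiplications."""
    n = len(A)
    R = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    while k:
        if k & 1:           # this bit of k contributes a factor
            R = mat_mult(R, A)
        A = mat_mult(A, A)  # square for the next bit
        k >>= 1
    return R
```

For the triangle graph (vertices 0, 1, 2, all pairs adjacent), (A^2)_{0,0} = 2 counts the two closed walks 0-1-0 and 0-2-0.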
Linear Programming: 1
- Linear programming is one of the more important ideas in “industrial
mathematics”; it is used in many, many disciplines
- Example: suppose a farmer uses three feeds for his cows: F1, F2 and F3
– feed Fi has unit cost Ci and gives amounts N^1_i, N^2_i, N^3_i and N^4_i of four nutrients
– the farmer wishes to feed his cows to meet their nutritional needs
Rj (1 ≤ j ≤ 4) at minimum cost
– this requires Ui units of each type of feed (Ui to be found)
– he must solve the following problem:
minimize C = U1·C1 + U2·C2 + U3·C3 (the objective function)
subject to
N^1_1·U1 + N^1_2·U2 + N^1_3·U3 ≥ R1
N^2_1·U1 + N^2_2·U2 + N^2_3·U3 ≥ R2
N^3_1·U1 + N^3_2·U2 + N^3_3·U3 ≥ R3 (the constraints)
N^4_1·U1 + N^4_2·U2 + N^4_3·U3 ≥ R4
U1, U2, U3 ≥ 0
Linear Programming: 2
- These optimization systems are much more difficult to solve than
solving a linear system of equations
(Q: how do we do that?)
- It turns out that the constraints define a region in n-dimensional space
– this region is called the feasible region
- This feasible region is the intersection of a finite number of half-spaces
– this means it is a convex region (and a convex polytope)
- It can be shown that the optimum value of the objective function,
if the optimum solution is finite, is found at a corner of the polytope
– this suggests an algorithm: (a) find the corners of the n-dimensional convex polytope, and (b) find a corner with the optimum value
- The simplex method has been used since the 1940s to solve such problems
– although the simplex method can, in principle, require exponential time, in practice it runs acceptably quickly
– this made people curious!
– a major result from the 1980s: there is a polynomial-time algorithm
to solve linear programming problems
Chapter 7
Space and Time Tradeoffs
Space and Time Tradeoffs
- Idea: in some situations, we can
– use less time by using more space, or
– use less space by using more time
– the “less time for more space” tradeoff is the one typically considered
- Example: pre-compute and store some values which your program may
need many times
– a library to implement sin(x) might precompute sin() for a few hundred values of x (0 ≤ x < 2π) and then use some fast algorithm to interpolate other values
– note that
x − x^3/3! + x^5/5! − x^7/7! + x^9/9!
is a very good approximation of sin(x) in the range [−π, π]
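The claim about the Taylor polynomial is easy to check numerically (a small Python sketch; the 0.01 bound follows from the first omitted term, π^11/11! ≈ 0.0074):

```python
import math

def sin_approx(x):
    """Degree-9 Taylor polynomial for sin(x), as in the series above."""
    return x - x**3/6 + x**5/120 - x**7/5040 + x**9/362880

# Worst error over [-pi, pi], sampled at 201 points
worst = max(abs(sin_approx(t) - math.sin(t))
            for t in (i * math.pi / 100 for i in range(-100, 101)))
```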
- Example 2: there is less computing effort involved in finding Fib(n) if
you have previously filled an array with Fib(0), Fib(1), Fib(2), Fib(3), . . .
Space and Time Tradeoffs: 2
- We consider two varieties of space-for-time algorithms:
– input enhancement — pre-process the input (or part of the input) to store some info to be used later in solving the problem
– e.g., counting sorts
– pre-structuring — pre-process the input to make accessing its elements easier
– e.g., hashing
- Counting sort: if your array values are restricted to some very small
range (say, 16 to 25) you could sort as follows:
– in one pass, count the number of occurrences ci of the i-th smallest value
– fill the first c1 array locations with the 1st value, the next c2 array locations with the 2nd value, the next c3 array locations with the 3rd value, and so on
What is the time complexity of this algorithm?
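A minimal Python sketch of this counting sort (the function name is mine):

```python
def counting_sort(a, lo, hi):
    """Sort a, whose values all lie in [lo, hi], by counting occurrences."""
    counts = [0] * (hi - lo + 1)
    for v in a:                      # one pass to count each value
        counts[v - lo] += 1
    out = []
    for v in range(lo, hi + 1):      # emit each value counts[v - lo] times
        out.extend([v] * counts[v - lo])
    return out
```

The two passes give Θ(n + r) time, where r = hi − lo + 1 is the size of the value range, which answers the time-complexity question above when r is small.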
Input Enhancement for String Searching
- Recall: we look for a search string of length m in a text of length n
- Also recall we looked at the brute force string searching algorithm;
– its worst-case cost is O(nm)
- Better algorithms: Knuth, Morris & Pratt (1977),
Boyer & Moore (1977), Horspool (1980)
– the B-M algorithm pre-processes the search string right to left, storing information in two tables
– Horspool’s algorithm simplifies the B-M algorithm by using just one table
- B-M and Horspool’s algorithms compare the search string with the text
right to left
– e.g., if the search string is ABC, first C is compared, then (if necessary) B and then (if necessary) A
– if no match is found, the search string is then moved to the right (so the text is searched from left to right)
String Matching Ideas
- Suppose you are looking for the pattern LOOK in some text, and you have
reached this point in the search (the lower case letters are “variables”):
...QWERTYabcd... LOOK
– assume abcd != LOOK (i.e., there is not a match)
- Q: how far ahead should you move?
– Case 1: d is not found anywhere in the pattern: restart the search 4 (== |LOOK|) chars to the right
...QWERTYabcdefgh... LOOK
– Case 2: d is found in the pattern, somewhere other than last char: shift pattern so d is aligned with rightmost occurrence in pattern
e.g., d == ’O’
...QWERTYabcOe... LOOK
– why rightmost?
String Matching Ideas (continued)
...QWERTYabcd... LOOK
– Case 3: d is only found at the last char of pattern: similar to case 1: re-start the search 4 (==|LOOK|) chars to the right
...QWERTYabcKefgh... LOOK
– Case 4: d is found both at the end of the pattern and somewhere else in the pattern (consider the pattern OKOK for this case): shift pattern so d is aligned with the next rightmost occurrence in the pattern
...QWERTYabcKe... OKOK
- Using these ideas allows us to move forward more than 1 char at a time
– but only if we can efficiently figure out how far to move ahead
– i.e., we don’t want to do length calculations every step
Horspool’s Algorithm
- Horspool’s algorithm is a simplified version of the Boyer-Moore alg:
– it pre-processes the pattern to generate a shift table t() that determines how much to shift the pattern when a mismatch occurs
– the shift is always based on the text character d aligned with the last character of the pattern: shift by the shift table’s entry for d (i.e., t(d))
- Example: for LOOK, every table entry is 4, except for O, which is 1, and
L, which is 3.
– that is, if LOOK doesn’t match the current text, look at the character
d aligned with K, and shift the pattern t(d) characters to the right
- The table can be computed in O(m) + O(|A|) time, where m is the
length of the pattern and A is the alphabet of all characters
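A minimal Python sketch of Horspool's algorithm (function names are mine; the dict plays the role of the shift table, with m as the default shift for characters not in the pattern):

```python
def horspool(pattern, text):
    """Horspool's algorithm: return index of first match, or -1."""
    m, n = len(pattern), len(text)
    # shift table: for each of the first m-1 pattern chars, the distance
    # from its rightmost such occurrence to the end of the pattern
    t = {}
    for i in range(m - 1):
        t[pattern[i]] = m - 1 - i
    i = m - 1        # text index aligned with the pattern's last char
    while i < n:
        k = 0        # number of chars matched, right to left
        while k < m and pattern[m - 1 - k] == text[i - k]:
            k += 1
        if k == m:
            return i - m + 1
        i += t.get(text[i], m)   # shift by t(d) for the aligned char d
    return -1
```

For the pattern LOOK this builds exactly the table described above: t(L) = 3, t(O) = 1, and 4 for every other character.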
- Summary: by pre-processing the input and using extra space (t()), we
can improve upon the brute force algorithm (sort of. . . this is still
O(nm) in worst case)
GEQ: show such a worst case.
Yet More String Matching Ideas
- In Horspool’s algorithm, if a mismatch is found at any character
position, the shift is based upon the character aligned with the rightmost pattern character:
...QWERTYabJK... OKOK
In this case Horspool’s algorithm will shift t(’K’) = 2 characters
– but we can observe that the mismatched character (’J’ in this case) does not appear anywhere in the pattern, so we could actually shift more than 2 characters
– specifically, t(’J’) − k = 3 characters, where k is the number of characters (starting at the right) which matched (1 in this case)
– if 0 characters (starting at the right) match, the computation would be exactly the same as in Horspool’s algorithm
- Q: is this modified strategy always better than Horspool’s?
A: no; for the above example, if the pattern were OZOK, Horspool would shift the pattern t(’K’) = 4 chars to the right, while the modified strategy shifts only t(’J’) − k = 3
– so is this an improvement in any situation?
Yet Yet More String Matching Ideas
- Suppose the picture was like this, where e != B:
...abcdeER... ABERBER
In this case the length k = 2 suffix of the pattern, suff (k) = ER, matches the text
- If the pattern contains another occurrence of suff (k), we could shift the
pattern right until that occurrence aligns with where suff (k) is now:
...abcdeER... ABERBER
Note that since the second ER from the right is preceded by the same char as suff (k), this can’t possibly match either (we know e != B)
- A more interesting possibility:
...abcdeER... ====> ...abcdeERfgh AZERBER ====> AZERBER
We can shift the pattern by the distance between suff (k) and the next rightmost occurrence of suff (k) not preceded by the same char
Yet Yet More String Matching Ideas: 2
- What if there is no other occurrence of suff (k) not preceded by the
same char as suff (k)?
...abcdeER... ====> ....abcdeERfghijkl ABERBER ====> ABERBER
In this specific example, it is safe to shift the pattern by its entire length
- How about this (for k = 3, so d != R)?
...abcdBER... ERBER
If we shift by m = | ERBER | chars, we might miss a match
...abcdBERBER... ERBER
- We need to handle this case a bit more carefully:
– check whether some prefix of the pattern, of size l < k, matches suff (l)
– if such a prefix exists, we could shift by the distance between this prefix and suff (l)
Boyer-Moore
- Bad symbol shift when char c doesn’t match, after k ≥ 0 matching
chars found; a generalization of Horspool’s algorithm:
d1 = max(t(c) − k, 1)
- Good suffix shift d2 (when k > 0):
– set d2 to the distance between the matched suffix of size k and its rightmost occurrence in the pattern that is not preceded by the same character as the suffix†, if such a substring exists
– otherwise, see if there is a prefix of size 0 < l < k which matches
suff (l), and if there is, d2 is the distance between the longest such prefix and the suffix (the distance between the first chars of the prefix and the suffix)
- Finally, d = max(d1, d2) is the amount by which the pattern can be
shifted
† clearer to say “preceded by a different character than the suffix”?