COMP 3403 Algorithm Analysis, Part 4: Chapter 8
Jim Diamond, CAR 409, Jodrey School of Computer Science, Acadia University
Chapter 8
Dynamic Programming
Jim Diamond, Jodrey School of Computer Science, Acadia University
Chapter 8 128
Dynamic Programming: Introduction
- The word “programming” here refers to the concept of “planning”,
rather than the concept of “coding in a computer language”
- Idea: we have seen that it is a common approach to break a larger
problem down into sub-problems
- Example: consider F(n) = F(n − 1) + F(n − 2)
– the two sub-problems overlap, since to calculate F(n − 1) we will need to calculate F(n − 2) (which in this particular case is the entirety of the second sub-problem)
– if we choose the “obvious” recursive implementation of F(n), the number of sub-problems solved is exponential in n (!)
– the obvious recursive algorithm is very, very inefficient
Dynamic Programming for Mr. Fibonacci
- As mentioned, the “obvious” recursive algorithm to compute Fibonacci
numbers is horribly inefficient
- A dynamic programming approach would arrange the calculations so
that no sub-problem is solved more than once
- Two approaches:
– bottom-up: compute in the following order:
F(0) = 0, F(1) = 1, F(2) = 1 + 0 = 1, F(3) = 1 + 1 = 2, · · · , F(n) = F(n − 1) + F(n − 2)
(the iterative approach, using an array)
– top-down: record the solution to each sub-problem when calculated; when the solution to a sub-problem is desired, see if the particular sub-problem has already been solved
– so-called “memory functions”
- Top-down may be more efficient for some problems, since the solution
to some “smaller” sub-problems may not be required
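The two approaches can be sketched in Python (a minimal illustration, not from the slides; the function names are mine):

```python
# Bottom-up: fill the F values from the base cases upward,
# so each sub-problem is solved exactly once.
def fib_bottom_up(n):
    if n < 2:
        return n
    f = [0] * (n + 1)
    f[1] = 1
    for i in range(2, n + 1):
        f[i] = f[i - 1] + f[i - 2]
    return f[n]

# Top-down "memory function": record each sub-problem's solution
# the first time it is calculated, and reuse it afterwards.
def fib_top_down(n, memo=None):
    if memo is None:
        memo = {0: 0, 1: 1}
    if n not in memo:
        memo[n] = fib_top_down(n - 1, memo) + fib_top_down(n - 2, memo)
    return memo[n]
```

Either version performs Θ(n) additions, versus the exponential count of the naive recursion.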
Dynamic Programming: General Concept
- Recall: many problem solving techniques involve
– breaking the problem into smaller sub-problems,
– solving each of the sub-problems, and
– assembling the solutions to the sub-problems into a solution of the big problem
- In some problems (such as calculating F(n)) the sub-problems may
“overlap” and/or themselves have sub-problems in common
- Idea: store the solution to a given sub-problem in a table the first time
it is computed; when the sub-problem arises again, look the solution up rather than recomputing it
- The “trick” to using dynamic programming is to figure out
– what the overlapping/repeated calculations are, and – how to arrange the calculations so as to avoid repeatedly solving the same sub-problem(s)
Example: Binomial Coefficients
- The binomial coefficients are the coefficients of the binomial formula:
(a + b)^n = C(n, 0) a^n b^0 + · · · + C(n, k) a^(n−k) b^k + · · · + C(n, n) a^0 b^n
- Recurrence:
C(n, 0) = C(n, n) = 1  for n ≥ 0
C(n, k) = C(n − 1, k − 1) + C(n − 1, k)  for n > k > 0
Why?
- The value of C(n, k) can be computed by filling a table as follows:

        0   1   2   · · ·   k − 1         k
  0     1
  1     1   1
  2     1   2   1
  ·     ·           ·
  n−1   1   · · ·           C(n−1, k−1)   C(n−1, k)
  n     1   · · ·                         C(n, k)
- Q: Do we need all of this table filled in?
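A possible Python rendering of the table fill (a sketch, with rows indexed 0..n and columns 0..k; the function name is mine):

```python
def binomial(n, k):
    # c[i][j] holds C(i, j); row i only needs columns 0..min(i, k)
    c = [[0] * (k + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(min(i, k) + 1):
            if j == 0 or j == i:
                c[i][j] = 1                      # C(i, 0) = C(i, i) = 1
            else:
                c[i][j] = c[i - 1][j - 1] + c[i - 1][j]
    return c[n][k]
```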
Example: The “Coin-Row” Problem: 1
- Given: there are n coins in a row, of values c1, c2, . . . , cn
- Goal: pick up the maximum amount of money, without taking two
adjacent coins
- Observation 1: (greedily) starting with the largest coin won’t work
- Q: how to proceed?
- Observation 2: either the optimum solution uses the first coin or it
doesn’t
- Observation 2′: either the optimum solution uses the last coin or it
doesn’t
Example: The “Coin-Row” Problem: 2
- Define F(k) to be the best solution using only the first k coins
- Apply Observation 2′:
F(n) = max{cn + F(n − 2), F(n − 1)}   (*)
- Write down the base cases (“initial conditions”):
F(0) = 0, F(1) = c1
- Now fill in a one-row table of F(i) values from left to right using
formula (*)
- Problem for the diligent student: F(n) just gives us the optimal
amount; how do we know which specific coins to take?
GEQ!
- Note that for this problem, to calculate only F(n), we don’t really need
the whole array of size n: we could get away with just three memory locations
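Both points can be sketched in Python (names are mine, not from the slides): the first function uses only three memory locations, and the second keeps the whole table so the coins themselves can be recovered, which is one possible answer to the diligent student’s problem:

```python
def coin_row(coins):
    # Only three memory locations: F(k-2), F(k-1) and the new F(k).
    prev2, prev1 = 0, 0
    for c in coins:
        prev2, prev1 = prev1, max(c + prev2, prev1)   # formula (*)
    return prev1

def coin_row_pick(coins):
    # Full-table version: walk the table backwards to see which
    # coins the optimal solution actually takes.
    n = len(coins)
    f = [0] * (n + 1)
    if n >= 1:
        f[1] = coins[0]
    for k in range(2, n + 1):
        f[k] = max(coins[k - 1] + f[k - 2], f[k - 1])
    picked, k = [], n
    while k >= 1:
        if f[k] == f[k - 1]:      # coin k was not taken
            k -= 1
        else:                     # coin k was taken; its neighbour was not
            picked.append(k)
            k -= 2
    return f[n], sorted(picked)
```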
Collecting Objects from an n × m Board: 1
- Given: there is an n × m board with objects at some of the board
positions (the slide shows an example board with objects marked by *’s)
- Rules: you start in the upper left corner, and at each turn you can move
right or down (but not off the board); you collect the object from any location you move onto
- Goal: collect as many objects as possible
- Observation 1: a wrong choice at the beginning, in the middle, or near
the end can produce a sub-optimal solution
Collecting Objects from an n × m Board: 2
- As with many (most?) dynamic programming problems, the first trick is
to figure out what function you are optimizing
- The second trick is to figure out how you can relate the optimal
solution of smaller problems to the optimal solution of larger problems
- Once you know these two things, the rest is often “easy”
- Idea 1: maximize F(k), the number of objects collected after k steps
- Problem: relating F(k) to F(k − 1) is difficult because there are (in
general) many places you can be after k steps, and for each of those there are (for this problem) usually two places you might have come from (to the left or above)
- So that is not a good choice of function to optimize
Collecting Objects from an n × m Board: 3
- Idea 2: maximize F(i, j), the number of objects which can be collected
when you get to board position (i, j)
- This definition of F() can be “easily” related to “smaller” problems
– F(i, j) = (number of objects at (i, j)) + max{F(i − 1, j), F(i, j − 1)}
– (and for those terms, only if i and/or j is larger than 0)
- Base case: F(0, 0) is the number of objects at (0, 0)
- Filling in the table for the example board gives the F() values shown on the slide (the bottom-right entry is the answer)
- Keep track of how we did it with another matrix (of size O(n · m)):
– a matrix of ← and ↑ arrows, recording for each cell whether its best predecessor is the cell to the left or the cell above (the slide shows the arrow matrix for the example)
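The recurrence can be sketched in Python (board encoding is an assumption: board[i][j] is 1 if position (i, j) holds an object, 0 otherwise):

```python
def max_objects(board):
    n, m = len(board), len(board[0])
    f = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            # Best count at a legal predecessor: above or to the left
            # (only the terms that exist on the board).
            best_prev = max(f[i - 1][j] if i > 0 else 0,
                            f[i][j - 1] if j > 0 else 0)
            f[i][j] = board[i][j] + best_prev
    return f[n - 1][m - 1]
```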
The 0–1 Knapsack Problem: 1
- Problem: given n items of known weights w1, . . . , wn and values
v1, . . . , vn and a knapsack of capacity W, find the most valuable subset
of the items that fits into the knapsack
– weights are positive integers
– values can be non-integer
- A brute force solution is discussed in Section 3.4
(Ω(2^n) time!)
- Q: how can we formulate this as a dynamic programming problem?
- Idea: consider just the first i items, for 1 ≤ i ≤ n
- That by itself doesn’t give us the necessary recurrence to allow us to
state “bigger” instances in terms of “smaller” instances
- Note: a more general formulation of the knapsack problem allows any number of each
item in the knapsack, not just 0 or 1
The 0–1 Knapsack Problem: 2
- Idea 2: to be able to state “bigger” instances in terms of “smaller”
instances, we must not only consider different numbers of objects, but also consider different target weights
- Idea: allow the recurrence to be a function of not only the first i items,
but also the allowed weight:
– define F(i, j) to be the value of an optimal solution for items with weights w1, . . . , wi and values v1, . . . , vi in a knapsack of capacity j
- Now apply the following amazing observation: there are two categories
of subsets of the first i items that fit into a knapsack of capacity j:
– those that do not contain item i, and
– those that do contain item i
- The subsets that don’t contain item i have optimal value F(i − 1, j)
- The subsets that contain item i have optimal value vi + F(i − 1, j − wi),
but only if j − wi ≥ 0
The 0–1 Knapsack Problem: 3
- These ideas give the following recurrence:
F(i, j) = max{F(i − 1, j), vi + F(i − 1, j − wi)}   if wi ≤ j
F(i, j) = F(i − 1, j)   otherwise
- As usual, we need some base cases (initial conditions):
F(0, j) = 0 for j ≥ 0
and
F(i, 0) = 0 for i ≥ 0
- To perform the calculations, we fill a matrix in using a very similar
technique to the binomial coefficients example
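A Python sketch of the matrix fill (the indexing is mine: item i in row i, capacity j in column j):

```python
def knapsack(weights, values, W):
    n = len(weights)
    # Base cases built in: F[0][j] = 0 and F[i][0] = 0.
    F = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        wi, vi = weights[i - 1], values[i - 1]
        for j in range(1, W + 1):
            if wi <= j:
                F[i][j] = max(F[i - 1][j], vi + F[i - 1][j - wi])
            else:
                F[i][j] = F[i - 1][j]
    return F[n][W]
```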
Memory Functions
- In the previous two examples, we used a bottom-up approach to filling
in the matrix
– this has the disadvantage of possibly calculating values which are never needed for the specific desired solution
– it also doesn’t match the recursive definition of the functions used to describe the solution
- Instead, we can apply the following idea:
– maintain an array (or matrix) of sub-problem values, initialized to some “invalid” value
– (if all possible values are valid, use a corresponding array of bit values)
– when the (recursive) algorithm needs a given value, it checks the validity of that location in the array
– if valid, use the stored value; if invalid, recursively calculate it and store it
- This way no value is ever calculated twice
- See algorithm MFKnapsack in the textbook
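A sketch in the spirit of MFKnapsack (my own rendering, not the textbook’s code); here −1 serves as the “invalid” marker, since every genuine value is ≥ 0:

```python
def mf_knapsack(weights, values, W):
    n = len(weights)
    F = [[-1] * (W + 1) for _ in range(n + 1)]   # -1 means "not yet computed"

    def mf(i, j):
        if F[i][j] < 0:                          # invalid: compute recursively
            if i == 0 or j == 0:
                F[i][j] = 0
            elif weights[i - 1] > j:
                F[i][j] = mf(i - 1, j)
            else:
                F[i][j] = max(mf(i - 1, j),
                              values[i - 1] + mf(i - 1, j - weights[i - 1]))
        return F[i][j]

    return mf(n, W)
```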
Optimal Binary Search Trees
- Suppose you have n keys
a1 < · · · < an
and experience has shown that the probabilities of searching for the ai’s are (respectively)
p1, . . . , pn
- Problem: find a BST with a minimum average number of comparisons in a successful search
- Idea: different binary search trees (for a given set of probabilities) will
have different average search costs
- Q: Can we just generate all BSTs and pick the best?
A: Since the total number of BSTs with n nodes is given by
C(2n, n) · 1/(n + 1)
(the nth Catalan number), which grows exponentially, brute force is (generally) hopeless
Optimal Binary Search Trees: 2
- Example: What is an optimal BST for keys A, B, C, and D (= a1, a2,
a3, a4) with search probabilities 0.1, 0.2, 0.4, and 0.3, respectively?
- The cost (average number of comparisons) for the first tree is
Cost[1, 4] = Σ_{i=1}^{n} pi · level(ai) = 0.1 · 1 + 0.2 · 2 + 0.4 · 3 + 0.3 · 4 = 2.9
and the cost for the second is
0.1 · 2 + 0.2 · 1 + 0.4 · 2 + 0.3 · 3 = 2.1
- Is either of these optimal?
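The two costs are easy to check mechanically (a throwaway Python helper; the level lists are read off the slide’s two cost computations):

```python
def avg_comparisons(levels, probs):
    # Expected number of comparisons: sum of p_i * level(a_i).
    return sum(p * l for p, l in zip(probs, levels))

probs = [0.1, 0.2, 0.4, 0.3]     # search probabilities of A, B, C, D
first_tree  = [1, 2, 3, 4]       # levels of A, B, C, D in the first tree
second_tree = [2, 1, 2, 3]       # levels of A, B, C, D in the second tree
```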
Optimal Binary Search Trees: 3
- To find an efficient solution for this, we employ the following amazingly
brilliant observation:
– in an optimal BST, some key ak is at the root; the keys smaller than ak form its left subtree and the keys larger than ak form its right subtree
- If that’s not enough for you, consider this piece of cleverness:
– the left subtree in an optimal BST should be optimal for the set of keys it contains
as well as this piece:
– the right subtree in an optimal BST should be optimal for the set of keys it contains
- Admittedly, the latter two pieces are not quite as obvious as the first
part
Q1: can you prove these two claims?
Q2: if you can’t prove these two claims, can you at least convince yourself they are true?
Optimal Binary Search Trees: 4
- Consider some subtree of the optimal BST, rooted at some key ak:
- Observations:
– all of the keys less than ak are in the left sub-subtree
– all of the keys greater than ak are in the right sub-subtree
Optimal Binary Search Trees: 5
- Define C[i, j] to be the cost of the optimal BST T(i, j) made up of the
keys ai, . . . , aj, for any 1 ≤ i ≤ j ≤ n
- Using the usual dynamic programming principle, we first find (optimal)
costs C[i′, j′] for sub-problems T(i′, j′) where i′ > i and/or j′ < j
- Suppose I have some optimal BST T(i, j) for the keys ai, . . . , aj
– if T(i, j) is (say) the left subtree of a tree T′ with root aj+1, then to access some key in {ai, . . . , aj} we use one more comparison than we used to find the key in T(i, j)
<board example>
- Thus we can relate the cost of a BST to the costs of its two sub-BSTs
- E.g., suppose we have T(i, j) with root ak for some i ≤ k ≤ j:
C[i, j] = pk + C′[i, k − 1] + C′[k + 1, j]
where C′[l, m] is similar to C[l, m], but for when the subtree T(l, m) is one level deeper in the complete tree
Optimal Binary Search Trees: 6
- How do we compute C′[. . .]? Recall that
C[i, j] = Σ_{l=i}^{j} pl · level(al)
- If the whole subtree T(i, j) is moved one level down, then the number of
comparisons to reach a given key in T(i, j) is increased by one:
C′[i, j] = Σ_{l=i}^{j} pl · (level(al) + 1) = Σ_{l=i}^{j} pl · level(al) + Σ_{l=i}^{j} pl = C[i, j] + Σ_{l=i}^{j} pl
so the cost of a tree grows by its weight (the sum of the search probabilities of the keys in that tree) whenever the tree is moved down one level
– this allows us to easily compute the overall cost of a tree, based upon knowing the costs of its subtrees
Optimal Binary Search Trees: 7
- We now know how to compute the cost of a tree knowing its root
vertex and the costs (and weights) of its subtrees
- Denote Σ_{l=i}^{j} pl, the weight of T(i, j), by W[i, j]
- To find the optimal BST for the keys ai, . . . , aj we can try each possible
key at the root:
C[i, j] = min_{i≤k≤j} (pk · 1 + C′[i, k − 1] + C′[k + 1, j])
        = min_{i≤k≤j} (pk + C[i, k − 1] + W[i, k − 1] + C[k + 1, j] + W[k + 1, j])
        = W[i, j] + min_{i≤k≤j} (C[i, k − 1] + C[k + 1, j])
- Note that we need to define C[i, i − 1] = 0 for 1 ≤ i ≤ n + 1 to handle
the cases where the first or last node is the root
- Also note that C[i, i] = pi, the weight of a one-node tree
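Putting the recurrence, the W[i, j] weights and the base cases together gives the following Python sketch (padded matrices with 1-based indexing are my choice, to make C[i][i − 1] = 0 come for free):

```python
def obst_cost(p):
    # p[0..n-1] are the search probabilities of keys a1 < ... < an;
    # returns the minimum average number of comparisons.
    n = len(p)
    C = [[0.0] * (n + 2) for _ in range(n + 2)]   # C[i][i-1] = 0 built in
    W = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i in range(1, n + 1):
        C[i][i] = W[i][i] = p[i - 1]              # one-node trees
    for length in range(2, n + 1):                # diagonal by diagonal
        for i in range(1, n - length + 2):
            j = i + length - 1
            W[i][j] = W[i][j - 1] + p[j - 1]
            C[i][j] = W[i][j] + min(C[i][k - 1] + C[k + 1][j]
                                    for k in range(i, j + 1))
    return C[1][n]
```

For the example probabilities 0.1, 0.2, 0.4 and 0.3 this evaluates to 1.7.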
- Q: is this computationally efficient?
Optimal Binary Search Trees: 8
- Sample computation table:
- Note that there is a column indexed with 0 and a row indexed with n + 1
Optimal Binary Search Trees: 9
- How much computation is required to fill in the table?
– there are n + 1 zero entries C[i, i − 1]
– there are n pk’s next to that
– each of the above can be written down in constant time
- There are (n − 1) + (n − 2) + · · · + 2 + 1 entries left to be computed
– each of these is computed by computing O(n) sums (each sum is the sum of three numbers) and finding the minimum of the sums
- In total, Θ(n^3) . . . bah!
- We can improve this to Θ(n^2) with the following observation:
adding a weight on the right can not move the optimal root to the left
– that is, if the root for the optimal subtree [ai, . . . , aj] is ak, then adding aj+1 can not give an optimal subtree with a root al where l < k
– a careful counting shows that computing each diagonal can be done in Θ(n) time, thus a total of Θ(n^2) time is required
Proof: GEQ
Optimal Binary Search Trees: 10
- W[i, j] computation:
– in the algorithm in the textbook, each time a W[i, j] is needed the algorithm uses a for loop to compute the value (W[i, j] = Σ_{l=i}^{j} pl)
– instead of doing this, a matrix can be computed in a manner similar to the C[i, j] matrix
- Space complexity:
– the C[. . .] matrix requires Θ(n^2) space
– only O(n) space is needed for W[. . .] if you are careful
– GEQ: how?
The Transitive Closure of a Directed Graph
- Recall that while the adjacency matrix A for an undirected graph is equal
to its transpose AT , this is not (in general) the case for a directed graph
- Given a digraph, suppose you want to know which vertices can be
reached from a given vertex with a path of length 1 or more
- Define the transitive closure of a digraph G = (V, E) to be the graph
TC(G) = (V, E′) where (v1, v2) ∈ E′ iff there is a path of length greater than 0 from v1 to v2
Computing the Transitive Closure of a Digraph G
- Idea 1: for each v ∈ V , do a BFS or DFS starting at v, recording the
vertices reachable in TC(G)’s adjacency matrix
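Idea 1 might look like this in Python (an adjacency-matrix encoding is assumed; a stack-based DFS is run from each start vertex):

```python
def transitive_closure_dfs(adj):
    # adj is an n x n 0/1 adjacency matrix; tc[s][v] = 1 iff there is a
    # path of length >= 1 from s to v (so s reaches itself only through
    # a cycle, matching the definition of TC(G)).
    n = len(adj)
    tc = [[0] * n for _ in range(n)]
    for s in range(n):
        stack = [w for w in range(n) if adj[s][w]]
        while stack:
            v = stack.pop()
            if not tc[s][v]:
                tc[s][v] = 1
                stack.extend(w for w in range(n)
                             if adj[v][w] and not tc[s][w])
    return tc
```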
- Problem: each vertex and each edge will be examined more times than
necessary
– e.g., if G has edges (a, b), (b, c), (c, d) and (d, e), then the path from c to e will be “discovered” (at least) 3 times: once when searching from a, once from b and once from c
- Q: Can we do better?
A: Yes!
- Idea: we make use of three “dimensions” here: “from vertex”, “to vertex” and “intermediate vertex”
– note that the adjacency matrix of G shows the paths from vi to vj with no intermediate vertex numbered higher than 0
Transitive Closure: Warshall’s Algorithm
- Given an n-vertex digraph G, define a series of n × n Boolean matrices
R(0), . . . , R(k−1), R(k), . . . , R(n)
where r(k)[i, j] = 1 iff there is a path from vi to vj with no intermediate
vertex numbered higher than k
- R(0) is the adjacency matrix of G: a path which goes from (say) vi to vj
and has no intermediate vertices numbered higher than 0 must be a single edge
- R(1) indicates which vertices are connected either directly or through the
intermediate vertex v1
- Similarly, R(2) indicates which vertices are connected either directly or
through the intermediate vertices v1 and/or v2
Transitive Closure: Warshall’s Algorithm 2
- Warshall’s algorithm iteratively computes the R(k) matrices
– to compute R(k) given R(k−1), observe that if r(k−1)[i, k] = 1 and r(k−1)[k, j] = 1, then there is a path from vi to vj that goes through vk (so r(k)[i, j] = 1)
– a path counted in R(k−1) is still counted in R(k), giving r(k)[i, j] = r(k−1)[i, j] || (r(k−1)[i, k] && r(k−1)[k, j])
Transitive Closure: Warshall’s Algorithm 3
Applying Warshall’s Algorithm: new 1’s are in bold face
Transitive Closure: Warshall’s Algorithm 4
/*
 * Warshall’s algorithm computes the transitive closure
 * Input: the adjacency matrix of a digraph with n vertices
 * Returns: the transitive closure of the input
 */
Warshall(A[1..n][1..n])
    R(0) ← A
    for k ← 1 to n
        for i ← 1 to n
            for j ← 1 to n
                R(k)[i, j] ← R(k−1)[i, j] || ( R(k−1)[i, k] && R(k−1)[k, j] )
    return R(n)
- Time complexity: Θ(n^3)
Bah!
- Space complexity: only one matrix is needed. . . we can just update the
input matrix in-place!
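A runnable Python rendering of the same loops, with the in-place update (safe because row k and column k cannot change during pass k); the function name is mine:

```python
def warshall(A):
    n = len(A)
    R = [row[:] for row in A]      # R(0) is a copy of the adjacency matrix
    for k in range(n):             # allow vertex k as an intermediate vertex
        for i in range(n):
            for j in range(n):
                R[i][j] = R[i][j] or (R[i][k] and R[k][j])
    return R
```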
Shortest Paths
- Given a weighted, connected graph G, the all-pairs shortest path
problem is to find the lengths of the minimum-length paths between each pair of vertices
- Let D be an n × n matrix where di,j is the length of the shortest path
from vi to vj
- Example:
Shortest Paths: Floyd’s Algorithm
- Assume the graph does not contain a cycle with a negative length
– (otherwise “shortest path” is not well defined: going around such a cycle again always produces a shorter path)
- Similar to Warshall’s algorithm, Floyd’s algorithm calculates a series of
distance matrices:
D(0), . . . , D(k−1), D(k), . . . , D(n)
where D(k) is the set of shortest path distances where no path uses an intermediate vertex numbered higher than k
- As before, D(k) can be computed from D(k−1)
– similar to before, D(0) is the graph’s edge weight matrix
- Idea: given the shortest path from vi to vj with no intermediate vertices
numbered larger than k, we can say that
– either the shortest path goes through vk, in which case its length is d(k−1)[i, k] + d(k−1)[k, j], or
– the shortest path does not go through vk, in which case its length is d(k−1)[i, j]
Shortest Paths: Floyd’s Algorithm: 2
- Note: it turns out that we can over-write the D(k−1) matrix with the
D(k) matrix, so we don’t actually need n different matrices
/*
 * Floyd’s alg computes the lengths of all shortest paths
 * Input: the weight matrix of a graph with n vertices
 * Returns: the matrix of shortest path lengths
 */
Floyd(W[1..n][1..n])
    D ← W
    for k ← 1 to n
        for i ← 1 to n
            for j ← 1 to n
                D[i, j] ← min(D[i, j], D[i, k] + D[k, j])
    return D
- Time complexity: Θ(n^3)
Bah!
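The same loop structure in runnable Python (using float("inf") for “no edge”, an encoding assumption; the function name is mine):

```python
INF = float("inf")

def floyd(W):
    n = len(W)
    D = [row[:] for row in W]      # D starts as the weight matrix
    for k in range(n):             # allow vertex k as an intermediate vertex
        for i in range(n):
            for j in range(n):
                D[i][j] = min(D[i][j], D[i][k] + D[k][j])
    return D
```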
- GEQ: what “easy” addition can be made to this algorithm so that we
can compute the paths, not just their lengths?