COMP 3403 Algorithm Analysis
Part 5: Chapter 9
Jim Diamond, CAR 409, Jodrey School of Computer Science, Acadia University
Chapter 9
Greedy Techniques
Greedy Approaches
- One technique to solve a problem is to make a series of decisions which
seem to be the best at the instant each decision is made
- More specifically, these decisions have three properties:
  – they are feasible: that is, each decision satisfies the problem's constraints
  – they are locally optimal: that is, among all options available at the time the decision is made, the best (or a best) decision is made
  – they are irrevocable: that is, having made a decision, the decision can not be taken back or reversed (you can’t later change your mind)
- Applying these criteria often yields a relatively simple algorithm
  – sometimes the overall solution is optimal, sometimes it isn’t
  – in cases where the solution is not optimal, a greedy approach can still be useful: it provides a bound, which may be useful for other purposes
Change-Making Problem
- Given an “unlimited” supply of coins of denominations d1 > · · · > dm,
make change for an amount n using the smallest number of coins
- Example: d1 = 25¢, d2 = 10¢, d3 = 5¢, d4 = 1¢ and n = 62¢
  – greedy chooses 25¢ + 25¢ + 10¢ + 1¢ + 1¢, i.e., five coins, which is optimal here
- The greedy solution:
  – may not be optimal for “unnormal” coin denominations
  – e.g., suppose n = 15¢ and the available coin denominations are 10¢, 7¢ and 1¢; the optimal solution (7¢ + 7¢ + 1¢) uses three coins, but the greedy solution (10¢ + 1¢ + 1¢ + 1¢ + 1¢ + 1¢) uses six coins
- In the case of “unnormal” coin denominations, the greedy solution gives
an upper bound on the number of coins needed, but the upper bound is not (in general) “tight”
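As a quick illustration, here is a minimal Python sketch of the greedy strategy (the function name and structure are assumptions for illustration, not from the text):

    def greedy_change(n, denominations):
        """Greedily make change for n; denominations must be in decreasing order."""
        coins = []
        for d in denominations:
            count, n = divmod(n, d)      # take as many of this coin as fit
            coins.extend([d] * count)
        return coins                     # n is 0 here whenever the smallest coin is 1

    print(greedy_change(62, [25, 10, 5, 1]))  # [25, 25, 10, 1, 1]: optimal (5 coins)
    print(greedy_change(15, [10, 7, 1]))      # [10, 1, 1, 1, 1, 1]: 6 coins, not optimal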
Minimum(-Cost/Weight) Spanning Trees
- Given a connected graph G, a spanning tree T of G is a connected,
acyclic subgraph of G which contains all of G’s vertices
- Given a weighted, connected graph G, a minimum-cost spanning tree T
of G is a spanning tree of G which has minimum cost over all of G’s
spanning trees
- Example
A graph and its 3 spanning trees; T1 has min cost
Prim’s MST Algorithm
- Start with a tree T1 consisting of one (any) vertex and “grow” the tree
one vertex at a time to produce an MST through a series of expanding
subtrees T1, T2, . . . , Tn
- On each iteration, construct Ti+1 from Ti by adding a vertex not in Ti
that is closest to some vertex already in Ti (this is a “greedy” step!)
- Stop when all vertices are included
/*
 * Prim’s algorithm to find a minimum-cost spanning tree.
 * Input: a weighted, connected graph G = (V, E)
 * Returns: ET, the set of edges of an MST of G
 */
Prim(G)
    VT ← {v0}
    ET ← Ø
    for i ← 1 to |V| - 1 do
        find a min-cost edge e* = {u*, v*} with v* ∈ VT and u* ∉ VT
        VT ← VT ∪ {u*}
        ET ← ET ∪ {e*}
    return ET
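A direct (if naive) Python rendering of this pseudocode, re-scanning every edge on each iteration, is sketched below; the graph representation (a dict from two-vertex frozensets to weights) is an assumption, not from the text:

    def prim(vertices, edges):
        """Prim's MST.  `edges` maps frozenset({u, v}) -> weight.
        Scanning all edges per iteration gives O(|V| * |E|) overall."""
        vt = {next(iter(vertices))}   # VT: start from an arbitrary vertex
        et = set()                    # ET
        while len(vt) < len(vertices):
            # greedy step: cheapest edge with exactly one endpoint in the tree
            e = min((e for e in edges if len(e & vt) == 1), key=edges.get)
            vt |= e                   # the set union adds the new endpoint u*
            et.add(e)
        return et

A faster priority-queue formulation is sketched after the efficiency discussion below.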
Example of Prim’s Algorithm
- Arbitrarily start with the first vertex
(which is a); this gives V (T1) = {a}
- Find a min-cost edge with a at one
end: (a, b); this gives V (T2) = {a, b}
- Find a min-cost edge with a or b at
one end, the other end not in V (T2):
(b, c); this gives V (T3) = {a, b, c}
- Next we pick (b, f); this gives
V (T4) = {a, b, c, f}
- Next we pick (e, f); this gives
V (T5) = {a, b, c, e, f}
- Next we pick (d, f); this gives
V (T6) = {a, b, c, d, e, f}
- The min cost tree has E(T6) =
{(a, b), (b, c), (b, f), (d, f), (e, f)}
Prim’s Algorithm: Is the Tree Optimal?
- The algorithm obeys the criteria: feasible, locally optimal, irrevocable; but is the overall answer optimal? Yes!
- Proof by contradiction: let T be a tree generated by Prim, where e1 was the first edge added, e2 the second, and so on. Assume T is not optimal; then there are optimal trees {T∗(k)} which have smaller costs than T. For each such k, let eik = (v, u) be the first edge added to T which is not in T∗(k) (i.e., e1, . . . , eik−1 are all in T∗(k), but eik ∉ T∗(k)). Choose T∗ to be an optimal tree which maximizes ik.
- Example: suppose T is Prim’s tree for some G, with edges e1, e2, e3, . . . , en−1 (in order of addition), and T∗(1), T∗(2), T∗(3) and T∗(4) are MSTs for G, where
  – {e1, e2, e3, e4} ⊂ T∗(1), but e5 ∉ T∗(1); thus ei1 = e5
  – {e1, e2, e3, . . . , e8} ⊂ T∗(2), but e9 ∉ T∗(2); thus ei2 = e9
  – {e1, e2} ⊂ T∗(3), but e3 ∉ T∗(3); thus ei3 = e3
  – {e1, e2, e3, . . . , e6} ⊂ T∗(4), but e7 ∉ T∗(4); thus ei4 = e7
- Here we would choose T∗ = T∗(2)
Prim’s Algorithm: Is the Tree Optimal? (2)
- Consider the graph T∗ ∪ {eik}, where eik = (v, u):
  [figure: the cycle created by adding eik = (v, u) to T∗, showing the other crossing edge e′]
- This graph must have some cycle C, and C must have some edge e′ ≠ eik connecting some vertex in Tik−1 to some vertex not in Tik−1
- If e′ had less cost than eik, Prim’s algorithm would have chosen it; thus c(e′) ≥ c(eik)
- By deleting e′ from T∗ ∪ {eik} we produce a tree T′ whose weight is no larger than T∗’s
- Since T∗ is assumed to be optimal, c(T′) = c(T∗). But then T′ is an optimal tree with a larger associated ik value than T∗, contradicting the choice of T∗. Therefore T is optimal.
Prim’s Algorithm: Efficiency
- We should be able to do operations like “add e∗ to ET ” and “add u∗ to
VT ” efficiently (in constant time?)
- But how about “find a min-cost edge (u∗, v∗) with v∗ ∈ VT and u∗ ∉ VT”?
- There are various data structures we could use to solve this
- E.g., we can use a priority queue where each element in the queue is an edge
  – the book’s idea uses a priority queue, but for vertices
  – when a new vertex u∗ is added to the tree, we add all edges from u∗ to non-tree vertices to the priority queue
- GEQ #1: is this a valid solution?
- GEQ #2: if so, how efficient is this? Is it better or worse than the
description in the book?
- GEQ #3: if better, does every edge removed from the priority queue
end up in the tree?
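One possible Python sketch of the edge-based priority-queue idea follows (an illustration under assumed names, not the book’s version; `adj` maps each vertex to a list of (weight, neighbour) pairs). Note how the stale-entry check bears on GEQ #3 for this variant:

    import heapq

    def lazy_prim(adj):
        """Prim's MST with a priority queue of edges; roughly O(|E| log |E|)."""
        start = next(iter(adj))
        in_tree = {start}
        heap = [(w, start, u) for w, u in adj[start]]
        heapq.heapify(heap)
        mst = []
        while heap and len(in_tree) < len(adj):
            w, v, u = heapq.heappop(heap)
            if u in in_tree:      # stale entry: both endpoints already in tree,
                continue          # so not every popped edge joins the tree
            in_tree.add(u)
            mst.append((v, u, w))
            for wu, x in adj[u]:
                if x not in in_tree:
                    heapq.heappush(heap, (wu, u, x))
        return mst

    adj = {"a": [(3, "b"), (5, "c")], "b": [(3, "a"), (1, "c"), (4, "d")],
           "c": [(5, "a"), (1, "b"), (6, "d")], "d": [(4, "b"), (6, "c")]}
    print(lazy_prim(adj))   # [('a', 'b', 3), ('b', 'c', 1), ('b', 'd', 4)]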
Kruskal’s MST Algorithm
- Idea:
  – grow the tree one edge at a time to produce an MST through a series of forests F1, F2, . . . , Fn−1
  – on each iteration, add the next edge on the sorted list unless this would create a cycle; if it would, skip the edge
// Kruskal’s algorithm to find an MST
// Input: a weighted, connected graph G = (V, E)
// Returns: ET, the set of edges of an MST of G
Kruskal(G)
    sort E in nondecreasing order of weights: ei1, ei2, ei3, . . .
    ET ← Ø; edge_count ← 0; k ← 0
    while edge_count < |V| - 1
        k++
        if ET ∪ {eik} is acyclic
            ET ← ET ∪ {eik}; edge_count++
    return ET
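A runnable Python sketch of this algorithm follows; the cycle test keeps a component id per vertex and relabels one component on each merge (an idea examined on the following slides; all names here are assumptions for illustration):

    def kruskal(vertices, edges):
        """Kruskal's MST.  `edges` is a list of (weight, u, v) triples."""
        comp = {v: i for i, v in enumerate(vertices)}  # component id per vertex
        mst = []
        for w, u, v in sorted(edges):                  # nondecreasing weight
            if comp[u] != comp[v]:                     # adding (u, v) is acyclic
                mst.append((u, v, w))
                old, new = comp[v], comp[u]
                for x in comp:                         # merge: relabel v's component
                    if comp[x] == old:
                        comp[x] = new
                if len(mst) == len(vertices) - 1:
                    break
        return mst

    edges = [(3, "a", "b"), (1, "b", "c"), (5, "a", "c"), (6, "c", "d"), (4, "b", "d")]
    print(kruskal(["a", "b", "c", "d"], edges))  # [('b','c',1), ('a','b',3), ('b','d',4)]

The relabelling loop costs Θ(|V|) per accepted edge; the union-find structure developed below does substantially better.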
Kruskal Example
Kruskal’s Algorithm Implementation Considerations
- In some respects, this algorithm appears simpler than Prim’s algorithm
- However, answering the question “is ET ∪ {eik} acyclic?” is easier said than done
- Idea 1: if eik = (u, v), search in the current graph to see if v is reachable from u
  – can we do better?
- Idea 2: can we come up with an efficient way of answering the question “is u in the same tree as v?”
  – Q: what if each tree in the forest had a unique id, and we could find the id of u’s tree (and v’s tree) efficiently?
  – Q′: when an edge is added, two trees are merged: how can we efficiently update one tree’s id?
The Union-Find Algorithm
- Suppose we have some algorithm which requires the following abstract operations:
  – makeset(v) — create a set with the element v
  – find(v) — return (a unique identifier for) the set containing v
  – union(u, v) — move all elements of the set containing v to the set containing u
- These operations allow us to implement Kruskal’s algorithm:
  – first call makeset(v) for each vertex; this creates a forest of |V| trees, each with just one vertex
  – to answer the question “is ET ∪ {eik} acyclic?” where eik = (u, v), merely compare find(u) to find(v)
  – to implement “ET ← ET ∪ {eik}” where eik = (u, v), merely call union(u, v)
The Union-Find Algorithm: 2
- What data structure(s) can we pick to allow us to efficiently implement
these operations?
- A1: use a linked list for each set
  – union(): constant time
  – find(): ummm. . . ahhh. . . O(|V|) time
- A2: store the set number for each vertex in a vector
  – find(): constant time!
  – union(): ummm. . . ahhh. . . Θ(|V|) time
- A3: use both a linked list and a vector
  – find(): constant time!
  – union(): ummm. . . ahhh. . . O(|V|) time worst case (why?)
- None of these is good enough!
The Union-Find Algorithm: 3
- Idea: represent each set as a tree, with (directed) edges pointing
towards the root:
- makeset(): still constant time
- union(): constant time (assuming we have the roots of the two trees already, which would be the case for Kruskal’s algorithm)
  – Why? GEQ?
- find(): still a problem, since the paths from a vertex to a root can be “long”
The Union-Find Algorithm: 4
- Path compression: every time a find(x) operation is done, make all of the nodes on the path from x up to the immediate child of the root into children of the root
- With this path compression operation, a sequence of n union()s and m find()s is only very very very very slightly worse than linear in m + n
  – the analysis is fairly difficult. . . multiple papers were presented showing (incorrectly) that the overall time is linear
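A compact Python sketch of the tree representation with path compression follows; the union-by-size heuristic included here is a standard companion that the slides do not discuss, and all names are assumptions for illustration:

    class UnionFind:
        """Disjoint sets as trees with parent pointers."""
        def __init__(self, elements):
            self.parent = {v: v for v in elements}  # makeset: each element is a root
            self.size = {v: 1 for v in elements}

        def find(self, v):
            root = v
            while self.parent[root] != root:        # walk up to the root
                root = self.parent[root]
            while self.parent[v] != root:           # path compression pass
                self.parent[v], v = root, self.parent[v]
            return root

        def union(self, u, v):
            ru, rv = self.find(u), self.find(v)
            if ru == rv:
                return False                        # u, v already in the same set
            if self.size[ru] < self.size[rv]:
                ru, rv = rv, ru
            self.parent[rv] = ru                    # hang smaller tree off larger root
            self.size[ru] += self.size[rv]
            return True

In Kruskal’s loop, the acyclicity test and the merge then collapse into a single union(u, v) call, which returns False exactly when the edge would close a cycle.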
Single-Source Shortest Path
- Problem: find the shortest path from a given vertex s to all other vertices
  – here we assume all edge weights are non-negative
  – Dijkstra’s algorithm solves the problem only when we have this restriction
- Dijkstra’s algorithm works by first finding the path to the closest vertex to s, then to the next closest vertex, and so on
  – at any step the vertices adjacent to Ti but not in Ti are considered for inclusion
  – the vertex with minimum path length to s is added to Ti to produce Ti+1
  – the candidate vertices adjacent to Ti+1 and their total distance from s are then updated
Dijkstra’s Algorithm
- Dijkstra’s algorithm is similar to Prim’s MST algorithm, but with a different way of computing numerical labels: among vertices not already in the tree, it finds a vertex u with the smallest sum
      dv + w(v, u)
  where
  – v is a vertex for which a shortest path has already been found on preceding iterations (such vertices form a tree), and
  – dv is the length of a shortest path from the source s to v
- Algorithm: see text
  – each vertex carries two labels:
    – the length of the shortest path from s, and
    – the previous vertex on the path from s
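Since the slide defers to the text for the algorithm itself, here is one common min-heap rendering in Python (a sketch under assumed names, not the book’s pseudocode; `adj` maps each vertex to a list of (weight, neighbour) pairs):

    import heapq

    def dijkstra(adj, s):
        """Single-source shortest paths for non-negative edge weights.
        Returns dist (shortest-path length from s) and prev (previous
        vertex on that path): exactly the two labels described above."""
        dist, prev = {s: 0}, {s: None}
        heap = [(0, s)]
        done = set()
        while heap:
            d, v = heapq.heappop(heap)
            if v in done:
                continue                  # stale queue entry
            done.add(v)
            for w, u in adj[v]:
                if u not in dist or d + w < dist[u]:   # relax edge (v, u)
                    dist[u], prev[u] = d + w, v
                    heapq.heappush(heap, (d + w, u))
        return dist, prev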
Dijkstra’s Algorithm: Example
Dijkstra’s Algorithm: Comments
- As mentioned previously, Dijkstra’s algorithm does not work for graphs
with negative edge weights
- Dijkstra’s algorithm is applicable to both undirected and directed graphs
- Efficiency:
  – O(|E| log |V|) for graphs represented by adjacency lists and a min-heap implementation of the priority queue
  – GEQ: prove this
  – for sparse graphs the adjacency list representation is preferable
Information Theory
- The information content of a symbol is equivalent to the amount of “surprise” one experiences upon receiving it
  – e.g., suppose the symbols Thursda have been received:
  – there is little surprise if the next symbol is “y”
  – there is much surprise if the next symbol is “q”
- Claim: more information is transmitted by an “unlikely” symbol than by a “likely” one
  – example above: after Thursda has been sent, we could transmit “0” (the 0 bit) for y, or “1e(q)” for q, where 1 is the 1 bit and “e(q)” is the encoding of q
- Idea: a likely symbol can be transmitted with fewer bits than an unlikely
symbol
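This intuition has a standard quantification (a textbook fact, not stated on the slide): the information content of a symbol si received with probability pi is

    I(si) = log2(1/pi) = −log2 pi bits

so a symbol with probability 1/2 carries 1 bit of information while one with probability 1/8 carries 3 bits, matching the code lengths in the Huffman example later in this chapter.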
Data Compression
- Fewer bits are needed for likely (i.e., probable) symbols
- “Expectation” is based upon the knowledge of sender and receiver
- Compression (using this idea) requires:
  – a non-uniform probability distribution of the next symbol to be transmitted
- If, given “all” knowledge,
      P(si) = P(sj), ∀ i, j ∈ {1 . . . n}
  then, on average, at least log2(n) bits are required to send the next symbol
Information Theory (continued)
- Example: walk up to someone† and ask them to complete this sentence: “Peter Piper picked a peck of pickled ”. They will look you straight in the eye and say: “0”
† A native English speaker, anyway
Coding
- Given some set of symbols S = {s1, s2, . . . , sn}, a coding is an
assignment of bit strings to symbols
- The codes can be either
  – fixed-length, or
  – variable-length
- One desirable property of a variable-length code is that no code is the prefix of another code
- Suppose the symbols have different occurrence probabilities; one coding of the symbols might produce a shorter encoding of a message than another coding
- Problem: If frequencies of the character occurrences are known, what is
the best binary prefix-free code?
Huffman Coding: 1
- Concept: replace each (fixed-length) encoding e(si) with a
variable-length bit string bi, transmit bi instead of e(si)
- Assume each symbol si has a certain probability pi of being transmitted
- Example:
  – S = {a, b, c, d}, pa = 1/2, pb = 1/4, pc = 1/8, pd = 1/8
  – ba = 0, bb = 10, bc = 110, bd = 111
  – a fixed-length code needs 2m bits to send a message of m symbols
  – Huffman coding requires (on average)
      Σi p(si) × |bi| × m = (1 × 1/2 + 2 × 1/4 + 3 × 1/8 + 3 × 1/8) m = 7m/4 bits,
    a 12.5% saving
- Note: more significant savings are possible on larger examples
Huffman Coding: 2
- We need to know probabilities; how?
  (a) …
  (b) examine the entire message before sending
      – then we must also transmit the frequency distribution to the receiver
  (c) …
      – we must also transmit the frequency distribution in this case
- Other related possibilities:
  – cope with non-stationarity by periodically re-computing the distribution
  – use 2nd-order probabilities pi|j: i.e., the probability of seeing si given that the previous symbol was sj
  – e.g., the probability that a “random” symbol in English text is “u” is fairly low, but if the previous symbol was “q” the probability is quite high
Construction of Huffman Codes
- Q: how can we find these codes?
- Huffman’s algorithm:
  – start with n one-node trees, one per symbol, each weighted with its symbol’s probability
  – repeat the following step n − 1 times:
    – find two trees T¹ᵢ and T²ᵢ with smallest weights (break ties arbitrarily)
    – join T¹ᵢ and T²ᵢ into one (as left and right subtrees) and make its weight equal the sum of the weights of T¹ᵢ and T²ᵢ
  – mark edges leading to left and right subtrees with 0’s and 1’s, respectively
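A short Python sketch of this construction follows (an illustration under assumed names, not the book’s code); ties are broken arbitrarily, so the exact bit strings may differ from the example on the next slide even though the average code length will not:

    import heapq
    from itertools import count

    def huffman(probabilities):
        """Build a Huffman code; maps each symbol to its bit string."""
        tiebreak = count()   # heapq needs a total order; dicts are not comparable
        heap = [(p, next(tiebreak), {s: ""}) for s, p in probabilities.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            p1, _, left = heapq.heappop(heap)      # two lightest trees
            p2, _, right = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in left.items()}          # 0 = left edge
            merged.update({s: "1" + c for s, c in right.items()})   # 1 = right edge
            heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
        return heap[0][2]

    print(huffman({"A": 0.35, "B": 0.10, "C": 0.20, "D": 0.20, " ": 0.15}))

Here each “tree” is represented only by the partial codes of its leaves; merging two trees prepends 0 to every code in the left subtree and 1 to every code in the right.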
Huffman Example
- There are 5 symbols, ’A’, ’B’, ’C’, ’D’
and ’ ’
- They have the probabilities 0.35, 0.1,
0.2, 0.2 and 0.15, respectively
- The lightest-weight trees at the first step are ’B’ and ’ ’, so they are joined first
- The Huffman codes are 11, 100, 00, 01
and 101, respectively
Huffman Results
- Huffman codes are guaranteed to be the best possible (that is, no other code mapping one input symbol to one code has better performance on average)
- There are many variations on the Huffman theme
  – probabilities may not be known in advance at all; “adaptive Huffman” encoding modifies the probability estimates as each symbol is encoded
  – the decoder duplicates this process
  – the Huffman tree must be updated as the probability estimates change, which means the codes change over the course of encoding a message
  – pairs or triples of symbols can be encoded at once