CS 466: Introduction to Bioinformatics
Lecture 2, Part 1
Mohammed El-Kebir
January 28, 2020
Outline
- 1. Change problem
- 2. Review of running time analysis
- 3. Edit distance
- 4. Review elementary graph theory
- 5. Manhattan Tourist problem
- 6. Longest/shortest paths in DAGs
Reading:
- Jones and Pevzner. Chapters 2.7-2.9 and 6.1-6.4
- Lecture notes
The Change Problem
- Suppose we have $n = 3$ coins: $c = (c_1, c_2, c_3) = (7, 3, 1)$ cents.
- What is the minimum number of coins needed to make change for $M = 9$ cents?
- Answer: $(d_1, \ldots, d_n) = (1, 0, 2)$, thus $1 + 0 + 2 = 3$ coins.
Change Problem: Given amount $M \in \mathbb{N} \setminus \{0\}$ and coins $c = (c_1, \ldots, c_n) \in \mathbb{N}^n$ s.t. $c_n = 1$ and $c_i \geq c_{i+1}$ for all $i \in [n-1] = \{1, \ldots, n-1\}$, find $d = (d_1, \ldots, d_n) \in \mathbb{N}^n$ s.t. (i) $M = \sum_{i=1}^{n} c_i d_i$ and (ii) $\sum_{i=1}^{n} d_i$ is minimum.
The Change Problem: Four Algorithms

GreedyChange(M, c_1, …, c_n)
  1. for i ← 1 to n
  2.   d_i ← ⌊M / c_i⌋
  3.   M ← M − c_i d_i

ExhaustiveChange(M, c_1, …, c_n)
  1. for (d_1, …, d_n) ∈ [⌊M / c_1⌋] × … × [⌊M / c_n⌋]
  2.   if Σ_{i=1}^{n} c_i d_i = M
  3.     return (d_1, …, d_n)

RecursiveChange(M, c_1, …, c_n)
  1. if M = 0
  2.   return 0
  3. bestNumCoins ← ∞
  4. for i ← 1 to n
  5.   if M ≥ c_i
  6.     numCoins ← RecursiveChange(M − c_i, c_1, …, c_n)
  7.     if numCoins + 1 < bestNumCoins
  8.       bestNumCoins ← numCoins + 1
  9. return bestNumCoins

DPChange(M, c_1, …, c_n)
  1. for m ← 1 to M
  2.   minNumCoins[m] ← ∞
  3. for i ← 1 to n
  4.   minNumCoins[c_i] ← 1
  5. for m ← 1 to M
  6.   for i ← 1 to n
  7.     if m > c_i
  8.       minNumCoins[m] ← min(1 + minNumCoins[m − c_i], minNumCoins[m])
  9. return minNumCoins[M]
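As a rough illustration of why the greedy strategy can fail while dynamic programming does not, here is a minimal Python sketch of GreedyChange and DPChange. It departs slightly from the pseudocode: the DP uses a base case `min_coins[0] = 0` with the test `m >= c` instead of initializing `minNumCoins[c_i]` separately. The coin set (4, 3, 1) is a hypothetical example chosen for this sketch and does not come from the lecture.

```python
import math

def greedy_change(M, coins):
    """GreedyChange: take as many of each coin as possible, largest coin first."""
    counts = []
    for c in coins:               # coins are assumed sorted in decreasing order, ending with 1
        counts.append(M // c)     # d_i <- floor(M / c_i)
        M -= c * counts[-1]       # M <- M - c_i * d_i
    return counts

def dp_change(M, coins):
    """DPChange: min_coins[m] is the fewest coins that make amount m."""
    min_coins = [0] + [math.inf] * M
    for m in range(1, M + 1):
        for c in coins:
            if m >= c:
                min_coins[m] = min(min_coins[m], 1 + min_coins[m - c])
    return min_coins[M]

print(greedy_change(9, [7, 3, 1]))        # [1, 0, 2], the answer from the example
print(sum(greedy_change(6, [4, 3, 1])))   # 3 coins: 4 + 1 + 1
print(dp_change(6, [4, 3, 1]))            # 2 coins: 3 + 3
```

For the hypothetical coins (4, 3, 1) and amount 6, the greedy choice uses three coins while the table-based method finds two.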
Four Different Algorithms
Technique                                   Correct?   Efficient?
Greedy algorithm [GreedyChange]             no         yes
Exhaustive enumeration [ExhaustiveChange]   yes        no
Recursive algorithm [RecursiveChange]       yes        no
Dynamic programming [DPChange]              yes        yes
Question: How to assess efficiency?
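One informal way to see why the recursive algorithm is marked as not efficient is to count how many calls it makes. The sketch below is an illustration only: the function name is made up here, and it counts invocations rather than returning the number of coins.

```python
def recursive_change_calls(M, coins):
    """Count the calls RecursiveChange would make for amount M."""
    calls = 1                      # this invocation
    for c in coins:
        if M >= c:                 # same test as line 5 of RecursiveChange
            calls += recursive_change_calls(M - c, coins)
    return calls

for M in (5, 10, 20):
    print(M, recursive_change_calls(M, [5, 3, 1]))
```

The same subproblems are solved again and again, so the call count grows much faster than the roughly $M \cdot n$ table updates of the dynamic programming version.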
Running Time Analysis
- The running time of an algorithm $A$ for problem $\Pi$ is the maximum number of steps that $A$ will take on any instance $X$ of size $n = |X|$
- Asymptotic running time ignores constant factors using Big O notation
$f(n)$ is $O(g(n))$ provided there exist $c > 0$ and $n_0 \geq 0$ such that $f(n) \leq c \, g(n)$ for all $n \geq n_0$
[Figure: plot of $f(n)$ and $g(n)$.]
Running Time Analysis: Example
$f(n) = 10000 + 500 n^3$, $g(n) = n^4 / 2$
Pick $c = 1000$ and $n_0 = 3$. Then $f(n) \leq c \, g(n)$ for all $n \geq n_0$, so $f(n)$ is $O(g(n))$.
[Figure: plot of $f(n)$ and $1000 \, g(n)$.]
The Change Problem: Running Time Analysis
Line numbers refer to the pseudocode of GreedyChange and DPChange.
GreedyChange, number of operations:
- Line 2: 3 operations = $O(1)$
- Line 3: 3 operations = $O(1)$
- Total: $6n = O(n)$
DPChange, number of operations:
- Lines 1-2: $O(M)$
- Lines 3-4: $O(n)$
- Lines 5-8: $O(Mn)$
- Total: $O(M) + O(n) + O(Mn) = O(Mn)$
Running Time Analysis: Guidelines
- $O(n^a) \subset O(n^b)$ for any positive constants $a < b$
- For any constants $a, b > 0$ and $c > 1$,
  $O(a) \subset O(\log n) \subset O(n^b) \subset O(c^n)$
- We can multiply to learn about other functions. For any constants $a, b > 0$ and $c > 1$,
  $O(an) = O(n) \subset O(n \log n) \subset O(n \cdot n^b) = O(n^{b+1}) \subset O(n c^n)$
- Base of the logarithm is a constant and can be ignored. For any constants $a, b > 1$,
  $O(\log_a n) = O(\log_b n / \log_b a) = O((1 / \log_b a) \log_b n) = O(\log_b n)$

Big Oh                             Name
$O(1)$                             Constant
$O(\log n)$                        Logarithmic
$O(n)$                             Linear
$O(n^2)$                           Quadratic
$O(n^k) = O(\mathrm{poly}(n))$     Polynomial
$O(2^{\mathrm{poly}(n)})$          Exponential
Running Time Analysis: More Examples
Question: What is $O\left(\binom{n}{k}\right)$?
- For constant $k > 0$ it holds that $\binom{n}{k} = O(n^k)$
Question: What is $O(n!)$?
- Recall that $n! = \prod_{i=1}^{n} i$
- Stirling's approximation: $n! \approx \sqrt{2 \pi n} \left(\frac{n}{e}\right)^n = \frac{\sqrt{2 \pi n}}{\exp(n)} \, n^n = O(n^n) = O(2^{n \log n})$, where the step to $O(n^n)$ uses (*)
- (*): $\sqrt{n} / \exp(n) < 1$ for all $n > 0$
Question: Is $n^n = O(n!)$?
Question: What is $O(\log(n!))$?
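To get a numerical feel for the growth of $\log(n!)$, the following sketch compares $\log_2(n!)$ with $n \log_2 n$. It assumes Python's standard `math.lgamma`, which returns $\ln \Gamma(x)$ and therefore $\ln(n!)$ at $x = n + 1$; the helper name is made up for this illustration.

```python
import math

def log2_factorial(n):
    """log2(n!) computed via lgamma, avoiding huge integers."""
    return math.lgamma(n + 1) / math.log(2)

for n in (10, 100, 1000, 10**6):
    ratio = log2_factorial(n) / (n * math.log2(n))
    print(n, round(ratio, 3))
```

The ratio stays below 1 and creeps upward as $n$ grows, which is consistent with the bound $n! = O(n^n)$ from Stirling's approximation.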
Course Topic #1: Sequence Alignment
Question: How do we align sequences to identify similarities/differences?
"Thus, although the FOXP2 protein is extremely conserved among mammals, it acquired two amino-acid changes on the human lineage, at least one of which may have functional consequences. This is an intriguing finding, because FOXP2 is the first gene known to be involved in the development of speech and language."
Nature (2002)
Alignment
An alignment between two strings v (of m characters) and w (of n characters) is a two-row matrix where the first row contains the characters of v in order, the second row contains the characters of w in order, and spaces may be interspersed throughout each.
Input:
v: KITTEN (m = 6)
w: SITTING (n = 7)
Output:
v: K - I T T E N -
w: S I - T T I N G
Question: Is this a good alignment?
Answer: Count the number of insertions, deletions and substitutions.
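The counting suggested in the answer can be sketched in a few lines of Python. This is an illustration under two assumptions: the alignment on the slide is read as the rows `K-ITTEN-` and `SI-TTING`, and a gap in the first row is called an insertion while a gap in the second row is called a deletion.

```python
def count_edits(row_v, row_w, gap="-"):
    """Classify each column of a two-row alignment."""
    assert len(row_v) == len(row_w)
    counts = {"match": 0, "substitution": 0, "insertion": 0, "deletion": 0}
    for a, b in zip(row_v, row_w):
        if a == gap:
            counts["insertion"] += 1      # character of w with no partner in v
        elif b == gap:
            counts["deletion"] += 1       # character of v with no partner in w
        elif a == b:
            counts["match"] += 1
        else:
            counts["substitution"] += 1
    return counts

print(count_edits("K-ITTEN-", "SI-TTING"))   # 2 substitutions, 2 insertions, 1 deletion
print(count_edits("KITTEN-", "SITTING"))     # 2 substitutions, 1 insertion
```

Under this reading the alignment shown uses five edits, whereas the second alignment needs only three, so the first one is not the best possible.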
Edit Distance [Levenshtein, 1966]
Edit Distance problem: Given strings $v \in \Sigma^m$ and $w \in \Sigma^n$, compute the minimum number $d(v, w)$ of elementary operations to transform $v$ into $w$.
Elementary operations: insertions, deletions and substitutions of single characters.
$d(\text{CAT}, \text{CAR}) = 1$
$d(\text{CAT}, \text{ATE}) = 2$
$d(\text{CAT}, \text{ARE}) = 3$
Computing Edit Distance
v: ATGTTAT...
w: AGCGTAC...
Optimal substructure: The edit distance is obtained from the edit distance of prefixes of the strings.
[Figure: partial alignment of $v_i$ (prefix of $v$ of length $i$) and $w_j$ (prefix of $w$ of length $j$), extended by a match, mismatch, insertion or deletion.]
Computing Edit Distance: Optimal Substructure
$d[i, j]$ is the edit distance of $v_i$ and $w_j$, where $v_i$ is the prefix of $v$ of length $i$ and $w_j$ is the prefix of $w$ of length $j$.
- Deletion: $d[i, j] = d[i-1, j] + 1$. Extend by a character in $v$.
- Insertion: $d[i, j] = d[i, j-1] + 1$. Extend by a character in $w$.
- Mismatch: $d[i, j] = d[i-1, j-1] + 1$. Extend by a character in $v$ and $w$.
- Match: $d[i, j] = d[i-1, j-1]$. Extend by a character in $v$ and $w$.
Computing Edit Distance: Recurrence
$d[i, j]$ is the edit distance of $v_i$ and $w_j$, where $v_i$ is the prefix of $v$ of length $i$ and $w_j$ is the prefix of $w$ of length $j$.
With the base case and boundary conditions made explicit:
$$
d[i, j] = \min \begin{cases}
0, & \text{if } i = 0 \text{ and } j = 0, \\
d[i-1, j] + 1, & \text{if } i > 0, \\
d[i, j-1] + 1, & \text{if } j > 0, \\
d[i-1, j-1] + 1, & \text{if } i > 0,\ j > 0 \text{ and } v_i \neq w_j, \\
d[i-1, j-1], & \text{if } i > 0,\ j > 0 \text{ and } v_i = w_j.
\end{cases}
$$
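The recurrence translates fairly directly into a table-filling routine. The following is a minimal sketch rather than a reference implementation; the function name is chosen here, and Python strings are 0-indexed, so $v_i$ corresponds to `v[i - 1]`.

```python
def edit_distance(v, w):
    """Fill the (m+1) x (n+1) table d row by row and return d[m][n]."""
    m, n = len(v), len(w)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue                                  # base case d[0][0] = 0
            options = []
            if i > 0:
                options.append(d[i - 1][j] + 1)           # deletion
            if j > 0:
                options.append(d[i][j - 1] + 1)           # insertion
            if i > 0 and j > 0:
                cost = 0 if v[i - 1] == w[j - 1] else 1   # match or mismatch
                options.append(d[i - 1][j - 1] + cost)
            d[i][j] = min(options)
    return d[m][n]

print(edit_distance("KITTEN", "SITTING"))   # 3
```

Each entry only looks at entries above it and to its left, so filling the table row by row guarantees that every value it needs is already available.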
Computing Edit Distance: Dynamic Programming
Example: $v = \text{ATGT}$ (rows, index $i$) and $w = \text{ATCG}$ (columns, index $j$). Each entry $d[i, j]$ is computed from three neighbors: $d[i-1, j]$ (deletion, cost 1), $d[i, j-1]$ (insertion, cost 1) and $d[i-1, j-1]$ (match, cost 0, or mismatch, cost 1). The table is filled one entry at a time:

        j    0  1  2  3  4
   i    w    -  A  T  C  G
   0    -    0  1  2  3  4
   1    A    1  0  1  2  3
   2    T    2  1  0  1  2
   3    G    3  2  1  1  1
   4    T    4  3  2  2  2

The edit distance is $d[4, 4] = 2$. Two optimal alignments:
v: A T - G T        v: A T G T
w: A T C G -        w: A T C G
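An alignment can be recovered from the filled table by walking back from $d[m, n]$ to $d[0, 0]$ and, at each step, following a neighbor that explains the current value. The sketch below is one way to do this; the tie-breaking order (diagonal first, then deletion, then insertion) is an arbitrary choice made here, so it returns just one of possibly several optimal alignments.

```python
def align(v, w):
    """Return (distance, row_v, row_w) for one optimal alignment of v and w."""
    m, n = len(v), len(w)
    # First row and column: d[0][j] = j and d[i][0] = i.
    d = [[i + j if i == 0 or j == 0 else 0 for j in range(n + 1)]
         for i in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = d[i - 1][j - 1] + (v[i - 1] != w[j - 1])
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, diag)
    row_v, row_w = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (v[i - 1] != w[j - 1]):
            row_v.append(v[i - 1])        # match or mismatch
            row_w.append(w[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            row_v.append(v[i - 1])        # deletion
            row_w.append("-")
            i -= 1
        else:
            row_v.append("-")             # insertion
            row_w.append(w[j - 1])
            j -= 1
    return d[m][n], "".join(reversed(row_v)), "".join(reversed(row_w))

print(align("ATGT", "ATCG"))   # (2, 'ATGT', 'ATCG')
```

With this tie-breaking the example yields the gap-free alignment with two mismatches; preferring insertions and deletions instead would produce the alignment with gaps, at the same cost.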
Computing Edit Distance: Running Time
For each of the $(m + 1) \times (n + 1)$ entries:
- 3 addition operations
- 1 comparison operation
- 1 minimum operation
Running time: $O(mn)$
Computing Edit Distance: Your turn!
Fill in the table $d$ for $v = \text{CAT}$ and each of $w = \text{CAR}$, $w = \text{ATE}$ and $w = \text{ARE}$, using the recurrence for $d[i, j]$.
d(CAT, CAR) = ?
d(CAT, ATE) = ?
d(CAT, ARE) = ?
Change Problem and Edit Distance
Make M cents using minimum number of 1, 3 and 5 cent coins.
Value         1  2  3  4  5  6  7
Min # coins   1  2  1  2  1  2  3
[Figure: edit distance table for v = ATGT and w = ATCG.]
- Both have optimal substructure and can be solved using dynamic programming
- These are examples of a more general problem!
Review of Graph Theory
- Graph $G = (V, E)$
- Vertices $V = \{v_1, \ldots, v_n\}$
- Edges $E = \{(v_i, v_j), \ldots\}$
- Directed graph $G = (V, E)$: the edges $E = \{(v_i, v_j), \ldots\}$ are directed
- A path is a sequence of vertices and the edges that connect them
- Edges can be weighted
[Figure: example graph with vertices Champaign-Urbana, Chicago, Indianapolis, St. Louis and Bloomington; edge weights 130, 50, 140, 170, 150 and 180.]
Manhattan Tourist Problem
A tourist in Manhattan wants to visit the maximum number of attractions (*) by traveling on a path (only eastward and southward) from start to end.
[Figure: street grid from Begin to End with attractions (*) marked on streets.]
There may be more than one attraction on a street. Add weights!
[Figure: the same grid with each street weighted by its number of attractions.]
Manhattan Tourist Problem (MTP): Given a weighted, directed grid graph G with two vertices "begin" and "end", find the maximum weight path in G from "begin" to "end".
[Figure: weighted grid graph with an i coordinate (rows) and a j coordinate (columns).]
Manhattan Tourist Problem: Exhaustive Algorithm
Check all paths.
Question: How many paths?
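One way to approach the question: every path from begin to end makes the same number of southward and eastward moves, only in a different order, so counting paths amounts to choosing which moves go south. The sketch below checks this closed form against a direct count; the function names and the grid sizes are assumptions chosen for illustration.

```python
import math

def count_grid_paths(rows, cols):
    """Paths across a grid of rows x cols blocks, moving only south or east."""
    return math.comb(rows + cols, rows)

def count_grid_paths_dp(rows, cols):
    """Same count, obtained by summing the paths into each intersection."""
    paths = [[1] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            paths[i][j] = paths[i - 1][j] + paths[i][j - 1]
    return paths[rows][cols]

print(count_grid_paths(4, 4))      # 70
print(count_grid_paths(20, 20))    # already more than 10^11
```

The count grows exponentially with the grid size, which is why checking all paths is not a practical algorithm.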
Manhattan Tourist Problem: Greedy Algorithm
[Figure: weighted grid from begin to end. The greedy path has a promising start, but leads to bad choices (score 18); a better path scores 22.]
Manhattan Tourist Problem: Optimal Substructure
[Figure: the same grid annotated with the best score to two intermediate points (18 and 20) and the best score to end (22).]
Manhattan Tourist Problem: Optimal Substructure
$s[i, j]$ is the best score for a path to coordinate $(i, j)$
- $w[(i-1, j), (i, j)]$: weight of street between $(i-1, j)$ and $(i, j)$
- $w[(i, j-1), (i, j)]$: weight of street between $(i, j-1)$ and $(i, j)$
Question: What is the recurrence?
$$
s[i, j] = \max \begin{cases}
0, & \text{if } i = 0 \text{ and } j = 0, \\
s[i-1, j] + w[(i-1, j), (i, j)], & \text{if } i > 0, \\
s[i, j-1] + w[(i, j-1), (i, j)], & \text{if } j > 0.
\end{cases}
$$
MTP: Solving Recurrence using Dynamic Programming
$s[i, j]$ is the best score for a path to coordinate $(i, j)$
- $w[(i-1, j), (i, j)]$: weight of street between $(i-1, j)$ and $(i, j)$
- $w[(i, j-1), (i, j)]$: weight of street between $(i, j-1)$ and $(i, j)$
$$
s[i, j] = \max \begin{cases}
0, & \text{if } i = 0 \text{ and } j = 0, \\
s[i-1, j] + w[(i-1, j), (i, j)], & \text{if } i > 0, \\
s[i, j-1] + w[(i, j-1), (i, j)], & \text{if } j > 0.
\end{cases}
$$
[Figure: the example grid filled in step by step starting from the source; the final score is $s[3, 3] = 22$.]
Let $n$ be the number of rows and $m$ be the number of columns. Running time: $O(nm)$
Question: Implementation?
Manhattan Is Not a Perfect Grid
What about diagonals?
[Figure: vertex B with incoming edges from A1, A2 and A3.]
$$
s[B] = \max \begin{cases}
s[A_1] + w[A_1, B], \\
s[A_2] + w[A_2, B], \\
s[A_3] + w[A_3, B].
\end{cases}
$$
Manhattan Is Not a Perfect Grid, It's a Directed Graph
$G = (V, E)$ is a directed acyclic graph (DAG) with nonnegative edge weights $w : E \to \mathbb{R}_{\geq 0}$
$$
s[0, 0] = 0, \qquad s[i, j] = \max_{(i', j') \in \mathrm{pred}(i, j)} \{ s[i', j'] + w[(i', j'), (i, j)] \}
$$
where $\mathrm{pred}(i, j)$ is the set of vertices with an edge into $(i, j)$.
Each edge is evaluated once: $O(|E|)$ time
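The general recurrence can be evaluated on any DAG by processing vertices in topological order, so that all predecessors of a vertex are scored before the vertex itself. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later) for the ordering; the function name, the edge-dictionary format and the example weights are assumptions made for this illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def longest_path_scores(weights, source):
    """
    weights: dict mapping a directed edge (u, v) to its weight w[u, v].
    Returns s, where s[v] is the maximum weight of a path from source to v.
    """
    preds = {}
    for (u, v) in weights:
        preds.setdefault(u, set())
        preds.setdefault(v, set()).add(u)
    s = {source: 0}
    # static_order() yields every vertex after all of its predecessors.
    for v in TopologicalSorter(preds).static_order():
        candidates = [s[u] + weights[(u, v)] for u in preds[v] if u in s]
        if candidates:
            s[v] = max(candidates)
    return s

# Hypothetical DAG in the spirit of the figure: B has predecessors A1, A2 and A3.
weights = {("S", "A1"): 2, ("S", "A2"): 4, ("A1", "A3"): 1, ("A2", "A3"): 2,
           ("A1", "B"): 5, ("A2", "B"): 1, ("A3", "B"): 2}
print(longest_path_scores(weights, "S")["B"])   # 8
```

Every edge is looked at once when its head vertex is scored, in line with the $O(|E|)$ bound.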
Dynamic Programming as a Graph Problem
[Figure: Manhattan street grid from Begin to End with attractions (*).]
Manhattan Tourist Problem: Every path in the directed graph is a possible tourist path. Find a maximum weight path. Running time: $O(nm) = O(|E|)$
Change Problem: Make M cents using the minimum number of coins $c = (1, 3, 5)$. Every path in the directed graph is a possible change. Find a shortest path.
Running time: $O(Mn) = O(|E|)$
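To make the graph view of the Change Problem concrete, the sketch below treats each amount $0, 1, \ldots, M$ as a vertex and adds an edge from $m$ to $m + c$ for every coin $c$. Since every edge stands for one coin, all edges have weight 1 and breadth-first search finds shortest paths; BFS is a choice made for this sketch, not a method named in the lecture, and the function name is made up.

```python
from collections import deque

def change_graph_distances(M, coins):
    """BFS from vertex 0 in the change graph; returns distances to 1, ..., M."""
    dist = {0: 0}
    queue = deque([0])
    while queue:
        m = queue.popleft()
        for c in coins:                       # edge m -> m + c for each coin
            if m + c <= M and m + c not in dist:
                dist[m + c] = dist[m] + 1
                queue.append(m + c)
    return [dist[m] for m in range(1, M + 1)]

print(change_graph_distances(7, [1, 3, 5]))   # [1, 2, 1, 2, 1, 2, 3]
```

The distances agree with the minimum numbers of coins for the values 1 to 7 with coins 1, 3 and 5. The sketch relies on a 1-cent coin being present so that every amount is reachable.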
What About the Edit Distance Problem?
Edit Distance problem: Given strings $v \in \Sigma^m$ and $w \in \Sigma^n$, compute the minimum number $d(v, w)$ of elementary operations to transform $v$ into $w$.
Edit graph is a weighted, directed grid graph $G = (V, E)$ with source vertex $(0, 0)$ and target vertex $(m, n)$. Each edge $e \in E$ has a weight $c(e)$ corresponding to its edit cost: deletion (1), insertion (1), mismatch (1) and match (0).
[Figure: edit graph for v = ATGT (rows) and w = ATCG (columns); vertical edges are deletions, horizontal edges are insertions, diagonal edges are matches or mismatches.]
An alignment is a path from $(0, 0)$ to $(m, n)$.
Edit Distance problem (graph formulation): Given edit graph $G = (V, E)$ with edge weights $c : E \to \{0, 1\}$, find a shortest path from $(0, 0)$ to $(m, n)$.
Shortest Path vs Longest Path
- Change graph, edit graph and the MTP grid are directed graphs G.
- Change problem and Edit Distance problem are minimization problems.
- Find shortest path in G from source to sink.
- Manhattan Tourist problem is a maximization problem.
- Find longest path in G from source to sink.
- Shortest path in directed graphs can be found efficiently (Dijkstra, Bellman-Ford, Floyd-Warshall algorithms)
- Longest path in general directed graphs is NP-hard, so no efficient algorithm is known.
- Change graph, edit graph and MTP grid graph are directed acyclic graphs (DAGs).
- No directed cycles.
- Longest path problem in a DAG can be solved efficiently by dynamic programming
Question: What's the relation between absence of directed cycles and optimal substructure?
[Figure: example of a directed cycle.]
Weighted Edit Distance
$d[i, j]$ is the edit distance of $v_i$ and $w_j$, where $v_i$ is the prefix of $v$ of length $i$ and $w_j$ is the prefix of $w$ of length $j$.
$$
d[i, j] = \min \begin{cases}
d[i-1, j] + 1, \\
d[i, j-1] + 1, \\
d[i-1, j-1] + 1, & \text{if } v_i \neq w_j, \\
d[i-1, j-1], & \text{if } v_i = w_j.
\end{cases}
$$
Replace +1 with different penalties for different types of edits (deletion, insertion, mismatch).
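A minimal sketch of this idea, assuming one constant penalty per edit type: the parameter names `deletion`, `insertion` and `mismatch` are chosen here for illustration, and with all three set to 1 the function reduces to the ordinary edit distance.

```python
def weighted_edit_distance(v, w, deletion=1, insertion=1, mismatch=1):
    """Edit distance where each type of edit has its own penalty."""
    m, n = len(v), len(w)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + deletion        # delete the first i characters of v
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + insertion       # insert the first j characters of w
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if v[i - 1] == w[j - 1] else mismatch
            d[i][j] = min(d[i - 1][j] + deletion,
                          d[i][j - 1] + insertion,
                          d[i - 1][j - 1] + sub)
    return d[m][n]

print(weighted_edit_distance("ATGT", "ATCG"))               # 2
print(weighted_edit_distance("CAT", "CAR", mismatch=3))     # 2: delete T, insert R
```

Raising the mismatch penalty above the combined cost of a deletion and an insertion makes the optimal alignment avoid substitutions altogether, as in the second call.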
Summary
- 1. Change problem
- 2. Review of running time analysis
- 3. Edit distance
- 4. Review elementary graph theory
- 5. Manhattan Tourist problem
- 6. Longest/shortest paths in DAGs
Reading:
- Jones and Pevzner. Chapters 2.7-2.9 and 6.1-6.4
- Lecture notes
Sources
- CS 362 by Layla Oesper (Carleton College)
- CS 1810 by Ben Raphael (Brown/Princeton University)
- An Introduction to Bioinformatics Algorithms book (Jones and Pevzner)
- http://bioalgorithms.info/