
SLIDE 1

CS 466 Introduction to Bioinformatics

Lecture 2 Part 1

Mohammed El-Kebir January 28, 2020

SLIDE 2

Outline

1. Change problem
2. Review of running time analysis
3. Edit distance
4. Review elementary graph theory
5. Manhattan Tourist problem
6. Longest/shortest paths in DAGs

Reading:

Jones and Pevzner. Chapters 2.7-2.9 and 6.1-6.4
Lecture notes


SLIDE 3

The Change Problem

Suppose we have $n = 3$ coins (1 cent, 7 cent and 3 cent), ordered as $\mathbf{c} = (7, 3, 1)$.

What is the minimum number of coins needed to make change for $M = 9$ cents?

Answer: $(d_1, \ldots, d_n) = (1, 0, 2)$, thus $1 + 0 + 2 = 3$ coins.

Change Problem: Given amount $M \in \mathbb{N} \setminus \{0\}$ and coins $\mathbf{c} = (c_1, \ldots, c_n) \in \mathbb{N}^n$ s.t. $c_n = 1$ and $c_i \geq c_{i+1}$ for all $i \in [n - 1] = \{1, \ldots, n - 1\}$, find $\mathbf{d} = (d_1, \ldots, d_n) \in \mathbb{N}^n$ s.t. (i) $M = \sum_{i=1}^{n} c_i d_i$ and (ii) $\sum_{i=1}^{n} d_i$ is minimum.

SLIDE 4

The Change Problem – Four Algorithms

GreedyChange(M, c_1, …, c_n)
1. for i ← 1 to n
2.     d_i ← ⌊M / c_i⌋
3.     M ← M − d_i c_i

ExhaustiveChange(M, c_1, …, c_n)
1. for (d_1, …, d_n) ∈ [⌊M / c_1⌋] × … × [⌊M / c_n⌋]
2.     if Σ_{i=1}^{n} c_i d_i = M
3.         return (d_1, …, d_n)

RecursiveChange(M, c_1, …, c_n)
1. if M = 0
2.     return 0
3. bestNumCoins ← ∞
4. for i ← 1 to n
5.     if M ≥ c_i
6.         numCoins ← RecursiveChange(M − c_i, c_1, …, c_n)
7.         if numCoins + 1 < bestNumCoins
8.             bestNumCoins ← numCoins + 1
9. return bestNumCoins

DPChange(M, c_1, …, c_n)
1. for m ← 1 to M
2.     minNumCoins[m] ← ∞
3. for i ← 1 to n
4.     minNumCoins[c_i] ← 1
5. for m ← 1 to M
6.     for i ← 1 to n
7.         if m > c_i
8.             minNumCoins[m] ← min(1 + minNumCoins[m − c_i], minNumCoins[m])
9. return minNumCoins[M]
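The greedy and dynamic programming approaches can be sketched in Python as follows. This is a minimal illustration, not the lecture's reference code: the function names `greedy_change` and `dp_change` are hypothetical, and `dp_change` uses the base case `min_num_coins[0] = 0` instead of initializing `minNumCoins[c_i]` to 1, which avoids indexing past the table when a coin is larger than M. The denominations (4, 3, 1) are an assumed example, chosen only to show a case where the greedy choice is not optimal.

```python
import math


def greedy_change(M, coins):
    """Take as many of each coin as possible, largest coin first."""
    counts = []
    for c in coins:              # coins are assumed sorted in decreasing order
        d = M // c               # d_i = floor(M / c_i)
        counts.append(d)
        M -= d * c
    return counts


def dp_change(M, coins):
    """Minimum number of coins needed to make M cents."""
    min_num_coins = [0] + [math.inf] * M   # zero coins make zero cents
    for m in range(1, M + 1):
        for c in coins:
            if m >= c:
                min_num_coins[m] = min(min_num_coins[m],
                                       1 + min_num_coins[m - c])
    return min_num_coins[M]


print(greedy_change(9, [7, 3, 1]), dp_change(9, [7, 3, 1]))
print(sum(greedy_change(6, [4, 3, 1])), dp_change(6, [4, 3, 1]))
```

For M = 9 and coins (7, 3, 1) both approaches use three coins. For the assumed coins (4, 3, 1) and M = 6, the greedy choice 4 + 1 + 1 uses three coins while 3 + 3 uses two.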

SLIDE 5

Four Different Algorithms

Technique                                  Correct?  Efficient?
Greedy algorithm [GreedyChange]            no        yes
Exhaustive enumeration [ExhaustiveChange]  yes       no
Recursive algorithm [RecursiveChange]      yes       no
Dynamic programming [DPChange]             yes       yes

Question: How to assess efficiency?

SLIDE 6

Running Time Analysis

The running time of an algorithm $A$ for problem $\Pi$ is the maximum number of steps that $A$ will take on any instance $X$ of size $n = |X|$.
Asymptotic running time ignores constant factors using Big O notation.

[Plot: $f(n)$ and $g(n)$]

$f(n)$ is $O(g(n))$ provided there exist $c > 0$ and $n_0 \geq 0$ such that $f(n) \leq c\, g(n)$ for all $n \geq n_0$.

SLIDE 7

Running Time Analysis – Example

$f(n) = 10000 + 500 n^3$

$g(n) = n^4 / 2$

$f(n)$ is $O(g(n))$ provided there exist $c > 0$ and $n_0 \geq 0$ such that $f(n) \leq c\, g(n)$ for all $n \geq n_0$.

Pick $c = 1000$ and $n_0 = 3$. Then, $f(n) \leq c\, g(n)$ for all $n \geq n_0$.

[Plot: $f(n)$ and $1000\, g(n)$]
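The choice of constants can be spot-checked numerically. The sketch below assumes the functions on this slide are $f(n) = 10000 + 500 n^3$ and $g(n) = n^4 / 2$ (the exponents are reconstructed from a garbled extraction, so treat them as an assumption). A finite check is not a proof, but it shows that the inequality holds from $n_0 = 3$ onward over the tested range and fails just below it.

```python
def f(n):
    return 10000 + 500 * n**3   # assumed form of f


def g(n):
    return n**4 / 2             # assumed form of g


c, n0 = 1000, 3

# The bound f(n) <= c * g(n) should hold for every tested n >= n0 ...
holds = all(f(n) <= c * g(n) for n in range(n0, 10_000))
# ... and this n0 is tight for this c: the bound fails at n0 - 1.
fails_before = f(n0 - 1) > c * g(n0 - 1)

print(holds, fails_before)
```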

SLIDE 8

The Change Problem – Running Time Analysis

GreedyChange(M, c_1, …, c_n)
1. for i ← 1 to n
2.     d_i ← ⌊M / c_i⌋
3.     M ← M − d_i c_i

Number of operations:

Line 2: 3 = $O(1)$
Line 3: 3 = $O(1)$
Total: $6n = O(n)$

DPChange(M, c_1, …, c_n)
1. for m ← 1 to M
2.     minNumCoins[m] ← ∞
3. for i ← 1 to n
4.     minNumCoins[c_i] ← 1
5. for m ← 1 to M
6.     for i ← 1 to n
7.         if m > c_i
8.             minNumCoins[m] ← min(1 + minNumCoins[m − c_i], minNumCoins[m])
9. return minNumCoins[M]

Number of operations:

Lines 1-2: $O(M)$
Lines 3-4: $O(n)$
Lines 5-8: $O(Mn)$
Total: $O(M) + O(n) + O(Mn) = O(Mn)$

SLIDE 9

Running Time Analysis – Guidelines

$O(n^a) \subset O(n^b)$ for any positive constants $a < b$.

For any constants $a, b > 0$ and $c > 1$,

$$O(a) \subset O(\log n) \subset O(n^b) \subset O(c^n)$$

We can multiply to learn about other functions. For any constants $a, b > 0$ and $c > 1$,

$$O(an) = O(n) \subset O(n \log n) \subset O(n \cdot n^b) = O(n^{b+1}) \subset O(n c^n)$$

Base of the logarithm is a constant and can be ignored. For any constants $a, b > 1$,

$$O(\log_a n) = O(\log_b n / \log_b a) = O\big((1 / \log_b a) \log_b n\big) = O(\log_b n)$$

SLIDE 10

Running Time Analysis – Guidelines

Big Oh                        Name
$O(1)$                        Constant
$O(\log n)$                   Logarithmic
$O(n)$                        Linear
$O(n^2)$                      Quadratic
$O(n^c) = O(\text{poly}(n))$  Polynomial
$O(2^{\text{poly}(n)})$       Exponential


SLIDE 11

Running Time Analysis – More Examples

Question: What is $O\left(\binom{n}{k}\right)$?

SLIDE 14

Running Time Analysis – More Examples

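For constant $k > 0$ it holds that $\binom{n}{k} = O(n^k)$.

Recall that $n! = \prod_{i=1}^{n} i$.

Stirling's approximation:

$$n! \approx \sqrt{2\pi n} \left(\frac{n}{e}\right)^n = \sqrt{2\pi}\, \frac{\sqrt{n}}{\exp(n)}\, n^n \overset{(*)}{=} O(n^n) = O(2^{n \log n})$$

$(*)$: $\sqrt{n} / \exp(n) < 1$ for all $n > 0$

Question: What is $O(n!)$?
Question: What is $O\left(\binom{n}{k}\right)$?
Question: Is $n^n = O(n!)$?
Question: What is $O(\log(n!))$?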
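Stirling's approximation and the binomial bound can be checked numerically. The sketch below is illustrative only: the helper name `stirling` is hypothetical, and the chosen values of n and k are arbitrary.

```python
import math


def stirling(n):
    """Stirling's approximation sqrt(2*pi*n) * (n/e)^n of n!."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n


# The ratio approaches 1 from below as n grows.
for n in (5, 10, 20):
    print(n, stirling(n) / math.factorial(n))

# For constant k, the binomial coefficient is bounded by n^k.
print(math.comb(10, 3), 10 ** 3)
```

The ratio stays slightly below 1, which is consistent with using the approximation as an asymptotic upper-bound argument up to constant factors.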

SLIDE 15

Course Topic #1: Sequence Alignment


Question: How do we align sequences to identify similarities/differences?

“Thus, although the FOXP2 protein is extremely conserved among mammals, it acquired two amino-acid changes on the human lineage, at least one of which may have functional consequences. This is an intriguing finding, because FOXP2 is the first gene known to be involved in the development of speech and language.”

Nature (2002)

SLIDE 16

Alignment

An alignment between two strings v (of m characters) and w (of n characters) is a two-row matrix where the first row contains the characters of v in order, the second row contains the characters of w in order, and spaces may be interspersed throughout each.

Input:

v: KITTEN (m = 6)
w: SITTING (n = 7)

Output:

[Figure: two-row alignment matrix with v = KITTEN in the first row and w = SITTING in the second row]

Question: Is this a good alignment?
Answer: Count the number of insertions, deletions and substitutions.

SLIDE 18

Edit Distance [Levenshtein, 1966]

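Edit Distance problem: Given strings $\mathbf{v} \in \Sigma^m$ and $\mathbf{w} \in \Sigma^n$, compute the minimum number $d(\mathbf{v}, \mathbf{w})$ of elementary operations to transform $\mathbf{v}$ into $\mathbf{w}$.

Elementary operations: insertions, deletions and substitutions of single characters.

$d(\mathbf{cat}, \mathbf{car}) = 1$
$d(\mathbf{cat}, \mathbf{ate}) = 2$
$d(\mathbf{cat}, \mathbf{are}) = 3$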

SLIDE 19

Computing Edit Distance

v: ATGTTAT...
w: AGCGTAC...

Edit Distance problem: Given strings $\mathbf{v} \in \Sigma^m$ and $\mathbf{w} \in \Sigma^n$, compute the minimum number $d(\mathbf{v}, \mathbf{w})$ of elementary operations to transform $\mathbf{v}$ into $\mathbf{w}$.

[Figure: partial alignment of prefix $\mathbf{v}_i$ with prefix $\mathbf{w}_j$; columns labeled match, mismatch, insertion and deletion]

Optimal substructure: Edit distance obtained from edit distance of prefix of string.

$\mathbf{v}_i$: prefix of $\mathbf{v}$ of length $i$
$\mathbf{w}_j$: prefix of $\mathbf{w}$ of length $j$

SLIDE 20

Computing Edit Distance – Optimal Substructure

$d[i, j]$ is the edit distance of $\mathbf{v}_i$ and $\mathbf{w}_j$, where $\mathbf{v}_i$ is the prefix of $\mathbf{v}$ of length $i$ and $\mathbf{w}_j$ is the prefix of $\mathbf{w}$ of length $j$.

Deletion: $d[i, j] = d[i - 1, j] + 1$. Extend by a character in $\mathbf{v}$.
Insertion: $d[i, j] = d[i, j - 1] + 1$. Extend by a character in $\mathbf{w}$.
Mismatch: $d[i, j] = d[i - 1, j - 1] + 1$. Extend by a character in $\mathbf{v}$ and $\mathbf{w}$.
Match: $d[i, j] = d[i - 1, j - 1]$. Extend by a character in $\mathbf{v}$ and $\mathbf{w}$.

SLIDE 21

Computing Edit Distance – Recurrence

$d[i, j]$ is the edit distance of $\mathbf{v}_i$ and $\mathbf{w}_j$, where $\mathbf{v}_i$ is the prefix of $\mathbf{v}$ of length $i$ and $\mathbf{w}_j$ is the prefix of $\mathbf{w}$ of length $j$.

$$d[i, j] = \min \begin{cases} d[i-1, j] + 1, \\ d[i, j-1] + 1, \\ d[i-1, j-1] + 1, & \text{if } v_i \neq w_j, \\ d[i-1, j-1], & \text{if } v_i = w_j. \end{cases}$$

SLIDE 22

Computing Edit Distance – Recurrence

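$d[i, j]$ is the edit distance of $\mathbf{v}_i$ and $\mathbf{w}_j$, where $\mathbf{v}_i$ is the prefix of $\mathbf{v}$ of length $i$ and $\mathbf{w}_j$ is the prefix of $\mathbf{w}$ of length $j$.

$$d[i, j] = \min \begin{cases} 0, & \text{if } i = 0 \text{ and } j = 0, \\ d[i-1, j] + 1, & \text{if } i > 0, \\ d[i, j-1] + 1, & \text{if } j > 0, \\ d[i-1, j-1] + 1, & \text{if } i > 0,\ j > 0 \text{ and } v_i \neq w_j, \\ d[i-1, j-1], & \text{if } i > 0,\ j > 0 \text{ and } v_i = w_j. \end{cases}$$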
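The recurrence with its boundary cases translates almost line by line into a memoized recursive function. The following is a minimal sketch (the function name `edit_distance` is hypothetical); it mirrors the case analysis directly rather than filling a table, and because it recurses to depth m + n it is suited to short strings only.

```python
from functools import lru_cache


def edit_distance(v, w):
    """Edit distance of strings v and w via the recurrence for d[i, j]."""

    @lru_cache(maxsize=None)
    def d(i, j):
        if i == 0 and j == 0:
            return 0
        options = []
        if i > 0:                       # deletion
            options.append(d(i - 1, j) + 1)
        if j > 0:                       # insertion
            options.append(d(i, j - 1) + 1)
        if i > 0 and j > 0:             # match (cost 0) or mismatch (cost 1)
            cost = 0 if v[i - 1] == w[j - 1] else 1
            options.append(d(i - 1, j - 1) + cost)
        return min(options)

    return d(len(v), len(w))


print(edit_distance("cat", "car"), edit_distance("cat", "ate"),
      edit_distance("cat", "are"))
```

Python strings are 0-indexed, so the character $v_i$ of the recurrence is `v[i - 1]` in the code.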

SLIDE 23

Computing Edit Distance – Dynamic Programming

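$$d[i, j] = \min \begin{cases} 0, & \text{if } i = 0 \text{ and } j = 0, \\ d[i-1, j] + 1, & \text{if } i > 0, \\ d[i, j-1] + 1, & \text{if } j > 0, \\ d[i-1, j-1] + 1, & \text{if } i > 0,\ j > 0 \text{ and } v_i \neq w_j, \\ d[i-1, j-1], & \text{if } i > 0,\ j > 0 \text{ and } v_i = w_j. \end{cases}$$

[Figure: empty dynamic programming table with rows indexed by V = ATGT ($i = 0, \ldots, 4$) and columns indexed by W = ATCG ($j = 0, \ldots, 4$); legend: match, mismatch, insertion, deletion]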
SLIDE 24

Computing Edit Distance – Dynamic Programming

Entry $d[i, j]$ is computed from three neighboring entries:

$d[i - 1, j]$ (deletion): cost 1
$d[i, j - 1]$ (insertion): cost 1
$d[i - 1, j - 1]$ (match or mismatch): cost 0 or 1

SLIDE 25

Computing Edit Distance – Dynamic Programming

First row and first column of the table for V = ATGT and W = ATCG: $d[0, j] = j$ and $d[i, 0] = i$ for $i, j \in \{0, \ldots, 4\}$.

SLIDE 26

Computing Edit Distance – Dynamic Programming

$d[1, 1] = {?}$

SLIDE 27

Computing Edit Distance – Dynamic Programming

$d[1, 1] = 0$, since $v_1 = w_1 = \text{A}$ (match).

SLIDE 28

Computing Edit Distance – Dynamic Programming

$d[2, 1] = {?}$

SLIDE 29

Computing Edit Distance – Dynamic Programming

$d[2, 1] = 1$

SLIDE 33

Computing Edit Distance – Dynamic Programming

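$$d[i, j] = \min \begin{cases} 0, & \text{if } i = 0 \text{ and } j = 0, \\ d[i-1, j] + 1, & \text{if } i > 0, \\ d[i, j-1] + 1, & \text{if } j > 0, \\ d[i-1, j-1] + 1, & \text{if } i > 0,\ j > 0 \text{ and } v_i \neq w_j, \\ d[i-1, j-1], & \text{if } i > 0,\ j > 0 \text{ and } v_i = w_j. \end{cases}$$

Completed table for V = ATGT (rows, index $i$) and W = ATCG (columns, index $j$):

         W:     A  T  C  G
      j:    0   1  2  3  4
V  i
   0        0   1  2  3  4
A  1        1   0  1  2  3
T  2        2   1  0  1  2
G  3        3   2  1  1  1
T  4        4   3  2  2  2

Entry $(i, j)$ is reached from $(i - 1, j)$ with cost 1 (deletion), from $(i, j - 1)$ with cost 1 (insertion), and from $(i - 1, j - 1)$ with cost 0 or 1 (match or mismatch).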

SLIDE 35

Computing Edit Distance – Dynamic Programming

[Figure: completed dynamic programming table for V = ATGT and W = ATCG, shown together with the two strings written out as an alignment; legend: match, mismatch, insertion, deletion]

SLIDE 36

Computing Edit Distance – Running Time

For each of the $(m + 1) \times (n + 1)$ entries:

3 addition operations
1 comparison operation
1 minimum operation

Running time: $O(mn)$ time
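A bottom-up version makes the $O(mn)$ bound visible: two nested loops visit each of the $(m + 1) \times (n + 1)$ entries once and do constant work per entry. This is a minimal sketch with a hypothetical function name `edit_distance_table`; it returns the whole table so it can be compared with the worked example for V = ATGT and W = ATCG.

```python
def edit_distance_table(v, w):
    """Fill the (m+1) x (n+1) table d, where d[i][j] is the edit
    distance between the first i characters of v and the first j of w."""
    m, n = len(v), len(w)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue                         # base case d[0][0] = 0
            best = float("inf")
            if i > 0:                            # deletion
                best = min(best, d[i - 1][j] + 1)
            if j > 0:                            # insertion
                best = min(best, d[i][j - 1] + 1)
            if i > 0 and j > 0:                  # match or mismatch
                cost = 0 if v[i - 1] == w[j - 1] else 1
                best = min(best, d[i - 1][j - 1] + cost)
            d[i][j] = best
    return d


for row in edit_distance_table("ATGT", "ATCG"):
    print(row)
```

The bottom-right entry `d[m][n]` is the edit distance of the full strings.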

SLIDE 38

Computing Edit Distance – Your turn!

$$d[i, j] = \min \begin{cases} 0, & \text{if } i = 0 \text{ and } j = 0, \\ d[i-1, j] + 1, & \text{if } i > 0, \\ d[i, j-1] + 1, & \text{if } j > 0, \\ d[i-1, j-1] + 1, & \text{if } i > 0,\ j > 0 \text{ and } v_i \neq w_j, \\ d[i-1, j-1], & \text{if } i > 0,\ j > 0 \text{ and } v_i = w_j. \end{cases}$$

Fill in one table per pair of strings (rows V, columns W, indices 0 to 3):

V = CAT, W = CAR: $d(\mathbf{cat}, \mathbf{car}) =$
V = CAT, W = ATE: $d(\mathbf{cat}, \mathbf{ate}) =$
V = CAT, W = ARE: $d(\mathbf{cat}, \mathbf{are}) =$

SLIDE 39

Change Problem and Edit distance

Edit distance: [Figure: completed dynamic programming table for V = ATGT and W = ATCG]

Change problem: Make M cents using minimum number of 1, 3 and 5 cent coins.

Value        1  2  3  4  5  6  7
Min # coins  1  2  1  2  1  2  3

Both have optimal substructure and can be solved using dynamic programming.
These are examples of a more general problem!

SLIDE 40

Review of Graph Theory

Graph $G = (V, E)$
Vertices $V = \{v_1, \ldots, v_n\}$
Edges $E = \{(v_i, v_j), \ldots\}$

[Figure: graph on the cities Champaign-Urbana, Chicago, Indianapolis, St. Louis and Bloomington]

SLIDE 43

Review of Graph Theory

Directed Graph $G = (V, E)$
Vertices $V = \{v_1, \ldots, v_n\}$
Directed edges $E = \{(v_i, v_j), \ldots\}$
Path is a sequence of vertices and edges that connect them
Edges can be weighted

[Figure: directed graph on the cities Champaign-Urbana, Chicago, Indianapolis, St. Louis and Bloomington, with edge weights 130, 50, 140, 170, 150 and 180]

SLIDE 44

Manhattan Tourist Problem

A tourist in Manhattan wants to visit the maximum number of attractions (*) by traveling on a path (only eastward and southward) from start to end.

[Figure: street grid with attractions marked *, from "Begin" to "End"]

SLIDE 45

Manhattan Tourist Problem

A tourist in Manhattan wants to visit the maximum number of attractions (*) by traveling on a path (only eastward and southward) from start to end.

May be more than 1 attraction on a street. Add weights!

[Figure: street grid from "Begin" to "End" with the number of attractions on each street as its weight]

SLIDE 46

Manhattan Tourist Problem

[Figure: weighted, directed grid graph with vertices indexed by an i coordinate and a j coordinate, a weight on every street, and the vertices "begin" and "end"]

Manhattan Tourist Problem (MTP): Given a weighted, directed grid graph G with two vertices “begin” and “end”, find the maximum weight path in G from “begin” to “end”.

SLIDE 47

Manhattan Tourist Problem – Exhaustive Algorithm

[Figure: the weighted grid graph from "begin" to "end"]

Check all paths.

Question: How many paths?

SLIDE 48

Manhattan Tourist Problem – Greedy Algorithm

[Figure: weighted grid graph from "begin" to "end" with path scores 18 and 22 annotated; captions: "promising start, but leads to bad choices!" and "better path!"]

SLIDE 49

Manhattan Tourist Problem – Optimal Substructure

[Figure: weighted grid graph from "begin" to "end"; two intermediate vertices are annotated "best score to this point" with scores 18 and 20, and the end vertex is annotated "best score to end" with score 22]

SLIDE 50

Manhattan Tourist Problem – Optimal Substructure

$s[i, j]$ is the best score for path to coordinate $(i, j)$

$w[(i - 1, j), (i, j)]$: weight of street between $(i - 1, j)$ and $(i, j)$
$w[(i, j - 1), (i, j)]$: weight of street between $(i, j - 1)$ and $(i, j)$

[Figure: weighted grid graph with best scores 18 and 20 at two intermediate vertices and 22 at the end vertex]

Question: What is the recurrence?

SLIDE 51

Manhattan Tourist Problem – Optimal Substructure

[Figure: weighted grid graph with best scores 18 and 20 at two intermediate vertices and 22 at the end vertex]

$s[i, j]$ is the best score for path to coordinate $(i, j)$

$$s[i, j] = \max \begin{cases} 0, & \text{if } i = 0 \text{ and } j = 0, \\ s[i-1, j] + w[(i-1, j), (i, j)], & \text{if } i > 0, \\ s[i, j-1] + w[(i, j-1), (i, j)], & \text{if } j > 0. \end{cases}$$

$w[(i - 1, j), (i, j)]$: weight of street between $(i - 1, j)$ and $(i, j)$
$w[(i, j - 1), (i, j)]$: weight of street between $(i, j - 1)$ and $(i, j)$

SLIDE 52

MTP – Solving Recurrence using Dynamic Programming

$s[i, j]$ is the best score for path to coordinate $(i, j)$

$$s[i, j] = \max \begin{cases} 0, & \text{if } i = 0 \text{ and } j = 0, \\ s[i-1, j] + w[(i-1, j), (i, j)], & \text{if } i > 0, \\ s[i, j-1] + w[(i, j-1), (i, j)], & \text{if } j > 0. \end{cases}$$

$w[(i - 1, j), (i, j)]$: weight of street between $(i - 1, j)$ and $(i, j)$
$w[(i, j - 1), (i, j)]$: weight of street between $(i, j - 1)$ and $(i, j)$

[Figure: grid with the source vertex marked]

SLIDE 53

MTP – Solving Recurrence using Dynamic Programming

[Figure: grid build step; scores shown: 1, 5]

SLIDE 54

MTP – Solving Recurrence using Dynamic Programming

[Figure: grid build step; scores shown: 1, 3, 5, 8, 7]

SLIDE 55

MTP – Solving Recurrence using Dynamic Programming

[Figure: grid build step; scores added: 8, 8, 12, 13]

SLIDE 56

MTP – Solving Recurrence using Dynamic Programming

[Figure: grid build step; scores added: 18, 16, 12]

SLIDE 57

MTP – Solving Recurrence using Dynamic Programming

[Figure: grid build step; scores added: 20, 21]

SLIDE 58

MTP – Solving Recurrence using Dynamic Programming

[Figure: grid build step; score added at the end vertex: 22]

SLIDE 59

MTP – Solving Recurrence using Dynamic Programming

$s[i, j]$ is the best score for path to coordinate $(i, j)$

$$s[i, j] = \max \begin{cases} 0, & \text{if } i = 0 \text{ and } j = 0, \\ s[i-1, j] + w[(i-1, j), (i, j)], & \text{if } i > 0, \\ s[i, j-1] + w[(i, j-1), (i, j)], & \text{if } j > 0. \end{cases}$$

$w[(i - 1, j), (i, j)]$: weight of street between $(i - 1, j)$ and $(i, j)$
$w[(i, j - 1), (i, j)]$: weight of street between $(i, j - 1)$ and $(i, j)$

[Figure: completed grid with all scores filled in; $s[3, 3] = 22$]

Let $m$ be the number of rows and $n$ be the number of columns.

Running time: $O(mn)$

Question: Implementation?

SLIDE 60

Manhattan Is Not a Perfect Grid

What about diagonals?

[Figure: vertex B with incoming edges from vertices A1, A2 and A3]

$$s[B] = \max \begin{cases} s[A_1] + w[A_1, B], \\ s[A_2] + w[A_2, B], \\ s[A_3] + w[A_3, B]. \end{cases}$$

SLIDE 61

Manhattan Is Not a Perfect Grid, It’s a Directed Graph

$$s[0, 0] = 0, \qquad s[i, j] = \max_{(i', j') \in \text{pred}(i, j)} \{ s[i', j'] + w[(i', j'), (i, j)] \}$$

[Figure: vertex $(i, j)$ and its predecessors $\text{pred}(i, j)$]

$G = (V, E)$ is a directed acyclic graph (DAG) with nonnegative edge weights $w : E \to \mathbb{R}_+$.

Each edge is evaluated once: $O(|E|)$ time
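The predecessor-based recurrence works on any DAG once the vertices are processed in topological order. The sketch below assumes Python 3.9 or later for the standard-library `graphlib.TopologicalSorter`; the function name `longest_path_dag`, the dictionary layout of `pred`, and the example weights are all hypothetical choices for illustration.

```python
from graphlib import TopologicalSorter


def longest_path_dag(pred, source, sink):
    """pred maps each vertex to {predecessor: weight of the edge into it}.
    Returns the weight of a longest path from source to sink."""
    # static_order() lists every vertex after all of its predecessors
    # and raises CycleError if the graph has a directed cycle.
    order = TopologicalSorter(
        {v: set(ps) for v, ps in pred.items()}).static_order()
    s = {}
    for v in order:
        if v == source:
            s[v] = 0
        else:
            # Vertices not reachable from the source keep score -inf.
            s[v] = max((s[u] + wt for u, wt in pred.get(v, {}).items()),
                       default=float("-inf"))
    return s[sink]


# Hypothetical example: B has three predecessors A1, A2 and A3.
pred = {
    "A1": {"S": 3},
    "A2": {"S": 1},
    "A3": {"S": 4},
    "B": {"A1": 2, "A2": 5, "A3": 1},
}
print(longest_path_dag(pred, "S", "B"))
```

Every edge is looked at exactly once, in line with the $O(|E|)$ bound. Replacing `max` with `min` (and `-inf` with `inf`) gives the shortest-path variant.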

SLIDE 62

Dynamic Programming as a Graph Problem

[Figure: street grid with attractions marked *, from "Begin" to "End"]

Manhattan Tourist Problem: Every path in directed graph is a possible tourist path. Find maximum weight path.
Running time: $O(mn) = O(|E|)$

Change Problem: Make M cents using minimum number of coins $\mathbf{c} = (1, 3, 5)$. Every path in directed graph is a possible change. Find shortest path.
Running time: $O(Mn) = O(|E|)$

SLIDE 63

What About the Edit Distance Problem?

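[Figure: dynamic programming table with rows indexed by V = ATGT and columns indexed by W = ATCG; legend: match, mismatch, insertion, deletion]

Edit Distance problem: Given strings $\mathbf{v} \in \Sigma^m$ and $\mathbf{w} \in \Sigma^n$, compute the minimum number $d(\mathbf{v}, \mathbf{w})$ of elementary operations to transform $\mathbf{v}$ into $\mathbf{w}$.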

SLIDE 64

What About the Edit Distance Problem?

[Figure: grid of vertices (O) with rows indexed by V = ATGT and columns indexed by W = ATCG; legend: match, mismatch, insertion, deletion]

Edit Distance problem: Given strings $\mathbf{v} \in \Sigma^m$ and $\mathbf{w} \in \Sigma^n$, compute the minimum number $d(\mathbf{v}, \mathbf{w})$ of elementary operations to transform $\mathbf{v}$ into $\mathbf{w}$.

Edit graph is a weighted, directed grid graph $G = (V, E)$ with source vertex $(0, 0)$ and target vertex $(m, n)$. Each edge $(i, j)$ has weight $c[i, j]$ corresponding to edit cost: deletion (1), insertion (1), mismatch (1) and match (0).

SLIDE 65

What About the Edit Distance Problem?

Edit graph is a weighted, directed grid graph $G = (V, E)$ with source vertex $(0, 0)$ and target vertex $(m, n)$. Each edge $(i, j)$ has weight $c[i, j]$ corresponding to edit cost: deletion (1), insertion (1), mismatch (1) and match (0).

Alignment is a path from $(0, 0)$ to $(m, n)$.

SLIDE 66

What About the Edit Distance Problem?

Edit Distance problem: Given edit graph $G = (V, E)$, with edge weights $c : E \to \{0, 1\}$. Find shortest path from $(0, 0)$ to $(m, n)$.

Edit graph is a weighted, directed grid graph $G = (V, E)$ with source vertex $(0, 0)$ and target vertex $(m, n)$. Each edge $(i, j)$ has weight $c[i, j]$ corresponding to edit cost: deletion (1), insertion (1), mismatch (1) and match (0).

Alignment is a path from $(0, 0)$ to $(m, n)$.
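To illustrate the graph view, the sketch below computes the edit distance as a shortest path in the edit graph without ever building a table. It uses Dijkstra's algorithm with the standard-library `heapq`, which is one valid choice here because all edge weights are 0 or 1; the slides themselves do not prescribe a shortest-path algorithm, and the function name `edit_distance_via_graph` is hypothetical.

```python
import heapq


def edit_distance_via_graph(v, w):
    """Shortest path from (0, 0) to (m, n) in the edit graph of v and w."""
    m, n = len(v), len(w)
    dist = {(0, 0): 0}
    heap = [(0, 0, 0)]                       # (distance, i, j)
    while heap:
        d, i, j = heapq.heappop(heap)
        if (i, j) == (m, n):
            return d
        if d > dist[(i, j)]:
            continue                         # stale heap entry
        edges = []
        if i < m:                            # deletion edge, weight 1
            edges.append((i + 1, j, 1))
        if j < n:                            # insertion edge, weight 1
            edges.append((i, j + 1, 1))
        if i < m and j < n:                  # match (0) or mismatch (1)
            edges.append((i + 1, j + 1, 0 if v[i] == w[j] else 1))
        for a, b, cost in edges:
            if d + cost < dist.get((a, b), float("inf")):
                dist[(a, b)] = d + cost
                heapq.heappush(heap, (d + cost, a, b))
    raise AssertionError("the target vertex is always reachable")


print(edit_distance_via_graph("ATGT", "ATCG"))
```

The result agrees with the dynamic programming table because both compute the weight of a shortest path from the source to the target of the same graph.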

SLIDE 67

Shortest Path vs Longest Path

Change graph, edit graph and the MTP grid are directed graphs G.
Change problem and Edit Distance problem are minimization problems.
Find shortest path in G from source to sink.
Manhattan Tourist problem is a maximization problem.
Find longest path in G from source to sink.


SLIDE 68

Shortest Path vs Longest Path

Shortest path in directed graphs can be found efficiently (Dijkstra, Bellman-Ford, Floyd-Warshall algorithms).
Longest path in directed graphs is NP-hard in general, so no efficient algorithm is known.
Change graph, edit graph and MTP grid graph are directed acyclic graphs (DAGs).
No directed cycles.
Longest path problem in a DAG can be solved efficiently by dynamic programming.

Question: What’s the relation between absence of directed cycles and optimal substructure?

[Figure: example of a directed cycle]

SLIDE 69

Weighted Edit Distance

$d[i, j]$ is the edit distance of $\mathbf{v}_i$ and $\mathbf{w}_j$, where $\mathbf{v}_i$ is the prefix of $\mathbf{v}$ of length $i$ and $\mathbf{w}_j$ is the prefix of $\mathbf{w}$ of length $j$.

$$d[i, j] = \min \begin{cases} d[i-1, j] + 1, & \text{(deletion)} \\ d[i, j-1] + 1, & \text{(insertion)} \\ d[i-1, j-1] + 1, & \text{if } v_i \neq w_j \text{ (mismatch)}, \\ d[i-1, j-1], & \text{if } v_i = w_j. \end{cases}$$

Replace +1 with different penalties for different types of edits.
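A minimal sketch of the weighted variant follows. The parameter names `del_cost`, `ins_cost` and `sub_cost` are hypothetical, and a single penalty per edit type is an assumption of this example; a scoring scheme could just as well depend on the characters involved.

```python
def weighted_edit_distance(v, w, del_cost=1, ins_cost=1, sub_cost=1):
    """Edit distance where each +1 in the recurrence is replaced by a
    penalty for the corresponding type of edit."""
    m, n = len(v), len(w)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):                # only deletions
        d[i][0] = d[i - 1][0] + del_cost
    for j in range(1, n + 1):                # only insertions
        d[0][j] = d[0][j - 1] + ins_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = d[i - 1][j - 1] + (0 if v[i - 1] == w[j - 1] else sub_cost)
            d[i][j] = min(d[i - 1][j] + del_cost,
                          d[i][j - 1] + ins_cost,
                          diag)
    return d[m][n]


print(weighted_edit_distance("cat", "car"))
print(weighted_edit_distance("cat", "car", sub_cost=3))
```

With unit penalties the function reduces to the ordinary edit distance. With a substitution penalty of 3, replacing t by r costs more than deleting t and inserting r, so the distance between cat and car becomes 2.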

SLIDE 70

Summary

1. Change problem
2. Review of running time analysis
3. Edit distance
4. Review elementary graph theory
5. Manhattan Tourist problem
6. Longest/shortest paths in DAGs

Reading:

Jones and Pevzner. Chapters 2.7-2.9 and 6.1-6.4
Lecture notes


SLIDE 71

Sources

CS 362 by Layla Oesper (Carleton College)
CS 1810 by Ben Raphael (Brown/Princeton University)
An Introduction to Bioinformatics Algorithms book (Jones and Pevzner)
http://bioalgorithms.info/
