CS 466 Introduction to Bioinformatics
Lecture 2 Part 2
Mohammed El-Kebir
January 28, 2020
Outline
- 1. Edit distance recap
- 2. Global alignment
- 3. Fitting alignment
- 4. Local alignment
- 5. Gapped alignment
Reading:
- Jones and Pevzner. Chapters 6.6, 6.8 and 6.9
- Lecture notes
Weighted Edit Distance: Practice Problem
- Compute the weighted edit distance between $\mathbf{v}$ = AGT and $\mathbf{w}$ = ATCG, where an insertion or deletion costs 1 and a mismatch costs 2.
$$d[i, j] = \min \begin{cases}
0, & \text{if } i = 0 \text{ and } j = 0, \\
d[i-1, j] + 1, & \text{if } i > 0, \\
d[i, j-1] + 1, & \text{if } j > 0, \\
d[i-1, j-1] + 2, & \text{if } i > 0,\ j > 0 \text{ and } v_i \neq w_j, \\
d[i-1, j-1], & \text{if } i > 0,\ j > 0 \text{ and } v_i = w_j.
\end{cases}$$
Filled table $d[i, j]$ (rows: prefixes of $\mathbf{v}$; columns: prefixes of $\mathbf{w}$):
    v\w   -   A   T   C   G
    -     0   1   2   3   4
    A     1   0   1   2   3
    G     2   1   2   3   2
    T     3   2   1   2   3
The weighted edit distance is the bottom-right entry, $d[3, 4] = 3$.
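
The recurrence above can be sketched in Python as follows. This is a minimal illustration, not code from the lecture: the function name `weighted_edit_distance` and the parameters `indel` and `mismatch` are assumptions chosen to mirror the costs 1 and 2 in the recurrence.

```python
def weighted_edit_distance(v: str, w: str, indel: int = 1, mismatch: int = 2) -> int:
    """Fill the table d[i][j] row by row and return d[m][n]."""
    m, n = len(v), len(w)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue  # base case: d[0][0] = 0
            candidates = []
            if i > 0:  # gap in w (vertical step)
                candidates.append(d[i - 1][j] + indel)
            if j > 0:  # gap in v (horizontal step)
                candidates.append(d[i][j - 1] + indel)
            if i > 0 and j > 0:  # match (cost 0) or mismatch
                cost = 0 if v[i - 1] == w[j - 1] else mismatch
                candidates.append(d[i - 1][j - 1] + cost)
            d[i][j] = min(candidates)
    return d[m][n]
```

With the row letters AGT and the column letters ATCG from the table, `weighted_edit_distance("AGT", "ATCG")` evaluates to 3.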
Edit Distance: Additional Insights
- An alignment corresponds to a series of elementary operations
- But not every series of elementary operations corresponds to an alignment! Why?
Examples from http://profs.scienze.univr.it/~liptak/ACB/files/StringDistance_6up.pdf
Distance Function / Metric
A distance function (metric) on a set $X$ is a function $d : X \times X \to \mathbb{R}$ s.t. for all $x, y, z \in X$:
- i. $d(x, y) \ge 0$ [non-negativity]
- ii. $d(x, y) = 0$ if and only if $x = y$ [identity of indiscernibles]
- iii. $d(x, y) = d(y, x)$ [symmetry]
- iv. $d(x, y) \le d(x, z) + d(z, y)$ [triangle inequality]
Question: Is edit distance a distance function?
Edit Distance is a Distance Function
Edit distance $d(\mathbf{v}, \mathbf{w})$ is the minimum number of elementary operations to transform $\mathbf{v} \in \Sigma^*$ into $\mathbf{w} \in \Sigma^*$.
Claim: edit distance is a distance function.
Proof: Let $\mathbf{u}, \mathbf{v}, \mathbf{w} \in \Sigma^*$.
i. $d(\mathbf{v}, \mathbf{w}) \ge 0$ [non-negativity]
Edit distance is defined by an alignment. This in turn uniquely determines a series of elementary operations, each with cost either 0 (match) or 1 (otherwise). Thus, $d(\mathbf{v}, \mathbf{w}) \ge 0$.
ii. $d(\mathbf{v}, \mathbf{w}) = 0$ if and only if $\mathbf{v} = \mathbf{w}$ [identity of indiscernibles]
(=>) By the premise, $d(\mathbf{v}, \mathbf{w}) = 0$. By definition, the optimal alignment can only consist of operations with cost 0. That is, the alignment consists of only matches. Thus, $\mathbf{v} = \mathbf{w}$.
(<=) By the premise, $\mathbf{v} = \mathbf{w}$. Thus, there exists an alignment where every column is a match. This means that $|\mathbf{v}| = |\mathbf{w}|$ and each letter $v_i$ equals $w_i$ (where $i \in [|\mathbf{v}|]$). Moreover, only match operations have cost 0; the other operations have cost 1. Hence, this is the optimal alignment with cost $d(\mathbf{v}, \mathbf{w}) = 0$.
iii. $d(\mathbf{v}, \mathbf{w}) = d(\mathbf{w}, \mathbf{v})$ [symmetry]
Let $A = [a_{i,j}]$ be the optimal alignment corresponding to $d(\mathbf{v}, \mathbf{w})$, i.e. $A$ is a $2 \times k$ matrix where $k \in \{\max(|\mathbf{v}|, |\mathbf{w}|), \ldots, |\mathbf{v}| + |\mathbf{w}|\}$. Define the function $f(A) = B$ such that $B$ is obtained by interchanging the two rows of $A$. Since the cost of any insertion, deletion and mismatch is 1, alignment $B$ has cost $d(\mathbf{v}, \mathbf{w})$. The existence of an alignment from $\mathbf{w}$ to $\mathbf{v}$ with cost less than $d(\mathbf{v}, \mathbf{w})$ yields a contradiction: interchanging its rows would give an alignment from $\mathbf{v}$ to $\mathbf{w}$ that is cheaper than $A$, so $A$ would not be optimal. Hence, $d(\mathbf{w}, \mathbf{v}) = d(\mathbf{v}, \mathbf{w})$.
iv. $d(\mathbf{v}, \mathbf{w}) \le d(\mathbf{v}, \mathbf{u}) + d(\mathbf{u}, \mathbf{w})$ [triangle inequality]
Assume for a contradiction that $d(\mathbf{v}, \mathbf{w}) > d(\mathbf{v}, \mathbf{u}) + d(\mathbf{u}, \mathbf{w})$. Let $S$ be the sequence of elementary operations for transforming $\mathbf{v}$ into $\mathbf{u}$. Let $S'$ be the sequence of elementary operations for transforming $\mathbf{u}$ into $\mathbf{w}$. Note that $d(\mathbf{v}, \mathbf{u}) = |S|$ and $d(\mathbf{u}, \mathbf{w}) = |S'|$. Concatenate $S$ and $S'$ and remove redundant operations, yielding sequence $S''$. By definition, $|S''| \le |S| + |S'|$. We can obtain an alignment of $\mathbf{v}$ and $\mathbf{w}$ from $S''$ with cost $|S''| \le d(\mathbf{v}, \mathbf{u}) + d(\mathbf{u}, \mathbf{w})$. This contradicts $d(\mathbf{v}, \mathbf{w}) > d(\mathbf{v}, \mathbf{u}) + d(\mathbf{u}, \mathbf{w})$ being the cost of the optimal alignment of $\mathbf{v}$ and $\mathbf{w}$.
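
The four properties can also be checked empirically on a small set of strings. The sketch below is a sanity check under stated assumptions, not a substitute for the proof: it uses unit costs for insertion, deletion and mismatch, and the helper names `edit_distance`, `all_strings` and `is_metric_on` are illustrative.

```python
from itertools import product


def edit_distance(v: str, w: str) -> int:
    """Unit-cost edit distance: insertion, deletion and mismatch each cost 1."""
    m, n = len(v), len(w)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 or j == 0:
                d[i][j] = i + j  # only insertions or only deletions
            else:
                d[i][j] = min(
                    d[i - 1][j] + 1,
                    d[i][j - 1] + 1,
                    d[i - 1][j - 1] + (v[i - 1] != w[j - 1]),
                )
    return d[m][n]


def all_strings(alphabet: str, max_len: int) -> list:
    """All strings over the alphabet with length 0..max_len."""
    return ["".join(p) for k in range(max_len + 1) for p in product(alphabet, repeat=k)]


def is_metric_on(strings, d) -> bool:
    """Check properties i-iv for every pair and triple of the given strings."""
    for x in strings:
        for y in strings:
            dxy = d(x, y)
            if dxy < 0 or (dxy == 0) != (x == y) or dxy != d(y, x):
                return False
            for z in strings:
                if dxy > d(x, z) + d(z, y):
                    return False
    return True
```

For example, `is_metric_on(all_strings("AC", 3), edit_distance)` checks all 15 strings of length at most 3 over the alphabet {A, C}.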
Biological Sequence Alignment
- Weighted edit distance: find alignment with minimum distance
- Shortest path in weighted edit graph
- Sequence alignment: find alignment with maximum similarity
- Longest path in weighted edit graph
- Score function: $\delta : (\Sigma \cup \{-\})^2 \to \mathbb{R}$
[Figure: weighted edit graph with v = ATGT on the rows and w = ATCG on the columns. Edge types: match, mismatch, insertion, deletion. Deletion edges have weight $\delta(v_i, -)$, insertion edges $\delta(-, w_j)$, and match/mismatch edges $\delta(v_i, w_j)$.]
Question: What is an example of $\delta$?
Scoring Matrices
Transitions: interchanges among purines (two rings) or pyrimidines (one ring)
- A <--> G
- C <--> T
Transversions: interchanges between purines (two rings) and pyrimidines (one ring)
- A <--> C, A <--> T
- G <--> C, G <--> T
Transitions more likely than transversions!
Example scoring function $\delta$:
    δ     A    T    C    G    -
    A     1   -2   -2   -1   -1
    T    -2    1   -1   -2   -1
    C    -2   -1    1   -2   -1
    G    -1   -2   -2    1   -1
    -    -1   -1   -1   -1   -∞
Global Alignment: Needleman-Wunsch Algorithm
- An alignment is a source-to-sink path in the edit graph
- An alignment $A = [a_{i,j}]$ is a $2 \times k$ matrix s.t. (i) $k \in \{\max(m, n), \ldots, m + n\}$, (ii) $a_{i,j} \in \Sigma \cup \{-\}$ and (iii) there is no $j \in [k]$ where $a_{1,j} = a_{2,j} = -$
Global Alignment problem: Given strings $\mathbf{v} \in \Sigma^m$ and $\mathbf{w} \in \Sigma^n$ and scoring function $\delta$, find alignment with maximum score.
$$s[i, j] = \max \begin{cases}
0, & \text{if } i = 0 \text{ and } j = 0, \\
s[i-1, j] + \delta(v_i, -), & \text{if } i > 0 \quad \text{(deletion)}, \\
s[i, j-1] + \delta(-, w_j), & \text{if } j > 0 \quad \text{(insertion)}, \\
s[i-1, j-1] + \delta(v_i, w_j), & \text{if } i > 0 \text{ and } j > 0 \quad \text{(match/mismatch)}.
\end{cases}$$
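
A minimal Python sketch of the global alignment recurrence, paired with the transition/transversion scoring matrix from the Scoring Matrices slide. The names `delta`, `PURINES` and `global_alignment_score` are illustrative assumptions, and the function returns only the optimal score (no traceback).

```python
NEG_INF = float("-inf")
PURINES = {"A", "G"}  # C and T are the pyrimidines


def delta(a: str, b: str) -> float:
    """Score one alignment column; '-' denotes a gap."""
    if a == "-" and b == "-":
        return NEG_INF  # a column of two gaps is not allowed
    if a == "-" or b == "-":
        return -1  # insertion or deletion
    if a == b:
        return 1  # match
    # Transition (same ring class) scores -1, transversion scores -2.
    return -1 if (a in PURINES) == (b in PURINES) else -2


def global_alignment_score(v: str, w: str, score=delta) -> float:
    """Needleman-Wunsch: longest source-to-sink path in the edit graph."""
    m, n = len(v), len(w)
    s = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    s[0][0] = 0
    for i in range(m + 1):
        for j in range(n + 1):
            if i > 0:  # deletion
                s[i][j] = max(s[i][j], s[i - 1][j] + score(v[i - 1], "-"))
            if j > 0:  # insertion
                s[i][j] = max(s[i][j], s[i][j - 1] + score("-", w[j - 1]))
            if i > 0 and j > 0:  # match/mismatch
                s[i][j] = max(s[i][j], s[i - 1][j - 1] + score(v[i - 1], w[j - 1]))
    return s[m][n]
```

Passing a different `score` function swaps in another scoring matrix without changing the dynamic program.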
Demonstration
- http://alfehrest.org/sub/nwa/index.html
- $\mathbf{v}$ = ATGTTAT and $\mathbf{w}$ = ATCGTAC.
- Scoring function $\delta$: the transition/transversion matrix from the Scoring Matrices slide.
Next Generation Sequencing (NGS) Technology
[Figure: log-scale chart (axis ticks from 1,000 to 100,000,000) with the NGS era marked; November, 2017]
NGS Characterized by Short Reads
- Genome: millions to billions of nucleotides
- Next-generation DNA sequencing: 10s to 100s of millions of short reads
- Short read: 100 nucleotides
[Figure: example short reads ... GGTAGTTAG ..., ... TATAATTAG ..., ... AGCCATTAG ..., ... CGTACCTAG ..., ... CATTCAGTAG ..., ... GGTAAACTAG ...]
Allow for inexact matches due to:
- Sequencing errors
- Polymorphisms/mutations in reference genome
Question: How to account for discrepancy between lengths of reference and short read?
Human reference genome is 3,300,000,000 nucleotides, while a short read is 100 nucleotides. Global sequence alignment will not work!
Fitting Alignment
For short read alignment, we want to align the complete short read $\mathbf{v} \in \Sigma^m$ to a substring of the reference genome $\mathbf{w} \in \Sigma^n$. Note that $m \ll n$.
Fitting Alignment problem: Given strings $\mathbf{v} \in \Sigma^m$ and $\mathbf{w} \in \Sigma^n$ and scoring function $\delta$, find an alignment of $\mathbf{v}$ and a substring of $\mathbf{w}$ with maximum global alignment score $s^*$ among all global alignments of $\mathbf{v}$ and all substrings of $\mathbf{w}$.
Fitting Alignment: Naive Approach
- Consider all contiguous non-empty substrings of $\mathbf{w}$, defined by $1 \le i \le j \le n$
- How many? Answer: $n + \binom{n}{2}$
- What are their total lengths?
- What is the running time?
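
One way to explore the two open questions is to enumerate the substrings directly. The sketch below uses illustrative helper names (`substrings`, `count_and_total_length`) and compares the counts against closed forms: the count $n + \binom{n}{2} = n(n+1)/2$ from the slide, and a total length of $n(n+1)(n+2)/6$, which is a derivation added here rather than a formula from the lecture. Since a global alignment of $\mathbf{v}$ against a substring of length $\ell$ takes $O(m\ell)$ time, this total length suggests an $O(mn^3)$ running time for the naive approach.

```python
from math import comb


def substrings(w: str) -> list:
    """All contiguous non-empty substrings w[i..j], one per position pair."""
    n = len(w)
    return [w[i:j] for i in range(n) for j in range(i + 1, n + 1)]


def count_and_total_length(w: str) -> tuple:
    """Number of substrings and the sum of their lengths."""
    subs = substrings(w)
    return len(subs), sum(len(x) for x in subs)


for n in range(1, 9):
    count, total = count_and_total_length("A" * n)
    assert count == n + comb(n, 2)
    assert total == n * (n + 1) * (n + 2) // 6
```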
Fitting Alignment: Dynamic Programming
$$s[i, j] = \max \begin{cases}
0, & \text{if } i = 0, \\
s[i-1, j] + \delta(v_i, -), & \text{if } i > 0, \\
s[i, j-1] + \delta(-, w_j), & \text{if } i > 0 \text{ and } j > 0, \\
s[i-1, j-1] + \delta(v_i, w_j), & \text{if } i > 0 \text{ and } j > 0.
\end{cases}$$
$$s^* = \max\{s[m, 0], \ldots, s[m, n]\}$$
Start anywhere on first row. End anywhere on last row.
Question: Let match score be 1, mismatch/indel score be -1. What is $s^*$?
Question: Same scores. What is optimal global alignment and score?
[Figure: dynamic programming table (v\w) and an alignment for v = AGGT and w = ACGGC]
- Online: https://valiec.github.io/AlignmentVisualizer/index.html
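
A minimal Python sketch of the fitting alignment recurrence, assuming a simple match/mismatch/indel scoring function in place of a general $\delta$. The name `fitting_alignment_score` and its parameters are illustrative; the function returns only $s^*$, without traceback.

```python
def fitting_alignment_score(v: str, w: str, match: int = 1,
                            mismatch: int = -1, indel: int = -1) -> int:
    """Align all of v against the best-scoring substring of w."""
    m, n = len(v), len(w)
    # Row 0 stays 0: the alignment may start anywhere on the first row.
    s = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        s[i][0] = s[i - 1][0] + indel  # only deletions reach column 0
        for j in range(1, n + 1):
            diag = s[i - 1][j - 1] + (match if v[i - 1] == w[j - 1] else mismatch)
            s[i][j] = max(s[i - 1][j] + indel, s[i][j - 1] + indel, diag)
    # The alignment may end anywhere on the last row.
    return max(s[m])
```

For instance, `fitting_alignment_score("AC", "GGACGG")` evaluates to 2, because AC occurs exactly inside the longer string.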
Local Alignment: Biological Motivation
[Figure: proteins ABL1 and SHKA. From Pfam database (http://pfam.sanger.ac.uk/)]
Proteins are composed of functional units called domains. Such domains may occur in different proteins even across species.
Local Alignment problem: Given strings $\mathbf{v} \in \Sigma^m$ and $\mathbf{w} \in \Sigma^n$ and scoring function $\delta$, find a substring of $\mathbf{v}$ and a substring of $\mathbf{w}$ whose alignment has maximum global alignment score $s^*$ among all global alignments of all substrings of $\mathbf{v}$ and $\mathbf{w}$.
Global, Fitting and Local Alignment
- Global Alignment problem: Given strings $\mathbf{v} \in \Sigma^m$ and $\mathbf{w} \in \Sigma^n$ and scoring function $\delta$, find alignment of $\mathbf{v}$ and $\mathbf{w}$ with maximum score.
- Fitting Alignment problem: Given strings $\mathbf{v} \in \Sigma^m$ and $\mathbf{w} \in \Sigma^n$ and scoring function $\delta$, find an alignment of $\mathbf{v}$ and a substring of $\mathbf{w}$ with maximum global alignment score $s^*$ among all global alignments of $\mathbf{v}$ and all substrings of $\mathbf{w}$.
- Local Alignment problem: Given strings $\mathbf{v} \in \Sigma^m$ and $\mathbf{w} \in \Sigma^n$ and scoring function $\delta$, find a substring of $\mathbf{v}$ and a substring of $\mathbf{w}$ whose alignment has maximum global alignment score $s^*$ among all global alignments of all substrings of $\mathbf{v}$ and $\mathbf{w}$.
Local Alignment: Naive Algorithm
Brute force:
- 1. Generate all pairs $(\mathbf{v}', \mathbf{w}')$ of substrings of $\mathbf{v}$ and $\mathbf{w}$
- 2. For each pair $(\mathbf{v}', \mathbf{w}')$, solve global alignment problem.
Question: There are $\binom{m}{2}\binom{n}{2}$ pairs of substrings. But they have different lengths. What is the running time?
Key Idea
Global alignment:
- Start at $(0, 0)$ and end at $(m, n)$
Local alignment:
- Start and end anywhere
Local Alignment Recurrence
$$s[i, j] = \max \begin{cases}
0, \\
s[i-1, j] + \delta(v_i, -), & \text{if } i > 0, \\
s[i, j-1] + \delta(-, w_j), & \text{if } j > 0, \\
s[i-1, j-1] + \delta(v_i, w_j), & \text{if } i > 0 \text{ and } j > 0.
\end{cases}$$
$$s^* = \max_{i,j} s[i, j]$$
Start anywhere: the 0 case applies to every cell $(i, j)$. End anywhere: $s^*$ is the maximum over all cells.
Running time: $O(mn)$
Local Alignment: Dynamic Programming
- Online: https://valiec.github.io/AlignmentVisualizer/index.html
Question: Let match score be 2, mismatch score be -2 and indel be -4. What is $s^*$?
[Figure: dynamic programming table (v\w) for v = AGGT and w = ACGGC, with the local alignment GG over GG]
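
A minimal Python sketch of local alignment by dynamic programming, in which the 0 option is available in every cell so that an alignment can start anywhere. The name `local_alignment_score` is illustrative, and the default scores follow the question above. The sketch assumes `indel` is not positive and returns only $s^*$.

```python
def local_alignment_score(v: str, w: str, match: int = 2,
                          mismatch: int = -2, indel: int = -4) -> int:
    """Best global alignment score over all pairs of substrings of v and w."""
    m, n = len(v), len(w)
    # Borders stay 0 because indel <= 0 is assumed.
    s = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = s[i - 1][j - 1] + (match if v[i - 1] == w[j - 1] else mismatch)
            # The 0 option restarts the alignment at cell (i, j).
            s[i][j] = max(0, s[i - 1][j] + indel, s[i][j - 1] + indel, diag)
            best = max(best, s[i][j])  # the alignment may end anywhere
    return best
```

With these scores, `local_alignment_score("AGGT", "ACGGC")` evaluates to 4, corresponding to aligning GG with GG.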
Scoring Gaps
Let $\mathbf{v}$ = AAC and $\mathbf{w}$ = ACAAC.
Match $\delta(c, c) = 1$; Mismatch $\delta(c, d) = -1$ (where $c \neq d$); Indel $\delta(c, -) = \delta(-, c) = -2$
Left alignment:
    v: A - - A C
    w: A C A A C
Right alignment:
    v: A - A - C
    w: A C A A C
Both alignments have 3 matches and 2 indels. Score: $3 \cdot 1 + 2 \cdot (-2) = -1$
Question: Which alignment is better?
Scoring Gaps: Affine Gap Penalties
Desired: Lower penalty for consecutive gaps than interspersed gaps.
Why: Consecutive gaps are more likely due to slippage errors in DNA replication (2-3 nucleotides), codons for protein sequences, etc.
Affine gap penalty: Two penalties: (i) gap open penalty $\rho \ge 0$ and (ii) gap extension penalty $\sigma \ge 0$. Stretch of $k$ consecutive gaps has score $-(\rho + \sigma k)$.
Let $\rho = 10$ and $\sigma = 1$.
- Left (A--AC over ACAAC, one stretch of 2 gaps): $3 \cdot 1 - (10 + 1 \cdot 2) = -9$.
- Right (A-A-C over ACAAC, two stretches of 1 gap): $3 \cdot 1 - (10 + 1 \cdot 1) - (10 + 1 \cdot 1) = -19$.
Affine Gap Penalty Alignment: Naive Approach
Idea: Insert horizontal (insertion) and vertical (deletion) edges spanning $k > 1$ gaps with score $-(\rho + \sigma k)$.
[Figure: edit graph in which new edges spanning multiple gaps are added to the old edges]
Question: What's the running time?
Question: What's the recurrence?
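
The naive approach can be sketched as a dynamic program in which every cell also considers all long horizontal and vertical edges. This is an illustration under stated assumptions, not the lecture's code: `naive_gap_alignment_score`, `affine` and the `gap_score` callback are hypothetical names, and a simple match/mismatch score stands in for $\delta$. The two inner loops over $k$ are what make this approach slower than $O(mn)$.

```python
NEG_INF = float("-inf")


def affine(rho: int, sigma: int):
    """Score of a stretch of k consecutive gaps: -(rho + sigma * k)."""
    return lambda k: -(rho + sigma * k)


def naive_gap_alignment_score(v: str, w: str, gap_score,
                              match: int = 1, mismatch: int = -1):
    """Global alignment where a stretch of k gaps scores gap_score(k)."""
    m, n = len(v), len(w)
    s = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    s[0][0] = 0
    for i in range(m + 1):
        for j in range(n + 1):
            best = s[i][j]
            for k in range(1, i + 1):  # vertical edge spanning k gaps
                best = max(best, s[i - k][j] + gap_score(k))
            for k in range(1, j + 1):  # horizontal edge spanning k gaps
                best = max(best, s[i][j - k] + gap_score(k))
            if i > 0 and j > 0:  # match/mismatch edge
                best = max(best, s[i - 1][j - 1]
                           + (match if v[i - 1] == w[j - 1] else mismatch))
            s[i][j] = best
    return s[m][n]
```

With an open penalty of 10 and an extension penalty of 1, `naive_gap_alignment_score("AAC", "ACAAC", affine(10, 1))` evaluates to -9. Because `gap_score` is an arbitrary function of $k$, the same sketch handles non-affine gap penalties.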
Affine Gap Penalty Alignment
Idea: Three separate recurrences:
- (i) Gap in first sequence: $s^{\rightarrow}[i, j]$
- (ii) Match/mismatch: $s^{\searrow}[i, j]$
- (iii) Gap in second sequence: $s^{\downarrow}[i, j]$
$$s^{\rightarrow}[i, j] = \max \begin{cases}
s^{\rightarrow}[i, j-1] - \sigma, & \text{if } j > 1, \\
s^{\searrow}[i, j-1] - (\rho + \sigma), & \text{if } j > 0,
\end{cases}$$
$$s^{\searrow}[i, j] = \max \begin{cases}
0, & \text{if } i = 0 \text{ and } j = 0, \\
s^{\rightarrow}[i, j], & \text{if } j > 0, \\
s^{\downarrow}[i, j], & \text{if } i > 0, \\
s^{\searrow}[i-1, j-1] + \delta(v_i, w_j), & \text{if } i > 0 \text{ and } j > 0,
\end{cases}$$
$$s^{\downarrow}[i, j] = \max \begin{cases}
s^{\downarrow}[i-1, j] - \sigma, & \text{if } i > 1, \\
s^{\searrow}[i-1, j] - (\rho + \sigma), & \text{if } i > 0.
\end{cases}$$
Running time: $O(mn)$
Affine Gap Penalty Alignment: Example
$\mathbf{v}$ = AAC, $\mathbf{w}$ = ACAAC
Let $\rho = 10$ and $\sigma = 1$. Match = 1. Mismatch = -1
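
The three recurrences can be sketched in Python with one table per recurrence. The table names `right`, `down` and `diag` (for the horizontal, vertical and diagonal tables) and the function name `affine_gap_alignment_score` are illustrative assumptions; a simple match/mismatch score stands in for $\delta$, and only the score is returned.

```python
NEG_INF = float("-inf")


def affine_gap_alignment_score(v: str, w: str, rho: int = 10, sigma: int = 1,
                               match: int = 1, mismatch: int = -1):
    """Global alignment score with gap open rho and gap extension sigma."""
    m, n = len(v), len(w)
    right = [[NEG_INF] * (n + 1) for _ in range(m + 1)]  # gap in first sequence
    down = [[NEG_INF] * (n + 1) for _ in range(m + 1)]   # gap in second sequence
    diag = [[NEG_INF] * (n + 1) for _ in range(m + 1)]   # match/mismatch table
    for i in range(m + 1):
        for j in range(n + 1):
            if j > 0:  # extend a horizontal gap, or open a new one
                right[i][j] = max(right[i][j - 1] - sigma,
                                  diag[i][j - 1] - (rho + sigma))
            if i > 0:  # extend a vertical gap, or open a new one
                down[i][j] = max(down[i - 1][j] - sigma,
                                 diag[i - 1][j] - (rho + sigma))
            best = 0 if (i == 0 and j == 0) else NEG_INF
            best = max(best, right[i][j], down[i][j])
            if i > 0 and j > 0:
                best = max(best, diag[i - 1][j - 1]
                           + (match if v[i - 1] == w[j - 1] else mismatch))
            diag[i][j] = best
    return diag[m][n]
```

For the example strings AAC and ACAAC with an open penalty of 10 and an extension penalty of 1, `affine_gap_alignment_score("AAC", "ACAAC")` evaluates to -9, the score of an alignment with a single stretch of two gaps.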
Gapped Alignment: Additional Insights
- Naive approach supports arbitrary gap penalties given two sequences $\mathbf{v} \in \Sigma^m$ and $\mathbf{w} \in \Sigma^n$. This results in an $O(mn(m + n))$ algorithm.
- Alignment with convex gap penalties given two sequences $\mathbf{v} \in \Sigma^m$ and $\mathbf{w} \in \Sigma^n$ can be computed in $O(mn \log m)$ time.
See: Dan Gusfield. 1997. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, New York, NY, USA.
Take Home Messages
- 1. Edit distance
- 2. Global alignment
- 3. Fitting alignment
- 4. Local alignment
- 5. Gapped alignment
Reading:
- Jones and Pevzner. Chapters 6.6, 6.8 and 6.9
- Lecture notes