Bioinformatics Algorithms
(Fundamental Algorithms, module 2)
Zsuzsanna Lipták
Masters in Medical Bioinformatics academic year 2018/19, II. semester
Pairwise Alignment 2

Semiglobal Alignment
match: 1, mismatch: -1, gap: -1

Left alignment (first row CAGCGTACACT): score −5
Right alignment: score −3
  CAGCGTACACT
  C--C-T--A--
Idea: gaps at the beginning or end of a sequence (extremal gaps) are not to count at all.
If we do not count the extremal gaps, then we get:

Left alignment (first row CAGCGTACACT): score 2
Right alignment: score −1
  CAGCGTACACT
  C--C-T--A--

... as desired, the score now reflects that the left alignment is better than the right one.
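The effect of ignoring extremal gaps can be checked with a small script. The following is a minimal sketch, not part of the original slides: the function name `score_alignment` and its parameter `free_end_gaps` are chosen here for illustration. It scores a given alignment column by column and, optionally, skips the gap columns at both ends.

```python
def score_alignment(row_s, row_t, match=1, mismatch=-1, gap=-1, free_end_gaps=False):
    """Score a given alignment; optionally ignore extremal gap columns."""
    assert len(row_s) == len(row_t), "both rows must have the same length"
    first, last = 0, len(row_s) - 1
    if free_end_gaps:
        # Skip leading and trailing columns that contain a gap symbol.
        while first <= last and "-" in (row_s[first], row_t[first]):
            first += 1
        while last >= first and "-" in (row_s[last], row_t[last]):
            last -= 1
    total = 0
    for a, b in zip(row_s[first:last + 1], row_t[first:last + 1]):
        if a == "-" or b == "-":
            total += gap        # internal gap column
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


# The right alignment from the slide: 4 matches and 7 gap columns,
# of which the last 2 are extremal.
print(score_alignment("CAGCGTACACT", "C--C-T--A--"))                      # -3
print(score_alignment("CAGCGTACACT", "C--C-T--A--", free_end_gaps=True))  # -1
```
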
gaps matched here should be free    action
beginning of s                      0s in first column
end of s                            maximize over last column
beginning of t                      0s in first row
end of t                            maximize over last row
Analysis
time and space O(nm)
The global similarity of the two strings s = ACGC and t = GCTC is 0, with (unique) optimal alignment
ACGC
GCTC
The semiglobal similarity is 2, where we set all four types of external gaps as free, and match: +1, mismatch = gap = −1.

D(i, j)       G   C   T   C
          0   1   2   3   4
     0    0   0   0   0   0
A    1    0  −1  −1  −1  −1
C    2    0  −1   0  −1   0
G    3    0   1   0  −1  −1
C    4    0   0   2   1   0

semiglobal alignment:
ACGC--
--GCTC
score = 2
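The rules for free extremal gaps translate directly into a DP. The sketch below is one possible implementation under the assumption that all four kinds of extremal gaps are free; the function name `semiglobal_score` is made up for this illustration. Rows are indexed by s and columns by t.

```python
def semiglobal_score(s, t, match=1, mismatch=-1, gap=-1):
    """Semiglobal similarity of s and t with all four extremal gaps free."""
    n, m = len(s), len(t)
    # D[i][j]: best score for s[:i] and t[:j]. The first row and the first
    # column stay 0 because gaps at the beginning of s or t are free.
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            D[i][j] = max(D[i - 1][j - 1] + sub,   # align s_i with t_j
                          D[i - 1][j] + gap,       # s_i against a gap
                          D[i][j - 1] + gap)       # t_j against a gap
    # Gaps at the end are free: maximize over the last row and last column.
    best = max(max(D[n]), max(row[m] for row in D))
    return best, D


score, D = semiglobal_score("ACGC", "GCTC")
print(score)   # 2
print(D[4])    # [0, 0, 2, 1, 0]
```

Other variants (for instance only the gaps at the beginning and end of the first sequence being free) would change only the initialization of the first row or column and the set of cells over which the maximum is taken.
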
N.B.
There are different variants of semiglobal alignment, depending on where we want to have charge-free gaps (e.g. beginning and end of first sequence; beginning of first, end of second; etc.)
Applications include:
... which variant do we need?
find a prefix s′ of s and a suffix t′ of t s.t. sim(s′, t′) is maximal, or vice versa (prefix of t and suffix of s)
... maximal: which variant do we need?
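match: 2, mismatch: -1, gap: -1
GACGCTGCCAC        GACGCTGCCAC
If we compare the two alignments, then the first alignment has only one long gap, while the second has 3.
One long gap is more plausible, because it can be explained by a single event (insertion or deletion of a stretch of DNA).
Therefore the first alignment should have higher score.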
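Gaps should therefore be scored as a whole, rather than as a sequence of individual gaps.
Affine gap functions: a gap of length k has score f(k) = h + k · g, where h is the score for starting a gap and g the score for each gap position (so starting a new gap costs more than continuing one)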
match: 2, mismatch: -1, gaps: h = −3, g = −1
GACGCTGCCAC GACGCTGCCAC
score = −8 score = −14
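To see how an affine gap function separates the two cases, here is a small sketch that scores a given alignment with h for opening and g for every gap position. The second rows used below are hypothetical: they are not taken from the slides and were chosen here only because they have one long gap and three gaps respectively, and reproduce the scores −8 and −14. The function name `affine_score` is likewise an assumption.

```python
def affine_score(row_s, row_t, match=2, mismatch=-1, h=-3, g=-1):
    """Score a given alignment with an affine gap function f(k) = h + k*g."""
    assert len(row_s) == len(row_t), "both rows must have the same length"
    total = 0
    prev_gap_row = None  # row that held '-' in the previous column, if any
    for a, b in zip(row_s, row_t):
        if a == "-" or b == "-":
            gap_row = "s" if a == "-" else "t"
            # Continuing a gap in the same row costs g; a new gap costs h + g.
            total += g if gap_row == prev_gap_row else h + g
            prev_gap_row = gap_row
        else:
            total += match if a == b else mismatch
            prev_gap_row = None
    return total


top = "GACGCTGCCAC"
one_gap = "G" + "-" * 9 + "C"                        # hypothetical: one gap of length 9
three_gaps = "-" + "A" + "-" * 3 + "T" + "-" * 5     # hypothetical: gaps of length 1, 3, 5

print(affine_score(top, one_gap))            # -8
print(affine_score(top, three_gaps))         # -14
# With h = 0 the function is linear again and cannot tell the two apart:
print(affine_score(top, one_gap, h=0))       # -5
print(affine_score(top, three_gaps, h=0))    # -5
```
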
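Recall the central idea of the DP-algorithm: If A is an alignment and B is the same alignment without the last column, then score(A) = score(B) + score of the last column of A, where the last column has one of three types (first row / second row, ∗ = character, − = gap):
1. (∗ / ∗)    2. (∗ / −)    3. (− / ∗)
The problem now is that in cases 2. and 3., the score of the last column depends on what comes before! E.g. with h = −3, g = −1, the last column of A scores g = −1 if it continues a gap, but h + g = −4 if it starts a new gap.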
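We therefore distinguish three cases for alignment B (the alignment without last column), according to what type its last column is. We use three matrices:
A(i, j) = best score of an alignment of the i-length prefix of s and the j-length prefix of t ending with (si / tj)
B(i, j) = best score of an alignment of the i-length prefix of s and the j-length prefix of t ending with (− / tj)
C(i, j) = best score of an alignment of the i-length prefix of s and the j-length prefix of t ending with (si / −)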
Matrix A: Score of last column does not depend on alignment B
Last column: (∗ / ∗)
Computation of entries:
A(0, 0) = 0 (this is necessary for the recursion)
A(i, j) = max { A(i − 1, j − 1) + f(si, tj),
                B(i − 1, j − 1) + f(si, tj),
                C(i − 1, j − 1) + f(si, tj) }
Matrix B: Score of last column depends on alignment B
Last column: (− / ∗)
B(i, j) = max { best al. of types A or C + start a new gap,
                best al. of type B + continue the gap }
Computation of entries:
B(i, j) = max { A(i, j − 1) + (h + g),
                B(i, j − 1) + g,
                C(i, j − 1) + (h + g) }
Matrix C: Score of last column depends on alignment B
Last column: (∗ / −)
C(i, j) = max { best al. of types A or B + start a new gap,
                best al. of type C + continue the gap }
Computation of entries:
C(i, j) = max { A(i − 1, j) + (h + g),
                B(i − 1, j) + (h + g),
                C(i − 1, j) + g }
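The three recurrences can be put together as follows. This is a minimal sketch for global alignment, not a reference implementation: the function name `affine_global_score` is made up here, and the boundary values other than A(0, 0) = 0 (minus infinity for impossible cells, h + k·g for a single leading gap of length k) are an assumption, since the slides as extracted only state A(0, 0). The short sequence GC in the usage example is also hypothetical.

```python
NEG_INF = float("-inf")


def affine_global_score(s, t, match=2, mismatch=-1, h=-3, g=-1):
    """Global similarity with affine gap function f(k) = h + k*g."""
    n, m = len(s), len(t)
    # A: last column (s_i / t_j), B: last column (- / t_j), C: last column (s_i / -)
    A = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    B = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    C = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    A[0][0] = 0
    # Assumed boundary: a prefix aligned entirely against one leading gap.
    for j in range(1, m + 1):
        B[0][j] = h + g * j
    for i in range(1, n + 1):
        C[i][0] = h + g * i
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f = match if s[i - 1] == t[j - 1] else mismatch
            A[i][j] = max(A[i - 1][j - 1], B[i - 1][j - 1], C[i - 1][j - 1]) + f
            B[i][j] = max(A[i][j - 1] + h + g,   # start a new gap
                          B[i][j - 1] + g,       # continue the gap
                          C[i][j - 1] + h + g)   # start a new gap
            C[i][j] = max(A[i - 1][j] + h + g,   # start a new gap
                          B[i - 1][j] + h + g,   # start a new gap
                          C[i - 1][j] + g)       # continue the gap
    return max(A[n][m], B[n][m], C[n][m])


print(affine_global_score("GACGCTGCCAC", "GC"))   # -8
print(affine_global_score("ACGT", "ACGT"))        # 8
```

In the first call the best alignment matches G and C at the two ends of the long sequence and pays for one gap of length 9, i.e. 2 + 2 + (−3 − 9) = −8.
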
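Analysis
Computing the three matrices: 3(n + 1)(m + 1) = O(nm) entries, each computed in constant time, so altogether O(nm).
Backtracing: Time: O(length of optimal alignment) = O(n + m)
Asymptotically, this is the same as for the basic global alignment algorithm.
In practice, using three matrices increases time and space by a factor of 3.
Affine gap functions model gaps better than linear gap penalties, and they are universally applied. (All alignment programs use affine gap functions.)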