Intro to Alignment Algorithms: Global and Local
Algorithmic Functions of Computational Biology Professor Istrail
Sequence Comparison
Biomolecular sequences
□ DNA sequences (string over 4 letter alphabet {A, C, G, T})
□ RNA sequences (string over 4 letter alphabet {A, C, G, U})
□ Protein sequences (string over 20 letter alphabet {Amino Acids})
Sequence similarity helps in the discovery of genes, and the prediction of structure and function of proteins.
Global Similarity
Output: an alignment of the two sequences of maximum score
Example:
□ GCGCATTTGAGCGA
□ TGCGTTAGGGTGACCA
A possible alignment:
TGCG - - TTAGGGTGACC
match, mismatch, indel
CSCI2820 - Class 4
mismatch, deletion (in template), insertion (in template)
Consider two sequences $X = x_1 x_2 \ldots x_m$ and $Y = y_1 y_2 \ldots y_n$ over the alphabet $\Sigma = \{A, C, G, T\}$, where each $x_i$ and $y_j$ belongs to $\Sigma$.
Unit-score
     A  C  G  T
  A  1  0  0  0
  C  0  1  0  0
  G  0  0  1  0
  T  0  0  0  1
Unit-cost
ACG
|||
AGG
A is aligned with A
C is aligned with G
G is aligned with G
Score = (A,A) + (C,G) + (G,G) = 1 + 0 + 1 = 2
OPTIMAL ALIGNMENTS

ACATGGAAT
ACAGGAAAT
SCORE 7

ACATGG-AAT
ACA-GGAAAT
SCORE 8

---AAAGGG
GGGAAA---
SCORE 3

“-” is the gap symbol
(-,y) = the score for aligning - with y
(x,-) = the score for aligning x with -
A-CG-G
ATCGTG
Alignment Score = (A,A) + (-,T) + (C,C) + (G,G) + (-,T) + (G,G)
THE SUM OF THE SCORES OF THE PAIRWISE ALIGNED SYMBOLS
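The column-by-column sum above can be sketched in a few lines of Python. This is only an illustration: the names `alignment_score` and `unit_score` are hypothetical, and the gap score of 0 is an assumption chosen because it reproduces the scores 7, 8 and 3 shown on the earlier slides.

```python
def alignment_score(row_x, row_y, score, gap="-"):
    """Sum the scores of the aligned symbol pairs, column by column."""
    if len(row_x) != len(row_y):
        raise ValueError("both rows of an alignment must have the same length")
    total = 0
    for a, b in zip(row_x, row_y):
        if a == gap and b == gap:
            raise ValueError("a column cannot align a gap with a gap")
        total += score(a, b)
    return total


def unit_score(a, b):
    """Unit score: 1 for identical letters, 0 otherwise.

    Assumption: columns containing the gap symbol also score 0.
    """
    return 1 if a == b and a != "-" else 0


print(alignment_score("ACG", "AGG", unit_score))                # 2
print(alignment_score("ACATGG-AAT", "ACA-GGAAAT", unit_score))  # 8
print(alignment_score("A-CG-G", "ATCGTG", unit_score))          # 4
```

Passing the scoring function as a parameter keeps the sum independent of the particular scoring scheme, so a matrix-based score could be swapped in without changing `alignment_score`.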
ARTEMIS Summer 2008 Professor Istrail
The Mother & Father of Bioinformatics
Dayhoff PAM scoring matrices
PTIPLSRLFDNAMLRAHRLHQ
SAIENQRLFNIAVSRVQHLHL
Partial alignment for Monkey and Trout somatotropin proteins
A R N D 6 4
Scoring function = a sum of terms, one for each pair of aligned residues and one for each gap
The meaning = log of the relative likelihood that the sequences are related, compared to being unrelated
Identities and conservative substitutions are positive terms
Non-conservative substitutions are negative terms
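A log-odds term of this kind can be sketched as follows. The formulation is the usual textbook one, not something taken from the slides: `q_ab` is the frequency with which residues a and b are seen aligned in related sequences, and `p_a`, `p_b` are their background frequencies. The function name and all numbers are hypothetical, chosen only to show the sign behaviour.

```python
import math


def log_odds(q_ab, p_a, p_b):
    """Log of (probability of the pair if related) / (probability if unrelated)."""
    return math.log(q_ab / (p_a * p_b))


# Hypothetical frequencies, for illustration only.
p_a, p_b = 0.1, 0.1                    # background frequencies; chance pairing = 0.01

conserved = log_odds(0.02, p_a, p_b)   # pair seen more often than chance
disfavoured = log_odds(0.005, p_a, p_b)  # pair seen less often than chance

print(conserved > 0)     # True: positive term
print(disfavoured < 0)   # True: negative term
```

Because the terms are logarithms, adding them over the columns of an alignment corresponds to multiplying the underlying likelihood ratios.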
Given sequences X and Y of lengths m and n respectively, and a similarity matrix:
– Global alignments require that all bases in both sequences are aligned
Given sequences X and Y of lengths m and n respectively, and a similarity matrix:
– Local alignments do not require using all bases in either sequence in the alignment
Suppose that we want to align AGT with AT. We are going to construct a graph in which alignments of the two sequences correspond to paths between the Begin and End nodes of the graph. This is the Edit Graph.
AGT has length 3. AT has length 2. The Edit Graph has (3+1)*(2+1) = 12 nodes.
AGT indexes the columns, and AT indexes the rows of this “table”.
The Graph is directed. The nodes (i,j) will hold values.
Directed edges are labeled with pairs of aligned letters.
AGT
A-T
Every path from Begin to End corresponds to an alignment. Every alignment corresponds to a path between Begin and End.
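The path/alignment correspondence can be checked by brute force on the small AGT / AT example. The sketch below is illustrative only: `all_alignments` is a hypothetical name, and node (i, j) is taken to mean "i letters of the first sequence and j letters of the second have been consumed".

```python
def all_alignments(x, y):
    """Enumerate every Begin-to-End path in the Edit Graph of x and y.

    Each path is returned as the pair of alignment rows it spells out.
    """
    results = []

    def walk(i, j, row_x, row_y):
        if i == len(x) and j == len(y):      # reached the End node
            results.append((row_x, row_y))
            return
        if i < len(x) and j < len(y):        # diagonal edge labeled (x[i], y[j])
            walk(i + 1, j + 1, row_x + x[i], row_y + y[j])
        if i < len(x):                       # edge labeled (x[i], -)
            walk(i + 1, j, row_x + x[i], row_y + "-")
        if j < len(y):                       # edge labeled (-, y[j])
            walk(i, j + 1, row_x + "-", row_y + y[j])

    walk(0, 0, "", "")
    return results


alignments = all_alignments("AGT", "AT")
print(len(alignments))                 # 25 Begin-to-End paths
print(("AGT", "A-T") in alignments)    # True
```

Enumeration like this is only practical for tiny inputs, since the number of paths grows very quickly with sequence length.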
The optimal answer to a problem is expressed in terms of the optimal answers to its sub-problems.
Given: Two sequences X and Y
Find: An optimal alignment of X with Y
We are looking for the optimal alignment = maximal score path in the Edit Graph from the Begin vertex to the End vertex
Part 1: Compute first the optimal alignment score
Part 2: Construct optimal alignment
Edit Graph for AGT and AT (figure), with the nodes S(1,0) and S(2,1) marked.
Matrix S = [S(i,j)]
S(i,j) = the score of the maximal score path from the Begin vertex to the vertex (i,j)
The optimal path to (i,j) must pass through one of the vertices (i-1,j-1), (i-1,j), (i,j-1)
Optimal path to (i-1,j) + (xi, -): S(i-1,j) + (xi, -)
Optimal path to (i-1,j-1) + (xi, yj): S(i-1,j-1) + (xi, yj)
Optimal path to (i,j-1) + (-, yj): S(i,j-1) + (-, yj)
\[
S(i,j) = \max
\begin{cases}
S(i-1,j-1) + (x_i, y_j) \\
S(i-1,j) + (x_i, -) \\
S(i,j-1) + (-, y_j)
\end{cases}
\]
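Part 1 (computing the optimal score) can be sketched directly from this recurrence. The code below is a minimal illustration, not the course's implementation: the function names are hypothetical, and the boundary values along the first row and column (sums of gap scores) are an assumption that follows from the fact that only gap edges lead to those nodes. The unit score with gap score 0 is likewise assumed.

```python
def unit_score(a, b):
    """Assumed scoring: 1 for identical letters, 0 for mismatches and gaps."""
    return 1 if a == b and a != "-" else 0


def global_score_matrix(x, y, score, gap="-"):
    """Fill S so that S[i][j] is the best score for aligning x[:i] with y[:j]."""
    m, n = len(x), len(y)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):                    # only (x_i, -) edges reach column 0
        S[i][0] = S[i - 1][0] + score(x[i - 1], gap)
    for j in range(1, n + 1):                    # only (-, y_j) edges reach row 0
        S[0][j] = S[0][j - 1] + score(gap, y[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + score(x[i - 1], y[j - 1]),
                S[i - 1][j] + score(x[i - 1], gap),
                S[i][j - 1] + score(gap, y[j - 1]),
            )
    return S


S = global_score_matrix("AGT", "AT", unit_score)
print(S[3][2])                                                       # 2
print(global_score_matrix("ACATGGAAT", "ACAGGAAAT", unit_score)[9][9])  # 8
```

Each cell is computed once from three neighbours, so the table takes time and space proportional to the product of the two sequence lengths.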
Values S(i,j) for AGT (columns) and AT (rows):
     -  A  G  T
  -  0  0  0  0
  A  0  1  1  1
  T  0  1  1  2
Optimal Alignment:
AGT
A-T
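Part 2 (constructing an optimal alignment) is usually done by walking back from the End node and, at each node, following an edge that achieves the maximum. The sketch below assumes the same unit score with gap score 0; the names are hypothetical, and when several edges tie it simply prefers the diagonal one, so it returns one optimal alignment among possibly many.

```python
def unit_score(a, b):
    """Assumed scoring: 1 for identical letters, 0 for mismatches and gaps."""
    return 1 if a == b and a != "-" else 0


def global_alignment(x, y, score, gap="-"):
    """Return (optimal score, row for x, row for y) for a global alignment."""
    m, n = len(x), len(y)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        S[i][0] = S[i - 1][0] + score(x[i - 1], gap)
    for j in range(1, n + 1):
        S[0][j] = S[0][j - 1] + score(gap, y[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + score(x[i - 1], y[j - 1]),
                S[i - 1][j] + score(x[i - 1], gap),
                S[i][j - 1] + score(gap, y[j - 1]),
            )

    # Trace back from End (m, n) to Begin (0, 0).
    # Exact equality is safe here because the scores are integers.
    i, j = m, n
    row_x, row_y = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            row_x.append(x[i - 1]); row_y.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + score(x[i - 1], gap):
            row_x.append(x[i - 1]); row_y.append(gap)
            i -= 1
        else:
            row_x.append(gap); row_y.append(y[j - 1])
            j -= 1
    return S[m][n], "".join(reversed(row_x)), "".join(reversed(row_y))


print(global_alignment("AGT", "AT", unit_score))   # (2, 'AGT', 'A-T')
```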
\[
S(i,j) = \max
\begin{cases}
0 \\
S(i-1,j-1) + (x_i, y_j) \\
S(i-1,j) + (x_i, -) \\
S(i,j-1) + (-, y_j)
\end{cases}
\]
We add the 0 term.
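The effect of the added 0 can be sketched as a local alignment score. Two things here are assumptions rather than content of the slides: the answer is read off as the largest entry anywhere in S (the standard convention for local alignment), and the scoring scheme (+1 match, -1 mismatch, -1 gap) is a hypothetical one, since the 0 term only matters when some scores are negative.

```python
def example_score(a, b):
    """Hypothetical scoring: +1 match, -1 mismatch, -1 for a gap column."""
    if a == "-" or b == "-":
        return -1
    return 1 if a == b else -1


def local_alignment_score(x, y, score, gap="-"):
    """Best score over all alignments of a substring of x with a substring of y."""
    m, n = len(x), len(y)
    S = [[0] * (n + 1) for _ in range(m + 1)]   # first row and column stay 0
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(
                0,                                            # start a fresh alignment here
                S[i - 1][j - 1] + score(x[i - 1], y[j - 1]),
                S[i - 1][j] + score(x[i - 1], gap),
                S[i][j - 1] + score(gap, y[j - 1]),
            )
            best = max(best, S[i][j])
    return best


print(local_alignment_score("TTACGTT", "GGACGGG", example_score))  # 3, from the shared ACG
print(local_alignment_score("AAAA", "TTTT", example_score))        # 0, nothing in common
```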
X = hlsek
Y = nlsak
These are from the BRCA2 (early onset) protein in human and chimpanzee. The sequences being compared represent a similar biological sequence.
Margaret Dayhoff’s PAM 100 similarity matrix (partial)
A N E H L K S * A 4
1
N
5 1 2
1 1
E 1 5
H
2
7
L
6
K
1
5
S 1 1
4
*
1
Score matrix for X = hlsek and Y = nlsak (figure; surviving cell values: 2, 12, 3, 3, 12, 3, 8, 3, 17)