Sequence comparison: Dynamic programming
Genome 559: Introduction to Statistical and Computational Genomics
- Prof. James H. Thomas
http://faculty.washington.edu/jht/GS559_2013/
Sequence comparison overview
Problem: Find the best
FYI: for two sequences of length m and n, the number of possible alignments is

$$\binom{m+n}{\min(m,n)} \approx \frac{(m+n)!}{(\min(m,n)!)^2}$$

For m = n this is exactly 2n choose n, the binomial coefficient:

n     alignments
5     2.5 x 10^2
10    1.8 x 10^5
20    1.4 x 10^11
30    1.2 x 10^17
40    1.1 x 10^23
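A quick way to check these numbers is to compute the binomial coefficient directly. This is only a minimal sketch for the equal-length case; the function name `count_alignments` is made up for illustration.

```python
from math import comb

def count_alignments(n):
    """Count alignments of two length-n sequences as 2n choose n."""
    return comb(2 * n, n)

# Print the count for each sequence length in scientific notation.
for n in (5, 10, 20, 30, 40):
    print(f"{n:>2}  {count_alignments(n):.1e}")
```

Even at n = 40 the count is far too large to enumerate, which is the motivation for dynamic programming.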
Score matrix (identical residues score 10):
     A    C    G    T
A   10
C        10
G             10
T                  10
The value at (i,j) will be the score of the best alignment of the first i characters of one sequence versus the first j characters of the other sequence.
Moving horizontally in the matrix introduces a gap in the sequence along the left edge.
Moving vertically in the matrix introduces a gap in the sequence along the top edge.
Moving diagonally in the matrix aligns two residues, one from each sequence.
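The three kinds of move can be sketched in code: a path through the matrix is just a string of moves, and each move adds one column to the alignment. This is an illustrative sketch, not part of the original slides; the function name `moves_to_alignment` and the move letters D, H, V are hypothetical.

```python
def moves_to_alignment(top, left, moves):
    """Turn a path of moves through the matrix into two aligned strings.

    top   -- sequence written along the top edge
    left  -- sequence written along the left edge
    moves -- string of 'D' (diagonal), 'H' (horizontal), 'V' (vertical)
    """
    top_row, left_row = [], []
    i = j = 0  # i counts residues used from left, j from top
    for move in moves:
        if move == "D":
            # Diagonal: align two residues.
            top_row.append(top[j])
            left_row.append(left[i])
            i += 1
            j += 1
        elif move == "H":
            # Horizontal: gap in the sequence along the left edge.
            top_row.append(top[j])
            left_row.append("-")
            j += 1
        elif move == "V":
            # Vertical: gap in the sequence along the top edge.
            top_row.append("-")
            left_row.append(left[i])
            i += 1
        else:
            raise ValueError(f"unknown move: {move!r}")
    return "".join(top_row), "".join(left_row)

print(moves_to_alignment("ACGT", "AGT", "DHDD"))  # ('ACGT', 'A-GT')
```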
Start at the top left and move progressively through the matrix.
Then simply repeat the same rule progressively across the matrix
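Repeating the rule across the whole matrix might look like the sketch below. The match score of 10 comes from the score matrix diagonal and the gap penalty of -4 from the practice problem. The mismatch score of -5 is an assumption made only for this example, and `fill_matrix` is a hypothetical name.

```python
def fill_matrix(top, left, match=10, mismatch=-5, gap=-4):
    """Fill the dynamic programming matrix for a global alignment.

    mismatch=-5 is an assumed value, not taken from the slides.
    """
    rows, cols = len(left) + 1, len(top) + 1
    F = [[0] * cols for _ in range(rows)]

    # First row and column: only gaps are possible.
    for j in range(1, cols):
        F[0][j] = F[0][j - 1] + gap
    for i in range(1, rows):
        F[i][0] = F[i - 1][0] + gap

    # Every other cell: best of the diagonal, vertical and horizontal moves.
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if left[i - 1] == top[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + s,  # diagonal: align two residues
                F[i - 1][j] + gap,    # vertical: gap in the top sequence
                F[i][j - 1] + gap,    # horizontal: gap in the left sequence
            )
    return F

for row in fill_matrix("ACGT", "AGT"):
    print(row)
```

The bottom-right entry holds the score of the best global alignment (26 for this small example under the assumed scores).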
What is the alignment associated with this entry?
(just follow the arrows back - this is called the traceback)
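A traceback can be sketched by storing one arrow per cell while filling the matrix and then following the arrows back from the bottom right. As before, the mismatch score of -5 is an assumption for illustration, ties are broken in favor of the diagonal move (an arbitrary choice), and `global_align` is a hypothetical name.

```python
def global_align(top, left, match=10, mismatch=-5, gap=-4):
    """Return (score, aligned_top, aligned_left) for a global alignment.

    mismatch=-5 is an assumed value, not taken from the slides.
    """
    rows, cols = len(left) + 1, len(top) + 1
    score = [[0] * cols for _ in range(rows)]
    arrow = [[None] * cols for _ in range(rows)]

    # Edges of the matrix can only be reached through gaps.
    for j in range(1, cols):
        score[0][j], arrow[0][j] = score[0][j - 1] + gap, "H"
    for i in range(1, rows):
        score[i][0], arrow[i][0] = score[i - 1][0] + gap, "V"

    # Fill each cell and remember which move produced its score.
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if left[i - 1] == top[j - 1] else mismatch
            options = [
                (score[i - 1][j - 1] + s, "D"),
                (score[i - 1][j] + gap, "V"),
                (score[i][j - 1] + gap, "H"),
            ]
            score[i][j], arrow[i][j] = max(options, key=lambda o: o[0])

    # Traceback: follow the arrows from bottom right to top left.
    top_row, left_row = [], []
    i, j = rows - 1, cols - 1
    while i > 0 or j > 0:
        move = arrow[i][j]
        if move == "D":
            top_row.append(top[j - 1])
            left_row.append(left[i - 1])
            i, j = i - 1, j - 1
        elif move == "V":
            top_row.append("-")
            left_row.append(left[i - 1])
            i -= 1
        else:
            top_row.append(top[j - 1])
            left_row.append("-")
            j -= 1

    # The columns were collected back to front, so reverse them.
    return score[-1][-1], "".join(reversed(top_row)), "".join(reversed(left_row))

print(global_align("ACGT", "AGT"))  # (26, 'ACGT', 'A-GT')
```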
Continue and find the optimal global alignment, and its score.
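$$F(i,j) = \max \begin{cases} F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}$$
take the max of these three, where s(x_i, y_j) is the score for aligning residues x_i and y_j and d is the gap penalty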
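For a single cell, the three-way max can be worked through as below. This is only a sketch: `best_move` and its argument names are hypothetical, with `diag`, `up` and `left` standing for the three neighboring entries that are already filled in, `s` for the score of aligning the two residues, and `d` for the gap penalty.

```python
def best_move(diag, up, left, s, d):
    """Score one cell and report which of the three moves gave the max."""
    candidates = {
        "diagonal": diag + s,    # align the two residues
        "vertical": up + d,      # gap in the top sequence
        "horizontal": left + d,  # gap in the left sequence
    }
    move = max(candidates, key=candidates.get)
    return candidates[move], move

# First interior cell for two sequences that both start with A,
# with match score 10 and d = -4.
print(best_move(diag=0, up=-4, left=-4, s=10, d=-4))  # (10, 'diagonal')
```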
Practice problem: find a best pairwise alignment of GAATC and AATTC
Gap penalty: d = -4