SLIDE 1
Sequence alignments
SLIDE 2 Genetic sequences change over time
time: LRGGD -> LRGD (deletion) -> LRCD (mutation) -> ARCD (mutation)
Relationship between original and final sequence:
LRGGD
AR-CD

LRGGD
ARC-D
SLIDE 3
In practice: we only know sequences from extant organisms
ancestor (not observed)
human: LRGDDC
mouse: LGDCC
SLIDE 4 We need to align these sequences to compare them
human: LRGDDC
mouse: LGDCC

LRGDDC
L-GDCC

LRGDDC-
L-GD-CC

LRGDDC
-LGDCC
Which alignment is correct?
SLIDE 5 We need to score the alignment
Example:
- match = +1
- mismatch = -1
- gap = 0
LRGDDC
L-GDCC
score = 1+0+1+1-1+1 = 3

LRGDDC-
L-GD-CC
score = 1+0+1+1+0+1+0 = 4

LRGDDC
-LGDCC
score = 0-1+1+1-1+1 = 1
SLIDE 6 We need to score the alignment
Example:
- match = +1
- mismatch = -1
- gap = -2
LRGDDC
L-GDCC
score = 1-2+1+1-1+1 = 1

LRGDDC-
L-GD-CC
score = 1-2+1+1-2+1-2 = -2

LRGDDC
-LGDCC
score = -2-1+1+1-1+1 = -1
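The column-by-column scoring on the last two slides can be reproduced with a few lines of Python. This is a minimal sketch, not code from the lecture: the function name `score_alignment` is hypothetical, and the second row of the third alignment (`-LGDCC`) is inferred from the score terms on the slides.

```python
def score_alignment(top, bottom, match=1, mismatch=-1, gap=0):
    """Sum the per-column scores of two aligned strings ('-' marks a gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += gap        # column contains an indel
        elif a == b:
            total += match      # identical residues
        else:
            total += mismatch   # different residues
    return total


# The three candidate alignments of human LRGDDC and mouse LGDCC.
# The third pair is reconstructed from the slide's score terms (assumption).
alignments = [("LRGDDC", "L-GDCC"), ("LRGDDC-", "L-GD-CC"), ("LRGDDC", "-LGDCC")]

for gap in (0, -2):
    scores = [score_alignment(t, b, gap=gap) for t, b in alignments]
    print(f"gap = {gap}: {scores}")
# gap = 0: [3, 4, 1]
# gap = -2: [1, -2, -1]
```

With gap = 0 the alignment with the most gaps scores highest; with gap = -2 the ranking changes and the single-gap alignment wins, which is why the choice of gap penalty matters.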
SLIDE 7 We often score by amino-acid similarity
http://commons.wikimedia.org/wiki/File:BLOSUM62.gif
BLOSUM62 Matrix
$$\text{score} = \log \frac{p_{ij}}{p_i \, p_j}$$
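To see how the log-odds formula behaves, here is a small sketch with made-up numbers. The frequencies below are hypothetical and are not taken from BLOSUM62; the function name `log_odds` and the choice of base 2 are also assumptions. Published BLOSUM matrices additionally scale and round these values to integers.

```python
import math


def log_odds(p_ij, p_i, p_j, base=2.0):
    """log(p_ij / (p_i * p_j)): observed pair frequency vs. chance expectation."""
    return math.log(p_ij / (p_i * p_j), base)


# Hypothetical frequencies, chosen only for illustration (not BLOSUM62 data).
p_i, p_j = 0.05, 0.04                # background frequencies of residues i and j
common_pair = log_odds(0.004, p_i, p_j)   # pair aligned twice as often as chance
rare_pair = log_odds(0.001, p_i, p_j)     # pair aligned half as often as chance

print(common_pair > 0, rare_pair < 0)
```

Pairs that are aligned more often than expected by chance get a positive score, and pairs aligned less often than chance get a negative one.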
SLIDE 8
Gaps in alignments are called “indels”
LRGDDC
L-GDCC
(the gap in the second column is an indel)
SLIDE 9 How do we find the best alignment given a scoring system?
Global alignment: Needleman-Wunsch algorithm
Example: align GCAT and GAT
Scoring: match = 1, mismatch = -1, gap = -1
[Figure: empty scoring matrix, GCAT across the columns and GAT down the rows]
SLIDE 26 How do we find the best alignment given a scoring system?
Global alignment: Needleman-Wunsch algorithm
Example: align GCAT and GAT
Scoring: match = 1, mismatch = -1, gap = -1
[Figure: scoring matrix, GCAT across the columns and GAT down the rows; legible cell values: 1, 1, 2]
Alignment:
SLIDE 27
Needleman-Wunsch algorithm, mathematical form
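$$M(i, j) = \max \left( \begin{array}{ll} M(i-1, j) + p & \text{(top)} \\ M(i, j-1) + p & \text{(left)} \\ M(i-1, j-1) + s(a_j, b_i) & \text{(diagonal)} \end{array} \right)$$
$$M(0, j) = j \times p \quad \text{(first row)}$$
$$M(i, 0) = i \times p \quad \text{(first column)}$$
p = gap penalty
s(a_j, b_i) = match/mismatch score for sites j and i in sequences a and b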
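The recurrence can be turned into a short program. The following is a minimal sketch under the slide's conventions (sequence a runs along the columns with index j, sequence b along the rows with index i, and the gap penalty p is a negative number that is added). The function name and the tie-breaking order in the traceback (diagonal, then left, then top) are assumptions; other orders can return a different alignment with the same score.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a (columns, index j) and b (rows, index i)."""
    rows, cols = len(b) + 1, len(a) + 1
    M = [[0] * cols for _ in range(rows)]

    # Boundary conditions: M(0, j) = j * p and M(i, 0) = i * p.
    for j in range(cols):
        M[0][j] = j * gap
    for i in range(rows):
        M[i][0] = i * gap

    def s(i, j):
        """Match/mismatch score for site j of a and site i of b (1-based)."""
        return match if a[j - 1] == b[i - 1] else mismatch

    # Fill the matrix row by row using the recurrence.
    for i in range(1, rows):
        for j in range(1, cols):
            M[i][j] = max(M[i - 1][j] + gap,          # top: gap in a
                          M[i][j - 1] + gap,          # left: gap in b
                          M[i - 1][j - 1] + s(i, j))  # diagonal: align sites

    # Traceback from the bottom-right cell to the top-left cell.
    top, bottom = [], []
    i, j = len(b), len(a)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and M[i][j] == M[i - 1][j - 1] + s(i, j):
            top.append(a[j - 1]); bottom.append(b[i - 1])
            i -= 1; j -= 1
        elif j > 0 and M[i][j] == M[i][j - 1] + gap:
            top.append(a[j - 1]); bottom.append("-")
            j -= 1
        else:
            top.append("-"); bottom.append(b[i - 1])
            i -= 1
    return M, "".join(reversed(top)), "".join(reversed(bottom))


M, top, bottom = needleman_wunsch("GCAT", "GAT")
for row in M:
    print(row)
print(top)
print(bottom)
print("score =", M[-1][-1])
```

For the GCAT/GAT example this yields the alignment GCAT over G-AT with a score of 2 in the bottom-right cell of the matrix.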
SLIDE 28 Now try on your own
Align ATGCT and ATTACA
Scoring: match = 1, mismatch = -1, gap = -1
[Figure: empty scoring matrix, ATTACA across the columns and ATGCT down the rows]
SLIDE 29
Multiple sequence alignment (MSA)
SLIDE 30 Software to generate MSAs
MAFFT (very good, very fast): http://mafft.cbrc.jp/alignment/software/
Clustal Omega (very good, very fast): http://www.ebi.ac.uk/Tools/msa/clustalo/
PRANK (extremely good, very slow): http://wasabiapp.org/software/prank/
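These tools are command-line programs, so they can be driven from a script. The sketch below wraps a basic MAFFT call in Python. It assumes the `mafft` executable is installed and on the PATH; the helper names and the file name `sequences.fasta` are hypothetical. MAFFT writes the alignment in FASTA format to standard output, and `--auto` lets it pick a strategy based on the input size.

```python
import subprocess


def mafft_command(fasta_path):
    """Build the argument list for a basic MAFFT run on one FASTA file."""
    return ["mafft", "--auto", fasta_path]


def run_mafft(fasta_path):
    """Run MAFFT and return the aligned sequences as FASTA text.

    Assumes the `mafft` executable is installed and on PATH;
    raises CalledProcessError if MAFFT exits with an error.
    """
    result = subprocess.run(mafft_command(fasta_path),
                            capture_output=True, text=True, check=True)
    return result.stdout


# Example with a hypothetical input file:
# aligned = run_mafft("sequences.fasta")
# print(aligned)
```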