SLIDE 1
Global and local alignments
SLIDE 2 Global vs. local alignments
- Global: align all nucleotides
- Local: align subsequences with best score
Align these sequences: GCAT, GCT (match = 1, mismatch = -1, gap = -1)
global alignment:
GCAT
GC-T
local alignment:
?
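The scoring scheme on this slide can be checked by summing the column scores of an alignment. The following is a minimal sketch; the function name `alignment_score` is an illustrative assumption and is not part of the slides.

```python
def alignment_score(top, bottom, match=1, mismatch=-1, gap=-1):
    """Sum the column scores of two equal-length aligned strings ('-' marks a gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += gap        # a column containing a gap
        elif x == y:
            total += match      # identical nucleotides
        else:
            total += mismatch   # different nucleotides
    return total

print(alignment_score("GCAT", "GC-T"))  # 1 + 1 - 1 + 1 = 2
print(alignment_score("GCAT", "GCT-"))  # 1 + 1 - 1 - 1 = 0
```

Placing the gap opposite the A scores higher than placing it at the end, which is why GCAT / GC-T is the global alignment shown.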
SLIDE 3 We can make local alignments using the Smith-Waterman algorithm
Like Needleman-Wunsch, with 2 changes:
- Don't allow negative scores, set them to 0
- Backtrack from cell with highest score, stop at 0
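The two changes can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `smith_waterman`, the linear gap score, and the tie-breaking rule (keep the first highest cell found while scanning row by row) are all choices made for this sketch.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    Sequence a runs along the columns, sequence b along the rows.
    """
    rows, cols = len(b) + 1, len(a) + 1
    M = [[0] * cols for _ in range(rows)]  # first row and first column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[j - 1] == b[i - 1] else mismatch
            M[i][j] = max(
                0,                    # change 1: negative scores are set to 0
                M[i - 1][j] + gap,    # from the cell above (gap in a)
                M[i][j - 1] + gap,    # from the cell to the left (gap in b)
                M[i - 1][j - 1] + s,  # from the diagonal (match or mismatch)
            )
            if M[i][j] > best:        # strict '>' keeps the first maximum found
                best, best_pos = M[i][j], (i, j)

    # change 2: backtrack from the highest-scoring cell and stop at 0
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and M[i][j] > 0:
        s = match if a[j - 1] == b[i - 1] else mismatch
        if M[i][j] == M[i - 1][j - 1] + s:
            out_a.append(a[j - 1])
            out_b.append(b[i - 1])
            i, j = i - 1, j - 1
        elif M[i][j] == M[i - 1][j] + gap:
            out_a.append("-")
            out_b.append(b[i - 1])
            i -= 1
        else:
            out_a.append(a[j - 1])
            out_b.append("-")
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

print(smith_waterman("GCAT", "GCT"))  # (2, 'GC', 'GC')
```

When several cells share the highest score, this sketch reports only one of the equally good local alignments; a fuller version would backtrack from every tied cell.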
SLIDE 4 We can make local alignments using the Smith-Waterman algorithm
Like Needleman-Wunsch, with 2 changes:
- Don't allow negative scores, set them to 0
- Backtrack from cell with highest score, stop at 0
Needleman-Wunsch score matrix (columns: GCAT, rows: GCT):
       -    G    C    A    T
  -    0   -1   -2   -3   -4
  G   -1    1    0   -1   -2
  C   -2    0    2    1    0
  T   -3   -1    1    1    2
Global alignment:
GCAT
GC-T
SLIDE 5 We can make local alignments using the Smith-Waterman algorithm
Like Needleman-Wunsch, with 2 changes:
- Don't allow negative scores, set them to 0
- Backtrack from cell with highest score, stop at 0
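Needleman-Wunsch score matrix (columns: GCAT, rows: GCT):
       -    G    C    A    T
  -    0   -1   -2   -3   -4
  G   -1    1    0   -1   -2
  C   -2    0    2    1    0
  T   -3   -1    1    1    2
Global alignment:
GCAT
GC-T
Smith-Waterman score matrix (columns: GCAT, rows: GCT):
       -    G    C    A    T
  -    0    0    0    0    0
  G    0    1    0    0    0
  C    0    0    2    1    0
  T    0    0    1    1    2
Local alignment:
GC
GC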
SLIDE 6 We can make local alignments using the Smith-Waterman algorithm
Like Needleman-Wunsch, with 2 changes:
- Don't allow negative scores, set them to 0
- Backtrack from cell with highest score, stop at 0
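Needleman-Wunsch score matrix (columns: GCAT, rows: GCT):
       -    G    C    A    T
  -    0   -1   -2   -3   -4
  G   -1    1    0   -1   -2
  C   -2    0    2    1    0
  T   -3   -1    1    1    2
Global alignment:
GCAT
GC-T
Smith-Waterman score matrix (columns: GCAT, rows: GCT):
       -    G    C    A    T
  -    0    0    0    0    0
  G    0    1    0    0    0
  C    0    0    2    1    0
  T    0    0    1    1    2
Local alignments (tied at score 2):
GC
GC
and
GCAT
GC-T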
SLIDE 7
Smith-Waterman algorithm, mathematical form
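\[
M(i, j) = \max \left( \begin{array}{ll}
0 & \text{(no negative scores)} \\
M(i-1, j) + p & \text{(top)} \\
M(i, j-1) + p & \text{(left)} \\
M(i-1, j-1) + s(a_j, b_i) & \text{(diagonal)}
\end{array} \right)
\]
\[
M(0, j) = 0 \quad \text{(first row)}, \qquad M(i, 0) = 0 \quad \text{(first column)}
\]
s(a_j, b_i) = match/mismatch score for sites j and i in sequences a and b
p = gap penalty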
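As a worked illustration of the recurrence, here are two cells for a = GCAT (columns) and b = GCT (rows) with match = 1, mismatch = -1 and p = -1. The sketch assumes the maximum also includes 0, following the rule that negative scores are set to 0. In the first cell all three moves give a negative score, so the cell is set to 0; in the second cell the diagonal move wins.

```latex
\begin{align*}
M(1,3) &= \max\bigl(0,\; M(0,3)+p,\; M(1,2)+p,\; M(0,2)+s(a_3,b_1)\bigr) \\
       &= \max(0,\; 0-1,\; 0-1,\; 0-1) = 0
       && (a_3 = \mathrm{A},\ b_1 = \mathrm{G}: \text{mismatch}) \\
M(2,2) &= \max\bigl(0,\; M(1,2)+p,\; M(2,1)+p,\; M(1,1)+s(a_2,b_2)\bigr) \\
       &= \max(0,\; 0-1,\; 0-1,\; 1+1) = 2
       && (a_2 = \mathrm{C},\ b_2 = \mathrm{C}: \text{match})
\end{align*}
```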
SLIDE 8
BLAST (Basic Local Alignment Search Tool)
SLIDE 9
BLAST is the primary method for finding sequences in modern sequence databases
SLIDE 10 Image from: http://www.ncbi.nlm.nih.gov/books/NBK62051/
SLIDE 11 Primary BLAST quality metric: E value
The Expectation value or E value represents the number of different alignments with scores equivalent to or better than the one observed that are expected to occur in a database search by chance. The lower the E value, the more significant the score and the alignment.
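To make the definition concrete, the sketch below uses the commonly cited Karlin-Altschul relation E = m * n * 2^(-S'), where S' is the bit score, m the query length and n the database length. This formula is background that does not appear on the slides, and the function name `expected_hits` and the example numbers are illustrative assumptions. BLAST itself applies further corrections (such as effective lengths), so treat this as a rough illustration only.

```python
def expected_hits(bit_score, query_len, db_len):
    """Approximate E value: E = m * n * 2**(-S'), with S' the bit score.

    Assumed relation for illustration; not taken from the slides.
    """
    return query_len * db_len * 2.0 ** (-bit_score)

# Hypothetical search: a 100-residue query against a 1,000,000-residue database.
weak = expected_hits(bit_score=30, query_len=100, db_len=1_000_000)
strong = expected_hits(bit_score=60, query_len=100, db_len=1_000_000)
print(weak > strong)  # True: a higher score gives a lower E value
```

Under this relation the E value also grows in proportion to database size, so the same alignment score is less significant in a larger database.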
SLIDE 12
Anatomy of a BLAST result
SLIDE 13
Anatomy of a BLAST result
sequence we found (subject sequence)
SLIDE 14
Anatomy of a BLAST result
E value
SLIDE 15
Anatomy of a BLAST result
number and % of exact matches, near matches, and no matches
SLIDE 16
Anatomy of a BLAST result
number and % of exact matches, near matches, and no matches
exact match
SLIDE 17
Anatomy of a BLAST result
number and % of exact matches, near matches, and no matches
near match (positive)
SLIDE 18
Anatomy of a BLAST result
number and % of exact matches, near matches, and no matches
no match
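The three categories can be tallied from an aligned pair as in the following sketch. It is an illustration under stated assumptions: the function name `match_summary` is hypothetical, and the `similar` argument is a toy stand-in for the substitution matrix that BLAST uses to decide which residue pairs count as near matches.

```python
def match_summary(query, subject, similar=frozenset()):
    """Return {category: (count, percent)} for an aligned pair of equal length.

    Columns with identical residues are 'exact'; columns whose residue pair is
    listed in `similar` are 'near'; everything else, including columns that
    contain a gap character, is 'none'.
    """
    if len(query) != len(subject) or not query:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    counts = {"exact": 0, "near": 0, "none": 0}
    for q, s in zip(query, subject):
        if q == s:
            counts["exact"] += 1
        elif frozenset((q, s)) in similar:
            counts["near"] += 1
        else:
            counts["none"] += 1
    total = len(query)
    return {name: (n, 100.0 * n / total) for name, n in counts.items()}

# Toy similarity set (assumption): treat I/L and K/R as near matches.
toy_similar = {frozenset("IL"), frozenset("KR")}
print(match_summary("MKIL", "MRIV", toy_similar))
# {'exact': (2, 50.0), 'near': (1, 25.0), 'none': (1, 25.0)}
```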
SLIDE 19
Anatomy of a BLAST result