Introduction to bioinformatics, Autumn 2007 41
Sequence alignment
l
Alignment specifies which positions in two sequences match
acgtctag |||||
- actctag
Sequence alignment Alignment specifies which positions in two - - PowerPoint PPT Presentation
Sequence alignment Alignment specifies which positions in two sequences l match acgtctag acgtctag acgtctag || ||||| || ||||| actctag- -actctag ac-tctag 2 matches 5 matches 7 matches 5 mismatches 2 mismatches 0 mismatches 1 not
Introduction to bioinformatics, Autumn 2007 41
l
Introduction to bioinformatics, Autumn 2007 42
l
− We can’t tell whether the ancestor sequence had a base or
Introduction to bioinformatics, Autumn 2007 43
l
l
l
l
Introduction to bioinformatics, Autumn 2007 44
l
l
l
l
Introduction to bioinformatics, Autumn 2007 45
l
l
l
− Identity (match) +1 − Substitution (mismatch) -µ − Indel
S(WHAT/WH-Y) = 1 + 1 – – µ
Introduction to bioinformatics, Autumn 2007 46
l
l
l
Introduction to bioinformatics, Autumn 2007 47
l
l
l
− 50, 20, 5 and 2 cents
l
Introduction to bioinformatics, Autumn 2007 48
50 20 5
Introduction to bioinformatics, Autumn 2007 49
l
− Example: solve the problem for 9 cents with available coins
l
l
Introduction to bioinformatics, Autumn 2007 50
Amount of change left
l
l
Introduction to bioinformatics, Autumn 2007 51
Introduction to bioinformatics, Autumn 2007 52
2--µ 2-
Introduction to bioinformatics, Autumn 2007 53
Case 2 Case 3
Introduction to bioinformatics, Autumn 2007 54
Case 2 Case 3
Introduction to bioinformatics, Autumn 2007 55
l Any alignment can be written
l Score for aligning A and B up
Introduction to bioinformatics, Autumn 2007 56
l
− Case 1: (a1a2…ai-1) ai
− Case 2: (a1a2…ai-1) ai
− Case 3: (a1a2…ai) –
Introduction to bioinformatics, Autumn 2007 57
l
− Case 1: (a1a2…ai-1) ai
− Case 2: (a1a2…ai-1) ai
− Case 3: (a1a2…ai) –
Introduction to bioinformatics, Autumn 2007 58
Introduction to bioinformatics, Autumn 2007 59
I nput sequences A, B, n = | A|, m = |B| Set Si,0 := -i f or all i Set S0,j := -j f or all j f or i := 1 t o n f or j := 1 t o m Si,j := max{Si-1,j – , Si-1,j -1 + s(ai,bj), Si,j-1 – } end end
Introduction to bioinformatics, Autumn 2007 60
Introduction to bioinformatics, Autumn 2007 61
Introduction to bioinformatics, Autumn 2007 62
l
l
l
l
Introduction to bioinformatics, Autumn 2007 63
Human bone morphogenic protein receptor type II precursor (left) has a 300 aa region that resembles 291 aa region in TGF- receptor (right). The shared function here is protein kinase.
Introduction to bioinformatics, Autumn 2007 64
Introduction to bioinformatics, Autumn 2007 65
l
− Look for the highest-scoring path in the alignment matrix
− Allow preceding and trailing indels without penalty
Introduction to bioinformatics, Autumn 2007 66
Introduction to bioinformatics, Autumn 2007 67
Introduction to bioinformatics, Autumn 2007 68
Introduction to bioinformatics, Autumn 2007 69
Introduction to bioinformatics, Autumn 2007 70
G 8 G 7 A 6 A 5 T 4 C 3 C 2 A 1
C T A A C T C G G
9 8 7 6 5 4 3 2 1
Introduction to bioinformatics, Autumn 2007 71
2 4 3 2 1 2 4 2 G 8 1 3 5 4 3 2 2 G 7 3 2 4 6 5 1 A 6 3 1 1 3 4 3 2 A 5 2 1 2 1 2 4 T 4 1 3 1 2 1 2 C 3 2 1 1 2 2 C 2 2 2 2 A 1
C T A A C T C G G
9 8 7 6 5 4 3 2 1
Introduction to bioinformatics, Autumn 2007 72
l
l
− use non-uniform mismatch
Introduction to bioinformatics, Autumn 2007 73
l
l
l
− In coding regions, insertions or deletions of codons may
Introduction to bioinformatics, Autumn 2007 74
l
l
l
Introduction to bioinformatics, Autumn 2007 75
l
l
l
l
Introduction to bioinformatics, Autumn 2007 76
l
l
l
l
Introduction to bioinformatics, Autumn 2007 77
– Orthologous sequences from different organisms – Paralogs from multiple duplications
Introduction to bioinformatics, Autumn 2007 78
l
l
l
l
l
Introduction to bioinformatics, Autumn 2007 79
l
l
− Choose two sequences and align them − Choose third sequence w.r.t. two previous sequences and
− Repeat until all sequences have been aligned − Different options how to choose sequences and score
Introduction to bioinformatics, Autumn 2007 80
l
− Construct a distance matrix of all pairs of sequences using
− Progressively align pairs in order of decreasing similarity − CLUSTALW uses various heuristics to contribute to
Introduction to bioinformatics, Autumn 2007 81
l
l
l
Introduction to bioinformatics, Autumn 2007 82
l