Multiple Sequence Alignment
based on Ch. 6 from Biological Sequence Analysis by R. Durbin et al., 1998
Acknowledgements: M.Sc. student Diana Popovici, M.Sc. student Oana Rățoi
[ MHC class I with peptide ] MHC = Major Histocompatibility Complex
At the top: α-helices (A-H). At the bottom: highly conserved residues (uppercase letter), medium conservation (lowercase letter), or low conservation (dot).
Note the two highly conserved histidines (H): they interact with the heme group in the globin active site.
Helix       AAAAAAAAAAAAAAAA BBBBBBBBBBBBBBBBCCCCCCCCCCC
HBA_HUMAN
HBB_HUMAN
MYG_PHYCA
GLB3_CHITP  ----------LSADQISTVQASFDKVKG------DPVGILYAVFKADPSIMAKFTQF
GLB5_PETMA  PIVDTGSVAPLSAAEKTKIRSAWAPVYS--TYETSGVDILVKFFTSTPAAQEFFPKF
LGB2_LUPLU  --------GALTESQAALVKSSWEEFNA--NIPKHTHRFFILVLEIAPAAKDLFS-F
GLB1_GLYDI  ---------GLSAAQRQVIAATWKDIAGADNGAGVGKDCLIKFLSAHPQMAAVFG-F
Consensus   Ls.... v a W kv . . g . L.. f . P . F F

Helix       DDDDDDDEEEEEEEEEEEEEEEEEEEEE FFFFFFFFFFFF
HBA_HUMAN
HBB_HUMAN   GDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHL---D--NLKGTFATLSELHCDKL-
MYG_PHYCA   KHLKTEAEMKASEDLKKHGVTVLTALGAILKK----K-GHHEAELKPLAQSHATKH-
GLB3_CHITP  AG-KDLESIKGTAPFETHANRIVGFFSKIIGEL--P---NIEADVNTFVASHKPRG-
GLB5_PETMA  KGLTTADQLKKSADVRWHAERIINAVNDAVASM--DDTEKMSMKLRDLSGKHAKSF-
LGB2_LUPLU  LK-GTSEVPQNNPELQAHAGKVFKLVYEAAIQLQVTGVVVTDATLKNLGSVHVSKG-
GLB1_GLYDI  SG----AS---DPGVAALGAKVLAQIGVAVSHL--GDEGKMVAQMKAVGVRHKGYGN
Consensus   . t .. . v..Hg KV. a a...l d . a l. l H .

Helix       FFGGGGGGGGGGGGGGGGGGG HHHHHHHHHHHHHHHHHHHHHHHHHH
HBA_HUMAN
HBB_HUMAN
MYG_PHYCA
GLB3_CHITP  --VTHDQLNNFRAGFVSYMKAHT--DFA-GAEAAWGATLDTFFGMIFSKM-------
GLB5_PETMA  -QVDPQYFKVLAAVIADTVAAG---------DAGFEKLMSMICILLRSAY-------
LGB2_LUPLU  --VADAHFPVVKEAILKTIKEVVGAKWSEELNSAWTIAYDELAIVIKKEMNDAA---
GLB1_GLYDI  KHIKAQYFEPLGASLLSAMEHRIGGKMNAAAKDAWAAAYADISGALISGLQS-----
Consensus   v. f l . .. .... f . aa. k.. l sky
structure:  ...aaaaa...bbbbbbbbbb.....cccccccCCC..C........ddd
1tlk        ILDMDVVEGSAARFDCKVEGY--PDPEVMWFKDDNP--VKESR----HFQ
AXO1_RAT    RDPVKTHEGWGVMLPCNPPAHY-PGLSYRWLLNEFPNFIPTDGR---HFV
AXO1_RAT    ISDTEADIGSNLRWGCAAAGK--PRPMVRWLRNGEP--LASQN----RVE
AXO1_RAT    RRLIPAARGGEISILCQPRAA--PKATILWSKGTEI--LGNST----RVT
AXO1_RAT
NCA2_HUMAN  PTPQEFREGEDAVIVCDVVSS--LPPTIIWKHKGRD--VILKKDV--RFI
NCA2_HUMAN  PSQGEISVGESKFFLCQVAGDA-KDKDISWFSPNGEK-LTPNQQ---RIS
NCA2_HUMAN  IVNATANLGOSVTLVCDAEGF--PEPTMSWTKDGEQ--IEQEEDDE-KYI
NRG_DROME   RRQSLALRGKRMELFCIYGGT--PLPQTVWSKDGQR--IQWSD----RIT
NRG_DROME   PQNYEVAAGQSATFRCNEAHDDTLEIEIDWWKDGQS--IDFEAQP--RFV
consensus:  ........G..+.+.C.+.........+.W........+.........++

structure:  ddd.....eeeeee.......fffffffff.......gggggggggggg.
1tlk        IDYDEEGNCSLTISEVCGDDDAKYTCKAVNSL-----GEATCTAELLVET
AXO1_RAT    SQTT----GNLYIARTNASDLGNYSCLATSHMDFSTKSVFSKFAQLNLAA
AXO1_RAT    VLA-----GDLRFSKLSLEDSGMYQCVAENKH-----GTIYASAELAVQA
AXO1_RAT    VTSD----GTLIIRNISRSDEGKYTCFAENFM-----GKANSTGILSVRD
AXO1_RAT    AKETI---GDLTILNAHVRHGGKYTCMAQTVV-----DGTSKEATVLVRG
NCA2_HUMAN  VLSN----NYLQIRGIKKTDEGTYRCEGRILARG---EINFKDIQVIVNV
NCA2_HUMAN  VVWNDDSSSTLTIYNANIDDAGIYKCVVTGEDG----SESEATVNVKIFQ
NCA2_HUMAN  FSDDSS---QLTIKKVDKNDEAEYICIAENKA-----GEQDATIHLKVFA
NRG_DROME   QGHYG---KSLVIRQTNFDDAGTYTCDVSNGVG----NAQSFSIILNVNS
NRG_DROME   KTND----NSLTIAKTMELDSGEYTCVARTRL-----DEATARANLIVQD
consensus:  ..........L.+..+...+.+.Y.C.................+.+.+..
$S(m_i) = \sum_{k<l} s(m_i^k, m_i^l)$, where $m_i^k$ is the symbol in column $i$ for sequence $k$.
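A minimal sketch of the sum-of-pairs column score, for illustration only. The pair score `s` below is a hypothetical toy function (match, mismatch and gap values are assumptions, not taken from a PAM or BLOSUM matrix), and gaps are written with the ASCII character `-`.

```python
from itertools import combinations


def s(a, b, match=1, mismatch=-1, gap=-2):
    """Hypothetical pair score; '-' denotes a gap (values are assumptions)."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def sp_column_score(column):
    """Sum s(m_i^k, m_i^l) over all pairs k < l of symbols in one column."""
    return sum(s(a, b) for a, b in combinations(column, 2))


def sp_alignment_score(rows):
    """Sum the column scores over all columns of an alignment given as rows."""
    return sum(sp_column_score(col) for col in zip(*rows))
```

Here `sp_alignment_score(["GAAGTT", "GAC-TT", "GAACTG", "GTACTG"])` walks the six columns of a four-sequence alignment and adds up the six pairwise scores in each of them.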
[ Equation fragment: … $s(m_i^k, m_i^l)$ = … $s(m_i^k, m_i^l)$ + … $s(m_i^k, m_i^l)$ + … $s(m_i^k, m_i^l)$ ]
(http://www-lmmb.ncifcrf.gov/~toms/sequencelogo.html)
$V(0, j) = \sum_{k \le j} S(-, k)$ and $V(i, 0) = \sum_{k \le i} s(x_k, -)$
$S(a, j) = \sum_b [s(a, b) \times p(b, j)]$
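The boundary conditions and the column score $S(a, j) = \sum_b s(a, b) \times p(b, j)$ can be sketched as a sequence-to-profile alignment. This is an illustration under stated assumptions: the pair score `s` uses hypothetical toy values, `build_profile` treats the gap `-` as an ordinary symbol when computing the frequencies $p(b, j)$, and the inner recurrence is the usual global alignment recurrence, which the slide text above does not spell out.

```python
from collections import Counter


def s(a, b, match=2, mismatch=-1, gap=-2):
    """Hypothetical pair score; '-' denotes a gap (values are assumptions)."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def build_profile(rows):
    """One dict per column j, mapping symbol b to its frequency p(b, j)."""
    n = len(rows)
    return [{b: c / n for b, c in Counter(col).items()} for col in zip(*rows)]


def S(a, column):
    """S(a, j) = sum over b of s(a, b) * p(b, j) for one profile column."""
    return sum(s(a, b) * p for b, p in column.items())


def align_to_profile(x, profile):
    """Best global score of sequence x against the profile."""
    n, m = len(x), len(profile)
    V = [[0.0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):          # V(0, j) = sum_{k<=j} S(-, k)
        V[0][j] = V[0][j - 1] + S("-", profile[j - 1])
    for i in range(1, n + 1):          # V(i, 0) = sum_{k<=i} s(x_k, -)
        V[i][0] = V[i - 1][0] + s(x[i - 1], "-")
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            V[i][j] = max(
                V[i - 1][j - 1] + S(x[i - 1], profile[j - 1]),  # x_i vs column j
                V[i - 1][j] + s(x[i - 1], "-"),                 # x_i vs gap
                V[i][j - 1] + S("-", profile[j - 1]),           # gap vs column j
            )
    return V[n][m]
```

Because a profile column is scored as a frequency-weighted average of pair scores, the table has the same size as in pairwise alignment, regardless of how many sequences went into the profile.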
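$m_i = (m_i^1, \ldots, m_i^N)$: the aligned symbols in column $i$;
$m_i^j$: the symbol in column $i$ for sequence $j$;
$c_{ia} = \sum_j \delta(m_i^j = a)$, where $\delta(m_i^j = a)$ is 1 if $m_i^j = a$ and 0 otherwise;
$c_i = (c_{i1}, \ldots, c_{iK})$: the counts of observed symbols in column $i$.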
$P(m_i) = \prod_a p_{ia}^{c_{ia}}$
$S(m_i) = -\sum_a c_{ia} \log p_{ia}$
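As a small illustration, the entropy-style column score $-\sum_a c_{ia} \log p_{ia}$ can be computed directly from the symbol counts. The sketch assumes that $p_{ia}$ is estimated as the observed frequency $c_{ia} / \sum_{a'} c_{ia'}$, that logarithms are base 2, and that gaps are counted like any other symbol; none of these choices is fixed by the text above.

```python
from collections import Counter
from math import log2


def entropy_score(column):
    """Return -sum_a c_ia * log2(p_ia) with p_ia = c_ia / (number of rows)."""
    counts = Counter(column)           # c_ia for every symbol a in the column
    total = sum(counts.values())
    return -sum(c * log2(c / total) for c in counts.values())
```

A fully conserved column such as `"AAAA"` scores 0, and the score grows as the column becomes more variable, so lower values indicate better-conserved columns.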
$S(m) = \sum_i S(m_i)$.
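$\alpha_{i_1, i_2, \ldots, i_N}$: the maximum score of an alignment of the subsequences ending with $x^1_{i_1}, \ldots, x^N_{i_N}$.

$$
\alpha_{i_1, i_2, \ldots, i_N} = \max \begin{cases}
\alpha_{i_1-1,\, i_2-1,\, \ldots,\, i_N-1} + S(x^1_{i_1}, x^2_{i_2}, \ldots, x^N_{i_N}), \\
\alpha_{i_1,\, i_2-1,\, \ldots,\, i_N-1} + S(-, x^2_{i_2}, \ldots, x^N_{i_N}), \\
\alpha_{i_1-1,\, i_2,\, \ldots,\, i_N-1} + S(x^1_{i_1}, -, \ldots, x^N_{i_N}), \\
\quad\vdots \\
\alpha_{i_1-1,\, i_2-1,\, \ldots,\, i_N} + S(x^1_{i_1}, x^2_{i_2}, \ldots, -), \\
\alpha_{i_1,\, i_2,\, \ldots,\, i_N-1} + S(-, -, \ldots, x^N_{i_N}), \\
\quad\vdots \\
\alpha_{i_1,\, i_2-1,\, \ldots,\, i_N} + S(-, x^2_{i_2}, \ldots, -), \\
\quad\vdots
\end{cases}
$$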
[ Three-dimensional dynamic programming: cell (i,j,k) and its seven predecessors (i−1,j,k), (i,j−1,k), (i,j,k−1), (i−1,j−1,k), (i−1,j,k−1), (i,j−1,k−1), (i−1,j−1,k−1) ]
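$$
\alpha_{i_1, \ldots, i_N} = \max_{\Delta_1 + \cdots + \Delta_N > 0} \left\{ \alpha_{i_1 - \Delta_1, \ldots, i_N - \Delta_N} + S(\Delta_1 \cdot x^1_{i_1}, \ldots, \Delta_N \cdot x^N_{i_N}) \right\}
$$
where each $\Delta_j \in \{0, 1\}$, and $\Delta_j \cdot x = x$ if $\Delta_j = 1$ and $-$ (a gap) if $\Delta_j = 0$.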
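The compact recurrence over the $\Delta$ vectors translates almost literally into code. The sketch below is an illustration under assumptions: the column score $S$ is taken to be a sum of pairs with hypothetical toy pair scores, and only the optimal score is returned, not the alignment. It enumerates all $2^N - 1$ non-zero $\Delta$ vectors per cell, so it is usable only for a few short sequences.

```python
from functools import lru_cache
from itertools import combinations, product


def s(a, b, match=1, mismatch=-1, gap=-2):
    """Hypothetical pair score; '-' denotes a gap (values are assumptions)."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def column_score(column):
    """Assumed column score S: sum of pairs over the symbols of the column."""
    return sum(s(a, b) for a, b in combinations(column, 2))


def msa_score(seqs):
    """Optimal multiple alignment score alpha at (len(x^1), ..., len(x^N))."""
    n = len(seqs)
    # All Delta vectors in {0,1}^N with Delta_1 + ... + Delta_N > 0.
    deltas = [d for d in product((0, 1), repeat=n) if any(d)]

    @lru_cache(maxsize=None)
    def alpha(idx):
        if not any(idx):               # alpha_{0,...,0} = 0
            return 0
        best = None
        for d in deltas:
            prev = tuple(i - di for i, di in zip(idx, d))
            if min(prev) < 0:          # would step outside the table
                continue
            # Delta_j * x^j_{i_j}: the residue if Delta_j = 1, else a gap.
            column = [x[i - 1] if di else "-" for x, i, di in zip(seqs, idx, d)]
            cand = alpha(prev) + column_score(column)
            if best is None or cand > best:
                best = cand
        return best

    return alpha(tuple(len(x) for x in seqs))
```

The recursion is memoised with `lru_cache`, which keeps the code close to the formula; the recursion depth grows with the total sequence length, so an iterative table fill would be the safer choice for anything beyond toy inputs.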
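$S(a) = \sum_{k<l} S(a^{kl})$, where $a^{kl}$ is the pairwise alignment of sequences $k$ and $l$ induced by $a$.
$\sigma(a) \le \sum_{k'<l'} S(a^{k'l'}) \le S(a^{kl}) - S(\hat{a}^{kl}) + \sum_{k'<l'} S(\hat{a}^{k'l'})$, where $\hat{a}^{kl}$ is the optimal pairwise alignment of $k$ and $l$, and $\sigma(a)$ is a lower bound on the score of the optimal multiple alignment.
$S(a^{kl}) \ge \sigma(a) + S(\hat{a}^{kl}) - \sum_{k'<l'} S(\hat{a}^{k'l'})$
$\beta^{kl} = \sigma(a) + S(\hat{a}^{kl}) - \sum_{k'<l'} S(\hat{a}^{k'l'})$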
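A short worked sketch of the pairwise bound, assuming the usual Carrillo-Lipman form $\beta^{kl} = \sigma(a) + S(\hat{a}^{kl}) - \sum_{k'<l'} S(\hat{a}^{k'l'})$. The function name, the dictionary layout and the numbers in the usage note are hypothetical; the optimal pairwise scores and the lower bound $\sigma$ are assumed to have been computed elsewhere.

```python
def carrillo_lipman_bounds(pair_opt, sigma):
    """Return beta^{kl} for every pair (k, l).

    pair_opt maps (k, l) with k < l to the optimal pairwise score S(a-hat^{kl});
    sigma is a lower bound on the score of the optimal multiple alignment.
    """
    total = sum(pair_opt.values())     # sum over all pairs k' < l'
    return {pair: sigma + score - total for pair, score in pair_opt.items()}
```

For example, with optimal pairwise scores 10, 8 and 6 for three sequences and a lower bound of 20, the bounds are 6, 4 and 2: only cells whose induced pairwise alignment of sequences $k$ and $l$ can still score at least $\beta^{kl}$ need to be explored in the multidimensional table.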
[ Divide-and-conquer alignment: divide, align optimally, concatenate ]
Acknowledgement: This slide and the previous one are from the Sequence Analysis Master Course, Centre for Integrative Bioinformatics, Vrije Universiteit, Amsterdam.
expected to decay exponentially towards 0 with increasing evolutionary distance, hence the −log, which makes the measure approximately linear with evolutionary distance.
GAAGTT
GAC−TT
GAACTG
GTACTG
Acknowledgement: This slide is from Serafim Batzoglou, Bioinformatics Course, Stanford University.