Michael Schroeder
Biotechnology Center TU Dresden
Introduction Applied Bioinformatics
DNA the molecule of life
http://www.ornl.gov/hgmis
High-throughput Technology
1950s: Watson and Crick
2000s: Sanger Center, Cambridge
2010s: BGI, Beijing
[Figure: New drugs per year and R&D spending, plotted by year (axis ticks 60 to 95)]
Cysteine proteases in kiwi and papaya, respectively. They tenderise meat and break down casein (milk).
Oxygen transport in red blood cells and legumes, respectively
Doolittle et al., Science, 1983: Simian sarcoma virus onc gene, v-sis, is derived from the gene (or genes) encoding a platelet-derived growth factor.
Alignment from: http://pdf.aminer.org/000/244/500/design_and_implementation_of_a_dna_sequence_processor.pdf
>sp|P00674|RNP_HORSE Ribonuclease pancreatic Horse
KESPAMKFERQHMDSGSTSSSNPTYCNQMMKRRNMTQGWCKPVNTFVHEPLADVQAICLQ…
>sp|P00673|RNP_BALAC Ribonuclease pancreatic Minke whale
RESPAMKFQRQHMDSGNSPGNNPNYCNQMMMRRKMTQGRCKPVNTFVHESLEDVKAVCSQ…
>sp|P00686|RNP_MACRU Ribonuclease pancreatic Red kangaroo
ETPAEKFQRQHMDTEHSTASSSNYCNLMMKARDMTSGRCKPLNTFIHEPKSVVDAVCHQE…
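The records above are in FASTA format: a header line starting with ">" followed by the sequence. As a minimal sketch, the Python below reads such records and splits a header of the form shown on the slide into its fields. The function names `parse_fasta` and `uniprot_fields` are made up for this illustration, and the sample sequences are shortened placeholders, not the full proteins.

```python
def parse_fasta(text):
    """Return a dict mapping each header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence data may span several lines; collect all of them.
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def uniprot_fields(header):
    """Split 'db|accession|entry_name description' into its parts."""
    _db, accession, rest = header.split("|", 2)
    entry_name, _, description = rest.partition(" ")
    return accession, entry_name, description


# Shortened sample input (assumption: only the first residues are kept).
sample = """>sp|P00674|RNP_HORSE Ribonuclease pancreatic Horse
KESPAMKFERQHMDSG
STSSSNPTYCNQ
>sp|P00673|RNP_BALAC Ribonuclease pancreatic Minke whale
RESPAMKFQRQHMDSG
"""

for header, sequence in parse_fasta(sample).items():
    accession, entry_name, description = uniprot_fields(header)
    print(accession, entry_name, len(sequence))
```

Keeping the header parsing separate from the file parsing means the reader still works on FASTA files whose headers do not follow the `sp|accession|name` pattern.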
CLUSTAL 2.1 multiple sequence alignment

sp|P00674|RNP_HORSE   KESPAMKFERQHMDSGSTSSSNPTYCNQMMKRRNMTQGWCKPVNTFVHEPLADVQAICLQ 60
sp|P00673|RNP_BALAC   RESPAMKFQRQHMDSGNSPGNNPNYCNQMMMRRKMTQGRCKPVNTFVHESLEDVKAVCSQ 60
sp|P00686|RNP_MACRU
*:** **:*****: :......*** ** *.**.* ***:***:**. *.*:* *

sp|P00674|RNP_HORSE   KNITCKNGQSNCYQSSSSMHITDCRLTSGSKYPNCAYQTSQKERHIIVACEGNPYVPVHF 120
sp|P00673|RNP_BALAC   KNVLCKNGRTNCYESNSTMHITDCRQTGSSKYPNCAYKTSQKEKHIIVACEGNPYVPVHF 120
sp|P00686|RNP_MACRU   ENVTCKNGRTNCYKSNSRLSITNCRQTGASKYPNCQYETSNLNKQIIVACEG-QYVPVHF 118
:*: ****::***:*.* : **:** *..****** *:**: :::******* ******

sp|P00674|RNP_HORSE   DASVEVST 128
sp|P00673|RNP_BALAC   DNSV---- 124
sp|P00686|RNP_MACRU   DAYV---- 122
* *
http://www.genome.jp/tools/clustalw
Number of aligned residues
- Horse and Minke whale: 95
- Minke whale and Red kangaroo: 82
- Horse and Red kangaroo: 75
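Pairwise counts like these can be read off an alignment column by column. The sketch below counts the columns in which two aligned rows carry the same residue, using only the second 60-column block of the ribonuclease alignment. Two things are assumptions here: that the slide's counts mean identical residues, and the helper name `identical_columns`, which is made up for this illustration. Because only one block is used, the printed numbers are not expected to reproduce the totals listed above.

```python
from itertools import combinations

# Second 60-column block of the CLUSTAL alignment
# (rows in the order horse, minke whale, red kangaroo).
block = {
    "Horse": "KNITCKNGQSNCYQSSSSMHITDCRLTSGSKYPNCAYQTSQKERHIIVACEGNPYVPVHF",
    "Minke whale": "KNVLCKNGRTNCYESNSTMHITDCRQTGSSKYPNCAYKTSQKEKHIIVACEGNPYVPVHF",
    "Red kangaroo": "ENVTCKNGRTNCYKSNSRLSITNCRQTGASKYPNCQYETSNLNKQIIVACEG-QYVPVHF",
}


def identical_columns(row_a, row_b, gap="-"):
    """Count alignment columns where both rows hold the same residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    # A column with a gap in both rows is not counted as a match.
    return sum(1 for x, y in zip(row_a, row_b) if x == y and x != gap)


for (name_a, row_a), (name_b, row_b) in combinations(block.items(), 2):
    count = identical_columns(row_a, row_b)
    print(f"{name_a} and {name_b}: {count} identical columns in this block")
```

Summing the per-block counts over all blocks of a full alignment would give one total per species pair.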
Indian elephant: sp|O47885|CYB_ELEMA
Mammoth: sp|P92658|CYB_MAMPR
African elephant: sp|P24958|CYB_LOXAF
Sequence similarity is not equal to homology
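Let $a = a_1 \ldots a_m$ and $b = b_1 \ldots b_n$ be strings. Then $\operatorname{lev}_{a,b} = \operatorname{lev}_{a,b}(m, n)$ is the Levenshtein distance of $a$ and $b$, where

\[
\operatorname{lev}_{a,b}(i, j) =
\begin{cases}
\max(i, j) & \text{if } \min(i, j) = 0, \\
\min
\begin{cases}
\operatorname{lev}_{a,b}(i-1, j) + 1 \\
\operatorname{lev}_{a,b}(i, j-1) + 1 \\
\operatorname{lev}_{a,b}(i-1, j-1) + 1_{(a_i \neq b_j)}
\end{cases}
& \text{otherwise,}
\end{cases}
\]

and $0 \le i \le m$ and $0 \le j \le n$ and

\[
1_{(a_i \neq b_j)} :=
\begin{cases}
1 & \text{if } a_i \neq b_j, \\
0 & \text{if } a_i = b_j.
\end{cases}
\]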
From lectures.molgen.mpg.de/Alg/Intro/
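The Levenshtein recurrence can be evaluated bottom-up by filling a table of size (m+1) x (n+1). The following Python is a minimal sketch of that idea, not code taken from the lecture; the function name `levenshtein` and the table name `lev` are chosen here to mirror the formula.

```python
def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance via the recurrence lev(i, j), filled bottom-up."""
    m, n = len(a), len(b)
    # lev[i][j] is the distance between the prefixes a[:i] and b[:j].
    lev = [[0] * (n + 1) for _ in range(m + 1)]
    # Base case min(i, j) = 0: the distance is max(i, j).
    for i in range(m + 1):
        lev[i][0] = i
    for j in range(n + 1):
        lev[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            mismatch = 1 if a[i - 1] != b[j - 1] else 0  # indicator 1(a_i != b_j)
            lev[i][j] = min(
                lev[i - 1][j] + 1,             # delete a_i
                lev[i][j - 1] + 1,             # insert b_j
                lev[i - 1][j - 1] + mismatch,  # match or substitute
            )
    return lev[m][n]


print(levenshtein("kitten", "sitting"))  # 3
```

The table needs O(mn) time and memory. Since each row depends only on the previous one, keeping two rows would suffice when only the distance, and not the alignment itself, is required.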
Aligning RDISLVKNAGI and RNILVSDAKNVGI
R D I S L V - - - K N A G I
R N I - L V S D A K N V G I
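An alignment like this one can be recovered from an edit-distance table by tracing back from the last cell. The sketch below assumes unit costs for gaps and mismatches, which the slide does not state, and the function name `align` is made up for this illustration. Several alignments can share the same minimal cost, so the output may place gaps differently from the alignment shown above.

```python
def align(a: str, b: str, gap: str = "-"):
    """Global alignment under unit edit costs; returns (cost, row_a, row_b)."""
    m, n = len(a), len(b)
    # cost[i][j] is the minimal number of edits between a[:i] and b[:j].
    cost = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        cost[i][0] = i
    for j in range(n + 1):
        cost[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j] + 1,
                             cost[i][j - 1] + 1,
                             cost[i - 1][j - 1] + sub)

    # Trace back from the bottom-right cell, preferring the diagonal move.
    row_a, row_b = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else 1)):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            row_a.append(a[i - 1])   # residue of a against a gap
            row_b.append(gap)
            i -= 1
        else:
            row_a.append(gap)        # gap against a residue of b
            row_b.append(b[j - 1])
            j -= 1
    return cost[m][n], "".join(reversed(row_a)), "".join(reversed(row_b))


total, top, bottom = align("RDISLVKNAGI", "RNILVSDAKNVGI")
print(total)
print(" ".join(top))
print(" ".join(bottom))
```

Real protein alignments usually replace the unit costs with a substitution matrix and separate gap penalties; the traceback logic stays the same.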
compbio.pbworks.com
Saitou, Kyushu Museum, 2002