Introduction: Applied Bioinformatics, Michael Schroeder



slide-1
SLIDE 1

Michael Schroeder

Biotechnology Center TU Dresden

Introduction

Applied Bioinformatics

slide-2
SLIDE 2

DNA – the molecule of life

http://www.ornl.gov/hgmis


slide-3
SLIDE 3

High-throughput Technology

1950s: Watson and Crick
2000s: Sanger Center, Cambridge
2010s: BGI, Beijing

slide-4
SLIDE 4

Drug Discovery

[Figure: new drugs per year and R&D spending, plotted by year]

slide-5
SLIDE 5

Genetic Code


slide-6
SLIDE 6

Actinidin and Papain

50% sequence ID, same structure

Cysteine proteases in kiwi and papaya, respectively
Tenderise meat and break down casein (milk)

slide-7
SLIDE 7

Hemoglobin and Leghemoglobin

11% sequence ID, same structure

Oxygen transport in red blood cells and legumes, respectively

slide-8
SLIDE 8

Sequence-Structure Relation


slide-9
SLIDE 9

Similar sequences hint at…

  • common ancestry and
  • possibly similar function

slide-10
SLIDE 10

Similar sequence, similar function?

  • Monkey v-sis and human PDGF 85% similar

Doolittle et al., Science, 1983: Simian sarcoma virus onc gene, v-sis, is derived from the gene encoding a platelet-derived growth factor.

  • Hypothesis: Cancer = deregulated growth factor

Alignment from: http://pdf.aminer.org/000/244/500/design_and_implementation_of_a_dna_sequence_processor.pdf

slide-11
SLIDE 11

Similar sequence, common ancestry?

>sp|P00674|RNP_HORSE Ribonuclease pancreatic Horse
KESPAMKFERQHMDSGSTSSSNPTYCNQMMKRRNMTQGWCKPVNTFVHEPLADVQAICLQ…
>sp|P00673|RNP_BALAC Ribonuclease pancreatic Minke whale
RESPAMKFQRQHMDSGNSPGNNPNYCNQMMMRRKMTQGRCKPVNTFVHESLEDVKAVCSQ…
>sp|P00686|RNP_MACRU Ribonuclease pancreatic Red kangaroo
ETPAEKFQRQHMDTEHSTASSSNYCNLMMKARDMTSGRCKPLNTFIHEPKSVVDAVCHQE…
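The three records on this slide are in FASTA format: a header line starting with ">" followed by the sequence. As a minimal sketch of how such records could be read, the hypothetical helper `parse_fasta` below maps each header to its sequence. The example uses only the first 60 residues shown on the slide, since the sequences there are truncated.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each header line (without the leading '>') to its concatenated sequence."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequences may span several lines
    return records


# Truncated sequences copied from the slide (first 60 residues only).
example = """\
>sp|P00674|RNP_HORSE Ribonuclease pancreatic Horse
KESPAMKFERQHMDSGSTSSSNPTYCNQMMKRRNMTQGWCKPVNTFVHEPLADVQAICLQ
>sp|P00673|RNP_BALAC Ribonuclease pancreatic Minke whale
RESPAMKFQRQHMDSGNSPGNNPNYCNQMMMRRKMTQGRCKPVNTFVHESLEDVKAVCSQ
"""

records = parse_fasta(example.splitlines())
for header, seq in records.items():
    accession = header.split("|")[1]  # second field of the sp|ACCESSION|NAME header
    print(accession, len(seq))
```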

slide-12
SLIDE 12

Alignment

CLUSTAL 2.1 multiple sequence alignment

sp|P00674|RNP_HORSE   KESPAMKFERQHMDSGSTSSSNPTYCNQMMKRRNMTQGWCKPVNTFVHEPLADVQAICLQ 60
sp|P00673|RNP_BALAC   RESPAMKFQRQHMDSGNSPGNNPNYCNQMMMRRKMTQGRCKPVNTFVHESLEDVKAVCSQ 60
sp|P00686|RNP_MACRU   -ETPAEKFQRQHMDTEHSTASSSNYCNLMMKARDMTSGRCKPLNTFIHEPKSVVDAVCHQ 59
                       *:** **:*****:  :......*** **  *.**.* ***:***:**.   *.*:* *

sp|P00674|RNP_HORSE   KNITCKNGQSNCYQSSSSMHITDCRLTSGSKYPNCAYQTSQKERHIIVACEGNPYVPVHF 120
sp|P00673|RNP_BALAC   KNVLCKNGRTNCYESNSTMHITDCRQTGSSKYPNCAYKTSQKEKHIIVACEGNPYVPVHF 120
sp|P00686|RNP_MACRU   ENVTCKNGRTNCYKSNSRLSITNCRQTGASKYPNCQYETSNLNKQIIVACEG-QYVPVHF 118
                      :*: ****::***:*.* : **:** *..****** *:**: :::*******  ******

sp|P00674|RNP_HORSE   DASVEVST 128
sp|P00673|RNP_BALAC   DNSV---- 124
sp|P00686|RNP_MACRU   DAYV---- 122
                      *  *

http://www.genome.jp/tools/clustalw

Number of aligned residues
  • Horse and Minke whale: 95
  • Minke whale and Red kangaroo: 82
  • Horse and Red kangaroo: 75
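Pairwise counts like these can be read off the rows of a multiple alignment. The sketch below counts identical, non-gap columns for each pair in the first 60-column block only, so its numbers are not expected to match the full-length counts on the slide. The function name `identical_columns` is hypothetical, and the leading gap in the kangaroo row is an assumption inferred from its residue count of 59.

```python
from itertools import combinations


def identical_columns(row_a: str, row_b: str) -> int:
    """Count alignment columns where both rows carry the same residue (gaps never count)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for x, y in zip(row_a, row_b) if x == y and x != "-")


# First block of the CLUSTAL alignment; '-' marks a gap.
block = {
    "horse": "KESPAMKFERQHMDSGSTSSSNPTYCNQMMKRRNMTQGWCKPVNTFVHEPLADVQAICLQ",
    "minke whale": "RESPAMKFQRQHMDSGNSPGNNPNYCNQMMMRRKMTQGRCKPVNTFVHESLEDVKAVCSQ",
    "red kangaroo": "-ETPAEKFQRQHMDTEHSTASSSNYCNLMMKARDMTSGRCKPLNTFIHEPKSVVDAVCHQ",
}

for a, b in combinations(block, 2):
    n = identical_columns(block[a], block[b])
    print(f"{a} vs {b}: {n} identical of {len(block[a])} columns")
```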

slide-13
SLIDE 13

Similar sequence, common ancestry?


slide-14
SLIDE 14

Indian elephant: sp|O47885|CYB_ELEMA
Mammoth: sp|P92658|CYB_MAMPR
African elephant: sp|P24958|CYB_LOXAF

slide-15
SLIDE 15

Elephant and Mammoth

Mammoth-African elephant: 10 mismatches
Mammoth-Indian elephant: 14 mismatches
Significant?

slide-16
SLIDE 16

Similarity implies homology

Sequence similarity is not equal to homology


slide-17
SLIDE 17

Similarity usually implies homology

  • Conservation: Sequences similar in many species
  • Convergent evolution
  • Mutation rate varied
  • Horizontal gene transfer

slide-18
SLIDE 18

Homologue, Orthologue, Paralogue

slide-19
SLIDE 19

Darwin's Tree of Life

slide-20
SLIDE 20

Tree of Life with 2.3 Million Species

opentreeoflife.org

slide-21
SLIDE 21

Sequence Alignments

  • Why compare and align sequences?
  • How to judge an alignment?
  • How to compute an alignment?
  • How to compute an alignment fast?

slide-22
SLIDE 22

How to judge an alignment

  • Scoring scheme
      • number of matches, mismatches, gaps
      • substitution matrices
  • Significance
      • E-value, P-value, Z-score
  • Structure
      • Benchmark sequence alignment against structure alignment
  • Function
      • Benchmark: does sequence alignment imply similar function?
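The simplest scoring scheme listed above counts matches, mismatches and gaps. A minimal sketch follows; the function name `score_alignment` and the score values (+1, -1, -2) are illustrative assumptions, not values taken from the lecture. A substitution matrix would replace the fixed match and mismatch scores with a lookup per residue pair.

```python
def score_alignment(row_a: str, row_b: str,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Sum column scores of a pairwise alignment; '-' marks a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    score = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            score += gap        # any column containing a gap
        elif x == y:
            score += match      # identical residues
        else:
            score += mismatch   # two different residues
    return score


# 6 matches and 1 gap: 6 * 1 + (-2) = 4
print(score_alignment("GATTACA", "GA-TACA"))  # 4
```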

slide-23
SLIDE 23

Sequence Alignments

  • Why compare and align sequences?
  • How to judge an alignment?
  • How to compute an alignment?
  • How to compute an alignment fast?

slide-24
SLIDE 24

Levenshtein (or Edit) Distance

Minimum number of insertions, deletions, and replacements to convert string a into string b


slide-25
SLIDE 25

Levenshtein (or Edit) Distance

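Let $a = a_1 \ldots a_m$ and $b = b_1 \ldots b_n$ be strings. Then $\mathrm{lev}_{a,b} = \mathrm{lev}_{a,b}(m, n)$ is the Levenshtein distance of $a$ and $b$, where

\[
\mathrm{lev}_{a,b}(i, j) =
\begin{cases}
\max(i, j) & \text{if } \min(i, j) = 0, \\
\min
\begin{cases}
\mathrm{lev}_{a,b}(i-1, j) + 1 \\
\mathrm{lev}_{a,b}(i, j-1) + 1 \\
\mathrm{lev}_{a,b}(i-1, j-1) + 1_{(a_i \neq b_j)}
\end{cases}
& \text{otherwise,}
\end{cases}
\]

and $0 \le i \le m$ and $0 \le j \le n$ and

\[
1_{(a_i \neq b_j)} :=
\begin{cases}
1 & \text{if } a_i \neq b_j, \\
0 & \text{if } a_i = b_j.
\end{cases}
\]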
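The recurrence can be evaluated bottom-up by filling an (m+1) x (n+1) table. The sketch below is one straightforward way to do this; the function name `levenshtein` is illustrative, and Python strings are 0-indexed, so $a_i$ corresponds to `a[i - 1]`.

```python
def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming over prefixes of a and b."""
    m, n = len(a), len(b)
    # lev[i][j] = distance between the first i characters of a and the first j of b
    lev = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if min(i, j) == 0:
                lev[i][j] = max(i, j)  # one prefix is empty
            else:
                lev[i][j] = min(
                    lev[i - 1][j] + 1,                           # deletion
                    lev[i][j - 1] + 1,                           # insertion
                    lev[i - 1][j - 1] + (a[i - 1] != b[j - 1]),  # replacement or match
                )
    return lev[m][n]


print(levenshtein("kitten", "sitting"))  # 3
```

The table has (m+1)(n+1) cells and each is filled in constant time, so the running time is proportional to the product of the two string lengths.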

slide-26
SLIDE 26

From Distance to Alignment

From lectures.molgen.mpg.de/Alg/Intro/

Aligning RDISLVKNAGI and RNILVSDAKNVGI

R D I S L V - - - K N A G I
R N I - L V S D A K N V G I
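An alignment can be recovered from the edit-distance table by tracing back from the bottom-right cell to the top-left one. The sketch below assumes unit costs for insertion, deletion and replacement; the function name `align` is hypothetical. Several alignments can share the minimum cost, and other scoring schemes can prefer different ones, so the result need not be identical to the alignment shown on the slide.

```python
def align(a: str, b: str):
    """Return (row_a, row_b, distance): an optimal unit-cost alignment and its cost."""
    m, n = len(a), len(b)
    lev = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        lev[i][0] = i
    for j in range(n + 1):
        lev[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            lev[i][j] = min(
                lev[i - 1][j] + 1,
                lev[i][j - 1] + 1,
                lev[i - 1][j - 1] + (a[i - 1] != b[j - 1]),
            )

    # Trace back from (m, n) to (0, 0), emitting one alignment column per step.
    row_a, row_b = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and lev[i][j] == lev[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            row_a.append(a[i - 1]); row_b.append(b[j - 1])  # match or replacement
            i, j = i - 1, j - 1
        elif i > 0 and lev[i][j] == lev[i - 1][j] + 1:
            row_a.append(a[i - 1]); row_b.append("-")       # gap in b
            i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1])       # gap in a
            j -= 1
    return "".join(reversed(row_a)), "".join(reversed(row_b)), lev[m][n]


row_a, row_b, distance = align("RDISLVKNAGI", "RNILVSDAKNVGI")
print(row_a)
print(row_b)
print("edit distance:", distance)
```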

slide-27
SLIDE 27

Sequence Alignments

  • Why compare and align sequences?
  • How to judge an alignment?
  • How to compute an alignment?
  • How to compute an alignment fast?

slide-28
SLIDE 28

Computing Alignments fast

compbio.pbworks.com


slide-29
SLIDE 29

Computing multiple sequence alignments


slide-30
SLIDE 30

Computing phylogenetic trees

  • Distance-based
      • Neighbour joining
      • Hierarchical clustering
  • Character-based
      • Parsimony method
      • Maximum Likelihood

Saitou, Kyushu Museum, 2002
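As a rough sketch of the distance-based approach, the code below builds a tree by hierarchical clustering with average linkage (often called UPGMA): it repeatedly merges the two closest clusters. The function name `upgma` and the nested-tuple tree representation are illustrative choices. The distances are derived from the aligned-residue counts on the alignment slide as 128 minus the count, a crude conversion assumed here for illustration only.

```python
from itertools import combinations


def upgma(names, dist):
    """Average-linkage clustering.

    dist maps frozenset({a, b}) to the distance between leaves a and b.
    Returns the tree as nested tuples of leaf names.
    """
    # Each cluster is (tree, leaves): the subtree built so far and its leaf names.
    clusters = [(name, (name,)) for name in names]

    def cluster_distance(c1, c2):
        # Mean distance over all leaf pairs drawn from the two clusters.
        pairs = [(x, y) for x in c1[1] for y in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: cluster_distance(*p))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(((c1[0], c2[0]), c1[1] + c2[1]))  # merge the closest pair
    return clusters[0][0]


# Aligned-residue counts from the alignment slide; 128 is the alignment length.
counts = {
    ("horse", "minke whale"): 95,
    ("minke whale", "red kangaroo"): 82,
    ("horse", "red kangaroo"): 75,
}
dist = {frozenset(pair): 128 - n for pair, n in counts.items()}  # assumed conversion

tree = upgma(["horse", "minke whale", "red kangaroo"], dist)
print(tree)  # ('red kangaroo', ('horse', 'minke whale'))
```

With these numbers the horse and minke whale are joined first and the red kangaroo attaches last.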