A few Thoughts on Graphs in Chemistry and Biology, Peter Schuster



slide-1
SLIDE 1
slide-2
SLIDE 2

A few Thoughts on Graphs in Chemistry and Biology

Peter Schuster 19th LL-Seminar on Graph Theory ÖAW, 25.04.2002

slide-3
SLIDE 3

Graphs are seen as valuable tools to order and classify information in various scientific disciplines at an intermediate stage of knowledge or level of approximation. Such stages are, for example,

  • collection or harvesting of data,
  • ordering of data according to new categories and development of models for qualitative analysis,
  • development of models for quantitative analysis and accurate predictions.

slide-4
SLIDE 4

Graphs are considered here as tools to

  • distinguish chemical isomers,
  • describe the flux in chemical reaction networks,
  • define biological species by their phylogenetic descent, and
  • model genotype-phenotype maps in case of neutrality.
slide-5
SLIDE 5

Chemists have used graphs to distinguish isomers since the second half of the nineteenth century. Atoms are nodes and chemical bonds are edges. In hydrocarbons, which contain exclusively carbon and hydrogen atoms, the position of an atom in the graph is sufficient to predict its nature: H atoms form one bond and are attached to one edge, whereas C atoms always form four bonds and are connected to four edges.

slide-6
SLIDE 6

D.J.Cram and G.S.Hammond, Organic Chemistry, McGraw-Hill, New York 1959, p.18

slide-7
SLIDE 7

methane, ethane, propane, n-butane, isobutane, n-pentane, isopentane, neopentane

C_nH_{2n+2}, n = 1, 2, 3, 4, 5

Formulas of the eight simplest alkanes as graphs, which allow for the distinction of isomers, e.g. n- and isobutane, or n-, iso- and neopentane
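How a graph tells n-butane from isobutane can be sketched in a few lines of Python. This is an illustration only, not taken from the slides: it keeps just the carbon skeleton (hydrogens omitted), the node numbers are arbitrary, and `degree_sequence` is a hypothetical helper name. Different degree sequences prove that two graphs are not isomorphic; equal degree sequences would not prove the opposite.

```python
from collections import Counter


def degree_sequence(edges):
    """Return the sorted list of node degrees of an undirected graph."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sorted(degree.values())


# Carbon skeletons only (hydrogens omitted); node labels are arbitrary.
n_butane = [(1, 2), (2, 3), (3, 4)]    # unbranched chain C-C-C-C
isobutane = [(1, 2), (2, 3), (2, 4)]   # one carbon bonded to three others

print(degree_sequence(n_butane))    # [1, 1, 2, 2]
print(degree_sequence(isobutane))   # [1, 1, 1, 3]
```

Both skeletons have four carbons and three C-C bonds, so the molecular formula is the same; only the connectivity differs.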

slide-8
SLIDE 8

C6H6

hexa-2,4-diyne (dimethyl-diacetylene) hexa-1,2,4,5-tetraene (diallene) benzene

Graphs allow for a distinction of single, double and triple bonds

slide-9
SLIDE 9

C2H6O

dimethyl ether, ethanol

Carbon, hydrogen and oxygen atoms are distinguished by the degree of the corresponding nodes: d(H) = 1, d(O) = 2, and d(C) = 4.
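The degree rule can be tried out on the two C2H6O isomers. The sketch below is an illustration under stated assumptions: nodes carry no element labels, the edge lists are written by hand for ethanol and dimethyl ether, and the helper names (`adjacency`, `elements`, `oxygen_neighbours`) are hypothetical. It only covers molecules with single bonds, where the node degree equals the number of bonds.

```python
from collections import Counter, defaultdict

ELEMENT_BY_DEGREE = {1: "H", 2: "O", 4: "C"}  # d(H) = 1, d(O) = 2, d(C) = 4


def adjacency(edges):
    """Build an adjacency map for an undirected graph."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def elements(edges):
    """Assign an element to every node from its degree alone."""
    return {node: ELEMENT_BY_DEGREE[len(nbrs)]
            for node, nbrs in adjacency(edges).items()}


def oxygen_neighbours(edges):
    """Sorted elements of the atoms bonded to the single oxygen node."""
    adj, el = adjacency(edges), elements(edges)
    (oxygen,) = [n for n, e in el.items() if e == "O"]
    return sorted(el[n] for n in adj[oxygen])


# Unlabelled molecular graphs with nine nodes each.
ethanol = [(0, 3), (0, 4), (0, 5), (0, 1), (1, 6), (1, 7), (1, 2), (2, 8)]
dimethyl_ether = [(0, 3), (0, 4), (0, 5), (0, 2), (2, 1), (1, 6), (1, 7), (1, 8)]

for name, graph in [("ethanol", ethanol), ("dimethyl ether", dimethyl_ether)]:
    print(name, dict(Counter(elements(graph).values())), oxygen_neighbours(graph))
```

Both graphs yield two C, six H and one O, but the oxygen node is bonded to C and H in ethanol and to two C atoms in dimethyl ether, which is what distinguishes the isomers.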

slide-10
SLIDE 10

C6H6

benzene

The benzene molecule cannot be described by a single graph.

slide-11
SLIDE 11

CH3X

methane: X = H methyl chloride: X = Cl methyl bromide: X = Br methyl iodide: X = I methyl fluoride: X = F

Different atoms forming one bond: H, F, Cl, Br, and I

slide-12
SLIDE 12

C2H6: ethane

C2H4Cl2: 1,1-dichloroethane, 1,2-dichloroethane

Two isomers that cannot be distinguished by means of their graphs alone: H and Cl atoms both form one bond, so both appear as nodes of degree 1.

slide-13
SLIDE 13

Paul Karrer, Lehrbuch der organischen Chemie, Georg Thieme Verlag, Stuttgart 1959, p.737

slide-14
SLIDE 14

Paul Karrer, Lehrbuch der organischen Chemie, Georg Thieme Verlag, Stuttgart 1959, p.949

slide-15
SLIDE 15

Atoms: H, H, H, N, C, O

Bond lengths (Å): 1.35, 1.22, 1.09, 1.00, 1.00

Bond angles (degrees): 112.7, 120, 122.5, 124.7, 121.6, 118.5

1 Å = 10^-10 m

Molecular structure of the formamide molecule

slide-16
SLIDE 16

Molecular structure of an association complex between a protein and a nucleic acid

slide-17
SLIDE 17

Chemists use directed graphs to model reaction mechanisms in chemical kinetics.

slide-18
SLIDE 18

Paul Karrer, Lehrbuch der organischen Chemie, Georg Thieme Verlag, Stuttgart 1959, p.479

slide-19
SLIDE 19

Reaction graph of a kinetic mechanism

Species combinations at the nodes: A + B + C + D, AB + C + D, ABD + C, ACD + B, ACE + B, AD + B + C, EC, E + C

slide-20
SLIDE 20

Reaction graph of a kinetic mechanism with rate constants

Species combinations at the nodes: A + B + C + D, AB + C + D, ABD + C, ACD + B, ACE + B, AD + B + C, EC, E + C

Rate constants on the edges: k1, k2, k3, k4, k5, k6, k7, k8, k-1, k-2, k-3, k-4

slide-21
SLIDE 21

Biochemical Pathways (chart with grid coordinates A to L and 1 to 10)

The reaction network of cellular metabolism published by Boehringer-Ingelheim.

slide-22
SLIDE 22

The citric acid or Krebs cycle (enlarged from previous slide).

slide-23
SLIDE 23

Biologists use directed graphs in the form of trees to distinguish biological species by their descent. The concept of evolution allows for ordering the wealth of species by means of phylogenetic relation. Direction of development and time ordering are introduced by the fossil record.

slide-24
SLIDE 24

time

Charles Darwin, The Origin of Species, 6th edition. Everyman's Library, Vol. 811, Dent, London, pp. 121-122.

slide-25
SLIDE 25

Phylogenetic tree of animal kingdom

Lynn Margulis & Karlene V. Schwartz, Five Kingdoms. An Illustrated Guide to the Phyla of Life on Earth. W.H. Freeman & Co., San Francisco, 1982, p. 160.

slide-26
SLIDE 26

t3 t2 t1 time

Phylogenetic tree of animal kingdom

Lynn Margulis & Karlene V. Schwartz, Five Kingdoms. An Illustrated Guide to the Phyla of Life on Earth. W.H. Freeman & Co., San Francisco, 1982, p. 160.

slide-27
SLIDE 27

The genotypes or genomes of individuals and species, the latter being reproductively related ensembles of individuals, are DNA sequences. They change from generation to generation through mutation and recombination. Genotypes unfold into phenotypes or organisms, which are the targets of the evolutionary selection process. Point mutations are single nucleotide exchanges. The Hamming distance of two sequences is the minimal number of single nucleotide exchanges that converts one sequence into the other.
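For sequences of equal length, the Hamming distance defined above is simply the number of positions at which they differ. The following minimal sketch shows this; the three short RNA sequences are made up for illustration and the function name `hamming` is hypothetical.

```python
def hamming(s1: str, s2: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(s1, s2))


# Illustrative sequences (not from the slides) that differ only at the end.
s_a, s_b, s_c = "GCUCCA", "GCUCGA", "GCUCGU"

print(hamming(s_a, s_b))  # 1: one point mutation
print(hamming(s_b, s_c))  # 1: one point mutation
print(hamming(s_a, s_c))  # 2: two point mutations
```

Two sequences at distance 1 are neighbours in sequence space, so a point mutation corresponds to a move along one edge.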

slide-28
SLIDE 28

Sequences shown: ....GC UC....CA...., ....GC UC....GU...., ....GC UC....GA...., ....GC UC....CU....

Edge labels: d_H = 1, d_H = 1, d_H = 2

Point mutations as moves in sequence space

slide-29
SLIDE 29

Two aligned sequences S1 and S2 that differ at four positions (highlighted letters in the figure: G, A, G, T, A, C, A, C):

CGTCGTTACAATTTA GTTATGTGCGAATTC CAAATT AAAA ACAAGAG.....
CGTCGTTACAATTTA GTTATGTGCGAATTC CAAATT AAAA ACAAGAG.....

Hamming distance: $d_H(S_1, S_2) = 4$

(i) $d_H(S_1, S_1) = 0$
(ii) $d_H(S_1, S_2) = d_H(S_2, S_1)$
(iii) $d_H(S_1, S_3) \le d_H(S_1, S_2) + d_H(S_2, S_3)$

The Hamming distance induces a metric in sequence space

slide-30
SLIDE 30

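Nodes of the figure, labelled by their decimal equivalents and grouped by mutant class (Hamming distance from the reference sequence 0):

Mutant class 1: 1, 2, 4, 8, 16
Mutant class 2: 3, 5, 6, 9, 10, 12, 17, 18, 20, 24
Mutant class 3: 7, 11, 13, 14, 19, 21, 22, 25, 26, 28
Mutant class 4: 15, 23, 27, 29, 30
Mutant class 5: 31

Binary sequences are encoded by their decimal equivalents with C = 0 and G = 1, for example "0" ≡ 00000 = CCCCC, "14" ≡ 01110 = CGGGC, "29" ≡ 11101 = GGGCG, etc.

Sequence space of binary sequences of chain length n = 5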
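The decimal encoding and the mutant classes can be reproduced with a short sketch. It assumes, as in the encoding on this slide, that C stands for 0 and G for 1, and that the mutant class of a sequence is its Hamming distance from the all-C reference sequence "0"; the function names are hypothetical.

```python
from math import comb


def to_sequence(value: int, n: int = 5) -> str:
    """Decode a decimal label into a C/G sequence of length n (C = 0, G = 1)."""
    bits = format(value, f"0{n}b")
    return bits.replace("0", "C").replace("1", "G")


def mutant_classes(n: int = 5) -> dict:
    """Group all 2**n labels by their Hamming distance from label 0."""
    classes = {k: [] for k in range(n + 1)}
    for value in range(2 ** n):
        classes[bin(value).count("1")].append(value)
    return classes


def neighbours(value: int, n: int = 5) -> list:
    """Labels reachable by one point mutation (one flipped bit)."""
    return [value ^ (1 << i) for i in range(n)]


print(to_sequence(14), to_sequence(29))               # CGGGC GGGCG
print(mutant_classes()[1])                            # [1, 2, 4, 8, 16]
print([len(c) for c in mutant_classes().values()])    # [1, 5, 10, 10, 5, 1]
```

The class sizes are the binomial coefficients `comb(5, k)`, and every node has exactly five neighbours, so the sequence space is a five-dimensional hypercube.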

slide-31
SLIDE 31

The RNA model considers RNA sequences as genotypes and simplified RNA structures, called secondary structures, as phenotypes. The mapping from genotypes into phenotypes is many-to-one. Hence, it is redundant and not invertible. Genotypes, i.e. RNA sequences, which are mapped onto the same phenotype, i.e. the same RNA secondary structure, form neutral networks. Neutral networks are represented by graphs in sequence space.

slide-32
SLIDE 32

Three-dimensional structure of phenylalanyl-transfer-RNA

slide-33
SLIDE 33

Panels: Sequence, Secondary structure, Symbolic notation (each with 5'-end and 3'-end marked; positions numbered 10 to 70)

Sequence fragments labelled in the figure: GCGGAU, AUUCGC, UUA, AGDDGGGA, M, CUGAAYA, AGMUC, TPCGAUC, A, ACCA, GCUC, GAGC, CCAGA, UCUGG, CUGUG, CACAG

Definition and formation of the secondary structure of phenylalanyl-tRNA

slide-34
SLIDE 34

UUUAGCCAGCGCGAGUCGUGCGGACGGGGUUAUCUCUGUCGGGCUAGGGCGC
GUGAGCGCGGGGCACAGUUUCUCAAGGAUGUAAGUUUUUGCCGUUUAUCUGG
UUAGCGAGAGAGGAGGCUUCUAGACCCAGCUCUCUGGGUCGUUGCUGAUGCG
CAUUGGUGCUAAUGAUAUUAGGGCUGUAUUCCUGUAUAGCGAUCAGUGUCCG
GUAGGCCCUCUUGACAUAAGAUUUUUCCAAUGGUGGGAGAUGGCCAUUGCAG

Criterion of Minimum Free Energy

Sequence Space Shape Space

slide-35
SLIDE 35

$S_k = \psi(I_{\cdot})$

$f_k = f(S_k)$

Sequence space, Phenotype space, Non-negative numbers

Mapping from sequence space into phenotype space and into fitness values

slide-36
SLIDE 36

$S_k = \psi(I_{\cdot})$

$f_k = f(S_k)$

Sequence space, Phenotype space, Non-negative numbers

slide-37
SLIDE 37

$S_k = \psi(I_{\cdot})$

$f_k = f(S_k)$

Sequence space, Phenotype space, Non-negative numbers

slide-38
SLIDE 38

Neutral networks of small RNA molecules can be computed by exhaustive folding of complete sequence spaces, i.e. of all RNA sequences of a given chain length n. Their number, $N = 4^n$, grows exponentially with chain length and soon becomes prohibitive for numerical computation. Neutral networks can instead be modelled by random graphs in sequence space. In this approach, nodes are inserted randomly into sequence space until the size of the pre-image, i.e. the number of neutral sequences, matches that of the neutral network to be studied.
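A toy version of the random graph approach might look as follows. This is a sketch under simplifying assumptions, not the procedure used in the cited work: nodes are drawn uniformly without replacement from the full AUGC sequence space (so it only works for short chains), two nodes are joined when their Hamming distance is 1, and all names (`random_neutral_set`, `components`) are hypothetical.

```python
import random
from itertools import product

ALPHABET = "AUGC"


def neighbours(seq):
    """Yield all sequences at Hamming distance 1 from seq."""
    for i, base in enumerate(seq):
        for other in ALPHABET:
            if other != base:
                yield seq[:i] + other + seq[i + 1:]


def random_neutral_set(n, size, rng):
    """Pick `size` distinct random sequences of length n (toy pre-image)."""
    space = ["".join(p) for p in product(ALPHABET, repeat=n)]
    return set(rng.sample(space, size))


def components(nodes):
    """Connected components of the graph induced on `nodes`, largest first."""
    unseen = set(nodes)
    result = []
    while unseen:
        start = unseen.pop()
        component, stack = {start}, [start]
        while stack:
            current = stack.pop()
            for nb in neighbours(current):
                if nb in unseen:
                    unseen.remove(nb)
                    component.add(nb)
                    stack.append(nb)
        result.append(component)
    return sorted(result, key=len, reverse=True)


rng = random.Random(0)
network = random_neutral_set(n=4, size=100, rng=rng)
sizes = [len(c) for c in components(network)]
print(len(sizes), "components; largest has", sizes[0], "nodes")
```

For n = 4 the sequence space has only 4^4 = 256 nodes, so the whole space can be listed; for realistic chain lengths one would sample sequences directly instead of enumerating the space.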

slide-39
SLIDE 39

Random graph approach to neutral networks Sketch of sequence space Step 00

slide-40
SLIDE 40

Random graph approach to neutral networks Sketch of sequence space Step 01

slide-41
SLIDE 41

Random graph approach to neutral networks Sketch of sequence space Step 02

slide-42
SLIDE 42

Random graph approach to neutral networks Sketch of sequence space Step 03

slide-43
SLIDE 43

Random graph approach to neutral networks Sketch of sequence space Step 04

slide-44
SLIDE 44

Random graph approach to neutral networks Sketch of sequence space Step 05

slide-45
SLIDE 45

Random graph approach to neutral networks Sketch of sequence space Step 10

slide-46
SLIDE 46

Random graph approach to neutral networks Sketch of sequence space Step 15

slide-47
SLIDE 47

Random graph approach to neutral networks Sketch of sequence space Step 25

slide-48
SLIDE 48

Random graph approach to neutral networks Sketch of sequence space Step 50

slide-49
SLIDE 49

Random graph approach to neutral networks Sketch of sequence space Step 75

slide-50
SLIDE 50

Random graph approach to neutral networks Sketch of sequence space Step 100

slide-51
SLIDE 51

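Degree of neutrality of a sequence $I_j$ (example from the figure): $\lambda_j = 12/27$

Mean degree of neutrality of the neutral network $G_k$: $\bar{\lambda}_k = \frac{\sum_{j} \lambda_j(k)}{|G_k|}$, summed over the sequences $I_j \in G_k$

Neutral network as pre-image of structure $S_k$: $G_k = \psi^{-1}(S_k) = \{\, I_j \mid \psi(I_j) = S_k \,\}$

Connectivity threshold: $\lambda_{cr} = 1 - \kappa^{-1/(\kappa-1)}$

$\bar{\lambda}_k > \lambda_{cr}$: network $G_k$ is connected
$\bar{\lambda}_k < \lambda_{cr}$: network $G_k$ is not connected

Alphabet size $\kappa$: AUGC, $\kappa = 4$

κ    λ_cr
2    0.5
3    0.4226
4    0.3700

Mean degree of neutrality and connectivity of neutral networks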
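The tabulated thresholds can be checked numerically. The sketch below reads the connectivity threshold as λ_cr = 1 - κ^(-1/(κ-1)), a reading that reproduces the three values listed for κ = 2, 3 and 4; the function name `critical_neutrality` is hypothetical.

```python
def critical_neutrality(kappa: int) -> float:
    """Connectivity threshold lambda_cr = 1 - kappa**(-1/(kappa - 1))."""
    if kappa < 2:
        raise ValueError("alphabet size must be at least 2")
    return 1.0 - kappa ** (-1.0 / (kappa - 1))


for kappa in (2, 3, 4):
    print(kappa, round(critical_neutrality(kappa), 4))
# 2 0.5
# 3 0.4226
# 4 0.37
```

For the natural AUGC alphabet (κ = 4), a network whose mean degree of neutrality exceeds roughly 0.37 is expected to be connected under this model.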

slide-52
SLIDE 52

Giant Component

A multi-component neutral network

slide-53
SLIDE 53

A connected neutral network

slide-54
SLIDE 54

Reference for postulation and in silico verification of neutral networks

slide-55
SLIDE 55

Sequences shown in the figure:

CUGGGAAAAAUCCCCAGACCGGGGGUUUCCCCGG
CUGGGAAAAAUCCCCAGACCGGGGGUUUCCGCGG

Panels: Compatible, Incompatible (each structure drawn with 5'-end and 3'-end marked)

Sequences are compatible or incompatible with structures
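Compatibility can be tested mechanically: a sequence is compatible with a structure if every base pair of the structure joins two nucleotides that are able to pair. The sketch below makes several assumptions that are not stated on the slide: structures are written in dot-bracket notation, the allowed pairs are the Watson-Crick pairs plus the GU wobble pair, and the example structure and sequences are made up.

```python
ALLOWED_PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}  # assumption: GU wobble allowed


def base_pairs(structure: str):
    """Return the list of (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), i))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def is_compatible(sequence: str, structure: str) -> bool:
    """True if every base pair of the structure can be formed by the sequence."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    return all(sequence[i] + sequence[j] in ALLOWED_PAIRS
               for i, j in base_pairs(structure))


hairpin = "((((....))))"                        # hypothetical structure
print(is_compatible("GGGGAAAACCCC", hairpin))   # True
print(is_compatible("GGGGAAAACCAC", hairpin))   # False: G cannot pair with A
```

Compatibility is a necessary condition only: a compatible sequence can form the structure, but the structure need not be its minimum free energy conformation.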

slide-56
SLIDE 56

Gk ⊆ Ck (neutral network Gk, compatible set Ck)

Neutral networks Gk are embedded in sets of compatible sequences Ck.

slide-57
SLIDE 57

Figure labels: neutral networks G1 and G2, compatible sets C1 and C2, and their intersection C1 ∩ C2

Two neutral networks, G1 and G2, are embedded in compatible sets, C1 and C2, respectively. The compatible sets form an intersection consisting of sequences that can form both structures.

slide-58
SLIDE 58

Sequence shown in the figure: CUGGGAAAAAUCCCCAGACCGGGGGUUUCCCCGG

Panels: Minimum free energy conformation S0, Suboptimal conformation S1

A sequence at the intersection of two structures
slide-59
SLIDE 59

Reference for the definition of the intersection and the proof of the intersection theorem

slide-60
SLIDE 60

A ribozyme switch

E.A. Schultes, D.P. Bartel, One sequence, two ribozymes: Implications for the emergence of new ribozyme folds. Science 289 (2000), 448-452

slide-61
SLIDE 61

The sequence at the intersection: an RNA molecule that is 88 nucleotides long and can form both structures

slide-62
SLIDE 62

Two neutral walks through sequence space with conservation of structure and catalytic activity

slide-63
SLIDE 63

Coworkers

Walter Fontana, Santa Fe Institute, NM
Christian Reidys, Christian Forst, Los Alamos National Laboratory, NM
Peter Stadler, Universität Wien, AT
Ivo L. Hofacker, Christoph Flamm
Bärbel Stadler, Andreas Wernitznig, Universität Wien, AT
Michael Kospach, Ulrike Mückstein, Stefanie Widder, Stefan Wuchty
Jan Cupal, Kurt Grünberger, Andreas Svrček-Seiler
Ulrike Göbel, Institut für Molekulare Biotechnologie, Jena, GE
Walter Grüner, Stefan Kopp, Jaqueline Weber