A few Thoughts on Graphs in Chemistry and Biology
Peter Schuster 19th LL-Seminar on Graph Theory ÖAW, 25.04.2002
Graphs are seen as valuable tools to order and classify information in various scientific disciplines at an intermediate stage of knowledge or level of approximation. Such stages are, for example,
- collection or harvesting of data,
- ordering of data according to new categories and development of models for qualitative analysis,
- development of models for quantitative analysis and accurate predictions.
Graphs are considered here as tools to
- distinguish chemical isomers,
- describe the flux in chemical reaction networks,
- define biological species by their phylogenetic descent, and
- model genotype-phenotype maps in case of neutrality.
Chemists have used graphs to distinguish isomers since the second half of the nineteenth century. Atoms are nodes and chemical bonds are edges. In hydrocarbons, which contain exclusively carbon and hydrogen atoms, the position of an atom in the graph is sufficient to predict its nature: H atoms form one bond and are attached to one edge, whereas C atoms always form four bonds and are connected to four edges.
D.J.Cram and G.S.Hammond, Organic Chemistry, McGraw-Hill, New York 1959, p.18
methane, ethane, propane, n-butane, isobutane, n-pentane, isopentane, neopentane
$C_nH_{2n+2}$, n = 1, 2, 3, 4, 5
Formulas of the eight simplest alkanes as graphs, which allow for the distinction of isomers, e.g. n- and isobutane, n-, iso- and neo-pentane
C6H6
hexa-2,4-diyne (dimethyl-diacetylene) hexa-1,2,4,5-tetraene (diallene) benzene
Graphs allow for a distinction of single-, double- and triple bonds
C2H6O
dimethylether ethanol
Carbon, hydrogen and oxygen atoms are distinguished by the degree of the corresponding nodes: d(H) = 1, d(O) = 2, and d(C) = 4.
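The degree rule can be illustrated with a minimal Python sketch. It assumes molecules with single bonds only, stored as plain edge lists over numbered nodes; the node numbering and the helper names (`elements`, `formula`, `heavy_atom_bonds`) are hypothetical and chosen for illustration. The sketch recovers the element of each node from its degree and shows that ethanol and dimethylether share the formula C2H6O but differ as graphs.

```python
from collections import Counter

# Degree -> element, as stated on the slide: d(H) = 1, d(O) = 2, d(C) = 4.
ELEMENT_BY_DEGREE = {1: "H", 2: "O", 4: "C"}

# Hypothetical node numbering; every edge is a single bond.
ETHANOL = [(0, 1), (1, 2),              # C-C, C-O
           (0, 3), (0, 4), (0, 5),      # CH3
           (1, 6), (1, 7),              # CH2
           (2, 8)]                      # O-H
DIMETHYLETHER = [(0, 1), (1, 2),        # C-O, O-C
                 (0, 3), (0, 4), (0, 5),
                 (2, 6), (2, 7), (2, 8)]

def elements(edges):
    """Assign an element to every node from its degree alone."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return {node: ELEMENT_BY_DEGREE[d] for node, d in degree.items()}

def formula(edges):
    """Count atoms per element (the molecular formula)."""
    return dict(Counter(elements(edges).values()))

def heavy_atom_bonds(edges):
    """Sorted element pairs of all bonds that do not involve hydrogen."""
    el = elements(edges)
    return sorted(tuple(sorted((el[a], el[b])))
                  for a, b in edges if "H" not in (el[a], el[b]))

print(formula(ETHANOL), heavy_atom_bonds(ETHANOL))
print(formula(DIMETHYLETHER), heavy_atom_bonds(DIMETHYLETHER))
```

Both graphs give the same atom counts, while the bonds between non-hydrogen atoms differ (one C-C and one C-O bond versus two C-O bonds), which is what distinguishes the two isomers.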
C6H6
benzene
The benzene molecule cannot be described by a single graph.
CH3X
methane: X = H methyl chloride: X = Cl methyl bromide: X = Br methyl iodide: X = I methyl fluoride: X = F
Different atoms forming one bond: H, F, Cl, Br, and I
C2H6: ethane
C2H4Cl2: 1,1-dichloroethane, 1,2-dichloroethane
Two isomers that cannot be distinguished by means of their graphs.
Paul Karrer, Lehrbuch der organischen Chemie, Georg Thieme Verlag, Stuttgart 1959, p.737
Paul Karrer, Lehrbuch der organischen Chemie, Georg Thieme Verlag, Stuttgart 1959, p.949
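[Figure: structural formula of formamide (atoms H, N, C, O) annotated with bond lengths 1.35, 1.22, 1.09, 1.00 and 1.00 Å and bond angles 112.7°, 120°, 122.5°, 124.7°, 121.6° and 118.5°; 1 Å = 10^-10 m]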
Molecular structure of the formamide molecule
Molecular structure of an association complex between a protein and a nucleic acid
Chemists use directed graphs to model reaction mechanisms in chemical kinetics.
Paul Karrer, Lehrbuch der organischen Chemie, Georg Thieme Verlag, Stuttgart 1959, p.479
[Figure: reaction graph with nodes A + B + C + D, AB + C + D, ABD + C, ACD + B, ACE + B, AD + B + C, EC and E + C]
Reaction graph of a kinetic mechanism
[Figure: the same graph with edges labelled by the rate constants k1 to k8 and k-1 to k-4]
Reaction graph of a kinetic mechanism with rate constants
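How a directed graph with rate constants on its edges defines kinetics can be sketched as follows. This is a toy example, not the mechanism on the slide: it assumes a hypothetical three-node graph X1 <-> X2 -> X3 with first-order steps only and made-up rate constants, and it integrates the resulting rate equations with a simple explicit Euler scheme.

```python
# Hypothetical reaction graph: edge (source, target) -> first-order rate constant.
RATE_CONSTANTS = {
    ("X1", "X2"): 1.0,   # forward step
    ("X2", "X1"): 0.5,   # reverse step
    ("X2", "X3"): 0.2,   # irreversible step
}

def derivatives(conc):
    """Sum the flux k * [source] over all edges of the reaction graph."""
    rates = {species: 0.0 for species in conc}
    for (source, target), k in RATE_CONSTANTS.items():
        flux = k * conc[source]
        rates[source] -= flux
        rates[target] += flux
    return rates

def integrate(conc, dt=0.001, steps=10000):
    """Explicit Euler integration (an assumption made for brevity)."""
    conc = dict(conc)
    for _ in range(steps):
        rates = derivatives(conc)
        for species in conc:
            conc[species] += dt * rates[species]
    return conc

final = integrate({"X1": 1.0, "X2": 0.0, "X3": 0.0})
print(final)
```

Every edge removes flux from its source node and adds it to its target node, so the total concentration is conserved. Bimolecular steps such as those in the slide's mechanism would need flux terms that multiply two concentrations.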
Biochemical Pathways
The reaction network of cellular metabolism published by Boehringer-Ingelheim.
The citric acid or Krebs cycle (enlarged from previous slide).
Biologists use directed graphs in the form of trees to distinguish biological species by their descent. The concept of evolution allows for ordering the wealth of species by means of phylogenetic relation. Direction of development and time ordering is introduced by the fossil record.
time
Charles Darwin, The Origin of Species, 6th edition. Everyman's Library, Vol.811, Dent London, pp.121-122.
Phylogenetic tree of animal kingdom
Lynn Margulis & Karlene V. Schwarz, Five Kingdoms. An illustrated guide to the Phyla of Life on Earth. W.H. Freeman & Co., San Francisco, 1982, p. 160.
Phylogenetic tree of animal kingdom, with the time points t1, t2 and t3 marked on the time axis
Lynn Margulis & Karlene V. Schwarz, Five Kingdoms. An illustrated guide to the Phyla of Life on Earth. W.H. Freeman & Co., San Francisco, 1982, p. 160.
The genotypes or genomes of individuals and species, the latter being reproductively related ensembles of individuals, are DNA sequences. They change from generation to generation through mutation and recombination. Genotypes unfold into phenotypes or organisms, which are the targets of the evolutionary selection process. Point mutations are single nucleotide exchanges. The Hamming distance of two sequences is the minimal number of single nucleotide exchanges that converts the two sequences into each other.
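The Hamming distance can be sketched in Python as a position-by-position comparison of two sequences of equal length. The function name and the short example sequences are illustrative assumptions and are not taken from the slides.

```python
def hamming_distance(s1, s2):
    """Number of positions at which two equally long sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(a != b for a, b in zip(s1, s2))

# Illustrative sequences: one and two point mutations away from the first.
print(hamming_distance("GCCAUC", "GCGAUC"))
print(hamming_distance("GCCAUC", "GCGUUC"))
```

Each differing position corresponds to one point mutation, so the count equals the minimal number of single nucleotide exchanges needed to convert one sequence into the other.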
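[Figure: four sequences that share the flanking segments ....GC and UC.... and differ only in the two central nucleotides (CA, GU, GA, CU); the edges between them are labelled $d_H = 1$, $d_H = 1$ and $d_H = 2$]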
Point mutations as moves in sequence space
[Figure: two aligned sequences $S_1$ and $S_2$ of the form CGTCGTTACAATTTA·GTTATGTGCGAATTC·CAAATT·AAAA·ACAAGAG....., identical except at the four highlighted positions, marked here by ·]
Hamming distance: $d_H(S_1,S_2) = 4$
(i) $d_H(S_1,S_1) = 0$
(ii) $d_H(S_1,S_2) = d_H(S_2,S_1)$
(iii) $d_H(S_1,S_3) \le d_H(S_1,S_2) + d_H(S_2,S_3)$
The Hamming distance induces a metric in sequence space
[Figure: graph whose nodes carry the decimal labels of the sequences (up to 31), arranged by mutant class 1 to 5]
Binary sequences are encoded by their decimal equivalents, with C = 0 and G = 1: for example, "0" ≡ 00000 = CCCCC, "14" ≡ 01110 = CGGGC, "29" ≡ 11101 = GGGCG, etc.
Sequence space of binary sequences of chain length n = 5
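The sequence space of binary sequences can be sketched in Python by using the decimal labels directly as nodes. The sketch assumes that the mutant class of a sequence is its Hamming distance from the reference sequence "0", and it reads C as 0 and G as 1; the helper names are hypothetical.

```python
N = 5  # chain length

def to_sequence(label, n=N):
    """Decimal label -> binary string -> C/G sequence (C = 0, G = 1)."""
    bits = format(label, f"0{n}b")
    return bits.replace("0", "C").replace("1", "G")

def mutant_class(label):
    """Hamming distance from the reference sequence 0 (number of 1-bits)."""
    return bin(label).count("1")

def neighbours(label, n=N):
    """All sequences reachable by a single point mutation (one bit flip)."""
    return [label ^ (1 << position) for position in range(n)]

# Group all 2^n sequences by mutant class.
classes = {}
for label in range(2 ** N):
    classes.setdefault(mutant_class(label), []).append(label)

for distance, members in sorted(classes.items()):
    print(distance, members)
print(to_sequence(14), to_sequence(29))
```

Every node has n = 5 neighbours, so the graph is the five-dimensional hypercube with 32 nodes and 80 edges.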
The RNA model considers RNA sequences as genotypes and simplified RNA structures, called secondary structures, as phenotypes. The mapping from genotypes into phenotypes is many-to-one. Hence, it is redundant and not invertible. Genotypes, i.e. RNA sequences, which are mapped onto the same phenotype, i.e. the same RNA secondary structure, form neutral networks. Neutral networks are represented by graphs in sequence space.
Three-dimensional structure of phenylalanyl-transfer-RNA
[Figure panels: Sequence, Secondary structure, Symbolic notation; 5'- and 3'-ends marked, nucleotides numbered in steps of 10 up to 70]
Definition and formation of the secondary structure of phenylalanyl-tRNA
UUUAGCCAGCGCGAGUCGUGCGGACGGGGUUAUCUCUGUCGGGCUAGGGCGC GUGAGCGCGGGGCACAGUUUCUCAAGGAUGUAAGUUUUUGCCGUUUAUCUGG UUAGCGAGAGAGGAGGCUUCUAGACCCAGCUCUCUGGGUCGUUGCUGAUGCG CAUUGGUGCUAAUGAUAUUAGGGCUGUAUUCCUGUAUAGCGAUCAGUGUCCG GUAGGCCCUCUUGACAUAAGAUUUUUCCAAUGGUGGGAGAUGGCCAUUGCAG
Criterion of Minimum Free Energy
Sequence Space Shape Space
$S_k = \psi(I_.)$
$f_k = f(S_k)$
Sequence space → Phenotype space → Non-negative numbers
Mapping from sequence space into phenotype space and into fitness values
Neutral networks of small RNA molecules can be computed by exhaustive folding of complete sequence spaces, i.e. all RNA sequences of a given chain length. The number of sequences, $N = 4^n$, becomes very large with increasing chain length n and soon makes exhaustive numerical computation prohibitive. Neutral networks can instead be modelled by random graphs in sequence space. In this approach, nodes are inserted randomly into sequence space until the size of the pre-image, i.e. the number of neutral sequences, matches the neutral network to be studied.
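The random graph approach can be sketched in Python as follows. For brevity the sketch assumes a binary alphabet rather than the four-letter RNA alphabet, so that sequences are integers and point mutations are bit flips; the function names, chain length and network size are hypothetical. Nodes are drawn at random until the desired number of neutral sequences is reached, and the connected components of the induced subgraph are then determined.

```python
import random

def random_neutral_set(n, size, rng):
    """Choose `size` distinct sequences of length n uniformly at random."""
    return set(rng.sample(range(2 ** n), size))

def components(nodes, n):
    """Connected components of the subgraph induced in sequence space."""
    nodes = set(nodes)
    seen, result = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        seen.add(start)
        while stack:
            v = stack.pop()
            component.add(v)
            for position in range(n):
                w = v ^ (1 << position)   # neighbour at Hamming distance 1
                if w in nodes and w not in seen:
                    seen.add(w)
                    stack.append(w)
        result.append(component)
    return sorted(result, key=len, reverse=True)

rng = random.Random(1)
network = random_neutral_set(n=10, size=400, rng=rng)
parts = components(network, n=10)
print(len(parts), [len(p) for p in parts[:5]])
```

Repeating the experiment with larger or smaller `size` shows how the random network changes from many small components to one dominant component.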
Random graph approach to neutral networks: sketch of sequence space at steps 0, 1, 2, 3, 4, 5, 10, 15, 25, 50, 75 and 100
$G_k = \psi^{-1}(S_k) = \{\, I_j \mid \psi(I_j) = S_k \,\}$
$\lambda_j = 12/27$, $\bar{\lambda}_k = \frac{\sum_{j \in G_k} \lambda_j(k)}{|G_k|}$
Connectivity threshold: $\lambda_{cr} = 1 - \kappa^{-1/(\kappa-1)}$
$\bar{\lambda}_k > \lambda_{cr}$ .... network $G_k$ is connected
$\bar{\lambda}_k < \lambda_{cr}$ .... network $G_k$ is not connected
Alphabet size $\kappa$: AUGC, $\kappa = 4$

κ    λ_cr
2    0.5
3    0.4226
4    0.3700
Mean degree of neutrality and connectivity of neutral networks
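The tabulated thresholds can be reproduced with a short Python sketch, assuming the connectivity threshold has the form λ_cr = 1 - κ^(-1/(κ-1)) for alphabet size κ. The function names and the comparison helper are hypothetical.

```python
def connectivity_threshold(kappa):
    """Critical mean degree of neutrality for an alphabet of size kappa."""
    return 1.0 - kappa ** (-1.0 / (kappa - 1))

def is_connected(mean_neutrality, kappa):
    """Random graph prediction: connected above the threshold."""
    return mean_neutrality > connectivity_threshold(kappa)

for kappa in (2, 3, 4):
    print(kappa, round(connectivity_threshold(kappa), 4))
```

Under this assumption a neutral network over the AUGC alphabet (κ = 4) is predicted to be connected when its mean degree of neutrality exceeds about 0.37.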
Giant Component
A multi-component neutral network
A connected neutral network
Reference for postulation and in silico verification of neutral networks
[Figure: the sequence CUGGGAAAAAUCCCCAGACCGGGGGUUUCCCCGG drawn on two secondary structures, one with which it is compatible and one with which it is incompatible; 5'- and 3'-ends marked]
Sequences are compatible or incompatible with structures
$G_k \subset C_k$ (neutral network $G_k$, compatible set $C_k$)
Neutral networks Gk are embedded in sets of compatible sequences Ck.
[Figure labels: neutral networks G1 and G2 inside the compatible sets C1 and C2; the overlap is C1 ∩ C2]
Two neutral networks, G1 and G2, are embedded in compatible sets, C1 and C2, respectively. The compatible sets form an intersection consisting of sequences that can form both structures.
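Compatibility and the intersection of compatible sets can be sketched in Python. The sketch assumes structures in dot-bracket notation and the usual RNA pairing rules (AU, GC and GU pairs); the two toy structures and the example sequences are made up for illustration and are not those of the slides.

```python
# Assumed pairing rules: Watson-Crick pairs plus the GU wobble pair.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
                 ("C", "G"), ("G", "U"), ("U", "G")}

def base_pairs(structure):
    """Parse dot-bracket notation into a list of paired positions (i, j)."""
    stack, pairs = [], []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs

def is_compatible(sequence, structure):
    """True if every base pair of the structure can be formed by the sequence."""
    if len(sequence) != len(structure):
        return False
    return all((sequence[i], sequence[j]) in ALLOWED_PAIRS
               for i, j in base_pairs(structure))

def in_intersection(sequence, structure_1, structure_2):
    """True if the sequence lies in both compatible sets."""
    return (is_compatible(sequence, structure_1)
            and is_compatible(sequence, structure_2))

S_A = "((....))"   # hypothetical toy structures
S_B = "..((..))"
print(in_intersection("GGGGAACC", S_A, S_B))
print(in_intersection("GAAAAAUC", S_A, S_B))
```

Compatibility only requires that the base pairs can be formed; whether a compatible sequence actually folds into the structure as its minimum free energy conformation is a separate question that this sketch does not address.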
[Figure: the sequence CUGGGAAAAAUCCCCAGACCGGGGGUUUCCCCGG drawn in two conformations, the minimum free energy conformation S0 and the suboptimal conformation S1; 3'-end marked]
A sequence at the intersection of two structures
Reference for the definition of the intersection and the proof of the intersection theorem
A ribozyme switch
E.A. Schultes, D.P. Bartel, One sequence, two ribozymes: Implications for the emergence of new ribozyme folds. Science 289 (2000), 448-452
The sequence at the intersection: an RNA molecule that is 88 nucleotides long and can form both structures
Two neutral walks through sequence space with conservation of structure and catalytic activity
Coworkers
Walter Fontana, Santa Fe Institute, NM
Christian Reidys, Christian Forst, Los Alamos National Laboratory, NM
Peter Stadler, Universität Wien, AT
Ivo L. Hofacker, Christoph Flamm, Bärbel Stadler, Andreas Wernitznig, Universität Wien, AT
Michael Kospach, Ulrike Mückstein, Stefanie Widder, Stefan Wuchty, Jan Cupal, Kurt Grünberger, Andreas Svrček-Seiler, Ulrike Göbel, Institut für Molekulare Biotechnologie, Jena, GE
Walter Grüner, Stefan Kopp, Jaqueline Weber