
SLIDE 1

Finding relevant paths in the not-so-small world of metabolic networks

ENSBBAU4 – 19 November 2014

Jacques van Helden

Jacques.van-Helden@univ-amu.fr Aix-Marseille Université (AMU)

Lab. Technological Advances for Genomics and Clinics (TAGC, INSERM Unit U1090)

http://jacques.van-helden.perso.luminy.univ-amu.fr/

Former address (1999-2011): Université Libre de Bruxelles, Belgium. Bioinformatique des Génomes et des Réseaux (BiGRe lab), http://www.bigre.ulb.ac.be/

SLIDE 2

Bioinformatique des Génomes et des Réseaux (BiGRe) Université Libre de Bruxelles

http://www.bigre.ulb.ac.be/

Development and application of bioinformatics methods for the analysis of genomes and biomolecular interaction networks.
Analysis of cis-regulatory sequences
RSAT Web site (http://rsat.ulb.ac.be/rsat/)
Olivier Sand (ex-Postdoc), Matthieu Defrance (ex-Postdoc), Maud Vidick (Master thesis), Rekin’s Janky (ex-PhD student), Jean Valéry Turatsinze (ex-PhD student), Morgane Thomas-Chollier (ex-PhD student + ex-postdoc), Eric Vervisch (ex-Research fellow)

Biomolecular networks (regulatory, protein interactions, metabolic, host-virus)
NeAT Web site (http://rsat.ulb.ac.be/neat/)
Rekin’s Janky (PhD student), Sylvain Brohée (ex-PhD student), Karoline Faust (PhD student), Nicolas Simonis (Postdoc), Leon Juvénal Hajingabo (PhD student)

Mobile genetic elements in prokaryotes
ACLAME Web site (http://aclame.ulb.ac.be/)
Raphaël Leplae (Postdoc), Gipsi Lima (PhD student), Ariane Toussaint (Professor)

Modelling of dynamical systems
Didier Gonze (Premier assistant)


B!GRe

Bioinformatique des Génomes et Réseaux

Ariane Toussaint (Professor)
Raphaël Leplae (Postdoc)
Jacques van Helden (Chargé de cours)
Karoline Faust (Ex-PhD student)
Didier Gonze (Premier assistant)
Gipsi Lima (Postdoc)
Jean Valéry Turatsinze (Ex-PhD student)
Rekin’s Janky (Ex-PhD student)
Morgane Thomas-Chollier (Ex-PhD student + postdoc)
Olivier Sand (Ex-Postdoc)
Matthieu Defrance (Ex-Postdoc)
Sylvain Brohée (Ex-PhD student)
Eric Vervisch (Ex-Research fellow)
Nicolas Simonis (Postdoc)
Myriam Loubriat (Secretary)
Leon Juvénal Hajingabo (PhD Student)
Alejandra Medina-Rivera (PhD Student, co-direction)
Elodie Darbo (PhD Student, co-direction)

SLIDE 3

Structure of the talk

Part 1 – From reactions/compounds to pathways
Pathways as graphs of reactions and compounds
Pathway diversity across organisms
Multi-level regulation, feed-back loops ensuring homeostasis
Super-pathways with intricate regulatory circuits
What is a pathway?
Pathways (e.g. EcoCyc) or reaction maps (e.g. KEGG)?
How to define boundaries?
Part 2 – From reactions/compounds to metabolic networks
Building a network from a collection of reactions/compounds
Myths and dogmas about network topology
Part 3 – From networks to pathways
Tricks and traps for metabolic path finding
Finding relevant paths in metabolic networks
Extracting pathways from sets of seed genes/reactions/compounds

SLIDE 4

Part 1 From reactions/compounds to pathways

SLIDE 5

Methionine Biosynthesis in E.coli

[Figure: methionine biosynthesis in E. coli. L-aspartate is converted to L-methionine via L-aspartyl-4-P, L-aspartic semialdehyde, L-homoserine, alpha-succinyl-L-homoserine, cystathionine (using L-cysteine) and homocysteine, and L-methionine to S-adenosyl-L-methionine (reactions r1-r8). Enzymes: aspartate kinase II/homoserine dehydrogenase II (metL; EC 2.7.2.4, 1.1.1.3), aspartate semialdehyde dehydrogenase (asd; 1.2.1.11), homoserine O-succinyltransferase (metA; 2.3.1.46), cystathionine-gamma-synthase (metB; 4.2.99.9), cystathionine-beta-lyase (metC; 4.4.1.8), cobalamin-independent and cobalamin-dependent homocysteine transmethylases (metE, 2.1.1.14; metH, 2.1.1.13) and S-adenosylmethionine synthetase (metK; 2.5.1.6). Gene expression is repressed by the MetJ methionine repressor and up-regulated by MetR; some enzymes are inhibited or activated by pathway compounds. Connections to aspartate, threonine, lysine and cysteine biosynthesis.]

SLIDE 6

Methionine Biosynthesis in S.cerevisiae

[Figure: methionine biosynthesis in S. cerevisiae. L-aspartate → L-aspartyl-4-P (aspartate kinase, HOM3; EC 2.7.2.4) → L-aspartic semialdehyde (aspartate semialdehyde dehydrogenase, HOM2; 1.2.1.11) → L-homoserine (homoserine dehydrogenase, HOM6; 1.1.1.3) → O-acetyl-homoserine (homoserine O-acetyltransferase, MET2; 2.3.1.31) → homocysteine (O-acetylhomoserine (thiol)-lyase, MET17; 2.5.1.49, using sulfide) → L-methionine (vitamin B12-independent methionine synthase, MET6; 2.1.1.14) → S-adenosyl-L-methionine (S-adenosylmethionine synthetases I and II, SAM1, SAM2; 2.5.1.6). Transcriptional regulators: the Cbf1p/Met4p/Met28p complex, Met31p, Met32p, Met30p and Gcn4p. Connections to sulfur assimilation and to cysteine, threonine and aspartate biosynthesis.]

SLIDE 7

Alternative methionine pathways

[Figure: alternative methionine pathways in S. cerevisiae and E. coli. Both convert L-aspartate to L-homoserine (EC 2.7.2.4, 1.2.1.11, 1.1.1.3) and homocysteine to L-methionine (2.1.1.14) and S-adenosyl-L-methionine (2.5.1.6). In between, S. cerevisiae goes through O-acetyl-homoserine (2.3.1.31, 4.2.99.10), whereas E. coli goes through alpha-succinyl-L-homoserine and cystathionine (2.3.1.46, 4.2.99.9, 4.4.1.8).]

SLIDE 8

Sulfur Assimilation in yeast

[Figure: sulfur assimilation in yeast. Extracellular sulfate is imported by the sulfate transporters SUL1 and SUL2, then activated to adenylyl sulfate (APS; sulfate adenylyltransferase, MET3, EC 2.7.7.4) and 3'-phosphoadenylylsulfate (PAPS; adenylyl sulfate kinase, MET14, 2.7.1.25), reduced to sulfite (PAPS reductase, MET16, 1.8.99.4) and then to sulfide by sulfite reductase (NADPH) (MET10, with the putative sulfite reductase MET5; 1.8.1.2), using 3 NADPH and 5 H+. Sulfide feeds methionine biosynthesis. Transcriptional regulators: the Cbf1p/Met4p/Met28p complex, Met31p, Met32p, Met30p and Gcn4p.]

SLIDE 9

EcoCyc - Superpathway of sulfate assimilation and cysteine biosynthesis


SLIDE 10

MetaCyc – Sulfur incorporation in amino acids

[Figure panels: via methionine; via cysteine.]

SLIDE 11

KEGG “reference” pathway - Methionine metabolism (1998)


SLIDE 12

KEGG “reference” map - Cysteine and methionine metabolism (2009)

In principle, merging the methionine and cysteine maps should highlight the relationship between the two sulfur-containing amino acids.

Questions:
Where is L-Cysteine?
Where is L-Methionine?

http://www.genome.jp/kegg/pathway/map/map00270.html

SLIDE 13

KEGG map - Cysteine and methionine metabolism (2009) – S.cerevisiae

KEGG cysteine and methionine pathway, Saccharomyces cerevisiae.

Question: how is sulfur incorporated into amino acids in this yeast?

http://www.genome.jp/kegg-bin/show_pathway?org_name=sce&mapno=00270

SLIDE 14

KEGG map - Cysteine and methionine metabolism (2009) – E.coli

KEGG cysteine and methionine pathway, Escherichia coli K12.

Question: how is sulfur incorporated into amino acids in this bacterium?

http://www.genome.jp/kegg-bin/show_pathway?org_name=eco&mapno=00270

SLIDE 15

KEGG map - Cysteine and methionine metabolism (2009) – M.genitalium

Mycoplasma genitalium
Very small genome (about 500 genes).
Intracellular parasite.
Parasitism allowed it to lose many pathways.
It relies on the host for the corresponding compounds.

http://www.genome.jp/kegg-bin/show_pathway?org_name=mge&mapno=00270

SLIDE 16

Methionine Biosynthesis in E.coli

[Figure: methionine biosynthesis in E. coli, identical to the figure of slide 5.]

SLIDE 17

Lysine biosynthesis in Escherichia coli

[Figure: lysine biosynthesis in E. coli. L-aspartate is converted to L-aspartic semialdehyde (aspartate kinase, EC 2.7.2.4; aspartate semialdehyde dehydrogenase, asd, 1.2.1.11), then via dihydrodipicolinate (dihydrodipicolinate synthase, dapA, 4.2.1.52, using pyruvate), tetrahydrodipicolinate (dihydrodipicolinate reductase, dapB, 1.3.1.26), N-succinyl-epsilon-keto-L-alpha-aminopimelate (dapD, 2.3.1.117), succinyl-diaminopimelate (succinyl-diaminopimelate aminotransferase, dapC, 2.6.1.17), LL-diaminopimelate (N-succinyldiaminopimelate desuccinylase, dapE, 3.5.1.18) and meso-diaminopimelate (diaminopimelate epimerase, dapF, 5.1.1.7) to L-lysine + CO2 (diaminopimelate decarboxylase, lysA). Regulation by the LysR protein. Connections to aspartate, methionine and threonine biosynthesis.]

SLIDE 18

Lysine biosynthesis in Saccharomyces cerevisiae

[Figure: lysine biosynthesis in S. cerevisiae (alpha-aminoadipate pathway). 2-Oxoglutarate and acetyl-CoA are condensed by homocitrate synthase (LYS20; EC 4.1.3.21); the product is converted via but-1-ene-1,2,4-tricarboxylate, homoisocitrate and oxaloglutarate (homoaconitate hydratase, LYS4, 4.2.1.36; homoisocitrate dehydrogenase, 1.1.1.87) to 2-oxoadipate, then to L-2-aminoadipate (aminoadipate aminotransferase, 2.6.1.39), L-2-aminoadipate 6-semialdehyde (aminoadipate semialdehyde dehydrogenase, LYS2/LYS5, 1.2.1.31), saccharopine, i.e. N6-(L-1,3-dicarboxypropyl)-L-lysine (saccharopine dehydrogenase, glutamate-forming, LYS9, 1.5.1.10), and finally L-lysine (saccharopine dehydrogenase, lysine-forming, LYS1, 1.5.1.7).]

SLIDE 19

KEGG - Lysine biosynthesis – Escherichia coli K12


http://www.genome.jp/kegg-bin/show_pathway?org_name=eco&mapno=00300

SLIDE 20

KEGG - Lysine biosynthesis – Saccharomyces cerevisiae

http://www.genome.jp/kegg-bin/show_pathway?org_name=sce&mapno=00300

SLIDE 21

From pathways to super-pathways

SLIDE 22

Lysine biosynthesis in Escherichia coli

[Figure: lysine biosynthesis in E. coli, identical to the figure of slide 17.]

SLIDE 23

Threonine biosynthesis in Escherichia coli

[Figure: threonine biosynthesis in E. coli. L-aspartate is converted to L-threonine via L-aspartyl-4-P (aspartate kinase I, EC 2.7.2.4), L-aspartic semialdehyde (aspartate semialdehyde dehydrogenase, asd, 1.2.1.11), L-homoserine (homoserine dehydrogenase I, 1.1.1.3) and L-homoserine phosphate (2.7.1.39). The enzymes are encoded by asd and by the thrABC operon, whose transcription is controlled by attenuation; several pathway steps are subject to inhibition.]

SLIDE 24

Lysine, Methionine and Threonine biosynthesis in E.coli

[Figure: the lysine, methionine and threonine pathways of E. coli drawn side by side. All three start from L-aspartate and share the first two steps to L-aspartic semialdehyde (EC 2.7.2.4, 1.2.1.11); the methionine and threonine pathways also share the step to L-homoserine (1.1.1.3).]
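The shared steps can be found by treating each pathway as a set of reactions and intersecting the sets. A minimal Python sketch, assuming the EC numbers read from the figure above; the last threonine step and the lysine decarboxylation step are left out of this sketch, and the variable names are illustrative.

```python
from itertools import combinations

# EC numbers read from the E. coli pathway figures (final threonine step and
# lysine decarboxylation step omitted in this sketch).
PATHWAYS = {
    "lysine": {"2.7.2.4", "1.2.1.11", "4.2.1.52", "1.3.1.26",
               "2.3.1.117", "2.6.1.17", "3.5.1.18", "5.1.1.7"},
    "threonine": {"2.7.2.4", "1.2.1.11", "1.1.1.3", "2.7.1.39"},
    "methionine": {"2.7.2.4", "1.2.1.11", "1.1.1.3", "2.3.1.46", "4.2.99.9",
                   "4.4.1.8", "2.1.1.13", "2.1.1.14", "2.5.1.6"},
}

# Reactions shared by every pathway: the common trunk of the super-pathway.
common = set.intersection(*PATHWAYS.values())
print("shared by all three:", sorted(common))

# Pairwise overlaps reveal additional branch points (e.g. L-homoserine).
for (a, steps_a), (b, steps_b) in combinations(PATHWAYS.items(), 2):
    print(f"{a} & {b}:", sorted(steps_a & steps_b))
```

The trunk shared by all three pathways is the aspartate kinase and aspartate semialdehyde dehydrogenase steps; threonine and methionine additionally share homoserine dehydrogenase, which is where the two branches diverge.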

SLIDE 25

Super-pathway: aspartate-derived amino acids

[Figure: aspartate is converted to L-aspartic semialdehyde by a common fork, which branches into lysine biosynthesis and homoserine biosynthesis; L-homoserine feeds methionine biosynthesis (with L-cysteine) and threonine biosynthesis, and L-threonine feeds isoleucine biosynthesis. The end products exert multiple feedback inhibitions on the pathway steps.]

SLIDE 26

What is a pathway?

Should we consider that pathway boundaries are arbitrary definitions?
Should we go even further and consider that the full organism-specific network is the only relevant level of analysis?

If so, can we hope to get any insight from such a complex system?

SLIDE 27

Is there metabolic modularity?

The reductionist approach: 1 gene – 1 enzyme – 1 “function”
Remark: definition of function
“Function: action, characteristic role of an element, of an organ, within a whole (often opposed to structure)” Robert, 1982.

It is pointless to dissociate the “molecular” and “cellular” function (as GO does).
Function is, by definition, the relationship between an enzymatic activity and the process in which it takes place.

→ context dependence
Multifunctionality: an element may be multifunctional in different ways
the same activity can play different roles in different contexts (tissues, processes)
different activities in the same context (e.g. multi-domain enzymes)
Auxotrophy.
Regulation: changes in conditions induce/activate defined sets of enzymes.

SLIDE 28

Part 2 – From reactions/compounds to metabolic networks

SLIDE 29

Building metabolic networks

SLIDE 30

[Figure: the E. coli and S. cerevisiae methionine pathways merged into one metabolic network. From L-homoserine, the E. coli branch (EC 2.3.1.46, 4.2.99.9, 4.4.1.8, via alpha-succinyl-L-homoserine and cystathionine) and the S. cerevisiae branch (2.3.1.31, 4.2.99.10, via O-acetyl-homoserine) converge on homocysteine, which is converted to L-methionine (2.1.1.14). Panels: E. coli; S. cerevisiae; metabolic network.]

SLIDE 31

[Figure: the merged methionine network drawn with one node per compound; the reaction labels 2.3.1.46 and 4.2.99.9 each appear on four different arcs.]

One node per compound

vertices = compounds
arcs = reactions
problem: no representation of cross-point reactions (a reaction with two substrates and two products is split into four substrate-to-product arcs)
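A minimal Python sketch of the compound graph, assuming each reaction is given as lists of substrates and products (a hypothetical input format; the two reactions come from the methionine figure, with side compounds kept). Each substrate is linked to each product, so a reaction with two substrates and two products contributes four arcs that merely share a label.

```python
from itertools import product

# Two reactions from the methionine figure: EC number -> (substrates, products).
REACTIONS = {
    "2.3.1.46": (["L-Homoserine", "Succinyl-CoA"],
                 ["O-Succinyl-L-homoserine", "CoA"]),
    "4.2.99.9": (["O-Succinyl-L-homoserine", "L-Cysteine"],
                 ["Cystathionine", "Succinate"]),
}

def compound_graph(reactions):
    """Return (substrate, product, ec) arcs: one arc per substrate/product pair."""
    arcs = []
    for ec, (substrates, products) in reactions.items():
        for s, p in product(substrates, products):
            arcs.append((s, p, ec))
    return arcs

arcs = compound_graph(REACTIONS)
for s, p, ec in arcs:
    print(f"{s} -> {p}  [{ec}]")
print(len(arcs), "arcs for", len(REACTIONS), "reactions")
```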

SLIDE 32

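[Figure: the merged methionine network drawn with one node per reaction (2.3.1.31, 4.2.99.10, 2.3.1.46, 4.2.99.9, 4.4.1.8, 2.1.1.14); arcs are labelled with intermediate compounds, and homocysteine appears on two separate arcs.]

One node per reaction

vertices = reactions
arcs = intermediate compounds
problem: no representation of cross-point compounds (a compound produced or consumed by several reactions is split over several arcs)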
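The reaction graph can be sketched the same way in Python (hypothetical input format; reactions from the merged methionine network, with acetate added as the co-product of 4.2.99.10 although it is not drawn in the figure). An arc r1 -> r2 is created whenever r1 produces a compound that r2 consumes.

```python
# Reactions from the merged methionine network: EC number -> (substrates, products).
REACTIONS = {
    "2.3.1.31": (["L-Homoserine", "Acetyl-CoA"], ["O-Acetyl-L-homoserine", "CoA"]),
    "4.2.99.10": (["O-Acetyl-L-homoserine", "Sulfide"], ["Homocysteine", "Acetate"]),
    "2.3.1.46": (["L-Homoserine", "Succinyl-CoA"], ["O-Succinyl-L-homoserine", "CoA"]),
    "4.2.99.9": (["O-Succinyl-L-homoserine", "L-Cysteine"], ["Cystathionine", "Succinate"]),
    "4.4.1.8": (["Cystathionine", "H2O"], ["Homocysteine", "Pyruvate", "NH4+"]),
    "2.1.1.14": (["Homocysteine", "5-Methyl-THF"], ["L-Methionine", "THF"]),
}

def reaction_graph(reactions):
    """Return (r1, r2, compound) arcs: r1 produces a compound that r2 consumes."""
    arcs = []
    for r1, (_, products) in reactions.items():
        for r2, (substrates, _) in reactions.items():
            if r1 == r2:
                continue
            for compound in products:
                if compound in substrates:
                    arcs.append((r1, r2, compound))
    return arcs

arcs = reaction_graph(REACTIONS)
for r1, r2, compound in arcs:
    print(f"{r1} -> {r2}  [{compound}]")
```

Homocysteine ends up as two arc labels rather than one node, and L-homoserine, the branch point of the two routes, disappears entirely because no reaction in this list produces it. In a full network, side compounds such as CoA would also create many spurious reaction-to-reaction links.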

SLIDE 33

[Figure: the merged methionine network drawn with one node per compound and one node per reaction.]

One node per compound and per reaction

2 types of vertices
compounds and reactions
arcs
from substrate to reaction
from reaction to product
arc labels can be used to represent stoichiometry

SLIDE 34

Reactions and compounds: directed bipartite graph

A bipartite graph is a graph whose vertex set V can be partitioned into two subsets U and W, such that each edge of G has one endpoint in U and one endpoint in W.

Metabolic networks can be represented as a bipartite graph
Node types: compounds (U) and reactions (W), respectively
Arcs never go from compound to compound
Arcs never go from reaction to reaction

[Figure: full network with 5,871 compounds, 5,223 reactions and 21,194 arcs.]
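A minimal Python sketch of the directed bipartite representation, assuming stoichiometric coefficients are stored per reaction (a hypothetical input format). The sulfite reductase stoichiometry (3 NADPH, 5 H+, 3 NADP+, 3 H2O) is taken from the yeast sulfur assimilation figure; the coefficients become arc labels, and the check verifies that no arc links two compounds or two reactions.

```python
# Hypothetical input format: EC number -> ({substrate: coefficient}, {product: coefficient}).
REACTIONS = {
    # Sulfite reductase (NADPH), as drawn in the yeast sulfur assimilation figure.
    "1.8.1.2": ({"Sulfite": 1, "NADPH": 3, "H+": 5},
                {"Sulfide": 1, "NADP+": 3, "H2O": 3}),
    # Homoserine O-succinyltransferase (E. coli methionine pathway).
    "2.3.1.46": ({"L-Homoserine": 1, "Succinyl-CoA": 1},
                 {"O-Succinyl-L-homoserine": 1, "CoA": 1}),
}

def bipartite_graph(reactions):
    """Return compound nodes, reaction nodes and labelled arcs (tail, head, coefficient)."""
    compounds, arcs = set(), []
    for ec, (substrates, products) in reactions.items():
        for compound, coeff in substrates.items():
            compounds.add(compound)
            arcs.append((compound, ec, coeff))   # substrate -> reaction
        for compound, coeff in products.items():
            compounds.add(compound)
            arcs.append((ec, compound, coeff))   # reaction -> product
    return compounds, set(reactions), arcs

def is_bipartite(compounds, reaction_nodes, arcs):
    """True if every arc goes compound -> reaction or reaction -> compound."""
    return all(
        (u in compounds and v in reaction_nodes) or (u in reaction_nodes and v in compounds)
        for u, v, _ in arcs
    )

compounds, reaction_nodes, arcs = bipartite_graph(REACTIONS)
print(len(compounds), "compounds,", len(reaction_nodes), "reactions,", len(arcs), "arcs")
print("bipartite:", is_bipartite(compounds, reaction_nodes, arcs))
```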

SLIDE 35

Boehringer Mannheim Metabolic Wall Chart

http://www.expasy.ch/cgi-bin/show_thumbnails.pl

SLIDE 36

EcoCyc metabolic chart


http://biocyc.org/ECOLI/new-image?type=OVERVIEW

SLIDE 37

KEGG organism-specific network – Mycoplasma genitalium

Compounds and reactions are shown as nodes.
Edges represent substrate/product relationships between intermediate compounds and reactions.
Side compounds are ignored
Network
238 compounds
180 reactions
Bipartite graph (forward + reverse reactions)
238 + 2*180 = 598 nodes
820 edges
substrate -> reaction
reaction -> product

SLIDE 38

KEGG organism-specific network - Escherichia coli K12

Compounds and reactions are shown as nodes.
Edges represent substrate/product relationships between intermediate compounds and reactions.
Side compounds are ignored
Network
1115 compounds
1146 reactions
Bipartite graph
1115 +2*1146 = 3407 nodes
5188 edges
substrate -> reaction
reaction -> product

SLIDE 39

KEGG organism-specific network – Saccharomyces cerevisiae

Compounds and reactions are shown as nodes.
Edges represent substrate/product relationships between intermediate compounds and reactions.
Side compounds are ignored
Network
923 compounds
1796 reactions
Bipartite graph
923+2*1796 = 4515 nodes
4110 edges
substrate -> reaction
reaction -> product

SLIDE 40

KEGG reference network

Compounds and reactions are shown as nodes.
Edges represent substrate/product relationships between intermediate compounds and reactions.
Side compounds are ignored
Network
3,801 compounds
5,020 reactions
Bipartite graph
13,841 nodes
21,486 edges
substrate -> reaction
reaction -> product
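The node counts above follow from splitting every reaction into a forward and a reverse node, so a network with C compounds and R reactions has C + 2R nodes. A short Python sketch checking this against the four networks (the "_rev" suffix for reverse nodes is a hypothetical naming choice):

```python
def split_directions(reactions):
    """Give each reaction a forward node and a reverse node ('_rev' suffix is arbitrary)."""
    nodes = {}
    for rid, (substrates, products) in reactions.items():
        nodes[rid] = (substrates, products)
        nodes[rid + "_rev"] = (products, substrates)   # substrates and products swapped
    return nodes

def bipartite_node_count(n_compounds, n_reactions):
    """Compound nodes plus one forward and one reverse node per reaction."""
    return n_compounds + 2 * n_reactions

# (compounds, reactions, nodes reported on the slides)
NETWORKS = {
    "M. genitalium": (238, 180, 598),
    "E. coli K12": (1115, 1146, 3407),
    "S. cerevisiae": (923, 1796, 4515),
    "KEGG reference": (3801, 5020, 13841),
}

for name, (n_c, n_r, reported) in NETWORKS.items():
    print(f"{name}: {bipartite_node_count(n_c, n_r)} nodes (reported {reported})")
```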

SLIDE 41

The powerful law of the power law and other myths in network biology

Gipsi Lima-Mendez and Jacques van Helden (2009). Molecular BioSystems 5, 1482-1493.

Topology of biochemical networks

SLIDE 42

Topological properties of metabolic networks

Power-law
Small world
Scale-freeness
Error tolerance (robustness to random deletions)

Vulnerability to attacks (targeted on hubs)
Evolutionary scenarios

Jeong, H., Tombor, B., Albert, R., Oltvai, Z. N. and Barabasi, A. L. (2000). The large-scale organization of metabolic networks. Nature 407, 651-4.

[Figure panels: degree distribution of metabolites; theoretical models for generating networks; small world (number of compound pairs as a function of the distance between compounds, and diameter as a function of network size); error tolerance and vulnerability to attacks (diameter as a function of the number of deleted nodes, for random and hub deletions).]

SLIDE 43

Properties of graphs with power-law degree distribution

Small-world property
Distances between node pairs are very short.
The distribution of distances between pairs of compounds in the metabolic network peaks at 3 (Figure a).
This results from shortcuts through the highly connected nodes (the “hubs”).
Scale-free properties
When only a subset of the network is selected (e.g. the reactions catalyzed in an organism with a small number of enzymes), both the power-law property and the average distances are conserved (Figure b).
Robustness to errors (random node deletions)
Random node deletions barely affect the average distance between nodes (Figure e, green).
Sensitivity to attacks (targeted node deletions)
When the most connected nodes (“hubs”) are removed from the network, the average distance rapidly increases (Figure e, red).

Jeong, H., B. Tombor, R. Albert, Z.N. Oltvai, and A.L. Barabasi. 2000. The large-scale organization of metabolic networks. Nature 407: 651-654.

SLIDE 44

Lethality and centrality in protein networks

The power law is also apparent in protein interaction networks.
Degree correlates with essentiality (deletion phenotypes).

Jeong, H., Mason, S. P., Barabasi, A. L. and Oltvai, Z. N. (2001). Lethality and centrality in protein networks. Nature 411, 41-2.

SLIDE 45

Hierarchical organization of modularity in metabolic networks

Ravasz, E., Somera, A. L., Mongru, D. A., Oltvai, Z. N. and Barabasi, A. L. (2002). Hierarchical organization of modularity in metabolic networks. Science 297, 1551-5.

[Figure panels: power law; manifestly modular; hierarchical.]

SLIDE 46

Universal laws in network biology ...

SLIDE 47

... and beyond

Socio-ecological networks
Ostrom, E. (2009). A general framework for analyzing sustainability of social-ecological systems. Science 325, 419-22.

The web of life
Bascompte, J. (2009). Disentangling the web of life. Science 325, 416-9.

SLIDE 48

Myths and dogmas in scale-free networks

Myth
a traditional story, esp. one concerning the early history of a people or explaining some natural or social phenomenon, and typically involving supernatural beings or events

a widely held but false belief or idea
Dogma
a principle or set of principles laid down by an authority as incontrovertibly true
Myth 1: the degree distribution of biological networks follows a power law

I will also show how this myth is becoming a dogma

Myth 2: the metabolic network is a small world
Myth 3: Biological networks are scale-free
Myth 4: small worlds are tolerant to random deletions, but vulnerable to targeted attacks
Myth 5: biological networks grow by preferential attachment
We challenged those 5 myths for two network types: metabolism and protein interactions.
Here I will only discuss metabolic networks.

Lima-Mendez, G. and van Helden, J. (2009). The powerful law of the power law and other myths in network biology. Mol. BioSyst. 5, 1482-1493. DOI: 10.1039/b908681a. [PubMed 20023717]

SLIDE 49

Myth 1: the degree distribution of biological networks follows a power law

SLIDE 50

Degree - definition

In an undirected graph
The degree (k) of a node is the number of edges for which it is an endpoint.
In a directed graph
The in-degree (kin) of a node is the number of arcs for which it is the head (arcs entering the node).
The out-degree (kout) of a node is the number of arcs for which it is the tail (arcs leaving the node).
The total degree (k) of a node is the sum of in-degree and out-degree
k = kin + kout
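A minimal Python sketch of these definitions, assuming the graph is given as a list of (tail, head) arcs; the arcs below are the bipartite arcs of two reactions of the methionine pathway (4.4.1.8 and 2.1.1.14).

```python
from collections import Counter

def degrees(arcs):
    """In-, out- and total degree for a list of (tail, head) arcs."""
    k_in, k_out = Counter(), Counter()
    for tail, head in arcs:
        k_out[tail] += 1   # the arc leaves its tail
        k_in[head] += 1    # the arc enters its head
    k_total = k_in + k_out  # k = kin + kout
    return k_in, k_out, k_total

# Bipartite arcs (substrate -> reaction, reaction -> product) for two reactions.
ARCS = [
    ("Cystathionine", "4.4.1.8"), ("H2O", "4.4.1.8"),
    ("4.4.1.8", "Homocysteine"), ("4.4.1.8", "Pyruvate"), ("4.4.1.8", "NH4+"),
    ("Homocysteine", "2.1.1.14"), ("5-Methyl-THF", "2.1.1.14"),
    ("2.1.1.14", "L-Methionine"), ("2.1.1.14", "THF"),
]

k_in, k_out, k_total = degrees(ARCS)
for node in ("Homocysteine", "4.4.1.8"):
    print(node, "kin =", k_in[node], "kout =", k_out[node], "k =", k_total[node])
```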

SLIDE 51

Graph types

Homogeneous networks

Erdös-Rényi model (ER model)
Pairs of nodes are connected with a constant random probability.
The connectivity follows a Poisson law
$P(k) = \frac{\lambda^k e^{-\lambda}}{k!}$
λ: mean number of connections per node
k: number of connections for a given node
The probability of finding a highly connected node decreases exponentially with connectivity.

Scale-free networks
A few nodes are highly connected, most nodes are poorly connected.
Can be generated randomly with a model where new nodes are preferentially connected to already established nodes.
The connectivity follows a power law
$P(k) = C k^{-\gamma} \iff \log P(k) = -\gamma \log k + \log C$
γ: exponent of the power law (the slope of the distribution in a log-log plot is -γ)
k: number of connections for a given node

Jeong, H., B. Tombor, R. Albert, Z.N. Oltvai, and A.L. Barabasi. 2000. The large-scale organization of metabolic networks. Nature 407: 651-654.
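To compare the two laws numerically, a short Python sketch evaluating both formulas. The values λ = 4 and γ = 2.2 are arbitrary illustrative choices, not fits to any network, and the power law is left unnormalized. The logarithm of the power law is a straight line of slope -γ in log k, whereas the Poisson probabilities collapse far faster for large k.

```python
import math

def poisson_pmf(k, lam):
    """Poisson P(k) = lam^k e^(-lam) / k!, evaluated in log space to avoid overflow."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def power_law_pmf(k, gamma, c=1.0):
    """Power law P(k) = C k^(-gamma) for k >= 1 (C left as a free constant here)."""
    return c * k ** (-gamma)

LAM, GAMMA = 4.0, 2.2  # illustrative values, not fitted to any network
for k in (1, 10, 100):
    print(f"k={k:>3}  Poisson={poisson_pmf(k, LAM):.3e}  power law={power_law_pmf(k, GAMMA):.3e}")
```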

SLIDE 52

A representation detail

Note: in Jeong (2000), the schematic drawing is misleading.
The power law is shown on logarithmic axes, whereas the Poisson is shown on linear axes.
The Poisson has been chosen with a mean (lambda) of ~20.

[Figure panels: power law; Poisson.]

SLIDE 53

The shape of the Poisson strongly depends on lambda

[Figure panels: density function; density + cCDF; density + cCDF (log scales).]

SLIDE 54

Connectivity in the metabolic network

Jeong et al. (2000) calculate compound connectivity in metabolic networks reconstructed from the genomes of various organisms.
They show that it follows a power law.

Jeong, H., B. Tombor, R. Albert, Z.N. Oltvai, and A.L. Barabasi. 2000. The large-scale organization of metabolic networks. Nature 407: 651-654.

SLIDE 55

Compound degree

The distribution shown in Jeong et al. (2000) was simplified by “binning” the data in class intervals.
The actual distribution shows a more complex shape.
The “hub” compounds generally correspond to pool metabolites.

Compound: number of reactions
H2O: 1615
NAD+: 578
NADH: 569
NADP+: 564
NADPH: 559
Oxygen: 527
ATP: 435
Orthophosphate: 349
ADP: 324
CO2: 323
CoA: 303
H+: 272
NH3: 270
Pyrophosphate: 252
UDP: 190
S-Adenosyl-L-methionine: 174
S-Adenosyl-L-homocysteine: 165
Pyruvate: 150
AMP: 142
H2O2: 138
L-Glutamate: 132
2-Oxoglutarate: 129
Acceptor: 126
Acetyl-CoA: 122
Reduced acceptor: 122
Acetate: 87
UDPglucose: 79
D-Glucose: 62
Succinate: 59
CMP: 54

[Figure: compound connectivity; number of compounds as a function of the number of reactions per compound, log-log axes (avg = 4.9, std = 34.9).]

Compounds from KEGG/LIGAND, 2002 version.
van Helden, J., L. Wernisch, D. Gilbert, and S.J. Wodak. 2002. Graph-based analysis of metabolic networks. In Ernst Schering Res Found Workshop (ed. M.H.-W. et al.), pp. 245-274. Springer-Verlag.

SLIDE 56

Metabolic network: Power law fit on the degree distribution

Network: all reactions from KEGG/LIGAND (http://www.genome.jp/ligand/).
Degree: number of reactions in which a compound is involved as substrate or product.
Important: the plot represents all values; the data are not “binned”.

From Jeong (2000)

SLIDE 57

Metabolic network: Power law fit on the truncated distribution

The fit looks better when the right tail of the distribution is truncated.
Note: the right tail represents the “hubs”, which are claimed to confer the power-law property on the distribution.
It is thus paradoxical that the power-law fit improves when they are discarded from the network.

From Jeong (2000)

SLIDE 58

Metabolic network: Power law fit on the cCDF

The fit should be done on the complementary cumulative distribution function (cCDF).
The fit with the complete cCDF remains apparently poor.
The truncated cCDF fits the beginning of the curve better, but the hubs clearly appear as outliers.

From Jeong (2000)
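A minimal Python sketch of the cCDF and of a least-squares line fitted on log-log axes (the degree list is hypothetical). For a power law P(k) ∝ k^-γ, the cCDF decays roughly as k^-(γ-1), so its log-log slope estimates -(γ-1) rather than -γ. A least-squares slope is only a visual diagnostic; it does not replace a likelihood comparison or a goodness-of-fit test.

```python
import math
from collections import Counter

def ccdf(degrees):
    """Empirical cCDF: pairs (k, fraction of nodes with degree >= k) for each observed k."""
    n = len(degrees)
    counts = Counter(degrees)
    points, remaining = [], n
    for k in sorted(counts):
        points.append((k, remaining / n))
        remaining -= counts[k]
    return points

def loglog_slope(points):
    """Least-squares slope of log(y) versus log(x); points with x <= 0 are skipped."""
    pts = [(math.log(x), math.log(y)) for x, y in points if x > 0]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    return sxy / sxx

# Usage with a hypothetical list of compound degrees.
degrees = [1, 1, 1, 1, 2, 2, 3, 5, 8, 40]
points = ccdf(degrees)
print(points)
print("slope of log cCDF vs log k:", round(loglog_slope(points), 2))
```

Truncating the right tail before fitting, as on the previous slide, amounts to calling loglog_slope on the points whose k lies below a chosen threshold.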

SLIDE 59

“Universality” of the power law in biological networks

[Figure panels: compounds <-> reactions; transcription factors -> genes; genes <- transcription factors (Poisson fit); proteins - proteins (Gavin, 2006); proteins - proteins (Krogan, 2006).]

SLIDE 60

Comparing the likelihood of theoretical distributions

Stumpf & Ingram (2005) measured the likelihood of various distributions fitted to the protein interaction networks of various organisms.
The most likely distribution is neither the Poisson nor the power law, but the stretched exponential (and the Gamma for E. coli).

M. P. H. Stumpf and P. J. Ingram (2005). Probability models for degree distributions of protein interaction networks. Europhys. Lett. 71: 152-158.

[Figure: likelihood of the Poisson, exponential, gamma, power-law, lognormal and stretched exponential fits for S. cerevisiae.]

SLIDE 61

Testing the goodness of fit

Khanin and Wit (2006) tested the goodness of fit of a power law on 12 biological networks.
None of those networks passed the test.
Even the truncated distributions do not fit a power law.

Khanin, R. and Wit, E. (2006). How scale-free are biological networks. J Comput Biol 13, 810-8.

Null hypothesis H0: the degree distribution fits a power law. The hypothesis is rejected if the p-value is small.

SLIDE 62

Myth 2: the metabolic network is a small world


SLIDE 63

Is the metabolic network a small world?

Small-world property
Distances between node pairs are very short.
The distribution of distances between pairs of compounds in the metabolic network peaks at 3 (Figure a).
This results from shortcuts through the highly connected nodes (the “hubs”).

Jeong, H., B. Tombor, R. Albert, Z.N. Oltvai, and A.L. Barabasi. 2000. The large-scale organization of metabolic networks. Nature 407: 651-654.

SLIDE 64

Who are the metabolic hubs?

Metabolic hubs appear as side reactants in most reactions.

Rank. Compound: in-degree, out-degree, total degree
1. H2O: 769, 1444, 2213
2. H+: 809, 460, 1269
3. Oxygen: 43, 817, 860
4. NADP+: 318, 406, 724
5. NADPH: 405, 316, 721
6. NAD+: 160, 503, 663
7. NADH: 497, 158, 655
8. ATP: 17, 449, 466
9. CO2: 378, 49, 427
10. Orthophosphate: 315, 78, 393
11. CoA: 242, 127, 369
12. ADP: 313, 20, 333
13. NH3: 253, 43, 296
14. Pyrophosphate: 256, 30, 286
15. S-Adenosyl-L-methionine (SAM): 6, 239, 245

[Figure: the yeast methionine pathway (see slide 6) drawn with its side compounds (NADP+/NADPH, CoA/acetyl-CoA, ATP/ADP, Pi, PPi, H2O, sulfide, methyl-tetrahydropteroyltriglutamate), most of which rank among the hubs.]

SLIDE 65

Small world

[Figure panels: number of compound pairs as a function of the distance between compounds; diameter as a function of network size.]

Fermenting grapes into wine in 2 steps

Metabolic hubs cannot be used as valid intermediates to link reactions.
Counter-example: from glucose to ethanol.
Accepting any compound as an intermediate between two reactions leads to irrelevant two-step shortcuts.
All the distances computed in the seminal articles are thus meaningless.

SLIDE 66

Should we not simply filter out the “hubs”?

Wagner and Fell described the small-world properties of a metabolic network at the same time as Jeong & Barabasi.

Fell and Wagner (2000). The small world of metabolism. Nat Biotechnol 18:121-122.
Wagner and Fell (2001). The small world inside large metabolic networks. Proc R Soc Lond B Biol Sci 268: 1803-1810.

Network building
Context-dependent network:
317 reactions involving 275 metabolites “that represent central routes of energy metabolism and small-molecule building block synthesis in E. coli under aerobic growth, with glucose as sole carbon source and O2 as electron acceptor”.
They filtered out common coenzymes (ATP, ADP, NAD).
Compound-reaction matrix
1 if the compound is a substrate/product of the reaction
0 otherwise
Center of the network
glutamate (mean path length 2.46) followed by pyruvate (2.59).
Generative model: network growth by accretion (new members are preferentially connected to members having a high number of connections).

They interpret this generative model as an evolutionary scenario:
“This potential link with evolutionary history is consistent with Morowitz’s claim that intermediary metabolism recapitulates the evolution of biochemistry”.
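The accretion model can be sketched as Barabási-Albert-style preferential attachment in Python; the network size, the number of links per new node m and the seed are arbitrary choices. Keeping one list entry per edge endpoint makes a uniform draw from that list pick each node with probability proportional to its degree.

```python
import random
from collections import Counter

def preferential_attachment(n_nodes, m, seed=0):
    """Grow an undirected graph: each new node links to m distinct existing nodes,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Start from a small complete graph on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears once per incident edge, so uniform sampling from
    # `ends` selects a node with probability proportional to its degree.
    ends = [v for e in edges for v in e]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        for t in targets:
            edges.append((new, t))
            ends.extend((new, t))
    return edges

edges = preferential_attachment(1000, 2, seed=1)
deg = Counter(v for e in edges for v in e)
print("edges:", len(edges), "max degree:", max(deg.values()),
      "mean degree:", 2 * len(edges) / len(deg))
```

The model reproduces a few highly connected early nodes; whether such a growth process reflects the actual evolution of metabolism is precisely what Myth 5 questions.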

SLIDE 67

Raw graph: from L-aspartate to L-methionine

The 5 shortest paths from L-aspartate to L-methionine in the raw graph
L-aspartic acid --> 6.3.5.4 --> AMP --> 6.1.1.10 --> L-methionine
L-aspartic acid --> 3.5.1.15 --> H2O --> 3.4.13.12 --> L-methionine
L-aspartic acid --> 3.5.1.15 --> H2O --> 3.4.13.12 --> L-methionine
L-aspartic acid --> 4.3.1.1 --> NH3 --> 4.4.1.11 --> L-methionine
L-aspartic acid --> 3.5.1.15 --> H2O --> 3.5.1.31 --> L-methionine
All these paths convert L-aspartate to L-methionine in 2 reaction steps.
In all these cases, the intermediate compound belongs to the group of highly connected nodes in the metabolic graph.
These compounds cannot be considered valid intermediates between these reactions.

SLIDE 68

Filtered graph: from L-aspartate to L-methionine

The 5 shortest paths from L-aspartate to L-methionine in the filtered graph

L-aspartic acid --> 2.6.1.35 --> glycine --> 2.6.1.73 --> L-methionine
L-aspartic acid --> 2.6.1.12 --> L-alpha-alanine --> 2.6.1.44 --> glycine --> 2.6.1.73 --> L-methionine
L-aspartic acid --> 2.6.1.12 --> L-alpha-alanine --> 2.6.1.41 --> d-methionine --> 5.1.1.2 --> L-methionine
L-aspartic acid --> 2.6.1.12 --> L-alpha-alanine --> 2.6.1.2 --> o-acetyl-L-homoserine --> 2.5.1.49 --> L-methionine
L-aspartic acid --> 4.1.1.12 --> L-alpha-alanine --> 2.6.1.44 --> glycine --> 2.6.1.73 --> L-methionine

These paths use valid intermediate compounds.
However, they are much shorter (2 or 3 reaction steps) than the annotated methionine pathway.
The intermediate compounds and reactions are not part of the annotated pathway.
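The effect of hub filtering on path finding can be reproduced with a breadth-first search on a toy bipartite graph in Python. The graph is hypothetical: it contains only the arcs of the AMP, NH3 and glycine paths listed on the last two slides plus the annotated E. coli methionine pathway, each reaction taken in the direction of the path, and the hub list is an illustrative choice.

```python
from collections import defaultdict, deque

# Toy bipartite graph (compound -> reaction -> compound) built from listed paths.
PATHS = [
    ["L-aspartic acid", "6.3.5.4", "AMP", "6.1.1.10", "L-methionine"],
    ["L-aspartic acid", "4.3.1.1", "NH3", "4.4.1.11", "L-methionine"],
    ["L-aspartic acid", "2.6.1.35", "glycine", "2.6.1.73", "L-methionine"],
    ["L-aspartic acid", "2.7.2.4", "L-aspartyl-4-P", "1.2.1.11",
     "L-aspartic semialdehyde", "1.1.1.3", "L-homoserine", "2.3.1.46",
     "O-succinyl-L-homoserine", "4.2.99.9", "cystathionine", "4.4.1.8",
     "homocysteine", "2.1.1.14", "L-methionine"],
]

graph = defaultdict(list)
for path in PATHS:
    for u, v in zip(path, path[1:]):
        if v not in graph[u]:
            graph[u].append(v)

def shortest_path(graph, source, target, excluded=frozenset()):
    """Breadth-first search that never enters nodes listed in `excluded`."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in parent and nxt not in excluded:
                parent[nxt] = node
                queue.append(nxt)
    return None

HUBS = {"AMP", "NH3", "H2O"}  # hypothetical hub list
print(" -> ".join(shortest_path(graph, "L-aspartic acid", "L-methionine")))
print(" -> ".join(shortest_path(graph, "L-aspartic acid", "L-methionine", HUBS)))
```

Removing the hubs eliminates the AMP and NH3 shortcuts, but the search then returns the two-step glycine path; the seven-reaction annotated pathway is found only if glycine is excluded as well.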

SLIDE 69

Myth 3: biological networks are scale-free

SLIDE 70

Are metabolic networks scale-free ?

Scale-free properties
When only a subset of the network is selected (e.g. the reactions catalyzed in an organism with a small number of enzymes), two properties are conserved (Figure b):
the power-law degree distribution;
the small average distances.

Problems
The power law does not fit any of the actual data sets (see myth 1).
The small average distances are an artefact (see myth 2).

Jeong, H., B. Tombor, R. Albert, Z.N. Oltvai, and A.L. Barabasi. 2000. The large-scale organization of metabolic networks. Nature 407: 651-654.
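Whether a degree distribution follows a power law is a statistical question, not a visual one. The sketch below is a minimal illustration, assuming integer degrees: it estimates the exponent with the discrete maximum-likelihood approximation of Clauset, Shalizi and Newman (2009). It is not necessarily the procedure used in the studies cited here, the function names are hypothetical, and the degree sequence is made up.

```python
import math
from collections import Counter


def degree_histogram(degrees):
    """Empirical degree distribution as sorted (degree, number of nodes) pairs."""
    return sorted(Counter(degrees).items())


def estimate_power_law_alpha(degrees, k_min=1):
    """Approximate maximum-likelihood exponent alpha for a discrete power law
    P(k) ~ k^-alpha, fitted on the tail k >= k_min (Clauset et al. 2009 approximation
    for integer data). It returns a number even when the data are NOT power-law distributed."""
    tail = [k for k in degrees if k >= k_min]
    if not tail or k_min < 1:
        raise ValueError("need k_min >= 1 and at least one degree >= k_min")
    # every term is positive because k >= k_min > k_min - 0.5
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1.0 + len(tail) / log_sum


# Hypothetical degree sequence, for illustration only (not taken from the slides).
degrees = [1, 1, 1, 1, 2, 2, 2, 3, 3, 5, 8, 40]
print(degree_histogram(degrees))
print(round(estimate_power_law_alpha(degrees, k_min=1), 2))
```

A fitted exponent says nothing about goodness of fit: the function returns a value for any sample, so a goodness-of-fit test and a comparison with alternative distributions (e.g. log-normal) are still needed before calling a network scale-free.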

SLIDE 71

Myth 4: small worlds are tolerant to random deletions, but vulnerable to targeted attacks

The powerful law of the power law and other myths in network biology

SLIDE 72

Are metabolic networks robust to errors and vulnerable to attacks?

Robustness to errors (random node deletions)

Random node deletions barely affect the average distance between nodes (Figure e, green).

Sensitivity to attacks (targeted node deletions)

When the most connected nodes (“hubs”) are removed from the network, the average distance rapidly increases (Figure e, red).

How can those concepts be transposed to metabolic networks?

Jeong, H., B. Tombor, R. Albert, Z.N. Oltvai, and A.L. Barabasi. 2000. The large-scale organization of metabolic networks. Nature 407: 651-654.
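The error/attack experiment can be reproduced on any graph by deleting nodes and recomputing the average distance. A minimal sketch, assuming an undirected, unweighted graph stored as a dict of neighbour sets; the helper names are hypothetical, and unreachable pairs are simply ignored, which is only one of several possible conventions.

```python
import random
from collections import deque


def average_distance(adj):
    """Mean shortest-path length over ordered pairs of distinct, mutually reachable
    nodes (unweighted BFS). Unreachable pairs are ignored (an assumption of this sketch)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")


def remove_nodes(adj, nodes):
    """Copy of the undirected graph without the given nodes (and their edges)."""
    gone = set(nodes)
    return {u: {v for v in nbrs if v not in gone} for u, nbrs in adj.items() if u not in gone}


def attack(adj, n_remove, targeted, seed=0):
    """Delete the n_remove most connected nodes (targeted attack) or random ones (errors)."""
    if targeted:
        victims = sorted(adj, key=lambda u: len(adj[u]), reverse=True)[:n_remove]
    else:
        victims = random.Random(seed).sample(sorted(adj), n_remove)
    return remove_nodes(adj, victims)


star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(average_distance(star))  # 1.6
print(average_distance(attack(star, 1, targeted=True)))  # inf: every leaf is isolated
```

On a star graph, deleting the central hub disconnects every leaf, whereas deleting a random leaf leaves the remaining distances unchanged. In a metabolic network, however, such a hub is a compound like H2O, which no single gene deletion can remove.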

SLIDE 73

Are cells resistant to random attacks ?

100 years of genetics and biochemistry show the opposite.
All the characterized enzymes were isolated because the mutation of a single enzyme-coding gene leads to auxotrophy.
> those mutations are lethal unless the enzyme product is supplied.

Source: Byrne & Meacock, Microbiology 2001 Sep;147(Pt 9):2389-98.

[Figure: error tolerance and vulnerability to attacks; network diameter versus number of deleted nodes, for random and hub deletions]

SLIDE 74

Targeted attacks: can we conceive a water-free cell ?

Deletions act on enzymes, not compounds.
Removing a “hub” involves deleting several hundred enzymes.
Double or triple mutations are generally lethal.
> this is conceivable neither in nature nor in the laboratory.

[Figure: error tolerance and vulnerability to attacks; network diameter versus number of deleted nodes, for random and hub deletions]

Rank | Name | In degree | Out degree | Total degree
1 | H2O | 769 | 1444 | 2213
2 | H+ | 809 | 460 | 1269
3 | Oxygen | 43 | 817 | 860
4 | NADP+ | 318 | 406 | 724
5 | NADPH | 405 | 316 | 721
6 | NAD+ | 160 | 503 | 663
7 | NADH | 497 | 158 | 655
8 | ATP | 17 | 449 | 466
9 | CO2 | 378 | 49 | 427
10 | Orthophosphate | 315 | 78 | 393
11 | CoA | 242 | 127 | 369
12 | ADP | 313 | 20 | 333
13 | NH3 | 253 | 43 | 296
14 | Pyrophosphate | 256 | 30 | 286
15 | S-Adenosyl-L-methionine (SAM) | 6 | 239 | 245
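A degree table like the one above can be derived from a reaction list. A minimal sketch on a hypothetical three-reaction subset, assuming the convention that a compound's in-degree counts the reactions producing it and its out-degree the reactions consuming it (this seems consistent with the table, where ATP is mostly consumed and ADP mostly produced, but it is an assumption). Under that convention, the in-degree is also the number of reactions whose enzymes would all have to be deleted to stop producing the compound, ignoring reversibility and spontaneous reactions.

```python
from collections import Counter

# Hypothetical toy subset of reactions (EC number, substrates, products).
# Stoichiometry is ignored: only the presence of a compound on each side counts.
REACTIONS = [
    ("2.7.2.4", {"L-aspartate", "ATP"}, {"L-4-aspartyl phosphate", "ADP"}),
    ("1.2.1.11", {"L-4-aspartyl phosphate", "NADPH", "H+"},
     {"L-aspartate 4-semialdehyde", "NADP+", "orthophosphate"}),
    ("1.1.1.3", {"L-aspartate 4-semialdehyde", "NADPH", "H+"}, {"L-homoserine", "NADP+"}),
]


def compound_degrees(reactions):
    """Rows (compound, in-degree, out-degree, total), by decreasing total degree.
    Assumed convention: in = reactions producing it, out = reactions consuming it."""
    in_deg, out_deg = Counter(), Counter()
    for _ec, substrates, products in reactions:
        for c in substrates:
            out_deg[c] += 1
        for c in products:
            in_deg[c] += 1
    compounds = set(in_deg) | set(out_deg)
    rows = [(c, in_deg[c], out_deg[c], in_deg[c] + out_deg[c]) for c in compounds]
    rows.sort(key=lambda row: (-row[3], row[0]))  # ties broken alphabetically
    return rows


for rank, (name, k_in, k_out, k_total) in enumerate(compound_degrees(REACTIONS), start=1):
    print(rank, name, k_in, k_out, k_total)
```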

SLIDE 75

Myth 5: biological networks grow by preferential attachment

The powerful law of the power law and other myths in network biology

SLIDE 76

A logical fallacy

A => B does not mean B => A.

Several generative models can produce a power-law degree distribution.

The underlying structure is however very different.

The power law is not informative about a network’s origin and evolution.
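To illustrate that different mechanisms can generate similar-looking degree distributions, here are two textbook-style toy generators: growth by preferential attachment, and growth by gene duplication with divergence. Both are reported in the literature to produce heavy-tailed degree distributions (the duplication model only for some parameter values). They are sketches under stated assumptions, with hypothetical names and parameters, not the models discussed by Keller (2005).

```python
import random
from collections import Counter


def preferential_attachment(n, m, seed=0):
    """Barabasi-Albert-style growth: each new node links to m distinct existing nodes,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    edges = [(i, j) for i in range(m + 1) for j in range(i)]  # complete seed graph on m+1 nodes
    endpoints = [v for edge in edges for v in edge]  # each node listed once per incident edge
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))  # degree-proportional choice
        for old in chosen:
            edges.append((new, old))
            endpoints.extend((new, old))
    return edges


def duplication_divergence(n, keep=0.4, seed=0):
    """Duplication growth: a new node copies a uniformly chosen parent, keeps each
    inherited link with probability `keep`, and is linked to its parent.
    There is no explicit degree-proportional rule."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}
    for new in range(2, n):
        parent = rng.randrange(new)
        inherited = {v for v in sorted(adj[parent]) if rng.random() < keep}
        links = inherited | {parent}
        adj[new] = links
        for v in links:
            adj[v].add(new)
    return adj


def degree_counts(edges):
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


ba = sorted(degree_counts(preferential_attachment(2000, 2)).values(), reverse=True)
dd = sorted((len(nbrs) for nbrs in duplication_divergence(2000).values()), reverse=True)
print("preferential attachment, top degrees:", ba[:5])
print("duplication-divergence, top degrees:", dd[:5])
```

Note that duplication implicitly favours hubs (a neighbour of a random node is more likely to be a hub), so the models differ in mechanism rather than in the mere presence of a degree bias. A heavy tail alone cannot tell which rule produced a network.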

Keller. Revisiting "scale-free" networks. Bioessays (2005) vol. 27 (10) pp. 1060-8

SLIDE 77

Do metabolic “hubs” correspond to more ancient compounds ?

This hypothesis seems reasonable for understanding the relationships between central and secondary metabolism.

However, a strict extrapolation to compound degree would lead to obvious absurdities:
ATP before adenine
S-Adenosyl-L-methionine before methionine
...

Rank | Name | In degree | Out degree | Total degree
1 | H2O | 769 | 1444 | 2213
2 | H+ | 809 | 460 | 1269
3 | Oxygen | 43 | 817 | 860
4 | NADP+ | 318 | 406 | 724
5 | NADPH | 405 | 316 | 721
6 | NAD+ | 160 | 503 | 663
7 | NADH | 497 | 158 | 655
8 | ATP | 17 | 449 | 466
9 | CO2 | 378 | 49 | 427
10 | Orthophosphate | 315 | 78 | 393
11 | CoA | 242 | 127 | 369
12 | ADP | 313 | 20 | 333
13 | NH3 | 253 | 43 | 296
14 | Pyrophosphate | 256 | 30 | 286
15 | S-Adenosyl-L-methionine (SAM) | 6 | 239 | 245
16 | S-Adenosyl-L-homocysteine | 227 | 9 | 236
17 | UDP | 216 | 6 | 222
18 | H2O2 | 142 | 21 | 163
19 | 2-Oxoglutarate | 33 | 125 | 158
20 | AMP | 144 | 14 | 158
21 | Pyruvate | 101 | 50 | 151
22 | Acetyl-CoA | 35 | 101 | 136
23 | L-Glutamate | 83 | 46 | 129
24 | Oxaloacetate | 29 | 14 | 43

SLIDE 78

Part 3 From networks to pathways

SLIDE 79

Tricks and traps for metabolic path finding

SLIDE 80

Path finding traps - Ubiquitous compounds

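Reactions:
4.2.1.52: L-aspartic semialdehyde + pyruvate <=> dihydrodipicolinic acid + H2O
3.5.1.18: succinyl-diaminopimelate + H2O <=> LL-diaminopimelic acid + succinate

Invalid pathway: L-aspartic semialdehyde --> 4.2.1.52 --> H2O --> 3.5.1.18 --> LL-diaminopimelic acid
(H2O is produced by the first reaction and consumed by the second, so the path is short but biochemically meaningless.)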
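The trap can be reproduced with a breadth-first search on a compound/reaction bipartite graph built from the two reactions of this slide. A minimal sketch, assuming that substrates point to reactions and reactions point to products, with both reactions treated as irreversible in the written direction; the `excluded` parameter and the function names are hypothetical.

```python
from collections import deque

# The two reactions of this slide: EC number -> (substrates, products).
REACTIONS = {
    "4.2.1.52": ({"L-aspartic semialdehyde", "pyruvate"}, {"dihydrodipicolinic acid", "H2O"}),
    "3.5.1.18": ({"succinyl-diaminopimelate", "H2O"}, {"LL-diaminopimelic acid", "succinate"}),
}


def build_graph(reactions, excluded=frozenset()):
    """Directed bipartite graph: substrate -> reaction -> product, minus excluded compounds."""
    graph = {}
    for rid, (substrates, products) in reactions.items():
        graph.setdefault(rid, set())
        for c in set(substrates) - set(excluded):
            graph.setdefault(c, set()).add(rid)
        for c in set(products) - set(excluded):
            graph[rid].add(c)
            graph.setdefault(c, set())
    return graph


def shortest_path(graph, source, target):
    """Breadth-first search; returns the nodes of one shortest path, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            path = []
            while u is not None:
                path.append(u)
                u = previous[u]
            return path[::-1]
        for v in sorted(graph.get(u, ())):
            if v not in previous:
                previous[v] = u
                queue.append(v)
    return None


src, dst = "L-aspartic semialdehyde", "LL-diaminopimelic acid"
print(shortest_path(build_graph(REACTIONS), src, dst))
# ['L-aspartic semialdehyde', '4.2.1.52', 'H2O', '3.5.1.18', 'LL-diaminopimelic acid']
print(shortest_path(build_graph(REACTIONS, excluded={"H2O"}), src, dst))
# None
```

Without H2O the two reactions are no longer connected in this toy graph; in the full network, the search would then have to go through the real lysine-pathway intermediates.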

SLIDE 81

Path finding traps - Direct traversal of reversible reactions

Reaction: 4.2.1.52 (L-aspartic semialdehyde + pyruvate <=> dihydrodipicolinic acid + H2O)

Invalid pathway: L-aspartic semialdehyde --> 4.2.1.52 --> pyruvate (the path enters and leaves the reaction on the same side)

Valid pathways:
L-aspartic semialdehyde --> 4.2.1.52 --> dihydrodipicolinic acid
dihydrodipicolinic acid --> 4.2.1.52 --> L-aspartic semialdehyde
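One way to avoid this trap is to replace each reversible reaction by two directed nodes, one per direction, so that a path can only go from one side of the reaction to the other. A minimal sketch comparing this encoding with a naive undirected one; the " reverse" node suffix and the function names are conventions chosen for this illustration.

```python
# Reaction of this slide: EC number -> (left-hand compounds, right-hand compounds).
REACTIONS = {
    "4.2.1.52": ({"L-aspartic semialdehyde", "pyruvate"}, {"dihydrodipicolinic acid", "H2O"}),
}


def undirected_graph(reactions):
    """Naive encoding: each compound is linked to its reaction in both directions,
    whichever side of the reaction it is on."""
    g = {}
    for rid, (left, right) in reactions.items():
        for c in left | right:
            g.setdefault(c, set()).add(rid)
            g.setdefault(rid, set()).add(c)
    return g


def split_reversible(reactions):
    """Each reversible reaction becomes two directed nodes:
    left -> rid -> right, and right -> 'rid reverse' -> left."""
    g = {}
    for rid, (left, right) in reactions.items():
        for node, inputs, outputs in ((rid, left, right), (rid + " reverse", right, left)):
            for c in inputs:
                g.setdefault(c, set()).add(node)
            g.setdefault(node, set()).update(outputs)
            for c in outputs:
                g.setdefault(c, set())
    return g


def one_reaction_away(graph, compound):
    """Compounds reachable through exactly one reaction node (compound -> reaction -> compound)."""
    return {c for r in graph.get(compound, ()) for c in graph.get(r, ()) if c != compound}


start = "L-aspartic semialdehyde"
print(sorted(one_reaction_away(undirected_graph(REACTIONS), start)))
# ['H2O', 'dihydrodipicolinic acid', 'pyruvate']   (pyruvate is the invalid step)
print(sorted(one_reaction_away(split_reversible(REACTIONS), start)))
# ['H2O', 'dihydrodipicolinic acid']
```

H2O remains reachable in both encodings: splitting reversible reactions does not solve the ubiquitous-compound problem, which needs filtering or weighting.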

SLIDE 82

Path finding traps - Mutual exclusion of reverse reactions

Reactions:
4.2.1.52: L-aspartic semialdehyde + pyruvate --> dihydrodipicolinic acid + H2O
4.2.1.52 reverse: dihydrodipicolinic acid + H2O --> L-aspartic semialdehyde + pyruvate

Invalid pathway: L-aspartic semialdehyde --> 4.2.1.52 --> dihydrodipicolinic acid --> 4.2.1.52 reverse --> pyruvate
(a path must not use both the forward and the reverse direction of the same reaction)
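Splitting a reaction into forward and reverse nodes reopens a loophole: a path can enter through one direction and leave through the other. A minimal sketch of a depth-first path enumeration that enforces mutual exclusion, assuming forward and reverse nodes are named as on this slide ("4.2.1.52" and "4.2.1.52 reverse"); the function names and the `max_nodes` limit are hypothetical. Exhaustive enumeration is exponential, so this only suits toy graphs.

```python
# Directed graph for one reversible reaction split into forward and reverse nodes.
GRAPH = {
    "L-aspartic semialdehyde": {"4.2.1.52"},
    "pyruvate": {"4.2.1.52"},
    "4.2.1.52": {"dihydrodipicolinic acid", "H2O"},
    "dihydrodipicolinic acid": {"4.2.1.52 reverse"},
    "H2O": {"4.2.1.52 reverse"},
    "4.2.1.52 reverse": {"L-aspartic semialdehyde", "pyruvate"},
}
REACTION_NODES = {"4.2.1.52", "4.2.1.52 reverse"}
SUFFIX = " reverse"  # naming convention assumed by this sketch


def base_reaction(node):
    """Map '4.2.1.52 reverse' and '4.2.1.52' to the same underlying reaction."""
    return node[: -len(SUFFIX)] if node.endswith(SUFFIX) else node


def paths_with_exclusion(graph, reaction_nodes, source, target, max_nodes=9):
    """All simple paths (at most max_nodes nodes) that never use both directions
    of the same reaction."""
    results = []

    def extend(path, used):
        node = path[-1]
        if node == target:
            results.append(list(path))
            return
        if len(path) >= max_nodes:
            return
        for nxt in sorted(graph.get(node, ())):
            if nxt in path:
                continue
            base = base_reaction(nxt) if nxt in reaction_nodes else None
            if base is not None and base in used:
                continue  # mutual exclusion: the other direction is already on the path
            path.append(nxt)
            if base is not None:
                used.add(base)
            extend(path, used)
            path.pop()
            if base is not None:
                used.discard(base)

    extend([source], set())
    return results


print(paths_with_exclusion(GRAPH, REACTION_NODES, "L-aspartic semialdehyde", "pyruvate"))
# []
print(paths_with_exclusion(GRAPH, REACTION_NODES, "L-aspartic semialdehyde", "dihydrodipicolinic acid"))
# [['L-aspartic semialdehyde', '4.2.1.52', 'dihydrodipicolinic acid']]
```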

SLIDE 83

Path finding traps – “generic” compounds and unbalanced reactions

KEGG contains “generic” compounds, i.e. entities that represent a whole class of compounds.
Examples: sugar, DNA, ...
Those compounds are sometimes involved in reactions which are not properly balanced.
E.g. R00375: dATP + DNA <=> Diphosphate + DNA

Such compounds can fool path finding algorithms and return irrelevant pathways.
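A crude screen can flag such reactions before path finding. A minimal sketch, assuming a hand-curated list of generic compound names (hypothetical here, since this sketch does not read any KEGG flag) and using "same compound on both sides" as a hint of an unbalanced reaction. A real atom-balance check would need chemical formulas, which are not used here.

```python
def flag_suspicious_reactions(reactions, generic_compounds):
    """Return {reaction id: [reasons]} for reactions that involve a generic compound
    or list the same compound on both sides."""
    flagged = {}
    for rid, (left, right) in reactions.items():
        reasons = []
        both_sides = left & right
        if both_sides:
            reasons.append("on both sides: " + ", ".join(sorted(both_sides)))
        generic = (left | right) & generic_compounds
        if generic:
            reasons.append("generic: " + ", ".join(sorted(generic)))
        if reasons:
            flagged[rid] = reasons
    return flagged


REACTIONS = {
    "R00375": ({"dATP", "DNA"}, {"Diphosphate", "DNA"}),
    "2.7.2.4": ({"ATP", "L-aspartate"}, {"ADP", "L-4-aspartyl phosphate"}),
}
GENERIC = {"DNA", "sugar"}  # hypothetical hand-curated list

print(flag_suspicious_reactions(REACTIONS, GENERIC))
# {'R00375': ['on both sides: DNA', 'generic: DNA']}
```

Flagged reactions are candidates for manual review or exclusion, not automatic errors.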

SLIDE 84

(Two-ends) path finding

SLIDE 85

Raw graph: from L-aspartate to L-methionine

The 5 shortest paths from L-aspartate to L-methionine in the raw graph
L-aspartic acid --> 6.3.5.4 --> AMP --> 6.1.1.10 --> L-methionine
L-aspartic acid --> 3.5.1.15 --> H2O --> 3.4.13.12 --> L-methionine
L-aspartic acid --> 3.5.1.15 --> H2O --> 3.4.13.12 --> L-methionine
L-aspartic acid --> 4.3.1.1 --> NH3 --> 4.4.1.11 --> L-methionine
L-aspartic acid --> 3.5.1.15 --> H2O --> 3.5.1.31 --> L-methionine
All these paths convert L-aspartate to L-methionine in 2 reaction steps.
In all these cases, the intermediate compound belongs to the group of highly connected nodes in the metabolic graph.
These compounds cannot be considered as valid intermediates between these reactions.

SLIDE 86

Filtered graph: discarding pool metabolites

To avoid irrelevant shortcuts, a set of highly connected compounds is discarded from the graph.

The selection is fine-tuned manually:
some compounds are maintained (e.g. S-Adenosyl-L-methionine, …);
others, although less connected, are removed (e.g. pyruvate, CMP).

1. H2O
2. ATP
3. NAD
4. NADH
5. NADPH
6. NADP
7. O2
8. ADP
9. Pi
10. CoA
11. CO2
12. PPi
13. NH3
14. UDP
15. AMP
16. pyruvate
17. acetyl-CoA
18. L-glutamate
19. 2-oxoglutarate
20. H2O2
21. Acceptor
22. UDP
23. Reduced acceptor
24. Acetate
25. GDP
26. oxalacetic acid
27. succinic acid
28. GTP
29. CMP
30. UTP
31. H+
32. UMP
33. CDP
34. reduced ferredoxin
35. H2
36. FADH2

Filtered out

Compound | Reactions
H2O | 1615
NAD+ | 578
NADH | 569
NADP+ | 564
NADPH | 559
Oxygen | 527
ATP | 435
Orthophosphate | 349
ADP | 324
CO2 | 323
CoA | 303
H+ | 272
NH3 | 270
Pyrophosphate | 252
UDP | 190
S-Adenosyl-L-methionine | 174
S-Adenosyl-L-homocysteine | 165
Pyruvate | 150
AMP | 142
H2O2 | 138
L-Glutamate | 132
2-Oxoglutarate | 129
Acceptor | 126
Acetyl-CoA | 122
Reduced acceptor | 122
Acetate | 87
UDPglucose | 79
D-Glucose | 62
Succinate | 59
CMP | 54
… | …
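The selection described above combines a degree cut-off with manual exceptions in both directions. A minimal sketch, using reaction counts copied from the table on this slide; the threshold value and the function name are hypothetical, since the slide does not give a cut-off.

```python
def choose_excluded(degree_by_compound, threshold, keep=(), force_exclude=()):
    """Compounds to discard: every compound involved in >= threshold reactions,
    minus the manually kept ones, plus the manually excluded ones."""
    automatic = {c for c, d in degree_by_compound.items() if d >= threshold}
    return (automatic - set(keep)) | set(force_exclude)


# Reaction counts copied from the table on this slide (subset).
DEGREES = {"H2O": 1615, "NAD+": 578, "ATP": 435, "S-Adenosyl-L-methionine": 174,
           "Pyruvate": 150, "UDPglucose": 79, "CMP": 54}

excluded = choose_excluded(DEGREES, threshold=160,  # hypothetical cut-off
                           keep={"S-Adenosyl-L-methionine"},
                           force_exclude={"Pyruvate", "CMP"})
print(sorted(excluded))
# ['ATP', 'CMP', 'H2O', 'NAD+', 'Pyruvate']
```

The resulting set is then removed from the compound/reaction graph before path finding. As the next slide points out, the right choice depends on the pathway and on the question.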

SLIDE 87

Filtered graph: choice of excluded compounds

Where to set the limit?

It seems obvious for H2O (1615), NADH (569), ... What about ATP (435)? And pyruvate? And NH3?

It depends on the reaction/pathway considered,
e.g. ATP is a valid intermediate in nucleotide biosynthesis.

It depends on the atoms being transferred during the reaction,
e.g. NADH only donates a hydride ion (one proton and two electrons).

It depends on the focus of the question,
e.g. in an analysis of energy metabolism, ATP and NAD will matter.

SLIDE 88

Filtered graph: from L-aspartate to L-methionine

The 5 shortest paths from L-aspartate to L-methionine in the filtered graph

L-aspartic acid --> 2.6.1.35 --> glycine --> 2.6.1.73 --> L-methionine
L-aspartic acid --> 2.6.1.12 --> L-alpha-alanine --> 2.6.1.44 --> glycine --> 2.6.1.73 --> L-methionine
L-aspartic acid --> 2.6.1.12 --> L-alpha-alanine --> 2.6.1.41 --> d-methionine --> 5.1.1.2 --> L-methionine
L-aspartic acid --> 2.6.1.12 --> L-alpha-alanine --> 2.6.1.2 --> o-acetyl-L-homoserine --> 2.5.1.49 --> L-methionine
L-aspartic acid --> 4.1.1.12 --> L-alpha-alanine --> 2.6.1.44 --> glycine --> 2.6.1.73 --> L-methionine

These paths use valid intermediate compounds.
However, they are much shorter (2 or 3 intermediate reactions) than the annotated methionine pathway.
The intermediate compounds and reactions are not part of the annotated pathway.

SLIDE 89

Path finding in a weighted graph

Principle
Each compound node is assigned a weight proportional to its connectivity degree.
All compounds are allowed for path finding, but the cost is higher for highly connected compounds.
This reduces the probability of using a pool metabolite as an intermediate between two successive reactions.
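The principle can be implemented with Dijkstra's algorithm on node weights, where the cost of a path is the sum of the weights of the nodes it visits. A minimal sketch on a hypothetical toy network, assuming compound weight = degree (as on this slide) and a unit weight for reaction nodes (an assumption of this sketch); the degrees below are made up.

```python
import heapq


def weighted_shortest_path(graph, weight, source, target):
    """Dijkstra's algorithm with weights on nodes: the cost of a path is the sum of
    the weights of all its nodes, source and target included. Returns (cost, path),
    or (inf, None) if the target is unreachable. Weights must be non-negative."""
    best = {source: weight[source]}
    previous = {source: None}
    heap = [(weight[source], source)]
    while heap:
        cost, u = heapq.heappop(heap)
        if cost > best[u]:
            continue  # stale queue entry
        if u == target:
            path = []
            while u is not None:
                path.append(u)
                u = previous[u]
            return cost, path[::-1]
        for v in graph.get(u, ()):
            new_cost = cost + weight[v]
            if new_cost < best.get(v, float("inf")):
                best[v] = new_cost
                previous[v] = u
                heapq.heappush(heap, (new_cost, v))
    return float("inf"), None


# Hypothetical toy network: a 2-reaction shortcut through a pool metabolite ("hub")
# and a 3-reaction route through specific intermediates B and C.
GRAPH = {
    "A": {"R1", "R3"},
    "R1": {"hub"}, "hub": {"R2"}, "R2": {"T"},
    "R3": {"B"}, "B": {"R4"}, "R4": {"C"}, "C": {"R5"}, "R5": {"T"},
}
# Compound weight = (made-up) degree; reaction weight = 1 (assumption).
WEIGHT = {"A": 2, "hub": 100, "B": 2, "C": 2, "T": 2,
          "R1": 1, "R2": 1, "R3": 1, "R4": 1, "R5": 1}

print(weighted_shortest_path(GRAPH, WEIGHT, "A", "T"))
# (11, ['A', 'R3', 'B', 'R4', 'C', 'R5', 'T'])
```

With all weights set to 1, the same function returns the hub shortcut (cost 5), which is what an unweighted search would find.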

SLIDE 90

Weighted graph: methionine biosynthesis

Search for the 5 shortest paths from L-aspartate to L-methionine
Weighted graph (compound weight = connectivity)

L-aspartic acid --> 2.7.2.4 --> L-4-aspartyl phosphate --> 1.2.1.11 --> L-aspartic 4-semialdehyde --> 1.1.1.3 --> L-homoserine --> 2.3.1.31 --> o-acetyl-L-homoserine --> 2.5.1.49 --> L-methionine
L-aspartic acid --> 2.7.2.4 --> L-4-aspartyl phosphate --> 1.2.1.11 --> L-aspartic 4-semialdehyde --> 1.1.1.3 --> L-homoserine --> 2.3.1.31 --> o-acetyl-L-homoserine --> 2.5.1.49 --> L-methionine
L-aspartic acid --> 3.5.5.4 --> L-beta-cyanoalanine --> R03972 --> L-2,4-diaminobutyrate --> 2.6.1.46 --> L-aspartic 4-semialdehyde --> 1.1.1.3 --> L-homoserine --> 2.3.1.31 --> o-acetyl-L-homoserine --> 2.5.1.49 --> L-methionine
L-aspartic acid --> 3.5.5.4 --> L-beta-cyanoalanine --> R03972 --> L-2,4-diaminobutyrate --> 2.6.1.46 --> L-aspartic 4-semialdehyde --> 1.1.1.3 --> L-homoserine --> 2.3.1.31 --> o-acetyl-L-homoserine --> 2.5.1.49 --> L-methionine
L-aspartic acid --> 2.7.2.4 --> L-4-aspartyl phosphate --> 1.2.1.11 --> L-aspartic 4-semialdehyde --> 1.1.1.3 --> L-homoserine --> 2.3.1.46 --> o-succinyl-L-homoserine --> 2.5.1.48 --> L-cystathionine --> 2.5.1.49 --> o-acetyl-L-homoserine --> 2.5.1.49 --> L-methionine

E.coli pathway Yeast pathway

SLIDE 91

Heme biosynthesis (Saccharomyces cerevisiae)

[Figure: heme biosynthesis in S. cerevisiae. Annotated pathway leading to haem via EC 2.3.1.37, 4.2.1.24, 2.5.1.61, 4.2.1.75, 4.1.1.37, 1.3.3.3, 1.3.3.4 and 4.99.1.1, compared with the paths found by path finding in the raw graph, in the filtered graph and in the weighted graph]

Croes, D., F. Couche, S.J. Wodak, and J. van Helden. 2006. J Mol Biol 356: 222-236.

SLIDE 92

Alignment between inferred and annotated pathways

Threonine biosynthesis

SLIDE 93

Evaluation of inferred paths (KEGG/LIGAND network, aMAZE pathways)

Comparison between inferred paths and annotated pathways based on intermediate reactions (those not provided as source and target)

Shortest path
Graph | Average sensitivity | Average PPV | Average accuracy
Raw | 31.4% | 25.4% | 28.4%
Filtered | 68.0% | 63.0% | 65.5%
Weighted | 88.5% | 83.4% | 85.9%

Most accurate among the 5 shortest paths
Graph | Average sensitivity | Average PPV | Average accuracy
Raw | 33.3% | 26.5% | 29.9%
Filtered | 71.4% | 66.7% | 69.1%
Weighted | 92.2% | 88.1% | 90.1%

Sensitivity: Sn = TP/(TP + FN)

Positive predictive value: PPV = TP/(TP + FP)

Accuracy: Acc = (Sn + PPV)/2

Croes, D., F. Couche, S.J. Wodak, and J. van Helden. 2006. J Mol Biol 356: 222-236.

True Positive (TP): inferred and annotated
False Positive (FP): inferred, not annotated
False Negative (FN): annotated, not inferred
True Negative (TN): neither inferred nor annotated
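The three scores can be computed directly from two sets of reaction identifiers. A minimal sketch, assuming the source and target reactions have already been removed from both sets; the identifiers are hypothetical. TN is not needed for these scores.

```python
def evaluate(inferred, annotated):
    """Sensitivity, positive predictive value and accuracy of an inferred path,
    compared with the intermediate reactions of the annotated pathway."""
    tp = len(inferred & annotated)   # inferred and annotated
    fp = len(inferred - annotated)   # inferred, not annotated
    fn = len(annotated - inferred)   # annotated, not inferred
    sn = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return {"TP": tp, "FP": fp, "FN": fn, "Sn": sn, "PPV": ppv, "Acc": (sn + ppv) / 2}


# Hypothetical reaction identifiers, for illustration only.
annotated = {"R1", "R2", "R3", "R4"}
inferred = {"R1", "R2", "R5"}
print(evaluate(inferred, annotated))
```

Because Acc is the mean of Sn and PPV, the average accuracy over a set of pathways equals the mean of the average sensitivity and the average PPV.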

SLIDE 94

Evaluation of inferred paths (EcoCyc network, EcoCyc pathways)

Comparison between inferred paths and annotated pathways based on intermediate reactions (those not provided as source and target)

Shortest path
Graph | Average sensitivity | Average PPV | Average accuracy
Raw | 29.6% | 31.0% | 29.3%
Filtered | 63.3% | 68.8% | 66.6%
Weighted | 80.7% | 85.3% | 83.0%

Most accurate among the 5 shortest paths
Graph | Average sensitivity | Average PPV | Average accuracy
Raw | 35.0% | 40.0% | 37.5%
Filtered | 85.6% | 89.2% | 87.4%
Weighted | 92.2% | 95.1% | 93.7%

Sensitivity: Sn = TP/(TP + FN)

Positive predictive value: PPV = TP/(TP + FP)

Accuracy: Acc = (Sn + PPV)/2

Croes, D., F. Couche, S.J. Wodak, and J. van Helden. 2006. J Mol Biol 356: 222-236.

True Positive (TP): inferred and annotated
False Positive (FP): inferred, not annotated
False Negative (FN): annotated, not inferred
True Negative (TN): neither inferred nor annotated

SLIDE 95

Inferred paths versus KEGG/LIGAND pathway maps

Each inferred path is compared to the 85 pathway maps, and the significant correspondences are retained (hypergeometric test).

X axis: number of intermediate reactions in the inferred path.
Y axis: number of reactions in common with a KEGG pathway.
Values: number of inferred paths.
On the diagonal: inferred paths completely included in one KEGG pathway.

Inferred length: raw graph < filtered graph < weighted graph.
Consistency with KEGG: raw graph < filtered graph < weighted graph.
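The hypergeometric test asks how likely it is that a random set of reactions, of the same size as the inferred path, would share at least as many reactions with a given map. A minimal sketch using only the standard library; the Bonferroni correction over all maps, the significance level and the toy maps are assumptions of this sketch, since the slide only names the test.

```python
from math import comb


def hypergeometric_pvalue(k, n, K, N):
    """P(X >= k), X ~ hypergeometric: chance that n reactions drawn at random among N
    share at least k reactions with a pathway map containing K reactions."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def significant_maps(path_reactions, maps, n_total, alpha=0.05):
    """Compare one inferred path with every map; keep maps whose Bonferroni-corrected
    p-value is <= alpha. Returns sorted (p-value, map name, reactions in common)."""
    n = len(path_reactions)
    hits = []
    for name, map_reactions in maps.items():
        k = len(path_reactions & map_reactions)
        if k == 0:
            continue
        p = min(1.0, hypergeometric_pvalue(k, n, len(map_reactions), n_total) * len(maps))
        if p <= alpha:
            hits.append((p, name, k))
    return sorted(hits)


# Hypothetical maps and inferred path, for illustration only.
MAPS = {"map_A": {"r1", "r2", "r3", "r4"}, "map_B": {"r3", "r7", "r8"}}
print(significant_maps({"r1", "r2", "r3"}, MAPS, n_total=50))
```

Here map_A shares all three reactions of the path and is retained, whereas the single reaction shared with map_B is not significant.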

SLIDE 96

Navigating in a network of reactant pairs (RPAIRs)

SLIDE 97

Reactant pairs (RPAIR)

RPAIR definition: “pairs of compounds that have atoms or atom groups in common on two sides of a reaction” (Kotera et al., 2004).

Example (from Faust et al., 2009): reaction R00480 with its reactant pairs A00003 (main), A00932 (main) and A06173 (trans).

1. Kotera, M., Hattori, M., Oh, M.-A., Yamamoto, R., Komeno, T., Yabuzaki, J., Tonomura, K., Goto, S. & Kanehisa, M. (2004). RPAIR: a reactant-pair database representing chemical changes in enzymatic reactions. Genome Informatics 15.
2. Faust, K., Croes, D. and van Helden, J. (2009). Metabolic Pathfinding Using RPAIR Annotation. J Mol Biol. [Pubmed 19281817].
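To make the difference between a reaction network and an RPAIR network concrete, the sketch below links compounds either through whole reactions or only through reactant pairs. The reaction, compound names and pair classes are hypothetical (they are not the R00480 annotation shown above), and representing each pair as its own node is an assumption of this sketch.

```python
from collections import defaultdict

# Hypothetical reaction Rx: S1 + COF -> P1 + COF_used, with made-up reactant pairs.
REACTIONS = {"Rx": ({"S1", "COF"}, {"P1", "COF_used"})}
RPAIRS = [
    ("Rx", "S1", "P1", "main"),
    ("Rx", "COF", "COF_used", "cofac"),
    ("Rx", "COF", "P1", "trans"),
]


def reaction_graph(reactions):
    """Undirected bipartite graph: every compound is linked to every reaction it takes part in."""
    g = defaultdict(set)
    for rid, (substrates, products) in reactions.items():
        for c in substrates | products:
            g[c].add(rid)
            g[rid].add(c)
    return g


def rpair_graph(rpairs, classes=None):
    """Undirected graph in which compounds are linked only through reactant-pair nodes;
    `classes` optionally restricts the pair classes used (e.g. {"main"})."""
    g = defaultdict(set)
    for rid, c1, c2, cls in rpairs:
        if classes is not None and cls not in classes:
            continue
        node = f"{rid}:{c1}_{c2}"
        for c in (c1, c2):
            g[c].add(node)
            g[node].add(c)
    return g


def one_step_neighbours(g, compound):
    """Compounds reachable from `compound` through a single reaction or reactant-pair node."""
    return {c for node in g.get(compound, ()) for c in g[node] if c != compound}


print(sorted(one_step_neighbours(reaction_graph(REACTIONS), "S1")))      # ['COF', 'COF_used', 'P1']
print(sorted(one_step_neighbours(rpair_graph(RPAIRS, {"main"}), "S1")))  # ['P1']
```

In the reaction graph, S1 is directly connected to the cofactor and its product; in the main-RPAIR graph, it is only connected to the compound that receives its atoms.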

SLIDE 98

Path finding in the RPAIR versus reaction network

Shortest paths from L-Aspartate to L-Methionine

[Figure panels: reaction network; all RPAIRs; main RPAIRs]

Faust, K., Croes, D. and van Helden, J. (2009). Metabolic Pathfinding Using RPAIR Annotation. J Mol Biol. [Pubmed 19281817].

SLIDE 99

Alternative paths found in organism-specific networks

Faust, K., Croes, D. and van Helden, J. (2009). Metabolic Pathfinding Using RPAIR Annotation. J Mol Biol. [Pubmed 19281817].

SLIDE 100

Impact of path finding parameters Path finding in reference network (all organisms merged)

Path finding in a metabolic network built from all KEGG reactions or reactant pairs.

104 combinations of parameters tested: network type, weighting policy, compound filtering, directed/undirected network.

Performance estimated using a collection of 55 linear pathways from E. coli (32), S. cerevisiae (11) and H. sapiens (12).

Faust, K., Croes, D. and van Helden, J. (2009). Metabolic Pathfinding Using RPAIR Annotation. J Mol Biol. [Pubmed 19281817].

SLIDE 101

Impact of path finding parameters Path finding in an organism-specific network (Escherichia coli)

Path finding in a metabolic network built from all KEGG reactions catalyzed in E. coli + spontaneous reactions.

Estimated using a collection of 32 linear pathways from E.coli.

Faust, K., Croes, D. and van Helden, J. (2009). Metabolic Pathfinding Using RPAIR Annotation. J Mol Biol. [Pubmed 19281817].

SLIDE 102

Impact of path finding parameters Path finding in an organism-specific network (S.cerevisiae)

Path finding in a metabolic network built from all KEGG reactions catalyzed in S. cerevisiae + spontaneous reactions.

Estimated using a collection of 11 linear pathways from S.cerevisiae.

Faust, K., Croes, D. and van Helden, J. (2009). Metabolic Pathfinding Using RPAIR Annotation. J Mol Biol. [Pubmed 19281817].

SLIDE 103

Impact of path finding parameters Path finding in an organism-specific network (H.sapiens)

Path finding in a metabolic network built from all KEGG reactions catalyzed in H. sapiens + spontaneous reactions.

Estimated using a collection of 12 linear pathways from H.sapiens.

...

Faust, K., Croes, D. and van Helden, J. (2009). Metabolic Pathfinding Using RPAIR Annotation. J Mol Biol. [Pubmed 19281817].

SLIDE 104

Multi-seed pathway building

SLIDE 105

Reconstructing a pathway from a subset of reactions

Input: a set of reactions (the seed reactions).
Output: a metabolic pathway including
the seed reactions, together with their substrates and products;
optionally, some additional reactions, intercalated to improve the pathway connectivity;
the pathway can either be connected, or contain several unconnected components.
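The input/output specification above can be illustrated with a deliberately naive greedy heuristic: keep the seeds and their compounds, then connect pairs of seeds by shortest paths when few extra reactions are needed. This is a sketch under stated assumptions (undirected toy graph, hypothetical function names and `max_intercalated` parameter); it is not kWalks nor any of the algorithms evaluated by Faust et al. (2010).

```python
from collections import deque
from itertools import combinations


def bfs_path(graph, source, target):
    """One shortest path (fewest nodes) from source to target, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            path = []
            while u is not None:
                path.append(u)
                u = previous[u]
            return path[::-1]
        for v in sorted(graph.get(u, ())):
            if v not in previous:
                previous[v] = u
                queue.append(v)
    return None


def build_pathway(graph, reaction_nodes, seeds, max_intercalated=1):
    """Keep the seed reactions with their compounds, then link each pair of seeds by a
    shortest path if it intercalates at most `max_intercalated` non-seed reactions.
    Seeds that cannot be linked remain separate components."""
    nodes = set(seeds)
    for r in seeds:
        nodes.update(graph.get(r, ()))  # substrates and products of the seed reaction
    for a, b in combinations(sorted(seeds), 2):
        path = bfs_path(graph, a, b)
        if path is None:
            continue
        intercalated = [x for x in path[1:-1] if x in reaction_nodes and x not in seeds]
        if len(intercalated) <= max_intercalated:
            nodes.update(path)
    return nodes


def undirected(edges):
    g = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


# Hypothetical toy network: reactions R1, R2, R3, R5 and compounds a to e.
GRAPH = undirected([("R1", "a"), ("R1", "b"), ("b", "R2"), ("R2", "c"),
                    ("c", "R3"), ("R3", "d"), ("R5", "e")])
REACTIONS = {"R1", "R2", "R3", "R5"}
print(sorted(build_pathway(GRAPH, REACTIONS, {"R1", "R3", "R5"})))
# ['R1', 'R2', 'R3', 'R5', 'a', 'b', 'c', 'd', 'e']
```

R2 is intercalated to link the seeds R1 and R3, while R5 stays as a separate component because no path reaches it.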

SLIDE 106

Seed nodes

[Figure legend: compound; reaction; seed reaction]

SLIDE 107

Linking seed nodes

[Figure legend: compound; reaction; direct link; seed reaction]

SLIDE 108

Enhance linking by intercalating reactions

[Figure legend: compound; reaction; direct link; intercalated reaction; seed reaction]

SLIDE 109

Subgraph extraction

SLIDE 110

Providing intermediate nodes helps to find relevant paths

Reference pathway: pyrimidine ribonucleotides de novo biosynthesis pathway (MetaCyc identifier: PWY0-162) in E. coli.

A: reference pathway.
B: path found with the 2 terminal nodes as seeds (blue).
C: path found with 4 seed nodes (blue).

Faust, K., Dupont, P., Callut, J. and van Helden, J. (2010). Pathway discovery in metabolic networks by subgraph extraction. Bioinformatics 26:1211-8. [Pubmed 20228128].

SLIDE 111

Multiple seed nodes allow finding branched pathways

A: reference pathway: superpathway of lysine, threonine and methionine biosynthesis I (MetaCyc identifier: P4-PWY) in E. coli.

B: path found with the 5 terminal nodes as seeds.
C: path found with the 5 terminal nodes + 2 intermediate nodes as seeds.

Faust, K., Dupont, P., Callut, J. and van Helden, J. (2010). Pathway discovery in metabolic networks by subgraph extraction. Bioinformatics 26:1211-8.

SLIDE 112

Evaluation of sub-graph extraction

Identification of optimal path finding algorithms/parameters.
71 reference pathways (including branched and cyclic)

Faust, K., Dupont, P., Callut, J. and van Helden, J. (2010). Pathway discovery in metabolic networks by subgraph extraction. Bioinformatics 26:1211-8.

SLIDE 113

Outlook

SLIDE 114

Outlook

Application: inferring metabolic pathways from genomes, using as seed nodes:
Sets of genes belonging to the same operon.
Sets of genes showing similar cis-regulatory elements (phylogenetic footprinting).
Sets of genes co-occurring/co-disappearing in bacterial genomes (phylogenetic profiles).
Sets of co-expressed genes (microarray experiments).
... any other criterion of functional grouping.
Weaknesses
They rely on a predefined set of reactions/compounds (we infer pathways, not reactions).
Path finding and subgraph extraction are only very naive approximations of metabolic pathways.
Despite the improvements, path finding approaches still return some irrelevant pathways.
Strengths
Tractability: they can deal with metabolic networks made of thousands of reactions.
Possibility to discover novel pathways (rather than mapping onto “reference” pathways).
Possibility to introduce context-dependence (organism-specific network, reactions weighted according to enzyme expression, ...).

Hybrid approaches
Subgraph extraction can suggest reasonable hypotheses about potential pathways, which can be:
used as input for more refined modeling approaches;
tested experimentally.

SLIDE 115

Are metabolic systems in (near) steady state ?

Do we have reasons to think that cells, in their natural environment, are living in (near) steady-state conditions for all metabolic concentrations?

An intuitive example: methionine biosynthesis in E. coli
Methionine biosynthesis consumes cysteine. This happens very rapidly (nanoseconds).
This depletes the cysteine pool.
Cysteine depletion triggers:
a fast response (nanoseconds): activation of cysteine-synthesizing enzymes (if present);
a slow response (minutes): transcriptional activation of the same enzymes (if absent).

Question for the audience
Do we have experimental evidence for the existence of steady states?
If yes, how can they be understood in the light of the example above?
Has this kind of dual-timescale response been modeled?

SLIDE 116

Job announcement : Postdoc position @ BiGRe.ULB.ac.be

Location
Laboratoire de Bioinformatique des Génomes et des Réseaux (BiGRe)
Université Libre de Bruxelles (Brussels, Belgium)
http://www.bigre.ulb.ac.be/
2-year position, starting ASAP
Skills
Bioinformatics
Good understanding of metabolism, genetics, regulation.
Familiarity with the Unix environment and scripting; Java programming is a plus.
Context: MICROME project
EU FP7 project involving 14 partners
http://www.microme.eu/
Scope: Metabolic annotation of Bacterial genomes (enzyme identification, pathway reconstruction, metabolic modelling).
BiGRe focus in the project:
Prediction of operons (distance-based + synteny methods).
Prediction of regulons by comparative genomics (phylogenetic footprinting).
Co-occurrence of genes across genomes (phylogenetic profiles).
Applying path-finding methods to infer pathways from the above-defined groups of related genes.
Evolution of Bacterial metabolism and its regulation.
Related publications
Faust, K., Dupont, P., Callut, J. and van Helden, J. (2010). Pathway discovery in metabolic networks by subgraph extraction. Bioinformatics 26:1211-8. [Pubmed 20228128].

Faust, K., Croes, D. and van Helden, J. (2009). In response to "Can sugars be produced from fatty acids? A test case for pathway analysis tools". Bioinformatics 2009 Sept 23. [Pubmed 19776213].

Janky, R. and van Helden, J. (2008). Evaluation of phylogenetic footprint discovery for the prediction of bacterial cis-regulatory elements. BMC Bioinformatics 9:37, doi:10.1186/1471-2105-9-37. [Pubmed 18215291].

Croes, D., Couche, F., Wodak, S.J. and van Helden, J. (2006). Inferring Meaningful Pathways in Weighted Metabolic Networks. J. Mol. Biol. 356:222-36. [Pubmed 16337962].

SLIDE 117

Acknowledgements

Network Analysis Tools

(NeAT, http://rsat.ulb.ac.be/neat/)

Sylvain Brohée
Karoline Faust
Gipsi Lima-Mendez

Programs from external developers

Stijn van Dongen (Sanger, UK) for MCL
Igor Jurisica (USA) for RNSC

Metabolic path finding

Karoline Faust
Pierre Dupont (UCL, Belgium) for kWalks
Shoshana Wodak
Didier Croes
Fabian Couche
The former aMAZE team

Interactome

Nicolas Simonis
Léon Juvénal Hagingambo

Bioinformatique des Génomes et des Réseaux (BiGRe) http://www.bigre.ulb.ac.be/

Collaborations (former aMAZE project)

Georges Cohen (Institut Pasteur - France)
Yves Deville (UCL)
Grégoire Dooms (UCL)
Pierre Schaus (UCL)
Stéphane Zampelli (UCL)
Lorenz Wernisch (Birkbeck College, UK)
David Gilbert (London City – UK)

Former aMAZE team

Shoshana Wodak
Hassan Anerhour
Erick Antezana
Jean Richelle
Xavier Santaloria
Jesintha Maniraja
Christian Lemer
Olivier Hubaut
Fabian Couche
Frederic Fays
Simon De Keyzer

SLIDE 118

Links and references

Network Analysis Tools (NeAT)
http://neat.rsat.eu/
Publications

1. Faust, K. and van Helden, J. (2012). Predicting metabolic pathways by sub-network extraction. Methods Mol Biol 804:107-130.
2. Faust, K., Croes, D. and van Helden, J. (2011). Prediction of metabolic pathways from genome-scale metabolic networks. BioSystems 105:109-121.
3. Faust, K., Dupont, P., Callut, J. and van Helden, J. (2010). Pathway discovery in metabolic networks by subgraph extraction. Bioinformatics, 10.1093/bioinformatics/btq105.
4. Lima-Mendez, G. and van Helden, J. (2009). The powerful law of the power law and other myths in network biology. Mol. BioSyst. 5:1482-1493, DOI: 10.1039/b908681a.
5. Faust, K., Croes, D. and van Helden, J. (2009). In response to "Can sugars be produced from fatty acids? A test case for pathway analysis tools". Bioinformatics 2009 Sept 23.
6. Faust, K., Croes, D. and van Helden, J. (2009). Metabolic Pathfinding Using RPAIR Annotation. Journal of Molecular Biology 388:390-414.
7. Brohée, S., Faust, K., Lima-Mendez, G., Vanderstocken, G. and van Helden, J. (2008). Network Analysis Tools: from biological networks to clusters and pathways. Nature Protocols 3(10):1616-29.
8. Brohée, S., Faust, K., Lima-Mendez, G., Sand, O., Janky, R., Vanderstocken, G., Deville, Y. and van Helden, J. (2008). NeAT: a toolbox for the analysis of biological networks, clusters, classes and pathways. Nucleic Acids Research 36:W444-451.
9. Brohée, S. and van Helden, J. (2006). Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics 7:488. Pubmed 17087821.
10. Croes, D., Couche, F., Wodak, S.J. and van Helden, J. (2006). Inferring Meaningful Pathways in Weighted Metabolic Networks. J. Mol. Biol. 356:222-36. Pubmed 16337962.
11. Croes, D., Couche, F., Wodak, S.J. and van Helden, J. (2005). Metabolic PathFinding: inferring relevant pathways in biochemical networks. Nucleic Acids Res 33:W326-330.
12. van Helden, J., Wernisch, L., Gilbert, D. and Wodak, S.J. (2002). Graph-based analysis of metabolic networks. In Ernst Schering Res Found Workshop (al., M. H.-W. e., ed.), pp. 245-74. Springer-Verlag.
13. van Helden, J., Gilbert, D., Wernisch, L., Schroeder, M. and Wodak, S. (2001). Applications of regulatory sequence analysis and metabolic network analysis to the interpretation of gene expression data. Computational Biology: First International Conference on Biology, Informatics, and Mathematics, JOBIM 2000. LNCS volume 2066, Montpellier.