Finding relevant paths in the not-so-small world of metabolic networks



slide-1
SLIDE 1

Finding relevant paths in the not-so-small world of metabolic networks

ENSBBAU4 – 19 November 2014

Jacques van Helden

Jacques.van-Helden@univ-amu.fr Aix-Marseille Université (AMU)

  • Lab. Technological Advances for Genomics and Clinics (TAGC, INSERM Unit U1090)

http://jacques.van-helden.perso.luminy.univ-amu.fr/

Former address (1999-2011): Université Libre de Bruxelles, Belgium, Bioinformatique des Génomes et des Réseaux (BiGRe lab), http://www.bigre.ulb.ac.be/

slide-2
SLIDE 2

Bioinformatique des Génomes et des Réseaux (BiGRe) Université Libre de Bruxelles

http://www.bigre.ulb.ac.be/

  • Development and application of bioinformatics methods for the analysis of genomes and biomolecular interaction networks.
  • Analysis of cis-regulatory sequences
  • RSAT Web site (http://rsat.ulb.ac.be/rsat/)
  • Olivier Sand (ex-Postdoc), Matthieu Defrance (ex-Postdoc), Maud Vidick (Master thesis), Rekin’s Janky (ex-PhD student), Jean Valéry Turatsinze (ex-PhD student), Morgane Thomas-Chollier (ex-PhD student + ex-postdoc), Eric Vervisch (ex-Research fellow)

  • Biomolecular networks (regulatory, protein interactions, metabolic, host-virus)
  • NeAT Web site (http://rsat.ulb.ac.be/neat/)
  • Rekin’s Janky (PhD student), Sylvain Brohée (ex-PhD student), Karoline Faust (PhD student), Nicolas Simonis (Postdoc), Leon Juvénal Hajingabo (PhD student)

  • Mobile genetic elements in prokaryotes
  • ACLAME Web site (http://aclame.ulb.ac.be/)
  • Raphaël Leplae (Postdoc), Gipsi Lima (PhD student), Ariane Toussaint (Professor)

  • Modelling of dynamical systems
  • Didier Gonze (Premier assistant)

B!GRe: Bioinformatique des Génomes et Réseaux

  • Ariane Toussaint: Professor
  • Raphaël Leplae: Postdoc
  • Jacques van Helden: Chargé de cours
  • Karoline Faust: Ex-PhD student
  • Didier Gonze: Premier assistant
  • Gipsi Lima: Postdoc
  • Jean Valéry Turatsinze: Ex-PhD student
  • Rekin’s Janky: Ex-PhD student
  • Morgane Thomas-Chollier: Ex-PhD student + postdoc
  • Olivier Sand: Ex-Postdoc
  • Matthieu Defrance: Ex-Postdoc
  • Sylvain Brohée: Ex-PhD student
  • Eric Vervisch: Ex-Research fellow
  • Nicolas Simonis: Postdoc
  • Myriam Loubriat: Secretary
  • Leon Juvenal Hajingabo: PhD student
  • Alejandra Medina-Rivera: PhD student (co-direction)
  • Elodie Darbo: PhD student (co-direction)

slide-3
SLIDE 3

Structure of the talk

  • Part 1 – From reactions/compounds to pathways
  • Pathways as graphs of reactions and compounds
  • Pathway diversity across organisms
  • Multi-level regulation, feed-back loops ensuring homeostasis
  • Super-pathways with intricate regulatory circuits
  • What is a pathway?
  • Pathways (e.g. EcoCyc) or reaction maps (e.g. KEGG)?
  • How to define boundaries?
  • Part 2 – From reactions/compounds to metabolic networks
  • Building a network from a collection of reactions/compounds
  • Myths and dogmas about network topology
  • Part 3 – From networks to pathways
  • Tricks and traps for metabolic path finding
  • Finding relevant paths in metabolic networks
  • Extracting pathways from sets of seed genes/reactions/compounds
slide-4
SLIDE 4

Part 1 From reactions/compounds to pathways

slide-5
SLIDE 5

Methionine Biosynthesis in E.coli

[Figure: pathway diagram of methionine biosynthesis in E. coli.]

  • Compounds: L-Aspartate, L-aspartyl-4-P, L-aspartic semialdehyde, L-Homoserine, Alpha-succinyl-L-Homoserine, Cystathionine, Homocysteine, L-Methionine, S-Adenosyl-L-Methionine
  • Side compounds: ATP, ADP, NADPH, NADP+, Pi, PPi, SuccinylSCoA, HSCoA, Succinate, L-Cysteine, Pyruvate, NH4+, H2O, 5-MethylTHF, THF
  • Enzymes (gene, EC number): aspartate kinase II / homoserine dehydrogenase II (metL, 2.7.2.4 and 1.1.1.3); aspartate semialdehyde dehydrogenase (asd, 1.2.1.11); homoserine O-succinyltransferase (metA, 2.3.1.46); cystathionine-gamma-synthase (metB, 4.2.99.9); cystathionine-beta-lyase (metC, 4.4.1.8); cobalamin-independent homocysteine transmethylase (metE, 2.1.1.14); cobalamin-dependent homocysteine transmethylase (metH, 2.1.1.13); S-adenosylmethionine synthetase (metK, 2.5.1.6)
  • Regulation: metJ (methionine repressor, repression), metR (up-regulation); inhibition and activation arrows are also drawn
  • Connected pathways: aspartate, cysteine, lysine and threonine biosynthesis

slide-6
SLIDE 6

Methionine Biosynthesis in S.cerevisiae

[Figure: pathway diagram of methionine biosynthesis in S. cerevisiae.]

  • Compounds: L-Aspartate, L-aspartyl-4-P, L-aspartic semialdehyde, L-Homoserine, O-acetyl-homoserine, Homocysteine, L-Methionine, S-Adenosyl-L-Methionine
  • Side compounds: ATP, ADP, NADPH, NADP+, Pi, PPi, AcetylCoA, CoA, Sulfide, H2O, 5-methyltetrahydropteroyltri-L-glutamate, 5-tetrahydropteroyltri-L-glutamate
  • Enzymes (gene, EC number): aspartate kinase (HOM3, 2.7.2.4); aspartate semialdehyde dehydrogenase (HOM2, 1.2.1.11); homoserine dehydrogenase (HOM6, 1.1.1.3); homoserine O-acetyltransferase (MET2, 2.3.1.31); O-acetylhomoserine (thiol)-lyase (MET17, 2.5.1.49); methionine synthase, vit B12-independent (MET6, 2.1.1.14); S-adenosyl-methionine synthetase I and II (SAM1, SAM2, 2.5.1.6)
  • Regulators: MET31 (Met31p), MET32 (Met32p), MET28, MET4, CBF1 (Cbf1p/Met4p/Met28p complex), MET30 (Met30p), GCN4 (Gcn4p)
  • Connected pathways: sulfur assimilation, cysteine, threonine and aspartate biosynthesis

slide-7
SLIDE 7

Alternative methionine pathways

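[Figure: the S. cerevisiae and E. coli routes overlaid on one diagram.]

  • Shared steps: L-Aspartate --> 2.7.2.4 --> L-aspartyl-4-P --> 1.2.1.11 --> L-aspartic semialdehyde --> 1.1.1.3 --> L-Homoserine, then Homocysteine --> 2.1.1.14 --> L-Methionine --> 2.5.1.6 --> S-Adenosyl-L-Methionine
  • S. cerevisiae: L-Homoserine --> 2.3.1.31 --> O-acetyl-homoserine --> 4.2.99.10 --> Homocysteine
  • E. coli: L-Homoserine --> 2.3.1.46 --> Alpha-succinyl-L-Homoserine --> 4.2.99.9 --> Cystathionine --> 4.4.1.8 --> Homocysteine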

slide-8
SLIDE 8

Sulfur Assimilation in yeast

[Figure: pathway diagram of sulfur assimilation in yeast.]

  • Steps: Sulfate (extracellular) --> sulfate transport (sulfate transporters SUL1, SUL2) --> Sulfate (intracellular) --> 2.7.7.4 (sulfate adenylyl transferase, MET3) --> Adenylyl sulfate (APS) --> 2.7.1.25 (adenylyl sulfate kinase, MET14) --> 3'-phosphoadenylylsulfate (PAPS) --> 1.8.99.4 (3'-phosphoadenylylsulfate reductase, MET16) --> sulfite --> 1.8.1.2 (sulfite reductase (NADPH), MET10; putative sulfite reductase, MET5) --> sulfide --> Methionine biosynthesis
  • Side compounds: ATP, PPi, ADP, NADPH, NADP+, AMP, H+, 3'-phosphate (PAP), H2O
  • Regulators: MET31 (Met31p), MET32 (Met32p), MET28, MET4, CBF1 (Cbf1p/Met4p/Met28p complex), MET30, GCN4 (Gcn4p)

slide-9
SLIDE 9

EcoCyc - Superpathway of sulfate assimilation and cysteine biosynthesis


slide-10
SLIDE 10

MetaCyc – Sulfur incorporation into amino acids

Via methionine Via cysteine

slide-11
SLIDE 11

KEGG “reference” pathway - Methionine metabolism (1998)


slide-12
SLIDE 12

KEGG “reference” map - Cysteine and methionine metabolism (2009)

  • In principle, merging the methionine and cysteine maps should highlight the relationship between the two sulfur-containing amino acids.

  • Questions:
  • Where is L-Cysteine?
  • Where is L-Methionine?

http://www.genome.jp/kegg/pathway/map/map00270.html

slide-13
SLIDE 13

KEGG map - Cysteine and methionine metabolism (2009) – S.cerevisiae

  • KEGG cysteine and methionine pathway.
  • Saccharomyces cerevisiae.
  • Question
  • How is sulfur incorporated into amino acids in this yeast?

http://www.genome.jp/kegg-bin/show_pathway?org_name=sce&mapno=00270

slide-14
SLIDE 14

KEGG map - Cysteine and methionine metabolism (2009) – E.coli

  • KEGG cysteine and methionine pathway.
  • Escherichia coli K12.
  • Question
  • How is sulfur incorporated into amino acids in this bacterium?

http://www.genome.jp/kegg-bin/show_pathway?org_name=eco&mapno=00270

slide-15
SLIDE 15

KEGG map - Cysteine and methionine metabolism (2009) – M.genitalium

  • Mycoplasma genitalium
  • Very small genome (500 genes).
  • Intra-cellular parasite.
  • Parasitism allowed it to lose many pathways.
  • Relies on the host for the corresponding compounds.

http://www.genome.jp/kegg-bin/show_pathway?org_name=mge&mapno=00270

slide-16
SLIDE 16

Methionine Biosynthesis in E.coli

[Figure: pathway diagram of methionine biosynthesis in E. coli, repeated from slide 5.]

slide-17
SLIDE 17

Lysine biosynthesis in Escherichia coli

[Figure: pathway diagram of lysine biosynthesis in E. coli.]

  • Compounds: L-Aspartate, L-aspartyl-4-P, L-aspartic semialdehyde, dihydropicolinic acid, tetrahydrodipicolinate, N-succinyl-epsilon-keto-L-alpha-aminopimelic acid, succinyl diaminopimelate, LL-diaminopimelic acid, meso-diaminopimelic acid, L-lysine
  • Side compounds: ATP, ADP, NADPH or NADH, NADP+ or NAD+, H+, Pi, pyruvate, H2O, succinyl CoA, CoA, glutamate, alpha-ketoglutarate, succinate, CO2
  • Enzymes (gene): aspartate kinase III (metL); aspartate semialdehyde dehydrogenase (asd); dihydrodipicolinate synthase (dapA); dihydrodipicolinate reductase (dapB); tetrahydrodipicolinate N-succinyltransferase (dapD); succinyl diaminopimelate aminotransferase (dapC); N-succinyldiaminopimelate desuccinylase (dapE); diaminopimelate epimerase (dapF); diaminopimelate decarboxylase (lysA)
  • EC numbers shown: 2.7.2.4, 1.2.1.11, 4.2.1.52, 1.3.1.26, 2.3.1.117, 2.6.1.17, 3.5.1.18 (twice), 5.1.1.7
  • Regulation: lysR (lysR protein)
  • Connected pathways: methionine, aspartate and threonine biosynthesis

slide-18
SLIDE 18

Lysine biosynthesis in Saccharomyces cerevisiae

[Figure: pathway diagram of lysine biosynthesis in S. cerevisiae.]

  • Compounds: 2-Oxoglutarate, But-1-ene-1,2,4-tricarboxylate, Homoisocitrate, Oxaloglutarate, 2-Oxoadipate, L-2-Aminoadipate, L-2-Aminoadipate 6-semialdehyde, N6-(L-1,3-Dicarboxypropyl)-L-lysine, L-lysine
  • Side compounds: Acetyl-CoA, CoA, H2O, NAD+ (or NADP+), NADH (or NADPH), H+, CO2, L-Glutamate, 2-Oxoglutarate
  • Enzymes and genes shown: homocitrate synthase (LYS20); homocitrate dehydratase (LYS7); homoaconitate hydratase (LYS4); homoisocitrate dehydrogenase; aminoadipate aminotransferase; aminoadipate semialdehyde dehydrogenase; saccharopine dehydrogenase, glutamate forming (LYS9); saccharopine dehydrogenase, lysine forming (LYS1); LYS2; LYS5
  • EC numbers shown: 4.1.3.21, 4.2.1.36, 1.1.1.87 (twice), 2.6.1.39, 1.2.1.31, 1.5.1.10, 1.5.1.7

slide-19
SLIDE 19

KEGG - Lysine biosynthesis – Escherichia coli K12


http://www.genome.jp/kegg-bin/show_pathway?org_name=eco&mapno=00300

slide-20
SLIDE 20

http://www.genome.jp/kegg-bin/show_pathway?org_name=sce&mapno=00300

KEGG - Lysine biosynthesis – Saccharomyces cerevisiae


slide-21
SLIDE 21

From pathways to super-pathways

slide-22
SLIDE 22

Lysine biosynthesis in Escherichia coli

[Figure: pathway diagram of lysine biosynthesis in E. coli, repeated from slide 17.]

slide-23
SLIDE 23

Threonine biosynthesis in Escherichia coli

[Figure: pathway diagram with its regulation.]

  • Compounds: L-Aspartate, L-Aspartyl-4-P, L-Aspartic semialdehyde, L-Homoserine, L-Homoserine phosphate, L-Threonine
  • Side compounds: ATP, ADP, NADPH, NADP+, Pi, H2O
  • Enzyme labels shown: Aspartate kinase I / homoserine dehydrogenase I, Aspartate semialdehyde dehydrogenase, Cystathionine-gamma-synthase, Cystathionine-beta-lyase
  • EC numbers shown: 2.7.2.4, 1.2.1.11, 1.1.1.3, 2.7.1.39, 4.4.1.8
  • Genes: asd, thrABC operon (thrABC mRNA)
  • Arc types: catalysis, transcription, translation, expression, attenuation, inhibition

slide-24
SLIDE 24

Lysine, Methionine and Threonine biosynthesis in E.coli

[Figure: the three E. coli pathway diagrams side by side. All three start with L-Aspartate --> 2.7.2.4 --> L-aspartyl-4-P --> 1.2.1.11 --> L-aspartic semialdehyde.]

  • Lysine branch: 4.2.1.52, 1.3.1.26, 2.3.1.117, 2.6.1.17, 3.5.1.18, 5.1.1.7, ending at L-lysine
  • Threonine branch: 1.1.1.3, 2.7.1.39, 4.4.1.8, ending at L-Threonine
  • Methionine branch: 1.1.1.3, 2.3.1.46, 4.2.99.9, 4.4.1.8, 2.1.1.13, 2.1.1.14, 2.5.1.6, ending at L-Methionine and S-Adenosyl-L-Methionine

slide-25
SLIDE 25

Super-pathway: Aspartate-derivative amino acids

[Figure: super-pathway diagram. Aspartate enters a common fork for aspartate derivatives (L-aspartic semialdehyde, L-Homoserine), which branches into lysine, homoserine, threonine, isoleucine and methionine biosynthesis. Compounds shown: L-Methionine, L-Threonine, L-Isoleucine, L-Lysine, L-Cysteine. Ten inhibition arrows are drawn.]

slide-26
SLIDE 26

What is a pathway ?

  • Should we consider that pathway boundaries are defined arbitrarily?
  • Should we even go further and consider that the full organism-specific network is the only relevant level of analysis?
  • If so, can we hope to get any insight from such a complex system?
slide-27
SLIDE 27

Is there a metabolic modularity ?

  • The reductionist approach: 1 gene – 1 enzyme – 1 “function”
  • Remark: definition of function
  • “Function: action, characteristic role of an element or organ within a whole (often opposed to structure)” Robert, 1982 (translated from French).
  • It is pointless to dissociate (as in GO) the “molecular” and “cellular” functions.
  • Function is, by definition, the relationship between an enzymatic activity and the process in which it takes place.
  • => context-dependence
  • Multifunctionality - an element may be multi-functional by different means
  • same activity can play different roles in different contexts (tissues, processes)
  • different activities in the same context (e.g. multi-domain enzymes)
  • Auxotrophy.
  • Regulation: changes in conditions induce/activate defined sets of enzymes.
slide-28
SLIDE 28

Part 2 – From reactions/compounds to metabolic networks

slide-29
SLIDE 29

Building metabolic networks

slide-30
SLIDE 30

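Metabolic network (E. coli and S. cerevisiae)

[Figure: network combining the E. coli and S. cerevisiae routes from L-Homoserine to L-Methionine.]

  • Reactions: 2.3.1.46, 4.2.99.9, 4.4.1.8, 2.3.1.31, 4.2.99.10, 2.1.1.14
  • Compounds: L-Homoserine, Alpha-succinyl-L-Homoserine, Cystathionine, O-acetyl-homoserine, Homocysteine, L-Methionine
  • Side compounds: SuccinylSCoA, HSCoA, L-Cysteine, Succinate, Pyruvate, NH4+, H2O, AcetylCoA, CoA, Sulfide, 5-MethylTHF, THF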

slide-31
SLIDE 31

One node per compound

  • vertices = compounds
  • arcs = reactions
  • problem: no representation of cross-point reactions

[Figure: compound graph of the same network. Each arc is labeled with its reaction, so a reaction with several substrates or products (e.g. 2.3.1.46, 4.2.99.9) is repeated on several arcs.]

slide-32
SLIDE 32

One node per reaction

  • vertices = reactions
  • arcs = intermediate compounds
  • problem: no representation of cross-point compounds

[Figure: reaction graph of the same network. Nodes: 2.3.1.46, 4.2.99.9, 4.4.1.8, 2.1.1.14, 2.3.1.31, 4.2.99.10. Arc labels: Alpha-succinyl-L-Homoserine, Cystathionine, O-acetyl-homoserine, Homocysteine (twice).]
slide-33
SLIDE 33

One node per compound and per reaction

  • 2 types of vertices
  • compounds and reactions
  • arcs
  • from substrate to reaction
  • from reaction to product
  • arc labels can be used to represent stoichiometry

[Figure: the same network drawn with both compound nodes and reaction nodes.]

slide-34
SLIDE 34

Reactions and compounds: directed bipartite graph

  • A bipartite graph is a graph G whose vertex set V can be partitioned into two subsets U and W, such that each edge of G has one endpoint in U and one endpoint in W.
  • Metabolic networks can be represented as a bipartite graph
  • Node types: compounds (U) and reactions (W)
  • Arcs never go from compound to compound
  • Arcs never go from reaction to reaction

5,871 compounds, 5,223 reactions, 21,194 arcs
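The bipartite representation can be sketched in a few lines of Python. This is a minimal illustration, not the code used to build the networks in the talk: the dictionary `REACTIONS` is a hypothetical toy input that reuses EC numbers from the methionine slides and lists main compounds only, and `build_bipartite_graph` is an assumed helper name.

```python
# Hypothetical toy input: reaction -> (substrates, products), main compounds only.
REACTIONS = {
    "2.3.1.46": (["L-Homoserine"], ["Alpha-succinyl-L-Homoserine"]),
    "4.2.99.9": (["Alpha-succinyl-L-Homoserine", "L-Cysteine"], ["Cystathionine"]),
    "4.4.1.8": (["Cystathionine"], ["Homocysteine"]),
    "2.1.1.14": (["Homocysteine"], ["L-Methionine"]),
}


def build_bipartite_graph(reactions):
    """Return (compound nodes, reaction nodes, arcs) of a directed bipartite graph."""
    compounds, arcs = set(), []
    for reaction, (substrates, products) in reactions.items():
        for substrate in substrates:
            compounds.add(substrate)
            arcs.append((substrate, reaction))  # substrate -> reaction
        for product in products:
            compounds.add(product)
            arcs.append((reaction, product))    # reaction -> product
    return compounds, set(reactions), arcs


compounds, reaction_nodes, arcs = build_bipartite_graph(REACTIONS)

# Bipartite check: every arc joins exactly one compound and one reaction.
is_bipartite = all((tail in compounds) != (head in compounds) for tail, head in arcs)
print(len(compounds), len(reaction_nodes), len(arcs), is_bipartite)
```

Representing a reversible reaction would take a second reaction node with the arcs reversed, which is why the organism-specific node counts on the following slides use twice the number of reactions.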

slide-35
SLIDE 35

Boehringer-Mannheim Metabolic Wall Chart

http://www.expasy.ch/cgi-bin/show_thumbnails.pl

slide-36
SLIDE 36

EcoCyc metabolic chart


http://biocyc.org/ECOLI/new-image?type=OVERVIEW

slide-37
SLIDE 37

KEGG organism-specific network – Mycoplasma genitalium

  • Compounds and reactions are shown as nodes.
  • Edges represent substrate/product relationships between intermediate compounds and reactions.
  • Side compounds are ignored
  • Network
  • 238 compounds
  • 180 reactions
  • Bipartite graph (forward + reverse reactions)
  • 238+2*180 = 598 nodes
  • 820 edges
  • substrate -> reaction
  • reaction -> product
slide-38
SLIDE 38

KEGG organism-specific network - Escherichia coli K12

  • Compounds and reactions are shown as nodes.
  • Edges represent substrate/product relationships between intermediate compounds and reactions.
  • Side compounds are ignored
  • Network
  • 1115 compounds
  • 1146 reactions
  • Bipartite graph
  • 1115 +2*1146 = 3407 nodes
  • 5188 edges
  • substrate -> reaction
  • reaction -> product
slide-39
SLIDE 39

KEGG organism-specific network – Saccharomyces cerevisiae

  • Compounds and reactions are shown as nodes.
  • Edges represent substrate/product relationships between intermediate compounds and reactions.
  • Side compounds are ignored
  • Network
  • 923 compounds
  • 1796 reactions
  • Bipartite graph
  • 923+2*1796 = 4515 nodes
  • 4110 edges
  • substrate -> reaction
  • reaction -> product
slide-40
SLIDE 40

KEGG reference network

  • Compounds and reactions are shown as nodes.
  • Edges represent substrate/product relationships between intermediate compounds and reactions.
  • Side compounds are ignored
  • Network
  • 3,801 compounds
  • 5,020 reactions
  • Bipartite graph
  • 13,841 nodes
  • 21,486 edges
  • substrate -> reaction
  • reaction -> product
slide-41
SLIDE 41

The powerful law of the power law and other myths in network biology

Gipsi Lima-Mendez and Jacques van Helden (2009). Molecular BioSystems, 2009, 5, 1482 – 1493.

Topology of biochemical networks

slide-42
SLIDE 42

Topological properties of metabolic networks

  • Power-law
  • Small world
  • Scale-freeness
  • Error tolerance (robustness to random deletions)
  • Vulnerability to attacks (targeted on hubs)
  • Evolutionary scenarios

Jeong, H., Tombor, B., Albert, R., Oltvai, Z. N. and Barabasi, A. L. (2000). The large-scale organization of metabolic networks. Nature 407, 651-4.

[Figure panels: degree distribution (metabolites); theoretical models for generating networks; small world (number of compound pairs vs. distance between compounds, diameter vs. network size); error tolerance and vulnerability to attacks (diameter vs. number of deleted nodes, random vs. hub deletions).]

slide-43
SLIDE 43

Properties of graphs with power-law degree distribution

  • Small-world property
  • Distances between node pairs are very short.
  • The distribution of distances between pairs of compounds in the metabolic network peaks at 3 (Figure a).
  • This results from the shortcuts through the highly connected nodes (the “hubs”).
  • Scale-free properties
  • When only a subset of the network is selected (e.g. the reactions catalyzed in an organism with a small number of enzymes), there is a conservation of
  • the power-law property
  • the average distances (Figure b).
  • Robustness to errors (random node deletions)
  • Random node deletions barely affect the average distance between nodes (Figure e, green).
  • Sensitivity to attacks (targeted node deletions)
  • When the most connected nodes (“hubs”) are removed from the network, the average distance rapidly increases (Figure e, red).

Jeong, H., B. Tombor, R. Albert, Z.N. Oltvai, and A.L. Barabasi. 2000. The large-scale organization of metabolic networks. Nature 407: 651-654.

slide-44
SLIDE 44

Lethality and centrality in protein networks

  • The power law is also apparent in protein interaction networks.
  • Degree correlates with essentiality (deletion phenotypes).

Jeong, H., Mason, S. P., Barabasi, A. L. and Oltvai, Z. N. (2001). Lethality and centrality in protein networks. Nature 411, 41-2.
slide-45
SLIDE 45

Hierarchical organization of modularity in metabolic networks

Ravasz, E., Somera, A. L., Mongru, D. A., Oltvai, Z. N. and Barabasi, A. L. (2002). Hierarchical organization of modularity in metabolic networks. Science 297, 1551-5.

[Figure panels: power law; manifestly modular; hierarchical.]

slide-46
SLIDE 46

Universal laws in network biology ...

slide-47
SLIDE 47

... and beyond

Socio-ecological networks

Ostrom, E. (2009). A general framework for analyzing sustainability of social-ecological systems. Science 325, 419-22.

The web of life

Bascompte, J. (2009). Disentangling the web of life. Science 325, 416-9.

slide-48
SLIDE 48

Myths and dogmas in scale-free networks

  • Myth
  • a traditional story, esp. one concerning the early history of a people or explaining some natural or social phenomenon, and typically involving supernatural beings or events
  • a widely held but false belief or idea
  • Dogma
  • a principle or set of principles laid down by an authority as incontrovertibly true
  • Myth 1: the degree distribution of biological networks follows a power law

I will also show how this myth is becoming a dogma.

  • Myth 2: the metabolic network is a small world
  • Myth 3: biological networks are scale-free
  • Myth 4: small worlds are tolerant to random deletions, but vulnerable to targeted attacks
  • Myth 5: biological networks grow by preferential attachment
  • We challenged those 5 myths for two network types: metabolism and protein interactions.
  • I will only discuss metabolic networks here.

Lima-Mendez, G. and van Helden, J. (2009). The powerful law of the power law and other myths in network biology. Mol. BioSyst., 2009, 5, 1482-1493, DOI: 10.1039/b908681a. [Pubmed 20023717].
slide-49
SLIDE 49

Myth 1: the degree distribution of biological networks follows a power law

slide-50
SLIDE 50

Degree - definition

  • In a non-directed graph
  • The degree (k) of a node is the number of edges for which it is an endpoint.
  • In a directed graph
  • The in-degree (k_in) of a node is the number of arcs for which it is the head (arcs entering the node).
  • The out-degree (k_out) of a node is the number of arcs for which it is the tail (arcs leaving the node).
  • The total degree (k) of a node is the sum of in-degree and out-degree
  • k=kin+kout
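As a small illustration of these definitions, the sketch below counts degrees from a list of arcs. It assumes the usual convention that an arc points from its tail to its head, so arcs entering a node count toward its in-degree and arcs leaving it toward its out-degree. The function name `degrees` and the toy arcs are hypothetical.

```python
from collections import Counter


def degrees(arcs):
    """Return (k_in, k_out, k) counters for a list of (tail, head) arcs."""
    k_in, k_out = Counter(), Counter()
    for tail, head in arcs:
        k_out[tail] += 1  # the arc leaves its tail
        k_in[head] += 1   # the arc enters its head
    k = k_in + k_out      # total degree: k = k_in + k_out
    return k_in, k_out, k


# Hypothetical toy graph: compounds A, B, C and reactions r1, r2, r3.
toy_arcs = [("A", "r1"), ("r1", "B"), ("B", "r2"), ("r2", "C"), ("B", "r3")]
k_in, k_out, k = degrees(toy_arcs)
print(k_in["B"], k_out["B"], k["B"])
```

In the bipartite compound/reaction graph, the in-degree of a compound is the number of reactions that produce it and its out-degree is the number of reactions that consume it.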
slide-51
SLIDE 51

Graph types

Homogeneous networks

  • Erdős-Rényi model (ER model)
  • Pairs of nodes are connected with a constant random probability
  • The connectivity follows a Poisson law
  • $P(k) \sim \lambda^{k} e^{-\lambda} / k!$
  • λ: mean number of connections per node
  • k: number of connections for a given node
  • The probability of finding a highly connected node decreases exponentially with connectivity.

Scale-free networks

  • A few nodes are highly connected, most nodes are poorly connected.
  • Can be generated randomly with a model where new nodes are preferentially connected to already established nodes
  • The connectivity follows a power law
  • $P(k) = C k^{-\gamma} \iff \log P(k) = -\gamma \log k + \log C$
  • γ: the exponent; the distribution is a straight line of slope −γ in a log-log graph.
  • k: number of connections for a given node
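The two laws can be compared numerically. The sketch below is only an illustration under assumed parameter values (λ = 5, γ = 2.2 and a maximum degree of 1000 are arbitrary choices, not values from the talk); the power law is normalized over a finite range of degrees so that C is well defined.

```python
import math


def poisson_pmf(k, lam):
    """P(k) = lam^k * exp(-lam) / k! (homogeneous, ER-like network)."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def power_law_pmf(k, gamma, k_max):
    """P(k) = C * k^(-gamma) on 1..k_max, with C chosen so probabilities sum to 1."""
    c = 1.0 / sum(j ** -gamma for j in range(1, k_max + 1))
    return c * k ** -gamma


lam, gamma, k_max = 5.0, 2.2, 1000  # illustrative values only
for k in (1, 5, 10, 50):
    print(k, poisson_pmf(k, lam), power_law_pmf(k, gamma, k_max))
```

With these values the Poisson probability of a node of degree 50 is many orders of magnitude below the power-law probability, which is the sense in which a power law allows highly connected nodes whereas a Poisson law practically excludes them.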

Jeong, H., B. Tombor, R. Albert, Z.N. Oltvai, and A.L. Barabasi. 2000. The large-scale organization of metabolic networks. Nature 407: 651-654.

slide-52
SLIDE 52

A representation detail

Note: in Jeong (2000), the schematic drawing is misleading. The power law is shown on logarithmic axes whereas the Poisson is shown on linear axes.

The Poisson has been chosen with a mean (lambda) of ~20.

[Figure panels: power law; Poisson.]

slide-53
SLIDE 53

The shape of the Poisson strongly depends on lambda

[Figure panels: density function; density + cCDF; density + cCDF (log scales).]

slide-54
SLIDE 54

Connectivity in the metabolic network

  • Jeong et al. (2000) calculate compound connectivity in metabolic networks reconstructed from the genome of various organisms.
  • They show that it follows a power law.

Jeong, H., B. Tombor, R. Albert, Z.N. Oltvai, and A.L. Barabasi. 2000. The large-scale organization of metabolic networks. Nature 407: 651-654.

slide-55
SLIDE 55

Compound degree

  • The distribution shown in Jeong et al. (2000) was simplified by “binning” the data in class intervals.
  • The actual distribution shows a more complex shape.
  • The “hub” compounds generally correspond to pool metabolites.

Compound                      Reactions
H2O                           1615
NAD+                          578
NADH                          569
NADP+                         564
NADPH                         559
Oxygen                        527
ATP                           435
Orthophosphate                349
ADP                           324
CO2                           323
CoA                           303
H+                            272
NH3                           270
Pyrophosphate                 252
UDP                           190
S-Adenosyl-L-methionine       174
S-Adenosyl-L-homocysteine     165
Pyruvate                      150
AMP                           142
H2O2                          138
L-Glutamate                   132
2-Oxoglutarate                129
Acceptor                      126
Acetyl-CoA                    122
Reduced acceptor              122
Acetate                       87
UDPglucose                    79
D-Glucose                     62
Succinate                     59
CMP                           54

[Figure: compound connectivity. Axes: number of reactions (avg=4.9, std=34.9) and number of compounds, both on logarithmic scales.]

Compounds from KEGG/LIGAND, 2002 version.

van Helden, J., L. Wernisch, D. Gilbert, and S.J. Wodak. 2002. Graph-based analysis of metabolic networks. In Ernst Schering Res Found Workshop (ed. M.H.-W.e. al.), pp. 245-274. Springer-Verlag.

slide-56
SLIDE 56

Metabolic network: Power law fit on the degree distribution

  • Network: all reactions from KEGG/LIGAND (http://www.genome.jp/ligand/).
  • Degree: number of reactions in which a compound is involved as substrate or product.
  • Important: the plot represents all values; the data is not “binned”.

From Jeong (2000)

slide-57
SLIDE 57

Metabolic network: Power law fit on the truncated distribution

  • The fit looks better when the right tail of the distribution is truncated.
  • Note: the right tail represents the “hubs”, which are claimed to confer the power-law property to the distribution.
  • It is thus paradoxical that the power-law fit improves when they are discarded from the network.

From Jeong (2000)

slide-58
SLIDE 58

Metabolic network: Power law fit on the cCDF

  • The fit should be done on the complementary cumulative distribution function (cCDF).
  • The fit with the complete cCDF remains visibly poor.
  • The truncated cCDF fits the beginning of the curve better, but the hubs clearly appear as outliers.

From Jeong (2000)
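For readers unfamiliar with the cCDF, the sketch below computes an empirical one from a list of node degrees. It assumes the convention cCDF(k) = P(K ≥ k) (some authors use P(K > k) instead); the function name `ccdf` and the degree values are hypothetical.

```python
from collections import Counter


def ccdf(degrees):
    """Return sorted (k, P(K >= k)) pairs for a list of node degrees."""
    n = len(degrees)
    counts = Counter(degrees)
    points, remaining = [], n
    for k in sorted(counts):
        points.append((k, remaining / n))  # fraction of nodes with degree >= k
        remaining -= counts[k]
    return points


toy_degrees = [1, 1, 1, 1, 2, 2, 3, 10]  # hypothetical degrees, one "hub" at 10
print(ccdf(toy_degrees))
```

Unlike a binned histogram, the cCDF uses every observed value without choosing class intervals, so the sparse right tail where the hubs sit stays visible instead of being averaged away.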

slide-59
SLIDE 59

“Universality” of the power law in biological networks

[Figure panels:]

  • Compounds <-> Reactions
  • Transcription Factors -> Genes
  • Genes <- Transcription Factors (Poisson fit)
  • Proteins - proteins (Gavin, 2006)
  • Proteins - proteins (Krogan, 2006)

slide-60
SLIDE 60

Comparing the likelihood of theoretical distributions

  • Stumpf & Ingram (2005) measured the likelihood of various distributions fitted to the protein interaction networks of various organisms.
  • The most likely distribution is neither the Poisson nor the power law but the stretched exponential (and the Gamma for E. coli).
  • M. P. H. Stumpf and P. J. Ingram (2005). Probability models for degree distributions of protein interaction networks. Europhys. Lett. 71:152-158.

[Figure: S. cerevisiae. Distributions compared: Poisson, exponential, Gamma, power law, lognormal, stretched exponential.]

slide-61
SLIDE 61

Testing the goodness of fit

  • Khanin and Wit (2006) tested the goodness of fit of a power law on 12 biological networks.
  • None of those networks passed the test.
  • Even the truncated distributions do not fit a power law.

Khanin, R. and Wit, E. (2006). How scale-free are biological networks. J Comput Biol 13, 810-8.

H0: the degree distribution fits a power law. Reject the hypothesis if the p-value is small.

slide-62
SLIDE 62

Myth 2: the metabolic network is a small world

The powerful law of the power law and other myths in network biology

slide-63
SLIDE 63

Is the metabolic network a small world ?

  • Small-world property
  • Distances between node pairs are very short.
  • The distribution of distances between pairs of compounds in the metabolic network peaks at 3 (Figure a).
  • This results from the shortcuts through the highly connected nodes (the “hubs”).

Jeong, H., B. Tombor, R. Albert, Z.N. Oltvai, and A.L. Barabasi. 2000. The large-scale organization of metabolic networks. Nature 407: 651-654.

slide-64
SLIDE 64

Who are the metabolic hubs ?

  • Metabolic hubs appear as side-reactants in most reactions.

Rank  Name                            In degree  Out degree  Total degree
1     H2O                             769        1444        2213
2     H+                              809        460         1269
3     Oxygen                          43         817         860
4     NADP+                           318        406         724
5     NADPH                           405        316         721
6     NAD+                            160        503         663
7     NADH                            497        158         655
8     ATP                             17         449         466
9     CO2                             378        49          427
10    Orthophosphate                  315        78          393
11    CoA                             242        127         369
12    ADP                             313        20          333
13    NH3                             253        43          296
14    Pyrophosphate                   256        30          286
15    S-Adenosyl-L-methionine (SAM)   6          239         245

[Figure: methionine biosynthesis in S. cerevisiae (diagram from slide 6), shown with its side reactants: NADP+, NADPH, CoA, AcetylCoA, Sulfide, ADP, ATP, Pi, PPi, H2O.]

slide-65
SLIDE 65

Fermenting grape to wine in 2 steps

  • Metabolic hubs cannot be used as valid intermediates to link reactions.
  • Counter-example: from glucose to ethanol
  • Accepting any compound as an intermediate between two reactions leads to irrelevant 2-step shortcuts.
  • All the distances computed in the seminal articles are thus meaningless.

[Figure panels from Jeong et al. (2000): small world (number of compound pairs vs. distance between compounds; diameter vs. network size).]

slide-66
SLIDE 66

Should we not simply filter out the “hubs” ?

  • Wagner and Fell described the small-world properties of a metabolic network at the same time as Jeong & Barabasi.
  • Fell and Wagner (2000). The small world of metabolism. Nat Biotechnol 18:121-122.
  • Wagner and Fell (2001). The small world inside large metabolic networks. Proc R Soc Lond B Biol Sci 268: 1803-1810.
  • Network building
  • Context-dependent network:
  • 317 reactions involving 275 metabolites “that represent central routes of energy metabolism and small-molecule building block synthesis in E. coli under aerobic growth, with glucose as sole carbon source and O2 as electron acceptor”.
  • They filtered out common co-enzymes (ATP, ADP, NAD)
  • Compound-reaction matrix
  • 1 if the compound is a substrate/product of the reaction
  • 0 otherwise
  • Center of the network
  • glutamate (mean path length 2.46) followed by pyruvate (2.59).
  • Generative model: network growth by accretion (new members are preferentially connected to members having a high number of connections).
  • They interpret this generative model as an evolutionary scenario
  • “This potential link with evolutionary history is consistent with Morowitz’s claim that intermediary metabolism recapitulates the evolution of biochemistry”.

slide-67
SLIDE 67

Raw graph: from L-aspartate to L-methionine

  • The 5 shortest paths from L-aspartate to L-methionine in the raw graph
  • L-aspartic acid --> 6.3.5.4 --> AMP --> 6.1.1.10 --> L-methionine
  • L-aspartic acid --> 3.5.1.15 --> H2O --> 3.4.13.12 --> L-methionine
  • L-aspartic acid --> 3.5.1.15 --> H2O --> 3.4.13.12 --> L-methionine
  • L-aspartic acid --> 4.3.1.1 --> NH3 --> 4.4.1.11 --> L-methionine
  • L-aspartic acid --> 3.5.1.15 --> H2O --> 3.5.1.31 --> L-methionine
  • All these paths convert L-aspartate to L-methionine in 2 reaction steps.
  • In all these cases, the intermediate compound belongs to the group of highly connected nodes in the metabolic graph.
  • These compounds cannot be considered as valid intermediates between these reactions.
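The shortcut effect described above can be reproduced on a toy graph. The sketch below is only an illustration: it builds a directed compound/reaction graph from the edges that appear in the listed raw-graph paths (nothing else from KEGG is included) and runs a plain breadth-first search. The function names are hypothetical, not those of any published tool.

```python
from collections import deque

# Toy bipartite graph (compound -> reaction -> compound) restricted to the
# edges visible in the raw-graph paths listed on the slide.
EDGES = [
    ("L-aspartic acid", "6.3.5.4"), ("6.3.5.4", "AMP"),
    ("AMP", "6.1.1.10"), ("6.1.1.10", "L-methionine"),
    ("L-aspartic acid", "3.5.1.15"), ("3.5.1.15", "H2O"),
    ("H2O", "3.4.13.12"), ("3.4.13.12", "L-methionine"),
    ("L-aspartic acid", "4.3.1.1"), ("4.3.1.1", "NH3"),
    ("NH3", "4.4.1.11"), ("4.4.1.11", "L-methionine"),
]


def build_graph(edges):
    """Adjacency list of a directed graph."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    return graph


def shortest_path(graph, source, target):
    """Breadth-first search; returns one shortest path as a node list, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


path = shortest_path(build_graph(EDGES), "L-aspartic acid", "L-methionine")
print(" --> ".join(path))
```

Whatever path the search returns, its single intermediate compound is one of the highly connected compounds (AMP, H2O or NH3), which is exactly the problem raised on this slide.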

slide-68
SLIDE 68

Filtered graph: from L-aspartate to L-methionine

The 5 shortest paths from L-aspartate to L-methionine in the filtered graph

  • L-aspartic acid --> 2.6.1.35 --> glycine --> 2.6.1.73 --> L-methionine
  • L-aspartic acid --> 2.6.1.12 --> L-alpha-alanine --> 2.6.1.44 --> glycine --> 2.6.1.73 --> L-methionine
  • L-aspartic acid --> 2.6.1.12 --> L-alpha-alanine --> 2.6.1.41 --> d-methionine --> 5.1.1.2 --> L-methionine
  • L-aspartic acid --> 2.6.1.12 --> L-alpha-alanine --> 2.6.1.2 --> o-acetyl-L-homoserine --> 2.5.1.49 --> L-methionine
  • L-aspartic acid --> 4.1.1.12 --> L-alpha-alanine --> 2.6.1.44 --> glycine --> 2.6.1.73 --> L-methionine
  • These paths use valid intermediate compounds.
  • However, they are much shorter (2 or 3 intermediate reactions) than the annotated methionine pathway.
  • The intermediate compounds and reactions are not part of the annotated pathway.

slide-69
SLIDE 69

Myth 3: biological networks are scale-free

slide-70
SLIDE 70

Are metabolic networks scale-free ?

  • Scale-free properties
  • When only a subset of the network is selected (e.g. the reactions catalyzed in an organism with a small number of enzymes), there is a conservation of
  • the power-law property
  • the small average distances (Figure b).
  • Problems
  • The power law does not fit any of the actual data sets (see myth 1).
  • The small average distances are an artefact (see myth 2).

Jeong, H., B. Tombor, R. Albert, Z.N. Oltvai, and A.L. Barabasi. 2000. The large-scale organization of metabolic networks. Nature 407: 651-654.

slide-71
SLIDE 71

Myth 4: small worlds are tolerant to random deletions, but vulnerable to targeted attacks

The powerful law of the power law and other myths in network biology

slide-72
SLIDE 72

Are metabolic networks robust to errors but vulnerable to attacks ?

  • Robustness to errors (random node deletions)
  • Random node deletions barely affect the average distance between nodes (Figure e, green).
  • Sensitivity to attacks (targeted node deletions)
  • When the most connected nodes (“hubs”) are removed from the network, the average distance rapidly increases (Figure e, red).
  • How can those concepts be transposed to metabolic networks ?

Jeong, H., B. Tombor, R. Albert, Z.N. Oltvai, and A.L. Barabasi. 2000. The large-scale organization of metabolic networks. Nature 407: 651-654.
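The graph-theoretical claim (random deletions barely change the average distance, hub deletions increase it) can be illustrated on a toy network. The sketch below is an assumption-laden illustration, not the analysis of Jeong et al.: the network is a hypothetical ring of 8 nodes that are all connected to one hub, and the "random deletion" effect is summarised as the mean over every possible single non-hub deletion.

```python
from collections import deque


def add_edge(graph, a, b):
    """Add an undirected edge to an adjacency-set dictionary."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def bfs_distances(graph, source):
    """Distances (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def average_distance(graph):
    """Mean shortest-path length over all connected ordered node pairs."""
    total = pairs = 0
    for source in graph:
        for target, d in bfs_distances(graph, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs


def delete_node(graph, node):
    """Copy of the graph without the given node and its edges."""
    return {n: nbs - {node} for n, nbs in graph.items() if n != node}


# Hypothetical toy network: ring c0..c7, every ring node also linked to "hub".
network = {}
ring = [f"c{i}" for i in range(8)]
for i, name in enumerate(ring):
    add_edge(network, name, ring[(i + 1) % len(ring)])
    add_edge(network, name, "hub")

intact = average_distance(network)
without_hub = average_distance(delete_node(network, "hub"))
leaf_mean = sum(average_distance(delete_node(network, n)) for n in ring) / len(ring)
print(f"intact={intact:.3f} hub removed={without_hub:.3f} "
      f"non-hub removed (mean)={leaf_mean:.3f}")
```

The toy example only reproduces the topological observation; whether deleting a "hub" has any biological meaning is the question addressed on the next slides.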

slide-73
SLIDE 73

Are cells resistant to random attacks ?

  • 100 years of genetics and biochemistry show the opposite
  • All the characterized enzymes were isolated because the mutation of a single enzyme leads to auxotrophy.
  • → Those mutations are lethal unless the enzyme product is supplied.

Source: Byrne & Meacock. Microbiology. 2001 Sep;147(Pt 9):2389-98.

[Figure: error tolerance and vulnerability to attacks. Diameter as a function of the number of deleted nodes, for random deletions and hub deletions.]

slide-74
SLIDE 74

Targeted attacks: can we conceive a water-free cell ?

  • Deletions act on enzymes, not compounds.
  • Removing a “hub” involves deleting several hundred enzymes.
  • Double or triple mutations are generally lethal.
  • → This is conceivable neither in nature nor in the laboratory.

[Figure: error tolerance and vulnerability to attacks. Diameter as a function of the number of deleted nodes, for random deletions and hub deletions.]

Rank  Name                            In degree  Out degree  Total degree
1     H2O                             769        1444        2213
2     H+                              809        460         1269
3     Oxygen                          43         817         860
4     NADP+                           318        406         724
5     NADPH                           405        316         721
6     NAD+                            160        503         663
7     NADH                            497        158         655
8     ATP                             17         449         466
9     CO2                             378        49          427
10    Orthophosphate                  315        78          393
11    CoA                             242        127         369
12    ADP                             313        20          333
13    NH3                             253        43          296
14    Pyrophosphate                   256        30          286
15    S-Adenosyl-L-methionine (SAM)   6          239         245

slide-75
SLIDE 75

Myth 5: biological networks grow by preferential attachment

The powerful law of the power law and other myths in network biology

slide-76
SLIDE 76

A logical fallacy

  • A => B does not mean B => A
  • Several generative models can produce a power law degree distribution.
  • The underlying structure is however very different.
  • The power law is not informative about a network’s origin and evolution.
  • Keller. Revisiting "scale-free" networks. Bioessays (2005) vol. 27 (10) pp. 1060-8
slide-77
SLIDE 77

Do metabolic “hubs” correspond to more ancient compounds ?

  • This hypothesis seems reasonable to understand relationships between central and secondary metabolism.
  • However, a strict extrapolation to compound degree would lead to obvious absurdities
  • ATP before adenine
  • S-Adenosyl-L-methionine before methionine
  • ...

Rank  Name                            In degree  Out degree  Total degree
1     H2O                             769        1444        2213
2     H+                              809        460         1269
3     Oxygen                          43         817         860
4     NADP+                           318        406         724
5     NADPH                           405        316         721
6     NAD+                            160        503         663
7     NADH                            497        158         655
8     ATP                             17         449         466
9     CO2                             378        49          427
10    Orthophosphate                  315        78          393
11    CoA                             242        127         369
12    ADP                             313        20          333
13    NH3                             253        43          296
14    Pyrophosphate                   256        30          286
15    S-Adenosyl-L-methionine (SAM)   6          239         245
16    S-Adenosyl-L-homocysteine       227        9           236
17    UDP                             216        6           222
18    H2O2                            142        21          163
19    2-Oxoglutarate                  33         125         158
20    AMP                             144        14          158
21    Pyruvate                        101        50          151
22    Acetyl-CoA                      35         101         136
23    L-Glutamate                     83         46          129
24    Oxaloacetate                    29         14          43

slide-78
SLIDE 78

Part 3 From networks to pathways

slide-79
SLIDE 79

Tricks and traps for metabolic path finding

slide-80
SLIDE 80

Path finding traps - Ubiquitous compounds

Reactions

  • 4.2.1.52: L-Aspartic Semialdehyde + Pyruvate --> dihydrodipicolinic acid + H2O
  • 3.5.1.18: Succinyl diaminopimelate + H2O --> LL-diaminopimelic acid + succinate

Invalid pathway

  • L-Aspartic Semialdehyde --> 4.2.1.52 --> H2O --> 3.5.1.18 --> LL-diaminopimelic acid

slide-81
SLIDE 81

Path finding traps - Direct traversal of reversible reactions

Reaction

  • 4.2.1.52: L-Aspartic Semialdehyde + Pyruvate <=> dihydrodipicolinic acid + H2O

Invalid pathway

  • L-Aspartic Semialdehyde --> 4.2.1.52 --> Pyruvate

Valid pathways

  • L-Aspartic Semialdehyde --> 4.2.1.52 --> dihydrodipicolinic acid
  • dihydrodipicolinic acid --> 4.2.1.52 --> L-Aspartic Semialdehyde

slide-82
SLIDE 82

Path finding traps - Mutual exclusion of reverse reactions

Reactions

  • 4.2.1.52: L-Aspartic Semialdehyde + Pyruvate --> dihydrodipicolinic acid + H2O
  • 4.2.1.52 reverse: dihydrodipicolinic acid + H2O --> L-Aspartic Semialdehyde + Pyruvate

Invalid pathway

  • L-Aspartic Semialdehyde --> 4.2.1.52 --> dihydrodipicolinic acid --> 4.2.1.52 reverse --> Pyruvate
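One way to enforce this mutual exclusion is to reject any candidate path that uses both directions of the same reaction. The sketch below is a minimal illustration under the assumption that the reverse direction is stored as a separate node whose name ends with " reverse" (the labelling used on this slide); the function names are hypothetical.

```python
REVERSE_SUFFIX = " reverse"  # naming convention assumed from the slide labels


def split_direction(reaction_node):
    """Return (reaction identifier, direction) for a reaction node name."""
    if reaction_node.endswith(REVERSE_SUFFIX):
        return reaction_node[: -len(REVERSE_SUFFIX)], "reverse"
    return reaction_node, "forward"


def uses_both_directions(path, reaction_nodes):
    """True if the path crosses the same reaction in both directions."""
    seen = {}
    for node in path:
        if node not in reaction_nodes:
            continue  # compound node
        reaction, direction = split_direction(node)
        if seen.setdefault(reaction, direction) != direction:
            return True
    return False


REACTION_NODES = {"4.2.1.52", "4.2.1.52 reverse"}

invalid = ["L-Aspartic Semialdehyde", "4.2.1.52", "dihydrodipicolinic acid",
           "4.2.1.52 reverse", "Pyruvate"]
valid = ["L-Aspartic Semialdehyde", "4.2.1.52", "dihydrodipicolinic acid"]

print(uses_both_directions(invalid, REACTION_NODES))
print(uses_both_directions(valid, REACTION_NODES))
```

A check of this kind only filters paths after the search; excluding the reverse node during the search itself is an alternative design that avoids enumerating invalid paths.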

slide-83
SLIDE 83

Path finding traps – “generic” compounds and unbalanced reactions

  • KEGG contains “generic” compounds, i.e. entities that represent a whole class of compounds.
  • Examples: sugar, DNA, ...
  • Those compounds are sometimes involved in reactions which are not properly balanced.
  • E.g. R00375: dATP + DNA <=> Diphosphate + DNA
  • Such compounds can fool path finding algorithms, which then return irrelevant pathways.
slide-84
SLIDE 84

(Two-ends) path finding

slide-85
SLIDE 85

Raw graph: from L-aspartate to L-methionine

  • The 5 shortest paths from L-aspartate to L-methionine in the raw graph
  • L-aspartic acid --> 6.3.5.4 --> AMP --> 6.1.1.10 --> L-methionine
  • L-aspartic acid --> 3.5.1.15 --> H2O --> 3.4.13.12 --> L-methionine
  • L-aspartic acid --> 3.5.1.15 --> H2O --> 3.4.13.12 --> L-methionine
  • L-aspartic acid --> 4.3.1.1 --> NH3 --> 4.4.1.11 --> L-methionine
  • L-aspartic acid --> 3.5.1.15 --> H2O --> 3.5.1.31 --> L-methionine
  • All these paths convert L-aspartate to L-methionine in 2 reaction steps.
  • In all these cases, the intermediate compound belongs to the group of highly connected nodes in the metabolic graph.
  • These compounds cannot be considered as valid intermediates between these reactions.

slide-86
SLIDE 86

Filtered graph: discarding pool metabolites

  • To avoid irrelevant shortcuts, a set of highly connected compounds is discarded from the graph.
  • The selection is fine-tuned manually
  • some compounds are maintained (e.g. S–Adenosyl–L–methionine, …).
  • others, although less connected, are removed (e.g. pyruvate, CMP).

Filtered out

1. H2O
2. ATP
3. NAD
4. NADH
5. NADPH
6. NADP
7. O2
8. ADP
9. Pi
10. CoA
11. CO2
12. PPi
13. NH3
14. UDP
15. AMP
16. pyruvate
17. acetyl-CoA
18. L-glutamate
19. 2-oxoglutarate
20. H2O2
21. Acceptor
22. UDP
23. Reduced acceptor
24. Acetate
25. GDP
26. oxalacetic acid
27. succinic acid
28. GTP
29. CMP
30. UTP
31. H+
32. UMP
33. CDP
34. reduced ferredoxin
35. H2
36. FADH2

Compound                     Reactions
H2O                          1615
NAD+                         578
NADH                         569
NADP+                        564
NADPH                        559
Oxygen                       527
ATP                          435
Orthophosphate               349
ADP                          324
CO2                          323
CoA                          303
H+                           272
NH3                          270
Pyrophosphate                252
UDP                          190
S-Adenosyl-L-methionine      174
S-Adenosyl-L-homocysteine    165
Pyruvate                     150
AMP                          142
H2O2                         138
L-Glutamate                  132
2-Oxoglutarate               129
Acceptor                     126
Acetyl-CoA                   122
Reduced acceptor             122
Acetate                      87
UDPglucose                   79
D-Glucose                    62
Succinate                    59
CMP                          54
…                            …
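Filtering amounts to dropping every edge that touches an excluded compound before searching. The sketch below is a toy illustration, not the published implementation: the edges are taken from one raw-graph path and one filtered-graph path shown in this deck, and the excluded set is only the beginning of the list above.

```python
from collections import deque

# Toy edges: one shortcut through H2O and one longer route through
# L-alpha-alanine and glycine.
RAW_EDGES = [
    ("L-aspartic acid", "3.5.1.15"), ("3.5.1.15", "H2O"),
    ("H2O", "3.4.13.12"), ("3.4.13.12", "L-methionine"),
    ("L-aspartic acid", "2.6.1.12"), ("2.6.1.12", "L-alpha-alanine"),
    ("L-alpha-alanine", "2.6.1.44"), ("2.6.1.44", "glycine"),
    ("glycine", "2.6.1.73"), ("2.6.1.73", "L-methionine"),
]

# First entries of the filtered-out list (the full list has 36 compounds).
POOL_METABOLITES = frozenset({"H2O", "ATP", "NAD", "NADH", "NADPH", "NADP", "O2", "ADP"})


def build_graph(edges, excluded=frozenset()):
    """Directed adjacency list, skipping edges that touch an excluded compound."""
    graph = {}
    for src, dst in edges:
        if src in excluded or dst in excluded:
            continue
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    return graph


def shortest_path(graph, source, target):
    """Breadth-first search; returns one shortest path or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


raw_path = shortest_path(build_graph(RAW_EDGES), "L-aspartic acid", "L-methionine")
filtered_path = shortest_path(build_graph(RAW_EDGES, POOL_METABOLITES),
                              "L-aspartic acid", "L-methionine")
print(" --> ".join(raw_path))
print(" --> ".join(filtered_path))
```

In this toy graph the raw search takes the H2O shortcut, while the filtered search has to go through L-alpha-alanine and glycine.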

slide-87
SLIDE 87

Filtered graph : choice of excluded compounds

  • Where to set the limit ?
  • Seems obvious for H2O (1615), NADH (569), ... What about ATP (435) ? And pyruvate ? And NH3 ?
  • Depends on the reaction/pathway considered
  • e.g. ATP is a valid intermediate in nucleotide biosynthesis
  • Depends on the atoms being transferred during the reaction
  • e.g. NADH gives one proton
  • Depends on the focus of the question
  • e.g. analysis of energy metabolism → ATP, NAD will matter

slide-88
SLIDE 88

Filtered graph: from L-aspartate to L-methionine

The 5 shortest paths from L-aspartate to L-methionine in the filtered graph

  • L-aspartic acid --> 2.6.1.35 --> glycine --> 2.6.1.73 --> L-methionine
  • L-aspartic acid --> 2.6.1.12 --> L-alpha-alanine --> 2.6.1.44 --> glycine --> 2.6.1.73 --> L-methionine
  • L-aspartic acid --> 2.6.1.12 --> L-alpha-alanine --> 2.6.1.41 --> d-methionine --> 5.1.1.2 --> L-methionine
  • L-aspartic acid --> 2.6.1.12 --> L-alpha-alanine --> 2.6.1.2 --> o-acetyl-L-homoserine --> 2.5.1.49 --> L-methionine
  • L-aspartic acid --> 4.1.1.12 --> L-alpha-alanine --> 2.6.1.44 --> glycine --> 2.6.1.73 --> L-methionine
  • These paths use valid intermediate compounds.
  • However, they are much shorter (2 or 3 intermediate reactions) than the annotated methionine pathway.
  • The intermediate compounds and reactions are not part of the annotated pathway.

slide-89
SLIDE 89

Path finding in a weighted graph

  • Principle
  • Each compound node is assigned a weight proportional to its connectivity degree.
  • All compounds are allowed for path finding, but the cost is higher for highly connected compounds.
  • This reduces the probability of using a pool metabolite as an intermediate between two successive reactions.
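The principle can be sketched with Dijkstra's algorithm in which the cost of a path is the sum of the weights of the nodes it enters. This is a minimal sketch under explicit assumptions: reaction nodes get a weight of 1, the H2O degree (2213) comes from the degree table shown earlier, and all other compound degrees are hypothetical placeholder values chosen for the example.

```python
import heapq

# Compound weights = connectivity degree. Only the H2O value comes from the
# deck; the other degrees are hypothetical.
COMPOUND_DEGREE = {
    "H2O": 2213,
    "L-aspartic acid": 4, "L-4-aspartyl phosphate": 4,
    "L-aspartic 4-semialdehyde": 4, "L-homoserine": 4,
    "o-acetyl-L-homoserine": 4, "L-methionine": 4,
}

SHORTCUT = ["L-aspartic acid", "3.5.1.15", "H2O", "3.4.13.12", "L-methionine"]
PATHWAY = ["L-aspartic acid", "2.7.2.4", "L-4-aspartyl phosphate", "1.2.1.11",
           "L-aspartic 4-semialdehyde", "1.1.1.3", "L-homoserine", "2.3.1.31",
           "o-acetyl-L-homoserine", "2.5.1.49", "L-methionine"]


def graph_from_chains(*chains):
    """Directed adjacency list built from linear chains of nodes."""
    graph = {}
    for chain in chains:
        for src, dst in zip(chain, chain[1:]):
            graph.setdefault(src, []).append(dst)
            graph.setdefault(dst, [])
    return graph


def node_weight(node):
    """Degree for compounds; 1 for reaction nodes (assumption)."""
    return COMPOUND_DEGREE.get(node, 1)


def lightest_path(graph, source, target):
    """Dijkstra on node weights; returns (path, cost) or (None, inf)."""
    best = {source: 0}
    previous = {source: None}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > best[node]:
            continue  # stale heap entry
        for neighbour in graph[node]:
            new_cost = cost + node_weight(neighbour)
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(heap, (new_cost, neighbour))
    if target not in previous:
        return None, float("inf")
    path, node = [], target
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1], best[target]


graph = graph_from_chains(SHORTCUT, PATHWAY)
path, cost = lightest_path(graph, "L-aspartic acid", "L-methionine")
print(cost, " --> ".join(path))
```

Unlike filtering, no compound is forbidden: the H2O shortcut remains available but costs far more than the five-reaction route, so the search prefers the longer path.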

slide-90
SLIDE 90

Weighted graph: methionine biosynthesis

  • Search of the 5 shortest paths from L-aspartate to L-methionine
  • Weighted graph (compound weight = connectivity)

L-aspartic acid --> 2.7.2.4 --> L-4-aspartyl phosphate --> 1.2.1.11 --> L-aspartic 4-semialdehyde --> 1.1.1.3 --> L-homoserine --> 2.3.1.31 --> o-acetyl-L-homoserine --> 2.5.1.49 --> L-methionine

L-aspartic acid --> 2.7.2.4 --> L-4-aspartyl phosphate --> 1.2.1.11 --> L-aspartic 4-semialdehyde --> 1.1.1.3 --> L-homoserine --> 2.3.1.31 --> o-acetyl-L-homoserine --> 2.5.1.49 --> L-methionine

L-aspartic acid --> 3.5.5.4 --> L-beta-cyanoalanine --> R03972 --> L-2,4-diaminobutyrate --> 2.6.1.46 --> L-aspartic 4-semialdehyde --> 1.1.1.3 --> L-homoserine --> 2.3.1.31 --> o-acetyl-L-homoserine --> 2.5.1.49 --> L-methionine

L-aspartic acid --> 3.5.5.4 --> L-beta-cyanoalanine --> R03972 --> L-2,4-diaminobutyrate --> 2.6.1.46 --> L-aspartic 4-semialdehyde --> 1.1.1.3 --> L-homoserine --> 2.3.1.31 --> o-acetyl-L-homoserine --> 2.5.1.49 --> L-methionine

L-aspartic acid --> 2.7.2.4 --> L-4-aspartyl phosphate --> 1.2.1.11 --> L-aspartic 4-semialdehyde --> 1.1.1.3 --> L-homoserine --> 2.3.1.46 --> o-succinyl-L-homoserine --> 2.5.1.48 --> L-cystathionine --> 2.5.1.49 --> o-acetyl-L-homoserine --> 2.5.1.49 --> L-methionine

[Figure labels: E.coli pathway, Yeast pathway]

slide-91
SLIDE 91

Heme biosynthesis (Saccharomyces cerevisiae)

[Figure: heme biosynthesis in Saccharomyces cerevisiae. Panels: annotated pathway; path finding in raw graph; path finding in filtered graph; path finding in weighted graph.]

Annotated pathway: 2.3.1.37 --> 5-aminolevulinate --> 4.2.1.24 --> porphobilinogen --> 2.5.1.61 --> hydroxymethylbilane --> 4.2.1.75 --> uroporphyrinogen III --> 4.1.1.37 --> coproporphyrinogen III --> 1.3.3.3 --> protoporphyrinogen IX --> 1.3.3.4 --> protoporphyrin --> 4.99.1.1 --> haem

Croes, D., F. Couche, S.J. Wodak, and J. van Helden. 2006. J Mol Biol 356: 222-236.

slide-92
SLIDE 92

Alignment between inferred and annotated pathways

Threonine biosynthesis

slide-93
SLIDE 93

Evaluation of inferred paths (KEGG/LIGAND network, aMAZE pathways)

  • Comparison between inferred paths and annotated pathways based on intermediate reactions (those not provided as source and target)

Shortest path
Graph      Average sensitivity  Average PPV  Average accuracy
Raw        31.4%                25.4%        28.4%
Filtered   68.0%                63.0%        65.5%
Weighted   88.5%                83.4%        85.9%

Most accurate among the 5 shortest paths
Graph      Average sensitivity  Average PPV  Average accuracy
Raw        33.3%                26.5%        29.9%
Filtered   71.4%                66.7%        69.1%
Weighted   92.2%                88.1%        90.1%

  • True Positive (TP): inferred and annotated
  • False Positive (FP): inferred, not annotated
  • False Negative (FN): annotated, not inferred
  • True Negative (TN): not inferred, not annotated
  • Sensitivity: Sn = TP/(TP + FN)
  • Positive predictive value (specificity): PPV = TP/(TP + FP)
  • Accuracy: Acc = (Sn + PPV)/2

Croes, D., F. Couche, S.J. Wodak, and J. van Helden. 2006. J Mol Biol 356: 222-236.
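The three statistics defined above can be computed directly from the sets of intermediate reactions. The sketch below is a minimal illustration; the two reaction sets are made-up examples (EC numbers borrowed from the methionine slides), not results from the cited evaluation.

```python
def evaluate_path(inferred, annotated):
    """Sensitivity, PPV and accuracy of inferred vs annotated reaction sets."""
    inferred, annotated = set(inferred), set(annotated)
    tp = len(inferred & annotated)   # inferred and annotated
    fp = len(inferred - annotated)   # inferred, not annotated
    fn = len(annotated - inferred)   # annotated, not inferred
    sn = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return {"Sn": sn, "PPV": ppv, "Acc": (sn + ppv) / 2}


# Hypothetical example: intermediate reactions only (source and target excluded).
annotated_reactions = {"1.2.1.11", "1.1.1.3", "2.3.1.31"}
inferred_reactions = {"1.2.1.11", "1.1.1.3", "2.3.1.46"}

scores = evaluate_path(inferred_reactions, annotated_reactions)
print({name: round(value, 3) for name, value in scores.items()})
```

Note that the accuracy used here is the mean of sensitivity and PPV, as defined on the slide, and does not involve true negatives.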

slide-94
SLIDE 94

Evaluation of inferred paths (EcoCyc network, EcoCyc pathways)

  • Comparison between inferred paths and annotated pathways based on intermediate reactions (those not provided as source and target)

Shortest path
Graph      Average sensitivity  Average PPV  Average accuracy
Raw        29.6%                31.0%        29.3%
Filtered   63.3%                68.8%        66.6%
Weighted   80.7%                85.3%        83.0%

Most accurate among the 5 shortest paths
Graph      Average sensitivity  Average PPV  Average accuracy
Raw        35.0%                40.0%        37.5%
Filtered   85.6%                89.2%        87.4%
Weighted   92.2%                95.1%        93.7%

  • True Positive (TP): inferred and annotated
  • False Positive (FP): inferred, not annotated
  • False Negative (FN): annotated, not inferred
  • True Negative (TN): not inferred, not annotated
  • Sensitivity: Sn = TP/(TP + FN)
  • Positive predictive value (specificity): PPV = TP/(TP + FP)
  • Accuracy: Acc = (Sn + PPV)/2

Croes, D., F. Couche, S.J. Wodak, and J. van Helden. 2006. J Mol Biol 356: 222-236.

slide-95
SLIDE 95

Inferred paths versus KEGG/LIGAND pathway maps

  • Each inferred path is compared to the 85 pathway maps, and the significant correspondences are retained (hypergeometric test).
  • X axis
  • number of intermediate reactions in the inferred path
  • Y axis
  • number of reactions in common with a KEGG pathway
  • Values
  • number of inferred paths
  • On the diagonal
  • inferred paths completely included in one KEGG pathway.
  • Inferred length
  • Raw graph < Filtered graph < Weighted graph
  • Consistency with KEGG
  • Raw graph < Filtered graph < Weighted graph
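The hypergeometric test mentioned above asks how likely it is to observe at least k shared reactions by chance. The sketch below is one plausible formulation, stated as an assumption: N is the number of reactions in the network, K the size of the KEGG map, n the number of reactions in the inferred path and k the overlap. The numbers in the example are hypothetical.

```python
from math import comb


def hypergeometric_pvalue(k, n, K, N):
    """P(X >= k) when drawing n reactions out of N, K of which are in the map."""
    total = comb(N, n)
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Hypothetical numbers: a 5-reaction inferred path, 4 of which fall in a
# 30-reaction pathway map, within a network of 5000 reactions.
p_value = hypergeometric_pvalue(k=4, n=5, K=30, N=5000)
print(f"{p_value:.3e}")
```

With 85 pathway maps tested per inferred path, a multiple-testing correction of the resulting p-values would be a reasonable addition; the slide does not say which one was used.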
slide-96
SLIDE 96

Navigating in a network of reactant pairs (RPAIRs)

slide-97
SLIDE 97

Reactant pairs (RPAIR)

  • RPAIR definition
  • “pairs of compounds that have atoms or atom groups in common on two sides of a reaction” (Kotera et al, 2004)
  • Example (from Faust et al., 2009).

[Figure: reaction R00480 with its reactant pairs A00003 (main), A00932 (main) and A06173 (trans).]

1. Kotera, M., Hattori, M., Oh, M.-A., Yamamoto, R., Komeno, T., Yabuzaki, J., Tonomura, K., Goto, S. & Kanehisa, M. (2004). RPAIR: a reactant-pair database representing chemical changes in enzymatic reactions. Genome Informatics 15.
2. Faust, K., Croes, D. and van Helden, J. (2009). Metabolic Pathfinding Using RPAIR Annotation. J Mol Biol. [Pubmed 19281817].

slide-98
SLIDE 98

Path finding in the RPAIR versus reaction network

  • Shortest paths from L-Aspartate to L-Methionine

[Figure panels: reaction network; all RPAIRs; main RPAIRs.]

Faust, K., Croes, D. and van Helden, J. (2009). Metabolic Pathfinding Using RPAIR Annotation. J Mol Biol. [Pubmed 19281817].

slide-99
SLIDE 99

Alternative paths found in organism-specific networks

Faust, K., Croes, D. and van Helden, J. (2009). Metabolic Pathfinding Using RPAIR Annotation. J Mol Biol. [Pubmed 19281817].

slide-100
SLIDE 100

Impact of path finding parameters Path finding in reference network (all organisms merged)

  • Path finding in a metabolic network built from all KEGG reactions or reactant pairs.
  • 104 combinations of parameters tested: network type, weighting policy, compound filtering, directed/undirected network.
  • Estimated using a collection of 55 linear pathways from E.coli (32), S.cerevisiae (11) and H.sapiens (12).

Faust, K., Croes, D. and van Helden, J. (2009). Metabolic Pathfinding Using RPAIR Annotation. J Mol Biol. [Pubmed 19281817].

slide-101
SLIDE 101

Impact of path finding parameters Path finding in an organism-specific network (Escherichia coli)

  • Path finding in a metabolic network built from all KEGG reactions catalyzed in E.coli + spontaneous reactions.

  • Estimated using a collection of 32 linear pathways from E.coli.

Faust, K., Croes, D. and van Helden, J. (2009). Metabolic Pathfinding Using RPAIR Annotation. J Mol Biol. [Pubmed 19281817].

slide-102
SLIDE 102

Impact of path finding parameters Path finding in an organism-specific network (S.cerevisiae)

  • Path finding in a metabolic network built from all KEGG reactions catalyzed in S.cerevisiae + spontaneous reactions.

  • Estimated using a collection of 11 linear pathways from S.cerevisiae.

Faust, K., Croes, D. and van Helden, J. (2009). Metabolic Pathfinding Using RPAIR Annotation. J Mol Biol. [Pubmed 19281817].

slide-103
SLIDE 103

Impact of path finding parameters Path finding in an organism-specific network (H.sapiens)

  • Path finding in a metabolic network built from all KEGG reactions catalyzed in H.sapiens + spontaneous reactions.

  • Estimated using a collection of 12 linear pathways from H.sapiens.

...

Faust, K., Croes, D. and van Helden, J. (2009). Metabolic Pathfinding Using RPAIR Annotation. J Mol Biol. [Pubmed 19281817].

slide-104
SLIDE 104

Multi-seed pathway building

slide-105
SLIDE 105

Reconstructing a pathway from a subset of reactions

  • Input:
  • a set of reactions (the seed reactions)
  • Output:
  • a metabolic pathway including
  • the seed reactions, together with their substrates and products
  • optionally, some additional reactions, intercalated to improve the pathway connectivity

  • the pathway can either be connected, or contain several unconnected components
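A naive way to realise this input/output contract is to connect every pair of seed reactions by a shortest path and return the union of the nodes found. The sketch below illustrates only that idea; it is not the subgraph extraction method of Faust et al. (2010), and the toy graph (three consecutive reactions from the methionine slides) and function names are assumptions made for the example.

```python
from collections import deque
from itertools import permutations

# Toy directed graph: reaction and compound nodes alternate.
GRAPH = {
    "2.7.2.4": ["L-4-aspartyl phosphate"],
    "L-4-aspartyl phosphate": ["1.2.1.11"],
    "1.2.1.11": ["L-aspartic 4-semialdehyde"],
    "L-aspartic 4-semialdehyde": ["1.1.1.3"],
    "1.1.1.3": ["L-homoserine"],
    "L-homoserine": [],
}
REACTIONS = {"2.7.2.4", "1.2.1.11", "1.1.1.3"}


def shortest_path(graph, source, target):
    """Breadth-first search; returns one shortest path or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


def link_seeds(graph, seeds):
    """Union of the nodes on shortest paths between all ordered seed pairs."""
    nodes = set(seeds)
    for source, target in permutations(seeds, 2):
        path = shortest_path(graph, source, target)
        if path:
            nodes.update(path)
    return nodes


seeds = ["2.7.2.4", "1.1.1.3"]
subgraph_nodes = link_seeds(GRAPH, seeds)
intercalated = (subgraph_nodes & REACTIONS) - set(seeds)
print(sorted(intercalated))
```

Seeds that cannot be linked simply stay in the result as isolated nodes, which corresponds to the "several unconnected components" case of the output.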
slide-106
SLIDE 106

Seed nodes

[Figure legend: compound; reaction; seed reaction.]

slide-107
SLIDE 107

Linking seed nodes

[Figure legend: compound; reaction; direct link; seed reaction.]

slide-108
SLIDE 108

Enhance linking by intercalating reactions

[Figure legend: compound; reaction; direct link; intercalated reaction; seed reaction.]

slide-109
SLIDE 109

Subgraph extraction

slide-110
SLIDE 110

Providing intermediate nodes helps finding relevant paths

  • Reference pathway: pyrimidine ribonucleotides de novo biosynthesis pathway (MetaCyc identifier: PWY0-162) in E. coli.
  • A: reference pathway.
  • B: path found with the 2 terminal nodes as seeds (blue).
  • C: path found with 4 seed nodes (blue).

Faust, K., Dupont, P., Callut, J. and van Helden, J. (2010). Pathway discovery in metabolic networks by subgraph extraction. Bioinformatics 26:1211-8. [Pubmed 20228128].

slide-111
SLIDE 111

Multiple seed nodes allow finding branched pathways

  • A: Reference pathway: superpathway of lysine, threonine and methionine biosynthesis I (MetaCyc identifier: P4-PWY) in E. coli.
  • B: path found with the 5 terminal nodes as seeds.
  • C: path found with the 5 terminal nodes + 2 intermediate nodes.

Faust, K., Dupont, P., Callut, J. and van Helden, J. (2010). Pathway discovery in metabolic networks by subgraph extraction. Bioinformatics 26:1211-8.

slide-112
SLIDE 112

Evaluation of sub-graph extraction

  • Identification of optimal path finding algorithms/parameters.
  • 71 reference pathways (including branched and cyclic)

Faust, K., Dupont, P., Callut, J. and van Helden, J. (2010). Pathway discovery in metabolic networks by subgraph extraction. Bioinformatics 26:1211-8.

slide-113
SLIDE 113

Outlook

slide-114
SLIDE 114

Outlook

  • Application: inferring metabolic pathways from genomes, using as seed nodes
  • Sets of genes belonging to the same operon.
  • Sets of genes showing similar cis-regulatory elements (phylogenetic footprinting).
  • Sets of genes co-occurring/co-disappearing in Bacterial genomes (phylogenetic profiles).
  • Sets of co-expressed genes (microarray experiments).
  • ... any other criterion of functional grouping.
  • Weaknesses
  • Relies on a defined set of reactions/compounds (we infer pathways, not reactions).
  • Path finding and subgraph extraction are only a very naive approximation of metabolic pathways.
  • Despite the improvements, path finding approaches still return some irrelevant pathways.
  • Strengths
  • Tractability: can deal with metabolic networks made of thousands of reactions.
  • Possibility to discover novel pathways (rather than mapping on “reference” pathways).
  • Possibility to introduce context-dependence (organism-specific network, reactions weighted according to enzyme expression, ...).
  • Hybrid approaches
  • Subgraph extraction can suggest reasonable hypotheses about potential pathways, which can be
  • Used as input for more refined modeling approaches.
  • Tested experimentally.
slide-115
SLIDE 115

Are metabolic systems in (near) steady state ?

  • Do we have reasons to think that cells, in their natural environment, are living in (near) steady-state conditions for all metabolic concentrations ?
  • An intuitive example: methionine biosynthesis in E.coli
  • Methionine biosynthesis consumes cysteine. This happens very rapidly (nanoseconds).
  • This provokes a depletion of cysteine concentration.
  • Depletion of cysteine concentration triggers
  • Fast response (nanoseconds): activation of cysteine-synthesizing enzymes (if present)
  • Slow response (minutes): transcriptional activation of the same enzymes (if absent)
  • Question for the audience
  • Do we have experimental evidence for the existence of steady-states ?
  • If yes, how can they be understood in the light of the example above ?
  • Has this kind of dual-timescale response been modeled ?
slide-116
SLIDE 116

Job announcement : Postdoc position @ BiGRe.ULB.ac.be

  • Location
  • Laboratoire de Bioinformatique des Génomes et des Réseaux (BiGRe)
  • Université Libre de Bruxelles (Brussels, Belgium)
  • http://www.bigre.ulb.ac.be/
  • 2-year position, starting ASAP
  • Skills
  • Bioinformatics
  • Good understanding of metabolism, genetics, regulation.
  • Familiarity with Unix environment, scripting capabilities, Java programming is a plus.
  • Context: MICROME project
  • EU FP7 project involving 14 partners
  • http://www.microme.eu/
  • Scope: Metabolic annotation of Bacterial genomes (enzyme identification, pathway reconstruction, metabolic modelling).
  • BiGRe focus in the project:
  • Prediction of operons (distance-based + synteny methods).
  • Prediction of regulons by comparative genomics (phylogenetic footprinting).
  • Co-occurrence of genes across genomes (phylogenetic profiles).
  • Applying path-finding methods to infer pathways from the above-defined groups of related genes.
  • Evolution of Bacterial metabolism and its regulation.
  • Related publications
  • Faust, K., Dupont, P., Callut, J. and van Helden, J. (2010). Pathway discovery in metabolic networks by subgraph extraction. Bioinformatics 26:1211-8. [Pubmed 20228128].
  • Faust, K., Croes, D. and van Helden, J. (2009). In response to "Can sugars be produced from fatty acids? A test case for pathway analysis tools". Bioinformatics 2009 Sept 23. [Pubmed 19776213].
  • Janky, R. and van Helden, J. (2008). Evaluation of phylogenetic footprint discovery for the prediction of bacterial cis-regulatory elements. BMC Bioinformatics 2008, 9:37. doi:10.1186/1471-2105-9-37. [Pubmed 18215291].
  • Croes, D., F. Couche, S.J. Wodak, J. van Helden (2006). Inferring Meaningful Pathways in Weighted Metabolic Networks. J. Mol. Biol. 356:222-36. [Pubmed 16337962].
slide-117
SLIDE 117

Acknowledgements

Network Analysis Tools

(NeAT, http://rsat.ulb.ac.be/neat/)

  • Sylvain Brohée
  • Karoline Faust
  • Gipsi Lima-Mendez

Programs from external developers

  • Stijn van Dongen (Sanger, UK) for MCL
  • Igor Jurisica (USA) for RNSC

Metabolic path finding

  • Karoline Faust
  • Pierre Dupont (UCL, Belgium) for kWalks
  • Shoshana Wodak
  • Didier Croes
  • Fabian Couche
  • The former aMAZE team

Interactome

  • Nicolas Simonis
  • Léon Juvénal Hagingambo

Bioinformatique des Génomes et des Réseaux (BiGRe) http://www.bigre.ulb.ac.be/

Collaborations (former aMAZE project)

  • Georges Cohen (Institut Pasteur - France)
  • Yves Deville (UCL)
  • Grégoire Dooms (UCL)
  • Pierre Schaus (UCL)
  • Stéphane Zampelli (UCL)
  • Lorenz Wernisch (Birkbeck College, UK)
  • David Gilbert (London City – UK)

Former aMAZE team

  • Shoshana Wodak
  • Hassan Anerhour
  • Erick Antezana
  • Jean Richelle
  • Xavier Santaloria
  • Jesintha Maniraja
  • Christian Lemer
  • Olivier Hubaut
  • Fabian Couche
  • Frederic Fays
  • Simon De Keyzer
slide-118
SLIDE 118

Links and references

  • Network Analysis Tools (NeAT)
  • http://neat.rsat.eu/
  • Publications

1. Faust, K. and van Helden, J. (2012) Predicting metabolic pathways by sub-network extraction. Methods Mol Biol, 804, 107–130.
2. Faust, K., Croes, D. and van Helden, J. (2011) Prediction of metabolic pathways from genome-scale metabolic networks. BioSystems, 105, 109–121.
3. Faust, K., Dupont, P., Callut, J. and van Helden, J. (2010) Pathway discovery in metabolic networks by subgraph extraction. Bioinformatics 26:1211-8. doi:10.1093/bioinformatics/btq105.
4. Lima-Mendez, G. and van Helden, J. (2009). The powerful law of the power law and other myths in network biology. Mol. BioSyst., 2009, 5, 1482-1493. doi:10.1039/b908681a.
5. Faust, K., Croes, D. and van Helden, J. (2009). In response to "Can sugars be produced from fatty acids? A test case for pathway analysis tools". Bioinformatics 2009 Sept 23.
6. Faust, K., Croes, D. and van Helden, J. (2009). Metabolic Pathfinding Using RPAIR Annotation. Journal of Molecular Biology 388, 390-414.
7. Brohée, S., Faust, K., Lima-Mendez, G., Vanderstocken, G. and van Helden, J. (2008). Network Analysis Tools: from biological networks to clusters and pathways. Nature Protocols 3 (10), 1616-29.
8. Brohée, S., Faust, K., Lima-Mendez, G., Sand, O., Janky, R., Vanderstocken, G., Deville, Y. and van Helden, J. (2008). NeAT: a toolbox for the analysis of biological networks, clusters, classes and pathways. Nucleic Acids Research 36, W444-451.
9. Brohée, S. and van Helden, J. (2006). Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics 7, 488. Pubmed 17087821.
10. Croes, D., Couche, F., Wodak, S.J. and van Helden, J. (2006). Inferring Meaningful Pathways in Weighted Metabolic Networks. J. Mol. Biol. 356:222-36. Pubmed 16337962.
11. Croes, D., Couche, F., Wodak, S.J. and van Helden, J. (2005). Metabolic PathFinding: inferring relevant pathways in biochemical networks. Nucleic Acids Res 33: W326-330.
12. van Helden, J., Wernisch, L., Gilbert, D. and Wodak, S.J. (2002). Graph-based analysis of metabolic networks. In Ernst Schering Res Found Workshop (al., M. H.-W. e., ed.), pp. 245-74. Springer-Verlag.
13. van Helden, J., Gilbert, D., Wernisch, L., Schroeder, M. and Wodak, S. (2001). Applications of regulatory sequence analysis and metabolic network analysis to the interpretation of gene expression data. Computational Biology: First International Conference on Biology, Informatics, and Mathematics, JOBIM 2000. LNCS volume 2066, Montpellier.