SLIDE 1 Falk Schreiber
Graph Algorithms and Graph Measures for the Life Sciences
23/10/2014
SLIDE 2
Networks and Graphs in the Life Sciences
Graph Network
SLIDE 3
Network Representation
A network is an informal description of a set of elements with connections or interactions between them and data attached to them.
A graph is a formal description: a mathematical object consisting of vertices and edges, representing elements and connections, respectively.
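The distinction can be illustrated with a minimal sketch. The protein names and annotations below are hypothetical and serve only to show a vertex set, an edge set, and data attached to the elements.

```python
# Hypothetical toy example: three proteins with pairwise interactions.
vertices = {"P1", "P2", "P3"}
edges = {("P1", "P2"), ("P2", "P3")}  # undirected, each pair stored once

# Data attached to the elements (the informal "network" view).
annotations = {
    "P1": {"kind": "kinase"},
    "P2": {"kind": "receptor"},
    "P3": {"kind": "enzyme"},
}

def neighbours(v, edges):
    """Return all vertices joined to v by an edge."""
    return {b if a == v else a for (a, b) in edges if v in (a, b)}
```

Here `vertices` and `edges` form the graph, while `annotations` carries the extra data that the informal notion of a network includes.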
SLIDE 4
Interactions → Networks → Pathways
A collection of interactions and/or transformations defines a network.
Pathways are subsets of networks: all pathways are networks, but not all networks are pathways.
Difference: level of annotation/understanding.
We can define a pathway as a biological network that relates to a known physiological process or phenotype.
There is no precise biological definition of a pathway.
Partitioning of networks into pathways is somewhat arbitrary.
SLIDE 5
Networks a Decade Ago
SLIDE 6 Can you Spot the Error?
[from Milo et al., Science, 2002]
SLIDE 7
Retraction and Impact Factor
SLIDE 8
Just an Example …
SLIDE 9
From Biological Building Blocks to Complex Systems
Genome: set of hereditary instructions needed to build, run and maintain a particular organism
Genes Transcripts Proteins Metabolites
SLIDE 10 From Biological Building Blocks to Complex Systems
Transcriptome: set of RNA transcribed from genes within the genome by a particular cell at a particular time
Depends on the tissue, the developmental stage of the organism and the metabolic state of the cell
Genes Transcripts Proteins Metabolites
SLIDE 11
From Biological Building Blocks to Complex Systems
Proteome: set of proteins translated from RNA within a transcriptome by a particular cell at a particular time
Complete proteome of a cell: set of all potential proteins that could be synthesised by the cell
Genes Transcripts Proteins Metabolites
SLIDE 12
From Biological Building Blocks to Complex Systems
Metabolome: set of all the metabolites inside a particular cell at a particular time
Genes Transcripts Proteins Metabolites
SLIDE 13
From Biological Building Blocks to Complex Systems
Genes Transcripts Proteins Metabolites
SLIDE 14
From Biological Building Blocks to Complex Systems
Genes Transcripts Proteins Metabolites
20th century biology (reductionist approach)
Phenylketonuria is caused by a mutated gene for the enzyme phenylalanine hydroxylase (PAH)
SLIDE 15
From Biological Building Blocks to Complex Systems
Genes Transcripts Proteins Metabolites
20th century biology (reductionist approach)
Cancer, heart diseases, …: multiple, complex changes
21st century biology (integrative approach)
SLIDE 16
From Biological Building Blocks to Complex Systems
Genes Transcripts Proteins Metabolites
SLIDE 17
Biological Pathways and Networks - Examples
Signal transduction pathways and networks: cellular processes that recognize extra- or intra-cellular signals and induce appropriate cellular responses
Gene regulatory networks: pathways that regulate a cell's behaviors, including transcription and translation
Metabolic pathways: series of enzymatic reactions that produce a specific product
Protein interaction networks: interactions of proteins (e.g. activation, non-covalent binding)
SLIDE 18 Biological Pathways and Networks
[Figure labels: gene regulation (level 1, level 2), protein clustering, protein interaction, metabolism, chromosome location of genes]
Andreas Kerren, Helen C. Purchase, Matthew O. Ward (Eds.): Multivariate Network Visualization. State-of-the-Art Survey, LNCS 8380. Dagstuhl Seminar #13201, Dagstuhl Castle, Germany, May 12–17, 2013, Revised Discussions
SLIDE 19
Many Informatics Areas
Health informatics / Medical informatics, Environmental informatics, Bioinformatics, Chemoinformatics
Evolutionary networks, infection networks, ecological networks / food webs, neuronal networks, hormonal networks, signalling networks, gene regulatory networks, protein interaction networks, metabolic networks, chemical structure graphs
SLIDE 20
Network Usage - Examples
Representation/exploration, network analysis, data context/analysis, simulation
SLIDE 21 Network Analysis - Network Centralities
Centrality of a graph G=(V,E)
Function c: V → ℝ with c(u) > c(v) if u ∈ V is more important than v ∈ V
Ranking of vertices according to importance, based on the network structure
Application examples
Hypothesis generation for experiments
Which patients should be vaccinated first?
Problem
Does not work well with existing algorithms
[from Jeong et al., Nature, 2001]
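As a minimal sketch of a centrality function c: V → ℝ and the ranking it induces, the example below uses degree centrality. Degree is chosen here only because it is the simplest structural measure, and the toy graph and vertex names are hypothetical.

```python
def degree_centrality(vertices, edges):
    """c(v) = number of edges incident to v (undirected graph)."""
    c = {v: 0 for v in vertices}
    for a, b in edges:
        c[a] += 1
        c[b] += 1
    return c

def rank(c):
    """Order vertices by decreasing centrality; ties are broken by name."""
    return sorted(c, key=lambda v: (-c[v], v))

# Hypothetical toy graph: one hub with three neighbours plus one extra edge.
toy_vertices = ["hub", "a", "b", "c"]
toy_edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b")]

centrality = degree_centrality(toy_vertices, toy_edges)
ranking = rank(centrality)
```

Any other structural measure could be substituted for `degree_centrality` without changing the ranking step.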
SLIDE 22 New Centrality Measure
Based on network motifs
Sub-graphs representing patterns of local interconnections
May represent basic building blocks and design patterns of functional modules
[from Babu et al., Current Opinion in Structural Biology, 2004]
SLIDE 23 Motifs in Gene Regulatory Networks: Feed-forward Loop
Example of functional properties
Noise filtering: responds only to persistent activations
[from Shen-Orr et al., Nature Genetics, 2002]
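The noise-filtering behaviour can be illustrated with a deliberately simplified Boolean model. The sketch assumes a coherent feed-forward loop X → Y, X → Z, Y → Z with AND logic at Z, and a hypothetical `delay` parameter for the number of time steps Y needs to accumulate. It is not the kinetic model of the cited paper.

```python
def simulate_ffl(x_signal, delay):
    """Boolean feed-forward loop X->Y, X->Z, Y->Z with AND logic at Z.

    Assumption: Y switches on only after X has been on for more than
    `delay` consecutive steps and switches off as soon as X is off.
    Z is on when both X and Y are on.
    """
    z_out = []
    on_steps = 0
    for x in x_signal:
        on_steps = on_steps + 1 if x else 0
        y = on_steps > delay
        z_out.append(bool(x and y))
    return z_out

short_pulse = [0, 1, 1, 0, 0, 0, 0, 0]  # transient activation (noise)
long_pulse = [0, 1, 1, 1, 1, 1, 1, 0]   # persistent activation

z_short = simulate_ffl(short_pulse, delay=3)
z_long = simulate_ffl(long_pulse, delay=3)
```

In this toy model the short pulse never switches Z on, while the persistent signal does so after the delay has elapsed.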
SLIDE 24 Motif-based Centrality
Combines centrality measures and network motifs
Uses occurrences of a motif in the network
Incorporates functional substructures into centrality analysis

[Figure: motif M (feed-forward loop) and target graph G]

vertex  centrality
v2      3
v3      2
v4      2
v1      1
v5      1

$$\mathcal{G}_M = \{\, G_M \mid G_M \subseteq G \wedge G_M \simeq M \,\}$$

$$c_M(v) = \bigl|\{\, G_M \mid G_M \in \mathcal{G}_M \wedge v \in V(G_M) \,\}\bigr|$$
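A motif-based centrality of this kind can be sketched for the feed-forward loop by enumerating all occurrences of the motif and counting, for each vertex, how many occurrences contain it. The edge list below is a hypothetical toy graph, not the target graph from the slide, and the brute-force enumeration is only suitable for small graphs.

```python
from itertools import permutations

def feed_forward_loops(edges):
    """Enumerate occurrences (a, b, c) with edges a->b, b->c and a->c."""
    edge_set = set(edges)
    vertices = sorted({v for e in edge_set for v in e})
    return [
        (a, b, c)
        for a, b, c in permutations(vertices, 3)
        if (a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set
    ]

def motif_centrality(edges):
    """c(v) = number of feed-forward loop occurrences containing v."""
    c = {v: 0 for e in edges for v in e}
    for occurrence in feed_forward_loops(edges):
        for v in occurrence:
            c[v] += 1
    return c

# Hypothetical directed toy graph.
toy_edges = [("v1", "v2"), ("v1", "v3"), ("v2", "v3"),
             ("v2", "v4"), ("v3", "v4"), ("v5", "v4")]

loops = feed_forward_loops(toy_edges)
centrality = motif_centrality(toy_edges)
```

Because the three positions of a feed-forward loop are structurally distinct, each ordered triple found corresponds to exactly one sub-graph occurrence, so no occurrence is counted twice.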
SLIDE 25 Motif-based Centrality with Roles
Different vertices have different roles Count number of matches according to roles
v4 v3 2 v2 1 v1 centrality vertex 1 1 1 2 1 A B C
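[Figure: motif M (feed-forward loop) with roles A, B, C, and target graph G]

$$\mathcal{G}_M = \{\, G_M \mid G_M \subseteq G \wedge G_M \simeq M \,\}$$

$$c_M(v, r) = \bigl|\{\, G_M \mid G_M \in \mathcal{G}_M \wedge v \in V(G_M) \wedge \mathrm{role}(v, G_M) = r \,\}\bigr|$$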
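Counting matches per role can be sketched as follows. The assignment of the labels A, B and C to the three positions of the feed-forward loop (A regulates B and C, B regulates C) is an assumption made for this illustration, and the edge list is a hypothetical toy graph.

```python
from itertools import permutations

ROLES = ("A", "B", "C")  # assumed: A -> B, B -> C, A -> C

def role_centrality(edges):
    """c[v][r] = number of feed-forward loops in which v plays role r."""
    edge_set = set(edges)
    vertices = sorted({v for e in edge_set for v in e})
    c = {v: {r: 0 for r in ROLES} for v in vertices}
    for a, b, t in permutations(vertices, 3):
        if (a, b) in edge_set and (b, t) in edge_set and (a, t) in edge_set:
            c[a]["A"] += 1
            c[b]["B"] += 1
            c[t]["C"] += 1
    return c

# Hypothetical directed toy graph.
toy_edges = [("v1", "v2"), ("v1", "v3"), ("v2", "v3"),
             ("v2", "v4"), ("v3", "v4"), ("v5", "v4")]

by_role = role_centrality(toy_edges)
```

Summing the three role counts of a vertex gives the role-free motif count for that vertex.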
SLIDE 26
Gene Regulatory Network of E. coli
Based on data from RegulonDB (http://regulondb.ccg.unam.mx/)
1250 vertices and 2515 edges
Global regulators?
SLIDE 27 Motif-based Centrality with Roles for E. coli
Top 20 genes (of 1250)
11 of 18 global regulators (Martínez-Antonio and Collado-Vides)
Method works also for
Even better results with different motifs
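[Table: top genes by motif-based centrality, with per-role counts for roles A, B, C]
gene    centrality
crp     254
fnr     150
ihfAB   61
arcA    58
fis     40
modE    18
soxS    18
hns     14
fhlA    11
gadE    11
cpxR    11
rob     10
galR    8
gadX    8
gntR    6
fur     6
tdcR    6
srlR    5
narL    5
Per-role values (A, B, C) as extracted, row assignment not recoverable: 70 53 53; 39 1; 26; 6; 11 1 36 1 1; 95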
SLIDE 28 Two Vague Ideas
Are scale-free and small-world networks relevant or more an artifact?
The Internet, mapped on the opposite page, is a scale-free network in that some sites (starbursts and detail above) have a seemingly unlimited number of connections to other sites. This map, made on February 6, 2003, traces the shortest routes from a test Web site to about 100,000 others, using like colors for similar Web addresses.
Scientists have recently discovered that various complex systems have an underlying architecture governed by shared organizing principles. This insight has important implications for a host of applications, from drug development to Internet security.
By Albert-László Barabási and Eric Bonabeau, Scientific American, May 2003
SLIDE 29
Degree Distribution - Examples
SLIDE 30
Models for Networks of Complex Topology
Erdős-Rényi (1960), Watts-Strogatz (1998), Barabási-Albert (1999)
SLIDE 31
The Erdős-Rényi [ER] Model (1960)
Start with n nodes and 0 edges
Connect each pair of vertices with probability p_ER
Many properties in these graphs appear quite suddenly, at a threshold value of p_ER
If p_ER ~ c/n with c < 1, then almost all nodes belong to isolated trees
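The construction can be sketched in a few lines. The function names and the `seed` parameter are illustrative choices, and `largest_component` is included only so that the threshold behaviour can be explored by varying c in p_ER = c/n.

```python
import random
from itertools import combinations

def erdos_renyi(n, p_er, seed=None):
    """Connect each of the n*(n-1)/2 vertex pairs with probability p_er."""
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2)
            if rng.random() < p_er]

def largest_component(n, edges):
    """Size of the largest connected component, found by graph search."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, best = set(), 0
    for start in range(n):
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            v = stack.pop()
            size += 1
            for u in adj[v] - seen:
                seen.add(u)
                stack.append(u)
        best = max(best, size)
    return best
```

For example, comparing `largest_component(n, erdos_renyi(n, c / n))` for values of c below and above 1 on a large n is one way to look at how suddenly a large component appears.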
SLIDE 32
The Watts-Strogatz [WS] Model (1998)
Start with a regular network with n nodes
Rewire each edge with probability p
For p=0 (regular networks): high clustering coefficient C, high characteristic path length L
For p=1 (random networks): low clustering coefficient C, low characteristic path length L
SLIDE 33
The Watts-Strogatz [WS] Model (1998)
There is a broad interval of p for which the characteristic path length L is small but the clustering coefficient C remains large
Small-world networks are common
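The effect of rewiring on C and L can be explored with the sketch below. It assumes a ring lattice as the regular starting network and uses a simplified rewiring rule (each original edge keeps one endpoint and receives a new random partner with probability p). The lattice parameter `k` and the `seed` argument are illustrative and do not come from the slides.

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbours (k even)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k // 2 + 1):
            adj[v].add((v + j) % n)
            adj[(v + j) % n].add(v)
    return adj

def rewire(adj, p, seed=None):
    """Return a copy in which each original edge is rewired with probability p."""
    rng = random.Random(seed)
    new = {v: set(nb) for v, nb in adj.items()}
    for v in sorted(new):
        for u in sorted(new[v]):
            if u > v and rng.random() < p:  # visit each original edge once
                candidates = [w for w in new if w != v and w not in new[v]]
                if candidates:
                    w = rng.choice(candidates)
                    new[v].discard(u)
                    new[u].discard(v)
                    new[v].add(w)
                    new[w].add(v)
    return new

def clustering_coefficient(adj):
    """Average over nodes of the fraction of neighbour pairs that are linked."""
    total = 0.0
    for nb in adj.values():
        k = len(nb)
        if k >= 2:
            links = sum(1 for a in nb for b in nb if a < b and b in adj[a])
            total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def characteristic_path_length(adj):
    """Mean shortest-path length over all reachable ordered node pairs."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")
```

Evaluating both measures on `rewire(ring_lattice(n, k), p)` for a range of p values between 0 and 1 is one way to look for the interval described above.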
SLIDE 34
The Barabási-Albert [BA] Model (1999)
Look at the distribution of degrees k
A scale-free network is a network where a small proportion of the nodes have a high degree of connection ("highly connected hubs")
The probability of finding a highly connected node decreases as a power law in k (rather than exponentially)
p(k) ~ k^(-γ): a given node has k connections to other nodes with a probability that follows a power law distribution, with γ ∈ [2, 3]
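A growth process with preferential attachment, the mechanism usually associated with the BA model, can be sketched as follows. The start configuration (a complete graph on m + 1 nodes), the parameter names and the `seed` argument are assumptions made for this illustration.

```python
import random
from collections import Counter

def barabasi_albert(n, m, seed=None):
    """Grow a graph to n nodes; each new node attaches to m distinct
    existing nodes chosen with probability proportional to their degree."""
    rng = random.Random(seed)
    # Assumed start: complete graph on the first m + 1 nodes.
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # Each node appears in this list once per incident edge, so a uniform
    # draw from it is a degree-proportional draw.
    endpoints = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in sorted(targets):
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges

def degree_distribution(edges):
    """Map each degree k to the fraction p(k) of nodes having that degree."""
    degree = Counter(v for e in edges for v in e)
    counts = Counter(degree.values())
    return {k: counts[k] / len(degree) for k in sorted(counts)}
```

Plotting the result of `degree_distribution` on log-log axes for a large n is a common way to inspect whether the tail looks roughly like a straight line, as a power law would suggest.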
SLIDE 35
The Barabási-Albert [BA] Model (1999)
SLIDE 36
Protein Interaction Networks
Also other networks, e.g. transcript correlation networks
SLIDE 37
Two Vague Ideas
Are scale-free and small-world networks relevant or more an artifact?
Taxonomy for centrality measures
SLIDE 38
Taxonomy for Centrality Measures