Mining co-expression networks (Nathalie Villa-Vialaneix)


Overview on co-expression network analysis Case study 1 Case study 2 References

Mining co-expression networks

Nathalie Villa-Vialaneix

http://www.nathalievilla.org

INRA, Unité MIA-T, INRA Toulouse (France)

School for advanced sciences of Luchon: Network analysis and applications

NV2 | Mining co-expression networks 1/32


Outline

1. Overview on co-expression network analysis
2. Case study 1: gene network analysis in relation to meat quality
3. Case study 2: gene network analysis in an LCD (low-calorie diet) experiment

1. Overview on co-expression network analysis


Transcriptomic data

DNA is transcribed into mRNA, which is then translated into proteins.

Transcriptomic data measure the quantity of mRNA corresponding to a given gene in given cells (blood, muscle...) of a living organism.


Systems biology

The expression of some genes activates or represses the expression of other genes ⇒ understanding the whole cascade helps to comprehend the global functioning of living organisms.¹

¹ Picture taken from: Abdollahi A et al., PNAS 2007, 104:12890–12895. © 2007 by National Academy of Sciences.


Standard issues in network analysis

Inference: given expression data, how can we build a graph whose edges represent the direct links between genes?

Example: co-expression networks built from microarray/RNA-seq data (nodes = genes; edges = significant "direct links" between the expressions of two genes).


Graph mining (examples):

1. Network visualization: nodes are not given a position a priori.

[Figure: the same network drawn with random positions vs. with positions aiming at placing connected nodes closer together]


2. Important node extraction (high degree, high centrality...)
3. Network clustering: identify "communities"


Network inference

Data: large-scale gene expression data, i.e., a matrix $X = (X_i^j)_{i,j}$ with
- rows: n individuals, n ≃ 30 to 50;
- columns: p variables (gene expressions), p ≃ $10^3$ to $10^4$.

What we want to obtain: a graph/network with
- nodes: genes (a selected sublist of interest², usually differentially expressed (DE) genes);
- edges: "strong relations" between gene expressions.

² See [Verzelen, 2012] for conditions on the respective values of n and p suited for inference.


Advantages of this network model

1. Over raw data: the network focuses on the strongest direct relationships. Irrelevant or indirect relations are removed (more robust), and the data are easier to visualize and understand (track transcription relations). Expression data are analyzed all together, not pair by pair (systems model).
2. Over a bibliographic network: it can handle interactions with as yet unknown (not annotated) genes, and it can deal with data collected in a particular condition.


Using correlations: relevance network [Butte and Kohane, 1999, Butte and Kohane, 2000]

First (naive) approach: compute the correlations between expressions for all pairs of genes, discard the smallest ones by thresholding, and build the network.

[Figure: correlations → thresholding → graph]
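Below is a minimal sketch of the relevance-network idea in Python (standard library only). It assumes that expression profiles are stored as a dict mapping a hypothetical gene name to a list of values, and it uses an arbitrary threshold of 0.8. The original works may use other association measures (e.g., mutual information in [Butte and Kohane, 2000]). The toy data copy the structure of the partial-correlation example below, where y and z both depend on x. This exposes the weakness of the approach: the purely indirect y–z pair also receives an edge.

```python
import math
import random


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def relevance_network(expr, threshold):
    """expr: dict gene -> list of expression values (one per individual).

    Returns the set of gene pairs whose absolute correlation is at least
    `threshold`; all smaller correlations are thresholded out.
    """
    genes = sorted(expr)
    edges = set()
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            if abs(pearson(expr[g1], expr[g2])) >= threshold:
                edges.add((g1, g2))
    return edges


# Toy data: y and z both depend on x; w is an unrelated gene.
rng = random.Random(2807)
x = [rng.gauss(0, 1) for _ in range(100)]
expr = {
    "x": x,
    "y": [2 * a + 1 + rng.gauss(0, 0.1) for a in x],
    "z": [2 * a + 1 + rng.gauss(0, 0.1) for a in x],
    "w": [rng.gauss(0, 1) for _ in range(100)],
}
print(sorted(relevance_network(expr, threshold=0.8)))
```

Because all pairwise correlations among x, y and z are close to 1, the thresholded graph links all three genes. Partial correlations are needed to remove the indirect y–z edge.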


Using partial correlations

[Figure: x is linked to both y and z, which creates a strong indirect correlation between y and z]


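    set.seed(2807); x <- rnorm(100)
    y <- 2*x+1+rnorm(100,0,0.1); cor(x,y)
    # [1] 0.998826
    z <- 2*x+1+rnorm(100,0,0.1); cor(x,z)
    # [1] 0.998751
    cor(y,z)
    # [1] 0.9971105
    # Partial correlation
    cor(lm(x~z)$residuals, lm(y~z)$residuals)
    # [1] 0.7801174
    cor(lm(x~y)$residuals, lm(z~y)$residuals)
    # [1] 0.7639094
    cor(lm(y~x)$residuals, lm(z~x)$residuals)
    # [1] -0.1933699

All three correlations are close to 1. Once x is accounted for, however, the correlation between y and z almost vanishes (−0.19), whereas x keeps a strong partial correlation with both y and z. The y–z link is therefore indirect.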


Partial correlation and GGM

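Gaussian Graphical Model framework: $(X_i)_{i=1,\dots,n}$ are i.i.d. Gaussian random variables $\mathcal{N}(0, \Sigma)$ (gene expression). Then

$$j \longleftrightarrow j' \ \text{(genes } j \text{ and } j' \text{ are linked)} \iff \mathrm{Cor}\left(X^j, X^{j'} \mid (X^k)_{k \neq j, j'}\right) \neq 0.$$

If $S = \Sigma^{-1}$ (concentration matrix),

$$\mathrm{Cor}\left(X^j, X^{j'} \mid (X^k)_{k \neq j, j'}\right) = -\frac{S_{jj'}}{\sqrt{S_{jj} S_{j'j'}}}.$$

⇒ Estimate $\Sigma^{-1}$ to unravel the graph structure.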
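Here is a minimal sketch of the formula above in Python (standard library only). It inverts a small covariance matrix by Gauss-Jordan elimination and converts the concentration matrix S into partial correlations. The covariance matrix is an illustrative assumption: y = x + e1 and z = x + e2, with x, e1 and e2 independent N(0, 1). Under this model, y and z have a marginal correlation of 0.5 but a partial correlation of 0 given x.

```python
import math


def try_invert(matrix, tol=1e-12):
    """Gauss-Jordan inversion with partial pivoting; None if (numerically) singular."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < tol:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(sigma):
    """Cor(X^j, X^j' | rest) = -S_jj' / sqrt(S_jj S_j'j'), with S = Sigma^{-1}."""
    s = try_invert(sigma)
    p = len(s)
    return [[1.0 if j == k else -s[j][k] / math.sqrt(s[j][j] * s[k][k])
             for k in range(p)] for j in range(p)]


# Covariance of (x, y, z) when y = x + e1 and z = x + e2.
sigma = [[1, 1, 1],
         [1, 2, 1],
         [1, 1, 2]]
pc = partial_correlations(sigma)
for row in pc:
    print([round(v, 3) for v in row])
```

In this case S = [[3, −1, −1], [−1, 1, 0], [−1, 0, 1]], so the y–z entry is exactly 0. Only the x–y and x–z edges remain, each with a partial correlation of 1/√3 ≈ 0.577.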

Problem: Σ is a p-dimensional matrix and n ≪ p ⇒ $(\widehat{\Sigma}_n)^{-1}$ is a poor estimate of S (when n ≤ p, $\widehat{\Sigma}_n$ is not even invertible)!


Estimation in GGM

Graphical Gaussian Model estimation, seminal work: [Schäfer and Strimmer, 2005a, Schäfer and Strimmer, 2005b] (with shrinkage and a proposal for a Bayesian test of significance):
- estimate $\Sigma^{-1}$ by $(\widehat{\Sigma}_n + \lambda I)^{-1}$;
- use a Bayesian test to find which coefficients are significantly non-zero.
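The following sketch shows why shrinkage is needed. It assumes a toy data matrix with n = 3 individuals and p = 4 genes, and an arbitrary λ = 0.5. The empirical covariance then has rank at most n − 1 = 2 and cannot be inverted, whereas $\widehat{\Sigma}_n + \lambda I$ can. The sketch uses the simplified form written above. The actual Schäfer-Strimmer estimator shrinks towards a target matrix with an analytically chosen intensity, and this sketch does not reproduce that.

```python
def try_invert(matrix, tol=1e-12):
    """Gauss-Jordan inversion with partial pivoting; None if (numerically) singular."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < tol:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def sample_covariance(rows):
    """Empirical covariance (divisor n - 1) of an n x p data matrix given as rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[j] - means[j]) * (r[k] - means[k]) for r in rows) / (n - 1)
             for k in range(p)] for j in range(p)]


def shrunk_inverse(rows, lam):
    """(Sigma_n + lam * I)^{-1}: the ridge-type estimate of the concentration matrix."""
    cov = sample_covariance(rows)
    p = len(cov)
    return try_invert([[cov[j][k] + (lam if j == k else 0.0) for k in range(p)]
                       for j in range(p)])


rows = [[1, 2, 0, 1],   # n = 3 individuals (rows), p = 4 genes (columns)
        [2, 0, 1, 3],
        [0, 1, 2, 2]]
print(try_invert(sample_covariance(rows)))   # None: singular since n < p
print(shrunk_inverse(rows, lam=0.5) is not None)
```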

Sparse approaches [Meinshausen and Bühlmann, 2006, Friedman et al., 2008]: for all j, estimate the linear models $X^j = \beta_j^T X^{-j} + \epsilon$ by penalized ML:

$$\arg\min_{(\beta_{jj'})_{j'}} \sum_{i=1}^{n} \left(X_i^j - \beta_j^T X_i^{-j}\right)^2 + \lambda \|\beta_j\|_{L^1}, \qquad \|\beta_j\|_{L^1} = \sum_{j'} |\beta_{jj'}|.$$

The $L^1$ penalty yields $\beta_{jj'} = 0$ for most $j'$ (variable selection).
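Here is a minimal sketch of neighborhood selection in the spirit of [Meinshausen and Bühlmann, 2006], in plain Python. Each gene is regressed on all the others with a lasso solved by cyclic coordinate descent. The objective is scaled by 1/(2n) and the columns are standardized, which only rescales λ relative to the formula above. An edge j–j' is kept when $\beta_{jj'}$ or $\beta_{j'j}$ is non-zero (the "OR" rule). This is not the graphical lasso of [Friedman et al., 2008]. The simulated chain g0 → g1 → g2, the independent gene g3 and λ = 0.3 are illustrative assumptions.

```python
import random


def standardize(column):
    """Center, and scale so that (1/n) * sum of squares equals 1."""
    n = len(column)
    mean = sum(column) / n
    sd = (sum((v - mean) ** 2 for v in column) / n) ** 0.5
    return [(v - mean) / sd for v in column]


def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso(y, xs, lam, n_iter=50):
    """Coordinate descent for (1/2n)||y - sum_k b_k x_k||^2 + lam * sum_k |b_k|.

    Assumes y and every column in xs are standardized.
    """
    n = len(y)
    beta = [0.0] * len(xs)
    resid = list(y)
    for _ in range(n_iter):
        for k, x in enumerate(xs):
            # inner product of x_k with the partial residual (b_k added back)
            rho = sum(a * r for a, r in zip(x, resid)) / n + beta[k]
            new = soft_threshold(rho, lam)
            if new != beta[k]:
                step = new - beta[k]
                resid = [r - step * a for r, a in zip(resid, x)]
                beta[k] = new
    return beta


def neighborhood_selection(columns, lam):
    """Edge (j, k) if gene k is selected when regressing gene j, or vice versa."""
    p = len(columns)
    edges = set()
    for j in range(p):
        others = [k for k in range(p) if k != j]
        beta = lasso(columns[j], [columns[k] for k in others], lam)
        for k, b in zip(others, beta):
            if b != 0.0:
                edges.add((min(j, k), max(j, k)))
    return edges


rng = random.Random(1)
n = 500
g0 = [rng.gauss(0, 1) for _ in range(n)]
g1 = [a + rng.gauss(0, 0.3) for a in g0]   # g1 depends on g0
g2 = [b + rng.gauss(0, 0.3) for b in g1]   # g2 depends on g1 only
g3 = [rng.gauss(0, 1) for _ in range(n)]   # unrelated gene
columns = [standardize(c) for c in (g0, g1, g2, g3)]
edges = neighborhood_selection(columns, lam=0.3)
print(sorted(edges))
```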


Visualization

Purpose: how to display the nodes in a meaningful and aesthetic way?

Standard approach: force-directed placement (FDP) algorithms (e.g., [Fruchterman and Reingold, 1991]):
- attractive forces: similar to springs along the edges;
- repulsive forces: similar to electric forces between all pairs of vertices;

- iterative algorithm, run until the vertex positions stabilize.
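A simplified Fruchterman-Reingold-style layout in Python is sketched below. It assumes a unit-square drawing area, repulsive forces k²/d between all pairs, attractive forces d²/k along the edges, and a geometric cooling schedule that caps each move. It is a sketch, not the reference algorithm. It runs a fixed number of iterations instead of testing for stabilization, and it omits the frame clamping of the original method.

```python
import math
import random


def fruchterman_reingold(nodes, edges, iterations=100, seed=0):
    """Force-directed layout; returns a dict node -> (x, y)."""
    rng = random.Random(seed)
    pos = {v: [rng.random(), rng.random()] for v in nodes}
    k = math.sqrt(1.0 / len(nodes))   # ideal distance for an area of 1
    temp = 0.1                        # maximum displacement, reduced each step
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # repulsive forces k^2 / d between all pairs of vertices
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # attractive forces d^2 / k along the edges (springs)
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # move every vertex by at most `temp`, then cool down
        for v in nodes:
            dx, dy = disp[v]
            d = max(math.hypot(dx, dy), 1e-9)
            step = min(d, temp)
            pos[v][0] += dx / d * step
            pos[v][1] += dy / d * step
        temp *= 0.95
    return {v: tuple(p) for v, p in pos.items()}


nodes = ["a", "b", "c", "d"]
layout = fruchterman_reingold(nodes, [("a", "b"), ("b", "c"), ("c", "d")])
for v, (x, y) in layout.items():
    print(v, round(x, 3), round(y, 3))
```

The geometric cooling guarantees that the moves shrink over time. This is a cheap stand-in for the "until stabilization" criterion.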


Important node extraction

1. Vertex degree: the number of edges adjacent to a given vertex. Vertices with a high degree are called hubs; the degree measures the vertex popularity.
2. Vertex betweenness: the number of shortest paths between all pairs of vertices that pass through the vertex. Betweenness is a centrality measure: vertices with a large betweenness are the most likely to disconnect the network if removed.

(Figure: the orange node's degree is equal to 2 and its betweenness to 4.)
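This sketch computes both measures for an unweighted, undirected graph stored as an adjacency dict. Betweenness is computed with Brandes' algorithm, which is an implementation choice and not necessarily the one used in the talk. The result is halved so that each unordered pair of vertices is counted once. When several shortest paths exist, each pair contributes the fraction of those paths that pass through the vertex. In a path a–b–c–d–e, the middle vertex c has degree 2 and betweenness 4, the same values as those quoted above.

```python
from collections import deque


def degree(adj, v):
    return len(adj[v])


def betweenness(adj):
    """Brandes' algorithm: sum over vertex pairs of the fraction of shortest paths through v."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest s-v paths
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                    # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # each unordered pair {s, t} was counted from both of its ends
    return {v: b / 2 for v, b in bc.items()}


path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
bc = betweenness(path)
print(degree(path, "c"), bc["c"])
```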


Vertex clustering

Cluster vertices into groups that are densely connected internally and share comparatively few links with the other groups. Clusters are often called communities (social sciences) or modules (biology).

Several clustering methods:
- min cut minimization: minimizes the number of edges between clusters;
- spectral clustering [von Luxburg, 2007] and kernel clustering: use the eigen-decomposition of the Laplacian (a matrix strongly related to the graph structure)

$$L_{ij} = \begin{cases} -w_{ij} & \text{if } i \neq j \\ d_i & \text{otherwise;} \end{cases}$$

- generative (Bayesian) models [Zanghi et al., 2008];
- Markov clustering: simulates a flow on the graph;
- modularity maximization;
- ... (a clustering jungle: see e.g., [Fortunato and Barthélémy, 2007, Schaeffer, 2007, Brohée and van Helden, 2006]).
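Below is a small sketch of the Laplacian defined above. It assumes a weighted, undirected graph without self-loops, given as a dict of edge weights. Each row of L sums to 0, and the quadratic form f^T L f equals the sum over edges of w_ij (f_i − f_j)². For the indicator vector of a group of nodes, f^T L f is exactly the total weight of the edges leaving the group. This is roughly why spectral clustering looks at eigenvectors with small eigenvalues: such vectors tend to be nearly constant on densely connected groups. The two-triangle graph is an illustrative assumption.

```python
def laplacian(weights, n):
    """L_ij = -w_ij for i != j and L_ii = d_i = sum_j w_ij (nodes 0..n-1)."""
    lap = [[0.0] * n for _ in range(n)]
    for (i, j), w in weights.items():
        lap[i][j] -= w
        lap[j][i] -= w
        lap[i][i] += w
        lap[j][j] += w
    return lap


def quadratic_form(lap, f):
    """f^T L f, which equals the sum over edges of w_ij * (f_i - f_j)**2."""
    n = len(f)
    return sum(f[i] * lap[i][j] * f[j] for i in range(n) for j in range(n))


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
two_triangles = {(0, 1): 1, (0, 2): 1, (1, 2): 1,
                 (3, 4): 1, (3, 5): 1, (4, 5): 1, (2, 3): 1}
lap = laplacian(two_triangles, 6)
print([sum(row) for row in lap])                 # every row sums to 0
print(quadratic_form(lap, [1, 1, 1, 0, 0, 0]))   # one edge leaves {0, 1, 2}
```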


Modularity optimization

The modularity [Newman and Girvan, 2004] of the partition $(C_1, \dots, C_K)$ is equal to:

$$Q(C_1, \dots, C_K) = \frac{1}{2m} \sum_{k=1}^{K} \sum_{x_i, x_j \in C_k} \left(W_{ij} - P_{ij}\right)$$

where $P_{ij}$ is the weight of a "null model" (a graph with the same degree distribution but no preferential attachment): $P_{ij} = \frac{d_i d_j}{2m}$, with $d_i = \sum_{j \neq i} W_{ij}$ and $m = \frac{1}{2} \sum_{i \neq j} W_{ij}$.
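A sketch of this formula in Python follows. It assumes an undirected graph given as a dict of edge weights (each edge listed once, no self-loops) and a dict mapping each node to its cluster. Following the usual convention, the inner sum runs over all ordered pairs $(x_i, x_j)$, including i = j. The helper `null_model_weight` also reproduces the arithmetic of the two interpretation examples below ($P_{ij}$ = 7.5 and 0.05 for m = 20).

```python
def modularity(weights, cluster):
    """Q = 1/(2m) * sum_k sum_{i,j in C_k} (W_ij - d_i d_j / (2m))."""
    degree = {}
    for (i, j), w in weights.items():
        degree[i] = degree.get(i, 0.0) + w
        degree[j] = degree.get(j, 0.0) + w
    two_m = sum(degree.values())   # 2m = sum of all degrees
    # sum of W_ij over ordered pairs in the same cluster: each edge counts twice
    within = sum(2 * w for (i, j), w in weights.items() if cluster[i] == cluster[j])
    # sum of d_i d_j over ordered pairs in cluster C_k = (total degree of C_k)^2
    cluster_degree = {}
    for v, d in degree.items():
        cluster_degree[cluster[v]] = cluster_degree.get(cluster[v], 0.0) + d
    null = sum(dk * dk for dk in cluster_degree.values()) / two_m
    return (within - null) / two_m


def null_model_weight(d_i, d_j, m):
    """P_ij = d_i d_j / (2m)."""
    return d_i * d_j / (2 * m)


two_triangles = {(0, 1): 1, (0, 2): 1, (1, 2): 1,
                 (3, 4): 1, (3, 5): 1, (4, 5): 1, (2, 3): 1}
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(two_triangles, split))   # 5/14, about 0.357
print(null_model_weight(15, 20, 20), null_model_weight(1, 2, 20))
```

Putting all nodes in a single cluster always gives Q = 0, which is one reason modularity can be used to compare partitions with different numbers of clusters.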


Interpretation

A good clustering should maximize the modularity:
- Q ↗ when $(x_i, x_j)$ are in the same cluster and $W_{ij} \gg P_{ij}$;
- Q ↘ when $(x_i, x_j)$ are in two different clusters and $W_{ij} \gg P_{ij}$.

Example (m = 20): $d_i = 15$, $d_j = 20$ ⇒ $P_{ij} = 15 \times 20 / 40 = 7.5$. With $W_{ij} = 5$, $W_{ij} - P_{ij} = -2.5$, so putting i and j in the same cluster decreases the modularity.


Example (m = 20): $d_i = 1$, $d_j = 2$ ⇒ $P_{ij} = 1 \times 2 / 40 = 0.05$. With $W_{ij} = 5$, $W_{ij} - P_{ij} = 4.95$, so putting i and j in the same cluster increases the modularity.


Modularity:
- helps separate hubs (unlike spectral clustering or the min cut criterion);
- is not an increasing function of the number of clusters, which makes it useful to choose the relevant number of clusters (grid search: several values are tested and the clustering with the highest modularity is kept);
- but has a resolution limit (see [Fortunato and Barthélémy, 2007]).

Main issue: modularity optimization is an NP-complete problem, so exhaustive search is not usable. Several heuristics are proposed in [Newman and Girvan, 2004, Blondel et al., 2008, Noack and Rotta, 2009, Rossi and Villa-Vialaneix, 2011] (among others), and some of them are implemented in the R package igraph.
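The short sketch below counts the candidate partitions, which shows why exhaustive search is hopeless. The number of ways to partition n nodes into clusters is the Bell number B_n, computed here with the Bell triangle. B_10 is already 115975, and for the 272 genes of case study 1 the count has hundreds of digits.

```python
def bell(n):
    """Number of partitions of a set of n elements, via the Bell triangle."""
    row = [1]
    for _ in range(n):
        new_row = [row[-1]]   # a new row starts with the last entry of the previous one
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]


for n_nodes in (5, 10, 20, 272):
    print(n_nodes, "nodes:", len(str(bell(n_nodes))), "digits")
```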

2. Case study 1: gene network analysis in relation to meat quality

Dataset description

[Villa-Vialaneix et al., 2013]


F2 population: 1200 animals; muscle sampling; phenotypic measures (30, e.g., pH...).

Used data: 57 F2 pigs (largest variability for pH); transcriptomic data for 272 genes regulated by an eQTL.

Problems with these particular data:
- how can we understand the relationships between these genes' expressions, given that their co-expression is weaker than between other kinds of genes (TF/genes, for instance)?
- how can we relate gene expression to a phenotype of interest (muscle pH)?


Inferred network description

Inference with [Schäfer and Strimmer, 2005a]. Obtained network: 272 nodes (connected); density: 6.4%; transitivity: 25.4%.


Degree distribution:

[Figure: histogram of the node degrees (x-axis: degree; y-axis: frequency)]


8 genes have both a high degree and a high betweenness: BX921641, FTH1, TRIAP1, SLC9A14, GPI, SUZ12, MGP, PRDX4. The biologist identified several of them as relevant to meat quality.
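This sketch computes the two global descriptors used above for an undirected graph stored as an adjacency dict of sets. Density is the fraction of the n(n − 1)/2 possible edges that are present. For instance, a density of 6.4% over 272 nodes corresponds to roughly 0.064 × 272 × 271 / 2 ≈ 2,360 edges. Transitivity is 3 × (number of triangles) / (number of connected triples). The toy graph (a triangle plus one pendant vertex) is an illustrative assumption.

```python
from itertools import combinations


def density(adj):
    """Number of edges divided by the n(n-1)/2 possible ones."""
    n = len(adj)
    n_edges = sum(len(nb) for nb in adj.values()) / 2
    return n_edges / (n * (n - 1) / 2)


def transitivity(adj):
    """3 x triangles / connected triples (global clustering coefficient)."""
    # closed triples: pairs of neighbors of v that are themselves linked;
    # every triangle is counted once from each of its 3 vertices
    closed = sum(1 for v in adj for a, b in combinations(adj[v], 2) if b in adj[a])
    triples = sum(len(nb) * (len(nb) - 1) / 2 for nb in adj.values())
    return closed / triples


graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(density(graph), transitivity(graph))   # 4 edges out of 6; 3 closed out of 5 triples
```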


Clustering

Clustering with modularity optimization: 7 clusters. For each cluster, the annotated genes were submitted to IPA³ (bibliographic network database): from 71% to 94% of the genes of a single cluster belong to the same IPA network, which has an associated biological function.

³ https://analysis.ingenuity.com/pa


Relation to muscle pH

Model: label each node of the network with its partial correlation with muscle pH. Questions: is there a relation between muscle pH and the network structure? Is there a relation between the clustering and muscle pH?


Moran's I (used in spatial statistics):

$$I = \frac{\frac{1}{2m} \sum_{i,j} W_{ij}\, \bar{c}_i \bar{c}_j}{\frac{1}{n} \sum_i \bar{c}_i^2}$$

where $m = \frac{1}{2} \sum_{i,j} W_{ij}$, $c_i$ is the partial correlation with pH, and $\bar{c}_i = c_i - \bar{c}$ with $\bar{c} = \frac{1}{n} \sum_i c_i$.

Using an MC simulation (edge permutations):

[Figure: histogram of the simulated values of Moran's I (x-axis: Moran's I; y-axis: frequency)]

Moran's I is significantly larger than expected: genes tend to be linked to genes that have a similar correlation with muscle pH.
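Here is a minimal sketch of Moran's I as defined above, plus a Monte Carlo p-value. One assumption should be flagged: the simulation described above permutes edges, whereas this sketch permutes the node values over a fixed graph, which is a common alternative for Moran's I. The toy path graph and values are also illustrative.

```python
import random


def morans_i(values, weights):
    """Moran's I with centered values cb_i = c_i - mean(c).

    weights: dict {(i, j): W_ij} for an undirected graph on nodes 0..n-1,
    each edge listed once; both (i, j) and (j, i) enter the double sum.
    """
    n = len(values)
    mean = sum(values) / n
    cb = [c - mean for c in values]
    two_m = 2 * sum(weights.values())
    numerator = sum(2 * w * cb[i] * cb[j] for (i, j), w in weights.items()) / two_m
    denominator = sum(c * c for c in cb) / n
    return numerator / denominator


def permutation_test(values, weights, n_perm=999, seed=0):
    """One-sided Monte Carlo p-value obtained by shuffling values over the nodes."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


path = {(0, 1): 1, (1, 2): 1, (2, 3): 1}
print(morans_i([1, 1, -1, -1], path))   # neighbors alike: positive (1/3)
print(morans_i([1, -1, 1, -1], path))   # neighbors differ: negative (-1)
observed, p_value = permutation_test([1, 1, -1, -1], path)
print(observed, p_value)
```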


[Figure: partial correlation with pH (y-axis) for each of the 7 clusters (x-axis)]

Student's t-test is significant for cluster 4: its correlation with pH is larger than for the other clusters.


Moran’s plot


Moran's plot (WC vs. C) helps to emphasize influential points.

[Figure: Moran's plot, CorPH (x-axis) vs. A × CorPH (y-axis), split into the quadrants H−H, H−L, L−L and L−H]

Associated influence measures and tests help to find the influential points.
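Below is a sketch of the points of Moran's plot and their quadrant labels. It assumes the raw adjacency product (W c̄)_i = Σ_j W_ij c̄_j, matching the "A × CorPH" axis of the plot. A row-normalized W is also common and would change the scale, not the signs. The first letter of a label tells whether the node's centered value is high (H) or low (L). The second letter does the same for its neighborhood, and ties are assigned to H. Nodes in the H−L and L−H quadrants disagree with their neighbors and are natural candidates for closer inspection. The star graph is a hypothetical example.

```python
def moran_plot(values, adj):
    """Points (cb_i, (W cb)_i, quadrant) of Moran's plot.

    adj: dict node -> {neighbor: weight}, with nodes numbered 0..n-1.
    """
    n = len(values)
    mean = sum(values) / n
    cb = [c - mean for c in values]
    points = []
    for i in range(n):
        lag = sum(w * cb[j] for j, w in adj[i].items())
        label = ("H" if cb[i] >= 0 else "L") + "-" + ("H" if lag >= 0 else "L")
        points.append((cb[i], lag, label))
    return points


# Hypothetical star: a central gene with a high value among three low ones.
star = {0: {1: 1, 2: 1, 3: 1}, 1: {0: 1}, 2: {0: 1}, 3: {0: 1}}
for point in moran_plot([4, 0, 0, 0], star):
    print(point)
```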


Influential points: example of cluster 4

[Figure: cluster 4, with labeled genes BX919092, PSMC3IP, THRB, XIAP, ARHGAP8, X91721, BX917912, EAPP, LSM2, BX922053, BX922491, H2AFY, ENH_RAT, LMF1, FTCD, BX925690, B2M, GPI, BX667979, BX920538, BX671131, RNF2, BX673501, KPNA1, BX674063, BX918923, RPS11, UBE2H]

3. Case study 2: gene network analysis in an LCD experiment

Data: DIOGENES project


Experimental protocol: 135 obese women, measured at 3 times: before an LCD (low-calorie diet), after a 2-month LCD, and 6 months later. Between the end of the LCD and the last measurement, the women are randomized into one of 5 recommended diet groups. At every time step: 221 gene expressions, 28 fatty acids and 15 clinical variables (e.g., weight, HDL...).

Correlations between gene expressions and correlations between a gene expression and a fatty acid level are not of the same order, so the inference method must differ within a group of variables and between two groups.


Data pre-processing: at CID3, individuals are split into three groups: weight loss, weight regain and stable weight. According to a χ² test, these groups are not related to the diet group.


Method for CID1, CID2 and 3× CID3

Workflow, applied to each dataset:
1. Network inference: 3 intra-dataset networks (sparse partial correlation) and 3 inter-dataset networks (rCCA), merged into one network. This gives 5 networks: CID1, CID2 and 3 × CID3.
2. Clustering.
3. Mining: extract important nodes; study/compare clusters.


Inference

Intra-level networks: partial correlations with a sparse approach (graphical lasso, as in the R package glasso) to select edges [Friedman et al., 2008].


Inter-level networks: regularized CCA (rCCA, as in the R package mixOmics) to evaluate the strength of the correlations [Lê Cao et al., 2009].

Combination of the 6 networks: tune the number of edges within or between levels so that it is of the order of the number of nodes in the corresponding level(s).
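The edge-tuning step can be sketched as follows, under explicit assumptions. Each intra- or inter-level network is given as a dict of edge scores (partial correlations or rCCA-based similarities). "Of the order of the number of nodes" is taken literally: keep exactly that many of the strongest edges, by absolute value. The function names, the per-block node counts and the toy scores are hypothetical.

```python
def keep_strongest(scores, n_edges):
    """Keep the n_edges edges with the largest absolute score."""
    ranked = sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
    return dict(ranked[:n_edges])


def merge_levels(blocks):
    """blocks: list of (scores, n_nodes), one per intra- or inter-level network.

    Hypothetical combination rule: keep about n_nodes edges per block, then merge.
    """
    network = {}
    for scores, n_nodes in blocks:
        network.update(keep_strongest(scores, n_nodes))
    return network


genes = {("g1", "g2"): 0.5, ("g1", "g3"): -0.6, ("g1", "g4"): 0.05,
         ("g2", "g3"): 0.3, ("g2", "g4"): -0.02, ("g3", "g4"): 0.4}
fatty_acids = {("f1", "f2"): 0.3}
genes_x_fatty_acids = {("g1", "f1"): 0.7, ("g2", "f1"): 0.1, ("g4", "f2"): -0.4}
network = merge_levels([
    (genes, 4),                 # 4 gene nodes -> keep 4 edges
    (fatty_acids, 2),           # 2 fatty-acid nodes
    (genes_x_fatty_acids, 2),   # hypothetical target for the inter-level block
])
print(sorted(network))
```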


Brief overview of results

5 networks inferred, with 264 nodes each:

| | CID1 | CID2 | CID3g1 | CID3g2 | CID3g3 |
|---|---|---|---|---|---|
| size of the largest connected component (LCC) | 244 | 251 | 240 | 259 | 258 |
| density | 2.3% | 2.3% | 2.3% | 2.3% | 2.3% |
| transitivity | 17.2% | 11.9% | 21.6% | 10.6% | 10.4% |
| nb of clusters (sizes) | 14 (2-52) | 10 (4-52) | 11 (2-46) | 12 (2-51) | 12 (3-54) |

Clusters were visualized and analyzed to extract important nodes.


Findings

- CID1: clusters were found to be associated with biological functions (fatty acid biosynthesis, adhesion and diapedesis...).
- CID3: for people with weight loss, an unexpected fatty acid was found to be an important node (high betweenness) in a cluster linked to fatty acid biosynthesis.


Conclusion

- Biological network mining can help the biologist comprehend the complex biological system as a whole.
- Groups of genes, often linked to a biological function, are more robust models than pairwise relations between genes.
- Simple tools, such as numerical characteristics, are useful to extract important nodes.


Thank you for your attention... ... questions?

References

Blondel, V., Guillaume, J., Lambiotte, R., and Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, P10008:1742–5468.

Brohée, S. and van Helden, J. (2006). Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics, 7(488).

Butte, A. and Kohane, I. (1999). Unsupervised knowledge discovery in medical databases using relevance networks. In Proceedings of the AMIA Symposium, pages 711–715.

Butte, A. and Kohane, I. (2000). Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. In Proceedings of the Pacific Symposium on Biocomputing, pages 418–429.

Fortunato, S. and Barthélémy, M. (2007). Resolution limit in community detection. In Proceedings of the National Academy of Sciences, volume 104, pages 36–41. doi:10.1073/pnas.0605965104; URL: http://www.pnas.org/content/104/1/36.abstract.

Friedman, J., Hastie, T., and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441.

Fruchterman, T. and Reingold, E. (1991). Graph drawing by force-directed placement. Software, Practice and Experience, 21:1129–1164.

Lê Cao, K., González, I., and Déjean, S. (2009). *****Omics: an R package to unravel relationships between two omics data sets. Bioinformatics, 25(21):2855–2856.

Meinshausen, N. and Bühlmann, P. (2006). High dimensional graphs and variable selection with the lasso. Annals of Statistics, 34(3):1436–1462.

Newman, M. and Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E, 69:026113.

Noack, A. and Rotta, R. (2009). Multi-level algorithms for modularity clustering. In SEA '09: Proceedings of the 8th International Symposium on Experimental Algorithms, pages 257–268, Berlin, Heidelberg. Springer-Verlag.

Rossi, F. and Villa-Vialaneix, N. (2011). Représentation d'un grand réseau à partir d'une classification hiérarchique de ses sommets. Journal de la Société Française de Statistique, 152(3):34–65.

Schaeffer, S. (2007). Graph clustering. Computer Science Review, 1(1):27–64.

Schäfer, J. and Strimmer, K. (2005a). An empirical Bayes approach to inferring large-scale gene association networks. Bioinformatics, 21(6):754–764.

Schäfer, J. and Strimmer, K. (2005b). A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical Applications in Genetics and Molecular Biology, 4:1–32.

Verzelen, N. (2012). Minimax risks for sparse regressions: ultra-high-dimensional phenomenons. Electronic Journal of Statistics, 6:38–90.

Villa-Vialaneix, N., Liaubet, L., Laurent, T., Cherel, P., Gamot, A., and San Cristobal, M. (2013). The structure of a gene co-expression network reveals biological functions underlying eQTLs. PLoS ONE, 8(4):e60045.

von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416.

Zanghi, H., Ambroise, C., and Miele, V. (2008). Fast online graph clustering via Erdős–Rényi mixture. Pattern Recognition, 41:3592–3599.