Inferring Sparse Gaussian Graphical Models for Biological Network


SLIDE 1

Inferring Sparse Gaussian Graphical Models for Biological Network

Christophe Ambroise Camille Charbonnier, Julien Chiquet, Gilles Grasseau, Catherine Matias, Yves Grandvalet

Laboratoire Statistique et Génome, UMR CNRS 8071, Université d'Évry. AAFD'10, 29 June 2010

SIMoNe: inferring structured Gaussian networks 1

SLIDE 2

Problem

n ≈ tens to hundreds of microarray slides, g ≈ thousands of genes, hence O(g²) parameters (edges)! Inference: which interactions?

The main statistical issue is the high-dimensional setting.

SLIDE 3

Handling the scarcity of data (1)

By reducing the number of parameters

Assumption

Connections only appear between informative genes.

A differential analysis selects p key genes P, with p "reasonable" compared to n: typically, n ∈ [p/5, 5p].

The learning dataset

n size-p vectors of expression (X1, . . . , Xn), with Xi ∈ Rp, used for inference.

SLIDE 4

Handling the scarcity of data (2)

By collecting as many observations as possible

Multitask learning

How should we merge the data? One organism is observed under several conditions: drug 1, drug 2, drug 3.

SLIDE 5

Handling the scarcity of data (2)

Multitask learning, option 1: infer each network independently. For each drug t = 1, 2, 3, use the sample (X(t)1, . . . , X(t)nt), X(t)i ∈ Rpt, and run a separate inference.

SLIDE 6

Handling the scarcity of data (2)

Multitask learning, option 2: pool all the available data into one sample (X1, . . . , Xn), Xi ∈ Rp, with n = n1 + n2 + n3, and run a single inference.

SLIDE 7

Handling the scarcity of data (2)

Multitask learning, option 3: break the separability. Keep the samples (X(t)1, . . . , X(t)nt), X(t)i ∈ Rpt, for t = 1, 2, 3, but couple the three inferences.

SLIDE 8

Handling the scarcity of data (3)

By introducing some prior

Priors should be biologically grounded:

  • 1. few genes effectively interact (sparsity),
  • 2. networks are organized (latent clustering),
  • 3. steady-state or time-course data (directedness relies on the modelling).

[Figure: an unstructured network over genes G0, . . . , G9.]


SLIDE 10

Handling the scarcity of data (3)

[Figure: the same priors illustrated on a clustered network, with nodes grouped as A1-A3, B1-B5, C1-C2.]


SLIDE 13

Outline

  • Statistical models: steady-state data; time-course data
  • Multitask learning: Group-LASSO; Coop-LASSO
  • Algorithms and methods: overall view; model selection; latent structure
  • Numerical experiments: performance on simulated data; R package demo: the breast cancer data set


SLIDE 16

The graphical models: general settings

Assumption

A microarray can be represented as a multivariate Gaussian vector X = (X(1), . . . , X(p)) ∈ Rp.

Collecting gene expression

  • 1. Steady-state data leads to an i.i.d. sample.
  • 2. Time-course data gives a time series.

Graphical interpretation

There is a conditional dependency between X(i) and X(j) if and only if the partial correlation between X(i) and X(j) is non-null, in which case the undirected edge i — j belongs to the network.


SLIDE 18

The graphical models: general settings

Assumption

A microarray can be represented as a multivariate Gaussian vector X = (X(1), . . . , X(p)) ∈ Rp.

Graphical interpretation (time-course case)

There is a conditional dependency between Xt(i) and Xt−1(j) if and only if the partial correlation between Xt(i) and Xt−1(j) is non-null, in which case the directed edge j → i belongs to the network.

SLIDE 19

The general statistical approach

Let Θ be the parameters to infer (the edges).

A penalized likelihood approach

Θ̂λ = arg maxΘ L(Θ; data) − λ penℓ1(Θ, Z),

  ◮ L is the model log-likelihood,
  ◮ Z is a latent clustering of the network,
  ◮ penℓ1 is a penalty function tuned by λ > 0.

It performs

  • 1. regularization (needed when n ≪ p),
  • 2. selection (sparsity induced by the ℓ1-norm),
  • 3. model-driven inference (penalty adapted according to Z).



SLIDE 22

The Gaussian model for an i.i.d. sample

Let

  ◮ X ∼ N(0p, Σ), with X1, . . . , Xn i.i.d. copies of X,
  ◮ X be the n × p matrix whose kth row is Xk,
  ◮ Θ = (θij)i,j∈P = Σ−1 be the concentration matrix.

Graphical interpretation

Since corij|P\{i,j} = −θij / √(θiiθjj) for i ≠ j,

X(i) ⊥⊥ X(j) | X(P\{i,j})  ⇔  θij = 0  ⇔  edge (i, j) ∉ network.

Θ describes the undirected graph of conditional dependencies.
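The identity above can be checked numerically. The sketch below (not part of the original deck; the 3-gene precision matrix is an illustrative assumption) inverts nothing at all: it reads partial correlations directly off the precision matrix Θ.

```python
import numpy as np

# Illustrative 3-gene precision matrix (an assumption for the demo):
# theta_02 = 0, so genes 0 and 2 are conditionally independent given gene 1.
Theta = np.array([[ 2.0, -0.8,  0.0],
                  [-0.8,  2.0, -0.5],
                  [ 0.0, -0.5,  1.5]])

# Partial correlations: cor_ij|rest = -theta_ij / sqrt(theta_ii * theta_jj)
d = np.sqrt(np.diag(Theta))
partial_cor = -Theta / np.outer(d, d)
np.fill_diagonal(partial_cor, 1.0)

print(partial_cor[0, 1])  # ≈ 0.4: edge between genes 0 and 1
print(partial_cor[0, 2])  # 0: no edge between genes 0 and 2
```

A zero entry of Θ is exactly a missing edge of the conditional dependency graph.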

SLIDE 23

Neighborhood selection (1)

Let

  ◮ Xi be the ith column of X,
  ◮ X\i be X deprived of Xi.

Then Xi = X\iβ + ε, where βj = −θij / θii.

Meinshausen and Bühlmann, 2006

Since sign(corij|P\{i,j}) = sign(βj), select the neighbors of i with

arg minβ (1/n)‖Xi − X\iβ‖²₂ + λ‖β‖ℓ1.

The sign pattern of Θλ is inferred after a symmetrization step.
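The deck demos an R package; as a hedged Python sketch of the same idea (a generic Meinshausen-Bühlmann-style procedure, not SIMoNe's actual implementation), the code below runs one Lasso regression per node via plain coordinate descent, then symmetrizes with the OR rule. The toy chain graph and the `lasso_cd` helper are assumptions for the demo.

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_cd(A, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - A b||^2 + lam * ||b||_1."""
    n, d = A.shape
    beta = np.zeros(d)
    col_ss = (A ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(d):
            # partial residual excluding feature j
            r = y - A @ beta + A[:, j] * beta[j]
            beta[j] = soft_threshold(A[:, j] @ r / n, lam) / col_ss[j]
    return beta

def neighborhood_selection(X, lam):
    """One Lasso regression per node, then OR-rule symmetrization."""
    n, p = X.shape
    adj = np.zeros((p, p), dtype=bool)
    for i in range(p):
        others = [j for j in range(p) if j != i]
        beta = lasso_cd(X[:, others], X[:, i], lam)
        adj[i, others] = beta != 0
    return adj | adj.T  # keep an edge if either regression selects it

rng = np.random.default_rng(0)
# Toy chain graph 0 - 1 - 2: gene 1 drives both gene 0 and gene 2.
z = rng.standard_normal(200)
X = np.column_stack([z + 0.3 * rng.standard_normal(200),
                     z,
                     z + 0.3 * rng.standard_normal(200)])
A = neighborhood_selection(X, lam=0.1)
```

The AND rule (keep an edge only if both regressions select it) is the other standard symmetrization choice.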

SLIDE 24

The Gaussian likelihood for an i.i.d. sample

Let S = n−1 X⊺X be the empirical variance-covariance matrix: S is a sufficient statistic for Θ.

The log-likelihood

Liid(Θ; S) = (n/2) log det(Θ) − (n/2) Trace(SΘ) − (n/2) log(2π).

The MLE S−1 of Θ is not defined for n < p, and is never sparse. The need for regularization is huge.

SLIDE 25

Neighborhood vs. likelihood

Pseudo-likelihood (Besag, 1975)

P(X1, . . . , Xp) ≃ ∏j=1..p P(Xj | {Xk}k≠j),

which gives

L̃(Θ; S) = (n/2) log det(D) − (n/2) trace(SD−1Θ²) − (n/2) log(2π), with D = diag(Θ),

to be compared with

L(Θ; S) = (n/2) log det(Θ) − (n/2) trace(SΘ) − (n/2) log(2π).

Proposition

Neighborhood selection leads to the graph maximizing the penalized pseudo-log-likelihood.

Proof sketch: β̂ji = −θ̂ij / θ̂jj, where Θ̂ maximizes the penalized pseudo-log-likelihood.

SLIDE 26

Penalized log-likelihood

Banerjee et al., JMLR 2008

Θ̂λ = arg maxΘ Liid(Θ; S) − λ‖Θ‖ℓ1,

efficiently solved by the graphical LASSO of Friedman et al., 2008.

Ambroise, Chiquet, Matias, EJS 2009

Use adaptive penalty parameters for different coefficients:

L̃iid(Θ; S) − λ‖PZ ⋆ Θ‖ℓ1,

where PZ is a matrix of weights depending on the underlying clustering Z. Works with the pseudo-log-likelihood (computationally efficient).
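For the plain (unweighted) ℓ1-penalized estimator, scikit-learn ships a graphical-LASSO solver. The sketch below is a hedged stand-in for the deck's R tooling: a chain-graph precision matrix (an assumption for the demo) is recovered from simulated data.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(1)
# Sparse ground-truth precision: a chain 0 - 1 - 2 - 3 (illustrative values).
Theta = np.eye(4) + np.diag([-0.4] * 3, k=1) + np.diag([-0.4] * 3, k=-1)
Sigma = np.linalg.inv(Theta)
X = rng.multivariate_normal(np.zeros(4), Sigma, size=500)

# l1-penalized maximum likelihood (Friedman et al., 2008)
model = GraphicalLasso(alpha=0.05).fit(X)
Theta_hat = model.precision_
print(np.round(Theta_hat, 2))
```

Chain edges such as (0, 1) come out with clearly larger magnitude than absent edges such as (0, 3), which the penalty pushes to zero.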



SLIDE 29

The Gaussian model for time-course data (1)

Let X1, . . . , Xn be a first-order vector autoregressive process:

Xt = ΘXt−1 + b + εt, t ∈ [1, n],

where we are looking for Θ = (θij)i,j∈P and

  ◮ X0 ∼ N(0p, Σ0),
  ◮ εt is a Gaussian white noise with covariance σ²Ip,
  ◮ cov(Xt, εs) = 0 for s > t, so that Xt is Markovian.

Graphical interpretation

Since θij = cov(Xt(i), Xt−1(j) | Xt−1(P\j)) / var(Xt−1(j) | Xt−1(P\j)),

Xt(i) ⊥⊥ Xt−1(j) | Xt−1(P\j)  ⇔  θij = 0  ⇔  edge (j → i) ∉ network.

SLIDE 30

The Gaussian model for time-course data (2)

Interpretation

A homogeneous Markov process.

SLIDE 31

The Gaussian model for time-course data (3)

Let

  ◮ X be the n × p matrix whose kth row is Xk,
  ◮ S = n−1 X⊺\n X\n be the within-time covariance matrix,
  ◮ V = n−1 X⊺\n X\0 be the across-time covariance matrix,

where X\n (resp. X\0) is X deprived of its last (resp. first) row.

The log-likelihood

Ltime(Θ; S, V) = n Trace(VΘ) − (n/2) Trace(Θ⊺SΘ) + c.

The MLE S−1V of Θ is still not defined for n < p.
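When n > p the estimator is well defined, and can be checked on simulated data. A hedged sketch (the autoregression matrix below is an illustrative assumption, and the final transpose just reconciles the row/column conventions with the deck's S−1V):

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 3, 2000
# Ground-truth autoregression matrix; upper triangular with eigenvalues
# below 1, so the VAR(1) process X_t = Theta X_{t-1} + eps_t is stable.
Theta = np.array([[0.5, 0.2, 0.0],
                  [0.0, 0.4, 0.3],
                  [0.0, 0.0, 0.3]])
X = np.zeros((n, p))
for t in range(1, n):
    X[t] = Theta @ X[t - 1] + 0.1 * rng.standard_normal(p)

past, present = X[:-1], X[1:]
S = past.T @ past / n      # within-time covariance
V = past.T @ present / n   # across-time covariance
# Least-squares / ML estimate, i.e. S^{-1}V up to transpose conventions.
Theta_hat = np.linalg.solve(S, V).T
print(np.round(Theta_hat, 2))
```

With n ≫ p the estimate is close to the truth; for n < p the matrix S is singular and `solve` fails, which is exactly the deck's point about needing regularization.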

SLIDE 32

Penalized log-likelihood

Charbonnier, Chiquet, Ambroise, SAGMB 2010

Θ̂λ = arg maxΘ Ltime(Θ; S, V) − λ‖PZ ⋆ Θ‖ℓ1,

where PZ is a (non-symmetric) matrix of weights depending on the underlying clustering Z.

Major difference with the i.i.d. case

The graph is directed: in general,

cov(Xt(i), Xt−1(j) | Xt−1(P\j)) / var(Xt−1(j) | Xt−1(P\j)) ≠ cov(Xt(j), Xt−1(i) | Xt−1(P\i)) / var(Xt−1(i) | Xt−1(P\i)),

i.e. θij ≠ θji.



SLIDE 36

Coupling related problems

Consider

  ◮ T samples concerning the expressions of the same p genes,
  ◮ X(t)1, . . . , X(t)nt the tth sample, drawn from N(0p, Σ(t)), with empirical covariance matrix S(t).

Multiple samples setup

Ignoring the relationships between the tasks leads to

arg maxΘ(t), t=1,...,T  Σt=1..T L(Θ(t); S(t)) − λ penℓ1(Θ(t), Z).

Breaking the separability

  ◮ Either by modifying the objective function,
  ◮ or the constraints.

Remarks

  ◮ In the sequel, Z is omitted for clarity (no loss of generality).
  ◮ Multitask learning is easily adapted to time-course data, yet only the steady-state version is presented here.

SLIDE 37

Coupling problems through the objective function

The intertwined LASSO

maxΘ(t), t=1,...,T  Σt=1..T L̃(Θ(t); S̃(t)) − λ‖Θ(t)‖ℓ1, where

  ◮ S̄ = (1/n) Σt=1..T nt S(t) is an "across-task" covariance matrix,
  ◮ S̃(t) = αS(t) + (1 − α)S̄ is a mixture of the within-task and across-task covariance matrices.

Setting α = 0 is equivalent to pooling all the data and inferring one common network; setting α = 1 is equivalent to treating T independent problems.
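The covariance mixing itself is a few lines. A minimal sketch (the helper name and the random two-task data are assumptions for the demo):

```python
import numpy as np

def intertwined_covariances(samples, alpha=0.5):
    """Blend each task's covariance with the pooled one:
    S_tilde(t) = alpha * S(t) + (1 - alpha) * S_bar."""
    covs = [x.T @ x / len(x) for x in samples]
    n = sum(len(x) for x in samples)
    # Sample-size-weighted "across-task" covariance matrix.
    S_bar = sum(len(x) * S for x, S in zip(samples, covs)) / n
    return [alpha * S + (1 - alpha) * S_bar for S in covs]

rng = np.random.default_rng(3)
samples = [rng.standard_normal((30, 4)), rng.standard_normal((50, 4))]
tilde = intertwined_covariances(samples, alpha=0.0)
# With alpha = 0 every task sees the same pooled covariance matrix.
```

At α = 1 each task recovers its own S(t) unchanged; intermediate values share statistical strength across tasks.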


SLIDE 39

Coupling through penalties

Group-LASSO

We group parameters by sets of corresponding edges across graphs (each edge between genes X1, . . . , X4 forms one group across the T tasks).

Graphical Group-LASSO

maxΘ(t), t=1,...,T  Σt=1..T L̃(Θ(t); S(t)) − λ Σi,j∈P, i≠j (Σt=1..T (θ(t)ij)²)^1/2.

⌣ Most relationships between the genes are kept or removed across all tasks simultaneously.
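Why edges survive or die jointly is easiest to see through the penalty's proximal operator, block soft-thresholding: the whole group of coefficients for one edge is shrunk by a common factor and hits zero together. A hedged sketch (the function and the example edge are illustrative, not the deck's solver):

```python
import numpy as np

def group_soft_threshold(theta_group, lam):
    """Proximal operator of lam * ||.||_2 on one group of edge weights:
    shrinks the group's norm, zeroing the whole group at once."""
    norm = np.linalg.norm(theta_group)
    if norm <= lam:
        return np.zeros_like(theta_group)
    return (1 - lam / norm) * theta_group

# One edge (i, j) observed across T = 3 tasks:
edge = np.array([0.5, -0.4, 0.3])
print(group_soft_threshold(edge, lam=0.2))  # shrunk, all tasks keep the edge
print(group_soft_threshold(edge, lam=1.0))  # whole group removed at once
```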



SLIDE 45

A geometric view of sparsity

Constrained optimization

maxβ1,β2 L(β1, β2) − λΩ(β1, β2)  ⇔  maxβ1,β2 L(β1, β2) s.t. Ω(β1, β2) ≤ c.

[Figure: level sets of L(β1, β2) meeting the admissible set Ω(β1, β2) ≤ c.]


SLIDE 47

Group-LASSO balls

Admissible set for 2 tasks (T = 2) and 2 coefficients (p = 2):

Σj=1..2 (Σt=1..2 (β(t)j)²)^1/2 ≤ 1.

[Figure: cross-sections of the group-LASSO ball in the (β(1)1, β(1)2) plane, for β(2)1 ∈ {0, 0.3} and β(2)2 ∈ {0, 0.3}.]



SLIDE 52

Coupling through penalties

Coop-LASSO

Same grouping, plus a bet that partial correlations are likely to be sign-consistent: gene interactions are either inhibitory or activating across all assays.

Graphical Coop-LASSO

maxΘ(t), t=1,...,T  Σt=1..T L̃(Θ(t); S(t)) − λ Σi,j∈P, i≠j [ (Σt=1..T [θ(t)ij]+²)^1/2 + (Σt=1..T [−θ(t)ij]+²)^1/2 ],

where [u]+ = max(0, u) and [u]− = min(0, u).

⌣ Inside a group, interactions are most likely sign-consistent.
⌣ Plausible in many other situations.
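The coop penalty splits each group into its positive and negative parts and norms them separately, so sign-mixed groups pay more than sign-consistent ones. A small numeric sketch (helper names and values are illustrative):

```python
import numpy as np

def coop_penalty(edge):
    """Coop-LASSO penalty on one edge across tasks:
    ||[theta]_+||_2 + ||[-theta]_+||_2."""
    pos = np.maximum(edge, 0.0)   # activating parts
    neg = np.maximum(-edge, 0.0)  # inhibitory parts
    return np.linalg.norm(pos) + np.linalg.norm(neg)

def group_penalty(edge):
    """Plain group-LASSO penalty on the same edge."""
    return np.linalg.norm(edge)

consistent = np.array([0.3, 0.4, 0.0])  # activating in every task
mixed = np.array([0.3, -0.4, 0.0])      # sign flips across tasks

# Equal on sign-consistent groups; the coop penalty is larger on mixed ones.
print(coop_penalty(consistent), group_penalty(consistent))
print(coop_penalty(mixed), group_penalty(mixed))
```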


SLIDE 59

Coop-LASSO balls

Admissible set for 2 tasks (T = 2) and 2 coefficients (p = 2):

Σj=1..2 (Σt=1..2 [β(t)j]+²)^1/2 + Σj=1..2 (Σt=1..2 [−β(t)j]+²)^1/2 ≤ 1.

[Figure: cross-sections of the coop-LASSO ball in the (β(1)1, β(1)2) plane, for β(2)1 ∈ {0, 0.3} and β(2)2 ∈ {0, 0.3}.]



SLIDE 66

The overall strategy

Our basic criterion is of the form L(Θ; data) − λ‖PZ ⋆ Θ‖ℓ1.

What we are looking for

  ◮ the edges, through Θ,
  ◮ the correct level of sparsity λ,
  ◮ the underlying clustering Z, with connectivity matrix πZ.

What SIMoNe does

  • 1. Infer a family of networks G = {Θλ : λ ∈ [λmax, 0]}.
  • 2. Select the G⋆ that maximizes an information criterion.
  • 3. Learn Z on the selected network G⋆.
  • 4. Infer a family of networks with PZ ∝ 1 − πZ.
  • 5. Select the G⋆Z that maximizes an information criterion.
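Step 4 above is mechanical once Z and πZ are known: expand the Q × Q connectivity matrix to node level and take 1 minus it, so edges between densely connected groups are penalized less. A hedged sketch (the helper name, labels, and πZ values are assumptions for the demo):

```python
import numpy as np

def penalty_matrix(z, pi):
    """Build P_Z proportional to 1 - pi_Z: entry (i, j) depends only on
    the groups of nodes i and j (z holds group labels, pi is the
    Q x Q connectivity matrix)."""
    P = 1.0 - pi[np.ix_(z, z)]
    np.fill_diagonal(P, 0.0)  # no penalty on the diagonal
    return P

z = np.array([0, 0, 1, 1, 1])   # 5 nodes, 2 groups
pi = np.array([[0.8, 0.1],
               [0.1, 0.5]])     # dense within groups, sparse across
P = penalty_matrix(z, pi)
# Within-group edges get a small penalty (0.2 or 0.5),
# across-group edges a large one (0.9).
```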


SLIDE 68

SIMoNe

Suppose you want to recover a clustered network.

[Figure: target adjacency matrix and target network.]


SLIDE 72

SIMoNe

The full pipeline: start from the microarray data; run SIMoNe without prior to obtain the adjacency matrix of G⋆; run Mixer on G⋆ to estimate the connectivity matrix πZ; apply a decreasing transformation to turn πZ into the penalty matrix PZ; run SIMoNe again with PZ to obtain the adjacency matrix of G⋆Z.


SLIDE 74

Tuning the penalty parameter

What does the literature say?

Theory-based penalty choices

  • 1. Optimal order of penalty in the p ≫ n framework: √(n log p) (Bunea et al. 2007; Bickel et al. 2009).
  • 2. Control of the probability of connecting two distinct connectivity sets (Meinshausen et al. 2006; Banerjee et al. 2008; Ambroise et al. 2009), which is in practice much too conservative.

Cross-validation

  ◮ Optimal in terms of prediction, not in terms of selection.
  ◮ Problematic with small samples: it changes the sparsity constraint due to the sample size.

SLIDE 75

Tuning the penalty parameter

BIC / AIC

Theorem (Zou et al. 2008)

df(β̂lassoλ) = ‖β̂lassoλ‖0, the number of nonzero coefficients.

Straightforward extensions to the graphical framework:

BIC(λ) = L(Θ̂λ; X) − df(Θ̂λ) (log n)/2,
AIC(λ) = L(Θ̂λ; X) − df(Θ̂λ).

  ◮ They rely on asymptotic approximations, but remain relevant for small data sets.
  ◮ Easily adapted to Liid, L̃iid, Ltime and the multitask framework.
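With df equal to the number of nonzero coefficients, selecting λ along the path is a one-liner. A hedged sketch on a made-up path (the three candidate precision matrices and their log-likelihood values are pure illustrations, not real estimates):

```python
import numpy as np

def bic_score(loglik, theta_hat, n):
    """BIC(lambda) = loglik - df * log(n) / 2, where df is the number of
    nonzero upper-diagonal entries of the estimated precision matrix."""
    iu = np.triu_indices_from(theta_hat, k=1)
    df = int(np.count_nonzero(theta_hat[iu]))
    return loglik - df * np.log(n) / 2.0

# Toy path: denser candidates fit better but pay a complexity price.
candidates = [
    (-120.0, np.diag([1.0, 1.0, 1.0])),                            # empty graph
    (-110.0, np.array([[1, .4, 0], [.4, 1, 0], [0, 0, 1.]])),      # one edge
    (-109.0, np.array([[1, .4, .3], [.4, 1, .2], [.3, .2, 1.]])),  # full graph
]
n = 50
scores = [bic_score(ll, th, n) for ll, th in candidates]
best = int(np.argmax(scores))  # the one-edge model wins here
```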


slide-77
SLIDE 77

MixNet

Erdős–Rényi Mixture for Networks

The data is now the network itself: A = (aij = 1{θij≠0})i,j∈P, the adjacency matrix associated with Θ.

[Figure: a network with n = 10 nodes, colored by group; e.g. Z5• = 1, a12 = 1, a15 = 0; between-group connection probabilities π••.]

Binary case

◮ Q groups (= colors •••).
◮ {Zi}1≤i≤n i.i.d. vectors Zi = (Zi1, . . . , ZiQ) ∼ M(1, α).
◮ Conditionally on the {Zi}'s, the random variables Aij are independent B(πZiZj).
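The binary MixNet model is easy to simulate directly from this definition. A sketch (the function name and its signature are mine):

```python
import numpy as np

def sample_mixnet(n, alpha, pi, rng):
    # latent classes: Z_i i.i.d. multinomial M(1, alpha)
    z = rng.choice(len(alpha), size=n, p=alpha)
    # conditionally on Z, edges A_ij are independent Bernoulli B(pi[Z_i, Z_j])
    u = rng.random((n, n))
    a = (u < np.asarray(pi)[np.ix_(z, z)]).astype(int)
    a = np.triu(a, k=1)        # undirected graph: keep the upper triangle,
    return a + a.T, z          # symmetrize, no self-loops

rng = np.random.default_rng(0)
A, Z = sample_mixnet(10, [0.3, 0.7], [[0.8, 0.1], [0.1, 0.5]], rng)
```

With a strong within-group probability (0.8 here) and a weak between-group one (0.1), the sampled graph exhibits the community structure that MixNet is designed to recover.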


slide-81
SLIDE 81

Estimation strategy

Likelihoods

◮ the observed data: P(A|α, π) = Σ_Z P(A, Z|α, π),
◮ the complete data: P(A, Z|α, π).

The EM criterion

E[ log P(A, Z|α, π) | A ]

requires P(Z|A, α, π), which is not tractable!

slide-82
SLIDE 82

Variational inference

Principle

Approximate P(Z|A, α, π) by Rτ(Z), chosen to minimize KL(Rτ(Z); P(Z|A, α, π)), where Rτ is such that log Rτ(Z) = Σ_{i,q} Ziq log τiq, and τ are the variational parameters to optimize.
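Minimizing this KL divergence leads to a fixed-point update for τ. A sketch of the standard mean-field update for this model (the vectorized form and function name are mine, and the update is iterated a fixed number of times rather than to convergence):

```python
import numpy as np

def vem_e_step(a, alpha, pi, tau, n_iter=50):
    # fixed point: tau_iq ∝ alpha_q * prod_{j != i, l} b(a_ij; pi_ql)^{tau_jl}
    n = a.shape[0]
    alpha, pi = np.asarray(alpha), np.asarray(pi)
    log_pi, log_1mpi = np.log(pi), np.log(1 - pi)
    off = 1 - a - np.eye(n)          # indicator of absent edges, j != i
    for _ in range(n_iter):
        # s[i, q] = sum_{j, l} tau_jl * [a_ij log pi_ql + (1 - a_ij) log(1 - pi_ql)]
        s = (a @ tau) @ log_pi.T + (off @ tau) @ log_1mpi.T
        log_tau = np.log(alpha) + s
        log_tau -= log_tau.max(axis=1, keepdims=True)   # numerical stability
        tau = np.exp(log_tau)
        tau /= tau.sum(axis=1, keepdims=True)           # rows are class posteriors
    return tau
```

Each row τi• then approximates the posterior class membership of node i.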

Variational Bayes (Latouche et al.)

◮ Appropriate priors on α and π,
◮ Good performance, especially for the choice of Q, which makes it relevant in the SIMoNe context.

slide-83
SLIDE 83

Outline

◮ Statistical models: steady-state data, time-course data
◮ Multitask learning: Group-LASSO, Coop-LASSO
◮ Algorithms and methods: overall view, model selection, latent structure
◮ Numerical experiments: performance on simulated data, R package demo (the breast cancer data set)

slide-84
SLIDE 84

Example 1: time-course data with star-pattern

Simulation settings

  • 1. 50 networks with p = 100 nodes, time series of length n = 100,
  • 2. two classes, hubs and leaves, with proportions α = (0.1, 0.9),
  • 3. P(hub to leaf) = 0.3, P(hub to hub) = 0.1, 0 otherwise.

slide-85
SLIDE 85

Example 1: time-course data with star-pattern

Same simulation settings as above.

[Boxplot: precision values (precision = TP/(TP+FP)), without (wocl) and with (wcl) structure inference, under BIC and AIC.]

slide-86
SLIDE 86

Example 1: time-course data with star-pattern

Same simulation settings as above.

[Boxplot: recall values (recall = TP/P, the power), without (wocl) and with (wcl) structure inference, under BIC and AIC.]

slide-87
SLIDE 87

Example 1: time-course data with star-pattern

Same simulation settings as above.

[Boxplot: fallout values (fallout = FP/N, the type I error), without (wocl) and with (wcl) structure inference, under BIC and AIC.]
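The three criteria used in these experiments compare the estimated edge set (the off-diagonal support of Θ̂) with the true one. A minimal sketch (function name is mine):

```python
import numpy as np

def edge_metrics(theta_hat, theta_true, tol=1e-8):
    # edge sets = off-diagonal supports (upper triangle of symmetric matrices)
    iu = np.triu_indices(theta_true.shape[0], k=1)
    est = np.abs(theta_hat[iu]) > tol
    true = np.abs(theta_true[iu]) > tol
    tp = int(np.sum(est & true))
    fp = int(np.sum(est & ~true))
    fn = int(np.sum(~est & true))
    tn = int(np.sum(~est & ~true))
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)     # = TP / P, the power
    fallout = fp / max(fp + tn, 1)    # = FP / N, the type I error
    return precision, recall, fallout
```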

slide-88
SLIDE 88

Example 2: steady-state, multitask framework

Simulating the tasks

  • 1. generate an “ancestor” with p = 20 nodes and K = 20 edges,
  • 2. generate T = 4 children by adding and deleting δ edges,
  • 3. generate T = 4 Gaussian samples.

Figure: ancestor and children with δ perturbations
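Step 2, perturbing the ancestor to obtain each child, can be sketched as follows: flip δ existing edges off and δ absent ones on, keeping the graph undirected (function name is mine):

```python
import numpy as np

def perturb(ancestor, delta, rng):
    # remove delta existing edges and add delta absent ones (undirected graph)
    a = ancestor.copy()
    pairs = np.column_stack(np.triu_indices(a.shape[0], k=1))
    present = pairs[a[pairs[:, 0], pairs[:, 1]] == 1]
    absent = pairs[a[pairs[:, 0], pairs[:, 1]] == 0]
    for i, j in present[rng.choice(len(present), delta, replace=False)]:
        a[i, j] = a[j, i] = 0
    for i, j in absent[rng.choice(len(absent), delta, replace=False)]:
        a[i, j] = a[j, i] = 1
    return a
```

Since edges to add are drawn among pairs absent in the ancestor, each child keeps the same total edge count while differing from the ancestor in 2δ positions of the upper triangle.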


slide-92
SLIDE 92

Results

Precision/recall curves, with precision = TP/(TP+FP) and recall = TP/P (power).

slide-93
SLIDE 93

Results: large sample size

[Figure: precision vs. recall along the λ path (from λmax), comparing CoopLasso, GroupLasso, Intertwined, Independent and Pooled; nt = 100, δ = 1.]

slide-94
SLIDE 94

Results: large sample size

[Figure: precision vs. recall along the λ path (from λmax), comparing CoopLasso, GroupLasso, Intertwined, Independent and Pooled; nt = 100, δ = 5.]

slide-95
SLIDE 95

Results: medium sample size

[Figure: precision vs. recall along the λ path (from λmax), comparing CoopLasso, GroupLasso, Intertwined, Independent and Pooled; nt = 50, δ = 1.]

slide-96
SLIDE 96

Results: medium sample size

[Figure: precision vs. recall along the λ path (from λmax), comparing CoopLasso, GroupLasso, Intertwined, Independent and Pooled; nt = 50, δ = 5.]

slide-97
SLIDE 97

Results: small sample size

[Figure: precision vs. recall along the λ path (from λmax), comparing CoopLasso, GroupLasso, Intertwined, Independent and Pooled; nt = 25, δ = 1.]

slide-98
SLIDE 98

Results: small sample size

[Figure: precision vs. recall along the λ path (from λmax), comparing CoopLasso, GroupLasso, Intertwined, Independent and Pooled; nt = 25, δ = 5.]

slide-99
SLIDE 99

Outline

◮ Statistical models: steady-state data, time-course data
◮ Multitask learning: Group-LASSO, Coop-LASSO
◮ Algorithms and methods: overall view, model selection, latent structure
◮ Numerical experiments: performance on simulated data, R package demo (the breast cancer data set)

slide-100
SLIDE 100

Breast cancer

Prediction of the outcome of preoperative chemotherapy

Two types of patients

Patient response can be classified as

  • 1. either a pathologic complete response (PCR),
  • 2. or residual disease (not PCR).

Gene expression data

◮ 133 patients (99 not PCR, 34 PCR)
◮ 26 genes identified by differential analysis

slide-101
SLIDE 101

Multitask approach: PCR / not PCR

[Figure: breast cancer data, networks inferred with the graphical cooperative Lasso for the PCR and not-PCR groups.]

slide-102
SLIDE 102

Conclusions

To sum up

◮ Studying conditional dependencies makes it possible to go beyond classical differential analysis,
◮ SIMoNe embeds most state-of-the-art statistical methods for GGM inference based upon ℓ1-penalization,
◮ both steady-state and time-course data can be dealt with.

Current work

Adding transversal tools:

◮ network comparison,
◮ more criteria to choose the penalty parameter,
◮ interface to Gene Ontology.

Theoretical analysis of the Coop-LASSO:

◮ uniqueness of the solution,
◮ selection consistency (sparsistency).


slide-104
SLIDE 104

Publications

Ambroise, Chiquet, Matias (2009). Inferring sparse Gaussian graphical models with latent structure. Electronic Journal of Statistics, 3, 205-238.

Chiquet, Smith, Grasseau, Matias, Ambroise (2009). SIMoNe: Statistical Inference for MOdular NEtworks. Bioinformatics, 25(3), 417-418.

Charbonnier, Chiquet, Ambroise (2010). Weighted-Lasso for Structured Network Inference from Time Course Data. SAGMB, 9.

Chiquet, Grandvalet, Ambroise (arXiv preprint). Inferring multiple Gaussian graphical models.

Working paper: Chiquet, Charbonnier, Ambroise, Grasseau. SIMoNe: An R package for inferring Gaussian networks with latent structure. Journal of Statistical Software.

Working paper: Chiquet, Grandvalet, Ambroise, Jeanmougin. Biological analysis of breast cancer by multitask learning.

slide-105
SLIDE 105

Network generation

Fix

◮ the number p = card(P) of nodes,
◮ whether the graph is directed or not.

Affiliation matrix A = (aij)i,j∈P

  • 1. usual MixNet framework:

◮ the Q × Q matrix Π, with πqℓ = P(aij = 1 | i ∈ q, j ∈ ℓ),
◮ the Q-size vector α with αq = P(i ∈ q).

  • 2. constrained MixNet version:

◮ the Q × Q matrix Π, with πqℓ = card{(i, j) ∈ P × P : i ∈ q, j ∈ ℓ},
◮ the Q-size vector α with αq = card({i ∈ P : i ∈ q})/p.

slide-106
SLIDE 106

Gaussian data generation

The Θ⋆ matrix

  • 1. In the undirected case, Θ⋆ is the concentration matrix:

◮ compute the normalized Laplacian of A,
◮ generate a symmetric pattern of random signs.

  • 2. In the directed case, Θ⋆ collects the VAR(1) parameters:

◮ generate random correlations where aij ≠ 0,
◮ normalize by the eigenvalue of greatest modulus,
◮ generate a pattern of random signs.

The Gaussian sample X

  • 1. In the undirected case:

◮ compute Σ⋆ by pseudo-inversion of Θ⋆,
◮ generate the multivariate Gaussian sample via the Cholesky decomposition of Σ⋆.

  • 2. In the directed case:

◮ Θ⋆ defines a stable VAR(1) process to simulate from.
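The undirected recipe can be sketched as follows. To keep the example short I make simplifying assumptions: a diagonally loaded plain Laplacian in place of the normalized Laplacian, plain inversion instead of pseudo-inversion, and no random-sign pattern; the function name is mine:

```python
import numpy as np

def sample_ggm(a, n, eps=0.1, seed=0):
    # precision matrix Theta* from the Laplacian of the adjacency matrix a;
    # adding eps * I guarantees positive definiteness
    rng = np.random.default_rng(seed)
    lap = np.diag(a.sum(axis=1)) - a
    theta = lap + eps * np.eye(a.shape[0])
    sigma = np.linalg.inv(theta)           # Sigma* = inverse of Theta*
    chol = np.linalg.cholesky(sigma)
    # n multivariate Gaussian draws via the Cholesky factor of Sigma*
    return (chol @ rng.standard_normal((a.shape[0], n))).T
```

By construction, the nonzero pattern of Θ⋆ matches the graph, so partial correlations vanish exactly where there is no edge.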


slide-108
SLIDE 108

When is the Lasso really good for selection?

When explanatory variables are highly correlated...

◮ the irrepresentable condition is not satisfied,
◮ the Lasso is unable to recover the support,
◮ the Lasso is unstable (like many other variable selection methods).

Are there solutions?

◮ elastic net, ...
◮ pre-processing,
◮ ...

slide-109
SLIDE 109

Is there a proper way to estimate the penalty parameter when p ≫ n?

◮ Cross-validation is not always suited, since the number of selected variables depends on the sample size.
◮ Information criteria depend on the degrees of freedom and rely on asymptotic approximations.
◮ ...

slide-110
SLIDE 110

Structured sparsity

Each application is specific, so many sparsity-inducing penalties have been proposed:

◮ group-lasso,
◮ fused-lasso,
◮ coop-lasso,
◮ ???-lasso...