Learning graphical models of the brain, Gaël Varoquaux (PowerPoint presentation transcript)


SLIDE 1

Learning graphical models of the brain

Gaël Varoquaux

SLIDE 2

functional MRI (fMRI)

Recordings of brain activity

G Varoquaux 2

SLIDE 3

functional MRI (fMRI)

Recordings of brain activity

Brain mapping: the motor system: “move the right hand”; the language system: “say three names of animals”

SLIDE 4

functional MRI (fMRI)

Brain mapping: the language network. The language system: “say three names of animals”

SLIDE 5

functional MRI (fMRI)

Brain mapping: the language network. Interacting sub-systems: sounds, lexical access, syntax. The language system: “say three names of animals”

SLIDE 6

The functional connectome

View of the brain as a set of regions and their interactions

SLIDE 7

The functional connectome

View of the brain as a set of regions and their interactions
Intrinsic brain architecture; biomarkers of pathologies
⇒ Learn a graphical model

Human Connectome Project: $30M

SLIDE 8

Resting-state fMRI

SLIDE 9

Outline

1 Graphical structures of brain activity
2 Multi-subject graph learning
3 Beyond ℓ1 models

SLIDE 10

1 Graphical structures of brain activity

Functional connectome: graph of interactions between regions

[Varoquaux & Craddock 2013]

SLIDE 11

1 From correlations to connectomes

Conditional independence structure?

SLIDE 12

1 Probabilistic model for interactions

Simplest data-generating process = multivariate normal:

P(X) ∝ √|Σ⁻¹| exp(−½ Xᵀ Σ⁻¹ X)

Model parametrized by the inverse covariance matrix, K = Σ⁻¹: conditional covariances

Goodness of fit: likelihood of the observed covariance Σ̂ in model Σ:

L(Σ̂|K) = log |K| − trace(Σ̂ K)
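The goodness-of-fit above is a one-liner in practice. A minimal numpy sketch (function name and toy data are my own, not from the slides):

```python
import numpy as np

def gaussian_loglik(emp_cov, precision):
    """Gaussian model fit: L(emp_cov | K) = log|K| - trace(emp_cov @ K)."""
    sign, logdet = np.linalg.slogdet(precision)
    if sign <= 0:
        raise ValueError("precision must be positive definite")
    return logdet - np.trace(emp_cov @ precision)

# When K is the inverse of the empirical covariance, trace(emp_cov @ K) = p,
# so the fit reduces to log|K| - p: the maximum attainable on training data.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 4))
emp_cov = np.cov(X, rowvar=False)
K = np.linalg.inv(emp_cov)
fit = gaussian_loglik(emp_cov, K)
```

This unpenalized fit always prefers the dense maximum-likelihood precision; the rest of the deck is about regularizing it.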

SLIDE 13

1 Graphical structure from correlations

Observations (4-node example)

Covariance: diagonal = signal variance

Inverse covariance: diagonal = node innovation; off-diagonal = direct connections
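The covariance/inverse-covariance contrast on this slide can be demonstrated on a chain graph: marginal correlations are dense while partial correlations recover only the direct connections. A sketch (helper name is mine):

```python
import numpy as np

def partial_correlations(precision):
    """Partial correlations from a precision matrix K:
    r_ij = -K_ij / sqrt(K_ii * K_jj)."""
    d = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(d, d)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr

# Chain graph 1-2-3-4: tridiagonal precision, hence dense covariance.
K = np.array([[2., -1., 0., 0.],
              [-1., 2., -1., 0.],
              [0., -1., 2., -1.],
              [0., 0., -1., 2.]])
cov = np.linalg.inv(K)            # all entries nonzero
pcorr = partial_correlations(K)   # zero exactly on the missing edges
```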

SLIDE 14

1 Independence structure (Markov graph)

Zeros in partial correlations give conditional independence
Reflects the large-scale brain interaction structure

SLIDE 15

1 Independence structure (Markov graph)

Zeros in partial correlations give conditional independence
Ill-posed problem: multi-collinearity ⇒ noisy partial correlations
Independence between nodes makes estimation of partial correlations well-conditioned

⇒ Chicken-and-egg problem

SLIDE 16

1 Independence structure (Markov graph)

Zeros in partial correlations give conditional independence
Ill-posed problem: multi-collinearity ⇒ noisy partial correlations
Independence between nodes makes estimation of partial correlations well-conditioned

⇒ Joint estimation: sparse inverse covariance

SLIDE 17

1 Sparse inverse covariance estimation: penalized

[Varoquaux NIPS 2010] [Smith 2011]

Maximum a posteriori: fit models with a penalty
Sparsity ⇒ Lasso-like problem: ℓ1 penalization

K = argmin_{K≻0} L(Σ̂|K) + λ ℓ1(K)

(data fit: likelihood; penalization: sparsity)
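This ℓ1-penalized estimator is available off-the-shelf in scikit-learn, the deck's own software stack. A minimal sketch with synthetic data (the simulation setup is mine, for illustration):

```python
# Sparse inverse covariance with an l1 penalty, lambda chosen by
# cross-validation, as in the MAP formulation on this slide.
import numpy as np
from sklearn.covariance import GraphicalLassoCV
from sklearn.datasets import make_sparse_spd_matrix

rng = np.random.RandomState(0)
prec = make_sparse_spd_matrix(10, alpha=0.9, random_state=0)  # ground-truth K
cov = np.linalg.inv(prec)
X = rng.multivariate_normal(np.zeros(10), cov, size=200)

model = GraphicalLassoCV().fit(X)
K_hat = model.precision_   # sparse, symmetric positive definite estimate
```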

SLIDE 18

1 Sparse inverse covariance estimation: penalized

Maximum a posteriori: fit models with a penalty
Sparsity ⇒ Lasso-like problem: ℓ1 penalization

K = argmin_{K≻0} L(Σ̂|K) + λ ℓ1(K)

Optimal graph almost dense

[Plot: test-data likelihood and sparsity vs −log10 λ]

SLIDE 19

1 Sparse inverse covariance estimation: penalized

Maximum a posteriori: fit models with a penalty
Sparsity ⇒ Lasso-like problem: ℓ1 penalization

K = argmin_{K≻0} L(Σ̂|K) + λ ℓ1(K)

Optimal graph almost dense

[Plot: test-data likelihood and sparsity vs −log10 λ]

Bias of ℓ1: very sparse graphs don’t fit the data

SLIDE 20

1 Sparse inverse covariance estimation: penalized

Maximum a posteriori: fit models with a penalty
Sparsity ⇒ Lasso-like problem: ℓ1 penalization

K = argmin_{K≻0} L(Σ̂|K) + λ ℓ1(K)

Optimal graph almost dense

[Plot: test-data likelihood and sparsity vs −log10 λ]

Bias of ℓ1: very sparse graphs don’t fit the data

Algorithmic considerations:
- Very ill-conditioned input matrices
- The graph lasso [Friedman 2008] does not work well: a primal-dual algorithm with an approximation when switching from dual to primal [Mazumder 2012]
- Good success with ADMM split optimization: the loss is solved over SPD matrices, the penalty over sparse matrices
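The ADMM split mentioned above can be sketched compactly: the smooth −log|K| + trace(Σ̂K) term is minimized over SPD matrices via an eigenvalue update, and the ℓ1 penalty by elementwise soft-thresholding (a generic sketch of this split, not the authors' code; parameter choices are illustrative):

```python
import numpy as np

def graphical_lasso_admm(S, lam, rho=1.0, n_iter=200):
    """ADMM for min_{K>0} -log|K| + tr(S K) + lam * ||K||_1 (off-diagonal)."""
    p = S.shape[0]
    Z = np.eye(p)            # sparse copy of K
    U = np.zeros((p, p))     # scaled dual variable
    for _ in range(n_iter):
        # K-update over SPD matrices: solve rho*K - K^{-1} = rho*(Z - U) - S
        # in the eigenbasis of the right-hand side; k > 0 by construction.
        w, V = np.linalg.eigh(rho * (Z - U) - S)
        k = (w + np.sqrt(w ** 2 + 4 * rho)) / (2 * rho)
        K = V @ np.diag(k) @ V.T
        # Z-update: elementwise soft-threshold, diagonal left unpenalized
        A = K + U
        Z = np.sign(A) * np.maximum(np.abs(A) - lam / rho, 0.0)
        np.fill_diagonal(Z, np.diag(A))
        U += K - Z
    return K, Z

S = np.array([[1.0, 0.6, 0.2],
              [0.6, 1.0, 0.6],
              [0.2, 0.6, 1.0]])
K, Z = graphical_lasso_admm(S, lam=0.1)
```

The eigenvalue update keeps K exactly SPD at every iteration, which is the point of the split: neither sub-problem ever sees the other's constraint.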

SLIDE 21

1 Very sparse graphs: greedy construction

[Varoquaux J. Physiol. Paris 2012]

Sparse inverse covariance algorithm: PC-DAG [Rütimann & Bühlmann 2009]

Greedy approach:
1. PC algorithm: fill the graph by independence tests, conditioning on neighbors
2. Learn the covariance on the resulting structure

Good for very sparse graphs

SLIDE 22

1 Sparse graphs: greedy construction

[Varoquaux J. Physiol. Paris 2012]

Iterate the construction algorithm: high-degree nodes appear very quickly
complexity ∝ exp(degree)

[Plot: test-data likelihood vs filling factor (percent)]

Lattice-like structure with hubs

SLIDE 23

2 Multi-subject graph learning

Not enough data per subject to recover the structure

SLIDE 24

2 Subject-level data scarcity

Sparse recovery for Gaussian graphs: ℓ1 structure recovery has phase-transition behavior
For Gaussian graphs with s edges and p nodes:

n = O((s + p) log p),  s = o(√p)   [Lam & Fan 2009]

Need to accumulate data across subjects
Concatenate series = i.i.d. data

SLIDE 25

2 Graphs on group data

[Varoquaux NIPS 2010]

[Graph estimates: Σ̂⁻¹, sparse inverse, sparse group concat]

Likelihood of new data (cross-validation):
  Subject data, Σ̂⁻¹:                 57.1
  Subject data, sparse inverse:       43.0
  Group concat data, Σ̂⁻¹:            40.6
  Group concat data, sparse inverse:  41.8

⇒ Inter-subject variability

SLIDE 26

2 Multi-subject modeling

[Varoquaux NIPS 2010]

Common independence structure but different connection values:

{K_s} = argmin_{K_s ≻ 0} Σ_s L(Σ̂_s|K_s) + λ ℓ21({K_s})

(multi-subject data fit: likelihood; group-lasso penalization)

SLIDE 27

2 Multi-subject modeling

[Varoquaux NIPS 2010]

Common independence structure but different connection values:

{K_s} = argmin_{K_s ≻ 0} Σ_s L(Σ̂_s|K_s) + λ ℓ21({K_s})

(multi-subject data fit: likelihood)
ℓ21 penalty: ℓ1 across connections of the ℓ2 norm across subjects
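The ℓ21 penalty is what ties the subjects together: its proximal operator shrinks each edge jointly across subjects, so an edge is either zero for everyone or present for everyone. A minimal numpy sketch of that proximal step (function name and toy data are mine):

```python
import numpy as np

def prox_l21(Ks, thresh):
    """Proximal operator of the l21 penalty on a stack of subject
    precision matrices Ks, shape (n_subjects, p, p): each edge (i, j)
    is shrunk by its l2 norm taken over subjects."""
    norms = np.sqrt((Ks ** 2).sum(axis=0))                      # (p, p)
    scale = np.maximum(1.0 - thresh / np.maximum(norms, 1e-12), 0.0)
    out = Ks * scale
    idx = np.arange(Ks.shape[1])
    out[:, idx, idx] = Ks[:, idx, idx]   # diagonal left unpenalized
    return out

# An edge weak in every subject is removed for all of them at once;
# a strong shared edge survives for all.
Ks = np.zeros((3, 4, 4))
Ks[:, 0, 1] = Ks[:, 1, 0] = 2.0
Ks[:, 2, 3] = Ks[:, 3, 2] = 0.1
out = prox_l21(Ks, thresh=0.5)
```

A full estimator for this model ships in nilearn as `GroupSparseCovarianceCV` in `nilearn.connectome`.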

SLIDE 28

2 Population-sparse graphs perform better

[Varoquaux NIPS 2010]

[Graph estimates: Σ̂⁻¹, sparse inverse, population prior]

Likelihood of new data (cross-validation)         sparsity
  Subject data, Σ̂⁻¹:                 57.1
  Subject data, sparse inverse:       43.0        60% full
  Group concat data, Σ̂⁻¹:            40.6
  Group concat data, sparse inverse:  41.8        80% full
  Group sparse model:                 45.6        20% full

SLIDE 29

2 Independence structure of brain activity: subject-sparse estimate

SLIDE 30

2 Independence structure of brain activity: population-sparse estimate

SLIDE 31

2 Large-scale organization

High-level cognitive function arises from the interplay of specialized brain regions:
“The functional segregation of local areas [...] contrasts sharply with their global integration during perception and behavior” [Tononi 1994]

Functional segregation: nodes of the connectome, atomic functions (e.g. tonotopy)
Global integration: functional networks, high-level functions (e.g. language)

SLIDE 32

2 Large-scale organization

High-level cognitive function arises from the interplay of specialized brain regions:
“The functional segregation of local areas [...] contrasts sharply with their global integration during perception and behavior” [Tononi 1994]

Scale-free hierarchical integration / segregation [Eguiluz 2005]
Graph modularity = divide into communities so as to maximize intra-community connections versus extra-community connections

SLIDE 33

2 Graph cuts to isolate functional communities

Find communities maximizing the modularity:

Q = Σ_{c=1..k} [ A(V_c, V_c)/A(V, V) − ( A(V, V_c)/A(V, V) )² ]

A(V_a, V_b): sum of the edges going from V_a to V_b

Rewrite as an eigenvalue problem on community indicator vectors [White 2005]
⇒ Spectral clustering = spectral embedding + k-means
Similar to normalized graph cuts
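The modularity Q above is straightforward to evaluate for a given partition. A minimal sketch (function name and the two-clique toy graph are mine):

```python
import numpy as np

def modularity(A, labels):
    """Q = sum_c [ A(Vc, Vc)/A(V, V) - (A(V, Vc)/A(V, V))^2 ]
    for an adjacency matrix A and integer community labels."""
    total = A.sum()
    q = 0.0
    for c in np.unique(labels):
        mask = labels == c
        intra = A[np.ix_(mask, mask)].sum() / total   # A(Vc, Vc)/A(V, V)
        degree = A[:, mask].sum() / total             # A(V, Vc)/A(V, V)
        q += intra - degree ** 2
    return q

# Two disconnected triangles, perfectly partitioned: Q = 2*(1/2 - 1/4) = 1/2.
B = np.ones((3, 3)) - np.eye(3)
A = np.block([[B, np.zeros((3, 3))], [np.zeros((3, 3)), B]])
labels = np.array([0, 0, 0, 1, 1, 1])
q = modularity(A, labels)
```

For the spectral relaxation itself, scikit-learn's `SpectralClustering` with `affinity="precomputed"` applies the embedding + k-means recipe the slide describes.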

SLIDE 34

2 Large-scale organization: neural communities (non-sparse estimate)

SLIDE 35

2 Large-scale organization: neural communities = large known functional networks (group-sparse estimate)

SLIDE 36

2 Brain integration between communities

Proposed measure of functional integration: mutual information (Tononi)
[Marrelec 2008, Varoquaux & Craddock 2013]

Integration: I_c1 = ½ log det(K_c1)   (“energy” in a network)
Mutual information: M_c1,c2 = I_{c1∪c2} − I_c1 − I_c2   (“cross-talk” between networks)
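These two quantities are cheap to compute from an estimated precision matrix. A sketch reading the slide's formulas literally, with K_c taken as the precision sub-block on a community's nodes (an assumption of mine; function names are hypothetical):

```python
import numpy as np

def integration(K, nodes):
    """I_c = 1/2 log det(K_c), K_c the precision sub-block on `nodes`."""
    sub = K[np.ix_(nodes, nodes)]
    return 0.5 * np.linalg.slogdet(sub)[1]

def cross_talk(K, c1, c2):
    """M_c1,c2 = I_{c1 u c2} - I_c1 - I_c2."""
    return integration(K, c1 + c2) - integration(K, c1) - integration(K, c2)

# Two communities with no cross edges in K: determinants factor,
# so the mutual information between them is zero.
K = np.block([[np.array([[2.0, 1.0], [1.0, 2.0]]), np.zeros((2, 2))],
              [np.zeros((2, 2)), np.array([[3.0, 0.5], [0.5, 3.0]])]])
m = cross_talk(K, [0, 1], [2, 3])
```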

SLIDE 37

2 Brain integration between communities

[Varoquaux NIPS 2010]

With the population prior vs raw correlations:

[Connectogram; networks: posterior inferior temporal (1, 2), lateral / medial / occipital-pole visual areas, default mode network, fronto-parietal networks, fronto-lateral network, pars opercularis, dorsal motor, ventral motor, auditory, basal ganglia, left putamen, cingulo-insular network, right thalamus]

SLIDE 38

3 Beyond ℓ1 models

[Plot: test-data likelihood and sparsity vs −log10 λ]

SLIDE 39

3 Weighted ℓ1: incorporating an additional prior

[Ng MICCAI 2012]

Not all connections are equally likely
Tractography gives the physical wiring: a noisy estimate of the likelihood of a functional connection
⇒ Provides a soft prior: P(func conn) ∝ exp(−anat conn / σ)

Graph MAP estimate:

K = argmin_{K≻0} L(Σ̂|K) + Σ_{i,j} λ_i,j |K_i,j|,   λ_i,j = λ0 exp(−anat conn_i,j / σ)

SLIDE 40

3 Weighted ℓ1: incorporating an additional prior

[Ng MICCAI 2012]

Not all connections are equally likely
Tractography gives the physical wiring: a noisy estimate of the likelihood of a functional connection
⇒ Provides a soft prior: P(func conn) ∝ exp(−anat conn / σ)

Graph MAP estimate:

K = argmin_{K≻0} L(Σ̂|K) + Σ_{i,j} λ_i,j |K_i,j|,   λ_i,j = λ0 exp(−anat conn_i,j / σ)

Limitation: tractography estimates are unreliable
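The anatomical prior enters the optimization only through the per-edge weights, so it is easy to sketch: build λ_i,j from a tractography matrix, then use it in the weighted soft-threshold step that replaces the uniform one in any ℓ1 solver. Here `anat` is a hypothetical symmetric matrix of tractography strengths, and both function names are mine:

```python
import numpy as np

def edge_penalties(anat, lam0=1.0, sigma=1.0):
    """lambda_ij = lambda0 * exp(-anat_ij / sigma):
    stronger anatomical wiring => weaker l1 penalty."""
    lam = lam0 * np.exp(-anat / sigma)
    np.fill_diagonal(lam, 0.0)   # diagonal left unpenalized
    return lam

def weighted_soft_threshold(K, lam):
    """Proximal step of the weighted l1 penalty sum_ij lam_ij |K_ij|."""
    return np.sign(K) * np.maximum(np.abs(K) - lam, 0.0)

anat = np.array([[0.0, 5.0], [5.0, 0.0]])   # hypothetical tract strengths
lam = edge_penalties(anat, lam0=1.0, sigma=1.0)
K = np.array([[2.0, 0.3], [0.3, 2.0]])
out = weighted_soft_threshold(K, np.full((2, 2), 0.5))
```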

SLIDE 41

3 Reweighted ℓ1: learning an inhomogeneous penalty

[Phlypo MICCAI 2014]

Ideas:
- As in regression, reweighted ℓ1 [Candès 2008]: a first ℓ1 estimate gives a rescaling of the penalties ⇒ support recovery in heteroscedastic settings; equivalent to a non-convex ℓ0 approximation. But we have no edge-level residual.
- As in stability selection [Meinshausen 2010]: edges stable under perturbations are the most likely.

SLIDE 42

3 Reweighted ℓ1: learning an inhomogeneous penalty

[Phlypo MICCAI 2014]

Perturbations: we have many subjects; run an ℓ1 model per subject
⇒ Posterior probability of edge presence, P_ij (fit a binomial)

Reweighting:

K = argmin_{K≻0} L(Σ̂|K) + Σ_{i,j} λ_i,j |K_i,j|,   λ_i,j = λ0 P_ij
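The perturbation step boils down to counting: across per-subject ℓ1 supports, the empirical edge frequency is the maximum-likelihood binomial parameter P_ij, which then rescales the penalties as on the slide. A sketch following the slide's rule literally (function name and toy supports are mine):

```python
import numpy as np

def edge_presence(supports):
    """supports: boolean array (n_subjects, p, p) of per-subject l1
    edge supports; returns the empirical presence frequency P_ij."""
    return np.asarray(supports, dtype=float).mean(axis=0)

# Three subjects' supports on a 3-node graph (illustrative data).
supports = np.array([
    [[1, 1, 0], [1, 1, 0], [0, 0, 1]],
    [[1, 1, 0], [1, 1, 1], [0, 1, 1]],
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
], dtype=bool)
P = edge_presence(supports)
lam = 1.0 * P   # lambda_ij = lambda0 * P_ij, with lambda0 = 1 here
```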

SLIDE 43

Statistical learning for functional connectomes

fMRI: scarcity of data + low SNR
Graphical Gaussian models: sparse inverse covariance
ℓ1 / ℓ21 penalties; iterative non-convexity
Software: Python, open source
http://scikit-learn.org
http://nilearn.github.io

SLIDE 44

Statistical learning for functional connectomes

Complex graph with a modular structure
The communities are cognitive networks that link to behavior

[Connectogram of functional networks, as on slide 37]

Requires the definition of regions [Abraham 2013]

@GaelVaroquaux