Bayesian modelling of brain network data via latent space models
Emanuele Aliverti, University of Padova, Department of Statistical Sciences emanuelealiverti.github.io aliverti@stat.unipd.it 22nd November 2019
Brain Networks
◮ Non-invasive imaging technologies provide accurate data on brain activity and structure at increasing resolution for multiple subjects
◮ Neuro-imaging study comprising data for m = 21 individuals (Landman et al., 2011)
  ◮ n = 68 brain regions (nodes), spatially located
  ◮ Lobe and hemisphere information
  ◮ For each individual, brain network connections (edges)
◮ High-resolution scans with n = 998 regions for m = 5 subjects (Hagmann et al., 2008)
◮ Goal: investigate network connectivity patterns, accounting for anatomical constraints and unobservable patterns (e.g. shapes, functionalities)
◮ For each subject, data can be represented as an (n × n) symmetric adjacency matrix A^(k), k = 1, …, m
◮ a^(k)_ij = a^(k)_ji = 1 if at least one white matter fiber has been observed between regions i = 2, …, n and j = 1, …, i − 1; a^(k)_ij = a^(k)_ji = 0 otherwise
Anatomical information
◮ Spatial coordinates (x_i, y_i, z_i) for the i-th region
◮ Lobe and hemisphere membership
  → lobe_ij = 1 if region i and region j are in the same lobe
  → hemi_ij = 1 if region i and region j are in the same hemisphere
[Figure: adjacency matrix, "Subject 37"]
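The adjacency construction above can be sketched numerically: binarize a symmetric matrix of streamline counts into a^(k)_ij ∈ {0, 1}. A minimal NumPy sketch, where `binary_adjacency` and the toy counts are hypothetical, not part of the original pipeline:

```python
import numpy as np

def binary_adjacency(fiber_counts):
    """Binarize a symmetric matrix of streamline counts:
    a_ij = 1 if at least one fiber connects regions i and j."""
    A = (np.asarray(fiber_counts) > 0).astype(int)
    np.fill_diagonal(A, 0)       # no self-loops
    assert (A == A.T).all()      # adjacency must stay symmetric
    return A

# toy example with n = 4 regions
counts = np.array([[0, 3, 0, 1],
                   [3, 0, 2, 0],
                   [0, 2, 0, 0],
                   [1, 0, 0, 0]])
A = binary_adjacency(counts)
```

Stacking one such matrix per subject gives the collection A^(1), …, A^(m) used throughout.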
◮ Developed in social sciences (e.g. Hoff et al., 2002; Hoff, 2008)
[Figure: latent space embedding of a social network among characters of The Lord of the Rings]
◮ Edges a^(k)_ij are conditionally independent given their own probability π_ij
◮ The probability π_ij of a connection between i and j is a function of their positions in an H-dimensional latent space
Benefits
◮ Reduces dimensionality from n(n − 1)/2 to n × H
◮ Accounts for network properties (e.g. transitivity, homophily)
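The latent distance mechanism can be illustrated directly: connection probabilities decay with the Euclidean distance between latent positions. A hedged sketch, where `connection_probs`, the intercept `alpha` and the toy coordinates are all hypothetical:

```python
import numpy as np

def connection_probs(Z, alpha=0.0):
    """pi_ij = logistic(alpha - ||z_i - z_j||): closer latent
    positions imply higher connection probability."""
    diff = Z[:, None, :] - Z[None, :, :]
    D = np.sqrt((diff ** 2).sum(-1))   # pairwise Euclidean distances
    return 1.0 / (1.0 + np.exp(-(alpha - D)))

# three nodes in an H = 2 latent space; the diagonal is ignored
Z = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 3.0]])
P = connection_probs(Z, alpha=1.0)
# nodes 0 and 1 sit close together, so P[0, 1] exceeds P[0, 2]
```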
◮ Each latent coordinate may be interpreted as a measure of a region's "propensity" for different functions / metabolic processes
◮ Regions with similar propensities are more likely to be connected
Desiderata
◮ Allow modelling of replicated networks over multiple subjects
  → Joint modelling of m networks
◮ Include covariates
  → Connectivity as a function of anatomical constraints (distance, lobes)
◮ Estimate local clusters of brain regions
  → Some regions might share similar features
◮ Focus on modelling A = Σ_{k=1}^{m} A^(k)

(a_ij | π_ij) ∼ Binom(m, π_ij)
logit(π_ij) = β_0 + β_1 hemi_ij + β_2 lobe_ij + β_3 d_ij − d̄_ij

◮ d_ij = √((x_i − x_j)² + (y_i − y_j)² + (z_i − z_j)²)
  → Euclidean distance between regions i and j in the original space
◮ (β_1, β_2, β_3) ∈ R³: effects of hemisphere membership, lobe membership and distance between regions
◮ d̄_ij = √((x̄_i − x̄_j)² + (ȳ_i − ȳ_j)² + (z̄_i − z̄_j)²)
  → Euclidean distance between regions i and j in the latent space
◮ (x̄_i, ȳ_i, z̄_i) ∈ R³: latent coordinates of region i
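The linear predictor above can be evaluated directly. The sketch below uses purely illustrative coefficient values, not the fitted estimates; `edge_probability` is a hypothetical helper:

```python
import numpy as np

def edge_probability(beta, hemi_ij, lobe_ij, d_ij, dbar_ij):
    """pi_ij from
    logit(pi_ij) = b0 + b1*hemi_ij + b2*lobe_ij + b3*d_ij - dbar_ij."""
    eta = (beta[0] + beta[1] * hemi_ij + beta[2] * lobe_ij
           + beta[3] * d_ij - dbar_ij)
    return 1.0 / (1.0 + np.exp(-eta))

beta = np.array([1.0, 0.5, 0.3, -0.4])   # hypothetical coefficients
# same pair of regions, two different latent distances
p_near = edge_probability(beta, 1, 1, d_ij=2.0, dbar_ij=0.5)
p_far  = edge_probability(beta, 1, 1, d_ij=2.0, dbar_ij=4.0)
# a larger latent distance lowers the connection probability
```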
◮ Joint clustering of the latent coordinates might ignore important local differences
◮ Estimate groups of brain regions similar with respect to subsets of the latent features

x̄_i ∼ P_x, P_x = Σ_{h=1}^{H_x} ν_xh N(µ_xh, σ²_xh), i = 1, …, n
ȳ_i ∼ P_y, P_y = Σ_{h=1}^{H_y} ν_yh N(µ_yh, σ²_yh), i = 1, …, n
z̄_i ∼ P_z, P_z = Σ_{h=1}^{H_z} ν_zh N(µ_zh, σ²_zh), i = 1, …, n

◮ Sparse Dirichlet priors on ν_x, ν_y and ν_z to favour deletion of redundant components (Rousseau and Mengersen, 2011)
◮ Gaussian priors for (β_0, β_1, β_2, β_3), conditionally conjugate through the Pólya-Gamma data augmentation (Polson et al., 2013)
◮ Metropolis step for updating the latent coordinates (Euclidean distance)
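A small simulation shows how a sparse Dirichlet prior on the mixture weights concentrates mass on few components, which is what allows redundant components to be emptied. All hyperparameter values and the `sample_mixture_coordinates` helper are illustrative, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mixture_coordinates(n, H=10, concentration=0.1):
    """Draw n univariate latent coordinates from a finite Gaussian
    mixture whose weights come from a sparse Dirichlet: a small
    concentration pushes most of the H weights close to zero."""
    nu = rng.dirichlet(np.full(H, concentration))  # sparse weights
    mu = rng.normal(0.0, 3.0, size=H)              # component means
    comp = rng.choice(H, size=n, p=nu)             # component labels
    return rng.normal(mu[comp], 1.0), nu

# one coordinate (say x-bar) for n = 68 regions
coords, nu = sample_mixture_coordinates(n=68)
```

In the model this is applied independently to the x̄, ȳ and z̄ coordinates, giving three separate partitions of the regions.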
[Figure: estimated latent positions (x−y top view, x−z front view, y−z side view) versus real anatomical positions, with regions coloured by hemisphere (left/right) and lobe (frontal, inter-hemispheric, limbic, parietal, temporal).]
◮ Inference on the separate partitions
[Figure: top views of the latent positions, coloured by the estimated clustering of the x, y and z coordinates.]
◮ ...and on the coefficients

Parameter    Mean   Median   SD     95% CI
Intercept    7.27   7.27     0.18   (6.94, 7.60)
hemisphere   0.60   0.61     0.18   (0.29, 0.92)
lobes        0.24   0.24     0.06   (0.13, 0.35)
distance     —      —        0.06   (−0.47, −0.23)
[Figure: estimated connection probabilities without vs. with latent space positions, with brain regions ordered by lobe (frontal, inter-hemispheric, limbic, temporal, parietal, occipital) and hemisphere (left/right).]
◮ Issue: inference relying on Markov Chain Monte Carlo scales poorly
◮ CPU time
  → n = 68: 2 min × 1000 iterations
  → n = 998: 7 hours × 1000 iterations
◮ Approximate Bayesian inference
  → Analytical approximations (e.g. Laplace)
  → Approximate MCMC (e.g. Alquier et al., 2016)
  → Case-control likelihood (Raftery et al., 2012)
  → Variational inference (e.g. Blei et al., 2017)
◮ Variational inference is widely popular in the network-science literature (Gollini and Murphy, 2016; Salter-Townshend and Murphy, 2013)
◮ The Euclidean distance requires several Taylor expansions of the complete-data log-likelihood
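The case-control idea listed above can be sketched as follows: keep every observed edge ("case") and only a subsample of the non-edges ("controls"), reweighted to the total non-edge count. This is a simplified illustration in the spirit of Raftery et al. (2012); `case_control_loglik` and the toy data are hypothetical:

```python
import numpy as np

def log_sigmoid(x):
    # numerically stable log(1 / (1 + exp(-x)))
    return -np.logaddexp(0.0, -x)

def case_control_loglik(A, eta, n_controls, rng):
    """Bernoulli log-likelihood over the upper triangle, keeping all
    edges and n_controls sampled non-edges, reweighted to the full
    non-edge count."""
    iu = np.triu_indices_from(A, k=1)
    a, e = A[iu], eta[iu]
    ll_edges = log_sigmoid(e[a == 1]).sum()
    nonedges = np.flatnonzero(a == 0)
    sub = rng.choice(nonedges, size=n_controls, replace=False)
    weight = nonedges.size / n_controls
    return ll_edges + weight * log_sigmoid(-e[sub]).sum()

rng = np.random.default_rng(1)
A = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, 0]])
eta = np.array([[0.0, 0.5, -1.0],
                [0.5, 0.0, 0.2],
                [-1.0, 0.2, 0.0]])
# sanity check: when every non-edge is kept, the value is exact
iu = np.triu_indices_from(A, k=1)
exact = (A[iu] * log_sigmoid(eta[iu])
         + (1 - A[iu]) * log_sigmoid(-eta[iu])).sum()
approx = case_control_loglik(A, eta, n_controls=2, rng=rng)
```

In practice n_controls is much smaller than the non-edge count, trading a little variance for a large saving in cost per iteration.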
◮ Latent Factor Model (Hoff, 2008)

(a_ij | π_ij) ∼ Binom(m, π_ij)
logit(π_ij) = β_0 + β_1 hemi_ij + β_2 lobe_ij + β_3 d_ij + d̃_ij

◮ d̃_ij = ψ_x x̃_i x̃_j + ψ_y ỹ_i ỹ_j + ψ_z z̃_i z̃_j
◮ w_i = (x̃_i, ỹ_i, z̃_i) ∈ R³: latent positions of region i
◮ ψ = (ψ_x, ψ_y, ψ_z) ∈ R³: importance of the latent coordinates
◮ Gaussian priors
  β = (β_0, β_1, β_2, β_3)ᵀ ∼ N_4(0, Σ_0), Σ_0 = diag(σ_0, …, σ_3)
  x̃_i ∼ N(0, 1), ỹ_i ∼ N(0, 1), z̃_i ∼ N(0, 1), i = 1, …, n
  (ψ_x, ψ_y, ψ_z) ∼ N_3(0, γ_ψ0 I_3)
◮ Conditional conjugacy introducing (ω_ij | −) ∼ PG(m, logit(π_ij))
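The weighted inner product d̃_ij can be computed for all pairs at once with one matrix product. A minimal sketch; `factor_term` and the toy inputs are hypothetical:

```python
import numpy as np

def factor_term(W, psi):
    """dtilde_ij = psi_x*x_i*x_j + psi_y*y_i*y_j + psi_z*z_i*z_j for
    all pairs: scale the columns of W by psi, then take W W^T."""
    return (W * psi) @ W.T

W = np.array([[1.0, 0.0, 2.0],
              [0.5, 1.0, 0.0],
              [0.0, 2.0, 1.0]])      # rows w_i = (x_i, y_i, z_i)
psi = np.array([0.8, -0.2, 0.5])    # coordinate importances
D = factor_term(W, psi)             # symmetric (n x n) matrix
```

Unlike the Euclidean distance, this bilinear form keeps the log-likelihood linear in each w_i given the rest, which is what makes the variational updates below tractable.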
◮ Variational Bayes: find the best approximation of the true posterior p within a restricted class Q of distributions

q⋆(β, W, ω, ψ) = argmin_{q ∈ Q} KL{q(β, W, ω, ψ) || p(β, W, ω, ψ | A)}

◮ Mean field: product restriction
Q = {q(β, W, ω, ψ) : q(β, W, ω, ψ) = q(β) q(W) q(ω) q(ψ)}
◮ The optimal factors are available analytically, with the same exponential-family form as the full conditionals: q⋆(β) and q⋆(ψ) are Gaussian, q⋆_i(w_i) is Gaussian for i = 1, …, n, and q⋆_ij(ω_ij) is Pólya-Gamma
◮ CAVI: cycle over each variational factor until convergence
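The CAVI cycling can be illustrated on a toy target where the updates are available in closed form: a bivariate Gaussian posterior approximated by a mean-field product of two univariate Gaussians. This is a generic illustration of CAVI, not the model's actual updates; all numbers are hypothetical:

```python
import numpy as np

# target posterior: N(mu, Sigma) with correlated components
mu = np.array([1.0, -2.0])
Sigma = np.array([[1.0, 0.7],
                  [0.7, 1.0]])
Lam = np.linalg.inv(Sigma)      # precision matrix

# mean-field q(theta1)q(theta2): only the means need updating,
# the variational variances are fixed at 1 / Lam[i, i]
m = np.zeros(2)                 # variational means, initialised at 0
for _ in range(100):            # cycle over the factors (CAVI sweeps)
    m[0] = mu[0] - Lam[0, 1] / Lam[0, 0] * (m[1] - mu[1])
    m[1] = mu[1] - Lam[1, 0] / Lam[1, 1] * (m[0] - mu[0])
# for a Gaussian target the mean-field optimum recovers the exact
# posterior means (while underestimating the marginal variances)
```

Each sweep monotonically increases the evidence lower bound, which is the convergence criterion monitored in practice.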
[Figure: estimated latent positions (x̃−ỹ top view, x̃−z̃ front view, ỹ−z̃ side view) versus real positions for the high-resolution data, with regions coloured by hemisphere (left/right) and cortical region (bankssts, central, cingulate, cuneus, entorhinal, frontal, fusiform, lingual, parahippocampal, parietal, parsopercularis, parsorbitalis, parstriangularis, pericalcarine, precuneus, supramarginal, temporal).]
◮ There has been considerable interest in Bayesian modelling of brain network data
◮ We extended classical latent space models for networks to handle replicated networks, inclusion of covariates and marginal partitioning of the latent coordinates
◮ Results suggest a general tendency of brain regions to connect within the same hemisphere
◮ However, these determinants are not sufficient to explain brain connectivity, and inference on the latent space provides additional insights on the architecture not explained by physical constraints

Emanuele Aliverti and Daniele Durante (2019). "Spatial modeling of brain connectivity data via latent distance models with nodes clustering". In: Statistical Analysis and Data Mining: The ASA Data Science Journal 12.3, pp. 185–196
Emanuele Aliverti and Massimiliano Russo (2019). Scalable inference for the network factor model
Aliverti, Emanuele and Daniele Durante (2019). "Spatial modeling of brain connectivity data via latent distance models with nodes clustering". In: Statistical Analysis and Data Mining: The ASA Data Science Journal 12.3, pp. 185–196.
Aliverti, Emanuele and Massimiliano Russo (2019). Scalable inference for the network factor model.
Alquier, Pierre et al. (2016). "Noisy Monte Carlo: Convergence of Markov chains with approximate transition kernels". In: Statistics and Computing 26.1-2, pp. 29–47.
Blei, David M, Alp Kucukelbir, and Jon D McAuliffe (2017). "Variational inference: A review for statisticians". In: Journal of the American Statistical Association 112.518, pp. 859–877.
Gollini, Isabella and Thomas Brendan Murphy (2016). "Joint modeling of multiple network views". In: Journal of Computational and Graphical Statistics 25.1, pp. 246–265.
Hagmann, Patric et al. (2008). "Mapping the structural core of human cerebral cortex". In: PLoS Biology 6.7, e159.
Hoff, P.D. (2008). "Modeling homophily and stochastic equivalence in symmetric relational data". In: Advances in Neural Information Processing Systems, pp. 657–664.
Hoff, P.D., A.E. Raftery, and M.S. Handcock (2002). "Latent space approaches to social network analysis". In: Journal of the American Statistical Association 97.460, pp. 1090–1098.
Landman, Bennett A et al. (2011). "Multi-parametric neuroimaging reproducibility: a 3-T resource study". In: Neuroimage 54.4, pp. 2854–2866.
Polson, Nicholas G, James G Scott, and Jesse Windle (2013). "Bayesian inference for logistic models using Pólya–Gamma latent variables". In: Journal of the American Statistical Association 108.504, pp. 1339–1349.
Raftery, Adrian E et al. (2012). "Fast inference for the latent space network model using a case-control approximate likelihood". In: Journal of Computational and Graphical Statistics 21.4, pp. 901–919.
Rousseau, J. and K. Mengersen (2011). "Asymptotic behaviour of the posterior distribution in overfitted mixture models". In: Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73.5, pp. 689–710.
Salter-Townshend, Michael and Thomas Brendan Murphy (2013). "Variational Bayesian inference for the latent position cluster model for network data". In: Computational Statistics & Data Analysis 57.1, pp. 661–671.