Manifold-regression to predict from MEG/EEG brain signals without source modeling

  • D. Sabbagh, P. Ablin, G. Varoquaux, A. Gramfort, D. Engemann

NeurIPS 2019

1 / 14

Non-invasive measure of brain activity

2 / 14

Objective: predict a variable from M/EEG brain signals

We want to predict a continuous variable from M/EEG brain signals.

3 / 14

Model: generative model of M/EEG data

We measure the M/EEG signal of subject i = 1…N on P channels:

  x_i(t) = A s_i(t) + n_i(t) ∈ R^P,  t = 1…T

  • mixing matrix A = [a_1, …, a_Q] ∈ R^{P×Q}, fixed across subjects
  • source patterns a_j ∈ R^P, j = 1…Q, with Q < P
  • source vector s_i(t) ∈ R^Q
  • noise n_i(t) ∈ R^P

Under stationarity and Gaussianity assumptions, we can represent the band-pass filtered signal by its second-order statistics:

  C_i = E[x_i(t) x_i(t)⊤] ≃ X_i X_i⊤ / T ∈ R^{P×P},  with X_i ∈ R^{P×T}

4 / 14
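
For illustration, here is a minimal NumPy/SciPy sketch (not part of the slides) of this second-order representation: band-pass filter one subject's recording X_i ∈ R^{P×T} and form C_i ≃ X_i X_i⊤ / T. The function name `band_covariance`, the band edges and the sampling rate are placeholder choices for the example.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def band_covariance(X, sfreq, band=(8.0, 12.0)):
    """Empirical covariance of a band-pass filtered recording.

    X     : array, shape (P, T), one subject's multichannel signal
    sfreq : sampling frequency in Hz
    band  : (low, high) band edges in Hz (placeholder values)
    """
    # 4th-order Butterworth band-pass, applied forward-backward (zero phase)
    nyq = sfreq / 2.0
    b, a = butter(4, [band[0] / nyq, band[1] / nyq], btype="band")
    X_f = filtfilt(b, a, X, axis=-1)
    # Second-order statistics: C ~= X X^T / T
    return X_f @ X_f.T / X_f.shape[-1]

# Toy example: P = 10 channels, T = 10,000 samples at 200 Hz
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 10_000))
C = band_covariance(X, sfreq=200.0)
print(C.shape)  # (10, 10)
```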

Model: generative model of target variable

We want to predict a continuous variable:

  y_i = Σ_{j=1}^{Q} α_j f(p_{i,j}) ∈ R,  with p_{i,j} = E_t[s_{i,j}(t)²] the band-power of the sources

  • linear case: y_i = Σ_{j=1}^{Q} α_j p_{i,j}
    Euclidean vectorization leads to a consistent model: y_i = Σ_{k≤l} Θ_{k,l} C_i(k, l), i.e. y_i is linear in the coefficients of Upper(C_i)

  • log-linear case: y_i = Σ_{j=1}^{Q} α_j log(p_{i,j})
    The C_i live on a Riemannian manifold, so they cannot be naively vectorized

5 / 14
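
The consistency claim for the linear case can be checked numerically. The sketch below (illustrative, not the authors' code; all names and sizes are arbitrary) draws source powers p_i, builds noiseless covariances C_i = A diag(p_i) A⊤ and targets y_i = Σ_j α_j p_{i,j}, then fits an ordinary linear model on the coefficients of Upper(C_i): the fit is exact. Under the log-linear model the same Euclidean fit is no longer exact, which is what motivates the Riemannian treatment that follows.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
N, P, Q = 200, 6, 4                    # subjects, channels, sources (toy sizes)
A = rng.standard_normal((P, Q))        # mixing matrix, shared across subjects
alpha = rng.standard_normal(Q)         # weights on the source band-powers

iu = np.triu_indices(P)                # indices of Upper(C_i)
V, y = [], []
for _ in range(N):
    p = rng.uniform(0.1, 2.0, size=Q)  # source band-powers p_{i,j}
    C = A @ np.diag(p) @ A.T           # noiseless covariance C_i
    V.append(C[iu])                    # Euclidean vectorization Upper(C_i)
    y.append(alpha @ p)                # linear target y_i = sum_j alpha_j p_{i,j}
V, y = np.array(V), np.array(y)

reg = LinearRegression().fit(V, y)
print(reg.score(V, y))                 # ~1.0: y_i is exactly linear in Upper(C_i)
```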

Riemannian matrix manifolds (in a nutshell)

[Figure: a manifold with a point M, its tangent space T_M containing a tangent vector ξ, and the maps Log_M and Exp_M relating M to a nearby point M′ on the manifold]

Vectorization operator: P_M(M′) = φ_M(Log_M(M′)) ≃ Upper(Log_M(M′))

  d(M, M′) = ‖P_M(M′)‖_2 + o(‖P_M(M′)‖_2)
  d(M_i, M_j) ≃ ‖P_M(M_i) − P_M(M_j)‖_2

The vectorization operator is key for machine learning.

[Absil et al. Optimization algorithms on matrix manifolds. 2009]

7 / 14

Regression on matrix manifolds

Given a training set of samples M_1, …, M_N ∈ M and continuous target variables y_1, …, y_N ∈ R:

  • compute the mean of the samples: M̄ = Mean_d(M_1, …, M_N)
  • compute the vectorization of the samples w.r.t. this mean: v_1, …, v_N ∈ R^K, with v_i = P_M̄(M_i)
  • use those vectors as features in a regularized linear regression algorithm (e.g. ridge regression) with parameters β ∈ R^K, assuming y_i ≃ v_i⊤ β

8 / 14
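
Below is a compact, self-contained Python sketch of these three steps (an illustrative implementation, not the authors' code): a simple fixed-point scheme for the geometric mean under the affine-invariant metric, tangent-space vectorization at that mean, and ridge regression on the resulting vectors. The √2 weighting of off-diagonal terms is the usual convention that makes Euclidean norms of the vectors match Frobenius norms of the tangent matrices; libraries such as pyriemann offer more robust equivalents.

```python
import numpy as np
from numpy.linalg import eigh
from sklearn.linear_model import Ridge

def _sqrt_and_invsqrt(S):
    """Matrix square root and inverse square root of an SPD matrix."""
    w, U = eigh(S)
    return (U * np.sqrt(w)) @ U.T, (U / np.sqrt(w)) @ U.T

def _logm_spd(S):
    """Matrix logarithm of an SPD matrix."""
    w, U = eigh(S)
    return (U * np.log(w)) @ U.T

def geometric_mean(covs, n_iter=20):
    """Karcher mean under the affine-invariant metric (simple fixed-point scheme)."""
    M = np.mean(covs, axis=0)                        # start from the arithmetic mean
    for _ in range(n_iter):
        M_sqrt, M_isqrt = _sqrt_and_invsqrt(M)
        # average the log-maps of the samples taken at the current estimate M
        T = np.mean([_logm_spd(M_isqrt @ C @ M_isqrt) for C in covs], axis=0)
        w, U = eigh(T)
        M = M_sqrt @ (U * np.exp(w)) @ U.T @ M_sqrt  # exp-map back to the manifold
    return M

def tangent_vectors(covs, M):
    """v_i = Upper(log(M^{-1/2} C_i M^{-1/2})), off-diagonal terms scaled by sqrt(2)."""
    _, M_isqrt = _sqrt_and_invsqrt(M)
    iu = np.triu_indices(M.shape[0])
    weights = np.where(iu[0] == iu[1], 1.0, np.sqrt(2.0))
    return np.array([weights * _logm_spd(M_isqrt @ C @ M_isqrt)[iu] for C in covs])

# Toy usage: random covariances and random targets
rng = np.random.default_rng(0)
covs = []
for _ in range(40):
    X = rng.standard_normal((5, 50))
    covs.append(X @ X.T / 50)            # SPD sample covariance (5 channels)
covs = np.array(covs)
y = rng.standard_normal(40)

M_bar = geometric_mean(covs)
V = tangent_vectors(covs, M_bar)
model = Ridge(alpha=1.0).fit(V, y)       # y_i ~ v_i^T beta
print(model.predict(V[:3]))
```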

Distance and invariance on positive matrix manifolds

Manifold of positive definite matrices: M_i = C_i ∈ S++_P

Geometric distance:

  d_G(S, S′) = ‖log(S⁻¹S′)‖_F = ( Σ_{k=1}^{P} log² λ_k )^{1/2}

where λ_k, k = 1…P, are the real eigenvalues of S⁻¹S′.

Tangent space projection: P_S(S′) = Upper(log(S^{−1/2} S′ S^{−1/2}))

Affine invariance property: for W invertible, d_G(W⊤SW, W⊤S′W) = d_G(S, S′)

Affine invariance is key: working with the C_i is then equivalent to working with the covariance matrices of the sources s_i.

[Bhatia. Positive Definite Matrices. Princeton University Press, 2007]

9 / 14
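
A short numeric check of the geometric distance and its affine invariance (a sketch, not from the slides; the helper names and the 4×4 sizes are arbitrary):

```python
import numpy as np
from scipy.linalg import eigvals

def geometric_distance(S, S_prime):
    """d_G(S, S') = sqrt(sum_k log^2(lambda_k)), lambda_k eigenvalues of S^{-1} S'."""
    lam = np.real(eigvals(np.linalg.solve(S, S_prime)))
    return np.sqrt(np.sum(np.log(lam) ** 2))

rng = np.random.default_rng(1)

def random_spd(P):
    X = rng.standard_normal((P, 2 * P))
    return X @ X.T / (2 * P)             # SPD sample covariance

S, S_prime = random_spd(4), random_spd(4)
W = rng.standard_normal((4, 4))          # invertible with probability 1

d = geometric_distance(S, S_prime)
d_affine = geometric_distance(W.T @ S @ W, W.T @ S_prime @ W)
print(np.isclose(d, d_affine))           # True: affine invariance
```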

Consistency of linear regression in tangent space of S++_P

Geometric vectorization: assume y_i = Σ_{j=1}^{Q} α_j log(p_{i,j}). Denote C̄ = Mean_G(C_1, …, C_N) and v_i = P_C̄(C_i). Then the relationship between y_i and v_i is linear.

We generate i.i.d. samples following the log-linear generative model, with A = exp(µB), B ∈ R^{P×P} random.

[Figure: normalized MAE as a function of µ (0 to 3) for the log-diag, Wasserstein and geometric vectorizations, with the chance level shown for reference]
10 / 14
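
To reproduce the flavour of this simulation, here is a small data generator following the slide's description (the function name, the toy sizes and the uniform range of the band-powers are illustrative choices). Its output can be fed to a vectorization-plus-ridge pipeline such as the one sketched earlier, comparing log-diag features with geometric tangent-space features as µ grows.

```python
import numpy as np
from scipy.linalg import expm

def simulate_log_linear(N=100, P=5, mu=1.0, seed=0):
    """i.i.d. samples from the log-linear generative model:
    A = expm(mu * B) with B random, C_i = A diag(p_i) A^T,
    y_i = sum_j alpha_j log(p_{i,j})."""
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((P, P))
    A = expm(mu * B)                       # mixing; drifts away from identity as mu grows
    alpha = rng.standard_normal(P)
    covs, y = [], []
    for _ in range(N):
        p = rng.uniform(0.1, 2.0, size=P)  # source band-powers p_{i,j}
        covs.append(A @ np.diag(p) @ A.T)
        y.append(alpha @ np.log(p))
    return np.array(covs), np.array(y)

covs, y = simulate_log_linear(mu=0.5)
print(covs.shape, y.shape)                 # (100, 5, 5) (100,)
```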

And in the real world?

We investigated 3 violations of the previous idealized model:

  • Noise in the target variable: y_i = Σ_j α_j log(p_{i,j}) + ε_i, with ε_i ∼ N(0, σ²)
  • Subject-dependent mixing matrix: A_i = A + E_i, with entries of E_i ∼ N(0, σ²)

[Figures: normalized MAE as a function of σ for each violation, comparing the log-diag, sup. log-diag, Wasserstein, geometric and sup. geometric vectorizations against the chance level]
11 / 14

And in the real world?

Rank-deficient signals (e.g. after a cleaning process): C_i ∈ S+_{P,R}

We can manipulate them in their native manifold S+_{P,R}:

  • Wasserstein distance: d_W(S, S′) = [ Tr(S) + Tr(S′) − 2 Tr((S^{1/2} S′ S^{1/2})^{1/2}) ]^{1/2}
  • Tangent space projection: P_{YY⊤}(Y′Y′⊤) = vect(Y′Q* − Y) ∈ R^{PR}, where UΣV⊤ = Y⊤Y′ and Q* = VU⊤
  • Orthogonal invariance property: for W orthogonal, d_W(W⊤SW, W⊤S′W) = d_W(S, S′)

Wasserstein vectorization: assume y_i = Σ_{j=1}^{Q} α_j √p_{i,j} and A orthogonal. Denote C̄ = Mean_W(C_1, …, C_N) and v_i = P_C̄(C_i). Then the relationship between y_i and v_i is linear.

[Massart, Absil. Quotient geometry with simple geodesics for the manifold of fixed-rank positive-semidefinite matrices. Technical report, UCLouvain, 2018]

12 / 14
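
As with the geometric case, the Wasserstein distance and its orthogonal invariance are easy to check numerically. The sketch below (illustrative, not the authors' code; helper names and sizes are arbitrary) computes the matrix square roots by eigendecomposition with clipping so it stays well behaved on rank-deficient inputs.

```python
import numpy as np

def psd_sqrtm(S):
    """Symmetric square root of a positive semi-definite matrix."""
    w, U = np.linalg.eigh(S)
    w = np.clip(w, 0.0, None)               # clip tiny negative eigenvalues
    return (U * np.sqrt(w)) @ U.T

def wasserstein_distance(S, S_prime):
    """d_W(S, S') = sqrt(Tr S + Tr S' - 2 Tr((S^{1/2} S' S^{1/2})^{1/2}))."""
    S_half = psd_sqrtm(S)
    cross = S_half @ S_prime @ S_half
    cross = psd_sqrtm((cross + cross.T) / 2)
    return np.sqrt(np.trace(S) + np.trace(S_prime) - 2 * np.trace(cross))

rng = np.random.default_rng(2)

def random_psd(P, R):
    Y = rng.standard_normal((P, R))
    return Y @ Y.T                           # rank-R positive semi-definite

S, S_prime = random_psd(6, 3), random_psd(6, 3)
W, _ = np.linalg.qr(rng.standard_normal((6, 6)))  # random orthogonal matrix

d = wasserstein_distance(S, S_prime)
d_rot = wasserstein_distance(W.T @ S @ W, W.T @ S_prime @ W)
print(np.isclose(d, d_rot))                  # True: orthogonal invariance
```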

Experiment: predict age from MEG data

  • Task-free MEG recordings from the Cam-CAN dataset
  • Age is a dominant driver of cross-person variance in neuroscience data
  • Dimensions: N = 595, P = 102, T ≃ 520,000, 65 ≤ R_i ≤ 73
  • Signals are filtered into 9 frequency bands

[Figure: mean absolute error (years), roughly 6 to 11, for the log-diag, Wasserstein, geometric and MNE methods, grouped under the labels biophysics, unsupervised identity and supervised identity]

13 / 14

Conclusion

  • Proposed a Riemannian method for regression from M/EEG data
  • With theoretical guarantees and empirical robustness
  • Working to translate the proposed method into the clinic

[Sabbagh, Ablin, Varoquaux, Gramfort, Engemann (2019). Manifold-regression to predict from MEG/EEG brain signals without source modeling. Proc. NeurIPS 2019] https://arxiv.org/abs/1906.02687

14 / 14