
Abstract | Structural equation models | Applications in brain imaging | Extensions | Conclusion

Linear Non-Gaussian Acyclic Model for Causal Discovery

Aapo Hyvärinen

Dept of Computer Science, University of Helsinki, Finland

with Patrik Hoyer, Shohei Shimizu, Kun Zhang, Steve M. Smith

Aapo Hyvärinen, Linear Non-Gaussian Acyclic Model for Causal Discovery


Abstract

◮ Estimating causal direction is a fundamental problem in science
◮ Bayesian networks or structural equation models (SEM) are ill-defined (not identifiable) for Gaussian data
◮ For non-Gaussian data, SEM is identifiable (Shimizu et al., JMLR, 2006)
◮ The theory is closely related to independent component analysis (ICA)
◮ A simple approach is possible based on likelihood ratios of variable pairs (Hyvärinen and Smith, JMLR, 2013)


Practical models for causal discovery

◮ Model connections between the measured variables: which variable causes which?
◮ “Discovery” means a data-driven approach
◮ “Correlation does not equal causation”, but we can go beyond correlation
◮ Two fundamental approaches:
  ◮ If we have time series and the time resolution of the measurements is fast enough, we may be able to use autoregressive modelling (Granger causality)
  ◮ Otherwise, use structural equation models (the approach taken here)


Structural equation models

◮ How does an externally imposed change in one variable affect the others?
◮ Assume influences are linear, and all variables observable:

$$x_i = \sum_{j \neq i} b_{ij} x_j + e_i \quad \text{for all } i$$

◮ Difficult to estimate: not simple regression
◮ Classic methods fail: the model is not identifiable
◮ Becomes identifiable if the data are non-Gaussian (Shimizu et al., JMLR, 2006)

[Figure: example causal graph over variables x1, ..., x7, with connection strengths on the directed edges]


Starting point: Two variables

◮ Consider two random variables, x and y, both standardized (zero mean, unit variance)
◮ Goal: distinguish between two statistical models:

$$y = \rho x + d \qquad (x \to y) \qquad (1)$$
$$x = \rho y + e \qquad (y \to x) \qquad (2)$$

where the disturbances d and e are independent of x and y, respectively.
◮ If the variables are Gaussian, the situation is completely symmetric:
  ◮ The variance explained is the same for both models
  ◮ The likelihood is the same for both models (a simple function of ρ)
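A minimal numerical check of this symmetry, assuming both models are scored by their Gaussian log-likelihoods with ρ set to the sample correlation of the standardized data. The helper names `gauss_loglik` and `loglik_both`, and the Laplace-distributed test data, are illustrative choices, not from the source.

```python
import numpy as np

def gauss_loglik(u, var):
    """Sum of log N(u_t; 0, var) over all samples."""
    return np.sum(-0.5 * np.log(2 * np.pi * var) - u**2 / (2 * var))

def loglik_both(x, y):
    """Gaussian log-likelihoods of the models x -> y and y -> x."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    rho = np.mean(x * y)            # sample correlation = regression coefficient
    res_var = 1.0 - rho**2          # residual variance, identical in both directions
    ll_xy = gauss_loglik(x, 1.0) + gauss_loglik(y - rho * x, res_var)
    ll_yx = gauss_loglik(y, 1.0) + gauss_loglik(x - rho * y, res_var)
    return ll_xy, ll_yx

rng = np.random.default_rng(0)
x = rng.laplace(size=2000)
y = 0.7 * x + rng.normal(size=2000)   # data generated with x -> y
ll_xy, ll_yx = loglik_both(x, y)
print(ll_xy, ll_yx)
```

The two numbers agree up to rounding even though the data were generated with x → y: a Gaussian model simply has no way to tell the directions apart.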


Non-Gaussianity comes to the rescue

Real-life signals are often non-Gaussian

[Figure: three example signals over time and their probability distributions]


Assumption of non-Gaussianity

◮ We assume that in each model, the regressor or the residual (or both) is non-Gaussian:

$$y = \rho x + d \qquad (x \to y) \qquad (3)$$
$$x = \rho y + e \qquad (y \to x) \qquad (4)$$

where the disturbances d and e are independent of x and y, respectively.
◮ Non-Gaussianity breaks the symmetry between x and y (Dodge and Rousson, 2001; Shimizu et al., 2006).
◮ We can then simply compare the likelihoods of the models.


Illustration of symmetry-breaking

[Figure: pairs of scatter plots of x and y, for non-Gaussian data and for Gaussian data]


Intuitive idea behind non-Gaussianity

◮ Central limit theorem: sums of independent variables tend to be more Gaussian
◮ Assume (just on this slide!) that the residuals are Gaussian
  ◮ For y = ρx + d, y must then be more Gaussian than x
  ◮ So, causality must go from the less Gaussian variable to the more Gaussian one
◮ We could measure non-Gaussianity with classical measures, e.g. kurtosis or skewness, and just look at the difference between the kurtoses of x and y.
◮ This is a simple illustration with its flaws:
  ◮ The method fails for non-Gaussian residuals
  ◮ Kurtosis and skewness are not good measures of non-Gaussianity by classical statistical criteria (asymptotic variance, robustness)
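The toy rule above (compare kurtoses, take the less Gaussian variable as the cause) can be sketched as follows. It assumes Gaussian residuals, exactly as on this slide; the Laplace-distributed cause, the coefficient 0.7 and the function names are illustrative choices, not from the source.

```python
import numpy as np

def excess_kurtosis(u):
    """Sample excess kurtosis E{z^4} - 3 of the standardized variable z."""
    z = (u - u.mean()) / u.std()
    return np.mean(z**4) - 3.0

def direction_by_kurtosis(x, y):
    """Toy rule: the less Gaussian variable (larger |excess kurtosis|) is taken
    as the cause. Only sensible if the residuals are Gaussian."""
    if abs(excess_kurtosis(x)) > abs(excess_kurtosis(y)):
        return "x -> y"
    return "y -> x"

rng = np.random.default_rng(0)
n = 100_000
x = rng.laplace(size=n)          # non-Gaussian cause (excess kurtosis 3)
d = rng.normal(size=n)           # Gaussian residual, as assumed on this slide
y = 0.7 * x + d                  # adding d pulls y towards Gaussianity
print(direction_by_kurtosis(x, y))
```

As the slide warns, nothing in this rule accounts for a non-Gaussian residual, which can make the effect as non-Gaussian as the cause.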


Likelihood ratio and non-Gaussianity

◮ Principled approach (Hyvärinen and Smith, JMLR, 2013)
  ◮ Ratio of the probabilities that the data come from the two models
  ◮ Asymptotic limit of the log-likelihood ratio, with T the number of observations:

$$\lim_{T \to \infty} \frac{1}{T} \log \frac{L(x \to y)}{L(y \to x)} = -H(x) - H(d/\sigma_d) + H(y) + H(e/\sigma_e)$$

with H the differential entropy; residuals d = y − ρx, e = x − ρy with variances $\sigma_d^2$, $\sigma_e^2$.
◮ Entropy is maximized by the Gaussian distribution (for fixed variance)
◮ The log-likelihood ratio is thus

nongaussianity(x) + nongaussianity(residual x → y) − nongaussianity(y) − nongaussianity(residual y → x)
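A minimal sketch of estimating this ratio from data, assuming the differential entropies are replaced by a maximum-entropy approximation built on the contrast functions log cosh(u) and u exp(−u²/2). The constants k1 = 79.047, k2 = 7.4129 and γ = 0.37457 (γ ≈ E{log cosh ν} for a standard Gaussian ν) are assumed values for that approximation, and all function names and test data are illustrative.

```python
import numpy as np

def standardize(u):
    return (u - u.mean()) / u.std()

def entropy_approx(u):
    """Approximate differential entropy of u after standardization
    (maximum-entropy approximation with log cosh and u*exp(-u^2/2) contrasts)."""
    u = standardize(u)
    k1, k2, gamma = 79.047, 7.4129, 0.37457      # assumed approximation constants
    h_gauss = 0.5 * (1.0 + np.log(2.0 * np.pi))  # entropy of N(0, 1)
    return (h_gauss
            - k1 * (np.mean(np.log(np.cosh(u))) - gamma) ** 2
            - k2 * np.mean(u * np.exp(-u**2 / 2.0)) ** 2)

def loglik_ratio(x, y):
    """Estimate of -H(x) - H(d/sd) + H(y) + H(e/se); positive favours x -> y."""
    x, y = standardize(x), standardize(y)
    rho = np.mean(x * y)          # correlation coefficient
    d = y - rho * x               # residual under x -> y
    e = x - rho * y               # residual under y -> x
    # entropy_approx standardizes internally, which handles d/sigma_d and e/sigma_e.
    return (-entropy_approx(x) - entropy_approx(d)
            + entropy_approx(y) + entropy_approx(e))

rng = np.random.default_rng(0)
n = 100_000
x = rng.laplace(size=n)
y = 0.7 * x + rng.normal(size=n)
print(loglik_ratio(x, y), loglik_ratio(y, x))
```

Swapping the arguments swaps the roles of d and e, so the estimate is antisymmetric by construction; only its sign is used to pick a direction.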


Likelihood ratios and independence

◮ We can equally interpret the likelihood ratio in terms of independence
◮ We had the asymptotic limit of the likelihood ratio as

$$\lim_{T \to \infty} \frac{1}{T} \log \frac{L(x \to y)}{L(y \to x)} = -H(x) - H(d/\sigma_d) + H(y) + H(e/\sigma_e) \qquad (5)$$

with H the differential entropy; residuals d = y − ρx, e = x − ρy with variances $\sigma_d^2$, $\sigma_e^2$.
◮ Mutual information I(u, v) = H(u) + H(v) − H(u, v) measures statistical dependence
◮ The log-likelihood ratio can be manipulated to give

$$I(y, e) - I(x, d) \qquad (6)$$

since the joint-entropy terms H(x, d) and H(y, e) cancel.
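The cancellation behind (6) can be spelled out in a few lines. This assumes x and y are standardized, so that $\sigma_d^2 = \sigma_e^2 = 1 - \rho^2$, and uses only two facts: the linear maps (x, y) ↦ (x, d) and (x, y) ↦ (y, e) have Jacobian determinant of absolute value 1, and H(u/σ) = H(u) − log σ.

```latex
\begin{aligned}
H(x, d) &= H(x, y) = H(y, e) \\
I(y, e) - I(x, d) &= \bigl[H(y) + H(e) - H(y, e)\bigr] - \bigl[H(x) + H(d) - H(x, d)\bigr] \\
&= H(y) + H(e) - H(x) - H(d) \\
&= -H(x) - \bigl[H(d/\sigma_d) + \log\sigma_d\bigr] + H(y) + \bigl[H(e/\sigma_e) + \log\sigma_e\bigr] \\
&= -H(x) - H(d/\sigma_d) + H(y) + H(e/\sigma_e)
\end{aligned}
```

The last line is the right-hand side of (5). Under x → y the residual d is independent of x, so I(x, d) = 0 and the ratio reduces to I(y, e) ≥ 0.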


Even simpler approximation of likelihood ratios

◮ We can make first-order approximations to obtain:

$$\frac{1}{T} \log \frac{L(x \to y)}{L(y \to x)} \approx \frac{\rho}{T} \sum_t \bigl[-x_t\, g(y_t) + g(x_t)\, y_t\bigr]$$

where typically g(u) = −tanh(u) and ρ is the correlation coefficient.
◮ Choosing between the models thus reduces to considering the sign of a nonlinear correlation
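The approximation above is a one-liner on standardized data. This sketch assumes sparse (super-Gaussian) variables, for which g(u) = −tanh(u) is the usual choice; the Laplace cause, the coefficient 0.3 and the function names are illustrative.

```python
import numpy as np

def g(u):
    """Nonlinearity g(u) = -tanh(u), the usual choice for sparse data."""
    return -np.tanh(u)

def lr_tanh(x, y):
    """First-order approximation of (1/T) log L(x->y)/L(y->x).
    Positive values favour x -> y, negative values favour y -> x."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    rho = np.mean(x * y)                       # correlation coefficient
    return rho * np.mean(-x * g(y) + g(x) * y)

rng = np.random.default_rng(1)
n = 100_000
x = rng.laplace(size=n)                        # sparse (super-Gaussian) cause
y = 0.3 * x + rng.normal(size=n)               # Gaussian disturbance
print(lr_tanh(x, y), lr_tanh(y, x))
```

No entropies or densities need to be estimated: the decision is the sign of a single nonlinear correlation, which makes this version cheap enough to run on every pair of variables.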


Definition of Linear non-Gaussian Acyclic Model (LiNGAM)

◮ Given the general, n-dimensional SEM

$$x_i = \sum_{j \neq i} b_{ij} x_j + e_i \quad \text{for all } i,$$

make the following assumptions:
1. the $e_i(t)$ are non-Gaussian, e.g. sparse
  ◮ Crucial departure from the classical framework
2. the $e_i(t)$ are mutually independent
3. the $b_{ij}$ are acyclic
  ◮ Not completely necessary, but simplifies the theory
  ◮ Variables can be ordered so that connections only go “forward”
  ◮ Could also mean we are analysing the “main directions”
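To make the three assumptions concrete, here is a minimal simulation sketch, assuming Laplace disturbances (a sparse choice) and random connection strengths; `simulate_lingam` is a hypothetical helper. Acyclicity is imposed by building a strictly lower-triangular matrix in a hidden causal order and then relabelling the variables, so the resulting B is nilpotent.

```python
import numpy as np

def simulate_lingam(n_vars=4, n_samples=5000, seed=0):
    """Draw samples from x = Bx + e with acyclic B and independent Laplace e."""
    rng = np.random.default_rng(seed)
    # Strictly lower-triangular weights: variable k may only depend on variables < k.
    weights = (rng.uniform(0.5, 1.5, (n_vars, n_vars))
               * rng.choice([-1.0, 1.0], (n_vars, n_vars)))
    L = np.tril(weights, k=-1)
    # Hide the causal order by relabelling the variables with a random permutation.
    P = np.eye(n_vars)[rng.permutation(n_vars)]
    B = P @ L @ P.T
    # Assumptions 1 and 2: independent, non-Gaussian (sparse) disturbances.
    e = rng.laplace(size=(n_vars, n_samples))
    # Solve (I - B) x = e, i.e. x = (I - B)^{-1} e, for all samples at once.
    x = np.linalg.solve(np.eye(n_vars) - B, e)
    return x, B, e

x, B, e = simulate_lingam()
# Assumption 3: an acyclic B is nilpotent, so B^n = 0.
print(np.allclose(np.linalg.matrix_power(B, B.shape[0]), 0))
```

After relabelling, B is generally not triangular in the given variable order, which is exactly the situation an estimation method has to undo.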


Another viewpoint on the importance of non-Gaussianity

◮ A Gaussian distribution is completely determined by its covariances (and means)
◮ The number of covariances is ≈ n²/2 due to symmetry
◮ So, we cannot solve for the ≈ n² connections! (“More unknowns than equations”)
◮ This is why Gaussian methods (PCA, factor analysis, classic SEM) are fundamentally underdetermined
◮ For non-Gaussian data, we can use information other than covariances
  ◮ Nonlinear correlations, e.g. $E\{x_i x_j^2\}$, and higher-order statistics
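A small numerical illustration of the last point, assuming a two-variable model y = ρx + d with Gaussian d; the centred exponential distribution for the non-Gaussian case and ρ = 0.6 are arbitrary illustrative choices. For a standardized x with $E\{x^3\} = 2$ one expects $E\{x^2 y\} = \rho E\{x^3\}$ and $E\{x y^2\} = \rho^2 E\{x^3\}$, so these cross-moments are asymmetric in x and y, whereas for Gaussian data both vanish.

```python
import numpy as np

def cross_moments(x, y):
    """Return (E{x^2 y}, E{x y^2}) after standardizing x and y."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    return np.mean(x**2 * y), np.mean(x * y**2)

rng = np.random.default_rng(2)
n, rho = 400_000, 0.6
moments = {}
for name, x in [("gaussian", rng.normal(size=n)),
                ("skewed", rng.exponential(size=n) - 1.0)]:   # both zero mean, unit variance
    d = np.sqrt(1.0 - rho**2) * rng.normal(size=n)            # independent Gaussian disturbance
    y = rho * x + d                                           # model x -> y
    moments[name] = cross_moments(x, y)
    print(name, moments[name])
```

Covariances alone would give the same value E{xy} = ρ in both cases; only the higher-order statistics reveal the difference.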


Estimation of LiNGAM by ICA

◮ As a first approach, we proposed estimation using ICA (Shimizu et al., JMLR, 2006).
◮ Transform

$$x = Bx + e \iff x = (I - B)^{-1} e$$

◮ This becomes an ICA model
◮ But there is one complication: ICA does not estimate the order of the $e_i$
  ◮ In SEM, the $e_i$ do have a specific order
  ◮ Acyclicity allows determination of the right order: only the right ordering of the components allows transformation back to an SEM.
◮ This proves identifiability, in contrast to the Gaussian case!
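The ordering step can be sketched in isolation, assuming ICA has already produced an unmixing matrix W whose rows come back in arbitrary order and scale; here that output is simulated by permuting and rescaling the true W = I − B. Choosing the row permutation that minimizes Σ_i 1/|W_ii| (so that no diagonal entry is near zero) follows the idea of Shimizu et al. (2006), but the brute-force search over permutations is only practical for small n, and `sem_from_unmixing` is a hypothetical name.

```python
import itertools
import numpy as np

def sem_from_unmixing(W):
    """Recover B from an ICA unmixing matrix W with unknown row order and scale."""
    n = W.shape[0]
    best_rows, best_cost = None, np.inf
    for perm in itertools.permutations(range(n)):
        Wp = W[list(perm)]
        with np.errstate(divide="ignore"):
            cost = np.sum(1.0 / np.abs(np.diag(Wp)))  # infinite if any diagonal entry is 0
        if cost < best_cost:
            best_rows, best_cost = Wp, cost
    # Rescale each row so that e_i enters x_i with coefficient 1, then read off B.
    W_sem = best_rows / np.diag(best_rows)[:, None]
    return np.eye(n) - W_sem

# True acyclic SEM: x1 -> x2 -> x3.
B = np.array([[0.0, 0.0, 0.0],
              [0.8, 0.0, 0.0],
              [0.0, -0.5, 0.0]])
W_true = np.eye(3) - B
# Simulated ICA output: rows shuffled and rescaled (including a sign flip).
W_ica = np.diag([2.0, -0.5, 3.0]) @ W_true[[2, 0, 1]]
B_hat = sem_from_unmixing(W_ica)
print(np.allclose(B_hat, B))
```

For an acyclic B only one row permutation has no zeros on the diagonal, which is the identifiability argument of the slide in miniature. With real ICA estimates the zeros are only approximate, hence the 1/|W_ii| cost instead of an exact zero test; the published method also includes further steps (such as pruning small coefficients) that are omitted here.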


Application in functional magnetic resonance imaging (fMRI)

◮ Specific problems with fMRI when using Granger causality:
  ◮ Hemodynamic response functions vary over the cortex (David et al., PLoS Biol, 2008)
  ◮ Granger causality may give very misleading results (S.M. Smith et al., NIMG, 2010)
◮ Steve Smith et al. compared different causal analysis methods on simulated fMRI data.
◮ Given enough data (250 minutes, TR = 3 s, 5 variables), LiNGAM worked better than the other methods at finding the directionality.
◮ How to make LiNGAM work with less data? Two-variable methods help a lot.


Application of LiNGAM to simulated fMRI

[Figure: three panels, Simulation 1 (5 nodes), Simulation 2 (10 nodes) and Simulation 3 (15 nodes), each with 10-minute sessions, TR = 3.00 s, noise = 1.0%, HRF std = 0.5 s. Methods compared: PW-LR skew, PW-LR tanh, PW-LR r skew, Patel's τ, Patel's τ bin.75, ICA-LiNGAM. Axes: causality (Z_right − Z_wrong) and % directions correct.]


Specific characteristics of EEG and MEG

◮ In EEG/MEG, connections might be between the energies $\sigma_{i,t}^2$ of the sources $s_i$
◮ First separate the sources by ICA, then apply LiNGAM to the energies? (Future work)
◮ Alternatively, use generalized autoregressive conditional heteroscedasticity, or GARCH (Zhang and Hyvärinen, UAI 2009).


Results of GARCH model on real MEG

Black: positive influence; red: negative influence; yellow: manually drawn grouping (Zhang and Hyvärinen, 2009)


Extensions of the basic LiNGAM framework

◮ Latent variables: equivalent to an ICA model with more components than observed variables (Hoyer et al., IJAR, 2008)
◮ We can combine instantaneous and lagged effects in the same model (Hyvärinen et al., JMLR, 2010)
◮ Cyclicity is probably not a major problem for most methods (Lacerda et al., 2008)
◮ Group data: individual differences may help in identification (Ramsey, 2011; Shimizu, 2012)
◮ Nonlinear versions (Hoyer et al., 2009; Hyvärinen and Smith, 2013)


Conclusion

◮ Causal analysis is possible using statistics that go beyond correlations
◮ Structural equation models can be estimated using non-Gaussianity (Shimizu et al., JMLR, 2006)
  ◮ An intuitive approach is likelihood ratios for two variables
  ◮ Alternatively, perform ICA and rearrange the coefficients
◮ Many extensions of the basic framework have been developed
◮ Applicability to real data, e.g. brain imaging, is still to be determined...
