Estimation of causal direction in the presence of latent confounders and linear non-Gaussian SEMs


1 Causal Modeling and Machine Learning Beijing, China, June 2014 Estimation of causal direction in the presence of latent confounders and linear non-Gaussian SEMs Shohei Shimizu Osaka University, Japan with Kenneth Bollen University of


SLIDE 1

Estimation of causal direction in the presence of latent confounders and linear non-Gaussian SEMs

Shohei Shimizu, Osaka University, Japan
with Kenneth Bollen, University of North Carolina, Chapel Hill, USA

Causal Modeling and Machine Learning, Beijing, China, June 2014

SLIDE 2

Abstract

  • Estimation of causal direction of two observed variables in the presence of latent confounders
  • A key challenge in causal discovery
  • Propose a non-Gaussian method
  • Does not require specifying the number of latent confounders
  • Experiments on artificial and sociology data

SLIDE 3

Background

SLIDE 4

Motivation

  • Causality is a main interest in many empirical sciences
  • Many recent methods for estimating causal directions (with no temporal information)

– Linear non-Gaussian model (Dodge & Rousson, 2001; Shimizu et al., 2006)
– Nonlinear model (Hoyer et al., 2009; Zhang & Hyvarinen, 2009; Peters et al., 2011)

  • Another important challenge: Latent confounders

[Figure: Sleep problems → Depression mood, or Sleep problems ← Depression mood? Which is dominant? Epidemiology (Rosenstrom et al., 2012)]

SLIDE 5

Structural equation modeling (SEM) (Bollen, 1989; Pearl, 2000, 2009)

  • A framework for describing causal relations
  • An example (of linear cases):

$x_2 := f(x_1, e_2) = b_{21} x_1 + e_2$

– The value of x2 is determined by the values of x1 and the error/exogenous variable e2 through the linear function f

  • Generally speaking, if the value of x1 is changed and that of x2 also changes, then x1 causes x2

[Figure: x1 → x2, with error e2 pointing into x2]
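The structural assignment on this slide can be sketched in a few lines of Python. This is only an illustration: the coefficient value 0.8 and the uniform error distribution are assumptions made for the example, and the names `structural_x2` and `B21` are hypothetical.

```python
import random

B21 = 0.8  # hypothetical connection strength, chosen only for illustration

def structural_x2(x1, e2, b21=B21):
    """Structural assignment x2 := f(x1, e2) = b21 * x1 + e2."""
    return b21 * x1 + e2

rng = random.Random(0)
e2 = rng.uniform(-1.0, 1.0)  # one draw of the error/exogenous variable (assumed uniform)

# Change x1 by intervention while holding e2 fixed, and observe x2.
x2_low = structural_x2(0.0, e2)
x2_high = structural_x2(1.0, e2)
effect_of_x1 = x2_high - x2_low  # equals b21 up to floating-point rounding
```

Changing x1 moves x2 by b21 per unit, which is the sense in which x1 causes x2 here. The model contains no assignment of x1 in terms of x2, so setting x2 would leave x1 unchanged.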

SLIDE 6

Major challenges

  • 1. Estimation of causal direction when temporal information is not available

[Figure: x1 → x2 or x1 ← x2?]

  • 2. Coping with latent confounders

[Figure: x1 → x2 or x1 ← x2, with a latent confounder f1 in each graph?]

SLIDE 7

Non-Gaussian approach: LiNGAM (Linear Non-Gaussian Acyclic Model) (Shimizu et al., 2006)

  • Acyclic SEMs with different directions are distinguishable (Dodge & Rousson, 2001; Shimizu et al., 2006)

Model 1: $x_1 = e_1$, $x_2 = b_{21} x_1 + e_2$
or
Model 2: $x_1 = b_{12} x_2 + e_1$, $x_2 = e_2$

where $e_1$ and $e_2$ are error/exogenous variables

  • Fundamental assumptions:

– $e_1$ and $e_2$ are non-Gaussian
– Independence between $e_1$ and $e_2$ (no latent confounders)

[Figure: Model 1 graph x1 → x2 and Model 2 graph x1 ← x2, each with errors e1 and e2]

SLIDE 8

Different directions give different data distributions

Model 1: $x_1 = e_1$, $x_2 = 0.8 x_1 + e_2$
Model 2: $x_1 = 0.8 x_2 + e_1$, $x_2 = e_2$

with $E(e_1) = E(e_2) = 0$ and $\mathrm{var}(x_1) = \mathrm{var}(x_2) = 1$

[Figure: plots of x1 and x2 under Model 1 and Model 2, for Gaussian and non-Gaussian (uniform) errors]
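A small simulation can make the contrast between the Gaussian and uniform cases concrete. The sketch below is an illustration, not the estimation method of the talk: it generates data from Model 1 with coefficient 0.8, fits a least-squares regression in each direction, and measures leftover dependence as the correlation between the squared regressor and the squared residual. That dependence measure, the sample size, and all function names are assumptions made for this example.

```python
import math
import random

def corr(u, v):
    """Pearson correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

def residual_dependence(cause, effect):
    """Regress effect on cause by least squares; return corr(cause^2, residual^2).

    The residual is always uncorrelated with the regressor, so the squared
    version is used to pick up remaining higher-order dependence.
    """
    n = len(cause)
    mc, me = sum(cause) / n, sum(effect) / n
    slope = (sum((c - mc) * (e - me) for c, e in zip(cause, effect))
             / sum((c - mc) ** 2 for c in cause))
    resid = [(e - me) - slope * (c - mc) for c, e in zip(cause, effect)]
    return corr([(c - mc) ** 2 for c in cause], [r * r for r in resid])

def simulate_model1(n, draw):
    """Model 1: x1 = e1, x2 = 0.8 * x1 + e2, scaled so var(x1) = var(x2) = 1."""
    x1 = [draw() for _ in range(n)]
    x2 = [0.8 * a + 0.6 * draw() for a in x1]  # var(e2) = 0.36
    return x1, x2

rng = random.Random(0)
half_width = math.sqrt(3.0)  # uniform on [-sqrt(3), sqrt(3)] has unit variance

x1, x2 = simulate_model1(5000, lambda: rng.uniform(-half_width, half_width))
uniform_forward = residual_dependence(x1, x2)   # fitted direction x1 -> x2
uniform_backward = residual_dependence(x2, x1)  # fitted direction x2 -> x1

g1, g2 = simulate_model1(5000, lambda: rng.gauss(0.0, 1.0))
gauss_forward = residual_dependence(g1, g2)
gauss_backward = residual_dependence(g2, g1)
```

With uniform errors, the dependence should be near zero only in the true direction, while the reversed fit leaves a clearly nonzero value. With Gaussian errors both directions look equally independent, so the direction cannot be told apart this way.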

SLIDE 9

LiNGAM with latent confounders (Hoyer, Shimizu & Kerminen, 2008)

  • Extension to incorporate non-Gaussian latent confounders

$x_i = \mu_i + \sum_{q=1}^{Q} \lambda_{iq} f_q + \sum_{j \neq i} b_{ij} x_j + e_i$

where, WLG, $f_q$ $(q = 1, \ldots, Q)$ are independent

Two-variable example:

$x_1 = \mu_1 + \sum_{q=1}^{Q} \lambda_{1q} f_q + e_1$
$x_2 = \mu_2 + \sum_{q=1}^{Q} \lambda_{2q} f_q + b_{21} x_1 + e_2$

[Figure: x1 → x2 with errors e1, e2 and latent confounders f1, f2]

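To see why latent confounders violate the independence assumption of LiNGAM, the two-variable model with latent confounders can be simulated. The sketch below is a hedged illustration: the Laplace distributions, the coefficient values, zero intercepts, and the helper names `laplace`, `corr`, and `simulate` are all assumptions made for this example.

```python
import math
import random

def laplace(rng):
    """Standard Laplace draw (variance 2) as a difference of two exponentials."""
    return rng.expovariate(1.0) - rng.expovariate(1.0)

def corr(u, v):
    """Pearson correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

def simulate(n, lam1, lam2, b21, rng):
    """Two-variable LiNGAM with latent confounders, with mu1 = mu2 = 0 (assumed)."""
    x1, x2 = [], []
    for _ in range(n):
        f = [laplace(rng) for _ in lam1]  # independent latent confounders f_q
        a = sum(l * fq for l, fq in zip(lam1, f)) + laplace(rng)
        b = sum(l * fq for l, fq in zip(lam2, f)) + b21 * a + laplace(rng)
        x1.append(a)
        x2.append(b)
    return x1, x2

rng = random.Random(0)
b21 = 0.5  # hypothetical connection strength

# Without confounders, x1 and x2 - b21 * x1 would be the independent errors e1 and e2.
x1, x2 = simulate(5000, [1.0, 1.0], [1.0, 1.0], b21, rng)
confounded = corr(x1, [b - b21 * a for a, b in zip(x1, x2)])

y1, y2 = simulate(5000, [0.0, 0.0], [0.0, 0.0], b21, rng)
unconfounded = corr(y1, [b - b21 * a for a, b in zip(y1, y2)])
```

When the confounder coefficients are nonzero, the two quantities that would play the role of independent errors in a confounder-free LiNGAM are strongly correlated. Setting the coefficients to zero recovers the uncorrelated case.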
SLIDE 10

Previous estimation approaches

  • Explicitly model latent confounders and compare two models with opposite directions of causation

– Maximum likelihood principle (Hoyer et al., 2008)
– Bayesian model selection (Henao & Winther, 2011)
– Laplace / finite mixture of Gaussians for $p(e_i)$

  • Require specifying the number of latent confounders, which is difficult in general

[Figure: x1 → x2 or x1 ← x2, each with latent confounders f1, …, fQ and errors e1, e2]

SLIDE 11

Our proposal

Reference: Shimizu and Bollen (2014), Journal of Machine Learning Research, in press

SLIDE 12

Key idea (1/2)

  • Another look at the LiNGAM with latent confounders:

m-th obs.: $x_2^{(m)} = \mu_2 + \sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)} + b_{21} x_1^{(m)} + e_2^{(m)}$

Observations are generated from the LiNGAM model with possibly different intercepts $\mu_2 + \mu_2^{(m)}$, where $\mu_2^{(m)} = \sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)}$

[Figure: the graph x1 → x2 with latent confounders f1, …, fQ and errors e1, e2, redrawn as one graph per observation, x1^(1) → x2^(1), …, x1^(m) → x2^(m), …, each with the same coefficient b21, its own errors e1^(m), e2^(m), and its own intercept μ2 + μ2^(m)]

SLIDE 13

Key idea (2/2)

  • Include the sums of latent confounders as the observation-specific intercepts:

m-th obs.: $x_2^{(m)} = \mu_2 + \mu_2^{(m)} + b_{21} x_1^{(m)} + e_2^{(m)}$, with obs.-specific intercept $\mu_2^{(m)} = \sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)}$

  • Latent confounders are not modeled explicitly
  • It is necessary neither to specify the number of latent confounders Q nor to estimate the coefficients $\lambda_{2q}$
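The effect of the observation-specific intercepts can be illustrated with an oracle experiment in which the intercepts are known. This is only a sketch under stated assumptions: in the method of the talk the intercepts are unknown and receive a prior, whereas here they are read off from the simulation. The Laplace distributions, coefficient values, and names such as `ols_slope` and `mu2_m` are hypothetical.

```python
import random

def laplace(rng):
    """Standard Laplace draw (variance 2) as a difference of two exponentials."""
    return rng.expovariate(1.0) - rng.expovariate(1.0)

def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

rng = random.Random(1)
b21 = 0.5                  # hypothetical connection strength
lam1 = [1.0, 1.0, 1.0]     # Q = 3 here; the adjustment below never uses Q
lam2 = [1.0, 1.0, 1.0]     # or these coefficients, only the summed intercept

x1, x2, mu2_m = [], [], []
for _ in range(5000):
    f = [laplace(rng) for _ in lam1]
    intercept2 = sum(l * fq for l, fq in zip(lam2, f))  # mu_2^(m)
    a = sum(l * fq for l, fq in zip(lam1, f)) + laplace(rng)
    b = intercept2 + b21 * a + laplace(rng)
    x1.append(a)
    x2.append(b)
    mu2_m.append(intercept2)

naive_slope = ols_slope(x1, x2)  # biased by the shared confounders
adjusted_slope = ols_slope(x1, [b - m for b, m in zip(x2, mu2_m)])  # close to b21
```

Once the summed confounder term is treated as an intercept and removed, the remaining relation is an ordinary confounder-free LiNGAM equation, and the slope estimate returns to roughly b21.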

SLIDE 14

Our approach

  • Compare these two LiNGAM models with opposite directions:

Model 3 (x1 → x2):
$x_1^{(m)} = \mu_1 + \mu_1^{(m)} + e_1^{(m)}$
$x_2^{(m)} = \mu_2 + \mu_2^{(m)} + b_{21} x_1^{(m)} + e_2^{(m)}$

Model 4 (x1 ← x2):
$x_1^{(m)} = \mu_1 + \mu_1^{(m)} + b_{12} x_2^{(m)} + e_1^{(m)}$
$x_2^{(m)} = \mu_2 + \mu_2^{(m)} + e_2^{(m)}$

  • Many additional parameters $\mu_i^{(m)}$ $(i = 1, 2;\ m = 1, \ldots, n)$
  • Prior for the observation-specific intercepts $\mu_i^{(m)}$
  • Other parameters: low-informative priors, Gaussian with large sd.
  • Bayesian model selection (marginal likelihoods)
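The comparison by marginal likelihoods can be written out as follows. This is a sketch derived from the two model definitions under the LiNGAM assumption that $e_1$ and $e_2$ are independent; the symbols $D$ (the data), $\theta$ (all parameters, including the observation-specific intercepts), and $M_3$, $M_4$ (the two models) are notation introduced here for illustration.

```latex
% Likelihood of Model 3 (x1 -> x2): the map from errors to observations is
% triangular with unit Jacobian, so the error densities are evaluated at the residuals.
\[
p(D \mid \theta, M_3) = \prod_{m=1}^{n}
  p_{e_1}\!\big(x_1^{(m)} - \mu_1 - \mu_1^{(m)}\big)\,
  p_{e_2}\!\big(x_2^{(m)} - \mu_2 - \mu_2^{(m)} - b_{21} x_1^{(m)}\big)
\]

% Likelihood of Model 4 (x1 <- x2)
\[
p(D \mid \theta, M_4) = \prod_{m=1}^{n}
  p_{e_1}\!\big(x_1^{(m)} - \mu_1 - \mu_1^{(m)} - b_{12} x_2^{(m)}\big)\,
  p_{e_2}\!\big(x_2^{(m)} - \mu_2 - \mu_2^{(m)}\big)
\]

% Marginal likelihood used for Bayesian model selection: integrate out all parameters
\[
p(D \mid M_r) = \int p(D \mid \theta, M_r)\, p(\theta \mid M_r)\, d\theta,
\qquad r = 3, 4
\]
```

The model with the larger marginal likelihood is preferred, which selects the estimated causal direction.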

SLIDE 15

Prior for the observation-specific intercepts

$\mu_1^{(m)} = \sum_{q=1}^{Q} \lambda_{1q} f_q^{(m)}$, $\mu_2^{(m)} = \sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)}$

  • Motivation: Central limit theorem

– Sums of independent variables tend to be more Gaussian

  • Approximate the density by a bell-shaped curve dist.:

$[\mu_1^{(m)}, \mu_2^{(m)}]^T \sim$ t-distribution with sd $\tau_1, \tau_2$, correlation $\sigma_{12}$, and DOF $v$

  • Select the hyper-parameter values that maximize the marginal likelihood: Empirical Bayes

– $\tau_l \in \{0, 0.2\,\mathrm{sd}(x_l), \ldots, 1.0\,\mathrm{sd}(x_l)\}$, $\sigma_{12} \in \{0, \pm 0.1, \ldots, \pm 0.9\}$
– DOF $v$ fixed to be 6 in the experiments below

  • Small $\tau_l$ means similar intercepts
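The central limit theorem motivation can be checked numerically. For a sum of Q independent, identically distributed variables, the excess kurtosis is 1/Q times that of a single variable, so the sum looks more Gaussian as Q grows. The sketch below is an illustration only: it assumes Laplace confounders with all coefficients equal to 1, and the function names are hypothetical.

```python
import random

def laplace(rng):
    """Standard Laplace draw as a difference of two exponentials (excess kurtosis 3)."""
    return rng.expovariate(1.0) - rng.expovariate(1.0)

def excess_kurtosis(values):
    """Sample excess kurtosis; zero for a Gaussian distribution."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0

def sum_of_confounders(q, n, rng):
    """n draws of the sum of q Laplace confounders, all coefficients set to 1 (assumed)."""
    return [sum(laplace(rng) for _ in range(q)) for _ in range(n)]

rng = random.Random(0)
kurt_q1 = excess_kurtosis(sum_of_confounders(1, 20000, rng))  # single confounder
kurt_q6 = excess_kurtosis(sum_of_confounders(6, 20000, rng))  # sum of six confounders
```

The sum of six confounders should show a visibly smaller excess kurtosis than a single one, which is why a bell-shaped prior is a reasonable approximation for the observation-specific intercepts.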

SLIDE 16

Experiments on artificial data

SLIDE 17

Experimental results (100 obs.)

  • Data generated from LiNGAM with latent confounders
  • Various non-Gaussian distributions

– Laplace, Uniform, asymmetric dist. etc.

  • Our method uses Laplace for $p(e_i)$

Numbers of successful discoveries (100 rep.)

[Figure: two bar charts on a 0 to 100 axis, one for N. latent confounders = 1 and one for N. latent confounders = 6, each comparing Our method; Henao: 1, 4, 10 conf.; Hoyer: 1, 4 conf. Bar values as extracted: 80, 72, 58, 47, 34, 39 and 86, 55, 58, 55, 51, 54]

SLIDE 18

Experiment on sociology data

SLIDE 19

Sociology data

  • Source: General Social Survey (n=1380)

– Non-farm background, ages 35-44, white, male, in the labor force, no missing data for any of the covariates, 1972-2006

Status attainment model (Duncan et al., 1972)

[Figure: status attainment model; recoverable label: x2: Son’s Income]

SLIDE 20

Evaluation of our method using the sociology data

Known (temporal) orderings of 15 pairs:

Son’s Education ← Father’s Education
Son’s Income ← Father’s Education
Son’s Income ← Son’s Occupation
…

SLIDE 21

Conclusions

SLIDE 22

Conclusions

  • Estimation of causal direction in the presence of latent confounders is a major challenge in causal discovery
  • Our proposal: Fit linear non-Gaussian SEM with possibly different intercepts to data
  • Future work

– Test other informative priors for observation-specific intercepts
– Implement a wider variety of error/prior distributions (e.g., learn DOF of t dist.)
– Develop extensions using nonlinear/cyclic models (Hoyer et al., 2009; Zhang & Hyvarinen, 2009; Lacerda et al., 2008) instead of LiNGAM