
Estimation of causal direction in the presence of latent confounders and linear non-Gaussian SEMs



  1. Causal Modeling and Machine Learning, Beijing, China, June 2014 • Estimation of causal direction in the presence of latent confounders and linear non-Gaussian SEMs • Shohei Shimizu (Osaka University, Japan), with Kenneth Bollen (University of North Carolina, Chapel Hill, USA)

  2. Abstract • Estimation of the causal direction between two observed variables in the presence of latent confounders • A key challenge in causal discovery • We propose a non-Gaussian method • It does not require specifying the number of latent confounders • Experiments on artificial and sociology data

  3. Background

  4. Motivation • Causality is a main interest in many empirical sciences • Many recent methods estimate causal directions (with no temporal information) – Linear non-Gaussian model (Dodge & Rousson, 2001; Shimizu et al., 2006) – Nonlinear model (Hoyer et al., 2009; Zhang & Hyvarinen, 2009; Peters et al., 2011) • Example from epidemiology (Rosenstrom et al., 2012): sleep problems → depression mood, or depression mood → sleep problems? Which is dominant? • Another important challenge: latent confounders

  5. Structural equation modeling (SEM) (Bollen, 1989; Pearl, 2000, 2009) • A framework for describing causal relations • An example (of linear cases): $x_2 := f(x_1, e_2) = b_{21} x_1 + e_2$ – The value of $x_2$ is determined by the values of $x_1$ and the error/exogenous variable $e_2$ through the linear function $f$ • Generally speaking, if the value of $x_1$ is changed and that of $x_2$ also changes, then $x_1$ causes $x_2$

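  6. Major challenges • 1. Estimation of causal direction when temporal information is not available: $x_1 \to x_2$ or $x_1 \leftarrow x_2$? • 2. Coping with latent confounders: $x_1 \to x_2$ or $x_1 \leftarrow x_2$ when a latent confounder $f_1$ affects both $x_1$ and $x_2$?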

  7. Non-Gaussian approach: LiNGAM (Linear Non-Gaussian Acyclic Model) (Shimizu et al., 2006) • Acyclic SEMs with different causal directions are distinguishable (Dodge & Rousson, 2001; Shimizu et al., 2006) – Model 1 ($x_1 \to x_2$): $x_1 = e_1$, $x_2 = b_{21} x_1 + e_2$ – Model 2 ($x_1 \leftarrow x_2$): $x_1 = b_{12} x_2 + e_1$, $x_2 = e_2$ – where $e_1$ and $e_2$ are error/exogenous variables • Fundamental assumptions: – $e_1$ and $e_2$ are non-Gaussian – $e_1$ and $e_2$ are independent (no latent confounders)

  8. Different directions give different data distributions • Model 1: $x_1 = e_1$, $x_2 = 0.8 x_1 + e_2$ • Model 2: $x_1 = 0.8 x_2 + e_1$, $x_2 = e_2$ • Here $E(e_1) = E(e_2) = 0$ and $\mathrm{var}(x_1) = \mathrm{var}(x_2) = 1$ • [Figure: scatter plots of $x_1$ and $x_2$ under each model, for Gaussian errors and for non-Gaussian (uniform) errors]
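
The point of the slide above can be checked numerically with a small sketch. It simulates Model 1 with uniform errors (coefficient 0.8, unit variances, as on the slide), regresses in both candidate directions, and measures how strongly the squared regressor correlates with the squared residual. That statistic, the sample size, the seed, and all function names are illustrative assumptions, not the estimation procedure of the cited papers.

```python
import math
import random


def centered(values):
    """Subtract the sample mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def pearson(a, b):
    """Sample Pearson correlation of two equally long sequences."""
    a, b = centered(a), centered(b)
    cov = sum(x * y for x, y in zip(a, b))
    return cov / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


def residual_dependence(cause, effect):
    """|corr(cause^2, residual^2)| after an OLS regression of effect on cause.

    An illustrative dependence statistic (an assumption of this sketch):
    it is close to 0 when the residual is independent of the regressor.
    """
    c, e = centered(cause), centered(effect)
    slope = sum(x * y for x, y in zip(c, e)) / sum(x * x for x in c)
    residual = [y - slope * x for x, y in zip(c, e)]
    return abs(pearson([x * x for x in c], [r * r for r in residual]))


rng = random.Random(0)          # fixed seed, arbitrary choice
n = 5000                        # hypothetical sample size
half_width = math.sqrt(3.0)     # uniform on [-sqrt(3), sqrt(3)] has variance 1

# Model 1: x1 = e1, x2 = 0.8 x1 + e2, scaled so that var(x1) = var(x2) = 1
x1 = [rng.uniform(-half_width, half_width) for _ in range(n)]
e2 = [0.6 * rng.uniform(-half_width, half_width) for _ in range(n)]
x2 = [0.8 * a + b for a, b in zip(x1, e2)]

dep_forward = residual_dependence(x1, x2)    # candidate direction x1 -> x2
dep_backward = residual_dependence(x2, x1)   # candidate direction x2 -> x1
print(f"x1 -> x2: {dep_forward:.3f}   x2 -> x1: {dep_backward:.3f}")
```

In the true direction the residual is the independent error, so the statistic should stay near zero, while the reversed regression leaves a residual that is uncorrelated with the regressor but still dependent on it. With Gaussian errors both statistics would tend to zero, which is why the direction is not identifiable in that case.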

  9. LiNGAM with latent confounders (Hoyer, Shimizu & Kerminen, 2008) • Extension to incorporate non-Gaussian latent confounders $f_q$: $x_i = \mu_i + \sum_{q=1}^{Q} \lambda_{iq} f_q + \sum_{j \neq i} b_{ij} x_j + e_i$, where, without loss of generality, the $f_q$ $(q = 1, \ldots, Q)$ are independent • Two-variable case: $x_1 = \mu_1 + \sum_{q=1}^{Q} \lambda_{1q} f_q + e_1$, $x_2 = \mu_2 + \sum_{q=1}^{Q} \lambda_{2q} f_q + b_{21} x_1 + e_2$
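
To see why latent confounders matter, the following sketch draws data from a two-variable model of the form on this slide and fits a naive regression of x2 on x1. All numeric values are hypothetical choices for illustration: Q = 3 confounders, loadings (written `lam1`, `lam2` here) of 0.5, zero intercepts, b21 = 0.8, and unit-variance Laplace distributions for confounders and errors.

```python
import math
import random


def laplace(rng):
    """Zero-mean, unit-variance Laplace draw (difference of two exponentials)."""
    return (rng.expovariate(1.0) - rng.expovariate(1.0)) / math.sqrt(2.0)


def ols_slope(x, y):
    """Slope of the least-squares regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(1)   # fixed seed, arbitrary choice
n, Q = 5000, 3           # hypothetical sample size and number of confounders
b21 = 0.8                # hypothetical causal coefficient x1 -> x2
lam1 = [0.5] * Q         # hypothetical loadings of the confounders on x1
lam2 = [0.5] * Q         # hypothetical loadings of the confounders on x2

x1, x2 = [], []
for _ in range(n):
    f = [laplace(rng) for _ in range(Q)]               # latent confounders
    conf1 = sum(l * fq for l, fq in zip(lam1, f))      # confounding part of x1
    conf2 = sum(l * fq for l, fq in zip(lam2, f))      # confounding part of x2
    v1 = conf1 + laplace(rng)                          # x1 (intercept set to 0)
    v2 = conf2 + b21 * v1 + laplace(rng)               # x2 (intercept set to 0)
    x1.append(v1)
    x2.append(v2)

naive_slope = ols_slope(x1, x2)
print(f"true b21 = {b21}, naive OLS slope = {naive_slope:.3f}")
```

In this setup the population slope is b21 plus cov(confounding part of x2, x1) / var(x1) = 0.8 + 0.75 / 1.75, so the naive estimate overstates the causal coefficient because the shared confounders are ignored.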

  10. Previous estimation approaches • Explicitly model latent confounders and compare two models with opposite directions of causation – Maximum likelihood principle (Hoyer et al., 2008) – Bayesian model selection (Henao & Winther, 2011) – Laplace / finite mixture of Gaussians for $p(e_i)$ • These approaches require specifying the number of latent confounders $Q$, which is difficult in general

  11. Our proposal • Reference: Shimizu and Bollen (2014), Journal of Machine Learning Research, in press

  12. Key idea (1/2) • Another look at the LiNGAM with latent confounders, for the $m$-th observation: $x_2^{(m)} = \mu_2 + \sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)} + b_{21} x_1^{(m)} + e_2^{(m)}$, where the sum is denoted $\mu_2^{(m)} := \sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)}$ • Observations are generated from the LiNGAM model with possibly different intercepts $\mu_2 + \mu_2^{(m)}$: $x_2^{(m)} = \mu_2 + \mu_2^{(m)} + b_{21} x_1^{(m)} + e_2^{(m)}$

  13. Key idea (2/2) • Include the sums of latent confounders as the observation-specific intercepts, for the $m$-th observation: $x_2^{(m)} = \mu_2 + \underbrace{\sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)}}_{\mu_2^{(m)}} + b_{21} x_1^{(m)} + e_2^{(m)}$ • Latent confounders are not modeled explicitly • It is necessary neither to specify the number of latent confounders $Q$ nor to estimate the coefficients $\lambda_{2q}$

  14. Our approach • Compare these two LiNGAM models with opposite directions: – Model 3 ($x_1 \to x_2$): $x_1^{(m)} = \mu_1 + \mu_1^{(m)} + e_1^{(m)}$, $x_2^{(m)} = \mu_2 + \mu_2^{(m)} + b_{21} x_1^{(m)} + e_2^{(m)}$ – Model 4 ($x_1 \leftarrow x_2$): $x_1^{(m)} = \mu_1 + \mu_1^{(m)} + b_{12} x_2^{(m)} + e_1^{(m)}$, $x_2^{(m)} = \mu_2 + \mu_2^{(m)} + e_2^{(m)}$ • Many additional parameters $\mu_i^{(m)}$ $(i = 1, 2;\ m = 1, \ldots, n)$ • Prior for the observation-specific intercepts $\mu_i^{(m)}$ • Other parameters: low-informative priors (Gaussian with large sd) • Bayesian model selection (marginal likelihoods)
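
For readers unfamiliar with Bayesian model selection, the comparison named in the last bullet can be written out as follows. This is the standard textbook definition rather than a formula taken from the slides; the symbols $D$ (the observed data), $M_k$ (Model 3 or Model 4), and $\theta_k$ (all parameters of $M_k$, including the observation-specific intercepts) are notation assumed here, and equal prior model probabilities are an assumption of the final statement.

```latex
\begin{align}
  p(D \mid M_k) &= \int p(D \mid \theta_k, M_k)\, p(\theta_k \mid M_k)\, d\theta_k,
  \qquad k \in \{3, 4\}, \\
  p(M_k \mid D) &= \frac{p(D \mid M_k)\, p(M_k)}{\sum_{j \in \{3, 4\}} p(D \mid M_j)\, p(M_j)} .
\end{align}
% With p(M_3) = p(M_4), the model with the larger marginal likelihood
% p(D | M_k) has the larger posterior probability, so its direction is selected.
```

Because the integral averages over the prior, the many extra intercept parameters are penalized automatically unless the prior on them is informative enough, which is what motivates the prior discussed next.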

  15. Prior for the observation-specific intercepts $\mu_1^{(m)} = \sum_{q=1}^{Q} \lambda_{1q} f_q^{(m)}$, $\mu_2^{(m)} = \sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)}$ • Motivation: central limit theorem – Sums of independent variables tend to be more Gaussian • Approximate the density by a bell-shaped distribution: $(\mu_1^{(m)}, \mu_2^{(m)})^\top$ follows a $t$-distribution parameterized by two standard deviations (one per intercept), a correlation, and the degrees of freedom (DOF) $\nu$ • Select the hyper-parameter values that maximize the marginal likelihood (empirical Bayes) – Standard deviation for the intercept of $x_l$ chosen from $\{0, 0.2\,\mathrm{sd}(x_l), \ldots, 1.0\,\mathrm{sd}(x_l)\}$; correlation chosen from $\{0, 0.1, \ldots, 0.9\}$ – DOF fixed at 6 in the experiments below • A small standard deviation means similar intercepts across observations
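
Two parts of this slide lend themselves to a short sketch. The first part checks the central-limit motivation by comparing the excess kurtosis of one Laplace variable with that of a sum of six. The second enumerates the candidate hyper-parameter values listed on the slide. The function names, the seed, the toy data, and the choice to take the full Cartesian product of the two standard-deviation grids and the correlation grid are assumptions of this sketch; the slides do not say how the search is organized.

```python
import itertools
import math
import random
import statistics


def laplace(rng):
    """Zero-mean, unit-variance Laplace draw (difference of two exponentials)."""
    return (rng.expovariate(1.0) - rng.expovariate(1.0)) / math.sqrt(2.0)


def excess_kurtosis(values):
    """Sample excess kurtosis; 0 for a Gaussian, 3 for a Laplace distribution."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m4 = statistics.fmean([(v - mean) ** 4 for v in values])
    return m4 / m2 ** 2 - 3.0


# Part 1: sums of independent non-Gaussian variables look more Gaussian.
rng = random.Random(2)   # fixed seed, arbitrary choice
n = 20000                # hypothetical number of draws
kurt = {}
for Q in (1, 6):
    sums = [sum(laplace(rng) for _ in range(Q)) for _ in range(n)]
    kurt[Q] = excess_kurtosis(sums)
print(f"excess kurtosis, Q=1: {kurt[1]:.2f}   Q=6: {kurt[6]:.2f}")


# Part 2: candidate hyper-parameter values from the slide.
def hyperparameter_grid(x1, x2):
    """All (sd for intercept 1, sd for intercept 2, correlation) candidates.

    Assumption of this sketch: the three grids are combined exhaustively.
    """
    sd1, sd2 = statistics.stdev(x1), statistics.stdev(x2)
    scales = [k / 5 for k in range(6)]            # 0, 0.2, ..., 1.0
    correlations = [k / 10 for k in range(10)]    # 0, 0.1, ..., 0.9
    return [
        (s1 * sd1, s2 * sd2, rho)
        for s1, s2, rho in itertools.product(scales, scales, correlations)
    ]


# Toy data, made up purely to exercise the function.
grid = hyperparameter_grid([1.0, 2.0, 4.0], [0.5, 0.1, 0.9])
print(len(grid), "candidate settings")
```

For equally weighted independent Laplace variables the population excess kurtosis of the sum is 3/Q, so it shrinks toward the Gaussian value of 0 as Q grows. In an empirical Bayes search, each candidate in `grid` would be scored by the marginal likelihood and the best-scoring one kept; that scoring step is not shown here.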

  16. Experiments on artificial data

  17. Experimental results (100 obs.) • Data generated from LiNGAM with latent confounders • Various non-Gaussian distributions – Laplace, uniform, asymmetric distributions, etc. • Our method uses Laplace for $p(e_i)$ • [Figure: bar charts of the numbers of successful discoveries (100 repetitions), one panel for 6 latent confounders and one for 1 latent confounder; bars for our method, Hoyer with 1 and 4 confounders, and Henao with 1, 4, and 10 confounders]

  18. Experiment on sociology data

  19. Sociology data • Source: General Social Survey (n = 1380) – Non-farm background, ages 35-44, white, male, in the labor force, no missing data for any of the covariates, 1972-2006 • Status attainment model (Duncan et al., 1972) • [Figure: status attainment model diagram, including the node $x_2$: Son’s Income]

  20. Evaluation of our method using the sociology data • Known (temporal) orderings of 15 pairs: Father’s Education, Son’s Education; …; Father’s Education, Son’s Income; …; Son’s Occupation, Son’s Income

  21. Conclusions

  22. Conclusions • Estimation of causal direction in the presence of latent confounders is a major challenge in causal discovery • Our proposal: fit a linear non-Gaussian SEM with possibly different intercepts to the data • Future work – Test other informative priors for the observation-specific intercepts – Implement a wider variety of error/prior distributions (e.g., learn the DOF of the t-distribution) – Develop extensions using nonlinear/cyclic models (Hoyer et al., 2009; Zhang & Hyvarinen, 2009; Lacerda et al., 2008) instead of LiNGAM
