

  1. Distinguishing Causes from Effects using Nonlinear Acyclic Causal Models
     Kun Zhang¹ and Aapo Hyvärinen¹,²
     ¹ Dept. of Computer Science & HIIT
     ² Dept. of Mathematics and Statistics
     University of Helsinki

  2. Outline
     - Introduction
     - Post-nonlinear causal model with inner additive noise
       - Relation to post-nonlinear independent component analysis (ICA)
       - Identification method
     - Special cases
     - Experiments

  3. Methods for causal discovery
     - Two popular kinds of methods
       - Constraint-based: use independence tests to find the patterns of relationships. Examples: PC/IC
       - Score-based: use a score (such as BIC) to compare different causal models
     - Model-based: a special case of score-based methods
       - Assumes a generative model for the data-generating process
       - Can discover in what form each variable is influenced by others
       - Examples:
         - Granger causality: effects follow causes in a linear form
         - LiNGAM: linear, non-Gaussian, acyclic causal model (Shimizu et al., 2006)

  4. Three effects usually encountered in a causal model
     - Without prior knowledge, the assumed model is expected to be
       - general enough: able to approximate the true generating process
       - identifiable: asymmetry between causes and effects
     - [Diagram: Cause → f1 (nonlinear effect of the cause) → + Noise (noise effect) → f2 (sensor or measurement distortion) → Effect]
     - These effects are represented by the post-nonlinear causal model with inner additive noise

  5. Post-nonlinear (PNL) causal model with inner additive noise
     - A directed acyclic graph (DAG) is used to represent the data-generating process:
           x_i = f_{i,2}( f_{i,1}(pa_i) + e_i )
       - pa_i: parents (causes) of x_i
       - f_{i,1}: not necessarily invertible
       - f_{i,2}: assumed to be continuous and invertible
       - e_i: noise/disturbance, independent from pa_i
     - Here we consider the two-variable case
       - x1 → x2:  x2 = f_{2,2}( f_{2,1}(x1) + e2 )
     - Identifiability: related to the separability of the PNL mixing independent component analysis (ICA) model
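As a concrete illustration, the two-variable model x2 = f_{2,2}( f_{2,1}(x1) + e2 ) can be simulated directly. The specific nonlinearities below (a cubic inner effect and a tanh distortion) are illustrative assumptions, not choices from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Post-nonlinear causal model x1 -> x2:
#   x2 = f_{2,2}( f_{2,1}(x1) + e2 )
f21 = lambda x: x ** 3             # inner nonlinear effect of the cause (illustrative)
f22 = lambda z: np.tanh(z)         # invertible sensor/measurement distortion (illustrative)

n = 1000
x1 = rng.uniform(-1.0, 1.0, n)     # cause
e2 = 0.1 * rng.standard_normal(n)  # disturbance, independent of x1
x2 = f22(f21(x1) + e2)             # effect

print(x2.shape)
```

Note that the disturbance enters *inside* the outer nonlinearity f_{2,2}; this is what distinguishes the model from a plain additive-noise model.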

  6. Three cases of ICA: linear, general nonlinear, and PNL
     - Linear ICA: separable under weak assumptions
       - Unknown mixing system: x = A·s (s: independent sources; A: mixing matrix; x: observed signals)
       - ICA system: y = W·x (W: de-mixing matrix; outputs y: as independent as possible, estimates of s)
     - Nonlinear ICA: A and W become invertible nonlinear mappings
       - Not separable: y_i may be totally different from s_i

  7. PNL mixing ICA: a nice trade-off
     - Mixing system: a linear transformation followed by an invertible component-wise nonlinear transformation
       - Unknown mixing system (A, f): independent sources s_i → mixing matrix A → component-wise invertible nonlinearities f_i → observed mixtures x_i
       - PNL separation system (g, W): component-wise nonlinearities g_i → de-mixing matrix W → outputs y_i
     - Separability (Taleb and Jutten, 1999): under the following conditions, the y_i are independent iff each h_i = g_i ∘ f_i is linear and the y_i are estimates of the s_i:
       - A has at least two nonzero entries per row or per column;
       - the f_i are differentiable invertible functions;
       - each s_i admits a density function that vanishes at at least one point.

  8. Identifiability of the proposed causal model
     - If f_{2,1} is invertible, x2 = f_{2,2}( f_{2,1}(x1) + e2 ) is a special case of the PNL mixing ICA model with A = (1, 0; 1, 1):
           x1 = f_{2,1}^{-1}(s1)
           x2 = f_{2,2}(s1 + s2)
     - Identifiability: the causal relation between x1 and x2 can be uniquely identified if
       - x1 and x2 are generated according to this causal model with invertible f_{2,1};
       - the densities of f_{2,1}(x1) and e2 vanish at at least one point.
     - If f_{2,1} is not invertible, the model is not a PNL mixing ICA model, but it is empirically identifiable under very general conditions.
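The correspondence can be checked numerically: substituting s1 = f_{2,1}(x1) and s2 = e2 turns the causal model into the PNL mixture x1 = f_{2,1}^{-1}(s1), x2 = f_{2,2}(s1 + s2). The cubic/tanh choices below are again illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

f21 = lambda x: x ** 3          # invertible inner nonlinearity (illustrative)
f21_inv = lambda s: np.cbrt(s)
f22 = lambda z: np.tanh(z)      # invertible outer distortion (illustrative)

# Independent sources of the PNL-ICA view: s1 = f21(x1), s2 = e2
s1 = rng.uniform(-1.0, 1.0, 500)
s2 = 0.1 * rng.standard_normal(500)

# PNL mixture with A = (1, 0; 1, 1)
x1 = f21_inv(s1)
x2 = f22(s1 + s2)

# The same data as generated by the causal model x2 = f22(f21(x1) + e2)
x2_causal = f22(f21(x1) + s2)
print(np.allclose(x2, x2_causal))
```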

  9. Identification method
     - Basic idea: which of x1 → x2 and x2 → x1 can make the cause and the disturbance independent?
     - Two-step procedure for each possible causal relation
       - Step 1: constrained nonlinear ICA to estimate the corresponding disturbance. Suppose x1 → x2, i.e., x2 = f_{2,2}( f_{2,1}(x1) + e2 ). Then y2 provides an estimate of e2, learned by minimizing the mutual information (equivalent to the negative likelihood):
             I(y1, y2) = H(y1) + H(y2) − E{log |J|} − H(x)
                       = −E{log p_{y1}(y1)} − E{log p_{y2}(y2)} − E{log |J|} − H(x)
       - Step 2: use independence tests to verify whether the assumed cause and the estimated disturbance are independent
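The slides' method estimates the disturbance via constrained nonlinear ICA with MLPs and tests independence with a kernel test. The sketch below substitutes much simpler stand-ins to convey the two-step structure only: the outer nonlinearity is taken as the identity (the additive-noise special case), polynomial regression replaces the MLP in Step 1, and a biased HSIC statistic with fixed-bandwidth Gaussian kernels replaces the kernel independence test in Step 2. All function names and parameter choices here are illustrative assumptions:

```python
import numpy as np

def hsic(a, b, sigma=1.0):
    """Biased HSIC statistic with Gaussian kernels (smaller = closer to independent)."""
    n = len(a)
    def gram(v):
        d2 = (v[:, None] - v[None, :]) ** 2
        return np.exp(-d2 / (2 * sigma ** 2))
    K, L = gram(a), gram(b)
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def disturbance(cause, effect, deg=3):
    """Step 1 (simplified): estimate the disturbance as the residual of a
    nonlinear (polynomial) regression of the assumed effect on the assumed cause."""
    coef = np.polyfit(cause, effect, deg)
    return effect - np.polyval(coef, cause)

rng = np.random.default_rng(3)
x1 = rng.uniform(-1, 1, 300)
x2 = x1 ** 3 + 0.1 * rng.standard_normal(300)   # true direction: x1 -> x2

# Step 2: test independence of the assumed cause and its estimated disturbance
score_fwd = hsic(x1, disturbance(x1, x2))   # hypothesis x1 -> x2
score_bwd = hsic(x2, disturbance(x2, x1))   # hypothesis x2 -> x1
print(score_fwd, score_bwd)  # the true direction should give the smaller score
```

The direction whose estimated disturbance is (closest to) independent of the assumed cause is accepted, mirroring the decision rule on this slide.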

  10. Special cases of x_i = f_{i,2}( f_{i,1}(pa_i) + e_i )
      - If f_{i,1} and f_{i,2} are both linear:
        - at most one of the e_i is Gaussian: LiNGAM (linear, non-Gaussian, acyclic causal model; Shimizu et al., 2006)
        - all of the e_i are Gaussian: linear Gaussian model
      - If the f_{i,2} are linear: nonlinear causal discovery with additive noise models (Hoyer et al., 2009)

  11. Experiments
      - For the CausalEffectPairs task in the Pot-luck challenge
        - Eight data sets
        - Each contains the realizations of two variables
        - Goal: identify which variable is the cause and which the effect
      - Settings
        - g1 and g2 in the constrained nonlinear ICA were modeled by multilayer perceptrons (MLPs) with one hidden layer
        - Different numbers of hidden units (4–10) were tried; the results remained the same
        - Kernel-based independence tests (Gretton et al., 2008) were adopted

  12. Results

      Data set | Result (direction of causality) | Remark
      ---------|---------------------------------|----------------
      1        | x1 → x2                         | significant
      2        | x1 → x2                         | significant
      3        | x1 → x2                         | significant
      4        | x2 → x1                         | not significant
      5        | x2 → x1                         | significant
      6        | x1 → x2                         | significant
      7        | x2 → x1                         | significant
      8        | x1 → x2                         | significant

  13. Data Set 1
      [Figure: x1 vs. its nonlinear effect on x2; x2 vs. f_{2,2}^{-1}(x2); (a) y1 vs. y2 under hypothesis x1 → x2 (independent); (b) y1 vs. y2 under hypothesis x2 → x1]
      Independence test results on y1 and y2 with different assumed causal relations

  14. Data Set 2
      [Figure: nonlinear effect of x1 on x2; x2 vs. f_{2,2}^{-1}(x2); (a) y1 (x1) vs. y2 (estimate of e2) under hypothesis x1 → x2 (independent); (b) y1 (x2) vs. y2 (estimate of e1) under hypothesis x2 → x1]

  15. Data Set 3
      [Figure: nonlinear effect of x1 on x2; x2 vs. f_{2,2}^{-1}(x2); (a) y1 (x1) vs. y2 (estimate of e2) under hypothesis x1 → x2 (independent); (b) y1 (x2) vs. y2 (estimate of e1) under hypothesis x2 → x1]

  16. Data Set 4
      [Figure: nonlinear effect of x2 on x1; x1 vs. f_{1,2}^{-1}(x1); (a) y1 (x1) vs. y2 (estimate of e2) under hypothesis x1 → x2; (b) y1 (x2) vs. y2 (estimate of e1) under hypothesis x2 → x1 (independent)]

  17. Data Set 5
      [Figure: nonlinear effect of x2 on x1; x1 vs. f_{1,2}^{-1}(x1); (a) y1 (x1) vs. y2 (estimate of e2) under hypothesis x1 → x2; (b) y1 (x2) vs. y2 (estimate of e1) under hypothesis x2 → x1 (independent)]

  18. Data Set 6
      [Figure: nonlinear effect of x1 on x2; x2 vs. f_{2,2}^{-1}(x2); (a) y1 (x1) vs. y2 (estimate of e2) under hypothesis x1 → x2 (independent); (b) y1 (x2) vs. y2 (estimate of e1) under hypothesis x2 → x1]

  19. Data Set 7
      [Figure: nonlinear effect of x2 on x1; x1 vs. f_{1,2}^{-1}(x1); (a) y1 (x1) vs. y2 (estimate of e2) under hypothesis x1 → x2; (b) y1 (x2) vs. y2 (estimate of e1) under hypothesis x2 → x1 (independent)]

  20. Data Set 8
      [Figure: nonlinear effect of x1 on x2; x2 vs. f_{2,2}^{-1}(x2); (a) y1 (x1) vs. y2 (estimate of e2) under hypothesis x1 → x2 (independent); (b) y1 (x2) vs. y2 (estimate of e1) under hypothesis x2 → x1]
