SLIDE 1

Distinguishing Causes from Effects using Nonlinear Acyclic Causal Models

Kun Zhang¹ and Aapo Hyvärinen¹,²

¹ Dept. of Computer Science & HIIT, ² Dept. of Mathematics and Statistics,
University of Helsinki

SLIDE 2

Outline

- Introduction
- Post-nonlinear causal model with inner additive noise
  - Relation to post-nonlinear independent component analysis (ICA)
  - Identification method
- Special cases
- Experiments

SLIDE 3

Methods for causal discovery

- Two popular kinds of methods:
  - Constraint-based: use independence tests to find the patterns of relationships. Examples: PC, IC.
  - Score-based: use a score (such as BIC) to compare different causal models.
- Model-based: a special case of score-based methods.
  - Assumes a generative model for the data-generating process.
  - Can discover in what form each variable is influenced by the others.
  - Examples:
    - Granger causality: effects follow causes in a linear form.
    - LiNGAM: the linear, non-Gaussian, acyclic causal model (Shimizu et al., 2006).

SLIDE 4

Three effects usually encountered in a causal model

[Figure: Cause → f1 → + (Noise) → f2 → Effect. Here f1 is the nonlinear effect of the cause, the additive term is the noise effect, and f2 is sensor or measurement distortion.]

- Without prior knowledge, the assumed model is expected to be
  - general enough: able to approximate the true generating process;
  - identifiable: asymmetric in causes and effects.
- Both requirements are represented by the post-nonlinear causal model with inner additive noise.

SLIDE 5

Post-nonlinear (PNL) causal model with inner additive noise

- A directed acyclic graph (DAG) is used to represent the data generating process:

  xi = fi,2( fi,1(pai) + ei )

  where pai denotes the parents (causes) of xi; ei is the noise/disturbance, independent from pai; fi,2 is assumed to be continuous and invertible; fi,1 is not necessarily invertible.

- Here we consider the two-variable case:
  - x1 → x2: x2 = f2,2( f2,1(x1) + e2 )
- Identifiability: related to the separability of the PNL mixing independent component analysis (ICA) model.
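As a concrete illustration, the two-variable generating process above can be simulated directly. The particular choices of f2,1 (a square, deliberately not invertible) and f2,2 (a cube root, continuous and invertible) are hypothetical examples, chosen only to match the assumptions the model places on each function:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

x1 = rng.standard_normal(n)        # cause
e2 = rng.uniform(-0.5, 0.5, n)     # disturbance, independent of x1

def f21(u):
    return u ** 2                  # inner nonlinearity: need NOT be invertible

def f22(v):
    return np.cbrt(v)              # outer distortion: continuous and invertible

x2 = f22(f21(x1) + e2)             # x1 -> x2 under the PNL model
```

By construction e2 is independent of x1, which is exactly the asymmetry the identification method later exploits.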

SLIDE 6

Three cases of ICA: linear, general nonlinear, and PNL

- Linear ICA: separable under weak assumptions.
- Nonlinear ICA: A and W become invertible nonlinear mappings; not separable: yi may be totally different from si.

[Figure: independent sources s1, …, sn pass through an unknown mixing system x = A·s (mixing matrix A) to give the observed signals x1, …, xm; the ICA system applies a de-mixing estimate y = W·x to produce outputs y1, …, yn that are as independent as possible.]

SLIDE 7

PNL mixing ICA: a nice trade-off

- Mixing system: a linear transformation followed by an invertible component-wise nonlinear transformation.
- Separability (Taleb and Jutten, 1999): under the following conditions, the yi are independent iff hi = gi ∘ fi is linear, and the yi are then estimates of the si:
  - A has at least two nonzero entries per row or per column;
  - the fi are differentiable invertible functions;
  - each si admits a density function that vanishes at at least one point.

[Figure: independent sources s1, …, sn are mixed by matrix A and distorted by invertible component-wise nonlinearities f1, …, fn in the unknown mixing system (A, f), giving the observed mixtures x1, …, xn; the separation system (g, W) applies component-wise nonlinearities g1, …, gn followed by matrix W to produce the outputs y1, …, yn.]

SLIDE 8

Identifiability of the proposed causal model

- If f2,1 is invertible, x2 = f2,2( f2,1(x1) + e2 ) is a special case of the PNL mixing ICA model with A = (1, 0; 1, 1): taking the independent sources s1 = f2,1(x1) and s2 = e2,

  x1 = f2,1⁻¹(s1),  x2 = f2,2(s1 + s2).

- Identifiability: the causal relation between x1 and x2 can be uniquely identified if
  - x1 and x2 are generated according to this causal model with invertible f2,1;
  - the densities of f2,1(x1) and e2 vanish at at least one point.
- If f2,1 is not invertible, the model is not a PNL mixing ICA model, but it is empirically identifiable under very general conditions.
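The reduction to the PNL mixing ICA form with A = (1, 0; 1, 1) can be checked numerically. The concrete nonlinearities f2,1 = log and f2,2 = tanh are hypothetical examples of invertible functions, not choices made in the original work:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x1 = rng.uniform(0.5, 2.0, n)           # cause (kept positive so log applies)
e2 = rng.uniform(-0.1, 0.1, n)          # disturbance, independent of x1
x2 = np.tanh(np.log(x1) + e2)           # PNL model with f2,1 = log, f2,2 = tanh

# Equivalent PNL mixing ICA form: sources s1 = f2,1(x1) and s2 = e2,
# linear stage with A = (1, 0; 1, 1), then component-wise nonlinearities.
s = np.stack([np.log(x1), e2])
A = np.array([[1.0, 0.0],
              [1.0, 1.0]])
m = A @ s
assert np.allclose(np.exp(m[0]), x1)    # x1 = f2,1^{-1}(s1)
assert np.allclose(np.tanh(m[1]), x2)   # x2 = f2,2(s1 + s2)
```

The first mixture row reproduces x1 through the inverse of f2,1, and the second reproduces x2, exactly as the equations above state.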

SLIDE 9

Identification Method

- Basic idea: which of x1 → x2 and x2 → x1 can make the cause and the disturbance independent?
- Two-step procedure for each possible causal relation:
  - Step 1: constrained nonlinear ICA to estimate the corresponding disturbance.
  - Step 2: independence tests to verify whether the assumed cause and the estimated disturbance are independent.

Suppose x1 → x2, i.e., x2 = f2,2( f2,1(x1) + e2 ). Then y2 provides an estimate of e2, learned by minimizing the mutual information (which is equivalent to the negative likelihood):

  I(y1, y2) = H(y1) + H(y2) − E{log |J|} − H(x)
            = −E{log p_y1(y1)} − E{log p_y2(y2)} − E{log |J|} − H(x),

where J is the Jacobian of the transformation from x to y, and H(x) does not depend on the parameters being learned.
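A minimal sketch of the two-step procedure, under the simplifying assumption f2,2 = identity (the additive-noise special case of Hoyer et al., not the full constrained nonlinear ICA): Step 1 estimates the disturbance as a polynomial-regression residual, and Step 2 measures cause–disturbance dependence with a biased empirical HSIC statistic. The generating function x1 + x1³ and all tuning choices here are illustrative assumptions:

```python
import numpy as np

def gram(z):
    """Gaussian-kernel Gram matrix with a median-heuristic bandwidth."""
    d2 = (z.reshape(-1, 1) - z.reshape(1, -1)) ** 2
    return np.exp(-d2 / np.median(d2[d2 > 0]))

def hsic(a, b):
    """Biased empirical HSIC: trace(K H L H) / n^2 (larger = more dependent)."""
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(gram(a) @ H @ gram(b) @ H) / n ** 2

def disturbance(cause, effect, deg=5):
    """Step 1 (simplified): estimate the disturbance as a regression residual."""
    return effect - np.polyval(np.polyfit(cause, effect, deg), cause)

rng = np.random.default_rng(0)
n = 500
x1 = rng.uniform(-1.5, 1.5, n)
x2 = x1 + x1 ** 3 + 0.2 * rng.standard_normal(n)   # true direction: x1 -> x2

# Step 2: test independence of the assumed cause and estimated disturbance
score_fwd = hsic(x1, disturbance(x1, x2))   # hypothesis x1 -> x2
score_bwd = hsic(x2, disturbance(x2, x1))   # hypothesis x2 -> x1
# the hypothesis with the smaller dependence score is preferred
```

In the correct direction the residual is close to the true independent disturbance, so its dependence score stays small; in the reverse direction no additive decomposition with independent noise exists, and the score is visibly larger.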

SLIDE 10

Special cases

xi = fi,2( fi,1(pai) + ei )

- If fi,1 and fi,2 are both linear:
  - at most one of the ei is Gaussian: LiNGAM (the linear, non-Gaussian, acyclic causal model; Shimizu et al., 2006);
  - all of the ei are Gaussian: the linear Gaussian model.
- If the fi,2 are linear: nonlinear causal discovery with additive noise models (Hoyer et al., 2009).
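A quick numerical illustration of why the all-Gaussian linear case is the hard one: for jointly Gaussian variables, zero correlation implies independence, and least squares makes the residual exactly uncorrelated with the regressor in both directions, so neither direction can be rejected. The coefficient 0.8 is an arbitrary example value:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
x1 = rng.standard_normal(n)
x2 = 0.8 * x1 + rng.standard_normal(n)   # linear model, Gaussian disturbance

def ols_residual(cause, effect):
    # least-squares residual of effect regressed on cause (with intercept)
    slope, intercept = np.polyfit(cause, effect, 1)
    return effect - (slope * cause + intercept)

c_fwd = np.corrcoef(x1, ols_residual(x1, x2))[0, 1]  # hypothesis x1 -> x2
c_bwd = np.corrcoef(x2, ols_residual(x2, x1))[0, 1]  # hypothesis x2 -> x1
# Both correlations are numerically zero, so for Gaussian data the residual
# looks independent of the regressor in BOTH directions.
print(abs(c_fwd), abs(c_bwd))
```

This is exactly what LiNGAM escapes by requiring non-Gaussian disturbances, and what the nonlinear models escape through the nonlinearity itself.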

SLIDE 11

Experiments

- For the CauseEffectPairs task in the Pot-luck challenge:
  - eight data sets;
  - each contains the realizations of two variables;
  - goal: to identify which variable is the cause and which one the effect.
- Settings:
  - g1 and g2 in the constrained nonlinear ICA are modeled by multilayer perceptrons (MLPs) with one hidden layer;
  - different numbers of hidden units (4–10) were tried; the results remained the same;
  - kernel-based independence tests (Gretton et al., 2008) were adopted.
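For concreteness, a one-hidden-layer MLP of the kind used here to parameterize g1 and g2 is just a flexible scalar map. This is a sketch of the parameterization only, with randomly initialized weights and 6 hidden units as an arbitrary choice inside the 4–10 range the slides report; the actual training by constrained mutual-information minimization is not shown:

```python
import numpy as np

def mlp(x, W1, b1, w2, b2):
    """One-hidden-layer MLP mapping R -> R: w2 . tanh(W1*x + b1) + b2."""
    h = np.tanh(np.outer(x, W1) + b1)   # hidden activations, shape (n, n_hidden)
    return h @ w2 + b2

rng = np.random.default_rng(0)
n_hidden = 6                            # slides report trying 4-10 hidden units
W1 = rng.standard_normal(n_hidden)
b1 = rng.standard_normal(n_hidden)
w2 = rng.standard_normal(n_hidden)

x = np.linspace(-2.0, 2.0, 100)
y = mlp(x, W1, b1, w2, 0.0)             # a smooth, flexible nonlinearity of x
```

With enough hidden units such a network can approximate any continuous function on a bounded interval, which is why the results were insensitive to the exact unit count.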

SLIDE 12

Results

Data set | Result (direction of causality) | Remark
---------|---------------------------------|----------------
1        | x1 → x2                         | significant
2        | x1 → x2                         | significant
3        | x1 → x2                         | significant
4        | x2 → x1                         | not significant
5        | x2 → x1                         | significant
6        | x1 → x2                         | significant
7        | x2 → x1                         | significant
8        | x1 → x2                         | significant

SLIDE 13

Data Set 1

[Figure: scatter plots of x1 vs. x2, x1 vs. its nonlinear effect on x2, and x2 vs. f2,2⁻¹(x2); panel (a) shows y1 vs. y2 under the hypothesis x1 → x2 (the independent case), panel (b) shows y1 vs. y2 under the hypothesis x2 → x1. Caption: independence test results on y1 and y2 with different assumed causal relations.]

SLIDE 14

Data Set 2

[Figure: scatter plots of x1 vs. x2, x1 vs. its nonlinear effect on x2, and x2 vs. f2,2⁻¹(x2); panel (a) shows y1 (x1) vs. y2 (the estimate of e2) under the hypothesis x1 → x2 (the independent case), panel (b) shows y1 (x2) vs. y2 (the estimate of e1) under the hypothesis x2 → x1.]

SLIDE 15

Data Set 3

[Figure: scatter plots of x1 vs. x2, x1 vs. its nonlinear effect on x2, and x2 vs. f2,2⁻¹(x2); panel (a) shows y1 (x1) vs. y2 (the estimate of e2) under the hypothesis x1 → x2 (the independent case), panel (b) shows y1 (x2) vs. y2 (the estimate of e1) under the hypothesis x2 → x1.]

SLIDE 16

Data Set 4

[Figure: scatter plots of x1 vs. x2, x2 vs. its nonlinear effect on x1, and x1 vs. f1,2⁻¹(x1); panel (a) shows y1 (x1) vs. y2 (the estimate of e2) under the hypothesis x1 → x2, panel (b) shows y1 (x2) vs. y2 (the estimate of e1) under the hypothesis x2 → x1 (the independent case).]

SLIDE 17

Data Set 5

[Figure: scatter plots of x1 vs. x2, x2 vs. its nonlinear effect on x1, and x1 vs. f1,2⁻¹(x1); panel (a) shows y1 (x1) vs. y2 (the estimate of e2) under the hypothesis x1 → x2, panel (b) shows y1 (x2) vs. y2 (the estimate of e1) under the hypothesis x2 → x1 (the independent case).]

SLIDE 18

Data Set 6

[Figure: scatter plots of x1 vs. x2, x1 vs. its nonlinear effect on x2, and x2 vs. f2,2⁻¹(x2); panel (a) shows y1 (x1) vs. y2 (the estimate of e2) under the hypothesis x1 → x2 (the independent case), panel (b) shows y1 (x2) vs. y2 (the estimate of e1) under the hypothesis x2 → x1.]

SLIDE 19

Data Set 7

[Figure: scatter plots of x1 vs. x2, x2 vs. its nonlinear effect on x1, and x1 vs. f1,2⁻¹(x1); panel (a) shows y1 (x1) vs. y2 (the estimate of e2) under the hypothesis x1 → x2, panel (b) shows y1 (x2) vs. y2 (the estimate of e1) under the hypothesis x2 → x1 (the independent case).]

SLIDE 20

Data Set 8

[Figure: scatter plots of x1 vs. x2, x1 vs. its nonlinear effect on x2, and x2 vs. f2,2⁻¹(x2); panel (a) shows y1 (x1) vs. y2 (the estimate of e2) under the hypothesis x1 → x2 (the independent case), panel (b) shows y1 (x2) vs. y2 (the estimate of e1) under the hypothesis x2 → x1.]

SLIDE 21

Conclusion

- Post-nonlinear acyclic causal model with inner additive noise:
  - very general: covers the nonlinear effect of the cause, the noise effect, and sensor nonlinear distortion;
  - still identifiable.
- Experimental results on the CauseEffectPairs problem show its applicability to some practical problems.
- Future work:
  - identifiability of this model in the general case of more than two variables;
  - efficient identification methods.

SLIDE 22

References

- A. Taleb and C. Jutten. Source separation in post-nonlinear mixtures. IEEE Transactions on Signal Processing, 47(10):2802–2820, 1999.
- S. Shimizu, P. O. Hoyer, A. Hyvärinen, and A. J. Kerminen. A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7:2003–2030, 2006.
- P. O. Hoyer, D. Janzing, J. Mooij, J. Peters, and B. Schölkopf. Nonlinear causal discovery with additive noise models. In NIPS 21. To appear, 2009.
- A. Gretton, K. Fukumizu, C. H. Teo, L. Song, B. Schölkopf, and A. J. Smola. A kernel statistical test of independence. In NIPS 20, pages 585–592, 2008.