SLIDE 1

Distinguishing Causes from Effects using Nonlinear Acyclic Causal Models

Kun Zhang¹ and Aapo Hyvärinen¹,²

¹ Dept. of Computer Science & HIIT, ² Dept. of Mathematics and Statistics,
University of Helsinki

SLIDE 2

Outline

- Introduction
- Post-nonlinear causal model with inner additive noise
  - Relation to post-nonlinear independent component analysis (ICA)
  - Identification method
- Special cases
- Experiments

SLIDE 3

Methods for causal discovery

- Two popular kinds of methods:
  - Constraint-based: use independence tests to find the patterns of relationships. Examples: PC, IC.
  - Score-based: use a score (such as BIC) to compare different causal models.
- Model-based: a special case of score-based methods.
  - Assumes a generative model for the data-generating process.
  - Can discover in what form each variable is influenced by the others.
  - Examples:
    - Granger causality: effects follow causes in a linear form.
    - LiNGAM: the linear, non-Gaussian, acyclic causal model (Shimizu et al., 2006).

SLIDE 4

Three effects usually encountered in a causal model

[Figure: Cause → f1 → + (Noise) → f2 → Effect. Here f1 is the nonlinear effect of the cause, the additive term is the noise effect, and f2 is sensor or measurement distortion.]

- Without prior knowledge, the assumed model is expected to be
  - general enough: able to approximate the true generating process;
  - identifiable: asymmetric in causes and effects.
- Both requirements are represented by the post-nonlinear causal model with inner additive noise.

SLIDE 5

Post-nonlinear (PNL) causal model with inner additive noise

- A directed acyclic graph (DAG) is used to represent the data generating process:

  xi = fi,2( fi,1(pai) + ei )

  where pai denotes the parents (causes) of xi; ei is the noise/disturbance, independent from pai; fi,2 is assumed to be continuous and invertible; fi,1 is not necessarily invertible.

- Here we consider the two-variable case:
  - x1 → x2: x2 = f2,2( f2,1(x1) + e2 )
- Identifiability: related to the separability of the PNL mixing independent component analysis (ICA) model.
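As a concrete illustration, the two-variable generating process above can be simulated directly. The particular choices of f2,1 (a square, deliberately not invertible) and f2,2 (a cube root, continuous and invertible) are hypothetical examples, chosen only to match the assumptions the model places on each function:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

x1 = rng.standard_normal(n)        # cause
e2 = rng.uniform(-0.5, 0.5, n)     # disturbance, independent of x1

def f21(u):
    return u ** 2                  # inner nonlinearity: need NOT be invertible

def f22(v):
    return np.cbrt(v)              # outer distortion: continuous and invertible

x2 = f22(f21(x1) + e2)             # x1 -> x2 under the PNL model
```

By construction e2 is independent of x1, which is exactly the asymmetry the identification method later exploits.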

SLIDE 6

Three cases of ICA: linear, general nonlinear, and PNL

- Linear ICA: separable under weak assumptions.
- Nonlinear ICA: A and W become invertible nonlinear mappings; not separable: yi may be totally different from si.

[Figure: independent sources s1, …, sn pass through an unknown mixing system x = A·s (mixing matrix A) to give the observed signals x1, …, xm; the ICA system applies a de-mixing estimate y = W·x to produce outputs y1, …, yn that are as independent as possible.]

SLIDE 7

PNL mixing ICA: a nice trade-off

- Mixing system: a linear transformation followed by an invertible component-wise nonlinear transformation.
- Separability (Taleb and Jutten, 1999): under the following conditions, the yi are independent iff hi = gi ∘ fi is linear, and the yi are then estimates of the si:
  - A has at least two nonzero entries per row or per column;
  - the fi are differentiable invertible functions;
  - each si admits a density function that vanishes at at least one point.

[Figure: independent sources s1, …, sn are mixed by matrix A and distorted by invertible component-wise nonlinearities f1, …, fn in the unknown mixing system (A, f), giving the observed mixtures x1, …, xn; the separation system (g, W) applies component-wise nonlinearities g1, …, gn followed by matrix W to produce the outputs y1, …, yn.]

SLIDE 8

Identifiability of the proposed causal model

- If f2,1 is invertible, x2 = f2,2( f2,1(x1) + e2 ) is a special case of the PNL mixing ICA model with A = (1, 0; 1, 1): taking the independent sources s1 = f2,1(x1) and s2 = e2,

  x1 = f2,1⁻¹(s1),  x2 = f2,2(s1 + s2).

- Identifiability: the causal relation between x1 and x2 can be uniquely identified if
  - x1 and x2 are generated according to this causal model with invertible f2,1;
  - the densities of f2,1(x1) and e2 vanish at at least one point.
- If f2,1 is not invertible, the model is not a PNL mixing ICA model, but it is empirically identifiable under very general conditions.
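The reduction to the PNL mixing ICA form with A = (1, 0; 1, 1) can be checked numerically. The concrete nonlinearities f2,1 = log and f2,2 = tanh are hypothetical examples of invertible functions, not choices made in the original work:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x1 = rng.uniform(0.5, 2.0, n)           # cause (kept positive so log applies)
e2 = rng.uniform(-0.1, 0.1, n)          # disturbance, independent of x1
x2 = np.tanh(np.log(x1) + e2)           # PNL model with f2,1 = log, f2,2 = tanh

# Equivalent PNL mixing ICA form: sources s1 = f2,1(x1) and s2 = e2,
# linear stage with A = (1, 0; 1, 1), then component-wise nonlinearities.
s = np.stack([np.log(x1), e2])
A = np.array([[1.0, 0.0],
              [1.0, 1.0]])
m = A @ s
assert np.allclose(np.exp(m[0]), x1)    # x1 = f2,1^{-1}(s1)
assert np.allclose(np.tanh(m[1]), x2)   # x2 = f2,2(s1 + s2)
```

The first mixture row reproduces x1 through the inverse of f2,1, and the second reproduces x2, exactly as the equations above state.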

SLIDE 9

Identification Method

- Basic idea: which of x1 → x2 and x2 → x1 can make the cause and the disturbance independent?
- Two-step procedure for each possible causal relation:
  - Step 1: constrained nonlinear ICA to estimate the corresponding disturbance.
  - Step 2: independence tests to verify whether the assumed cause and the estimated disturbance are independent.

Suppose x1 → x2, i.e., x2 = f2,2( f2,1(x1) + e2 ). Then y2 provides an estimate of e2, learned by minimizing the mutual information (which is equivalent to the negative likelihood):

  I(y1, y2) = H(y1) + H(y2) − E{log |J|} − H(x)
            = −E{log p_y1(y1)} − E{log p_y2(y2)} − E{log |J|} − H(x),

where J is the Jacobian of the transformation from x to y, and H(x) does not depend on the parameters being learned.
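A minimal sketch of the two-step procedure, under the simplifying assumption f2,2 = identity (the additive-noise special case of Hoyer et al., not the full constrained nonlinear ICA): Step 1 estimates the disturbance as a polynomial-regression residual, and Step 2 measures cause–disturbance dependence with a biased empirical HSIC statistic. The generating function x1 + x1³ and all tuning choices here are illustrative assumptions:

```python
import numpy as np

def gram(z):
    """Gaussian-kernel Gram matrix with a median-heuristic bandwidth."""
    d2 = (z.reshape(-1, 1) - z.reshape(1, -1)) ** 2
    return np.exp(-d2 / np.median(d2[d2 > 0]))

def hsic(a, b):
    """Biased empirical HSIC: trace(K H L H) / n^2 (larger = more dependent)."""
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(gram(a) @ H @ gram(b) @ H) / n ** 2

def disturbance(cause, effect, deg=5):
    """Step 1 (simplified): estimate the disturbance as a regression residual."""
    return effect - np.polyval(np.polyfit(cause, effect, deg), cause)

rng = np.random.default_rng(0)
n = 500
x1 = rng.uniform(-1.5, 1.5, n)
x2 = x1 + x1 ** 3 + 0.2 * rng.standard_normal(n)   # true direction: x1 -> x2

# Step 2: test independence of the assumed cause and estimated disturbance
score_fwd = hsic(x1, disturbance(x1, x2))   # hypothesis x1 -> x2
score_bwd = hsic(x2, disturbance(x2, x1))   # hypothesis x2 -> x1
# the hypothesis with the smaller dependence score is preferred
```

In the correct direction the residual is close to the true independent disturbance, so its dependence score stays small; in the reverse direction no additive decomposition with independent noise exists, and the score is visibly larger.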

SLIDE 10

Special cases

xi = fi,2( fi,1(pai) + ei )

- If fi,1 and fi,2 are both linear:
  - at most one of the ei is Gaussian: LiNGAM (the linear, non-Gaussian, acyclic causal model; Shimizu et al., 2006);
  - all of the ei are Gaussian: the linear Gaussian model.
- If the fi,2 are linear: nonlinear causal discovery with additive noise models (Hoyer et al., 2009).
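A quick numerical illustration of why the all-Gaussian linear case is the hard one: for jointly Gaussian variables, zero correlation implies independence, and least squares makes the residual exactly uncorrelated with the regressor in both directions, so neither direction can be rejected. The coefficient 0.8 is an arbitrary example value:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
x1 = rng.standard_normal(n)
x2 = 0.8 * x1 + rng.standard_normal(n)   # linear model, Gaussian disturbance

def ols_residual(cause, effect):
    # least-squares residual of effect regressed on cause (with intercept)
    slope, intercept = np.polyfit(cause, effect, 1)
    return effect - (slope * cause + intercept)

c_fwd = np.corrcoef(x1, ols_residual(x1, x2))[0, 1]  # hypothesis x1 -> x2
c_bwd = np.corrcoef(x2, ols_residual(x2, x1))[0, 1]  # hypothesis x2 -> x1
# Both correlations are numerically zero, so for Gaussian data the residual
# looks independent of the regressor in BOTH directions.
print(abs(c_fwd), abs(c_bwd))
```

This is exactly what LiNGAM escapes by requiring non-Gaussian disturbances, and what the nonlinear models escape through the nonlinearity itself.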

SLIDE 11

Experiments

- For the CauseEffectPairs task in the Pot-luck challenge:
  - eight data sets;
  - each contains the realizations of two variables;
  - goal: to identify which variable is the cause and which one the effect.
- Settings:
  - g1 and g2 in the constrained nonlinear ICA are modeled by multilayer perceptrons (MLPs) with one hidden layer;
  - different numbers of hidden units (4–10) were tried; the results remained the same;
  - kernel-based independence tests (Gretton et al., 2008) were adopted.
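For concreteness, a one-hidden-layer MLP of the kind used here to parameterize g1 and g2 is just a flexible scalar map. This is a sketch of the parameterization only, with randomly initialized weights and 6 hidden units as an arbitrary choice inside the 4–10 range the slides report; the actual training by constrained mutual-information minimization is not shown:

```python
import numpy as np

def mlp(x, W1, b1, w2, b2):
    """One-hidden-layer MLP mapping R -> R: w2 . tanh(W1*x + b1) + b2."""
    h = np.tanh(np.outer(x, W1) + b1)   # hidden activations, shape (n, n_hidden)
    return h @ w2 + b2

rng = np.random.default_rng(0)
n_hidden = 6                            # slides report trying 4-10 hidden units
W1 = rng.standard_normal(n_hidden)
b1 = rng.standard_normal(n_hidden)
w2 = rng.standard_normal(n_hidden)

x = np.linspace(-2.0, 2.0, 100)
y = mlp(x, W1, b1, w2, 0.0)             # a smooth, flexible nonlinearity of x
```

With enough hidden units such a network can approximate any continuous function on a bounded interval, which is why the results were insensitive to the exact unit count.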

SLIDE 12

Results

Data set | Result (direction of causality) | Remark
---------|---------------------------------|----------------
1        | x1 → x2                         | significant
2        | x1 → x2                         | significant
3        | x1 → x2                         | significant
4        | x2 → x1                         | not significant
5        | x2 → x1                         | significant
6        | x1 → x2                         | significant
7        | x2 → x1                         | significant
8        | x1 → x2                         | significant

SLIDE 13

Data Set 1

[Figure: scatter plots of x1 vs. x2, x1 vs. its nonlinear effect on x2, and x2 vs. f2,2⁻¹(x2); panel (a) shows y1 vs. y2 under the hypothesis x1 → x2 (the independent case), panel (b) shows y1 vs. y2 under the hypothesis x2 → x1. Caption: independence test results on y1 and y2 with different assumed causal relations.]

SLIDE 14

Data Set 2

[Figure: scatter plots of x1 vs. x2, x1 vs. its nonlinear effect on x2, and x2 vs. f2,2⁻¹(x2); panel (a) shows y1 (x1) vs. y2 (the estimate of e2) under the hypothesis x1 → x2 (the independent case), panel (b) shows y1 (x2) vs. y2 (the estimate of e1) under the hypothesis x2 → x1.]

SLIDE 15

Data Set 3

[Figure: scatter plots of x1 vs. x2, x1 vs. its nonlinear effect on x2, and x2 vs. f2,2⁻¹(x2); panel (a) shows y1 (x1) vs. y2 (the estimate of e2) under the hypothesis x1 → x2 (the independent case), panel (b) shows y1 (x2) vs. y2 (the estimate of e1) under the hypothesis x2 → x1.]

SLIDE 16

Data Set 4

[Figure: scatter plots of x1 vs. x2, x2 vs. its nonlinear effect on x1, and x1 vs. f1,2⁻¹(x1); panel (a) shows y1 (x1) vs. y2 (the estimate of e2) under the hypothesis x1 → x2, panel (b) shows y1 (x2) vs. y2 (the estimate of e1) under the hypothesis x2 → x1 (the independent case).]

SLIDE 17

Data Set 5

[Figure: scatter plots of x1 vs. x2, x2 vs. its nonlinear effect on x1, and x1 vs. f1,2⁻¹(x1); panel (a) shows y1 (x1) vs. y2 (the estimate of e2) under the hypothesis x1 → x2, panel (b) shows y1 (x2) vs. y2 (the estimate of e1) under the hypothesis x2 → x1 (the independent case).]

SLIDE 18

Data Set 6

[Figure: scatter plots of x1 vs. x2, x1 vs. its nonlinear effect on x2, and x2 vs. f2,2⁻¹(x2); panel (a) shows y1 (x1) vs. y2 (the estimate of e2) under the hypothesis x1 → x2 (the independent case), panel (b) shows y1 (x2) vs. y2 (the estimate of e1) under the hypothesis x2 → x1.]

SLIDE 19

Data Set 7

[Figure: scatter plots of x1 vs. x2, x2 vs. its nonlinear effect on x1, and x1 vs. f1,2⁻¹(x1); panel (a) shows y1 (x1) vs. y2 (the estimate of e2) under the hypothesis x1 → x2, panel (b) shows y1 (x2) vs. y2 (the estimate of e1) under the hypothesis x2 → x1 (the independent case).]

SLIDE 20

Data Set 8

[Figure: scatter plots of x1 vs. x2, x1 vs. its nonlinear effect on x2, and x2 vs. f2,2⁻¹(x2); panel (a) shows y1 (x1) vs. y2 (the estimate of e2) under the hypothesis x1 → x2 (the independent case), panel (b) shows y1 (x2) vs. y2 (the estimate of e1) under the hypothesis x2 → x1.]

SLIDE 21

Conclusion

- Post-nonlinear acyclic causal model with inner additive noise:
  - very general: covers the nonlinear effect of the cause, the noise effect, and sensor nonlinear distortion;
  - still identifiable.
- Experimental results on the CauseEffectPairs problem show its applicability to some practical problems.
- Future work:
  - identifiability of this model in the general case of more than two variables;
  - efficient identification methods.

SLIDE 22

References

- A. Taleb and C. Jutten. Source separation in post-nonlinear mixtures. IEEE Transactions on Signal Processing, 47(10):2802–2820, 1999.
- S. Shimizu, P. O. Hoyer, A. Hyvärinen, and A. J. Kerminen. A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7:2003–2030, 2006.
- P. O. Hoyer, D. Janzing, J. Mooij, J. Peters, and B. Schölkopf. Nonlinear causal discovery with additive noise models. In NIPS 21. To appear, 2009.
- A. Gretton, K. Fukumizu, C. H. Teo, L. Song, B. Schölkopf, and A. J. Smola. A kernel statistical test of independence. In NIPS 20, pages 585–592, 2008.