

slide-1
SLIDE 1

Identifiability of Restricted Structural Equation Models

Networks: Processes and Causality, Menorca

Jonas Peters1,2

J. Mooij3, D. Janzing2, B. Schölkopf2, R. Tanase1, P. Bühlmann1

1 Seminar for Statistics, ETH Zürich, Switzerland

2 MPI for Intelligent Systems, Tübingen, Germany

3 Radboud University, Nijmegen, Netherlands

3rd September 2012

Jonas Peters (ETH Zürich) Identifiability of Restricted SEMs 3rd September 2012 1 / 30


slide-5
SLIDE 5

What is the Problem?

Random variables:
  • X : water temperature of the Mediterranean Sea
  • Y : # networks- and causality-related workshops in Cala Galdana
  • Z : # scientists on Menorca
What is the causal structure?
  • Understand the (physical) process in more detail.
  • Intervene: organize a workshop in Cala Galdana! Go swimming!
  • Use observational data!


slide-6
SLIDE 6

What is the Problem?

Observed iid data from P(X1, . . . , X5):

X1: 3.4, 1.7, −2.4, · · ·
X2: −0.2, 7.0, −1.2, · · ·
X3: −0.1, 4.3, −0.7, · · ·
X4: 0.3, 5.8, 0.3, · · ·
X5: 3.5, 1.9, −1.9, · · ·

? −→ causal DAG G0 over X1, . . . , X5


slide-7
SLIDE 7

Relating Causal Graph and Joint Distribution

[Graph over X1, X2, X3, X4]

1. Markov Condition:
X1 ⊥⊥ X4 | {X2, X3}, X2 ⊥⊥ X3 | {X1}
(d-separation ⇒ cond. independence)

2. Faithfulness:
no more (no d-separation ⇒ no cond. independence)

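Both implications can be illustrated numerically. The sketch below is not from the slides: it assumes a concrete linear Gaussian SEM for the diamond graph X1 → X2, X1 → X3, X2 → X4, X3 → X4 (one graph consistent with the listed independences), with illustrative coefficients, and uses partial correlation as the conditional-independence measure.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Simulate a linear Gaussian SEM for X1 -> X2, X1 -> X3, X2 -> X4, X3 -> X4.
X1 = rng.normal(size=n)
X2 = 0.8 * X1 + rng.normal(size=n)
X3 = -0.6 * X1 + rng.normal(size=n)
X4 = 0.7 * X2 + 0.5 * X3 + rng.normal(size=n)

def partial_corr(a, b, cond):
    """Correlation of a and b after regressing out the conditioning set (OLS residuals)."""
    Z = np.column_stack([np.ones(len(a))] + cond)
    ra = a - Z @ np.linalg.lstsq(Z, a, rcond=None)[0]
    rb = b - Z @ np.linalg.lstsq(Z, b, rcond=None)[0]
    return np.corrcoef(ra, rb)[0, 1]

# d-separation in the graph predicts these (conditional) independences:
print(partial_corr(X1, X4, [X2, X3]))  # approx 0: X1 indep X4 | {X2, X3}
print(partial_corr(X2, X3, [X1]))      # approx 0: X2 indep X3 | {X1}
# ... and no more: X2 and X3 are marginally dependent via X1.
print(np.corrcoef(X2, X3)[0, 1])       # clearly nonzero
```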


slide-10
SLIDE 10

PC Algorithm

P(X1, . . . , X4)

(Conditional) independences:
X1 ⊥⊥ X2, X2 ⊥⊥ X3, X1 ⊥⊥ X4 | {X3}, X1 ⊥⊥ X2 | {X3}, X2 ⊥⊥ X3 | {X1}

SEM with graph G:
X1 = f1(N1), X2 = f2(N2), X3 = f3(X1, N3), X4 = f4(X2, X3, N4), Ni jointly independent

[Diagram: from P via independence tests to candidate DAGs G′, G′′ (Faithfulness, Markov — unique?); the reverse direction G → P is trivial.]

Method: PC [Spirtes et al., 2001]
1. Find all (cond.) independences from the data.
2. Select the DAG(s) that correspond to these independences.
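Step 1 can be sketched on a four-variable example, assuming linear functions fi and partial correlation with a Fisher-z threshold as the conditional-independence test (both illustrative choices, not the full PC algorithm with its clever ordering of tests):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n = 50_000

# SEM: X1 = N1, X2 = N2, X3 = f3(X1, N3), X4 = f4(X2, X3, N4) (linear fi here).
X = np.zeros((n, 4))
X[:, 0] = rng.normal(size=n)                                   # X1
X[:, 1] = rng.normal(size=n)                                   # X2
X[:, 2] = 0.9 * X[:, 0] + rng.normal(size=n)                   # X3
X[:, 3] = 0.8 * X[:, 1] + 0.7 * X[:, 2] + rng.normal(size=n)   # X4

def indep(i, j, S):
    """Crude CI test: partial correlation via OLS residuals plus a Fisher-z cutoff."""
    Z = np.column_stack([np.ones(n)] + [X[:, k] for k in S])
    ri = X[:, i] - Z @ np.linalg.lstsq(Z, X[:, i], rcond=None)[0]
    rj = X[:, j] - Z @ np.linalg.lstsq(Z, X[:, j], rcond=None)[0]
    r = np.corrcoef(ri, rj)[0, 1]
    z = np.sqrt(n - len(S) - 3) * np.abs(np.arctanh(r))
    return z < 2.58  # roughly a 1% level

# Step 1: start fully connected, delete edge i-j whenever some subset of the
# remaining variables renders i and j independent.
edges = set(combinations(range(4), 2))
for i, j in list(edges):
    others = [k for k in range(4) if k not in (i, j)]
    for size in range(len(others) + 1):
        if any(indep(i, j, list(S)) for S in combinations(others, size)):
            edges.discard((i, j))
            break

print(sorted(edges))  # surviving edges: the skeleton of the true DAG
```

The surviving edges form the skeleton of the true DAG; the full PC algorithm would additionally orient v-structures and apply Meek's rules.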


slide-12
SLIDE 12

PC Algorithm

P(X1, . . . , X4)

(Conditional) independences:
X1 ⊥⊥ X2, X2 ⊥⊥ X3, X1 ⊥⊥ X4 | {X3}, X1 ⊥⊥ X2 | {X3}, X2 ⊥⊥ X3 | {X1}

SEM with graph G:
X1 = f1(N1), X2 = f2(N2), X3 = f3(X1, X2, N3), X4 = f4(X2, X3, N4), Ni jointly independent

Method: PC [Spirtes et al., 2001]
1. Find all (cond.) independences from the data. Be smart.
2. Select the DAG(s) that correspond to these independences.

slide-13
SLIDE 13

Relating Causal Graph and Joint Distribution

The PC algorithm makes very few assumptions. Can we gain something by making more/different assumptions?


slide-14
SLIDE 14

Relating Causal Graph and Joint Distribution

PC assumptions:

[Diagram: assumptions ordered from weak to strong — Markov, Faithfulness, Strong Faithfulness]


slide-15
SLIDE 15

Relating Causal Graph and Joint Distribution

New assumptions:

[Diagram: assumptions ordered from weak to strong — Markov, Causal Minimality, Faithfulness, Strong Faithfulness; SEM, Restricted SEM]



slide-17
SLIDE 17

Causal Minimality

Causal Minimality is a weak form of faithfulness:

Definition

Let G0 be the true causal graph. If P(X1, . . . , Xp) is not Markov with respect to any proper subgraph of G0, causal minimality is satisfied. “Each arrow does something.”

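A minimal numerical illustration of a violation, using a linear Gaussian toy example of my own (not from the slides): the edge X1 → X2 is present in the graph, but its coefficient is zero, so the distribution is also Markov to the subgraph without that edge.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

# The graph claims X1 -> X2, but the structural equation X2 = 0 * X1 + N2
# makes the arrow do nothing.
X1 = rng.normal(size=n)
X2 = 0.0 * X1 + rng.normal(size=n)

# P(X1, X2) is then Markov to the proper subgraph with the edge removed,
# so causal minimality fails; any nonzero coefficient would restore it.
print(np.corrcoef(X1, X2)[0, 1])  # approx 0
```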

slide-18
SLIDE 18

Violation of Causal Minimality

[Graph over X1, X2, X3, X4]

1. Markov Condition:
X2 ⊥⊥ X3 | {X1}, X1 ⊥⊥ X4 | {X2, X3}
(d-separation ⇒ cond. independence)

2. Faithfulness:
no more (no d-separation ⇒ no cond. independence)


slide-19
SLIDE 19

Violation of Causal Minimality

[Graph over X1, X2, X3, X4]

1. Markov Condition:
X2 ⊥⊥ X3 | {X1}, X1 ⊥⊥ X4 | {X2, X3}, X4 ⊥⊥ X3 | {X2}
(d-separation ⇒ cond. independence)

2. Faithfulness:
no more (no d-separation ⇒ no cond. independence)


slide-20
SLIDE 20

Violation of Faithfulness

[Graph over X1, X2, X3, X4]

1. Markov Condition:
X2 ⊥⊥ X3 | {X1}, X1 ⊥⊥ X4 | {X2, X3}, X1 ⊥⊥ X4
(d-separation ⇒ cond. independence)

2. Faithfulness:
no more (no d-separation ⇒ no cond. independence)


slide-21
SLIDE 21

Relating Causal Graph and Joint Distribution

New assumptions:

[Diagram: assumptions ordered from weak to strong — Markov, Causal Minimality, Faithfulness, Strong Faithfulness; SEM, Restricted SEM]


slide-22
SLIDE 22

Structural Equation Models

The joint distribution P(X1, . . . , Xp) satisfies a Structural Equation Model (SEM) with graph G0 if Xi = fi(PAi, Ni), 1 ≤ i ≤ p, with PAi being the parents of Xi in G0. The Ni are required to be jointly independent.


slide-23
SLIDE 23

The Alternative Route

P(X1, . . . , X4)

(Conditional) independences:
X1 ⊥⊥ X2, X2 ⊥⊥ X3, X1 ⊥⊥ X4 | {X3}, X1 ⊥⊥ X2 | {X3}, X2 ⊥⊥ X3 | {X1}

SEM with graph G:
X1 = f1(N1), X2 = f2(N2), X3 = f3(X1, N3), X4 = f4(X2, X3, N4), Ni jointly independent

[Diagram: the alternative route goes from P directly via the SEM to the graph, rather than via independence tests with Faithfulness/Markov.]



slide-25
SLIDE 25

Relating Causal Graph and Joint Distribution

New assumptions:

[Diagram: assumptions ordered from weak to strong — Markov, Causal Minimality, Faithfulness, Strong Faithfulness; SEM, Restricted SEM]


slide-26
SLIDE 26

Restricted Structural Equation Models

Linear Gaussian Additive Noise Models:
Xi = Σ_{j∈PAi} βj Xj + Ni, 1 ≤ i ≤ p, with Ni iid ∼ N(0, σi²) and graph G0.

Proposition

Assume faithfulness. Then one can identify the Markov equivalence class of G0 from P(X1, . . . , Xp).


slide-27
SLIDE 27

Restricted Structural Equation Models

Linear Non-Gaussian Additive Noise Models:
Xi = Σ_{j∈PAi} βj Xj + Ni, 1 ≤ i ≤ p, with Ni iid non-Gaussian and graph G0.
(One can show: βj ≠ 0 ⇒ causal minimality.)

Theorem ([Shimizu et al., 2006])

One can identify G0 from P(X1, . . . , Xp).

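The asymmetry behind this theorem can be probed in the bivariate case. The sketch below assumes uniform noise, an illustrative coefficient, and correlation of squared values as a crude stand-in for a proper independence test such as HSIC:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000

# Linear SEM with uniform (non-Gaussian) noise: X -> Y.
X = rng.uniform(-1, 1, n)
Y = X + rng.uniform(-1, 1, n)

def fit_residual(a, b):
    """OLS residual of a regressed on b."""
    Z = np.column_stack([np.ones(n), b])
    return a - Z @ np.linalg.lstsq(Z, a, rcond=None)[0]

# OLS residuals are uncorrelated with the regressor by construction, so probe
# higher-order dependence via correlation of squares (a crude HSIC stand-in).
res_fwd = fit_residual(Y, X)   # fit the causal direction  Y = bX + N
res_bwd = fit_residual(X, Y)   # fit the wrong direction   X = cY + M

dep_fwd = np.corrcoef(res_fwd**2, X**2)[0, 1]
dep_bwd = np.corrcoef(res_bwd**2, Y**2)[0, 1]
print(dep_fwd)   # approx 0: residual independent of regressor
print(dep_bwd)   # clearly nonzero: no valid backward linear model
```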

slide-28
SLIDE 28

Restricted Structural Equation Models

Linear Gaussian Models with Same Error Variance:
Xi = Σ_{j∈PAi} βj Xj + Ni, 1 ≤ i ≤ p, with Ni iid ∼ N(0, σ²).
(One can show: βj ≠ 0 ⇒ causal minimality.)

Theorem ([Peters and Bühlmann, 2012])

One can identify G0 from P(X1, . . . , Xp).


slide-29
SLIDE 29

Restricted Structural Equation Models

Non-Linear Additive Noise Models:
Xi = fi(XPAi) + Ni, 1 ≤ i ≤ p, with Ni iid and graph G0.

Theorem ([Hoyer et al., 2009, Peters et al., 2011b])

Exclude a few combinations of fi and Ni. Then one can identify G0 from P(X1, . . . , Xp).


slide-30
SLIDE 30

Restricted Structural Equation Models

Discrete Additive Noise Models:
Xi = fi(XPAi) + Ni, 1 ≤ i ≤ p, with Ni iid non-uniform and graph G0.

Theorem ([Peters et al., 2011a,b])

Exclude a few combinations of fi and Ni. Then one can identify G0 from P(X1, . . . , Xp).


slide-31
SLIDE 31

Restricted Structural Equation Models

Assumption

Assume that P(X1, . . . , Xp) follows any of the restricted SEMs mentioned above with graph G0 and assume causal minimality.

Theorem

Then, the true causal DAG can be recovered from the joint distribution.


slide-32
SLIDE 32

Linear Gaussian Models with fixed Variance

Proof Idea: Assume P(X1, X2, X3) allows for two SEMs leading to G1 and G2:

G1: X3 = α1 X1 + α2 X2 + N3
X3* := X3|X1=x = α1 x + α2 X2|X1=x + N3
⇒ var(X3*) = 0 + α2² var(X2|X1=x) + σ² > σ²

G2: X3 = M3
X3* := X3|X1=x = M3|X1=x
⇒ var(X3*) ≤ σ²
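The inequality in the proof idea can be checked by simulation. This sketch assumes illustrative values α1 = 0.8, α2 = 0.9, σ² = 1 and approximates the conditioning X1 = x by a narrow bin:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
sigma2 = 1.0

# G1 from the slide: X1, X2 sources; X3 = a1*X1 + a2*X2 + N3, all noise variances sigma2.
a1, a2 = 0.8, 0.9
X1 = rng.normal(scale=np.sqrt(sigma2), size=n)
X2 = rng.normal(scale=np.sqrt(sigma2), size=n)
X3 = a1 * X1 + a2 * X2 + rng.normal(scale=np.sqrt(sigma2), size=n)

# Condition on X1 = x via a narrow bin around x = 0.5.
x = 0.5
mask = np.abs(X1 - x) < 0.02
v = X3[mask].var()

# var(X3 | X1 = x) = a2^2 * var(X2 | X1 = x) + sigma2 > sigma2, so under equal
# error variances X3 cannot simultaneously be modelled as a pure noise term.
print(v)                          # approx a2^2 + sigma2 = 1.81
print(a2**2 * sigma2 + sigma2)
```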


slide-35
SLIDE 35

Practical Method I

Method: IFMOC (Identifiable Functional Model Class)

Idea: If we fit a wrong SEM, the noise variables become dependent.

1. Find all SEMs that fit the data.
2. If there is exactly one, output the DAG. Otherwise: “I do not know.”
3. Avoid enumerating all DAGs [Mooij et al., 2009]: always find a sink and remove additional edges at the end.

Needed:
  • a regression method (e.g. linear, GP),
  • an independence test (e.g. HSIC).

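A stripped-down sketch of the sink-finding idea, assuming linear regression and, in place of HSIC, the correlation of squared residuals with squared regressors as a crude dependence score (all choices here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000

# Linear SEM with uniform (non-Gaussian) noise: X1 -> X2 -> X3.
X = np.zeros((n, 3))
X[:, 0] = rng.uniform(-1, 1, n)
X[:, 1] = X[:, 0] + rng.uniform(-1, 1, n)
X[:, 2] = X[:, 1] + rng.uniform(-1, 1, n)

def dep(resid, others):
    """Crude stand-in for HSIC: max correlation of squared residuals with squared regressors."""
    return max(abs(np.corrcoef(resid**2, others[:, k]**2)[0, 1])
               for k in range(others.shape[1]))

remaining = list(range(3))
order = []
while len(remaining) > 1:
    scores = {}
    for i in remaining:
        others = X[:, [j for j in remaining if j != i]]
        Z = np.column_stack([np.ones(n), others])
        resid = X[:, i] - Z @ np.linalg.lstsq(Z, X[:, i], rcond=None)[0]
        scores[i] = dep(resid, others)   # small = residuals look independent = plausible sink
    sink = min(scores, key=scores.get)
    order.append(sink)
    remaining.remove(sink)
order.append(remaining[0])
order.reverse()                          # causal order: sources first

print(order)
```

On this chain, only the true sink leaves residuals that look independent of the remaining variables, so the recovered causal order is the source-to-sink order of the simulation.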

slide-36
SLIDE 36

Practical Method II

Method: GDS (Greedy DAG Search)

Only for: linear Gaussian models with same noise variances.
Idea: Assign a score (e.g. BIC) to a given DAG.

1. Start with a random DAG.
2. At each step, look at all neighbouring DAGs.
3. Go to the DAG with the best score.
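A compact sketch of greedy DAG search for a three-variable chain, assuming an equal-variance Gaussian BIC written with one shared σ̂² (the scoring details are a plausible reading of the method, not the exact score from the paper):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(6)
n, p = 5000, 3

# Equal-error-variance linear Gaussian SEM: X1 -> X2 -> X3, all noise variances 1.
X = np.zeros((n, p))
X[:, 0] = rng.normal(size=n)
X[:, 1] = 1.0 * X[:, 0] + rng.normal(size=n)
X[:, 2] = 1.0 * X[:, 1] + rng.normal(size=n)

def is_dag(A):
    """A[i, j] = 1 means i -> j; check acyclicity by repeatedly removing sources."""
    nodes = list(range(p))
    while nodes:
        srcs = [i for i in nodes if not any(A[j, i] for j in nodes)]
        if not srcs:
            return False
        nodes = [i for i in nodes if i not in srcs]
    return True

def bic(A):
    """BIC under the equal-variance Gaussian model: one shared sigma^2 for all nodes."""
    rss = 0.0
    for i in range(p):
        pa = np.flatnonzero(A[:, i])
        Z = np.column_stack([np.ones(n), X[:, pa]])
        resid = X[:, i] - Z @ np.linalg.lstsq(Z, X[:, i], rcond=None)[0]
        rss += resid @ resid
    sigma2 = rss / (n * p)
    return n * p * np.log(sigma2) + A.sum() * np.log(n)   # lower = better

def neighbours(A):
    """All DAGs reachable by adding, deleting, or reversing a single edge."""
    for i, j in product(range(p), repeat=2):
        if i == j:
            continue
        B = A.copy()
        if A[i, j]:
            B[i, j] = 0
            yield B                       # delete (always acyclic)
            B2 = B.copy(); B2[j, i] = 1
            if is_dag(B2):
                yield B2                  # reverse
        elif not A[j, i]:
            B[i, j] = 1
            if is_dag(B):
                yield B                   # add

# Greedy DAG search: start from the empty graph, move to the best-scoring neighbour.
A = np.zeros((p, p), dtype=int)
score = bic(A)
while True:
    best_s, best_B = score, None
    for B in neighbours(A):
        s = bic(B)
        if s < best_s - 1e-9:
            best_s, best_B = s, B
    if best_B is None:
        break
    A, score = best_B, best_s

print(sorted(map(tuple, np.argwhere(A))))   # edges of the recovered DAG
```

With equal error variances the true DAG scores strictly better than any reversed version, so the search recovers the chain rather than merely its Markov equivalence class.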

slide-37
SLIDE 37

Practical Method - Properties

Both methods:
+ Identifiability within the Markov equivalence class.
+ Option to say “I do not know.”
+ No faithfulness assumption.
− Strong structural assumptions.
− Not scalable to high-dimensional problems (yet :-)).


slide-38
SLIDE 38

Restricted Structural Equation Models

Experiment 1: Comparison of IFMOC and PC when both assumptions are met.

sample size: 400; #data sets: 100; α = 5%
X1 = N1, X2 = N2, X3 = f3(X1, X2) + N3, X4 = f4(X2, X3) + N4, Ni iid ∼ U([−0.5, 0.5])

Results, correct / wrong / no decision (zero entries omitted):
              linear            nonlinear
PClin         47% / 53%         100%
PCnonlin      3% / 97%          4% / 96%
IFMOClin      86% / 14%         100%
IFMOCnonlin   76% / 1% / 23%    86% / 8% / 6%


slide-39
SLIDE 39

Restricted Structural Equation Models

Experiment 2: How often are we close to non-faithfulness?

#data sets: 500; α = 5%
X1 = β1 N1, X2 = γ12 X1 + β2 N2, X3 = γ13 X1 + β3 N3, X4 = γ24 X2 + γ34 X3 + β4 N4,
Ni iid ∼ N(0, 1), γij iid ∼ U([−5, 5]), βi iid ∼ U([0, 0.5])

[Plot: proportion of runs in which faithfulness was missed vs. sample size (10² to 10⁶) — in total, due to partial corr. (given two vars), due to partial corr. (given one var), due to correlations]

slide-40
SLIDE 40

Restricted Structural Equation Models

Experiment 3: Linear Gaussian Models with Fixed Variances (GDS).

[Plots for edge probability prob = 0.1, 0.5, 0.9: algorithm performance (proportion of correct DAGs) and mean SHD for wrongly identified DAGs vs. number of variables (3, 5, 7, 10), comparing GDS−MEC, GDS−DAG, PCalg, ges]

coefs sampled from U([−1.5, −0.1] ∪ [0.1, 1.5]); n = 1000; 100 repetitions

slide-41
SLIDE 41

Restricted Structural Equation Models

Experiment 4: Violation of Same Error Variances.

noise variances sampled from U([4 − τ, 4 + τ]) n = 1000, p = 7, prob = 0.5, 100 repetitions


slide-42
SLIDE 42

Summary

P(X1, . . . , X4)

(Conditional) independences:
X1 ⊥⊥ X2, X2 ⊥⊥ X3, X1 ⊥⊥ X4 | {X3}, X1 ⊥⊥ X2 | {X3}, X2 ⊥⊥ X3 | {X1}

SEM with graph G:
X1 = N1, X2 = N2, X3 = f3(X1, X2) + N3, X4 = f4(X2, X3) + N4, Ni jointly independent

[Diagram: two routes from P to the graph — independence tests with Faithfulness/Markov, or a restricted SEM with causal minimality; the reverse direction G → P is trivial.]

Many thanks!


slide-43
SLIDE 43

References I

  • P. Hoyer, D. Janzing, J. Mooij, J. Peters, and B. Schölkopf. Nonlinear causal discovery with additive noise models. In NIPS 21, 2009.
  • D. Janzing and B. Schölkopf. Causal inference using the algorithmic Markov condition. IEEE Trans. on Information Theory, 56(10):5168–5194, 2010.
  • D. Janzing and B. Steudel. Justifying additive-noise-model based causal discovery via algorithmic information theory. Open Systems and Information Dynamics, 17:189–212, 2010.
  • J. Lemeire and E. Dirkx. Causal models as minimal descriptions of multivariate systems. http://parallel.vub.ac.be/∼jan/, 2006.
  • C. Meek. Causal inference and causal explanation with background knowledge. In UAI 11, pages 403–441, 1995.
  • J. Mooij, D. Janzing, J. Peters, and B. Schölkopf. Regression by dependence minimization and its application to causal inference. In ICML 26, 2009.
  • J. Peters and P. Bühlmann. Identifiability of Gaussian Structural Equation Models with Same Error Variances. ArXiv e-prints, 2012.
  • J. Peters, D. Janzing, and B. Schölkopf. Causal inference on discrete data using additive noise models. IEEE Trans. Pattern Analysis Machine Intelligence, 33(12):2436–2450, 2011a.
  • J. Peters, J. M. Mooij, D. Janzing, and B. Schölkopf. Identifiability of causal graphs using functional models. In UAI 27, 2011b.
  • S. Shimizu, P. O. Hoyer, A. Hyvärinen, and A. J. Kerminen. A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7:2003–2030, 2006.
  • P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. MIT Press, 2nd edition, 2001.
  • K. Zhang and A. Hyvärinen. On the identifiability of the post-nonlinear causal model. In UAI 25, 2009.


slide-44
SLIDE 44

Restricted Structural Equation Models

Experiment 1a: Both methods should work when both assumptions are met.

sample size: 400; #data sets: 100; α = 5%
X1 = N1, X2 = N2, X3 = f3(X1) + N3, X4 = f4(X1, X2, X3) + N4, Ni iid ∼ U([−0.5, 0.5])

Results, correct / wrong / no decision (zero entries omitted):
              linear            nonlinear
PClin         90% / 10%         6% / 94%
PCnonlin      60% / 40%         96% / 4%
IFMOClin      82% / 18%         100%
IFMOCnonlin   79% / 2% / 19%    86% / 1% / 13%


slide-45
SLIDE 45

Future Work

  • Understand relations to Bayesian Network Learning.
  • Joint independence of noise ↔ joint independence of noise to ancestors.
  • Discrete confounders.
  • Extensive tests on real data, especially on data sets with > 2 variables.
  • Robustness.



slide-47
SLIDE 47

Markov Condition and Faithfulness

Let G be the true causal graph of X1, . . . , Xp.

Assumption (Markov Assumption)

Xi and Xj are d-separated by S in G ⇒ Xi ⊥⊥ Xj | S

Assumption (Faithfulness Assumption)

Xi and Xj are d-separated by S in G ⇐ Xi ⊥⊥ Xj | S


slide-48
SLIDE 48

Method: IFMOC (two variables)

1. Assume an ANM from cause to effect.
2. Fit Y = f(X) + N and X = g(Y) + M and check which of the two models leads to independent residuals.
3. If only one direction does, output it. Otherwise do not decide.


slide-50
SLIDE 50

Independence of Conditional and Marginal

Suppose X is the cause and Y the effect. What if Y = f(X) + N, N ⊥⊥ X, but also X = g(Y) + M, M ⊥⊥ Y?

Janzing and Steudel [2010]: this implies “dependence” (based on Kolmogorov complexity) between p(cause) and p(effect | cause). One rather expects input and mechanism to be most often “independent” [Lemeire and Dirkx, 2006, Janzing and Schölkopf, 2010].



slide-53
SLIDE 53

Identifiable Functional Model Class (IFMOC)

Definition (Bivariate Identifiable Set)

We call a set B ⊆ F × PR × PR containing combinations of functions f ∈ F and distributions P(X), P(N) of input X and noise N bivariate identifiable in F if the following holds:

(f, P(X), P(N)) ∈ B and Y = f(X, N), N ⊥⊥ X ⇒ there is no g ∈ F with X = g(Y, M), M ⊥⊥ Y.

Additionally we require that f(X, N) is not independent of X, (1)
for all (f, P(X), P(N)) ∈ B with N ⊥⊥ X.


slide-54
SLIDE 54

Identifiable Functional Model Class (IFMOC)

Lemma

The following sets are bivariate identifiable:
(i) linear ANMs [Shimizu et al., 2006]: F1 = {f(x, n) = ax + n}, B1 = {(X, N) not both Gaussian} \ B̃1
(ii) discrete ANMs [Peters et al., 2011b]: F2 = {f(x, n) ≡ φ(x) + n (mod m̃)}, B2 = {(φ, X) not affine and uniform} \ B̃2
(iii) non-linear ANMs [Hoyer et al., 2009]: B3 = {(φ, X, N) not lin., Gauss, Gauss} \ B̃3
(iv) post-nonlinear ANMs [Zhang and Hyvärinen, 2009]



slide-57
SLIDE 57

Identifiable Functional Model Class (IFMOC)

How can we transfer these identifiability results to p variables?

Definition (F-FMOC)

p equations Xi = fi(PAi, Ni), 1 ≤ i ≤ p, are called a functional model if the Ni are jointly independent and the corresponding graph is acyclic. A set of functional models is called a functional model class with function class F, for short F-FMOC, if each functional model satisfies fi ∈ F.


slide-58
SLIDE 58

Identifiable Functional Model Class (IFMOC)

Definition ((B, F)-IFMOC)

Let B be bivariate identifiable in F. An F-FMOC is called a (B, F)-Identifiable Functional Model Class, for short (B, F)-IFMOC, if for all its functional models Xi = fi(PAi, Ni), 1 ≤ i ≤ p, for all 1 ≤ i ≤ p, j ∈ PAi, and for all sets S ⊆ {1, . . . , p} with PAi \ {j} ⊆ S ⊆ NDi \ {i, j} we have: there exists an xS with pS(xS) > 0 and

( fi(xPAi\{j}, ·, ·), P(Xj | XS = xS), P(Ni) ) ∈ B,   (2)

where the first “·” takes the role of Xj and the second that of Ni.


slide-59
SLIDE 59

Identifiable Functional Model Class (IFMOC)

Experiment 1b: Both methods should work when both assumptions are met.

sample size: 400; #data sets: 100; α = 5%
X1 = f1(N1), X2 = f2(N2), X3 = f3(X1, X2, N3), X4 = f4(X2, X3, N4), Ni iid ∼ U([−0.5, 0.5])

Results, correct / wrong / no decision (zero entries omitted):
              linear            nonlinear
PCpart.corr   47% / 53%         100%
PCHSIC        3% / 97%          4% / 96%
IFMOClin      86% / 14%         100%
IFMOCGP       76% / 1% / 23%    86% / 8% / 6%


slide-60
SLIDE 60

Restricted Structural Equation Models

Experiment 2: If the distribution is not faithful, PC fails; IFMOC does not.

sample size: 1000; #data sets: 100; α = 5%
X1 = N1, X2 = 1.5X1 + N2, X3 = 3X1 − 2X2 + N3, X4 = 1.8X3 + N4, with Ni iid ∼ U([0, 0.5])

Results, correct / wrong / no decision (zero entries omitted):
PClin      100% wrong
IFMOClin   85% / 4% / 11%

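The exact cancellation driving this experiment can be reproduced directly: with X2 = 1.5 X1 + N2 and X3 = 3 X1 − 2 X2 + N3, the total effect of X1 on X3 is 3 − 2 · 1.5 = 0, so X1 and X3 are marginally uncorrelated although the edge X1 → X3 exists. A sketch using the slide's equations:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000

# SEM from the slide; the direct path X1 -> X3 (weight 3) and the
# path X1 -> X2 -> X3 (weight 1.5 * (-2) = -3) cancel exactly.
N = rng.uniform(0, 0.5, size=(n, 4))
X1 = N[:, 0]
X2 = 1.5 * X1 + N[:, 1]
X3 = 3 * X1 - 2 * X2 + N[:, 2]
X4 = 1.8 * X3 + N[:, 3]

# Marginally, X1 and X3 look independent, so PC deletes the edge:
marg = np.corrcoef(X1, X3)[0, 1]
print(marg)            # approx 0: faithfulness is violated

# Conditioning on X2 reveals the dependence:
Z = np.column_stack([np.ones(n), X2])
r1 = X1 - Z @ np.linalg.lstsq(Z, X1, rcond=None)[0]
r3 = X3 - Z @ np.linalg.lstsq(Z, X3, rcond=None)[0]
cond = np.corrcoef(r1, r3)[0, 1]
print(cond)            # clearly nonzero
```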

slide-61
SLIDE 61

Identifiable Functional Model Class (IFMOC)

Experiment 2b: Both methods should work when both assumptions are met.

[Plot: proportion of DAGs (in %) vs. significance level α (10⁻³ to 10⁻¹): correct IFMOC, wrong IFMOC, correct PC, wrong PC, type I bound IFMOC; α = 0.05 marked]



slide-63
SLIDE 63

Identifiable Functional Model Class (IFMOC)

already known: the 2-variable case

Theorem (Hoyer et al. [2009])

Let Y = f(X) + N, N ⊥⊥ X. Then, for most combinations (f, P(X), P(N)), there is no g with X = g(Y) + M, M ⊥⊥ Y. Those combinations (f, P(X), P(N)) are called bivariate identifying. Similar results hold for (i) post-nonlinear additive noise [Zhang and Hyvärinen, 2009] and (ii) discrete additive noise [Peters et al., 2011b].


slide-64
SLIDE 64

Identifiable Functional Model Class (IFMOC)

What happens in the case of p variables?


slide-65
SLIDE 65

Identifiable Functional Model Class (IFMOC)

Assume P(X1, X2, X3, X4) allows for two functional models leading to G1 and G2:

G1: X3 = f(X1, X2, N3)
G2: X2 = g(X1, X3, N2)

⇒ X3|X1=x = f(x, X2|X1=x, N3) and X2|X1=x = g(x, X3|X1=x, N2).
If the triple (f(x, ·, ·), P(X2|X1=x), P(N3)) is bivariate identifying, then this yields a contradiction.



slide-69
SLIDE 69

Identifiable Functional Model Class (IFMOC)

Definition (IFMOC)

A set of functional models is called a functional model class with function class F, for short F-FMOC, if each functional model satisfies fi ∈ F. An F-FMOC is called an Identifiable Functional Model Class, for short IFMOC, if for all its functional models Xi = fi(PAi, Ni), 1 ≤ i ≤ p, for all 1 ≤ i ≤ p, j ∈ PAi, and for all sets S ⊆ {1, . . . , p} with PAi \ {j} ⊆ S ⊆ NDi \ {i, j} there exists an xS with pS(xS) > 0 and

( fi(xPAi\{j}, ·, ·), P(Xj | XS = xS), P(Ni) ) is bivariate identifying,   (3)

where the first “·” takes the role of Xj and the second that of Ni.