Causal Inference in case of feedback


slide-1
SLIDE 1

Causal Inference in case of feedback

Joris Mooij j.m.mooij@uva.nl

Institute for Computing and Information Sciences Informatics Institute

December 12th, 2013

Joris Mooij (IAS, ISLA, IvI, UvA) Van Dantzig Seminar Talk 2013-12-12 1 / 59

slide-2
SLIDE 2

Part I Introduction


slide-3
SLIDE 3

Causality: ubiquitous in the sciences

Genetics: how to infer gene regulatory networks from micro-array data?


slide-4
SLIDE 4

Causality: ubiquitous in the sciences

Neuroscience: how to infer functional connectivity networks from fMRI data?


slide-5
SLIDE 5

Causality: ubiquitous in the sciences

Social sciences: does playing violent computer games cause aggressive behavior?


slide-6
SLIDE 6

Causality: ubiquitous in the sciences

Economy: does austerity reduce national debt?


slide-10
SLIDE 10

Probabilistic Inference

Modeling: modeling the joint distribution of a set of random variables.
$p(X, Y, Z, \dots) = f(X, Y, Z, \dots; \theta)$

Reasoning: using the rules of probability theory to express different marginal and conditional distributions in terms of each other.
Bayes' rule: $p(X \mid Y) = \dfrac{p(Y \mid X)\, p(X)}{p(Y)}$

Learning: finding the best model(s) to describe the data.
ML estimation: $\arg\max_{\theta} \prod_{i=1}^{N} f(x_i, y_i, z_i, \dots; \theta)$

Prediction: given a model and an observation of some random variables, what are the values of other random variables?
$p(Y \mid X = x) = \,?$

slide-14
SLIDE 14

Causal Inference

Causal Modeling: modeling the joint distribution of a set of random variables and how this changes under interventions.
$p(X, Y, Z, \dots) = \dots$
$p(Y, Z, \dots \mid do(X = x)) = \dots$
$p(X, Z, \dots \mid do(Y = y)) = \dots$
$p(Z, \dots \mid do(X = x, Y = y)) = \dots$

Causal Reasoning: using rules for expressing different marginal, conditional and interventional distributions in terms of each other.
Pearl's "do-calculus", SGS's "manipulation theorem"

Causal Discovery: finding the best causal model(s) to describe the observational data and interventional data.
PC, FCI algorithms (use only observational data)

Causal Prediction: given a causal model and given an intervention, what are the values of other random variables?
"Covariate adjustment": $p(Y \mid do(X)) = \sum_{W} p(Y \mid X, W)\, p(W)$
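The covariate adjustment formula can be checked numerically on a small discrete model. The sketch below assumes a hypothetical binary model with W → X, W → Y and X → Y, in which adjusting for W is valid; all probabilities are invented for illustration and do not come from the talk.

```python
import numpy as np

# Hypothetical binary model W -> X, W -> Y, X -> Y (all numbers are invented).
p_w = np.array([0.7, 0.3])                 # p(W)
p_x_given_w = np.array([[0.9, 0.1],        # p(X | W=0)
                        [0.2, 0.8]])       # p(X | W=1)
# p_y1[x, w] = p(Y=1 | X=x, W=w)
p_y1 = np.array([[0.1, 0.5],
                 [0.3, 0.9]])

def p_y1_do_x(x):
    """Adjustment formula: sum_W p(Y=1 | X=x, W) p(W)."""
    return float(np.sum(p_y1[x, :] * p_w))

def p_y1_given_x(x):
    """Ordinary conditioning: sum_W p(Y=1 | X=x, W) p(W | X=x)."""
    joint_wx = p_w * p_x_given_w[:, x]     # p(W, X=x)
    p_w_given_x = joint_wx / joint_wx.sum()
    return float(np.sum(p_y1[x, :] * p_w_given_x))
```

In this toy model the two quantities differ, because observing X = 1 also carries information about W, whereas setting X = 1 does not.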

slide-15
SLIDE 15

Probabilistic inference vs. causal inference

Traditional statistics, machine learning:
Models the distribution of the data
Focuses on predicting results of observations
Useful e.g. in medical diagnosis: given the symptoms, what is the most likely disease?

Causal Inference:
Models the mechanism that generates the data
Also allows predicting the results of interventions
Useful e.g. in medical treatment: if we treat the patient with a drug, will it cure the disease?

slide-16
SLIDE 16

Outline

Introduction to Causal Inference:

1 Introduction
2 Causal Modeling

Some recent developments:

3 Causal Modeling in case of feedback [1]
4 Causal Discovery in case of feedback [2]
5 Outlook

[1] Joint work with Dominik Janzing and Bernhard Schölkopf
[2] Joint work with Tom Heskes

slide-17
SLIDE 17

Part II Causal Modeling


slide-21
SLIDE 21

Structural Causal Models: Definition

Definition [Pearl, 2000; Wright, 1921]
A Structural Causal Model (SCM), also known as a Structural Equation Model (SEM), M is defined by:

1. N observed random variables $X_1, \dots, X_N$ and N latent random variables $E_1, \dots, E_N$;
2. N structural equations
   $X_i = f_i(X_{\mathrm{pa}(i)}, E_i), \quad i = 1, \dots, N,$
   in which $X_i$ is the effect, $f_i$ the causal mechanism, $X_{\mathrm{pa}(i)}$ the observed direct causes and $E_i$ the noise, and where the subsets $\mathrm{pa}(i) \subseteq \{1, \dots, N\} \setminus \{i\}$ define the observed direct causes of $X_i$;
3. a joint probability distribution $p(E_1, \dots, E_N)$ on the latent variables.

slide-22
SLIDE 22

Structural Causal Models: Example

Example
Causal graph $G_M$: [Figure: directed graph on $X_1, \dots, X_5$; by the definition below its edges are $X_1 \to X_3$, $X_2 \to X_3$, $X_1 \to X_4$, $X_3 \to X_5$, $X_4 \to X_5$]
Structural causal model M:

$X_1 = f_1(E_1)$, $p(E_1) = \dots$
$X_2 = f_2(E_2)$, $p(E_2) = \dots$
$X_3 = f_3(X_1, X_2, E_3)$, $p(E_3) = \dots$
$X_4 = f_4(X_1, E_4)$, $p(E_4) = \dots$
$X_5 = f_5(X_3, X_4, E_5)$, $p(E_5) = \dots$
$p(E) = \prod_i p(E_i)$

Definition
Given an SCM M, the causal graph $G_M$ is the directed graph with vertices $\{X_1, \dots, X_N\}$ and edges $X_j \to X_i$ iff $f_i$ depends on $X_j$ (i.e., iff $j \in \mathrm{pa}(i)$).
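To make the example concrete, here is a minimal Python sketch that reads the causal graph off the parent sets and draws samples from the SCM. The linear mechanisms and the standard normal noise are assumptions made purely for illustration; the slides leave the $f_i$ and $p(E_i)$ unspecified.

```python
import numpy as np

# Parent sets pa(i) of the example, read off from the structural equations.
parents = {1: [], 2: [], 3: [1, 2], 4: [1], 5: [3, 4]}

# Hypothetical linear mechanisms f_i (the slides leave the f_i unspecified).
mechanisms = {
    1: lambda x, e: e,
    2: lambda x, e: e,
    3: lambda x, e: x[1] - 0.5 * x[2] + e,
    4: lambda x, e: 2.0 * x[1] + e,
    5: lambda x, e: x[3] + x[4] + e,
}

def causal_graph(parents):
    """Edges (j, i), meaning X_j -> X_i, for every j in pa(i)."""
    return sorted((j, i) for i, pa in parents.items() for j in pa)

def sample(n, rng):
    """Draw n joint samples; noises are independent standard normals (assumed)."""
    x = {}
    for i in sorted(parents):          # 1..5 is a topological order here
        e = rng.standard_normal(n)
        x[i] = mechanisms[i](x, e)
    return x
```

Evaluating the variables in a topological order only works because this example is acyclic; the cyclic case treated later in the talk needs a different solution strategy.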

slide-25
SLIDE 25

Structural Causal Models: Interventions

For a causal model, we also need to model interventions.

Interventions in SCMs
An intervention $do(X_i = \xi_i)$ on a variable $X_i$, forcing it to attain the value $\xi_i$, changes the structural equation for $X_i$ as follows:

Original SCM $M$:
  $X_i = f_i(X_{\mathrm{pa}(i)}, E_i)$
  $X_j = f_j(X_{\mathrm{pa}(j)}, E_j) \quad \forall j \neq i$
  $p(E) = \dots$

Intervened SCM $M_{\xi_i}$:
  $X_i = \xi_i$
  $X_j = f_j(X_{\mathrm{pa}(j)}, E_j) \quad \forall j \neq i$
  $p(E) = \dots$

Interpretation: an intervention overrides the default causal mechanisms that would normally determine the values of the intervened variables.

slide-26
SLIDE 26

Structural Causal Models: Interventions

Example
Observational (no intervention):
Causal graph $G_M$: [Figure: directed graph on $X_1, \dots, X_5$]
Structural causal model $M$:

$X_1 = f_1(E_1)$, $p(E_1) = \dots$
$X_2 = f_2(E_2)$, $p(E_2) = \dots$
$X_3 = f_3(X_1, X_2, E_3)$, $p(E_3) = \dots$
$X_4 = f_4(X_1, E_4)$, $p(E_4) = \dots$
$X_5 = f_5(X_3, X_4, E_5)$, $p(E_5) = \dots$
$p(E) = \prod_i p(E_i)$

slide-27
SLIDE 27

Structural Causal Models: Interventions

Example
Intervention $do(X_1 = \xi_1)$:
Causal graph $G_{M_{\xi_1}}$: [Figure: directed graph on $X_1, \dots, X_5$]
Structural causal model $M_{\xi_1}$:

$X_1 = \xi_1$, $p(E_1) = \dots$
$X_2 = f_2(E_2)$, $p(E_2) = \dots$
$X_3 = f_3(X_1, X_2, E_3)$, $p(E_3) = \dots$
$X_4 = f_4(X_1, E_4)$, $p(E_4) = \dots$
$X_5 = f_5(X_3, X_4, E_5)$, $p(E_5) = \dots$
$p(E) = \prod_i p(E_i)$

slide-28
SLIDE 28

Structural Causal Models: Interventions

Example
Intervention $do(X_3 = \xi_3)$:
Causal graph $G_{M_{\xi_3}}$: [Figure: directed graph on $X_1, \dots, X_5$; $X_3$ no longer depends on $X_1$ and $X_2$]
Structural causal model $M_{\xi_3}$:

$X_1 = f_1(E_1)$, $p(E_1) = \dots$
$X_2 = f_2(E_2)$, $p(E_2) = \dots$
$X_3 = \xi_3$, $p(E_3) = \dots$
$X_4 = f_4(X_1, E_4)$, $p(E_4) = \dots$
$X_5 = f_5(X_3, X_4, E_5)$, $p(E_5) = \dots$
$p(E) = \prod_i p(E_i)$
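The intervention examples can be mimicked in code by swapping out a single structural equation. The following Python sketch again assumes hypothetical linear mechanisms and standard normal noise; the function name `simulate` and its `do` argument are illustrative choices, not notation from the talk.

```python
import numpy as np

def simulate(n, rng, do=None):
    """Sample the 5-variable example SCM; `do` maps a variable index to a forced value."""
    do = do or {}
    # Independent standard normal noises (an assumption; p(E_i) is unspecified).
    e = {i: rng.standard_normal(n) for i in range(1, 6)}
    # Hypothetical linear mechanisms f_i.
    f = {
        1: lambda x: e[1],
        2: lambda x: e[2],
        3: lambda x: x[1] - 0.5 * x[2] + e[3],
        4: lambda x: 2.0 * x[1] + e[4],
        5: lambda x: x[3] + x[4] + e[5],
    }
    x = {}
    for i in range(1, 6):
        # An intervention replaces X_i = f_i(...) by X_i = xi_i; all else is unchanged.
        x[i] = np.full(n, float(do[i])) if i in do else f[i](x)
    return x

observational = simulate(1000, np.random.default_rng(0))
intervened = simulate(1000, np.random.default_rng(0), do={3: 1.0})
```

With the same noise draws, do(X3 = 1) leaves X1, X2 and X4 untouched and changes only X3 and its descendant X5, which matches the intervened structural equations above.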

slide-30
SLIDE 30

Confounders and causal sufficiency

Definition: Confounder
A confounder is a latent common cause of two or more observed variables.

Example
There is a significant correlation (p = 0.008) between human birth rate and the size of stork populations in European countries [Matthews, 2000].
Most people nowadays do not believe that storks deliver babies (nor that babies deliver storks).
There must be some confounder explaining the correlation.
[Figure: three candidate causal graphs on S (storks) and B (births): S causes B, B causes S, and a latent common cause "?" of S and B]

slide-31
SLIDE 31

Confounders and causal sufficiency

Absence of confounders implies causal sufficiency.

Definition: Causal Sufficiency
If all latent variables $E_1, \dots, E_N$ in an SCM are jointly independent, i.e., if
$p(E) = \prod_{i=1}^{N} p(E_i),$
then we say that the observed variables X are causally sufficient.

slide-33
SLIDE 33

Causal feedback and (A)cyclicity

Definition: causal feedback
An SCM incorporates causal feedback if its graph contains a directed cycle
$X_{i_0} \to X_{i_1} \to \dots \to X_{i_n}, \quad X_{i_0} = X_{i_n}.$
If it does not contain such a directed cycle, the model is called acyclic. If it is also causally sufficient, its graph is a Directed Acyclic Graph (DAG).

Example
In economics, causal feedback is often present:
R: risks taken by bank;
B: imminent bankruptcy;
S: saved by the government.
[Figure: causal graph on S, R and B containing a directed cycle]
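Whether a causal graph contains a directed cycle can be checked mechanically from the parent sets. The sketch below uses a depth-first search; the orientation R → B → S → R of the bank example is an assumption, since the figure's edges cannot be recovered from the text.

```python
def has_directed_cycle(parents):
    """True iff the graph with edges j -> i (for j in parents[i]) has a directed cycle.

    Every variable must appear as a key of `parents`.
    """
    children = {v: [] for v in parents}
    for i, pa in parents.items():
        for j in pa:
            children[j].append(i)

    WHITE, GREY, BLACK = 0, 1, 2       # unvisited, on the current path, finished
    colour = {v: WHITE for v in parents}

    def visit(v):
        colour[v] = GREY
        for c in children[v]:
            if colour[c] == GREY:      # back edge: c is an ancestor on the current path
                return True
            if colour[c] == WHITE and visit(c):
                return True
        colour[v] = BLACK
        return False

    return any(colour[v] == WHITE and visit(v) for v in parents)

# Assumed orientation of the bank example: R -> B -> S -> R.
bank = {"R": ["S"], "B": ["R"], "S": ["B"]}
# The acyclic five-variable example from the earlier slides.
acyclic = {1: [], 2: [], 3: [1, 2], 4: [1], 5: [3, 4]}
```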

slide-34
SLIDE 34

Factorization: Bayesian Networks

Theorem
Any probability distribution induced by an acyclic, causally sufficient SCM M can be factorized as:
$p_M(X_1, \dots, X_N) = \prod_{i=1}^{N} p_M(X_i \mid X_{\mathrm{pa}(i)})$

Example
Causal graph $G_M$: [Figure: directed graph on $X_1, \dots, X_5$]

Structural causal model M:

$X_1 = f_1(E_1)$, $p(E_1) = \dots$
$X_2 = f_2(E_2)$, $p(E_2) = \dots$
$X_3 = f_3(X_1, X_2, E_3)$, $p(E_3) = \dots$
$X_4 = f_4(X_1, E_4)$, $p(E_4) = \dots$
$X_5 = f_5(X_3, X_4, E_5)$, $p(E_5) = \dots$

$p(X_1, \dots, X_5) = p(X_1)\, p(X_2)\, p(X_3 \mid X_1, X_2)\, p(X_4 \mid X_1)\, p(X_5 \mid X_3, X_4)$

slide-35
SLIDE 35

Causal Reasoning: Truncated factorization

The following theorem expresses the joint distribution of a Bayesian network after an intervention. It is an example of causal reasoning.

Theorem: Truncated factorization
Any probability distribution induced by an acyclic, causally sufficient SCM M can be factorized as:
$p_M(X_1, \dots, X_N) = \prod_{i=1}^{N} p_M(X_i \mid X_{\mathrm{pa}(i)})$
After an intervention $do(X_I = \xi_I)$, the probability distribution becomes:
$p_{M_{\xi_I}}\big(X_1, \dots, X_N \mid do(X_I = \xi_I)\big) = \prod_{\substack{i=1 \\ i \notin I}}^{N} p_M(X_i \mid X_{\mathrm{pa}(i)}) \prod_{i \in I} 1_{[X_i = \xi_i]}$

slide-36
SLIDE 36

Part III Causal Modeling in case of feedback


slide-38
SLIDE 38

Feedback

Cyclic causal dependencies are also called feedback loops. Examples:
Holding a microphone too close to a loudspeaker.
Predator-prey relationships in biology.
Computer programs running on a single core are acyclic; parallel programs running on multiple cores can be cyclic.

Example
Two masses, connected by a spring, suspended from the ceiling by another spring.
Vertical equilibrium positions $Q_1$ and $Q_2$.
$Q_1$ causes $Q_2$. $Q_2$ causes $Q_1$.
Example of a two-cycle: it cannot be modeled with a (causal) Bayesian network.
[Figure: two masses at positions $Q_1$ and $Q_2$ hanging from the ceiling by springs]

slide-41
SLIDE 41

Causal modeling of feedback systems

Question: What are good mathematical representations of cyclic causal models?

No consensus in the field...
(Causal) Bayesian networks are acyclic by definition, and extending the definition to cyclic graphs [Schmidt & Murphy, 2009; Itani et al., 2010] seems problematic.
Extending the global Markov condition to cyclic models works for linear models [Spirtes, 1993], but nonlinear and discrete models yield problems [Spirtes, 1995; Pearl & Dechter, 1996; Neal, 2000].
Structural Causal Models have a "natural" extension to the cyclic case. But how to interpret these models in terms of a data generating process? Is this "the right" mathematical framework?

How do scientists usually model systems with feedback?

slide-42
SLIDE 42

Two different worlds?

Ordinary Differential Equations  ?  Structural Causal Models

slide-46
SLIDE 46

From dynamical systems to causal models (in a nutshell)

1 Ordinary Differential Equations

$\dot{X} = -0.5X + Y, \quad X(0) = 1$
$\dot{Y} = -X + 0.2Y, \quad Y(0) = 2$

2 Labeled Equilibrium Equations

$X: \quad 0 = -0.5X + Y$
$Y: \quad 0 = -X + 0.2Y$

3 Structural Causal Model

$X = 2Y$
$Y = 5X$

4 Dealing with Uncertainty

$X = 2Y + E_X$
$Y = 5X + E_Y$
$p(E_X, E_Y) = \dots$
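The first three steps can be traced numerically for this two-variable example. The sketch below (Python with NumPy; the helper names and the intervention value `eta` are illustrative assumptions) compares the long-time state of the ODE with the solution of the structural equations, both observationally and under a perfect intervention do(Y = eta).

```python
import numpy as np

# dX/dt = -0.5 X + Y,  dY/dt = -X + 0.2 Y, written as d/dt (X, Y) = A (X, Y).
A = np.array([[-0.5, 1.0],
              [-1.0, 0.2]])
x0 = np.array([1.0, 2.0])

def state_at(t, A, x0):
    """Exact solution exp(A t) x0 of a linear ODE via eigendecomposition
    (assumes A is diagonalizable, which holds here)."""
    lam, V = np.linalg.eig(A)
    return (V @ np.diag(np.exp(lam * t)) @ np.linalg.inv(V) @ x0).real

# Observational equilibrium: solve the SCM  X = 2Y, Y = 5X, i.e.
# [[1, -2], [-5, 1]] (X, Y)^T = 0.
scm_matrix = np.array([[1.0, -2.0],
                       [-5.0, 1.0]])
equilibrium = np.linalg.solve(scm_matrix, np.zeros(2))

# Perfect intervention do(Y = eta): Y is clamped, only the X-equation remains.
eta = 3.0

def x_under_do_y(t, eta, x_init=1.0):
    """Solution of dX/dt = -0.5 X + eta with X(0) = x_init."""
    x_star = 2.0 * eta                 # intervened SCM: X = 2Y with Y = eta
    return x_star + (x_init - x_star) * np.exp(-0.5 * t)
```

Both eigenvalues of A have negative real part, so the trajectory spirals into the unique equilibrium (0, 0), which is also the unique solution of the structural equations; under do(Y = eta) the ODE settles at X = 2 eta, as the intervened SCM predicts.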

slide-47
SLIDE 47

ODEs: Definition

Definition (ODE)
An Ordinary Differential Equation model (ODE) is a dynamical system $\mathcal{D}$ described by $D$ coupled first-order ordinary differential equations and an initial condition $X_0$:
$\dot{X}_i(t) := \frac{dX_i}{dt}(t) = f_i(X_{\mathrm{pa}_{\mathcal{D}}(i)}), \quad X_i(0) = (X_0)_i \quad \forall i = 1, \dots, D$
$\mathrm{pa}_{\mathcal{D}}(i) \subseteq \{1, \dots, D\}$ is the set of parents of variable $X_i$.
Each $f_i : \mathcal{R}_{\mathrm{pa}_{\mathcal{D}}(i)} \to \mathcal{R}_i$ is a (sufficiently smooth) function.
The structure can be represented as a directed graph $G_{\mathcal{D}}$, with nodes $\{X_i\}_{i \in \mathcal{I}}$, where $\mathcal{I} := \{1, \dots, D\}$, and a directed edge $X_i \to X_j$ iff $\dot{X}_j$ depends on $X_i$.

slide-48
SLIDE 48

ODEs: Example

Example (Lotka-Volterra model)
The Lotka-Volterra model is a well-known model from population biology.
Abundance of prey $X_1 \in [0, \infty)$ (e.g., rabbits)
Abundance of predators $X_2 \in [0, \infty)$ (e.g., wolves)
ODE $\mathcal{D}$:
$\dot{X}_1 = X_1(\theta_{11} - \theta_{12} X_2), \quad X_1(0) = a$
$\dot{X}_2 = -X_2(\theta_{22} - \theta_{21} X_1), \quad X_2(0) = b$
Graph $G_{\mathcal{D}}$: [Figure: graph on $X_1$ and $X_2$]
[Figure: plot of prey and predator abundances over time]

slide-49
SLIDE 49

Perfect Interventions

The dynamical system $\mathcal{D}$ is assumed to describe the "natural" or observational state of the system.
Causal models also aim to predict the effects of interventions, in which the system is actively perturbed from its natural state.
Interventions can be modeled in different ways. Here we look at perfect interventions.

Definition (Perfect Interventions)
The perfect intervention $do(X_I = \xi_I)$ means that $X_I$ is forced to attain the value $\xi_I$ for all times $t \in [0, \infty)$. This changes the ODE $\mathcal{D}$ into the intervened system $\mathcal{D}_{do(X_I = \xi_I)}$:
$\dot{X}_i(t) = \begin{cases} 0 & i \in I \\ f_i(X_{\mathrm{pa}_{\mathcal{D}}(i)}) & i \in \mathcal{I} \setminus I \end{cases}, \qquad X_i(0) = \begin{cases} \xi_i & i \in I \\ (X_0)_i & i \in \mathcal{I} \setminus I \end{cases}$
Here $\mathcal{I}$ is the index set of all variables and $I \subseteq \mathcal{I}$ the set of intervened variables.

slide-50
SLIDE 50

Perfect Interventions in ODEs: Example

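Example (Lotka-Volterra model)
$\mathcal{D}$:
$\dot{X}_1 = X_1(\theta_{11} - \theta_{12} X_2), \quad X_1(0) = a$
$\dot{X}_2 = -X_2(\theta_{22} - \theta_{21} X_1), \quad X_2(0) = b$

Perfect intervention $do(X_2 = \xi_2)$: monitor the abundance of wolves and make sure that it equals the target value $\xi_2$ at all times.

$\mathcal{D}_{do(X_2 = \xi_2)}$:
$\dot{X}_1 = X_1(\theta_{11} - \theta_{12} X_2), \quad X_1(0) = a$
$\dot{X}_2 = 0, \quad X_2(0) = \xi_2$
(The original right-hand side $-X_2(\theta_{22} - \theta_{21} X_1)$ is replaced by 0, and the initial value $b$ by $\xi_2$.)

$G_{\mathcal{D}_{do(X_2 = \xi_2)}}$: [Figure: graph on $X_1$ and $X_2$ after the intervention]
[Figure: plot of prey and predator abundances over time]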
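A small simulation illustrates the difference between the observational and the intervened Lotka-Volterra system. The sketch below uses a hand-written fourth-order Runge-Kutta integrator; the parameter values, initial conditions and target value `xi2` are hypothetical and chosen only for illustration.

```python
import numpy as np

def lotka_volterra(x, theta, clamp_x2=False):
    """Right-hand side of the Lotka-Volterra ODE; clamp_x2 models do(X2 = xi2)."""
    x1, x2 = x
    dx1 = x1 * (theta["11"] - theta["12"] * x2)
    dx2 = 0.0 if clamp_x2 else -x2 * (theta["22"] - theta["21"] * x1)
    return np.array([dx1, dx2])

def rk4(f, x0, dt, steps):
    """Classical fourth-order Runge-Kutta integration; returns the whole trajectory."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        x = xs[-1]
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        xs.append(x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0)
    return np.array(xs)

theta = {"11": 1.0, "12": 0.1, "22": 1.0, "21": 0.02}   # hypothetical parameters
a, b, xi2 = 40.0, 9.0, 20.0                             # hypothetical initial/target values

# Both runs cover the time interval [0, 10].
observational = rk4(lambda x: lotka_volterra(x, theta), [a, b], 0.01, 1000)
intervened = rk4(lambda x: lotka_volterra(x, theta, clamp_x2=True), [a, xi2], 0.01, 1000)
```

With the predator abundance clamped at `xi2`, the prey equation becomes linear, so for these assumed values the prey population decays exponentially instead of oscillating.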

slide-52
SLIDE 52

ODEs: Stability

When studying the system in the limit $t \to \infty$, an important concept is stability:

Definition (Stability)
The ODE $\mathcal{D}$ is called stable if there exists a unique equilibrium state $X^* \in \mathcal{R}_{\mathcal{I}}$ such that for any initial state $X_0 \in \mathcal{R}_{\mathcal{I}}$, the system converges to this equilibrium state as $t \to \infty$:
$\exists!\, X^* \in \mathcal{R}_{\mathcal{I}} \;\; \forall X_0 \in \mathcal{R}_{\mathcal{I}} : \lim_{t \to \infty} X(t) = X^*.$

Example (Counter-example: Lotka-Volterra model)
The Lotka-Volterra model is not stable (it keeps oscillating).
[Figure: plot of prey and predator abundances over time, showing sustained oscillations]

slide-53
SLIDE 53

Stability: Example

Example (Damped coupled harmonic oscillators)
[Figure: four masses $m_1, \dots, m_4$ connected in a row by springs $k_0, \dots, k_4$ between fixed walls at $Q = 0$ and $Q = L$]

Equations of motion (with $Q_0 := 0$, $Q_{D+1} := L$):
$\dot{P}_i = k_i(Q_{i+1} - Q_i - l_i) - k_{i-1}(Q_i - Q_{i-1} - l_{i-1}) - \frac{b_i}{m_i} P_i$
$\dot{Q}_i = P_i / m_i$
Because of the friction, this system is stable (oscillations die out):
[Figure: plot of the positions over time, with oscillations dying out]

slide-54
SLIDE 54

Equilibrium of Observational system

Given an ODE $\mathcal{D}$:
$\dot{X}_i(t) = f_i(X_{\mathrm{pa}_{\mathcal{D}}(i)}), \quad X_i(0) = (X_0)_i \quad \forall i \in \mathcal{I}$
At equilibrium, the rate of change of any variable is zero. This yields the following equilibrium equations:
$0 = f_i(X_{\mathrm{pa}_{\mathcal{D}}(i)}) \quad \forall i \in \mathcal{I}$
This is a set of $D$ coupled equations with unknowns $X_1, \dots, X_D$.
The stability assumption implies that there exists a unique solution $X^*$ of the equilibrium equations.

slide-55
SLIDE 55

Labeling Equilibrium Equations

Note that the dynamical system contains "labels" for the equations: in case of an intervention on $X_i$, simply change the dynamical equation for $\dot{X}_i$. This information is lost when considering the equilibrium equations.
In order to model perfect interventions, we introduce labels for the equilibrium equations.

Definition
Given an ODE $\mathcal{D}$:
$\dot{X}_i(t) = f_i(X_{\mathrm{pa}_{\mathcal{D}}(i)}), \quad X_i(0) = (X_0)_i \quad \forall i \in \mathcal{I}$
its system $\mathcal{E}_{\mathcal{D}}$ of Labeled Equilibrium Equations (LEE) is given by:
$\mathcal{E}_i : \quad 0 = f_i(X_{\mathrm{pa}_{\mathcal{D}}(i)}) \quad \forall i \in \mathcal{I}$

slide-56
SLIDE 56

Induced LEE: Example

Example (Damped coupled harmonic oscillators)
[Figure: four masses $m_1, \dots, m_4$ connected in a row by springs $k_0, \dots, k_4$ between fixed walls at $Q = 0$ and $Q = L$]

Equations of motion:
$\dot{P}_i = k_i(Q_{i+1} - Q_i - l_i) - k_{i-1}(Q_i - Q_{i-1} - l_{i-1}) - \frac{b_i}{m_i} P_i$
$\dot{Q}_i = P_i / m_i$
The induced Labeled Equilibrium Equations are given by:
$\mathcal{E}_i : \quad 0 = k_i(Q_{i+1} - Q_i - l_i) - k_{i-1}(Q_i - Q_{i-1} - l_{i-1}) - \frac{b_i}{m_i} P_i, \qquad 0 = P_i$

slide-57
SLIDE 57

Equilibrium of Intervened systems

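ODE $\mathcal{D}$:
$\dot{X}_i(t) = f_i(X_{\mathrm{pa}_{\mathcal{D}}(i)}), \quad X_i(0) = (X_0)_i, \quad i \in \mathcal{I}$

Intervened ODE $\mathcal{D}_{do(X_I = \xi_I)}$:
$\dot{X}_i(t) = 0, \quad X_i(0) = \xi_i, \quad i \in I$
$\dot{X}_i(t) = f_i(X_{\mathrm{pa}_{\mathcal{D}}(i)}), \quad X_i(0) = (X_0)_i, \quad i \in \mathcal{I} \setminus I$

LEE $\mathcal{E}_{\mathcal{D}}$:
$0 = f_i(X_{\mathrm{pa}_{\mathcal{D}}(i)}), \quad i \in \mathcal{I}$

Intervened LEE $\mathcal{E}_{\mathcal{D}_{do(X_I = \xi_I)}}$:
$X_i = \xi_i, \quad i \in I$
$0 = f_i(X_{\mathrm{pa}_{\mathcal{D}}(i)}), \quad i \in \mathcal{I} \setminus I$

[Diagram: "intervention" arrows lead from each observational system to its intervened counterpart; "equilibration" arrows lead from each ODE to its LEE]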

slide-58
SLIDE 58

From LEE to SCM

Definition
Given a system of Labeled Equilibrium Equations (LEE) $\mathcal{E}$:
$\mathcal{E}_i : \quad 0 = f_i(X_{\mathrm{pa}_{\mathcal{E}}(i)}) \quad \forall i \in \mathcal{I}$
the induced SCM is obtained by solving each equation $\mathcal{E}_i$ for $X_i$ in terms of the other variables:
$X_i = g_i(X_{\mathrm{pa}_{\mathcal{E}}(i) \setminus \{i\}}) \quad \forall i \in \mathcal{I}$
Note: This definition only makes sense if each labeled equilibrium equation $\mathcal{E}_i$ has a unique solution for $X_i$.

slide-59
SLIDE 59

Induced SCM: Example

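Example (Damped coupled harmonic oscillators)

ODE $\mathcal{D}$:
$\dot{P}_i = k_i(Q_{i+1} - Q_i - l_i) - k_{i-1}(Q_i - Q_{i-1} - l_{i-1}) - \frac{b_i}{m_i} P_i$
$\dot{Q}_i = P_i / m_i$

Induced LEE $\mathcal{E}_{\mathcal{D}}$:
$\mathcal{E}_i : \quad 0 = k_i(Q_{i+1} - Q_i - l_i) - k_{i-1}(Q_i - Q_{i-1} - l_{i-1}) - \frac{b_i}{m_i} P_i, \qquad 0 = P_i$

Induced SCM $M_{\mathcal{E}_{\mathcal{D}}}$ (solving the first equation for $Q_i$ with $P_i = 0$):
$Q_i = \dfrac{k_i(Q_{i+1} - l_i) + k_{i-1}(Q_{i-1} + l_{i-1})}{k_i + k_{i-1}}, \qquad P_i = 0.$

Graph of induced SCM $G_{M_{\mathcal{E}_{\mathcal{D}}}}$: [Figure: graph on $Q_1, Q_2, Q_3, Q_4$, in which each $Q_i$ has its neighbours $Q_{i-1}$ and $Q_{i+1}$ as parents]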
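Because the induced structural equations are linear in the positions, the equilibrium can be found with a single linear solve. The following Python sketch does this for a chain of D masses; the function name and the argument layout (arrays `k` and `l` indexed from 0 to D) are illustrative assumptions.

```python
import numpy as np

def equilibrium_positions(k, l, L):
    """Solve Q_i = (k_i (Q_{i+1} - l_i) + k_{i-1} (Q_{i-1} + l_{i-1})) / (k_i + k_{i-1})
    for i = 1..D, with boundary values Q_0 = 0 and Q_{D+1} = L.

    k, l: spring constants k_0..k_D and rest lengths l_0..l_D (each of length D + 1).
    """
    k, l = np.asarray(k, dtype=float), np.asarray(l, dtype=float)
    D = len(k) - 1
    M = np.zeros((D, D))
    rhs = np.zeros(D)
    for i in range(1, D + 1):
        r = i - 1                                # row of the equation for Q_i
        M[r, r] = k[i] + k[i - 1]
        rhs[r] = -k[i] * l[i] + k[i - 1] * l[i - 1]
        if i > 1:
            M[r, r - 1] = -k[i - 1]              # coupling to Q_{i-1}; Q_0 = 0 adds nothing
        if i < D:
            M[r, r + 1] = -k[i]                  # coupling to Q_{i+1}
        else:
            rhs[r] += k[i] * L                   # boundary value Q_{D+1} = L
    return np.linalg.solve(M, rhs)
```

For identical springs whose rest lengths add up to L, every spring sits at its rest length, so the masses end up equally spaced.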

slide-60
SLIDE 60

From ODEs to SCMs

Theorem (Mooij, Janzing, Schölkopf, UAI 2013)
Under certain stability conditions on the ODE $\mathcal{D}$ and the intervened ODE $\mathcal{D}_{do(X_I = \xi_I)}$:

1. The following diagram commutes:
   ODE $\mathcal{D}$ → LEE $\mathcal{E}_{\mathcal{D}}$ → SCM $M_{\mathcal{E}_{\mathcal{D}}}$
   intervened ODE $\mathcal{D}_{do(X_I = \xi_I)}$ → intervened LEE $\mathcal{E}_{\mathcal{D}_{do(X_I = \xi_I)}}$ → intervened SCM $M_{\mathcal{E}_{\mathcal{D}_{do(X_I = \xi_I)}}}$
   (each object in the first row is mapped to the one below it by the intervention)
2. If the intervened ODE $\mathcal{D}_{do(X_I = \xi_I)}$ is stable, the induced intervened SCM $M_{\mathcal{E}_{\mathcal{D}_{do(X_I = \xi_I)}}}$ has a unique solution that coincides with the stable equilibrium of the intervened ODE $\mathcal{D}_{do(X_I = \xi_I)}$.

(A similar result was derived by [Dash, 2003] for the acyclic case.)

slide-61
SLIDE 61

Conclusion: There is a bridge between the two worlds!

Ordinary Differential Equations Structural Causal Models


slide-62
SLIDE 62

Discussion

We have shown one particular way in which structural causal models can be "derived".
This shows that cyclic SCMs (and cyclic LEEs) are a very natural way to model causal systems with feedback.
This work dealt with the deterministic case. Uncertainty can arise in several ways:

1 uncertainty about (constant) parameters of the differential equations;
2 uncertainty about the initial condition (in the case of constants of motion);
3 latent variables (in the case of confounding).

Dealing with uncertainty is work in progress (similar ideas, but more involved).

slide-63
SLIDE 63

Part IV Causal Discovery in case of feedback


slide-64
SLIDE 64

Case study: Reconstructing a signalling network

Protein Abundance Data [Sachs et al., 2005]:
[Figure: data for conditions 1 to 8 over the proteins Raf, Mek, PLCg, PIP2, PIP3, Erk, Akt, PKA, PKC, p38, JNK]

Condition   Reagent           Intervention
1           (none)            observational
2           Akt-inhibitor     inhibits AKT activity
3           G0076             inhibits PKC activity
4           Psitectorigenin   inhibits PIP2 abundance
5           U0126             inhibits MEK activity
6           LY294002          inhibits PIP2/PIP3 activity
7           PMA               activates PKC activity
8           β2CAMP            activates PKA activity

Causal Mechanism ("Signalling network"):
[Figure: "consensus" signalling network on Raf, Mek, Erk, Plcg, PIP2, PKC, PIP3, Akt, PKA, P38, Jnk]

slide-66
SLIDE 66

Motivation

Good test case for causal discovery methods, because:
High-quality data:
  Single-cell measurements
  Many data points (about $10^4$)
  Small measurement noise
Much knowledge about "ground truth"
Possibly important applications in cancer medicine

Good results were obtained by [Sachs et al., 2005], assuming acyclicity and causal sufficiency, using Bayesian network learning with discretized data.

But...
The data show evidence of feedback loops (cycles).
No suitable cyclic causal discovery methods are available (but see [Itani et al., 2010; Schmidt and Murphy, 2009] for discretized data).

slide-67
SLIDE 67

The importance of modeling feedback

Feedback plays an important role in many biological systems. Ignoring feedback may lead to unwanted surprises, e.g., [Hall-Jackson et al., 1999]:

"Here, we describe a compound (ZM 336372) that is a potent inhibitor of the protein kinase c-Raf in vitro. Paradoxically, however, incubation of mammalian cells with this compound induces an enormous activation of c-Raf and the B-Raf isoform (measured in the absence of the drug), suggesting that a feedback control loop exists by which Raf isoforms suppress their own activation. This unexpected finding may explain why ZM 336372 does not reverse the phenotype of Ras-transformed cell lines, and suggests that inhibition of the kinase activity of Raf might not be a good approach for the development of an anti-cancer drug."

slide-68
SLIDE 68

The data (scatter plots)

[Figure: two scatter plots, one of ln Raf and ln Mek and one of ln Mek and ln Erk, each showing condition 1 (observational) and condition 5 (MEK inhibitor)]

Note:
Noise can be very small (so observation noise is small)
Strong correlation between Raf and Mek (consensus: Raf → Mek)
Evidence for feedback (intervening on Mek changes Raf)
No dependence between Mek and Erk (consensus: Mek → Erk)

slide-69
SLIDE 69

Challenge: faithfulness violations

[Figure: three panels over the 11 measured proteins (Raf, Mek, PLCg, PIP2, PIP3, Erk, Akt, PKA, PKC, p38, JNK) showing the expected correlations, the measured correlations, and the resulting faithfulness violations, next to the consensus causal graph.]

This means that we need to combine observational and interventional data.
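A minimal linear illustration of how such a violation can arise (an assumed toy model with hypothetical coefficients $a$, $b$, $c$ and independent noise terms, not a model of the protein data):

```latex
\begin{align*}
X &= E_X, \\
Y &= a X + E_Y, \\
Z &= b Y + c X + E_Z, \\
\operatorname{Cov}(X, Z) &= (ab + c)\,\operatorname{Var}(X).
\end{align*}
```

If $c = -ab$, the two causal paths from $X$ to $Z$ cancel exactly: $X$ and $Z$ are uncorrelated although $X$ causes $Z$. Observational data alone cannot reveal this, whereas an intervention on $Y$ removes the $X \to Y$ path and makes the remaining dependence visible.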

slide-72
SLIDE 72

Goal

The goal of this work: perform a more sophisticated causal analysis of the data by
  • modeling feedback loops;
  • modeling the interventions in a realistic way;
  • using continuous data instead of a coarsely discretized version, allowing for nonlinear causal mechanisms;

and, by doing so, arrive at a more realistic reconstruction of the signalling network than [Sachs et al., 2005] originally obtained by using (acyclic) discrete-valued Bayesian networks.

slide-77
SLIDE 77

Basic Modeling Assumptions

We make the following assumptions for modeling the data:

Causal modeling assumptions
  • No time-series data: the cells have reached equilibrium when the measurements are performed;
  • The equilibrium abundances $X$ are governed by a (possibly cyclic) Structural Causal Model $X_i = f_i(X_{\mathrm{pa}(i)}, E_i)$, $i = 1, \dots, D$;
  • $E$ is constant in time but varies over cells;
  • The reagents may change the structural equations locally;
  • Causal sufficiency (all $E_i$ are jointly independent).


slide-78
SLIDE 78

Induced distributions of cyclic SCMs

Lemma (Induced distribution of cyclic SCMs) If for each value of the noise $E$ there exists a unique solution $X(E)$ of the structural equations $\{X_i = f_i(X_{\mathrm{pa}(i)}, E_i)\}$, then an SCM induces a unique observational distribution $p(X)$. In the acyclic case, that assumption is automatically satisfied. If the mapping $E \mapsto X(E)$ is invertible, the induced density satisfies:

$$p(X) = p_E\big(E(X)\big) \left| \det \frac{\partial E}{\partial X} \right|.$$

This means that under these assumptions, we can write down the likelihood of the data as a function of the model parameters.
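As a sketch of what this likelihood looks like in the simplest cyclic case, consider a two-variable linear SCM $X_1 = b_{12} X_2 + E_1$, $X_2 = b_{21} X_1 + E_2$ with independent zero-mean Gaussian noise. The linear-Gaussian form, the coefficient names and the function names below are assumptions made for illustration; the talk itself allows nonlinear mechanisms. Here $E(X) = (I - B)X$, so $\det \partial E / \partial X = 1 - b_{12} b_{21}$, and a unique equilibrium exists whenever this determinant is nonzero.

```python
import math


def solve_equilibrium(b12, b21, e1, e2):
    """Unique solution X(E) of X1 = b12*X2 + E1, X2 = b21*X1 + E2."""
    det = 1.0 - b12 * b21  # det(I - B); zero means no unique solution
    if abs(det) < 1e-12:
        raise ValueError("det(I - B) = 0: no unique equilibrium")
    return (e1 + b12 * e2) / det, (b21 * e1 + e2) / det


def noise_from_x(b12, b21, x1, x2):
    """Inverse map E(X) = (I - B) X."""
    return x1 - b12 * x2, x2 - b21 * x1


def log_normal_pdf(e, sigma):
    """Log density of a zero-mean Gaussian with standard deviation sigma."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - e ** 2 / (2.0 * sigma ** 2)


def log_density(b12, b21, s1, s2, x1, x2):
    """log p(X) = log p_E(E(X)) + log |det dE/dX| for the two-variable cycle."""
    e1, e2 = noise_from_x(b12, b21, x1, x2)
    log_abs_det = math.log(abs(1.0 - b12 * b21))  # dE/dX = I - B
    return log_normal_pdf(e1, s1) + log_normal_pdf(e2, s2) + log_abs_det
```

Summing `log_density` over the measured cells gives the log-likelihood as a function of the parameters `(b12, b21, s1, s2)`. Without the Jacobian term the density would not integrate to one in the cyclic case.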

slide-81
SLIDE 81

Modeling Interventions with an SCM

Following [Sachs et al., 2005], we distinguish two types of interventions:
  • abundance interventions that alter the abundance of some compound;
  • activity interventions that alter the activity of some compound.

Here, we propose to model these interventions as follows:
  • An abundance intervention on $X_i$ replaces the structural equation for $X_i$ with $X_i = \xi_i$ (standard “perfect” interventions);
  • An activity intervention on $X_i$ replaces the causal mechanisms for its children $X_j$, $i \in \mathrm{pa}(j)$, by other causal mechanisms $X_j = \tilde f_j(X_{\mathrm{pa}(j)}, E_j)$.

Example: No intervention

[Figure: causal graph on $X_1, \dots, X_6$.]

$X_1 = f_1(X_5, E_1)$
$X_2 = f_2(E_2)$
$X_3 = f_3(X_1, X_2, E_3)$
$X_4 = f_4(X_2, E_4)$
$X_5 = f_5(X_3, E_5)$
$X_6 = f_6(X_3, X_4, E_6)$


slide-82
SLIDE 82

Modeling Interventions with an SCM

Example: Abundance intervention on $X_3$

[Figure: causal graph on $X_1, \dots, X_6$ under the intervention.]

$X_1 = f_1(X_5, E_1)$
$X_2 = f_2(E_2)$
$X_3 = \xi_3$ (replacing $X_3 = f_3(X_1, X_2, E_3)$)
$X_4 = f_4(X_2, E_4)$
$X_5 = f_5(X_3, E_5)$
$X_6 = f_6(X_3, X_4, E_6)$


slide-83
SLIDE 83

Modeling Interventions with an SCM

Example: Activity intervention on $X_3$

[Figure: causal graph on $X_1, \dots, X_6$ under the intervention.]

$X_1 = f_1(X_5, E_1)$
$X_2 = f_2(E_2)$
$X_3 = f_3(X_1, X_2, E_3)$
$X_4 = f_4(X_2, E_4)$
$X_5 = \tilde f_5(X_3, E_5)$ (replacing $X_5 = f_5(X_3, E_5)$)
$X_6 = \tilde f_6(X_3, X_4, E_6)$ (replacing $X_6 = f_6(X_3, X_4, E_6)$)
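The two intervention types can be sketched in code on the six-variable example. Everything concrete below is an assumption for illustration: the linear mechanisms and their coefficients, the function names, and the use of fixed-point iteration to find the equilibrium (which only converges when the feedback loop is a contraction, as it is for these coefficients).

```python
def mechanisms():
    """Hypothetical linear mechanisms f_i(x, e) with the parent sets of the example."""
    return {
        1: lambda x, e: 0.5 * x[5] + e[1],
        2: lambda x, e: e[2],
        3: lambda x, e: 0.4 * x[1] + 0.3 * x[2] + e[3],
        4: lambda x, e: 0.6 * x[2] + e[4],
        5: lambda x, e: 0.7 * x[3] + e[5],
        6: lambda x, e: 0.2 * x[3] + 0.5 * x[4] + e[6],
    }


def abundance_intervention(f, i, xi):
    """Replace the structural equation of X_i by X_i = xi (perfect intervention)."""
    g = dict(f)
    g[i] = lambda x, e: xi
    return g


def activity_intervention(f, new_child_mechanisms):
    """Replace the mechanisms of the children of the target; the target's own equation stays."""
    g = dict(f)
    g.update(new_child_mechanisms)
    return g


def equilibrium(f, e, iters=200):
    """Solve X_i = f_i(X, E) by fixed-point iteration (assumes a contraction)."""
    x = {i: 0.0 for i in f}
    for _ in range(iters):
        x = {i: f[i](x, e) for i in f}
    return x


if __name__ == "__main__":
    noise = {i: 1.0 for i in range(1, 7)}
    f = mechanisms()
    # Activity intervention on X3: its children X5 and X6 respond more weakly to X3.
    inhibited = {
        5: lambda x, e: 0.07 * x[3] + e[5],
        6: lambda x, e: 0.02 * x[3] + 0.5 * x[4] + e[6],
    }
    print(equilibrium(f, noise))
    print(equilibrium(abundance_intervention(f, 3, 2.0), noise))
    print(equilibrium(activity_intervention(f, inhibited), noise))
```

Note the difference in what breaks: the abundance intervention cuts the cycle $X_1 \to X_3 \to X_5 \to X_1$ at $X_3$, while the activity intervention leaves $X_3$ responding to its parents and only changes how $X_5$ and $X_6$ respond to it.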

slide-88
SLIDE 88

Algorithm: Score-based approach

Use approximate Bayesian model selection in a multi-task learning setting to estimate the posterior probability of a putative causal graph $G$, given the data (and prior assumptions).
  • Given a hypothetical causal graph $G$, numerically optimize the posterior with respect to the parameters.
  • Employ the Laplace approximation to approximate the evidence (marginal likelihood) for that causal graph $G$.
  • Number of possible causal graphs $G$ for 11 variables: 31603459396418917607425 (acyclic), 1298074214633706907132624082305024 (cyclic). Use local search to explore the posterior distribution over causal graphs.
  • Stability selection [Meinshausen et al., 2010] to identify stable causal relations.
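The two graph counts quoted above can be reproduced with a few lines, which also shows why exhaustive search is out of the question. The sketch assumes that "acyclic" means labelled directed acyclic graphs (counted here with Robinson's recurrence) and that "cyclic" means arbitrary directed graphs without self-loops, i.e. $2^{n(n-1)}$; the function names are hypothetical.

```python
from math import comb


def count_dags(n):
    """Number of labelled DAGs on n nodes, via Robinson's recurrence."""
    a = [1]  # a[0] = 1: the empty graph
    for m in range(1, n + 1):
        a.append(
            sum(
                (-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
                for k in range(1, m + 1)
            )
        )
    return a[n]


def count_directed_graphs(n):
    """Number of directed graphs on n labelled nodes without self-loops."""
    return 2 ** (n * (n - 1))


if __name__ == "__main__":
    print(count_dags(11))
    print(count_directed_graphs(11))
```

Both functions use exact integer arithmetic, so the results can be compared digit by digit with the numbers on the slide.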


slide-89
SLIDE 89

Comparison with ground truth (max. 17 edges, acyclic)

For comparison with the consensus model and the reconstructed model by Sachs et al., we constrain the number of edges:

[Figure: three causal graphs over the 11 proteins (Raf, Mek, PLCg, PIP2, PIP3, Erk, Akt, PKA, PKC, p38, JNK): the consensus network, the reconstruction by Sachs et al., and the result of this work.]

Black: expected, Blue: novel findings, Red dashed: missing.

Our acyclic, strongly regularized result deviates more from the “consensus” network than the reconstruction by Sachs et al. does. This actually seems to be good news!


slide-90
SLIDE 90

Comparison with ground truth (max. 17 edges, acyclic)

[Figure: the causal graph obtained in this work over the 11 proteins (Raf, Mek, PLCg, PIP2, PIP3, Erk, Akt, PKA, PKC, p38, JNK), together with a panel of KS test results w.r.t. the observational data for each protein under each condition (no intervention, AKT activity, PKC activity, PIP2 abundance, MEK activity, PIP2/PIP3 soft, PKC activity, PKA activity), on a scale with ticks from 50 to 500.]

Black: expected, Blue: novel findings, Red dashed: missing.
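The slide does not say which KS quantity is plotted. As a sketch of the underlying idea, the code below computes the plain two-sample Kolmogorov-Smirnov statistic, i.e. the largest gap between the empirical distribution functions of one protein under an intervention and under the observational condition. The function name and the toy samples in the usage line are hypothetical.

```python
def ks_statistic(a, b):
    """Two-sample KS statistic: max over t of |F_a(t) - F_b(t)| for empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        t = min(a[i], b[j])
        # Advance both samples past t so that ties are handled together.
        while i < len(a) and a[i] <= t:
            i += 1
        while j < len(b) and b[j] <= t:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


if __name__ == "__main__":
    observational = [1.0, 2.0, 3.0, 4.0]
    intervened = [3.0, 4.0, 5.0, 6.0]
    print(ks_statistic(observational, intervened))
```

A large statistic for a protein under some condition indicates that the intervention changed that protein's distribution, which is the kind of evidence an edge from the intervention target to that protein should explain.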

slide-96
SLIDE 96

Results (max. 17 edges, acyclic)

Acyclic, strongly regularized results for different priors:

[Figure: three causal graphs over the 11 proteins, one per prior: linear Gaussian, nonlinear Gaussian, and nonlinear non-Gaussian.]

Black: expected, Blue: novel findings, Red dashed: missing.

Note: no strong dependence on prior.


slide-97
SLIDE 97

Results (max. 17 edges, cyclic)

Cyclic, strongly regularized results for different priors:

[Figure: three causal graphs over the 11 proteins, one per prior: linear Gaussian, nonlinear Gaussian, and nonlinear non-Gaussian.]

Black: expected, Blue: novel findings, Red dashed: missing.

Good news: our method reveals some likely feedback cycles.
Bad news: stronger dependence on prior (more data needed?).

slide-99
SLIDE 99

Discussion

Performing a proper causal analysis of this data is a challenging task:
  • time-series data are absent, so we need to assume homeostasis;
  • confounders could be present;
  • feedback loops are expected to be present;
  • most interventions change the activity instead of the abundance;
  • assumptions about the specificity of interventions may be unrealistic;
  • faithfulness violations are present.

Main contributions:
  • A more principled approach to learning the structure of (a)cyclic causal models from a combination of observational and interventional equilibrium data.
  • A natural way to model activity interventions.

slide-101
SLIDE 101

Conclusions and future work

Conclusions:
  • Results support the hypothesis that the underlying system contains feedback loops.
  • The proposed method identifies a few likely feedback loops, but more data is probably necessary.

Future work:
  • Analysis of causal predictive performance: do our models give more accurate predictions, also for (new) interventions?
  • Experimental evaluation of predictions.


slide-102
SLIDE 102

Part V Causal Inference: Outlook

slide-106
SLIDE 106

Three interesting and important future directions

1. The field has focussed mainly on acyclic causal systems. Feedback occurs in many different systems in biology, economics, and other fields. A lot of interesting work remains to be done for the cyclic case.

2. The Causal Discovery literature has focussed mainly on the special case of purely observational data. In practice, interventional data is often available as well, and this data typically conveys important information about the underlying causal structure. Designing good methods and algorithms that can use this data may have a big impact in many empirical sciences.

3. Related to AI: Can we build “intelligent” systems that are able to learn a causal model of the world? An important ingredient (in addition to being able to learn from given data) is active learning, or experimental design.

Thanks for your attention!


slide-107
SLIDE 107

Acknowledgments and References

I thank Bram Thijssen and Tjeerd Dijkstra for stimulating discussions. JM was supported by NWO, the Netherlands Organization for Scientific Research (VENI grant 639.031.036).

Dash, D. (2003). Caveats for Causal Reasoning. PhD thesis, University of Pittsburgh, Pittsburgh, PA.

Hall-Jackson, C. A., Eyers, P. A., Cohen, P., Goedert, M., Boyle, F. T., Hewitt, N., Plant, H., and Hedge, P. (1999). Paradoxical activation of Raf by a novel Raf inhibitor. Chemistry & Biology, 6:559–568.

Itani, S., Ohannessian, M., Sachs, K., Nolan, G. P., and Dahleh, M. A. (2010). Structure learning in causal cyclic networks. In JMLR Workshop and Conference Proceedings, volume 6, pages 165–176.

Sachs, K., Perez, O., Pe’er, D., Lauffenburger, D., and Nolan, G. (2005). Causal protein-signaling networks derived from multiparameter single-cell data. Science, 308:523–529.

Schmidt, M. and Murphy, K. (2009). Modeling discrete interventional data using directed cyclic graphical models. In Proceedings of the 25th Annual Conference on Uncertainty in Artificial Intelligence (UAI-09).