Causal Inference in case of feedback
Joris Mooij (j.m.mooij@uva.nl)

Joris Mooij (IAS, ISLA, IvI, UvA), Van Dantzig Seminar Talk, 2013-12-12

Structural Causal Models: Interventions
For a causal model, we also need to model interventions.
Interventions in SCMs: an intervention $\mathrm{do}(X_i = \xi_i)$ on a variable $X_i$, forcing it to attain the value $\xi_i$, changes the structural equation for $X_i$ as follows.
Original SCM $M$:
$$X_i = f_i(X_{\mathrm{pa}(i)}, E_i), \qquad X_j = f_j(X_{\mathrm{pa}(j)}, E_j) \ \ \forall j \neq i, \qquad p(E) = \dots$$
Intervened SCM $M_{\xi_i}$:
$$X_i = \xi_i, \qquad X_j = f_j(X_{\mathrm{pa}(j)}, E_j) \ \ \forall j \neq i, \qquad p(E) = \dots$$
Interpretation: the intervention overrides the default causal mechanism that would normally determine the value of the intervened variable.

Example, observational (no intervention):
Structural causal model $M$:
$$\begin{aligned}
X_1 &= f_1(E_1), & p(E_1) &= \dots\\
X_2 &= f_2(E_2), & p(E_2) &= \dots\\
X_3 &= f_3(X_1, X_2, E_3), & p(E_3) &= \dots\\
X_4 &= f_4(X_1, E_4), & p(E_4) &= \dots\\
X_5 &= f_5(X_3, X_4, E_5), & p(E_5) &= \dots
\end{aligned}$$
with $p(E) = \prod_i p(E_i)$.
Causal graph $G_M$: $X_1 \to X_3$, $X_2 \to X_3$, $X_1 \to X_4$, $X_3 \to X_5$, $X_4 \to X_5$.

Example, intervention $\mathrm{do}(X_1 = \xi_1)$:
Structural causal model $M_{\xi_1}$:
$$\begin{aligned}
X_1 &= \xi_1, & p(E_1) &= \dots\\
X_2 &= f_2(E_2), & p(E_2) &= \dots\\
X_3 &= f_3(X_1, X_2, E_3), & p(E_3) &= \dots\\
X_4 &= f_4(X_1, E_4), & p(E_4) &= \dots\\
X_5 &= f_5(X_3, X_4, E_5), & p(E_5) &= \dots
\end{aligned}$$
with $p(E) = \prod_i p(E_i)$.
Causal graph $G_{M_{\xi_1}}$: unchanged, because $X_1$ has no parents.

Example, intervention $\mathrm{do}(X_3 = \xi_3)$:
Structural causal model $M_{\xi_3}$:
$$\begin{aligned}
X_1 &= f_1(E_1), & p(E_1) &= \dots\\
X_2 &= f_2(E_2), & p(E_2) &= \dots\\
X_3 &= \xi_3, & p(E_3) &= \dots\\
X_4 &= f_4(X_1, E_4), & p(E_4) &= \dots\\
X_5 &= f_5(X_3, X_4, E_5), & p(E_5) &= \dots
\end{aligned}$$
with $p(E) = \prod_i p(E_i)$.
Causal graph $G_{M_{\xi_3}}$: $X_1 \to X_4$, $X_3 \to X_5$, $X_4 \to X_5$ (the edges into $X_3$ are removed).
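To make the replacement of a structural equation concrete, here is a minimal Python sketch of the five-variable example. The linear mechanisms, the standard-normal noise and the helper names `do` and `sample` are illustrative assumptions, not part of the talk; the point is only that `do` swaps out one equation and leaves all others untouched.

```python
import random

# Hypothetical linear mechanisms for the five-variable example; the slides leave
# f_i and p(E_i) unspecified, so these choices are purely illustrative.
MECHANISMS = {
    "X1": lambda x, e: e,
    "X2": lambda x, e: e,
    "X3": lambda x, e: x["X1"] + x["X2"] + e,
    "X4": lambda x, e: 2.0 * x["X1"] + e,
    "X5": lambda x, e: x["X3"] - x["X4"] + e,
}
ORDER = ["X1", "X2", "X3", "X4", "X5"]  # a topological order of the acyclic graph


def do(mechanisms, var, value):
    """Return a new SCM in which the equation for `var` is replaced by `var = value`."""
    intervened = dict(mechanisms)  # the original SCM is left untouched
    intervened[var] = lambda x, e: value
    return intervened


def sample(mechanisms, rng):
    """Draw independent noise E_i ~ N(0, 1) and evaluate the equations in topological order."""
    x = {}
    for name in ORDER:
        x[name] = mechanisms[name](x, rng.gauss(0.0, 1.0))
    return x


rng = random.Random(0)
observational = [sample(MECHANISMS, rng) for _ in range(5000)]
interventional = [sample(do(MECHANISMS, "X3", 1.5), rng) for _ in range(5000)]
```

Under these assumed mechanisms, $X_5$ has mean 0 observationally and mean 1.5 under $\mathrm{do}(X_3 = 1.5)$, while $X_1$, $X_2$ and $X_4$ keep their observational distributions.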

Confounders and causal sufficiency
Definition (Confounder): A confounder is a latent common cause of two or more observed variables.
Example: there is a significant correlation ($p = 0.008$) between the human birth rate and the number of stork populations in European countries [Matthews, 2000]. Most people nowadays do not believe that storks deliver babies (nor that babies deliver storks), so there must be some confounder explaining the correlation.
[Figure: candidate causal graphs relating storks S and births B, including one with a latent common cause "?"]

Absence of confounders implies causal sufficiency.
Definition (Causal sufficiency): If all latent variables $E_1, \dots, E_N$ in an SCM are jointly independent, i.e., if
$$p(E) = \prod_{i=1}^{N} p(E_i),$$
then we say that the observed variables $X$ are causally sufficient.

Causal feedback and (a)cyclicity
Definition (Causal feedback): An SCM incorporates causal feedback if its graph contains a directed cycle
$$X_{i_0} \to X_{i_1} \to \dots \to X_{i_n}, \qquad X_{i_0} = X_{i_n}.$$
If it does not contain such a directed cycle, the model is called acyclic. If it is also causally sufficient, its graph is a Directed Acyclic Graph (DAG).
Example: in economics, causal feedback is often present. Consider R: risks taken by a bank; B: imminent bankruptcy; S: being saved by the government.
[Figure: causal graph over R, B and S containing a directed cycle]
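Whether an SCM has causal feedback is a property of its graph alone. A minimal sketch of a depth-first search that returns one directed cycle if there is one; the function name and the adjacency-dict representation are our own choices.

```python
def find_directed_cycle(graph):
    """Return one directed cycle [v0, v1, ..., v0] in `graph`, or None if it is acyclic.

    `graph` maps each node to an iterable of its children (edges node -> child).
    Nodes are coloured WHITE (unvisited), GREY (on the DFS stack) or BLACK (done);
    an edge into a GREY node closes a directed cycle.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {v: WHITE for v in graph}
    stack = []

    def visit(v):
        colour[v] = GREY
        stack.append(v)
        for w in graph.get(v, ()):
            if colour.get(w, WHITE) == GREY:
                return stack[stack.index(w):] + [w]
            if colour.get(w, WHITE) == WHITE:
                cycle = visit(w)
                if cycle:
                    return cycle
        stack.pop()
        colour[v] = BLACK
        return None

    for v in list(graph):
        if colour[v] == WHITE:
            cycle = visit(v)
            if cycle:
                return cycle
    return None


five_variable_graph = {"X1": ["X3", "X4"], "X2": ["X3"], "X3": ["X5"], "X4": ["X5"], "X5": []}
two_masses = {"Q1": ["Q2"], "Q2": ["Q1"]}  # Q1 causes Q2 and Q2 causes Q1
```

On the graph of the five-variable example it returns `None`; for the two-mass spring system ($Q_1 \to Q_2$, $Q_2 \to Q_1$) it returns `['Q1', 'Q2', 'Q1']`.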

Factorization: Bayesian Networks
Theorem: Any probability distribution induced by an acyclic, causally sufficient SCM $M$ can be factorized as
$$p_M(X_1, \dots, X_N) = \prod_{i=1}^{N} p_M(X_i \mid X_{\mathrm{pa}(i)}).$$
Example: for the SCM $M$ with $X_1 = f_1(E_1)$, $X_2 = f_2(E_2)$, $X_3 = f_3(X_1, X_2, E_3)$, $X_4 = f_4(X_1, E_4)$, $X_5 = f_5(X_3, X_4, E_5)$ (graph $X_1 \to X_3 \leftarrow X_2$, $X_1 \to X_4$, $X_3 \to X_5 \leftarrow X_4$):
$$p(X_1, \dots, X_5) = p(X_1)\, p(X_2)\, p(X_3 \mid X_1, X_2)\, p(X_4 \mid X_1)\, p(X_5 \mid X_3, X_4).$$

Causal Reasoning: Truncated factorization
The following theorem expresses the joint distribution of a Bayesian network after an intervention. It is an example of causal reasoning.
Theorem (Truncated factorization): Any probability distribution induced by an acyclic, causally sufficient SCM $M$ can be factorized as
$$p_M(X_1, \dots, X_N) = \prod_{i=1}^{N} p_M(X_i \mid X_{\mathrm{pa}(i)}).$$
After an intervention $\mathrm{do}(X_I = \xi_I)$, the probability distribution becomes
$$p_{M_{\xi_I}}\big(X_1, \dots, X_N \mid \mathrm{do}(X_I = \xi_I)\big) = \prod_{i \notin I} p_M(X_i \mid X_{\mathrm{pa}(i)}) \prod_{i \in I} \mathbb{1}[X_i = \xi_i].$$
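A small numerical check of the truncated factorization, assuming binary variables and made-up conditional probabilities for the five-variable graph (none of these numbers come from the talk). Brute-force enumeration compares $p(X_5 = 1 \mid \mathrm{do}(X_3 = 1))$ with the ordinary conditional $p(X_5 = 1 \mid X_3 = 1)$.

```python
import itertools

PARENTS = {"X1": (), "X2": (), "X3": ("X1", "X2"), "X4": ("X1",), "X5": ("X3", "X4")}


def p_one(var, a):
    """Hypothetical P(var = 1 | parents) for binary variables (not from the talk)."""
    if var == "X1":
        return 0.3
    if var == "X2":
        return 0.6
    if var == "X3":
        return 0.1 + 0.4 * a["X1"] + 0.4 * a["X2"]
    if var == "X4":
        return 0.2 + 0.6 * a["X1"]
    return 0.1 + 0.3 * a["X3"] + 0.5 * a["X4"]  # X5


def joint(a, do=None):
    """Truncated factorization: p(X_i | X_pa(i)) for non-intervened i,
    times the indicator 1[X_i = xi_i] for every intervened i."""
    do = do or {}
    weight = 1.0
    for var in PARENTS:
        if var in do:
            weight *= 1.0 if a[var] == do[var] else 0.0
        else:
            p = p_one(var, a)
            weight *= p if a[var] == 1 else 1.0 - p
    return weight


def prob(event, do=None, given=None):
    """P(event | given) in the (possibly intervened) model, enumerating all 2^5 states."""
    given = given or {}
    num = den = 0.0
    for values in itertools.product((0, 1), repeat=len(PARENTS)):
        a = dict(zip(PARENTS, values))
        if any(a[k] != v for k, v in given.items()):
            continue
        w = joint(a, do)
        den += w
        if all(a[k] == v for k, v in event.items()):
            num += w
    return num / den
```

With these numbers `prob({"X5": 1}, do={"X3": 1})` is 0.59, whereas `prob({"X5": 1}, given={"X3": 1})` is about 0.645: conditioning on $X_3$ also shifts the belief about $X_1$, which reaches $X_5$ through $X_4$, while the intervention cuts the edge $X_1 \to X_3$.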

Part III: Causal Modeling in case of feedback

Feedback
Cyclic causal dependencies are also called feedback loops. Examples:
- holding a microphone too close to a loudspeaker;
- predator-prey relationships in biology;
- computer programs running on a single core are acyclic; parallel programs running on multiple cores can be cyclic.
Example: two masses, connected by a spring, suspended from the ceiling by another spring, with vertical equilibrium positions $Q_1$ and $Q_2$. $Q_1$ causes $Q_2$, and $Q_2$ causes $Q_1$. This is an example of a two-cycle, which cannot be modeled with a (causal) Bayesian network.

Causal modeling of feedback systems
Question: What are good mathematical representations of cyclic causal models? There is no consensus in the field:
- (Causal) Bayesian networks are acyclic by definition, and extending the definition to cyclic graphs [Schmidt & Murphy, 2009; Itani et al., 2010] seems problematic.
- Extending the global Markov condition to cyclic models works for linear models [Spirtes, 1993], but nonlinear and discrete models yield problems [Spirtes, 1995; Pearl & Dechter, 1996; Neal, 2000].
- Structural Causal Models have a "natural" extension to the cyclic case. But how should these models be interpreted in terms of a data-generating process? Is this "the right" mathematical framework?
How do scientists usually model systems with feedback?

Two different worlds?
Ordinary Differential Equations  ?  Structural Causal Models

From dynamical systems to causal models (in a nutshell)
1. Ordinary Differential Equations:
$$\dot X = -0.5X + Y, \quad X(0) = 1; \qquad \dot Y = -X + 0.2Y, \quad Y(0) = 2$$
2. Labeled Equilibrium Equations:
$$X:\ 0 = -0.5X + Y; \qquad Y:\ 0 = -X + 0.2Y$$
3. Structural Causal Model:
$$X = 2Y; \qquad Y = 5X$$
4. Dealing with uncertainty:
$$X = 2Y + E_X; \qquad Y = 5X + E_Y; \qquad p(E_X, E_Y) = \dots$$
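The four steps can be checked numerically. The sketch below integrates the ODE from $X(0) = 1$, $Y(0) = 2$ with plain forward Euler (the step size and horizon are our assumptions) and compares the end state with the solution of the cyclic SCM; `solve_scm` also covers the noisy version of step 4 by solving the two linear equations jointly.

```python
def simulate(x, y, t_end=100.0, dt=1e-3):
    """Forward-Euler integration of dX/dt = -0.5 X + Y, dY/dt = -X + 0.2 Y."""
    for _ in range(int(round(t_end / dt))):
        dx = -0.5 * x + y
        dy = -x + 0.2 * y
        x, y = x + dt * dx, y + dt * dy
    return x, y


def solve_scm(e_x=0.0, e_y=0.0):
    """Unique solution of the cyclic SCM X = 2Y + E_X, Y = 5X + E_Y.

    Substituting the second equation into the first gives X = 10X + 2E_Y + E_X,
    so X = -(E_X + 2E_Y) / 9; Y then follows from its own equation.
    """
    x = -(e_x + 2.0 * e_y) / 9.0
    return x, 5.0 * x + e_y


x_end, y_end = simulate(1.0, 2.0)
x_star, y_star = solve_scm()  # deterministic case: (0, 0)
```

The ODE's coefficient matrix $\begin{pmatrix} -0.5 & 1 \\ -1 & 0.2 \end{pmatrix}$ has trace $-0.3$ and determinant $0.9$, so both eigenvalues have real part $-0.15$ and the trajectory spirals into $(0, 0)$, which is also the unique solution of $X = 2Y$, $Y = 5X$.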

ODEs: Definition
Definition (ODE): An Ordinary Differential Equation model (ODE) is a dynamical system $D$ described by $D$ coupled first-order ordinary differential equations and an initial condition $X_0$:
$$\dot X_i(t) := \frac{dX_i}{dt}(t) = f_i(X_{\mathrm{pa}_D(i)}), \qquad X_i(0) = (X_0)_i, \qquad \forall i = 1, \dots, D.$$
$\mathrm{pa}_D(i) \subseteq \{1, \dots, D\}$ is the set of parents of variable $X_i$. Each $f_i : \mathbb{R}_{\mathrm{pa}_D(i)} \to \mathbb{R}_i$ is a (sufficiently smooth) function. The structure can be represented as a directed graph $G_D$ with nodes $\{X_i\}_{i \in \mathcal{I}}$, where $\mathcal{I} = \{1, \dots, D\}$, and a directed edge $X_i \to X_j$ iff $\dot X_j$ depends on $X_i$.

ODEs: Example
Example (Lotka-Volterra model): a well-known model from population biology, with
- abundance of prey $X_1 \in [0, \infty)$ (e.g., rabbits);
- abundance of predators $X_2 \in [0, \infty)$ (e.g., wolves).
ODE $D$:
$$\dot X_1 = X_1(\theta_{11} - \theta_{12} X_2), \quad X_1(0) = a; \qquad \dot X_2 = -X_2(\theta_{22} - \theta_{21} X_1), \quad X_2(0) = b$$
Graph $G_D$: $X_1 \to X_2$ and $X_2 \to X_1$.
[Figure: prey and predator abundances plotted against time]
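A minimal simulation sketch of the Lotka-Volterra model with classical Runge-Kutta integration. The parameter values and the initial condition ($a = 150$, $b = 50$) are hypothetical. The function `first_integral` is the standard conserved quantity of this model, which explains why trajectories cycle instead of converging.

```python
import math

# Hypothetical parameters (the slides give no values); the coexistence equilibrium
# is X1* = TH22 / TH21 = 100 prey and X2* = TH11 / TH12 = 100 predators.
TH11, TH12, TH21, TH22 = 2.0, 0.02, 0.01, 1.0


def lotka_volterra(x1, x2):
    """Right-hand side of the Lotka-Volterra ODE from the slide."""
    return x1 * (TH11 - TH12 * x2), -x2 * (TH22 - TH21 * x1)


def rk4_path(x1, x2, dt=1e-3, steps=20_000):
    """Classical 4th-order Runge-Kutta integration; returns all visited states."""
    path = [(x1, x2)]
    for _ in range(steps):
        a1, b1 = lotka_volterra(x1, x2)
        a2, b2 = lotka_volterra(x1 + 0.5 * dt * a1, x2 + 0.5 * dt * b1)
        a3, b3 = lotka_volterra(x1 + 0.5 * dt * a2, x2 + 0.5 * dt * b2)
        a4, b4 = lotka_volterra(x1 + dt * a3, x2 + dt * b3)
        x1 += dt / 6.0 * (a1 + 2 * a2 + 2 * a3 + a4)
        x2 += dt / 6.0 * (b1 + 2 * b2 + 2 * b3 + b4)
        path.append((x1, x2))
    return path


def first_integral(x1, x2):
    """V(X1, X2) is constant along every trajectory, so orbits are closed loops."""
    return TH21 * x1 - TH22 * math.log(x1) + TH12 * x2 - TH11 * math.log(x2)


path = rk4_path(150.0, 50.0)  # a = 150, b = 50, integrated up to t = 20
```

Along the computed path the first integral stays constant up to integration error and the prey abundance keeps swinging over a wide range, so the system never settles, consistent with the oscillating plot.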

Perfect Interventions
The dynamical system $D$ is assumed to describe the "natural" or observational state of the system. Causal models also aim to predict the effects of interventions, in which the system is actively perturbed from its natural state. Interventions can be modeled in different ways; here we look at perfect interventions.
Definition (Perfect interventions): The perfect intervention $\mathrm{do}(X_I = \xi_I)$ means that $X_I$ is forced to attain the value $\xi_I$ for all times $t \in [0, \infty)$. This changes the ODE $D$ into the intervened system $D_{\mathrm{do}(X_I = \xi_I)}$:
$$\dot X_i(t) = \begin{cases} 0 & i \in I \\ f_i(X_{\mathrm{pa}_D(i)}) & i \in \mathcal{I} \setminus I \end{cases}, \qquad X_i(0) = \begin{cases} \xi_i & i \in I \\ (X_0)_i & i \in \mathcal{I} \setminus I \end{cases}$$

Perfect Interventions in ODEs: Example
Example (Lotka-Volterra model):
$$D: \quad \dot X_1 = X_1(\theta_{11} - \theta_{12} X_2), \quad X_1(0) = a; \qquad \dot X_2 = -X_2(\theta_{22} - \theta_{21} X_1), \quad X_2(0) = b$$
Perfect intervention $\mathrm{do}(X_2 = \xi_2)$: monitor the abundance of wolves and make sure that it equals the target value $\xi_2$ at all times:
$$D_{\mathrm{do}(X_2 = \xi_2)}: \quad \dot X_1 = X_1(\theta_{11} - \theta_{12} X_2), \quad X_1(0) = a; \qquad \dot X_2 = 0, \quad X_2(0) = \xi_2$$
Graph $G_{D_{\mathrm{do}(X_2 = \xi_2)}}$: $X_2 \to X_1$ (the edge $X_1 \to X_2$ is removed).
[Figure: prey and predator abundances over time under the intervention]
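Under $\mathrm{do}(X_2 = \xi_2)$ the prey equation becomes linear, $\dot X_1 = (\theta_{11} - \theta_{12}\xi_2) X_1$, with closed-form solution $X_1(t) = a\, e^{(\theta_{11} - \theta_{12}\xi_2) t}$. A small sketch, using forward Euler and hypothetical parameter values, that integrates the intervened system and compares it with this formula. Note that $\theta_{21}$ and $\theta_{22}$ drop out once $X_2$ is clamped.

```python
import math


def intervened_lv(a, xi2, th11, th12, t_end, dt=1e-4):
    """Forward-Euler integration of the Lotka-Volterra model under do(X2 = xi2).

    The equation for X2 is replaced by dX2/dt = 0 with X2(0) = xi2, so X2 stays
    clamped and X1 evolves as dX1/dt = X1 * (th11 - th12 * xi2).
    """
    x1, x2 = a, xi2
    for _ in range(int(round(t_end / dt))):
        dx1 = x1 * (th11 - th12 * x2)
        dx2 = 0.0  # intervened equation
        x1, x2 = x1 + dt * dx1, x2 + dt * dx2
    return x1, x2


def closed_form_x1(a, xi2, th11, th12, t):
    """Exact solution X1(t) = a * exp((th11 - th12 * xi2) * t) of the intervened prey equation."""
    return a * math.exp((th11 - th12 * xi2) * t)
```

With the hypothetical values $\theta_{11} = 2$, $\theta_{12} = 0.02$, clamping the wolves at $\xi_2 = 150$ drives the prey towards extinction exponentially, whereas $\xi_2 = 50$ lets them grow without bound; only $\xi_2 = \theta_{11}/\theta_{12} = 100$ keeps $X_1$ constant.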

ODEs: Stability
When studying the system in the limit $t \to \infty$, an important concept is stability.
Definition (Stability): The ODE $D$ is called stable if there exists a unique equilibrium state $X^* \in \mathbb{R}^{\mathcal{I}}$ such that for any initial state $X_0 \in \mathbb{R}^{\mathcal{I}}$, the system converges to this equilibrium state as $t \to \infty$:
$$\exists!\, X^* \in \mathbb{R}^{\mathcal{I}}\ \ \forall X_0 \in \mathbb{R}^{\mathcal{I}}: \quad \lim_{t \to \infty} X(t) = X^*.$$
Counter-example (Lotka-Volterra model): the Lotka-Volterra model is not stable (it keeps oscillating).
[Figure: oscillating prey and predator abundances over time]

Stability: Example
Example (Damped coupled harmonic oscillators): masses $m_1, \dots, m_4$ in a chain between fixed ends at $Q = 0$ and $Q = L$, connected by springs with constants $k_0, \dots, k_4$, rest lengths $l_i$ and friction coefficients $b_i$.
Equations of motion (with $Q_0 := 0$, $Q_{D+1} := L$):
$$\dot P_i = k_i(Q_{i+1} - Q_i - l_i) - k_{i-1}(Q_i - Q_{i-1} - l_{i-1}) - \frac{b_i}{m_i} P_i, \qquad \dot Q_i = P_i / m_i$$
Because of the friction, this system is stable (the oscillations die out).
[Figure: trajectories of the system over time, with the oscillations dying out]
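A simulation sketch of the damped chain with four masses, using made-up masses, spring constants and rest lengths, and friction chosen as $b_i = 0.5\, m_i$ (all assumptions). It starts from two different initial states and checks that both end in the same state with zero momenta, which is what stability requires.

```python
D = 4                                  # number of masses
L = 10.0                               # position of the far end, Q_{D+1} = L
K = [1.0, 2.0, 1.5, 1.0, 3.0]          # spring constants k_0 .. k_D (hypothetical)
REST = [1.0] * (D + 1)                 # rest lengths l_0 .. l_D (hypothetical)
M = [1.0, 2.0, 1.0, 1.5]               # masses m_1 .. m_D, stored 0-based
B = [0.5 * m for m in M]               # friction coefficients b_1 .. b_D


def momentum_rate(q, p, i):
    """dP_i/dt for mass i (1-based); q holds Q_0 .. Q_{D+1} including both fixed ends."""
    return (K[i] * (q[i + 1] - q[i] - REST[i])
            - K[i - 1] * (q[i] - q[i - 1] - REST[i - 1])
            - B[i - 1] * p[i - 1] / M[i - 1])


def simulate(q_init, p_init, t_end=100.0, dt=2e-3):
    """Semi-implicit Euler: update momenta from the old state, then positions from the new momenta."""
    q = [0.0] + list(q_init) + [L]
    p = list(p_init)
    for _ in range(int(round(t_end / dt))):
        p = [p[i - 1] + dt * momentum_rate(q, p, i) for i in range(1, D + 1)]
        for i in range(1, D + 1):
            q[i] += dt * p[i - 1] / M[i - 1]
    return q[1:-1], p


q_a, p_a = simulate([2.0, 4.0, 6.0, 8.0], [0.0] * D)
q_b, p_b = simulate([1.0, 2.0, 3.0, 9.0], [1.0, -1.0, 0.5, 0.0])
```

Choosing $b_i$ proportional to $m_i$ gives every vibration mode the same decay rate, which keeps the example easy to reason about; any positive friction would make the chain stable.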

Equilibrium of the observational system
Given an ODE $D$:
$$\dot X_i(t) = f_i(X_{\mathrm{pa}_D(i)}), \quad X_i(0) = (X_0)_i, \qquad \forall i \in \mathcal{I},$$
at equilibrium the rate of change of every variable is zero. This yields the equilibrium equations
$$0 = f_i(X_{\mathrm{pa}_D(i)}) \qquad \forall i \in \mathcal{I}.$$
This is a set of $D$ coupled equations with unknowns $X_1, \dots, X_D$. The stability assumption implies that there exists a unique solution $X^*$ of the equilibrium equations.

Labeling Equilibrium Equations
Note that the dynamical system contains "labels" for the equations: in case of an intervention on $X_i$, we simply change the dynamical equation for $\dot X_i$. This information is lost when considering the equilibrium equations. In order to model perfect interventions, we introduce labels for the equilibrium equations.
Definition: Given an ODE $D$:
$$\dot X_i(t) = f_i(X_{\mathrm{pa}_D(i)}), \quad X_i(0) = (X_0)_i, \qquad \forall i \in \mathcal{I},$$
its system $E_D$ of Labeled Equilibrium Equations (LEE) is given by
$$i: \quad 0 = f_i(X_{\mathrm{pa}_D(i)}) \qquad \forall i \in \mathcal{I}.$$

Induced LEE: Example
Example (Damped coupled harmonic oscillators): equations of motion
$$\dot P_i = k_i(Q_{i+1} - Q_i - l_i) - k_{i-1}(Q_i - Q_{i-1} - l_{i-1}) - \frac{b_i}{m_i} P_i, \qquad \dot Q_i = P_i / m_i.$$
The induced Labeled Equilibrium Equations are given by
$$E_i: \begin{cases} 0 = k_i(Q_{i+1} - Q_i - l_i) - k_{i-1}(Q_i - Q_{i-1} - l_{i-1}) - \dfrac{b_i}{m_i} P_i \\ 0 = P_i \end{cases}$$

Equilibrium of intervened systems
Observational system:
- ODE $D$: $\dot X_i(t) = f_i(X_{\mathrm{pa}_D(i)})$, $X_i(0) = (X_0)_i$ for $i \in \mathcal{I}$;
- after equilibration, LEE $E_D$: $i:\ 0 = f_i(X_{\mathrm{pa}_D(i)})$ for $i \in \mathcal{I}$.
Intervened system, after $\mathrm{do}(X_I = \xi_I)$:
- ODE $D_{\mathrm{do}(X_I = \xi_I)}$: $\dot X_i(t) = 0$, $X_i(0) = \xi_i$ for $i \in I$; $\dot X_i(t) = f_i(X_{\mathrm{pa}_D(i)})$, $X_i(0) = (X_0)_i$ for $i \in \mathcal{I} \setminus I$;
- after equilibration, LEE $E_{D_{\mathrm{do}(X_I = \xi_I)}}$: $i:\ X_i = \xi_i$ for $i \in I$; $i:\ 0 = f_i(X_{\mathrm{pa}_D(i)})$ for $i \in \mathcal{I} \setminus I$.

From LEE to SCM
Definition: Given a system of Labeled Equilibrium Equations (LEE) $E$:
$$i: \quad 0 = f_i(X_{\mathrm{pa}_E(i)}) \qquad \forall i \in \mathcal{I},$$
the induced SCM is obtained by solving each equation $E_i$ for $X_i$ in terms of the other variables:
$$X_i = g_i(X_{\mathrm{pa}_E(i) \setminus \{i\}}) \qquad \forall i \in \mathcal{I}.$$
Note: this definition only makes sense if each labeled equilibrium equation $E_i$ has a unique solution for $X_i$.
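The definition can be turned into code when each labeled equation is solved numerically. A minimal sketch: the helper `induce_structural_equation` is hypothetical and uses Newton's method with a central finite-difference derivative (any one-dimensional root finder would do). It is applied to the labeled equations of the "in a nutshell" example.

```python
def induce_structural_equation(f, label, tol=1e-12, max_iter=100):
    """Turn the labeled equilibrium equation `label: 0 = f(x)` into X_label = g(others).

    g solves f = 0 for the labeled variable by Newton's method, with a central
    finite difference as derivative. As the definition notes, this is only
    meaningful when the equation has a unique solution for X_label.
    """
    def g(others):
        x = dict(others)
        x.setdefault(label, 0.0)  # starting guess for the unknown
        for _ in range(max_iter):
            h = 1e-6 * max(1.0, abs(x[label]))
            up, down = dict(x), dict(x)
            up[label] += h
            down[label] -= h
            slope = (f(up) - f(down)) / (2.0 * h)
            step = f(x) / slope
            x[label] -= step
            if abs(step) <= tol * max(1.0, abs(x[label])):
                return x[label]
        raise RuntimeError("Newton's method did not converge for label " + label)
    return g


# LEE of the "in a nutshell" example: X: 0 = -0.5X + Y and Y: 0 = -X + 0.2Y.
g_x = induce_structural_equation(lambda v: -0.5 * v["X"] + v["Y"], "X")
g_y = induce_structural_equation(lambda v: -v["X"] + 0.2 * v["Y"], "Y")
```

Here `g_x({"Y": 1.0})` gives 2.0 and `g_y({"X": 1.0})` gives 5.0, reproducing $X = 2Y$ and $Y = 5X$. If a labeled equation has no root or several roots in its variable, the iteration may fail or return just one of them, which mirrors the caveat in the definition.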

Induced SCM: Example
Example (Damped coupled harmonic oscillators):
ODE $D$:
$$\dot P_i = k_i(Q_{i+1} - Q_i - l_i) - k_{i-1}(Q_i - Q_{i-1} - l_{i-1}) - \frac{b_i}{m_i} P_i, \qquad \dot Q_i = P_i / m_i$$
Induced LEE $E_D$:
$$E_i: \begin{cases} 0 = k_i(Q_{i+1} - Q_i - l_i) - k_{i-1}(Q_i - Q_{i-1} - l_{i-1}) - \dfrac{b_i}{m_i} P_i \\ 0 = P_i \end{cases}$$
Induced SCM $M_{E_D}$ (setting $P_i = 0$ in the first equation and solving for $Q_i$):
$$Q_i = \frac{k_i(Q_{i+1} - l_i) + k_{i-1}(Q_{i-1} + l_{i-1})}{k_i + k_{i-1}}, \qquad P_i = 0.$$
Graph of the induced SCM $G_{M_{E_D}}$: each $Q_i$ has $Q_{i-1}$ and $Q_{i+1}$ as parents, giving $Q_1 \rightleftarrows Q_2 \rightleftarrows Q_3 \rightleftarrows Q_4$.
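Because the induced SCM is cyclic, its solution is a fixed point rather than the result of evaluating the equations in order. A minimal sketch with hypothetical spring constants and rest lengths for a four-mass chain, iterating the structural equations $Q_i = g_i(Q_{i-1}, Q_{i+1})$ (a Jacobi sweep) until nothing changes:

```python
D = 4                                  # number of masses
L = 10.0                               # fixed far end, Q_{D+1} = L
K = [1.0, 2.0, 1.5, 1.0, 3.0]          # spring constants k_0 .. k_D (hypothetical)
REST = [1.0] * (D + 1)                 # rest lengths l_0 .. l_D (hypothetical)


def scm_sweep(q):
    """Apply every structural equation Q_i = g_i(Q_{i-1}, Q_{i+1}) once, using the old values."""
    new = list(q)
    for i in range(1, D + 1):
        new[i] = (K[i] * (q[i + 1] - REST[i])
                  + K[i - 1] * (q[i - 1] + REST[i - 1])) / (K[i] + K[i - 1])
    return new


def solve_induced_scm(tol=1e-12, max_iter=100_000):
    """Fixed-point iteration for the cyclic SCM; Q_0 = 0 and Q_{D+1} = L stay fixed."""
    q = [0.0] * (D + 1) + [L]
    for _ in range(max_iter):
        new = scm_sweep(q)
        if max(abs(a - b) for a, b in zip(new, q)) < tol:
            return new[1:-1]
        q = new
    raise RuntimeError("fixed-point iteration did not converge")


q_star = solve_induced_scm()
```

For springs in series every spring carries the same tension $T$; with these parameters $L = \sum_i l_i + T \sum_i 1/k_i$ gives $T = 10/7$, so $Q_1 = l_0 + T/k_0 = 17/7$, which the iteration reproduces.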

From ODEs to SCMs
Theorem (Mooij, Janzing, Schölkopf, UAI 2013): Under certain stability conditions on the ODE $D$ and the intervened ODE $D_{\mathrm{do}(X_I = \xi_I)}$:
1. The following diagram commutes:
   ODE $D$ → LEE $E_D$ → SCM $M_{E_D}$
   intervened ODE $D_{\mathrm{do}(X_I = \xi_I)}$ → intervened LEE $E_{D_{\mathrm{do}(X_I = \xi_I)}}$ → intervened SCM $M_{E_{D_{\mathrm{do}(X_I = \xi_I)}}}$
   where the intervention $\mathrm{do}(X_I = \xi_I)$ maps each object in the first row to the one below it. In particular, intervening on the induced SCM $M_{E_D}$ gives the same SCM as first intervening on the ODE and then passing to the LEE and the SCM.
2. If the intervened ODE $D_{\mathrm{do}(X_I = \xi_I)}$ is stable, the induced intervened SCM $M_{E_{D_{\mathrm{do}(X_I = \xi_I)}}}$ has a unique solution that coincides with the stable equilibrium of the intervened ODE $D_{\mathrm{do}(X_I = \xi_I)}$.
(A similar result was derived by [Dash, 2003] for the acyclic case.)
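For the two-variable example from the "in a nutshell" slide, part 2 of the theorem can be checked directly. The sketch below (forward Euler, our choice) intervenes on the ODE by freezing one variable at $\xi$ and compares the long-run state with the solution of the intervened SCM.

```python
def simulate_intervened(target, xi, x0=1.0, y0=2.0, t_end=60.0, dt=1e-3):
    """Euler-integrate dX/dt = -0.5 X + Y, dY/dt = -X + 0.2 Y after do(target = xi).

    The intervened variable gets derivative 0 and initial value xi;
    the other variable keeps its original equation and initial value.
    """
    x, y = (xi, y0) if target == "X" else (x0, xi)
    for _ in range(int(round(t_end / dt))):
        dx = 0.0 if target == "X" else -0.5 * x + y
        dy = 0.0 if target == "Y" else -x + 0.2 * y
        x, y = x + dt * dx, y + dt * dy
    return x, y


def solve_intervened_scm(target, xi):
    """Intervened SCM: replace the equation of `target` by target = xi, keep the other one."""
    if target == "X":
        return xi, 5.0 * xi   # Y = 5X still holds
    return 2.0 * xi, xi       # X = 2Y still holds
```

Under $\mathrm{do}(Y = 1)$ the remaining equation $\dot X = -0.5X + 1$ is stable and $X$ converges to 2, the solution of the intervened SCM. Under $\mathrm{do}(X = 1)$ the remaining equation $\dot Y = -1 + 0.2Y$ is unstable: the intervened SCM still has the solution $Y = 5$, but the ODE runs away from it, which is why the theorem needs stability of the intervened ODE.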

Conclusion: There is a bridge between the two worlds!
Ordinary Differential Equations → Structural Causal Models

Discussion
- We have shown one particular way in which structural causal models can be "derived".
- This shows that cyclic SCMs (and cyclic LEEs) are a very natural way to model causal systems with feedback.
- This work dealt with the deterministic case. Uncertainty can arise in several ways:
  1. uncertainty about (constant) parameters of the differential equations;
  2. uncertainty about the initial condition (in the case of constants of motion);
  3. latent variables (in the case of confounding).
- Dealing with uncertainty is work in progress (similar ideas, but more involved).

Part IV: Causal Discovery in case of feedback

Case study: Reconstructing a signalling network
Protein abundance data [Sachs et al., 2005]: [Figure: abundances of Raf, Mek, PLCg, PIP2, PIP3, Erk, Akt, PKA, PKC, p38 and JNK measured under conditions 1 to 8]
Causal mechanism ("signalling network"): [Figure: the "consensus" network over Raf, Mek, Plcg, PIP2, PIP3, Erk, Akt, PKA, PKC, P38 and Jnk]
Experimental conditions (reagent, intervention):
- Condition 1: none, observational;
- Condition 2: Akt-inhibitor, inhibits AKT activity;
- Condition 3: G0076, inhibits PKC activity;
- Condition 4: Psitectorigenin, inhibits PIP2 abundance;
- Condition 5: U0126, inhibits MEK activity;
- Condition 6: LY294002, inhibits PIP2/PIP3 activity;
- Condition 7: PMA, activates PKC activity;
- Condition 8: β2CAMP, activates PKA activity.

Motivation
This is a good test case for causal discovery methods, because of:
- high-quality data: single-cell measurements, many data points (about $10^4$), small measurement noise;
- much knowledge about the "ground truth";
- possibly important applications in cancer medicine.
Good results were obtained by [Sachs et al., 2005] assuming acyclicity and causal sufficiency, using Bayesian network learning with discretized data. But:
- the data show evidence of feedback loops (cycles);
- no suitable cyclic causal discovery methods are available (but see [Itani et al., 2010; Schmidt and Murphy, 2009] for discretized data).

The importance of modeling feedback
Feedback plays an important role in many biological systems. Ignoring feedback may lead to unwanted surprises, e.g., [Hall-Jackson et al., 1999]:
"Here, we describe a compound (ZM 336372) that is a potent inhibitor of the protein kinase c-Raf in vitro. Paradoxically, however, incubation of mammalian cells with this compound induces an enormous activation of c-Raf and the B-Raf isoform (measured in the absence of the drug), suggesting that a feedback control loop exists by which Raf isoforms suppress their own activation. This unexpected finding may explain why ZM 336372 does not reverse the phenotype of Ras-transformed cell lines, and suggests that inhibition of the kinase activity of Raf might not be a good approach for the development of an anti-cancer drug."

The data (scatter plots)
[Figure: scatter plots of ln Mek against ln Raf and of ln Erk against ln Mek, for condition 1 (observational) and condition 5 (MEK inhibitor)]
Note:
- Noise can be very small (so observation noise is small).
- Strong correlation between Raf and Mek (consensus: Raf → Mek).
- Evidence for feedback (intervening on Mek changes Raf).
- No dependence between Mek and Erk (consensus: Mek → Erk).

Challenge: faithfulness violations
[Figure: four panels over Raf, Mek, PLCg, PIP2, PIP3, Erk, Akt, PKA, PKC, p38 and JNK: the consensus causal graph, the correlations expected from it, the measured correlations, and the resulting faithfulness violations]
This means that we need to combine observational and interventional data.

Goal
The goal of this work is to perform a more sophisticated causal analysis of the data by
- modeling feedback loops;
- modeling the interventions in a realistic way;
- using continuous data instead of a coarsely discretized version, allowing for nonlinear causal mechanisms;
and by doing so, to arrive at a more realistic reconstruction of the signalling network than the one [Sachs et al., 2005] originally obtained using (acyclic) discrete-valued Bayesian networks.

Basic Modeling Assumptions
We make the following causal modeling assumptions for the data:
- No time-series data: the cells have reached equilibrium when the measurements are performed;
- The equilibrium abundances $X$ are governed by a (possibly cyclic) Structural Causal Model $X_i = f_i(X_{\mathrm{pa}(i)}, E_i)$, $i = 1, \dots, D$;
- $E$ is constant in time but varies over cells;
- The reagents may change the structural equations locally;
- Causal sufficiency (all $E_i$ are jointly independent).

Induced distributions of cyclic SCMs
Lemma (Induced distribution of cyclic SCMs): If for each value of the noise $E$ there exists a unique solution $X(E)$ of the structural equations $\{X_i = f_i(X_{\mathrm{pa}(i)}, E_i)\}$, the SCM induces a unique observational distribution $p(X)$. In the acyclic case, this assumption is automatically satisfied. If the mapping $E \mapsto X(E)$ is invertible, the induced density satisfies
$$p(X) = p_E\big(E(X)\big) \left| \det \frac{\partial E}{\partial X} \right|.$$
This means that under these assumptions, we can write down the likelihood of the data as a function of the model parameters.
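For a linear cyclic SCM $X = BX + E$ the inverse map is $E = (I - B)X$, so the Jacobian in the lemma is the constant matrix $I - B$. A sketch for the two-variable example $X = 2Y + E_X$, $Y = 5X + E_Y$, under the added assumption that $E_X$ and $E_Y$ are independent standard normal variables:

```python
import math

# Linear cyclic SCM X = 2Y + E_X, Y = 5X + E_Y, with the ASSUMPTION that
# E_X and E_Y are independent standard normal variables.


def solve(e_x, e_y):
    """Unique solution X(E) of the structural equations for a given noise value."""
    x = -(e_x + 2.0 * e_y) / 9.0
    return x, 5.0 * x + e_y


def noise_from(x, y):
    """Inverse map E(X): the noise is read off directly from the structural equations."""
    return x - 2.0 * y, y - 5.0 * x


def std_normal_pdf(e):
    return math.exp(-0.5 * e * e) / math.sqrt(2.0 * math.pi)


# E = (I - B) X with B = [[0, 2], [5, 0]], so dE/dX = I - B and det(I - B) = 1 - 10 = -9.
ABS_DET_JACOBIAN = 9.0


def density(x, y):
    """Change of variables: p(X) = p_E(E(X)) * |det dE/dX|."""
    e_x, e_y = noise_from(x, y)
    return std_normal_pdf(e_x) * std_normal_pdf(e_y) * ABS_DET_JACOBIAN
```

A Riemann sum of `density` over a fine grid around the origin comes out at 1 up to discretization error; without the factor $|\det(I - B)| = 9$ it would come out at about $1/9$.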

Modeling Interventions with an SCM
Following [Sachs et al., 2005], we distinguish two types of interventions:
- abundance interventions, which alter the abundance of some compound;
- activity interventions, which alter the activity of some compound.
Here, we propose to model these interventions as follows:
- An abundance intervention on $X_i$ replaces the structural equation for $X_i$ with $X_i = \xi_i$ (standard "perfect" interventions);
- An activity intervention on $X_i$ replaces the causal mechanisms of its children $X_j$ (all $j$ with $i \in \mathrm{pa}(j)$) by other causal mechanisms $X_j = \tilde f_j(X_{\mathrm{pa}(j)}, E_j)$.
Example, no intervention:
$$\begin{aligned}
X_1 &= f_1(X_5, E_1), & X_2 &= f_2(E_2), & X_3 &= f_3(X_1, X_2, E_3),\\
X_4 &= f_4(X_2, E_4), & X_5 &= f_5(X_3, E_5), & X_6 &= f_6(X_3, X_4, E_6)
\end{aligned}$$
Graph: $X_5 \to X_1$, $X_1 \to X_3$, $X_2 \to X_3$, $X_2 \to X_4$, $X_3 \to X_5$, $X_3 \to X_6$, $X_4 \to X_6$, containing the cycle $X_1 \to X_3 \to X_5 \to X_1$.

Example, abundance intervention on $X_3$: the equation for $X_3$ is replaced by $X_3 = \xi_3$, and all other equations are unchanged:
$$\begin{aligned}
X_1 &= f_1(X_5, E_1), & X_2 &= f_2(E_2), & X_3 &= \xi_3,\\
X_4 &= f_4(X_2, E_4), & X_5 &= f_5(X_3, E_5), & X_6 &= f_6(X_3, X_4, E_6)
\end{aligned}$$
Graph: the edges $X_1 \to X_3$ and $X_2 \to X_3$ are removed.

Example, activity intervention on $X_3$: the mechanisms of the children $X_5$ and $X_6$ of $X_3$ are replaced, and all other equations are unchanged:
$$\begin{aligned}
X_1 &= f_1(X_5, E_1), & X_2 &= f_2(E_2), & X_3 &= f_3(X_1, X_2, E_3),\\
X_4 &= f_4(X_2, E_4), & X_5 &= \tilde f_5(X_3, E_5), & X_6 &= \tilde f_6(X_3, X_4, E_6)
\end{aligned}$$
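The difference between the two intervention types is easiest to see numerically. A sketch with hypothetical linear mechanisms for the six-variable example and, as replacement mechanisms $\tilde f_5$, $\tilde f_6$, children that simply stop responding to $X_3$ (both choices are our assumptions, not the talk's). Because of the cycle, the SCM is solved by fixed-point iteration for a fixed noise value.

```python
# Hypothetical linear mechanisms; the only cycle X1 -> X3 -> X5 -> X1 has gain
# 0.5 * 0.4 * 0.8 = 0.16 < 1, so fixed-point iteration converges here.
MECH = {
    "X1": lambda x, e: 0.5 * x["X5"] + e,
    "X2": lambda x, e: e,
    "X3": lambda x, e: 0.4 * x["X1"] + x["X2"] + e,
    "X4": lambda x, e: x["X2"] + e,
    "X5": lambda x, e: 0.8 * x["X3"] + e,
    "X6": lambda x, e: x["X3"] + x["X4"] + e,
}

# Hypothetical f~_5, f~_6 for an inhibited X3: the children ignore X3.
INHIBITED_X3 = {
    "X5": lambda x, e: e,
    "X6": lambda x, e: x["X4"] + e,
}


def abundance_intervention(mech, var, value):
    """Replace the structural equation of `var` by `var = value`."""
    new = dict(mech)
    new[var] = lambda x, e: value
    return new


def activity_intervention(mech, child_mechanisms):
    """Replace the mechanisms of the children of the targeted variable."""
    new = dict(mech)
    new.update(child_mechanisms)
    return new


def solve(mech, noise, tol=1e-12, max_iter=10_000):
    """Solve the (possibly cyclic) SCM for fixed noise by fixed-point iteration.
    In general a cyclic SCM need not have a unique solution."""
    x = {v: 0.0 for v in mech}
    for _ in range(max_iter):
        new = {v: mech[v](x, noise[v]) for v in mech}
        if max(abs(new[v] - x[v]) for v in mech) < tol:
            return new
        x = new
    raise RuntimeError("fixed-point iteration did not converge")


noise = {v: 1.0 for v in MECH}  # one hypothetical cell
observed = solve(MECH, noise)
abundance = solve(abundance_intervention(MECH, "X3", 0.0), noise)
activity = solve(activity_intervention(MECH, INHIBITED_X3), noise)
```

With all noise terms equal to 1, $X_3$ is about 3.10 without intervention, 0 under the abundance intervention, and 2.6 under the activity intervention: blocking the effect of $X_3$ on its children still changes $X_3$ itself, through the feedback loop $X_3 \to X_5 \to X_1 \to X_3$.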

Algorithm: Score-based approach. Use approximate Bayesian model selection in a multi-task learning setting to estimate the posterior probability of a putative causal graph $G$, given the data (and prior assumptions). Given a hypothetical causal graph $G$, numerically optimize the posterior with respect to the parameters. Employ the Laplace approximation to approximate the evidence (marginal likelihood) for that causal graph $G$. The number of possible causal graphs $G$ for 11 variables is 31603459396418917607425 (acyclic) and 1298074214633706907132624082305024 (cyclic), so exhaustive enumeration is infeasible; use local search to explore the posterior distribution over causal graphs. Use stability selection [Meinshausen et al., 2010] to identify stable causal relations.
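The two graph counts on this slide can be reproduced directly. The acyclic count is the number of labeled DAGs, computed here with Robinson's recurrence; the cyclic count matches all directed graphs on 11 labeled nodes without self-loops, i.e. $2^{n(n-1)}$. The function names are illustrative.

```python
from math import comb

def count_dags(n):
    """Number of labeled DAGs on n nodes (Robinson's recurrence)."""
    a = [1]  # a[0] = 1: the empty graph
    for m in range(1, n + 1):
        a.append(sum((-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
                     for k in range(1, m + 1)))
    return a[n]

def count_directed_graphs(n):
    """Directed graphs on n labeled nodes, cycles allowed, no self-loops:
    each of the n(n-1) ordered pairs is either an edge or not."""
    return 2 ** (n * (n - 1))

print(count_dags(11))             # 31603459396418917607425
print(count_directed_graphs(11))  # 1298074214633706907132624082305024
```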
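The Laplace step can be illustrated on a toy model where its accuracy is easy to check. The sketch below assumes a single Bayesian linear regression $y = Xw + \varepsilon$ with $\varepsilon \sim \mathcal N(0, \sigma^2 I)$ and prior $w \sim \mathcal N(0, \alpha^{-1} I)$; this is not the model used in the talk. The approximation is $\log p(y) \approx \log p(y \mid w^*) + \log p(w^*) + \tfrac d2 \log 2\pi - \tfrac12 \log\det A$, with $w^*$ the posterior mode and $A$ the Hessian of the negative log joint. For this Gaussian model the log joint is quadratic in $w$, so the approximation is exact, which gives a built-in check; the mode is computed in closed form here, whereas in general it would be found numerically.

```python
import numpy as np

def log_joint(w, X, y, sigma2, alpha):
    """log p(y | w) + log p(w) for the toy Gaussian regression model."""
    n, d = X.shape
    r = y - X @ w
    log_lik = -0.5 * n * np.log(2 * np.pi * sigma2) - 0.5 * (r @ r) / sigma2
    log_prior = -0.5 * d * np.log(2 * np.pi / alpha) - 0.5 * alpha * (w @ w)
    return log_lik + log_prior

def laplace_log_evidence(X, y, sigma2, alpha):
    d = X.shape[1]
    A = X.T @ X / sigma2 + alpha * np.eye(d)       # Hessian of -log joint
    w_map = np.linalg.solve(A, X.T @ y / sigma2)   # posterior mode
    _, logdet_A = np.linalg.slogdet(A)
    return log_joint(w_map, X, y, sigma2, alpha) + 0.5 * d * np.log(2 * np.pi) - 0.5 * logdet_A

def exact_log_evidence(X, y, sigma2, alpha):
    """Exact marginal likelihood: y ~ N(0, sigma2 I + X X^T / alpha)."""
    n = len(y)
    C = sigma2 * np.eye(n) + X @ X.T / alpha
    _, logdet_C = np.linalg.slogdet(C)
    return -0.5 * (n * np.log(2 * np.pi) + logdet_C + y @ np.linalg.solve(C, y))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=50)
print(laplace_log_evidence(X, y, 0.09, 1.0), exact_log_evidence(X, y, 0.09, 1.0))
# Model selection: dropping a relevant parent lowers the evidence.
print(laplace_log_evidence(X[:, [0, 2]], y, 0.09, 1.0))
```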
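The search and stability steps can be sketched generically. Below, `local_search` is plain greedy hill climbing over directed graphs (cycles allowed) that toggles one edge at a time, and `stability_selection` reruns a structure learner on random half-subsamples and keeps edges selected in at least a fraction `threshold` of the runs, in the spirit of [Meinshausen et al., 2010]. The score function, the number of subsamples and the threshold are hypothetical placeholders, not the talk's Laplace-approximated posterior or its settings.

```python
import itertools
import random

def local_search(n, score, max_edges=None, max_iter=1000):
    """Greedy hill climbing over directed graphs on n nodes (cycles allowed).
    A graph is a frozenset of (parent, child) pairs; each move toggles one edge."""
    graph = frozenset()
    best = score(graph)
    for _ in range(max_iter):
        improved = False
        for edge in itertools.permutations(range(n), 2):
            cand = graph ^ {edge}
            if max_edges is not None and len(cand) > max_edges:
                continue  # respect an edge budget (e.g. max. 17 edges)
            s = score(cand)
            if s > best:
                graph, best, improved = cand, s, True
        if not improved:
            break  # local optimum reached
    return graph

def stability_selection(data, learn, n_subsamples=50, threshold=0.6, seed=0):
    """Run `learn` on random half-subsamples of `data` (a list of samples) and
    keep the edges whose selection frequency is at least `threshold`."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_subsamples):
        sub = rng.sample(data, len(data) // 2)
        for edge in learn(sub):
            counts[edge] = counts.get(edge, 0) + 1
    freqs = {edge: c / n_subsamples for edge, c in counts.items()}
    return {edge for edge, f in freqs.items() if f >= threshold}, freqs

# Toy score (hypothetical): closeness to a fixed 3-cycle.
target = frozenset({(0, 1), (1, 2), (2, 0)})
print(local_search(3, lambda g: -len(g ^ target)))
```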

Comparison with ground truth (max. 17 edges, acyclic). For comparison with the consensus model and the model reconstructed by Sachs et al., we constrain the number of edges.
[Figure: three network panels over the 11 proteins PIP2, PLCg, PIP3, Mek, Erk, Raf, Akt, JNK, PKA, p38 and PKC: "Consensus", "Sachs et al.", "This work".]
Black: expected; blue: novel findings; red dashed: missing. Our acyclic, strongly regularized result deviates more from the “consensus” network. This actually seems to be good news!
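The color legend amounts to a set comparison between the learned edges and the reference edges. A minimal sketch, assuming directed edges are (cause, effect) name pairs; the example edges are purely illustrative and are not the graphs shown on the slide. In this convention a reversed edge counts both as novel and as missing.

```python
def compare_to_reference(learned, reference, max_edges=17):
    """Classify directed edges as in the figure legend."""
    learned, reference = set(learned), set(reference)
    if len(learned) > max_edges:
        raise ValueError(f"learned graph has more than {max_edges} edges")
    return {
        "expected": learned & reference,   # black
        "novel": learned - reference,      # blue
        "missing": reference - learned,    # red dashed
    }

# Illustrative edges only.
reference = {("Raf", "Mek"), ("Mek", "Erk"), ("PKC", "Raf")}
learned = {("Raf", "Mek"), ("Erk", "Mek"), ("Erk", "Akt")}
print(compare_to_reference(learned, reference))
```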

Comparison with ground truth (max. 17 edges, acyclic).
[Figure, left: network panel "This work" over the same 11 proteins. Right: bar chart "KS test w.r.t. observational data", y-axis from 0 to 500, x-axis the 11 proteins (Raf, Mek, PLCg, PIP2, PIP3, Erk, Akt, PKA, PKC, p38, JNK), legend: no intervention, AKT activity, PKC activity, PIP2 abundance, MEK activity, PIP2/PIP3 soft, PKC activity, PKA activity.]
Black: expected; blue: novel findings; red dashed: missing.
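The bar chart compares each variable's distribution under an intervention with its observational distribution using a Kolmogorov-Smirnov test. A minimal sketch of the two-sample KS statistic $D = \sup_x |F_a(x) - F_b(x)|$, computed per variable; the slide does not state which KS-derived quantity is plotted on its 0 to 500 axis, so this only computes $D$. The function names are illustrative.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic max_x |F_a(x) - F_b(x)|."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])  # the supremum is attained at a data point
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))

def ks_per_variable(observational, interventional, names):
    """KS statistic of each column (variable) against the observational data."""
    return {name: ks_statistic(observational[:, k], interventional[:, k])
            for k, name in enumerate(names)}

obs = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
intv = np.array([[0.0, 10.0], [1.0, 11.0], [2.0, 12.0]])
print(ks_per_variable(obs, intv, ["A", "B"]))
```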

Results (max. 17 edges, acyclic). Acyclic, strongly regularized results for different priors:
[Figure: three network panels over the 11 proteins, one per prior: linear Gaussian, nonlinear Gaussian, nonlinear non-Gaussian.]
Black: expected; blue: novel findings; red dashed: missing. Note: the results do not depend strongly on the prior.

Results (max. 17 edges, cyclic). Cyclic, strongly regularized results for different priors:
[Figure: three network panels over the 11 proteins, one per prior: linear Gaussian, nonlinear Gaussian, nonlinear non-Gaussian.]
Black: expected; blue: novel findings; red dashed: missing. Good news: our method reveals some likely feedback cycles. Bad news: the results depend more strongly on the prior (more data needed?).
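One simple way to quantify "dependence on the prior" is to count, for each pair of priors, the directed edges present in only one of the two learned graphs. This metric is an illustrative choice (a reversed edge counts twice), not necessarily how the talk assessed it, and the graphs below are hypothetical.

```python
from itertools import combinations

def edge_difference(g1, g2):
    """Number of directed edges present in exactly one of the two graphs."""
    return len(set(g1) ^ set(g2))

def prior_sensitivity(results):
    """Pairwise edge differences between graphs learned under different priors."""
    return {(p, q): edge_difference(results[p], results[q])
            for p, q in combinations(results, 2)}

results = {  # hypothetical graphs, one per prior
    "linear Gaussian": {(0, 1), (1, 2)},
    "nonlinear Gaussian": {(0, 1), (2, 1)},
    "nonlinear non-Gaussian": {(0, 1), (1, 2), (2, 0)},
}
print(prior_sensitivity(results))
```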

Discussion. Performing a proper causal analysis of this data is a challenging task: no time-series data are available, so we need to assume homeostasis; confounders could be present; feedback loops are expected to be present; most interventions change the activity instead of the abundance; assumptions about the specificity of interventions may be unrealistic; faithfulness violations are present. Main contributions: a more principled approach to learning the structure of (a)cyclic causal models from a combination of observational and interventional equilibrium data, and a natural way to model activity interventions.
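The homeostasis assumption says the measurements reflect an equilibrium of the underlying feedback system. A minimal sketch, assuming hypothetical discrete-time linear dynamics $x \leftarrow Bx + e$ whose fixed point is the equilibrium of the linear SCM $X = BX + E$: the iteration converges for every starting point exactly when the spectral radius of $B$ is below 1; otherwise no equilibrium is reached and equilibrium data would be meaningless. The dynamics and function names are assumptions for illustration only.

```python
import numpy as np

def spectral_radius(B):
    return float(np.max(np.abs(np.linalg.eigvals(B))))

def simulate_to_equilibrium(B, e, tol=1e-10, max_steps=10_000, bound=1e12):
    """Iterate x <- B x + e from x = 0. Returns the fixed point, or None if the
    iteration diverges or does not converge within max_steps."""
    x = np.zeros(len(e))
    for _ in range(max_steps):
        x_new = B @ x + e
        if np.max(np.abs(x_new)) > bound:
            return None                    # diverging: no equilibrium reached
        if np.max(np.abs(x_new - x)) < tol:
            return x_new                   # (approximately) at equilibrium
        x = x_new
    return None

stable = np.array([[0.0, 0.5], [0.5, 0.0]])    # feedback loop with gain 0.25
unstable = np.array([[0.0, 2.0], [2.0, 0.0]])  # feedback loop with gain 4
e = np.array([1.0, 1.0])
print(spectral_radius(stable), simulate_to_equilibrium(stable, e))
print(spectral_radius(unstable), simulate_to_equilibrium(unstable, e))
```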

Conclusions and future work. Conclusions: the results support the hypothesis that the underlying system contains feedback loops. The proposed method identifies a few likely feedback loops, but more data are probably necessary.

Causal Inference in case of feedback. Joris Mooij, j.m.mooij@uva.nl. Informatics Institute; Institute for Computing and Information Sciences. December 12th, 2013.
