

1. Non-parametric causal models. Robin J. Evans and Thomas S. Richardson (Oxford and Univ. of Washington). UAI Tutorial, 12th July 2015.

2. Structure. Part One: Causal DAGs with latent variables. Part Two: Statistical models arising from DAGs with latents.

3. Outline for Part One: intervention distributions; the general identification problem; Tian's ID algorithm; fixing (a generalization of marginalizing and conditioning); non-parametric constraints, a.k.a. Verma constraints.

4. Intervention distributions (I). Given a causal DAG $G$ with distribution
$$p(V) = \prod_{v \in V} p(v \mid \mathrm{pa}(v)),$$
we wish to compute an intervention distribution via the truncated factorization:
$$p(V \setminus X \mid \mathrm{do}(X = x)) = \prod_{v \in V \setminus X} p(v \mid \mathrm{pa}(v)).$$

5-6. Example. [Figure: the DAG with edges $L \to X$, $X \to M$, $M \to Y$, $L \to Y$, shown alongside the mutilated graph after intervening on $X$, in which the edge $L \to X$ is removed.]
$$p(X, L, M, Y) = p(L)\, p(X \mid L)\, p(M \mid X)\, p(Y \mid L, M)$$
$$p(L, M, Y \mid \mathrm{do}(X = \tilde x)) = p(L)\, p(M \mid \tilde x)\, p(Y \mid L, M)$$

7-8. Intervention distributions (II). Given a causal DAG $G$ with distribution
$$p(V) = \prod_{v \in V} p(v \mid \mathrm{pa}(v)),$$
we wish to compute an intervention distribution via the truncated factorization:
$$p(V \setminus X \mid \mathrm{do}(X = x)) = \prod_{v \in V \setminus X} p(v \mid \mathrm{pa}(v)).$$
Hence if we are interested in $Y \subset V \setminus X$ then we simply marginalize:
$$p(Y \mid \mathrm{do}(X = x)) = \sum_{w \in V \setminus (X \cup Y)} \prod_{v \in V \setminus X} p(v \mid \mathrm{pa}(v)).$$
This is the 'g-computation' formula of Robins (1986). Note: $p(Y \mid \mathrm{do}(X = x))$ is a sum over a product of terms $p(v \mid \mathrm{pa}(v))$.
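Since $p(Y \mid \mathrm{do}(X = x))$ is just a sum over a product of conditional tables, it is easy to evaluate directly for discrete variables. Below is a minimal Python sketch (not from the tutorial) for the running example; all variables are binary and the conditional probability tables are hypothetical numbers chosen for illustration.

```python
import itertools

# Hypothetical CPTs for the DAG  L -> X, X -> M, L -> Y <- M  (all binary).
p_L = [0.6, 0.4]                            # p(L = l)
p_M_given_X = [[0.8, 0.3],                  # p(M = m | X = x): row m, column x
               [0.2, 0.7]]
p_Y1_given_LM = {(l, m): 0.1 + 0.2 * l + 0.5 * m    # p(Y = 1 | L = l, M = m)
                 for l in (0, 1) for m in (0, 1)}

def p_Y1_do_X(x):
    """g-computation: p(Y=1 | do(X=x)) = sum_{l,m} p(l) p(m|x) p(Y=1 | l,m)."""
    return sum(p_L[l] * p_M_given_X[m][x] * p_Y1_given_LM[(l, m)]
               for l, m in itertools.product((0, 1), repeat=2))

print(p_Y1_do_X(0), p_Y1_do_X(1))
```

Note that $p(X \mid L)$ never enters: the truncated factorization drops exactly that factor.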

9-10. Example. [Figure: the same DAG and its mutilated graph after intervening on $X$.]
$$p(X, L, M, Y) = p(L)\, p(X \mid L)\, p(M \mid X)\, p(Y \mid L, M)$$
$$p(L, M, Y \mid \mathrm{do}(X = \tilde x)) = p(L)\, p(M \mid \tilde x)\, p(Y \mid L, M)$$
$$p(Y \mid \mathrm{do}(X = \tilde x)) = \sum_{l,m} p(L = l)\, p(M = m \mid \tilde x)\, p(Y \mid L = l, M = m)$$
Note that $p(Y \mid \mathrm{do}(X = \tilde x)) \neq p(Y \mid X = \tilde x)$.
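Continuing the sketch above (and reusing its tables), we can check numerically that intervening and conditioning differ once $X$ depends on $L$; the table p_X1_given_L is again hypothetical.

```python
# Hypothetical p(X = 1 | L = l): X depends on L, so L confounds X and Y.
p_X1_given_L = [0.2, 0.9]

def p_Y1_given_X(x):
    """p(Y = 1 | X = x), computed by summing the joint over L and M."""
    num = den = 0.0
    for l, m in itertools.product((0, 1), repeat=2):
        p_x = p_X1_given_L[l] if x == 1 else 1 - p_X1_given_L[l]
        joint = p_L[l] * p_x * p_M_given_X[m][x]
        den += joint
        num += joint * p_Y1_given_LM[(l, m)]
    return num / den

print(p_Y1_given_X(1), p_Y1_do_X(1))   # 0.6 vs 0.53: conditioning != intervening
```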

11-16. Example: no effect of $M$ on $Y$. [Figure: the DAG with the edge $M \to Y$ removed, so the edges are $L \to X$, $X \to M$, $L \to Y$, shown alongside its mutilated graph after intervening on $X$.]
$$p(X, L, M, Y) = p(L)\, p(X \mid L)\, p(M \mid X)\, p(Y \mid L)$$
$$p(L, M, Y \mid \mathrm{do}(X = \tilde x)) = p(L)\, p(M \mid \tilde x)\, p(Y \mid L)$$
$$p(Y \mid \mathrm{do}(X = \tilde x)) = \sum_{l,m} p(L = l)\, p(M = m \mid \tilde x)\, p(Y \mid L = l) = \sum_{l} p(L = l)\, p(Y \mid L = l) = p(Y),$$
yet $p(Y) \neq p(Y \mid X = \tilde x)$ since $X \not\perp\!\!\!\perp Y$. 'Correlation is not Causation'.
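As a quick numerical check using the sketch above, making $p(Y \mid L, M)$ independent of $M$ leaves the interventional distribution constant in $x$, exactly as on the slide, even though $X$ and $Y$ remain dependent through $L$.

```python
# Remove the dependence of Y on M (hypothetical CPT); p_Y1_do_X reads this global.
p_Y1_given_LM = {(l, m): 0.1 + 0.2 * l for l in (0, 1) for m in (0, 1)}
print(p_Y1_do_X(0), p_Y1_do_X(1))    # both 0.18: do(X) has no effect on Y
print(p_Y1_given_X(1))               # 0.25: yet Y is still correlated with X
```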

17-20. Example with $M$ unobserved. [Figure: the same DAG, with $M$ unobserved.]
$$\begin{aligned}
p(Y \mid \mathrm{do}(X = \tilde x)) &= \sum_{l,m} p(L = l)\, p(M = m \mid \tilde x)\, p(Y \mid L = l, M = m) \\
&= \sum_{l,m} p(L = l)\, p(M = m \mid \tilde x, L = l)\, p(Y \mid L = l, M = m, X = \tilde x) \\
&= \sum_{l,m} p(L = l)\, p(Y, M = m \mid L = l, X = \tilde x) \\
&= \sum_{l} p(L = l)\, p(Y \mid L = l, X = \tilde x),
\end{aligned}$$
where we have used that $M \perp\!\!\!\perp L \mid X$ and $Y \perp\!\!\!\perp X \mid L, M$. Hence we can find $p(Y \mid \mathrm{do}(X = \tilde x))$ even if $M$ is not observed. This is an example of the 'back-door formula'.
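A minimal sketch of the back-door adjustment for this example, assuming only discrete data on $(L, X, Y)$; the tables are hypothetical stand-ins for quantities estimated from the observed joint.

```python
# Back-door formula for L -> X -> M -> Y, L -> Y with M unobserved:
#   p(Y = 1 | do(X = x)) = sum_l p(l) p(Y = 1 | L = l, X = x).
p_L = [0.6, 0.4]                                   # p(L = l)
p_Y1_given_LX = {(l, x): 0.1 + 0.3 * l + 0.4 * x   # p(Y = 1 | L = l, X = x)
                 for l in (0, 1) for x in (0, 1)}

def backdoor_p_Y1_do(x):
    """Adjust for the back-door variable L; M never appears."""
    return sum(p_L[l] * p_Y1_given_LX[(l, x)] for l in (0, 1))

print(backdoor_p_Y1_do(0), backdoor_p_Y1_do(1))
```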

21-25. Example with $L$ unobserved. [Figure: the same DAG, with $L$ unobserved.]
$$\begin{aligned}
p(Y \mid \mathrm{do}(X = \tilde x)) &= \sum_{m} p(M = m \mid \mathrm{do}(X = \tilde x))\, p(Y \mid \mathrm{do}(M = m)) \\
&= \sum_{m} p(M = m \mid X = \tilde x)\, p(Y \mid \mathrm{do}(M = m)) \\
&= \sum_{m} p(M = m \mid X = \tilde x) \left( \sum_{x^*} p(X = x^*)\, p(Y \mid M = m, X = x^*) \right).
\end{aligned}$$
Hence we can find $p(Y \mid \mathrm{do}(X = \tilde x))$ even if $L$ is not observed. This is an example of the 'front-door formula'.
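A matching sketch of the front-door formula, assuming discrete data on $(X, M, Y)$; again the tables are hypothetical stand-ins for quantities estimated from $p(X, M, Y)$.

```python
# Front-door formula for X -> M -> Y with L unobserved:
#   p(Y | do(x)) = sum_m p(m | x) sum_{x*} p(x*) p(Y | M = m, X = x*).
p_X = [0.5, 0.5]                                   # p(X = x)
p_M1_given_X = [0.3, 0.8]                          # p(M = 1 | X = x)
p_Y1_given_MX = {(m, x): 0.2 + 0.5 * m + 0.1 * x   # p(Y = 1 | M = m, X = x)
                 for m in (0, 1) for x in (0, 1)}

def frontdoor_p_Y1_do(x):
    """Outer sum over the mediator M, inner adjustment over X = x*."""
    total = 0.0
    for m in (0, 1):
        p_m = p_M1_given_X[x] if m == 1 else 1 - p_M1_given_X[x]
        inner = sum(p_X[xs] * p_Y1_given_MX[(m, xs)] for xs in (0, 1))
        total += p_m * inner
    return total

print(frontdoor_p_Y1_do(0), frontdoor_p_Y1_do(1))
```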

26-27. But with both $L$ and $M$ unobserved... [Figure: the same DAG, with both $L$ and $M$ unobserved.] ...we are out of luck! Given $p(X, Y)$, absent further assumptions we cannot distinguish: [Figure: the DAG $X \leftarrow L \rightarrow Y$ versus the DAG $X \rightarrow M \rightarrow Y$.] That is, a purely confounded model and a purely causal one are indistinguishable from the observed joint alone.

28-29. General identification question. Given: a latent DAG $G(O \cup H)$, where $O$ are observed and $H$ are hidden, and disjoint subsets $X, Y \subseteq O$. Q: Is $p(Y \mid \mathrm{do}(X))$ identified given $p(O)$? A: Provide either an identifying formula that is a function of $p(O)$, or report that $p(Y \mid \mathrm{do}(X))$ is not identified.

30-33. Latent projection. We can preserve the conditional independences and the causal coherence of a model with latents by using paths. For a DAG $G$ on vertices $V = O \,\dot\cup\, H$, define the latent projection (Verma and Pearl, 1992) as follows: (i) whenever there is a directed path $x \to h_1 \to \cdots \to h_k \to y$ with every $h_i \in H$, add the edge $x \to y$; (ii) whenever there is a path $x \leftarrow h_1 \cdots h_k \to y$ on which every non-endpoint is a non-collider in $H$, add the edge $x \leftrightarrow y$; (iii) then remove all latent vertices $H$ from the graph. (A code sketch of these rules follows below.)
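A minimal sketch of these two rules, assuming the DAG is stored as a dictionary mapping each vertex to its set of children; latent_project and reach are hypothetical names, not from the tutorial. It relies on the fact that a path $x \leftarrow \cdots \to y$ through non-colliding latents has a single latent source with directed, all-latent paths to both endpoints.

```python
from itertools import combinations

def latent_project(dag, observed):
    """Latent projection (Verma and Pearl, 1992) of a DAG onto `observed`.

    dag: dict mapping each vertex to the set of its children.
    Returns (directed, bidirected) edge sets over the observed vertices.
    """
    hidden = set(dag) - set(observed)

    def reach(v):
        # Vertices reachable from v by directed paths with all intermediates hidden.
        out, stack = set(), [v]
        while stack:
            for w in dag[stack.pop()]:
                if w not in out:
                    out.add(w)
                    if w in hidden:
                        stack.append(w)
        return out

    directed = {(a, b) for a in observed for b in reach(a) if b in observed}
    bidirected = {frozenset((a, b)) for h in hidden
                  for a, b in combinations(sorted(reach(h) & set(observed)), 2)}
    return directed, bidirected

# Example: X <- U -> Y with U hidden projects to the single edge X <-> Y.
print(latent_project({"U": {"X", "Y"}, "X": set(), "Y": set()}, {"X", "Y"}))
```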

34-36. ADMGs. [Figure: a DAG on $\{x, z, u, w, t, y\}$ and, after projecting out the latent vertices $u$ and $w$, its latent projection on $\{x, z, t, y\}$.] Latent projection leads to an acyclic directed mixed graph (ADMG). We can read off independences with d-/m-separation, and the projection preserves the causal structure (Verma and Pearl, 1992).
