Inferring causality from observations
Dominik Janzing¹ and Sebastian Weichwald²
1) Amazon Development Center, Tübingen, Germany 2) CoCaLa, University of Copenhagen, Denmark
September 2019
Online material
Peters, Janzing, Schölkopf:
https://mitpress.mit.edu/books/elements-causal-inference
https://ei.is.tuebingen.mpg.de/publications/janzing14
http://mlss.tuebingen.mpg.de/2013/speakers.html
https://stat.mit.edu/news/four-lectures-causality/
1 Motivation: correlation versus causation
2 Formalizing causality: causal DAGs, functional causal
3 Strong assumptions that enable causal discovery:
4 Macroscopic and microscopic causal models: consistent
5 Causal inference in time series: Granger causality and its
6 Causal relations among individual objects: algorithmic
Hesselmar et al., Pediatrics, March 2015, Vol. 135, Issue 3
Image source: Wikipedia 'Geschirrspülmaschine', author Christian Giersing
[Figure: three candidate causal graphs, labeled 1) over X, Y; 2) over X, Z, Y; 3) over X, Y]
$X_j = f_j(PA_j, E_j)$, where $PA_j$ denotes the parents of $X_j$
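The structural assignment above can be turned into a sampler by evaluating each $f_j$ in a causal order. The following is a minimal sketch, assuming a hypothetical three-variable graph with linear mechanisms and Gaussian noise; the graph, the coefficients, and all names are illustrative and not taken from the slides.

```python
import random

# Hypothetical structural assignments X_j = f_j(PA_j, E_j).
# The graph and the coefficients are illustrative assumptions.
PARENTS = {"X1": [], "X2": ["X1"], "X3": ["X1", "X2"]}
MECHANISMS = {
    "X1": lambda pa, e: e,
    "X2": lambda pa, e: 2.0 * pa["X1"] + e,
    "X3": lambda pa, e: pa["X1"] - 0.5 * pa["X2"] + e,
}

def sample(rng, order=("X1", "X2", "X3")):
    """Draw one joint sample by evaluating each f_j in a causal (topological) order."""
    values = {}
    for name in order:
        noise = rng.gauss(0.0, 1.0)  # E_j: one independent noise term per variable
        pa = {p: values[p] for p in PARENTS[name]}
        values[name] = MECHANISMS[name](pa, noise)
    return values

rng = random.Random(0)
draws = [sample(rng) for _ in range(5)]
for d in draws:
    print({k: round(v, 3) for k, v in d.items()})
```

Because every variable is computed only from its parents and its own noise term, replacing one entry of `MECHANISMS` leaves all other mechanisms untouched.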
[Figure: node $X_j$ with its parents, its descendants, and its non-descendants]
$\prod_j P(X_j \mid PA_j)$
[Figure: family tree with nodes Person X, Father, Mother, Brother, Grandmother]
[Figure: graph over X, Y, Z, W]
[Figure: scatter plot of y against x; both axes range from −1.0 to 1.0]
(Pearl 1988)
[Figure: graph over X, Y, Z, U, then extended with W, then with V]
= X or Y
[Equations garbled in extraction; only index fragments ($i$, $n$, $k$) survive]
[Figure: three candidate causal graphs, labeled 1) over X, Y; 2) over X, Z, Y; 3) over X, Y]
1 interventional and observational probabilities coincide (seeing
2 intervening on x does not change y
3 intervening on x does not change y
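The difference between seeing and doing in the three cases can be checked by simulation. The sketch below assumes the arrow directions 1) X → Y, 2) X ← Z → Y, 3) Y → X (the arrows are not preserved in the extracted slide text), binary variables, and made-up probabilities; `simulate` and `p_y1_given_x1` are hypothetical helper names.

```python
import random

def simulate(structure, n, rng, do_x=None):
    """Sample n (x, y) pairs; do_x, if given, overrides the mechanism of X."""
    pairs = []
    for _ in range(n):
        if structure == 1:      # assumed: X -> Y
            x = do_x if do_x is not None else int(rng.random() < 0.5)
            y = int(rng.random() < (0.9 if x else 0.1))
        elif structure == 2:    # assumed: X <- Z -> Y
            z = int(rng.random() < 0.5)
            x = do_x if do_x is not None else int(rng.random() < (0.9 if z else 0.1))
            y = int(rng.random() < (0.9 if z else 0.1))
        else:                   # assumed: Y -> X
            y = int(rng.random() < 0.5)
            x = do_x if do_x is not None else int(rng.random() < (0.9 if y else 0.1))
        pairs.append((x, y))
    return pairs

def p_y1_given_x1(pairs):
    """Relative frequency of y = 1 among the samples with x = 1."""
    ys = [y for x, y in pairs if x == 1]
    return sum(ys) / len(ys)

rng = random.Random(1)
results = {}
for s in (1, 2, 3):
    seeing = p_y1_given_x1(simulate(s, 20000, rng))          # observational
    doing = p_y1_given_x1(simulate(s, 20000, rng, do_x=1))   # interventional
    results[s] = (seeing, doing)
    print(f"structure {s}: P(y=1|x=1) = {seeing:.2f}, P(y=1|do(x=1)) = {doing:.2f}")
```

Under these assumptions the two estimates should agree up to sampling noise in structure 1, while in structures 2 and 3 the interventional estimate should stay near the marginal value 0.5, since setting x leaves y unchanged.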
(PW Holland, Statistics and Causal Inference. Journal of the American Statistical Association, 1986)
⊥ S
[Figure: six graphs over X, Y, Z]
$\prod_{j=1}^{n} P(X_j \mid PA_j)$ represent independent
[Figure: plot with axes Time and Position; graph over X and Y]
[Figure: two graphs over X and Y]
causal learning: predict effect from cause
anticausal learning: predict cause from effect
Sch¨
Spirtes, Glymour, Scheines, 1993
(S Weichwald et al., NeuroImage, 2015; M Grosse-Wentrup et al., NeuroImage, 2016; S Weichwald et al., IEEE ST SigProc, 2016)
S, X, Y
$S \not\perp X \Rightarrow$ existence of a path between S and X without a collider
$S \not\perp Y \Rightarrow$ existence of a path between S and Y without a collider
$S \perp Y \mid X \Rightarrow$ all paths between S and Y are blocked by X
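The implications above restrict which graphs over S, X, Y are compatible with the observed (in)dependences. The following is a minimal sketch, assuming the Markov condition and faithfulness (so that independence statements can be read as d-separation statements) and two hypothetical candidate graphs; the functions are illustrative helpers that enumerate paths by brute force and are meant for small graphs only.

```python
def descendants(edges, node):
    """All nodes reachable from `node` along directed edges."""
    found, stack = set(), [node]
    while stack:
        cur = stack.pop()
        for a, b in edges:
            if a == cur and b not in found:
                found.add(b)
                stack.append(b)
    return found

def simple_paths(edges, start, goal, path=None):
    """Yield all simple paths between start and goal, ignoring edge direction."""
    path = path or [start]
    if path[-1] == goal:
        yield list(path)
        return
    for a, b in edges:
        for u, v in ((a, b), (b, a)):
            if u == path[-1] and v not in path:
                yield from simple_paths(edges, start, goal, path + [v])

def blocked(edges, path, cond):
    """A path is blocked if it has a conditioned non-collider or an unconditioned collider."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, mid) in edges and (nxt, mid) in edges
        if collider:
            if mid not in cond and not (descendants(edges, mid) & cond):
                return True
        elif mid in cond:
            return True
    return False

def d_separated(edges, a, b, cond=frozenset()):
    """True if every path between a and b is blocked given the set cond."""
    cond = set(cond)
    return all(blocked(edges, p, cond) for p in simple_paths(edges, a, b))

chain = {("S", "X"), ("X", "Y")}      # hypothetical candidate: S -> X -> Y
collider = {("S", "X"), ("Y", "X")}   # hypothetical candidate: S -> X <- Y

for name, g in (("chain", chain), ("collider", collider)):
    print(name,
          "S,X connected:", not d_separated(g, "S", "X"),
          "| S,Y connected:", not d_separated(g, "S", "Y"),
          "| S,Y separated given X:", d_separated(g, "S", "Y", {"X"}))
```

The chain satisfies all three conditions. The collider graph has no collider-free path between S and Y, so it cannot explain a dependence between S and Y.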
(Bach, Symmonds, Barnes, and Dolan. Journal of Neuroscience, 2017)
Hoyer, Janzing, Mooij, Peters,Sch¨
Peters, Mooij, Janzing, Sch¨
[Figure: accuracy (%) plotted against decision rate (%), both from 10 to 100; curve labeled AN]
$\frac{\partial^2}{\partial y^2} \log p(y)$ can be computed from $p(x \mid y)$ knowing $f'(x_0)$
Janzing, Steudel: Justifying additive noise-based causal inference via algorithmic information theory, OSID (2010)
Daniusis, Janzing,... UAI 2010, Janzing et al. AI 2012
[Figure: function f(x) relating x and y, with the densities p(x) and p(y)]
(Shimizu et al. (2006))
1 infer (Id − B) up to scaling and permutation via ICA
2 resolve scaling and permutation to obtain B
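Step 2 can be illustrated without running ICA. The sketch below assumes that step 1 has already produced a matrix W equal to (Id − B) with its rows permuted and rescaled, and that B is acyclic, so that exactly one row order gives a diagonal without zeros. The brute-force search over permutations and the name `resolve_permutation_and_scaling` are illustrative choices for small dimensions, not the authors' implementation.

```python
from itertools import permutations

def resolve_permutation_and_scaling(W):
    """Recover B from W = P D (Id - B), with P a row permutation and D a diagonal scaling."""
    n = len(W)

    def cost(order):
        # Penalize small diagonal entries; a zero on the diagonal is ruled out entirely.
        return sum(1.0 / abs(W[r][i]) if W[r][i] != 0 else float("inf")
                   for i, r in enumerate(order))

    # Choose the row order whose diagonal is farthest from zero (brute force over n! orders).
    best = min(permutations(range(n)), key=cost)
    # Undo the scaling: divide each row by its diagonal entry so the diagonal becomes 1.
    normalized = [[W[r][j] / W[r][i] for j in range(n)] for i, r in enumerate(best)]
    # normalized now equals Id - B, hence B = Id - normalized.
    return [[(1.0 if i == j else 0.0) - normalized[i][j] for j in range(n)]
            for i in range(n)]

# Hypothetical ground truth: X1 -> X2, X1 -> X3, X2 -> X3.
B_true = [[0.0, 0.0, 0.0],
          [2.0, 0.0, 0.0],
          [-1.0, 0.5, 0.0]]
# (Id - B_true) with rows scaled by (3, -2, 0.5) and then reordered, as ICA might return it.
W = [[0.5, -0.25, 0.5],
     [3.0, 0.0, 0.0],
     [4.0, -2.0, 0.0]]

B_hat = resolve_permutation_and_scaling(W)
for row in B_hat:
    print(row)
```

With estimated rather than exact W, no entry is exactly zero, which is why the sketch minimizes the sum of inverse absolute diagonal entries instead of testing for zeros.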
[Figure: scatter plot of y against x; both axes range from −2 to 2]
(Pfister∗, Weichwald∗, et al. (2018) arXiv:1806.01094)
[Figure: variables X1, ..., X6 and the transformed variables τ1(X), τ2(X), τ3(X), related by a map τ]
$P_\theta$
$P_\theta^{\emptyset},\ P_\theta^{do(i_1)},\ P_\theta^{do(i_2)},\ P_\theta^{do(i_3)}$
$P_X^{\emptyset},\ P_X^{do(A=0)},\ P_X^{do(A=0,C=0)},\ P_X^{do(C=0)}$
X
τ(X)
[Figure: time-unrolled variables $A_t, B_t, C_t$ through $A_{t+2}, B_{t+2}, C_{t+2}$ and variables A, B, C, related by τ; models labeled $M_X$ and $M_Y$]
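$P_{\tau(X)}^{do(i)} = P_Y^{do(\omega(i))}$
[Diagram: do(i) and do(j) take $P_X$ to $P_X^{do(i)}$ and $P_X^{do(j)}$; do(ω(i)) and do(ω(j)) take $P_Y$ to $P_Y^{do(\omega(i))}$ and $P_Y^{do(\omega(j))}$; τ links each distribution of X to the corresponding distribution of Y]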
[Figure: dynamic model $M_X$ over the time series $X_t^1$, $X_t^2$ under intervention do(i), mapped by τ to the stationary model $M_Y$ over $Y_1$, $Y_2$ under intervention do(ω(i))]
(not by C Granger)
(J Peters et al. Causal discovery on time series using restricted structural equation models. NIPS, 2013)
(by C Granger)
(CWJ Granger, Investigating Causal Relations by Econometric Models and Cross-spectral Methods. Econometrica, 1969)
(N Ay and D Polani, Information flows in causal networks. Advances in Complex Systems, 2008)
[Figure: time-unrolled graph over $X_t, Y_t, Z_t$ through $X_{t+4}, Y_{t+4}, Z_{t+4}$]
(Kolmogorov 1965, Chaitin 1966, Solomonoff 1964)
Chaitin, Gacs
+
[Figure: three candidate causal graphs, labeled 1) over x, y; 2) over x, z, y; 3) over x, y]
DJ, Sch¨
[Equation garbled in extraction; only index fragments ($j$, $n$) and a "+" survive]