On estimation of functional causal models: Post-nonlinear causal model as an example
Kun Zhang, Zhikun Wang, Bernhard Schölkopf
Dept. Empirical Inference
Max Planck Institute for Intelligent Systems Tübingen, Germany
• Causal discovery: identify causal …
  - statistical data ⇒ causal structure
• In the past decades, under certain …
  - causal Markov assumption
  - faithfulness …
X1     X2
2.1    2.0
3.1    4.2
2.3    -0.6
1.3    2.2
...    ...
[Figure: equivalence class; faithfulness]
… model, i.e., asymmetry in X and Y
… form!
independence between the assumed cause and noise
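Without constraints on f, an independent noise term can always be constructed: $\tilde{x} = \mathrm{cdf}(x)$, so $\tilde{x} \sim U(0,1)$; $e = \mathrm{cdf}(y \mid \tilde{x}) = \int_{-\infty}^{y} p_{\tilde{x},y}(\tilde{x}, t)\,dt$ (the joint density equals the conditional density because $\tilde{x}$ is uniform on (0,1)). Then $(x, y) \leftrightarrow (\tilde{x}, e)$, with $E \perp\!\!\!\perp X$.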
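A small simulation can make the construction concrete. The sketch below assumes a hypothetical heteroscedastic process, Y | X = x ~ N(x², (1 + |x|)²), chosen only because its conditional CDF is available in closed form; in practice that CDF would have to be estimated. It computes x̃ from ranks (an empirical CDF) and e as the conditional CDF, then checks that e shows no dependence on X, while the plain additive residual y − x² still does. Because the construction applies to any joint distribution, it can be carried out in either direction, which is why the function class must be constrained.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
T = 20000
x = rng.normal(size=T)
scale = 1.0 + np.abs(x)                    # hypothetical: noise scale depends on x
y = x**2 + scale * rng.normal(size=T)      # hypothetical data-generating process

# x_tilde = cdf(x): the empirical CDF via ranks is (approximately) U(0, 1)
x_tilde = (np.argsort(np.argsort(x)) + 0.5) / T

# e = cdf(y | x_tilde). Since x_tilde is a monotone function of x, this is the
# conditional CDF of Y given X, here assumed known: Phi((y - x^2) / (1 + |x|))
phi = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))
e = phi((y - x**2) / scale)

# Additive residual for comparison: its spread still depends on x
r = y - x**2

def corr(u, v):
    """Pearson correlation of two 1-D arrays."""
    return np.corrcoef(u, v)[0, 1]

print("corr(e, x_tilde)      :", corr(e, x_tilde))
print("corr((e - 0.5)^2, |x|):", corr((e - 0.5) ** 2, np.abs(x)))
print("corr(r^2, |x|)        :", corr(r**2, np.abs(x)))
```

Correlations are only a crude proxy for independence; a kernel test such as HSIC would be the more usual check.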
$Y = f(X, E; \theta_1)$; $E \perp\!\!\!\perp X$, $E \sim p(E; \theta_2)$, $f \in \mathcal{F}$ (appropriately constrained)
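Maximizing
$l_{X\to Y}(\theta) = \sum_{i=1}^{T} \log p_F(x_i, y_i) = \sum_{i=1}^{T} \log p(X = x_i) + \sum_{i=1}^{T} \log p(E = \hat{e}_i; \theta_2) - \sum_{i=1}^{T} \log \left| \frac{\partial f}{\partial E} \right|_{E = \hat{e}_i}$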
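For the post-nonlinear case Y = f2(f1(X) + E) this likelihood can be evaluated directly when f2 is invertible. A minimal sketch, assuming f2 is parametrized through its inverse g = f2⁻¹ and its derivative g′ (the names `pnl_log_likelihood`, `log_normal` and their arguments are hypothetical, not from the slides): the noise is recovered as ê_i = g(y_i) − f1(x_i), and because ∂f/∂E = f2′(f1(x) + e) = 1/g′(y), the term −log|∂f/∂E| becomes +log|g′(y_i)|. The check at the end uses a linear-Gaussian special case, where the result must match the bivariate normal log density.

```python
import numpy as np

def log_normal(z, sd=1.0):
    """Log density of N(0, sd^2) evaluated at z."""
    return -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * (z / sd) ** 2

def pnl_log_likelihood(x, y, f1, g, g_prime, log_px, log_pe):
    """Sketch of l_{X->Y} for Y = f2(f1(X) + E), with f2 given via g = f2^{-1}.

    e_hat = g(y) - f1(x); since dY/dE = 1 / g'(y), the term
    -log|df/dE| at E = e_hat equals +log|g'(y)|.
    """
    e_hat = g(y) - f1(x)
    return (np.sum(log_px(x))
            + np.sum(log_pe(e_hat))
            + np.sum(np.log(np.abs(g_prime(y)))))

# Sanity check on a linear-Gaussian case: f1(x) = a*x, f2 = identity,
# X ~ N(0, 1), E ~ N(0, s^2), so (X, Y) is bivariate normal.
rng = np.random.default_rng(1)
a, s, T = 1.5, 0.5, 500
x = rng.normal(size=T)
y = a * x + s * rng.normal(size=T)

l_pnl = pnl_log_likelihood(
    x, y,
    f1=lambda u: a * u,
    g=lambda v: v,
    g_prime=np.ones_like,          # g'(y) = 1, so the Jacobian term vanishes
    log_px=log_normal,
    log_pe=lambda e: log_normal(e, s),
)

# Closed-form bivariate normal log density for comparison
cov = np.array([[1.0, a], [a, a**2 + s**2]])
xy = np.stack([x, y], axis=1)
quad = np.einsum("ij,jk,ik->i", xy, np.linalg.inv(cov), xy)
l_joint = np.sum(-np.log(2 * np.pi) - 0.5 * np.log(np.linalg.det(cov)) - 0.5 * quad)
print(np.isclose(l_pnl, l_joint))  # True
```

In an actual estimator f1, g and the noise density would be learned (for instance with an MLP or a warped GP, as in the PNL variants compared later); they are fixed here to keep the sketch small.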
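Minimizing
$I(X, \hat{E}; \theta) = -\frac{1}{T}\sum_{i=1}^{T} \log p(X = x_i) - \frac{1}{T}\sum_{i=1}^{T} \log p(\hat{E} = \hat{e}_i; \theta_2) + \frac{1}{T}\sum_{i=1}^{T} \log p(X = x_i, Y = y_i) + \frac{1}{T}\sum_{i=1}^{T} \log \left| \frac{\partial f}{\partial E} \right|_{E = \hat{e}_i}$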
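$\frac{1}{T} l_{X\to Y}(\theta) = \frac{1}{T}\sum_{i=1}^{T} \log p(X = x_i, Y = y_i) - I(X, \hat{E}; \theta)$.
Since the first term on the right-hand side does not depend on $\theta$, maximizing the likelihood is equivalent to minimizing $I(X, \hat{E}; \theta)$, the dependence between the assumed cause and the estimated noise.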
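The identity can be checked numerically when the true joint density is known. The sketch below assumes simulated linear-Gaussian data, X ~ N(0, 1) and Y = aX + E with E ~ N(0, s²), a hypothetical additive model Y = bX + E with the noise scale held at its true value, and p(X) taken as the true N(0, 1). With these assumptions the expectation of I is a Kullback-Leibler divergence, so it is nonnegative; it is exactly zero at the true slope and positive for a wrong one, and (1/T)·l plus I always equals the θ-free term. The helper `avg_loglik_and_I` is hypothetical.

```python
import numpy as np

def log_normal(z, sd=1.0):
    """Log density of N(0, sd^2) at z."""
    return -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * (z / sd) ** 2

# Hypothetical data with a known joint density
rng = np.random.default_rng(2)
a, s, T = 1.5, 0.5, 2000
x = rng.normal(size=T)
y = a * x + s * rng.normal(size=T)

# theta-free term log p(X = x_i, Y = y_i): computable only because the data are simulated
cov = np.array([[1.0, a], [a, a**2 + s**2]])
xy = np.stack([x, y], axis=1)
log_pxy = (-np.log(2 * np.pi) - 0.5 * np.log(np.linalg.det(cov))
           - 0.5 * np.einsum("ij,jk,ik->i", xy, np.linalg.inv(cov), xy))

def avg_loglik_and_I(b, sigma):
    """(1/T) l_{X->Y} and I(X, E_hat; theta) for Y = b*X + E, E ~ N(0, sigma^2)."""
    e_hat = y - b * x
    log_jac = np.zeros(T)  # additive noise: df/dE = 1, so log|df/dE| = 0
    avg_l = np.mean(log_normal(x) + log_normal(e_hat, sigma) - log_jac)
    I = np.mean(-log_normal(x) - log_normal(e_hat, sigma) + log_pxy + log_jac)
    return avg_l, I

l_true, I_true = avg_loglik_and_I(a, s)          # correct slope
l_wrong, I_wrong = avg_loglik_and_I(a + 0.5, s)  # misspecified slope
print(I_true, I_wrong)
print(np.isclose(l_wrong, np.mean(log_pxy) - I_wrong))  # True: holds for any theta
```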
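$J_{X\to Y} = \sum_{i=1}^{T} \log p(E = \hat{e}_i) - \sum_{i=1}^{T} \log \left| \frac{\partial f}{\partial E} \right|_{E = \hat{e}_i}$
(this is $l_{X\to Y}(\theta)$ without the term $\sum_{i=1}^{T} \log p(X = x_i)$, which does not depend on $\theta$)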
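The Jacobian term in J is what keeps its maximization from degenerating. A hedged illustration with a made-up two-parameter PNL family: g(y) = c·log(y) plays the role of f2⁻¹, f1(x) = a·x, and E ~ N(0, 1). Data are simulated from log Y = 2X + 0.5·N(0, 1), i.e. 2·log Y = 4X + unit-variance noise, so the intended answer is c = 2, a = 4. A grid search over (c, a) recovers it; dropping the Jacobian term lets the criterion shrink the noise simply by shrinking c, so the maximizer runs to the edge of the grid. The family, grid and the function `J` are assumptions for illustration only.

```python
import numpy as np

# Hypothetical PNL data: f1(x) = 2x, f2 = exp, E ~ N(0, 0.5^2)
rng = np.random.default_rng(3)
T = 1000
x = rng.uniform(-1.0, 1.0, size=T)
y = np.exp(2.0 * x + 0.5 * rng.normal(size=T))

def J(c, a, with_jacobian=True):
    """J_{X->Y} for the family g(y) = c*log(y) = a*x + E, E ~ N(0, 1).

    g plays the role of f2^{-1}, so -log|df/dE| at E = e_hat equals
    log|g'(y)| = log(c / y).
    """
    e_hat = c * np.log(y) - a * x
    value = np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * e_hat**2)
    if with_jacobian:
        value += np.sum(np.log(c) - np.log(y))
    return value

c_grid = np.linspace(0.5, 4.0, 71)
a_grid = np.linspace(0.0, 8.0, 81)

def grid_argmax(with_jacobian):
    """Exhaustive grid search for the (c, a) pair maximizing J."""
    scores = np.array([[J(c, a, with_jacobian) for a in a_grid] for c in c_grid])
    i, j = np.unravel_index(np.argmax(scores), scores.shape)
    return c_grid[i], a_grid[j]

c_hat, a_hat = grid_argmax(True)    # near c = 2, a = 4
c_bad, a_bad = grid_argmax(False)   # c pushed to the smallest grid value
print(c_hat, a_hat, c_bad, a_bad)
```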
[Figure: data, X vs. Y]
[Figure: (b) estimated PNL function f2 (warping function; axes Ẑ, Y); (c) estimated f1 (unwarped data points and GP posterior mean; axes X, Ẑ); (d) X and estimated noise N̂: not independent!]
[Figure: (a) estimated PNL function f2 (warping function; axes Ẑ, Y); (b) estimated f1 (unwarped data points and GP posterior mean; axes X, Ẑ); (c) X and estimated noise N̂; (d) MoG noise distribution (distribution of N̂)]
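The PNL-WGP-MoG variant models the noise density p(E; θ2) as a mixture of Gaussians (MoG). A minimal sketch of evaluating such a log density stably, via log-sum-exp over components, is given below; the function name `mog_log_density` and its parameterization (weights, means, standard deviations) are illustrative assumptions, not the authors' code. A function like this could be passed as the noise log density in a PNL likelihood.

```python
import numpy as np

def mog_log_density(e, weights, means, stds):
    """log p(e) for a 1-D Gaussian mixture, computed with a stable log-sum-exp."""
    e = np.asarray(e, dtype=float)[..., None]   # trailing axis indexes components
    w = np.asarray(weights, dtype=float)
    mu = np.asarray(means, dtype=float)
    sd = np.asarray(stds, dtype=float)
    # log(w_k * N(e; mu_k, sd_k^2)) for every component k
    log_comp = (np.log(w) - 0.5 * np.log(2 * np.pi * sd**2)
                - 0.5 * ((e - mu) / sd) ** 2)
    m = np.max(log_comp, axis=-1, keepdims=True)
    return (m + np.log(np.sum(np.exp(log_comp - m), axis=-1, keepdims=True)))[..., 0]

# Example: a skewed two-component noise density; it should integrate to about 1
grid = np.linspace(-15.0, 15.0, 30001)
dens = np.exp(mog_log_density(grid, [0.3, 0.7], [-1.0, 2.0], [0.5, 1.0]))
print(np.sum(dens) * (grid[1] - grid[0]))
```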
[Figure: (a) estimated PNL function f2 (axes Ẑ, Y); (b) estimated f1 (unwarped data points and estimated f1; axes X, Ẑ); (c) X and estimated noise N̂: independent!]
$Y = f_2(f_1(X) + E)$
Method              Accuracy (%)
PNL-MLP             70
PNL-WGP-Gaussian    67
PNL-WGP-MoG         76
ANM                 63
GPI                 72
IGCI                73
Accuracy of different methods for causal direction determination on the cause-effect pairs.
ANM: additive noise model; GPI: Gaussian process latent variable model; IGCI: information geometric causal inference.
[Figure: data pair 22 (data points); X: age of a person, Y: corresponding height]
[Figure: data pair 57 (data points); X: latitude of the country's capital, Y: life expectancy]