Zig-Zag Monte Carlo
Joris Bierkens, Delft University of Technology
February 7, 2017
Acknowledgements
Collaborators: Andrew Duncan, Paul Fearnhead, Antonietta Mira, Gareth Roberts
Financial support
Outline

1. Motivation: Markov Chain Monte Carlo
2. One-dimensional Zig-Zag process
3. Multi-dimensional ZZP
4. Subsampling
5. Doubly intractable likelihood
Bayesian inference

In Bayesian inference we typically deal with a posterior density

π(x) = π(x; y) ∝ L(y | x) π0(x), x ∈ R^d,

where L(y | x) is the likelihood of the data y given parameter x ∈ R^d, and π0 is a prior density for x. Quantities of interest are e.g.

- posterior mean: ∫ x π(x) dx,
- posterior variance: ∫ x² π(x) dx − (∫ x π(x) dx)²,
- tail probability: ∫ 𝟙{x ≥ c} π(x) dx.

All of these involve integrals of the form ∫ h(x) π(x) dx.
Evaluating ∫ h(x) π(x) dx

Possible approaches:

1. Explicit (analytic) integration. Rarely possible.
2. Numerical integration. Curse of dimensionality.
3. Monte Carlo. Draw independent samples (X1, X2, …) from π and use the law of large numbers. Requires independent samples from π.
4. Markov Chain Monte Carlo. Construct an ergodic Markov chain (X1, X2, …) with invariant distribution π(x) dx and use Birkhoff's ergodic theorem.

In both cases,

∫ h(x) π(x) dx = lim_{K→∞} (1/K) Σ_{k=1}^K h(X_k).
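As a quick illustration of the Monte Carlo approach (my example, assuming a standard normal target, for which IID sampling is easy), the tail probability ∫ 𝟙{x ≥ 1} π(x) dx can be estimated via the law of large numbers:

```python
import numpy as np
from math import erf, sqrt

# Plain Monte Carlo estimate of the tail probability  ∫ 1{x >= 1} pi(x) dx
# for pi the standard normal density (illustrative choice of target).
rng = np.random.default_rng(0)
samples = rng.standard_normal(1_000_000)     # IID draws from pi
estimate = np.mean(samples >= 1.0)           # (1/K) sum_k 1{X_k >= 1}
exact = 0.5 * (1.0 - erf(1.0 / sqrt(2.0)))   # 1 - Phi(1), about 0.1587
```

With 10^6 samples the standard error is about sqrt(p(1−p)/K) ≈ 4·10^−4, so the estimate agrees with the exact value to roughly three decimals.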
One-dimensional Zig-Zag process

Dynamics
- Continuous time.
- Current state (X(t), Θ(t)) ∈ R × {−1, +1}.
- Move X(t) in direction Θ(t) = ±1 until a switch occurs.
- The switching intensity is λ(X(t), Θ(t)).

[Figure: sample zig-zag trajectory X(t) against t]
Relation between switching rate and potential

Lf(x, θ) = θ df/dx(x, θ) + λ(x, θ)(f(x, −θ) − f(x, θ)), x ∈ R, θ ∈ {−1, +1}.

- Potential U(x) = − log π(x).
- π is invariant if and only if λ(x, +1) − λ(x, −1) = U′(x) for all x.
- Equivalently, λ(x, θ) = γ(x) + max(0, θU′(x)), with γ(x) ≥ 0.

Example: Gaussian distribution N(0, σ²)
- Density π(x) ∝ exp(−x²/(2σ²))
- Potential U(x) = x²/(2σ²)
- Derivative U′(x) = x/σ²
- Switching rates λ(x, θ) = (θx/σ²)⁺ + γ(x)
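For this Gaussian target the integrated switching rate along a ray can be inverted in closed form, so the process can be simulated exactly, without thinning. A minimal sketch (taking γ ≡ 0; the function name is mine):

```python
import numpy as np

def zigzag_gaussian(sigma=1.0, n_switches=10_000, seed=0):
    """Simulate the 1D zig-zag process targeting N(0, sigma^2).

    Rate lambda(x, theta) = max(0, theta*x/sigma^2).  Along the ray
    x + theta*t, with a = theta*x, the integrated rate solves
        a*tau + tau^2/2 = sigma^2 * E   (once the rate is positive),
    giving tau = -a + sqrt(max(a,0)^2 + 2*sigma^2*E) for E ~ Exp(1).
    Returns switch times and positions of the piecewise-linear path.
    """
    rng = np.random.default_rng(seed)
    x, theta, t = 0.0, 1, 0.0
    ts, xs = [0.0], [0.0]
    for _ in range(n_switches):
        e = rng.exponential()
        a = theta * x                     # signed position along direction
        tau = -a + np.sqrt(max(a, 0.0) ** 2 + 2.0 * sigma**2 * e)
        x += theta * tau                  # deterministic linear motion
        t += tau
        theta = -theta                    # switch direction
        ts.append(t)
        xs.append(x)
    return np.array(ts), np.array(xs)

# usage: ts, xs = zigzag_gaussian(sigma=2.0)
```

Time averages over the piecewise-linear path (e.g. of x²) then recover the moments of N(0, σ²).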
Proof of invariance of π ∝ exp(−U)

Lf(x, θ) = θ ∂f/∂x(x, θ) + λ(x, θ)(f(x, −θ) − f(x, θ)), with λ(x, +1) − λ(x, −1) = U′(x).

Markov semigroup: P(t)f(x, θ) = E_{x,θ} f(X(t), Θ(t)).

π stationary means that

Σ_{θ=±1} ∫_R P(t)f(x, θ) π(x) dx = Σ_{θ=±1} ∫_R f(x, θ) π(x) dx, f ∈ D(L), t ≥ 0.

Differentiating gives the equivalent condition

Σ_{θ=±1} ∫_R Lf(x, θ) π(x) dx = 0, f ∈ D(L).

For the switching part of the generator,

Σ_{θ=±1} ∫_R λ(x, θ)(f(x, −θ) − f(x, θ)) π(x) dx
= ∫_R {λ(x, +1)(f(x, −1) − f(x, +1)) + λ(x, −1)(f(x, +1) − f(x, −1))} π(x) dx
= ∫_R (f(x, −1) − f(x, +1))(λ(x, +1) − λ(x, −1)) π(x) dx
= ∫_R (f(x, −1) − f(x, +1)) U′(x) π(x) dx
= −∫_R (f(x, −1) − f(x, +1)) π′(x) dx          [since π′ = −U′ π]
= ∫_R (f′(x, −1) − f′(x, +1)) π(x) dx           [integration by parts]
= −Σ_{θ=±1} ∫_R θ (df/dx)(x, θ) π(x) dx,

which exactly cancels the transport part of the generator, so Σ_{θ=±1} ∫_R Lf(x, θ) π(x) dx = 0.
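The stationarity condition Σ_{θ=±1} ∫ Lf(x, θ) π(x) dx = 0 can also be checked numerically for a concrete case. The sketch below (my choices: standard normal target, rates (θx)⁺, test function f(x, θ) = sin x + θx) evaluates the generator against π by quadrature:

```python
import numpy as np
from scipy.integrate import quad

# Standard Gaussian target: U(x) = x^2/2, U'(x) = x, rates (theta*x)^+.
pi = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
lam = lambda x, th: max(0.0, th * x)

# A test function in the domain of L (moderate growth), chosen arbitrarily.
f  = lambda x, th: np.sin(x) + th * x
fx = lambda x, th: np.cos(x) + th          # partial derivative in x

def Lf(x, th):
    """Generator: Lf = theta*f_x + lambda(x,theta)*(f(x,-theta)-f(x,theta))."""
    return th * fx(x, th) + lam(x, th) * (f(x, -th) - f(x, th))

# sum over theta of the integral of Lf against pi; should vanish
total = sum(quad(lambda x: Lf(x, th) * pi(x), -np.inf, np.inf)[0]
            for th in (-1, 1))
```

For this f one can also verify by hand that Σ_θ Lf(x, θ) = 2 − 2x², whose Gaussian integral is 0.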
Use in Monte Carlo

(X(t), Θ(t))_{t≥0} has invariant distribution proportional to π(x). If ergodic,

lim_{T→∞} (1/T) ∫₀^T h(X(s)) ds = ∫_R h(x) π(x) dx.

How to use in computations. Either:
- numerically integrate (1/T) ∫₀^T h(X_s) ds for some finite T > 0, or
- define (X1, X2, …) by setting X_k = X(kΔ) for some Δ > 0, and use as in traditional MCMC.
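Since the trajectory is piecewise linear, extracting the discrete chain X_k = X(kΔ) is a single interpolation call. A toy sketch (the trajectory arrays are made up; the output of any zig-zag simulator, as switch times and positions, would do):

```python
import numpy as np

# Switch times ts and positions xs of a piecewise-linear (unit-speed)
# zig-zag trajectory; X(k*Delta) is obtained by linear interpolation,
# and the discrete chain can then be used like ordinary MCMC output.
ts = np.array([0.0, 1.2, 2.0, 3.5, 4.1])   # toy switch times (assumed)
xs = np.array([0.0, 1.2, 0.4, 1.9, 1.3])   # positions at those times
delta = 0.5
grid = np.arange(0.0, ts[-1], delta)       # times k*Delta
samples = np.interp(grid, ts, xs)          # X_k = X(k*Delta)
```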
CLT for the 1D Zig-Zag process

[B., Duncan, Limit theorems for the Zig-Zag process, 2016]

X(t) satisfies a Central Limit Theorem (CLT) for observable h if

(1/√T) ∫₀^T [h(X_s) − E_π h(X)] ds ⇒ N(0, σ²_h).

Example: unimodal potential/density function.

[Figure: zig-zag trajectory X(t) with successive crossing times T_i^±, S_i^±]

- Set Y_i = ∫_{T^+_{i−1}}^{T^+_i} h(X_s) ds.
- The CLT for the ZZP follows essentially from the CLT for Σ_{i=1}^{N(t)} Y_i.
CLT for the 1D Zig-Zag process

[B., Duncan, Limit theorems for the Zig-Zag process, 2016]

General formula for the asymptotic variance:

σ²_h = 2 ∫_R (λ(x, +1) + λ(x, −1)) |φ′(x)|² π(x) dx, where −L_Langevin φ = h̄ := h − π(h).

Langevin diffusion: σ²_h = 2 ∫_R |φ′(x)|² π(x) dx.

Cool results
- Computational efficiency of the ZZP is better than IID sampling for the Gaussian (oscillatory ACF).
- Student-t distribution with ν degrees of freedom:
  - the Langevin diffusion satisfies a CLT for ν > 2,
  - the Zig-Zag process satisfies a CLT for ν > 1.
Multi-dimensional Zig-Zag process

- Target π(x) = exp(−U(x)) on R^d.
- Set of directions θ ∈ {−1, +1}^d.
- Switching rates λ_i(x, θ) = (θ_i ∂_i U(x))⁺, for i = 1, …, d.

Cool observation
- Factorized target distribution π(x) = Π_{i=1}^d π_i(x_i) with π_i(y) = exp(−U_i(y)).
- Switching rates: λ_i(x, θ) = (θ_i U_i′(x_i))⁺.
- Every component of the Zig-Zag process mixes at O(1).
- Compare to RWM O(d), MALA O(d^{1/3}), HMC O(d^{1/4}).
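Because the coordinate rates factorize for a product target, each coordinate can propose its own switch time and the earliest proposal wins; only that component of θ flips. A sketch for the standard Gaussian on R^d (my function; it also accumulates the time-averaged second moment of each coordinate as a sanity check):

```python
import numpy as np

def zigzag_gaussian_d(d=10, n_switches=20_000, seed=1):
    """Multi-dimensional zig-zag for a standard Gaussian on R^d.

    Coordinate rates lambda_i(x, theta) = (theta_i * x_i)^+ depend only on
    coordinate i, so each coordinate's first switch time is drawn in closed
    form (as in 1D); the minimum wins and only that component flips.
    Returns total time and time-averaged second moment per coordinate.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    theta = np.ones(d)
    t = 0.0
    m2 = np.zeros(d)                     # accumulates  int (x_j + theta_j s)^2 ds
    for _ in range(n_switches):
        e = rng.exponential(size=d)
        a = theta * x
        taus = -a + np.sqrt(np.maximum(a, 0.0) ** 2 + 2.0 * e)
        i = np.argmin(taus)              # earliest proposed switch wins
        tau = taus[i]
        m2 += x**2 * tau + x * theta * tau**2 + tau**3 / 3.0
        x = x + theta * tau              # all coordinates move linearly
        t += tau
        theta[i] = -theta[i]             # flip only coordinate i
    return t, m2 / t
```

Resampling all coordinate clocks after each event is valid: the first arrival of the superposed inhomogeneous Poisson processes is the minimum of independent first-arrival times.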
Sampling

[Figure: dU/dx, switching rate λ(x), and computational bound Λ(x) ≥ λ(x)]

- Switching rate λ(x) = max(0, dU/dx).
- Choose a computational bound Λ(x) ≥ λ(x).
- Draw T with P(T ≥ t) = exp(−∫₀^t Λ(X(s)) ds).
- Accept T as a switching time with probability λ(X(T))/Λ(X(T)).
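A minimal thinning sketch for a case where a constant bound is available (my choice of target): π(x) ∝ 1/cosh(x), so U′(x) = tanh(x) and λ(x, θ) = (θ tanh x)⁺ ≤ 1 =: Λ everywhere.

```python
import numpy as np

def zigzag_thinned(dU, Lam, n_switches=10_000, seed=2):
    """1D zig-zag via Poisson thinning with a constant global bound Lam.

    Valid whenever max(0, theta*dU(x)) <= Lam for all x: proposals come
    from a homogeneous Exp(Lam) clock, and each proposed time is accepted
    as a switch with probability lambda(x, theta)/Lam.  Rejected proposals
    leave the direction unchanged, so the path stays linear between flips.
    """
    rng = np.random.default_rng(seed)
    x, theta, t = 0.0, 1, 0.0
    ts, xs = [0.0], [0.0]
    flips = 0
    while flips < n_switches:
        tau = rng.exponential(1.0 / Lam)   # proposal from the bounding rate
        x += theta * tau                   # deterministic motion
        t += tau
        lam = max(0.0, theta * dU(x))      # true rate at proposed time
        if rng.random() < lam / Lam:       # thinning accept step
            theta = -theta
            flips += 1
            ts.append(t)
            xs.append(x)
    return np.array(ts), np.array(xs)

# usage: ts, xs = zigzag_thinned(np.tanh, 1.0)
```

For this target the second moment is π²/4 ≈ 2.47, which the time average over the trajectory reproduces.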
Subsampling

[Figure: dU₁/dx, dU₂/dx and dU/dx with U = ½(U₁ + U₂); switching rates λ₁(x), λ₂(x) and bound Λ(x)]

- U = ½(U₁ + U₂).
- Draw T with P(T ≥ t) = exp(−∫₀^t Λ(X(s)) ds).
- Draw I from {1, 2} uniformly.
- Accept T with probability λ_I(X(T))/Λ(X(T)).
Subsampling

Intractable likelihood, big data:

U(x) = (1/n) Σ_{i=1}^n U_i(x). If π(x) ∝ Π_{i=1}^n f(y_i | x) π0(x), take
U_i(x) = − log π0(x) − n log f(y_i | x).

Theorem
With subsampling, the Zig-Zag process has exp(−U) as invariant density.

Proof: The effective switching rate is

λ̄(x, θ) = (1/n) Σ_{i=1}^n λ_i(x, θ) = (1/n) Σ_{i=1}^n (θ U_i′(x))⁺.

Hence

λ̄(x, +1) − λ̄(x, −1) = (1/n) [Σ_{i=1}^n (U_i′(x))⁺ − Σ_{i=1}^n (−U_i′(x))⁺]
= (1/n) Σ_{i=1}^n [(U_i′(x))⁺ − (U_i′(x))⁻] = (1/n) Σ_{i=1}^n U_i′(x) = U′(x).
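A sketch of subsampling end to end, on a deliberately simple model (my choice): y_i ~ N(x, 1) with prior N(0, σ0²), so U_i′(x) = x/σ0² + n(x − y_i). Along a ray each rate grows linearly with slope b = 1/σ0² + n, giving a bound a + bs whose intercept a needs only the precomputed min/max of the data, hence O(1) work per event:

```python
import numpy as np

def zigzag_subsampled_gauss_mean(y, sigma0=10.0, n_events=50_000, seed=3):
    """Zig-zag with naive subsampling for the posterior of a Gaussian mean.

    Model (illustrative): y_i ~ N(x, 1), prior x ~ N(0, sigma0^2), so
    U_i'(x) = x/sigma0^2 + n*(x - y_i), with common slope b = 1/sigma0^2 + n.
    Along a ray, (theta*U_i'(x + theta*s))^+ <= a + b*s with
    a = max_i (theta*U_i'(x))^+, computed from min/max of y in O(1).
    """
    y = np.asarray(y, float)
    n = len(y)
    b = 1.0 / sigma0**2 + n
    y_min, y_max = y.min(), y.max()
    rng = np.random.default_rng(seed)
    x, theta, t = 0.0, 1, 0.0
    ts, xs = [0.0], [0.0]
    for _ in range(n_events):
        y_star = y_min if theta == 1 else y_max       # extremal data point
        a = max(0.0, theta * b * x - theta * n * y_star)
        e = rng.exponential()
        tau = (-a + np.sqrt(a * a + 2.0 * b * e)) / b  # invert a*s + b*s^2/2 = e
        x += theta * tau
        t += tau
        i = rng.integers(n)                            # subsample one data point
        lam_i = max(0.0, theta * (x / sigma0**2 + n * (x - y[i])))
        if rng.random() < lam_i / (a + b * tau):       # thinning accept
            theta = -theta
            ts.append(t)
            xs.append(x)
    return np.array(ts), np.array(xs)
```

Each event touches a single data point plus O(1) precomputed quantities, while the time average over the trajectory still recovers the full-data posterior.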
Subsampling - scaling

- Without subsampling: O(n) computations per O(1) update.
- With naive subsampling: O(1) computations per O(1/n) update.
- Subsampling with control variates: O(1) computations per O(1) update: super-efficient.
- The control variates approach depends on posterior contraction and requires finding a point close to the mode: O(n) start-up cost.
Control variates

- U(x) = (1/n) Σ_{i=1}^n U_i(x).
- Let x⋆ denote (a point ‘close’ to) the mode of the posterior distribution.
- Naive subsampling: λ_i(x, θ) = (θ U_i′(x))⁺.
- Control variates: λ_i(x, θ) = (θ {U_i′(x) + U′(x⋆) − U_i′(x⋆)})⁺.
- If x is close to the mode, then U_i′(x) − U_i′(x⋆) is small (under assumptions on U).
- So each λ_i(x, θ) is close to the ‘ideal’ switching rate (θ U′(x))⁺.
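A sketch of the control-variate rates for the same illustrative Gaussian-mean model as before (my construction). Since U_i′ is linear in x with an i-independent slope here, the corrected rates collapse exactly to the ideal rate (θU′(x))⁺, which is the extreme case of the "close to ideal" statement above:

```python
import numpy as np

# Illustrative model: y_i ~ N(x, 1), prior N(0, sigma0^2), so
# U_i'(x) = x/sigma0^2 + n*(x - y_i)  and  U'(x) = x/sigma0^2 + n*(x - mean(y)).
rng = np.random.default_rng(0)
y = rng.normal(1.0, 1.0, size=1000)
n, sigma0 = len(y), 10.0

dUi = lambda i, x: x / sigma0**2 + n * (x - y[i])
dU  = lambda x: x / sigma0**2 + n * (x - y.mean())     # full gradient U'(x)

x_star = y.sum() / (n + 1.0 / sigma0**2)               # posterior mode
dU_star = dU(x_star)                                   # one-off O(n) cost
dUi_star = np.array([dUi(i, x_star) for i in range(n)])  # one-off O(n) cost

def lam_cv(i, x, theta):
    """Control-variate rate (theta*{U_i'(x) + U'(x*) - U_i'(x*)})^+ ."""
    return max(0.0, theta * (dUi(i, x) + dU_star - dUi_star[i]))
```

Because U_i′(x) − U_i′(x⋆) = (1/σ0² + n)(x − x⋆) does not depend on i in this linear model, every λ_i equals (θU′(x))⁺; in general the λ_i are only close to it near the mode.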
100 observations

[Figures: experimental results for 100 observations]

10,000 observations

[Figures: experimental results for 10,000 observations]
Scaling in number of observations

[Figure: log₂(ESS / epochs) against log₂(number of observations), for Zig-Zag, Zig-Zag with subsampling, Zig-Zag with control variates, and Zig-Zag with a poor computational bound]

[Figure: log₂(ESS / second) against log₂(number of observations), for the same four variants]
Doubly intractable likelihood

In many applications, the distribution of interest π has the following form:

π(x; y) = exp(Σ_{i=1}^d x_i s_i(y)) / (Z(y) M(x)) · π0(x), x ∈ R^d,

where
- y ∈ {0, 1}^n is a fixed observed realization of the ‘forward model’
  p(y | x) = exp(Σ_{i=1}^d x_i s_i(y)) / M(x),
- s_i, i = 1, …, d, are statistics which characterize the distribution of the forward model, with weights x_1, …, x_d ∈ R,
- Z(y) is the usual normalization constant.

Computational problem: computation of M(x) is O(2^n):

M(x) = Σ_{y∈{0,1}^n} exp(Σ_{i=1}^d x_i s_i(y)).
Examples of doubly intractable likelihood

p(y | x) = exp(Σ_{i=1}^d x_i s_i(y)) / M(x), x ∈ R^d, y ∈ {0, 1}^n.

Ising model (physics, image analysis)
- s1(y) = yᵀWy, where W is an interaction matrix,
- s2(y) = hᵀy, where h represents an “external magnetic field”,
- x1, x2 serve as “inverse temperatures”.

Exponential Random Graph Model
- random graphs over k vertices, with n := ½ k(k − 1) possible edges,
- y1, …, yn indicate the presence of an edge,
- s1(y): number of edges in the random graph,
- s2(y): e.g. number of triangles in the random graph.
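For concreteness, the Ising statistics on a rectangular grid with nearest-neighbour interactions can be computed as follows (a sketch using the ±1 spin convention; the function name and grid layout are mine):

```python
import numpy as np

def ising_stats(spins, h):
    """Sufficient statistics for a nearest-neighbour Ising model on a grid.

    spins: 2D array with entries in {-1, +1};  h: external field, same shape.
    s1 sums spin products over horizontal and vertical neighbour pairs
    (the statistic y^T W y for W encoding nearest neighbours); s2 = h^T y.
    """
    s1 = (np.sum(spins[:-1, :] * spins[1:, :])     # vertical pairs
          + np.sum(spins[:, :-1] * spins[:, 1:]))  # horizontal pairs
    s2 = np.sum(h * spins)
    return s1, s2
```

For an all-aligned 3×3 grid there are 12 neighbour pairs, so s1 = 12.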
The Zig-Zag process applied to doubly intractable likelihood

For simplicity, say x ∈ R and ignore the prior distribution. Then

π(x; y) = exp(x s(y)) / (Z(y) M(x)), M(x) = Σ_{z∈{0,1}^n} exp(x s(z)), x ∈ R, y ∈ {0, 1}^n,

so that U(x) = − log π(x; y) = −x s(y) + log M(x). For the derivative of U we find

U′(x) = −s(y) + d/dx log M(x) = −s(y) + Σ_{z∈{0,1}^n} exp(x s(z)) s(z) / M(x) = −s(y) + E_x[s(Y)],

where Y is a realization of the forward model with parameter x.
The Zig-Zag process applied to doubly intractable likelihood

U′(x) = −s(y) + E_x[s(Y)]. Switching rate complexity: O(2^n). For x ∈ R, θ ∈ {−1, +1},

λ(x, θ) = max(θU′(x), 0) = max(−θ s(y) + θ E_x[s(Y)], 0).

Idea: use an unbiased estimate of E_x[s(Y)].

Crude algorithm for determining the next switch:
1. Determine an upper bound Λ(x) for λ(x, θ).
2. Generate a switching time according to P(T ≥ t) = exp(−∫₀^t Λ(X(r)) dr).
3. Obtain an unbiased estimate Ĝ of (d/dx) U(x).
4. Accept the switch with probability max(0, θĜ)/Λ(X(T)), otherwise repeat.
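A toy end-to-end sketch of this crude algorithm (entirely my construction): a 4-spin Ising ring with s(z) = Σ_i σ_i σ_{i+1} ∈ {−4, 0, 4} (multiplicities 2, 12, 2 over the 16 configurations), observed s(y) = 4, and, unlike the slide above, a standard normal prior so that the posterior is proper. Exact enumeration stands in for perfect sampling of s(Y), and |Ĝ| = |x − s(y) + s(Y)| ≤ |x| + 8 yields the ray bound Λ(τ) = |x| + 8 + τ:

```python
import numpy as np

def zigzag_ising_ring(s_obs=4.0, n_events=40_000, seed=4):
    """Zig-zag for the toy doubly intractable posterior
    pi(x) ∝ exp(-x^2/2) exp(x*s_obs) / M(x),  M(x) = sum_z exp(x*s(z)),
    where s takes values {-4, 0, 4} with multiplicities {2, 12, 2} on the
    4-spin ring.  Drawing s(Y) by enumeration plays the role of perfect
    sampling; G = x - s_obs + s(Y) is an unbiased estimate of U'(x)."""
    vals = np.array([-4.0, 0.0, 4.0])
    mult = np.array([2.0, 12.0, 2.0])
    rng = np.random.default_rng(seed)
    x, theta, t = 0.0, 1, 0.0
    ts, xs = [0.0], [0.0]
    for _ in range(n_events):
        a, b = abs(x) + 8.0, 1.0                 # bound a + b*tau along the ray
        e = rng.exponential()
        tau = (-a + np.sqrt(a * a + 2.0 * b * e)) / b
        x += theta * tau
        t += tau
        p = mult * np.exp(x * vals)              # forward law of s(Y) at x
        p /= p.sum()
        sY = rng.choice(vals, p=p)               # "perfect sample" of s(Y)
        G = x - s_obs + sY                       # unbiased estimate of U'(x)
        if rng.random() < max(0.0, theta * G) / (a + b * tau):
            theta = -theta
            ts.append(t)
            xs.append(x)
    return np.array(ts), np.array(xs)
```

The time average of x over the trajectory matches the posterior mean obtained by direct numerical integration, illustrating that an unbiased rate estimate preserves the invariant distribution.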
Unbiased estimation of E_x[s(Y)]

Two possible approaches:

- Perfect sampling, “coupling from the past” (Propp, Wilson, 1996): use Glauber dynamics in an ingenious way to obtain a sample Y which is distributed exactly according to the forward distribution π(· | x). Disadvantages:
  - not applicable to all discrete models,
  - exponentially slow convergence in “cold temperature” regimes.

- Unbiased MCMC sampling (Glynn, Rhee, 2014): introduce an N-valued random variable N. Define ∆_i := s(Y_i) − s(Ỹ_i), where (Y_i) and (Ỹ_i) are two realizations of Glauber dynamics, correlated in a specific way. Unbiased estimate:
  Ĝ = Σ_{i=0}^N ∆_i / P(N ≥ i).
  Disadvantages:
  - no global upper bound for the estimate,
  - the variance may be extremely large.
Zig-Zag Process

- We can use piecewise deterministic Markov processes for sampling.
- An unbiased estimate of the log-density gradient results in the correct invariant distribution.
- Significantly better scaling than IID sampling for big data.
- Doubly intractable likelihood: work in progress.
References

- B., Roberts, A piecewise deterministic scaling limit of Lifted Metropolis-Hastings in the Curie-Weiss model, to appear in Annals of Applied Probability, 2015, https://arxiv.org/abs/1509.00302
- B., Fearnhead, Roberts, The Zig-Zag Process and Super-Efficient Sampling for Bayesian Analysis of Big Data, 2016, https://arxiv.org/abs/1607.03188
- B., Duncan, Limit theorems for the Zig-Zag process, 2016, https://arxiv.org/abs/1607.08845
- B., Fearnhead, Pollock, Roberts, Piecewise Deterministic Markov Processes for Continuous-Time Monte Carlo, https://arxiv.org/abs/1611.07873

Thank you!