Introduction Result Ideas of the proof Summary
Sampling, MCMC and Spectral Gaps in Infinite Dimensions
Martin Hairer, Andrew Stuart, Sebastian Vollmer
Department of Mathematics, University of Warwick
Sydney, 2012
Target Measure
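Assumption: the target measure $\mu$ has a density with respect to a Gaussian measure $\gamma$:
$$\mu(dx) = M \exp(-\Phi(x))\,\gamma(dx). \tag{1}$$
Here $\gamma = N(0, C)$ on a separable Hilbert space $H$, and $\{\varphi_n\}_{n\in\mathbb{N}}$ is an orthonormal basis of eigenvectors of $C$ with eigenvalues $\{\lambda_n^2\}_{n\in\mathbb{N}}$. The Karhunen-Loève expansion then yields
$$\gamma = \mathcal{L}\Big(\sum_{i=1}^{\infty} \lambda_i \varphi_i \xi_i\Big), \qquad \xi_i \overset{\text{i.i.d.}}{\sim} N(0,1).$$
Example: Brownian motion on $[0,1]$,
$$B_t = \sum_{k=1}^{\infty} \lambda_k\, \xi_k\, \sqrt{2}\,\sin\big((k-\tfrac{1}{2})\pi t\big), \qquad \lambda_k^2 = \frac{1}{(k-\frac{1}{2})^2\pi^2}.$$
Using the projections $P_m$ onto $\mathrm{span}\{\varphi_i\}_{i=1}^{m}$, the $m$-dimensional approximations are
$$\gamma_m(dx) = \mathcal{L}\Big(\sum_{i=1}^{m} \lambda_i \varphi_i \xi_i\Big)(dx), \qquad \mu_m(dx) = M_m \exp(-\Phi(P_m x))\,\gamma_m(dx). \tag{2}$$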
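The truncated Karhunen-Loève expansion can be sampled directly. The following is a minimal sketch for the Brownian motion example, using only the Python standard library; the function names `brownian_kl_sample` and `kl_variance` are hypothetical and chosen for illustration.

```python
import math
import random


def brownian_kl_sample(t_grid, m, rng):
    """Draw one path of the m-term Karhunen-Loeve approximation of Brownian motion."""
    xi = [rng.gauss(0.0, 1.0) for _ in range(m)]  # xi_k i.i.d. N(0, 1)
    path = []
    for t in t_grid:
        value = 0.0
        for k in range(1, m + 1):
            freq = (k - 0.5) * math.pi
            lam = 1.0 / freq                           # lambda_k
            phi = math.sqrt(2.0) * math.sin(freq * t)  # eigenfunction phi_k(t)
            value += lam * phi * xi[k - 1]
        path.append(value)
    return path


def kl_variance(t, m):
    """Variance of the truncated expansion at time t: sum_k lambda_k^2 phi_k(t)^2."""
    return sum(
        2.0 * math.sin((k - 0.5) * math.pi * t) ** 2 / ((k - 0.5) * math.pi) ** 2
        for k in range(1, m + 1)
    )


if __name__ == "__main__":
    grid = [i / 10 for i in range(11)]
    print(brownian_kl_sample(grid, m=100, rng=random.Random(0)))
    print(kl_variance(1.0, 2000))  # approaches Var(B_1) = 1 as m grows
```

Since all terms of the variance sum are nonnegative, `kl_variance(t, m)` increases with `m` towards $\mathrm{Var}(B_t) = t$.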
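$\alpha(x, y)$ denotes the acceptance probability for a transition from $x$ to $y$ [Tierney, 1998].
Random Walk Metropolis (RWM) algorithm on $\mathbb{R}^m$:
$$Q(x, dy) = \mathcal{L}\big(x + \sqrt{2\delta}\,\xi\big)(dy) \quad \text{with } \xi \sim \gamma_m,$$
$$\alpha(x, y) = 1 \wedge \exp\Big(\Phi(x) - \Phi(y) + \tfrac{1}{2}\langle x, C^{-1}x\rangle - \tfrac{1}{2}\langle y, C^{-1}y\rangle\Big).$$
Preconditioned Crank-Nicolson (pCN) algorithm:
$$Q(x, dy) = \mathcal{L}\big((1-2\delta)^{1/2}x + \sqrt{2\delta}\,\xi\big)(dy), \qquad \alpha(x, y) = 1 \wedge \exp(\Phi(x) - \Phi(y)).$$
Transition kernel:
$$P(x, dz) = Q(x, dz)\,\alpha(x, z) + \delta_x(dz)\int \big(1 - \alpha(x, u)\big)\,Q(x, du).$$
$P$ and $P_m$ denote the transition kernels for $\mu$ and $\mu_m$, respectively.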
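To make the two proposals concrete, here is a small sketch of one RWM step and one pCN step in eigenbasis coordinates. It rests on assumptions that are not in the slides: the potential is taken to be $\Phi = 0$ (so the target is $\gamma_m$ itself), the eigenvalues are $\lambda_i = 1/i$, and all function names are hypothetical.

```python
import math
import random


def phi(x):
    """Hypothetical potential for illustration: Phi = 0, so the target is gamma_m."""
    return 0.0


def sample_gamma(lams, rng):
    """Draw xi ~ gamma_m = N(0, diag(lams^2)) in eigenbasis coordinates."""
    return [lam * rng.gauss(0.0, 1.0) for lam in lams]


def rwm_step(x, lams, delta, rng):
    """One Random Walk Metropolis step; returns (new_state, accepted)."""
    xi = sample_gamma(lams, rng)
    y = [xj + math.sqrt(2.0 * delta) * xij for xj, xij in zip(x, xi)]
    # <x, C^{-1} x> in the eigenbasis is sum_i (x_i / lambda_i)^2
    qx = sum((xj / lam) ** 2 for xj, lam in zip(x, lams))
    qy = sum((yj / lam) ** 2 for yj, lam in zip(y, lams))
    log_alpha = min(0.0, phi(x) - phi(y) + 0.5 * qx - 0.5 * qy)
    accepted = rng.random() < math.exp(log_alpha)
    return (y if accepted else x), accepted


def pcn_step(x, lams, delta, rng):
    """One pCN step; returns (new_state, accepted)."""
    xi = sample_gamma(lams, rng)
    a, b = math.sqrt(1.0 - 2.0 * delta), math.sqrt(2.0 * delta)
    y = [a * xj + b * xij for xj, xij in zip(x, xi)]
    log_alpha = min(0.0, phi(x) - phi(y))
    accepted = rng.random() < math.exp(log_alpha)
    return (y if accepted else x), accepted


def acceptance_rate(step, m, delta=0.1, n_steps=2000, seed=0):
    """Fraction of accepted proposals for a chain started from gamma_m."""
    rng = random.Random(seed)
    lams = [1.0 / i for i in range(1, m + 1)]  # assumed decay lambda_i = 1/i
    x = sample_gamma(lams, rng)
    n_acc = 0
    for _ in range(n_steps):
        x, accepted = step(x, lams, delta, rng)
        n_acc += accepted
    return n_acc / n_steps


if __name__ == "__main__":
    for m in (1, 20, 200):
        print(m, acceptance_rate(rwm_step, m), acceptance_rate(pcn_step, m))
```

With $\Phi = 0$ the pCN acceptance probability is identically 1 in every dimension, whereas the RWM acceptance rate for fixed $\delta$ drops as $m$ grows.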
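A Markov transition kernel $P$ with invariant measure $\mu$ has an $L^2_\mu$-spectral gap $1 - \beta$ iff
$$\beta = \sup_{f \in L^2_\mu} \frac{\|Pf - \mu(f)\|_2}{\|f - \mu(f)\|_2} < 1.$$
[Kipnis and Varadhan, 1986] If $X_0 \sim \mu$, then for any $f \in L^2_\mu$, $f(X_n)$ satisfies a CLT with asymptotic variance
$$\sigma^2_{f,P} \le \frac{2\mu(f^2)}{1 - \beta}.$$
[Rudolf, 2011] For $X_0 \sim \nu$ with $\nu$ absolutely continuous w.r.t. $\mu$, there is a non-asymptotic bound on the mean squared error (MSE) of the form
$$\mathbb{E}_{\nu,K}\,|S_n(f) - \mu(f)|^2 \lesssim \frac{2}{n(1 - \beta)}, \qquad S_n(f) = \frac{1}{n}\sum_{j=1}^{n} f(X_j).$$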
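The variance bound can be checked on a chain where everything is explicit. The sketch below considers the Gaussian autoregressive chain $X_{n+1} = \rho X_n + \sqrt{1-\rho^2}\,\xi_n$ with invariant measure $N(0,1)$, which is the one-dimensional pCN chain with $\Phi = 0$ and $\rho = (1-2\delta)^{1/2}$. For this chain $\beta = |\rho|$ (its eigenfunctions are the Hermite polynomials), and the test function $f(x) = x$ is an assumption made for the example. Function names are hypothetical.

```python
import math


def pcn_correlation(delta):
    """Lag-one correlation rho = (1 - 2*delta)^(1/2) of the 1D pCN chain with Phi = 0."""
    return math.sqrt(1.0 - 2.0 * delta)


def asymptotic_variance(rho, n_lags=5000):
    """sigma^2 for f(x) = x: Var(X_0) + 2 * sum_k Cov(X_0, X_k), with Cov(X_0, X_k) = rho**k."""
    return 1.0 + 2.0 * sum(rho ** k for k in range(1, n_lags + 1))


def kipnis_varadhan_bound(rho):
    """2 * mu(f^2) / (1 - beta) with mu(f^2) = 1 and beta = |rho|."""
    return 2.0 / (1.0 - abs(rho))


if __name__ == "__main__":
    for delta in (0.4, 0.1, 0.01):
        rho = pcn_correlation(delta)
        print(delta, asymptotic_variance(rho), kipnis_varadhan_bound(rho))
```

Here the exact asymptotic variance is $(1+\rho)/(1-\rho)$, which sits just below the bound $2/(1-\rho)$, so the bound is close to sharp for slowly mixing chains.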
Key result
Dimension Dependent Results for the RWM
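Conductance:
$$C = \inf_{\mu(A) \le \frac{1}{2}} \frac{\int_A P(x, A^c)\,\mu(dx)}{\mu(A)}.$$
Relation to the spectral gap (cf. [Lawler and Sokal, 1988, Sinclair and Jerrum, 1989]):
$$\frac{C^2}{2} \le 1 - \beta \le 2C.$$
For any Metropolis-Hastings transition kernel $P$ and any set $B$ with $\mu(B) \le \frac{1}{2}$,
$$1 - \beta \le 2 \sup_{x \in B} \alpha(x),$$
where $\alpha(x) = \int \alpha(x, y)\,Q(x, dy)$ is the acceptance probability at $x$.
The algorithm started in $B$ can only move to $B^c$ if it accepts the move. Hence $P(x, B^c) \le \alpha(x)$.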
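On a finite state space both quantities can be computed by brute force, which gives a quick sanity check of the two-sided bound. The sketch below is an illustration under assumptions: the three-state lazy random walk is a made-up example, the chain is reversible with nonnegative eigenvalues, and `beta_power_iteration` approximates $\beta$ by repeatedly applying $P$ to a centered function (it assumes the start function is not orthogonal to the slowest mode).

```python
import itertools
import math


def conductance(P, pi):
    """Min over sets A with 0 < pi(A) <= 1/2 of (flow from A to A^c) / pi(A)."""
    n = len(pi)
    best = float("inf")
    for size in range(1, n):
        for A in itertools.combinations(range(n), size):
            mass = sum(pi[i] for i in A)
            if mass > 0.5:
                continue
            flow = sum(pi[i] * P[i][j] for i in A for j in range(n) if j not in A)
            best = min(best, flow / mass)
    return best


def beta_power_iteration(P, pi, n_iter=200):
    """Approximate beta = sup ||Pf - pi(f)|| / ||f - pi(f)|| in L^2(pi) by power iteration."""
    n = len(pi)

    def center(f):
        mean = sum(p * v for p, v in zip(pi, f))
        return [v - mean for v in f]

    def norm(f):
        return math.sqrt(sum(p * v * v for p, v in zip(pi, f)))

    f = center([float(i) for i in range(n)])  # non-constant start function
    ratio = 0.0
    for _ in range(n_iter):
        g = center([sum(P[i][j] * f[j] for j in range(n)) for i in range(n)])
        if norm(g) == 0.0:
            return 0.0  # the iterate was mapped to a constant; stop the sketch here
        ratio = norm(g) / norm(f)
        f = [v / norm(g) for v in g]
    return ratio


# Hypothetical example: lazy random walk on three states, reversible w.r.t. pi.
P_EXAMPLE = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
PI_EXAMPLE = [0.25, 0.5, 0.25]

if __name__ == "__main__":
    c = conductance(P_EXAMPLE, PI_EXAMPLE)
    beta = beta_power_iteration(P_EXAMPLE, PI_EXAMPLE)
    print(c * c / 2, 1 - beta, 2 * c)
```

For this example the eigenvalues of $P$ are $1$, $1/2$ and $0$, so $1 - \beta = 1/2$, and every admissible set has conductance ratio $1/2$.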
Preliminaries & Weak Harris Theorem
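$d : H \times H \to \mathbb{R}_+$ is a distance-like function if it is symmetric, lower semi-continuous and $d(x, y) = 0 \Leftrightarrow x = y$.
The corresponding Wasserstein distance is given by
$$d(\nu_1, \nu_2) = \inf_{\pi \in \Gamma(\nu_1, \nu_2)} \int_{H^2} d(x, y)\,\pi(dx, dy)$$
with $\Gamma(\nu_1, \nu_2) = \{\pi \in \mathcal{M}(H^2) \mid P_i^*\pi = \nu_i\}$, the set of couplings of $\nu_1$ and $\nu_2$.
$P$ has a Wasserstein spectral gap if there exist $\lambda > 0$ and $C > 0$ s.t. $d(\nu_1 P^n, \nu_2 P^n) \le C \exp(-\lambda n)\, d(\nu_1, \nu_2)$ for all $n \in \mathbb{N}$.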
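Weak Harris Theorem. Assume that
1. $P$ has a Lyapunov function $V$;
2. $P$ is $d$-contracting;
3. a level set of $V$ is $d$-small.
Then there exists $\tilde n$ such that $\tilde d(\nu_1 P^{\tilde n}, \nu_2 P^{\tilde n}) \le \frac{1}{2}\,\tilde d(\nu_1, \nu_2)$ for all probability measures $\nu_1, \nu_2$, where $\tilde d(x, y) = \sqrt{d(x, y)\,(1 + V(x) + V(y))}$.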
If ... is dense in $L^2_\mu$, then ...
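Weak Harris Theorem for $d(x, y) = 1 \wedge \frac{\|x - y\|}{\epsilon}$
Assumption: there are $r > 0$ and $\alpha_l > 0$ s.t. $\mathbb{P}\big(q_x(\xi) \text{ is accepted} \mid \|\xi\| \le r\big) \ge \alpha_l$.
Assume that $\Phi$ has a global Lipschitz constant $L$ and that the Assumption above is satisfied. Then for $\epsilon$ small enough the pCN algorithm for $\mu$ ($\mu_m$) converges exponentially in the Wasserstein distance $\tilde d$ associated with $d(x, y) = 1 \wedge \frac{\|x - y\|}{\epsilon}$, with an $m$-independent bound on the rate. Moreover, $\mu$ ($\mu_m$) is the unique invariant measure.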
d-contracting
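Recall $d$-contracting: $d(x, y) < 1$ implies $d(P(x, \cdot), P(y, \cdot)) \le c\, d(x, y)$ for some $c < 1$.
Proposals from $x$ and $y$ for $\xi \sim \gamma$:
$$q_x(\xi) = (1 - 2\delta)^{1/2} x + \sqrt{2\delta}\,\xi, \qquad q_y(\xi) = (1 - 2\delta)^{1/2} y + \sqrt{2\delta}\,\xi.$$
With $U$ an independent uniform random variable on $[0, 1]$:
$$\tilde x = q_x(\xi)\,\chi_{[0, \alpha(x, q_x)]}(U) + x\,\chi_{(\alpha(x, q_x), 1]}(U),$$
$$\tilde y = q_y(\xi)\,\chi_{[0, \alpha(y, q_y)]}(U) + y\,\chi_{(\alpha(y, q_y), 1]}(U).$$
Then $P(x, \cdot) = \mathcal{L}(\tilde x)$ and $P(y, \cdot) = \mathcal{L}(\tilde y)$.
Basic coupling: $\pi_{\mathrm{Basic}} = \mathcal{L}((\tilde x, \tilde y))$.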
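The basic coupling amounts to driving both chains with the same noise $\xi$ and the same uniform $U$. A minimal sketch in eigenbasis coordinates follows. The potential `phi` is passed in by the caller, the eigenvalues `lams` are an assumed input, and all names are hypothetical.

```python
import math
import random


def pcn_proposal(x, xi, delta):
    """q_x(xi) = (1 - 2*delta)^(1/2) * x + (2*delta)^(1/2) * xi."""
    a, b = math.sqrt(1.0 - 2.0 * delta), math.sqrt(2.0 * delta)
    return [a * xj + b * xij for xj, xij in zip(x, xi)]


def accept_prob(phi, x, q):
    """pCN acceptance probability 1 ^ exp(Phi(x) - Phi(q))."""
    return math.exp(min(0.0, phi(x) - phi(q)))


def coupled_pcn_step(x, y, phi, lams, delta, rng):
    """One step of the basic coupling: shared xi ~ gamma_m and shared uniform U."""
    xi = [lam * rng.gauss(0.0, 1.0) for lam in lams]
    u = rng.random()
    qx, qy = pcn_proposal(x, xi, delta), pcn_proposal(y, xi, delta)
    x_new = qx if u <= accept_prob(phi, x, qx) else x
    y_new = qy if u <= accept_prob(phi, y, qy) else y
    return x_new, y_new


def euclid(x, y):
    """Euclidean norm of x - y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def d_eps(x, y, eps):
    """Distance-like function d(x, y) = 1 ^ ||x - y|| / eps."""
    return min(1.0, euclid(x, y) / eps)


if __name__ == "__main__":
    rng = random.Random(1)
    lams = [1.0 / i for i in range(1, 6)]  # assumed eigenvalues
    x, y = [1.0, 0.0, 0.0, 0.0, 0.0], [1.05, 0.0, 0.0, 0.0, 0.0]
    x1, y1 = coupled_pcn_step(x, y, lambda v: 0.0, lams, 0.1, rng)
    print(d_eps(x, y, 0.1), d_eps(x1, y1, 0.1))
```

When both proposals are accepted the shared noise cancels, so $\|\tilde x - \tilde y\| = (1 - 2\delta)^{1/2}\|x - y\|$ exactly. With `phi` identically zero this happens at every step.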
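The Wasserstein distance is bounded by the cost of the basic coupling:
$$d(P(x, \cdot), P(y, \cdot)) = \inf_{\pi \in \Gamma(P(x, \cdot), P(y, \cdot))} \int d\,d\pi \le \int d\,d\pi_{\mathrm{Basic}}.$$
If both proposals are accepted, the distance contracts:
$$\|q_x(\xi) - q_y(\xi)\| = (1 - 2\delta)^{1/2}\,\|x - y\|.$$
The probability that only one proposal is accepted is small:
$$\mathbb{P}(\text{only one accepts}) \le \mathbb{E}\,|\alpha(x, q_x) - \alpha(y, q_y)| \le 2L\,\|x - y\| \le 2L\epsilon\, d(x, y).$$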
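The basic coupling leads to a contraction estimate by splitting into three cases: both proposals accepted, both rejected, or exactly one accepted. The following worked bound is a sketch rather than the authors' stated argument. It assumes $d(x, y) < 1$, uses $d \le 1$ in the mixed case, and writes $p_a$ for the probability that both proposals are accepted, a symbol introduced here for illustration.

```latex
\begin{align*}
\int d \, d\pi_{\mathrm{Basic}}
  &\le (1-2\delta)^{1/2}\, d(x,y)\, p_a
     + d(x,y)\,(1 - p_a)
     + \mathbb{P}(\text{only one accepts}) \\
  &\le \Bigl[\, 1 - \bigl(1 - (1-2\delta)^{1/2}\bigr)\, p_a + 2L\epsilon \,\Bigr]\, d(x,y).
\end{align*}
```

Under these assumptions the bracket is a valid contraction constant $c < 1$ whenever $2L\epsilon < \bigl(1 - (1-2\delta)^{1/2}\bigr)\, p_a$, which is why $\epsilon$ has to be small and why a lower bound on the acceptance probability is needed.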
Dimensionality
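For every $m$,
$$\sum_{i=1}^{m} \lambda_i \xi_i^2 \le \sum_{i=1}^{\infty} \lambda_i \xi_i^2.$$
Hence, for increasing $f$,
$$\mathbb{E}\, f\Big(\sum_{i=1}^{m} \lambda_i \xi_i^2\Big) \le \mathbb{E}\, f\Big(\sum_{i=1}^{\infty} \lambda_i \xi_i^2\Big),$$
which gives bounds that are independent of $m$.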
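The monotonicity in $m$ can be seen numerically by evaluating all truncations on the same random draws. The sketch below is illustrative only: it assumes $\lambda_i = 1/i^2$ and the increasing function $f(s) = 1 - e^{-s}$, neither of which comes from the slides, and the function names are hypothetical.

```python
import math
import random


def truncated_sums(lams, xi):
    """Running sums s_m = sum_{i <= m} lam_i * xi_i^2 for m = 1, ..., len(lams)."""
    sums, total = [], 0.0
    for lam, z in zip(lams, xi):
        total += lam * z * z
        sums.append(total)
    return sums


def mc_expectations(f, lams, n_samples=500, seed=0):
    """Monte Carlo estimates of E f(s_m) for every m, using common random numbers."""
    rng = random.Random(seed)
    acc = [0.0] * len(lams)
    for _ in range(n_samples):
        xi = [rng.gauss(0.0, 1.0) for _ in lams]
        for m, s in enumerate(truncated_sums(lams, xi)):
            acc[m] += f(s)
    return [a / n_samples for a in acc]


if __name__ == "__main__":
    lams = [1.0 / (i * i) for i in range(1, 51)]  # assumed summable weights
    estimates = mc_expectations(lambda s: 1.0 - math.exp(-s), lams)
    print(estimates[0], estimates[9], estimates[-1])
```

Because the partial sums are nondecreasing along every sample, the estimates are nondecreasing in $m$ and bounded by the value of the full series.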
Hairer, M., Mattingly, J. C., and Scheutzow, M. (2011). Asymptotic coupling and a general form of Harris’ theorem with applications to stochastic delay equations.
Kipnis, C. and Varadhan, S. R. S. (1986). Central limit theorem for additive functionals of reversible Markov processes and applications to simple exclusions. Communications in Mathematical Physics, 104(1):1–19.
Komorowski, T. and Walczuk, A. (2011). Central limit theorem for Markov processes with spectral gap in Wasserstein metric. Arxiv preprint arXiv:1102.18422.
Lawler, G. F. and Sokal, A. D. (1988). Bounds on the L2 spectrum for Markov chains and Markov processes: a generalization of Cheeger's inequality. Transactions of the American Mathematical Society, 309(2).
Rudolf, D. (2011). Explicit error bounds for Markov chain Monte Carlo. PhD thesis, to appear in Dissertationes Mathematicae, Friedrich-Schiller-Universität Jena.
Sinclair, A. and Jerrum, M. (1989). Approximate counting, uniform generation and rapidly mixing Markov chains. Information and Computation, 82(1):93–133.
Tierney, L. (1998). A note on Metropolis-Hastings kernels for general state spaces.
Wang, F.-Y. (2003). Functional inequalities for the decay of sub-Markov semi-groups. Potential Anal., 18(1):1–23.
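Assume that $\exp(-\Phi)$ is integrable and that for all $\kappa > 0$
$$\phi(r) = \sup_{x \ne y \in B_r(0)} \frac{|\Phi(x) - \Phi(y)|}{\|x - y\|} \le M_\kappa e^{\kappa r}.$$
Moreover, assume there exist $a \in (\frac{1}{2}, 1)$ and $R > 0$ s.t. for all $x \in B_R(0)^c$
$$\inf_{z \in B_{r\|x\|^a}((1 - 2\delta)^{1/2} x)} \big(-\Phi(z) + \Phi(x)\big) > \alpha_l.$$
Then for $\epsilon$ small enough the pCN algorithm for $\mu$ ($\mu_m$) converges exponentially in the Wasserstein distance $\tilde d$ associated with
$$d(x, y) = 1 \wedge \inf_{L,\ \psi \in A(L, x, y)} \frac{1}{\epsilon} \int_0^L \exp(\eta \|\psi\|)\,dt,$$
for $A(L, x, y) := \{\psi \in C^1([0, L], H),\ \psi(0) = x,\ \psi(L) = y,\ \|\dot\psi\| = 1\}$, with an $m$-independent bound on the rate. Moreover, $\mu$ ($\mu_m$) is the unique invariant measure.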