Optimal scaling of the transient phase of Metropolis Hastings algorithms



SLIDE 1

Introduction Optimal scaling of the transient phase of RWMH Longtime convergence of the nonlinear SDE Optimization strategies for the RWMH algorithm

Optimal scaling of the transient phase of Metropolis Hastings algorithms

Tony Lelièvre, Ecole des Ponts and INRIA. Joint work with B. Jourdain and B. Miasojedow. MCMSki, Chamonix, 8 January 2014

SLIDE 2

Outline of the talk

  • Introduction
  • Optimal scaling of the transient phase of RWMH
  • Longtime convergence of the nonlinear SDE
  • Optimization strategies for the RWMH algorithm

SLIDE 3

Introduction

SLIDE 4

Metropolis Hastings algorithm

The aim of the MH algorithm is to sample from a target probability measure, say with density $p$ on $\mathbb{R}^n$. Algorithm: iterate on $k \ge 0$,

  • Proposal: at time $k$, given $X^n_k$, propose a move to $\hat X^n_{k+1} \sim q(X^n_k, y)\,dy$, where $q(x, y)$ is a Markov density kernel on $\mathbb{R}^n$,
  • Acceptance/rejection: accept the move ($X^n_{k+1} = \hat X^n_{k+1}$) with probability $\alpha(X^n_k, \hat X^n_{k+1})$, where

    $$\alpha(x, y) := \frac{p(y)\,q(y, x)}{p(x)\,q(x, y)} \wedge 1.$$

    Otherwise, reject the move ($X^n_{k+1} = X^n_k$).

$(X^n_k)_{k \ge 0}$ is a reversible Markov chain with respect to $p(x)\,dx$.

The efficiency of the algorithm crucially depends on the choice of the proposal distribution q.
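The accept/reject mechanism above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`mh_step`, `run_chain`), the 1D standard Gaussian target and the unit-step random walk proposal are choices made here, not taken from the talk.

```python
import math
import random


def mh_step(x, log_p, propose, log_q, rng):
    """One MH step; log_q(x, y) is the log density of proposing y from x."""
    y = propose(x, rng)
    log_ratio = log_p(y) + log_q(y, x) - log_p(x) - log_q(x, y)
    # Accept with probability min(1, exp(log_ratio)); 1 - random() lies in (0, 1].
    if math.log(1.0 - rng.random()) <= log_ratio:
        return y, True
    return x, False


def run_chain(x0, n_steps, log_p, propose, log_q, seed=0):
    """Iterate mh_step and return the path together with the empirical acceptance rate."""
    rng = random.Random(seed)
    x, n_accepted, path = x0, 0, []
    for _ in range(n_steps):
        x, accepted = mh_step(x, log_p, propose, log_q, rng)
        n_accepted += accepted
        path.append(x)
    return path, n_accepted / n_steps


# Illustrative choices (assumptions, not from the slides): 1D standard Gaussian
# target (up to a constant) and Gaussian random walk proposal with unit step.
log_p = lambda x: -0.5 * x * x
propose = lambda x, rng: x + rng.gauss(0.0, 1.0)
log_q = lambda x, y: -0.5 * (y - x) ** 2  # symmetric, so it cancels in the ratio

path, acc_rate = run_chain(0.0, 20000, log_p, propose, log_q)
```

Working with log densities avoids underflow when the target is a product over many coordinates, which is the regime studied in the rest of the talk.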

SLIDE 5

Metropolis Hastings algorithm

In the following, we focus on the Gaussian random walk proposal (RWM):

  • $\hat X^n_{k+1} = X^n_k + \sigma G_{k+1}$ where $(G_k)_{k \ge 1}$ i.i.d. $\sim \mathcal{N}_n(0, I_n)$
  • $q(x, y) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left(-\frac{|x - y|^2}{2\sigma^2}\right) = q(y, x)$
  • Acceptance probability $\alpha(x, y) = \frac{p(y)}{p(x)} \wedge 1$.

Another standard choice: one step of overdamped Langevin (MALA):

  • $\hat X^n_{k+1} = X^n_k + \frac{\sigma^2}{2} (\nabla \ln p)(X^n_k) + \sigma G_{k+1}$ where $(G_k)_{k \ge 1}$ i.i.d. $\sim \mathcal{N}_n(0, I_n)$
  • $q(x, y) \neq q(y, x)$.
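The difference between the two proposals can be made concrete with a small sketch: the random walk kernel is symmetric, while the drift in the Langevin kernel breaks the symmetry, so the full Hastings ratio is needed. The 1D standard Gaussian target and all function names below are assumptions made for illustration.

```python
import math


def grad_log_p(x):
    # Assumption for illustration: 1D standard Gaussian target, (ln p)'(x) = -x.
    return -x


def log_q_rwm(x, y, sigma):
    """Log density of the Gaussian random walk proposal."""
    return -((y - x) ** 2) / (2 * sigma**2) - 0.5 * math.log(2 * math.pi * sigma**2)


def log_q_mala(x, y, sigma):
    """Log density of the Langevin proposal: Gaussian centred at x + sigma^2/2 * grad."""
    mean = x + 0.5 * sigma**2 * grad_log_p(x)
    return -((y - mean) ** 2) / (2 * sigma**2) - 0.5 * math.log(2 * math.pi * sigma**2)


def log_alpha_mala(x, y, sigma):
    """Log of the MH acceptance probability for the Langevin proposal."""
    log_p = lambda z: -0.5 * z * z
    return min(0.0, log_p(y) + log_q_mala(y, x, sigma) - log_p(x) - log_q_mala(x, y, sigma))


x, y, sigma = 0.3, 1.1, 0.5
rwm_gap = log_q_rwm(x, y, sigma) - log_q_rwm(y, x, sigma)      # zero: symmetric kernel
mala_gap = log_q_mala(x, y, sigma) - log_q_mala(y, x, sigma)   # nonzero: asymmetric kernel
```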

Question: How to choose σ as a function of the dimension n?

SLIDE 6

Previous work: Roberts, Gelman, Gilks 97

Two fundamental assumptions:

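  • (H1) Product target: $p(x) = p(x_1, \dots, x_n) = \prod_{i=1}^n e^{-V(x_i)}$,
  • (H2) Stationarity: $X^n_0 = (X^{1,n}_0, \dots, X^{n,n}_0) \sim p(x)\,dx$ and thus $\forall k$, $X^n_k = (X^{1,n}_k, \dots, X^{n,n}_k) \sim p(x)\,dx$.

Then, pick the first component $X^{1,n}_k$, choose $\sigma_n = \ell/\sqrt{n}$, and rescale the time accordingly (diffusive scaling) by considering $(X^{1,n}_{\lfloor nt \rfloor})_{t \ge 0}$.

Under regularity assumptions on $V$, as $n \to \infty$,

$$(X^{1,n}_{\lfloor nt \rfloor})_{t \ge 0} \overset{(d)}{\Rightarrow} (X_t)_{t \ge 0},$$

the unique solution of the SDE

$$dX_t = -h(\ell)\,\tfrac{1}{2} V'(X_t)\,dt + \sqrt{h(\ell)}\,dB_t,$$

where $h(\ell) = 2\ell^2\,\Phi\left(-\frac{\ell\sqrt{\int_{\mathbb{R}} (V')^2 \exp(-V)}}{2}\right)$ with $\Phi(x) = \int_{-\infty}^x e^{-y^2/2}\,\frac{dy}{\sqrt{2\pi}}$.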

SLIDE 7

Previous work: Roberts, Gelman, Gilks 97

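Practical counterparts: (i) scaling of the proposal variance, (ii) scaling of the number of iterations.

Question: How to choose $\ell$?

  • The function $\ell \mapsto h(\ell) = 2\ell^2\,\Phi\left(-\frac{\ell\sqrt{\int_{\mathbb{R}} (V')^2 \exp(-V)}}{2}\right)$ is maximal at $\ell^\star \simeq \frac{2.38}{\sqrt{\int_{\mathbb{R}} (V')^2 \exp(-V)}}$.
  • Besides, the limiting average acceptance rate is

    $$\mathbb{E}[\alpha(X^n_k, \hat X^n_{k+1})] = \int_{\mathbb{R}^n \times \mathbb{R}^n} \underbrace{\left(e^{\sum_{i=1}^n (V(x_i) - V(y_i))} \wedge 1\right)}_{\alpha(x, y)}\, q(x, y)\, e^{-\sum_{i=1}^n V(x_i)}\,dx\,dy \xrightarrow[n \to \infty]{} \mathrm{acc}(\ell) = 2\,\Phi\left(-\frac{\ell\sqrt{\int_{\mathbb{R}} (V')^2 \exp(-V)}}{2}\right) \in (0, 1).$$

Observe that $\mathrm{acc}(\ell^\star) \simeq 0.234$, whatever $V$. This justifies a constant acceptance rate strategy, with a target acceptance rate of approximately 25%.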
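The two constants 2.38 and 0.234 can be recovered numerically from the formulas for $h(\ell)$ and $\mathrm{acc}(\ell)$. The sketch below uses a plain grid search; the name `fisher` for the integral $\int (V')^2 e^{-V}$, and the choice of a standard Gaussian target (for which this integral equals 1), are assumptions made here.

```python
import math


def Phi(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def h(ell, fisher):
    """Speed of the limiting diffusion; `fisher` stands for the integral of (V')^2 exp(-V)."""
    return 2.0 * ell**2 * Phi(-ell * math.sqrt(fisher) / 2.0)


def acc(ell, fisher):
    """Limiting average acceptance rate."""
    return 2.0 * Phi(-ell * math.sqrt(fisher) / 2.0)


def argmax_on_grid(f, lo, hi, n_points=100001):
    """Brute-force maximizer of f on a regular grid of [lo, hi]."""
    best_x, best_val = lo, f(lo)
    for i in range(1, n_points):
        x = lo + (hi - lo) * i / (n_points - 1)
        val = f(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x


fisher = 1.0  # assumption: standard Gaussian target, V'(x) = x, so the integral is 1
ell_star = argmax_on_grid(lambda ell: h(ell, fisher), 0.0, 10.0)
acc_star = acc(ell_star, fisher)
```

Since $h$ and $\mathrm{acc}$ depend on $\ell$ only through $\ell\sqrt{\int (V')^2 e^{-V}}$, changing `fisher` rescales `ell_star` but leaves `acc_star` unchanged.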

SLIDE 8

A few references

  • (H1) + (H2) and various proposals: Gaussian RWM (Roberts, Gelman, Gilks 1997), MALA (Roberts, Rosenthal 1997), non-Gaussian RWM (Neal, Roberts 2011), RWM with discontinuous target (Neal, Roberts, Yuen 2012), multiple-try MCMC (Bédard, Douc, Moulines 2012), delayed rejection MCMC (Bédard, Douc, Moulines 2013), Hybrid Monte Carlo (Beskos, Pillai, Roberts, Sanz-Serna, Stuart 2013).
  • Beyond (H1): independent but non-identically distributed components, RWM (Bédard 2007, 2009); finite range interactions (Breyer, Roberts 2000); mean-field interaction (Breyer, Piccioni, Scarlatti 2004); density w.r.t. i.i.d. (Beskos, Roberts, Stuart 2009); infinite-dimensional target with density w.r.t. a Gaussian field: RWM (Mattingly, Pillai, Stuart 2012), MALA (Pillai, Stuart, Thiery 2012).
  • Beyond (H2): partial results for RWM and MALA with Gaussian target (Christensen, Roberts, Rosenthal 2005); modified RWM for infinite-dimensional target with density w.r.t. a Gaussian field (Pillai, Stuart, Thiery 2013).

Aim of this work: Study of the limit n → ∞ without the stationarity assumption (H2).

SLIDE 9

Optimal scaling of the transient phase of RWMH

SLIDE 10

The limit n → ∞ without (H2)

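We consider the RWMH with target $p(x) = \prod_{i=1}^n \exp(-V(x_i))$: $(G^i_k)_{i,k \ge 1}$ are i.i.d. $\sim \mathcal{N}_1(0, 1)$, independent of $(U_k)_{k \ge 1}$ i.i.d. $\sim \mathcal{U}[0, 1]$, and

$$X^{i,n}_{k+1} = X^{i,n}_k + \frac{\ell}{\sqrt{n}}\, G^i_{k+1}\, 1_{A_{k+1}}, \quad 1 \le i \le n,$$

with

$$A_{k+1} = \left\{ U_{k+1} \le e^{\sum_{i=1}^n \left(V(X^{i,n}_k) - V\left(X^{i,n}_k + \frac{\ell}{\sqrt{n}} G^i_{k+1}\right)\right)} \right\}.$$

From now on, we assume that $V$ is $C^3$ with $V''$ and $V^{(3)}$ bounded.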
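The recursion above translates directly into code. The following sketch runs it from a non-stationary initial condition; the Gaussian potential, the starting point $(10, \dots, 10)$, $n = 100$ and the fixed $\ell = 2.38$ are illustrative choices made here (the function name `rwmh_product` is hypothetical).

```python
import math
import random


def rwmh_product(V, x0, ell, n_steps, seed=0):
    """RWMH for the product target exp(-sum V(x_i)) with proposal std ell / sqrt(n)."""
    rng = random.Random(seed)
    n = len(x0)
    x = list(x0)
    sigma = ell / math.sqrt(n)
    accepted = []
    for _ in range(n_steps):
        prop = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
        log_ratio = sum(V(xi) - V(yi) for xi, yi in zip(x, prop))
        # Event A_{k+1}: U <= exp(log_ratio); all coordinates move together or not at all.
        ok = math.log(1.0 - rng.random()) <= log_ratio
        if ok:
            x = prop
        accepted.append(ok)
    return x, accepted


V = lambda x: 0.5 * (x * x + math.log(2 * math.pi))  # assumption: Gaussian potential
n = 100
x_final, accepted = rwmh_product(V, [10.0] * n, ell=2.38, n_steps=50 * n)

early_acceptance = sum(accepted[:n]) / n          # acceptance rate over the first n steps
final_mean_square = sum(v * v for v in x_final) / n
```

Running `n` steps of the chain corresponds to one unit of the rescaled time $t$ used in the limit theorem, which is why the horizon is expressed as a multiple of `n`.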

Theorem

Assume that

  1. $m$ is a probability measure on $\mathbb{R}$ s.t. $\int_{\mathbb{R}} (V')^4(x)\,m(dx) < +\infty$,
  2. $\forall n \ge 1$, $X^{1,n}_0, \dots, X^{n,n}_0$ are i.i.d. according to $m$.

Then the process $(X^{1,n}_{\lfloor nt \rfloor})_{t \ge 0}$ converges in distribution to the unique solution of the SDE nonlinear in the sense of McKean: $X_0 \sim m$,

$$dX_t = -G(a(t), b(t))\,V'(X_t)\,dt + \Gamma^{1/2}(a(t), b(t))\,dB_t$$

with $a(t) = \mathbb{E}[(V'(X_t))^2]$, $b(t) = \mathbb{E}[V''(X_t)]$, and...

SLIDE 11

The functions Γ and G

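$$\Gamma(a, b) = \begin{cases}
\ell^2\,\Phi\left(-\frac{\ell b}{2\sqrt{a}}\right) + \ell^2 e^{\frac{\ell^2 (a - b)}{2}}\,\Phi\left(\ell\left(\frac{b}{2\sqrt{a}} - \sqrt{a}\right)\right) & \text{if } a \in (0, +\infty), \\
\frac{\ell^2}{2} & \text{if } a = +\infty, \\
\ell^2 e^{-\frac{\ell^2 b^+}{2}} \text{ where } b^+ = \max(b, 0) & \text{if } a = 0,
\end{cases}$$

$$G(a, b) = \begin{cases}
\ell^2 e^{\frac{\ell^2 (a - b)}{2}}\,\Phi\left(\ell\left(\frac{b}{2\sqrt{a}} - \sqrt{a}\right)\right) & \text{if } a \in (0, +\infty), \\
0 & \text{if } a = +\infty, \\
1_{\{b > 0\}}\,\ell^2 e^{-\frac{\ell^2 b}{2}} & \text{if } a = 0.
\end{cases}$$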
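A direct transcription of these two functions is sketched below, including the boundary cases. The function name `gamma_and_g` is a choice made here; the finite case is evaluated in plain floating point and may overflow when $\ell^2 (a - b)/2$ is very large.

```python
import math


def Phi(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def gamma_and_g(a, b, ell):
    """Return (Gamma(a, b), G(a, b)) for a in [0, +inf]."""
    if a == 0.0:
        gamma = ell**2 * math.exp(-(ell**2) * max(b, 0.0) / 2.0)
        g = ell**2 * math.exp(-(ell**2) * b / 2.0) if b > 0 else 0.0
        return gamma, g
    if math.isinf(a):
        return ell**2 / 2.0, 0.0
    ra = math.sqrt(a)
    # G is the second term of Gamma, so it is computed once and reused.
    g = ell**2 * math.exp(ell**2 * (a - b) / 2.0) * Phi(ell * (b / (2.0 * ra) - ra))
    gamma = ell**2 * Phi(-ell * b / (2.0 * ra)) + g
    return gamma, g


gamma_eq, g_eq = gamma_and_g(1.7, 1.7, 1.3)      # a = b, as in the stationary regime
acceptance = gamma_and_g(4.0, 1.0, 2.0)[0] / 2.0**2  # Gamma / ell^2 is the acceptance rate
```

On the diagonal $a = b$ both $\Phi$ arguments reduce to $-\ell\sqrt{a}/2$, so `gamma_eq` equals `2 * g_eq`.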

SLIDE 12

Remarks

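  • Limiting acceptance rate: $t \mapsto \mathbb{P}(A_{\lfloor nt \rfloor})$ converges to $t \mapsto \mathrm{acc}(a(t), b(t))$ where $a(t) = \mathbb{E}[(V'(X_t))^2]$, $b(t) = \mathbb{E}[V''(X_t)]$ and $\mathrm{acc}(a, b) = \frac{1}{\ell^2}\Gamma(a, b)$.
  • Stationary case: If $m(dx) = e^{-V(x)}dx$, then $\forall t \ge 0$, $X_t \sim e^{-V(x)}dx$ and

    $$a(t) = \mathbb{E}[(V'(X_t))^2] = \int_{\mathbb{R}} V'(V' e^{-V}) = \int_{\mathbb{R}} V'(-e^{-V})' = \int_{\mathbb{R}} V'' e^{-V} = \mathbb{E}[V''(X_t)] = b(t)$$

    are constant. Using the fact that for $a > 0$, $\Gamma(a, a) = 2G(a, a) = 2\ell^2\,\Phi\left(-\ell\sqrt{a}/2\right)$, we are back to the dynamics

    $$dX_t = -h(\ell)\,\tfrac{1}{2} V'(X_t)\,dt + \sqrt{h(\ell)}\,dB_t$$

    with $h(\ell) = 2\ell^2\,\Phi\left(-\frac{\ell}{2}\sqrt{\int_{\mathbb{R}} (V')^2 \exp(-V)}\right)$.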

SLIDE 13

Propagation of chaos

  • One can actually prove a propagation of chaos result.

Definition

A sequence $(\chi^n_1, \dots, \chi^n_n)_{n \ge 1}$ of exchangeable random variables is said to be $\nu$-chaotic if for fixed $k \in \mathbb{N}^*$, the law of $(\chi^n_1, \dots, \chi^n_k)$ converges in distribution to $\nu^{\otimes k}$ as $n$ goes to $\infty$.

The processes $((X^{1,n}_{\lfloor nt \rfloor}, \dots, X^{n,n}_{\lfloor nt \rfloor})_{t \ge 0})_{n \ge 1}$ are $P$-chaotic where $P$ is the law of the unique solution to the SDE nonlinear in the sense of McKean: $X_0 \sim m$,

$$dX_t = -G(a(t), b(t))\,V'(X_t)\,dt + \Gamma^{1/2}(a(t), b(t))\,dB_t$$

with $a(t) = \mathbb{E}[(V'(X_t))^2]$ and $b(t) = \mathbb{E}[V''(X_t)]$.

  • The assumption on the initial condition may then be replaced by: the initial positions $(X^{1,n}_0, \dots, X^{n,n}_0)_{n \ge 1}$ are exchangeable, $m$-chaotic and s.t. $\sup_n \mathbb{E}[(V'(X^{1,n}_0))^4] < \infty$.

SLIDE 14

Proof

The proof is based on:

  • A weak formulation of the nonlinear SDE (martingale problem)
  • Tightness arguments

This is a mean field limit.

SLIDE 15

Longtime convergence of the nonlinear SDE

SLIDE 16

Invariant measure

We would like to understand the longtime behavior of the nonlinear SDE dXt = −G(a(t), b(t))V ′(Xt)dt + Γ1/2(a(t), b(t)) dBt, where a(t) = E[(V ′(Xt))2] and b(t) = E[V ′′(Xt)].

Proposition

The probability measure e−V(x)dx is the unique invariant measure for this SDE.

SLIDE 17

Fokker-Planck equation

Denoting by $\psi_t$ the density of $X_t$, one has

$$\begin{cases}
\partial_t \psi_t = \partial_x \left( G(a[\psi_t], b[\psi_t])\,V' \psi_t + \tfrac{1}{2}\Gamma(a[\psi_t], b[\psi_t])\,\partial_x \psi_t \right), \\
a[\psi_t] = \int_{\mathbb{R}} (V'(x))^2 \psi_t(x)\,dx, \\
b[\psi_t] = \int_{\mathbb{R}} V''(x) \psi_t(x)\,dx.
\end{cases}$$

Question 1: Does $\psi_t$ converge to $\psi_\infty = \exp(-V)$?

Question 2: Is it possible to optimize the convergence by appropriately choosing $\ell$ (recall that the variance of the proposal is $\ell^2/n$, and thus that $\Gamma(a, b) = \Gamma(a, b, \ell)$ and $G(a, b) = G(a, b, \ell)$)?

SLIDE 18

Entropy techniques

To analyze the longtime behavior, we use entropy techniques.

Definition

The probability measure $\nu$ satisfies a log-Sobolev inequality with constant $\rho > 0$ (in short LSI($\rho$)) if, for any probability measure $\mu$ absolutely continuous with respect to $\nu$,

$$H(\mu|\nu) \le \frac{1}{2\rho} I(\mu|\nu)$$

where

  • $H(\mu|\nu) = \int \ln\left(\frac{d\mu}{d\nu}\right) d\mu$ is the Kullback-Leibler divergence (or relative entropy) of $\mu$ with respect to $\nu$,
  • $I(\mu|\nu) = \int \left|\nabla \ln\left(\frac{d\mu}{d\nu}\right)\right|^2 d\mu$ is the Fisher information of $\mu$ with respect to $\nu$.

Roughly speaking, $e^{-V}$ satisfies LSI($\rho$) for some $\rho > 0$ if $V$ has at least quadratic growth at $\infty$. In the Gaussian case $V(x) = \frac{x^2 + \ln(2\pi)}{2}$, $\exp(-V)$ satisfies LSI(1).
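As a sanity check of the definition in the Gaussian case, one can take $\nu = \mathcal{N}(0, 1)$ and restrict to Gaussian $\mu = \mathcal{N}(m, v)$, for which both functionals have closed forms: $H = \frac{1}{2}(v + m^2 - 1 - \ln v)$ and $I = m^2 + v - 2 + 1/v$. The sketch below only verifies LSI(1) on this family (an assumption made for illustration; the inequality itself holds for all $\mu$).

```python
import math


def entropy_and_fisher(m, v):
    """H(mu|nu) and I(mu|nu) for mu = N(m, v) and nu = N(0, 1), in closed form."""
    H = 0.5 * (v + m * m - 1.0 - math.log(v))
    I = m * m + v - 2.0 + 1.0 / v
    return H, I


rho = 1.0
cases = [(m, v) for m in (-2.0, 0.0, 0.5, 3.0) for v in (0.1, 0.5, 1.0, 2.0, 10.0)]
lsi_holds = all(
    entropy_and_fisher(m, v)[0] <= entropy_and_fisher(m, v)[1] / (2.0 * rho) + 1e-12
    for m, v in cases
)

# Pure translations (v = 1) saturate the inequality: H = I / 2 = m^2 / 2.
H_shift, I_shift = entropy_and_fisher(1.5, 1.0)
```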

SLIDE 19

Entropy techniques

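Recall the nonlinear FP equation:

$$\partial_t \psi_t = \partial_x \left( G(a[\psi_t], b[\psi_t])\,V' \psi_t + \tfrac{1}{2}\Gamma(a[\psi_t], b[\psi_t])\,\partial_x \psi_t \right).$$

We can prove exponential convergence of $\psi_t$ to the invariant density $\psi_\infty = e^{-V}$ in entropy.

Theorem

If $X_0$ admits a density $\psi_0$ s.t. $\mathbb{E}[(V'(X_0))^2] < +\infty$ and $H(\psi_0|\psi_\infty) < \infty$, then

$$\frac{d}{dt} H(\psi_t|\psi_\infty) \le -\frac{b[\psi_t]\,\Gamma(a[\psi_t], b[\psi_t]) - 2a[\psi_t]\,G(a[\psi_t], b[\psi_t])}{2(b[\psi_t] - a[\psi_t])}\, I(\psi_t|\psi_\infty) < 0.$$

If moreover $\psi_\infty = e^{-V}$ satisfies LSI($\rho$), then there exists a positive and non-increasing function $\lambda : [0, +\infty) \to (0, +\infty)$ such that

$$\forall t \ge 0, \quad H(\psi_t|\psi_\infty) \le e^{-t\,\lambda(H(\psi_0|\psi_\infty))}\, H(\psi_0|\psi_\infty).$$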

SLIDE 20

Elements of proof

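Writing $a$, $b$ for $a[\psi_t]$, $b[\psi_t]$, one has

$$\frac{d}{dt} H(\psi_t|\psi_\infty) = \int_{\mathbb{R}} \partial_t \psi_t \ln \psi_t + \int_{\mathbb{R}} V\,\partial_t \psi_t = -\frac{\Gamma(a, b)}{2}\, I(\psi_t|\psi_\infty) + (a - b)^2\, \frac{2G(a, b) - \Gamma(a, b)}{2(b - a)},$$

where $\frac{2G(a, b) - \Gamma(a, b)}{2(b - a)} \ge 0$. Moreover,

$$(a - b)^2 = \left( \int_{\mathbb{R}} (V')^2 \psi_t - \int_{\mathbb{R}} V'' \psi_t \right)^2 = \left( \int_{\mathbb{R}} V' (V' \psi_t + \partial_x \psi_t) \right)^2 = \left( \int_{\mathbb{R}} V'\, \partial_x \ln(\psi_t / e^{-V})\, \psi_t \right)^2 \le a\, I(\psi_t|\psi_\infty).$$

Hence

$$\frac{d}{dt} H(\psi_t|\psi_\infty) \le -\frac{b\,\Gamma(a, b) - 2a\,G(a, b)}{2(b - a)}\, I(\psi_t|\psi_\infty).$$

If $\psi_\infty$ satisfies LSI($\rho$), then (i) $-I(\psi_t|\psi_\infty) \le -2\rho\, H(\psi_t|\psi_\infty)$ and (ii) using the fact that $t \mapsto H(\psi_t|\psi_\infty)$ is decreasing, $\forall t \ge 0$, $2\rho\, \frac{b\,\Gamma(a, b) - 2a\,G(a, b)}{2(b - a)} \ge \lambda(H(\psi_0|\psi_\infty)) > 0$.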

SLIDE 21

Optimization strategies for the RWMH algorithm

SLIDE 22

Strategies to optimize the convergence of RWMH

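We want to choose $\ell$ in order to accelerate the convergence to equilibrium. Two natural strategies: (i) optimize the exponential rate of convergence to zero of $H(\psi_t|\psi_\infty)$, (ii) choose $\ell$ in order to obtain a constant average acceptance rate.

Preliminary remark: When $b \le 0$, one has $\frac{d}{dt} H(\psi_t|\psi_\infty) \le -\frac{\Gamma(a, b)}{2} \int_{\mathbb{R}} (\partial_x \ln \psi_t)^2 \psi_t$ with $\lim_{\ell \to \infty} \Gamma(a, b) = +\infty$. So one should choose $\ell$ as large as possible.

From now on, suppose that $b > 0$ (recall that in the longtime limit $b = a > 0$). We have:

$$\frac{d}{dt} H(\psi_t|\psi_\infty) \le -\frac{1}{2}\, \underbrace{\frac{b\,\Gamma(a, b) - 2a\,G(a, b)}{b - a}}_{\frac{1}{b} F\left(\frac{a}{b},\, \ell\sqrt{b}\right)}\, I(\psi_t|\psi_\infty) < 0,$$

where

$$F(s, \ell) = \begin{cases}
\ell^2 e^{-\frac{\ell^2}{2}} & \text{if } s = 0, \\
2\ell^2 \left( \left(1 + \frac{\ell^2}{4}\right) \Phi\left(-\frac{\ell}{2}\right) - \frac{\ell}{2\sqrt{2\pi}}\, e^{-\frac{\ell^2}{8}} \right) & \text{if } s = 1, \\
\frac{\ell^2}{1 - s} \left( \Phi\left(-\frac{\ell}{2\sqrt{s}}\right) + (1 - 2s)\, e^{\frac{\ell^2 (s - 1)}{2}}\, \Phi\left(\frac{\ell}{2\sqrt{s}} - \ell\sqrt{s}\right) \right) & \text{if } 0 < s \neq 1.
\end{cases}$$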

SLIDE 23

Choice of ℓ maximizing the exponential rate of cv

Lemma

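Let $b > 0$. Then $\ell \ge 0 \mapsto \frac{1}{b} F\left(\frac{a}{b}, \ell\sqrt{b}\right)$ admits a unique maximum at point $\tilde\ell^\star(a, b)$. Moreover $\tilde\ell^\star(a, b) = \frac{1}{\sqrt{b}}\, \ell^\star\left(\frac{a}{b}\right)$ where for any $s \ge 0$, $\ell^\star(s)$ realizes the unique maximum of $\ell \mapsto F(s, \ell)$. The function $s \mapsto \ell^\star(s)$ is continuous on $[0, +\infty)$ and

  • $\tilde\ell^\star(a, b) \sim_{a/b \to 0} \frac{\ell^\star(0)}{\sqrt{b}} = \frac{\sqrt{2}}{\sqrt{b}}$.
  • $\tilde\ell^\star(a, b) \sim_{a/b \to 1} \frac{\ell^\star(1)}{\sqrt{b}}$.
  • $\tilde\ell^\star(a, b) \sim_{a/b \to +\infty} \frac{x^\star \sqrt{a}}{b}$ where $x^\star \simeq 1.22$.

Remark: Since $dV(X_t) = V'(X_t) \left( \sqrt{\Gamma(a, b)}\,dB_t - G(a, b)\,V'(X_t)\,dt \right) + \frac{1}{2}\Gamma(a, b)\,V''(X_t)\,dt$, we have $\frac{d}{dt} \mathbb{E}[V(X_t)] = \frac{1}{2}\left(b\,\Gamma(a, b) - 2a\,G(a, b)\right)$ and $\tilde\ell^\star(a, b)$ also maximizes $\left|\frac{d}{dt} \mathbb{E}[V(X_t)]\right|$.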
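The lemma lends itself to a quick numerical illustration: evaluate $F(s, \ell)$ from its three-case definition and locate $\ell^\star(s)$ by a grid search. The function names and the search interval $[0, 6]$ are assumptions made here, and the plain floating point evaluation is only adequate for moderate values (roughly $s < 30$ with $\ell \le 6$).

```python
import math


def Phi(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def F(s, ell):
    """Rate function F(s, ell), following the three cases s = 0, s = 1, 0 < s != 1."""
    if s == 0.0:
        return ell**2 * math.exp(-(ell**2) / 2.0)
    if s == 1.0:
        return 2.0 * ell**2 * ((1.0 + ell**2 / 4.0) * Phi(-ell / 2.0) - 0.5 * ell * phi(ell / 2.0))
    rs = math.sqrt(s)
    tail = math.exp(ell**2 * (s - 1.0) / 2.0) * Phi(ell / (2.0 * rs) - ell * rs)
    return ell**2 / (1.0 - s) * (Phi(-ell / (2.0 * rs)) + (1.0 - 2.0 * s) * tail)


def ell_star(s, lo=0.0, hi=6.0, n_points=6001):
    """Grid-search maximizer of ell -> F(s, ell)."""
    grid = [lo + (hi - lo) * i / (n_points - 1) for i in range(n_points)]
    return max(grid, key=lambda ell: F(s, ell))


def ell_tilde_star(a, b):
    """Scaling relation of the lemma: maximizer for general (a, b) with b > 0."""
    return ell_star(a / b) / math.sqrt(b)


ell_star_0 = ell_star(0.0)   # should be close to sqrt(2)
ell_star_1 = ell_star(1.0)   # value used in the regime a/b -> 1
```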

SLIDE 24

Figure: Solid line: the function s → ℓ⋆(s). Dashed line: the function: s → x⋆√s.

SLIDE 25

Comparison with constant acceptance rate strategies

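Recall that the limiting mean acceptance rate is

$$\mathrm{acc}(a, b, \ell) = \frac{1}{\ell^2}\Gamma(a, b) = \mathcal{G}\left(\frac{a}{b}, \ell\sqrt{b}\right)$$

where $\mathcal{G}(s, \ell) = \Phi\left(-\frac{\ell}{2\sqrt{s}}\right) + e^{\frac{\ell^2 (s - 1)}{2}}\, \Phi\left(\ell\left(\frac{1}{2\sqrt{s}} - \sqrt{s}\right)\right)$ (written $\mathcal{G}$ to distinguish it from the drift coefficient $G(a, b)$).

Lemma

For $s > 0$, the function $\ell \mapsto \mathcal{G}(s, \ell)$ is decreasing. Moreover, for $\alpha \in (0, 1)$, the unique $\ell$ s.t. $\mathrm{acc}(a, b, \ell) = \alpha$ is $\tilde\ell_\alpha(a, b) = \frac{1}{\sqrt{b}}\, \ell_\alpha\left(\frac{a}{b}\right)$ where $\ell_\alpha(s)$ is the unique solution to $\mathcal{G}(s, \ell_\alpha(s)) = \alpha$. Last,

  • $\tilde\ell_\alpha(a, b) \sim_{a/b \to 0} \frac{\sqrt{-2\ln(\alpha)}}{\sqrt{b}}$.
  • $\tilde\ell_\alpha(a, b) \sim_{a/b \to 1} \frac{\ell_\alpha(1)}{\sqrt{b}}$.
  • $\tilde\ell_\alpha(a, b) \sim_{a/b \to \infty} -2\Phi^{-1}(\alpha)\, \frac{\sqrt{a}}{b}$.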
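Because the acceptance function is decreasing in $\ell$, the value $\ell_\alpha(s)$ of the lemma can be found by bisection. The sketch below is one possible implementation: the helper names are choices made here, and the far-tail branch replaces $\Phi(z)$ by its standard asymptotic expansion to avoid multiplying an overflowing exponential by an underflowing tail probability.

```python
import math


def Phi(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def scaled_tail(s, ell):
    """exp(ell^2 (s - 1) / 2) * Phi(ell / (2 sqrt(s)) - ell sqrt(s)), overflow-safe for s > 0."""
    rs = math.sqrt(s)
    z = ell * (1.0 / (2.0 * rs) - rs)
    c = ell**2 * (s - 1.0) / 2.0
    if z > -20.0:
        return math.exp(c) * Phi(z)
    # Far in the tail: Phi(z) ~ phi(z) / |z| * (1 - 1/z^2 + 3/z^4), combined in log scale.
    series = 1.0 - 1.0 / z**2 + 3.0 / z**4
    return math.exp(c - 0.5 * z * z) / (-z * math.sqrt(2.0 * math.pi)) * series


def acc(s, ell):
    """Limiting acceptance rate for a / b = s and rescaled step ell * sqrt(b) = ell."""
    if s == 0.0:
        return math.exp(-(ell**2) / 2.0)
    return Phi(-ell / (2.0 * math.sqrt(s))) + scaled_tail(s, ell)


def ell_alpha(s, alpha, n_iter=80):
    """Solve acc(s, ell) = alpha by bisection, using that acc is decreasing in ell."""
    lo, hi = 0.0, 1.0
    while acc(s, hi) > alpha:
        hi *= 2.0
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if acc(s, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


stationary = ell_alpha(1.0, 0.234)                    # s = 1: recovers about 2.38
small_s = ell_alpha(1e-6, 0.37)                       # regime a/b -> 0
small_s_limit = math.sqrt(-2.0 * math.log(0.37))
large_s = ell_alpha(400.0, 0.27)                      # regime a/b -> infinity
```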

SLIDE 26

Comparison with constant acceptance rate strategies

Remark 1: Notice that $\tilde\ell^\star(a, b) = \frac{1}{\sqrt{b}}\, \ell^\star\left(\frac{a}{b}\right)$ and $\tilde\ell_\alpha(a, b) = \frac{1}{\sqrt{b}}\, \ell_\alpha\left(\frac{a}{b}\right)$ have the same scaling in $(a, b)$.

→ Constant acceptance rate strategy seems sensible.

Remark 2: Choice of $\alpha$: how to choose $\alpha$ to get $\tilde\ell^\star(a, b) \simeq \tilde\ell_\alpha(a, b)$?

  • $a/b \to 0$: $\alpha = \frac{1}{e} \simeq 0.37$.
  • $a/b \to 1$: $\alpha$ such that $\ell_\alpha(1) = \ell^\star(1)$, namely $\alpha \simeq 0.35$.
  • $a/b \to \infty$: $\alpha = \Phi(-x^\star/2) \simeq 0.27$.

(Recall that the standard choice for the RWM under the stationarity assumption is $\alpha = 0.234$.)

→ Constant acceptance rate with $\alpha \in (1/4, 1/3)$ seems sensible.

Let us plot the relative difference in terms of exponential rate of convergence, for the three values $\alpha = \frac{1}{e} \simeq 0.37$, $\alpha \simeq 0.35$ and $\alpha = \Phi(-x^\star/2) \simeq 0.27$.

SLIDE 27

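[Figure: three panels, b = 1, b = 0.1 and b = 10; horizontal axis: a.]

Figure: $\frac{F\left(\frac{a}{b},\, \tilde\ell^\star(a, b)\sqrt{b}\right) - F\left(\frac{a}{b},\, \tilde\ell_\alpha(a, b)\sqrt{b}\right)}{F\left(\frac{a}{b},\, \tilde\ell^\star(a, b)\sqrt{b}\right)}$ as a function of $a$ for $b = 1, 0.1, 10$ and $\alpha \simeq 0.27$ (solid line), $\alpha \simeq 0.35$ (dashed line), $\alpha = e^{-1} \simeq 0.37$ (dotted line).

→ $\alpha \simeq 0.27$ seems to be the best compromise.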

SLIDE 28

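Gaussian target: $V(x) = \frac{1}{2}(x^2 + \ln(2\pi))$

Setting $m(t) \overset{\text{def}}{=} \mathbb{E}[X_t] = \int_{\mathbb{R}} x\,\psi_t(x)\,dx$ and $s(t) \overset{\text{def}}{=} \mathbb{E}[(X_t)^2] = \int_{\mathbb{R}} x^2 \psi_t(x)\,dx$, one has

$$H(\psi_t|\psi_\infty) = \frac{1}{2} \left( s(t) - \ln(s(t) - m(t)^2) - 1 \right),$$

$$\frac{d}{dt} H(\psi_t|\psi_\infty) = \frac{1}{2} \left( F(s, \ell)(1 - s) - \frac{F(s, \ell)(1 - s) + 2m^2\, G(s, 1, \ell)}{s - m^2} \right).$$

It is possible to compute numerically $\ell^{\mathrm{ent}}(m, s)$ maximizing $\left| \frac{d}{dt} H(\psi_t|\psi_\infty) \right|$.

To assess the convergence, we compute

$$t_0 \mapsto \hat I^m_{t_0, t_0+T} = \frac{1}{T} \sum_{k=t_0+1}^{t_0+T} \frac{X^{1,n}_k + \dots + X^{n,n}_k}{n}, \qquad t_0 \mapsto \hat I^s_{t_0, t_0+T} = \frac{1}{T} \sum_{k=t_0+1}^{t_0+T} \frac{(X^{1,n}_k)^2 + \dots + (X^{n,n}_k)^2}{n}.$$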
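For the Gaussian target, $V'(x) = x$ and $V'' = 1$, so $a(t) = s(t)$ and $b(t) = 1$, and the nonlinear SDE gives the closed moment system $\dot m = -G(s, 1)\,m$, $\dot s = \Gamma(s, 1) - 2s\,G(s, 1)$. The sketch below integrates this system with an explicit Euler scheme and tracks the entropy formula above; the initial values, the fixed $\ell = 2.38$, the step size and the function names are assumptions made for illustration.

```python
import math


def Phi(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def gamma_and_g(s, ell):
    """Gamma(s, 1) and G(s, 1) for 0 < s < +inf (plain floating point, moderate s only)."""
    rs = math.sqrt(s)
    g = ell**2 * math.exp(ell**2 * (s - 1.0) / 2.0) * Phi(ell * (1.0 / (2.0 * rs) - rs))
    return ell**2 * Phi(-ell / (2.0 * rs)) + g, g


def entropy(m, s):
    """Relative entropy of N(m, s - m^2) with respect to N(0, 1)."""
    return 0.5 * (s - math.log(s - m * m) - 1.0)


def integrate(m0, s0, ell, t_end=20.0, dt=1e-3):
    """Explicit Euler scheme for dm/dt = -G m, ds/dt = Gamma - 2 s G."""
    m, s = m0, s0
    H = [entropy(m, s)]
    for _ in range(int(round(t_end / dt))):
        gamma, g = gamma_and_g(s, ell)
        m, s = m - dt * g * m, s + dt * (gamma - 2.0 * s * g)
        H.append(entropy(m, s))
    return m, s, H


# Assumed initial law N(3, 0.5), i.e. m0 = 3 and s0 = 0.5 + 3^2 = 9.5.
m_T, s_T, H = integrate(m0=3.0, s0=9.5, ell=2.38)
```

Replacing the constant `ell` by a function of `(m, s)` inside the loop would give a way to compare the strategies discussed in the talk on this reduced system.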

SLIDE 29

[Figure: two panels, $\hat I^s$ and $\hat I^m$; horizontal axis: burn-in time; vertical axis: square bias; curves: $\ell = 2.38$, $\ell_{0.27} - A$, $\ell_{0.27} - N$, $\ell^{\mathrm{ent}}$, $\ell^\star$.]

Figure: $t_0 \mapsto$ square bias of $(\hat I^s_{t_0, T+t_0}, \hat I^m_{t_0, T+t_0})$, $(X^{1,n}_0, \dots, X^{n,n}_0) = (10, \dots, 10)$, $n = 100$ ($\ell_{0.27} - A$ → adaptive scaling Metropolis algorithm and $\ell_{0.27} - N$ → numerical approximation of $\ell_{0.27}(s, 1)$).

SLIDE 30

Conclusions:

  1. The constant $\ell$ strategy is bad;
  2. The constant average acceptance rate strategy (using $\ell_\alpha$) leads to convergence curves very close to those of the optimal exponential rate of convergence strategy (using $\ell^\star$);
  3. The optimal exponential rate of convergence strategy is as good as the best strategy one could design in terms of entropy decay (using $\ell^{\mathrm{ent}}$).

SLIDE 31

Example of non Gaussian target

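$$V(x) = \begin{cases} (x - 1)^2 (x + 1)^2 & \text{if } |x| \le 1, \\ 4x^2 - 8|x| + 4 & \text{otherwise.} \end{cases}$$

  • $I = \int_{\mathbb{R}} (V')^2 e^{-V} = 4.07$ so that $\frac{2.38}{\sqrt{I}} = 1.18$
  • $X^{i,n}_0$ i.i.d. $\sim \mathcal{N}_1(1, 0.143)$ so that $\mathbb{E}[(V'(X^{1,n}_0))^2] = \mathbb{E}[V''(X^{1,n}_0)] = 5.24$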

SLIDE 32

[Figure: two panels, $\hat I^s$ and $\hat I^m$; horizontal axis: burn-in time; vertical axis: square bias; curves: $\ell = 1.18$, $\ell_{0.27}$, $\ell_{0.35}$, $\ell^\star$.]

The constant acceptance rate strategies are implemented using an adaptive scaling Metropolis algorithm.

SLIDE 33

References

  • B. Jourdain, TL and B. Miasojedow, Optimal scaling for the transient phase of the random walk Metropolis algorithm: the mean-field limit, http://arxiv.org/abs/1210.7639.
  • B. Jourdain, TL and B. Miasojedow, Optimal scaling for the transient phase of Metropolis Hastings algorithms: the longtime behavior, http://arxiv.org/abs/1212.5517.

SLIDE 34

Optimal scaling of the transient phase of MALA (1)

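Consider the MALA algorithm:

$$X^{i,n}_{k+1} = X^{i,n}_k + \underbrace{\left( \sigma_n G^i_{k+1} - \frac{\sigma_n^2}{2} V'(X^{i,n}_k) \right)}_{Z^{i,n}_{k+1}}\, 1_{A_{k+1}}, \quad 1 \le i \le n,$$

where

$$A_{k+1} = \left\{ U_{k+1} \le e^{\sum_{i=1}^n \left( V(X^{i,n}_k) - V(X^{i,n}_k + Z^{i,n}_{k+1}) + \frac{1}{2}\left[ (G^i_{k+1})^2 - \left( G^i_{k+1} - \frac{\sigma_n}{2}\left( V'(X^{i,n}_k) + V'(X^{i,n}_k + Z^{i,n}_{k+1}) \right) \right)^2 \right] \right)} \right\}.$$

For $\sigma_n = \frac{\ell}{n^{1/4}}$ and $((X^{1,n}_0, \dots, X^{n,n}_0))_{n \ge 1}$ $m$-chaotic, one expects propagation of chaos for the processes $((X^{1,n}_{\lfloor \sqrt{n} t \rfloor}, \dots, X^{n,n}_{\lfloor \sqrt{n} t \rfloor})_{t \ge 0})_{n \ge 1}$ to the law of

$$dX_t = \sqrt{w(t, \ell)}\,dB_t - w(t, \ell)\,\tfrac{1}{2} V'(X_t)\,dt, \quad X_0 \sim m(dx),$$

where $w(t, \ell) = \ell^2 \left( e^{\frac{\ell^4}{8} \mathbb{E}\left[ \left( (V')^2 V'' + V^{(4)} - 2V^{(3)} V' - (V'')^2 \right)(X_t) \right]} \wedge 1 \right)$.

Remark: If $V(x) = \frac{x^2 + \ln(2\pi)}{2}$, then $\frac{d}{dt} \mathbb{E}(X_t^2) = \ell^2 \left( e^{\frac{\ell^4}{8} (\mathbb{E}(X_t^2) - 1)} \wedge 1 \right) (1 - \mathbb{E}(X_t^2))$, [Christensen, Roberts, Rosenthal 2005].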
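The ordinary differential equation in the remark is easy to integrate numerically. The sketch below uses an explicit Euler scheme with $s = \mathbb{E}(X_t^2)$; the value $\ell = 1$, the two starting points and the function names are illustrative assumptions.

```python
import math


def w(s, ell):
    """w for the Gaussian target: ell^2 * min(exp(ell^4 (s - 1) / 8), 1), with s = E[X_t^2]."""
    # exp(min(0, x)) equals min(exp(x), 1) and cannot overflow.
    return ell**2 * math.exp(min(0.0, ell**4 * (s - 1.0) / 8.0))


def second_moment(s0, ell, t_end=15.0, dt=1e-3):
    """Explicit Euler scheme for ds/dt = w(s, ell) * (1 - s)."""
    s = s0
    path = [s]
    for _ in range(int(round(t_end / dt))):
        s += dt * w(s, ell) * (1.0 - s)
        path.append(s)
    return path


from_below = second_moment(0.2, ell=1.0)  # start with E[X^2] < 1: damped regime
from_above = second_moment(3.0, ell=1.0)  # start with E[X^2] > 1: w = ell^2
```

Starting below 1, the exponential factor slows the dynamics down, and more so for larger $\ell$; starting above 1 the factor is capped at 1.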

SLIDE 35

Optimal scaling of the transient phase of MALA (2)

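$$w(t, \ell) = \ell^2 \left( e^{\frac{\ell^4}{8} \mathbb{E}\left[ \left( (V')^2 V'' + V^{(4)} - 2V^{(3)} V' - (V'')^2 \right)(X_t) \right]} \wedge 1 \right)$$

  • On time intervals such that $\mathbb{E}\left[ \left( (V')^2 V'' + V^{(4)} - 2V^{(3)} V' - (V'')^2 \right)(X_t) \right] < 0$, $\ell \mapsto w(t, \ell)$ is maximal at $\ell^\star \simeq \frac{1.42}{\mathbb{E}^{1/4}\left[ \left( 2V^{(3)} V' + (V'')^2 - (V')^2 V'' - V^{(4)} \right)(X_t) \right]}$.
  • On time intervals such that $\mathbb{E}\left[ \left( (V')^2 V'' + V^{(4)} - 2V^{(3)} V' - (V'')^2 \right)(X_t) \right] = 0$ (this is in particular the case at equilibrium), the correct scaling [Roberts, Rosenthal 1998] is $\sigma_n = \frac{\ell}{n^{1/6}}$ and one obtains a diffusive limit for $(X^{1,n}_{\lfloor n^{1/3} t \rfloor})_{t \ge 0}$. At equilibrium, there exists an optimal $\ell = \ell^\star$ and $\mathrm{acc}(\ell^\star) = 0.574$.
  • On time intervals such that $\mathbb{E}\left[ \left( (V')^2 V'' + V^{(4)} - 2V^{(3)} V' - (V'')^2 \right)(X_t) \right] > 0$, with the scaling $\sigma_n = \frac{\ell}{n^{1/4}}$, we have $w(t, \ell) = \ell^2 \to +\infty$ as $\ell \to +\infty$. One should take $\sigma_n \gg \frac{\ell}{n^{1/4}}$.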
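In the first regime above, write the (negative) expectation as $-c$ with $c > 0$, so that $w = \ell^2 e^{-c\ell^4/8}$. Setting the derivative of $2\ln\ell - c\ell^4/8$ to zero gives $\ell^4 = 4/c$, that is $\ell \approx 1.414\, c^{-1/4}$, in line with the constant of about 1.42 quoted on the slide. The sketch below compares this closed form with a grid search; the value of `c` and the function names are hypothetical.

```python
import math


def w_negative_regime(ell, c):
    """w = ell^2 * exp(-c * ell^4 / 8), valid when the expectation equals -c with c > 0."""
    return ell**2 * math.exp(-c * ell**4 / 8.0)


def ell_star_closed_form(c):
    """Stationary point of ell -> w_negative_regime(ell, c): ell^4 = 4 / c."""
    return (4.0 / c) ** 0.25


def ell_star_grid(c, hi=10.0, n_points=100001):
    """Grid-search maximizer on [0, hi], as an independent check of the closed form."""
    grid = (hi * i / (n_points - 1) for i in range(n_points))
    return max(grid, key=lambda ell: w_negative_regime(ell, c))


c = 2.5  # hypothetical value of minus the expectation
closed_form = ell_star_closed_form(c)
grid_value = ell_star_grid(c)
```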

SLIDE 36

Optimal scaling of the transient phase of MALA (3)

The case $\mathbb{E}\left[ \left( (V')^2 V'' + V^{(4)} - 2V^{(3)} V' - (V'')^2 \right)(X_t) \right] > 0$: one should take $\sigma_n$ going to zero as slowly as possible. Let us consider the Gaussian case $V(x) = (x^2 + \ln(2\pi))/2$, so that $\mathbb{E}\left[ \left( (V')^2 V'' + V^{(4)} - 2V^{(3)} V' - (V'')^2 \right)(X_t) \right] = \mathbb{E}(X_t^2 - 1)$.

Proposition

If the initial random variables $(X^{1,n}_0, \dots, X^{n,n}_0)$ are i.i.d. according to $m$ such that $\langle m, x^2 - 1 \rangle > 0$ and $\langle m, x^8 \rangle < +\infty$, and $\sigma_n$ satisfies

$$\lim_{n \to \infty} \sigma_n = 0 \quad \text{and} \quad \lim_{n \to \infty} n\sigma_n^2 = +\infty,$$

then the processes $\left( (X^{1,n}_{\lfloor t/\sigma_n^2 \rfloor})_{t \ge 0}, \dots, (X^{n,n}_{\lfloor t/\sigma_n^2 \rfloor})_{t \ge 0} \right)$ are $Q$-chaotic where $Q$ denotes the law of the Ornstein-Uhlenbeck process $dX_t = dB_t - \frac{X_t}{2}\,dt$, $X_0 \sim m$. Moreover, the limiting mean acceptance rate is 1.

Remark: this result still holds if $\lim_{n \to \infty} n\sigma_n^2 = 0$.

SLIDE 37

In summary

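              RWMH                                                            MALA
Equilibrium   $\sigma_n = \ell/\sqrt{n}$, $\mathrm{acc}(\ell^\star) = 0.234$   $\sigma_n = \ell/n^{1/6}$, $\mathrm{acc}(\ell^\star) = 0.574$
Transient     $\sigma_n = \ell/\sqrt{n}$, $\mathrm{acc}(\ell^\star) = 0.27$    $\sigma_n = \ell/n^{1/4}$, optimal $\ell$ ???

In all cases, the associated timescale is the diffusive one: $\left( X^{1,n}_{\lfloor t/\sigma_n^2 \rfloor} \right)_{t \ge 0}$.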