

SLIDE 1

Convergence and Efficiency of the Wang-Landau algorithm

Gersende FORT
LTCI, CNRS & Telecom ParisTech, Paris, France

Joint work with Benjamin Jourdain, Tony Lelièvre and Gabriel Stoltz (ENPC, France) and Estelle Kuhn (INRA Jouy-en-Josas, France). Paper: arXiv:1207.6880 [math.PR].

SLIDE 2

Wang-Landau: a biasing technique (1/3)

In molecular dynamics, models describe the state of the system: the locations of the N particles x_ℓ (e.g., a set of N points in R³) and sometimes their velocities. The interactions between the particles x₁, ..., x_N are described through a potential (Hamiltonian) H(x₁, ..., x_N). A state of the system is characterized by a probability π(x); e.g., in the canonical ensemble (NVT),

$$\pi(x) \propto \exp(-\beta H(x)), \qquad \beta \overset{\text{def}}{=} \frac{1}{k_B T} \quad \text{(inverse temperature)},$$

where x = (x₁, ..., x_N) ∈ X. The goal is to compute derivatives of the partition function, i.e., expectations under the distribution π, when the dimension of the support X is very large and π is multimodal (or metastable).

SLIDE 3

Wang-Landau: a biasing technique (2/3)

Exact computation of ∫ φ dπ is not possible (π is known only up to a normalizing constant, the domain of integration is very large, ...). (Markov chain) Monte Carlo methods sample points (X_t)_t such that

$$\frac{1}{T} \sum_{t=1}^{T} \varphi(X_t) \xrightarrow{\text{a.s.}} \int \varphi \, d\pi \qquad (T \to \infty).$$

Unfortunately, in metastable systems the points remain trapped in local modes for a very long time.

Fig.: [left] level curves of a potential in R² which is metastable in the first direction. [right] path of the first component of (X_t)_t.

In such situations, convergence takes very long to reach!

SLIDE 6

Wang-Landau: a biasing technique (3/3)

It is not possible to solve the metastability problem in full generality (number of modes, size of the barriers between metastable states, which increase with the dimension N, ...). Nevertheless, in molecular dynamics it is often possible to identify a reaction coordinate, that is, in some sense, a "direction of metastability".

A new approach to define samplers robust to metastability:
- sample from a biased distribution π⋆ such that
  - the image of π⋆ under the reaction coordinate O is uniform: O(X) has a uniform distribution when X ∼ π⋆;
  - the conditional distribution of π⋆ given O(x) equals the conditional distribution of π given O(x);
- approximate integrals w.r.t. π by an importance sampling algorithm with proposal π⋆.

SLIDE 8

Outline

- The Wang-Landau algorithm
- Convergence of the Wang-Landau algorithm
- Efficiency of the Wang-Landau algorithm
- Conclusion
- Bibliography

SLIDE 9

The original Wang-Landau algorithm (1/3)

Assume π(x) ∝ exp(−β H(x)) on a discrete (but large) space X, and the goal is to compute

$$\sum_{x \in X} \Phi(H(x)) \, \pi(x).$$

Then

$$\sum_{x} \Phi(H(x)) \, \pi(x) = \frac{\sum_{e \in H(X)} \Phi(e) \, g(e) \, e^{-\beta e}}{\sum_{e' \in H(X)} g(e') \, e^{-\beta e'}},$$

where g is the density of states:

$$g(e) \overset{\text{def}}{=} \sum_{x \in X} \mathbb{1}_{\{H(x) = e\}}.$$
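This identity (group the sum over configurations by energy level) is easy to check numerically. Below is a small Python sketch on a hypothetical toy energy function, not one from the slides:

```python
import math

# Hypothetical toy system: X = {0,...,11} on a ring, H(x) = min(x % 6, 6 - x % 6)
X = range(12)
H = {x: min(x % 6, 6 - x % 6) for x in X}
beta = 0.7
Phi = lambda e: e * e                      # any function of the energy

# left-hand side: sum over configurations x
Z = sum(math.exp(-beta * H[x]) for x in X)
lhs = sum(Phi(H[x]) * math.exp(-beta * H[x]) / Z for x in X)

# right-hand side: sum over energy levels, using the density of states g
levels = sorted(set(H.values()))
g = {e: sum(1 for x in X if H[x] == e) for e in levels}
rhs = (sum(Phi(e) * g[e] * math.exp(-beta * e) for e in levels)
       / sum(g[e] * math.exp(-beta * e) for e in levels))

print(abs(lhs - rhs) < 1e-12)
```

The point of the identity is that g has as many entries as there are energy levels, which is far fewer than the number of configurations.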

SLIDE 10

The original Wang-Landau algorithm (2/3)

Density of states: g(e) = Σ_{x∈X} 1_{H(x)=e}. g(e) cannot be calculated exactly for large systems. Although the total number of configurations increases exponentially with the size of the system, the total number of possible energy levels increases only linearly with the size of the system. Example: about 2L² energy levels compared to q^{L²} configurations for a q-state Potts model on an L × L lattice with nearest-neighbor interactions.

Wang and Landau (2001) proposed to perform a random walk in the energy space in order to estimate g(e) for every e. With the density of states, one can calculate most thermodynamic quantities at any inverse temperature β: free energy, internal energy, specific heat, i.e., normalizing constant, expectation and variance under π.
SLIDE 11

The original Wang-Landau algorithm (3/3)

Algorithm:
Initialisation: density of states g(e) = 1 for every e; modification factor f₀.
LOOP 1: Repeat
- run a Markov chain in the energy space with acceptance ratio Q(e, e') = 1 ∧ g(e)/g(e');
- update the histogram in the energy space: if E is the new point, ln g(E) ← ln g(E) + ln f_t;
until a flat histogram is reached.
LOOP 2: Repeat LOOP 1 with a new modification factor f_{t+1} ← √f_t, until the modification factor is smaller than a predefined value.

Why does it work? The intuition: the chain Q is reversible w.r.t. the distribution ∝ 1/g(e), and under this distribution the histogram of visited energies is uniform.
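The two loops above can be sketched in a few lines of Python. This is a minimal illustration on a hypothetical toy energy function on a ring of states (not a system from the slides); the 80% flat-histogram threshold and the halving schedule ln f ← ln f / 2 are common conventional choices:

```python
import math
import random

random.seed(0)

# Toy discrete system: states 0..L-1 on a ring with an assumed two-well energy H.
L = 20
H = [min(i % 10, 10 - i % 10) for i in range(L)]     # hypothetical energies
levels = sorted(set(H))

def wang_landau(log_f0=1.0, f_min=1e-4, flat_ratio=0.8):
    log_g = {e: 0.0 for e in levels}                 # ln g(e), initialised to ln 1 = 0
    log_f = log_f0
    x = 0
    while log_f > f_min:                             # LOOP 2: shrink the modification factor
        hist = {e: 0 for e in levels}
        while True:                                  # LOOP 1: run until the histogram is flat
            y = (x + random.choice([-1, 1])) % L     # propose a neighbour on the ring
            # accept with probability 1 ∧ g(H(x))/g(H(y)) (target ∝ 1/g)
            if random.random() < math.exp(min(0.0, log_g[H[x]] - log_g[H[y]])):
                x = y
            log_g[H[x]] += log_f                     # penalise the current energy level
            hist[H[x]] += 1
            mean = sum(hist.values()) / len(hist)
            if mean > 100 and min(hist.values()) > flat_ratio * mean:
                break                                # flat-histogram criterion reached
        log_f /= 2.0                                 # f <- sqrt(f), i.e. ln f <- ln f / 2
    return log_g

log_g = wang_landau()
# g is only estimated up to a multiplicative constant: compare ratios with exact counts.
exact = {e: H.count(e) for e in levels}
for e in levels:
    print(e, exact[e], round(math.exp(log_g[e] - log_g[levels[0]]) * exact[levels[0]], 1))
```

On this toy system the exact counts are known, so the estimated ratios g(e)/g(e₀) can be checked directly.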

SLIDE 13

General Wang-Landau (1/3)

How to sample a metastable target distribution π on a general state space X? Choose a partition X₁, ..., X_d of X. Then

$$\pi(x) = \sum_{i=1}^{d} \mathbb{1}_{X_i}(x) \, \pi(x).$$

Consider a family of biased distributions (π_θ, θ ∈ R^d) on X:

$$\pi_\theta(x) \propto \sum_{i=1}^{d} \frac{1}{\theta(i)} \, \mathbb{1}_{X_i}(x) \, \pi(x),$$

where θ = (θ(1), ..., θ(d)) satisfies Σ_i θ(i) = 1 and θ(i) ≥ 0.

Run an algorithm which combines
- sampling under π_{θ_t} (exact or MCMC),
- updates of the biasing factor θ_{t+1} ← θ_t + ...,
in such a way that (θ_t)_t converges to θ⋆ = (π(X₁), ..., π(X_d)) and (π_{θ_t})_t converges to π_{θ⋆}, which satisfies π_{θ⋆}(X_i) = 1/d.

SLIDE 16

General Wang-Landau (2/3)

When the algorithm converges, θ_t(i) ≈ π(X_i). Integrals w.r.t. π are then obtained by importance sampling:

$$\int \varphi \, d\pi \approx \frac{1}{T} \sum_{t=1}^{T} \Big( d \sum_{i=1}^{d} \theta_t(i) \, \mathbb{1}_{X_i}(X_t) \Big) \varphi(X_t).$$
SLIDE 17

General Wang-Landau (3/3)

Set θ⋆ = (π(X₁), ..., π(X_d)).

Algorithm:
Initialisation: X₀ and θ₀ = (1/d, ..., 1/d).
Repeat: given (X_t, θ_t),
- sample X_{t+1} ∼ P_{θ_t}(X_t, ·), where P_θ is a Markov kernel with invariant distribution π_θ;
- update the weights: θ_{t+1} = θ_t + γ_{t+1} H(θ_t, X_{t+1}), where the field H is chosen so that θ⋆ is a zero of θ ↦ ∫ π_θ(dx) H(θ, x), and (γ_t)_t is a positive stepsize sequence.

SLIDE 18

Wang-Landau in Statistics

Multicanonical sampling (Atchadé & Liu, 2010).

Simulated Tempering (Atchadé & Liu, 2010): target ρ on X̃, temperatures T₁ > T₂ > ... > T_d = 1; X = X̃ × {1, ..., d},

$$\theta_\star(i) = \int \rho^{1/T_i}(dx), \qquad \pi_\theta(x, i) \propto \frac{1}{\theta(i)} \, \rho^{1/T_i}(x).$$

Trans-dimensional MCMC (Atchadé & Liu, 2010): X̃ = ∪_{k=1}^{K} X_k, target ∝ Σ_{k=1}^{K} ρ_k(x) 1_{X_k}(x) on X̃; X = X̃ × {1, ..., d},

$$\theta_\star(i) = \int_{X_i} \rho_i(dx), \qquad \pi_\theta(x, i) \propto \frac{1}{\theta(i)} \, \rho_i(x) \, \mathbb{1}_{X_i}(x).$$

Variable selection (Bornn et al., 2013): target is the posterior distribution π of a binary vector; reaction coordinate: partition of the energy state −log π(X).

Bayesian inference in mixture models (Bornn et al., 2013).

SLIDE 19

Outline

- The Wang-Landau algorithm
- Convergence of the Wang-Landau algorithm
- Efficiency of the Wang-Landau algorithm
- Conclusion
- Bibliography

SLIDE 20

WL: an example of adaptive MCMC (1/2)

A family of target distributions (π_θ)_{θ∈Θ} and a family of transition kernels (P_θ)_{θ∈Θ} such that π_θ P_θ = π_θ. WL defines a random sequence ((X_t, θ_t))_t such that

$$E\big[\varphi(X_{t+1}) \mid \theta_0, X_0, \cdots, \theta_t, X_t\big] = \int P_{\theta_t}(X_t, dy) \, \varphi(y),$$

and the parameter θ_t is updated by a stochastic approximation algorithm.
SLIDE 21

WL: an example of adaptive MCMC (2/2)

In the literature, there are different strategies for the update of (θ_t, γ_t) such that Σ_{i=1}^d θ_t(i) = 1 and θ_t(i) ≥ 0.

(exponential update) For any i ∈ {1, ..., d},

$$\theta_{t+1}(i) = \frac{\theta_t(i) \exp\big( \gamma_{t+1} (\mathbb{1}_{X_i}(X_{t+1}) - 1/d) \big)}{\sum_{\ell=1}^{d} \theta_t(\ell) \exp\big( \gamma_{t+1} (\mathbb{1}_{X_\ell}(X_{t+1}) - 1/d) \big)}.$$

(linearized version) If X_{t+1} ∈ X_i,

$$\theta_{t+1}(i) = \theta_t(i) + \gamma_{t+1} \, \theta_t(i) (1 - \theta_t(i)), \qquad \theta_{t+1}(k) = \theta_t(k) - \gamma_{t+1} \, \theta_t(k) \theta_t(i), \quad k \neq i.$$

↪ For the next move, the probability of sampling a point in the current stratum X_i is reduced. The chain is pushed towards strata with a weaker frequency of visits, thus improving the exploration of the space.

The stepsize sequence (γ_t)_t decreases deterministically OR randomly (based on a flat-histogram criterion, for example). In our work, we consider the linearized update and a deterministic stepsize sequence (γ_t)_t.
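The general algorithm with the linearized update can be sketched as follows in Python. The target, the partition, and the random-walk Metropolis proposal are hypothetical stand-ins chosen to make the example self-contained (a target that is constant on each of three strata, with a low-probability middle stratum acting as the transition region), not the examples from the slides; the exponent 0.8 in the stepsize is one of the choices used later in the talk:

```python
import random

random.seed(1)

# Hypothetical toy target on X = {0,...,29} (a ring), partitioned into d = 3
# strata of 10 states; pi is constant within each stratum, and the middle
# stratum is a low-probability transition region: metastability across strata.
X_SIZE, d = 30, 3
stratum = [x // 10 for x in range(X_SIZE)]
w = [1.0, 0.002, 1.0]                                  # unnormalized stratum weights
Z = sum(w[stratum[x]] for x in range(X_SIZE))
pi = [w[stratum[x]] / Z for x in range(X_SIZE)]

theta = [1.0 / d] * d
x = 0
T = 200_000
is_sum = 0.0                                           # importance-sampling accumulator

for t in range(1, T + 1):
    # Metropolis step targeting pi_theta(x) ∝ pi(x) / theta(stratum(x))
    y = (x + random.choice([-1, 1])) % X_SIZE
    ratio = (pi[y] / theta[stratum[y]]) / (pi[x] / theta[stratum[x]])
    if random.random() < min(1.0, ratio):
        x = y
    # linearized Wang-Landau update of the weights, gamma_t = 1/t^0.8
    gamma = 1.0 / t ** 0.8
    i = stratum[x]
    ti = theta[i]                                      # snapshot so the sum stays 1
    for k in range(d):
        theta[k] += gamma * theta[k] * ((1.0 if k == i else 0.0) - ti)
    # importance-sampling estimate of E_pi[f], f(x) = x, with weight d * theta(i)
    is_sum += d * theta[i] * x

theta_star = [sum(pi[j] for j in range(X_SIZE) if stratum[j] == s) for s in range(d)]
print([round(v, 4) for v in theta], [round(v, 4) for v in theta_star])
print(round(is_sum / T, 2))   # by symmetry of this toy target, E_pi[X] = 14.5
```

At θ⋆ the biased target π_{θ⋆} is flat over the whole ring, so the chain crosses the transition region freely, which is exactly the point of the reweighting.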

SLIDE 23

A numerical illustration (1/2)

Target density: π(x₁, x₂) ∝ exp(−β H(x₁, x₂)) 1_{[−R,R]}(x₁).

Fig.: [left] Level curves of the potential H. [center, right] Density π up to a normalizing constant (β = 1 and β = 5).

The larger β is, the larger is the ratio between the weight of the strata located near the main metastable states and the weight of the transition region (near x₁ = 0).

SLIDE 24

A numerical illustration (2/2)

R = 2.4; d = 48 strata, partitioned along the x-axis. The P_θ are Hastings-Metropolis kernels with proposal distribution N(0, (2R/d)² I) and target π_θ. X₀ = (−1, 0). The stepsize sequence is γ_t ∼ c/t^{0.8}.

Fig.: [left] The sequences (θ_t(i))_t. [right] The limiting values θ⋆(i).

SLIDE 25

Sufficient conditions for the convergence of adaptive MCMC (1/2)

Roberts and Rosenthal (2007); F., Moulines and Priouret (2012).

For the proof of the ergodicity, observe that

$$E[f(X_t)] - \pi_{\theta_\star}(f) = E\big[ f(X_t) - E[f(X_t) \mid \mathcal{F}_{t-\ell}] \big] + E\big[ E[f(X_t) \mid \mathcal{F}_{t-\ell}] - P^{\ell}_{\theta_{t-\ell}} f(X_{t-\ell}) \big] + E\big[ P^{\ell}_{\theta_{t-\ell}} f(X_{t-\ell}) - \pi_{\theta_{t-\ell}}(f) \big] + E\big[ \pi_{\theta_{t-\ell}}(f) - \pi_{\theta_\star}(f) \big].$$

Convergence holds when
- the first term is null;
- the second term is small when adaptation is diminishing;
- the third term is small when the transition kernels (P_θ, θ ∈ Θ) are ergodic (enough), at a rate which is uniform (enough) in θ (containment condition);
- the last term is small provided (θ_t, t ≥ 0) converges to θ⋆, since in our case

$$\|\pi_\theta - \pi_{\theta_\star}\|_{TV} \le 2(d-1) \sum_{i=1}^{d} \Big| 1 - \frac{\theta(i)}{\theta_\star(i)} \Big|.$$

SLIDE 26

Sufficient conditions for the convergence of adaptive MCMC (2/2)

For the convergence of the weight sequence (θ_t)_t, observe that

$$\theta_{t+1} = \theta_t + \gamma_{t+1} H(\theta_t, X_{t+1}) = \theta_t + \gamma_{t+1} h(\theta_t) + \gamma_{t+1} \big( H(\theta_t, X_{t+1}) - h(\theta_t) \big),$$

where the mean field h is defined by

$$h(\theta) \overset{\text{def}}{=} \int H(\theta, x) \, \pi_\theta(dx) = \Big( \sum_{i=1}^{d} \frac{\theta_\star(i)}{\theta(i)} \Big)^{-1} (\theta_\star - \theta).$$

Convergence to θ⋆ holds when
- the O.D.E. θ̇ = h(θ) converges to θ⋆ (Lyapunov function, ...) (stability condition);
- the sequence (θ_t)_t visits infinitely often a compact subset of {θ : θ(i) > 0 and Σ_{i=1}^d θ(i) = 1};
- the noise sequence is small enough: Σ_t γ_t = ∞ and Σ_t γ_t² < ∞, and the transition kernels (P_θ, θ ∈ Θ) are ergodic (enough) and smooth enough in θ.
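For the linearized update, the mean-field formula above can be checked by a short direct computation (a sketch, using the definitions of H, π_θ, and θ⋆ from the previous slides):

```latex
\begin{align*}
\text{For } x \in X_i:\quad
H(\theta,x)(k) &= \theta(k)\big(\mathbb{1}_{\{k=i\}} - \theta(i)\big)
  = \theta(k)\Big(\mathbb{1}_{X_k}(x) - \sum_{j=1}^{d}\theta(j)\,\mathbb{1}_{X_j}(x)\Big),\\
\pi_\theta(X_k) &= \frac{\theta_\star(k)/\theta(k)}{\sum_{j=1}^{d}\theta_\star(j)/\theta(j)}
  \qquad\text{(normalizing the biased density stratum by stratum)},\\
h(\theta)(k) = \int H(\theta,x)(k)\,\pi_\theta(dx)
 &= \theta(k)\Big(\pi_\theta(X_k) - \sum_{j=1}^{d}\theta(j)\,\pi_\theta(X_j)\Big)
  = \Big(\sum_{j=1}^{d}\frac{\theta_\star(j)}{\theta(j)}\Big)^{-1}\big(\theta_\star(k)-\theta(k)\big),
\end{align*}
```

where the last equality uses Σ_j θ⋆(j) = 1, so that the mean field indeed vanishes exactly at θ = θ⋆.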

SLIDE 28

Main results: assumptions (1/5)

1. The target distribution has a density π w.r.t. a measure λ on X ⊂ R^p, with sup_X π < ∞.
2. The partition (X_i)_i is such that θ⋆(i) := ∫_{X_i} π dλ > 0.
3. For any θ ∈ Θ, P_θ is a Hastings-Metropolis kernel with proposal q and invariant distribution π_θ. It is assumed that inf_{X²} q > 0.
4. The stepsize sequence (γ_t)_t satisfies Σ_t γ_t = +∞ and Σ_t γ_t² < ∞.

Under these assumptions, there exists ρ ∈ (0,1) such that for any θ,

$$\sup_{x \in X} \| P^t_\theta(x, \cdot) - \pi_\theta \|_{TV} \le 2 (1 - \rho)^t.$$

SLIDE 29

Main result: stability of (θ_t)_t (2/5)

Theorem (F., Jourdain, Kuhn, Lelièvre, Stoltz, 2012). Under the stated assumptions and inf_X π > 0,

$$P\Big( \limsup_t \; \min_{1 \le i \le d} \theta_t(i) > 0 \Big) = 1.$$

Sketch of the proof: T_k < ∞ w.p. 1, where the T_k are the successive times at which a sample X_n is drawn in the stratum i⋆ such that θ_n(i⋆) = min_k θ_n(k). We prove that P(lim sup_k [min_i θ_{T_k−1}(i)] > 0) = 1, and a key property for this proof is

$$P_\theta(x, X_j) \, \mathbb{1}_{X_i}(x) \le C \Big( 1 \wedge \frac{\theta(i)}{\theta(j)} \Big).$$

↪ Low probability of moving from a stratum with small weight to a stratum with large weight.

SLIDE 30

Main result: convergence of (θ_t)_t (3/5)

Theorem (F., Jourdain, Kuhn, Lelièvre, Stoltz, 2012). Under the stated assumptions and the stability of the sequence (θ_t)_t,

$$P\big( \lim_t \theta_t = \theta_\star \big) = 1.$$

Sketch of the proof: check the conditions of Andrieu, Moulines and Priouret (2005). Main ingredients:
- the Lyapunov function V associated to the mean field h,

$$V(\theta) = - \sum_{i=1}^{d} \theta_\star(i) \log \frac{\theta(i)}{\theta_\star(i)};$$

- the uniform (in x, θ) geometric ergodicity of the transition kernels P_θ;
- the regularity properties

$$\|\pi_\theta - \pi_{\theta'}\|_{TV} \le 2(d-1) \sum_{i=1}^{d} \Big| 1 - \frac{\theta(i)}{\theta'(i)} \Big|, \qquad \sup_{x \in X} \| P_\theta(x, \cdot) - P_{\theta'}(x, \cdot) \|_{TV} \le 4 \sup_i \Big| 1 - \frac{\theta(i)}{\theta'(i)} \Big| + 4 \sup_i \Big| 1 - \frac{\theta'(i)}{\theta(i)} \Big|.$$

SLIDE 32

Main result: ergodicity and LLN for the samples (X_t)_t (4/5)

Theorem (F., Jourdain, Kuhn, Lelièvre, Stoltz, 2012). Under the stated assumptions and the stability of the sequence (θ_t)_t,

$$\lim_t E[f(X_t)] = \int f(x) \, \pi_{\theta_\star}(x) \, \lambda(dx), \qquad \frac{1}{T} \sum_{t=1}^{T} f(X_t) \xrightarrow{\text{a.s.}} \int f(x) \, \pi_{\theta_\star}(x) \, \lambda(dx),$$

for any bounded measurable function f.

Proof: check the conditions of F., Moulines and Priouret (2012). Main ingredients: the uniform (in x, θ) geometric ergodicity of the transition kernels P_θ, and the regularity properties

$$\|\pi_\theta - \pi_{\theta'}\|_{TV} \le 2(d-1) \sum_{i=1}^{d} \Big| 1 - \frac{\theta(i)}{\theta'(i)} \Big|, \qquad \sup_{x \in X} \| P_\theta(x, \cdot) - P_{\theta'}(x, \cdot) \|_{TV} \le 4 \sup_i \Big| 1 - \frac{\theta(i)}{\theta'(i)} \Big| + 4 \sup_i \Big| 1 - \frac{\theta'(i)}{\theta(i)} \Big|.$$

SLIDE 34

Main result: ergodicity and LLN for the weighted samples (X_t)_t (5/5)

Theorem (F., Jourdain, Kuhn, Lelièvre, Stoltz, 2012). Under the stated assumptions and the stability of the sequence (θ_t)_t,

$$\lim_t E\Big[ d \sum_{i=1}^{d} \theta_t(i) \, f(X_t) \, \mathbb{1}_{X_i}(X_t) \Big] = \int f(x) \, \pi(x) \, \lambda(dx), \qquad \frac{1}{T} \sum_{t=1}^{T} \Big( d \sum_{i=1}^{d} \theta_t(i) \, \mathbb{1}_{X_i}(X_t) \Big) f(X_t) \xrightarrow{\text{a.s.}} \int f(x) \, \pi(x) \, \lambda(dx),$$

for any bounded measurable function f.

SLIDE 35

Outline

- The Wang-Landau algorithm
- Convergence of the Wang-Landau algorithm
- Efficiency of the Wang-Landau algorithm
- Conclusion
- Bibliography

SLIDE 36

Introduction

Wang-Landau algorithms are designed to switch as fast as possible from one metastable state to another, in order to efficiently explore the whole configuration space. We obtained convergence results on WL, but how can we study the efficiency of WL, and how can we compare WL to a non-adaptive MCMC sampler? We now discuss:
- a comparison in terms of how rapidly the sampler escapes from a metastable state;
- explicit computation of exit times for a simple model, and a numerical study for a more complex one.

SLIDE 37

Central Limit Theorem on the weight sequence

Theorem (F., Jourdain, Kuhn, Lelièvre, Stoltz, 2012). Under the stated assumptions, when γ_t ∼ γ⋆/t^α with 1/2 < α < 1,

$$\gamma_t^{-1/2} (\theta_t - \theta_\star) \xrightarrow{d} \mathcal{N}_d(0, U_\star),$$

where

$$U_\star = d^2 \int_X \Big( \hat{H}_\star(x) \hat{H}_\star^T(x) - P_{\theta_\star} \hat{H}_\star(x) \, P_{\theta_\star} \hat{H}_\star^T(x) \Big) \pi_{\theta_\star}(x) \, \lambda(dx), \qquad \hat{H}_\star(x) = \sum_{\ell \ge 0} P^\ell_{\theta_\star} \big( H(\theta_\star, \cdot) - h(\theta_\star) \big)(x).$$

A similar result holds when γ_t ∼ γ⋆/t.

SLIDE 38

Toy example (1/3)

Consider the target distribution on X = {1, 2, 3}:

$$\pi(1) = \pi(3) = \frac{1}{2 + \epsilon}, \qquad \pi(2) = \frac{\epsilon}{2 + \epsilon}.$$

The proposal distribution in WL (for the kernels P_θ) and in HM is

$$Q = \begin{pmatrix} 2/3 & 1/3 & 0 \\ 1/3 & 1/3 & 1/3 \\ 0 & 1/3 & 2/3 \end{pmatrix},$$

a proposal kernel only allowing jumps to the closest strata. We compute the time T_{1→3} to reach state 3 starting from state 1, for WL and for a Hastings-Metropolis (HM) algorithm.

SLIDE 39

Toy example (2/3)

Here are the transition kernels for HM (top) and WL (bottom):

$$P = \begin{pmatrix} 1 - \frac{\epsilon}{3} & \frac{\epsilon}{3} & 0 \\[3pt] \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\[3pt] 0 & \frac{\epsilon}{3} & 1 - \frac{\epsilon}{3} \end{pmatrix}$$

$$P_\theta = \begin{pmatrix} 1 - \frac{1}{3}\Big(\frac{\epsilon\,\theta(1)}{\theta(2)} \wedge 1\Big) & \frac{1}{3}\Big(\frac{\epsilon\,\theta(1)}{\theta(2)} \wedge 1\Big) & 0 \\[6pt] \frac{1}{3}\Big(\frac{\theta(2)}{\epsilon\,\theta(1)} \wedge 1\Big) & 1 - \frac{1}{3}\Big(\frac{\theta(2)}{\epsilon\,\theta(1)} \wedge 1\Big) - \frac{1}{3}\Big(\frac{\theta(2)}{\epsilon\,\theta(3)} \wedge 1\Big) & \frac{1}{3}\Big(\frac{\theta(2)}{\epsilon\,\theta(3)} \wedge 1\Big) \\[6pt] 0 & \frac{1}{3}\Big(\frac{\epsilon\,\theta(3)}{\theta(2)} \wedge 1\Big) & 1 - \frac{1}{3}\Big(\frac{\epsilon\,\theta(3)}{\theta(2)} \wedge 1\Big) \end{pmatrix}$$

In WL, when the chain gets stuck in (say) state 1, θ_n(1) increases, which penalizes state 1 and favors moves to state 2.

SLIDE 40

Toy example (3/3)

Yes, Wang-Landau is less metastable! For Hastings-Metropolis, T_{1→3} scales like 6/ǫ:

$$\frac{\epsilon}{6} \, E[T_{1\to 3}] \sim_{\epsilon \to 0} 1, \qquad \lim_{\epsilon \to 0} P\Big( \frac{\epsilon}{6} T_{1\to 3} > c \Big) = \exp(-c).$$

For Wang-Landau, with a stepsize sequence γ_t = γ⋆/t^α:
- for α ∈ (1/2, 1), there exist constants C₁, C₂ such that

$$\lim_{\epsilon \to 0} P\big( |\ln \epsilon|^{-1/(1-\alpha)} \, T_{1\to 3} \in (C_1, C_2) \big) = 1,$$

so T_{1→3} scales like |ln ǫ|^{1/(1−α)};
- for α = 1, T_{1→3} scales like ǫ^{−1/(1+γ⋆)}.
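The exit-time comparison on this toy example is easy to reproduce by simulation. Below is a minimal Monte Carlo sketch in Python; the kernels are the ones given above, while the number of replications M, the value of ǫ, and the stepsize parameters γ⋆ = 1, α = 0.6 are arbitrary illustrative choices:

```python
import random

random.seed(2)

def exit_time_hm(eps):
    """Steps for the HM chain on {1,2,3} to reach 3 from 1 (kernel P above)."""
    pi = {1: 1.0, 2: eps, 3: 1.0}              # unnormalized target
    x, t = 1, 0
    while x != 3:
        t += 1
        y = x + random.choice([-1, 0, 1])      # proposal Q: self or nearest states
        if y in pi and random.random() < min(1.0, pi[y] / pi[x]):
            x = y
    return t

def exit_time_wl(eps, gamma_star=1.0, alpha=0.6):
    """Steps for the WL chain (linearized update, gamma_t = gamma_star / t^alpha)."""
    pi = {1: 1.0, 2: eps, 3: 1.0}
    theta = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}
    x, t = 1, 0
    while x != 3:
        t += 1
        y = x + random.choice([-1, 0, 1])
        if y in pi:
            ratio = (pi[y] / theta[y]) / (pi[x] / theta[x])   # kernel P_theta above
            if random.random() < min(1.0, ratio):
                x = y
        gamma = gamma_star / t ** alpha
        ti = theta[x]                          # snapshot so the weights stay normalized
        for k in theta:
            theta[k] += gamma * theta[k] * ((1.0 if k == x else 0.0) - ti)
    return t

eps, M = 0.01, 200
hm = sum(exit_time_hm(eps) for _ in range(M)) / M
wl = sum(exit_time_wl(eps) for _ in range(M)) / M
print(f"mean exit time, eps={eps}: HM ~ {hm:.0f} (theory ~ {6 / eps:.0f}), WL ~ {wl:.0f}")
```

As predicted, the HM mean exit time is of order 6/ǫ while the WL chain, which penalizes the stratum it is stuck in, escapes orders of magnitude faster.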

SLIDE 41

A less simple example (1/7)

$$\pi(x_1, x_2) \propto \exp(-\beta H(x_1, x_2)) \, \mathbb{1}_{[-R,R]}(x_1) \qquad \text{on } [-R, R] \times \mathbb{R}_+.$$

Fig.: [left] Level curves of the potential H. [center] Density (up to a normalizing constant). [right] Partition of the state space.

In this numerical illustration: R = 2.4. WL is run with d = 48; the proposal distribution is N(0, v²I) where v = 2R/d. HM is a symmetric random walk with proposal distribution N(0, v²I) and target π.

SLIDE 42

A less simple example (2/7)

Path of the x₁-component of (X_t)_t, when X_t is the WL chain (left) and the HM chain (right), for β = 4.

Fig.: [left] Wang-Landau, T = 110 000. [right] Hastings-Metropolis, T = 2 × 10⁶; the red line is at x = 110 000.

SLIDE 43

A less simple example (3/7)

The larger β is, the larger is the ratio between the weight of the strata located near the main metastable states and the weight of the transition region (around x₁ = 0). The stepsize sequence is γ_t = γ⋆/t^α. Initialisation of the samplers: X₀ = (−1, 0), θ₀ = (1/d, ..., 1/d). The algorithms are run until the first time t such that X¹_t > 1.

We repeat this experiment over M independent runs, and compute the mean value of the exit time (M ∼ 10² to 10⁵ depending upon the value of β). We report the mean values of the exit times, t_β for Wang-Landau and t̄_β for Hastings-Metropolis, as a function of β, for different values of α.

SLIDE 44

A less simple example (4/7)

Plot of β ↦ t̄_β, the mean exit time for HM (left), and β ↦ t_β, the mean exit time for WL (right), when γ_t = γ⋆/t (α = 1).

Fig.: When γ⋆ = 2. [left] Hastings-Metropolis. [right] Wang-Landau. Note the logarithmic scale on the y-axis.

We also observe (plots not displayed) that the shape depends on γ⋆.

SLIDE 45

A less simple example (5/7)

We observe that t̄_β ∼ C exp(β µ₀) and t_β ∼ C(γ⋆) exp(β µ_{γ⋆}). Based on the results for the toy example, it is expected that

$$t_\beta \sim C(\gamma_\star) \, \exp\Big( \beta \, \frac{\mu_0}{1 + \gamma_\star} \Big).$$

Comparison of the observed exponent µ_{γ⋆} and the expected exponent µ₀/(1 + γ⋆) for different values of γ⋆:

γ⋆ | µ_{γ⋆} | µ_{γ⋆}/µ₀ | 1/(1 + γ⋆)
0  | 2.32   | 1         | 1
1  | 1.74   | 0.75      | 0.5
2  | 1.51   | 0.65      | 0.33
4  | 1.25   | 0.54      | 0.20
8  | 0.92   | 0.40      | 0.11

Quite a bad prediction!
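The two comparison columns of the table above follow by simple arithmetic from the observed exponents; a quick check (µ₀ = 2.32 is taken from the γ⋆ = 0 row, and the µ values are copied from the table):

```python
# Recompute the ratio column and the predicted column of the table above.
mu0 = 2.32
observed = {0: 2.32, 1: 1.74, 2: 1.51, 4: 1.25, 8: 0.92}   # gamma* -> mu_{gamma*}
for g, mu in observed.items():
    print(g, round(mu / mu0, 2), round(1 / (1 + g), 2))
```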

SLIDE 46

A less simple example (6/7)

Plot of β ↦ t_β, the mean exit time for WL, when γ_t = 1/t^α with α = 0.125 (left) and α = 0.75 (right).

Fig.: [left] α = 0.125. [right] α = 0.75. Note the logarithmic scale on the y-axis.

SLIDE 47

A less simple example (7/7)

We observe that t_β ∼ C(α) β^{µ_α}. Based on the results for the toy example, it is expected that t_β ∼ C(α) β^{1/(1−α)}.

Comparison of the observed exponent µ_α and the expected exponent 1/(1 − α) for different values of α:

α     | µ_α  | 1/(1 − α)
0.125 | 1.11 | 1.14
0.25  | 1.30 | 1.33
0.375 | 1.55 | 1.60
0.5   | 2.02 | 2.00
0.625 | 2.72 | 2.67
0.75  | 4.06 | 4.00

A far better prediction!

SLIDE 48

Outline

- The Wang-Landau algorithm
- Convergence of the Wang-Landau algorithm
- Efficiency of the Wang-Landau algorithm
- Conclusion
- Bibliography

SLIDE 49

Conclusion

Wang-Landau: new methodologies.

- Adaptive MCMC: Stochastic Approximation with controlled Markov chains.
- Multimodality, metastability: Molecular Dynamics, Statistical Physics.

SLIDE 50

Outline

- The Wang-Landau algorithm
- Convergence of the Wang-Landau algorithm
- Efficiency of the Wang-Landau algorithm
- Conclusion
- Bibliography

SLIDE 51

Bibliography

The Wang-Landau method in Statistical Physics

- F. Wang and D. P. Landau. Efficient, multiple-range random walk algorithm to calculate the density of states. Phys. Rev. Lett., 86, 2050 (2001). Description of the method and its application to the 2D Potts model and Ising model.
- F. Wang and D. P. Landau. Determining the density of states for classical statistical models: a random walk algorithm to produce a flat histogram. Phys. Rev. E, 64, 056101 (2001). Detailed description of the previous study.
- D. P. Landau, S.-H. Tsai and M. Exler. A new approach to Monte Carlo simulations in statistical physics: Wang-Landau sampling. Am. J. Phys., 72, 1294 (2004). Detailed description of the first paper.
- M. S. Shell, P. G. Debenedetti and A. Z. Panagiotopoulos. Generalization of the Wang-Landau method for off-lattice simulations. Phys. Rev. E, 66, 056703 (2002).
- B. J. Schulz, K. Binder, M. Müller and D. P. Landau. Avoiding boundary effects in Wang-Landau sampling. Phys. Rev. E, 67, 067102 (2003). A slight modification introduced to avoid the systematic underestimation of the density of states at the higher energy border.
- A. G. Cunha-Netto, A. A. Caparica, S.-H. Tsai, R. Dickman and D. P. Landau. Improving Wang-Landau sampling with adaptive windows. Phys. Rev. E, 78, 055701 (2008). Uses adaptive windows in energy to avoid border effects between energy ranges.

SLIDE 52

Methodology and convergence analysis of Wang-Landau

- F. Liang. A general Wang-Landau algorithm for Monte Carlo computation. J. Am. Stat. Assoc., 100:1311-1327 (2005).
- F. Liang, C. Liu and R. J. Carroll. Stochastic approximation in Monte Carlo computation. J. Am. Stat. Assoc., 102:305-320 (2007).
- Y. Atchadé and J. S. Liu. The Wang-Landau algorithm for Monte Carlo computation in general state space. Stat. Sinica, 20(1):209-233 (2010). Application of Wang-Landau to statistics; convergence results (on the samples (X_t)_t) under the assumption that the algorithm is "stable".
- L. Bornn, P. Jacob, P. Del Moral and A. Doucet. An adaptive Wang-Landau algorithm for automatic density exploration. To appear in J. Comput. Graph. Statist. (2013). New methods for (i) an adaptive binning strategy to automate the difficult task of partitioning the state space, (ii) the use of interacting parallel chains to improve the convergence speed and the use of computational resources, and (iii) the use of adaptive proposal distributions.
- P. Jacob and R. Ryder. The Wang-Landau algorithm reaches the flat histogram criterion in finite time. To appear in Ann. Appl. Probab. (2013). The linearized version of the update of the weight vector θ_t satisfies in finite time the uniformity criterion required in the original Wang-Landau algorithm; this is not guaranteed for some non-linear updates.

SLIDE 53

Convergence of adaptive MCMC

- G. O. Roberts and J. S. Rosenthal. Coupling and ergodicity of adaptive MCMC. J. Appl. Probab., 44:458-475 (2007).
- G. Fort, E. Moulines and P. Priouret. Convergence of interacting MCMC: ergodicity and law of large numbers. Ann. Statist., 39:3262-3289 (2012).
- G. Fort, E. Moulines, P. Priouret and P. Vandekerkhove. Convergence of interacting MCMC: central limit theorem. To appear in Bernoulli (2013).

Convergence of stochastic approximation schemes

- A. Benveniste, M. Métivier and P. Priouret. Adaptive Algorithms for Stochastic Approximations. Springer-Verlag (1987).
- C. Andrieu, E. Moulines and P. Priouret. Stability of stochastic approximation under verifiable conditions. SIAM J. Control Optim., 44:283-312 (2005).
- G. Fort. Central limit theorems for stochastic approximation algorithms. Submitted (2013).
SLIDE 54

(Free energy) biasing techniques in Molecular Dynamics

[a] On the choice of a "good" reaction coordinate: is it easier to sample from π⋆ than from π? [b] Approximation of π⋆ on the fly, converging to π⋆ in the long-time limit: either approximation of π⋆ (adaptive biasing potential) or, when the reaction coordinate is a continuous parameter, approximation of the mean force (adaptive biasing force).

- N. Chopin, T. Lelièvre and G. Stoltz. Free energy methods for efficient exploration of mixture posterior densities. Stat. Comput., 22:897-916 (2012). With a discussion.
- E. Darve and A. Pohorille. Calculating free energies using average force. J. Chem. Phys., 115:9169-9183 (2001).
- B. Dickson, F. Legoll, T. Lelièvre, G. Stoltz and P. Fleurat-Lessard. Free energy calculations: an efficient adaptive biasing potential method. J. Phys. Chem. B, 114:5823-5830 (2010).
- B. Jourdain, T. Lelièvre and R. Roux. Existence, uniqueness and convergence of a particle approximation for the adaptive biasing force process. M2AN Math. Model. Numer. Anal., 44:831-865 (2010).
- T. Lelièvre and K. Minoukadeh. Long-time convergence of an adaptive biasing force method: the bi-channel case. Arch. Ration. Mech. Anal., 202:1-34 (2011).
- T. Lelièvre, M. Rousset and G. Stoltz. Computation of free energy profiles with adaptive parallel dynamics. J. Chem. Phys., 126 (2007).
- T. Lelièvre, M. Rousset and G. Stoltz. Long-time convergence of an Adaptive Biasing Force method. Nonlinearity, 21:1155-1181 (2008).
- T. Lelièvre, M. Rousset and G. Stoltz. Free Energy Computations: A Mathematical Perspective. Imperial College Press (2010).
- K. Minoukadeh, C. Chipot and T. Lelièvre. Potential of mean force calculations: a multiple-walker adaptive biasing force approach. J. Chem. Th. Comput., 6:1008-1017 (2010).