Convergence and Efficiency of the Wang-Landau algorithm
Gersende FORT, LTCI, CNRS & Telecom ParisTech, Paris, France. Joint work with Benjamin Jourdain, Tony Lelièvre and Gabriel Stoltz.
The Wang-Landau algorithm
Wang-Landau: a biasing technique (1/3)
In molecular dynamics, the models consist in the description of the state of the system: the locations of the N particles x_ℓ (e.g. a set of N points in R³) and sometimes the velocities of the particles. The interactions between the particles x₁, …, x_N are described through a potential/Hamiltonian H(x₁, …, x_N). A state of the system is characterized by a probability π(x): e.g. in the canonical ensemble NVT,

π(x) ∝ exp(−β H(x)),   β := 1/(k_B T)   (inverse temperature),

where x = (x₁, …, x_N) ∈ X. The goal is to compute derivatives of the partition function, i.e. expectations under the distribution π, when the dimension of the support X is very large and π is multimodal (or metastable).
Wang-Landau: a biasing technique (2/3)
Exact computation of ∫ φ dπ is not possible (π is known only up to a normalizing constant, the domain of integration is very large, …). (Markov chain) Monte Carlo methods make it possible to sample points (X_t)_t such that

lim_{T→∞} (1/T) ∑_{t=1}^T φ(X_t) = ∫ φ dπ   a.s.

Unfortunately, in metastable systems, the points remain trapped in local modes for a very long time.

Fig.: [left] Level curves of a potential in R² which is metastable in the first direction. [right] Path of the first component of (X_t)_t.

In such situations, convergence takes a very long time to reach!
Wang-Landau: a biasing technique (3/3)
It is not possible to solve the metastability problem in full generality (number of modes, size of the barriers between metastable states, which increase with the dimension N, …). Nevertheless, in molecular dynamics it is often possible to identify a reaction coordinate, that is, in some sense, a "direction of metastability".
A new approach to define samplers robust to metastability:
◮ sample from a biased distribution π⋆ such that the image of π⋆ by the reaction coordinate O is uniform:
- O(X) has a uniform distribution when X ∼ π⋆;
- the conditional distribution of π⋆ given O(x) is equal to the conditional distribution of π given O(x);
◮ approximate integrals w.r.t. π by an importance sampling algorithm with proposal π⋆.
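In formulas (a sketch, assuming the law of O(X) under π has a density or mass function ψ), the biased target can be written

π⋆(x) ∝ π(x) / ψ(O(x)),   where ψ(o) := density (or mass) of O(X) at o when X ∼ π.

Dividing by ψ(O(x)) flattens the marginal distribution of O while leaving the conditional distributions given O(x) untouched. In the partitioned setting used below, O(x) = i when x ∈ X_i and ψ(i) = π(X_i) = θ⋆(i), so π⋆(x) ∝ π(x)/θ⋆(i) on X_i.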
Outline
The Wang-Landau algorithm
Convergence of the Wang-Landau algorithm
Efficiency of the Wang-Landau algorithm
Conclusion
Bibliography
The original Wang-Landau algorithm (1/3)
Assume π(x) ∝ exp(−β H(x)) on a discrete (but large) space X, and the goal is to compute ∑_{x∈X} Φ(H(x)) π(x). Then

∑_{x∈X} Φ(H(x)) π(x) = ∑_{e∈H(X)} Φ(e) g(e) exp(−βe) / ∑_{e′∈H(X)} g(e′) exp(−βe′),

where g is the density of states: g(e) := ∑_{x∈X} 1_{H(x)=e}.
The original Wang-Landau algorithm (2/3)
Density of states: g(e) := ∑_{x∈X} 1_{H(x)=e}.
g(e) cannot be calculated exactly for large systems. Although the total number of configurations increases exponentially with the size of the system, the total number of possible energy levels increases only linearly with the size of the system. Example: about 2L² energy levels compared to q^{L²} configurations for a q-state Potts model on an L × L lattice with nearest-neighbor interactions.
Wang and Landau (2001) proposed to perform a random walk in the energy space in order to estimate g(e) for any e. With the density of states, one can calculate most thermodynamic quantities at any inverse temperature β: free energy, internal energy, specific heat, i.e. the normalizing constant and expectations and variances under π.
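For instance (standard identities, stated here for concreteness), a single estimate of g gives, for any β,

Z(β) = ∑_{e∈H(X)} g(e) exp(−βe)   (partition function / normalizing constant),
F(β) = −β⁻¹ ln Z(β)   (free energy),
⟨E⟩_β = Z(β)⁻¹ ∑_e e g(e) exp(−βe)   (internal energy).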
The original Wang-Landau algorithm (3/3)
Algorithm:
Initialisation: density of states g(e) = 1 for any e; modification factor f₀.
LOOP 1: Repeat
- run a Markov chain in the energy space which accepts a move from energy e to a proposed energy e′ with probability 1 ∧ g(e)/g(e′);
- update the histogram in the energy space: if E is the new point, ln g(E) ← ln g(E) + ln f_t;
until a flat histogram is reached.
LOOP 2: Repeat LOOP 1 with a new modification factor f_{t+1} ← √f_t, until the modification factor is smaller than a predefined value.
Why does it work? The intuition:
- the chain with acceptance ratio 1 ∧ g(e)/g(e′) is reversible w.r.t. the distribution ∝ 1/g(e);
- under this distribution, the energy histogram is flat: each energy level is visited with the same frequency.
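A minimal Python sketch of this double loop, for a user-supplied discrete system; the flatness tolerance, the stopping threshold and the helper names (H, neighbors) are illustrative assumptions, not part of the original specification:

    import math
    import random

    def wang_landau(states, H, neighbors, f0=math.e, ln_f_min=1e-8, flat_tol=0.2):
        """Original Wang-Landau: estimate ln g(e) over the energy levels of H.

        states: list of all configurations (small toy systems only);
        H: energy function; neighbors(x): list of proposal moves from x.
        """
        energies = sorted({H(x) for x in states})
        ln_g = {e: 0.0 for e in energies}      # g(e) = 1 for any e
        ln_f = math.log(f0)                    # ln of the modification factor f_t
        x = random.choice(states)
        while ln_f > ln_f_min:                 # LOOP 2: stop when f is close to 1
            hist = {e: 0 for e in energies}
            while True:                        # LOOP 1: until the histogram is flat
                for _ in range(1000):
                    y = random.choice(neighbors(x))
                    # accept y with probability 1 ∧ g(H(x))/g(H(y))
                    if math.log(random.random()) < ln_g[H(x)] - ln_g[H(y)]:
                        x = y
                    ln_g[H(x)] += ln_f         # penalize the current energy level
                    hist[H(x)] += 1
                mean = sum(hist.values()) / len(hist)
                if min(hist.values()) > (1 - flat_tol) * mean:   # flat-histogram test
                    break
            ln_f /= 2.0                        # f_{t+1} = sqrt(f_t)
        return ln_g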
General Wang-Landau (1/3)
How to sample a metastable target distribution π on a general state space X? Choose a partition X₁, …, X_d of X. Then

π(x) = ∑_{i=1}^d 1_{X_i}(x) π(x).

Consider a family of biased distributions (π_θ, θ ∈ R^d) on X:

π_θ(x) ∝ ∑_{i=1}^d (1/θ(i)) 1_{X_i}(x) π(x),

where θ = (θ(1), …, θ(d)) satisfies ∑_i θ(i) = 1 and θ(i) ≥ 0.
Run an algorithm which combines
- sampling under π_{θ_t} (exact or MCMC),
- an update of the biasing factor θ_{t+1} ← θ_t + ⋯,
in such a way that (θ_t)_t and (π_{θ_t})_t converge to

θ⋆ = (π(X₁), …, π(X_d)),   for which π_{θ⋆}(X_i) = 1/d.
General Wang-Landau (2/3)
When it converges, θ_t(i) ≈ π(X_i), and integrals w.r.t. π are obtained by importance sampling:

∫ φ dπ ≈ (1/T) ∑_{t=1}^T d ( ∑_{i=1}^d θ_t(i) 1_{X_i}(X_t) ) φ(X_t).
General Wang-Landau (3/3)
Set θ⋆ = (π(X₁), …, π(X_d)).
Algorithm:
Initialisation: X₀ and θ₀ = (1/d, …, 1/d).
Repeat, given (X_t, θ_t):
- sample X_{t+1} ∼ P_{θ_t}(X_t, ·), where P_θ is a Markov kernel with invariant distribution π_θ;
- update the weights: θ_{t+1} = θ_t + γ_{t+1} H(θ_t, X_{t+1}),
where the field H is chosen so that θ⋆ is a zero of θ ↦ ∫ π_θ(dx) H(θ, x), and (γ_t)_t is a positive stepsize sequence.
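A compact Python sketch of this stochastic-approximation loop, with a random-walk Hastings-Metropolis kernel for P_θ and the linearized update of θ discussed later; the function names (log_pi, stratum) and all tuning constants are illustrative assumptions:

    import numpy as np

    def general_wang_landau(log_pi, stratum, d, x0, T,
                            gamma0=1.0, alpha=0.8, prop_std=0.5, seed=0):
        """General WL: adapt theta_t towards (pi(X_1), ..., pi(X_d)) while
        sampling pi_theta.  log_pi(x): unnormalized log-density of pi;
        stratum(x): index in {0, ..., d-1} of the stratum containing x."""
        rng = np.random.default_rng(seed)
        theta = np.full(d, 1.0 / d)
        x = np.asarray(x0, dtype=float)
        samples, thetas = [], []
        for t in range(1, T + 1):
            # Hastings-Metropolis step targeting pi_theta(x) ∝ pi(x)/theta(stratum(x))
            y = x + prop_std * rng.standard_normal(x.shape)
            log_ratio = (log_pi(y) - np.log(theta[stratum(y)])
                         - log_pi(x) + np.log(theta[stratum(x)]))
            if np.log(rng.random()) < log_ratio:
                x = y
            # linearized update: theta_{t+1} = theta_t + gamma_{t+1} H(theta_t, X_{t+1})
            i = stratum(x)
            e_i = np.zeros(d); e_i[i] = 1.0
            theta = theta + (gamma0 / t**alpha) * theta[i] * (e_i - theta)
            samples.append(x.copy()); thetas.append(theta.copy())
        return np.array(samples), np.array(thetas)

The importance-sampling estimate of ∫ φ dπ is then the trajectory average of d · θ_t(i_t) · φ(X_t), where i_t is the stratum of X_t, as on the previous slide.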
Wang-Landau in Statistics
- Multicanonical sampling (Atchadé & Liu, 2010).
- Simulated Tempering (Atchadé & Liu, 2010): target ρ on X̃, temperatures T₁ > T₂ > ⋯ > T_d = 1; set X = X̃ × {1, …, d}, θ⋆(i) = ∫ ρ^{1/T_i}(dx), π_θ(x, i) ∝ (1/θ(i)) ρ^{1/T_i}(x).
- Trans-dimensional MCMC (Atchadé & Liu, 2010): X̃ = ∪_{k=1}^K X_k, target ∝ ∑_{k=1}^K ρ_k(x) 1_{X_k}(x) on X̃; set X = X̃ × {1, …, d}, θ⋆(i) = ∫_{X_i} ρ_i(dx), π_θ(x, i) ∝ (1/θ(i)) ρ_i(x) 1_{X_i}(x).
- Variable selection (Bornn et al., 2013): target: posterior distribution π of a binary vector; reaction coordinate: partition of the energy state −log π(X).
- Bayesian inference in mixture models (Bornn et al., 2013).
Convergence of the Wang-Landau algorithm
WL: an example of adaptive MCMC (1/2)
A family of target distributions (π_θ)_{θ∈Θ} and a family of transition kernels (P_θ)_{θ∈Θ} such that π_θ P_θ = π_θ. WL defines a random sequence ((X_t, θ_t))_t such that

E[φ(X_{t+1}) | θ₀, X₀, …, θ_t, X_t] = ∫ P_{θ_t}(X_t, dy) φ(y),

and the parameter θ_t is updated by a stochastic approximation algorithm.
WL: an example of adaptive MCMC (2/2)
In the literature, there are different strategies for the update of (θ_t, γ_t) such that ∑_{i=1}^d θ_t(i) = 1 and θ_t(i) ≥ 0.
(Exponential update) For any i ∈ {1, …, d},

θ_{t+1}(i) = θ_t(i) exp( γ_{t+1} (1_{X_i}(X_{t+1}) − 1/d) ) / ∑_{ℓ=1}^d θ_t(ℓ) exp( γ_{t+1} (1_{X_ℓ}(X_{t+1}) − 1/d) ).

(Linearized version) If X_{t+1} ∈ X_i,

θ_{t+1}(i) = θ_t(i) + γ_{t+1} θ_t(i)(1 − θ_t(i)),
θ_{t+1}(k) = θ_t(k) − γ_{t+1} θ_t(k) θ_t(i),   k ≠ i.

→ For the next move, the probability of sampling a point in the current stratum X_i is reduced. The chain is pushed towards strata with a weaker frequency of visits, thus improving the exploration of the space.
The stepsize sequence (γ_t)_t decreases deterministically OR randomly (based on a flat-histogram criterion, for example). In our work, we consider the linearized update and a deterministic stepsize sequence (γ_t)_t.
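For concreteness, the two updates side by side (a sketch; i denotes the index of the stratum containing X_{t+1}, and the linearized rule is the first-order expansion in γ of the exponential one):

    import numpy as np

    def exponential_update(theta, i, gamma):
        """theta(i) is reweighted by exp(gamma * (1_{X_i}(X_{t+1}) - 1/d)),
        then the vector is renormalized to sum to 1."""
        d = theta.size
        w = theta * np.exp(gamma * ((np.arange(d) == i) - 1.0 / d))
        return w / w.sum()

    def linearized_update(theta, i, gamma):
        """theta(i) grows by gamma*theta(i)*(1 - theta(i)); every other entry
        shrinks by gamma*theta(k)*theta(i); the sum stays exactly 1."""
        e_i = (np.arange(theta.size) == i).astype(float)
        return theta + gamma * theta[i] * (e_i - theta)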
A numerical illustration (1/2)
Target density: π(x₁, x₂) ∝ exp(−β H(x₁, x₂)) 1_{[−R,R]}(x₁).

Fig.: [left] Level curves of the potential H. [center, right] Density π up to a normalizing constant.

Fig.: The density for β = 1 and β = 5. The larger β is, the larger the ratio between the weight of the strata located near the main metastable states and the weight of the transition region (near x₁ = 0).
A numerical illustration (2/2)
R = 2.4; d = 48 strata, partition along the x-axis. The P_θ are Hastings-Metropolis kernels with proposal distribution N(0, (2R/d)² I) and target π_θ. X₀ = (−1, 0). The stepsize sequence is γ_t ∼ c/t^{0.8}.

Fig.: [left] The sequences (θ_t(i))_t. [right] The limiting values θ⋆(i).
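This experiment can be reproduced with the general_wang_landau sketch above; the double-well potential below is an illustrative stand-in (metastable in the x₁-direction), not necessarily the exact potential of the slides:

    import numpy as np

    beta, R, d = 5.0, 2.4, 48

    def H(x):
        # double well along x1, quadratic confinement along x2 (assumed form)
        return (x[0]**2 - 1.0)**2 + 0.5 * x[1]**2

    def log_pi(x):
        # log-density of pi, equal to -infinity outside the strip |x1| <= R
        return -beta * H(x) if abs(x[0]) <= R else -np.inf

    def stratum(x):
        # d strata of equal width along the x1-axis
        return min(d - 1, max(0, int((x[0] + R) * d / (2 * R))))

    samples, thetas = general_wang_landau(log_pi, stratum, d, x0=(-1.0, 0.0),
                                          T=100_000, alpha=0.8, prop_std=2 * R / d)

The last rows of thetas should then concentrate most of the mass on the strata near x₁ = ±1, as on the figure.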
Sufficient conditions for the convergence of adaptive MCMC (1/2)
Roberts and Rosenthal (2007); F., Moulines and Priouret (2012).
For the proof of ergodicity, observe

E[f(X_t)] − π_{θ⋆}(f) = E[ f(X_t) − E[f(X_t)|F_{t−ℓ}] ]
  + E[ E[f(X_t)|F_{t−ℓ}] − P^ℓ_{θ_{t−ℓ}} f(X_{t−ℓ}) ]
  + E[ P^ℓ_{θ_{t−ℓ}} f(X_{t−ℓ}) − π_{θ_{t−ℓ}}(f) ]
  + E[ π_{θ_{t−ℓ}}(f) − π_{θ⋆}(f) ].

Convergence holds when
- the first term is null;
- the second term is small when adaptation is diminishing;
- the third term is small when the transition kernels (P_θ, θ ∈ Θ) are ergodic (enough), at a rate which is uniform (enough) in θ (containment condition);
- the last term is small provided (θ_t, t ≥ 0) converges to θ⋆, since in our case

‖π_θ − π_{θ⋆}‖_TV ≤ 2(d − 1) ∑_{i=1}^d |1 − θ(i)/θ⋆(i)|.
Sufficient conditions for the convergence of adaptive MCMC (2/2)
For the convergence of the weight sequence (θ_t)_t, observe

θ_{t+1} = θ_t + γ_{t+1} H(θ_t, X_{t+1}) = θ_t + γ_{t+1} h(θ_t) + γ_{t+1} (H(θ_t, X_{t+1}) − h(θ_t)),

where the mean field h is defined by

h(θ) := ∫ H(θ, x) π_θ(dx) = ( ∑_{i=1}^d θ⋆(i)/θ(i) )^{−1} (θ⋆ − θ).

Convergence to θ⋆ holds when
- the O.D.E. θ̇ = h(θ) converges to θ⋆ (Lyapunov function, …);
- (stability condition) the sequence (θ_t)_t visits infinitely often a compact subset of {θ : θ(i) > 0 and ∑_{i=1}^d θ(i) = 1};
- the noise sequence is small enough:
  · ∑_t γ_t = ∞ and ∑_t γ_t² < ∞;
  · the transition kernels (P_θ, θ ∈ Θ) are ergodic (enough) and smooth enough in θ.
Main results: assumptions (1/5)
1. The target distribution has a density π w.r.t. a measure λ on X ⊂ R^p, with sup_X π < ∞.
2. The partition (X_i)_i is such that θ⋆(i) := ∫_{X_i} π dλ > 0.
3. For any θ ∈ Θ, P_θ is a Hastings-Metropolis kernel with proposal q and invariant distribution π_θ. It is assumed that inf_{X²} q > 0.
4. The stepsize sequence (γ_t)_t satisfies ∑_t γ_t = +∞ and ∑_t γ_t² < ∞.

Under these assumptions, there exists ρ ∈ (0, 1) such that for any θ,

sup_{x∈X} ‖P^t_θ(x, ·) − π_θ‖_TV ≤ 2(1 − ρ)^t.
Main result: stability of (θ_t)_t (2/5)
Theorem (F., Jourdain, Kuhn, Lelièvre, Stoltz (2012)). Under the stated assumptions and inf_X π > 0,

P( lim sup_t min_{1≤i≤d} θ_t(i) > 0 ) = 1.

Sketch of the proof: T_k < ∞ w.p. 1, where the T_k are the successive times when a sample X_n is drawn in the stratum i⋆ such that θ_n(i⋆) = min_k θ_n(k). We prove that P(lim sup_k (min_i θ_{T_k−1}(i)) > 0) = 1, and a key property for this proof is

P_θ(x, X_j) 1_{X_i}(x) ≤ C ( 1 ∧ θ(i)/θ(j) ).

→ Low probability of moving from a stratum with small weight to a stratum with large weight.
Main result: convergence of (θ_t)_t (3/5)
Theorem (F., Jourdain, Kuhn, Lelièvre, Stoltz (2012)). Under the stated assumptions and the stability of the sequence (θ_t)_t,

P( lim_t θ_t = θ⋆ ) = 1.

Sketch of the proof: check the conditions of Andrieu, Moulines and Priouret (2005). Main ingredients:
- the Lyapunov function V associated to the mean field h:

V(θ) = −∑_{i=1}^d θ⋆(i) log( θ(i)/θ⋆(i) );

- the uniform (in x, θ) geometric ergodicity of the transition kernels P_θ;
- the regularity properties

‖π_θ − π_{θ′}‖_TV ≤ 2(d − 1) ∑_{i=1}^d |1 − θ(i)/θ′(i)|,
sup_{x∈X} ‖P_θ(x, ·) − P_{θ′}(x, ·)‖_TV ≤ 4 sup_i |1 − θ(i)/θ′(i)| + 4 sup_i |1 − θ′(i)/θ(i)|.
Main result: ergodicity and LLN for the samples (X_t)_t (4/5)
Theorem (F., Jourdain, Kuhn, Lelièvre, Stoltz (2012)). Under the stated assumptions and the stability of the sequence (θ_t)_t,

lim_t E[f(X_t)] = ∫ f(x) π_{θ⋆}(x) λ(dx),
(1/T) ∑_{t=1}^T f(X_t) → ∫ f(x) π_{θ⋆}(x) λ(dx)   a.s.,

for any bounded measurable function f.
Proof: check the conditions of F., Moulines and Priouret (2012). Main ingredients:
- the uniform (in x, θ) geometric ergodicity of the transition kernels P_θ;
- the regularity properties

‖π_θ − π_{θ′}‖_TV ≤ 2(d − 1) ∑_{i=1}^d |1 − θ(i)/θ′(i)|,
sup_{x∈X} ‖P_θ(x, ·) − P_{θ′}(x, ·)‖_TV ≤ 4 sup_i |1 − θ(i)/θ′(i)| + 4 sup_i |1 − θ′(i)/θ(i)|.
Main result: ergodicity and LLN for the weighted samples (X_t)_t (5/5)
Theorem (F., Jourdain, Kuhn, Lelièvre, Stoltz (2012)). Under the stated assumptions and the stability of the sequence (θ_t)_t,

lim_t E[ d ∑_{i=1}^d θ_t(i) f(X_t) 1_{X_i}(X_t) ] = ∫ f(x) π(x) λ(dx),
(1/T) ∑_{t=1}^T d ( ∑_{i=1}^d θ_t(i) 1_{X_i}(X_t) ) f(X_t) → ∫ f(x) π(x) λ(dx)   a.s.,

for any bounded measurable function f.
Efficiency of the Wang-Landau algorithm
Introduction
Wang-Landau algorithms are designed to switch as fast as possible from one metastable state to another, in order to explore the whole configuration space efficiently. We obtained convergence results on WL, but how can we study the efficiency of WL, and how can we compare WL to a non-adaptive MCMC sampler? We now discuss:
- a comparison in terms of how rapidly the sampler escapes from a metastable state;
- an explicit computation of exit times for a simple model, and a numerical study for a more complex one.
Central Limit Theorem on the weight sequence
Theorem (F., Jourdain, Kuhn, Lelièvre, Stoltz (2012)). Under the stated assumptions, when γ_t ∼ γ⋆/t^α with 1/2 < α < 1,

γ_t^{−1/2} (θ_t − θ⋆) →_d N_d(0, U⋆),

where

U⋆ = d² ∫_X ( Ĥ⋆(x) Ĥ⋆ᵀ(x) − P_{θ⋆}Ĥ⋆(x) (P_{θ⋆}Ĥ⋆(x))ᵀ ) π_{θ⋆}(x) λ(dx),
Ĥ⋆(x) = ∑_{ℓ≥0} P^ℓ_{θ⋆} ( H(θ⋆, ·) − h(θ⋆) )(x).

A similar result holds when γ_t ∼ γ⋆/t.
Toy example (1/3)
Consider the target distribution on X = {1, 2, 3}:

π(1) = π(3) = 1/(2 + ε),   π(2) = ε/(2 + ε).

The proposal distribution in WL (for the kernels P_θ) and in HM is

Q = [ 2/3  1/3  0 ; 1/3  1/3  1/3 ; 0  1/3  2/3 ],

a proposal kernel only allowing jumps to the closest strata. We compute the time T_{1→3} to reach state 3 starting from state 1, for WL and for a Hastings-Metropolis (HM) algorithm.
Toy example (2/3)
Here are the transition kernels for HM (top) and WL (bottom):

P = [ 1 − ε/3  ε/3  0 ; 1/3  1/3  1/3 ; 0  ε/3  1 − ε/3 ],

P_θ(1,2) = (1/3)( ε θ(1)/θ(2) ∧ 1 ),   P_θ(1,3) = 0,   P_θ(1,1) = 1 − P_θ(1,2);
P_θ(2,1) = (1/3)( θ(2)/(ε θ(1)) ∧ 1 ),   P_θ(2,3) = (1/3)( θ(2)/(ε θ(3)) ∧ 1 ),   P_θ(2,2) = 1 − P_θ(2,1) − P_θ(2,3);
P_θ(3,2) = (1/3)( ε θ(3)/θ(2) ∧ 1 ),   P_θ(3,1) = 0,   P_θ(3,3) = 1 − P_θ(3,2).

In WL, when the chain gets stuck (say) in state 1, θ_t(1) increases, which penalizes state 1 and favors moves to state 2.
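A small Monte Carlo check of these exit times (a sketch; the kernels are exactly the ones above, with the linearized update and γ_t = γ⋆/t^α driving θ_t in the WL case):

    import numpy as np

    def exit_time(eps, use_wl, gamma0=1.0, alpha=0.75, rng=None, max_iter=10**7):
        """Return T_{1->3}, the first hitting time of state 3 starting from state 1."""
        rng = rng or np.random.default_rng()
        pi = np.array([1.0, eps, 1.0]) / (2.0 + eps)
        Q = np.array([[2/3, 1/3, 0.0], [1/3, 1/3, 1/3], [0.0, 1/3, 2/3]])
        theta = np.full(3, 1.0 / 3.0)
        x = 0                                   # state "1"
        for t in range(1, max_iter):
            y = rng.choice(3, p=Q[x])
            ratio = pi[y] / pi[x]               # Q is symmetric: Metropolis ratio
            if use_wl:
                ratio *= theta[x] / theta[y]    # target pi_theta(i) ∝ pi(i)/theta(i)
            if rng.random() < min(1.0, ratio):
                x = y
            if use_wl:                          # linearized update of theta
                e_x = np.eye(3)[x]
                theta = theta + (gamma0 / t**alpha) * theta[x] * (e_x - theta)
            if x == 2:                          # state "3" reached
                return t
        return max_iter

    eps = 1e-3
    print("HM :", np.mean([exit_time(eps, False) for _ in range(20)]))
    print("WL :", np.mean([exit_time(eps, True) for _ in range(20)]))

For small ε, the HM average should be close to 6/ε, while the WL average grows only like |ln ε|^{1/(1−α)}, in line with the scalings on the next slide.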
Toy example (3/3)
Yes, Wang-Landau is less metastable!
For Hastings-Metropolis, T_{1→3} scales like 6/ε:

lim_{ε→0} (ε/6) E[T_{1→3}] = 1,   lim_{ε→0} P( (ε/6) T_{1→3} > c ) = exp(−c).

For Wang-Landau, with a stepsize sequence γ_t = γ⋆/t^α:
◮ for α ∈ (1/2, 1), there exist constants C₁, C₂ such that

lim_{ε→0} P( |ln ε|^{−1/(1−α)} T_{1→3} ∈ (C₁, C₂) ) = 1,

so T_{1→3} scales like |ln ε|^{1/(1−α)};
◮ for α = 1, T_{1→3} scales like ε^{−1/(1+γ⋆)}.
A less simple example (1/7)
π(x₁, x₂) ∝ exp(−β H(x₁, x₂)) 1_{[−R,R]}(x₁) on [−R, R] × R₊.

Fig.: [left] Level curves of the potential H. [center] Density (up to a normalizing constant). [right] Partition of the state space.

In this numerical illustration: R = 2.4. WL is run with d = 48; the proposal distribution is N(0, v² I) where v = 2R/d. HM is a symmetric random walk with proposal distribution N(0, v² I) and target π.
A less simple example (2/7)
Path of the x₁-component of (X_t)_t, when X_t is the WL chain (left) and the HM chain (right), for β = 4.

Fig.: [left] Wang-Landau, T = 110 000. [right] Hastings-Metropolis, T = 2·10⁶; the red line is at x = 110 000.
A less simple example (3/7)
The larger β is, the larger the ratio between the weight of the strata located near the main metastable states and the weight of the transition region (around x₁ = 0). The stepsize sequence is γ_t = γ⋆/t^α.
Initialisation of the samplers: X₀ = (−1, 0), θ₀ = (1/d, …, 1/d). The algorithms are run until the first time t such that X¹_t > 1.
We repeat this experiment over M independent runs and compute the mean value of the exit time (M ∼ 10² to 10⁵ depending upon the value of β). We report the mean values of the exit times
- t_β for Wang-Landau,
- t̄_β for Hastings-Metropolis,
as functions of β, for different values of α.
A less simple example (4/7)
Plot of β ↦ t̄_β, the mean exit time for HM (left), and β ↦ t_β, the mean exit time for WL (right), when γ_t = γ⋆/t (α = 1).

Fig.: When γ⋆ = 2. [left] Hastings-Metropolis. [right] Wang-Landau. Note the logarithmic scale on the y-axis.

We also observe (plots not displayed) that the shape depends on γ⋆.
A less simple example (5/7)
We observe that t̄_β ∼ C exp(β μ₀) and t_β ∼ C(γ⋆) exp(β μ_{γ⋆}). Based on the results for the toy example, it is expected that t_β ∼ C(γ⋆) exp(β μ₀/(1 + γ⋆)).

γ⋆ | μ_{γ⋆} | μ_{γ⋆}/μ₀ | 1/(1 + γ⋆)
0  | 2.32   | 1         | 1
1  | 1.74   | 0.75      | 0.5
2  | 1.51   | 0.65      | 0.33
4  | 1.25   | 0.54      | 0.20
8  | 0.92   | 0.40      | 0.11

Comparison of the observed shape μ_{γ⋆} with the expected shape μ₀/(1 + γ⋆) for different values of γ⋆: quite a bad prediction!
A less simple example (6/7)
Plot of β ↦ t_β, the mean exit time for WL, when γ_t = 1/t^α with α = 0.125 (left) and α = 0.75 (right).

Fig.: [left] α = 0.125. [right] α = 0.75. Note the logarithmic scale on the y-axis.
A less simple example (7/7)
We observe that t_β ∼ C(α) β^{μ_α}. Based on the results for the toy example, it is expected that t_β ∼ C(α) β^{1/(1−α)}.

α     | μ_α  | 1/(1 − α)
0.125 | 1.11 | 1.14
0.25  | 1.30 | 1.33
0.375 | 1.55 | 1.60
0.5   | 2.02 | 2.00
0.625 | 2.72 | 2.67
0.75  | 4.06 | 4.00

Comparison of the observed shape μ_α with the expected shape 1/(1 − α) for different values of α: a far better prediction!
Conclusion
- Wang-Landau: new methodologies.
- Adaptive MCMC: stochastic approximation with controlled Markov chains.
- Multimodality, metastability: molecular dynamics, statistical physics.
Bibliography
Bibliography
The Wang-Landau method in Statistical Physics
- F. Wang and D.P. Landau. Efficient, Multiple-Range Random Walk Algorithm to Calculate the Density of States. Phys. Rev. Lett., 86, 2050 (2001). Description of the method and its application to the 2D Potts model and Ising model.
- F. Wang and D.P. Landau. Determining the density of states for classical statistical models: A random walk algorithm to produce a flat histogram. Phys. Rev. E, 64, 056101 (2001). Detailed description of the previous study.
- D.P. Landau, S.-H. Tsai and M. Exler. A new approach to Monte Carlo simulations in statistical physics: Wang-Landau sampling. Am. J. Phys., 72, 1294 (2004). Detailed description of the first paper.
- M.S. Shell, P.G. Debenedetti and A.Z. Panagiotopoulos. Generalization of the Wang-Landau method for off-lattice simulations. Phys. Rev. E, 66, 056703 (2002).
- B.J. Schulz, K. Binder, M. Muller and D.P. Landau. Avoiding boundary effects in Wang-Landau sampling. Phys. Rev. E, 67, 067102 (2003). A slight modification introduced to avoid the systematic underestimation of the density of states at the higher energy border.
- A.G. Cunha-Netto, A.A. Caparica, S.-H. Tsai, R. Dickman and D.P. Landau. Improving Wang-Landau sampling with adaptive windows. Phys. Rev. E, 78, 055701 (2008). Uses adaptive windows in energy to avoid border effects between energy ranges.
Methodology and Convergence analysis of Wang-Landau
- F. Liang. A general Wang-Landau algorithm for Monte Carlo computation. J. Am. Stat. Assoc., 100:1311-1327 (2005).
- F. Liang, C. Liu and R.J. Carroll. Stochastic approximation in Monte Carlo computation. J. Am. Stat. Assoc., 102:305-320 (2007).
- Y. Atchadé and J.S. Liu. The Wang-Landau algorithm for Monte Carlo computation in general state space. Stat. Sinica, 20(1):209-233 (2010). Application of Wang-Landau to statistics; convergence results (on the samples (X_t)_t) under the assumption that the algorithm is "stable".
- L. Bornn, P. Jacob, P. Del Moral and A. Doucet. An Adaptive Wang-Landau Algorithm for Automatic Density Exploration. To appear in J. Comput. Graph. Statist. (2013). New methods for (i) an adaptive binning strategy to automate the difficult task of partitioning the state space, (ii) the use of interacting parallel chains to improve the convergence speed and the use of computational resources, and (iii) the use of adaptive proposal distributions.
- P. Jacob and R. Ryder. The Wang-Landau algorithm reaches the flat histogram criterion in finite time. To appear in Ann. Appl. Probab. (2013). The linearized version of the update of the weight vector θ_t satisfies in finite time the uniformity criterion required in the original Wang-Landau algorithm; this is not guaranteed for some non-linear updates.
Convergence of adaptive MCMC
- G.O. Roberts and J.S. Rosenthal. Coupling and ergodicity of adaptive MCMC. J. Appl. Probab., 44:458-475 (2007).
- G. Fort, E. Moulines and P. Priouret. Convergence of interacting MCMC: ergodicity and law of large numbers. Ann. Statist., 39:3262-3289 (2012).
- G. Fort, E. Moulines, P. Priouret and P. Vandekerkhove. Convergence of interacting MCMC: Central Limit Theorem. To appear in Bernoulli (2013).

Convergence of stochastic approximation schemes
- A. Benveniste, M. Metivier and P. Priouret. Adaptive Algorithms and Stochastic Approximations. Springer-Verlag (1987).
- C. Andrieu, E. Moulines and P. Priouret. Stability of stochastic approximation under verifiable conditions. SIAM J. Control Optim., 44:283-312 (2005).
- G. Fort. Central Limit Theorems for Stochastic Approximation algorithms. Submitted (2013).
(Free energy) biasing techniques in Molecular Dynamics
[a] On the choice of a "good" reaction coordinate: is it easier to sample from π⋆ than from π?
[b] Approximation of π⋆ on the fly, converging to π⋆ in the long-time limit: either an approximation of π⋆ itself (adaptive biasing potential) or, when the reaction coordinate is a continuous parameter, an approximation of the mean force (adaptive biasing force).
- N. Chopin, T. Lelièvre and G. Stoltz. Free energy methods for efficient exploration of mixture posterior densities. Stat. Comput., 22:897-916 (2012). With a discussion.
- E. Darve and A. Pohorille. Calculating free energies using average force. J. Chem. Phys., 115:9169-9183 (2001).
- B. Dickson, F. Legoll, T. Lelièvre, G. Stoltz and P. Fleurat-Lessard. Free energy calculations: an efficient adaptive biasing potential method. J. Phys. Chem. B, 114:5823-5830 (2010).
- B. Jourdain, T. Lelièvre and R. Roux. Existence, uniqueness and convergence of a particle approximation for the adaptive biasing force process. M2AN Math. Model. Numer. Anal., 44:831-865 (2010).
- T. Lelièvre and K. Minoukadeh. Longtime convergence of an adaptive biasing force method: the bi-channel case. Arch. Ration. Mech. Anal., 202:1-34 (2011).
- T. Lelièvre, M. Rousset and G. Stoltz. Computation of free energy profiles with adaptive parallel dynamics. J. Chem. Phys., 126 (2007).
- T. Lelièvre, M. Rousset and G. Stoltz. Long-time convergence of an Adaptive Biasing Force method. Nonlinearity, 21:1155-1181 (2008).
- T. Lelièvre, …