SLIDE 1 Mix & Match Hamiltonian Monte Carlo
Elena Akhmatskaya1,2 and Tijana Radivojević1
1BCAM - Basque Center for Applied Mathematics, Bilbao, Spain 2IKERBASQUE, Basque Foundation for Science, Bilbao, Spain
ICMAT Workshop: Mathematical Perspectives in Biology, Madrid February 3, 2016
SLIDE 2 Outline
- Introduction to hybrid Monte Carlo
- Hamiltonian Monte Carlo
- Mix & Match Hamiltonian Monte Carlo
SLIDE 3
Hybrid Monte Carlo - overview
▶ The method was introduced by S. Duane et al. for lattice field theory simulations: Hybrid Monte Carlo, Phys. Lett. B 195 (1987) 216-222
▶ Idea: combine two basic simulation approaches for molecular simulations
  ✓ Deterministic: Molecular dynamics (MD), which numerically integrates Newton's / Hamiltonian equations to predict the time evolution of the simulated system
  ✓ Stochastic: Metropolis-Hastings Markov Chain Monte Carlo (MHMC), which generates a random walk in the phase space of the system using a proposal density and a rule for accepting / rejecting proposed moves
SLIDE 4 Hybrid Monte Carlo - rationale

Useful Properties for Sampling Methods                      MC       MD
Allow for large moves between consecutive configurations    ✓ Yes    ✗ No
A proposal is well defined                                  ✗ No     ✓ Yes
Do not require gradient computation                         ✓ Yes    ✗ No
Free of discretization errors                               ✓ Yes    ✗ No
Provide dynamic information                                 ✗ No     ✓ Yes
Maintain temperature by construction                        ✓ Yes    ✗ No

MD and MC are surprisingly complementary: where one method fails, the other succeeds
SLIDE 5 Hybrid Monte Carlo - algorithmic summary
▶ Run an MD trajectory at constant energy:
  ✓ Randomly assign momenta
  ✓ Numerically integrate the Hamiltonian equations for a fixed number of steps:
    dx = M^{-1} p dt, dp = −∇_x U(x) dt, with H(x, p) = U(x) + (1/2) p^T M^{-1} p
▶ Accept the trajectory with Metropolis probability min{1, exp(−βΔH)}, where ΔH := H_end − H_start ≠ 0 (due to numerical errors)
▶ Sample in the constant temperature (canonical) ensemble: ρ_canon ∝ exp(−βH)
x – position, p – momentum, U – potential energy, M – mass matrix, β – inverse temperature
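The numerical integration step above can be sketched with the leapfrog (velocity Verlet) scheme commonly used in hybrid Monte Carlo; a minimal sketch, where `grad_U` is an assumed user-supplied callable (not named in the slides):

```python
import numpy as np

def leapfrog(x, p, grad_U, h, L, M_inv):
    """Integrate dx = M^{-1} p dt, dp = -grad_U(x) dt for L steps of size h
    with the symplectic, time-reversible leapfrog (velocity Verlet) scheme."""
    x, p = x.copy(), p.copy()
    p = p - 0.5 * h * grad_U(x)        # initial half kick
    for step in range(L):
        x = x + h * (M_inv @ p)        # drift
        # full kick between steps, half kick at the end
        p = p - (h if step < L - 1 else 0.5 * h) * grad_U(x)
    return x, p
```

Because the scheme is symplectic, the energy error stays bounded along the trajectory rather than drifting, which is what makes the Metropolis test above well behaved.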
SLIDE 6 Hybrid Monte Carlo (HMC) - synopsis
Method: one MC move = one MD trajectory
Purpose:
▶ Efficient sampling in molecular simulations
▶ Reduced discretization error
▶ Rigorous temperature control
What we get:
▶ HMC: bad MD but good Monte Carlo
  ✓ Moderate system sizes only: performance degrades with system size and time step
  ✓ Thermodynamic properties only: does not reproduce the full dynamics of the system
What is next?
▶ Add more dynamical information
▶ Improve acceptance rates for large systems
▶ Extend to new applications
SLIDE 7 Evolution of hybrid Monte Carlo methods (MD + MC)
D'87: Duane et al., 1987; H'91: Horowitz, 1991; KP'01: Kennedy & Pendleton, 2001; IH'04: Izaguirre & Hampton, 2004; AR'08-15: Akhmatskaya & Reich, 2008-2015; N'93: Neal, 1993
▶ Hybrid Monte Carlo HMC (D'87): atomistic molecular simulation
▶ Generalized HMC GHMC (H'91, KP'01) (more dynamics)
  - Meso-GHMC / GSHMC (AR'08-15): particle simulation
  - MTS-GHMC / GSHMC (AR'08-15): multi-scale simulation
▶ Shadow HMC SHMC (IH'04) (fewer rejections)
▶ Generalized SHMC GSHMC (AR'08-15) (dynamics + sampling)
SLIDE 8 Evolution of hybrid Monte Carlo methods (MD + MC)
D'87: Duane et al., 1987; H'91: Horowitz, 1991; KP'01: Kennedy & Pendleton, 2001; IH'04: Izaguirre & Hampton, 2004; AR'08-15: Akhmatskaya & Reich, 2008-2015; N'93: Neal, 1993
▶ Hybrid Monte Carlo HMC (D'87): atomistic molecular simulation
▶ Hamiltonian Monte Carlo HMC (N'93): statistical simulation
▶ Generalized HMC GHMC (H'91, KP'01) (more dynamics)
  - Meso-GHMC / GSHMC (AR'08-15): particle simulation
  - MTS-GHMC / GSHMC (AR'08-15): multi-scale simulation
▶ Shadow HMC SHMC (IH'04) (fewer rejections)
▶ Generalized SHMC GSHMC (AR'08-15) (dynamics + sampling)
SLIDE 9 Hamiltonian Monte Carlo - formulation
▶ For the target density π(θ) of position vector θ, momenta p conjugate to θ, and a 'mass' matrix M (a preconditioner):
  ✓ construct a potential function U(θ) = −log π(θ)
  ✓ and a Hamiltonian H(θ, p) = U(θ) + (1/2) p^T M^{-1} p
▶ Obtain a proposal in the Markov chain by simulating Hamiltonian dynamics (HD)
▶ Perform sampling with respect to the canonical density π(θ, p) ∝ exp(−H(θ, p))
SLIDE 10 Hamiltonian Monte Carlo - algorithm
From the current state (θ, p):
▶ Complete Momentum Update (CMU): draw fresh momenta p̄ to obtain (θ_0, p_0)
▶ Hamiltonian Dynamics: integrate the Hamiltonian equations for L steps using a symplectic numerical integrator with time step h. Generated proposal:
  (θ', p') = ψ_τ(θ_0, p_0), τ = Lh,
  where ψ_τ is the time-reversible symplectic map
▶ Metropolis test:
  (θ_new, p_new) = (θ', p') with probability α; (θ_0, p_0) otherwise
  α = min{1, exp(−ΔH)}, ΔH = H(θ', p') − H(θ_0, p_0) ≠ 0
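The algorithm above, as a minimal Python sketch with M = I; the callables `log_pi` and `grad_log_pi` are assumed names (note U(θ) = −log π(θ), so ∇U = −∇log π):

```python
import numpy as np

def hmc_step(theta, log_pi, grad_log_pi, h, L, rng):
    """One HMC transition with identity mass matrix M = I."""
    p = rng.standard_normal(theta.shape)           # CMU: p ~ N(0, I)
    H0 = -log_pi(theta) + 0.5 * p @ p              # H = U + p^T p / 2
    th, q = theta.copy(), p.copy()
    q = q + 0.5 * h * grad_log_pi(th)              # leapfrog: grad U = -grad log pi
    for step in range(L):
        th = th + h * q
        q = q + (h if step < L - 1 else 0.5 * h) * grad_log_pi(th)
    H1 = -log_pi(th) + 0.5 * q @ q
    # Metropolis test: alpha = min{1, exp(-dH)}, dH = H1 - H0
    if rng.random() < np.exp(min(0.0, H0 - H1)):
        return th                                  # accept the proposal
    return theta                                   # reject: keep current state
```

Running many such transitions on a simple target (e.g. a standard normal) reproduces the canonical density in θ after marginalizing out p.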
SLIDE 11 Hamiltonian Monte Carlo – synopsis
Features
▶ Problem sizes: 10–10^4, against 10^3–10^6 in molecular simulation
▶ The elements and parameters of the method have no physical meaning
  ✓ Dynamics is not important
  ✗ The choice of simulation parameters is purely heuristic
▶ Highly oscillatory Hamiltonian systems
  ✗ an appropriate choice of a numerical integrator is not obvious
  ✗ randomized simulation parameters for HD are preferable
▶ Hierarchical / latent-variable models are common
  ✗ they require tuning of multiple non-independent sets of simulation parameters
Wish list
▶ Rational choice of simulation parameters
▶ Problem-specific numerical integrators
▶ Improved acceptance rates
▶ Increased sampling efficiency
SLIDE 12 Evolution of Hamiltonian Monte Carlo methods (HD + MC)
N'93: Neal, 1993; GC'11: Girolami & Calderhead, 2011; HG'14: Hoffman & Gelman, 2014; SDMDW'14: Sohl-Dickstein, Mudigonda & DeWeese, 2014; CSS'15: Campos & Sanz-Serna, 2015; AR'16: Akhmatskaya & Radivojević, 2016
▶ Hamiltonian Monte Carlo HMC (N'93): statistical simulation
▶ Riemann manifold HMC RMHMC (GC'11) (strongly correlated target densities)
▶ No-U-Turn Sampler NUTS (HG'14) (tuned parameters)
▶ Extra chance HMC XCGHMC (SDMDW'14, CSS'15) (fewer rejections)
SLIDE 13 Evolution of Hamiltonian Monte Carlo methods (HD + MC)
N'93: Neal, 1993; GC'11: Girolami & Calderhead, 2011; HG'14: Hoffman & Gelman, 2014; SDMDW'14: Sohl-Dickstein, Mudigonda & DeWeese, 2014; CSS'15: Campos & Sanz-Serna, 2015; RA'16: Radivojević & Akhmatskaya, 2016
▶ Hamiltonian Monte Carlo HMC (N'93): statistical simulation
▶ Riemann manifold HMC RMHMC (GC'11) (strongly correlated target densities)
▶ No-U-Turn Sampler NUTS (HG'14) (tuned parameters)
▶ Extra chance HMC XCGHMC (SDMDW'14, CSS'15) (fewer rejections)
▶ Mix & Match HMC MMHMC (RA'16) (high dimensions, enhanced sampling)
SLIDE 14 Preview
▶ We introduce an alternative to Hamiltonian Monte Carlo (HMC) for efficient sampling in statistical simulations
▶ The new method, called Mix & Match Hamiltonian Monte Carlo (MMHMC), has been inspired by Generalized Shadow Hybrid Monte Carlo (GSHMC) by Akhmatskaya & Reich
▶ MMHMC:
  ✓ is a generalized HMC that samples with modified Hamiltonians
  ✓ offers a computationally effective Metropolis test for the momentum update
  ✓ reduces potential negative effects of momentum flips
  ✓ relies on a method- and system-specific adaptive integration scheme and compatible modified Hamiltonians
  ✓ outperforms, in sampling efficiency, advanced sampling techniques for computational statistics such as HMC and Riemann Manifold HMC
  ✓ is efficient for sampling in high-dimensional spaces
SLIDE 15 Behind the scenes
▶ 2008 – 2011: GSHMC was
  ✓ introduced for sampling in molecular simulation
  ✓ published: Akhmatskaya, Reich (2008), JCOMP, 227, 4934; Akhmatskaya, Bou-Rabee, Reich (2009), JCOMP, 228 (6), 2256
  ✓ patented: GB patent (2009), US patent (2011) [Fujitsu; authors: Akhmatskaya, Reich]
  ✓ proved successful in simulations of complex molecular systems in Biology and Chemistry (6 publications)
  ✗ No implementation and testing in statistical computation until 2015
  ✗ Never implemented in open source software due to patent restrictions
▶ November 2015: Fujitsu issued a license giving permission (i) to use the patented method in open source software, and (ii) to EA to implement and use the know-how
▶ Current status: GSHMC has been modified and adapted to statistical applications to give birth to MMHMC. Implemented in BCAM in-house software [Radivojević, Akhmatskaya, preprint]
SLIDE 16 Success stories: Proteins
Peptide toxin / bilayer system:
GSHMC offers an ~8x increase in sampling efficiency (from analysis of ACFs) compared to a conventional molecular dynamics (MD) simulation
- C. L. Wee, M. S. P. Sansom, S. Reich, E. Akhmatskaya (2008), J. Phys. Chem. B, 112, 5710–5717
SLIDE 17 Success stories: Polymers
In Situ Formation of Graft Copolymer: Langevin Dynamics (LD) vs. GSHMC
GSHMC predicts the equilibrium morphology of the graft polymer ~7x faster than LD (from ACFs of radii of gyration)
[Figure: morphology snapshots for LD and GSHMC (T = 50000, 3000, 1800, 4200, 7000, 1800)]
- Asua J. M., Akhmatskaya E. (2011), Dynamical modelling of morphology development in multiphase latex particles, in European Success Stories in Industrial Mathematics, ed. Thibaut Lery et al., Springer
SLIDE 18 MMHMC – features
▶ For the target density π(θ) of position vector θ, momenta p conjugate to θ, and a 'mass' matrix M (a preconditioner), construct a potential function U(θ) = −log π(θ) and a Hamiltonian H(θ, p) = U(θ) + (1/2) p^T M^{-1} p
▶ Sampling is performed with respect to a modified canonical density
  π̃(θ, p) ∝ exp(−H_h(θ, p))
▶ E.g., the modified Hamiltonian of order m = 4 for the Verlet method:
  H_h(θ, p) = H(θ, p) + (h^2/24) [ 2 p^T M^{-1} U_θθ(θ) M^{-1} p − U_θ(θ)^T M^{-1} U_θ(θ) ]
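The m = 4 modified Hamiltonian above is directly computable when the gradient and Hessian of U(θ) are available; a minimal sketch (the callables `U`, `grad_U`, `hess_U` are assumed names, not from the slides):

```python
import numpy as np

def modified_hamiltonian_verlet(theta, p, U, grad_U, hess_U, M_inv, h):
    """H_h = H + (h^2/24) [2 p^T M^-1 U_tt M^-1 p - U_t^T M^-1 U_t],
    the order m = 4 modified Hamiltonian for the Verlet integrator."""
    H = U(theta) + 0.5 * p @ (M_inv @ p)
    g = grad_U(theta)
    quad_p = p @ M_inv @ hess_U(theta) @ M_inv @ p   # p^T M^-1 U_tt M^-1 p
    quad_g = g @ (M_inv @ g)                         # U_t^T M^-1 U_t
    return H + (h**2 / 24.0) * (2.0 * quad_p - quad_g)
```

For small h the correction is O(h^2), so H_h stays close to H while being much better conserved along Verlet trajectories.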
SLIDE 19 Modified / Shadow Hamiltonians H_h
▶ Comparatively cheap and arbitrarily accurate approximations of Hamiltonians
▶ Defined by an asymptotic expansion in powers of the discretization parameter: H_h = H + O(h^m), m ≥ 4
▶ Exact (by construction) for quadratic Hamiltonians
▶ Stay close to true Hamiltonians for short HD simulations (such as in HMC)
▶ Conserved by symplectic integrators to higher accuracy than the true Hamiltonian:
  ΔH = O(N h^{2p}) vs. ΔH_h = O(N h^{2m}), m ≥ 4,
  where p is the order of the integrator (p = 2 for Verlet) and N is the problem size
▶ The ways to construct approximations to modified Hamiltonians are defined by the choice of the integrator
SLIDE 20

                           HMC        MMHMC
Momentum update            complete   partial
Momentum Metropolis test   ✗          ✓
Metropolis test uses       H          H_h
Re-weighting               ✗          ✓
SLIDE 21 MMHMC – algorithm
Flow: (θ, p) → PMMC → (θ_0, p_0) → Hamiltonian Dynamics → Metropolis test (accept A / reject R, flip F) → (θ_new, p_new), with weight w

Partial Momentum Monte Carlo (PMMC):
  p̄ = sqrt(1 − ϕ) p + sqrt(ϕ) u with probability P = min{1, exp(−ΔĤ_h)}; otherwise p̄ = p
  u ∼ N(0, M) is the noise, ϕ ∈ [0, 1]
  ΔĤ_h = (h^2 (6b − 1)/24) [ ϕ A + 2 sqrt(ϕ(1 − ϕ)) B ]
  A = u^T M^{-1} U_θθ(θ) M^{-1} u − p^T M^{-1} U_θθ(θ) M^{-1} p
  B = u^T M^{-1} U_θθ(θ) M^{-1} p
  b is the integrator's parameter
The extended "Hamiltonian" Ĥ_h(θ, p, u) = H_h(θ, p) + (1/2) u^T M^{-1} u defines the extended reference density π̂(θ, p, u) ∝ exp(−Ĥ_h(θ, p, u))
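A sketch of the PMMC update in Python (numpy), for a general mass matrix M; the callable `hess_U` (returning U_θθ) and the parameter names are assumptions. Note that b = 1/6 makes ΔĤ_h vanish, so the momentum Metropolis test then always accepts:

```python
import numpy as np

def pmmc(theta, p, hess_U, M, M_inv, h, b, phi, rng):
    """Partial momentum update of MMHMC: mix in noise u ~ N(0, M) and
    Metropolise w.r.t. the extended modified density; keep p on rejection."""
    L = np.linalg.cholesky(M)
    u = L @ rng.standard_normal(p.shape)               # u ~ N(0, M)
    Htt = hess_U(theta)
    A = u @ M_inv @ Htt @ M_inv @ u - p @ M_inv @ Htt @ M_inv @ p
    B = u @ M_inv @ Htt @ M_inv @ p
    dHhat = h**2 * (6*b - 1) / 24.0 * (phi * A + 2*np.sqrt(phi*(1 - phi)) * B)
    if rng.random() < min(1.0, np.exp(-dHhat)):
        return np.sqrt(1 - phi) * p + np.sqrt(phi) * u  # accept mixed momentum
    return p                                            # reject: keep old p
```

The parameter ϕ interpolates between no refreshment (ϕ = 0, GHMC-like persistence) and a complete momentum update (ϕ = 1, HMC-like).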
SLIDE 22 MMHMC – algorithm
Flow: (θ, p) → PMMC → (θ_0, p_0) → Hamiltonian Dynamics → Metropolis test (accept A / reject R, flip F) → (θ_new, p_new), with weight w

Metropolis test:
  (θ_new, p_new) = (θ_0, p_0), accepted with probability α; F(θ, p̄) otherwise (reject)
  α = min{1, exp(−ΔH_h)}
  Flip F(θ, p) = (θ, −p); reduced flip (optionally)

Re-weighting (w):
  For every n = 1, …, N store w_n = exp( H_h(θ_n, p_n) − H(θ_n, p_n) ); then
  ⟨Ω⟩ = Σ_{n=1}^N w_n Ω_n / Σ_{n=1}^N w_n
  Ω_n, n = 1, …, N – values of observables along the sequence of states (θ_n, p_n)
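The re-weighting step amounts to an importance-weighted average; a minimal sketch:

```python
import numpy as np

def reweighted_mean(omega, H, Hh):
    """MMHMC estimator: samples are drawn w.r.t. exp(-H_h), so observables
    under the true canonical density exp(-H) use weights w_n = exp(H_h - H)."""
    w = np.exp(np.asarray(Hh) - np.asarray(H))
    return np.sum(w * np.asarray(omega)) / np.sum(w)
```

Since H_h − H = O(h^m), the weights stay close to 1 for moderate step sizes, so the re-weighting adds little variance in practice.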
SLIDE 23 Modifications – Reduced Flip
▶ Use the information on the previous, current, and candidate states, while rigorously satisfying the balance condition
Metropolis test:
  (θ_new, p_new) = (θ_0, p_0) with probability α; F(θ, p̄) with probability P_F; (θ, p̄) otherwise
  α = min{1, exp(−ΔH_h)}
  P_F((θ, p̄) | (θ_prev, p_prev), (θ_0, p_0)) =
    max{ 0, 1 − α((θ, p̄), (θ_0, p_0)) / α((θ, p̄), F(θ_prev, p_prev)) }  if (θ_prev, p_prev) → (θ, p̄) was an accepted move
    1 − α((θ, p̄), (θ_0, p_0))  otherwise
- Wagoner J. A., Pande V. S. (2012), Reducing the effect of Metropolization on mixing times in molecular dynamics simulations, J. Chem. Phys., 137: 214105
SLIDE 24 What to expect?
Pros – enhanced sampling
▶ High acceptance rates: H_h is conserved by symplectic integrators better than H
▶ Access to second-order information about the target distribution
▶ Extra parameter for performance enhancement
▶ Faster convergence to the target PDF
Cons – extra computational cost
▶ Computation of H_h for each proposal (higher-order H_h are more expensive)
▶ Extra Metropolis test for the momentum update
▶ Accurate numerical integrators are required to use low-order H_h for systems with highly oscillatory H
Our strategy: find the numerical integrator that provides the best conservation of modified energy. Search within the family of 2-stage splitting integration schemes.
SLIDE 25 Modified Hamiltonians for splitting integrators
The 4th-order modified Hamiltonian for 2-stage splitting integrators was derived in terms of quantities available during simulation, by applying the Baker-Campbell-Hausdorff (BCH) formula iteratively:
  H_h(θ, p) = U(θ) + K(p) + h^2 [ α p^T M^{-1} U̇_θ(θ) + β U_θ(θ)^T M^{-1} U_θ(θ) ]
  U̇_θ(θ) – numerical time-derivative of U_θ(θ)
  α = (6b − 1)/24, β = (6b^2 − 6b + 1)/12
  b – parameter of the integrator
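A small helper tabulating α(b) and β(b) from the formulas above (a sketch; the function name is an assumption):

```python
def two_stage_coeffs(b):
    """alpha, beta of the 4th-order modified Hamiltonian for the
    one-parameter 2-stage splitting integrator:
    alpha = (6b - 1)/24, beta = (6b^2 - 6b + 1)/12."""
    alpha = (6.0 * b - 1.0) / 24.0
    beta = (6.0 * b**2 - 6.0 * b + 1.0) / 12.0
    return alpha, beta
```

For example, b = 1/6 zeroes the α-term (which is why ΔĤ_h vanishes in PMMC at that value), while b = 1/4 gives α = 1/48.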
SLIDE 26 Adaptive integrators for optimal conservation of modified energy
Consider the one-parameter 2-stage splitting integrator of a Hamiltonian system with H(θ, p) = U(θ) + (1/2) p^T M^{-1} p = A + B:
  ψ_h = φ^B_{bh} ∘ φ^A_{h/2} ∘ φ^B_{(1−2b)h} ∘ φ^A_{h/2} ∘ φ^B_{bh}
▶ Our Adaptive Integration Approach (AIA):
  Given a step size h and the highest frequency f_max, find b(h, f_max) (≡ a system-specific integrator) that minimizes the expectation of the modified energy error Δ = H^[4]_h(Ψ_{h,L}(θ, p)) − H^[4]_h(θ, p), i.e.
  0 ≤ E(Δ) ≤ ρ(h, b) → minimal
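The map ψ_h above can be sketched as kick/drift updates in Python; `grad_U` is an assumed callable, and identifying the B-flows with momentum kicks is consistent with b = 1/4 reducing to two concatenated velocity Verlet half-steps:

```python
import numpy as np

def two_stage_step(theta, p, grad_U, h, b, M_inv):
    """One step of the one-parameter 2-stage splitting integrator:
    kick (b h), drift (h/2), kick ((1-2b) h), drift (h/2), kick (b h)."""
    p = p - b * h * grad_U(theta)
    theta = theta + 0.5 * h * (M_inv @ p)
    p = p - (1.0 - 2.0 * b) * h * grad_U(theta)
    theta = theta + 0.5 * h * (M_inv @ p)
    p = p - b * h * grad_U(theta)
    return theta, p
```

The kick weights b, 1 − 2b, b sum to 1 and the two drifts of h/2 sum to h, so one step advances the system by exactly h regardless of b.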
SLIDE 27 Adaptive integrators for optimal conservation of modified energy
  ρ(h, b) = [ (1/2 + h^2 β) B_h + (1/2 + h^2 α) C_h ] (B_h + C_h) / (1 − A_h^2)
  A_h = h^4 b(1 − 2b)/4 − h^2/2 + 1
  B_h = −h^3 (1 − 2b)/4 + h
  C_h = −h^5 b^2 (1 − 2b)/4 + h^3 b(1 − b) − h
Then:
  b*(h, f_max) = arg min_{b ∈ B} max_{0 < h < h̄} ρ(h, b)
where h̄ (the reduced step size) is a function of f_max
SLIDE 28 AIA in action
[Figure: integrator parameter b (0.19–0.25) vs. dimensionless step size h̃ ∈ (0, 3) for AIA, BCSS* and VV; time-normalized median ESS (medESS/T) vs. step size h for AIA, VV, BCSS* and HMC with VV, for sampling a multivariate Gaussian distribution with D = 100 and D = 1000]
VV – velocity Verlet, b = 0.25; BCSS* – a fixed-parameter integrator derived for MMHMC using ideas of Blanes et al. (2014)
- S. Blanes, F. Casas, J.M. Sanz-Serna (2014), Numerical integrators for the Hybrid Monte Carlo method, SIAM J. Sci. Comput., 36(4), A1556–A1580
SLIDE 29 AIA in action
[Repeat of the Slide 28 figures]
SLIDE 30 Benchmark Models
Banana-shaped distribution
  θ = (θ_1, θ_2)
  y_i | θ ∼ N(θ_1 + θ_2^2, σ_y^2)
  θ_1, θ_2 ∼ N(0, σ_θ^2)
Bayesian Logistic Regression (BLR)
  y_i | x_{1,i}, …, x_{D−1,i} ∼ Bernoulli(p_i)
  logit(p_i) = θ_0 + θ_1 x_{1,i} + ⋯ + θ_{D−1} x_{D−1,i}

Dataset   # of param (D)   # of obs (K)
german    25               1000
sonar     61               208
musk      167              476
secom     444              1567

- Lichman, M. (2013), UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science
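For reference, the unnormalized log posterior and its gradient for the banana-shaped example, as needed by HMC-type samplers; the σ values below are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def banana_log_post(theta, y, sigma_y=2.0, sigma_t=1.0):
    """Unnormalized log posterior of the banana-shaped example:
    y_i | theta ~ N(theta1 + theta2^2, sigma_y^2), theta_j ~ N(0, sigma_t^2)."""
    t1, t2 = theta
    mu = t1 + t2**2
    return (-np.sum((y - mu)**2) / (2.0 * sigma_y**2)
            - (t1**2 + t2**2) / (2.0 * sigma_t**2))

def banana_grad(theta, y, sigma_y=2.0, sigma_t=1.0):
    """Gradient of banana_log_post w.r.t. theta (chain rule on mu)."""
    t1, t2 = theta
    r = np.sum(y - (t1 + t2**2)) / sigma_y**2
    return np.array([r - t1 / sigma_t**2,
                     2.0 * t2 * r - t2 / sigma_t**2])
```

The nonlinearity θ_1 + θ_2^2 in the likelihood mean is what bends the posterior into the banana shape that makes this a standard stress test for samplers.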
SLIDE 31 Benchmark Models
Stochastic Volatility (SV)
  y_t | β, x_t ∼ N(0, β^2 exp(x_t))
  x_t | φ, σ, x_{t−1} ∼ N(φ x_{t−1}, σ^2)
  x_1 ∼ N(0, σ^2 / (1 − φ^2))
  θ = (β, σ, φ) – model parameters
  x_t, t = 1, …, d – latent volatilities
Sampling alternates between p(x | y, θ) and p(θ | y, x)
Simulated data: d = 2000, d = 5000, d = 10000
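The simulated SV data can be generated directly from the model above; a sketch with illustrative parameter values (β, σ, φ below are assumptions, not values stated on the slides):

```python
import numpy as np

def simulate_sv(d, beta=0.65, sigma=0.15, phi=0.98, rng=None):
    """Draw (y, x) from the SV model:
    x_1 ~ N(0, sigma^2/(1-phi^2)), x_t ~ N(phi x_{t-1}, sigma^2),
    y_t ~ N(0, beta^2 exp(x_t))."""
    rng = rng if rng is not None else np.random.default_rng()
    x = np.empty(d)
    x[0] = rng.normal(0.0, sigma / np.sqrt(1.0 - phi**2))  # stationary start
    for t in range(1, d):
        x[t] = rng.normal(phi * x[t - 1], sigma)           # AR(1) latent state
    y = rng.normal(0.0, beta * np.exp(x / 2.0))            # observations
    return y, x
```

Initializing x_1 from the stationary AR(1) distribution keeps the whole latent path stationary, matching the model specification above.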
SLIDE 32 Testing
Methods: MMHMC, HMC, RMHMC
Criteria: space exploration, sampling efficiency, convergence
Metrics
▶ AR – acceptance rate
▶ ESS* – time-normalized effective sample size
▶ EF – efficiency factor (relative ESS* w.r.t. HMC)
▶ R̂ – potential scale reduction factor
▶ Results averaged over 10 runs
▶ Choice of simulation parameters: each method tuned for the best performance
- M. Girolami, B. Calderhead (2011), Riemann Manifold Langevin and Hamiltonian Monte Carlo Methods, Journal of the Royal Statistical Society: Series B, 73(2):123–214
- S.P. Brooks, A. Gelman (1998), General methods for monitoring convergence of iterative simulations, Journal of Computational and Graphical Statistics, 7, 434–455
SLIDE 33 Banana distribution – 15 sampling paths
[Figure: sampling paths of RWMH, HMC, RMHMC and MMHMC]
SLIDE 34 Banana distribution – 5000 samples
[Figure: 5000 samples drawn by RWMH, HMC, RMHMC and MMHMC]
SLIDE 35 Bayesian Logistic Regression (BLR)
[Figure: efficiency factors EF (minESS, medESS, maxESS) and acceptance rates AR (0.8–1) of HMC and MMHMC on the datasets german (D = 25), sonar (D = 61), musk (D = 167) and secom (D = 444)]
SLIDE 36 Stochastic volatility (SV): d = 2000
[Figure: efficiency factors EF (minESS, medESS, maxESS) for the model parameters β, σ, φ and acceptance rates AR (0.7–1) of HMC, RMHMC and MMHMC, for both θ and the latent x]
SLIDE 37 SV: d = 5000
[Figure: efficiency factors EF (minESS, medESS, maxESS) for β, σ, φ and acceptance rates AR (0.7–1) of HMC and MMHMC, for both θ and the latent x]
SLIDE 38 SV: d = 10000
[Figure: efficiency factors EF (minESS, medESS, maxESS) for β, σ, φ and acceptance rates AR (0.6–1) of HMC and MMHMC, for both θ and the latent x]
SLIDE 39 SV – Convergence
[Figure: potential scale reduction factors R̂_β, R̂_σ, R̂_φ vs. MC iterations (up to 2×10^5) for HMC, RMHMC and MMHMC, at d = 2000, 5000, 10000]
SLIDE 40 Summary
MMHMC vs. HMC
▶ MMHMC demonstrates higher AR, larger ESS* and faster convergence
MMHMC vs. RMHMC
▶ SV model: MMHMC and RMHMC demonstrate comparable sampling performance, with a slight dominance of MMHMC
▶ BLR model: RMHMC does not improve on HMC for the considered dimensions, whereas MMHMC outperforms HMC by up to 8x for D ≥ 25
▶ In contrast to RMHMC, MMHMC
  ✓ does not require matrix inversion (computationally less expensive)
  ✓ relies on separable Hamiltonians, which allows for the use of new, more efficient numerical integrators
  ✓ is efficient for high-dimensional problems
SLIDE 41 Work in progress / Future work
▶ Rational choice of the MMHMC parameters
  ✓ Step sizes: the RAIA method has been developed; testing is in progress
  ✓ Trajectory lengths: Bayesian adaptive schemes
▶ Applications
  ✓ Bayesian pharmacometrics modeling & simulation
SLIDE 42 BCAM (Guggenheim, Bilbao)
Modeling and Simulation in Life & Materials Sciences: MSLMS