SLIDE 1 Mix & Match Hamiltonian Monte Carlo
Elena Akhmatskaya1,2 and Tijana Radivojević1
1BCAM - Basque Center for Applied Mathematics, Bilbao, Spain 2IKERBASQUE, Basque Foundation for Science, Bilbao, Spain
ICMAT Workshop: Mathematical Perspectives in Biology, Madrid February 3, 2016
SLIDE 2 Outline
- Introduction to hybrid Monte Carlo
- Hamiltonian Monte Carlo
- Mix & Match Hamiltonian Monte Carlo
SLIDE 3
Hybrid Monte Carlo - overview
▶ The method was introduced by S. Duane et al. for lattice field theory simulations: Hybrid Monte Carlo, Phys. Lett. B 195 (1987) 216-222
▶ Idea: combine two basic simulation approaches for molecular simulations
  ✓ Deterministic: Molecular dynamics (MD), which numerically integrates Newton's / Hamiltonian equations to predict the time evolution of the simulated system
  ✓ Stochastic: Metropolis-Hastings Markov Chain Monte Carlo (MHMC), which generates a random walk in the phase space of the system using a proposal density and a rule for accepting / rejecting proposed moves
SLIDE 4 Hybrid Monte Carlo - rationale

Useful Properties for Sampling Methods                      MC       MD
Allow for large moves between consecutive configurations    ✓ Yes    ✗ No
A proposal is well defined                                  ✗ No     ✓ Yes
Do not require gradient computation                         ✓ Yes    ✗ No
Free of discretization errors                               ✓ Yes    ✗ No
Provide dynamic information                                 ✗ No     ✓ Yes
Maintain temperature by construction                        ✓ Yes    ✗ No

MD and MC are surprisingly complementary: where one method fails, the other succeeds
SLIDE 5 Hybrid Monte Carlo - algorithmic summary
▶ Run an MD trajectory at constant energy:
  ✓ Randomly assign momenta
  ✓ Numerically integrate the Hamiltonian equations for a fixed number of steps:
    dx = M^{-1} p dt, dp = −∇_x U(x) dt, with H(x, p) = U(x) + (1/2) p^T M^{-1} p
▶ Accept the trajectory with Metropolis probability min{1, exp(−βΔH)}, where ΔH := H_end − H_start ≠ 0 (due to numerical errors)
▶ Sample in the constant temperature (canonical) ensemble: ρ_canon ∝ exp(−βH)
x – position, p – momentum, U – potential energy, M – mass matrix, β – inverse temperature
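The numerical integration step above can be sketched with the leapfrog (velocity Verlet) scheme commonly used in hybrid Monte Carlo; a minimal sketch, where `grad_U` is an assumed user-supplied callable (not named in the slides):

```python
import numpy as np

def leapfrog(x, p, grad_U, h, L, M_inv):
    """Integrate dx = M^{-1} p dt, dp = -grad_U(x) dt for L steps of size h
    with the symplectic, time-reversible leapfrog (velocity Verlet) scheme."""
    x, p = x.copy(), p.copy()
    p = p - 0.5 * h * grad_U(x)        # initial half kick
    for step in range(L):
        x = x + h * (M_inv @ p)        # drift
        # full kick between steps, half kick at the end
        p = p - (h if step < L - 1 else 0.5 * h) * grad_U(x)
    return x, p
```

Because the scheme is symplectic, the energy error stays bounded along the trajectory rather than drifting, which is what makes the Metropolis test above well behaved.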
SLIDE 6 Hybrid Monte Carlo (HMC) - synopsis
Method: one MC move = one MD trajectory
Purpose:
▶ Efficient sampling in molecular simulations
▶ Reduced discretization error
▶ Rigorous temperature control
What we get:
▶ HMC: bad MD but good Monte Carlo
  ✓ Moderate system sizes only: performance degrades with system size and time step
  ✓ Thermodynamic properties only: does not reproduce the full dynamics of the system
What is next?
▶ Add more dynamical information
▶ Improve acceptance rates for large systems
▶ Extend to new applications
SLIDE 7 Evolution of hybrid Monte Carlo methods (MD + MC)
D'87: Duane et al., 1987; H'91: Horowitz, 1991; KP'01: Kennedy & Pendleton, 2001; IH'04: Izaguirre & Hampton, 2004; AR'08-15: Akhmatskaya & Reich, 2008-2015; N'93: Neal, 1993
▶ Hybrid Monte Carlo HMC (D'87): atomistic molecular simulation
▶ Generalized HMC GHMC (H'91, KP'01) (more dynamics)
  - Meso-GHMC / GSHMC (AR'08-15): particle simulation
  - MTS-GHMC / GSHMC (AR'08-15): multi-scale simulation
▶ Shadow HMC SHMC (IH'04) (fewer rejections)
▶ Generalized SHMC GSHMC (AR'08-15) (dynamics + sampling)
SLIDE 8 Evolution of hybrid Monte Carlo methods (MD + MC)
D'87: Duane et al., 1987; H'91: Horowitz, 1991; KP'01: Kennedy & Pendleton, 2001; IH'04: Izaguirre & Hampton, 2004; AR'08-15: Akhmatskaya & Reich, 2008-2015; N'93: Neal, 1993
▶ Hybrid Monte Carlo HMC (D'87): atomistic molecular simulation
▶ Hamiltonian Monte Carlo HMC (N'93): statistical simulation
▶ Generalized HMC GHMC (H'91, KP'01) (more dynamics)
  - Meso-GHMC / GSHMC (AR'08-15): particle simulation
  - MTS-GHMC / GSHMC (AR'08-15): multi-scale simulation
▶ Shadow HMC SHMC (IH'04) (fewer rejections)
▶ Generalized SHMC GSHMC (AR'08-15) (dynamics + sampling)
SLIDE 9 Hamiltonian Monte Carlo - formulation
▶ For the target density π(θ) of position vector θ, momenta p conjugate to θ, and a 'mass' matrix M (a preconditioner):
  ✓ construct a potential function U(θ) = −log π(θ)
  ✓ and a Hamiltonian H(θ, p) = U(θ) + (1/2) p^T M^{-1} p
▶ Obtain a proposal in the Markov chain by simulating Hamiltonian dynamics (HD)
▶ Perform sampling with respect to the canonical density π(θ, p) ∝ exp(−H(θ, p))
SLIDE 10 Hamiltonian Monte Carlo - algorithm
From the current state (θ, p):
▶ Complete Momentum Update (CMU): draw fresh momenta p̄ to obtain (θ_0, p_0)
▶ Hamiltonian Dynamics: integrate the Hamiltonian equations for L steps using a symplectic numerical integrator with time step h. Generated proposal:
  (θ', p') = ψ_τ(θ_0, p_0), τ = Lh,
  where ψ_τ is the time-reversible symplectic map
▶ Metropolis test:
  (θ_new, p_new) = (θ', p') with probability α; (θ_0, p_0) otherwise
  α = min{1, exp(−ΔH)}, ΔH = H(θ', p') − H(θ_0, p_0) ≠ 0
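The algorithm above, as a minimal Python sketch with M = I; the callables `log_pi` and `grad_log_pi` are assumed names (note U(θ) = −log π(θ), so ∇U = −∇log π):

```python
import numpy as np

def hmc_step(theta, log_pi, grad_log_pi, h, L, rng):
    """One HMC transition with identity mass matrix M = I."""
    p = rng.standard_normal(theta.shape)           # CMU: p ~ N(0, I)
    H0 = -log_pi(theta) + 0.5 * p @ p              # H = U + p^T p / 2
    th, q = theta.copy(), p.copy()
    q = q + 0.5 * h * grad_log_pi(th)              # leapfrog: grad U = -grad log pi
    for step in range(L):
        th = th + h * q
        q = q + (h if step < L - 1 else 0.5 * h) * grad_log_pi(th)
    H1 = -log_pi(th) + 0.5 * q @ q
    # Metropolis test: alpha = min{1, exp(-dH)}, dH = H1 - H0
    if rng.random() < np.exp(min(0.0, H0 - H1)):
        return th                                  # accept the proposal
    return theta                                   # reject: keep current state
```

Running many such transitions on a simple target (e.g. a standard normal) reproduces the canonical density in θ after marginalizing out p.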
SLIDE 11 Hamiltonian Monte Carlo – synopsis
Features
▶ Problem sizes: 10–10^4, against 10^3–10^6 in molecular simulation
▶ The elements and parameters of the method have no physical meaning
  ✓ Dynamics is not important
  ✗ The choice of simulation parameters is purely heuristic
▶ Highly oscillatory Hamiltonian systems
  ✗ an appropriate choice of a numerical integrator is not obvious
  ✗ randomized simulation parameters for HD are preferable
▶ Hierarchical / latent-variable models are common
  ✗ they require tuning of multiple non-independent sets of simulation parameters
Wish list
▶ Rational choice of simulation parameters
▶ Problem-specific numerical integrators
▶ Improved acceptance rates
▶ Increased sampling efficiency
SLIDE 12 Evolution of Hamiltonian Monte Carlo methods (HD + MC)
N'93: Neal, 1993; GC'11: Girolami & Calderhead, 2011; HG'14: Hoffman & Gelman, 2014; SDMDW'14: Sohl-Dickstein, Mudigonda & DeWeese, 2014; CSS'15: Campos & Sanz-Serna, 2015; AR'16: Akhmatskaya & Radivojević, 2016
▶ Hamiltonian Monte Carlo HMC (N'93): statistical simulation
▶ Riemann manifold HMC RMHMC (GC'11) (strongly correlated target densities)
▶ No-U-Turn Sampler NUTS (HG'14) (tuned parameters)
▶ Extra chance HMC XCGHMC (SDMDW'14, CSS'15) (fewer rejections)
SLIDE 13 Evolution of Hamiltonian Monte Carlo methods (HD + MC)
N'93: Neal, 1993; GC'11: Girolami & Calderhead, 2011; HG'14: Hoffman & Gelman, 2014; SDMDW'14: Sohl-Dickstein, Mudigonda & DeWeese, 2014; CSS'15: Campos & Sanz-Serna, 2015; RA'16: Radivojević & Akhmatskaya, 2016
▶ Hamiltonian Monte Carlo HMC (N'93): statistical simulation
▶ Riemann manifold HMC RMHMC (GC'11) (strongly correlated target densities)
▶ No-U-Turn Sampler NUTS (HG'14) (tuned parameters)
▶ Extra chance HMC XCGHMC (SDMDW'14, CSS'15) (fewer rejections)
▶ Mix & Match HMC MMHMC (RA'16) (high dimensions, enhanced sampling)
SLIDE 14 Preview
▶ We introduce an alternative to Hamiltonian Monte Carlo (HMC) for efficient sampling in statistical simulations
▶ The new method, called Mix & Match Hamiltonian Monte Carlo (MMHMC), has been inspired by Generalized Shadow Hybrid Monte Carlo (GSHMC) by Akhmatskaya & Reich
▶ MMHMC:
  ✓ is a generalized HMC that samples with modified Hamiltonians
  ✓ offers a computationally effective Metropolis test for the momentum update
  ✓ reduces potential negative effects of momentum flips
  ✓ relies on a method- and system-specific adaptive integration scheme and compatible modified Hamiltonians
  ✓ outperforms, in sampling efficiency, advanced sampling techniques for computational statistics such as HMC and Riemann Manifold HMC
  ✓ is efficient for sampling in high-dimensional spaces
SLIDE 15 Behind the scenes
▶ 2008 – 2011: GSHMC was
  ✓ introduced for sampling in molecular simulation
  ✓ published: Akhmatskaya, Reich (2008), JCOMP, 227, 4934; Akhmatskaya, Bou-Rabee, Reich (2009), JCOMP, 228 (6), 2256
  ✓ patented: GB patent (2009), US patent (2011) [Fujitsu; authors: Akhmatskaya, Reich]
  ✓ proved successful in simulations of complex molecular systems in Biology and Chemistry (6 publications)
  ✗ No implementation and testing in statistical computation until 2015
  ✗ Never implemented in open source software due to patent restrictions
▶ November 2015: Fujitsu issued a license giving permission (i) to use the patented method in open source software, and (ii) to EA to implement and use the know-how
▶ Current status: GSHMC has been modified and adapted to statistical applications to give birth to MMHMC. Implemented in BCAM in-house software [Radivojević, Akhmatskaya, preprint]
SLIDE 16 Success stories: Proteins
Peptide toxin / bilayer system:
GSHMC offers an ~8x increase in sampling efficiency (from analysis of ACFs) compared to a conventional molecular dynamics (MD) simulation
- C. L. Wee, M. S. P. Sansom, S. Reich, E. Akhmatskaya (2008), J. Phys. Chem. B, 112, 5710–5717
SLIDE 17 Success stories: Polymers
In Situ Formation of Graft Copolymer: Langevin Dynamics (LD) vs. GSHMC
GSHMC predicts the equilibrium morphology of the graft polymer ~7x faster than LD (from ACFs of radii of gyration)
[Figure: morphology snapshots for LD and GSHMC (T = 50000, 3000, 1800, 4200, 7000, 1800)]
- Asua J. M., Akhmatskaya E. (2011), Dynamical modelling of morphology development in multiphase latex particles, in European Success Stories in Industrial Mathematics, ed. Thibaut Lery et al., Springer
SLIDE 18 MMHMC – features
▶ For the target density π(θ) of position vector θ, momenta p conjugate to θ, and a 'mass' matrix M (a preconditioner), construct a potential function U(θ) = −log π(θ) and a Hamiltonian H(θ, p) = U(θ) + (1/2) p^T M^{-1} p
▶ Sampling is performed with respect to a modified canonical density
  π̃(θ, p) ∝ exp(−H_h(θ, p))
▶ E.g., the modified Hamiltonian of order m = 4 for the Verlet method:
  H_h(θ, p) = H(θ, p) + (h^2/24) [ 2 p^T M^{-1} U_θθ(θ) M^{-1} p − U_θ(θ)^T M^{-1} U_θ(θ) ]
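The m = 4 modified Hamiltonian above is directly computable when the gradient and Hessian of U(θ) are available; a minimal sketch (the callables `U`, `grad_U`, `hess_U` are assumed names, not from the slides):

```python
import numpy as np

def modified_hamiltonian_verlet(theta, p, U, grad_U, hess_U, M_inv, h):
    """H_h = H + (h^2/24) [2 p^T M^-1 U_tt M^-1 p - U_t^T M^-1 U_t],
    the order m = 4 modified Hamiltonian for the Verlet integrator."""
    H = U(theta) + 0.5 * p @ (M_inv @ p)
    g = grad_U(theta)
    quad_p = p @ M_inv @ hess_U(theta) @ M_inv @ p   # p^T M^-1 U_tt M^-1 p
    quad_g = g @ (M_inv @ g)                         # U_t^T M^-1 U_t
    return H + (h**2 / 24.0) * (2.0 * quad_p - quad_g)
```

For small h the correction is O(h^2), so H_h stays close to H while being much better conserved along Verlet trajectories.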
SLIDE 19 Modified / Shadow Hamiltonians H_h
▶ Comparatively cheap and arbitrarily accurate approximations of Hamiltonians
▶ Defined by an asymptotic expansion in powers of the discretization parameter: H_h = H + O(h^m), m ≥ 4
▶ Exact (by construction) for quadratic Hamiltonians
▶ Stay close to true Hamiltonians for short HD simulations (such as in HMC)
▶ Conserved by symplectic integrators to higher accuracy than the true Hamiltonian:
  ΔH = O(N h^{2p}) vs. ΔH_h = O(N h^{2m}), m ≥ 4,
  where p is the order of the integrator (p = 2 for Verlet) and N is the problem size
▶ The ways to construct approximations to modified Hamiltonians are defined by the choice of the integrator
SLIDE 20

                           HMC        MMHMC
Momentum update            complete   partial
Momentum Metropolis test   ✗          ✓
Metropolis test uses       H          H_h
Re-weighting               ✗          ✓
SLIDE 21 MMHMC – algorithm
Flow: (θ, p) → PMMC → (θ_0, p_0) → Hamiltonian Dynamics → Metropolis test (accept A / reject R, flip F) → (θ_new, p_new), with weight w

Partial Momentum Monte Carlo (PMMC):
  p̄ = sqrt(1 − ϕ) p + sqrt(ϕ) u with probability P = min{1, exp(−ΔĤ_h)}; otherwise p̄ = p
  u ∼ N(0, M) is the noise, ϕ ∈ [0, 1]
  ΔĤ_h = (h^2 (6b − 1)/24) [ ϕ A + 2 sqrt(ϕ(1 − ϕ)) B ]
  A = u^T M^{-1} U_θθ(θ) M^{-1} u − p^T M^{-1} U_θθ(θ) M^{-1} p
  B = u^T M^{-1} U_θθ(θ) M^{-1} p
  b is the integrator's parameter
The extended "Hamiltonian" Ĥ_h(θ, p, u) = H_h(θ, p) + (1/2) u^T M^{-1} u defines the extended reference density π̂(θ, p, u) ∝ exp(−Ĥ_h(θ, p, u))
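A sketch of the PMMC update in Python (numpy), for a general mass matrix M; the callable `hess_U` (returning U_θθ) and the parameter names are assumptions. Note that b = 1/6 makes ΔĤ_h vanish, so the momentum Metropolis test then always accepts:

```python
import numpy as np

def pmmc(theta, p, hess_U, M, M_inv, h, b, phi, rng):
    """Partial momentum update of MMHMC: mix in noise u ~ N(0, M) and
    Metropolise w.r.t. the extended modified density; keep p on rejection."""
    L = np.linalg.cholesky(M)
    u = L @ rng.standard_normal(p.shape)               # u ~ N(0, M)
    Htt = hess_U(theta)
    A = u @ M_inv @ Htt @ M_inv @ u - p @ M_inv @ Htt @ M_inv @ p
    B = u @ M_inv @ Htt @ M_inv @ p
    dHhat = h**2 * (6*b - 1) / 24.0 * (phi * A + 2*np.sqrt(phi*(1 - phi)) * B)
    if rng.random() < min(1.0, np.exp(-dHhat)):
        return np.sqrt(1 - phi) * p + np.sqrt(phi) * u  # accept mixed momentum
    return p                                            # reject: keep old p
```

The parameter ϕ interpolates between no refreshment (ϕ = 0, GHMC-like persistence) and a complete momentum update (ϕ = 1, HMC-like).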
SLIDE 22 MMHMC – algorithm
Flow: (θ, p) → PMMC → (θ_0, p_0) → Hamiltonian Dynamics → Metropolis test (accept A / reject R, flip F) → (θ_new, p_new), with weight w

Metropolis test:
  (θ_new, p_new) = (θ_0, p_0), accepted with probability α; F(θ, p̄) otherwise (reject)
  α = min{1, exp(−ΔH_h)}
  Flip F(θ, p) = (θ, −p); reduced flip (optionally)

Re-weighting (w):
  For every n = 1, …, N store w_n = exp( H_h(θ_n, p_n) − H(θ_n, p_n) ); then
  ⟨Ω⟩ = Σ_{n=1}^N w_n Ω_n / Σ_{n=1}^N w_n
  Ω_n, n = 1, …, N – values of observables along the sequence of states (θ_n, p_n)
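The re-weighting step amounts to an importance-weighted average; a minimal sketch:

```python
import numpy as np

def reweighted_mean(omega, H, Hh):
    """MMHMC estimator: samples are drawn w.r.t. exp(-H_h), so observables
    under the true canonical density exp(-H) use weights w_n = exp(H_h - H)."""
    w = np.exp(np.asarray(Hh) - np.asarray(H))
    return np.sum(w * np.asarray(omega)) / np.sum(w)
```

Since H_h − H = O(h^m), the weights stay close to 1 for moderate step sizes, so the re-weighting adds little variance in practice.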
SLIDE 23 Modifications – Reduced Flip
▶ Use the information on the previous, current, and candidate states, while rigorously satisfying the balance condition
Metropolis test:
  (θ_new, p_new) = (θ_0, p_0) with probability α; F(θ, p̄) with probability P_F; (θ, p̄) otherwise
  α = min{1, exp(−ΔH_h)}
  P_F((θ, p̄) | (θ_prev, p_prev), (θ_0, p_0)) =
    max{ 0, 1 − α((θ, p̄), (θ_0, p_0)) / α((θ, p̄), F(θ_prev, p_prev)) }  if (θ_prev, p_prev) → (θ, p̄) was an accepted move
    1 − α((θ, p̄), (θ_0, p_0))  otherwise
- Wagoner J. A., Pande V. S. (2012), Reducing the effect of Metropolization on mixing times in molecular dynamics simulations, J. Chem. Phys., 137: 214105
SLIDE 24 What to expect?
Pros – enhanced sampling
▶ High acceptance rates: H_h is conserved by symplectic integrators better than H
▶ Access to second-order information about the target distribution
▶ Extra parameter for performance enhancement
▶ Faster convergence to the target PDF
Cons – extra computational cost
▶ Computation of H_h for each proposal (higher-order H_h are more expensive)
▶ Extra Metropolis test for the momentum update
▶ Accurate numerical integrators are required to use low-order H_h for systems with highly oscillatory H
Our strategy: find the numerical integrator that provides the best conservation of modified energy. Search within the family of 2-stage splitting integration schemes.
SLIDE 25 Modified Hamiltonians for splitting integrators
The 4th-order modified Hamiltonian for 2-stage splitting integrators was derived in terms of quantities available during simulation, by applying the Baker-Campbell-Hausdorff (BCH) formula iteratively:
  H_h(θ, p) = U(θ) + K(p) + h^2 [ α p^T M^{-1} U̇_θ(θ) + β U_θ(θ)^T M^{-1} U_θ(θ) ]
  U̇_θ(θ) – numerical time-derivative of U_θ(θ)
  α = (6b − 1)/24, β = (6b^2 − 6b + 1)/12
  b – parameter of the integrator
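A small helper tabulating α(b) and β(b) from the formulas above (a sketch; the function name is an assumption):

```python
def two_stage_coeffs(b):
    """alpha, beta of the 4th-order modified Hamiltonian for the
    one-parameter 2-stage splitting integrator:
    alpha = (6b - 1)/24, beta = (6b^2 - 6b + 1)/12."""
    alpha = (6.0 * b - 1.0) / 24.0
    beta = (6.0 * b**2 - 6.0 * b + 1.0) / 12.0
    return alpha, beta
```

For example, b = 1/6 zeroes the α-term (which is why ΔĤ_h vanishes in PMMC at that value), while b = 1/4 gives α = 1/48.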
SLIDE 26 Adaptive integrators for optimal conservation of modified energy
Consider the one-parameter 2-stage splitting integrator of a Hamiltonian system with H(θ, p) = U(θ) + (1/2) p^T M^{-1} p = A + B:
  ψ_h = φ^B_{bh} ∘ φ^A_{h/2} ∘ φ^B_{(1−2b)h} ∘ φ^A_{h/2} ∘ φ^B_{bh}
▶ Our Adaptive Integration Approach (AIA):
  Given a step size h and the highest frequency f_max, find b(h, f_max) (≡ a system-specific integrator) that minimizes the expectation of the modified energy error Δ = H^[4]_h(Ψ_{h,L}(θ, p)) − H^[4]_h(θ, p), i.e.
  0 ≤ E(Δ) ≤ ρ(h, b) → minimal
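The map ψ_h above can be sketched as kick/drift updates in Python; `grad_U` is an assumed callable, and identifying the B-flows with momentum kicks is consistent with b = 1/4 reducing to two concatenated velocity Verlet half-steps:

```python
import numpy as np

def two_stage_step(theta, p, grad_U, h, b, M_inv):
    """One step of the one-parameter 2-stage splitting integrator:
    kick (b h), drift (h/2), kick ((1-2b) h), drift (h/2), kick (b h)."""
    p = p - b * h * grad_U(theta)
    theta = theta + 0.5 * h * (M_inv @ p)
    p = p - (1.0 - 2.0 * b) * h * grad_U(theta)
    theta = theta + 0.5 * h * (M_inv @ p)
    p = p - b * h * grad_U(theta)
    return theta, p
```

The kick weights b, 1 − 2b, b sum to 1 and the two drifts of h/2 sum to h, so one step advances the system by exactly h regardless of b.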
SLIDE 27 Adaptive integrators for optimal conservation of modified energy
  ρ(h, b) = [ (1/2 + h^2 β) B_h + (1/2 + h^2 α) C_h ] (B_h + C_h) / (1 − A_h^2)
  A_h = h^4 b(1 − 2b)/4 − h^2/2 + 1
  B_h = −h^3 (1 − 2b)/4 + h
  C_h = −h^5 b^2 (1 − 2b)/4 + h^3 b(1 − b) − h
Then:
  b*(h, f_max) = arg min_{b ∈ B} max_{0 < h < h̄} ρ(h, b)
where h̄ (the reduced step size) is a function of f_max
SLIDE 28 AIA in action
[Figure: integrator parameter b (0.19–0.25) vs. dimensionless step size h̃ ∈ (0, 3) for AIA, BCSS* and VV; time-normalized median ESS (medESS/T) vs. step size h for AIA, VV, BCSS* and HMC with VV, for sampling a multivariate Gaussian distribution with D = 100 and D = 1000]
VV – velocity Verlet, b = 0.25; BCSS* – a fixed-parameter integrator derived for MMHMC using ideas of Blanes et al. (2014)
- S. Blanes, F. Casas, J.M. Sanz-Serna (2014), Numerical integrators for the Hybrid Monte Carlo method, SIAM J. Sci. Comput., 36(4), A1556–A1580
SLIDE 29 AIA in action
[Repeat of the Slide 28 figures]
SLIDE 30 Benchmark Models
Banana-shaped distribution
  θ = (θ_1, θ_2)
  y_i | θ ∼ N(θ_1 + θ_2^2, σ_y^2)
  θ_1, θ_2 ∼ N(0, σ_θ^2)
Bayesian Logistic Regression (BLR)
  y_i | x_{1,i}, …, x_{D−1,i} ∼ Bernoulli(p_i)
  logit(p_i) = θ_0 + θ_1 x_{1,i} + ⋯ + θ_{D−1} x_{D−1,i}

Dataset   # of param (D)   # of obs (K)
german    25               1000
sonar     61               208
musk      167              476
secom     444              1567

- Lichman, M. (2013), UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science
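For reference, the unnormalized log posterior and its gradient for the banana-shaped example, as needed by HMC-type samplers; the σ values below are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def banana_log_post(theta, y, sigma_y=2.0, sigma_t=1.0):
    """Unnormalized log posterior of the banana-shaped example:
    y_i | theta ~ N(theta1 + theta2^2, sigma_y^2), theta_j ~ N(0, sigma_t^2)."""
    t1, t2 = theta
    mu = t1 + t2**2
    return (-np.sum((y - mu)**2) / (2.0 * sigma_y**2)
            - (t1**2 + t2**2) / (2.0 * sigma_t**2))

def banana_grad(theta, y, sigma_y=2.0, sigma_t=1.0):
    """Gradient of banana_log_post w.r.t. theta (chain rule on mu)."""
    t1, t2 = theta
    r = np.sum(y - (t1 + t2**2)) / sigma_y**2
    return np.array([r - t1 / sigma_t**2,
                     2.0 * t2 * r - t2 / sigma_t**2])
```

The nonlinearity θ_1 + θ_2^2 in the likelihood mean is what bends the posterior into the banana shape that makes this a standard stress test for samplers.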
SLIDE 31 Benchmark Models
Stochastic Volatility (SV)
  y_t | β, x_t ∼ N(0, β^2 exp(x_t))
  x_t | φ, σ, x_{t−1} ∼ N(φ x_{t−1}, σ^2)
  x_1 ∼ N(0, σ^2 / (1 − φ^2))
  θ = (β, σ, φ) – model parameters
  x_t, t = 1, …, d – latent volatilities
Sampling alternates between p(x | y, θ) and p(θ | y, x)
Simulated data: d = 2000, d = 5000, d = 10000
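The simulated SV data can be generated directly from the model above; a sketch with illustrative parameter values (β, σ, φ below are assumptions, not values stated on the slides):

```python
import numpy as np

def simulate_sv(d, beta=0.65, sigma=0.15, phi=0.98, rng=None):
    """Draw (y, x) from the SV model:
    x_1 ~ N(0, sigma^2/(1-phi^2)), x_t ~ N(phi x_{t-1}, sigma^2),
    y_t ~ N(0, beta^2 exp(x_t))."""
    rng = rng if rng is not None else np.random.default_rng()
    x = np.empty(d)
    x[0] = rng.normal(0.0, sigma / np.sqrt(1.0 - phi**2))  # stationary start
    for t in range(1, d):
        x[t] = rng.normal(phi * x[t - 1], sigma)           # AR(1) latent state
    y = rng.normal(0.0, beta * np.exp(x / 2.0))            # observations
    return y, x
```

Initializing x_1 from the stationary AR(1) distribution keeps the whole latent path stationary, matching the model specification above.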
SLIDE 32 Testing
Methods: MMHMC, HMC, RMHMC
Criteria: space exploration, sampling efficiency, convergence
Metrics
▶ AR – acceptance rate
▶ ESS* – time-normalized effective sample size
▶ EF – efficiency factor (relative ESS* w.r.t. HMC)
▶ R̂ – potential scale reduction factor
▶ Results averaged over 10 runs
▶ Choice of simulation parameters: each method tuned for the best performance
- M. Girolami, B. Calderhead (2011), Riemann Manifold Langevin and Hamiltonian Monte Carlo Methods, Journal of the Royal Statistical Society: Series B, 73(2):123–214
- S.P. Brooks, A. Gelman (1998), General methods for monitoring convergence of iterative simulations, Journal of Computational and Graphical Statistics, 7, 434–455
SLIDE 33 Banana distribution – 15 sampling paths
[Figure: sampling paths of RWMH, HMC, RMHMC and MMHMC]
SLIDE 34 Banana distribution – 5000 samples
[Figure: 5000 samples drawn by RWMH, HMC, RMHMC and MMHMC]
SLIDE 35 Bayesian Logistic Regression (BLR)
[Figure: efficiency factors EF (minESS, medESS, maxESS) and acceptance rates AR (0.8–1) of HMC and MMHMC on the datasets german (D = 25), sonar (D = 61), musk (D = 167) and secom (D = 444)]
SLIDE 36 Stochastic volatility (SV): d = 2000
[Figure: efficiency factors EF (minESS, medESS, maxESS) for the model parameters β, σ, φ and acceptance rates AR (0.7–1) of HMC, RMHMC and MMHMC, for both θ and the latent x]
SLIDE 37 SV: d = 5000
[Figure: efficiency factors EF (minESS, medESS, maxESS) for β, σ, φ and acceptance rates AR (0.7–1) of HMC and MMHMC, for both θ and the latent x]
SLIDE 38 SV: d = 10000
[Figure: efficiency factors EF (minESS, medESS, maxESS) for β, σ, φ and acceptance rates AR (0.6–1) of HMC and MMHMC, for both θ and the latent x]
SLIDE 39 SV – Convergence
[Figure: potential scale reduction factors R̂_β, R̂_σ, R̂_φ vs. MC iterations (up to 2×10^5) for HMC, RMHMC and MMHMC, at d = 2000, 5000, 10000]
SLIDE 40 Summary
MMHMC vs. HMC
▶ MMHMC demonstrates higher AR, larger ESS* and faster convergence
MMHMC vs. RMHMC
▶ SV model: MMHMC and RMHMC demonstrate comparable sampling performance, with a slight dominance of MMHMC
▶ BLR model: RMHMC does not improve on HMC for the considered dimensions, whereas MMHMC outperforms HMC by up to 8x for D ≥ 25
▶ In contrast to RMHMC, MMHMC
  ✓ does not require matrix inversion (computationally less expensive)
  ✓ relies on separable Hamiltonians, which allows for the use of new, more efficient numerical integrators
  ✓ is efficient for high-dimensional problems
SLIDE 41 Work in progress / Future work
▶ Rational choice of the MMHMC parameters
  ✓ Step sizes: the RAIA method has been developed; testing is in progress
  ✓ Trajectory lengths: Bayesian adaptive schemes
▶ Applications
  ✓ Bayesian pharmacometrics modeling & simulation
SLIDE 42 BCAM (Guggenheim, Bilbao)
Modeling and Simulation in Life & Materials Sciences: MSLMS