Molecular Dynamics Lecture 4 Ben Leimkuhler An overview of SDE - - PowerPoint PPT Presentation

molecular dynamics lecture 4
SMART_READER_LITE
LIVE PREVIEW

Molecular Dynamics Lecture 4 Ben Leimkuhler An overview of SDE - - PowerPoint PPT Presentation

Molecular Dynamics Lecture 4 Ben Leimkuhler An overview of SDE based thermostats Gentle thermostats & NEMD Adaptive thermostats, polymer models and noisy gradients Ensemble preconditioning Stochastic Dynamics out of Equilibrium Research


slide-1
SLIDE 1

Molecular Dynamics Lecture 4

An overview of SDE based thermostats Gentle thermostats & NEMD Adaptive thermostats, polymer models and noisy gradients Ensemble preconditioning

Ben Leimkuhler

Stochastic Dynamics out of Equilibrium Research School, CIRM, Marseille, 2017

slide-2
SLIDE 2

From L., Generalized Bulgak-Kusnezov Thermostats, PRE, 2010

slide-3
SLIDE 3

Finding the “Right” Dynamics for the Job

There are many different stochastic models that can be used in MD, but they can have very different efficiencies for a particular task. Overdamped Langevin Dynamics great for sampling well scaled multivariate Gaussian distribution, lousy for a highly corrugated landscape Nosé-Hoover gentle - good for autocorrelation functions in systems with strong internal mixing properties… not ergodic - for nucleic acid simulations in

implicit solvent

slide-4
SLIDE 4

Thermostats

Gibbs distribution Overdamped Langevin Nosé-Hoover Langevin Dynamics Nosé-Hoover Langevin Generalized Bulgac-Kusnezov Preconditioned Methods Ensemble Quasi-Newton

slide-5
SLIDE 5

dx = [J(x) + S(x)]r log π(x) + r · [J(x) + S(x)] + p 2S(x)dW

A generic sampling dynamics

J(x) antisymmetric S(x) symmetric

Includes, e.g., SDEs like Brownian and Langevin dynamics non-reversible perturbation methods various ensemble sampling schemes Questions: Which approach converges most rapidly? (small IAT) What is the sampling bias under discretization? How to effectively combine with extension?

slide-6
SLIDE 6

Generalized Sampling

Up to know we have assumed the situation of a known distribution with invariant density What if we don’t know U or cannot exactly resolve the force? Multiscale models, e.g. ab initio MD Methods and QM/MM methods (heating due to force mismatch) Nonequilibrium MD (e.g Shear Flows) Applications in Bayesian Inference & Big Data Analytics

slide-7
SLIDE 7

Problems for today

  • 1. How to gently perturb Hamiltonian dynamics

in order to achieve thermal equilibriation.

  • 2. How to handle driven systems efficiently, for

example with momentum constraints for shear flow applications and their use in machine learning.

  • 3. How to accelerate convergence to equilibrium by

use of an ensemble of “particles” (walkers).

slide-8
SLIDE 8

Additivity

The thermostats can be combined in most cases without altering their effectiveness (often improving it).

˙ x = f(x) + g(x)

L †

f ρ = 0

L †

g ρ = 0

⇒ L †

f+gρ = 0

Works for SDEs too…

slide-9
SLIDE 9

Extension

Many schemes make good use of the concept of extension

Z π(x)˜ π(y)dy ∝ π(x)

This looks banal but the key point is that although x and y decouple in the invariant distribution, they may be tightly coupled in the associated SDEs. Example: Langevin dynamics

Z e−βp2/2e−βU(x)dp ∝ e−βU(x)

slide-10
SLIDE 10

Remote control of thermal equilibration

Conservative dynamical system

Preserves Ex: Langevin dynamics

Ornstein-Uhlenbeck SDE

ρβ = e−βpT M −1pe−βU

Ergodic for

  • The two systems are both compatible with
  • Sufficient mixing

The ergodicity of the OU process implies ergodicity

  • f the full system

ρβ

¯ ρ = e−βpT M −1p

slide-11
SLIDE 11

Gentle thermostats & NEMD

slide-12
SLIDE 12

Where I learned about Nosé Dynamics

Sir Henry Peter Francis Swinnerton-Dyer, 16th Baronet KBE FRS Number Theorist, Student of Littlewood, Polya and Sylvester Prizeholder Vice-Chancellor of Cambridge University 1973-83

Seminar, Cambridge University 1997: Nosé Dynamics

slide-13
SLIDE 13

Too fast: balls move to outside,

  • pening valve, releasing steam,

reducing pressure, reducing speed Too slow: balls fall to inside, closing valve, leading to an increase in pressure, increasing speed James Watt’s Engine Nose-Hoover dynamics - a “Gibbs Governor” Preserves

˙ q = p ˙ p = rU(q) ξp ˙ ξ = p2 kT

e−β[p2/2+U(q)] × e−βξ2/2

slide-14
SLIDE 14

Problems with the Gibbs Governor

  • b. It’s not the Gibbs Governor. This is:

Undergraduate research project of Josiah Willard Gibbs

  • a. It doesn’t actually work.
slide-15
SLIDE 15

µ = 2 µ = 1 µ = 1/2 µ = 4 A l l W r

  • n

g !

Nosé-Hoover Dynamics for Harmonic Oscillator

slide-16
SLIDE 16

Nosé-Hoover Chains are not ergodic..

slide-17
SLIDE 17

Stochastic version: Nosé-Hoover-Langevin dynamics ‘Histograms’ matches theoretical behavior

˙ q = p ˙ p = rU(q) ξp ˙ ξ = p2 kT γξ + ση(t)

scalar OU process

slide-18
SLIDE 18

Nosé-Hoover-Langevin

˙ q = M −1p ˙ p = rU ˙ p = −ξp ˙ ξ = pT M −1p − nkBT e−βpT M −1p/2e−βU(q) e−βpT M −1p/2e−βξ2/2 e−βξ2/2

preserves preserves preserves ergodically

dξ = −γξdt + p 2β−1γdWt

dq = M −1pdt dp = rU(q)dt ξpdt dξ = [pT M −1p nkBT]dt γξdt + p 2β−1γdWt

slide-19
SLIDE 19

Ergodicity of NHL

NHL is clearly compatible with an extended Gibbs distribution meaning that

L†

NHL[ρβe−βξ2/2] = 0

We can also prove it is ergodic by using the theory developed for Langevin dynamics and explained in the previous lectures. [L., Noorizadeh, Theil 2009]

slide-20
SLIDE 20

Harmonic system w/o resonance

ωk ̸= ωl, k ̸= l

H = pT M −1p 2 + qT Bq 2

Theorem: NHL is ergodic on

A = M −1B, Aϕk = ωkϕk

Example: clamped harmonic spring chain [Mukamel’s talk]

q, p ∈ Rd

R2d+1

[L., Noorizadeh, Theil 2009]

slide-21
SLIDE 21

Harmonic system w/o resonance

slide-22
SLIDE 22

NH NHL butane molecule

slide-23
SLIDE 23

“Gentle” property of NH/NHL

We can show that NHL is a “gentle” thermostat: dynamical properties are mildly perturbed for a given rate of convergence of kinetic energy. [L., Noorizadeh and Penrose, J. Stat. Phys., 2011]

Similar (but less smooth) for “Stochastic Velocity Rescaling”

  • f G. Bussi, D. Donadio and M. Parinello

0.1 0.004 Langevin NHL VAF Error VAF Error

slide-24
SLIDE 24

Stochastic Velocity Rescaling [Bussi, Donadio, Parinello] dqi = −∂H ∂pi dt kinetic energy uses multiplicative noise

slide-25
SLIDE 25

Even the deterministic method seems to work for sufficiently complicated systems.

Autocorrelation functions (LJ System)

slide-26
SLIDE 26

Temperature gradients and NEMD simulation using Nosé-Hoover

[NONEQUILIBRIUM MOLECULAR DYNAMICS METHODS FOR LATTICE HEAT CONDUCTION CALCULATIONS Junichiro Shiomi, Ann. Rev. Heat Transfer, 2014]

Theoretical Methods for Calculating the Lattice Thermal Conductivity of Minerals, Reviews in Mineralogy & Geochemistry Vol. 71 pp. 253-269, 2010

slide-27
SLIDE 27

β = −0.006 β = 0.01

Buhler ’02 GBK Thermostat

real ‘bath’ of 100 weak vortices a 1 variable bath

Radial density at fixed β A thermostat for the Vortex Discretization

Dubinkina, Frank, L. MMS 2010

slide-28
SLIDE 28

Gentle Momentum Conserving Thermostat for NEMD

In many examples it is crucial to preserve the momentum. This is of particular relevance in fluid dynamics, or in any situations where aggregate inertial effects are likely to play an important role. The problem is that momentum is distorted by Langevin and Nose Hoover dynamics. How to control momentum while gently restoring canonical equilibrium? The simple idea is to use pairwise forces satisfying Newton’s 3rd law that are projected onto the radial axis between particles.

slide-29
SLIDE 29

Dissipative Particle Dynamics

Superficially similar to thermostatted MD: Newton’s equations + perturbation

  • DPD typically relies on “softened” potential energy functions
  • ften ad-hoc but sometimes derived by systematic coarse-

graining of MD MD - Lennard-Jones DPD

  • DPD typically viewed as a coarse-graining technique.
  • Always involves a thermostat.
  • Variants: DPD-e, QDPD,

(George Karniadakis) tDPD,cDPD, aDPF, sDPD, mDPD, eDPD

slide-30
SLIDE 30

Dissipative Particle Dynamics

[1] P. Hoogerbrugge and J. Koelman. Simulating microscopic hydrodynamic phenomena with dissipative particle dynamics. Europhysics Letters, 19(3):155, 1992. [2] P. Espanol and P. Warren. Statistical mechanics of dissipative particle dynamics. Europhysics Letters, 30(4):191, 1995.

Fluctuation-dissipation: Momentum-conserving Langevin dynamics

slide-31
SLIDE 31

Ergodicity: System has a unique smooth distribution which is a universal attractor. For an ergodic system, memory of initial conditions is eventually lost, and any path can be used to calculate averages. Open Question: Is DPD ergodic? Only proof is for a 1D system with high particle density (Shardlow and Yan) More general answer: Probably not! Simulation study of Pastorino et al 2007: Appears to contradict ergodicity in case of soft DPD potentials and reduced interactions.

Ergodicity of Dissipative Particle Dynamics

slide-32
SLIDE 32

DPD alternatives

Lowe-Andersen: Allows control of the Schmidt number = ratio of kinematic viscosity to diffusion constant Momenta are updated according to conservative forces. Subsequently, each pair is (with fixed probability) updated with an added random kick. Peters: All particles perform a random step after conservative updating, with collision coefficient chosen to mimic the Lowe-Andersen collision rate. Nosé-Hoover-Lowe-Andersen: Ad-hoc & does not reproduce the canonical ensemble.

slide-33
SLIDE 33

OBAB (Shardlow) Others: DPD-Trotter = A(B+O)A (Coveney et al) Also Lowe-Anderson, NHLA, …

Integrators for DPD

slide-34
SLIDE 34

Nose-Hoover-Like (+ “gentle noise”)

Pairwise NHL A gentle momentum-conserving thermostat

PNHL L. & Shang, JCP, 2015 kinetic energy control A method or DPD at low friction or for NEMD with momentum conservation.

slide-35
SLIDE 35

Splittings for PNHL

PNHL-S = ABCDODCBA PNHL-N = ABCDODCAB not symmetric! due to lack of symmetry, this method requires TWO force evaluations per timestep, instead of one.

slide-36
SLIDE 36

All the schemes are convergent, giving the same RDF at small stepsize.

slide-37
SLIDE 37

As the stepsize is increased, significant differences appear among the different methods

slide-38
SLIDE 38

500 DPD particles

slide-39
SLIDE 39
  • 1. Configurational Temperature (after Rugh 1989)
  • 2. Velocity Autocorrelation Functions (averaged dynamics)

What to measure?

slide-40
SLIDE 40

PNHL-N exhibits 2nd order convergence, but it is a non-symmetric method! Observed 2nd Order Convergence for PNHL-N

slide-41
SLIDE 41

2nd Order Convergence

Due to cancellations, PNHL-N leads to: Invariant distribution: For large , the aux variable plays no role.

slide-42
SLIDE 42

2nd Order Convergence

Reduce the perturbation equation using the projection operator Allowing to show: Proposition: PNHL-N is 2nd order accurate for all observables of the form

slide-43
SLIDE 43

PNHL approximates DPD dynamics for small friction.

slide-44
SLIDE 44

Stepsize effects on VAF (low friction)

slide-45
SLIDE 45

Efficiency Comparisons

Critical Stepsize: Stepsize to achieve a 10% relative error in config temp

slide-46
SLIDE 46

Adaptive thermostats

slide-47
SLIDE 47

Bayesian Inference

Posterior probability density (from Bayes’ Theorem):

p(q|X) ∝ exp(−U(q)), U(q) = −log p(X|q) − log p(q)

Understand choice of parameters q given observations X Use Maximum Likelihood Estimate/“Subsampling”: ˜ N << N log p(X|q) ≈ N ˜ N

˜ N

X

i=1

log p(xi|q) X = {x1, x2, . . . xN} p(X|q)p(q) = p(q|X) model prior

slide-48
SLIDE 48

D a t a

D a t a

D A T A !

Alan Turing Institute to research big data

Data Scientist: The Sexiest Job

  • f the 21st Century
slide-49
SLIDE 49

Nate Silver’s Nowcast (fivethirtyeight.com) Bayesian Election Forecasting The ultra-nerd October 2016

slide-50
SLIDE 50
slide-51
SLIDE 51

Constraints in Multilevel Inference

Variational Auto-Encoders and other “Deep” Generative Models in machine learning define smooth maps

(random) inputs sampled from prior parameters

[Graham & Storkey, Asymptotically exact inference in deep generative models and differentiable simulators, Arxiv, July 2016] problem: explore input space consistent with

  • bservations of the other outputs, then use this to

sample the parameters.

  • bservations

=> Stochastic, constrained model & geodesic integrator

slide-52
SLIDE 52

1719

  • Th. Bayes

Student Roll, University of Edinburgh

slide-53
SLIDE 53

Stochastic Gradient Langevin Dynamics

To sample a parameter distribution, apply Langevin dynamics with the potential computed from log p(X|q) ≈ N ˜ N

˜ N

X

i=1

log p(xi|q) This suggests a force polluted by a random field Special case (but usually assumed)

˜ F = rU(q) + η(x, t)

η(q, t) ∼ N(0, σ)

σ unknown.

Bayesian Learning via Stochastic Gradient Langevin Dynamics, M. Welling, & Y.-W. Teh, International Conference on Machine Learning, 2010

slide-54
SLIDE 54

Euler-Maruyama

qn+1 = qn + hF(qn) + hσ ˜ Rn + p 2β−1hRn

Like Euler-Maruyama for the modified SDE

dq = F(q)dt + p 2β−1 + σhdW

This is problemmatic - If we don’t know the perturbation we can’t control the distribution. Solution of Welling and Teh: h = hn → 0 Not a super idea! As the sampling proceeds we need to shrink the stepsize, slowing exploration. How to decide the timestep schedule?

slide-55
SLIDE 55

Gradient System

Noise Perturbation Negative Feedback Control

Adaptive Thermostats

Use negative feedback loop control to stabilize the system against force perturbation (even unknown) Jones & L., J. Chemical Physics, 2011

slide-56
SLIDE 56

driving thermostat steady state L † (ρβ(q, p) × ˆ ρ(ξ)) = 0, ˆ ρ(ξ) = exp

  • − βQ(ξ − ξheat)2/2
  • & ergodic for this equilibrium distribution

Nose-Hoover can automatically thermostat a simple stochastic driving perturbation. Jones & L., J. Chem. Phys., 2011

Adaptive Thermostats

slide-57
SLIDE 57

SGNHT

2014 NIPS Dear Bob…. A victim of google!

slide-58
SLIDE 58

Adaptive Langevin Thermostat

noise due to force error additional driving noise boxed term must be taken as a unit ergodicity by Hörmander, etc.

  • L. & Shang, SIAM J. Sci Comp. 2016
slide-59
SLIDE 59

Discretization

σ2

F = σ2∆t

slide-60
SLIDE 60

Numerical Analysis

large thermal mass limit BADODAB Under strong driving

slide-61
SLIDE 61

superconvergence Role of thermal mass

Simulations - clean gradient

slide-62
SLIDE 62

f: logistic function

data e.g. voting intention covariates e.g. age, income, …

posterior parameter distribution

Bayesian Logistic Regression

Gaussian prior

GOOGLE O u r M e t h

  • d

revenge of the nerd

slide-63
SLIDE 63

MNIST Binary Classification 7 or 9

Logistic Regression with 100 parameters Shang, Zhu, Leimkuhler, Storkey, NIPS, 2015 CCAdL (covariance controlled adaptive Langevin)

slide-64
SLIDE 64

Pairwise adaptive thermostats for shear flows

slide-65
SLIDE 65

Blood Flow under Shear

Dissipative Particle Dynamics simulation under shear flow Understanding particle margination in blood flow —- a step toward optimised drug delivery systems, K Mueller, D. Fedosov, and G. Gompper, 2015 Predicting human blood viscosity in silico, D. Fedosov et al, PNAS, 2011

slide-66
SLIDE 66

Shear Flow

Implemented using Lees-Edwards Boundary Conditions

slide-67
SLIDE 67

“Adaptive variant of DPD” PAdL L. & Shang, JCP 2016 Adaptively parameterizes DPD, i.e. selects the right value of so that the fluctuation-dissipation relation is satisfied Idea: apply this for non-equilibrium DPD simulation, viewing non-equilibrium forces as continual perturbation of conservative ones.

Pairwise Adaptive Langevin

slide-68
SLIDE 68

Splitting for PAdL

PAdL-S = ABODOBA Better than alternative orderings….

slide-69
SLIDE 69

PAdL mimics DPD VAFs

slide-70
SLIDE 70

low friction high friction

slide-71
SLIDE 71

NEMD (Shear Flow)

Kremer-Grest Melts: joint with Xiaocheng Shang and Martin Kröger

Lees-Edwards Boundary Conditions Shear rates between 0.01 and 1 (above this bonds break) Possible Application: e.g. Shear Banding simulation M = 20 chains of length N = 30, density = 0.84. box size = 8.939 (unentangled).

slide-72
SLIDE 72

CT Autocorrelation

slide-73
SLIDE 73

PAdL: much faster convergence (to similar accuracy) than Langevin in equilibrium setting (PAdL 5-6 times more efficient!) DPD - Much much slower than these in our tests.

Langevin PAdL

Orientation relaxation: PAdL is independent of friction

slide-74
SLIDE 74

Sheared (Nonequilibrium MD) Simulations

Low Shear Rate High Shear Rate

slide-75
SLIDE 75

Sheared (Nonequilibrium MD) Simulations

slide-76
SLIDE 76

Ensemble preconditioning MCMC simulation

Deep

slide-77
SLIDE 77

figure of merit = Integrated Autocorrelation Time (IAT)

Goal: redesign the dynamics (and integrator) to enhance the rate of convergence for typical observables f

We would like to have as small as possible

slide-78
SLIDE 78

Motivating Example: Eigenvalues: For MCMC schemes like Euler-Maruyama or Leimkuhler-Matthews, stability requires But for Poor Scaling Slow Convergence

slide-79
SLIDE 79

Ensemble Preconditioning

More generally, we wish to sample problems with complicated energy functions, where each basin or local approximation may be very poorly scaled. Related Concepts BFGS Method Stochastic Newton schemes MC Hammer (Goodman and Weare) Compare to: Riemannian Manifold HMC (Girolami et al)

slide-80
SLIDE 80

Use local information to estimate inverse Hessian matrix; precondition (rescale) dynamics to enhance convergence (reduce IAT)

Ensemble Preconditioning

Wishlist:

  • Increase efficiency by reducing the IAT
  • Compute the preconditioning based on a local ensemble

approximation

  • Allow for inertial effects (underdamped Langevin/HMC)

Idea: Use a collection of “walkers” to generate local covariance information and use this to estimate the inverse Hessian adaptively

slide-81
SLIDE 81

Procedure

Use an ensemble of L walkers:

each walker samples the same target distribution

We construct an dynamics in the extended space and compute ensemble averages by marginalisation

  • ver the individual walkers.
slide-82
SLIDE 82

collects covariance info

  • f nearby

walkers blend with identity (robustness) Basing Bi on other walkers only eliminates the problems of multiplicative noise

Discretization of SDEs: similar to BAOAB

slide-83
SLIDE 83

Gaussian Mixture Model: Hidalgo Stamps

  • (moderately) poorly scaled basins
  • multimodal due to “label switching” symmetry

“Adventures in Stamp Collecting”

slide-84
SLIDE 84

But the eigensystems are different in different basins, so the localized covariance is needed…

slide-85
SLIDE 85

Integrated Autocorrelation Times of Different Schemes

Gaussian Mixture Model: Hidalgo Stamps

slide-86
SLIDE 86
slide-87
SLIDE 87

Log Gaussian Cox Model

Ensemble Quasi-Newton python package http://bitbucket.org/c_matthews/ensembleqn