SLIDE 1

Monte Carlo in the Physical and Biological Sciences: Some Problems of Interest and Algorithms

Paul Dupuis

Division of Applied Mathematics Brown University ICERM

26 October 2012

Paul Dupuis (Brown University) 26 October 2012

SLIDE 2

Some problems of interest

1. The ergodic problem: computing expected values with respect to a stationary distribution (usually a Gibbs measure).

2. Transition rate problems: computing the probability of transitions over [0, T], exit locations, and mean exit times with respect to metastable states [Jonathan Weare tutorial, Tuesday morning].


SLIDE 5

Examples

A first example. Compute the average potential energy and other functionals with respect to a Gibbs measure of the form

π(dx) = e^{−V(x)/τ} dx / Z(τ),

and V is the potential of a (relatively) complex physical system. Primary interest is as the marginal (independent) distribution on spatial variables of the stationary distribution of a Hamiltonian system. E.g.,

π̄(dx, dp) ∝ e^{−(1/τ)V(x) − (1/τ) Σ_{j=1}^{n} p_j²/(2m)} dx dp.

In this case the dynamical model is deterministic, and randomness only enters through the initial condition:

ẋ_j(t) = (1/m) p_j(t),   ṗ_j(t) = −(∂/∂x_j) V(x(t)).


SLIDE 7

Examples

One can use that π(dx) is the stationary distribution of the solution to

dX = −∇V(X) dt + √(2τ) dW,

as well as a variety of related discrete time models (specific examples mentioned later). The function V(x) is defined on a large space, and includes, e.g., various inter-molecular potentials. In general, it may have a very complicated surface, with many deep and shallow local minima. A representative quantity of interest is

∫ V(x) e^{−V(x)/τ} dx / Z(τ).
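The diffusion above can be simulated directly by Euler-Maruyama discretization. A minimal sketch, assuming a toy one-dimensional double-well potential V(x) = (x² − 1)² (an illustrative stand-in for the complex molecular potentials discussed in the talk):

```python
import numpy as np

def V(x):
    # Toy double-well potential with minima at x = -1 and x = +1
    return (x**2 - 1.0)**2

def grad_V(x):
    return 4.0 * x * (x**2 - 1.0)

def langevin_average_V(tau=0.5, dt=1e-3, n_steps=200_000, seed=0):
    """Euler-Maruyama discretization of dX = -grad V(X) dt + sqrt(2 tau) dW.
    Returns the time average of V(X), an estimate of the Gibbs mean of V."""
    rng = np.random.default_rng(seed)
    x, acc = 0.0, 0.0
    for _ in range(n_steps):
        x += -grad_V(x) * dt + np.sqrt(2.0 * tau * dt) * rng.standard_normal()
        acc += V(x)
    return acc / n_steps
```

The time average plays the role of the empirical measure applied to V; for small τ the estimate concentrates near the well bottoms, which is exactly where the rare-event sampling difficulty comes from.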


SLIDE 9

Examples

An example of a potential energy surface is the Lennard-Jones cluster of 38 atoms. This potential has ≈ 10^14 local minima. The lowest 150 and their "connectivity" graph are as in the figure (taken from Doyle, Miller & Wales, JCP, 1999).


SLIDE 12

Examples

A second example. Other examples may have discrete (but very large!) domains of integration. A standard example is the Ising model (a model for ferromagnetism). Given a lattice (e.g., Λ = {1, . . . , N} × {1, . . . , N}) and a neighborhood for each site i ∈ Λ (e.g., N(i) = {j : ‖j − i‖₁ = 1}), define a measure on the collection of spins {+1, −1}^Λ by e^{−βJ(x)}/Z(β), where x_i ∈ {+1, −1} and

J(x) = − Σ_{i∈Λ} Σ_{j∈N(i)} J_{i,j} x_i x_j − Σ_{i∈Λ} h_i x_i.

The analogue of the diffusion in the continuous state case is the Glauber dynamics (a continuous time, finite state, jump Markov process).
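The energy J(x) and a single-site update can be sketched as follows. This is a minimal illustration assuming constant couplings J_{i,j} = J and field h_i = h on a periodic lattice; the discrete-time Metropolis flip is used as a simple stand-in for the continuous-time Glauber dynamics named above:

```python
import numpy as np

def ising_energy(spins, J=1.0, h=0.0):
    """J(x) = -J sum_<i,j> x_i x_j - h sum_i x_i on a periodic lattice.
    Each nearest-neighbor pair is counted once (right and down neighbors)."""
    pair = (np.sum(spins * np.roll(spins, -1, axis=0))
            + np.sum(spins * np.roll(spins, -1, axis=1)))
    return -J * pair - h * np.sum(spins)

def metropolis_spin_flip(spins, beta, rng, J=1.0, h=0.0):
    """One single-site update: flip a random spin with probability
    min(1, exp(-beta * dE)), where dE is the energy change of the flip."""
    N, M = spins.shape
    i, j = rng.integers(N), rng.integers(M)
    nbrs = (spins[(i + 1) % N, j] + spins[(i - 1) % N, j]
            + spins[i, (j + 1) % M] + spins[i, (j - 1) % M])
    dE = 2.0 * spins[i, j] * (J * nbrs + h)
    if dE <= 0 or rng.random() < np.exp(-beta * dE):
        spins[i, j] *= -1
```

At large β (low temperature) uphill flips are essentially never accepted, the discrete analogue of the trapping in deep minima discussed for the continuous case.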

SLIDE 13

Monte Carlo approximation

We focus on the continuous state problem. If X(t) is any discrete or continuous time Markov process with invariant measure µ, then under ergodicity

µ_T(A) = (1/T) ∫₀ᵀ δ_{X(t)}(A) dt   or   µ_T(A) = (1/T) Σ_{i=0}^{T−1} δ_{X(i)}(A)

satisfies µ_T ⇒ µ, and hence provides a numerical approximation. In practice one uses the discrete time process, and omits a "burn-in" period.


SLIDE 15

Monte Carlo approximation

Examples of discrete time processes. A simple example is the classical Metropolis algorithm for π(dx) = e^{−V(x)/τ} dx / Z(τ). It is based on two ideas: detailed balance and rejection. For transition kernels with a density p(x, dy) = p(y|x) dy, detailed balance with respect to e^{−V(x)/τ} dx means that

e^{−V(x)/τ} p(y|x) = e^{−V(y)/τ} p(x|y),

and it automatically gives that e^{−V(x)/τ}/Z(τ) is invariant with respect to p(x, dy) (integrate both sides w.r.t. x).
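Writing out that integration step (a one-line check, using that p(·|y) is a probability density):

```latex
\int e^{-V(x)/\tau}\,p(y\,|\,x)\,dx
  \;=\; \int e^{-V(y)/\tau}\,p(x\,|\,y)\,dx
  \;=\; e^{-V(y)/\tau}\int p(x\,|\,y)\,dx
  \;=\; e^{-V(y)/\tau},
```

so one transition of the chain reproduces the Gibbs density, i.e., it is invariant.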


SLIDE 22

Monte Carlo approximation

Standard Metropolis. Given the current location X(i) = x:

1. Choose a potential new location y according to a symmetric density q(y|x) dy = g(y − x) dy (often uniform on the support of g).

2. Compute the acceptance probability

A = min(1, e^{−V(y)/τ}/e^{−V(x)/τ}) = min(1, e^{−[V(y)−V(x)]/τ}).

3. Generate U ∼ Unif[0, 1], and accept the move if A ≥ U.

Then

e^{−V(x)/τ} min(1, e^{−[V(y)−V(x)]/τ}) = min(e^{−V(x)/τ}, e^{−V(y)/τ}),

which together with the symmetry of q guarantees that e^{−V(x)/τ}/Z(τ) is invariant, but gives little information on the rate of convergence. Exponential weighting can make it very hard to move uphill, i.e., to y with V(y) > V(x).
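The three steps above can be sketched directly. A minimal random-walk Metropolis sampler; the uniform proposal width and the test potential used below are illustrative choices, not from the talk:

```python
import numpy as np

def metropolis_chain(V, x0, n_steps, tau=1.0, step=0.5, seed=0):
    """Random-walk Metropolis targeting pi(x) proportional to exp(-V(x)/tau).
    Proposal: y = x + Uniform(-step, step), a symmetric q(y|x)."""
    rng = np.random.default_rng(seed)
    x = x0
    xs = np.empty(n_steps)
    for i in range(n_steps):
        y = x + rng.uniform(-step, step)
        # A = min(1, exp(-[V(y) - V(x)] / tau)); accept the move if A >= U
        if rng.random() <= np.exp(-(V(y) - V(x)) / tau):
            x = y
        xs[i] = x
    return xs
```

With V(x) = x²/2 and τ = 1 the target is standard normal, so the chain's long-run mean and variance should approach 0 and 1 after the burn-in period mentioned earlier.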


SLIDE 26

Monte Carlo approximation

Many variations.

1. Non-symmetric q(y|x) [Hastings]; then include q in the acceptance rule:

A = min(1, q(x|y) e^{−V(y)/τ} / q(y|x) e^{−V(x)/τ}).

2. Consider several transition kernels with the same invariant distribution, and alternate between them in various ways (e.g., instead of updating the positions of N particles at once, do one at a time).

3. Increase the acceptance rate by using the form of the dynamics to suggest good choices of q (e.g., "smart MC").


SLIDE 28

Monte Carlo approximation

While such refinements improve performance, the "rare event sampling problem," i.e., efficient movement between deep local minima, limits the range of problems to which it is applied.

Rates of convergence. With an eye towards algorithm design, one needs a measure of "rate of convergence." Interestingly, none of the measures used for many years deal directly with the empirical measure.


SLIDE 32

Monte Carlo approximation

Example: the subdominant eigenvalue. Let p(x, dy) be the transition kernel: with A(y|x) the acceptance rate,

∫_B f(y) p(x, dy) = ∫_B [f(y) A(y|x) + f(x)(1 − A(y|x))] q(y|x) dy.

Then p(x, ·) has a single eigenvalue of modulus 1 (associated with e^{−V(x)/τ}), and the magnitude |λ₁| of the next largest is used as a rate. This provides information on convergence of the n-step transition kernel, but not of the empirical measure. E.g., the almost periodic chain

[ ε      1 − ε ]
[ 1 − ε    ε   ]

converges very slowly via this measure, but its empirical measure converges rapidly to (1/2, 1/2).
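The two-state example can be checked numerically. A small sketch that computes the subdominant eigenvalue of the almost periodic chain and simulates its empirical measure:

```python
import numpy as np

def two_state_demo(eps=0.01, n_steps=20_000, seed=0):
    """For P = [[eps, 1-eps], [1-eps, eps]] the eigenvalues are 1 and
    2*eps - 1, so the subdominant modulus is near 1 for small eps.
    Yet the empirical measure of a path converges quickly to (1/2, 1/2)."""
    P = np.array([[eps, 1.0 - eps], [1.0 - eps, eps]])
    lam = sorted(np.linalg.eigvals(P), key=abs)[-2]  # subdominant eigenvalue
    rng = np.random.default_rng(seed)
    state, counts = 0, np.zeros(2)
    for _ in range(n_steps):
        counts[state] += 1
        if rng.random() < 1.0 - eps:   # switch states with probability 1 - eps
            state = 1 - state
    return abs(lam), counts / n_steps
```

For eps = 0.01 the subdominant modulus is 0.98 (slow n-step convergence), while the simulated occupation frequencies are already close to (1/2, 1/2), illustrating why the spectral gap can be the wrong yardstick for empirical-measure algorithms.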


SLIDE 35

An accelerated algorithm: parallel tempering

A new idea: "parallel tempering" (also called "replica exchange," due to Geyer, Swendsen and Wang). Idea of parallel tempering, two temperatures. Besides τ1 = τ, introduce a higher temperature τ2 > τ1. Thus

dX1 = −∇V(X1) dt + √(2τ1) dW1
dX2 = −∇V(X2) dt + √(2τ2) dW2,

with W1 and W2 independent. Then one obtains a Monte Carlo approximation to

π(x1, x2) = e^{−V(x1)/τ1} e^{−V(x2)/τ2} / Z(τ1)Z(τ2).

SLIDE 36

An accelerated algorithm: parallel tempering

Now introduce swaps, i.e., X1 and X2 exchange locations with state dependent intensity

a g(x1, x2) = a [1 ∧ π(x2, x1)/π(x1, x2)] = a [1 ∧ e^{−[V(x2)/τ1 + V(x1)/τ2] + [V(x1)/τ1 + V(x2)/τ2]}],

with a > 0 the "swap rate."


SLIDE 38

An accelerated algorithm: parallel tempering

Now have a Markov jump-diffusion. Easy to check: with this swapping intensity we still have detailed balance, and thus

π(x1, x2) = e^{−V(x1)/τ1} e^{−V(x2)/τ2} / Z(τ1)Z(τ2).

Increased temperature ∼ higher diffusivity of X^a_2 ∼ easier communication for X^a_2 ∼ passed to X^a_1 via swaps.


SLIDE 44

An accelerated algorithm: parallel tempering

Real problems use many (20-60) temperatures, and usually attempt swaps only between particles with adjacent temperatures [needed so there are enough successful swaps; again a problem due to exponential scaling]. It is the most widely used computational tool, available in CHARMM, etc. Natural questions: how many temperatures, where to place them, and how to cycle through all pairs. One can use parameters other than temperature, but with the same idea. Implementation uses one of the discrete time versions, and conventional wisdom is that the swap rate should be chosen so that one swap attempt corresponds to roughly six steps of the discrete dynamics. Users are cautioned against attempting swaps too often. Related precursors: simulated tempering, which follows a single particle, and jump-walking.
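A discrete-time replica-exchange scheme of the kind described above can be sketched as follows. This is a minimal two-temperature illustration, assuming random-walk Metropolis moves for the within-temperature dynamics and the standard swap acceptance probability min(1, exp((1/τ_i − 1/τ_j)(V(x_i) − V(x_j)))); the specific defaults are illustrative:

```python
import numpy as np

def parallel_tempering(V, taus, n_steps, step=1.0, swap_every=6, seed=0):
    """One Metropolis move per temperature per step, plus a swap attempt
    between a random adjacent temperature pair every `swap_every` steps.
    Returns the trajectory of the lowest-temperature replica."""
    rng = np.random.default_rng(seed)
    x = np.zeros(len(taus))
    traj = np.empty(n_steps)
    for n in range(n_steps):
        for k, t in enumerate(taus):
            y = x[k] + rng.uniform(-step, step)
            if rng.random() <= np.exp(-(V(y) - V(x[k])) / t):
                x[k] = y
        if n % swap_every == 0:
            k = rng.integers(len(taus) - 1)
            d = (1.0 / taus[k] - 1.0 / taus[k + 1]) * (V(x[k]) - V(x[k + 1]))
            if rng.random() <= np.exp(d):       # detailed-balance swap rule
                x[k], x[k + 1] = x[k + 1], x[k]
        traj[n] = x[0]
    return traj
```

On a double-well potential the cold replica on its own rarely crosses the barrier, but swaps with the hot replica carry it between wells, the mechanism the slide describes.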


SLIDE 47

A little large deviations analysis

The Donsker-Varadhan theory (see also Gärtner). Consider

dX = b(X) dt + σ(X) dW,   X(0) = x0,

and for large T,

µ_T(dx) = (1/T) ∫₀ᵀ δ_{X(t)}(dx) dt.

Then, considered as taking values in P(R^d),

P(µ_T ≈ ν) ≈ e^{−T I(ν)}.

Here I(ν) ≥ 0 measures deviations from the LLN limit (ergodic theorem) π, where π is the unique invariant probability for X. For diffusions satisfying detailed balance, I takes an explicit form.

slide-51
SLIDE 51

A little large deviations analysis

What does the Donsker-Varadhan theory say about parallel tempering? Suppose ν given such that θ(x1, x2) = dν dπ(x1, x2) is smooth. Then we have monotonic form I a(ν) = J0(ν) + aJ1(ν) where J0 is the rate for “no swap” dynamics and J1(ν) =

  • Rd ×Rd g(x1, x2)
  • θ(x2, x1)

θ(x1, x2)

  • ν(dx1dx2)

with (z) = z log z − z + 1 = 0 iff z = 1.

Paul Dupuis (Brown University) 26 October 2012


SLIDE 53

A little large deviations analysis

Rate for the low temperature marginal. By the contraction principle, for a probability measure γ,

I^a_1(γ) = inf {I^a(ν) : first marginal of ν is γ}.

If γ(dx1) ≠ π1(dx1) = e^{−V(x1)/τ1} dx1 / Z(τ1), then for a ∈ (0, ∞),

I^a_1(γ) > I^0_1(γ),

and I^a_1(γ) ↑ some finite limit.


SLIDE 59

Infinite swapping limit

This suggests one consider the infinite swapping limit a → ∞, except:

if a is large but finite, almost all computational effort is directed at swap attempts rather than diffusion dynamics;

if a → ∞, the limit process is not well defined (no tightness).

An alternative perspective: rather than swap particles, swap temperatures, and use a "weighted" empirical measure.

Particle swapping. Process: (X^a_1, X^a_2). Approximation to π(dx):

(1/T) ∫₀ᵀ δ_{(X^a_1, X^a_2)}(dx) dt.

slide-62
SLIDE 62

Infinite swapping limit

Temperature swapping. Process: dY a

1 = −∇V (Y a 1 )dt +

  • 2r1(Z a)dW1

dY a

2 = −∇V (Y a 2 )dt +

  • 2r2(Z a)dW2,

where r(Z a(t)) jumps between τ 1 and τ 2 with intensity ag(Y a

1 (t), Y a 2 (t)).

Approximation to π(dx): 1 T T

  • 1{0}(Z a)δ(Y a

1 ,Y a 2 )(dx) + 1{1}(Z a)δ(Y a 2 ,Y a 1 )(dx)

  • dt.

Paul Dupuis (Brown University) 26 October 2012


SLIDE 65

Infinite swapping limit

The advantage is a well defined weak limit as a → ∞:

dY1 = −∇V(Y1) dt + √(2τ1 ρ1(Y1, Y2) + 2τ2 ρ2(Y1, Y2)) dW1
dY2 = −∇V(Y2) dt + √(2τ2 ρ1(Y1, Y2) + 2τ1 ρ2(Y1, Y2)) dW2,

η_T(dx) = (1/T) ∫₀ᵀ [ρ1(Y1, Y2) δ_{(Y1,Y2)} + ρ2(Y1, Y2) δ_{(Y2,Y1)}] ds,

and

ρ1(x1, x2) = e^{−[V(x1)/τ1 + V(x2)/τ2]} / Z_ρ(x1, x2),   ρ2(x1, x2) = e^{−[V(x2)/τ1 + V(x1)/τ2]} / Z_ρ(x1, x2).

Theorem: η_T satisfies the large deviation principle with rate I^∞.
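The weights ρ1, ρ2 can be computed directly from the formulas above. A small sketch (the log-sum-exp shift is an implementation detail added for numerical stability, not part of the formulas):

```python
import numpy as np

def infswap_weights(V, x1, x2, tau1, tau2):
    """Weights rho1, rho2 of the a -> infinity limit, with a shared
    normalizer Z_rho(x1, x2) = sum of the two unnormalized terms."""
    e1 = -V(x1) / tau1 - V(x2) / tau2
    e2 = -V(x2) / tau1 - V(x1) / tau2
    m = max(e1, e2)                     # log-sum-exp shift for stability
    w1, w2 = np.exp(e1 - m), np.exp(e2 - m)
    Z = w1 + w2
    return w1 / Z, w2 / Z
```

As expected from the formulas, the weights sum to one, and ρ1 dominates when the lower-energy coordinate sits in the x1 (low temperature) slot.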


SLIDE 69

Implementation issues and partial infinite swapping

As noted, applications of parallel tempering use many temperatures (e.g., K = 30 to 50) when V is complicated, to overcome barriers of all different heights. There is a straightforward extension of infinite swapping to K temperatures τ1 < τ2 < · · · < τK. The benefits of symmetrization/equilibration are even greater; e.g., I^∞(ν) = ∞ means very fast equilibration of the relative contributions from all permutations of (x1, . . . , xK) to the joint distribution, and in particular an even larger rate for the lowest marginal. But the coefficients become complex, e.g., K! weights, each involving many calculations. Not practical if K ≥ 7. The need for computational feasibility leads to partial infinite swapping.


SLIDE 73

Implementation issues and partial infinite swapping

Full infinite swapping has the property that if (Y1(t), . . . , YK(t)) = (y1, . . . , yK), then contributions from all permutations of (y1, . . . , yK) are added to η_T(dx) according to their relative weights under π(dx). We say that the full set of permutations is instantaneously equilibrated.

Partial infinite swapping. Given any subgroup of the group of permutations, one can construct a corresponding partial infinite swapping dynamic. Two examples are Dynamics A and B in the figure.


SLIDE 77

Implementation issues and partial infinite swapping

Using partial infinite swapping one can control the complexity of the coefficients and the algorithm. If one alternates between subgroups that generate the full group of permutations, one approximates full infinite swapping (convergence theorem in continuous time). However, particles lose their physical identity in the infinite swapping limit (partial or otherwise). One cannot simply alternate; a proper "handoff" rule is needed. One can identify the "distributionally correct" handoff rule, using the fact that partial swappings are limits of "physically meaningful" processes. E.g., in a block of 4 locations x_i associated with 4 temperatures τ_i, select a permutation σ according to

e^{−[V(x_σ(1))/τ1 + V(x_σ(2))/τ2 + V(x_σ(3))/τ3 + V(x_σ(4))/τ4]} / Σ_σ̄ e^{−[V(x_σ̄(1))/τ1 + V(x_σ̄(2))/τ2 + V(x_σ̄(3))/τ3 + V(x_σ̄(4))/τ4]},

and assign τ_i to x_σ(i).
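The handoff rule above amounts to sampling a permutation from a Gibbs distribution over assignments. A minimal sketch; `sample_temperature_assignment` is a hypothetical helper name, and the brute-force enumeration of all K! permutations is only sensible for the small blocks (e.g., K = 4) the rule is applied to:

```python
from itertools import permutations

import numpy as np

def sample_temperature_assignment(V_vals, taus, rng):
    """Pick a permutation sigma of the K positions with probability
    proportional to exp(-sum_i V(x_sigma(i)) / tau_i); V_vals[k] = V(x_k)."""
    K = len(taus)
    perms = list(permutations(range(K)))
    logw = np.array([-sum(V_vals[p[i]] / taus[i] for i in range(K))
                     for p in perms])
    w = np.exp(logw - logw.max())       # stabilized unnormalized weights
    w /= w.sum()
    return perms[rng.choice(len(perms), p=w)]
```

With well-separated energies the rule overwhelmingly assigns the lowest temperature to the lowest-energy location, which is how the limit dynamics keep the low-temperature marginal correct.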

SLIDE 78

Numerical examples

Relaxation study of convergence to equilibrium for LJ-38.

Quantity of interest: average potential energy at various temperatures.

Used 45 temperatures; a 3—6—6—· · ·—6 type dynamic for partial infinite swapping.

Used Smart Monte Carlo for the particle dynamics.

The lowest 1/3 of temperatures were raised to push the process away from equilibrium (low temperature components pushed away from deep minima), then reduced to the correct temperatures for 600 discrete time steps to study the return to equilibria.

Repeated 2000 times; we plot averages for the lowest (and hardest) temperature.

SLIDE 79

Numerical examples

Relaxation study of convergence to equilibrium for LJ-38: parallel tempering versus partial infinite swapping, only the lowest temperature illustrated.

SLIDE 80

Numerical examples

For this system, the reduction relative to the best parallel tempering: 10^10 reduced to 10^6 steps, with additional overhead of approximately 10%.

SLIDE 81

Numerical examples

Convergence of the empirical measure on temperatures to the uniform distribution.

SLIDE 82

Concluding remarks

Function minimization attributes: the infinite swapping process (Y1, Y2) has symmetric dynamics and interesting qualitative properties w.r.t. function minimization.

SLIDE 83

Concluding remarks

Convergence to equilibrium, single sample, 12 lowest temperatures:

SLIDE 84

References

Glauber dynamics:
An Introduction to Markov Processes, Stroock, Springer, 2005.

One temperature Monte Carlo:
"Equation of state calculations by fast computing machines", Metropolis, Rosenbluth, Rosenbluth, Teller, and Teller, J. of Chem. Phys., 21, 1087, 1953.
"Monte Carlo sampling methods using Markov chains and their applications", Hastings, Biometrika, 57, 97-109, 1970.
"Brownian dynamics as smart Monte Carlo simulation", Rossky, Doll, and Friedman, J. of Chem. Phys., 69, 4628, 1978.

Parallel tempering:
"Replica Monte Carlo simulation of spin glasses", Swendsen and Wang, Phys. Rev. Lett., 57, 2607-2609, 1986.
"Markov chain Monte Carlo maximum likelihood", Geyer, in Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface, ASA, 156-163, 1991.

SLIDE 85

References

A paper suggesting it is good to swap a lot:
"Exchange frequency in replica exchange molecular dynamics", Sindhikara, Meng and Roitberg, J. of Chem. Phys., 128, 024104, 2008.

Infinite swapping:
"On the infinite swapping limit for parallel tempering", Dupuis, Liu, Plattner and Doll, SIAM J. on MMS, 10, 986-1022, 2012.
"An infinite swapping approach to the rare-event sampling problem", Plattner, Doll, Dupuis, Wang, Liu and Gubernatis, J. of Chem. Phys., 135, 134111, 2011.