Stratified Markov Chain Monte Carlo, Brian Van Koten, University of Massachusetts, Amherst (PowerPoint presentation)

SLIDE 1

Stratified Markov Chain Monte Carlo

Brian Van Koten
University of Massachusetts, Amherst
Department of Mathematics and Statistics
with A. Dinner, J. Tempkin, E. Thiede, B. Vani, and J. Weare
June 28, 2019

SLIDE 2

Sampling Problems

  • What is the probability of finding a protein in a given conformation? Compute a sample from the Boltzmann distribution.
  • Bayesian inference for an ODE model of circadian rhythms: compute a sample from the posterior distribution.

[Figures from Phong, et al, PNAS, 2012, and from Folding@home.]

SLIDE 3

Markov Chain Monte Carlo (MCMC)

Goal: Compute π(g) := ∫ g(x) π(dx).

MCMC Method: Choose a Markov chain Xn so that

  lim_{N→∞} (1/N) Σ_{n=0}^{N−1} g(Xn) = π(g).

We say "Xn samples π."

[Figure: an MCMC trajectory Xn (position vs. time step) next to the target density.]
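As a concrete illustration (not from the slides), a random-walk Metropolis chain targeting a standard normal π, with the ergodic average estimating π(g) for g(x) = x²; the step size and sample count are arbitrary choices:

```python
import numpy as np

def metropolis(log_density, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis chain X_n targeting the density exp(log_density)."""
    rng = np.random.default_rng(seed)
    xs = np.empty(n_steps)
    x, lp = x0, log_density(x0)
    for n in range(n_steps):
        y = x + step * rng.standard_normal()
        lq = log_density(y)
        if np.log(rng.random()) < lq - lp:   # accept w.p. min(1, pi(y)/pi(x))
            x, lp = y, lq
        xs[n] = x
    return xs

# Standard normal target; the ergodic average estimates pi(g) for g(x) = x^2 (exactly 1).
xs = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_steps=50000)
est = (xs ** 2).mean()                       # (1/N) sum_n g(X_n)
```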

SLIDE 4

Difficulties with MCMC

[Figure: an MCMC trajectory Xn trapped near one mode of a multimodal target density, with M marking a point in the tail.]

Multimodality: multimodality ⇒ slow convergence.

Tails: Need a large sample to compute small probabilities, e.g. π([M, ∞)).

SLIDE 5

Sketch of Stratified MCMC

  • 1. Choose a family of strata, i.e. distributions πi whose supports cover the support of the target π.
  • 2. Sample the strata by MCMC.
  • 3. Estimate π(g) from the samples of the strata.

[Figure: a multimodal target distribution decomposed into strata πi.]

Why Stratify?

  • Strata may be unimodal, even if π is multimodal.
  • Can concentrate sampling in the tail.

Typical Strata: πi(dx) ∝ ψi(x)π(dx) for "localized" ψi.

SLIDE 7

History of Stratification

Surveys: [Russian census, late 1800s], [Neyman, 1937]
Bayes factors: [Geyer, 1994]
Selection bias models: [Vardi, 1985]
Free energy: [Umbrella Sampling, 1977], [WHAM, 1992], [MBAR, 2008]
Ion channels: [Berneche, et al, 2001]
Protein folding: [Boczko, et al, 1995]

Problems:

  • 1. WHAM/MBAR are complicated iterative methods . . .
  • 2. No clear story explaining benefits of stratification.
  • 3. Stratification underappreciated as a general strategy.
  • 4. Need good error bars for adaptivity.

BvK, et al: Propose the Eigenvector Method for Umbrella Sampling; develop the story, error bars, and stratification for dynamical quantities ...

SLIDE 8

Eigenvector Method for Umbrella Sampling (EMUS)

[BvK, et al]

[Figure: the target distribution, the bias functions ψi, and the resulting strata πi.]

Bias Functions: {ψi}_{i=1}^{L} with Σ_{i=1}^{L} ψi(x) = 1 and ψi(x) ≥ 0. Note: the user chooses the bias functions.

Weights: zi = π(ψi).

Strata: πi(dx) = zi⁻¹ ψi(x) π(dx).
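The conditions on the bias functions are easy to satisfy. One common construction (an illustrative assumption, not prescribed by the slides) normalizes a family of Gaussian bumps so that they form a partition of unity:

```python
import numpy as np

def make_bias_functions(centers, width):
    """Gaussian-bump bias functions normalized into a partition of unity:
    psi_i(x) >= 0 and sum_i psi_i(x) = 1 for every x."""
    centers = np.asarray(centers, dtype=float)
    def psi(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        w = np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)
        return w / w.sum(axis=1, keepdims=True)  # each row sums to one
    return psi

psi = make_bias_functions(np.linspace(-2.0, 2.0, 5), width=0.5)
vals = psi(np.linspace(-3.0, 3.0, 11))           # shape (11, 5): 11 points, L = 5 biases
```

Each ψi is localized near its center, so the corresponding stratum πi ∝ ψi π concentrates on a small region.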

SLIDE 10

Eigenvector Method for Umbrella Sampling (EMUS)

[BvK, et al]

Goal: Write π(g) in terms of averages over the strata πi(dx) = ψi(x)π(dx)/zi.

First, decompose π(g) as a weighted sum:

  π(g) = ∫ g(x) [Σ_{i=1}^{L} ψi(x)] π(dx)        (the ψi's sum to one)
       = Σ_{i=1}^{L} zi ∫ g(x) ψi(x)π(dx)/zi     (ψi(x)π(dx)/zi = πi(dx))
       = Σ_{i=1}^{L} zi πi(g).

How to express the weights zi = π(ψi) as averages over the strata?
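The decomposition can be checked numerically on a discretized target; the grid, the observable g, and the three-bump partition below are all illustrative assumptions:

```python
import numpy as np

x = np.linspace(-3.0, 3.0, 201)
p = np.exp(-0.5 * x**2)
p /= p.sum()                                     # discretized target pi on a grid
g = np.sin(x) + x**2                             # an arbitrary observable

# Any nonnegative psi_i summing to one pointwise will do (illustrative choice).
raw = np.exp(-0.5 * ((x[:, None] - np.array([-2.0, 0.0, 2.0])[None, :]) / 0.8) ** 2)
psi = raw / raw.sum(axis=1, keepdims=True)

z = psi.T @ p                                    # z_i = pi(psi_i)
pi_g = (psi * (p * g)[:, None]).sum(axis=0) / z  # pi_i(g) = pi(psi_i g) / z_i
lhs = p @ g                                      # pi(g)
rhs = z @ pi_g                                   # sum_i z_i pi_i(g)
```

Since Σi ψi ≡ 1, the two sides agree exactly (up to floating-point roundoff), and the zi automatically form a probability vector.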

SLIDE 12

Eigenvector Method for Umbrella Sampling (EMUS)

[BvK, et al]

Goal: Write π(g) in terms of averages over the strata πi(dx) = ψi(x)π(dx)/zi.

First, decompose π(g) as a weighted sum: π(g) = Σ_{i=1}^{L} zi πi(g).

How to express the weights zi = π(ψi) as averages over the strata?

  zj = π(ψj) = Σ_{i=1}^{L} zi πi(ψj)  ⟺  zᵀ = zᵀF (an eigenproblem), where Fij = πi(ψj) (the overlap matrix).
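Since F is stochastic, z can be computed as a left eigenvector of F with eigenvalue 1; a sketch using a small made-up overlap matrix:

```python
import numpy as np

def stationary_weights(F):
    """Solve z^T = z^T F for the probability vector z (left Perron eigenvector)."""
    vals, vecs = np.linalg.eig(F.T)
    k = np.argmin(np.abs(vals - 1.0))    # a stochastic matrix has eigenvalue 1
    z = np.real(vecs[:, k])
    return z / z.sum()                   # normalize to a probability vector

F = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.3, 0.7]])          # stochastic and irreducible (made up)
z = stationary_weights(F)
```

For this F the exact answer is z = (3/11, 6/11, 2/11); irreducibility guarantees the eigenvector is unique and positive.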

SLIDE 14

Eigenvector Method for Umbrella Sampling (EMUS)

[BvK, et al]

Goal: Write π(g) in terms of averages over the strata πi(dx) = ψi(x)π(dx)/zi.

First, decompose π(g) as a weighted sum: π(g) = Σ_{i=1}^{L} zi πi(g).

To express the weights zi = π(ψi) as averages over the strata, solve the eigenproblem zᵀ = zᵀF, where Fij = πi(ψj) is the overlap matrix.

Why does the eigenproblem determine z?

  • 1. F is stochastic; z is a probability vector.
  • 2. If F is irreducible, z is the unique solution of the eigenproblem.
SLIDE 16

Eigenvector Method for Umbrella Sampling (EMUS)

[BvK, et al]

Recall: π(g) = Σ_{i=1}^{L} zi πi(g), and zᵀ = zᵀF for Fij = πi(ψj).

EMUS Algorithm:

  • 1. Choose bias functions ψi and processes X^i_n sampling the strata.
  • 2. Compute ḡi := (1/Ni) Σ_{n=1}^{Ni} g(X^i_n) to estimate πi(g).
  • 3. Compute F̄ij := (1/Ni) Σ_{n=1}^{Ni} ψj(X^i_n) to estimate F.
  • 4. Solve the eigenproblem z̄ᵀ = z̄ᵀF̄ to estimate the weights z.
  • 5. Output gEM = Σ_{i=1}^{L} z̄i ḡi.

Key Point: The simplicity of EMUS enables analysis of stratification.
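The five steps can be sketched end to end. Everything below (the bimodal stand-in target, the Gaussian-bump bias functions, the per-stratum Metropolis samplers, the stratum count L = 7) is an illustrative assumption, not the setup from the talk:

```python
import numpy as np

rng = np.random.default_rng(2)

def log_pi(x):
    # Bimodal stand-in target: equal mixture of N(-2, 0.5^2) and N(2, 0.5^2).
    return np.logaddexp(-0.5 * ((x - 2.0) / 0.5) ** 2,
                        -0.5 * ((x + 2.0) / 0.5) ** 2)

centers = np.linspace(-3.0, 3.0, 7)              # L = 7 strata
def psi(x):
    # Gaussian-bump partition of unity.
    w = np.exp(-0.5 * ((np.atleast_1d(x)[:, None] - centers[None, :]) / 0.7) ** 2)
    return w / w.sum(axis=1, keepdims=True)

def sample_stratum(i, n, step=0.4):
    """Metropolis chain X^i_n for pi_i(dx) proportional to psi_i(x) pi(dx)."""
    x = centers[i]
    lp = log_pi(x) + np.log(psi(x)[0, i])
    xs = np.empty(n)
    for t in range(n):
        y = x + step * rng.standard_normal()
        lq = log_pi(y) + np.log(psi(y)[0, i])
        if np.log(rng.random()) < lq - lp:
            x, lp = y, lq
        xs[t] = x
    return xs

g = lambda x: x                                  # observable; pi(g) = 0 by symmetry
L, N_i = len(centers), 4000
g_bar = np.empty(L)
F_bar = np.empty((L, L))
for i in range(L):                               # steps 1-3
    samples = sample_stratum(i, N_i)
    g_bar[i] = g(samples).mean()                 # estimates pi_i(g)
    F_bar[i] = psi(samples).mean(axis=0)         # F_bar[i, j] estimates pi_i(psi_j)

vals, vecs = np.linalg.eig(F_bar.T)              # step 4: solve z^T = z^T F
z = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
z /= z.sum()
g_EM = z @ g_bar                                 # step 5: EMUS estimate of pi(g)
```

Each stratum is essentially unimodal, so every chain mixes quickly even though a single chain targeting π would cross between the modes only rarely.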

SLIDE 17

EMUS Analysis: Outline

  • 1. Sensitivity of gEM to sampling error.
  • 2. Dependence of sampling error on choice of strata.
  • 3. Stories involving multimodality and tails.
SLIDE 18

Quantifying Sensitivity to Sampling Error I

For F irreducible and stochastic, let z(F) be the unique solution of z(F)ᵀ = z(F)ᵀF.

P^F_i[τj < τi]: the probability of hitting j before i, conditioned on starting from i, for a Markov chain on {1, …, L} with transition matrix F.

Theorem [BvK, et al]:

  (1/2) (1 / P^F_i[τj < τi]) ≤ max_{m=1,…,L} |∂ log zm / ∂Fij (F)| ≤ 1 / P^F_i[τj < τi] ≤ 1/Fij.

This led to new perturbation bounds for Markov chains [BvK, et al].
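The hitting probability P^F_i[τj < τi] in these bounds can be computed by a standard first-step linear solve; the 3-state F below is a made-up example:

```python
import numpy as np

def prob_hit_j_before_return(F, i, j):
    """P_i^F[tau_j < tau_i]: start at i, hit j before returning to i."""
    L = F.shape[0]
    # h[k] = P_k[hit j before i], with boundary conditions h[i] = 0, h[j] = 1.
    free = [k for k in range(L) if k not in (i, j)]
    A = np.eye(len(free)) - F[np.ix_(free, free)]
    b = F[np.ix_(free, [j])].ravel()
    h = np.zeros(L)
    h[j] = 1.0
    h[free] = np.linalg.solve(A, b)
    return F[i] @ h          # take one step from i, then follow h

F = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
p = prob_hit_j_before_return(F, 0, 2)
```

For this chain, h(1) solves h = 0.5 h + 0.25, so h(1) = 1/2 and P^F_0[τ2 < τ0] = 0.5 · 1/2 = 1/4, consistent with the bound P^F_i[τj < τi] ≥ Fij.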

SLIDE 19

Quantifying Sensitivity to Sampling Error II

Assumption: A CLT holds for the MCMC averages:

  √Ni (ḡi − πi(g)) →d N(0, C(ḡi)),

where C(ḡi) is the asymptotic variance.

Theorem [BvK, et al]:

  √N (gEM − π(g)) →d N(0, C(gEM)),

where

  C(gEM) ≲ var_π(g) Σ_{i=1}^{L} [ ( Σ_{j≠i, Fij>0} 1 / P^F_i[τj < τi]² ) × ( Σ_{j=1}^{L} C(F̄ij)/κi ) + zi² C(ḡi)/κi ].

The first factor measures the sensitivity of gEM to error in F̄; the second measures the sampling error in F̄ itself.

Notation: N is the total sample size, with Ni = κi N samples drawn from πi.

SLIDE 20

EMUS Analysis: Outline

  • 1. Sensitivity of gEM to sampling error.
  • 2. Dependence of sampling error on choice of strata.
  • 3. Stories involving multimodality and tails.
SLIDE 21

Dependence of Sampling Error on Strata I

Write π(dx) = Z⁻¹ exp(−V(x)/ε) for some potential V:

[Figure: the potential V and the corresponding target density.]

Assume the bias functions ψi are piecewise constant, so each stratum πi is supported on an interval of diameter D.

Assume X^i_t is overdamped Langevin dynamics with reflecting boundaries:

  dX^i_t = −∇V(X^i_t) dt (gradient descent) + √(2ε) dB^i_t (noise) + reflecting BCs.
SLIDE 22

Dependence of Sampling Error on Strata II

Let π(dx) = Z⁻¹ exp(−V(x)/ε) for some potential V:

[Figure: the potential V and the corresponding densities for two values of ε.]

Theorem [BvK, et al]: For overdamped Langevin dynamics with reflecting BCs,

  C(ḡi) ≲ var_{πi}(g) × (D²/ε) (diffusion scaling) × exp[(max_{supp πi} V − min_{supp πi} V)/ε] (Arrhenius factor).

Notation: D is the diameter of the support of πi.

SLIDE 23

EMUS Analysis: Outline

  • 1. Sensitivity of gEM to sampling error.
  • 2. Dependence of sampling error on choice of strata.
  • 3. Stories involving multimodality and tails.
SLIDE 24

EMUS and Multimodality

Let π(dx) = Z⁻¹ exp(−V(x)/ε) for a double-well V:

[Figure: the double-well potential V and the corresponding densities for two values of ε.]

The asymptotic variance of naïve MCMC grows exponentially as ε ↓ 0.

Theorem [BvK, et al]: For the right choice of strata (L ∝ ε⁻¹), the asymptotic variance of the EMUS estimate gEM grows only polynomially as ε ↓ 0.

SLIDE 25

EMUS and Tails

Goal: Compute π([M, ∞)) = ∫_M^∞ π(dx).

[Figure: a density with the tail region [M, ∞) marked.]

For a broad class of distributions π, the relative asymptotic variance of MCMC grows exponentially as M ↑ ∞.

Theorem [BvK, et al]: For the right choice of strata, the relative asymptotic variance of EMUS grows only polynomially as M ↑ ∞.
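A simplified stand-in for tail stratification (not the EMUS construction itself): write π([M, ∞)) as a product of larger conditional probabilities over nested strata [m_k, ∞), each sampled by its own Metropolis chain. The standard normal target, the levels, and the sample sizes are illustrative assumptions:

```python
import math
import numpy as np

rng = np.random.default_rng(4)

def sample_tail_stratum(m, n, step=0.5):
    """Metropolis chain for the standard normal restricted to [m, infinity)."""
    x = m
    xs = np.empty(n)
    for t in range(n):
        y = x + step * rng.standard_normal()
        # Accept with probability min(1, pi(y)/pi(x)); the density is zero below m.
        if y >= m and np.log(rng.random()) < 0.5 * (x * x - y * y):
            x = y
        xs[t] = x
    return xs

# pi([M, inf)) = P(X > m_0) * prod_k P(X > m_{k+1} | X > m_k),
# and each conditional probability is an O(1) quantity, easy to estimate.
M, levels = 3.0, [0.0, 1.0, 2.0, 3.0]
est = 0.5                                    # P(X > 0) for the standard normal
for m_lo, m_hi in zip(levels[:-1], levels[1:]):
    xs = sample_tail_stratum(m_lo, 20000)
    est *= (xs > m_hi).mean()

exact = 0.5 * math.erfc(M / math.sqrt(2))    # true tail probability
```

Each factor is a moderately sized probability estimated in a well-sampled stratum, which is the mechanism behind the polynomial (rather than exponential) growth of the relative variance.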

SLIDE 26

Example: EMUS for Bayesian Inference

Goal: Fit the set of thicknesses of 485 stamps by a mixture of 3 Gaussians:

[Figure: histogram of stamp thicknesses with the fitted three-component Gaussian mixture.]

Parameters: means µ1 ≤ µ2 ≤ µ3, precisions λ1, λ2, λ3, mixture weights, etc.

Bayesian method: Define a posterior distribution on the parameter space.

SLIDE 27

Example: EMUS for Bayesian Inference

Parameters: means µ1 ≤ µ2 ≤ µ3, precisions λ1, λ2, λ3, mixture weights, etc.

Objective: Compute the marginal in log10 λ1 and log10 λ2.

Strata: Cylinders over a grid of regions in the (log10 λ1, log10 λ2) plane.

SLIDE 28

Example: EMUS for Bayesian Inference

Parameters: means µ1 ≤ µ2 ≤ µ3, precisions λ1, λ2, λ3, mixture weights, etc.

Objective: Compute the marginal in log10 λ1 and log10 λ2.

SLIDE 29

Example: EMUS for Bayesian Inference

Asymptotic variances of EMUS vs. unbiased MCMC for the marginal in log10 λ1:

[Figure: asymptotic variance of the marginal in log10 λ1 for EMUS and for unbiased ("No Bias") MCMC.]

SLIDE 30

Conclusions

We present and analyze EMUS, a stratified MCMC method, and we derive practical error bars for the EMUS estimator [BvK et al, JCP, 2016].

Our analysis required the development of new perturbation estimates for stochastic matrices [BvK et al, SIMAX, 2015].

We clearly identify classes of problems for which stratification is beneficial, and we propose novel applications in statistics [BvK et al, 2019+].

We analyze and improve a stratification method for computing dynamical quantities [BvK et al, SIREV, 2017].

Ongoing Work: Convergence of NEUS, automatic methods for determining strata, comparison with other rare-event sampling methods, ...