Bayesian Coarse-Graining in Atomistic Simulations: Adaptive Identification of the Dimensionality and Salient Features

SLIDE 1

Bayesian Coarse-Graining in Atomistic Simulations:

Adaptive Identification of the Dimensionality and Salient Features

M. Schöberl¹, N. Zabaras²,³, P.S. Koutsourelakis¹

¹Continuum Mechanics Group, Technical University of Munich
²Institute for Advanced Study, Technical University of Munich
³Institute of Computational Sciences and Informatics, University of Notre Dame

SIAM CSE, Atlanta, GA, USA, March 1, 2017

p.s.koutsourelakis@tum.de

SLIDE 2

Problem Definition - Equilibrium Statistical Mechanics

Fine-Grained Model (FG)

$p_f(x) \propto e^{-\beta U_f(x)}$

  • $x \in \mathcal{M}$: fine-scale DOFs
  • $U_f(x)$: atomistic potential
  • Observables: $\mathbb{E}_{p_f(x)}[a] = \int a(x)\, p_f(x)\, dx$

Coarse-Grained Model (CG)

$X = R(x)$, with $\dim(X) \ll \dim(x)$

  • $X$: coarse-scale DOFs
  • $R$: restriction operator (mapping fine → coarse)
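As a concrete reading of the observable formula above, here is a minimal Monte Carlo sketch: the expectation is estimated by averaging a(x) over equilibrium samples. Both `samples` and the observable `a` are hypothetical stand-ins for MD/MCMC draws from p_f(x) and a quantity of interest.

```python
import numpy as np

def observable_mean(a, samples):
    """Monte Carlo estimate of E_{p_f}[a]: average a(x) over
    equilibrium samples x^(1), ..., x^(N) drawn from p_f(x)."""
    return np.mean([a(x) for x in samples])

rng = np.random.default_rng(0)
samples = rng.normal(size=(1000, 3))   # hypothetical stand-in for MD draws from p_f(x)
a = lambda x: float(np.sum(x**2))      # hypothetical observable a(x)
print(observable_mean(a, samples))     # ~3.0 for this toy setup
```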

SLIDE 3

Motivation

Questions

  • What are good coarse-grained variables X (how many, and how are they related to the FG description)?
  • What is the right CG model?
  • Given a good CG model for X, how much can one predict about the whole x (reconstruction)?
  • How much information is lost during coarse-graining, and how does this affect predictions produced by the CG model?
  • Given finite simulation data at the fine scale, how (un)certain can one be in the resulting predictions?

SLIDE 8

Motivation

Two roads in CG:

1.) Variational (Mean Field, and many others): $\min_{\bar{p}_f(x)} \mathrm{KL}(\bar{p}_f(x)\,\|\,p_f(x))$

2.) Data-driven (e.g. Relative Entropy [Shell (2008)]): $\min_{\bar{p}_f(x)} \mathrm{KL}(p_f(x)\,\|\,\bar{p}_f(x))$

where:

  • $\bar{p}_f(x)$: approximation
  • $p_f(x) \propto e^{-\beta U_f(x)}$: exact

SLIDE 9

Motivation

Existing methods

$p_f(x)$ (fine) $\xrightarrow{\;R(x)=X\;}$ $\bar{p}_c(X)$ (coarse)

[Figure: coarse-scale variables X alongside a fine-scale configuration x]

Proposed (Generative model)

$p_c(X)$ (coarse) $\xrightarrow{\;p_{cf}(x|X)\;}$ $\bar{p}_f(x) = \int p_{cf}(x|X)\, p_c(X)\, dX$ (fine)

Notes

  • No restriction operator (no prescribed fine-to-coarse map R(x) = X).
  • A probabilistic coarse-to-fine map $p_{cf}(x|X)$ is prescribed instead.
  • The coarse model $p_c(X)$ is not the marginal of X (given some R(x) = X).

SLIDE 11

Motivation

Given $p_c(X)$ and $p_{cf}(x|X)$:

1) Draw X from $p_c(X)$ (i.e. simulate the CG model)
2) Draw x from $p_{cf}(x|X)$
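A minimal sketch of this two-step (ancestral) sampling. The coarse model here is a toy standard normal, and the coarse-to-fine map takes the linear-Gaussian form N(μ + WX, S) used later for alanine dipeptide; the dimensions and parameter values are illustrative stand-ins for a trained model.

```python
import numpy as np

rng = np.random.default_rng(1)
dim_X, dim_x = 2, 10                        # illustrative coarse/fine dimensions
mu = np.zeros(dim_x)                        # toy stand-ins for learned theta_cf = {mu, W, S}
W = rng.normal(size=(dim_x, dim_X))
S_chol = 0.1 * np.eye(dim_x)                # Cholesky factor of the noise covariance S

def sample_fine(n):
    """Step 1: X ~ p_c(X) (toy standard normal);
    Step 2: x ~ p_cf(x|X) = N(mu + W X, S)."""
    X = rng.normal(size=(n, dim_X))
    eps = rng.normal(size=(n, dim_x)) @ S_chol.T
    return X, mu + X @ W.T + eps

X, x = sample_fine(5)
print(x.shape)   # (5, 10): fine-scale configurations generated from the CG model
```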

SLIDE 12

Learning

Proposed Probabilistic Generative model

  • Parametrize: $p_c(X|\theta_c)$ (coarse model) and $p_{cf}(x|X,\theta_{cf})$ (coarse → fine map)
  • Optimize:

$$\min_{\theta_c,\theta_{cf}} \mathrm{KL}\big(p_f(x)\,\|\,\bar{p}_f(x|\theta_c,\theta_{cf})\big) \;\leftrightarrow\; \min_{\theta_c,\theta_{cf}} -\int p_f(x)\, \log \frac{\int p_{cf}(x|X,\theta_{cf})\, p_c(X|\theta_c)\, dX}{p_f(x)}\, dx$$

$$\leftrightarrow\; \max_{\theta_c,\theta_{cf}} \int p_f(x)\, \log \left( \int p_{cf}(x|X,\theta_{cf})\, p_c(X|\theta_c)\, dX \right) dx$$

$$\leftrightarrow\; \max_{\theta_c,\theta_{cf}} \sum_{i=1}^{N} \log \int p_{cf}(x^{(i)}|X,\theta_{cf})\, p_c(X|\theta_c)\, dX \quad \text{(using } N \text{ samples } x^{(i)} \sim p_f(x)\text{)}$$

$$\leftrightarrow\; \max_{\theta_c,\theta_{cf}} \mathcal{L}(\theta_c,\theta_{cf}) \quad \text{(MLE)}$$

  • MAP estimate: $\max_{\theta_c,\theta_{cf}} \mathcal{L}(\theta_c,\theta_{cf}) + \log p(\theta_c,\theta_{cf})$, with $\log p(\theta_c,\theta_{cf})$ the log-prior
  • Fully Bayesian, i.e. posterior: $p(\theta_c,\theta_{cf}|x^{(1:N)}) \propto \exp\{\mathcal{L}(\theta_c,\theta_{cf})\}\, p(\theta_c,\theta_{cf})$
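For intuition only: the log-likelihood L sums terms of the form log ∫ p_cf(x^(i)|X, θ_cf) p_c(X|θ_c) dX, which the VB-EM scheme on the next slide bounds rather than computes directly. A naive Monte Carlo sketch of one such term for a toy 1D model (all densities here are illustrative assumptions, not the trained model):

```python
import numpy as np

rng = np.random.default_rng(2)

def log_marginal(x_i, log_pcf, sample_pc, n_mc=5000):
    """Naive MC estimate of log ∫ p_cf(x_i|X) p_c(X) dX: average
    p_cf(x_i|X^(m)) over X^(m) ~ p_c, computed stably in log space."""
    X = sample_pc(n_mc)
    logw = log_pcf(x_i, X)
    m = logw.max()
    return m + np.log(np.mean(np.exp(logw - m)))   # log-sum-exp trick

# Toy 1D model: p_c(X) = N(0, 1), p_cf(x|X) = N(X, 0.5^2).
sample_pc = lambda n: rng.normal(size=n)
log_pcf = lambda x, X: -0.5 * ((x - X) / 0.5) ** 2 - np.log(0.5 * np.sqrt(2 * np.pi))
print(log_marginal(0.3, log_pcf, sample_pc))   # approaches log N(0.3 | 0, 1 + 0.25)
```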

SLIDE 16

Learning

Stochastic VB-Expectation-Maximization [Beal & Ghahramani 2003]

$$\mathcal{L}(\theta_c,\theta_{cf}) = \sum_{i=1}^{N} \log \int p_{cf}(x^{(i)}|X^{(i)},\theta_{cf})\, p_c(X^{(i)}|\theta_c)\, dX^{(i)} = \sum_{i=1}^{N} \log \int q(X^{(i)})\, \frac{p_{cf}(x^{(i)}|X^{(i)},\theta_{cf})\, p_c(X^{(i)}|\theta_c)}{q(X^{(i)})}\, dX^{(i)}$$

$$\geq \sum_{i=1}^{N} \int q(X^{(i)})\, \log \frac{p_{cf}(x^{(i)}|X^{(i)},\theta_{cf})\, p_c(X^{(i)}|\theta_c)}{q(X^{(i)})}\, dX^{(i)} = \sum_{i=1}^{N} \mathcal{F}_i(q(X^{(i)}),\theta_c,\theta_{cf}) = \mathcal{F}(q,\theta_c,\theta_{cf})$$

  • E-step: approximate $q_i^{\mathrm{opt}}(X^{(i)})$ with a multivariate Gaussian: $q_i(X^{(i)}) = \mathcal{N}(\mu_i^{\mathrm{opt}}, \Sigma_i^{\mathrm{opt}})$
  • M-step: compute the gradients $\sum_{i=1}^{N} \nabla_{\theta_c}\mathcal{F}$, $\sum_{i=1}^{N} \nabla_{\theta_{cf}}\mathcal{F}$ (and the Hessian) and update $(\theta_c, \theta_{cf})$

Essential Ingredient: Stochastic Optimization

ADAptive Moment estimation (ADAM, [Kingma & Ba 2014])
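A self-contained toy of this alternating scheme, shrunk to one CG parameter so the E-step is available in closed form: p_c(X|θ_c) = N(θ_c, 1) and p_cf(x|X) = N(X, σ²) with σ fixed. The M-step uses the moment-difference gradient from the next slide and an ADAM update. A full-batch loop is shown for clarity; the stochastic variant would subsample the data and estimate the expectations by Monte Carlo. All model choices here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2 = 0.25                                       # fixed noise of p_cf(x|X) = N(X, sigma2)
x_data = rng.normal(loc=1.5, scale=1.2, size=200)   # hypothetical fine-scale data x^(1:N)
theta_c = 0.0                                       # single CG parameter: p_c(X|theta_c) = N(theta_c, 1)

m = v = 0.0                                         # ADAM moment estimates [Kingma & Ba 2014]
lr, b1, b2, eps = 0.05, 0.9, 0.999, 1e-8

for t in range(1, 201):
    # E-step: for this conjugate toy the optimal q_i is exactly Gaussian,
    # q_i(X) = N(mu_i, s2) with 1/s2 = 1/sigma2 + 1.
    s2 = 1.0 / (1.0 / sigma2 + 1.0)
    mu = s2 * (x_data / sigma2 + theta_c)           # E-step means mu_i, one per data point
    # M-step gradient (moment difference): sum_i (<X>_{q_i} - <X>_{p_c}) = sum_i (mu_i - theta_c)
    g = np.sum(mu - theta_c)
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g**2
    theta_c += lr * (m / (1 - b1**t)) / (np.sqrt(v / (1 - b2**t)) + eps)  # ascent step

print(theta_c)   # converges toward the data mean (~1.5)
```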

SLIDE 17

Learning

  • Exponential-family distributions:

$$p_c(X|\theta_c) = \exp\{\underbrace{\theta_c^T \phi(X)}_{\text{CG potential } U_c} - A(\theta_c)\}, \qquad e^{A(\theta_c)} = \int e^{\theta_c^T \phi(X)}\, dX$$

$$p_{cf}(x|X,\theta_{cf}) = \exp\{\theta_{cf}^T \psi(x,X) - B(X,\theta_{cf})\}, \qquad e^{B(X,\theta_{cf})} = \int e^{\theta_{cf}^T \psi(x,X)}\, dx$$

  • Gradients:

$$\nabla_{\theta_c}\mathcal{F} = \sum_{i=1}^{N} \langle \phi(X^{(i)}) \rangle_{q_i(X^{(i)})} - N\, \langle \phi(X) \rangle_{p_c(X|\theta_c)}$$

$$\nabla_{\theta_{cf}}\mathcal{F} = \sum_{i=1}^{N} \left( \langle \psi(x^{(i)},X^{(i)}) \rangle_{q_i(X^{(i)})} - \langle \psi(x,X^{(i)}) \rangle_{p_{cf}(x|X^{(i)},\theta_{cf})\, q_i(X^{(i)})} \right)$$

  • Hessians:

$$\nabla^2_{\theta_c}\mathcal{F} = -N\, \mathrm{Cov}_{p_c(X|\theta_c)}[\phi(X)], \qquad \nabla^2_{\theta_{cf}}\mathcal{F} = -\sum_{i=1}^{N} \mathrm{Cov}_{p_{cf}(x|X^{(i)},\theta_{cf})\, q_i(X^{(i)})}[\psi(x,X^{(i)})]$$

→ $\mathcal{F}$ is concave in $(\theta_c, \theta_{cf})$
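The θ_c gradient above is a mismatch of feature moments: features averaged under the per-datapoint posteriors q_i minus features averaged under the current CG model p_c(X|θ_c); at a stationary point the two agree. A sketch estimating this by Monte Carlo, with toy features and samplers standing in for the real φ, q_i, and p_c:

```python
import numpy as np

rng = np.random.default_rng(4)

def grad_theta_c(phi, q_samples, pc_samples):
    """MC estimate of grad_{theta_c} F = sum_i <phi(X)>_{q_i} - N <phi(X)>_{p_c}."""
    N = len(q_samples)
    term_q = sum(np.mean([phi(X) for X in Xi], axis=0) for Xi in q_samples)
    term_p = N * np.mean([phi(X) for X in pc_samples], axis=0)
    return term_q - term_p

# Toy setup: dim(X) = 2, two features (stand-ins for the phi_l of the CG potential).
phi = lambda X: np.array([X[0], X[0] ** 2 + X[1] ** 2])
q_samples = [rng.normal(loc=mu_i, scale=0.1, size=(100, 2))   # draws from each q_i
             for mu_i in rng.normal(size=(50, 2))]
pc_samples = rng.normal(size=(2000, 2))                       # draws from current p_c(X|theta_c)
print(grad_theta_c(phi, q_samples, pc_samples))               # ~0 once the moments match
```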

SLIDE 19

Learning

  • MAP estimates: $\max_{\theta_c,\theta_{cf}} \mathcal{L}(\theta_c,\theta_{cf}) + \log p(\theta_c,\theta_{cf})$, with $\log p(\theta_c,\theta_{cf})$ the log-prior
  • Approximate the Bayesian posterior using a Laplace approximation

Figure: Laplace approximation, exact posterior vs. Gaussian centered at $\theta_{MAP}$: $p(\theta|x^{(1:N)}) \approx \mathcal{N}(\mu, S)$

where:

  • $\mu = \theta_{MAP}$
  • $S^{-1}$ is the negative Hessian at the MAP, i.e. block-wise $N\, \mathrm{Cov}_{p_c(X|\theta_c)}[\phi(X)]$ and $\sum_{i=1}^{N} \mathrm{Cov}_{p_{cf}(x|X^{(i)},\theta_{cf})\, q_i(X^{(i)})}[\psi(x,X^{(i)})]$
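A minimal 1D sketch of the Laplace step: climb to θ_MAP, then set the Gaussian variance S from the curvature of the log-posterior there. The log-posterior below is a toy stand-in for L(θ) + log p(θ), and the derivatives are taken by finite differences for brevity.

```python
import numpy as np

def laplace_1d(log_post, theta0=0.0, lr=0.1, steps=500, h=1e-4):
    """Laplace approximation N(mu, S) of a 1D posterior:
    mu = theta_MAP (crude gradient ascent), S = -1 / (d^2/dtheta^2 log_post at mu)."""
    grad = lambda t: (log_post(t + h) - log_post(t - h)) / (2 * h)
    theta = theta0
    for _ in range(steps):
        theta += lr * grad(theta)
    curv = (log_post(theta + h) - 2 * log_post(theta) + log_post(theta - h)) / h**2
    return theta, -1.0 / curv

# Toy, slightly skewed log-posterior (stand-in for L(theta) + log-prior).
log_post = lambda t: -0.5 * (t - 1.0) ** 2 - 0.1 * t ** 4
mu, S = laplace_1d(log_post)
print(mu, S)   # Gaussian N(mu, S) matching the posterior mode and local curvature
```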

SLIDE 20

Adaptive feature learning

Which feature functions φ(X) should one use?

CG potential: $U_c(X) = \theta_c^T \phi(X) \;\rightarrow\; p_c(X) \propto e^{U_c(X)}$

  • Option 1: Use as many as possible, in combination with a sparsity-enforcing prior [Schöberl et al., JCP 2017]. Suitable when the X have a clear, physical meaning.
  • Option 2: Consider a parametrized family $\Phi_z = \{\phi(X;z)\}$ and greedily add its best member (i.e. optimize z). Suppose $U_c(X) = \theta_c^T \phi(X) + \theta_{c,\mathrm{new}}\, \phi(X;z)$. Then the candidate giving the largest expected decrease in $\mathrm{KL}(p_f(x)\,\|\,\bar{p}_f(x))$ is

$$\arg\max_z \left( \sum_{i=1}^{N} \big( \langle \phi(X;z) \rangle_{q_i(X)} - \langle \phi(X;z) \rangle_{p_c(X)} \big) \right)^2$$
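A sketch of this greedy selection for scalar X: score each candidate z by the squared total mismatch between the candidate feature's average under the q_i and under the current p_c, then keep the arg max. The Gaussian-RBF candidate family matches the choice on the next slides; the samplers and candidate grid are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

def greedy_score(phi_z, q_samples, pc_samples):
    """Score(z) = ( sum_i ( <phi(X;z)>_{q_i} - <phi(X;z)>_{p_c} ) )^2, via MC."""
    pc_mean = np.mean(phi_z(pc_samples))
    return np.sum([np.mean(phi_z(Xi)) - pc_mean for Xi in q_samples]) ** 2

# Toy 1D mismatch: the q_i concentrate around X ~ 1, but p_c is a standard normal.
q_samples = [rng.normal(loc=mi, scale=0.1, size=200) for mi in rng.normal(loc=1.0, size=30)]
pc_samples = rng.normal(size=5000)
candidates = [(2.0, X0) for X0 in np.linspace(-2.0, 2.0, 21)]   # z = (tau, X0) grid
scores = [greedy_score(lambda X, t=t, X0=X0: np.exp(-t * (X - X0) ** 2),
                       q_samples, pc_samples) for (t, X0) in candidates]
print(candidates[int(np.argmax(scores))])   # an RBF centered where p_c under-represents the q_i
```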

SLIDE 23

Numerical Illustrations - Alanine Dipeptide

Figure: Alanine Dipeptide [Bonomi et al., CPC, 2009]

SLIDE 24

Numerical Illustrations - Alanine Dipeptide

Figure: Ramachandran plot for Alanine Dipeptide with respect to the dihedral angles φ, ψ.

SLIDE 25

Numerical Illustrations - Alanine Dipeptide

Proposed Global Model

  • Coarse-Grained model:

$$p_c(X) \propto e^{-\beta U_c(X)}, \qquad U_c(X) = \theta_c^T \phi(X)$$

We assume:

  • $X \in [0,1]^{n_X}$, $n_X = \dim(X)$
  • radial basis functions $\phi(X;z) = e^{-\sum_{k=1}^{n_X} \tau_k (X_k - X_{0,k})^2}$, where $z = \{\tau_k, X_{0,k}\}_{k=1}^{n_X}$
  • Coarse-to-Fine map:

$$p_{cf}(x|X) = \mathcal{N}(\mu + WX,\, S), \qquad \theta_{cf} = \{\mu, W, S\}$$
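A sketch of both ingredients of this global model: evaluating the CG potential with Gaussian RBF features, and drawing a fine-scale configuration through the linear-Gaussian coarse-to-fine map. All dimensions and parameter values are illustrative (e.g. `n_x = 66` is only a stand-in for the fine-scale Cartesian DOFs of the peptide).

```python
import numpy as np

rng = np.random.default_rng(6)
n_X, n_x, L = 2, 66, 5                           # illustrative dimensions and feature count

def rbf_features(X, taus, X0s):
    """phi_l(X) = exp(-sum_k tau_{l,k} (X_k - X0_{l,k})^2), one value per feature l."""
    return np.exp(-np.sum(taus * (X[None, :] - X0s) ** 2, axis=1))

taus = np.full((L, n_X), 10.0)                   # RBF scales tau_k (illustrative)
X0s = rng.uniform(size=(L, n_X))                 # RBF centers X_{0,k} in [0, 1]^{n_X}
theta_c = rng.normal(size=L)                     # CG coefficients

X = rng.uniform(size=n_X)                        # a coarse configuration in [0, 1]^{n_X}
U_c = theta_c @ rbf_features(X, taus, X0s)       # U_c(X) = theta_c^T phi(X)

# Coarse-to-fine map p_cf(x|X) = N(mu + W X, S), here with S = s2 * I:
mu, W, s2 = np.zeros(n_x), rng.normal(size=(n_x, n_X)), 0.01
x = mu + W @ X + np.sqrt(s2) * rng.normal(size=n_x)
print(U_c, x.shape)
```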

SLIDE 27

Numerical Illustrations - Alanine Dipeptide

Figure: L = 1. $U_c = \sum_{l=1}^{L} \theta_{c,l}\, \phi_l(X)$, $\dim(X) = 2$

SLIDE 28

Numerical Illustrations - Alanine Dipeptide

Figure: L = 2. $U_c = \sum_{l=1}^{L} \theta_{c,l}\, \phi_l(X)$, $\dim(X) = 2$

SLIDE 29

Numerical Illustrations - Alanine Dipeptide

Figure: L = 3. $U_c = \sum_{l=1}^{L} \theta_{c,l}\, \phi_l(X)$, $\dim(X) = 2$

SLIDE 30

Numerical Illustrations - Alanine Dipeptide

Figure: L = 5. $U_c = \sum_{l=1}^{L} \theta_{c,l}\, \phi_l(X)$, $\dim(X) = 2$

SLIDE 31

Numerical Illustrations - Alanine Dipeptide

Figure: L = 10. $U_c = \sum_{l=1}^{L} \theta_{c,l}\, \phi_l(X)$, $\dim(X) = 2$

SLIDE 32

Numerical Illustrations - Alanine Dipeptide

Figure: L = 20. $U_c = \sum_{l=1}^{L} \theta_{c,l}\, \phi_l(X)$, $\dim(X) = 2$

SLIDE 33

Numerical Illustrations - Alanine Dipeptide

Figure: L = 26. $U_c = \sum_{l=1}^{L} \theta_{c,l}\, \phi_l(X)$, $\dim(X) = 2$

SLIDE 34

Numerical Illustrations - Alanine Dipeptide

Figure: Visualization in the (latent) CG-variable space X (axes $X_0$, $X_1$): samples $X^{(i)} \sim q(X^{(i)}|x^{(i)},\theta)$, with the α, β-1, and β-2 conformations indicated.

SLIDE 35

Numerical Illustrations - Alanine Dipeptide

Probabilistic Predictions of Macroscopic Properties

Figure: Root-mean-square deviation (RMSD) from an α-helical conformation

SLIDE 37

Conclusions

Summary

  • A generative probabilistic model is proposed.
  • It consists of a CG density and a probabilistic coarse → fine map.
  • It can account for the information loss due to coarse-graining.
  • It can quantify predictive uncertainty in fine-scale observables.
  • It can be used for model selection.

Outlook

  • Explore alternative definitions of the coarse variables X and alternative coarse → fine maps $p_{cf}$, e.g.:
      • Discrete states indicating free-energy wells
      • Hierarchical coarse-graining:

$$\bar{p}_f(x) = \int p_{cf}(x|X_1)\, p_c(X_1|X_2)\, p_c(X_2|X_3) \cdots p_c(X_M)\, dX_1 \cdots dX_M$$

  • Fully Bayesian or Variational Bayesian treatment