Nested sampling with demons

SLIDE 1

Nested sampling with demons

Michael Habeck

Max Planck Institute for Biophysical Chemistry and Institute for Mathematical Stochastics Göttingen, Germany

Amboise, September 23, 2014

SLIDE 2

Bayesian inference

  • Probability rules

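$$
\begin{aligned}
\text{posterior} \times \text{evidence} &= \text{likelihood} \times \text{prior} \\
\Pr(\theta \mid D, M) \times \Pr(D \mid M) &= \Pr(D \mid \theta, M) \times \Pr(\theta \mid M) \\
p(\theta) \times Z &= L(\theta) \times \pi(\theta)
\end{aligned}
$$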

  • Inference
  • Evidence

Z = ∫ L(θ) π(θ) dθ

  • Posterior

$$
p(\theta) = \frac{1}{Z}\, L(\theta)\, \pi(\theta)
$$
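As a minimal sketch of these rules, the evidence and the posterior can be evaluated on a grid for a one-parameter toy problem. The coin-flip data, the uniform prior and all variable names are illustrative assumptions and do not come from the talk.

```python
import numpy as np

# Hypothetical toy problem: 7 heads in 10 coin flips, uniform prior on theta in [0, 1].
heads, flips = 7, 10
theta = np.linspace(0.0, 1.0, 10001)
dtheta = theta[1] - theta[0]

prior = np.ones_like(theta)                                # pi(theta)
likelihood = theta**heads * (1 - theta)**(flips - heads)   # L(theta), binomial constant dropped

# Evidence Z = integral of L * pi (trapezoidal rule), posterior p = L * pi / Z
integrand = likelihood * prior
Z = float(np.sum(0.5 * (integrand[1:] + integrand[:-1])) * dtheta)
posterior = integrand / Z
post_mass = float(np.sum(0.5 * (posterior[1:] + posterior[:-1])) * dtheta)
```

For this toy likelihood the evidence has the closed form Z = 7! 3! / 11! = 1/1320, which the grid estimate should reproduce closely.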

SLIDE 3

Nested sampling

  • The evidence reduces to a one-dimensional integral:

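$$
Z = \int L(\theta)\, \pi(\theta)\, d\theta = \int_0^1 L(X)\, dX
$$
summing over prior mass
$$
X(\lambda) = \int_{L(\theta) \ge \lambda} \pi(\theta)\, d\theta, \qquad X(0) = 1, \quad X(\infty) = 0.
$$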

SLIDE 4

Nested sampling

  • Prior masses can be ordered:

X(λ) < X(λ′) if λ > λ′

  • Idea: We can evaluate L exactly and estimate X
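A small numerical illustration of the prior mass and its ordering, under an assumed toy model that is not part of the talk: a two-dimensional standard normal prior with L(θ) = exp(−|θ|²/2). For this choice X(λ) = 1 − λ, so L(X) = 1 − X and Z = 1/2.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy model: prior pi = N(0, I) in two dimensions, L(theta) = exp(-|theta|^2 / 2).
theta = rng.standard_normal((200_000, 2))
L = np.exp(-0.5 * np.sum(theta**2, axis=1))

def prior_mass(lam):
    """Monte Carlo estimate of X(lambda): fraction of prior samples with L >= lambda."""
    return float(np.mean(L >= lam))

lams = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
X_est = np.array([prior_mass(lam) for lam in lams])   # should be close to 1 - lambda

# One-dimensional evidence integral over prior mass, using the analytic L(X) = 1 - X
X_grid = np.linspace(0.0, 1.0, 1001)
L_of_X = 1.0 - X_grid
Z_1d = float(np.sum(0.5 * (L_of_X[1:] + L_of_X[:-1]) * np.diff(X_grid)))
Z_direct = float(np.mean(L))                          # plain Monte Carlo estimate of Z
```

The estimated prior masses decrease as λ increases, and both routes to the evidence agree with the exact value 1/2 up to Monte Carlo error.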
SLIDE 5

Estimation of prior masses

  • Nested sequence of truncated priors:

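$$
p(\theta \mid \lambda) = \frac{\Theta[L(\theta) - \lambda]}{X(\lambda)}\, \pi(\theta)
\quad \text{where} \quad
\Theta(x) = \begin{cases} 0, & x < 0 \\ 1, & x \ge 0 \end{cases}
$$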

  • Distribution of prior masses at likelihood contour λ:

X ∼ Uniform(0, X(λ))

  • Order statistics:

$$
X_{\max} \sim \frac{N\, X_{\max}^{N-1}}{X(\lambda)^{N}}
$$
where Xmax is the maximum of N uniformly distributed Xn ∼ Uniform(0, X(λ))
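The order statistic can be checked by simulation. The sketch below uses illustrative values for N and X(λ) and compares the shrinkage factor t = Xmax / X(λ) with direct draws from Beta(N, 1), whose mean is N/(N + 1).

```python
import numpy as np

rng = np.random.default_rng(1)
N, X_lam, trials = 20, 0.3, 100_000     # illustrative values, not from the talk

# Maximum of N uniform draws on (0, X(lambda)), repeated many times
X_max = rng.uniform(0.0, X_lam, size=(trials, N)).max(axis=1)
t = X_max / X_lam                        # shrinkage factor with density N t^(N-1)

mean_t = float(t.mean())                 # theory: N / (N + 1)
t_direct = rng.beta(N, 1, size=trials)   # direct draws from Beta(N, 1) for comparison
mean_direct = float(t_direct.mean())
```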

SLIDE 6

Nested sampling algorithm

Require: N (ensemble size), m (number of iterations)
λ_0 = 0, X_0 = 1, S = {θ_1, . . . , θ_N} (states) where θ_n ∼ π(θ)    ▷ Initialize
for k = 1, . . . , m do
    θ_k = argmin{L(θ_n) | n = 1, . . . , N}    ▷ Store state with smallest likelihood
    λ_k = L(θ_k)    ▷ Define new likelihood contour
    X_k = t X_{k−1} where t ∼ N t^(N−1)    ▷ Estimate prior mass
    θ ∼ p(θ|λ_k)    ▷ Draw new state from truncated prior
    S ← S \ {θ_k} ∪ {θ}    ▷ Replace current with new state
end for
Z = ∑_{k=1}^{m} λ_k (X_{k−1} − X_k)    ▷ Estimate evidence
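The loop can be sketched in Python as follows. This is a toy illustration, not the speaker's implementation: the truncated prior is sampled by plain rejection from the prior, which is only feasible for small problems, and the test model (2-D standard normal prior, L(θ) = exp(−|θ|²/2), exact Z = 1/2) is an assumption chosen because its evidence is known.

```python
import math
import numpy as np

def nested_sampling(log_like, sample_prior, N=200, m=1200, seed=0):
    """Evidence estimate following the pseudocode: replace the worst of N states m times."""
    rng = np.random.default_rng(seed)
    states = [sample_prior(rng) for _ in range(N)]       # theta_n ~ pi(theta)
    like = np.array([math.exp(log_like(s)) for s in states])
    X_prev, Z = 1.0, 0.0
    for _ in range(m):
        worst = int(np.argmin(like))                     # state with smallest likelihood
        lam = like[worst]                                # new likelihood contour
        X = X_prev * rng.beta(N, 1)                      # t ~ N t^(N-1), i.e. Beta(N, 1)
        Z += lam * (X_prev - X)                          # accumulate evidence
        while True:                                      # rejection sampling from p(theta | lam)
            cand = sample_prior(rng)
            cand_like = math.exp(log_like(cand))
            if cand_like > lam:
                break
        states[worst], like[worst] = cand, cand_like     # replace worst state
        X_prev = X
    return Z

# Assumed toy model: prior N(0, I) in 2-D, L(theta) = exp(-|theta|^2 / 2), exact Z = 1/2
Z_est = nested_sampling(lambda th: -0.5 * float(th @ th),
                        lambda rng: rng.standard_normal(2))
```

With N = 200 the estimate should scatter around 1/2 by a few percent; the prior mass left after m iterations (roughly exp(−m/N)) is neglected.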

SLIDE 7

Compression

  • Compression achieved by a single nested sampling iteration

H(λ → λ′) = ∫ p(θ|λ′) ln[p(θ|λ′)/p(θ|λ)] dθ = ln[X(λ)/X(λ′)]

  • Average compression is constant

$$
\langle H(\lambda \to \lambda') \rangle = -\langle \ln t \rangle_{t \sim \mathrm{Beta}(N,1)} = \frac{1}{N}
$$
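The value 1/N follows from one integration by parts. Since X(λ′) = t X(λ), the compression is H = −ln t, and with the density N t^(N−1) of the shrinkage factor:

```latex
\begin{aligned}
\langle -\ln t \rangle
  &= -\int_0^1 \ln t \; N t^{N-1} \, dt \\
  &= -\Big[ t^{N} \ln t \Big]_0^1 + \int_0^1 t^{N-1} \, dt \\
  &= 0 + \frac{1}{N} .
\end{aligned}
```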

SLIDE 8

Physical analogy

| Bayesian inference | Statistical physics |
| model parameters θ | configuration/microstate θ |
| negative log likelihood $-\ln L(\theta)$ | potential energy $E(\theta)$ |
| log prior mass $\ln X(\lambda)$ | volume entropy $S(\epsilon) = \ln X(\epsilon)$ |
| likelihood contour $L(\theta) > \lambda$ | energy bound $E(\theta) < \epsilon$ |
| prior mass $X(\lambda)$ | cumulative DOS $X(\epsilon) = \int_{-\infty}^{\epsilon} g(E)\, dE$ |
| evidence $Z = \int L(X)\, dX$ | partition function $Z(\beta) = \int e^{-\beta E} g(E)\, dE$ |
| truncated prior | microcanonical ensemble |

SLIDE 9

Microcanonical ensemble

  • Density of states (DOS)

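$$
g(E) = \int \delta[E - E(\theta)]\, \pi(\theta)\, d\theta = \partial_E X(E)
$$

  • Microcanonical entropy and temperature:

$$
S(E) = \ln X(E), \qquad T(E) = 1 / \partial_E S(E)
$$

  • Compression:

$$
H(\epsilon' \to \epsilon) = S(\epsilon') - S(\epsilon) = \int_{\epsilon}^{\epsilon'} \beta(E)\, dE
$$
where the inverse temperature β = 1/T measures the entropy production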

SLIDE 10

Enter the demon

  • Implement truncated prior as microcanonical ensemble with additional demon absorbing energy D:

$$
p(\theta, D \mid \epsilon) = \frac{1}{X(\epsilon)}\, \delta[\epsilon - D - E(\theta)]\, \Theta(D)\, \pi(\theta)
$$
and explore constant energy shells

  • Creutz algorithm:

Require: ϵ (upper bound on total energy)
θ ∼ π(θ) with energy E = E(θ) ≤ ϵ, D = ϵ − E    ▷ Initialize
while not converged do
    θ′ ∼ π(θ) with energy E′ = E(θ′)    ▷ Generate a candidate
    D′ = D − ∆E where ∆E = E′ − E    ▷ Update demon’s state
    if D′ ≥ 0 then
        (θ, D) ← (θ′, D′)    ▷ Accept
    end if
end while
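A minimal sketch of a single-demon update for a small Ising lattice. Several choices here are assumptions made for illustration: candidates are single spin flips (symmetric under the uniform prior) rather than independent prior draws, the energy uses the ferromagnetic convention E = −∑ θiθj with periodic boundaries, and the lattice size and energy bound are arbitrary.

```python
import numpy as np

def creutz_ising(n=16, eps=-300.0, sweeps=200, seed=0):
    """Demon dynamics at fixed total energy eps = E(theta) + D with D >= 0."""
    rng = np.random.default_rng(seed)
    spins = np.ones((n, n), dtype=int)      # start in the ground state, E = -2 n^2 <= eps
    E = -2 * n * n
    D = eps - E                             # demon absorbs the excess energy
    demon_trace = []
    for _ in range(sweeps * n * n):
        i, j = rng.integers(n, size=2)      # pick a random site
        nb = (spins[(i + 1) % n, j] + spins[(i - 1) % n, j]
              + spins[i, (j + 1) % n] + spins[i, (j - 1) % n])
        dE = 2 * spins[i, j] * nb           # energy change if spin (i, j) is flipped
        if D - dE >= 0:                     # accept only if the demon stays non-negative
            spins[i, j] *= -1
            E += dE
            D -= dE
        demon_trace.append(D)
    return spins, E, D, np.array(demon_trace)

spins, E, D, demon_trace = creutz_ising()
# Recompute the energy from scratch, counting every bond once
E_check = -int(np.sum(spins * (np.roll(spins, 1, axis=0) + np.roll(spins, 1, axis=1))))
```

The total energy E + D stays pinned at ϵ throughout, while the demon's energy fluctuates.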

SLIDE 11

Sampling the Ising model with a single demon

[Figure, panel A: Gibbs entropy S_G(E) versus energy E, with the estimated ln X_k]

  • Nearest-neighbor interaction on a 64 × 64 lattice: $E(\theta) = \sum_{\langle i,j \rangle} \theta_i \theta_j$ where $\theta_i = \pm 1$

  • Nested sampling provides a very accurate estimate of the volume entropy S = ln X

SLIDE 12

Sampling the Ising model with a single demon

[Figure, panel B: inverse temperature β_G(E) versus energy E; legend entries: heat capacity, ln(1 + √2)/2]

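  • $H(\epsilon' \to \epsilon) = S(\epsilon') - S(\epsilon) = \int_{\epsilon}^{\epsilon'} \beta(E)\, dE$
  • $\langle H(\epsilon' \to \epsilon) \rangle = 1/N$, therefore $\beta(\epsilon_k)\, (\epsilon_k - \epsilon_{k+1}) \approx 1/N$
  • The histogram of energy bounds $\epsilon_k$ matches the inverse temperature / entropy production $\beta(E)$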

SLIDE 13

Sampling the Ising model with a single demon

[Figure, panel C: inverse temperature β_G(E) versus energy E, with the estimated β_B]

  • The demon’s energy distribution is

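$$
p(D \mid \epsilon) = \int p(\theta, D \mid \epsilon)\, d\theta = \frac{g(\epsilon - D)}{X(\epsilon)} \approx \frac{1}{T(\epsilon)} \exp\{-D / T(\epsilon)\}
$$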

  • The demon may serve as a thermometer: D ≈ T
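The exponential form can be made explicit with the definitions g = ∂_E X, S = ln X and β = ∂_E S = 1/T used earlier. The only approximation in this sketch is a first-order expansion of the entropy, which assumes D is small on the scale over which β varies:

```latex
\begin{aligned}
g(E) &= \partial_E\, e^{S(E)} = \beta(E)\, e^{S(E)} , \\
p(D \mid \epsilon) &= \frac{g(\epsilon - D)}{X(\epsilon)}
   = \beta(\epsilon - D)\, \exp\{ S(\epsilon - D) - S(\epsilon) \} \\
  &\approx \beta(\epsilon)\, \exp\{ -\beta(\epsilon)\, D \}
  \qquad \text{using } S(\epsilon - D) \approx S(\epsilon) - \beta(\epsilon)\, D .
\end{aligned}
```

The mean of this exponential distribution is 1/β(ϵ) = T(ϵ), which is the sense in which the demon's energy reads off the temperature.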
SLIDE 14

Properties of nested sampling

Pros:
  1. Nested sampling is a microcanonical approach: energy E is the control parameter rather than the temperature used in thermal approaches
  2. constructs an adaptive “cooling” protocol {ϵ_k}
  3. progresses at constant thermodynamic speed: ∆S ≈ 1/N
  4. provides an estimate of the entropy S

Cons:
  1. Nested sampling requires efficient sampling from

$$
p(\theta \mid \epsilon) = \frac{1}{X(\epsilon)}\, \Theta[\epsilon - E(\theta)]\, \pi(\theta)
= \frac{1}{X(\epsilon)} \int \delta[\epsilon - D - E(\theta)]\, \Theta(D)\, \pi(\theta)\, dD
$$

SLIDE 15

Releasing more demons

  • We would like to preserve nested sampling’s adaptive behavior but be more flexible in terms of the ensemble

  • Idea: introduce more demons in order to smooth the ensemble

$$
p(\theta, D, K \mid \epsilon) = \frac{1}{Y(\epsilon)}\, \delta[\epsilon - D - K - E(\theta)]\, \Theta(D)\, f(K)\, \pi(\theta)
$$
where the prior mass of the compound system is
$$
Y(\epsilon) = \int \Theta(\epsilon - H)\, (f \star g)(H)\, dH
$$
involving the convolution $(f \star g)(H)$

  • Nested sampling tracks Y(ϵ) where ϵ is an upper bound on the total energy H = K + E

SLIDE 16

Releasing more demons

  • Nested sampling estimates the evidence of the extended system

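$$
Z_H = \int e^{-H} (f \star g)(H)\, dH = Z_K Z_E
$$
from which we can obtain the evidence of the original system $Z_E$

  • Marginal distribution of configurations

$$
p(\theta \mid \epsilon) = \int p(\theta, D, K \mid \epsilon)\, dD\, dK = \frac{1}{Y(\epsilon)}\, \pi(\theta)\, F[\epsilon - E(\theta)]
$$
where $F(K) = \int_{-\infty}^{K} f(t)\, dt$ is the cdf of the demon’s energy distribution

  • Sampling (θ, K):

$$
\begin{aligned}
\theta &\sim p(\theta \mid \epsilon) \propto \pi(\theta)\, F[\epsilon - E(\theta)] \\
K &\sim p(K \mid \theta, \epsilon) \propto f(K)\, \Theta[\epsilon - E(\theta) - K]
\end{aligned}
$$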

SLIDE 17

Demonic nested sampling of the ten state Potts model

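Demon: $f(K) \propto \Theta(K_{\max} - K)\, K^{d/2 - 1}$ (d-dimensional harmonic oscillator where d = dimension of configuration space)

$$
p(\theta \mid \epsilon) \propto \Theta[\epsilon - E(\theta)]\, \pi(\theta) \times
\begin{cases}
[\epsilon - E(\theta)]^{d/2}; & \epsilon - E(\theta) \le K_{\max} \\
K_{\max}^{d/2}; & \epsilon - E(\theta) > K_{\max}
\end{cases}
$$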
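For this power-law demon both ingredients of the sampling scheme have closed forms, sketched below. The function names are hypothetical; the inverse-cdf formula K = c u^(2/d) with c = min(ϵ − E(θ), Kmax) follows from F(K) ∝ K^(d/2) on [0, Kmax] and assumes ϵ − E(θ) > 0.

```python
import numpy as np

def demon_cdf_weight(excess, d, K_max):
    """Unnormalized F(eps - E) for f(K) ∝ K^(d/2 - 1) on [0, K_max]; zero if excess < 0."""
    return np.clip(np.asarray(excess, dtype=float), 0.0, K_max) ** (d / 2)

def sample_demon_energy(excess, d, K_max, rng, size=1):
    """Draw K ~ f(K) restricted to 0 <= K <= min(excess, K_max) by inverting the cdf."""
    c = min(excess, K_max)
    u = rng.uniform(size=size)
    return c * u ** (2.0 / d)

rng = np.random.default_rng(2)
d, K_max = 4, 3.0                          # illustrative values
K = sample_demon_energy(excess=5.0, d=d, K_max=K_max, rng=rng, size=100_000)
weights = demon_cdf_weight([-1.0, 2.0, 5.0], d, K_max)
```

The weight is what multiplies π(θ) in the marginal p(θ|ϵ); it grows as [ϵ − E(θ)]^(d/2) until the excess energy reaches Kmax and is constant beyond that.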

[Figure, panel A: energy bounds ϵ_k versus iteration k for standard NS and demonic NS; panel B: axis label “energy E”; panel C: relative accuracy of log Z [%] versus demon capacity Kmax]

SLIDE 18

Nested sampling in phase space

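  • In continuous configuration spaces, it is convenient to unfold the demon and introduce momenta

$$
f(K) = \int \delta[K - K(\xi)]\, d\xi
\quad \text{where} \quad
K(\xi) = \frac{1}{2} \sum_{i=1}^{d} \xi_i^2 \quad \text{(kinetic energy)}
$$

  • The marginal distribution in configuration space is

$$
p(\theta \mid \epsilon) \propto \Theta[\epsilon - E(\theta)]\, \pi(\theta) \times
\begin{cases}
[\epsilon - E(\theta)]^{d/2}; & \epsilon - E(\theta) \le K_{\max} \\
K_{\max}^{d/2}; & \epsilon - E(\theta) > K_{\max}
\end{cases}
$$

  • Hamiltonian dynamics for exploration:

$$
(\theta, \xi) \xrightarrow{\;L\;} (\theta', \xi')
$$
where L is an integrator (e.g. the leapfrog)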
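A minimal leapfrog integrator for H(θ, ξ) = E(θ) + ½|ξ|² is sketched below. The function signature, the step size and the harmonic test energy E(θ) = ½|θ|² are assumptions made for illustration.

```python
import numpy as np

def leapfrog(theta, xi, grad_E, step, n_steps):
    """Integrate Hamilton's equations for H = E(theta) + |xi|^2 / 2 with the leapfrog scheme."""
    theta = np.array(theta, dtype=float)      # copies, so the inputs are left untouched
    xi = np.array(xi, dtype=float)
    xi -= 0.5 * step * grad_E(theta)          # initial half step in momentum
    for _ in range(n_steps - 1):
        theta += step * xi                    # full step in position
        xi -= step * grad_E(theta)            # full step in momentum
    theta += step * xi                        # last position step
    xi -= 0.5 * step * grad_E(theta)          # final half step in momentum
    return theta, xi

# Assumed test system: harmonic energy E(theta) = |theta|^2 / 2, so grad_E(theta) = theta
def total_energy(theta, xi):
    return 0.5 * float(theta @ theta) + 0.5 * float(xi @ xi)

theta0, xi0 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
theta1, xi1 = leapfrog(theta0, xi0, lambda th: th, step=0.01, n_steps=1000)
theta_back, _ = leapfrog(theta1, -xi1, lambda th: th, step=0.01, n_steps=1000)
```

The integrator conserves the total energy up to an error of order step², and flipping the momenta retraces the trajectory. Because the energy is not conserved exactly, an acceptance test against the energy bound is still needed.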

SLIDE 19

Microcanonical Hamiltonian Monte Carlo

  • 2(d + 1)-dimensional phase space: implement demon D as harmonic oscillator with energy $D = (\xi_{d+1}^2 + \theta_{d+1}^2)/2$

  • Require: ϵ (total energy), configuration θ with E(θ) < ϵ

θ_{d+1} = 0    ▷ Initialize demon D
while not converged do
    ξ ∼ N(0, 1)    ▷ Draw momenta from (d + 1)-dim Gaussian
    ξ ← ξ × √(ϵ − E − D) / ∥ξ∥    ▷ Scale momenta so as to match excess energy
    (θ, ξ) →L (θ′, ξ′)    ▷ Run the leapfrog algorithm
    H′ = E(θ′) + K(ξ′)    ▷ Compute total energy of candidate
    if H′ < ϵ then
        θ ← θ′, E ← E(θ)    ▷ Accept
    end if
end while

SLIDE 20

Application to GS peptide

[Figure, panel B: energy E(θ_k) versus iteration k; panel C: RMSD]

  • A: Native structure of the GS peptide
  • B: Evolution of the energy (goodness-of-fit) during nested sampling
  • C: Structure’s accuracy measured by the root-mean-square deviation (RMSD) to the crystal structure

SLIDE 21

Other demons

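Distribution of system’s energy

$$
p(E \mid \epsilon) = \frac{\Theta(\epsilon - E)}{Y(\epsilon)}\, g(E)\, F(\epsilon - E)
$$

| demon | pdf f(K) | cdf F(K) |
| Gauss | $\sqrt{\frac{\beta}{2\pi}}\, e^{-\frac{\beta}{2} K^2}$ | $\frac{1}{2} \left[ 1 + \mathrm{erf}\left( \sqrt{\beta/2}\, K \right) \right]$ |
| Logarithmic oscillator, $K(\xi_1, \xi_2) = \frac{1}{2} \xi_1^2 + \beta^{-1} \ln |\xi_2|$ | $\sqrt{8\pi\beta}\, e^{\beta K}$ | $\sqrt{8\pi/\beta}\, e^{\beta K}$ |
| Fermi | $\beta\, \frac{e^{-\beta K}}{(1 + e^{-\beta K})^2}$ | $\frac{1}{1 + e^{-\beta K}}$ |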
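As a consistency check on the table, each cdf can be differentiated numerically and compared with its pdf. The value of β, the grid and the helper name `max_mismatch` are arbitrary illustrative choices.

```python
import math

beta = 1.5  # illustrative inverse temperature

# (pdf, cdf) pairs as listed in the table
demons = {
    "gauss": (lambda K: math.sqrt(beta / (2 * math.pi)) * math.exp(-0.5 * beta * K * K),
              lambda K: 0.5 * (1 + math.erf(math.sqrt(beta / 2) * K))),
    "log_oscillator": (lambda K: math.sqrt(8 * math.pi * beta) * math.exp(beta * K),
                       lambda K: math.sqrt(8 * math.pi / beta) * math.exp(beta * K)),
    "fermi": (lambda K: beta * math.exp(-beta * K) / (1 + math.exp(-beta * K)) ** 2,
              lambda K: 1 / (1 + math.exp(-beta * K))),
}

def max_mismatch(pdf, cdf, grid, h=1e-5):
    """Largest gap between the central-difference derivative of the cdf and the pdf."""
    return max(abs((cdf(K + h) - cdf(K - h)) / (2 * h) - pdf(K)) for K in grid)

grid = [-2 + 0.1 * i for i in range(41)]   # K from -2 to 2
errors = {name: max_mismatch(pdf, cdf, grid) for name, (pdf, cdf) in demons.items()}
```

All three mismatches should be at the level of finite-difference error, confirming that each F is the antiderivative of the corresponding f.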

SLIDE 22

Application to SH3 domain

  • Structure determination from sparse distance data measured by NMR spectroscopy

  • Structure ensemble as accurate and precise as with parallel tempering
SLIDE 23

Summary

  • Nested sampling is a powerful method to study the microcanonical ensemble

  • By means of demons we can smooth the microcanonical ensemble, which eases the exploration of configuration space

  • All of the desired features of nested sampling are preserved