

SLIDE 1

Parallel tempering and Interacting MCMC algorithms: Adaptive Equi-Energy sampler

Parallel tempering and Interacting MCMC algorithms

Gersende FORT / Eric MOULINES

Telecom ParisTech, CNRS-LTCI

SLIDE 2

Part II: Adaptive Equi-Energy samplers

Joint work with Amandine Schreck, Aurélien Garivier and Eric Moulines, LTCI, Telecom ParisTech & CNRS, France.

SLIDE 3

From Parallel Tempering to Interacting Tempering

◮ The Equi-Energy sampler (Kou et al., 2006) is an example of an Interacting Tempering algorithm.

◮ The idea is to replace an instantaneous swap by an interaction with the whole past of a neighboring process on the temperature ladder.

SLIDE 4

From Parallel Tempering to Interacting Tempering

◮ The Equi-Energy sampler (Kou et al., 2006) is an example of an Interacting Tempering algorithm.

◮ The idea is to replace an instantaneous swap by an interaction with the whole past of a neighboring process on the temperature ladder.

Equi-Energy sampler (Kou et al., 2006)

◮ Define K processes X^(k) = {X^(k)_n, n ≥ 0}, from X^(1) (hot temperature) to X^(K) (the target process).

◮ Algorithm: given the previous level X^(k−1)_{1:n−1} and the current point X^(k)_{n−1}, define X^(k)_n as follows:

  ◮ (MCMC step / local moves) with probability ε, draw X^(k)_n ∼ P^(k)(X^(k)_{n−1}, ·), with P^(k) s.t. π^(k) P^(k) = π^(k);

  ◮ (Interaction step / global moves) otherwise,
    (i) select a point among the set {X^(k−1)_{1:n−1}} with the same energy level as X^(k)_{n−1};
    (ii) accept or reject it via an acceptance-rejection ratio.
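The two moves above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes a one-dimensional state, a user-supplied local kernel `mcmc_kernel` leaving π^{1/T_k} invariant, and the standard equi-energy acceptance ratio 1 ∧ (π(y)/π(x))^{1/T_k − 1/T_{k−1}}.

```python
import numpy as np

def ee_step(x_prev, lower_chain, H, log_pi, inv_Tk, inv_Tkm1,
            mcmc_kernel, eps, rng):
    """One step of level k of the Equi-Energy sampler (sketch).

    x_prev      : current point X^(k)_{n-1}
    lower_chain : past points X^(k-1)_{1:n-1} of the neighboring level
    H           : increasing energy boundaries H_1 < ... < H_L
    log_pi      : log of the (untempered) target pi, up to a constant
    inv_Tk, inv_Tkm1 : inverse temperatures 1/T_k and 1/T_{k-1}
    mcmc_kernel : local move x -> x' leaving pi^(1/T_k) invariant
    eps         : probability of a local MCMC move
    """
    if rng.random() < eps:                       # MCMC step / local move
        return mcmc_kernel(x_prev)
    # interaction step / global move:
    # (i) select a past point of level k-1 in the same energy ring
    ring = int(np.searchsorted(H, -log_pi(x_prev)))
    same_ring = [y for y in lower_chain
                 if int(np.searchsorted(H, -log_pi(y))) == ring]
    if not same_ring:                            # empty ring: keep the point
        return x_prev
    y = same_ring[rng.integers(len(same_ring))]
    # (ii) acceptance-rejection: 1 ∧ (pi(y)/pi(x))^(1/T_k - 1/T_{k-1})
    log_alpha = (inv_Tk - inv_Tkm1) * (log_pi(y) - log_pi(x_prev))
    return y if np.log(rng.random()) < log_alpha else x_prev
```

Iterating `ee_step` while appending each level's draws to its chain reproduces the interaction with the whole past of the neighboring process.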

SLIDE 5

Numerical application: on the interest of EE

[Figure: target density, a mixture of two-dimensional Gaussians; draws and means of the components]

◮ target density: π = (1/20) Σ_{i=1}^{20} N_2(µ_i, Σ_i)
◮ K processes with target distributions π^{1/T_k} (T_K = 1)

[Figures: draws and means of the components for the target density at temperatures 1 to 5, and for plain Hastings-Metropolis]

SLIDE 6

“Design parameters” of the EE sampler

  • 1. How to choose the probability of interaction ε?
  • 2. How many temperatures, and which ones?
  • 3. How many energy levels, and which ones?

Despite many convergence analyses (on EE with no selection):

◮ ergodicity: lim_n E[h(X^(K)_n)] = π(h)

◮ law of large numbers: (1/n) Σ_{j=1}^{n} h(X^(K)_j) → π(h) in probability or a.s.

◮ CLT: (1/√n) Σ_{j=1}^{n} {h(X^(K)_j) − π(h)} →_D N(0, σ²)

see e.g. Kou, Zhou, Wong (2006); Atchadé (2010); Andrieu, Jasra, Doucet, Del Moral (2011); Fort, Moulines, Priouret (2012); Fort, Moulines, Priouret, Vandekerkhove (2012). These problems are still open.

SLIDE 7

“Design parameters” of the EE sampler

  • 1. How to choose the probability of interaction ε?
  • 2. How many temperatures, and which ones?
  • 3. How many energy levels, and which ones?

◮ In the original EE: energy rings = strata in the range of the energy H of the target π, where π(x) = exp(−H(x)). Choose H_i s.t. min H < H_1 < · · · < H_L. Energy Ring #i = {x : H(x) ∈ [H_{i−1}, H_i]}

◮ Our contribution: tune adaptively the boundaries of the strata
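The stratification above amounts to a one-line lookup. A hypothetical helper, assuming the energy is available as H(x) = −log π(x):

```python
import numpy as np

def energy_ring(x, boundaries, energy):
    """Index of the energy ring containing x (sketch).

    boundaries : increasing thresholds [H_1, ..., H_L]
    energy     : function x -> H(x) = -log pi(x)
    Rings are indexed 0..L: ring i collects points with
    H_i <= H(x) < H_{i+1}, with the conventions H_0 = -inf
    and H_{L+1} = +inf.
    """
    return int(np.searchsorted(boundaries, energy(x), side="right"))

# e.g. with H(x) = x^2 and boundaries [1, 4, 9]:
# energy_ring(0.5, [1, 4, 9], lambda x: x**2) -> 0
# energy_ring(2.5, [1, 4, 9], lambda x: x**2) -> 2
```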

SLIDE 8

Num. Appl.: fixed boundaries vs adapted boundaries

◮ Target distribution on R^6:

π = (1/2) N_6(µ, 0.3 Id) + (1/2) N_6(−µ, 0.2 Id), with µ = [2, · · · , 2]

◮ We compare Hastings-Metropolis (HM) with the EE sampler and the Adaptive EE sampler, both applied with 3 temperatures and 11 strata.

◮ The last plot is the 2-d projection (u^T X, v^T X) with u^T ∝ [1, 1, · · · , 1] and v^T ∝ [1, 1, 1, −1, −1, −1].

SLIDE 9

Behavior along one path: HM, EE, A-EE

[Top] L1 error when estimating the means, (1/6) Σ_{i=1}^{6} |(1/n) Σ_{j=1}^{n} X^(K)_{j,i} − E_π[X_i]|, for MH, EES and SA-AEES.

[Bottom left] Time spent in the mode where the path is initialized, for EES and AEES.

[Bottom right] Probability of being in some ellipsoids, for the first mode (solid line) and the second one (dashed line); the true probability is 0.05.

SLIDE 10

Behavior over 50 independent runs: HM, EE, A-EE

[Top] L1 error when estimating the means, (1/6) Σ_{i=1}^{6} |(1/n) Σ_{j=1}^{n} X^(K)_{j,i} − E_π[X_i]|, for HM (red), EES (black) and AEES (blue).

[Bottom left] Percentage of time spent in the first component (the mode where the path is initialized), for EES (black) and AEES (blue).

[Bottom right] Probability of being in the left ellipsoid (first mode), for EES (black) and AEES (blue).

SLIDE 11

Adaptive tuning of the boundaries of the energy rings

→ How to define the boundaries H_1, · · · , H_L of the energy rings?

Algorithm:

◮ Level 1 (hot level)
  ◮ Sample X^(1) with target π^{1/T_1} (MCMC).
  ◮ At each time n, update the boundaries H^(1)_{n,1}, · · · , H^(1)_{n,L}, computed from X^(1)_{1:n}.

◮ Level 2
  ◮ Sample X^(2) (MCMC step and interaction step) with target π^{1/T_2}. For the interaction step, use the boundaries H^(1)_{•}.
  ◮ At each time n, update the boundaries H^(2)_{n,1}, · · · , H^(2)_{n,L}, computed from X^(2)_{1:n}.

◮ Repeat until level K.
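A compressed sketch of this level-by-level scheme. The interaction step is elided here to keep the focus on the boundary updates, and the random-walk Metropolis kernel and quantile-based update rule are illustrative assumptions:

```python
import numpy as np

def run_adaptive_levels(n, K, log_pi, inv_T, step, L, rng):
    """Adaptive boundary updates, level by level (sketch).

    Each level k samples its tempered target pi^(1/T_k) with a
    random-walk Metropolis kernel and, at each time n, recomputes its
    boundaries H^(k)_{n,1..L} as empirical quantiles of the observed
    energies -log pi(X^(k)_{1:n}).  In the full algorithm the
    boundaries of level k-1 feed the interaction step of level k
    (omitted in this sketch).
    """
    all_bounds = []
    for k in range(K):
        x, energies = 0.0, []
        bounds = None
        for _ in range(n):
            prop = x + step * rng.standard_normal()
            if np.log(rng.random()) < inv_T[k] * (log_pi(prop) - log_pi(x)):
                x = prop                     # local Metropolis move
            energies.append(-log_pi(x))
            # update H^(k)_{n,.} from the whole past X^(k)_{1:n}
            bounds = np.quantile(energies,
                                 [(j + 1) / (L + 1) for j in range(L)])
        all_bounds.append(bounds)
    return all_bounds
```

Recomputing the quantiles from the whole past at every step is deliberately naive; the next slides discuss cheaper estimators.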

SLIDE 12

On the convergence of such adaptive schemes

Convergence result: we prove ergodicity and a strong law of large numbers for A-EE. Our approach for the proof is by induction:

◮ we assume the process X^(k−1) "converges";
◮ we prove that the process X^(k) has the same convergence properties;
◮ repeat from level 1 to K.

Tools for the proof:

◮ the conditional distribution L(X^(k)_n | past^(1:k)_{n−1}) is P^(k)_{θ_{n−1}}(X^(k)_{n−1}, ·), where

P^(k)_θ(x, dy) = ε P^(k)(x, dy) + (1 − ε) K^(k)_θ(x, dy)

K^(k)_θ(x, A) = ∫_A α^(k)_θ(x, y) g_θ(x, y) θ(dy) / ∫ g_θ(x, z) θ(dz) + δ_x(A) ∫ {1 − α^(k)_θ(x, y)} g_θ(x, y) θ(dy) / ∫ g_θ(x, z) θ(dz)

θ_n = (1/n) Σ_{j=1}^{n} δ_{X^(k−1)_j}

α^(k)_θ(x, y) = 1 ∧ [π^{1/T_k − 1/T_{k−1}}(y) / π^{1/T_k − 1/T_{k−1}}(x)] · [∫ g_θ(x, z) θ(dz) / ∫ g_θ(y, z) θ(dz)]

g_θ(x, y) = "x and y are in the same energy ring, with boundaries defined by H^(k−1)_{n,•}"; e.g. g_θ(x, y) = 1 if x and y are in the same energy ring, and 0 otherwise.

SLIDE 13

On the convergence of such adaptive schemes

Convergence result: we prove ergodicity and a strong law of large numbers for A-EE. Our approach for the proof is by induction:

◮ we assume the process X^(k−1) "converges";
◮ we prove that the process X^(k) has the same convergence properties;
◮ repeat from level 1 to K.

Tools for the proof:

◮ the conditional distribution L(X^(k)_n | past^(1:k)_{n−1}) is P^(k)_{θ_{n−1}}(X^(k)_{n−1}, ·);

◮ containment and diminishing adaptation conditions, extending the pioneering work of Roberts, Rosenthal (2005), plus the Poisson equation and limit theorems for martingales;

◮ conditions on the adapted boundaries:

(i) there exists β > 0 s.t. lim_n n^β |H^(k)_{n,•} − H^(k)_{n−1,•}| = 0 w.p.1;
(ii) H^(k)_{n,•} → H^(k)_{∞,•} w.p.1 when n → ∞;
(iii) assumption on the limiting boundaries: inf_x ∫ g^(k)_∞(x, y) π^{1/T_k}(dy) > 0.

SLIDE 14

Example of adaptive boundaries

Example of adaptive boundaries: choose exp(−H^(k)_i) for 1 ≤ i ≤ L (computed from X^(k)) as the quantiles of order i/(L + 1) of the distribution of π(Z) when Z ∼ π^{1/T_k}.
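This quantile rule translates directly into code. A hypothetical helper, assuming `draws` are samples from the tempered target π^{1/T_k} and working on log π(Z) (quantiles commute with monotone maps):

```python
import numpy as np

def quantile_boundaries(draws, log_pi, L):
    """Boundaries H_1 < ... < H_L such that exp(-H_i) is the empirical
    quantile of order i/(L+1) of pi(Z) for Z ~ pi^(1/Tk) (sketch)."""
    log_vals = np.array([log_pi(z) for z in draws])   # log pi(Z)
    orders = [(i + 1) / (L + 1) for i in range(L)]
    # H_i = -log(quantile of order i/(L+1)); reversing the ascending
    # quantiles makes the energy boundaries increasing
    return -np.quantile(log_vals, orders)[::-1]
```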

SLIDE 15

Example of adaptive boundaries

Example of adaptive boundaries: choose exp(−H^(k)_{n,i}) for 1 ≤ i ≤ L (computed from X^(k)_{1:n}) as an estimator of the quantiles of order i/(L + 1) of the distribution of π(Z) when Z ∼ π^{1/T_k}.

SLIDE 16

Example of adaptive boundaries

Example of adaptive boundaries: choose exp(−H^(k)_{n,i}) for 1 ≤ i ≤ L (computed from X^(k)_{1:n}) as an estimator of the quantiles of order i/(L + 1) of the distribution of π(Z) when Z ∼ π^{1/T_k}.

Note that in EE, when using the interacting step to sample X^(k)_n:

◮ determine the ring such that H_{i−1} ≤ −log π(X^(k)_{n−1}) ≤ H_i;

◮ choose (at random) one point among X^(k−1)_1, · · · , X^(k−1)_{n−1} such that exp(−H_i) ≤ π(X^(k−1)_•) ≤ exp(−H_{i−1}), and accept / reject;

◮ upon convergence: L(X^(k−1)_n) → π^{1/T_{k−1}} when n → ∞.

SLIDE 17

Quantile estimators

1) A first estimator is based on inversion of the empirical cdf

F^(k)_n(h) = (1/n) Σ_{j=1}^{n} 1{π(X^(k)_j) ≤ h}

(+) easy implementation
(−) time consuming

2) A second one is based on Stochastic Approximation procedures:

H^(k)_{n+1,•} = H^(k)_{n,•} + γ_{n+1} Ξ(X^(k)_{n+1}, H^(k)_{n,•})

(+) running time
(−) implementation of the SA algorithm (choice of the step-size, initialization)
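The two estimators can be compared on a toy stream of draws. A sketch with an illustrative step-size γ_n = n^{−0.7}; the mean-field increment Ξ below is the usual quantile-tracking choice q − 1{X ≤ H}, an assumption rather than the authors' exact specification:

```python
import numpy as np

rng = np.random.default_rng(1)
draws = rng.standard_normal(20000)   # stand-in for the chain's energies
q = 0.25                             # target quantile order i/(L+1)

# 1) inversion of the empirical cdf: exact on the stored past, but
#    requires keeping (and re-sorting) the whole history
h_cdf = np.quantile(draws, q)

# 2) stochastic approximation: one O(1) update per new draw,
#    H_{n+1} = H_n + gamma_{n+1} * (q - 1{X_{n+1} <= H_n})
h_sa = 0.0
for n, x in enumerate(draws, start=1):
    h_sa += n ** -0.7 * (q - (x <= h_sa))

# both estimators track the same quantile of the draw distribution
print(h_cdf, h_sa)
```

The trade-off matches the slide: the cdf inversion is accurate but costly, the SA recursion is cheap per step but sensitive to the step-size and initialization.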

SLIDE 18

Num. Appl.: Adaptive EE

[Left] True density (mixture of Gaussians, equal weights). [Right] Adaptive EE: frequency of visits to each component of the mixture. Boxplots over 50 independent runs.

SLIDE 19

Num. Appl.: Motif discovery in DNA sequence

Same model as in Dawn's talk yesterday:

◮ a background sequence, with a (known) Markovian transition;
◮ motifs of known length, with independent (unknown) multinomial transitions.

Here is the result for A-EE and EE.

SLIDE 20

Conclusion

◮ EE depends on many design parameters that all play a role in the efficiency of the sampler. We propose an adaptive procedure to tune the energy rings on the fly.

◮ Convergence results are established* when the quantiles are estimated by inversion of the cdf.

◮ Work in progress: convergence when the quantiles are estimated by a Stochastic Approximation procedure. Challenging: convergence of SA algorithms when the draws are not Markovian (thanks to M. Vihola).

◮ First convergence results on EE with selection of the auxiliary point during the interaction step.

* Submitted, available at http://perso.telecom-paristech.fr/ schreck