SLIDE 1

Stochastic approximation for adaptive Markov chain Monte Carlo algorithms

Gersende FORT

LTCI / CNRS - TELECOM ParisTech, France

SLIDE 2

Examples of adaptive MCMC samplers

  • I. Examples of adaptive and interacting MCMC samplers
  • 1. Adaptive Hastings-Metropolis algorithm [Haario et al. 1999]
  • 2. Equi-Energy algorithm [Kou et al. 2006]
  • 3. Wang-Landau algorithm [Wang & Landau, 2001]
SLIDE 3

Adaptive Hastings-Metropolis algorithm

◮ Symmetric Random Walk Hastings-Metropolis algorithm

Goal: sample a Markov chain with stationary distribution π on Rd, where π is known up to a normalizing constant.

Iterative mechanism: given the current sample Xn,

  • propose a move to Xn + Y, with Y ∼ q(·);
  • accept the move with probability α(Xn, Xn + Y) = 1 ∧ π(Xn + Y)/π(Xn) and set Xn+1 = Xn + Y; otherwise, Xn+1 = Xn.
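The mechanism above can be sketched in a few lines (a minimal sketch: the Gaussian increment distribution, the scale `sigma` and the standard-Gaussian example target are illustrative assumptions, not prescribed by the slide):

```python
import numpy as np

def rwm(log_pi, x0, n_iter, sigma, rng):
    """Symmetric random-walk Hastings-Metropolis sampler.

    log_pi: unnormalized log-density of the target pi (the normalizing
    constant cancels in the acceptance ratio).
    """
    d = len(x0)
    chain = np.empty((n_iter + 1, d))
    chain[0] = x0
    n_accept = 0
    for n in range(n_iter):
        x = chain[n]
        y = x + sigma * rng.standard_normal(d)   # propose X_n + Y, Y ~ N(0, sigma^2 I)
        # accept with probability 1 ^ pi(y)/pi(x), computed on the log scale
        if np.log(rng.uniform()) < log_pi(y) - log_pi(x):
            chain[n + 1] = y
            n_accept += 1
        else:
            chain[n + 1] = x
    return chain, n_accept / n_iter

# illustrative target: unnormalized standard Gaussian on R^2
rng = np.random.default_rng(0)
chain, acc = rwm(lambda x: -0.5 * x @ x, np.zeros(2), 20000, sigma=2.4, rng=rng)
```

The choice of `sigma` is exactly the design question the next slides address.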

SLIDE 4

Adaptive Hastings-Metropolis algorithm


Design parameter: how to choose the proposal distribution q? For example, when q(· − x) = Nd(x, θ), how should the proposal be scaled, i.e. how should the covariance matrix θ be chosen?

SLIDE 5

Adaptive Hastings-Metropolis algorithm

The “Goldilocks principle”: a proposal variance that is too small or too large yields poor mixing; a well-chosen variance does better.

[Figure: for three choices of the proposal variance (too small, too large, well scaled), trace plots of the chain and the corresponding autocorrelation diagnostics.]

SLIDE 6

Adaptive Hastings-Metropolis algorithm

◮ Adaptive Hastings-Metropolis algorithm(s)

Based on theoretical results [Gelman et al. 1996; · · · ], when the proposal is Gaussian Nd(x, θ), choose θ as the covariance structure of π [Haario et al. 1999]: θ ∝ Σπ. In practice, Σπ is unknown and this quantity is computed “online” from the past samples of the chain:

θn+1 = n/(n+1) θn + 1/(n+1) ( (Xn+1 − µn+1)(Xn+1 − µn+1)T + κ Idd )

where µn+1 is the empirical mean and κ > 0 prevents a badly scaled matrix.
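This online recursion can be sketched as follows (the function name and the running-mean update are mine; the κ Idd regularization term is from the slide):

```python
import numpy as np

def am_update(theta_n, mu_n, x_new, n, kappa=1e-6):
    """One step of the online recursion
    theta_{n+1} = n/(n+1) * theta_n + 1/(n+1) * (dev dev^T + kappa * I),
    where dev = x_new - mu_{n+1} and mu is the running empirical mean."""
    mu_next = mu_n + (x_new - mu_n) / (n + 1)    # empirical mean update
    dev = x_new - mu_next
    d = len(x_new)
    theta_next = (n * theta_n + np.outer(dev, dev) + kappa * np.eye(d)) / (n + 1)
    return theta_next, mu_next
```

Unrolling the recursion shows that θn is (up to the κ term) the running average of the outer products of the centered samples, so it tracks Σπ.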

SLIDE 7

Adaptive Hastings-Metropolis algorithm


OR choose θ such that the mean acceptance rate converges to α⋆ [Andrieu & Robert 2001]. In practice this θ is unknown, and the parameter is adapted during the run of the algorithm: θn = τn Id with

log τn+1 = log τn + γn+1 (αn+1 − α⋆)

where αn is the mean acceptance rate. OR · · ·
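This log-scale stochastic approximation update can be sketched as follows (hedged: γn+1 = 1/(n+1) is one common step-size choice, the per-iteration acceptance indicator stands in for the acceptance rate since its conditional mean is that rate, and α⋆ = 0.234 is only an illustrative target value):

```python
import math

def tune_log_scale(log_tau, accepted, n, alpha_star=0.234):
    """log tau_{n+1} = log tau_n + gamma_{n+1} (alpha_{n+1} - alpha_star),
    with the acceptance indicator standing in for the acceptance rate."""
    gamma = 1.0 / (n + 1)
    return log_tau + gamma * (float(accepted) - alpha_star)
```

Driving this recursion with any acceptance mechanism whose rate decreases in τ pushes τn toward the scale whose acceptance rate is α⋆.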

SLIDE 8

Adaptive Hastings-Metropolis algorithm

◮ In practice, simultaneous adaptation of the design parameter and simulation. Given the current value of the chain Xn and the design parameter θn:

  • draw the next sample Xn+1 from the transition kernel Pθn(Xn, ·);
  • update the design parameter: θn+1 = Ξn+1(θn, Xn+1, ·).

SLIDE 9

Adaptive Hastings-Metropolis algorithm

◮ In this MCMC context, we are interested in the behavior of the chain {Xn, n ≥ 0}, e.g.:

  • convergence of the marginals: E[f(Xn)] → π(f) for f bounded;
  • law of large numbers: n−1 Σ_{k=1}^n f(Xk) → π(f) (a.s. or in probability);
  • central limit theorem.

But here πPθ = π for any θ: all the transition kernels have the same invariant distribution π, so stability / convergence of the adaptation process {θn, n ≥ 0} is not the main issue.

SLIDE 10

Equi-Energy sampler

◮ Proposed by Kou et al. (2006) for the simulation of a multi-modal density π. How can one define a sampler that allows both local moves, for a local exploration of the density, and large jumps, in order to visit other modes of the target?

SLIDE 11

Equi-Energy sampler

◮ Idea: (a) build an auxiliary process that moves between the modes far more easily, and (b) define the process of interest by running a “classical” MCMC algorithm and, sometimes, choosing a value of the auxiliary process as the new value of the process of interest: draw a point at random + acceptance-rejection mechanism.

SLIDE 12

Equi-Energy sampler

How to define such an auxiliary process? Answer: as a process with stationary distribution πβ (β ∈ (0, 1)), a tempered version of the target π.

SLIDE 13

Equi-Energy sampler

◮ An example: a K-stage Equi-Energy sampler.

[Figure: target density (a mixture of two-dimensional Gaussians), with draws and the means of the components.]

Target density: π ∝ Σ_{i=1}^{20} N2(µi, Σi).

K auxiliary processes, with targets π^{1/Ti}, where T1 > T2 > · · · > TK+1 = 1.

[Figure: draws and component means for the target density at temperatures 1 through 5, and for the plain Hastings-Metropolis sampler.]

SLIDE 14

Equi-Energy sampler

◮ Algorithm (2 stages). Repeat:

  • Update the adaptation process

θn = (1/n) Σ_{k=0}^{n−1} δ_{Yk}

where {Yn, n ≥ 0} is the auxiliary process with stationary distribution πβ.

  • Update the process of interest with transition Xn+1 ∼ Pθn(Xn, ·), where

Pθn(x, A) = (1 − ǫ) P(x, A) + ǫ { ∫_A α(x, y) θn(dy) + δx(A) ∫ (1 − α(x, y)) θn(dy) }

and α(x, y) is the accept/reject mechanism.
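One transition of this kernel can be sketched as follows (the empirical measure θn is stored as the list of past auxiliary draws, so sampling from θn means picking one of them uniformly; the specific ratio α(x, y) = 1 ∧ (π(y)/π(x))^{1−β} is the standard equi-energy choice and is my assumption, since the slide leaves α implicit):

```python
import numpy as np

def ee_step(x, aux_samples, log_pi, beta, eps, local_step, rng):
    """One transition of the 2-stage sampler: with probability 1-eps take a
    local MH move P(x, .); with probability eps propose a draw from theta_n
    (the empirical measure of the auxiliary chain) and accept/reject it."""
    if rng.uniform() > eps or not aux_samples:
        return local_step(x)                              # local kernel P(x, .)
    y = aux_samples[rng.integers(len(aux_samples))]       # draw from theta_n
    # assumed equi-energy ratio: 1 ^ (pi(y)/pi(x))**(1 - beta), on the log scale
    log_alpha = (1.0 - beta) * (log_pi(y) - log_pi(x))
    return y if np.log(rng.uniform()) < log_alpha else x
```

Running the auxiliary chain alongside and appending its draws to `aux_samples` reproduces the recursion θn = (1/n) Σ δ_{Yk}.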

SLIDE 15

Equi-Energy sampler


◮ In this example, πPθ ≠ π BUT πP_{πβ} = π, i.e. asymptotically, when θn “is” πβ, the process of interest {Xn, n ≥ 0} behaves like a Markov chain with invariant distribution π.

SLIDE 16

Equi-Energy sampler


◮ In this example, πPθ ≠ π BUT πP_{πβ} = π, i.e. asymptotically, when θn “is” πβ, the process of interest {Xn, n ≥ 0} behaves like a Markov chain with invariant distribution π. In this MCMC context, we are again interested in the behavior of {Xn, n ≥ 0}, but convergence of θn is crucial, since the algorithm is designed to “sample from π” only when θn = πβ.

SLIDE 17

Wang-Landau algorithm

◮ Proposed by Wang & Landau (2001) to favor the moves between elements of a partition of the state space, when the weights of these elements are unknown.

Goal: sample a chain on ⋃_{i=1}^d (Xi × {i}) with stationary distribution

Π(Ai × {i}) = (1/d) ∫_{Ai} (π(x) / θ⋆(i)) 1_{Xi}(x) dx,

when θ⋆ is unknown, and/or estimate the normalizing constants θ⋆(i).

SLIDE 18

Wang-Landau algorithm


Tool: a family of transition kernels Pθ on ⋃_{i=1}^d (Xi × {i}), where θ = (θ(1), · · · , θ(d)) is a probability on {1, · · · , d}, with invariant distribution known up to a normalizing constant:

Πθ(Ai × {i}) = ( Σ_{j=1}^d θ⋆(j)/θ(j) )^{−1} ∫_{Ai} (π(x) / θ(i)) 1_{Xi}(x) dx.

SLIDE 19

Wang-Landau algorithm

◮ Algorithm: repeat

  • Draw (Xn+1, In+1) ∼ Pθn((Xn, In), ·).
  • Update the adaptation process: θn+1(i) ∝ θn(i) + γn+1 θn(i) 1_{In+1}(i).
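The proportional weight update can be sketched as follows (hypothetical function name; the explicit renormalization is what the proportionality sign leaves implicit):

```python
import numpy as np

def wl_update(theta, i_new, gamma):
    """theta_{n+1}(i) proportional to theta_n(i) + gamma * theta_n(i) * 1{I_{n+1} = i}:
    multiply the visited stratum's weight by (1 + gamma), then renormalize
    so that theta stays a probability on {1, ..., d}."""
    theta = theta.copy()
    theta[i_new] *= 1.0 + gamma
    return theta / theta.sum()

# one update from the uniform weights, after visiting stratum 2
theta = wl_update(np.full(4, 0.25), i_new=2, gamma=0.5)
```

The visited stratum is up-weighted, which (through Pθ) makes it less attractive at the next step; this is the mechanism behind the mean-field computation on the next slide.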

SLIDE 20

Wang-Landau algorithm

◮ In this MCMC context, we are also interested in the convergence of the sequence {θn, n ≥ 0}: to first order,

θn+1(i) ≈ θn(i) + γn+1 θn(i) ( 1_{In+1}(i) − θn(In+1) )

and, when (Xn, In) ∼ Πθn,

E[ θn(i) ( 1_{In+1}(i) − θn(In+1) ) | Fn ] = ( Σ_{j=1}^d θ⋆(j)/θn(j) )^{−1} (θ⋆(i) − θn(i)),

i.e. {θn, n ≥ 0} should converge to θ⋆!

SLIDE 21

Conclusion (I)

In adaptive MCMC, given a family of transition kernels {Pθ, θ ∈ Θ}, ergodic with invariant distribution πθ, we define a bivariate process {(Xn, θn), n ≥ 0} such that

  • P(Xn+1 ∈ · | Fn) = Pθn(Xn, ·);
  • θn is updated so that it should converge to θ⋆.

Two cases: πθ = π for any θ, OR πθ⋆ = π.

What kind of conditions on the adaptation mechanism ensure that the process {Xn, n ≥ 0} converges to the target distribution π? In the sequel, “convergence” means “convergence of the marginals”: E[f(Xn)] → π(f) for f bounded.

SLIDE 22

Conclusion (II)

Three examples illustrating different situations:

1. Adaptive Hastings-Metropolis: all the kernels Pθ have the same invariant measure π.

2. Equi-Energy sampler: each kernel Pθ has its own invariant measure πθ. We know that πθ exists, but we have no explicit expression for it (regularity in θ · · · ).

3. Wang-Landau: each kernel Pθ has its own invariant measure πθ, and we have an expression for πθ (as a function of θ).

SLIDE 23

  • II. Convergence of adaptive / interacting MCMC samplers

(Joint work with E. Moulines (Telecom ParisTech, France) and P. Priouret (Paris VI, France))

SLIDE 24

Adaptation can destroy convergence

◮ Consider a family of transition kernels on {0, 1}:

Pθ = [[1−θ, θ], [θ, 1−θ]],  θ ∈ (0, 1).

Then, for any θ ∈ (0, 1), πPθ = π with π = (1/2, 1/2).

◮ Choose t0, t1 ∈ (0, 1). Define the adaptive process:

Xn+1 ∼ Pθn(Xn, ·),  θn+1 = t_{Xn+1}.

Then {Xn, n ≥ 0} is a Markov chain with transition kernel [[1−t0, t0], [t1, 1−t1]], whose invariant distribution is proportional to (t1, t0): convergence to π fails whenever t0 ≠ t1.
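This counterexample is easy to check numerically (a sketch; the values t0 = 0.1, t1 = 0.4 are illustrative, giving occupation frequencies proportional to (0.4, 0.1), i.e. (0.8, 0.2) rather than (1/2, 1/2)):

```python
import numpy as np

def naive_adaptive_chain(t0, t1, n_iter, rng):
    """Adaptive two-state chain with theta_{n+1} = t_{X_{n+1}}: the pair is an
    ordinary Markov chain with kernel [[1-t0, t0], [t1, 1-t1]], so the
    occupation frequencies converge to (t1, t0)/(t0 + t1), not (1/2, 1/2)."""
    x, theta = 0, t0
    counts = np.zeros(2)
    for _ in range(n_iter):
        if rng.uniform() < theta:   # P_theta flips the state with probability theta
            x = 1 - x
        theta = (t0, t1)[x]         # adapt: theta_{n+1} = t_{X_{n+1}}
        counts[x] += 1
    return counts / n_iter

rng = np.random.default_rng(3)
freq = naive_adaptive_chain(0.1, 0.4, 200000, rng)
```

Each Pθ individually preserves (1/2, 1/2); it is the adaptation itself that shifts the invariant law.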

SLIDE 25

Conditions for convergence of the marginals

We write

E[f(Xn)] − π(f) = E[ f(Xn) − P^N_{θn−N} f(Xn−N) ]
  + E[ P^N_{θn−N} f(Xn−N) − πθn−N(f) ]
  + E[ πθn−N(f) ] − π(f),

whence three groups of conditions:

1. Term 1: due to the adaptation; comparison of the adapted process with a “frozen” chain (which is no longer adapted).

2. Term 2: ergodicity of the transition kernels Pθ.

3. Term 3: only when πθ ≠ π; this is the most delicate term · · · especially when the expression of πθ is not known.

SLIDE 26

Conditions for convergence of the marginals

E[f(Xn)] − π(f) = E[ f(Xn) − P^N_{θn−N} f(Xn−N) ] + E[ P^N_{θn−N} f(Xn−N) − πθn−N(f) ] + E[ πθn−N(f) ] − π⋆(f)

◮ [Term 3] When πθ ≠ π⋆: conditions for limn πθn(f) = π⋆(f). Since

πθ⋆+∆(f) − πθ⋆(f) = πθ⋆ (Pθ⋆+∆ − Pθ⋆) (I − Pθ⋆)^{−1} (I − πθ⋆)(f) + “remainder”,

the convergence of {πθn(f), n ≥ 0} to πθ⋆(f) is a consequence of the convergence of the kernels Pθn to Pθ⋆.

SLIDE 27

Conditions for convergence of the marginals

Favorable case: convergence in “operator norm”. Otherwise, finer results are needed. For example, if we only know that

∀x ∈ X and measurable A, ∃Ωx,A with P(Ωx,A) = 1 such that ∀ω ∈ Ωx,A, limn Pθn(ω)(x, A) = Pθ⋆(x, A),

what can we deduce about limn πθn(f)?

SLIDE 28

Conditions for convergence of the marginals

Starting from:

∀x ∈ X and measurable A, ∃Ωx,A with P(Ωx,A) = 1 such that ∀ω ∈ Ωx,A, limn Pθn(ω)(x, A) = Pθ⋆(x, A).

SLIDE 29

Conditions for convergence of the marginals

the steps are:

Step 1. ∀x ∈ X, ∃Ωx with P(Ωx) = 1 such that ∀ω ∈ Ωx, Pθn(ω)(x, ·) → Pθ⋆(x, ·) in distribution.

֒→ Tool: separable metric space X (e.g. Polish).

SLIDE 30

Conditions for convergence of the marginals

Step 2. ∃Ω′ with P(Ω′) = 1 such that ∀ω ∈ Ω′ and ∀x ∈ X, Pθn(ω)(x, ·) → Pθ⋆(x, ·) in distribution.

֒→ Tool: Polish space X + equicontinuity of {Pθf − Pθ⋆f, θ ∈ Θ}.

SLIDE 31

Conditions for convergence of the marginals

Step 3. ∃Ω⋆ with P(Ω⋆) = 1 such that ∀ω ∈ Ω⋆, P^k_{θn(ω)}(x, ·) → P^k_{θ⋆}(x, ·) in distribution.

֒→ Tool: Feller properties of the kernels {Pθ, θ ∈ Θ}.

SLIDE 32

Conditions for convergence of the marginals

Then

|πθn(f) − πθ⋆(f)| ≤ |P^k_{θn} f(x) − πθn(f)| + |P^k_{θ⋆} f(x) − πθ⋆(f)| + |P^k_{θn} f(x) − P^k_{θ⋆} f(x)|

֒→ Tool: ergodicity.

SLIDE 33

Conditions for convergence of the marginals

E[f(Xn)] − π(f) = E[ f(Xn) − P^N_{θn−N} f(Xn−N) ] + E[ P^N_{θn−N} f(Xn−N) − πθn−N(f) ] + E[ πθn−N(f) ] − π(f)

◮ [Term 2] Condition on the ergodicity of the transition kernels. “Usually”, the transition kernels {Pθ, θ ∈ Θ} are geometrically ergodic:

sup_{f : |f|∞ ≤ 1} |P^n_θ f(x) − πθ(f)| ≤ Cθ ρθ^n V(x),  ρθ ∈ (0, 1),

BUT the rate of convergence may depend upon θ · · · in such a way that ρθ → 1 when θ → ∂Θ. Therefore, the rate at which θn → ∂Θ has to be controlled.
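For the two-state kernel Pθ from the counterexample above, both the geometric rate and its degeneracy at the boundary can be checked directly (a sketch; the closed form TV = ½|1 − 2θ|^n follows from the eigendecomposition of Pθ and is stated here as background, not from the slide):

```python
import numpy as np

def tv_to_pi(t, n):
    """Total-variation distance between P_theta^n(0, .) and pi = (1/2, 1/2)
    for the two-state kernel [[1-t, t], [t, 1-t]]."""
    P = np.array([[1.0 - t, t], [t, 1.0 - t]])
    Pn = np.linalg.matrix_power(P, n)
    return 0.5 * np.abs(Pn[0] - 0.5).sum()

# geometric decay with rate rho_theta = |1 - 2 theta| ...
rates = [tv_to_pi(0.1, n) for n in (1, 2, 3)]
# ... which degrades (rho_theta -> 1) as theta -> 0, the boundary of Theta
```

Here ∂Θ corresponds to θ → 0 or θ → 1, where |1 − 2θ| → 1 and ergodicity is lost, exactly the phenomenon the slide describes.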

SLIDE 34

Conditions for convergence of the marginals

In practice, ergodicity is controlled via drift + minorization conditions:

If PθV ≤ λθV + bθ and Pθ(x, ·) ≥ δθ νθ(·), then ‖P^n_θ(x, ·) − πθ‖TV ≤ Cθ ρθ^n V(x),

where Cθ ∨ (1 − ρθ)^{−1} ≤ C ( bθ ∨ δθ^{−1} ∨ (1 − λθ)^{−1} )^3.

SLIDE 35

Conditions for convergence of the marginals

The non-deterioration of this control as θ → ∂Θ is handled case by case:

  • force the parameter θ to “stay away from the boundary” (e.g. reprojection onto a compact set);
  • let the adaptation procedure run freely, but control the growth [Vihola & Saksman, 2010], [Vihola, 2010].

Ex.: for adaptive HM, [Vihola & Saksman, 2010] show that

Cθ ∨ (1 − ρθ)^{−1} ≤ c √(det θ)  and  ∀τ > 0, sup_n n^{−τ} |θn| < +∞ a.s.

SLIDE 36

Conditions for convergence of the marginals

E[f(Xn)] − π(f) = E[ f(Xn) − P^N_{θn−N} f(Xn−N) ] + E[ P^N_{θn−N} f(Xn−N) − πθn−N(f) ] + E[ πθn−N(f) ] − π(f)

◮ [Term 1] Condition on the adaptation mechanism, since

| E[ f(Xn) − P^N_{θn−N} f(Xn−N) ] | ≤ Σ_{j=1}^{N−1} (N − j) E[ sup_x ‖ Pθn−N+j(x, ·) − Pθn−N+j−1(x, ·) ‖TV ],

where the total-variation term is the “distance” between two successive transition kernels. Therefore, the adaptation has to be diminishing.

SLIDE 37

Adaptation and Ergodicity

E[f(Xn)] − π(f) = E[ f(Xn) − P^N_{θn−N} f(Xn−N) ] + E[ P^N_{θn−N} f(Xn−N) − π(f) ]

◮ Example: Xn+1 ∼ Pθn(Xn, ·) with θn = n^{−1/4} and Pθ = [[1−θ, θ], [θ, 1−θ]]. In this case, since θn → 0,

| E[ P^N_{θn−N} f(Xn−N) − π(f) ] | ≤ |1 − 2θn−N|^N → 1 when N is fixed.

SLIDE 38

Adaptation and Ergodicity

For the same example, since θn → 0,

| E[ P^{Nn}_{θn−Nn} f(Xn−Nn) − π(f) ] | ≤ |1 − 2θn−Nn|^{Nn} → 0 for a convenient choice of Nn.

Therefore, we choose N depending upon n, with Nn → +∞, and the adaptation has to be such that

Σ_{j=1}^{Nn−1} (Nn − j) E[ sup_x ‖ Pθn−Nn+j(x, ·) − Pθn−Nn+j−1(x, ·) ‖TV ] → 0,

where the total-variation term is the “distance” between two successive transition kernels.

֒→ The “rate” of adaptation depends on the ergodic behavior of the transition kernels.
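The interplay between θn = n^{−1/4} and the window Nn can be checked numerically (a sketch; the particular choice Nn = ⌊√n⌋ is mine, any Nn with Nn θn → ∞ would serve to drive the ergodicity bound to 0):

```python
def ergodicity_bound(n, N):
    """|1 - 2 theta_{n-N}|**N with theta_n = n**(-1/4), the bound on the
    distance to pi after N steps of the frozen kernel P_{theta_{n-N}}."""
    return abs(1.0 - 2.0 * (n - N) ** -0.25) ** N

# fixed window N = 10: the bound creeps back up towards 1 as n grows
fixed = [ergodicity_bound(n, 10) for n in (10**3, 10**6)]
# growing window N_n = floor(sqrt(n)): the bound tends to 0
growing = [ergodicity_bound(n, int(n ** 0.5)) for n in (10**3, 10**6)]
```

Since θn−Nn ≈ n^{−1/4} while Nn ≈ n^{1/2}, the exponent Nn θn−Nn ≈ n^{1/4} → ∞, which is why the growing window succeeds where the fixed one fails.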

SLIDE 39

  • III. Conclusion
SLIDE 40

Tools for convergence of adaptive MCMC samplers:

1. Markov chain theory (ergodicity, Poisson equation, · · · )
2. Stochastic approximation (stability/convergence, control of “non-stability”)

SLIDE 41

◮ When the transition kernels have the same invariant distribution π: ergodicity of the transition kernels + diminishing adaptation.

Ex.: convergence of {θn, n ≥ 0} is not required, BUT control of the “divergence to ∂Θ” is needed.
SLIDE 42

◮ When the kernels have their own invariant distribution πθ and πθ⋆ = π: ergodicity + diminishing adaptation + convergence of θn to θ⋆.

SLIDE 43

Stochastic approximation procedures

Is it necessary to modify the adaptation so that the stochastic approximation procedure is recurrent, or a.s. bounded (stability), or converges to the set of interest? For example, by introducing a reprojection onto a fixed compact set, or a reprojection onto increasing compact sets combined with a “truncation” of the chain.

֒→ It is not always useful to force recurrence / stability, since we know how to accommodate non-stability of the parameter · · ·
֒→ Research in progress aims at avoiding these reprojections / truncations.

SLIDE 44

Some references

1. Adaptive MCMC (methodologies)

  • C. Andrieu, J. Thoms. Statistics and Computing, 2008.
  • J.S. Rosenthal. MCMC Handbook, 2009.
  • Y. Atchadé, G. Fort, E. Moulines, P. Priouret. Time Series book, 2010.

2. General results for convergence of adaptive MCMC

  • C. Andrieu, E. Moulines. Ann. Appl. Probab., 2006.
  • G.O. Roberts, J.S. Rosenthal. J. Appl. Probab., 2007.
  • Y. Atchadé, G. Fort. Stoch. Processes Appl., 2010.
  • G. Fort, E. Moulines, P. Priouret. Preprint, 2010.

3. Convergence of some adaptive MCMC

  • C. Andrieu, A. Jasra, A. Doucet, P. Del Moral. Preprint, 2007.
  • Y. Atchadé. Statistica Sinica, 2010.
  • E. Saksman, M. Vihola. Ann. Appl. Probab., 2010.
  • M. Vihola. Preprint, 2010.