

SLIDE 1

Partial ordering of inhomogeneous Markov chains with applications to Markov chain Monte Carlo methods

Jimmy Olsson

Department of Mathematics, KTH Royal Institute of Technology, Stockholm, Sweden
Based on joint work with Florian Maire and Randal Douc

MCMski IV, 7 January 2014, Chamonix


SLIDE 2

Outline

1. Introduction
2. Main result
3. Applications
   Data augmentation-type MCMC methods
   Pseudo-marginal methods
4. Conclusion



SLIDE 4

Markov Chain Monte Carlo (MCMC) methods

Let π be some target distribution on a state space (X, 𝒳) and assume that π is known up to a multiplicative constant only. Given π, MCMC methods allow a Markov chain (Xn)n with stationary distribution π to be generated. Expectations

\[ \pi(f) := \int f(x) \, \pi(\mathrm{d}x) \]

are estimated using sample averages

\[ \hat\pi_n(f) := \frac{1}{n} \sum_{k=0}^{n-1} f(X_k). \]

Recall that a Markov transition kernel P on (X, 𝒳) is π-reversible if

\[ \pi(\mathrm{d}x) \, P(x, \mathrm{d}y) = \pi(\mathrm{d}y) \, P(y, \mathrm{d}x). \]
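To make the estimator concrete, a minimal random-walk Metropolis sketch in Python; the standard-normal target log_pi, the step size, and the objective function f are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_pi(x):
    # Unnormalized log-target: standard normal, known up to a constant.
    return -0.5 * x**2

def metropolis_chain(n, x0=0.0, step=1.0):
    # Random-walk Metropolis: the proposal is symmetric, so the
    # acceptance ratio reduces to pi(y)/pi(x), and the resulting
    # kernel is pi-reversible.
    xs = np.empty(n)
    x = x0
    for k in range(n):
        y = x + step * rng.normal()
        if np.log(rng.random()) < log_pi(y) - log_pi(x):
            x = y
        xs[k] = x
    return xs

xs = metropolis_chain(100_000)
f = lambda z: z**2                    # objective function f
print("pi_hat_n(f) =", f(xs).mean())  # sample average; pi(f) = 1 here
```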


SLIDE 5

Comparison of π-reversible Markov chains

To measure the performance of an MCMC sampler with transition kernel P and stationary distribution π, we consider the asymptotic variance

\[ v(f, P) := \lim_{n \to \infty} \operatorname{Var}\bigl( \sqrt{n} \, \hat\pi_n(f) \bigr) = \lim_{n \to \infty} \frac{1}{n} \operatorname{Var}\Bigl( \sum_{k=0}^{n-1} f(X_k) \Bigr). \]

For given π-reversible kernels P0 and P1, we would like to find easily checked conditions under which v(f, P0) ≥ v(f, P1) for all f belonging to some large class of objective functions.
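In practice this limit can be approximated from a single trajectory, for instance by batch means; a minimal sketch, assuming the chain xs and objective f from the previous sketch and an illustrative batch count:

```python
import numpy as np

def batch_means_variance(fx, n_batches=50):
    # Estimate v(f, P) from one trajectory f(X_0), ..., f(X_{n-1}):
    # split it into batches and rescale the variance of the batch means.
    m = len(fx) // n_batches              # batch length
    means = fx[: m * n_batches].reshape(n_batches, m).mean(axis=1)
    return m * means.var(ddof=1)

# e.g., with the chain from the previous sketch:
# v_hat = batch_means_variance(f(xs))
```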


SLIDE 6

Comparison of π-reversible Markov chains (cont’d)

Definition. Let P0 and P1 be π-reversible Markov kernels on (X, 𝒳). We say that P1 dominates P0

(i) on the off-diagonal, written P1 ⪰ P0, if for all (x, A) ∈ X × 𝒳,

\[ P_1(x, A \setminus \{x\}) \ge P_0(x, A \setminus \{x\}); \]

(ii) in the covariance ordering, written P1 ⪰cov P0, if for all f ∈ L²(π),

\[ \iint f(x) f(y) \, P_1(x, \mathrm{d}y) \, \pi(\mathrm{d}x) \le \iint f(x) f(y) \, P_0(x, \mathrm{d}y) \, \pi(\mathrm{d}x). \]
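On a finite state space both orderings reduce to matrix checks, as the following hedged numpy sketch illustrates; the tolerance and the positive-semidefiniteness reformulation of (ii) (the matrix diag(π)P is symmetric for π-reversible P) are the only ingredients added here:

```python
import numpy as np

def dominates_off_diagonal(P1, P0):
    # On a finite state space, P1(x, A \ {x}) >= P0(x, A \ {x}) for all
    # x and A reduces to an entrywise inequality off the diagonal.
    off = ~np.eye(P0.shape[0], dtype=bool)
    return np.all(P1[off] >= P0[off])

def dominates_covariance(P1, P0, pi, tol=1e-10):
    # The integral in (ii) is f^T diag(pi) P f, so the covariance
    # ordering holds iff diag(pi) (P0 - P1) is positive semidefinite;
    # it is symmetric because both kernels are pi-reversible.
    A = np.diag(pi) @ (P0 - P1)
    A = 0.5 * (A + A.T)                   # guard against round-off
    return np.all(np.linalg.eigvalsh(A) >= -tol)
```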


SLIDE 7

Comparison of π-reversible Markov chains (cont’d)

With these definitions the following chain of implications holds true.

Theorem (P. H. Peskun [2] and L. Tierney [3]). Let P0 and P1 be π-reversible Markov kernels on (X, 𝒳). Then

\[ P_1 \succeq P_0 \;\Longrightarrow\; P_1 \succeq_{\mathrm{cov}} P_0 \;\Longrightarrow\; v(f, P_0) \ge v(f, P_1) \quad \text{for all } f \in L^2(\pi). \]
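A two-state sanity check of the theorem; the kernels below are assumed examples, and the fundamental-matrix identity v(f, P) = f̄ᵀ diag(π)(2Z − I)f̄ with Z = (I − P + 𝟙πᵀ)⁻¹, a standard finite-state formula for π-stationary chains, is used only for illustration:

```python
import numpy as np

def asym_var(f, P, pi):
    # v(f, P) = Var_pi(f) + 2 sum_{n>=1} Cov(f(X_0), f(X_n)), computed
    # via the fundamental matrix Z = (I - P + Pi)^{-1}, which gives
    # v = fbar^T diag(pi) (2Z - I) fbar for centred fbar = f - pi(f).
    n = len(pi)
    fbar = f - pi @ f
    Z = np.linalg.inv(np.eye(n) - P + np.outer(np.ones(n), pi))
    return fbar @ np.diag(pi) @ (2 * Z - np.eye(n)) @ fbar

# Two-state toy: P1 puts more mass off the diagonal than P0, both are
# reversible w.r.t. the uniform pi, and the variances order accordingly.
pi = np.array([0.5, 0.5])
P0 = np.array([[0.7, 0.3], [0.3, 0.7]])
P1 = np.array([[0.4, 0.6], [0.6, 0.4]])
f = np.array([0.0, 1.0])
assert asym_var(f, P1, pi) <= asym_var(f, P0, pi)
```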



SLIDE 9

Comparison of inhomogeneous chains

Theorem. Let $(X_n^{(0)})_n$ and $(X_n^{(1)})_n$ be Markov chains evolving as

\[ \pi \sim X_0^{(i)} \xrightarrow{P_i} X_1^{(i)} \xrightarrow{Q_i} X_2^{(i)} \xrightarrow{P_i} X_3^{(i)} \xrightarrow{Q_i} \cdots \]

where
(i) the Pi and Qi are π-reversible,
(ii) P1 ⪰ P0 and Q1 ⪰ Q0.

Then for all f ∈ L²(π) satisfying a weak summability condition,

\[ \lim_{n \to \infty} \frac{1}{n} \operatorname{Var}\Bigl( \sum_{k=0}^{n-1} f\bigl(X_k^{(1)}\bigr) \Bigr) \le \lim_{n \to \infty} \frac{1}{n} \operatorname{Var}\Bigl( \sum_{k=0}^{n-1} f\bigl(X_k^{(0)}\bigr) \Bigr). \]
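A simulation sketch of the theorem's conclusion on a two-state space; the kernels, run length, and replication count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def alternating_chain(P, Q, pi, n):
    # Inhomogeneous chain X_0 -P-> X_1 -Q-> X_2 -P-> ..., started
    # from the stationary distribution pi on a finite state space.
    x = rng.choice(len(pi), p=pi)
    xs = np.empty(n, dtype=int)
    for k in range(n):
        xs[k] = x
        K = P if k % 2 == 0 else Q
        x = rng.choice(len(pi), p=K[x])
    return xs

def scaled_var(P, Q, pi, f, n=500, reps=2000):
    # Monte Carlo estimate of (1/n) Var(sum_{k<n} f(X_k)).
    sums = [f[alternating_chain(P, Q, pi, n)].sum() for _ in range(reps)]
    return np.var(sums, ddof=1) / n

pi = np.array([0.5, 0.5])
P0 = Q0 = np.array([[0.8, 0.2], [0.2, 0.8]])
P1 = Q1 = np.array([[0.3, 0.7], [0.7, 0.3]])   # dominates off-diagonally
f = np.array([0.0, 1.0])
print(scaled_var(P1, Q1, pi, f), "<~", scaled_var(P0, Q0, pi, f))
```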


SLIDE 10

The summability condition

The result holds for all f ∈ L²(π) such that, for i ∈ {0, 1},

\[ \sum_{k=1}^{\infty} \Bigl( \bigl| \operatorname{Cov}\bigl(f(X_0^{(i)}), f(X_k^{(i)})\bigr) \bigr| + \bigl| \operatorname{Cov}\bigl(f(X_1^{(i)}), f(X_{k+1}^{(i)})\bigr) \bigr| \Bigr) < \infty. \tag{∗} \]

The condition (∗)
- implies that the asymptotic variances exist and are finite,
- holds when each product PiQi is V-geometrically ergodic,
- is not a necessary condition.


SLIDE 11

Two slides on the proof

The proof of L. Tierney [3] uses spectral theory. It is, however, possible under the summability condition to replicate Tierney's result without spectral theory, by

1. showing that for π-reversible kernels P,

\[ v(f, P) = \pi(f^2) - (\pi f)^2 + 2 \sum_{n=1}^{\infty} \operatorname{Cov}\bigl(f(X_0), f(X_n)\bigr), \]

2. setting, using the notation ⟨f, g⟩ := ∫ f(x) g(x) π(dx),

\[ P_\alpha := (1 - \alpha) P_0 + \alpha P_1 \quad \text{and} \quad w_\lambda(f, P_\alpha) := \sum_{n=1}^{\infty} \lambda^n \langle f, P_\alpha^n f \rangle, \]


SLIDE 12

Two slides on the proof (cont’d)

3. and showing that for all λ ∈ (0, 1), α ↦ w_λ(f, P_α) is decreasing, yielding w_λ(f, P1) ≤ w_λ(f, P0). To this aim, show that there is a function $f^*_\alpha \in L^2(\pi)$ such that

\[ \frac{\partial w_\lambda(f, P_\alpha)}{\partial \alpha} = \bigl\langle f^*_\alpha, \lambda (P_1 - P_0) f^*_\alpha \bigr\rangle = \lambda \bigl( \langle f^*_\alpha, P_1 f^*_\alpha \rangle - \langle f^*_\alpha, P_0 f^*_\alpha \rangle \bigr) \le 0, \]

4. and finally applying the dominated convergence theorem as λ → 1.

Very roughly, the proof of the inhomogeneous case follows the same lines, by splitting the sums into even and odd terms (with distributions governed by Pi and Qi, respectively).
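A numeric illustration of step 3 on a finite state space, truncating the series defining w_λ; the two-state kernels and λ = 0.9 are assumptions chosen so that P1 dominates P0 on the off-diagonal:

```python
import numpy as np

def w_lambda(f, P, pi, lam, n_terms=200):
    # Truncation of w_lambda(f, P) = sum_{n>=1} lam^n <f, P^n f>,
    # with <f, g> = sum_x f(x) g(x) pi(x).
    D = np.diag(pi)
    total, Pnf = 0.0, f.copy()
    for n in range(1, n_terms + 1):
        Pnf = P @ Pnf
        total += lam**n * (f @ D @ Pnf)
    return total

pi = np.array([0.5, 0.5])
P0 = np.array([[0.8, 0.2], [0.2, 0.8]])
P1 = np.array([[0.3, 0.7], [0.7, 0.3]])        # P1 dominates P0
f = np.array([-1.0, 1.0])
ws = [w_lambda(f, (1 - a) * P0 + a * P1, pi, lam=0.9)
      for a in np.linspace(0.0, 1.0, 5)]
assert all(w1 <= w0 + 1e-12 for w0, w1 in zip(ws, ws[1:]))
```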




SLIDE 15

Data augmentation

In many applications the density of the target π is analytically intractable or too expensive to evaluate. A common way of coping with this problem is to augment the data by an auxiliary variable U and consider the extended target

\[ \tilde\pi(\mathrm{d}y \times \mathrm{d}u) := \pi(\mathrm{d}y) \, R(y, \mathrm{d}u), \]

where R is some Markov kernel; by construction, the extended target has the desired distribution π as its marginal. A typical example is Bayesian inference in models with latent variables (such as hidden Markov models and mixture models), where π and U play the roles of the posterior and the unobserved data, respectively.
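A toy sketch of the construction; all densities below are illustrative assumptions (a Gaussian marginal target and a Gaussian auxiliary kernel):

```python
def log_pi(y):
    # Assumed marginal target: standard normal, up to a constant.
    return -0.5 * y**2

def log_r(y, u):
    # Assumed auxiliary kernel R(y, du): N(y, 1) density in u.
    return -0.5 * (u - y)**2

def log_pi_tilde(y, u):
    # Extended target pi~(dy x du) = pi(dy) R(y, du); integrating out
    # u recovers pi, so pi is its marginal by construction.
    return log_pi(y) + log_r(y, u)
```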


SLIDE 16

Metropolis-Hastings (MH) for data augmentation

Algorithm (Freeze). Given $(Y_k^{(1)}, U_k)$,

1. generate $Y^* \sim s((Y_k^{(1)}, U_k), y) \, \mathrm{d}y$,
2. generate $U^* \sim t((Y_k^{(1)}, U_k, Y^*), u) \, \mathrm{d}u$,
3. let

\[ (Y_{k+1}^{(1)}, U_{k+1}) \leftarrow \begin{cases} (Y^*, U^*) & \text{w.\,pr.\ } \alpha(Y_k^{(1)}, U_k, Y^*, U^*), \\ (Y_k^{(1)}, U_k) & \text{otherwise.} \end{cases} \]

Here $\alpha(Y_k^{(1)}, U_k, Y^*, U^*)$ equals

\[ 1 \wedge \frac{\pi(Y^*) \, r(Y^*, U^*) \, s((Y^*, U^*), Y_k^{(1)}) \, t((Y^*, U^*, Y_k^{(1)}), U_k)}{\pi(Y_k^{(1)}) \, r(Y_k^{(1)}, U_k) \, s((Y_k^{(1)}, U_k), Y^*) \, t((Y_k^{(1)}, U_k, Y^*), U^*)}. \]
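A hedged Python sketch of one Freeze transition; the primitives log_pi, log_r, sample_s/log_s, and sample_t/log_t are user-supplied stand-ins for the densities π, r, s, and t above:

```python
import numpy as np

rng = np.random.default_rng(2)

def freeze_step(y, u, log_pi, log_r, sample_s, log_s, sample_t, log_t):
    # One Freeze transition: propose Y* from s and U* from t, then
    # accept the pair with the probability alpha displayed above.
    y_new = sample_s((y, u))               # 1. Y* ~ s((Y_k, U_k), .)
    u_new = sample_t((y, u, y_new))        # 2. U* ~ t((Y_k, U_k, Y*), .)
    log_alpha = (log_pi(y_new) + log_r(y_new, u_new)
                 + log_s((y_new, u_new), y) + log_t((y_new, u_new, y), u)
                 - log_pi(y) - log_r(y, u)
                 - log_s((y, u), y_new) - log_t((y, u, y_new), u_new))
    if np.log(rng.random()) < log_alpha:   # 3. accept/reject
        return y_new, u_new
    return y, u
```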


SLIDE 17

The Systematic Refreshment Algorithm

In some cases it is possible to sample from R(·, du) = r(·, u) du; one may then systematically "refresh" $U_k$ by a random draw $\tilde U$ according to R.

Algorithm (Systematic Refreshment). Given $Y_k^{(2)}$,

1. generate $\tilde U \sim r(Y_k^{(2)}, u) \, \mathrm{d}u$,
2. generate $Y^* \sim s((Y_k^{(2)}, \tilde U), y) \, \mathrm{d}y$,
3. generate $U \sim t((Y_k^{(2)}, \tilde U, Y^*), u) \, \mathrm{d}u$,
4. let

\[ Y_{k+1}^{(2)} \leftarrow \begin{cases} Y^* & \text{w.\,pr.\ } \alpha(Y_k^{(2)}, \tilde U, Y^*, U), \\ Y_k^{(2)} & \text{otherwise.} \end{cases} \]
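The corresponding sketch for one Systematic Refreshment transition, under the same assumed primitives plus a sampler sample_r for R:

```python
import numpy as np

rng = np.random.default_rng(2)

def refresh_step(y, log_pi, log_r, sample_r,
                 sample_s, log_s, sample_t, log_t):
    # One Systematic Refreshment transition: redraw the auxiliary
    # variable from R, then perform a Freeze-type move from (Y_k, U~).
    u_t = sample_r(y)                      # 1. U~ ~ r(Y_k, .)
    y_new = sample_s((y, u_t))             # 2. Y* ~ s((Y_k, U~), .)
    u_new = sample_t((y, u_t, y_new))      # 3. U ~ t((Y_k, U~, Y*), .)
    log_alpha = (log_pi(y_new) + log_r(y_new, u_new)
                 + log_s((y_new, u_new), y) + log_t((y_new, u_new, y), u_t)
                 - log_pi(y) - log_r(y, u_t)
                 - log_s((y, u_t), y_new) - log_t((y, u_t, y_new), u_new))
    if np.log(rng.random()) < log_alpha:   # 4. accept/reject
        return y_new
    return y
```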


SLIDE 18

The Systematic Refreshment Algorithm (cont’d)

In contrast to the Freeze Algorithm, the marginal process $(Y_n^{(2)})_n$ is a Markov chain, which can be proved to be π-reversible. It is, however, not a standard MH chain, due to the use of the auxiliary variables. It can be shown that the algorithm covers the frameworks of, e.g., randomized MCMC and generalized multiple-try Metropolis. When comparing the performances of the Systematic Refreshment and Freeze Algorithms, the classical results of P. H. Peskun [2] and L. Tierney [3] do not apply, as $(Y_n^{(1)})_n$ is not even a Markov chain.


SLIDE 19

Inhomogeneous embedding

Key observation: the chains $(Y_n^{(1)})_n$ and $(Y_n^{(2)})_n$ can be embedded into inhomogeneous Markov chains $(X_n^{(1)})_n$ and $(X_n^{(2)})_n$ defined by

\[ X_{2k}^{(i)} = \begin{pmatrix} Y_k^{(i)} \\ U_k^{(i)} \end{pmatrix} \xrightarrow{P_i} X_{2k+1}^{(i)} = \begin{pmatrix} \check Y_k^{(i)} \\ \check U_k^{(i)} \end{pmatrix} \xrightarrow{Q_i} X_{2k+2}^{(i)} = \begin{pmatrix} Y_{k+1}^{(i)} \\ U_{k+1}^{(i)} \end{pmatrix} \xrightarrow{P_i} \cdots \]

where
- P1 is the identity kernel,
- Q1 = Q2 describe transitions according to the Freeze Algorithm,
- P2 modifies ("refreshes") the second component according to $\check U_k^{(2)} \sim R(Y_k^{(2)}, \cdot)$ while keeping $\check Y_k^{(2)} = Y_k^{(2)}$.


SLIDE 20

Inhomogeneous embedding (cont’d)

Here the Pi and Qi are π̃-reversible, as
- the identity kernel P1 is reversible w.r.t. any probability measure,
- Q1 = Q2 are π̃-reversible as standard MH kernels,
- P2 is π̃-reversible as a Gibbs sub-step transition kernel for the target π̃(dy × du) = π(dy) R(y, du).

In addition,
- P2 ⪰ P1, as P1 has no off-diagonal mass,
- trivially, Q2 ⪰ Q2 = Q1.


SLIDE 21

Freeze vs Systematic Refreshment

Thus, for the output $(Y_n^{(1)})_n$ and $(Y_n^{(2)})_n$ of the Freeze and Systematic Refreshment Algorithms, respectively, we obtain the following from our main result.

Corollary. For all f ∈ L²(π) satisfying the summability assumption, it holds that

\[ \lim_{n \to \infty} \frac{1}{n} \operatorname{Var}\Bigl( \sum_{k=0}^{n-1} f\bigl(Y_k^{(2)}\bigr) \Bigr) \le \lim_{n \to \infty} \frac{1}{n} \operatorname{Var}\Bigl( \sum_{k=0}^{n-1} f\bigl(Y_k^{(1)}\bigr) \Bigr). \]
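A sketch of how the corollary can be probed empirically, assuming step functions such as the freeze_step and refresh_step sketches above, wrapped for a concrete model:

```python
import numpy as np

def compare(freeze_kernel, refresh_kernel, f, y0, u0, n=100_000):
    # Run both samplers on the same model and compare batch-means
    # estimates of the asymptotic variance of f (smaller is better).
    y, u = y0, u0
    f1 = np.empty(n)
    for k in range(n):
        y, u = freeze_kernel(y, u)
        f1[k] = f(y)
    y, f2 = y0, np.empty(n)
    for k in range(n):
        y = refresh_kernel(y)
        f2[k] = f(y)
    def bm(fx, b=100):                       # batch-means estimator
        m = len(fx) // b
        return m * fx[: m * b].reshape(b, m).mean(axis=1).var(ddof=1)
    return bm(f2), bm(f1)  # expect bm(f2) <= bm(f1) by the corollary
```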



SLIDE 23

Pseudo-marginal methods

Nevertheless, sampling from R is infeasible in general. Hence, pseudo-marginal methods use importance sampling based on some proposal $\tilde r(y, u) \, \mathrm{d}u$ and the corresponding importance weight

\[ w_u(y) := \frac{r(y, u)}{\tilde r(y, u)}. \]

Due to the use of importance sampling, the output is not π-reversible (except when $w_u(y) \equiv 1$).


SLIDE 24

Pseudo-marginal methods (cont’d)

The Monte Carlo within Metropolis algorithm is covered by this framework, with $\tilde U$ and $U$ corresponding to the Monte Carlo samples used for approximating $\pi(Y_k)$ and $\pi(Y^*)$, respectively. It shares the good mixing properties of the Systematic Refreshment Algorithm, at the cost of bias for finite Monte Carlo sample sizes [1]. A way of coping with the bias is to also propagate ("recycle") the Monte Carlo approximations through the algorithm, as in the grouped independence MH algorithm. We are then back to the Freeze Algorithm!


SLIDE 25

The Random Refreshment Algorithm

Algorithm (Random Refreshment). Given $(Y_k^{(3)}, U_k^{(3)})$,

1. (i) generate $\tilde U^* \sim \tilde r(Y_k^{(3)}, u) \, \mathrm{d}u$,
   (ii) let

\[ \tilde U \leftarrow \begin{cases} \tilde U^* & \text{w.\,pr.\ } \varrho(Y_k^{(3)}, U_k^{(3)}, \tilde U^*), \\ U_k^{(3)} & \text{otherwise,} \end{cases} \tag{∗} \]

2. generate $Y^* \sim s((Y_k^{(3)}, \tilde U), y) \, \mathrm{d}y$,
3. generate $U^* \sim t((Y_k^{(3)}, \tilde U, Y^*), u) \, \mathrm{d}u$,
4. let

\[ (Y_{k+1}^{(3)}, U_{k+1}^{(3)}) \leftarrow \begin{cases} (Y^*, U^*) & \text{w.\,pr.\ } \alpha(Y_k^{(3)}, \tilde U, Y^*, U^*), \\ (Y_k^{(3)}, U_k^{(3)}) & \text{otherwise,} \end{cases} \]

where, in (∗),

\[ \varrho(Y_k^{(3)}, U_k^{(3)}, \tilde U^*) := 1 \wedge \frac{w_{\tilde U^*}(Y_k^{(3)})}{w_{U_k^{(3)}}(Y_k^{(3)})}. \]
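A hedged sketch of one Random Refreshment transition; log_w, sample_r_tilde, and the Freeze-type acceptance function log_alpha are user-supplied stand-ins:

```python
import numpy as np

rng = np.random.default_rng(3)

def random_refresh_step(y, u, log_w, sample_r_tilde,
                        sample_s, sample_t, log_alpha):
    # One Random Refreshment transition: the auxiliary variable is
    # refreshed only with probability rho, driven by the importance
    # weights w_u(y) = r(y, u) / r~(y, u).
    u_star = sample_r_tilde(y)             # 1(i).  U~* ~ r~(Y_k, .)
    log_rho = min(0.0, log_w(y, u_star) - log_w(y, u))
    u_t = u_star if np.log(rng.random()) < log_rho else u   # 1(ii)
    y_star = sample_s((y, u_t))            # 2. Y* ~ s((Y_k, U~), .)
    u_new = sample_t((y, u_t, y_star))     # 3. U* ~ t((Y_k, U~, Y*), .)
    if np.log(rng.random()) < log_alpha(y, u_t, y_star, u_new):  # 4.
        return y_star, u_new
    return y, u
```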


SLIDE 26

Random Refreshment vs Freeze

Using a similar inhomogeneous embedding, one may establish the following for the output $(Y_n^{(1)})_n$ and $(Y_n^{(3)})_n$ of the Freeze and Random Refreshment Algorithms, respectively.

Corollary.
(i) The output of the Random Refreshment Algorithm is indeed π̃-reversible.
(ii) For all f ∈ L²(π) satisfying the summability assumption,

\[ \lim_{n \to \infty} \frac{1}{n} \operatorname{Var}\Bigl( \sum_{k=0}^{n-1} f\bigl(Y_k^{(3)}\bigr) \Bigr) \le \lim_{n \to \infty} \frac{1}{n} \operatorname{Var}\Bigl( \sum_{k=0}^{n-1} f\bigl(Y_k^{(1)}\bigr) \Bigr). \]



SLIDE 28

Conclusion

We have successfully extended the results of Peskun [2] and Tierney [3] to inhomogeneous Markov chains evolving alternately according to two different Markov transition kernels. This configuration covers several popular MCMC algorithms, such as randomized MCMC, the multiple-try Metropolis algorithm, pseudo-marginal algorithms, the sampler of Carlin and Chib, etc. As illustrated by our novel Random Refreshment Algorithm in the context of pseudo-marginal methods, the main result can also be used for designing new algorithms and for improving existing ones in terms of asymptotic variance.


SLIDE 29

References I

[1] C. Andrieu and G. O. Roberts. "The pseudo-marginal approach for efficient Monte Carlo computations". The Annals of Statistics 37.2 (2009), pp. 697–725.

[2] P. H. Peskun. "Optimum Monte-Carlo sampling using Markov chains". Biometrika 60.3 (1973), pp. 607–612.

[3] L. Tierney. "A note on Metropolis-Hastings kernels for general state spaces". Annals of Applied Probability 8.1 (1998), pp. 1–9.
