SLIDE 1

Metropolis-Hastings Algorithm for Mixture Model and its Weak Convergence

Kengo KAMATANI

University of Tokyo, Japan

SLIDE 2

1 The Gibbs sampler usually works well.
2 However, in certain settings it works poorly, e.g. for the mixture model.
3 Fortunately, we found an alternative MCMC method which works better in simulation.

Problem

Both the Gibbs sampler (2) and the alternative method (3) are uniformly ergodic. Therefore, to compare the two methods we have to calculate their convergence rates, which is very difficult. Hence the comparison is hard within the Harris recurrence approach, and we take another approach.

SLIDE 3

Summary of the talk

  • Sec. 1: I show a bad behavior of the Gibbs sampler.
  • Sec. 2: Define efficiency (consistency) of MCMC. Prove that the Gibbs sampler has a bad convergence property.
  • Sec. 3: Propose a new MCMC. Prove that the new MCMC is better than the Gibbs sampler.

SLIDE 4

Note that the Harris recurrence property is also very important for our approach; without it, our approach is useless. Another motivation of our approach is to separate two different convergence issues: 1) convergence to the local area, and 2) consistency. Only the mixture model is considered here, but the approach may be useful for other models.

SLIDE 5

Outline

1 Bad behavior of the Gibbs sampler
    Model description
    Gibbs sampler

2 Efficiency of MCMC
    What is MCMC?
    Consistency
    Degeneracy

3 MH algorithm converges faster
    MH proposal construction
    MH performance

SLIDE 6

Outline

1 Bad behavior of the Gibbs sampler
    Model description
    Gibbs sampler

2 Efficiency of MCMC
    What is MCMC?
    Consistency
    Degeneracy

3 MH algorithm converges faster
    MH proposal construction
    MH performance

SLIDE 7

Bad behavior of the Gibbs sampler

Model description

1 Consider the model

    p_{X|Θ}(dx|θ) = (1 − θ)F0(dx) + θF1(dx).

2 Flip a coin with probability of heads θ. If the coin is heads, generate x from F1; otherwise, from F0.
3 We do not observe the coin, only x.
4 Observations x^n = (x1, x2, . . . , xn), xi ∼ p_{X|Θ}(dx|θ0).

Prior distribution p_Θ = Beta(α1, α0). We want to calculate the posterior distribution.
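The data-generating process in steps 1–4 can be sketched directly. Below, F0 = N(0, 1) and F1 = N(2, 1) are purely illustrative choices, not taken from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mixture(n, theta, rng):
    """Draw n observations from (1 - theta) F0 + theta F1.
    F0 = N(0, 1) and F1 = N(2, 1) are illustrative stand-ins."""
    y = rng.random(n) < theta                     # latent coins y_i ~ Bi(1, theta)
    x = np.where(y, rng.normal(2.0, 1.0, n),      # heads: x_i ~ F1
                    rng.normal(0.0, 1.0, n))      # tails: x_i ~ F0
    return x

x_n = sample_mixture(10_000, theta=0.0, rng=rng)  # true model F0, i.e. theta0 = 0
```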

SLIDE 8

Bad behavior of the Gibbs sampler

Gibbs sampler

1 Set θ(0) ∈ Θ.
2 Generate yi ∼ Bi(1, pi) (i = 1, 2, . . . , n), where

    pi = θ(0)f1(xi) / ((1 − θ(0))f0(xi) + θ(0)f1(xi)),

    and count m = Σ_{i=1}^n yi. Here Fi(dx) = fi(x)dx.
3 Generate θ(1) ∼ Beta(α1 + m, α0 + n − m).
4 The empirical measure of (θ(0), θ(1), . . . , θ(N − 1)) is an estimator of the posterior distribution.

The next figure is a path of the Gibbs sampler when the true model is F0, that is, θ0 = 0.
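Before turning to the figure, here is a minimal sketch of one sweep of this Gibbs sampler, again with illustrative normal densities standing in for f0 and f1 (an assumption, not the talk's setting):

```python
import numpy as np
from scipy.stats import norm

def gibbs_step(theta, x, alpha1, alpha0, rng):
    """One sweep: impute labels y_i, then draw theta from its Beta conditional."""
    f0 = norm.pdf(x, 0.0, 1.0)                        # f0(x_i), illustrative
    f1 = norm.pdf(x, 2.0, 1.0)                        # f1(x_i), illustrative
    p = theta * f1 / ((1.0 - theta) * f0 + theta * f1)
    y = rng.random(x.size) < p                        # y_i ~ Bi(1, p_i)
    m = int(y.sum())                                  # m = sum_i y_i
    return rng.beta(alpha1 + m, alpha0 + x.size - m)  # theta ~ Beta(a1+m, a0+n-m)
```

Iterating gibbs_step N times yields the path (θ(0), . . . , θ(N − 1)) whose empirical measure estimates the posterior.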

SLIDE 9

Bad behavior of the Gibbs sampler

Gibbs sampler

[Figure: path of MCMC; x-axis: iteration (200–1000), y-axis: deviance (2–6)]

Figure: Plot of paths of MCMC methods for n = 10^4. The dashed line is a path from the Gibbs sampler and the solid line is from the MH algorithm.

SLIDE 10

Bad behavior of the Gibbs sampler

How to define efficiency

1 MCMC methods produce a complicated Markov chain.
2 We make an approximation of the MCMC method: we observe the behavior of MCMC methods as the sample size n → ∞.

SLIDE 11

Outline

1 Bad behavior of the Gibbs sampler
    Model description
    Gibbs sampler

2 Efficiency of MCMC
    What is MCMC?
    Consistency
    Degeneracy

3 MH algorithm converges faster
    MH proposal construction
    MH performance

SLIDE 12

Weak convergence of MCMC

What is MCMC?

Write s instead of θ.

1 For each observation x, the Gibbs sampler produces a path s = (s(0), s(1), . . .) in S^∞.
2 In other words, for x ∈ X, the Gibbs sampler defines a law G^x ∈ P(S^∞).
3 Therefore, a Gibbs sampler is a set of probability measures G = (G^x; x ∈ X). (Later, we will consider G as a random variable G(x) = G^x.)

Let ν̂_m(s) be the empirical measure of s(0), . . . , s(m − 1), and let ν^x be the target distribution for each x. We expect that d(ν̂_m(s), ν^x) → 0 in a certain sense.
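As a concrete reading of d(ν̂_m(s), ν^x) → 0, one can take d to be the Kolmogorov distance and compare the chain's empirical distribution with the target cdf. A toy sketch (the Beta target and i.i.d. "path" are stand-ins for illustration, not the mixture posterior):

```python
import numpy as np
from scipy.stats import beta, kstest

rng = np.random.default_rng(1)
s = rng.beta(2.0, 5.0, size=10_000)    # stand-in for a path s(0), ..., s(m-1)
target = beta(2.0, 5.0)
d_m = kstest(s, target.cdf).statistic  # sup_t |nu_hat_m((-inf,t]) - nu^x((-inf,t])|
print(d_m)                             # small for large m: d(nu_hat_m, nu^x) -> 0
```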

SLIDE 13

Weak convergence of MCMC

Consistency

1 We expect that as m → ∞,

    E^G(d(ν̂_m(s), ν)) → 0.

    But G and ν depend on x!

2 We expect that as m → ∞,

    E^{G^x}(d(ν̂_m(s), ν^x)) = o_P(1).

    But G^x and ν^x may depend on n!

SLIDE 14

Weak convergence of MCMC

Consistency

Definition

Let (M_n = (M^x_n); n ∈ N) be a sequence of MCMC. We call (M_n; n ∈ N) consistent for ν_n = (ν^x_n) if, for any m(n) → ∞,

    E^{M^x_n}(d(ν̂_{m(n)}(s), ν^x_n)) = o_{P_n}(1).

For a regular model, the Gibbs sampler has consistency under the scaling θ → n^{1/2}(θ − θ0).

SLIDE 15

Weak convergence of MCMC

Degeneracy

Definition

1 If a measure ω ∈ P(S^∞) satisfies

    ω({s; s(0) = s(1) = s(2) = · · · }) = 1,     (1)

    we call it degenerate.
2 We also call M degenerate (in P) if M^x is degenerate for a.s. x.
3 If M_n ⇒ M and M is degenerate, we call M_n degenerate in the limit.

The Gibbs sampler G_n for the mixture model is degenerate under the scaling θ → n^{1/2}θ if θ0 = 0 as n → ∞.

SLIDE 16

Weak convergence of MCMC

Degeneracy

In fact, G_n tends to a diffusion-process-type variable under the time scaling 0, 1, 2, . . . → 0, n^{−1/2}, 2n^{−1/2}, . . .! Under both space and time scaling, G^x_n is similar to the law of

    dS_t = (α1 + S_t Z_n − S_t^2 I) dt + S_t dB_t,

where Z_n ⇒ N(0, I) and I is the Fisher information matrix. If we take m(n)n^{−1/2} → ∞, the empirical measure converges to the posterior distribution. We call G_n n^{1/2}-weakly consistent.
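A minimal Euler–Maruyama sketch of this limiting diffusion in the scalar case, taking I = 1 and a single frozen draw Z ~ N(0, 1); the step size and the reflection at 0 are illustrative assumptions:

```python
import numpy as np

def limit_path(alpha1, T=10.0, dt=1e-3, s0=1.0, seed=2):
    """Euler-Maruyama for dS_t = (alpha1 + S_t Z - S_t^2) dt + S_t dB_t, I = 1."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal()                # Z ~ N(0, 1), fixed along the path
    s, path = s0, [s0]
    for _ in range(int(T / dt)):
        drift = (alpha1 + s * z - s * s) * dt
        diff = s * np.sqrt(dt) * rng.standard_normal()  # S_t dB_t increment
        s = max(s + drift + diff, 0.0)       # keep the scaled parameter nonnegative
        path.append(s)
    return np.array(path)
```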

SLIDE 17

Outline

1 Bad behavior of the Gibbs sampler
    Model description
    Gibbs sampler

2 Efficiency of MCMC
    What is MCMC?
    Consistency
    Degeneracy

3 MH algorithm converges faster
    MH proposal construction
    MH performance

SLIDE 18

MH algorithm converges faster

MH proposal construction

Construct a posterior distribution for another parametric family:

1 Fix Q ⊂ P(X).
2 For each θ, set

    q_{X|Θ}(dx|θ) := argmin_{q ∈ Q} d(p_{X|Θ}(dx|θ), q),

    where d is a certain metric, e.g. the Kullback-Leibler divergence (see the sketch below).
3 Calculate the posterior q^n_{Θ|X^n}(dθ|x^n).

Remark

We assume that we can generate θ ∼ q^n_{Θ|X^n}(dθ|x^n) on a PC.

This construction is similar to

1 the quasi-Bayes method (see e.g. Smith and Makov 1978),
2 the variational Bayes method (see e.g. Humphreys and Titterington 2000).
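To make step 2 concrete: if Q were, say, the normal family, the KL projection argmin_{q ∈ Q} KL(p(·|θ) ‖ q) reduces to moment matching. The normal choice of Q below is an illustration, not the family used in the talk:

```python
def kl_projection_normal(theta, mu0=0.0, mu1=2.0, sigma2=1.0):
    """Moment-matched normal q(.|theta), the KL(p||q) minimizer over normals,
    for the mixture (1 - theta) N(mu0, sigma2) + theta N(mu1, sigma2)."""
    mean = (1.0 - theta) * mu0 + theta * mu1
    var = sigma2 + theta * (1.0 - theta) * (mu1 - mu0) ** 2  # law of total variance
    return mean, var
```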

SLIDE 19

MH algorithm converges faster

MH proposal construction

Construct an independence-type Metropolis-Hastings algorithm with target distribution p^n_{Θ|X^n}(dθ|x^n):

Step 0 Generate θ(0) ∼ q^n_{Θ|X^n}(dθ|x^n). Go to Step 1.
Step i Generate θ*(i) ∼ q^n_{Θ|X^n}(dθ|x^n). Then set

    θ(i) = θ*(i)       with probability α(θ(i − 1), θ*(i)),
    θ(i) = θ(i − 1)    with probability 1 − α(θ(i − 1), θ*(i)).

    Go to Step i + 1.
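A sketch of this independence-type MH algorithm. Here log_p evaluates the log target density p^n_{Θ|X^n}, and sample_q / log_q draw from and evaluate the proposal q^n_{Θ|X^n}; all three are assumed available, as in the Remark on slide 18:

```python
import numpy as np

def independence_mh(log_p, sample_q, log_q, n_iter, rng):
    """Independence MH: alpha = min(1, p(th*) q(th) / (p(th) q(th*)))."""
    theta = sample_q()                        # Step 0: theta(0) ~ q
    chain = [theta]
    for _ in range(n_iter):
        prop = sample_q()                     # theta*(i) ~ q
        log_alpha = (log_p(prop) - log_q(prop)) - (log_p(theta) - log_q(theta))
        if np.log(rng.random()) < log_alpha:  # accept w.p. alpha(theta(i-1), theta*(i))
            theta = prop
        chain.append(theta)
    return np.array(chain)
```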

SLIDE 20

MH algorithm converges faster

MH performance (Normal): Mean squared error

[Figure: MCMC standard error vs. iteration (2000–10000); y-axis 0.0–2.0]

Figure: The dashed line is a path from the Gibbs sampler and the solid line is from the MH algorithm, for n = 10.

[Figure: same axes as the left panel]

Figure: The same figure as the left. The sample size is 10^2.

SLIDE 21

MH algorithm converges faster

Remarks

If F0 and F1 are similar, the Gibbs sampler becomes even worse. This fact can be verified by taking F_ε and letting ε → 0 instead of ε ≡ 1. In this case, we have to take m(n)εn^{−1/2} → ∞ (G_n is ε^{−1}n^{1/2}-weakly consistent). For other models, there are cases where m(n)n^{−1} → ∞ is required.

SLIDE 22

MH algorithm converges faster

Thank you!
