Multi-parameter models - Metropolis sampling (Applied Bayesian Statistics) - PowerPoint PPT Presentation



SLIDE 1

Multi-parameter models - Metropolis sampling

Applied Bayesian Statistics

  • Dr. Earvin Balderama

Department of Mathematics & Statistics Loyola University Chicago

October 17, 2017


Metropolis sampling Last edited October 2, 2017 by <ebalderama@luc.edu>

SLIDE 2

MCMC

Gibbs sampling

In Gibbs sampling, each parameter is updated by sampling from its full conditional distribution. This is possible with conjugate priors. However, if the prior is not conjugate, it is not obvious how to draw from the full conditional. For example, if Y ∼ Normal(µ, 1) and µ ∼ Beta(a, b), then

f(µ | Y) ∝ exp{−(1/2)(Y − µ)²} · µ^(a−1) (1 − µ)^(b−1),

which is not the kernel of any standard distribution.

For some likelihoods, there is no known conjugate prior, so direct sampling from the posterior may not be possible. In these cases we can use Metropolis sampling.
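To make the non-conjugacy concrete, here is a minimal Python sketch of the unnormalized log-posterior above, with illustrative values y = 0.6, a = b = 2 (assumptions, not from the slides). We can evaluate this density pointwise, but we cannot sample from it directly.

```python
import math

def log_post(mu, y=0.6, a=2.0, b=2.0):
    """Unnormalized log-posterior for Y ~ Normal(mu, 1), mu ~ Beta(a, b).

    The Beta prior restricts mu to (0, 1); outside that interval the
    density is zero, so the log-density is -inf.
    """
    if mu <= 0.0 or mu >= 1.0:
        return -math.inf
    log_lik = -0.5 * (y - mu) ** 2                                  # Normal(mu, 1) kernel
    log_prior = (a - 1) * math.log(mu) + (b - 1) * math.log(1 - mu)  # Beta(a, b) kernel
    return log_lik + log_prior

# The density can be evaluated anywhere in (0, 1), but it is not a
# recognizable Beta or Normal kernel, so there is no direct sampler.
print(log_post(0.5))
```

Because only pointwise evaluation is available, an accept/reject scheme like Metropolis is a natural fit.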

SLIDE 3

MCMC

Metropolis sampling

Metropolis sampling is a version of rejection sampling that performs a kind of random walk around the parameter space, accepting or rejecting each move based on a ratio of posterior densities: it always accepts a move to a location of higher density, but only sometimes accepts a move to a location of lower density. We can perform this Metropolis sampling algorithm for each parameter, one at a time.

To make the algorithm and the following pseudocode easier to read and understand (hopefully), we'll focus on updating only one parameter, θ.

SLIDE 4

MCMC

Metropolis algorithm

1. Set initial value θ(0).

2. For iteration t:

   1. Draw a candidate θ∗ from a symmetric proposal distribution, J(θ | θ(t−1)).

   2. Compute the Metropolis ratio, R = f(θ∗ | y) / f(θ(t−1) | y).

   3. Set θ(t) = θ∗ with acceptance probability min(R, 1), and θ(t) = θ(t−1) otherwise.

The sequence θ(1), θ(2), . . . converges to the target distribution, f(θ | y).
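The steps above can be sketched as a short random-walk Metropolis sampler in Python. The target here is the non-conjugate Normal-likelihood/Beta-prior posterior from slide 2, with illustrative values y = 0.6, a = b = 2 and proposal standard deviation s = 0.3 (all assumptions for illustration, not from the slides).

```python
import math
import random

def log_post(mu, y=0.6, a=2.0, b=2.0):
    """Unnormalized log-posterior: Y ~ Normal(mu, 1), mu ~ Beta(a, b)."""
    if mu <= 0.0 or mu >= 1.0:
        return -math.inf
    return -0.5 * (y - mu) ** 2 + (a - 1) * math.log(mu) + (b - 1) * math.log(1 - mu)

def metropolis(log_f, theta0, s=0.3, iters=5000, seed=1):
    """Random-walk Metropolis with a symmetric Normal(theta, s^2) proposal."""
    rng = random.Random(seed)
    theta = theta0
    draws = []
    for _ in range(iters):
        cand = rng.gauss(theta, s)              # step 2.1: symmetric proposal
        log_r = log_f(cand) - log_f(theta)      # step 2.2: log Metropolis ratio
        # Step 2.3: accept with probability min(R, 1), working on the log scale.
        if rng.random() < math.exp(min(0.0, log_r)):
            theta = cand
        draws.append(theta)                     # keep the current value either way
    return draws

draws = metropolis(log_post, theta0=0.5)
print(sum(draws) / len(draws))  # Monte Carlo estimate of the posterior mean
```

Working with log densities avoids numerical underflow, and rejected proposals still contribute a (repeated) draw, exactly as step 2.3 prescribes.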

SLIDE 5

MCMC

Metropolis algorithm

The proposal distribution must be symmetric, i.e., it must satisfy J(θa | θb) = J(θb | θa). This means that the probability of "jumping" from θa to θb is the same as if you started at θb and used the same jumping rule to jump to θa.

For example, if you propose a new candidate given the current value by

θ∗ | θ(t−1) ∼ Normal(θ(t−1), s_t²),

then you get the same density in the reverse direction:

θ(t−1) | θ∗ ∼ Normal(θ∗, s_t²).

The standard deviation of the proposal distribution, s_t, is a tuning parameter. What if s_t is too small? What if s_t is too large? Ideally s_t is tuned to give an acceptance probability between about 0.25 and 0.60.
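To see the effect of the tuning parameter, here is a small Python experiment (using a standard Normal target, an assumption for illustration) that tracks the acceptance rate for a too-small, a moderate, and a too-large step size:

```python
import math
import random

def acceptance_rate(s, iters=20000, seed=42):
    """Random-walk Metropolis on a standard Normal target; returns the
    fraction of proposed moves that were accepted for step size s."""
    rng = random.Random(seed)
    log_f = lambda t: -0.5 * t * t  # log density of Normal(0, 1), up to a constant
    theta, accepted = 0.0, 0
    for _ in range(iters):
        cand = rng.gauss(theta, s)
        if rng.random() < math.exp(min(0.0, log_f(cand) - log_f(theta))):
            theta = cand
            accepted += 1
    return accepted / iters

# Too small s: nearly every move is accepted, but the chain barely moves.
# Too large s: most moves are rejected, so the chain sticks in place.
for s in (0.05, 2.4, 50.0):
    print(s, round(acceptance_rate(s), 3))
```

Both extremes mix slowly for opposite reasons; the moderate step size lands in the target 0.25-0.60 acceptance range.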

SLIDE 6

MCMC

Metropolis-Hastings algorithm

The Metropolis-Hastings (MH) algorithm generalizes Metropolis to allow for asymmetric proposal distributions. For example, if θ ∈ [0, 1], then a reasonable candidate is

θ∗ | θ(t−1) ∼ Beta(10θ(t−1), 10(1 − θ(t−1))).

But what is the consequence of using an asymmetric proposal distribution? Now J(θa | θb) ≠ J(θb | θa), so we need to account for this asymmetry in the Metropolis ratio.

SLIDE 7

MCMC

Metropolis-Hastings algorithm

1. Set initial value θ(0).

2. For iteration t:

   1. Draw a candidate θ∗ from a proposal distribution, J(θ | θ(t−1)).

   2. Compute the Metropolis ratio, R = [f(θ∗ | y) / f(θ(t−1) | y)] · [J(θ(t−1) | θ∗) / J(θ∗ | θ(t−1))].

   3. Set θ(t) = θ∗ with acceptance probability min(R, 1), and θ(t) = θ(t−1) otherwise.

The sequence θ(1), θ(2), . . . converges to the target distribution, f(θ | y).

SLIDE 8

MCMC

Metropolis-Hastings algorithm

1. How is Metropolis similar to, and different from, Metropolis-Hastings?

2. How is Gibbs similar to, and different from, Metropolis? What if we take the proposal distribution to be the full conditional distribution? What would the Metropolis ratio be?

SLIDE 9

MCMC

Variants

You can combine Gibbs and Metropolis in the obvious way: sample directly from the full conditional when possible, and use a Metropolis update otherwise. Adaptive MCMC varies the proposal distribution throughout the chain. Hamiltonian Monte Carlo (HMC) uses the gradient of the posterior in the proposal distribution, and is the sampler used in Stan.

SLIDE 10

MCMC

Blocked Gibbs/Metropolis

If a group of parameters is highly correlated, convergence can be slow. One way to improve Gibbs sampling is a block update. For example, in linear regression we might iterate between sampling the block (β1, . . . , βp) and σ². Blocked Metropolis is possible too; for example, the proposal for (β1, . . . , βp) could be multivariate normal.
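A minimal Python sketch of a blocked Metropolis update, with an illustrative strongly correlated bivariate Normal target (an assumption): both coordinates are proposed at once from a multivariate normal step (here with independent components for simplicity), and the pair is accepted or rejected jointly.

```python
import math
import random

def log_target(b1, b2, rho=0.9):
    """Unnormalized log density of a bivariate Normal with correlation rho;
    the strong correlation is what makes one-at-a-time updates mix slowly."""
    return -(b1 * b1 - 2 * rho * b1 * b2 + b2 * b2) / (2 * (1 - rho * rho))

def blocked_metropolis(iters=20000, s=0.5, seed=3):
    rng = random.Random(seed)
    b1 = b2 = 0.0
    draws = []
    for _ in range(iters):
        # Propose the whole block at once: one multivariate normal step.
        c1, c2 = rng.gauss(b1, s), rng.gauss(b2, s)
        log_r = log_target(c1, c2) - log_target(b1, b2)
        if rng.random() < math.exp(min(0.0, log_r)):
            b1, b2 = c1, c2   # accept or reject the pair jointly
        draws.append((b1, b2))
    return draws

draws = blocked_metropolis()
m1 = sum(d[0] for d in draws) / len(draws)
print(round(m1, 2))  # marginal mean of the first coordinate; true value is 0
```

A proposal covariance matched to the target's correlation (rather than the independent steps used here) would move along the correlated ridge and mix faster still.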

SLIDE 11

MCMC

Summary

With the combination of Gibbs, Metropolis, and Metropolis-Hastings sampling, we can fit virtually any model. In some cases Bayesian computing is actually preferable to maximum likelihood analysis; in most cases it is slower. However, in the opinion of many, it is worth the wait for improved uncertainty quantification and interpretability. In all cases, it is important to carefully monitor convergence.
