Multi-parameter models - Metropolis sampling
Applied Bayesian Statistics
Dr. Earvin Balderama
Department of Mathematics & Statistics Loyola University Chicago
October 17, 2017
MCMC
In Gibbs sampling, each parameter is updated by sampling from its full conditional distribution. This is possible with conjugate priors. However, if the prior is not conjugate, it is not obvious how to make a draw from the full conditional. For example, if Y ∼ Normal(µ, 1) and µ ∼ Beta(a, b), then

f(µ | Y) ∝ exp{−(Y − µ)²/2} · µ^(a−1) (1 − µ)^(b−1), for 0 < µ < 1,

which is not a distribution we recognize.
For some likelihoods, there is no known conjugate prior. So direct sampling from the posterior may not be possible. In these cases we can use Metropolis sampling.
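As a quick illustration, here is one way to evaluate the unnormalized log posterior for this Normal-Beta example in Python (a minimal sketch; the data value y and hyperparameters a, b below are illustrative placeholders, not values from the slides):

```python
import numpy as np

# Unnormalized log-posterior for Y ~ Normal(mu, 1) with mu ~ Beta(a, b).
# y, a, b are placeholder values chosen for illustration.
def log_post(mu, y=0.8, a=2.0, b=2.0):
    if mu <= 0 or mu >= 1:  # the Beta prior restricts mu to (0, 1)
        return -np.inf
    log_lik = -0.5 * (y - mu) ** 2                               # Normal(mu, 1) kernel
    log_prior = (a - 1) * np.log(mu) + (b - 1) * np.log(1 - mu)  # Beta(a, b) kernel
    return log_lik + log_prior
```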
MCMC
Metropolis sampling is a version of rejection sampling. It performs a kind of random walk around the parameter space and either accepts or rejects each move based on a ratio of posterior densities: a move to a location of higher density is always accepted, while a move to a location of lower density is only sometimes accepted. We can perform this Metropolis sampling algorithm for each parameter in turn.
To make the algorithm and the following pseudocode easier to read and understand (hopefully), we'll focus on updating only one parameter, θ.
MCMC
1. Set initial value θ(0).
2. For iteration t:
   1. Draw a candidate θ* from a symmetric proposal distribution, J(θ* | θ(t−1)).
   2. Compute the Metropolis ratio, R = f(θ* | y) / f(θ(t−1) | y).
   3. Set θ(t) = θ* with acceptance probability min(R, 1); otherwise set θ(t) = θ(t−1).
The sequence θ(1), θ(2), . . . converges to the target distribution, f(θ | y).
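A minimal Python sketch of this algorithm, assuming a log_post function that returns the log of the unnormalized posterior density (such as the one defined earlier):

```python
import numpy as np

def metropolis(log_post, theta0, n_iter=10000, s=0.1, seed=0):
    """Random-walk Metropolis targeting f(theta | y) through its log density."""
    rng = np.random.default_rng(seed)
    draws = np.empty(n_iter)
    theta, lp = theta0, log_post(theta0)
    for t in range(n_iter):
        cand = rng.normal(theta, s)          # step 1: symmetric Normal proposal
        lp_cand = log_post(cand)
        # step 2: on the log scale, log R = log f(cand | y) - log f(theta | y);
        # accepting when log U < log R is equivalent to accepting w.p. min(R, 1)
        if np.log(rng.uniform()) < lp_cand - lp:
            theta, lp = cand, lp_cand        # step 3: accept the move
        draws[t] = theta                     # on rejection, repeat the current value
    return draws

# Example usage with the Normal-Beta log_post defined earlier:
# samples = metropolis(log_post, theta0=0.5)
```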
MCMC
The proposal distribution must be symmetric, i.e., it must satisfy J(θa | θb) = J(θb | θa). This means that the probability of "jumping" from θa to θb is the same as if you started at θb and used the same jumping rule to jump to θa.
For example, if you propose a new candidate given the current value by

θ* | θ(t−1) ∼ Normal(θ(t−1), s_t²),

we get the same density for the reverse jump,

θ(t−1) | θ* ∼ Normal(θ*, s_t²).

The standard deviation of the proposal distribution, s_t, is a tuning parameter. What if s_t is too small? Nearly every candidate is accepted, but the tiny steps explore the parameter space very slowly. What if s_t is too large? Most candidates land in low-density regions and are rejected, so the chain gets stuck. Ideally, s_t is tuned to give an acceptance rate between 0.25 and 0.60.
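One simple burn-in tuning heuristic (a common practice assumed here, not something prescribed on these slides) is to adjust s in batches until the acceptance rate lands in the target band, then freeze it for the kept draws:

```python
import numpy as np

def tune_s(log_post, theta0, s=0.1, batches=50, batch_size=100, seed=1):
    """Scale s up when acceptance is too high, down when too low (heuristic)."""
    rng = np.random.default_rng(seed)
    theta, lp = theta0, log_post(theta0)
    for _ in range(batches):
        accepts = 0
        for _ in range(batch_size):
            cand = rng.normal(theta, s)
            lp_cand = log_post(cand)
            if np.log(rng.uniform()) < lp_cand - lp:
                theta, lp, accepts = cand, lp_cand, accepts + 1
        rate = accepts / batch_size
        if rate > 0.60:      # steps too timid: almost everything accepted
            s *= 1.1
        elif rate < 0.25:    # steps too bold: the chain rarely moves
            s *= 0.9
    return s
```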
MCMC
The Metropolis-Hastings (MH) algorithm generalizes Metropolis to allow for asymmetric proposal distributions. For example, if θ ∈ [0, 1], then a reasonable candidate distribution is a Beta whose parameters depend on the current value, θ* | θ(t−1) ∼ Beta(·, ·).
But what is the consequence of using an asymmetric proposal distribution? Now J(θa | θb) ≠ J(θb | θa), so we need to account for this asymmetry in the Metropolis ratio.
MCMC
1. Set initial value θ(0).
2. For iteration t:
   1. Draw a candidate θ* from a proposal distribution, J(θ* | θ(t−1)).
   2. Compute the Metropolis-Hastings ratio,
      R = [f(θ* | y) / f(θ(t−1) | y)] · [J(θ(t−1) | θ*) / J(θ* | θ(t−1))].
   3. Set θ(t) = θ* with acceptance probability min(R, 1); otherwise set θ(t) = θ(t−1).
The sequence θ(1), θ(2), . . . converges to the target distribution, f(θ | y).
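A Python sketch of MH for θ ∈ (0, 1). The Beta(cθ(t−1), c(1 − θ(t−1))) proposal below, which has mean θ(t−1), is one common choice assumed here for illustration; the concentration c plays the role of the tuning parameter:

```python
import numpy as np
from scipy import stats

def metropolis_hastings(log_post, theta0, n_iter=10000, c=20.0, seed=0):
    """MH for theta in (0, 1) with the asymmetric proposal
    J(theta* | theta) = Beta(c * theta, c * (1 - theta))."""
    rng = np.random.default_rng(seed)
    draws = np.empty(n_iter)
    theta = theta0
    for t in range(n_iter):
        cand = rng.beta(c * theta, c * (1 - theta))
        # log R = posterior ratio plus the proposal correction J(old | new) / J(new | old)
        log_r = (log_post(cand) - log_post(theta)
                 + stats.beta.logpdf(theta, c * cand, c * (1 - cand))    # J(theta | cand)
                 - stats.beta.logpdf(cand, c * theta, c * (1 - theta)))  # J(cand | theta)
        if np.log(rng.uniform()) < log_r:
            theta = cand
        draws[t] = theta
    return draws
```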
MCMC
1. How is Metropolis similar to, and different from, Metropolis-Hastings?
2. How is Gibbs similar to, and different from, Metropolis? What if we take the proposal distribution to be the full conditional distribution? What would the Metropolis ratio be?
MCMC
You can combine Gibbs and Metropolis in the obvious way: sample directly from the full conditionals when possible, and use Metropolis updates otherwise (see the sketch below). Adaptive MCMC varies the proposal distribution throughout the chain. Hamiltonian Monte Carlo (HMC) uses the gradient of the posterior in the proposal distribution and is the method used in Stan.
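A minimal Metropolis-within-Gibbs sketch for a hypothetical two-parameter model, where θ1 has a tractable full conditional and θ2 does not. The two user-supplied functions are placeholders for illustration, not something specified on these slides:

```python
import numpy as np

def mwg(sample_theta1_fc, log_fc_theta2, theta1, theta2,
        n_iter=5000, s=0.1, seed=0):
    """sample_theta1_fc(theta2) draws theta1 from its full conditional;
    log_fc_theta2(theta2, theta1) is theta2's log full conditional.
    Both are model-specific placeholders supplied by the user."""
    rng = np.random.default_rng(seed)
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        theta1 = sample_theta1_fc(theta2)        # Gibbs step for theta1
        cand = rng.normal(theta2, s)             # Metropolis step for theta2
        log_r = log_fc_theta2(cand, theta1) - log_fc_theta2(theta2, theta1)
        if np.log(rng.uniform()) < log_r:
            theta2 = cand
        draws[t] = theta1, theta2
    return draws
```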
MCMC
If a group of parameters is highly correlated, convergence can be slow. One way to improve Gibbs sampling is a block update. For example, in linear regression we might iterate between sampling the block (β1, . . . , βp) and σ². Blocked Metropolis is possible too; for example, the proposal for (β1, . . . , βp) could be multivariate normal.
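A sketch of a single blocked Metropolis step for the regression coefficients, assuming a symmetric multivariate normal proposal with a user-chosen tuning covariance Sigma_prop (often a scaled estimate of the posterior covariance, an assumption here rather than a prescription from the slides):

```python
import numpy as np

def blocked_metropolis_step(log_post, beta, Sigma_prop, rng):
    """Propose the whole coefficient vector at once; accept or reject jointly."""
    cand = rng.multivariate_normal(beta, Sigma_prop)   # symmetric proposal
    if np.log(rng.uniform()) < log_post(cand) - log_post(beta):
        return cand    # accept the entire block together
    return beta        # reject: keep the current block
```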
MCMC
With the combination of Gibbs, Metropolis, and Metropolis-Hastings sampling, we can fit virtually any model. In some cases, Bayesian computing is actually preferable to maximum likelihood analysis. In most cases, Bayesian computing is slower. However, in the opinion of many, it is worth the wait for improved uncertainty quantification and interpretability. In all cases, it is important to carefully monitor convergence.