SLIDE 1

Markov Chain Monte Carlo (MCMC) Inference

Seung-Hoon Na, Chonbuk National University

SLIDE 2

Monte Carlo Approximation

  • Generate some (unweighted) samples from the posterior: x^s ∼ p(x | D)
  • Use these samples to compute any quantity of interest
    – Posterior marginal: p(x1 | D)
    – Posterior predictive: p(y | D)
    – …
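
As a concrete illustration (not from the slides; the Gaussian draws below merely stand in for real posterior samples), a minimal Python sketch of this recipe:

import numpy as np

# A minimal sketch (hypothetical numbers): pretend `samples` are S draws from
# the posterior p(x | D); posterior quantities become empirical averages.
rng = np.random.default_rng(0)
samples = rng.normal(loc=1.0, scale=0.5, size=10_000)   # stand-in for posterior draws

posterior_mean = samples.mean()            # approximates E[x | D]
tail_prob = (samples > 2.0).mean()         # approximates P(x > 2 | D)
print(posterior_mean, tail_prob)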

SLIDE 3

Sampling from standard distributions

  • Using the cdf

– Based on the inverse probability transform

SLIDE 4

Sampling from standard distributions: Inverse CDF

SLIDE 5

Sampling from standard distributions: Inverse CDF

  • Example: Exponential distribution
    – cdf: F(x) = 1 βˆ’ exp(βˆ’Ξ»x), so the inverse transform is x = F⁻¹(u) = βˆ’ln(1 βˆ’ u)/Ξ»
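
A minimal Python sketch of the inverse-CDF method for this exponential example (the rate Ξ» = 2 is an arbitrary choice for illustration):

import numpy as np

# Inverse-CDF sampling for Exponential(lam):
# if u ~ U(0,1), then x = -ln(1 - u)/lam has cdf F(x) = 1 - exp(-lam*x).
rng = np.random.default_rng(0)
lam = 2.0
u = rng.uniform(size=100_000)
x = -np.log(1.0 - u) / lam

print(x.mean())   # should be close to 1/lam = 0.5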

SLIDE 6

Sampling from a Gaussian: Box-Muller Method

  • Sample z1, z2 ∈ (βˆ’1, 1) uniformly
  • Discard pairs (z1, z2) that do not lie inside the unit circle, i.e., keep those satisfying z1² + z2² ≀ 1
  • The accepted points are uniform inside the circle: p(z) = (1/Ο€) I(z inside circle)
  • Define xi = zi √(βˆ’2 ln r² / r²), where r² = z1² + z2²; then x1, x2 are two independent N(0,1) samples

SLIDE 7

Sampling from a Gaussian: Box-Muller Method

  • Sampling in polar coordinates
    – Sample r so that it follows the radial density implied by the normal distribution (an exponential distribution in r²)
    – Sample ΞΈ uniformly; for a given r, the points (x, y) are then uniformly distributed on the circle of radius r

SLIDE 8

Sampling from a Gaussian: Box-Muller Method

  • π‘Œ ~ 𝑂 0,1

𝑍 ~ 𝑂 0,1

  • π‘ž 𝑦, 𝑧 =

1 2𝜌 exp βˆ’ 𝑦2+𝑧2 2

= 1

2𝜌 exp βˆ’ 𝑠2 2

  • 𝑠2~πΉπ‘¦π‘ž

1 2

  • πΉπ‘¦π‘ž πœ‡ =

βˆ’ log(𝑉 0,1 ) πœ‡

  • 𝑠 ∼

βˆ’2 log(𝑉 0,1 )

https://theclevermachine.wordpress.com/2012/09/11/sampling-from-the-normal- distribution-using-the-box-muller-transform/

𝑄 𝑉 ≀ 1 βˆ’ exp(βˆ’0.5𝑠2) =𝑄 𝑠2 ≀ βˆ’2π‘šπ‘π‘•π‘‰ = 𝑄 𝑠 ≀ βˆ’2π‘šπ‘π‘•π‘‰

  • 𝑨 = 𝑠2
  • 𝑄 𝑨 = 𝑄 𝑠

𝑒𝑠 𝑒𝑨 = 0.5 𝑄 𝑠

  • 𝑄 𝑠 = 2𝑄 𝑨 = π‘“π‘¦π‘ž βˆ’

𝑠2 2

slide-9
SLIDE 9

Sampling from a Gaussian: Box-Muller Method

  • 1. Draw u1, u2 ∼ U(0,1)
  • 2. Transform to polar representation:
    – r = √(βˆ’2 log u1),  ΞΈ = 2Ο€ u2
  • 3. Transform to Cartesian representation (see the sketch below):
    – x = r cos ΞΈ,  y = r sin ΞΈ
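
A minimal Python sketch of these three steps (the sample size is arbitrary):

import numpy as np

# Box-Muller transform: two uniforms in, two independent standard normals out.
rng = np.random.default_rng(0)
n = 100_000
u1, u2 = rng.uniform(size=n), rng.uniform(size=n)

r = np.sqrt(-2.0 * np.log(u1))      # radius: r^2 ~ Exp(1/2)
theta = 2.0 * np.pi * u2            # angle: uniform on [0, 2*pi)

x = r * np.cos(theta)               # two independent N(0,1) samples
y = r * np.sin(theta)
print(x.mean(), x.std(), y.std())   # approximately 0, 1, 1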

SLIDE 10

Rejection sampling

  • Used when the inverse cdf method cannot be applied
  • Create a proposal distribution q(x) we can sample from, which satisfies M q(x) β‰₯ pΜƒ(x)
    – M q(x) provides an upper envelope for the unnormalized target pΜƒ(x)
  • Sample x ∼ q(x)
  • Sample u ∼ U(0,1)
  • If u > pΜƒ(x) / (M q(x)), reject the sample
  • Otherwise accept it (a small sketch follows below)
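
A minimal Python sketch of this accept/reject loop, using an illustrative unnormalized target pΜƒ(x) = exp(βˆ’x⁴) and a N(0,1) proposal (both choices are assumptions made only for this example; M = 2.7 is a valid envelope constant for this particular pair):

import numpy as np

# Rejection sampling: draw from q, accept with probability p̃(x)/(M q(x)).
rng = np.random.default_rng(0)

def p_tilde(x):
    return np.exp(-x**4)                          # unnormalized target

def q_pdf(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)   # N(0,1) density

M = 2.7                                           # max of p̃/q is about 2.67 here
accepted = []
while len(accepted) < 10_000:
    x = rng.normal()                              # x ~ q
    u = rng.uniform()                             # u ~ U(0,1)
    if u <= p_tilde(x) / (M * q_pdf(x)):          # accept region
        accepted.append(x)

print(np.mean(accepted), np.var(accepted))        # samples now follow p(x) ∝ exp(-x^4)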
SLIDE 11

Rejection sampling

SLIDE 12

Rejection sampling: Proof

  • Why does rejection sampling work?
  • Let S be the set of accepted pairs (x, u), i.e., those with u ≀ pΜƒ(x) / (M q(x))
  • The cdf of the accepted points equals the cdf of the target p(x)

SLIDE 13

Rejection sampling

  • How efficient is this method?
  • P(accept) = ∫ q(x) ∫ I((x, u) ∈ S) du dx = ∫ (pΜƒ(x) / (M q(x))) q(x) dx = (1/M) ∫ pΜƒ(x) dx
    – using P(u ≀ t) = t for u ∼ U(0,1), applied with t = pΜƒ(x) / (M q(x)) ≀ 1
  • We need to choose M as small as possible while still satisfying M q(x) β‰₯ pΜƒ(x)

SLIDE 14

Rejection sampling: Example

  • Suppose we want to sample from a Gamma distribution
  • When the shape parameter is an integer, i.e., Ξ± = k, we can use the fact that a Gamma(k, Ξ») variable is the sum of k iid Exp(Ξ») variables
  • But for non-integer Ξ±, we cannot use this trick, and instead use rejection sampling

SLIDE 15

Rejection sampling: Example

  • Use a distribution we can sample from directly as the proposal q(x), where M q(x) β‰₯ p(x)
  • To obtain M as small as possible, examine the ratio p(x)/q(x):
  • This ratio attains its maximum at some point xβˆ—; setting M = p(xβˆ—)/q(xβˆ—) ensures p(x)/q(x) ≀ M, i.e., p(x)/(M q(x)) ≀ 1

SLIDE 16

Rejection sampling: Example

  • Proposal
SLIDE 17

Rejection Sampling: Application to Bayesian Statistics

  • Suppose we want to draw (unweighted) samples from the posterior
  • Use rejection sampling
    – Target distribution:
    – Proposal:
    – M:
  • Acceptance probability:
SLIDE 18

Adaptive rejection sampling

  • Upper bound the log density with a piecewise linear function

SLIDE 19

Importance Sampling

  • MC methods for approximating integrals of the form I = E[f(x)] = ∫ f(x) p(x) dx
  • The idea: draw samples x in regions which have high probability, p(x), but also where |f(x)| is large
  • E.g., define f(x) = I(x ∈ E) to estimate the probability of a rare event E
  • It may be easier to sample from a proposal q(x) than to sample from p(x) itself

SLIDE 20

Importance Sampling

  • Sample from any proposal q(x) to estimate the integral:
    – I = ∫ f(x) (p(x)/q(x)) q(x) dx β‰ˆ (1/S) Ξ£_s w_s f(x^s), where w_s = p(x^s)/q(x^s) are the importance weights (see the sketch below)
  • How should we choose the proposal?
    – Minimize the variance of the estimate Î
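
A minimal Python sketch under an assumed toy setup (target p = N(0,1), rare-event function f(x) = I(x > 3), proposal q = N(4,1); none of these choices come from the slides):

import numpy as np

# Importance sampling: sample from q, reweight by p/q.
rng = np.random.default_rng(0)

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

S = 100_000
xs = rng.normal(4.0, 1.0, size=S)                        # x^s ~ q
w = normal_pdf(xs, 0.0, 1.0) / normal_pdf(xs, 4.0, 1.0)  # importance weights p/q
f = (xs > 3.0).astype(float)

print(np.mean(w * f))    # approx P(x > 3) under N(0,1), about 1.35e-3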

SLIDE 21

Importance Sampling

  • By Jensen’s inequality, we have E[u²(x)] β‰₯ (E[u(x)])²
  • Setting u(x) = p(x)|f(x)| / q(x) = w(x)|f(x)|, we get the lower bound E_q[w(x)² f(x)²] β‰₯ (∫ p(x)|f(x)| dx)²
  • Equality holds when u(x) = p(x)|f(x)| / q(x) is constant, i.e., for the optimal proposal qβˆ—(x) ∝ p(x)|f(x)|

SLIDE 22

Importance Sampling: Handling unnormalized distributions

  • What if only the unnormalized target pΜƒ(x) and proposal qΜƒ(x) are available, without the normalization constants Zp, Zq?
  • Use the same set of samples to evaluate Zp/Zq:
    – w̃_s = pΜƒ(x^s) / qΜƒ(x^s) is the unnormalized importance weight
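
For reference, the resulting self-normalized estimator, filled in here in its standard form rather than recovered from the slide, is:

\frac{Z_p}{Z_q} \approx \frac{1}{S} \sum_{s=1}^{S} \tilde{w}_s,
\qquad
\mathbb{E}[f(\mathbf{x})] \approx \frac{\sum_{s=1}^{S} \tilde{w}_s \, f(\mathbf{x}^s)}{\sum_{s=1}^{S} \tilde{w}_s},
\qquad
\tilde{w}_s = \frac{\tilde{p}(\mathbf{x}^s)}{\tilde{q}(\mathbf{x}^s)}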

SLIDE 23

Ancestral sampling for PGM

  • Ancestral sampling

    – Sample the root nodes,
    – then sample their children,
    – then their children, etc.
    – This is okay when we have no evidence (a small sketch follows below)
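
A minimal Python sketch of ancestral sampling on a hypothetical three-node chain A β†’ B β†’ C with made-up CPTs (the same toy network is reused for likelihood weighting below):

import numpy as np

# Ancestral sampling: sample the root, then each child given its sampled parent.
rng = np.random.default_rng(0)

p_a = 0.6                                   # P(A=1), hypothetical
p_b_given_a = {0: 0.2, 1: 0.7}              # P(B=1 | A), hypothetical
p_c_given_b = {0: 0.1, 1: 0.9}              # P(C=1 | B), hypothetical

def ancestral_sample():
    a = int(rng.random() < p_a)                  # root node
    b = int(rng.random() < p_b_given_a[a])       # child, given the parent
    c = int(rng.random() < p_c_given_b[b])       # grandchild
    return a, b, c

samples = [ancestral_sample() for _ in range(10_000)]
print(np.mean([c for _, _, c in samples]))       # Monte Carlo estimate of P(C=1)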

SLIDE 24

Ancestral sampling

SLIDE 25

Ancestral sampling: Example

SLIDE 26

Rejection sampling for PGM

  • Now, suppose that we have some evidence and are interested in conditional queries:
  • Rejection sampling (local sampling)
    – Perform ancestral sampling,
    – but as soon as we sample a value that is inconsistent with an observed value, reject the whole sample and start again
  • However, rejection sampling is very inefficient (it requires very many samples) and cannot be applied to real-valued evidence

SLIDE 27

Importance Sampling for DGM: Likelihood weighting

  • Likelihood weighting
    – Sample unobserved variables as before, conditional on their parents; but do not sample observed variables; instead, just use their observed values
    – This is equivalent to using a proposal that samples each unobserved node from p(xt | x_pa(t)) and clamps each observed node to its observed value
    – The corresponding importance weight is w = ∏_{t ∈ E} p(xt | x_pa(t)), evaluated at the observed values, where E is the set of observed nodes (a small sketch follows below)
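
A minimal Python sketch of likelihood weighting on the same hypothetical A β†’ B β†’ C chain, with the evidence C = 1 clamped (the CPT numbers are still made up):

import numpy as np

# Likelihood weighting: sample unobserved nodes from their CPDs, clamp the
# evidence node, and weight each sample by p(evidence | sampled parents).
rng = np.random.default_rng(0)

p_a = 0.6
p_b_given_a = {0: 0.2, 1: 0.7}
p_c_given_b = {0: 0.1, 1: 0.9}

def weighted_sample(c_obs=1):
    a = int(rng.random() < p_a)
    b = int(rng.random() < p_b_given_a[a])
    w = p_c_given_b[b] if c_obs == 1 else 1.0 - p_c_given_b[b]   # weight from evidence node
    return a, w

draws = [weighted_sample() for _ in range(50_000)]
weights = np.array([w for _, w in draws])
a_vals = np.array([a for a, _ in draws])
print(np.sum(weights * a_vals) / np.sum(weights))   # estimate of P(A=1 | C=1)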

SLIDE 28

Likelihood weighting

SLIDE 29

Likelihood weighting

SLIDE 30

Sampling importance resampling (SIR)

  • Draw unweighted samples by first using importance sampling to obtain weighted samples
  • Then sample with replacement, where the probability that we pick x^s is its normalized weight w_s (a small sketch follows below)
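
A minimal Python sketch of SIR under an assumed toy setup (target N(0,1), broad proposal N(0,3)); the resampling step is just a weighted draw with replacement:

import numpy as np

# SIR: importance sampling followed by resampling in proportion to the weights.
rng = np.random.default_rng(0)

S = 50_000                                       # number of weighted samples
xs = rng.normal(0.0, 3.0, size=S)                # x^s ~ q (a broad proposal)
w_unnorm = np.exp(-0.5 * xs**2) / np.exp(-0.5 * (xs / 3.0)**2)  # p̃/q̃ up to constants
w = w_unnorm / w_unnorm.sum()                    # normalized weights

S_prime = 5_000                                  # typically S' << S
resampled = rng.choice(xs, size=S_prime, replace=True, p=w)
print(resampled.mean(), resampled.std())         # approximately 0 and 1 (target N(0,1))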

SLIDE 31

Sampling importance resampling (SIR)

  • Application: Bayesian inference
    – Goal: draw samples from the posterior
    – Unnormalized posterior:
    – Proposal:
    – Normalized weights:
    – Then, we use SIR to sample from the posterior
  • Typically S′ << S

SLIDE 32

Particle Filtering

  • Simulation-based algorithm for recursive Bayesian inference
    – Sequential importance sampling with resampling

SLIDE 33

Markov Chain Monte Carlo (MCMC)

  • 1) Construct a Markov chain on the state space X
    – whose stationary distribution is the target density pβˆ—(x) of interest
  • 2) Perform a random walk on the state space
    – in such a way that the fraction of time we spend in each state x is proportional to pβˆ—(x)
  • 3) By drawing (correlated!) samples x0, x1, x2, . . . from the chain, perform Monte Carlo integration wrt pβˆ—

SLIDE 34

Markov Chain Monte Carlo (MCMC) vs. Variational inference

  • Variational inference
    – (1) for small to medium problems, it is usually faster;
    – (2) it is deterministic;
    – (3) it is easy to determine when to stop;
    – (4) it often provides a lower bound on the log likelihood.

SLIDE 35

Markov Chain Monte Carlo (MCMC) vs. Variational inference

  • MCMC
    – (1) it is often easier to implement;
    – (2) it is applicable to a broader range of models, such as models whose size or structure changes depending on the values of certain variables (e.g., as happens in matching problems), or models without nice conjugate priors;
    – (3) sampling can be faster than variational methods when applied to really huge models or datasets.

SLIDE 36

Gibbs Sampling

  • Sample each variable in turn, conditioned on the values of all the other variables in the distribution
  • For example, if we have D = 3 variables:
    – x1^(s+1) ∼ p(x1 | x2^(s), x3^(s))
    – x2^(s+1) ∼ p(x2 | x1^(s+1), x3^(s))
    – x3^(s+1) ∼ p(x3 | x1^(s+1), x2^(s+1))
  • Need to derive the full conditional p(xi | xβˆ’i) for each variable i (a small sketch follows below)
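
A minimal Python sketch of Gibbs sampling on an assumed toy target, a bivariate Gaussian with correlation ρ, whose full conditionals are themselves Gaussian:

import numpy as np

# Gibbs sampling: alternately resample each variable from its full conditional.
rng = np.random.default_rng(0)
rho = 0.8
S = 20_000

x1, x2 = 0.0, 0.0
samples = np.empty((S, 2))
for s in range(S):
    # full conditional p(x1 | x2) = N(rho * x2, 1 - rho^2)
    x1 = rng.normal(rho * x2, np.sqrt(1 - rho**2))
    # full conditional p(x2 | x1) = N(rho * x1, 1 - rho^2)
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho**2))
    samples[s] = (x1, x2)

print(np.corrcoef(samples[1000:].T)[0, 1])   # roughly rho after burn-in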
SLIDE 37

Gibbs Sampling: Ising model

  • Full conditional
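
The formula itself is not in the extracted text; for an Ising model with states x_t ∈ {βˆ’1, +1} and coupling strength J, the standard full conditional is:

p(x_t = +1 \mid \mathbf{x}_{-t}, \theta)
 = \frac{\exp(J \eta_t)}{\exp(J \eta_t) + \exp(-J \eta_t)}
 = \mathrm{sigm}(2 J \eta_t),
\qquad
\eta_t = \sum_{s \in \mathrm{nbr}(t)} x_s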
SLIDE 38

Gibbs Sampling: Ising model

  • Combine an Ising prior with a local evidence term ψt(xt)
SLIDE 39

Gibbs Sampling: Ising model

  • Ising prior with Wij = J = 1
    – Gaussian noise model with Οƒ = 2

SLIDE 40

Gibbs Sampling: Ising model

SLIDE 41

Gaussian Mixture Model (GMM)

  • Likelihood function
  • Factored conjugate prior
SLIDE 42

Gaussian Mixture Model (GMM): Variational EM

  • Standard VB approximation to the posterior:
  • Mean field approximation
  • VBEM results in the optimal form of q(z, θ):
SLIDE 43

[Ref] Gaussian Models

  • Marginals and conditionals of a Gaussian model
  • x ∼ N(ΞΌ, Ξ£)
  • Marginals:
  • Posterior conditionals:

SLIDE 44

[Ref] Gaussian Models

  • Linear Gaussian systems
  • The posterior:
  • The normalization constant:
SLIDE 45

[Ref] Gaussian Models: Posterior distribution of μ

  • The likelihood wrt μ:
  • The prior:
  • The posterior:
SLIDE 46

[Ref] Gaussian Models: Posterior distribution of Σ

  • The likelihood as a function of Σ:
  • The conjugate prior: the inverse Wishart distribution
  • The posterior:

SLIDE 47

Gaussian Mixture Model (GMM): Gibbs Sampling

  • Full joint distribution:
SLIDE 48

Gaussian Mixture Model (GMM): Gibbs Sampling

  • The full conditionals:
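
The formulas are not reproduced in the extracted text; assuming the conjugate priors of the preceding slides, the standard full conditionals have the form:

p(z_i = k \mid x_i, \boldsymbol{\mu}, \boldsymbol{\Sigma}, \boldsymbol{\pi}) \propto \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)

p(\boldsymbol{\pi} \mid \mathbf{z}) = \mathrm{Dir}(\alpha_1 + N_1, \dots, \alpha_K + N_K),
\qquad
N_k = \sum_i \mathbb{I}(z_i = k)

and each (ΞΌk, Ξ£k) is resampled from its Gaussian / inverse-Wishart conditional computed from the data currently assigned to cluster k, using the updates on the [Ref] Gaussian Models slides.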
SLIDE 49

Gaussian Mixture Model (GMM): Gibbs Sampling

SLIDE 50

Label switching problem

  • Unidentifiability
    – The parameters of the model θ, and the indicator functions z, are unidentifiable
    – We can arbitrarily permute the hidden labels without affecting the likelihood
  • Monte Carlo average of the samples for a cluster:
    – Samples for cluster 1 may be mixed up with samples for cluster 2
  • Label switching problem
    – If we could average over all modes, we would find that E[μk | D] is the same for all k

SLIDE 51

Collapsed Gibbs sampling

  • Analytically integrate out some of the unknown quantities, and just sample the rest
  • Suppose we sample z and integrate out ΞΈ
  • Thus the ΞΈ parameters do not participate in the Markov chain
  • Consequently we can draw conditionally independent samples of ΞΈ given each sampled z; this will have much lower variance than samples drawn from the joint state space
  • This is known as Rao-Blackwellisation

SLIDE 52

Collapsed Gibbs sampling

  • Theorem 24.2.1 (Rao-Blackwell)
    – Let z and θ be dependent random variables, and f(z, θ) be some scalar function. Then:
  • The variance of the estimate created by analytically integrating out θ will always be lower (or rather, will never be higher) than the variance of a direct MC estimate.
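
The inequality the theorem asserts (filled in here via the law of total variance) is:

\mathrm{var}_{z,\theta}\big[ f(z, \theta) \big] \;\ge\; \mathrm{var}_{z}\big[ \mathbb{E}_{\theta}[ f(z, \theta) \mid z ] \big]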

SLIDE 53

Collapsed Gibbs sampling

After integrating out the parameters.

SLIDE 54

Collapsed Gibbs: GMM

  • Analytically integrate out the model parameters μk, Σk and π, and just sample the indicators z
    – Once we integrate out π, all the zi nodes become inter-dependent
    – Once we integrate out θk, all the xi nodes become inter-dependent
  • Full conditionals:
SLIDE 55

Collapsed Gibbs: GMM

  • Suppose a symmetric prior of the form π ∼ Dir(α), with αk = α/K
  • Integrating out π, using the Dirichlet-multinomial formula:
    – p(zi = k | z−i, α) = (Nk,−i + α/K) / (N + α − 1)
    – where Nk,−i is the number of points assigned to cluster k excluding point i, so Σ_{k=1}^{K} Nk,−i = N − 1

SLIDE 56

[Ref] Dirichlet-multinomial

  • The marginal likelihood for the Dirichlet-multinoulli model:
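
For reference, this marginal likelihood has the standard form (filled in here, not recovered from the slide):

p(\mathcal{D} \mid \boldsymbol{\alpha})
 = \frac{B(\mathbf{N} + \boldsymbol{\alpha})}{B(\boldsymbol{\alpha})}
 = \frac{\Gamma(\sum_k \alpha_k)}{\Gamma(N + \sum_k \alpha_k)}
   \prod_k \frac{\Gamma(N_k + \alpha_k)}{\Gamma(\alpha_k)}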

SLIDE 57

Collapsed Gibbs: GMM

  • All the data assigned to cluster k except for xi
  • To compute this predictive term, we remove xi’s statistics from its current cluster (namely cluster zi), and then evaluate xi under each cluster’s posterior predictive. Once we have picked a new cluster, we add xi’s statistics to this new cluster

SLIDE 58

Collapsed Gibbs: GMM

  • Collapsed Gibbs sampler for a mixture model
SLIDE 59

Collapsed Gibbs vs. Vanilla Gibbs

A mixture of K = 4 two-dimensional Gaussians applied to N = 300 data points, with 20 different random initializations

SLIDE 60

Collapsed Gibbs vs. Vanilla Gibbs

Log probability averaged over 100 different random initializations. Solid line: the median; thick dashed: the 0.25 and 0.75 quantiles; thin dashed: the 0.05 and 0.95 quantiles.

SLIDE 61

Metropolis Hastings algorithm

  • At each step, propose to move from the current state x to a new state x′ with probability q(x′ | x)
    – q: the proposal distribution
    – E.g., a Gaussian centered on the current state (random walk Metropolis algorithm), or a proposal that ignores the current state (independence sampler)

SLIDE 62

Metropolis Hastings algorithm

  • The acceptance probability if the proposal is symmetric:
  • The acceptance probability if the proposal is asymmetric:
    – Need the Hastings correction
  • Both are defined with respect to the given target distribution pβˆ—
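
The two acceptance probabilities referred to above, written out in their standard MH forms:

\alpha_{\mathrm{symmetric}} = \min\left(1, \frac{p^*(x')}{p^*(x)}\right),
\qquad
\alpha_{\mathrm{asymmetric}} = \min\left(1, \frac{p^*(x') \, q(x \mid x')}{p^*(x) \, q(x' \mid x)}\right)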

SLIDE 63

Metropolis Hastings algorithm

  • When evaluating Ξ±, we only need to know the unnormalized density pΜƒ (a small sketch follows below)
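
A minimal Python sketch of random-walk MH on an assumed toy target (an unnormalized mixture of two 1-D Gaussians; all numbers are illustrative, chosen only to echo the examples that follow):

import numpy as np

# Random-walk Metropolis-Hastings: only the unnormalized target p̃ is needed.
rng = np.random.default_rng(0)

def p_tilde(x):
    # unnormalized mixture of N(-20, 10^2) and N(20, 10^2), weights 0.3 / 0.7
    return 0.3 * np.exp(-0.5 * (x + 20) ** 2 / 100) + 0.7 * np.exp(-0.5 * (x - 20) ** 2 / 100)

def mh(n_steps=50_000, step_sd=8.0):
    x = 0.0
    chain = np.empty(n_steps)
    for s in range(n_steps):
        x_prop = x + step_sd * rng.normal()          # symmetric Gaussian proposal
        alpha = min(1.0, p_tilde(x_prop) / p_tilde(x))
        if rng.random() < alpha:                     # accept with probability alpha
            x = x_prop
        chain[s] = x
    return chain

chain = mh()
print(chain.mean())   # should approach 0.3*(-20) + 0.7*20 = 8 if the chain mixes well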

SLIDE 64

Metropolis Hastings algorithm

SLIDE 65

Metropolis Hastings algorithm

  • Gibbs sampling is a special case of MH, using the proposal q(x′ | x) = p(x′i | xβˆ’i) I(x′−i = xβˆ’i)
  • Then, the acceptance probability is 100%
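
Filled in for completeness, the standard calculation behind this claim (with x′−i = x−i under the Gibbs proposal):

\alpha
 = \frac{p(x') \, q(x \mid x')}{p(x) \, q(x' \mid x)}
 = \frac{p(x_i' \mid x_{-i}') \, p(x_{-i}') \, p(x_i \mid x_{-i}')}{p(x_i \mid x_{-i}) \, p(x_{-i}) \, p(x_i' \mid x_{-i})}
 = 1

Since x′−i = x−i, every factor in the numerator cancels with one in the denominator, so the proposal is always accepted.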
SLIDE 66

Metropolis Hastings: Example

An example of the Metropolis Hastings algorithm for sampling from a mixture of two 1D Gaussians

MH sampling results using a Gaussian proposal with variance v = 1

SLIDE 67

Metropolis Hastings: Example

An example of the Metropolis Hastings algorithm for sampling from a mixture of two 1D Gaussians

MH sampling results using a Gaussian proposal with variance v = 500

SLIDE 68

Metropolis Hastings: Example

An example of the Metropolis Hastings algorithm for sampling from a mixture of two 1D Gaussians

MH sampling results using a Gaussian proposal with variance v = 8

SLIDE 69

Metropolis Hastings: Gaussian Proposals

  • 1) an independence proposal
  • 2) a random walk proposal

MH for binary logistic regression

SLIDE 70

Metropolis Hastings: Example

  • MH for binary logistic regression

Joint posterior of the parameters

SLIDE 71

Metropolis Hastings: Example

  • MH for binary logistic regression
    – Initialize the chain at the mode, computed using IRLS
    – Use the random walk Metropolis sampler

SLIDE 72

Metropolis Hastings: Example

  • MH for binary logistic regression