


Markov chain Monte Carlo sampling

SPiNCOM reading group

  • Jun. 10th, 2016

Dimitris Berberidis


Problem statement - Motivation


 Goal: Draw samples from a given pdf p(x)

 Impact of sampling:

  • Bayesian inference (θ: unknowns, y: data), our focus
     - Normalization
     - Marginalization
     - Expectation
  • Optimization: non-convex multimodal objectives
  • Statistical mechanics
  • Penalized likelihood model selection
  • Simulation of physical systems


Roadmap


 Motivation
 Basic Monte Carlo
 Rejection Sampling
 Markov chain Monte Carlo

  • Metropolis-Hastings
  • Gibbs sampling

 Importance sampling

  • Relation to Rejection Sampling
  • Sequential Importance Sampling (Particle Filtering)

 Conclusions

  • C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan, “An Introduction to MCMC for Machine Learning,” Machine Learning, vol. 50, pp. 5–43, Jan. 2003.


The Monte Carlo principle

 Draw N samples x^(i) i.i.d. from p(x)
 Approximate p(x) with the empirical distribution p_N(x) = (1/N) Σ_i δ(x − x^(i))
 Approximate integrals I(f) = ∫ f(x) p(x) dx with tractable sums I_N(f) = (1/N) Σ_i f(x^(i))
 I_N(f) is unbiased for finite N, with I_N(f) → I(f) almost surely as N → ∞
 Approximate the maximum of p(x) as the sample x^(i) of highest density
 Challenge: What if p(x) does not have a standard form (e.g. Gaussian)?
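The principle above can be sketched in a few lines of Python (an illustrative sketch, not part of the original slides; the standard-normal target and the test function f(x) = x² are assumed for the demo):

```python
import random

def mc_expectation(f, sampler, n=100_000):
    """Approximate E[f(X)] = ∫ f(x) p(x) dx by the sample mean over i.i.d. draws."""
    return sum(f(sampler()) for _ in range(n)) / n

random.seed(0)
# E[X^2] = 1 for X ~ N(0, 1); the error shrinks at rate O(1/sqrt(N))
est = mc_expectation(lambda x: x * x, lambda: random.gauss(0.0, 1.0))
```

The same sample average works for any f, which is the appeal of the Monte Carlo principle.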


Rejection Sampling

 Instead of p(x), draw i.i.d. samples from an “easy” proposal q(x)
 Proposal pdf should satisfy p(x) ≤ M q(x) for some constant M < ∞
 Rejection Sampling algorithm: draw x ~ q(x) and u ~ U(0, 1); accept x if u < p(x) / (M q(x))
 Accepted samples are distributed according to p(x)
 Severe limitation in practice: M can be too large (expected acceptance rate is 1/M)
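A minimal sketch of the algorithm (illustrative; the Beta(2,2)-shaped target and uniform proposal are assumed choices, not from the slides):

```python
import random

def rejection_sample(p_tilde, q_sample, q_pdf, M, n):
    """Draw n samples from p ∝ p_tilde using proposal q, with p_tilde(x) <= M * q_pdf(x)."""
    out = []
    while len(out) < n:
        x = q_sample()
        u = random.uniform(0.0, M * q_pdf(x))
        if u < p_tilde(x):          # accept with probability p_tilde(x) / (M q(x))
            out.append(x)
    return out

random.seed(1)
# Target p(x) ∝ x(1 - x) on [0, 1] (a Beta(2,2)); uniform proposal, envelope M = 1/4
xs = rejection_sample(lambda x: x * (1 - x), lambda: random.random(), lambda x: 1.0, 0.25, 20_000)
mean = sum(xs) / len(xs)            # Beta(2,2) has mean 1/2
```

Here M is tight, so about 2/3 of proposals are accepted; in high dimensions a valid M is typically huge and almost everything is rejected.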


Basics of Markov chains

 A discrete stochastic process x_t is a Markov chain (MC) if p(x_t | x_{t−1}, …, x_1) = p(x_t | x_{t−1})
 An MC is homogeneous if the transition matrix T = [T(x_j | x_i)] is time invariant
 After t steps, the probability of each state is given by the row vector μ_1^T T^{t−1}
 An MC reaches a stationary distribution π if π^T T = π^T
 An MC converges to its stationary distribution (from any initialization) if it is:

  • Irreducible: all states can be visited (transition graph connected)
  • Aperiodic: does not get trapped in cycles
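Convergence to the stationary distribution can be checked numerically for a small homogeneous chain (the 2-state transition matrix below is a toy choice for illustration):

```python
def step_distribution(pi, P):
    """One step of the chain: pi_next[j] = sum_i pi[i] * P[i][j]."""
    n = len(pi)
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

# Irreducible, aperiodic 2-state chain; its stationary distribution solves pi = pi P
P = [[0.9, 0.1],
     [0.2, 0.8]]
pi = [1.0, 0.0]                      # arbitrary initial distribution
for _ in range(200):
    pi = step_distribution(pi, P)
# pi converges to (2/3, 1/3) regardless of the starting point
```

The second eigenvalue of P is 0.7, so the distance to stationarity shrinks geometrically at rate 0.7 per step.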

Markov chain Monte Carlo

 Goal: Construct an MC with the target p(x) as its stationary distribution
 Sufficient condition: the detailed balance condition (DBC), p(x) T(x' | x) = p(x') T(x | x')
 Continuous state spaces

  • Transition kernel K(x' | x) replaces the transition matrix
  • The DBC remains the same

 Run the MC to convergence and obtain non-i.i.d. samples from p(x)
 Design the kernel to achieve fast convergence (i.e. small mixing time)


The Metropolis-Hastings sampler

 MH transition kernel: K(x' | x) = q(x' | x) A(x, x') + δ(x' − x) r(x)

  • Acceptance probability: A(x, x') = min{ 1, [p(x') q(x | x')] / [p(x) q(x' | x)] }
  • Rejection probability: r(x) = 1 − ∫ q(x' | x) A(x, x') dx'

 K satisfies the DBC, hence admits p(x) as stationary distribution
 MH is always aperiodic; irreducible if the support of q includes the support of p
 Special cases of MH

  • Independent sampler: q(x' | x) = q(x')
  • Metropolis sampler: symmetric proposal q(x' | x) = q(x | x'), so A(x, x') = min{1, p(x') / p(x)}

 Scale of p(x) not needed! (the normalizing constant cancels in the acceptance ratio)
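A random-walk Metropolis sketch (illustrative, not from the slides): only the unnormalized log-density is supplied, since the normalizing constant cancels in the acceptance ratio. The N(3, 1) target and step size are assumed demo choices.

```python
import random, math

def metropolis(log_p_tilde, x0, step, n, burn=1000):
    """Random-walk Metropolis: symmetric Gaussian proposal,
    accept w.p. min(1, p(x') / p(x)) computed in log space."""
    x, lp = x0, log_p_tilde(x0)
    samples = []
    for t in range(n + burn):
        xp = x + random.gauss(0.0, step)
        lpp = log_p_tilde(xp)
        if math.log(random.random()) < lpp - lp:   # accept
            x, lp = xp, lpp
        if t >= burn:                              # discard burn-in
            samples.append(x)
    return samples

random.seed(2)
# Unnormalized N(3, 1) target: log p_tilde(x) = -(x - 3)^2 / 2
xs = metropolis(lambda x: -(x - 3.0) ** 2 / 2.0, 0.0, 1.0, 50_000)
mean = sum(xs) / len(xs)
```

Note the samples are correlated (non-i.i.d.), so the effective sample size is smaller than len(xs).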


Example of MH sampling

 Choice of proposal distribution is critical!
 Three different Gaussians as proposal distributions


MCMC with mixture of transition kernels

 Intuition  Local random walk reduces the number of rejections  Global proposal helps discover other modes  Key property  Let and trans. kernels converge  also converges to


Example of MH with mixture of Kernels

(figure: target pdf and mixture-of-kernels proposal)


Experiment with mixture of Kernels


Simulated Annealing

 Simple modification of the MH algorithm for global optimization of p(x)
 Simulates a non-homogeneous MC whose invariant distribution at iteration i is proportional to p^{1/T_i}(x), with a decreasing cooling schedule T_i → 0
 Intuition: p^{1/T_i}(x) concentrates around the global maxima of p(x) as T_i → 0
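A minimal sketch of simulated annealing for maximizing a bimodal objective (the logarithmic cooling schedule, step size, and objective are assumed demo choices; the run deliberately starts at the local maximum):

```python
import random, math

def simulated_annealing(f, x0, n_iter=50_000, step=1.0):
    """Maximize f via MH targeting p_i(x) ∝ exp(f(x) / T_i) with temperature T_i -> 0."""
    x, best = x0, x0
    for t in range(1, n_iter + 1):
        T = 3.0 / math.log(t + 2)            # slow logarithmic cooling (assumed schedule)
        xp = x + random.gauss(0.0, step)     # symmetric random-walk proposal
        if math.log(random.random()) < (f(xp) - f(x)) / T:
            x = xp
        if f(x) > f(best):                   # keep the best point ever visited
            best = x
    return best

random.seed(3)
# Bimodal objective: global max at x = 2 (value 1.0), local max at x = -2 (value 0.5)
obj = lambda x: math.exp(-(x - 2) ** 2) + 0.5 * math.exp(-(x + 2) ** 2)
x_star = simulated_annealing(obj, -2.0)      # starts at the *local* maximum
```

At high temperature downhill moves are accepted freely, which is what lets the chain escape the basin at x = -2 before the cooling freezes it near the global maximum.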


Experiment with Simulated Annealing


Cycles of MH kernels

 Multivariate state is split into blocks

  • Each block is updated separately

 Block correlated variables together for fast convergence
 Transition kernel: a cycle of block-wise MH updates
 Trade-off on block size

  • Small block size: Chain takes long time to explore space
  • Large block size: Acceptance probability is small

Gibbs sampling

 For x = (x_1, …, x_n), assume the full conditionals p(x_j | x_{−j}) are known and easy to sample
 Gibbs sampling uses these full conditionals as proposal distributions, updating one component at a time
 Acceptance probability = 1 (every proposal is accepted)
 Combined with MH steps when a conditional is not easy to sample
 To sample Markov networks, condition on the ``Markov blanket''
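A Gibbs sampler sketch for a case where both full conditionals are known in closed form: a zero-mean, unit-variance bivariate Gaussian with correlation ρ (an assumed demo target, not from the slides):

```python
import random

def gibbs_bivariate_normal(rho, n, burn=500):
    """Gibbs sampler for a zero-mean bivariate Gaussian with correlation rho.
    Full conditionals are Gaussian: x1 | x2 ~ N(rho * x2, 1 - rho^2), and symmetrically."""
    s = (1.0 - rho * rho) ** 0.5
    x1 = x2 = 0.0
    samples = []
    for t in range(n + burn):
        x1 = random.gauss(rho * x2, s)   # draw from p(x1 | x2): always accepted
        x2 = random.gauss(rho * x1, s)   # draw from p(x2 | x1)
        if t >= burn:
            samples.append((x1, x2))
    return samples

random.seed(4)
xs = gibbs_bivariate_normal(0.8, 50_000)
corr = sum(a * b for a, b in xs) / len(xs)   # E[x1 x2] = rho for unit-variance marginals
```

The stronger the correlation, the slower the component-wise chain mixes, which is the motivation for blocking correlated variables together.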


Importance sampling - Basics

 Key idea: sample from a proposal q(x) and weight each sample with w(x) = p(x) / q(x)
 Draw x^(i) i.i.d. from q(x) to obtain I_N(f) = (1/N) Σ_i w(x^(i)) f(x^(i))
 Target p(x) is approximated by the weighted empirical distribution
 Estimate is unbiased, with variance determined by the mismatch between p and q
 If the scale of p(x) is unknown, compute weights from the unnormalized density and normalize them to sum to one (self-normalized IS)
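A self-normalized IS sketch (illustrative; the unnormalized N(1, 1) target, N(0, 2) proposal, and f(x) = x are assumed demo choices). Working in log space and subtracting the max weight avoids overflow:

```python
import random, math

def self_normalized_is(f, log_p_tilde, q_sample, log_q, n):
    """Self-normalized importance sampling: weight draws from q by w ∝ p_tilde / q,
    then normalize, so the scale of the target never needs to be known."""
    xs = [q_sample() for _ in range(n)]
    logw = [log_p_tilde(x) - log_q(x) for x in xs]
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]        # stabilized unnormalized weights
    return sum(wi * f(xi) for wi, xi in zip(w, xs)) / sum(w)

random.seed(5)
# Target: unnormalized N(1, 1); proposal: N(0, 2). Estimate E[X] = 1.
est = self_normalized_is(
    lambda x: x,
    lambda x: -(x - 1.0) ** 2 / 2.0,             # log target, up to a constant
    lambda: random.gauss(0.0, 2.0),
    lambda x: -x * x / 8.0,                      # log N(0, 2), up to a constant
    100_000,
)
```

Both log densities may be specified up to additive constants, since the constants cancel in the normalized weights.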


Efficiency of importance sampling

 Proposal pdf selected to minimize the variance of the estimator
 Variance lower bound obtained using Jensen's inequality
 Optimal importance distribution: q*(x) ∝ |f(x)| p(x)
 IS can be super-efficient: lower variance than sampling from p(x) itself!

  • q*(x) is generally difficult to sample from

RS as a special case of IS

 Recall the rejection sampling method  Define a new target distribution in  IS with target and proposal  Equivalent to RS if samples are used to obtain

  • IS generally (and provably) more efficient for this purpose
  • Y. Chen, “Another look at rejection sampling through importance sampling,” Statistic & Probability

Letters, pp. 277-283, May 2005.


Hidden Markov model

 The hidden Markov model

  • State transition model: x_t ~ p(x_t | x_{t−1})
  • Observation model: y_t ~ p(y_t | x_t)

 Goal of filtering: approximate p(x_t | y_{1:t}) and expectations under it


Sequential Importance Sampling (particle filtering)

 Target density: the joint posterior p(x_{1:t} | y_{1:t})
 Importance density: q(x_{1:t} | y_{1:t})
 How to sample from q sequentially? At time t we have the weighted particles {x_{1:t−1}^(i), w_{t−1}^(i)}
 Sample x_t^(i) ~ q(x_t | x_{1:t−1}^(i), y_{1:t}) for each particle i
 Importance weights update recursively: w_t^(i) ∝ w_{t−1}^(i) · p(y_t | x_t^(i)) p(x_t^(i) | x_{t−1}^(i)) / q(x_t^(i) | x_{1:t−1}^(i), y_{1:t})
 Augment each trajectory without changing the past (filtering): leave the past unchanged


Particle degeneracy – How to fix it

 Theorem: The unconditional variance of the importance weights (with the observations interpreted as r.v.'s) increases with time.

  • A. Kong, J. S. Liu, and W. H. Wong, “Sequential imputations and Bayesian missing data problems,” J. of the American Statistical Association, pp. 278–288, March 1994.

 Proof idea: the weight sequence is a martingale, and the variance of a martingale is always non-decreasing
 Theoretical fix: sample from the optimal importance density (Rao-Blackwellization)
 Practical fix: resample the particles after each iteration


The particle filter with resampling

 Many available methods for particle selection (resampling)

  • Simplest is multinomial resampling: ``clone'' particle i w.p. proportional to its weight
  • Particles that are not cloned are ``killed''
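Multinomial resampling is a one-liner with the standard library's `random.choices` (the particle values and weights below are an assumed toy example):

```python
import random

def multinomial_resample(particles, weights):
    """Clone particle i with probability w_i / sum(w); particles not chosen are 'killed'.
    Returns an equally weighted particle set of the same size."""
    return random.choices(particles, weights=weights, k=len(particles))

random.seed(6)
particles = [0.0, 1.0, 2.0, 3.0]
weights = [0.01, 0.01, 0.97, 0.01]     # one dominant particle
new = multinomial_resample(particles, weights)
# Nearly all survivors are clones of the dominant particle 2.0
```

Lower-variance schemes (systematic, stratified, residual resampling) follow the same clone-or-kill idea with less randomness in the clone counts.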

The bootstrap particle filter

 Convenient for non-linear models with additive Gaussian noise

  • Transition prob. and likelihood are both Gaussian (easy to sample and evaluate)

 Uses the simple, non-adaptive transition prior as proposal distribution
 Simple to implement; modular structure; amenable to parallelization
 Resampling is critical!

  • Ensures that the particles ``follow'' the target
  • A. Doucet, N. de Freitas and N. Gordon, “Sequential Monte Carlo Methods in Practice,” Springer, 2001.
Example: target tracking

 State: position and constant velocity
 Speed corrections (Gaussian noise with cov. Q)

 Distance and bearing measurements
 Uncorrelated Gaussian measurement noise

Tracking

 Bootstrap PF with N particles:

  • Sampling step (propagation of particles through the transition model)
  • Evaluation of weights (likelihood of each particle given the measurement)
  • Randomized resampling w.p. proportional to the weights
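The three steps above can be sketched end to end for a deliberately simplified model (a 1-D random-walk state with direct noisy position measurements, rather than the slides' position-velocity state with distance-and-bearing measurements; all parameters are assumed demo values):

```python
import random, math

def bootstrap_pf(ys, n_particles=500, q_std=1.0, r_std=1.0):
    """Bootstrap PF for x_t = x_{t-1} + q_t, y_t = x_t + r_t (Gaussian noises):
    propagate through the transition prior, weight by the likelihood, resample."""
    parts = [random.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    for y in ys:
        parts = [x + random.gauss(0.0, q_std) for x in parts]                # propagate
        w = [math.exp(-(y - x) ** 2 / (2 * r_std ** 2)) for x in parts]      # likelihood
        means.append(sum(wi * xi for wi, xi in zip(w, parts)) / sum(w))      # estimate
        parts = random.choices(parts, weights=w, k=n_particles)              # resample
    return means

random.seed(8)
# Simulate a random-walk state and noisy observations, then filter
x, xs_true, ys = 0.0, [], []
for _ in range(100):
    x += random.gauss(0.0, 1.0)
    xs_true.append(x)
    ys.append(x + random.gauss(0.0, 1.0))
est = bootstrap_pf(ys)
rmse = (sum((a - b) ** 2 for a, b in zip(est, xs_true)) / 100) ** 0.5
```

For this linear-Gaussian toy model the Kalman filter is exact, so it serves as a sanity check: the PF estimate should approach the Kalman error as the number of particles grows.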


Result


Conclusions

 Other MCMC derivatives

  • MCMC expectation-maximization algorithms
  • Hybrid MC
  • Slice sampler
  • Reversible jump MCMC for model selection

 MCMC and IS: powerful, all-around tools for Bayesian inference
 Applicable to any problem if tuned properly

  • Proposal distributions
  • Resampling schemes (in PF)