

CS 3750 Advanced Machine Learning
Lecture 6
Milos Hauskrecht (milos@cs.pitt.edu), 5329 Sennott Square

Approximate probabilistic inference:

  • Markov Chain Monte Carlo (MCMC)
  • Variational methods


Markov chain Monte Carlo

  • Importance sampling: samples are generated according to a proposal distribution Q, and every sample from Q is reweighted by an importance weight w; the problem is that Q may be very far from the target distribution.
  • MCMC is a strategy for generating samples from the target distribution, including conditional distributions.
  • MCMC:
    – A Markov chain defines a sampling process that initially generates samples very different from the target distribution (e.g. the posterior), but gradually refines the samples so that they are closer and closer to the posterior.


MCMC

  • The construction of a Markov chain requires two basic ingredients:
    – a transition matrix $P$
    – an initial distribution $\pi^{(0)}$
  • Assume a finite set $S = \{1, \ldots, m\}$ of states; then a transition matrix is

$$P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1m} \\ p_{21} & p_{22} & \cdots & p_{2m} \\ \vdots & & & \vdots \\ p_{m1} & p_{m2} & \cdots & p_{mm} \end{pmatrix}$$

where $p_{ij} \ge 0$ for all $(i, j) \in S$ and $\sum_{j \in S} p_{ij} = 1$ for all $i \in S$.
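To make the two ingredients concrete, here is a minimal sketch (a made-up 3-state chain, not from the lecture) that encodes a transition matrix, checks the two defining properties, and draws one step:

```python
import numpy as np

# Hypothetical transition matrix over S = {0, 1, 2}:
# P[i, j] = p_ij, the probability of moving from state i to state j.
P = np.array([
    [0.5, 0.4, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
])

# Initial distribution pi^(0) over the states.
pi0 = np.array([1.0, 0.0, 0.0])

# Defining properties: p_ij >= 0 and each row sums to 1.
assert (P >= 0).all()
assert np.allclose(P.sum(axis=1), 1.0)

def step(rng, i):
    """Sample the next state given the current state i."""
    return rng.choice(len(P), p=P[i])
```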


Markov Chain

  • A Markov chain defines a random process of selecting states $x^{(1)}, x^{(2)}, \ldots, x^{(m)}, \ldots$
  • Chain dynamics: the probability of a state $x'$ being selected at time $t+1$ is

$$P(X^{(t+1)} = x') = \sum_{x \in \mathrm{Dom}(X)} P(X^{(t)} = x)\, T(x \to x')$$

  • The initial state is selected based on $\pi^{(0)}$; subsequent states are selected based on the previous state and the transition matrix.
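The chain dynamics above can be checked numerically: pushing a distribution through the transition matrix for one step is just a vector-matrix product (a sketch with a made-up 2-state matrix):

```python
import numpy as np

# Hypothetical 2-state transition matrix (each row sums to 1).
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

pi = np.array([1.0, 0.0])   # initial distribution pi^(0): start in state 0
for _ in range(100):
    # P(X^(t+1) = x') = sum_x P(X^(t) = x) * T(x -> x'), i.e. a vector-matrix product
    pi = pi @ P

# pi has converged to the stationary distribution q satisfying q = qP,
# which for this matrix is (5/6, 1/6).
print(pi)
```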


MCMC

  • A Markov chain satisfies the Markov property:

$$P(X_{n+1} = j \mid X_1 = i_1, \ldots, X_{n-1} = i_{n-1}, X_n = i) = P(X_{n+1} = j \mid X_n = i)$$

  • Irreducibility: a Markov chain is called irreducible (or indecomposable) if there is a positive transition probability between all pairs of states within a limited number of steps.
  • In irreducible chains there may still exist a periodic structure: for each state $i$, the set of possible return times to $i$ when starting in $i$ is a subset of $\{p, 2p, 3p, \ldots\} \subseteq \mathbb{N}$. The smallest number $p$ with this property is the so-called period of the chain:

$$p = \gcd\{n \in \mathbb{N} : p^{(n)}_{ii} > 0\}$$


MCMC

  • Aperiodicity: an irreducible chain is called aperiodic (or acyclic) if the period $p$ equals 1 or, equivalently, if for all pairs of states $(i, j)$ there is an integer $n_{ij}$ such that for all $n \ge n_{ij}$ the probability $p^{(n)}_{ij} > 0$.
  • If a Markov chain satisfies both irreducibility and aperiodicity, then it converges to an invariant distribution $q(x)$.
  • A Markov chain with transition matrix $P$ will have an equilibrium distribution $q$ iff $q = qP$.
  • A sufficient, but not necessary, condition to ensure a particular $q(x)$ is the invariant distribution of transition matrix $P$ is the following reversibility (detailed balance) condition:

$$q(x_i)\, P(x_{i+1} \mid x_i) = q(x_{i+1})\, P(x_i \mid x_{i+1})$$
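Both conditions are easy to check numerically for a small chain. A sketch (hypothetical 2-state matrix) that recovers $q$ as the left eigenvector of $P$ for eigenvalue 1 and then tests detailed balance element-wise:

```python
import numpy as np

P = np.array([[0.7, 0.3],
              [0.6, 0.4]])

# Invariant distribution: left eigenvector of P with eigenvalue 1, normalized.
vals, vecs = np.linalg.eig(P.T)
q = np.real(vecs[:, np.isclose(vals, 1.0)].ravel())
q = q / q.sum()
assert np.allclose(q @ P, q)          # invariance: q = qP

# Detailed balance: q(x) P(x' | x) == q(x') P(x | x') for all state pairs.
flows = q[:, None] * P                # flows[x, x'] = q(x) P(x' | x)
print(q, np.allclose(flows, flows.T))
```

Every 2-state chain is reversible at stationarity, so the detailed-balance check passes here; for larger chains it can genuinely fail even though $q = qP$ holds.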


Markov Chain Monte Carlo

Objective: generate samples from the posterior distribution

  • Idea:
    – A Markov chain defines a sampling process that initially generates samples very different from the target posterior, but gradually refines the samples so that they are closer and closer to the posterior.


MCMC

  • $P(X \mid e)$ is the query we want to compute.
  • $e_1$ and $e_2$ are known evidence variables.
  • Sampling from the distribution $P(X)$ is very different from sampling from the desired posterior $P(X \mid e)$.

[Figure: Bayesian network with evidence nodes $e_1$, $e_2$ and query $P(X \mid e)$]


Markov Chain Monte Carlo (MCMC)

[Figure: state space with a sequence of samples $X_1, X_2, X_3, X_4, \ldots$]

MCMC (Cont.)

  • Goal: a sample from $P(X \mid e)$
  • Start from some $P(X)$ and generate a sample $x_1$
  • From $x_1$ and the transition model generate $x_2$
  • Repeat for $n$ steps (applying $T$ at each step): $X_1 \to X_2 \to \cdots \to X_n$, at which point the samples come from $P'(X \mid e)$
  • Continuing the chain, the subsequent states $X_{n+1}, X_{n+2}, \ldots$ are samples from the desired $P(X \mid e)$
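The steps above can be sketched directly: run the chain for $n$ burn-in steps, then keep collecting states as (approximate) samples from the target. The transition matrix here is made up; with enough burn-in the kept states follow the stationary distribution:

```python
import numpy as np

def chain_samples(P, n_burn, n_keep, start=0, seed=0):
    """Apply the transition T for n_burn steps, then collect n_keep states."""
    rng = np.random.default_rng(seed)
    x = start
    for _ in range(n_burn):            # X1 ... Xn: not yet from the target
        x = rng.choice(len(P), p=P[x])
    kept = []
    for _ in range(n_keep):            # Xn+1, Xn+2, ...: approx. target samples
        x = rng.choice(len(P), p=P[x])
        kept.append(x)
    return np.array(kept)

# Hypothetical 2-state chain with stationary distribution (5/6, 1/6).
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
kept = chain_samples(P, n_burn=200, n_keep=5000)
print(kept.mean())   # fraction of time in state 1, close to 1/6
```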

MCMC

  • In general, an MCMC sampling process doesn't have to converge to a stationary distribution.
  • A finite-state Markov chain has a unique stationary distribution iff the Markov chain is regular:
    – regular: there exists some $k$ such that, for each pair of states $x$ and $x'$, the probability of getting from $x$ to $x'$ in exactly $k$ steps is greater than 0.
  • We want Markov chains that converge to a unique target distribution from any initial state.

Big question:
  • How do we build such Markov chains?


Gibbs Sampling

  • A simple method to define a Markov chain for a Bayesian belief network (BBN); it can benefit from the structure (independences) in the network.
  • Evidence: $x_5 = T$, $x_6 = T$
  • All variables have binary values T or F.

[Figure: Bayesian network over $x_1, \ldots, x_6$]


Gibbs Sampling

Initial state $X^0$: $x_5 = x_6 = T$ (fixed evidence); $x_1 = F$, $x_2 = T$, $x_3 = T$, $x_4 = T$.


Gibbs Sampling

From the initial state $X^0$ ($x_5 = x_6 = T$ fixed; $x_1 = F$, $x_2 = T$, $x_3 = T$, $x_4 = T$), update the value of $x_4$.


Gibbs Sampling

New state $X^1$: $x_1 = F$, $x_2 = T$, $x_3 = T$, $x_4 = F$ (resampled), $x_5 = T$, $x_6 = T$.


Gibbs Sampling

From $X^1$ ($x_4 = F$, $x_5 = T$, $x_6 = T$), update the value of $x_3$.


Gibbs Sampling

New state $X^2$: $x_3 = T$ (resampled), $x_4 = F$, $x_5 = T$, $x_6 = T$.


Gibbs Sampling

After many reassignments, the states $X^n, X^{n+1}, \ldots$ become samples from the desired $P(X_{\text{rest}} \mid e)$.


Gibbs Sampling

Keep resampling each variable using the values of the variables in its local neighborhood (Markov blanket), e.g.

$$P(X_4 \mid x_2, x_3, x_5, x_6)$$


Gibbs Sampling

  • Gibbs sampling takes advantage of the graphical model structure.
  • The Markov blanket makes the variable independent from the rest of the network:

$$P(X_4 \mid x_2, x_3, x_5, x_6)$$
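A minimal sketch of the resampling loop, using a made-up joint over two binary variables instead of the slides' network (for a BBN, each conditional would come from the variable's Markov blanket):

```python
import numpy as np

# Hypothetical joint P(x1, x2) over two binary variables as a 2x2 table.
joint = np.array([[0.30, 0.10],
                  [0.20, 0.40]])   # joint[x1, x2], entries sum to 1

def gibbs(joint, n_steps, seed=0):
    rng = np.random.default_rng(seed)
    x1, x2 = 0, 0
    samples = []
    for _ in range(n_steps):
        # Resample x1 from P(x1 | x2): a column of the table, renormalized.
        p = joint[:, x2] / joint[:, x2].sum()
        x1 = rng.choice(2, p=p)
        # Resample x2 from P(x2 | x1): a row of the table, renormalized.
        p = joint[x1, :] / joint[x1, :].sum()
        x2 = rng.choice(2, p=p)
        samples.append((x1, x2))
    return np.array(samples)

samples = gibbs(joint, 20000)
# Empirical frequencies (after burn-in) approach the joint table.
freq = np.zeros((2, 2))
for a, b in samples[2000:]:
    freq[a, b] += 1
freq /= freq.sum()
print(freq)
```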


Building a Markov Chain

  • A reversible Markov chain: a sufficient, but not necessary, condition to ensure that a particular $q(x)$ is the invariant distribution of transition matrix $P$ is the following reversibility (detailed balance) condition:

$$q(x_i)\, P(x_{i+1} \mid x_i) = q(x_{i+1})\, P(x_i \mid x_{i+1})$$

  • Metropolis-Hastings algorithm:
    – builds a reversible Markov chain
    – uses a proposal distribution to generate candidate states
      • either accept a candidate and take a transition to state $x'$
      • or reject it and stay at the current state $x$


Building a Markov Chain

  • Metropolis-Hastings algorithm:
    – builds a reversible Markov chain
    – uses a proposal distribution (similar to the proposal distribution in importance sampling) to generate candidates $x'$
  • A proposal distribution $Q$ defines proposal transitions $T_Q(x \to x')$.
  • Example: uniform over the values of the variables.
    – Either accept a proposal and take a transition to state $x'$
    – or reject it and stay at the current state $x$.
  • Acceptance probability: $A(x \to x')$


Building a Markov Chain

  • Transition for the MH:

$$T(x \to x') = T_Q(x \to x')\, A(x \to x') \quad \text{if } x' \ne x$$

$$T(x \to x) = T_Q(x \to x) + \sum_{x' \ne x} T_Q(x \to x')\,\big(1 - A(x \to x')\big) \quad \text{otherwise}$$

  • From the reversibility condition: $q(x)\, T(x \to x') = q(x')\, T(x' \to x)$
  • We get:

$$A(x \to x') = \min\left[1,\; \frac{q(x')\, T_Q(x' \to x)}{q(x)\, T_Q(x \to x')}\right]$$
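The accept/reject rule can be sketched for a continuous target, assuming a symmetric Gaussian random-walk proposal (so the $T_Q$ ratio cancels); note $q$ only needs to be evaluable up to a normalizing constant:

```python
import math
import random

def q_unnorm(x):
    """Unnormalized target: a standard Gaussian density up to a constant."""
    return math.exp(-0.5 * x * x)

def metropolis_hastings(n_steps, step=1.0, seed=0):
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_steps):
        x_prop = x + rng.gauss(0.0, step)   # symmetric proposal T_Q
        # A(x -> x') = min(1, q(x') T_Q(x'->x) / (q(x) T_Q(x->x')));
        # the T_Q terms cancel for a symmetric proposal.
        a = min(1.0, q_unnorm(x_prop) / q_unnorm(x))
        if rng.random() < a:
            x = x_prop                      # accept: transition to x'
        # else: reject, stay at the current state x
        samples.append(x)
    return samples

samples = metropolis_hastings(50000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 2), round(var, 2))   # close to the N(0, 1) moments
```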


Building a Markov Chain

  • Comparing Metropolis-Hastings with Gibbs sampling:
    – Gibbs sampling is a special case of MH for which the acceptance probability is 1:

$$A\big((u_i, x_i) \to (u_i, x_i')\big) = \min\left[1,\; \frac{P(x_i' \mid u_i)\, T_Q\big((u_i, x_i') \to (u_i, x_i)\big)}{P(x_i \mid u_i)\, T_Q\big((u_i, x_i) \to (u_i, x_i')\big)}\right] = \min\left[1,\; \frac{P(x_i' \mid u_i)\, P(x_i \mid u_i)}{P(x_i \mid u_i)\, P(x_i' \mid u_i)}\right] = \min[1, 1] = 1$$

(here $u_i$ denotes the values of all variables other than $X_i$, and the Gibbs proposal resamples $x_i'$ from $P(X_i \mid u_i)$).


Metropolis Hastings algorithm

  • Assumptions:
    – we can't draw samples directly from $q(x)$
    – we can evaluate $q(x)$ for any $x$
  • We use a Markov chain that moves to a candidate $x^*$ with acceptance probability

$$\mathcal{A}(x, x^*) = \min\left[1,\; \frac{q(x^*)\, p(x \mid x^*)}{q(x)\, p(x^* \mid x)}\right]$$

  • The transition kernel defined by this process satisfies the detailed balance condition.


Mixing Time in Using Markov Chain

  • Mixing time: the number of steps $n$ we take until we collect a sample from the target distribution.
  • The chain $X_1 \to X_2 \to \cdots \to X_n$ (applying local rules at each step) covers the mixing time; the subsequent states $X_{n+1}, X_{n+2}, \ldots$ are samples from the desired $P(X \mid e)$.


Summary

  • The Markov chain Monte Carlo method attempts to generate samples from the posterior distribution.
  • The Metropolis-Hastings algorithm is a general scheme for specifying a Markov chain.
  • Gibbs sampling is a special case that takes advantage of the network structure (Markov blanket).

CS 3750 Machine Learning

Variational approximations


Variational approximation

Assume we have a function $f(Z)$ that is hard to calculate. Example: the posterior probability $P(Z \mid X)$ in a complex BBN; this inference can be very hard.

Idea: replace calculations of $f(Z)$ with an optimization over a simpler parametric function $q(Z \mid \lambda)$:

$$f(Z) \approx \max_{\lambda}\, q(Z \mid \lambda)$$


Variational lower bound

Let $X$ denote observed variables and $Z$ denote target variables. Assume some distribution $Q(Z \mid X)$ defined by parameters $\lambda$.

From $P(Z \mid X) = \dfrac{P(Z, X)}{P(X)}$:

$$\log P(X) = \log P(Z, X) - \log P(Z \mid X)$$

Average both sides with $Q(Z \mid X)$:

$$\sum_Z Q(Z \mid X) \log P(X) = \sum_Z Q(Z \mid X) \log P(Z, X) - \sum_Z Q(Z \mid X) \log P(Z \mid X)$$

$$\log P(X) = E_Q[\log P(Z, X)] - E_Q[\log P(Z \mid X)]$$


Variational lower bound

$$\log P(X) = \sum_Z Q(Z \mid X) \log P(Z, X) - \sum_Z Q(Z \mid X) \log P(Z \mid X)$$

Add and subtract $\sum_Z Q(Z \mid X) \log Q(Z \mid X)$:

$$\log P(X) = \sum_Z Q(Z \mid X) \log P(Z, X) - \sum_Z Q(Z \mid X) \log Q(Z \mid X) + \sum_Z Q(Z \mid X) \log Q(Z \mid X) - \sum_Z Q(Z \mid X) \log P(Z \mid X)$$

Kullback-Leibler divergence (a distance between two distributions):

$$KL(Q \parallel P) = \sum_Z Q(Z \mid X) \log Q(Z \mid X) - \sum_Z Q(Z \mid X) \log P(Z \mid X)$$

Functional (evidence lower bound, or ELBO):

$$F(Q, P) = \sum_Z Q(Z \mid X) \log P(Z, X) - \sum_Z Q(Z \mid X) \log Q(Z \mid X)$$

Together:

$$\log P(X) = F(Q, P) + KL(Q \parallel P)$$
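The decomposition $\log P(X) = F(Q, P) + KL(Q \parallel P)$ can be verified numerically for a tiny discrete model (the tables below are made up for illustration):

```python
import math

# Hypothetical joint P(Z, X = x_obs) over a 3-valued latent Z, for one observed X.
p_joint = [0.10, 0.25, 0.05]                    # P(Z = z, X = x_obs)
p_x = sum(p_joint)                              # P(X = x_obs) = 0.40
p_post = [p / p_x for p in p_joint]             # P(Z | X = x_obs)

q = [0.3, 0.5, 0.2]                             # some approximation Q(Z | X)

# F(Q, P) = sum_Z Q log P(Z, X) - sum_Z Q log Q   (the ELBO)
F = sum(qi * (math.log(pi) - math.log(qi)) for qi, pi in zip(q, p_joint))
# KL(Q || P) = sum_Z Q log Q - sum_Z Q log P(Z | X)
KL = sum(qi * (math.log(qi) - math.log(pi)) for qi, pi in zip(q, p_post))

assert KL >= 0                                  # KL is never negative
assert abs(F + KL - math.log(p_x)) < 1e-12      # log P(X) = F + KL
print(F, KL, math.log(p_x))
```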


Variational lower bound

$$\log P(X) = F(Q, P) + KL(Q \parallel P)$$

  • $KL(Q \parallel P)$ is the distance between $Q(Z \mid X)$ and $P(Z \mid X)$. It is always $\ge 0$ and equals 0 iff $Q(Z \mid X) = P(Z \mid X)$.
  • We can optimize the approximation $Q(Z \mid X)$ by minimizing $\min_{\lambda} KL(Q \parallel P)$.
  • Since $\log P(X)$ does not depend on $\lambda$, we can also do this by maximizing $\max_{\lambda} F(Q, P)$, where

$$F(Q, P) = \sum_Z Q(Z \mid X) \log P(Z, X) - \sum_Z Q(Z \mid X) \log Q(Z \mid X)$$

This is often much easier.


Latent variable models

Let $X$ denote observed variables $x$ and $Z$ denote hidden (latent) variables $z$.

Inference against the direction of the links is hard: $P(Z \mid X)$.
Solution: define a simpler distribution $Q(Z \mid X)$ to approximate $P(Z \mid X)$ and optimize $\max_{\lambda} F(Q, P)$, where

$$F(Q, P) = \sum_Z Q(Z \mid X) \log P(X \mid Z) + \sum_Z Q(Z \mid X) \log P(Z) - \sum_Z Q(Z \mid X) \log Q(Z \mid X)$$


Mean field approximation

How do we construct the approximation $Q(Z \mid X)$? Mean field approximation:

$$Q(Z \mid X) = \prod_i Q_i(Z_i \mid \lambda_i)$$

Substituting into the objective:

$$\max_{\lambda} F(Q, P) = \max_{\lambda} \sum_{Z_1, Z_2, \ldots} Q(Z \mid X) \log P(X, Z) - \sum_{Z_1, Z_2, \ldots} Q(Z \mid X) \log Q(Z \mid X)$$

$$= \max_{\lambda} \sum_{Z_1, Z_2, \ldots} \left[\prod_i Q_i(Z_i \mid \lambda_i)\right] \log P(X, Z) - \sum_{Z_1, Z_2, \ldots} \left[\prod_i Q_i(Z_i \mid \lambda_i)\right] \log \prod_i Q_i(Z_i \mid \lambda_i)$$


Latent variable models

Let $X$ denote observed variables and $Z$ denote hidden (latent) variables, with $Q(Z \mid X) = \prod_i Q_i(Z_i \mid \lambda_i)$:

$$\max_{\lambda} F(Q, P) = \max_{\lambda} \sum_{Z_1, Z_2, \ldots} \left[\prod_i Q_i(Z_i \mid \lambda_i)\right] \log P(X \mid Z) + \sum_{Z_1, Z_2, \ldots} \left[\prod_i Q_i(Z_i \mid \lambda_i)\right] \log P(Z) - \sum_{Z_1, Z_2, \ldots} \left[\prod_i Q_i(Z_i \mid \lambda_i)\right] \log \prod_i Q_i(Z_i \mid \lambda_i)$$


Latent variable models

Let $X$ denote observed variables and $Z$ denote hidden (latent) variables. The factored $Q$ lets the last two terms decompose into per-variable sums:

$$\max_{\lambda} F(Q, P) = \max_{\lambda} \sum_{Z_1, Z_2, \ldots} \left[\prod_i Q_i(Z_i \mid \lambda_i)\right] \log P(X \mid Z) + \sum_i \sum_{Z_i} Q_i(Z_i \mid \lambda_i) \log P(Z_i) - \sum_i \sum_{Z_i} Q_i(Z_i \mid \lambda_i) \log Q_i(Z_i \mid \lambda_i)$$

(the middle term decomposes this way when the prior factorizes, $P(Z) = \prod_i P(Z_i)$).

Express $F$ analytically, differentiate with respect to the parameters $\lambda_i$ and set the derivatives to 0. This yields mean field equations that can be used to get the optimal set of parameters $\lambda$.
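As a sketch of what the mean field equations look like in the simplest case: for a made-up 2x2 joint over two binary latents, setting the derivative to zero gives the coordinate updates $Q_i(z_i) \propto \exp\big(E_{Q_j}[\log P(x, z_i, z_j)]\big)$, which can be iterated to convergence (this toy model and its numbers are assumptions for illustration, not from the lecture):

```python
import math

# Hypothetical joint P(X = x_obs, Z1, Z2) as a 2x2 log-table over (Z1, Z2).
log_p = [[math.log(0.02), math.log(0.18)],
         [math.log(0.32), math.log(0.08)]]   # probabilities sum to P(x_obs) = 0.60

q1 = [0.5, 0.5]    # Q1(Z1), initialized uniformly
q2 = [0.5, 0.5]    # Q2(Z2)

for _ in range(50):
    # Update Q1: Q1(z1) proportional to exp( sum_z2 Q2(z2) log P(x, z1, z2) )
    w = [math.exp(sum(q2[z2] * log_p[z1][z2] for z2 in range(2))) for z1 in range(2)]
    q1 = [wi / sum(w) for wi in w]
    # Update Q2 symmetrically.
    w = [math.exp(sum(q1[z1] * log_p[z1][z2] for z1 in range(2))) for z2 in range(2)]
    q2 = [wi / sum(w) for wi in w]

# ELBO F(Q, P) = E_Q[log P(x, Z)] + H(Q1) + H(Q2); it lower-bounds log P(x_obs).
F = sum(q1[a] * q2[b] * log_p[a][b] for a in range(2) for b in range(2))
F -= sum(p * math.log(p) for p in q1 + q2)
print(F, math.log(0.60))
```

Each coordinate update can only increase $F$, so the iteration converges; the remaining gap between $F$ and $\log P(x_{\text{obs}})$ is the KL divergence the factored $Q$ cannot remove.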