Markov Chain Monte Carlo (MCMC) and Variational Methods



  1. CS 3750 Machine Learning, Lecture 6. Approximate probabilistic inference:
     • Markov Chain Monte Carlo (MCMC)
     • Variational methods
     Milos Hauskrecht, milos@cs.pitt.edu, 5329 Sennott Square

     Markov chain Monte Carlo
     • Importance sampling: samples are generated from a proposal distribution Q and every sample is reweighted by an importance weight w, but Q may be very far from the target distribution (a small sketch of this follows the slide).
     • MCMC is a strategy for generating samples from the target distribution itself, including conditional distributions.
     • MCMC: a Markov chain defines a sampling process that initially generates samples very different from the target distribution (e.g. the posterior), but gradually refines the samples so that they come closer and closer to the posterior.
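To make the contrast concrete, here is a minimal importance-sampling sketch in Python. The setup is an assumption for illustration only (a standard-normal target, an N(0, 4) proposal, and made-up function names), not something from the lecture:

    import numpy as np

    # Minimal importance-sampling sketch (toy, assumed setup):
    # the target p is a standard normal, the proposal Q is N(0, 2^2).
    # Samples come from Q; each is reweighted by w(x) = p(x) / q(x).
    rng = np.random.default_rng(0)

    def p(x):   # unnormalized target density
        return np.exp(-0.5 * x**2)

    def q(x):   # proposal density (up to the shared 1/sqrt(2*pi) factor)
        return np.exp(-0.5 * (x / 2.0)**2) / 2.0

    xs = rng.normal(0.0, 2.0, size=10_000)   # draw from Q, not from p
    w = p(xs) / q(xs)                        # importance weights
    w /= w.sum()                             # self-normalize

    # Reweighted estimate of E_p[x^2]; the true value is 1.
    print(np.sum(w * xs**2))

If Q is far from the target, a few weights dominate the sum and the estimate degrades badly; this is exactly the weakness that MCMC avoids by sampling from the target distribution itself.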

  2. MCMC
     • The construction of a Markov chain requires two basic ingredients:
       – a transition matrix P
       – an initial distribution $\pi^{(0)}$
     • Assume a finite set of states S = {1, ..., m}. The transition matrix is
       $P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1m} \\ p_{21} & p_{22} & \cdots & p_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ p_{m1} & p_{m2} & \cdots & p_{mm} \end{pmatrix}$
       where $p_{ij} \ge 0$ for all $(i,j) \in S^2$ and $\sum_{j \in S} p_{ij} = 1$ for every $i \in S$.

     Markov Chain
     • A Markov chain defines a random process of selecting states $x^{(0)}, x^{(1)}, \ldots, x^{(m)}, \ldots$ The initial state is selected according to $\pi^{(0)}$; each subsequent state is selected based on the previous state and the transition matrix.
     • Chain dynamics (the probability of a state x' being selected at time t+1; a numerical example follows this slide):
       $P(X^{(t+1)} = x') = \sum_{x \in \mathrm{Dom}(X)} P(X^{(t)} = x)\, T(x \to x')$
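As a concrete illustration of these ingredients, here is a minimal Python sketch with a made-up 3-state chain (the matrix entries are arbitrary, chosen only so that each row sums to 1). It propagates the state distribution with the chain dynamics above and also samples one trajectory:

    import numpy as np

    # A made-up 3-state transition matrix: rows sum to 1,
    # P[i, j] = p_ij = P(next state = j | current state = i).
    P = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.6, 0.3],
                  [0.2, 0.2, 0.6]])
    pi0 = np.array([1.0, 0.0, 0.0])   # initial distribution: start in state 0

    # Chain dynamics: pi_{t+1} = pi_t P, i.e.
    # P(X^(t+1) = x') = sum_x P(X^(t) = x) T(x -> x').
    pi = pi0
    for t in range(50):
        pi = pi @ P
    print(pi)   # approaches the invariant distribution

    # Sampling one trajectory x^(0), x^(1), ... from the chain.
    rng = np.random.default_rng(0)
    x = rng.choice(3, p=pi0)                 # initial state from pi0
    for t in range(10):
        x = rng.choice(3, p=P[x])            # next state from row P[x]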

  3. MCMC
     • A Markov chain satisfies the Markov property:
       $P(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_0 = i_0) = P(X_{n+1} = j \mid X_n = i)$
     • Irreducibility: a Markov chain is called irreducible (or un-decomposable) if there is a positive transition probability between every pair of states within a limited number of steps.
     • In irreducible chains there may still exist a periodic structure such that, for each state i, the set of possible return times to i when starting in i is a subset of $\{p, 2p, 3p, \ldots\}$ containing all but a finite set of these elements. The smallest number p with this property is the so-called period of the chain:
       $p = \gcd\{\, n \in \mathbb{N} : p_{ii}^{(n)} > 0 \,\}$

     MCMC
     • Aperiodicity: an irreducible chain is called aperiodic (or acyclic) if the period p equals 1 or, equivalently, if for every pair of states i, j there is an integer $n_{ij}$ such that $p_{ij}^{(n)} > 0$ for all $n \ge n_{ij}$.
     • If a Markov chain satisfies both irreducibility and aperiodicity, then it converges to an invariant distribution q(x).
     • A Markov chain with transition matrix P has q as an equilibrium distribution iff $q = qP$.
     • A sufficient, but not necessary, condition ensuring that a particular q(x) is the invariant distribution of P is the following reversibility (detailed balance) condition (checked numerically after this slide):
       $q(x^{(i)})\, P(x^{(i+1)} \mid x^{(i)}) = q(x^{(i+1)})\, P(x^{(i)} \mid x^{(i+1)})$
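Both the equilibrium condition q = qP and detailed balance can be checked numerically. A minimal sketch, reusing the toy matrix from the previous example: the invariant q is the left eigenvector of P with eigenvalue 1, and reversibility is the symmetry of the matrix with entries q_i p_ij:

    import numpy as np

    P = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.6, 0.3],
                  [0.2, 0.2, 0.6]])

    # q = qP means q is a left eigenvector of P for eigenvalue 1.
    evals, evecs = np.linalg.eig(P.T)
    q = np.real(evecs[:, np.argmax(np.real(evals))])
    q /= q.sum()                      # normalize (also fixes the sign)
    print(np.allclose(q @ P, q))      # True: q is the equilibrium distribution

    # Detailed balance: q_i p_ij == q_j p_ji for all i, j,
    # i.e. the matrix F with F[i, j] = q_i p_ij is symmetric.
    F = q[:, None] * P
    print(np.allclose(F, F.T))        # False here: this chain has an invariant
                                      # q but is not reversible, showing the
                                      # condition is sufficient, not necessary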

  4. Markov Chain Monte Carlo
     Objective: generate samples from the posterior distribution.
     • Idea: a Markov chain defines a sampling process that initially generates samples very different from the target posterior, but gradually refines the samples so that they come closer and closer to the posterior.

     MCMC
     • P(X | e) is the query we want to compute; e1 and e2 are the known evidence variables.
     • Sampling from the prior distribution P(X) is very different from sampling from the desired posterior P(X | e).
     [Figure: a Bayesian network with query node X and evidence nodes e1 and e2.]

  5. Markov Chain Monte Carlo (MCMC)
     [Figure: the state space, with chain states X1, X2, X3, X4, ...]

     MCMC (cont.)
     • Goal: a sample from P(X | e).
     • Start from some initial distribution P(X) and generate a sample x1.

  6. MCMC (cont.)
     • Goal: a sample from P(X | e).
     • Start from some initial distribution P(X) and generate a sample x1.
     • Apply the transition model T to x1 to generate x2.
     [Figure: X1 -> X2, applying T at each step.]

  7. MCMC (cont.)
     • Goal: a sample from P(X | e).
     • Start from some P(X), generate a sample x1, then generate x2 from x1 and the transition model.
     • Repeat for n steps; after n steps the samples are distributed according to an approximation P'(X | e).
     [Figure: X1 -> X2 -> ... -> Xn, applying T at each step.]

  8. MCMC (cont.)
     • Goal: a sample from P(X | e).
     • Start from some P(X), generate x1, generate x2 from x1 and the transition model, and repeat for n steps, reaching P'(X | e).
     • Once the chain has converged, the subsequent states X_{n+1}, X_{n+2}, ... are samples from the desired P(X | e). (A small burn-in demonstration follows this slide.)
     [Figure: X1 -> ... -> Xn -> X_{n+1} -> X_{n+2} -> ..., applying T at each step; the states after Xn are samples from P(X | e).]
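A small burn-in demonstration with the toy 3-state chain from the earlier sketches (the number of chains and the time points printed are arbitrary choices): many chains start in the same state, and the empirical distribution of X^(t) drifts toward the invariant one:

    import numpy as np

    P = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.6, 0.3],
                  [0.2, 0.2, 0.6]])
    rng = np.random.default_rng(0)

    states = np.zeros(5_000, dtype=int)   # 5000 chains, all started in state 0
    for t in range(1, 31):
        # advance every chain one step by applying the transition model
        states = np.array([rng.choice(3, p=P[s]) for s in states])
        if t in (1, 5, 30):
            print(t, np.bincount(states, minlength=3) / len(states))
    # The early rows are far from the invariant distribution; by t = 30
    # the states are (approximately) samples from it.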

  9. MCMC
     • In general, an MCMC sampling process does not have to converge to a stationary distribution.
     • A finite-state Markov chain has a unique stationary distribution iff the chain is regular:
       – regular: there exists some k such that, for each pair of states x and x', the probability of getting from x to x' in exactly k steps is greater than 0.
     • We want Markov chains that converge to a unique target distribution from any initial state.
     • Big question: how do we build such Markov chains? (A regularity check is sketched after this slide.)

     Gibbs Sampling
     • A simple method to define a Markov chain for a Bayesian belief network (BBN); it can benefit from the structure (independences) in the network.
     • Evidence: x5 = T and x6 = T; all variables have binary values T or F.
     [Figure: a network over x1, ..., x6, with x5 and x6 the evidence nodes.]
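The regularity condition is easy to test for a small chain: look for a power P^k whose entries are all strictly positive. A minimal sketch (max_k and the example matrices are arbitrary):

    import numpy as np

    def is_regular(P, max_k=100):
        # Return (True, k) if some P^k with k <= max_k is strictly positive.
        Pk = np.eye(len(P))
        for k in range(1, max_k + 1):
            Pk = Pk @ P
            if np.all(Pk > 0):
                return True, k
        return False, None

    P = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.6, 0.3],
                  [0.2, 0.2, 0.6]])
    print(is_regular(P))     # (True, 1): all one-step probabilities positive

    # A periodic counterexample: a deterministic 2-cycle is never regular,
    # since its powers alternate between the identity and the swap.
    C = np.array([[0.0, 1.0],
                  [1.0, 0.0]])
    print(is_regular(C))     # (False, None)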

  10. Gibbs Sampling
      • Initial state X0: x1 = F, x2 = T, x3 = T, x4 = T, with x5 = x6 = T fixed (evidence).
      • Next, update the value of x4.
      [Figure: the network with the initial assignment X0.]

  11. Gibbs Sampling
      • Resampling x4 yields the new state X1: x1 = F, x2 = T, x3 = T, x4 = F, x5 = T, x6 = T.
      • Next, update the value of x3.
      [Figure: the transition from X0 to X1.]

  12. Gibbs Sampling
      • Updating the value of x3 yields X2: x3 = T, x4 = F, x5 = T, x6 = T.
      • After many such reassignments, the states Xn, X_{n+1}, ... are samples from the desired P(X_rest | e).
      [Figure: the transition from X1 to X2, and the chain after many reassignments.]

  13. Gibbs Sampling
      • Keep resampling each variable using the values of the variables in its local neighborhood, its Markov blanket, e.g. $P(X_4 \mid x_2, x_3, x_5, x_6)$.
      • Gibbs sampling takes advantage of the graphical model structure: the Markov blanket makes a variable independent from the rest of the network. (A sketch of the sampling loop follows this slide.)
      [Figure: resampling x4 given its Markov blanket x2, x3, x5, x6.]
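A minimal sketch of the Gibbs loop for a network of this shape. The lecture does not give the CPTs, so the per-variable conditionals below are hypothetical placeholder numbers; in a real sampler each would be the true full conditional P(X_i | Markov blanket), computed from the variable's own CPT and the CPTs of its children. The evidence variables x5 and x6 are never resampled:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical conditionals: each maps the current state to
    # P(X_i = T | its Markov blanket). Placeholder numbers, not real CPTs.
    cond = {
        "x1": lambda s: 0.3,
        "x2": lambda s: 0.7 if s["x1"] else 0.2,
        "x3": lambda s: 0.6 if s["x1"] else 0.3,
        "x4": lambda s: 0.9 if (s["x2"] and s["x3"]) else 0.1,
    }

    # Initial state X0 from the slides; x5 = x6 = T is fixed evidence.
    state = {"x1": False, "x2": True, "x3": True, "x4": True,
             "x5": True, "x6": True}

    samples = []
    for sweep in range(2_000):
        for var, p_true in cond.items():       # resample each free variable
            state[var] = rng.random() < p_true(state)
        if sweep >= 500:                       # discard burn-in sweeps
            samples.append(state["x4"])
    print(np.mean(samples))   # estimate of P(x4 = T | x5 = T, x6 = T)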

  14. Building a Markov Chain
      • A reversible Markov chain: a sufficient, but not necessary, condition ensuring that a particular q(x) is the invariant distribution of transition matrix P is the reversibility (detailed balance) condition
        $q(x^{(i)})\, P(x^{(i+1)} \mid x^{(i)}) = q(x^{(i+1)})\, P(x^{(i)} \mid x^{(i+1)})$
      • Metropolis-Hastings algorithm:
        – builds a reversible Markov chain;
        – uses a proposal distribution Q (similar to the proposal distribution in importance sampling) to generate candidate states x', with proposal transitions $T_Q(x \to x')$; for example, Q can be uniform over the values of the variables;
        – either accepts a proposal, taking a transition to state x', or rejects it and stays at the current state x;
        – the decision is made with acceptance probability $A(x \to x')$. (A sketch follows this slide.)
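A minimal Metropolis-Hastings sketch. The slide stops before stating the acceptance probability; the standard choice is $A(x \to x') = \min\!\big(1,\, q(x')\,Q(x \mid x') \,/\, (q(x)\,Q(x' \mid x))\big)$, which makes the resulting chain satisfy detailed balance with respect to q. The target below is an assumed unnormalized Gaussian and the proposal is a symmetric random walk, so the Q terms cancel (the Metropolis special case):

    import numpy as np

    rng = np.random.default_rng(0)

    def q_unnorm(x):
        # assumed unnormalized target: proportional to a N(3, 1) density
        return np.exp(-0.5 * (x - 3.0)**2)

    x = 0.0                                     # arbitrary initial state
    samples = []
    for step in range(20_000):
        x_prop = x + rng.normal(0.0, 1.0)       # candidate x' from Q(x -> .)
        A = min(1.0, q_unnorm(x_prop) / q_unnorm(x))
        if rng.random() < A:                    # accept: transition to x'
            x = x_prop                          # otherwise stay at x
        if step >= 1_000:                       # discard burn-in
            samples.append(x)
    print(np.mean(samples))                     # close to 3, the target mean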
