Advanced Machine Learning: MCMC Methods - Amit Sethi, Electrical Engineering, IIT Bombay


  1. Advanced Machine Learning: MCMC Methods. Amit Sethi, Electrical Engineering, IIT Bombay

  2. Objectives
  • We have talked about:
    – Exact inference in factor graphs using the Sum-Product algorithm (aka Belief Propagation)
    – Limitations of the Sum-Product algorithm and some remedies
  • Today we will learn:
    – Sampling methods (aka Monte Carlo methods) for when exact inference is intractable

  3. We want to find the expected value of a function, e.g. when calculating messages
  • [Figure: a target density p(z) and a function f(z) plotted over z]
  • E[f] = ∫ f(z) p(z) dz
  • It may not be feasible to compute this integral directly, but computing f(z) may be easy
  • So we need to draw samples z^(l) from p(z), and then E[f] ≈ (1/L) Σ_l f(z^(l)), as sketched in code below
  Source: "Pattern Recognition and Machine Learning", book and slides by Christopher Bishop
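
Where p(z) can be sampled directly, this estimator is straightforward. A minimal sketch in Python (an assumed example, not from the slides), with p(z) = N(0, 1) and f(z) = z²:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(z):
    return z ** 2  # the function whose expectation we want

L = 10_000
z = rng.normal(0.0, 1.0, size=L)  # z^(l) drawn from p(z) = N(0, 1)
print(np.mean(f(z)))              # E[f] ~ (1/L) sum_l f(z^(l)) ~ 1.0, since E[z^2] = 1
```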

  4. We will look at the following ways to sample from a distribution
  • Rejection sampling
  • Importance sampling
  • Gibbs sampling

  5. Sampling marginals
  • Note that this procedure can also be used to generate samples from marginals
  • Simply discard the portions of each sample that are not needed (a code sketch follows)
    – e.g. for the marginal p(rain), the sample (cloudy = t; sprinkler = f; rain = t; w = t) just becomes (rain = t)
  • This is still a fair sampling procedure
  • But anything more complex can be a problem
  Source: "Pattern Recognition and Machine Learning", book and slides by Christopher Bishop
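
A minimal sketch of the discarding step, using hypothetical joint samples matching the slide's example; in practice these would come from fair sampling of the full joint:

```python
# Hypothetical joint samples over (cloudy, sprinkler, rain, w).
joint_samples = [
    {"cloudy": True, "sprinkler": False, "rain": True, "w": True},
    {"cloudy": False, "sprinkler": True, "rain": False, "w": True},
]

# Marginal samples for p(rain): keep only the rain component of each sample.
rain_samples = [s["rain"] for s in joint_samples]
print(rain_samples)  # [True, False]
```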

  6. When the partition function is unknown
  • [Figure: an arbitrary density p(z) and a function f(z) over z]
  • Consider the case of an arbitrary, continuous p(z)
  • How can we draw samples from it?
  • Assume that we can efficiently evaluate p(z) up to some normalizing constant (e.g. in an MRF)
  Source: "Pattern Recognition and Machine Learning", book and slides by Christopher Bishop

  7. Rejection sampling makes use of an easier "proposal" distribution
  • [Figure: the target p(z) under a scaled proposal k q(z)]
  • Assume we have some simpler distribution q(z), called a proposal distribution, from which we can easily draw samples
    – e.g. q(z) is a Gaussian
  • We can then draw samples from q(z) and use them, if we have a way to convert them into samples from p(z)
  Source: "Pattern Recognition and Machine Learning", book and slides by Christopher Bishop

  8. Now, we reject samples according to the ratio of p and q at z
  • Introduce a constant k such that k q(z) ≥ p(z) for all z
  • Rejection sampling procedure (sketched in code below):
    – Generate z0 from q(z)
    – Generate u0 uniformly from [0, k q(z0)]
    – If u0 > p(z0), reject z0; otherwise keep it
  • The generated samples fall under the red curve k q(z); the kept samples fall under the blue curve p(z), and hence are samples from p(z)
  Source: "Pattern Recognition and Machine Learning", book and slides by Christopher Bishop
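
A minimal sketch of this procedure (an assumed example: the target is a two-component Gaussian mixture, the proposal is a broad Gaussian, and k was picked by inspection so that k q(z) ≥ p(z) everywhere):

```python
import numpy as np

rng = np.random.default_rng(0)

def norm_pdf(z, mu, sigma):
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def p(z):  # target density: we can evaluate it, but not sample it directly
    return 0.5 * norm_pdf(z, -1.0, 0.5) + 0.5 * norm_pdf(z, 2.0, 0.8)

def q_pdf(z):  # proposal density q(z) = N(0, 2^2), easy to sample from
    return norm_pdf(z, 0.0, 2.0)

k = 4.0  # chosen so that k * q(z) >= p(z) for all z
accepted = []
for _ in range(20_000):
    z0 = rng.normal(0.0, 2.0)             # generate z0 from q(z)
    u0 = rng.uniform(0.0, k * q_pdf(z0))  # generate u0 uniformly from [0, k q(z0)]
    if u0 <= p(z0):                       # reject if u0 > p(z0), otherwise keep
        accepted.append(z0)

print(len(accepted) / 20_000)  # acceptance rate ~ 1/k = 0.25, since p is normalized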

  9. Rejection sampling can end up rejecting a lot of samples from q
  • How likely are we to keep samples? The probability that a sample is accepted is:
    p(accept) = ∫ [p(z) / (k q(z))] q(z) dz = (1/k) ∫ p(z) dz
  • A smaller k is better, subject to k q(z) ≥ p(z) for all z
    – If q(z) is similar to p(z), this is easier to achieve
  • In high-dimensional spaces the acceptance ratio falls off exponentially, and finding a suitable k is challenging
  Source: "Pattern Recognition and Machine Learning", book and slides by Christopher Bishop

  10. In importance sampling, we scale the weight of each sample by the ratio of p and q
  • [Figure: the target p(z), scaled proposal k q(z), and function f(z)]
  • Approximate the expectation by drawing points from q(z):
    E[f] = ∫ f(z) p(z) dz = ∫ f(z) [p(z) / q(z)] q(z) dz ≈ (1/L) Σ_l f(z^(l)) p(z^(l)) / q(z^(l))
  • The quantity p(z^(l)) / q(z^(l)) is known as the importance weight (a code sketch follows)
  Source: "Pattern Recognition and Machine Learning", book and slides by Christopher Bishop
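
A minimal sketch (same assumed mixture target as above, with a broad Gaussian proposal): each sample from q is weighted by p/q instead of being accepted or rejected.

```python
import numpy as np

rng = np.random.default_rng(0)

def norm_pdf(z, mu, sigma):
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def p(z):  # target density
    return 0.5 * norm_pdf(z, -1.0, 0.5) + 0.5 * norm_pdf(z, 2.0, 0.8)

def f(z):  # estimate the mean of p, which is 0.5 * (-1) + 0.5 * 2 = 0.5
    return z

L = 50_000
z = rng.normal(0.0, 3.0, size=L)  # z^(l) drawn from q(z) = N(0, 3^2)
w = p(z) / norm_pdf(z, 0.0, 3.0)  # importance weights p(z^(l)) / q(z^(l))
print(np.mean(w * f(z)))          # ~0.5
```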

  11. MCMC methods generate samples sequentially
  • Markov chain Monte Carlo methods use a Markov chain: a sequence z^(1), z^(2), ..., z^(τ) in which each sample depends on the previous one
  • The transitions of the Markov chain form the proposal distribution q(z | z^(τ))
  • Asymptotically, these samples are drawn from the desired distribution p(z)
  Source: "Pattern Recognition and Machine Learning", book and slides by Christopher Bishop

  12. Metropolis algorithm assumes the proposal distribution is symmetric
  • With a symmetric proposal, q(z_A | z_B) = q(z_B | z_A), a candidate z* drawn from q(z | z^(τ)) is accepted with probability
    A(z*, z^(τ)) = min(1, p~(z*) / p~(z^(τ)))
    where p~ is the target evaluated up to its normalizing constant
  • If z* is rejected, the chain stays put: z^(τ+1) = z^(τ) (a code sketch follows)
  Source: "Pattern Recognition and Machine Learning", book and slides by Christopher Bishop
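
A minimal sketch of a random-walk Metropolis sampler (an assumed 1-D example: the same mixture as before, treated as known only up to a constant):

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(z):  # unnormalized target: a two-component mixture
    return np.exp(-0.5 * ((z + 1.0) / 0.5) ** 2) + np.exp(-0.5 * ((z - 2.0) / 0.8) ** 2)

z = 0.0
chain = []
for _ in range(20_000):
    z_star = z + rng.normal(0.0, 1.0)               # symmetric Gaussian proposal
    accept = min(1.0, p_tilde(z_star) / p_tilde(z)) # A(z*, z) = min(1, p~(z*)/p~(z))
    if rng.uniform() < accept:
        z = z_star                                  # accept; otherwise stay at z
    chain.append(z)

samples = np.array(chain)[5_000:]  # discard burn-in before summarizing
print(samples.mean())
```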

  13. Visualizing Metropolis algorithm
  Source: "Pattern Recognition and Machine Learning", book and slides by Christopher Bishop

  14. Metropolis-Hastings algorithm generalizes MA for non-symmetric transitions
  • The acceptance probability gains a correction for the proposal asymmetry:
    A_k(z*, z^(τ)) = min(1, [p~(z*) q_k(z^(τ) | z*)] / [p~(z^(τ)) q_k(z* | z^(τ))])
  • For a symmetric q_k this reduces to the Metropolis rule (a code sketch follows)
  Source: "Pattern Recognition and Machine Learning", book and slides by Christopher Bishop
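
A minimal sketch with a deliberately asymmetric proposal (an assumed example: an autoregressive proposal z* ~ N(0.9 z, 1), chosen only to make the correction ratio q(z | z*)/q(z* | z) non-trivial):

```python
import numpy as np

rng = np.random.default_rng(0)

def norm_pdf(z, mu, sigma):
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def p_tilde(z):  # unnormalized target, as before
    return np.exp(-0.5 * ((z + 1.0) / 0.5) ** 2) + np.exp(-0.5 * ((z - 2.0) / 0.8) ** 2)

def q_pdf(z_to, z_from):  # asymmetric proposal density q(z_to | z_from)
    return norm_pdf(z_to, 0.9 * z_from, 1.0)

z = 0.0
chain = []
for _ in range(20_000):
    z_star = rng.normal(0.9 * z, 1.0)
    ratio = (p_tilde(z_star) * q_pdf(z, z_star)) / (p_tilde(z) * q_pdf(z_star, z))
    if rng.uniform() < min(1.0, ratio):  # Metropolis-Hastings acceptance
        z = z_star
    chain.append(z)

print(np.array(chain)[5_000:].mean())  # summarize after discarding burn-in
```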

  15. Gibbs Sampling is a simple coordinate-wise MCMC method that needs no separate proposal distribution
  • Each step replaces one variable z_k by a value drawn from its conditional given all the others, p(z_k | z_\k), cycling through the variables (a code sketch follows)
  Source: "Pattern Recognition and Machine Learning", book and slides by Christopher Bishop
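
A minimal sketch (an assumed example: a bivariate Gaussian with correlation ρ, whose full conditionals are themselves Gaussian and easy to sample):

```python
import numpy as np

rng = np.random.default_rng(0)

rho = 0.8                     # target: N(0, [[1, rho], [rho, 1]])
sd = np.sqrt(1.0 - rho ** 2)  # conditional std: z1 | z2 ~ N(rho * z2, 1 - rho^2)

z1, z2 = 0.0, 0.0
samples = []
for _ in range(20_000):
    z1 = rng.normal(rho * z2, sd)  # draw z1 from p(z1 | z2)
    z2 = rng.normal(rho * z1, sd)  # draw z2 from p(z2 | z1)
    samples.append((z1, z2))

samples = np.array(samples)[5_000:]  # discard burn-in
print(np.corrcoef(samples.T)[0, 1])  # ~0.8
```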

  16. Markov blanket of an MRF
  • It is simply the set of neighbouring nodes
  Source: "Pattern Recognition and Machine Learning", book and slides by Christopher Bishop

  17. Example: estimate the probability of one node given the others in an image-denoising MRF
  • Potentials:
    – For the observation: -η x_i y_i
    – For spatial coherence: -β x_i x_j
    – For the prior: -h x_i
  • We want P(x_i | ~x_i), i.e. P(x_i | X \ x_i, Y)
  • What is the Markov blanket of x_i? (see the sketch after this slide)
  Source: "Pattern Recognition and Machine Learning", book and slides by Christopher Bishop
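
A minimal sketch of the single-pixel Gibbs conditional under assumed conventions: x_i, y_i ∈ {-1, +1}, the listed potentials are read as energy terms with p ∝ exp(-E), and the Markov blanket of x_i is its four spatial neighbours plus its observation y_i.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_xi(y_i, neighbours, eta=2.0, beta=1.0, h=0.0):
    """One Gibbs update of pixel x_i given its Markov blanket."""
    # The energy terms -eta*x_i*y_i, -beta*x_i*x_j, and -h*x_i give a
    # log-odds of 2 * field between x_i = +1 and x_i = -1.
    field = eta * y_i + beta * np.sum(neighbours) + h
    p_plus = 1.0 / (1.0 + np.exp(-2.0 * field))  # P(x_i = +1 | blanket)
    return 1 if rng.uniform() < p_plus else -1

# Usage: a noisy observation y_i = -1 surrounded by mostly +1 neighbours.
print(sample_xi(y_i=-1, neighbours=np.array([1, 1, 1, -1])))
```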

  18. Gibbs sampling as a special case of MH
  • Proposal distribution: q_k(z* | z) = p(z*_k | z_\k)
  • The other dimensions are held constant: z*_\k = z_\k
  • Also, p(z) = p(z_k | z_\k) p(z_\k)
  • So the acceptance probability is:
    A(z*, z) = [p(z*) q_k(z | z*)] / [p(z) q_k(z* | z)]
             = [p(z*_k | z_\k) p(z_\k) p(z_k | z_\k)] / [p(z_k | z_\k) p(z_\k) p(z*_k | z_\k)] = 1
  • So the step is always accepted
  Source: "Pattern Recognition and Machine Learning", book and slides by Christopher Bishop

  19. Issues with Gibbs sampling
  • Initialization is random
  • Samples are not independent
    – A burn-in period should be discarded (a random initialization may start in, and wander through, a low-probability region for some time)
  • Time taken is linear in the number of samples
  • The number of iterations needed scales with dimensionality

  20. RBM and its energy function defined
  • An RBM is a bipartite graph between its visible and hidden sets of nodes
  • Its energy function (for binary units v_i, h_j, with weights w_ij and biases b_i, c_j) is:
    E(v, h) = -Σ_i Σ_j w_ij v_i h_j - Σ_i b_i v_i - Σ_j c_j h_j
  Source: "An Introduction to Restricted Boltzmann Machines", Asja Fischer and Christian Igel, CIARP 2012
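
A minimal sketch evaluating this energy (assumed names: W for the weight matrix, b and c for the visible and hidden biases; binary 0/1 units):

```python
import numpy as np

def rbm_energy(v, h, W, b, c):
    # E(v, h) = -v^T W h - b^T v - c^T h
    return -(v @ W @ h) - b @ v - c @ h

# Usage with a tiny random RBM.
rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.1, size=(6, 3))
b, c = np.zeros(6), np.zeros(3)
v = rng.integers(0, 2, size=6).astype(float)
h = rng.integers(0, 2, size=3).astype(float)
print(rbm_energy(v, h, W, b, c))
```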

  21. Marginals in an RBM
  • The hidden nodes can be used to learn a code explaining the visible nodes
    – In a Deep Belief Net (DBN), more layers can be added on top
  • Due to its bipartite nature, the Markov blanket of a node in either set is simply the other set of nodes
  • This leads to a simple product form for the distribution over the nodes of one set
  • This leads towards a formulation called Product of Experts (PoE)
  Source: "An Introduction to Restricted Boltzmann Machines", Asja Fischer and Christian Igel, CIARP 2012

  22. Gibbs Sampling in RBM
  • Let us look at the marginal of the visible nodes: summing out the hidden units factorizes over j, giving
    p(v) = (1/Z) exp(Σ_i b_i v_i) Π_j (1 + exp(c_j + Σ_i w_ij v_i))
  • This is the Product of Experts form mentioned on the previous slide (a code sketch follows)
  Source: "An Introduction to Restricted Boltzmann Machines", Asja Fischer and Christian Igel, CIARP 2012
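
A minimal sketch computing this marginal up to the normalizer Z, in log space (same assumed W, b, c names; the product over hidden units becomes a sum of log terms):

```python
import numpy as np

def log_p_tilde_v(v, W, b, c):
    # log p~(v) = b^T v + sum_j log(1 + exp(c_j + sum_i w_ij v_i))
    return b @ v + np.sum(np.log1p(np.exp(v @ W + c)))

# Usage with a tiny random RBM.
rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.1, size=(6, 3))
b, c = np.zeros(6), np.zeros(3)
v = rng.integers(0, 2, size=6).astype(float)
print(log_p_tilde_v(v, W, b, c))
```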

  23. Gibbs Sampling in RBM
  • The RBM can be interpreted as a stochastic neural network, for which block Gibbs sampling can be used
    – Sample all hidden units at once given the visible units, then all visible units at once given the hidden units, and repeat (see the sketch below)
  Source: "An Introduction to Restricted Boltzmann Machines", Asja Fischer and Christian Igel, CIARP 2012
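
A minimal sketch of block Gibbs in a binary RBM (same assumed names; because of the bipartite structure, all hidden units are conditionally independent given the visible units, and vice versa, so each set is sampled in one block):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_visible, n_hidden = 6, 3
W = rng.normal(0.0, 0.1, size=(n_visible, n_hidden))
b, c = np.zeros(n_visible), np.zeros(n_hidden)

v = rng.integers(0, 2, size=n_visible).astype(float)  # random initial visible state
for _ in range(1_000):
    p_h = sigmoid(v @ W + c)    # P(h_j = 1 | v), all j in one block
    h = (rng.uniform(size=n_hidden) < p_h).astype(float)
    p_v = sigmoid(h @ W.T + b)  # P(v_i = 1 | h), all i in one block
    v = (rng.uniform(size=n_visible) < p_v).astype(float)

print(v)  # an approximate sample from the RBM's visible marginal
```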

  24. In summary
  • Monte Carlo methods are often preferred over analytical methods for estimating probability distributions and their marginals in complex PGMs
  • Rejection sampling and importance sampling do not use samples effectively
    – Finding a good proposal distribution can be tricky
  • Markov chain Monte Carlo is often preferred over simple Monte Carlo
  • The initial few samples of MCMC methods are discarded as burn-in
  • Metropolis(-Hastings) uses a proposal step distribution
  • Gibbs sampling is the most widely preferred MCMC method
    – It makes use of Markov blankets to compute single-variable conditionals
