
Stochastic Simulation: Markov Chain Monte Carlo, Bo Friis Nielsen


  1. Stochastic Simulation: Markov Chain Monte Carlo. Bo Friis Nielsen, Institute of Mathematical Modelling, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark. Email: bfni@dtu.dk

  2. MCMC: What we aim to achieve
We have a variable X with a "complicated" distribution. We cannot sample X directly. We aim to generate a sequence of X_i's
• which each has the same distribution as X,
• but which we allow to be interdependent.
This is an inverse problem relative to the queueing exercise: we start with the distribution of X and aim to design a state machine which has this as its steady-state distribution.
DTU 02443 – lecture 8

  3. MCMC example from Bayesian statistics
Prior distribution of the parameter: P ~ U(0, 1), i.e. f_P(p) = 1 for 0 ≤ p ≤ 1.
Distribution of data, conditional on the parameter: X for given P = p is Binomial(n, p), i.e. the data has the conditional probabilities
P(X = i | P = p) = C(n, i) p^i (1 − p)^(n − i)

  4. The posterior distribution of P
Conditional density of the parameter, given observed data X = i (the posterior distribution):
f_{P|X=i}(p) = f_P(p) P(X = i | P = p) / P(X = i)
We need the unconditional probability of the observation:
P(X = i) = ∫₀¹ f_P(p) C(n, i) p^i (1 − p)^(n − i) dp
We can evaluate this; in more complex models we could not.
AIM: to sample from f_{P|X=i} without evaluating c = 1 / P(X = i).
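As a sketch of what the following slides do: the posterior above is proportional to p^i (1 − p)^(n − i) on (0, 1), so it can be sampled with a random-walk Metropolis sampler that only ever evaluates the unnormalized density. The function names, step size, and sample counts below are illustrative choices, not taken from the slides:

```python
import math
import random

def log_g(p, i, n):
    """Unnormalized log-posterior log(p^i (1-p)^(n-i)); -inf outside (0, 1)."""
    if not 0.0 < p < 1.0:
        return float("-inf")
    return i * math.log(p) + (n - i) * math.log(1.0 - p)

def mh_posterior(i, n, n_samples=20000, step=0.1, seed=1):
    """Random-walk Metropolis targeting f_{P|X=i} without its constant c."""
    rng = random.Random(seed)
    p = 0.5                                    # arbitrary start inside (0, 1)
    samples = []
    for _ in range(n_samples):
        q = p + rng.uniform(-step, step)       # symmetric proposal
        if math.log(rng.random()) < log_g(q, i, n) - log_g(p, i, n):
            p = q                              # accept
        samples.append(p)                      # on reject, repeat the old p
    return samples
```

For the uniform prior the posterior is in fact Beta(i + 1, n − i + 1), so the chain's long-run mean should approach (i + 1)/(n + 2), which gives an easy sanity check on the sampler.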

  5. The posterior distribution
[Figure: time series of the sampled P over 100 iterations, and a histogram of Pvec]

  6. When to apply MCMC?
The distribution is given by f(x) = c · g(x), where the unnormalized density g can be evaluated, but the normalising constant
c = 1 / ∫_X g(x) dx
cannot be evaluated (easily). This is frequently the case in Bayesian statistics: the posterior density is proportional to the likelihood function. Note (again) the similarity between simulation and evaluation of integrals.

  7. Metropolis-Hastings algorithm
• Proposal distribution h(x, y)
• Acceptance of a proposal? The proposal will be accepted with probability
min(1, f(y) h(y, x) / (f(x) h(x, y))) = min(1, g(y) h(y, x) / (g(x) h(x, y)))
• This avoids the troublesome constant c!
• Frequently we apply a symmetric proposal distribution, h(x, y) = h(y, x), to get the Metropolis algorithm: the acceptance probability reduces to
min(1, g(y) / g(x)) for h(y, x) = h(x, y)
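A minimal generic sketch of the algorithm, with the proposal density passed in so that the h(y, x)/h(x, y) correction for asymmetric proposals stays visible. Function names and the standard-normal target mentioned below are illustrative assumptions:

```python
import math
import random

def metropolis_hastings(log_g, propose, log_h, x0, n_steps, seed=0):
    """Metropolis-Hastings for an unnormalized log-density log_g.

    propose(x, rng) draws y from h(x, .); log_h(x, y) is the log-density
    of proposing y while sitting at x.  Only ratios of g appear, so the
    normalizing constant c never has to be evaluated.
    """
    rng = random.Random(seed)
    x = x0
    chain = [x]
    for _ in range(n_steps):
        y = propose(x, rng)
        # log of g(y) h(y, x) / (g(x) h(x, y))
        log_alpha = (log_g(y) + log_h(y, x)) - (log_g(x) + log_h(x, y))
        if math.log(rng.random()) < min(0.0, log_alpha):
            x = y
        chain.append(x)
    return chain
```

With a symmetric proposal (e.g. a Gaussian random walk) the two log_h terms cancel and the rule reduces to the Metropolis acceptance min(1, g(y)/g(x)), exactly as on the slide.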

  8. Metropolis-Hastings algorithm and local balance
The transition rate q(x, y) from x to y and vice versa is
q(x, y) = h(x, y) min(1, g(y) h(y, x) / (g(x) h(x, y)))
and
q(y, x) = h(y, x) min(1, g(x) h(x, y) / (g(y) h(y, x)))
Suppose g(y) h(y, x) < g(x) h(x, y). Then
f(x) q(x, y) = c g(x) h(x, y) · g(y) h(y, x) / (g(x) h(x, y)) = c g(y) h(y, x) = f(y) q(y, x)

  9. Random Walk Metropolis-Hastings
A simple symmetric proposal distribution is the random walk:
1. At iteration i, the state is X_i.
2. Propose to jump from X_i to Y_i = X_i + ΔX_i, where ΔX_i is sampled independently from a symmetric distribution.
• If g(Y_i) ≥ g(X_i), accept.
• If g(Y_i) < g(X_i), accept w.p. g(Y_i) / g(X_i).
3. On accept: set X_{i+1} = Y_i and go to 1.
4. On reject: set X_{i+1} = X_i and go to 1.
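The four steps above transcribe almost line for line. As an illustration, here they are applied to an unnormalized double-exponential target g(x) = exp(−|x|); the target and the tuning constants are my choices, not from the slides:

```python
import math
import random

def g(x):
    # unnormalized double-exponential density; the true variance is 2
    return math.exp(-abs(x))

def random_walk_metropolis(n_steps, step=1.0, seed=2):
    rng = random.Random(seed)
    x = 0.0                                   # initial state X_0
    chain = []
    for _ in range(n_steps):
        y = x + rng.uniform(-step, step)      # step 2: symmetric jump
        if g(y) >= g(x) or rng.random() < g(y) / g(x):
            x = y                             # step 3: accept
        chain.append(x)                       # step 4: on reject, keep X_i
    return chain
```

Note how a rejected proposal still produces an entry in the chain (the old state is repeated); dropping rejected iterations would bias the sample.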

  10. Proposal distribution (Gelman 1998)
• A good proposal distribution has the following properties:
⋄ For any x, it is easy to sample from h(x, y).
⋄ It is easy to compute the acceptance probability.
⋄ Each jump goes a reasonable distance in the parameter space.
⋄ The proposals are not rejected too frequently.

  11. Illustration of ordinary MCMC sampling
A new proposal can be anywhere in the full region.

  12. Gibbs sampling
• Applies in multivariate cases where the conditional distributions among the coordinates are known.
• For a multidimensional variable x the Gibbs sampler will modify only one coordinate at a time.
• Typically d steps in each iteration, where d is the dimension of the parameter space, that is, of x.
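As an illustrative sketch (the bivariate-normal target and ρ = 0.5 below are my assumptions, not from the slides): for a bivariate normal with unit variances and correlation ρ, both full conditionals are known exactly, X | Y = y ~ N(ρy, 1 − ρ²) and symmetrically for Y, so each iteration takes d = 2 conditional draws:

```python
import math
import random

def gibbs_bivariate_normal(rho, n_iter, seed=3):
    """Gibbs sampling: d = 2, one exact conditional draw per coordinate."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)   # std dev of each full conditional
    x, y = 0.0, 0.0
    samples = []
    for _ in range(n_iter):
        x = rng.gauss(rho * y, s)    # draw X | Y = y
        y = rng.gauss(rho * x, s)    # draw Y | X = x
        samples.append((x, y))
    return samples
```

Because every draw comes from the exact conditional distribution, there is no accept/reject step: every iteration moves the chain.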

  13. Gibbs sampling - first dimension
• One dimension is updated at a time; here the first.

  14. Gibbs sampling - second dimension
• One dimension is updated at a time; here the second.

  15. A different perspective: modelling with Markov chains as opposed to MCMC sampling
• For an ordinary Markov chain we know P and find π, analytically or by simulation.
• When we apply MCMC:
⋄ For a discrete distribution we have π = cα; we construct a P, which in general has no physical interpretation, and obtain π by simulation.
⋄ For a continuous distribution we have the density f(x) = c g(x); we construct a transition kernel P(x, y) and get f(x) by simulation.

  16. Remarks
• The method is computer intensive.
• It is hard to verify the assumptions (read: impossible).
• A warmup period is strongly recommended (indeed necessary!).
• The samples are correlated.
• The sampler should be run several times with different starting conditions,
⋄ comparing within-run variance with between-run variance.
• Check the BUGS site, http://www.mrc-bsu.cam.ac.uk/bugs/, and/or links given at the BUGS site.

  17. Further reading
• A. Gelman, J.B. Carlin, H.S. Stern, D.B. Rubin: Bayesian Data Analysis, Chapman & Hall 1998, ISBN 0-412-03991-5
• W.R. Gilks, S. Richardson, D.J. Spiegelhalter: Markov Chain Monte Carlo in Practice, Chapman & Hall 1996, ISBN 0-412-05551-1

  18. Beyond Random Walk Metropolis-Hastings
• Proposed points Y_i can be generated with other schemes; this would change the acceptance probabilities.
• In multivariate situations, we can process one coordinate at a time.
• If we know the conditional distributions in the multivariate setting, then we can apply Gibbs sampling.
• This is well suited for graphical models with many variables, each of which interacts only with a few others.
• (Decision support systems are a big area of application.)
• Many hybrids and specialized versions exist.
• It is a very active research area, both in theory and applications.

  19. Exercise 6: Markov Chain Monte Carlo simulation
1. The number of busy lines in a trunk group (Erlang system) is given by a truncated Poisson distribution
P(i) = (A^i / i!) / (Σ_{j=0}^{m} A^j / j!), i = 0, ..., m
Generate values from this distribution by applying the Metropolis-Hastings algorithm, and verify with a χ²-test. You can use the parameter values from exercise 4.
2. For two different call types the joint number of occupied lines is given by
P(i, j) = (1/K) (A₁^i / i!) (A₂^j / j!), 0 ≤ i + j ≤ m
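One possible shape for part 1 is a symmetric ±1 random walk on the states {0, ..., m}; since the proposal is symmetric, the acceptance probability is just min(1, g(j)/g(i)) with g(i) = A^i / i!. The values A = 8 and m = 10 used in the check below are placeholders, since the exercise refers to exercise 4 for the actual parameters:

```python
import math
import random

def g_trunc(i, A):
    # unnormalized truncated-Poisson weight A^i / i!
    return A ** i / math.factorial(i)

def mh_truncated_poisson(A, m, n_steps, seed=4):
    rng = random.Random(seed)
    i = 0
    chain = []
    for _ in range(n_steps):
        j = i + rng.choice((-1, 1))   # symmetric random-walk proposal
        # proposals outside {0, ..., m} are simply rejected
        if 0 <= j <= m and rng.random() < min(1.0, g_trunc(j, A) / g_trunc(i, A)):
            i = j
        chain.append(i)
    return chain
```

The empirical state frequencies can then be compared with the exact P(i) via a χ²-test; since the samples are correlated, a long run (or subsampling) is advisable before testing.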

  20. You can use A₁ = A₂ = 4 and n = 10.
(a) Use Metropolis-Hastings directly to generate variates from this distribution.
(b) Use Metropolis-Hastings coordinate-wise to generate variates from this distribution.
(c) Use Gibbs sampling to sample from the distribution. This is (also) coordinate-wise, but here we use the exact conditional distributions, which you will need to find analytically.
In all three cases test the distribution fit with a χ²-test.
The system can be extended to an arbitrary dimension, and we can add restrictions on the different call types.
