

  1. Stochastic Simulation Markov Chain Monte Carlo Bo Friis Nielsen Institute of Mathematical Modelling Technical University of Denmark 2800 Kgs. Lyngby – Denmark Email: bfni@dtu.dk

  2. The queueing example

  We simulated the system until "stochastic steady state". We were then able to describe this steady state:
  • What is the distribution of occupied servers?
  • What is the rejection probability?
  The model was a "state machine", i.e. a Markov chain. To obtain steady-state statistics, we used stochastic simulation, i.e. Monte Carlo.

  3. Discrete time Markov chains

  • We observe a sequence of $X_n$'s taking values in some sample space.
  • The next value in the sequence, $X_{n+1}$, is determined by a decision rule depending on the value of $X_n$ only.
  • For a discrete sample space we can express the decision rule as a matrix of transition probabilities $P = \{p_{ij}\}$, $p_{ij} = P(X_{n+1} = j \mid X_n = i)$.
  • Under some technical assumptions we can find a stationary and limiting distribution $\pi$, with $\pi_j = P(X_\infty = j)$.
  • This distribution can be found analytically by solving $\pi = \pi P$ (the equilibrium distribution); see the numerical sketch below.
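
  As a minimal sketch (my addition, not in the slides), the equilibrium equation $\pi = \pi P$ can be solved numerically by combining it with the normalisation $\sum_j \pi_j = 1$; the 3-state transition matrix below is a made-up example.

  ```python
  import numpy as np

  # Hypothetical 3-state transition matrix (rows sum to 1).
  P = np.array([[0.5, 0.3, 0.2],
                [0.2, 0.6, 0.2],
                [0.1, 0.4, 0.5]])

  # Solve pi = pi P subject to sum(pi) = 1: transpose pi (P - I) = 0
  # into (P^T - I) pi^T = 0 and append the normalisation equation.
  n = P.shape[0]
  A = np.vstack([P.T - np.eye(n), np.ones(n)])   # (n+1) x n system
  b = np.concatenate([np.zeros(n), [1.0]])
  pi, *_ = np.linalg.lstsq(A, b, rcond=None)

  print(pi)        # the stationary distribution
  print(pi @ P)    # equals pi again, confirming pi = pi P
  ```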

  4. Markov chains continued

  • The theory can be extended to:
  ⋄ continuous sample space, or
  ⋄ continuous time: exercise 4 is an example of a continuous time Markov chain.

  5. The probability of $X_n$

  • The behaviour of the process itself: $X_n$.
  • The behaviour conditional on $X_0 = i$ is given by the $n$-step transition probabilities $(p_{ij}(n))$.
  • Define $\mu^{(n)}_j = P(X_n = j)$, with $\mu^{(0)}_j = P(X_0 = j)$.
  • With $\mu^{(n)} = \{\mu^{(n)}_j\}$ we find
  $$\mu^{(n)} = \mu^{(n-1)} P = \mu^{(0)} P^n$$

  6. Small example

  $$P = \begin{pmatrix} 1-p & p & 0 & 0 \\ q & 0 & p & 0 \\ 0 & q & 0 & p \\ 0 & 0 & q & 1-q \end{pmatrix}$$

  With $\mu^{(0)} = \left(\tfrac{1}{3},\, 0,\, 0,\, \tfrac{2}{3}\right)$ we get

  $$\mu^{(1)} = \mu^{(0)} P = \left(\tfrac{1-p}{3},\; \tfrac{p}{3},\; \tfrac{2q}{3},\; \tfrac{2(1-q)}{3}\right)$$

  7. and

  With $\mu^{(0)} = \left(\tfrac{1}{3},\, 0,\, 0,\, \tfrac{2}{3}\right)$,

  $$P^2 = \begin{pmatrix} (1-p)^2 + pq & (1-p)p & p^2 & 0 \\ q(1-p) & 2qp & 0 & p^2 \\ q^2 & 0 & 2qp & p(1-q) \\ 0 & q^2 & (1-q)q & (1-q)^2 + qp \end{pmatrix}$$

  8.

  $$\mu^{(2)} = \mu^{(0)} P^2 = \left(\tfrac{1}{3},\, 0,\, 0,\, \tfrac{2}{3}\right) P^2 = \left(\tfrac{(1-p)^2 + pq}{3},\; \tfrac{(1-p)p + 2q^2}{3},\; \tfrac{p^2 + 2(1-q)q}{3},\; \tfrac{2qp + 2(1-q)^2}{3}\right)$$
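
  A quick numerical check of this worked example (my addition, not in the slides). Note that rows two and three of $P$ sum to $p + q$, so the matrix is stochastic only when $p + q = 1$; the particular values of $p$ and $q$ below are arbitrary.

  ```python
  import numpy as np

  p, q = 0.4, 0.6          # arbitrary values; the chain needs p + q = 1
  P = np.array([[1 - p, p,     0,     0    ],
                [q,     0,     p,     0    ],
                [0,     q,     0,     p    ],
                [0,     0,     q,     1 - q]])

  mu0 = np.array([1/3, 0, 0, 2/3])
  mu1 = mu0 @ P                              # distribution after one step
  mu2 = mu0 @ np.linalg.matrix_power(P, 2)   # distribution after two steps

  print(mu1)   # matches ((1-p)/3, p/3, 2q/3, 2(1-q)/3)
  print(mu2)   # matches the mu(2) expression above; sums to 1
  ```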

  9. MCMC: What we aim to achieve

  We have a variable X with a "complicated" distribution; we cannot sample X directly. We aim to generate a sequence of $X_i$'s
  • each of which has the same distribution as X,
  • but we allow them to be interdependent.
  This is an inverse problem relative to the queueing exercise: we start with the distribution of X and aim to design a state machine which has this steady-state distribution.

  10. MCMC example from Bayesian statistics

  Prior distribution of the parameter: $P \sim U(0, 1)$, i.e. $f_P(p) = 1$ for $0 \le p \le 1$.
  Distribution of the data, conditional on the parameter: $X$ for given $P = p$ is Binomial$(n, p)$, i.e. the data has the conditional probabilities
  $$P(X = i \mid P = p) = \binom{n}{i} p^i (1-p)^{n-i}$$

  11. The posterior distribution of P

  Conditional density of the parameter, given observed data $X = i$:
  $$f_{P \mid X=i}(p) = \frac{f_P(p)\, P(X = i \mid P = p)}{P(X = i)}$$
  We need the unconditional probability of the observation:
  $$P(X = i) = \int_0^1 f_P(p) \binom{n}{i} p^i (1-p)^{n-i} \, dp$$
  We can evaluate this here; in more complex models we could not.
  AIM: To sample from $f_{P \mid X=i}$ without evaluating $P(X = i)$.
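
  As an aside (my sketch, not from the slides): with the uniform prior this particular integral has the closed form $P(X = i) = 1/(n+1)$, and the posterior is the Beta$(i+1,\, n-i+1)$ density, so this example has an exact answer to check MCMC output against. The values of n and i below are arbitrary.

  ```python
  from scipy.integrate import quad
  from scipy.special import comb

  n, i = 10, 3                  # arbitrary example values

  # Unconditional probability P(X = i) = integral of C(n,i) p^i (1-p)^(n-i) over (0,1)
  integral, _ = quad(lambda p: comb(n, i) * p**i * (1 - p)**(n - i), 0, 1)

  print(integral)       # ~ 0.0909...
  print(1 / (n + 1))    # the closed form 1/(n+1), identical
  ```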

  12. The posterior distribution

  [Figure: a trace plot of P over 100 iterations, and a histogram of the sampled values Pvec.]

  13. When to apply MCMC?

  The distribution is given by $f(x) = c \cdot g(x)$, where the unnormalised density $g$ can be evaluated, but the normalising constant
  $$c = \frac{1}{\int_X g(x)\, dx}$$
  cannot be evaluated (easily). This is frequently the case in Bayesian statistics: the posterior density is proportional to the likelihood function. Note (again) the similarity between simulation and the evaluation of integrals.

  14. Metropolis-Hastings algorithm

  • Proposal distribution $h(x, y)$.
  • Acceptance of the proposal? The proposal will be accepted with probability
  $$\min\left(1,\; \frac{f(y)\, h(y, x)}{f(x)\, h(x, y)}\right) = \min\left(1,\; \frac{g(y)\, h(y, x)}{g(x)\, h(x, y)}\right) = \min\left(1,\; \frac{g(y)}{g(x)}\right) \quad \text{for } h(y, x) = h(x, y)$$
  • This avoids the troublesome normalising constant $c$!
  • Frequently we apply a symmetric proposal distribution, $h(y, x) = h(x, y)$: the Metropolis algorithm.
  • It can be shown that this Markov chain has $f(x)$ as its stationary distribution.
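
  A minimal sketch of one step of the algorithm above (my illustration; the names `g`, `h_density`, and `h_sample` are assumptions, not from the slides), keeping the full Hastings ratio so asymmetric proposals are handled correctly.

  ```python
  import random

  def mh_step(x, g, h_density, h_sample):
      """One Metropolis-Hastings step.

      g         : unnormalised target density, g(x)
      h_density : proposal density; h_density(x, y) = density of proposing y from x
      h_sample  : draws a proposal, y = h_sample(x)
      """
      y = h_sample(x)
      # Hastings acceptance ratio: g(y) h(y, x) / (g(x) h(x, y))
      ratio = (g(y) * h_density(y, x)) / (g(x) * h_density(x, y))
      if random.random() < min(1.0, ratio):
          return y      # accept the proposal
      return x          # reject: stay at the current state
  ```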

  15. Random Walk Metropolis-Hastings

  Sampling from a p.d.f. $c \cdot g(x)$ where $c$ is unknown.
  1. At iteration $i$, the state is $X_i$.
  2. Propose a jump from $X_i$ to $Y_i = X_i + \Delta X_i$, where $\Delta X_i$ is sampled independently from a symmetric distribution.
     • If $g(Y_i) \ge g(X_i)$, accept.
     • If $g(Y_i) < g(X_i)$, accept with probability $g(Y_i)/g(X_i)$.
  3. On accept: set $X_{i+1} = Y_i$ and go to 1.
  4. On reject: set $X_{i+1} = X_i$ and go to 1.
  Note that knowing $c$ is not necessary!
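
  A runnable sketch of these steps (my addition), applied to the unnormalised posterior $g(p) = p^i (1-p)^{n-i}$ from the Bayesian example above; the step size 0.1, the run lengths, and the starting point are arbitrary choices.

  ```python
  import random

  n, i = 10, 3   # arbitrary data for the Bayesian example
  # Unnormalised posterior; zero outside (0, 1) so such proposals are rejected.
  g = lambda p: p**i * (1 - p)**(n - i) if 0 < p < 1 else 0.0

  x = 0.5        # arbitrary starting point
  Pvec = []
  for _ in range(10_000):
      y = x + random.uniform(-0.1, 0.1)           # symmetric random-walk proposal
      if random.random() < min(1.0, g(y) / g(x)):
          x = y                                   # accept
      Pvec.append(x)                              # on reject, x is repeated

  Pvec = Pvec[1000:]              # discard a warm-up period
  print(sum(Pvec) / len(Pvec))    # ~ (i+1)/(n+2) = 1/3, the Beta(4, 8) mean
  ```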

  16. Proposal distribution (Gelman 1998)

  A good proposal distribution has the following properties:
  ⋄ For any $x$, it is easy to sample from $h(x, y)$.
  ⋄ It is easy to compute the acceptance probability.
  ⋄ Each jump travels a reasonable distance in the parameter space.
  ⋄ The proposals are not rejected too frequently.

  17. Gibbs sampling

  • Applies in multivariate cases where the conditional distributions among the coordinates are known.
  • For a multidimensional distribution of $x$, the Gibbs sampler modifies only one coordinate at a time.
  • Typically $d$ steps in each iteration, where $d$ is the dimension of the parameter space, that is, of $x$.
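
  A minimal Gibbs-sampler sketch (my illustration, not from the slides) for a standard bivariate normal with correlation $\rho$, where each full conditional is itself normal: $X_1 \mid X_2 = x_2 \sim N(\rho x_2,\, 1 - \rho^2)$, and symmetrically for $X_2$. The value of $\rho$ and the run length are arbitrary.

  ```python
  import random

  rho = 0.8                 # assumed correlation
  x1, x2 = 0.0, 0.0         # arbitrary starting point
  samples = []
  for _ in range(10_000):
      # One iteration = a sweep over the d = 2 coordinates, one conditional draw each:
      x1 = random.gauss(rho * x2, (1 - rho**2) ** 0.5)   # draw X1 | X2 = x2
      x2 = random.gauss(rho * x1, (1 - rho**2) ** 0.5)   # draw X2 | X1 = x1
      samples.append((x1, x2))

  # Empirical E[X1 X2] should be close to rho (means ~0, variances ~1 by construction):
  print(sum(a * b for a, b in samples) / len(samples))   # ~ 0.8
  ```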

  18. Illustration of ordinary and MCMC sampling [figure]

  19. Gibbs sampling - first dimension [figure]

  20. Gibbs sampling - second dimension [figure]

  21. Direct Markov chain as opposed to MCMC

  • For an ordinary Markov chain we know $P$ and find $\pi$, analytically or by simulation.
  • When we apply MCMC:
  ⋄ For a discrete distribution we know $\pi$ (possibly only up to a constant $K$), construct a $P$ which in general has no physical interpretation, and obtain $\pi$ by simulation.
  ⋄ For a continuous distribution we know $g(x)$, construct a transition kernel $P(x, y)$, and get $f(x)$ by simulation.

  22. Remarks

  • The method is computer intensive.
  • It is hard to verify the assumptions (read: impossible).
  • A warm-up period is strongly recommended (indeed necessary!).
  • The samples are correlated.
  • The sampler should be run several times with different starting conditions:
  ⋄ comparing within-run variance with between-run variance, as sketched below.
  • Check the BUGS site: http://www.mrc-bsu.cam.ac.uk/bugs/ and/or the links given at the BUGS site.
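
  A sketch of the within-run versus between-run comparison (my addition; this ratio is the quantity behind the Gelman-Rubin diagnostic, and the chains below are placeholders). Values much larger than 1 suggest the runs have not mixed.

  ```python
  import numpy as np

  def gelman_rubin(chains):
      """chains: array of shape (m, n) -- m independent runs of length n each."""
      m, n = chains.shape
      W = chains.var(axis=1, ddof=1).mean()       # mean within-run variance
      B = n * chains.mean(axis=1).var(ddof=1)     # between-run variance
      var_hat = (n - 1) / n * W + B / n           # pooled variance estimate
      return np.sqrt(var_hat / W)                 # "R-hat": ~1 when runs agree

  # Placeholder example: four runs that happen to agree (independent draws).
  rng = np.random.default_rng(0)
  chains = rng.normal(size=(4, 1000))
  print(gelman_rubin(chains))    # close to 1
  ```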

  23. Further reading

  • A. Gelman, J.B. Carlin, H.S. Stern, D.B. Rubin: Bayesian Data Analysis, Chapman & Hall 1998, ISBN 0-412-03991-5.
  • W.R. Gilks, S. Richardson, D.J. Spiegelhalter: Markov Chain Monte Carlo in Practice, Chapman & Hall 1996, ISBN 0-412-05551-1.

  24. Beyond Random Walk Metropolis-Hastings

  • Proposed points $Y_i$ can be generated with other schemes; this changes the acceptance probabilities.
  • In multivariate situations, we can process one coordinate at a time.
  • If we know the conditional distributions in the multivariate setting, then we can apply Gibbs sampling.
  • This is well suited for graphical models with many variables, each of which interacts only with a few others.
  • (Decision support systems are a big area of application.)
  • Many hybrids and specialised versions exist.
  • Very active research area, both in theory and applications.

  25. Exercise 6: Markov Chain Monte Carlo simulation

  • The number of busy lines in a trunk group (Erlang system) is given by a truncated Poisson distribution:
  $$P(i) = \frac{A^i / i!}{\sum_{j=0}^{n} A^j / j!}, \qquad i = 0, \ldots, n$$
  • Generate values from this distribution by applying the Metropolis-Hastings algorithm, and verify with a $\chi^2$-test. You can use the parameter values from exercise 4; a possible starting point is sketched below.
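
  A possible starting point for the exercise (my sketch, not the official solution; the values A = 8 and n = 10 are placeholders for the exercise 4 parameters): a Metropolis walk on the states $0, \ldots, n$ with symmetric $\pm 1$ proposals, checked against the exact truncated Poisson with a $\chi^2$-test.

  ```python
  import math, random
  from scipy.stats import chisquare

  A, n = 8.0, 10                             # placeholder parameters
  g = lambda i: A**i / math.factorial(i)     # unnormalised truncated Poisson weights

  x = 0
  counts = [0] * (n + 1)
  for _ in range(100_000):
      y = x + random.choice((-1, 1))         # symmetric +/-1 proposal
      # Proposals outside {0, ..., n} are rejected, which preserves the target.
      if 0 <= y <= n and random.random() < min(1.0, g(y) / g(x)):
          x = y
      counts[x] += 1

  total = sum(counts)
  norm = sum(g(j) for j in range(n + 1))
  expected = [total * g(j) / norm for j in range(n + 1)]
  # Caution: the chi-square test assumes independent samples, but MCMC output
  # is correlated; thinning or batching the chain is advisable in practice.
  print(chisquare(counts, f_exp=expected))
  ```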
