Advanced Simulation - Lecture 6
George Deligiannidis, February 3rd, 2016


  1. Advanced Simulation - Lecture 6. George Deligiannidis, February 3rd, 2016.

  2. Irreducibility and Recurrence
     Proposition. Assume $\pi$ satisfies the positivity condition; then the Gibbs sampler yields a $\pi$-irreducible and recurrent Markov chain.
     Proof. Recurrence will follow from irreducibility and the fact that $\pi$ is invariant.
     Irreducibility. Let $\mathcal{X} \subset \mathbb{R}^d$ be such that $\pi(\mathcal{X}) = 1$. Write $K$ for the kernel and let $A \subset \mathcal{X}$ be such that $\pi(A) > 0$. Then for any $x \in \mathcal{X}$,
     $$K(x, A) = \int_A K(x, y)\, dy = \int_A \pi_{X_1 \mid X_{-1}}(y_1 \mid x_2, \dots, x_d) \times \cdots \times \pi_{X_d \mid X_{-d}}(y_d \mid y_1, \dots, y_{d-1})\, dy.$$

  3. Proof.
     Thus if for some $x \in \mathcal{X}$ and $A$ with $\pi(A) > 0$ we have $K(x, A) = 0$, we must have
     $$\pi_{X_1 \mid X_{-1}}(y_1 \mid x_2, \dots, x_d) \times \cdots \times \pi_{X_d \mid X_{-d}}(y_d \mid y_1, \dots, y_{d-1}) = 0$$
     for $\pi$-almost all $y = (y_1, \dots, y_d) \in A$. Therefore we must also have
     $$\pi(y_1, \dots, y_d) \propto \prod_{j=1}^d \frac{\pi_{X_j \mid X_{-j}}\left(y_j \mid y_{1:j-1}, x_{j+1:d}\right)}{\pi_{X_j \mid X_{-j}}\left(x_j \mid y_{1:j-1}, x_{j+1:d}\right)} = 0$$
     for almost all $y = (y_1, \dots, y_d) \in A$, and thus $\pi(A) = 0$, obtaining a contradiction.

  4. LLN for Gibbs Sampler
     Theorem. Assume the positivity condition is satisfied. Then for any integrable function $\varphi: \mathcal{X} \to \mathbb{R}$,
     $$\lim_{t \to \infty} \frac{1}{t} \sum_{i=1}^t \varphi\left(X^{(i)}\right) = \int_{\mathcal{X}} \varphi(x)\, \pi(x)\, dx$$
     for $\pi$-almost all starting values $X^{(1)}$.

  5. Example: Bivariate Normal Distribution
     Let $X := (X_1, X_2) \sim \mathcal{N}(\mu, \Sigma)$ where $\mu = (\mu_1, \mu_2)$ and
     $$\Sigma = \begin{pmatrix} \sigma_1^2 & \rho \\ \rho & \sigma_2^2 \end{pmatrix}.$$
     The Gibbs sampler proceeds as follows in this case:
     1. Sample $X_1^{(t)} \sim \mathcal{N}\left(\mu_1 + \frac{\rho}{\sigma_2^2}\left(X_2^{(t-1)} - \mu_2\right),\; \sigma_1^2 - \frac{\rho^2}{\sigma_2^2}\right)$.
     2. Sample $X_2^{(t)} \sim \mathcal{N}\left(\mu_2 + \frac{\rho}{\sigma_1^2}\left(X_1^{(t)} - \mu_1\right),\; \sigma_2^2 - \frac{\rho^2}{\sigma_1^2}\right)$.
     By proceeding this way, we generate a Markov chain $\left(X^{(t)}\right)$ whose successive samples are correlated. If successive values of $X^{(t)}$ are strongly correlated, we say that the Markov chain mixes slowly.
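A minimal sketch of this two-step update in Python (an illustration, not the lecture's code), using NumPy and treating $\rho$ as the off-diagonal entry of $\Sigma$; with $\sigma_1^2 = \sigma_2^2 = 1$ it also equals the correlation used in the figures below.

```python
import numpy as np

def gibbs_bivariate_normal(n_iter, rho, mu=(0.0, 0.0), sigma2=(1.0, 1.0),
                           x_init=(0.0, 0.0), seed=0):
    """Gibbs sampler for N(mu, Sigma) with off-diagonal covariance rho.

    sigma2 holds the diagonal entries (sigma_1^2, sigma_2^2) of Sigma.
    """
    rng = np.random.default_rng(seed)
    mu1, mu2 = mu
    s1, s2 = sigma2
    x1, x2 = x_init
    chain = np.empty((n_iter, 2))
    for t in range(n_iter):
        # X1 | X2 = x2  ~  N(mu1 + (rho/s2)(x2 - mu2), s1 - rho^2/s2)
        x1 = rng.normal(mu1 + (rho / s2) * (x2 - mu2), np.sqrt(s1 - rho**2 / s2))
        # X2 | X1 = x1  ~  N(mu2 + (rho/s1)(x1 - mu1), s2 - rho^2/s1)
        x2 = rng.normal(mu2 + (rho / s1) * (x1 - mu1), np.sqrt(s2 - rho**2 / s1))
        chain[t] = (x1, x2)
    return chain

# Compare mixing for rho = 0.1 vs rho = 0.99 (unit variances, so rho = correlation).
chain = gibbs_bivariate_normal(10_000, rho=0.99)
print(chain.mean(axis=0))  # ergodic averages, cf. the LLN for the Gibbs sampler above
```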

  6. Bivariate Normal Distribution
     Figure: Case where $\rho = 0.1$, first 100 steps (scatter plot of the chain in the (x, y) plane).

  7. Bivariate Normal Distribution
     Figure: Case where $\rho = 0.99$, first 100 steps (scatter plot of the chain in the (x, y) plane).

  8. Bivariate Normal Distribution
     Figure: Histogram of the first component of the chain after 1000 iterations. Small $\rho$ on the left, large $\rho$ on the right.

  9. Bivariate Normal Distribution
     Figure: Histogram of the first component of the chain after 10,000 iterations. Small $\rho$ on the left, large $\rho$ on the right.

  10. Bivariate Normal Distribution
     Figure: Histogram of the first component of the chain after 100,000 iterations. Small $\rho$ on the left, large $\rho$ on the right.

  11. Gibbs Sampling and Auxiliary Variables
     Gibbs sampling requires sampling from $\pi_{X_j \mid X_{-j}}$. In many scenarios, we can include a set of auxiliary variables $Z_1, \dots, Z_p$ and work with an "extended" distribution of joint density $\pi(x_1, \dots, x_d, z_1, \dots, z_p)$ such that
     $$\int \pi(x_1, \dots, x_d, z_1, \dots, z_p)\, dz_1 \cdots dz_p = \pi(x_1, \dots, x_d),$$
     and whose full conditionals are easy to sample.
     Examples: mixture models, capture-recapture models, Tobit models, probit models, etc.

  12. Mixtures of Normals
     [Figure: density of a three-component normal mixture, together with the densities of populations 1, 2 and 3.]
     Independent data $y_1, \dots, y_n$ with
     $$Y_i \mid \theta \sim \sum_{k=1}^K p_k\, \mathcal{N}\left(\mu_k, \sigma_k^2\right),$$
     where $\theta = \left(p_1, \dots, p_K, \mu_1, \dots, \mu_K, \sigma_1^2, \dots, \sigma_K^2\right)$.
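A minimal sketch (an illustration, not from the slides) of how data from such a mixture can be simulated by first drawing a component label for each point; these labels are exactly the auxiliary variables $Z_i$ introduced two slides below. The parameter values are those of the two-component example used in the later figures.

```python
import numpy as np

def sample_mixture(n, p, mu, sigma2, seed=0):
    """Draw n points from sum_k p_k N(mu_k, sigma_k^2) by first drawing a component label."""
    rng = np.random.default_rng(seed)
    z = rng.choice(len(p), size=n, p=p)                            # component label for each point
    y = rng.normal(np.asarray(mu)[z], np.sqrt(np.asarray(sigma2)[z]))
    return y, z

# Two-component example matching the later figure: 200 points from (1/2)N(-2,1) + (1/2)N(2,1).
y, z_true = sample_mixture(200, p=[0.5, 0.5], mu=[-2.0, 2.0], sigma2=[1.0, 1.0])
```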

  13. Bayesian Model
     Likelihood function:
     $$p(y_1, \dots, y_n \mid \theta) = \prod_{i=1}^n p(y_i \mid \theta) = \prod_{i=1}^n \sum_{k=1}^K \frac{p_k}{\sqrt{2\pi\sigma_k^2}} \exp\left(-\frac{(y_i - \mu_k)^2}{2\sigma_k^2}\right).$$
     Let's fix $K = 2$, $\sigma_k^2 = 1$ and $p_k = 1/K$ for all $k$.
     Prior model:
     $$p(\theta) = \prod_{k=1}^K p(\mu_k), \quad \text{where } \mu_k \sim \mathcal{N}(\alpha_k, \beta_k).$$
     Let us fix $\alpha_k = 0$, $\beta_k = 1$ for all $k$.
     It is not obvious how to sample from $p(\mu_1 \mid \mu_2, y_1, \dots, y_n)$.

  14. Auxiliary Variables for Mixture Models
     Associate to each $Y_i$ an auxiliary variable $Z_i \in \{1, \dots, K\}$ such that
     $$\mathbb{P}(Z_i = k \mid \theta) = p_k \quad \text{and} \quad Y_i \mid Z_i = k, \theta \sim \mathcal{N}\left(\mu_k, \sigma_k^2\right),$$
     so that
     $$p(y_i \mid \theta) = \sum_{k=1}^K \mathbb{P}(Z_i = k \mid \theta)\, \mathcal{N}\left(y_i; \mu_k, \sigma_k^2\right).$$
     The extended posterior is given by
     $$p(\theta, z_1, \dots, z_n \mid y_1, \dots, y_n) \propto p(\theta) \prod_{i=1}^n \mathbb{P}(z_i \mid \theta)\, p(y_i \mid z_i, \theta).$$
     The Gibbs sampler alternately samples from $\mathbb{P}(z_{1:n} \mid y_{1:n}, \mu_{1:K})$ and $p(\mu_{1:K} \mid y_{1:n}, z_{1:n})$.

  15. Gibbs Sampling for Mixture Model
     We have
     $$\mathbb{P}(z_{1:n} \mid y_{1:n}, \theta) = \prod_{i=1}^n \mathbb{P}(z_i \mid y_i, \theta),$$
     where
     $$\mathbb{P}(z_i \mid y_i, \theta) = \frac{\mathbb{P}(z_i \mid \theta)\, p(y_i \mid z_i, \theta)}{\sum_{k=1}^K \mathbb{P}(z_i = k \mid \theta)\, p(y_i \mid z_i = k, \theta)}.$$
     Let $n_k = \sum_{i=1}^n \mathbb{1}_{\{k\}}(z_i)$ and $n_k \bar{y}_k = \sum_{i=1}^n y_i\, \mathbb{1}_{\{k\}}(z_i)$; then
     $$\mu_k \mid z_{1:n}, y_{1:n} \sim \mathcal{N}\left(\frac{n_k \bar{y}_k}{1 + n_k}, \frac{1}{1 + n_k}\right).$$
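Putting the two conditional updates together, here is a minimal Gibbs sampler sketch for this model (an illustration, not the lecture's code), under the fixed setting $K = 2$, $\sigma_k^2 = 1$, $p_k = 1/K$ and $\mu_k \sim \mathcal{N}(0, 1)$ priors; it assumes the data array `y` from the simulation sketch above.

```python
import numpy as np

def gibbs_mixture(y, K=2, n_iter=10_000, seed=0):
    """Gibbs sampler for the mixture means mu_1..mu_K in the fixed setting
    sigma_k^2 = 1, p_k = 1/K, prior mu_k ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = len(y)
    mu = rng.normal(size=K)                                    # initialise means from the prior
    trace = np.empty((n_iter, K))
    for t in range(n_iter):
        # 1. z_i | y_i, mu: probabilities proportional to p_k * N(y_i; mu_k, 1); equal p_k cancel.
        logw = -0.5 * (y[:, None] - mu[None, :]) ** 2          # log N(y_i; mu_k, 1) up to a constant
        w = np.exp(logw - logw.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)
        u = rng.random(n)
        z = np.minimum((u[:, None] > np.cumsum(w, axis=1)).sum(axis=1), K - 1)  # inverse-CDF label draw
        # 2. mu_k | z, y ~ N(n_k * ybar_k / (1 + n_k), 1 / (1 + n_k))
        for k in range(K):
            s_k = y[z == k].sum()                              # = n_k * ybar_k
            n_k = (z == k).sum()
            mu[k] = rng.normal(s_k / (1 + n_k), np.sqrt(1.0 / (1 + n_k)))
        trace[t] = mu
    return trace

trace = gibbs_mixture(y)   # y from the simulation sketch above; trace[:, k] gives the chain for mu_k
```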

  16. Mixtures of Normals
     Figure: Histogram of 200 points sampled from $\frac{1}{2}\mathcal{N}(-2, 1) + \frac{1}{2}\mathcal{N}(2, 1)$.

  17. Mixtures of Normals
     Figure: Histograms of the parameters $\mu_1$ and $\mu_2$ obtained by 10,000 iterations of Gibbs sampling.

  18. Mixtures of Normals
     Figure: Traceplot of the parameters $\mu_1$ and $\mu_2$ obtained by 10,000 iterations of Gibbs sampling.

  19. Gibbs sampling in practice
     Many posterior distributions can be automatically decomposed into conditional distributions by computer programs. This is the idea behind BUGS (Bayesian inference Using Gibbs Sampling) and JAGS (Just Another Gibbs Sampler).

  20. Outline
     Given a target $\pi(x) = \pi(x_1, x_2, \dots, x_d)$, Gibbs sampling works by sampling from $\pi_{X_j \mid X_{-j}}(x_j \mid x_{-j})$ for $j = 1, \dots, d$.
     Sampling exactly from one of these full conditionals might be a hard problem in itself.
     Even if it is possible, the Gibbs sampler might converge slowly if components are highly correlated.
     If the components are not highly correlated, then Gibbs sampling performs well even when $d \to \infty$, e.g. with an error increasing "only" polynomially with $d$.
     The Metropolis–Hastings algorithm (Metropolis et al., 1953; Hastings, 1970) is a more general algorithm that can bypass these problems. Additionally, Gibbs sampling can be recovered as a special case.

  21. Metropolis–Hastings algorithm
     Target distribution on $\mathcal{X} = \mathbb{R}^d$ of density $\pi(x)$.
     Proposal distribution: for any $x, x' \in \mathcal{X}$, we have $q(x' \mid x) \geq 0$ and $\int_{\mathcal{X}} q(x' \mid x)\, dx' = 1$.
     Starting with $X^{(1)}$, for $t = 2, 3, \dots$:
     1. Sample $X^\star \sim q\left(\cdot \mid X^{(t-1)}\right)$.
     2. Compute
     $$\alpha\left(X^\star \mid X^{(t-1)}\right) = \min\left(1, \frac{\pi(X^\star)\, q\left(X^{(t-1)} \mid X^\star\right)}{\pi\left(X^{(t-1)}\right)\, q\left(X^\star \mid X^{(t-1)}\right)}\right).$$
     3. Sample $U \sim \mathcal{U}[0, 1]$. If $U \leq \alpha\left(X^\star \mid X^{(t-1)}\right)$, set $X^{(t)} = X^\star$; otherwise set $X^{(t)} = X^{(t-1)}$.
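A minimal sketch of the algorithm (an illustration, not the lecture's code), using a Gaussian random-walk proposal, a common choice that is not specified on the slide; since this proposal is symmetric, the $q$ terms cancel in $\alpha$. The bivariate Gaussian target, its correlation and the step size are illustrative, chosen to mimic the figure on the next slide.

```python
import numpy as np

def metropolis_hastings(log_pi, x_init, n_iter, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings: propose X* ~ N(X^(t-1), step^2 * I).
    With a symmetric proposal, alpha = min(1, pi(X*) / pi(X^(t-1)))."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x_init, dtype=float)
    chain = np.empty((n_iter, x.size))
    n_accept = 0
    for t in range(n_iter):
        x_star = x + step * rng.standard_normal(x.size)        # 1. propose
        log_alpha = log_pi(x_star) - log_pi(x)                 # 2. log acceptance ratio
        if np.log(rng.random()) <= log_alpha:                  # 3. accept or reject
            x = x_star
            n_accept += 1
        chain[t] = x
    return chain, n_accept / n_iter

# Bivariate Gaussian target with an illustrative correlation of 0.5.
rho = 0.5
prec = np.linalg.inv(np.array([[1.0, rho], [rho, 1.0]]))
log_pi = lambda x: -0.5 * x @ prec @ x                         # log density up to a constant
chain, acc_rate = metropolis_hastings(log_pi, x_init=[0.0, 0.0], n_iter=10_000)
```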

  22. Metropolis–Hastings algorithm
     Figure: Metropolis–Hastings on a bivariate Gaussian target.
