Advanced Simulation - Lecture 6


  1. Advanced Simulation - Lecture 6. Patrick Rebeschini. January 31st, 2018.

  2. Markov chain Monte Carlo. We are interested in sampling from a distribution π, for instance a posterior distribution in a Bayesian framework. Markov chains with π as invariant distribution can be constructed to approximate expectations with respect to π. For example, the Gibbs sampler generates a Markov chain targeting π, defined on $\mathbb{R}^d$, using the full conditionals $\pi(x_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_d)$.

  3. Gibbs Sampling. Assume you are interested in sampling from $\pi(x) = \pi(x_1, x_2, \ldots, x_d)$. Notation: $x_{-i} := (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_d)$. Systematic scan Gibbs sampler: let $(X_1^{(1)}, \ldots, X_d^{(1)})$ be the initial state, then iterate for $t = 2, 3, \ldots$:
     1. Sample $X_1^{(t)} \sim \pi_{X_1 \mid X_{-1}}(\,\cdot \mid X_2^{(t-1)}, \ldots, X_d^{(t-1)})$.
     ...
     j. Sample $X_j^{(t)} \sim \pi_{X_j \mid X_{-j}}(\,\cdot \mid X_1^{(t)}, \ldots, X_{j-1}^{(t)}, X_{j+1}^{(t-1)}, \ldots, X_d^{(t-1)})$.
     ...
     d. Sample $X_d^{(t)} \sim \pi_{X_d \mid X_{-d}}(\,\cdot \mid X_1^{(t)}, \ldots, X_{d-1}^{(t)})$.
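A minimal sketch of the systematic scan in code, assuming the user supplies a routine that draws from each full conditional (the function names below are illustrative, not from the slides):

```python
import numpy as np

def systematic_scan_gibbs(sample_full_conditional, x_init, n_iter, rng):
    """sample_full_conditional(j, x, rng) draws X_j from pi(x_j | x_{-j})."""
    d = len(x_init)
    chain = np.empty((n_iter, d))
    x = np.array(x_init, dtype=float)
    chain[0] = x
    for t in range(1, n_iter):
        for j in range(d):                 # sweep the components in a fixed order
            x[j] = sample_full_conditional(j, x, rng)
        chain[t] = x
    return chain
```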

  4. Gibbs Sampling. Is the joint distribution π uniquely specified by the conditional distributions $\pi_{X_i \mid X_{-i}}$? Does the Gibbs sampler provide a Markov chain with the correct stationary distribution π? If yes, does the Markov chain converge towards this invariant distribution? Under some mild conditions, the answer to all three questions turns out to be yes.

  5. Hammersley-Clifford Theorem. Theorem: consider a distribution whose density $\pi(x_1, x_2, \ldots, x_d)$ is such that $\mathrm{supp}(\pi) = \otimes_{i=1}^{d} \mathrm{supp}(\pi_{X_i})$. Then for any $(z_1, \ldots, z_d) \in \mathrm{supp}(\pi)$, we have
$$\pi(x_1, x_2, \ldots, x_d) \propto \prod_{j=1}^{d} \frac{\pi_{X_j \mid X_{-j}}(x_j \mid x_{1:j-1}, z_{j+1:d})}{\pi_{X_j \mid X_{-j}}(z_j \mid x_{1:j-1}, z_{j+1:d})}.$$
Proof: we have
$$\pi(x_{1:d-1}, x_d) = \pi_{X_d \mid X_{-d}}(x_d \mid x_{1:d-1})\, \pi(x_{1:d-1}), \qquad \pi(x_{1:d-1}, z_d) = \pi_{X_d \mid X_{-d}}(z_d \mid x_{1:d-1})\, \pi(x_{1:d-1}).$$
Therefore
$$\pi(x_{1:d}) = \pi(x_{1:d-1}, z_d)\, \frac{\pi_{X_d \mid X_{-d}}(x_d \mid x_{1:d-1})}{\pi_{X_d \mid X_{-d}}(z_d \mid x_{1:d-1})}.$$

  6. Hammersley-Clifford Theorem. Similarly, we have
$$\pi(x_{1:d-1}, z_d) = \pi_{X_{d-1} \mid X_{-(d-1)}}(x_{d-1} \mid x_{1:d-2}, z_d)\, \pi(x_{1:d-2}, z_d), \qquad \pi(x_{1:d-2}, z_{d-1}, z_d) = \pi_{X_{d-1} \mid X_{-(d-1)}}(z_{d-1} \mid x_{1:d-2}, z_d)\, \pi(x_{1:d-2}, z_d),$$
hence
$$\pi(x_{1:d}) = \pi(x_{1:d-2}, z_{d-1}, z_d)\, \frac{\pi_{X_{d-1} \mid X_{-(d-1)}}(x_{d-1} \mid x_{1:d-2}, z_d)}{\pi_{X_{d-1} \mid X_{-(d-1)}}(z_{d-1} \mid x_{1:d-2}, z_d)} \times \frac{\pi_{X_d \mid X_{-d}}(x_d \mid x_{1:d-1})}{\pi_{X_d \mid X_{-d}}(z_d \mid x_{1:d-1})}.$$
By iterating, we obtain the theorem, where the multiplicative constant is exactly $\pi(z_1, \ldots, z_d)$.
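For concreteness, in the case $d = 2$ the theorem reduces to the formula applied on the next slide: for any fixed $(z_1, z_2) \in \mathrm{supp}(\pi)$,
$$\pi(x_1, x_2) \propto \frac{\pi_{X_1 \mid X_2}(x_1 \mid z_2)}{\pi_{X_1 \mid X_2}(z_1 \mid z_2)} \cdot \frac{\pi_{X_2 \mid X_1}(x_2 \mid x_1)}{\pi_{X_2 \mid X_1}(z_2 \mid x_1)}.$$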

  7. Example: Non-Integrable Target. Consider the following conditionals on $\mathbb{R}_+$:
$$\pi_{X_1 \mid X_2}(x_1 \mid x_2) = x_2 \exp(-x_2 x_1), \qquad \pi_{X_2 \mid X_1}(x_2 \mid x_1) = x_1 \exp(-x_1 x_2).$$
We might expect that these full conditionals define a joint probability density $\pi(x_1, x_2)$. Hammersley-Clifford would give
$$\pi(x_1, x_2) \propto \frac{\pi_{X_1 \mid X_2}(x_1 \mid z_2)\, \pi_{X_2 \mid X_1}(x_2 \mid x_1)}{\pi_{X_1 \mid X_2}(z_1 \mid z_2)\, \pi_{X_2 \mid X_1}(z_2 \mid x_1)} = \frac{z_2 \exp(-z_2 x_1)\; x_1 \exp(-x_1 x_2)}{z_2 \exp(-z_2 z_1)\; x_1 \exp(-x_1 z_2)} \propto \exp(-x_1 x_2).$$
However $\int\!\!\int \exp(-x_1 x_2)\, dx_1\, dx_2$ is not finite, so $\pi_{X_1 \mid X_2}(x_1 \mid x_2) = x_2 \exp(-x_2 x_1)$ and $\pi_{X_2 \mid X_1}(x_2 \mid x_1) = x_1 \exp(-x_1 x_2)$ are not compatible.
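A small sketch of what happens if one nevertheless runs the Gibbs sampler built from these incompatible conditionals (my own illustration, not from the slides): the chain has no stationary distribution to converge to, so its trace drifts rather than stabilising.

```python
import numpy as np

def incompatible_gibbs(n_iter, rng):
    x1, x2 = 1.0, 1.0
    trace = np.empty(n_iter)
    for t in range(n_iter):
        x1 = rng.exponential(scale=1.0 / x2)   # X1 | X2 = x2 ~ Exp(rate x2)
        x2 = rng.exponential(scale=1.0 / x1)   # X2 | X1 = x1 ~ Exp(rate x1)
        trace[t] = x1
    return trace

rng = np.random.default_rng(0)
# The upper quantiles of the trace typically keep growing with the chain
# length instead of settling down, consistent with exp(-x1 x2) not being
# integrable on (R_+)^2.
for n in (1_000, 10_000, 100_000):
    print(n, np.percentile(incompatible_gibbs(n, rng), [50, 99]))
```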

  8. Example: Positivity condition violated. Figure: Gibbs sampling targeting $\pi(x, y) \propto \mathbf{1}_{[-1,0] \times [-1,0] \,\cup\, [0,1] \times [0,1]}(x, y)$. [Scatter plot of the sampled points omitted.]

  9. Invariance of the Gibbs sampler. The kernel of the Gibbs sampler (case d = 2) is
$$K(x^{(t-1)}, x^{(t)}) = \pi_{X_1 \mid X_2}(x_1^{(t)} \mid x_2^{(t-1)})\; \pi_{X_2 \mid X_1}(x_2^{(t)} \mid x_1^{(t)}).$$
Case d > 2:
$$K(x^{(t-1)}, x^{(t)}) = \prod_{j=1}^{d} \pi_{X_j \mid X_{-j}}\big(x_j^{(t)} \mid x_{1:j-1}^{(t)}, x_{j+1:d}^{(t-1)}\big).$$
Proposition: the systematic scan Gibbs sampler kernel admits π as invariant distribution. Proof for d = 2: we have
$$\int K(x, y)\, \pi(x)\, dx = \int\!\!\int \pi(y_2 \mid y_1)\, \pi(y_1 \mid x_2)\, \pi(x_1, x_2)\, dx_1\, dx_2 = \pi(y_2 \mid y_1) \int \pi(y_1 \mid x_2)\, \pi(x_2)\, dx_2 = \pi(y_2 \mid y_1)\, \pi(y_1) = \pi(y_1, y_2) = \pi(y).$$
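This invariance can also be checked numerically. The sketch below (my own toy example, not from the slides) builds the d = 2 systematic-scan kernel for a small discrete target and verifies that $\pi K = \pi$.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 4
joint = rng.random((m, m))
joint /= joint.sum()                                       # toy target pi(x1, x2)

cond_1_given_2 = joint / joint.sum(axis=0, keepdims=True)  # pi(x1 | x2)
cond_2_given_1 = joint / joint.sum(axis=1, keepdims=True)  # pi(x2 | x1)

# Kernel K((x1, x2) -> (y1, y2)) = pi(y1 | x2) * pi(y2 | y1), states flattened
# with index x1 * m + x2.
K = np.zeros((m * m, m * m))
for x1 in range(m):
    for x2 in range(m):
        for y1 in range(m):
            for y2 in range(m):
                K[x1 * m + x2, y1 * m + y2] = (
                    cond_1_given_2[y1, x2] * cond_2_given_1[y1, y2]
                )

pi_flat = joint.reshape(-1)
print(np.allclose(pi_flat @ K, pi_flat))   # True: pi is invariant for K
```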

  10. Irreducibility and Recurrence. Proposition: assume π satisfies the positivity condition; then the Gibbs sampler yields a π-irreducible and recurrent Markov chain. Theorem: assume the positivity condition is satisfied; then for any π-integrable function $\varphi : \mathsf{X} \to \mathbb{R}$,
$$\lim_{t \to \infty} \frac{1}{t} \sum_{i=1}^{t} \varphi\big(X^{(i)}\big) = \int_{\mathsf{X}} \varphi(x)\, \pi(x)\, dx$$
for π-almost all starting values $X^{(1)}$.

  11. Example: Bivariate Normal Distribution. Let $X := (X_1, X_2) \sim \mathcal{N}(\mu, \Sigma)$ where $\mu = (\mu_1, \mu_2)$ and
$$\Sigma = \begin{pmatrix} \sigma_1^2 & \rho \\ \rho & \sigma_2^2 \end{pmatrix}.$$
The Gibbs sampler proceeds as follows in this case:
     1. Sample $X_1^{(t)} \sim \mathcal{N}\big(\mu_1 + (\rho/\sigma_2^2)(X_2^{(t-1)} - \mu_2),\; \sigma_1^2 - \rho^2/\sigma_2^2\big)$.
     2. Sample $X_2^{(t)} \sim \mathcal{N}\big(\mu_2 + (\rho/\sigma_1^2)(X_1^{(t)} - \mu_1),\; \sigma_2^2 - \rho^2/\sigma_1^2\big)$.
By proceeding this way, we generate a Markov chain $(X^{(t)})$ whose successive samples are correlated. If successive values of $X^{(t)}$ are strongly correlated, we say that the Markov chain mixes slowly.
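A minimal sketch of this sampler in code. The choices $\mu = 0$ and $\sigma_1 = \sigma_2 = 1$ are assumptions (so that ρ plays the role of the correlation, as in the figures that follow); the ergodic average printed at the end illustrates the law of large numbers from the previous slide and the slower mixing when ρ is close to 1.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_iter, rng, mu=(0.0, 0.0), sig1=1.0, sig2=1.0):
    x1, x2 = mu                      # start the chain at the mean
    chain = np.empty((n_iter, 2))
    for t in range(n_iter):
        # X1 | X2 = x2 ~ N(mu1 + rho/sig2^2 (x2 - mu2), sig1^2 - rho^2/sig2^2)
        x1 = rng.normal(mu[0] + rho / sig2**2 * (x2 - mu[1]),
                        np.sqrt(sig1**2 - rho**2 / sig2**2))
        # X2 | X1 = x1 ~ N(mu2 + rho/sig1^2 (x1 - mu1), sig2^2 - rho^2/sig1^2)
        x2 = rng.normal(mu[1] + rho / sig1**2 * (x1 - mu[0]),
                        np.sqrt(sig2**2 - rho**2 / sig1**2))
        chain[t] = (x1, x2)
    return chain

rng = np.random.default_rng(0)
for rho in (0.1, 0.99):
    chain = gibbs_bivariate_normal(rho, 10_000, rng)
    # Ergodic average of the first component: approaches mu1 = 0, but more
    # slowly when rho is close to 1 (slow mixing).
    print(rho, chain[:, 0].mean())
```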

  12. Bivariate Normal Distribution. Figure: case where ρ = 0.1, first 100 steps. [Scatter plot of the chain omitted.]

  13. Bivariate Normal Distribution. Figure: case where ρ = 0.99, first 100 steps. [Scatter plot of the chain omitted.]

  14. Bivariate Normal Distribution. Figure: histogram of the first component of the chain after 1000 iterations. Small ρ on the left, large ρ on the right. [Histograms omitted.]

  15. Bivariate Normal Distribution. Figure: histogram of the first component of the chain after 10000 iterations. Small ρ on the left, large ρ on the right. [Histograms omitted.]

  16. Bivariate Normal Distribution. Figure: histogram of the first component of the chain after 100000 iterations. Small ρ on the left, large ρ on the right. [Histograms omitted.]

  17. Gibbs Sampling and Auxiliary Variables. Gibbs sampling requires sampling from $\pi_{X_j \mid X_{-j}}$. In many scenarios, we can include a set of auxiliary variables $Z_1, \ldots, Z_p$ and work with an "extended" distribution of joint density $\pi(x_1, \ldots, x_d, z_1, \ldots, z_p)$ such that
$$\int \pi(x_1, \ldots, x_d, z_1, \ldots, z_p)\, dz_1 \cdots dz_p = \pi(x_1, \ldots, x_d)$$
and whose full conditionals are easy to sample. Examples: mixture models, capture-recapture models, Tobit models, probit models, etc.

  18. Mixtures of Normals. [Figure: density of a three-component normal mixture together with the densities of populations 1, 2 and 3.] Independent data $y_1, \ldots, y_n$ with
$$Y_i \mid \theta \sim \sum_{k=1}^{K} p_k\, \mathcal{N}(\mu_k, \sigma_k^2),$$
where $\theta = (p_1, \ldots, p_K, \mu_1, \ldots, \mu_K, \sigma_1^2, \ldots, \sigma_K^2)$.
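A hedged sketch of the auxiliary-variable idea for this model (my own illustration; the parameter values are made up and the slide does not specify priors, so only the allocation step of a data-augmentation Gibbs sampler is shown): introduce allocations $Z_i \in \{1, \ldots, K\}$ with $P(Z_i = k) = p_k$ and $Y_i \mid Z_i = k \sim \mathcal{N}(\mu_k, \sigma_k^2)$; conditionally on θ and the data, these allocations are easy to sample.

```python
import numpy as np

def sample_allocations(y, p, mu, sigma, rng):
    """Draw Z_i from P(Z_i = k | y_i, theta) ∝ p_k N(y_i; mu_k, sigma_k^2)."""
    # Unnormalised allocation probabilities, shape (n, K), via broadcasting.
    dens = np.exp(-0.5 * ((y[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    w = p * dens
    w /= w.sum(axis=1, keepdims=True)
    # One categorical draw per observation.
    return np.array([rng.choice(len(p), p=wi) for wi in w])

rng = np.random.default_rng(0)
p = np.array([0.3, 0.5, 0.2])            # illustrative mixture weights
mu = np.array([-2.0, 0.0, 1.0])          # illustrative component means
sigma = np.array([0.5, 0.5, 0.5])        # illustrative component std devs
z_true = rng.choice(3, size=1_000, p=p)
y = rng.normal(mu[z_true], sigma[z_true])

z = sample_allocations(y, p, mu, sigma, rng)
print(np.bincount(z, minlength=3) / len(z))   # roughly recovers p when theta is known
```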
