  1. Advanced Simulation - Lecture 5. George Deligiannidis, February 1st, 2016.

  2. Irreducibility and aperiodicity

  Definition. Given a distribution $\mu$ over $X$, a Markov chain is $\mu$-irreducible if
  $$\forall x \in X, \ \forall A : \mu(A) > 0, \ \exists t \in \mathbb{N} : \quad K^t(x, A) > 0.$$
  A $\mu$-irreducible Markov chain with transition kernel $K$ is periodic if there exists a partition $X_1, \ldots, X_d$ of the state space, for some $d \geq 2$, such that
  $$\forall i, j, t, s : \quad \mathbb{P}\left( X_{t+s} \in X_j \mid X_t \in X_i \right) = \begin{cases} 1 & \text{if } j = i + s \bmod d, \\ 0 & \text{otherwise.} \end{cases}$$
  Otherwise the chain is aperiodic.
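
  As an illustration (a sketch of my own, not on the slide): the Gaussian AR(1) chain $X_{t+1} = \rho X_t + \tau W_t$ with $|\rho| < 1$ is irreducible with respect to Lebesgue measure, since its one-step kernel $K(x, \cdot) = \mathcal{N}(\rho x, \tau^2)$ has positive density everywhere, so every set of positive Lebesgue measure is reachable. A quick simulation:

```python
import numpy as np

rng = np.random.default_rng(0)
rho, tau = 0.7, 1.0

# One AR(1) trajectory, X_{t+1} = rho * X_t + tau * W_t, started far away.
x, hits = 10.0, 0
for _ in range(10_000):
    x = rho * x + tau * rng.standard_normal()
    hits += (2.0 <= x <= 3.0)

print(f"visits to A = [2, 3]: {hits}")  # positive: A is reached from x0 = 10
```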

  3. Recurrence and Harris recurrence

  For any measurable set $A$ of $X$, let
  $$\eta_A = \sum_{k=1}^{\infty} \mathbb{I}_A(X_k) = \text{number of visits to } A.$$

  Definition. A $\mu$-irreducible Markov chain is recurrent if for any measurable set $A \subset X$ with $\mu(A) > 0$,
  $$\forall x \in A \quad \mathbb{E}_x(\eta_A) = \infty.$$
  A $\mu$-irreducible Markov chain is Harris recurrent if for any measurable set $A \subset X$ with $\mu(A) > 0$,
  $$\forall x \in X \quad \mathbb{P}_x(\eta_A = \infty) = 1.$$
  Harris recurrence is stronger than recurrence.
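
  A hedged numerical illustration of $\eta_A$ (assuming the same AR(1) chain as above): for a recurrent chain, the visit count to a fixed set $A$ keeps growing as the trajectory lengthens.

```python
import numpy as np

rng = np.random.default_rng(1)
rho, tau = 0.7, 1.0

for horizon in (10**3, 10**4, 10**5):
    x, eta_A = 0.0, 0
    for _ in range(horizon):
        x = rho * x + tau * rng.standard_normal()
        eta_A += (-1.0 <= x <= 1.0)      # I_A(X_k) for A = [-1, 1]
    print(horizon, eta_A)  # eta_A grows roughly linearly in the horizon
```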

  4. Invariant distribution and reversibility

  Definition. A distribution with density $\pi$ is invariant or stationary for a Markov kernel $K$ if
  $$\int_X \pi(x) K(x, y) \, dx = \pi(y).$$
  A Markov kernel $K$ is $\pi$-reversible if for every bounded measurable function $f$,
  $$\iint f(x, y) \, \pi(x) K(x, y) \, dx \, dy = \iint f(y, x) \, \pi(x) K(x, y) \, dx \, dy.$$
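
  Invariance can be checked numerically for a concrete kernel. A sketch (assuming the Gaussian AR kernel $K(x, \cdot) = \mathcal{N}(\rho x, \tau^2)$ from the examples in this lecture) that approximates $\int_X \pi(x) K(x, y) \, dx$ by quadrature and compares it with $\pi(y)$:

```python
import numpy as np
from scipy.stats import norm

rho, tau = 0.7, 1.0
pi = norm(0.0, tau / np.sqrt(1 - rho**2))  # candidate invariant density
xs = np.linspace(-8.0, 8.0, 2001)          # quadrature grid for the x-integral
dx = xs[1] - xs[0]

for y in (-1.5, 0.0, 2.0):
    lhs = np.sum(pi.pdf(xs) * norm.pdf(y, loc=rho * xs, scale=tau)) * dx
    print(y, lhs, pi.pdf(y))  # both columns agree to quadrature accuracy
```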

  5. Detailed balance

  In practice it is easier to check the detailed balance condition:
  $$\forall x, y \in X \quad \pi(x) K(x, y) = \pi(y) K(y, x).$$

  Lemma. If detailed balance holds, then $\pi$ is invariant for $K$ and $K$ is $\pi$-reversible.

  Example: the Gaussian AR process is $\pi$-reversible and $\pi$-invariant for
  $$\pi(x) = \mathcal{N}\left( x; 0, \frac{\tau^2}{1 - \rho^2} \right) \quad \text{when } |\rho| < 1.$$
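
  A pointwise check of detailed balance for this example (a sketch, under the parameterisation $K(x, \cdot) = \mathcal{N}(\rho x, \tau^2)$):

```python
import numpy as np
from scipy.stats import norm

rho, tau = 0.7, 1.0
sd = tau / np.sqrt(1 - rho**2)       # stationary standard deviation

def pi(x):                           # pi(x) = N(x; 0, tau^2 / (1 - rho^2))
    return norm.pdf(x, 0.0, sd)

def K(x, y):                         # transition density of X' = rho x + tau W
    return norm.pdf(y, rho * x, tau)

for x, y in [(0.3, -1.2), (2.0, 0.5)]:
    print(pi(x) * K(x, y), pi(y) * K(y, x))  # the two sides coincide
```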

  6. Checking for recurrence

  It is often straightforward to check for irreducibility, or for an invariant measure, but not so for recurrence.

  Proposition. If the chain is $\mu$-irreducible and admits an invariant measure, then the chain is recurrent.

  Remark: a chain that is $\mu$-irreducible and admits an invariant measure is called positive.

  7. Law of large numbers

  Theorem. If $K$ is a $\pi$-irreducible, $\pi$-invariant Markov kernel, then for any integrable function $\varphi : X \to \mathbb{R}$,
  $$\lim_{t \to \infty} \frac{1}{t} \sum_{i=1}^{t} \varphi(X_i) = \int_X \varphi(x) \pi(x) \, dx$$
  almost surely, for $\pi$-almost all starting values $x$.

  Theorem. If $K$ is a $\pi$-irreducible, $\pi$-invariant, Harris recurrent Markov kernel, then for any integrable function $\varphi : X \to \mathbb{R}$,
  $$\lim_{t \to \infty} \frac{1}{t} \sum_{i=1}^{t} \varphi(X_i) = \int_X \varphi(x) \pi(x) \, dx$$
  almost surely, for any starting value $x$.
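
  In practice the theorem licenses the ergodic average as an estimator. A minimal sketch for the AR(1) chain with $\varphi(x) = x^2$, whose stationary expectation is $\tau^2 / (1 - \rho^2)$:

```python
import numpy as np

rng = np.random.default_rng(2)
rho, tau, t_max = 0.7, 1.0, 200_000
x, running_sum = 5.0, 0.0            # deliberately started away from 0

for _ in range(t_max):
    x = rho * x + tau * rng.standard_normal()
    running_sum += x**2

print(running_sum / t_max, tau**2 / (1 - rho**2))  # ergodic average vs truth
```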

  8. Convergence

  Theorem. Suppose the kernel $K$ is $\pi$-irreducible, $\pi$-invariant and aperiodic. Then we have
  $$\lim_{t \to \infty} \int_X \left| K^t(x, y) - \pi(y) \right| dy = 0$$
  for $\pi$-almost all starting values $x$.

  Under some additional conditions, one can prove that a chain is geometrically ergodic, i.e. there exist $\rho < 1$ and a function $M : X \to \mathbb{R}^+$ such that for every measurable set $A$,
  $$\left| K^n(x, A) - \pi(A) \right| \leq M(x) \rho^n \quad \text{for all } n \in \mathbb{N}.$$
  In other words, we can obtain a rate of convergence.
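
  For the Gaussian AR(1) chain the $t$-step kernel is explicit, $K^t(x, \cdot) = \mathcal{N}\left( \rho^t x, \, \tau^2 (1 - \rho^{2t}) / (1 - \rho^2) \right)$, so the total variation distance appearing in the theorem can be evaluated on a grid. A sketch exhibiting the geometric decay:

```python
import numpy as np
from scipy.stats import norm

rho, tau, x0 = 0.7, 1.0, 5.0
sd_inf = tau / np.sqrt(1 - rho**2)             # stationary standard deviation
ys = np.linspace(-12.0, 12.0, 4001)
dy = ys[1] - ys[0]

for t in (1, 5, 10, 20):
    mean_t = rho**t * x0                       # mean of K^t(x0, .)
    sd_t = tau * np.sqrt((1 - rho**(2 * t)) / (1 - rho**2))
    tv = 0.5 * np.sum(np.abs(norm.pdf(ys, mean_t, sd_t)
                             - norm.pdf(ys, 0.0, sd_inf))) * dy
    print(t, tv)  # shrinks roughly like rho^t, as geometric ergodicity predicts
```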

  9. Central limit theorem

  Theorem. Under regularity conditions, for a Harris recurrent, $\pi$-invariant Markov chain, we can prove
  $$\sqrt{t} \left( \frac{1}{t} \sum_{i=1}^{t} \varphi(X_i) - \int_X \varphi(x) \pi(x) \, dx \right) \xrightarrow[t \to \infty]{D} \mathcal{N}\left( 0, \sigma^2(\varphi) \right),$$
  where the asymptotic variance can be written
  $$\sigma^2(\varphi) = \mathbb{V}_\pi[\varphi(X_1)] + 2 \sum_{k=2}^{\infty} \mathrm{Cov}_\pi[\varphi(X_1), \varphi(X_k)].$$
  This formula shows that positive correlations increase the asymptotic variance, compared to i.i.d. samples for which the variance would be $\mathbb{V}_\pi(\varphi(X))$.

  10. Central limit theorem

  Example: for the AR Gaussian model, $\pi(x) = \mathcal{N}\left( x; 0, \tau^2 / (1 - \rho^2) \right)$ for $|\rho| < 1$ and
  $$\mathrm{Cov}(X_1, X_k) = \rho^{k-1} \, \mathbb{V}[X_1] = \rho^{k-1} \frac{\tau^2}{1 - \rho^2}.$$
  Therefore, with $\varphi(x) = x$,
  $$\sigma^2(\varphi) = \frac{\tau^2}{1 - \rho^2} \left( 1 + 2 \sum_{k=1}^{\infty} \rho^k \right) = \frac{\tau^2}{1 - \rho^2} \cdot \frac{1 + \rho}{1 - \rho} = \frac{\tau^2}{(1 - \rho)^2},$$
  which increases as $\rho \to 1$.
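
  The value $\tau^2 / (1 - \rho)^2$ can be checked by simulation: run many independent copies of the chain started from stationarity and compare the empirical variance of the $\sqrt{t}$-scaled ergodic averages with the formula. A sketch (my own, assuming stationary starts):

```python
import numpy as np

rng = np.random.default_rng(3)
rho, tau, t, n_rep = 0.5, 1.0, 5_000, 2_000
sd_inf = tau / np.sqrt(1 - rho**2)

x = rng.normal(0.0, sd_inf, size=n_rep)   # n_rep chains, stationary starts
sums = np.zeros(n_rep)
for _ in range(t):
    x = rho * x + tau * rng.standard_normal(n_rep)
    sums += x

scaled = np.sqrt(t) * (sums / t)          # sqrt(t) * (ergodic average - 0)
print(scaled.var(), tau**2 / (1 - rho)**2)  # empirical vs asymptotic variance
```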

  11. Markov chain Monte Carlo

  We are interested in sampling from a distribution $\pi$, for instance a posterior distribution in a Bayesian framework. Markov chains with $\pi$ as invariant distribution can be constructed to approximate expectations with respect to $\pi$. For example, the Gibbs sampler generates a Markov chain targeting $\pi$ defined on $\mathbb{R}^d$ using the full conditionals $\pi(x_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_d)$.

  12. Gibbs sampling

  Assume you are interested in sampling from $\pi(x) = \pi(x_1, x_2, \ldots, x_d)$, $x \in \mathbb{R}^d$.
  Notation: $x_{-i} := (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_d)$.

  Systematic scan Gibbs sampler. Let $\left( X_1^{(1)}, \ldots, X_d^{(1)} \right)$ be the initial state, then iterate for $t = 2, 3, \ldots$:
  1. Sample $X_1^{(t)} \sim \pi_{X_1 | X_{-1}}\left( \cdot \mid X_2^{(t-1)}, \ldots, X_d^{(t-1)} \right)$.
  $\cdots$
  j. Sample $X_j^{(t)} \sim \pi_{X_j | X_{-j}}\left( \cdot \mid X_1^{(t)}, \ldots, X_{j-1}^{(t)}, X_{j+1}^{(t-1)}, \ldots, X_d^{(t-1)} \right)$.
  $\cdots$
  d. Sample $X_d^{(t)} \sim \pi_{X_d | X_{-d}}\left( \cdot \mid X_1^{(t)}, \ldots, X_{d-1}^{(t)} \right)$.
  A concrete sketch for a bivariate normal target follows below.
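
  As a concrete instance of the scheme above (a sketch of my own, not from the slides): a systematic scan Gibbs sampler for a bivariate normal target with unit variances and correlation $\rho$, whose full conditionals are $\pi(x_1 \mid x_2) = \mathcal{N}(\rho x_2, 1 - \rho^2)$, and symmetrically for $x_2$.

```python
import numpy as np

rng = np.random.default_rng(4)
rho, n_iter = 0.8, 10_000
x1, x2 = 0.0, 0.0
samples = np.empty((n_iter, 2))

for t in range(n_iter):
    # Step 1: sample X1 | X2 = x2 ~ N(rho * x2, 1 - rho^2).
    x1 = rho * x2 + np.sqrt(1 - rho**2) * rng.standard_normal()
    # Step d (= 2 here): sample X2 | X1 = x1 ~ N(rho * x1, 1 - rho^2).
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.standard_normal()
    samples[t] = x1, x2

print(np.corrcoef(samples.T)[0, 1])  # close to the target correlation 0.8
```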

  13. Gibbs sampling

  Is the joint distribution $\pi$ uniquely specified by the conditional distributions $\pi_{X_i | X_{-i}}$? Does the Gibbs sampler provide a Markov chain with the correct stationary distribution $\pi$? If yes, does the Markov chain converge towards this invariant distribution? It will turn out to be the case under some mild conditions.

  14. Hammersley-Clifford theorem I

  Theorem. Consider a distribution whose density $\pi(x_1, x_2, \ldots, x_d)$ is such that $\mathrm{supp}(\pi) = \bigotimes_{i=1}^{d} \mathrm{supp}(\pi_{X_i})$. Then for any $(z_1, \ldots, z_d) \in \mathrm{supp}(\pi)$, we have
  $$\pi(x_1, x_2, \ldots, x_d) \propto \prod_{j=1}^{d} \frac{\pi_{X_j | X_{-j}}\left( x_j \mid x_{1:j-1}, z_{j+1:d} \right)}{\pi_{X_j | X_{-j}}\left( z_j \mid x_{1:j-1}, z_{j+1:d} \right)}.$$
  Remark: the condition above is the positivity condition. Equivalently, if $\pi_{X_i}(x_i) > 0$ for $i = 1, \ldots, d$, then $\pi(x_1, \ldots, x_d) > 0$.
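
  The theorem can be sanity-checked numerically in a discrete setting (a sketch with $d = 2$ and an arbitrary positive joint pmf on $\{0, 1, 2\}^2$, so the positivity condition holds): form the two full conditionals from the joint, apply the product formula at a fixed $(z_1, z_2)$, renormalise, and recover the joint.

```python
import numpy as np

rng = np.random.default_rng(5)
p = rng.random((3, 3))
p /= p.sum()                       # positive joint pmf p[x1, x2] on {0,1,2}^2

p1_given_2 = p / p.sum(axis=0, keepdims=True)   # pi(x1 | x2)
p2_given_1 = p / p.sum(axis=1, keepdims=True)   # pi(x2 | x1)

z1, z2 = 0, 0                      # any point in the support works
hc = np.empty((3, 3))
for x1 in range(3):
    for x2 in range(3):
        # d = 2 case: pi(x1,x2) ∝ [pi(x1|z2)/pi(z1|z2)] [pi(x2|x1)/pi(z2|x1)]
        hc[x1, x2] = (p1_given_2[x1, z2] / p1_given_2[z1, z2]) \
                     * (p2_given_1[x1, x2] / p2_given_1[x1, z2])

hc /= hc.sum()
print(np.max(np.abs(hc - p)))      # ~1e-16: the formula recovers the joint
```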

  15. Proof of Hammersley-Clifford theorem

  Proof. We have
  $$\pi(x_{1:d-1}, x_d) = \pi_{X_d | X_{-d}}(x_d \mid x_{1:d-1}) \, \pi(x_{1:d-1}),$$
  $$\pi(x_{1:d-1}, z_d) = \pi_{X_d | X_{-d}}(z_d \mid x_{1:d-1}) \, \pi(x_{1:d-1}).$$
  Therefore
  $$\pi(x_{1:d}) = \pi(x_{1:d-1}, z_d) \, \frac{\pi(x_{1:d-1}, x_d)}{\pi(x_{1:d-1}, z_d)} = \pi(x_{1:d-1}, z_d) \, \frac{\pi(x_{1:d-1}, x_d) / \pi(x_{1:d-1})}{\pi(x_{1:d-1}, z_d) / \pi(x_{1:d-1})} = \pi(x_{1:d-1}, z_d) \, \frac{\pi_{X_d | X_{1:d-1}}(x_d \mid x_{1:d-1})}{\pi_{X_d | X_{1:d-1}}(z_d \mid x_{1:d-1})}.$$

  16. Proof. Similarly, we have
  $$\pi(x_{1:d-1}, z_d) = \pi(x_{1:d-2}, z_{d-1}, z_d) \, \frac{\pi(x_{1:d-1}, z_d)}{\pi(x_{1:d-2}, z_{d-1}, z_d)} = \pi(x_{1:d-2}, z_{d-1}, z_d) \, \frac{\pi(x_{1:d-1}, z_d) / \pi(x_{1:d-2}, z_d)}{\pi(x_{1:d-2}, z_{d-1}, z_d) / \pi(x_{1:d-2}, z_d)} = \pi(x_{1:d-2}, z_{d-1}, z_d) \, \frac{\pi_{X_{d-1} | X_{-(d-1)}}(x_{d-1} \mid x_{1:d-2}, z_d)}{\pi_{X_{d-1} | X_{-(d-1)}}(z_{d-1} \mid x_{1:d-2}, z_d)},$$
  hence
  $$\pi(x_{1:d}) = \pi(x_{1:d-2}, z_{d-1}, z_d) \, \frac{\pi_{X_{d-1} | X_{-(d-1)}}(x_{d-1} \mid x_{1:d-2}, z_d)}{\pi_{X_{d-1} | X_{-(d-1)}}(z_{d-1} \mid x_{1:d-2}, z_d)} \times \frac{\pi_{X_d | X_{-d}}(x_d \mid x_{1:d-1})}{\pi_{X_d | X_{-d}}(z_d \mid x_{1:d-1})}.$$

  17. Proof. Since $z \in \mathrm{supp}(\pi)$, we have $\pi_{X_i}(z_i) > 0$ for all $i$. We may also assume that $\pi_{X_i}(x_i) > 0$ for all $i$. Thus all the conditional ratios introduced above are positive, since
  $$\frac{\pi_{X_j | X_{-j}}(x_j \mid x_1, \ldots, x_{j-1}, z_{j+1}, \ldots, z_d)}{\pi_{X_j | X_{-j}}(z_j \mid x_1, \ldots, x_{j-1}, z_{j+1}, \ldots, z_d)} = \frac{\pi(x_1, \ldots, x_{j-1}, x_j, z_{j+1}, \ldots, z_d)}{\pi(x_1, \ldots, x_{j-1}, z_j, z_{j+1}, \ldots, z_d)} > 0.$$
  Iterating the previous step yields the theorem.

  18. Example: non-integrable target

  Consider the following conditionals on $\mathbb{R}^+$:
  $$\pi_{X_1 | X_2}(x_1 \mid x_2) = x_2 \exp(-x_2 x_1), \qquad \pi_{X_2 | X_1}(x_2 \mid x_1) = x_1 \exp(-x_1 x_2).$$
  We might expect these full conditionals to define a joint probability density $\pi(x_1, x_2)$. Hammersley-Clifford would give
  $$\pi(x_1, x_2) \propto \frac{\pi_{X_1 | X_2}(x_1 \mid z_2) \, \pi_{X_2 | X_1}(x_2 \mid x_1)}{\pi_{X_1 | X_2}(z_1 \mid z_2) \, \pi_{X_2 | X_1}(z_2 \mid x_1)} = \frac{z_2 \exp(-z_2 x_1) \, x_1 \exp(-x_1 x_2)}{z_2 \exp(-z_2 z_1) \, x_1 \exp(-x_1 z_2)} \propto \exp(-x_1 x_2).$$
  However,
  $$\iint \exp(-x_1 x_2) \, dx_1 \, dx_2 = \infty,$$
  so $\pi_{X_1 | X_2}(x_1 \mid x_2) = x_2 \exp(-x_2 x_1)$ and $\pi_{X_2 | X_1}(x_2 \mid x_1) = x_1 \exp(-x_1 x_2)$ are not compatible.
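
  Running a Gibbs sampler on these incompatible conditionals is still mechanically possible; a hedged sketch of what goes wrong (there is no invariant probability distribution, so the trajectory wanders instead of settling down):

```python
import numpy as np

rng = np.random.default_rng(6)
x1, x2 = 1.0, 1.0
for t in range(1, 10_001):
    x1 = rng.exponential(scale=1.0 / x2)  # X1 | X2 = x2 ~ Exp(rate x2)
    x2 = rng.exponential(scale=1.0 / x1)  # X2 | X1 = x1 ~ Exp(rate x1)
    if t % 2_000 == 0:
        print(t, x1, x2)  # no stabilisation: ergodic averages do not converge
```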
