Advanced Simulation - Lecture 5. Patrick Rebeschini, January 29th, 2018

  1. Advanced Simulation - Lecture 5. Patrick Rebeschini, January 29th, 2018.

  2. Limits of standard Monte Carlo methods. Monte Carlo methods yield convergence rates in $1/\sqrt{n}$, which is independent of the dimension $d$. On close inspection, the error still depends on $d$ through the constant in front of the rate. Unfortunately, that "constant" (in $n$) typically explodes exponentially with $d$. Markov chain Monte Carlo methods yield errors which explode only polynomially in $d$, at least under some conditions.
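
As one concrete illustration (our choice, not taken from the slides), consider rejection sampling from the uniform distribution on the unit ball of $\mathbb{R}^d$ with uniform proposals on the cube $[-1,1]^d$. The acceptance probability is the volume ratio $\pi^{d/2}/(\Gamma(d/2+1)\,2^d)$, so the expected number of proposals per accepted draw, and with it the cost of reaching a given accuracy, grows at least exponentially with $d$. The function names below are hypothetical.

```python
import math
import random

def ball_acceptance(d):
    """Exact acceptance probability vol(unit d-ball) / vol([-1, 1]^d)."""
    return math.pi ** (d / 2) / (math.gamma(d / 2 + 1) * 2 ** d)

def empirical_acceptance(d, n=100_000, seed=0):
    """Fraction of uniform proposals on [-1, 1]^d that fall inside the unit ball."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        if sum(rng.uniform(-1.0, 1.0) ** 2 for _ in range(d)) <= 1.0:
            hits += 1
    return hits / n

for d in (2, 5, 10, 20):
    # Expected proposals per accepted sample = 1 / acceptance probability.
    print(d, ball_acceptance(d), 1.0 / ball_acceptance(d))
```

For example, `ball_acceptance(2)` is $\pi/4 \approx 0.785$, while `ball_acceptance(10)` is already about $0.0025$.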

  3. Markov chain Monte Carlo. Revolutionary idea introduced by Metropolis et al., J. Chemical Physics, 1953. Key idea: given a target distribution $\pi$, build a Markov chain $(X_t)_{t\geq 1}$ such that, as $t\to\infty$, $X_t$ converges in distribution to $\pi$, and $\frac{1}{n}\sum_{t=1}^{n}\varphi(X_t) \to \int \varphi(x)\,\pi(x)\,dx$ as $n\to\infty$, e.g. almost surely. There are also central limit theorems with a rate in $1/\sqrt{n}$.
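
The slide does not yet say how such a chain is built. A minimal sketch, assuming the random-walk Metropolis scheme (one standard way to realize this idea; the construction developed in later lectures may differ in details), with a standard normal target and $\varphi(x) = x^2$, so that the ergodic average should approach $\int x^2\pi(x)\,dx = 1$. The function `rw_metropolis` and its parameters are hypothetical names.

```python
import math
import random

def rw_metropolis(log_target, x0, n, step=1.0, seed=0):
    """Random-walk Metropolis chain of length n for an unnormalized log-density.

    Proposes y = x + step * N(0, 1) and accepts with probability
    min(1, pi(y) / pi(x)); otherwise the chain stays at x.
    """
    rng = random.Random(seed)
    x, log_px = x0, log_target(x0)
    chain = []
    for _ in range(n):
        y = x + step * rng.gauss(0.0, 1.0)
        log_py = log_target(y)
        if rng.random() < math.exp(min(0.0, log_py - log_px)):
            x, log_px = y, log_py
        chain.append(x)
    return chain

# Target: standard normal, known only up to a constant; phi(x) = x^2.
chain = rw_metropolis(lambda x: -0.5 * x * x, x0=3.0, n=200_000)
estimate = sum(x * x for x in chain) / len(chain)
print(estimate)  # should be close to 1 for a long run
```

Only ratios $\pi(y)/\pi(x)$ are needed, which is why the normalizing constant of the target never appears.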

  4. Markov chains - discrete space. Let $\mathbb{X}$ be discrete, e.g. $\mathbb{X} = \mathbb{Z}$. $(X_t)_{t\geq 1}$ is a Markov chain if $P(X_t = x_t \mid X_1 = x_1, \dots, X_{t-1} = x_{t-1}) = P(X_t = x_t \mid X_{t-1} = x_{t-1})$. Homogeneous Markov chains: $\forall m \in \mathbb{N}$, $P(X_t = y \mid X_{t-1} = x) = P(X_{t+m} = y \mid X_{t+m-1} = x)$. The Markov transition kernel is $K(i, j) = K_{ij} = P(X_t = j \mid X_{t-1} = i)$.
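
A homogeneous chain on a finite state space is fully described by $K$, so simulating it only requires sampling each new state from the row of the current one. A minimal sketch, assuming states are labelled $0, \dots, |\mathbb{X}|-1$ and $K$ is stored as a list of rows; `simulate_chain` is a hypothetical helper.

```python
import random

def simulate_chain(K, x1, t, seed=0):
    """Return a path X_1, ..., X_t of a homogeneous chain on {0, ..., len(K) - 1}.

    K[i][j] = P(X_s = j | X_{s-1} = i). Each new state is drawn from the row of
    the current state only, which is exactly the Markov property; using the same
    K at every step makes the chain homogeneous.
    """
    rng = random.Random(seed)
    states = range(len(K))
    path = [x1]
    for _ in range(t - 1):
        path.append(rng.choices(states, weights=K[path[-1]], k=1)[0])
    return path

K = [[0.9, 0.1],
     [0.5, 0.5]]
path = simulate_chain(K, x1=0, t=100_000)
# Empirical estimate of K[0][1]: fraction of moves out of state 0 that go to 1.
from0 = [b for a, b in zip(path, path[1:]) if a == 0]
print(sum(from0) / len(from0))  # should be close to 0.1
```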

  5. Markov chains - discrete space. Let $\mu_t(x) = P(X_t = x)$; the chain rule yields $P(X_1 = x_1, X_2 = x_2, \dots, X_t = x_t) = \mu_1(x_1)\prod_{i=2}^{t} K_{x_{i-1}x_i}$. Define the $m$-transition matrix $K^m$ by $K^m_{ij} = P(X_{t+m} = j \mid X_t = i)$. Chapman-Kolmogorov equation: $K^{m+n}_{ij} = \sum_{k\in\mathbb{X}} K^m_{ik} K^n_{kj}$. We obtain $\mu_{t+1}(j) = \sum_i \mu_t(i) K_{ij}$, i.e., using "linear algebra notation", $\mu_{t+1} = \mu_t K$.
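
Both identities can be checked numerically on a small example. The sketch below assumes an arbitrary 3-state kernel chosen for illustration; it compares $K^5$ with $K^2 K^3$ and checks that iterating $\mu_{t+1} = \mu_t K$ gives $\mu_5 = \mu_1 K^4$. Helper names are hypothetical.

```python
def mat_mul(A, B):
    """Matrix product of lists of lists (A is n x m, B is m x p)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def mat_pow(K, m):
    """K^m for m >= 1 by repeated multiplication."""
    out = K
    for _ in range(m - 1):
        out = mat_mul(out, K)
    return out

# Illustrative 3-state kernel (rows sum to one).
K = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]

# Chapman-Kolmogorov: K^(m+n) = K^m K^n, here with m = 2, n = 3.
lhs = mat_pow(K, 5)
rhs = mat_mul(mat_pow(K, 2), mat_pow(K, 3))
ck_error = max(abs(lhs[i][j] - rhs[i][j]) for i in range(3) for j in range(3))

# Distribution recursion mu_{t+1} = mu_t K, started from X_1 = state 0.
mu1 = [[1.0, 0.0, 0.0]]          # a 1 x 3 row vector
mu5 = mu1
for _ in range(4):
    mu5 = mat_mul(mu5, K)        # after the loop, mu5 = mu_1 K^4
print(ck_error, mu5[0], mat_mul(mu1, mat_pow(K, 4))[0])
```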

  6. Irreducibility and aperiodicity. A Markov chain is said to be irreducible if all the states communicate with each other, that is, $\forall x, y \in \mathbb{X}$, $\inf\{t : K^t_{xy} > 0\} < \infty$. A state $x$ has period $d(x)$ defined as $d(x) = \gcd\{s \geq 1 : K^s_{xx} > 0\}$. An irreducible chain is aperiodic if all states have period 1. Example: $K_\theta = \begin{pmatrix} \theta & 1-\theta \\ 1-\theta & \theta \end{pmatrix}$ is irreducible if $\theta \in [0, 1)$ and aperiodic if $\theta \in (0, 1)$. If $\theta = 0$, the gcd is 2.
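
Both properties of $K_\theta$ can be checked mechanically. A sketch under two assumptions: irreducibility is tested by graph search over positive entries, and the period is approximated by the gcd of return times $s$ up to a finite horizon (the definition uses all $s \geq 1$). Exact `Fraction` arithmetic keeps zero entries exactly zero. All names are hypothetical.

```python
from fractions import Fraction
from math import gcd

def mat_mul(A, B):
    """Product of two square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def K_theta(theta):
    """Two-state kernel from the slide: stay with probability theta, switch otherwise."""
    t = Fraction(theta)
    return [[t, 1 - t], [1 - t, t]]

def is_irreducible(K):
    """True if every state reaches every other state through positive-probability moves."""
    n = len(K)
    for x in range(n):
        seen, stack = {x}, [x]
        while stack:
            i = stack.pop()
            for j in range(n):
                if K[i][j] > 0 and j not in seen:
                    seen.add(j)
                    stack.append(j)
        if len(seen) < n:
            return False
    return True

def period(K, x, horizon=50):
    """gcd of the return times s <= horizon with K^s_{xx} > 0 (truncated search)."""
    d, Ks = 0, K
    for s in range(1, horizon + 1):
        if Ks[x][x] > 0:
            d = gcd(d, s)
        Ks = mat_mul(Ks, K)
    return d

print(is_irreducible(K_theta(0)), period(K_theta(0), 0))  # True 2
print(period(K_theta("0.3"), 0))                          # 1
print(is_irreducible(K_theta(1)))                         # False: theta = 1 never moves
```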

  7. Transience and recurrence. Introduce the number of visits to $x$: $\eta_x := \sum_{k=1}^{\infty} \mathbb{1}_x(X_k)$. For a Markov chain, a state $x$ is termed transient if $\mathbb{E}_x(\eta_x) < \infty$, where $\mathbb{E}_x$ denotes expectation under the law of the chain starting from $x$. A state is called recurrent otherwise, i.e. if $\mathbb{E}_x(\eta_x) = \infty$.
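
To make the definition concrete, we assume the simple random walk on $\mathbb{Z}$ that steps $+1$ with probability $p$ and $-1$ otherwise (an example not on the slide), started at $X_0 = 0$ and counting visits at times $k \geq 1$; counting the starting time as well only adds 1 and does not affect finiteness. Returns can only occur at even times, with $P_0(X_{2k} = 0) = \binom{2k}{k}(p(1-p))^k$, so $\mathbb{E}_0(\eta_0)$ is a series we can sum: for $p \neq 1/2$ it converges to $1/|2p-1| - 1$ (transient), while for $p = 1/2$ the partial sums keep growing like $\sqrt{N}$ (recurrent). `expected_returns` is a hypothetical name.

```python
def expected_returns(p, n_terms):
    """Partial sum over k = 1..n_terms of P_0(X_{2k} = 0) = C(2k, k) (p(1-p))^k.

    This truncates E_0(eta_0) for the walk that steps +1 w.p. p and -1 w.p. 1 - p;
    odd times contribute nothing since the walk cannot be at 0 then.
    """
    pq = p * (1 - p)
    term, total = 1.0, 0.0                 # term holds C(2k, k) (pq)^k, starting at k = 0
    for k in range(1, n_terms + 1):
        term *= 2 * (2 * k - 1) / k * pq   # C(2k, k) / C(2k-2, k-1) = 2(2k-1)/k
        total += term
    return total

print(expected_returns(0.6, 5000))     # converges to 1/|2p - 1| - 1 = 4
print(expected_returns(0.5, 10_000))   # keeps growing with n_terms
print(expected_returns(0.5, 40_000))   # roughly twice the previous value
```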

  8. Invariant distribution. Definition: a distribution $\pi$ is invariant for a Markov kernel $K$ if $\pi K = \pi$. Note: if there exists $t$ such that $X_t \sim \pi$, then $X_{t+s} \sim \pi$ for all $s \in \mathbb{N}$. Example: for any $\theta \in [0, 1]$, $K_\theta = \begin{pmatrix} \theta & 1-\theta \\ 1-\theta & \theta \end{pmatrix}$ admits $\pi = \left(\tfrac{1}{2}, \tfrac{1}{2}\right)$ as invariant distribution.
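
A quick exact check of the example, assuming a handful of $\theta$ values chosen for illustration: starting from $\mu = \pi$, repeated multiplication by $K_\theta$ never changes the distribution, which is the "$X_{t+s} \sim \pi$" note in action. The helper name is hypothetical.

```python
from fractions import Fraction

def row_times_matrix(mu, K):
    """Row vector times matrix: (mu K)_j = sum_i mu_i K_ij."""
    return [sum(mu[i] * K[i][j] for i in range(len(mu))) for j in range(len(K[0]))]

pi = [Fraction(1, 2), Fraction(1, 2)]
for theta in (Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(1)):
    K = [[theta, 1 - theta],
         [1 - theta, theta]]
    mu = pi                      # suppose X_t ~ pi
    for s in range(1, 6):
        mu = row_times_matrix(mu, K)
        assert mu == pi          # then X_{t+s} ~ pi as well
print("pi = (1/2, 1/2) is invariant for every theta tried")
```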

  9. Detailed balance. A Markov kernel $K$ satisfies detailed balance for $\pi$ if $\forall x, y \in \mathbb{X}$, $\pi(x) K_{xy} = \pi(y) K_{yx}$. Lemma: if $K$ satisfies detailed balance for $\pi$, then $K$ is $\pi$-invariant. If $K$ satisfies detailed balance for $\pi$, then the Markov chain is reversible, i.e. at stationarity, $\forall x, y \in \mathbb{X}$, $P(X_t = x \mid X_{t+1} = y) = P(X_t = x \mid X_{t-1} = y)$.
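
The lemma follows in two lines. In the derivation below (assuming $\pi(y) > 0$ so the conditional probability is defined), the first line uses detailed balance and then the fact that each row of $K$ sums to one; the second uses Bayes' rule with $X_t \sim \pi$ and $X_{t+1} \sim \pi$ at stationarity, then homogeneity.

```latex
\begin{aligned}
(\pi K)(y) &= \sum_{x\in\mathbb{X}} \pi(x) K_{xy}
            = \sum_{x\in\mathbb{X}} \pi(y) K_{yx}
            = \pi(y) \sum_{x\in\mathbb{X}} K_{yx} = \pi(y), \\
P(X_t = x \mid X_{t+1} = y) &= \frac{\pi(x) K_{xy}}{\pi(y)} = K_{yx}
            = P(X_t = x \mid X_{t-1} = y).
\end{aligned}
```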

  10. Lack of reversibility. Let $P = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$. Check that $\pi P = \pi$ for $\pi = (1/2, 1/3, 1/6)$. $P$ cannot be $\pi$-reversible, as $1 \to 3 \to 2 \to 1$ is a possible sequence whereas $1 \to 2 \to 3 \to 1$ is not (as $P_{2,3} = 0$). Detailed balance does not hold, as $\pi_2 P_{23} = 0 \neq \pi_3 P_{32}$.
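
The three claims on this slide can be verified with exact rational arithmetic. A minimal sketch; `path_prob` is a hypothetical helper that returns the stationary probability of observing a given sequence of states (labelled 1 to 3 as on the slide).

```python
from fractions import Fraction as F

P = [[F(1, 3), F(1, 3), F(1, 3)],
     [F(1), F(0), F(0)],
     [F(0), F(1), F(0)]]
pi = [F(1, 2), F(1, 3), F(1, 6)]

# Invariance: (pi P)_j = sum_i pi_i P_ij should equal pi_j.
pi_P = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]

# Pairs (i, j), 1-based, where detailed balance pi_i P_ij = pi_j P_ji fails.
violations = [(i + 1, j + 1) for i in range(3) for j in range(i + 1, 3)
              if pi[i] * P[i][j] != pi[j] * P[j][i]]

def path_prob(path):
    """Stationary probability of observing the given 1-based state sequence."""
    prob = pi[path[0] - 1]
    for a, b in zip(path, path[1:]):
        prob *= P[a - 1][b - 1]
    return prob

print(pi_P == pi, violations)                          # True [(1, 2), (1, 3), (2, 3)]
print(path_prob([1, 3, 2, 1]), path_prob([1, 2, 3, 1]))  # 1/6 0
```

A reversible chain would assign the same stationary probability to a path and to its time reversal; here one direction has probability $1/6$ and the other $0$.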

  11. Remarks. All finite-space Markov chains have at least one stationary distribution, but not all stationary distributions are also limiting distributions. Consider $P = \begin{pmatrix} 0.4 & 0.6 & 0 & 0 \\ 0.2 & 0.8 & 0 & 0 \\ 0 & 0 & 0.4 & 0.6 \\ 0 & 0 & 0.2 & 0.8 \end{pmatrix}$. It has two left eigenvectors of eigenvalue 1, $\pi_1 = (1/4, 3/4, 0, 0)$ and $\pi_2 = (0, 0, 1/4, 3/4)$: depending on the initial state, the chain converges to one of two different stationary distributions.
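
Iterating $\mu_{t+1} = \mu_t P$ from two different starting states shows the dependence on the initial state. In the sketch below states are 0-based, so `start=0` is state 1 on the slide and `start=2` is state 3; the horizon `t_max` is an arbitrary choice.

```python
def step(mu, P):
    """One step of the recursion mu_{t+1} = mu_t P."""
    n = len(P)
    return [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]

def limit_from(start, P, t_max=200):
    """Law of the chain after t_max steps when it starts in state `start` (0-based)."""
    mu = [1.0 if i == start else 0.0 for i in range(len(P))]
    for _ in range(t_max):
        mu = step(mu, P)
    return mu

P = [[0.4, 0.6, 0.0, 0.0],
     [0.2, 0.8, 0.0, 0.0],
     [0.0, 0.0, 0.4, 0.6],
     [0.0, 0.0, 0.2, 0.8]]

print(limit_from(0, P))  # close to pi_1 = (1/4, 3/4, 0, 0)
print(limit_from(2, P))  # close to pi_2 = (0, 0, 1/4, 3/4)
```

Any mixture $\lambda\pi_1 + (1-\lambda)\pi_2$ with $\lambda \in [0, 1]$ is also stationary, so this chain has infinitely many stationary distributions.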

  12. Equilibrium. Proposition: if a discrete-space Markov chain is aperiodic and irreducible, and has an invariant distribution $\pi$, then $\forall x \in \mathbb{X}$, $P_\mu(X_t = x) \xrightarrow[t\to\infty]{} \pi(x)$ for any starting distribution $\mu$. In the Monte Carlo perspective, we will be primarily interested in the convergence of empirical averages, such as $\widehat{I}_n = \frac{1}{n}\sum_{t=1}^{n}\varphi(X_t) \xrightarrow[n\to\infty]{\text{a.s.}} I = \sum_{x\in\mathbb{X}}\varphi(x)\,\pi(x)$. Before turning to these "ergodic theorems", let us consider continuous spaces.
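
A small numerical check, reusing $K_\theta$ from the earlier example. The values of $\theta$, the starting distribution $\mu = (1, 0)$ and the choice $\varphi(x) = x$ on states $\{0, 1\}$ are our assumptions. For $\theta = 0.3$ the law of $X_t$ approaches $\pi = (1/2, 1/2)$; for the periodic case $\theta = 0$ it oscillates forever, as expected since the proposition requires aperiodicity; and the empirical average $\widehat{I}_n$ approaches $I = 1/2$.

```python
import random

def K_theta(theta):
    """Two-state kernel: stay with probability theta, switch otherwise."""
    return [[theta, 1 - theta], [1 - theta, theta]]

def marginals(mu, K, t_max):
    """Return [mu_1, ..., mu_{t_max}] using mu_{t+1} = mu_t K."""
    out = [mu]
    for _ in range(t_max - 1):
        mu = [mu[0] * K[0][0] + mu[1] * K[1][0], mu[0] * K[0][1] + mu[1] * K[1][1]]
        out.append(mu)
    return out

def ergodic_average(K, phi, n, x1=0, seed=0):
    """hat I_n = (1/n) sum_{t=1}^n phi(X_t) along one simulated path with X_1 = x1."""
    rng = random.Random(seed)
    x, total = x1, 0.0
    for _ in range(n):
        total += phi(x)
        x = 0 if rng.random() < K[x][0] else 1
    return total / n

print(marginals([1.0, 0.0], K_theta(0.3), 30)[-1])       # close to [0.5, 0.5]
print(marginals([1.0, 0.0], K_theta(0.0), 4))            # [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
print(ergodic_average(K_theta(0.3), lambda x: x, 100_000))  # close to 0.5
```

For $\theta = 0$ the path alternates $0, 1, 0, 1, \dots$, so $\widehat{I}_n$ still tends to $1/2$ even though the marginals do not converge.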

  13. Markov chains - continuous space. The state space $\mathbb{X}$ is now continuous, e.g. $\mathbb{R}^d$. $(X_t)_{t\geq 1}$ is a Markov chain if for any (measurable) set $A$, $P(X_t \in A \mid X_1 = x_1, X_2 = x_2, \dots, X_{t-1} = x_{t-1}) = P(X_t \in A \mid X_{t-1} = x_{t-1})$. We have $P(X_t \in A \mid X_{t-1} = x) = \int_A K(x, y)\,dy = K(x, A)$, that is, conditional on $X_{t-1} = x$, $X_t$ is a random variable which admits the probability density function $K(x, \cdot)$. $K : \mathbb{X}^2 \to \mathbb{R}$ is the kernel of the Markov chain.
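
To see $K(x, A) = \int_A K(x, y)\,dy$ concretely, the sketch borrows the Gaussian AR kernel from the example two slides below, with illustrative values $\rho = 0.5$, $\tau = 1$ (our assumption), and an interval $A = [a, b]$. The exact integral comes from the normal CDF; the Monte Carlo version simply draws $X_t$ given $X_{t-1} = x$ many times. Function names are hypothetical.

```python
import math
import random

RHO, TAU = 0.5, 1.0  # illustrative AR parameters (assumption)

def K_interval(x, a, b, rho=RHO, tau=TAU):
    """K(x, [a, b]): integral over [a, b] of the N(rho * x, tau^2) density K(x, .)."""
    def cdf(z):
        return 0.5 * (1.0 + math.erf((z - rho * x) / (tau * math.sqrt(2.0))))
    return cdf(b) - cdf(a)

def K_interval_mc(x, a, b, n=200_000, rho=RHO, tau=TAU, seed=1):
    """Monte Carlo: draw X_t given X_{t-1} = x repeatedly and count landings in [a, b]."""
    rng = random.Random(seed)
    hits = sum(a <= rho * x + rng.gauss(0.0, tau) <= b for _ in range(n))
    return hits / n

print(K_interval(1.0, 0.0, 1.0), K_interval_mc(1.0, 0.0, 1.0))  # both near 0.383
```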

  14. Markov chains - continuous space. Denoting by $\mu_1$ the pdf of $X_1$, we obtain directly $P(X_1 \in A_1, \dots, X_t \in A_t) = \int_{A_1\times\cdots\times A_t} \mu_1(x_1)\prod_{k=2}^{t} K(x_{k-1}, x_k)\,dx_1\cdots dx_t$. Denoting by $\mu_t$ the distribution of $X_t$, the Chapman-Kolmogorov equation reads $\mu_t(y) = \int_{\mathbb{X}} \mu_{t-1}(x) K(x, y)\,dx$, and similarly for $m > 1$, $\mu_{t+m}(y) = \int_{\mathbb{X}} \mu_t(x) K^m(x, y)\,dx$, where $K^m(x_t, x_{t+m}) = \int_{\mathbb{X}^{m-1}} \prod_{k=t+1}^{t+m} K(x_{k-1}, x_k)\,dx_{t+1}\cdots dx_{t+m-1}$.
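
The $m$-step integral can be evaluated numerically for $m = 2$. A sketch assuming the Gaussian AR kernel of the next slide with illustrative values $\rho = 0.5$, $\tau^2 = 1$: $K^2(x, y) = \int K(x, z) K(z, y)\,dz$ is approximated by the trapezoidal rule on a truncated range and compared with $\mathcal{N}(y; \rho^2 x, \tau^2(1+\rho^2))$, which is what the AR $m$-step formula gives for $m = 2$. Grid bounds and names are our choices.

```python
import math

RHO, TAU2 = 0.5, 1.0  # illustrative AR parameters (assumption)

def gauss_pdf(y, mean, var):
    """Density of N(mean, var) at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def K(x, y):
    """One-step AR kernel K(x, y) = N(y; rho * x, tau^2)."""
    return gauss_pdf(y, RHO * x, TAU2)

def K2_quadrature(x, y, lo=-12.0, hi=12.0, n=4001):
    """K^2(x, y) = int K(x, z) K(z, y) dz via the trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / (n - 1)
    vals = [K(x, lo + i * h) * K(lo + i * h, y) for i in range(n)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

def K2_closed_form(x, y):
    """The m = 2 case of the AR m-step kernel: N(y; rho^2 x, tau^2 (1 + rho^2))."""
    return gauss_pdf(y, RHO ** 2 * x, TAU2 * (1.0 + RHO ** 2))

print(K2_quadrature(0.3, -0.2), K2_closed_form(0.3, -0.2))
```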

  15. Example. Consider the autoregressive (AR) model $X_t = \rho X_{t-1} + V_t$, where $V_t \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \tau^2)$. This defines a Markov process such that $K(x, y) = \frac{1}{\sqrt{2\pi\tau^2}}\exp\left(-\frac{1}{2\tau^2}(y - \rho x)^2\right)$. We also have $X_{t+m} = \rho^m X_t + \sum_{k=1}^{m}\rho^{m-k} V_{t+k}$, so in the Gaussian case $K^m(x, y) = \frac{1}{\sqrt{2\pi\tau_m^2}}\exp\left(-\frac{1}{2\tau_m^2}(y - \rho^m x)^2\right)$ with $\tau_m^2 = \tau^2\sum_{k=1}^{m}\left(\rho^2\right)^{m-k} = \tau^2\,\frac{1-\rho^{2m}}{1-\rho^2}$.
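
A Monte Carlo sanity check of the $m$-step kernel: simulating $m$ AR steps from a fixed $x$ many times should give sample mean near $\rho^m x$ and sample variance near $\tau_m^2$. The parameter values $x = 2$, $m = 3$, $\rho = 0.8$, $\tau = 1$ are illustrative assumptions, and the closed form for $\tau_m^2$ requires $\rho^2 \neq 1$.

```python
import random

def tau2_m(m, rho, tau):
    """Variance of X_{t+m} given X_t: tau^2 (1 - rho^(2m)) / (1 - rho^2), for rho^2 != 1."""
    return tau ** 2 * (1.0 - rho ** (2 * m)) / (1.0 - rho ** 2)

def simulate_m_steps(x, m, rho, tau, n, seed=2):
    """n independent draws of X_{t+m} given X_t = x, iterating X <- rho X + V."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        z = x
        for _ in range(m):
            z = rho * z + rng.gauss(0.0, tau)
        draws.append(z)
    return draws

x, m, rho, tau = 2.0, 3, 0.8, 1.0
draws = simulate_m_steps(x, m, rho, tau, n=100_000)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print(mean, rho ** m * x)        # empirical vs rho^m x
print(var, tau2_m(m, rho, tau))  # empirical vs tau_m^2
```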

  16. Irreducibility and aperiodicity. Given a distribution $\mu$ over $\mathbb{X}$, a Markov chain is $\mu$-irreducible if $\forall x \in \mathbb{X}$, $\forall A$ with $\mu(A) > 0$, $\exists t \in \mathbb{N}$ such that $K^t(x, A) > 0$. A $\mu$-irreducible Markov chain with transition kernel $K$ is periodic if there exists some partition $\mathbb{X}_1, \dots, \mathbb{X}_d$ of the state space, for $d \geq 2$, such that $\forall i, j, t, s$: $P(X_{t+s} \in \mathbb{X}_j \mid X_t \in \mathbb{X}_i) = 1$ if $j = i + s \bmod d$, and $0$ otherwise. Otherwise the chain is aperiodic.

  17. Recurrence and Harris recurrence. For any measurable set $A$ of $\mathbb{X}$, let $\eta_A = \sum_{k=1}^{\infty}\mathbb{I}_A(X_k)$. A $\mu$-irreducible Markov chain is recurrent if for any measurable set $A \subset \mathbb{X}$ with $\mu(A) > 0$, $\mathbb{E}_x(\eta_A) = \infty$ for all $x \in A$. A $\mu$-irreducible Markov chain is Harris recurrent if for any measurable set $A \subset \mathbb{X}$ with $\mu(A) > 0$, $P_x(\eta_A = \infty) = 1$ for all $x \in \mathbb{X}$. Harris recurrence is stronger than recurrence: for $x \in A$, $P_x(\eta_A = \infty) = 1$ implies $\mathbb{E}_x(\eta_A) = \infty$.

  18. Invariant distribution and reversibility. A distribution of density $\pi$ is invariant or stationary for a Markov kernel $K$ if $\int_{\mathbb{X}} \pi(x) K(x, y)\,dx = \pi(y)$. A Markov kernel $K$ is $\pi$-reversible if, for every bounded measurable function $f$, $\iint f(x, y)\,\pi(x) K(x, y)\,dx\,dy = \iint f(y, x)\,\pi(x) K(x, y)\,dx\,dy$.
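
The invariance integral can be checked by quadrature for the Gaussian AR kernel, taking as candidate $\pi = \mathcal{N}(0, \tau^2/(1-\rho^2))$ (the distribution stated on the next slide). The sketch assumes $\rho = 0.5$, $\tau^2 = 1$, a truncated integration range and a handful of test points $y$; all of these are illustrative choices.

```python
import math

def gauss_pdf(y, mean, var):
    """Density of N(mean, var) at y."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def invariance_gap(y, rho=0.5, tau2=1.0, lo=-15.0, hi=15.0, n=6001):
    """|int pi(x) K(x, y) dx - pi(y)| for the AR kernel, trapezoidal rule in x.

    pi is taken to be N(0, tau^2 / (1 - rho^2)), the candidate stationary law.
    """
    s2 = tau2 / (1.0 - rho ** 2)
    h = (hi - lo) / (n - 1)
    vals = [gauss_pdf(lo + i * h, 0.0, s2) * gauss_pdf(y, rho * (lo + i * h), tau2)
            for i in range(n)]
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return abs(integral - gauss_pdf(y, 0.0, s2))

print(max(invariance_gap(y) for y in (-3.0, -1.0, 0.0, 0.7, 2.5)))  # tiny, at round-off level
```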

  19. Detailed balance. In practice, it is easier to check the detailed balance condition: $\forall x, y \in \mathbb{X}$, $\pi(x) K(x, y) = \pi(y) K(y, x)$. Lemma: if detailed balance holds, then $\pi$ is invariant for $K$ and $K$ is $\pi$-reversible. Example: the Gaussian AR process is $\pi$-reversible and $\pi$-invariant for $\pi(x) = \mathcal{N}\left(x; 0, \frac{\tau^2}{1-\rho^2}\right)$ when $|\rho| < 1$.
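
The example can be verified directly. Multiplying the stationary density by the kernel, the exponent is symmetric in $x$ and $y$ and the normalizing constant $\sqrt{1-\rho^2}/(2\pi\tau^2)$ depends on neither, so swapping $x$ and $y$ leaves the product unchanged, which is detailed balance:

```latex
\pi(x)\,K(x,y)
  = \frac{\sqrt{1-\rho^2}}{2\pi\tau^2}
    \exp\!\Big(-\frac{(1-\rho^2)\,x^2 + (y-\rho x)^2}{2\tau^2}\Big)
  = \frac{\sqrt{1-\rho^2}}{2\pi\tau^2}
    \exp\!\Big(-\frac{x^2 - 2\rho x y + y^2}{2\tau^2}\Big)
  = \pi(y)\,K(y,x).
```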

  20. Selected asymptotic results. Theorem. If $K$ is a $\pi$-irreducible, $\pi$-invariant Markov kernel, then for any function $\varphi : \mathbb{X} \to \mathbb{R}$ integrable with respect to $\pi$, $\lim_{t\to\infty}\frac{1}{t}\sum_{i=1}^{t}\varphi(X_i) = \int_{\mathbb{X}}\varphi(x)\,\pi(x)\,dx$ almost surely, for $\pi$-almost all starting values $x$. Theorem. If $K$ defines a $\pi$-irreducible, $\pi$-invariant, Harris recurrent Markov chain, then for any function $\varphi : \mathbb{X} \to \mathbb{R}$ integrable with respect to $\pi$, $\lim_{t\to\infty}\frac{1}{t}\sum_{i=1}^{t}\varphi(X_i) = \int_{\mathbb{X}}\varphi(x)\,\pi(x)\,dx$ almost surely, for any starting value $x$.
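
An illustration of the second theorem on the Gaussian AR chain, relying on the standard fact that this chain with $|\rho| < 1$ is Harris recurrent. We assume $\rho = 0.5$, $\tau = 1$ and $\varphi(x) = x^2$, so the limit is $\tau^2/(1-\rho^2) = 4/3$; the deliberately poor start $X_1 = 50$ only adds a transient bias that vanishes as $t$ grows. `ar_ergodic_average` is a hypothetical name.

```python
import random

def ar_ergodic_average(phi, t, x1, rho=0.5, tau=1.0, seed=3):
    """(1/t) * sum_{i=1}^t phi(X_i) along one path of X_i = rho X_{i-1} + V_i, X_1 = x1."""
    rng = random.Random(seed)
    x, total = x1, 0.0
    for _ in range(t):
        total += phi(x)
        x = rho * x + rng.gauss(0.0, tau)
    return total / t

target = 1.0 / (1.0 - 0.5 ** 2)   # E_pi[X^2] = tau^2 / (1 - rho^2) = 4/3 for these values
for x1 in (0.0, 50.0):            # a typical start and a deliberately poor one
    print(x1, ar_ergodic_average(lambda x: x * x, 200_000, x1), target)
```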

  21. Selected asymptotic results. Theorem. Suppose the kernel $K$ is $\pi$-irreducible, $\pi$-invariant and aperiodic. Then $\lim_{t\to\infty}\int_{\mathbb{X}}\left|K^t(x, y) - \pi(y)\right|dy = 0$ for $\pi$-almost all starting values $x$. Under some additional conditions, one can prove that a chain is geometrically ergodic, i.e. there exist $\rho < 1$ and a function $M : \mathbb{X} \to \mathbb{R}_+$ such that for every measurable set $A$, $\left|K^n(x, A) - \pi(A)\right| \leq M(x)\rho^n$ for all $n \in \mathbb{N}$. In other words, we can obtain a rate of convergence.
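
For the Gaussian AR chain the quantity $|K^n(x, A) - \pi(A)|$ is available in closed form when $A = (-\infty, a]$, using $K^n(x, \cdot) = \mathcal{N}(\rho^n x, \tau_n^2)$ and $\pi = \mathcal{N}(0, \tau^2/(1-\rho^2))$. The sketch assumes $\rho = 0.5$, $\tau = 1$, $x = 3$ and $a = 0.5$ (illustrative values); here the AR coefficient $\rho$ also plays the role of the geometric rate, and successive errors shrink by a factor close to $0.5$. `set_gap` is a hypothetical name.

```python
import math

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def set_gap(n, x, a, rho=0.5, tau=1.0):
    """|K^n(x, A) - pi(A)| for A = (-inf, a] and the Gaussian AR kernel (n >= 1)."""
    var_n = tau ** 2 * (1.0 - rho ** (2 * n)) / (1.0 - rho ** 2)  # tau_n^2
    var_pi = tau ** 2 / (1.0 - rho ** 2)                           # stationary variance
    return abs(Phi((a - rho ** n * x) / math.sqrt(var_n)) - Phi(a / math.sqrt(var_pi)))

gaps = [set_gap(n, x=3.0, a=0.5) for n in range(1, 21)]
ratios = [g2 / g1 for g1, g2 in zip(gaps, gaps[1:])]
print(ratios[-5:])                                             # close to rho = 0.5
print(max(g / 0.5 ** n for n, g in enumerate(gaps, start=1)))  # a finite M(x) for this x and A
```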
