  1. CS70: Lecture 25. Markov Chains 1.5. Outline: 1. Review 2. Distribution 3. Irreducibility 4. Convergence

  2. Review
◮ Markov Chain:
◮ Finite set X; π_0; P = {P(i, j), i, j ∈ X};
◮ Pr[X_0 = i] = π_0(i), i ∈ X;
◮ Pr[X_{n+1} = j | X_0, …, X_n = i] = P(i, j), i, j ∈ X, n ≥ 0.
◮ Note: Pr[X_0 = i_0, X_1 = i_1, …, X_n = i_n] = π_0(i_0) P(i_0, i_1) ··· P(i_{n−1}, i_n).
◮ First Passage Time:
◮ A ∩ B = ∅; β(i) = E[T_A | X_0 = i]; α(i) = Pr[T_A < T_B | X_0 = i];
◮ β(i) = 1 + ∑_j P(i, j) β(j);
◮ α(i) = ∑_j P(i, j) α(j), with α(A) = 1, α(B) = 0.
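The first-passage equations are linear, so small instances can be solved directly. Below is a minimal sketch, assuming numpy and a made-up 3-state chain (the chain and the target set are my own illustration, not from the lecture), that computes β(i) = E[T_A | X_0 = i]:

```python
import numpy as np

# Hypothetical 3-state chain; state 2 plays the role of the target set A.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.4, 0.4],
              [0.0, 0.0, 1.0]])

A = {2}
T = [i for i in range(len(P)) if i not in A]  # states outside A

# beta(i) = 1 + sum_j P(i,j) beta(j), with beta = 0 on A,
# rearranges to (I - P_TT) beta_T = 1 on the states outside A.
P_TT = P[np.ix_(T, T)]
beta_T = np.linalg.solve(np.eye(len(T)) - P_TT, np.ones(len(T)))
print(dict(zip(T, beta_T)))  # expected number of steps to reach A
```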

  3. Distribution of X_n
(Figure: a three-state chain, with transition probabilities 0.3, 0.7, 0.2, 0.4, 0.6, 0.8, and 1, shown transitioning from step m to step m + 1.)
Recall that π_n is the distribution over states of X_n.
Stationary distribution: π = π P. The distribution over states is the same before and after a transition.
Probability of entering i: ∑_j P(j, i) π(j). Probability of leaving i: π(i). These are equal! So the distribution is the same after one step.
Questions: Does a stationary distribution exist? Is it unique? And if it exists and is unique, then what? Sometimes π_n converges to it as n → ∞.

  4. Stationary: Example
Example 1: Balance Equations.
π P = π ⇔ [π(1), π(2)] = [π(1), π(2)] · [[1 − a, a], [b, 1 − b]]
⇔ π(1)(1 − a) + π(2) b = π(1) and π(1) a + π(2)(1 − b) = π(2)
⇔ π(1) a = π(2) b.
These equations are redundant! We have to add an equation: π(1) + π(2) = 1. Then we find
π = [b/(a + b), a/(a + b)].
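A quick numerical sanity check of this formula, with arbitrary values of a and b (my own illustration, assuming numpy):

```python
import numpy as np

a, b = 0.3, 0.6  # arbitrary illustrative values
P = np.array([[1 - a, a],
              [b, 1 - b]])

pi = np.array([b, a]) / (a + b)  # the formula from the slide
print(pi @ P)                    # equals pi: the balance equations hold
print(pi)
```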

  5. Stationary Distributions: Example 2
π P = π ⇔ [π(1), π(2)] = [π(1), π(2)] · [[1, 0], [0, 1]] ⇔ π(1) = π(1) and π(2) = π(2).
Every distribution is invariant for this Markov chain. This is obvious, since X_n = X_0 for all n. Hence, Pr[X_n = i] = Pr[X_0 = i], ∀(i, n).
Discussion: We have seen a chain with one stationary distribution, and a chain with many. When is there just one?

  6. Irreducibility
Definition. A Markov chain is irreducible if it can go from every state i to every state j (possibly in multiple steps).
(Figure: three example chains [A], [B], [C] on states {1, 2, 3}, with transition probabilities among 0.3, 0.7, 0.2, 0.4, 0.6, 0.8, and 1.)
[A] is not irreducible. It cannot go from (2) to (1).
[B] is not irreducible. It cannot go from (2) to (1).
[C] is irreducible. It can go from every i to every j.
If you consider the directed graph with an edge i → j whenever P(i, j) > 0, irreducible means that the graph is strongly connected: every state is reachable from every other state.
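Since irreducibility is just reachability in this directed graph, it can be checked mechanically. A sketch (my own helper, is_irreducible, with made-up chains standing in for the reducible and irreducible cases):

```python
import numpy as np

def is_irreducible(P):
    """True iff every state can reach every state in the graph
    with an edge i -> j whenever P[i, j] > 0."""
    n = len(P)
    reach = np.array(P) > 0
    # Boolean transitive closure (Floyd-Warshall style).
    for k in range(n):
        for i in range(n):
            for j in range(n):
                reach[i, j] = reach[i, j] or (reach[i, k] and reach[k, j])
    return reach.all()

P_reducible = np.array([[0.0, 1.0],
                        [0.0, 1.0]])   # state 2 never returns to state 1
P_irreducible = np.array([[0.0, 1.0],
                          [0.5, 0.5]])
print(is_irreducible(P_reducible))    # False
print(is_irreducible(P_irreducible))  # True
```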

  7. Existence and Uniqueness of the Invariant Distribution
Theorem. A finite irreducible Markov chain has one and only one invariant distribution. That is, there is a unique positive vector π = [π(1), …, π(K)] such that π P = π and ∑_k π(k) = 1.
OK. Now: there is only one stationary distribution if the chain is irreducible (i.e., its graph is strongly connected).

  8. Long-Term Fraction of Time in States
Theorem. Let X_n be an irreducible Markov chain with invariant distribution π. Then, for all i,
(1/n) ∑_{m=0}^{n−1} 1{X_m = i} → π(i), as n → ∞.
The left-hand side is the fraction of time that X_m = i during steps 0, 1, …, n − 1. Thus, this fraction of time approaches π(i).
Proof: Lecture note 24 gives a plausibility argument.
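A simulation makes the theorem concrete. The sketch below uses a made-up two-state chain with a = 0.2 and b = 0.6, so that π = [0.75, 0.25] by the earlier formula (my own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.8, 0.2],
              [0.6, 0.4]])  # a = 0.2, b = 0.6, so pi = [0.75, 0.25]

n, x = 100_000, 0
counts = np.zeros(2)
for _ in range(n):
    counts[x] += 1
    x = rng.choice(2, p=P[x])  # take one step of the chain

print(counts / n)  # empirical fractions of time; approximately [0.75, 0.25]
```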

  9. Long-Term Fraction of Time in States
Theorem. Let X_n be an irreducible Markov chain with invariant distribution π. Then, for all i, (1/n) ∑_{m=0}^{n−1} 1{X_m = i} → π(i), as n → ∞.
Example 1: The fraction of time in state 1 converges to 1/2, which is π(1).

  10. Long-Term Fraction of Time in States
Theorem. Let X_n be an irreducible Markov chain with invariant distribution π. Then, for all i, (1/n) ∑_{m=0}^{n−1} 1{X_m = i} → π(i), as n → ∞.
Example 2: (figure)

  11. Convergence to the Invariant Distribution
Question: Assume that the MC is irreducible. Does π_n approach the unique invariant distribution π?
Answer: Not necessarily. Here is an example: the two-state chain that alternates deterministically, with P(1, 2) = P(2, 1) = 1.
Assume X_0 = 1. Then X_1 = 2, X_2 = 1, X_3 = 2, …. Thus, if π_0 = [1, 0], then π_1 = [0, 1], π_2 = [1, 0], π_3 = [0, 1], etc.
Hence, π_n does not converge to π = [1/2, 1/2].
Notice that all cycles (closed walks) of this chain have even length.
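Computing π_n = π_0 P^n for this alternating chain shows the oscillation directly (a small illustration):

```python
import numpy as np

P = np.array([[0.0, 1.0],
              [1.0, 0.0]])   # deterministic flip: 1 -> 2 -> 1 -> ...
pi_n = np.array([1.0, 0.0])  # pi_0 = [1, 0]

for n in range(6):
    print(n, pi_n)           # alternates [1, 0], [0, 1], [1, 0], ...
    pi_n = pi_n @ P
```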

  12. Periodicity
Definition: The period of a Markov chain is the gcd of the lengths of all its closed walks. Previous example: period 2.
Definition: If the period is 1, the Markov chain is said to be aperiodic. Otherwise, it is periodic.
Example [A]: Closed walks of length 3 and length 4 ⇒ period = gcd(3, 4) = 1, so aperiodic.
[B]: All closed walks have length a multiple of 3 ⇒ period = 3.
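Since state i has a closed walk of length n exactly when P^n(i, i) > 0, the period can be estimated by taking the gcd of such lengths up to a cutoff. A sketch (my own helper; the cutoff is a heuristic, not an exact algorithm):

```python
import numpy as np
from functools import reduce
from math import gcd

def period(P, i, max_len=50):
    """gcd of all closed-walk lengths n <= max_len at state i
    (assumes some closed walk exists, e.g. the chain is irreducible)."""
    lengths, Pn = [], np.eye(len(P))
    for n in range(1, max_len + 1):
        Pn = Pn @ P
        if Pn[i, i] > 0:
            lengths.append(n)
    return reduce(gcd, lengths)

flip = np.array([[0.0, 1.0],
                 [1.0, 0.0]])
print(period(flip, 0))  # 2: every closed walk has even length
```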

  13. Convergence of π_n
Theorem. Let X_n be an irreducible and aperiodic Markov chain with invariant distribution π. Then, for all i ∈ X, π_n(i) → π(i), as n → ∞.
Example: (figure)
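Iterating π_{n+1} = π_n P makes the convergence visible. A sketch with Example 1's chain and illustrative values a = 0.3, b = 0.6 (so π = [2/3, 1/3]; the values are my own choice):

```python
import numpy as np

a, b = 0.3, 0.6  # illustrative values; this chain is irreducible and aperiodic
P = np.array([[1 - a, a],
              [b, 1 - b]])
pi_n = np.array([1.0, 0.0])  # pi_0: start in state 1

for _ in range(30):
    pi_n = pi_n @ P
print(pi_n)  # close to [b, a] / (a + b) = [2/3, 1/3]
```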

  14. Convergence of π_n
Theorem. Let X_n be an irreducible and aperiodic Markov chain with invariant distribution π. Then, for all i ∈ X, π_n(i) → π(i), as n → ∞.
Example: (figure)

  15. Summary: Markov Chains
◮ Markov Chain: Pr[X_{n+1} = j | X_0, …, X_n = i] = P(i, j)
◮ FSE (first-step equations): β(i) = 1 + ∑_j P(i, j) β(j); α(i) = ∑_j P(i, j) α(j).
◮ π_n = π_0 P^n
◮ π is invariant iff π P = π
◮ Irreducible ⇒ one and only one invariant distribution π
◮ Irreducible ⇒ fraction of time in state i approaches π(i)
◮ Irreducible + Aperiodic ⇒ π_n → π.
◮ Calculating π: One finds π = [0, 0, …, 1] Q^{-1} where Q = ··· .
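The last bullet leaves Q unspecified. One standard construction (an assumption on my part, not necessarily the Q the lecture intends) is Q = P − I with its last column replaced by ones: the first K − 1 columns encode balance equations and the last column encodes ∑_k π(k) = 1, so π Q = [0, …, 0, 1]:

```python
import numpy as np

P = np.array([[0.7, 0.3],
              [0.6, 0.4]])  # made-up chain; pi should be [2/3, 1/3]
K = len(P)

# Assumed construction: Q = P - I with the last column set to ones.
Q = P - np.eye(K)
Q[:, -1] = 1.0

e_last = np.zeros(K)
e_last[-1] = 1.0
pi = e_last @ np.linalg.inv(Q)  # pi = [0, ..., 0, 1] Q^{-1}
print(pi)                       # approximately [0.6667, 0.3333]
```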

  16. CS70: Continuous Probability.
Continuous Probability 1: 1. Examples 2. Events 3. Continuous Random Variables

  17. Uniformly at Random in [0, 1]
Choose a real number X, uniformly at random in [0, 1]. What is the probability that X is exactly equal to 1/3? Well, …, 0. What is the probability that X is exactly equal to 0.6? Again, 0. In fact, for any x ∈ [0, 1], one has Pr[X = x] = 0.
How should we then describe 'choosing uniformly at random in [0, 1]'? Here is the way to do it:
Pr[X ∈ [a, b]] = b − a, ∀ 0 ≤ a ≤ b ≤ 1.
This makes sense: b − a is the fraction of [0, 1] that [a, b] covers.
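A quick Monte Carlo check of Pr[X ∈ [a, b]] = b − a, with arbitrary a and b (my own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(1_000_000)            # uniform samples in [0, 1]
a, b = 0.25, 0.85
print(np.mean((a <= x) & (x <= b)))  # close to b - a = 0.6
```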

  18. Uniformly at Random in [0, 1]
Let [a, b] denote the event that the point X is in the interval [a, b]. Then
Pr[[a, b]] = (length of [a, b]) / (length of [0, 1]) = (b − a)/1 = b − a.
Intervals like [a, b] ⊆ Ω = [0, 1] are events. More generally, events in this space are unions of intervals. Example: the event A = "within 0.2 of 0 or 1" is A = [0, 0.2] ∪ [0.8, 1]. Thus,
Pr[A] = Pr[[0, 0.2]] + Pr[[0.8, 1]] = 0.4.
More generally, if the A_n are pairwise disjoint intervals in [0, 1], then Pr[∪_n A_n] := ∑_n Pr[A_n].
Many subsets of [0, 1] are of this form. Thus, the probability of those sets is well defined. We call such sets events.

  19. Uniformly at Random in [0, 1]
Note: a radical change in approach.
Finite probability space: Ω = {1, 2, …, N}, with Pr[ω] = p_ω ⇒ Pr[A] = ∑_{ω ∈ A} p_ω for A ⊂ Ω.
Continuous space: e.g., Ω = [0, 1], where Pr[ω] is typically 0. Instead, start with Pr[A] for some events A: an interval, or a union of intervals.

  20. Uniformly at Random in [0, 1]
Pr[X ≤ x] = x for x ∈ [0, 1]. Also, Pr[X ≤ x] = 0 for x < 0 and Pr[X ≤ x] = 1 for x > 1.
Define F(x) = Pr[X ≤ x]. Then we have
Pr[X ∈ (a, b]] = Pr[X ≤ b] − Pr[X ≤ a] = F(b) − F(a).
Thus, F(·) specifies the probability of all the events!

  21. Uniformly at Random in [0, 1]
Pr[X ∈ (a, b]] = Pr[X ≤ b] − Pr[X ≤ a] = F(b) − F(a).
An alternative view is to define f(x) = (d/dx) F(x) = 1{x ∈ [0, 1]}. Then
F(b) − F(a) = ∫_a^b f(x) dx.
Thus, the probability of an event is the integral of f(x) over the event:
Pr[X ∈ A] = ∫_A f(x) dx.
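A numeric check that F(b) − F(a) = ∫_a^b f(x) dx for the uniform density, using a simple Riemann sum (my own illustration):

```python
import numpy as np

f = lambda x: ((0 <= x) & (x <= 1)).astype(float)  # f(x) = 1{x in [0, 1]}
a, b = 0.2, 0.7
xs = np.linspace(a, b, 100_001)
integral = f(xs).mean() * (b - a)  # Riemann estimate of the integral
print(integral)                    # close to F(b) - F(a) = 0.5
```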

  22. Uniformly at Random in [0, 1]
Think of f(x) as describing how one unit of probability is spread over [0, 1]: uniformly! Then Pr[X ∈ A] is the probability mass over A.
Observe:
◮ This makes the probability automatically additive.
◮ We need f(x) ≥ 0 and ∫_{−∞}^{∞} f(x) dx = 1.

  23. Uniformly at Random in [0, 1]
Discrete approximation: Fix N ≫ 1 and let ε = 1/N. Define Y = nε if (n − 1)ε < X ≤ nε for n = 1, …, N.
Then |X − Y| ≤ ε and Y is discrete: Y ∈ {ε, 2ε, …, Nε}. Also, Pr[Y = nε] = 1/N for n = 1, …, N.
Thus, X is 'almost discrete.'
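The discretization is one line of code; a sketch (illustrative, with N = 10 as an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10
eps = 1 / N
X = rng.random(5)           # uniform samples in [0, 1]
Y = np.ceil(X / eps) * eps  # Y = n*eps when (n-1)*eps < X <= n*eps
print(X)
print(Y)                    # |X - Y| <= eps, Y in {eps, 2*eps, ..., N*eps}
```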

  24. Nonuniformly at Random in [0, 1]
This figure shows a different choice of f(x) ≥ 0 with ∫_{−∞}^{∞} f(x) dx = 1 (here, f(x) = 2x on [0, 1]). It defines another way of choosing X at random in [0, 1]. Note that X is more likely to be closer to 1 than to 0.
One has Pr[X ≤ x] = ∫_{−∞}^x f(u) du = x² for x ∈ [0, 1].
Also, Pr[X ∈ (x, x + ε)] = ∫_x^{x+ε} f(u) du ≈ f(x) ε.
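Since F(x) = x² on [0, 1], one way to generate such an X is X = √U with U uniform (inverse-transform sampling, which the slide does not mention; this is my addition). A quick check of the CDF:

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.random(1_000_000)
X = np.sqrt(U)  # then Pr[X <= x] = Pr[U <= x^2] = x^2

x = 0.5
print(np.mean(X <= x), x**2)  # both approximately 0.25
```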

  25. Another Nonuniform Choice at Random in [0, 1]
This figure shows yet another choice of f(x) ≥ 0 with ∫_{−∞}^{∞} f(x) dx = 1. It defines another way of choosing X at random in [0, 1]. Note that X is more likely to be closer to 1/2 than to 0 or 1.
For instance, Pr[X ∈ [0, 1/3]] = ∫_0^{1/3} 4x dx = [2x²]_0^{1/3} = 2/9.
Thus, Pr[X ∈ [0, 1/3]] = Pr[X ∈ [2/3, 1]] = 2/9 and Pr[X ∈ [1/3, 2/3]] = 5/9.
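A numeric check of these values, assuming the figure's density is the symmetric tent f(x) = 4x on [0, 1/2] and 4(1 − x) on [1/2, 1] (inferred from the integrand 4x and the symmetry claim; the figure itself is not reproduced here):

```python
import numpy as np

f = lambda x: np.where(x <= 0.5, 4 * x, 4 * (1 - x))  # assumed tent density

def prob(a, b, n=100_001):
    xs = np.linspace(a, b, n)
    return f(xs).mean() * (b - a)  # Riemann estimate of the integral over [a, b]

print(prob(0, 1/3), prob(2/3, 1), prob(1/3, 2/3))  # approx 2/9, 2/9, 5/9
```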
