
Markov Chains
DS-GA 1002: Probability and Statistics for Data Science
http://www.cims.nyu.edu/~cfgranda/pages/DSGA1002_fall17
Carlos Fernandez-Granda

Outline: Definition · Recurrence · Periodicity · Convergence · Markov-chain Monte Carlo


  1.–6. Employment

  State 1 (student) is transient:

  P( X(j) ≠ 1 for all j > i | X(i) = 1 ) ≥ P( X(i+1) = 3 | X(i) = 1 ) = 0.1 > 0

  State 3 (employed) is recurrent:

  P( X(j) ≠ 3 for all j > i | X(i) = 3 ) = P( X(j) = 4 for all j > i | X(i) = 3 )
    = lim_{k→∞} P( X(i+1) = 4 | X(i) = 3 ) ∏_{j=1}^{k} P( X(i+j+1) = 4 | X(i+j) = 4 )
    = lim_{k→∞} 0.1 · 0.6^k
    = 0

  7. Irreducible Markov chain

  A Markov chain is irreducible if for any states x and y ≠ x there exists m ≥ 0 such that

  P( X(i+m) = y | X(i) = x ) > 0

  All states in an irreducible Markov chain are recurrent
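Irreducibility can be checked mechanically: every state must be reachable from every other. A minimal sketch in Python with NumPy (the matrix convention follows the slides, with columns summing to one; the two example chains are hypothetical, not from the slides):

```python
import numpy as np

def is_irreducible(T):
    """Check irreducibility of a Markov chain with transition matrix T.

    T[k, j] is the probability of moving from state j to state k
    (columns sum to one). The chain is irreducible iff every state is
    reachable from every other, i.e. sum_{m=0}^{s-1} T^m has no zeros.
    """
    s = T.shape[0]
    reach = np.eye(s)
    power = np.eye(s)
    for _ in range(s - 1):
        power = T @ power          # power = T^m
        reach += power
    return bool(np.all(reach > 0))

# A two-state chain that always swaps states: irreducible (but periodic)
swap = np.array([[0.0, 1.0],
                 [1.0, 0.0]])
# A chain where state 1 is absorbing, so state 0 is unreachable from it
absorbing = np.array([[0.5, 0.0],
                      [0.5, 1.0]])
print(is_irreducible(swap))       # True
print(is_irreducible(absorbing))  # False
```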

  8. Outline: Definition · Recurrence · Periodicity · Convergence · Markov-chain Monte Carlo

  9. Period of a state

  The period m of a state x of a Markov chain X is the largest integer such that the chain always takes km steps (for a positive integer k) to return to x

  10. Period of a state

  [Diagram: a three-state chain on states A, B, C with transition probabilities 1, 0.9, 0.1 and 1, illustrating the period of a state]

  11. Aperiodic chain

  A Markov chain X is aperiodic if all states have period equal to one
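The period of a state can be computed as the gcd of the lengths of all possible return paths. A sketch with NumPy (the `max_steps` cutoff and the two example chains are illustrative assumptions, not from the slides):

```python
import numpy as np
from math import gcd
from functools import reduce

def period(T, x, max_steps=50):
    """Period of state x: gcd of all m with (T^m)[x, x] > 0.

    T[k, j] = P(next state k | current state j). Scanning return
    times up to max_steps is a heuristic cutoff, not a proof.
    """
    returns = []
    power = np.eye(T.shape[0])
    for m in range(1, max_steps + 1):
        power = T @ power          # power = T^m
        if power[x, x] > 1e-12:
            returns.append(m)
    return reduce(gcd, returns) if returns else 0

# Deterministic 3-cycle 0 -> 1 -> 2 -> 0: returns only in multiples of 3
cycle3 = np.array([[0.0, 0.0, 1.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
print(period(cycle3, 0))   # 3

# A chain with self-loops is aperiodic: a return can take a single step
lazy = np.array([[0.5, 0.5],
                 [0.5, 0.5]])
print(period(lazy, 0))     # 1
```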

  12. Outline: Definition · Recurrence · Periodicity · Convergence · Markov-chain Monte Carlo

  13. Convergence in distribution

  A Markov chain converges in distribution if the state vector converges to a constant vector:

  p_∞ := lim_{i→∞} p_X(i) = lim_{i→∞} T_X^i p_X(0)

  14. Mobile phones

  ◮ Company releases a new mobile-phone model
  ◮ At the moment 90% of the phones are in stock, 10% have been sold locally and none have been exported
  ◮ Each day a phone in stock is sold with probability 0.2 and exported with probability 0.1
  ◮ Initial state vector (in stock, sold, exported) and transition matrix:

  a := [ 0.9 ]      T_X := [ 0.7  0  0 ]
       [ 0.1 ]             [ 0.2  1  0 ]
       [ 0   ]             [ 0.1  0  1 ]
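The evolution of this chain can be simulated directly by repeated multiplication with the transition matrix; a sketch with NumPy (an assumption; the slides do not show code):

```python
import numpy as np

# State vector iteration for the mobile-phone chain:
# states ordered (in stock, sold, exported); columns sum to one.
a = np.array([0.9, 0.1, 0.0])           # initial state vector
T = np.array([[0.7, 0.0, 0.0],
              [0.2, 1.0, 0.0],
              [0.1, 0.0, 1.0]])

p = a.copy()
for day in range(50):
    p = T @ p                            # p_X(i+1) = T_X p_X(i)
print(np.round(p, 3))                    # converges to [0, 0.7, 0.3]
```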

  15. Mobile phones

  [Diagram: states In stock, Sold, Exported; In stock remains with probability 0.7, moves to Sold with probability 0.2 and to Exported with probability 0.1; Sold and Exported are absorbing (probability 1)]

  16.–18. Mobile phones

  [Plots: realizations of the chain over days 0–20 for the states In stock, Sold and Exported]

  19. Mobile phones

  The company wants to know how many phones are eventually sold locally and how many exported:

  lim_{i→∞} p_X(i) = lim_{i→∞} T_X^i p_X(0) = lim_{i→∞} T_X^i a

  20. Mobile phones

  The transition matrix T_X has three eigenvectors

  q_1 := [ 0 ]    q_2 := [ 0 ]    q_3 := [  0.80 ]
         [ 0 ]           [ 1 ]           [ −0.53 ]
         [ 1 ]           [ 0 ]           [ −0.27 ]

  The corresponding eigenvalues are λ_1 := 1, λ_2 := 1 and λ_3 := 0.7

  Eigendecomposition of T_X:

  T_X := Q Λ Q^{−1},   Q := [ q_1  q_2  q_3 ],   Λ := diag( λ_1, λ_2, λ_3 )

  21. Mobile phones

  We express the initial state vector a in terms of the eigenvectors:

  Q^{−1} p_X(0) = Q^{−1} a = [ 0.3, 0.7, 1.122 ]^T

  so that a = 0.3 q_1 + 0.7 q_2 + 1.122 q_3

  22.–28. Mobile phones

  lim_{i→∞} T_X^i a = lim_{i→∞} T_X^i ( 0.3 q_1 + 0.7 q_2 + 1.122 q_3 )
    = lim_{i→∞} ( 0.3 T_X^i q_1 + 0.7 T_X^i q_2 + 1.122 T_X^i q_3 )
    = lim_{i→∞} ( 0.3 λ_1^i q_1 + 0.7 λ_2^i q_2 + 1.122 λ_3^i q_3 )
    = lim_{i→∞} ( 0.3 q_1 + 0.7 q_2 + 1.122 · 0.7^i q_3 )
    = 0.3 q_1 + 0.7 q_2
    = [ 0, 0.7, 0.3 ]^T
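The same limit can be reproduced numerically from the eigendecomposition, since only the eigenvalue-one components survive. A NumPy sketch (eigenvectors and coefficients are recomputed rather than copied from the slides):

```python
import numpy as np

# T^i a = Q Lambda^i Q^{-1} a, and Lambda^i -> diag(1, 1, 0)
# because the third eigenvalue 0.7 has modulus below one.
T = np.array([[0.7, 0.0, 0.0],
              [0.2, 1.0, 0.0],
              [0.1, 0.0, 1.0]])
a = np.array([0.9, 0.1, 0.0])

eigvals, Q = np.linalg.eig(T)
coeffs = np.linalg.solve(Q, a)            # coordinates of a in the eigenbasis
limit = sum(c * q for lam, c, q in zip(eigvals, coeffs, Q.T)
            if np.isclose(lam, 1.0))      # keep only eigenvalue-1 terms
print(np.round(limit.real, 3))            # the limit [0, 0.7, 0.3]
```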

  29. Mobile phones

  [Plot: state-vector entries In stock, Sold and Exported over days 0–20, converging to 0, 0.7 and 0.3]

  30. Mobile phones

  For any initial state vector p_X(0):

  lim_{i→∞} T_X^i p_X(0) = [ 0, ( Q^{−1} p_X(0) )_2, ( Q^{−1} p_X(0) )_1 ]^T

  For example:

  b := [ 0.6, 0, 0.4 ]^T,    Q^{−1} b = [ 0.6, 0.4, 0.75 ]^T
  c := [ 0.4, 0.5, 0.1 ]^T,  Q^{−1} c = [ 0.23, 0.77, 0.50 ]^T

  31. Initial state vector b

  [Plot: state-vector entries In stock, Sold and Exported over days 0–20 for initial state vector b]

  32. Initial state vector c

  [Plot: state-vector entries over days 0–20 for initial state vector c]

  33. Stationary distribution

  p_stat is a stationary distribution of X if

  T_X p_stat = p_stat

  i.e. p_stat is an eigenvector of T_X with eigenvalue equal to one

  If p_stat is the initial state vector, then lim_{i→∞} p_X(i) = p_stat
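A stationary distribution can be computed numerically as the eigenvalue-one eigenvector, rescaled so its entries sum to one. A sketch with NumPy (the two-state chain is a hypothetical example, not from the slides):

```python
import numpy as np

def stationary_distribution(T):
    """Stationary distribution of a chain with transition matrix T:
    the eigenvector for the eigenvalue closest to one, normalized
    so that its entries sum to one."""
    eigvals, eigvecs = np.linalg.eig(T)
    idx = np.argmin(np.abs(eigvals - 1.0))
    p = np.real(eigvecs[:, idx])
    return p / p.sum()

# Hypothetical two-state chain; columns sum to one
T = np.array([[0.9, 0.2],
              [0.1, 0.8]])
p_stat = stationary_distribution(T)
print(np.round(p_stat, 3))        # [0.667 0.333]
print(np.allclose(T @ p_stat, p_stat))   # True: T p_stat = p_stat
```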

  34. Reversibility

  Let X(i) be distributed according to a state vector p ∈ R^s (s = number of states)

  X is reversible with respect to p if

  P( X(i) = x_j, X(i+1) = x_k ) = P( X(i) = x_k, X(i+1) = x_j )   for all 1 ≤ j, k ≤ s

  This is equivalent to the detailed-balance condition

  ( T_X )_{kj} p_j = ( T_X )_{jk} p_k   for all 1 ≤ j, k ≤ s
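Detailed balance is easy to verify numerically: the matrix with entries (T_X)_{kj} p_j must be symmetric. A sketch with NumPy (the two-state chain and its stationary distribution are illustrative assumptions):

```python
import numpy as np

def is_reversible(T, p, tol=1e-9):
    """Detailed balance: T[k, j] * p[j] == T[j, k] * p[k] for all j, k.
    Equivalently, the flow matrix F with F[k, j] = T[k, j] * p[j]
    is symmetric."""
    F = T * p                     # broadcasting scales column j by p[j]
    return bool(np.allclose(F, F.T, atol=tol))

# A two-state chain is always reversible with respect to its
# stationary distribution (here p = [2/3, 1/3] for this T)
T = np.array([[0.9, 0.2],
              [0.1, 0.8]])
p = np.array([2/3, 1/3])
print(is_reversible(T, p))   # True
```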

  35.–39. Reversibility implies stationarity

  The detailed-balance condition provides a sufficient condition for stationarity:
  if X is reversible with respect to p, then p is a stationary distribution of X

  ( T_X p )_j = Σ_{k=1}^{s} ( T_X )_{jk} p_k
    = Σ_{k=1}^{s} ( T_X )_{kj} p_j        (detailed balance)
    = p_j Σ_{k=1}^{s} ( T_X )_{kj}
    = p_j                                 (columns of T_X sum to one)

  40. Irreducible chains

  Irreducible Markov chains have a single stationary distribution

  This follows from the Perron–Frobenius theorem:

  ◮ The transition matrix of an irreducible Markov chain has a single eigenvector with eigenvalue equal to one
  ◮ This eigenvector has nonnegative entries

  41. Irreducible chains

  If X is irreducible and aperiodic, its state vector converges to its stationary distribution p_stat for any initial state vector p_X(0)

  X converges in distribution to a random variable with pmf given by p_stat

  42. Car rental

  Aim: model the location of cars

  3 states: San Francisco, Los Angeles, San Jose

  New cars are uniformly distributed between the 3 states; after that the transition probabilities are

                      from SF   from LA   from SJ
  to San Francisco      0.6       0.1       0.3
  to Los Angeles        0.2       0.8       0.3
  to San Jose           0.2       0.1       0.4

  43. Car rental What is the proportion of cars in each city eventually? Does this depend on the initial allocation?

  44. Car rental

  Markov chain with

  p_X(0) := [ 1/3 ]      T_X := [ 0.6  0.1  0.3 ]
            [ 1/3 ]             [ 0.2  0.8  0.3 ]
            [ 1/3 ]             [ 0.2  0.1  0.4 ]

  45. Car rental

  [Diagram: the three-state chain on SF, LA and SJ with the transition probabilities of slide 44]

  46. Car rental

  The transition matrix has the following eigenvectors

  q_1 := [ 0.273 ]    q_2 := [ −0.577 ]    q_3 := [ −0.577 ]
         [ 0.545 ]           [  0.789 ]           [ −0.211 ]
         [ 0.182 ]           [ −0.211 ]           [  0.789 ]

  The eigenvalues are λ_1 := 1, λ_2 := 0.573 and λ_3 := 0.227

  No matter how the cars are allocated, 27.3% end up in San Francisco, 54.5% in Los Angeles and 18.2% in San Jose
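The claim that the limit does not depend on the initial allocation can be checked by iterating the chain from several starting vectors; a NumPy sketch:

```python
import numpy as np

# Car-rental transition matrix; columns are the states SF, LA, SJ
T = np.array([[0.6, 0.1, 0.3],
              [0.2, 0.8, 0.3],
              [0.2, 0.1, 0.4]])

# Three very different initial allocations
for p0 in (np.array([1/3, 1/3, 1/3]),
           np.array([1.0, 0.0, 0.0]),
           np.array([0.0, 0.0, 1.0])):
    p = p0.copy()
    for _ in range(200):
        p = T @ p                 # iterate p_X(i+1) = T_X p_X(i)
    print(np.round(p, 3))         # always [0.273 0.545 0.182]
```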

  47.–49. Car rental

  [Plots: state-vector entries SF, LA and SJ over customers 0–20 for different initial allocations, all converging to 0.273, 0.545 and 0.182]

  50. Outline: Definition · Recurrence · Periodicity · Convergence · Markov-chain Monte Carlo

  51. Markov-chain Monte Carlo

  Irreducible aperiodic Markov chains converge to a unique stationary distribution

  Basic idea: simulate a Markov chain that converges to the target distribution

  Very useful in Bayesian statistics

  Main challenge: designing the Markov chain so that the stationary distribution is the one we want

  52. Metropolis–Hastings algorithm

  Aim: construct a Markov chain whose stationary distribution is p ∈ R^s, where

  p_j := p_X( x_j ),   1 ≤ j ≤ s

  Idea: sample from an irreducible Markov chain with transition matrix T on the same state space { x_1, ..., x_s }, forcing it to converge to p

  53. Metropolis–Hastings algorithm

  Initialize X(0) to an arbitrary value, then for i = 1, 2, 3, ...

  1. Generate C from X(i−1) by using T, i.e.

     P( C = k | X(i−1) = j ) = T_{kj},   1 ≤ j, k ≤ s

  2. Set

     X(i) := C        with probability p_acc( X(i−1), C )
     X(i) := X(i−1)   otherwise

     where the acceptance probability is defined as

     p_acc( j, k ) := min( ( T_{jk} p_k ) / ( T_{kj} p_j ), 1 ),   1 ≤ j, k ≤ s
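The two steps above can be implemented directly for a finite state space; a sketch in Python with NumPy (the target pmf and the uniform proposal chain are hypothetical examples):

```python
import numpy as np

rng = np.random.default_rng(0)

def metropolis_hastings(p_target, T_prop, n_steps, x0=0):
    """Metropolis-Hastings on states {0, ..., s-1}.

    p_target: target pmf (need not be normalized).
    T_prop[k, j]: proposal probability of state k given current state j.
    """
    s = len(p_target)
    x = x0
    samples = np.empty(n_steps, dtype=int)
    for i in range(n_steps):
        c = rng.choice(s, p=T_prop[:, x])                  # step 1: propose C
        p_acc = min(T_prop[x, c] * p_target[c]
                    / (T_prop[c, x] * p_target[x]), 1.0)   # acceptance probability
        if rng.random() < p_acc:                           # step 2: accept/reject
            x = c
        samples[i] = x
    return samples

# Hypothetical unnormalized target on 3 states; true pmf is (1/6, 2/6, 3/6)
p_target = np.array([1.0, 2.0, 3.0])
T_prop = np.full((3, 3), 1/3)       # uniform (hence irreducible) proposal chain
samples = metropolis_hastings(p_target, T_prop, 100_000)
freqs = np.bincount(samples, minlength=3) / len(samples)
print(np.round(freqs, 2))           # close to [0.17, 0.33, 0.5]
```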

  54. Reversibility implies stationarity

  Let X(i) be distributed according to a state vector p ∈ R^s

  X is reversible with respect to p if for all 1 ≤ j, k ≤ s

  P( X(i) = x_j, X(i+1) = x_k ) = P( X(i) = x_k, X(i+1) = x_j )

  Equivalent to the detailed-balance condition

  ( T_X )_{kj} p_j = ( T_X )_{jk} p_k   for all 1 ≤ j, k ≤ s

  If X is reversible with respect to p, then p is a stationary distribution of X

  55.–59. Reversibility of the Metropolis–Hastings chain

  Detailed balance holds trivially if j = k, so assume j ≠ k:

  ( T_X )_{kj} := P( X(i) = k | X(i−1) = j )
    = P( X(i) = C, C = k | X(i−1) = j )
    = P( X(i) = C | C = k, X(i−1) = j ) P( C = k | X(i−1) = j )
    = p_acc( j, k ) T_{kj}

  Similarly, ( T_X )_{jk} = p_acc( k, j ) T_{jk}

  60.–65. Reversibility of the Metropolis–Hastings chain

  ( T_X )_{kj} p_j = p_acc( j, k ) T_{kj} p_j
    = T_{kj} p_j min( ( T_{jk} p_k ) / ( T_{kj} p_j ), 1 )
    = min{ T_{jk} p_k, T_{kj} p_j }
    = T_{jk} p_k min( 1, ( T_{kj} p_j ) / ( T_{jk} p_k ) )
    = p_acc( k, j ) T_{jk} p_k
    = ( T_X )_{jk} p_k

  66. Generating a Poisson random variable

  Aim: generate a Poisson random variable X

  We don't need to know the normalizing constant, just that

  p_X( x ) ∝ λ^x / x!
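For this target the Metropolis-Hastings chain only needs the ratio p_X(c) / p_X(x), in which the normalizing constant and most factorial terms cancel. A sketch using a symmetric ±1 random-walk proposal (an assumption; the slides do not fix the proposal chain here):

```python
import numpy as np

rng = np.random.default_rng(1)

def poisson_mh(lam, n_steps, burn_in=1000):
    """Metropolis-Hastings sampler for Poisson(lam) using only the
    unnormalized pmf lam**x / x!.

    Proposal: a symmetric +/-1 random walk, so the acceptance
    probability reduces to min(p(c)/p(x), 1); proposals below zero
    have target probability 0 and are always rejected.
    """
    x = 0
    samples = []
    for i in range(n_steps + burn_in):
        c = x + rng.choice([-1, 1])
        if c < 0:
            ratio = 0.0
        elif c == x + 1:
            ratio = lam / (x + 1)      # p(x+1)/p(x): factorials cancel
        else:
            ratio = x / lam            # p(x-1)/p(x)
        if rng.random() < min(ratio, 1.0):
            x = c
        if i >= burn_in:
            samples.append(x)
    return np.array(samples)

samples = poisson_mh(lam=4.0, n_steps=200_000)
print(round(float(samples.mean()), 1))   # close to 4, the mean of Poisson(4)
```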
