SLIDE 1

Markov Chains

DS GA 1002 Probability and Statistics for Data Science

http://www.cims.nyu.edu/~cfgranda/pages/DSGA1002_fall17
Carlos Fernandez-Granda

SLIDE 2

Definition · Recurrence · Periodicity · Convergence · Markov-chain Monte Carlo

SLIDE 3

Markov property

The future is conditionally independent from the past given the present: for any $t_1 < \dots < t_{n+1}$, $X(t_{n+1})$ is conditionally independent of $X(t_1), \dots, X(t_{n-1})$ given $X(t_n)$.

If the state space of the random process is discrete,

$$p_{X(t_{n+1}) \mid X(t_1), \dots, X(t_n)}(x_{n+1} \mid x_1, \dots, x_n) = p_{X(t_{n+1}) \mid X(t_n)}(x_{n+1} \mid x_n)$$

If the state space of the random process is continuous,

$$f_{X(t_{n+1}) \mid X(t_1), \dots, X(t_n)}(x_{n+1} \mid x_1, \dots, x_n) = f_{X(t_{n+1}) \mid X(t_n)}(x_{n+1} \mid x_n)$$

SLIDE 4

Directed graphical model

X_1 → X_2 → X_3 → X_4 → · · ·

SLIDE 5

Markov property

I.i.d. sequences satisfy the Markov property. Random walks satisfy the Markov property.

SLIDE 6

Markov chain

A Markov chain is a random process satisfying the Markov property. We consider discrete-time Markov chains with a finite state space, specified by the initial distribution and the transition probabilities:

$$p_{X(0), X(1), \dots, X(n)}(x_0, x_1, \dots, x_n) := p_{X(0)}(x_0) \prod_{j=1}^{n} p_{X(j) \mid X(0), \dots, X(j-1)}(x_j \mid x_0, \dots, x_{j-1}) = p_{X(0)}(x_0) \prod_{j=1}^{n} p_{X(j) \mid X(j-1)}(x_j \mid x_{j-1})$$

SLIDE 7

Time-homogeneous Markov chain

Transition probabilities between states are the same at every time step:

$$(T_X)_{jk} := p_{X(i+1) \mid X(i)}(x_j \mid x_k)$$

The marginal distribution at each time $i$ is represented by a state vector

$$p_X(i) := \begin{bmatrix} p_{X(i)}(x_1) \\ p_{X(i)}(x_2) \\ \vdots \\ p_{X(i)}(x_s) \end{bmatrix}$$

SLIDE 8

Car rental

Aim: Model location of cars. 3 states: Los Angeles, San Francisco, San Jose. New cars are uniformly distributed between the 3 states. After that the transition probabilities are (columns index the current state, rows the next state):

                SF     LA     SJ
San Francisco   0.6    0.1    0.3
Los Angeles     0.2    0.8    0.3
San Jose        0.2    0.1    0.4

SLIDE 9

Car rental

Markov chain with

$$p_X(0) := \begin{bmatrix} 1/3 \\ 1/3 \\ 1/3 \end{bmatrix}, \qquad T_X := \begin{bmatrix} 0.6 & 0.1 & 0.3 \\ 0.2 & 0.8 & 0.3 \\ 0.2 & 0.1 & 0.4 \end{bmatrix}$$

SLIDE 10

Car rental

[State diagram: SF, LA, SJ with the transition probabilities from T_X]

SLIDES 11–13

Car rental

[Simulated trajectories of car locations (SF / LA / SJ) over customers 1–15]

SLIDES 14–18

Car rental

Probability that a car starts in SF and is in SJ after the 2nd customer:

$$p_{X(0), X(2)}(1, 3) = \sum_{i=1}^{3} p_{X(0), X(1), X(2)}(1, i, 3)$$
$$= \sum_{i=1}^{3} p_{X(0)}(1)\, p_{X(1) \mid X(0)}(i \mid 1)\, p_{X(2) \mid X(1)}(3 \mid i)$$
$$= p_{X(0)}(1) \sum_{i=1}^{3} T_{i1} T_{3i} = \frac{0.6 \cdot 0.2 + 0.2 \cdot 0.1 + 0.2 \cdot 0.4}{3} \approx 7.33 \cdot 10^{-2}$$
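The number above is easy to verify numerically (a sketch, reusing the T and p_X(0) defined on slide 9):

```python
# Verify p_{X(0),X(2)}(1,3): start in SF (index 0), end in SJ (index 2)
# after two steps. T[j][k] = P(next = j | current = k).
T = [[0.6, 0.1, 0.3],
     [0.2, 0.8, 0.3],
     [0.2, 0.1, 0.4]]
p0_SF = 1/3  # P(X(0) = SF)

# Marginalize over the intermediate state i: sum of T_{i,SF} * T_{SJ,i}.
prob = p0_SF * sum(T[i][0] * T[2][i] for i in range(3))
```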

SLIDE 19

State vector and transition matrix

For a Markov chain X with transition matrix $T_X$,

$$p_X(i) = T_X\, p_X(i-1)$$

If the Markov chain starts at time 0, then for any $i \geq 0$

$$p_X(i) = T_X^{\,i}\, p_X(0)$$

where $T_X^{\,i}$ denotes multiplying $i$ times by the matrix $T_X$.

SLIDES 20–23

State vector and transition matrix

$$p_X(i) := \begin{bmatrix} p_{X(i)}(x_1) \\ p_{X(i)}(x_2) \\ \vdots \\ p_{X(i)}(x_s) \end{bmatrix} = \begin{bmatrix} \sum_{j=1}^{s} p_{X(i-1)}(x_j)\, p_{X(i) \mid X(i-1)}(x_1 \mid x_j) \\ \sum_{j=1}^{s} p_{X(i-1)}(x_j)\, p_{X(i) \mid X(i-1)}(x_2 \mid x_j) \\ \vdots \\ \sum_{j=1}^{s} p_{X(i-1)}(x_j)\, p_{X(i) \mid X(i-1)}(x_s \mid x_j) \end{bmatrix}$$

$$= \begin{bmatrix} p_{X(i) \mid X(i-1)}(x_1 \mid x_1) & p_{X(i) \mid X(i-1)}(x_1 \mid x_2) & \cdots & p_{X(i) \mid X(i-1)}(x_1 \mid x_s) \\ p_{X(i) \mid X(i-1)}(x_2 \mid x_1) & p_{X(i) \mid X(i-1)}(x_2 \mid x_2) & \cdots & p_{X(i) \mid X(i-1)}(x_2 \mid x_s) \\ \vdots & \vdots & \ddots & \vdots \\ p_{X(i) \mid X(i-1)}(x_s \mid x_1) & p_{X(i) \mid X(i-1)}(x_s \mid x_2) & \cdots & p_{X(i) \mid X(i-1)}(x_s \mid x_s) \end{bmatrix} \begin{bmatrix} p_{X(i-1)}(x_1) \\ p_{X(i-1)}(x_2) \\ \vdots \\ p_{X(i-1)}(x_s) \end{bmatrix} = T_X\, p_X(i-1)$$

SLIDES 24–26

Car rental

Distribution for the 5th customer?

$$p_X(5) = T_X^{\,5}\, p_X(0) = \begin{bmatrix} 0.281 \\ 0.534 \\ 0.185 \end{bmatrix}$$
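This matrix power is easy to reproduce by applying $T_X$ five times (a sketch in plain Python, reusing the chain from slide 9):

```python
# Compute p_X(5) = T^5 p_X(0) by repeated matrix-vector multiplication.
T = [[0.6, 0.1, 0.3],
     [0.2, 0.8, 0.3],
     [0.2, 0.1, 0.4]]

def matvec(M, v):
    return [sum(M[j][k] * v[k] for k in range(len(v))) for j in range(len(M))]

p = [1/3, 1/3, 1/3]
for _ in range(5):
    p = matvec(T, p)
# p is now approximately [0.281, 0.534, 0.185]
```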

SLIDE 27

Definition · Recurrence · Periodicity · Convergence · Markov-chain Monte Carlo

SLIDE 28

Recurrent and transient states

A state s is recurrent if

$$P\left( X(j) = s \text{ for some } j > i \mid X(i) = s \right) = 1$$

A state s is transient if

$$P\left( X(j) \neq s \text{ for all } j > i \mid X(i) = s \right) > 0$$
SLIDE 29

Employment

Aim: Model employment dynamics. 4 states: Student, Intern, Employed, Unemployed. At 18 a person is either a student (prob. 0.9) or an intern (prob. 0.1). After that the transition probabilities are (columns index the current state, rows the next state; blank entries are zero):

             Student  Intern  Employed  Unemployed
Student        0.8     0.5
Intern         0.1     0.5
Employed       0.1             0.9        0.4
Unemployed                     0.1        0.6

SLIDE 30

Employment

Markov chain with

$$p_X(0) := \begin{bmatrix} 0.9 \\ 0.1 \\ 0 \\ 0 \end{bmatrix}, \qquad T_X := \begin{bmatrix} 0.8 & 0.5 & 0 & 0 \\ 0.1 & 0.5 & 0 & 0 \\ 0.1 & 0 & 0.9 & 0.4 \\ 0 & 0 & 0.1 & 0.6 \end{bmatrix}$$

SLIDE 31

Employment

[State diagram: Student, Intern, Employed, Unemployed with the transition probabilities from T_X]

SLIDES 32–34

Employment

[Simulated employment trajectories (Student / Intern / Employed / Unemployed) from age 18 to about 30]

SLIDES 35–40

Employment

State 1 (student) is transient:

$$P\left( X(j) \neq 1 \text{ for all } j > i \mid X(i) = 1 \right) \geq P\left( X(i+1) = 3 \mid X(i) = 1 \right) = 0.1 > 0$$

(once employed, the chain can never return to the student state, so jumping directly to state 3 guarantees no return).

State 3 (employed) is recurrent:

$$P\left( X(j) \neq 3 \text{ for all } j > i \mid X(i) = 3 \right) = P\left( X(j) = 4 \text{ for all } j > i \mid X(i) = 3 \right)$$
$$= \lim_{k \to \infty} P\left( X(i+1) = 4 \mid X(i) = 3 \right) \prod_{j=1}^{k} P\left( X(i+j+1) = 4 \mid X(i+j) = 4 \right)$$
$$= \lim_{k \to \infty} 0.1 \cdot 0.6^{k} = 0$$

so the chain returns to state 3 with probability one.
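A quick numerical check of both claims (a sketch; the zero entries of T_X are inferred from the column sums, which must equal one):

```python
# Employment chain: 0 = Student, 1 = Intern, 2 = Employed, 3 = Unemployed.
T = [[0.8, 0.5, 0.0, 0.0],
     [0.1, 0.5, 0.0, 0.0],
     [0.1, 0.0, 0.9, 0.4],
     [0.0, 0.0, 0.1, 0.6]]
p = [0.9, 0.1, 0.0, 0.0]

def matvec(M, v):
    return [sum(M[j][k] * v[k] for k in range(len(v))) for j in range(len(M))]

# Iterate the chain: mass on the transient states Student/Intern dies out,
# and all probability ends up in the recurrent pair Employed/Unemployed.
for _ in range(300):
    p = matvec(T, p)
# Within the Employed/Unemployed pair the stationary split is 0.8 / 0.2,
# balancing the flows 0.1 * 0.8 (out of Employed) = 0.4 * 0.2 (back in).
```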

SLIDE 41

Irreducible Markov chain

A Markov chain is irreducible if for any states x and y ≠ x there exists m ≥ 0 such that

$$P\left( X(i+m) = y \mid X(i) = x \right) > 0$$

All states in an irreducible Markov chain are recurrent.

SLIDE 42

Definition · Recurrence · Periodicity · Convergence · Markov-chain Monte Carlo

SLIDE 43

Period of a state

The period m of a state x of a Markov chain X is the largest integer such that the chain always takes a multiple of m steps (km, for some positive integer k) to return to x.

SLIDE 44

Period of a state

[Example diagram: three states A, B, C with transition probabilities 1, 0.1, 0.9, 1]

SLIDE 45

Aperiodic chain

A Markov chain X is aperiodic if all states have period equal to one

SLIDE 46

Definition · Recurrence · Periodicity · Convergence · Markov-chain Monte Carlo

SLIDE 47

Convergence in distribution

A Markov chain converges in distribution if the state vector converges to a constant vector:

$$p_\infty := \lim_{i \to \infty} p_X(i) = \lim_{i \to \infty} T_X^{\,i}\, p_X(0)$$

SLIDE 48

Mobile phones

◮ Company releases a new mobile-phone model
◮ At the moment 90% of the phones are in stock, 10% have been sold locally and none have been exported
◮ Each day a phone in stock is sold with probability 0.2 and exported with probability 0.1
◮ Initial state vector and transition matrix (states: in stock, sold, exported; zero entries follow from the column sums):

$$a := \begin{bmatrix} 0.9 \\ 0.1 \\ 0 \end{bmatrix}, \qquad T_X = \begin{bmatrix} 0.7 & 0 & 0 \\ 0.2 & 1 & 0 \\ 0.1 & 0 & 1 \end{bmatrix}$$

SLIDE 49

Mobile phones

[State diagram: In stock (self-loop 0.7) → Sold (0.2) and → Exported (0.1); Sold and Exported are absorbing (prob. 1)]

SLIDES 50–52

Mobile phones

[Evolution of the state vector (In stock / Sold / Exported) over days 1–20]

SLIDE 53

Mobile phones

The company wants to know how many phones are eventually sold locally and how many exported:

$$\lim_{i \to \infty} p_X(i) = \lim_{i \to \infty} T_X^{\,i}\, p_X(0) = \lim_{i \to \infty} T_X^{\,i}\, a$$

SLIDE 54

Mobile phones

The transition matrix $T_X$ has three eigenvectors ($q_1$ and $q_2$ are the indicator vectors of the absorbing states exported and sold):

$$q_1 := \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \qquad q_2 := \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \qquad q_3 := \begin{bmatrix} 0.80 \\ -0.53 \\ -0.27 \end{bmatrix}$$

The corresponding eigenvalues are $\lambda_1 := 1$, $\lambda_2 := 1$ and $\lambda_3 := 0.7$. Eigendecomposition of $T_X$:

$$T_X := Q \Lambda Q^{-1}, \qquad Q := \begin{bmatrix} q_1 & q_2 & q_3 \end{bmatrix}, \qquad \Lambda := \begin{bmatrix} \lambda_1 & & \\ & \lambda_2 & \\ & & \lambda_3 \end{bmatrix}$$

SLIDE 55

Mobile phones

We express the initial state vector a in terms of the eigenvectors:

$$Q^{-1} p_X(0) = \begin{bmatrix} 0.3 \\ 0.7 \\ 1.122 \end{bmatrix} \quad\text{so that}\quad a = 0.3\, q_1 + 0.7\, q_2 + 1.122\, q_3$$

SLIDES 56–62

Mobile phones

$$\lim_{i \to \infty} T_X^{\,i}\, a = \lim_{i \to \infty} T_X^{\,i} \left( 0.3\, q_1 + 0.7\, q_2 + 1.122\, q_3 \right)$$
$$= \lim_{i \to \infty} 0.3\, T_X^{\,i} q_1 + 0.7\, T_X^{\,i} q_2 + 1.122\, T_X^{\,i} q_3$$
$$= \lim_{i \to \infty} 0.3\, \lambda_1^{i} q_1 + 0.7\, \lambda_2^{i} q_2 + 1.122\, \lambda_3^{i} q_3$$
$$= \lim_{i \to \infty} 0.3\, q_1 + 0.7\, q_2 + 1.122 \cdot 0.7^{i} q_3 = 0.3\, q_1 + 0.7\, q_2 = \begin{bmatrix} 0 \\ 0.7 \\ 0.3 \end{bmatrix}$$

Eventually 70% of the phones are sold locally and 30% are exported.
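The limit can also be checked by brute force, without the eigendecomposition (a sketch; T_X as on slide 48, with the zero entries filled in):

```python
# Mobile-phone chain: 0 = in stock, 1 = sold, 2 = exported.
T = [[0.7, 0.0, 0.0],
     [0.2, 1.0, 0.0],
     [0.1, 0.0, 1.0]]
a = [0.9, 0.1, 0.0]

def matvec(M, v):
    return [sum(M[j][k] * v[k] for k in range(len(v))) for j in range(len(M))]

p = a[:]
for _ in range(200):
    p = matvec(T, p)
# The in-stock mass decays like 0.7^i, so p converges to (0, 0.7, 0.3).
```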

SLIDE 63

Mobile phones

[Convergence of the state vector (In stock / Sold / Exported) to (0, 0.7, 0.3) over days 1–20]

SLIDE 64

Mobile phones

In general only the components along the unit-eigenvalue eigenvectors survive in the limit:

$$\lim_{i \to \infty} T_X^{\,i}\, p_X(0) = \left( Q^{-1} p_X(0) \right)_1 q_1 + \left( Q^{-1} p_X(0) \right)_2 q_2 = \begin{bmatrix} 0 \\ \left( Q^{-1} p_X(0) \right)_2 \\ \left( Q^{-1} p_X(0) \right)_1 \end{bmatrix}$$

$$b := \begin{bmatrix} 0.6 \\ 0 \\ 0.4 \end{bmatrix}, \quad Q^{-1} b = \begin{bmatrix} 0.6 \\ 0.4 \\ 0.75 \end{bmatrix} \qquad (1)$$

$$c := \begin{bmatrix} 0.4 \\ 0.5 \\ 0.1 \end{bmatrix}, \quad Q^{-1} c = \begin{bmatrix} 0.23 \\ 0.77 \\ 0.50 \end{bmatrix} \qquad (2)$$

SLIDE 65

Initial state vector b

[Evolution of the state vector (In stock / Sold / Exported) over days 1–20]

SLIDE 66

Initial state vector c

[Evolution of the state vector over days 1–20]

SLIDE 67

Stationary distribution

$p_{\text{stat}}$ is a stationary distribution of X if

$$T_X\, p_{\text{stat}} = p_{\text{stat}}$$

i.e. $p_{\text{stat}}$ is an eigenvector with eigenvalue equal to one. If $p_{\text{stat}}$ is the initial state vector, then

$$\lim_{i \to \infty} p_X(i) = p_{\text{stat}}$$

SLIDE 68

Reversibility

Let X(i) be distributed according to a state vector $p \in \mathbb{R}^s$ (s = number of states). X is reversible with respect to p if

$$P\left( X(i) = x_j,\, X(i+1) = x_k \right) = P\left( X(i) = x_k,\, X(i+1) = x_j \right) \quad \text{for all } 1 \leq j, k \leq s$$

This is equivalent to the detailed-balance condition

$$(T_X)_{kj}\, p_j = (T_X)_{jk}\, p_k, \quad \text{for all } 1 \leq j, k \leq s$$

SLIDES 69–73

Reversibility implies stationarity

The detailed-balance condition provides a sufficient condition for stationarity: if X is reversible with respect to p, then p is a stationary distribution of X.

$$\left( T_X\, p \right)_j = \sum_{k=1}^{s} (T_X)_{jk}\, p_k = \sum_{k=1}^{s} (T_X)_{kj}\, p_j = p_j \sum_{k=1}^{s} (T_X)_{kj} = p_j$$

(the last step uses that each column of $T_X$ is a conditional pmf and sums to one).

SLIDE 74

Irreducible chains

Irreducible Markov chains have a single stationary distribution. This follows from the Perron-Frobenius theorem:

◮ The transition matrix of an irreducible Markov chain has a single eigenvector with eigenvalue equal to one
◮ The eigenvector has nonnegative entries
SLIDE 75

Irreducible chains

If X is irreducible and aperiodic, its state vector converges to its stationary distribution $p_{\text{stat}}$ for any initial state vector $p_X(0)$: X converges in distribution to a random variable with pmf given by $p_{\text{stat}}$.

SLIDE 76

Car rental

Aim: Model location of cars. 3 states: Los Angeles, San Francisco, San Jose. New cars are uniformly distributed between the 3 states. After that the transition probabilities are (columns index the current state, rows the next state):

                SF     LA     SJ
San Francisco   0.6    0.1    0.3
Los Angeles     0.2    0.8    0.3
San Jose        0.2    0.1    0.4

SLIDE 77

Car rental

What is the proportion of cars in each city eventually? Does this depend on the initial allocation?

SLIDE 78

Car rental

Markov chain with

$$p_X(0) := \begin{bmatrix} 1/3 \\ 1/3 \\ 1/3 \end{bmatrix}, \qquad T_X := \begin{bmatrix} 0.6 & 0.1 & 0.3 \\ 0.2 & 0.8 & 0.3 \\ 0.2 & 0.1 & 0.4 \end{bmatrix}$$

SLIDE 79

Car rental

[State diagram: SF, LA, SJ with the transition probabilities from T_X]

SLIDE 80

Car rental

The transition matrix has the following eigenvectors

$$q_1 := \begin{bmatrix} 0.273 \\ 0.545 \\ 0.182 \end{bmatrix}, \qquad q_2 := \begin{bmatrix} -0.577 \\ 0.789 \\ -0.211 \end{bmatrix}, \qquad q_3 := \begin{bmatrix} -0.577 \\ -0.211 \\ 0.789 \end{bmatrix}$$

The eigenvalues are $\lambda_1 := 1$, $\lambda_2 := 0.573$ and $\lambda_3 := 0.227$. No matter how the cars are allocated, 27.3% end up in San Francisco, 54.5% in Los Angeles and 18.2% in San Jose.

SLIDES 81–83

Car rental

[Convergence of the state vector (SF / LA / SJ) to the stationary distribution over customers 1–20]

SLIDE 84

Definition · Recurrence · Periodicity · Convergence · Markov-chain Monte Carlo

SLIDE 85

Markov-chain Monte Carlo

Irreducible aperiodic Markov chains converge to a unique stationary distribution. Basic idea: simulate a Markov chain that converges to the target distribution. Very useful in Bayesian statistics. Main challenge: designing the Markov chain so that the stationary distribution is the one we want.

SLIDE 86

Metropolis-Hastings algorithm

Aim: Construct a Markov chain whose stationary distribution is a given pmf

$$p \in \mathbb{R}^s, \qquad p_j := p_X(x_j), \quad 1 \leq j \leq s$$

Idea: Sample from an irreducible Markov chain with transition matrix T on the same state space $\{x_1, \dots, x_s\}$, forcing it to converge to p.

SLIDE 87

Metropolis-Hastings algorithm

Initialize X(0) to an arbitrary value, then for i = 1, 2, 3, …

1. Generate a candidate C from X(i − 1) by using T, i.e.

$$P\left( C = k \mid X(i-1) = j \right) = T_{kj}, \quad 1 \leq j, k \leq s$$

2. Set

$$X(i) := \begin{cases} C & \text{with probability } p_{\text{acc}}\left( X(i-1), C \right) \\ X(i-1) & \text{otherwise} \end{cases}$$

where the acceptance probability is defined as

$$p_{\text{acc}}(j, k) := \min \left\{ \frac{T_{jk}\, p_k}{T_{kj}\, p_j},\ 1 \right\}, \quad 1 \leq j, k \leq s$$
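A minimal sketch of the algorithm for a small finite state space. The nearest-neighbour proposal chain used here is an illustrative choice, not from the slides; it is symmetric, so the transition-matrix ratio cancels and the acceptance probability reduces to min(p_k / p_j, 1):

```python
import random

def metropolis_hastings(p, steps, seed=0):
    """Run a Metropolis-Hastings chain targeting the pmf p on {0, ..., s-1}.

    Proposal: move to a neighbouring state with probability 1/2 each,
    staying put at the two endpoints -- a symmetric T.
    Returns the empirical distribution of the visited states.
    """
    rng = random.Random(seed)
    s = len(p)
    x = 0
    counts = [0] * s
    for _ in range(steps):
        c = x + 1 if rng.random() < 0.5 else x - 1
        c = min(max(c, 0), s - 1)  # endpoint proposals stay put
        if rng.random() < min(p[c] / p[x], 1.0):  # accept with p_acc
            x = c
        counts[x] += 1
    return [n / steps for n in counts]

freq = metropolis_hastings([0.2, 0.5, 0.3], steps=200_000)
```

With enough steps the empirical frequencies approach the target pmf, even though the chain never evaluates a normalizing constant.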
SLIDE 88

Reversibility implies stationarity

Let X(i) be distributed according to a state vector $p \in \mathbb{R}^s$. X is reversible with respect to p if for all $1 \leq j, k \leq s$

$$P\left( X(i) = x_j,\, X(i+1) = x_k \right) = P\left( X(i) = x_k,\, X(i+1) = x_j \right)$$

Equivalent to the detailed-balance condition

$$(T_X)_{kj}\, p_j = (T_X)_{jk}\, p_k, \quad \text{for all } 1 \leq j, k \leq s$$

If X is reversible with respect to p, then p is a stationary distribution of X.

SLIDES 89–93

Reversibility of the Metropolis-Hastings chain

Detailed balance holds trivially if j = k. Assume j ≠ k. Then

$$(T_X)_{kj} := P\left( X(i) = k \mid X(i-1) = j \right) = P\left( X(i) = C,\, C = k \mid X(i-1) = j \right)$$
$$= P\left( X(i) = C \mid C = k,\, X(i-1) = j \right) P\left( C = k \mid X(i-1) = j \right) = p_{\text{acc}}(j, k)\, T_{kj}$$

Similarly,

$$(T_X)_{jk} = p_{\text{acc}}(k, j)\, T_{jk}$$
SLIDES 94–99

Reversibility of the Metropolis-Hastings chain

$$(T_X)_{kj}\, p_j = p_{\text{acc}}(j, k)\, T_{kj}\, p_j = T_{kj}\, p_j \min \left\{ \frac{T_{jk}\, p_k}{T_{kj}\, p_j},\ 1 \right\}$$
$$= \min \left\{ T_{jk}\, p_k,\ T_{kj}\, p_j \right\} = T_{jk}\, p_k \min \left\{ 1,\ \frac{T_{kj}\, p_j}{T_{jk}\, p_k} \right\}$$
$$= p_{\text{acc}}(k, j)\, T_{jk}\, p_k = (T_X)_{jk}\, p_k$$

SLIDE 100

Generating a Poisson random variable

Aim: Generate a Poisson random variable X. We don't need to know the normalizing constant, just that

$$p_X(x) \propto \frac{\lambda^x}{x!}$$

SLIDE 101

Auxiliary Markov chain

$$T_{kj} := \begin{cases} \tfrac{1}{2} & \text{if } j = 0 \text{ and } k = 0 \\ \tfrac{1}{2} & \text{if } k = j + 1 \\ \tfrac{1}{2} & \text{if } k = j - 1 \\ 0 & \text{otherwise} \end{cases}$$
SLIDE 102

Acceptance probability

T is symmetric, so

$$p_{\text{acc}}(j, k) := \min \left\{ \frac{T_{jk}\, p_X(k)}{T_{kj}\, p_X(j)},\ 1 \right\} = \min \left\{ \frac{p_X(k)}{p_X(j)},\ 1 \right\}$$
slide-103
SLIDE 103

Acceptance probability

If j = 0 and k = 0 pacc (j, k) = 1

slide-104
SLIDE 104

Acceptance probability

If k = j + 1 pacc (j, j + 1) = min   

λj+1 (j+1)! λj j!

, 1    = min

  • λ

j + 1, 1

slide-105
SLIDE 105

Acceptance probability

If k = j − 1 pacc (j, j − 1) = min   

λj−1 (j−1)! λj j!

, 1    = min j λ, 1

SLIDE 106

Generating a Poisson random variable

Initialize x_0 := 0. For i = 1, 2, …

◮ Generate a Bernoulli(1/2) sample b and a uniform sample u
◮ If b = 0:
  ◮ If x_{i−1} = 0: x_i := 0
  ◮ If x_{i−1} > 0:
    ◮ If u < x_{i−1}/λ: x_i := x_{i−1} − 1
    ◮ Otherwise: x_i := x_{i−1}
◮ If b = 1:
  ◮ If u < λ/(x_{i−1} + 1): x_i := x_{i−1} + 1
  ◮ Otherwise: x_i := x_{i−1}
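The steps above translate almost line for line into Python (a sketch; `poisson_mh` and its parameter names are illustrative):

```python
import random

def poisson_mh(lam, steps, seed=0):
    """Metropolis-Hastings sampler for Poisson(lam), using the
    random-walk auxiliary chain on {0, 1, 2, ...} from slide 101."""
    rng = random.Random(seed)
    x = 0
    samples = []
    for _ in range(steps):
        b = rng.random() < 0.5   # Bernoulli(1/2): propose down (False) or up (True)
        u = rng.random()         # uniform sample for the accept/reject step
        if not b:
            if x > 0 and u < x / lam:   # p_acc(j, j-1) = min(j/lam, 1)
                x -= 1                  # (x stays at 0 when x == 0)
        else:
            if u < lam / (x + 1):       # p_acc(j, j+1) = min(lam/(j+1), 1)
                x += 1
        samples.append(x)
    return samples

samples = poisson_mh(lam=6, steps=100_000)
burned = samples[1000:]           # discard an initial burn-in stretch
mean = sum(burned) / len(burned)  # should be close to lam = 6
```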

SLIDE 107

Generating a Poisson random variable with λ := 6

[Empirical distribution of the chain after 10^0, 10^1, 10^2, 10^3 iterations, converging to the Poisson(6) pmf]

SLIDE 108

Metropolis-Hastings algorithm

Also works for infinite and continuous state spaces. Only requires knowing the ratio

$$\frac{p_X(x)}{p_X(y)} \quad\text{or}\quad \frac{f_X(x)}{f_X(y)}$$

SLIDES 109–112

Useful in Bayesian statistics

Aim: Sample from $f_{A \mid B}$. We know $f_A$ and $f_{B \mid A}$:

$$f_{A \mid B}(a \mid b) = \frac{f_A(a)\, f_{B \mid A}(b \mid a)}{\int_{u = -\infty}^{\infty} f_A(u)\, f_{B \mid A}(b \mid u)\, du}$$

If we apply Metropolis-Hastings, for any $a_1 \neq a_2$ we only need the ratio

$$\frac{f_{A \mid B}(a_1 \mid b)}{f_{A \mid B}(a_2 \mid b)} = \frac{f_A(a_1)\, f_{B \mid A}(b \mid a_1)}{f_A(a_2)\, f_{B \mid A}(b \mid a_2)}$$

The intractable normalizing integral cancels in the ratio.