CS70: Lecture25. Markov Chains 1.5 1. Review 2. Distribution 3. - - PowerPoint PPT Presentation



slide-1
SLIDE 1

CS70: Lecture25.

Markov Chains 1.5

  • 1. Review
  • 2. Distribution
  • 3. Irreducibility
  • 4. Convergence
slide-2
SLIDE 2

Review

◮ Markov Chain:

  ◮ Finite set X; π0; P = {P(i,j), i,j ∈ X};
  ◮ Pr[X0 = i] = π0(i), i ∈ X
  ◮ Pr[Xn+1 = j | X0,...,Xn = i] = P(i,j), i,j ∈ X, n ≥ 0.
  ◮ Note: Pr[X0 = i0, X1 = i1,..., Xn = in] = π0(i0)P(i0,i1)···P(in−1,in).

◮ First Passage Time:

  ◮ A∩B = ∅; β(i) = E[TA | X0 = i]; α(i) = Pr[TA < TB | X0 = i]
  ◮ β(i) = 1+∑j P(i,j)β(j) for i ∉ A; β(i) = 0 for i ∈ A.
  ◮ α(i) = ∑j P(i,j)α(j) for i ∉ A∪B; α(i) = 1 for i ∈ A, α(i) = 0 for i ∈ B.

slide-3
SLIDE 3

Distribution of Xn

[Figure: a three-state Markov chain with its transition probabilities, and a plot of Xn against n.]

Recall πn is a distribution over states for Xn.

Stationary distribution: π = πP. The distribution over states is the same before and after a transition.

Probability entering i: ∑j P(j,i)π(j). Probability leaving i: π(i). These are equal, so the distribution is the same after one step.

Questions: Does a stationary distribution exist? Is it unique? If it exists and is unique, then what? Sometimes it is the limit of the distribution πn as n → ∞.

slide-4
SLIDE 4

Stationary: Example

Example 1: Balance Equations.

$$\pi P = \pi \iff [\pi(1), \pi(2)] \begin{bmatrix} 1-a & a \\ b & 1-b \end{bmatrix} = [\pi(1), \pi(2)]$$

⇔ π(1)(1−a)+π(2)b = π(1) and π(1)a+π(2)(1−b) = π(2)
⇔ π(1)a = π(2)b.

These equations are redundant! We have to add an equation: π(1)+π(2) = 1. Then we find π = [b/(a+b), a/(a+b)].
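The closed form can be checked numerically. The sketch below is only an illustration: the values of a and b are arbitrary choices (the slide keeps them symbolic), and the helper names `step` and `two_state_stationary` are made up for this example. Exact fractions are used so the equalities hold without rounding.

```python
from fractions import Fraction

def step(dist, P):
    """One transition: the row vector dist times the matrix P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def two_state_stationary(a, b):
    """pi = [b/(a+b), a/(a+b)] for P = [[1-a, a], [b, 1-b]], assuming a + b > 0."""
    return [b / (a + b), a / (a + b)]

# Illustrative values (an assumption; the slide keeps a and b symbolic).
a, b = Fraction(1, 5), Fraction(3, 10)
P = [[1 - a, a], [b, 1 - b]]
pi = two_state_stationary(a, b)

print(pi)                      # [Fraction(3, 5), Fraction(2, 5)]
print(step(pi, P) == pi)       # True: pi P = pi
print(pi[0] * a == pi[1] * b)  # True: balance equation pi(1) a = pi(2) b
```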

slide-5
SLIDE 5

Stationary distributions: Example 2

$$\pi P = \pi \iff [\pi(1), \pi(2)] \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = [\pi(1), \pi(2)] \iff \pi(1) = \pi(1) \text{ and } \pi(2) = \pi(2).$$

Every distribution is invariant for this Markov chain. This is obvious, since Xn = X0 for all n. Hence, Pr[Xn = i] = Pr[X0 = i], ∀(i,n).

Discussion. We have seen a chain with one stationary distribution, and a chain with many. When is there just one?

slide-6
SLIDE 6

Irreducibility.

Definition A Markov chain is irreducible if it can go from every state i to every state j (possibly in multiple steps). Examples:

[Figure: three three-state transition diagrams, labeled [A], [B], and [C].]

[A] is not irreducible. It cannot go from (2) to (1).
[B] is not irreducible. It cannot go from (2) to (1).
[C] is irreducible. It can go from every i to every j.

If you consider the directed graph with an arrow from i to j when P(i,j) > 0, irreducible means that the graph is strongly connected: every state can be reached from every other state by following arrows.
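The graph view suggests a direct check: from each state, follow the arrows and see whether every state is reached. The following is a minimal sketch under that reading. The two matrices are hypothetical examples, not the chains [A], [B], [C] from the slide, whose figures are not reproduced here.

```python
def reachable(P, start):
    """Set of states reachable from `start` along arrows with P[i][j] > 0."""
    seen = {start}
    stack = [start]
    while stack:
        i = stack.pop()
        for j, p in enumerate(P[i]):
            if p > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen

def is_irreducible(P):
    """True if every state can reach every state (possibly in multiple steps)."""
    n = len(P)
    return all(len(reachable(P, i)) == n for i in range(n))

# Hypothetical chains for illustration (states are numbered from 0 here).
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]      # 0 -> 1 -> 2 -> 0
absorbing = [[0.5, 0.5, 0.0],
             [0.0, 1.0, 0.0],                  # state 1 never leaves
             [0.0, 0.3, 0.7]]

print(is_irreducible(cycle))      # True
print(is_irreducible(absorbing))  # False
```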

slide-7
SLIDE 7

Existence and uniqueness of Invariant Distribution

Theorem A finite irreducible Markov chain has one and only one invariant distribution. That is, there is a unique positive vector π = [π(1),...,π(K)] such that πP = π and ∑k π(k) = 1.

  • Ok. Now.

Only one stationary distribution if irreducible (or connected.)

slide-8
SLIDE 8

Long Term Fraction of Time in States

Theorem Let Xn be an irreducible Markov chain with invariant distribution π. Then, for all i,

$$\frac{1}{n} \sum_{m=0}^{n-1} 1\{X_m = i\} \to \pi(i), \quad \text{as } n \to \infty.$$

The left-hand side is the fraction of time that Xm = i during steps 0,1,...,n−1. Thus, this fraction of time approaches π(i).

Proof: Lecture note 24 gives a plausibility argument.

slide-9
SLIDE 9

Long Term Fraction of Time in States

Theorem Let Xn be an irreducible Markov chain with invariant distribution π. Then, for all i,

$$\frac{1}{n} \sum_{m=0}^{n-1} 1\{X_m = i\} \to \pi(i), \quad \text{as } n \to \infty.$$

Example 1: The fraction of time in state 1 converges to 1/2, which is π(1).
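A short simulation can make the theorem concrete. This is a sketch under an assumption: the chain used for Example 1 is not recoverable from the slide, so the code uses the two-state chain P = [[1−a, a], [b, 1−b]] with a = b = 0.3, whose invariant distribution is [1/2, 1/2]. States are numbered from 0 in the code, and the seed is fixed only to make the run repeatable.

```python
import random

def fraction_of_time(P, start, target, n, seed=0):
    """Fraction of steps m = 0, ..., n-1 at which the chain is in `target`."""
    rng = random.Random(seed)
    state = start
    visits = 0
    for _ in range(n):
        if state == target:
            visits += 1
        # Draw the next state from row P[state].
        state = rng.choices(range(len(P)), weights=P[state])[0]
    return visits / n

a = b = 0.3                      # assumed values; a = b gives pi = [1/2, 1/2]
P = [[1 - a, a], [b, 1 - b]]
frac = fraction_of_time(P, start=0, target=0, n=100_000)
print(frac)                      # close to 0.5
```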

slide-10
SLIDE 10

Long Term Fraction of Time in States

Theorem Let Xn be an irreducible Markov chain with invariant distribution π. Then, for all i,

$$\frac{1}{n} \sum_{m=0}^{n-1} 1\{X_m = i\} \to \pi(i), \quad \text{as } n \to \infty.$$

Example 2:

slide-11
SLIDE 11

Convergence to Invariant Distribution

Question: Assume that the MC is irreducible. Does πn approach the unique invariant distribution π?

Answer: Not necessarily. Here is an example: a two-state chain that moves to the other state at every step.

Assume X0 = 1. Then X1 = 2, X2 = 1, X3 = 2, .... Thus, if π0 = [1,0], then π1 = [0,1], π2 = [1,0], π3 = [0,1], etc. Hence, πn does not converge to π = [1/2,1/2].

Notice that all cycles (closed walks) have even length.
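The oscillation is easy to reproduce. In this sketch the matrix P = [[0, 1], [1, 0]] encodes the deterministic flip described above, and the helper name `step` is made up for the example.

```python
def step(dist, P):
    """One transition: the row vector dist times the matrix P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

P = [[0, 1], [1, 0]]     # always move to the other state
dist = [1, 0]            # pi_0: start in state 1
history = [dist]
for _ in range(4):
    dist = step(dist, P)
    history.append(dist)

print(history)               # [[1, 0], [0, 1], [1, 0], [0, 1], [1, 0]]
print(step([0.5, 0.5], P))   # [0.5, 0.5]: the invariant distribution is unchanged
```

The distribution πn alternates forever, even though [1/2, 1/2] is invariant.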

slide-12
SLIDE 12

Periodicity

Definition: The periodicity is the gcd of the lengths of all closed walks. Previous example: 2.

Definition: If the periodicity is 1, the Markov chain is said to be aperiodic. Otherwise, it is periodic.

Example

[A]: Closed walks of length 3 and length 4 ⇒ periodicity = 1.
[B]: All closed walks have length a multiple of 3 ⇒ periodicity = 3.
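One way to compute the periodicity of an irreducible chain is sketched below. It swaps in a standard graph technique rather than enumerating closed walks: run a breadth-first search from state 0, then take the gcd of level[u] + 1 − level[v] over all arrows u → v. The three matrices are hypothetical examples (the slide's figures are not reproduced here), and the function assumes the chain is irreducible.

```python
from collections import deque
from math import gcd

def period(P):
    """Periodicity of an irreducible chain: gcd of the lengths of all closed walks."""
    n = len(P)
    level = {0: 0}                 # BFS distance from state 0
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if P[u][v] > 0 and v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    g = 0
    for u in range(n):
        for v in range(n):
            if P[u][v] > 0:
                g = gcd(g, level[u] + 1 - level[v])
    return g

flip = [[0, 1], [1, 0]]                        # closed walks of length 2, 4, ...
three_cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
# Closed walks 0->1->2->0 (length 3) and 0->1->2->3->0 (length 4).
mixed = [[0, 1, 0, 0], [0, 0, 1, 0], [0.5, 0, 0, 0.5], [1, 0, 0, 0]]

print(period(flip), period(three_cycle), period(mixed))   # 2 3 1
```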

slide-13
SLIDE 13

Convergence of πn

Theorem Let Xn be an irreducible and aperiodic Markov chain with invariant distribution π. Then, for all i ∈ X , πn(i) → π(i), as n → ∞. Example
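As a hedged numerical illustration of the theorem, the sketch below iterates πn+1 = πn P for the two-state chain P = [[1−a, a], [b, 1−b]]. The values a = 0.2 and b = 0.3 are assumptions chosen for the example. The chain is irreducible, and it is aperiodic because each state can stay put with positive probability.

```python
def step(dist, P):
    """One transition: the row vector dist times the matrix P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

a, b = 0.2, 0.3                      # assumed values for illustration
P = [[1 - a, a], [b, 1 - b]]
dist = [1.0, 0.0]                    # pi_0: start in the first state
for _ in range(100):
    dist = step(dist, P)             # pi_{n+1} = pi_n P

target = [b / (a + b), a / (a + b)]  # invariant distribution [0.6, 0.4]
print(dist, target)
```

Starting from [0.0, 1.0] instead leads to the same limit, which is what the theorem predicts.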

slide-15
SLIDE 15

Summary

Markov Chains

◮ Markov Chain: Pr[Xn+1 = j | X0,...,Xn = i] = P(i,j)
◮ FSE: β(i) = 1+∑j P(i,j)β(j); α(i) = ∑j P(i,j)α(j).
◮ πn = π0 P^n
◮ π is invariant iff πP = π
◮ Irreducible ⇒ one and only one invariant distribution π
◮ Irreducible ⇒ fraction of time in state i approaches π(i)
◮ Irreducible + Aperiodic ⇒ πn → π.
◮ Calculating π: One finds π = [0,0,...,1]Q^{−1} where Q = ···.

slide-16
SLIDE 16

CS70: Continuous Probability.

Continuous Probability 1

  • 1. Examples
  • 2. Events
  • 3. Continuous Random Variables
slide-17
SLIDE 17

Uniformly at Random in [0,1].

Choose a real number X, uniformly at random in [0,1]. What is the probability that X is exactly equal to 1/3? Well, ..., 0. What is the probability that X is exactly equal to 0.6? Again, 0. In fact, for any x ∈ [0,1], one has Pr[X = x] = 0. How should we then describe ‘choosing uniformly at random in [0,1]’? Here is the way to do it: Pr[X ∈ [a,b]] = b −a,∀0 ≤ a ≤ b ≤ 1. Makes sense: b −a is the fraction of [0,1] that [a,b] covers.

slide-18
SLIDE 18

Uniformly at Random in [0,1].

Let [a,b] denote the event that the point X is in the interval [a,b].

Pr[[a,b]] = (length of [a,b]) / (length of [0,1]) = (b−a)/1 = b−a.

Intervals like [a,b] ⊆ Ω = [0,1] are events. More generally, events in this space are unions of intervals.

Example: the event A, “within 0.2 of 0 or 1”, is A = [0,0.2]∪[0.8,1]. Thus, Pr[A] = Pr[[0,0.2]]+Pr[[0.8,1]] = 0.4.

More generally, if An are pairwise disjoint intervals in [0,1], then Pr[∪n An] := ∑n Pr[An].

Many subsets of [0,1] are of this form. Thus, the probability of those sets is well defined. We call such sets events.

slide-19
SLIDE 19

Uniformly at Random in [0,1].

Note: A radical change in approach.

Finite prob. space: Ω = {1,2,...,N}, with Pr[ω] = pω ⇒ Pr[A] = ∑ω∈A pω for A ⊂ Ω.

Continuous space: e.g., Ω = [0,1]. Pr[ω] is typically 0. Instead, start with Pr[A] for some events A. Event A = interval, or union of intervals.

slide-20
SLIDE 20

Uniformly at Random in [0,1].

Pr[X ≤ x] = x for x ∈ [0,1]. Also, Pr[X ≤ x] = 0 for x < 0, and Pr[X ≤ x] = 1 for x > 1.

Define F(x) = Pr[X ≤ x]. Then we have Pr[X ∈ (a,b]] = Pr[X ≤ b]−Pr[X ≤ a] = F(b)−F(a).

Thus, F(·) specifies the probability of all the events!
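A minimal sketch of this F for the uniform case, with hypothetical helper names `uniform_cdf` and `prob_interval`. It recomputes the probability of the earlier event “within 0.2 of 0 or 1” from F alone.

```python
def uniform_cdf(x):
    """F(x) = Pr[X <= x] for X uniform on [0, 1]: 0 below 0, x on [0, 1], 1 above 1."""
    return min(1.0, max(0.0, x))

def prob_interval(a, b):
    """Pr[X in (a, b]] = F(b) - F(a), for a <= b."""
    return uniform_cdf(b) - uniform_cdf(a)

# Event A = [0, 0.2] U [0.8, 1]: add the probabilities of the two disjoint pieces.
p = prob_interval(0, 0.2) + prob_interval(0.8, 1)
print(p)                      # approximately 0.4 (floating-point arithmetic)
print(prob_interval(-3, 5))   # 1.0: the whole of [0, 1] is covered
```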

slide-21
SLIDE 21

Uniformly at Random in [0,1].

Pr[X ∈ (a,b]] = Pr[X ≤ b]−Pr[X ≤ a] = F(b)−F(a).

An alternative view is to define f(x) = (d/dx) F(x) = 1{x ∈ [0,1]}. Then

$$F(b) - F(a) = \int_a^b f(x)\,dx.$$

Thus, the probability of an event is the integral of f(x) over the event:

$$\Pr[X \in A] = \int_A f(x)\,dx.$$
slide-22
SLIDE 22

Uniformly at Random in [0,1].

Think of f(x) as describing how one unit of probability is spread over [0,1]: uniformly!

Then Pr[X ∈ A] is the probability mass over A. Observe:

◮ This makes the probability automatically additive.
◮ We need f(x) ≥ 0 and $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

slide-23
SLIDE 23

Uniformly at Random in [0,1].

Discrete Approximation: Fix N ≫ 1 and let ε = 1/N. Define Y = nε if (n−1)ε < X ≤ nε for n = 1,...,N. Then |X−Y| ≤ ε and Y is discrete: Y ∈ {ε,2ε,...,Nε}. Also, Pr[Y = nε] = 1/N for n = 1,...,N.

Thus, X is ‘almost discrete.’
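The rounding rule can be tried out directly. The sketch below is illustrative only: N = 100 and the sample size are arbitrary choices, the function name `discretize` is made up, and the seed is fixed so the run is repeatable.

```python
import math
import random

def discretize(x, N):
    """Return Y = n/N where (n-1)/N < x <= n/N, for x in (0, 1]."""
    n = max(1, math.ceil(x * N))   # x = 0 has probability zero; map it to n = 1
    return n / N

N = 100                            # illustrative choice; eps = 1/N
rng = random.Random(0)
samples = [rng.random() for _ in range(100_000)]
ys = [discretize(x, N) for x in samples]

max_gap = max(abs(x - y) for x, y in zip(samples, ys))
share_of_first_cell = sum(1 for y in ys if y == 1 / N) / len(ys)
print(max_gap)               # at most about eps = 0.01
print(share_of_first_cell)   # close to 1/N = 0.01
```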

slide-24
SLIDE 24

Nonuniformly at Random in [0,1].

This figure shows a different choice of f(x) ≥ 0 with $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

It defines another way of choosing X at random in [0,1]. Note that X is more likely to be closer to 1 than to 0. One has

$$\Pr[X \le x] = \int_{-\infty}^{x} f(u)\,du = x^2 \quad \text{for } x \in [0,1].$$

Also,

$$\Pr[X \in (x, x+\varepsilon)] = \int_{x}^{x+\varepsilon} f(u)\,du \approx f(x)\varepsilon.$$

slide-25
SLIDE 25

Another Nonuniform Choice at Random in [0,1].

This figure shows yet a different choice of f(x) ≥ 0 with $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

It defines another way of choosing X at random in [0,1]. Note that X is more likely to be closer to 1/2 than to 0 or 1. For instance,

$$\Pr[X \in [0,1/3]] = \int_0^{1/3} 4x\,dx = 2\left[x^2\right]_0^{1/3} = \frac{2}{9}.$$

Thus, Pr[X ∈ [0,1/3]] = Pr[X ∈ [2/3,1]] = 2/9 and Pr[X ∈ [1/3,2/3]] = 5/9.
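These three values can be checked exactly. The sketch below assumes the density in the figure is the symmetric triangle f(x) = 4x on [0, 1/2] and 4(1 − x) on [1/2, 1]. Only the 4x piece appears in the integral above, so the second half is an assumption, chosen to be consistent with the stated symmetry. The function name `tri_cdf` is made up for the example.

```python
from fractions import Fraction

def tri_cdf(x):
    """CDF of the assumed density f(x) = 4x on [0, 1/2], 4(1 - x) on [1/2, 1]."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x <= Fraction(1, 2):
        return 2 * x * x
    if x <= 1:
        return 1 - 2 * (1 - x) ** 2
    return Fraction(1)

third, two_thirds = Fraction(1, 3), Fraction(2, 3)
left = tri_cdf(third) - tri_cdf(0)             # Pr[X in [0, 1/3]]
middle = tri_cdf(two_thirds) - tri_cdf(third)  # Pr[X in [1/3, 2/3]]
right = tri_cdf(1) - tri_cdf(two_thirds)       # Pr[X in [2/3, 1]]
print(left, middle, right)                     # 2/9 5/9 2/9
```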

slide-26
SLIDE 26

General Random Choice in ℜ

Let F(x) be a nondecreasing function with F(−∞) = 0 and F(+∞) = 1.

Define X by Pr[X ∈ (a,b]] = F(b)−F(a) for a < b. Also, for a1 < b1 < a2 < b2 < ··· < bn,

Pr[X ∈ (a1,b1]∪(a2,b2]∪···∪(an,bn]] = Pr[X ∈ (a1,b1]]+···+Pr[X ∈ (an,bn]] = F(b1)−F(a1)+···+F(bn)−F(an).

Let f(x) = (d/dx) F(x). Then, Pr[X ∈ (x,x+ε]] = F(x+ε)−F(x) ≈ f(x)ε.

Here, F(x) is called the cumulative distribution function (cdf) of X and f(x) is the probability density function (pdf) of X. To indicate that F and f correspond to the RV X, we will write them FX(x) and fX(x).

slide-27
SLIDE 27

Pr[X ∈ (x,x +ε)]

An illustration of Pr[X ∈ (x,x+ε)] ≈ fX(x)ε: Thus, the pdf is the ‘local probability per unit length.’ It is the ‘probability density.’

slide-28
SLIDE 28

Discrete Approximation

Fix ε ≪ 1 and let Y = nε if X ∈ (nε,(n+1)ε]. Thus, Pr[Y = nε] = FX((n+1)ε)−FX(nε). Note that |X−Y| ≤ ε and Y is a discrete random variable.

Also, if fX(x) = (d/dx) FX(x), then FX(x+ε)−FX(x) ≈ fX(x)ε.

Hence, Pr[Y = nε] ≈ fX(nε)ε. Thus, we can think of X as being almost discrete with Pr[X = nε] ≈ fX(nε)ε.

slide-29
SLIDE 29

Example: CDF

Example: hitting random location on gas tank. Random location on circle of radius 1.

Random Variable: Y, distance from center.

Probability within y of center: Pr[Y ≤ y] = (area of small circle)/(area of dartboard) = πy²/π = y².

Hence,

$$F_Y(y) = \Pr[Y \le y] = \begin{cases} 0 & \text{for } y < 0 \\ y^2 & \text{for } 0 \le y \le 1 \\ 1 & \text{for } y > 1 \end{cases}$$

slide-30
SLIDE 30

Calculation of event with dartboard..

Probability between .5 and .6 of center? Recall CDF.

$$F_Y(y) = \Pr[Y \le y] = \begin{cases} 0 & \text{for } y < 0 \\ y^2 & \text{for } 0 \le y \le 1 \\ 1 & \text{for } y > 1 \end{cases}$$

Pr[0.5 < Y ≤ 0.6] = Pr[Y ≤ 0.6]−Pr[Y ≤ 0.5] = FY(0.6)−FY(0.5) = .36−.25 = .11
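The same calculation as a small sketch, with hypothetical function names `dart_cdf` and `prob_between`:

```python
def dart_cdf(y):
    """F_Y(y) = Pr[Y <= y] for the distance Y from the center of a unit-radius board."""
    if y < 0:
        return 0.0
    if y <= 1:
        return y * y
    return 1.0

def prob_between(lo, hi):
    """Pr[lo < Y <= hi] = F_Y(hi) - F_Y(lo)."""
    return dart_cdf(hi) - dart_cdf(lo)

p_ring = prob_between(0.5, 0.6)
print(p_ring)                  # approximately 0.11 (floating-point arithmetic)
print(prob_between(-1, 2))     # 1.0: every throw lands on the board
```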

slide-31
SLIDE 31

PDF.

Example: “Dart” board. Recall that

$$F_Y(y) = \Pr[Y \le y] = \begin{cases} 0 & \text{for } y < 0 \\ y^2 & \text{for } 0 \le y \le 1 \\ 1 & \text{for } y > 1 \end{cases}$$

$$f_Y(y) = F_Y'(y) = \begin{cases} 0 & \text{for } y < 0 \\ 2y & \text{for } 0 \le y \le 1 \\ 0 & \text{for } y > 1 \end{cases}$$

The cumulative distribution function (cdf) and probability density function (pdf) give full information. Use whichever is convenient.

slide-32
SLIDE 32

Target

slide-33
SLIDE 33

U[a,b]

slide-34
SLIDE 34

Expo(λ)

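The exponential distribution with parameter λ > 0 is defined by

$$f_X(x) = \lambda e^{-\lambda x}\, 1\{x \ge 0\}, \qquad F_X(x) = \begin{cases} 0, & \text{if } x < 0 \\ 1 - e^{-\lambda x}, & \text{if } x \ge 0. \end{cases}$$

Note that $\Pr[X > t] = e^{-\lambda t}$ for t > 0.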
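A small sketch of these two functions follows, with a numerical check that integrating the pdf recovers the cdf. The values λ = 0.5 and t = 2, the function names, and the midpoint Riemann sum are choices made for this illustration.

```python
import math

def expo_pdf(x, lam):
    """f_X(x) = lam * exp(-lam * x) for x >= 0, and 0 otherwise."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def expo_cdf(x, lam):
    """F_X(x) = 1 - exp(-lam * x) for x >= 0, and 0 otherwise."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

lam, t = 0.5, 2.0                      # illustrative values
tail = 1.0 - expo_cdf(t, lam)          # Pr[X > t]

# Midpoint Riemann sum of the pdf over [0, t]; should be close to F_X(t).
steps = 10_000
width = t / steps
area = sum(expo_pdf((k + 0.5) * width, lam) for k in range(steps)) * width

print(tail, math.exp(-lam * t))        # both approximately 0.3679
print(area, expo_cdf(t, lam))          # both approximately 0.6321
```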

slide-35
SLIDE 35

Continuous Random Variables

Continuous random variable X, specified by

  • 1. FX(x) = Pr[X ≤ x] for all x. Cumulative Distribution Function (cdf).
    Pr[a < X ≤ b] = FX(b)−FX(a)
    1.1 0 ≤ FX(x) ≤ 1 for all x ∈ ℜ.
    1.2 FX(x) ≤ FX(y) if x ≤ y.

  • 2. Or fX(x), where $F_X(x) = \int_{-\infty}^{x} f_X(u)\,du$ or $f_X(x) = \frac{d F_X(x)}{dx}$. Probability Density Function (pdf).
    $\Pr[a < X \le b] = \int_a^b f_X(x)\,dx = F_X(b) - F_X(a)$
    2.1 fX(x) ≥ 0 for all x ∈ ℜ.
    2.2 $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.

Recall that Pr[X ∈ (x,x+δ)] ≈ fX(x)δ. X “takes” value nδ, for n ∈ Z, with Pr[X = nδ] ≈ fX(nδ)δ.

slide-36
SLIDE 36

A Picture

The pdf fX(x) is a nonnegative function that integrates to 1. The cdf FX(x) is the integral of fX.

Pr[x < X < x+δ] ≈ fX(x)δ

$$\Pr[X \le x] = F_X(x) = \int_{-\infty}^{x} f_X(u)\,du$$

slide-37
SLIDE 37

Multiple Continuous Random Variables

One defines a pair (X,Y) of continuous RVs by specifying fX,Y(x,y) for x,y ∈ ℜ where

fX,Y(x,y)dxdy = Pr[X ∈ (x,x+dx), Y ∈ (y,y+dy)].

The function fX,Y(x,y) is called the joint pdf of X and Y.

Example: Choose a point (X,Y) uniformly in the set A ⊂ ℜ². Then fX,Y(x,y) = (1/|A|) 1{(x,y) ∈ A}, where |A| is the area of A.

Interpretation. Think of (X,Y) as being discrete on a grid with mesh size ε and Pr[X = mε, Y = nε] = fX,Y(mε,nε)ε².

Extension: X = (X1,...,Xn) with fX(x).

slide-38
SLIDE 38

Example of Continuous (X,Y)

Pick a point (X,Y) uniformly in the unit circle. Thus, fX,Y(x,y) = (1/π) 1{x² + y² ≤ 1}.

Consequently,

Pr[X > 0, Y > 0] = 1/4
Pr[X < 0, Y > 0] = 1/4
Pr[X² + Y² ≤ r²] = πr²/π = r² (for 0 ≤ r ≤ 1)
Pr[X > Y] = 1/2.
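A Monte Carlo sketch can sanity-check these probabilities. Everything here is an illustrative assumption: the rejection sampler, the sample size, the radius r = 0.5, and the fixed seed. For a disk of radius r ≤ 1 inside the unit disk, the area ratio is πr²/π = r², so the third estimate should be near 0.25.

```python
import random

def sample_unit_disk(rng):
    """Rejection sampling: draw from the square [-1, 1]^2 until the point lands in the disk."""
    while True:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1:
            return x, y

rng = random.Random(0)      # fixed seed so the run is repeatable
n = 100_000
r = 0.5                     # illustrative radius
points = [sample_unit_disk(rng) for _ in range(n)]

p_quadrant = sum(1 for x, y in points if x > 0 and y > 0) / n
p_inner = sum(1 for x, y in points if x * x + y * y <= r * r) / n
p_x_gt_y = sum(1 for x, y in points if x > y) / n

print(p_quadrant)   # close to 1/4
print(p_inner)      # close to r^2 = 0.25
print(p_x_gt_y)     # close to 1/2
```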

slide-39
SLIDE 39

Summary

Continuous Probability 1

  • 1. pdf: Pr[X ∈ (x,x+δ]] ≈ fX(x)δ.
  • 2. CDF: $\Pr[X \le x] = F_X(x) = \int_{-\infty}^{x} f_X(y)\,dy$.
  • 3. U[a,b]: fX(x) = (1/(b−a)) 1{a ≤ x ≤ b}; FX(x) = (x−a)/(b−a) for a ≤ x ≤ b.
  • 4. Expo(λ): fX(x) = λ exp{−λx} 1{x ≥ 0}; FX(x) = 1−exp{−λx} for x ≥ 0.
  • 5. Target: fX(x) = 2x 1{0 ≤ x ≤ 1}; FX(x) = x² for 0 ≤ x ≤ 1.
  • 6. Joints: Is this 4/20?
    Joint pdf: Pr[X ∈ (x,x+δ), Y ∈ (y,y+δ)] ≈ fX,Y(x,y)δ².