15-251 Great Theoretical Ideas in Computer Science, Lecture 23: Markov Chains (PowerPoint presentation)



slide-1
SLIDE 1

15-251 Great Theoretical Ideas in Computer Science

Lecture 23: Markov Chains

November 17th, 2015

slide-2
SLIDE 2

My typical day (when I was a student)

(on the whiteboard: f(x) = Σ_{S ⊆ [n]} f̂(S) χ_S(x))

9:00am: Work

SLIDE 3

My typical day (when I was a student)

9:01am: [diagram: Work, Surf; edge probabilities 40%, 60%]

SLIDE 4

My typical day (when I was a student)

9:02am: [diagram: Work, Surf, Email; edge probabilities 40%, 60%, 60%, 10%, 30%]

SLIDE 5

My typical day (when I was a student)

9:03am: [diagram: Work, Surf, Email; edge probabilities 40%, 60%, 60%, 10%, 30%, 50%, 50%]

SLIDES 6-11

My typical day (when I was a student)

9:00am-9:05am: stepping through the same three-state diagram minute by minute (Work, Surf, Email; edge probabilities 40%, 60%, 60%, 10%, 30%, 50%, 50%).

slide-12
SLIDE 12

And now

Prepare 15-251 slides 100%

slide-13
SLIDE 13

Markov Model

slide-14
SLIDE 14

Markov Model

Andrey Markov (1856 - 1922): Russian mathematician, famous for his work on random processes.

(Pr[X ≥ c · E[X]] ≤ 1/c is Markov's Inequality.)

slide-15
SLIDE 15

Markov Model

Andrey Markov (1856 - 1922): Russian mathematician, famous for his work on random processes.

A model for the evolution of a random system: the future is independent of the past, given the present.

(Pr[X ≥ c · E[X]] ≤ 1/c is Markov's Inequality.)

slide-16
SLIDE 16

Cool things about the Markov model

  • It is a very general and natural model.
    Extraordinary number of applications in many different disciplines: computer science, mathematics, biology, physics, chemistry, economics, psychology, music, baseball, ...
  • The model is simple and neat.
  • There is a beautiful mathematical theory behind it.
    Starts simple, goes deep.

slide-17
SLIDE 17

The plan

1. Motivating examples and applications
2. Basic mathematical representation and properties
3. Applications

slide-18
SLIDE 18

The future is independent of the past, given the present.

slide-19
SLIDE 19

Some Examples of Markov Models

slide-20
SLIDE 20

Example: Drunkard Walk

[diagram: drunkard's random walk along a street; one endpoint labeled Home]
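The drunkard's walk can be sketched as a tiny simulation (a toy illustration, not from the slides; `steps_until_home` is a hypothetical helper name):

```python
import random

# A sketch of the drunkard's walk: start some blocks from Home (position 0)
# and step left or right with probability 1/2 each, stopping at Home.
# The cap on the number of steps is only to keep the simulation finite.
random.seed(1)

def steps_until_home(start, limit=10_000):
    pos, t = start, 0
    while pos != 0 and t < limit:
        pos += random.choice([-1, 1])
        t += 1
    return t

res = steps_until_home(3)
print(res)
```

On the infinite line the walk reaches Home with probability 1, but its expected time is infinite, which is why the step cap is needed.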

slide-21
SLIDE 21

Example: Diffusion Process

slide-22
SLIDE 22

Example: Weather

A very(!!) simplified model for the weather, with probabilities on a daily basis:

Pr[sunny to sunny] = 0.9    Pr[sunny to rainy] = 0.1
Pr[rainy to sunny] = 0.5    Pr[rainy to rainy] = 0.5

(Encode more information about the current state for a more accurate model.)

Transition matrix (S = sunny, R = rainy):

         S     R
  S    0.9   0.1
  R    0.5   0.5
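One way to see where this chain heads in the long run is to iterate the update πt+1 = πt · K; a minimal sketch in plain Python (not part of the lecture):

```python
# Weather chain from the slide: state 0 = sunny, state 1 = rainy.
K = [[0.9, 0.1],
     [0.5, 0.5]]

def step(pi, K):
    """One step of the chain: pi'[j] = sum_i pi[i] * K[i][j]."""
    n = len(K)
    return [sum(pi[i] * K[i][j] for i in range(n)) for j in range(n)]

pi = [1.0, 0.0]          # start on a sunny day
for _ in range(200):
    pi = step(pi, K)

print(pi)                # approaches [5/6, 1/6], as shown later in the lecture
```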

slide-23
SLIDE 23

Example: Life Insurance

Goal of the insurance company: figure out how much to charge the clients. Find a model for how long a client will live. Probabilistic model of health on a monthly basis:

Pr[healthy to healthy] = 0.69   Pr[healthy to sick] = 0.3   Pr[healthy to death] = 0.01
Pr[sick to healthy] = 0.8       Pr[sick to sick] = 0.1      Pr[sick to death] = 0.1
Pr[death to death] = 1

slide-24
SLIDE 24

Example: Life Insurance

Goal of insurance company: figure out how much to charge the clients. Find a model for how long a client will live. Probabilistic model of health on a monthly basis:

Transition matrix (H = healthy, S = sick, D = death):

         H      S      D
  H    0.69   0.3    0.01
  S    0.8    0.1    0.1
  D    0      0      1
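One quantity the insurance company cares about is the expected number of months until death. A sketch using first-step analysis on the chain above (the fixed-point iteration and the variable names are mine, not the lecture's):

```python
# First-step analysis: if E_H, E_S are the expected months until death
# starting healthy/sick, then
#   E_H = 1 + 0.69*E_H + 0.3*E_S
#   E_S = 1 + 0.8*E_H  + 0.1*E_S
# (transitions into the absorbing state D add no further time).
# Solve the two linear equations by fixed-point iteration.
E_H, E_S = 0.0, 0.0
for _ in range(5000):
    E_H, E_S = 1 + 0.69 * E_H + 0.3 * E_S, 1 + 0.8 * E_H + 0.1 * E_S

print(round(E_H, 2))     # about 30.77 months starting from healthy
```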

slide-25
SLIDE 25

Some Applications of Markov Models

slide-26
SLIDE 26

Application: Algorithmic Music Composition

slide-27
SLIDE 27

Application: Image Segmentation

slide-28
SLIDE 28

Application: Automatic Text Generation

Random text generated by a computer (putting random words together): “While at a conference a few weeks back, I spent an interesting evening with a grain of salt.” (Google: Mark V Shaney)

slide-29
SLIDE 29

Application: Speech Recognition

Speech recognition software programs use Markov models to listen to the sound of your voice and convert it into text.

slide-30
SLIDE 30

Application: Google PageRank

1997: Web search was horrible. Search engines sorted webpages by the number of occurrences of the keyword(s).

slide-31
SLIDE 31

Application: Google PageRank

Sergey Brin and Larry Page: founders of Google, $20+ billionaires.

slide-32
SLIDE 32

Application: Google PageRank

Jon Kleinberg: winner of the Nevanlinna Prize.

slide-33
SLIDE 33

Application: Google PageRank

How does Google order the webpages displayed after a search? 2 important factors:

  • Relevance of the page.
  • Reputation of the page: the number and reputation of links pointing to that page.

Reputation is measured using PageRank. PageRank is calculated using a Markov Chain.

slide-34
SLIDE 34
slide-35
SLIDE 35

The plan

1. Motivating examples and applications
2. Basic mathematical representation and properties
3. Applications

slide-36
SLIDE 36

The Setting

There is a system with n possible states/values. At each time step, the state changes probabilistically.

[diagram: states 1, 2, 3, . . . , n; a 4-state example with edge probabilities 1/2, 1/2, 1/4, 3/4, 1, 1]

slide-37
SLIDE 37

The Setting

[diagram: 4-state chain with edge probabilities 1/2, 1/2, 1/4, 3/4, 1, 1; states 1, 2, 3, . . . , n]

There is a system with n possible states/values. At each time step, the state changes probabilistically.

Memoryless: the next state only depends on the current state.

Evolution of the system: a random walk on the graph.

slide-39
SLIDE 39

The Definition

A Markov Chain is a directed graph with V = {1, 2, . . . , n} such that:

  • The vertices of the graph are called states. The edges are called transitions (self-loops allowed).
  • Each edge is labeled with a value in (0, 1] (a positive probability). The label of an edge is a transition probability.
  • At each vertex, the probabilities on the outgoing edges sum to 1.

(We usually assume the graph is strongly connected, i.e., there is a path from i to j for any i and j.)

slide-40
SLIDE 40

Example: Markov Chain for a Lecture

[diagram: states Arrive, Paying attention, Writing notes, Playing with phone, Kicked out; edge probabilities 1/2, 1/2, 1/4, 1/4, 3/4, 1/4, 1/2, 1/2, 1/2, 1]

This is not strongly connected.

slide-41
SLIDE 41

Notation

Given some Markov Chain with n states, for each t = 0, 1, 2, 3, . . . we have a random variable

  Xt = the state we are in after t steps.

Define πt[i] = Pr[Xt = i], the probability of being in state i after t steps, so

  πt = [p1 p2 · · · pn]  with  Σi pi = 1.

We write Xt ∼ πt (Xt has distribution πt). Note that someone has to provide π0. Once this is known, we get the distributions π1, π2, . . .

slide-42
SLIDE 42

Notation

[diagram: 4-state chain with edge probabilities 1/2, 1/2, 1/4, 3/4, 1, 1]

Let’s say we start at state 1, i.e., X0 = 1, so X0 ∼ [1 0 0 0] = π0.

SLIDES 43-48

One sample trajectory of the walk, revealed one step per slide:

  X0 = 1, X1 = 4, X2 = 3, X3 = 4, X4 = 2, X5 = 3, X6 = 4, . . .

with Xt ∼ πt for every t.

slide-49
SLIDE 49

Notation

Let’s say we start at state 1, i.e., X0 ∼ [1 0 0 0] = π0.

Pr[1 → 2 in one step] = Pr[X1 = 2 | X0 = 1] = Pr[Xt = 2 | Xt−1 = 1] for all t.

slide-50
SLIDE 50

Notation

Pr[X1 = 2|X0 = 1] = Pr[X1 = 3|X0 = 1] = 1 2 1 2 1 4 1 Pr[X1 = 1|X0 = 1] = 0

1 2 1 2 1 4 3 4 1 1

1 2 3 4 ∀t

Pr[Xt = 2|Xt−1 = 4] =

Pr[Xt = 3|Xt−1 = 2] = ∀t Let’s say we start at state 1, i.e., 1 2 3 4 X0 ∼ [1 0] = π0 Pr[X1 = 4|X0 = 1] =

slide-51
SLIDE 51

Transition Matrix

A Markov Chain with n states can be characterized by the n x n transition matrix K:

  ∀i, j ∈ {1, 2, . . . , n}   K[i, j] = Pr[Xt = j | Xt−1 = i] = Pr[i → j in one step]

For the 4-state chain above:

          1     2     3     4
  1       0    1/2    0    1/2
  2       0     0     1     0
  3       0     0     0     1
  4       0    1/4   3/4    0

Note: the rows of K sum to 1.
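The 4-state example chain's matrix can be written down directly; a small sketch (0-indexed, so `K[i][j]` = Pr[i+1 → j+1 in one step]; the code is mine, not the lecture's):

```python
# Transition matrix of the 4-state chain from the slides.
K = [[0,   1/2, 0,   1/2],
     [0,   0,   1,   0  ],
     [0,   0,   0,   1  ],
     [0,   1/4, 3/4, 0  ]]

# Sanity check: every row of a transition matrix sums to 1.
for row in K:
    assert abs(sum(row) - 1) < 1e-12

print(K[0][1])           # Pr[1 -> 2 in one step] = 0.5
```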

slide-52
SLIDE 52

Some Fundamental and Natural Questions

How do you answer questions such as:

  • What is the probability of being in state i after t steps (given some initial state)? πt[i] = ?
  • What is the expected time to reach state i when starting at state j?
  • What is the expected time until every state has been visited (given some initial state)?
  • . . .

slide-53
SLIDE 53

Mathematical representation of the evolution

Suppose we start at state 1 and let the system evolve. How can we mathematically represent the evolution?

[diagram: the 4-state chain and its transition matrix]

  π0 = [1 0 0 0]

What is π1? By inspection, π1 = [0 1/2 0 1/2].

slide-54
SLIDE 54

Poll

Given π1 = [0 1/2 0 1/2], what is π2? (Options give the probabilities on states 2 and 3.)

  • [1/8 7/8]
  • [1/2 1/2]
  • [1/4 3/4]
  • [0 1]
  • [5/8 3/8]

slide-55
SLIDE 55

Mathematical representation of the evolution

π0 = [1 0 0 0]. What is π1?

  π1[j] = Pr[X1 = j]
        = Σ_{i=1}^{4} Pr[X1 = j | X0 = i] · Pr[X0 = i]    (law of total probability)
        = Σ_{i=1}^{4} K[i, j] · π0[i]
        = (π0 · K)[j]                                     (matrix multiplication)

This is true for any j.

slide-56
SLIDE 56

Mathematical representation of the evolution

The probability of states after 1 step (the new, probabilistic state):

  π1 = π0 · K = [1 0 0 0] · K = [0 1/2 0 1/2]

slide-57
SLIDE 57

Mathematical representation of the evolution

The probability of states after 2 steps (the new, probabilistic state):

  π2 = π1 · K = [0 1/2 0 1/2] · K = [0 1/8 7/8 0]

slide-58
SLIDE 58

Mathematical representation of the evolution

π1 = π0 · K π2 = π1 · K So π2 = (π0 · K) · K = π0 · K2

slide-59
SLIDE 59

Mathematical representation of the evolution

In general: if the initial probabilistic state is

  π0 = [p1 p2 · · · pn],   pi = probability of being in state i,   p1 + p2 + · · · + pn = 1,

then after t steps the probabilistic state is

  πt = π0 · K^t .
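The formula πt = π0 · K^t can be checked numerically on the 4-state chain: applying K twice to π0 = [1 0 0 0] reproduces the poll answer (plain-Python sketch; the helper name is mine):

```python
# Transition matrix of the 4-state chain from the slides.
K = [[0,   1/2, 0,   1/2],
     [0,   0,   1,   0  ],
     [0,   0,   0,   1  ],
     [0,   1/4, 3/4, 0  ]]

def step(pi, K):
    """pi_{t+1} = pi_t * K (row vector times matrix)."""
    n = len(K)
    return [sum(pi[i] * K[i][j] for i in range(n)) for j in range(n)]

pi = [1, 0, 0, 0]        # pi_0: start in state 1
pi = step(pi, K)         # pi_1 = [0, 1/2, 0, 1/2]
pi = step(pi, K)         # pi_2 = [0, 1/8, 7/8, 0]
print(pi)
```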

slide-60
SLIDE 60

Remarkable Property of Markov Chains

What happens in the long run? I.e., can we say anything about πt for large t?

Suppose the Markov chain is “aperiodic”. Then, as the system evolves, the probabilistic state converges to a limiting probabilistic state: as t → ∞, for any π0 = [p1 p2 · · · pn],

  π0 · K^t → π .

slide-61
SLIDE 61

Remarkable Property of Markov Chains

In other words: πt → π as t → ∞.

This π is unique, and it is called the stationary/invariant distribution.

Note: π · K = π.

slide-62
SLIDE 62

Remarkable Property of Markov Chains

For the weather chain, the stationary distribution is [5/6 1/6]:

  [5/6 1/6] ·  0.9  0.1  = [5/6 1/6]
               0.5  0.5

In the long run, it is sunny 5/6 of the time and rainy 1/6 of the time.

slide-63
SLIDE 63

Remarkable Property of Markov Chains

How did I find the stationary distribution?

  0.9  0.1 ²  =  0.86  0.14
  0.5  0.5      0.70  0.30

  0.9  0.1 ⁴  =  0.8376  0.1624
  0.5  0.5      0.8120  0.1880

  0.9  0.1 ⁸  =  0.833443  0.166557
  0.5  0.5      0.832787  0.167213

  • Exercise: Why do the rows converge to π?
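The squaring trick on this slide is easy to replicate: repeatedly squaring K makes both of its rows collapse onto π (a sketch, not the lecture's code):

```python
# Weather transition matrix from the slide.
K = [[0.9, 0.1],
     [0.5, 0.5]]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P = K
for _ in range(6):       # compute K^2, K^4, ..., K^64 by repeated squaring
    P = matmul(P, P)

print(P[0], P[1])        # both rows are now [5/6, 1/6] up to rounding
```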

slide-64
SLIDE 64

Remarkable Property of Markov Chains

What is a “periodic” Markov chain? Consider the 2-state chain with 1 → 2 and 2 → 1, each with probability 1:

  π0 = [1 0], π1 = [0 1], π2 = [1 0], π3 = [0 1], . . .

There is still a stationary distribution:

  π = [1/2 1/2],  since  [1/2 1/2] ·  0  1  = [1/2 1/2]
                                      1  0

But it is not a limiting distribution. We needed the Markov chain to be “aperiodic”.
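The periodic example can be run directly: πt bounces between [1 0] and [0 1] forever, while [1/2 1/2] sits still (a sketch, not from the lecture):

```python
# The periodic 2-state chain: 1 -> 2 and 2 -> 1 with probability 1.
K = [[0, 1],
     [1, 0]]

def step(pi, K):
    n = len(K)
    return [sum(pi[i] * K[i][j] for i in range(n)) for j in range(n)]

pi = [1, 0]
history = []
for _ in range(4):
    pi = step(pi, K)
    history.append(pi)

print(history)           # [[0, 1], [1, 0], [0, 1], [1, 0]] -- no convergence

# The stationary distribution is unmoved by a step:
assert step([0.5, 0.5], K) == [0.5, 0.5]
```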

slide-65
SLIDE 65

Summary so far

Markov Chains can be characterized by the transition matrix K:

  K[i, j] = Pr[Xt = j | Xt−1 = i] = Pr[i → j in one step]

What is the probability of being in state i after t steps?  πt = π0 · K^t, so πt[i] = (π0 · K^t)[i].

For aperiodic Markov Chains there is a unique invariant distribution π with π = π · K, and πt → π as t → ∞.

slide-66
SLIDE 66

The plan

1. Motivating examples and applications
2. Basic mathematical representation and properties
3. Applications

slide-67
SLIDE 67

How are Markov Chains applied ?

2 common types of applications:

1. Build a Markov chain as a statistical model of a real-world process, and use the Markov chain to simulate the process.
   e.g. text generation, music composition.

2. Use a measure associated with a Markov chain to approximate a quantity of interest.
   e.g. Google PageRank, image segmentation.


slide-69
SLIDE 69

Automatic Text Generation

Generate a superficially real-looking text given a sample document. Idea: From the sample document, create a Markov chain. Use a random walk on the Markov chain to generate text. Example: Collect speeches of Obama, create a Markov chain. Use a random walk to generate new speeches.

slide-70
SLIDE 70

Automatic Text Generation

The Markov Chain:

  • 1. For each word in the document, create a node/state.
  • 2. Put an edge word1 ---> word2 if there is a sentence in which word2 comes after word1.
  • 3. Edge probabilities reflect the frequency of the pair of words.

Example: if “like a” occurs 3 times, “like the” 4 times, and “like to” 2 times, then the edges out of “like” are: like ---> a (3/9), like ---> the (4/9), like ---> to (2/9).
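The three-step construction above can be sketched in a few lines; the sample sentence and the dead-end check are mine, not from the lecture:

```python
import random
from collections import defaultdict

# Build the word-level Markov chain from a tiny sample text.
sample = "i like a dog i like the park i like to walk"
words = sample.split()

chain = defaultdict(list)
for w1, w2 in zip(words, words[1:]):
    chain[w1].append(w2)     # repeated pairs encode the edge probabilities

# Generate text = take a random walk on the chain.
random.seed(0)
state = "i"
generated = [state]
for _ in range(8):
    if not chain[state]:     # dead end: a word that never has a successor
        break
    state = random.choice(chain[state])
    generated.append(state)

print(" ".join(generated))
```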

slide-71
SLIDE 71

Automatic Text Generation

“I jumped up. I don't know what's going on so I am coming down with a road to opportunity. I believe we can agree on or do about the major challenges facing our country.”
slide-72
SLIDE 72

Automatic Text Generation

Another use: build one Markov chain based on speeches of Obama and another based on speeches of Bush. Given a new quote, we can predict whether it is by Obama or Bush (by testing which Markov model the quote fits best).

slide-73
SLIDE 73

Image Segmentation

Simple version: given an image that contains an object, figure out which pixels correspond to the object and which correspond to the background, i.e., label each pixel “object” or “background”.

(The user labels a small number of pixels with known labels.)

slide-74
SLIDE 74

Image Segmentation

The Markov Chain:

  • 1. Each pixel is a node/state (including the user-labeled “object” and “background” pixels).
  • 2. There is an edge between adjacent pixels.
  • 3. Edge probabilities reflect similarity between pixels.

For each unlabeled pixel, ask: which is more likely, that a random walker starting there first visits “background” or “object”?

slide-75
SLIDE 75

Image Segmentation

slide-76
SLIDE 76

Google PageRank

PageRank is a measure of reputation: the number and reputation of links pointing to you.

slide-77
SLIDE 77

Google PageRank

PageRank is a measure of reputation: the number and reputation of links pointing to you.

The Markov Chain:

  • 1. Every webpage is a node/state.
  • 2. Each hyperlink is an edge: if webpage A has a link to webpage B, then A ---> B.
  • 3a. If A has m outgoing edges, each gets label 1/m.
  • 3b. If A has no outgoing edges, put an edge A ---> B for every page B (jump to a random page).

slide-78
SLIDE 78

Google PageRank

PageRank of webpage A = the stationary probability of A. (Stationary distribution: the probability of being in state A in the long run.) A little tweak: the random surfer jumps to a uniformly random page with 15% probability.
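Rules 1-3b plus the 15% tweak give a concrete algorithm. Here is a power-iteration sketch on a hypothetical 4-page web (the link structure is made up for illustration; every page here happens to have outgoing links, so rule 3b never fires):

```python
# Toy web: page -> list of pages it links to.
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
n = 4
d = 0.85                 # follow a link with prob. d, teleport with prob. 0.15

rank = [1 / n] * n       # start from the uniform distribution
for _ in range(100):     # power iteration towards the stationary distribution
    new = [(1 - d) / n] * n
    for page, outs in links.items():
        for q in outs:
            new[q] += d * rank[page] / len(outs)
    rank = new

print([round(r, 3) for r in rank])   # page 2, with the most inlinks, ranks highest
```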

slide-79
SLIDE 79

Google PageRank

slide-80
SLIDE 80

Google PageRank

Google:

“PageRank continues to be the heart of our software.”

slide-81
SLIDE 81

How are Markov Chains applied ?

2 common types of applications:

1. Build a Markov chain as a statistical model of a real-world process, and use the Markov chain to simulate the process.
   e.g. text generation, music composition.

2. Use a measure associated with a Markov chain to approximate a quantity of interest.
   e.g. Google PageRank, image segmentation.

slide-82
SLIDE 82

The plan

Motivating examples and applications Basic mathematical representation and properties Applications