SLIDE 1

Markov Models and Hidden Markov Models

Robert Platt, Northeastern University. Some images and slides are used from:

  • 1. CS188, UC Berkeley
  • 2. Russell & Norvig, Artificial Intelligence: A Modern Approach (AIMA)
SLIDE 2

Markov Models

We have already seen that an MDP provides a useful framework for modeling stochastic control problems. Markov models are more general: they can model any kind of temporally dynamic system.

SLIDE 3

Probability again: Independence

Two random variables, X and Y, are independent when:

P(x, y) = P(x) P(y) for all x, y

Example: the outcomes of two different coin flips are usually independent of each other.

Image: Berkeley CS188 course notes (downloaded Summer 2015)

SLIDE 4

Probability again: Independence

If: P(x, y) = P(x) P(y)

Then: P(x | y) = P(x)

Why? P(x | y) = P(x, y) / P(y) = P(x) P(y) / P(y) = P(x)

SLIDE 5

Are T and W independent?

T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

Slide: Berkeley CS188 course notes (downloaded Summer 2015)

SLIDE 6

Are T and W independent?

P(T, W):

T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

P(T):

T     P
hot   0.5
cold  0.5

P(W):

W     P
sun   0.6
rain  0.4

Slide: Berkeley CS188 course notes (downloaded Summer 2015)

SLIDE 7

Are T and W independent?

P(T, W) (observed):

T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

P(T) P(W) (what the joint would be if T and W were independent):

T     W     P
hot   sun   0.3
hot   rain  0.2
cold  sun   0.3
cold  rain  0.2

P(T):

T     P
hot   0.5
cold  0.5

P(W):

W     P
sun   0.6
rain  0.4

Since 0.4 ≠ 0.3, P(T, W) ≠ P(T) P(W), so T and W are not independent.

Slide: Berkeley CS188 course notes (downloaded Summer 2015)

SLIDE 8

Conditional independence

Independence: P(x, y) = P(x) P(y)

Conditional independence: P(x, y | z) = P(x | z) P(y | z)

Equivalent statements of conditional independence: P(x | z, y) = P(x | z) and P(y | z, x) = P(y | z)

Slide: Berkeley CS188 course notes (downloaded Summer 2015)

SLIDE 9

Conditional independence: example

[diagram: Bayes net with cavity as the parent of toothache and catch]

P(toothache, catch | cavity) = P(toothache | cavity) P(catch | cavity)

Equivalently: P(toothache | cavity) = P(toothache | cavity, catch) and P(catch | cavity) = P(catch | cavity, toothache)
SLIDE 10

Conditional independence: example

Slide: Berkeley CS188 course notes (downloaded Summer 2015)

  • What about this domain:
  • Traffic
  • Umbrella
  • Raining
SLIDE 11

Conditional independence: example

Slide: Berkeley CS188 course notes (downloaded Summer 2015)

  • What about this domain:
  • Fire
  • Smoke
  • Alarm
SLIDE 12

Markov Processes

[diagram: a chain of nodes X1 → X2 → …; the arrows are transitions, from the state at time 1 to the state at time 2, and so on]

SLIDE 13

Markov Processes

Since this is a Markov process, we assume transitions are Markov:

Markov assumption: P(Xt | X1, …, Xt-1) = P(Xt | Xt-1)

Process model: P(Xt | Xt-1)

SLIDE 14-21

Markov Processes

How do we calculate the joint distribution P(X1, …, XT)? By the chain rule:

P(X1, …, XT) = P(X1) P(X2 | X1) P(X3 | X1, X2) … P(XT | X1, …, XT-1)

Can we simplify this expression? Yes. Under the Markov assumption, each factor P(Xt | X1, …, Xt-1) reduces to P(Xt | Xt-1). In general:

P(X1, …, XT) = P(X1) P(X2 | X1) P(X3 | X2) … P(XT | XT-1)

Each factor P(Xt | Xt-1) is the process model. (A code sketch of this product follows below.)
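The product structure is easy to compute directly. Here is a minimal Python sketch, using the rain/sun chain introduced on the next slide; the dict-of-dicts representation is an illustrative choice, not something from the slides:

```python
# Sketch: probability of a state sequence under a Markov chain.
# Assumed (illustrative) representation: init[x] = P(X1 = x),
# trans[x][x2] = P(Xt = x2 | Xt-1 = x).

init = {"sun": 1.0, "rain": 0.0}
trans = {
    "sun":  {"sun": 0.9, "rain": 0.1},
    "rain": {"sun": 0.3, "rain": 0.7},
}

def joint_probability(states, init, trans):
    """P(x1, ..., xT) = P(x1) * product over t of P(xt | xt-1)."""
    p = init[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= trans[prev][cur]
    return p

print(joint_probability(["sun", "sun", "rain"], init, trans))  # 1.0 * 0.9 * 0.1 = 0.09
```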

SLIDE 22

Markov Processes: example

Two new ways of representing the same CPT: a state-transition diagram and a table.

[diagram: two nodes, sun and rain; sun→sun 0.9, sun→rain 0.1, rain→sun 0.3, rain→rain 0.7]

  • States: X = {rain, sun}
  • Initial distribution: 1.0 sun
  • Process model P(Xt | Xt-1):

Xt-1  Xt    P(Xt|Xt-1)
sun   sun   0.9
sun   rain  0.1
rain  sun   0.3
rain  rain  0.7

Slide: Berkeley CS188 course notes (downloaded Summer 2015)

SLIDE 23

Simulating dynamics forward

Joint distribution: P(X1, …, XT) = P(X1) P(X2 | X1) … P(XT | XT-1)

But suppose we want to predict the state at time T, given a prior distribution at time 1? Then we need the marginal P(XT), obtained by pushing the distribution forward one step at a time:

P(Xt+1) = Σxt P(Xt+1 | xt) P(xt)

SLIDE 24

Markov Processes: example

  • Initial distribution: 1.0 sun
  • What is the probability distribution after one step?

P(X2 = sun) = P(sun | sun) P(X1 = sun) + P(sun | rain) P(X1 = rain) = 0.9 · 1.0 + 0.3 · 0.0 = 0.9

So after one step, P(X2) = ⟨0.9 sun, 0.1 rain⟩.

Slide: Berkeley CS188 course notes (downloaded Summer 2015)

SLIDE 25

Simulating dynamics forward

  • From initial observation of sun
  • From initial observation of rain
  • From yet another initial distribution P(X1):

[figure: bar charts of P(X1), P(X2), P(X3), P(X4), …, P(X∞) for each initial distribution]

Slide: Berkeley CS188 course notes (downloaded Summer 2015)

SLIDE 26

Simulating dynamics forward

  • From initial observation of sun
  • From initial observation of rain
  • From yet another initial distribution P(X1):

[figure: bar charts of P(X1), P(X2), P(X3), P(X4), …, P(X∞) for each initial distribution]

No matter which initial distribution we start from, P(Xt) converges to the same limiting distribution P(X∞). This is called the stationary distribution. (A code sketch follows below.)

Slide: Berkeley CS188 course notes (downloaded Summer 2015)
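A minimal sketch, assuming the rain/sun chain from the earlier example, that simulates the dynamics forward until the distribution stops changing; for this chain the fixed point is P(sun) = 0.75, P(rain) = 0.25:

```python
# Sketch: find the stationary distribution by forward simulation.
# trans[x][x2] = P(Xt = x2 | Xt-1 = x), as in the rain/sun example.

trans = {
    "sun":  {"sun": 0.9, "rain": 0.1},
    "rain": {"sun": 0.3, "rain": 0.7},
}

def step(p, trans):
    """One forward step: P(Xt+1 = x2) = sum over x1 of P(x2 | x1) P(Xt = x1)."""
    return {
        x2: sum(trans[x1][x2] * p[x1] for x1 in p)
        for x2 in trans
    }

p = {"sun": 1.0, "rain": 0.0}   # any initial distribution gives the same limit
for _ in range(1000):
    nxt = step(p, trans)
    if all(abs(nxt[x] - p[x]) < 1e-12 for x in p):
        break
    p = nxt

print(p)  # ~{'sun': 0.75, 'rain': 0.25}
```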

SLIDE 27

Hidden Markov Models (HMMs)

Hidden Markov Models: Markov models applied to estimation problems
  • speech to text
  • tracking in computer vision
  • robot localization

SLIDE 28

Hidden Markov Models (HMMs)

The state, Xt, is assumed to be unobserved. However, you get to make one observation, Et, on each timestep, called an “emission.”

SLIDE 29

Hidden Markov Models (HMMs)

Sensor Markov Assumption: the current observation depends only on the current state:

P(Et | X1, …, Xt, E1, …, Et-1) = P(Et | Xt)

SLIDE 30

HMM example

  • An HMM is defined by:
  • Initial distribution: P(X1)
  • Transitions: P(Xt | Xt-1)
  • Emissions: P(Et | Xt)

[diagram: Rain(t-1) → Rain(t) → Rain(t+1), with Umbrella(t-1), Umbrella(t), Umbrella(t+1) emitted from the corresponding Rain nodes]

Transition model P(Rt+1 | Rt):

Rt   Rt+1   P(Rt+1|Rt)
+r   +r     0.7
+r   -r     0.3
-r   +r     0.3
-r   -r     0.7

Emission model P(Ut | Rt):

Rt   Ut   P(Ut|Rt)
+r   +u   0.9
+r   -u   0.1
-r   +u   0.2
-r   -u   0.8

Slide: Berkeley CS188 course notes (downloaded Summer 2015)

SLIDE 31

Real world HMM applications

  • Speech recognition HMMs:
  • Observations are acoustic signals (continuous valued)
  • States are specific positions in specific words (so, tens of thousands)
  • Machine translation HMMs:
  • Observations are words (tens of thousands)
  • States are translation options
  • Robot tracking:
  • Observations are range readings (continuous)
  • States are positions on a map (continuous)

Slide: Berkeley CS188 course notes (downloaded Summer 2015)

SLIDE 32

HMM Filtering

  • Filtering, or monitoring, is the task of tracking the distribution Bt(X) = P(Xt | e1, …, et) (the belief state) over time
  • We start with B1(X) in an initial setting, usually uniform
  • As time passes, or we get observations, we update B(X)
  • The Kalman filter was invented in the 60’s and first implemented as a method of trajectory estimation for the Apollo program

Slide: Berkeley CS188 course notes (downloaded Summer 2015)

SLIDE 33-35

HMM Filtering

Given a prior distribution, P(X1), and a series of observations, e1, …, et, calculate the posterior distribution:

P(Xt | e1, …, et)

Two steps, applied at every timestep:

  • Process update
  • Observation update

These posterior distributions are called “beliefs,” B(Xt).

SLIDE 36

Process update

P(Xt+1 | e1, …, et) = Σxt P(Xt+1 | xt) P(xt | e1, …, et)

This is just forward simulation of the Markov model.

SLIDE 37

Process update: example

  • As time passes, uncertainty “accumulates”

[figure: ghost-position beliefs at T = 1, T = 2, and T = 5; transition model: ghosts usually go clockwise]

Slide: Berkeley CS188 course notes (downloaded Summer 2015)

SLIDE 38

Observation update

P(Xt | e1, …, et) = η P(et | Xt) P(Xt | e1, …, et-1)

where η is a normalization factor that makes the updated belief sum to 1. (A code sketch of both updates follows below.)
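Together, the process and observation updates give one step of HMM filtering (the forward algorithm). A minimal sketch, reusing the illustrative dict-of-dicts representation from earlier; none of these names come from the slides:

```python
# Sketch: one step of HMM filtering.
# trans[x][x2] = P(Xt+1 = x2 | Xt = x); emit[x][e] = P(e | Xt = x).

def process_update(belief, trans):
    """P(Xt+1 | e1..et) = sum over x of P(Xt+1 | x) P(x | e1..et)."""
    return {
        x2: sum(trans[x1][x2] * belief[x1] for x1 in belief)
        for x2 in trans
    }

def observation_update(belief, emit, e):
    """P(Xt | e1..et) = eta * P(et | Xt) * P(Xt | e1..et-1)."""
    unnorm = {x: emit[x][e] * belief[x] for x in belief}
    eta = 1.0 / sum(unnorm.values())   # normalization factor
    return {x: eta * p for x, p in unnorm.items()}

def filter_step(belief, trans, emit, e):
    """Process update, then observation update, for one timestep."""
    return observation_update(process_update(belief, trans), emit, e)
```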

SLIDE 39

Observation update

  • As we get observations, beliefs get reweighted; uncertainty “decreases”

[figure: belief grid before and after an observation]

Slide: Berkeley CS188 course notes (downloaded Summer 2015)

SLIDE 40

Robot localization example

Observation model: the robot can read in which directions there is a wall, never with more than 1 mistaken reading. Process model: the robot may fail to execute an action with small probability.

[figure: map of candidate robot positions, shaded by probability]

Slide: Berkeley CS188 course notes (downloaded Summer 2015)
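A sketch of what this observation model's likelihood might look like. The per-direction error rate eps and the dict representation are illustrative assumptions, not from the slides:

```python
# Sketch: likelihood of a wall-sensor reading given the true wall layout.
# reading and truth are dicts like {"N": True, "S": False, ...};
# eps is an assumed probability of misreading one direction.

def reading_likelihood(reading, truth, eps=0.1):
    mistakes = sum(reading[d] != truth[d] for d in truth)
    if mistakes > 1:
        return 0.0   # "never more than 1 mistake"
    return (1 - eps) ** (len(truth) - mistakes) * eps ** mistakes
```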

SLIDE 41

Robot localization example

Lighter grey: it was possible to get this reading from these cells, but it is less likely because it requires 1 mistake.

[figure: map of candidate robot positions, shaded by probability]

Slide: Berkeley CS188 course notes (downloaded Summer 2015)

SLIDE 42-45

Robot localization example

[figures: the belief distribution after each successive observation, progressively concentrating on the robot's true position]

Slide: Berkeley CS188 course notes (downloaded Summer 2015)

SLIDE 46-50

Weather HMM example

[diagram: Rain0 → Rain1 → Rain2, with Umbrella1 and Umbrella2 emitted from Rain1 and Rain2]

Transition model P(Rt+1 | Rt):

Rt   Rt+1   P(Rt+1|Rt)
+r   +r     0.7
+r   -r     0.3
-r   +r     0.3
-r   -r     0.7

Emission model P(Ut | Rt):

Rt   Ut   P(Ut|Rt)
+r   +u   0.9
+r   -u   0.1
-r   +u   0.2
-r   -u   0.8

Filtering, starting from a uniform prior and observing an umbrella on each of the first two days:

  • Prior: B(+r) = 0.5, B(-r) = 0.5
  • Process update: B’(+r) = 0.5, B’(-r) = 0.5
  • Observation update (u1 = +u): B(+r) = 0.818, B(-r) = 0.182
  • Process update: B’(+r) = 0.627, B’(-r) = 0.373
  • Observation update (u2 = +u): B(+r) = 0.883, B(-r) = 0.117

(A code sketch reproducing these numbers follows below.)

Slide: Berkeley CS188 course notes (downloaded Summer 2015)
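A sketch reproducing these numbers with the filter_step function from the earlier sketch (same illustrative dict representation):

```python
trans = {"+r": {"+r": 0.7, "-r": 0.3},
         "-r": {"+r": 0.3, "-r": 0.7}}
emit  = {"+r": {"+u": 0.9, "-u": 0.1},
         "-r": {"+u": 0.2, "-u": 0.8}}

belief = {"+r": 0.5, "-r": 0.5}    # uniform prior over Rain0
for e in ["+u", "+u"]:             # umbrella observed on days 1 and 2
    belief = filter_step(belief, trans, emit, e)
    print({x: round(p, 3) for x, p in belief.items()})
# {'+r': 0.818, '-r': 0.182}
# {'+r': 0.883, '-r': 0.117}
```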

SLIDE 51

Particle Filtering

Slide: Berkeley CS188 course notes (downloaded Summer 2015)

SLIDE 52

Representation: Particles

  • Our representation of P(X) is now a list of N particles (samples)
  • Generally, N << |X|
  • Storing a map from X to counts would defeat the point
  • P(x) is approximated by the number of particles with value x
  • So, many x may have P(x) = 0!
  • More particles, more accuracy
  • For now, all particles have a weight of 1

Particles: (3,3) (2,3) (3,3) (3,2) (3,3) (3,2) (1,2) (3,3) (3,3) (2,3)

Slide: Berkeley CS188 course notes (downloaded Summer 2015)

SLIDE 53

Particle Filtering: Elapse Time

  • Each particle is moved by sampling its next position from the transition model
  • This is like prior sampling – samples’ frequencies reflect the transition probabilities
  • Here, most samples move clockwise, but some move in another direction or stay in place
  • This captures the passage of time
  • If enough samples, close to exact values before and after (consistent)

Particles (before): (3,3) (2,3) (3,3) (3,2) (3,3) (3,2) (1,2) (3,3) (3,3) (2,3)
Particles (after): (3,2) (2,3) (3,2) (3,1) (3,3) (3,2) (1,3) (2,3) (3,2) (2,2)

Slide: Berkeley CS188 course notes (downloaded Summer 2015)

SLIDE 54
Particle Filtering: Observe

  • Slightly trickier:
  • Don’t sample the observation, fix it
  • Similar to likelihood weighting, downweight samples based on the evidence
  • As before, the probabilities don’t sum to one, since all have been downweighted (in fact they now sum to N times an approximation of P(e))

Particles (before): (3,2) (2,3) (3,2) (3,1) (3,3) (3,2) (1,3) (2,3) (3,2) (2,2)
Particles (weighted): (3,2) w=.9 (2,3) w=.2 (3,2) w=.9 (3,1) w=.4 (3,3) w=.4 (3,2) w=.9 (1,3) w=.1 (2,3) w=.2 (3,2) w=.9 (2,2) w=.4

Slide: Berkeley CS188 course notes (downloaded Summer 2015)

SLIDE 55

Particle Filtering: Resample

  • Rather than tracking weighted samples, we resample
  • N times, we choose from our weighted sample distribution (i.e. draw with replacement)
  • This is equivalent to renormalizing the distribution
  • Now the update is complete for this time step; continue with the next one

Particles (weighted): (3,2) w=.9 (2,3) w=.2 (3,2) w=.9 (3,1) w=.4 (3,3) w=.4 (3,2) w=.9 (1,3) w=.1 (2,3) w=.2 (3,2) w=.9 (2,2) w=.4
(New) Particles: (3,2) (2,2) (3,2) (2,3) (3,3) (3,2) (1,3) (2,3) (3,2) (3,2)

Slide: Berkeley CS188 course notes (downloaded Summer 2015)

SLIDE 56

Recap: Particle Filtering

  • Particles: track samples of states rather than an explicit distribution (a code sketch follows below)

Elapse → Weight → Resample:

Particles: (3,3) (2,3) (3,3) (3,2) (3,3) (3,2) (1,2) (3,3) (3,3) (2,3)
Elapsed: (3,2) (2,3) (3,2) (3,1) (3,3) (3,2) (1,3) (2,3) (3,2) (2,2)
Weighted: (3,2) w=.9 (2,3) w=.2 (3,2) w=.9 (3,1) w=.4 (3,3) w=.4 (3,2) w=.9 (1,3) w=.1 (2,3) w=.2 (3,2) w=.9 (2,2) w=.4
(New) Particles: (3,2) (2,2) (3,2) (2,3) (3,3) (3,2) (1,3) (2,3) (3,2) (3,2)

Slide: Berkeley CS188 course notes (downloaded Summer 2015)
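A minimal sketch of this elapse/weight/resample loop, reusing the dict-based transition and emission tables from earlier; make_sampler and the representation are illustrative assumptions, not from the slides:

```python
import random

def make_sampler(trans):
    """Turn a transition table into a function that samples Xt+1 given Xt."""
    def sample(x):
        successors, probs = zip(*trans[x].items())
        return random.choices(successors, weights=probs, k=1)[0]
    return sample

def particle_filter_step(particles, sample_next, emit, e):
    """One elapse/weight/resample cycle over a list of particle states."""
    # Elapse time: move each particle by sampling the transition model.
    particles = [sample_next(x) for x in particles]
    # Observe: weight each particle by the evidence likelihood P(e | x).
    weights = [emit[x][e] for x in particles]
    # Resample: draw N new particles with replacement, proportional to weight.
    return random.choices(particles, weights=weights, k=len(particles))
```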

SLIDE 57

Robot Localization

  • In robot localization:
  • We know the map, but not the robot’s position
  • Observations may be vectors of range finder readings
  • State space and readings are typically continuous (works basically like a very fine grid), so we cannot store B(X)
  • Particle filtering is a main technique

Slide: Berkeley CS188 course notes (downloaded Summer 2015)

SLIDE 58

Particle Filter Localization (Sonar)

SLIDE 59

Particle Filter Localization (Laser)

SLIDE 60

Dynamic Bayes Nets

SLIDE 61

Dynamic Bayes Nets (DBNs)

  • We want to track multiple variables over time, using multiple sources of evidence
  • Idea: Repeat a fixed Bayes net structure at each time
  • Variables from time t can condition on those from t-1
  • Dynamic Bayes nets are a generalization of HMMs

[diagram: DBN unrolled for t = 1, 2, 3 — at each timestep, hidden variables Ga and Gb with emissions Ea and Eb; each timestep's G variables condition on the previous timestep's G variables]

SLIDE 62

DBN Particle Filters

  • A particle is a complete sample for a time step
  • Initialize: Generate prior samples for the t=1 Bayes net
  • Example particle: G1a = (3,3), G1b = (5,3)
  • Elapse time: Sample a successor for each particle
  • Example successor: G2a = (2,3), G2b = (6,3)
  • Observe: Weight each entire sample by the likelihood of the evidence conditioned on the sample
  • Likelihood: P(E1a | G1a) * P(E1b | G1b)
  • Resample: Select prior samples (tuples of values) in proportion to their likelihood

(A code sketch of these steps follows below.)
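A sketch of these four steps for the two-variable example, where each particle is a tuple of positions; sample_successor and likelihood are hypothetical helpers standing in for the DBN's transition and emission models:

```python
import random

def dbn_particle_filter_step(particles, sample_successor, likelihood, evidence):
    """particles: list of complete samples, e.g. (Ga, Gb) position tuples."""
    # Elapse time: sample a successor for each particle from the DBN.
    particles = [sample_successor(p) for p in particles]
    # Observe: weight each entire sample by the evidence likelihood,
    # e.g. P(Ea | ga) * P(Eb | gb).
    weights = [likelihood(p, evidence) for p in particles]
    # Resample: select samples in proportion to their likelihood.
    return random.choices(particles, weights=weights, k=len(particles))
```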