CSE 473: Artificial Intelligence Hidden Markov Models

Daniel Weld University of Washington

[Many of these slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]

Agent vs. Environment

§ An agent is an entity that perceives and acts.
§ A rational agent selects actions that maximize its utility function.

[Figure: agent and environment. The agent receives percepts through its sensors and acts on the environment through its actuators.]

Deterministic vs. stochastic
Fully observable vs. partially observable

It’s Hard!

Deterministic vs. stochastic
Fully observable vs. partially observable

Partial Observability in Pacman

§ A ghost is in the grid somewhere, but Pacman can’t see it!
§ Sensor readings tell how close a square is to the ghost
  § On the ghost: red
  § 1 or 2 away: orange
  § 2 or 3 away: yellow
  § 4+ away: green
§ Sensors are noisy, but we know P(Color | Distance). For example, at distance 3:

P(red | 3)   P(orange | 3)   P(yellow | 3)   P(green | 3)
0.05         0.15            0.5             0.3

Etc.

Pacman Maintains a Belief about Ghost Locations

Belief = a probability distribution over possible locations

Visualized here as color density (each ghost has its own color). Four ghosts are shown here (not to be confused with the colors from sensor readings).

Video of Demo Pacman – Sonar (with beliefs)


PROBABILITY REVIEW

Random Variables

§ A random variable is some aspect of the world about which we (may) have uncertainty
  § R = Is it raining?
  § T = Is it hot or cold?
  § D = How long will it take to drive to work?
  § L = Where is the ghost?
§ We denote random variables with capital letters
§ Random variables have domains (possible outcomes)
  § T in {hot, cold}
  § D in [0, ∞)
  § L in possible locations, maybe {(0,0), (0,1), …}

Joint Distributions

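§ A joint distribution over a set of random variables X_1, …, X_n specifies a probability for each assignment (or outcome):

  $P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n)$, abbreviated $P(x_1, x_2, \ldots, x_n)$

§ Must obey:

  $P(x_1, \ldots, x_n) \ge 0$ and $\sum_{x_1, \ldots, x_n} P(x_1, \ldots, x_n) = 1$

§ Number of parameters to specify the joint distribution if there are n variables, each with |domain| = d?
  § $d^n - 1$ (the last entry is fixed because the table must sum to 1)
§ For all but the smallest distributions, impractical to write out!

T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3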

Marginal Distributions

§ Marginal distributions are sub-tables which eliminate variables
§ Marginalization (summing out): combine collapsed rows by adding

  $P(t) = \sum_{w} P(t, w)$ and $P(w) = \sum_{t} P(t, w)$

Joint P(T, W):
T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

Marginal P(T):
T     P
hot   0.5
cold  0.5

Marginal P(W):
W     P
sun   0.6
rain  0.4
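As a rough illustration of summing out, the sketch below recomputes both marginals from the T/W joint table. The function name `marginal` and the dictionary layout are assumptions made for this example, not part of the original slides.

```python
from collections import defaultdict

# Joint distribution P(T, W), copied from the table on the slide.
joint = {
    ("hot", "sun"): 0.4,
    ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2,
    ("cold", "rain"): 0.3,
}


def marginal(joint, axis):
    """Sum out every variable except the one at position `axis`."""
    result = defaultdict(float)
    for assignment, p in joint.items():
        result[assignment[axis]] += p
    return dict(result)


p_t = marginal(joint, 0)  # P(T): rows with the same T value are added
p_w = marginal(joint, 1)  # P(W): rows with the same W value are added
print(p_t)
print(p_w)
```

Each marginal is a smaller table that still sums to one, which is a quick sanity check on any hand computation.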

Conditional Probabilities

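§ A simple relation between joint and marginal probabilities:

  $P(a \mid b) = \dfrac{P(a, b)}{P(b)}$

§ In fact, this is taken as the definition of a conditional probability
§ For example, with the table below, P(W = sun | T = cold) = 0.2 / 0.5 = 0.4

[Figure: overlapping regions P(a) and P(b), with the overlap labeled P(a,b)]

T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3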

Bayes Rule

$P(a \mid b) = \dfrac{P(b \mid a)\,P(a)}{P(b)}$

P(sneezing | coronavirus) = 0.75
P(sneezing) = 0.10
P(coronavirus) = 0.000001
P(coronavirus | sneezing) = ? 0.75 × 0.000001 / 0.10 = 0.0000075
P(coronavirus | sneezing, in-seattle) = ?
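The sneezing example is a direct application of Bayes rule, and a few lines of Python reproduce the arithmetic. The helper name `bayes` is just an illustrative choice.

```python
def bayes(p_b_given_a, p_a, p_b):
    """Return P(a | b) = P(b | a) * P(a) / P(b)."""
    return p_b_given_a * p_a / p_b


# Numbers from the slide: a = coronavirus, b = sneezing.
posterior = bayes(p_b_given_a=0.75, p_a=0.000001, p_b=0.10)
print(f"P(coronavirus | sneezing) = {posterior:.7f}")
```

The posterior is still tiny because the prior P(coronavirus) is so small. The second question on the slide (conditioning on in-seattle as well) cannot be answered from these three numbers alone.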

Probability Recap

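§ Conditional probability: $P(x \mid y) = \dfrac{P(x, y)}{P(y)}$
§ Product rule: $P(x, y) = P(x \mid y)\,P(y)$
§ Chain rule: $P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid x_1, \ldots, x_{i-1})$
§ Bayes rule: $P(x \mid y) = \dfrac{P(y \mid x)\,P(x)}{P(y)}$
§ X, Y independent if and only if: $\forall x, y:\ P(x, y) = P(x)\,P(y)$
§ X and Y are conditionally independent given Z if and only if: $\forall x, y, z:\ P(x, y \mid z) = P(x \mid z)\,P(y \mid z)$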

Conditional Independence

S = Smokes cigarettes
D = Early death
C = Has (or will have) lung cancer

S and D are not independent: $S \not\perp D$
S and D are conditionally independent given C: $S \perp D \mid C$

For all s, d, c: P(s, d | c) = P(s | c) * P(d | c)

Probabilistic Inference

§ Probabilistic inference = “compute a desired probability from other known probabilities (e.g. conditional from joint)”
§ We generally compute conditional probabilities
  § P(on time | no reported accidents) = 0.90
  § These represent the agent’s beliefs given the evidence
§ Probabilities change with new evidence:
  § P(on time | no accidents, 5 a.m.) = 0.95
  § P(on time | no accidents, 5 a.m., raining) = 0.80
  § Observing new evidence causes beliefs to be updated

Outline

§ Hidden Markov Models (HMMs)
  § A way to represent a class of probability distributions
§ Task of Filtering (aka Monitoring)
  § HMM Forward Algorithm for Filtering
  § HMM Particle Filter Representation & Algorithm for Filtering
§ Dynamic Bayes Nets
  § A generalization & improvement on HMMs

Filtering as “Probabilistic Inference”

Stream of observations (evidence) at successive times: e1, e2, …
Important inference question: P(Xt | e1, e2, …, et)

Deterministic vs. stochastic
Fully observable vs. partially observable

Hidden Markov Models

Cool representation for uncertain, sequential data
§ E.g., ghost locations over time in Pacman
§ E.g., characters on a line in OCR
§ E.g., words over time in speech recognition

Hidden Markov Models

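Defines a joint probability distribution:

$P(X_1, \ldots, X_N, E_1, \ldots, E_N) = P(X_1)\,P(E_1 \mid X_1) \prod_{t=2}^{N} P(X_t \mid X_{t-1})\,P(E_t \mid X_t)$

[Figure: chain of hidden states X1 → X2 → … → XN, where each state Xt emits an observation Et]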

Hidden Markov Model: Example

§ An HMM is defined by:
  § Initial distribution: P(X1)
  § Transitions: P(Xt | Xt-1)
  § Observations: P(Et | Xt) (aka “evidence,” “emissions”)

P(R1)
0.6

Rt-1   P(Rt | Rt-1)
t      0.7
f      0.1

Rt     P(Ut | Rt)
t      0.9
f      0.2

HMMs have stationary transition dynamics and a stationary observation model
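To make the three ingredients of an HMM concrete, here is a minimal sketch that multiplies them together to score one full assignment of the umbrella model. It assumes each table entry is the probability of "true"; the function names are hypothetical and chosen only for this illustration.

```python
from itertools import product

# Umbrella HMM parameters from the slide (probability of "true" in each case).
P_R1 = 0.6                            # P(R1 = true)
P_RAIN = {True: 0.7, False: 0.1}      # P(Rt = true | Rt-1)
P_UMBRELLA = {True: 0.9, False: 0.2}  # P(Ut = true | Rt)


def bernoulli(p_true, value):
    """Probability of `value` for a binary variable with P(true) = p_true."""
    return p_true if value else 1.0 - p_true


def joint_probability(rain, umbrella):
    """P(r_1..r_N, u_1..u_N) = P(r_1) P(u_1|r_1) * prod_t P(r_t|r_t-1) P(u_t|r_t)."""
    prob = bernoulli(P_R1, rain[0]) * bernoulli(P_UMBRELLA[rain[0]], umbrella[0])
    for t in range(1, len(rain)):
        prob *= bernoulli(P_RAIN[rain[t - 1]], rain[t])
        prob *= bernoulli(P_UMBRELLA[rain[t]], umbrella[t])
    return prob


# Rain on both days, umbrella seen only on day 1.
p = joint_probability([True, True], [True, False])
print(p)

# Sanity check: the joint over all 2-day assignments should sum to 1.
total = sum(
    joint_probability([r1, r2], [u1, u2])
    for r1, r2, u1, u2 in product([True, False], repeat=4)
)
print(total)
```

Five numbers are enough to assign a probability to every sequence of any length, which is the point of the factored representation.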

Remember: Joint Distributions

§ A joint distribution over a set of random variables X_1, …, X_n specifies a probability for each assignment (or outcome):

  $P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n)$

§ Must obey:

  $P(x_1, \ldots, x_n) \ge 0$ and $\sum_{x_1, \ldots, x_n} P(x_1, \ldots, x_n) = 1$

§ Size of the joint distribution if there are n variables, each with domain size d?
  § $d^n$
§ For all but the smallest distributions, impractical to write out!

T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

HMM Joint Distribution for T=100

X1  E1  X2  E2  X3  E3  …  X100  E100   P
T   T   T   T   T   T   …  T     T      0.01
T   T   T   T   T   T   …  T     F      0.007
…
F   F   F   F   F   F   …  F     F      0.026

How many parameters? With 200 binary variables, the full joint table has 2^200 rows, so it needs 2^200 - 1 independent numbers.

Umbrella HMM Example: 5 Parameters

§ An HMM is defined by:
  § Initial distribution: P(X1)
  § Transitions: P(Xt | Xt-1)
  § Observations: P(Et | Xt) (aka “evidence,” “emissions”)

P(R1)
0.6

Rt-1   P(Rt | Rt-1)
t      0.7
f      0.1

Rt     P(Ut | Rt)
t      0.9
f      0.2

HMMs have stationary transition dynamics and a stationary observation model

Conditional Independence

HMMs have two important independence properties:

§ Future independent of past given the present: $X_{t-1} \perp X_{t+1} \mid X_t$

For all xt-1, xt, xt+1: P(xt-1, xt+1 | xt) = P(xt-1 | xt) * P(xt+1 | xt)

Conditional Independence

HMMs have two important independence properties:

§ Future independent of past given the present
§ Current observation independent of all else given current state: $E_t \perp \text{(all other variables)} \mid X_t$

For example, $E_t \perp X_{t-1} \mid X_t$

Conditional Independence

§ HMMs have two important independence properties:
  § Markov hidden process: the future depends on the past via the present
  § Current observation independent of all else given current state
§ Quiz: does this mean that observations are independent given no evidence?
  § [No, they are correlated by the hidden state, e.g., X2 and X3]

HMM Computations

Given
§ parameters
§ evidence E1:n = e1:n

Inference problems include:
§ Filtering: find P(Xt | e1:t) for some t
§ Most probable explanation: for some t, find x*1:t = argmax_{x1:t} P(x1:t | e1:t)
§ Smoothing: find P(Xt | e1:n) for some t < n

Filtering (aka Monitoring)

§ The task of tracking the agent’s belief state, B(X), over time
  § B(X) = distribution over world states (outcomes); represents agent knowledge
  § We start with B(X) in an initial setting, usually uniform
  § As time passes, or we get evidence/observations, we update B(X)
§ Many algorithms for this:
  § Exact probabilistic inference
  § Particle filter approximation
  § Kalman filter (a method for handling continuous, real-valued random variables)
    § Invented in the 1960s for the Apollo program; real-valued state, Gaussian noise

Example of HMM Filtering

Robot tracking:

§ States (X) are positions on a map (continuous)
§ Observations (E) are range readings (continuous)

Example: Robot Localization

Sensor model: never more than 1 mistake
Motion model: may not execute action with small prob.

Green signal = obstacle detected
Red signal = no obstacle detected
At most one error!

[Figure sequence: belief over the robot’s position at t = 1, 2, 3, 4, 5; shading shows probability on a scale up to 1]

Example from Michael Pfeiffer

Other Real HMM Examples

§ Speech recognition HMMs:
  § States are specific positions in specific words (so, tens of thousands)
  § Observations are acoustic signals (continuous valued)

Other Real HMM Examples

§ Machine translation HMMs:
  § States are translation options
  § Observations are words (tens of thousands)

Ghostbusters HMM

§ X = ghost location: x11, …, x33
§ Ignore Pacman location for now; suppose it is at the lower left, x11

x13  x23  x33
x12  x22  x32
x11  x21  x31

§ How to specify the HMM?
  § P(Xinitial) = uniform

    P(X1):
    1/9  1/9  1/9
    1/9  1/9  1/9
    1/9  1/9  1/9

  § P(Xt+1 | Xt) = a big 9 x 9 table, e.g., P(Xt+1 = x33 | Xt = x33), …, P(Xt+1 = x11 | Xt = x11)
    § For example, P(Xt+1 | Xt = x23) has the entries 1/6, 1/6, 1/6, 1/2, etc.
  § P(Et | Xt) = an even bigger table: 4 sonar colors x 9 ghost positions x 9 Pacman positions

Ghostbusters HMM

§ P(X1) = uniform
§ P(X’|X) = ghosts usually move clockwise, but sometimes move in a random direction or stay put
§ P(E|X) = same sensor model as before: red means probably close, green means likely far away

[Figure: P(X1) assigns 1/9 to each of the 9 squares; P(X’ | X = x23) assigns 1/2 to one square and 1/6 to each of three others]

P(E|X), e.g., at distance 3:

P(red | 3)   P(orange | 3)   P(yellow | 3)   P(green | 3)
0.05         0.15            0.5             0.3

This is part of a schema describing the full 4 x 9 x 9 table. Etc.

Filtering (aka Monitoring)

§ Filtering, or monitoring, is the task of tracking the distribution B(X) (called “the belief state”) over time
§ We start with B0(X) in an initial setting, usually uniform
§ We update B(Xt), computing B(Xt+1):
  1. As time passes, using the probabilistic model of how ghosts move, and
  2. As we get observations, using the probabilistic model of how noisy sensors work

Defn: B(Xt) ≡ P(Xt | e1:t)

Filtering: Base Cases

[Figure: the two base cases. “Observation”: a state X1 with its evidence E1. “Passage of Time”: a state X1 followed by X2.]

Forward Algorithm

Defns:
B(Xt) = P(Xt | e1:t)
B’(Xt+1) = P(Xt+1 | e1:t)

§ t = 0
§ B(Xt) = initial distribution
§ Repeat forever
  § B’(Xt+1) = Simulate passage of time from B(Xt)
  § Observe et+1
  § B(Xt+1) = Update B’(Xt+1) based on probability of observing et+1

Passage of Time

§ Assume we have current belief P(X | evidence to date): $B(X_t) = P(X_t \mid e_{1:t})$
§ Then, after one time step passes:

  $P(X_{t+1} \mid e_{1:t}) = \sum_{x_t} P(X_{t+1}, x_t \mid e_{1:t})$   (marginalizing)
  $= \sum_{x_t} P(X_{t+1} \mid x_t, e_{1:t})\,P(x_t \mid e_{1:t})$   (defn of conditional probability)
  $= \sum_{x_t} P(X_{t+1} \mid x_t)\,P(x_t \mid e_{1:t})$   (conditional independence)

§ Basic idea: beliefs get “pushed” through the transitions
  § With the “B” notation, we have to be careful about what time step t the belief is about, and what evidence it includes

Passage of Time

§ Assume we have current belief P(X | evidence to date): $B(X_t) = P(X_t \mid e_{1:t})$
§ Then, after one time step passes:

  $P(X_{t+1} \mid e_{1:t}) = \sum_{x_t} P(X_{t+1} \mid x_t)\,P(x_t \mid e_{1:t})$

§ Or compactly:

  $B'(X_{t+1}) = \sum_{x_t} P(X' \mid x_t)\,B(x_t)$

§ Basic idea: beliefs get “pushed” through the transitions
  § With the “B” notation, we have to be careful about what time step t the belief is about, and what evidence it includes
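The time update is just a weighted sum over the previous belief. The sketch below shows one way it might be written; the two-state chain and its numbers are borrowed from the weather quiz later in these slides, and the nested-dictionary layout `transition[cur][nxt]` is an assumption of this example.

```python
def elapse_time(belief, transition):
    """Time update: B'(x') = sum over x of P(x' | x) * B(x)."""
    states = list(belief)
    return {
        nxt: sum(transition[cur][nxt] * belief[cur] for cur in states)
        for nxt in states
    }


# transition[cur][nxt] = P(X_{t+1} = nxt | X_t = cur)
transition = {
    "rain": {"rain": 0.8, "sun": 0.2},
    "sun": {"rain": 0.6, "sun": 0.4},
}
belief = {"rain": 0.5, "sun": 0.5}

predicted = elapse_time(belief, transition)
print(predicted)
```

No normalization is needed here: if the belief sums to one and every row of the transition table sums to one, the predicted belief sums to one as well.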

Example: Passage of Time (with no observations!)

§ As time passes, uncertainty “accumulates”

[Figure: belief grids at T = 1, T = 2, and T = 5]

(Transition model: ghosts usually go clockwise)

Reminder: Complete Forward Algorithm

Defns:
B(Xt) = P(Xt | e1:t)
B’(Xt+1) = P(Xt+1 | e1:t)

§ t = 0
§ B(Xt) = initial distribution
§ Repeat forever
  § B’(Xt+1) = Simulate passage of time from B(Xt)
  § Observe et+1
  § B(Xt+1) = Update B’(Xt+1) based on probability of et+1

Observation

§ Assume we have current belief P(X | previous evidence):

  $B'(X_{t+1}) = P(X_{t+1} \mid e_{1:t})$   (B’ = belief at t+1 before observation)

§ Then, after new evidence comes in:

  $P(X_{t+1} \mid e_{1:t+1}) = P(X_{t+1}, e_{t+1} \mid e_{1:t}) / P(e_{t+1} \mid e_{1:t})$   (defn of conditional probability)
  $= P(e_{t+1} \mid e_{1:t}, X_{t+1})\,P(X_{t+1} \mid e_{1:t}) / P(e_{t+1} \mid e_{1:t})$   (defn of conditional probability)
  $= P(e_{t+1} \mid X_{t+1})\,P(X_{t+1} \mid e_{1:t}) / P(e_{t+1} \mid e_{1:t})$   (independence)

§ Or, compactly:

  $B(X_{t+1}) \propto_{X_{t+1}} P(e_{t+1} \mid X_{t+1})\,B'(X_{t+1})$   (B = belief at t+1 after observation)

Observation

§ Assume we have current belief P(X | previous evidence):

  $B'(X_{t+1}) = P(X_{t+1} \mid e_{1:t})$

§ Then, after evidence comes in:

  $P(X_{t+1} \mid e_{1:t+1}) = P(e_{t+1} \mid X_{t+1})\,P(X_{t+1} \mid e_{1:t}) / P(e_{t+1} \mid e_{1:t})$

§ Or, compactly:

  $B(X_{t+1}) \propto_{X_{t+1}} P(e_{t+1} \mid X_{t+1})\,B'(X_{t+1})$

§ Basic idea: beliefs “reweighted” by likelihood of evidence/observation
§ Unlike passage of time, we have to normalize

Normalization to Account for Evidence

Suppose we observe that students have umbrellas (evidence = U, where U means students have umbrellas).

Joint P(X, E):
X     E      P
rain  U      0.4
rain  not U  0.1
sun   U      0.2
sun   not U  0.3

SELECT the joint probabilities matching the evidence (U in this case):
X     E   P
rain  U   0.4
sun   U   0.2

NORMALIZE the selection (make it sum to one):
X     P
rain  0.67
sun   0.33

Since we could have seen other evidence, we normalize by dividing by the probability of the evidence we did see (in this case, dividing by 0.4 + 0.2 = 0.6).
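The select-then-normalize recipe can be sketched in a few lines. The label "not U" for the no-umbrella rows and the helper name `condition_on` are assumptions made for this illustration.

```python
# Joint distribution P(X, E); "not U" stands for the rows without umbrellas.
joint = {
    ("rain", "U"): 0.4,
    ("rain", "not U"): 0.1,
    ("sun", "U"): 0.2,
    ("sun", "not U"): 0.3,
}


def condition_on(joint, evidence):
    """Return (P(X | evidence), P(evidence)) via select and normalize."""
    # SELECT: keep only the rows consistent with the evidence.
    selected = {x: p for (x, e), p in joint.items() if e == evidence}
    # NORMALIZE: divide by the total, which equals P(evidence).
    z = sum(selected.values())
    return {x: p / z for x, p in selected.items()}, z


posterior, p_evidence = condition_on(joint, "U")
print(posterior, p_evidence)
```

The normalizer is a by-product worth keeping: it is the probability of the evidence itself, 0.6 in this case.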

Example: Observation

§ As we get observations, beliefs get reweighted, uncertainty “decreases”

[Figure: belief grid before observation and after observation]

Summary: Forward Algorithm

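Every time step, we start with current P(Xt-1 | evidence)

1. We update for time: $P(x_t \mid e_{1:t-1}) = \sum_{x_{t-1}} P(x_{t-1} \mid e_{1:t-1})\,P(x_t \mid x_{t-1})$
2. We update for evidence: $P(x_t \mid e_{1:t}) \propto_{X} P(x_t \mid e_{1:t-1})\,P(e_t \mid x_t)$

Initialize with P(X0), from the spec of the HMM.

The forward algorithm does both at once. (Optimization: don’t need to normalize until the final time.)

Complexity? O(X^2 + XE) time per time step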

Quiz: Weather HMM

Rain0 → Rain1 → Rain2, with evidence Umbr1 = T and Umbr2 = T

P(R0)
0.5

Rt-1   P(Rt | Rt-1)
t      0.8
f      0.6

Rt     P(Ut | Rt)
t      0.9
f      0.3

What is the probability of Rain2 if we’ve seen umbrellas on both days?

$B'(X_{t+1}) = \sum_{x_t} P(X' \mid x_t)\,B(x_t)$
$B(X_{t+1}) \propto_{X_{t+1}} P(e_{t+1} \mid X_{t+1})\,B'(X_{t+1})$

Example: Weather HMM

Rain0 → Rain1 → Rain2, with evidence Umbr1 = T and Umbr2 = T

P(R0)
0.5

Rt-1   P(Rt | Rt-1)
t      0.8
f      0.6

Rt     P(Ut | Rt)
t      0.9
f      0.3

$B'(X_{t+1}) = \sum_{x_t} P(X' \mid x_t)\,B(x_t)$
$B(X_{t+1}) \propto_{X_{t+1}} P(e_{t+1} \mid X_{t+1})\,B'(X_{t+1})$

B(x0=r) = 0.5

Day 1, passage of time:
B’(x1=r) = P(x1=r | x0=r) * 0.5 + P(x1=r | x0=s) * 0.5 = 0.8*0.5 + 0.6*0.5 = 0.7

Day 1, observation (Umbr1 = T):
B(x1=r) ∝ 0.9 * 0.7 = 0.63
B(x1=s) ∝ 0.3 * 0.3 = 0.09
Divide by 0.72 (= 0.63 + 0.09) to normalize: B(x1=r) = 0.63/0.72 = 0.875

Day 2, passage of time:
B’(x2=r) = P(x2=r | x1=r)*0.875 + P(x2=r | x1=s)*0.125 = 0.8*0.875 + 0.6*0.125 = 0.775

Day 2, observation (Umbr2 = T):
B(x2=r) ∝ 0.9 * 0.775 = 0.6975
B(x2=s) ∝ 0.3 * 0.225 = 0.0675
Divide by 0.765 (= 0.6975 + 0.0675) to normalize: B(x2=r) = 0.6975/0.765 ≈ 0.912
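The worked weather example can be checked mechanically. The following is a minimal sketch of the forward algorithm for a discrete HMM using plain dictionaries; the function name `forward` and the table layout are illustrative assumptions, while the numbers are those of the weather HMM in the slides.

```python
def forward(prior, transition, emission, observations):
    """Yield B(X_t) = P(X_t | e_1:t) after each observation."""
    belief = dict(prior)
    states = list(prior)
    for e in observations:
        # Time update: B'(x') = sum_x P(x' | x) * B(x)
        predicted = {
            n: sum(transition[c][n] * belief[c] for c in states) for n in states
        }
        # Evidence update: B(x') is proportional to P(e | x') * B'(x')
        weighted = {s: emission[s][e] * predicted[s] for s in states}
        z = sum(weighted.values())  # equals P(e_t | e_1:t-1)
        belief = {s: w / z for s, w in weighted.items()}
        yield belief


prior = {"r": 0.5, "s": 0.5}                                # B(x0)
transition = {"r": {"r": 0.8, "s": 0.2},                    # P(X' | x)
              "s": {"r": 0.6, "s": 0.4}}
emission = {"r": {True: 0.9, False: 0.1},                   # P(umbrella | x)
            "s": {True: 0.3, False: 0.7}}

beliefs = list(forward(prior, transition, emission, [True, True]))
for t, b in enumerate(beliefs, start=1):
    print(f"B(x{t}=r) = {b['r']:.3f}")
# prints B(x1=r) = 0.875 and then B(x2=r) = 0.912
```

This version normalizes at every step, which keeps the numbers readable; as the summary slide notes, normalization could instead be deferred to the final step.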

Summary: Online Belief Updates

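Every time step, we start with current P(X | evidence)

1. We update for time: $P(x_t \mid e_{1:t-1}) = \sum_{x_{t-1}} P(x_{t-1} \mid e_{1:t-1})\,P(x_t \mid x_{t-1})$
2. We update for evidence: $P(x_t \mid e_{1:t}) \propto_{X} P(x_t \mid e_{1:t-1})\,P(e_t \mid x_t)$

The forward algorithm does both at once. (Optimization: don’t need to normalize until the final time.)

Complexity? O(X^2 + XE) time per time step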

Particle Filtering


Particle Filtering Overview

§ Approximation technique to solve the filtering problem
§ Represents the probability distribution with samples
§ Filtering still operates in two steps
  § Elapse time
  § Incorporate observations
    § (But this part has two sub-steps: weight & resample)

Particle Filtering

§ Sometimes |X| is too big to use exact inference
  § |X| may be too big to even store B(X)
  § E.g. X is continuous
§ Solution: approximate inference
  § Track samples of X, not exact distribution of values
  § Samples are called particles
  § Time per step is linear in the number of samples
  § But: number needed may be large
  § In memory: list of particles, not states
§ Particle is just a new name for sample
§ This is how robot localization works in practice

Remember…

An HMM is defined by:

§ Initial distribution: P(X1)
§ Transitions: P(Xt | Xt-1)
§ Emissions: P(Et | Xt)

We’ll start by looking at this

Notation

[Figure: icons for the four combinations of rain (R) and umbrella (U)]

R = T, U = T
R = T, U = F
R = F, U = T
R = F, U = F

Here’s a Single Particle

§ It represents a hypothetical state where the robot is in (1,2)

Particles Approximate Distribution

§ Our representation of P(X) is now a list of N particles (samples)
  § Generally, N << |X|

Particles: (3,3) (2,3) (3,3) (3,2) (3,3) (3,2) (1,2) (3,3) (3,3) (2,3)

P(x) distribution estimated from the particles: P(x=<3,3>) = 5/10 = 50%
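Turning a particle list back into an approximate distribution is just counting. The sketch below does this for the ten particles on the slide; `estimate_distribution` is a hypothetical helper name.

```python
from collections import Counter

# The ten particles listed on the slide.
particles = [(3, 3), (2, 3), (3, 3), (3, 2), (3, 3),
             (3, 2), (1, 2), (3, 3), (3, 3), (2, 3)]


def estimate_distribution(particles):
    """Approximate P(x) by the fraction of particles sitting in state x."""
    counts = Counter(particles)
    n = len(particles)
    return {state: c / n for state, c in counts.items()}


estimate = estimate_distribution(particles)
print(estimate[(3, 3)])  # 5 of the 10 particles are at (3,3)
```

States that hold no particle get no entry at all, so their estimated probability is implicitly zero. That is one reason many particles may be needed.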

Another Example

In the weather HMM, suppose we decide to approximate the distributions with 5 particles. To initialize the filter, we draw 5 samples from B(x0=r) = 0.5 and we might get the following set of particles:

Particles: s r r s s

Not such a good approximation, but that’s life.

Particle Filter Weather

What is the probability of Rain2 if we’ve seen umbrellas on both days?

Steps:
  Initialize
  1. Elapse Time
  2. Observe
     2a. Downweight samples based on evidence
     2b. Resample

Initialize

Particles: s r r s s

Good probability estimate? Increase # particles… More particles → more accurate, but slower

Particles (15): s r r s s r r s s r r s r r s
P(r) = 8/15 = 0.53

Particle Filter Weather

1. Elapse Time (each particle is moved one at a time)

Particles before: s r r s s r r s s r r s r r s
Particles after:  r r r s r s r r s r r s r r r
P(r) = 11/15 = 0.73

Particle Filter Weather

2. Observe

2a. Downweight samples based on evidence

w(x) = P(e | x), so w(r) = 0.9 and w(s) = 0.3

Particles: r r r s r s r r s r r s r r r

2b. Resample (new particles are drawn one at a time from the weighted set)

New particles: r r r r r r r r s r r r r r s
P(X=r) = 13/15 = 87%

Particle Filter Weather

Second time step

Particles: r r r r r r r r s r r r r r s

1. Elapse Time

Particles: r r r r r s r r r s r r r s s
P(X=r) = 11/15 = 73%

2. Observe

2a. Downweight samples based on evidence: w(x) = P(e | x), so w(r) = 0.9 and w(s) = 0.3

2b. Resample


Summary: Particle Filtering Algorithm

1. Elapse Time
   Simulate each particle’s movement by sampling from P(xt+1 | xt)
2. Observe
   2a. Downweight all samples based on the probability that they would have produced the observed evidence, P(et | xt)
   2b. Resample: draw N new particles from the weighted set, with probability proportional to weight
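The steps above can be sketched as a single function. This is a rough illustration using only Python's standard `random` module, applied to the weather HMM from the earlier slides; the function and variable names, the particle count, and the fixed seed are all assumptions of this example.

```python
import random


def particle_filter_step(particles, evidence, sample_transition, emission_prob, rng):
    """One filtering step: elapse time, weight by evidence, resample."""
    # 1. Elapse time: move every particle by sampling from P(x' | x).
    moved = [sample_transition(x, rng) for x in particles]
    # 2a. Weight each particle by the likelihood of the evidence, P(e | x').
    weights = [emission_prob(evidence, x) for x in moved]
    if sum(weights) == 0:
        raise ValueError("all particle weights are zero")
    # 2b. Resample N particles with probability proportional to weight.
    return rng.choices(moved, weights=weights, k=len(moved))


# Weather HMM from the slides: "r" = rain, "s" = sun.
P_RAIN_NEXT = {"r": 0.8, "s": 0.6}  # P(rain tomorrow | today's state)
P_UMBRELLA = {"r": 0.9, "s": 0.3}   # P(umbrella | state)


def sample_transition(x, rng):
    return "r" if rng.random() < P_RAIN_NEXT[x] else "s"


def emission_prob(umbrella, x):
    return P_UMBRELLA[x] if umbrella else 1.0 - P_UMBRELLA[x]


rng = random.Random(0)
particles = [rng.choice("rs") for _ in range(20000)]  # samples from B(x0)
for umbrella in (True, True):
    particles = particle_filter_step(
        particles, umbrella, sample_transition, emission_prob, rng
    )

estimate = particles.count("r") / len(particles)
print(round(estimate, 2))
```

With many particles the estimate should land close to the exact forward-algorithm answer of about 0.912, whereas the 15-particle run in the slides is much noisier. The explicit check for all-zero weights is one simple guard against a failure mode of particle filters.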

Particle Collapse

§ Some challenges…
  § What if the weights of all particles go to zero?
  § What if the particles converge to a single particle?

Which Algorithm?

Particle filter, uniform initial beliefs, 300 particles

Which Algorithm?

Exact filter, uniform initial beliefs


Robot Localization

§ In robot localization:
  § We know the map, but not the robot’s position
  § Observations may be vectors of range finder readings
  § State space and readings are typically continuous (works basically like a very fine grid), so we cannot store B(X)
  § Particle filtering is a main technique

Particle Filter Localization (Sonar)

[Video: global-sonar-uw-annotated.avi]


Particle Filter Localization (Laser)

[Video: global-floor.gif]

Robot Mapping

§ SLAM: Simultaneous Localization And Mapping
  § We do not know the map or our location
  § State consists of position AND map!
  § Main techniques: Kalman filtering (Gaussian HMMs) and particle methods

DP-SLAM, Ron Parr
[Demo: PARTICLES-SLAM-mapping1-new.avi]

Particle Filter SLAM – Video 2

[Demo: PARTICLES-SLAM-fastslam.avi]

Scaling to Large |X|

§ 1 Ghost: k (e.g., 9) possible positions in maze
§ 2 Ghosts: k^2 combinations
§ N Ghosts: k^N combinations
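A quick calculation shows how fast the joint state space grows with the number of ghosts. The comparison with a factored representation assumes, purely for illustration, that each ghost could be tracked with its own independent table of k entries. The function names are hypothetical.

```python
def joint_state_count(k, n_ghosts):
    """Number of joint states when each of n_ghosts can be in k positions."""
    return k ** n_ghosts


def factored_entry_count(k, n_ghosts):
    """Entries needed if each ghost had its own independent belief table."""
    return k * n_ghosts


for n in (1, 2, 4, 8):
    print(n, joint_state_count(9, n), factored_entry_count(9, n))
```

With k = 9, eight ghosts already give more than 43 million joint states, while the factored count grows only linearly. Exploiting that kind of structure is what motivates dynamic Bayes nets.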

HMM Conditional Independence

§ HMMs have two important independence properties:
  § Markov hidden process: future state independent of past given current state
  § Current observation independent of all else given current state

What about Conditional Independence in Snapshot OF STATE

§ Can we do something here?
§ Factor X into a product of (conditionally) independent random vars?
§ Maybe also factor E

[Figure: a single time slice with state X3 and evidence E3]

Yes! with Bayes Nets

Dynamic Bayes Nets


Bayes’ Net Representation

§ A directed, acyclic graph, one node per random variable
§ A conditional probability table (CPT) for each node
  § A collection of distributions over X, one for each combination of parents’ values
§ Bayes’ nets implicitly encode joint distributions
  § As a product of local conditional distributions
  § To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together:

    $P(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{parents}(X_i))$

Dynamic Bayes Nets (DBNs)

§ We want to track multiple variables over time, using multiple sources of evidence
§ Idea: Repeat a fixed Bayes net structure at each time
§ Variables from time t can condition on those from t-1
§ Dynamic Bayes nets are a generalization of HMMs

[Figure: a DBN unrolled over time slices t = 0, 1, 2, 3, with ghost variables (Ga, Gb) and evidence variables (Ea, Eb) in each slice]

DBN Particle Filters

§ A particle is a complete sample for a time step
§ Initialize: Generate prior samples for the t=1 Bayes net
  § Example particle: G1a = (3,3) G1b = (5,3)
  § When generating G1a use prior P(G1a)
  § When generating G1b use conditional P(G1b | G1a)
  § Note: Just as P(X0) is distinct from P(Xt | Xt-1) in an HMM, the structure for t=0 doesn’t have to match that of other time slices (but it usually does)
§ Elapse time: Sample a successor for each particle
  § Example successor: G2a = (2,3) G2b = (6,3)
  § When generating G2a use conditional P(G2a | G1a G1b)
  § When generating G2b use conditional P(G2b | G2a G1b)
  § Note: Just as an HMM specifies a schema P(Xt | Xt-1), a DBN specifies a repeating structure where nodes at time t could depend on SOME or ALL variables at t-1 and also SOME variables at time t. (No cycles allowed! Must have a conditional probability table for each node in terms of its parents.)
§ Observe: Weight each entire sample by the likelihood of the evidence conditioned on the sample
  § Likelihood: P(E1a | G1a) * P(E1b | G1b)
  § Why OK to multiply these?
§ Resample: Select prior samples (tuples of values) in proportion to their likelihood