
CSE 573: Artificial Intelligence, Autumn 2012
Particle Filters for Hidden Markov Models
Daniel Weld (lecture of 11/2/2012)
Many slides adapted from Dan Klein, Stuart Russell, Andrew Moore & Luke Zettlemoyer


Homework & Logistics
- Mon 11/5 – Resubmit / regrade HW2, HW3
- Mon 11/12 – HW4 due
- Wed 11/14 – project groups & idea; 1-1 meetings to follow
  - See course webpage for ideas
  - Plus a new one: infinite number of card decks → 6 decks; add a state variable

Outline
- Overview
- Probability review
  - Random variables and events
  - Joint / marginal / conditional distributions
  - Product rule, chain rule, Bayes' rule
- Probabilistic inference
  - Enumeration of the joint distribution
  - Bayesian networks – preview
- Probabilistic sequence models (and inference)
  - Markov chains
  - Hidden Markov models
  - Particle filters

[Agent diagram: an agent asks "what action next?", receiving percepts from and sending actions to an environment that may be static vs. dynamic, fully vs. partially observable, deterministic vs. stochastic, instantaneous vs. durative, with perfect vs. noisy percepts.]

Simple Bayes Net
- One hidden var X1, one observable var E1
- Defines a joint probability distribution:
  P(X1, E1) = P(X1) P(E1 | X1)

Hidden Markov Model
- Hidden vars X1 ... XN, observable vars E1 ... EN
- Defines a joint probability distribution:
  P(X1:N, E1:N) = P(X1) P(E1 | X1) Π t=2..N P(Xt | Xt-1) P(Et | Xt)

HMM Computations
- Given:
  - the joint P(X1:n, E1:n)
  - evidence E1:n = e1:n
- Inference problems include:
  - Filtering: find P(Xn | e1:n) for the current time n
  - Smoothing: find P(Xt | e1:n) for a time t < n
  - Most probable explanation: find x*1:n = argmax over x1:n of P(x1:n | e1:n)

Real HMM Examples
- Part-of-speech (POS) tagging:
  - Observations are words (thousands of them)
  - States are POS tags (e.g., noun, verb, adjective, det, ...)
  - Example: the tag sequence det adj adj noun ... over "The quick brown fox ..."
- Speech recognition HMMs:
  - Observations are acoustic signals (continuous valued)
  - States are specific positions in specific words (so, tens of thousands)
- Machine translation HMMs:
  - Observations are words
  - States are translation options
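To make the factorization concrete, here is a minimal Python sketch that scores a state/observation sequence under an HMM via the chain rule. The transition and emission numbers are the rain/umbrella model that appears later in the lecture; the uniform initial distribution is an assumption (the slides don't give one).

```python
# Minimal sketch: score a state/observation sequence under an HMM via the
# chain rule  P(x1:N, e1:N) = P(x1) P(e1|x1) * prod_t P(xt|xt-1) P(et|xt).
# Uniform initial distribution is an assumption, not from the slides.

initial = {'rain': 0.5, 'sun': 0.5}                         # P(X1), assumed
transition = {'rain': {'rain': 0.7, 'sun': 0.3},            # P(Xt | Xt-1)
              'sun':  {'rain': 0.3, 'sun': 0.7}}
emission = {'rain': {'umbrella': 0.9, 'no umbrella': 0.1},  # P(Et | Xt)
            'sun':  {'umbrella': 0.2, 'no umbrella': 0.8}}

def joint(states, observations):
    """P(x1:N, e1:N) for aligned state and observation sequences."""
    p = initial[states[0]] * emission[states[0]][observations[0]]
    for prev, cur, obs in zip(states, states[1:], observations[1:]):
        p *= transition[prev][cur] * emission[cur][obs]
    return p

print(joint(['rain', 'rain', 'sun'], ['umbrella', 'umbrella', 'no umbrella']))
# 0.5 * 0.9 * 0.7 * 0.9 * 0.3 * 0.8 ≈ 0.0680
```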

Real HMM Examples (continued)
- Robot tracking:
  - Observations are range readings (continuous)
  - States are positions on a map (continuous)

Ghostbusters HMM
- P(X1) = uniform (1/9 for each cell of the 3x3 grid)
- P(X'|X) = usually move clockwise, but sometimes move in a random direction or stay in place
  - e.g., the slide's grid for P(X'|X=<1,2>) appears to put mass 1/2 on the clockwise cell, 1/6 on each of three neighboring cells, and 0 elsewhere
- P(E|X) = same sensor model as before: red means close, green means far away
  - e.g., P(red | dist 3) = 0.05, P(orange | 3) = 0.15, P(yellow | 3) = 0.5, P(green | 3) = 0.3

Filtering (aka Monitoring, State Estimation)
- Filtering is the task of tracking the distribution B(X) (the belief state) over time
- We start with B(X) in an initial setting, usually uniform
- As time passes, or we get observations, we update B(X)
- Aside: the Kalman filter
  - Invented in the 60's for trajectory estimation in the Apollo program
  - State evolves using a linear model, e.g. x = x0 + vt
  - Observe: value of x with Gaussian noise

Conditional Independence
- HMMs have two important independence properties:
  - Markov hidden process: the future depends on the past via the present
  - The current observation is independent of all else given the current state
- Quiz: does this mean successive observations are independent?
  - [No — they are correlated by the hidden state]

Example: Robot Localization (example from Michael Pfeiffer)
- Sensor model: never more than 1 mistake
- Motion model: may not execute action with small prob.
- [Figures: belief over grid cells, shaded by probability from 0 to 1, at t=0 and t=1]
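The quiz answer can be checked numerically. The sketch below reuses the hypothetical rain/umbrella tables from the earlier snippet and enumerates the joint over two steps: P(E2 | E1 = umbrella) differs from P(E2), so successive observations are not independent.

```python
# Check the quiz numerically: in an HMM, successive observations are NOT
# independent -- they are correlated through the hidden state.
# Tables as in the earlier sketch (the uniform prior is an assumption).
from itertools import product

initial = {'rain': 0.5, 'sun': 0.5}
transition = {'rain': {'rain': 0.7, 'sun': 0.3},
              'sun':  {'rain': 0.3, 'sun': 0.7}}
emission = {'rain': {'umbrella': 0.9, 'no umbrella': 0.1},
            'sun':  {'umbrella': 0.2, 'no umbrella': 0.8}}

def p_obs_pair(e1, e2):
    """P(E1=e1, E2=e2), marginalizing over the hidden states X1, X2."""
    return sum(initial[x1] * emission[x1][e1] *
               transition[x1][x2] * emission[x2][e2]
               for x1, x2 in product(initial, repeat=2))

p_e1 = sum(p_obs_pair('umbrella', e2) for e2 in ('umbrella', 'no umbrella'))
p_e2 = sum(p_obs_pair(e1, 'umbrella') for e1 in ('umbrella', 'no umbrella'))
print(p_obs_pair('umbrella', 'umbrella') / p_e1)  # P(E2=umb | E1=umb) ≈ 0.64
print(p_e2)                                       # P(E2=umb)          = 0.55
```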

Example: Robot Localization (continued)
- [Figures: belief over grid cells, shaded by probability from 0 to 1, at t=2, t=3, t=4, and t=5]

Inference Recap: Simple Cases
- One hidden var with one observation: X1 → E1
- A two-step Markov chain: X1 → X2

Online Belief Updates
- Every time step, we start with the current P(X | evidence)
- We update for time (X1 → X2)
- We update for evidence (X2 → E2)
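Written out in standard textbook notation (the slides present these graphically), the two updates are:

```latex
% Update for time (elapse): push the belief through the transition model
P(X_2 \mid e_1) = \sum_{x_1} P(X_2 \mid x_1)\, P(x_1 \mid e_1)

% Update for evidence (observe): reweight by the likelihood, then renormalize
P(X_2 \mid e_1, e_2) \propto P(e_2 \mid X_2)\, P(X_2 \mid e_1)
```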

Passage of Time
- Assume we have a current belief P(X | evidence to date): B(Xt) = P(Xt | e1:t)
- Then, after one time step passes:
  B'(Xt+1) = Σ over xt of P(Xt+1 | xt) B(xt)
- Basic idea: beliefs get "pushed" through the transitions
- With the "B" notation, we have to be careful about what time step t the belief is about, and what evidence it includes

Example: Passage of Time
- As time passes, uncertainty "accumulates"
- [Figures: ghost-position beliefs at T = 1, T = 2, and T = 5; transition model: ghosts usually go clockwise]

Observation
- Assume we have a current belief P(X | previous evidence): B'(Xt+1) = P(Xt+1 | e1:t)
- Then, after observing et+1:
  B(Xt+1) ∝ P(et+1 | Xt+1) B'(Xt+1)
- Basic idea: beliefs get reweighted by the likelihood of the evidence
- Unlike the passage of time, we have to renormalize

Example: Observation
- As we get observations, beliefs get reweighted and uncertainty "decreases"
- [Figures: ghost-position beliefs before and after an observation]

The Forward Algorithm
- We want to know P(Xt | e1:t)
- An HMM is defined by:
  - an initial distribution P(X1)
  - transitions P(Xt | Xt-1)
  - emissions P(Et | Xt)
- We can derive the following update:
  P(xt | e1:t) ∝ P(et | xt) Σ over xt-1 of P(xt | xt-1) P(xt-1 | e1:t-1)
- To get P(Xt | e1:t), compute each entry and normalize

Example: Run the Filter
- [Worked example; see the rain/umbrella HMM on the next page and the sketch below]
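A minimal Python sketch of the two steps, over a belief represented as a dict from state to probability. The function names are illustrative, not from the course's code.

```python
# Forward-algorithm sketch: one elapse-time step and one observation step.
# belief: {state: probability}; transition: {state: {next_state: prob}};
# emission: {state: {observation: prob}}.

def elapse_time(belief, transition):
    """B'(X') = sum_x P(X'|x) B(x) -- push the belief through the transitions."""
    return {x2: sum(transition[x1][x2] * belief[x1] for x1 in belief)
            for x2 in transition}

def observe(belief, emission, evidence):
    """B(X) ∝ P(e|X) B'(X) -- reweight by the likelihood, then renormalize."""
    weighted = {x: emission[x][evidence] * belief[x] for x in belief}
    z = sum(weighted.values())   # z approximates P(e | evidence so far)
    return {x: w / z for x, w in weighted.items()}
```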

Example HMM
- Stationary Markov model (weather), reading the 0.7/0.3 arcs as stay/switch probabilities:
  P(rain | rain) = 0.7, P(sun | rain) = 0.3, P(sun | sun) = 0.7, P(rain | sun) = 0.3
- Emission model:

    X     E            P
    rain  umbrella     0.9
    rain  no umbrella  0.1
    sun   umbrella     0.2
    sun   no umbrella  0.8

Example: Pac-Man
- [Figure: Pac-Man ghost-tracking beliefs]

Summary: Filtering
- Filtering is the inference process of finding a distribution over XT given e1 through eT: P(XT | e1:T)
- We first compute P(X1 | e1)
- For each t from 2 to T, we have P(Xt-1 | e1:t-1):
  - Elapse time: compute P(Xt | e1:t-1)
  - Observe: compute P(Xt | e1:t-1, et) = P(Xt | e1:t)

Recap: Reasoning Over Time
- Stationary Markov models: X1 → X2 → X3 → X4
- Hidden Markov models: X1 ... X5 with observations E1 ... E5

[Speaker note: Add a slide — the next slide (intro to particle filtering) is confusing because the state space is so small; show a huge grid, where it's clear what advantage one gets. Maybe also introduce parametric representations (Kalman filter) here.]

Particle Filtering
- Sometimes |X| is too big to use exact inference
  - |X| may be too big to even store B(X), e.g. when X is continuous
  - |X|² may be too big to do updates
- Solution: approximate inference
  - Track samples of X, not all values; the samples are called particles
  - Time per step is linear in the number of samples
  - But: the number needed may be large
  - In memory: a list of particles, not states
- This is how robot localization works in practice
- [Figure: example belief grid with a few nonzero cells (0.1, 0.2, 0.2, 0.5) and zeros elsewhere]
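Running the filter on this model, reusing elapse_time and observe from the sketch above (the uniform prior is again an assumption):

```python
# Run the filter on the rain/umbrella HMM, reusing elapse_time/observe
# from the earlier sketch. The uniform prior is an assumption.
transition = {'rain': {'rain': 0.7, 'sun': 0.3},
              'sun':  {'rain': 0.3, 'sun': 0.7}}
emission = {'rain': {'umbrella': 0.9, 'no umbrella': 0.1},
            'sun':  {'umbrella': 0.2, 'no umbrella': 0.8}}

belief = {'rain': 0.5, 'sun': 0.5}
for e in ['umbrella', 'umbrella', 'no umbrella']:
    belief = observe(elapse_time(belief, transition), emission, e)
    print(e, belief)
# After two umbrella sightings P(rain) rises to about 0.88,
# then drops to about 0.19 once no umbrella is seen.
```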

Representation: Particles
- Our representation of P(X) is now a list of N particles (samples)
  - Generally, N << |X|
  - Storing a map from X to counts would defeat the point
- P(x) is approximated by the number of particles with value x
  - So, many x will have P(x) = 0!
  - More particles, more accuracy
- For now, all particles have a weight of 1
- Example particles: (3,3), (2,3), (3,3), (3,2), (3,3), (3,2), (2,1), (3,3), (3,3), (2,1)

Particle Filtering: Elapse Time
- Each particle is moved by sampling its next position from the transition model
- This is like prior sampling — the samples' frequencies reflect the transition probabilities
  - Here, most samples move clockwise, but some move in another direction or stay in place
- This captures the passage of time
- If we have enough samples, close to the exact values before and after (consistent)

Particle Filtering: Observe
- Slightly trickier:
  - We could use P(e|x) to sample an observation and discard particles that are inconsistent (called rejection sampling) — problem: most particles get thrown away
  - Instead of sampling the observation, fix it!
  - A kind of likelihood weighting: downweight samples based on the evidence
  - Note that the weights no longer sum to one (most have been downweighted); instead, they sum to an approximation of P(e)
  - What to do?!?

Particle Filtering: Resample
- Rather than tracking weighted samples, we resample — why?
- N times, we choose from our weighted sample distribution (i.e., draw with replacement)
- This is equivalent to renormalizing the distribution
- Example:
  - Old (weighted) particles: (3,3) w=0.1; (2,1) w=0.9; (2,1) w=0.9; (3,1) w=0.4; (3,2) w=0.3; (2,2) w=0.4; (1,1) w=0.4; (3,1) w=0.4; (2,1) w=0.9; (3,2) w=0.3
  - New (resampled) particles: (2,1) w=1; (2,1) w=1; (2,1) w=1; (3,2) w=1; (2,2) w=1; (2,1) w=1; (1,1) w=1; (3,1) w=1; (2,1) w=1; (1,1) w=1

Recap: Particle Filtering
- At each time step t, we have a set of N particles (aka samples)
- Initialization: sample from the prior
- Three-step procedure for moving to time t+1 (see the sketch below):
  1. Sample transitions: for each particle x, sample its next state
  2. Reweight: for each particle, compute its weight given the actual observation e
  3. Resample: normalize the weights, and sample N new particles from the resulting distribution over states
- Now the update is complete for this time step; continue with the next one
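A compact sketch of the three steps on the same rain/umbrella HMM. The particle count N and the uniform initial sampling are assumptions for the illustration.

```python
# Particle-filter sketch: sample transitions, reweight by the evidence
# likelihood, then resample N unweighted particles (with replacement).
import random

N = 1000  # assumed particle count
transition = {'rain': {'rain': 0.7, 'sun': 0.3},
              'sun':  {'rain': 0.3, 'sun': 0.7}}
emission = {'rain': {'umbrella': 0.9, 'no umbrella': 0.1},
            'sun':  {'umbrella': 0.2, 'no umbrella': 0.8}}

# Initialization: sample from the (assumed uniform) prior
particles = [random.choice(['rain', 'sun']) for _ in range(N)]

for e in ['umbrella', 'umbrella', 'no umbrella']:
    # 1. Sample transitions: move each particle via the transition model
    particles = [random.choices(list(transition[x]),
                                weights=list(transition[x].values()))[0]
                 for x in particles]
    # 2. Reweight: weight each particle by the evidence likelihood P(e|x)
    weights = [emission[x][e] for x in particles]
    # 3. Resample: draw N new unweighted particles (with replacement)
    particles = random.choices(particles, weights=weights, k=N)
    print(e, 'P(rain) ~', particles.count('rain') / N)
```

With enough particles, the printed estimates track the exact filter run shown earlier (about 0.82, 0.88, then 0.19).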
