SLIDE 1

Machine Learning for Signal Processing

Predicting and Estimation from Time Series

Bhiksha Raj 15 Nov 2016


SLIDE 2

Preliminaries: P(y|x) for Gaussian

  • The conditional probability of y given x is also Gaussian

– The slice in the figure is Gaussian

  • The mean of this Gaussian is a function of x
  • The variance of y reduces if x is known

– Uncertainty is reduced


  • If P(x,y) is Gaussian:

P(x,y) = \mathcal{N}\left( \begin{bmatrix} x \\ y \end{bmatrix} ;\ \begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix},\ \begin{bmatrix} C_{xx} & C_{xy} \\ C_{yx} & C_{yy} \end{bmatrix} \right)

P(y|x) = \mathcal{N}\left( y ;\ \mu_y + C_{yx} C_{xx}^{-1} (x - \mu_x),\ C_{yy} - C_{yx} C_{xx}^{-1} C_{xy} \right)

SLIDE 3

Preliminaries: P(y|x) for Gaussian

P(y|x) = \mathcal{N}\left( y ;\ \mu_y + C_{yx} C_{xx}^{-1} (x - \mu_x),\ C_{yy} - C_{yx} C_{xx}^{-1} C_{xy} \right)

\mu_y is the best guess for y when x is not known.

SLIDE 4

Preliminaries: P(y|x) for Gaussian

P(y|x) = \mathcal{N}\left( y ;\ \mu_y + C_{yx} C_{xx}^{-1} (x - \mu_x),\ C_{yy} - C_{yx} C_{xx}^{-1} C_{xy} \right)

The mean of y given x combines two terms:
  • \mu_y : the best guess for y when x is not known
  • C_{yx} C_{xx}^{-1} (x - \mu_x) : a correction of y using the information in the given x value, i.e. an update of the guess of y based on the information in x

The correction is 0 if x and y are uncorrelated, i.e. C_{yx} = 0.

SLIDE 5

Preliminaries: P(y|x) for Gaussian

P(y|x) = \mathcal{N}\left( y ;\ \mu_y + C_{yx} C_{xx}^{-1} (x - \mu_x),\ C_{yy} - C_{yx} C_{xx}^{-1} C_{xy} \right)

In the correction term, C_{yx} C_{xx}^{-1} is a slope and (x - \mu_x) is the offset of x from its mean:

Correction to y = slope * (offset of x from mean)

SLIDE 6

Preliminaries: P(y|x) for Gaussian

P(y|x) = \mathcal{N}\left( y ;\ \mu_y + C_{yx} C_{xx}^{-1} (x - \mu_x),\ C_{yy} - C_{yx} C_{xx}^{-1} C_{xy} \right)

In the variance term, C_{yy} is the uncertainty in y when x is not known.

SLIDE 7

Preliminaries: P(y|x) for Gaussian

P(y|x) = \mathcal{N}\left( y ;\ \mu_y + C_{yx} C_{xx}^{-1} (x - \mu_x),\ C_{yy} - C_{yx} C_{xx}^{-1} C_{xy} \right)

C_{yy} - C_{yx} C_{xx}^{-1} C_{xy} is the reduced uncertainty in y from knowing x; the subtracted term C_{yx} C_{xx}^{-1} C_{xy} is the shrinkage of uncertainty from knowing x. The shrinkage of variance is 0 if x and y are uncorrelated, i.e. C_{yx} = 0.

SLIDE 8

Preliminaries: P(y|x) for Gaussian

P(y|x) = \mathcal{N}\left( y ;\ \mu_y + C_{yx} C_{xx}^{-1} (x - \mu_x),\ C_{yy} - C_{yx} C_{xx}^{-1} C_{xy} \right)

Given an x value, the mean of y given x is the MAP estimate of y, and the variance of y when x is known is smaller than the overall variance of y when x is unknown. Knowing x modifies the mean of y and shrinks its variance.
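The conditioning rule above is mechanical to implement. A minimal numpy sketch (all numerical values are hypothetical, chosen only for illustration):

```python
import numpy as np

# A minimal sketch of Gaussian conditioning; all values are hypothetical.
mu_x, mu_y = np.array([1.0]), np.array([2.0])
C_xx = np.array([[2.0]])
C_xy = np.array([[0.8]])
C_yx = C_xy.T
C_yy = np.array([[1.5]])

x = np.array([1.7])                        # the given x value
slope = C_yx @ np.linalg.inv(C_xx)         # "slope" of the correction
mu_y_given_x = mu_y + slope @ (x - mu_x)   # corrected mean of y given x
C_y_given_x = C_yy - slope @ C_xy          # shrunk variance of y given x

print(mu_y_given_x, C_y_given_x)           # the variance never exceeds C_yy
```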

SLIDE 9

The little parable

You've been kidnapped and blindfolded. You can only hear the car. You must find your way back home from wherever they drop you off.

SLIDE 10

Kidnapped!

  • Determine, by only listening to a running automobile, if it is:
    – Idling; or
    – Travelling at constant velocity; or
    – Accelerating; or
    – Decelerating
  • You only record the energy level (SPL) of the sound
    – The SPL is measured once per second

SLIDE 11

What we know

  • An automobile that is at rest can accelerate, or continue to stay at rest
  • An accelerating automobile can hit a steady-state velocity, continue to accelerate, or decelerate
  • A decelerating automobile can continue to decelerate, come to rest, cruise, or accelerate
  • An automobile at a steady-state velocity can stay in steady state, accelerate, or decelerate

SLIDE 12

What else we know

  • The probability distribution of the SPL of the sound is different in the various conditions
    – As shown in the figure
    – In reality, this depends on the car
  • The distributions for the different conditions overlap
    – Simply knowing the current sound level is not enough to know the state of the car

[Figure: overlapping distributions P(x|idle), P(x|decel), P(x|cruise), P(x|accel), peaking near 45, 60, 65 and 70 dB SPL respectively]

SLIDE 13

The Model!

  • The state-space model
    – Assuming all transitions from a state are equally probable
    – This is a Hidden Markov Model!

[Figure: four-state diagram with output distributions P(x|idle) at the Idling state (~45 dB), P(x|decel) at the Decelerating state (~60 dB), P(x|cruise) at the Cruising state (~65 dB), and P(x|accel) at the Accelerating state (~70 dB)]

Transition probabilities (row = current state, column = next state):

       I      A      C      D
I     0.5    0.5     0      0
A      0     1/3    1/3    1/3
C      0     1/3    1/3    1/3
D    0.25   0.25   0.25   0.25
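The model is small enough to write down directly. A minimal sketch (the emission means are read off the figure; the emission standard deviation is an assumption):

```python
import numpy as np
from scipy.stats import norm

states = ["idle", "decel", "cruise", "accel"]
means = np.array([45.0, 60.0, 65.0, 70.0])   # dB SPL, per the figure
std = 2.0                                    # assumed emission std dev

# Rows are "from" states, columns "to" states, order [idle, decel, cruise, accel].
T = np.array([[0.50, 0.00, 0.00, 0.50],      # idle -> {idle, accel}
              [0.25, 0.25, 0.25, 0.25],      # decel -> all four states
              [0.00, 1/3., 1/3., 1/3.],      # cruise -> {decel, cruise, accel}
              [0.00, 1/3., 1/3., 1/3.]])     # accel -> {decel, cruise, accel}

def emission(x):
    """Density values P(x | state) for all four states at SPL x."""
    return norm.pdf(x, loc=means, scale=std)
```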

SLIDE 14

Estimating the state at T = 0-

  • At T=0, before the first observation, we know nothing of the state
    – Assume all states are equally likely

Idling 0.25   Decelerating 0.25   Cruising 0.25   Accelerating 0.25

SLIDE 15

The first observation: T=0

  • At T=0 you observe the sound level x0 = 68 dB SPL
    – The observation modifies our belief in the state of the system

[Figure: the four distributions with the 68 dB observation marked]

SLIDE 16

The first observation: T=0

[Figure: the four distributions with the 68 dB observation marked]

P(x0|state):   Idling ≈ 0   Decelerating 0.0001   Cruising 0.5   Accelerating 0.7

These don't have to sum to 1. They can even be greater than 1!

SLIDE 17

The first observation: T=0

[Figure: the four distributions with the 68 dB observation marked]

P(x0|state):   Idling ≈ 0   Decelerating 0.0001   Cruising 0.5   Accelerating 0.7

Prior P(state):   Idling 0.25   Decelerating 0.25   Cruising 0.25   Accelerating 0.25

Remember the prior.

SLIDE 18

Estimating the state after observing x0

  • Combine prior information about the state and evidence from the observation
  • We want P(state|x_0)
  • We can compute it using Bayes rule as

P(state|x_0) = \frac{P(state)\, P(x_0|state)}{\sum_{state'} P(state')\, P(x_0|state')}
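As a sanity check, this one-step Bayes update with the likelihood values above (treated as assumptions) reproduces the posterior shown two slides later:

```python
import numpy as np

# One-step Bayes update at T=0; state order [idle, decel, cruise, accel].
prior = np.array([0.25, 0.25, 0.25, 0.25])
likelihood = np.array([0.0, 1e-4, 0.5, 0.7])   # P(x0 = 68 dB | state), assumed

posterior = prior * likelihood
posterior /= posterior.sum()                   # normalize so terms sum to 1.0
print(posterior)                               # ~[0.0, 8.3e-5, 0.42, 0.57]
```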

SLIDE 19

The Posterior

  • Multiply the two, term by term, and normalize them so that they sum to 1.0

P(x0|state):   Idling ≈ 0   Decelerating 0.0001   Cruising 0.5   Accelerating 0.7

Prior P(state):   Idling 0.25   Decelerating 0.25   Cruising 0.25   Accelerating 0.25

SLIDE 20

Estimating the state at T = 0+

  • At T=0, after the first observation x0, we update our belief about the states
    – The first observation provided some evidence about the state of the system
    – It modifies our belief in the state of the system

P(S_{T=0}|x_0):   Idling 0.0   Decelerating 8.3 x 10^-5   Cruising 0.42   Accelerating 0.57

SLIDE 21

Predicting the state at T=1

  • Predicting the probability of idling at T=1
    – P(idling | idling) = 0.5
    – P(idling | deceleration) = 0.25
    – P(idling at T=1 | x_0) = P(I_{T=0}|x_0) P(I|I) + P(D_{T=0}|x_0) P(I|D) = 2.1 x 10^-5
  • In general, for any state S:

P(S_{T=1}|x_0) = \sum_{S_{T=0}} P(S_{T=0}|x_0)\, P(S_{T=1}|S_{T=0})

(Transition table as on Slide 13.)

P(S_{T=0}|x_0):   Idling 0.0   Decelerating 8.3 x 10^-5   Cruising 0.42   Accelerating 0.57

SLIDE 22

Predicting the state at T = 1

P(S_{T=1}|x_0) = \sum_{S_{T=0}} P(S_{T=0}|x_0)\, P(S_{T=1}|S_{T=0})

P(S_{T=0}|x_0):   Idling 0.0           Decelerating 8.3 x 10^-5   Cruising 0.42   Accelerating 0.57
P(S_{T=1}|x_0):   Idling 2.1 x 10^-5   Decelerating 0.33          Cruising 0.33   Accelerating 0.33

(Rounded. In reality, they sum to 1.0.)
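The same prediction in numpy, reusing the transition matrix from the Slide 13 sketch:

```python
import numpy as np

# Prediction step: propagate the T=0 posterior through the transition
# matrix (state order [idle, decel, cruise, accel], rows = "from" state).
T = np.array([[0.50, 0.00, 0.00, 0.50],
              [0.25, 0.25, 0.25, 0.25],
              [0.00, 1/3., 1/3., 1/3.],
              [0.00, 1/3., 1/3., 1/3.]])

post_t0 = np.array([0.0, 8.3e-5, 0.42, 0.57])
pred_t1 = post_t0 @ T            # P(S1|x0) = sum over S0 of P(S0|x0) P(S1|S0)
print(pred_t1)                   # ~[2.1e-5, 0.33, 0.33, 0.33]
```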

SLIDE 23

Updating after the observation at T=1

  • At T=1 we observe x1 = 63 dB SPL

[Figure: the four distributions with the 63 dB observation marked]

SLIDE 24

Updating after the observation at T=1

[Figure: the four distributions with the 63 dB observation marked]

P(x1|state):   Idling ≈ 0   Decelerating 0.2   Cruising 0.5   Accelerating 0.001

SLIDE 25

The second observation: T=1

[Figure: the four distributions with the 63 dB observation marked]

P(x1|state):   Idling ≈ 0   Decelerating 0.2   Cruising 0.5   Accelerating 0.001

Prior P(state|x_0):   Idling 2.1 x 10^-5   Decelerating 0.33   Cruising 0.33   Accelerating 0.33

Remember the prior.

SLIDE 26

Estimating the state after observing x1

  • Combine prior information from the observation at time T=0 AND evidence from the observation at T=1 to estimate the state at T=1
  • We want P(state|x_0, x_1)
  • We can compute it using Bayes rule as

P(state|x_0, x_1) = \frac{P(state|x_0)\, P(x_1|state)}{\sum_{state'} P(state'|x_0)\, P(x_1|state')}

SLIDE 27

The Posterior at T = 1

  • Multiply the two, term by term, and normalize them so that they sum to 1.0

P(x1|state):   Idling ≈ 0   Decelerating 0.2   Cruising 0.5   Accelerating 0.001

Prior P(state|x_0):   Idling 2.1 x 10^-5   Decelerating 0.33   Cruising 0.33   Accelerating 0.33

SLIDE 28

Estimating the state at T = 1+

  • The updated probability at T=1 incorporates information from both x0 and x1
    – It is NOT a local decision based on x1 alone
    – Because of the Markov nature of the process, the state at T=0 affects the state at T=1
  • x0 provides evidence for the state at T=1

P(S_{T=1}|x_0, x_1):   Idling 0.0   Decelerating 0.285   Cruising 0.713   Accelerating 0.0014

SLIDE 29

Overall Process

Time                               Computation
T=0-  (a priori probability)       P(S_0) = P(S)
T=0+  (update after X_0)           P(S_0|X_0) = C \cdot P(S_0)\, P(X_0|S_0)
T=1-  (prediction before X_1)      P(S_1|X_0) = \sum_{S_0} P(S_1|S_0)\, P(S_0|X_0)
T=1+  (update after X_1)           P(S_1|X_{0:1}) = C \cdot P(S_1|X_0)\, P(X_1|S_1)
T=2-  (prediction before X_2)      P(S_2|X_{0:1}) = \sum_{S_1} P(S_2|S_1)\, P(S_1|X_{0:1})
T=2+  (update after X_2)           P(S_2|X_{0:2}) = C \cdot P(S_2|X_{0:1})\, P(X_2|S_2)
T=t-  (prediction before X_t)      P(S_t|X_{0:t-1}) = \sum_{S_{t-1}} P(S_t|S_{t-1})\, P(S_{t-1}|X_{0:t-1})
T=t+  (update after X_t)           P(S_t|X_{0:t}) = C \cdot P(S_t|X_{0:t-1})\, P(X_t|S_t)

SLIDE 30

Overall procedure

  • At T=0 the predicted state distribution is the initial state probability
  • At each time T, the current estimate of the distribution over states considers all observations x0 ... xT
    – A natural outcome of the Markov nature of the model
  • The prediction+update is identical to the forward computation for HMMs, to within a normalizing constant

Loop: predict the distribution of the state at T; update the distribution of the state at T after observing xT; set T = T+1.

PREDICT:  P(S_T | x_{0:T-1}) = \sum_{S_{T-1}} P(S_{T-1} | x_{0:T-1})\, P(S_T|S_{T-1})
UPDATE:   P(S_T | x_{0:T}) = C \cdot P(S_T | x_{0:T-1})\, P(x_T|S_T)
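The whole recursion fits in a few lines. A sketch, reusing the transition matrix and the assumed likelihood values from the earlier sketches:

```python
import numpy as np

def forward_filter(p0, T, likelihoods):
    """Predict/update recursion; normalizing plays the role of C."""
    belief = p0 * likelihoods[0]          # UPDATE at T=0 (no prediction step)
    belief /= belief.sum()
    history = [belief]
    for lik in likelihoods[1:]:
        belief = (belief @ T) * lik       # PREDICT, then UPDATE
        belief /= belief.sum()
        history.append(belief)
    return history

T = np.array([[0.50, 0.00, 0.00, 0.50],
              [0.25, 0.25, 0.25, 0.25],
              [0.00, 1/3., 1/3., 1/3.],
              [0.00, 1/3., 1/3., 1/3.]])
p0 = np.full(4, 0.25)
liks = np.array([[0.0, 1e-4, 0.5, 0.7],      # x0 = 68 dB
                 [0.0, 0.2, 0.5, 0.001]])    # x1 = 63 dB
print(forward_filter(p0, T, liks)[-1])       # ~[0.0, 0.285, 0.713, 0.0014]
```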

SLIDE 31

Decomposing the Algorithm

Predict:  P(S_t|X_{0:t-1}) = \sum_{S_{t-1}} P(S_t|S_{t-1})\, P(S_{t-1}|X_{0:t-1})

Update:   P(S_t|X_{0:t}) = \frac{P(S_t|X_{0:t-1})\, P(X_t|S_t)}{\sum_{S} P(S|X_{0:t-1})\, P(X_t|S)}

Combining the two gives the (unnormalized) HMM forward recursion:

P(S_t, X_{0:t}) = P(X_t|S_t) \sum_{S_{t-1}} P(S_t|S_{t-1})\, P(S_{t-1}, X_{0:t-1})

SLIDE 32

Estimating a Unique state

  • What we have estimated is a distribution over the states
  • If we had to guess a state, we would pick the most likely state from the distributions
  • State(T=0) = Accelerating
  • State(T=1) = Cruising

P(S_{T=0}|x_0):        Idling 0.0   Decelerating 8.3 x 10^-5   Cruising 0.42    Accelerating 0.57
P(S_{T=1}|x_0, x_1):   Idling 0.0   Decelerating 0.285         Cruising 0.713   Accelerating 0.0014

SLIDE 33

Estimating the state

  • The state is estimated from the updated distribution
    – The updated distribution is propagated into time, not the estimated state

Loop: predict the distribution of the state at T; update it after observing xT; then estimate:

Estimate(S_T) = \arg\max_{S_T} P(S_T | x_{0:T})
P(S_T | x_{0:T-1}) = \sum_{S_{T-1}} P(S_{T-1} | x_{0:T-1})\, P(S_T|S_{T-1})
P(S_T | x_{0:T}) = C \cdot P(S_T | x_{0:T-1})\, P(x_T|S_T)
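Picking the unique state is then just an argmax over each updated belief (a sketch continuing the filter above):

```python
import numpy as np

# Argmax over each updated distribution; note it is the distribution,
# not the winning state, that the filter carries forward in time.
states = ["idle", "decel", "cruise", "accel"]
post_t0 = np.array([0.0, 8.3e-5, 0.42, 0.57])     # update at T=0
post_t1 = np.array([0.0, 0.285, 0.713, 0.0014])   # update at T=1
print(states[np.argmax(post_t0)], states[np.argmax(post_t1)])  # accel cruise
```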

SLIDE 34

A continuous state model

  • The HMM assumes a very coarsely quantized state space
    – Idling / accelerating / cruising / decelerating
  • The actual state can be finer
    – Idling, accelerating at various rates, decelerating at various rates, cruising at various speeds
  • Solution: many more states (one for each acceleration/deceleration rate and cruising speed)?
  • Solution: a continuous-valued state

SLIDE 35

Tracking and Prediction: The wind and the target

  • Aim: measure wind velocity
  • Using a noisy wind speed sensor
    – E.g. arrows shot at a target
  • State: wind speed at time t depends on the speed at time t-1

s_t = s_{t-1} + \varepsilon_t

  • Observation: arrow position at time t depends on the wind speed at time t

o_t = B s_t + \gamma_t

SLIDE 36

The real-valued state model

s_t = f(s_{t-1}, \varepsilon_t)
o_t = g(s_t, \gamma_t)

  • A state equation describing the dynamics of the system
    – s_t is the state of the system at time t
    – \varepsilon_t is a driving function, which is assumed to be random
  • The state of the system at any time depends only on the state at the previous time instant and the driving term at the current time
  • An observation equation relating state to observation
    – o_t is the observation at time t
    – \gamma_t is the noise affecting the observation (also random)
  • The observation at any time depends only on the current state of the system and the noise

SLIDE 37

States are still "hidden"

s_t = f(s_{t-1}, \varepsilon_t)
o_t = g(s_t, \gamma_t)

  • The state is a continuous-valued parameter that is not directly seen
    – The state is the position of the automobile or the star
  • The observations are dependent on the state and are the only way of knowing about the state
    – Sensor readings (for the automobile) or recorded image (for the telescope)

SLIDE 38

Discrete vs. Continuous state systems

s_t = f(s_{t-1}, \varepsilon_t),   o_t = g(s_t, \gamma_t)

[Figure: a discrete distribution over states 1, 2, 3 vs. a continuous density P_0(s)]

Prediction at time 0:
  discrete:    P(s_0) = \pi(s_0)
  continuous:  P(s_0) = P_0(s_0)
Update after O_0 (both cases):
  P(s_0|O_0) = C \cdot P(s_0)\, P(O_0|s_0)
Prediction at time 1:
  discrete:    P(s_1|O_0) = \sum_{s_0} P(s_0|O_0)\, P(s_1|s_0)
  continuous:  P(s_1|O_0) = \int_{-\infty}^{\infty} P(s_0|O_0)\, P(s_1|s_0)\, ds_0
Update after O_1 (both cases):
  P(s_1|O_{0:1}) = C \cdot P(s_1|O_0)\, P(O_1|s_1)

SLIDE 39

Discrete vs. Continuous State Systems

s_t = f(s_{t-1}, \varepsilon_t),   o_t = g(s_t, \gamma_t)

Prediction at time t:
  discrete:    P(s_t|O_{0:t-1}) = \sum_{s_{t-1}} P(s_{t-1}|O_{0:t-1})\, P(s_t|s_{t-1})
  continuous:  P(s_t|O_{0:t-1}) = \int_{-\infty}^{\infty} P(s_{t-1}|O_{0:t-1})\, P(s_t|s_{t-1})\, ds_{t-1}

Update after observing O_t (both cases):
  P(s_t|O_{0:t}) = C \cdot P(s_t|O_{0:t-1})\, P(O_t|s_t)

SLIDE 40

Discrete vs. Continuous State Systems

s_t = f(s_{t-1}, \varepsilon_t),   o_t = g(s_t, \gamma_t)

Parameters:
  Initial state prob.:   discrete \pi;   continuous P_0(s)
  Transition prob.:      discrete T_{ij} = P(s_t = j | s_{t-1} = i);   continuous P(s_t | s_{t-1})
  Observation prob.:     P(O | s) in both cases

SLIDE 41

Special case: Linear Gaussian model

s_t = A_t s_{t-1} + \varepsilon_t
o_t = B_t s_t + \gamma_t

P(\varepsilon) = \frac{1}{\sqrt{(2\pi)^d |\Theta_\varepsilon|}} \exp\left( -0.5\, (\varepsilon - \mu_\varepsilon)^T \Theta_\varepsilon^{-1} (\varepsilon - \mu_\varepsilon) \right)

P(\gamma) = \frac{1}{\sqrt{(2\pi)^d |\Theta_\gamma|}} \exp\left( -0.5\, (\gamma - \mu_\gamma)^T \Theta_\gamma^{-1} (\gamma - \mu_\gamma) \right)

  • A linear state dynamics equation
    – The probability of the state driving term \varepsilon is Gaussian
    – Sometimes viewed as a driving term plus additive zero-mean noise
  • A linear observation equation
    – The probability of the observation noise \gamma is Gaussian
  • A_t, B_t and the Gaussian parameters are assumed known
    – They may vary with time
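A scalar simulation sketch of this model (all parameter values are assumptions; both noises taken as zero mean):

```python
import numpy as np

# Simulate s_t = A s_{t-1} + eps_t, o_t = B s_t + gamma_t (scalar case).
rng = np.random.default_rng(0)
A, B = 0.95, 1.0
theta_eps, theta_gamma = 0.1, 0.5   # driving and observation noise variances

s, states, obs = 0.0, [], []
for t in range(100):
    s = A * s + rng.normal(0.0, np.sqrt(theta_eps))     # state equation
    o = B * s + rng.normal(0.0, np.sqrt(theta_gamma))   # observation equation
    states.append(s); obs.append(o)
```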

SLIDE 42

Linear model example: The wind and the target

  • State: wind speed at time t depends on the speed at time t-1

s_t = s_{t-1} + \varepsilon_t

  • Observation: arrow position at time t depends on the wind speed at time t

o_t = B s_t + \gamma_t

SLIDE 43

Model Parameters: The initial state probability

s_t = A_t s_{t-1} + \varepsilon_t,   o_t = B_t s_t + \gamma_t

  • We also assume the initial state distribution to be Gaussian
    – Often assumed zero mean

P(s) = Gaussian(s; \mu_s, R_s) = \frac{1}{\sqrt{(2\pi)^d |R_s|}} \exp\left( -0.5\, (s - \mu_s)^T R_s^{-1} (s - \mu_s) \right)

SLIDE 44

Model Parameters: The observation probability

o_t = B_t s_t + \gamma_t,   with   P(\gamma) = Gaussian(\gamma; \mu_\gamma, \Theta_\gamma)

P(o_t | s_t) = Gaussian(o_t; B_t s_t + \mu_\gamma, \Theta_\gamma)

  • The probability of the observation, given the state, is simply the probability of the noise, with the mean shifted
    – Since the only uncertainty is from the noise
  • The new mean is the mean of the distribution of the noise plus the value of the observation in the absence of noise
SLIDE 45

Model Parameters: State transition probability

s_t = A_t s_{t-1} + \varepsilon_t,   with   P(\varepsilon) = Gaussian(\varepsilon; \mu_\varepsilon, \Theta_\varepsilon)

P(s_t | s_{t-1}) = Gaussian(s_t; A_t s_{t-1} + \mu_\varepsilon, \Theta_\varepsilon)

  • The probability of the state at time t, given the state at t-1, is simply the probability of the driving term, with the mean shifted

SLIDE 46

Gaussian Continuous State Linear Systems

s_t = A_t s_{t-1} + \varepsilon_t,   o_t = B_t s_t + \gamma_t

Prediction at time t:
  P(s_t|o_{0:t-1}) = \int_{-\infty}^{\infty} P(s_{t-1}|o_{0:t-1})\, P(s_t|s_{t-1})\, ds_{t-1}

Update after observing o_t:
  P(s_t|o_{0:t}) = C \cdot P(s_t|o_{0:t-1})\, P(o_t|s_t)

SLIDE 47

Gaussian Continuous State Linear Systems

s_t = A_t s_{t-1} + \varepsilon_t,   o_t = B_t s_t + \gamma_t

Prediction at time t:  P(s_t|o_{0:t-1}) = \mathcal{N}(\hat{s}_t, \hat{R}_t), with

\hat{s}_t = A s_{t-1} + \mu_\varepsilon
\hat{R}_t = \Theta_\varepsilon + A R_{t-1} A^T

Update after observing o_t:  P(s_t|o_{0:t}) = \mathcal{N}(s_t, R_t), with

K_t = \hat{R}_t B^T (B \hat{R}_t B^T + \Theta_\gamma)^{-1}
s_t = \hat{s}_t + K_t (o_t - B \hat{s}_t - \mu_\gamma)
R_t = (I - K_t B) \hat{R}_t
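These five equations translate directly into code. A minimal sketch of one step (assuming zero-mean \varepsilon and \gamma, and known A, B, \Theta_\varepsilon, \Theta_\gamma):

```python
import numpy as np

# One predict/update step of the filter described by the equations above.
def kalman_step(s_prev, R_prev, o, A, B, Theta_e, Theta_g):
    # PREDICT
    s_pred = A @ s_prev                              # s^_t = A s_{t-1}
    R_pred = A @ R_prev @ A.T + Theta_e              # R^_t = Theta_e + A R A^T
    # UPDATE
    K = R_pred @ B.T @ np.linalg.inv(B @ R_pred @ B.T + Theta_g)  # Kalman gain
    s_new = s_pred + K @ (o - B @ s_pred)            # correct with innovation
    R_new = (np.eye(len(s_prev)) - K @ B) @ R_pred   # shrink uncertainty
    return s_new, R_new
```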

SLIDE 48

Gaussian Continuous State Linear Systems

s_t = A_t s_{t-1} + \varepsilon_t,   o_t = B_t s_t + \gamma_t

Prediction at time t:  P(s_t|o_{0:t-1}) = \mathcal{N}(\hat{s}_t, \hat{R}_t)
Update after observing o_t:  P(s_t|o_{0:t}) = \mathcal{N}(s_t, R_t)

\hat{s}_t = A s_{t-1} + \mu_\varepsilon
\hat{R}_t = \Theta_\varepsilon + A R_{t-1} A^T
K_t = \hat{R}_t B^T (B \hat{R}_t B^T + \Theta_\gamma)^{-1}
s_t = \hat{s}_t + K_t (o_t - B \hat{s}_t - \mu_\gamma)
R_t = (I - K_t B) \hat{R}_t

This is the KALMAN FILTER.

SLIDE 49

The Kalman filter

Model:  s_t = A_t s_{t-1} + \varepsilon_t,   o_t = B_t s_t + \gamma_t

  • Prediction (based on the state equation):

\hat{s}_t = A_t s_{t-1} + \mu_\varepsilon
\hat{R}_t = A_t R_{t-1} A_t^T + \Theta_\varepsilon

  • Update (using the observation and the observation equation):

K_t = \hat{R}_t B_t^T (B_t \hat{R}_t B_t^T + \Theta_\gamma)^{-1}
s_t = \hat{s}_t + K_t (o_t - B_t \hat{s}_t - \mu_\gamma)
R_t = (I - K_t B_t) \hat{R}_t

SLIDE 50

Explaining the Kalman Filter

  • Prediction:

\hat{s}_t = A_t s_{t-1} + \mu_\varepsilon
\hat{R}_t = A_t R_{t-1} A_t^T + \Theta_\varepsilon

  • Update:

K_t = \hat{R}_t B_t^T (B_t \hat{R}_t B_t^T + \Theta_\gamma)^{-1}
s_t = \hat{s}_t + K_t (o_t - B_t \hat{s}_t - \mu_\gamma)
R_t = (I - K_t B_t) \hat{R}_t

The Kalman filter can be explained intuitively without working through the math.

SLIDE 51

The Kalman filter

  • Prediction:

\hat{s}_t = A_t s_{t-1} + \mu_\varepsilon
\hat{R}_t = A_t R_{t-1} A_t^T + \Theta_\varepsilon

  • Update:

K_t = \hat{R}_t B_t^T (B_t \hat{R}_t B_t^T + \Theta_\gamma)^{-1}
s_t = \hat{s}_t + K_t (o_t - B_t \hat{s}_t - \mu_\gamma)
R_t = (I - K_t B_t) \hat{R}_t

The predicted state at time t is obtained simply by propagating the estimated state at t-1 through the state dynamics equation.

SLIDE 52

The Kalman filter

  • Prediction:

\hat{s}_t = A_t s_{t-1} + \mu_\varepsilon
\hat{R}_t = A_t R_{t-1} A_t^T + \Theta_\varepsilon

  • Update:

K_t = \hat{R}_t B_t^T (B_t \hat{R}_t B_t^T + \Theta_\gamma)^{-1}
s_t = \hat{s}_t + K_t (o_t - B_t \hat{s}_t - \mu_\gamma)
R_t = (I - K_t B_t) \hat{R}_t

\hat{R}_t is the uncertainty in the prediction. The variance of the predictor = variance of \varepsilon_t + variance of A s_{t-1}. The two simply add because \varepsilon_t is not correlated with s_t.

SLIDE 53

The Kalman filter

  • Prediction:

\hat{s}_t = A_t s_{t-1} + \mu_\varepsilon
\hat{R}_t = A_t R_{t-1} A_t^T + \Theta_\varepsilon

  • Update:

K_t = \hat{R}_t B_t^T (B_t \hat{R}_t B_t^T + \Theta_\gamma)^{-1}
s_t = \hat{s}_t + K_t (o_t - B_t \hat{s}_t - \mu_\gamma)
R_t = (I - K_t B_t) \hat{R}_t

We can also predict the observation from the predicted state using the observation equation:

\hat{o}_t = B_t \hat{s}_t + \mu_\gamma
SLIDE 54

MAP Recap (for Gaussians)

  • If P(x,y) is Gaussian:

P(x,y) = \mathcal{N}\left( \begin{bmatrix} x \\ y \end{bmatrix} ;\ \begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix},\ \begin{bmatrix} C_{xx} & C_{xy} \\ C_{yx} & C_{yy} \end{bmatrix} \right)

P(y|x) = \mathcal{N}\left( y ;\ \mu_y + C_{yx} C_{xx}^{-1} (x - \mu_x),\ C_{yy} - C_{yx} C_{xx}^{-1} C_{yx}^T \right)

\hat{y} = \mu_y + C_{yx} C_{xx}^{-1} (x - \mu_x)

SLIDE 55

MAP Recap: For Gaussians

  • If P(x,y) is Gaussian:

P(y|x) = \mathcal{N}\left( y ;\ \mu_y + C_{yx} C_{xx}^{-1} (x - \mu_x),\ C_{yy} - C_{yx} C_{xx}^{-1} C_{yx}^T \right)

\hat{y} = \mu_y + C_{yx} C_{xx}^{-1} (x - \mu_x)

C_{yx} C_{xx}^{-1} is the "slope" of the line.

SLIDE 56

The Kalman filter

  • Prediction:

\hat{s}_t = A_t s_{t-1} + \mu_\varepsilon
\hat{R}_t = A_t R_{t-1} A_t^T + \Theta_\varepsilon

  • Update:

K_t = \hat{R}_t B_t^T (B_t \hat{R}_t B_t^T + \Theta_\gamma)^{-1}
s_t = \hat{s}_t + K_t (o_t - B_t \hat{s}_t - \mu_\gamma)
R_t = (I - K_t B_t) \hat{R}_t

K_t is the slope of the MAP estimator that predicts s from o: \hat{R} B^T = C_{so} and (B \hat{R} B^T + \Theta_\gamma) = C_{oo}. This is also called the Kalman Gain.

SLIDE 57

The Kalman filter

  • Prediction:

\hat{s}_t = A_t s_{t-1} + \mu_\varepsilon
\hat{R}_t = A_t R_{t-1} A_t^T + \Theta_\varepsilon

  • Update:

K_t = \hat{R}_t B_t^T (B_t \hat{R}_t B_t^T + \Theta_\gamma)^{-1}
s_t = \hat{s}_t + K_t (o_t - B_t \hat{s}_t - \mu_\gamma)
R_t = (I - K_t B_t) \hat{R}_t

We must correct the predicted value of the state after making an observation. The correction is the difference between the actual observation and the predicted observation \hat{o}_t = B_t \hat{s}_t + \mu_\gamma, scaled by the Kalman Gain.
SLIDE 58

The Kalman filter

  • Prediction:

\hat{s}_t = A_t s_{t-1} + \mu_\varepsilon
\hat{R}_t = A_t R_{t-1} A_t^T + \Theta_\varepsilon

  • Update:

K_t = \hat{R}_t B_t^T (B_t \hat{R}_t B_t^T + \Theta_\gamma)^{-1}
s_t = \hat{s}_t + K_t (o_t - B_t \hat{s}_t - \mu_\gamma)
R_t = (I - K_t B_t) \hat{R}_t

The uncertainty in the state decreases if we observe the data and make a correction. The reduction is a multiplicative "shrinkage" based on the Kalman gain and B.
SLIDE 59

The Kalman filter

  • Prediction:

\hat{s}_t = A_t s_{t-1} + \mu_\varepsilon
\hat{R}_t = A_t R_{t-1} A_t^T + \Theta_\varepsilon

  • Update:

K_t = \hat{R}_t B_t^T (B_t \hat{R}_t B_t^T + \Theta_\gamma)^{-1}
s_t = \hat{s}_t + K_t (o_t - B_t \hat{s}_t - \mu_\gamma)
R_t = (I - K_t B_t) \hat{R}_t

SLIDE 60

The Kalman Filter

  • Very popular for tracking the state of processes
    – Control systems
    – Robotic tracking
      • Simultaneous localization and mapping
    – Radars
    – Even the stock market..
  • What are the parameters of the process?

SLIDE 61

Kalman filter contd.

s_t = A_t s_{t-1} + \varepsilon_t,   o_t = B_t s_t + \gamma_t

  • Model parameters A and B must be known
    – Often the state equation includes an additional driving term: s_t = A_t s_{t-1} + G_t u_t + \varepsilon_t
    – The parameters of the driving term must be known
  • The initial state distribution must be known

SLIDE 62

Defining the parameters

  • The state must be carefully defined
    – E.g. for a robotic vehicle, the state is an extended vector that includes the current velocity and acceleration
  • S = [X, dX, d2X]
  • State equation: must incorporate appropriate constraints
    – If the state includes acceleration and velocity, velocity at the next time = current velocity + acceleration * time step
    – S_t = A S_{t-1} + \varepsilon
  • A = [1 t 0.5t^2; 0 1 t; 0 0 1]
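A sketch of this constant-acceleration transition matrix in numpy (the time-step value dt is an assumption):

```python
import numpy as np

# S = [position, velocity, acceleration]; A implements the slide's constraints.
dt = 0.1
A = np.array([[1.0, dt, 0.5 * dt**2],
              [0.0, 1.0, dt],
              [0.0, 0.0, 1.0]])

S = np.array([0.0, 1.0, 0.2])    # example state: x, dx, d2x
S_next = A @ S                   # S_t = A S_{t-1} (plus the driving term e)
```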

SLIDE 63

Parameters

  • Observation equation:
    – Critical to have an accurate observation equation
    – Must provide a valid relationship between state and observations
  • Observations are typically high-dimensional
    – May have higher or lower dimensionality than the state

SLIDE 64

Problems

s_t = f(s_{t-1}, \varepsilon_t),   o_t = g(s_t, \gamma_t)

  • f() and/or g() may not be nice linear functions
    – Conventional Kalman update rules are no longer valid
  • \varepsilon and/or \gamma may not be Gaussian
    – Gaussian-based update rules are no longer valid

SLIDE 65

Linear Gaussian Model

s_t = A_t s_{t-1} + \varepsilon_t,   o_t = B_t s_t + \gamma_t

P(s_0|O_0) = C \cdot P(s_0)\, P(O_0|s_0)
P(s_1|O_0) = \int P(s_0|O_0)\, P(s_1|s_0)\, ds_0
P(s_1|O_{0:1}) = C \cdot P(s_1|O_0)\, P(O_1|s_1)
P(s_2|O_{0:1}) = \int P(s_1|O_{0:1})\, P(s_2|s_1)\, ds_1
P(s_2|O_{0:2}) = C \cdot P(s_2|O_{0:1})\, P(O_2|s_2)

All distributions remain Gaussian: the a priori P(s_0), the transition probability P(s_t|s_{t-1}) and the state output probability P(O_t|s_t) are Gaussian, and so is every prediction and update.

SLIDE 66

Problems

s_t = f(s_{t-1}, \varepsilon_t),   o_t = g(s_t, \gamma_t)

  • Nonlinear f() and/or g(): the Gaussian assumption breaks down
    – Conventional Kalman update rules are no longer valid

SLIDE 67

The problem with non-linear functions

s_t = f(s_{t-1}, \varepsilon_t),   o_t = g(s_t, \gamma_t)

  • Estimation requires knowledge of P(o|s)
    – Difficult to estimate for nonlinear g()
    – Even if it can be estimated, it may not be tractable within the update loop
  • Estimation also requires knowledge of P(s_t|s_{t-1})
    – Difficult for nonlinear f()
    – May not be amenable to closed-form integration

P(s_t|o_{0:t-1}) = \int P(s_{t-1}|o_{0:t-1})\, P(s_t|s_{t-1})\, ds_{t-1}
P(s_t|o_{0:t}) = C \cdot P(s_t|o_{0:t-1})\, P(o_t|s_t)

SLIDE 68

The problem with nonlinearity

o_t = g(s_t, \gamma_t)

  • The PDF may not have a closed form
  • Even if a closed form exists initially, it will typically become intractable very quickly

For o_t = g(s_t, \gamma_t), a change of variables gives

P(o_t|s_t) = P_\gamma(\gamma)\, \left| J_{g(s_t,\cdot)}(\gamma) \right|^{-1}

where J_{g(s_t,\cdot)}(\gamma) is the Jacobian matrix of g with respect to \gamma, with entries \partial g^{(i)} / \partial \gamma^{(j)}.

SLIDE 69

Example: a simple nonlinearity

o = \log(1 + \exp(s)) + \gamma

  • P(o|s) = ?
    – Assume \gamma is Gaussian
    – P(\gamma) = Gaussian(\gamma; \mu_\gamma, \Theta_\gamma)

SLIDE 70

Example: a simple nonlinearity

o = \log(1 + \exp(s)) + \gamma

  • P(o|s) = ?

P(\gamma) = Gaussian(\gamma; \mu_\gamma, \Theta_\gamma)

P(o|s) = Gaussian(o; \log(1 + \exp(s)) + \mu_\gamma, \Theta_\gamma)

SLIDE 71

Example: At T=0

o = \log(1 + \exp(s)) + \gamma

  • Update. Assume the initial probability P(s) is Gaussian:

P(s) = Gaussian(s; \mu_s, R_s)

P(s|o_0) = C \cdot P(o_0|s)\, P(s) = C \cdot Gaussian(o_0; \log(1 + \exp(s)) + \mu_\gamma, \Theta_\gamma) \cdot Gaussian(s; \mu_s, R_s)

SLIDE 72

UPDATE: At T=0

P(s|o_0) = C \cdot Gaussian(o_0; \log(1 + \exp(s)) + \mu_\gamma, \Theta_\gamma) \cdot Gaussian(s; \mu_s, R_s)

= C \exp\left( -0.5\, (o_0 - \log(1+\exp(s)) - \mu_\gamma)^T \Theta_\gamma^{-1} (o_0 - \log(1+\exp(s)) - \mu_\gamma) - 0.5\, (s - \mu_s)^T R_s^{-1} (s - \mu_s) \right)

= Not Gaussian

SLIDE 73

Prediction for T = 1

Prediction:  P(s_1|o_0) = \int P(s_0|o_0)\, P(s_1|s_0)\, ds_0

Trivial, linear state transition equation:

s_t = s_{t-1} + \varepsilon,   P(\varepsilon) = Gaussian(\varepsilon; \mu_\varepsilon, \Theta_\varepsilon)
P(s_t|s_{t-1}) = Gaussian(s_t; s_{t-1} + \mu_\varepsilon, \Theta_\varepsilon)

P(s_1|o_0) = \int C \exp\left( -0.5\, (o_0 - \log(1+\exp(s_0)) - \mu_\gamma)^T \Theta_\gamma^{-1} (o_0 - \log(1+\exp(s_0)) - \mu_\gamma) - 0.5\, (s_0 - \mu_s)^T R_s^{-1} (s_0 - \mu_s) \right) \exp\left( -0.5\, (s_1 - s_0 - \mu_\varepsilon)^T \Theta_\varepsilon^{-1} (s_1 - s_0 - \mu_\varepsilon) \right) ds_0

= intractable

SLIDE 74

Update at T=1 and later

  • Update at T=1 (intractable):
    P(s_t|o_{0:t}) = C \cdot P(s_t|o_{0:t-1})\, P(o_t|s_t)
  • Prediction for T=2 (intractable):
    P(s_t|o_{0:t-1}) = \int P(s_{t-1}|o_{0:t-1})\, P(s_t|s_{t-1})\, ds_{t-1}

SLIDE 75

The State prediction Equation

s_t = f(s_{t-1}, \varepsilon_t)

  • Similar problems arise for the state prediction equation
  • P(s_t|s_{t-1}) may not have a closed form
  • Even if it does, it may become intractable within the prediction and update equations
    – Particularly the prediction equation, which includes an integration operation

SLIDE 76

Simplifying the problem: Linearize

o = \log(1 + \exp(s)) + \gamma

  • The tangent at any point is a good local approximation if the function is sufficiently smooth

SLIDE 77 - SLIDE 79

Simplifying the problem: Linearize

  • The tangent at any point is a good local approximation if the function is sufficiently smooth

[Figures: tangents to the curve o = \log(1 + \exp(s)) at several different points]

SLIDE 80

Linearizing the observation function

g(s) + \gamma \approx g(\hat{s}_t) + J_g(\hat{s}_t)(s - \hat{s}_t) + \gamma

  • Simple first-order Taylor series expansion
    – J_g() is the Jacobian matrix
      • Simply a derivative for scalar state
  • Expansion around the current predicted (a priori) mean of the state, \hat{s}_t from P(s_t|o_{0:t-1}) = Gaussian(\hat{s}_t, \hat{R}_t)
    – The linear approximation changes with time
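A scalar sketch of this linearization for the running example g(s) = log(1 + exp(s)) (the expansion point is an assumption):

```python
import numpy as np

def g(s):
    return np.log1p(np.exp(s))          # the slide's example nonlinearity

def g_prime(s):                         # d/ds log(1+e^s) = logistic(s)
    return 1.0 / (1.0 + np.exp(-s))

s_hat = 0.5                             # assumed predicted state mean

def g_lin(s):                           # first-order Taylor approximation
    return g(s_hat) + g_prime(s_hat) * (s - s_hat)

print(g(0.6), g_lin(0.6))               # close near s_hat; diverges far away
```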

SLIDE 81

Most probability is in the low-error region

P(s_t|o_{0:t-1}) = Gaussian(\hat{s}_t, \hat{R}_t)

  • P(s_t) is small where the approximation error is large
    – Most of the probability mass of s is in low-error regions, close to the mean

SLIDE 82

The state equation?

s_t = f(s_{t-1}) + \varepsilon,   P(\varepsilon) = Gaussian(\varepsilon; \mu_\varepsilon, \Theta_\varepsilon)

  • Again, direct use of f() can be disastrous
  • Solution: Linearize
    – Linearize around the mean of the updated distribution of s at t-1:
      P(s_{t-1}|o_{0:t-1}) = Gaussian(s_{t-1}; \hat{s}_{t-1}, \hat{R}_{t-1})
    – This converts the system to a linear one:

s_t \approx f(\hat{s}_{t-1}) + J_f(\hat{s}_{t-1})(s_{t-1} - \hat{s}_{t-1}) + \varepsilon

SLIDE 83

Linearized System

s_t = f(s_{t-1}) + \varepsilon \approx f(\hat{s}_{t-1}) + J_f(\hat{s}_{t-1})(s_{t-1} - \hat{s}_{t-1}) + \varepsilon
o_t = g(s_t) + \gamma \approx g(\hat{s}_t) + J_g(\hat{s}_t)(s_t - \hat{s}_t) + \gamma

  • Now we have a simple time-varying linear system
  • The Kalman filter equations directly apply

SLIDE 84

The Extended Kalman filter

Model:  s_t = f(s_{t-1}) + \varepsilon,   o_t = g(s_t) + \gamma
(Assuming \varepsilon and \gamma are 0 mean for simplicity.)

Jacobians used in linearization:  A_t = J_f(s_{t-1}),   B_t = J_g(\hat{s}_t)

  • Prediction:

\hat{s}_t = f(s_{t-1})
\hat{R}_t = A_t R_{t-1} A_t^T + \Theta_\varepsilon

  • Update:

K_t = \hat{R}_t B_t^T (B_t \hat{R}_t B_t^T + \Theta_\gamma)^{-1}
s_t = \hat{s}_t + K_t (o_t - g(\hat{s}_t))
R_t = (I - K_t B_t) \hat{R}_t

SLIDE 85

The Extended Kalman filter

  • Prediction:

\hat{s}_t = f(s_{t-1})
\hat{R}_t = A_t R_{t-1} A_t^T + \Theta_\varepsilon

  • Update:

K_t = \hat{R}_t B_t^T (B_t \hat{R}_t B_t^T + \Theta_\gamma)^{-1}
s_t = \hat{s}_t + K_t (o_t - g(\hat{s}_t))
R_t = (I - K_t B_t) \hat{R}_t

The predicted state at time t is obtained simply by propagating the estimated state at t-1 through the state dynamics equation.

SLIDE 86

The Extended Kalman filter

  • Prediction:

\hat{s}_t = f(s_{t-1})
\hat{R}_t = A_t R_{t-1} A_t^T + \Theta_\varepsilon

  • Update:

K_t = \hat{R}_t B_t^T (B_t \hat{R}_t B_t^T + \Theta_\gamma)^{-1}
s_t = \hat{s}_t + K_t (o_t - g(\hat{s}_t))
R_t = (I - K_t B_t) \hat{R}_t

\hat{R}_t is the uncertainty of the prediction: the variance of the predictor = variance of \varepsilon_t + variance of A s_{t-1}. A is obtained by linearizing f(): A_t = J_f(\hat{s}_{t-1}).

SLIDE 87

The Extended Kalman filter

  • Prediction:

\hat{s}_t = f(s_{t-1})
\hat{R}_t = A_t R_{t-1} A_t^T + \Theta_\varepsilon

  • Update:

K_t = \hat{R}_t B_t^T (B_t \hat{R}_t B_t^T + \Theta_\gamma)^{-1}
s_t = \hat{s}_t + K_t (o_t - g(\hat{s}_t))
R_t = (I - K_t B_t) \hat{R}_t

The Kalman gain is the slope of the MAP estimator that predicts s from o: \hat{R} B^T = C_{so} and (B \hat{R} B^T + \Theta_\gamma) = C_{oo}. B is obtained by linearizing g(): B_t = J_g(\hat{s}_t).

SLIDE 88

The Extended Kalman filter

  • Prediction:

\hat{s}_t = f(s_{t-1})
\hat{R}_t = A_t R_{t-1} A_t^T + \Theta_\varepsilon

  • Update:

K_t = \hat{R}_t B_t^T (B_t \hat{R}_t B_t^T + \Theta_\gamma)^{-1}
s_t = \hat{s}_t + K_t (o_t - g(\hat{s}_t))
R_t = (I - K_t B_t) \hat{R}_t

We can also predict the observation from the predicted state using the observation equation: \hat{o}_t = g(\hat{s}_t).

SLIDE 89

The Extended Kalman filter

  • Prediction:

\hat{s}_t = f(s_{t-1})
\hat{R}_t = A_t R_{t-1} A_t^T + \Theta_\varepsilon

  • Update:

K_t = \hat{R}_t B_t^T (B_t \hat{R}_t B_t^T + \Theta_\gamma)^{-1}
s_t = \hat{s}_t + K_t (o_t - g(\hat{s}_t))
R_t = (I - K_t B_t) \hat{R}_t

We must correct the predicted value of the state after making an observation. The correction is the difference between the actual observation and the predicted observation g(\hat{s}_t), scaled by the Kalman Gain.

SLIDE 90

The Extended Kalman filter

  • Prediction:

\hat{s}_t = f(s_{t-1})
\hat{R}_t = A_t R_{t-1} A_t^T + \Theta_\varepsilon

  • Update:

K_t = \hat{R}_t B_t^T (B_t \hat{R}_t B_t^T + \Theta_\gamma)^{-1}
s_t = \hat{s}_t + K_t (o_t - g(\hat{s}_t))
R_t = (I - K_t B_t) \hat{R}_t

The uncertainty in the state decreases if we observe the data and make a correction. The reduction is a multiplicative "shrinkage" based on the Kalman gain and B = J_g(\hat{s}_t).

SLIDE 91

The Extended Kalman filter

Model:  s_t = f(s_{t-1}) + \varepsilon,   o_t = g(s_t) + \gamma
Jacobians:  A_t = J_f(s_{t-1}),   B_t = J_g(\hat{s}_t)

  • Prediction:

\hat{s}_t = f(s_{t-1})
\hat{R}_t = A_t R_{t-1} A_t^T + \Theta_\varepsilon

  • Update:

K_t = \hat{R}_t B_t^T (B_t \hat{R}_t B_t^T + \Theta_\gamma)^{-1}
s_t = \hat{s}_t + K_t (o_t - g(\hat{s}_t))
R_t = (I - K_t B_t) \hat{R}_t
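A minimal scalar sketch of one EKF step per the equations above (zero-mean noises assumed; for scalar state the Jacobians reduce to derivatives). The example f and g at the bottom are illustrative choices, not part of the slides' derivation:

```python
import numpy as np

def ekf_step(s_prev, R_prev, o, f, f_prime, g, g_prime, Theta_e, Theta_g):
    # PREDICT: propagate through f; linearize f around the previous estimate
    s_pred = f(s_prev)
    A = f_prime(s_prev)
    R_pred = A * R_prev * A + Theta_e
    # UPDATE: linearize g around the predicted state
    B = g_prime(s_pred)
    K = R_pred * B / (B * R_pred * B + Theta_g)      # Kalman gain
    s_new = s_pred + K * (o - g(s_pred))             # correct the prediction
    R_new = (1.0 - K * B) * R_pred                   # shrink the uncertainty
    return s_new, R_new

# Example usage with the running wind model and nonlinearity (assumed values):
f = lambda s: s                                  # trivial state equation
f_prime = lambda s: 1.0
g = lambda s: np.log1p(np.exp(s))                # the slides' example g()
g_prime = lambda s: 1.0 / (1.0 + np.exp(-s))
s, R = ekf_step(0.0, 1.0, o=0.9, f=f, f_prime=f_prime,
                g=g, g_prime=g_prime, Theta_e=0.1, Theta_g=0.5)
```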

SLIDE 92

EKFs

  • EKFs are probably the most commonly used algorithm for tracking and prediction
    – Most systems are non-linear
    – Specifically, the relationship between state and observation is usually nonlinear
    – The approach can be extended to include non-linear functions of noise as well
  • The term "Kalman filter" often simply refers to an extended Kalman filter in most contexts.
  • But..

SLIDE 93

EKFs have limitations

  • If the non-linearity changes too quickly with s, the linear approximation is invalid
    – Unstable
  • The estimate is often biased
    – The true function lies entirely on one side of the approximation
  • Various extensions have been proposed:
    – Invariant extended Kalman filters (IEKF)
    – Unscented Kalman filters (UKF)

SLIDE 94

Conclusions

  • HMMs are predictive models
  • Continuous-state models are simple extensions of HMMs
    – The same math applies
  • Prediction of linear, Gaussian systems can be performed by Kalman filtering
  • Prediction of non-linear, Gaussian systems can be performed by Extended Kalman filtering