Machine Learning for Signal Processing
Prediction and Estimation from Time Series
Bhiksha Raj 15 Nov 2016
11-755/18797 1
Preliminaries: P(y|x) for a Gaussian

If P(x, y) is jointly Gaussian:

P(x, y) = N( [μ_x; μ_y], [C_xx C_xy; C_yx C_yy] )

then the conditional is also Gaussian:

P(y|x) = N( μ_y + C_yx C_xx^-1 (x - μ_x),  C_yy - C_yx C_xx^-1 C_xy )

Reading the conditional mean term by term:
– μ_y is the best guess for Y when X is not known.
– C_yx C_xx^-1 (x - μ_x) is the correction of Y using the information in X: a slope (C_yx C_xx^-1) times the offset of X from its mean. Given an X value, we update our guess of Y based on the information in X. The correction is 0 if X and Y are uncorrelated, i.e. C_yx = 0.

Reading the conditional variance:
– C_yy is the uncertainty in Y when X is not known.
– C_yx C_xx^-1 C_xy is the shrinkage of that uncertainty from knowing X. The shrinkage of variance is 0 if X and Y are uncorrelated, i.e. C_yx = 0.

Given an X value, μ_y + C_yx C_xx^-1 (x - μ_x) is the mean of Y given X (the MAP estimate of Y), and C_yy - C_yx C_xx^-1 C_xy is the variance of Y when X is known, compared to the overall variance C_yy when X is unknown. Knowing X modifies the mean of Y and shrinks its variance.
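The conditioning formula above can be sketched in a few lines. This is a minimal scalar illustration; the joint mean and covariance values below are assumed for the example, not taken from the lecture.

```python
# Hypothetical joint Gaussian over (x, y); all numbers are made up
# for illustration.
mu_x, mu_y = 1.0, 2.0
C_xx, C_xy = 2.0, 1.2
C_yx, C_yy = 1.2, 3.0

def conditional_y_given_x(x):
    """Mean and variance of P(y|x) for a jointly Gaussian (x, y)."""
    mean = mu_y + C_yx / C_xx * (x - mu_x)   # best guess + correction
    var = C_yy - C_yx / C_xx * C_xy          # overall variance - shrinkage
    return mean, var

mean, var = conditional_y_given_x(3.0)
# x = 3 lies above mu_x and C_yx > 0, so the correction is positive,
# and the variance shrinks from 3.0 to 3.0 - 1.2*1.2/2.0 = 2.28.
```

Note that if C_yx were 0, the mean would stay at μ_y and the variance at C_yy, matching the "uncorrelated" case in the text.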
A discrete example: a car with four hidden states, each with a Gaussian output distribution over the observed sound level.

[Figure: the four output distributions P(x|state).]
– Idling state: P(x|idle), centered at 45 dB
– Accelerating state: P(x|accel), centered at 70 dB
– Cruising state: P(x|cruise), centered at 65 dB
– Decelerating state: P(x|decel), centered at 60 dB

Transition probabilities P(next state | current state) (blank entries are 0):

         I      A      C      D
  I     0.5    0.5
  A            1/3    1/3    1/3
  C            1/3    1/3    1/3
  D     0.25   0.25   0.25   0.25
First observation: x1 = 68 dB. Evaluate each state's output distribution at 68 dB to get the likelihoods P(x1|state): P(68 dB|idle) is tiny (about 0.0001), while the states whose means lie near 68 dB score much higher (values of 0.5 and 0.7 on the slide).

These likelihoods don't have to sum to 1. They can even be greater than 1, because they are density values, not probabilities.

To turn the likelihoods into a belief over (Idling, Decelerating, Cruising, Accelerating), remember the prior: P(state).
Update: multiply the prior P(state) by the likelihoods P(x1|state) and renormalize to get the posterior over states given the first observation:

P(S | x1) ∝ P(S) P(x1 | S)

For this observation the posterior over (Idling, Decelerating, Cruising, Accelerating) comes out to approximately (0.0, 0.57, 0.42, 8.3 × 10^-5). These values are rounded; in reality, they sum to 1.0.

Predict: propagate the posterior through the transition matrix to obtain the predicted state distribution at the next time step.
Second observation: x2 = 63 dB. Again evaluate the output distributions at the observed value to get the likelihoods P(x2|state); the slide shows values of 0.2, 0.5 and 0.01 among the four states.

The prior for this step is the predicted distribution carried over from the previous step, P(state|x1): remember the prior. Update it with the new likelihoods, then predict forward again through the transition probabilities P(state′|state).

[Figure: the trellis of states S0, S1, ..., St-1 unrolled over time, with the four states Idling, Decelerating, Cruising, Accelerating at each step.]

Each step needs only the belief computed at the previous step – a natural outcome of the Markov nature of the model.
This is the general recursive estimation loop. At each time T:

1. PREDICT the distribution of the state at T:
   P(ST | x0:T-1) = Σ_{ST-1} P(ST-1 | x0:T-1) P(ST | ST-1)
2. UPDATE the distribution of the state at T after observing xT:
   P(ST | x0:T) = C · P(ST | x0:T-1) P(xT | ST)
   where the normalizer is C = 1 / Σ_S P(S | x0:T-1) P(xT | S)
3. Set T = T + 1 and repeat.

In the car example, successive passes through the loop produce successive beliefs over (Idling, Decelerating, Cruising, Accelerating), e.g. (0.0, 0.57, 0.42, 8.3 × 10^-5) at one step and (0.0, 0.713, 0.285, 0.0014) at the next.

If a point estimate of the state is needed at each step, take the most probable state:

Estimate(ST) = argmax_{ST} P(ST | x0:T)
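The predict/update loop above can be sketched directly for the car example. The transition matrix and the state means follow the lecture's numbers; the output standard deviation and the uniform initial prior are assumed for illustration, so the resulting beliefs will not exactly reproduce the slide's values.

```python
import numpy as np

states = ["idle", "accel", "cruise", "decel"]
# Rows: current state (I, A, C, D); columns: next state (I, A, C, D).
A = np.array([[0.50, 0.50, 0.00, 0.00],
              [0.00, 1/3., 1/3., 1/3.],
              [0.00, 1/3., 1/3., 1/3.],
              [0.25, 0.25, 0.25, 0.25]])
means = np.array([45.0, 70.0, 65.0, 60.0])   # dB, one per state
std = 3.0                                    # assumed, not in the slides

def likelihood(x):
    """Gaussian density of observation x under each state's output model."""
    return np.exp(-0.5 * ((x - means) / std) ** 2) / (std * np.sqrt(2 * np.pi))

belief = np.full(4, 0.25)        # assumed uniform initial prior
for x in [68.0, 63.0]:           # observed sound levels in dB
    # UPDATE: multiply by the likelihood, renormalize.
    belief = belief * likelihood(x)
    belief /= belief.sum()
    # PREDICT: push the belief through the transition matrix.
    belief = belief @ A
```

At any point, `states[belief.argmax()]` gives the argmax point estimate of the state.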
Continuous-state systems

More generally, the state need not be discrete. Consider a state-space model:

s_t = f(s_{t-1}, ε_t)
o_t = g(s_t, γ_t)

– s_t is the state of the system at time t.
– ε_t is a driving function, which is assumed to be random.
– The state at time t is a function of the state at the previous time instant and the driving term at the current time.
– o_t is the observation at time t.
– γ_t is the noise affecting the observation (also random).
– The observation is a function of the state of the system and the noise.
Examples: the state may be the position of an automobile or of a star; the observations are sensor readings (for the automobile) or the recorded image (for the telescope).

[Figure: prediction at time 0, update after O0, prediction at time 1, update after O1, illustrated with an initial distribution P0(s) over the state.]

The same predict/update recursion applies, with the sum over discrete states replaced by an integral:

P(s0) = P0(s0)
P(s0 | O0) = C · P(s0) P(O0 | s0)
P(s1 | O0) = ∫ P(s0 | O0) P(s1 | s0) ds0
P(s1 | O0:1) = C · P(s1 | O0) P(O1 | s1)

and in general:

P(st | O0:t-1) = ∫ P(st-1 | O0:t-1) P(st | st-1) dst-1    (predict)
P(st | O0:t) = C · P(st | O0:t-1) P(Ot | st)              (update)
Parameters of the model:
– Initial state probability: P(s0)
– Transition probability: P(st | st-1)
– Observation probability: P(Ot | st)
In the Gaussian case the driving term and the observation noise are Gaussian:

P(ε) = 1 / ( (2π)^{d/2} |Θ_ε|^{0.5} ) · exp( -0.5 (ε - μ_ε)^T Θ_ε^-1 (ε - μ_ε) )
P(γ) = 1 / ( (2π)^{d/2} |Θ_γ|^{0.5} ) · exp( -0.5 (γ - μ_γ)^T Θ_γ^-1 (γ - μ_γ) )
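The density formula above translates directly to code. This is a small sketch; the dimension, mean and covariance below are assumed example values (the peak of a standard 2-D Gaussian is 1/(2π)).

```python
import numpy as np

def gaussian_density(e, mu, Theta):
    """Multivariate Gaussian density:
    1/((2*pi)^(d/2) |Theta|^(1/2)) * exp(-0.5 (e-mu)^T Theta^-1 (e-mu))."""
    d = len(mu)
    diff = e - mu
    norm = 1.0 / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Theta)))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(Theta) @ diff)

mu = np.zeros(2)
Theta = np.eye(2)
p = gaussian_density(np.zeros(2), mu, Theta)   # density at the mean
```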
THE KALMAN FILTER

Consider the linear Gaussian model:

s_t = A s_{t-1} + ε_t
o_t = B s_t + γ_t

with Gaussian ε and γ as above, and a Gaussian initial distribution P0(s). Every distribution in the predict/update recursion then stays Gaussian, and the recursion reduces to closed-form updates of a mean and a covariance: the Kalman filter.

Predict:
  s̄_t = A ŝ_{t-1} + μ_ε
  Σ̄_t = A Σ̂_{t-1} A^T + Θ_ε

Kalman gain:
  K_t = Σ̄_t B^T ( B Σ̄_t B^T + Θ_γ )^-1

Update:
  ŝ_t = s̄_t + K_t ( o_t - B s̄_t - μ_γ )
  Σ̂_t = ( I - K_t B ) Σ̄_t

Here s̄_t and Σ̄_t are the predicted mean and covariance of the state, and ŝ_t and Σ̂_t are the updated (posterior) mean and covariance after observing o_t.
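The equations above can be run as-is for a tiny tracking problem. This is a minimal sketch for a 1-D constant-position model observed through noise; A, B, the noise covariances, the initial belief and the observation sequence are all assumed values for illustration, and μ_ε, μ_γ are taken as zero.

```python
import numpy as np

A = np.array([[1.0]])          # state transition matrix
B = np.array([[1.0]])          # observation matrix
Th_eps = np.array([[0.01]])    # driving-noise covariance
Th_gam = np.array([[1.0]])     # observation-noise covariance

s_hat = np.array([[0.0]])      # initial state estimate
Sig = np.array([[10.0]])       # initial (large) uncertainty

for o in [4.8, 5.2, 5.1, 4.9]:         # noisy readings of a value near 5
    # Predict
    s_bar = A @ s_hat
    Sig_bar = A @ Sig @ A.T + Th_eps
    # Kalman gain
    K = Sig_bar @ B.T @ np.linalg.inv(B @ Sig_bar @ B.T + Th_gam)
    # Update
    s_hat = s_bar + K @ (np.array([[o]]) - B @ s_bar)
    Sig = (np.eye(1) - K @ B) @ Sig_bar
```

After a few observations the mean moves toward the true value and the covariance shrinks well below its initial value, exactly the "knowing X shrinks the variance of Y" behavior from the Gaussian preliminaries.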
Recall the Gaussian conditioning result from the preliminaries: if

P(x, y) = N( [μ_x; μ_y], [C_xx C_xy; C_yx C_yy] )

then

P(y|x) = N( μ_y + C_yx C_xx^-1 (x - μ_x),  C_yy - C_yx C_xx^-1 C_xy )

The Kalman update has exactly this form: the observation plays the role of x and the state plays the role of y.
For the linear Gaussian model the recursion preserves Gaussianity at every step:

P(s0 | O0) = C · P(s0) P(O0 | s0)
P(s1 | O0) = ∫ P(s0 | O0) P(s1 | s0) ds0
P(s1 | O0:1) = C · P(s1 | O0) P(O1 | s1)
P(s2 | O0:1) = ∫ P(s1 | O0:1) P(s2 | s1) ds1
P(s2 | O0:2) = C · P(s2 | O0:1) P(O2 | s2)

All distributions remain Gaussian, because the a priori distribution P(s0), the transition probability P(st | st-1), and the state output probability P(Ot | st) are all Gaussian.
What if f() and g() are not linear?

s_t = f(s_{t-1}, ε_t)
o_t = g(s_t, γ_t)

The update step needs P(Ot | st):
– Difficult to estimate for nonlinear g(); even if it can be estimated, it may not be tractable within the update loop.

The prediction step needs the integral

P(st | O0:t-1) = ∫ P(st-1 | O0:t-1) P(st | st-1) dst-1

– Difficult for nonlinear f(); it may not be amenable to closed-form integration.
[Figure: a nonlinear observation function g(s) and the resulting distribution of the observation.]

Assume the initial probability P(s) is Gaussian. Under a nonlinear g(), the distribution of the observation, and hence the updated state distribution, is in general no longer Gaussian.
Example: a trivial, linear state transition equation

s_t = s_{t-1} + ε_t

paired with a nonlinear observation o = log(1 + exp(s)) plus Gaussian noise. The recursion then involves integrals of the form

∫ C · exp( -0.5 (s - ŝ)^T R^-1 (s - ŝ) ) · exp( -0.5 (o - log(1 + exp(s)))^T Θ^-1 (o - log(1 + exp(s))) ) ds

which have no closed form: the recursion is intractable.
[Figure: the observation function o = log(1 + exp(s)), with the state distribution propagated through it at successive steps.]

The key observation: most of the probability mass lies close to the mean, so the behavior of the function matters mainly in a small region around the mean.
Solution: Linearize

Direct use of the nonlinear f() in the prediction can be disastrous. Instead, linearize f() around the mean of the updated distribution at the previous step; this converts the system to a linear one:

s_t ≈ f(ŝ_{t-1}) + J_f(ŝ_{t-1}) (s_{t-1} - ŝ_{t-1}) + ε_t

Similarly, linearize g() around the predicted mean s̄_t:

o_t ≈ g(s̄_t) + J_g(s̄_t) (s_t - s̄_t) + γ_t

where J_f and J_g are the Jacobians of f and g. Applying the Kalman recursions to this linearized system (assuming ε and γ are zero-mean for simplicity) gives the extended Kalman filter:

Predict:
  s̄_t = f(ŝ_{t-1})
  Σ̄_t = J_f(ŝ_{t-1}) Σ̂_{t-1} J_f(ŝ_{t-1})^T + Θ_ε

Gain:
  K_t = Σ̄_t J_g(s̄_t)^T ( J_g(s̄_t) Σ̄_t J_g(s̄_t)^T + Θ_γ )^-1

Update:
  ŝ_t = s̄_t + K_t ( o_t - g(s̄_t) )
  Σ̂_t = ( I - K_t J_g(s̄_t) ) Σ̄_t
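The EKF recursion can be sketched for the scalar example used earlier: a linear (identity) state transition with a nonlinear observation o = log(1 + exp(s)) + noise. The noise variances, the initial belief and the observation sequence are assumed values for illustration.

```python
import numpy as np

def g(s):                      # nonlinear observation function
    return np.log1p(np.exp(s))

def J_g(s):                    # its derivative (1-D Jacobian): the sigmoid
    return 1.0 / (1.0 + np.exp(-s))

Th_eps, Th_gam = 0.01, 0.25    # process / observation noise variances
s_hat, Sig = 0.0, 4.0          # initial mean and variance of the belief

for o in [1.9, 2.1, 2.0]:      # noisy observations
    # Predict (f is the identity here, so J_f = 1)
    s_bar = s_hat
    Sig_bar = Sig + Th_eps
    # Linearize g around the predicted mean and form the gain
    J = J_g(s_bar)
    K = Sig_bar * J / (J * Sig_bar * J + Th_gam)
    # Update
    s_hat = s_bar + K * (o - g(s_bar))
    Sig = (1.0 - K * J) * Sig_bar
```

Note that the Jacobian is re-evaluated at every step at the current predicted mean; the quality of the whole filter hinges on how good that local linear approximation is, which is exactly where the EKF's failure modes come from.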
The EKF fails when the linear approximation is invalid:
– It can be unstable.
– The errors are systematic when the true function lies entirely on one side of the approximation.

Alternatives that address these problems include:
– Invariant extended Kalman filters (IEKF)
– Unscented Kalman filters (UKF)