
Predicting and Estimation from Time Series. Bhiksha Raj, 25 Nov 2014.



  1. Machine Learning for Signal Processing: Predicting and Estimation from Time Series. Bhiksha Raj, 25 Nov 2014. 11-755/18797

  2. Administrivia • Final class on Tuesday the 2nd • Project demos: Thursday, 4 December – Before exams week • Problem: how to set up posters for SV students? – Find a representative here?

  3. An automotive example • Determine automatically, by only listening to a running automobile, whether it is: – Idling; or – Travelling at constant velocity; or – Accelerating; or – Decelerating • Assume (for illustration) that we only record the energy level (SPL) of the sound – The SPL is measured once per second

  4. What we know • An automobile that is at rest can accelerate, or continue to stay at rest • An accelerating automobile can hit a steady-state velocity, continue to accelerate, or decelerate • A decelerating automobile can continue to decelerate, come to rest, cruise, or accelerate • An automobile at a steady-state velocity can stay in steady state, accelerate, or decelerate

  5. What else we know [Figure: the four emission densities P(x|idle), P(x|decel), P(x|cruise), P(x|accel), peaking near 45, 60, 65, and 70 dB SPL respectively] • The probability distribution of the SPL of the sound is different in the various conditions – As shown in the figure – In reality, it depends on the car • The distributions for the different conditions overlap – Simply knowing the current sound level is not enough to know the state of the car

  6. The Model! [Figure: four-state diagram with Idling, Accelerating, Cruising, and Decelerating states, each emitting its own SPL density] • The state-space model, assuming all transitions from a state are equally probable • The transition probabilities (row = current state, column = next state) are:

           I     A     C     D
      I    0.5   0.5   0     0
      A    0     1/3   1/3   1/3
      C    0     1/3   1/3   1/3
      D    0.25  0.25  0.25  0.25
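The transition table above can be written directly as a stochastic matrix; a minimal sketch in NumPy, using the state order [Idle, Accel, Cruise, Decel] from the slide:

```python
import numpy as np

# Transition matrix from the slide; each row is the distribution
# over next states given the current state.
A = np.array([
    [0.50, 0.50, 0.00, 0.00],   # Idle -> stay idle or start accelerating
    [0.00, 1/3,  1/3,  1/3 ],   # Accel -> keep accelerating, cruise, or decelerate
    [0.00, 1/3,  1/3,  1/3 ],   # Cruise -> accelerate, keep cruising, or decelerate
    [0.25, 0.25, 0.25, 0.25],   # Decel -> any state, including coming to rest
])

# Every row must sum to 1 for A to be a valid stochastic matrix.
print(np.allclose(A.sum(axis=1), 1.0))  # True
```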

  7. Estimating the state at T = 0- [Figure: uniform initial distribution, 0.25 each for Idling, Accelerating, Cruising, and Decelerating] • At T=0, before the first observation, we know nothing of the state – Assume all states are equally likely

  8. The first observation [Figure: the four emission densities, peaking near 45, 60, 65, and 70 dB SPL] • At T=0 we observe the sound level x_0 = 68 dB SPL – The observation modifies our belief in the state of the system • P(x_0 | idle) = 0 • P(x_0 | deceleration) = 0.0001 • P(x_0 | acceleration) = 0.7 • P(x_0 | cruising) = 0.5 – Note, these don't have to sum to 1 – In fact, since these are densities, any of them can be > 1

  9. Estimating the state after observing x_0 • P(state | x_0) = C P(state) P(x_0 | state) – P(idle | x_0) = 0 – P(deceleration | x_0) = C 0.000025 – P(cruising | x_0) = C 0.125 – P(acceleration | x_0) = C 0.175 • Normalizing – P(idle | x_0) = 0 – P(deceleration | x_0) = 0.000083 – P(cruising | x_0) = 0.42 – P(acceleration | x_0) = 0.57
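This Bayes update is a one-line computation; a sketch that reproduces the slide's numbers (state order [Idle, Accel, Cruise, Decel], likelihood values read off the densities as on slide 8):

```python
import numpy as np

# Posterior over states after the first observation x_0 = 68 dB SPL.
prior = np.array([0.25, 0.25, 0.25, 0.25])        # uniform belief before x_0
likelihood = np.array([0.0, 0.7, 0.5, 0.0001])    # P(x_0 | state)

unnormalized = prior * likelihood                 # P(state) P(x_0 | state)
posterior = unnormalized / unnormalized.sum()     # the constant C is 1/sum
```

The exact normalized values are about 0.583 (accelerating) and 0.417 (cruising); the slide reports them rounded as 0.57 and 0.42.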

  10. Estimating the state at T = 0+ [Figure: updated distribution: Idling 0.0, Accelerating 0.57, Cruising 0.42, Decelerating 8.3 x 10^-5] • At T=0, after the first observation, we must update our belief about the states – The first observation provided some evidence about the state of the system – It modifies our belief in the state of the system

  11. Predicting the state at T=1 [Figure: the T=0+ distribution (Idling 0.0, Accelerating 0.57, Cruising 0.42, Decelerating 8.3 x 10^-5) alongside the transition matrix] • Predicting the probability of idling at T=1 – P(idling | idling) = 0.5; P(idling | deceleration) = 0.25 – P(idling at T=1 | x_0) = P(I_{T=0} | x_0) P(I|I) + P(D_{T=0} | x_0) P(I|D) = 2.1 x 10^-5 • In general, for any state S – P(S_{T=1} | x_0) = Σ_{S_{T=0}} P(S_{T=0} | x_0) P(S_{T=1} | S_{T=0})

  12. Predicting the state at T = 1 [Figure: predicted distribution at T=1: Idling 2.1 x 10^-5, Accelerating 0.33, Cruising 0.33, Decelerating 0.33 (rounded; in reality they sum to 1.0)] P(S_{T=1} | x_0) = Σ_{S_{T=0}} P(S_{T=0} | x_0) P(S_{T=1} | S_{T=0})
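The prediction step is a vector-matrix product: push the updated T=0 belief through the transition matrix. A sketch, with the matrix and posterior values taken from the slides (state order [Idle, Accel, Cruise, Decel]):

```python
import numpy as np

A = np.array([[0.50, 0.50, 0.00, 0.00],
              [0.00, 1/3,  1/3,  1/3 ],
              [0.00, 1/3,  1/3,  1/3 ],
              [0.25, 0.25, 0.25, 0.25]])
posterior_T0 = np.array([0.0, 0.5833, 0.4166, 8.3e-5])

# P(S_{T=1} | x_0) = sum over S_{T=0} of P(S_{T=0} | x_0) P(S_{T=1} | S_{T=0})
predicted_T1 = posterior_T0 @ A
```

This recovers the slide's values: about 2.1 x 10^-5 for idling and roughly 1/3 for each of the other three states.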

  13. Updating after the observation at T=1 [Figure: the four emission densities, peaking near 45, 60, 65, and 70 dB SPL] • At T=1 we observe x_1 = 63 dB SPL • P(x_1 | idle) = 0 • P(x_1 | deceleration) = 0.2 • P(x_1 | acceleration) = 0.001 • P(x_1 | cruising) = 0.5

  14. Update after observing x_1 • P(state | x_{0:1}) = C P(state | x_0) P(x_1 | state) – P(idle | x_{0:1}) = 0 – P(deceleration | x_{0:1}) = C 0.066 – P(cruising | x_{0:1}) = C 0.165 – P(acceleration | x_{0:1}) = C 0.00033 • Normalizing – P(idle | x_{0:1}) = 0 – P(deceleration | x_{0:1}) = 0.285 – P(cruising | x_{0:1}) = 0.713 – P(acceleration | x_{0:1}) = 0.0014

  15. Estimating the state at T = 1+ [Figure: updated distribution at T=1: Idling 0.0, Accelerating 0.0014, Cruising 0.713, Decelerating 0.285] • The updated probability at T=1 incorporates information from both x_0 and x_1 – It is NOT a local decision based on x_1 alone – Because of the Markov nature of the process, the state at T=0 affects the state at T=1 • x_0 provides evidence for the state at T=1

  16. Estimating a Unique State • What we have estimated is a distribution over the states • If we had to guess a state, we would pick the most likely state from the distributions [Figure: T=0 distribution (Idling 0.0, Accelerating 0.57, Cruising 0.42, Decelerating 8.3 x 10^-5)] • State(T=0) = Accelerating [Figure: T=1 distribution (Idling 0.0, Accelerating 0.0014, Cruising 0.713, Decelerating 0.285)] • State(T=1) = Cruising

  17. Overall procedure PREDICT: P(S_T | x_{0:T-1}) = Σ_{S_{T-1}} P(S_{T-1} | x_{0:T-1}) P(S_T | S_{T-1}) (predict the distribution of the state at T); UPDATE: P(S_T | x_{0:T}) = C P(S_T | x_{0:T-1}) P(x_T | S_T) (update the distribution after observing x_T); then set T = T+1 and repeat • At T=0 the predicted state distribution is the initial state probability • At each time T, the current estimate of the distribution over states considers all observations x_0 ... x_T – A natural outcome of the Markov nature of the model • The prediction+update is identical to the forward computation for HMMs to within a normalizing constant
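The whole predict-update recursion fits in a few lines. A minimal sketch of the filter for the car example, with the transition matrix and per-observation likelihood values taken from the slides (state order [Idle, Accel, Cruise, Decel]):

```python
import numpy as np

A = np.array([[0.50, 0.50, 0.00, 0.00],
              [0.00, 1/3,  1/3,  1/3 ],
              [0.00, 1/3,  1/3,  1/3 ],
              [0.25, 0.25, 0.25, 0.25]])

def update(predicted, likelihood):
    p = predicted * likelihood      # P(S_T | x_{0:T-1}) P(x_T | S_T)
    return p / p.sum()              # normalize by C

# At T=0 the predicted distribution is just the initial state probability.
belief = update(np.array([0.25] * 4), np.array([0.0, 0.7, 0.5, 0.0001]))  # x_0 = 68 dB
# At T=1: PREDICT by pushing the belief through A, then UPDATE with x_1.
belief = update(belief @ A, np.array([0.0, 0.001, 0.5, 0.2]))             # x_1 = 63 dB
```

The final belief comes out as approximately [0, 0.0014, 0.713, 0.285], matching slide 15.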

  18. Comparison to the Forward Algorithm PREDICT: P(S_T | x_{0:T-1}) = Σ_{S_{T-1}} P(S_{T-1} | x_{0:T-1}) P(S_T | S_{T-1}); UPDATE: P(S_T | x_{0:T}) = C P(S_T | x_{0:T-1}) P(x_T | S_T) • Forward algorithm: – P(x_{0:T}, S_T) = P(x_T | S_T) Σ_{S_{T-1}} P(x_{0:T-1}, S_{T-1}) P(S_T | S_{T-1}) – The sum is the PREDICT term; the multiplication by P(x_T | S_T) is the UPDATE • Normalized: – P(S_T | x_{0:T}) = (Σ_{S'_T} P(x_{0:T}, S'_T))^-1 P(x_{0:T}, S_T) = C P(x_{0:T}, S_T)

  19. Decomposing the forward algorithm • P(x_{0:T}, S_T) = P(x_T | S_T) Σ_{S_{T-1}} P(x_{0:T-1}, S_{T-1}) P(S_T | S_{T-1}) • Predict: P(x_{0:T-1}, S_T) = Σ_{S_{T-1}} P(x_{0:T-1}, S_{T-1}) P(S_T | S_{T-1}) • Update: P(x_{0:T}, S_T) = P(x_T | S_T) P(x_{0:T-1}, S_T)
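The same two time steps can be run through the unnormalized forward variable alpha_T(S) = P(x_{0:T}, S_T); normalizing only at the end recovers the same filtered posterior, illustrating the "identical to within a normalizing constant" claim. A sketch, with all numbers from the slides:

```python
import numpy as np

A = np.array([[0.50, 0.50, 0.00, 0.00],
              [0.00, 1/3,  1/3,  1/3 ],
              [0.00, 1/3,  1/3,  1/3 ],
              [0.25, 0.25, 0.25, 0.25]])

# Base case: alpha_0(S) = P(S) P(x_0 | S)
alpha = np.array([0.25] * 4) * np.array([0.0, 0.7, 0.5, 0.0001])
# Recursion: alpha_1(S) = P(x_1 | S) * sum_{S'} alpha_0(S') P(S | S')
alpha = np.array([0.0, 0.001, 0.5, 0.2]) * (alpha @ A)

posterior = alpha / alpha.sum()   # identical to the predict+update result
```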

  20. Estimating the state PREDICT: P(S_T | x_{0:T-1}) = Σ_{S_{T-1}} P(S_{T-1} | x_{0:T-1}) P(S_T | S_{T-1}); UPDATE: P(S_T | x_{0:T}) = C P(S_T | x_{0:T-1}) P(x_T | S_T); Estimate(S_T) = argmax_{S_T} P(S_T | x_{0:T}) • The state is estimated from the updated distribution – The updated distribution, not the state estimate, is propagated forward in time

  21. Predicting the next observation PREDICT: P(S_T | x_{0:T-1}) = Σ_{S_{T-1}} P(S_{T-1} | x_{0:T-1}) P(S_T | S_{T-1}); UPDATE: P(S_T | x_{0:T}) = C P(S_T | x_{0:T-1}) P(x_T | S_T) • The probability distribution for the observation at the next time is a mixture: – P(x_T | x_{0:T-1}) = Σ_{S_T} P(x_T | S_T) P(S_T | x_{0:T-1}) • The actual observation can be predicted from P(x_T | x_{0:T-1})
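The predictive mixture is easy to evaluate once the per-state densities are parameterized. A sketch assuming Gaussian emission densities: the means follow the slide figures (45/70/65/60 dB), but the standard deviations are hypothetical, since the slides only sketch the curves.

```python
import numpy as np

means = np.array([45.0, 70.0, 65.0, 60.0])   # [Idle, Accel, Cruise, Decel]
stds = np.array([2.0, 2.0, 2.0, 2.0])        # assumed widths, not from the slides
pred = np.array([2.1e-5, 0.33, 0.33, 0.33])  # P(S_T | x_{0:T-1}) from slide 12

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def predictive_density(x):
    # P(x_T | x_{0:T-1}) = sum_S P(x_T | S) P(S | x_{0:T-1})
    return float(np.sum(pred * gaussian(x, means, stds)))
```

With these state probabilities, the density is high near the cruising/accelerating/decelerating means and negligible near the idling mean of 45 dB.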

  22. Predicting the next observation • MAP estimate: – argmax_{x_T} P(x_T | x_{0:T-1}) • MMSE estimate: – E[x_T | x_{0:T-1}]
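Under the same assumed Gaussian mixture (means from the slide figures, widths hypothetical), the two estimates are easy to compute: the MMSE estimate is the mixture mean, i.e. the weighted sum of the state means, and the MAP estimate is the mixture mode, found here by a simple grid search.

```python
import numpy as np

means = np.array([45.0, 70.0, 65.0, 60.0])   # [Idle, Accel, Cruise, Decel]
stds = np.array([2.0, 2.0, 2.0, 2.0])        # assumed widths
pred = np.array([2.1e-5, 0.33, 0.33, 0.33])  # P(S_T | x_{0:T-1}) from slide 12
pred = pred / pred.sum()                     # renormalize the rounded values

def mixture(x):
    return float(np.sum(pred * np.exp(-0.5 * ((x - means) / stds) ** 2)
                        / (stds * np.sqrt(2 * np.pi))))

# MMSE: E[x_T | x_{0:T-1}] is the weighted sum of the component means.
mmse = float(pred @ means)
# MAP: argmax_{x_T} P(x_T | x_{0:T-1}), via grid search over plausible SPLs.
grid = np.linspace(40.0, 75.0, 3501)
x_map = float(grid[np.argmax([mixture(x) for x in grid])])
```

With near-equal weight on states whose means are 60, 65, and 70 dB, both estimates land close to 65 dB; in general the two can differ, e.g. when the mixture is skewed or multimodal.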

  23. Difference from Viterbi decoding • We estimate only the current state at any time – Not the state sequence – Although we are considering all past observations • The most likely states at T and T+1 may be such that there is no valid transition between S_T and S_{T+1}

  24. A known state model • The HMM assumes a very coarsely quantized state space – Idling / accelerating / cruising / decelerating • The actual state can be finer – Idling, accelerating at various rates, decelerating at various rates, cruising at various speeds • Solution: many more states (one for each acceleration/deceleration rate and cruising speed)? • Solution: a continuous-valued state

  25. The real-valued state model • A state equation describing the dynamics of the system: s_t = f(s_{t-1}, e_t) – s_t is the state of the system at time t – e_t is a driving term, which is assumed to be random • The state of the system at any time depends only on the state at the previous time instant and the driving term at the current time • An observation equation relating state to observation: o_t = g(s_t, γ_t) – o_t is the observation at time t – γ_t is the noise affecting the observation (also random) • The observation at any time depends only on the current state of the system and the noise
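The two equations above can be sketched as a simulation. The specific f and g below (a slowly decaying speed observed as noisy SPL) are hypothetical choices made only to illustrate the form of the model, not anything specified on the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(s_prev, e):
    # State equation s_t = f(s_{t-1}, e_t): dynamics plus a random driving term.
    return 0.98 * s_prev + e

def g(s, gamma):
    # Observation equation o_t = g(s_t, gamma_t): state seen through noise.
    return 45.0 + 0.3 * s + gamma

s = 60.0                     # arbitrary initial state (e.g. speed)
observations = []
for _ in range(100):
    s = f(s, rng.normal(0.0, 0.5))                    # draw the driving term e_t
    observations.append(g(s, rng.normal(0.0, 1.0)))   # draw the noise gamma_t
```

Each observation depends only on the current state, and each state only on the previous one, which is exactly the Markov structure the filtering recursion exploits.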
