Chapter 7-1: Sequential Data. Jilles Vreeken. Revision 1


  1. Chapter 7-1: Sequential Data. Jilles Vreeken. Revision 1, November 26th: definition of smoothing clarified. IRDM '15/16, 24 Nov 2015

  2. IRDM Chapter 7, overview
     Time Series: 1. Basic Ideas, 2. Prediction, 3. Motif Discovery
     Discrete Sequences: 4. Basic Ideas, 5. Pattern Discovery, 6. Hidden Markov Models
     You'll find this covered in Aggarwal Ch. 3.4, 14, 15

  3. IRDM Chapter 7, today
     Time Series: 1. Basic Ideas, 2. Prediction, 3. Motif Discovery
     Discrete Sequences: 4. Basic Ideas, 5. Pattern Discovery, 6. Hidden Markov Models
     You'll find this covered in Aggarwal Ch. 3.4, 14, 15

  4. Chapter 7.1: Basic Ideas. Aggarwal Ch. 14.1-14.2

  5. Temperature Data
     Temp (°C): 28.2, 25.4, 30.5, 15.7, 33.4, 29.4, 28.6, 16.1, 28.5, 27.9, 15.5, 31.4

  6. Temperature Data

     | Time    | Temp (°C) |
     |---------|-----------|
     | June-15 | 28.2      |
     | June-16 | 25.4      |
     | June-17 | 30.5      |
     | June-18 | 15.7      |
     | June-19 | 33.4      |
     | June-20 | 29.4      |
     | June-22 | 28.6      |
     | June-23 | 16.1      |
     | June-24 | 28.5      |
     | June-25 | 27.9      |
     | June-26 | 15.5      |
     | June-27 | 31.4      |

     [Figure: Daily Temperature, y-axis from 0 to 40]

  8. Temperature Data

     | Time    | Temp (°C) |
     |---------|-----------|
     | June-15 | 28.2      |
     | June-16 | 25.4      |
     | June-17 | 30.5      |
     | Sept-18 | 15.7      |
     | June-19 | 33.4      |
     | June-20 | 29.4      |
     | June-22 | 28.6      |
     | Sept-23 | 16.1      |
     | Sept-24 | 28.5      |
     | June-25 | 27.9      |
     | Sept-26 | 15.5      |
     | June-27 | 31.4      |

     [Figure: Daily Temperature, y-axis from 0 to 40]

  9. Applications: Health Monitoring, Stock Analysis, Weather Forecasting, Social Network Analysis

  10. Definition
      A time series of length $n$ consists of $n$ tuples $(t_1, X_1), (t_2, X_2), \ldots, (t_n, X_n)$ where for a tuple $(t_i, X_i)$, $t_i$ is the time stamp, and $X_i$ is the data at time $t_i$, and we have a total order on the time stamps:
      $$t_1 < t_2 < \cdots < t_n$$
      Length: may either be finite or infinite.
      Time stamps: may be contiguous, in practice integers are easier.
      Data: when talking about time series, usually numeric, continuous real-valued; may be univariate (one attribute) or multivariate (multiple attributes).

  11. Probabilistic Model of Time Series
      Consider data $X_i$ at time $t_i$ as a random variable; the actual data we observe at $t_i$ is a realization of $X_i$.
      Some probabilistic properties can be stable over time, e.g.
      the mean $\mu_i$ of $X_i$ does not change (much);
      the covariance between pairs $(X_i, X_{i+h})$ is (almost) the same as $(X_1, X_{1+h})$, i.e., the autocovariance of $X_i$ does not change (much).
      A time series is stationary if the process behind it does not change:
      $\mu_t = \mu_s = \mu$ for all $t, s$, and
      $C_{XX}(t, s) = C_{XX}(s - t) = C_{XX}(\tau)$ where $\tau = |s - t|$ is the amount of time by which the signal is shifted.
      Stationary time series are easy to model and predict; most real-world time series, however, are anything but stationary.
      (Recall, if $X_i$ has mean $\mu_i = E[X_i]$, then $C_{XX}(t, s) = \mathrm{cov}(X_t, X_s) = E[X_t X_s] - \mu_t \mu_s$.)
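
The autocovariance above can be estimated from a single observed series if we are willing to assume stationarity. The following is a minimal sketch under that assumption; the function names and the toy series are illustrative and not from the slides, and the estimator divides by the series length (the common biased variant).

```python
def sample_mean(xs):
    """Estimate the (assumed constant) mean mu of the series."""
    return sum(xs) / len(xs)


def sample_autocovariance(xs, lag):
    """Estimate C_XX(lag), assuming the series is stationary.

    Uses one mean for the whole series and divides by len(xs),
    which is the usual biased estimator.
    """
    n = len(xs)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(xs)")
    mu = sample_mean(xs)
    return sum((xs[i] - mu) * (xs[i + lag] - mu) for i in range(n - lag)) / n


# Toy series, chosen only so the numbers are easy to check by hand.
xs = [1.0, 2.0, 3.0, 4.0]
print(sample_autocovariance(xs, 0))  # 1.25 (the variance)
print(sample_autocovariance(xs, 1))  # 0.3125
```

If the estimates for a fixed lag differ strongly between different stretches of the series, that is a hint that the stationarity assumption does not hold.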

  12. Stationarity of Time Series
      [Figure: two panels, Daily Temperature and Monthly Temperature, y-axes from 0 to 40]

  13. Seasonality & trend
      [Figure: Monthly Temperature, 2011 to 2013, y-axis from 0 to 40]

  14. Formulation
      Classically, we assume a time series $X$ is composed of
      $$X_i = \mathit{seasonality}_i + \mathit{trend}_i + \mathit{noise}_i$$
      where $\mathit{noise}_i$ is stationary. To make $X$ stationary, we simply have to remove seasonality and trend.

  16. Seasonality
      Seasonality is essentially periodicity: seasonality is a periodic function of time with period $d$,
      $$\mathit{seasonality}_i = \mathit{seasonality}_{i-d}$$
      How to find the seasonality function?
      1. by fitting a sine or cosine function: difficult, the signal may also be sine'ish
      2. by differencing:
      $$X_i = \mathit{seasonality}_i + \mathit{trend}_i + \mathit{noise}_i$$
      $$X_{i-d} = \mathit{seasonality}_{i-d} + \mathit{trend}_{i-d} + \mathit{noise}_{i-d}$$
      $$X'_i = X_i - X_{i-d}$$

  17. $X'_i = X_i - X_{i-d}$ where $d = 12$
      [Figure: Monthly Temperature, 2011 to 2013, y-axis from 0 to 40]
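
Seasonal differencing with lag d can be sketched in a few lines. The series below is hypothetical (a period-4 pattern plus a linear trend, with no noise) and only serves to show that the periodic part cancels; the function name is an assumption as well.

```python
def seasonal_difference(xs, d):
    """Return X'_i = X_i - X_{i-d} for every i that has a value d steps back."""
    if not 0 < d < len(xs):
        raise ValueError("d must satisfy 0 < d < len(xs)")
    return [xs[i] - xs[i - d] for i in range(d, len(xs))]


# Hypothetical series: seasonal pattern with period 4 plus a trend of 0.5 per step.
pattern = [10.0, 20.0, 30.0, 20.0]
series = [pattern[i % 4] + 0.5 * i for i in range(12)]

diffed = seasonal_difference(series, 4)
print(diffed)  # eight values, each 2.0: the pattern cancels, 4 * 0.5 of trend remains
```

For the monthly temperature data on the slide one would use d = 12; the result is d values shorter than the input.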

  18. Example: Removing Seasonality
      [Figure: Monthly Temperature, y-axis from 0 to 40]
      This is the time series we obtained by removing seasonality.

  19. Trend
      Trend is a polynomial function of time (assumption).
      How to find the trend function?
      1. by fitting functions: difficult to do, up to what order, when to stop?
      2. by differencing:
      $$X'_i = X_i - X_{i-1}$$
      $$X''_i = X'_i - X'_{i-1}$$
      Usually 2 times is enough.
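
To see why repeated differencing removes a polynomial trend, the sketch below differences a purely quadratic series twice. The helper names and the toy series are assumptions made for illustration.

```python
def difference(xs):
    """Return X'_i = X_i - X_{i-1} (one value shorter than the input)."""
    return [xs[i] - xs[i - 1] for i in range(1, len(xs))]


def difference_k(xs, k):
    """Apply first-order differencing k times."""
    for _ in range(k):
        xs = difference(xs)
    return xs


# Toy series with a quadratic trend and no noise: 0, 1, 4, 9, 16, 25.
quadratic = [float(i * i) for i in range(6)]

once = difference(quadratic)
twice = difference_k(quadratic, 2)
print(once)   # [1.0, 3.0, 5.0, 7.0, 9.0], still a linear trend
print(twice)  # [2.0, 2.0, 2.0, 2.0], the trend is gone
```

Each round of differencing lowers the degree of a polynomial trend by one, which fits the remark that two rounds usually suffice.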

  20. Example: Removing Trend
      [Figure: Monthly Temperature, y-axis from 0 to 40]
      This is the time series we obtained by removing seasonality.

  21. Example: Removing Trend, $X'_i = X_i - X_{i-1}$
      [Figure: Monthly Temperature, y-axis from -5 to 40]
      This is the time series we obtained by removing seasonality and trend.

  22. Example: Removing Trend, $X'_i = X_i - X_{i-1}$
      [Figure: Monthly Temperature, y-axis from -5 to 40]
      The left-over fluctuations are either noise or non-trivial patterns.

  24. Pre-processing
      We can infer missing values by interpolation
      $$X_k = X_i + \frac{t_k - t_i}{t_j - t_i} \times (X_j - X_i)$$
      where $t_i < t_k < t_j$.

      Temperature on June-22:

      | t | Time    | Temp (°C) |
      |---|---------|-----------|
      | 1 | June-19 | 33.4      |
      | 2 | June-20 | 29.4      |
      | 4 | June-22 | (missing) |
      | 5 | June-23 | 16.1      |

      $$X_4 = X_2 + \frac{t_4 - t_2}{t_5 - t_2} \times (X_5 - X_2) = 29.4 + \frac{4 - 2}{5 - 2} \times (16.1 - 29.4) = 20.5$$
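
The interpolation formula translates directly into code. This is a minimal sketch that reproduces the June-22 example; the function name and argument order are assumptions.

```python
def interpolate(t_i, x_i, t_j, x_j, t_k):
    """Linearly interpolate the value at time t_k, where t_i < t_k < t_j."""
    if not t_i < t_k < t_j:
        raise ValueError("need t_i < t_k < t_j")
    return x_i + (t_k - t_i) / (t_j - t_i) * (x_j - x_i)


# June-20 is time stamp 2 (29.4), June-23 is time stamp 5 (16.1); June-22 is time stamp 4.
estimate = interpolate(2, 29.4, 5, 16.1, 4)
print(round(estimate, 1))  # 20.5
```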

  25. Smoothing
      We can remove noise by smoothing.
      Standard options include averaging
      $$X'_i = \mathrm{avg}(X_{i-w}, \ldots, X_i)$$
      where window length $w$ is a user-specified parameter.
      We can give more weight to recent values by exponential smoothing
      $$X'_i = (1 - \alpha)^i \cdot X'_0 + \alpha \sum_{j=1}^{i} X_j \cdot (1 - \alpha)^{i-j}$$
      where the user chooses decay factor $\alpha$.
      (Updated on Nov 26th: we now average explicitly over past values.)
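
Both smoothing options can be sketched as follows. The exponential version is written as the recurrence X'_i = (1 - alpha) X'_{i-1} + alpha X_i, which unrolls to the closed-form sum on the slide; a direct evaluation of the sum is included for comparison. Function names, the starting value, and the toy data are assumptions.

```python
def moving_average(xs, w):
    """X'_i = avg(X_{i-w}, ..., X_i), for every i with w earlier values."""
    return [sum(xs[i - w:i + 1]) / (w + 1) for i in range(w, len(xs))]


def exponential_smoothing(xs, alpha, start):
    """Return X'_1..X'_n via X'_i = (1 - alpha) * X'_{i-1} + alpha * X_i.

    xs holds X_1..X_n; start plays the role of X'_0.
    """
    smoothed = []
    prev = start
    for x in xs:
        prev = (1 - alpha) * prev + alpha * x
        smoothed.append(prev)
    return smoothed


def exponential_smoothing_closed_form(xs, alpha, start, i):
    """Evaluate the closed-form sum for X'_i directly (xs[j - 1] holds X_j)."""
    weighted = sum(xs[j - 1] * (1 - alpha) ** (i - j) for j in range(1, i + 1))
    return (1 - alpha) ** i * start + alpha * weighted


print(moving_average([1.0, 2.0, 3.0, 4.0, 5.0], 2))      # [2.0, 3.0, 4.0]
print(exponential_smoothing([4.0, 8.0, 6.0], 0.5, 0.0))  # [2.0, 5.0, 5.5]
print(exponential_smoothing_closed_form([4.0, 8.0, 6.0], 0.5, 0.0, 3))  # 5.5
```

A larger alpha makes the smoothed series follow recent values more closely; a smaller alpha smooths more aggressively.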

  26. Chapter 7.2: Forecasting. Aggarwal Ch. 14.3

  27. Principle of Forecasting
      If we wish to make predictions, then clearly we must assume that something is stable over time.

  28. Autoregressive (AR) model
      Future values depend on past values + random noise; assumption: the time series depends on autocorrelation.
      Which past values? The $w$ immediately previous values.
      What relation between past and future? Linear combination.
      What kind of noise? Gaussian.

  29. AR, formally
      Future value is a linear combination of past values + white noise
      $$X_t = \sum_{i=1}^{w} a_i \cdot X_{t-i} + c + \epsilon_t$$
      where $\epsilon_t \sim \mathcal{N}(0, \sigma^2)$. The sum is the linear combination of past values; $c + \epsilon_t$ is noise with shifted mean.
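
As a rough illustration of the AR formula, the sketch below computes the expected next value for given coefficients and simulates a series by adding Gaussian noise. The coefficients, the constant, and the function names are made up for the example; estimating the a_i from data is not shown.

```python
import random


def ar_predict(history, coeffs, c):
    """Expected next value: sum_i a_i * X_{t-i} + c (the noise has mean 0).

    coeffs[0] is a_1 and multiplies the most recent value in history.
    """
    if len(history) < len(coeffs):
        raise ValueError("need at least len(coeffs) past values")
    return sum(a * history[-i] for i, a in enumerate(coeffs, start=1)) + c


def ar_simulate(initial, coeffs, c, sigma, steps, seed=0):
    """Extend initial by steps values drawn from the AR model with N(0, sigma^2) noise."""
    rng = random.Random(seed)
    series = list(initial)
    for _ in range(steps):
        series.append(ar_predict(series, coeffs, c) + rng.gauss(0.0, sigma))
    return series


# Hypothetical AR model with w = 2: X_t = 0.5 * X_{t-1} + 0.25 * X_{t-2} + 1 + noise.
print(ar_predict([1.0, 2.0, 3.0], [0.5, 0.25], 1.0))  # 3.0
simulated = ar_simulate([1.0, 2.0, 3.0], [0.5, 0.25], 1.0, sigma=0.1, steps=5)
print(len(simulated))  # 8
```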
