IRDM ‘15/16
Jilles Vreeken
Chapter 7-1: Sequential Data
24 Nov 2015 Revision 1, November 26th
Definition of smoothing clarified
IRDM Chapter 7, overview

Time Series
1. Basic Ideas
2. Prediction
3. Motif Discovery

Discrete Sequences
4. Basic Ideas
5. Pattern Discovery
6. Hidden Markov Models

You'll find this covered in Aggarwal Ch. 3.4, 14, 15
Time Series: Basic Ideas (Aggarwal Ch. 14.1-14.2)
Temp (°C): 28.2, 25.4, 30.5, 15.7, 33.4, 29.4, 28.6, 16.1, 28.5, 27.9, 15.5, 31.4
Time     Temp (°C)
June-15  28.2
June-16  25.4
June-17  30.5
June-18  15.7
June-19  33.4
June-20  29.4
June-22  28.6
June-23  16.1
June-24  28.5
June-25  27.9
June-26  15.5
June-27  31.4
[Figure: Daily Temperature plot]
Time     Temp (°C)
June-15  28.2
June-16  25.4
June-17  30.5
Sept-18  15.7
June-19  33.4
June-20  29.4
June-22  28.6
Sept-23  16.1
Sept-24  28.5
June-25  27.9
Sept-26  15.5
June-27  31.4
[Figure: Daily Temperature plot]
Stock analysis, weather forecasting, health monitoring, social network analysis
A time series of length n consists of n tuples (t_1, X_1), (t_2, X_2), …, (t_n, X_n) where for a tuple (t_i, X_i), t_i is the timestamp and X_i is the data at time t_i, and we have a total order over the timestamps.
Length
may either be finite or infinite
Time stamps
may be continuous, in practice integers are easier
Data
when talking about time series, usually numeric, continuous real-valued
may be univariate (one attribute) or multivariate (multiple attributes)
Consider data X_i at time t_i as a random variable
the actual data we observe at t_i is a realization of X_i
Some probabilistic properties can be stable over time
e.g. the mean μ_i of X_i does not change (much)
the covariance between pairs (X_i, X_{i+h}) is (almost) the same as for (X_1, X_{1+h}), i.e., the autocovariance of X_i does not change (much)
A time series is stationary if the process behind it does not change over time:
μ_s = μ_t = μ for all s, t, and γ_XX(s, t) = γ_XX(t − s) = γ_XX(τ), where τ = |t − s| is the amount of time by which the signal is shifted
Stationary time series are easy to model and predict
most real-world time series, however, are anything but stationary
(recall, if X_i has mean μ_i = E[X_i], then γ_XX(s, t) = cov(X_s, X_t) = E[X_s X_t] − μ_s μ_t)
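As a quick illustration of these definitions, here is a minimal Python/NumPy sketch (not from the slides; the helper name and the toy white-noise series are ours) that estimates the sample mean and the sample autocovariance at a given lag; for a weakly stationary series these estimates should look similar on different parts of the data.

import numpy as np

def autocovariance(x, lag):
    # sample autocovariance gamma(lag) = E[(X_t - mu)(X_{t+lag} - mu)]
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    return np.mean((x[:len(x) - lag] - mu) * (x[lag:] - mu))

rng = np.random.default_rng(0)
x = rng.normal(size=1000)                 # white noise: stationary by construction
for half in (x[:500], x[500:]):
    print(half.mean(), autocovariance(half, lag=1))
# both halves give a mean near 0 and gamma(1) near 0, as expected for a stationary series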
[Figures: Daily Temperature and Monthly Temperature plots]
[Figure: Monthly Temperature, 2011-2013]
Classically, we assume a time series X is composed of X_i = seasonality_i + trend_i + noise_i, where noise_i is stationary. To make X stationary, we simply have to remove seasonality and trend.
Seasonality is essentially periodicity
seasonality is a periodic function of time with period d: seasonality_i = seasonality_{i−d}
How to find the seasonality function?
1. by fitting a sine or cosine
difficult – the signal may also be sine'ish
2. by differencing
X_i = seasonality_i + trend_i + noise_i
X_{i−d} = seasonality_{i−d} + trend_{i−d} + noise_{i−d}
subtracting the two cancels the seasonal component, since seasonality_i = seasonality_{i−d}:
X_i' = X_i − X_{i−d}
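A minimal sketch of seasonal differencing in Python/NumPy (not from the slides; the toy monthly series and the period d = 12 are our own choices, matching the example that follows):

import numpy as np

def seasonal_difference(x, d=12):
    # X'_i = X_i - X_{i-d}: cancels a seasonal component with period d
    x = np.asarray(x, dtype=float)
    return x[d:] - x[:-d]

t = np.arange(120)
x = 10 * np.sin(2 * np.pi * t / 12) + 0.05 * t + np.random.randn(120)  # yearly cycle + trend + noise
x_deseason = seasonal_difference(x, d=12)   # the seasonal cycle is (mostly) gone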
[Figure: Monthly Temperature, 2011-2013, after seasonal differencing X_i' = X_i − X_{i−d} with d = 12]
This is the time series we obtained by removing seasonality
Trend is a polynomial function
How to find the trend function?
1. by fitting functions
difficult to do, up to what order, when to stop?
2. by differencing
X_i' = X_i − X_{i−1}
X_i'' = X_i' − X_{i−1}'
usually 2 times is enough
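First-order differencing, applied once or twice, is equally short in Python/NumPy; np.diff computes exactly X_i' = X_i − X_{i−1} (the toy series below is our own illustration, not the slides' data):

import numpy as np

t = np.arange(108)
x = 0.05 * t + np.random.randn(108)      # leftover linear trend + noise
x_diff1 = np.diff(x, n=1)                # X'_i  = X_i  - X_{i-1}
x_diff2 = np.diff(x, n=2)                # X''_i = X'_i - X'_{i-1}; twice is usually enough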
[Figure: Monthly Temperature after differencing, X_i' = X_i − X_{i−1}]
This is the time series we obtained by removing seasonality and trend
The left-over fluctuations are either noise or non-trivial patterns
We can infer missing values by interpolation:
X_l = X_i + (t_l − t_i) / (t_k − t_i) × (X_k − X_i)
where t_i < t_l < t_k
i  Time     Temp (°C)
1  June-19  33.4
2  June-20  29.4
4  June-22  ?
5  June-23  16.1
Temperature on June-22:
X_4 = X_2 + (t_4 − t_2) / (t_5 − t_2) × (X_5 − X_2) = 29.4 + (4 − 2) / (5 − 2) × (16.1 − 29.4) ≈ 20.5
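The same computation as a tiny Python sketch (the function name is ours, just for illustration):

def interpolate(t_i, x_i, t_k, x_k, t_l):
    # linear interpolation of the missing value at time t_l, with t_i < t_l < t_k
    return x_i + (t_l - t_i) / (t_k - t_i) * (x_k - x_i)

# temperature on June-22 (t = 4), interpolated from June-20 (t = 2) and June-23 (t = 5)
print(interpolate(2, 29.4, 5, 16.1, 4))   # ~20.5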
We can remove noise by smoothing. Standard options include averaging
X_i' = avg(X_{i−w}, …, X_i)
where window length w is a user-specified parameter. We can give more weight to recent values by exponential smoothing
X_i' = (1 − β)^i · X_0' + β · Σ_{k=1}^{i} (1 − β)^{i−k} · X_k
where the user chooses decay factor β.
(updated on Nov 26th: we now average explicitly over past values)
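A sketch of both smoothers in Python/NumPy, following the formulas above (the helper names are ours; w and β are the user-chosen parameters):

import numpy as np

def moving_average(x, w):
    # X'_i = avg(X_{i-w}, ..., X_i); the first few values use a shorter window
    x = np.asarray(x, dtype=float)
    return np.array([x[max(0, i - w):i + 1].mean() for i in range(len(x))])

def exponential_smoothing(x, beta):
    # recursive form of the sum above: X'_i = (1 - beta) * X'_{i-1} + beta * X_i, with X'_0 = X_0
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    out[0] = x[0]
    for i in range(1, len(x)):
        out[i] = (1 - beta) * out[i - 1] + beta * x[i]
    return out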
Time Series: Prediction (Aggarwal Ch. 14.3)
If we wish to make predictions, then clearly we must assume that something is stable over time.
Future values depend on past values + random noise
assumption: the time series depends on autocorrelation
Which past values?
the w immediately previous values
What relation between past and future?
linear combination
What kind of noise?
Gaussian
Future value is a linear combination of past values + white noise:
X_t = Σ_{i=1}^{w} a_i · X_{t−i} + c + ε_t
where ε_t ~ 𝒩(0, σ²)
(a linear combination of past values, plus noise with shifted mean)
ε_t = X_t − (a_1 · X_{t−1} + a_2 · X_{t−2} + ⋯ + a_w · X_{t−w} + c)
i.e., the difference between the actual value and the predicted value.
Given data D of N training instances, we want to find a_1, …, a_w and c that minimise the mean squared error
(1 / (N − w)) Σ_{t=w+1}^{N} ε_t²
the prediction error is simply the Gaussian noise in the AR model, the smaller we can get this value, the better!
Find a_1, …, a_w and c that minimize (1 / (N − w)) Σ_{t=w+1}^{N} ε_t²
There are different solving strategies available
ordinary least squares, assumes ε_t and X_t are uncorrelated
generalized least squares, assumes correlation exists but is known
iteratively reweighted least squares, assumes correlation is unknown
Many standard tools available to do AR
MATLAB: the ar function
R: the arima function
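Ordinary least squares for AR(w) is also easy to write directly; a minimal NumPy sketch (our own illustration, assuming a 1-d array x and a chosen window length w):

import numpy as np

def fit_ar(x, w):
    # fit X_t = a_1*X_{t-1} + ... + a_w*X_{t-w} + c by ordinary least squares
    x = np.asarray(x, dtype=float)
    # one row per target X_t; columns are X_{t-1}, ..., X_{t-w} and a constant 1
    A = np.column_stack([x[w - i:len(x) - i] for i in range(1, w + 1)] + [np.ones(len(x) - w)])
    y = x[w:]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]            # (a_1, ..., a_w) and c

def predict_next(x, a, c):
    # one-step-ahead prediction from the last w observed values
    w = len(a)
    return float(np.dot(a, x[-1:-w - 1:-1]) + c)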
Monthly temperature measured above the ground in a province of Vietnam from 1971 to 2001
[Figures: the original data (Jan-71 to Oct-01), the series with the season removed, and the series after differencing once]
[Figures: mean squared error vs. w for the original data, the season-removed series, and the once-differenced series]
These plots show how the MSE behaves with respect to w, i.e., they help us choose w.
Future values depend on a deterministic factor + noise
assumption: the time series depends on historical shocks
What deterministic factor?
the mean of the time series
Noise over what past values?
the current value and the q immediately previous values
What kind of noise?
Gaussian
The MA(q) model is defined as
X_t = μ + ε_t + Σ_{i=1}^{q} b_i · ε_{t−i}
where ε_i ~ 𝒩(0, σ²)
(the terms are the mean, the current noise, and past noise)
Recall, for the AR(w) model we had
X_t = c + ε_t + Σ_{i=1}^{w} a_i · X_{t−i}
Find those b_1, …, b_q that minimize the error
Unlike for AR, this problem is not linear
to identify the noise terms, we need to know b_1, …, b_q
to identify b_1, …, b_q, we need to know the noise terms
typically we use an iterative non-linear fitting approach, instead of linear least-squares
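In practice this iteration is rarely written by hand; a hedged sketch using statsmodels (our own example, assuming a reasonably recent statsmodels is installed; an MA(q) model is an ARIMA model with order (0, 0, q)):

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
x = rng.normal(size=500)                 # replace with your (stationary) series

ma = ARIMA(x, order=(0, 0, 2)).fit()     # MA(2): mean + current noise + two past noise terms
print(ma.params)                         # estimated mean, b_1, b_2, and the noise variance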
ARMA combines the AR model with the MA model
Future values depend on past values + historical noise
the time series depends on both autocorrelation and historical shocks
The ARMA model has two parameters, w and q
window length w for autocorrelation
history length q for noise
What kind of noise?
Gaussian
ARMA combines the AR model with the MA model
Autoregressive model, AR(w): X_t = c + ε_t + Σ_{i=1}^{w} a_i · X_{t−i}
Moving Average model, MA(q): X_t = μ + ε_t + Σ_{i=1}^{q} b_i · ε_{t−i}
Autoregressive Moving Average model, ARMA(w, q): X_t = c + ε_t + Σ_{i=1}^{w} a_i · X_{t−i} + Σ_{i=1}^{q} b_i · ε_{t−i}
Find those a_i and b_i and c that minimize the error
We need non-linear least-squares regression
many standard tools to do this: MATLAB and R implement ARMA as 'arma' resp. 'arima'
How to set w and q?
as small as possible, so that the model still fits the data well
aka, good luck
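A hedged sketch of fitting ARMA(w, q) with statsmodels (our own example; ARMA(w, q) is ARIMA with order (w, 0, q)). One common heuristic, not from the slides, is to try small orders and keep the pair with the lowest AIC:

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)
x = rng.normal(size=500)                      # replace with your (stationary) series

best = None
for w in range(4):                            # keep w and q small
    for q in range(4):
        res = ARIMA(x, order=(w, 0, q)).fit()
        if best is None or res.aic < best[0]:
            best = (res.aic, w, q, res)

aic, w, q, res = best
print(f"ARMA({w},{q}) with AIC {aic:.1f}")
print(res.forecast(steps=3))                  # predict the next three values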
Time Series: Motif Discovery (Aggarwal Ch. 14.4, 3.4)
A motif is a shape that frequently repeats in a time series
shape can also be called 'pattern'
Many variations of motif discovery exist
contiguous versus non-contiguous shapes
low versus high granularities
single time series versus databases of time series
When does a motif belong to a time series?
there are two main methods for deciding
1. distance-based support: a segment X[a, b] of a sequence X is said to support a motif Z when the distance d(X[a, b], Z) between the segment and the motif is below some threshold ε.
2. discrete-matching based support: first we discretise the time series X into a discrete sequence S. A motif is now a (frequent) subsequence of S.
A motif, a sequence S = (S_1, …, S_w) of real values, is said to approximately match a contiguous subsequence of length w in time series X if the distance between (S_1, …, S_w) and (X_i, …, X_{i+w−1}) is at most ε.
commonly, Euclidean distance or Dynamic Time Warping
The frequency of a motif is its number of occurrences
the number of matches of a motif S = (S_1, …, S_w) to the time series (X_1, …, X_n) at threshold ε is equal to the number of windows of length w in X for which the distance is at most ε
Nobody wants all motifs
many ε-similar matches for even a single true occurrence; instead, we aim for the top-k best motifs
As with frequent itemset mining, redundancy is an issue
we need to keep the top-k diverse: distances between any pair of motifs must be at least 2 · ε
begin
  for i = 1 to n − w + 1 do begin
    Candidate = (X_i, …, X_{i+w−1})
    for j = 1 to n − w + 1 do begin
      CompareTo = (X_j, …, X_{j+w−1})
      d = distance(Candidate, CompareTo)
      if d < ε and (non-trivial match) then increment support count of Candidate
    endfor
    if Candidate has the highest count found so far then update BestCandidate
  endfor
  return BestCandidate
end
(trivially expanded to top-k)
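The same brute-force search as a small Python/NumPy sketch (our own illustration, using Euclidean distance and a simple 'non-trivial match' rule that ignores windows overlapping the candidate):

import numpy as np

def best_motif(x, w, eps):
    # returns (start index, support) of the length-w window with the most eps-matches
    x = np.asarray(x, dtype=float)
    windows = np.array([x[i:i + w] for i in range(len(x) - w + 1)])
    best = (None, -1)
    for i, cand in enumerate(windows):
        dists = np.sqrt(((windows - cand) ** 2).sum(axis=1))          # Euclidean distances
        nontrivial = np.abs(np.arange(len(windows)) - i) >= w         # skip overlapping matches
        support = int(np.sum((dists < eps) & nontrivial))
        if support > best[1]:
            best = (i, support)
    return best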
Finding the best motif takes O(n²) distance computations
Practical complexity largely depends on the distance function
Euclidean distance is fast
Dynamic Time Warping is often better, but much slower
Lower bounds are our friend
if the lower bound on the distance between a motif and a window is greater than ε, the window will never support the motif
piecewise-aggregate approximations (PAA) allow fast computation
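A minimal sketch of a piecewise-aggregate approximation (our own illustration): the window is cut into m equal-length segments, each replaced by its mean; the appropriately scaled distance between the PAA vectors lower-bounds the Euclidean distance between the raw windows, so it can be used to prune comparisons.

import numpy as np

def paa(x, m):
    # piecewise-aggregate approximation: mean of each of m equal segments (len(x) divisible by m)
    x = np.asarray(x, dtype=float)
    return x.reshape(m, len(x) // m).mean(axis=1)

def paa_lower_bound(x, y, m):
    # sqrt(w/m) * ||paa(x) - paa(y)|| is a cheap lower bound on the Euclidean distance ||x - y||
    w = len(x)
    return np.sqrt(w / m) * np.linalg.norm(paa(x, m) - paa(y, m))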
Prediction over time is one of the most important and most used data analysis problems – predictive analytics
There exist two main types of sequential data
continuous real-valued time series and discrete event sequences
for both, specialised algorithms exist
In practice, despite many assumptions, ARMA is powerful
often used in industry, learn how to use it, learn when to use it
Patterns in time series are called motifs
by choosing a distance function they can be mined directly from time series