Forecasting in R
Evaluating modeling accuracy
Bahman Rostami-Tabar
Outline

1. Residual diagnostics
2. Evaluating point forecast accuracy
3. Time Series Cross Validation (TSCV)
4. Time series cross validation
5. Evaluating prediction interval accuracy
6. Lab session 6
Forecasting residuals

Residuals in forecasting: the difference between an observed value and its fitted value, $e_t = y_t - \hat{y}_{t|t-1}$.

Assumptions
1. $\{e_t\}$ uncorrelated. If they aren't, then there is information left in the residuals that should be used in computing forecasts.
2. $\{e_t\}$ have mean zero. If they don't, then the forecasts are biased.

Useful properties (for prediction intervals)
3. $\{e_t\}$ have constant variance.
4. $\{e_t\}$ are normally distributed.
Example: Antidiabetic drug sales

[Figure: antidiabetic drug sales, data and fitted values; x-axis Month, y-axis Sales (US$).]
Example: Antidiabetic drug sales

augment(fit) %>%
  autoplot(.resid) +
  xlab("Month") + ylab("") +
  ggtitle("Residuals from naïve method")

[Figure: residuals from the naïve method plotted against Month.]
Example: Antidiabetic drug sales
augment(fit) %>% ggplot(aes(x = .resid)) + geom_histogram(bins = 30) + ggtitle("Histogram of residuals")
20 40 60 50 100
.resid count
Histogram of residuals 7
Example: Antidiabetic drug sales

augment(fit) %>%
  ACF(.resid) %>%
  autoplot() +
  ggtitle("ACF of residuals")

[Figure: ACF of residuals.]
ACF of residuals
We assume that the residuals are white noise (uncorrelated, mean zero, constant variance). If they aren’t, then there is information left in the residuals that should be used in computing forecasts. So a standard residual diagnostic is to check the ACF of the residuals of a forecasting method. We expect these to look like white noise.
Portmanteau tests

Consider a whole set of $r_k$ values, and develop a test to see whether the set is significantly different from a zero set.

Box-Pierce test
$$Q = T \sum_{k=1}^{h} r_k^2$$
where $h$ is the maximum lag being considered and $T$ is the number of observations.

If each $r_k$ is close to zero, $Q$ will be small. If some $r_k$ values are large (positive or negative), $Q$ will be large.
Portmanteau tests

Ljung-Box test
$$Q^* = T(T+2) \sum_{k=1}^{h} (T-k)^{-1} r_k^2$$
where $h$ is the maximum lag being considered and $T$ is the number of observations.

Preferences: $h = 10$ for non-seasonal data, $h = 2m$ for seasonal data. Better performance, especially in small samples.
Portmanteau tests

If the data are white noise, $Q^*$ has a $\chi^2$ distribution with $(h - K)$ degrees of freedom, where $K$ is the number of parameters in the model. When applied to raw data, set $K = 0$.

augment(fit) %>%
  features(.resid, ljung_box, lag = 10, dof = 0)

## # A tibble: 1 x 4
##   Symbol .model       lb_stat lb_pvalue
##   <chr>  <chr>          <dbl>     <dbl>
## 1 GOOG   NAIVE(Close)    7.91     0.637
gg_tsresiduals function

fit %>% gg_tsresiduals()

[Figure: three-panel residual diagnostics from gg_tsresiduals() — residuals over time, ACF of residuals, and a histogram of residuals.]
Evaluating point forecast accuracy
Evaluate forecast accuracy

- Residual diagnostics are not a reliable indication of forecast accuracy.
- A model which fits the training data well will not necessarily forecast well.
- A perfect fit can always be obtained by using a model with enough parameters.
- Over-fitting a model to data is just as bad as failing to identify a systematic pattern in the data.
Fitting

[Figure: illustration of model fitting.]
Evaluate forecast accuracy

The accuracy of forecasts can only be determined by considering how well a model performs on new data that were not used when fitting the model.
Forecast accuracy evaluation using test sets

- We mimic the real-life situation.
- We pretend we don't know some part of the data (the "new" data).
- The test set must not be used for any aspect of model training.
- Forecast accuracy is based only on the test set.
Training and test series

[Figure: a time series split into training and test observations.]
Split the data

Use functions in dplyr and lubridate such as filter(), filter_index(), slice(), and year().

# Filter the year of interest
antidiabetic_drug_sale %>%
  filter_index("2006" ~ .)

## # A tsibble: 30 x 2 [1M]
##       Month  Cost
##       <mth> <dbl>
##  1 2006 Jan  23.5
##  2 2006 Feb  12.5
##  3 2006 Mar  15.5
##  4 2006 Apr  14.2
##  5 2006 May  17.8
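For example, the last 12 months can be held out as a test set with slice() (a minimal sketch, assuming the antidiabetic_drug_sale tsibble above; the same pattern reappears in the cross-validation slides):

# Hold out the last 12 months as a test set
h <- 12
test  <- antidiabetic_drug_sale %>% slice((n() - h + 1):n())
train <- antidiabetic_drug_sale %>% slice(1:(n() - h))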
Forecast errors

Forecast "error": the difference between an observed value and its forecast,
$$e_{T+h} = y_{T+h} - \hat{y}_{T+h|T},$$
where the training data is given by $\{y_1, \dots, y_T\}$.

- Unlike residuals, forecast errors on the test set involve multi-step forecasts.
- These are true forecast errors, as the test data is not used in computing $\hat{y}_{T+h|T}$.
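A minimal sketch of obtaining true forecast errors on a test set (assuming the train/test split above; SNAIVE() and accuracy() come from the fable/fabletools packages loaded via fpp3):

# Fit on the training data only, then forecast the test period;
# accuracy() matches the forecasts against the held-out observations
fit <- train %>% model(snaive = SNAIVE(Cost))
fc  <- fit %>% forecast(h = 12)
fc %>% accuracy(antidiabetic_drug_sale)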
Measures of forecast accuracy

$y_{T+h}$: the $(T+h)$th observation, $h = 1, \dots, H$.
$\hat{y}_{T+h|T}$: its forecast based on data up to time $T$.
$e_{T+h} = y_{T+h} - \hat{y}_{T+h|T}$.

$$\text{MAE} = \text{mean}(|e_{T+h}|)$$
$$\text{MSE} = \text{mean}(e_{T+h}^2)$$
$$\text{RMSE} = \sqrt{\text{mean}(e_{T+h}^2)}$$
$$\text{MAPE} = 100\,\text{mean}(|e_{T+h}|/|y_{T+h}|)$$

- MAE, MSE and RMSE are all scale dependent.
- MAPE is scale independent but is only sensible if $y_t \gg 0$ for all $t$, and $y$ has a natural zero.
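These measures are what fabletools' accuracy() reports; a minimal sketch with plain vectors (y for the test observations and yhat for the point forecasts, both assumed) makes the definitions concrete:

# Manual computation of the point-forecast accuracy measures
e    <- y - yhat                    # forecast errors e_{T+h}
MAE  <- mean(abs(e))
RMSE <- sqrt(mean(e^2))
MAPE <- 100 * mean(abs(e) / abs(y))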
Measures of forecast accuracy

Mean Absolute Scaled Error
$$\text{MASE} = \text{mean}(|e_{T+h}|/Q)$$
where $Q$ is a stable measure of the scale of the time series $\{y_t\}$.

For non-seasonal time series,
$$Q = (T-1)^{-1} \sum_{t=2}^{T} |y_t - y_{t-1}|$$
works well. Then MASE is equivalent to MAE relative to a naïve method.

For seasonal time series,
$$Q = (T-m)^{-1} \sum_{t=m+1}^{T} |y_t - y_{t-m}|$$
works well. Then MASE is equivalent to MAE relative to a seasonal naïve method.
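A minimal sketch of the scaling step (assuming a numeric training series y_train, seasonal period m = 12 for monthly data, and test-set errors e as in the previous sketch):

# Q = mean absolute one-step seasonal naive error on the training data
m <- 12
Q <- mean(abs(diff(y_train, lag = m)))  # (T - m)^{-1} * sum |y_t - y_{t-m}|
MASE <- mean(abs(e) / Q)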
Poll: true or false?

1. Good point forecast models should have normally distributed residuals.
2. A model with small residuals will give good forecasts.
3. The best measure of forecast accuracy is MAPE.
4. Always choose the model with the best forecast accuracy as measured on the test set.
Issue with traditional train/test split

[Figure: a single split along the time axis — training data followed by test data.]
Time series cross-validation

[Figure: a sequence of growing training sets along the time axis, each followed by the observations used as its test set.]

Forecast accuracy averaged over test sets. Also known as "evaluation on a rolling forecasting origin".
Creating the rolling training sets

There are three main rolling types which can be used:
- Stretch: extends a growing-length window with new data.
- Slide: shifts a fixed-length window through the data.
- Tile: moves a fixed-length window without overlap.

Three functions to roll a tsibble: stretch_tsibble(), slide_tsibble(), and tile_tsibble(). For time series cross-validation, stretching windows are most commonly used (see the sketch after the figure below).
Creating the rolling training sets

[Figure: Stretch, Tile and Slide windows applied to quarterly trips data, 2000–2015; x-axis Quarter, y-axis Trips.]
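A minimal sketch of the three helpers (assuming a tsibble ts_data; the .init, .size and .step values are illustrative):

# Each helper returns a longer tsibble with a .id key identifying the window
ts_data %>% stretch_tsibble(.init = 24, .step = 1)  # growing windows: 24, 25, 26, ...
ts_data %>% slide_tsibble(.size = 24, .step = 1)    # fixed 24-obs windows, overlapping
ts_data %>% tile_tsibble(.size = 24)                # fixed 24-obs windows, no overlap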
Time series cross-validation

Stretch with a minimum length of 24, growing by 1 each step.

forecast_horizon <- 12

# Hold out the last 12 months as a test set
test  <- antidiabetic_drug_sale %>% slice((n() - forecast_horizon + 1):n())
train <- antidiabetic_drug_sale %>% slice(1:(n() - forecast_horizon))

# Stretching windows over the training data, leaving room to forecast
# forecast_horizon steps ahead within the training period
drug_sale_tcsv <- train %>%
  slice(1:(n() - forecast_horizon)) %>%
  stretch_tsibble(.init = 24, .step = 1)

## # A tsibble: 2,805 x 3 [1M]
## # Key: .id [55]
##       Month  Cost   .id
##       <mth> <dbl> <int>
##  1 2000 Jan 12.5      1
##  2 2000 Feb  7.46     1
##  3 2000 Mar  8.59     1
Time series cross-validation

Estimate a seasonal naïve model for each window.

drug_fit_tr <- drug_sale_tcsv %>%
  model(snaive = SNAIVE(Cost))

## # A mable: 55 x 2
## # Key: .id [55]
##     .id   snaive
##   <int>  <model>
## 1     1 <SNAIVE>
## 2     2 <SNAIVE>
## 3     3 <SNAIVE>
## 4     4 <SNAIVE>
## # ... with 51 more rows
Time series cross-validation

Produce 12-step-ahead forecasts (h = forecast_horizon) from all models.

drug_fc_tr <- drug_fit_tr %>%
  forecast(h = forecast_horizon) %>%
  group_by(.id) %>%
  mutate(h = row_number()) %>%
  ungroup()

## # A fable: 660 x 6 [1M]
## # Key: .id, .model [55]
##     .id .model    Month        Cost .mean     h
##   <int> <chr>     <mth>      <dist> <dbl> <int>
## 1     1 snaive 2002 Jan  N(14, 1.7) 14.5      1
## 2     1 snaive 2002 Feb   N(8, 1.7)  8.05     2
## 3     1 snaive 2002 Mar  N(10, 1.7) 10.3      3
## 4     1 snaive 2002 Apr N(9.8, 1.7)  9.75     4
## # ... with 656 more rows
Time series cross-validation

# Cross-validated accuracy
drug_fc_tr %>%
  accuracy(antidiabetic_drug_sale)
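Because the h column indexes the horizon within each window, accuracy can also be summarised per forecast horizon; a sketch using fabletools' by argument (the plot is illustrative):

# Cross-validated RMSE as a function of the forecast horizon
drug_fc_tr %>%
  accuracy(antidiabetic_drug_sale, by = c("h", ".model")) %>%
  ggplot(aes(x = h, y = RMSE)) +
  geom_point()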
Winkler score

Winkler proposed a scoring method to enable comparisons between prediction intervals: it takes account of both the coverage and the width of the intervals. For a 100(1 − α)% prediction interval $[l_t, u_t]$:

$$
W(l_t, u_t, y_t) =
\begin{cases}
(u_t - l_t) & \text{if } l_t < y_t < u_t \\
(u_t - l_t) + \frac{2}{\alpha}(l_t - y_t) & \text{if } y_t < l_t \\
(u_t - l_t) + \frac{2}{\alpha}(y_t - u_t) & \text{if } y_t > u_t
\end{cases}
$$
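A minimal sketch of the score for a single observation (winkler_one() is a hypothetical helper written for illustration; alpha = 0.05 corresponds to a 95% interval):

# Winkler score for one observation y and one interval [l, u]
winkler_one <- function(l, u, y, alpha = 0.05) {
  width <- u - l
  if (y < l)      width + (2 / alpha) * (l - y)  # below the interval: penalised
  else if (y > u) width + (2 / alpha) * (y - u)  # above the interval: penalised
  else            width                          # covered: just the width
}
winkler_one(l = 10, u = 20, y = 25)  # 10 + 40 * 5 = 210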
Prediction interval accuracy

# Compute interval accuracy
drug_fc_tr %>%
  accuracy(antidiabetic_drug_sale,
           measures = interval_accuracy_measures) %>%
  mutate(Method = paste(.model, "method")) %>%
  select(Method, winkler)

Method          winkler
snaive method   9.731097
Lab session 6

Compute seasonal naïve forecasts for daily A&E n_attendance:

1. Use the slice() function to subset the data into train and test sets; keep the last 42 days for the test set.
2. Specify the model and train it on the training set.
3. Visualise the forecasts.
4. Test whether the residuals are white noise, using the gg_tsdisplay() function and the Ljung-Box test. What do you conclude?
Lab session 6

5. Create folds/windows for time series cross-validation. Hint: use stretch_tsibble(.init = 4*365, .step = 1).
6. Train the model on each fold/window.
7. Forecast for 42 days.
8. Compute RMSE and MAE to evaluate the point forecasts.
9. Evaluate the prediction intervals using the Winkler score.
Recap

1. First, import your data and prepare them using the tsibble() function.
2. Visualise the series and see whether it contains key features.
3. Determine how much of your data you want to allocate to training, and how much to testing; the sets should not overlap.
4. Subset the data to create a training set, which you will use as an argument in your forecasting function(s). Optionally, you can also create a test set to use later.
5. Compute forecasts from the training set using whichever forecasting function(s) you choose, and set h equal to the number of values you want to forecast.
Recap

6. Use residual diagnostics based on the residuals from the training set to check whether all information is captured by the models.
7. Create different windows to evaluate forecast accuracy using time series cross-validation.
8. Train the model on each window.
9. To view the accuracy results, use the accuracy() function with the fable as the first argument and the original data as the second.
10. Pick a measure in the output to evaluate the forecast(s); a smaller error indicates higher accuracy.
11. Produce forecasts for the test set using all of the training data, and visualise them against the actual values.
12. Finally, produce forecasts for the future using the selected approach.
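A compact sketch tying the recap together (assumptions: a monthly tsibble my_data with measured variable value, a 12-month test horizon, and a seasonal naïve model as used throughout the slides):

library(fpp3)

h <- 12
train <- my_data %>% slice(1:(n() - h))

# Visualise the series and fit a model on the training set
train %>% autoplot(value)
fit <- train %>% model(snaive = SNAIVE(value))

# Residual diagnostics
fit %>% gg_tsresiduals()
augment(fit) %>% features(.resid, ljung_box, lag = 24, dof = 0)

# Time series cross-validation with stretching windows
cv_fc <- train %>%
  stretch_tsibble(.init = 24, .step = 1) %>%
  model(snaive = SNAIVE(value)) %>%
  forecast(h = h)

# Point and interval accuracy against the full data
cv_fc %>%
  accuracy(my_data,
           measures = list(point_accuracy_measures,
                           interval_accuracy_measures))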