SLIDE 1

Characterizing retail demand with promotional effects for model selection

Patrícia Ramos, José Oliveira, Robert Fildes, Shaohui Ma
INESC TEC, Lancaster Centre for Forecasting, Jiangsu University of Science and Technology

SLIDE 2

ISF 2017, 25-28 June Characterizing retail demand with promotional effects for model selection

Outline

Motivation
Retail sales dataset
Demand forecasting models
Experimental design
Forecasting results
Clustering analysis results
Conclusions

SLIDE 3

Retail business

Increasing product variety with decreasing life cycles makes sales at the SKU level in a particular store difficult to forecast, as
– time series for these items tend to be short and often intermittent
– there are often thousands of different SKUs

Retailers are increasing their marketing activities, such as promotions

Demand is usually substantially higher during promotions, leading to potential stock-outs due to inaccurate forecasts

An automated and reliable multivariate forecasting system is required

SLIDE 4

Key questions

Forecasts needed on a weekly or daily basis
Which forecasting models perform best on weekly data with promotional information?
The issue: selecting a best model for sub-sets of SKUs
Can we identify a best model for groups of time series with common “features”?
Which “features” are relevant in the choice of the model?
How does this compare with ‘individual selection’?

SLIDE 6

Pingo Doce Retailer

The largest food distribution group in Portugal

409 stores
Around 130M SKUs

SLIDE 7

The dataset

Daily SKU information between Jan 2012 and Apr 2015 (1211 days/173 weeks)
– Units sold
– Price

Selection of SKUs from the 6 main areas (93% of daily sales total volume)
– Perishables, grocery, beverage, cleaning products and personal products

Selection of a store with the largest dimension

Fast moving goods (SKUs with sales on 100% of the weeks)

Data sample
– 988 SKUs
– 203 categories
– Intense promotional activity
– Seasonal and non-seasonal

Promotional activity of some categories of fast moving goods

CATEGORY                  MEAN PERCENTAGE OF WEEKS
FRESH FISH AQUACULTURE    49.42
WILD FRESH FISH           45.86
TOMATO                    33.81
FRESH PORK MEAT           24.43
PEPPER                    23.41
LETTUCE                   19.37
LIQUID YOGURT             16.19
BEER WITH ALCOHOL         12.43
FRESH VEGETABLES          10.12
BAKED BREAD               8.09
FROZEN COD                7.23
CONFECTIONERY             6.64
PASTEURIZED CREAM         5.37
PASTRY GOODS              4.78
EGGS                      3.85
PAPER NAPKINS             2.89
AIR FRESHENER             1.73
NATURAL FLOWERS           0.52

SLIDE 9

Forecasting models

Univariate models (4)
– ETS¹
– TBATS²
– ARIMA²
– SNaïve

Multivariate models (7)
– LASSO³
  Regressors (1+50)
  – log(Sales): t-1
  – Price: t, t-1
  – Relative discount*: t, t-1
  – Promotion days in the week: t, t-1
  – Last week of the month: t, t-1
  – 13 Calendar events: some with t+1, t, some with t-1

*relative discount = (regular price - price with discount)/regular price

¹ smooth R package, ² forecast R package, ³ glmnet R package
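
The relative discount regressor defined on this slide, together with its t-1 lag, can be illustrated with a minimal Python sketch. The function names and the example prices are illustrative assumptions and are not taken from the study, which used R.

```python
def relative_discount(regular_price, discounted_price):
    """Relative discount = (regular price - price with discount) / regular price."""
    if regular_price <= 0:
        raise ValueError("regular_price must be positive")
    return (regular_price - discounted_price) / regular_price


def with_lag(values):
    """Pair each weekly value x_t with its one-week lag x_{t-1}.

    The first week has no lag available, so it is dropped.
    """
    return [(values[t], values[t - 1]) for t in range(1, len(values))]


# Hypothetical weekly prices: regular price 2.00, promoted to 1.50 in week 3.
regular = [2.00, 2.00, 2.00, 2.00]
paid = [2.00, 2.00, 1.50, 2.00]
discounts = [relative_discount(r, p) for r, p in zip(regular, paid)]
print(discounts)            # [0.0, 0.0, 0.25, 0.0]
print(with_lag(discounts))  # [(0.0, 0.0), (0.25, 0.0), (0.0, 0.25)]
```

The lagged value lets a model capture the week after a promotion separately from the promotion week itself.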

SLIDE 10

Forecasting models

Multivariate models (cont.)

– TBATS & LASSO²,³
  Three stages
  1. Fit a TBATS model and forecast
  2. Apply LASSO to the residuals with the regressors and forecast
  3. Add both forecasts

– TBATSX²,³
  Three stages
  1. Fit a TBATS model and extract the components
  2. Apply LASSO with the TBATS components and the regressors as exogenous variables
  3. Forecast

² forecast R package, ³ glmnet R package

SLIDE 11

Forecasting models

Multivariate models (cont.)

– ETSX¹
  Regressors included as Principal Components

– ARIMA Fourier²
  Seasonality handled with Fourier terms

– ARIMAX²
  Regressors included as Principal Components

– ARIMAX Fourier²
  Seasonality handled with Fourier terms
  Regressors included as Principal Components

¹ smooth R package, ² forecast R package

SLIDE 13

Experimental design

Data (173 weeks) split into
– training set (121 weeks)
– test set (52 weeks, ~30%)

Annual seasonality (frequency = 52)
Rolling forecast origin with 1-step-ahead forecasts
Fit a model using the first training set
Re-estimate the parameters of the fitted model at each forecast origin and use it to forecast

Error measures
– MAPE, MdAPE
– MRMAE, MdRMAE, GMRMAE, MRRMSE, MdRRMSE, GMRRMSE (SNaïve holdout forecasts used as benchmark)
– MASE, MdASE (scaled by the in-sample SNaïve forecasts)
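
The rolling-origin evaluation and the relative error measures can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: the function names, the toy series and the seasonal period of 4 are hypothetical (the slides use 52). The `forecast` callable receives the full history up to each origin, which is where a model would be re-estimated.

```python
from math import exp, log
from statistics import mean


def snaive(history, m=52):
    """Seasonal naive one-step forecast: the value observed m periods earlier."""
    return history[-m]


def rolling_one_step(series, n_train, forecast):
    """One-step-ahead forecasts from a rolling origin over the test period."""
    return [forecast(series[:t]) for t in range(n_train, len(series))]


def mae(actual, predicted):
    """Mean absolute error."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def mase(series, n_train, predicted, m=52):
    """Test-set MAE scaled by the in-sample MAE of the seasonal naive method."""
    train = series[:n_train]
    scale = mean(abs(train[t] - train[t - m]) for t in range(m, n_train))
    return mae(series[n_train:], predicted) / scale


def rmae(series, n_train, predicted, benchmark):
    """MAE of a method relative to the benchmark MAE on the test set."""
    actual = series[n_train:]
    return mae(actual, predicted) / mae(actual, benchmark)


def geometric_mean(values):
    """Geometric mean, used to aggregate per-SKU RMAE values into GMRMAE."""
    return exp(mean(log(v) for v in values))


# Hypothetical toy series with a seasonal period of 4.
series = [10, 20, 30, 40, 12, 22, 32, 42, 14, 24, 34, 44]
n_train, m = 8, 4
benchmark = rolling_one_step(series, n_train, lambda h: snaive(h, m))
naive = rolling_one_step(series, n_train, lambda h: h[-1])
print(mase(series, n_train, benchmark, m))      # 1.0
print(rmae(series, n_train, naive, benchmark))  # 7.25
```

An RMAE below 1 means the method beats the SNaïve benchmark on that series. The geometric mean across SKUs gives GMRMAE.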

SLIDE 15

Main Results

Univariate models perform worse than the corresponding multivariate models
TBATSX is the best model based on the average rank
Seasonality handled with Fourier terms is preferred for ARIMA
LASSO performs relatively poorly, indicating that the dynamics of the series are essential and difficult to integrate in an ADL model
All models perform better than the benchmark

Method          Avg Rank  MAPE   MdAPE  GMRMAE  GMRRMSE  MASE  MdASE
TBATSX          1.50      36.92  20.62  0.59    0.60     0.67  0.44
TBATS & LASSO   2.83      36.44  21.47  0.61    0.62     0.69  0.45
ARIMAX Fourier  3.17      35.58  21.61  0.61    0.63     0.69  0.45
ETSX            3.33      38.26  21.59  0.61    0.62     0.69  0.45
ARIMAX          4.17      35.77  21.80  0.62    0.63     0.70  0.45
ARIMA Fourier   6.25      38.32  22.25  0.66    0.69     0.76  0.47
TBATS           7.25      39.06  22.74  0.67    0.70     0.76  0.47
ARIMA           7.83      39.83  22.68  0.67    0.71     0.78  0.47
ETS             9.08      45.44  23.43  0.69    0.70     0.80  0.49
LASSO           9.58      40.74  22.77  0.73    0.71     0.88  0.61
SNaïve          11.00     68.90  34.17  1.00    1.00     1.10  0.71

SLIDE 17

Characteristics/features for time series

ACF1: first-order autocorrelation of Rt obtained from the STL decomposition Yt = St + Tt + Rt
Strength of trend based on STL (Yt = St + Tt + Rt): 1 - [Var(Rt)/Var(Yt - St)]
Entropy: spectral entropy from the ForeCA package; a low value of entropy suggests a time series that is easier to forecast
Relative promotion activity: no. of weeks with promotion / total no. of weeks
Optimal Box-Cox transformation of Yt: lambda

Extract 2 principal components to summarize the data
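
Three of these features follow directly from their definitions once a decomposition is available. The Python sketch below assumes the STL components have already been computed elsewhere; the component values and function names are hypothetical. Population variances are used, which gives the same ratio as sample variances because both series have the same length.

```python
from statistics import mean, pvariance


def acf1(x):
    """First-order autocorrelation of a series (here, the STL remainder)."""
    mu = mean(x)
    num = sum((x[t] - mu) * (x[t - 1] - mu) for t in range(1, len(x)))
    den = sum((v - mu) ** 2 for v in x)
    return num / den


def trend_strength(y, seasonal, remainder):
    """Strength of trend: 1 - Var(R_t) / Var(Y_t - S_t)."""
    deseasonalised = [yt - st for yt, st in zip(y, seasonal)]
    return 1 - pvariance(remainder) / pvariance(deseasonalised)


def relative_promotion_activity(promo_flags):
    """Share of weeks in which the SKU was on promotion."""
    return sum(1 for flag in promo_flags if flag) / len(promo_flags)


# Hypothetical components of a decomposition Y_t = S_t + T_t + R_t.
trend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
seasonal = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
remainder = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
y = [t + s + r for t, s, r in zip(trend, seasonal, remainder)]

print(round(trend_strength(y, seasonal, remainder), 5))  # 0.90625
print(acf1([1.0, 2.0, 3.0, 4.0]))                        # 0.25
print(relative_promotion_activity([1, 0, 0, 1]))         # 0.5
```

A trend strength close to 1 indicates that the remainder explains little of the variation left after removing seasonality.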

SLIDE 18

Features space of fast moving goods

Principal component analysis

SLIDE 19

Features space of fast moving goods

SLIDE 20

Method selection based on ‘best’ performing methods

[Figure legend: ETS, ETSX, TBATS, TBATSX, TBATS & LASSO, LASSO, ARIMA, ARIMA Fourier, ARIMAX, ARIMAX Fourier, SNaïve]

SLIDE 21

Classification based on clustering

Algorithm

1. Identify the best model for each SKU from the 7 selected methods based on RMAE
2. Specify the number of clusters
3. Assign a cluster to each SKU in the feature space using K-Means clustering
4. Identify the most frequent best method in each cluster
5. Assign to each SKU in the cluster the most frequent best method of its cluster
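
The five steps of the algorithm can be sketched in plain Python as follows. This is an illustrative version only: it uses a basic Lloyd's k-means with the first k points as initial centroids (an assumption, since the slides do not describe the initialisation), and the feature-space coordinates, RMAE values and per-SKU best methods are hypothetical.

```python
from collections import Counter
from math import dist


def best_method(rmae_by_method):
    """Step 1: the method with the lowest RMAE for one SKU."""
    return min(rmae_by_method, key=rmae_by_method.get)


def kmeans(points, k, n_iter=100):
    """Steps 2-3: plain Lloyd's k-means; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def method_per_cluster(labels, best_methods):
    """Step 4: the most frequent per-SKU best method within each cluster."""
    votes = {}
    for label, method in zip(labels, best_methods):
        votes.setdefault(label, Counter())[method] += 1
    return {label: counts.most_common(1)[0][0] for label, counts in votes.items()}


# Hypothetical SKUs described by their two principal-component scores.
points = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (5.1, 4.9), (0.1, 0.3), (4.8, 5.2)]
best = ["TBATSX", "ETSX", "TBATSX", "ETSX", "ARIMAX", "ETSX"]

labels, centroids = kmeans(points, k=2)
chosen = method_per_cluster(labels, best)
assigned = [chosen[label] for label in labels]  # step 5
print(best_method({"TBATSX": 0.59, "ETSX": 0.61, "ARIMAX": 0.62}))  # TBATSX
print(labels)    # [0, 1, 0, 1, 0, 1]
print(assigned)  # ['TBATSX', 'ETSX', 'TBATSX', 'ETSX', 'TBATSX', 'ETSX']
```

In this toy example the fifth SKU is assigned TBATSX although its own best method is ARIMAX, which is the accuracy trade-off between aggregate and individual selection.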

SLIDE 22

Classification based on clustering

SLIDE 23

GMRMAE vs the number of clusters

Aggregate selection is effective: the classification procedure improves GMRMAE after 101 clusters!
The optimum is obviously obtained with 988 clusters (no. of SKUs)
The classification procedure improves on the benchmark GMRMAE by about 11%
Individual selection has potential to improve
Need to identify the method for each cluster ex ante (rather than, as here, ex post)

SLIDE 24

Using the classification to identify a model for a time series

Procedure: Given a fast moving SKU, compute its PCs and select its model by identifying the cluster whose centroid is closest
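
The procedure can be sketched as a projection followed by a nearest-centroid lookup. The Python sketch below is hypothetical throughout: it assumes the features were standardised before the principal component analysis (the slides do not say), and the means, standard deviations, loadings, centroids and cluster methods are made-up values standing in for quantities saved from the fitted analysis.

```python
from math import dist


def project(features, means, sds, loadings):
    """Standardise a feature vector and project it onto stored PCA loadings."""
    z = [(f - m) / s for f, m, s in zip(features, means, sds)]
    return tuple(sum(zi * w for zi, w in zip(z, axis)) for axis in loadings)


def select_model(pc_scores, centroids, cluster_methods):
    """Return the method of the cluster whose centroid is closest."""
    nearest = min(range(len(centroids)), key=lambda c: dist(pc_scores, centroids[c]))
    return cluster_methods[nearest]


# Hypothetical values for the five features of a new SKU:
# ACF1, trend strength, entropy, relative promotion activity, Box-Cox lambda.
features = [0.4, 0.9, 0.7, 0.2, 0.5]
means = [0.3, 0.8, 0.8, 0.3, 0.5]
sds = [0.1, 0.1, 0.1, 0.1, 0.5]
loadings = [
    [0.5, 0.5, -0.5, -0.5, 0.0],  # first principal component
    [0.5, -0.5, 0.5, -0.5, 0.0],  # second principal component
]
centroids = [(-1.5, 0.2), (1.8, -0.1)]
cluster_methods = {0: "ETSX", 1: "TBATSX"}

scores = project(features, means, sds, loadings)
print(select_model(scores, centroids, cluster_methods))  # TBATSX
```

The new SKU needs no model fitting at this stage: only its features are computed, and the forecasting method is read off from the closest cluster.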

SLIDE 25

Conclusions

Multivariate models integrating the series patterns give the best forecasts for retail data with promotions
The improvement in relative accuracy achievable with TBATSX is 41% (measured by RMAE)
Moving from univariate methods to the ‘best’ multivariate method improves accuracy by 11% (measured by RMAE)
Classification based on clustering gives good insights for a model identification procedure
The classification procedure for method selection can improve forecasts when compared with the best performing method for the population

SLIDE 26

Characterizing retail demand with promotional effects for model selection

Patrícia Ramos, José Oliveira, Robert Fildes, Shaohui Ma
INESC TEC, Lancaster Centre for Forecasting, Jiangsu University of Science and Technology

Thank you for your attention! Questions?