SLIDE 1

Improving forecasting by estimating time series structural components across multiple frequencies

Nikolaos Kourentzes, Fotios Petropoulos, Juan R. Trapero

SLIDE 2
Agenda

Multiple Aggregation Prediction Algorithm

  • 1. Motivation
  • 2. The idea behind the algorithm
  • 3. Multiple Aggregation Prediction Algorithm
  • 4. Empirical evaluation
  • 5. Conclusions

SLIDE 3

Motivation

Forecasting

Forecasting is crucial for many operations within organisations:

  • Short- and long-term objectives
  • Demand and inventory planning
  • Capacity planning
  • Pricing and marketing strategy planning
  • Budgeting
  • etc.

The requirement for a large number of forecasts → Automation. Key issues in forecasting automation:

  • Model selection
  • Model parameterisation
  • Forecast reconciliation
SLIDE 4

Motivation

Exponential Smoothing

Let us consider the example of the Exponential Smoothing (ETS) family of methods

  • Considered one of the most reliable and robust methods for automatic univariate forecasting [Makridakis & Hibon, 2000; Hyndman et al., 2002; Gardner, 2006]

  • It is a family of methods: ETS (error type, trend type, seasonality type)
  • Error: Additive or Multiplicative
  • Trend: None, Additive or Multiplicative; linear or damped/exponential
  • Seasonality: None, Additive or Multiplicative
  • Adequate for most types of time series
SLIDE 5

Motivation

Optimisation and model selection

We have an optimisation problem: estimating the smoothing parameters α, β, γ, φ and the initial states x₀. This is done by maximising the likelihood of the model. For automatic forecasting we can consider up to 30 different models, which introduces a model selection problem. Hyndman et al. (2002) suggested solving this via the Akaike Information Criterion (AIC) and provided supporting empirical evidence. We select the model with the best (lowest) AIC, which we use to forecast:

AIC = −2 log L(θ*, x̂₀) + 2k

where k is the number of smoothing parameters and initial states. This works well … for well-behaved data.
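The AIC-based selection can be sketched in a few lines; the model names, log-likelihood values and parameter counts below are illustrative toy numbers, not fitted results.

```python
# Minimal sketch of AIC-based model selection, assuming each candidate
# ETS model has already been fitted and reports its maximised
# log-likelihood and its number of free parameters (smoothing
# parameters plus initial states). All numbers are illustrative.

def aic(loglik, n_params):
    """Akaike Information Criterion: -2 log L + 2k."""
    return -2.0 * loglik + 2.0 * n_params

# (model name, log-likelihood, number of smoothing parameters + states)
candidates = [
    ("ETS(A,N,N)", -210.3, 2),   # alpha + initial level
    ("ETS(A,A,N)", -204.9, 4),   # alpha, beta + level and trend states
    ("ETS(A,A,A)", -196.1, 16),  # adds gamma + 12 monthly seasonal states
]

best = min(candidates, key=lambda c: aic(c[1], c[2]))
```

The model with the lowest AIC (here ETS(A,A,N)) would then be used to forecast; note how the penalty term 2k discourages the heavily parameterised seasonal model despite its better fit.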

SLIDE 6

Motivation

Issues

What can go wrong in parameter estimation and model selection:

  • Business time series are often short → limited data
  • Estimation of parameters can fail miserably (for monthly data we optimise up to 18 parameters, often with no more than 36 observations)
  • Model selection can fail as well (30 candidate models → over-fitting?)
  • Both optimisation and model selection are myopic → they focus on fitting past data, rather than on ‘forecastability’

  • Special cases:

[Figure: a monthly sales series with its in-sample fit and forecast (axes: Month, Sales; legend: Demand, Fit, Forecast)]

True model: additive trend, additive seasonality. Identified model: no trend, additive seasonality. Why? The in-sample variance is explained mostly by the seasonality. Reliable automatic forecasting requires robust parameter estimation and model selection.

SLIDE 7

Idea

Time/Frequency domains

Given a monthly time series:

[Figure: time series plot of monthly demand and its power spectrum. Low-frequency components = level + trend; spectral peaks correspond to the seasonality and its harmonics.]

We can look at a time series in the classical way, or in the frequency domain. In the frequency domain:

  • Components are separated
  • ETS is a filter, with smoothing parameters deciding its shape
  • Initial states cannot be retrieved
SLIDE 8

Idea

Temporal Aggregation

Given a monthly time series we can perform non-overlapping temporal aggregation

[Figure: time series plots and power spectra at aggregation levels 1, 3, 6, 9 and 12 (monthly, quarterly, half-yearly, 9-monthly and annual). Seasonality goes from strong at the monthly level, to weak, to absent at the higher aggregation levels.]
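Non-overlapping temporal aggregation can be sketched as follows; the function name and the choice to drop incomplete buckets from the start of the series are illustrative assumptions.

```python
# Sketch of non-overlapping temporal aggregation of a monthly series.
# Observations that do not fill a complete bucket are dropped from the
# start of the series (an assumption; exact handling may differ).
import numpy as np

def aggregate(y, k):
    """Sum consecutive non-overlapping blocks of k observations."""
    y = np.asarray(y, dtype=float)
    n = (len(y) // k) * k             # number of usable observations
    return y[len(y) - n:].reshape(-1, k).sum(axis=1)

monthly = np.arange(1.0, 25.0)        # 24 months of toy data
quarterly = aggregate(monthly, 3)     # aggregation level 3 -> 8 quarters
annual = aggregate(monthly, 12)       # aggregation level 12 -> 2 years
```

Each increase in the aggregation level k filters out more of the high-frequency content, which is exactly the effect shown in the power spectra above.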

SLIDE 9

Idea

Temporal Aggregation

Temporal non-overlapping aggregation:

  • Shown to be beneficial for forecasting accuracy → the ADIDA algorithm [Nikolopoulos et al., 2011], motivated by intermittent data:
    Step 1: Aggregate the time series
    Step 2: Forecast the aggregated series
    Step 3: Disaggregate the forecast
  • Good performance for both slow- and fast-moving goods [Nikolopoulos et al., 2011; Babai et al., 2012]
  • Reduces noise as the aggregation level increases, but removes component information [Spithourakis et al., 2012]
  • Consider aggregating a monthly time series and then disaggregating: the seasonality is lost. Reconstruction would be limited to deterministic forms only.
  • The selection of the aggregation level has no theoretical grounding [Nikolopoulos et al., 2011; Spithourakis et al., 2011]
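The three ADIDA steps can be sketched as below; simple exponential smoothing as the forecaster and equal-weight disaggregation are illustrative assumptions, not the published specification.

```python
# Hedged sketch of ADIDA: aggregate, forecast, disaggregate.
import numpy as np

def ses_forecast(y, alpha=0.3):
    """One-step-ahead simple exponential smoothing forecast."""
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return level

def adida(y, k, alpha=0.3):
    y = np.asarray(y, dtype=float)
    n = (len(y) // k) * k
    agg = y[len(y) - n:].reshape(-1, k).sum(axis=1)  # Step 1: aggregate
    f_agg = ses_forecast(agg, alpha)                 # Step 2: forecast
    return f_agg / k                                 # Step 3: disaggregate (equal weights)
```

The equal-weight disaggregation in Step 3 is what loses the seasonality: every sub-period of the aggregate bucket receives the same share of the forecast.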

SLIDE 10

Idea

Temporal Aggregation

What if we do not select a single aggregation level, but use multiple?

[Figure: the same demand series at aggregation levels 1, 3, 7 and 12, with a different ETS model identified at each level: ETS(A,N,A), ETS(A,M,A), ETS(A,M,N) and ETS(A,A,N).]

Issues:

  • Different model at each level
  • Different series length at each level
  • How to combine across levels?

SLIDE 11

Idea

Combination

Forecast combination:

  • Forecast combination is widely considered beneficial for forecasting accuracy and forecast error variance [Bates & Granger, 1969; Makridakis & Winkler, 1983; Clemen, 1989; Hibon & Evgeniou, 2005]
  • Simple combination methods (average, median) are considered robust and relatively accurate compared to more complex methods [Clemen, 1989; Timmermann, 2006; Jose & Winkler, 2008]

Issues:

  • If there are different model types to be combined, then the resulting forecast does not fit well with any component!

[Figure: two demand series with their combined forecasts, illustrating the component mismatch.]
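The simple combination operators mentioned above, the average and the median across methods, look like this; the forecast values are toy numbers.

```python
# Mean and median combination of forecasts from different methods.
import numpy as np

forecasts = np.array([            # rows: methods, columns: horizons (toy values)
    [102.0, 104.0, 106.0],
    [ 98.0, 101.0, 103.0],
    [110.0, 112.0, 115.0],
])

mean_combined = forecasts.mean(axis=0)          # average across methods
median_combined = np.median(forecasts, axis=0)  # median across methods
```

The median is the more robust of the two when one method produces an outlying forecast, which is part of why simple combinations hold up so well empirically.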

SLIDE 12

The MAPA algorithm

Part 1

[Diagram: the original series is aggregated non-overlappingly at levels 1 to 12, giving y^[1], y^[2], y^[3], …, y^[10], y^[11], y^[12]. A state space ETS model is fitted at each aggregation level, and its states (level, trend, season) are saved.]

SLIDE 13

The MAPA algorithm

Part 2

  • Transform the states to additive form and to the original sampling frequency
  • Combine the states (components)
  • Produce forecasts
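The combination step can be sketched as below. The per-level states are assumed already transformed to additive form and to the original (monthly) sampling frequency, and all state values are toy numbers, not fitted output.

```python
# Illustrative sketch of MAPA's combination step: level, trend and
# seasonal states saved at each aggregation level are averaged across
# levels and then summed into a forecast. Toy values throughout.
import numpy as np

# states[k] = (level, trend per month, seasonal component) at level k
states = {
    1:  (500.0, 2.0, 30.0),
    3:  (505.0, 1.8, 10.0),
    12: (498.0, 2.1,  0.0),   # at annual aggregation seasonality vanishes
}

level  = np.mean([s[0] for s in states.values()])
trend  = np.mean([s[1] for s in states.values()])
season = np.mean([s[2] for s in states.values()])

h = 1                                   # forecast horizon (one month ahead)
forecast = level + h * trend + season   # additive combination of components
```

Averaging each component separately, rather than averaging whole forecasts from different model types, is what lets the combined forecast still fit each structural component.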

SLIDE 14

Empirical Evaluation

Assess the performance of MAPA on four datasets:

  • 645 annual time series from the M3 competition [Makridakis & Hibon, 2000]
  • 1483 semi-annual time series from the FRED database
  • 756 quarterly time series from the M3 competition
  • 1428 monthly time series from the M3 competition

The setup is identical to the M3 competition, to allow comparison with published results. The FRED semi-annual setup is the same as the M3 quarterly setup.

SLIDE 15

Empirical Evaluation

MAPA is better than the benchmark ETS. The longer the horizon, the better the relative performance.

Annual data: 2 aggregation levels Semi-annual data: 2 aggregation levels

SLIDE 16

Empirical Evaluation

With seasonality present, MAPA outperforms Comb. The longer the horizon, the better the relative performance.

Quarterly data: 4 aggregation levels Monthly data: 12 aggregation levels

SLIDE 17

Empirical Evaluation

Summary

On average, better performance than exponential smoothing:

  • Significant for practice: most systems and organisations use exponential smoothing
  • Switching from ETS to MAPA requires small and transparent changes
  • Particularly good for long-term forecasts

Both high- and low-frequency time series components are captured:

  • The same forecast is useful for operational, tactical and strategic horizons
  • Reconciles short-term forecasting with long-term forecasting
  • Operational forecasts naturally aggregate to predictions for capacity planning, etc. → implications for supply chain and operations management

Can we improve further on the short-term forecasts?

  • Standard time series modelling approach: combine MAPA with ETS using a simple average

SLIDE 18

Empirical Evaluation

Combined ETS-MAPA

Combining ETS with MAPA(Mean) and MAPA(Median). [Figure: regions where MAPA, the MAPA-ETS combination, or ETS performs best.] In the MAPA-ETS combination we can show that each state is eventually calculated with weights w₁ = (K+1)/(2K) for the original frequency and w₂, …, w_K = 1/(2K) for the aggregate levels, where K is the maximum temporal aggregation level.

  • Temporal hierarchies! A grounding for theoretically identifying the optimum aggregation combination, and for combination weights that vary with the forecast horizon. Best literature result: 13.83
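Under the reading that the slide's weights are w₁ = (K+1)/(2K) and w_k = 1/(2K) for k ≥ 2, a quick check confirms that these are exactly what a simple average of ETS (all weight on level 1) and MAPA (equal weight 1/K per level) produces:

```python
# Verify that averaging ETS weights (all on level 1) with MAPA weights
# (1/K per level) yields w1 = (K+1)/(2K) and wk = 1/(2K) for k >= 2.
K = 12                                   # maximum temporal aggregation level
mapa_w = [1.0 / K] * K                   # MAPA: equal weight per level
ets_w = [1.0] + [0.0] * (K - 1)          # ETS: all weight on level 1
combined = [(m + e) / 2.0 for m, e in zip(mapa_w, ets_w)]

assert abs(combined[0] - (K + 1) / (2 * K)) < 1e-12
assert all(abs(w - 1.0 / (2 * K)) < 1e-12 for w in combined[1:])
assert abs(sum(combined) - 1.0) < 1e-12  # weights form a convex combination
```

So the combination shifts weight towards the original sampling frequency, which is consistent with it improving the short-term forecasts.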

SLIDE 19

Conclusions

  • MAPA provides a framework to better identify and estimate the different time series components → better forecasts
  • On average it outperforms ETS, one of the most widely used, robust and accurate univariate forecasting methods
  • Evidence of hierarchies in time may lead to theoretically optimal aggregation levels and variable combination weights
  • Implications for short- and long-term forecasts
SLIDE 20

Nikos Kourentzes

Lancaster University Management School, Centre for Forecasting, Lancaster, LA1 4YX
email: n.kourentzes@lancaster.ac.uk

Questions?

SLIDE 21

Algorithm

Part 3

Multiple Aggregation Prediction Algorithm

[Diagram:
Step 1 (Aggregation): the series Y is aggregated non-overlappingly at levels k = 2, k = 3, …, k = K, giving Y^[1], Y^[2], Y^[3], …, Y^[K].
Step 2 (Forecasting): ETS model selection is carried out independently at each aggregation level, yielding level, trend and seasonal states l^[1], b^[1], s^[1], …, l^[K], b^[K], s^[K].
Step 3 (Combination): the states are combined across levels with weight 1/K each into l, b and s, and summed to produce the forecast Ŷ^[1].]

This strengthens and attenuates components, estimates parameters at multiple levels, and provides robustness in model selection and parameterisation.