Improving forecasting by estimating time series structural components across multiple frequencies
Nikolaos Kourentzes, Fotios Petropoulos, Juan R. Trapero
Multiple Aggregation Prediction Algorithm
Agenda
- 1. Motivation
- 2. The idea behind the algorithm
- 3. Multiple Aggregation Prediction Algorithm
- 4. Empirical evaluation
- 5. Conclusions
Motivation
Forecasting
Forecasting is crucial for several operations of organisations:
- Short- and long-term objectives
- Demand and inventory planning
- Capacity planning
- Pricing and marketing strategy planning
- Budgeting
- etc.
Requirement for a large number of forecasts → Automation
Key issues in forecasting automation:
- Model selection
- Model parameterisation
- Forecast reconciliation
Exponential Smoothing
Let us consider the example of the Exponential Smoothing (ETS) family of methods:
- Considered one of the most reliable and robust methods for automatic univariate forecasting [Makridakis & Hibon, 2000; Hyndman et al., 2002; Gardner, 2006]
- It is a family of methods: ETS (error type, trend type, seasonality type)
- Error: Additive or Multiplicative
- Trend: None, Additive or Multiplicative; linear or damped
- Seasonality: None, Additive or Multiplicative
- Adequate for most types of time series
Motivation
Optimisation and model selection
We have an optimisation problem of estimating the smoothing parameters α, β, γ, φ and the initial states x₀. This is done by maximising the likelihood of the model.

For automatic forecasting we can consider up to 30 different models, which introduces a model selection problem. Hyndman et al. (2002) suggested solving this via the Akaike Information Criterion (AIC) and provided supporting empirical evidence:

AIC = L*(θ̂, x̂₀) + 2q,

where L* is minus twice the maximised log-likelihood and q is the number of smoothing parameters and initial states. We select the model with the best (lowest) AIC and use it to forecast … for well-behaved data.
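The AIC-based selection described on this slide can be sketched in a few lines. This is a toy illustration, not the authors' code: it compares only two family members, ETS(A,N,N) (simple exponential smoothing) and ETS(A,A,N) (Holt's linear trend), uses a coarse grid instead of likelihood optimisation, and approximates the likelihood term with the Gaussian shortcut n·log(SSE/n).

```python
import math

def ses_sse(y, alpha):
    """Sum of squared one-step errors for simple exponential smoothing."""
    level = y[0]
    sse = 0.0
    for obs in y[1:]:
        err = obs - level
        sse += err ** 2
        level += alpha * err
    return sse

def holt_sse(y, alpha, beta):
    """Sum of squared one-step errors for Holt's linear trend method."""
    level, trend = y[0], y[1] - y[0]
    sse = 0.0
    for obs in y[1:]:
        forecast = level + trend
        err = obs - forecast
        sse += err ** 2
        level = forecast + alpha * err      # error-correction form of ETS(A,A,N)
        trend = trend + alpha * beta * err
    return sse

def aic(sse, n, q):
    # q counts smoothing parameters plus initial states, as on the slide
    return n * math.log(sse / n) + 2 * q

def select_model(y):
    grid = [i / 10 for i in range(1, 10)]
    n = len(y) - 1                          # number of one-step errors
    best_ses = min(aic(ses_sse(y, a), n, q=2) for a in grid)
    best_holt = min(aic(holt_sse(y, a, b), n, q=4) for a in grid for b in grid)
    return "ETS(A,A,N)" if best_holt < best_ses else "ETS(A,N,N)"

# A trending series with a small alternating wiggle: the trended model wins
# despite its larger parameter penalty
trended = [10 + 2 * t + (0.5 if t % 2 else -0.5) for t in range(24)]
print(select_model(trended))  # ETS(A,A,N)
```

The 2q penalty is what keeps the richer model from winning by default; on a series without trend the extra two parameters of Holt's method would have to buy a substantial SSE reduction to overcome it.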
Issues
What can go wrong in parameter and model selection:
- Business time series are often short → limited data
- Estimation of parameters can fail miserably (for monthly data we optimise up to 18 parameters, often with no more than 36 observations)
- Model selection can fail as well (30 models → over-fitting?)
- Both optimisation and model selection are myopic → they focus on fitting past data, rather than on ‘forecastability’
- Special cases:
[Figure: monthly sales with model fit and forecast]
True model: additive trend, additive seasonality. Identified model: no trend, additive seasonality. Why? The in-sample variance is explained mostly by the seasonality.
Reliable automatic forecasting requires robust parameter estimation and model selection.
Idea
Time/Frequency domains
Given a monthly time series, we can look at it in the classical way (time series plot) or in the frequency domain (power spectrum).
[Figure: time series plot of monthly demand and its power spectrum. The low-frequency components correspond to level + trend; the remaining peaks are the seasonality and its harmonics.]
Differences, in the frequency domain:
- Components are separated
- ETS is a filter, with the smoothing parameters deciding its shape
- Initial states cannot be retrieved
Temporal Aggregation
Given a monthly time series, we can perform temporal non-overlapping aggregation:
[Figure: the series and its power spectrum at aggregation levels 1 (monthly), 3 (quarterly), 6 (half-annual), 9 (9-monthly) and 12 (annual). Monthly: strong seasonality; quarterly: seasonality; half-annual: weak seasonality; 9-monthly and annual: no seasonality.]
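Non-overlapping temporal aggregation is simple to state in code. A minimal sketch (illustrative names, not the authors' code): consecutive blocks of k observations are collapsed into one aggregate observation by summation; how remainder observations are handled varies between implementations, and here the trailing incomplete block is simply dropped.

```python
def aggregate(series, k):
    """Non-overlapping temporal aggregation at level k (block sums)."""
    n_blocks = len(series) // k   # an incomplete trailing block is dropped
    return [sum(series[i * k:(i + 1) * k]) for i in range(n_blocks)]

monthly = list(range(1, 25))      # two years of monthly data
print(aggregate(monthly, 3))      # quarterly: [6, 15, 24, 33, 42, 51, 60, 69]
print(aggregate(monthly, 12))     # annual: [78, 222]
```

Note how aggregation acts as a low-pass filter: at level 12 every monthly seasonal cycle is summed away entirely, which is exactly the pattern in the power spectra above.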
Temporal Aggregation
Temporal non-overlapping aggregation:
- Shown to be beneficial for forecasting accuracy: the ADIDA algorithm [Nikolopoulos et al., 2011], motivated by intermittent data:
  Step 1: Aggregate the time series
  Step 2: Forecast the aggregate series
  Step 3: Disaggregate the forecast
- Good performance for both slow- and fast-moving goods [Nikolopoulos et al., 2011; Babai et al., 2012]
- Reduces noise as the aggregation level increases, but removes component information [Spithourakis et al., 2012]
- Consider aggregating a monthly time series and then disaggregating: the seasonality is lost. Reconstruction would be limited to deterministic forms only.
- Selection of the aggregation level has no theoretical grounding [Nikolopoulos et al., 2011; Spithourakis et al., 2011]
What if we do not select a single aggregation level, but use multiple?
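The three ADIDA steps listed above can be sketched as follows. The forecaster (simple exponential smoothing) and the equal-weight disaggregation are illustrative choices for this sketch, not the only ones the method allows.

```python
def ses_forecast(y, alpha=0.3):
    """Simple exponential smoothing: the final level is the flat forecast."""
    level = y[0]
    for obs in y[1:]:
        level += alpha * (obs - level)
    return level

def adida(series, k, alpha=0.3):
    # Step 1: aggregate into non-overlapping buckets of size k
    aggregated = [sum(series[i:i + k])
                  for i in range(0, len(series) - k + 1, k)]
    # Step 2: forecast the aggregate series
    agg_forecast = ses_forecast(aggregated, alpha)
    # Step 3: disaggregate back to the original frequency (equal weights)
    return agg_forecast / k

demand = [0, 3, 0, 0, 4, 0, 0, 0, 5, 0, 2, 0]   # intermittent-looking series
print(round(adida(demand, k=3), 3))              # 1.089
```

Aggregation turns the intermittent monthly series into a smooth quarterly one ([3, 4, 5, 2]) that is far easier to forecast, which is exactly the intuition the slide describes.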
[Figure: the same demand series at aggregation levels 1, 3, 7 and 12, with a different ETS model selected at each level: ETS(A,N,A), ETS(A,M,A), ETS(A,M,N), ETS(A,A,N).]
Issues:
- Different models are selected at each level
- Different series lengths
- How to combine?
Combination
Forecast combination:
- Forecast combination is widely considered beneficial for forecasting accuracy and forecast error variance [Bates & Granger, 1969; Makridakis & Winkler, 1983; Clemen, 1989; Hibon & Evgeniou, 2005]
- Simple combination methods (average, median) are considered robust and relatively accurate compared to more complex methods [Clemen, 1989; Timmermann, 2006; Jose & Winkler, 2008]
Issues:
- If different model types are combined, the resulting forecast does not fit well with any component!
[Figure: two example combined forecasts of the monthly demand series.]
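The simple combination operators the slide refers to (average and median, applied point-by-point across competing forecasts) can be written directly; the helper name is illustrative, not from the cited papers.

```python
import statistics

def combine(forecasts, method="mean"):
    """Combine competing forecasts of equal horizon, point-by-point."""
    agg = statistics.mean if method == "mean" else statistics.median
    return [agg(row) for row in zip(*forecasts)]

f1 = [100, 102, 104]                      # e.g. a trended forecast
f2 = [98, 99, 100]                        # e.g. a flatter forecast
f3 = [105, 105, 105]                      # e.g. a level forecast
print(combine([f1, f2, f3], "mean"))      # [101, 102, 103]
print(combine([f1, f2, f3], "median"))    # [100, 102, 104]
```

The issue the slide raises is visible even here: the mean of a trended, a flat and a level forecast is a mild trend that matches none of the three underlying component structures.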
The MAPA algorithm
Part 1
[Figure: the original series is aggregated into series y^[1], y^[2], y^[3], …, y^[10], y^[11], y^[12].]
Aggregate: build the non-overlapping temporally aggregated series at every level.
Fit a state space ETS model at each aggregation level.
Save the states at each level: level, trend, season.
Part 2
- Transform the states to additive form and to the original sampling frequency
- Combine the states (components)
- Produce forecasts
Empirical Evaluation
Assess the performance of MAPA on four datasets:
- 645 annual time series from the M3 competition [Makridakis & Hibon, 2000]
- 1483 semi-annual time series from the FRED database
- 756 quarterly time series from the M3 competition
- 1428 monthly time series from the M3 competition
The setup is identical to the M3 competition to allow comparison with published results; the FRED semi-annual setup follows the M3 quarterly one.
Annual data: 2 aggregation levels. Semi-annual data: 2 aggregation levels.
MAPA is better than the benchmark ETS; the longer the horizon, the better its relative performance.
Quarterly data: 4 aggregation levels. Monthly data: 12 aggregation levels.
With seasonality present, MAPA outperforms Comb; the longer the horizon, the better its relative performance.
Summary
On average, better performance than exponential smoothing:
- Significant for practice, as most systems and organisations use exponential smoothing
- Switching from ETS to MAPA requires small and transparent changes
- Particularly good for long-term forecasts
Both high- and low-frequency time series components are captured:
- The same forecast is useful for operational, tactical and strategic horizons
- Reconciles short-term with long-term forecasting
- Operational forecasts naturally aggregate to predictions for capacity planning, etc.
- Implications for supply chain and operations management
Can we improve further on the short-term forecasts?
- Standard time series modelling approach: combine MAPA with ETS using a simple average.
Combined ETS-MAPA
Combining ETS with MAPA(Mean) and MAPA(Median).
[Figure: per-series comparison showing where MAPA, MAPA-ETS or ETS performs best.]
In the MAPA-ETS combination we can show that each state (here the level) is eventually calculated with weights w₁ = (K+1)/(2K) and w₂ … w_K = 1/(2K), where K is the maximum temporal aggregation level.
- Temporal hierarchies! Grounding for theoretically identifying the optimum aggregation combination, and variable combination weights conditional on the forecast horizon. Best literature result: 13.83.
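The weight claim is easy to verify: plain ETS puts all weight on aggregation level 1, MAPA(Mean) puts 1/K on each level, and averaging the two gives the stated weights. A short check (illustrative function name):

```python
def mapa_ets_weights(K):
    """Per-level weights of the 50/50 MAPA-ETS combination."""
    ets = [1.0] + [0.0] * (K - 1)         # ETS uses only the original level
    mapa = [1.0 / K] * K                  # MAPA mean weights each level equally
    return [(e + m) / 2 for e, m in zip(ets, mapa)]

w = mapa_ets_weights(12)                  # K = 12, e.g. monthly data
print(round(w[0], 4))                     # (12+1)/(2*12) ≈ 0.5417
print(round(w[1], 4))                     # 1/(2*12) ≈ 0.0417
print(sum(w))                             # the weights sum to 1
```

So the combination still leans on the original frequency (over half the weight) while spreading the remainder thinly across the higher aggregation levels, which is what makes it competitive at short horizons.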
Conclusions
- MAPA provides a framework to better identify and estimate the different time series components → better forecasts
- On average it outperforms ETS, one of the most widely used, robust and accurate univariate forecasting methods
- Evidence of hierarchies in time may lead to theoretically optimal aggregation levels and variable combination weights
- Implications for both short- and long-term forecasts
Nikos Kourentzes
Lancaster University Management School, Centre for Forecasting
Lancaster, LA1 4YX
email: n.kourentzes@lancaster.ac.uk
Questions?
Algorithm
Part 3
Multiple Aggregation Prediction Algorithm

Step 1: Aggregation. Construct the temporally aggregated series Y^[1], Y^[2], Y^[3], …, Y^[K] for aggregation levels k = 1, 2, 3, …, K.

Step 2: Forecasting. At each aggregation level, perform ETS model selection and estimation, and save the fitted states: level l^[k], trend b^[k] and seasonality s^[k].

Step 3: Combination. Translate the states to the original sampling frequency and combine them across levels, e.g. with equal weights: l = (1/K) Σₖ l^[k], b = (1/K) Σₖ b^[k], s = (1/K) Σₖ s^[k]. The combined states l, b and s produce the final forecast Ŷ.

This strengthens and attenuates the components, estimates the parameters at multiple levels, and gives robustness in model selection and parameterisation.
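The three steps above can be rendered end-to-end as a toy sketch. Simple exponential smoothing stands in for the per-level ETS model selection (an assumption made to keep the sketch short), and only the level state is estimated and combined; the real algorithm also handles trend and seasonal states, as Step 2 describes.

```python
def agg_mean(series, k):
    """Non-overlapping aggregation by block means (keeps the original scale)."""
    return [sum(series[i:i + k]) / k
            for i in range(0, len(series) - k + 1, k)]

def ses_level(y, alpha=0.3):
    """Final level state of simple exponential smoothing."""
    level = y[0]
    for obs in y[1:]:
        level += alpha * (obs - level)
    return level

def mapa_forecast(series, K=4, alpha=0.3):
    # Steps 1-2: aggregate and estimate a level state at each level k
    levels = [ses_level(agg_mean(series, k), alpha) for k in range(1, K + 1)]
    # Step 3: mean aggregation already keeps the states on the original
    # scale, so combining reduces to an equal-weight average across levels
    return sum(levels) / K

series = [20, 22, 19, 21, 20, 23, 19, 22, 21, 20, 22, 21]
print(round(mapa_forecast(series), 2))  # a level forecast near the series mean
```

Even in this stripped-down form, the higher aggregation levels act as heavier smoothers, so the combined level is more stable than any single-level estimate, which is the robustness argument the slide closes on.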