 
Improving forecasting by estimating time series structural components across multiple frequencies
Nikolaos Kourentzes, Fotios Petropoulos, Juan R. Trapero
Multiple Aggregation Prediction Algorithm
Agenda
1. Motivation
2. The idea behind the algorithm
3. Multiple Aggregation Prediction Algorithm
4. Empirical evaluation
5. Conclusions
Motivation Forecasting
Forecasting is crucial for several operations of organisations:
• Short- and long-term objectives
• Demand and inventory planning
• Capacity planning
• Pricing and marketing strategy planning
• Budgeting
• etc.
Requirement for a large number of forecasts → Automation
Key issues in forecasting automation:
• Model selection
• Model parameterisation
• Forecast reconciliation
Motivation Exponential Smoothing
Let us consider the example of the Exponential Smoothing method (ETS):
• Considered one of the most reliable and robust methods for automatic univariate forecasting [Makridakis & Hibon, 2000, Hyndman et al., 2002, Gardner, 2006]
• It is a family of methods: ETS(error type, trend type, seasonality type)
  • Error: Additive (A) or Multiplicative (M)
  • Trend: None (N) or Additive (A) or Multiplicative (M), Linear or Damped/Exponential
  • Seasonality: None (N) or Additive (A) or Multiplicative (M)
• Adequate for most types of time series
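As a small illustration of the taxonomy above, the sketch below enumerates the candidate model forms. The shorthand labels `Ad` and `Md` for the damped additive and damped multiplicative trends are an assumption made here for readability; they do not appear on the slide.

```python
from itertools import product

# Component options of the ETS family as listed on the slide.
# "Ad"/"Md" (damped variants) are labels assumed for this sketch.
ERRORS = ["A", "M"]
TRENDS = ["N", "A", "Ad", "M", "Md"]
SEASONS = ["N", "A", "M"]


def ets_candidates():
    """Return every ETS(error, trend, seasonality) combination as a string."""
    return [f"ETS({e},{t},{s})" for e, t, s in product(ERRORS, TRENDS, SEASONS)]


models = ets_candidates()
print(len(models))  # 2 error types x 5 trend types x 3 seasonality types
```

With two error types, five trend types and three seasonality types, the enumeration yields 2 × 5 × 3 = 30 candidate forms.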
Motivation Optimisation and model selection
We have an optimisation problem of estimating the smoothing parameters $\theta = (\alpha, \beta, \gamma, \phi)'$ and the initial state $x_0$. This is done by maximising the likelihood of the model, which is equivalent to minimising:
$$L^*(\theta, x_0) = n \log\left(\sum_{t=1}^{n} \varepsilon_t^2\right) + 2 \sum_{t=1}^{n} \log\left|r(x_{t-1})\right|$$
where $\varepsilon_t$ are the one-step-ahead errors and $r(\cdot)$ is the error scaling term of the state space model (equal to 1 for additive errors).
For automatic forecasting we can consider up to 30 different models. This introduces a model selection problem. Hyndman et al., 2002 suggested solving this via the Akaike Information Criterion (AIC) and provided supporting empirical evidence:
$$\mathrm{AIC} = L^*(\hat{\theta}, \hat{x}_0) + 2p$$
where $p$ is the number of smoothing parameters and initial states.
We select the model with the best AIC, which we use to forecast … for well-behaved data
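The selection step can be sketched on a toy scale. The example below is not the authors' implementation. It assumes additive errors, for which the criterion reduces to n·log(SSE) + 2p, and compares only two hypothetical candidates: a constant level, and simple exponential smoothing with the smoothing parameter found on a coarse grid and the initial level fixed to the first observation (but still counted as a parameter).

```python
import math


def ses_sse(y, alpha, level0):
    """Sum of squared one-step-ahead errors of simple exponential smoothing."""
    level, sse = level0, 0.0
    for obs in y:
        err = obs - level
        sse += err * err
        level += alpha * err
    return sse


def aic_additive(sse, n, n_params):
    """AIC under additive errors: n*log(SSE) + 2p (assumed simplification)."""
    return n * math.log(sse) + 2 * n_params


def select_by_aic(y):
    """Score two toy candidates and return (best_name, all_scores)."""
    n = len(y)
    # Candidate 1: constant level at the sample mean, p = 1.
    const_sse = ses_sse(y, 0.0, sum(y) / n)
    # Candidate 2: SES, alpha from a coarse grid, initial level y[0], p = 2.
    alpha = min((a / 100 for a in range(1, 100)),
                key=lambda a: ses_sse(y, a, y[0]))
    scores = {
        "constant": aic_additive(const_sse, n, 1),
        "SES": aic_additive(ses_sse(y, alpha, y[0]), n, 2),
    }
    return min(scores, key=scores.get), scores


best, scores = select_by_aic([10.0] * 10 + [20.0] * 10)
print(best)  # the series has a level shift, so the adaptive model scores better
```

The penalty term 2p is what lets the simpler candidate win when the extra parameter does not reduce the error enough.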
Motivation Issues
What can go wrong in parameter and model selection:
• Business time series are often short → limited data
  • Estimation of parameters can fail miserably (for monthly data we optimise up to 18 parameters, often with no more than 36 observations)
  • Model selection can fail as well (30 models → over-fitting?)
• Both optimisation and model selection are myopic → focus on fitting the data in the past, rather than 'forecastability'
• Special cases:
  • True model: additive trend, additive seasonality
  • Identified model: no trend, additive seasonality
  • Why? In-sample variance is explained mostly by seasonality
[Figure: Sales vs Month, showing Demand, Fit and Forecast]
Reliable automatic forecasting requires robust parameter estimation and model selection
Idea Time/Frequency domains
Given a monthly time series:
[Figure: two panels. Time series plot (Demand vs Month) and Power spectrum (Power vs Frequency). In the spectrum, the low-frequency components correspond to level + trend; the other peaks are the seasonality and its harmonics.]
We can look at a time series in the classical way, or in the frequency domain.
Differences in the frequency domain:
• Components are separated
• ETS is a filter, with the smoothing parameters $\theta$ deciding its shape
• Initial states $x_0$ cannot be retrieved
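To make the frequency-domain view concrete, here is a minimal periodogram sketch using a plain discrete Fourier transform. The synthetic series (a mild trend plus a 12-month sine wave) is invented for illustration and is not the series shown on the slide.

```python
import cmath
import math


def periodogram(y):
    """Return (frequencies, power) of the mean-centred series via a direct DFT."""
    n = len(y)
    mean = sum(y) / n
    centred = [v - mean for v in y]
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        coef = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                   for t, v in enumerate(centred))
        freqs.append(k / n)          # cycles per observation
        power.append(abs(coef) ** 2 / n)
    return freqs, power


# Hypothetical monthly series: mild linear trend plus annual seasonality.
y = [100 + 0.1 * t + 20 * math.sin(2 * math.pi * t / 12) for t in range(120)]
freqs, power = periodogram(y)
peak = max(range(len(power)), key=power.__getitem__)
print(freqs[peak])  # the seasonal peak sits at 1/12 cycles per month
```

In this toy series the trend shows up as power concentrated at the lowest frequencies, while the seasonal pattern appears as a distinct peak at frequency 1/12.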
Idea Temporal Aggregation
Given a monthly time series we can perform non-overlapping temporal aggregation.
[Figure: time series (top row, vs Period) and power spectrum (bottom row, Power vs Frequency) at five aggregation levels: 1 (monthly), 3 (quarterly), 6 (half-annually), 9 (9-monthly) and 12 (annually). Panel annotations range from "Strong seasonality" through "Seasonality" and "Weak seasonality" to "No seasonality".]
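Non-overlapping temporal aggregation can be sketched in a few lines. Two choices below are assumptions of this sketch rather than statements from the slide: buckets are averaged (which keeps the scale comparable across levels), and when the series length is not a multiple of the aggregation level the oldest observations are dropped.

```python
def aggregate(y, k):
    """Average consecutive, non-overlapping buckets of k observations.

    Assumption: incomplete buckets are avoided by discarding the oldest
    observations, so the most recent data are always kept.
    """
    n_buckets = len(y) // k
    start = len(y) - n_buckets * k
    return [sum(y[start + i * k: start + (i + 1) * k]) / k
            for i in range(n_buckets)]


monthly = list(range(1, 13))      # hypothetical 12 monthly observations
print(aggregate(monthly, 3))      # quarterly view: [2.0, 5.0, 8.0, 11.0]
print(aggregate(monthly, 12))     # annual view: [6.5]
```

Applying `aggregate` for every level from 1 up to the seasonal period gives the family of series that the multiple-aggregation idea works with.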
Idea Temporal Aggregation
Temporal non-overlapping aggregation:
• Shown to be beneficial for forecasting accuracy → ADIDA algorithm [Nikolopoulos et al., 2011] (motivated by intermittent data)
  • Step 1: Aggregate time series
  • Step 2: Forecast time series
  • Step 3: Disaggregate time series
• Good performance for slow and fast moving goods [Nikolopoulos et al., 2011, Babai et al., 2012]
• Reduces noise as the aggregation level increases, but removes component information [Spithourakis et al., 2012]
• Consider aggregating a monthly time series and then disaggregating: seasonality is lost. Reconstruction would be limited to deterministic forms.
• Selection of aggregation level → no theoretical grounding [Nikolopoulos et al., 2011, Spithourakis et al., 2011]
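The three ADIDA steps can be illustrated with a deliberately simple sketch. The choices here are assumptions for illustration only: buckets are summed, the aggregate series is forecast with simple exponential smoothing using a fixed `alpha`, and the aggregate forecast is split equally across the periods of the next bucket.

```python
def adida_forecast(y, k, alpha=0.2):
    """Aggregate, forecast and disaggregate (illustrative ADIDA-style sketch)."""
    # Step 1: aggregate into non-overlapping buckets of size k (sums),
    # dropping the oldest observations if the length is not a multiple of k.
    n_buckets = len(y) // k
    trimmed = y[len(y) - n_buckets * k:]
    agg = [sum(trimmed[i * k:(i + 1) * k]) for i in range(n_buckets)]

    # Step 2: forecast the aggregate series with simple exponential smoothing.
    level = agg[0]
    for obs in agg[1:]:
        level += alpha * (obs - level)

    # Step 3: disaggregate with equal weights over the next k periods.
    return [level / k] * k


intermittent = [0, 0, 6, 0, 0, 6, 0, 0, 6]   # hypothetical slow-moving demand
print(adida_forecast(intermittent, 3))       # [2.0, 2.0, 2.0]
```

The equal split in step 3 shows why seasonality cannot be recovered after disaggregation: every period inside a bucket receives the same value.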
Idea Temporal Aggregation
What if we do not select an aggregation level? → Use multiple
[Figure: Demand vs Period at four aggregation levels, each with a different selected model: level 1 ETS(A,N,A), level 3 ETS(A,M,A), level 7 ETS(A,A,N), level 12 ETS(A,M,N).]
Issues:
• Different model
• Different length
• Combination
Idea Combination
Forecast combination:
• Forecast combination is widely considered beneficial for forecasting accuracy and forecast error variance [Bates & Granger, 1969, Makridakis & Winkler, 1983, Clemen, 1989, Hibon & Evgeniou, 2005]
• Simple combination methods (average, median) are considered robust and relatively accurate compared to more complex methods [Clemen, 1989, Timmermann, 2006, Jose & Winkler, 2008]
Issues:
• If different model types are combined, the resulting forecast does not fit well with any component!
[Figure: two panels of Demand vs Period.]
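The simple combination operators mentioned above can be sketched as follows. The function name `combine` and the example forecasts are hypothetical; each inner list is one model's forecast over the same horizon.

```python
from statistics import mean, median


def combine(forecasts, method="mean"):
    """Combine several forecasts horizon by horizon with the mean or median."""
    operator = {"mean": mean, "median": median}[method]
    return [operator(step) for step in zip(*forecasts)]


# Three hypothetical two-step-ahead forecasts; the third contains an outlier.
forecasts = [[100.0, 110.0], [120.0, 130.0], [200.0, 90.0]]
print(combine(forecasts, "mean"))    # [140.0, 110.0]
print(combine(forecasts, "median"))  # [120.0, 110.0]
```

The median is less affected by the outlying first-step forecast, which is one reason simple operators are regarded as robust.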
The MAPA algorithm Part 1
Aggregate → Fit state space ETS → Save states
[Figure: the original series is aggregated at levels 1, 2, 3, …, 10, 11, 12; a state space ETS model is fitted to each aggregated series and its Level, Trend and Season states are saved.]
The MAPA algorithm Part 2
• Transform states to additive and to the original sampling frequency
• Combine states (components)
• Produce forecasts
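The second part of the algorithm can be sketched under strong simplifying assumptions; this is not the authors' implementation. The sketch assumes the per-level component forecasts are already additive and given as plain lists, that a value at aggregation level k is mapped back to the original frequency by repeating it k times, and that the seasonal component is averaged only over the levels that carry one. All names and numbers are hypothetical.

```python
def to_original_frequency(values, k, horizon):
    """Repeat each aggregated value k times and cut to the forecast horizon."""
    expanded = [v for v in values for _ in range(k)]
    return expanded[:horizon]


def mapa_combine(components, horizon):
    """Average level, trend and season across levels, then add them up.

    components maps aggregation level k to a dict with additive forecasts
    for "level", "trend" and "season" (None when the level has no season).
    """
    def average(name):
        series = [to_original_frequency(c[name], k, horizon)
                  for k, c in components.items() if c.get(name) is not None]
        if not series:
            return [0.0] * horizon
        return [sum(step) / len(series) for step in zip(*series)]

    level, trend, season = average("level"), average("trend"), average("season")
    return [l + b + s for l, b, s in zip(level, trend, season)]


components = {
    1: {"level": [100.0] * 4, "trend": [1.0, 2.0, 3.0, 4.0],
        "season": [5.0, -5.0, 5.0, -5.0]},
    2: {"level": [104.0, 104.0], "trend": [3.0, 7.0], "season": None},
}
print(mapa_combine(components, horizon=4))  # [109.0, 99.5, 112.0, 102.5]
```

Combining components rather than final forecasts is what avoids the problem of averaging forecasts from different model types.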
Empirical Evaluation
Assess the performance of MAPA on four datasets:
• 645 annual time series from the M3 competition [Makridakis & Hibon, 2000]
• 1483 semi-annual time series from the FRED database
• 756 quarterly time series from the M3 competition
• 1428 monthly time series from the M3 competition
Setup identical to the M3 competition to allow comparison with published results. The FRED semi-annual setup is the same as for M3 quarterly.
Empirical Evaluation
Annual data: 2 aggregation levels
Semi-annual data: 2 aggregation levels
• Better than benchmark ETS
• The longer the horizon, the better the relative performance
Empirical Evaluation
Quarterly data: 4 aggregation levels
Monthly data: 12 aggregation levels
• With seasonality present MAPA outperforms Comb
• The longer the horizon, the better the relative performance
Empirical Evaluation Summary
• On average better performance than exponential smoothing
  • Significant for practice, as most systems and organisations use exponential smoothing
  • Switching from ETS to MAPA requires small and transparent changes
• Particularly good for long-term forecasts → both high- and low-frequency time series components are captured:
  • Same forecast useful for operational, tactical and strategic horizons
  • Reconciles short-term forecasting with long-term forecasting
  • Operational forecasts naturally aggregate to predictions for capacity planning, etc. → implications for supply chain and operations management
• Can we improve further on the short-term forecasts?
  • Standard time series modelling approach: combine MAPA with ETS using a simple average.