

SLIDE 1

The Effectiveness of Discretization in Forecasting: An Empirical Study on Neural Time Series Models

6th Workshop on Mining and Learning from Time Series @ KDD 2020

Stephan Rabanser¹* (stephan.rabanser@mail.utoronto.ca)
Tim Januschowski² (tjnsch@amazon.com)
Valentin Flunkert² (flunkert@amazon.com)
David Salinas³ (david.salinas@naverlabs.com)
Jan Gasthaus² (gasthaus@amazon.com)

¹ University of Toronto, Vector Institute (* work done at AWS AI Labs)
² Amazon, AWS AI Labs
³ NAVER LABS Europe

August 24, 2020

SLIDE 2

Motivation & Setup

  • Recent advancements in global forecasting: model architectures and probabilistic outputs.
  • We investigate effects of (discrete) I/O representations.

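[Figure: forecasting setup. Diagram labels: context range, prediction range (τ), multiple sampling windows, prediction sample paths; inputs z_{i,1:T_i} (with input transformation φ) and x_{i,1:T_i+τ}; Model; Distribution; Likelihood against z_{i,T_i+1:T_i+τ} (with output transformation ψ).]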

  • φ: input transformation.
  • ψ: output transformation; influences the output distribution.

SLIDE 3

Scaling Problem: A Motivating Example (m4 hourly)

[Figure: three panels of m4 hourly series: "Original time series", "Time series after scaling", and "Time series after q-transform".]

SLIDE 4

Continuous Transforms

Addressing the scaling problem in global forecasting is of utmost importance!

Scaling

Apply an affine transformation to each time series:

  • General form: $z'_{i,t} = (z_{i,t} - b_i)/a_i$.
  • Classic mean scaling (ms):
      • $a_i = \frac{1}{T_i} \sum_{t=1}^{T_i} |z_{i,t}|$
      • $b_i = 0$
  • Lots of possible variations ...
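Mean scaling as defined above can be sketched in a few lines of NumPy. The function name `mean_scale` and the `eps` guard against all-zero series are illustrative assumptions, not part of the slides.

```python
import numpy as np

def mean_scale(z, eps=1e-10):
    """Mean scaling: z' = (z - b) / a with b = 0 and a = mean(|z|)."""
    z = np.asarray(z, dtype=float)
    a = np.mean(np.abs(z))
    a = max(a, eps)  # assumption: avoid division by zero for all-zero series
    return z / a, a

# Two series with the same shape but very different magnitudes.
small = np.array([1.0, 2.0, 3.0])
large = np.array([1000.0, 2000.0, 3000.0])

small_scaled, a_small = mean_scale(small)
large_scaled, a_large = mean_scale(large)
```

After scaling, both series map to the same values, which is the property that lets a single global model see them on a common scale.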

Probability Integral Transform (pit)

Maps a RV Z through its CDF:

  • $Y = F_Z(Z)$ with $Y$ being uniform.
  • Data preprocessing: make the empirical marginal of each time series approximately uniform [3].
  • $z'_{i,t} = \hat{F}_i(z_{i,t})$ with $\hat{F}_i$ being the ECDF for time series $z_{i,1:T_i}$.
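A minimal sketch of the ECDF-based transform, assuming the plain empirical CDF $\hat{F}_i(v) = \frac{1}{T_i} \#\{t : z_{i,t} \le v\}$ with no smoothing or tie handling. The function name `ecdf_transform` is hypothetical.

```python
import numpy as np

def ecdf_transform(z):
    """Map each observation through the ECDF of its own series."""
    z = np.asarray(z, dtype=float)
    sorted_z = np.sort(z)
    # searchsorted(..., side="right") counts how many observations are <= v.
    return np.searchsorted(sorted_z, z, side="right") / len(z)

# The outlier 1000.0 no longer dominates after the transform.
z = np.array([10.0, 1000.0, 20.0, 15.0])
u = ecdf_transform(z)
```

The output depends only on the rank of each observation, so the transformed values lie in (0, 1] regardless of the original scale.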

SLIDE 5

Discretizing Transforms

  • Binning function $b : \mathbb{R} \to \{1, 2, \ldots, B\}$ mapping a real input to a discrete output.
  • Each $b \in \{1, \ldots, B\}$ is tied to a bucket $S_b = [l_{b-1}, l_b)$: $b(z) = b$ iff $z \in S_b$.

Equally-Spaced Binning

Construct buckets to be equal in width:

  • Only optimal for uniform data.
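Equally-spaced binning with half-open buckets $[l_{b-1}, l_b)$ might look as follows. The helper names, the fixed range `[lo, hi]`, and the choice to clip out-of-range values into the first or last bucket are assumptions made for this sketch.

```python
import numpy as np

def equal_width_edges(lo, hi, num_bins):
    """Return B + 1 edges l_0 < ... < l_B that split [lo, hi] into equal widths."""
    return np.linspace(lo, hi, num_bins + 1)

def bin_index(z, edges):
    """Return b in {1, ..., B} such that z lies in [l_{b-1}, l_b)."""
    z = np.asarray(z, dtype=float)
    b = np.searchsorted(edges, z, side="right")
    # Assumption: values outside [l_0, l_B) are assigned to the nearest bucket.
    return np.clip(b, 1, len(edges) - 1)

edges = equal_width_edges(0.0, 10.0, 5)          # 0, 2, 4, 6, 8, 10
bins = bin_index([0.0, 1.9, 2.0, 9.9, 10.0], edges)
```

Note that 2.0 falls into bucket 2 because the buckets are closed on the left, and 10.0 is clipped into the last bucket.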


Quantile Binning (discrete pit)

Construct buckets to be equal in mass:

  • Adapts bins to fit the data distribution.
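Quantile binning can be sketched by placing the bucket edges at empirical quantiles, so each bucket holds roughly the same number of observations. The helper names and the use of `np.quantile` with its default interpolation are assumptions of this sketch.

```python
import numpy as np

def quantile_edges(z, num_bins):
    """Edges at the 0, 1/B, ..., 1 empirical quantiles of z."""
    z = np.asarray(z, dtype=float)
    return np.quantile(z, np.linspace(0.0, 1.0, num_bins + 1))

def bin_index(z, edges):
    """Return b in {1, ..., B} such that z lies in [l_{b-1}, l_b)."""
    b = np.searchsorted(edges, np.asarray(z, dtype=float), side="right")
    return np.clip(b, 1, len(edges) - 1)  # the maximum goes into the last bucket

# Heavily skewed data: equal-width bins would leave most buckets nearly empty.
rng = np.random.default_rng(0)
z = rng.lognormal(size=1000)

edges = quantile_edges(z, 4)
bins = bin_index(z, edges)
counts = np.bincount(bins, minlength=5)[1:]  # observations per bucket 1..4
```

Here `counts` is close to 250 per bucket even though the data are far from uniform.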

SLIDE 6

Our Binning Strategies: Local Absolute & Global Relative Binning

[Figure: three binning pipelines: Local Absolute Binning (lab), Global Relative Binning (grb), and Hybrid Binning (hyb). Diagram blocks: ms, bin, emb, concat.]
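The slide names the strategies without defining them, so the following is only one plausible reading of the names and diagram labels: lab fits quantile edges separately on each raw series, while grb first applies mean scaling (ms) and then shares one set of edges across all series. All function names are hypothetical.

```python
import numpy as np

def quantile_bin(z, edges):
    """Bucket index in {1, ..., B} for half-open buckets [l_{b-1}, l_b)."""
    b = np.searchsorted(edges, z, side="right")
    return np.clip(b, 1, len(edges) - 1)

def local_absolute_binning(series, num_bins):
    """Assumed lab: per-series quantile edges on the unscaled values."""
    levels = np.linspace(0.0, 1.0, num_bins + 1)
    return [quantile_bin(z, np.quantile(z, levels)) for z in series]

def global_relative_binning(series, num_bins):
    """Assumed grb: mean-scale each series, then use shared quantile edges."""
    scaled = [z / max(np.mean(np.abs(z)), 1e-10) for z in series]
    levels = np.linspace(0.0, 1.0, num_bins + 1)
    edges = np.quantile(np.concatenate(scaled), levels)
    return [quantile_bin(z, edges) for z in scaled]

series = [np.array([1.0, 2.0, 3.0, 4.0]),
          np.array([100.0, 200.0, 300.0, 400.0])]
lab = local_absolute_binning(series, 4)
grb = global_relative_binning(series, 4)
```

Under this reading, hybrid binning (hyb) would embed both bin sequences and concatenate the embeddings, as the "emb" and "concat" labels in the diagram suggest.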

SLIDE 7

Models & Output Distributions

Models

We consider three different models which we combine with the aforementioned I/O transformations:

  • Simple Feed Forward: SFF
  • Autoregressive CNN: WaveNet [2]
  • Autoregressive RNN: DeepAR [4]

[Figure: input side of the setup diagram. Labels: z_{i,1:T_i}, φ, x_{i,1:T_i+τ}, Model, Distribution.]

Output Distributions

We compare three different approaches for modeling the output distribution $p(z_t \mid h_t)$:

  • Student-t distribution (st);
  • Piecewise-linear spline quantile function approach of [1] (plqs);
  • Categorical distribution (cat).
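For the categorical option, the training loss on a binned target is the negative log-likelihood of the observed bucket. The sketch below assumes the model emits one unnormalized score (logit) per bucket and that bucket indices are 1-based; the function name is hypothetical.

```python
import numpy as np

def categorical_nll(logits, target_bin):
    """Negative log-likelihood of a 1-based bucket index under softmax(logits)."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max()  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[target_bin - 1]

# Uniform scores over 4 buckets give a loss of log(4).
uniform_nll = categorical_nll(np.zeros(4), target_bin=2)

# Putting most mass on the correct bucket lowers the loss.
confident_nll = categorical_nll(np.array([0.0, 10.0, 0.0, 0.0]), target_bin=2)
```

Because the distribution is over buckets, the number and placement of output bins directly bound how finely the forecast can resolve values.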

[Figure: output side of the setup diagram. Labels: z_{i,T_i+1:T_i+τ}, ψ, Model, Distribution, Likelihood.]

SLIDE 8

Experimental Results

  • Varying I/O representations with models on m4, electricity, traffic, wiki.

Output Scaling vs Binning

  • Output representation has a large performance impact. Loss differences (max/min/avg):
      • WaveNet: 3.6x / 1.2x / 1.7x
      • DeepAR: 7.6x / 1.4x / 2.9x
      • SFF: 1.8x / 1.0x / 1.2x
  • WaveNet profits a lot from binning (8/9); WaveNet with grb performs best (7/9).
  • DeepAR shows degradation in performance with binning over ms (avg 2.6x higher loss).
  • Mixed results for SFF (no clear winner).

Input Scaling vs Binning

  • Input representation has a smaller performance impact. Loss differences (max/min/avg):
      • WaveNet: 3.0x / 1.4x / 1.9x
      • DeepAR: 5.7x / 1.0x / 1.9x
      • SFF: 1.8x / 1.0x / 1.2x
  • No single input representation clearly dominates the others.
  • Multi-scale hybrid binning often does well (6/9); lab performs badly (9/9).
  • grb and pit are mostly on par (avg 1.4x).

SLIDE 9

Binning Resolution Effects (m4 hourly)

[Figure: mean wQL (y-axis) vs. number of input bins (x-axis, log scale from 10^0 to 10^4) for GRB, LAB, and PIT.]

Performance effects of varying input binning resolutions w.r.t. a fixed 1024-bin q-grb output binning.

[Figure: mean wQL (y-axis) vs. number of output bins (x-axis, log scale from 10^1 to 10^4) for GRB and LAB.]

Performance effects of varying output binning resolutions w.r.t. a fixed 1024-bin q-grb input binning.
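The plots report mean wQL, which the slides do not define. A sketch under the assumption that it follows the common weighted quantile loss, $\mathrm{wQL}[q] = 2 \sum_{i,t} \Lambda_q(\hat{z}_{i,t}, z_{i,t}) / \sum_{i,t} |z_{i,t}|$ with $\Lambda_q(\hat{z}, z) = (q - \mathbb{1}[z < \hat{z}])(z - \hat{z})$, averaged over quantile levels. Function names are hypothetical.

```python
import numpy as np

def quantile_loss(z, z_hat, q):
    """Pinball loss (q - 1[z < z_hat]) * (z - z_hat), summed over all entries."""
    z = np.asarray(z, dtype=float)
    z_hat = np.asarray(z_hat, dtype=float)
    return np.sum((q - (z < z_hat)) * (z - z_hat))

def weighted_quantile_loss(z, z_hat, q):
    """Quantile loss normalized by the total absolute target value."""
    z = np.asarray(z, dtype=float)
    return 2.0 * quantile_loss(z, z_hat, q) / np.sum(np.abs(z))

def mean_wql(z, quantile_forecasts):
    """Average wQL over a dict mapping quantile level -> forecast array."""
    return float(np.mean([weighted_quantile_loss(z, z_hat, q)
                          for q, z_hat in quantile_forecasts.items()]))

z = np.array([10.0, 20.0])
median_forecast = np.array([12.0, 18.0])
wql_median = weighted_quantile_loss(z, median_forecast, 0.5)
perfect = mean_wql(z, {0.1: z, 0.5: z, 0.9: z})
```

For the median forecast above, each entry contributes a pinball loss of 1, so `wql_median` equals 2 · 2 / 30.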

SLIDE 10

Summary

Picking a good I/O representation is as important as selecting a good model!

Extended Paper: https://arxiv.org/abs/2005.10111

GluonTS: Probabilistic Time Series Modeling Library (Python): https://github.com/awslabs/gluon-ts

SLIDE 11

References

  [1] J. Gasthaus, K. Benidis, Y. Wang, S. S. Rangapuram, D. Salinas, V. Flunkert, and T. Januschowski. Probabilistic Forecasting with Spline Quantile Function RNNs. In The 22nd International Conference on Artificial Intelligence and Statistics, 2019.

  [2] A. v. d. Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu. WaveNet: A Generative Model for Raw Audio. arXiv preprint arXiv:1609.03499, 2016.

  [3] D. Salinas, M. Bohlke-Schneider, L. Callot, R. Medico, and J. Gasthaus. High-Dimensional Multivariate Forecasting with Low-Rank Gaussian Copula Processes. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 6824–6834. Curran Associates, Inc., 2019.

  [4] D. Salinas, V. Flunkert, J. Gasthaus, and T. Januschowski. DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks. International Journal of Forecasting, 2019.
