Prophet: Forecasting at Scale. Sean J. Taylor and Ben Letham, Facebook



slide-1
SLIDE 1

Prophet

Forecasting at Scale

Sean J. Taylor and Ben Letham Facebook / Core Data Science

slide-2
SLIDE 2

Outline

  • Motivation and requirements
  • Review of forecasting methods
  • Curve-fitting as forecasting
  • Uncertainty estimation
  • Tuning parameters

https://xkcd.com/605/

slide-3
SLIDE 3

Motivation

slide-4
SLIDE 4

Background

  • We have many applications that require forecasts.
  • Often even a single metric must be forecast numerous times (e.g. for each country).
  • Not many people have forecasting training or experience.
  • Not many existing solutions or tools.

slide-5
SLIDE 5

Many applications

Capacity planning

  • How many servers, employees, meals, parking spaces, etc., are we going to need?

Goal setting

  • How much would a metric grow by next year if we did nothing at all?

Anomaly detection

  • Is this spike in bug reports due to some actual problem or because it’s a holiday in Brazil?

Stuff we haven’t thought of yet

  • Forecasts can become components in complex data pipelines.

slide-6
SLIDE 6

Forecasting experience is uncommon

Often people train for:

  • training and deploying predictive models
  • working with text, image, and graph data
  • experimentation (A/B testing)
  • data visualization
  • deep learning
slide-7
SLIDE 7

Pareto principle for forecasting

  • 80% of applications can be handled by a relatively constrained class of models.
  • Don’t sell to the “top of the market”: very complex forecasting problems that can benefit from the most advanced approaches (e.g. LSTMs).
  • We focus on scaling to more applications by making forecasting quick, simple, and repeatable for human analysts to use.
  • We focus on scaling to more users by making the tool easy to use for beginners, with a path to improve models for experts.

[Figure: cumulative share of applications, ordered from lowest to highest complexity, versus cumulative share of complexity]
slide-8
SLIDE 8

Prophet

(semi) automate forecasting

  • find similarities across forecasting problems
  • build a tool that can solve most of them
  • make it easy to use + teach everyone to use it
  • give a path forward to improving forecasts

slide-9
SLIDE 9

Python API

>>> from fbprophet import Prophet
>>> # data: a pandas DataFrame with a date column 'ds' and a value column 'y'
>>> m = Prophet()
>>> m.fit(data)
>>> future = m.make_future_dataframe(periods=365)
>>> forecast = m.predict(future)

Implementation

  • Python and R packages
  • CRAN: prophet
  • PyPI: fbprophet
  • Core procedure implemented in Stan (a probabilistic programming language).
  • Version 0.1 released Feb 2017
  • Version 0.4 released Dec 2018
  • >8000 GitHub stars
slide-10
SLIDE 10

Review of time series methods

slide-11
SLIDE 11

AR and MA models

[Figure: four dependency diagrams linking the noise terms ε0, ε1, ε2 to the observations X0, X1, X2. Panels: White noise, AR(1), ARMA(1,1), MA(1)]

slide-12
SLIDE 12

ARMA models

  • A special case of ARIMA models with no integration (initial differencing step).
  • Problem: parameters don’t correspond to any human-interpretable properties of the time series.

ARMA(p,q):

$$X_t = \sum_{i=1}^{p} \alpha_i X_{t-i} + \sum_{i=1}^{q} \theta_i \epsilon_{t-i} + \epsilon_t$$
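As a rough illustration of the ARMA(p,q) recursion on this slide, the sketch below simulates a series from given coefficients using only the standard library. The function name `simulate_arma`, the Gaussian noise, and the choice to treat values before t = 0 as zero are assumptions made for the example; the slides do not specify them.

```python
import random


def simulate_arma(alphas, thetas, n, sigma=1.0, seed=0):
    """Simulate X_t = sum_i alphas[i-1]*X_{t-i} + sum_i thetas[i-1]*eps_{t-i} + eps_t.

    Values before t = 0 are treated as zero (an assumption of this sketch).
    Returns the simulated series and the noise draws that produced it.
    """
    rng = random.Random(seed)
    xs, eps = [], []
    for t in range(n):
        e_t = rng.gauss(0.0, sigma)
        ar = sum(a * xs[t - i] for i, a in enumerate(alphas, start=1) if t - i >= 0)
        ma = sum(th * eps[t - i] for i, th in enumerate(thetas, start=1) if t - i >= 0)
        xs.append(ar + ma + e_t)
        eps.append(e_t)
    return xs, eps


white_noise, _ = simulate_arma([], [], n=200)    # no AR or MA terms
ar1, _ = simulate_arma([0.8], [], n=200)         # AR(1)
ma1, _ = simulate_arma([], [0.5], n=200)         # MA(1)
arma11, _ = simulate_arma([0.8], [0.5], n=200)   # ARMA(1,1)
```

The coefficients 0.8 and 0.5 are arbitrary. Note that nothing about them tells an analyst where the trend bends or how strong the weekly pattern is, which is the interpretability problem the slide points out.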

slide-13
SLIDE 13

Exponential smoothing

$$S_t = \alpha X_t + (1 - \alpha) S_{t-1}$$
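A minimal sketch of this recursion, assuming the common initialisation S_0 = X_0 (the slide does not say how the first value is set). The function names are illustrative.

```python
def exponential_smoothing(xs, alpha):
    """Return S_t = alpha*X_t + (1 - alpha)*S_{t-1}, starting from S_0 = X_0."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    smoothed = [xs[0]]
    for x in xs[1:]:
        smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


def forecast_flat(xs, alpha, horizon):
    """Forecast every future step with the last smoothed level."""
    return [exponential_smoothing(xs, alpha)[-1]] * horizon
```

Because the forecast is just the last level repeated, simple exponential smoothing cannot extrapolate a trend on its own.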

slide-14
SLIDE 14

Double exponential smoothing

$$S_t = \alpha X_t + (1 - \alpha)(S_{t-1} + B_{t-1})$$
$$B_t = \beta (S_t - S_{t-1}) + (1 - \beta) B_{t-1}$$
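The two recursions can be sketched as follows. The initialisation (S_0 = X_0, B_0 = X_1 - X_0) is one common choice and is an assumption here, as are the function names.

```python
def double_exponential_smoothing(xs, alpha, beta):
    """Smooth a level S_t and a trend B_t with the two recursions on the slide.

    Initialisation (S_0 = X_0, B_0 = X_1 - X_0) is assumed, not taken from the slides.
    """
    if len(xs) < 2:
        raise ValueError("need at least two observations")
    level, trend = [xs[0]], [xs[1] - xs[0]]
    for x in xs[1:]:
        s = alpha * x + (1.0 - alpha) * (level[-1] + trend[-1])
        b = beta * (s - level[-1]) + (1.0 - beta) * trend[-1]
        level.append(s)
        trend.append(b)
    return level, trend


def forecast_linear(xs, alpha, beta, horizon):
    """Extrapolate h steps ahead as S_T + h * B_T."""
    level, trend = double_exponential_smoothing(xs, alpha, beta)
    return [level[-1] + h * trend[-1] for h in range(1, horizon + 1)]
```

On a perfectly linear series such as 2, 4, 6, 8 the level tracks the data exactly and the forecast continues the line.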

slide-15
SLIDE 15

Business time series features

  • outliers
  • multiple seasonalities
  • changes in trends
  • abrupt changes
slide-16
SLIDE 16

Parameters should capture structure

slide-17
SLIDE 17

Curve fitting

“Curve Fitting by Segmented Straight Lines” Bellman and Roth (1969)

slide-18
SLIDE 18

Additive model

y(t) = piecewise trend(t) + seasonality(t) + holiday effects(t) + i.i.d. noise

slide-19
SLIDE 19

Polynomials

  • Polynomials are a natural choice for fitting curves.
  • We can control the complexity of the fit using the degree of the polynomial.
  • But polynomials are terrible at extrapolation.

slide-20
SLIDE 20

Splines

  • Splines are piecewise polynomial curves.
  • They can have lower interpolation error than polynomials with fewer terms.

slide-21
SLIDE 21

Piecewise linear

  • The main curve that Prophet uses is piecewise linear.
  • These curves are simple to fit and tend to extrapolate well.
  • The hard part is deciding which “knots” or changepoints to use.
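To make the idea of a piecewise linear curve with changepoints concrete, here is a small sketch that evaluates one. The parameterisation (a base slope `k`, an offset `m`, and a slope change `delta_j` at each changepoint `s_j`, with an offset correction that keeps the curve continuous) follows my reading of the "Forecasting at Scale" paper; the function name and the example numbers are made up.

```python
def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Evaluate a continuous piecewise linear trend at time t.

    k, m: base slope and offset; changepoints: times s_j; deltas: slope changes.
    Each changepoint with s_j <= t adds delta_j to the slope and -s_j * delta_j
    to the offset, which keeps the curve continuous at s_j.
    """
    slope, offset = k, m
    for s_j, d_j in zip(changepoints, deltas):
        if t >= s_j:
            slope += d_j
            offset -= s_j * d_j
    return slope * t + offset


# Slope 1 until t = 10, slope 0.5 until t = 20, slope 2.5 afterwards.
trend = [
    piecewise_linear_trend(t, k=1.0, m=0.0, changepoints=[10, 20], deltas=[-0.5, 2.0])
    for t in range(31)
]
```

Extrapolation simply continues the last segment's slope, which is why these curves tend to behave sensibly beyond the data.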

slide-22
SLIDE 22

Changepoint selection

  • We generate a grid of potential changepoints.
  • Each changepoint is an opportunity for the underlying curve to change its slope.
  • Apply a Laplace prior (equivalent to L1-penalty) to changes to select simpler curves.
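The equivalence between a Laplace prior and an L1 penalty can be checked numerically: the negative log density of independent Laplace(0, τ) priors on the slope changes equals the L1 norm divided by τ, plus a constant that does not depend on the changes. The sketch below also builds an evenly spaced candidate grid; the even spacing and all function names are assumptions for illustration, not a description of Prophet's internals.

```python
import math


def changepoint_grid(t_min, t_max, n_changepoints):
    """Evenly spaced candidate changepoints strictly inside (t_min, t_max)."""
    step = (t_max - t_min) / (n_changepoints + 1)
    return [t_min + step * (j + 1) for j in range(n_changepoints)]


def laplace_neg_log_prior(deltas, tau):
    """Negative log density of independent Laplace(0, tau) priors on the slope changes."""
    return sum(abs(d) / tau + math.log(2.0 * tau) for d in deltas)


def l1_penalty(deltas, tau):
    """L1 penalty on the slope changes with weight 1 / tau."""
    return sum(abs(d) for d in deltas) / tau


deltas = [0.5, -0.2, 0.0]
# The two differ only by the constant len(deltas) * log(2 * tau).
gap = laplace_neg_log_prior(deltas, 0.05) - l1_penalty(deltas, 0.05)
```

Since the penalty weight is 1/τ, a smaller prior scale penalises slope changes more heavily and pushes more of them to zero.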

slide-23
SLIDE 23

Changepoints in action

  • The Laplace prior is tuned using a prior scale that is an input to the procedure.
  • Smaller prior scales result in fewer changepoints and less flexible curves.
  • Notice how the trend line does not vary much!

slide-24
SLIDE 24

Seasonality

  • A partial Fourier sum can approximate an arbitrary periodic signal.
  • For a period P, we generate N pairs of terms using the following periodic equation:

$$s(t) = \sum_{n=1}^{N} \left( a_n \cos\left(\frac{2\pi n t}{P}\right) + b_n \sin\left(\frac{2\pi n t}{P}\right) \right)$$

  • Coefficient parameters are fit to data.
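The Fourier sum on this slide can be sketched in a few lines. The function names and the example coefficients are invented for illustration; in practice the coefficients a_n and b_n would be fit to data.

```python
import math


def fourier_features(t, period, n_terms):
    """Return the 2N features cos(2*pi*n*t/P), sin(2*pi*n*t/P) for n = 1..N."""
    feats = []
    for n in range(1, n_terms + 1):
        angle = 2.0 * math.pi * n * t / period
        feats.extend([math.cos(angle), math.sin(angle)])
    return feats


def seasonality(t, period, a, b):
    """Evaluate s(t) = sum_n a[n-1]*cos(2*pi*n*t/P) + b[n-1]*sin(2*pi*n*t/P)."""
    feats = fourier_features(t, period, len(a))
    return sum(
        a_n * feats[2 * i] + b_n * feats[2 * i + 1]
        for i, (a_n, b_n) in enumerate(zip(a, b))
    )


# Weekly pattern (P = 7 days) with N = 3 pairs of terms; coefficients are made up.
a, b = [1.0, 0.3, 0.1], [0.5, -0.2, 0.05]
weekly = [seasonality(day, 7.0, a, b) for day in range(14)]
```

Because s(t) is linear in the coefficients, fitting them reduces to a regression on the 2N feature columns. Using fewer terms gives a smoother seasonal shape.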

slide-25
SLIDE 25

Estimating uncertainty

Three sources of uncertainty:

  • irreducible noise
  • parameter uncertainty
  • trend forecast uncertainty
slide-26
SLIDE 26

Irreducible uncertainty

  • Anything Prophet cannot fit is modeled as mean-zero i.i.d. random noise.
  • This creates tube-shaped uncertainty in the forecast.
  • Large uncertainty indicates the model has fit the historical data poorly.
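A minimal sketch of why i.i.d. noise yields a tube: if the noise scale is estimated once from the historical residuals, the interval has the same width at every forecast step. The normal approximation, the default `z` value, and the function name are assumptions of this example rather than a description of how Prophet computes its intervals.

```python
import math


def noise_interval(actuals, fitted, forecast, z=1.28):
    """Constant-width interval around a forecast from the spread of historical residuals.

    The noise is assumed mean-zero, so sigma is the root mean square residual.
    z = 1.28 gives roughly an 80% interval under a normal assumption.
    """
    residuals = [y - f for y, f in zip(actuals, fitted)]
    sigma = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return [(f - z * sigma, f + z * sigma) for f in forecast]


# A poor historical fit (large residuals) produces a wide tube.
bands = noise_interval([1.0, 3.0, 1.0, 3.0], [2.0, 2.0, 2.0, 2.0], [10.0, 20.0], z=2.0)
```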

slide-27
SLIDE 27

Parameter uncertainty

  • Every parameter we have fit in the model has sampling variance.
  • This includes all seasonalities, trends, and changepoints.
  • Optionally, we can use Stan’s built-in HMC implementation to draw samples from the posterior.

Credit: Thomas Wiecki

slide-28
SLIDE 28

Trend uncertainty

[Figure annotations: “one large trend change”; “distribution of simulated future trends”]

slide-29
SLIDE 29

Trend change simulation

  • At each date in the forecast we allow the trend to change.
  • The rate of change is estimated based on how many changepoints were selected.
  • The distribution of changes is selected based on their magnitudes.
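The three bullets above can be sketched as a small Monte Carlo simulation. This is an illustration under stated assumptions: a change happens at each future step with probability equal to the number of historical changepoints divided by the history length, and its size is drawn from a Laplace distribution whose scale is the mean absolute historical change. All names and numbers are hypothetical.

```python
import random


def sample_laplace(rng, scale):
    """Draw from Laplace(0, scale) as the difference of two exponential draws."""
    if scale == 0.0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def simulate_future_trend(last_value, last_slope, deltas, n_history, horizon, rng):
    """Simulate one future trend path from the fitted slope changes `deltas`."""
    rate = len(deltas) / n_history                      # chance of a change per step
    scale = sum(abs(d) for d in deltas) / len(deltas) if deltas else 0.0
    value, slope, path = last_value, last_slope, []
    for _ in range(horizon):
        if rng.random() < rate:
            slope += sample_laplace(rng, scale)
        value += slope
        path.append(value)
    return path


def trend_interval(paths, lower=0.1, upper=0.9):
    """Pointwise empirical quantiles across the simulated paths."""
    bands = []
    for step in zip(*paths):
        ordered = sorted(step)
        last = len(ordered) - 1
        bands.append((ordered[int(lower * last)], ordered[int(upper * last)]))
    return bands


rng = random.Random(42)
paths = [
    simulate_future_trend(100.0, 1.0, [0.2, -0.1, 0.3], n_history=100, horizon=30, rng=rng)
    for _ in range(500)
]
bands = trend_interval(paths)
```

Slope changes accumulate over the horizon, so the band widens with time. A model with more or larger historical changepoints produces a wider cone.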

slide-30
SLIDE 30

Tuning

If you run a forecasting procedure and you don’t like the forecast, what can you do?

  • Adjust the input data you supply.
  • Manually edit the results in a spreadsheet.
  • Change the parameters you used for your model.

slide-31
SLIDE 31

Changepoint prior scale

  • How likely we are to include changepoints in the model.
  • Controls flexibility of the curve.
  • Rigid curves: large i.i.d. errors (tube-shaped)
  • Flexible curves: large trend uncertainty (cone-shaped)

slide-32
SLIDE 32

Seasonality prior scale

  • Regularizes the parameters on the Fourier expansion.
  • Overfitting seasonality can also be controlled by turning off various types of seasonal patterns or using fewer Fourier terms.

slide-33
SLIDE 33

Capacities

  • Piecewise logistic growth curves have a capacity parameter that we do not fit from data.
  • Often we can use obvious constraints as upper and lower bounds on forecasts.
  • The user can specify the capacity as a constant or as a time series.
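As a simplified sketch of the capacity idea, the plain logistic curve below takes the capacity as an input rather than estimating it. It omits the changepoints that make Prophet's version piecewise, and the growth rate `k`, midpoint `m`, and the capacity values are made-up numbers.

```python
import math


def logistic_growth(t, capacity, k, m):
    """Logistic curve C / (1 + exp(-k * (t - m))) with a user-supplied capacity C."""
    return capacity / (1.0 + math.exp(-k * (t - m)))


# Capacity as a constant ...
constant_cap = [logistic_growth(t, capacity=1000.0, k=0.2, m=25.0) for t in range(50)]

# ... or as a time series (a hypothetical market that grows by 10 per step).
caps = [1000.0 + 10.0 * t for t in range(50)]
varying_cap = [logistic_growth(t, caps[t], k=0.2, m=25.0) for t in range(50)]
```

The forecast can never exceed the supplied capacity, which is how a known constraint such as total market size enters the model.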

slide-34
SLIDE 34

Holidays

  • Recurring events that can’t be modeled by smooth curves.
  • Users can configure these with custom dates.
  • We also provide standard holidays for dozens of countries.
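One way to picture holiday effects is as 0/1 indicator columns, one per holiday, that the model can attach a coefficient to. The sketch below builds such columns from custom dates, with optional windows that extend the effect to neighbouring days. The function name and its data layout are assumptions for this example, not Prophet's API.

```python
from datetime import date, timedelta


def holiday_indicators(dates, holidays, lower_window=0, upper_window=0):
    """Build one 0/1 indicator column per holiday name.

    holidays maps a name to the dates it falls on. The windows extend the effect
    to days before (lower_window <= 0) and after (upper_window >= 0) each date.
    """
    columns = {}
    for name, occurrences in holidays.items():
        active = {
            d + timedelta(days=offset)
            for d in occurrences
            for offset in range(lower_window, upper_window + 1)
        }
        columns[name] = [1 if d in active else 0 for d in dates]
    return columns


dates = [date(2018, 12, 20) + timedelta(days=i) for i in range(10)]  # Dec 20 to Dec 29
custom = {"christmas": [date(2018, 12, 25)]}
cols = holiday_indicators(dates, custom, lower_window=-1, upper_window=0)
```

Here the indicator is 1 on December 24 and 25 and 0 elsewhere, so a single coefficient captures the jump on those days without disturbing the smooth trend and seasonality.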

slide-35
SLIDE 35

Takeaways

  • Forecasting “at scale” is a 25% technology problem and a 75% people problem.
  • Prophet is a simple model (with some tricks) but covers many important use cases at Facebook and elsewhere.
  • Simple is good! Prophet works robustly and fails in understandable ways.
  • Using curve-fitting with interpretable parameters allows users to input their domain knowledge into forecasts.

slide-36
SLIDE 36

Conclusions

Try out Prophet! https://facebook.github.io/prophet/

  • Give us feedback, both when it works well and when it doesn’t.

Contribute to the project!

  • We welcome pull requests :)

Read our paper!

  • “Forecasting at Scale” (The American Statistician) https://peerj.com/preprints/3190/