SLIDE 1

Forecasting the electricity consumption by aggregating specialized experts

Pierre Gaillard (EDF R&D, ENS Paris) with Yannig Goude (EDF R&D) Gilles Stoltz (CNRS, ENS Paris, HEC Paris)

June 2013 – WIPFOR

SLIDE 2

Setting Algorithms Specialized experts

Goal

Short-term (one-day-ahead) forecasting of the French electricity consumption.

[Figure: one week of French electricity consumption (GW), Monday to Sunday, ranging roughly between 25 and 40 GW.]

Many models have been developed by EDF R&D: parametric, semi-parametric, and non-parametric. The electrical scene in France is evolving, so the existing models become questionable
⇒ adaptive methods: model aggregation.

SLIDE 3

Setting – Sequential prediction with expert advice

At each instance t:

  • Each expert suggests a prediction x_{i,t} of the consumption y_t
  • We assign a weight p_{i,t} to each expert and we predict
    \hat y_t = p_t \cdot x_t = \sum_{i=1}^N p_{i,t} x_{i,t}
  • Our goal is to minimize our cumulative loss:
    \sum_{t=1}^T (\hat y_t - y_t)^2 = \min_{i=1,\dots,N} \sum_{t=1}^T (x_{i,t} - y_t)^2 + R_T
    (our loss = loss of the best expert + estimation error)

This requires a good set of experts and a good aggregating algorithm.
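The weighted prediction \hat y_t = \sum_i p_{i,t} x_{i,t} can be sketched in a few lines (a minimal illustration with made-up numbers, not EDF's production code):

```python
import numpy as np

def aggregate(weights, expert_predictions):
    """Predict y_hat_t = p_t . x_t, the weighted average of the expert forecasts."""
    p = np.asarray(weights, dtype=float)
    x = np.asarray(expert_predictions, dtype=float)
    # Weights must form a probability vector over the N experts.
    assert p.shape == x.shape and np.isclose(p.sum(), 1.0)
    return float(p @ x)

# Example: three experts forecasting consumption in GW.
print(aggregate([0.5, 0.25, 0.25], [40.0, 36.0, 44.0]))  # 40.0
```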

SLIDE 4

Setting – Sequential prediction with expert advice

The same setting, now measured against a stronger benchmark: the best fixed convex combination of the experts,

    \sum_{t=1}^T (\hat y_t - y_t)^2 = \min_{q \in \Delta_N} \sum_{t=1}^T (q \cdot x_t - y_t)^2 + R_T
    (our loss = loss of the best convex combination + estimation error)

This requires a good set of experts (as varied as possible) and a good aggregating algorithm.

SLIDE 5

Minimizing both approximation and estimation error

    \sum_{t=1}^T (\hat y_t - y_t)^2 = \min_{q \in \Delta_N} \sum_{t=1}^T (q \cdot x_t - y_t)^2 + R_T
    (our loss = approximation error + estimation error)

Approximation error ⇒ good heterogeneous set of experts
Ex: specializing the experts, bagging, boosting, …

Estimation error ⇒ efficient algorithm for aggregating specialized experts
Ex: exponentially weighted average, exponentiated gradient, ridge, …

Prediction, Learning, and Games, Cesa-Bianchi and Lugosi, 2006

SLIDE 6
I. Aggregating algorithms

SLIDE 7

Exponentially weighted average forecaster (EWA)

At each instance t:

  • Each expert suggests a prediction x_{i,t} of the consumption y_t
  • We assign to expert i the weight
    p_{i,t} = \frac{\exp(-\eta \sum_{s=1}^{t-1} (x_{i,s} - y_s)^2)}{\sum_{j=1}^N \exp(-\eta \sum_{s=1}^{t-1} (x_{j,s} - y_s)^2)}
  • and we predict \hat y_t = \sum_{i=1}^N p_{i,t} x_{i,t}

Our cumulative loss is upper bounded by

    \sum_{t=1}^T (\hat y_t - y_t)^2 \le \min_{i=1,\dots,N} \sum_{t=1}^T (x_{i,t} - y_t)^2 + \sqrt{T \log N}
    (our loss ≤ loss of the best expert + estimation error)
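The EWA weight update can be sketched as follows (our own illustration, with a hand-picked learning rate η and the square loss; the log-sum-exp shift is a standard numerical-stability trick, not part of the slide):

```python
import numpy as np

def ewa_weights(past_expert_preds, past_obs, eta):
    """EWA: weight each expert by exp(-eta * cumulative squared loss),
    normalized over experts. past_expert_preds: (t-1, N); past_obs: (t-1,)."""
    X = np.asarray(past_expert_preds, dtype=float)
    y = np.asarray(past_obs, dtype=float)
    cum_loss = ((X - y[:, None]) ** 2).sum(axis=0)  # one cumulative loss per expert
    logw = -eta * cum_loss
    w = np.exp(logw - logw.max())                   # shift before exp for stability
    return w / w.sum()

# Two experts: the first tracked y much better, so it gets most of the weight.
p = ewa_weights([[40.0, 30.0], [41.0, 29.0]], [40.5, 40.8], eta=0.01)
print(p)  # first weight > second, weights sum to 1
```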

SLIDE 8

Exponentially weighted average forecaster (EWA) – against convex combinations?

With the same weights and prediction as on the previous slide, EWA only competes with the best single expert. Against the stronger benchmark the previous bound no longer applies (it is crossed out on the slide), and the question is which estimation error is achievable:

    \sum_{t=1}^T (\hat y_t - y_t)^2 \le \min_{q \in \Delta_N} \sum_{t=1}^T (q \cdot x_t - y_t)^2 + ?
    (our loss ≤ loss of the best convex combination + estimation error)

SLIDE 9

Motivation of convex combinations

[Figure: boxplots of the residuals (roughly −2000 to 1000 MW) of Gam, KWF, EWA, and EG, together with the weights (0 to 1) assigned to Gam and KWF by EWA and by EG, from October to April.]

SLIDE 10

Exponentiated gradient forecaster (EG)

At each instance t:

  • Each expert suggests a prediction x_{i,t} of the consumption y_t
  • We assign to expert i the weight
    p_{i,t} \propto \exp(-\eta \sum_{s=1}^{t-1} \ell_{i,s}), where \ell_{i,s} = 2(\hat y_s - y_s)\, x_{i,s}
  • and we predict \hat y_t = \sum_{i=1}^N p_{i,t} x_{i,t}

Our cumulative loss is then bounded as follows:

    \sum_{t=1}^T (\hat y_t - y_t)^2 \le \min_{q \in \Delta_N} \sum_{t=1}^T (q \cdot x_t - y_t)^2 + \sqrt{T \log N}
    (our loss ≤ loss of the best convex combination + estimation error)

Idea of proof: by convexity of the square loss,

    \sum_{t=1}^T [(\hat y_t - y_t)^2 - (q^\star \cdot x_t - y_t)^2]
      \le \sum_{t=1}^T 2(p_t \cdot x_t - y_t)\, x_t \cdot (p_t - q^\star)
      = \sum_{t=1}^T \ell_t \cdot (p_t - q^\star)
      \le \sum_{t=1}^T p_t \cdot \ell_t - \min_i \sum_{t=1}^T \ell_{i,t}
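A single EG step matching the formulas above can be sketched like this (our own illustration; the function name, learning rate, and numbers are ours):

```python
import numpy as np

def eg_step(weights, expert_preds, y, eta):
    """One exponentiated-gradient step for the square loss.
    The gradient of (p.x - y)^2 w.r.t. p has components l_i = 2*(y_hat - y)*x_i."""
    p = np.asarray(weights, dtype=float)
    x = np.asarray(expert_preds, dtype=float)
    y_hat = float(p @ x)                 # aggregated prediction
    grad = 2.0 * (y_hat - y) * x         # l_{i,t} for each expert i
    w = p * np.exp(-eta * grad)          # multiplicative weight update
    return y_hat, w / w.sum()

# Predict, observe y, update: the weight shifts toward the expert closer to y.
y_hat, p_next = eg_step([0.5, 0.5], [40.0, 30.0], y=39.0, eta=0.001)
print(y_hat)       # 35.0
print(p_next)      # first weight now > second
```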

SLIDE 11
II. A good set of experts

SLIDE 12

Consider a set of experts as heterogeneous as possible

Some ideas to get more variety inside the set of experts:

  • Consider heterogeneous prediction methods
      – Gam: semi-parametric method (Generalized Additive Models, Wood, 2006)
      – KWF: functional method based on similarity between days (Clustering functional data using Wavelets, Antoniadis et al., 2013)
  • Create new experts from the same method thanks to boosting or bagging
  • Vary the covariates considered: weather, calendar, …
  • Specialize the experts: focus on specific situations (cloudy days, …) during the training

SLIDE 13

The dataset

The dataset includes 1,696 days from January 1, 2008 to June 15, 2012: the electricity consumption of EDF customers, plus side information

  • weather: temperature, nebulosity, wind
  • temporal: date, EJP
  • loss of customers

We remove uncommon days (public holidays ±2 days), i.e., 55 days each year, and split the dataset into two subsets:

  • Jan. 2008 – Aug. 2011: training set, used to build the experts
  • Sept. 2011 – Jun. 2012: testing set

SLIDE 14

Performance of the forecasting methods and of the aggregating algorithms

  Method   RMSE (MW)
  Gam        847
  KWF       1287
  EWA        813
  EG         778

[Figure: weights (0 to 1) assigned to Gam and KWF by EWA and by EG, from October to April.]

SLIDE 15

Specializing the experts to diversify

Idea: focus on specific scenarios during the training of the methods.

Meteorological scenarios:
  • high / low temperature
  • high / low variation of the temperature (since the previous day, during the day)

Other scenarios:
  • high / low consumption
  • winter / summer

Such specialized experts suggest predictions only on the days corresponding to their scenario.

SLIDE 16

Specializing a method in cold days

At day t, let T_t be the average temperature of the day. We normalize T_t to [0, 1] and choose for each day the weight w_t = (1 - T_t)^2. We then train our forecasting method using the prior weights w_t on the training days.

[Figure: density of the activation weights on [0, 1], and the activation weights over the year (February to October), high in winter and low in summer.]
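The cold-day weighting can be written directly (a sketch; the min–max normalization of the temperatures to [0, 1] is our assumption about the unspecified normalization step):

```python
import numpy as np

def cold_day_weights(daily_avg_temp):
    """w_t = (1 - T_t)^2, where T_t is the daily average temperature
    min-max normalized to [0, 1]: the coldest days get weights near 1."""
    T = np.asarray(daily_avg_temp, dtype=float)
    T_norm = (T - T.min()) / (T.max() - T.min())
    return (1.0 - T_norm) ** 2

w = cold_day_weights([-5.0, 10.0, 25.0])  # coldest, mild, hottest day
print(w)  # weights 1.0, 0.25, 0.0
```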

SLIDE 17

Weights given in 2008 for several specializing scenarios

[Figure: activation weights (0 to 0.8) over 2008, February to October, for three scenarios: difference of temperature with the previous day, hot/cold days and high/low consumption, and variation of temperature during the day.]

SLIDE 18

Aggregating experts that specialize

Setting: each day, some of the experts are active and output predictions (according to their specialization) while the others do not. When expert i is not active, we do not have access to its prediction.

A solution is to assume that the non-active experts output the same prediction \hat y_t as we do, and to solve the fixed-point equation

    \hat y_t = \sum_{j \text{ active}} p_{j,t}\, x_{j,t} + \sum_{i \text{ non active}} p_{i,t}\, \hat y_t .

This can be extended to activation functions of the experts taking values in [0, 1].

Forecasting electricity consumption by aggregating specialized experts, Devaine et al., 2013
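Since the weights sum to 1, this fixed-point equation has a closed-form solution: moving the inactive-expert terms to the left-hand side gives \hat y_t = \sum_{j active} p_{j,t} x_{j,t} / \sum_{j active} p_{j,t}, i.e. the weights are simply renormalized over the active experts. A sketch (function and variable names are ours):

```python
import numpy as np

def predict_with_specialized(weights, expert_preds, active):
    """Solve y_hat = sum_{j active} p_j x_j + (sum_{i inactive} p_i) * y_hat.
    Since the p_i sum to 1, this renormalizes the weights over active experts."""
    p = np.asarray(weights, dtype=float)
    x = np.asarray(expert_preds, dtype=float)
    a = np.asarray(active, dtype=bool)
    return float((p[a] @ x[a]) / p[a].sum())

# Expert 2 is inactive (its 0.0 is a placeholder, never read);
# its weight is redistributed proportionally among the active experts.
y_hat = predict_with_specialized([0.5, 0.3, 0.2], [40.0, 0.0, 30.0],
                                 active=[True, False, True])
print(y_hat)  # (0.5*40 + 0.2*30) / 0.7
```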

SLIDE 19

Performance of algorithms with specialized experts

  Method        RMSE (MW)
  Gam             847
  KWF            1287
  EWA             813
  EG              778
  Spec + EWA      765
  Spec + EG       714

SLIDE 20

Performance of algorithms with specialized experts

[Figure: RMSE (MW, 500 to 1500) by hour of the day (4 to 20) and by month (January to June and September to December) for Gam, KWF, EWA, EG, Spec. EWA, and Spec. EG.]
