Forecasting intraday-load curve using sparse learning methods (PowerPoint presentation)


SLIDE 1

Forecasting intraday-load curve using sparse learning methods

Dominique Picard

LPMA- Université Paris-Diderot-Paris 7

Collaborators : Mathilde Mougeot UPD, Vincent Lefieux RTE, Laurence Maillard RTE

Numerical methods for high dimensional problems

Dominique Picard, Forecasting intraday-load curve using sparse learning

SLIDE 2

Pre-big-data framework, towards streaming machine learning

SLIDE 3

Pre-streaming machine learning

  • Volume: moderate
  • Variety: moderate
  • Velocity: small

SLIDE 4

Pre-streaming machine learning

  • Volume: moderate
    • smart (data-driven) organisation of the information
    • methods allowing an increasing volume of data
  • Variety: moderate
    • multidimensional functional data
  • Velocity: small

We describe a forecasting pipeline, i.e. a chain of learning algorithms producing a final functional prediction.

SLIDE 5

Description of the problem

SLIDE 6

Intraday load curve during a week, Monday January 25th to Sunday January 31st

SLIDE 7

Intraday load curve forecasting (here 48 h)

[Figure: 48 h forecast for 2010-06-02 to 2010-06-03; observed signal Y, approximation Yapx, prediction start tpred]

SLIDE 8

Forecasting pipeline

1 Construction of a 'smart encyclopedia' of past scenarios out of a data basis, using different learning algorithms.

2 Build a set of prediction experts consulting the encyclopedia.

3 Aggregate the prediction experts.

SLIDE 9

The past data basis

  • Electrical consumption of the past
  • Other ’shape variables’: calendar data, functional bases
  • Meteorological input

SLIDE 10

Electrical consumptions of the past

  • Recorded every half hour from January 1st, 2003 to August 31st, 2010.
  • For this period, the global consumption signal is split into N = 2800 sub-signals (Y_1, . . . , Y_t, . . . , Y_N); Y_t ∈ R^n is the intraday load curve of the t-th day, of size n = 48.

SLIDE 11

Intraday load curve for seven days, Monday January 25th to Sunday January 31st

SLIDE 12

Shape and seasonal effects

Figure: Intraday load curves for various days. 2010-02-03, winter: black dash-dot line; 2010-05-21, spring: red dashed line; 2009-10-23, autumn: green solid line; 2010-08-19, summer: blue dotted line; 2010-01-01, public holiday: gray dotted line.

SLIDE 13

Calendar and functional effects (endogenous)

Figure: intraday load curves in autumn, winter, spring and summer.

SLIDE 14

Calendar and functional effects (shape description)

  • Consumption on day T can be explained by consumptions of past days t′ < T.
  • It can be explained by calendar values of the day T (Monday, ..., Sunday, months, seasons, ...).
  • It is a function of time and can be expressed in a standard dictionary of functions (wavelets, Fourier, ...).

SLIDE 15

Functional aspect : dictionary


Figure : Functions of the dictionary. Constant (black-solid line), cosine (blue-dotted line), sine (blue-dashdot line), Haar (red-dashed line) and temperature (green-solid line with points) functions.

SLIDE 16

Meteorological inputs: exogenous variables

  • A total of 371 (= 2×39 + 293) meteorological variables,
  • recorded half-hourly over the 2800 days of the same period of time.

Temperature: T^k, for k = 1, . . . , 39, measured at 39 weather stations scattered all over the French territory.

Cloud cover: N^k, for k = 1, . . . , 39, measured at the same 39 weather stations.

Wind: W^{k′}, for k′ = 1, . . . , 293, available at 293 network points scattered all over the territory.

SLIDE 17

Weather stations

Figure: Temperature and cloud-cover measurement stations; wind stations.

SLIDE 18

Brest, Lille, Marseille

Figure: Temperature (T), cloud cover (CC) and wind (W) at Brest (blue line), Lille (red line) and Marseille (green line).

SLIDE 19

Main issues

1 Large dimension.

2 Prediction requires explaining the load with a small number of predictable parameters.

3 Most of the potentially explanatory variables (load curves, meteo, functions of the dictionary) are highly correlated.

SLIDE 20

Reduced set of explanatory variables

For each index t of the day of interest, we register the daily electrical consumption signal Y_t and Z_t = [[C]_t [M]_t], where [C]_t is the concatenation of the calendar, functional and past-consumption variables, and [M]_t of the meteo variables.

SLIDE 21

Sparse approximation on the learning set

Sparse approximation of each consumption day on a learning set of days (2003-2010), using the set of potentially explanatory variables.

  • For each day t of the learning set, we build an approximation Ŷ_t of the (observed) signal Y_t with the help of the set of explanatory variables Z_t:

Ŷ_t = G_t(Z_t),  G_t(Z_t) = Z_t β̂_t

(∗) Sparse Approximation and Knowledge Extraction for Electrical Consumption Signals, 2012, M. Mougeot, D. P., K. Tribouley, V. Lefieux, L. Teyssier-Maillard

SLIDE 22

High dimensional linear models

Y = Xβ + ε

  • β ∈ R^k is the unknown parameter (to be estimated).
  • ε = (ε_1, . . . , ε_n)∗ is a (non-observed) vector of random errors, assumed i.i.d. N(0, σ^2).
  • X is a known n × k matrix.

High dimension: k ≫ n
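As a toy illustration of this regime, one can simulate the model with a sparse β; a minimal sketch where all sizes and names are illustrative, not those of the RTE data:

```python
import numpy as np

# Minimal simulation of Y = X beta + eps with k >> n and a sparse beta
# ("small number of big coefficients"); sizes are illustrative only.
rng = np.random.default_rng(0)
n, k, S = 100, 500, 5
X = rng.standard_normal((n, k))
beta = np.zeros(k)
beta[:S] = 3.0                      # S big coefficients, the rest are zero
Y = X @ beta + 0.5 * rng.standard_normal(n)
```

With k = 500 columns and only n = 100 observations, ordinary least squares is ill-posed, which is why the sparsity assumption below is needed.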

SLIDE 23

Forecasting procedure: forecasting using the encyclopedia

  • Construction of a set of forecasting experts.
  • Aggregation of the experts.

SLIDE 24

Forecasting experts: expert associated with a strategy M

  • Strategy: M is a function, data-dependent or not, from N to N such that for any d ∈ N, M(d) < d (purely non-anticipative).
  • Plug-in: to the strategy M we associate the expert Ỹ_t^M, the prediction of the signal of day t using the forecasting strategy M:

Ỹ_t^M = G_{M(t)}(Z_t) = Z_t β̂_{M(t)}

SLIDE 25

Examples of strategies: time-depending

tm1: refer to the day before (the coefficients used for prediction are those calculated the previous day):

M(d) = d − 1,  Ỹ_t^{tm1} = Z_t β̂_{t−1}

tm7: refer to one week before:

M(d) = d − 7,  Ỹ_t^{tm7} = Z_t β̂_{t−7}
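In code, such time-depending strategies are just index maps; a hypothetical sketch (the names tm1 and tm7 follow the slide, everything else is illustrative):

```python
import numpy as np

# Time-depending strategies: each maps a day index d to an earlier day
# M(d) < d (purely non-anticipative); the expert reuses the coefficients
# fitted on day M(d). Names follow the slide; the rest is illustrative.
def tm1(d):
    return d - 1      # refer to the day before

def tm7(d):
    return d - 7      # refer to one week before

def expert(Z_t, beta_hat_by_day, M, t):
    """Plug-in expert: prediction of day t with the coefficients of day M(t)."""
    return Z_t @ beta_hat_by_day[M(t)]
```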

SLIDE 26

Experts introducing meteorological scenarios

  • T: find the past day having the closest temperature indicators, in sup distance over the days and the indicators:

M(d) = ArgMin_t sup_{k∈{1,...,6}, i∈{1,...,48}} |T^k_d(i) − T^k_t(i)|

  • Tm: find the past day having the closest median temperature, in sup distance over the days:

M(d) = ArgMin_t sup_{i∈{1,...,48}} |T^3_d(i) − T^3_t(i)|
SLIDE 27

MAPE error

For day t, the prediction MAPE error over the interval [0, T] is defined by:

MAPE(Y, Ỹ_t^M)(T) = (1/T) Σ_{i=1}^T |Ỹ_t^M(i) − Y_t(i)| / Y_t(i)

MISE(Y, Ỹ_t^M)(T) = (1/T) Σ_{i=1}^T |Ỹ_t^M(i) − Y_t(i)|^2
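These two errors translate directly into code; a sketch with our own variable names:

```python
import numpy as np

# MAPE and MISE of a prediction over a horizon of T time steps,
# following the definitions above (variable names are ours).
def mape(y_obs, y_pred):
    return np.mean(np.abs(y_pred - y_obs) / y_obs)

def mise(y_obs, y_pred):
    return np.mean((y_pred - y_obs) ** 2)

y_obs = np.array([100.0, 200.0, 400.0])
y_pred = np.array([110.0, 190.0, 400.0])
print(round(mape(y_obs, y_pred), 4))   # 0.05
```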

SLIDE 28

Prediction evaluation

M        mean    med     min     max
Naive    0.0634  0.0415  0.0046  0.1982
Apx      0.0129  0.0104  0.0023  0.0786
tm1      0.0340  0.0281  0.0063  0.1490
tm7      0.0327  0.0258  0.0054  0.2297
T        0.0306  0.0263  0.0058  0.1085
Tm       0.0329  0.0275  0.0047  0.2020
T/N      0.0347  0.0293  0.0056  0.1916
Tm/N     0.0358  0.0300  0.0054  0.2156
T/G      0.0323  0.0271  0.0050  0.1916
T/d      0.0351  0.0278  0.0053  0.1916
T/c      0.0340  0.0259  0.0053  0.1937
Ns/G     0.0322  0.0251  0.0049  0.2078
N/d      0.0305  0.0239  0.0042  0.1449
N/c      0.0307  0.0237  0.0042  0.1990

Table: MAPE errors of the different strategies for intraday prediction.

SLIDE 29

Prediction evaluation-Comparing experts


Figure: Percentage of best predictor (tm1, tm7, Ts, Ts/N, Tm, Tm/N, T/g, N/g, T/j, N/j, T/c, N/c).

SLIDE 30

Prediction evaluation-Comparing experts on days


Figure: Percentage of best predictor per day (1: Monday, . . . , 7: Sunday).

SLIDE 31

Prediction evaluation-Comparing experts


Figure: Percentage of best predictor per month.

SLIDE 32

Aggregation of predictors: exponential weights

(Inspired by various theoretical results; see Lecué, Rigollet, Stoltz, Tsybakov, ...)

Ỹ_d^{wgt} = Σ_{m=1}^M w_d^m Ỹ_d^m / Σ_{m=1}^M w_d^m

with

w_d^M = exp( −(1/(Tθ)) Σ_{i=1}^T |Ỹ_d^M(i) − Y_d(i)|^2 )

θ is a parameter (often called temperature in physics applications; see the discussion below), T = T_pred.
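A minimal sketch of this exponential-weight aggregation (θ and all names are illustrative; T is the length of the window on which past errors are computed):

```python
import numpy as np

# Exponential-weight aggregation of M experts: expert m gets a weight
# decaying exponentially in its past quadratic error; theta is the
# temperature parameter.
def aggregate(preds_today, preds_past, y_past, theta):
    # preds_today: (M, n) predictions for the target day
    # preds_past:  (M, T) past predictions, y_past: (T,) past observations
    T = y_past.shape[0]
    errs = np.sum((preds_past - y_past) ** 2, axis=1)
    w = np.exp(-errs / (T * theta))
    w /= w.sum()
    return w @ preds_today, w
```

An expert with zero past error keeps the largest weight; a large θ flattens the weights toward a plain average.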

SLIDE 33

(MAPE = 0.7%.)

[Figure: 48 h signal for 2010-06-02 to 2010-06-03; observed Y, approximation Yapx, prediction start tpred]

SLIDE 34

Forecasting

(MAPE = 0.7%.)

[Figure: 48 h forecast for 2010-06-02 to 2010-06-03; observed Y, approximation Yapx, prediction start tpred]

SLIDE 35

Winter forecast


Figure: Forecast (solid blue line) and observed (dashed dark line) electrical consumption for a winter week, from Monday February 1st to Sunday February 7th, 2010.

SLIDE 36

Spring forecast


Figure: Forecast (solid blue line) and observed (dashed dark line) electrical consumption for a spring week, from Monday June 14th to Sunday June 21st, 2010.

SLIDE 37

Sparse methods, collinearity, structure

SLIDE 38

High dimensional linear models

Y = Xβ + ε

  • β ∈ R^k is the unknown parameter (to be estimated).
  • ε = (ε_1, . . . , ε_n)∗ is a (non-observed) vector of random errors, assumed i.i.d. N(0, σ^2).
  • X is a known n × k matrix.

High dimension: k ≫ n

(∗) M. Mougeot, D. P., K. Tribouley, JRSS B 2012, B Stat. Methodol., vol. 74

SLIDE 39

FBUND sparse reconstruction

[Figure: FBund price series, 2009-12-07, trading time]

(∗) M. Mougeot

SLIDE 40

FBUND sparse reconstruction

[Figure: FBund 2009-12-07, sparse reconstruction with S = 11, trading time]

(∗) M. Mougeot

SLIDE 41

Genomic example

Y = (1, . . . , 1)^t,  X = [gene-expression data matrix]

SLIDE 42

The matrix X : genomic

X = [gene-expression data matrix]

  • X: expression of different genes; behaves like n × p i.i.d. N(0, 1) random variables (large random matrices).

SLIDE 43

Signal denoising

Y =

[Figure: FBund price series, 2009-12-07, trading time]

What is X in this case ?

SLIDE 44
Statistical learning, regression estimation

Y_i = f(t_i) + ε_i + u_i,  i = 1, . . . , n

  • The ε_i's are i.i.d. N(0, 1).
  • The u_i's are possibly random, not necessarily random nor i.i.d., but 'small'.
  • The t_i are observation times (t_i = i/n).
  • f is the parameter to be estimated.

SLIDE 45

Using a dictionary

To estimate f, we consider a dictionary D of size #D = p, D = {g_1, . . . , g_p}, and assume that f can be well fitted by this dictionary:

f = Σ_{ℓ=1}^p β_ℓ g_ℓ + h   (1)

where hopefully h is a 'small' function (in absolute value).

SLIDE 46

Modeling

This coincides with the model Y = Xβ + u + ε if we put u_i = h(t_i) and

X = ( g_1(t_1) · · · g_p(t_1) ; . . . ; g_1(t_n) · · · g_p(t_n) )
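Concretely, X is obtained by evaluating the dictionary functions at the observation times; a tiny illustrative dictionary in the spirit of the one above (constant, cosine, sine, Haar-type step):

```python
import numpy as np

# Build the design matrix X[i, l] = g_l(t_i) for t_i = i/n from a small
# illustrative dictionary (constant, cosine, sine, Haar-type step).
n = 48
t = np.arange(1, n + 1) / n
dictionary = [
    lambda s: np.ones_like(s),
    lambda s: np.cos(2 * np.pi * s),
    lambda s: np.sin(2 * np.pi * s),
    lambda s: np.where(s <= 0.5, 1.0, -1.0),
]
X = np.column_stack([g(t) for g in dictionary])
print(X.shape)   # (48, 4)
```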

SLIDE 47

The dictionary problem

Of course sparsity is linked with the dictionary.

  • Fourier Basis
  • Wavelet basis
  • Needlets
  • Combination of ’bases’

SLIDE 48

Fourier basis


SLIDE 49

Haar wavelets


SLIDE 50

Conditions generally required to solve the problem

  • 'Sparsity': conditions on the vector β.
  • Conditions on the matrix X (not too high collinearities, RIP, ...).

SLIDE 51

Restricted isometry property

For C ⊂ {1, . . . , p}, denote by X_C the matrix X restricted to the columns in C, and the associated Gram matrix M(C) := (1/n) X_C^t X_C.

The restricted isometry property means that M(C) is almost the identity matrix for any C small enough.

SLIDE 52

Example 1: RIP

RIP(m_0, ν) assumes that there exist 0 ≤ ν < 1 and m_0 ≥ 1 such that, for any C with #C = m ≤ m_0 and all x ∈ R^m:

(1 − ν) ‖x‖²_{ℓ2(m)} ≤ x^t M(C) x ≤ (1 + ν) ‖x‖²_{ℓ2(m)}

SLIDE 53

Example 2: coherence condition

  • M := (1/n) X^t X.
  • M_jj = 1 for all j.
  • Coherence:

τ_n = sup_{ℓ≠m} |M_{ℓm}| = sup_{ℓ≠m} |(1/n) Σ_{i=1}^n X_{iℓ} X_{im}|

Coherence ⇒ RIP(⌊ν/τ_n⌋, ν)
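The coherence is straightforward to compute once the columns are rescaled so that M_jj = 1; a sketch, not tied to the talk's data:

```python
import numpy as np

# Coherence tau_n = max off-diagonal entry of M = (1/n) X^t X, with the
# columns rescaled so that M_jj = 1.
def coherence(X):
    n = X.shape[0]
    Xn = X / np.sqrt((X ** 2).mean(axis=0))   # enforce M_jj = 1
    M = Xn.T @ Xn / n
    return np.abs(M - np.eye(M.shape[1])).max()
```

Two perfectly collinear columns give coherence 1, while i.i.d. N(0, 1) columns give a small value, of order √(log k / n).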

SLIDE 54

Sparsity conditions

SLIDE 55

Sparsity conditions

#{ℓ ∈ {1, . . . , k} : |β_ℓ| ≠ 0} ≤ S

Σ_ℓ |β_ℓ|^q ≤ M^q, 0 < q < 1   (B_q(M))

SMALL NUMBER OF BIG COEFFICIENTS

SLIDE 56

Penalization for sparsity

Many penalizations have been introduced historically in the regression framework (to put identification constraints on β):

  • Ridge: E(β, λ) = ||Y − Xβ||² + λ Σ_j β_j²
  • Lasso: E(β, λ) = ||Y − Xβ||² + λ Σ_j |β_j|
  • SCAD: E(β, λ) = ||Y − Xβ||² + λ Σ_j w_j g(β_j)

Solutions are based on convex optimization for ℓ2 and ℓ1, and non-convex optimization for SCAD. Candès & Tao (2007), Fan & Lv (2008, 2010), and many others.
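As a generic illustration (not the procedure of the talk): ridge has a closed form, and the lasso can be solved by iterative soft-thresholding (ISTA) on the objective (1/2)||Y − Xβ||² + λ||β||_1:

```python
import numpy as np

# Ridge: closed form. Lasso: ISTA (proximal gradient with soft
# thresholding) on (1/2)||Y - X beta||^2 + lam * ||beta||_1.
def ridge(X, Y, lam):
    k = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(k), X.T @ Y)

def lasso_ista(X, Y, lam, n_iter=500):
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = beta - X.T @ (X @ beta - Y) / L
        beta = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return beta
```

For an orthonormal design, ISTA reduces to one soft-thresholding step, which makes the shrinkage effect of the ℓ1 penalty visible directly.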

SLIDE 57

Fast greedy methods: 2-step thresholding procedures

Y = Xβ + ε,  Y (n × 1), X (n × k)

Step 1 (pre-selection): find b leaders, X_b (n × b), with b < n ≪ k.
Least squares on the b leaders: β̃ = (X_b^∗ X_b)^{−1} X_b^∗ Y  (1 × b).
Step 2 (denoising the coefficients): β̂  (1 × Ŝ).

SLIDE 58

LOL, coefficient-wise: Step 1

B = {ℓ : K_ℓ ≥ λ_1},  K_ℓ = |(1/n) Σ_{i=1}^n X_{iℓ} Y_i|

[Figure: the K_ℓ and the threshold λ_1; n = 250, p = 1000, X i.i.d. N(0, 1), S = 10, card(B) = 170 ≫ S]

SLIDE 59

LOL: step 3

[Figure: estimated coefficients and second threshold λ_2; n = 250, S = 10, p = 1000, b = 2, SNR = 5]

SLIDE 60

Structuring

Y = Xβ + ε, X: N × k. We decide to re-arrange the k predictors into p (p ≤ k) groups of variables:

X = [X_{G_1}, . . . , X_{G_p}], where G_1, . . . , G_p is a partition of {1, . . . , k}; X_ℓ = X_{(j,t)}, X_{G_j} = [X_{(j,1)}, . . . , X_{(j,|G_j|)}]

  • j ∈ {1, . . . , p} is the index of the group G_j.
  • t is the altitude (height) of ℓ inside the group G_j.

SLIDE 61

Structured sparsity

Σ_{j=1}^p w_j ‖β_{G_j}‖_r^q = Σ_{j=1}^p w_j [Σ_{t=1}^T |β_{(j,t)}|^r]^{q/r} ≤ M^q.

If w_j = 1 and r ≥ q:

Σ_{j=1}^p ‖β_{G_j}‖_r^q = Σ_{j=1}^p [Σ_{t=1}^T |β_{(j,t)}|^r]^{q/r} ≤ Σ_{j,t} |β_{(j,t)}|^q

  • Structured sparsity is generally less stringent than the ordinary one.
  • It means we require a small number of 'big' groups.

SLIDE 62

Example of structure: wavelet grouping

  • Block thresholding (global blocks):

β = (β_{jk}),  G_j = {(j, k), 0 ≤ k < 2^j},  0 ≤ j ≤ p.  Size of G_j = 2^j.  Sparsity = Besov(s, r, q)(M).

SLIDE 63

GR-LOL, Step 1

The columns of X are normalized: (1/n) Σ_{i=1}^n X²_{i(j,t)} = 1 for all (j, t).

'Grouped correlation' search and thresholding:

K_{(j,t)} = |(1/n) Σ_{i=1}^n X_{i(j,t)} Y_i|,  for all (j, t), 1 ≤ j ≤ p, 1 ≤ t ≤ T

ρ_j² = Σ_{t=1,...,T} K²_{(j,t)},  T = max |G_j|
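These step-1 statistics translate directly into code (groups given as arrays of column indices; the whole setup is illustrative):

```python
import numpy as np

# Per-coefficient correlations K_(j,t) and grouped statistic rho_j^2,
# with groups given as arrays of column indices (illustrative setup).
def grouped_scores(X, Y, groups):
    n = X.shape[0]
    K = np.abs(X.T @ Y) / n
    return np.array([np.sum(K[g] ** 2) for g in groups])

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 6))
Y = 4.0 * X[:, 0] + 0.1 * rng.standard_normal(100)
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
rho2 = grouped_scores(X, Y, groups)
```

The group containing the informative column then dominates, and the step-1 threshold would keep it.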

SLIDE 64

GR-LOL, Step 1 (continued)

  • λ(1) is a tuning parameter.
  • B = {j = 1, . . . , p : ρ_j² ≥ λ(1)²}   (2)
  • G_B = ∪_{j∈B} G_j.

SLIDE 65

GR-LOL, Step 2

OLS on the block leaders, considering the new pseudo-linear model Y = X_{G_B} β_{G_B} + error:

β̂_{G_B} = β̂(B) and β̂_{G_B^c} = 0,

where β̂(B) = [X^t_{G_B} X_{G_B}]^{−1} X^t_{G_B} Y.

SLIDE 66

GR-LOL, Step 3: block thresholding

  • λ(2) is another tuning parameter.
  • We apply the second threshold to the estimated coefficients:

for all ℓ = (j, t) ∈ {1, . . . , k},  β̂*_ℓ = β̂_ℓ 1{ ‖β̂_{G_j}‖_2 ≥ λ(2) }

where ‖β̂_{G_j}‖²_2 := Σ_{0≤t≤T} β̂²_{(j,t)}.

SLIDE 67

Boosting the convergence by grouping

Y = Xβ + ε,  Y (N × 1), X (N × k), p groups.

  • Calculate the (internal) correlations of the columns of the matrix X, as well as their (external) correlation with the target Y.
  • Put columns which are highly correlated (internal correlation) in different groups.
  • Gather the columns with typically close correlation to the target (external correlation).
  • Make T (the maximal group size) as small as possible.
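One illustrative way to realize this grouping (our own simplification, not the talk's exact construction): rank columns by external correlation, deal the top ones out as group beginners so that highly correlated columns land in different groups, then fill the groups round-robin.

```python
import numpy as np

# Illustrative grouping heuristic: rank columns by |corr with Y|, use the
# top-p columns as "group beginners" (so highly correlated columns land
# in different groups), then fill the groups round-robin, which keeps the
# maximal group size T small.
def make_groups(X, Y, p):
    n, k = X.shape
    K = np.abs(X.T @ Y) / n
    order = np.argsort(K)[::-1]                    # columns by external correlation
    groups = [[int(order[j])] for j in range(p)]   # beginners: top-p columns
    for pos, col in enumerate(order[p:]):
        groups[pos % p].append(int(col))           # round-robin fill
    return groups
```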

SLIDE 68

Boosting the rates: cut-off

  • Divide the columns of X into two sets: S1, highly correlated; S2, weakly correlated.
  • Put the S1 columns as 'group beginners' (each of them has smallest altitude in its group) to separate them.
  • Choose the cut-off between S1 and S2.
  • Fill the groups by affinity with the delegate, in terms of K_ℓ = |(1/n) Σ_{i=1}^n X_{iℓ} Y_i|: gathering the columns with typically close correlation with the target.

SLIDE 69

Back to electrical consumption


Figure : French consumption

SLIDE 70

Temperatures

Figure : Temperature spots

SLIDE 71

Dictionary


Figure : Functions of the dictionary. Constant (black-solid line), cosine (blue-dotted line), sine (blue-dashdot line), Haar (red-dashed line) and temperature (green-solid line with points) functions.

SLIDE 72

Delegates

[Figure: 'correlation' between the consumption signal and the various dictionary functions (E, C, S, H, T); the chosen delegates are tagged with a red star.]

  • For LOL, E = 1.86%; the (×24) selected functions, T-T-C-T-H-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-T-S-C, are meaningful functions: (20: T), (2: C), (1: S), (1: H).
  • For Group LOL, E = 0.75%, with 24 regressors / 8 groups: THS-THH-THH-TCS-TCST-HHTC-STSH; meaningful functions: (8: T), (3: C), (5: S), (8: H).

SLIDE 73

Approximation


Figure : Model of the consumption signal (black-solid line) using GROL (red-dashed line) and LOL (blue dot dashed line).

SLIDE 74

Pre-processing the explanatory variables

SLIDE 75

Reduced dictionary, endogenous variables: patterns

  • Represent each day sparsely on the dictionary (H, S, C).
  • Use the K-means algorithm to cluster this representation: 8 groups.
  • Encode these groups as calendar boolean variables.
  • Define in each group the consumption 'pattern' of the group (simply the mean), mean_{G(t)}.
  • Z_t = [[C]_t [M]_t]
  • Put [C]_t = [mean_{G(t)}, Y_{t−7}]

SLIDE 76

Reduced dictionary, groups of patterns

Table: Groups 1, . . . , 8 are defined using a calendar interpretation of the clusters, from Monday (day 1) to Sunday (day 7) and from January (month 1) to December (month 12), computed from January 1st to August 31st.

Days \ Months   1  2  3  4  5  6  7  8  9  10 11 12
1               7  8  5  3  3  3  3  1  3  3  5  7
2               7  8  5  3  3  3  3  1  3  3  5  7
3               7  8  5  3  3  3  3  1  3  3  5  7
4               7  8  5  3  3  3  3  1  3  3  5  7
5               7  8  5  3  3  3  3  1  3  3  5  7
6               6  8  4  4  2  2  2  2  2  2  4  6
7               6  6  4  4  2  2  2  2  2  2  4  6

SLIDE 77

K-means algorithm

1 Place K points into the space represented by the objects being clustered; these points represent the initial group centroids.

2 Assign each object to the group that has the closest centroid.

3 When all objects have been assigned, recalculate the positions of the K centroids.

4 Repeat steps 2 and 3 until the centroids no longer move.
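The four steps above can be sketched in a few lines of numpy (an illustrative implementation, not the one used for the 8 consumption groups):

```python
import numpy as np

# Plain K-means following the four steps above (illustrative sketch).
def kmeans(data, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), K, replace=False)]   # step 1
    labels = np.zeros(len(data), dtype=int)
    for _ in range(n_iter):
        d = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)                               # step 2
        new = np.array([data[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(K)])   # step 3
        if np.allclose(new, centroids):                         # step 4
            break
        centroids = new
    return labels, centroids
```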

SLIDE 78

Important features

1 Number of clusters

2 Stability of the algorithm

SLIDE 79

Reduced dictionary : Meteo variables

  • Linear summary of the variables: PCA capturing 90% of the variance, each variable treated separately.
  • Non-linear summary: for each variable, (Max, Min, Med, Variance).

SLIDE 80

Convergence results

SLIDE 81

2-step thresholding procedures

Y = Xβ + ε,  Y (n × 1), X (n × p)

Step 1 (pre-selection): find b leaders, X_b (n × b), with b < n ≪ p.
Least squares on the b leaders: β̃ = (X_b^∗ X_b)^{−1} X_b^∗ Y  (1 × b).
Step 2 (denoising the coefficients): β̂  (1 × Ŝ).

SLIDE 82

GR-LOL, Step 1

The columns of X are again normalized: (1/n) Σ_{i=1}^n X²_{i(j,t)} = 1 for all (j, t).

'Grouped correlation' search and thresholding:

K_{(j,t)} = |(1/n) Σ_{i=1}^n X_{i(j,t)} Y_i|,  for all (j, t), 1 ≤ j ≤ p, 1 ≤ t ≤ T

ρ_j² = Σ_{t=1,...,T} K²_{(j,t)},  T = max |G_j|

SLIDE 83

GR-LOL, Step 1 (continued)

  • λ(1) is a tuning parameter.
  • B = {j = 1, . . . , p : ρ_j² ≥ λ(1)²}   (3)
  • G_B = ∪_{j∈B} G_j.

SLIDE 84

GR-LOL, Step 2

OLS on the block leaders, considering the new pseudo-linear model Y = X_{G_B} β_{G_B} + error:

β̂_{G_B} = β̂(B)  (hence β̂_{G_B^c} = 0),

where β̂(B) = [X^t_{G_B} X_{G_B}]^{−1} X^t_{G_B} Y.

SLIDE 85

GR-LOL, Step 3: block thresholding

  • λ(2) is another tuning parameter.
  • We apply the second threshold to the estimated coefficients:

for all ℓ = (j, t) ∈ {1, . . . , k},  β̂*_ℓ = β̂_ℓ 1{ ‖β̂_{G_j}‖_2 ≥ λ(2) }

where ‖β̂_{G_j}‖²_2 := Σ_{0≤t≤T} β̂²_{(j,t)}.

SLIDE 86

GR-LOL: tuning the thresholds

Choose:

  • Threshold λ_1 such that:
GR-LOL: λ(1) = κ_1 [ √((T ∨ log p)/n) ∨ τ* ]
LOL: λ_1 = κ_1 [ √((log p)/n) ∨ τ* ]
  • Threshold λ_2 such that:
GR-LOL: λ(2) = κ_2 [ √((T ∨ log p)/n) ∨ τ* ]
LOL: λ_2 = κ_2 [ √((log p)/n) ∨ τ* ]

with T := max_j |G_j|.

SLIDE 87

Concentration results

Loss function: d(β̂*, β)² = Σ_{ℓ=1}^k (β̂*_ℓ − β_ℓ)²

Assumptions:

  • Sparsity: Σ_{j=1}^p ‖β_{G_j}‖_1^q = Σ_{j=1}^p [Σ_{t=1}^T |β_{(j,t)}|]^q ≤ M^q   (β ∈ B_{1,q}(M))
  • Dimension: p ≤ exp(c′n)  (c′ constant)

SLIDE 88

Concentration results

sup_{B_{1,q}(M)} E d(β̂*, β)² ≤ D [ √((T ∨ log p)/n) ∨ τ* ]^{2−q}

sup_{B_{1,0}(S)} E d(β̂*, β)² ≤ D S [ √((T ∨ log p)/n) ∨ τ* ]²

for some positive constant D. What is τ*?

SLIDE 89

Coherence

  • Let M be the k × k Gram matrix M := (1/n) X^∗X,
  • and the coherence

τ_n = sup_{ℓ≠m} |M_{ℓm}| = sup_{ℓ≠m} |(1/n) Σ_{i=1}^n X_{iℓ} X_{im}| = sup_{(j,t)≠(j′,t′)} |M_{(j,t)(j′,t′)}| = sup_{(j,t)≠(j′,t′)} |(1/n) Σ_{i=1}^n X_{i(j,t)} X_{i(j′,t′)}|

SLIDE 90

Splitting the coherence: multitask inspiration

We split the coherence τ_n into γ_BG and γ_BA, where

γ_BG := sup_t sup_{j≠j′} |M_{(j,t)(j′,t)}|   (between groups, at a given altitude; sup over altitudes)

γ_BA := sup_{j,j′} sup_{t≠t′} |M_{(j,t)(j′,t′)}|   (small; different altitudes, no matter which groups)

SLIDE 91

τ*

Let us define τ* = T γ_BA + γ_BG, where T = max_{j=1,...,p} #{G_j}.

SLIDE 92

Example

ST coefficients, all equal to γ; γ_BA = 0, γ_BG = γ ≥ √((log k)/n).

RATES:
LOL: ST [γ² + (log k)/n]
GR-LOL (optimal grouping): S [γ² + T/n + (log(k/T))/n]
GR-LOL (worst grouping): ST [γ² + T/n + (log(k/T))/n]

SLIDE 93

Boosting the rates: strategies for grouping

Y = Xβ + ε,  Y (N × 1), X (N × k). Question: how to group to obtain better rates, when possible?

Σ_{j=1}^p [Σ_{t=1}^T |β_{(j,t)}|]^q        [ √((T ∨ log p)/n) ∨ {T γ_BA + γ_BG} ]
        ↓                                          ↓
    GATHERING                        WORKING on T γ_BA + γ_BG

SLIDE 94

Boosting the rates

  • Divide the columns of X into two sets: S1, highly correlated; S2, weakly correlated.
  • Put the S1 columns as 'group beginners' (each of them has smallest altitude in its group): γ_BA ≪ γ_BG = γ_max.
  • Realize a 'good' cut-off between S1 and S2, ensuring: T γ_BA ≤ γ_BG,  (log p)/n ≤ γ²_BG,  T/n ≤ γ²_BG.
  • Fill the groups by affinity with the delegate, in terms of K_ℓ = |(1/n) Σ_{i=1}^n X_{iℓ} Y_i|: make Σ_{j=1}^p [Σ_{t=1}^T |β_{(j,t)}|]^q as small as possible.
