

SLIDE 1

Ensemble probabilistic forecast, and metamodeling of urban pollution

Vivien Mallet1,2

vivien.mallet@inria.fr

1INRIA 2Sorbonne Université, Laboratoire Jacques-Louis Lions, CNRS, CEREMA

Collaborators (EDF R&D, INRIA, CNRS, ENS, HEC, IBM Research, Shevchenko National University of Kiev, NUMTECH, Météo France, IRSN): Jean Thorey, Gilles Stoltz, Christophe Chaussin, Sergiy Zhuk, Alexander Nakonechniy, Anne Tilloy, David Poulet, Fabien Brocheton, Laurent Descamps, Sylvain Girard

AI4Climate, September 2019

1 / 39

SLIDE 2

Objectives

Make a probabilistic forecast based on an ensemble of forecasts.

[Figure: annual average of forecasts of global horizontal irradiance (GHI) in W m⁻² for year 2012, over six regions (China, Europe, UK, Korea, Brazil, France), as found in the THORPEX Interactive Grand Global Ensemble (TIGGE).]

SLIDE 3

Objective and strategy in the deterministic case

Objective: to produce the best forecast using
- a sequence of given ensembles of forecasts $x_t^m$,
- a sequence of observations or analyses $y_t$.

Strategy: aggregated forecast
$$\hat{y}_t = \sum_{m=1}^{M} u_t^m x_t^m$$

Success if the aggregated forecast $\hat{y}_t$ beats any individual sequence of forecasts $x_t^m$.

SLIDE 4

Deterministic ensemble forecasting

Notation
- $x_t^m$: forecast by model/member m at time t
- $y_t$: observation at time t

Principle: to produce an aggregated forecast $\hat{y}_t$ as efficient as possible, using the linear combination
$$\hat{y}_t = \sum_{m=1}^{M} u_t^m x_t^m$$

[Figure: timeline from t−2 to t+1; at each time t, the member forecasts $x_t^m$ are combined with the weights $u_t^m$ into $\hat{y}_t$ and compared with the observation $y_t$.]
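The linear combination above can be sketched in a few lines. This is a minimal illustration with made-up member forecasts and weights; the names `x_t` and `u_t` are not from the slides.

```python
import numpy as np

# Minimal sketch of the aggregated forecast y_hat_t = sum_m u_t^m x_t^m.
# The member values and weights below are made up for illustration.
x_t = np.array([310.0, 295.0, 330.0])  # member forecasts at time t (e.g. GHI in W/m^2)
u_t = np.array([0.5, 0.3, 0.2])        # aggregation weights learned up to time t

y_hat_t = u_t @ x_t                    # linear combination of the members
print(y_hat_t)                         # 309.5
```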


SLIDE 6

Deterministic ensemble forecast using machine learning

Computing the aggregation weights

Ridge regression with discount in time ($\lambda > 0$ and $\beta > 0$), for all t:
$$u_t = \operatorname*{argmin}_{v \in \mathbb{R}^M} \left[ \lambda \|v\|^2 + \sum_{s=1}^{t-1} \frac{\beta}{(t-s)^2 + 1} \left( y_s - \sum_{m=1}^{M} v^m x_s^m \right)^2 \right]$$

Theoretical comparison with the best linear combination with constant weights:
$$\frac{1}{t} \sum_{s=1}^{t} \left( y_s - \sum_{m=1}^{M} u_s^m x_s^m \right)^2 - \min_{v \in \mathbb{R}^M} \frac{1}{t} \sum_{s=1}^{t} \left( y_s - \sum_{m=1}^{M} v^m x_s^m \right)^2 \le O\!\left( \frac{\ln t}{t} \right)$$
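The discounted ridge problem has a weighted least-squares closed form, $u_t = (\lambda I + X^\top W X)^{-1} X^\top W y$ with W holding the discount factors. The sketch below is an illustration on synthetic data, not the authors' implementation; all names and values are illustrative.

```python
import numpy as np

# Sketch of the discounted ridge weights from the slide:
#   u_t = argmin_v  lam*||v||^2 + sum_{s<t} beta/((t-s)^2+1) * (y_s - v.x_s)^2
# Closed form: u_t = (lam*I + X^T W X)^{-1} X^T W y, with W the discount weights.

def discounted_ridge(X, y, t, lam=1.0, beta=1.0):
    """X: (t-1, M) past member forecasts; y: (t-1,) past observations."""
    s = np.arange(1, t)                    # past times 1..t-1
    w = beta / ((t - s) ** 2 + 1.0)        # discount: recent times weigh more
    XW = X.T * w                           # (M, t-1), columns scaled by w
    M = X.shape[1]
    return np.linalg.solve(lam * np.eye(M) + XW @ X, XW @ y)

rng = np.random.default_rng(0)
X = rng.normal(300.0, 30.0, size=(9, 3))   # 3 members, times s = 1..9
y = X @ np.array([0.6, 0.3, 0.1]) + rng.normal(0.0, 1.0, size=9)
u = discounted_ridge(X, y, t=10)           # at time t, forecast with u @ x_t
```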
SLIDE 7

Application to solar radiation, typical day (W m⁻²)

Ridge regression is carried out at each grid point.

[Figure: maps of the best initial forecast, the weighted forecast, and the satellite observations.]

Average decrease of error (RMSE) by about 20% on the global solar irradiance (downward shortwave solar radiation).

SLIDE 8

Application to solar radiation, annual average (W m⁻²)

[Figure: maps of the reference forecast, the aggregated forecast, and the satellite observation.]

Thorey, J., Mallet, V., Chaussin, C., Descamps, L., Blanc, P. (2015). Ensemble forecast of solar radiation using TIGGE weather forecasts and HelioClim database. Solar Energy, 120.

SLIDE 9

Toward probabilistic forecast using filtering

Formulation in terms of filtering

State equation:
$$u_0 = c + e, \qquad u_{t+1} = A u_t + (I - A) c + e_t$$

Observations (or analyses):
$$y_t = E_t u_t + \eta_t, \qquad E_t = \left[ x_t^1, \ldots, x_t^M \right]$$

Mallet, V., Nakonechny, A., Zhuk, S. (2013). Minimax filtering for sequential aggregation: application to ensemble forecast of ozone analyses. Journal of Geophysical Research, 118(11).

SLIDE 10

Toward probabilistic forecast using filtering

Kalman filtering
- Assignment of variances to the initial weight errors, the (weight) model errors and the observational errors
- The filter computes a variance $P_t$ for the weight error at time t
- The aggregated forecast has error variance $E_t P_t E_t^\top$

Minimax filtering
- Bounds on the errors, described by an ellipsoid:
$$e^\top Q^{-1} e + \sum_{t=0}^{T-1} e_t^\top Q_t^{-1} e_t + \sum_{t=0}^{T} A_t^{-1} \eta_t^2 \le 1$$
- Admissible weights are compatible with the weight model, the observations and the error bounds
- The minimax weights $\hat{u}_t$ are defined such that, for any direction $\ell$ and any admissible $u_t$:
$$\sup_{e, e_0, \ldots, e_{t-1}, \eta_0, \ldots, \eta_t} \ell^\top \left( u_t^{\mathrm{true}} - \hat{u}_t \right) \le \sup_{e, e_0, \ldots, e_{t-1}, \eta_0, \ldots, \eta_t} \ell^\top \left( u_t^{\mathrm{true}} - u_t \right)$$
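A minimal sketch of the Kalman-filter variant on the aggregation weights, following the state equation $u_{t+1} = A u_t + (I - A)c + e_t$ and the observation $y_t = E_t u_t + \eta_t$. Here A is taken as a scalar multiple of the identity, and the variances `q` and `r` and all data are illustrative choices, not values from the talk.

```python
import numpy as np

# Sketch of Kalman filtering on the aggregation weights. E_t is the row
# vector of member forecasts; the aggregated forecast E_t u has error
# variance E_t P_t E_t^T.

def kalman_aggregate(X, y, a=0.98, q=1e-3, r=1.0):
    """X: (T, M) member forecasts; y: (T,) observations."""
    T, M = X.shape
    c = np.full(M, 1.0 / M)          # weights relax toward the uniform combination
    u = c.copy()                     # initial weight estimate
    P = np.eye(M)                    # initial weight-error variance
    forecasts = []
    for t in range(T):
        E = X[t]                               # observation operator (1 x M)
        forecasts.append(E @ u)                # aggregated forecast at time t
        S = E @ P @ E + r                      # innovation variance (scalar)
        K = P @ E / S                          # Kalman gain
        u = u + K * (y[t] - E @ u)             # analysis step
        P = P - np.outer(K, E @ P)             # updated weight-error variance
        u = a * u + (1 - a) * c                # state model with A = a*I
        P = a * a * P + q * np.eye(M)
    return np.array(forecasts), u

rng = np.random.default_rng(1)
X = rng.normal(80.0, 10.0, size=(50, 4))       # synthetic 4-member ensemble
y = X @ np.array([0.4, 0.3, 0.2, 0.1]) + rng.normal(0.0, 0.5, size=50)
f, u = kalman_aggregate(X, y)
```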

SLIDE 11

Ozone maps, yearly average (µg m⁻³)

[Figure: four maps of the yearly-average ozone concentration over Europe.]

SLIDE 12

Ozone daily peak in a grid cell (µg m⁻³)

[Figure: time series of the daily ozone peak in one grid cell.]


SLIDE 14

Probabilistic forecast

Notation
- The ensemble of meteorological forecasts (GHI, temperature, cloud cover): $V_t^m$, with t = 1, …, T and m = 1, …, M
- The corresponding PV forecasts $x_t^m$ (collectively $x_t$)
- The observed PV $y_t$
- The Heaviside function $H_a$: unit step function centered on a

Our probabilistic forecast:
$$\sum_{m=1}^{M} u_t^m H_{x_t^m}, \qquad \text{with } \sum_{m=1}^{M} u_t^m = 1, \ u_t^m \ge 0$$

[Figure: timeline from t−2 to t+1; at each time t, the meteorological forecasts $V_t^m$ are converted to PV forecasts $x_t^m$, combined with the weights $u_t^m$ into the mixture $\sum_m u_t^m H_{x_t^m}$, and compared with the observation $y_t$.]


SLIDE 16

Scoring probabilistic forecasts with the CRPS

- $H_y$: unit step function centered on the observation y.
- G: cumulative distribution function (CDF) of the i.i.d. random variables X and X′.

The continuous ranked probability score (CRPS):
$$\mathrm{CRPS}(G, y) = \int (G - H_y)^2 = \mathbb{E}(|X - y|) - \frac{1}{2} \mathbb{E}(|X - X'|)$$

[Figure: the density g, the CDF G against the step $H_y$, and the squared difference $(G - H_y)^2$.]

Well recognized in the meteorological community. More demanding than the squared error on the mean.

SLIDE 17

Interesting features of the CRPS

- Strict properness: if Y is a target random variable with CDF F, the best possible probabilistic forecast is F: $\mathbb{E}_Y[\mathrm{CRPS}(F, Y)] < \mathbb{E}_Y[\mathrm{CRPS}(G, Y)]$ if $G \ne F$.
- Convenient use of CDF-based scores with an ensemble of forecasts.


SLIDE 19

Mixture models with the CRPS

Combination of CDFs: $G = \sum_{m \le M} u^m H_{x^m}$.

$$\mathrm{CRPS}(G, y) = \sum_{m=1}^{M} u^m |x^m - y| - \frac{1}{2} \sum_{m,k=1}^{M} u^m u^k |x^m - x^k|$$

[Figure: the uniform-weight CDF $\bar{G}$, with $\bar{u}^m = 1/M$, against a CDF G with non-uniform weights.]
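The closed-form CRPS of a weighted empirical CDF can be checked numerically. A small sketch, with made-up data:

```python
import numpy as np

# Sketch of the CRPS of a weighted empirical CDF G = sum_m u_m H_{x_m}:
#   CRPS(G, y) = sum_m u_m |x_m - y| - 0.5 * sum_{m,k} u_m u_k |x_m - x_k|

def crps_mixture(x, u, y):
    x, u = np.asarray(x, float), np.asarray(u, float)
    term1 = np.sum(u * np.abs(x - y))
    term2 = 0.5 * np.sum(np.outer(u, u) * np.abs(x[:, None] - x[None, :]))
    return term1 - term2

x = np.array([1.0, 2.0, 4.0])          # three member forecasts
u = np.ones(3) / 3                     # uniform weights u_m = 1/M
print(crps_mixture(x, u, y=2.0))       # 1/3
```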

SLIDE 20

Sequential aggregation with the CRPS (1/2)

The weights may be computed following ridge regression, with $\lambda > 0$:
$$u_t = \operatorname*{argmin}_{w \in \mathbb{R}^M} \left[ \lambda w^\top w + \sum_{t'=1}^{t-1} \mathrm{CRPS}\!\left( \sum_{m=1}^{M} w^m H_{x_{t'}^m},\ y_{t'} \right) \right]$$

The so-called regret is bounded:
$$\underbrace{\sum_{t=1}^{T} \mathrm{CRPS}\!\left( \sum_{m=1}^{M} u_t^m H_{x_t^m},\ y_t \right)}_{\text{forecaster's loss}} - \underbrace{\inf_{u} \sum_{t=1}^{T} \mathrm{CRPS}\!\left( \sum_{m=1}^{M} u^m H_{x_t^m},\ y_t \right)}_{\text{losses with best fixed weights}} \le o(T)$$

SLIDE 21

Sequential aggregation with the CRPS (2/2)

Convex weights may be computed, with $\eta > 0$:
$$u_{t+1}^m = \frac{u_t^m \exp\left( -\eta\, \partial\mathrm{CRPS}_m(u_t, x_t, y_t) \right)}{\sum_{k=1}^{M} u_t^k \exp\left( -\eta\, \partial\mathrm{CRPS}_k(u_t, x_t, y_t) \right)}$$

where
$$\partial\mathrm{CRPS}_m(u, x_t, y_t) = \frac{\partial\, \mathrm{CRPS}\!\left( \sum_{k=1}^{M} u^k H_{x_t^k},\ y_t \right)}{\partial u^m}$$

that is,
$$\partial\mathrm{CRPS}_m(u, x_t, y_t) = |x_t^m - y_t| - \sum_{k=1}^{M} u_t^k |x_t^m - x_t^k| + y_t - \sum_{k=1}^{M} u_t^k x_t^k$$

With a proper $\eta$, the regret is also bounded:
$$\underbrace{\sum_{t=1}^{T} \mathrm{CRPS}\!\left( \sum_{m=1}^{M} u_t^m H_{x_t^m},\ y_t \right)}_{\text{forecaster's loss}} - \underbrace{\inf_{u \in S} \sum_{t=1}^{T} \mathrm{CRPS}\!\left( \sum_{m=1}^{M} u^m H_{x_t^m},\ y_t \right)}_{\text{losses with best fixed weights}} \le o(T)$$

where S is the set of convex weights.
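The multiplicative update can be sketched as follows. The value of `eta` and the data are illustrative, and the max-shift in the exponent is a standard numerical-stability trick, not part of the slide:

```python
import numpy as np

# Sketch of the exponentiated-gradient update:
#   u_{t+1}^m  proportional to  u_t^m * exp(-eta * dCRPS_m), with
#   dCRPS_m = |x_m - y| - sum_k u_k |x_m - x_k| + y - sum_k u_k x_k

def crps_gradient(u, x, y):
    return np.abs(x - y) - np.abs(x[:, None] - x[None, :]) @ u + y - u @ x

def eg_update(u, x, y, eta=0.1):
    g = crps_gradient(u, x, y)
    w = u * np.exp(-eta * (g - g.max()))   # shift the exponent for stability
    return w / w.sum()                     # renormalize onto the simplex

u = np.ones(3) / 3                         # start from uniform weights
x = np.array([1.0, 2.0, 4.0])
u_next = eg_update(u, x, y=2.0)            # the member closest to y gains weight
```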

SLIDE 22

Application to PV forecast at EDF production sites

- Application to 219 EDF PV production sites
- From March to October 2013, using conversion models (from meteorological variables to PV) trained over approximately the whole year 2012
- Using ensembles from ECMWF (50 members, 0.25°) and Météo France (ARPEGE, 34 members, 0.20°)
- Lead time: 36 h
- Using the learning algorithm ML-poly (without parameter) with the CRPS gradients

Gaillard, P., Stoltz, G., and van Erven, T. (2014). A second-order bound with excess losses. In: Proceedings of COLT'14, vol. 35, JMLR Workshop and Conference Proceedings, pp. 176–196.

SLIDE 23

Application to PV forecast at EDF production sites

[Figure: BIAS, CRPS, MAE and RMSE as a function of the horizon (days 1 to 6), for the raw ensemble, the deterministic ECMWF quantiles, and the weighted ensemble.]

Bias removal. Score improvement up to D+4.

SLIDE 24

Application to PV forecast at EDF production sites

[Figure: CRPS skill over the 219 sites as a function of the horizon (days 1 to 6), for the raw ensemble, the deterministic ECMWF and Arpège quantiles, ENS and PEARP.]

CRPS gains between 4% and 10% up to D+3 against the raw ensemble.

SLIDE 25

Application to PV forecast at EDF production sites

Checking the "consistency" property
- Hopefully, the $x^m$ are independent and identically sampled from the same distribution as Y
- y is drawn from Y as well
- Consequently the realizations y should not be distinguishable from the samples $x^m$
- This is the "consistency" property

J. L. Anderson (1997). "The impact of dynamical constraints on the selection of initial conditions for ensemble predictions: low-order perfect model results". Monthly Weather Review, 125, pp. 2969–2983.

SLIDE 26

Application to PV forecast at EDF production sites

Rank histogram
- With the members $x^m$ sorted, y is of rank m if $x^m \le y < x^{m+1}$
- The rank is 0 if $y < x^1$, and M if $x^M \le y$
- I.e., y is of rank m if m members simulated a lower value
- The rank is in {0, …, M}

References:
- J. L. Anderson (1996). "A method for producing and evaluating probabilistic forecasts from ensemble model integrations". Journal of Climate, 9(7), pp. 1518–1530.
- T. M. Hamill and S. J. Colucci (1997). "Verification of Eta/RSM short-range ensemble forecasts". Monthly Weather Review, 125, pp. 1312–1327.
- O. Talagrand, R. Vautard, and B. Strauss (1999). "Evaluation of probabilistic prediction systems". Proceedings of the ECMWF Workshop on Predictability, Reading, United Kingdom.
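A rank histogram can be computed in a few lines. The sketch below draws a synthetic consistent ensemble (members and observation from the same law), so the histogram should come out roughly flat; sizes and the distribution are illustrative.

```python
import numpy as np

# Sketch of the rank histogram: the rank of y is the number of sorted members
# at or below y, a value in {0, ..., M}.

def ranks(X, y):
    """X: (T, M) ensemble members; y: (T,) observations."""
    return np.sum(np.sort(X, axis=1) <= y[:, None], axis=1)

rng = np.random.default_rng(2)
X = rng.normal(size=(5000, 9))          # 9 members drawn from N(0, 1)
y = rng.normal(size=5000)               # observations drawn from the same law
hist = np.bincount(ranks(X, y), minlength=10)   # roughly flat if consistent
```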

SLIDE 27

Application to PV forecast at EDF production sites

Figure: Rank histogram for the raw ensemble (a) and the weighted ensemble (b).

SLIDE 28

Application to PV forecast at EDF production sites

[Figure: binned spread against RMSE for the raw ensemble, the deterministic ECMWF and Arpège quantiles, ENS, PEARP, and the weighted ensemble.]

Thorey, J., Mallet, V., Baudin, P. (2017). Online learning with the continuous ranked probability score for ensemble forecasting. Q. J. R. Meteorol. Soc., 143.

Thorey, J., Chaussin, C., Mallet, V. (2018). Ensemble forecast of photovoltaic power with online CRPS learning. International Journal of Forecasting.

Thorey, J. (2017). Ensemble forecasting using sequential aggregation for photovoltaic power applications. PhD thesis.

SLIDE 29

Conclusions regarding probabilistic ensemble forecasting

Mathematical method
- Weighting the ensemble of forecasts, and combining the individual CDFs
- Minimizing the CRPS with online learning
- Performance guarantees with essentially no assumptions on the ensemble or the observations

Practical application
- Applied to real PV data; can be applied to wind power
- High flexibility regarding the potential ensemble members

Perspectives
- Introducing the spatial dimensions
- Taking into account errors in the observations

SLIDE 30

Model formulation

$$y = \mathcal{M}(p)$$

Input p: ten scalars: wind velocity, wind direction, temperature, cloudiness, precipitation, day of the year, hour, [NO2]background, [NOx]background, [O3]background.

Output y: $y \in \mathbb{R}^{3 \times 10^4}$ per pollutant.

Objective: designing a metamodel $\widehat{\mathcal{M}}$ with low computational requirements so that $\widehat{\mathcal{M}}(p) \simeq \mathcal{M}(p)$.

SLIDE 31

Example for Clermont-Ferrand: NO2 concentrations (µg m⁻³)

[Figure: map of modeled NO2 concentrations over Clermont-Ferrand.]

SLIDE 32

First step: dimension reduction

Projection of the model outputs onto the subspace spanned by the N columns $\Psi_j$ of $\Psi$, where N is small. The basis may be computed with principal component analysis on some learning set of model outputs, whose mean is denoted $\bar{y}$.

[Figure: the first eight principal components, shown in successive animation steps.]

SLIDE 40

First step: dimension reduction

We obtain
$$y \simeq \bar{y} + \Psi \Psi^\top \left( \mathcal{M}(p) - \bar{y} \right)$$

The fluctuation around the mean is given in the reduced subspace by
$$f(p) = \Psi^\top \mathcal{M}(p)$$

This dimension-reduced model is just as computationally demanding as the original model, since evaluating $f$ still requires a call to $\mathcal{M}$.

SLIDE 41

First step: dimension reduction

Principal component analysis

[Figure: unexplained variance (%) as a function of the number of principal components (2 to 18).]

SLIDE 42

Second step: statistical emulation of the reduced model

Design of a fast surrogate model $\widehat{f}_j(p)$ for each $f_j(p) = \Psi_j^\top \mathcal{M}(p)$:
- Generation of new learning samples $p^{(i)}$, i = 1, …, M, e.g., with Latin hypercube sampling
- Calls to the reduced model at the sample points: $f_j(p^{(i)})$
- Construction of an emulator of the reduced model:

$$\widehat{f}_j(p) = \underbrace{\sum_{k=1}^{K} \beta_{j,k}\, p_k}_{\text{regression}} + \underbrace{\sum_{i=1}^{M} w_{j,i}(p, p^{(1)}, \ldots, p^{(M)}) \left( f_j(p^{(i)}) - \sum_{k=1}^{K} \beta_{j,k}\, p_k^{(i)} \right)}_{\text{interpolation of the residuals}}$$

SLIDE 43

Second step: statistical emulation of the reduced model

Example: Kriging

$$\widehat{f}_j(p) = \underbrace{\sum_{k=1}^{K} \beta_{j,k}\, p_k}_{\text{regression}} + \underbrace{\sum_{i=1}^{M} w_{j,i}(p, p^{(1)}, \ldots, p^{(M)}) \left( f_j(p^{(i)}) - \sum_{k=1}^{K} \beta_{j,k}\, p_k^{(i)} \right)}_{\text{interpolation of the residuals}}$$

We denote $P = [p^{(1)}, \ldots, p^{(M)}]$, $f_j = [f_j(p^{(1)}), \ldots, f_j(p^{(M)})]^\top$, $R(p, q)$ the parameterized covariance of the underlying Gaussian process, $r(p) = [R(p^{(1)}, p), \ldots, R(p^{(M)}, p)]^\top$, and $R = [R(p^{(i)}, p^{(j)})]_{(i,j)}$. Then the Kriging weights are

$$[\beta_{j,1}, \ldots, \beta_{j,K}]^\top = \left( P R^{-1} P^\top \right)^{-1} P R^{-1} f_j, \qquad [w_{j,1}(p), \ldots, w_{j,M}(p)] = r^\top(p)\, R^{-1}$$
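A minimal Kriging sketch in the spirit of these formulas: a linear regression on the components of p, plus an interpolation of the residuals with weights $r^\top(p) R^{-1}$. The Gaussian covariance, its length scale, the nugget and all data are illustrative assumptions, not choices from the talk.

```python
import numpy as np

# Sketch of a Kriging emulator: regression part (GLS estimate of beta) plus
# interpolation of the residuals with weights r(p)^T R^{-1}.

def gauss_cov(A, B, ell=0.2):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * ell**2))

def fit_kriging(P_design, f, ell=0.2, nugget=1e-8):
    """P_design: (M, K) sample points p^(i); f: (M,) reduced-model values."""
    R = gauss_cov(P_design, P_design, ell) + nugget * np.eye(len(f))
    Ri = np.linalg.inv(R)
    Pm = P_design.T                                  # (K, M) regression matrix
    beta = np.linalg.solve(Pm @ Ri @ Pm.T, Pm @ Ri @ f)
    resid = f - Pm.T @ beta                          # residuals at the samples
    def predict(p):
        r = gauss_cov(p[None, :], P_design, ell)[0]  # r(p)
        return p @ beta + r @ Ri @ resid             # regression + residuals
    return predict

rng = np.random.default_rng(4)
P_design = rng.uniform(0.0, 1.0, size=(30, 2))       # 30 design points, K = 2
f = np.sin(3.0 * P_design[:, 0]) + P_design[:, 1]    # reduced-model values
emu = fit_kriging(P_design, f)                       # interpolates the samples
```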

SLIDE 44

Third step: reconstruction

$$y \simeq \bar{y} + \Psi \Psi^\top \left( \mathcal{M}(p) - \bar{y} \right)$$

After the introduction of the emulator, we finally obtain

$$y \simeq \bar{y} - \Psi \Psi^\top \bar{y} + \Psi \begin{bmatrix} \widehat{f}_1(p) \\ \vdots \\ \widehat{f}_N(p) \end{bmatrix}$$

SLIDE 45

Scores of the meta-model: comparison against observations

Bias and RMSE in µg m⁻³

                          Original model           Meta-model
Station      Obs. mean | Bias    Corr.  RMSE  | Bias    Corr.  RMSE
Gare         47.9      | −18.86  0.60   30.0  | −17.94  0.61   29.4
Roussillon   36.6      | −11.83  0.62   22.3  | −11.18  0.64   21.8
Lecoq        24.0      |   0.58  0.63   16.1  |   0.47  0.65   15.8
Delille      25.5      |  −1.37  0.64   15.8  |  −0.33  0.66   15.4
Jaude        25.4      |  −3.16  0.63   16.9  |  −2.04  0.66   16.3
Montferrand  23.0      |   0.22  0.66   14.5  |   1.25  0.69   14.2
Gerzat       21.0      |  −1.72  0.73   11.6  |  −0.64  0.75   11.2
Gravanches   21.2      |   0.25  0.69   13.5  |   1.19  0.71   13.2
Royat        11.9      |   4.18  0.56   14.5  |   5.37  0.56   14.9

SLIDE 46

Original model, meta-model and observations

[Figure: time series of the mean NO2 concentrations (µg m⁻³) over 9 stations, for the original model, the meta-model and the observations.]

V. Mallet, A. Tilloy, D. Poulet, S. Girard, F. Brocheton (2018). Meta-modeling of ADMS-Urban by dimension reduction and emulation. Atmospheric Environment, 184.

SLIDE 47

Some conclusions regarding the meta-models

Many applications
- Air pollution at urban scale (including work of Janelle Hammond)
- Urban traffic model
  - R. Chen. Uncertainty quantification in the simulation of road traffic and associated atmospheric emissions in a metropolitan area.
- Urban noise mapping (ongoing PhD work of Antoine Lesieur)
- Wildland fire propagation (ongoing PhD work of Frédéric Allaire)
- Dispersion of radionuclides after the Fukushima disaster (ongoing PhD thesis of Ngoc Bao Tran Le)

Perspectives
- Investigation of the best interpolation methods
- Dimension reduction for transport problems
- Correction of the meta-model with observations: see LJLL seminar on 25 October

SLIDE 48

"Smart cities" and multiple sensors

ANR project CENSE – http://cense.ifsttar.fr/

SLIDE 49

Mobile observations with Ambiciti

Ambiciti application (Android & iOS). Observations collected in Paris for one hour.

SLIDE 50

Mobile observations with Ambiciti

Ambiciti application (Android & iOS). Observations collected in Paris for eight hours.

SLIDE 51

Simulation at street resolution and large scale

Operational in the Ambiciti startup, providing forecasts for the Ouest-France website.