 
              Ensemble probabilistic forecast, and metamodeling of urban pollution Vivien Mallet 1 , 2 vivien.mallet@inria.fr 1 INRIA 2 Sorbonne Université, Laboratoire Jacques-Louis Lions, CNRS, CEREMA Collaborators (EDF R&D, INRIA, CNRS, ENS, HEC, IBM Research, Shevchenko National University of Kiev, NUMTECH, Météo France, IRSN): Jean Thorey, Gilles Stoltz, Christophe Chaussin, Sergiy Zhuk, Alexander Nakonechniy, Anne Tilloy, David Poulet, Fabien Brocheton, Laurent Descamps, Sylvain Girard AI4Climate, September 2019 1 / 39
Objectives Make a probabilistic forecast based on an ensemble of forecasts. China Europe UK 440 440 440 400 400 400 360 360 360 320 320 320 280 280 280 240 240 240 200 200 200 160 160 160 Korea Brazil France 440 440 440 400 400 400 360 360 360 320 320 320 280 280 280 240 240 240 200 200 200 160 160 160 Annual average of forecasts of global horizontal irradiance (GHI) in W m −2 for year 2012, as found in the THORPEX Interactive Grand Global Ensemble (TIGGE). 2 / 39
Objective and strategy in the deterministic case To produce the best forecast using a sequence of given ensemble of forecasts x m t , a sequence of observations or analyses y t . Strategy Aggregated forecast: M � u m t x m y t = � t m =1 y t beat any sequence of forecasts x m Success if the ensemble forecasts � t 3 / 39
Deterministic ensemble forecasting Notation x m Forecast by model/member m at time t t y t Observation at time t Principle To produce an aggregated forecast � y t as efficient as possible, using the linear combination: M � u m t x m y t = � t m =1 t − 2 t − 1 t t + 1 x m x m x m x m t − 2 t − 1 t +1 t u m u m u m u m t − 2 → � y t − 2 t − 1 → � y t − 1 t → � y t t +1 → � y t +1 y t − 2 y t − 1 y t y t +1 4 / 39
Deterministic ensemble forecasting Notation x m Forecast by model/member m at time t t y t Observation at time t Principle To produce an aggregated forecast � y t as efficient as possible, using the linear combination: M � u m t x m y t = � t m =1 t − 2 t − 1 t t + 1 x m x m x m x m t − 2 t − 1 t +1 t u m u m u m u m t − 2 → � y t − 2 t − 1 → � y t − 1 t → � y t t +1 → � y t +1 y t − 2 y t − 1 y t y t +1 4 / 39
Deterministic ensemble forecast using machine learning Computing the aggregation weights Ridge regression with discount in time ( λ > 0 and β > 0 ):  � 2  � � � s<t M � � β  λ � v � 2 v m x m  ∀ i u t = argmin 2 + ( t − s ) 2 + 1 y s − s v ∈ R M s =1 m =1 Theoretical comparison with the best linear combination with constant weights � � 2 s ≤ t M � � 1 u m s x m y s − s t s =1 m =1  � 2  � � ln t � s ≤ t M � �  1 v m x m  ≤ O − argmin y s − s t t v ∈ R M s =1 m =1 5 / 39
Application to solar radiation, typical day (W m −2 ) Ridge regression is carried out at each grid point. Best initial forecast Our weighted forecast Satellite observations 800 800 800 720 720 720 640 640 640 560 560 560 480 480 480 400 400 400 320 320 320 240 240 240 Average decrease of error (RMSE) by about 20% on the global solar irradiance (downward shortwave solar radiation). 6 / 39
Application to solar radiation, annual average (W m −2 ) Reference forecast Aggregated forecasts 480 480 440 440 400 400 360 360 320 320 280 280 240 240 200 200 Satellite observation 480 440 400 360 320 280 240 200 Thorey, J., Mallet, V., Chaussin, C., Descamps, L., Blanc , P. (2015). Ensemble forecast of solar radiation using TIGGE weather forecasts and HelioClim database. Solar Energy, Elsevier, 120. 7 / 39
Toward probabilistic forecast using filtering Formulation in terms of filtering State equation: u 0 = c + e u t +1 = A u t + ( I − A ) c + e t Observations (or analyses): y t = E t u t + η t � � x 1 t , . . . , x m E t = t Mallet, V., Nakonechny, A., Zhuk, S. (2013). Minimax filtering for sequential aggregation: Application to ensemble forecast of ozone analyses. Journal of Geophysical Research, 118(11). 8 / 39
Toward probabilistic forecast using filtering Kalman filtering Assignment of variances to initial weight errors, (weight) model errors and observational errors The filter computes a variance P t for the weight error at time t The aggregated forecast has error variance E t P t E ⊤ t Minimax filtering Bounds on errors, described by an ellipsoid T − 1 T � � e ⊤ Q − 1 e + t Q − 1 A − 1 t η 2 e ⊤ t e t + t ≤ 1 t =0 t =0 Admissible weights are compatible with weight model, observations and errors bounds Weights defined such that, for any direction ℓ and admissible u t , ℓ ⊤ ( u true ℓ ⊤ ( u true sup − � u t ) ≤ sup − u t ) t t e , e 0 ,..., e t − 1 , η 0 ,..., η t e , e 0 ,..., e t − 1 , η 0 ,..., η t 9 / 39
Ozone maps, yearly average (µg m −3 ) 55 55 112 50 50 104 45 45 96 40 40 88 35 35 10 5 0 5 10 15 20 10 5 0 5 10 15 20 80 55 55 72 50 50 64 45 45 56 40 40 35 35 48 10 5 0 5 10 15 20 10 5 0 5 10 15 20 10 / 39
Ozone daily peak in a grid cell (µg m −3 ) 200 150 100 50 0 220 240 260 280 300 320 11 / 39
Ozone daily peak in a grid cell (µg m −3 ) 200 150 100 50 0 220 240 260 280 300 320 12 / 39
Probabilistic forecast Notation The ensemble of meteorological forecasts (GHI, temperature, cloud cover): V m t , with t = 1 , . . . , T and m = 1 , . . . , M The corresponding PV forecasts x m t or x t The observed PV y t The Heaviside function H a : unit step function centered on a Our probabilistic forecast � M t , with � M m =1 u m m =1 u m t = 1 , u m t H x m t ≥ 0 t − 2 t − 1 t t + 1 V m V m V m V m t − 2 t − 1 t +1 t x m x m x m x m t − 2 t − 1 t t +1 u m u m u m u m t − 2 t − 1 t +1 t � � � � m u m m u m m u m m u m t − 2 H x m t − 1 H x m t H x m t +1 H x m t − 2 t − 1 t t +1 y t − 2 y t − 1 y t y t +1 13 / 39
Probabilistic forecast Notation The ensemble of meteorological forecasts (GHI, temperature, cloud cover): V m t , with t = 1 , . . . , T and m = 1 , . . . , M The corresponding PV forecasts x m t or x t The observed PV y t The Heaviside function H a : unit step function centered on a Our probabilistic forecast � M t , with � M m =1 u m m =1 u m t = 1 , u m t H x m t ≥ 0 t − 2 t − 1 t t + 1 V m V m V m V m t − 2 t − 1 t +1 t x m x m x m x m t − 2 t − 1 t t +1 u m u m u m u m t − 2 t − 1 t +1 t � � � � m u m m u m m u m m u m t − 2 H x m t − 1 H x m t H x m t +1 H x m t − 2 t − 1 t t +1 y t − 2 y t − 1 y t y t +1 13 / 39
Scoring probabilistic forecasts with the CRPS H y : unit step function centered on the observation y . G: cumulative distribution function (CDF) the i.i.d. random variables X and X ′ . The continuous ranked probability score (CRPS): � ( G − H y ) 2 = E( | X − y | ) − 1 2 E( | X − X ′ | ) CRPS( G , y ) = H y g 1 1 ( G − H y ) 2 G 0 0 y x y x y x Well-recognized in the meteorological community. �� � 2 . More demanding than squared error on the mean G − H y 14 / 39
Interesting features of the CRPS Strict properness : If Y is a target random variable with CDF F, the best possible probabilistic forecast is F: E Y [CRPS( F , Y )] < E Y [CRPS( G , Y )] if G � = F . Convenient use of CDF-based scores with ensemble of forecasts. 15 / 39
Mixture models with the CRPS Combination of CDFs: G = � m ≤ M u m H x m . CRPS ( G , y ) = � M � M m =1 u m | x m − y | − 1 m,k =1 u m u k | x m − x k | . 2 ¯ ¯ G: uniform weights, G 1 u m = 1 /M . ¯ u m ¯ 0 x x m 16 / 39
Mixture models with the CRPS Combination of CDFs: G = � m ≤ M u m H x m . CRPS ( G , y ) = � M � M m =1 u m | x m − y | − 1 m,k =1 u m u k | x m − x k | . 2 ¯ ¯ G: uniform weights, G 1 u m = 1 /M . ¯ u m ¯ G u m G: non uniform weights 0 x x m 16 / 39
Sequential aggregation with the CRPS (1/2) The weights may be computed following ridge regression, with λ > 0 : � � M �� t − 1 � � λ w ⊤ w + w m H x m u t = argmin CRPS t ′ , y t ′ w ∈ R M t ′ =1 m =1 The so-called regret is bounded: � M � � T � M �� T � � � � u m u m H x m CRPS t H x m t , y t − inf CRPS t , y t ≤ o ( T ) u t =1 m =1 t =1 m =1 � �� � � �� � forecaster’s loss losses with best fixed weights 17 / 39
Sequential aggregation with the CRPS (2/2) Convex weights may be computed, with η > 0 : u m t exp ( − η ∂ CRPS m ( u t , x t , y t )) u m t +1 = � M k =1 u k t exp ( − η ∂ CRPS k ( u t , x t , y t )) where ∂ CRPS( � M k =1 u k H x k t , y t ) ∂ CRPS m ( u , x t , y t ) = ∂u m that is � M � M ∂ CRPS m ( u , x t , y t ) = | x m u k t | x m t − x k u k t x k t − y t | − t | + y t − t k =1 k =1 With a proper η , the regret is also bounded: � M � � T � M �� T � � � � u m u m H x m CRPS t H x m t , y t − inf CRPS t , y t ≤ o ( T ) u ∈S t =1 m =1 t =1 m =1 � �� � � �� � forecaster’s loss losses with best fixed weights where S is the set of complex weights. 18 / 39
Recommend
More recommend