 
Scaling of scoring rules
Jonas Wallin, joint work with David Bolin (KAUST)
CIRM virtual conference, 2020-06-02
Forecast and observation classes
[Figure: three panels, (a) Forecast, (b) Observation, (c) Comparison; horizontal axes from 0 to 10]
Scoring functions apply to deterministic forecasts
The forecast $x$ is evaluated against the observation $y$ using scoring functions such as
negative Squared Error (SE): $S(x, y) = -(x - y)^2$
negative Absolute Error (AE): $S(x, y) = -|x - y|$
Bayes predictors should be used for probabilistic forecasts
For a probabilistic forecast $P$, decision theory tells us that if the scoring function $S$ is given, we should issue the Bayes predictor
$\hat{x} = \arg\max_x E_P[S(x, Y)]$
as the point forecast, where the expectation is with respect to $P$. Because $S$ is the negative error, maximizing the expected score minimizes the expected error.
Squared Error (SE): $S(x, y) = -(x - y)^2$, $\hat{x} = \mathrm{mean}(P)$
Absolute Error (AE): $S(x, y) = -|x - y|$, $\hat{x} = \mathrm{median}(P)$
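The claim that the mean and the median are the Bayes predictors under SE and AE can be checked numerically. The following is a minimal sketch, not part of the original slides: it assumes a hypothetical skewed forecast distribution (exponential with mean 1), represents it by a sample, and searches a grid for the point forecast with the highest expected score. All function names are illustrative.

```python
import random
import statistics

def neg_squared_error(x, y):
    return -(x - y) ** 2

def neg_absolute_error(x, y):
    return -abs(x - y)

def expected_score(score, x, sample):
    """Monte Carlo estimate of E_P[S(x, Y)] using draws from P."""
    return sum(score(x, y) for y in sample) / len(sample)

def bayes_predictor(score, sample, grid):
    """Grid search for the point forecast that maximizes the expected score."""
    return max(grid, key=lambda x: expected_score(score, x, sample))

rng = random.Random(0)
# Hypothetical forecast distribution P: exponential with mean 1 (an assumption).
sample = [rng.expovariate(1.0) for _ in range(2000)]
grid = [i / 100 for i in range(0, 301)]

x_se = bayes_predictor(neg_squared_error, sample, grid)
x_ae = bayes_predictor(neg_absolute_error, sample, grid)

print("SE optimum:", x_se, "sample mean:", statistics.mean(sample))
print("AE optimum:", x_ae, "sample median:", statistics.median(sample))
```

Up to the grid resolution, the SE optimum coincides with the sample mean and the AE optimum with the sample median. For a skewed distribution the two point forecasts differ, so the choice of scoring function matters.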
The basic idea
Assume we have a prediction $p \in P$ and an observation $o \in O$, where we wish to measure the skill of the prediction by applying a function $s : P \times O \to \mathbb{R}$, with a higher function value indicating better skill. What are good theoretical properties for $s$?
General framework without any formulas...
Assume $Q$ is Nature's distribution of some event $y$, and denote our forecast for $y$ by $P$. For forecast evaluation, we should use performance metrics that follow the principle: in the long run, we obtain the optimal performance for $P = Q$.
Probabilistic forecasts should generally be evaluated using proper scoring rules
A consistent scoring function is a special case of a proper scoring rule for probabilistic forecasts.
Definition (Murphy and Winkler, 1968): If $\mathcal{F}$ denotes a class of probabilistic forecasts on $\mathbb{R}$, a proper scoring rule is any function $S : \mathcal{F} \times \mathbb{R} \to \mathbb{R}$ such that
$S(Q, Q) := E_Q S(Q, Y) \ge E_Q S(P, Y) =: S(P, Q)$
for all $P, Q \in \mathcal{F}$.
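The inequality in the definition can be illustrated with a concrete rule. The sketch below is an illustration rather than the authors' code: it assumes Gaussian forecasts and uses the log density of the forecast at the observation as the score (higher is better). For $P = N(m, s^2)$ and $Q = N(\mu, \sigma^2)$ the expected score has the closed form $-\frac{1}{2}\log(2\pi s^2) - (\sigma^2 + (\mu - m)^2)/(2 s^2)$, so propriety can be checked over a grid of candidate forecasts.

```python
import math

def expected_log_score(m, s, mu, sigma):
    """S(P, Q) = E_Q[log f_P(Y)] for P = N(m, s^2) and Q = N(mu, sigma^2)."""
    return -0.5 * math.log(2 * math.pi * s**2) - (sigma**2 + (mu - m) ** 2) / (2 * s**2)

# Nature's distribution Q (chosen for illustration).
mu, sigma = 0.0, 1.0
s_qq = expected_log_score(mu, sigma, mu, sigma)

# Candidate forecasts P = N(m, s^2) on a grid that includes Q itself.
candidates = [(m / 4, s / 4) for m in range(-8, 9) for s in range(1, 13)]
gaps = [s_qq - expected_log_score(m, s, mu, sigma) for m, s in candidates]

print("smallest gap S(Q,Q) - S(P,Q):", min(gaps))
```

Every gap is nonnegative and the smallest one is attained at $P = Q$, which is exactly the requirement $S(Q, Q) \ge S(P, Q)$.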
The class of proper scoring rules is large
$S(P, y) = -(\mathrm{mean}(P) - y)^2$
$S(P, y) = -|\mathrm{median}(P) - y|$
Gneiting, T. and Raftery, A.E. (2007): Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102, 359-378.
Optimally, forecasts should be probabilistic
"All those whose duty it is to issue regular daily forecasts know that there are times when they feel very confident and other times when they are doubtful as to coming weather. It seems to me that the condition of confidence or otherwise forms a very important part of the prediction." Cooke (Monthly Weather Review, 1906)
[Figure: three panels, (d) Forecast, (e) Observation, (f) Comparison; horizontal axes from 0 to 10]
The class of proper scoring rules is large
Perhaps the two most common proper scoring rules are the continuous ranked probability score (CRPS),
$S(P, y) = -E_P|X - y| + \frac{1}{2} E_P E_P|X - X'|$,
and the log score,
$S(P, y) = \log f(y)$,
where $X$ and $X'$ are independent draws from $P$ and $f$ is the density of $P$.
Gneiting, T. and Raftery, A.E. (2007): Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102, 359-378.
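Both scores are easy to evaluate for a Gaussian forecast. The sketch below is an illustration under stated assumptions, not the authors' implementation: it uses the positive orientation (higher is better, so the log score is taken as $\log f(y)$), the known closed form of the Gaussian CRPS, and a Monte Carlo version of the expectation formula to cross-check it. Function names are illustrative.

```python
import math
import random
from statistics import NormalDist

def crps_gaussian(mu, sigma, y):
    """Positively oriented CRPS of the forecast N(mu, sigma^2) at y (closed form)."""
    z = (y - mu) / sigma
    std = NormalDist()
    return -sigma * (z * (2 * std.cdf(z) - 1) + 2 * std.pdf(z) - 1 / math.sqrt(math.pi))

def crps_sample(xs, xs_prime, y):
    """Monte Carlo version of -E|X - y| + (1/2) E|X - X'| from two independent samples."""
    term1 = sum(abs(x - y) for x in xs) / len(xs)
    term2 = sum(abs(a - b) for a, b in zip(xs, xs_prime)) / len(xs)
    return -term1 + 0.5 * term2

def log_score_gaussian(mu, sigma, y):
    """Log score of the forecast N(mu, sigma^2) at y: log of the forecast density."""
    return math.log(NormalDist(mu, sigma).pdf(y))

rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(100000)]
xs_prime = [rng.gauss(0.0, 1.0) for _ in range(100000)]

y = 0.7  # arbitrary observation, chosen for illustration
exact = crps_gaussian(0.0, 1.0, y)
approx = crps_sample(xs, xs_prime, y)
print("CRPS closed form:", exact, "Monte Carlo:", approx)
print("log score:", log_score_gaussian(0.0, 1.0, y))
```

The closed form and the sample-based estimate agree up to Monte Carlo error, which suggests the sample version as a fallback when no closed form is available.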
The different scores behave somewhat differently
[Figure: score as a function of $y$ for SE, AE, CRPS and IGN; horizontal axis from -4 to 4, vertical axis from 0 to 4]
Average scores facilitate comparison across methods
Assume we have two forecasting methods $m = 1, 2$. They issue forecasts $P_{mi}$ with observed values $y_i$ at a finite set of times, locations or instances $i = 1, \ldots, n$. The methods are assessed and ranked by the mean score (our contribution starts here)
$\bar{S}_n^m = \frac{1}{n} \sum_{i=1}^n S(P_{mi}, y_i)$ for $m = 1, 2$.
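Ranking by mean score can be sketched as follows. This is an illustrative setup, not taken from the slides: the observations are simulated from $N(0, 1)$, method 1 issues the correct forecast $N(0, 1)$ at every instance, and method 2 issues the overdispersed forecast $N(0, 2^2)$. The score is the positively oriented Gaussian CRPS.

```python
import math
import random
from statistics import NormalDist, fmean

STD = NormalDist()

def crps_gaussian(mu, sigma, y):
    """Positively oriented CRPS of the forecast N(mu, sigma^2) at y (closed form)."""
    z = (y - mu) / sigma
    return -sigma * (z * (2 * STD.cdf(z) - 1) + 2 * STD.pdf(z) - 1 / math.sqrt(math.pi))

rng = random.Random(1)
# Hypothetical observations y_i ~ N(0, 1), i = 1, ..., n.
ys = [rng.gauss(0.0, 1.0) for _ in range(5000)]

# Forecast scale issued by each method (assumed constant over i for simplicity).
forecast_sd = {1: 1.0, 2: 2.0}

# Mean score per method: (1/n) * sum_i S(P_mi, y_i).
mean_score = {m: fmean(crps_gaussian(0.0, sd, y) for y in ys) for m, sd in forecast_sd.items()}
best = max(mean_score, key=mean_score.get)

print(mean_score, "best method:", best)
```

With the positive orientation the method with the larger mean score is ranked first, here the correctly specified method 1.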
Average scores facilitate comparison across methods
[Figure: two panels; horizontal axes from -4 to 4]
Two observations, two models
Two observations, two models, result
         Model 1 CRPS   Model 2 CRPS
Y_1      0.0023         0.02346
Y_2      4.0486         3.920
mean     2.0255         1.9719
Other example
Consider a situation with two observations $Y_i \sim Q_{\theta_i} = N(0, \sigma_i^2)$, $i = 1, 2$, with $\sigma_1 = 0.1$ and $\sigma_2 = 1$. Assume that we want to evaluate a model which has predictive distributions $P_i = N(0, \hat{\sigma}_i^2)$ for $Y_i$, using the average of a proper scoring rule for the two observations.
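For this two-observation example the expected scores are available in closed form, which makes the effect of scale visible. The sketch below is an illustration, not the authors' code. It assumes the positive orientation (higher is better) and a hypothetical misspecification $\hat{\sigma}_i = 1.5\,\sigma_i$ for both observations. For $P = N(0, \hat{\sigma}^2)$ and $Q = N(0, \sigma^2)$, the expected CRPS is $-\sqrt{2(\hat{\sigma}^2 + \sigma^2)/\pi} + \hat{\sigma}/\sqrt{\pi}$ and the expected log score is $-\frac{1}{2}\log(2\pi\hat{\sigma}^2) - \sigma^2/(2\hat{\sigma}^2)$.

```python
import math

def expected_crps(s_hat, s):
    """E_Q[CRPS(P, Y)] (higher is better) for P = N(0, s_hat^2), Q = N(0, s^2)."""
    return -math.sqrt(2 * (s_hat**2 + s**2) / math.pi) + s_hat / math.sqrt(math.pi)

def expected_log_score(s_hat, s):
    """E_Q[log f_P(Y)] for P = N(0, s_hat^2), Q = N(0, s^2)."""
    return -0.5 * math.log(2 * math.pi * s_hat**2) - s**2 / (2 * s_hat**2)

sigmas = (0.1, 1.0)  # sigma_1 and sigma_2 from the example
factor = 1.5         # hypothetical relative error in the predicted scale

# Loss in expected score caused by using factor * sigma instead of sigma.
crps_loss = {s: expected_crps(s, s) - expected_crps(factor * s, s) for s in sigmas}
log_loss = {s: expected_log_score(s, s) - expected_log_score(factor * s, s) for s in sigmas}

for s in sigmas:
    print(f"sigma = {s}: CRPS loss = {crps_loss[s]:.5f}, log score loss = {log_loss[s]:.5f}")
```

Under these assumptions the CRPS loss is proportional to $\sigma$, so the same relative error costs ten times more at $\sigma_2 = 1$ than at $\sigma_1 = 0.1$ and the average is dominated by the second observation. The log score loss is the same for both observations.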
Other example
[Figure: two panels, CRPS and log(LS)]
Varying scale in practice?
[Figure: map of locations; longitude from 160 to 180, latitude from -70 to -40, color scale from -3 to 1]
Kuusela, M. and Stein, M.L. (2018): Locally stationary spatio-temporal interpolation of Argo profiling float data. Proceedings of the Royal Society A, 474.
Bolin, D. and Wallin, J. (2020): Multivariate type-G Matérn-SPDE random fields, JRSSB.
Example spatial statistics
We will now go through how model evaluation using a scoring rule is typically done in spatial statistics. We start with the basic setup.
Let $s_i$, $i = 1, \ldots, n$, be a set of, typically irregular, locations.
We have a set of observations $\{y_i\}_{i=1}^n$ at the locations $\{s_i\}_{i=1}^n$.
The score of the model, $P$, is given by $\bar{s} = \frac{1}{n} \sum_{i=1}^n S(P_i, y_i)$.
Realization
[Figure: realization $Y$ at locations in the unit square; axes $x$ and $y$ from 0 to 1, color scale from 0.05 to 0.15]
Variation of the standard deviation
[Figure: true kriging standard deviation, by location; axes $x$ and $y$ from 0 to 1, color scale (sd) from 0.002 to 0.006]
[Figure: empirical density of the true kriging standard deviation, $\sigma_i$; horizontal axis (sd) from 0.002 to 0.006]
Mathematical framework
Definition: Let $S$ be a proper scoring rule. If $Q_\sigma$, $P_\sigma$ are probability measures with scaling $\sigma$, then
$\tilde{S}(P_{\hat{\sigma}}, Q_\sigma, \pi) = \int S(P_{\hat{\sigma}(\sigma)}, Q_\sigma) \, \pi(d\sigma)$
is a proper scoring rule.
The difference between this scoring rule and a regular scoring rule is that there is no $\bar{S}(P_{\hat{\sigma}}, y)$ function. It is a theoretical construction. However, if $\sigma_i \sim \pi$ and $Y_i \sim Q_{\sigma_i}$, then, as $n \to \infty$,
$\frac{1}{n} \sum_{i=1}^n S(P_{\hat{\sigma}_i}, Y_i) \to \tilde{S}(P_{\hat{\sigma}}, Q_\sigma, \pi)$.
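The convergence of the mean score to the integrated score can be checked by simulation. The sketch below is an illustration under assumptions that are not in the slides: $\pi$ puts mass 1/2 on each of $\sigma = 0.1$ and $\sigma = 1$, $Q_\sigma = N(0, \sigma^2)$, the model uses $\hat{\sigma}(\sigma) = 1.5\,\sigma$, and $S$ is the positively oriented CRPS, whose expected value for Gaussians is available in closed form.

```python
import math
import random
from statistics import NormalDist, fmean

STD = NormalDist()

def crps_gaussian(sigma, y):
    """Positively oriented CRPS of the forecast N(0, sigma^2) at observation y."""
    z = y / sigma
    return -sigma * (z * (2 * STD.cdf(z) - 1) + 2 * STD.pdf(z) - 1 / math.sqrt(math.pi))

def expected_crps(s_hat, s):
    """S(P, Q) = E_Q[CRPS(P, Y)] for P = N(0, s_hat^2) and Q = N(0, s^2)."""
    return -math.sqrt(2 * (s_hat**2 + s**2) / math.pi) + s_hat / math.sqrt(math.pi)

def sigma_hat(s):
    """Hypothetical model scale: off by a constant factor (an assumption)."""
    return 1.5 * s

# Assumed scale distribution pi: equal mass on two values.
support = (0.1, 1.0)

# Integrated score: the integral over pi reduces to an equally weighted sum.
s_tilde = fmean(expected_crps(sigma_hat(s), s) for s in support)

# Empirical mean score with sigma_i ~ pi and Y_i ~ N(0, sigma_i^2).
rng = random.Random(2)
n = 50000
scores = []
for _ in range(n):
    s = rng.choice(support)
    y = rng.gauss(0.0, s)
    scores.append(crps_gaussian(sigma_hat(s), y))
s_bar = fmean(scores)

print("integrated score:", s_tilde, "mean score:", s_bar)
```

The two numbers agree up to Monte Carlo error, which shows that the mean score used in practice implicitly depends on the distribution $\pi$ of the scales.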
Defining $\bar{s}$ mathematically
What affects the shape of $\pi$?
If $Y$ is a Gaussian process, $\sigma_i$ (and hence $\pi$) is basically determined by the distances between the locations $s$.
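The dependence of $\sigma_i$ on the locations can be made concrete with simple kriging, where the predictive standard deviation is a function of the covariance model and the locations only, not of the observed values. The sketch below is an illustration under assumptions that are not in the slides: a one-dimensional domain, a zero-mean process with exponential covariance (unit variance, range 0.2), and hypothetical locations.

```python
import math

def exp_cov(h, variance=1.0, range_=0.2):
    """Exponential covariance; the model and its parameters are assumptions."""
    return variance * math.exp(-abs(h) / range_)

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def kriging_sd(s0, locations):
    """Simple-kriging standard deviation at s0 given observations at `locations`."""
    cov = [[exp_cov(si - sj) for sj in locations] for si in locations]
    c0 = [exp_cov(s0 - si) for si in locations]
    weights = solve(cov, c0)
    var = exp_cov(0.0) - sum(w * c for w, c in zip(weights, c0))
    return math.sqrt(max(var, 0.0))  # guard against tiny negative rounding errors

# Hypothetical irregular locations: a cluster near 0 and one isolated point.
locations = [0.0, 0.05, 0.1, 0.6]
sd_near = kriging_sd(0.07, locations)  # target inside the cluster
sd_far = kriging_sd(0.35, locations)   # target far from every observation

print("sd near cluster:", sd_near, "sd in the gap:", sd_far)
```

The prediction inside the cluster has a much smaller standard deviation than the one in the gap, so an irregular design produces a spread of scales $\sigma_i$ even when the process itself is stationary.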