Scaling of scoring rules
Jonas Wallin, joint work with David Bolin (KAUST)
CIRM virtual conference, 2020-06-02

Forecast and observation classes
[Figure: three panels, (a) Forecast, (b) Observation, (c) Comparison; horizontal axes from 0 to 10]

Scoring functions apply to deterministic forecasts
The forecast $x$ is evaluated against the observation $y$ using scoring functions such as
negative Squared Error (SE): $S(x, y) = -(x - y)^2$
negative Absolute Error (AE): $S(x, y) = -|x - y|$

Bayes predictors should be used for probabilistic forecasts
For a probabilistic forecast $P$, decision theory tells us that if the scoring function $S$ is given, we should issue the Bayes predictor,
$$\hat{x} = \arg\max_x E_P[S(x, Y)],$$
as the point forecast, where the expectation is with respect to $P$. Because the scores here are negated errors, a higher value is better and the optimum is a maximum.
negative Squared Error (SE): $S(x, y) = -(x - y)^2$, $\hat{x} = \mathrm{mean}(P)$
negative Absolute Error (AE): $S(x, y) = -|x - y|$, $\hat{x} = \mathrm{median}(P)$
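
The link between scoring function and Bayes predictor can be checked numerically. The following is a minimal sketch, not taken from the talk: the forecast distribution P is assumed to be an exponential distribution represented by a random sample, and the names `expected_score`, `neg_se` and `neg_ae` are illustrative. A grid search over point forecasts should land near the sample mean for the negative squared error and near the sample median for the negative absolute error.

```python
import random
import statistics

random.seed(0)
# Hypothetical skewed forecast distribution P, represented by a sample.
sample = [random.expovariate(1.0) for _ in range(501)]

def expected_score(score, x):
    """Monte Carlo estimate of E_P[S(x, Y)] using the sample from P."""
    return sum(score(x, y) for y in sample) / len(sample)

def neg_se(x, y):
    """Negative squared error."""
    return -(x - y) ** 2

def neg_ae(x, y):
    """Negative absolute error."""
    return -abs(x - y)

# Candidate point forecasts on a grid over [0, 4] with step 0.01.
grid = [i * 0.01 for i in range(401)]
best_se = max(grid, key=lambda x: expected_score(neg_se, x))
best_ae = max(grid, key=lambda x: expected_score(neg_ae, x))

print("SE optimum", best_se, "sample mean  ", statistics.mean(sample))
print("AE optimum", best_ae, "sample median", statistics.median(sample))
```

The two optima differ because the assumed distribution is skewed; for a symmetric P they would coincide.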

The basic idea
Assume we have a prediction $p \in \mathcal{P}$ and an observation $o \in \mathcal{O}$, where we wish to measure the skill of the prediction by applying a function
$$s : \mathcal{P} \times \mathcal{O} \to \mathbb{R},$$
with a higher function value indicating better skill.
What are good theoretical properties for $s$?

General framework without any formulas...
Assume $Q$ is Nature's distribution of some event $y$ and denote our forecast for $y$ by $P$.
For forecast evaluation, we should use performance metrics that follow the principle: in the long run, we will obtain the optimal performance for $P = Q$.

Probabilistic forecasts should generally be evaluated using proper scoring rules
A consistent scoring function is a special case of a proper scoring rule for probabilistic forecasts.
Definition (Murphy and Winkler, 1968): If $\mathcal{F}$ denotes a class of probabilistic forecasts on $\mathbb{R}$, a proper scoring rule is any function $S : \mathcal{F} \times \mathbb{R} \to \mathbb{R}$ such that
$$S(Q, Q) := E_Q S(Q, Y) \ge E_Q S(P, Y) =: S(P, Q)$$
for all $P, Q \in \mathcal{F}$.
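
The defining inequality can be illustrated for one concrete case. The sketch below assumes Gaussian distributions, Q = N(0, σ²) and P = N(0, σ̂²), and uses the logarithmic score log f(y) as the scoring rule; both choices are assumptions made for illustration. In that case the expected score has the closed form E_Q[log f(Y)] = -½ log(2π σ̂²) - σ² / (2 σ̂²), and a grid search over σ̂ should return the true σ.

```python
import math

def expected_log_score(sigma_hat, sigma):
    """E_Q[log f(Y)] for forecast P = N(0, sigma_hat^2) and truth Q = N(0, sigma^2)."""
    return (-0.5 * math.log(2 * math.pi * sigma_hat**2)
            - sigma**2 / (2 * sigma_hat**2))

sigma = 1.0  # assumed true standard deviation
candidates = [0.5 + 0.01 * k for k in range(151)]  # sigma_hat in [0.5, 2.0]
best = max(candidates, key=lambda s: expected_log_score(s, sigma))

print("best sigma_hat on the grid:", best)
print("S(Q, Q) =", expected_log_score(sigma, sigma))
```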

The class of proper scoring rules is large
$S(P, y) = -(\mathrm{mean}(P) - y)^2$
$S(P, y) = -|\mathrm{median}(P) - y|$
Gneiting, T. and Raftery, A.E. (2007): Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102, 359-378.

Optimally, forecasts should be probabilistic
"All those whose duty it is to issue regular daily forecasts know that there are times when they feel very confident and other times when they are doubtful as to coming weather. It seems to me that the condition of confidence or otherwise forms a very important part of the prediction."
Cooke (Monthly Weather Review, 1906)
[Figure: three panels, (d) Forecast, (e) Observation, (f) Comparison; horizontal axes from 0 to 10]

The class of proper scoring rules is large
Perhaps the two most common proper scoring rules are the continuous ranked probability score (CRPS),
$$S(P, y) = -E_P|X - y| + \frac{1}{2} E_P E_P|X - X'|,$$
and the log score,
$$S(P, y) = \log f(y),$$
where $f$ is the density of $P$ and $X, X'$ are independent draws from $P$. Both are written so that a higher value is better.
Gneiting, T. and Raftery, A.E. (2007): Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102, 359-378.
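
For a Gaussian forecast both scores have closed forms, which makes them easy to evaluate. The sketch below assumes P = N(μ, σ²) and the higher-is-better orientation, so the log score is taken as log f(y); the function names are illustrative and not from the talk. The CRPS expression is the standard closed form for the normal distribution, negated.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def crps_gaussian(mu, sigma, y):
    """Positively oriented CRPS, -E|X - y| + 0.5 E|X - X'|, for P = N(mu, sigma^2)."""
    z = (y - mu) / sigma
    return -sigma * (z * (2 * norm_cdf(z) - 1)
                     + 2 * norm_pdf(z)
                     - 1 / math.sqrt(math.pi))

def log_score_gaussian(mu, sigma, y):
    """Log score log f(y) for P = N(mu, sigma^2)."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)

for y in (0.0, 1.0, 3.0):
    print(y, crps_gaussian(0.0, 1.0, y), log_score_gaussian(0.0, 1.0, y))
```

Far from the forecast mean the CRPS decreases roughly linearly in |y| while the log score decreases quadratically.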

The different scores behave somewhat differently
[Figure: score as a function of y for SE, AE, CRPS and IGN; y from -4 to 4, score from 0 to 4]

Average scores facilitate comparison across methods
Assume we have two forecasting methods $m = 1, 2$.
They issue forecasts $P_{mi}$ with observed values $y_i$, at a finite set of times, locations or instances $i = 1, \ldots, n$.
The methods are assessed and ranked by the mean score (our contribution starts here)
$$\bar{S}^m_n = \frac{1}{n} \sum_{i=1}^{n} S(P_{mi}, y_i) \quad \text{for } m = 1, 2.$$
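
The ranking by mean score can be sketched as follows. Everything in this example is assumed for illustration: the observations are simulated from N(0, 1), both methods issue Gaussian forecasts given as (mean, standard deviation) pairs, and the score is the log score log f(y).

```python
import math
import random

def log_score(mu, sigma, y):
    """Log score log f(y) of the Gaussian forecast N(mu, sigma^2)."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)

def mean_score(forecasts, observations):
    """Average of S(P_i, y_i) over all instances i."""
    scores = [log_score(mu, sigma, y)
              for (mu, sigma), y in zip(forecasts, observations)]
    return sum(scores) / len(scores)

random.seed(1)
n = 1000
ys = [random.gauss(0.0, 1.0) for _ in range(n)]  # simulated observations

method_1 = [(0.0, 1.0)] * n  # forecasts matching the simulated truth
method_2 = [(0.0, 3.0)] * n  # overdispersed forecasts

mean_1 = mean_score(method_1, ys)
mean_2 = mean_score(method_2, ys)
print("method 1:", mean_1)
print("method 2:", mean_2)
```

With a higher-is-better score, the method with the larger mean is ranked first.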

Average scores facilitate comparison across methods
[Figure: two panels, horizontal axes from -4 to 4]

Two observations, two models

Two observations, two models, result
        Model 1 CRPS    Model 2 CRPS
Y_1     0.0023          0.02346
Y_2     4.0486          3.920
mean    2.0255          1.9719

Other example
Consider a situation with two observations $Y_i \sim Q_{\theta_i} = N(0, \sigma_i^2)$, $i = 1, 2$, with $\sigma_1 = 0.1$ and $\sigma_2 = 1$.
Assume that we want to evaluate a model which has predictive distributions $P_i = N(0, \hat{\sigma}_i^2)$ for $Y_i$, using the average of a proper scoring rule for the two observations.

Other example
[Figure: two panels, CRPS and log(LS)]
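
One way to explore this example numerically is through closed-form expected scores. The sketch below is an illustration under stated assumptions, not a reproduction of the figure: the model is assumed to overestimate each standard deviation by the same hypothetical factor c = 1.1, and both scores are taken with the higher-is-better orientation. For X, X' ~ N(0, σ̂²) and Y ~ N(0, σ²), the expected CRPS is -sqrt(2(σ̂² + σ²)/π) + σ̂/sqrt(π), and the expected log score is -½ log(2π σ̂²) - σ²/(2σ̂²).

```python
import math

def expected_crps(sigma_hat, sigma):
    """E[-|X - Y| + 0.5 |X - X'|] with X, X' ~ N(0, sigma_hat^2), Y ~ N(0, sigma^2)."""
    return (-math.sqrt(2 * (sigma_hat**2 + sigma**2) / math.pi)
            + sigma_hat / math.sqrt(math.pi))

def expected_log_score(sigma_hat, sigma):
    """E[log f(Y)] with f the N(0, sigma_hat^2) density and Y ~ N(0, sigma^2)."""
    return (-0.5 * math.log(2 * math.pi * sigma_hat**2)
            - sigma**2 / (2 * sigma_hat**2))

c = 1.1  # hypothetical relative error in the predicted standard deviation
crps_loss = {}
log_loss = {}
for sigma in (0.1, 1.0):
    # Loss in expected score relative to the ideal forecast sigma_hat = sigma.
    crps_loss[sigma] = expected_crps(sigma, sigma) - expected_crps(c * sigma, sigma)
    log_loss[sigma] = (expected_log_score(sigma, sigma)
                       - expected_log_score(c * sigma, sigma))
    print(sigma, crps_loss[sigma], log_loss[sigma])
```

Under these assumptions the same relative error costs ten times more CRPS for the observation with σ = 1 than for the one with σ = 0.1, while the log score penalty is identical for both, so the two observations contribute very differently to an average CRPS.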

Varying scale in practice?
[Figure: map of observation locations, axes Longitude and Latitude]
Kuusela, M. and Stein, M.L. (2018): Locally stationary spatio-temporal interpolation of Argo profiling float data. Proceedings of the Royal Society A, 474.
Bolin, D. and Wallin, J. (2020): Multivariate type-G Matérn-SPDE random fields, JRSSB.

Example spatial statistics
We will now go through how model evaluation using a scoring rule is typically done in spatial statistics. We start with the basic setup.
Let $s_i$, $i = 1, \ldots, n$, be a set of, typically irregular, locations.
We have a set of observations $\{y_i\}_{i=1}^n$ at the locations $\{s_i\}_{i=1}^n$.
The score of the model, $P$, is given by $\bar{s} = \frac{1}{n} \sum_{i=1}^{n} S(P_i, y_i)$.

Realization
[Figure: realization at the observation locations in the unit square (axes x and y, legend Y)]

Variation of the standard deviation
[Figure, left: true kriging standard deviation, by location (axes x and y, legend sd)]
[Figure, right: empirical density of the true kriging standard deviation, $\sigma_i$ (horizontal axis sd)]

Mathematical framework
Definition: If $S$ is a proper scoring rule and $Q_\sigma$, $P_\sigma$ are probability measures with scaling $\sigma$, then
$$\tilde{S}(P_{\hat{\sigma}}, Q_\sigma, \pi) = \int S\left(P_{\hat{\sigma}(\sigma)}, Q_\sigma\right) \pi(d\sigma)$$
is a proper scoring rule.
The difference between this scoring rule and a regular scoring rule is that there is no $\bar{S}(P_{\hat{\sigma}}, y)$ function. It is a theoretical construction. However, if $\sigma_i \sim \pi$ and $Y_i \sim Q_{\sigma_i}$, then, as $n \to \infty$,
$$\frac{1}{n} \sum_{i=1}^{n} S(P_{\hat{\sigma}_i}, Y_i) \to \tilde{S}(P_{\hat{\sigma}}, Q_\sigma, \pi).$$
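
The convergence of the average score can be illustrated by simulation. The sketch below rests on several assumptions that are not in the slides: π is taken to be uniform on [0.1, 1], Q_σ = N(0, σ²), the forecast uses the true scale (σ̂(σ) = σ), and S is the positively oriented CRPS. In that case the expected CRPS for a given σ is -σ/sqrt(π), so the integral over π equals -E[σ]/sqrt(π) = -0.55/sqrt(π).

```python
import math
import random

def crps_gaussian(sigma, y):
    """Positively oriented CRPS of the forecast N(0, sigma^2) at observation y."""
    z = y / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return -sigma * (z * (2 * cdf - 1) + 2 * pdf - 1 / math.sqrt(math.pi))

random.seed(2)
n = 100_000
total = 0.0
for _ in range(n):
    sigma = random.uniform(0.1, 1.0)  # sigma_i ~ pi (hypothetical choice of pi)
    y = random.gauss(0.0, sigma)      # Y_i ~ Q_sigma_i = N(0, sigma_i^2)
    total += crps_gaussian(sigma, y)  # forecast with sigma_hat(sigma) = sigma

average = total / n
# Integral of -sigma / sqrt(pi) with respect to Uniform(0.1, 1).
limit = -0.55 / math.sqrt(math.pi)
print("average score:", average)
print("limit:        ", limit)
```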

Defining $\bar{s}$ mathematically
What affects the shape of $\pi$?
If $Y$ is a Gaussian process, $\sigma_i$ (and hence $\pi$) is basically determined by the distances between the locations, $s$.
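
The dependence of the predictive standard deviation on distance can be seen in a deliberately simplified setting. The sketch below assumes a zero-mean Gaussian process with exponential covariance C(d) = τ² exp(-d/ρ) and predicts at a location from a single neighbouring observation at distance d; the covariance model, the parameters `tau` and `rho`, and the one-neighbour restriction are all assumptions for illustration. The simple kriging variance is then C(0) - C(d)²/C(0), which involves the distance but not the observed value.

```python
import math

def exp_cov(d, tau=1.0, rho=0.2):
    """Exponential covariance C(d) = tau^2 * exp(-d / rho) (assumed model)."""
    return tau**2 * math.exp(-d / rho)

def kriging_sd_one_neighbour(d, tau=1.0, rho=0.2):
    """Simple kriging standard deviation given one observation at distance d."""
    variance = exp_cov(0.0, tau, rho) - exp_cov(d, tau, rho) ** 2 / exp_cov(0.0, tau, rho)
    return math.sqrt(max(variance, 0.0))

distances = [0.0, 0.01, 0.05, 0.1, 0.5]
sds = [kriging_sd_one_neighbour(d) for d in distances]
for d, sd in zip(distances, sds):
    print(d, sd)
```

With irregular locations the distances to neighbours vary, so the predictive standard deviations vary as well, which is what shapes π in this simplified picture.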
