Ensemble verification: Old scores, new perspectives


SLIDE 1

Ensemble verification Quantile score and decomposition Generalization to CRPS

Ensemble verification: Old scores, new perspectives

Sabrina Wahl, Petra Friederichs, Jan Keller

WMO Verification Workshop Berlin, May 2017

Sabrina Wahl Ensemble verification

SLIDE 2

ensemble forecast: equally probable simulations of numerical model
probabilistic forecast: pdf, cdf, mean, sd, quantiles, probabilities, …
translation, interpretation, post-processing

➡ proper scoring rules: CRPS, Brier score, quantile score, logarithmic score, MSE, MAE, …
➡ calibration: rank (PIT) histogram, beta score
➡ discrimination: generalized discrimination score
➡ sharpness: prediction interval

[Figure: Fig. 1 from Stephenson et al. (2005), relating model space and observation space over time]

SLIDE 3

[Figure: boxplots of the ensemble temperature forecast (°C) for days 1 to 31]

ensemble forecast in terms of empirical distribution
boxplot represents forecast distribution in terms of quantiles
evaluation of ensemble members as quantiles

SLIDE 4

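Verification framework for quantiles

Score for a quantile forecast $q_\tau$ when $y$ is the event that materializes, with $\tau \in (0, 1)$ the probability level:

$$S_Q(q_\tau, y) = \rho_\tau(y - q_\tau) = \begin{cases} |y - q_\tau|\,\tau & \text{if } y \ge q_\tau \\ |y - q_\tau|\,(1 - \tau) & \text{if } y < q_\tau \end{cases}$$

Empirical quantile score from a set of $N$ forecast-observation pairs:

$$\mathrm{QS}(\tau) = \frac{1}{N} \sum_{i=1}^{N} \rho_\tau(y_i - q_{\tau,i})$$

Decomposition of the quantile score (Bentzien and Friederichs, 2014):

$$\mathrm{QS}(\tau) = \frac{1}{N} \sum_{i=1}^{N} \rho_\tau(y_i - q_{\tau,i}) = \mathrm{UNC}(\tau) - \mathrm{RES}(\tau) + \mathrm{REL}(\tau)$$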
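The check function and the empirical quantile score can be sketched in a few lines of Python. This is only an illustration of the two formulas on this slide: the function names `pinball_loss` and `quantile_score` and the toy numbers are assumptions, not taken from the talk.

```python
from typing import Sequence


def pinball_loss(u: float, tau: float) -> float:
    """Check function rho_tau(u) for u = y - q_tau and 0 < tau < 1."""
    return abs(u) * tau if u >= 0 else abs(u) * (1.0 - tau)


def quantile_score(obs: Sequence[float], fcst: Sequence[float], tau: float) -> float:
    """Empirical quantile score QS(tau): mean check loss over N pairs."""
    if len(obs) != len(fcst) or not obs:
        raise ValueError("obs and fcst must be non-empty and equally long")
    return sum(pinball_loss(y - q, tau) for y, q in zip(obs, fcst)) / len(obs)


# Made-up toy data: three observations with their 0.9-quantile forecasts.
obs = [2.0, 5.0, 1.0]
fcst = [3.0, 4.0, 1.0]
qs = quantile_score(obs, fcst, tau=0.9)
print(f"QS(0.9) = {qs:.4f}")
```

For a high probability level such as τ = 0.9, an observation above the forecast quantile is penalized nine times more strongly than one the same distance below it.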

SLIDE 5

Calibration: quantile reliability diagram

forecast intervals $I_k$
$\bar{y}_\tau^{(k)}$: conditional observed quantile in $I_k$
discrete values $\bar{y}_\tau^{(k)}$, $q_\tau^{(k)}$ with $k = 1, ..., K \le N$

[Figure: quantile reliability diagram, observation against quantile forecast, axis ticks from 0.05 to 50]


SLIDE 7

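Reliability, perfect if $\bar{y}_\tau^{(k)} = q_\tau^{(k)}$:

$$\mathrm{REL} = \frac{1}{N} \sum_{k=1}^{K} \sum_{n \in I_k} \left[ \rho_\tau\left(y_n - q_\tau^{(k)}\right) - \rho_\tau\left(y_n - \bar{y}_\tau^{(k)}\right) \right]$$

Resolution, good if $\bar{y}_\tau^{(k)} \ne \bar{y}_\tau$ (RES vanishes when every conditional observed quantile equals the climatological one):

$$\mathrm{RES} = \frac{1}{N} \sum_{k=1}^{K} \sum_{n \in I_k} \left[ \rho_\tau\left(y_n - \bar{y}_\tau\right) - \rho_\tau\left(y_n - \bar{y}_\tau^{(k)}\right) \right]$$

Uncertainty, from sample climatology $\bar{y}_\tau$:

$$\mathrm{UNC} = \frac{1}{N} \sum_{n=1}^{N} \rho_\tau(y_n - \bar{y}_\tau)$$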
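The three terms can be computed as in the following minimal sketch. It rests on several assumptions that are not stated on the slides: forecasts are sorted into equal-count bins $I_k$, the bin mean serves as the discrete forecast value $q_\tau^{(k)}$, and the sample quantiles are taken as the sample value that minimizes the summed check loss. The function names and the synthetic data are hypothetical. With these choices, QS for the discretized forecasts equals UNC − RES + REL up to rounding.

```python
import random
from typing import List, Sequence, Tuple


def rho(u: float, tau: float) -> float:
    """Check function rho_tau(u) with u = y - q."""
    return u * tau if u >= 0 else -u * (1.0 - tau)


def sample_quantile(ys: Sequence[float], tau: float) -> float:
    """Sample value minimizing the summed check loss (a tau-quantile of ys)."""
    return min(ys, key=lambda c: sum(rho(y - c, tau) for y in ys))


def decompose(obs: Sequence[float], fcst: Sequence[float], tau: float,
              n_bins: int) -> Tuple[float, float, float, float]:
    """Return (QS, UNC, RES, REL) for forecasts discretized into n_bins bins."""
    n = len(obs)
    pairs = sorted(zip(fcst, obs))  # order the pairs by forecast value
    # Assumption: equal-count forecast intervals I_k.
    bins: List[list] = [pairs[k * n // n_bins:(k + 1) * n // n_bins]
                        for k in range(n_bins)]
    y_clim = sample_quantile(obs, tau)  # climatological quantile
    qs = res = rel = 0.0
    for members in bins:
        if not members:
            continue
        ys = [y for _, y in members]
        q_k = sum(q for q, _ in members) / len(members)  # discrete forecast value
        y_k = sample_quantile(ys, tau)  # conditional observed quantile
        for y in ys:
            qs += rho(y - q_k, tau)
            rel += rho(y - q_k, tau) - rho(y - y_k, tau)
            res += rho(y - y_clim, tau) - rho(y - y_k, tau)
    unc = sum(rho(y - y_clim, tau) for y in obs) / n
    return qs / n, unc, res / n, rel / n


# Synthetic data for illustration only: a shifted signal as 0.8-quantile forecast.
rng = random.Random(0)
signal = [rng.uniform(0.0, 10.0) for _ in range(200)]
obs = [s + rng.gauss(0.0, 1.0) for s in signal]
fcst = [s + 1.0 for s in signal]

qs, unc, res, rel = decompose(obs, fcst, tau=0.8, n_bins=10)
print(f"QS={qs:.4f}  UNC={unc:.4f}  RES={res:.4f}  REL={rel:.4f}")
print(f"UNC - RES + REL = {unc - res + rel:.4f}")
```

Because the conditional quantile of each bin minimizes the check loss within that bin, both REL and RES are non-negative in this sketch.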

SLIDE 8

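Score for multiple quantiles $q_{\tau_1}, ..., q_{\tau_k}$ with $\tau_1, ..., \tau_k \in (0, 1)$:

$$S_Q(q_{\tau_1}, ..., q_{\tau_k}, y) = \sum_{i=1}^{k} \rho_{\tau_i}(y - q_{\tau_i})$$

Interpret ensemble members $e^{(1)} \le e^{(2)} \le ... \le e^{(M)}$ as quantiles to the probability levels $\tau_1, ..., \tau_M \in (0, 1)$:

$$\mathrm{QS}_{\mathrm{ENS}} = \sum_{j=1}^{M} \mathrm{QS}(\tau_j) = \sum_{j=1}^{M} \left[ \frac{1}{N} \sum_{i=1}^{N} \rho_{\tau_j}\left(y_i - e_i^{(j)}\right) \right]$$

Quantile score decomposition for ensemble:

$$\mathrm{QS}_{\mathrm{ENS}} = \sum_{j=1}^{M} \mathrm{UNC}(\tau_j) - \sum_{j=1}^{M} \mathrm{RES}(\tau_j) + \sum_{j=1}^{M} \mathrm{REL}(\tau_j)$$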
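A small sketch of the ensemble quantile score follows. Each forecast's members are sorted, and the j-th sorted member is scored as the $\tau_j$-quantile. The function name `ensemble_quantile_scores`, the toy data and the choice $\tau_j = (j - 0.5)/M$ are assumptions made for this illustration.

```python
from typing import List, Sequence


def rho(u: float, tau: float) -> float:
    """Check function rho_tau(u) with u = y - q."""
    return u * tau if u >= 0 else -u * (1.0 - tau)


def ensemble_quantile_scores(obs: Sequence[float],
                             ens: Sequence[Sequence[float]],
                             taus: Sequence[float]) -> List[float]:
    """Return QS(tau_j) for j = 1..M; the sum of the list is QS_ENS."""
    n = len(obs)
    scores = [0.0] * len(taus)
    for y, members in zip(obs, ens):
        if len(members) != len(taus):
            raise ValueError("each forecast needs one member per probability level")
        # Sorting turns the raw members into order statistics e(1) <= ... <= e(M).
        for j, e in enumerate(sorted(members)):
            scores[j] += rho(y - e, taus[j]) / n
    return scores


# Toy example: N = 2 observations, M = 2 members, assumed levels (j - 0.5) / M.
obs = [1.0, 2.0]
ens = [[0.0, 2.0], [3.0, 1.0]]
taus = [(j - 0.5) / 2 for j in (1, 2)]

per_level = ensemble_quantile_scores(obs, ens, taus)
qs_ens = sum(per_level)
print(per_level, qs_ens)
```

Keeping the per-level scores, rather than only their sum, is what allows the attributes to be explored as functions of τ.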

SLIDE 9

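$$\mathrm{QS}_{\mathrm{ENS}} = \sum_{j=1}^{M} \mathrm{UNC}(\tau_j) - \sum_{j=1}^{M} \mathrm{RES}(\tau_j) + \sum_{j=1}^{M} \mathrm{REL}(\tau_j)$$

quantile reliability curves for each $\tau_j$
graphical exploration of UNC(τ), RES(τ), REL(τ) for $\tau = (\tau_1, ..., \tau_M)$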

Example: COSMO-DE-EPS 12-hourly precipitation forecasts for 365 days in 2011. Number of observations N = 384 679 (from 1079 observing sites). Number of ensemble members M = 20.

SLIDE 10

quantile reliability curves should be close to diagonal
"spread" around the diagonal indicates insufficient ensemble spread
underestimation of higher quantiles
overestimation of lower quantiles

[Figure: quantile reliability curves, conditional observed quantile against quantile forecast, axis ticks from 0.1 to 50, one curve per τ = 0.03, 0.08, ..., 0.97]

SLIDE 11

[Figure: three panels as functions of τ: QS and UNC; RES and UNC; REL]

graphical exploration of UNC(τ), RES(τ), REL(τ)

optimal score:
QS = 0
REL = 0
RES = UNC

SLIDE 12

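quantile score decomposition

$$\mathrm{QS} = \mathrm{UNC} - \mathrm{RES} + \mathrm{REL} \qquad (1)$$

uncertainty is independent of forecasts, divide eq. (1) by UNC

$$\mathrm{QSS} = 1 - \frac{\mathrm{QS}}{\mathrm{UNC}} = \frac{\mathrm{RES}}{\mathrm{UNC}} - \frac{\mathrm{REL}}{\mathrm{UNC}} \qquad (2)$$

optimal values
QSS = 1: maximum improvement over climatology
RES/UNC = 1: maximum achievable resolution
REL/UNC = 0: perfect calibration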
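As a worked example of eq. (2), the sketch below scales a decomposition by UNC. The function name `skill_terms` and the three input values are invented for illustration and do not come from the COSMO-DE-EPS example.

```python
from typing import Tuple


def skill_terms(unc: float, res: float, rel: float) -> Tuple[float, float, float]:
    """Return (QSS, RES/UNC, REL/UNC) from the decomposition QS = UNC - RES + REL."""
    if unc <= 0.0:
        raise ValueError("UNC must be positive to scale the decomposition")
    qs = unc - res + rel          # eq. (1)
    qss = 1.0 - qs / unc          # eq. (2), left-hand form
    return qss, res / unc, rel / unc


# Made-up numbers: UNC = 0.5, RES = 0.3, REL = 0.05, so QS = 0.25.
qss, res_scaled, rel_scaled = skill_terms(unc=0.5, res=0.3, rel=0.05)
print(f"QSS = {qss:.2f}, RES/UNC = {res_scaled:.2f}, REL/UNC = {rel_scaled:.2f}")
```

Here QSS = 0.6 − 0.1 = 0.5, which shows how a reliability deficit directly subtracts from the skill that the resolution term would otherwise provide.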

SLIDE 13

plot scaled resolution vs. scaled reliability
contours show lines of constant quantile skill score
combine three forecast attributes in one diagram
compare different quantiles and/or forecast models

[Figure: res/unc (optimum: 1) and rel/unc (optimum: 0) with contours of constant quantile skill score, one point per τ = 0.03, 0.08, ..., 0.97]

SLIDE 14

[Figure: three panels. CRPS: cdf F(t) with the difference F(t) − H(t − y). BS: exceedance probability 1 − F(t), with F(t) for y > t and 1 − F(t) for y < t. QS: quantile function F⁻¹(τ), with (1 − τ)(qτ − y) for y < qτ and τ(y − qτ) for y > qτ.]

$$S_{\mathrm{CRP}} = \int_{\mathbb{R}} S_B(1 - F(u), y)\, du = 2 \int_0^1 S_Q(F^{-1}(\tau), y)\, d\tau$$

see e.g. Gneiting and Raftery (2007)

SLIDE 15

let $e^{(1)} \le e^{(2)} \le ... \le e^{(M)}$ be an ensemble forecast for $Y$

cumulative distribution function from ensemble

$$F_e(x) = \sum_{i=1}^{M} w_i\, H(x - e^{(i)})$$

weights $w_i > 0$ and $\sum_{i=1}^{M} w_i = 1$
$F_e$ features exactly $M$ jumps at the points $x = e^{(i)}$ with jump height $w_i$

[Figure: Fig. 1 from Broecker (2012), step function F with jumps of height w1, w2, w3 at e1, e2, e3]
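The step-function cdf can be sketched as a closure over the ensemble members. The name `ensemble_cdf`, the default of equal weights $w_i = 1/M$ and the convention $H(0) = 1$ (a right-continuous cdf) are assumptions of this illustration.

```python
from typing import Callable, Optional, Sequence


def ensemble_cdf(members: Sequence[float],
                 weights: Optional[Sequence[float]] = None) -> Callable[[float], float]:
    """Return F_e(x) = sum_i w_i * H(x - e(i)), using H(u) = 1 for u >= 0."""
    m = len(members)
    if weights is None:
        weights = [1.0 / m] * m  # assumed default: equally probable members
    if len(weights) != m or any(w <= 0 for w in weights):
        raise ValueError("need one positive weight per member")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    pairs = list(zip(members, weights))

    def cdf(x: float) -> float:
        # Each member e(i) <= x contributes its jump height w_i.
        return sum(w for e, w in pairs if e <= x)

    return cdf


# Toy ensemble with three members; the order of the input does not matter.
F = ensemble_cdf([3.0, 1.0, 2.0])
print(F(0.5), F(1.0), F(2.5), F(3.0))
```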

SLIDE 16

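score for distribution $F_e$

$$S_{\mathrm{CRP}}(F_e, y) = \int_{\mathbb{R}} \left[ F_e(x) - H(x - y) \right]^2 dx$$

is equivalent to sum of weighted quantile scores (Broecker, 2012)

$$S_{\mathrm{CRP}}(F_e, y) = 2 \sum_{i=1}^{M} w_i\, \rho_{\tau_i}(y - e^{(i)})$$

with decomposition

$$S_{\mathrm{CRP}}(F_e, y) = 2 \sum_{i=1}^{M} w_i\, \mathrm{UNC}(\tau_i) - 2 \sum_{i=1}^{M} w_i\, \mathrm{RES}(\tau_i) + 2 \sum_{i=1}^{M} w_i\, \mathrm{REL}(\tau_i)$$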
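The equivalence can be checked numerically for a single forecast. The sketch integrates the squared difference exactly, since the integrand is piecewise constant, and compares it with the weighted sum of check losses. The probability level is taken as the midpoint of the i-th jump, $\tau_i = \sum_{j<i} w_j + w_i/2$; the slides do not state this choice, so treat it, the function names and the toy numbers as assumptions.

```python
from typing import Sequence


def rho(u: float, tau: float) -> float:
    """Check function rho_tau(u) with u = y - q."""
    return u * tau if u >= 0 else -u * (1.0 - tau)


def crps_integral(members: Sequence[float], weights: Sequence[float], y: float) -> float:
    """Integrate [F_e(x) - H(x - y)]^2 over the real line, interval by interval."""
    pairs = list(zip(members, weights))
    xs = sorted(set(members) | {y})  # breakpoints of the piecewise-constant integrand
    total = 0.0
    for a, b in zip(xs, xs[1:]):
        f = sum(w for e, w in pairs if e <= a)  # F_e on [a, b)
        h = 1.0 if a >= y else 0.0              # H(x - y) on [a, b)
        total += (f - h) ** 2 * (b - a)
    return total  # the integrand is zero outside [xs[0], xs[-1]]


def crps_quantile_sum(members: Sequence[float], weights: Sequence[float], y: float) -> float:
    """Weighted sum 2 * sum_i w_i * rho_{tau_i}(y - e(i)) over sorted members."""
    total, cum = 0.0, 0.0
    for e, w in sorted(zip(members, weights)):
        tau = cum + 0.5 * w  # assumed level: midpoint of the i-th jump
        total += 2.0 * w * rho(y - e, tau)
        cum += w
    return total


# Toy forecast with unequal weights and one observation.
members, weights, y = [1.0, 2.0, 4.0], [0.2, 0.5, 0.3], 2.5
crps_a = crps_integral(members, weights, y)
crps_b = crps_quantile_sum(members, weights, y)
print(crps_a, crps_b)
```

For equal weights $w_i = 1/M$ the assumed midpoints reduce to $\tau_i = (i - 0.5)/M$.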

SLIDE 17

contours show lines of constant CRPS skill score
scaled resolution and reliability: sum over all τ
compare different forecast models and/or lead times

[Figure: res/unc (optimum: 1) and rel/unc (optimum: 0) with contours of constant CRPS skill score; points labelled 1 to 9 and X for each of ncep, eccc and ecmwf]

Example: Global EPS daily 12 UTC 500 hPa geopotential forecasts for 30 days in 2012 (JJA). Number of gridboxes: 720 × 361 (observations: ERA Interim). Number of ensemble members: 20 to 50.

SLIDE 18

Summary

$$S_{\mathrm{CRP}}(F_e, y) = 2 \sum_{i=1}^{M} w_i\, \mathrm{UNC}(\tau_i) - 2 \sum_{i=1}^{M} w_i\, \mathrm{RES}(\tau_i) + 2 \sum_{i=1}^{M} w_i\, \mathrm{REL}(\tau_i)$$

Ensemble verification using quantiles can have different levels of complexity
Representation of CRPS as weighted sum over quantile scores

1 CRPS single value (compare different models, lead times, ...)
2 CRPS attributes: skill, resolution and reliability as function of τ
3 quantile reliability curves

Application to empirical distribution as well as to parametric distribution derived from statistical postprocessing

SLIDE 19

1 Bentzien and Friederichs, "Decomposition and graphical portrayal of the quantile score", Quarterly Journal of the Royal Meteorological Society, vol. 140, pp. 1924–1934, 2014.
2 Broecker, "Evaluating raw ensembles with the continuous ranked probability score", Quarterly Journal of the Royal Meteorological Society, vol. 138, pp. 1611–1617, 2012.
3 Gneiting and Raftery, "Strictly proper scoring rules, prediction, and estimation", Journal of the American Statistical Association, vol. 102, pp. 359–378, 2007.
4 Hyndman and Fan, "Sample quantiles in statistical packages", The American Statistician, vol. 50, pp. 361–365, 1996.
5 Stephenson et al., "Forecast assimilation: a unified framework for the combination of multi-model weather and climate predictions", Tellus, vol. 57, pp. 253–264, 2005.

SLIDE 20

Hyndman and Fan (1996): Sample quantiles in statistical packages

Definition 4: $\tau_j = \frac{j}{M}$
Definition 5: $\tau_j = \frac{j - 0.5}{M}$
Definition 6: $\tau_j = \frac{j}{M + 1}$
Definition 7: $\tau_j = \frac{j - 1}{M - 1}$
Definition 8: $\tau_j = \frac{j - 1/3}{M + 1/3}$
Definition 9: $\tau_j = \frac{j - 3/8}{M + 1/4}$

for $j = 1, ..., M$ (number of ensemble members)
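The six definitions share the form $\tau_j = (j - a)/(M + b)$, which the sketch below exploits. The function name `probability_levels` and the lookup table are illustrative assumptions. For M = 20, Definition 5 starts at 0.025 and Definition 8 at about 0.033, which appears to match, up to rounding, the τ values listed in the figure legends of this talk.

```python
from typing import Dict, List, Tuple

# (a, b) such that tau_j = (j - a) / (M + b) for Hyndman and Fan definitions 4 to 9.
_OFFSETS: Dict[int, Tuple[float, float]] = {
    4: (0.0, 0.0),
    5: (0.5, 0.0),
    6: (0.0, 1.0),
    7: (1.0, -1.0),
    8: (1.0 / 3.0, 1.0 / 3.0),
    9: (3.0 / 8.0, 1.0 / 4.0),
}


def probability_levels(m: int, definition: int = 5) -> List[float]:
    """Probability levels tau_1..tau_M assigned to the M sorted ensemble members."""
    if definition not in _OFFSETS:
        raise ValueError("definition must be one of 4, 5, 6, 7, 8, 9")
    a, b = _OFFSETS[definition]
    if m + b <= 0:
        raise ValueError("ensemble too small for this definition")
    return [(j - a) / (m + b) for j in range(1, m + 1)]


levels_5 = probability_levels(20, definition=5)
levels_8 = probability_levels(20, definition=8)
print([round(t, 3) for t in levels_5[:3]], round(levels_5[-1], 3))
print([round(t, 3) for t in levels_8[:3]], round(levels_8[-1], 3))
```

Definition 4 assigns τ = 1 to the largest member and Definition 7 assigns τ = 0 and τ = 1 to the extremes, so those two produce levels outside the open interval (0, 1) required by the quantile score.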

SLIDE 21

[Figure: two panels. Left: res/unc (optimum: 1) and rel/unc (optimum: 0) with contours of constant skill score, one point per τ = 0.03, 0.08, ..., 0.97. Right: quantile reliability curves, conditional observed quantile against quantile forecast, axes from −10 to 30, one curve per τ = 0.033, 0.082, ..., 0.967.]

Example: COSMO-DE-EPS daily 12 UTC temperature forecasts for 365 days in 2011. Number of observations N = 174 603 (from 481 observing sites). Number of ensemble members M = 20.
