Verification of forecasts of continuous variables
Manfred Dorninger, University of Vienna, Vienna, Austria (manfred.dorninger@univie.ac.at)
7th Verification Tutorial Course, Berlin, 3-6 May 2017
Thanks to: B. Brown, M. Göber, B. Casati
Types of forecasts and observations:
– Continuous variables, e.g., temperature, rainfall amount, humidity, wind speed
– Categorical variables:
  – Dichotomous (e.g., rain vs. no rain, freezing vs. no freezing)
  – Multi-category (e.g., cloud amount, precipitation type)
  – May result from subsetting continuous variables into categories
Why verify? We want to “verify” something: i.e., was the forecast right or wrong? And we want to know “how” the forecasts were wrong.
Scatter-plot: plot of forecast values against observed values. For a perfect forecast (FCST = OBS) the points lie on the 45° diagonal. Provides information on: bias, outliers, error magnitude, linear association, peculiar behaviour in the extremes, misses and false alarms (link to the contingency table). A regression line can be added to summarize the relationship.
Q: How will the scatter plot and regression line change for longer forecast lead times?
Q: How would you interpret a horizontal regression line?
[Figure: scatter plots of FC vs. OBS for 24 h and 72 h forecasts; a horizontal regression line means no correlation and hence no skill.]
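As an illustration, a minimal Python sketch (numpy/matplotlib; the obs and fcst arrays are synthetic data invented purely for demonstration) that draws such a scatter plot with the 45° diagonal and a least-squares regression line:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    obs = rng.normal(15.0, 5.0, 500)                     # synthetic observations
    fcst = 0.8 * obs + 3.0 + rng.normal(0.0, 2.0, 500)   # synthetic, biased forecasts

    slope, intercept = np.polyfit(obs, fcst, 1)          # least-squares regression
    grid = np.linspace(obs.min(), obs.max(), 2)

    plt.scatter(obs, fcst, s=8, alpha=0.5)
    plt.plot(grid, grid, "k--", label="45 deg diagonal (perfect)")
    plt.plot(grid, slope * grid + intercept, "r-", label="regression line")
    plt.xlabel("OBS")
    plt.ylabel("FCST")
    plt.legend()
    plt.show()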
Quantile-quantile plot: each OBS quantile is plotted against the corresponding FCST quantile (e.g., q0.75 against q0.75). Perfect: FCST = OBS, points lie on the 45° diagonal.
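A q-q plot compares the two marginal distributions, so no pairing of forecasts with observations is required. A minimal sketch under that assumption (synthetic data, with means and spreads loosely echoing the summary table below):

    import numpy as np

    rng = np.random.default_rng(0)
    obs = rng.normal(20.7, 5.2, 500)     # synthetic observations
    fcst = rng.normal(18.6, 6.0, 500)    # synthetic forecasts (no pairing needed)

    probs = np.arange(0.01, 1.00, 0.01)  # probability levels
    q_obs = np.quantile(obs, probs)      # OBS quantiles
    q_fcst = np.quantile(fcst, probs)    # corresponding FCST quantiles
    # Plotting q_fcst against q_obs: points on the 45 deg diagonal mean the
    # two marginal distributions agree.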
Q: How many forecasts exhibit an error larger than 10 degrees? How many larger than 5 degrees? Is the forecast error due mainly to under-forecasting or over-forecasting?
Q: Does the forecast correctly detect temperatures above 18 degrees? Temperatures below 10 degrees?
Q: How does the forecast handle temperatures above 10 degrees? Temperatures below -20 degrees?
Q: Does the forecast miss events or produce false alarms for cold events?
Q: How does the distribution of forecast temperature compare with the distribution of observed temperature?
Visual comparison: histograms, box-plots, … Summary statistics:

            MEAN    MEDIAN   STDEV   IQR
    FCST    18.62   17.00    5.99    9.75
    OBS     20.71   20.25    5.18    8.52
Mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i$    Median: $q_{0.5}$

St. dev.: $s_X = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{X}\right)^2}$    Inter-Quartile Range: $IQR = q_{0.75} - q_{0.25}$
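A small Python sketch of these four statistics (numpy); calling summary(fcst) and summary(obs) on paired samples would reproduce a table like the one above:

    import numpy as np

    def summary(x):
        """Mean, median, standard deviation and inter-quartile range of x."""
        q25, q50, q75 = np.quantile(x, [0.25, 0.50, 0.75])
        return {"MEAN": x.mean(), "MEDIAN": q50,
                "STDEV": x.std(), "IQR": q75 - q25}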
Conditional histogram and conditional box-plot
Histogram of forecast temperatures given an observed temperature of -3 °C and of -7 °C; 11 Atlantic-region stations for the period 1/86 to 3/86, sample size 701 cases. (Stanski et al., 1989)
Q: Look at the figure: what can you say about the forecast system?
[Figure: two schematic conditional distributions of forecast temperature (Frequency vs. Temp). Overlapping distributions: the forecast cannot discriminate between the observed situations; well-separated distributions: the forecast can discriminate.]
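A sketch of how such a conditional sample can be built; rounding the observations to whole degrees is an assumption made purely for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    obs = rng.normal(-3.0, 4.0, 1000)         # synthetic observations
    fcst = obs + rng.normal(0.0, 2.0, 1000)   # synthetic forecasts

    sel = np.round(obs) == -3.0    # occasions with an observed temp of -3 deg
    fc_given_obs = fcst[sel]       # forecasts issued on exactly those occasions
    # A histogram of fc_given_obs that is narrow and centred near -3 deg means
    # the forecast can discriminate this situation; a broad one means it cannot.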
Mean Error (bias): measures the average difference between the forecast and observed means. A positive bias indicates over-forecasting, a negative bias indicates under-forecasting (→ bias correction). It says nothing about the magnitude of the error, since positive and negative errors can – and hopefully do – cancel out.

$ME = \frac{1}{n}\sum_{i=1}^{n}\left(f_i - x_i\right) = \bar{f} - \bar{x}$

f = forecast; x = observation
[Figure: monthly mean bias of the MSLP field (LM-VERA) in hPa over the eastern Alps; the heat low and the cold high are both too weak. (Gorgas, 2006)]
Mean Absolute Error: measures the average magnitude of the error; it does not indicate the direction of the error.

$MAE = \frac{1}{n}\sum_{i=1}^{n}\left| f_i - x_i \right|$

Mean Squared Error:

$MSE = \frac{1}{n}\sum_{i=1}^{n}\left( f_i - x_i \right)^2$

Average of the squares of the errors: it measures the magnitude of the error, weighted by the squares of the errors; it does not indicate the direction of the error. Quadratic rule, therefore large weight on large errors: good if you wish to penalize large errors. Sensitive to large errors (e.g., precipitation) and to outliers; sensitive to large variance (high-resolution models); encourages conservative forecasts (e.g., climatology).
Attribute: measures accuracy
$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left( f_i - x_i \right)^2}$

RMSE is the square root of the MSE: it measures the magnitude of the error while retaining the unit of the variable (e.g., °C). Similar properties to the MSE: it does not indicate the direction of the error and is defined with a quadratic rule, hence sensitive to large values, etc. NOTE: the RMSE is always greater than or equal to the MAE.
Q: If I verify two sets of data and in one I find RMSE ≫ MAE while in the other I find RMSE ≳ MAE, which set is more likely to contain large outliers? Which set has the larger variance?
Attribute: measures accuracy
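The four scores defined so far in one sketch (numpy; f and x are forecast and observation arrays of equal length):

    import numpy as np

    def error_scores(f, x):
        """ME (bias), MAE, MSE and RMSE for forecasts f and observations x."""
        e = f - x
        return {"ME": e.mean(),                    # sign: over-/under-forecasting
                "MAE": np.abs(e).mean(),           # linear penalty on errors
                "MSE": (e ** 2).mean(),            # quadratic penalty on errors
                "RMSE": np.sqrt((e ** 2).mean())}  # back in the variable's units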
$r_{XY} = \frac{\frac{1}{n}\sum_{i=1}^{n}\left( y_i - \bar{y} \right)\left( x_i - \bar{x} \right)}{s_Y\, s_X} = \frac{\mathrm{cov}(Y, X)}{s_Y\, s_X}$

Measures the linear association between forecast and observation. It is the rescaled (non-dimensional) covariance of Y and X and ranges in [-1, 1]. It is not sensitive to the bias. The correlation coefficient alone does not provide information on the inclination of the regression line (it says only whether the line is positively or negatively tilted); the observation and forecast variances are needed: the slope coefficient of the regression line is given by $b = (s_X/s_Y)\, r_{XY}$. Not robust: better if the data are normally distributed. Not resistant: sensitive to large values and outliers.
Attribute: measures association
What is wrong with the correlation coefficient as a measure of performance? It does not take biases and amplitude errors into account and can therefore inflate the performance estimate; it is more appropriate as a measure of “potential” performance.
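A quick numerical illustration of this weakness: adding a constant bias or damping the forecast amplitude leaves the correlation unchanged (synthetic data for demonstration):

    import numpy as np

    rng = np.random.default_rng(0)
    obs = rng.normal(15.0, 5.0, 500)
    fcst = 0.8 * obs + rng.normal(0.0, 2.0, 500)

    r_raw = np.corrcoef(fcst, obs)[0, 1]
    r_biased = np.corrcoef(fcst + 10.0, obs)[0, 1]   # add a 10-degree bias
    r_damped = np.corrcoef(0.1 * fcst, obs)[0, 1]    # damp the amplitude
    # All three correlations are identical, although the biased and damped
    # forecasts are clearly worse: r measures "potential" performance only.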
The MSE can be decomposed into bias, variance and covariance terms:

$MSE = \left(\bar{f} - \bar{x}\right)^2 + s_f^2 + s_x^2 - 2\, s_f\, s_x\, r_{fx} = bias^2 + s_f^2 + s_x^2 - 2\, \mathrm{cov}(f, x)$

For a given correlation $r_{fx}$, the MSE is minimized when $s_f = r_{fx}\, s_x \le s_x$. Consequence: smooth forecasts verify better!

The bias can be subtracted! This yields the bias-corrected (R)MSE, BC_(R)MSE:

$BC\_MSE = MSE - bias^2$
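The decomposition is easy to verify numerically; a sketch with synthetic data (note that the identity holds exactly only with the population standard deviation, numpy's default ddof=0):

    import numpy as np

    rng = np.random.default_rng(0)
    obs = rng.normal(15.0, 5.0, 500)
    fcst = 0.8 * obs + 3.0 + rng.normal(0.0, 2.0, 500)

    bias = fcst.mean() - obs.mean()
    sf, sx = fcst.std(), obs.std()       # population st. dev. (ddof=0)
    r = np.corrcoef(fcst, obs)[0, 1]

    mse = ((fcst - obs) ** 2).mean()
    mse_decomposed = bias**2 + sf**2 + sx**2 - 2.0 * sf * sx * r
    bc_mse = mse - bias**2               # bias-corrected MSE
    # mse and mse_decomposed agree to rounding error.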
Reynolds averaging: split each quantity into its mean and the deviation from the mean, $X = \bar{X} + X'$. Applied to the error, $f - x = (\bar{f} - \bar{x}) + (f' - x')$, so that

$MSE = \left(\bar{f} - \bar{x}\right)^2 + \overline{\left(f' - x'\right)^2}$

and the bias-corrected RMSE becomes

$BC\_RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[\left(f_i - \bar{f}\right) - \left(x_i - \bar{x}\right)\right]^2}$
[Figure: geometric interpretation of the bias-corrected RMSE, $BC\_RMSE^2 = s_f^2 + s_x^2 - 2\, s_f\, s_x\, r$, i.e. the law of cosines with $r = \cos\varphi$; the observation serves as reference. (Gorgas, 2006)]
Skill: How much more accurate are the predictions than climatology? How much more accurate are they than the model's temperature predictions?

Skill score, where M is the verification measure for the forecasts, M_ref is the measure for the reference forecasts, and M_perf is the measure for perfect forecasts (= 0):

$SS = \frac{M - M_{ref}}{M_{perf} - M_{ref}}$
MSE skill score:

$SS_{MSE} = \frac{MSE - MSE_{ref}}{MSE_{perf} - MSE_{ref}} = 1 - \frac{MSE}{MSE_{ref}}$

Same definition and properties as the MAE skill score: it measures accuracy with respect to a reference forecast; positive values = skill, negative values = no skill. Sensitive to sample size (for stability) and to the sample climatology (e.g., extremes): needs large samples.

Reduction of Variance: the MSE skill score with respect to climatology. With climatology as reference, $Y = \bar{X}$ and $MSE_{cli} = s_X^2$, so

$RV = 1 - \frac{MSE}{s_X^2}$

If the sample climatology is considered:

$RV = r_{XY}^2 - \left[ r_{XY} - \frac{s_Y}{s_X} \right]^2 - \left[ \frac{\bar{Y} - \bar{X}}{s_X} \right]^2$

The three terms measure the linear correlation, the reliability (regression line slope coefficient $b = (s_X/s_Y)\, r_{XY}$), and the bias.

Attribute: measures skill
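A sketch of the MSE skill score with the sample climatology as reference forecast (synthetic data for demonstration):

    import numpy as np

    rng = np.random.default_rng(0)
    obs = rng.normal(15.0, 5.0, 500)
    fcst = 0.8 * obs + 3.0 + rng.normal(0.0, 2.0, 500)

    mse_fc = ((fcst - obs) ** 2).mean()
    mse_cli = ((obs.mean() - obs) ** 2).mean()  # climatology reference, = s_x^2
    ss_mse = 1.0 - mse_fc / mse_cli             # Reduction of Variance
    # ss_mse > 0: the forecast beats climatology; ss_mse <= 0: no skill.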
[Figure: 24 h mean wind forecast; MSE of the forecast and of persistence in (m/s)² and reduced error variance in %, by month.]
Accuracy vs. skill: a forecast can show higher skill and yet lower accuracy; the skill may be high simply because the reference forecast is getting worse.
Anomaly correlation: correlates forecast and observation anomalies with respect to climatology, so that forecast quality is evaluated without crediting a correct forecast of the climatology itself. Centred and uncentred AC are used for weather variables defined over a spatial domain, where $c_m$ is the climatology at grid point m, the anomalies are $y'_m = y_m - c_m$ and $x'_m = x_m - c_m$, and over-bars denote averages over the field:

$AC_{cent} = \frac{\sum_m \left(y'_m - \bar{y'}\right)\left(x'_m - \bar{x'}\right)}{\sqrt{\sum_m \left(y'_m - \bar{y'}\right)^2 \sum_m \left(x'_m - \bar{x'}\right)^2}}$

$AC_{unc} = \frac{\sum_m y'_m\, x'_m}{\sqrt{\sum_m {y'_m}^2 \sum_m {x'_m}^2}}$

(ECMWF)
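A sketch of both versions for gridded fields; fc_map, obs_map and clim_map are assumed to be 2-D arrays on the same grid:

    import numpy as np

    def anomaly_correlation(fc_map, obs_map, clim_map, centred=True):
        """Centred or uncentred AC of forecast/observation maps vs climatology."""
        yp = (fc_map - clim_map).ravel()   # forecast anomalies y'_m
        xp = (obs_map - clim_map).ravel()  # observed anomalies x'_m
        if centred:                        # remove the map-mean anomalies
            yp = yp - yp.mean()
            xp = xp - xp.mean()
        return (yp * xp).sum() / np.sqrt((yp ** 2).sum() * (xp ** 2).sum())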
LEPS (Linear Error in Probability Space): measures the error by using the cumulative frequencies of the climatological distribution of the observations, $F_X$:

$LEPS = \frac{1}{n}\sum_{i=1}^{n}\left| F_X(f_i) - F_X(x_i) \right|$

Errors in the tails of the distribution are penalized less than errors in the centre of the distribution. A refined version was developed by Potts (1996).
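A sketch of LEPS with $F_X$ estimated as the empirical CDF of a climatological observation sample (an assumption made here for illustration; operationally a long climatology would be used):

    import numpy as np

    def leps(f, x, climatology):
        """LEPS with F_X as the empirical CDF of a climatological sample."""
        clim = np.sort(np.asarray(climatology))
        def cdf(v):   # empirical climatological CDF, values in [0, 1]
            return np.searchsorted(clim, v, side="right") / clim.size
        return np.mean(np.abs(cdf(np.asarray(f)) - cdf(np.asarray(x))))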
References:
Jolliffe, I.T. and D.B. Stephenson (2012): Forecast Verification: A Practitioner's Guide in Atmospheric Science, 2nd Ed. Wiley & Sons.
Wilks, D.S. (2011): Statistical Methods in the Atmospheric Sciences, 3rd Ed. Academic Press.
Stanski, H.R., L.J. Wilson and W.R. Burrows (1989): Survey of Common Verification Methods in Meteorology. http://www.bom.gov.au/bmrc/wefor/staff/eee/verif/verif_web_page.html