SLIDE 1

Procedure for Air Quality Models Benchmarking

P. Thunis, E. Georgieva, S. Galmarini
Institute for Environment and Sustainability
FAIRMODE WG2 – SG4 activity
FAIRMODE meeting, Oslo, Sept. 2010

http://ies.jrc.ec.europa.eu/
http://www.jrc.ec.europa.eu/

SLIDE 2: Agenda

  • Objective
  • Key elements of the proposed procedure
  • Usage of the procedure
  • Discussion
  • Lunch break
  • The Benchmarking service
  • Discussion
  • Work plan
  • Contributions & links to other SGs
  • Discussion

SLIDE 3: Objective

Develop a procedure for the benchmarking of AQ models to evaluate and keep track of their performance:

    – based on a common and permanent evaluation “scale”
    – with periodic joint exercises to assess and compare model quality.

Constraints:

    – make use of available tools and methodologies
    – based on consensus
    – application specific (assessment & planning)

SLIDE 4: Many tools & methodologies already existing…

  • USA-EPA AMET package (Appel and Gilliam, 2008)
  • Tools from CityDelta and EuroDelta (Cuvelier et al., 2007)
  • ENSEMBLE platform (Galmarini et al., 2001, 2004)
  • BOOT software (Chang and Hanna, 2005)
  • Model Validation Kit (Olesen, 2005)
  • EPA Guidance (2007, 2009)
  • AIR4EU conclusions (Borrego et al., 2008)
  • Mesoscale model evaluation – COST728 (Schluenzen & Sokhi, 2008)
  • Quality assurance of microscale models – COST732 (2007)
  • SEMIP project (smoke & emissions model inter-comparison, 2009)
  • Evaluating the Performance of Air Quality Models, AEA (2009)
  • ASTM Guidance (ASTM, 2000)
  • PM model performance metrics (Boylan and Russell, 2006)
  • Summary diagrams (Jolliff et al., 2009)

SLIDE 5: Key elements of the procedure

  • DELTA: evaluation tool based on the CityDelta, EuroDelta, POMI and HTAP inter-comparison exercises
  • ENSEMBLE: multi-model evaluation and inter-comparison platform used by several modeling communities
  • Benchmarking Service: statistical indicators and diagrams, criteria and goals, automatic reporting
  • Data Extraction: extraction of monitoring data, emissions, boundary conditions…

SLIDE 6: Benchmarking procedure: key elements

[Diagram: the USER submits model results to DELTA; on the JRC side, the Data Extraction Facility and the BENCHMARKING service produce model performance evaluation reports.]

SLIDE 7: The DELTA tool

  • Intended for rapid diagnostics by single users (at home)
  • Focus mostly on surface measurement-model pairs (reduced set); “independence” of scale
  • Focus on AQD-related pollutants over a yearly period (but AQ-related input data also checked)
  • Exploration and benchmarking modes
  • Includes an agreed set of statistical indices and diagrams
  • Flexibility in terms of:
    – addition of new statistical indicators & diagrams
    – choice of monitoring stations, models, scenarios…

SLIDE 8: The DELTA tool

SLIDE 9: The ENSEMBLE platform

  • JRC Web-based platform
  • All AQ and meteo variables (4D fields) may be considered (full set)
  • Exploration and benchmarking modes
  • Used for multi-model analysis & evaluation
  • Includes an agreed set of statistical indices and diagrams
  • Acts as a model results depository
  • Flexibility in terms of:
    – model vs. model, model vs. obs, and model vs. groups of models comparisons
    – choice of monitoring stations, models, scenarios…

SLIDE 10: The BENCHMARKING service

PURPOSE:

  • Selection of a core set of statistical indicators and diagrams for a given model application in the frame of the AQD
  • Production of summary performance reports based on a common scale

SLIDE 11: The BENCHMARKING service

FEATURES:

  • Based on different testing levels (observations, model vs. model, responses to emission scenarios, input data, boundary conditions)
  • Decomposition of the evaluation into temporal and spatial segments, on a reduced dataset but for an entire year
  • Structured around an agreed core set of indicators and diagrams specific to each AQD-related application
  • Definition of bounds for specific indicators, called hereafter goals and criteria (regularly revised based on future joint modelling exercises)
  • Reports are produced by an automatic procedure and follow a pre-defined template
  • JRC-based service, but with a replica included in the DELTA tool, i.e. one unique “scale” used in ENSEMBLE and DELTA to evaluate models

SLIDE 12: The EXTRACTION facility

Single usage:

  • Observations (AIRBASE, …)
  • Reference model data (EU)
  • Boundary conditions

Joint exercise:

  • All required input data

SLIDE 13: Usage of the procedure

  • Usage 1: individual model / Member State (MS)
  • Usage 2: periodic joint activities

SLIDE 14: Usage 1: individual model / MS

[Diagram: the USER feeds model results (reduced set) into DELTA, producing an unofficial working report; the JRC-side Data Extraction Facility and BENCHMARKING service operate on the same reduced set and produce the official reports.]

SLIDE 15: Usage 2: joint activities

[Diagram: as in Usage 1, but in addition to the reduced sets a full set of model results is transferred to the JRC side; both paths produce unofficial working reports, and the BENCHMARKING service issues the official reports.]

SLIDE 16: Expected benefits

  • Same single evaluation tool
  • Common (JRC-based) place for evaluation & inter-comparison and acquisition of data
  • Tracking of the historic evolution of model quality relevant for policy decisions
  • Evolving reporting tool
  • Data depository
  • Quantification of uncertainty in model results

SLIDE 17: Conclusions & discussion

  • Common and general frame for model evaluation
  • Application-specific benchmarking service
  • User-based and JRC-based components
  • Updating process via expert-judgment bounds
  • Common joint exercises

SLIDE 18: The BENCHMARKING service

PURPOSE:

  • Selection of a core set of statistical indicators and diagrams for a given model application in the frame of the AQD
  • Production of summary performance reports based on a common scale and a pre-defined template

  • Reduced vs. full model datasets
  • Organized around different testing levels
  • Updating process: bounds (goals and criteria)
  • Breakdown of the analysis into temporal and spatial segments
  • Summary and annexes

SLIDE 19: The BENCHMARKING service

Testing levels:

  • Input data    –  ICI: model vs. input data
  • Observations  –  MOI: model vs. observations
  • Multi-model   –  MMI: model vs. model (base case)
  • Scenarios     –  MRI: model vs. model (scenarios)
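For orientation, these level codes can be carried through analysis scripts as a simple lookup. A minimal sketch (the codes and comparisons are from this slide; the Python structure is an assumption):

```python
# Assumed shorthand for the four benchmarking testing levels.
TESTING_LEVELS = {
    "ICI": "model vs. input data",
    "MOI": "model vs. observations",
    "MMI": "model vs. model (base case)",
    "MRI": "model vs. model (emission scenarios)",
}
```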

SLIDE 20: Set and core sets of indicators

  • R      Correlation
  • B      Bias
  • SD     Standard deviation
  • FAC2   Factor of 2
  • RMSE   Root Mean Square Error
  • RMSEs  Systematic RMSE
  • RMSEu  Unsystematic RMSE
  • CRMSE  Centered RMSE
  • IOA    Index of Agreement
  • MFB    Mean Fractional Bias
  • MFE    Mean Fractional Error
  • RDE    Relative Directive Error
  • RPE    Relative Percentile Error

[Table residue: check marks assign each indicator to the core sets per application, e.g. O3 / App1, NO2, O3 / App2.]
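To make the definitions concrete, here is a minimal sketch of how most of these indicators can be computed from paired observed/modelled time series. It assumes NumPy and strictly positive concentrations; RDE and RPE are omitted because they also depend on AQD limit values and percentiles. The function name and structure are illustrative, not the DELTA implementation:

```python
import numpy as np

def indicators(obs, mod):
    """Core statistical indicators for paired observed/modelled values.
    MFB and MFE are returned as fractions (multiply by 100 for percent)."""
    obs, mod = np.asarray(obs, float), np.asarray(mod, float)
    bias = np.mean(mod - obs)                                        # B
    rmse = np.sqrt(np.mean((mod - obs) ** 2))                        # RMSE
    crmse = np.sqrt(np.mean(((mod - mod.mean())
                             - (obs - obs.mean())) ** 2))            # CRMSE
    # Systematic/unsystematic split via the least-squares fit mod ~ a + b*obs
    b, a = np.polyfit(obs, mod, 1)
    fit = a + b * obs
    ioa = 1.0 - np.sum((mod - obs) ** 2) / np.sum(
        (np.abs(mod - obs.mean()) + np.abs(obs - obs.mean())) ** 2)  # IOA
    return {
        "R": np.corrcoef(obs, mod)[0, 1],                 # correlation
        "B": bias,
        "SD": mod.std(),                                  # modelled std. dev.
        "FAC2": np.mean((mod >= 0.5 * obs) & (mod <= 2.0 * obs)),
        "RMSE": rmse,
        "RMSEs": np.sqrt(np.mean((fit - obs) ** 2)),      # systematic part
        "RMSEu": np.sqrt(np.mean((mod - fit) ** 2)),      # unsystematic part
        "CRMSE": crmse,
        "IOA": ioa,
        "MFB": 2.0 * np.mean((mod - obs) / (mod + obs)),
        "MFE": 2.0 * np.mean(np.abs(mod - obs) / (mod + obs)),
    }
```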

SLIDE 21: Set and core sets of diagrams

  • Scatter plots
  • Q-Q plots
  • Bar plots
  • Time series
  • Taylor diagrams
  • Target diagrams
  • Soccer plots
  • Bugle plots
  • Conditional plots
  • Multi-model diagram

[Table residue: check marks assign each diagram to the core sets per application, e.g. O3 / App1, NO2, O3 / App2.]
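As an example of one diagram in this set: a soccer plot places MFB on the x-axis and MFE on the y-axis, with the goal and criteria bounds drawn as nested boxes, so a run “scores” when it lands inside. A minimal matplotlib sketch using the PM bounds quoted on the bounds slide (the function itself is an assumption):

```python
import matplotlib.pyplot as plt

def soccer_plot(mfb, mfe, labels):
    """MFB (x) vs. MFE (y) scatter with nested goal/criteria boxes (in %)."""
    fig, ax = plt.subplots()
    for (b, e), style, name in (((30, 50), "g--", "goal"),
                                ((60, 75), "r-", "criteria")):
        # Box from -b to +b in MFB and 0 to e in MFE
        ax.plot([-b, b, b, -b, -b], [0, 0, e, e, 0], style, label=name)
    ax.scatter(mfb, mfe)
    for x, y, text in zip(mfb, mfe, labels):
        ax.annotate(text, (x, y))
    ax.set_xlabel("MFB (%)")
    ax.set_ylabel("MFE (%)")
    ax.legend()
    return ax
```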

SLIDE 22: Bounds

  • Criteria: acceptable performance for a given type of application (e.g. PM: MFE = 75%, MFB = ±60%)
  • Goal: best performance a model should aim to reach given its current capabilities (e.g. PM: MFE = 50%, MFB = ±30%)
  • Dev. ENS: deviation from the ensemble mean; flagged when model results deviate from fixed bounds around the ensemble mean and no observation is available
  • Obs. Unc.: best performance a model should aim to reach given the observation uncertainty

Bounds are updated based on the outcome of joint exercises.
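A minimal sketch of the “Dev. ENS” flag described above: with no observation available, a model value is flagged when it falls outside a fixed band around the ensemble mean. The 30% band width here is an illustrative assumption, not a value from the slides:

```python
import numpy as np

def dev_ens_flags(model_values, band=0.30):
    """Flag models whose value deviates from the ensemble mean by more
    than a fixed fractional band (band=0.30 is an assumed placeholder)."""
    vals = np.asarray(model_values, float)
    ens_mean = vals.mean()
    return np.abs(vals - ens_mean) > band * np.abs(ens_mean)
```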

SLIDE 23: Criteria & goals

Meteorology, regional scale (Emery et al., 2001):

  Parameter        Metric        Criteria
  Wind speed       RMSE          ≤ 2 m/s
                   Bias          ≤ ±0.5 m/s
                   IOA           ≥ 0.6
  Wind direction   Gross error   ≤ 30 deg
                   Bias          ≤ ±10 deg
  Temperature      Gross error   ≤ 2 K
                   Bias          ≤ ±0.5 K
                   IOA           ≥ 0.8
  Humidity         Gross error   ≤ 2 g/kg
                   Bias          ≤ ±1 g/kg
                   IOA           ≥ 0.6
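These tabulated bounds lend themselves to an automatic pass/fail check. A sketch, assuming the metrics have already been computed and named as below (the dictionary layout and function name are illustrative):

```python
# Emery et al. (2001) regional-scale criteria, transcribed from the table.
CRITERIA = {
    "wind_speed":     {"rmse": 2.0, "bias": 0.5, "ioa": 0.6},         # m/s
    "wind_direction": {"gross_error": 30.0, "bias": 10.0},            # deg
    "temperature":    {"gross_error": 2.0, "bias": 0.5, "ioa": 0.8},  # K
    "humidity":       {"gross_error": 2.0, "bias": 1.0, "ioa": 0.6},  # g/kg
}

def meets_criteria(parameter, stats):
    """True if every metric meets its bound: error metrics are upper
    limits (bias checked in absolute value), IOA is a lower limit."""
    for metric, bound in CRITERIA[parameter].items():
        if metric == "ioa":
            if stats[metric] < bound:
                return False
        elif abs(stats[metric]) > bound:
            return False
    return True

# Example: meets_criteria("temperature",
#                         {"gross_error": 1.4, "bias": -0.2, "ioa": 0.85})
```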

SLIDE 24: Air quality (regional-scale modelling) criteria & goals

Boylan and Russell (2005); EPA report (2007):

  Species                                       Metric   Criteria   Goal
  Main PM constituents (> 30% of total mass),   MFE      75%        50%
  PM2.5                                         MFB      ±60%       ±30%
  Minor PM constituents (< 30% of total mass):  exponential variation of the
                                                bounds, reaching 100% / 200%
                                                at zero concentration
  Ozone                                         MFE      35%
                                                MFB      ±15%

Evaluating the Performance of Air Quality Models, AEA (2009):

  Any pollutant: FAC2 (half of the points within a factor of two); NMB (−0.2 < NMB < 0.2)

Air quality model performance evaluation, Chang and Hanna (2004):

  NOx, CO, PM10: FAC2 (half of the points within a factor of two); FB (−0.3 < FB < 0.3); NMSE < 4
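The flat PM bounds translate directly into a three-way classification; a sketch in percent units (the function is an assumption, and the concentration-dependent relaxation for minor constituents is left out):

```python
def classify_pm(mfb, mfe):
    """Classify PM performance against the Boylan and Russell bounds
    (MFB/MFE in percent): goal |MFB| <= 30, MFE <= 50;
    criteria |MFB| <= 60, MFE <= 75."""
    if abs(mfb) <= 30.0 and mfe <= 50.0:
        return "goal met"
    if abs(mfb) <= 60.0 and mfe <= 75.0:
        return "criteria met"
    return "outside criteria"
```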

SLIDE 25: Summary diagrams

[Figure: bugle plot (Boylan, 2005).]

SLIDE 26: Summary diagrams

From the Taylor diagram to the Target plot (Jolliff, 2009).

[Figure: Taylor diagram, with correlation shown as the angle cos⁻¹ R, next to a Target plot with CRMSE and bias axes.]
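The move from Taylor to Target rests on two identities: CRMSE² = σ_o² + σ_m² − 2 σ_o σ_m R (the Taylor geometry, with correlation as the angle cos⁻¹ R) and RMSE² = Bias² + CRMSE². A minimal sketch of the Jolliff et al. (2009) target coordinates, where the sign of x records whether the model over- or under-estimates the observed variability:

```python
import numpy as np

def target_point(obs, mod):
    """Coordinates of one model run on a target plot, normalized by the
    observed standard deviation: x = sign(sd_mod - sd_obs) * CRMSE / sd_obs,
    y = Bias / sd_obs. Distance from the origin is the normalized RMSE."""
    obs, mod = np.asarray(obs, float), np.asarray(mod, float)
    sd_o = obs.std()
    bias = mod.mean() - obs.mean()
    crmse = np.sqrt(np.mean(((mod - mod.mean()) - (obs - obs.mean())) ** 2))
    x = np.sign(mod.std() - sd_o) * crmse / sd_o
    y = bias / sd_o
    return x, y
```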

SLIDE 27: Summary diagrams

Multi-model diagram.

[Figure: species concentrations (µg/m³) for NOx, VOC and NH3 across models, with the “ENSEMBLE” mean and the Dev. ENS− / Dev. ENS+ bounds marked.]

SLIDE 28: Summary diagrams

An example: POMI data.

[Figures: target plot for all stations; target plot by station group (URB – SUB – RUR, PIE – LOM); bugle plot by group.]

SLIDE 29: Performance summary report

[Diagram: DELTA (USER side) and the BENCHMARKING service (JRC side) feed an official reports depository. Single-model, application-specific reports: summary report + annexes (ICI – MOI – MRI). Multi-model, joint-exercise reports: summary report + annexes (MMI – MRI).]

SLIDE 30: Application-specific performance summary report (single model)

SLIDE 31: Performance summary report (multi-model)

[Table sketch: one row per model (Mod 1, Mod 2, Mod 3, … Mod X), one column per indicator (RMSE, Bias, IOA, …).]
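A sketch of how such a model-by-indicator table could be assembled from per-model statistics (names and formatting are assumptions):

```python
def summary_table(results, metrics=("RMSE", "Bias", "IOA")):
    """Format a model-by-indicator table; `results` maps a model name
    to a dict of indicator values."""
    lines = ["Model     " + "".join(f"{m:>10}" for m in metrics)]
    for name, stats in results.items():
        lines.append(f"{name:<10}"
                     + "".join(f"{stats[m]:10.2f}" for m in metrics))
    return "\n".join(lines)

# Example:
# print(summary_table({"Mod 1": {"RMSE": 12.3, "Bias": -1.2, "IOA": 0.84}}))
```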

SLIDE 32: Conclusions & discussion

  • Completeness of the testing levels
  • Composite diagrams to synthesize information
  • Choice of relevant indicators and diagrams to define the core set, depending on the application (model type?)
  • Complexity, organization and size of the reports:
    – number of diagrams & indicators
    – number of variables tested
    – summary and extended report sections
  • Bounds: definition & updating process

SLIDE 33: Work plan

  • Discussion and consensus on the overall methodology (FAIRMODE meeting, 09/2010)
  • Development of the DELTA and ICI-MOI benchmarking service prototypes (Dec 2010)
  • Testing of the prototypes on existing datasets (2011)
  • Development of the JRC Web facilities (MMI-MRI benchmarking, data extraction, harmonization of output formats…)
  • Set-up of a joint exercise to test the whole system (2012)

SLIDE 34: Contributions / interactions

  • Discussion and definition of the benchmarking service elements (species, statistics, goals and criteria…) for model performance reporting per pollutant/scale:
    – urban/agglomeration scale: first on the POMI dataset, but other datasets are required (even single-model validation); workshop by mid-2011
    – European scale: within the EuroDelta exercise; draft by end 2011
    – local scale: datasets are required ??
  • Practical organization & communication:
    – Are emails sufficient?
    – Intermediate workshops?
  • Links to other SGs:
    – a methodology is required to assess station representativeness
    – data assimilation techniques could make use of the benchmark databank (in future)
  • Definition of and participation in the joint activities