Predicting Hourly Ozone Pollution in Dallas Fort Worth Area Using - - PowerPoint PPT Presentation


slide-1
SLIDE 1

Predicting Hourly Ozone Pollution in Dallas‐Fort Worth Area Using Spatio‐Temporal Clustering

Mahdi Ahmadi, Yan Huang, Kuruvilla John
University of North Texas
May 20-23, 2015, Dallas, Texas, USA

slide-2
SLIDE 2


Presentation Outline

  • Research Background
    – Ozone Pollution
    – Previous Works
  • Research Method
  • Analysis and Results
    – Temporal clustering
    – Spatial clustering
    – Linear regression
  • Closing remarks
slide-3
SLIDE 3
  • Ground-level ozone can impair lung function and irritate the respiratory system
    – Exposure to ozone is linked to premature death, asthma, bronchitis, and heart attack
    – It can also damage crops and harm animals
  • It is classified by the EPA as one of the "criteria pollutants"
    – The national ozone standard has been revised several times since 1990

Tropospheric Ozone

slide-4
SLIDE 4

Ozone Pollution

U.S. counties with high ozone concentrations in 2009.

(source: http://www.epa.gov/o3healthtraining/what.html#fig1)

Current standard level: 0.075 ppm
Proposed standard level: 0.060-0.065 ppm

slide-5
SLIDE 5

Ozone Formation

slide-6
SLIDE 6
  • Ozone is a product of a chain of complex photochemical reactions
  • Ozone level is a non-linear function of many factors:
    O3 = f(SR, Temp, RH, P, Rain, Wind direction, Wind speed, NOx, VOCs, CO, …)

Ozone Formation

slide-7
SLIDE 7


  • EPA NO2 and O3 monitoring sites

Measurement Data

slide-8
SLIDE 8
  • Data set
    – Meteorological data (temperature, solar intensity, relative humidity, wind speed, wind direction)
    – Emission data (NOx, VOCs) and ozone data
  • Measured and recorded on an hourly basis for more than 14 years by
    – Size of the database (US): 5 TB (14 years' worth of data, over 700 monitoring sites)

Big Data Challenge

slide-9
SLIDE 9

Photochemical Modelling

  • Photochemical modelling challenges
    – Time-consuming and complex procedure
    – Requires a high level of knowledge/proficiency
    – Accuracy of modelling/prediction
    – Difficulty of validation
    – High dimensionality of the solution/modelling space

slide-10
SLIDE 10
  • Objective: Forecasting ozone and adjusting for meteorological variations
  • Additive linear models (S. Abdul-Wahab et al., 1996; Cassmassi & Bassett, 1993; Dueñas et al., 2002; Feister & Balzer, 1991; Fiore et al., 1998; Katsoulis, 1996; Korsog & Wolff, 1991; Kuntasal & Chang, 1987; Walker, 1985; Zeldin et al., 1990)
  • Non-linear regression models (Aneiros-Pérez et al., 2004; Bloomfield et al., 1996; Chen et al., 1998; Davis, Eder, et al., 1998a; Khokhlov et al., 2008; Smith & Shively, 1995)

Previous Studies

slide-11
SLIDE 11
  • Objective: Forecasting ozone and adjusting for meteorological variations
  • Principal Component Analysis (S. A. Abdul-Wahab et al., 2005; Pryor et al., 1995; Statheropoulos et al., 1998; Yu & Chang, 2000)
  • Artificial Neural Networks (S. A. Abdul-Wahab & Al-Alawi, 2002; Balaguer Ballester et al., 2002; Chaloulakou et al., 2003; Dutot et al., 2007; Elkamel et al., 2001; Gomez-Sanchis et al., 2006; Hadjiiski & Hopke, 2000; Karatzas et al., 2008; Ruiz-Suarez et al., 1995; Spellman, 1999; Wang et al., 2003; Yi & Prybutok, 1996)

Previous Studies

slide-12
SLIDE 12
  • Objective: Forecasting ozone and adjusting for meteorological variations
  • Data Mining Techniques (Austin et al., 2014; Bruno et al., 2004; Kaburlasos et al., 2007; Sujit Kumar Sahu & Bakar, 2012; Sujit K Sahu et al., 2007; Domínguez et al., 2014)
  • Toward more data-driven techniques…

Previous Studies

slide-13
SLIDE 13
  • The objective of the research is to predict ozone and adjust it for meteorological variables using data mining techniques
    – Recognize spatial and temporal patterns of ozone pollution in the DFW area
    – Use the time series in each cluster to build better linear regression models
    – Use the models to predict ozone and adjust it for meteorological variables

Research Objectives

slide-14
SLIDE 14

Research Flow Chart

  • Read the ozone and meteorological data archive from the TAMIS database
  • Pre-process the dataset
  • Data mining project
    – 8-hr ozone pattern recognition using k-means cluster analysis
    – Post-processing of the temporal cluster analysis to identify ozone seasons
    – Hierarchical cluster analysis to find spatial patterns of hourly ozone
  • Develop a linear regression model for each season and each zone

slide-15
SLIDE 15

Study Area

slide-16
SLIDE 16
  • Dataset
    – Meteorological data (temperature, solar intensity, wind speed) from the TAMIS (Texas Air Monitoring Information System) database
    – The dataset includes 1-hr measurement time series of ozone (O3), ambient temperature (T), solar radiation (SR), and wind speed (W) for 12 years (2002-2013)
    – Approximately 5,886,720 total entries (14 CAMS × 24 hours × 365 days × 12 years × 4 variables)

The Dataset
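The entry count above is the product 14 × 24 × 365 × 12 × 4. A quick Python check follows; the function name and the optional leap-day switch are illustrative additions, not part of the study. Since 2004, 2008 and 2012 fall inside 2002-2013, counting leap days would add 14 × 24 × 3 × 4 = 4,032 entries.

```python
import calendar


def total_entries(stations=14, first_year=2002, last_year=2013,
                  variables=4, count_leap_days=False):
    """Hourly records x stations x variables over the study period."""
    days = sum(
        366 if count_leap_days and calendar.isleap(year) else 365
        for year in range(first_year, last_year + 1)
    )
    return stations * 24 * days * variables


print(total_entries())                      # 5886720, the slide's estimate
print(total_entries(count_leap_days=True))  # 5890752 with 2004, 2008, 2012
```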

slide-17
SLIDE 17
  • Missing values
    – Any gap in the time series of 4 hours or less was filled by linear interpolation
    – Any day with more than 4 consecutive missing values was removed from the dataset
  • 8-hr time series
    – For the seasonal analysis, it makes more sense to perform cluster analysis on 8-hr average ozone
    – 8-hr average time series were generated using a moving average
    – The original 1-hr time series were kept for the spatial cluster analysis and linear regression

Pre‐Processing
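A minimal sketch of the two missing-value rules in plain Python. It assumes the hourly values sit in one flat list that starts at midnight, with None marking a missing hour, and that a gap is interpolated only when it has a measured value on both sides. The function names are hypothetical, and how the study treated gaps at the very start or end of the record, or gaps spanning midnight, is not stated on the slide.

```python
def interpolate_short_gaps(values, max_gap=4):
    """Linearly fill runs of None no longer than max_gap with known neighbors."""
    out = list(values)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        j = i
        while j < n and out[j] is None:
            j += 1
        # out[i:j] is one run of missing hours
        if i > 0 and j < n and j - i <= max_gap:
            left, right = out[i - 1], out[j]
            step = (right - left) / (j - i + 1)
            for k in range(i, j):
                out[k] = left + step * (k - i + 1)
        i = j
    return out


def longest_missing_run(day):
    """Length of the longest run of consecutive None values."""
    best = run = 0
    for v in day:
        run = run + 1 if v is None else 0
        best = max(best, run)
    return best


def clean_hourly(values, max_gap=4, hours_per_day=24):
    """Return {day_index: 24 hourly values} for days that survive both rules."""
    filled = interpolate_short_gaps(values, max_gap)
    kept = {}
    for d in range(len(values) // hours_per_day):
        lo, hi = d * hours_per_day, (d + 1) * hours_per_day
        if longest_missing_run(values[lo:hi]) > max_gap:
            continue  # more than max_gap consecutive missing hours: drop the day
        day = filled[lo:hi]
        if any(v is None for v in day):
            continue  # a gap that could not be interpolated (e.g. at the record edge)
        kept[d] = day
    return kept
```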

slide-18
SLIDE 18

Pre‐Processing

Hourly time series of variables

slide-19
SLIDE 19

Pre‐Processing

Scatter Plots of variables

slide-20
SLIDE 20

Pre‐Processing

Time series of 8‐hr ozone
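The 8-hr average series can be sketched as a simple moving average over the hourly values. The slides do not say whether the window was leading, trailing, or centered; this sketch assumes each average covers hours h through h+7 and is reported at hour h.

```python
def moving_average_8hr(hourly, window=8):
    """Return window-hour means; element h averages hourly[h] .. hourly[h + window - 1]."""
    if len(hourly) < window:
        return []
    total = sum(hourly[:window])
    averages = [total / window]
    for h in range(window, len(hourly)):
        # Slide the window by one hour: add the newest value, drop the oldest
        total += hourly[h] - hourly[h - window]
        averages.append(total / window)
    return averages


print(moving_average_8hr([float(x) for x in range(10)]))  # [3.5, 4.5, 5.5]
```

The running sum keeps the cost linear in the series length; for very long records, recomputing each window with `sum()` avoids any accumulated floating-point drift at the price of more work.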

slide-21
SLIDE 21
  • k-means cluster analysis was employed to find ozone seasons
    – Euclidean and Manhattan distance functions
  • KNIME 2.10.3 (an open platform for data-driven innovation) and Weka 3.7.11 were used

Seasonal Pattern
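The slide names k-means with Euclidean and Manhattan distances but not what each data point was. The plain-Python sketch below assumes each point is a fixed-length 8-hr ozone profile (for example one per day or per month). With Manhattan distance it updates centroids with component-wise medians, because the median minimizes the sum of absolute deviations; that pairing is a design choice of this sketch, not something the slide specifies, and it does not reproduce the KNIME or Weka implementations.

```python
import math
from statistics import mean, median


def distance(a, b, metric):
    """Distance between two equal-length profiles."""
    if metric == "euclidean":
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    if metric == "manhattan":
        return sum(abs(x - y) for x, y in zip(a, b))
    raise ValueError(f"unknown metric: {metric}")


def kmeans(points, k, metric="euclidean", init=None, max_iter=100):
    """Lloyd-style k-means; uses medians as centers when metric is Manhattan."""
    if metric not in ("euclidean", "manhattan"):
        raise ValueError(f"unknown metric: {metric}")
    center = mean if metric == "euclidean" else median
    # Hypothetical default seeding: the first k points (real runs use random restarts)
    centroids = [list(c) for c in (init if init is not None else points[:k])]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid under the chosen metric
        new_labels = [
            min(range(k), key=lambda c: distance(p, centroids[c], metric))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: component-wise mean (Euclidean) or median (Manhattan)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [center(dim) for dim in zip(*members)]
    return labels, centroids
```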

slide-22
SLIDE 22

Seasonal Pattern

slide-23
SLIDE 23

Seasonal Pattern

Selecting the proper number of clusters, k

Within-cluster sum of squares (SSW)
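A small sketch of how SSW can be computed for a given cluster labelling. A common way to use it is the "elbow" reading: compute SSW for k = 1, 2, 3, … and pick the k after which further decreases level off. The slide does not state its exact selection rule, so the elbow criterion is an assumption here.

```python
def ssw(points, labels):
    """Within-cluster sum of squared Euclidean distances to each cluster mean."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        centroid = [sum(dim) / len(members) for dim in zip(*members)]
        total += sum(
            sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in members
        )
    return total


points = [[0, 0], [0, 2], [10, 10]]
print(ssw(points, [0, 0, 0]))  # one cluster: large SSW
print(ssw(points, [0, 0, 1]))  # two clusters: 2.0
```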

slide-24
SLIDE 24

Seasonal Pattern

slide-25
SLIDE 25

Seasonal Pattern

slide-26
SLIDE 26

Selecting Ozone Seasons

Cluster   Season     Months
#1        Low        Jan, Feb, Nov, Dec
#2        -          -
#3        Moderate   Mar, Apr, May, Oct
#4        High       Jun, Jul, Aug, Sep
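The season table can be turned into a lookup that splits the record by season, for example before fitting one regression model per season. Cluster #2 has no months assigned, so only three seasons appear; the names `SEASON_BY_MONTH`, `ozone_season` and `split_by_season` are illustrative.

```python
from datetime import date

# Month-to-season mapping taken from the "Selecting Ozone Seasons" table
SEASON_BY_MONTH = {
    1: "Low", 2: "Low", 11: "Low", 12: "Low",
    3: "Moderate", 4: "Moderate", 5: "Moderate", 10: "Moderate",
    6: "High", 7: "High", 8: "High", 9: "High",
}


def ozone_season(day: date) -> str:
    """Return the ozone season label for a calendar date."""
    return SEASON_BY_MONTH[day.month]


def split_by_season(records):
    """Group (date, value) records by ozone season."""
    groups = {"Low": [], "Moderate": [], "High": []}
    for day, value in records:
        groups[ozone_season(day)].append((day, value))
    return groups
```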

slide-27
SLIDE 27
  • Agglomerative hierarchical cluster analysis
    – Ward's method and average linkage
    – k-means cluster analysis was repeated to cross-check the results
  • KNIME 2.10.3 and OriginPro 2015 were used

Spatial Pattern
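The spatial step groups monitoring stations by the similarity of their hourly ozone series. Below is a small plain-Python sketch of average-linkage agglomerative clustering, assuming one equal-length hourly series per station and Euclidean distance between series (the slides do not name the distance). Ward's method is not implemented here; SciPy's `scipy.cluster.hierarchy.linkage(data, method="ward")` (or `method="average"`) followed by `fcluster(Z, t=n, criterion="maxclust")` covers both linkages.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_link_clusters(series, n_clusters):
    """Merge the closest pair of clusters (mean pairwise distance) until n_clusters remain."""
    clusters = [[i] for i in range(len(series))]

    def avg_dist(c1, c2):
        # Average linkage: mean distance over all cross-cluster station pairs
        return sum(euclidean(series[i], series[j]) for i in c1 for j in c2) / (
            len(c1) * len(c2)
        )

    while len(clusters) > n_clusters:
        a, b = min(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pair: avg_dist(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters
```

The naive search is cubic in the number of stations, which is harmless for the 14 CAMS sites but not for large networks.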

slide-28
SLIDE 28

Spatial Clusters

Spatial clusters during low ozone season

slide-29
SLIDE 29

Spatial Clusters

Spatial clusters during moderate ozone season

slide-30
SLIDE 30

Spatial Clusters

Spatial clusters during high ozone season

slide-31
SLIDE 31
  • The goal is to find a linear fit of the form:
  • The natural logarithm of ozone should be used to account for the multiplicative effect of the variables
    – Taking logs turns products into sums, ln(x·y) = ln x + ln y, so multiplicative effects become linear terms
  • The 1-step-ahead ozone value is the key variable
  • A time-lag factor is used to account for the cumulative effect of meteorological variables
  • R² and RMSE are used to evaluate the fit

Linear Regression

[Equation image: log-linear regression model; recoverable terms include Log SR]
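Because the fitted equation did not survive extraction, the sketch below assumes a hypothetical form built from the dataset's variables: ln O3(t+1) = b0 + b1·ln O3(t) + b2·T(t) + b3·SR(t) + b4·W(t), with a single one-hour lag. It is not claimed to be the authors' equation, and its coefficients are not meant to match the columns of the results table. Least squares is solved through the normal equations with Gaussian elimination so that only the standard library is needed; for real data, an SVD-based solver such as `numpy.linalg.lstsq` is numerically safer.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_ols(X, y):
    """Least-squares coefficients from the normal equations (X^T X) b = X^T y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * target for row, target in zip(X, y)) for i in range(p)]
    return solve_linear(xtx, xty)


def lagged_design(o3, temp, sr, wind):
    """Hypothetical design: predict ln O3 one hour ahead from current-hour values."""
    X = [[1.0, math.log(o3[t]), temp[t], sr[t], wind[t]] for t in range(len(o3) - 1)]
    y = [math.log(o3[t + 1]) for t in range(len(o3) - 1)]
    return X, y
```

Usage: `X, y = lagged_design(o3, temp, sr, wind)` followed by `fit_ols(X, y)` returns `[b0, b1, b2, b3, b4]`; a forecast for hour t+1 is the exponential of the fitted linear combination.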

slide-32
SLIDE 32
  • The goal is to find a linear fit of the form:

Linear Regression

[Equation image: log-linear regression model; recoverable terms include Log SR]

Cluster   Coefficients                                             R²      RMSE
#1        0.002797   0.359812   0.021030   0.793794   0.176548    0.827   7.361
#2        0.000206   0.241734  -0.000247   0.871374   0.356038    0.890   5.142
#3        0.002699   9.652480   0.027754   0.687160   0.311890    0.827   7.872
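R² and RMSE for each cluster's model can be computed as below. Since the model is fitted on ln O3, this sketch assumes both metrics are evaluated on the original concentration scale after back-transforming predictions with exp(); the slides do not say which scale was used. R² is taken here as 1 - SS_res/SS_tot.

```python
import math


def back_transform(log_predictions):
    """Convert predicted ln O3 values back to concentrations."""
    return [math.exp(v) for v in log_predictions]


def rmse(observed, predicted):
    """Root-mean-square error, in the units of the observations."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```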

slide-33
SLIDE 33

Ozone Forecasting

slide-34
SLIDE 34

Ozone Forecasting

slide-35
SLIDE 35

Closing Remarks

  • Ozone seasons can be distinguished using simple clustering techniques
  • Because it is data-driven, temporal clustering of ozone time series can account for seasonal variability at different locations (it can also be compared or integrated with expert knowledge)

slide-36
SLIDE 36

Closing Remarks

  • Spatial cluster analysis of ozone time series can be used to develop linear regression models with high accuracy (R² of 0.827 to 0.890 across the three clusters), in addition to recognizing local ozone patterns
  • Spatial clustering facilitates more flexible forecasting (e.g., when a monitoring station is out of service)