  1. Predicting Hourly Ozone Pollution in Dallas‐Fort Worth Area Using Spatio‐Temporal Clustering. Mahdi Ahmadi, Yan Huang, and Kuruvilla John, University of North Texas. May 20‐23, 2015, Dallas, Texas, USA

  2. Presentation Outline • Research Background – Ozone Pollution – Previous works • Research Method • Analysis and Results – Temporal clustering – Spatial clustering – Linear regression • Closing remarks

  3. Tropospheric Ozone • Ground‐level ozone can impair lung function and irritate the respiratory system – Exposure to ozone is linked to premature death, asthma, bronchitis, and heart attacks – It can also damage crops and animals • It is classified by the EPA as one of the 'criteria pollutants' – The national ozone standard has been amended since 1990

  4. Ozone Pollution • Current standard level: 0.075 ppm (8‐hr average) • Proposed standard level: 0.060‐0.065 ppm • Figure: U.S. counties with high ozone concentrations in 2009 (source: http://www.epa.gov/o3healthtraining/what.html#fig1)

  5. Ozone Formation

  6. Ozone Formation • Ozone is the product of a chain of complex photochemical reactions • Ozone level is a non‐linear function of many factors: $\mathrm{O_3} = f(\mathrm{SR}, \mathrm{Temp}, \mathrm{RH}, P, \mathrm{Rain}, \text{Wind direction}, \text{Wind speed}, \mathrm{NO_x}, \mathrm{VOCs}, \mathrm{CO}, \ldots)$

  7. Measurement Data • EPA NO2 and O3 monitoring sites

  8. Big Data Challenge • Dataset – Meteorological data (temperature, solar intensity, relative humidity, wind speed, wind direction) – Emission data (NOx, VOCs) and ozone data • Measured and recorded hourly for more than 14 years – Size of the database (US): 5 TB (14 years' worth of data from over 700 monitoring sites)

  9. Photochemical Modeling • Photochemical modeling challenges – Time‐consuming and complex procedure – Requires a high level of knowledge/proficiency – Accuracy of modeling/prediction – Difficulty of validation – High dimensionality of the solution/modeling space

  10. Previous Studies • Objective: Forecasting ozone and adjusting for meteorological variations • Additive linear models (S. Abdul‐Wahab et al., 1996; Cassmassi & Bassett, 1993; Dueñas et al., 2002; Feister & Balzer, 1991; Fiore et al., 1998; Katsoulis, 1996; Korsog & Wolff, 1991; Kuntasal & Chang, 1987; Walker, 1985; Zeldin et al., 1990) • Non‐linear regression models (Aneiros‐Pérez et al., 2004; Bloomfield et al., 1996; Chen et al., 1998; Davis, Eder, et al., 1998a; Khokhlov et al., 2008; Smith & Shively, 1995)

  11. Previous Studies • Principal Component Analysis (S. A. Abdul‐Wahab et al., 2005; Pryor et al., 1995; Statheropoulos et al., 1998; Yu & Chang, 2000) • Artificial Neural Networks (S. A. Abdul‐Wahab & Al‐Alawi, 2002; Balaguer Ballester et al., 2002; Chaloulakou et al., 2003; Dutot et al., 2007; Elkamel et al., 2001; Gomez‐Sanchis et al., 2006; Hadjiiski & Hopke, 2000; Karatzas et al., 2008; Ruiz‐Suarez et al., 1995; Spellman, 1999; Wang et al., 2003; Yi & Prybutok, 1996)

  12. Previous Studies • Data Mining Techniques (Austin et al., 2014; Bruno et al., 2004; Kaburlasos et al., 2007; S. K. Sahu & Bakar, 2012; S. K. Sahu et al., 2007; Domínguez et al., 2014) • Toward more data‐driven techniques…

  13. Research Objectives • The objective of this research is to predict ozone and adjust it for meteorological variables using data mining techniques – Recognize spatial and temporal patterns of ozone pollution in the DFW area – Use the time series in each cluster to build better linear regression models – Use the models to predict ozone and adjust it for meteorological variables

  14. Research Flow Chart • Data mining project: reading from the TAMIS database – Ozone and meteorological data archive – Pre‐processing the dataset – 8‐hr ozone pattern recognition using k‐means cluster analysis – Post‐processing temporal cluster analysis to identify ozone seasons – Hierarchical cluster analysis to find spatial patterns of hourly ozone – Developing a linear regression model for each season and each zone

  15. Study Area

  16. The Dataset • Dataset – Meteorological data (temperature, solar intensity, wind speed) from the TAMIS (Texas Air Monitoring Information System) database – The dataset includes 1‐hr measurement time series of ozone (O3), ambient temperature (T), solar radiation (SR), and wind speed (W) for 12 years (2002‐2013) – Approximately 5,886,720 total entries (14 CAMS (Continuous Ambient Monitoring Stations) × 24 hours × 365 days × 12 years × 4 variables)
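The entry count can be checked directly. Like the slide, this sketch assumes 365 days per year, so the leap days in 2004, 2008, and 2012 are ignored:

```python
# Reproduce the slide's entry count (assumes 365 days/year, as the slide does).
stations = 14        # CAMS monitoring stations
hours_per_day = 24
days_per_year = 365  # leap days ignored
years = 12           # 2002-2013
variables = 4        # O3, T, SR, W

total_entries = stations * hours_per_day * days_per_year * years * variables
print(f"{total_entries:,}")  # 5,886,720
```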

  17. Pre‐Processing • Missing values – Gaps of 4 hours or less in the time series were filled by linear interpolation – Any day with more than 4 consecutive missing values was removed from the dataset • 8‐hr time series – For the seasonal analysis, cluster analysis is performed on 8‐hr average ozone, the same averaging period used by the ozone standard – 8‐hr average time series were generated with a moving average – The original 1‐hr time series were kept for the spatial cluster analysis and linear regression
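The missing-value rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, missing readings are assumed to be stored as None in one continuous hourly series, and because a gap at the very start or end of the series has no neighbor on one side, this sketch leaves it unfilled and drops the day that contains it.

```python
from typing import List, Optional

Series = List[Optional[float]]


def fill_short_gaps(values: Series, max_gap: int = 4) -> Series:
    """Hypothetical helper: linearly interpolate runs of None of length <= max_gap.

    A run is filled only if it has a valid reading on both sides; longer runs
    and runs touching either end of the series are left as None.
    """
    out = list(values)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        start = i                      # first missing index of this run
        while i < n and out[i] is None:
            i += 1
        end = i                        # first valid index after the run (or n)
        run = end - start
        if run <= max_gap and start > 0 and end < n:
            left, right = out[start - 1], out[end]
            for k in range(start, end):
                frac = (k - start + 1) / (run + 1)
                out[k] = left + frac * (right - left)
    return out


def drop_incomplete_days(values: Series, hours_per_day: int = 24) -> List[List[float]]:
    """Hypothetical helper: split into days and keep only fully populated ones.

    After fill_short_gaps, any remaining None belongs to a run longer than
    max_gap (or an edge gap), so days containing one are removed.
    """
    days = [values[d:d + hours_per_day] for d in range(0, len(values), hours_per_day)]
    return [day for day in days if len(day) == hours_per_day and None not in day]
```

For example, `drop_incomplete_days(fill_short_gaps(hourly_o3))` would apply both rules to one station's hourly series.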

  18. Pre‐Processing Hourly time series of variables

  19. Pre‐Processing Scatter Plots of variables

  20. Pre‐Processing Time series of 8‐hr ozone
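The 8-hr series were produced from the 1-hr data with a moving average (slide 17). A minimal sketch follows. The trailing window, where each average is labeled by the last hour of its 8-hour block, is an assumption: the slides do not state the labeling convention, and labeling by the start hour would shift the series. The name `rolling_mean_8h` is hypothetical, and the input is assumed to be gap-free after pre-processing.

```python
from typing import List, Optional


def rolling_mean_8h(hourly: List[float], window: int = 8) -> List[Optional[float]]:
    """Trailing moving average: out[t] = mean(hourly[t-window+1 .. t]).

    Assumption: averages are labeled by the END hour of the window.
    The first window-1 positions have no complete window and stay None.
    """
    out: List[Optional[float]] = [None] * len(hourly)
    for t in range(window - 1, len(hourly)):
        out[t] = sum(hourly[t - window + 1:t + 1]) / window
    return out
```

Summing each window directly, rather than keeping a running total, avoids accumulating floating-point drift over 12 years of hourly data at a modest cost.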

  21. Seasonal Pattern • k‐means cluster analysis was employed to find ozone seasons – Euclidean and Manhattan distance functions • Tools: KNIME 2.10.3 (an open platform for data‐driven innovation) and Weka 3.7.11
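A minimal k-means sketch with a selectable distance function. The slides do not say how each observation was encoded, so here each point is a generic feature vector (for example, a month's average 8-hr ozone profile, which is a hypothetical choice). With Manhattan distance, this sketch updates centroids with the coordinate-wise median (the usual k-medians pairing); it is not a claim about exactly what KNIME or Weka do. Initial centroids are passed in explicitly so results are reproducible.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def _mean(points: List[Vector]) -> List[float]:
    return [sum(col) / len(points) for col in zip(*points)]


def _median(points: List[Vector]) -> List[float]:
    # Coordinate-wise median: the center that minimizes total Manhattan distance.
    meds = []
    for col in zip(*points):
        s = sorted(col)
        m = len(s) // 2
        meds.append(s[m] if len(s) % 2 else (s[m - 1] + s[m]) / 2)
    return meds


def kmeans(points: List[Vector], init: List[Vector], metric: str = "euclidean",
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd-style iteration: assign to nearest centroid, then recompute centers."""
    if metric == "euclidean":
        dist, center = euclidean, _mean
    elif metric == "manhattan":
        dist, center = manhattan, _median
    else:
        raise ValueError(f"unknown metric: {metric}")
    centroids = [list(c) for c in init]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        new_labels = [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:       # converged: assignments unchanged
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                # an empty cluster keeps its old centroid
                centroids[j] = center(members)
    return labels, centroids
```

The empty-cluster branch matters here: with k = 4, one cluster (#2 on slide 26) ended up with no months.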

  22. Seasonal Pattern

  23. Seasonal Pattern • Sum of squares within clusters (SSW) • Selecting the proper number of clusters k
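SSW is the sum, over all clusters, of the squared distances between each member and its cluster centroid, so it can only shrink as k grows. A common heuristic, assumed here because the slide names only the criterion, is to compute SSW for k = 1, 2, 3, … and pick the k where the decrease levels off (the "elbow"). A minimal sketch, with `ssw` as a hypothetical name:

```python
from typing import List, Sequence


def ssw(points: List[Sequence[float]], labels: List[int],
        centroids: List[Sequence[float]]) -> float:
    """Sum of squared Euclidean distances from each point to its own centroid."""
    return sum(
        sum((x - c) ** 2 for x, c in zip(p, centroids[lab]))
        for p, lab in zip(points, labels)
    )
```

For example, the points 0, 2, 10, 12 give SSW = 104 with one cluster (centroid 6) but only 4 with two clusters (centroids 1 and 11), a large drop that favors k = 2.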

  24. Seasonal Pattern

  25. Seasonal Pattern

  26. Selecting Ozone Seasons
      Cluster | Season | Months
      #1 | Low | Jan, Feb, Nov, Dec
      #2 | ‐ | ‐
      #3 | Moderate | Mar, Apr, May, Oct
      #4 | High | Jun, Jul, Aug, Sep
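Once the seasons are fixed, each hourly record can be routed to the model for its season. A hypothetical lookup transcribed from this slide's table; cluster #2 received no months, so it has no entry:

```python
# Hypothetical month -> ozone-season lookup, transcribed from slide 26.
SEASON_BY_MONTH = {
    1: "Low", 2: "Low", 11: "Low", 12: "Low",
    3: "Moderate", 4: "Moderate", 5: "Moderate", 10: "Moderate",
    6: "High", 7: "High", 8: "High", 9: "High",
}


def ozone_season(month: int) -> str:
    """Return "Low", "Moderate", or "High" for a calendar month 1-12."""
    if month not in SEASON_BY_MONTH:
        raise ValueError(f"month must be 1-12, got {month}")
    return SEASON_BY_MONTH[month]
```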

  27. Spatial Pattern • Agglomerative hierarchical cluster analysis – Ward's method and average‐linkage distance function – k‐means cluster analysis repeated to cross‐check the results • KNIME 2.10.3 and OriginPro 2015 were used
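The average-linkage variant can be sketched as below. This assumes each monitoring station is represented by an aligned vector of its hourly ozone values and that stations are compared with Euclidean distance; the slides do not give the station-to-station distance. Ward's method would instead merge the pair whose union gives the smallest increase in within-cluster sum of squares, and is not shown. `average_linkage` is a hypothetical name.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(series: List[Sequence[float]], n_clusters: int) -> List[List[int]]:
    """Agglomerative clustering with average linkage (UPGMA).

    Starts with one cluster per station and repeatedly merges the pair of
    clusters with the smallest mean pairwise distance until n_clusters remain.
    Returns clusters as sorted lists of station indices.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be >= 1")
    clusters = [[i] for i in range(len(series))]
    d = [[euclidean(a, b) for b in series] for a in series]  # station distances
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                link = sum(d[a][b] for a in clusters[i] for b in clusters[j]) / (
                    len(clusters[i]) * len(clusters[j]))
                if best is None or link < best[0]:
                    best = (link, i, j)
        _, i, j = best
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters
```

The naive pairwise search is cubic in the number of stations, which is negligible for 14 CAMS.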

  28. Spatial Clusters Spatial clusters during low ozone season

  29. Spatial Clusters Spatial clusters during moderate ozone season

  30. Spatial Clusters Spatial clusters during high ozone season

  31. Linear Regression • The goal is to find a linear fit of the form $\ln \mathrm{O_3}(t) = \alpha\,T(t-\tau) + \beta\,\mathrm{SR}(t-\tau) + \gamma\,W(t-\tau) + \delta\,\ln \mathrm{O_3}(t-1) + \varepsilon$ (1), where τ is the time lag • The natural logarithm of ozone is used to account for the multiplicative effects of the variables • The 1‐step‐ahead ozone value is the key variable • A time lag factor is used to account for the cumulative effect of meteorological variables • R² and RMSE are used to evaluate the fit
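A minimal sketch of fitting this kind of model by ordinary least squares. It assumes the form ln O3(t) = αT(t−τ) + βSR(t−τ) + γW(t−τ) + δ ln O3(t−1) + ε, with a single lag τ shared by the meteorological variables. That is an interpretation of the slide, not a confirmed detail. Function names are hypothetical. The sketch solves the normal equations, which is adequate for five coefficients, although `numpy.linalg.lstsq` would be the more robust choice in practice. R² and RMSE are computed here on the log scale; the slides do not say which scale their RMSE values refer to.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def build_design(o3: Sequence[float], temp: Sequence[float], sr: Sequence[float],
                 wind: Sequence[float], tau: int) -> Tuple[Matrix, List[float]]:
    """Rows [T(t-tau), SR(t-tau), W(t-tau), ln O3(t-1), 1] with target ln O3(t)."""
    X: Matrix = []
    y: List[float] = []
    for t in range(max(tau, 1), len(o3)):
        X.append([temp[t - tau], sr[t - tau], wind[t - tau], math.log(o3[t - 1]), 1.0])
        y.append(math.log(o3[t]))
    return X, y


def solve(A: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # pivot row
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_ols(X: Matrix, y: List[float]) -> Tuple[List[float], float, float]:
    """OLS via the normal equations (X^T X) beta = X^T y; returns (beta, R^2, RMSE)."""
    m = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(m)] for i in range(m)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(m)]
    beta = solve(XtX, Xty)
    pred = [sum(b * v for b, v in zip(beta, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot, math.sqrt(ss_res / len(y))
```

Following the flow chart, a separate model would be fitted for each season and zone by calling `fit_ols` on that subset's rows. The returned `beta` is ordered [α, β, γ, δ, ε].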

  32. Linear Regression • Linear fit of the form of Eq. (1); fitted coefficients for each cluster:
      Cluster | α | β | γ | δ | ε | R² | RMSE
      #1 | 0.002797 | 0.359812 | 0.021030 | 0.793794 | 0.176548 | 0.827 | 7.361
      #2 | 0.000206 | 0.241734 | -0.000247 | 0.871374 | 0.356038 | 0.890 | 5.142
      #3 | 0.002699 | 9.652480 | 0.027754 | 0.687160 | 0.311890 | 0.827 | 7.872

  33. Ozone Forecasting

  34. Ozone Forecasting
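The forecasting slides show results only. Assuming the model form ln O3(t) = αT(t−τ) + βSR(t−τ) + γW(t−τ) + δ ln O3(t−1) + ε, a one-step forecast exponentiates the predicted log-ozone. The sketch below uses the cluster #1 coefficients from the regression table, assuming its five coefficient columns map to α, β, γ, δ, ε in that order. The slides do not state input units, so any example inputs are hypothetical. Feeding each forecast back in as the previous value (`forecast_path`, a hypothetical name) is one possible multi-step scheme, not necessarily the authors'.

```python
import math
from typing import Dict, List, Tuple

# Cluster #1 coefficients from the regression table; the column-to-term
# mapping (alpha..epsilon) is an assumption about the original slide.
CLUSTER_1: Dict[str, float] = {"alpha": 0.002797, "beta": 0.359812, "gamma": 0.021030,
                               "delta": 0.793794, "epsilon": 0.176548}


def forecast_next(coef: Dict[str, float], temp_lag: float, sr_lag: float,
                  wind_lag: float, o3_prev: float) -> float:
    """One-step forecast: exp of the predicted ln O3(t)."""
    log_o3 = (coef["alpha"] * temp_lag + coef["beta"] * sr_lag
              + coef["gamma"] * wind_lag + coef["delta"] * math.log(o3_prev)
              + coef["epsilon"])
    return math.exp(log_o3)


def forecast_path(coef: Dict[str, float], met: List[Tuple[float, float, float]],
                  o3_start: float) -> List[float]:
    """Hypothetical multi-step scheme: each forecast becomes the next o3_prev.

    met holds one (temp_lag, sr_lag, wind_lag) tuple per step.
    """
    path: List[float] = []
    o3 = o3_start
    for temp_lag, sr_lag, wind_lag in met:
        o3 = forecast_next(coef, temp_lag, sr_lag, wind_lag, o3)
        path.append(o3)
    return path
```

Exponentiating a log-scale prediction estimates the median rather than the mean of ozone when the log-scale errors are roughly normal.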

  35. Closing Remarks • Ozone seasons can be distinguished using simple clustering techniques • Because it is data‐driven, temporal clustering of ozone time series can account for seasonal variability at different locations (its results can also be compared with or integrated into expert knowledge)

  36. Closing Remarks • Spatial cluster analysis of ozone time series can be used to develop linear regression models with high accuracy (R² of 0.827 to 0.890 across the three clusters), in addition to recognizing local ozone patterns • Spatial clustering allows more flexible forecasting (for example, when a monitoring station is out of service)
