Understanding Climate Change: A Data Driven Approach, Vipin Kumar (PowerPoint presentation)



SLIDE 1

NSF Expeditions in Computing

http://climatechange.cs.umn.edu

Understanding Climate Change: A Data Driven Approach

Vipin Kumar

University of Minnesota kumar@cs.umn.edu

SLIDE 2

March 4, 2014

Expeditions Team

  • Vipin Kumar, UM
  • Auroop Ganguly, NEU
  • Nagiza Samatova, NCSU
  • Arindam Banerjee, UM
  • Fred Semazzi, NCSU
  • Joe Knight, UM
  • Shashi Shekhar, UM
  • Peter Snyder, UM
  • Jon Foley, UM
  • Alok Choudhary, NW
  • Ankit Agrawal, NW
  • Abdollah Homaifar, NCA&T
  • Michael Steinbach, UM
  • Snigdhansu Chatterjee, UM
  • Karsten Steinhaeuser, UM
  • Stefan Liess, UM
  • Shyam Boriah, UM

SLIDE 3

Understanding Climate Change - Motivation

SLIDE 4

Understanding Climate Change – Physics-Based Approach

General Circulation Models: Mathematical models with physical equations based on fluid dynamics

Parameterization and non-linearity of differential equations are sources of uncertainty!

[Figure labels: Cell, Clouds, Land, Ocean. Figure courtesy: NCAR]

SLIDE 5

Understanding Climate Change - Physics Based Approach

General Circulation Models: Mathematical models with physical equations based on fluid dynamics

[Figure labels: Cell, Clouds, Land, Ocean. Figures courtesy: NCAR, ORNL]

SLIDE 6

Understanding Climate Change - Physics Based Approach

Projected temperature increase under different scenarios from the Special Report on Emissions Scenarios (SRES), from 24 GCM configurations at 16 research centers, as used in the Intergovernmental Panel on Climate Change (IPCC) 4th Assessment Report.

Figure Courtesy: ORNL

SLIDE 7

Physics based models are essential but insufficient

“The sad truth of climate science is that the most crucial information is the least reliable” (Nature, 2010)

– Relatively reliable predictions at global scale for ancillary variables such as temperature

– Least reliable predictions for variables that are crucial for impact assessment such as regional precipitation

Regional hydrology exhibits large variations among major IPCC model projections

Disagreement between IPCC models

Physics based models:

Low uncertainty     High uncertainty    Out of scope
Temperature         Hurricanes          Fires
Pressure            Extremes            Malaria outbreaks
Large-scale wind    Precipitation       Landslides

SLIDE 8

Data-Driven Knowledge Discovery in Climate Science

Transformation from Data-Poor to Data-Rich

  • Sensor Observations
  • Reanalysis Data
  • Model Simulations

A new and transformative data-driven approach that:

  • Makes use of the wealth of observational and simulation data
  • Advances understanding of climate processes
  • Informs climate change impacts and adaptation

“Climate change research is now ‘big science,’ comparable in its magnitude, complexity, and societal importance to human genomics and bioinformatics.” (Nature Climate Change, Oct 2012)

SLIDE 9

Need for data driven analysis

Low uncertainty     High uncertainty    Out of scope
Temperature         Global hurricanes   Global fires
Pressure            Extremes            Malaria outbreaks
Large-scale wind    Precipitation       Landslides

Additional figure labels: Global fires, Atlantic hurricanes, Global sea surface temperatures

SLIDE 10

Need for data driven analysis

Low uncertainty     High uncertainty    Out of scope
Temperature         Global hurricanes   Global fires
Pressure            Extremes            Malaria outbreaks
Large-scale wind    Precipitation       Landslides

Additional figure labels: Global fires, Atlantic hurricanes, Global sea surface temperatures

[Figure: Average June-October Atlantic Tropical Cyclones (1979-2010) by ENSO phase (El Niño, Neutral, La Niña); values shown: 14.4, 11.33, 8]

[Figure: Correlation with fires in Amazon (Chen et al., Science, 2011)]

[Figure: SST Anomaly Time Series in the ENSO region]

SLIDE 11

Challenges in data driven analysis

  • Spatio-temporal auto- and cross-correlation
  • Noisy, heterogeneous, and uncertain
  • Evolutionary processes
  • Multiple spatio-temporal scales
  • Unknown, non-linear, and long-range dependency structure
  • Variability
  • Class imbalance
  • Multivariate non-stationary
  • Large unlabeled datasets
  • Significance testing

Faghmous and Kumar (2013)

SLIDE 12

Guiding Theme

The discovery and characterization of patterns and dependencies have emerged as the primary research tasks because they…

  • 1. Provide an empirical understanding of physical processes…
      • finding the pressure dipole between Tahiti and Darwin led to the understanding of modulation of the Walker Circulation
  • 2. Allow for prediction of unknown quantities…
      • where observations are sparse
      • for statistical downscaling
      • where physical models are inadequate (e.g., predicting the number of hurricanes using a large number of covariates)
  • 3. Enable long-range projection of highly stochastic processes…
      • deriving climate extremes or hurricanes from low-resolution global model simulations

SLIDE 13

Project vision and scope

Process Understanding

Extreme Events

  • Heat Waves
  • Rainfall Extremes
  • Droughts
  • Hurricanes

Model Evaluation

Downscaling

  • Statistical
  • Dynamical

Ocean-Atm.-Land Interactions

Change Detection

  • Abrupt vs. Gradual
  • Point vs. Regions/Intervals
  • Change in Extremes

Spatio-Temporal Classification Sparse/High-Dim. Methods Causal Relationships Networks/Graphs HPC Computational Innovations Understanding Climate Change

Transformative Computer Science Research Advancing Climate Change Science

SLIDE 14

Pattern Mining: Ocean Eddies Monitoring

  • Scalable spatio-temporal pattern mining algorithms for noisy and continuous data
  • Novel multiple object tracking for uncertain features
  • Detect more accurate features and tracks for improved ocean dynamics monitoring
  • Open-source database of 20+ years of eddies and eddy tracks available for scientific applications

Faghmous et al. AAAI (2012a)
Faghmous et al. CIDU (2012b), Best Student Paper Award
Faghmous et al. AAAI (2013)

NSF Nordic Research Opportunity Grant to conduct research at the Bjerknes Centre for Climate Research in Norway

SLIDE 15

Network analysis: Climate teleconnections

Kawale et al. SDM (2011a)
Kawale et al. CIDU (2011b), Best Student Paper Award
Kawale et al. ACM SIGKDD (2012)
Steinhaeuser et al. Climate Dynamics (2012)

SC’11: Exploration in Science through Computation Award
Grace Hopper ’12: Best Poster Award (Winner of the ACM Student Research Competition)

  • Scalable method for discovering anti-correlated graph regions
  • Novel dynamic graph clustering for dense directed graphs
  • Significance testing for spatio-temporal patterns
  • Discovered previously unknown climate teleconnection
  • Analyzed climate network properties to better understand global climate dynamics
  • Method used to compare climate models

[Figure: Climate Network]

SLIDE 16

Predictive Modeling: Regression, Ensembles, Inference

Fu et al. UAI (2013)
Subbian et al. SDM (2013), Best Application Paper Award
Hsieh et al. NIPS (2012)
Wang et al. ICML (2012)
Chatterjee et al. SDM (2012), Best Student Paper Award
Fu et al. SDM (2012)

  • Hierarchical sparse regression: rates of convergence with low samples
  • Multi-task learning with spatial smoothing
  • Primal decomposition based LP solver for max-cut type problems (~10 million+ node graphs)
  • Regional land-climate predictions from observations over oceans
  • Combining multiple GCM outputs more accurately than state-of-art
  • Mega-drought detection, trends over past 100-1000 years

[Figure: Prediction RMSE from spatially smoothened multi-model ensemble]

[Figure: RMSE vs. Model Complexity of OLS and Sparse Regression Methods]
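As a rough illustration of the OLS versus sparse regression contrast on this slide, the sketch below fits a plain Lasso by cyclic coordinate descent. This is not the hierarchical sparse regression method from the cited papers; the function names, the toy design matrix, and the penalty value `lam=0.1` are all assumptions chosen for illustration.

```python
def hadamard_entry(i, j):
    """Entry (i, j) of a Sylvester-type Hadamard matrix: +1.0 or -1.0."""
    return -1.0 if bin(i & j).count("1") % 2 else 1.0


def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=50):
    """Minimize (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw for the initial w = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


# Toy data (hypothetical): 8 samples, 4 mutually orthogonal +/-1 features.
n = 8
X = [[hadamard_entry(i, j) for j in (1, 2, 3, 4)] for i in range(n)]
true_w = [3.0, -2.0, 0.05, 0.0]  # two strong effects, one weak, one absent
y = [sum(c * x for c, x in zip(true_w, row)) for row in X]

w = lasso_coordinate_descent(X, y, lam=0.1)
print(w)
```

With orthogonal features OLS would recover `true_w` exactly, while the L1 penalty shrinks the two strong coefficients by about `lam` and sets the weak and absent ones to zero, which is the sense in which the model becomes sparser.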

SLIDE 17

Relationship mining: Seasonal hurricane activity

  • Contrast-based network mining for discriminatory signatures
  • Novel dynamic graph clustering for dense directed graphs
  • Statistically robust methodology for automatic inference of modulating networks
  • Improved forecast skill for seasonal hurricane activity
  • Discovered key factors and mechanisms modulating NA hurricane variability
  • Discovered novel climate index with much improved correlation with NA hurricane variability: 0.69 vs 0.49

[Figure labels: High activity, Low activity]

NSF News, DOE Research News, Science360

Sencan et al. IJCAI (2011)
Pendse et al. SIAM SDM (2012)
Chen et al. Data Mining & Knowledge Discovery (2012)
Chen et al. SIAM SDM (2013)
Chen et al. IJCAI (2013)
Semazzi et al. in review at journal (2013)

SLIDE 18

Extremes and uncertainty: Heat waves, heavy rainfall, …

Ghosh et al. Nature Climate Change (2012)
Parish et al. Computers & Geosciences (2012)
Kodra et al. Environmental Research Letters (2012)
Ganguly et al. Climate Extremes & UQ: Book Ch. (2013)
Kodra et al. in revision at journal (2013)
Kumar et al. in review at journal (2013)

  • Extreme value theory in space-time and dependence of extremes on covariates
  • Mutual information and copula-methods for space-time extremes dependence
  • Uncertainty quantification with Bayesian and resampling techniques
  • Physics-guided data mining and quantification of uncertainty
  • Spatiotemporal trends in heat waves, cold snaps, and heavy rain with climate change
  • Climate model evaluation and physics-guided uncertainty quantification
  • Covariate-based improvement of extremes projections under climate change
  • Translation to adaptation and stakeholder relevant metrics

Press Release 11-266

JOURNAL PIECE REVEALS NEW DATA-DRIVEN METHODS FOR UNDERSTANDING CLIMATE CHANGE

Geographical variability of rainfall extremes in India enhances interpretation of climate change data

SLIDE 19

High Performance Tools and Methods

Jin et al. EuroMPI (2011)
Patwary et al. SC (2012)
Hentrix et al. HPC (2012)
Kumar et al. IPDPS (2011)
Rangel et al. in review (2013)
Jin et al. in review (2013)

  • Created a library of common data mining / machine learning kernels for clustering, classification, PCA, etc.
  • Many algorithms have shown speedups of two to three orders of magnitude.
  • Developed technologies for compressing and querying huge datasets, and for performing similarity searches with a more than 10-fold speed-up
  • Devised an image indexing technique based on a new Locality Sensitive Hashing (LSH) scheme.
  • Developing HPC solutions for our collaborators, including bootstrapping methods for extreme value prediction and Markov Random Field based abrupt change detection

[Figure: Improving I/O for the Global Cloud Resolving Model]

SLIDE 20

Case Study: Data-Driven Discovery of Dipoles

Dipoles represent a class of teleconnections characterized by anomalies of opposite polarity at two locations at the same time.
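A minimal sketch of what "opposite polarity at the same time" means numerically: remove the seasonal cycle from two monthly series and correlate the resulting anomalies. The two synthetic series, their pressure-like magnitudes, and the function names are assumptions for illustration only and are not taken from the talk.

```python
import math
from statistics import mean


def monthly_anomalies(series, period=12):
    """Subtract each calendar month's long-term mean from a monthly series."""
    climatology = [mean(series[m::period]) for m in range(period)]
    return [x - climatology[i % period] for i, x in enumerate(series)]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Synthetic data (hypothetical): ten years of monthly values at two sites.
# Both share a strong seasonal cycle; a slower oscillation enters with
# opposite sign at the two sites, which is the dipole-like part.
months = range(120)
seasonal = [10 * math.sin(2 * math.pi * t / 12) for t in months]
slow = [math.sin(2 * math.pi * t / 43) for t in months]
site_a = [1010 + s + 2 * x for s, x in zip(seasonal, slow)]
site_b = [1005 + s - 2 * x for s, x in zip(seasonal, slow)]

raw_r = pearson(site_a, site_b)
anom_r = pearson(monthly_anomalies(site_a), monthly_anomalies(site_b))
print(f"raw correlation: {raw_r:.2f}, anomaly correlation: {anom_r:.2f}")
```

The raw series look positively correlated because they share a seasonal cycle; only after anomalies are computed does the strong negative correlation of the dipole show up.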

SLIDE 21

Importance of Dipoles

[Figure panels: Correlation of land temperature anomalies with NAO; Correlation of land temperature anomalies with SOI]

SOI strongly influences global climate variability. NAO influences sea level pressure (SLP) and temperature over the Northern Hemisphere.

Dipoles are crucial for understanding the climate system and are known to cause temperature and precipitation anomalies throughout the globe.

SLIDE 22

List of Major Climate Oscillations

[Figure panels: AO: EOF Analysis of 20N-90N Latitude; AAO: EOF Analysis of 20S-90S Latitude]

Discovered primarily by human observation or by EOF analysis.

van Loon & Rogers, 1978
Wallace & Gutzler, 1981
von Storch & Zwiers, 2002

SLIDE 23

Motivation for Automatic Discovery of Dipoles

  • The known dipoles are defined by static locations but the underlying phenomenon is dynamic
  • Manual discovery can miss many dipoles
  • EOF and other types of eigenvector analysis find the strongest signals, and the physical interpretation of those can be difficult.

[Figure: Dynamic behavior of the high and low pressure fields corresponding to the NAO climate index (Portis et al., 2001)]

[Figure panels: AO: EOF Analysis of 20N-90N Latitude; AAO: EOF Analysis of 20S-90S Latitude]

SLIDE 24

Challenges in studying dipoles

  • The distribution of positive and negative edges around the Earth is uneven, as most of the highly positive edges come from nearby locations due to spatial autocorrelation. The area-weighted correlation shows that the equator is dominant.
  • If we remove all edges between locations less than 5000 km apart, the distribution is balanced.
  • The number of negative edges around the globe is very high, so an algorithm focusing on negative edges will not scale.

Figure panels:

  • Distribution of edges around the Earth
  • Distribution of edges > 5000 km away
  • Distribution of negative edges
  • Distribution of edges around the Earth with abs correlation > 0.5
  • Distribution of edges around the Earth having a distance > 5000 km and abs correlation > 0.2
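The distance and correlation thresholds mentioned on this slide can be sketched as an edge filter using the haversine great-circle distance. The station coordinates are approximate and the correlation values are made up for illustration; `long_range_edges` is a hypothetical helper, not code from the project.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def long_range_edges(edges, coords, min_km=5000.0, min_abs_corr=0.2):
    """Keep edges whose endpoints are far apart and whose |correlation| is large."""
    kept = []
    for u, v, corr in edges:
        if abs(corr) <= min_abs_corr:
            continue  # too weak
        if great_circle_km(*coords[u], *coords[v]) <= min_km:
            continue  # likely explained by spatial autocorrelation
        kept.append((u, v, corr))
    return kept


# Approximate coordinates (lat, lon); correlation values are illustrative only.
coords = {
    "tahiti": (-17.65, -149.43),
    "darwin": (-12.46, 130.84),
    "near_tahiti": (-17.0, -148.0),
}
edges = [
    ("tahiti", "darwin", -0.6),      # far apart, strongly negative: kept
    ("tahiti", "near_tahiti", 0.9),  # strong but nearby: dropped
    ("darwin", "near_tahiti", -0.1), # far apart but weak: dropped
]

kept = long_range_edges(edges, coords)
print(kept)
```

Filtering on distance first removes the bulk of the highly positive short-range edges, which is the effect the second bullet above describes.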

SLIDE 25

Graph-Based Approach for Dipole Discovery

Nodes in the graph correspond to grid points on the globe. Edge weight corresponds to correlation between the two anomaly time series.

[Figures: Climate Network; Discovered Dipoles]

Steinbach et al., 2003
Tsonis et al., 2004, 2006
Donges et al., 2009a,b
Kawale et al., 2011
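The graph construction described on this slide (grid points as nodes, correlation of anomaly series as edge weight) can be sketched as below. This covers only the network-building step, not the dipole discovery algorithm of the cited papers; the node names and toy anomaly series are assumptions.

```python
import math
from itertools import combinations
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def build_climate_network(anomalies):
    """Return {(u, v): correlation} for every unordered pair of nodes.

    `anomalies` maps a node name (a grid point) to its anomaly time series.
    """
    return {
        (u, v): pearson(anomalies[u], anomalies[v])
        for u, v in combinations(sorted(anomalies), 2)
    }


# Toy anomaly series (hypothetical): A1 and A2 move together, B1 mirrors A1.
base = [math.sin(0.3 * t) for t in range(60)]
other = [math.cos(0.17 * t) for t in range(60)]
anomalies = {
    "A1": base,
    "A2": [0.9 * x + 0.1 * y for x, y in zip(base, other)],
    "B1": [-x for x in base],
    "C1": other,
}

network = build_climate_network(anomalies)
strongest_negative = min(network, key=network.get)
print(strongest_negative, round(network[strongest_negative], 3))
```

Building every pair costs quadratic time in the number of grid points, which is one reason the talk stresses scalable methods for real global grids.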

SLIDE 26

Benefits of Automatic Dipole Discovery

  • Detection of Global Dipole Structure
  • Most known dipoles discovered
  • New dipoles may represent previously unknown phenomena.
  • Enables analysis of relationships between different dipoles
  • Location based definition possible for some known indices that are defined using EOF analysis.
  • Dynamic versions are often better than static
  • Dipole structure provides an alternate method to analyze GCM performance

CIDU’11: Best Student Paper Award
SC’11: Explorations in Science through Computation Award
Grace Hopper’12: Best Poster Award (Winner of the ACM Student Research Competition)

Kawale et al., 2011a,b, 2012

SLIDE 27

Comparing Dipole Structure in Historical (Reanalysis) Data

[Figure panels: NCEP, ERA-Interim, JRA-25, MERRA (each 1979-2000)]

SLIDE 28

Static vs Dynamic NAO Index - Impact on land temperature

The dynamic index generates a stronger impact on land temperature anomalies as compared to the static index.

The figure to the right shows the aggregate area-weighted correlation for networks computed for different 20-year periods during 1948-2008.

Area-weighted Score
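The slide does not spell out how the area-weighted score is computed. One common convention on a regular latitude-longitude grid, assumed here, is to weight each grid point by the cosine of its latitude, since cells shrink toward the poles. The function name and sample values are hypothetical.

```python
import math


def area_weighted_score(points):
    """Cosine-latitude weighted mean of |correlation|.

    `points` is an iterable of (latitude_degrees, correlation) pairs, one per
    grid point. The cos(latitude) weight is an assumed convention, not a
    definition taken from the slides.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for lat, corr in points:
        weight = math.cos(math.radians(lat))
        weighted_sum += weight * abs(corr)
        weight_total += weight
    return weighted_sum / weight_total


# Two illustrative grid points: a weak equatorial value and a strong
# high-latitude value. The plain mean of |corr| would be 0.5.
sample = [(0.0, 0.2), (60.0, -0.8)]
score = area_weighted_score(sample)
print(round(score, 3))
```

Because the point at 60 degrees covers half the area of the equatorial one, its strong correlation counts for less, and the score falls below the unweighted mean.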

SLIDE 29

Static vs Dynamic NAO Index - Impact on land temperature

The dynamic index generates a stronger impact on land temperature anomalies as compared to the static index.

The figure to the right shows the aggregate area-weighted correlation for networks computed for different 20-year periods during 1948-2008.

Area-weighted Score

SLIDE 30

Location Based definition of AO

  • Mean correlation between static and dynamic index: 0.84
  • Impact on land temperature anomalies is comparable using the static and dynamic index

Static AO: EOF Analysis of 20N-90N Latitude

[Figure: Impact on land temperature anomalies using static and dynamic AO; panels: EOF-AO, Dynamic Dipole-AO]

Composite maps for time series from both approaches on Hadley Center SLP data (1979-2011).

SLIDE 31

Location Based definition of AAO

  • Mean correlation between static and dynamic index: 0.88
  • Impact on land temperature anomalies is comparable using the static and dynamic index

Static AAO: EOF Analysis of 20S-90S Latitude

[Figure: Impact on land temperature anomalies using static and dynamic AAO; panels: EOF-AAO, Dynamic Dipole-AAO]

Composite maps for time series from both approaches on Hadley Center SLP data (1979-2011).

SLIDE 32

A New Dipole near Australia?

  • Comparison of dipoles by looking at land temperature impact.
  • The AAO impact differs significantly from that of dipoles 1, 2, and 3, whose impacts are similar to one another.

[Figure panels: dipoles 1, 2, 3 and AAO]

SLIDE 33

Composites of ASO dipole from Hadley Center SLP data on Hadley Center SLP at 95% confidence

[Figure panels: ASO, AAO, SOI; whole year and southern winter (JJA)]

SLIDE 34

Composites of ASO dipole from Hadley Center SLP data on GPCP precipitation data at 95% confidence

[Figure panels: ASO, AAO, SOI; whole year and southern winter (JJA)]

SLIDE 35

Model Analysis: Dipole Structure in CCSM and GFDL

The dipole structure of the top 2 models from CMIP3 to CMIP5

[Figure panels, in order: GFDL (CMIP3), CCSM (CMIP3), CCSM (CMIP5), GFDL (CMIP5)]

[Panel annotations, in order: SOI present, SOI absent, SOI present, SOI present]

SLIDE 36

saurabh’s plots

SLIDE 37

SLIDE 38

Surface Temperature Correlated with SOI

Figure panels:

  • Reanalysis: NCEP2, JRA
  • CMIP3: CCSM3, MIROC-3.2medres, GFDL-CM2.1
  • CMIP5: CCSM4, MIROC5, GFDL-CM3

SLIDE 39

Precipitation Correlated with SOI

Figure panels:

  • Observation: GPCP
  • CMIP3: CCSM3, MIROC-3.2medres, GFDL-CM2.1
  • CMIP5: CCSM4, MIROC5, GFDL-CM3

SLIDE 40

Conclusion

  • Global climate change is a defining societal challenge for our generation
  • Data-guided discovery methods can play a major role in answering some of these key questions
  • Significant advances in spatio-temporal data mining methodologies are needed to harness these opportunities