Introduction to Geostatistics Abhi Datta 1 , Sudipto Banerjee 2 and - PowerPoint PPT Presentation

Introduction to Geostatistics Abhi Datta 1 , Sudipto Banerjee 2 and Andrew O. Finley 3 July 31, 2017 1 Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland. 2 Department of Biostatistics, Fielding School of Public Health, University of California, Los Angeles. 3 Departments of Forestry and Geography, Michigan State University, East Lansing, Michigan.

• Course materials available at https://abhirupdatta.github.io/spatstatJSM2017/

What is spatial data? • Any data with some geographical information 2

What is spatial data? • Any data with some geographical information • Common sources of spatial data: climatology, forestry, ecology, environmental health, disease epidemiology, real estate marketing etc • have many important predictors and response variables • are often presented as maps 2

What is spatial data? • Any data with some geographical information • Common sources of spatial data: climatology, forestry, ecology, environmental health, disease epidemiology, real estate marketing etc • have many important predictors and response variables • are often presented as maps • Other examples where spatial need not refer to space on earth: • Neuroimaging (data for each voxel in the brain) • Genetics (position along a chromosome) 2

Point-referenced spatial data • Each observation is associated with a location (point) • Data represents a sample from a continuous spatial domain • Also referred to as geocoded or geostatistical data 2500 ● 2000 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 100 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 80 ● ● ● ● ● ● Northing (km) ● ● ● ● ● ● ● ● ● 1500 ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● 60 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● 40 ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 20 ● ● ● ● ● ● ● 1000 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 500 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 0 500 1000 1500 2000 2500 Easting (km) Figure: Pollutant levels in Europe in March, 2009 3

Point level modeling • Point-level modeling refers to modeling of point-referenced data collected at locations referenced by coordinates (e.g., lat-long, Easting-Northing). • Data from a spatial process { Y ( s ) : s ∈ D } , D is a subset in Euclidean space. • Example: Y ( s ) is a pollutant level at site s • Conceptually: Pollutant level exists at all possible sites • Practically: Data will be a partial realization of a spatial process – observed at { s 1 , . . . , s n } • Statistical objectives: Inference about the process Y ( s ); predict at new locations. • Remarkable: Can learn about entire Y ( s ) surface. The key: Structured dependence 4

Exploratory data analysis (EDA): Plotting the data • A typical setup: Data observed at n locations { s 1 , . . . , s n } • At each s i we observe the response y ( s i ) and a p × 1 vector of covariates x ( s i ) ′ • Surface plots of the data often helps to understand spatial patterns y ( s ) x ( s ) Figure: Response and covariate surface plots for Dataset 1 5

What’s so special about spatial? • Linear regression model: y ( s i ) = x ( s i ) ′ β + ǫ ( s i ) • ǫ ( s i ) are iid N (0 , τ 2 ) errors • y = ( y ( s 1 ) , y ( s 2 ) , . . . , y ( s n )) ′ ; X = ( x ( s 1 ) ′ , x ( s 2 ) ′ , . . . , x ( s n ) ′ ) ′ • Inference: ˆ β = ( X ′ X ) − 1 X ′ Y ∼ N ( β, τ 2 ( X ′ X ) − 1 ) • Prediction at new location s 0 : � y ( s 0 ) = x ( s 0 ) ′ ˆ β • Although the data is spatial, this is an ordinary linear regression model 6

Residual plots • Surface plots of the residuals ( y ( s ) − � y ( s )) help to identify any spatial patterns left unexplained by the covariates Figure: Residual plot for Dataset 1 after linear regression on x ( s ) 7

Residual plots • Surface plots of the residuals ( y ( s ) − � y ( s )) help to identify any spatial patterns left unexplained by the covariates Figure: Residual plot for Dataset 1 after linear regression on x ( s ) • No evident spatial pattern in plot of the residuals • The covariate x ( s ) seem to explain all spatial variation in y ( s ) • Does a non-spatial regression model always suffice? 7

Western Experimental Forestry (WEF) data • Data consist of a census of all trees in a 10 ha. stand in Oregon • Response of interest: Diameter at breast height (DBH) • Covariate: Tree species (Categorical variable) DBH Species Residuals 8

Western Experimental Forestry (WEF) data • Data consist of a census of all trees in a 10 ha. stand in Oregon • Response of interest: Diameter at breast height (DBH) • Covariate: Tree species (Categorical variable) DBH Species Residuals • Local spatial patterns in the residual plot • Simple regression on species seems to be not sufficient 8

More EDA • Besides eyeballing residual surfaces, how to do more formal EDA to identify spatial pattern ? 9

More EDA • Besides eyeballing residual surfaces, how to do more formal EDA to identify spatial pattern ? First law of geography ”Everything is related to everything else, but near things are more related than distant things.” – Waldo Tobler 9

More EDA • Besides eyeballing residual surfaces, how to do more formal EDA to identify spatial pattern ? First law of geography ”Everything is related to everything else, but near things are more related than distant things.” – Waldo Tobler • In general ( Y ( s + h ) − Y ( s )) 2 roughly increasing with || h || will imply a spatial correlation • Can this be formalized to identify spatial pattern? 9

Empirical semivariogram • Binning: Make intervals I 1 = (0 , m 1 ), I 2 = ( m 1 , m 2 ), and so forth, up to I K = ( m K − 1 , m K ). Representing each interval by its midpoint t k , we define: N ( t k ) = { ( s i , s j ) : � s i − s j � ∈ I k } , k = 1 , . . . , K . • Empirical semivariogram: � 1 ( Y ( s i ) − Y ( s j )) 2 γ ( t k ) = 2 | N ( t k ) | s i , s j ∈ N ( t k ) • For spatial data, the γ ( t k ) is expected to roughly increase with t k • A flat semivariogram would suggest little spatial variation 10

Empirical variogram: Data 1 y residuals • Residuals display little spatial variation 11

Empirical variograms: WEF data • Regression model: DBH ∼ Species DBH Residuals • Variogram of the residuals confirm unexplained spatial variation 12

Modeling with the locations • When purely covariate based models does not suffice, one needs to leverage the information from locations • General model using the locations: y ( s ) = x ( s ) ′ β + w ( s ) + ǫ ( s ) for all s ∈ D • How to choose the function w ( · )? • Since we want to predict at any location over the entire domain D , this choice will amount to choosing a surface w ( s ) • How to do this ? 13

Gaussian Processes (GPs) • One popular approach to model w(s) is via Gaussian Processes (GP) • The collection of random variables { w ( s ) | s ∈ D } is a GP if • it is a valid stochastic process • all finite dimensional densities { w ( s 1 ) , . . . , w ( s n ) } follow multivariate Gaussian distribution • A GP is completely characterized by a mean function m ( s ) and a covariance function C ( · , · ) • Advantage: Likelihood based inference. w = ( w ( s 1 ) , . . . , w ( s n )) ′ ∼ N ( m , C ) where m = ( m ( s 1 ) , . . . , m ( s n )) ′ and C = C ( s i , s j ) 14

Valid covariance functions and isotropy • C ( · , · ) needs to be valid. For all n and all { s 1 , s 2 , ..., s n } , the resulting covariance matrix C ( s i , s j ) for ( w ( s 1 ) , w ( s 2 ) , ..., w ( s n )) must be positive definite • So, C ( · , · ) needs to be a positive definite function • Simplifying assumptions: • Stationarity: C ( s 1 , s 2 ) only depends on h = s 1 − s 2 (and is denoted by C ( h )) • Isotropic: C ( h ) = C ( || h || ) • Anisotropic: Stationary but not isotropic • Isotropic models are popular because of their simplicity, interpretability, and because a number of relatively simple parametric forms are available as candidates for C . 15

Some common isotropic covariance functions Model Covariance function, C ( t ) = C ( || h || )   0 if t ≥ 1 /φ   σ 2 � 2 ( φ t ) 3 � 1 − 3 2 φ t + 1 Spherical C ( t ) = if 0 < t ≤ 1 /φ    τ 2 + σ 2 otherwise � σ 2 exp( − φ t ) if t > 0 Exponential C ( t ) = τ 2 + σ 2 otherwise � σ 2 exp( −| φ t | p ) Powered if t > 0 C ( t ) = τ 2 + σ 2 exponential otherwise � σ 2 (1 + φ t ) exp( − φ t ) Mat´ ern if t > 0 C ( t ) = τ 2 + σ 2 at ν = 3 / 2 otherwise 16

Introduction to Geostatistics Abhi Datta 1 , Sudipto Banerjee 2 and - PowerPoint PPT Presentation

Introduction to Geostatistics Abhi Datta 1 , Sudipto Banerjee 2 and Andrew O. Finley 3 July 31, 2017 1 Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland. 2 Department of Biostatistics,

Geostatistics for Gaussian processes Hans Wackernagel Geostatistics group MINES ParisTech

Model-based Geostatistics Peter J Diggle Lancaster University and Johns Hopkins University

Thoughts on Presentations Related to Geostatistics Clayton V. Deutsc h Professor and Director of

GeoNeurale of Munich and Geovariances at the UTG - Augsburg, Germany Geostatistics: the

Geostatistics Portal the platform for census data geovisualization Mirosaw Migacz Chief GIS

MODEL BASED GEOSTATISTICS (Course Slides) Presented by: Paulo Justiniano Ribeiro Jr 1

Can the SPDE approach replace traditional Geostatistics for industrial applications? N. Desassis,

Gaussian Process Lei Tang Arizona State University Jul. 31th, 2007 Lei Tang (ASU) Gaussian

PRELIMINARY RESULTS ON 3D GEOLOGIC MAP OF SLOVAKIA A Case Study on Using the Moving

INTRODUCTION INTRODUCTION INTRODUCTION INTRODUCTION INTRODUCTION INTRODUCTION INTRODUCTION

Introduction ATV Introduction A T V Introduction A lphabet T V Introduction A lphabet

Brief Brief Introduction Introduction Brief Brief Introduction Introduction Zhengzhou

Brief Brief Introduction Introduction Brief Brief Introduction Introduction Zhengzhou

Shenzhen Cuilu jewelry Co., Ltd was founded in 1996 and its a large private enterprise

Introduction to Machine Learning Introduction to Machine Learning Introduction to Machine

Spectrum Painting Richard Shipman MW0RCZ ADARS 6th Jan 2020 Introduction Introduction

Augmenting Polygons with Matchings Alexander Pilz, Jonathan Rollin, Lena Schlipf, Andr e

From Fuzzification and Resulting Formalism: Idea K -Vectors Towards K -Covectors Intervalization

Steins Method and Stochastic Geometry Giovanni Peccati (Luxembourg University) Firenze 16

From Cluster Algebras to Quiver Grassmannians Dylan Rupel Michigan State University April 26,

More on kernels Marcel Lthi Graphics and Vision Research Group Department of Mathematics and

Course on Inverse Problems Albert Tarantola Lesson VIII: Monte Carlo Methods Monte Carlo

sr t rtt

Spherical Designs and Determinantal Point Processes Masatake HIRAO ( ) (Aichi