
SLIDE 1

Basics of Geographic Analysis in R

Spatial Autocorrelation and Spatial Weights Yuri M. Zhukov

GOV 2525: Political Geography

February 25, 2013

SLIDE 2

Outline

  1. Introduction
  2. Spatial Data and Basic Visualization in R
  3. Spatial Autocorrelation
  4. Spatial Weights
  5. Spatial Regression
SLIDE 3

What is Spatial Autocorrelation?

◮ Spatial autocorrelation measures the degree to which a phenomenon of interest is correlated with itself in space.

◮ Tests of spatial autocorrelation examine whether the observed value of a variable at one location is independent of values of that variable at neighboring locations.

◮ Positive spatial autocorrelation indicates that similar values appear close to each other, or cluster, in space.

◮ Negative spatial autocorrelation indicates that neighboring values are dissimilar or, equivalently, that similar values are dispersed.

◮ Null spatial autocorrelation indicates that the spatial pattern is random.

SLIDE 4

What is Spatial Autocorrelation?

[Figure: three example maps, labeled positive autocorrelation, negative autocorrelation, and no autocorrelation]

SLIDE 5

Global autocorrelation: Moran’s I

◮ The Moran’s I coefficient calculates the ratio of the cross-product of the variable of interest with its spatial lag to the variation of the variable itself, adjusted for the spatial weights used:

I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}(y_i - \bar{y})(y_j - \bar{y})}{\sum_{i=1}^{n}(y_i - \bar{y})^2}

◮ where y_i is the value of a variable for the ith observation, \bar{y} is the sample mean and w_{ij} is the spatial weight of the connection between i and j.

◮ Values range from −1 (perfect dispersion) to +1 (perfect correlation). A zero value indicates a random spatial pattern.

◮ Under the null hypothesis of no autocorrelation, E[I] = \frac{-1}{n-1}.
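The formula translates directly into code. Below is a minimal Python sketch of global Moran’s I (purely illustrative: the examples accompanying this lecture are in R, where packages such as spdep provide the statistic with proper inference; the four-unit chain weights matrix here is a made-up toy example):

```python
import numpy as np

def moran_i(y, W):
    """Global Moran's I: (n / sum of weights) * cross-product ratio."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    d = y - y.mean()                      # deviations from the sample mean
    num = (W * np.outer(d, d)).sum()      # sum_ij w_ij (y_i - ybar)(y_j - ybar)
    return (n / W.sum()) * num / (d @ d)

# Four units on a line, binary adjacency (rook-style chain)
W = np.array([[0., 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
print(moran_i([1, 2, 3, 4], W))  # smooth trend along the chain -> I = 1/3
```

A perfectly alternating pattern on the same chain gives I = −1 (perfect dispersion), matching the stated range.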

SLIDE 6

Global autocorrelation: Moran’s I

◮ Calculating the variance of Moran’s I is a little more involved:

Var(I) = \frac{n s_1 - s_2 s_3}{(n-1)(n-2)(n-3)\left(\sum_i \sum_j w_{ij}\right)^2} - E[I]^2

s_1 = (n^2 - 3n + 3)\,\frac{1}{2}\sum_i\sum_j (w_{ij} + w_{ji})^2 - n \sum_i \Big(\sum_j w_{ij} + \sum_j w_{ji}\Big)^2 + 3\Big(\sum_i\sum_j w_{ij}\Big)^2

s_2 = \frac{n^{-1}\sum_i (y_i - \bar{y})^4}{\left(n^{-1}\sum_i (y_i - \bar{y})^2\right)^2}

s_3 = (n^2 - n)\,\frac{1}{2}\sum_i\sum_j (w_{ij} + w_{ji})^2 - 2n \sum_i \Big(\sum_j w_{ij} + \sum_j w_{ji}\Big)^2 + 6\Big(\sum_i\sum_j w_{ij}\Big)^2
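Because the randomization moments are exact, the formula can be verified by brute force: over all n! relabelings of y, the mean of I is −1/(n − 1) and the variance matches Var(I). A Python sketch (illustrative, not the lecture’s R code), with s1, s2, s3 as defined above and the usual −E[I]² correction included:

```python
import itertools
import numpy as np

def moran_i(y, W):
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    return (len(y) / W.sum()) * (W * np.outer(d, d)).sum() / (d @ d)

def moran_var(y, W):
    """Randomization variance of Moran's I (Cliff & Ord)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    d = y - y.mean()
    S0 = W.sum()
    S1 = 0.5 * ((W + W.T) ** 2).sum()                  # (1/2) sum (w_ij + w_ji)^2
    S2 = ((W.sum(axis=1) + W.sum(axis=0)) ** 2).sum()  # sum_i (row_i + col_i)^2
    s1 = (n * n - 3 * n + 3) * S1 - n * S2 + 3 * S0 ** 2
    s2 = (d ** 4).mean() / (d ** 2).mean() ** 2        # sample kurtosis
    s3 = (n * n - n) * S1 - 2 * n * S2 + 6 * S0 ** 2
    EI = -1.0 / (n - 1)
    return (n * s1 - s2 * s3) / ((n - 1) * (n - 2) * (n - 3) * S0 ** 2) - EI ** 2

W = np.array([[0., 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
y = [1, 2, 3, 4]
perm_I = [moran_i(p, W) for p in itertools.permutations(y)]
print(moran_var(y, W), np.var(perm_I))  # both 8/45 = 0.1777...
```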

SLIDE 7

Global autocorrelation: Geary’s C

◮ Geary’s C uses the sum of squared differences between pairs of data values as its measure of covariation:

C = \frac{(n-1)\sum_i\sum_j w_{ij}(y_i - y_j)^2}{2\left(\sum_i\sum_j w_{ij}\right)\sum_i (y_i - \bar{y})^2}

◮ where y_i is the value of a variable for the ith observation, \bar{y} is the sample mean and w_{ij} is the spatial weight of the connection between i and j.

◮ Values range from 0 (perfect correlation) to 2 (perfect dispersion). A value of 1 indicates a random spatial pattern.
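A matching Python sketch for Geary’s C, on the same toy four-unit chain (illustrative only):

```python
import numpy as np

def geary_c(y, W):
    """Geary's C: squared pairwise differences over total variation."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    d = y - y.mean()
    diff2 = (y[:, None] - y[None, :]) ** 2    # (y_i - y_j)^2 for all pairs
    return (n - 1) * (W * diff2).sum() / (2 * W.sum() * (d @ d))

W = np.array([[0., 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
print(geary_c(np.array([1, 2, 3, 4]), W))  # smooth trend -> C = 0.3 < 1
```

An alternating pattern on the same chain gives C = 1.5 > 1, consistent with dispersion.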

SLIDE 8

Global autocorrelation: Join Counts

◮ When the variable of interest is categorical, a join count analysis can be used to assess the degree of clustering or dispersion.

◮ A binary variable is mapped in two colors (Black & White), such that a join, or edge, is classified as either WW (0-0), BB (1-1), or BW (1-0).

◮ Join count statistics can show
  ◮ positive spatial autocorrelation (clustering) if the number of BW joins is significantly lower than what we would expect by chance,
  ◮ negative spatial autocorrelation (dispersion) if the number of BW joins is significantly higher than what we would expect by chance,
  ◮ null spatial autocorrelation (random pattern) if the number of BW joins is approximately the same as what we would expect by chance.

SLIDE 9

Global autocorrelation: Join Counts

◮ By the naive definition of probability, if we have n_B Black units and n_W = n − n_B White units, the respective probabilities of observing the two types of units are:

P_B = \frac{n_B}{n} \qquad P_W = \frac{n - n_B}{n} = 1 - P_B

◮ The probabilities of BB and WW in two adjacent cells are

P_{BB} = P_B P_B = P_B^2 \qquad P_{WW} = (1 - P_B)(1 - P_B) = (1 - P_B)^2

◮ The probability of BW in two adjacent cells is

P_{BW} = P_B(1 - P_B) + (1 - P_B)P_B = 2P_B(1 - P_B)

SLIDE 10

Global autocorrelation: Join Counts

◮ The expected counts of each type of join are:

E[BB] = \frac{1}{2}\sum_i\sum_j w_{ij} P_B^2 \qquad E[WW] = \frac{1}{2}\sum_i\sum_j w_{ij}(1 - P_B)^2 \qquad E[BW] = \frac{1}{2}\sum_i\sum_j w_{ij}\, 2P_B(1 - P_B)

◮ where \frac{1}{2}\sum_i\sum_j w_{ij} is the total number of joins (of any type) on a map, assuming a binary connectivity matrix.

◮ The observed counts are:

BB = \frac{1}{2}\sum_i\sum_j w_{ij} y_i y_j \qquad WW = \frac{1}{2}\sum_i\sum_j w_{ij}(1 - y_i)(1 - y_j) \qquad BW = \frac{1}{2}\sum_i\sum_j w_{ij}(y_i - y_j)^2

◮ where y_i = 1 if unit i is Black and y_i = 0 if White.
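A Python sketch of both the observed counts and the free-sampling expectations, again on a made-up four-cell chain with binary 0/1 weights (illustrative only):

```python
import numpy as np

def join_counts(y, W):
    """Observed BB, WW, BW counts and their free-sampling expectations."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    J = 0.5 * W.sum()                     # total number of joins
    pB = y.sum() / n
    bb = 0.5 * (W * np.outer(y, y)).sum()
    ww = 0.5 * (W * np.outer(1 - y, 1 - y)).sum()
    bw = 0.5 * (W * (y[:, None] - y[None, :]) ** 2).sum()
    expected = {"BB": J * pB ** 2, "WW": J * (1 - pB) ** 2,
                "BW": J * 2 * pB * (1 - pB)}
    return {"BB": bb, "WW": ww, "BW": bw}, expected

W = np.array([[0., 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
obs, exp = join_counts([1, 1, 0, 0], W)   # two Black cells clustered at one end
print(obs, exp)  # observed BW = 1 vs. expected BW = 1.5 (mild clustering)
```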

SLIDE 11

Global autocorrelation: Join Counts

◮ The variance of BW is calculated as

\sigma^2_{BW} = E[BW^2] - E[BW]^2

= \frac{1}{4}\left[\frac{2 s_2\, n_B(n - n_B)}{n(n-1)} + \frac{(s_3 - s_1)\, n_B(n - n_B)}{n(n-1)} + \frac{4(s_1^2 + s_2 - s_3)\, n_B(n_B - 1)(n - n_B)(n - n_B - 1)}{n(n-1)(n-2)(n-3)}\right] - E[BW]^2

s_1 = \sum_i\sum_j w_{ij} \qquad s_2 = \frac{1}{2}\sum_i\sum_j (w_{ij} + w_{ji})^2 \qquad s_3 = \sum_i \Big(\sum_j w_{ij} + \sum_j w_{ji}\Big)^2

SLIDE 12

Global autocorrelation: Join Counts

◮ A test statistic for the BW join count is

Z(BW) = \frac{BW - E[BW]}{\sqrt{\sigma^2_{BW}}}

◮ The join count statistic is assumed to be asymptotically normally distributed under the null hypothesis of no spatial autocorrelation.

◮ The test of significance is then provided by evaluating the BW statistic as a standard deviate (Cliff and Ord, 1981).
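An alternative to the analytic variance, not shown in the slides, is to simulate the reference distribution by shuffling the Black/White labels and recounting BW joins. Note that shuffling holds n_B fixed (non-free sampling), so the simulated mean will differ from the free-sampling E[BW] given earlier. A hedged Python sketch:

```python
import random
import numpy as np

def bw_count(y, W):
    y = np.asarray(y, dtype=float)
    return 0.5 * (W * (y[:, None] - y[None, :]) ** 2).sum()

def bw_permutation_z(y, W, reps=5000, seed=0):
    """z-score of the observed BW count against shuffled B/W labels."""
    rng = random.Random(seed)
    labels = list(y)
    sims = []
    for _ in range(reps):
        rng.shuffle(labels)               # keeps n_B fixed (non-free sampling)
        sims.append(bw_count(labels, W))
    sims = np.array(sims)
    return (bw_count(y, W) - sims.mean()) / sims.std()

W = np.array([[0., 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
print(bw_permutation_z([1, 1, 0, 0], W))  # negative: fewer BW joins than chance
```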

SLIDE 13

Local autocorrelation

◮ Global tests for spatial autocorrelation are calculated from local relationships between observed values at spatial units and their neighbors.

◮ It is possible to break these measures down into their components, thus constructing local tests for spatial autocorrelation.

◮ These tests can be used to detect
  ◮ Clusters, or units with similar neighbors
  ◮ Enclaves, or units with dissimilar neighbors

SLIDE 14

Local autocorrelation

Below is a scatterplot of county vote for Obama and its spatial lag (average vote received in neighboring counties). The Moran’s I coefficient is drawn as the slope of the linear relationship between the two. The plot is partitioned into four quadrants: low-low, low-high, high-low and high-high.

[Figure: Moran scatterplot of percent for Obama (x-axis, 20 to 100) against its spatial lag (y-axis, 20 to 100), with labeled counties: Durham, Edgecombe, Hertford, Mecklenburg, Northampton, Orange, Person, Warren, Watauga, Yadkin]

SLIDE 15

Local autocorrelation: Local Moran’s I

◮ A local Moran’s I coefficient for unit i can be constructed as one of the n components which comprise the global test:

I_i = \frac{(y_i - \bar{y}) \sum_{j=1}^{n} w_{ij}(y_j - \bar{y})}{\sum_{i=1}^{n}(y_i - \bar{y})^2 / n}

◮ As with global statistics, we assume that the global mean \bar{y} is an adequate representation of the variable of interest.

◮ As before, local statistics can be tested for divergence from expected values, under assumptions of normality.
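A Python sketch of the decomposition (illustrative only); a convenient sanity check is that the I_i sum to the total weight Σ_ij w_ij times the global I:

```python
import numpy as np

def local_moran(y, W):
    """Local Moran's I_i; the I_i sum to (sum of weights) * global I."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    d = y - y.mean()
    m2 = (d @ d) / n                      # denominator: average squared deviation
    return d * (W @ d) / m2

W = np.array([[0., 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
Ii = local_moran([1, 2, 3, 4], W)
print(Ii, Ii.sum())  # components sum to S0 * global I = 6 * (1/3) = 2
```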

SLIDE 16

Local autocorrelation: Local Moran’s I

Below is a plot of Local Moran |z|-scores for the 2008 Presidential Elections. Higher absolute values of z-scores (red) indicate the presence of “enclaves”, where the percentage of the vote received by Obama was significantly different from that in neighboring counties.

[Figure: map of Local Moran’s I |z| scores, color scale from 1 to 7]

SLIDE 17

Words of Caution

  1. By themselves, spatial autocorrelation tests do not always produce useful insights into the data-generating process (DGP).

SLIDE 18

Words of Caution

  1. By themselves, spatial autocorrelation tests do not always produce useful insights into the DGP.
  2. These tests are also highly sensitive to one’s choice of spatial weights. Where the weights do not reflect the “true” structure of spatial interaction, estimated autocorrelation (or lack thereof) may actually stem from misspecification.

SLIDE 19

Words of Caution

Below is a correlogram of Moran’s I coefficients for Polity IV country democracy scores in 2008. The x-axis represents distances between country capitals, in kilometers. Here, democracy is significantly (p ≤ .05) spatially autocorrelated only at distances of 3,000 km and below. So, autocorrelation estimates will depend highly on choice of lag distance.

[Figure: correlogram of Moran’s I coefficients (top panel) and p-values with a p ≤ 0.05 reference line (bottom panel), for lag distances from 2,000 to 38,000 km]

SLIDE 20

Words of Caution

  1. By themselves, spatial autocorrelation tests do not always produce useful insights into the DGP.
  2. These tests are also highly sensitive to one’s choice of spatial weights. Where the weights do not reflect the “true” structure of spatial interaction, estimated autocorrelation (or lack thereof) may actually stem from misspecification.
  3. As originally designed, spatial autocorrelation tests assumed there are no neighborless units in the study area.

SLIDE 21

Outline

  1. Introduction
  2. Spatial Data and Basic Visualization in R
  3. Spatial Autocorrelation
  4. Spatial Weights
  5. Spatial Regression
SLIDE 22

Choosing your neighbors?

◮ Most spatial weights matrices W are based on some version of a connectivity matrix C.

◮ C is an n × n binary matrix, where i, j ∈ {1, 2, . . . , n} index the units in the system (for example, countries in the international system).

◮ Entry c_{ij} = 1 if two units i ≠ j are considered connected, and c_{ij} = 0 if they are not.

◮ The tricky part is how the word “connected” is defined.

SLIDE 23

Areal Contiguity I: Regular Grids

Rook’s case

Cells sharing a common edge are considered contiguous.

[Figure: cell i with its four edge neighbors j]

SLIDE 24

Areal Contiguity I: Regular Grids

Bishop’s case

Cells sharing a common vertex are considered contiguous.

[Figure: cell i with its four vertex neighbors j]

SLIDE 25

Areal Contiguity I: Regular Grids

Queen’s case

Cells sharing a common edge or common vertex are considered contiguous.

[Figure: cell i with its eight neighbors j]

SLIDE 26

Areal Contiguity I: Regular Grids

Second-order neighbors (rook’s case)

Cells sharing a common edge with first-order neighbors are considered contiguous.

[Figure: first-order neighbors j and second-order neighbors k on a grid]
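The three contiguity cases differ only in the set of offsets checked around a cell, which makes them easy to encode. A minimal Python sketch (illustrative, not the lecture’s R code):

```python
def grid_neighbors(r, c, nrows, ncols, case="rook"):
    """Contiguous cells of (r, c) on a regular grid, by contiguity case."""
    rook = [(-1, 0), (1, 0), (0, -1), (0, 1)]       # shared edge
    bishop = [(-1, -1), (-1, 1), (1, -1), (1, 1)]   # shared vertex only
    offsets = {"rook": rook, "bishop": bishop, "queen": rook + bishop}[case]
    return [(r + dr, c + dc) for dr, dc in offsets
            if 0 <= r + dr < nrows and 0 <= c + dc < ncols]

# Center cell of a 3x3 grid
print(len(grid_neighbors(1, 1, 3, 3, "rook")),    # 4
      len(grid_neighbors(1, 1, 3, 3, "bishop")),  # 4
      len(grid_neighbors(1, 1, 3, 3, "queen")))   # 8
```

Cells on the boundary simply have fewer neighbors: a corner cell has only 3 queen neighbors.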

SLIDE 27

Areal Contiguity I: Regular Grids

◮ These conceptions of contiguity are useful when dealing with regular square grids or rectangular lattices, where the spatial structure can be easily summarized in elegant mathematical terms.

◮ But when spatial units consist of irregularly-shaped polygons, as is the case in most applied work (countries, census tracts, various administrative units), this simple characterization breaks down...

SLIDE 28

Areal Contiguity II: Polygons

Figure: Contiguity neighbors

SLIDE 29

Areal Contiguity II: Polygons

Figure: Contiguity neighbors

SLIDE 30

Interpoint Distance

Thresholding

c_{THRES}(i, j) = 1\{i, j \in S : d(i, j) \le r\}

k nearest neighbors

c_{KNN}(i, j) = 1\{i, j \in S : d(i, j) \le d_{(k)}(i, \cdot)\}

Sphere of influence

c_{SOI}(i, j) = 1\{i, j \in S : O_i \cap O_j \ne \emptyset\}
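The thresholding and k-nearest-neighbor definitions translate directly into code; the sphere-of-influence case needs the nearest-neighbor circles O_i and is omitted here. A Python sketch with made-up coordinates (illustrative only):

```python
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def threshold_pairs(pts, r):
    """c(i, j) = 1 when d(i, j) <= r; symmetric by construction."""
    n = len(pts)
    return {(i, j) for i in range(n) for j in range(n)
            if i != j and dist(pts[i], pts[j]) <= r}

def knn_pairs(pts, k):
    """c(i, j) = 1 when j is among i's k nearest; not necessarily symmetric."""
    n = len(pts)
    pairs = set()
    for i in range(n):
        ranked = sorted((j for j in range(n) if j != i),
                        key=lambda j: dist(pts[i], pts[j]))
        pairs.update((i, j) for j in ranked[:k])
    return pairs

pts = [(0, 0), (1, 0), (3, 0)]
print(threshold_pairs(pts, 1.5))  # only the close pair: {(0, 1), (1, 0)}
print(knn_pairs(pts, 1))          # asymmetric: 2's nearest is 1, not vice versa
```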

SLIDE 31

Network neighbors

The structure of spatial dependence can be non-geographic. Any theoretically-relevant dyadic relationship can form the basis of connectivity.

◮ Individual level: friendship, frequency of communication, citations, kinship.

◮ Organizational level: market competition, joint enterprises, personnel exchanges.

◮ International level: alliance relationships, trade flows, joint organizational membership, diplomatic contacts, cultural exchanges, migration flows.

SLIDE 32

Other options

  • Geographic (CONT)
  • Geographic (MDN)
  • Geographic (KNN4)
  • Geographic (SOI)
  • Ethnic (MDN)
  • Ethnic (KNN4)
  • Ethnic (pSOI)
  • Trade (MDN)
  • Trade (KNN4)
  • Trade (pSOI)
  • IGO (MDN)
  • IGO (KNN4)
  • IGO (pSOI)
  • Alliance Ties
SLIDE 33

From Connections to Weights

◮ Once a definition of connectivity is made, one must translate binary indicators into weights, which will form the elements w_{ij} of matrix W.

◮ A plethora of options exist: inverse distance (IDW), negative exponentials of distance, length of shared boundary, relative area, accessibility...

◮ The rows of W are often row-standardized, so that \sum_{j=1}^{n} w_{ij} = 1.

◮ Row standardization facilitates interpretation of lagged variables as a weighted average of neighboring values.

◮ This also ensures that the principal eigenvalue is 1 (useful for optimization in regression models).

◮ Bottom line: the weights should bear a direct relation to one’s theoretical conceptualization of the structure of dependence.
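Both properties of row standardization are easy to verify numerically. A Python sketch on a toy binary chain (illustrative only):

```python
import numpy as np

# Binary rook-style chain connectivity
C = np.array([[0., 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
W = C / C.sum(axis=1, keepdims=True)    # row-standardize: each row sums to 1

y = np.array([10., 20., 30., 40.])
lag = W @ y                             # spatial lag = mean of neighbors' values
print(lag)                              # [20. 20. 30. 30.]
print(max(abs(np.linalg.eigvals(W))))   # principal eigenvalue is 1
```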

SLIDE 34

Sparse vs. Dense Matrices

Sparsity carries a number of substantive and computational advantages:

◮ Dense matrices are noisy and contain a potentially large number of irrelevant connections.

◮ Dense matrices will bias downward the indirect effects of a change in observation j (the individual weights of non-zero entries in row-standardized weight matrices will be smaller).

◮ Dense matrices can be computationally intensive to the point that even simple matrix operations are infeasible.

SLIDE 35

Sparse vs. Dense Matrices

Consider the following example with 2000 U.S. Census data:

Tracts

n = 65,443. 31.90 GB of storage required for dense matrix, .01 GB for sparse matrix.

Block Groups

n = 208,790. 324.80 GB of storage required for dense matrix, .03 GB for sparse matrix.

Blocks

n = 8,205,582. 501,659.33 GB of storage required for dense matrix, 1.10 GB for sparse.

Here, dense and sparse matrices have n² and roughly 6n nonzero elements, respectively: for spatially random data on a plane, each unit will have an average of 6 contiguity neighbors.
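The storage figures follow from simple arithmetic, assuming 8-byte values, binary gigabytes (2³⁰ bytes), and, for the sparse case, triplet (i, j, w) storage with about 6 neighbors per unit; the slide does not state its exact sparse format, so the sparse numbers are approximate:

```python
def dense_gib(n):
    """Full n x n matrix of 8-byte values, in GiB."""
    return n * n * 8 / 2 ** 30

def sparse_gib(n, avg_neighbors=6):
    """(i, j, w) triplets, 8 bytes each, ~6 nonzero weights per row."""
    return avg_neighbors * n * 3 * 8 / 2 ** 30

for name, n in [("tracts", 65_443), ("block groups", 208_790),
                ("blocks", 8_205_582)]:
    print(f"{name}: dense {dense_gib(n):,.2f} GiB, sparse {sparse_gib(n):.2f} GiB")
```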

SLIDE 36

Ordering of Weights Matrix

Ordering of rows and columns matters greatly for computation times.

◮ Consider an n × n permutation matrix P, which has exactly

  • ne entry 1 in each row and each column and 0’s elsewhere.

Each permutation matrix can produce a reordered weights matrix WP, by the operation WP = PWP′.

◮ Note that P−1 = P′, |P| = 1 and

|P(In−ρW)P′| = |P||In−ρW||P′| = |In−ρW| = |In−ρPWP′|

◮ Thanks to these properties, log-determinant calculation and

  • ther matrix operations will not be affected by the reordering
  • f W.

◮ But computation times for these operations are affected.
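The invariance of the determinant under reordering is easy to demonstrate numerically; a small Python sketch with a random row-standardized W (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho = 6, 0.4
W = rng.random((n, n))
np.fill_diagonal(W, 0)
W = W / W.sum(axis=1, keepdims=True)    # row-standardized weights
P = np.eye(n)[rng.permutation(n)]       # random permutation matrix
WP = P @ W @ P.T                        # reordered weights W_P = P W P'

det_original = np.linalg.det(np.eye(n) - rho * W)
det_reordered = np.linalg.det(np.eye(n) - rho * WP)
print(det_original, det_reordered)      # equal up to floating-point error
```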

SLIDE 37

Ordering of Weights Matrix

Efficiency is increased if ordering is geographic (north-south or east-west):

◮ This ordering concentrates nonzero elements around the diagonal, which reduces the bandwidth of the matrix (max |i − j| over nonzero elements).

◮ For a sample of 62,226 U.S. Census Tracts, calculation of a single log-determinant requires over 12 GB of memory for a randomly ordered weights matrix, making calculation infeasible on most machines.

◮ The same operation takes less than a minute for a geographically-ordered matrix.

SLIDE 38

Examples in R

Switch to R tutorial script.