Spatial Autoregressive Models Sudipto Banerjee Division of - PowerPoint PPT Presentation

Spatial Autoregressive Models Sudipto Banerjee Division of Biostatistics, University of Minnesota September 22, 2009 1 Areal Modelling

Areal unit data Maps of raw standard mortality ratios (SMR) of lung and esophagus cancer between 1991 and 1998 in Minnesota counties (a) (c) Lung Cancer Esophagus Cancer 0 - 0.5 0 - 0.5 0.5 - 0.9 0.5 - 0.9 0.9 - 1.1 0.9 - 1.1 1.1 - 1.5 1.1 - 1.5 1.5 - 3.3 1.5 - 3.3 2 Areal Modelling

Areal unit data 3 Areal Modelling

Areal unit data 4 Areal Modelling

Areal unit data Key Issues Is there spatial pattern? Spatial pattern implies that observations from units closer to each other are more similar than those recorded in units farther away. Do we want to smooth the data? Perhaps to adjust for low population sizes (or sample sizes) in certain units? How much do we want to smooth? Inference for new areal units? Is prediction meaningful here? If we modify the areal units to new units (e.g. from zip codes to county values), what can we say about the new counts we expect for the latter given those for the former? This is the Modifiable Areal Unit Problem (MAUP) or Misalignment. 5 Areal Modelling

Proximity matrices W , entries w ij , ( w ii = 0 ); choices for w ij : w ij = 1 if i, j share a common boundary (possibly a common vertex) w ij is an inverse distance between units w ij = 1 if distance between units is ≤ K w ij = 1 for m nearest neighbors. W need not be symmetric. W : standardize row i by w i + = � � j w ij (row stochastic but need not be symmetric). W elements often called “weights”; nicer interpretation? 6 Areal Modelling

Proximity matrices Note that proximity matrices are user-defined. We can define distance intervals, (0 , d 1 ] , ( d 1 , d 2 ] , and so on. First order neighbours: all units within distance d 1 . First order proximity matrix W (1) . Analogous to W , w (1) ij = 1 if i and j are first order neighbors; 0 otherwise. Second order neighbors: all units within distance d 2 , but separated by more than d 1 . Second order proximity matrix W (2) ; w (2) ij = 1 if i and j are second order neighbors; 0 otherwise And so on... 7 Areal Modelling

Proximity matrices There are analogues for areal data of the empirical correlation function and the variogram. Moran’s I : analogue of lagged autocorrelation n � � j w ij ( Y i − ¯ Y )( Y j − ¯ Y ) i I = ( � i � = j w ij )( � i ( Y i − ¯ Y ) 2 I is not supported on [ − 1 , 1] . Geary’s C : analogue of Durbin-Watson statistic ( n − 1) � � j w ij ( Y i − Y j ) 2 i C = � i � = j w ij ) � i ( Y i − ¯ Y ) 2 Both are asymptotically normal if Y i are i.i.d., the first with mean − 1 / ( n − 1) and the second with mean 1 . Significance testing using a Monte Carlo test, permutation invariance 8 Areal Modelling

Proximity matrices The areal correlogram is a useful tool to study spatial association with areal data. Working with I , we can replace w ij with w (1) ij taken from W (1) and compute → I (1) ij taken from W (2) and compute Next replace w ij with w (2) → I (2) , etc. Plot I ( r ) vs. r If there is spatial pattern, we expect I ( r ) to decline in r initially and then vary about 0. 9 Areal Modelling

Proximity matrices 10 Areal Modelling

Spatial smoothers P i w ij Y j To smooth Y i , replace with ˆ Y i = Note: K -nearest w i + neighbours (KNN) regression falls within this framework. More generally, (1 − α ) Y i + α ˆ Y i Linear (convex) combination, shrinkage Model-based smoothing, e.g., E ( Y i |{ Y j , j = 1 , 2 , ..., n } ) 11 Areal Modelling

Markov Random Fields First, consider Y = ( y 1 , y 2 , ..., y n ) and consider the set { p ( y i | y j , j � = i ) } We know p ( y 1 , y 2 , ...y n ) determines { p ( y i | y j , j � = i ) } (full conditional distributions) ??? Does { p ( y i | y j , j � = i ) } determine p ( y 1 , y 2 , ...y n ) ? If so, we call the joint distribution a Markov Random Field . In general we cannot write down an arbitrary set of conditionals and assert that they determine the joint distribution. Example: Y 1 | Y 2 ∼ N ( α 0 + α 1 Y 2 , σ 2 1 ) Y 2 | Y 1 ∼ N ( β 0 + β 1 Y 3 1 , σ 2 2 ) . The first equation implies that E [ Y 1 ] = α 0 + α 1 E [ Y 2 ] , i.e., E [ Y 1 ] is linear in E [ Y 2 ] . The second equation implies that E [ Y 2 ] = β 0 + β 1 E [ Y 3 1 ] , i.e. E [ Y 2 ] is linear in E [ Y 3 1 ] . Clearly this isn’t true in general. Hence no joint distribution. 12 Areal Modelling

Markov Random Fields Also p ( y 1 , . . . , y n ) may be improper even if all the full conditionals are proper. p ( y 1 , y 2 ) ∝ exp { ( y 1 − y 2 ) 2 } But p ( Y 2 | Y 1 ) ∝ N ( Y 2 ) and p ( Y 1 | Y 2 ) ∝ N ( Y 2 , 1) . Yet the joint distribution is improper. Compatibility: Brook’s Lemma. Let y 0 = ( y 10 , . . . , y n 0 ) be any fixed point in the support of p ( · ) . p ( y 1 , . . . , y n ) = p ( y 1 | y 2 , . . . , y n ) p ( y 2 | y 10 , y 3 , . . . , y n ) p ( y 10 | y 2 , . . . , y n ) p ( y 20 | y 10 , y 3 , . . . , y n ) . . . p ( y n | y 10 , . . . , y n − 1 , 0 ) p ( y n 0 | y 10 , . . . , y n − 1 , 0 ) p ( y 10 , . . . , y n 0 ) . If LHS is proper, the fact that it integrates to 1 determines the normalizing constant! 13 Areal Modelling

Local specifications Suppose we want: p ( y i | y j , j � = i ) = p ( y i | y j ∈ ∂ i ) When does the set { p ( y i | y j ∈ ∂ i ) } uniquely determine p ( y 1 , y 2 , ...y n ) ? To answer this question, we need the following important concepts: Clique: A clique is a set of cells such that each element is a neighbor of every other element. We use notation i ∼ j if i is a neighbor of j and j is a neighbor of i . Potential: A potential of order k is a function of k arguments that is exchangeable in these arguments. The arguments of the potential would be the values taken by variables associated with the cells for a clique of size k . 14 Areal Modelling

Local specifications For clique size say 2, i ∼ j means j ∼ i ( ⇔ ( y i − y j ) 2 ) For continuous data: Q ( y i , y j ) = y i y j For binary data: Q ( y i , y j ) = I ( y i = y j ) = y i y j = (1 − y i )(1 − y j ) Cliques of size 1 ⇔ independence Cliques of size 2 ⇔ pairwise difference form     �  − 1 ( y i − y j ) 2 I ( i ∼ j ) p ( y 1 , y 2 , ...y n ) ∝ exp 2 τ 2  i,j and therefore p ( y i | y j , j � = i ) = N ( � j ∈ ∂ i y i /m i , τ 2 /m i ) , where m i is the number of neighbors of i 15 Areal Modelling

Local specifications Gibbs distribution: p ( y 1 , . . . , y n ) is a Gibbs distribution if it is a function of the y i ’s only through potentials on cliques: � � φ ( k ) ( y α 1 , . . . , y α k ) } , p ( y 1 , . . . , y n ) ∝ exp {− γ k α ∈ M k where φ ( k ) is a potential of order k , M k is the set of all cliques of size k and is indexed by α , and γ > 0 is a scale parameter. Hammersley-Clifford Theorem: If we have a Markov Random Field (i.e., { p ( y i | y j ∈ ∂ i ) } uniquely determine p ( y 1 , y 2 , ...y n ) ), then the latter is a Gibbs distribution Geman and Geman (1984) result : If we have a joint Gibbs distribution, then we have a Markov Random Field 16 Areal Modelling

CAR models Conditionally Auto-Regressive (CAR) models Gaussian (autonormal) case   � b ij y j , τ 2  p ( y i | y j , j � = i ) = N i j Using Brook’s Lemma we can obtain � � − 1 2 y ′ D − 1 ( I − B ) y p ( y 1 , y 2 , ...y n ) ∝ exp where B = { b ij } and D is diagonal with D ii = τ 2 i . Suggests a multivariate normal distribution with µ y = 0 and Σ Y = ( I − B ) − 1 D D − 1 ( I − B ) symmetric requires b ij = b ji for all i, j τ 2 τ 2 i j 17 Areal Modelling

CAR models Returning to W , let b ij = w ij /w i + and τ 2 i = τ 2 /w i + , so p ( y 1 , y 2 , ...y n ) ∝ exp {− 1 2 τ 2 y ′ ( D w − W ) y } where D w is diagonal with ( D w ) ii = w i + and thus     �  − 1 w ij ( y i − y j ) 2 p ( y 1 , y 2 , ...y n ) ∝ exp 2 τ 2  i � = j Caution: ( D w − W ) 1 = 0 . Intrinsic autoregressive (IAR) model; improper, so requires a constraint (e.g., � i y i = 0 ) Not a valid data model, but only as a random effects model! 18 Areal Modelling

CAR models With τ 2 unknown, what should be the power of τ 2 ? Answer: p ( y 1 , y 2 , ...y n ) ∝ ( 1 τ 2 ) ( n − G ) / 2 exp {− 1 2 τ 2 y ′ ( D w − W ) y } , where G is the number of “islands” in the map. In fact, n − G is the rank of D w − W . The impropriety can be remedied in an obvious way. Redefine the CAR as: p ( y 1 , y 2 , ...y n ) ∝ | D w − ρW | 1 / 2 exp {− 1 2 τ 2 y ′ ( D w − ρW ) y } , where ρ is chosen to make D w − ρW non-singular. This is � � guaranteed if ρ ∈ 1 /λ (1) , 1 , where λ (1) is the minimum eigenvalue of D − 1 / 2 WD − 1 / 2 . In practice, the bound ρ ∈ (0 , 1) is often preferred. 19 Areal Modelling

CAR models To ρ or not to ρ ? Advantages: makes distribution proper adds parametric flexibility ρ = 0 interpretable as independence Disadvantages: why should we expect y i to be a proportion of average of neighbors - sensible spatial interpretation? calibration of ρ as a correlation, e.g., ρ = 0 . 80 yields 0 . 1 ≤ I ≤ 0 . 15 , ρ = 0 . 90 yields 0 . 2 ≤ I ≤ 0 . 25 , ρ = 0 . 99 yields I ≤ 0 . 5 So, used with random effects, scope of spatial pattern may be limited 20 Areal Modelling

Spatial Autoregressive Models Sudipto Banerjee Division of - PowerPoint PPT Presentation

Spatial Autoregressive Models Sudipto Banerjee Division of Biostatistics, University of Minnesota September 22, 2009 1 Areal Modelling Areal unit data Maps of raw standard mortality ratios (SMR) of lung and esophagus cancer between 1991 and

Autoregressive Models Autoregressive Models In [1]: from mxnet import autograd, nd, gluon, init

Chapter 4: Video 1 - Supplemental slides The Autoregressive Model Autoregressive (AR) processes

Lecture 12: Autoregressive Filters Mark Hasegawa-Johnson ECE 401: Signal and Image Analysis, Fall

Financial Econometrics Econ 40357 ARIMA Part 2: Autoregressive Models N.C. Mark University of

Adaptive Estimation of Autoregressive Models with Time-Varying Variances Ke-Li Xu and Peter C.

CSC321 Lecture 20: Reversible and Autoregressive Models Roger Grosse Roger Grosse CSC321

Resource 1: What is spatial? presentation notes Section Section text Notes 1. Spatial

Broadening the Study of Spatial Intelligence Mary Hegarty University of California, Santa

A Spatial Cloaking Framework A Spatial Cloaking Framework A Spatial Cloaking Framework A Spatial

Partial MLE, Marginal Effects and Asymptotics for Spatial Autoregressive Nonlinear Probit Models

Hint-Based Training for Non-Autoregressive Translation Zhuohan Li Zi Lin Fei Tian Tao Qin

Spatial Digitech Keep it s im ple Make it spatial About US Spatial Digitech is a provider of

Creating a Science of Spatial Learning Nora S. Newcombe Temple University PI, Spatial

UCSB is Spatial ! http://www.spatial.ucsb.edu Specialist Meeting on Spatial Thinking across the

STAT 209 Spatial Data I April 30, 2018 Colin Reimer Dawson 1 / 26 Spatial Data Projections

Agenda Automated Automated Modeling and Modeling and Forecasting Forecasting Vector Vector

Better Vision for Computer Vision Presenter: Nathan Wheeler CEO, Co-Founder Video Intro

Computational module of geoinformational web-system for climate change analysis: method, data

MSc in Data and Computational Science Introduction Lennon O N araigh, Nial Friel,

HIPAA/HITECH Omnibus Final Rule Secretarys Advisory Committee on Human Research Protections

MaskGAN: Better Text Generation via Filling in the ______ June 5, 2018 (

CEE02- Generating High Resolution Rainfall Data Using Statistical Techniques By: Foo Xiang Hua

Comparative Effectiveness Evaluation and Monitoring: An American Perspective Jeffrey Smith Paul

and Hidden Markov Models Dynamic Programming Biostatistics 615/815 Lecture 9: . . Summary HMM