Methods for Small Area Analyses of Spatial and Space-time Data
Evan Carey Robert Penfold Elisabeth Dowling Root AcademyHealth Conference, Seattle, WA June 25, 2018
– Disease mapping & BYM CAR Models
my data?
to model?
models?
I am interested in the relationship between location and my outcome.
I am not interested in the effect of location, but my data have a spatial nature…
high or low disease rates.
showing above/below average outcomes.
models may give you biased results/incorrect p-values
fixes the issue.
parameter here.
Geographic data, Geographic Information Systems (GIS), and spatial analysis provide public health officials with the capability to perform two unique types of analysis:
incidence
contributing to health are dispersed unevenly across communities and regions
disease (or some other health outcome) across space
these patterns:
– Composition: differences in kinds of people who live in places
– Context: differences in neighborhood or area-level physical
additional information to your data analysis
problems to your analysis
– heterogeneity of observational units → heteroskedasticity
– spatial autocorrelation → residual dependence
traditional assumptions of standard regression techniques are violated
– statistical inference from such a model is not valid
complexities of spatial data depend on how we define space
– Discrete geographic phenomena have spatial bounds. Locations may be within or outside a geographic feature.
– Continuous geographic phenomena have properties continuously distributed across the landscape. Locations are specific and have value.
Object: Home
ID  Tract  ChildDth  Race   DistPCP
1   1237   Yes       White  5000
2   1237   No        AA     3560
3   1238   No        White  10789
4   1238   No        Asian  7689
Spatial data: longitude, latitude (x, y) 76.9147, 107.6098
Attribute data: Survey data
Spatial Relationships:
Tract  PctPov  PctAA  Foreclose  PCP
1237   .056    .241   .011       1
1238   .079    .443   .043       3
1239   .151    .078   .225       10
1240   .224    .011   .105
Attribute data: Census tract/PCSA characteristics
Object: Health Center
Event Data (Points) Lattice Data (Areas) Geostatistical Data (Grid)
These aggregations can be used to produce rates
“Spatial Statistics”
Regional Count data, Spatial Econometrics, Spatial Regression Analysis
Point Pattern Analysis, Spatial Epidemiology, Crime Analysis
Spatial Prediction
“Spatial Data Production”
– Is a variable in a location correlated with the values in nearby places?
– The outcome differs systematically as a function of spatial location.
These are distinct concepts!
* Humans are pretty bad at identifying spatial trends by eye. We tend to over-interpret noise when it is on a map ☺
“Everything is related to everything else, but near things are more related than distant things.”
– Contiguity (common boundary, K-nearest neighbors)
– Distance (distance band)
– All polygons that share a common border
– Distance
K-nearest neighbors (KNN): k=1, k=2, k=3
Euclidean distance: 1 km, 1.5 km
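The two neighbor definitions named above (K-nearest neighbors and a distance band) can be sketched in a few lines. This is only an illustration, not the presenters' code: the coordinates, the function names `knn_neighbors` and `distance_band_neighbors`, and the 1.5 km threshold are hypothetical, and distances are plain Euclidean.

```python
import math

def knn_neighbors(coords, k):
    """For each location, return the indices of its k nearest other locations."""
    neighbors = {}
    for i, p in enumerate(coords):
        # Sort all other locations by Euclidean distance to location i.
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(coords) if j != i)
        neighbors[i] = [j for _, j in ranked[:k]]
    return neighbors

def distance_band_neighbors(coords, threshold):
    """For each location, return all other locations within the distance band."""
    return {
        i: [j for j, q in enumerate(coords) if j != i and math.dist(p, q) <= threshold]
        for i, p in enumerate(coords)
    }

# Hypothetical point locations, in km.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.4), (3.0, 3.0)]
knn = knn_neighbors(coords, k=2)
band = distance_band_neighbors(coords, threshold=1.5)
```

Note the design difference: KNN gives every location exactly k neighbors, while a distance band can leave an isolated location (here the last point) with none.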
If we have observed last year’s hospital mortality rate, what is your best prediction of next year’s hospital mortality rate?
Only use information from each hospital to predict mortality. No pooling of information (no shrinkage!)
Share (pool) information across hospitals. Prediction is ‘shrunk’ towards the mean.
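The contrast between no pooling and pooling can be illustrated with a simple weighted average. This is a sketch under stated assumptions, not the model used in the presentation: the hospital counts are made up, and `prior_strength` is a hypothetical tuning constant that controls how strongly small hospitals are pulled toward the overall rate.

```python
def shrink_toward_mean(events, cases, prior_strength):
    """Partially pool each hospital's raw rate toward the overall rate.

    The weight on a hospital's own rate grows with its number of cases,
    so small hospitals are shrunk more than large ones.
    """
    overall_rate = sum(events) / sum(cases)
    shrunk = []
    for e, n in zip(events, cases):
        weight = n / (n + prior_strength)  # hypothetical weighting rule
        shrunk.append(weight * (e / n) + (1 - weight) * overall_rate)
    return overall_rate, shrunk

# Hypothetical deaths and admissions for three hospitals.
events = [2, 30, 12]
cases = [10, 300, 100]
raw_rates = [e / n for e, n in zip(events, cases)]  # no pooling
overall_rate, shrunk_rates = shrink_toward_mean(events, cases, prior_strength=50)
```

In this example the 10-case hospital moves most of the way toward the overall rate, while the 300-case hospital barely moves.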
Census Tract A: 2/8 = 0.25
Census Tract B: 4/20 = 0.2
Census Tract E: 1/10 = 0.1
Census Tract C: 1/45 = 0.02
Census Tract F: 3/30 = 0.1
Census Tract D: 2/25 = 0.08
Binary Outcome = Patient Location Patient Demographics
http://open.lib.umn.edu/mapping/chapter/6-analysis/
– no spatial trend (pure spatial noise)
– Spatial trend of varying strengths.
How successful are different methods at recovering the underlying spatial trends of the binomial process?
– Gaussian kernel weighting
– Allows smoothing of binary process at irregularly spaced locations.
– Can compute mean and variance across space.
– Nadaraya-Watson smoother (Nadaraya, 1964, 1989; Watson, 1964)
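A Gaussian-kernel (Nadaraya-Watson style) smoother for a binary outcome at irregular locations might look like the sketch below. The function name `kernel_smooth`, the coordinates, and the bandwidth are hypothetical; the variance returned is simply the kernel-weighted variance of the outcomes around the smoothed mean, which is one of several possible choices.

```python
import math

def kernel_smooth(points, outcomes, target, bandwidth):
    """Gaussian-kernel weighted mean and variance of outcomes around `target`."""
    weights = []
    for x, y in points:
        d2 = (x - target[0]) ** 2 + (y - target[1]) ** 2
        # Gaussian kernel: nearby observations get weights close to 1.
        weights.append(math.exp(-d2 / (2 * bandwidth ** 2)))
    total = sum(weights)
    mean = sum(w * o for w, o in zip(weights, outcomes)) / total
    variance = sum(w * (o - mean) ** 2 for w, o in zip(weights, outcomes)) / total
    return mean, variance

# Hypothetical patient locations with a binary outcome (1 = event).
points = [(-1.0, 0.0), (1.0, 0.0)]
outcomes = [1, 0]
mid_mean, mid_var = kernel_smooth(points, outcomes, target=(0.0, 0.0), bandwidth=1.0)
near_mean, _ = kernel_smooth(points, outcomes, target=(-1.0, 0.0), bandwidth=0.1)
```

A small bandwidth makes the estimate follow the nearest observations; a large one averages over a wide area and flattens local structure.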
alternative to MCMC for fitting Bayesian models.
– Fixed effects, structured and unstructured Gaussian random effects combined linearly with likelihoods specified.
– ‘focus on the continuous representation of the GRF through an (stochastic partial differential equation) SPDE’
methods of spatial inference require inversion of the covariance matrix, which is an O(n^3) calculation!
Bakka, Haakon, Håvard Rue, Geir-Arne Fuglstad, Andrea Riebler, David Bolin, Elias Krainski, Daniel Simpson, and Finn Lindgren. “Spatial Modelling with R-INLA: A Review.” ArXiv:1802.06350 [Stat], February 18, 2018. http://arxiv.org/abs/1802.06350.
specification and fitting.
– Helper functions in R-INLA.
– Expand mesh beyond boundaries of data
– Experiment with density of nodes.
– Spatial effect is connected to the mesh/observations object
– Other patient level effects not connected to location matrix.
well in larger datasets.
to this problem:
– R-INLA and Fixed Rank Kriging performed optimally in larger datasets (memory usage and CPU time)
– Methods generally provided similar estimates.
Bradley, Jonathan R., Noel Cressie, and Tao Shi. “A Comparison of Spatial Predictors When Datasets Could Be Very Large.” Statistics Surveys 10, no. 0 (2016): 100–131. https://doi.org/10.1214/16-SS115.
Focus Area Name
Spatial smoothing: Headbanging, Locally weighted averaging, and Bayesian CARs Elisabeth Dowling Root, MA, PhD
Department of Geography & Division of Epidemiology The Ohio State University
very unstable and maps of rates can be misleading
– AND rates are spatially correlated
variability of mapped estimates
in maps of observable health data
– Highlight meaningful geographic patterns in the underlying risk
across small areas
– Smoothed estimates for each area “borrow strength” (precision) from data in other areas by an amount depending on the precision of the raw estimate of each area
knowledge about:
– Observed rate in that area
– Average rate in surrounding areas
weighted average, weights depend on the population size in area A
Locally weighted averaging
– Smooths toward the mean
– Area value replaced by population weighted average of surrounding areas
Headbanging
– Smooths toward the median
– Area values replaced if large deviation from the median and population is not large
Bayesian CAR model
– Smooths toward the mean
– Area values calculated using a CAR model with a spatial random effect term
Census Tract A: Rate=0.3, N=8
Neighboring tracts: Rate=0.2 N=20; Rate=0.1 N=10; Rate=0.02 N=45; Rate=0.1 N=30; Rate=0.08 N=25; Rate=0.12 N=35; Rate=0.15 N=22; Rate=0.02 N=45; Rate=0.04 N=55; Rate=0.03 N=60
Headbanging uses the median, but this technique can also be applied to the neighborhood mean
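Applied to the neighborhood mean, the replacement value is a population-weighted average of the surrounding areas. The sketch below uses the rates and populations listed for the tracts around Census Tract A; the function name `population_weighted_mean` is hypothetical, and the center tract itself is assumed to be excluded from the average.

```python
def population_weighted_mean(neighbors):
    """Population-weighted average rate over (rate, population) pairs."""
    total_pop = sum(n for _, n in neighbors)
    return sum(rate * n for rate, n in neighbors) / total_pop

# (rate, N) for the tracts surrounding Census Tract A, as listed on the slide.
neighbors = [
    (0.2, 20), (0.1, 10), (0.02, 45), (0.1, 30), (0.08, 25),
    (0.12, 35), (0.15, 22), (0.02, 45), (0.04, 55), (0.03, 60),
]
smoothed_rate = population_weighted_mean(neighbors)
```

Because large tracts dominate the weights, a small neighbor with an extreme rate has little influence on the smoothed value.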
Census Tract A: Rate=0.3, N=8
Neighboring tracts: Rate=0.2 N=20; Rate=0.1 N=10; Rate=0.02 N=45; Rate=0.1 N=30; Rate=0.08 N=25; Rate=0.12 N=35; Rate=0.15 N=22; Rate=0.02 N=45; Rate=0.04 N=55; Rate=0.03 N=60

Rate  N   Weighted Rate  Median
0.02  45  0.027
0.02  45  0.027
0.10  10  0.030          low 50%
0.20  8   0.048
0.03  60  0.054
0.08  25  0.060
0.04  55  0.066
0.10  30  0.090          high 50%
0.15  22  0.099
0.12  35  0.125

Is center value between high and low medians? -- NO
Is the population much greater than neighbors? -- NO
RATE = 0.09
REPLACE!!
Census Tract A: Rate=0.3, N=200
Neighboring tracts: Rate=0.2 N=20; Rate=0.1 N=10; Rate=0.02 N=45; Rate=0.1 N=30; Rate=0.08 N=25; Rate=0.12 N=35; Rate=0.15 N=22; Rate=0.02 N=45; Rate=0.04 N=55; Rate=0.03 N=60

Rate  N   Weighted Rate  Median
0.02  45  0.027
0.02  45  0.027
0.10  10  0.030          low 50%
0.20  8   0.048
0.03  60  0.054
0.08  25  0.060
0.04  55  0.066
0.10  30  0.090          high 50%
0.15  22  0.099
0.12  35  0.125

Is center value between high and low medians? -- NO
Is the population much greater than neighbors? -- YES
DON’T REPLACE!!
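The two-question decision rule from these slides can be sketched as follows. This follows the simplified version shown here, not a full headbanging implementation. The weighting (rate × N / mean N) is inferred from the numbers in the slide's table, and the cutoff for "much greater" (`pop_factor` times the median neighbor population) is a hypothetical choice, since the slides do not state one.

```python
from statistics import median

def headbang_simplified(center_rate, center_n, neighbors, pop_factor=3.0):
    """Return the (possibly replaced) rate for the center area.

    neighbors: list of (rate, population) pairs for surrounding areas.
    """
    mean_n = sum(n for _, n in neighbors) / len(neighbors)
    # Population-weighted rates, sorted from low to high.
    weighted = sorted(rate * n / mean_n for rate, n in neighbors)
    half = len(weighted) // 2
    low_median = median(weighted[:half])                    # median of the low 50%
    high_median = median(weighted[len(weighted) - half:])   # median of the high 50%

    # Question 1: is the center value between the low and high medians?
    if low_median <= center_rate <= high_median:
        return center_rate
    # Question 2: is the center population much greater than its neighbors'?
    if center_n > pop_factor * median([n for _, n in neighbors]):
        return center_rate
    # Otherwise replace with the nearer of the two medians.
    return high_median if center_rate > high_median else low_median

# (rate, N) rows from the slide's table.
table_rows = [
    (0.02, 45), (0.02, 45), (0.10, 10), (0.20, 8), (0.03, 60),
    (0.08, 25), (0.04, 55), (0.10, 30), (0.15, 22), (0.12, 35),
]
small_tract = headbang_simplified(0.3, 8, table_rows)    # replaced
large_tract = headbang_simplified(0.3, 200, table_rows)  # kept
```

With the table's rows, the small tract (N=8) is replaced by the high median, which rounds to the 0.09 shown on the slide, while the large tract (N=200) keeps its observed rate of 0.3.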
expected rates given the age-, sex-, etc.- distribution of the population in that area
$\mathrm{SMR}_i = \frac{Y_i}{E_i} \times 1000$
$Y_i$ is the observed number of events
$E_i$ is the expected number of events
$E_i = \sum_j p_j n_{ij}$
j is the population stratum (e.g., age*sex*race)
$p_j$ is the frequency of the reference population
$n_{ij}$ is the number of people in area i in stratum j
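As a worked illustration of this indirect standardization, the sketch below computes the expected count for one area as the sum over strata of the reference rate times the area's stratum population, then divides observed by expected. All numbers and the function names are hypothetical, and the ratio is left unscaled (multiply by 1000 if following the scaling in the formula above).

```python
def expected_events(reference_rates, stratum_pops):
    """Expected events: sum over strata of reference rate x people in that stratum."""
    return sum(p * n for p, n in zip(reference_rates, stratum_pops))

def smr(observed, expected):
    """Ratio of observed to expected events for one area."""
    return observed / expected

# Hypothetical reference rates for three strata and one area's stratum populations.
reference_rates = [0.001, 0.005, 0.02]
stratum_pops = [1000, 500, 100]
expected = expected_events(reference_rates, stratum_pops)
area_smr = smr(observed=11, expected=expected)
```

Here the area is expected to have 5.5 events, so 11 observed events give a ratio of about 2, meaning twice as many events as its population structure would predict.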
expected rates
$Y_i \mid \mu_i \sim \mathrm{Poisson}(\mu_i)$
$\log \mu_i = \log E_i + b_i$
$b_i \mid b_{j \neq i} \sim N\left( \frac{\sum_{j \neq i} w_{ij} b_j}{\sum_{j \neq i} w_{ij}},\; \sigma^2 \frac{1}{\sum_{j \neq i} w_{ij}} \right)$
– $b_i$ are area-specific random effects with a correlated random effect distribution
– $w_{ij}$ are weights defining which regions j and i are neighbors
– $\sigma^2$ is the variance controlling how similar $b_i$ is to its neighbors
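The conditional distribution of one area's random effect given its neighbors can be computed directly: the mean is the weighted average of the neighbors' effects, and the variance is the variance parameter divided by the sum of the weights. The sketch below assumes binary adjacency weights and made-up random effect values; the function name `car_conditional` is hypothetical, and this evaluates the conditional only, it does not fit the model.

```python
def car_conditional(effects, weights_row, i, variance):
    """Conditional mean and variance of area i's effect given all other areas.

    effects: current random effect value for every area.
    weights_row: neighbor weights between area i and every area (0 = not a neighbor).
    """
    total_weight = sum(w for j, w in enumerate(weights_row) if j != i)
    cond_mean = sum(w * effects[j] for j, w in enumerate(weights_row) if j != i) / total_weight
    cond_var = variance / total_weight
    return cond_mean, cond_var

# Hypothetical effects for four areas; area 0 borders areas 1 and 2 only.
effects = [0.0, 0.2, 0.4, -0.1]
weights_row = [0, 1, 1, 0]
cond_mean, cond_var = car_conditional(effects, weights_row, i=0, variance=0.5)
```

An area with more neighbors has a smaller conditional variance, so its effect is tied more tightly to the local average.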
SMR ($1/E_i$) in area i and the variability (heterogeneity) of the true risks across areas
local or regional mean
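and $\widehat{\mathrm{SMR}}_i$) are:
$\mathrm{SMR}_i = \frac{Y_i}{E_i}$
$\widehat{\mathrm{SMR}}_i = \frac{\hat{\mu}_i}{E_i}$
$\widehat{\mathrm{SMR}}_i \approx \mathrm{SMR}_i$
$\widehat{\mathrm{SMR}}_i \approx$ weighted average of SMR in the neighboring areas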
– Posterior probabilities [Prob (SMR_i > 1)] > 0.8
– Outside 95% credible interval
– (false detection < 10%)
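Both flagging rules can be evaluated from posterior draws of an area's SMR. The sketch below is illustrative only: the draws are simulated from an arbitrary log-normal distribution rather than taken from a fitted model, and the function names are hypothetical. In practice the draws would come from the MCMC or INLA output.

```python
import random
from statistics import quantiles

def exceedance_probability(draws, threshold=1.0):
    """Share of posterior draws above the threshold, i.e. Prob(SMR > threshold)."""
    return sum(d > threshold for d in draws) / len(draws)

def credible_interval_95(draws):
    """Equal-tailed 95% interval from the 2.5th and 97.5th percentiles."""
    cuts = quantiles(draws, n=40)  # 39 cut points in steps of 2.5%
    return cuts[0], cuts[-1]

# Simulated stand-in for posterior draws of one area's SMR (assumption).
rng = random.Random(42)
draws = [rng.lognormvariate(0.3, 0.2) for _ in range(4000)]

prob_elevated = exceedance_probability(draws)
lower, upper = credible_interval_95(draws)
flag_by_probability = prob_elevated > 0.8
flag_by_interval = lower > 1.0 or upper < 1.0  # 1 lies outside the interval
```

The two rules need not agree: in this simulated example the exceedance probability passes the 0.8 cutoff while the 95% interval still contains 1.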
Smoothing should be considered when:
1. The addition of one event or one more person at risk results in a large difference in the rate (e.g., a change
2. The number of events that form the numerator is ≤ 3
3. The number of persons at risk per region is small and the numbers change by an order of magnitude across a region (e.g., 10 people in tract A vs. 100 people in tract B)
*Smoothing reduces noise and makes trends and patterns clearer
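These criteria can be turned into a rough screening check. The sketch below is one possible reading, not a rule from the presentation: the cutoff for a "large difference" (`max_relative_change`) is a hypothetical value, only the effect of one added event is tested, and the order-of-magnitude check compares the largest and smallest populations in a region.

```python
def rate_is_unstable(events, population, max_events=3, max_relative_change=0.10):
    """Flag a rate with few events or one that jumps when a single event is added."""
    if events <= max_events:
        return True
    raw_rate = events / population
    rate_plus_one = (events + 1) / population
    return (rate_plus_one - raw_rate) / raw_rate > max_relative_change

def populations_vary_widely(populations, factor=10):
    """True when populations at risk differ by at least an order of magnitude."""
    return max(populations) / min(populations) >= factor

few_events = rate_is_unstable(2, 8)            # 2 events: flagged
sensitive = rate_is_unstable(5, 50)            # one more event raises the rate by 20%
stable = rate_is_unstable(400, 10000)          # large counts: not flagged
wide_range = populations_vary_widely([10, 100])
```

If any check is triggered for a meaningful share of areas, the raw rate map is likely dominated by noise and a smoothed map is worth producing alongside it.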
Bayesian CAR model with maternal demographics
Bayesian CAR model with demographics + tract-level SDHs
– Implemented in free WinBUGS and GeoBUGS software: www.mrc-bsu.cam.ac.uk/bugs
– Also R package CARBayes
www.r-inla.org
– R package diseasemapping calls INLA specifically for disease mapping