Methods for Small Area Analyses of Spatial and Space-time Data
Evan Carey, Robert Penfold, Elisabeth Dowling Root
AcademyHealth Conference, Seattle, WA, June 25, 2018

Outline
• Introduction
• Challenges of spatial data
• Representing space and defining spatial relationships
• Spatial autocorrelation
• Focus on analysis techniques for area data
  – Disease mapping & BYM CAR Models
• Focus on analysis techniques for continuous data

Part 1: Foundational Concepts
• Why do I care about space: is space a parameter of interest, or a nuisance parameter?
• What are different ways spatial data can be represented in my data?
• How do I define ‘near’ and ‘far’?
• What does autocorrelation mean?
• How does spatial autocorrelation differ from spatial trends?
• Why is data irregularly distributed across space challenging to model?
• How is this connected to small area analysis?
• What does ‘shrinkage’ mean, and why does it improve models?

Why do you care about space?

I am interested in the relationship between location and my outcome.
• I want to identify areas with high or low disease rates.
• Potentially create maps showing above/below average outcomes.
• I want to estimate the effect of space!

I am not interested in the effect of location, but my data has spatial nature…
• Ignoring space in your models may give you biased results/incorrect p-values
• Correctly modeling space fixes the issue.
• Space is a ‘nuisance’ parameter here.

Geospatial Data & Public Health
Geographic data, Geographic Information Systems (GIS), and spatial analysis provide public health officials with the capability to perform two unique types of analysis:
1. Find statistically significant areas of high or low incidence
2. Examine the spatial relationship between health outcomes and population/contextual factors

Geographic Variation in Health
• People (demographics) and the risk factors contributing to health are dispersed unevenly across communities and regions
• Often we are interested in identifying patterns of disease (or some other health outcome) across space
• We are also interested in understanding the reasons for these patterns:
  – Composition: differences in kinds of people who live in places
  – Context: differences in neighborhood or area-level physical or social environments

But…“spatial is special”
• Data that are referenced to location bring important additional information to your data analysis
• But, spatially referenced data also bring special problems to your analysis
  – heterogeneity of observational units → heteroskedasticity
  – spatial autocorrelation → residual dependence
• A consequence of these “special problems” is that traditional assumptions of standard regression techniques are violated
  – statistical inference from such a model is not valid

Spatial data is complex
• The methods we choose to cope with the complexities of spatial data depend on how we define space
  – Discrete geographic phenomena have spatial bounds. Locations may be within or outside a geographic feature.
    • Areal data: census tracts, counties, states
  – Continuous geographic phenomena have properties continuously distributed across the landscape. Locations are specific and have value.
    • Point data
• These definitions of space are represented by different geographic data types

What are Spatial Data
• Location
• Attributes
• Spatial Relationships

Attribute data: Survey data
ID   Tract   ChildDth   Race    DistPCP
1    1237    Yes        White   5000
2    1237    No         AA      3560
3    1238    No         White   10789
4    1238    No         Asian   7689

Spatial data:
• Object: Home, longitude, latitude (x, y): 76.9147, 107.6098
• Object: Health Center

Attribute data: Census tract/PCSA characteristics
Tract   PctPov   PctAA   Foreclose   PCP
1237    .056     .241    .011        1
1238    .079     .443    .043        3
1239    .151     .078    .225        10
1240    .224     .011    .105        0

Spatial Relationships:
• Proximity to physician
• “Contained in” census tract

Spatial Data Types
• Event Data (Points)
• Lattice Data (Areas)
• Geostatistical Data (Grid)

It’s important to understand that these designations are not mutually exclusive

Points can be geolocated in some relevant areal units

These aggregations can be used to produce rates
[Figure: map of areal units labeled with rates 0.18, 0.16, 0.11, 0.02, 0.05, 0.09, 0.00, 0.14, 0.7]

GIS Spatial Analysis (“Spatial Data Production”) and Spatial Data Analysis (“Spatial Statistics”)
Data types: Event (Point) Data | Lattice (Area) Data | Geostatistical Data
Associated analyses: Regional Count Data, Spatial Prediction, Point Pattern Analysis, Spatial Econometrics, Spatial Epidemiology, Spatial Regression Analysis, Crime Analysis

Thinking in one dimension: Does time affect the outcome?

Thinking in one dimension: Is there a time trend?

Spatial Autocorrelation and Trends (2D)
“Everything is related to everything else, but near things are more related than distant things.” (Tobler’s first law of geography)
• Correlation in space
  – Is a variable in a location correlated with the values in nearby places?
• Spatial trends in the outcome
  – The outcome differs systematically as a function of spatial location.
These are distinct concepts!
* Humans are pretty bad at identifying spatial trends by eye. We tend to overinterpret noise when it is on a map ☺
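One common way to put a number on "correlation in space" is Moran's I. The slides do not name a statistic, so treat the following as a minimal sketch under that assumption. The six-area chain and its binary adjacency weights are hypothetical, chosen only so the two patterns are easy to compare.

```python
def morans_i(values, weights):
    """Global Moran's I for `values` given an n x n spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations, counted only for pairs defined as neighbors.
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


def chain_weights(n):
    """Binary weights for n areas in a row: each area neighbors the next one."""
    w = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = w[i + 1][i] = 1
    return w


w = chain_weights(6)
clustered = [1, 1, 1, 5, 5, 5]    # similar values sit next to each other
alternating = [1, 5, 1, 5, 1, 5]  # neighbors are always dissimilar

i_clustered = morans_i(clustered, w)      # positive: near things are alike
i_alternating = morans_i(alternating, w)  # negative: near things differ
```

Both series have the same mean and variance; only the arrangement in space differs, which is exactly what the statistic responds to.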

Defining spatial relationships
• What is a neighbor? What’s next to what?
• These spatial relationships can be defined in a number of ways
  – Contiguity (common boundary, K-nearest neighbors)
    • What is a “shared” boundary?
    • How many “neighbors” to include?
  – Distance (distance band)
    • What distance do we use?

Contiguity based neighbors
• For areas:
  – All polygons that share a common border
• For points:
  – Distance: Euclidean distance (figure shows 1 km and 1.5 km bands)
  – K-nearest neighbors (KNN) (figure shows k=1, k=2, k=3)
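The two point-based neighbor definitions can be sketched in a few lines. This is an illustration only: the four coordinates (in km) are hypothetical, and the function names are not from any spatial library.

```python
import math


def distance_band_neighbors(points, radius):
    """For each point, list the other points within `radius` (Euclidean)."""
    return {
        i: [j for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= radius]
        for i, p in enumerate(points)
    }


def knn_neighbors(points, k):
    """For each point, list its k nearest other points, closest first."""
    result = {}
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        result[i] = others[:k]
    return result


# Hypothetical locations in km; the last point is far from the rest.
points = [(0.0, 0.0), (0.8, 0.0), (0.0, 1.4), (3.0, 3.0)]

band_1km = distance_band_neighbors(points, 1.0)
band_1_5km = distance_band_neighbors(points, 1.5)
knn_2 = knn_neighbors(points, 2)
```

The remote point has no neighbors under either distance band, but KNN always assigns it k neighbors. KNN relationships are also not necessarily symmetric: the remote point lists others as neighbors, while none of them list it.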

Thinking in one dimension: Does time affect the outcome?

The problem with sparse data…

General Shrinkage Idea
If we have observed last year’s hospital mortality rate, what is your best prediction of next year’s hospital mortality rate?
[Figure: hospital mortality rates on an axis from Low to High]

If we have observed last year’s hospital mortality rate, what is your best prediction of next year’s hospital mortality rate?
Only use information from each hospital to predict mortality.
No pooling of information (no shrinkage!)
[Figure: hospital mortality rates on an axis from Low to High]

If we have observed last year’s hospital mortality rate, what is your best prediction of next year’s hospital mortality rate?
Share (pool) information across hospitals.
Prediction is ‘shrunk’ towards the mean.
[Figure: hospital mortality rates on an axis from Low to High]
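The pooling idea can be made concrete with a small sketch. It assumes a simple weighted average of each hospital's own rate and the overall rate, which matches the posterior mean of a beta-binomial model whose prior is centered on the overall rate. The counts and the `prior_strength` value are hypothetical, not taken from the presentation.

```python
def shrunk_rates(deaths, admissions, prior_strength):
    """Shrink each hospital's raw rate toward the pooled overall rate.

    `prior_strength` acts like a number of pseudo-admissions at the overall
    rate: the fewer real admissions a hospital has, the more it is shrunk.
    """
    overall = sum(deaths) / sum(admissions)
    estimates = []
    for d, n in zip(deaths, admissions):
        weight = n / (n + prior_strength)  # trust placed in the hospital's own data
        estimates.append(weight * (d / n) + (1 - weight) * overall)
    return overall, estimates


# Hypothetical hospitals: one tiny, one large, one medium.
deaths = [1, 30, 12]
admissions = [5, 500, 100]
raw = [d / n for d, n in zip(deaths, admissions)]

overall, est = shrunk_rates(deaths, admissions, prior_strength=50)
```

The tiny hospital's raw rate of 1/5 rests on very little data, so its estimate moves most of the way to the overall rate. The large hospital's estimate barely changes.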

Sharing Spatial Data (Shrinkage)
[Figure: six census tracts (A through F), each labeled with events over population and the resulting rate: 1/45 = 0.02, 4/20 = 0.2, 2/25 = 0.08, 2/8 = 0.25, 3/30 = 0.1, 1/10 = 0.1]
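A rough sketch of sharing information between neighboring tracts follows. It reuses the six event/population pairs from the slide, but the adjacency structure and the `prior_strength` value are hypothetical, and this simple local smoother is a stand-in for illustration rather than the BYM CAR model named in the outline.

```python
# (events, population) pairs as shown on the slide; tract order is arbitrary.
counts = [(1, 45), (4, 20), (2, 25), (2, 8), (3, 30), (1, 10)]

# Hypothetical adjacency: which tracts border which (indices into `counts`).
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 3, 4],
             3: [1, 2, 5], 4: [2, 5], 5: [3, 4]}


def locally_smoothed_rates(counts, neighbors, prior_strength):
    """Shrink each tract's raw rate toward the pooled rate of its neighbors."""
    smoothed = []
    for i, (events, pop) in enumerate(counts):
        nb_events = sum(counts[j][0] for j in neighbors[i])
        nb_pop = sum(counts[j][1] for j in neighbors[i])
        local_mean = nb_events / nb_pop
        weight = pop / (pop + prior_strength)  # small tracts borrow more
        smoothed.append(weight * (events / pop) + (1 - weight) * local_mean)
    return smoothed


raw = [e / p for e, p in counts]
smoothed = locally_smoothed_rates(counts, neighbors, prior_strength=20)
```

The tract with 2 events in 8 people has the highest raw rate but the least data, so it is pulled hardest toward its neighbors. The tract with 45 people moves much less.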

Focus on methods for continuously indexed data
Spatial models implemented with R-INLA

Motivating example: Outcomes of Veterans in Colorado
Goal: Identify areas of high and low event probability.
What does the ideal method need to have?

Ideal method
• Identify spatial trend and make predictions at all points.
• Resilient to irregularly spaced data (small area analysis!)
• Exhibit shrinkage / stabilization
• Incorporate other patient level traits in the model (‘adjust’)
• Converge in reasonable time in medium to large datasets

Point pattern analysis versus point referenced models.
[Figure: Patient Outcome = Binary + Patient Location, Demographics]
http://open.lib.umn.edu/mapping/chapter/6-analysis/

Community care utilization in Colorado (data simulation – no PHI here!)

Simulating Success of Community Care Referrals in the VHA
• Simulation 1:
  – No spatial trend (pure spatial noise)
• Simulations 2-4:
  – Spatial trend of varying strengths.
How successful are different methods at recovering the underlying spatial trends of the binomial process?
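The presentation does not show how its data were simulated, so the following is only a sketch of what such a setup could look like. The logistic form, the intercept of -0.5, the east-west trend, and the clustering of locations are all assumptions made for illustration.

```python
import math
import random


def simulate_referrals(n, trend_strength, seed=0):
    """Simulate binary referral success at irregularly spaced locations.

    Returns a list of (x, y, p, success) tuples, where p is the true success
    probability at that location. trend_strength=0 gives pure spatial noise.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        # Irregular spacing: most points cluster near x = 0.3 (a made-up
        # "urban core"), leaving the eastern part of the unit square sparse.
        x = min(1.0, max(0.0, rng.gauss(0.3, 0.2)))
        y = rng.random()
        # Assumed trend: log-odds of success rise linearly from west to east.
        logit = -0.5 + trend_strength * (x - 0.5)
        p = 1.0 / (1.0 + math.exp(-logit))
        success = 1 if rng.random() < p else 0
        records.append((x, y, p, success))
    return records


no_trend = simulate_referrals(500, trend_strength=0.0, seed=1)
strong_trend = simulate_referrals(500, trend_strength=4.0, seed=1)
```

Keeping the true probability `p` alongside each simulated outcome is what makes it possible to judge how well a method recovers the underlying surface.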

Method 1: Simple Interpolation (2D Smoother)
• Use a 2D smoother:
  – Gaussian kernel weighting
  – Allows smoothing of a binary process at irregularly spaced locations.
  – Can compute mean and variance across space.
  – Nadaraya-Watson smoother (Nadaraya, 1964, 1989; Watson, 1964)
• What results do you expect to get using this method?
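A minimal sketch of a Gaussian-kernel Nadaraya-Watson smoother for binary outcomes at scattered locations is shown below. The observations and the bandwidth are hypothetical, and the presenters' actual implementation is not shown in the slides.

```python
import math


def nw_smooth(observations, x0, y0, bandwidth):
    """Kernel-weighted mean and variance of the outcome at location (x0, y0).

    `observations` is a list of (x, y, outcome) tuples; each weight is a
    Gaussian function of Euclidean distance to the query location.
    """
    weights = []
    for x, y, _ in observations:
        d2 = (x - x0) ** 2 + (y - y0) ** 2
        weights.append(math.exp(-d2 / (2.0 * bandwidth ** 2)))
    total = sum(weights)
    if total == 0.0:
        # Every weight underflowed: the query point is too far from all data.
        raise ValueError("no observations within reach of the kernel")
    mean = sum(w * z for w, (_, _, z) in zip(weights, observations)) / total
    var = sum(w * (z - mean) ** 2
              for w, (_, _, z) in zip(weights, observations)) / total
    return mean, var


# Hypothetical data: successes in the south-west, failures in the north-east.
obs = [(0.0, 0.0, 1), (0.1, 0.0, 1), (1.0, 1.0, 0), (1.1, 1.0, 0)]

mean_sw, var_sw = nw_smooth(obs, 0.0, 0.0, bandwidth=0.2)
mean_ne, var_ne = nw_smooth(obs, 1.0, 1.0, bandwidth=0.2)
mean_mid, var_mid = nw_smooth(obs, 0.5, 0.5, bandwidth=0.2)
```

The bandwidth is fixed everywhere, so this smoother does not adapt to how much data sits near the query point. An estimate in a sparse region rests on very few observations and gets no pooling toward an overall rate, which is worth keeping in mind when answering the question on the slide.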

Results for data with no spatial trend.

Results for data with a spatial trend (simulation 2)

Results for data with a spatial trend (simulations 3 and 4)
