acreg: Arbitrary Correlation Regression

Fabrizio Colella (UNIL), Rafael Lalive (UNIL), Seyhun O. Sakalli (King's College), Mathias Thoenig (UNIL)
www.acregstata.weebly.com
(Virtual) Swiss Stata Meeting 2020, Bern, November 2020

Introduction

Motivation I
Modeling the complex correlation structures between units improves inference:
• Spatial data:
  - Geographical positions of observations
  - Neighborhood structures
• Network data:
  - Social networks
  - Mobile data
  - Co-working relations

Motivation II
But only a few studies offer a flexible theoretical framework (Bester et al., 2011). Common practices:
• Spatial data:
  - Clustering (Cameron et al., 2011)
  - Conley's spatial clustering (Conley, 1999a)
• Network data:
  - Clustering

Motivation III
And the Stata tools available on the topic are limited:
• Robust (White, 1980) and two-way clustering corrections (Cameron and Miller, 2015) are included in most programs computing OLS and 2SLS regressions.
• In the spatial literature, some programs account for correlation using coordinates:
  - Conley (1999b)
  - Hsiang (2010)
• There are no Stata packages available to account for correlation between neighbors or between observations in a network.

Motivation IV
In a related paper (Colella et al., 2019):
• Building on White (1980), we develop an arbitrary clustering approach for inference under any type of topological and temporal dependence between observational units.
• We perform extensive Monte Carlo simulations for both spatial and network data structures, comparing different methods.
• We show that commonly used techniques reject the null hypothesis about 110% more often than they should, while our approach gets close to the nominal rejection rate.
• We provide guidelines for conducting inference in complex settings.

This Paper
We introduce a new Stata package (and a companion paper) implementing the standard-error correction proposed in Colella et al. (2019), acreg: Arbitrary Correlation Regression.
• Computes adjusted standard errors for:
  - Spatial data (coordinates or contiguity matrix)
  - Network data (adjacency matrix)
  - Multi-way clustering environments (any number of clustering variables)
• Supports OLS and 2SLS estimation
• Accounts for temporal correlation in panel data

Correlation with Spatial Data

Correlation in Space
[Figure: Income in 1990 for southern U.S. counties (Messner et al., 1999)]

Correlation in Space - Clustering by State
[Figure: Income in 1990 for southern U.S. counties (Messner et al., 1999)]

Correlation in Space - Conley (1999)
[Figure: Income in 1990 for southern U.S. counties (Messner et al., 1999)]

Correlation with Network Data

Correlation in Network

Correlation in Network - One-way Clustering

Correlation in Network - Network Clusters

Adjacency matrix

      j1  j2  j3  j4  j5  j6  j7  j8  j9 j10 j11
j1     1   0   1   0   0   1   1   0   0   0   1
j2     0   1   1   0   1   0   0   1   0   0   1
j3     1   1   1   0   0   0   0   0   0   1   0
j4     0   0   0   1   0   0   1   1   0   1   0
j5     0   1   0   0   1   0   0   0   0   0   1
j6     1   0   0   0   0   1   1   0   0   0   0
j7     0   0   0   1   0   1   1   0   0   1   0
j8     0   1   0   1   0   0   0   1   1   0   0
j9     1   0   0   0   0   0   0   1   1   0   0
j10    0   0   1   1   0   0   1   0   0   1   0
j11    1   1   0   0   1   0   0   0   0   0   1

Conceptual Framework

Theoretical VCV of the OLS estimator
Linear model:
$$y = X\beta + \epsilon$$
Standard OLS estimator:
$$b_{OLS} = (X'X)^{-1}(X'y)$$
with variance
$$\mathrm{VCV}(b_{OLS}) = (X'X)^{-1} X'\Omega X (X'X)^{-1}$$
where:
• $y$ is the dependent variable
• $X$ is the matrix of regressors (exogenous and endogenous)
• $\Omega$ is the VCV of the errors

Estimating the VCV of the OLS estimator
The proposed estimator for $X'\Omega X$ is:
$$X'\left(S \circ (uu')\right)X = \sum_{i=1}^{n}\sum_{t=1}^{T}\sum_{j=1}^{n}\sum_{s=1}^{T} x_{it}\, u_{it}\, u_{js}\, x_{js}'\, s_{itjs}$$
where:
• $u \equiv y - X b_{OLS}$ are the estimated residuals, $x_{it}$ is the column vector of regressors of observation $it$, and $\circ$ denotes the element-wise product
• Each $itjs$-th component $s_{itjs}$ of $S$ is a correlation weight in $[0,1]$
• The correlation weight should reflect the dependence of the error of observation $it$ on the error of observation $js$
• The matrix $S$ can be computed from the adjacency matrix
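To make the formula concrete, here is a minimal pure-Python sketch of the sandwich $(X'X)^{-1}\,[X'(S \circ uu')X]\,(X'X)^{-1}$ on a tiny stacked dataset, one row of X per observation $it$. It illustrates the formula only and is not acreg's implementation: it applies no small-sample correction, and the helper names (`ols`, `meat`, `sandwich_vcv`) are hypothetical.

```python
# Minimal sketch (not acreg's code) of the sandwich estimator
#   VCV(b_OLS) = (X'X)^-1 [X'(S o uu')X] (X'X)^-1
# X is stacked: one row per observation "it"; S is an N x N weight matrix.

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inverse(A):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def ols(X, y):
    """Return OLS coefficients b and residuals u = y - X b."""
    Xt = transpose(X)
    b = [r[0] for r in matmul(inverse(matmul(Xt, X)), matmul(Xt, [[v] for v in y]))]
    u = [yi - sum(xk * bk for xk, bk in zip(xi, b)) for xi, yi in zip(X, y)]
    return b, u

def meat(X, u, S):
    """X'(S o uu')X = sum over pairs (a, b) of S[a][b] * u[a] * u[b] * x_a x_b'."""
    K = len(X[0])
    M = [[0.0] * K for _ in range(K)]
    for a in range(len(X)):
        for b in range(len(X)):
            w = S[a][b] * u[a] * u[b]
            for k in range(K):
                for l in range(K):
                    M[k][l] += w * X[a][k] * X[b][l]
    return M

def sandwich_vcv(X, u, S):
    bread = inverse(matmul(transpose(X), X))
    return matmul(matmul(bread, meat(X, u, S)), bread)

# Toy data: intercept plus one regressor, three observations.
X = [[1, 0], [1, 1], [1, 2]]
y = [1, 2, 4]
b, u = ols(X, y)
identity = [[1 if a == c else 0 for c in range(3)] for a in range(3)]
V_white = sandwich_vcv(X, u, identity)  # S = I gives White's (1980) robust VCV
```

Two limiting cases help build intuition: with S equal to the identity, the estimator reduces to White's (1980) heteroskedasticity-robust VCV; with S a matrix of ones (every observation in one cluster), the meat collapses to $(X'u)(X'u)'$, which is zero because OLS residuals are orthogonal to X.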

Syntax

Syntax - Baseline
acreg depvar [varlist1] [(varlist2 = varlist_iv)] [if] [in] [fweight | pweight]
• depvar is the dependent variable
• varlist1 is the list of exogenous variables
• varlist2 is the list of endogenous variables
• varlist_iv is the list of exogenous variables used, together with varlist1, as instruments for varlist2

Syntax - Time Dimension
acreg depvar varlist1 (varlist2 = varlist_iv), id(idvar) time(timevar) lag(#)
• idvar is the cross-sectional unit identifier
• timevar is the time variable
• lag(#) specifies the time-lag cutoff for observations with the same idvar, beyond which the correlation between their error terms is assumed to be zero
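As an illustration of what the time dimension adds to S, the sketch below sets a weight of 1 for pairs of observations that share the same id and lie at most `lag` periods apart, and 0 otherwise. The uniform 0/1 weighting inside the window is an assumption made for this sketch (acreg's actual weighting may differ), and `time_weights` is a hypothetical helper.

```python
# Sketch: 0/1 correlation weights along the time dimension of a panel.
# Assumption for illustration: uniform weights inside the lag window
# (acreg's actual weighting scheme may differ).

def time_weights(ids, times, lag):
    """s[a][b] = 1 if a and b share the same id and |t_a - t_b| <= lag, else 0."""
    n = len(ids)
    return [[1 if ids[a] == ids[b] and abs(times[a] - times[b]) <= lag else 0
             for b in range(n)]
            for a in range(n)]

ids = ["A", "A", "A", "B", "B"]
times = [2000, 2001, 2003, 2000, 2001]
S = time_weights(ids, times, lag=1)
for row in S:
    print(row)
```

With lag=1, unit A's 2000 and 2001 observations are linked, while its 2003 observation is two or more periods away from both and gets zero weight with them; observations of different units are never linked along this dimension.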

Syntax - Spatial I
acreg depvar varlist1 (varlist2 = varlist_iv), spatial latitude(latitudevar) longitude(longitudevar) dist(#)
• spatial specifies the spatial environment
• latitudevar is the variable containing the latitude of each observation in decimal degrees, range [-90.0, 90.0]
• longitudevar is the variable containing the longitude of each observation in decimal degrees, range [-180.0, 180.0]
• dist(#) specifies the distance cutoff, in km, beyond which the correlation between the error terms of two observations is assumed to be zero
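The cutoff rule can be sketched as follows, assuming great-circle (haversine) distances on a sphere of radius 6,371 km and a weight of 1 for every pair within the cutoff. Both choices are illustrative assumptions rather than a description of acreg's internals, and the function names are hypothetical.

```python
import math

# Assumption: mean Earth radius in km; acreg may use a different value or formula.
EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))  # clamp rounding error

def spatial_weights(lats, lons, cutoff_km):
    """s[a][b] = 1 if observations a and b are at most cutoff_km apart, else 0."""
    n = len(lats)
    return [[1 if haversine_km(lats[a], lons[a], lats[b], lons[b]) <= cutoff_km else 0
             for b in range(n)]
            for a in range(n)]

# Three points on the equator at longitudes 0, 1 and 5 degrees.
lats = [0.0, 0.0, 0.0]
lons = [0.0, 1.0, 5.0]
S = spatial_weights(lats, lons, cutoff_km=200)
```

One degree of longitude on the equator is roughly 111 km, so with a 200 km cutoff only the first two points are linked.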

Syntax - Spatial II
acreg depvar varlist1 (varlist2 = varlist_iv), spatial dist_mat(varlist_distances) dist(#)
• spatial specifies the spatial environment
• varlist_distances is the list of N variables containing bilateral spatial distances between observations in any meaningful metric, e.g., the physical or travel distance between two locations
• dist(#) specifies the distance cutoff beyond which the correlation between the error terms of two observations is assumed to be zero, in the same metric as varlist_distances
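The "N variables" layout means that each observation row carries one distance column per observation, so the dataset itself stores the full N x N matrix. The sketch below mimics that layout with hypothetical column names d1, d2, d3 and thresholds it with a cutoff expressed in the same metric (here, travel minutes). The 0/1 weighting is an assumption for illustration.

```python
# Sketch: weights from a precomputed bilateral distance matrix stored as N columns.
# Row i holds distances d(i, 1..N); column names d1..dN are hypothetical.

rows = [
    {"id": 1, "d1": 0.0,  "d2": 35.0, "d3": 90.0},
    {"id": 2, "d1": 35.0, "d2": 0.0,  "d3": 60.0},
    {"id": 3, "d1": 90.0, "d2": 60.0, "d3": 0.0},
]

def weights_from_distance_columns(rows, cutoff):
    """s[i][j] = 1 if d(i, j) <= cutoff, else 0 (assumed uniform weights)."""
    n = len(rows)
    cols = [f"d{j}" for j in range(1, n + 1)]
    return [[1 if row[c] <= cutoff else 0 for c in cols] for row in rows]

S = weights_from_distance_columns(rows, cutoff=60.0)  # e.g. 60 minutes of travel time
```

Because the metric is arbitrary, the same thresholding works for kilometres, travel time, or any other bilateral distance, as long as the cutoff uses the same unit.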

Syntax - Network I
acreg depvar varlist1 (varlist2 = varlist_iv), network links_mat(varlist_links) dist(#)
• network specifies the network environment
• varlist_links is the list of N binary variables specifying the links between observations, e.g., the adjacency matrix. The links between two units can change over time.
• dist(#) specifies the distance cutoff (in geodesic path length) beyond which the correlation between the error terms of two observations is assumed to be zero. If # is greater than 1, acreg computes the bilateral distance between each pair of nodes.
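When the cutoff exceeds 1, bilateral distances must be derived from the links. A standard way to do this is breadth-first search from each node, which gives the number of links on the shortest path. The sketch below does exactly that and then applies the cutoff. It treats links as undirected (i and j are neighbours if either entry is 1); this is an assumption, and it matters because the adjacency matrix shown earlier is not perfectly symmetric (row j1 links to j7, but row j7 does not link to j1). Function names are hypothetical.

```python
from collections import deque

INF = float("inf")

def geodesic_distances(adj):
    """All-pairs shortest-path lengths (number of links) via BFS from each node.
    Assumption: links are undirected, so i and j are neighbours if adj[i][j] or adj[j][i]."""
    n = len(adj)
    nbrs = [[j for j in range(n) if j != i and (adj[i][j] or adj[j][i])] for i in range(n)]
    dist = []
    for src in range(n):
        d = [INF] * n
        d[src] = 0
        queue = deque([src])
        while queue:
            v = queue.popleft()
            for w in nbrs[v]:
                if d[w] == INF:          # first visit is the shortest path in BFS
                    d[w] = d[v] + 1
                    queue.append(w)
        dist.append(d)
    return dist

def network_weights(adj, cutoff):
    """s[i][j] = 1 if the geodesic distance between i and j is at most cutoff, else 0."""
    return [[1 if d <= cutoff else 0 for d in row] for row in geodesic_distances(adj)]

# A path 0-1-2-3 plus an isolated node 4 (1s on the diagonal, as in the slide's matrix).
adj = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1],
]
D = geodesic_distances(adj)
S = network_weights(adj, cutoff=2)
```

With a cutoff of 1 only direct links receive weight; raising it to 2 also links friends of friends, and nodes in separate components never get a positive weight.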

Syntax - Network II
acreg depvar varlist1 (varlist2 = varlist_iv), network dist_mat(varlist_distances) dist(#)
• network specifies the network environment
• varlist_distances is the list of N variables containing bilateral distances between observations in the network, i.e., the number of links along the shortest path between two nodes
• dist(#) specifies the distance cutoff (in geodesic path length) beyond which the correlation between the error terms of two observations is assumed to be zero. If # is greater than 1, acreg computes the bilateral distance between each pair of nodes.

Syntax - Multiway Clustering
acreg depvar varlist1 (varlist2 = varlist_iv), cluster(varlist_cluster)
• varlist_cluster is the list of variables identifying the different clusters. Each variable identifies one clustering dimension and the clusters within it.
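One way to express multiway clustering as a weight matrix is to give a pair of observations weight 1 whenever they share a cluster in at least one dimension. This is a minimal sketch under that assumption, not a description of acreg's code, and `multiway_cluster_weights` is a hypothetical helper. For two dimensions, this union indicator equals the firm weights plus the year weights minus the firm-by-year weights, which is the inclusion-exclusion form commonly used for two-way clustering.

```python
def multiway_cluster_weights(cluster_vars):
    """cluster_vars: list of clustering variables, each a list with one label per observation.
    s[a][b] = 1 if a and b share a cluster in at least one dimension, else 0."""
    n = len(cluster_vars[0])
    return [[1 if any(c[a] == c[b] for c in cluster_vars) else 0 for b in range(n)]
            for a in range(n)]

firm = ["A", "A", "B", "C"]
year = [2000, 2001, 2000, 2002]
S = multiway_cluster_weights([firm, year])

# Single-dimension and intersection (firm-by-year) weights, for comparison.
S_firm = multiway_cluster_weights([firm])
S_year = multiway_cluster_weights([year])
S_both = multiway_cluster_weights([list(zip(firm, year))])
```

Any number of clustering variables can be passed in the list; each extra dimension can only add links, never remove them.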
