inference with arbitrary clustering



  1. Inference with Arbitrary Clustering
     Fabrizio Colella,∗ Rafael Lalive,∗ Seyhun O. Sakalli,∗ Mathias Thoenig∗
     Swiss Stata Users Group Meeting, October 2018
     ∗ University of Lausanne

  2. Introduction

  3. Motivation
     A tremendous surge of empirical analysis with spatial data:
     • Growing availability of geocoded data
     • Integration of geographic information systems (GIS) in the toolkit of economists
     Network relations among individuals known and easily accessible
     Need for econometric methods to obtain asymptotically valid inference in settings with varying types of spatial, network, and temporal dependence between observation units
     Absence of Stata commands, especially in the 2SLS setting
     Colella, Lalive, Sakalli, and Thoenig: Inference with Arbitrary Clustering

  4. This paper
     Proposes an approach to obtain asymptotically valid inference in the presence of arbitrary correlation (spatial or within a network) in both OLS and 2SLS settings
     Provides a package, acreg, for the statistical software Stata
     Performs Monte Carlo simulations (using spatial data on U.S. towns and counties) to show the properties and performance of the proposed estimator
     • Generate random variables and check how close we get to a 5% null-rejection rate at the 5% test level, following Bertrand, Duflo, and Mullainathan (2004)

  5. Stata command: acreg
     What is new in acreg compared to existing packages?
     • Performs standard error correction in both OLS and 2SLS settings following White (1980)
     • Correlation weights can be given as input or computed from spatial or network relations or multi-way clustering (Cameron et al., 2011)
     • Spatial relations can be defined either with a distance cutoff or with a contiguity/distance matrix (neighboring observations only)
     • Network relations can be defined with a matrix of links, with a distance matrix, or with any arbitrary cluster structure that the user defines
     • Allows for observation i in time t to be correlated with observation j in its cluster in time t + s
     • HAC standard errors and distance decays are optional
     • Fixes some bugs that exist in Conley (1999) and Hsiang (2010)
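The distance-cutoff and distance-decay options listed above can be illustrated outside Stata. The following is a minimal Python sketch, not acreg's implementation: the function names, the haversine distance, and the linear (Bartlett-type) decay `1 - d / cutoff` are assumptions made for illustration.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption of this sketch)

def haversine_km(lat, lon):
    """Pairwise great-circle distances in km; inputs are in degrees."""
    lat = np.radians(np.asarray(lat, dtype=float))
    lon = np.radians(np.asarray(lon, dtype=float))
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def spatial_weights(lat, lon, cutoff_km, decay=False):
    """Correlation weights in [0, 1] from a distance cutoff.

    decay=False: weight 1 for pairs within the cutoff, 0 otherwise.
    decay=True:  hypothetical linear decay, 1 - d / cutoff, floored at 0.
    """
    d = haversine_km(lat, lon)
    if decay:
        w = np.clip(1.0 - d / cutoff_km, 0.0, 1.0)
    else:
        w = (d <= cutoff_km).astype(float)
    np.fill_diagonal(w, 1.0)  # every observation is fully linked to itself
    return w

# Usage: Lausanne, Geneva, Zurich (approximate coordinates), 100 km cutoff
lat = [46.52, 46.20, 47.37]
lon = [6.63, 6.14, 8.54]
W_uniform = spatial_weights(lat, lon, cutoff_km=100.0)
W_decay = spatial_weights(lat, lon, cutoff_km=100.0, decay=True)
```

With the uniform weights, Lausanne and Geneva are treated as fully correlated and Zurich as unrelated to both; with decay, the Lausanne-Geneva weight falls strictly between 0 and 1.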

  6. Arbitrary Clustering

  7. Spatial - 1 Cluster

  8. Spatial - 2 Overlapping clusters

  9. Network

  10. Network - Adjacency matrix

            j1  j2  j3  j4  j5  j6  j7  j8  j9 j10 j11
      j1     1   0   1   0   0   1   1   0   0   0   1
      j2     0   1   1   0   1   0   0   1   0   0   1
      j3     1   1   1   0   0   0   0   0   0   1   0
      j4     0   0   0   1   0   0   1   1   0   1   0
      j5     0   1   0   0   1   0   0   0   0   0   1
      j6     1   0   0   0   0   1   1   0   0   0   0
      j7     0   0   0   1   0   1   1   0   0   1   0
      j8     0   1   0   1   0   0   0   1   1   0   0
      j9     1   0   0   0   0   0   0   1   1   0   0
      j10    0   0   1   1   0   0   1   0   0   1   0
      j11    1   1   0   0   1   0   0   0   0   0   1

  11. Conceptual Framework

  12. Theoretical VCV of the 2SLS estimator
      Standard IV estimator:
      $$\hat{b}_{2SLS} = (\hat{X}'\hat{X})^{-1}(\hat{X}'y)$$
      With variance:
      $$VCV(\hat{b}_{2SLS}) = (\hat{X}'\hat{X})^{-1}\,\hat{X}'\Omega\hat{X}\,(\hat{X}'\hat{X})^{-1}$$
      Where:
      $y$ is the dependent variable
      $X$ is the matrix of regressors (exogenous and endogenous)
      $Z$ is the matrix of instruments (excluded and included)
      $\hat{X} = Z(Z'Z)^{-1}(Z'X)$ is the matrix of fitted values from the first-stage regression
      $\Omega$ is the VCV of errors
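The two formulas on this slide translate directly into matrix code. The sketch below is a plain NumPy transcription under the slide's notation; the function names `two_sls` and `sandwich_vcv` are hypothetical, and `omega` is taken as a known input here, which is only possible in a simulation.

```python
import numpy as np

def two_sls(y, X, Z):
    """2SLS point estimate: b = (X_hat' X_hat)^{-1} X_hat' y.

    X_hat = Z (Z'Z)^{-1} Z'X are the first-stage fitted values.
    Z must contain the included exogenous regressors (e.g. the constant).
    """
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    b = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)
    return b, X_hat

def sandwich_vcv(X_hat, omega):
    """Theoretical VCV: (X_hat'X_hat)^{-1} X_hat' Omega X_hat (X_hat'X_hat)^{-1}."""
    bread = np.linalg.inv(X_hat.T @ X_hat)
    return bread @ (X_hat.T @ omega @ X_hat) @ bread
```

When `Z` equals `X`, the fitted values coincide with `X` and `two_sls` reduces to OLS; when `omega` is a scalar multiple of the identity, the sandwich collapses to that scalar times `(X_hat'X_hat)^{-1}`.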

  13. Estimating the VCV of the 2SLS estimator
      Proposed estimator for $\hat{X}'\Omega\hat{X}$:
      $$\hat{X}'\left(S \mathbin{.\times} (uu')\right)\hat{X} = \sum_{i=1}^{n}\sum_{t=1}^{T}\sum_{j=1}^{n}\sum_{s=1}^{T} \hat{x}_{it}\,u_{it}\,u_{js}\,\hat{x}_{js}\,s_{itjs}$$
      Where $u \equiv y - \hat{X}\hat{\beta}_{2SLS}$ are the estimated residuals
      • Each $itjs$-th component of $S$, $s_{itjs}$, is a correlation weight in $[0,1]$
      • The correlation weight can be arbitrarily set
      • The correlation weight should reflect the dependence of the error of observation $it$ on the error of observation $js$
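The matrix form and the quadruple sum describe the same object once the (i, t) pairs are stacked into a single observation index. The sketch below checks this numerically; it is an illustration rather than acreg's code, the function names are hypothetical, the residuals `u` are taken as given, and each `x` is treated as a column vector so that every term in the sum is an outer product.

```python
import numpy as np

def weighted_meat(X_hat, u, S):
    """Matrix form: X_hat' (S .* uu') X_hat, with .* the element-wise product."""
    return X_hat.T @ (S * np.outer(u, u)) @ X_hat

def weighted_meat_loop(X_hat, u, S):
    """Same quantity as an explicit double sum over stacked observations.

    Index a stands for a pair (i, t) and index b for a pair (j, s).
    """
    n_obs, k = X_hat.shape
    meat = np.zeros((k, k))
    for a in range(n_obs):
        for b in range(n_obs):
            meat += S[a, b] * u[a] * u[b] * np.outer(X_hat[a], X_hat[b])
    return meat

def arbitrary_cluster_vcv(X_hat, u, S):
    """Sandwich VCV with the weighted meat between two bread matrices."""
    bread = np.linalg.inv(X_hat.T @ X_hat)
    return bread @ weighted_meat(X_hat, u, S) @ bread
```

Setting `S` to the identity matrix keeps only the own-observation terms, which gives the heteroskedasticity-robust meat of White (1980); setting every entry to 1 for observations in the same group gives the one-way cluster-robust meat.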

  14. Asymptotics of the proposed estimator (work in progress)
      Equivalence with multi-way clustering
      • Any bilateral links structure can be represented by a multi-way clustering structure.
      • $\widehat{VCV}(\hat{\beta}_{2SLS})$ in a multi-way cluster environment can be represented as a sum of one-way cluster-robust matrices (Cameron et al. 2011)
      • The sandwich estimator of $\widehat{VCV}(\hat{\beta}_{2SLS})$ in a one-way cluster environment is consistent as $G \to \infty$ (White 1984; Arellano 1987; Rogers 1993; Hansen 2007)
      Dimensionality with arbitrary clustering (work in progress)
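The link between weight matrices and clustering can be checked numerically in the two-way case. The sketch below assumes the inclusion-exclusion form of Cameron et al. (2011), in which the two-way meat is the one-way meat for dimension A, plus that for B, minus that for the intersection of A and B. Function names are hypothetical, and cluster labels are assumed to be non-negative integers.

```python
import numpy as np

def same_group(labels):
    """Weight matrix with 1 where two observations share a label, else 0."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def meat(X, u, S):
    """X' (S .* uu') X for an arbitrary weight matrix S."""
    return X.T @ (S * np.outer(u, u)) @ X

def one_way_meat(X, u, labels):
    """Cluster-robust meat: sum over clusters g of (X_g'u_g)(X_g'u_g)'."""
    labels = np.asarray(labels)
    k = X.shape[1]
    total = np.zeros((k, k))
    for g in np.unique(labels):
        score = X[labels == g].T @ u[labels == g]
        total += np.outer(score, score)
    return total

def two_way_meat(X, u, a, b):
    """Inclusion-exclusion over dimensions a, b and their intersection."""
    a, b = np.asarray(a), np.asarray(b)
    both = a * (b.max() + 1) + b  # unique code per (a, b) pair
    return one_way_meat(X, u, a) + one_way_meat(X, u, b) - one_way_meat(X, u, both)
```

A weight matrix that equals 1 whenever two observations share a cluster in either dimension, `np.maximum(same_group(a), same_group(b))`, reproduces `two_way_meat` exactly, because the indicator of a union is the sum of the two indicators minus the indicator of the intersection.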

  15. Command

  16. acreg - Syntax: baseline

  17. acreg - Syntax: Spatial 1

  18. acreg - Syntax: Spatial 2

  19. acreg - Syntax: Network 1

  20. acreg - Syntax: Network 2

  21. acreg - Syntax: Multiway clustering

  22. acreg - Additional Options
      • Panel dimension and optional HAC standard errors
      • Allows for sampling weights (pweights)
      • Allows for ‘if’ and ‘in’ statements
      • Allows for partialling out up to 2 high-order fixed effects
      • Produces output similar to Stata’s native commands
      • Allows for storing distance matrix and weights matrix
      • Stores main results in e()

  23. acreg - Output: Spatial

  24. acreg - Output: Network

  25. Simulations

  26. Simulations
      In each Monte Carlo draw:
      1. Generate random variables $Y$ and $X_1$, and random shocks $\varepsilon_Y$ and $\varepsilon_{X_1}$, for each observation
      2. Distribute the random shocks to "linked observations"
         • Spatial environment: kernel around counties in the U.S.
         • Network environment: coauthors in economics (RePEc)
      3. Introduce the correlation in the model by adding the common shocks to $Y$ and $X_1$
      4. Regress $Y$ on $X_1$ and a constant
      Test: as the number of Monte Carlo draws approaches infinity, the null hypothesis that $\hat{\beta} = 0$, in a test with $\alpha = 0.05$, will be rejected 5% of the time only if spatial correlation is accounted for.
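The four steps can be sketched on a toy design. The code below is not the authors' simulation: instead of U.S. counties or RePEc coauthors it assumes observations placed on a ring, each linked to its `h` nearest neighbours on either side, and it uses the share of common shocks between two observations as the correlation weight. All names and parameter values are hypothetical.

```python
import numpy as np

def ring_links(n, h):
    """W[i, j] = 1 if j lies within h positions of i on a ring (i included)."""
    idx = np.arange(n)
    gap = np.abs(idx[:, None] - idx[None, :])
    gap = np.minimum(gap, n - gap)
    return (gap <= h).astype(float)

def ols_tstat(y, x, S):
    """t-statistic on the slope, using the meat X' (S .* uu') X."""
    X = np.column_stack([np.ones_like(x), x])
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    u = y - X @ beta
    scores = X * u[:, None]
    vcv = bread @ (scores.T @ S @ scores) @ bread  # equals X'(S .* uu')X in the middle
    return beta[1] / np.sqrt(vcv[1, 1])

def null_rejection_rates(n=400, h=5, draws=300, seed=0):
    """Return (uncorrected, corrected) null-rejection rates at the 5% level."""
    rng = np.random.default_rng(seed)
    W = ring_links(n, h)
    S_corrected = W @ W.T / (2 * h + 1)  # share of common shocks, in [0, 1]
    S_uncorrected = np.eye(n)            # heteroskedasticity-robust only
    reject = np.zeros(2)
    for _ in range(draws):
        # Step 1: independent Y and X_1, plus one shock per observation
        y, x = rng.standard_normal(n), rng.standard_normal(n)
        eps_y, eps_x = rng.standard_normal(n), rng.standard_normal(n)
        # Steps 2-3: each observation receives the shocks of its linked units
        y = y + W @ eps_y
        x = x + W @ eps_x
        # Step 4: regress Y on X_1 and a constant, test slope = 0
        for k, S in enumerate((S_uncorrected, S_corrected)):
            reject[k] += abs(ols_tstat(y, x, S)) > 1.96
    return reject / draws

rates = null_rejection_rates()
```

Because `Y` and `X_1` are generated independently, the true slope is zero and any rejection is a false positive. The weight matrix `W @ W.T / (2 * h + 1)` is positive semidefinite by construction, which keeps the estimated variance non-negative; it down-weights distant pairs, so the corrected test in this toy design can still over-reject somewhat.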

  27. Results

  28. Spatial setting: Null-rejection rates
      Data generating process: Bartlett kernel

      Spatial                                            U.S. towns   U.S. towns   U.S. counties
      correlation  Correction  Endogeneity  Estimator    N=101 (1)    N=1001 (2)   N=3141 (3)

      Panel A: Cross section, t = 1
      no           no          no           OLS          5.9%         5.0%         5.0%
      no           no          yes          2SLS         5.6%         5.1%         5.2%
      yes          no          no           OLS          37.8%        50.2%        28.2%
      yes          no          yes          2SLS         33.4%        48.3%        26.5%
      yes          yes         no           OLS          16.8%        7.2%         5.6%
      yes          yes         yes          2SLS         16.7%        8.4%         5.5%

      Panel B: Panel, t = 5
      no           no          no           OLS          5.8%         5.1%         5.3%
      no           no          yes          2SLS         5.3%         5.0%         4.6%
      yes          no          no           OLS          39.1%        46.1%        17.9%
      yes          no          yes          2SLS         37.3%        44.3%        15.5%
      yes          yes         no           OLS          19.4%        11.2%        10.1%
      yes          yes         yes          2SLS         19.0%        11.1%        9.6%

  29. Spatial setting: Null-rejection rates by sample size, cross section, t=1
      [Figure: null-rejection rate (vertical axis) against number of cities per state (horizontal axis), not corrected vs. corrected. Panels: (a) OLS, (b) 2SLS]

  30. Spatial setting: Null-rejection rates by sample size, panel, t=5
      [Figure: null-rejection rate (vertical axis) against number of cities per state (horizontal axis), not corrected vs. corrected. Panels: (c) OLS, (d) 2SLS]

  31. Network setting: Null-rejection rates
      Data generating process: First-degree friends

                                                         Top of the distribution    Random sample
      Network                                            N=1000      N=2500         N=1000      N=2500
      correlation  Correction  Endogeneity  Estimator    (1)         (2)            (3)         (4)

      no           no          no           OLS          5.1%        4.7%           4.7%        5.1%
      no           no          yes          2SLS         5.3%        4.9%           5.4%        4.7%
      yes          no          no           OLS          64.9%       59.0%          26.9%       36.2%
      yes          no          yes          2SLS         63.0%       58.2%          25.4%       35.4%
      yes          yes         no           OLS          13.2%       9.2%           7.5%        8.1%
      yes          yes         yes          2SLS         13.4%       9.7%           7.2%        8.4%

  32. Conclusions
