
PPMLHDFE: Fast Poisson Estimation with High Dimensional Fixed Effects
Paulo Guimarães
2020 Portuguese Stata Conference

Introduction
- Poisson regression is the standard approach to modeling count data
- it is also an alternative for multiplicative models in which the dependent variable is nonnegative
- the only assumption required for consistency is correct specification of the conditional mean of the dependent variable
- Poisson regression vs. Poisson pseudo-maximum likelihood (PPML) regression

Advantages of PPML
- works with any nonnegative dependent variable
- no need to specify a distribution for the dependent variable
- a natural way to deal with zero values of the dependent variable
- unlike log-linear OLS, it is robust to heteroskedasticity

Why is OLS sometimes preferred?
- researchers sometimes resort to log-linear regressions in contexts where PPML would be better justified
- one reason is the ability to estimate linear regressions with multiple fixed effects
- Stata users are familiar with the user-written package reghdfe
- reghdfe (Sergio Correia) is the state-of-the-art tool for estimating linear regression models with HDFE
- but PPML with HDFE can be implemented with (almost) the same ease as linear regression with HDFE

Generalized Linear Models
- GLMs are a class of regression models based on the exponential family of distributions (Nelder and Wedderburn, 1972)
- GLMs include popular nonlinear regression models such as logit, probit, cloglog, and Poisson
- the exponential family is given by
$$f_y(y;\theta,\phi) = \exp\left\{\frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi)\right\}$$
where $a(\cdot)$, $b(\cdot)$, and $c(\cdot)$ are specific functions and $\phi$ and $\theta$ are parameters

Generalized Linear Models (cont.)
- for these models $E(y) = \mu = b'(\theta)$ and $V(y) = b''(\theta)\,a(\phi)$
- given a set of $n$ independent observations, each indexed by $i$, the expected value can be related to a set of covariates $x_i$ by means of a link function $g(\cdot)$; more specifically, it is assumed that $E(y_i) = \mu_i = g^{-1}(x_i\beta)$
- the likelihood for the GLM may be written as
$$L(\theta,\phi;y_1,y_2,\dots,y_n) = \prod_{i=1}^{n}\exp\left\{\frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + c(y_i,\phi)\right\}$$
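As a worked instance of these definitions (a standard textbook derivation, not taken from the slides), the Poisson distribution fits the exponential-family template with $\theta = \log\mu$, and its moments, log-likelihood, and score follow directly:

```latex
\begin{aligned}
f(y;\mu) &= \frac{e^{-\mu}\mu^{y}}{y!} = \exp\{\, y\log\mu - \mu - \log y! \,\} \\
&\Rightarrow\ \theta = \log\mu,\quad b(\theta) = e^{\theta},\quad a(\phi) = 1,\quad c(y,\phi) = -\log y! \\
E(y) &= b'(\theta) = e^{\theta} = \mu, \qquad V(y) = b''(\theta)\,a(\phi) = e^{\theta} = \mu \\
\log L(\beta) &= \sum_{i=1}^{n}\left[\, y_i x_i\beta - \exp(x_i\beta) - \log y_i! \,\right]
  \qquad (\text{log link: } \mu_i = \exp(x_i\beta)) \\
\frac{\partial \log L}{\partial \beta} &= \sum_{i=1}^{n}\left(y_i - \exp(x_i\beta)\right) x_i' = 0
\end{aligned}
```

The score depends on $y_i$ only through $y_i - \mu_i$, so its expectation is zero whenever $E(y_i \mid x_i) = \exp(x_i\beta)$, whatever the actual distribution of $y_i$. This is why consistency requires only a correctly specified conditional mean, and why the same estimating equations can be applied to non-integer nonnegative outcomes.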

Estimation
- applying the Gauss-Newton algorithm with the expected Hessian leads to the following updating equation:
$$\beta^{(r)} = \left(X'W^{(r-1)}X\right)^{-1}X'W^{(r-1)}z^{(r-1)}$$
where $X$ is the design matrix of explanatory variables, $W^{(r-1)}$ is a weighting matrix, $z^{(r-1)}$ is a transformation of the dependent variable, and $r$ is an iteration index
- the estimates are obtained by recursive application of weighted least squares
- this approach is known as Iteratively Reweighted Least Squares (IRLS)
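Each IRLS update is a single weighted least-squares solve of the normal equations $X'W^{(r-1)}X\,\beta = X'W^{(r-1)}z^{(r-1)}$. The pure-Python sketch below shows one such step; the function names `solve` and `wls_step` are hypothetical, and a production implementation would use a numerical linear-algebra library (for example a Cholesky or QR factorization) rather than naive Gaussian elimination.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # move the row with the largest pivot into place for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # back substitution on the upper-triangular system
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def wls_step(X, w, z):
    """One IRLS update: beta = (X'WX)^{-1} X'Wz with W = diag(w); X is a list of rows."""
    k = len(X[0])
    xtwx = [[sum(wi * row[p] * row[q] for row, wi in zip(X, w)) for q in range(k)]
            for p in range(k)]
    xtwz = [sum(wi * row[p] * zi for row, wi, zi in zip(X, w, z)) for p in range(k)]
    return solve(xtwx, xtwz)
```

For instance, with `X = [[1, 0], [1, 1], [1, 2]]`, unit weights, and `z = [1, 3, 5]`, `wls_step` recovers (up to rounding) the exact line $z = 1 + 2x$. In IRLS, `w` and `z` are recomputed from the current estimates before every call.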

The Poisson regression model
- for Poisson regression we have $E(y_i) = \mu_i = \exp(x_i\beta)$, and the regression weights used to implement IRLS simplify to
$$W^{(r-1)} = \operatorname{diag}\left\{\exp\left(x_i\beta^{(r-1)}\right)\right\}$$
- while the dependent variable for the intermediate regression becomes
$$z_i^{(r-1)} = \frac{y_i - \exp\left(x_i\beta^{(r-1)}\right)}{\exp\left(x_i\beta^{(r-1)}\right)} + x_i\beta^{(r-1)}$$
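Combining these Poisson weights and working variable with repeated weighted least-squares solves yields a complete, if naive, Poisson IRLS. The sketch below handles an intercept plus one regressor so that each weighted regression has a closed form. The starting values $\mu^{(0)}_i = (y_i + \bar y)/2$ and the stopping rule on coefficient changes are assumptions of this illustration, not necessarily what ppmlhdfe does, and `poisson_irls` is a hypothetical name.

```python
import math


def poisson_irls(x, y, tol=1e-10, max_iter=100):
    """Poisson regression of y on an intercept and one regressor x, by IRLS.

    Returns (b0, b1) such that E(y | x) = exp(b0 + b1 * x).
    """
    ybar = sum(y) / len(y)
    mu = [(yi + ybar) / 2 for yi in y]   # assumed starting values (all > 0 if ybar > 0)
    eta = [math.log(m) for m in mu]
    b0 = b1 = 0.0
    for _ in range(max_iter):
        w = mu                                                   # W = diag(exp(x_i b))
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]   # working dependent variable
        # weighted least squares of z on (1, x), in closed form
        sw = sum(w)
        xw = sum(wi * xi for wi, xi in zip(w, x)) / sw
        zw = sum(wi * zi for wi, zi in zip(w, z)) / sw
        new_b1 = (sum(wi * (xi - xw) * (zi - zw) for wi, xi, zi in zip(w, x, z))
                  / sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x)))
        new_b0 = zw - new_b1 * xw
        # update the linear predictor and the mean for the next iteration
        eta = [new_b0 + new_b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        if max(abs(new_b0 - b0), abs(new_b1 - b1)) < tol:
            return new_b0, new_b1
        b0, b1 = new_b0, new_b1
    raise RuntimeError("IRLS did not converge")
```

With a binary regressor the Poisson MLE reproduces the group means, so for `x = [0, 0, 1, 1]` and `y = [0, 4, 2, 6]` (group means 2 and 4) both coefficients converge to $\log 2$. The zero outcome causes no difficulty, whereas $\log y$ would be undefined for that observation.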

Dealing with HDFE
- $X$ may contain a large number of fixed effects that render the direct calculation of $\left(X'W^{(r-1)}X\right)^{-1}$ impractical, if not impossible
- the solution is an alternative updating formula that estimates only the coefficients of the covariates that are not fixed effects (say, $\delta$)
- we can rely on the Frisch-Waugh-Lovell (FWL) theorem to partial out the fixed effects and use the following updating equation:
$$\delta^{(r)} = \left(\tilde{X}'W^{(r-1)}\tilde{X}\right)^{-1}\tilde{X}'W^{(r-1)}\tilde{z}^{(r-1)}$$
where $\tilde{X}$ and $\tilde{z}$ are weighted within-transformed versions of the main covariate matrix $X$ and the working dependent variable $z$, respectively
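The weighted within-transformation can be computed without ever building the fixed-effect dummies. As an illustration only (production code such as reghdfe relies on accelerated variants of this idea), the sketch below sweeps out one or more fixed effects by alternating weighted group demeaning, i.e. the method of alternating projections, and then applies the FWL formula for a single non-fixed-effect covariate. With one fixed effect a single sweep is already exact; with several, sweeps repeat until the transformed variable stops changing. All function names are hypothetical.

```python
from collections import defaultdict


def weighted_demean(v, groups, w):
    """Subtract the weighted group mean of v for a single fixed-effect grouping."""
    num, den = defaultdict(float), defaultdict(float)
    for vi, g, wi in zip(v, groups, w):
        num[g] += wi * vi
        den[g] += wi
    return [vi - num[g] / den[g] for vi, g in zip(v, groups)]


def within_transform(v, fe_list, w, tol=1e-12, max_iter=10_000):
    """Partial out several fixed effects by alternating weighted demeaning."""
    for _ in range(max_iter):
        old = v
        for groups in fe_list:
            v = weighted_demean(v, groups, w)
        if max(abs(a - b) for a, b in zip(v, old)) < tol:
            return v
    raise RuntimeError("alternating projections did not converge")


def fwl_coefficient(x, z, fe_list, w):
    """delta = (x~' W x~)^{-1} x~' W z~ for one covariate x, with W = diag(w)."""
    xt = within_transform(x, fe_list, w)
    zt = within_transform(z, fe_list, w)
    num = sum(wi * a * b for wi, a, b in zip(w, xt, zt))
    den = sum(wi * a * a for wi, a in zip(w, xt))
    return num / den
```

If `z` equals `2 * x` plus an arbitrary effect per group, the group effects are swept out and `fwl_coefficient` returns 2 (up to rounding) for any positive weights. Inside HDFE-IRLS, `w` would hold the current Poisson weights and `z` the current working dependent variable.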

Existence of MLE
- the MLE for Poisson regression may not exist, in which case algorithms may fail to converge or may converge to incorrect estimates
- the problem was identified by Santos Silva and Tenreyro (2010)
- Correia, Guimarães, and Zylkin (2018) (CGZ) discuss the necessary and sufficient conditions for the existence of estimates in a wide class of GLMs
- CGZ show that, for Poisson regression, ML estimates can always be found once some observations are dropped from the sample
- these are called separated observations; they convey no relevant information for estimation and can be safely discarded
- CGZ propose a method to identify separated observations that succeeds even in the presence of HDFE
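Detecting separation in general requires the CGZ procedure, which is not reproduced here. The simplest special case is easy to state, though: if every outcome in a fixed-effect group is zero, that group's fixed effect diverges to minus infinity, so its observations are separated. The sketch below drops only that case; `drop_all_zero_groups` is a hypothetical helper, and it assumes a nonnegative outcome.

```python
from collections import defaultdict


def drop_all_zero_groups(y, fe_list):
    """Return indices of observations kept after dropping every observation that
    belongs to a fixed-effect group whose outcomes are all zero.

    Assumes y >= 0, so a zero group total means all outcomes in the group are zero.
    Separation driven by regressors is NOT detected by this check.
    """
    flagged = set()
    for groups in fe_list:
        totals = defaultdict(float)
        for yi, g in zip(y, groups):
            totals[g] += yi
        flagged.update(i for i, g in enumerate(groups) if totals[g] == 0)
    return [i for i in range(len(y)) if i not in flagged]
```

One pass suffices for this case: dropped observations all have $y_i = 0$, so removing them cannot turn a positive group total in another fixed effect into zero. Separation caused by combinations of regressors and fixed effects is what the CGZ method is designed to catch, and this sketch does not attempt it.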

ppmlhdfe
- ppmlhdfe: Poisson pseudo-likelihood regression with multiple levels of fixed effects
- authored by Sergio Correia, Paulo Guimarães, and Tom Zylkin
- requires the latest versions of ftools and reghdfe to be installed

ppmlhdfe (cont.)
- the same flexibility as reghdfe, allowing for multiple fixed effects and interactions
- allows weights, multi-way clustered standard errors, and count-model-specific options such as exposure and irr
- takes great care to verify the existence of maximum-likelihood estimates
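In count-model terms, an exposure variable $t_i$ enters the conditional mean as an offset whose coefficient is fixed at one, $\mu_i = t_i\exp(x_i\beta) = \exp(x_i\beta + \log t_i)$, and an incidence-rate ratio (IRR) is $\exp(\beta_j)$, the multiplicative change in the expected count for a one-unit increase in $x_j$. The two helpers below only illustrate these formulas; they are hypothetical and are not ppmlhdfe's interface.

```python
import math


def poisson_mean(xb, exposure=1.0):
    """Conditional mean with an exposure offset: t * exp(xb) = exp(xb + log t)."""
    return math.exp(xb + math.log(exposure))


def irr(coef):
    """Incidence-rate ratio implied by a Poisson coefficient: exp(beta_j)."""
    return math.exp(coef)
```

For example, a coefficient of 0.2 corresponds to an IRR of about 1.22, i.e. a roughly 22% higher expected count per unit increase in the regressor, regardless of the exposure level.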

Accelerating HDFE-IRLS
- ppmlhdfe directly embeds the Mata routines of reghdfe
- we within-transform (partial out) the original untransformed variables $z$ and $X$ only in the first IRLS iteration, and progressively update these transformed variables in later iterations
- the convergence criterion for the inner loops of reghdfe becomes tighter as the IRLS iterations approach convergence
- in practice, these innovations can reduce the total number of calls to reghdfe by 50% or more
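Why can previously transformed variables be reused? Partialling out with the new weights annihilates anything in the span of the fixed effects, and last iteration's $\tilde X$ differs from $X$ only by such fixed-effect terms, so transforming either one with the new weights gives the same result; when the weights change little between iterations, starting from $\tilde X$ typically leaves far less for the alternating projections to remove. The sketch below checks this identity numerically on a small two-way design. The data, the two weight vectors, and the function names are illustrative assumptions, not ppmlhdfe's Mata code.

```python
from collections import defaultdict


def weighted_demean(v, groups, w):
    # subtract the weighted mean of v within each group
    num, den = defaultdict(float), defaultdict(float)
    for vi, g, wi in zip(v, groups, w):
        num[g] += wi * vi
        den[g] += wi
    return [vi - num[g] / den[g] for vi, g in zip(v, groups)]


def partial_out(v, fe_list, w, tol=1e-12, max_sweeps=10_000):
    """Alternating weighted demeaning; returns (transformed v, sweeps used)."""
    for sweep in range(1, max_sweeps + 1):
        old = v
        for groups in fe_list:
            v = weighted_demean(v, groups, w)
        if max(abs(a - b) for a, b in zip(v, old)) < tol:
            return v, sweep
    raise RuntimeError("alternating projections did not converge")


firm = [0, 0, 1, 1, 2, 2, 0, 1]
year = [0, 1, 0, 1, 0, 1, 1, 0]
x = [1.0, 4.0, 2.0, 3.0, 0.0, 5.0, 2.5, 1.5]
w_old = [1.0, 2.0, 1.0, 3.0, 1.0, 2.0, 1.5, 0.5]  # illustrative weights at iteration r-1
w_new = [1.1, 1.9, 1.0, 3.2, 0.9, 2.1, 1.4, 0.6]  # illustrative weights at iteration r

x_prev, _ = partial_out(x, [firm, year], w_old)               # x~ from iteration r-1
cold, cold_sweeps = partial_out(x, [firm, year], w_new)       # restart from raw x
warm, warm_sweeps = partial_out(x_prev, [firm, year], w_new)  # reuse x~ from r-1
print(max(abs(a - b) for a, b in zip(cold, warm)), cold_sweeps, warm_sweeps)
```

The printed discrepancy between the cold and warm results should be at rounding level; the two sweep counts show how much work each starting point needed on this particular example.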

Final Notes
- dedicated GitHub website: ppmlhdfe
- (forthcoming) article in the Stata Journal describing command usage
- the approach could easily be extended to any other model in the GLM family
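The extension to other GLMs can be made concrete: in IRLS only the weights and the working variable depend on the model, through the standard formulas $w_i = 1/\left(V(\mu_i)\,g'(\mu_i)^2\right)$ and $z_i = \eta_i + (y_i - \mu_i)\,g'(\mu_i)$. The sketch below evaluates them for Poisson and logit; the function and family names are illustrative, not taken from ppmlhdfe. In principle the within-transformation and FWL update would stay as they are for Poisson.

```python
import math


def irls_working_quantities(family, eta, y):
    """IRLS weight w and working variable z for one observation.

    General GLM form: w = 1 / (V(mu) * g'(mu)**2), z = eta + (y - mu) * g'(mu).
    """
    if family == "poisson":          # log link: mu = exp(eta), V(mu) = mu
        mu = math.exp(eta)
        var = mu
        dg = 1.0 / mu                # g'(mu) for g(mu) = log(mu)
    elif family == "logit":          # logit link: mu = 1 / (1 + exp(-eta)), V(mu) = mu(1 - mu)
        mu = 1.0 / (1.0 + math.exp(-eta))
        var = mu * (1.0 - mu)
        dg = 1.0 / var               # g'(mu) for g(mu) = log(mu / (1 - mu))
    else:
        raise ValueError(f"unsupported family: {family}")
    w = 1.0 / (var * dg ** 2)
    z = eta + (y - mu) * dg
    return w, z
```

For the Poisson case these formulas reduce to $w_i = \mu_i$ and $z_i = \eta_i + (y_i - \mu_i)/\mu_i$, matching the weights and working variable given for the Poisson regression model.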
