Omitted variable bias of Lasso-based inference methods: A finite sample analysis∗
Kaspar Wüthrich† Ying Zhu‡ October 21, 2019
Abstract
This paper shows in simulations, empirical applications, and theory that Lasso-based inference methods such as post double Lasso and debiased Lasso can exhibit substantial finite sample omitted variable biases in problems with sparse regression coefficients due to Lasso not selecting relevant control variables. This phenomenon can be systematic and occur even when the sample size is large and larger than the number of control variables. On the other hand, we also establish a “robustness” type of result showing that the omitted variable bias remains bounded with high probability even if the prediction errors of the Lasso are unbounded. In empirically relevant settings, our simulations show that OLS with modern standard errors that accommodate many controls can be a viable alternative to Lasso-based inference methods.
Keywords: Lasso, post double Lasso, debiased Lasso, OLS, omitted variable bias, limited variability, finite sample analysis
∗Alphabetical ordering. Both authors contributed equally to this work. We would like to thank
Stéphane Bonhomme, Graham Elliott, Michael Jansson, Ulrich Müller, Andres Santos, and Jeffrey Wooldridge for their comments. We are especially grateful to Yixiao Sun for providing extensive feedback on an earlier draft. This paper was previously circulated as “Behavior of Lasso and Lasso-based inference under limited variability” and “Omitted variable bias of Lasso-based inference methods under limited variability: A finite sample analysis”. Ying Zhu acknowledges financial support from a start-up fund from the Department of Economics at UCSD and the Department of Statistics and the Department of Computer Science at Purdue University, West Lafayette.
†Department of Economics, University of California, San Diego. Email: kwuthrich@ucsd.edu
‡Department of Economics, University of California, San Diego. Email: yiz012@ucsd.edu