Econometrics 1: IV, GMM and MLE

James A. Duffy¹
Oxford, Michaelmas 2016 (revised: 28/12/16)

¹ I thank N. Geesing, L. Freund, K. Kuske, and E. Munro for comments. The manuscript was prepared with LyX 2.2.2.


Contents

1 Instrumental variables
  1.1 Introduction
  1.2 Identification
    1.2.1 Rank and order conditions
    1.2.2 A restatement of the rank condition
  1.3 Estimation
    1.3.1 From identification to estimation
    1.3.2 Asymptotics
  1.4 The 'exclusion restriction'
  1.5 Another way of computing the 2SLS estimator
  1.6 Testing exogeneity of the endogenous regressors (Hausman test)
  1.7 Testing the identifying conditions
    1.7.1 Tests of overidentifying restrictions (Sargan test)
    1.7.2 Testing the rank condition
  1.8 Weak instruments
    1.8.1 The problem
    1.8.2 Dealing with weak instruments
    1.8.3 The Anderson–Rubin (AR) test
  1.A Suggested (optional) further reading
2 Generalised method of moments
  2.1 Introduction
    2.1.1 Motivating examples
    2.1.2 A general framework
  2.2 Asymptotics
    2.2.1 Consistency
    2.2.2 Asymptotic normality
    2.2.3 Local identification and weak identification
  2.3 Asymptotic efficiency
    2.3.1 The choice of weight matrix
    2.3.2 The implied (efficient) choice of moments
    2.3.3 Efficiency in the linear IV model
  2.4 Tests of over-identifying restrictions
  2.5 Hypothesis testing
    2.5.1 Tests of nonlinear restrictions and the delta method
    2.5.2 GMM criterion-based tests (QLR tests)


  2.A Suggested (optional) further reading
3 Maximum likelihood
  3.1 Introduction
    3.1.1 Parametric and semiparametric estimation
    3.1.2 The likelihood function: the general case
    3.1.3 The likelihood function: with i.i.d. data
  3.2 Univariate examples
    3.2.1 Continuous random variables
    3.2.2 Discrete random variables
    3.2.3 Mixed continuous/discrete random variables
  3.3 Models with covariates
  3.4 Consistency and identification
    3.4.1 Consistency
    3.4.2 Identification via Kullback–Leibler minimisation
  3.5 Asymptotic distribution of the MLE
    3.5.1 Asymptotic normality
    3.5.2 Efficiency properties
  3.6 Hypothesis testing
  3.A Suggested (optional) further reading
4 References
A Mathematical appendix
  A.1 Notation
  A.2 Matrices
  A.3 Asymptotics
    A.3.1 Modes of stochastic convergence
    A.3.2 Key results
  A.4 Suggested (optional) further reading


1 Instrumental variables

  • Throughout these notes, all variables are i.i.d. unless otherwise stated.

1.1 Introduction

• We would like to estimate the parameters $\beta_0 \in \mathbb{R}^{d_x}$ in
$$y_i = x_i^T\beta_0 + u_i \tag{1.1}$$
(here, as throughout, a '0' subscript denotes the true value of a parameter).
  – For various reasons – due to omitted variables, measurement error, unobservable heterogeneity and the like – the usual identifying orthogonality condition $Ex_iu_i = 0$ may not be considered plausible.
  – Those elements of $x_i$ for which this condition fails are said to be endogenous.

• Instead, identification will be achieved by means of the instruments $z_i$, which are assumed to fulfil this same orthogonality condition,
IV-ORTH  $Ez_iu_i = 0$.
(Note that $x_i$ and $z_i$ may share some common elements, but they cannot overlap entirely, for obvious reasons.)

1.2 Identification

1.2.1 Rank and order conditions

• A parameter is identified if it is uniquely determined from the joint distribution of the data, $w_i = (y_i, x_i, z_i)$. (At least in i.i.d. settings, in which the joint distribution can always be consistently estimated, this is equivalent to asking whether the parameter can be consistently estimated.)

• Identification of $\beta_0$ will follow if the equation
$$0 = E(y_i - x_i^T\beta)z_i = Ez_iy_i - Ez_ix_i^T\beta \tag{1.2}$$
has a unique solution at $\beta = \beta_0$. The r.h.s. depends only on $\beta$ and the distribution of $w_i$, through the moments $Ey_iz_i$ and $Ez_ix_i^T$.

• To show that $\beta_0$ indeed solves (1.2), rewrite it as
$$0 \stackrel{(1)}{=} E(x_i^T\beta_0 + u_i - x_i^T\beta)z_i \stackrel{(2)}{=} Ez_ix_i^T(\beta_0 - \beta), \tag{1.3}$$
where $\stackrel{(1)}{=}$ follows by (1.1), and $\stackrel{(2)}{=}$ by IV-ORTH.

• Is $\beta = \beta_0$ the only solution to (1.3)? Since $Ez_ix_i^T$ is a $d_z \times d_x$ matrix, the equation
$$[Ez_ix_i^T]\delta = 0$$
admits a solution at some $\delta \neq 0$ if and only if $\operatorname{rk} Ez_ix_i^T < d_x$ (see Appendix A.2). In this case, there will be other $\beta$'s, distinct from $\beta_0$, for which (1.3) holds.

• A necessary and sufficient condition for identification is thus
IV-RANK  $\operatorname{rk} Ez_ix_i^T = d_x$,
termed the rank condition (or, somewhat more informally, the relevance condition).
  – A necessary, but not sufficient, condition for the rank condition is that $d_z \ge d_x$, termed the order condition. In other words, there must be at least as many instruments as there are regressors.
  – The model is said to be exactly identified when $d_z = d_x$ – i.e. when we have just enough instruments to identify $\beta_0$ – and overidentified when $d_z > d_x$; in the latter case, we can test for some violations of IV-ORTH.

• In the overidentified case, we have strictly more instruments than are needed to identify $\beta_0$; in consequence, the number of such instruments may be reduced, down to $d_x$, without prejudicing identification.

• More formally, for some $d_x \times d_z$ matrix $L$, consider the $d_x$ new 'instruments' $z_{L,i} := Lz_i$, formed by taking $d_x$ linear combinations of the original instruments.
  – $z_{L,i}$ clearly satisfies the required orthogonality condition, since by IV-ORTH, $0 = LEz_iu_i = Ez_{L,i}u_i$.
  – Similarly, premultiplying (1.2) by $L$ yields the identifying condition
$$0 = Ez_{L,i}y_i - Ez_{L,i}x_i^T\beta \iff 0 = Ez_{L,i}x_i^T(\beta_0 - \beta),$$
where the equivalence follows via exactly the same reasoning as led from (1.2) to (1.3) above.
  – Thus the instruments $z_{L,i}$ are sufficient to identify $\beta_0$ if and only if the square matrix $Ez_{L,i}x_i^T = LEz_ix_i^T$ has full rank (i.e. rank $d_x$).

1.2.2 A restatement of the rank condition

  • There is another way of stating the rank condition that is often more convenient.
• First, note that for each $k \in \{1, \dots, d_x\}$, there exists a $\pi_{0,k} \in \mathbb{R}^{d_z}$ such that
$$x_{k,i} = z_i^T\pi_{0,k} + v_{k,i}, \tag{1.4}$$
with $Ez_iv_{k,i} = 0$; this corresponds to a population regression of $x_{k,i}$ on $z_i$, i.e. $\pi_{0,k} = (Ez_iz_i^T)^{-1}Ez_ix_{k,i}$.

• Stacking the $d_x$ equations (1.4) yields $x_i^T = z_i^T\Pi_0 + v_i^T$, or rather
$$x_i = \Pi_0^Tz_i + v_i,$$
where $\Pi_0$ is a $d_z \times d_x$ matrix whose $k$th column is given by $\pi_{0,k}$; these equations are termed the reduced form or first stage equations for $x_i$.

• By construction,
$$Ez_ix_i^T = Ez_i(z_i^T\Pi_0 + v_i^T) = (Ez_iz_i^T)\Pi_0,$$
whence
$$\operatorname{rk} Ez_ix_i^T = \operatorname{rk}(Ez_iz_i^T)\Pi_0 \stackrel{(2)}{=} \operatorname{rk}\Pi_0,$$
where $\stackrel{(2)}{=}$ holds if $Ez_iz_i^T$ is full rank – i.e. so long as the instruments are not themselves collinear (see Appendix A.2).

• IV-RANK is therefore often restated as
IV-RANK′  $\operatorname{rk}\Pi_0 = d_x$ and $\operatorname{rk} Ez_iz_i^T = d_z$.

1.3 Estimation

1.3.1 From identification to estimation

• How can we go from such an identifying condition as
$$0 = Ez_iy_i - Ez_ix_i^T\beta \tag{1.5}$$
to a (consistent) estimator for $\beta_0$?

• If $d_z = d_x$, then we can solve (1.5) directly for $\beta_0$, to obtain
$$\beta_0 = (Ez_ix_i^T)^{-1}Ez_iy_i, \tag{1.6}$$
which has the sample analogue
$$\hat\beta_n := \Big(\frac1n\sum_{i=1}^n z_ix_i^T\Big)^{-1}\frac1n\sum_{i=1}^n z_iy_i = \Big(\sum_{i=1}^n z_ix_i^T\Big)^{-1}\sum_{i=1}^n z_iy_i.$$
Consistency is then immediate from the LLN and Slutsky's theorem (for a restatement of these results, and the CLT, see Section A.3.2). There is a unique instrumental variables estimator in this case.
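To fix ideas, a minimal numerical sketch of this sample-analogue estimator on simulated data; the design (one endogenous regressor, one instrument) and all names are illustrative, not taken from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta0 = 5_000, 2.0

# Single endogenous regressor, single instrument (d_x = d_z = 1).
z = rng.normal(size=n)
v = rng.normal(size=n)               # first-stage error, correlated with u
x = 0.8 * z + v                      # first stage: E[z_i u_i] = 0, E[x_i u_i] != 0
u = 0.5 * v + rng.normal(size=n)     # structural error
y = beta0 * x + u

# Sample analogue of (1.6): beta_hat = (sum z_i x_i)^{-1} sum z_i y_i.
beta_iv = (z @ y) / (z @ x)
beta_ols = (x @ y) / (x @ x)         # inconsistent here, for comparison

print(f"IV estimate:  {beta_iv:.3f}")   # close to 2.0
print(f"OLS estimate: {beta_ols:.3f}")  # biased upwards, since cov(x, u) > 0
```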

• If $d_z > d_x$, then we cannot proceed in this manner; the matrix $Ez_ix_i^T$ in (1.5) is no longer square, and so cannot be inverted to yield such an expression as (1.6) for $\beta_0$.
  – A possible way to proceed here is to 'reduce' the problem to the exactly identified case, by taking a suitable linear combination of the original instruments.
  – Indeed, if $z_{L,i} := L^Tz_i$ (for $L$ a $d_z \times d_x$ matrix) is such a set of $d_x$ instruments for which $Ez_{L,i}x_i^T$ has full rank, then
$$0 = Ez_{L,i}y_i - Ez_{L,i}x_i^T\beta$$
may be inverted to obtain an expression for $\beta_0$ analogous to (1.6) above,
$$\beta_0 = (Ez_{L,i}x_i^T)^{-1}Ez_{L,i}y_i. \tag{1.7}$$
  – This suggests the estimator
$$\hat\beta_n(L) := \Big(\sum_{i=1}^n z_{L,i}x_i^T\Big)^{-1}\sum_{i=1}^n z_{L,i}y_i. \tag{1.8}$$
(Again, consistency is immediate.)

• In other words: we have replaced the $d_z$ identifying orthogonality conditions
$$0 = Ez_iy_i - Ez_ix_i^T\beta$$
by a smaller system of $d_x$ identifying conditions,
$$0 = L^T[Ez_iy_i - Ez_ix_i^T\beta] = Ez_{L,i}y_i - Ez_{L,i}x_i^T\beta.$$


  – Whereas the sample analogue of the overidentified system
$$0 = \frac1n\sum_{i=1}^n z_iy_i - \frac1n\sum_{i=1}^n z_ix_i^T\beta \tag{1.9}$$
does not, in general, admit an exact solution, the sample analogue of the exactly identified system
$$0 = \frac1n\sum_{i=1}^n z_{L,i}y_i - \frac1n\sum_{i=1}^n z_{L,i}x_i^T\beta \tag{1.10}$$
does – allowing us to estimate $\beta_0$ as the exact solution to (1.10).

• This raises the question of how to choose $L$. First, note that:
  – Although each choice of $L$ gives distinct matrices $Ez_{L,i}x_i^T$ and $Ez_{L,i}y_i$, the r.h.s. of (1.7) always equals $\beta_0$: the expression on the r.h.s. is invariant to $L$. On the other hand, the estimator $\hat\beta_n(L)$ in (1.8) does depend on the choice of $L$.
  – This difference between (1.7) and (1.8) arises because, while the orthogonality condition IV-ORTH holds exactly in the population, it does not hold exactly in sample: that is, in general, $\sum_{i=1}^n z_iu_i \neq 0$.
  – What is true, however, is that
$$\frac1n\sum_{i=1}^n z_iu_i \overset{p}{\to} Ez_iu_i = 0$$
by the LLN, which accounts for why $\hat\beta_n(L) \overset{p}{\to} \beta_0$, regardless of $L$.

• Thus, so far as consistency is concerned, the choice of $L$ is irrelevant – it will, however, matter for efficiency: different choices of $L$ will yield estimators with different asymptotic variances. As we shall see, under certain conditions, setting $L = \Pi_0$ delivers an estimator which is optimal in this sense (see Section 2.3.3 below).

• $\hat\beta_n(\Pi_0)$ is infeasible to compute, since $\Pi_0$ is unknown; but since $\Pi_0$ can be consistently estimated, by regressing each element of $x_i$ on (the whole of) $z_i$, it has a feasible counterpart
$$\hat\beta_n^{2SLS} := \hat\beta_n(\hat\Pi_n) = \Big(\sum_{i=1}^n \hat x_ix_i^T\Big)^{-1}\sum_{i=1}^n \hat x_iy_i \stackrel{(3)}{=} \Big(\sum_{i=1}^n \hat x_i\hat x_i^T\Big)^{-1}\sum_{i=1}^n \hat x_iy_i = (\hat X^T\hat X)^{-1}\hat X^Ty,$$
where, in keeping with the usual notation, $\hat x_i := \hat\Pi_n^Tz_i$.
  – $\stackrel{(3)}{=}$ follows by noting that the first-stage OLS residuals $\hat v_i$ are orthogonal to $z_i$, in the sense that $\sum_{i=1}^n z_i\hat v_i^T = 0$, by construction, whence
$$\sum_{i=1}^n \hat x_ix_i^T = \sum_{i=1}^n \hat x_i(\hat x_i + \hat v_i)^T = \sum_{i=1}^n \hat x_i\hat x_i^T + \hat\Pi_n^T\sum_{i=1}^n z_i\hat v_i^T = \sum_{i=1}^n \hat x_i\hat x_i^T.$$

  – $\hat\beta_n^{2SLS}$ is termed the two-stage least squares (2SLS) estimator, because it can be computed by: first regressing each element of $x_i$ on $z_i$, to obtain the fitted values $\hat x_i$; and then regressing $y_i$ on $\hat x_i$.
  – Since it will be useful in what follows, we note here that
$$\hat\Pi_n = \Big(\sum_{i=1}^n z_iz_i^T\Big)^{-1}\sum_{i=1}^n z_ix_i^T = (Z^TZ)^{-1}Z^TX,$$
which follows from the fact that the $k$th column of $\hat\Pi_n$ corresponds to the coefficients obtained from an OLS regression of $x_{k,i}$ upon $z_i$, and thus to
$$\hat\pi_{k,n} = \Big(\sum_{i=1}^n z_iz_i^T\Big)^{-1}\sum_{i=1}^n z_ix_{k,i}.$$
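A sketch of the two computations just described – the 'two regressions' route, which is what the closed form $(\hat X^T\hat X)^{-1}\hat X^Ty$ amounts to – on a simulated overidentified design ($d_z = 2$, $d_x = 1$); names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta0 = 5_000, 2.0

Z = rng.normal(size=(n, 2))                   # d_z = 2 instruments
v = rng.normal(size=n)
x = Z @ np.array([0.6, 0.3]) + v              # first stage, Pi_0 = (0.6, 0.3)'
u = 0.5 * v + rng.normal(size=n)
y = beta0 * x + u
X = x[:, None]                                # n x d_x, with d_x = 1

# First stage: Pi_hat = (Z'Z)^{-1} Z'X, fitted values X_hat = Z Pi_hat.
Pi_hat = np.linalg.solve(Z.T @ Z, Z.T @ X)
X_hat = Z @ Pi_hat

# Second stage: regress y on X_hat.
beta_2sls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)
print(beta_2sls)                              # close to [2.0]
```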
1.3.2 Asymptotics

• We shall now derive the limiting distribution of the 2SLS estimator. Let $\tilde x_i := \Pi_0^Tz_i$ denote the infeasible 'ideal' instruments, and recall that $\hat x_i := \hat\Pi_n^Tz_i$ denotes their feasible counterparts.

• Application of the LLN and CLT shall require the existence of moments of a certain order; the following is sufficient for our purposes here:
IV-MOM  $E\|x_i\|^2 < \infty$, $E\|z_i\|^4 < \infty$, and $E|u_i|^4 < \infty$
(for the definition of the max-element norm $\|\cdot\|$, see Appendix A.1).

• As with OLS, certain simplifications – and certain optimality properties – are available when the error $u_i$ in the structural equation is homoskedastic, in the sense that
IV-HMSK  $E[u_i^2 \mid z_i] = Eu_i^2 =: \sigma_u^2$.
By the law of iterated expectations (LIE), this entails
$$Eu_i^2z_iz_i^T = E[\,E[u_i^2z_iz_i^T \mid z_i]\,] = E[\,E[u_i^2 \mid z_i]\,z_iz_i^T] = \sigma_u^2Ez_iz_i^T,$$
which is in fact the only implication of IV-HMSK that will be needed here.

Theorem 1.1. Suppose that IV-ORTH, IV-RANK′ and IV-MOM hold. Then $n^{1/2}(\hat\beta_n - \beta_0) \overset{d}{\to} N[0, V]$, where
$$V = (E\tilde x_i\tilde x_i^T)^{-1}(Eu_i^2\tilde x_i\tilde x_i^T)(E\tilde x_i\tilde x_i^T)^{-1}.$$
If additionally IV-HMSK holds, then $V = \sigma_u^2(E\tilde x_i\tilde x_i^T)^{-1}$.

Proof.

• Substituting the model $y_i = x_i^T\beta_0 + u_i$ into the formula for the 2SLS estimator gives
$$\hat\beta_n = \Big(\sum_{i=1}^n \hat x_ix_i^T\Big)^{-1}\sum_{i=1}^n \hat x_i(x_i^T\beta_0 + u_i) = \beta_0 + \Big(\sum_{i=1}^n \hat x_ix_i^T\Big)^{-1}\sum_{i=1}^n \hat x_iu_i.$$
Rearranging, and recalling $\sum_{i=1}^n \hat x_ix_i^T = \sum_{i=1}^n \hat x_i\hat x_i^T$, yields
$$n^{1/2}(\hat\beta_n - \beta_0) = \Big(\frac1n\sum_{i=1}^n \hat x_i\hat x_i^T\Big)^{-1}\frac{1}{n^{1/2}}\sum_{i=1}^n \hat x_iu_i =: M_n^{-1}s_n. \tag{1.11}$$

• Since $\hat x_i = \hat\Pi_n^Tz_i$, and $\hat\Pi_n$ depends on the entire sample, the $\hat x_i$'s are not i.i.d. This prevents the LLN and CLT from being directly applied to deduce the limiting behaviour of $M_n$ and $s_n$, which slightly complicates the proof.

• Before proceeding further, we first note that
$$\hat\Pi_n = \Big(\frac1n\sum_{i=1}^n z_iz_i^T\Big)^{-1}\frac1n\sum_{i=1}^n z_ix_i^T \stackrel{p}{\to}_{(2)} (Ez_iz_i^T)^{-1}Ez_ix_i^T = \Pi_0, \tag{1.12}$$
where $\stackrel{p}{\to}_{(2)}$ is a consequence of Slutsky's theorem and the LLN; the latter may be applied here since
$$\|Ex_iz_i^T\| \le_{(1)} E\|x_i\|\|z_i\| \le_{(2)} (E\|x_i\|^2)^{1/2}(E\|z_i\|^2)^{1/2} <_{(3)} \infty,$$
where $\le_{(1)}$ is a property of the max-element norm; $\le_{(2)}$ follows by the Cauchy–Schwarz inequality; and $<_{(3)}$ by IV-MOM.

• Turning now to $M_n$, we have
$$M_n = \hat\Pi_n^T\Big(\frac1n\sum_{i=1}^n z_iz_i^T\Big)\hat\Pi_n \stackrel{p}{\to}_{(2)} \Pi_0^TEz_iz_i^T\Pi_0 = E\tilde x_i\tilde x_i^T, \tag{1.13}$$
where $\stackrel{p}{\to}_{(2)}$ follows from the LLN, Slutsky's theorem and (1.12).

• By IV-RANK′, the matrix $E\tilde x_i\tilde x_i^T$ is invertible (see the problem set).

• Turning next to $s_n$, we first note that
$$\frac{1}{n^{1/2}}\sum_{i=1}^n z_iu_i \overset{d}{\to} N[0, Eu_i^2z_iz_i^T] \tag{1.14}$$
by the CLT, since $Ez_iu_i = 0$ by IV-ORTH, while
$$E\|z_iu_i\|^2 = E|u_i|^2\|z_i\|^2 \le (E|u_i|^4)^{1/2}(E\|z_i\|^4)^{1/2} < \infty$$
by the Cauchy–Schwarz inequality and IV-MOM.

• Together, (1.12), (1.14) and an application of Slutsky's theorem yield
$$s_n = \hat\Pi_n^T\frac{1}{n^{1/2}}\sum_{i=1}^n z_iu_i \overset{d}{\to} \Pi_0^T \cdot N[0, Eu_i^2z_iz_i^T] \sim N[0, Eu_i^2\tilde x_i\tilde x_i^T]. \tag{1.15}$$

• Finally, (1.11), (1.13) and (1.15), and a further appeal to Slutsky's theorem, yield the result. □

• The limiting variance
$$V = (E\tilde x_i\tilde x_i^T)^{-1}(Eu_i^2\tilde x_i\tilde x_i^T)(E\tilde x_i\tilde x_i^T)^{-1}$$
is exactly what we would have obtained if we had computed $\hat\beta_n$ using the infeasible instruments $\tilde x_i$, rather than their feasible counterparts. The estimation of $\Pi_0$ thus has no first-order effect on the limiting behaviour of $\hat\beta_n$.

• $V$ can be consistently estimated by
$$\hat V_n := \Big(\frac1n\sum_{i=1}^n \hat x_i\hat x_i^T\Big)^{-1}\Big(\frac1n\sum_{i=1}^n \hat u_i^2\hat x_i\hat x_i^T\Big)\Big(\frac1n\sum_{i=1}^n \hat x_i\hat x_i^T\Big)^{-1},$$
where, importantly, $\hat u_i := y_i - x_i^T\hat\beta_n$ is the residual from the structural equation.
  – A common mistake is to compute $\hat u_i$ as $y_i - \hat x_i^T\hat\beta_n$, the residual from the second-stage regression of $y_i$ on $\hat x_i$. (Using the OLS standard errors associated with the second-stage regression would amount to committing this same error.)
  – The proof that $\hat V_n \overset{p}{\to} V$ is somewhat tedious, but not conceptually difficult.
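Continuing the simulated example from above, a sketch of $\hat V_n$ computed with the structural residuals $\hat u_i = y_i - x_i^T\hat\beta_n$; the (incorrect) second-stage residuals are also formed, to make the distinction concrete. All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta0 = 5_000, 2.0
Z = rng.normal(size=(n, 2))
v = rng.normal(size=n)
X = (Z @ np.array([0.6, 0.3]) + v)[:, None]   # one endogenous regressor
u = 0.5 * v + rng.normal(size=n)
y = X[:, 0] * beta0 + u

X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)            # first-stage fitted values
beta = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)     # 2SLS

M = X_hat.T @ X_hat / n                                  # estimates E[xt xt']
Minv = np.linalg.inv(M)
u_hat = y - X @ beta             # structural residuals: the correct choice
u_wrong = y - X_hat @ beta       # second-stage residuals: the common mistake
for label, resid in (("correct ", u_hat), ("mistaken", u_wrong)):
    Om = (X_hat * (resid ** 2)[:, None]).T @ X_hat / n   # estimates E[u^2 xt xt']
    V_hat = Minv @ Om @ Minv                             # sandwich form of V_hat
    print(label, beta, np.sqrt(np.diag(V_hat) / n))      # estimate and std. error
```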

• Theorem 1.1 provides a basis for Wald tests of linear restrictions of the form
$$H_0 : R\beta = \rho \qquad\text{against}\qquad H_1 : R\beta \neq \rho,$$
where $R$ is a $d_r \times d_x$ matrix having rank $d_r$ (so we are testing $d_r$ linear restrictions).
  – By the theorem, under $H_0$,
$$n^{1/2}[R\hat\beta_n - \rho] = R[n^{1/2}(\hat\beta_n - \beta_0)] \overset{d}{\to} R \cdot N[0, V] \sim N[0, RVR^T].$$
  – Thus, under $H_0$, the Wald statistic satisfies
$$W_n := n(R\hat\beta_n - \rho)^T(R\hat V_nR^T)^{-1}(R\hat\beta_n - \rho) \overset{d}{\to} \chi^2[d_r].$$
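Given $\hat\beta_n$ and $\hat V_n$ as computed in the previous sketch, the Wald statistic is a few lines; `wald_test` below is an illustrative helper, not a library routine.

```python
import numpy as np
from scipy.stats import chi2

def wald_test(beta, V_hat, n, R, rho):
    """Wald statistic W_n = n (R b - rho)' (R V R')^{-1} (R b - rho),
    asymptotically chi^2 with d_r = rank(R) degrees of freedom under H0."""
    diff = R @ beta - rho
    W = n * diff @ np.linalg.solve(R @ V_hat @ R.T, diff)
    return W, chi2.sf(W, R.shape[0])          # statistic and asymptotic p-value

# e.g. with d_x = 1, testing H0: beta = 2 amounts to R = [[1.]], rho = [2.]:
# W, p = wald_test(beta, V_hat, n, np.array([[1.0]]), np.array([2.0]))
```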

1.4 The ‘exclusion restriction’

• Returning to the structural equation:
$$y_i = x_i^T\beta_0 + u_i = x_{1i}^T\beta_{0,1} + x_{2i}^T\beta_{0,2} + u_i, \tag{1.16}$$
where we have partitioned $x_i = (x_{1i}^T, x_{2i}^T)^T$:
  – $x_{1i}$ ($d_{x_1} \times 1$): exogenous, $Ex_{1i}u_i = 0$;
  – $x_{2i}$ ($d_{x_2} \times 1$): potentially endogenous, $Ex_{2i}u_i \neq 0$ (?);
  – the $x_{1i}$ are valid instruments, but are not sufficient in number to identify $\beta_0$.

• Partition the instruments:
$$z_i = \begin{pmatrix} z_{1i}\\ z_{2i}\end{pmatrix} = \begin{pmatrix} x_{1i}\\ z_{2i}\end{pmatrix},$$
where $z_{1i} = x_{1i}$ are the $d_{x_1}$ exogenous regressors, and $z_{2i}$ are the $d_{z_2}$ excluded instruments.

• $z_{2i}$ are the excluded instruments (or, more loosely, 'the' instruments): so called because they do not appear on the r.h.s. of (1.16). This feature of the model is also termed the exclusion restriction.
  – What does this mean? $z_{2i}$ can only affect $y_i$ through $x_{2i}$: it cannot have a direct effect on $y_i$.
  – In economic terms, this is a very demanding requirement: which is why it is so difficult to find convincing instruments in practice.

• Order condition: here equivalent to $d_{z_2} \ge d_{x_2}$; we need at least as many excluded instruments as endogenous regressors.

• Rank condition? Rewrite the first stage as
$$\begin{pmatrix} x_{1i}\\ x_{2i}\end{pmatrix} = \Pi_0^T\begin{pmatrix} z_{1i}\\ z_{2i}\end{pmatrix} + \begin{pmatrix} v_{1i}\\ v_{2i}\end{pmatrix} = \begin{pmatrix} I_{d_{x_1}} & 0\\ \Pi_{0,1}^T & \Pi_{0,2}^T\end{pmatrix}\begin{pmatrix} x_{1i}\\ z_{2i}\end{pmatrix} + \begin{pmatrix} 0\\ v_{2i}\end{pmatrix}. \tag{1.17}$$
Then IV-RANK′ holds iff
$$d_{x_1} + d_{x_2} \le \operatorname{rk}\Pi_0 = d_{x_1} + \operatorname{rk}\Pi_{0,2},$$
i.e. iff $\operatorname{rk}\Pi_{0,2} \ge d_{x_2}$.
  – (1.17) also indicates that the first $d_{x_1}$ first-stage equations hold without error: so $\hat x_{1i} = x_{1i} = \tilde x_{1i}$.

• This framework allows us to highlight a common mistake in computing 2SLS: when computing $\hat x_{2i}$, $x_{2i}$ must be regressed on both $x_{1i}$ and $z_{2i}$. Regression of $x_{2i}$ on $z_{2i}$ alone yields an inconsistent estimator (see the problem set).

1.5 Another way of computing the 2SLS estimator

• Here again are the structural equation and the first stage:
$$y_i = x_{1i}^T\beta_{0,1} + x_{2i}^T\beta_{0,2} + u_i \tag{1.18}$$
$$\begin{pmatrix} x_{1i}\\ x_{2i}\end{pmatrix} = \begin{pmatrix} I_{d_{x_1}} & 0\\ \Pi_{0,1}^T & \Pi_{0,2}^T\end{pmatrix}\begin{pmatrix} x_{1i}\\ z_{2i}\end{pmatrix} + \begin{pmatrix} 0\\ v_{2i}\end{pmatrix} = \Pi_0^Tz_i + v_i,$$
with $Ez_i(u_i, v_{2i}^T) = 0$ (by IV-ORTH and the definition of $\Pi_0$ as a matrix of population regression coefficients).

• Because $Ex_{2i}u_i \neq 0$, OLS is inconsistent. In a certain sense, this can always be regarded as a species of 'omitted variable' problem.
  – Why? $x_{2i} = \tilde x_{2i} + v_{2i}$ has two components: $E\tilde x_{2i}u_i = 0$ (by IV-ORTH), but $Ev_{2i}u_i \neq 0$ (possibly).
  – If $v_{2i}$ could be included in the structural equation, the correlation between $x_{2i}$ and the error term would disappear!

• More formally, orthogonal decomposition (population regression) of $u_i$ yields
$$u_i = v_{2i}^T\rho_0 + \epsilon_i, \qquad Ev_{2i}\epsilon_i = 0, \tag{1.19}$$
and substituting (1.19) into (1.18) gives
$$y_i = x_i^T\beta_0 + v_{2i}^T\rho_0 + \epsilon_i. \tag{1.20}$$

• Is (1.20) a valid regression equation? Yes, because $Ev_{2i}\epsilon_i = 0$, and
$$Ex_i\epsilon_i = E(\Pi_0^Tz_i + v_i)\epsilon_i = \Pi_0^TEz_i(u_i - v_{2i}^T\rho_0) \stackrel{(3)}{=} 0,$$
where $\stackrel{(3)}{=}$ follows by $Ez_i(u_i, v_{2i}^T) = 0$ (and $Ev_i\epsilon_i = 0$, since $v_i = (0^T, v_{2i}^T)^T$).

• Can we therefore estimate $\beta_0$ by OLS, using the augmented structural equation (1.20)?
  – Infeasible, because $v_{2i}$ is unobserved.
  – Feasible version: regress $y_i$ on $x_i$ and $\hat v_{2i} = x_{2i} - \hat x_{2i}$.

• This yields a consistent estimator of $\beta_0$ (and of $\rho_0$), sometimes termed the control function estimator: the idea being that the endogeneity in $x_{2i}$ is 'controlled for' by the inclusion of $v_{2i}$ (or rather, $\hat v_{2i}$).
  – Actually, this procedure identically reproduces the 2SLS estimator (see the problem set). But in more general (e.g. nonlinear) settings, these approaches yield different estimators (and, indeed, different identifying conditions).
  – The 'usual' OLS standard errors associated with the feasible version of (1.20) are not suitable for inference, since these ignore the estimation of $\hat v_{2i}$.
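A sketch of the feasible control function procedure, on simulated data (one exogenous regressor, one endogenous regressor, one excluded instrument); as remarked above, the coefficients on $(x_{1i}, x_{2i})$ reproduce 2SLS exactly in this linear model. The design and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
x1 = rng.normal(size=n)                       # exogenous regressor
z2 = rng.normal(size=n)                       # excluded instrument
v2 = rng.normal(size=n)
x2 = 0.5 * x1 + 0.8 * z2 + v2                 # first stage
u = 0.6 * v2 + rng.normal(size=n)             # rho_0 = 0.6: x2 is endogenous
y = 1.0 * x1 + 2.0 * x2 + u

ones = np.ones(n)
Z = np.column_stack([ones, x1, z2])           # instruments incl. exogenous regressors
v2_hat = x2 - Z @ np.linalg.lstsq(Z, x2, rcond=None)[0]   # first-stage residuals

# Regress y on (1, x1, x2, v2_hat): coefficients on (1, x1, x2) equal 2SLS.
W = np.column_stack([ones, x1, x2, v2_hat])
coef = np.linalg.lstsq(W, y, rcond=None)[0]
print(coef)     # approx [0, 1, 2, 0.6]; last entry estimates rho_0
```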

1.6 Testing exogeneity of the endogenous regressors (Hausman test)

• The preceding framework is useful, because $\rho_0$ carries information about the extent of the endogeneity in $x_{2i}$. Note that
$$Ex_{2i}u_i = E(\tilde x_{2i} + v_{2i})u_i \stackrel{(1)}{=} Ev_{2i}(v_{2i}^T\rho_0 + \epsilon_i) \stackrel{(2)}{=} Ev_{2i}v_{2i}^T\rho_0,$$
where $\stackrel{(1)}{=}$ follows by IV-ORTH, and $\stackrel{(2)}{=}$ by (1.19). Thus, supposing $Ev_{2i}v_{2i}^T$ is full rank,
$$Ex_{2i}u_i = 0 \iff \rho_0 = 0.$$

• Having a consistent (and asymptotically normal) estimate of $\rho_0$ – as a by-product of computing the control function estimator – thus allows us to test
$$H_0 : Ex_{2i}u_i = 0 \qquad\text{against}\qquad H_1 : Ex_{2i}u_i \neq 0$$

by rephrasing this as a test of
$$H_0' : \rho_0 = 0 \qquad\text{against}\qquad H_1' : \rho_0 \neq 0.$$
  – Note that $Ex_{1i}u_i = 0$ (and more generally, IV-ORTH) is maintained under both the null and the alternative – we shall discuss a test of this condition subsequently.
  – A test of $H_0$ – which may be accomplished in various other ways – is often referred to as a 'Hausman test'.

• How? Regress $y_i$ on $x_i$ and $\hat v_{2i}$, to obtain the control function estimate $\hat\rho_n$; let $\hat V_{n,\rho}$ denote the usual (hetero-robust) OLS asymptotic variance estimator associated with these parameters.
  – Under $H_0'$ (and only under $H_0'$), $\hat V_{n,\rho}$ delivers a valid estimate of the asymptotic variance of $\hat\rho_n$ (see the problem set).
  – Thus the appropriate Wald statistic has the usual limiting distribution,
$$W_n := n\hat\rho_n^T\hat V_{n,\rho}^{-1}\hat\rho_n \overset{d}{\to} \chi^2[d_{x_2}].$$
  – I.e. for testing this specific null, no adjustment needs to be made for the fact that $\hat v_{2i}$ was estimated. This would not be the case if $\rho_0 = \rho^*$, for some $\rho^* \neq 0$, happened to be of interest. (For more general 'exceptional' cases of this kind, see Wooldridge, 2002, Sec. 6.1.)

• Why might we want to do this?
  – Under the null, all regressors are exogenous: $\beta_0$ can be estimated by OLS. Since OLS may deliver much more precise estimates than 2SLS, we'd prefer to use the OLS estimates, if they were valid. Testing $H_0$ can be viewed as testing the validity of OLS.
  – It is common empirical practice to report both OLS and 2SLS estimates, together with the result (expressed as, say, a p-value) of a test of $H_0$.
  – One caveat here is that a failure to reject $H_0$ could simply be due to the low power of the test: and the test will be less powerful the weaker the instruments.

1.7 Testing the identifying conditions

• In the linear IV model,
$$y_i = x_i^T\beta_0 + u_i,$$
with instruments $z_i$, recall the two conditions that ensure $\beta_0$ is identified:
IV-ORTH  $Ez_iu_i = 0$;
IV-RANK′  $\operatorname{rk}\Pi_0 = d_x$ and $\operatorname{rk} Ez_iz_i^T = d_z$.
We now turn to the question of how each of these conditions might be tested.

1.7.1 Tests of overidentifying restrictions (Sargan test)

• We would like to test IV-ORTH, the empirical counterpart of which is
$$\frac1n\sum_{i=1}^n z_i\hat u_i, \qquad\text{where } \hat u_i = y_i - x_i^T\hat\beta_n.$$

• As we shall see, our ability to detect violations of IV-ORTH is somewhat limited.

• By construction, the 2SLS estimator is the unique solution to the following sample orthogonality condition:
$$0 = \hat\Pi_n^T\sum_{i=1}^n z_i(y_i - x_i^T\beta).$$

• If $d_z = d_x$, this reduces to
$$0 = \sum_{i=1}^n z_i(y_i - x_i^T\hat\beta_n) = \sum_{i=1}^n z_i\hat u_i,$$
and no test of IV-ORTH is possible. (In exactly the same way, it is impossible to test the exogeneity of any r.h.s. variables in a regression model.)

• If $d_z > d_x$, then some (limited) progress can be made. For then
$$0 = \sum_{i=1}^n z_i(y_i - x_i^T\beta)$$
cannot (generally) be solved by any $\beta$: it is a system of $d_z$ equations in $d_x$ unknowns.
  – Evaluating the r.h.s. at the 2SLS estimator gives us a means of detecting some departures from IV-ORTH.

• Because $\beta_0$ must be estimated, there will be a $d_x$-dimensional linear subspace $\Phi \subseteq \mathbb{R}^{d_z}$ such that a test of
$$H_0 : Ez_iu_i = 0 \qquad\text{against}\qquad H_1 : Ez_iu_i \neq 0$$
will have no power against an alternative of the form $Ez_iu_i = \phi \in \Phi$.
  – (In other words, we are really testing $H_0 : Ez_iu_i \in \Phi$ against $H_1 : Ez_iu_i \notin \Phi$.)

• Regarding the test statistic, note that under IV-HMSK,
$$\xi_n := \Big(\frac{\hat\sigma_u^2}{n}\sum_{i=1}^n z_iz_i^T\Big)^{-1/2}\frac{1}{n^{1/2}}\sum_{i=1}^n z_iu_i \overset{d}{\to} N[0, I_{d_z}], \tag{1.21}$$
where $A^{-1/2}$ denotes the positive definite square root of $A^{-1}$ (see Appendix A.2); whence $\xi_n^T\xi_n \overset{d}{\to} \chi^2[d_z]$.

• When $u_i$ is replaced by $\hat u_i$ – yielding $\hat\xi_n$ – (1.21) no longer holds; we obtain instead that
$$\hat\xi_n \overset{d}{\to} P \cdot N[0, I_{d_z}], \tag{1.22}$$
where $P$ is an (orthogonal) projection matrix (i.e. a symmetric and idempotent matrix), with $\operatorname{rk}P = d_z - d_x$.
  – In consequence of (1.22) and Lemma 1.1 below,
$$\hat\xi_n^T\hat\xi_n = \Big(\sum_{i=1}^n z_i\hat u_i\Big)^T\Big(\hat\sigma_u^2\sum_{i=1}^n z_iz_i^T\Big)^{-1}\Big(\sum_{i=1}^n z_i\hat u_i\Big) \overset{d}{\to} \chi^2[d_z - d_x]. \tag{1.23}$$
  – In the usual parlance, the estimation of $\beta_0$ 'absorbs' $d_x$ degrees of freedom.

• For (1.23) to hold: if IV-HMSK holds, then $\hat u_i$ must be computed using the 2SLS residuals; no other IV estimator (i.e. none corresponding to a different choice of $L$) will yield this limiting distribution. (In general, the limiting distribution will be much more complicated.)

• If IV-HMSK fails, then some other estimator of $\beta_0$ must be used; this estimator will be discussed later, in the context of GMM.

• This test is sometimes referred to as a Sargan test.
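A sketch of the Sargan statistic (1.23) under IV-HMSK, computed – as just required – from the 2SLS residuals; `sargan_test` is an illustrative helper, not a library routine.

```python
import numpy as np
from scipy.stats import chi2

def sargan_test(y, X, Z):
    """Sargan statistic (sum z u)' [sigma^2 sum z z']^{-1} (sum z u),
    asymptotically chi^2[d_z - d_x] under IV-ORTH and IV-HMSK."""
    n, dx = X.shape
    dz = Z.shape[1]
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    beta = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)   # 2SLS
    u_hat = y - X @ beta                                   # 2SLS residuals
    s2 = u_hat @ u_hat / n                                 # sigma_u^2 estimate
    zu = Z.T @ u_hat                                       # sum_i z_i u_hat_i
    S = zu @ np.linalg.solve(s2 * (Z.T @ Z), zu)
    return S, chi2.sf(S, dz - dx)

# Requires d_z > d_x; when d_z = d_x, zu = 0 identically and no test is possible.
```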

Lemma 1.1. Suppose $\eta \sim N[0, I_r]$, and $P$ is an $r \times r$ (orthogonal) projection matrix with $\operatorname{rk}P = s \le r$. Then $\eta^TP\eta \sim \chi^2[s]$.

Proof.
• Since $P$ is symmetric, it admits the decomposition $P = C\Lambda C^T$, where $C$ is an orthonormal matrix ($C^{-1} = C^T$), and $\Lambda$ a diagonal matrix whose entries correspond to the eigenvalues of $P$ (see Appendix A.2).
• Since it is also idempotent, with rank $s$, $P$ has $s$ eigenvalues equal to unity, and $r - s$ equal to 0. Thus we may choose $C$ such that
$$\Lambda = \begin{pmatrix} I_s & 0\\ 0 & 0\end{pmatrix}.$$
• Hence, letting $\zeta := C^T\eta$,
$$\eta^TP\eta = \eta^TC\Lambda C^T\eta = \zeta^T\begin{pmatrix} I_s & 0\\ 0 & 0\end{pmatrix}\zeta = \sum_{i=1}^s \zeta_i^2 \stackrel{(4)}{\sim} \chi^2[s],$$
where $\stackrel{(4)}{\sim}$ follows from $\zeta$ being normally distributed with $\operatorname{var}(\zeta) = C^T\operatorname{var}(\eta)C = C^TC = I_r$. □

1.7.2 Testing the rank condition

• By contrast, we are not so limited in our ability to detect violations of the rank condition.

• The rank condition pertains only to the first stage,
$$\begin{pmatrix} x_{1i}\\ x_{2i}\end{pmatrix} = \begin{pmatrix} I_{d_{x_1}} & 0\\ \Pi_1^T & \Pi_2^T\end{pmatrix}\begin{pmatrix} x_{1i}\\ z_{2i}\end{pmatrix} + \begin{pmatrix} 0\\ v_{2i}\end{pmatrix};$$
recall that IV-RANK′ is equivalent to $\operatorname{rk}\Pi_2 = d_{x_2}$. (To make the notation a little less cumbersome, we drop the '0' subscripts denoting the truth for the time being.)

• Consider first the case where $d_{x_2} = 1$. Then the only relevant equation is
$$x_{2i} = \pi_1^Tx_{1i} + \pi_2^Tz_{2i} + v_{2i}, \tag{1.24}$$
where $\pi_1 \in \mathbb{R}^{d_{x_1}}$ and $\pi_2 \in \mathbb{R}^{d_{z_2}}$.
  – IV-RANK′ is here $\operatorname{rk}\pi_2 = 1$, which obtains iff $\pi_2$ has at least one non-zero element.
  – A test of $H_0 : \pi_2 = 0$, and thus of IV-RANK′, can be carried out via the usual F (or Wald) test for joint significance in the regression (1.24).

• Now suppose $d_{x_2} = 2$, so that we have two equations:
$$x_{2,1i} = \pi_{11}^Tx_{1i} + \pi_{12}^Tz_{2i} + v_{2,1i} \tag{1.25}$$
$$x_{2,2i} = \pi_{21}^Tx_{1i} + \pi_{22}^Tz_{2i} + v_{2,2i}. \tag{1.26}$$

• We could estimate each regression equation, and run F tests of $H_0 : \pi_{12} = 0$ and $H_0 : \pi_{22} = 0$. A failure to reject at least one of these nulls would indicate $\operatorname{rk}\Pi_2 \le 1 < 2$, and thus a violation of the rank condition.
  – F statistics for each first-stage equation are thus a useful diagnostic, and are often reported in applied work.

• On the other hand, $\pi_{12} \neq 0$ and $\pi_{22} \neq 0$ do not imply that
$$\Pi_2 = \begin{pmatrix} \pi_{12} & \pi_{22}\end{pmatrix}$$
has full rank: it could be the case, for example, that only the first instrument in $z_{2i}$ has a nonzero coefficient, in both (1.25) and (1.26).

• To test $\operatorname{rk}\Pi_2 = d_{x_2}$, we need an appropriate, multiple-equation generalisation of the single-equation F test. Various tests of this kind exist; the Cragg–Donald test is widely used for this purpose.

• Under IV-HMSK, the Cragg–Donald statistic for testing the rank of $\Pi_2$ can be expressed in terms of
$$W_n := \hat\Sigma_{VV}^{-1/2}\,\hat\Pi_2^T\Big(\sum_{i=1}^n \bar z_{2i}\bar z_{2i}^T\Big)\hat\Pi_2\,\hat\Sigma_{VV}^{-1/2},$$
where $\bar z_{2i}$ denotes the residuals from regressing each element of $z_{2i}$ on $x_{1i}$, and $\hat\Sigma_{VV} := \frac1n\sum_{i=1}^n \hat v_{2i}\hat v_{2i}^T$. This can be regarded as a kind of matrix analogue of a Wald statistic.
  – Observe that
$$n^{-1}W_n \overset{p}{\to} \Sigma_{VV}^{-1/2}\Pi_2^T\big(E\bar z_{2i}\bar z_{2i}^T\big)\Pi_2\Sigma_{VV}^{-1/2},$$
and the r.h.s. is positive semi-definite, with rank equal to the rank of $\Pi_2$.
  – The rank of a symmetric matrix is equal to the number of its eigenvalues that are non-zero. This suggests testing
$$H_0 : \operatorname{rk}\Pi_2 \le d_{x_2} - k \qquad\text{against}\qquad H_1 : \operatorname{rk}\Pi_2 > d_{x_2} - k$$
by comparing the $k$ smallest eigenvalues of $W_n$ with zero.

• Most relevant, for our purposes, is the case where $k = 1$; it can be shown that
$$\lambda_{\min}(W_n) \overset{d}{\to} \chi^2[(d_{z_2} - d_{x_2}) + 1]$$
under $H_0 : \operatorname{rk}\Pi_2 \le d_{x_2} - 1$, and diverges (to $\infty$) under the alternative ($H_1 : \operatorname{rk}\Pi_2 = d_{x_2}$).

• In the context of testing IV-RANK′, $\lambda_{\min}(W_n)$ is often referred to as 'the' Cragg–Donald statistic.
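A sketch of $\lambda_{\min}(W_n)$ as defined above, assuming homoskedastic first-stage errors; `cragg_donald` is an illustrative helper, and `x1` should include a constant (or be a column of ones if there are no included exogenous regressors).

```python
import numpy as np

def cragg_donald(x1, z2, x2):
    """lambda_min of W_n = Sig^{-1/2} Pi2' (sum zb zb') Pi2 Sig^{-1/2},
    where zb are the residuals of z2 on x1, Pi2 the first-stage
    coefficients on z2, and Sig the first-stage residual covariance;
    small values signal (near-)failure of the rank condition."""
    # Residualise z2 and x2 on the included exogenous regressors x1 (FWL).
    proj = lambda A, B: B - A @ np.linalg.lstsq(A, B, rcond=None)[0]
    zb = proj(x1, z2)                                   # z2 orthogonal to x1
    xb = proj(x1, x2)
    Pi2 = np.linalg.lstsq(zb, xb, rcond=None)[0]        # d_z2 x d_x2
    V = xb - zb @ Pi2                                   # first-stage residuals
    n = x2.shape[0]
    Sig = V.T @ V / n
    # Symmetric inverse square root of Sig via its eigendecomposition.
    w, C = np.linalg.eigh(Sig)
    Sig_mhalf = C @ np.diag(w ** -0.5) @ C.T
    Wn = Sig_mhalf @ Pi2.T @ (zb.T @ zb) @ Pi2 @ Sig_mhalf
    return np.linalg.eigvalsh(Wn).min()
```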

1.8 Weak instruments

1.8.1 The problem

• When $\Pi_0$ is such that the rank condition is 'nearly' violated, the instruments are said to be weak.

• Weak instruments are a serious problem, because they render standard methods for conducting inference (such as the Wald statistics introduced earlier) invalid.

• To understand how the problem arises, consider the case of a single endogenous regressor and a single instrument,
$$y_i = \beta_0x_i + u_i, \qquad x_i = \pi_0z_i + v_i.$$
Then the rank condition holds provided $\pi_0 \neq 0$. Suppose also that IV-HMSK holds.

• If $\pi_0 \neq 0$, then
$$n^{1/2}(\hat\beta_n - \beta_0) = \frac{n^{-1/2}\sum_{i=1}^n z_iu_i}{n^{-1}\sum_{i=1}^n z_ix_i} \stackrel{d}{\to}_{(2)} \frac{N[0, \sigma_u^2Ez_i^2]}{\pi_0Ez_i^2} \sim N[0, \sigma_u^2\pi_0^{-2}(Ez_i^2)^{-1}], \tag{1.27}$$
where $\stackrel{d}{\to}_{(2)}$ follows from Slutsky's theorem and
$$\frac1n\sum_{i=1}^n z_ix_i = \frac1n\sum_{i=1}^n z_i(\pi_0z_i + v_i) \overset{p}{\to} Ez_i(\pi_0z_i + v_i) = \pi_0Ez_i^2$$
by the LLN.

• If $\pi_0 = 0$, then $x_i = v_i$, and so
$$\hat\beta_n - \beta_0 = \frac{\sum_{i=1}^n z_iu_i}{\sum_{i=1}^n z_ix_i} = \frac{\sum_{i=1}^n z_iu_i}{\sum_{i=1}^n z_iv_i}.$$
  – Both numerator and denominator are sums of mean-zero, i.i.d. random variables; the CLT gives
$$\frac{1}{n^{1/2}}\sum_{i=1}^n \begin{pmatrix} z_iu_i\\ z_iv_i\end{pmatrix} \overset{d}{\to} \begin{pmatrix}\eta_1\\ \eta_2\end{pmatrix} \sim N[0, \Omega], \qquad \Omega = \begin{pmatrix}\sigma_u^2 & \sigma_{uv}\\ \sigma_{uv} & \sigma_v^2\end{pmatrix}Ez_i^2.$$
  – Thus, by Slutsky's theorem,
$$\hat\beta_n - \beta_0 = \frac{n^{-1/2}\sum_{i=1}^n z_iu_i}{n^{-1/2}\sum_{i=1}^n z_iv_i} \overset{d}{\to} \frac{\eta_1}{\eta_2},$$
a ratio of two (possibly correlated) normal variates: this is very different from the limiting distribution in (1.27).

• The normal approximation to the distribution of $n^{1/2}(\hat\beta_n - \beta_0)$, which underpins inference based on the t and Wald statistics, thus breaks down completely when $\pi_0 = 0$.
  – With a little more work, we can show that the same phenomenon affects the t-statistic: when $\pi_0 \neq 0$, we have
$$t_n = \hat\sigma_u^{-1}\Big(\sum_{i=1}^n \hat x_i^2\Big)^{1/2}(\hat\beta_n - \beta_0) \overset{d}{\to} N[0, 1]; \tag{1.28}$$
but when $\pi_0 = 0$, the limiting distribution of $t_n$ is highly non-normal (and depends on parameters that cannot be consistently estimated).

• We might hope to assume away this problem by 'imposing' the rank condition, i.e. by simply assuming that $\pi_0 \neq 0$ here. But unfortunately, approximate non-normality afflicts the t statistic even when $\pi_0$ is small but nonzero – in which case the rank condition is satisfied – even in 'large' samples.

• To illustrate the problem, let $G_n(\cdot\,;\pi_0)$ denote the cdf of the finite sample distribution of $t_n$; the second argument indicates that this quantity depends on the value of $\pi_0$.
  – For each $\pi_0 \neq 0$, we have that $G_n(\tau;\pi_0) \to G(\tau;\pi_0) = \Phi(\tau)$ for each $\tau \in \mathbb{R}$, where $\Phi$ denotes the standard normal cdf, as appears on the r.h.s. of (1.28). This provides the usual justification for drawing critical values for the t test from the quantiles of a standard normal distribution.
  – On the other hand, $G_n(\cdot\,;0)$ remains highly non-normal for all $n$, regardless of how large: thus if $\pi_0 = 0$, the use of normal critical values will deliver a test that is highly unreliable (in the sense that the actual null rejection rate may be a long way from the nominal significance level of the test).

• Now suppose we fix a 'moderately large' $n$, and consider how $G_n(\cdot\,;\pi_0)$ varies with $\pi_0$. For 'large' $\pi_0$, the normal approximation to $G_n(\cdot\,;\pi_0)$ works well; it is close to $G(\cdot\,;\pi_0)$.
  – But as we reduce $\pi_0$ closer to zero, $G_n(\cdot\,;\pi_0)$ has to become more and more like $G_n(\cdot\,;0)$: the normal approximation must break down well before we reach $\pi_0 = 0$. (A Monte Carlo sketch of this appears below.)
  – The larger $n$ is, the closer to zero we may take $\pi_0$ before the normal approximation breaks down; but there will always be a range of nonzero $\pi_0$'s – and thus models technically satisfying IV-RANK′ – for which it will not. For such $\pi_0$'s, the t test will be almost as unreliable as it would be if $\pi_0 = 0$!

• To put this another way: while IV-RANK′ is sufficient for identification, it is not sufficient to ensure that standard inferential procedures will be tolerably reliable, even in 'reasonably large' samples.

1.8.2 Dealing with weak instruments

• In view of the preceding, it is not sufficient to know whether or not the rank condition fails: we also need to know whether it 'nearly' fails.

• Thus simply carrying out a test of the rank condition is not quite sufficient: to continue the preceding example, we might correctly reject $H_0 : \pi_0 = 0$ – and yet the actual value of $\pi_0$ may still be small enough to cause us difficulties.

• We therefore need to modify the critical values used in these tests, so as to permit non-rejection when $\Pi_0$ is merely 'close' to having deficient rank, in the sense that the normal approximation fails to hold (to within some specified tolerance).
  – This kind of reasoning led Staiger & Stock (1997) to recommend comparing the first-stage F statistic, in a model with only one endogenous regressor, with a critical value of 10. This compares with the critical value of 3.78 that would be used to test the rank condition, at the 1% level, if there were three available instruments.
  – For a model with multiple endogenous regressors, Stock & Yogo (2005) give modified critical values for the Cragg–Donald test.

• Tests using these modified critical values can be regarded as providing a kind of 'pre-test' for weak instruments. Thus a common way of dealing with weak instruments is to continue to use standard procedures to draw inferences about $\beta_0$, but to also report the outcomes of these pre-tests.

• However, as the example from the preceding section illustrates, there is no clear boundary demarcating 'weak' from 'strong' instruments: all that can be said is that the weaker the instruments, the less reliable standard inferences will be.

• Thus the most satisfying response to the weak instruments problem involves the development of inferential procedures that are completely robust to weak instruments – being able to tolerate even a failure of IV-RANK′. The conceptually simplest of these is the Anderson–Rubin (1949) test.

1.8.3 The Anderson–Rubin (AR) test

• We want to test the null $H_0 : \beta_2 = \beta_2^*$ (against $H_1 : \beta_2 \neq \beta_2^*$) in the model
$$y_i = x_{1i}^T\beta_1 + x_{2i}^T\beta_2 + u_i$$
$$\begin{pmatrix} x_{1i}\\ x_{2i}\end{pmatrix} = \begin{pmatrix} I_{d_{x_1}} & 0\\ \Pi_1^T & \Pi_2^T\end{pmatrix}\begin{pmatrix} x_{1i}\\ z_{2i}\end{pmatrix} + \begin{pmatrix} 0\\ v_{2i}\end{pmatrix},$$
where $z_i = (x_{1i}^T, z_{2i}^T)^T$ are the available instruments.

• Under the null, we have
$$y_i(\beta_2^*) := y_i - x_{2i}^T\beta_2^* = x_{1i}^T\beta_1 + u_i,$$
while under any alternative $\beta_2 \neq \beta_2^*$, letting $\delta := \beta_2 - \beta_2^*$,
$$y_i(\beta_2^*) = x_{1i}^T\beta_1 + x_{2i}^T(\beta_2 - \beta_2^*) + u_i = x_{1i}^T\beta_1 + (\Pi_1^Tx_{1i} + \Pi_2^Tz_{2i} + v_{2i})^T\delta + u_i = x_{1i}^T(\beta_1 + \Pi_1\delta) + z_{2i}^T(\Pi_2\delta) + (v_{2i}^T\delta + u_i) =: x_{1i}^T\kappa_1 + z_{2i}^T\kappa_2 + \eta_i. \tag{1.29}$$

• (1.29) is clearly a valid regression equation, in the sense that $Ex_{1i}\eta_i = 0$ and $Ez_{2i}\eta_i = 0$. Because we know $\beta_2^*$ under the null, we can compute $y_i(\beta_2^*)$, and thus we can always consistently estimate $\kappa_1$ and $\kappa_2$ via an OLS regression of $y_i(\beta_2^*)$ on $z_i = (x_{1i}^T, z_{2i}^T)^T$. Indeed, the usual t and Wald tests remain entirely valid here.

• This fact is exceedingly useful, because we can rephrase the null of interest in terms of the coefficient $\kappa_2$.
  – Under $H_0$, $\delta = 0$ and thus $\kappa_2 = 0$. A rejection of $H_0' : \kappa_2 = 0$ therefore implies a rejection of $H_0$.

  – Under $H_1$, $\delta \neq 0$: if the rank condition holds, then $\operatorname{rk}\Pi_2 = d_{x_2}$, and $\kappa_2 = \Pi_2\delta \neq 0$. On the other hand, if the rank condition fails, it could well be the case that $\kappa_2 = 0$. This means that some violations of $H_0$ cannot be detected.

• The AR test of $H_0 : \beta_2 = \beta_2^*$ thus consists of testing $\kappa_2 = 0$ in (1.29), using the usual OLS-based Wald statistic.

• The inability of this approach to detect departures from $H_0$ when the rank condition fails makes perfect sense, because in this case $\beta_2$ is not identified.
  – Consider the extreme case where $\Pi_2 = 0$. Then $\kappa_2 = 0$, whatever the true value of $\beta_2$ happens to be. Thus the Anderson–Rubin test of $H_0 : \beta_2 = \beta_2^*$ would reject with probability (approximately) equal to the significance level of the test, both under the null and under all possible alternatives.
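A sketch of the AR procedure, as an illustrative helper: form $y_i(\beta_2^*)$, regress it on $z_i = (x_{1i}^T, z_{2i}^T)^T$, and compute the (here homoskedastic, for simplicity) Wald statistic for $\kappa_2 = 0$.

```python
import numpy as np
from scipy.stats import chi2

def ar_test(y, x1, x2, z2, beta2_star):
    """Anderson-Rubin test of H0: beta_2 = beta2_star; remains valid even
    under weak or failed identification. Returns the chi^2[d_z2] statistic
    and its asymptotic p-value."""
    y0 = y - x2 @ beta2_star                  # y_i(beta2*)
    Z = np.column_stack([x1, z2])
    n, dz2 = len(y), z2.shape[1]
    kappa = np.linalg.lstsq(Z, y0, rcond=None)[0]
    e = y0 - Z @ kappa
    s2 = e @ e / n                            # homoskedastic error variance
    V = s2 * np.linalg.inv(Z.T @ Z / n)       # asy. variance of kappa_hat
    k2 = kappa[-dz2:]                         # coefficients on z2
    Wn = n * k2 @ np.linalg.solve(V[-dz2:, -dz2:], k2)
    return Wn, chi2.sf(Wn, dz2)
```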

• Remarks:
(i) The test has good power properties when $d_{x_2} = d_{z_2}$. However, when $d_{z_2}$ is much larger than $d_{x_2}$, the test is rather inefficient. This is because a test of $H_0' : \kappa_2 = 0$ involves $d_{z_2}$ restrictions, rather than merely $d_{x_2}$ restrictions. In this case, such alternatives as Kleibergen's (2002) LM test and Moreira's (2003) CLR test will typically outperform the AR test.
(ii) Supposing we conjecture the correct value for $\beta_2$, a test of $H_0' : \kappa_2 = 0$ also corresponds to a test of the exclusion restrictions (i.e. the restriction that $z_{2i}$ should not appear in the model). The AR test relies on us interpreting a rejection of $H_0'$ as a rejection of $H_0$, rather than as signifying that some instruments have been incorrectly excluded from the structural equation.
(iii) The argument above supposes that the null restricts all the elements of $\beta_2$; whereas often we will only want to restrict a subset of these. The basic principle behind the AR test also extends to this case, but the proof of its validity is much more involved.

1.A Suggested (optional) further reading

• The preceding mostly follows the treatment given in Wooldridge (2002, Ch. 5).
• Other potentially useful references are Davidson and MacKinnon (2004, Ch. 8) and Greene (2008, Ch. 12).


2 Generalised method of moments

2.1 Introduction

2.1.1 Motivating examples

• Recall that in the linear IV model,
$$y_i = x_i^T\beta_0 + u_i, \qquad Ez_iu_i = 0, \tag{2.1}$$
$\beta_0$ was identified as the unique solution to the following system (of $d_z$ equations),
$$Ez_i(y_i - x_i^T\beta) = 0,$$
which are also termed moment conditions (or estimating equations).
  – The sample analogue of these conditions,
$$\frac1n\sum_{i=1}^n z_i(y_i - x_i^T\beta) \approx 0, \tag{2.2}$$
then provided the basis for the estimation of $\beta_0$. We have written '≈' here because, in the general case ($d_z > d_x$), no value of $\beta$ is capable of setting every equation in (2.2) to zero simultaneously.

• Moment conditions naturally arise as identifying conditions in many other econometric models.

Example 2.1.
• Suppose we were to generalise (2.1) to
$$y_i = h(x_i;\beta_0) + u_i, \qquad Ez_iu_i = 0,$$
where $x \mapsto h(x;\beta)$ is a non-linear function parametrised by $\beta$.
• In this case, we would aim to identify $\beta_0$ as the solution to
$$Ez_i[y_i - h(x_i;\beta)] = 0.$$
• Whether or not this equation is uniquely solved at $\beta = \beta_0$ will depend on the parametrisation of $h$.

Example 2.2.
• Suppose that we observe a sample of individuals, each of whom receives a wage $\omega_i \in \mathbb{R}$, and who is observed to purchase quantities $y_i = (y_{i1}, y_{i2})$ of two goods: 'consumption' (an aggregate of all consumption expenditures), and 'leisure'.

• In this case, the model supposes that the individuals have preferences given by some utility function
$$U(y;\xi_i,\beta_0),$$
where $\xi_i$ is a random disturbance with a known distribution, assumed to be independent of $\omega_i$, which allows for individual tastes to vary with unobservable characteristics. [In a more realistic example, we would also allow for $U$ to depend on some observable characteristics $x_i$.]

• For example, we might specify that $U$ has the Cobb–Douglas form,
$$U(y;\xi,\beta) = \lambda(\beta_1 + \beta_2\xi)\log y_1 + [1 - \lambda(\beta_1 + \beta_2\xi)]\log y_2, \tag{2.3}$$
where $\lambda : \mathbb{R} \to [0, 1]$ is some function mapping from $\mathbb{R}$ to the unit interval; a common choice would be the logistic function,
$$\lambda(x) := \frac{1}{1 + e^{-x}}$$
(though many, many other choices are possible); we might specify $\xi_i$ to be i.i.d. $N[0, 1]$.

• Households choose $y$ optimally, by maximising $U$ subject to their budget constraint,
$$y_1 \le \omega_i(T - y_2),$$
where $T - y_2$ gives the total hours worked during the week, when $y_2$ hours of leisure are taken. Solving this constrained optimisation problem yields the $i$th household's optimal (Marshallian) demands,
$$y_i^*(\beta) := y^*(\omega_i;\xi_i,\beta) = \begin{pmatrix} y_1^*(\omega_i;\xi_i,\beta)\\ y_2^*(\omega_i;\xi_i,\beta)\end{pmatrix} \stackrel{(1)}{=} \begin{pmatrix}\lambda(\beta_1 + \beta_2\xi_i)\,\omega_iT\\ [1 - \lambda(\beta_1 + \beta_2\xi_i)]\,T\end{pmatrix},$$
where $\stackrel{(1)}{=}$ follows from the Cobb–Douglas form given in (2.3).

• We shall maintain that the preceding model is correctly specified, in the sense that the observed data on $(y_i, \omega_i)$ were generated by a population of households with preference parameters $\beta_0$. How could we identify $\beta_0$?

• One possibility is through the moment conditions implied by the model. Consider, for example,
$$m(y,\omega) := (y_1,\ y_2,\ y_1^2,\ y_2^2,\ y_1y_2,\ y_1\omega,\ y_2\omega)^T.$$
Then $Em(y_i,\omega_i)$ is a vector of population moments associated with the data (means, second moments and cross-products): this will agree with $Em[y_i^*(\beta),\omega_i]$ if (and hopefully, only if) the latter is evaluated at $\beta = \beta_0$.

• In this manner, we arrive at the identifying moment condition
$$0 = E\{m(y_i,\omega_i) - m[y_i^*(\beta),\omega_i]\} = E[m(y_i,\omega_i) - \mu(\omega_i,\beta)], \tag{2.4}$$
where we have defined
$$\mu(\omega_i,\beta) := E\{m[y^*(\omega_i;\xi,\beta),\omega_i] \mid \omega_i\} \stackrel{(1)}{=} \int m[y^*(\omega_i;\xi,\beta),\omega_i]\,\phi(\xi)\,d\xi,$$
where $\stackrel{(1)}{=}$ holds, for $\phi(\cdot)$ the standard normal density, if $\xi$ is assumed to have a $N[0, 1]$ distribution.

• We have rewritten the identifying conditions (2.4) in terms of $\mu(\omega_i,\beta)$ rather than $m[y_i^*(\beta),\omega_i]$, because the latter depends on $\xi_i$, which is not observed.

2.1.2 A general framework

• We shall now abstract from these examples to a more general setting:
  – We observe a sample $\{w_i\}_{i=1}^n$ of $\mathbb{R}^{d_w}$-valued i.i.d. random vectors.
  – The process generating this data is described by a model parametrised by $\theta \in \Theta \subseteq \mathbb{R}^{d_\theta}$; the true value of this parameter is $\theta_0$.
  – There is a function
$$g : \mathbb{R}^{d_w} \times \Theta \to \mathbb{R}^{d_g}, \qquad (w,\theta) \mapsto g(w;\theta),$$
such that
GMM-ID  $\theta = \theta_0$ is the unique solution to $Eg(w_i,\theta) = 0$.  (2.5)
We shall discuss primitive conditions for GMM-ID subsequently.

  – Notions of 'exact identification' and 'over-identification' generalise readily from the IV model: here the analogue of the order condition for identification is $d_g \ge d_\theta$, and $d_g - d_\theta$ measures the degree of over-identification.

• You may easily verify that both preceding examples fit into this framework:
  – Example 2.1: $w_i = (y_i, x_i, z_i)$, and $g(w;\beta) = z[y - h(x;\beta)]$;
  – Example 2.2: $w_i = (y_i, \omega_i)$, and $g(w,\beta) = m(y,\omega) - \mu(\omega,\beta)$.

• If we could evaluate the function $Eg(w_i,\theta)$ for each $\theta \in \Theta$, then we could recover $\theta_0$ by solving (2.5).
  – In practice, this is infeasible, since the distribution of $w_i$ is unknown.
  – However, as discussed in more detail below, the LLN ensures that the sample counterpart
$$g_n(\theta) := \frac1n\sum_{i=1}^n g(w_i,\theta)$$
delivers a reasonable approximation to $Eg(w_i,\theta)$, at each $\theta \in \Theta$.
  – Accordingly, the $\theta$ that solves
$$g_n(\theta) = 0 \tag{2.6}$$
should also be 'close' to the $\theta$ that solves (2.5), i.e. $\theta_0$. But it is only possible to find a solution to (2.6) in the exactly identified case ($d_g = d_\theta$) (and, if the equations are nonlinear, not necessarily even then).

• Since $g_n(\theta)$ cannot be set exactly equal to zero, we should instead choose $\theta$ so as to make it as close as possible to zero: this requires an appropriate measure of distance.
  – A particularly tractable way of measuring this distance is provided by
$$Q_n(\theta) := g_n(\theta)^TW_ng_n(\theta), \tag{2.7}$$
where $W_n$ is a $d_g \times d_g$ positive semi-definite weight matrix, which may depend on the data (see below).
  – Alternatively, we might take $d_\theta$ linear combinations of the conditions (2.6), thus reducing an overidentified system to an exactly identified one. This was how we dealt with over-identification in the linear IV model, by reducing the sample identifying conditions from (1.9) to (1.10).

  – As we shall see, the first-order conditions characterising the minimiser of (2.7) take exactly the form of $d_\theta$ linear combinations of the sample moment conditions, and in this sense these two approaches are equivalent.

• $Q_n$ is termed the generalised method of moments (GMM) criterion function; the GMM estimator is defined as
$$\hat\theta_n := \operatorname*{argmin}_{\theta\in\Theta} Q_n(\theta).$$
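Since $Q_n$ generally has no closed-form minimiser, in practice $\hat\theta_n$ is computed numerically. A sketch, using the linear IV moments $g(w_i,\beta) = z_i(y_i - x_i^T\beta)$ as the running example; any moment function returning the $n \times d_g$ matrix of contributions would do in its place, and all names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def gmm_estimate(g, theta_init, W):
    """Minimise Q_n(theta) = g_n(theta)' W g_n(theta), where g(theta)
    returns the n x d_g matrix of moment contributions g(w_i, theta)."""
    def Qn(theta):
        gbar = g(theta).mean(axis=0)          # g_n(theta)
        return gbar @ W @ gbar
    return minimize(Qn, theta_init, method="BFGS").x

# Running example: linear IV moments z_i (y_i - x_i' beta).
rng = np.random.default_rng(4)
n = 2_000
Z = rng.normal(size=(n, 2))
v = rng.normal(size=n)
X = (Z @ np.array([0.7, 0.4]) + v)[:, None]
y = X[:, 0] * 2.0 + 0.5 * v + rng.normal(size=n)

g_iv = lambda beta: Z * (y - X @ beta)[:, None]     # n x d_g
W = np.linalg.inv(Z.T @ Z / n)                      # 2SLS-equivalent weight
print(gmm_estimate(g_iv, np.zeros(1), W))           # approx [2.0]
```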

• Different choices of weight matrix yield different estimators.
  – The choice of weight matrix can be regarded as generalising the choice of instruments from the linear IV model.
  – We might write $\hat\theta_n(W_n)$ to make the dependence of the estimator on the weight matrix explicit. If $W_n \overset{p}{\to} W$ and $V_n \overset{p}{\to} V$ are distinct, where $W$ and $V$ are both positive definite, then in general $\hat\theta_n(W_n) \neq \hat\theta_n(V_n)$; but – as we shall see below – both will converge in probability to $\theta_0$.

2.2 Asymptotics

• Recall that:
  – The identifying moment conditions are
$$Eg(w_i,\theta) = 0,$$
a system of $d_g \ge d_\theta$ equations, uniquely solved at $\theta = \theta_0$ (GMM-ID).
  – The GMM estimator is given by
$$\hat\theta_n := \operatorname*{argmin}_{\theta\in\Theta} Q_n(\theta) = \operatorname*{argmin}_{\theta\in\Theta} g_n(\theta)^TW_ng_n(\theta). \tag{2.8}$$

• For the weight matrix, we shall require:
GMM-WGHT  $W_n$ is positive semi-definite, and $W_n \overset{p}{\to} W$, for $W$ positive definite.

2.2.1 Consistency

• Previously, when having to show consistency of OLS or 2SLS, we solved (2.8) to obtain an explicit expression for $\hat\theta_n$ – to which we then applied the LLN and Slutsky's theorem. But in general, (2.8) can only be solved numerically; all we know about $\hat\theta_n$ is that it minimises $Q_n$.

• The limiting behaviour of $\hat\theta_n$ must therefore be inferred from that of $Q_n$. How? Observe that
$$g_n(\theta) = \frac1n\sum_{i=1}^n g(w_i,\theta),$$
and for each $\theta \in \Theta$, $\{g(w_i,\theta)\}_{i=1}^n$ is just a collection of i.i.d. random variables. Thus by the LLN (provided $E\|g(w_i,\theta)\| < \infty$),
$$g_n(\theta) \overset{p}{\to} Eg(w_i,\theta) =: g_0(\theta).$$
Hence
$$Q_n(\theta) = g_n(\theta)^TW_ng_n(\theta) \overset{p}{\to} g_0(\theta)^TWg_0(\theta) =: Q(\theta). \tag{2.9}$$

• Now it seems reasonable to expect that, since $\hat\theta_n$ minimises $Q_n$, and $Q_n(\theta) \overset{p}{\to} Q(\theta)$ for each $\theta \in \Theta$, then $\hat\theta_n$ should converge (in probability) to the corresponding minimiser of $Q$.
  – This will indeed be the case, under reasonable conditions. [Technically, we also need to show that $Q_n$ converges to $Q$ uniformly in probability, which is a stronger form of convergence than is given in (2.9). This sort of convergence is discussed in detail in the second-year course.]
  – So we need to determine the minimiser of $Q$: since $W$ is positive definite,
$$Q(\theta)\begin{cases} > 0 & \text{if } g_0(\theta) \neq 0,\\ = 0 & \text{if } g_0(\theta) = 0;\end{cases}$$
and thus, under GMM-ID, $Q$ is uniquely minimised at $\theta_0$.
  – In this manner, the consistency of $\hat\theta_n$ for $\theta_0$ follows from $Q_n$ being 'consistent' for $Q$, which has a unique minimum at $\theta_0$.
  – Observe that the actual value of $W$ plays no role here: all that matters is that it be positive definite.

2.2.2 Asymptotic normality

• Having established the consistency of $\hat\theta_n$, its asymptotic normality may be deduced via the following linearisation argument.

• We first note that, for any differentiable function $h : \Theta \to \mathbb{R}^{d_h}$,
$$\nabla_\theta[h(\theta)^TAh(\theta)] = [D_\theta h(\theta)]^T(A + A^T)h(\theta), \tag{2.10}$$

where for any $f : \Theta \to \mathbb{R}$,
$$\nabla_\theta f(\theta) := \begin{pmatrix}\partial_1 f(\theta)\\ \vdots\\ \partial_{d_\theta}f(\theta)\end{pmatrix}, \qquad D_\theta h(\theta) := \begin{pmatrix}\partial_1 h_1(\theta) & \cdots & \partial_{d_\theta}h_1(\theta)\\ \vdots & \ddots & \vdots\\ \partial_1 h_{d_h}(\theta) & \cdots & \partial_{d_\theta}h_{d_h}(\theta)\end{pmatrix},$$
i.e. the gradient of $f$ and the Jacobian of $h$, respectively; here $\partial_k f(\theta)$ denotes the partial derivative of $f$ with respect to the $k$th element of $\theta = (\theta_1, \dots, \theta_{d_\theta})^T$.

• Assuming $g(w_i,\theta)$ – and thus $g_n(\theta)$ – is differentiable in $\theta$, the minimiser $\hat\theta_n$ of $Q_n(\theta) = g_n(\theta)^TW_ng_n(\theta)$ must satisfy the first-order condition (FOC) for an interior minimum,
$$0 = \nabla_\theta Q_n(\hat\theta_n) \stackrel{(2)}{=} 2[D_\theta g_n(\hat\theta_n)]^TW_ng_n(\hat\theta_n), \tag{2.11}$$
where $\stackrel{(2)}{=}$ follows by (2.10).
  – Note that this condition holds only if $\hat\theta_n$ is not on the boundary of $\Theta$. Recalling that the interior of a set $\Theta$, denoted $\operatorname{int}\Theta$, is defined as $\Theta$ less its boundary points, we shall want to assume that
INTR  $\theta_0 \in \operatorname{int}\Theta$.
Why? By INTR, there must exist an $\epsilon > 0$ such that $B(\theta_0,\epsilon) := \{\theta \in \Theta : \|\theta - \theta_0\| < \epsilon\}$ is wholly contained in $\Theta$. By consistency,
$$P\{\hat\theta_n \in \operatorname{int}\Theta\} \ge P\{\hat\theta_n \in B(\theta_0,\epsilon)\} \to 1,$$
whence the FOC (2.11) will hold with probability approaching 1 (so far as these asymptotic arguments are concerned, this essentially permits us to proceed as though (2.11) always holds).

• Under certain regularity conditions [discussed in the second-year course], the LLN delivers
$$D_\theta g_n(\theta) = \frac1n\sum_{i=1}^n D_\theta g(w_i,\theta) \overset{p}{\to} ED_\theta g(w_i,\theta) = D_\theta Eg(w_i,\theta) = D_\theta g_0(\theta), \tag{2.12}$$
i.e. the Jacobian of the sample moments converges to its population counterpart.

By consistency, therefore [together with some additional conditions],
$$D_{n,1} := D_\theta g_n(\hat\theta_n) \overset{p}{\to} D_\theta g_0(\theta_0) =: D.$$

• It will also be true that $g_n(\hat\theta_n) \overset{p}{\to} g_0(\theta_0) = 0$, but this is not very helpful. Far more useful is the fact that
$$n^{1/2}g_n(\theta_0) = \frac{1}{n^{1/2}}\sum_{i=1}^n g(w_i,\theta_0) \overset{d}{\to} N[0, S], \qquad S := Eg(w_i,\theta_0)g(w_i,\theta_0)^T,$$
by the CLT (supposing $E\|g(w_i,\theta_0)\|^2 < \infty$).

• By consistency, $g_n(\hat\theta_n)$ should be close to $g_n(\theta_0)$, and indeed a kind of mean-value expansion here yields
$$g_n(\hat\theta_n) = g_n(\theta_0) + D_\theta g_n(\tilde\theta_n)(\hat\theta_n - \theta_0), \tag{2.13}$$
where $\tilde\theta_n$ lies on the ray connecting $\hat\theta_n$ to $\theta_0$. [Actually, to make this rigorous, each row of $D_\theta g_n$ would have to be evaluated at a (possibly) different point along that ray, but this is irrelevant for the asymptotics.]
  – Note that a mean-value expansion of $g_n$ between the points $\theta_0$ and $\hat\theta_n$ is only necessarily valid if all the points lying between $\theta_0$ and $\hat\theta_n$ also lie within $\Theta$. INTR and consistency will again take care of this for us, since $P\{\hat\theta_n \in B(\theta_0,\epsilon)\} \to 1$ and $B(\theta_0,\epsilon) \subseteq \Theta$.
  – Since $\tilde\theta_n$ lies between $\theta_0$ and $\hat\theta_n$, the consistency of $\hat\theta_n$ ensures that $\tilde\theta_n \overset{p}{\to} \theta_0$. It is accordingly reasonable to expect, on the basis of (2.12), that
$$D_{n,2} := D_\theta g_n(\tilde\theta_n) \overset{p}{\to} D_\theta g_0(\theta_0) =: D. \tag{2.14}$$
[This will indeed follow once the convergence in (2.12) is suitably strengthened to uniform convergence in probability.]

• Substituting (2.13) into (2.11) yields
$$0 = D_{n,1}^TW_n[g_n(\theta_0) + D_{n,2}(\hat\theta_n - \theta_0)],$$
whence
$$n^{1/2}(\hat\theta_n - \theta_0) = -(D_{n,1}^TW_nD_{n,2})^{-1}D_{n,1}^TW_nn^{1/2}g_n(\theta_0) \overset{d}{\to} (D^TWD)^{-1}D^TW \cdot N[0, S]. \tag{2.15}$$

• (2.15) only makes sense if $D^TWD$ has full rank; under GMM-WGHT, it suffices that
GMM-JAC  $\operatorname{rk}D = \operatorname{rk}D_\theta Eg(w_i,\theta_0) = d_\theta$.

• Although it is not strictly necessary, it is also convenient to assume that none of the moment conditions are redundant, in the sense of being expressible as linear combinations of any of the others; this is encoded in the requirement that
GMM-VAR  $S := Eg(w_i,\theta_0)g(w_i,\theta_0)^T$ is positive definite.

Theorem 2.1. Under INTR, GMM-ID, GMM-WGHT, GMM-JAC, GMM-VAR and further regularity conditions [discussed in the second-year course], the GMM estimator $\hat\theta_n$ has $n^{1/2}(\hat\theta_n - \theta_0) \overset{d}{\to} N[0, V_W]$, where
$$V_W = (D^TWD)^{-1}D^TWSWD(D^TWD)^{-1}.$$

• Remarks:
(i) A consistent estimator of $V_W$ can be constructed by replacing $W$ by $W_n$, and $D$ and $S$ by
$$\hat D_n := D_\theta g_n(\hat\theta_n) = \frac1n\sum_{i=1}^n D_\theta g(w_i,\hat\theta_n), \qquad \hat S_n := \frac1n\sum_{i=1}^n g(w_i,\hat\theta_n)g(w_i,\hat\theta_n)^T - g_n(\hat\theta_n)g_n(\hat\theta_n)^T.$$
[Consistency of these estimators is covered in the second-year course.]
(ii) If we regard $W_n$ as an estimator of $W$, it is clear from the proof that the variability of $W_n$ has no (first-order) effect on the limiting distribution of $\hat\theta_n$. This generalises our earlier observation concerning 2SLS: recall that the variability of $\hat\Pi_n$ made no contribution to the limiting variance of the 2SLS estimator.

2.2.3 Local identification and weak identification

• When GMM-ID holds, $Eg(w_i,\theta) = 0$ has a unique solution at $\theta = \theta_0$, and we say that $\theta_0$ is globally identified.

• When GMM-JAC holds, $D = D_\theta Eg(w_i,\theta_0)$ has rank $d_\theta$: this is sufficient for $\theta_0$ to be locally identified. This means that there is an $\epsilon > 0$ such that $Eg(w_i,\theta') \neq 0$ for all $\theta' \neq \theta_0$ with $\|\theta' - \theta_0\| < \epsilon$, i.e. there are no other solutions to the identifying moment conditions lying arbitrarily close to $\theta_0$.

  – In the linear IV model, $g_n(\beta) = \frac1n\sum_{i=1}^n z_i(y_i - x_i^T\beta)$, and so
$$D_\beta g_n(\beta) = -\frac1n\sum_{i=1}^n z_ix_i^T \implies D = -Ez_ix_i^T.$$
GMM-JAC is thus the non-linear GMM counterpart of IV-RANK.
  – In general, GMM-JAC is sufficient for global identification (as it is in the linear IV model) iff the moment conditions are linear in $\theta$.

• The GMM analogue of 'weak instruments' thus corresponds to the case where GMM-JAC holds, but $D$ is 'close' to having rank strictly less than $d_\theta$; we say that $\theta_0$ is weakly identified.
  – When parameters are weakly identified, standard inferential procedures (based on Theorem 2.1) cease to be reliable, and other, identification-robust methods must be used. (For example, the Anderson–Rubin test can be generalised to the present setting.)

2.3 Asymptotic efficiency

2.3.1 The choice of weight matrix

• In view of the dependence of the limiting variance of $\hat\theta_n$ on $W$, it is natural that we should want to choose $W$ so as to make this variance as small as possible.

• It turns out (see the problem set) that setting $W = S^{-1}$ yields an estimator that is efficient, in the sense that the difference $V_W - V_{S^{-1}}$ is positive semi-definite, for all (positive definite) $W$. Moreover, when $W = S^{-1}$, the asymptotic variance simplifies to
$$V_{S^{-1}} = (D^TS^{-1}D)^{-1}.$$
  – Observe that $W = S^{-1}$ will give greatest weight to those moment conditions which have the smallest variance, so it is perhaps not so surprising that this choice leads to an estimator with desirable properties.
  – Any estimator for which $W_n \overset{p}{\to} S^{-1}$ is termed an efficient GMM estimator.

• The major difficulty is actually realising this choice of weight matrix, which requires (essentially) a consistent estimator for $S$. The standard way of dealing with this problem is to use a two-step estimator (sketched in code below):
(i) Compute $\hat\theta_{n,1}$ as the minimiser of $g_n(\theta)^TW_0g_n(\theta)$, where $W_0$ is some 'convenient' choice of positive definite weight matrix. This estimator is necessarily consistent, though not efficient.
(ii) Use $\hat\theta_{n,1}$ to compute $\hat S_n = \frac1n\sum_{i=1}^n g(w_i,\hat\theta_{n,1})g(w_i,\hat\theta_{n,1})^T - g_n(\hat\theta_{n,1})g_n(\hat\theta_{n,1})^T$.
(iii) Finally, compute $\hat\theta_{n,2}$ as the minimiser of $g_n(\theta)^T\hat S_n^{-1}g_n(\theta)$; this is asymptotically efficient, since $\hat S_n^{-1} \overset{p}{\to} S^{-1}$.
  – While this approach is still widespread, it has to some extent fallen out of favour in recent years. One problem is that $\hat S_n$, although consistent, may not be a very good estimator of $S$. Standard methods of conducting inference (based on Theorem 2.1) fail to take account of this, and so may be unreliable.
  – (Generalised) empirical likelihood estimators offer an alternative to GMM that circumvents this problem, by producing an efficient estimator of $S$ simultaneously with solving the ($\hat S_n^{-1}$-weighted) sample moment conditions. This realises an estimator that shares the same limiting distribution as the efficient GMM estimator, and which has better higher-order properties.
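A sketch of the two-step recipe just listed, for a generic moment function; `two_step_gmm` is an illustrative helper, and `np.cov` divides by $n - 1$ rather than $n$, which is immaterial asymptotically.

```python
import numpy as np
from scipy.optimize import minimize

def two_step_gmm(g, theta_init):
    """Two-step efficient GMM: (i) minimise with W0 = I; (ii) re-minimise
    with W = S_hat^{-1}, where S_hat is the centred covariance of the
    moment contributions evaluated at the step-(i) estimate. g maps
    theta -> n x d_g array of contributions g(w_i, theta)."""
    Qn = lambda th, W: g(th).mean(0) @ W @ g(th).mean(0)
    dg = g(theta_init).shape[1]
    th1 = minimize(Qn, theta_init, args=(np.eye(dg),), method="BFGS").x
    G = g(th1)
    S_hat = np.atleast_2d(np.cov(G, rowvar=False))   # ~ (1/n) sum gg' - gbar gbar'
    W = np.linalg.inv(S_hat)
    th2 = minimize(Qn, th1, args=(W,), method="BFGS").x
    return th2, W        # W is also what an over-identification test would use
```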

2.3.2 The implied (efficient) choice of moments

2.3.2 The implied (efficient) choice of moments

  • As we noted above, an alternative to minimising $Q_n(\theta) = g_n(\theta)^T W_n g_n(\theta)$ would be to take $d_\theta$ linear combinations of the $g_n(\theta)$ equations, yielding an estimator that corresponds to the exact solution to
$$A_n g_n(\theta) = 0, \tag{2.16}$$
for some $d_\theta \times d_g$ matrix $A_n$.

  • (2.16) may be compared with the FOC solved by the GMM estimator, $[D_\theta g_n(\theta)]^T W_n g_n(\theta) = 0$ (equivalently, $\hat\theta_n$ solves $[D_\theta g_n(\hat\theta_n)]^T W_n g_n(\theta) = 0$), and which is asymptotically equivalent to (in the sense of having the same limiting distribution) an estimator that solves $D^T W_n g_n(\theta) = 0$, where $D := E D_\theta g(w_i, \theta_0)$ (see the problem set).

  • Setting $W_n = S^{-1}$, the matrix $D^T S^{-1}$ provides the optimal $d_\theta$ linear combinations of the $d_g$ moments – optimal in the sense that the solution to $D^T S^{-1} g_n(\theta) = 0$ yields an asymptotically efficient estimator. Again, a problem with realising this choice is that both $D$ and $S$ must be estimated.

  • For this reason, the estimation problem is rarely approached in this way (estimating $S$ causes enough problems as it is). An exception arises when the moment conditions are linear in $\theta$, because then $D_\theta g_n(\theta)$ does not depend on $\theta$, as we shall now discuss in the context of the linear IV model.

2.3.3 Efficiency in the linear IV model

  • Recall that in this model, the sample moments are
$$g_n(\beta) = \frac{1}{n}\sum_{i=1}^n g(w_i, \beta) = \frac{1}{n}\sum_{i=1}^n z_i(y_i - x_i^T\beta),$$
so that, in particular, $g(w_i, \beta_0) = z_i u_i$. Thus
$$S = E g(w_i, \beta_0)g(w_i, \beta_0)^T = E u_i^2 z_i z_i^T.$$

  • Suppose that IV-HMSK holds. Then the preceding simplifies to $\sigma_u^2 E z_i z_i^T$, which suggests that
$$W_n^* = \hat\sigma_u^{-2}\left(\frac{1}{n}\sum_{i=1}^n z_i z_i^T\right)^{-1}$$
is an efficient choice for a weight matrix.
  – Since any weight matrix proportional to $(\sum_{i=1}^n z_i z_i^T)^{-1}$ will give the same estimator, in this special case the efficient GMM estimator may be calculated in one step (there is no need to estimate $\sigma_u^2$).
  – Indeed, in this case the efficient GMM estimator corresponds to 2SLS: this can be seen most readily from the FOC solved by the GMM estimator, which here takes the form
$$[D_\beta g_n(\beta)]^T W_n^* g_n(\beta) = 0$$
$$\overset{(1)}{\iff} \hat\sigma_u^{-2}\left(\frac{1}{n}\sum_{i=1}^n x_i z_i^T\right)\left(\frac{1}{n}\sum_{i=1}^n z_i z_i^T\right)^{-1}\frac{1}{n}\sum_{i=1}^n z_i(y_i - x_i^T\beta) = 0$$
$$\iff X^T Z (Z^T Z)^{-1}\sum_{i=1}^n z_i(y_i - x_i^T\beta) = 0$$
$$\iff \hat\Pi_n^T \sum_{i=1}^n z_i(y_i - x_i^T\beta) = 0,$$
where $\overset{(1)}{\iff}$ follows from $D_\beta g_n(\beta) = -\frac{1}{n}\sum_{i=1}^n z_i x_i^T = -\frac{1}{n}Z^T X$ (the factor $-\hat\sigma_u^{-2}$, being a nonzero scalar, may be dropped from the FOC). This equivalence is easy to confirm numerically, as the sketch below illustrates.
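The following is a minimal numerical check (a sketch with simulated data; all variable names are illustrative, not from the notes) that GMM with weight matrix proportional to $(Z^T Z)^{-1}$ reproduces the 2SLS formula $\hat\beta = (X^T P_Z X)^{-1} X^T P_Z y$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
z = rng.normal(size=(n, 2))                                   # two instruments
u = rng.normal(size=n)
x = z @ np.array([1.0, 0.5]) + 0.8 * u + rng.normal(size=n)   # endogenous regressor
y = 2.0 * x + u
X, Z = x[:, None], z

# GMM with W proportional to (Z'Z)^{-1}: solves X'Z (Z'Z)^{-1} Z'(y - Xb) = 0
W = np.linalg.inv(Z.T @ Z)
A = X.T @ Z @ W @ Z.T @ X
b_gmm = np.linalg.solve(A, X.T @ Z @ W @ Z.T @ y)

# 2SLS: regress y on the first-stage fitted values P_Z X
PZX = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
b_2sls = np.linalg.solve(PZX.T @ X, PZX.T @ y)

assert np.allclose(b_gmm, b_2sls)   # identical, as the algebra above shows
```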

  • In the more general, heteroskedastic case, an optimal choice of weight matrix is
$$W_n^* = \left(\frac{1}{n}\sum_{i=1}^n \hat u_i^2 z_i z_i^T\right)^{-1},$$
where $\hat u_i = y_i - x_i^T\hat\beta_{n,1}$ for some consistent initial estimator $\hat\beta_{n,1}$: the efficient estimator can only be realised as a two-step estimator in this case.

2.4 Tests of over-identifying restrictions

  • Just as it is possible in the linear IV model to test the identifying orthogonality conditions $E z_i u_i = 0$ (IV-ORTH), so in the more general setting of a model with parameters identified by (nonlinear) moment conditions, it is possible to test
$$H_0 : E g(w_i, \theta_0) = 0 \quad \text{against} \quad H_1 : E g(w_i, \theta_0) \neq 0.$$
This is termed a (Hansen) test of over-identifying restrictions.
  – Because of the need to estimate $\theta_0$, the power of this test will be severely curtailed in some directions – and it will have no power at all in the exactly identified case ($d_g = d_\theta$).
  – Provided that an efficient weight matrix is used, the test can be carried out using the GMM criterion. (This explains our earlier remark, in the context of conducting such a test on the linear IV model, that $\hat u_i$ could be computed using the 2SLS estimator if and only if IV-HMSK held.)


Theorem 2.2. Under GMM-ID, GMM-WGHT, GMM-JAC, GMM-VAR and further regularity conditions,
$$n\, g_n(\hat\theta_n)^T \hat S_n^{-1} g_n(\hat\theta_n) \overset{d}{\to} \chi^2[d_g - d_\theta].$$

Proof.

  • The test statistic is a quadratic form in $g_n(\hat\theta_n)$: this suggests that we should proceed by first determining the limiting behaviour of $\hat S_n^{-1/2} n^{1/2} g_n(\hat\theta_n)$.

  • In deriving the limiting distribution of 2SLS, we noted above (see (2.13) and (2.14)) that by a kind of mean-value expansion,
$$n^{1/2} g_n(\hat\theta_n) = n^{1/2} g_n(\theta_0) + D_{n,2}\, n^{1/2}(\hat\theta_n - \theta_0),$$
where $D_{n,2} \overset{p}{\to} D_\theta g_0(\theta_0)$, for $g_0(\theta) := E g(w_i, \theta)$.

  • Further, by the expression (2.15) that we obtained for $n^{1/2}(\hat\theta_n - \theta_0)$,
$$\begin{aligned}
\hat S_n^{-1/2} n^{1/2} g_n(\hat\theta_n) &= \hat S_n^{-1/2}\,[I_{d_g} - D_{n,2}(D_{n,1}^T \hat S_n^{-1} D_{n,2})^{-1} D_{n,1}^T \hat S_n^{-1}]\, n^{1/2} g_n(\theta_0) \\
&= [I_{d_g} - \hat S_n^{-1/2} D_{n,2}(D_{n,1}^T \hat S_n^{-1} D_{n,2})^{-1} D_{n,1}^T \hat S_n^{-1/2}]\, \hat S_n^{-1/2} n^{1/2} g_n(\theta_0) \\
&\overset{d}{\to} [I_{d_g} - S^{-1/2} D (D^T S^{-1} D)^{-1} D^T S^{-1/2}]\cdot\xi, \qquad (2.17)
\end{aligned}$$
where $\xi \sim N[0, I_{d_g}]$.

  • Letting $H := S^{-1/2} D$, which has rank $d_\theta$, we recognise this as being equal to
$$[I_{d_g} - H(H^T H)^{-1} H^T]\xi = (I_{d_g} - P_H)\xi = P_H^\perp \xi,$$
where $P_H^\perp$ denotes the orthogonal projection onto the $(d_g - d_\theta)$-dimensional subspace orthogonal to the span of $H$ (equivalently, the matrix which gives the residuals from the projection onto $H$); $P_H^\perp$ is thus a rank $d_g - d_\theta$ matrix.

  • Hence, by the preceding and Lemma 1.1,
$$n\, g_n(\hat\theta_n)^T \hat S_n^{-1} g_n(\hat\theta_n) \overset{d}{\to} \xi^T P_H^\perp \xi \sim \chi^2[d_g - d_\theta].$$

  • Remarks:
(i) If the efficient GMM estimator were not used, the limiting distribution of the test would not be pivotal, but would instead depend on $D$, $S$, and $W$.
(ii) The test for overidentifying restrictions in the homoskedastic linear IV model, discussed in section 1.7.1 above, corresponds exactly to the GMM criterion-function based test in that setting. It is thus also possible to conduct exactly


such a test when IV-HMSK fails, provided that the efficient GMM estimator is used to construct the residuals $\hat u_i$; the limiting distribution remains $\chi^2[d_z - d_x]$.
(iii) Just as in the linear IV model, the GMM test for overidentifying restrictions is clearly blind to certain departures from the null hypothesis. In effect, (2.17) says that
$$\hat S_n^{-1/2} n^{1/2} g_n(\hat\theta_n) \approx P_H^\perp S^{-1/2} n^{1/2} g_n(\theta_0), \tag{2.18}$$
and we can interpret this as implying that the test will have no power against alternatives of the form $E g(w_i, \theta_0) = \delta$, where $P_H^\perp S^{-1/2}\delta = 0$, at least so long as $\delta$ is ‘small’. [Since the approximation (2.18) holds only when $E g(w_i, \theta_0)$ is ‘close’ to zero, it says nothing about the power of the tests against ‘large’ departures from the null – though this will be similarly limited.] A sketch of the computation of the test statistic follows.
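Given an efficient GMM fit, the J statistic is essentially a one-liner. A minimal sketch (hypothetical names; the inputs would come from an estimation step like the one sketched in Section 2.3.1):

```python
import numpy as np
from scipy.stats import chi2

def hansen_j(G_hat, S_hat, d_theta):
    """G_hat: n x d_g array with rows g(w_i, theta_hat); S_hat: estimate of S."""
    n, d_g = G_hat.shape
    gbar = G_hat.mean(axis=0)
    J = n * gbar @ np.linalg.solve(S_hat, gbar)
    return J, chi2.sf(J, df=d_g - d_theta)   # statistic and p-value
```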

2.5 Hypothesis testing

  • Recall that by Theorem 2.1, $n^{1/2}(\hat\theta_n - \theta_0) \overset{d}{\to} N[0, V_W]$, where
$$V_W = (D^T W D)^{-1} D^T W S W D (D^T W D)^{-1},$$
and we discussed how $V_W$ could be consistently estimated by $\hat V_n$.

  • This result allows us to conduct Wald tests of linear restrictions in the same way as before; it does not matter whether or not the efficient GMM estimator is used (so long as the correct variance estimator is used).
  – To reiterate, consider a test of $d_r$ linear restrictions of the form $H_0 : R\theta_0 = \rho$ against $H_1 : R\theta_0 \neq \rho$, where $R$ is a $d_r \times d_\theta$ matrix having rank $d_r$.
  – Then, exactly as for 2SLS, the Wald statistic behaves as
$$W_n := n(R\hat\theta_n - \rho)^T (R\hat V_n R^T)^{-1} (R\hat\theta_n - \rho) \overset{d}{\to} \chi^2[d_r]$$
under $H_0$ (and diverges to $\infty$ under $H_1$).


2.5.1 Tests of nonlinear restrictions and the delta method

  • With the aid of a result known as the delta method, it is also possible to test nonlinear restrictions of the form $H_0 : r(\theta_0) = \rho$ against $H_1 : r(\theta_0) \neq \rho$, where $r : \Theta \to \mathbb{R}^{d_r}$ is smooth, in the sense of being at least differentiable.

Lemma 2.1 (delta method). Suppose (i) $h : \Theta \to \mathbb{R}^{d_r}$ is differentiable at $\theta_0$; and (ii) $s_n(\tilde\theta_n - \theta_0) \overset{d}{\to} \xi$, for some $s_n \to \infty$. Then, for $D := D_\theta h(\theta_0)$,
$$s_n[h(\tilde\theta_n) - h(\theta_0)] \overset{d}{\to} D\xi.$$

Proof.

  • Recall that a function is differentiable at a point $\theta_0$ if it can be locally approximated there by a linear function.

  • More precisely, defining $u : \mathbb{R}^{d_\theta} \to \mathbb{R}^{d_r}$ by
$$u(\delta) := \begin{cases} \dfrac{[h(\theta_0 + \delta) - h(\theta_0)] - D\delta}{\|\delta\|} & \text{if } \delta \neq 0, \\ 0 & \text{if } \delta = 0, \end{cases}$$
we note that (i) implies $u(\delta) \to 0$ as $\delta \to 0$, whence $u$ is continuous at zero. Rewrite the preceding as
$$h(\theta_0 + \delta) - h(\theta_0) = D\delta + u(\delta)\|\delta\|.$$

  • Let $\tilde\delta_n := \tilde\theta_n - \theta_0$, so that $\tilde\delta_n \overset{p}{\to} 0$ and $s_n\tilde\delta_n \overset{d}{\to} \xi$ by (ii). Then
$$s_n[h(\tilde\theta_n) - h(\theta_0)] = s_n[h(\theta_0 + \tilde\delta_n) - h(\theta_0)] = D[s_n\tilde\delta_n] + u(\tilde\delta_n)\, s_n\|\tilde\delta_n\| \overset{d}{\to}_{(3)} D\xi,$$
where $\overset{d}{\to}_{(3)}$ follows by Slutsky's theorem, noting in particular that $u(\tilde\delta_n) \overset{p}{\to} 0$ while $s_n\|\tilde\delta_n\| = O_p(1)$.

  • How can this result be used for testing $H_0 : r(\theta_0) = \rho$? Recalling $n^{1/2}(\hat\theta_n - \theta_0) \overset{d}{\to} N[0, V_W]$ and letting $R := D_\theta r(\theta_0)$, we have immediately that
$$n^{1/2}[r(\hat\theta_n) - r(\theta_0)] \overset{d}{\to} R \cdot N[0, V_W] = N[0, R V_W R^T].$$
  – $R$ is a $d_r \times d_\theta$ matrix, assumed to have rank $d_r$; so $R V_W R^T$ is a $d_r \times d_r$ matrix having rank $d_r$.
  – Under $H_0$, we can replace $r(\theta_0)$ by $\rho$, and thus by Slutsky's theorem
$$W_n = n[r(\hat\theta_n) - \rho]^T (R\hat V_n R^T)^{-1} [r(\hat\theta_n) - \rho] \overset{d}{\to} \chi^2[d_r], \tag{2.19}$$
since $\hat V_n \overset{p}{\to} V_W$, and $R V_W R^T$ has full rank by assumption.
  – It remains to estimate $R$, which may not be known. (It would be enough that $R$ is known under $H_0$, which is sometimes the case.) If $r$ is continuously differentiable, then $\hat R_n := D_\theta r(\hat\theta_n)$ will be consistent for $R$; and it is immediate from consistency that
$$W_n = n[r(\hat\theta_n) - \rho]^T (\hat R_n \hat V_n \hat R_n^T)^{-1} [r(\hat\theta_n) - \rho] \overset{d}{\to} \chi^2[d_r]. \tag{2.20}$$
A sketch of this computation follows.
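As a sketch of how (2.20) might be used in practice (illustrative names throughout; the Jacobian of $r$ is approximated by forward differences, which suffices when $r$ is continuously differentiable):

```python
import numpy as np
from scipy.stats import chi2

def wald_nonlinear(theta_hat, V_hat, r, rho, n, eps=1e-6):
    """Wald test of H0: r(theta_0) = rho; V_hat estimates avar of theta_hat."""
    theta_hat = np.atleast_1d(theta_hat)
    r0 = np.atleast_1d(r(theta_hat))
    # forward-difference estimate of R = D_theta r evaluated at theta_hat
    R = np.column_stack([
        (np.atleast_1d(r(theta_hat + eps * e)) - r0) / eps
        for e in np.eye(len(theta_hat))
    ])
    diff = r0 - np.atleast_1d(rho)
    W = n * diff @ np.linalg.solve(R @ V_hat @ R.T, diff)
    return W, chi2.sf(W, df=len(diff))   # statistic and p-value
```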

  • Remarks:
(i) What if $R V_W R^T$ is rank deficient? Then the appeal to Slutsky's theorem that underpins (2.19) can no longer be justified. To illustrate this more clearly, let $x_n := n^{1/2}[r(\hat\theta_n) - \rho]$, $A_n := R\hat V_n R^T$, and note that the Wald statistic can be written as
$$W_n = h(x_n, A_n) := x_n^T A_n^{-1} x_n.$$
The function $h(x, A) = x^T A^{-1} x$ is only continuous at an invertible matrix $A$, but we have $A_n \overset{p}{\to} R V_W R^T$, which is rank deficient, and therefore not invertible. Consequently, (2.19) no longer holds.
(ii) Nonlinear Wald tests are open to ‘abuse’ of the following kind. Suppose we want to test $H_0 : \theta_0 = 0$, which involves $d_\theta$ restrictions, and so would give a $\chi^2[d_\theta]$ test. An apparently equivalent way of formulating the null is
$$H_0' : r(\theta_0) := \sum_{k=1}^{d_\theta} \theta_{0,k}^2 = 0,$$
which seems to involve only one nonlinear restriction! So we might be tempted to construct a test statistic involving $r(\hat\theta_n)$, as in (2.19) or (2.20), with the expectation that we will now get a $\chi^2[1]$ test. This would fail, because
$$R = D_\theta r(\theta_0) = (2\theta_{0,1}, \ldots, 2\theta_{0,d_\theta}) = (0, \ldots, 0)$$
is a rank zero vector, so the reasoning that led to (2.19) or (2.20) no longer applies.
(iii) More generally, if your restrictions are formulated in such a way that $D_\theta r(\theta_0)$ has reduced rank, this is almost surely a sign that you have formulated your restrictions in an inappropriate way – and that these need to be reformulated before a Wald test can be successfully carried out.
(iv) Wald tests of non-linear restrictions are not invariant to how the null hypothesis is formulated. Suppose, for example, that $d_\theta = 1$, and we are interested in testing $H_0 : \theta_0 = 1$; this gives the Wald statistic
$$W_n = \frac{n(\hat\theta_n - 1)^2}{\hat v_n} \overset{d}{\to} \chi^2[1].$$
Alternatively, we could have phrased the null as $H_0' : \theta_0^3 = 1$; this gives the Wald statistic
$$W_n' = \frac{n(\hat\theta_n^3 - 1)^2}{\hat r_n^2 \hat v_n} \overset{d}{\to} \chi^2[1],$$
where $\hat r_n = 3\hat\theta_n^2$. (Or, since $\theta_0 = 1$ under the null, $\hat r_n = 3$ would also give a valid test.) While $W_n$ and $W_n'$ have the same limiting distribution, they may differ appreciably in a finite sample – and thus lead to possibly different accept/reject decisions.
(v) For similar reasons, Wald tests are not invariant to a reparametrisation of the model either. That is: suppose the model is parametrised by $\theta \in \Theta$, and we test $H_0 : r(\theta) = \rho$ (now possibly a linear restriction) using a Wald statistic. Equivalently, suppose we reparametrise the model in terms of $\gamma = \varphi(\theta)$, and then test $H_0' : r(\varphi^{-1}(\gamma)) = \rho$; the Wald statistic computed in this case may again lead to a different accept/reject decision (in a finite sample).

Example 2.3.

  • Returning to the setting of Example 2.2, recall that the expenditure share parameter in that model is given by $\lambda_i := \lambda(\beta_1 + \beta_2\xi_i) = [1 + \exp(-\beta_1 - \beta_2\xi_i)]^{-1}$. Because $\xi_i \sim_{\text{i.i.d.}} N[0, 1]$, this varies randomly (and unobservably) across households, and so we would generally want to test hypotheses about the distribution of $\lambda_i$.

  • To give a concrete example, suppose we want to test the hypothesis that the mean value of $\lambda_i$ is 0.5, that is
$$E\lambda_i = 0.5, \tag{2.21}$$


which can be equivalently stated as
$$r(\beta) := \int_{\mathbb{R}} \lambda(\beta_1 + \beta_2\xi)\,\phi(\xi)\,\mathrm{d}\xi = 0.5, \tag{2.22}$$
where $\phi$ denotes the standard normal density.

  • $r(\beta)$ is clearly a continuously differentiable, nonlinear function of the parameters. Although no closed form expression for the integral in (2.22) exists, it – and its derivatives with respect to $\beta$ – can be evaluated numerically without difficulty (see the sketch below).

  • A Wald test of (2.21) could accordingly be based upon
$$\frac{n[r(\hat\beta_n) - 0.5]^2}{\hat R_n \hat V_n \hat R_n^T} \overset{d}{\to} \chi^2[1],$$
where $r(\cdot)$ is defined as in (2.22), $\hat\beta_n$ denotes a GMM estimator of $\beta_0$, $\hat V_n$ is a consistent estimate of its limiting variance, and
$$\hat R_n := \left[\int_{\mathbb{R}} \lambda'(\hat\beta_{n,1} + \hat\beta_{n,2}\xi)\,\phi(\xi)\,\mathrm{d}\xi \quad\;\; \int_{\mathbb{R}} \lambda'(\hat\beta_{n,1} + \hat\beta_{n,2}\xi)\,\xi\,\phi(\xi)\,\mathrm{d}\xi\right],$$
for
$$\lambda'(x) := \frac{\partial}{\partial x}\lambda(x) = \frac{e^x}{(1 + e^x)^2}.$$
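The integral in (2.22) is a one-dimensional Gaussian expectation, so it is cheap to evaluate by quadrature. A sketch (names illustrative; note $r(0, \beta_2) = 0.5$ for any $\beta_2$, by the symmetry $\lambda(-a) = 1 - \lambda(a)$, which gives a quick sanity check):

```python
import numpy as np

def r_beta(b1, b2, m=200):
    """E[lambda(b1 + b2*xi)] for xi ~ N(0,1), via Gauss-Hermite quadrature."""
    # probabilists' Hermite rule: integrates against the weight exp(-x^2/2)
    nodes, weights = np.polynomial.hermite_e.hermegauss(m)
    lam = 1.0 / (1.0 + np.exp(-(b1 + b2 * nodes)))
    return weights @ lam / np.sqrt(2.0 * np.pi)

print(r_beta(0.0, 1.3))   # = 0.5 by symmetry
print(r_beta(0.4, 1.3))   # > 0.5
```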

2.5.2 GMM criterion-based tests (QLR tests)

  • One way of avoiding the lack of invariance of Wald tests is to use a test based on the GMM criterion function (sometimes termed a ‘quasi-likelihood ratio’, or QLR, test).

  • Suppose, as above, that we are interested in testing $H_0 : r(\theta_0) = \rho$ against $H_1 : r(\theta_0) \neq \rho$.

  • Whereas a Wald test evaluates $H_0$ by comparing $r(\hat\theta_n)$ with $\rho$, a QLR test evaluates $H_0$ by considering the extent to which the ‘fit’ of the model – as measured by the GMM criterion – deteriorates when $H_0$ is imposed.

  • To state the QLR test statistic:
  – Recall the GMM criterion is $Q_n(\theta; W_n) = g_n(\theta)^T W_n g_n(\theta)$,


and let $\hat S_n$ denote a consistent estimator of $S := E g(w_i, \theta_0)g(w_i, \theta_0)^T$; this can be computed with the aid of an initially consistent estimator of the unrestricted model.
  – Let $\Theta_\rho := \{\theta \in \Theta \mid r(\theta) = \rho\}$ denote the subset of the parameter space that is consistent with the null.

  • The QLR test statistic is defined as
$$QLR_n := n\left[\min_{\theta\in\Theta_\rho} Q_n(\theta; \hat S_n^{-1}) - \min_{\theta\in\Theta} Q_n(\theta; \hat S_n^{-1})\right] \overset{d}{\to} \chi^2[d_r],$$
where $d_r$ is the rank of $D_\theta r(\theta_0)$ (assumed, as for the Wald test, to be equal to the number of restrictions under test).
  – Observe that the test statistic is necessarily non-negative, since the constrained minimum of $Q_n$ is always (weakly) greater than its unconstrained minimum.

  • Remarks:
(i) The QLR test has better invariance properties than the Wald statistic, since it only depends on the minimised values of the GMM criterion function, with and without the null imposed. Modulo the computation of $\hat S_n$, these values are invariant both to how the model is parametrised, and to how the restrictions are formulated (indeed, this is clear from the fact that only $\Theta_\rho$, and not $r(\cdot)$ itself, matters for the computation of the QLR statistic).
(ii) It is essential, in order for the test statistic to have a limiting $\chi^2[d_r]$ distribution, that the efficient weight matrix be used.
(iii) The Wald and QLR tests are asymptotically equivalent: not only do both share the same distributional limit, but it may be shown that $W_n - QLR_n \overset{p}{\to} 0$.

Example 2.4.

  • Returning to the setting of Example 2.3, in view of (2.22) the subset of the parameter space $B$ consistent with the restriction $E\lambda_i = 0.5$ is given by
$$B_{0.5} := \left\{\beta \in B \;\Big|\; \int_{\mathbb{R}} \lambda(\beta_1 + \beta_2\xi)\,\phi(\xi)\,\mathrm{d}\xi = 0.5\right\} = \{\beta \in B \mid r(\beta) = 0.5\}.$$
  • Standard numerical constrained optimisation routines are perfectly capable of handling nonlinear equality constraints of the form $r(\beta) = 0.5$, permitting $Q_n(\beta; \hat S_n^{-1})$ to be numerically minimised over $B_{0.5}$, as in the sketch below.
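A sketch of the constrained step, using scipy's SLSQP routine with a nonlinear equality constraint (all names illustrative; note that, for the $\chi^2[d_r]$ comparison, the criterion `Q` passed in should be the scaled one, $n\, g_n(\theta)^T \hat S_n^{-1} g_n(\theta)$):

```python
import numpy as np
from scipy.optimize import minimize

def qlr_statistic(Q, r, rho, theta_init):
    """QLR = (restricted min of Q) - (unrestricted min of Q)."""
    unres = minimize(Q, theta_init, method="Nelder-Mead")
    cons = [{"type": "eq", "fun": lambda th: r(th) - rho}]
    res = minimize(Q, unres.x, method="SLSQP", constraints=cons)
    return res.fun - unres.fun   # compare with chi2[d_r] critical values
```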


2.A Suggested (optional) further reading

  • I have not followed any particular reference here.
  • You may wish to consult Davidson and MacKinnon (2004, Ch. 9), Greene (2008, Ch. 15), Hayashi (2000, Ch. 3 & 4), and/or Wooldridge (2002, Ch. 14).


3 Maximum likelihood

3.1 Introduction

3.1.1 Parametric and semiparametric estimation

  • The estimation methods considered so far in this course are sometimes termed semiparametric, because they do not require the (joint) distribution of the data to be completely specified. For example:
(i) OLS estimation in the linear regression model, $y_i = x_i^T\beta_0 + u_i$: we assumed $E x_i u_i = 0$ (or possibly $E[u_i \mid x_i] = 0$), but said nothing about the distribution of the $u_i$'s (and the $x_i$'s);
(ii) 2SLS estimation in the linear IV model, $y_i = x_i^T\beta_0 + u_i$: we assumed $E z_i u_i = 0$ (IV-ORTH), along with $\operatorname{rk} E z_i x_i^T = d_x$ (IV-RANK), but were silent about the distribution of the $u_i$'s (and the $x_i$'s);
(iii) GMM estimation based on solving the moment conditions $E g(w_i, \theta) = 0$: here we only needed to assume enough about the (marginal) distribution of $w_i$ to ensure that this equation had a unique solution at $\theta = \theta_0$ (GMM-ID).

  • Semiparametric estimation methods are often attractive, precisely because they allow us to remain relatively agnostic about those aspects of the model (e.g. the distribution of the errors in a linear regression model) that are not of interest to us.

  • Parametric (or ‘fully’ parametric) estimation methods, by contrast, require the entire joint distribution of the data – or, at a minimum, certain conditional distributions – to be completely specified.
  – This means there is a correspondingly greater risk, when using these methods, of misspecifying (some components of) the model, which may lead to the parameters of interest being inconsistently estimated.


  – On the other hand: if we are prepared to make these assumptions, then we can leverage them to obtain estimates of the parameters of interest that are often much more efficient than those provided by semiparametric methods.
  – In particular, the method of maximum likelihood estimation – which, in a wide class of problems, is more efficient than any other estimator – becomes available to us. It is efficient precisely because it fully exploits the information conveyed by the (fully specified) joint distribution of the data, as we shall now discuss.

  • In addition to having attractive efficiency properties, maximum likelihood estimation is also very widely applicable: essentially, if we can work out what a model implies for the joint density of the observed sample – and we can evaluate that joint density function – then it becomes possible to estimate the parameters of the model by maximum likelihood.

3.1.2 The likelihood function: the general case

  • Suppose that we observe the sample $w^n := (w_1, \ldots, w_n)$, which takes values in $\mathcal{W}^n$.
  • To describe how that sample was generated, we have an econometric model, indexed by some parameter $\theta \in \Theta \subseteq \mathbb{R}^{d_\theta}$. The model completely specifies the joint distribution of the sample, by prescribing that it have joint density
$$f(\underline{w}^n; \theta) = f(\underline{w}_1, \ldots, \underline{w}_n; \theta). \tag{3.1}$$

  • The estimation problem, as ever, is to recover (as nearly as possible) the true parameter $\theta_0 \in \Theta$, under which the observed sample was generated. (We are thus assuming that the model is correctly specified, in the sense that there is a value of the model parameters consistent with what we observe.)

  • The joint density (3.1) can be regarded as a function of either of two arguments:
(i) for a given $\theta \in \Theta$: $\underline{w}^n \mapsto f(\underline{w}^n; \theta)$ describes the density of $w^n$ associated to that $\theta$ (this is the usage that you are already familiar with); and
(ii) for a given $\underline{w}^n \in \mathcal{W}^n$: $\theta \mapsto f(\underline{w}^n; \theta)$ describes the likelihood with which $\theta$ could have generated the sample $\underline{w}^n$. Despite the terminology, a ‘likelihood’ should not be interpreted as a probability (unless one is interested in performing Bayesian inference, which is another matter entirely).

  • In particular, if we now set $\underline{w}^n = w^n$, i.e. if we evaluate the density at the observed (i.e. realised) sample $w^n$, then $\theta \mapsto f(w^n; \theta)$ describes what is termed the likelihood function; we shall denote this by
$$L_n(\theta) := L_n(w^n, \theta) := f(w^n; \theta).$$
  – For each $\theta \in \Theta$, the value of $L_n(\theta)$ reports the likelihood of $\theta$. Note that this value depends on the realised value of $w^n$, and thus is itself a random variable, with a sampling distribution. [I.e. it is a ‘random function’, much like the GMM criterion $Q_n(\theta)$.]

  • The maximum likelihood estimator (MLE) is defined as the maximiser of the likelihood function. For reasons that shall become clearer below, it is often easier to work with either the loglikelihood function, $\ell_n(\theta) := \log L_n(\theta)$, or the average loglikelihood function,
$$\bar\ell_n(\theta) := \frac{1}{n}\ell_n(\theta) = \frac{1}{n}\log L_n(\theta)$$
(the role of the $n^{-1}$ standardisation will become clearer below). Since the logarithm is strictly monotone, $L_n$, $\ell_n$ and $\bar\ell_n$ each share the same maximiser, whence we may equivalently define the MLE as
$$\hat\theta_n := \operatorname*{argmax}_{\theta\in\Theta} \bar\ell_n(\theta).$$

3.1.3 The likelihood function: with i.i.d. data

  • In this course, we shall focus on the case where the data is i.i.d. with marginal density $p(w; \theta)$, so that the joint density of $w^n$ factorises as
$$f(\underline{w}_1, \ldots, \underline{w}_n; \theta) = \prod_{i=1}^n p(\underline{w}_i; \theta),$$
and thus the likelihood is the preceding evaluated at the realised data, i.e.
$$L_n(\theta) = f(w_1, \ldots, w_n; \theta) = \prod_{i=1}^n p(w_i; \theta).$$

  • The utility of the average loglikelihood function now becomes clear, since
$$\bar\ell_n(\theta) = \frac{1}{n}\log L_n(\theta) = \frac{1}{n}\log\prod_{i=1}^n p(w_i; \theta) = \frac{1}{n}\sum_{i=1}^n \log p(w_i; \theta)$$
is, for each fixed $\theta \in \Theta$, an average of the i.i.d. random variables $\{\log p(w_i; \theta)\}_{i=1}^n$. This will be important for analysing the consistency of the MLE.

  • In many cases, a further simplification is possible. Suppose that the data partitions as $w_i = (y_i, x_i)$, where $y_i$ and $x_i$ are respectively $\mathbb{R}^{d_y}$- and $\mathbb{R}^{d_x}$-valued, and that the model prescribes that the density of $w_i$ factorises as
$$p(y, x; \theta) = q(y \mid x; \theta)\, r(x), \tag{3.2}$$
where $r$ denotes the marginal density of $x_i$, and $q$ the conditional density of $y_i$ given $x_i = x$.
  – The key characteristic of (3.2) is not that the joint density of $(y_i, x_i)$ factorises as the product of a conditional and a marginal density – this is always possible – but that only the conditional distribution depends on the parameters $\theta$.
  – We can thus afford to remain entirely agnostic about the marginal distribution of $x_i$ – just as we would want to in a regression setting (see below). Noting that, in this case,
$$\bar\ell_n(\theta) = \frac{1}{n}\sum_{i=1}^n \log p(w_i; \theta) = \frac{1}{n}\sum_{i=1}^n \log q(y_i \mid x_i; \theta) + \frac{1}{n}\sum_{i=1}^n \log r(x_i),$$
where the second term does not depend on $\theta$, maximisation of the average loglikelihood is equivalent to maximisation of the average conditional loglikelihood,
$$\bar\ell_n^c(\theta) = \frac{1}{n}\sum_{i=1}^n \log q(y_i \mid x_i; \theta).$$

3.2 Univariate examples

3.2.1 Continuous random variables

Example 3.1 (Gaussian location–scale model).

  • $w_i \sim N[\mu, \sigma^2]$, so that $\theta = (\mu, \sigma^2) \in \mathbb{R} \times \mathbb{R}_+ = \Theta$. Each $w_i$ has density
$$p(w; \mu_0, \sigma_0^2) = \frac{1}{(2\pi\sigma_0^2)^{1/2}}\exp\left[-\frac{(w - \mu_0)^2}{2\sigma_0^2}\right].$$
  • The loglikelihood for the i.i.d. sample $w^n := (w_1, \ldots, w_n)$ is thus
$$\ell_n(\mu, \sigma^2) = \sum_{i=1}^n \log p(w_i; \mu, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n (w_i - \mu)^2.$$

  • Maximising this with respect to $\mu$, for any $\sigma$, is thus equivalent to minimising $\sum_{i=1}^n (w_i - \mu)^2$, which is minimised at $\hat\mu_n = \frac{1}{n}\sum_{i=1}^n w_i$, the sample mean; so this must be the MLE for $\mu_0$.

  • To find the MLE for $\sigma_0^2$, take the FOC with respect to $\sigma^2$,
$$0 = \frac{\partial\ell_n(\mu, \sigma^2)}{\partial\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}\sum_{i=1}^n (w_i - \mu)^2.$$
Evaluating at $\mu = \hat\mu_n$ and solving for $\sigma^2$ thus yields
$$\hat\sigma_n^2 = \frac{1}{n}\sum_{i=1}^n (w_i - \hat\mu_n)^2,$$
which we recognise as the sample variance of $w_i$, albeit standardised by $n^{-1}$ (rather than $(n-1)^{-1}$).

  • To verify that we have found a maximum, we might also check the second-order condition: thus
$$\left.\frac{\partial^2\ell_n(\mu, \sigma^2)}{\partial(\sigma^2)^2}\right|_{\sigma^2=\hat\sigma^2} = \frac{n}{2(\hat\sigma^2)^2} - \frac{1}{(\hat\sigma^2)^3}\sum_{i=1}^n (w_i - \hat\mu_n)^2 = -\frac{n}{2(\hat\sigma^2)^2} < 0,$$
and thus we have found a local maximum. Since there are no other local maxima (there is only one solution to the FOC), and $\sigma^2 = 0$ is clearly not a maximiser, $\hat\sigma^2$ indeed maximises the loglikelihood. The closed forms can be confirmed by direct numerical maximisation, as sketched below.
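A minimal numerical check (a sketch with simulated data; the optimiser works on the negative loglikelihood, and $\sigma^2$ is parametrised on the log scale to keep it positive):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
w = rng.normal(loc=2.0, scale=1.5, size=500)

def neg_loglik(params):
    mu, log_s2 = params
    s2 = np.exp(log_s2)
    return 0.5 * len(w) * np.log(2 * np.pi * s2) + np.sum((w - mu) ** 2) / (2 * s2)

res = minimize(neg_loglik, x0=np.array([0.0, 0.0]))
mu_hat, s2_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, w.mean())   # agree: the MLE for mu is the sample mean
print(s2_hat, w.var())    # agree: np.var uses the n^{-1} standardisation
```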

  • In the preceding example, the sample mean and variance happened to be the maximum likelihood estimators. This was a consequence of $w_i$ having been assumed Gaussian, and will not always be the case, as the following example illustrates (see also the problem set).


Example 3.2 (uniform model).

  • $w_i \sim U[0, \theta]$, with $\theta \in \mathbb{R}_+ = \Theta$. $w_i$ thus has density
$$p(w; \theta_0) = \mathbf{1}\{w \in [0, \theta_0]\}\,\frac{1}{\theta_0}.$$

  • So the loglikelihood for the i.i.d. sample $w^n := (w_1, \ldots, w_n)$ is
$$\ell_n(\theta) = \sum_{i=1}^n \log\left[\mathbf{1}\{w_i \in [0, \theta]\}\frac{1}{\theta}\right] = \sum_{i=1}^n \log\mathbf{1}\{w_i \in [0, \theta]\} - n\log\theta.$$

  • How do we find the maximiser? This function is not sufficiently well-behaved for us to blindly take FOCs with respect to $\theta$. Instead, we note that:
(i) $\log\mathbf{1}\{w_i \in [0, \theta]\} = -\infty$ if $w_i \notin [0, \theta]$, and zero otherwise: and so $\hat\theta_n$ must be at least equal to $\max_{i\leq n} w_i$;
(ii) the second term, $-n\log\theta$, penalises larger values of $\theta$.
Deduce that the maximiser, and thus the MLE, is $\hat\theta_n = \max_{i\leq n} w_i$.

  • Observe that in this example, another consistent estimate of $\theta_0$ could be provided by
$$\tilde\theta_n = \frac{2}{n}\sum_{i=1}^n w_i \overset{p}{\to} 2E w_i = 2\int_0^{\theta_0} \frac{w}{\theta_0}\,\mathrm{d}w = 2\left[\frac{w^2}{2\theta_0}\right]_0^{\theta_0} = \theta_0;$$
this turns out to be much less efficient, as the simulation sketched below illustrates.
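A quick simulation (a sketch, not from the notes) makes the efficiency gap vivid: the MLE's error shrinks at rate $n^{-1}$, the moment-based estimator's only at $n^{-1/2}$.

```python
import numpy as np

rng = np.random.default_rng(2)
theta0, n, reps = 1.0, 200, 5000
w = rng.uniform(0, theta0, size=(reps, n))

mle = w.max(axis=1)          # theta_hat = max_i w_i
mom = 2 * w.mean(axis=1)     # theta_tilde = 2 * sample mean

print(np.mean((mle - theta0) ** 2))   # roughly 2*theta0^2/n^2: tiny
print(np.mean((mom - theta0) ** 2))   # roughly theta0^2/(3n): far larger
```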

3.2.2 Discrete random variables

  • Before proceeding to examples involving discrete (and mixed continuous/discrete) random variables, we first recall some facts about distribution functions.

  • A random variable $w_i$ is said to have a:
(i) continuous distribution with (Lebesgue) density $p$, if
$$F(w) := P\{w_i \leq w\} = \int_{-\infty}^{w} p(\underline{w})\,\mathrm{d}\underline{w}, \tag{3.3}$$
in which case the distribution function $F$ is continuous (more precisely, it is absolutely continuous);


(ii) discrete distribution with support $\mathcal{W}$ and probability mass function $p$, if
$$F(w) := P\{w_i \leq w\} = \sum_{\{\underline{w}\in\mathcal{W} \mid \underline{w}\leq w\}} p(\underline{w}), \tag{3.4}$$
in which case $F$ is merely right-continuous, with jumps of size $p(w)$ at each $w \in \mathcal{W}$ (and is otherwise flat); we say that $w_i$ has probability mass $p(w)$ at $w \in \mathcal{W}$.

  • Up until now, we have reserved the term ‘density’ for such a $p$ as appears in (3.3), but there is a more expansive notion of ‘density’ that also encompasses the probability mass function of a discrete random variable, i.e. such a $p$ as appears in (3.4). For the purposes of constructing the likelihood, it is this extended notion of ‘density’ that is the relevant one.

Example 3.3 (binary outcomes).

  • $w_i \sim_{\text{i.i.d.}}$ Bernoulli$[\theta]$: that is, $w_i$ takes values in $\mathcal{W} = \{0, 1\}$ with probabilities
$$P\{w_i = 1\} = \theta, \qquad P\{w_i = 0\} = 1 - \theta,$$
and $\theta \in [0, 1] = \Theta$. [This is useful for modelling outcomes such as: employment status (whether employed); educational attainment (whether completed high school); enrolment in a training programme.]

  • The probability mass function – and therefore the ‘density’ of $w_i$ – is thus
$$p(w; \theta_0) = (1 - \theta_0)\mathbf{1}\{w = 0\} + \theta_0\mathbf{1}\{w = 1\} \overset{(2)}{=} \theta_0^w(1 - \theta_0)^{1-w}, \qquad w \in \{0, 1\},$$
where $\overset{(2)}{=}$ is valid since $w_i$ only takes values in $\mathcal{W} = \{0, 1\}$.

  • The loglikelihood for the i.i.d. sample $w^n := (w_1, \ldots, w_n)$ is thus
$$\ell_n(\theta) = \sum_{i=1}^n \log[\theta^{w_i}(1 - \theta)^{1-w_i}] = \left(\sum_{i=1}^n w_i\right)\log\theta + \left(n - \sum_{i=1}^n w_i\right)\log(1 - \theta).$$

  • Before computing the maximiser, let us note that in the case where $w_i = 1$ for all $i \in \{1, \ldots, n\}$, the likelihood reduces to
$$\ell_n(\theta) = n\log\theta,$$


which is increasing in $\theta$, and thus $\hat\theta_n = 1$ is the MLE, because $\Theta = [0, 1]$. (Similarly, if only zeros are observed, then the MLE would be zero.)

  • In the more general case where both outcomes are observed, taking the FOC w.r.t. $\theta$ yields
$$0 = \frac{\partial\ell_n(\theta)}{\partial\theta} = \frac{\sum_{i=1}^n w_i}{\theta} - \frac{n - \sum_{i=1}^n w_i}{1 - \theta} \implies 0 = \frac{\frac{1}{n}\sum_{i=1}^n w_i}{\theta} - \frac{1 - \frac{1}{n}\sum_{i=1}^n w_i}{1 - \theta},$$
whence $\hat\theta_n = \frac{1}{n}\sum_{i=1}^n w_i$. (Once again, the MLE is the sample average!)

  • Once again, we should check the SOC,
$$\frac{\partial^2\ell_n(\theta)}{\partial\theta^2} = -\frac{\sum_{i=1}^n w_i}{\theta^2} - \frac{n - \sum_{i=1}^n w_i}{(1 - \theta)^2} \overset{(2)}{<} 0,$$
where $\overset{(2)}{<}$ holds for all $\theta \in (0, 1)$. Thus $\ell_n$ is strictly concave, and has a global maximum at $\hat\theta_n$.

Example 3.4 (Poisson model for count data).

  • $w_i \sim$ Poisson$[\theta]$: takes values in $\mathcal{W} = \{0, 1, 2, \ldots\}$ with probabilities
$$P\{w_i = w\} = \frac{\theta^w e^{-\theta}}{w!} =: p(w; \theta),$$
where $\theta \in \mathbb{R}_+$. [Used for modelling, e.g., the number of job offers received by a jobseeker during a given time interval.]

  • The loglikelihood for the i.i.d. sample $w^n := (w_1, \ldots, w_n)$ is
$$\ell_n(\theta) = \sum_{i=1}^n \log p(w_i; \theta) = \left(\sum_{i=1}^n w_i\right)\log\theta - n\theta - \sum_{i=1}^n \log(w_i!).$$

  • It is left as an exercise to show that $\hat\theta_n = \frac{1}{n}\sum_{i=1}^n w_i$ also in this case; the sketch below checks this numerically.
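A couple of lines suffice for the numerical check (a sketch with simulated counts; `gammaln(w + 1)` computes $\log(w!)$):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

rng = np.random.default_rng(3)
w = rng.poisson(lam=2.5, size=400)

# negative Poisson loglikelihood
nll = lambda th: -(np.sum(w) * np.log(th) - len(w) * th - np.sum(gammaln(w + 1)))
res = minimize_scalar(nll, bounds=(1e-6, 20), method="bounded")
print(res.x, w.mean())   # agree: the MLE is the sample average
```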

3.2.3 Mixed continuous/discrete random variables

  • This is not quite the end of the story, because some random variables of interest are neither continuous nor discrete, but a mixture of the two. The classic example comes from censoring a continuously distributed random variable.


Example 3.5 (censored Gaussian distribution).

  • Consider the case where $w_i$ is formed by censoring $w_i^* \sim N[\mu, 1]$ at zero:
$$w_i = \max\{w_i^*, 0\}.$$
[For example, $w_i^*$ might represent an individual's ‘latent propensity’ to purchase meat products, and $w_i$ his or her observed expenditure on such products.]

  • The distribution function of $w_i$ is given by
$$F(w) := P\{w_i \leq w\} = \begin{cases} 0 & \text{if } w < 0, \\ P\{w_i^* \leq w\} & \text{if } w \geq 0, \end{cases} \;=\; \mathbf{1}\{w \geq 0\}\,\Phi(w - \mu),$$
which:
  – is zero for $w < 0$;
  – has a jump (and thus a probability mass) of size $\Phi(-\mu)$ at $w = 0$ (the l.h. censor point); and
  – is continuous, with derivative $\phi(w - \mu)$, for $w > 0$.
[In the standard notation, $\phi$ and $\Phi$ respectively denote the standard Gaussian density and distribution function.]

  • For a mixed continuous/discrete random variable of this kind, the ‘density’ (understood in the extended sense) at a point $w \in \mathbb{R}$ corresponds to
  – the probability mass at $w$, if $F$ is discontinuous there;
  – the derivative of $F$ at $w$, if $F$ is continuous (and differentiable) there.

  • Thus the ‘density’ of $w_i$ is
$$p(w; \mu) = \mathbf{1}\{w = 0\}\,\Phi(-\mu) + \mathbf{1}\{w > 0\}\,\phi(w - \mu). \tag{3.5}$$

  • More generally, suppose that the distribution function $F$ of $w_i$:
  – has jumps at each $w \in \mathcal{W}_d$, which are necessarily of size $P\{w_i = w\}$;
  – is otherwise continuous (and differentiable), with derivative $f(w)$ for $w \notin \mathcal{W}_d$.


Then the ‘density’ of $w_i$ is defined as
$$p(w) := \mathbf{1}\{w \in \mathcal{W}_d\}\,P\{w_i = w\} + \mathbf{1}\{w \notin \mathcal{W}_d\}\,f(w).$$
  – You might find it a little disturbing that the value reported by $p$ may be either a probability mass, or a (Lebesgue) density, depending on whether or not it is evaluated at a point in $\mathcal{W}_d$.
  – But so long as we also keep track of $\mathcal{W}_d$, $p$ can be used to faithfully reconstruct the distribution function of $w_i$, via
$$F(w) = \sum_{\{\underline{w}\in\mathcal{W}_d \mid \underline{w}\leq w\}} p(\underline{w}) + \int_{\{\underline{w}\notin\mathcal{W}_d \mid \underline{w}\leq w\}} p(\underline{w})\,\mathrm{d}\underline{w} \overset{(2)}{=} \sum_{\{\underline{w}\in\mathcal{W}_d \mid \underline{w}\leq w\}} P\{w_i = \underline{w}\} + \int_{\{\underline{w}\notin\mathcal{W}_d \mid \underline{w}\leq w\}} f(\underline{w})\,\mathrm{d}\underline{w},$$
where $\overset{(2)}{=}$ follows by the definition of $f$.

Example 3.6 (censored Gaussian distribution; continued).

  • Suppose $w^n := (w_1, \ldots, w_n)$ is an i.i.d. sample of random variables of the form $w_i = \max\{w_i^*, 0\}$ for $w_i^* \sim N[\mu, 1]$. Recall from (3.5) that $w_i$ has density
$$p(w; \mu) = \mathbf{1}\{w = 0\}\,\Phi(-\mu) + \mathbf{1}\{w > 0\}\,\phi(w - \mu).$$

  • Thus the sample loglikelihood is
$$\ell_n(\mu) = \sum_{i=1}^n \log\left\{\mathbf{1}\{w_i = 0\}\,\Phi(-\mu) + \mathbf{1}\{w_i > 0\}\,\phi(w_i - \mu)\right\} = \#\{w_i = 0\}\log\Phi(-\mu) + \sum_{\{i \mid w_i > 0\}}\log\phi(w_i - \mu) \overset{(3)}{=} n_0\log\Phi(-\mu) - \frac{n_1}{2}\log 2\pi - \frac{1}{2}\sum_{\{i \mid w_i > 0\}}(w_i - \mu)^2,$$
where $n_0 := \#\{w_i = 0\}$ and $n_1 := \#\{w_i > 0\}$, the number of censored and uncensored observations respectively; and $\overset{(3)}{=}$ follows from $\phi(x) = (2\pi)^{-1/2}e^{-x^2/2}$.

  • There is no closed form solution for the maximum likelihood estimator in this case: it would have to be computed numerically, as in the sketch below.
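A sketch of the numerical maximisation ($\sigma$ fixed at 1, as in the example; simulated data; `norm.logcdf` and `norm.logpdf` keep the computation numerically stable):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(4)
w_star = rng.normal(loc=0.5, size=1000)
w = np.maximum(w_star, 0.0)               # censor at zero

def neg_loglik(mu):
    n0 = np.sum(w == 0)                   # censored observations
    pos = w[w > 0]                        # uncensored observations
    return -(n0 * norm.logcdf(-mu) + np.sum(norm.logpdf(pos - mu)))

res = minimize_scalar(neg_loglik, bounds=(-3, 3), method="bounded")
print(res.x)   # close to 0.5 for large n
```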

  • We might instead have assumed that $w_i^* \sim N[\mu, \sigma^2]$, rather than forcing $\sigma^2 = 1$. It is left as an exercise to show that, in this case, the sample loglikelihood is
$$\ell_n(\mu, \sigma^2) = n_0\log\Phi\left(\frac{-\mu}{\sigma}\right) - \frac{n_1}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{\{i \mid w_i > 0\}}(w_i - \mu)^2.$$
3.3 Models with covariates

  • Many of the preceding models can be rendered more flexible by allowing certain

parameters to depend on a set of covariates xi. Example 3.7 (homoskedastic Gaussian regression model).

  • Recall the univariate Gaussian location–scale model (Example 3.1) postulated wi ∼

N[µ0, σ2

0]. We now wish to generalise this to

yi | xi ∼ N[xT

i β0, σ2 0],

(3.6) so that the parameter µ0 has effectively been replaced by the linear index xT

i β0. [In

the usual parlance, the conditional mean of the outcome yi is now said to vary with ‘individual characteristics’ xi or with ‘observable heterogeneity’.]

  • An entirely equivalent way of writing (3.6) is

yi = xT

i β0 + σ0ǫi

where ǫi ∼ N[0, 1], independent of xi.

  • In view of (3.6), the density of yi conditional on xi clearly depends on (β0, σ2

0), but

the marginal distribution of xi does not. Thus the joint density of (yi, xi) conveniently factorises as p(y, x; β0, σ0) = q(y | x; β0, σ2

0)r(x).

  • As discussed in Section 3.1.3 above, for the purposes of calculating the MLE it

suffices to consider the conditional loglikelihood, ℓc

n(β, σ2) = n

  • i=1

log q(yi | xi; β, σ2) =

n

  • i=1

log 1 σφ yi − xT

i β

σ

  • = −n

2 log(2πσ2) − 1 2σ2

n

  • i=1

(yi − xT

i β)2.

  • This bears more than a passing resemblance to the OLS criterion function, and

indeed it is immediately clear that the MLE for β0 must be the OLS estimator. It is 55


left as an exercise to show that the MLE for $\sigma_0^2$ is
$$\hat\sigma_n^2 := \frac{1}{n}\sum_{i=1}^n (y_i - x_i^T\hat\beta_n)^2.$$

Example 3.8 (heteroskedastic Gaussian regression model).

  • Now suppose we were additionally to replace $\sigma_0^2$ by $\sigma^2(z_i^T\gamma_0)$, where $z_i$ is another collection of covariates (possibly overlapping partially or wholly with $x_i$). Here $\sigma^2 : \mathbb{R} \to \mathbb{R}_+$ denotes some increasing, positive-valued transformation, such as the exponential function.

  • We could postulate
$$y_i \mid x_i, z_i \sim N[x_i^T\beta_0, \sigma^2(z_i^T\gamma_0)],$$
or equivalently
$$y_i = x_i^T\beta_0 + \sigma(z_i^T\gamma_0)\epsilon_i.$$

  • In this case, the conditional density of $y_i$ given $(x_i, z_i)$ is
$$q(y_i \mid x_i, z_i; \beta_0, \gamma_0) = \frac{1}{\sigma(z_i^T\gamma_0)}\phi\left(\frac{y_i - x_i^T\beta_0}{\sigma(z_i^T\gamma_0)}\right),$$
whence the conditional loglikelihood is
$$\ell_n^c(\beta, \gamma) = -\frac{n}{2}\log 2\pi - \sum_{i=1}^n \log\sigma(z_i^T\gamma) - \frac{1}{2}\sum_{i=1}^n \frac{(y_i - x_i^T\beta)^2}{\sigma^2(z_i^T\gamma)}.$$

  • For a given value of $\gamma$, $\ell_n^c(\beta, \gamma)$ is thus maximised with respect to $\beta$ by the generalised least squares (GLS) estimator, where the variance of the $i$th residual is assumed to be $\sigma_i^2 = \sigma^2(z_i^T\gamma)$.

  • However, since there is evidently no way of maximising $\ell_n^c$ by sequentially maximising with respect to $\gamma$ and then $\beta$ (or vice versa), the only way to compute the MLE is by maximising $\ell_n^c$ simultaneously with respect to $\beta$ and $\gamma$ (which typically must be done numerically).

  • This procedure yields efficient estimates of both the regression coefficients $\beta_0$ and the parameters $\gamma_0$ describing the conditional variance of the errors.

Example 3.9 (probit and logit regression).

  • Recall that the binary outcomes model (Example 3.3) postulated that $w_i$ could take values in $\mathcal{W} = \{0, 1\}$, with $P\{w_i = 1\} = \theta$.

  • Once again, we should like to generalise this model by allowing
$$P\{y_i = 1 \mid x_i\} = \theta(x_i^T\beta_0),$$
where $\theta : \mathbb{R} \to [0, 1]$ is a continuous, increasing function that maps the linear index $x_i^T\beta_0$ into the unit interval, with $\lim_{z\to-\infty}\theta(z) = 0$ and $\lim_{z\to+\infty}\theta(z) = 1$. [In this way, the probability of, say, being employed is allowed to depend on an individual's observable characteristics.]

  • $\theta(\cdot)$ is thus a distribution function; common choices here are the standard Gaussian and standard logistic distribution functions, denoted $\Phi$ and $\Lambda$ respectively. These choices respectively yield the probit and logistic regression models.

  • We shall now consider the probit model in a little more detail. Observe that specifying
$$P\{y_i = 1 \mid x_i\} = \Phi(x_i^T\beta_0) \tag{3.7}$$
is equivalent to specifying
$$y_i = \mathbf{1}\{x_i^T\beta_0 + \epsilon_i \geq 0\} \tag{3.8}$$
for $\epsilon_i \sim N[0, 1]$ independent of $x_i$, since under (3.8),
$$P\{y_i = 1 \mid x_i\} = P\{x_i^T\beta_0 + \epsilon_i \geq 0 \mid x_i\} = P\{\epsilon_i \geq -x_i^T\beta_0 \mid x_i\} = 1 - \Phi(-x_i^T\beta_0) \overset{(4)}{=} \Phi(x_i^T\beta_0),$$
where $\overset{(4)}{=}$ follows by the symmetry of the standard Gaussian distribution around the origin.

  • The conditional ‘density’ (in the extended sense) of $y_i$ given $x_i$ is just its probability mass function, which in view of (3.7) is
$$q(y \mid x; \beta_0) = \Phi(x^T\beta_0)^y[1 - \Phi(x^T\beta_0)]^{1-y}, \qquad y \in \{0, 1\}.$$

  • As in the immediately preceding examples, since the marginal density of $x_i$ is assumed


not to depend on $\beta_0$, the MLE may be characterised as the maximiser of
$$\ell_n^c(\beta) = \sum_{i=1}^n \log q(y_i \mid x_i; \beta) = \sum_{i=1}^n \left\{y_i\log\Phi(x_i^T\beta) + (1 - y_i)\log[1 - \Phi(x_i^T\beta)]\right\} = \sum_{\{i \mid y_i=1\}}\log\Phi(x_i^T\beta) + \sum_{\{i \mid y_i=0\}}\log[1 - \Phi(x_i^T\beta)].$$
A numerical sketch follows.
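A sketch of probit estimation by direct maximisation (simulated data; in practice one might prefer a packaged routine, but the computation is only a few lines; the identity $1 - \Phi(z) = \Phi(-z)$ is used for numerical stability):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(5)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta0 = np.array([0.2, 1.0])
y = (X @ beta0 + rng.normal(size=n) >= 0).astype(float)   # the latent model (3.8)

def neg_loglik(b):
    z = X @ b
    return -np.sum(y * norm.logcdf(z) + (1 - y) * norm.logcdf(-z))

beta_hat = minimize(neg_loglik, np.zeros(2)).x
print(beta_hat)   # close to beta0
```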

Example 3.10 (censored regression).

  • Continuing with Examples 3.5–3.6: suppose that $y_i^*$ measures an individual's underlying (or ‘latent’) ‘propensity’ to purchase meat, and this is modelled as a Gaussian regression,
$$y_i^* = x_i^T\beta_0 + \sigma_0\epsilon_i, \tag{3.9}$$
where $\epsilon_i \sim N[0, 1]$, independent of $x_i$. We observe actual expenditure on meat, which is related to $y_i^*$ through
$$y_i = \max\{y_i^*, 0\}. \tag{3.10}$$

  • (3.9) and (3.10) entail that, conditional on $x_i$, the distribution function of $y_i$ is
$$F(y \mid x) = P\{y_i \leq y \mid x_i = x\} = \begin{cases} 0 & \text{if } y < 0, \\ P\{y_i^* \leq y \mid x_i = x\} & \text{otherwise,} \end{cases} \;=\; \mathbf{1}\{y \geq 0\}\,\Phi\left(\frac{y - x^T\beta_0}{\sigma_0}\right).$$

  • This function:
  – is zero for $y < 0$;
  – has a jump at $y = 0$ of size $\Phi\left(\frac{-x^T\beta_0}{\sigma_0}\right) = 1 - \Phi\left(\frac{x^T\beta_0}{\sigma_0}\right)$ (which is necessarily equal to $P\{y_i = 0 \mid x_i = x\}$); and
  – is continuous on $(0, \infty)$, with derivative
$$\frac{\partial}{\partial y}\Phi\left(\frac{y - x^T\beta_0}{\sigma_0}\right) = \frac{1}{\sigma_0}\phi\left(\frac{y - x^T\beta_0}{\sigma_0}\right).$$

  • The conditional ‘density’ of $y_i$ given $x_i$, in the extended sense, is thus
$$q(y \mid x; \beta_0, \sigma_0) = \mathbf{1}\{y = 0\}\left[1 - \Phi\left(\frac{x^T\beta_0}{\sigma_0}\right)\right] + \mathbf{1}\{y > 0\}\,\frac{1}{\sigma_0}\phi\left(\frac{y - x^T\beta_0}{\sigma_0}\right).$$
  • Hence the conditional loglikelihood is
$$\ell_n^c(\beta, \sigma) = \sum_{i=1}^n \log q(y_i \mid x_i; \beta, \sigma) = \sum_{\{i \mid y_i=0\}}\log\left[1 - \Phi\left(\frac{x_i^T\beta}{\sigma}\right)\right] + \sum_{\{i \mid y_i>0\}}\log\left[\frac{1}{\sigma}\phi\left(\frac{y_i - x_i^T\beta}{\sigma}\right)\right],$$
which can be further simplified, if so desired, using the formula for $\phi$. (The utility of doing so is somewhat limited, since the MLE does not have a closed-form solution in this case.)

3.4 Consistency and identification

3.4.1 Consistency

  • As with GMM, since closed-form expressions for the MLE are generally unavailable, consistency of the MLE will have to be inferred from the ‘consistency’ of the associated criterion function: here, the average loglikelihood.

  • Suppose that the model assumes that the data $\{w_i\}_{i=1}^n$ is i.i.d. with marginal density $p(w; \theta)$ (this may be a ‘density’ in the extended sense). The MLE maximises
$$\bar\ell_n(\theta) = \frac{1}{n}\sum_{i=1}^n \log p(w_i; \theta). \tag{3.11}$$
  – The arguments that follow encompass the case where $w_i = (y_i, x_i)$, with a density that factorises as $p(w; \theta) = q(y \mid x; \theta)r(x)$. Although we may compute the MLE as the maximiser of $\bar\ell_n^c(\theta) = \frac{1}{n}\sum_{i=1}^n \log q(y_i \mid x_i; \theta)$ in this case, the resultant estimator nevertheless maximises (3.11). (We would not generally want to compute (3.11), as we wish to remain agnostic about the marginal density $r$.)

  • By the LLN, provided $E|\log p(w_i; \theta)| < \infty$, we have
$$\bar\ell_n(\theta) = \frac{1}{n}\sum_{i=1}^n \log p(w_i; \theta) \overset{p}{\to} E\log p(w_i; \theta) =: \ell_0(\theta). \tag{3.12}$$

  • On the basis of (3.12), the same arguments that we applied to the GMM estimator will allow us to infer that the MLE, being the maximiser of $\bar\ell_n$, should converge to the maximiser of $\ell_0$. This will indeed be the case, if the convergence in (3.12) is suitably strengthened [this is covered in the second-year course].

  • It thus remains to show that $\theta_0$ maximises $\ell_0$: i.e. we must show that the loglikelihood is sufficient to identify $\theta_0$. This will only be true under certain additional assumptions, the most crucial of which is

ML-ID For each $\theta \neq \theta_0$, there exists a $w \in \mathcal{W}$ such that $p(w; \theta) \neq p(w; \theta_0)$.

This means that $\theta$'s distinct from $\theta_0$ are associated with distinct density functions. If this condition does not hold, then the model is redundantly parametrised, in the sense that changing (some elements of) $\theta$ will have no effect on the implied distribution of $w_i$.

  • Upon reflection, it should make perfect sense that if $\theta'$ and $\theta_0$ both imply the same density (and thus, distribution) for $w_i$, there is no way we could ever distinguish between these two parameters, no matter how much data we have at our disposal. Indeed, it must be the case that the loglikelihoods always agree in this case, since
$$\ell_n(\theta') = \sum_{i=1}^n \log p(w_i; \theta') = \sum_{i=1}^n \log p(w_i; \theta_0) = \ell_n(\theta_0).$$
Hence ML-ID is fundamentally necessary for identification.

3.4.2 Identification via Kullback–Leibler minimisation

  • Rather than regard the MLE as the maximiser of $\bar\ell_n$, we may equivalently regard it as maximising the ‘centred’ average loglikelihood,
$$M_n(\theta) := \bar\ell_n(\theta) - \bar\ell_n(\theta_0) = \frac{1}{n}\sum_{i=1}^n \log\frac{p(w_i; \theta)}{p(w_i; \theta_0)}.$$


Again by the LLN,
$$M_n(\theta) \overset{p}{\to} E\log\frac{p(w_i; \theta)}{p(w_i; \theta_0)} = \ell_0(\theta) - \ell_0(\theta_0) =: M_0(\theta).$$

  • We shall now show that $M_0(\theta)$ is uniquely maximised at $\theta = \theta_0$. In this case, $\theta_0$ is identified from the likelihood function and – by the arguments noted above – the MLE will be consistent for $\theta_0$.
  – To simplify the argument that follows, let us suppose that $w_i$ is continuously distributed, with density $p(w; \theta_0)$.

  • To this end, we first consider a closely related quantity,
$$d_{KL}(f\|g) = \int \log\left[\frac{f(w)}{g(w)}\right]f(w)\,\mathrm{d}w = E_f\log\frac{f(w_i)}{g(w_i)},$$
termed the Kullback–Leibler (KL) divergence or relative entropy between $f$ and $g$; here the ‘$f$’ subscript on $E_f$ signifies that $w_i$ has density $f$.
(i) $d_{KL}(f\|g)$ measures the extent of the ‘disagreement’ between $f$ and $g$, in the sense that $d_{KL}(f\|g) \geq 0$, with equality if and only if $f(w) = g(w)$ for all $w \in \mathcal{W}$.
(ii) On the other hand, $d_{KL}(f\|g)$ is not a metric (a measure of distance) in the usual sense; it is not even symmetric, since in general $d_{KL}(f\|g) \neq d_{KL}(g\|f)$.

  • To verify the first claim, we recall the following version of Jensen's inequality: if $u$ is a strictly concave function, and $\eta_i$ a random variable, then $Eu(\eta_i) \leq u(E\eta_i)$, with equality if and only if $\eta_i$ is constant. Thus
$$d_{KL}(f\|g) = E_f\log\frac{f(w_i)}{g(w_i)} = -E_f\log\frac{g(w_i)}{f(w_i)} \geq -\log E_f\frac{g(w_i)}{f(w_i)} = -\log\int_{\mathcal{W}}\frac{g(w)}{f(w)}f(w)\,\mathrm{d}w = -\log\int_{\mathcal{W}}g(w)\,\mathrm{d}w = -\log 1 = 0.$$
  – The inequality holds strictly, unless the ratio $g(w_i)/f(w_i)$ is in fact constant.
  – It may be shown that if $f(w) \neq g(w)$ for at least some $w \in \mathcal{W}$, then this ratio cannot be constant. [This is intuitively obvious, but giving a completely rigorous argument involves some technicalities.]

  • Returning now to $M_0$, we see that
$$M_0(\theta) = E\log\frac{p(w_i; \theta)}{p(w_i; \theta_0)} = -E\log\frac{p(w_i; \theta_0)}{p(w_i; \theta)} \overset{(3)}{=} -d_{KL}(p_{\theta_0}\|p_\theta),$$


using $p_\theta$ as a shorthand for the function $w \mapsto p(w; \theta)$; $\overset{(3)}{=}$ holds by recognising that $w_i$ is assumed to have density $p(w; \theta_0)$ (to signify which, we might have written ‘$E_{p_{\theta_0}}$’ instead of merely ‘$E$’).

  • Thus, maximising $M_0(\theta)$ with respect to $\theta$ is equivalent to choosing $\theta$ so as to minimise the KL divergence between $p_\theta$ and $p_{\theta_0}$.
  – Clearly, $d_{KL}(p_{\theta_0}\|p_{\theta_0}) = 0$, so $\theta = \theta_0$ is indeed a minimiser of this divergence.
  – Is it the only minimiser? Under ML-ID, it is: since that condition guarantees that for each $\theta \neq \theta_0$, there exists a $w \in \mathcal{W}$ for which $p(w; \theta_0) \neq p(w; \theta)$. (A small numerical illustration of the KL divergence follows.)
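To make $d_{KL}$ concrete: for two densities it can be evaluated by one-dimensional quadrature. The sketch below (illustrative densities, not from the notes) checks the non-negativity and asymmetry claims numerically for two Gaussians.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

f = lambda w: norm.pdf(w, loc=0.0, scale=1.0)
g = lambda w: norm.pdf(w, loc=1.0, scale=2.0)

def d_kl(p, q):
    # integrate p * log(p/q) over a range wide enough to capture both densities
    return quad(lambda w: p(w) * np.log(p(w) / q(w)), -12, 12)[0]

print(d_kl(f, g), d_kl(g, f))   # both positive, and unequal: not symmetric
```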

3.5 Asymptotic distribution of the MLE

3.5.1 Asymptotic normality

  • The asymptotic distribution of the MLE can be derived via the same sort of linearisation argument as was used for GMM (Section 2.2.2). In fact, the arguments are slightly simpler. We shall assume

ML-DIFF $\theta \mapsto p(w; \theta)$ is twice continuously differentiable, for every $w \in \mathcal{W}$.

  • Once again, we proceed from the FOC for an interior maximum:
$$0 = \nabla_\theta\bar\ell_n(\hat\theta_n). \tag{3.13}$$
This condition holds only if $\hat\theta_n$ is not on the boundary of $\Theta$; recall from our analysis of GMM (Section 2.2.2 above) that we could deal with this requirement by assuming

INTR $\theta_0 \in \operatorname{int}\Theta$,

which together with the consistency of $\hat\theta_n$ ensures that the FOC (3.13) holds with probability approaching 1.
  – When might INTR fail? Consider again the simple binary outcomes model (Example 3.3). There $\Theta = [0, 1]$, and so $\operatorname{int}\Theta = (0, 1)$; INTR is thus excluding the possibility that $\theta_0$ is either 0 or 1 (in which case, the designated outcome either never occurs, or always occurs).

  • Using the same sort of argument as was given in the context of GMM, a mean-value expansion of $\nabla_\theta\bar\ell_n(\theta)$ around $\theta_0$, combined with the consistency of $\hat\theta_n$ (and the convergence of $\nabla^2_\theta\bar\ell_n(\theta)$ to $\nabla^2_\theta\ell_0(\theta)$) yields
$$\nabla_\theta\bar\ell_n(\hat\theta_n) = \nabla_\theta\bar\ell_n(\theta_0) + H_n(\hat\theta_n - \theta_0), \tag{3.14}$$


where $H_n \overset{p}{\to} \nabla^2_\theta\ell_0(\theta_0) =: H$. (Again, as discussed in the context of GMM, we need INTR and consistency to ensure the validity of (3.14).)

  • Applying (3.14) to (3.13) thus yields
$$n^{1/2}(\hat\theta_n - \theta_0) = -H_n^{-1}\, n^{1/2}\nabla_\theta\bar\ell_n(\theta_0). \tag{3.15}$$
It remains to determine the limiting behaviour of $\nabla_\theta\bar\ell_n(\theta_0)$.

  • In the usual terminology,
$$\nabla_\theta\bar\ell_n(\theta) = \frac{1}{n}\sum_{i=1}^n \nabla_\theta\log p(w_i; \theta) =: \frac{1}{n}\sum_{i=1}^n s(w_i; \theta)$$
is termed the score of the (average) loglikelihood function at $\theta$; a key property is
$$Es(w_i; \theta_0) = 0. \tag{3.16}$$

  • This can be proved in various ways; but since we have already shown that $\theta_0$ uniquely maximises $M_0$, and therefore $\ell_0$, it follows from the FOC for a maximum that
$$0 = \nabla_\theta\ell_0(\theta_0) = \nabla_\theta E\log p(w_i; \theta_0) \overset{(3)}{=} E\nabla_\theta\log p(w_i; \theta_0) = Es(w_i; \theta_0), \tag{3.17}$$
provided that the interchange of the expectation (an integral) and the derivative operator in $\overset{(3)}{=}$ can be justified [as will be the case under appropriate regularity conditions; see the second-year course].

  • In view of (3.16), provided $E\|s(w_i; \theta_0)\|^2 < \infty$, then
$$n^{1/2}\nabla_\theta\bar\ell_n(\theta_0) = \frac{1}{n^{1/2}}\sum_{i=1}^n s(w_i; \theta_0) \overset{d}{\to} N[0, S]$$
by the CLT, where $S := Es(w_i; \theta_0)s(w_i; \theta_0)^T$. Combining the preceding with (3.15) thus yields
$$n^{1/2}(\hat\theta_n - \theta_0) = -H_n^{-1}\, n^{1/2}\nabla_\theta\bar\ell_n(\theta_0) \overset{d}{\to} -H^{-1}\cdot N[0, S] \sim N[0, H^{-1}SH^{-1}] \tag{3.18}$$
by Slutsky's theorem.

  • Of course, (3.18) only makes sense if $H = \nabla^2_\theta\ell_0(\theta_0)$ is invertible (has full rank); we shall accordingly assume

ML-HESS $\operatorname{rk} H = d_\theta$.


Since $\ell_0$ is maximised at $\theta_0$, $H$ must be negative semi-definite there; and ML-HESS is equivalent to an assumption of negative definiteness.

  • To ensure that standard testing procedures (see below) work as intended, we should also require that

ML-VAR $S := Es(w_i; \theta_0)s(w_i; \theta_0)^T$ is positive definite.

  • The asymptotic behaviour of the MLE may thus be summarised as follows.

Theorem 3.1. Under INTR, ML-ID, ML-DIFF, ML-HESS, ML-VAR and further regularity conditions [discussed in the second-year course],
$$n^{1/2}(\hat\theta_n - \theta_0) \overset{d}{\to} N[0, H^{-1}SH^{-1}], \tag{3.19}$$
where $H = \nabla^2_\theta\ell_0(\theta_0)$ and $S := Es(w_i; \theta_0)s(w_i; \theta_0)^T$.

  • Remarks:
(i) The theorem does not apply to the MLE in the uniform model (Example 3.2 above): recall in that case $p(w; \theta) = \mathbf{1}\{w \in [0, \theta]\}\frac{1}{\theta}$, which is certainly not continuously differentiable, contrary to ML-DIFF. (Actually, the theorem can be shown to hold under conditions weaker than ML-DIFF – but even these weaker conditions still exclude the uniform model.)
(ii) Wald statistics require consistent estimates of $H$ and $S$; these can be computed as
$$\hat H_n := \nabla^2_\theta\bar\ell_n(\hat\theta_n), \qquad \hat S_n := \frac{1}{n}\sum_{i=1}^n s(w_i; \hat\theta_n)s(w_i; \hat\theta_n)^T,$$
as in the sketch below.
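The remark translates directly into code. A sketch (hypothetical inputs: `score_matrix` is the $n \times d_\theta$ array of per-observation scores at $\hat\theta_n$, and `H_hat` the Hessian of the average loglikelihood there):

```python
import numpy as np

def mle_sandwich_variance(score_matrix, H_hat):
    """Estimate the avar of n^{1/2}(theta_hat - theta_0) as H^{-1} S H^{-1}."""
    n = score_matrix.shape[0]
    S_hat = score_matrix.T @ score_matrix / n
    H_inv = np.linalg.inv(H_hat)
    V_hat = H_inv @ S_hat @ H_inv
    return V_hat   # standard errors: np.sqrt(np.diag(V_hat) / n)
```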

3.5.2 Efficiency properties

  • Theorem 3.1 is not quite the final word on the limiting distribution of the MLE.
  • The same conditions that are sufficient for that result are also sufficient for
$$H = -S,$$
whereupon (3.19) simplifies to
$$n^{1/2}(\hat\theta_n - \theta_0) \overset{d}{\to} N[0, S^{-1}]. \tag{3.20}$$

  • $S$ is termed the information matrix, and the relation $H = -S$ the information equality. Evidently, when the information equality holds, ML-HESS and ML-VAR are equivalent.

  • (3.20) is important because of the following result, known as the convolution theorem, which runs roughly as follows. Suppose the information equality holds [amongst other conditions], and let $\tilde\theta_n$ be any regular estimator of $\theta_0$. Then
$$n^{1/2}(\tilde\theta_n - \theta_0) \overset{d}{\to} \xi + \eta,$$
where $\xi \sim N[0, S^{-1}]$, and $\eta$ is another random variable (which will depend on the estimator $\tilde\theta_n$), independent of $\xi$.
  – For the MLE, $\eta = 0$ in view of (3.20): but for any other estimator, it need not be.
  – Because $\xi$ and $\eta$ are independent, $\xi + \eta$ will have a distribution that is more dispersed than that of $\xi$, by essentially any reasonable measure of dispersion. Formally, it can be shown that, for any bowl-shaped loss function $\rho(x)$, $E\rho(\xi + \eta) \geq E\rho(\xi)$. A function is bowl-shaped if its lower level sets $\{x \mid \rho(x) \leq c\}$ are convex, and symmetric around zero. This encompasses a very broad range of plausible loss functions, such as $\rho(x) = |x|$, $x^2$, and $x^2\mathbf{1}\{|x| \leq M\}$.
  – If we were to use any such loss function to quantify the (asymptotic) dispersion of an estimator, we would find that every regular estimator is (weakly) more dispersed than the MLE. Choosing $\rho(x) = x^2$ here amounts to making this comparison in terms of asymptotic variance, so it is indeed true that the MLE has the minimum asymptotic variance amongst all regular estimators.

  • What is meant by a ‘regular estimator’? It is beyond the scope of this course to give a precise definition; all we can say here is that the concept is very broad, certainly broad enough to cover all the estimators that we have thus far considered in this course. (For more details, see van der Vaart, 1998, Ch. 8.)

3.6 Hypothesis testing

  • Suppose now we are interested in testing the (possibly nonlinear) restriction
$$H_0 : r(\theta_0) = \rho \quad \text{against} \quad H_1 : r(\theta_0) \neq \rho.$$
Suppose that $r : \Theta \to \mathbb{R}^{d_r}$, with Jacobian $R := D_\theta r(\theta_0)$ having rank $d_r$.

  • In the present setting, there are three ways of carrying out such a test, which differ according to how they evaluate the extent to which $H_0$ is consistent with the observed sample. In order to describe each of these, let
$$\Theta_\rho := \{\theta \in \Theta \mid r(\theta) = \rho\}$$
denote the subset of parameters that are consistent with the restriction under test, and define
$$\hat\theta_{n,U} := \operatorname*{argmax}_{\theta\in\Theta}\bar\ell_n(\theta), \qquad \hat\theta_{n,R} := \operatorname*{argmax}_{\theta\in\Theta_\rho}\bar\ell_n(\theta),$$
the unrestricted and restricted estimators of $\theta_0$, respectively.
(i) The Wald test, as we have noted already, evaluates $H_0$ by computing $r(\hat\theta_{n,U}) - \rho$; i.e. it evaluates the restriction at the unrestricted estimator. This leads to the following test statistic,
$$W_n := n[r(\hat\theta_{n,U}) - \rho]^T(\hat R_n\hat V_n\hat R_n^T)^{-1}[r(\hat\theta_{n,U}) - \rho] \overset{d}{\to} \chi^2[d_r],$$
where $\hat R_n := D_\theta r(\hat\theta_{n,U})$ estimates $R$, and $\hat V_n$ estimates the limiting variance of $\hat\theta_{n,U}$; such a choice as $\hat V_n := \hat H_n^{-1}\hat S_n\hat H_n^{-1}$ would be appropriate. [In view of the information equality, either $\hat S_n^{-1}$ or $-\hat H_n^{-1}$ could also possibly be used.]
(ii) The likelihood ratio (LR) test evaluates $H_0$ by considering the extent to which the restriction leads to a reduction in the value of the maximised loglikelihood. Thus we have the test statistic
$$LR_n := 2\left[\max_{\theta\in\Theta}\ell_n(\theta) - \max_{\theta\in\Theta_\rho}\ell_n(\theta)\right] \overset{d}{\to} \chi^2[d_r].$$
(Note that it is indeed the loglikelihood $\ell_n$ that appears here, not the average loglikelihood.)
(iii) The Lagrange multiplier (LM) or score test is motivated as follows. Recall that our characterisation of $\theta_0$ as the maximiser of $\ell_0(\theta) = E\log p(w_i; \theta)$ led to $\nabla_\theta\ell_0(\theta_0) = 0$ (see (3.17) above). Indeed, the sample analogue of this condition holds exactly


(provided $\hat\theta_n$ is interior to $\Theta$): $\nabla_\theta\bar\ell_n(\hat\theta_{n,U}) = 0$. On the other hand, if we evaluate the score $\nabla_\theta\bar\ell_n$ at the restricted estimator $\hat\theta_{n,R}$, then $\nabla_\theta\bar\ell_n(\hat\theta_{n,R}) \neq 0$, in general. This can itself be used as a measure of the extent to which $H_0$ is consistent with the observed sample, and leads to the following test statistic,
$$LM_n := n\,\nabla_\theta\bar\ell_n(\hat\theta_{n,R})^T\hat S_n^{-1}\nabla_\theta\bar\ell_n(\hat\theta_{n,R}) \overset{d}{\to} \chi^2[d_r]. \tag{3.21}$$

  • Remarks:
(i) All three tests have the same limiting distribution, and are asymptotically equivalent.
(ii) The LR test is invariant to the parametrisation of the model and the formulation of the restrictions under test, whereas the other two tests are not.
(iii) The LM test only requires that we calculate the restricted estimator $\hat\theta_{n,R}$, which may be advantageous in some contexts (i.e. if the unrestricted estimator is particularly difficult to compute).
(iv) If the information equality fails to hold: then the Wald test (using $\hat V_n = \hat H_n^{-1}\hat S_n\hat H_n^{-1}$) remains valid; and it is possible to modify the LM statistic such that it also remains valid ($\hat S_n^{-1}$ in (3.21) needs to be replaced by another, rather more complicated matrix). However, the validity of the LR test hinges crucially on this condition. (An LR test in the Bernoulli model is sketched below.)
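As an illustration, a sketch of the LR test in the Bernoulli model of Example 3.3, for $H_0 : \theta_0 = 0.5$ (simulated data; here $\Theta_\rho = \{0.5\}$ is a single point, so the restricted maximum is just the loglikelihood evaluated there):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(6)
w = rng.binomial(1, 0.55, size=500)

def loglik(th):
    return np.sum(w) * np.log(th) + (len(w) - np.sum(w)) * np.log(1 - th)

theta_hat = w.mean()                        # unrestricted MLE
LR = 2 * (loglik(theta_hat) - loglik(0.5))  # restricted maximiser is theta = 0.5
print(LR, chi2.sf(LR, df=1))                # statistic and p-value
```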

3.A Suggested (optional) further reading

  • I have not followed any particular reference here.
  • You may wish to consult Davidson and MacKinnon (2004, Ch. 10), Greene (2008, Ch. 16) and/or Wooldridge (2002, Ch. 13).


4 References

Davidson, R., and J. G. MacKinnon (2004): Econometric Theory and Methods. Oxford University Press.
Greene, W. H. (2008): Econometric Analysis. Pearson Prentice Hall, New Jersey (USA), 6th edn.
Hayashi, F. (2000): Econometrics. Princeton University Press.
van der Vaart, A. W. (1998): Asymptotic Statistics. Cambridge University Press, New York (USA).
Wooldridge, J. M. (2002): Econometric Analysis of Cross Section and Panel Data. MIT Press, 1st edn.


A Mathematical appendix

A.1 Notation

Norms. Each of the spaces $\mathbb{R}$, $\mathbb{R}^k$, or $\mathbb{R}^{k\times l}$ is equipped with a norm $\|x\|$, which reduces to $|x|$ when $x \in \mathbb{R}$; otherwise:

  • for $x := (x_1, \ldots, x_k)^T \in \mathbb{R}^k$, $\|x\| := \max_{i\leq k}|x_i|$;
  • for $x := [x_{ij}] \in \mathbb{R}^{k\times l}$, $\|x\| := \max_{i\leq k}\max_{j\leq l}|x_{ij}|$;

where the notation $x := [x_{ij}]$ indicates that $x$ is a matrix with $(i, j)$ element given by $x_{ij}$. These are not the only possible norms with which the spaces $\mathbb{R}^k$ and $\mathbb{R}^{k\times l}$ could be equipped, but they are particularly convenient for our purposes. (For example, $\|x\|_2 := (\sum_{i=1}^k x_i^2)^{1/2}$ is also a common choice on $\mathbb{R}^k$.)¹

A norm provides a measure of the ‘length’ of a vector (or matrix), and $\|x - y\|$ a measure of the ‘distance’ between two vectors (or matrices) $x$ and $y$. The three characteristic properties of a norm, which $\|\cdot\|$ inherits by its construction, are:

  • $\|x\| \geq 0$, with equality if and only if $x = 0$;
  • $\|\alpha x\| = |\alpha|\|x\|$ for all $\alpha \in \mathbb{R}$; and
  • the triangle inequality: $\|x - y\| \leq \|x - z\| + \|y - z\|$ for all $x, y, z \in \mathbb{R}^k$ (or $\mathbb{R}^{k\times l}$).

A.2 Matrices

Rank. Let $A$ be a $k \times l$ matrix. The rank of $A$, denoted $\operatorname{rk} A$, is the number of linearly independent rows, or the number of linearly independent columns; these two numbers necessarily agree (though this requires some effort to prove). Thus $\operatorname{rk} A \leq \min\{k, l\}$. We say that $A$ has full row rank if $\operatorname{rk} A = k$, and full column rank if $\operatorname{rk} A = l$: evidently only one of these can be true unless $A$ is square ($k = l$), in which case $A$ is said simply to be of full rank. Some useful properties, which we state here without proof, are:

  • if $A\delta \neq 0$ for all $\delta \in \mathbb{R}^l\setminus\{0\}$, then the columns of $A$ must be linearly independent, and thus $\operatorname{rk} A = l$;
  • a square matrix $A$ has an inverse if and only if it is full rank (we say, equivalently, that $A$ is nonsingular or invertible in this case);

¹None of our results depend on the specific choice of norm, since all norms on a finite-dimensional space are equivalent. That is, if $\|\cdot\|_*$ is another norm on $\mathbb{R}^k$, then there must exist nonzero, finite constants $c_0$ and $c_1$ such that $c_0\|x\| \leq \|x\|_* \leq c_1\|x\|$. Convergence (in probability, or in distribution) in one norm thus implies convergence in all the others.

  • $\operatorname{rk}(AB) \leq \min\{\operatorname{rk} A, \operatorname{rk} B\}$ for $B$ an $l \times m$ matrix, with equality if $A$ has full rank; and
  • $\operatorname{rk} A^T A = \operatorname{rk} A$.

Definiteness. A (real) symmetric $k \times k$ matrix $A$ is termed

  • positive semi-definite if $x^T A x \geq 0$ for all $x \in \mathbb{R}^k$; and
  • positive definite if $x^T A x > 0$ for all $x \in \mathbb{R}^k\setminus\{0\}$.

$A$ is termed negative definite (semi-definite) if $-A$ is positive definite (semi-definite). Various characterisations of definiteness are available, the most useful of which comes from the eigenvalues (or spectrum) of $A$.

Eigenvalues. Recall that the eigenvalues of a $k \times k$ matrix $A$ are given by the solutions to the equation
$$\det(\lambda I - A) = 0.$$
Since the l.h.s. is a $k$th order polynomial, it has $k$ roots, some of which may be repeated. $A$ thus has $k$ eigenvalues, though these need not all be distinct. For example, the eigenvalues of the matrix
$$A = \operatorname{diag}\{2, 2, 0\}$$
correspond to the solutions to $\det(\lambda I - A) = (\lambda - 2)^2\lambda = 0$, and so the eigenvalues of $A$ are $\{0, 2, 2\}$.

For a general (real) matrix $A$, there is no guarantee that the eigenvalues of $A$ will be real. However, the following are true:

  • if $A$ is symmetric, then the eigenvalues of $A$ are real, and the rank of $A$ is equal to the number of nonzero eigenvalues;
  • a symmetric matrix $A$ is positive definite (semi-definite) if and only if all its eigenvalues are strictly positive (non-negative);
  • the eigenvalues of a symmetric and idempotent matrix $A$ are either 0 or 1.


Spectral decomposition. If $A$ is a (real) symmetric $k \times k$ matrix, then it admits the decomposition
$$A = C\Lambda C^T,$$
where $\Lambda = \operatorname{diag}\{\lambda_1, \ldots, \lambda_k\}$ is a diagonal matrix whose diagonal entries correspond to the eigenvalues of $A$, and $C$ has the property that $C^{-1} = C^T$ ($C$ is termed an orthonormal matrix). You may directly verify that $A^{-1} = C\Lambda^{-1}C^T$, where $\Lambda^{-1} = \operatorname{diag}\{\lambda_1^{-1}, \ldots, \lambda_k^{-1}\}$.

Positive (semi-)definite square root. If $A$ is positive (semi-)definite, then the matrix $B := C\Lambda^{1/2}C^T$, where $\Lambda^{1/2} := \operatorname{diag}\{\lambda_1^{1/2}, \ldots, \lambda_k^{1/2}\}$, is itself positive (semi-)definite, and has the property that
$$B^2 = BB = C\Lambda^{1/2}C^T C\Lambda^{1/2}C^T = C\Lambda^{1/2}\Lambda^{1/2}C^T = C\Lambda C^T = A.$$
$B$ is thus a square root of the matrix $A$. You may verify that $B$ is positive (semi-)definite: in fact, it is the only square root of $A$ with this property. (A numerical sketch follows.)
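This construction is immediate to implement (a sketch using numpy's symmetric eigendecomposition; the clipping guards against tiny negative eigenvalues due to round-off):

```python
import numpy as np

def psd_sqrt(A):
    """Symmetric PSD square root via the spectral decomposition A = C diag(l) C'."""
    lam, C = np.linalg.eigh(A)            # eigh: for symmetric matrices
    lam = np.clip(lam, 0.0, None)
    return (C * np.sqrt(lam)) @ C.T       # C diag(sqrt(lam)) C'

A = np.array([[2.0, 1.0], [1.0, 2.0]])
B = psd_sqrt(A)
assert np.allclose(B @ B, A)
```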

A.3 Asymptotics

A.3.1 Modes of stochastic convergence

Let $x_n$ denote a sequence of random scalars, vectors or matrices, taking values in $\mathcal{X}$ (a subset of $\mathbb{R}$, $\mathbb{R}^k$ or $\mathbb{R}^{l\times m}$).

Convergence in probability. We say that $x_n$ converges in probability to $x_\infty$, denoted $x_n \overset{p}{\to} x_\infty$, if for every $\epsilon > 0$,
$$\lim_{n\to\infty} P\{\|x_n - x_\infty\| > \epsilon\} = 0.$$
The definition allows that $x_\infty$ may itself be a random variable, although in almost all of the cases considered in this course, it will be a constant.


Convergence in distribution. Let $x_n$ have the distribution function $F_n$ (denoted $x_n \sim F_n$), i.e. $F_n(x) := P\{x_n \leq x\}$ for $x \in \mathcal{X}$; and let $x_\infty \sim F_\infty$, i.e. $F_\infty(x) := P\{x_\infty \leq x\}$. When $x_n$ is vector- or matrix-valued, the inequality $x_n \leq x$ is to be interpreted componentwise, i.e. for vectors $x_n = (x_{n,1}, \ldots, x_{n,k})^T$ and $x = (x_1, \ldots, x_k)^T$,
$$x_n \leq x \iff x_{n,j} \leq x_j, \quad \forall j \in \{1, \ldots, k\},$$
and similarly for matrices. We say $x_n$ converges in distribution to $x_\infty$, denoted $x_n \overset{d}{\to} x_\infty$, if
$$\lim_{n\to\infty} F_n(x) = F_\infty(x)$$
for every $x$ that is a continuity point of $F_\infty$.

A.3.2 Key results

Let $x_n$ and $y_n$ denote random sequences respectively taking values in $\mathcal{X}$ and $\mathcal{Y}$ (each subsets of either $\mathbb{R}$, $\mathbb{R}^k$ or $\mathbb{R}^{l\times m}$). The next theorem and its corollary provide conditions under which convergence in probability and distribution are preserved under continuous maps. (These are not the most general possible results of their kind, but they are sufficient for our purposes.)

Theorem A.1 (Slutsky). Suppose (i) $h : \mathcal{X}\times\mathcal{Y} \to \mathbb{R}^{d_h}$ is continuous on $\mathcal{X}\times\{a\}$; (ii) $x_n \overset{d}{\to} x_\infty$; and (iii) $y_n \overset{p}{\to} a$, where $a \in \mathcal{Y}$ is constant. Then $h(x_n, y_n) \overset{d}{\to} h(x_\infty, a)$.

Corollary A.1. Let $x_n$ and $y_n$ be as in the statement of Theorem A.1. Then
(i) $x_n + y_n \overset{d}{\to} x_\infty + a$;
(ii) $y_n x_n \overset{d}{\to} a x_\infty$;
(iii) $y_n^{-1} x_n \overset{d}{\to} a^{-1} x_\infty$, if $a \neq 0$ (if $\mathcal{Y} = \mathbb{R}$) or $a$ is invertible (if $\mathcal{Y} = \mathbb{R}^{k\times k}$).


Theorem A.2 (LLN and CLT for i.i.d. variates). Suppose $\{w_n\}$ is a sequence of i.i.d. random vectors, taking values in $\mathbb{R}^{d_w}$.
(i) If $E\|w_1\| < \infty$, then
$$\frac{1}{n}\sum_{i=1}^n w_i \overset{p}{\to} Ew_1.$$
(ii) If $E\|w_1\|^2 < \infty$, then
$$\frac{1}{n^{1/2}}\sum_{i=1}^n (w_i - Ew_i) \overset{d}{\to} N[0, V],$$
where $V := E(w_1 - Ew_1)(w_1 - Ew_1)^T$.

A.4 Suggested (optional) further reading

  • Hayashi (2000, Ch. 2) and/or Greene (2008, App. A & D).