slide-1
SLIDE 1

Ability Bias, Errors in Variables and Sibling Methods

James J. Heckman University of Chicago Econ 312 This draft, May 26, 2006


slide-2
SLIDE 2

1 Ability Bias

Consider the model:

\[ \log y = \beta_0 + \beta_1 S + U, \]

where $y$ = income, $S$ = schooling, and $\beta_0$ and $\beta_1$ are parameters of interest. What we have omitted from the above specification is unobserved ability, which is captured in the residual term $U$. We thus re-write the above as:

\[ \log y = \beta_0 + \beta_1 S + A + \varepsilon, \]

where $A$ is ability, $(\varepsilon, \varepsilon') \perp\!\!\!\perp (S, S')$ (primes denote a second, paired individual), and we believe that $\mathrm{Cov}(A, S) \neq 0$. Thus, $E(U \mid S) \neq 0$, so that OLS on our original specification gives biased and inconsistent estimates.

slide-3
SLIDE 3

1.1 Strategies for Estimation

1. Use proxies for ability: Find proxies for ability and include them as regressors. Examples may include: height, weight, etc. The problem with this approach is that proxies may measure ability with error and thus introduce additional bias (see Section 1.3).

slide-4
SLIDE 4
2. Fixed Effect Method: Find a paired comparison. Examples may include a genetic twin or sibling with similar or identical ability. Consider two individuals $i$ and $i'$ (primed variables belong to $i'$), with $U = A + \varepsilon$:

\[ \log y - \log y' = (\beta_0 + \beta_1 S + U) - (\beta_0 + \beta_1 S' + U') = \beta_1 (S - S') + (A - A') + (\varepsilon - \varepsilon'). \]

Note: if $A = A'$, then OLS performed on the differenced (fixed effect) equation is unbiased and consistent. If $A \neq A'$, then we just get a different bias (see Section 1.2). Further, if $S$ is measured with error, we may exacerbate the bias in our fixed effect estimator (see Section 1.3).

slide-5
SLIDE 5

1.2 OLS vs. Fixed Effect (FE)

In the OLS case with ability bias, we have:

\[ \operatorname{plim} \hat\beta_1^{OLS} = \beta_1 + \frac{\mathrm{Cov}(A, S)}{\mathrm{Var}(S)} \]

(see the derivation of Equation (2.2) for more background on the above derivation).
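The OLS probability limit can be checked with a small simulation. The following is only a sketch: the parameter values (`beta0`, `beta1`, the loading `gamma` of schooling on ability, and all variances) are hypothetical choices made for illustration, and only the Python standard library is used.

```python
import random

def cov(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def simulate_ability_bias(n=100_000, beta0=1.0, beta1=0.08, gamma=0.5, seed=0):
    """Simulate log y = beta0 + beta1*S + A + eps with Cov(A, S) = gamma != 0.

    All parameter values are illustrative assumptions, not estimates.
    Returns (OLS slope, beta1 + Cov(A, S)/Var(S) from sample moments).
    """
    rng = random.Random(seed)
    A = [rng.gauss(0.0, 1.0) for _ in range(n)]               # unobserved ability
    S = [12.0 + gamma * a + rng.gauss(0.0, 2.0) for a in A]   # schooling depends on ability
    log_y = [beta0 + beta1 * s + a + rng.gauss(0.0, 1.0) for s, a in zip(S, A)]
    b_ols = cov(S, log_y) / cov(S, S)            # OLS slope of log y on S
    predicted = beta1 + cov(A, S) / cov(S, S)    # probability-limit formula
    return b_ols, predicted

b_ols, predicted = simulate_ability_bias()
print(f"OLS slope: {b_ols:.4f}   formula: {predicted:.4f}   true beta1: 0.08")
```

With these assumed values the OLS slope lies well above the true coefficient, and the gap is close to $\mathrm{Cov}(A,S)/\mathrm{Var}(S)$.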

slide-6
SLIDE 6

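We also impose:

\[ \mathrm{Var}(S) = \mathrm{Var}(S'), \qquad \mathrm{Cov}(A, S) = \mathrm{Cov}(A', S'), \qquad \mathrm{Cov}(A', S) = \mathrm{Cov}(A, S'). \]

With these assumptions, our fixed effect estimator is given by:

\[ \operatorname{plim} \hat\beta_1^{FE} = \beta_1 + \frac{\mathrm{Cov}\big(S - S',\, (A - A') + (\varepsilon - \varepsilon')\big)}{\mathrm{Var}(S - S')} = \beta_1 + \frac{\mathrm{Cov}(A, S) - \mathrm{Cov}(A', S)}{\mathrm{Var}(S) - \mathrm{Cov}(S, S')}. \]

Note that if $\mathrm{Cov}(A', S) = 0$ and ability is positively correlated with schooling, then the fixed effect estimator is upward biased.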

slide-7
SLIDE 7

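From the preceding, we see that the fixed effect estimator has more asymptotic bias if:

\[ \frac{\mathrm{Cov}(A, S) - \mathrm{Cov}(A', S)}{\mathrm{Var}(S) - \mathrm{Cov}(S, S')} > \frac{\mathrm{Cov}(A, S)}{\mathrm{Var}(S)} \iff \frac{\mathrm{Cov}(A, S)}{\mathrm{Var}(S)} > \frac{\mathrm{Cov}(A', S)}{\mathrm{Cov}(S, S')}, \]

where the equivalence takes $0 < \mathrm{Cov}(S, S') < \mathrm{Var}(S)$.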
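The comparison between the two bias terms can be sketched directly from population moments. The function names and the numerical moments below are hypothetical, chosen only to illustrate the condition; the shortcut condition assumes $0 < \mathrm{Cov}(S,S') < \mathrm{Var}(S)$.

```python
def ols_bias(cov_AS, var_S):
    """Asymptotic OLS bias: Cov(A, S) / Var(S)."""
    return cov_AS / var_S

def fe_bias(cov_AS, cov_ApS, var_S, cov_SSp):
    """Asymptotic fixed effect bias under the symmetry assumptions."""
    return (cov_AS - cov_ApS) / (var_S - cov_SSp)

def fe_has_more_bias(cov_AS, cov_ApS, var_S, cov_SSp):
    """Equivalent condition: Cov(A,S)/Var(S) > Cov(A',S)/Cov(S,S')."""
    if not 0 < cov_SSp < var_S:
        raise ValueError("condition assumes 0 < Cov(S,S') < Var(S)")
    return cov_AS / var_S > cov_ApS / cov_SSp

# Hypothetical moments: (Cov(A,S), Cov(A',S), Var(S), Cov(S,S'))
weak_sibling_link = (0.5, 0.1, 4.0, 2.0)    # sibling ability barely related to own schooling
strong_sibling_link = (0.5, 0.4, 4.0, 2.0)  # sibling ability almost as related as own

for label, m in [("weak", weak_sibling_link), ("strong", strong_sibling_link)]:
    print(label, round(ols_bias(m[0], m[2]), 3), round(fe_bias(*m), 3), fe_has_more_bias(*m))
```

In the first case differencing removes more of the schooling variance than of the ability covariance, so the fixed effect bias exceeds the OLS bias; in the second case the ordering reverses.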

slide-8
SLIDE 8

1.3 Measurement Error

Say $S^* = S + \nu$, where $S^*$ is observed schooling. Our model now becomes:

\[ \log y = \beta_0 + \beta_1 S + U = \beta_0 + \beta_1 S^* + (U - \beta_1 \nu), \]

and the fixed effect estimator gives:

\[ \log y - \log y' = (\beta_0 + \beta_1 S + U) - (\beta_0 + \beta_1 S' + U') = \beta_1 (S^* - S^{*\prime}) + (U - U') + \beta_1 (\nu' - \nu). \]

Now we wish to examine which estimator (OLS or fixed effect) has more asymptotic bias given our measurement error problem. For the remaining arguments of this section, we assume:

\[ E(\nu \mid S) = E(\nu' \mid S) = E(\nu \mid S') = 0, \]

so that the OLS estimator gives:

slide-9
SLIDE 9

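\[ \operatorname{plim} \hat\beta_1^{OLS} = \beta_1 + \frac{\mathrm{Cov}(S^*,\, U - \beta_1 \nu)}{\mathrm{Var}(S^*)} = \beta_1 + \frac{\mathrm{Cov}(A, S) - \beta_1 \mathrm{Var}(\nu)}{\mathrm{Var}(S) + \mathrm{Var}(\nu)}. \]

The fixed effect estimator gives:

\[ \operatorname{plim} \hat\beta_1^{FE} = \beta_1 + \frac{\mathrm{Cov}\big(S^* - S^{*\prime},\, (U - U') + \beta_1(\nu' - \nu)\big)}{\mathrm{Var}(S^* - S^{*\prime})} = \beta_1 + \frac{\mathrm{Cov}(S - S',\, A - A') - \beta_1 \mathrm{Var}(\nu' - \nu)}{\mathrm{Var}(S - S') + \mathrm{Var}(\nu' - \nu)} \]

\[ = \beta_1 + \frac{\mathrm{Cov}(A, S) - \mathrm{Cov}(A, S') - \beta_1 \mathrm{Var}(\nu)}{\mathrm{Var}(S) + \mathrm{Var}(\nu) - \mathrm{Cov}(S, S')}. \]

The last equality uses the symmetry assumptions of Section 1.2 together with $\mathrm{Cov}(\nu, \nu') = 0$ and $\mathrm{Cov}(\nu, A) = 0$.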

slide-10
SLIDE 10

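Under what conditions will the fixed effect bias be greater? From the above, we know that this will be true if and only if:

\[ \frac{\mathrm{Cov}(A, S) - \mathrm{Cov}(A, S') - \beta_1 \mathrm{Var}(\nu)}{\mathrm{Var}(S) + \mathrm{Var}(\nu) - \mathrm{Cov}(S', S)} > \frac{\mathrm{Cov}(A, S) - \beta_1 \mathrm{Var}(\nu)}{\mathrm{Var}(S) + \mathrm{Var}(\nu)} \]

\[ \iff \mathrm{Cov}(A, S')\,\big(\mathrm{Var}(S) + \mathrm{Var}(\nu)\big) < \big(\mathrm{Cov}(A, S) - \beta_1 \mathrm{Var}(\nu)\big)\, \mathrm{Cov}(S', S) \]

\[ \iff \frac{\mathrm{Cov}(A, S) - \beta_1 \mathrm{Var}(\nu)}{\mathrm{Var}(S) + \mathrm{Var}(\nu)} > \frac{\mathrm{Cov}(A, S')}{\mathrm{Cov}(S, S')}, \]

taking $0 < \mathrm{Cov}(S, S') < \mathrm{Var}(S) + \mathrm{Var}(\nu)$. If this inequality holds, taking differences can actually worsen the fit over OLS alone. Intuitively, we see that we have differenced out the true component, $S$, and compounded our measurement error problem with the fixed effect estimator.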
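The two probability limits under measurement error can be illustrated with a simulation of twin pairs who share the same ability ($A = A'$). This is a minimal sketch under assumed values: the family schooling component `f`, the loadings, `beta1`, and `var_nu` are all hypothetical.

```python
import random

def cov(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def simulate_twins(n_pairs=50_000, beta1=0.10, var_nu=1.0, seed=1):
    """OLS and fixed effect slopes when schooling is observed with error."""
    rng = random.Random(seed)
    sd_nu = var_nu ** 0.5
    s_obs, s_obs_p, ly, ly_p = [], [], [], []
    for _ in range(n_pairs):
        A = rng.gauss(0.0, 1.0)   # ability, identical within the pair (assumption)
        f = rng.gauss(0.0, 1.0)   # shared family component of schooling
        S = 12.0 + f + 0.5 * A + rng.gauss(0.0, 1.0)
        Sp = 12.0 + f + 0.5 * A + rng.gauss(0.0, 1.0)
        s_obs.append(S + rng.gauss(0.0, sd_nu))     # observed S* = S + nu
        s_obs_p.append(Sp + rng.gauss(0.0, sd_nu))
        ly.append(beta1 * S + A + rng.gauss(0.0, 1.0))
        ly_p.append(beta1 * Sp + A + rng.gauss(0.0, 1.0))
    b_ols = cov(s_obs, ly) / cov(s_obs, s_obs)
    d_s = [a - b for a, b in zip(s_obs, s_obs_p)]   # within-pair differences
    d_y = [a - b for a, b in zip(ly, ly_p)]
    b_fe = cov(d_s, d_y) / cov(d_s, d_s)
    return b_ols, b_fe

beta1, var_nu = 0.10, 1.0
var_S, cov_SSp, cov_AS = 2.25, 1.25, 0.5   # population moments implied by the design
plim_ols = beta1 + (cov_AS - beta1 * var_nu) / (var_S + var_nu)
plim_fe = beta1 + (cov_AS - cov_AS - beta1 * var_nu) / (var_S + var_nu - cov_SSp)

b_ols, b_fe = simulate_twins(beta1=beta1, var_nu=var_nu)
print(f"OLS: {b_ols:.3f} (plim {plim_ols:.3f})   FE: {b_fe:.3f} (plim {plim_fe:.3f})")
```

In this design differencing removes the ability bias entirely but halves the coefficient through attenuation, because the shared family component of true schooling is differenced out while the measurement error is not.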

slide-11
SLIDE 11

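In the special case $A = A'$, the condition is

\[ \frac{-\beta_1 \mathrm{Var}(\nu)}{\mathrm{Var}(S) + \mathrm{Var}(\nu) - \mathrm{Cov}(S', S)} > \frac{\mathrm{Cov}(A, S) - \beta_1 \mathrm{Var}(\nu)}{\mathrm{Var}(S) + \mathrm{Var}(\nu)}. \]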
slide-12
SLIDE 12

2 Errors in Variables

2.1 The Model

Suppose that the equation for earnings is given by:

\[ Y = X_1 \beta_1 + X_2 \beta_2 + U, \]

where $E(U \mid X_1, X_2) = 0$. Also define:

\[ X_1^* = X_1 + \varepsilon_1 \quad \text{and} \quad X_2^* = X_2 + \varepsilon_2. \]

slide-13
SLIDE 13

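Here, $X_1^*$ and $X_2^*$ are observed and measure $X_1$ and $X_2$ with error. We also impose that $(X_1, X_2) \perp\!\!\!\perp (\varepsilon_1, \varepsilon_2)$. So, our initial model can be equivalently re-written as:

\[ Y = X_1^* \beta_1 + X_2^* \beta_2 + (U - \varepsilon_1 \beta_1 - \varepsilon_2 \beta_2). \]

Finally, by the assumed independence of $X$ and $\varepsilon$, we write:

\[ \Sigma_{X^*} = \Sigma_X + \Sigma_\varepsilon, \]

where $\Sigma_{X^*}$, $\Sigma_X$ and $\Sigma_\varepsilon$ are the variance-covariance matrices of the observed regressors, the true regressors and the measurement errors.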

slide-14
SLIDE 14

2.2 McCallum’s Problem

Question: Is it better for estimation of $\beta_1$ to include other variables measured with error? Suppose that $X_1$ is not measured with error, in the sense that $\varepsilon_1 = 0$, while $X_2$ is measured with error. In 2.2.1 and 2.2.2 below, we consider both excluding and including $X_2^*$ and investigate the asymptotic properties of both cases.

2.2.1 Excluded $X_2$

The equation for earnings with omitted $X_2$ is:

\[ Y = X_1 \beta_1 + (U + X_2 \beta_2). \]

slide-15
SLIDE 15

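Therefore, by arguments similar to those in the appendix, we know:

\[ \operatorname{plim} \tilde\beta_1 = \beta_1 + \frac{\sigma_{12}}{\sigma_{11}} \beta_2. \tag{2.1} \]

Here, $\sigma_{12}$ is the covariance between the regressors, and $\sigma_{11}$ is the variance of $X_1$. Before moving on to a more general model for the inclusion of $X_2$, let us first consider the classical case for including both variables. Suppose

\[ \Sigma_\varepsilon = \begin{bmatrix} \sigma^{\varepsilon}_{11} & 0 \\ 0 & \sigma^{\varepsilon}_{22} \end{bmatrix}, \qquad \Sigma_X = \begin{bmatrix} \sigma_{11} & 0 \\ 0 & \sigma_{22} \end{bmatrix}. \]

We know that:

\[ \operatorname{plim} \hat\beta = \left[ (\Sigma_X + \Sigma_\varepsilon)^{-1} \Sigma_X \right] \beta, \tag{2.2} \]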

slide-16
SLIDE 16

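where the coefficient and regressor vectors have been stacked appropriately (see Appendix for derivation). Note that $\Sigma_\varepsilon$ (with elements $\sigma^{\varepsilon}_{ij}$) represents the variance-covariance matrix of the measurement errors, and $\Sigma_X$ (with elements $\sigma_{ij}$) is the variance-covariance matrix of the regressors. Straightforward computations thus give:

\[ \operatorname{plim} \hat\beta = \left[ \begin{bmatrix} \sigma_{11} + \sigma^{\varepsilon}_{11} & 0 \\ 0 & \sigma_{22} + \sigma^{\varepsilon}_{22} \end{bmatrix}^{-1} \begin{bmatrix} \sigma_{11} & 0 \\ 0 & \sigma_{22} \end{bmatrix} \right] \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix} = \begin{bmatrix} \dfrac{\sigma_{11}}{\sigma_{11} + \sigma^{\varepsilon}_{11}} & 0 \\ 0 & \dfrac{\sigma_{22}}{\sigma_{22} + \sigma^{\varepsilon}_{22}} \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}. \]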
slide-17
SLIDE 17

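2.2.2 Included $X_2$

In McCallum's problem we suppose that $\sigma^{\varepsilon}_{12} = 0$. Further, as $X_1$ is not measured with error, $\sigma^{\varepsilon}_{11} = 0$. (Here $\sigma_{ij}$ and $\sigma^{\varepsilon}_{ij}$ denote the elements of $\Sigma_X$ and $\Sigma_\varepsilon$.) Substituting this into equation (2.2) yields:

\[ \operatorname{plim} \hat\beta = \beta - \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} + \sigma^{\varepsilon}_{22} \end{bmatrix}^{-1} \begin{bmatrix} 0 & 0 \\ 0 & \sigma^{\varepsilon}_{22} \end{bmatrix} \beta. \]

With a little algebra, the above gives:

\[ \operatorname{plim} \hat\beta_1 = \beta_1 + \beta_2 \left( \frac{\sigma_{12}}{\sigma_{11}} \right) \frac{\sigma^{\varepsilon}_{22}}{\sigma_{22} + \sigma^{\varepsilon}_{22} - \dfrac{\sigma_{12}^2}{\sigma_{11}}} = \beta_1 + \beta_2 \left( \frac{\sigma_{12}}{\sigma_{11}} \right) \left( \frac{\sigma^{\varepsilon}_{22}}{\sigma_{22}(1 - \rho_{12}^2) + \sigma^{\varepsilon}_{22}} \right), \]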

slide-18
SLIDE 18

where $\rho_{12}^2$ is simply the squared correlation coefficient, $\rho_{12}^2 = \sigma_{12}^2 / (\sigma_{11} \sigma_{22})$. Further, we know that:

\[ 0 \le \rho_{12}^2 \le 1, \]

so including $X_2^*$ results in less asymptotic bias (inconsistency). (We get this result by comparing the above with the bias from excluding $X_2$ in Section 2.2.1, the result captured in equation (2.1).) So, we have justified the kitchen sink approach. This result generalizes to the multiple regressor case: one badly measured variable with good ones (Econometrica, 1972).
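The include-versus-exclude comparison can be reproduced numerically from equation (2.2). The sketch below hand-codes the 2×2 matrix algebra with the standard library; the covariance values and `beta` are hypothetical and chosen only for illustration.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(m, n):
    """Product of two 2x2 matrices."""
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def plim_beta(sigma_x, sigma_eps, beta):
    """plim of OLS on mismeasured regressors: (Sigma_X + Sigma_eps)^{-1} Sigma_X beta."""
    total = [[sigma_x[i][j] + sigma_eps[i][j] for j in range(2)] for i in range(2)]
    m = matmul2(inv2(total), sigma_x)
    return [m[i][0] * beta[0] + m[i][1] * beta[1] for i in range(2)]

# Hypothetical moments: X1 measured exactly, X2 measured with error.
sigma_x = [[1.0, 0.5], [0.5, 1.0]]
sigma_eps = [[0.0, 0.0], [0.0, 0.5]]
beta = [1.0, 1.0]

included = plim_beta(sigma_x, sigma_eps, beta)[0]
excluded = beta[0] + beta[1] * sigma_x[0][1] / sigma_x[0][0]   # equation (2.1)
print(f"plim beta1, X2* included: {included:.3f}; X2 excluded: {excluded:.3f}")
```

With these numbers the included-regressor probability limit is closer to the true $\beta_1 = 1$ than the excluded one, in line with the closed-form expression above.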

slide-19
SLIDE 19

2.3 General Case

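In the most general case, we have:

\[ \operatorname{plim} \hat\beta = (\Sigma_X + \Sigma_\varepsilon)^{-1} \Sigma_X \beta = \begin{bmatrix} \sigma_{11} + \sigma^{\varepsilon}_{11} & \sigma_{12} + \sigma^{\varepsilon}_{12} \\ \sigma_{12} + \sigma^{\varepsilon}_{12} & \sigma_{22} + \sigma^{\varepsilon}_{22} \end{bmatrix}^{-1} \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}. \]

With a little algebra we find:

\[ \det(\Sigma_X + \Sigma_\varepsilon) = \sigma_{11}\sigma_{22} + \sigma_{11}\sigma^{\varepsilon}_{22} + \sigma^{\varepsilon}_{11}\sigma_{22} + \sigma^{\varepsilon}_{11}\sigma^{\varepsilon}_{22} - \sigma_{12}^2 - 2\sigma_{12}\sigma^{\varepsilon}_{12} - (\sigma^{\varepsilon}_{12})^2. \]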

slide-20
SLIDE 20

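Therefore:

\[ \operatorname{plim} \hat\beta = \frac{1}{\det(\Sigma_X + \Sigma_\varepsilon)} \begin{bmatrix} \sigma_{22} + \sigma^{\varepsilon}_{22} & -(\sigma_{12} + \sigma^{\varepsilon}_{12}) \\ -(\sigma_{12} + \sigma^{\varepsilon}_{12}) & \sigma_{11} + \sigma^{\varepsilon}_{11} \end{bmatrix} \times \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}. \]

Supposing $\sigma^{\varepsilon}_{12} = 0$, we get:

\[ \det(\tilde\Sigma) = \det(\Sigma_X + \Sigma_\varepsilon) \Big|_{\sigma^{\varepsilon}_{12} = 0} = \sigma_{11}\sigma_{22} + \sigma_{11}\sigma^{\varepsilon}_{22} + \sigma^{\varepsilon}_{11}\sigma_{22} + \sigma^{\varepsilon}_{11}\sigma^{\varepsilon}_{22} - \sigma_{12}^2, \]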

slide-21
SLIDE 21

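and thus:

\[ \operatorname{plim} \hat\beta = \begin{bmatrix} \dfrac{(\sigma_{22} + \sigma^{\varepsilon}_{22})\sigma_{11} - \sigma_{12}^2}{\det(\tilde\Sigma)} & \dfrac{\sigma_{12}\sigma^{\varepsilon}_{22}}{\det(\tilde\Sigma)} \\[2ex] \dfrac{\sigma^{\varepsilon}_{11}\sigma_{12}}{\det(\tilde\Sigma)} & \dfrac{(\sigma_{11} + \sigma^{\varepsilon}_{11})\sigma_{22} - \sigma_{12}^2}{\det(\tilde\Sigma)} \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}, \]

where $\det(\tilde\Sigma)$ is the determinant of $\Sigma_X + \Sigma_\varepsilon$ evaluated at $\sigma^{\varepsilon}_{12} = 0$. Note that if $\beta_2 \sigma_{12} > 0$, OLS may not be downward biased for $\beta_1$. If $\beta_2 = 0$ we get:

\[ \operatorname{plim} \hat\beta_2 = \frac{\beta_1 \sigma_{12} \sigma^{\varepsilon}_{11}}{\det(\tilde\Sigma)}, \]

so, if $X_2$ were a race variable and blacks get lower quality schooling (where schooling is measured by $X_1^*$), then $\sigma_{12} < 0$ and hence $\operatorname{plim} \hat\beta_2 < 0$, even though $\beta_2 = 0$. This would be read as a finding in support of labor market discrimination.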

slide-22
SLIDE 22

2.4 The Kitchen Sink Revisited

McCallum's analysis suggests that one should toss in a variable measured with error if there is no measurement error in $X_1$. But suppose that there is measurement error in $X_1$. Is it still better to include the additional variable measured with error as a regressor? We proceed by imposing $\beta_2 = 0$.

(i) Excluded $X_2$. The equation for earnings with measurement error in $X_1$ and excluded $X_2$ is:

\[ Y = (X_1^* - \varepsilon_1)\beta_1 + (U + X_2\beta_2) = X_1^*\beta_1 + (U + X_2\beta_2 - \varepsilon_1\beta_1). \]

slide-23
SLIDE 23

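Therefore:

\[ \operatorname{plim} \tilde\beta_1 = \beta_1 - \beta_1 \left( \frac{\sigma^{\varepsilon}_{11}}{\sigma_{11} + \sigma^{\varepsilon}_{11}} \right) = \beta_1 \left( \frac{\sigma_{11}}{\sigma_{11} + \sigma^{\varepsilon}_{11}} \right) \tag{2.3} \]

\[ = \beta_1 \, \frac{1}{1 + \dfrac{\sigma^{\varepsilon}_{11}}{\sigma_{11}}}. \]

(ii) Included $X_2$. From our analysis in the General Case (Section 2.3), we know that:

\[ \operatorname{plim} \hat\beta_1 = \beta_1 \left( \frac{(\sigma_{22} + \sigma^{\varepsilon}_{22})\sigma_{11} - \sigma_{12}^2}{\det(\tilde\Sigma)} \right). \tag{2.4} \]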

slide-24
SLIDE 24

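If $\sigma^{\varepsilon}_{22} = 0$, so that $X_2$ is not measured with error:

\[ \operatorname{plim} \hat\beta_1 = \beta_1 \left( \frac{\sigma_{11}\sigma_{22} - \sigma_{12}^2}{\sigma_{11}\sigma_{22} - \sigma_{12}^2 + \sigma^{\varepsilon}_{11}\sigma_{22}} \right) \tag{2.5} \]

\[ = \beta_1 \left( \frac{1 - \rho_{12}^2}{1 - \rho_{12}^2 + \dfrac{\sigma^{\varepsilon}_{11}}{\sigma_{11}}} \right). \]

Comparing eqn (2.3) and eqn (2.5), we see that adding the variable measured without error exacerbates the bias whenever $\rho_{12}^2 > 0$.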

slide-25
SLIDE 25

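For, with $\beta_1 > 0$, the bias in the excluded case will be smaller if:

\[ \beta_1 \frac{1}{1 + \dfrac{\sigma^{\varepsilon}_{11}}{\sigma_{11}}} > \beta_1 \frac{1 - \rho_{12}^2}{1 - \rho_{12}^2 + \dfrac{\sigma^{\varepsilon}_{11}}{\sigma_{11}}} \]

\[ \iff \left( 1 - \rho_{12}^2 + \frac{\sigma^{\varepsilon}_{11}}{\sigma_{11}} \right) > \left( 1 + \frac{\sigma^{\varepsilon}_{11}}{\sigma_{11}} \right) \left( 1 - \rho_{12}^2 \right) \]

\[ \iff 0 > -\rho_{12}^2 \, \frac{\sigma^{\varepsilon}_{11}}{\sigma_{11}}, \]

which is always the case, provided $\rho_{12}^2 > 0$. (Note that the coefficients on $\beta_1$ for both the excluded and included case are less than one. So, the larger coefficient is the one with less bias, as stated above.)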

slide-26
SLIDE 26

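Now suppose that $\sigma^{\varepsilon}_{22} > 0$, so that both variables are measured with error. Then:

\[ \operatorname{plim} \hat\beta_1 = \beta_1 \left( \frac{(\sigma_{22} + \sigma^{\varepsilon}_{22})\sigma_{11} - \sigma_{12}^2}{\det(\tilde\Sigma)} \right) = \beta_1 \, \frac{1 + \dfrac{\sigma^{\varepsilon}_{22}}{\sigma_{22}} - \rho_{12}^2}{1 + \dfrac{\sigma^{\varepsilon}_{11}}{\sigma_{11}} + \dfrac{\sigma^{\varepsilon}_{11}}{\sigma_{11}} \dfrac{\sigma^{\varepsilon}_{22}}{\sigma_{22}} + \dfrac{\sigma^{\varepsilon}_{22}}{\sigma_{22}} - \rho_{12}^2}. \]

Intuitively, adding measurement error in $X_2$ can only worsen the bias, and thus exclusion should again be preferred to inclusion. Formally, including $X_2^*$ gives more bias if and only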

slide-27
SLIDE 27

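if:

\[ \beta_1 \, \frac{1 + \dfrac{\sigma^{\varepsilon}_{22}}{\sigma_{22}} - \rho_{12}^2}{1 + \dfrac{\sigma^{\varepsilon}_{11}}{\sigma_{11}} + \dfrac{\sigma^{\varepsilon}_{11}}{\sigma_{11}} \dfrac{\sigma^{\varepsilon}_{22}}{\sigma_{22}} + \dfrac{\sigma^{\varepsilon}_{22}}{\sigma_{22}} - \rho_{12}^2} < \beta_1 \, \frac{1}{1 + \dfrac{\sigma^{\varepsilon}_{11}}{\sigma_{11}}} \]

\[ \iff \left( 1 + \frac{\sigma^{\varepsilon}_{11}}{\sigma_{11}} \right) \left( 1 + \frac{\sigma^{\varepsilon}_{22}}{\sigma_{22}} - \rho_{12}^2 \right) < \left( 1 + \frac{\sigma^{\varepsilon}_{11}}{\sigma_{11}} + \frac{\sigma^{\varepsilon}_{11}}{\sigma_{11}} \frac{\sigma^{\varepsilon}_{22}}{\sigma_{22}} + \frac{\sigma^{\varepsilon}_{22}}{\sigma_{22}} - \rho_{12}^2 \right) \]

\[ \iff \rho_{12}^2 \, \frac{\sigma^{\varepsilon}_{11}}{\sigma_{11}} > 0. \]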

slide-28
SLIDE 28

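Thus, provided $\rho_{12}^2 > 0$, including $X_2^*$ results in more bias than excluding it. If $\rho_{12}^2 = 0$, the probability limit from including $X_2^*$ is readily seen to be:

\[ \beta_1 \, \frac{1 + \dfrac{\sigma^{\varepsilon}_{22}}{\sigma_{22}}}{1 + \dfrac{\sigma^{\varepsilon}_{11}}{\sigma_{11}} + \dfrac{\sigma^{\varepsilon}_{11}}{\sigma_{11}} \dfrac{\sigma^{\varepsilon}_{22}}{\sigma_{22}} + \dfrac{\sigma^{\varepsilon}_{22}}{\sigma_{22}}} = \beta_1 \, \frac{1 + \dfrac{\sigma^{\varepsilon}_{22}}{\sigma_{22}}}{\left( 1 + \dfrac{\sigma^{\varepsilon}_{22}}{\sigma_{22}} \right) \left( 1 + \dfrac{\sigma^{\varepsilon}_{11}}{\sigma_{11}} \right)} = \beta_1 \, \frac{1}{1 + \dfrac{\sigma^{\varepsilon}_{11}}{\sigma_{11}}}, \]

so that including and excluding $X_2^*$ yields the same result.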
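The comparison of equations (2.3) and (2.4) can be tabulated with two closed-form helpers. This is a sketch only: the function names and the grid of variances are hypothetical, `e11` and `e22` stand for the measurement error variances of $X_1$ and $X_2$, and $\beta_2 = 0$ is imposed as in the text.

```python
def plim_excluded(beta1, s11, e11):
    """Equation (2.3): X2 left out, X1 measured with error variance e11."""
    return beta1 * s11 / (s11 + e11)

def plim_included(beta1, s11, s22, s12, e11, e22):
    """Equation (2.4): X2* included, measurement errors uncorrelated."""
    det = (s11 + e11) * (s22 + e22) - s12 ** 2
    return beta1 * ((s22 + e22) * s11 - s12 ** 2) / det

beta1, s11, s22, e11 = 1.0, 1.0, 1.0, 0.5   # hypothetical values
rows = []
for s12 in (0.0, 0.5):            # covariance between the true regressors
    for e22 in (0.0, 0.5, 2.0):   # measurement error variance in X2
        inc = plim_included(beta1, s11, s22, s12, e11, e22)
        exc = plim_excluded(beta1, s11, e11)
        rows.append((s12, e22, inc, exc))
        print(f"s12={s12:.1f} e22={e22:.1f}  included={inc:.4f}  excluded={exc:.4f}")
```

With uncorrelated regressors the two probability limits coincide; with correlated regressors the included case is further from $\beta_1$ for every error variance in the grid.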

slide-29
SLIDE 29

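Finally, from the General Case section (with $\sigma^{\varepsilon}_{12} = 0$), we have:

\[ \operatorname{plim} \hat\beta_1 = \frac{\beta_1 \left[ (\sigma_{22} + \sigma^{\varepsilon}_{22})\sigma_{11} - \sigma_{12}^2 \right] + \beta_2 \left( \sigma_{12}\sigma^{\varepsilon}_{22} \right)}{\sigma_{11}\sigma_{22} - \sigma_{12}^2 + \sigma_{11}\sigma^{\varepsilon}_{22} + \sigma^{\varepsilon}_{11}\sigma_{22} + \sigma^{\varepsilon}_{11}\sigma^{\varepsilon}_{22}}. \]

L'Hôpital's rule on the above shows that:

\[ \lim_{\sigma^{\varepsilon}_{11} \to \infty} \left( \operatorname{plim} \hat\beta_1 \right) = 0 \]

and

\[ \lim_{\sigma^{\varepsilon}_{22} \to \infty} \left( \operatorname{plim} \hat\beta_1 \right) = \frac{\beta_1\sigma_{11} + \beta_2\sigma_{12}}{\sigma_{11} + \sigma^{\varepsilon}_{11}} = \frac{\beta_1\sigma_{11}}{\sigma_{11} + \sigma^{\varepsilon}_{11}} + \frac{\beta_2\sigma_{12}}{\sigma_{11} + \sigma^{\varepsilon}_{11}}. \]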

slide-30
SLIDE 30

Appendix

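Derivation of Equation (2.2)

We can write $Y = X^*\beta + (U - \varepsilon_1\beta_1 - \varepsilon_2\beta_2)$, where:

\[ X^* = \begin{bmatrix} X_1^* & X_2^* \end{bmatrix} \quad \text{and} \quad \beta = \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}, \]

and $X_1^*$, $X_2^*$ are $N \times 1$.

slide-31
SLIDE 31

So:

\[ \hat\beta = \left( X^{*\prime} X^* \right)^{-1} \left( X^{*\prime} Y \right) = \beta + \left( X^{*\prime} X^* \right)^{-1} \left( X^{*\prime} (U - \varepsilon_1\beta_1 - \varepsilon_2\beta_2) \right) \]

\[ = \beta + \left( \frac{X^{*\prime} X^*}{N} \right)^{-1} \times \left( \frac{X^{*\prime} U}{N} - \frac{X^{*\prime} \varepsilon_1}{N}\beta_1 - \frac{X^{*\prime} \varepsilon_2}{N}\beta_2 \right), \]

so that

\[ \operatorname{plim} \hat\beta = \beta + \left( \operatorname{plim} \frac{X^{*\prime} X^*}{N} \right)^{-1} \times \left( \operatorname{plim} \frac{X^{*\prime} U}{N} - \operatorname{plim} \left( \frac{X^{*\prime} \varepsilon_1}{N} \right) \beta_1 - \operatorname{plim} \left( \frac{X^{*\prime} \varepsilon_2}{N} \right) \beta_2 \right) \]

slide-32
SLIDE 32

\[ = \beta - \begin{bmatrix} \mathrm{Cov}(X_1^*, X_1^*) & \mathrm{Cov}(X_1^*, X_2^*) \\ \mathrm{Cov}(X_2^*, X_1^*) & \mathrm{Cov}(X_2^*, X_2^*) \end{bmatrix}^{-1} \times \left( \begin{bmatrix} \mathrm{Cov}(X_1^*, \varepsilon_1) \\ \mathrm{Cov}(X_2^*, \varepsilon_1) \end{bmatrix} \beta_1 + \begin{bmatrix} \mathrm{Cov}(X_1^*, \varepsilon_2) \\ \mathrm{Cov}(X_2^*, \varepsilon_2) \end{bmatrix} \beta_2 \right) \]

\[ = \beta - \begin{bmatrix} \mathrm{Cov}(X_1^*, X_1^*) & \mathrm{Cov}(X_1^*, X_2^*) \\ \mathrm{Cov}(X_2^*, X_1^*) & \mathrm{Cov}(X_2^*, X_2^*) \end{bmatrix}^{-1} \times \begin{bmatrix} \mathrm{Cov}(X_1^*, \varepsilon_1) & \mathrm{Cov}(X_1^*, \varepsilon_2) \\ \mathrm{Cov}(X_2^*, \varepsilon_1) & \mathrm{Cov}(X_2^*, \varepsilon_2) \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix} \]

slide-33
SLIDE 33

\[ = \left( I - (\Sigma_{X^*})^{-1} \begin{bmatrix} \mathrm{Cov}\big((X_1 + \varepsilon_1), \varepsilon_1\big) & \mathrm{Cov}\big((X_1 + \varepsilon_1), \varepsilon_2\big) \\ \mathrm{Cov}\big((X_2 + \varepsilon_2), \varepsilon_1\big) & \mathrm{Cov}\big((X_2 + \varepsilon_2), \varepsilon_2\big) \end{bmatrix} \right) \times \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix} = \left( I - (\Sigma_{X^*})^{-1} \Sigma_\varepsilon \right) \beta = \left( (\Sigma_{X^*})^{-1} (\Sigma_X) \right) \beta, \]

where $\Sigma_{X^*} = \Sigma_X + \Sigma_\varepsilon$ and the second-to-last step follows from the independence of $X$ and $\varepsilon$. This type of argument is also used to derive the probability limit of the $\hat\beta$'s in Section 1.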

slide-34
SLIDE 34

3 Sibling Models: Components of Vari- ance Scheme

Suppose that data on two brothers, say $a$ and $b$, are at our disposal. Without loss of generality, we will consider how to estimate parameters of interest for person $a$ in what follows. We will begin by introducing a general model and then focus on the two-person case mentioned above. Consider the following triangular system:

\[ Y_1 = U_1, \qquad Y_2 = \beta_{12} Y_1 + U_2, \qquad Y_3 = \beta_{13} Y_1 + \beta_{23} Y_2 + U_3. \]

slide-35
SLIDE 35

Here, indexes the person in the group. We assume that and 0 are uncorrelated (i.e., uncorrelated across groups). Further, we suppose:

  • =

+

  • =

+ , for = 1 2 3. We assume is uncorrelated across equations and across within the group, is i.i.d. across groups, and is i.i.d. within groups and uncorrelated with . 35

slide-36
SLIDE 36

3.1 Estimation

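We specialize the above model into a two person framework and propose a similar three equation system. Let $Y_1$ = early (pre-school) test score, $Y_2$ = schooling (years), and $Y_3$ = earnings. It seems plausible to write the equation system

\[ Y_1 = A + \varepsilon_1, \qquad Y_2 = \lambda_2 A + \varepsilon_2, \qquad Y_3 = \beta_{23} Y_2 + \lambda_3 A + \varepsilon_3, \]

where $A$ = ability. Regressing $Y_3$ on $Y_2$ clearly gives biased estimates of $\beta_{23}$, as $E(A \mid Y_2) \neq 0$. If $\lambda_3 > 0$ (and $\lambda_2 > 0$), then OLS estimates of $\beta_{23}$ are upward biased. One estimation approach is to use $Y_1$ as a proxy for ability:

\[ Y_3 = \beta_{23} Y_2 + \lambda_3 (Y_1 - \varepsilon_1) + \varepsilon_3. \]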

slide-37
SLIDE 37

However, this results in a similar problem: regressing $Y_3$ on $Y_1$ and $Y_2$ will give biased estimates, as $Y_1$ is correlated with our residual (i.e., $Y_1$ is an imperfect proxy).

Solutions: One solution is to use the sibling's test score $Y_1'$ as an instrument for the person's own $Y_1$. Why is this a valid IV? From our construction of the model, we know that the $\varepsilon$ are uncorrelated across equations and groups. Further, test scores are correlated across siblings. That is, $\mathrm{Cov}(Y_1, Y_1') \neq 0$ by our group structure. Another solution is possible if there exists an additional early reading on the same person:

\[ Y_0 = \lambda_0 A + \varepsilon_0. \]

Then if $\lambda_0 \neq 0$, $Y_0$ is a valid instrument for $Y_1$ and we can perform 2SLS.
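The sibling-instrument idea can be sketched in a simulation. Everything numerical here is an assumption made for illustration: ability is taken to be a family component plus an individual component, all errors are standard normal, and `b23`, `lam2`, `lam3` are hypothetical values. The IV estimator solves the exactly identified system with instruments (own schooling, sibling's test score) for regressors (own schooling, own test score).

```python
import random

def cov(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def simulate_sibling_iv(n=100_000, b23=0.10, lam2=1.0, lam3=0.5, seed=2):
    """Return (OLS slope of Y3 on Y2, IV estimate of b23, IV estimate of lam3)."""
    rng = random.Random(seed)
    y1, y1_sib, y2, y3 = [], [], [], []
    for _ in range(n):
        f = rng.gauss(0.0, 1.0)             # family ability component (assumption)
        A = f + rng.gauss(0.0, 1.0)         # own ability
        A_sib = f + rng.gauss(0.0, 1.0)     # sibling's ability
        s = lam2 * A + rng.gauss(0.0, 1.0)  # Y2: schooling
        y1.append(A + rng.gauss(0.0, 1.0))  # Y1: own early test score
        y2.append(s)
        y3.append(b23 * s + lam3 * A + rng.gauss(0.0, 1.0))  # Y3: earnings
        y1_sib.append(A_sib + rng.gauss(0.0, 1.0))           # sibling's test score
    ols = cov(y2, y3) / cov(y2, y2)
    # Solve (Z'X) b = Z'y with X = (Y2, Y1) and Z = (Y2, sibling's Y1).
    a11, a12 = cov(y2, y2), cov(y2, y1)
    a21, a22 = cov(y1_sib, y2), cov(y1_sib, y1)
    c1, c2 = cov(y2, y3), cov(y1_sib, y3)
    det = a11 * a22 - a12 * a21
    iv_b23 = (c1 * a22 - a12 * c2) / det
    iv_lam3 = (a11 * c2 - a21 * c1) / det
    return ols, iv_b23, iv_lam3

ols, iv_b23, iv_lam3 = simulate_sibling_iv()
print(f"OLS b23: {ols:.3f}   IV b23: {iv_b23:.3f}   IV lam3: {iv_lam3:.3f}")
```

Under these assumptions OLS overstates the schooling coefficient, while the sibling-instrumented estimates land near the values used to generate the data.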

slide-38
SLIDE 38

3.2 Griliches and Chamberlain model

Here we have a modified triangular system as follows:

\[ Y_1 = \lambda_1 A + \varepsilon_1, \qquad Y_2 = \beta_{12} Y_1 + \lambda_2 A + \varepsilon_2, \qquad Y_3 = \beta_{13} Y_1 + \beta_{23} Y_2 + \lambda_3 A + \varepsilon_3, \]

where $Y_1$ = years schooling, $Y_2$ = late test score (SAT), and $Y_3$ = earnings. Note that there are alternative models with other dependent variables. For example, {$Y_1$ = schooling, $Y_2$ = early earnings, and $Y_3$ = late earnings}, and {$Y_1$ = schooling, $Y_2$ = consumption, and $Y_3$ = earnings}. Getting the equation system into reduced form and expressing it in matrix notation, we write

\[ Y = \Lambda A + \nu, \]

slide-39
SLIDE 39

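where:

\[ \Lambda = \begin{bmatrix} \Lambda_1 \\ \Lambda_2 \\ \Lambda_3 \end{bmatrix} = \begin{bmatrix} \lambda_1 \\ \lambda_2 + \beta_{12}\lambda_1 \\ \lambda_3 + \beta_{13}\lambda_1 + \beta_{23}(\lambda_2 + \beta_{12}\lambda_1) \end{bmatrix} \]

and:

\[ \nu = \begin{bmatrix} \nu_1 \\ \nu_2 \\ \nu_3 \end{bmatrix} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 + \beta_{12}\varepsilon_1 \\ \varepsilon_3 + \beta_{13}\varepsilon_1 + \beta_{23}(\varepsilon_2 + \beta_{12}\varepsilon_1) \end{bmatrix}. \]

Estimation. For estimation, we impose that $\beta_{23} = 0$. In our second example of Section 3.2, this would be equivalent to stating that there is no correlation between transient income and consumption (permanent income hypothesis). In general, with one factor, we need one more exclusion than that implied by triangularity.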

slide-40
SLIDE 40

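(i) $Y_1$ proxies $A$. Write the reduced form as $Y_k = \Lambda_k A + \nu_k$, with $\Lambda_1 = \lambda_1$, $\Lambda_2 = \lambda_2 + \beta_{12}\lambda_1$, $\nu_1 = \varepsilon_1$ and $\nu_2 = \varepsilon_2 + \beta_{12}\varepsilon_1$. Then $A = (Y_1 - \nu_1)/\Lambda_1$, so that

\[ Y_2 = \frac{\Lambda_2}{\Lambda_1} Y_1 - \frac{\Lambda_2}{\Lambda_1} \nu_1 + \nu_2. \]

We can then estimate $\Lambda_2/\Lambda_1$ consistently by using the sibling's $Y_1$ as an instrument for $Y_1$ in the equation above.

(ii) Get residuals from (i): $w = Y_2 - \dfrac{\Lambda_2}{\Lambda_1} Y_1$.

(iii) Use the residuals $w$ as an instrument for $Y_1$ in the $Y_3$ equation. $w$ is valid since it is both uncorrelated with $A$ and $\varepsilon_3$ and it is correlated with $Y_1$:

slide-41
SLIDE 41

\[ \mathrm{Cov}(Y_1, w) = \mathrm{Cov}\left( Y_1,\; Y_2 - \frac{\Lambda_2}{\Lambda_1} Y_1 \right) = \mathrm{Cov}\left( \lambda_1 A + \varepsilon_1,\; \lambda_2 A + \beta_{12} Y_1 + \varepsilon_2 - \frac{\lambda_2 + \beta_{12}\lambda_1}{\lambda_1} Y_1 \right) \]

\[ = \mathrm{Cov}\left( \lambda_1 A + \varepsilon_1,\; \varepsilon_2 - \frac{\lambda_2}{\lambda_1} \varepsilon_1 \right) \neq 0 \]

if $\lambda_1 \neq 0$ and $\lambda_2 \neq 0$. Thus we can estimate $\beta_{13}$.

(iv) Interchange the role of $Y_2$ and $Y_3$ to estimate $\beta_{12}$.

(v) Form the residual (and recall that $\beta_{13}$ is known and $\beta_{23} = 0$): $\tilde Y_3 = Y_3 - \beta_{13} Y_1 = \lambda_3 A + \varepsilon_3$.

slide-42
SLIDE 42

(vi) Use $Y_1$ as a proxy for ability. Substituting this into $\tilde Y_3$ gives:

\[ \tilde Y_3 = \frac{\lambda_3}{\lambda_1} Y_1 + \varepsilon_3 - \frac{\lambda_3}{\lambda_1} \varepsilon_1. \]

(vii) Now use the sibling's $Y_1$ as an instrument for $Y_1$ in the above to get an estimate of $\lambda_3/\lambda_1$.

(viii) Interchange the role of $Y_2$ and $Y_3$ to estimate $\lambda_2/\lambda_1$.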

slide-43
SLIDE 43

3.3 Triangular systems more generally

Without loss of generality, suppose that 2 is excluded from the equation of our system. (We are supposing the existence

  • f an extra exclusion than that implied by triangularity). We

seek to estimate the parameters of the system in equation as well as equations before and after Equation t.

  • i. Use 1 as a proxy for ability. Solving for and

substituting into the equation: = + 43

slide-44
SLIDE 44

We get: = 1 1 1 1 + and we are considering = 2 1 The ratio 1 can then be identified using 1 as an instrument for 1

  • ii. Form the residuals:

= 1 1 = 2 1 Now we have 2 IV’s ( 2 3 1) for the 2 independent variables in the equation ( 1 3 1) so we can consistently estimate the coecients in the equation. 44

slide-45
SLIDE 45

Equations before t.

  • iii. Form:
  • = 11 · · · 11

We can use 1 · · · 1

to form 1 purged

IV ’s and

is used as a proxy for unobserved abil-

ity, . In this way, we can estimate all of the pa- rameters in equations (Note the sequential

  • rder implicit in this triangular system. We must

first estimate before this step can be made.)

  • Example. Suppose 3 and

3 = 131 + 232 + 3 + 3. 45

slide-46
SLIDE 46

Use

= + as a proxy for . Substituting this into our

3 equation yields: 3 = 131 + 232 + 3

  • +

μ 3 3

. Observe that 1 and 2, are independent of our residual, but

  • is not. We can use

as an instrument for to estimate the

parameters above. This obviously generalizes for all equations less than . 46

slide-47
SLIDE 47

Equations after t.

  • iv. Assume identification for all equations through

via an exclusion restriction in equation .

  • Example. As an example, consider the following:

4 = 141 + 242 + 343 + 4 + 4 Define:

  • 2 2 121

3 3 131 232

Solving for 1 and 2 and substituting into the equation for 4 we find: 47

slide-48
SLIDE 48

4 = 141 + 242 + 34 (

3 + 131 + 232) + 4 + 4

= (14 + 3413) 1 + (24 + 3423) 2 + 34

3 + 4 + 4

= (14 + 3413) 1 + (24 + 3423) (

2 + 121)

+34

3 + 4 + 4

=

  • 141 +

24 2 + 34 3 + 4 + 4

where:

  • 14

= 14 +

2412 + 3413

  • 24

= 24 + 2334 Using 1as a proxy for and substituting we get: 4 = 11 +

24 2 + 34 3 +

μ 4 4 1 1 ¶ 48

slide-49
SLIDE 49

where 1 =

14 + 4

1 We can then use 1

2 and 3 as

instruments to get an estimate of 34 Define: ˜ 4 = 4 343 = 141 + 242 + 4 + 4 (Excluding 3 allows us to estimate the remaining parameters). Using

3 as a proxy for yields:

4 = 141 + 242 + 4 3

  • 3 +

μ 4 4 3 3 ¶ . We can then estimate 14 and 24 by using 1 2 and

3 as

an IV. We can continue estimating. For example, consider the 5 equation: (i) Rewrite in terms of 1

2 3 and 4

49

slide-50
SLIDE 50

(ii) Use 1 to proxy . (iii) Use a cross-member IV for 1 in addition to

  • = 2 3 4 which gives our estimate of 45

(iv) Now form ˜ 5 = 5 454 (v) With 4 excluded, we can use purged IV ’s on ˜ 5 as before. 50

slide-51
SLIDE 51

3.4 Comments

1. One needs to check the rank and order conditions for identification (this requires imposing an exclusion restriction).

2. Griliches and Chamberlain (IER, 1976) find a small ability bias: a difference in the third decimal place of the schooling coefficient.

slide-52
SLIDE 52

4 Twin Methods

Basic Principle: Monozygotic or MZ (identical) twins are more similar than Dizygotic or DZ (fraternal) twins. The key assumption is that if environmental factors are the same for both types of twins, then we can estimate genetic components to outcomes. 52

slide-53
SLIDE 53

4.1 Univariate Twin Model

Let $P$ = observed phenotypic variable, $G$ = unobserved genotype, and $E$ = environment. Further, suppose that we can write our model additively:

\[ P = G + E, \]

and assume independence of $G$ and $E$, so that $\sigma_P^2 = \sigma_G^2 + \sigma_E^2$. Now suppose that we have data on another individual:

\[ P' = G' + E'. \]

Then our phenotypic covariance is:

\[ \mathrm{Cov}(P, P') = \mathrm{Cov}(G, G') + \mathrm{Cov}(E, E'), \]

slide-54
SLIDE 54

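where we are imposing the assumption $\mathrm{Cov}(G, E') = \mathrm{Cov}(G', E) = 0$. Defining standardized forms and some simplifying notation, let

\[ \tilde P \equiv \frac{P}{\sigma_P}, \qquad \tilde G \equiv \frac{G}{\sigma_G}, \qquad \tilde E \equiv \frac{E}{\sigma_E}, \qquad h^2 \equiv \frac{\sigma_G^2}{\sigma_P^2}, \qquad e^2 \equiv \frac{\sigma_E^2}{\sigma_P^2}. \]

Thus, $P/\sigma_P = G/\sigma_P + E/\sigma_P$, which implies $\tilde P = h\tilde G + e\tilde E$. We can also derive the identity:

\[ h^2 + e^2 = \frac{\sigma_G^2}{\sigma_P^2} + \frac{\sigma_E^2}{\sigma_P^2} = 1, \]

where the last step follows from our assumption of independence. Now we wish to consider the correlation between observed phenotypes of our two individuals:

slide-55
SLIDE 55

\[ \rho = \mathrm{Cov}(\tilde P, \tilde P') = \mathrm{Cov}(h\tilde G + e\tilde E,\; h\tilde G' + e\tilde E') = h^2 \frac{\mathrm{Cov}(\tilde G, \tilde G')}{\mathrm{Var}(\tilde G)} + e^2 \frac{\mathrm{Cov}(\tilde E, \tilde E')}{\mathrm{Var}(\tilde E)} = h^2 \rho_G + e^2 \rho_E, \]

say, with $h^2$ and $e^2$ defined as above, and with $\rho_G$ and $\rho_E$ the genotypic and environmental correlations across the pair. We assume that $\rho_G^{MZ} = 1$ and that $\rho_G^{DZ} < 1$. That is, the genotypic variable is perfectly correlated among identical twins, but less than perfectly correlated among fraternal twins. Substituting this result into the above produces:

\[ \rho^{MZ} = h^2 + e^2 \rho_E^{MZ}, \qquad \rho^{DZ} = h^2 \rho_G^{DZ} + e^2 \rho_E^{DZ}. \]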

slide-56
SLIDE 56

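Therefore, writing $\rho$ for the phenotypic, $\rho_G$ for the genotypic and $\rho_E$ for the environmental correlation within MZ and DZ pairs:

\[ \rho^{MZ} - \rho^{DZ} = (1 - \rho_G^{DZ}) h^2 + (\rho_E^{MZ} - \rho_E^{DZ}) e^2 = (1 - \rho_G^{DZ}) h^2 + (\rho_E^{MZ} - \rho_E^{DZ})(1 - h^2), \]

where the last equality follows from our established identity. Solving for $h^2$, we find:

\[ h^2 = \frac{(\rho^{MZ} - \rho^{DZ}) - (\rho_E^{MZ} - \rho_E^{DZ})}{(1 - \rho_G^{DZ}) - (\rho_E^{MZ} - \rho_E^{DZ})}. \]

The only known on the right hand side of the above equality is the expression $(\rho^{MZ} - \rho^{DZ})$, which is simply the difference between the MZ and DZ correlation coefficients of the observed phenotypic variable. The remaining two expressions, $(1 - \rho_G^{DZ})$ and $(\rho_E^{MZ} - \rho_E^{DZ})$, cannot be computed, as they represent statistics on variables we don't observe.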

slide-57
SLIDE 57

One could impose $\rho_E^{MZ} = \rho_E^{DZ}$ (equal environmental correlations for identical and fraternal twins) so that:

\[ h^2 = \frac{\rho^{MZ} - \rho^{DZ}}{1 - \rho_G^{DZ}}. \]

The expression $\rho_G^{DZ}$ is a measure of how closely the genetic variable is correlated across our two observations. One could then guess or estimate a value for this parameter to derive corresponding estimates of $h^2$, the share of variance in the phenotypic variable that is explained by variance in the genetic component. Other studies have attempted to include $\mathrm{Cov}(G, E) \neq 0$, but this presents an identification problem. A typical value of the estimable portion of the above, $\rho^{MZ} - \rho^{DZ}$, is commonly reported in the literature to be 0.2.
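The heritability formula can be turned into a small calculator. This is a sketch only: the function and argument names are hypothetical, and the genotypic correlation of 0.5 for fraternal twins used in the example is an assumed value of the kind the text says must be guessed or estimated.

```python
def heritability(rho_mz, rho_dz, rho_g_dz, rho_e_mz=0.0, rho_e_dz=0.0):
    """Solve for h^2 from MZ and DZ phenotypic correlations.

    rho_g_dz is the assumed genotypic correlation for DZ twins; rho_e_mz and
    rho_e_dz are the (unobserved) environmental correlations, and only their
    difference matters.
    """
    env_gap = rho_e_mz - rho_e_dz
    denom = (1.0 - rho_g_dz) - env_gap
    if denom == 0:
        raise ValueError("h^2 is not identified for these inputs")
    return ((rho_mz - rho_dz) - env_gap) / denom

def phenotypic_correlations(h2, rho_g_dz, rho_e_mz, rho_e_dz):
    """Forward model: (rho_MZ, rho_DZ) implied by h^2 and the correlations."""
    e2 = 1.0 - h2
    return h2 + e2 * rho_e_mz, h2 * rho_g_dz + e2 * rho_e_dz

# Equal-environments case with the correlation gap of 0.2 quoted in the text.
h2_equal_env = heritability(rho_mz=0.7, rho_dz=0.5, rho_g_dz=0.5)

# Round trip with unequal environmental correlations (hypothetical values).
rho_mz, rho_dz = phenotypic_correlations(0.4, 0.5, 0.6, 0.3)
h2_recovered = heritability(rho_mz, rho_dz, 0.5, rho_e_mz=0.6, rho_e_dz=0.3)
print(h2_equal_env, h2_recovered)
```

The round trip shows that $h^2$ is recovered only when the environmental correlations are supplied; with them unknown, the observed gap alone does not pin down $h^2$.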