SLIDE 1

Microeconometrics

Module A: Non-continuous outcomes I

Alexander Ahammer

Department of Economics, Johannes Kepler University, Linz, Austria
Christian Doppler Laboratory Ageing, Health, and the Labor Market, Linz, Austria

This version: March 22, 2019

Alexander Ahammer (JKU) Non-continuous outcomes I 1 / 42

SLIDE 2

Non-continuous outcomes

Often outcome variables are limited: for example, they have only a small, finite number of possible realizations
In this case, it doesn’t make sense to treat them as roughly continuous

→ we have to look for alternatives to OLS

How to model

◮ binary choices
◮ multiple choices

Maximum likelihood

SLIDE 3

A.1

Binary choices

SLIDE 4

Binary choices

We have a sample of individuals $i \in \{1, 2, \dots, N\}$
For each $i$ we observe a binary variable

$$ Y = \begin{cases} 1 & \text{with probability } P(Y = 1) = P \\ 0 & \text{with probability } P(Y = 0) = 1 - P \end{cases} \tag{1} $$

$X$ is a row vector of $k$ potential factors that explain which outcome prevails.
For individual $i$ we observe the vector $X_i$
We are interested in the estimated effects of the factors $X$ on the probability of observing $Y = 1$,

$$ \gamma = \frac{\partial P}{\partial X'} \tag{2} $$

where $\gamma$ is a vector of $k$ marginal effects
Note that

$$ E(Y) = 1 \cdot P + 0 \cdot (1 - P) = P \tag{3} $$

SLIDE 5

The linear probability model (LPM)

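The LPM assumes that

$$ P = F(X, \beta) = X\beta \tag{4} $$

where $\beta$ is a column vector of parameters ($k$ slopes plus an intercept) and $X \in \mathbb{R}^{n \times (k+1)}$ includes a constant
Because of linearity, and using eq (3),

$$ Y = E(Y) + [Y - E(Y)] = P + [Y - E(Y)] = X\beta + \varepsilon \tag{5} $$

with

$$ \varepsilon = \begin{cases} 1 - X\beta & \text{with probability } P \\ -X\beta & \text{with probability } 1 - P \end{cases} \tag{6} $$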
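As a short worked step (not on the original slide), eq (6) together with $P = X\beta$ gives the conditional mean and variance of the LPM error. The variance depends on $X$, which is why the LPM error is heteroskedastic:

```latex
\begin{align*}
E(\varepsilon \mid X) &= P\,(1 - X\beta) + (1 - P)\,(-X\beta) = P - X\beta = 0, \\
\operatorname{Var}(\varepsilon \mid X) &= P\,(1 - X\beta)^2 + (1 - P)\,(X\beta)^2 = X\beta\,(1 - X\beta).
\end{align*}
```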

SLIDE 6

The linear probability model (LPM)

The marginal effect of $X$ on $P$ is

$$ \gamma = \frac{\partial P}{\partial X'} = \beta \tag{7} $$

which we can estimate using OLS.
How are partial effects $\beta_j$ for some $x_j$ interpreted?
If $x_j$ is non-binary, $\beta_j$ is the change in the probability of success given a one-unit increase in $x_j$.
If $x_j$ is binary, $\beta_j$ is the difference in the probability of success when $x_j$ switches from zero to one.

SLIDE 7

The linear probability model (LPM)

SLIDE 8

The linear probability model (LPM)

The LPM has several shortcomings:
Predictions are not bounded to $[0, 1]$

⇒ may yield nonsensical predictions

Heteroskedasticity is present by construction

⇒ easy fix: robust standard errors

LPM implicitly assumes that the partial effects of $X$ on $P$ are constant, regardless of the initial levels of $X$
Errors are not normally distributed

But also many advantages:
Easy to compute and interpret.
Estimates are easily comparable with linear estimates of continuous outcomes.
Widely accepted in the applied econometrics literature.

SLIDE 9

Nonlinear probability models

General class of nonlinear binary choice models
Instead of assuming that $P$ is linear in parameters, we can consider a class of binary response models of the form

$$ P = P(Y = 1) = F(X\beta) \tag{8} $$

where $F(\cdot)$ is a symmetric cumulative distribution function taking on values strictly between 0 and 1. That is, $0 < F(z) < 1$ for all $z \in \mathbb{R}$.
Let’s introduce an unobservable index function,

$$ Y^* = X\beta + \varepsilon \tag{9} $$

with

$$ Y = \begin{cases} 1 & \text{if } Y^* \geq 0 \\ 0 & \text{if } Y^* < 0 \end{cases} \tag{10} $$

(note that the choice of the threshold is irrelevant as long as $X$ includes a constant)

SLIDE 10

Nonlinear probability models

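General class of nonlinear binary choice models

$$ Y^* = X\beta + \varepsilon, \qquad Y = \begin{cases} 1 & \text{if } Y^* \geq 0 \\ 0 & \text{if } Y^* < 0 \end{cases} \tag{11} $$

How can we express the probabilities of the two outcomes in such a model?

$$ P(Y = 1) = P(Y^* \geq 0) = P(X\beta + \varepsilon \geq 0) = P(\varepsilon \geq -X\beta) = P(\varepsilon \leq X\beta) = F(X\beta) \tag{12} $$

where the step $P(\varepsilon \geq -X\beta) = P(\varepsilon \leq X\beta)$ uses the symmetry of the distribution of $\varepsilon$, so that eq (12) is the cdf of $\varepsilon$ evaluated at $X\beta$. Accordingly, $P(Y = 0) = 1 - F(X\beta)$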
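The link between the latent index and the choice probability can be checked by simulation. The sketch below is illustrative only: it assumes a standard logistic error (one admissible symmetric choice for the distribution of ε) and a made-up index value `xb`; neither comes from the slides.

```python
import math
import random

def logistic_cdf(z):
    """Logistic cdf, one admissible symmetric F."""
    return 1.0 / (1.0 + math.exp(-z))

def draw_logistic(rng):
    """Draw a standard logistic error by inverting the cdf."""
    u = rng.random()
    while u == 0.0:          # avoid log(0); random() can return exactly 0.0
        u = rng.random()
    return math.log(u / (1.0 - u))

def simulated_share(xb, n_draws=200_000, seed=1):
    """Share of draws with Y* = xb + eps >= 0, i.e. the simulated P(Y = 1)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_draws) if xb + draw_logistic(rng) >= 0)
    return hits / n_draws

xb = 0.7  # hypothetical value of the index X*beta
share = simulated_share(xb)
print(round(share, 3), round(logistic_cdf(xb), 3))
```

The simulated share of Y = 1 should be close to F(Xβ), up to simulation noise.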

SLIDE 11

Nonlinear probability models

General class of nonlinear binary choice models
Under these assumptions, the marginal effect of $X$ on $P$ is

$$ \gamma = \frac{\partial P}{\partial X'} = F'(X\beta)\,\beta = f(X\beta)\,\beta \tag{13} $$

where $f$ is the density function associated with $F$
Note that
$F'$ and $f$ are scalar functions of $X\beta$
Unlike in the LPM, in this model $\beta$ alone is not sufficient to compute a marginal effect: $\gamma$ has to be evaluated at some realization of $X$

SLIDE 12

Maximum likelihood

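The maximum likelihood estimator (MLE) $\hat\theta_{ML} = \hat\theta$ of a parameter $\theta$ is obtained by maximizing the likelihood

$$ \hat\theta = \arg\max_{\theta} L(\theta) \tag{14} $$

where $L(\theta) = f(X, \theta)$ is the density of the data $X$, viewed as a function of $\theta$.
Typically it is more convenient to maximize the log-likelihood $l(\theta) = \log L(\theta)$. Because of monotonicity of the log,

$$ \hat\theta = \arg\max_{\theta} L(\theta) = \arg\max_{\theta} l(\theta) \tag{15} $$

For an iid sample $i = 1, \dots, n$ with probability density $f(X_i, \theta)$ of $X_i$,

$$ L(\theta) = \prod_{i=1}^{n} f(X_i, \theta) \quad \text{and} \quad l(\theta) = \sum_{i=1}^{n} \log f(X_i, \theta) \tag{16} $$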

SLIDE 13

Maximum likelihood

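Example
Let $X \sim \text{Binomial}(n, \pi)$ with realization $x$. Then,

$$ L(\pi) = \binom{n}{x} \pi^{x} (1 - \pi)^{n - x} \propto \pi^{x} (1 - \pi)^{n - x} \tag{17} $$

because the multiplicative constant can be ignored. The log-likelihood (up to an additive constant) is

$$ l(\pi) = \log L(\pi) = x \log \pi + (n - x) \log(1 - \pi) \tag{18} $$

and the maximum likelihood estimator solves the first-order condition

$$ l'(\pi) = \frac{x}{\pi} - \frac{n - x}{1 - \pi} = 0 \tag{19} $$

$$ \Longrightarrow \quad \hat\pi_{ML} = \frac{x}{n} \tag{20} $$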
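The closed-form result can be confirmed numerically. This is a minimal sketch with made-up data (37 successes in 100 trials): it maximizes the log-likelihood in eq (18) by brute force over a grid and compares the result with x/n.

```python
import math

def binomial_loglik(pi, x, n):
    """Log-likelihood from eq (18), dropping the binomial coefficient."""
    return x * math.log(pi) + (n - x) * math.log(1.0 - pi)

def grid_argmax(x, n, steps=9_999):
    """Brute-force maximizer over an interior grid of pi values."""
    grid = [(k + 1) / (steps + 1) for k in range(steps)]
    return max(grid, key=lambda p: binomial_loglik(p, x, n))

x, n = 37, 100  # hypothetical data: 37 successes in 100 trials
pi_grid = grid_argmax(x, n)
pi_closed_form = x / n
print(pi_grid, pi_closed_form)
```

A grid search is used only for transparency; it is not how statistical software maximizes a likelihood.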

SLIDE 14

Maximum likelihood

Properties
In plain English, MLE selects coefficients $\theta$ so as to maximize the joint likelihood of the sample data, i.e., ‘maximize the likelihood that the process described by the model produced the data that we actually observe’
MLE generally does not have an analytic solution as OLS does. It is an extremum estimator: statistical software uses iterative numerical procedures to find a vector of coefficients $\hat\theta$ that solves the maximization problem in eq (15)
In finite samples MLE may perform poorly, but as $n \to \infty$ MLE is both consistent and asymptotically efficient (under standard regularity conditions)
Please refer to the other resources for more details, especially on the derivation of test statistics

SLIDE 15

Nonlinear probability models

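Let’s return to binary choice models. We assumed $P = F(X\beta)$ and $\gamma = f\beta$
The likelihood function is

$$ L = P(Y_1 = y_1, Y_2 = y_2, \dots, Y_n = y_n) = \prod_{y_i = 0} [1 - F(X_i\beta)] \prod_{y_i = 1} F(X_i\beta) = \prod_{i=1}^{n} [1 - F(X_i\beta)]^{1 - y_i} F(X_i\beta)^{y_i} \tag{21} $$

Taking logs,

$$ l = \sum_{i=1}^{n} \left[ (1 - y_i) \ln(1 - F(X_i\beta)) + y_i \ln F(X_i\beta) \right] \tag{22} $$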

SLIDE 16

Nonlinear probability models

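The first-order condition (foc) for maximization is

$$ \frac{\partial l}{\partial \beta'} = \sum_{i=1}^{n} \left[ \frac{y_i f(X_i\beta)}{F(X_i\beta)} + \frac{-(1 - y_i) f(X_i\beta)}{1 - F(X_i\beta)} \right] X_i = 0 \tag{23} $$

The solution of the system in eq (23) gives the vector of ML estimates $\hat\beta$
The asymptotic covariance matrix $V$ of $\hat\beta$ is the negative inverse of the Hessian $H$ of the log-likelihood,

$$ V = -H^{-1} = -\left[ \frac{\partial^2 \ln L}{\partial \beta \, \partial \beta'} \right]^{-1} \tag{24} $$

which is a $k \times k$ matrix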
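To make the iterative solution of eq (23) concrete, here is a minimal Newton-Raphson sketch for a logit with a constant and one regressor. It is an illustration under stated assumptions, not the routine any particular software uses: the data are made up, F is taken to be logistic, and the 2×2 Hessian is inverted by hand.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def score_and_hessian(b0, b1, xs, ys):
    """Score (eq 23) and Hessian of the logit log-likelihood for P = F(b0 + b1*x)."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        p = logistic(b0 + b1 * x)
        g0 += y - p              # for the logit, eq (23) simplifies to sum (y_i - F_i) X_i
        g1 += (y - p) * x
        w = p * (1.0 - p)        # f(X_i beta) for the logistic distribution
        h00 -= w
        h01 -= w * x
        h11 -= w * x * x
    return (g0, g1), (h00, h01, h11)

def newton_logit(xs, ys, tol=1e-10, max_iter=50):
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (g0, g1), (h00, h01, h11) = score_and_hessian(b0, b1, xs, ys)
        det = h00 * h11 - h01 * h01
        # Newton step: beta_new = beta - H^{-1} g, with the 2x2 inverse written out
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 - d0, b1 - d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical data, deliberately not perfectly separated
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
ys = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

b0_hat, b1_hat = newton_logit(xs, ys)
(g0, g1), (h00, h01, h11) = score_and_hessian(b0_hat, b1_hat, xs, ys)
det = h00 * h11 - h01 * h01
se_b1 = math.sqrt(-h00 / det)   # square root of the slope entry of V = -H^{-1}, eq (24)
print(round(b0_hat, 4), round(b1_hat, 4), round(se_b1, 4))
```

At the solution the score is numerically zero, and the standard error of the slope comes from the diagonal of $-H^{-1}$.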

SLIDE 17

Nonlinear probability models

Coefficients and interpretation
Recall that

$$ \gamma = \frac{\partial P}{\partial X'} = F'(X\beta)\,\beta = f(X\beta)\,\beta \tag{25} $$

and that $f$ is a function of $X\beta$. Thus, to estimate the probability of the outcome and the marginal effects we need an estimate of $\beta$ and some realization of $X$.
Where should we evaluate the estimates of $P$ and $\gamma$? Compute $\hat P$ and $\hat\gamma$

1. for each $i$ and then take averages over all observations
2. at the sample mean of the observations $X_i$
3. at a particularly interesting value of $X$ (e.g., median)
4. for an artificially created individual with values of $X$ defined by us

where solutions 1 and 2 are often similar but generally do not coincide, because $f$ is nonlinear in $X\beta$.
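Options 1 and 2 above can be compared directly. The sketch uses a logit with made-up coefficients and a made-up sample of one regressor; it only illustrates that averaging marginal effects and evaluating at the mean are different calculations.

```python
import math

def logistic_pdf(z):
    """Logistic density f(z) = L(z) * (1 - L(z))."""
    p = 1.0 / (1.0 + math.exp(-z))
    return p * (1.0 - p)

beta0, beta1 = -1.0, 0.8               # hypothetical logit estimates
xs = [0.0, 0.5, 1.0, 2.0, 4.0, 6.0]    # hypothetical sample of one regressor

# Option 1: marginal effect for each i, then average (average marginal effect)
ame = sum(logistic_pdf(beta0 + beta1 * x) * beta1 for x in xs) / len(xs)

# Option 2: marginal effect at the sample mean of x
x_bar = sum(xs) / len(xs)
mem = logistic_pdf(beta0 + beta1 * x_bar) * beta1

print(round(ame, 4), round(mem, 4))
```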

SLIDE 18

Nonlinear probability models

Now we discuss the two most famous nonlinear probability models:
Probit
Logit

SLIDE 19

Probit

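Probit simply assumes that $F$ is the standard normal cdf,

$$ P(Y = 1) = F(X\beta) = \int_{-\infty}^{X\beta} \phi(t)\,dt = \Phi(X\beta) \tag{26} $$

Log-likelihood:

$$ l = \sum_{i=1}^{n} \left[ (1 - y_i) \ln(1 - \Phi(X_i\beta)) + y_i \ln \Phi(X_i\beta) \right] \tag{27} $$

Marginal effect:

$$ \gamma = \frac{\partial P}{\partial X'} = \phi(X\beta)\,\beta \tag{28} $$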
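The probit marginal effect in eq (28) can be checked against a numerical derivative of Φ(Xβ). The coefficients and the evaluation point below are made up for illustration; Φ is computed from the error function in the standard library.

```python
import math

def norm_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

beta = [-0.5, 0.3, 0.2]   # hypothetical coefficients: constant, x1, x2
x = [1.0, 1.2, -0.4]      # hypothetical realization of X (first entry is the constant)

xb = sum(b * v for b, v in zip(beta, x))
analytic = norm_pdf(xb) * beta[1]          # eq (28) for x1

h = 1e-6                                   # central finite difference in x1
up = norm_cdf(xb + beta[1] * h)
down = norm_cdf(xb - beta[1] * h)
numeric = (up - down) / (2.0 * h)

print(analytic, numeric)
```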

SLIDE 20

Logit

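Logit assumes that $F$ is the logistic cdf,

$$ P(Y = 1) = F(X\beta) = \frac{e^{X\beta}}{1 + e^{X\beta}} = \Lambda(X\beta) \tag{29} $$

Note that

$$ F'(X\beta) = f(X\beta) = \Lambda(X\beta)[1 - \Lambda(X\beta)] \tag{30} $$

Log-likelihood:

$$ l = \sum_{i=1}^{n} \left[ (1 - y_i) \ln(1 - \Lambda(X_i\beta)) + y_i \ln \Lambda(X_i\beta) \right] \tag{31} $$

Marginal effect:

$$ \gamma = \frac{\partial P}{\partial X'} = [\Lambda(1 - \Lambda)]\beta \tag{32} $$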
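For completeness, the derivative in eq (30) follows from the quotient rule. This step is added here and is not on the original slide; $z$ stands for $X\beta$:

```latex
\begin{align*}
\Lambda(z) &= \frac{e^{z}}{1 + e^{z}}, \\
\Lambda'(z) &= \frac{e^{z}(1 + e^{z}) - e^{z} e^{z}}{(1 + e^{z})^{2}}
             = \frac{e^{z}}{1 + e^{z}} \cdot \frac{1}{1 + e^{z}}
             = \Lambda(z)\,[1 - \Lambda(z)].
\end{align*}
```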

SLIDE 21

In practice

Suppose we are interested in the determinants of diabetes:

$$ P(\text{diabetes}_i = 1 \mid x_i) = G(\beta_0 + \beta_1 \text{age}_i + \beta_2 \text{female}_i + \beta_3 \text{black}_i + \beta_4 \text{bmi}_i) \tag{33} $$

. svy: reg diabetes age female black bmi
(running regress on estimation sample)

Survey: Linear regression

Number of strata =        31        Number of obs   =      10349
Number of PSUs   =        62        Population size =  117131111
                                    Design df       =         31
                                    F(4, 28)        =     110.06
                                    Prob > F        =     0.0000
                                    R-squared       =     0.0329

             |             Linearized
    diabetes |      Coef.   Std. Err.        t    P>|t|      [95% Conf. Interval]
-------------+-------------------------------------------------------------------
         age |    .001738    .0000991    17.54    0.000       .001536    .0019401
      female |   .0094066     .004069     2.31    0.028      .0011078    .0177053
       black |   .0275177    .0062667     4.39    0.000      .0147366    .0402987
         bmi |   .0025612    .0005366     4.77    0.000      .0014668    .0036556
       _cons |  -.1114554    .0139773    -7.97    0.000     -.1399624   -.0829484
SLIDE 22

In practice

The linear probability model (LPM)

[Figure: histogram of the LPM fitted values. y-axis: Density; x-axis: Fitted values (tick labels run from −.05 to .2).]

SLIDE 23

In practice

Probit estimation

. svy: probit diabetes age female black bmi
(running probit on estimation sample)

Survey: Probit regression

Number of strata =        31        Number of obs   =      10349
Number of PSUs   =        62        Population size =  117131111
                                    Design df       =         31
                                    F(4, 28)        =      72.72
                                    Prob > F        =     0.0000

             |             Linearized
    diabetes |      Coef.   Std. Err.        t    P>|t|      [95% Conf. Interval]
-------------+-------------------------------------------------------------------
         age |   .0256275     .001845    13.89    0.000      .0218647    .0293903
      female |   .1199636    .0610994     1.96    0.059     -.0046494    .2445766
       black |   .3275666    .0687353     4.77    0.000        .18738    .4677532
         bmi |   .0298292    .0048273     6.18    0.000      .0199839    .0396745
       _cons |  -3.944569    .1702002   -23.18    0.000     -4.291695   -3.597444

SLIDE 24

In practice

Interpretation of estimates
Marginal effects at the mean

. margins, dydx(*) atmeans

Conditional marginal effects                    Number of obs = 10349
Model VCE    : Linearized

Expression   : Pr(diabetes), predict()
dy/dx w.r.t. : age female black bmi
at           : age    = 42.25401 (mean)
               female = .5205417 (mean)
               black  = .0955274 (mean)
               bmi    = 25.27614 (mean)

             |            Delta-method
             |      dy/dx   Std. Err.        z    P>|z|      [95% Conf. Interval]
-------------+-------------------------------------------------------------------
         age |   .0013453    .0000659    20.43    0.000      .0012162    .0014744
      female |   .0062975    .0033057     1.91    0.057     -.0001816    .0127766
       black |   .0171956    .0039631     4.34    0.000      .0094282    .0249631
         bmi |   .0015659    .0002741     5.71    0.000      .0010286    .0021032
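As a cross-check (not part of the original slides), the marginal effects at the mean can be reproduced by hand from eq (28), using the probit coefficients and sample means printed in the Stata output. The sketch assumes that the bullet in front of the `_cons` estimate stands for a lost minus sign, and it treats all four regressors as continuous, since the model was specified without factor-variable notation.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# Probit coefficients as reported in the svy: probit output
# (the minus sign on _cons is assumed; the extraction shows a bullet there)
coef = {"age": 0.0256275, "female": 0.1199636, "black": 0.3275666,
        "bmi": 0.0298292, "_cons": -3.944569}

# Sample means as reported in the margins, atmeans output
means = {"age": 42.25401, "female": 0.5205417, "black": 0.0955274, "bmi": 25.27614}

xb_bar = coef["_cons"] + sum(coef[k] * means[k] for k in means)
scale = norm_pdf(xb_bar)                       # phi(X_bar * beta)
mem = {k: scale * coef[k] for k in means}      # eq (28) evaluated at the means

for name, value in mem.items():
    print(f"{name:7s} {value:.7f}")
```

Up to rounding of the inputs, the values should agree with the dy/dx column of the `margins, dydx(*) atmeans` table.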

SLIDE 25

In practice

Graphical illustration of marginal effects at the mean

[Figure, two panels, y-axis Pr(Diabetes) from .05 to .25 in both. Panel “Marginal effects for age”: x-axis age in years (20 to 75). Panel “Marginal effects for bmi”: x-axis Body Mass Index (BMI) (10 to 60).]

SLIDE 26

In practice

Interpretation of estimates
Average marginal effects

. margins, dydx(*)

Average marginal effects                        Number of obs = 10349
Model VCE    : Linearized

Expression   : Pr(diabetes), predict()
dy/dx w.r.t. : age female black bmi

             |            Delta-method
             |      dy/dx   Std. Err.        z    P>|z|      [95% Conf. Interval]
-------------+-------------------------------------------------------------------
         age |   .0017589    .0001102    15.96    0.000      .0015429    .0019749
      female |   .0082337    .0042298     1.95    0.052     -.0000566    .0165239
       black |   .0224824    .0049447     4.55    0.000      .0127909    .0321739
         bmi |   .0020473    .0003647     5.61    0.000      .0013326     .002762

SLIDE 27

Logit

The Logit is convenient for interpretation, as it allows results to be presented in terms of the effects of $X$ on the odds of the outcome $Y = 1$

$$ \Omega(Y = 1 \mid X) = \frac{P}{1 - P} = \frac{\Lambda}{1 - \Lambda} = e^{X\beta} \tag{34} $$

For two realizations of $X$, say $X_1$ and $X_0$, the odds ratio is

$$ \frac{\Omega(Y = 1 \mid X_1)}{\Omega(Y = 1 \mid X_0)} = e^{(X_1 - X_0)\beta} \tag{35} $$

which tells us how the odds of observing $Y = 1$ change when $X$ changes from $X_0$ to $X_1$
For a variable $j$, $e^{\beta_j}$ gives the factor by which the odds of observing $Y = 1$ are multiplied when $X_j$ increases by 1 unit (if $e^{\beta_j} > 1$, $j$ increases the odds of observing $Y = 1$ and vice versa)
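A quick numerical illustration of eq (35): in a logit, a one-unit increase in $X_j$ multiplies the odds by $e^{\beta_j}$ regardless of the starting value. The coefficients below are made up.

```python
import math

def logit_prob(xb):
    return 1.0 / (1.0 + math.exp(-xb))

def odds(p):
    return p / (1.0 - p)

beta0, beta_j = -2.0, 0.4   # hypothetical logit coefficients

ratios = []
for x in (0.0, 1.0, 5.0):   # different starting values of X_j
    before = odds(logit_prob(beta0 + beta_j * x))
    after = odds(logit_prob(beta0 + beta_j * (x + 1.0)))
    ratios.append(after / before)

print([round(r, 6) for r in ratios], round(math.exp(beta_j), 6))
```

The change in the probability itself, by contrast, does depend on the starting value.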

SLIDE 28

LPM vs. Probit vs. Logit

The estimated coefficients will clearly differ, but the marginal effects should be fairly similar in general
The logistic distribution has fatter tails than the normal
We should expect greater differences in case of very few or very many observations with $Y = 1$
Be careful with rare outcomes, maybe check other estimators!
Choice most often based on practical considerations
Check the Ai & Norton (2003) paper → be careful with interaction terms in nonlinear binary choice models

SLIDE 29

A.2

Multiple choices

SLIDE 30

Framework

Denote the set of decision makers by $i \in \{1, 2, \dots, N\}$
Finite set of mutually exclusive and exhaustive choices: $j \in \{0, 1, 2, \dots, H\}$
Assume that $U_{ij}$ is the utility of decision maker $i$ if the choice is $j$,

$$ U_{ij} = X_{ij}\beta_j + \varepsilon_{ij} \tag{36} $$

which is a function of

◮ a systematic component $X_{ij}\beta_j$
◮ a random unobservable component $\varepsilon_{ij}$

$Y_i$ is the indicator that denotes which option has been chosen by the decision maker:

$$ Y_i = j \quad \text{if } i \text{ chooses } j \tag{37} $$

Finally, decision makers maximize utility, therefore

$$ Y_i = j \quad \text{if} \quad U_{ij} > U_{is} \quad \text{for all } s \neq j \text{ in the choice set} \tag{38} $$

SLIDE 31

Logit

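$$ P_{ij} = P(Y_i = j) \tag{39} $$

$$ \begin{aligned} P_{ij} &= P(U_{ij} > U_{is}, \ \forall s \neq j) \\ &= P(X_{ij}\beta_j + \varepsilon_{ij} > X_{is}\beta_s + \varepsilon_{is}, \ \forall s \neq j) \\ &= P(\varepsilon_{is} - \varepsilon_{ij} < X_{ij}\beta_j - X_{is}\beta_s, \ \forall s \neq j) \end{aligned} \tag{40} $$

If the $\varepsilon_{ij}$ are iid type I extreme value, the probability that $j$ is chosen takes the logit form:

$$ P_{ij} = \frac{e^{X_{ij}\beta_j}}{\sum_{s=0}^{H} e^{X_{is}\beta_s}} \tag{41} $$

Note that

◮ $0 \leq P_{ij} \leq 1$
◮ $\sum_{j=0}^{H} P_{ij} = 1$

This generalizes the binary logit model to more than two alternatives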

SLIDE 32

Logit

Independence from Irrelevant Alternatives Property (IIA)
The logit probabilities exhibit the IIA property → the odds of two alternatives $j$ and $s$ do not depend on the other existing alternatives:

$$ \frac{P_{ij}}{P_{is}} = \frac{e^{X_{ij}\beta_j}}{e^{X_{is}\beta_s}} \tag{42} $$

depends only on $j$ and $s$
This is often undesirable

◮ Let $j$ = “car” and $s$ = “red bike”, suppose $P_{ij}/P_{is} = 1$
◮ Add new option $t$ = “blue bike”
◮ If the decision maker is indifferent with respect to color, we expect $P_{ij} = 0.5$ and $P_{is} = P_{it} = 0.25$
◮ However, the logit model still implies $P_{ij}/P_{is} = 1$
◮ In order to be compatible with $P_{is} = P_{it}$, the estimated probabilities must be $P_{ij} = P_{is} = P_{it} = \frac{1}{3}$, which is unsatisfactory!
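The car and bike example can be reproduced with a few lines. The utilities are made up (all set to zero so that the initial odds are 1); the function simply evaluates eq (41).

```python
import math

def logit_probs(utilities):
    """Choice probabilities from eq (41) for a dict {alternative: X*beta}."""
    denom = sum(math.exp(v) for v in utilities.values())
    return {name: math.exp(v) / denom for name, v in utilities.items()}

# Hypothetical systematic utilities; equal values give equal odds
two = logit_probs({"car": 0.0, "red bike": 0.0})
three = logit_probs({"car": 0.0, "red bike": 0.0, "blue bike": 0.0})

ratio_two = two["car"] / two["red bike"]
ratio_three = three["car"] / three["red bike"]
print(two, three)
```

Adding the blue bike leaves the car/red-bike odds unchanged and pushes every probability to one third, rather than splitting the bike share.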

SLIDE 33

Logit

Identification problems
Assume a (binary for simplicity) choice between Japanese ($j = 0$) and European ($j = 1$) cars, where $X_{ij}$ includes
a vector $Z_{ij}$ of variables that change across both individuals and choices
a vector $W_i$ of variables that change only across individuals
a set of choice-specific constants $\alpha_j$
Then, the probability of the European choice would be

$$ P_{i1} = P(Y_i = 1) = \frac{e^{\alpha_1 + Z_{i1}\gamma + W_i\delta}}{e^{\alpha_0 + Z_{i0}\gamma + W_i\delta} + e^{\alpha_1 + Z_{i1}\gamma + W_i\delta}} = \frac{1}{1 + e^{(\alpha_0 - \alpha_1) + (Z_{i0} - Z_{i1})\gamma}} \tag{43} $$

SLIDE 34

Logit

Identification problems

$$ P_{i1} = \frac{1}{1 + e^{(\alpha_0 - \alpha_1) + (Z_{i0} - Z_{i1})\gamma}} \tag{44} $$

The vector of parameters is $\beta_j' = \{\alpha_j, \gamma, \delta\}$, which differs between choices because of the different constants per choice. The parameters $\gamma$ and $\delta$ are assumed identical across choices
Identification problems:
The model cannot identify the effect $\delta$ of the decision maker’s attributes $W_i$
The model cannot identify the choice-specific constants but only the difference between them, $\alpha_0 - \alpha_1$
The model can identify the effects $\gamma$ of the choice-specific attributes even though they are identical across choices.
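The three identification statements can be verified numerically. The sketch evaluates the first expression in eq (43) with scalar attributes; every number in it is hypothetical.

```python
import math

def prob_european(alpha0, alpha1, z0, z1, w, gamma, delta):
    """P_i1 from the first expression in eq (43), scalar attributes for simplicity."""
    v0 = alpha0 + z0 * gamma + w * delta
    v1 = alpha1 + z1 * gamma + w * delta
    return math.exp(v1) / (math.exp(v0) + math.exp(v1))

# All numbers below are hypothetical
base = prob_european(alpha0=0.2, alpha1=0.5, z0=1.0, z1=1.8, w=3.0, gamma=0.7, delta=0.4)

# Changing delta leaves the probability unchanged: W_i*delta cancels
other_delta = prob_european(0.2, 0.5, 1.0, 1.8, 3.0, 0.7, delta=-2.0)

# Shifting both constants by the same amount also changes nothing
shifted = prob_european(0.2 + 5.0, 0.5 + 5.0, 1.0, 1.8, 3.0, 0.7, 0.4)

# Changing gamma does change the probability, so gamma is identified
other_gamma = prob_european(0.2, 0.5, 1.0, 1.8, 3.0, gamma=1.4, delta=0.4)

print(base, other_delta, shifted, other_gamma)
```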

SLIDE 35

Multinomial Logit

In the multinomial logit, the utility of each choice depends only on the attributes of the decision maker:

$$ U_{ij} = X_i\beta_j + \varepsilon_{ij} \tag{45} $$

For identification, the attributes are allowed to have different effects on the utility of the different choices
The probability of a choice becomes

$$ P_{ij} = \frac{e^{X_i\beta_j}}{\sum_{s=0}^{H} e^{X_i\beta_s}} = \frac{1}{\sum_{s=0}^{H} e^{X_i(\beta_s - \beta_j)}} \tag{46} $$

Important: only differences between parameters can be identified. Thus, it is convenient to impose a normalization with respect to a reference choice (e.g., $j = 0$)

SLIDE 36

Multinomial Logit

In plain words, you have to choose a reference group. Be careful, as coefficients are always interpreted with respect to the reference group.
Taking $j = 0$ as the reference means $\beta_0 = 0$, which implies $e^{X_i\beta_0} = 1$ and therefore

$$ P_{ij} = P(Y_i = j) = \frac{e^{X_i\beta_j}}{1 + \sum_{s=1}^{H} e^{X_i\beta_s}}, \quad j = 1, \dots, H \tag{47} $$

$$ P_{i0} = P(Y_i = 0) = \frac{1}{1 + \sum_{s=1}^{H} e^{X_i\beta_s}} \tag{48} $$

Note that if $H = 1$, we are back at the standard binary choice logit model
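Eqs (47) and (48) translate directly into code. The sketch below uses made-up values for $X_i$ and the $\beta_j$, with choice 0 as the reference, and checks that the H = 1 case reproduces the binary logit.

```python
import math

def mnl_probs(x, betas):
    """Eqs (47)-(48): betas[j-1] is the coefficient vector of choice j = 1..H; beta_0 = 0."""
    expo = [math.exp(sum(b * v for b, v in zip(beta_j, x))) for beta_j in betas]
    denom = 1.0 + sum(expo)
    return [1.0 / denom] + [e / denom for e in expo]

x = [1.0, 0.5]                       # hypothetical X_i (constant and one attribute)
betas = [[0.2, -1.0], [-0.4, 0.6]]   # hypothetical beta_1 and beta_2
probs = mnl_probs(x, betas)          # [P_i0, P_i1, P_i2]

# With H = 1 the formula collapses to the binary logit
binary = mnl_probs(x, [betas[0]])
xb = sum(b * v for b, v in zip(betas[0], x))
binary_logit = math.exp(xb) / (1.0 + math.exp(xb))

print(probs, binary, binary_logit)
```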

SLIDE 37

Multinomial Logit

The log-likelihood function of the multinomial logit is

$$ l = \sum_{i=1}^{N} \sum_{j=0}^{H} d_{ij} \ln P_{ij} \tag{49} $$

where $d_{ij}$ is 1 if $i$ chooses $j$ and 0 otherwise
Interpretation: Note that

$$ \ln \frac{P_{ij}}{P_{i0}} = X_i\beta_j \quad \text{and} \quad \ln \frac{P_{ij}}{P_{is}} = X_i(\beta_j - \beta_s) \tag{50} $$

◮ The coefficient $\beta_j$ measures the impact of $X_i$ on the log-odds that the decision maker chooses $j$ instead of 0.
◮ The difference between the coefficients $\beta_j$ and $\beta_s$ measures the impact of $X_i$ on the log-odds that the decision maker chooses $j$ instead of $s$.

Because of the IIA property, the odds concerning any pair of choices are independent of all the other choices.

SLIDE 38

Multinomial Logit

Interpretation of the marginal effects of individual attributes $X_i$ on the probability of a choice $Y_i = j$ is even more difficult,

$$ \gamma_j = \frac{\partial P_j}{\partial X} = P_j \left( \beta_j - \sum_{s=0}^{H} P_s \beta_s \right) = P_j (\beta_j - \bar\beta) \tag{51} $$

→ The effect of $X_i$ on $P_j$ depends on the parameters concerning all choices, not just the parameters concerning choice $j$
→ Also, $P_j$ itself depends on $X$ (see eq 47), so you also have to pick reference values for the $X$

As in the binary case, results can be expressed in the form of odds ratios, or exponentiated form. The odds of choice $j$ instead of 0, given $X_i$, are

$$ \Omega(Y_i = j; Y_i = 0 \mid X) = \frac{P_{ij}}{P_{i0}} = e^{X_i\beta_j} \tag{52} $$

and the odds ratio between two realizations of $X_i$, e.g., $X_1$ and $X_0$, is

$$ \frac{\Omega(Y_i = j; Y_i = 0 \mid X_1)}{\Omega(Y_i = j; Y_i = 0 \mid X_0)} = e^{(X_1 - X_0)\beta_j} \tag{53} $$
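The marginal-effect formula in eq (51) can be checked against a numerical derivative. The sketch assumes a single scalar attribute and made-up coefficients, with $\beta_0 = 0$ for the reference choice.

```python
import math

def mnl_probs(x, betas):
    """Multinomial logit probabilities for scalar x; betas[0] = 0 is the reference."""
    expo = [math.exp(b * x) for b in betas]
    total = sum(expo)
    return [e / total for e in expo]

betas = [0.0, 0.8, -0.5]   # hypothetical coefficients for choices 0, 1, 2
x = 1.3                    # hypothetical value of a single attribute

p = mnl_probs(x, betas)
beta_bar = sum(ps * bs for ps, bs in zip(p, betas))
analytic = [pj * (bj - beta_bar) for pj, bj in zip(p, betas)]   # eq (51)

h = 1e-6                   # central finite difference in x
p_up, p_down = mnl_probs(x + h, betas), mnl_probs(x - h, betas)
numeric = [(u - d) / (2.0 * h) for u, d in zip(p_up, p_down)]

print(analytic, numeric)
```

The reference choice has a nonzero marginal effect even though its coefficient is normalized to zero, and the effects sum to zero across choices because the probabilities sum to one.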

SLIDE 39

Extensions

There are different extensions/variations of the multinomial logit model:
Conditional logit model

◮ Utility of each choice depends on choice-specific attributes

$$ U_{ij} = X_{ij}\beta + \varepsilon_{ij} \tag{54} $$

Nested logit model

◮ Individuals choose between larger groups (car vs. truck) before making more refined choices (two-door vs. four-door)

Multinomial probit model

◮ Assumes multivariate normal link function instead of logistic
◮ Computationally cumbersome

and many more

SLIDE 40

Ordered responses

Assume now that the choice set $j \in \{0, 1, 2, \dots, H\}$ can be ordered, so that $H \succ H - 1 \succ \cdots \succ 1 \succ 0$. $Y_i$ is still an indicator that denotes the option chosen, $Y_i = j$ if $i$ chooses $j$
Consider again a latent variable model

$$ Y^* = X\beta + \varepsilon \tag{55} $$

and let $\alpha_1 < \alpha_2 < \cdots < \alpha_H$ be a set of $H$ unknown cut points
Define

$$ Y = \begin{cases} 0 & \text{if } Y^* \leq \alpha_1 \\ 1 & \text{if } \alpha_1 < Y^* \leq \alpha_2 \\ \ \vdots \\ H & \text{if } Y^* > \alpha_H \end{cases} \tag{56} $$

Assumption: neither coefficients nor thresholds differ between individuals

SLIDE 41

Ordered responses

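With the conventions $\alpha_0 = -\infty$ and $\alpha_{H+1} = +\infty$, the conditional probability for choice $Y_i = j$ implied by eq (56) can be written as

$$ P(Y_i = j \mid X_i) = P(\varepsilon_i \leq \alpha_{j+1} - X_i\beta) - P(\varepsilon_i \leq \alpha_j - X_i\beta) = F(\alpha_{j+1} - X_i\beta) - F(\alpha_j - X_i\beta) \tag{57} $$

For the ordered probit model, we assume a standard normal distribution for $\varepsilon$, so that $F = \Phi$. For the highest category, for example,

$$ P(Y = H \mid X) = 1 - \Phi(\alpha_H - X\beta) \tag{58} $$

The model can be estimated using MLE

◮ The marginal effect of $X$ on the probability of choice $Y_i = j$ is

$$ \frac{\partial P_j}{\partial X} = \beta[\phi(\alpha_j - X\beta) - \phi(\alpha_{j+1} - X\beta)] \tag{59} $$

For the ordered logit model, replace $\Phi$ by the logistic cdf $\Lambda$ in eqs (57) and (58)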
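A minimal sketch of the ordered probit probabilities, using the cut-point convention of eq (56): category 0 lies below $\alpha_1$ and category H above $\alpha_H$. The cut points and index values are made up; Φ is computed from the error function in the standard library.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordered_probit_probs(xb, cuts):
    """P(Y = j | X) for j = 0..H given increasing cut points alpha_1 < ... < alpha_H."""
    # cdf of the latent index at each cut point, padded with 0 and 1 for -inf and +inf
    cdf = [0.0] + [norm_cdf(a - xb) for a in cuts] + [1.0]
    return [cdf[j + 1] - cdf[j] for j in range(len(cuts) + 1)]

cuts = [-0.5, 0.4, 1.2]   # hypothetical cut points (H = 3, so four categories)
low = ordered_probit_probs(xb=0.0, cuts=cuts)
high = ordered_probit_probs(xb=1.0, cuts=cuts)

print([round(p, 4) for p in low])
print([round(p, 4) for p in high])
```

Raising the index Xβ shifts probability mass from the lowest to the highest category, while the direction of the change for the middle categories is ambiguous in general.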

SLIDE 42

References

Jeffrey M. Wooldridge (2010), Econometric Analysis of Cross Section and Panel Data, Second Edition, MIT Press.
Andrea Ichino (2006), Micro-Econometrics: Limited Dependent Variables and Panel Data, Lecture Notes.
