SLIDE 1

Classical Discrete Choice Theory

James J. Heckman University of Chicago Econ 312, Spring 2019

Heckman Classical Discrete Choice Theory

SLIDE 2
  • Classical regression model:

        y = xβ + ε,   E(ε|x) = 0,   ε ∼ N(0, σ²I)

  • Model untenable if y is binary:

        y = (1, 0, 1, ..., 1)′

  • Unless we invoke the linear probability model, which is discussed below
and has some unusual properties.

SLIDE 3
  • To model discrete choices, need to think of the ingredients that

give rise to choices.

  • For example, suppose we want to forecast demand for a new
good. We observe consumption data on old goods x1, ..., xI.

(Each good could represent a transportation mode, for example, or an occupation choice.)

  • Assume people choose a good that yields highest utility. When
we have a new good, we need a way of putting it on a common basis
with the old goods.

  • Earliest literature on discrete choice was developed in

psychometrics where researchers were concerned with modeling choice behavior (Thurstone).

  • These are also models of counterfactual utilities.

SLIDE 4

Two dominant modeling approaches

(i) Luce Model (1953) ⇐⇒ McFadden conditional logit model

(ii) Thurstone-Quandt Model (1929, 1930s) (multivariate probit/normal model)

SLIDE 5

Other approaches

(i) GEV models

1 Luce-McFadden Model

  • widely used in economics
  • easy to compute
  • identifiability of parameters understood
  • very restrictive substitution possibilities among goods
  • restrictive heterogeneity
  • imposes arbitrary preference shocks

2 Quandt-Thurstone Model

  • very general substitution possibilities
  • allows for more general forms of heterogeneity
  • more difficult to compute
  • identifiability less easily established
  • does not necessarily rely on preference shocks

SLIDE 6

Luce Model/McFadden Conditional Logit Model

  • References: Manski and McFadden, Chapter 5; Yellott paper
  • Notation:
  • X: universe of objects of choice
  • S: universe of attributes of persons
  • B: feasible choice set (x ∈ B ⊆ X)

SLIDE 7

Luce Model/McFadden Conditional Logit Model

  • Behavior rule mapping attributes into choices: h

h (B, S) = x

  • We might assume that there is a distribution of choice rules.
  • h might be random because

(a) in observation we lose some information governing choices

(unobserved characteristics of choice and person)

(b) there can be random variation in choices due to unmeasured

psychological factors

  • Define P(x|S, B) = Pr{h ∈ H : h(S, B) = x}
  • Probability that an individual drawn randomly from the

population with attributes S and alternative set B chooses x.

SLIDE 8

Luce Axioms

  • Maintain some restrictions on P (x|S, B) and derive

implications for the functional form of P.

  • Axiom #1: “Independence of Irrelevant Alternatives”

For x, y ∈ B, s ∈ S:

    P(x|s, {x, y}) / P(y|s, {x, y}) = P(x|s, B) / P(y|s, B),   B = larger choice set

SLIDE 9

Luce Axioms

  • Example: Suppose choice is career decision and individual is

choosing to be

  • an economist (E)
  • a fireman (F)
  • a policeman (P)

    Pr(E|s, {E, F}) / Pr(F|s, {E, F}) = Pr(E|s, {E, F, P}) / Pr(F|s, {E, F, P})

← one would think that introducing a 3rd alternative might increase this ratio

SLIDE 10

Luce Axioms

  • Another example: Red bus-Blue bus
  • Choices:
  • take car C
  • red bus RB
  • blue bus BB
  • Axiom #2

Pr(y|s, B) > 0 ∀y ∈ B (i.e. eliminate 0 probability choices)

SLIDE 11

Implications of above axioms

  • Define Pxy = P(x|s, {x, y})
  • Assume Pxx = 1/2
  • By the IIA axiom,

        P(y|s, B) = (Pyx/Pxy) P(x|s, B)

  • Summing,

        Σ_{y∈B} P(y|s, B) = 1  ⟹  P(x|s, B) = 1 / Σ_{y∈B} (Pyx/Pxy)

SLIDE 12

Implications of above axioms

  • Furthermore,

        P(y|s, B) = (Pyz/Pzy) P(z|s, B)
        P(x|s, B) = (Pxz/Pzx) P(z|s, B)
        P(y|s, B) = (Pyx/Pxy) P(x|s, B)

    ⟹  Pyx/Pxy = P(y|s, B) / P(x|s, B) = (Pyz/Pzy) / (Pxz/Pzx)

SLIDE 13

Implications of above axioms

  • Define

        ṽ(s, x, z) = ln(Pxz/Pzx)
        ṽ(s, y, z) = ln(Pyz/Pzy)

    ⟹  Pyx/Pxy = e^{ṽ(s,y,z)} / e^{ṽ(s,x,z)}

SLIDE 14
  • Axiom #3: Separability Assumption

        ṽ(s, x, z) = v(s, x) − v(s, z)

    ← v(s, z) can be interpreted as a utility indicator of representative tastes
  • Then

        P(x|s, B) = 1 / Σ_{y∈B} (Pyx/Pxy)
                  = 1 / Σ_{y∈B} [e^{v(s,y)−v(s,z)} / e^{v(s,x)−v(s,z)}]

        P(x|s, B) = e^{v(s,x)} / Σ_{y∈B} e^{v(s,y)}

    ← get logistic form from Luce Axioms
  • Now link model to familiar models in economics.
  • Marschak (1959) established link between Luce Model and random utility
models (RUMs).
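The closed form above is easy to check numerically. A minimal sketch (the utilities v below are illustrative, not from the lecture), verifying both the logistic form and the IIA property it implies:

```python
import math

def logit_probs(v):
    """Conditional logit: P(j) = exp(v_j) / sum_l exp(v_l)."""
    m = max(v)                      # subtract max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

# Three alternatives with deterministic utilities v(s, x_j)
v = [1.0, 0.5, -0.2]
p = logit_probs(v)

# IIA: the ratio P(x)/P(y) is unchanged when the third alternative is dropped
p2 = logit_probs(v[:2])
ratio_full = p[0] / p[1]
ratio_pair = p2[0] / p2[1]
```

Both ratios equal e^{v1−v2}, as the axioms require.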

SLIDE 15

Random Utility Models: Thurstone (1927, 1930s)

  • Assume utility from choosing alternative j is

uj = v (s, xj) + ε(s, xj)

  • v(s, xj) is a nonstochastic function and ε(s, xj) is stochastic,
reflecting idiosyncratic tastes.

SLIDE 16

Random Utility Models: Thurstone (1927, 1930s)

  • Pr(j is maximal in set B)
        = Pr(u(s, xj) ≥ u(s, xl))  ∀l ≠ j
        = Pr(v(s, xj) + ε(s, xj) ≥ v(s, xl) + ε(s, xl))  ∀l ≠ j
        = Pr(v(s, xj) − v(s, xl) ≥ ε(s, xl) − ε(s, xj))  ∀l ≠ j

SLIDE 17
  • Specify a cdf F(ε1, ..., εN)
  • Then

    Pr(vj − vl ≥ εl − εj ∀l ≠ j) = Pr(vj − vl + εj ≥ εl ∀l ≠ j)

        = ∫_{−∞}^{∞} Fj(vj − v1 + εj, ..., vj − vj−1 + εj, εj, vj − vj+1 + εj, ..., vj − vJ + εj) dεj

    where Fj denotes the partial derivative of F with respect to its jth argument.

SLIDE 18
  • If ε is iid, then

        F(ε1, ..., εn) = ∏_{i=1}^{n} Fi(εi)

SLIDE 19
  • So

        Pr(vj − vl ≥ εl − εj ∀l ≠ j) = ∫_{−∞}^{∞} [ ∏_{i=1, i≠j}^{n} Fi(vj − vi + εj) ] fj(εj) dεj

SLIDE 20

Binary Example (N = 2)

    P(1|s, B) = ∫_{ε1=−∞}^{∞} ∫_{ε2=−∞}^{v1−v2+ε1} f(ε1, ε2) dε2 dε1

  • If ε1, ε2 are normal then ε1 − ε2 is normal, so
Pr(v1 − v2 ≥ ε1 − ε2) is a normal probability (probit).
  • If ε1, ε2 are Weibull then ε1 − ε2 is logistic:

        ε ∼ Weibull ⟹ Pr(ε < c) = e^{−e^{−c+α}}

SLIDE 21
  • Also called “double exponential” or “Type I extreme value”
  • For ε Weibull:

    Pr(v1 + ε1 > v2 + ε2) = Pr(v1 − v2 > ε2 − ε1) = Ω(v1 − v2)
        = e^{v1−v2} / (1 + e^{v1−v2}) = e^{v1} / (e^{v1} + e^{v2})
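The claim that the difference of two type I extreme value ("Weibull") errors yields the logistic formula can be checked by simulation. A sketch under assumed illustrative utilities v1, v2:

```python
import math, random

random.seed(0)

def gumbel():
    # Inverse-cdf draw from the type I extreme value: F(c) = exp(-exp(-c))
    u = random.random()
    return -math.log(-math.log(u))

v1, v2 = 0.4, -0.3
n = 200_000
hits = sum(v1 + gumbel() > v2 + gumbel() for _ in range(n))
mc = hits / n                                  # Monte Carlo estimate

logistic = math.exp(v1) / (math.exp(v1) + math.exp(v2))
```

The simulated frequency matches the closed-form logistic probability up to sampling noise.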

SLIDE 22
  • Result: Assuming that the errors follow a Weibull distribution

yields same logit model derived from the Luce Axioms.

  • This link was established by Marschak (1959)
  • Turns out that Weibull is sufficient but not necessary.
  • Some other distributions for ε generate a logit.
  • Yellot (1977) showed that if we require “invariance under

uniform expansions of the choice set” then only double exponential gives logit.

  • Example: Suppose choice set is {coffee, tea, milk}; then
“invariance” requires that probabilities stay the same if we double
the choice set (i.e., 2 coffees, 2 teas, 2 milks).

SLIDE 23

Some Important Properties of the Weibull

  • Developed 1928 (Fisher & Tippett showed it is one of 3 possible
limiting distributions for the maximum of a sequence of random
variables)
  • Closed under maximization (i.e. max of n Weibulls is a Weibull):

        Pr(max_i εi ≤ c) = ∏_i e^{−e^{−c+αi}} = e^{−Σ_i e^{−c} e^{αi}}
                         = e^{−e^{−c} Σ_i e^{αi}} = e^{−e^{−c+ln Σ_i e^{αi}}}
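The closure-under-max property can be verified by simulation. A sketch with assumed illustrative location parameters α:

```python
import math, random

random.seed(1)

def gumbel(alpha=0.0):
    # Inverse-cdf draw with Pr(eps <= c) = exp(-exp(-(c - alpha)))
    u = random.random()
    return alpha - math.log(-math.log(u))

alphas = [0.0, 0.5, 1.0]
n = 100_000
draws = [max(gumbel(a) for a in alphas) for _ in range(n)]

# Theory: the max is again type I extreme value, with location ln(sum_i e^{alpha_i})
loc = math.log(sum(math.exp(a) for a in alphas))
c = 1.0
empirical = sum(d <= c for d in draws) / n
theoretical = math.exp(-math.exp(-(c - loc)))
```

The empirical cdf of the max matches the shifted type I extreme value cdf.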

SLIDE 24
  • Difference between two Weibulls is logistic
  • Under Luce axioms (or a R.U.M. with the Weibull assumption),

        Pr(j|s, B) = e^{v(s,xj)} / Σ_{l=1}^{N} e^{v(s,xl)}

  • Now reconsider the forecasting problem:
  • Let xj = set of characteristics associated with choice j
  • Usually, it is assumed that v(s, xj) = θ(s)′xj
  • Dependence of θ on s reflects fact that individuals differ in their
evaluation of characteristics.

SLIDE 25
  • Get

        Pr(j|s, B) = e^{θ(s)′xj} / Σ_{l=1}^{N} e^{θ(s)′xl}

  • Solve by MLE

SLIDE 26
  • Solve by MLE:

        max_{θ(s)} ∏_{i=1}^{N} [ e^{θ(s)′x1} / Σ_{l=1}^{N} e^{θ(s)′xl} ]^{D1i} · · · [ e^{θ(s)′xN} / Σ_{l=1}^{N} e^{θ(s)′xl} ]^{DNi}

    where Dji = 1 if person i chooses alternative j, 0 otherwise.
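The MLE above can be sketched on simulated data. Everything below (the characteristics x, the true θ, the simple gradient-ascent solver) is an illustrative stand-in, not part of the lecture:

```python
import math, random

random.seed(2)

# Simulated data: n people each choose among J goods with characteristics x[j]
J, K, n = 3, 2, 5000
x = [[0.5, 1.0], [1.5, -0.5], [-1.0, 0.3]]     # illustrative x_j
theta_true = [0.8, -0.6]

def probs(theta):
    v = [sum(t * xjk for t, xjk in zip(theta, xj)) for xj in x]
    m = max(v)                                  # subtract max for stability
    e = [math.exp(vi - m) for vi in v]
    s = sum(e)
    return [ei / s for ei in e]

choices = random.choices(range(J), weights=probs(theta_true), k=n)
counts = [choices.count(j) for j in range(J)]

# Gradient ascent on the log likelihood; the score for person choosing j is
# x_{jk} - sum_l P(l) x_{lk}
theta = [0.0, 0.0]
for _ in range(500):
    p = probs(theta)
    xbar = [sum(p[j] * x[j][k] for j in range(J)) for k in range(K)]
    grad = [sum(counts[j] * x[j][k] for j in range(J)) - n * xbar[k]
            for k in range(K)]
    theta = [t + 0.5 * g / n for t, g in zip(theta, grad)]
```

With 5,000 simulated choices the estimate lands close to the true θ.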

SLIDE 27
  • If a new good has same characteristics, get probabilities by

        B′ = B ∪ {N + 1}

        P(N + 1|B′, s) = e^{θ(s)′x_{N+1}} / Σ_{l=1}^{N+1} e^{θ(s)′xl}

SLIDE 28

Debreu (1960) criticism of Luce Model

  • “Red Bus - Blue Bus Problem”
  • Suppose the (N + 1)th alternative is identical to the first:

        Pr(choose 1 or N + 1 | s, B′) = 2 e^{θ(s)′x_{N+1}} / Σ_{l=1}^{N+1} e^{θ(s)′xl}

  • ⟹ Introduction of an identical good changes probability of riding a
bus.

  • not an attractive result
  • comes from need to make iid assumption on new alternative
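Debreu's criticism can be made concrete with a tiny numerical example (utilities here are illustrative, all set equal):

```python
import math

def logit(v):
    s = sum(math.exp(x) for x in v)
    return [math.exp(x) / s for x in v]

# Car vs. red bus with equal utilities: each gets probability 1/2
p2 = logit([1.0, 1.0])

# Add a blue bus identical to the red bus: logit splits 1/3 each,
# so the total bus probability jumps from 1/2 to 2/3 -- the IIA problem
p3 = logit([1.0, 1.0, 1.0])
bus_share = p3[1] + p3[2]
```

An identical good should not draw share away from the car, yet under the logit it does.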

SLIDE 29

Debreu (1960) criticism of Luce Model: Some Alternative Assumptions

1 Could let vi = ln(θ(s)′xi):

        Pr(j|s, B) = θ(s)′xj / Σ_{l=1}^{N} θ(s)′xl

   If we also imposed Σ_{l=1}^{N} θ(s)′xl = 1, we would get the linear
probability model, but this could violate IIA.

2 Could consider model of form

        Pr(j|s, B) = e^{θj(s)′xj} / Σ_{l=1}^{N} e^{θl(s)′xl}

   but here we have lost our forecasting ability (cannot predict demand
for a new good).

3 Universal Logit Model

        Pr(i|s, x1, ..., xN) = e^{ϕi(x1,...,xN)β(s)} / Σ_{l=1}^{N} e^{ϕl(x1,...,xN)β(s)}

   Here we lose IIA and forecasting (Bernstein polynomial).
SLIDE 30

Criteria for a good PCS

1 Goal: We want a probabilistic choice model that
   1 has a flexible functional form
   2 is computationally practical
   3 allows for flexibility in representing substitution patterns among choices
   4 is consistent with a random utility model (RUM) ⟹ has a structural interpretation

SLIDE 31

How do you verify that a candidate PCS is consistent with a RUM?

1 (a) Either start with a R.U.M.

        ui = v(s, xi) + ε(s, xi)

   and solve the integral for

        Pr(ui > ul, ∀l ≠ i) = Pr(i = arg max_l (vl + εl))

   (b) or start with a candidate PCS and verify that it is consistent with
a R.U.M. (easier)

2 McFadden provides sufficient conditions
3 See discussion of Daly-Zachary-Williams theorem

SLIDE 32

Link to AIRUM Models

SLIDE 33

Daly-Zachary-Williams Theorem

  • Daly-Zachary (1976) and Williams (1977) provide a set of

conditions that makes it easy to derive a PCS from a RUM with a class of models (“generalized extreme value” (GEV) models)

  • Define G : G(Y1, . . . , YJ)
  • If G satisfies the following:

    1 nonnegative, defined on Y1, . . . , YJ ≥ 0
    2 homogeneous of degree one in its arguments
    3 lim_{Yi→∞} G(Y1, . . . , Yi, . . . , YJ) → ∞, ∀i = 1, . . . , J
    4 ∂^k G / (∂Y1 · · · ∂Yk) is nonnegative if k is odd, nonpositive if k is even   (1)

SLIDE 34
  • Then for a R.U.M. with ui = vi + εi and

        F(ε1, . . . , εJ) = exp[−G(e^{−ε1}, . . . , e^{−εJ})]

  • This cdf has Weibull marginals but allows for more dependence
among ε’s.
  • The PCS is given by

        Pi = ∂ ln G/∂vi = e^{vi} Gi(e^{v1}, . . . , e^{vJ}) / G(e^{v1}, . . . , e^{vJ})

  • Note: McFadden shows that under certain conditions on the
form of the indirect utility function (satisfies AIRUM form), the
DZW result can be seen as a form of Roy’s identity.

SLIDE 35
  • Let’s apply this result
  • Multinomial logit model (MNL):

        F(ε1, . . . , εJ) = e^{−e^{−ε1}} · · · e^{−e^{−εJ}} ← product of iid Weibulls
                         = e^{−Σ_{j=1}^{J} e^{−εj}}

  • Can verify that G(e^{v1}, . . . , e^{vJ}) = Σ_{j=1}^{J} e^{vj} satisfies the DZW
conditions:

        P(j) = ∂ ln G/∂vj = e^{vj} / Σ_{l=1}^{J} e^{vl} = MNL model
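The identity P(j) = ∂ ln G/∂vj for the linear generator can be checked by numerical differentiation. A sketch with illustrative utilities:

```python
import math

# GEV generator for MNL: G(Y1..YJ) = sum_j Yj; the PCS is P(j) = d ln G / d v_j
v = [0.2, -0.4, 1.1]

def lnG(vs):
    return math.log(sum(math.exp(x) for x in vs))

h = 1e-6
p_numeric = []
for j in range(len(v)):
    vp = v[:]; vp[j] += h
    vm = v[:]; vm[j] -= h
    p_numeric.append((lnG(vp) - lnG(vm)) / (2 * h))  # central difference

s = sum(math.exp(x) for x in v)
p_closed = [math.exp(x) / s for x in v]
```

The numerical derivatives of ln G reproduce the MNL probabilities.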

SLIDE 36
  • Another GEV model
  • Nested logit model (addresses to a limited extent the IIA

criticism)

  • Let

        G(e^{v1}, . . . , e^{vJ}) = Σ_{m=1}^{M} am [ Σ_{i∈Bm} e^{vi/(1−σm)} ]^{1−σm}

    (σm is like an elasticity of substitution)

SLIDE 37
  • Idea: divide goods into branches
  • First choose branch, then good within branch

    [Tree: {car} | {bus} → {red, blue}]

  • Will allow for correlation between errors (this is role of σm)
  • Bm ⊆ {1, . . . , J}, ∪_{m=1}^{M} Bm = B; each good is on a single
branch—need not have all choices on all branches

SLIDE 38
  • Note: if σm = 0, get usual MNL form
  • Calculate

    pi = ∂ ln G/∂vi = ∂ ln{ Σ_{m=1}^{M} am [Σ_{i∈Bm} e^{vi/(1−σm)}]^{1−σm} } / ∂vi

       = Σ_{m ∋ i∈Bm} am e^{vi/(1−σm)} [Σ_{i∈Bm} e^{vi/(1−σm)}]^{−σm} / Σ_{m=1}^{M} am [Σ_{i∈Bm} e^{vi/(1−σm)}]^{1−σm}

       = Σ_{m=1}^{M} P(i|Bm) P(Bm)

SLIDE 39
  • Where

        P(i|Bm) = e^{vi/(1−σm)} / Σ_{i∈Bm} e^{vi/(1−σm)}   if i ∈ Bm, 0 otherwise

        P(Bm) = am [Σ_{i∈Bm} e^{vi/(1−σm)}]^{1−σm} / Σ_{m=1}^{M} am [Σ_{i∈Bm} e^{vi/(1−σm)}]^{1−σm}

  • Note: If P(Bm) = 1, get logit form
  • Nested logit requires that analyst make choices about nesting
structure
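The decomposition P(i) = P(i|Bm)P(Bm) translates directly into code. A sketch of the nested logit probabilities (the nesting structure, σ values, and utilities below are illustrative):

```python
import math

def nested_logit(v, nests, sigma, a=None):
    """P(i) = P(i|Bm) * P(Bm) for the GEV generator
       G = sum_m a_m (sum_{i in Bm} e^{v_i/(1-s_m)})^{1-s_m}."""
    if a is None:
        a = [1.0] * len(nests)
    # inclusive value of each nest
    S = [sum(math.exp(v[i] / (1 - s)) for i in B) for B, s in zip(nests, sigma)]
    denom = sum(am * Sm ** (1 - s) for am, Sm, s in zip(a, S, sigma))
    p = [0.0] * len(v)
    for m, (B, s) in enumerate(zip(nests, sigma)):
        p_nest = a[m] * S[m] ** (1 - s) / denom      # P(Bm)
        for i in B:
            p[i] = p_nest * math.exp(v[i] / (1 - s)) / S[m]   # times P(i|Bm)
    return p

# Car alone in one nest; red and blue bus share a nest with sigma = 0.9
p = nested_logit([0.0, 0.0, 0.0], nests=[[0], [1, 2]], sigma=[0.0, 0.9])
```

With equal utilities and a highly correlated bus nest, the car's share moves above 1/3 toward the intuitive 1/2.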

SLIDE 40
  • How does nested logit solve red bus/blue bus problem?
  • Suppose

        G = Y1 + [ Y2^{1/(1−σ)} + Y3^{1/(1−σ)} ]^{1−σ},   Yi = e^{vi}

SLIDE 41

    P(1|{123}) = ∂ ln G/∂v1 = e^{v1} / { e^{v1} + [e^{v2/(1−σ)} + e^{v3/(1−σ)}]^{1−σ} }

    P(2|{123}) = ∂ ln G/∂v2 = e^{v2/(1−σ)} [e^{v2/(1−σ)} + e^{v3/(1−σ)}]^{−σ} / { e^{v1} + [e^{v2/(1−σ)} + e^{v3/(1−σ)}]^{1−σ} }

SLIDE 42
  • As v3 → −∞,

        P(1|{123}) = e^{v1} / (e^{v1} + e^{v2})   (get logistic)

  • As v1 → −∞, the choice between 2 and 3 is a binary logit in the
scaled utilities v2/(1−σ), v3/(1−σ)

SLIDE 43

What Role Does σ Play?

  • σ is the degree of substitutability parameter
  • Recall

        F(ε1, ε2, ε3) = exp{−G(e^{−ε1}, e^{−ε2}, e^{−ε3})}

  • Here

        σ = cov(ε2, ε3) / √(var ε2 · var ε3) = correlation coefficient

  • Thus we require −1 ≤ σ ≤ 1, but turns out we also need to
require σ > 0 for DZW conditions to be satisfied. This is
unfortunate because it does not allow ε’s to be negatively
correlated.
  • Can show that

        lim_{σ→1} P(1|{123}) = e^{v1} / (e^{v1} + max(e^{v2}, e^{v3}))   (L’Hôpital’s Rule)

SLIDE 44
  • If v2 = v3, then

        P(2|{123}) = e^{v2/(1−σ)} [2e^{v2/(1−σ)}]^{−σ} / { e^{v1} + [2e^{v2/(1−σ)}]^{1−σ} }
                   = 2^{−σ} e^{v2} / [ e^{v1} + e^{v2} 2^{1−σ} ]

        lim_{σ→1} P(2|{123}) = 2^{−1} e^{v2} / (e^{v1} + e^{v2})

    ↑ introduce 3rd identical alternative and cut the probability of
choosing 2 in half
  • Solves red-bus/blue-bus problem
  • Probability cut in half with two identical alternatives

SLIDE 45

[Tree: {car} | {red bus, blue bus}]

  • σ is a measure of similarity between red and blue bus.
  • When σ is close to one, the conditional choice probability within
the bus nest selects the better alternative with high probability.

SLIDE 46
  • Remark: We can expand logit to accommodate multiple levels,
e.g.

        G = Σ_{q=1}^{Q} aq { Σ_{m∈Qq} am [ Σ_{i∈Bm} Yi^{1/(1−σm)} ]^{1−σm} }   ← 3 levels

SLIDE 47
  • Example: Two Choices

    1 Neighborhood (m)
    2 Transportation mode (t)
    3 P(m): choice of neighborhood
    4 P(i|Bm): probability of choosing ith mode, given neighborhood m

SLIDE 48

1 Not all modes available in all neighborhoods

        P_{m,t} = e^{v(m,t)/(1−σm)} [Σ_{t=1}^{Tm} e^{v(m,t)/(1−σm)}]^{−σm} / Σ_{j=1}^{M} [Σ_{t=1}^{Tj} e^{v(j,t)/(1−σj)}]^{1−σj}

        P_{t|m} = e^{v(m,t)/(1−σm)} / Σ_{t=1}^{Tm} e^{v(m,t)/(1−σm)}

        P_m = [Σ_{t=1}^{Tm} e^{v(m,t)/(1−σm)}]^{1−σm} / Σ_{j=1}^{M} [Σ_{t=1}^{Tj} e^{v(j,t)/(1−σj)}]^{1−σj} = P(Bm)

SLIDE 49
  • Standard type of utility function that people might use:

        v(m, t) = z′t γ + x′mt β + y′m α

SLIDE 50
  • z′t is transportation mode characteristics, x′mt is interactions, and
y′m is neighborhood characteristics.
  • Then

        P_{t|m} = e^{(z′t γ + x′mt β)/(1−σm)} / Σ_{t=1}^{Tm} e^{(z′t γ + x′mt β)/(1−σm)}

        P_m = e^{y′m α} [Σ_{t=1}^{Tm} e^{(z′t γ + x′mt β)/(1−σm)}]^{1−σm} / Σ_{j=1}^{M} e^{y′j α} [Σ_{t=1}^{Tj} e^{(z′t γ + x′jt β)/(1−σj)}]^{1−σj}

SLIDE 51
  • Estimation (in two steps) (see Amemiya, Chapter 9)
  • Let

        Im = Σ_{t=1}^{Tm} e^{(z′t γ + x′mt β)/(1−σm)}

SLIDE 52

1 Within each neighborhood, get γ/(1−σm) and β/(1−σm) by logit
2 Form Îm
3 Then estimate by MLE

        e^{y′m α + (1−σm) ln Îm} / Σ_{j=1}^{M} e^{y′j α + (1−σj) ln Îj}

   to get α, σm

  • Assume σm = σj ∀j, m, or at least impose some restrictions across
multiple neighborhoods
  • Note: Îm is an estimated regressor (“Durbin problem”)
  • Need to correct standard errors
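The second stage above can be sketched in code. Everything numeric here (neighborhood characteristics y, α, σ, and the inclusive values Î) is an illustrative stand-in; the first-stage within-neighborhood logits are taken as already done:

```python
import math

def stage2_probs(y, alpha, sigma, I):
    """P(Bm) proportional to exp(y_m' alpha + (1 - sigma_m) * ln(I_m))."""
    u = [sum(a * yk for a, yk in zip(alpha, ym)) + (1 - s) * math.log(Im)
         for ym, s, Im in zip(y, sigma, I)]
    m = max(u)                          # subtract max for stability
    e = [math.exp(x - m) for x in u]
    tot = sum(e)
    return [x / tot for x in e]

y = [[1.0, 0.2], [0.5, -0.1], [0.0, 0.4]]   # illustrative y_m
alpha = [0.3, -0.5]
sigma = [0.2, 0.2, 0.2]
I = [2.0, 1.5, 3.0]                          # estimated inclusive values I_m
P = stage2_probs(y, alpha, sigma, I)
```

In the MLE step, α and σm are chosen to maximize the log likelihood built from these probabilities; raising a neighborhood's inclusive value raises its choice probability.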

SLIDE 53

Multinomial Probit Models

1 Also known as:
   1 Thurstone Model V (1929, 1930)
   2 Thurstone-Quandt Model
   3 Developed by Domencich-McFadden (1978) (on reading list)

        ui = vi + ηi,   i = 1, ..., J
        vi = Ziβ (linear-in-parameters form)
        ui = Ziβ + ηi

    MNL: (i) β fixed, (ii) ηi iid
    MNP: (i) β random coefficient, β ∼ N(β̄, Σβ); (ii) β independent of η, η ∼ (0, Ση)

  • Allow general forms of correlation between errors

SLIDE 54

        ui = Zi β̄ + Zi(β − β̄) + ηi

  • (β − β̄) = ε, and Zi(β − β̄) + ηi is a composite heteroskedastic
error term.
  • β random ⟹ taste heterogeneity
  • ηi can be interpreted as unobserved attributes of goods
  • Main advantage of MNP over MNL is that it allows for general
error covariance structure.
  • Note: To make computation easier, users sometimes set
Σβ = 0 (fixed coefficient version)
  • allowing for β random
  • permits random taste variation
  • allows for possibility that different persons value
characteristics differently

SLIDE 55

Problem of Identification and Normalization in the MNP Model

  • Reference: David Bunch (1979), “Estimability in the
Multinomial Probit Model,” in Transportation Research
  • Domencich and McFadden
  • Let

        Z β̄ = (Z1 β̄, . . . , ZJ β̄)′,   η̃ = (η1, . . . , ηJ)′

    J alternatives, K characteristics, β random, β ∼ N(β̄, Σβ)   (2)
SLIDE 56

Problem of Identification and Normalization in the MNP Model

  • Pr(alternative j selected)

        = Pr(uj > ui) ∀i ≠ j

        = ∫_{uj=−∞}^{∞} ∫_{u1=−∞}^{uj} · · · ∫_{uJ=−∞}^{uj} Φ(u|Vu, Σu) duJ · · · du1 duj

    where Φ(u|Vu, Σu) is the J-dimensional MVN density with mean Vu
and covariance Σu
  • Note: Unlike the MNL, no closed form expression for the
integral.
  • The integrals are often evaluated using simulation methods (we will
work an example).
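A crude frequency simulator illustrates the idea: draw utilities from the multivariate normal and count how often each alternative wins. The mean utilities and the Cholesky factor below are illustrative assumptions:

```python
import math, random

random.seed(3)

def mvn_draw(mean, chol):
    """Draw from N(mean, L L') given a lower-triangular Cholesky factor L."""
    z = [random.gauss(0, 1) for _ in mean]
    return [m + sum(chol[i][k] * z[k] for k in range(i + 1))
            for i, m in enumerate(mean)]

# Illustrative 3-good example: utilities u = V + eta, eta ~ N(0, Sigma_eta)
V = [0.5, 0.0, -0.5]
L = [[1.0, 0.0, 0.0],       # assumed Cholesky factor of Sigma_eta
     [0.5, 0.8, 0.0],
     [0.2, 0.1, 0.9]]

R = 100_000
wins = [0, 0, 0]
for _ in range(R):
    u = mvn_draw(V, L)
    wins[u.index(max(u))] += 1          # frequency simulator for P(j chosen)
P = [w / R for w in wins]
```

Smoother simulators (e.g. GHK) are used in practice because the frequency simulator is not differentiable in the parameters.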

SLIDE 57

How many parameters are there?

  • β̄: K parameters
  • Σβ: K × K symmetric matrix: (K² − K)/2 + K = K(K+1)/2
  • Ση: J(J+1)/2

  • Note: When a person chooses j, all we know is relative utility,

not absolute utility.

  • This suggests that not all parameters in the model will be

identified.

  • Requires normalizations.

SLIDE 58

Digression on Identification

  • What does it mean to say a parameter is not identified in a

model?

  • Model with one parameterization is observationally equivalent

to another model with a different parameterization

SLIDE 59

Digression on Identification

  • Example: Binary Probit Model (fixed β)

    Pr(D = 1|Z) = Pr(v1 + ε1 > v2 + ε2)
                = Pr(x1β + ε1 > x2β + ε2)
                = Pr((x1 − x2)β > ε2 − ε1)
                = Pr((x1 − x2)β/σ > (ε2 − ε1)/σ)
                = Φ(x̃β/σ),   x̃ = x1 − x2

  • Φ(x̃β/σ) is observationally equivalent to Φ(x̃β∗/σ∗) for
β/σ = β∗/σ∗.

SLIDE 60
  • β not separately identified relative to σ, but the ratio is identified:

        Φ(x̃β/σ) = Φ(x̃β∗/σ∗)
        Φ⁻¹ · Φ(x̃β/σ) = Φ⁻¹ · Φ(x̃β∗/σ∗)
        β/σ = β∗/σ∗

  • Set {b : b = β · δ, δ any positive scalar} is identified (say “β is
identified up to scale and sign is identified”).

SLIDE 61

Identification in the MVP model

Pr(j selected|Vu, Σu) = Pr(ui − uj < 0 ∀i ≠ j)

Define the (J−1)×J contrast matrix Δj, each row having a single 1 for
one alternative i ≠ j and a −1 in column j, so that

        Δj ũ = (u1 − uj, . . . , uJ − uj)′   (omitting the jth contrast)

SLIDE 62

Identification in the MVP model

Pr(j selected|Vu, Σu) = Pr(Δj ũ < 0|Vu, Σu) = Φ(0|VZ, ΣZ)

  • Where
    1 VZ is the mean of Δj ũ: Δj Z̃ β̄
    2 ΣZ is the variance of Δj ũ: Δj Z̃ Σβ Z̃′ Δ′j + Δj Ση Δ′j
    3 VZ is (J − 1) × 1
    4 ΣZ is (J − 1) × (J − 1)
  • We reduce the dimension of the integral by one.

SLIDE 63
  • This says that all of the information exists in the contrasts.
  • Can’t identify all the components because we only observe the

contrasts.

  • Now define Δ̃j as Δj with the Jth column removed, and choose J as
the reference alternative with corresponding ΔJ.
  • Then can verify that

        Δj = Δ̃j · ΔJ

SLIDE 64
  • For example, with three goods (one consistent choice of contrast
matrices, with j = 2 and reference alternative J = 3):

        Δ̃2 = [ 1  −1 ]    Δ3 = [ 1  0  −1 ]    Δ2 = [ 1  −1  0 ]
             [ 0  −1 ]         [ 0  1  −1 ]         [ 0  −1  1 ]

        Δ̃2 · Δ3 = Δ2

    (Δ̃2: 3rd column removed; Δ3: reference alternative; Δ2: 3rd column
included)
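The identity can be checked mechanically. A sketch using one consistent row convention (Δj stacks the contrasts ui − uj, i ≠ j; the specific matrices are this note's choice, not from the text):

```python
# Check the contrast-matrix identity Delta_2 = Delta~_2 @ Delta_3 for J = 3.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

D3 = [[1, 0, -1],
      [0, 1, -1]]          # contrasts against reference alternative 3
D2 = [[1, -1, 0],
      [0, -1, 1]]          # contrasts against alternative 2
D2_tilde = [[1, -1],
            [0, -1]]       # D2 with the 3rd column removed

product = matmul(D2_tilde, D3)
```

Multiplying out row by row reproduces D2 exactly.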

SLIDE 65
  • Therefore, we can write

        VZ = Δj Z̃ β̄
        ΣZ = Δj Z̃ Σβ Z̃′ Δ′j + Δ̃j ΔJ Ση Δ′J Δ̃′j

  • where CJ = ΔJ Ση Δ′J is (J − 1) × (J − 1) and has
[(J−1)² − (J−1)]/2 + (J − 1) = J(J−1)/2 parameters total.
  • Since original model can always be expressed in terms of a
model with (β̄, Σβ, CJ), it follows that some of the parameters in
the original model are not identified.

SLIDE 66

How many parameters not identified?

  • Original model:

        K + K(K + 1)/2 + J(J + 1)/2

  • Now:

        K + K(K + 1)/2 + J(J − 1)/2;   [J² + J − (J² − J)]/2 = J not identified

  • Turns out that one additional parameter is not identified.
  • Total: J + 1
  • Note: Evaluation of Φ(0|kVZ, k²ΣZ), k > 0, gives same result
as evaluating Φ(0|VZ, ΣZ), so we can eliminate one more parameter
by suitable choice of k.

SLIDE 67

Illustration

J = 3,   Ση = [ σ11 σ12 σ13 ]
              [ σ21 σ22 σ23 ]
              [ σ31 σ32 σ33 ]

C2 = Δ2 Ση Δ′2 = [ 1 −1  0 ] Ση [ 1 −1  0 ]′
                 [ 0 −1  1 ]    [ 0 −1  1 ]

   = [ σ11 − 2σ21 + σ22          σ13 − σ21 − σ32 + σ22 ]
     [ σ13 − σ21 − σ32 + σ22     σ33 − 2σ32 + σ22      ]

SLIDE 68

Illustration

C2 = Δ̃2 Δ3 Ση Δ′3 Δ̃′2

   = [ 1  −1 ] [ σ11 − 2σ31 + σ33         σ21 − σ31 − σ32 + σ33 ] [  1   0 ]
     [ 0  −1 ] [ σ21 − σ31 − σ32 + σ33    σ22 − 2σ32 + σ33      ] [ −1  −1 ]

SLIDE 69

Normalization Approach of Albright, Lerman, and Manski (1978)

  • Note: Need J + 1 restrictions on VCV matrix.
  • Fix J parameters by setting last row and last column of Ση to 0
  • Fix scale by constraining diagonal elements of Ση so that
trace(Ση)/J equals variance of a standard Weibull. (To compare
estimates with MNL and independent probit)

SLIDE 70

How do we solve the forecasting problem?

  • Suppose that we have 2 goods and add a 3rd

    Pr(1 chosen) = Pr(u1 − u2 ≥ 0) = Pr((Z1 − Z2)β̄ ≥ ω2 − ω1)

  • where

        ω1 = Z1(β − β̄) + η1,   ω2 = Z2(β − β̄) + η2

    = ∫_{−∞}^{a} (1/√2π) e^{−t²/2} dt,
      a = (Z1 − Z2)β̄ / [σ11 + σ22 − 2σ12 + (Z2 − Z1)Σβ(Z2 − Z1)′]^{1/2}

  • Now add a 3rd good

        u3 = Z3 β̄ + Z3(β − β̄) + η3

SLIDE 71
  • Problem: We don’t know correlation of η3 with other errors.
  • Suppose that η3 = 0 (i.e. only preference heterogeneity). Then

    Pr(1 chosen) = ∫_{−∞}^{a} ∫_{−∞}^{b} B.V.N. dt1 dt2

    where

        a = (Z1 − Z2)β̄ / [σ11 + σ22 − 2σ12 + (Z2 − Z1)Σβ(Z2 − Z1)′]^{1/2}

        b = (Z1 − Z3)β̄ / [σ11 + (Z3 − Z1)Σβ(Z3 − Z1)′]^{1/2}

  • We could also solve the forecasting problem if we make an
assumption like η2 = η3.
  • We solve red-bus/blue-bus problem if η2 = η3 = 0 and
Z3 = Z2.

SLIDE 72

    Pr(1 chosen) = Pr(u1 − u2 ≥ 0, u1 − u3 ≥ 0)

  • but u1 − u2 ≥ 0 and u1 − u3 ≥ 0 are the same event.
  • ∴ adding a third choice does not change the probability of choosing 1.

SLIDE 73

Estimation Methods for MNP Models

  • Models tend to be difficult to estimate because of high

dimensional integrals.

  • Integrals need to be evaluated at each stage of estimating the

likelihood.

  • Simulation provides a means of estimating Pij = Pr(i chooses j)

SLIDE 74

Computation and Estimation Link to Appendix

SLIDE 75

Classical Models for Estimating Models with Limited Dependent Variables

References:

  • Amemiya, Ch. 10
  • Different types of sampling (previously discussed)

(a) random sampling
(b) censored sampling
(c) truncated sampling
(d) other non-random (exogenous stratified, choice-based)

SLIDE 76

Standard Tobit Model (Tobin, 1958) “Type I Tobit”

        y∗i = xiβ + ui

  • Observe

        yi = y∗i                              if y∗i ≥ y0
        yi censored (only 1(y∗i ≥ y0) observed)   if y∗i < y0

  • Tobin’s example: expenditure on a durable good only observed if
good is purchased

SLIDE 77

Figure 1: [scatter of expenditure y against individuals; censored
observations marked at the limit]

Note: Censored observations might have bought the good if price had
been lower.

  • Estimator. Assume ui|xi ∼ N(0, σ²u), so

        y∗i|xi ∼ N(xiβ, σ²u)

SLIDE 78

Density of Latent Variables

    g(y∗) = π0 Pr(y∗i < y0) + π1 f(y∗i | y∗i ≥ y0) · Pr(y∗i ≥ y0)

    Pr(y∗i < y0) = Pr(xiβ + ui < y0) = Pr(ui/σu < (y0 − xiβ)/σu)
                 = Φ((y0 − xiβ)/σu)

    f(y∗i | y∗i ≥ y0) = (1/σu) φ((y∗i − xiβ)/σu) / [1 − Φ((y0 − xiβ)/σu)]

why?

    Pr(y∗ = y∗i | y0 ≤ y∗) = Pr(xβ + u = y∗i | y0 ≤ xβ + u)
                           = Pr(u/σu = (y∗i − xβ)/σu | u/σu ≥ (y0 − xβ)/σu)

SLIDE 79
  • Note that likelihood can be written as:

        L = ∏0 Φ((y0 − xiβ)/σu) · ∏1 [1 − Φ((y0 − xiβ)/σu)]
            ← this part you would get with just a simple probit

          × ∏1 { (1/σu) φ((y∗i − xiβ)/σu) / [1 − Φ((y0 − xiβ)/σu)] }
            ← additional information

  • You could estimate β up to scale using only the information on
whether yi ≷ y0, but will get more efficient estimate using
additional information. If you know y0, you can estimate σu.
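The Tobit likelihood above translates directly into code. A minimal sketch (scalar x, toy numbers are illustrative):

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def norm_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def tobit_loglik(beta, sigma, data, y0=0.0):
    """Type I Tobit log likelihood with censoring point y0.
       data: list of (x_i, y_i, observed_i), scalar x for simplicity."""
    ll = 0.0
    for x, y, observed in data:
        mu = x * beta
        if observed:                     # uncensored: density term
            ll += math.log(norm_pdf((y - mu) / sigma) / sigma)
        else:                            # censored: Pr(y* < y0)
            ll += math.log(norm_cdf((y0 - mu) / sigma))
    return ll

# Toy data, illustrative only
data = [(1.0, 1.2, True), (0.5, 0.0, False), (2.0, 2.5, True), (-1.0, 0.0, False)]
ll = tobit_loglik(beta=1.0, sigma=1.0, data=data)
```

Maximizing this function over (β, σ) is Tobit MLE; note the censored observations contribute only the probit-style probability term.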

SLIDE 80

Truncated Version of Type I Tobit

  • Observe yi = y∗i if y∗i > 0; observe nothing for censored
observations
  • Example: only observe wages for workers

        L = ∏1 (1/σu) φ((y∗i − xiβ)/σu) / Φ(xiβ/σu)

        Pr(y∗i > 0) = Pr(xβ + u > 0) = Pr(u/σu > −xβ/σu)
                    = Pr(u/σu < xβ/σu) = Φ(xβ/σu)

SLIDE 81

Different Ways of Estimating Tobit

(a) if censored, could obtain estimates of β σu by simple probit (b) run OLS on observations for which y ∗ i is observed

E (yi|xiβ + ui ≥ 0) = xiβ + σuE ui σu | ui σu > −xβ σu

  • (y0 = 0)
  • where E (yi|xiβ + ui ≥ 0) is the conditional mean for truncated

normal r.v and σuE ui σu | ui σu > −xβ σu

→ λ xiβ σu

  • =

φ

  • −xβ

σu

  • Φ
  • πiβ

σu

  • λ
  • xiβ

σu

  • known as “Mill’s ratio” ; bias due to censoring, can be

viewed as an omitted variables problem

SLIDE 82

Heckman Two-Step Procedure

  • Step 1: estimate β/σu by probit
  • Step 2: form λ̂(xiβ̂/σ̂) and regress

        yi = xiβ + σ λ̂(xiβ/σ) + v + ε

    where

        v = σ [λ(xiβ/σ) − λ̂(xiβ/σ)]
        ε = ui − E(ui | ui > −xiβ)
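The second step can be sketched on simulated data. Here the data-generating parameters are illustrative, and for brevity the step-1 probit is assumed done (the true index β/σ is plugged in where the probit estimate would go):

```python
import math, random

random.seed(4)

def phi(z):  return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
def Phi(z):  return 0.5 * (1 + math.erf(z / math.sqrt(2)))
def mills(z): return phi(z) / Phi(z)        # inverse Mills ratio lambda(z)

# Simulate a censored regression: y* = 1 + 2x + u, observe y* only if y* > 0
beta0, beta1, sigma = 1.0, 2.0, 1.0
data = []
for _ in range(20000):
    x = random.gauss(0, 1)
    ystar = beta0 + beta1 * x + random.gauss(0, sigma)
    if ystar > 0:
        data.append((x, ystar))

# Step 2: OLS of y on [1, x, lambda(x'b/sigma)]; coefficient on lambda estimates sigma
rows = [((1.0, x, mills((beta0 + beta1 * x) / sigma)), y) for x, y in data]

K = 3                                       # solve the 3x3 normal equations
XtX = [[sum(r[0][i] * r[0][j] for r in rows) for j in range(K)] for i in range(K)]
Xty = [sum(r[0][i] * r[1] for r in rows) for i in range(K)]
for i in range(K):                          # Gaussian elimination
    piv = XtX[i][i]
    for j in range(i + 1, K):
        f = XtX[j][i] / piv
        XtX[j] = [a - f * b for a, b in zip(XtX[j], XtX[i])]
        Xty[j] -= f * Xty[i]
coef = [0.0] * K
for i in reversed(range(K)):                # back substitution
    coef[i] = (Xty[i] - sum(XtX[i][j] * coef[j] for j in range(i + 1, K))) / XtX[i][i]
```

The fitted coefficients recover (β0, β1, σ) ≈ (1, 2, 1) up to sampling noise, illustrating how the Mills-ratio regressor absorbs the selection bias.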

SLIDE 83
  • Note: errors (v + ε) will be heteroskedastic
  • need to account for fact that λ is estimated (Durbin problem)
  • Ways of doing this:

(a) Delta method
(b) GMM (Newey, Economics Letters, 1984)
(c) Suppose you run OLS using all the data:

        E(yi) = Pr(y∗i ≤ 0) · 0 + Pr(y∗i > 0) [xiβ + σu E(ui/σu | ui/σu > −xiβ/σ)]
              = Φ(xiβ/σ) [xiβ + σu λ(xiβ/σ)]

    could estimate model by replacing Φ with Φ̂ and λ with λ̂.

  • For both (b) and (c), errors are heteroskedastic, meaning that
you could use weights to improve efficiency.
  • Also need to adjust for estimated regressor.

(d) Estimate model by Tobit maximum likelihood directly.

SLIDE 84

Variations on Standard Tobit Model

        y∗1i = x1iβ1 + u1i
        y∗2i = x2iβ2 + u2i

        y2i = y∗2i   if y∗1i ≥ 0
            = not observed, else

  • Example
  • y2i: student test scores
  • y∗1i: index representing parents’ propensity to enroll students in
school
  • Test scores only observed for proportion enrolled

SLIDE 85

    L = ∏1 [Pr(y∗1i > 0) f(y2i | y∗1i > 0)] · ∏0 [Pr(y∗1i ≤ 0)]

    f(y∗2i | y∗1i ≥ 0) = ∫₀^∞ f(y∗1i, y∗2i) dy∗1i / ∫₀^∞ f(y∗1i) dy∗1i

                       = f(y2i) ∫₀^∞ f(y∗1i | y∗2i) dy∗1i / ∫₀^∞ f(y∗1i) dy∗1i

                       = (1/σ2) φ((y∗2i − x2iβ2)/σ2) · ∫₀^∞ f(y∗1i | y∗2i) dy∗1i / Pr(y∗1i > 0)

    y∗1i ∼ N(x1iβ1, σ²1),   y∗2i ∼ N(x2iβ2, σ²2)

SLIDE 86

    y∗1i | y∗2i ∼ N( x1iβ1 + (σ12/σ²2)(y2i − x2iβ2),  σ²1 − σ²12/σ²2 )

    E(y∗1i | u2i = y∗2i − x2iβ2) = x1iβ1 + E(u1i | u2i = y∗2i − x2iβ2)

SLIDE 87

Estimation by MLE

    L = ∏0 [1 − Φ(x1iβ1/σ1)]
      × ∏1 (1/σ2) φ((y∗2i − x2iβ2)/σ2)
          · [ 1 − Φ( −[x1iβ1 + (σ12/σ²2)(y2i − x2iβ2)] / (σ²1 − σ²12/σ²2)^{1/2} ) ]

SLIDE 88

Estimation by Two-Step Approach

  • Using data on y2i for which y1i > 0:

    E(y2i | y1i > 0) = x2iβ2 + E(u2i | x1iβ1 + u1i > 0)
                     = x2iβ2 + σ2 E(u2i/σ2 | u1i/σ1 > −x1iβ1/σ1)
                     = x2iβ2 + σ2 (σ12/(σ1σ2)) E(u1i/σ1 | u1i/σ1 > −x1iβ1/σ1)
                     = x2iβ2 + (σ12/σ1) λ(x1iβ1/σ1)

SLIDE 89

Example: Female labor supply model

    max u(L, x)   s.t.  x = wH + v,  H = 1 − L

where H: hours worked; v: asset income; w given; Px = 1;
L: time spent at home for child care

    (∂u/∂L)/(∂u/∂x) = w   when L < 1

    reservation wage = MRS|H=0 = wR
SLIDE 90

Example: Female labor supply model

  • We don’t observe wR directly.
  • Model:

        w0 = xβ + u   (wage person would earn if they worked)
        wR = zγ + v

        wi = w0i   if wRi < w0i
           = not observed, else

  • Fits within previous Tobit framework if we set

        y∗1i = xβ − zγ + u − v = w0 − wR,   y2i = wi

  • Note: Gronau does not develop a model to explain hours of
work.

SLIDE 91

Incorporate choice of H

    w0 = x2iβ2 + u2i   given

    MRS = (∂u/∂L)/(∂u/∂x) = γHi + z′iα + vi

(Assume functional form for utility function that yields this)

SLIDE 92

    wr(Hi = 0) = z′iα + vi

    work if w0 = x2iβ2 + u2i > z′iα + vi

    if work, then w0i = MRS
    ⟹ x2iβ2 + u2i = γHi + z′iα + vi
    ⟹ Hi = (x2iβ2 − z′iα + u2i − vi)/γ = x1iβ1 + u1i

    where x1iβ1 = (x2iβ2 − z′iα)γ⁻¹,  u1i = (u2i − vi)γ⁻¹

Heckman Classical Discrete Choice Theory

slide-93
SLIDE 93

Type 3 Tobit Model

y*1i = x1iβ1 + u1i  ←− hours
y*2i = x2iβ2 + u2i  ←− wage

y1i = y*1i if y*1i > 0;  y1i = 0 if y*1i ≤ 0
y2i = y*2i if y*1i > 0;  y2i = 0 if y*1i ≤ 0

slide-94
SLIDE 94

Here

Hi = H*i if H*i > 0;  Hi = 0 if H*i ≤ 0
wi = w0i if H*i > 0;  wi = 0 if H*i ≤ 0

  • Note: Type IV Tobit simply adds

y3i = y*3i if y*1i > 0;  y3i = 0 if y*1i ≤ 0

slide-95
SLIDE 95
  • Can estimate by

(1) maximum likelihood
(2) a two-step method:

E (w0i | Hi > 0) = γHi + z′iα + E (vi | Hi > 0)

slide-96
SLIDE 96

Type V Tobit Model of Heckman (1978)

y*1i = γ1y2i + x1iβ1 + δ1wi + u1i
y2i = γ2y*1i + x2iβ2 + δ2wi + u2i

  • Analysis of the effect of an antidiscrimination law on the average income of African Americans in the ith state.
  • Observe x1i, x2i, y2i and wi:

wi = 1 if y*1i > 0;  wi = 0 if y*1i ≤ 0

  • y2i = average income of African Americans in the state
  • y*1i = unobservable sentiment towards African Americans
  • wi = 1 if the law is in effect

slide-97
SLIDE 97
  • Adoption of the law is endogenous.
  • Require the restriction γ1δ2 + δ1 = 0 so that we can solve for y*1i as a function that does not depend on wi.
  • This class of models is known as “dummy endogenous variable” models. The restriction resolves the coherency problem: without it, the model may have no solution or multiple solutions for (y*1i, wi).

slide-98
SLIDE 98

Relaxing Parametric Assumptions in the Selection Model

References:

  • Heckman (1990), “Varieties of Selection Bias,” AER
  • Heckman (1980), “Addendum to Sample Selection Bias as Specification Error”
  • Heckman and Robb (1985, 1986)

y*1 = xβ + u
y*2 = zγ + v
y1 = y*1 if y*2 > 0

slide-99
SLIDE 99

Relaxing Parametric Assumptions in the Selection Model

E (y*1 | observed) = xβ + E (u | x, zγ + v > 0) + [u − E (u | x, zγ + v > 0)]

where

E (u | x, zγ + v > 0) = [∫_{−∞}^{∞} ∫_{−zγ}^{∞} u f (u, v | x, z) dv du] / [∫_{−∞}^{∞} ∫_{−zγ}^{∞} f (u, v | x, z) dv du]

  • Note:

Pr (y*2 > 0 | z) = Pr (zγ + v > 0 | z) = P (z) = 1 − Fv (−zγ)

slide-100
SLIDE 100

⇒ Fv (−zγ) = 1 − P (z) ⇒ −zγ = Fv⁻¹ (1 − P (z)) if Fv is invertible.

  • Can replace −zγ in the integrals by Fv⁻¹ (1 − P (z)) if, in addition, f (u, v | x, z) = f (u, v | zγ) (index sufficiency).
  • Then

E (y*1 | y*2 > 0) = xβ + g (P (z)) + ε

where g (P (z)) is the bias or “control function.”

  • Semiparametric selection model: approximate the bias function by a Taylor series in P (z), i.e., a truncated power series.

slide-101
SLIDE 101

AIRUM Models


slide-102
SLIDE 102

Notes on McFadden Chapter/Integrating Discrete Continuous (see Heckman, 1974b, 1978, change notation)

  • Notation:
  • i ∈ I: enumeration of discrete alternatives
  • x: divisible goods
  • wi: attributes of discrete choice i
  • r: price vector of x
  • qi: price of good i
  • y: income, with budget y = rx + qi
  • u : X × Ω × I → [0, 1], utility

  • Define the indirect utility function

v (y − qi, r, wi, i; u) = max_x { u (x, wi, i) : rx ≤ y − qi }

  • Maximize out over the continuous goods so we are left with the discrete goods.

slide-103
SLIDE 103

Assumptions

  • We assume v has the usual properties of an indirect utility function:
  • continuous, twice differentiable, homogeneous of degree 0 in (y − q, r), quasiconvex in r, with ∂v/∂(y − q) > 0.
  • Then we get

x (y − q, r, wi, i; u) = − (∂v/∂r) / (∂v/∂y)  (Roy’s Identity)

slide-104
SLIDE 104

Assumptions

  • For the discrete alternatives, we also get something like Roy’s Identity:

δj = D (j | B, s; u) = − (∂v*/∂qj) / (∂v*/∂y)

where

v* (y − qB, r, wB, B; u) = max_{i∈B} v (y − qi, r, wi, i; u)

δj = 1 if j ∈ B and vj ≥ vk ∀k ∈ B;  δj = 0 otherwise.

slide-105
SLIDE 105
  • If the IU assumptions are satisfied, we can write the relationship between the probability of choosing j and the utility function as P (j | B, s) = E_{u|s} D (j | B, s; u).
  • We seek sufficient conditions on preferences u such that we can integrate out over characteristics and come up with probabilities.
  • McFadden shows that v takes the AIRUM form

v (y − qi, r, wi, i; u) = [y − qi − α (r, wi, i; u)] / β (r)

where y > qi + α, and α, β are homogeneous of degree one with respect to r.

slide-106
SLIDE 106
  • Then

v̄ = E_{u|s} max_{i∈B} v (y − qi, r, wi, i; u) = (1/β (r)) [ y + E_{u|s} max_{i∈B} (−qi − α (r, wi, i; u)) ]

  • and

P (j) = E_{u|s} D (j | B, s) = − (∂v̄/∂qj) / (∂v̄/∂y)

  • v̄ is a utility function yielding the PCS (probabilistic choice system).
  • The demand distribution can be analyzed as if it were generated by a population with common tastes, with each representative consumer having fractional consumption rates for the discrete alternatives.

slide-107
SLIDE 107
  • Let

G (qB, r, wB, B, s) = E_{u|s} max_{i∈B} [−qi − α (r, wB, i; u)]  (∗)

the “social surplus function.”

  • Then

P (j | B, s) = −∂G (qB, r, wB, B, s) / ∂qj  (∗∗)

under the conditions given in McFadden’s chapter.

  • I.e., the choice probabilities are given by the gradient of the social surplus function.
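The gradient property (∗∗) can be verified numerically for the familiar extreme-value case, where G takes the "log-sum" form (a fact consistent with the logit results later in the deck; the constant Euler term is dropped since it does not affect derivatives). This is only an illustrative check, not McFadden's general proof:

```python
import numpy as np

def surplus(q, a):
    # Log-sum form of E max_i (-q_i - alpha_i + eps_i) for iid extreme-value eps
    return np.log(np.sum(np.exp(-q - a)))

def choice_prob(q, a):
    # Logit probabilities implied by the same utilities
    v = np.exp(-q - a)
    return v / v.sum()

q = np.array([1.0, 0.5, 2.0])   # prices (illustrative)
a = np.array([0.2, 0.4, 0.1])   # alpha terms (illustrative)
h = 1e-6
for j in range(3):
    dq = np.zeros(3); dq[j] = h
    grad_j = (surplus(q + dq, a) - surplus(q - dq, a)) / (2 * h)
    # P(j) = -dG/dq_j, checked by central differences
    assert abs(-grad_j - choice_prob(q, a)[j]) < 1e-6
```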

slide-108
SLIDE 108

Return to main text


slide-109
SLIDE 109

Appendix


slide-110
SLIDE 110

Variety of Simulation Methods

  • Simulated method of moments
  • Method of simulated scores
  • Simulated maximum likelihood

References:

  • Lerman and Manski (1981), in Structural Analysis of Discrete Data (online at McFadden’s website)
  • McFadden (1989), Econometrica
  • Ruud (1982), Journal of Econometrics
  • Hajivassiliou and McFadden (1990)
  • Hajivassiliou and Ruud (Ch. 40), Handbook of Econometrics
  • Stern (1992), Econometrica
  • Stern (1997), survey in the JEL
  • Bayesian MCMC (Chib et al., on reading list)

slide-111
SLIDE 111

Early Simulation Method: “Crude Frequency Method”

Model: uj = Zjβ + ηj with β fixed, ηj ∼ N (0, Ω), J choices.

Pij = probability i chooses j;  Yij = 1 if i chooses j, 0 else.

ℒ = Π_{i=1}^{N} Π_{j=1}^{J} (Pij)^{Yij}

log ℒ = Σ_{i=1}^{N} Σ_{j=1}^{J} Yij log Pij

slide-112
SLIDE 112

Simulation Algorithm

(i) For given β, Ω, generate R Monte Carlo draws u^r_j, j = 1...J, r = 1...R.

(ii) Let

P̃k = (1/R) Σ_{r=1}^{R} 1 (u^r_k = max {u^r_1, ..., u^r_J})

where P̃k is a “frequency simulator” of Pr (k chosen; β, Ω).

(iii) Maximize Σ_{i=1}^{N} log P̃ik over alternative values of β, Ω.
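Step (ii), the frequency simulator, is just a count of argmax draws. A minimal sketch in Python (function and variable names are illustrative; `chol_omega` is a Cholesky factor of Ω):

```python
import numpy as np

def frequency_simulator(beta, chol_omega, Z, R, rng):
    """Crude frequency simulator of choice probabilities.
    Z: (J, K) alternative characteristics; utilities u_j = Z_j beta + eta_j,
    eta ~ N(0, Omega) with Omega = chol_omega @ chol_omega.T."""
    mean = Z @ beta                                           # (J,)
    eta = rng.standard_normal((R, Z.shape[0])) @ chol_omega.T  # (R, J) draws
    choices = np.argmax(mean + eta, axis=1)                   # winner per draw
    return np.bincount(choices, minlength=Z.shape[0]) / R     # relative frequencies
```

For two alternatives with Ω = I the simulated P(1) should approach Φ((Z1β − Z2β)/√2), which gives a quick sanity check.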

slide-113
SLIDE 113
  • Note: Lerman and Manski found that this procedure performs poorly and requires a large number of draws, particularly when P is close to 0 or 1.

var ( (1/R) Σ_{r=1}^{R} 1 (·) ) = (1/R²) Σ_{r=1}^{R} var (1 (·)) = var (1 (·)) / R,  with var (1 (·)) evaluated at the true values.

  • McFadden (1989) provided some key insights into how to improve the simulation method. He showed that simulation is viable even for a small number of draws provided that:

(a) an unbiased simulator is used;
(b) the functions to be simulated appear linearly in the conditions defining the estimator;
(c) the same set of random draws is used to simulate the model at different parameter values.

  • Note: Condition (b) is violated for the crude frequency method, which has log P̃ik.

slide-114
SLIDE 114

Simulated Method of Moments (McFadden, Econometrica, 1989)

uij = Zijβi = Zijβ̄ + Zijεi,  βi = β̄ + εi  (earlier model with only preference heterogeneity)

Pij (γ) = P (i chooses j | wi, γ)  (wi are regressors)

  • Define Yij = 1 if i chooses j, Yij = 0 otherwise.

log ℒ = (1/N0) Σ_{i=1}^{N} Σ_{j=1}^{J} Yij ln Pij (γ),  N0 = NJ

∂ log ℒ/∂γ = (1/N0) Σ_{i=1}^{N} Σ_{j=1}^{J} Yij [∂Pij/∂γ] / Pij (γ) = 0  (3)

slide-115
SLIDE 115

Simulated Method of Moments (McFadden, Econometrica, 1989)

√N0 (γ̂_MLE − γ0) ∼ N (0, I_f⁻¹)

Î_f = (1/N0) Σ_{i=1}^{N} [ Σ_{j=1}^{J} Yij (∂Pij/∂γ) / Pij (γ) ] [ Σ_{j=1}^{J} Yij (∂Pij/∂γ) / Pij (γ) ]′  (outer product of the score vector)

slide-116
SLIDE 116
  • Now use the fact that Σ_{j=1}^{J} Pij (γ) = 1:

Σ_{j=1}^{J} ∂Pij/∂γ = 0 ⇒ Σ_{j=1}^{J} [(∂Pij/∂γ) / Pij] Pij = 0

  • Rewrite (3) as

(1/N0) Σ_{i=1}^{N} Σ_{j=1}^{J} (Yij − Pij) (∂Pij/∂γ) / Pij = 0

  • Note: E (Yij) = Pij.

slide-117
SLIDE 117
  • Letting εij = Yij − Pij and Zij = (∂Pij/∂γ) / Pij, we have

(1/N0) Σ_{i=1}^{N} Σ_{j=1}^{J} εij Zij = 0,

like a moment condition using Zij as the instrument; but so far Pij is still a (J − 1)-dimensional integral.

slide-118
SLIDE 118

Simulation Algorithm

  • Model

uij = Zijβ̄ + Zijεi,  J choices, K characteristics
uij : 1 × 1;  Zij : 1 × K;  β̄ : K × 1;  εi : K × 1

  • Rewrite as

ũi = Z̃iβ̄ + Z̃iΓẽi

where ΓΓ′ = Σε (Cholesky decomposition), ẽi ∼ N (0, I_K), εi = Γẽi
ũi : J × 1;  Z̃i : J × K;  Γ : K × K;  ẽi : K × 1

slide-119
SLIDE 119
  • Step (i). Generate draws ẽ^r_i for each i, iid across persons and distributed N (0, I_K). In total, generate N (sample size) · K (vector length) · R (number of Monte Carlo draws) values.
  • Step (ii). Fix the matrix Γ and obtain

ηij = ZijΓẽi,  where Zij : 1 × K; Γ : K × K; ẽi : K × 1.

  • Form the vector (Zi1Γẽi, Zi2Γẽi, ..., ZiJΓẽi)′ for each person.

slide-120
SLIDE 120
  • Step (iii). Fix β̄ and generate ũij = Zijβ̄ + ηij ∀i.
  • Step (iv). Find the relative frequency with which the ith person chooses alternative j across the Monte Carlo draws:

P̃ij (γ) = (1/R) Σ_{r=1}^{R} 1 (ũ^r_ij > ũ^r_im ∀m ≠ j)

  • where P̃ij (γ) is the “simulator” for Pij (γ). Stack to get P̃i (γ).

slide-121
SLIDE 121
  • Step (v). To get P̃i (γ) for different values of γ, repeat steps (ii) through (iv) using the same r.v.’s ẽi generated in step (i).
  • Step (vi). Define

wij = (∂Pij (γ)/∂γ) / Pij

  • A simulator w̃ij for wij (stacked as w̃i) can be obtained by a numerical derivative,

∂Pij (γ)/∂γm ≈ [P̃ij (γ + h lm) − P̃ij (γ − h lm)] / (2h)

where m indexes the elements of γ and lm is a vector with 1 in the mth place.

slide-122
SLIDE 122

Solve Moment Condition

  • Apply the Gauss-Newton method and iterate to convergence:

γ1 = γ0 + [ (1/N) Σ_{i=1}^{N} wi (γ0) w̃i (γ0)′ ]⁻¹ · (1/N) Σ_{i=1}^{N} wi (γ0) {yi − P̃i (γ0)}

slide-123
SLIDE 123

Solve Moment Condition

Digression on Gauss-Newton

  • Suppose the problem is

S = min_β (1/N) Σ_{i=1}^{N} [yi − fi (β)]²  (nonlinear least squares)

  • Taylor expand around an initial guess β̂1:

fi (β) = fi (β̂1) + (∂fi/∂β)|_{β̂1} (β − β̂1) + ...  (higher-order terms ignored)

slide-124
SLIDE 124
  • Substitution gives

min_β (1/N) Σ_{i=1}^{N} [ yi − fi (β̂1) − (∂fi/∂β)|_{β̂1} (β − β̂1) ]²

  • Solve for β̂2 to get

β̂2 = β̂1 + [ Σ_{i=1}^{N} (∂fi/∂β)|_{β̂1} (∂fi/∂β)′|_{β̂1} ]⁻¹ Σ_{i=1}^{N} (∂fi/∂β)|_{β̂1} [yi − fi (β̂1)]

  • Repeat until convergence (a problem arises if the matrix is singular).
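The Gauss-Newton digression can be sketched as a few lines of Python. This is a generic implementation of the iteration above for nonlinear least squares (all names are illustrative):

```python
import numpy as np

def gauss_newton(f, jac, y, beta0, tol=1e-10, max_iter=100):
    """Gauss-Newton for min_beta sum_i (y_i - f_i(beta))^2.
    f(beta) -> (N,) fitted values; jac(beta) -> (N, K) matrix of df_i/dbeta."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        r = y - f(beta)                              # residuals
        J = jac(beta)
        step = np.linalg.solve(J.T @ J, J.T @ r)     # (J'J)^{-1} J'r
        beta = beta + step
        if np.linalg.norm(step) < tol:
            break
    return beta
```

For example, fitting y = exp(b·x) to exactly generated data recovers b to machine precision in a handful of iterations.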

slide-125
SLIDE 125

Disadvantages of Simulation Methods

(1) P̃ij is not smooth, due to the indicator function (this causes difficulties in deriving the asymptotic distribution; need the methods developed by Pakes and Pollard (1989) for nondifferentiable functions). Smoothed SMM methods were developed by Stern, Hajivassiliou, and Ruud.

(2) P̃ij cannot be 0 (causes problems in the denominator when Pij is close to 0).

(3) Simulating small Pij may require a large number of draws.

slide-126
SLIDE 126

Disadvantages of Simulation Methods

  • Refinement: “smoothed simulated method of moments” replaces the indicator with a smooth function (Stern (1992), Econometrica); e.g., instead of the indicator, use

P̃ij (γ) = (1/R) Σ_{r=1}^{R} Φ (ũ^r_ij − ũ^r_im)

(written here for the binary comparison of j with alternative m).

slide-127
SLIDE 127

How does simulation affect the asymptotic distribution?

  • Without simulation, get

√N (γ̂_mme − γ0) ∼ N ( 0, [ plim_{N→∞} (1/N) Σ_i wi (yi − Pi) (yi − Pi)′ wi′ ]⁻¹ )

  • With simulation, the variance is slightly higher due to simulation error:

√N (γ̂_msm − γ0) ∼ N ( 0, plim_{N→∞} C⁻¹ {1 + 1/R} )

where R is the number of simulation draws.

slide-128
SLIDE 128

How does simulation affect the asymptotic distribution?

  • where

C = plim_{N→∞} (1/N) Σ_{i=1}^{N} wi (yi − Pi) (yi − Pi)′ wi′

  • As R → ∞,

√N (γ̂_msm − γ0) ∼ N (0, C⁻¹).

  • Note: The method does not require that the number of draws go to infinity.
slide-129
SLIDE 129

Choice-Based Sampling (See Heckman in New Palgrave)

  • References:
  • Chs. 1-2 of the Manski and McFadden volume
  • Manski and Lerman (1977, Econometrica)
  • Amemiya
  • Examples:
  • 1. Suppose we gather data on transportation mode choice at the train station, the subway station, and car checkpoints (toll booths, etc.)

slide-130
SLIDE 130
  • We observe characteristics of populations conditioned on the choice that they made (this type of sampling commonly arises).
  • 2. Evaluating the effects of a social program: have data on participants and non-participants; usually participants are oversampled relative to their frequency in the population.
  • Distinguish between exogenous stratification and endogenous stratification, the latter of which is choice-based. (But a special type of endogenous stratification.)
  • Oversampling in high-population areas (as is commonly done to reduce sampling costs or to increase representation of some groups) could be exogenous stratification (depending on the phenomenon being studied).

slide-131
SLIDE 131

Notation:

  • Let Pi = P (i | Z) in a random sample, and P*i its analog in a choice-based sample (CBS).
  • Under CBS, sampling is assumed to be random within the i partitions of the data: P (Z | i) = P* (Z | i), but P (Z) ≠ P* (Z).
  • Suppose that we want to recover P (i | Z) from choice-based data.
  • We observe

P* (i | Z) (assume Z are discrete conditioning cells), P* (Z), P* (i)

  • Frequency weights: P (i) / P* (i).

slide-132
SLIDE 132

By Bayes’ Rule

P (A | B) = P (A, B) / P (B) = P (B | A) · P (A) / P (B)

P* (i | Z) = P* (Z | i) · P* (i) / P* (Z)

P (i | Z) = P (Z | i) · P (i) / P (Z)

slide-133
SLIDE 133

By Bayes’ Rule

P (i | Z) = [P* (i | Z) · P* (Z) / P* (i)] · P (i) / P (Z)

P (Z) = Σ_j P (Z | j) P (j),  with P (Z | j) = P* (Z | j) = P* (j | Z) · P* (Z) / P* (j)

⇒ P (i | Z) = [P* (i | Z) P* (Z) P (i) / P* (i)] / Σ_j [P* (j | Z) P* (Z) P (j) / P* (j)]

= P* (i | Z) [P (i) / P* (i)] / Σ_j P* (j | Z) [P (j) / P* (j)]
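The reweighting formula can be checked on a small constructed example: build a population joint distribution over (i, Z), form a choice-based sample that oversamples one choice while preserving P(Z | i), and verify that the weights P(i)/P*(i) recover the population conditional probabilities. All numbers below are illustrative:

```python
import numpy as np

# Population joint distribution over (choice i, cell Z): rows = i, cols = Z
P_iZ = np.array([[0.10, 0.30],    # choice 0
                 [0.24, 0.36]])   # choice 1
P_i = P_iZ.sum(axis=1)            # marginal choice probabilities P(i)
P_i_given_Z = P_iZ / P_iZ.sum(axis=0)

# Choice-based sample: choice 0 oversampled to 50%, but P(Z|i) preserved
Pstar_i = np.array([0.5, 0.5])
Pstar_iZ = (P_iZ / P_i[:, None]) * Pstar_i[:, None]   # P*(i,Z) = P(Z|i) P*(i)
Pstar_i_given_Z = Pstar_iZ / Pstar_iZ.sum(axis=0)

# Reweight: P(i|Z) = P*(i|Z) w_i / sum_j P*(j|Z) w_j, with w_i = P(i)/P*(i)
w = P_i / Pstar_i
recovered = Pstar_i_given_Z * w[:, None]
recovered /= recovered.sum(axis=0)

assert np.allclose(recovered, P_i_given_Z)
```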

slide-134
SLIDE 134
  • To recover P (i | Z) from choice-based sampled data, you need to know P (j), P* (j) ∀j. P* (j) can be estimated from the sample, but P (j) requires outside information. Need the weights P (i) / P* (i).
  • Note: The problem set asks you to consider how CBS biases the coefficients and intercept in a logit model. (Can show that the bias is only in the constant.)

slide-135
SLIDE 135

Application and Extension: Berry, Levinsohn, and Pakes (1995)

  • Develop an equilibrium model and estimation techniques for analyzing demand and supply in differentiated product markets.
  • Use it to study the automobile industry.
  • The goal is to estimate the parameters of both the demand and cost functions, incorporating own- and cross-price elasticities and elasticities with respect to product attributes (car horsepower, MPG, air conditioning, size, ...), using only aggregate product-level data supplemented with data on the distribution of consumer characteristics (e.g., the income distribution from the CPS).
  • Want to allow for flexible substitution patterns.

slide-136
SLIDE 136

Key assumptions

(i) assumptions on the joint distribution of observed and unobserved product and consumer characteristics;

(ii) price-taking by consumers; Nash equilibrium assumptions on producers in an oligopolistic, differentiated products market.

slide-137
SLIDE 137

Notation

  • ζ: individual characteristics
  • x (observed), ξ (unobserved), p (price): product characteristics
  • uij = u (ζi, pj, xj, ξj; θ): utility if person i chooses j (Cobb-Douglas assumption here)
  • j = 0, 1, ..., J, where 0 = not purchasing any (the outside good)

slide-138
SLIDE 138

Notation

  • Define

Aj = {ζ : u (ζ, pj, xj, ξj; θ) ≥ u (ζ, pr, xr, ξr; θ), r = 0, ..., J},

  • the set of ζ that induces choice of good j. This is defined over individual characteristics, which may be observed or unobserved.

slide-139
SLIDE 139

Market Share

sj (p, x, ξ; θ) = ∫_{ζ∈Aj} f (ζ) dζ  (s is the vector of market shares)

  • Special functional form:

uij = u (ζi, pj, xj, ξj; θ) = xjβ − αpj + ξj + εij = δj + εij

slide-140
SLIDE 140

Market Share

  • δj = xjβ − αpj + ξj = mean utility from good j
  • ξj is the mean across consumers of the unobserved component of utility
  • εij are the only elements representing consumer characteristics
  • Special case:

ξj = 0 (no unobserved characteristic);  εij iid over i, j, independent of xj

  • Then the share is

sj = ∫_{−∞}^{∞} Π_{q≠j} F (δj − δq + ε) f (ε) dε

  • A one-dimensional integral; it has a closed-form solution under the extreme value distribution.
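The closed-form claim can be checked by simulation: with iid type-1 extreme value εij, the share integral reduces to the logit formula sj = e^{δj}/Σ_q e^{δq} (no outside good in this small example; all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
delta = np.array([0.0, 1.0, -0.5])    # mean utilities of three goods
R = 400000
# iid type-1 extreme value (Gumbel) draws for every good and simulated consumer
eps = rng.gumbel(size=(R, 3))
mc_shares = np.bincount(np.argmax(delta + eps, axis=1), minlength=3) / R

closed = np.exp(delta) / np.exp(delta).sum()   # closed-form logit shares
assert np.max(np.abs(mc_shares - closed)) < 0.005
```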

slide-141
SLIDE 141

Why is assumption that utility is additively separable and iid in consumer and product characteristics highly restrictive?

(a) It implies that all substitution effects depend only on the δs (since there is a unique vector of market shares associated with each δ vector). Therefore, conditional on market shares, substitution patterns don’t depend on the characteristics of the product. Example: if a Mercedes and a Yugo have the same market share, then they must have the same δs and the same cross derivative with respect to any third car (e.g., a BMW).

∂si/∂pk = − ∫ [ Π_{q≠i,k} F (δi − δq + ε) ] F′ (δi − δk + ε) (∂δk/∂pk) f (ε) dε  (the same if the δs are the same)

slide-142
SLIDE 142

Why is assumption that utility is additively separable and iid in consumer and product characteristics highly restrictive?

(b) Two products with the same market share have the same own-price derivatives (not good, because you expect product markups to depend on more than market share).

(c) It also assumes that individuals value product characteristics in the same way (no preference heterogeneity).

slide-143
SLIDE 143

Alternative Model (Random Coefficients Versions)

uij = xjβ̄ − αpj + ξj + Σ_k σk xjk νik + εij,  βik = β̄k + σk νik,  with νik mean zero and unit variance

  • Could still assume εij has the iid extreme value distribution.

slide-144
SLIDE 144

Model Actually Used

  • Impose an alternative functional form assumption because they want to incorporate prior information on the distribution of relevant consumer characteristics and on interactions between consumer and product characteristics:

uij = (yi − pj)^α G (xj, ξj, νi) e^{ε(i,j)}

  • Assume G is log linear:

ũij = log uij = α log (yi − pj) + xjβ̄ + ξj + Σ_k σk xjk νik + εij

slide-145
SLIDE 145

Model Actually Used

  • Outside (no purchase) good utility:

ũi0 = α log yi + ξ0 + σ0 νi0 + εi0

  • Note: Prices are likely to be correlated with unobserved product attributes, ξ, which leads to an endogeneity problem. (ξ may represent things like style, prestige, reputation, etc.) Quantity demanded: qj = M sj (x, ξ, p; θ), where M is market size and sj the share.
  • ξ enters nonlinearly, so we need to use some transformation to be able to apply instrumental variables (the principle of the replacement function).

slide-146
SLIDE 146

Cost Side

  • Multiproduct firms f = 1, . . . , F. Each produces a subset τf of the J possible products. The cost of producing a good is assumed to be independent of output levels and log linear in a vector of cost characteristics (Wj, ωj):

ln mcj = Wjγ + ωj

⇒ Πf = Σ_{j∈τf} (pj − mcj) M sj (p, x, ξ; θ)

  • Nash assumption: the firm chooses the prices that maximize profit, taking as given the attributes of its products and the prices and attributes of its competitors’ products. pj satisfies

sj (p, x, ξ; θ) + Σ_{r∈τf} (pr − mcr) ∂sr (p, x, ξ; θ)/∂pj = 0

slide-147
SLIDE 147

Cost Side

  • Define

Δjr = −∂sr/∂pj if r and j are produced by the same firm, 0 otherwise

⇒ s (p, x, ξ; θ) − Δ (p, x, ξ; θ) [p − mc] = 0
⇒ p = mc + Δ (p, x, ξ; θ)⁻¹ s (p, x, ξ; θ)  (markup equation)

slide-148
SLIDE 148
  • The markup term b (p, x, ξ; θ) ≡ Δ (p, x, ξ; θ)⁻¹ s (p, x, ξ; θ) depends only on the parameters of the demand system and the equilibrium price vector.
  • Since p is a function of ω, b (p, x, ξ; θ) is also a function of ω (the unobserved cost determinants).
  • Let ln mcj = Wjγ + ωj. Then

p = exp {Wγ + ω} + b (p, x, ξ; θ)
ln (p − b (p, x, ξ; θ)) = Wγ + ω  (pricing equation)

slide-149
SLIDE 149

Estimation

  • Need instruments for both the demand and pricing equations, i.e., variables correlated with (p, q) but uncorrelated with ξ and ω. Let Z = (X, W) (p, q not included in Z).
  • Assume

E (ξj | Z) = E (ωj | Z) = 0
E ((ξj, ωj)′ (ξj, ωj) | Z) = Ω (Zj)  (finite for almost every Zj)

  • Note that demand for any product is a function of the characteristics of all products, so we don’t have any exclusion restrictions.

slide-150
SLIDE 150

Data

  • J vectors (xj, Wj, pj, qj)
  • n: number of households sampled
  • s^n: vector of sampled market shares
  • Assume that a true θ0 generated the population data according to the model.
  • s^n converges to s0 (multinomial sampling)
  • √n (s^n − s0) = Op (1)

slide-151
SLIDE 151
  • Assume we could calculate

{ξj (θ, s, P), ωj (θ, s, P)}_{j=1}^{J}

for alternative values of θ.

  • They show that any choice of

(a) an observed vector of market shares, s
(b) a distribution of consumer characteristics, P
(c) a parameter vector of the model, θ

  • implies a unique sequence of values for the two unobserved characteristics, ξj (θ, s, P) and ωj (θ, s, P).

slide-152
SLIDE 152
  • Then any function of Z must be uncorrelated with the vector

{ξj (θ, s0, P0), ωj (θ, s0, P0)}_{j=1}^{J} when θ = θ0

  • Can use GMM.
  • Note: The conditional moment restriction implies an infinite number of unconditional restrictions:

min_θ ‖ (1/J) Σ_j Hj (Z) [ξj (θ, s0, P0), ωj (θ, s0, P0)]′ ‖

slide-153
SLIDE 153

Computation in the Logit Case

  • Logit: εij is extreme value (“Weibull” in older terminology).

δj = xjβ − αpj + ξj
uij = xjβ − αpj + ξj + εij

sj (p, x, ξ) = e^{δj} / (1 + Σ_{q=1}^{J} e^{δq})

δj = ln sj − ln s0, j = 1, . . . , J
ξj = ln sj − ln s0 − xjβ + αpj

  • See the paper for more details.
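The logit inversion δj = ln sj − ln s0 is simple enough to verify in code: compute shares from a δ vector (outside good normalized to 0) and invert them back. Names are illustrative:

```python
import numpy as np

def logit_shares(delta):
    # shares with an outside good whose mean utility is normalized to 0
    e = np.exp(delta)
    return e / (1.0 + e.sum())

def invert_shares(s):
    # delta_j = ln s_j - ln s_0, where s_0 is the outside-good share
    s0 = 1.0 - s.sum()
    return np.log(s) - np.log(s0)

delta = np.array([0.5, -1.0, 2.0])
assert np.allclose(invert_shares(logit_shares(delta)), delta)
```

Given data on (xj, pj, sj), one would then recover ξj = δj − xjβ + αpj for candidate (β, α), which is what makes IV/GMM estimation feasible.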

slide-154
SLIDE 154

Generalized Method of Moments (GMM)

References:

  • Hansen (1982), Econometrica
  • Hansen and Singleton (1982, 1988)
  • Also known as “minimum distance estimators”
  • Suppose that we have some data {xt}, t = 1...T, and we want to test hypotheses about E (xt) = µ.
  • How do we proceed? By a CLT,

(1/√T) Σ_{t=1}^{T} (xt − µ) ∼ N (0, V0)

V0 = E [(xt − µ) (xt − µ)′] if xt iid

V0 = lim_{T→∞} E [ ((1/√T) Σ_t (xt − µ)) ((1/√T) Σ_t (xt − µ))′ ]  (general case)

slide-155
SLIDE 155
  • We can decompose V0 = QDQ′, where

QQ′ = I, Q⁻¹ = Q′, D = diagonal matrix of eigenvalues
V0 = QD^{1/2}D^{1/2}Q′,  Q′V0Q = D^{1/2}D^{1/2},  D^{−1/2}Q′V0QD^{−1/2} = I

  • Under the null H0,

[(1/√T) Σ_t (xt − µ0)]′ V0⁻¹ [(1/√T) Σ_t (xt − µ0)]
= [(1/√T) Σ_t (xt − µ0)]′ QD^{−1/2}D^{−1/2}Q′ [(1/√T) Σ_t (xt − µ0)]
∼ χ² (n)
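The χ²(n) claim can be checked by Monte Carlo: compute the quadratic form with an estimated V over many replications and compare the rejection rate against the χ²(n) critical value. A small sketch (sample sizes and seeds are arbitrary):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
n, T = 3, 500
mu0 = np.zeros(n)
reps = 2000
stats = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=(T, n))              # iid data with true mean mu0, V0 = I
    g = np.sqrt(T) * (x.mean(axis=0) - mu0)  # (1/sqrt(T)) sum (x_t - mu0)
    V = np.cov(x, rowvar=False)              # sample estimate of V0
    stats[r] = g @ np.linalg.solve(V, g)     # quadratic form
# Under H0 the statistic is approximately chi-square(n): 5% test rejects ~5%
reject = np.mean(stats > chi2.ppf(0.95, n))
```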

slide-156
SLIDE 156
  • where n is the number of moment conditions.
  • How does the test statistic behave under the alternative (µ ≠ µ0)? It should get large.

slide-157
SLIDE 157
  • Write

[(1/√T) Σ_t (xt − µ0)]′ V⁻¹ [(1/√T) Σ_t (xt − µ0)]  (4)

= [(1/√T) Σ_t (xt − µ)]′ V⁻¹ [(1/√T) Σ_t (xt − µ)]
+ 2 [(1/√T) Σ_t (µ − µ0)]′ V⁻¹ [(1/√T) Σ_t (xt − µ)]
+ [(1/√T) Σ_t (µ − µ0)]′ V⁻¹ [(1/√T) Σ_t (µ − µ0)]  (5)

  • The last term is O (T);

λ = T (µ − µ0)′ V⁻¹ (µ − µ0) is the noncentrality parameter.

slide-158
SLIDE 158
  • Problems:

(i) V0 is not known a priori.

  • Estimate VT → V0.
  • In the iid setting, use the sample covariance matrix.
  • In the general setting, approximate the limit by finite T.

(ii) µ is not known.

  • Suppose we want to test µ = ϕ (β), with ϕ specified and β unknown.

slide-159
SLIDE 159
  • Can estimate by min-χ² estimation:

min_{β∈B} [(1/√T) Σ_t (xt − ϕ (β))]′ VT⁻¹ [(1/√T) Σ_t (xt − ϕ (β))] ∼ χ² (n − k)

k = dimension of β,  n = number of moments

  • Note: For each of the k dimensions searched over, you lose one degree of freedom (will show next).

slide-160
SLIDE 160
  • Find the distribution theory for β̂. Q: This is an M-estimator, so how do you proceed?
  • FOC:

√T (∂ϕ/∂β)′|_{β̂T} VT⁻¹ (1/√T) Σ_t (xt − ϕ (β̂T)) = 0

  • Taylor expand ϕ (β̂T) around β0:

ϕ (β̂T) = ϕ (β0) + (∂ϕ/∂β)|_{β*} (β̂T − β0),  β* between β0 and β̂T

  • Get

√T (∂ϕ/∂β)′|_{β̂T} VT⁻¹ (1/√T) Σ_t [ xt − ϕ (β0) − (∂ϕ/∂β)|_{β*} (β̂T − β0) ] = 0

slide-161
SLIDE 161
  • Rearrange to solve for √T (β̂T − β0):

[ (∂ϕ/∂β)′|_{β̂T} VT⁻¹ (∂ϕ/∂β)|_{β*} ] √T (β̂T − β0) = (∂ϕ/∂β)′|_{β̂T} VT⁻¹ (1/√T) Σ_t (xt − ϕ (β0))

  • If

(∂ϕ/∂β)|_{β̂T} → D0 (convergence of a random function),  VT → V0,  (∂ϕ/∂β)|_{β*} → D0

slide-162
SLIDE 162
  • Apply the CLT:

(1/√T) Σ_t (xt − ϕ (β0)) ∼ N (0, V0)

  • Then

√T (β̂T − β0) ∼ N ( 0, (D0′ V0⁻¹ D0)⁻¹ )

slide-163
SLIDE 163
  • Why is the limiting distribution χ² (n − k)?
  • Write

(1/√T) Σ_t [xt − ϕ (β̂T)] = (1/√T) Σ_t (xt − µ0) + (1/√T) Σ_t [µ0 − ϕ (β̂T)]

  • Recall, we had

ϕ (β̂T) = ϕ (β0) + (∂ϕ/∂β)|_{β*} (β̂T − β0)  (6)

and

√T (β̂T − β0) = [ (∂ϕ/∂β)′|_{β̂T} VT⁻¹ (∂ϕ/∂β)|_{β*} ]⁻¹ (∂ϕ/∂β)′|_{β̂T} VT⁻¹ (1/√T) Σ_t (xt − µ0)  (7)

(the expression for √T (β̂T − β0) derived earlier).

slide-164
SLIDE 164
  • Note that, by (6), the second term is a linear combination of the first, so

(1/√T) Σ_t [xt − ϕ (β̂T)] = B0 (1/√T) Σ_t (xt − µ0)

  • where

B0 = I − (∂ϕ/∂β)|_{β0} [ (∂ϕ/∂β)′|_{β0} V0⁻¹ (∂ϕ/∂β)|_{β0} ]⁻¹ (∂ϕ/∂β)′|_{β0} V0⁻¹

  • Note that

(∂ϕ/∂β)′|_{β0} V0⁻¹ B0 = 0

slide-165
SLIDE 165
  • This tells us that certain linear combinations of B0 give a degenerate distribution (along k dimensions).
  • This needs to be taken into account when testing.
  • Recall that we had V0 = QDQ′, QQ′ = I, V0⁻¹ = QD^{−1/2}D^{−1/2}Q′. Then

D^{−1/2}Q′ (1/√T) Σ_t [xt − ϕ (β̂T)] = D^{−1/2}Q′ B0 (1/√T) Σ_t (xt − µ0)

  • where, letting A = D^{−1/2}Q′ (∂ϕ/∂β)|_{β0},

D^{−1/2}Q′ B0 = [ I − A (A′A)⁻¹ A′ ] D^{−1/2}Q′  (the idempotent matrix M_A times D^{−1/2}Q′)

slide-166
SLIDE 166
  • Thus

D^{−1/2}Q′ (1/√T) Σ_t [xt − ϕ (β̂T)] = M_A · D^{−1/2}Q′ · (1/√T) Σ_t (xt − µ0)

  • The matrix M_A accounts for the fact that we performed the minimization over β.

slide-167
SLIDE 167

How is distribution theory affected?

  • We have a quadratic form in normal r.v.’s with an idempotent matrix.
  • E.g., ε̂′ε̂ = ε′ Mx ε, with Mx = I − x (x′x)⁻¹ x′.
  • Key facts:

(i) Theorem: Let Y ∼ N (θ, σ²In) and let P be a symmetric matrix of rank r. Then Q = (Y − θ)′ P (Y − θ) / σ² ∼ χ²_r iff P² = P (i.e., P idempotent). (See Seber, p. 37.)

(ii) If Qi ∼ χ²_{ri}, i = 1, 2, with r1 > r2, and Q = Q1 − Q2 is independent of Q2, then Q ∼ χ²_{r1−r2}.

  • Apply these results to

ε̂′ε̂ / σ² = ε′ Mx ε / σ² ∼ χ² (rank Mx)

slide-168
SLIDE 168
  • The rank of an idempotent matrix is equal to its trace, and

tr (A) = Σ_{i=1}^{n} λi,  λi eigenvalues  (8)

  • For an idempotent matrix, the eigenvalues are all 0 or 1.
  • (Note: rank = number of non-zero eigenvalues; for an idempotent matrix these are all 0 or 1, so rank = trace.)

slide-169
SLIDE 169

rank (I − x (x′x)⁻¹ x′) = rank (I) − rank (x (x′x)⁻¹ x′)

where rank (I) = n and

rank (x (x′x)⁻¹ x′) = trace (x (x′x)⁻¹ x′) = trace (x′x (x′x)⁻¹),  since tr (AB) = tr (BA)
= trace (Ik) = k

⇒ rank (I − x (x′x)⁻¹ x′) = n − k
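All of these properties of the residual-maker matrix (idempotency, trace = rank = n − k, eigenvalues 0 or 1) are easy to confirm numerically on a random design matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 10, 4
x = rng.normal(size=(n, k))
M = np.eye(n) - x @ np.linalg.solve(x.T @ x, x.T)   # I - x(x'x)^{-1}x'

assert np.allclose(M @ M, M)                        # idempotent
assert np.isclose(np.trace(M), n - k)               # trace = rank = n - k
assert np.linalg.matrix_rank(M) == n - k
eig = np.linalg.eigvalsh(M)
assert np.all((np.abs(eig) < 1e-8) | (np.abs(eig - 1) < 1e-8))  # eigenvalues 0 or 1
```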

slide-170
SLIDE 170
  • Thus, by the same reasoning, the limiting distribution of

[(1/√T) Σ_t (xt − ϕ (β̂T))]′ VT⁻¹ [(1/√T) Σ_t (xt − ϕ (β̂T))] ∼ χ² (n − k),  where n − k = rank (M_A)

  • We preserve the χ² form but lose degrees of freedom in estimating β.
  • In the case where n = k (the just-identified case), we can estimate β but have no degrees of freedom left to perform the test.
  • Would GMM provide a method for estimating β if we used a weighting matrix other than VT⁻¹?
  • Why not replace V0⁻¹ by W0?

slide-171
SLIDE 171

min_{β∈B} [(1/√T) Σ_t (xt − ϕ (β))]′ W0 [(1/√T) Σ_t (xt − ϕ (β))]

  • Could choose W0 = I (avoids the need to estimate a weighting matrix).
  • Result: The asymptotic covariance is altered and the asymptotic distribution of the criterion is different, but (1/√T) Σ_t (xt − ϕ (β̂T)) will still be normal.

slide-172
SLIDE 172
  • What is the advantage of focusing on minimum χ² estimation?
  • Choosing W0 = V0⁻¹ gives the smallest covariance matrix: the most efficient estimator of β and the most powerful test of the restrictions.
  • Show this: show that

(D0′ W0 D0)⁻¹ (D0′ W0 V0 W0 D0) (D0′ W0 D0)⁻¹ − (D0′ V0⁻¹ D0)⁻¹ is p.s.d.,

  • where (D0′ W0 D0)⁻¹ (D0′ W0 V0 W0 D0) (D0′ W0 D0)⁻¹ is the covariance matrix of √T (β̂T − β0) when a general weighting matrix is used.
  • Equivalent to showing that

D0′ V0⁻¹ D0 − (D0′ W0 D0) (D0′ W0 V0 W0 D0)⁻¹ (D0′ W0 D0) is p.s.d.

  • Show that it can be written as a quadratic form: take any vector α.

slide-173
SLIDE 173

α′ [ D0′ V0⁻¹ D0 − (D0′ W0 D0) (D0′ W0 V0 W0 D0)⁻¹ (D0′ W0 D0) ] α

= α′ D0′ V0^{−1/2} [ I − V0^{1/2} W0 D0 (D0′ W0 V0 W0 D0)⁻¹ D0′ W0 V0^{1/2} ] V0^{−1/2} D0 α

= α′ D0′ V0^{−1/2} [ I − Ṽ (Ṽ′Ṽ)⁻¹ Ṽ′ ] V0^{−1/2} D0 α ≥ 0,  where Ṽ = V0^{1/2} W0 D0

(= 0 if W0 = V0⁻¹)

  • Therefore W0 = V0⁻¹ is the optimal choice for the weighting matrix.
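The p.s.d. comparison can be checked numerically on random positive definite matrices: the sandwich covariance with an arbitrary weight W dominates the efficient one, with equality at W = V⁻¹. All matrices below are randomly generated for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 5, 2
D = rng.normal(size=(n, k))
A = rng.normal(size=(n, n)); V = A @ A.T + n * np.eye(n)   # p.d. V0
B = rng.normal(size=(n, n)); W = B @ B.T + n * np.eye(n)   # arbitrary p.d. weight

def sandwich(W):
    # (D'WD)^{-1} (D'WVWD) (D'WD)^{-1}
    G = D.T @ W @ D
    return np.linalg.solve(G, D.T @ W @ V @ W @ D) @ np.linalg.inv(G)

eff = np.linalg.inv(D.T @ np.linalg.solve(V, D))   # (D'V^{-1}D)^{-1}
diff = sandwich(W) - eff
# difference is positive semidefinite ...
assert np.all(np.linalg.eigvalsh((diff + diff.T) / 2) > -1e-10)
# ... and exactly zero at the optimal weight W = V^{-1}
assert np.allclose(sandwich(np.linalg.inv(V)), eff)
```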

slide-174
SLIDE 174

Many standard estimators can be interpreted as GMM estimators

  • Some examples:

(1) OLS:

yt = xtβ + ut,  E (ut xt) = 0 ⇒ E ((yt − xtβ) xt) = 0

min_{β∈B} [(1/√T) Σ_t (yt − xtβ) xt]′ VT⁻¹ [(1/√T) Σ_t (yt − xtβ) xt]

  • where VT is the estimator for E (xt ut ut′ xt′); in the iid case this equals σ² E (xt xt′) if homoskedastic.

slide-175
SLIDE 175

(2) Instrumental Variables

y_t = x_tβ + u_t,  E(u_t x_t) ≠ 0,  E(u_t z_t) = 0,  E(x_t z_t) ≠ 0

β̂_T = arg min_{β∈B}  [(1/√T) Σ_t (y_t − x_tβ) z_t]′ V_T⁻¹ [(1/√T) Σ_t (y_t − x_tβ) z_t]

where V_T = Ê(z_t u_t u_t′ z_t′) in the iid case
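In the just-identified scalar case the moment condition Z′(y − Xβ)/T = 0 is solved exactly by β̂_IV = (Z′X)⁻¹Z′y. A simulated sketch (illustrative only) showing IV is consistent where OLS is not:

```python
import numpy as np

# Just-identified IV as GMM: x is endogenous (correlated with u through v),
# z is a valid instrument (correlated with x, uncorrelated with u).
rng = np.random.default_rng(3)
T = 50_000
z = rng.normal(size=T)            # instrument
v = rng.normal(size=T)
u = rng.normal(size=T) + v        # error correlated with x via v
x = z + v                         # endogenous regressor
beta0 = 1.5
y = beta0 * x + u

b_iv = (z @ y) / (z @ x)          # (Z'X)^{-1} Z'y, scalar case: consistent
b_ols = (x @ y) / (x @ x)         # inconsistent: plim is beta0 + 0.5 here
```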

Heckman Classical Discrete Choice Theory

slide-176
SLIDE 176
  • Estimator for

lim_{T→∞} E [(1/√T) Σ_t z_t u_t] [(1/√T) Σ_s z_s u_s]′  (time series case)

  • Suppose

E(u_t u_t′ | z_t) = σ² I

  • then

V₀ = σ² E(z_t z_t′)

  • Can verify that 2SLS and GMM give the same estimator

Heckman Classical Discrete Choice Theory

slide-177
SLIDE 177

β̂_2SLS = [x′z (z′z)⁻¹ z′x]⁻¹ x′z (z′z)⁻¹ z′y

  • Note: in the first stage, regress x on z:

x̂ = z (z′z)⁻¹ z′x,  ŷ = z (z′z)⁻¹ z′y

var(β̂_2SLS) = [E(x_i z_i′) E(z_i z_i′)⁻¹ E(z_i x_i′)]⁻¹ E(x_i z_i′) E(z_i z_i′)⁻¹ E(z_i u_i u_i′ z_i′)
  · E(z_i z_i′)⁻¹ E(x_i z_i′)′ [E(x_i z_i′) E(z_i z_i′)⁻¹ E(z_i x_i′)]⁻¹

  • Under GMM:

Heckman Classical Discrete Choice Theory

slide-178
SLIDE 178

var(β̂_GMM) = (D₀′V₀⁻¹D₀)⁻¹  (when W₀ = V₀⁻¹)

D₀ = ∂ϕ/∂β |_{β₀} = plim (1/n) Σ x_i z_i′ = E(x_i z_i′)

W₀ = (σ²)⁻¹ E(z_i z_i′)⁻¹
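The 2SLS/GMM equivalence under homoskedasticity can be seen directly: with weight W = (Z′Z)⁻¹ (proportional to the optimal (σ²E(zz′))⁻¹), the GMM formula reproduces the first-stage-fitted-values regression exactly. A simulated, overidentified sketch (not from the slides):

```python
import numpy as np

# Two instruments for one endogenous regressor; homoskedastic errors.
rng = np.random.default_rng(4)
T = 2_000
Z = rng.normal(size=(T, 2))
v = rng.normal(size=T)
x = Z @ np.array([1.0, 0.5]) + v
u = rng.normal(size=T) + v               # endogeneity through v
y = 1.5 * x + u
X = x[:, None]

# GMM with W = (Z'Z)^{-1}: the 2SLS formula from the slide.
W = np.linalg.inv(Z.T @ Z)
A = X.T @ Z @ W @ Z.T @ X
b_2sls = np.linalg.solve(A, X.T @ Z @ W @ Z.T @ y)

# Same thing via first-stage fitted values x_hat = Z(Z'Z)^{-1}Z'x.
x_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ x)
b_first_stage = (x_hat @ y) / (x_hat @ x_hat)
```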

Heckman Classical Discrete Choice Theory

slide-179
SLIDE 179

In the presence of heteroskedasticity, the weighting matrix would be different (and 2SLS and GMM are not the same):

W₀ = E(z′uu′z)⁻¹ = E(z′ E(uu′ | z) z)⁻¹ = E(z′Vz)⁻¹

  • with panel data one could have a block-diagonal

E(uu′ | z) = diag(V₁, V₂, …, V_T) = V

  • allowing for correlation over time for a given individual, but iid across individuals.

Heckman Classical Discrete Choice Theory

slide-180
SLIDE 180

Nonlinear least squares

y_t = ϕ(x_t; β) + u_t,  E(u_t ϕ(x_t; β)) = 0

min_{β∈B}  [(1/√T) Σ_t (y_t − ϕ(x_t; β)) ϕ(x_t; β)]′ V_T⁻¹ [(1/√T) Σ_t (y_t − ϕ(x_t; β)) ϕ(x_t; β)]

  • General Method of Moments

min_{β∈B}  [Σ_t f_t(β)]′ V⁻¹ [Σ_t f_t(β)]
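A sketch of the nonlinear case (simulated data; the functional form ϕ(x; β) = exp(βx) and the grid-search minimizer are illustrative choices, not from the slides):

```python
import numpy as np

# NLS as GMM with scalar moment f_t(b) = (y_t - phi(x_t; b)) * phi(x_t; b),
# phi(x; b) = exp(b*x), minimising the quadratic criterion over a grid.
rng = np.random.default_rng(5)
T = 5_000
x = rng.normal(size=T)
beta0 = 0.5
y = np.exp(beta0 * x) + rng.normal(0, 0.1, T)

def criterion(b):
    f = (y - np.exp(b * x)) * np.exp(b * x)   # moment, with V = 1 (scalar)
    g = f.sum() / np.sqrt(T)
    return g * g

grid = np.linspace(0.0, 1.0, 1001)
b_hat = grid[np.argmin([criterion(b) for b in grid])]
```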

Heckman Classical Discrete Choice Theory

slide-181
SLIDE 181

Nonlinear least squares

  • where

(1/√T) Σ_t f_t(β₀) → N(0, V₀),  (1/T) Σ_t f_t(β) → E f_t(β)

  • In general, f_t is a random function.

Heckman Classical Discrete Choice Theory

slide-182
SLIDE 182
  • Suppose we want to estimate β (5 × 1) and we have 6 potential instruments. Can we test the validity of the instruments? What if we have 5 instruments?
  • If we assume E(ε_i | x_i) = 0 instead of E(ε_i x_i) = 0 (i.e., conditional instead of unconditional), then we have an infinite number of moment conditions:

E(ε_i f(x_i)) = E(E(ε_i | x_i) f(x_i)) = 0  for any f(x_i)

  • How to optimally choose which moment conditions to use is a current area of research. How might you use GMM to check whether a variable is normally distributed?
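With 6 instruments for 5 parameters, the one overidentifying restriction is testable: the minimized efficient-GMM criterion (Hansen's J statistic) is asymptotically χ² with (moments − parameters) degrees of freedom when all instruments are valid. A simulated scalar sketch, 2 instruments for 1 parameter (illustrative only):

```python
import numpy as np

# Two-step GMM with 2 valid instruments for 1 parameter; J ~ chi2(1).
rng = np.random.default_rng(6)
T = 10_000
Z = rng.normal(size=(T, 2))
x = Z @ np.array([1.0, 1.0]) + rng.normal(size=T)
u = rng.normal(size=T)                    # both instruments valid here
y = 2.0 * x + u

# First step: 2SLS-type estimate, then efficient weight from residuals.
W1 = np.linalg.inv(Z.T @ Z)
b1 = (x @ Z @ W1 @ Z.T @ y) / (x @ Z @ W1 @ Z.T @ x)
e = y - b1 * x
V = (Z * e[:, None]).T @ (Z * e[:, None]) / T   # estimate of E(z u u' z')
W = np.linalg.inv(V)
b2 = (x @ Z @ W @ Z.T @ y) / (x @ Z @ W @ Z.T @ x)

# Minimised criterion = Hansen's J; compare to chi2(1) critical values.
g = Z.T @ (y - b2 * x) / np.sqrt(T)       # scaled sample moment
J = g @ W @ g
```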

Heckman Classical Discrete Choice Theory

slide-183
SLIDE 183

Return to main text

Heckman Classical Discrete Choice Theory