
SLIDE 1

ADVANCED ECONOMETRICS I

Theory (3/3)

Instructor: Joaquim J. S. Ramalho
E-mail: jjsro@iscte-iul.pt
Personal Website: http://home.iscte-iul.pt/~jjsro
Office: D5.10
Course Website: https://jjsramalho.wixsite.com/advecoi
Fénix: https://fenix.iscte-iul.pt/disciplinas/03089

SLIDE 2

Joaquim J.S. Ramalho

Ordered choices:

Values for the dependent variable: $Y \in \{0, 1, \dots, M-1\}$

Latent model: $Y_i^* = x_i'\beta + u_i$

▪ $x_i$ cannot include an intercept

Individual behaviour observed only by intervals:

$$Y_i = \begin{cases} 0 & \text{if } Y_i^* \le \gamma_0 \\ m & \text{if } \gamma_{m-1} < Y_i^* \le \gamma_m, \quad 1 \le m \le M-2 \\ M-1 & \text{if } Y_i^* > \gamma_{M-2} \end{cases}$$

▪ Example:

– $Y_i^*$ is a latent measure of the health status
– $Y_i$ is an observed health indicator: poor, satisfactory, good, excellent

Assumption: the $\gamma_j$'s are not known

3. Discrete Choice Models

3.2. Models for Ordered Choices

2020/2021 Advanced Econometrics I 2

SLIDE 3


Probabilities:

Aim:

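▪ Modelling the probability of observing $Y_i^*$ in a given interval

Each probability is based on the same $G(\cdot)$ functions used with binary choices, being given by:

$$\begin{aligned}
\Pr(Y_i = m \mid x_i) &= \Pr(\gamma_{m-1} < Y_i^* \le \gamma_m \mid x_i) \\
&= \Pr(Y_i^* \le \gamma_m \mid x_i) - \Pr(Y_i^* < \gamma_{m-1} \mid x_i) \\
&= \Pr(x_i'\beta + u_i \le \gamma_m \mid x_i) - \Pr(x_i'\beta + u_i < \gamma_{m-1} \mid x_i) \\
&= \Pr(u_i \le \gamma_m - x_i'\beta \mid x_i) - \Pr(u_i < \gamma_{m-1} - x_i'\beta \mid x_i) \\
&= G(\gamma_m - x_i'\beta) - G(\gamma_{m-1} - x_i'\beta)
\end{aligned}$$

Hence, the general case is:

$$\Pr(Y_i = m \mid x_i) = \begin{cases} G(\gamma_0 - x_i'\beta) & \text{if } m = 0 \\ G(\gamma_m - x_i'\beta) - G(\gamma_{m-1} - x_i'\beta) & \text{if } 1 \le m \le M-2 \\ 1 - G(\gamma_{M-2} - x_i'\beta) & \text{if } m = M-1 \end{cases}$$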
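The three-case formula above can be sketched numerically. This is a minimal illustration, assuming a logistic link (ordered logit); the function names, the linear index value and the thresholds are all hypothetical and not taken from the slides.

```python
import math

def logistic_cdf(z):
    """Logistic CDF, used here as the link of an ordered logit."""
    return 1.0 / (1.0 + math.exp(-z))

def ordered_probs(index, cutoffs, cdf=logistic_cdf):
    """Probabilities of outcomes 0..M-1 in an ordered choice model.

    index   -- linear index (regressors times coefficients, no intercept)
    cutoffs -- increasing thresholds, one fewer than the number of outcomes
    """
    cum = [cdf(c - index) for c in cutoffs]      # Pr(outcome <= m)
    probs = [cum[0]]                             # lowest outcome
    probs += [cum[m] - cum[m - 1] for m in range(1, len(cum))]  # middle outcomes
    probs.append(1.0 - cum[-1])                  # highest outcome
    return probs

# Hypothetical values: four outcomes, hence three thresholds.
example_probs = ordered_probs(index=0.4, cutoffs=[-1.0, 0.5, 2.0])
```

By construction the probabilities telescope, so they sum to one whatever the index and thresholds are, provided the thresholds are increasing.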

SLIDE 4

Estimation:

Parameters to be estimated:

▪ $\beta$
▪ $\gamma_0, \dots, \gamma_{M-2}$

Estimation method:

▪ Maximum likelihood

Most common models:

▪ Ordered logit
▪ Ordered probit


Stata

ologit Y X1 … Xk
oprobit Y X1 … Xk

SLIDE 5


Partial effects:

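Each $X_j$ affects $M$ probabilities:

$$\Delta X_j = 1 \implies \Delta\Pr(Y = m \mid X) = \begin{cases} -\beta_j\, g(\gamma_0 - x_i'\beta) & \text{if } m = 0 \\ \beta_j \left[ g(\gamma_{m-1} - x_i'\beta) - g(\gamma_m - x_i'\beta) \right] & \text{if } 1 \le m \le M-2 \\ \beta_j\, g(\gamma_{M-2} - x_i'\beta) & \text{if } m = M-1 \end{cases}$$

where $g(\cdot)$ is the derivative of $G(\cdot)$; the $M$ effects sum to zero.

The sign of $\beta_j$ is informative about the direction of $\Delta\Pr(Y = 0 \mid X)$ and $\Delta\Pr(Y = M-1 \mid X)$, but not of the changes in the remaining probabilities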
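A small numeric check of these partial effects, again assuming a logistic link. Each middle effect is written as the derivative of a difference of two cumulative probabilities, which is what makes the effects cancel out. All names and values are hypothetical.

```python
import math

def logistic_pdf(z):
    """Logistic density (derivative of the logistic CDF); symmetric in z."""
    e = math.exp(-abs(z))
    return e / (1.0 + e) ** 2

def ordered_partial_effects(index, cutoffs, coef, pdf=logistic_pdf):
    """Effect of a marginal change in one regressor on each outcome probability."""
    dens = [pdf(c - index) for c in cutoffs]
    effects = [-coef * dens[0]]                                   # lowest outcome
    effects += [coef * (dens[m - 1] - dens[m]) for m in range(1, len(dens))]
    effects.append(coef * dens[-1])                               # highest outcome
    return effects

# Hypothetical values: positive coefficient, four outcomes.
effects = ordered_partial_effects(index=0.4, cutoffs=[-1.0, 0.5, 2.0], coef=0.8)
```

With a positive coefficient the lowest outcome becomes less likely and the highest more likely, while the sign for the middle outcomes depends on where the index sits relative to the thresholds.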

SLIDE 6

Multinomial choices:

Values for the dependent variable: $Y \in \{0, 1, \dots, M-1\}$

Latent model:

▪ Each individual has a given utility associated with each alternative: $U_{im} = x_{im}'\beta + u_{im}$
▪ The selected alternative is the one that maximizes utility: $\Pr(Y_i = m \mid x_i) = \Pr\left(U_{im} = \max(U_{i1}, \dots, U_{iM}) \mid x_i\right)$

Main models:

▪ Multinomial Logit: $U_{im} \sim \text{Gumbel}$ and $U_{im}$ independent $\forall m$
▪ Multinomial Probit: $U_{im} \sim \text{Normal}$
▪ Nested Logit
▪ Random Parameters Logit

3.3. Models for Multinomial Choices

SLIDE 7

Explanatory variables:

$x_{im}$ may include:

▪ $x_{im}$: variables that differ across individuals and alternatives
▪ $x_m$: variables that differ across alternatives but not individuals
▪ $x_i$: variables that differ across individuals but not alternatives

Example:

▪ $Y_i$ - selected means of transport to go to work
▪ $x_{im}$ - time that individual $i$ takes to go to work when using transport $m$
▪ $x_m$ - price of transport $m$
▪ $x_i$ - age of individual $i$

SLIDE 8

Multinomial logit:

$$\Pr(Y_i = m \mid x_{im}) = G_m(x_{im}'\beta + x_i'\beta_m) = \frac{e^{x_{im}'\beta + x_i'\beta_m}}{\sum_{j=0}^{M-1} e^{x_{ij}'\beta + x_i'\beta_j}}$$

$\beta_m$ has to be normalized, that is, for one alternative (base outcome) its value is set to zero

$\beta$ cannot include a constant term

Independence of Irrelevant Alternatives (IIA) – the odds ratio between two alternatives does not depend on the remaining alternatives:

$$\frac{\Pr(Y_i = m \mid x_{im})}{\Pr(Y_i = l \mid x_{il})} = \frac{e^{x_{im}'\beta + x_i'\beta_m}}{e^{x_{il}'\beta + x_i'\beta_l}}$$
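The IIA property can be illustrated with a short sketch: the odds between two alternatives are the same whether or not a third alternative is in the choice set. The utility values below are hypothetical, and the helper name `mnl_probs` is illustrative only.

```python
import math

def mnl_probs(utilities):
    """Multinomial logit probabilities from the deterministic part of each utility."""
    top = max(utilities)                        # subtract the max for numerical stability
    expu = [math.exp(v - top) for v in utilities]
    total = sum(expu)
    return [e / total for e in expu]

# Hypothetical indices; the base outcome is normalized to 0.
three = mnl_probs([0.0, 0.5, 1.2])              # full choice set
two = mnl_probs([0.0, 0.5])                     # third alternative removed

odds_three = three[1] / three[0]
odds_two = two[1] / two[0]
```

Both odds equal the exponential of the difference between the two indices (here 0.5), which is exactly what the ratio formula states.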

SLIDE 9

When all explanatory variables are of the type $x_{im}$ and $x_m$, the choice between alternatives $m$ and $l$ is fully explained by differences in the alternative characteristics:

$$\frac{\Pr(Y_i = m \mid x_{im})}{\Pr(Y_i = l \mid x_{il})} = \frac{e^{x_{im}'\beta}}{e^{x_{il}'\beta}} = e^{(x_{im} - x_{il})'\beta}$$

▪ In this case, the model is often called ‘conditional logit’

When all explanatory variables are of the type $x_i$, the choice between alternatives $m$ and $l$ is fully explained by differences between $\beta_m$ and $\beta_l$:

$$\frac{\Pr(Y_i = m \mid x_i)}{\Pr(Y_i = l \mid x_i)} = \frac{e^{x_i'\beta_m}}{e^{x_i'\beta_l}} = e^{x_i'(\beta_m - \beta_l)}$$


Stata

asclogit Y X1m …, case(id) alternatives(varname) casevars(Xi …) basealternative(name)
mlogit Y X1 … Xk, baseoutcome(0)

SLIDE 10


Estimation:

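▪ Maximum likelihood based on the following log-likelihood function:

$$LL = \sum_{i=1}^{N} \sum_{m=0}^{M-1} d_{im} \log G_m(x_{im}'\beta + x_i'\beta_m)$$

▪ $d_{im} = 1$ if individual $i$ chooses alternative $m$ (and 0 otherwise)

Partial effects:

▪ $\Delta X_{ij} = 1 \implies$

– $\Delta\Pr(Y_i = m \mid X) = \beta_j G_m(\cdot)\left[d_{im} - G_m(\cdot)\right]$
– $\beta_j$ gives the sign of the partial effect

▪ $\Delta X_i = 1 \implies$

– $\Delta\Pr(Y_i = m \mid X) = G_m(\cdot)\left[\beta_j - \bar{\beta}\right]$, where $\bar{\beta} = \sum_{m=1}^{M-1} \beta_m G_m(\cdot)$
– $\beta_j$ gives the sign of the partial effect relative to the base alternative, not the sign of the overall effect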

SLIDE 11

Testing IIA

▪ Hausman test comparing:

– Full multinomial logit model
– Multinomial logit model excluding one or more alternatives

▪ If multinomial logit is the correct model, then both models produce consistent estimators (null hypothesis)
▪ If multinomial logit is not the correct model, then the results generated by both models will be different (alternative hypothesis)


Stata

mlogit Y X1 … Xk, baseoutcome(0) (or asclogit…)
estimates store Mod1
mlogit Y X1 … Xk if Y != 3, baseoutcome(0) (or asclogit…)
estimates store Mod2
hausman Mod1 Mod2

SLIDE 12


Multinomial probit:

Not affected by the IIA property

Very complex, requiring the computation of $M - 1$ integrals

The version implemented in Stata assumes independent errors, which eliminates the only advantage of multinomial probit over multinomial logit


Stata

mprobit Y X1 … Xk, baseoutcome(0)

SLIDE 13


Nested logit:

Not affected by the IIA property, grouping the choices in several sets in such a way that:

▪ Within each group, alternatives may be correlated
▪ Between groups, alternatives are independent

Results from a sequential decision process – example for a two-level process:

▪ Level 1 – defining $J$ groups
▪ Level 2 – defining $M_j$ choices in each group


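[Figure: decision tree for a financing choice. Level 1: Own funds, Bank debt, Stock market. Level 2: Bank 1, …, Bank N (under Bank debt); Lisbon, Frankfurt (under Stock market)]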

Stata nlogit …

SLIDE 14


Random parameters logit:

Latent model: $U_{im} = x_{im}'\beta_i + u_{im}$

Most common assumption: $\beta_i \sim N(\beta, \Sigma_\beta)$

Not affected by the IIA property

If $\Sigma_\beta = 0$, it reduces to the Multinomial Logit model; hence, comparing the two models allows the IIA property to be tested

SLIDE 15

4.1. Models for Nonnegative Outcomes
4.2. Models for Fractional Responses
4.3. Models for Discrete-Continuous Responses

4. Models for Continuous Limited Dependent Variables

SLIDE 16

Nonnegative outcomes can be:

▪ Continuous: $Y \in [0, +\infty[$

– Examples: prices, wages, …

▪ Discrete (counts): $Y \in \{0, 1, 2, 3, \dots\}$

– Examples: patents applied for by a firm in a year, times someone is arrested in a year, ...

Linear regression models are not the most suitable option because:

▪ They may generate negative predictions for the dependent variable
▪ At least close to the lower bound of $Y$, it does not make sense to assume constant partial effects

4.1. Models for Nonnegative Outcomes

SLIDE 17

Log-linear regression model:

$$\ln Y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + u_i$$

With this transformation, the dependent variable becomes unbounded: $Y \in\ ]0, +\infty[ \implies \ln Y \in\ ]-\infty, +\infty[$

Assumption: $E(u_i \mid x) = 0$

However, two new problems arise:

▪ The log-linear model is not defined for $Y = 0$; adding a small constant value to $Y$ or dropping zeros are not in general good solutions
▪ Prediction is more interesting in the original scale, $\hat{Y}_i$, and not in the logarithmic scale, $\widehat{\ln Y_i}$; the log-linear model gives the latter directly, but retransforming it to the original scale requires additional assumptions and calculations and/or the application of relatively complex methods (see the next slide to understand the problem)

SLIDE 18

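Assumed model: $\ln Y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + u_i$

▪ Consistent estimation requires $E(u_i \mid x) = 0$
▪ Under $E(u_i \mid x) = 0$:

$$E(\ln Y_i \mid x) = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki}$$

$$\widehat{\ln Y_i} = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \dots + \hat{\beta}_k x_{ki}$$

Prediction of $Y_i$:

▪ If $\ln Y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + u_i$, then:

$$Y_i = e^{\beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + u_i}$$

and

$$E(Y_i \mid x) = e^{\beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki}}\, E(e^{u_i} \mid x)$$

▪ Consistent prediction of $Y_i$ would require assuming $E(e^{u_i} \mid x) = 1$; however, the assumption made, $E(u_i \mid x) = 0$, implies that, in general, $E(e^{u_i} \mid x) \ne 1$
▪ Alternatively, we need to get a consistent estimate of $E(e^{u_i} \mid x)$, which requires additional assumptions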
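The retransformation problem can be seen with a toy error distribution. The sketch below assumes, purely for illustration, an error that takes the values -1 and +1 with equal probability: its mean is zero, yet the mean of its exponential is above one, so exponentiating the fitted log underpredicts the mean in the original scale.

```python
import math

# Hypothetical error distribution: -1 or +1 with equal probability (mean zero).
support = [-1.0, 1.0]
weights = [0.5, 0.5]

mean_u = sum(w * u for u, w in zip(support, weights))
mean_exp_u = sum(w * math.exp(u) for u, w in zip(support, weights))

linear_index = 2.0                                    # hypothetical fitted index
naive_prediction = math.exp(linear_index)             # ignores the error term
true_mean = math.exp(linear_index) * mean_exp_u       # correct conditional mean
```

Here the correction factor equals cosh(1), roughly 1.54, so the naive prediction is about 35% too low.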

SLIDE 19

Exponential regression model:

$Y = \exp(x'\beta + u)$

$E(Y \mid X) = \exp(x'\beta)$

Assumption: $E(e^{u} \mid x) = 1$

Advantages:

▪ $\hat{Y}_i$ is always nonnegative
▪ Predictions are obtained directly in the original scale, without requiring any retransformations

Partial effects: $\Delta X_j = 1 \implies \Delta E(Y \mid X) = \beta_j \exp(x'\beta)$

▪ The sign of the effect is given by the sign of $\beta_j$
▪ $\beta_j$ can be interpreted as a semi-elasticity (see the next slide for a proof)

SLIDE 20

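$$\begin{aligned}
\Delta X_j = 1 &\implies \Delta E(Y \mid X) = \beta_j \exp(x'\beta) \\
&\implies \Delta E(Y \mid X) = \beta_j E(Y \mid X) \\
&\implies \frac{\Delta E(Y \mid X)}{E(Y \mid X)} = \beta_j \\
&\implies 100\, \frac{\Delta E(Y \mid X)}{E(Y \mid X)} = 100 \beta_j \\
&\implies \%\Delta E(Y \mid X) = 100 \beta_j \%
\end{aligned}$$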
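The semi-elasticity reading rests on a derivative approximation, so it is most accurate for small coefficients. A minimal sketch, with hypothetical coefficient values, comparing the exact percentage change after a one-unit increase with the approximation:

```python
import math

def exp_mean(x, intercept, slope):
    """Conditional mean of an exponential regression with one regressor."""
    return math.exp(intercept + slope * x)

intercept, slope = 1.0, 0.05            # hypothetical coefficients
base = exp_mean(2.0, intercept, slope)
shifted = exp_mean(3.0, intercept, slope)

exact_pct = 100.0 * (shifted - base) / base   # equals 100 * (exp(slope) - 1)
approx_pct = 100.0 * slope                    # semi-elasticity approximation
```

With a slope of 0.05 the two numbers are close (about 5.13% versus 5%); the gap widens as the coefficient grows.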

SLIDE 21

Assumptions and estimation methods according to the type of nonnegative outcome:

▪ Continuous response:

– Assumption: only $E(Y \mid X)$; estimation: QML

▪ Count data - two alternatives:

– Assumption: only $E(Y \mid X)$; estimation: QML
– Assumption: $E(Y \mid X)$ and $\Pr(Y = j \mid X)$; estimation: ML

Three main distribution functions are used as basis for QML and/or ML estimation:

▪ Poisson
▪ Negative Binomial 1
▪ Negative Binomial 2

SLIDE 22

Poisson regression model:

$$Y_i \sim \text{Poisson}(\lambda_i) \implies \Pr(Y_i = y \mid x_i) = \frac{e^{-\lambda_i} \lambda_i^{y}}{y!}$$

where $\lambda_i = E(Y \mid X) = \exp(x'\beta)$

Estimation methods: ML (only count data) or QML, since the Poisson distribution belongs to the linear exponential family

By definition, $E(Y \mid X) = Var(Y \mid X)$ (equidispersion), which may be a strong assumption in some empirical applications
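Equidispersion can be checked numerically from the probability function itself. The sketch below uses hypothetical coefficients and sums over a truncated support (0 to 59), which is more than enough for a mean of about 2.2.

```python
import math

def poisson_pmf(y, lam):
    """Poisson probability of observing the count y when the mean is lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

lam = math.exp(0.3 + 0.5 * 1.0)          # hypothetical coefficients and regressor value
support = range(0, 60)                   # truncated support; the remaining tail mass is negligible

mean = sum(y * poisson_pmf(y, lam) for y in support)
var = sum((y - mean) ** 2 * poisson_pmf(y, lam) for y in support)
```

Both moments come out equal to the mean parameter, which is the restriction that negative binomial models relax.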


Stata

ML: poisson Y X1 … Xk
QML: poisson Y X1 … Xk, robust

SLIDE 23


Negative binomial regression models:

Two variants, both allowing for overdispersion ($\delta > 0$):

▪ NEGBIN1: $Var(Y \mid X) = (1 + \delta)\, E(Y \mid X)$ - ML estimation
▪ NEGBIN2: $Var(Y \mid X) = \left[1 + \delta E(Y \mid X)\right] E(Y \mid X)$ - it belongs to the linear exponential family, enabling estimation by both ML (only count data) and QML

Overdispersion test:

$H_0: \delta = 0$ (Poisson model)
$H_1: \delta \ne 0$ (Negative Binomial 1 or 2 model)
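The two variance functions differ in how overdispersion scales with the mean: linearly in the first variant, quadratically in the second. A minimal sketch with hypothetical values for the mean and the dispersion parameter:

```python
def negbin1_variance(mean, dispersion):
    """Variance proportional to the mean (constant dispersion)."""
    return (1.0 + dispersion) * mean

def negbin2_variance(mean, dispersion):
    """Variance quadratic in the mean (mean dispersion)."""
    return (1.0 + dispersion * mean) * mean

mean, dispersion = 2.0, 0.5              # hypothetical values
v1 = negbin1_variance(mean, dispersion)
v2 = negbin2_variance(mean, dispersion)
```

Setting the dispersion parameter to zero in either function returns the mean, which is the Poisson case under the null hypothesis of the overdispersion test.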


Stata

NEGBIN1: nbreg Y X1 … Xk, dispersion(constant)
NEGBIN2 (ML): nbreg Y X1 … Xk, dispersion(mean)
NEGBIN2 (QML): nbreg Y X1 … Xk, dispersion(mean) robust

SLIDE 24


Base panel data model:

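Continuous / count data: $E(Y_{it} \mid x_{it}, \alpha_i) = \exp(\gamma_i + x_{it}'\beta) = \alpha_i \exp(x_{it}'\beta)$

Count data:

$$\Pr(Y_{it} = y \mid x_{it}, \alpha_i) = \frac{e^{-\lambda_{it}} \lambda_{it}^{y}}{y!}, \qquad \lambda_{it} = E(Y_{it} \mid x_{it}, \alpha_i) = \alpha_i \exp(x_{it}'\beta)$$

Pooled estimator:

Based on the cross-sectional assumption $E(Y_{it} \mid x_{it}) = \exp(x_{it}'\beta)$

Produces consistent estimators only if $E(\alpha_i \mid x_{it}) = 1$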


Stata

poisson Y X1 … Xk, vce(cluster clustvar)

SLIDE 25


Random Effects Poisson Estimator:

Assumptions:

▪ $Y_{it} \sim \text{Poisson}(\lambda_{it})$
▪ $\lambda_{it} = E(Y_{it} \mid x_{it}, \alpha_i) = \alpha_i \exp(x_{it}'\beta)$
▪ $\log \alpha_i = \gamma_i \sim \text{Gamma}(1, \eta)$

Resulting model:

▪ NEGBIN2-type model
▪ Estimation method: ML
▪ $E(Y_{it} \mid x_{it}) = \exp(x_{it}'\beta)$, which implies that the Pooled estimator is consistent under random effects of this type


Stata

xtpoisson Y X1 … Xk, re vce(robust)

SLIDE 26


Fixed Effects Estimators:

Fixed effects Poisson estimator (three equivalent versions):

▪ Pooled estimator with individual effects
▪ Estimator conditional on $\sum_{t=1}^{T} Y_{it}$, with $\sum_{t=1}^{T} Y_{it} \ne 0$
▪ Quasi mean-differenced GMM estimator (Hausman, Hall and Griliches, 1984)

Quasi-differences GMM estimator:

▪ Chamberlain (1992)
▪ Wooldridge (1997)

SLIDE 27

Fixed effects Poisson estimator:

May be derived using the three equivalent versions

Pooled estimator with individual effects:

▪ Adds individual dummies, associated with the $\gamma_i$'s
▪ As in linear models, $\beta$ is consistently estimated even in short panels (no incidental parameters problem)

The quasi mean-differenced GMM estimator is based on the following moment condition:

$$E\left[\left. Y_{it} - \frac{\lambda_{it}}{\bar{\lambda}_i}\, \bar{Y}_i \,\right|\, x_{it} \right] = 0, \quad \text{where } \lambda_{it} = \exp(x_{it}'\beta)$$

Requires strictly exogenous explanatory variables


Stata

xtpoisson Y X1 … Xk, fe vce(robust)

SLIDE 28


Quasi-differences GMM estimator:

Chamberlain (1992):

$$E\left[\left. \frac{\lambda_{i,t-1}}{\lambda_{it}}\, Y_{it} - Y_{i,t-1} \,\right|\, x_{it} \right] = 0$$

Wooldridge (1997):

$$E\left[\left. \frac{Y_{it}}{\lambda_{it}} - \frac{Y_{i,t-1}}{\lambda_{i,t-1}} \,\right|\, x_{it} \right] = 0$$

In both cases the explanatory variables do not need to be strictly exogenous, so these estimators are particularly useful in dynamic models

SLIDE 29

Fractional outcomes:

$Y \in [0, 1]$

Base specification:

$E(Y \mid X) = G(x'\beta)$, where the $G(\cdot)$ function must respect the restriction $0 \le G(\cdot) \le 1$

Main models:

Fractional regression model: assumes only $E(Y \mid X)$

Beta regression model: assumes also $\Pr(Y \mid X)$

Transformation regression models (assume only $E(Y \mid X)$):

▪ Linear transformation
▪ Exponential transformation

4.2. Models for Fractional Responses

SLIDE 30

Fractional regression models:

Very similar to binary regression models

▪ Main models: Logit, Probit, Cloglog
▪ Partial effects calculated using the same expressions
▪ Estimation also based on the Bernoulli function, but only by QML


Stata

glm Y X1 … Xk, family(binomial) link(logit) robust
glm Y X1 … Xk, family(binomial) link(probit) robust
glm Y X1 … Xk, family(binomial) link(cloglog) robust

SLIDE 31


Beta regression model:

Assumes also $E(Y \mid X) = G(x'\beta)$, using the same functions for $G(\cdot)$

Additional assumption: $Y_i \sim \text{Beta}$, with mean given by $G(x'\beta)$ and precision parameter $\phi$

Estimation only by ML: more efficient, less robust

Only available when $Y \in\ ]0, 1[$

SLIDE 32

Linear transformation:

$Y_i = G(x_i'\beta + u_i)$

$H(Y_i) = x_i'\beta + u_i$

Alternative specifications:

▪ Logit: $H(Y_i) = \ln \dfrac{Y_i}{1 - Y_i}$
▪ Probit: $H(Y_i) = \Phi^{-1}(Y_i)$
▪ Cloglog: $H(Y_i) = \ln\left[-\ln(1 - Y_i)\right]$

Advantages:

▪ Estimation: OLS
▪ Easy to deal with panel data and endogenous variables

Limitations:

▪ $H(Y_i)$ is not defined for $Y_i = 0$ and $Y_i = 1$
▪ Prediction in the original scale requires additional assumptions and calculations and/or the application of relatively complex methods


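Example for logit:

$$\begin{aligned}
Y_i &= \frac{e^{x_i'\beta + u_i}}{1 + e^{x_i'\beta + u_i}} \\
Y_i + Y_i e^{x_i'\beta + u_i} &= e^{x_i'\beta + u_i} \\
Y_i &= e^{x_i'\beta + u_i} - Y_i e^{x_i'\beta + u_i} \\
Y_i &= (1 - Y_i)\, e^{x_i'\beta + u_i} \\
\frac{Y_i}{1 - Y_i} &= e^{x_i'\beta + u_i} \\
\ln \frac{Y_i}{1 - Y_i} &= x_i'\beta + u_i
\end{aligned}$$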
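The derivation for the logit case can be mirrored in a few lines: the transform maps a fraction strictly between 0 and 1 to the real line, and the logistic function maps it back. The function names are illustrative, and raising an error at the boundaries is a design choice of this sketch, not something prescribed by the slides.

```python
import math

def logit_transform(y):
    """Log-odds of a fraction; undefined at the boundary values 0 and 1."""
    if not 0.0 < y < 1.0:
        raise ValueError("logit transform is undefined at 0 and 1")
    return math.log(y / (1.0 - y))

def inverse_logit(z):
    """Logistic function, the inverse of the logit transform."""
    return 1.0 / (1.0 + math.exp(-z))

z = logit_transform(0.25)
back = inverse_logit(z)
```

The round trip recovers 0.25, while a call such as `logit_transform(0.0)` fails, which is the first limitation of the linear transformation approach.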

SLIDE 33


Exponential transformation:

$Y_i = G(x_i'\beta + u_i) = G_1\left[\exp(x_i'\beta + u_i)\right]$

$H_1(Y_i) = \exp(x_i'\beta + u_i)$

Alternative specifications:

▪ Logit: $H_1(Y_i) = \dfrac{Y_i}{1 - Y_i}$
▪ Cloglog: $H_1(Y_i) = -\ln(1 - Y_i)$

Advantages:

▪ Estimation: same methods as those used for nonnegative responses
▪ Easy to deal with panel data and endogenous variables

Limitations:

▪ Not applicable to the probit model
▪ $H_1(Y_i)$ is not defined for $Y_i = 1$ (but it is for $Y_i = 0$)
▪ Prediction in the original scale requires additional assumptions and calculations and/or the application of relatively complex methods

SLIDE 34

Multivariate fractional outcomes:

$Y_{im} \in [0, 1]$, $m = 0, \dots, M-1$

$\sum_{m=0}^{M-1} Y_{im} = 1$

Base specification:

$E(Y_{im} \mid X_i) = G_m(x'\beta)$

The $G_m(\cdot)$ function must respect the restrictions $0 \le G_m(\cdot) \le 1$ and $\sum_{m=0}^{M-1} G_m = 1$

Main models:

Multivariate fractional regression model

Dirichlet regression model

SLIDE 35

Multivariate fractional regression model:

Very similar to multinomial choice models

▪ Main models: Multinomial Logit, Nested Logit, Random Parameters Logit, …
▪ Partial effects calculated using the same expressions

QML estimation based on the multivariate Bernoulli function

Dirichlet regression model:

Assumes the same specifications for $G_m(\cdot)$

Additional assumption: $Y_i \sim \text{Dirichlet}$, with means given by $G_m(x'\beta)$ and precision parameter $\phi$

Estimation only by ML: more efficient, less robust

Only available when $Y_{im} \in\ ]0, 1[$

SLIDE 36

Panel data - base specification:

$E(Y_{it} \mid x_{it}, \alpha_i) = G(\alpha_i + x_{it}'\beta)$

Estimators:

Pooled estimator (requires $\alpha_i = \alpha$ for consistency)

Pooled with individual effects (requires $T \to \infty$ for consistency)

Random effects (assumes $\alpha_i \sim N(0, \sigma_\alpha^2)$)

Fixed effects (based on linear or exponential transformations)

SLIDE 37

Tobit Model
Two-Part Model
Sample Selection Model

4.3. Models for Discrete-Continuous Responses

SLIDE 38

Motivation:

Sometimes, the dependent variable has both discrete and continuous values; typically:

▪ Discrete value: for many individuals, $Y_i = 0$
▪ Continuous component: for the remaining individuals, $Y_i$ may take on some positive value, which may be bounded (fractional outcome) or not (nonnegative outcome)

Examples:

▪ Expenditures on durable goods, alcohol, ...
▪ Work hours

SLIDE 39

Alternative models:

Tobit model: a single model explains all values

Two-part model: uses two independent models for explaining separately the zeros and the positive values

Sample selection model: uses two different, but interdependent, models for explaining the zeros and the positive values

SLIDE 40

Tobit model - specification:

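Latent model: $Y_i^* = x_i'\beta + u_i$, $-\infty < Y_i^* < +\infty$

Instead of $Y_i^*$, it is observed:

$$Y_i = \begin{cases} 0 & \text{if } Y_i^* \le 0 \\ Y_i^* & \text{if } Y_i^* > 0 \end{cases}$$

Assumption: $u_i \sim N(0, \sigma^2)$

▪ Probability of a zero:

$$\begin{aligned}
\Pr(Y_i = 0 \mid x_i) &= \Pr(Y_i^* \le 0 \mid x_i) = \Pr(x_i'\beta + u_i \le 0 \mid x_i) = \Pr(u_i \le -x_i'\beta \mid x_i) \\
&= \Pr\left(\left. \frac{u_i}{\sqrt{\sigma^2}} \le -\frac{x_i'\beta}{\sqrt{\sigma^2}} \,\right|\, x_i\right) = \Phi\left(-\frac{x_i'\beta}{\sqrt{\sigma^2}}\right) = 1 - \Phi\left(\frac{x_i'\beta}{\sqrt{\sigma^2}}\right)
\end{aligned}$$

▪ Hence:

$$f(y_i \mid x_i) = \begin{cases} 1 - \Phi\left(\dfrac{x_i'\beta}{\sqrt{\sigma^2}}\right) & \text{if } Y = 0 \\ \dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(y_i - x_i'\beta)^2}{2\sigma^2}} & \text{if } Y > 0 \end{cases}$$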

SLIDE 41

Estimation:

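Method: ML

Parameters to be estimated: $\beta$ and $\sigma$

Log-likelihood function:

$$LL = \sum_{i} \left\{ (1 - d_i) \log\left[1 - \Phi\left(\frac{x_i'\beta}{\sqrt{\sigma^2}}\right)\right] + d_i \log\left[\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(y_i - x_i'\beta)^2}{2\sigma^2}}\right] \right\}$$

where

$$d_i = \begin{cases} 0 & \text{if } Y_i = 0 \\ 1 & \text{if } Y_i > 0 \end{cases}$$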


Stata

tobit Y X1 … Xk, ll(0)

SLIDE 42


Quantities of interest:

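Conditional mean given that $Y_i$ is positive:

$$E(Y_i \mid x_i, Y_i > 0) = x_i'\beta + \sigma \lambda\left(\frac{x_i'\beta}{\sigma}\right)$$

where $\lambda\left(\dfrac{x_i'\beta}{\sigma}\right) = \dfrac{\phi\left(x_i'\beta / \sigma\right)}{\Phi\left(x_i'\beta / \sigma\right)}$ is the (inverse) Mills ratio

Probability of observing positive values for $Y_i$:

$$\Pr(Y_i > 0 \mid x_i) = \Phi\left(\frac{x_i'\beta}{\sigma}\right)$$

Overall conditional mean:

$$E(Y_i \mid x_i) = \Phi\left(\frac{x_i'\beta}{\sigma}\right) x_i'\beta + \sigma \phi\left(\frac{x_i'\beta}{\sigma}\right)$$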
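The three quantities are linked: the overall mean is the probability of a positive value times the mean given a positive value. A minimal sketch using the standard normal distribution from Python's standard library; the index and scale values are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()        # standard normal: mean 0, standard deviation 1

def tobit_quantities(index, sigma):
    """Return (Pr(positive), mean given positive, overall mean) for a Tobit model.

    index -- linear index (regressors times coefficients)
    sigma -- standard deviation of the latent error
    """
    z = index / sigma
    prob_positive = STD_NORMAL.cdf(z)
    mills = STD_NORMAL.pdf(z) / prob_positive          # density over CDF
    mean_if_positive = index + sigma * mills
    overall_mean = prob_positive * index + sigma * STD_NORMAL.pdf(z)
    return prob_positive, mean_if_positive, overall_mean

p, m_pos, m_all = tobit_quantities(index=1.0, sigma=2.0)
```

Multiplying `p` by `m_pos` reproduces `m_all`, which is a quick consistency check on the formulas.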

SLIDE 43

Partial effects:

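$\Delta X_j = 1 \implies$

▪ $\Delta E(Y_i \mid x_i, Y_i > 0) = \beta_j \left\{ 1 - \lambda\left(\dfrac{x_i'\beta}{\sigma}\right) \left[ \dfrac{x_i'\beta}{\sigma} + \lambda\left(\dfrac{x_i'\beta}{\sigma}\right) \right] \right\}$
▪ $\Delta \Pr(Y_i > 0 \mid x_i) = \dfrac{\beta_j}{\sigma}\, \phi\left(\dfrac{x_i'\beta}{\sigma}\right)$
▪ $\Delta E(Y_i \mid x_i) = \beta_j \Phi\left(\dfrac{x_i'\beta}{\sigma}\right)$

The three effects have the same sign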
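The effect on the overall mean can be verified by differentiating the overall conditional mean numerically. The sketch below compares a central finite difference with the closed form (coefficient times the normal CDF at the scaled index); all values are hypothetical.

```python
from statistics import NormalDist

N01 = NormalDist()               # standard normal distribution

def overall_mean(index, sigma):
    """Tobit overall conditional mean as a function of the linear index."""
    z = index / sigma
    return N01.cdf(z) * index + sigma * N01.pdf(z)

def effect_on_overall_mean(index, sigma, coef):
    """Closed-form partial effect of one regressor on the overall mean."""
    return coef * N01.cdf(index / sigma)

index, sigma, coef, step = 1.0, 2.0, 0.7, 1e-6     # hypothetical values
# A change of `step` in the regressor moves the index by coef * step.
numeric = (overall_mean(index + coef * step, sigma)
           - overall_mean(index - coef * step, sigma)) / (2 * step)
analytic = effect_on_overall_mean(index, sigma, coef)
```

The density terms cancel in the derivative, which is why the closed form involves only the CDF.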

SLIDE 44

Two-part model specification:

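First part – binary regression model: $\Pr(d_i = 1 \mid x_i) = G_1(x_i'\beta)$

▪ $d_i = \begin{cases} 0 & \text{if } Y_i = 0 \\ 1 & \text{if } Y_i > 0 \end{cases}$

Second part – exponential or fractional regression model: $E(Y_i \mid x_i, d_i = 1) = G_2(x_i'\theta)$

Overall conditional mean:

$$\begin{aligned}
E(Y_i \mid x_i) &= \Pr(Y_i = 0 \mid x_i)\, E(Y_i \mid x_i, Y_i = 0) + \Pr(Y_i > 0 \mid x_i)\, E(Y_i \mid x_i, Y_i > 0) \\
&= G_1(x_i'\beta)\, G_2(x_i'\theta)
\end{aligned}$$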
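Because the mean given a zero outcome is zero, the overall mean collapses to a product of the two parts. A minimal sketch, assuming a logit first part and an exponential second part (one of several possible combinations); the index values are hypothetical.

```python
import math

def logit_prob(index):
    """First part: probability of a positive outcome under a logit model."""
    return 1.0 / (1.0 + math.exp(-index))

def two_part_mean(index_part1, index_part2):
    """Overall conditional mean: Pr(positive) times the mean given positive."""
    prob_positive = logit_prob(index_part1)
    mean_if_positive = math.exp(index_part2)     # exponential regression mean
    return prob_positive * mean_if_positive

# Hypothetical indices: probability 0.5 of a positive value, mean 10 when positive.
overall = two_part_mean(index_part1=0.0, index_part2=math.log(10.0))
```

The two indices can use different regressors and coefficients, which is what distinguishes this model from the Tobit.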

SLIDE 45

Estimation:

Each part of the model is estimated separately:

▪ In each part, use the standard methods for the type of data being analyzed
▪ In the first part of the model, use the full sample
▪ In the second part of the model, use the subsample for which $Y_i > 0$
▪ One may use different explanatory variables in each part of the model

Partial effects:

$\Delta \Pr(d_i = 1 \mid x_i)$

$\Delta E(Y_i \mid x_i, d_i = 1)$

$\Delta E(Y_i \mid x_i) = \Delta \Pr(d_i = 1 \mid x_i)\, E(Y_i \mid x_i, d_i = 1) + \Pr(d_i = 1 \mid x_i)\, \Delta E(Y_i \mid x_i, d_i = 1)$

SLIDE 46

Sample selection model - latent variable:

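$Y_{2i}^*$: main variable

$Y_{1i}^*$: variable that determines whether $Y_{2i}^*$ is observed or not

Two equations:

Participation equation (e.g. to work or not):

$$Y_{1i} = \begin{cases} 0 & \text{if } Y_{1i}^* \le 0 \\ 1 & \text{if } Y_{1i}^* > 0 \end{cases}$$

Outcome equation (e.g. how much to work):

$$Y_{2i} = \begin{cases} - \ (\text{not observed}) & \text{if } Y_{1i}^* \le 0 \\ Y_{2i}^* & \text{if } Y_{1i}^* > 0 \end{cases}$$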

SLIDE 47

Latent linear models:

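$$\begin{cases} Y_{1i}^* = x_{1i}'\beta_1 + u_{1i} \\ Y_{2i}^* = x_{2i}'\beta_2 + u_{2i} \end{cases}$$

Assumptions:

The error terms of the two equations are assumed to be correlated, having a bivariate normal distribution:

$$\begin{pmatrix} u_{1i} \\ u_{2i} \end{pmatrix} \sim N\left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix} \right]$$

Only when $\sigma_{12} = 0$ will the two equations be independent (the selection mechanism is exogenous or ignorable):

▪ In this case, the second equation may be estimated by OLS using only the observed data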

SLIDE 48

Quantities of interest:

Conditional mean of the main latent variable: $E(Y_{2i}^* \mid x_i) = x_{2i}'\beta_2$

Conditional mean of the main observed dependent variable: $E(Y_{2i} \mid x_i, Y_{1i} = 1) = x_{2i}'\beta_2 + \sigma_{12} \lambda(x_{1i}'\beta_1)$

Probability of observing positive values: $\Pr(Y_{2i} > 0 \mid x_i) = \Pr(Y_{1i} = 1 \mid x_i) = \Phi(x_{1i}'\beta_1)$

SLIDE 49

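Parameters to be estimated: $\beta$, $\sigma_{12}$, $\sigma_2$

Estimation methods:

ML

Heckman’s two-step method

ML:

Based on the following log-likelihood function:

$$LL = \sum_{i} \left\{ (1 - d_i) \log \Pr(Y_{1i} = 0 \mid x_{1i}) + d_i \left[ \log f(Y_{1i} = 1 \mid Y_{2i}) + \log f(Y_{2i}) \right] \right\}$$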


Stata

heckman Y2 X1 … Xk, select(Y1 = X1 … Xk)

SLIDE 50


Heckman’s two-step method:

Based on $E(Y_{2i} \mid x_i, Y_{1i} = 1) = x_{2i}'\beta_2 + \sigma_{12} \lambda(x_{1i}'\beta_1)$

First step: estimate the probit model $\Pr(Y_{1i} = 1 \mid x_i) = \Phi(x_{1i}'\beta_1)$ and get $\lambda(x_{1i}'\hat{\beta}_1) = \dfrac{\phi(x_{1i}'\hat{\beta}_1)}{\Phi(x_{1i}'\hat{\beta}_1)}$

Second step: regress $Y_{2i}$ on $x_{2i}$ and $\lambda(x_{1i}'\hat{\beta}_1)$ by OLS, using only the fully observed individuals, and correct the variances

t test for $H_0: \sigma_{12} = 0$ (exogenous selection mechanism)

If the same regressors are used in both steps, multicollinearity may arise; to avoid it, it is usual to exclude from $x_{2i}$ some of the variables included in $x_{1i}$
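The correction term added in the second step can be sketched as follows. The function computes the ratio of the standard normal density to its CDF at the selection index and adds it, scaled by the error covariance, to the outcome index. The names and numeric values are hypothetical, and the sketch takes the first-step index as given rather than estimating a probit.

```python
from statistics import NormalDist

N01 = NormalDist()               # standard normal distribution

def inverse_mills(z):
    """Ratio of the standard normal density to its CDF at z."""
    return N01.pdf(z) / N01.cdf(z)

def selected_mean(outcome_index, selection_index, sigma12):
    """Mean of the outcome for selected individuals under bivariate normal errors."""
    return outcome_index + sigma12 * inverse_mills(selection_index)

# Hypothetical indices; sigma12 is the covariance between the two error terms.
no_selection = selected_mean(2.0, 0.5, sigma12=0.0)
with_selection = selected_mean(2.0, 0.5, sigma12=0.8)
```

When the covariance is zero the correction vanishes and OLS on the observed data targets the right mean, which is the case tested by the t test on the added regressor.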


Stata

heckman Y2 X1 … Xk, twostep select(Y1 = X1 … Xk)