SLIDE 1

Introduction Binary model Example Fit Test

Applied Statistics

Lecturer: Serena Arima

SLIDE 2

Introduction

Until now:

1. Linear regression model;
2. Analysis of Variance model (ANOVA);
3. Analysis of Covariance model (ANCOVA).

In practical applications, one often has to cope with phenomena that are of a discrete or mixed discrete-continuous nature.

SLIDE 4

Introduction

Suppose we want to explain whether a family owns a car or not. Let the sole explanatory variable be the family income.

We have n families, and the response variable is defined as

$y_i = 1$ if family $i$ owns a car,
$y_i = 0$ if family $i$ does not own a car,

where $x_{i1}$ is the income of family $i$.

SLIDE 5

Introduction

We estimate the relationship between $y$ and $x_1$ using the linear model

$$y_i = \beta_0 + \beta_1 x_{i1} + \varepsilon_i = x_i'\beta + \varepsilon_i$$

It seems reasonable to make the standard assumption that

$$E[\varepsilon_i \mid x_i] = 0, \qquad E[y_i \mid x_i] = x_i'\beta$$

This implies that:

$$E[y_i \mid x_i] = 1 \cdot \Pr(y_i = 1 \mid x_i) + 0 \cdot \Pr(y_i = 0 \mid x_i) = \Pr(y_i = 1 \mid x_i) = x_i'\beta$$

SLIDE 7

Introduction

We can use the OLS method to estimate the model, and we get:

$$\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_{i1}$$

[Figure: "Regression model"; axes labelled "Family" and "Car".]

SLIDE 8

Introduction

Thus, the linear model implies that $x_i'\beta$ is a probability and should therefore lie between 0 and 1. This is only possible if the $x_i$ values are bounded and if certain restrictions on $\beta$ are satisfied. Usually this is hard to achieve in practice.

In addition, because $y_i$ has only two possible outcomes (0 and 1), the error term, for a given $x_i$, has only two possible outcomes as well.

SLIDE 10

Introduction

In particular, the distribution of the error term $\varepsilon_i$ is

$$P(\varepsilon_i = -x_i'\beta \mid x_i) = P(y_i = 0 \mid x_i) = 1 - x_i'\beta$$

$$P(\varepsilon_i = 1 - x_i'\beta \mid x_i) = P(y_i = 1 \mid x_i) = x_i'\beta$$

Hence, the variance of the error term is

$$V(\varepsilon_i \mid x_i) = x_i'\beta\,(1 - x_i'\beta)$$

The error term is therefore not Normal, and it is also heteroskedastic! Moreover, its variance depends upon the model parameters $\beta$.
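
The mean and variance of this two-point error distribution can be checked numerically. The following is a minimal sketch, not part of the original slides; the function name `error_moments` and the probabilities tried are illustrative assumptions, with `p` standing in for $x_i'\beta$.

```python
def error_moments(p):
    """Mean and variance of the two-point error term when P(y = 1 | x) = p.

    The error equals -p with probability 1 - p and 1 - p with probability p.
    """
    outcomes = [(-p, 1.0 - p), (1.0 - p, p)]  # (value, probability) pairs
    mean = sum(value * prob for value, prob in outcomes)
    variance = sum((value - mean) ** 2 * prob for value, prob in outcomes)
    return mean, variance


# Illustrative values of p (assumed, not taken from the slides).
for p in (0.1, 0.5, 0.9):
    mean, variance = error_moments(p)
    print(f"p = {p}: mean = {mean:.4f}, variance = {variance:.4f}, p(1-p) = {p * (1 - p):.4f}")
```

The variance changes with `p`, which is the heteroskedasticity noted above.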

SLIDE 11

Binary choice model

To overcome these problems, there exists a class of binary choice models designed to model the choice between two discrete alternatives. In general, we have

$$P(y_i = 1 \mid x_i) = G(x_i, \beta)$$

for some function $G(\cdot)$ that takes values in $[0, 1]$. Usually, one restricts attention to functions of the form

$$G(x_i, \beta) = F(x_i'\beta)$$

where $F$ is some distribution function.

SLIDE 13

Binary choice model

A common choice is the standard Normal distribution function

$$F(w) = \Phi(w) = \int_{-\infty}^{w} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}t^2\right) dt$$

leading to the so-called probit model, in which

$$P(y_i = 1 \mid x_i) = \Phi(x_i'\beta) = \Phi(\beta_0 + \beta_1 x_{i1})$$

SLIDE 14

Binary choice model

Another choice is the standard logistic distribution function

$$F(w) = L(w) = \frac{e^w}{1 + e^w}$$

leading to the so-called logit model, in which

$$P(y_i = 1 \mid x_i) = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)} = \frac{\exp(\beta_0 + \beta_1 x_{i1})}{1 + \exp(\beta_0 + \beta_1 x_{i1})}$$

SLIDE 15

Binary choice model

This model can also be written as

$$\log \frac{P(y_i = 1 \mid x_i)}{1 - P(y_i = 1 \mid x_i)} = x_i'\beta$$

The left-hand side is referred to as the log odds ratio. An odds ratio of 3 means that the odds of $y_i = 1$ are 3 times those of $y_i = 0$. Using this equality, the $\beta$ coefficients can be interpreted as describing the effect upon the odds ratio. For example, if $\beta_k = 0.1$, a unit increase of $x_{ik}$ increases the odds ratio by about 10%.
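
The "about 10%" figure can be verified with a short worked derivation. This is a sketch added for illustration, assuming all other explanatory variables are held fixed when $x_{ik}$ increases by one unit.

```latex
% Odds implied by the logit model
\frac{P(y_i = 1 \mid x_i)}{1 - P(y_i = 1 \mid x_i)} = \exp(x_i'\beta)

% Ratio of the odds after and before a unit increase in x_{ik}
\frac{\exp(x_i'\beta + \beta_k)}{\exp(x_i'\beta)} = e^{\beta_k}

% With \beta_k = 0.1
e^{0.1} \approx 1.105
```

So the odds are multiplied by roughly 1.105, an increase of about 10%. For small $\beta_k$ the approximation $e^{\beta_k} \approx 1 + \beta_k$ explains why the percentage change is close to $100\,\beta_k$.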

SLIDE 16

Binary choice model

Another common choice is the uniform distribution over the interval $[0, 1]$, with distribution function

$$F(w) = \begin{cases} 0 & w < 0 \\ w & 0 \le w \le 1 \\ 1 & w > 1 \end{cases}$$

This results in the so-called linear probability model, defined as

$$\Pr(y_i = 1 \mid x_i) = \begin{cases} 0 & \text{if } x_i'\beta < 0 \\ x_i'\beta & \text{if } 0 \le x_i'\beta \le 1 \\ 1 & \text{if } x_i'\beta > 1 \end{cases}$$
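
The three choices of $F$ (standard Normal, standard logistic, uniform on $[0, 1]$) can be compared side by side. The sketch below uses only the Python standard library; the function names and the grid of `w` values are assumptions made for illustration.

```python
import math


def probit_cdf(w):
    """Standard Normal distribution function Phi(w), via the error function."""
    return 0.5 * (1.0 + math.erf(w / math.sqrt(2.0)))


def logit_cdf(w):
    """Standard logistic distribution function L(w) = e^w / (1 + e^w)."""
    # Use the algebraically equivalent form 1 / (1 + e^{-w}) for w >= 0
    # so that exp() never receives a large positive argument.
    if w >= 0:
        return 1.0 / (1.0 + math.exp(-w))
    z = math.exp(w)
    return z / (1.0 + z)


def uniform_cdf(w):
    """Uniform[0, 1] distribution function (linear probability model)."""
    return min(1.0, max(0.0, w))


# Illustrative grid of index values w = x'beta (assumed).
for w in (-3.0, -0.5, 0.0, 0.5, 3.0):
    print(f"w = {w:5.1f}: probit = {probit_cdf(w):.4f}, "
          f"logit = {logit_cdf(w):.4f}, uniform = {uniform_cdf(w):.4f}")
```

All three functions map any real index into $[0, 1]$, which is the property the unrestricted linear model lacks.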

SLIDE 17

Binary choice model: interpretation

A main difficulty with these models is the interpretation of the parameters: apart from their signs, the coefficients in these binary choice models may be interpreted in terms of the marginal effects of changes in the explanatory variables. For a continuous explanatory variable $x_{ik}$, the marginal effect is defined as the partial derivative, with respect to $x_{ik}$, of the probability that $y_i$ equals one.

SLIDE 19

Binary choice model: interpretation

For the probit model, the marginal effect is

$$\frac{\partial \Phi(x_i'\beta)}{\partial x_{ik}} = \phi(x_i'\beta)\,\beta_k$$

where $\phi$ denotes the standard normal density function, that is

$$\phi(w) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}w^2\right)$$

SLIDE 20

Binary choice model: interpretation

For the logit model, the marginal effect is

$$\frac{\partial L(x_i'\beta)}{\partial x_{ik}} = \frac{e^{x_i'\beta}}{(1 + e^{x_i'\beta})^2}\,\beta_k$$

For the linear probability model, the marginal effect is

$$\frac{\partial x_i'\beta}{\partial x_{ik}} = \beta_k$$

(or 0 when $x_i'\beta$ lies outside $[0, 1]$).
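
The marginal-effect formulas can be evaluated directly. The sketch below is illustrative only: the coefficient values (`beta_0 = -1.0`, `beta_k = 2.0`) and the point `x = 0.4` are hypothetical, not estimates from the slides. It uses the identity $e^w/(1+e^w)^2 = L(w)(1 - L(w))$ for the logit case.

```python
import math


def probit_marginal_effect(index, beta_k):
    """phi(x'beta) * beta_k, with phi the standard normal density."""
    density = math.exp(-0.5 * index ** 2) / math.sqrt(2.0 * math.pi)
    return density * beta_k


def logit_marginal_effect(index, beta_k):
    """e^w / (1 + e^w)^2 * beta_k, written as L(w) * (1 - L(w)) * beta_k."""
    p = 1.0 / (1.0 + math.exp(-index))
    return p * (1.0 - p) * beta_k


# Hypothetical coefficients and evaluation point (assumed for illustration).
beta_0, beta_k, x = -1.0, 2.0, 0.4
index = beta_0 + beta_k * x

print(f"index x'beta            = {index:.3f}")
print(f"probit marginal effect  = {probit_marginal_effect(index, beta_k):.4f}")
print(f"logit marginal effect   = {logit_marginal_effect(index, beta_k):.4f}")
print(f"linear prob. model      = {beta_k:.4f}")
```

Unlike in the linear probability model, the probit and logit marginal effects depend on the point $x_i$ at which they are evaluated.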

SLIDE 21

Example 1: probit model

Suppose we have n = 2380 individuals and the following variables have been recorded (in 1920-1940):

Loan: binary variable, 1 if the bank loan is rejected, 0 if it is allowed;
Income: monthly income of each individual;
Race: race of each individual (0 = white, 1 = black) (R);
LoanPayment: ratio of income and loan payment (LP), income/payment.

SLIDE 22

Example 1: probit model

We would like to study whether the rejection of a loan is related to other variables, such as income, race and the income/payment ratio. The response variable is binary and the explanatory variables are both continuous and discrete. Let’s try to interpret different models!

SLIDE 24

Example 0: linear model

We start with a simple linear model. The estimated model is:

$$P(\text{loanRejection}_i = 1 \mid LP) = -0.07991 + 0.60353\,LP_i$$

Increasing the income/loan ratio by 0.1 increases the probability that the loan is rejected by 0.06.

What is the probability that the loan is rejected when the income/loan ratio is 0.5? The predicted probability is $-0.07991 + 0.60353 \cdot 0.5 = 0.22$.

What is the probability that the loan is rejected when the income/loan ratio is 0.01? The predicted probability is $-0.07991 + 0.60353 \cdot 0.01 = -0.074$ (!!!)
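
The two predictions can be reproduced in a few lines. This is a minimal sketch using the estimated coefficients quoted above; the function name `linear_probability` is an assumption made for illustration.

```python
def linear_probability(lp, intercept=-0.07991, slope=0.60353):
    """Fitted value of the linear model for a given income/loan ratio."""
    return intercept + slope * lp


for lp in (0.5, 0.01):
    fitted = linear_probability(lp)
    is_valid = 0.0 <= fitted <= 1.0
    print(f"LP = {lp}: fitted value = {fitted:.4f}, valid probability: {is_valid}")
```

The second fitted value is negative, so it cannot be read as a probability.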

SLIDE 28

Example 0: linear model

[Figure: "Loan" plotted against "Income/Loan ratio".]

SLIDE 29

Example 1: probit model

Model 1:

$$P(\text{loanRejection}_i = 1 \mid LP) = \Phi(\beta_0 + \beta_1 LP_i)$$

The estimated model is

$$P(\text{loanRejection}_i = 1 \mid LP) = \Phi(-2.1941 + 2.9679\,LP_i)$$

How to interpret the model?

SLIDE 30

Example 1: probit model

$$P(\text{loanRejection}_i = 1 \mid LP) = \Phi(-2.1941 + 2.9679\,LP_i)$$

Step 0. Interpret the sign: as the income and loan-payment rate increases, the probability that the bank will reject a loan increases ($\hat\beta_1 = 2.9679 > 0$).

Step 1. What is the probability that the loan is rejected when the loan-payment rate is 0.3?

$$P(\text{loanRejection}_i = 1 \mid LP = 0.3) = \Phi(-2.1941 + 2.9679 \cdot 0.3) = \Phi(-1.3037) \approx 0.096$$

SLIDE 32

Example 1: probit model

Step 2. What is the probability that the loan is rejected when the loan-payment rate is 0.5?

$$P(\text{loanRejection}_i = 1 \mid LP = 0.5) = \Phi(-2.1941 + 2.9679 \cdot 0.5) = 0.2388$$

Step 3. What is the probability that the loan is not allowed when the loan-payment rate is 0.8? That is, most of the income is used to pay the loan.

$$P(\text{loanRejection}_i = 1 \mid LP = 0.8) = \Phi(-2.1941 + 2.9679 \cdot 0.8) = 0.571$$
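
The probit predictions for Model 1 can be evaluated with the standard library alone. The sketch below uses the estimated coefficients quoted in the slides; the function names are assumptions, and $\Phi$ is computed through `math.erf`.

```python
import math


def std_normal_cdf(w):
    """Standard Normal distribution function Phi(w)."""
    return 0.5 * (1.0 + math.erf(w / math.sqrt(2.0)))


def probit_rejection_probability(lp, b0=-2.1941, b1=2.9679):
    """P(loanRejection = 1 | LP) under the estimated probit Model 1."""
    return std_normal_cdf(b0 + b1 * lp)


for lp in (0.3, 0.5, 0.8):
    index = -2.1941 + 2.9679 * lp
    prob = probit_rejection_probability(lp)
    print(f"LP = {lp}: index = {index:.4f}, P(rejection) = {prob:.4f}")
```

Equal steps in LP do not produce equal steps in the probability, because $\Phi$ is nonlinear in the index.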

SLIDE 33

Example 1: probit model

Model 2: Let’s add the effect of race. The estimated model is

$$P(\text{loanRejection}_i = 1 \mid LP, R) = \Phi(-2.25879 + 2.74178\,LP_i + 0.70816\,R_i)$$

How to interpret the model?

SLIDE 34

Example 1: probit model

$$P(\text{loanRejection}_i = 1 \mid LP, R) = \Phi(-2.25879 + 2.74178\,LP_i + 0.70816\,R_i)$$

Step 0. Interpret the sign: as the income and loan-payment ratio increases, the probability that the bank will not allow a loan increases ($\hat\beta_1 = 2.74178 > 0$), and it also increases if the individual is black ($\hat\beta_2 = 0.70816 > 0$).

Step 1. For a black man with loan-payment ratio equal to 0.3, the probability that the bank will reject a loan is

$$P(\text{loanRejection}_i = 1 \mid LP = 0.3, R = 1) = \Phi(-2.26 + 2.74 \cdot 0.3 + 0.71) \approx 0.233$$

and for a white man with the same ratio it is

$$P(\text{loanRejection}_i = 1 \mid LP = 0.3, R = 0) = \Phi(-2.26 + 2.74 \cdot 0.3) = 0.075$$
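
The effect of the race dummy in Model 2 can be made concrete by evaluating the model for both groups at the same ratio. This is a sketch using the estimated coefficients quoted above; the function name is an assumption, and `statistics.NormalDist` from the standard library supplies $\Phi$.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def probit_model2(lp, race, b0=-2.25879, b1=2.74178, b2=0.70816):
    """P(loanRejection = 1 | LP, R) under the estimated probit Model 2.

    race is the dummy R from the slides: 0 = white, 1 = black.
    """
    return STANDARD_NORMAL.cdf(b0 + b1 * lp + b2 * race)


p_white = probit_model2(lp=0.3, race=0)
p_black = probit_model2(lp=0.3, race=1)
print(f"R = 0: P(rejection) = {p_white:.4f}")
print(f"R = 1: P(rejection) = {p_black:.4f}")
print(f"difference          = {p_black - p_white:.4f}")
```

The difference between the two probabilities is not the coefficient 0.70816 itself; it depends on the value of LP at which the comparison is made.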

SLIDE 35

Example 1: logit model

Model 3:

$$\operatorname{logit}(P(\text{loanRejection}_i = 1 \mid LP)) = \beta_0 + \beta_1 LP_i$$

The estimated model is

$$P(\text{loanRejection}_i = 1 \mid LP) = \frac{\exp(-4.0284 + 5.8845\,LP_i)}{1 + \exp(-4.0284 + 5.8845\,LP_i)}$$

Step 0. Interpret the sign: as the income and loan-payment rate increases, the probability that the bank will not allow a loan increases ($\hat\beta_1 = 5.8845 > 0$).

SLIDE 36

Example 1: logit model

Step 1. What is the probability that the loan is not allowed when the loan-payment rate is 0.3?

$$P(\text{loanRejection}_i = 1 \mid LP = 0.3) = \frac{\exp(-4.0284 + 5.8845 \cdot 0.3)}{1 + \exp(-4.0284 + 5.8845 \cdot 0.3)} \approx 0.094$$

Step 2. What is the probability that the loan is not allowed when the loan-payment rate is 0.8?

$$P(\text{loanRejection}_i = 1 \mid LP = 0.8) = \frac{\exp(-4.0284 + 5.8845 \cdot 0.8)}{1 + \exp(-4.0284 + 5.8845 \cdot 0.8)} \approx 0.664$$
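
The two logit probabilities can be evaluated numerically from the estimated coefficients. The following is a minimal sketch; the function name `logit_rejection_probability` is an assumption made for illustration.

```python
import math


def logit_rejection_probability(lp, b0=-4.0284, b1=5.8845):
    """P(loanRejection = 1 | LP) under the estimated logit Model 3."""
    index = b0 + b1 * lp
    return math.exp(index) / (1.0 + math.exp(index))


for lp in (0.3, 0.8):
    prob = logit_rejection_probability(lp)
    odds = prob / (1.0 - prob)
    print(f"LP = {lp}: P(rejection) = {prob:.4f}, odds = {odds:.4f}")
```

Printing the odds alongside the probability ties the result back to the log odds form of the model.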

SLIDE 37

Probit and Logit model

[Figure: "Logit and Probit"; fitted probit model and logit model curves, with "Loan" plotted against "Loan payment - income".]

SLIDE 38

Model estimation

The likelihood function for the binary response model is defined as

$$L(\beta) = \prod_{i=1}^{n} P(y_i = 1 \mid x_i, \beta)^{y_i}\, P(y_i = 0 \mid x_i, \beta)^{1 - y_i} = \prod_{i=1}^{n} F(x_i'\beta)^{y_i} \left(1 - F(x_i'\beta)\right)^{1 - y_i}$$

SLIDE 39

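Model estimation

Hence the loglikelihood function is

$$l(\beta) = \sum_{i=1}^{n} y_i \log F(x_i'\beta) + \sum_{i=1}^{n} (1 - y_i) \log\left(1 - F(x_i'\beta)\right)$$

and its first derivative, set equal to zero, is

$$\frac{dl(\beta)}{d\beta} = \sum_{i=1}^{n} \left[ \frac{y_i - F(x_i'\beta)}{F(x_i'\beta)\left(1 - F(x_i'\beta)\right)} f(x_i'\beta) \right] x_i = 0$$

where $f = F'$ denotes the density function.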

SLIDE 40

Model estimation

The likelihood cannot be maximized analytically. We need numerical or iterative methods:

Newton-Raphson method;
Fisher scoring method.
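
As an illustration of the iterative approach, here is a minimal Newton-Raphson sketch for a logit model with an intercept and one regressor. The data are hypothetical toy values, and the function name and stopping rule are assumptions. For the logit model the score is $\sum_i (y_i - p_i)x_i$ and the negative Hessian is $\sum_i p_i(1-p_i)x_i x_i'$; in this case Newton-Raphson and Fisher scoring give the same update.

```python
import math


def fit_logit_newton(x, y, max_iter=50, tol=1e-10):
    """Maximize the logit loglikelihood for (intercept, slope) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0          # score vector
        h00 = h01 = h11 = 0.0  # negative Hessian (2 x 2, symmetric)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            resid = yi - p
            weight = p * (1.0 - p)
            g0 += resid
            g1 += resid * xi
            h00 += weight
            h01 += weight * xi
            h11 += weight * xi * xi
        # Solve the 2 x 2 system (negative Hessian) * step = score.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Hypothetical toy data (not from the slides); not perfectly separable.
x_toy = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
y_toy = [0, 0, 1, 0, 1, 0, 1, 1]

b0_hat, b1_hat = fit_logit_newton(x_toy, y_toy)
print(f"intercept = {b0_hat:.4f}, slope = {b1_hat:.4f}")
```

In practice a statistical package would be used; the sketch only shows what one iteration does.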

SLIDE 41

Goodness of fit

When the response variable is binary, the accuracy of the model can be judged either in terms of the fit between the calculated probabilities and the observed response frequencies or in terms of the model’s ability to forecast observed responses.

Contrary to the linear regression model, there is no single measure for the goodness of fit in binary choice models, and a variety of measures exists.

SLIDE 42

Goodness of fit

A first goodness of fit measure is defined as

$$\text{pseudo-}R^2 = 1 - \frac{1}{1 + 2(\log L_1 - \log L_0)/n}$$

where $\log L_1$ denotes the maximum loglikelihood value of the model of interest and $\log L_0$ denotes the maximum loglikelihood value of the model with only an intercept. It holds that $\text{pseudo-}R^2 \in [0, 1]$.

SLIDE 44

Goodness of fit

An alternative measure is suggested by McFadden (1974):

$$\text{McFadden } R^2 = 1 - \frac{\log L_1}{\log L_0}$$

sometimes referred to as the likelihood ratio index. Because the loglikelihood is the sum of log probabilities, it follows that $\log L_0 \le \log L_1 < 0$, from which it is straightforward to show that also $\text{McFadden } R^2 \in [0, 1]$.

SLIDE 46

Goodness of fit

Note that to compute $\log L_0$ it is not necessary to estimate a probit or logit model with intercept term only. Indeed, the ML estimate is

$$\hat{p} = \frac{n_1}{n}, \qquad \text{where } n_1 = \sum_{i=1}^{n} y_i \text{ and } n_0 = n - n_1,$$

and

$$\log L_0 = n_1 \log(n_1/n) + n_0 \log(n_0/n)$$

On the other hand, the value of $\log L_1$ should be given by a computer package.
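
The intercept-only loglikelihood and the two goodness-of-fit measures can be computed as follows. This is a sketch under stated assumptions: the sample size, the number of ones and the value of `log_l1` are hypothetical, and the function names are illustrative.

```python
import math


def intercept_only_loglik(n1, n):
    """log L0 = n1 log(n1/n) + n0 log(n0/n), with n0 = n - n1 (requires 0 < n1 < n)."""
    n0 = n - n1
    return n1 * math.log(n1 / n) + n0 * math.log(n0 / n)


def pseudo_r2(log_l1, log_l0, n):
    """pseudo-R2 = 1 - 1 / (1 + 2 (log L1 - log L0) / n)."""
    return 1.0 - 1.0 / (1.0 + 2.0 * (log_l1 - log_l0) / n)


def mcfadden_r2(log_l1, log_l0):
    """McFadden R2 = 1 - log L1 / log L0."""
    return 1.0 - log_l1 / log_l0


# Hypothetical inputs: 30 ones among 100 observations, and an assumed
# maximized loglikelihood of -52.5 for the model of interest.
n, n1, log_l1 = 100, 30, -52.5
log_l0 = intercept_only_loglik(n1, n)

print(f"log L0      = {log_l0:.4f}")
print(f"pseudo-R2   = {pseudo_r2(log_l1, log_l0, n):.4f}")
print(f"McFadden R2 = {mcfadden_r2(log_l1, log_l0):.4f}")
```

Both measures equal 0 when the model of interest does no better than the intercept-only model, that is, when `log_l1 == log_l0`.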

SLIDE 47

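Goodness of fit

An alternative way to evaluate the goodness of fit is to compare correct and incorrect predictions.

The predicted values are

$\hat{y}_i = 1$ if $x_i'\hat\beta > 0$,
$\hat{y}_i = 0$ if $x_i'\hat\beta \le 0$.

For a distribution function that is symmetric around zero, such as in the probit and logit models, this corresponds to predicting $\hat{y}_i = 1$ when the estimated probability $F(x_i'\hat\beta)$ exceeds 1/2.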

SLIDE 48

Goodness of fit

We can build the following table of observed ($y_i$) and predicted ($\hat{y}_i$) values:

              ŷ_i = 0    ŷ_i = 1    Total
y_i = 0       n00        n01        N0
y_i = 1       n10        n11        N1

The proportions of correct predictions within the two observed groups are combined in

$$HM = \frac{n_{00}}{N_0} + \frac{n_{11}}{N_1}$$

Values of HM > 1 define a good model.
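
The table and the HM measure are easy to compute from observed and predicted values. The sketch below uses hypothetical observed and predicted vectors; the function names are assumptions, and rows are taken to be observed values with row totals N0 and N1.

```python
def hit_miss_table(y_obs, y_pred):
    """Cross-tabulate observed against predicted values; keys are (observed, predicted)."""
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for observed, predicted in zip(y_obs, y_pred):
        counts[(observed, predicted)] += 1
    return counts


def hm_measure(counts):
    """HM = n00 / N0 + n11 / N1, with N0 and N1 the observed group sizes."""
    total_0 = counts[(0, 0)] + counts[(0, 1)]
    total_1 = counts[(1, 0)] + counts[(1, 1)]
    return counts[(0, 0)] / total_0 + counts[(1, 1)] / total_1


# Hypothetical observed and predicted responses (for illustration only).
y_obs = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]

table = hit_miss_table(y_obs, y_pred)
print(table)
print(f"HM = {hm_measure(table):.4f}")
```

A model that always predicts the same outcome gets one proportion equal to 1 and the other equal to 0, so HM = 1; this is why values above 1 are the benchmark.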

SLIDE 49

Specification tests

Although the MLEs have the property of being consistent, there is one important condition for this to hold: the likelihood function has to be correctly specified. Consider the generic model

$$P(y_i = 1 \mid x_i) = F(x_i'\beta).$$

Suppose we want to test

$$H_0: \beta_k = 0 \qquad H_1: \beta_k \neq 0$$

The test statistic is defined as

$$z = \frac{\hat\beta_k}{SE(\hat\beta_k)} \to N(0, 1) \quad \text{(asymptotic approximation)}$$

SLIDE 50

Specification tests

On the other hand, suppose we would like to test

$$H_0: \beta_1 = \beta_2 = \beta_3 = 0$$

We compare the maximized loglikelihood of the full model, $l_1(\hat\beta)$, and the maximized loglikelihood of the reduced model (with $\beta_1 = \beta_2 = \beta_3 = 0$), $l_0(\hat\beta)$. We use the following likelihood ratio test

$$T = -2(l_0 - l_1) = 2(l_1 - l_0) \sim \chi^2_{p-k}$$

where $p$ is the number of parameters in the full model and $k$ the number of parameters in the reduced model.
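
The likelihood ratio test for three restrictions can be sketched with the standard library. The loglikelihood values below are hypothetical. The p-value uses the closed-form upper tail of the chi-square distribution with 3 degrees of freedom, $P(X > x) = \operatorname{erfc}(\sqrt{x/2}) + \sqrt{2x/\pi}\,e^{-x/2}$, which applies to this case only; other degrees of freedom would need a statistics library.

```python
import math


def chi2_sf_df3(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 3 degrees of freedom."""
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


def likelihood_ratio_test(loglik_full, loglik_reduced):
    """Return the LR statistic T = 2 (l1 - l0) and its p-value for 3 restrictions."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return statistic, chi2_sf_df3(statistic)


# Hypothetical maximized loglikelihoods (assumed for illustration).
stat, p_value = likelihood_ratio_test(loglik_full=-52.5, loglik_reduced=-61.1)
print(f"T = {stat:.3f}, p-value = {p_value:.5f}")
```

A small p-value leads to rejecting the hypothesis that all three coefficients are zero.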

SLIDE 51

Binary choice model: an underlying latent model

It is possible to derive a binary choice model from underlying behavioural assumptions. This leads to a latent variable representation of the model.

Let us look at the decision of a married female to have a paid job or not. The utility difference between having a paid job or not depends upon the wage but also on other personal characteristics like the age, the education, whether there are young children in the family, etc.

SLIDE 52

Binary choice model: an underlying latent model

Thus, for each person $i$ we can write the utility difference between having a job and not as a function of observed characteristics, $x_i$, and unobserved characteristics, $\varepsilon_i$. The utility difference $y_i^*$ can be defined as

$$y_i^* = x_i'\beta + \varepsilon_i$$

Because $y_i^*$ is unobserved, it is referred to as a latent variable.

SLIDE 54

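Binary choice model: an underlying latent model

We assume that a woman chooses to work if the utility difference $y_i^*$ exceeds a certain threshold level, that is

$y_i = 1$ if $y_i^* > \gamma$, and $y_i = 0$ otherwise.

In the binary choice model typically $\gamma = 0$. Hence,

$$P(y_i = 1) = P(y_i^* > 0) = P(x_i'\beta + \varepsilon_i > 0) = P(-\varepsilon_i \le x_i'\beta) = F(x_i'\beta)$$

where $F$ denotes the distribution function of $-\varepsilon_i$ (which equals that of $\varepsilon_i$ when the distribution is symmetric around zero).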
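
The latent variable representation can be checked by simulation. In the sketch below the error term is drawn from a standard logistic distribution by inverting its distribution function, so the simulated frequency of $y_i = 1$ should be close to the logit probability. The index value 0.5, the number of draws and the seed are arbitrary assumptions.

```python
import math
import random


def simulate_latent_choice(index, n_draws=100_000, seed=0):
    """Share of draws with y* = index + eps > 0, where eps is standard logistic."""
    rng = random.Random(seed)
    ones = 0
    for _ in range(n_draws):
        u = rng.random()
        while u == 0.0:  # avoid log(0); random() returns values in [0, 1)
            u = rng.random()
        eps = math.log(u / (1.0 - u))  # inverse of the logistic distribution function
        if index + eps > 0.0:
            ones += 1
    return ones / n_draws


index = 0.5  # hypothetical value of x'beta
frequency = simulate_latent_choice(index)
logit_probability = 1.0 / (1.0 + math.exp(-index))
print(f"simulated frequency of y = 1: {frequency:.4f}")
print(f"logit probability F(x'beta):  {logit_probability:.4f}")
```

Replacing the logistic draws with standard Normal draws (for example `rng.gauss(0.0, 1.0)`) would give the probit model instead.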

SLIDE 55

Illustration of binary choice models

As an illustration we consider a sample of 4877 blue-collar workers who lost their jobs in the US between 1982 and 1991, taken from a study by McCall (1995). Not all unemployed workers eligible for unemployment insurance (UI) benefits apply for them, probably owing to the associated pecuniary and psychological costs. It is therefore interesting to investigate what makes people decide not to apply.