SLIDE 1

Economics Research Workshop

March 2017

Making sense of “nonsense probabilities”: binary choice models when the underlying error term has bounded support.

Alecos Papadopoulos, PhD Candidate (supervisor: Prof. Pl. Sakellaris)
(NOTE: this piece of research is not PhD-related)
Athens University of Economics and Business, School of Economic Sciences / Dept of Economics
papadopalex@aueb.gr
https://alecospapadopoulos.wordpress.com/

SLIDE 2

Intro & Motivation

  • Binary Choice, Linear Probability Model (LPM). Usually treated as:
  • a linear approximation to the true data generating mechanism
  • a device to showcase the problems with OLS estimation in models with Limited Dependent Variables
  • Prominent among these problems: we may obtain “nonsense probabilities”

Binary choice and nonsense probabilities (March 2017) Alecos Papadopoulos

   

 

$$\Pr(y_i = 1 \mid \mathbf{x}_i) = E(y_i \mid \mathbf{x}_i) = \mathbf{x}_i'\boldsymbol{\beta}, \qquad y_i \in \{0,1\}$$

“Nonsense probabilities”: $\widehat{\Pr}(y_i = 1 \mid \mathbf{x}_i) > 1$ or $\widehat{\Pr}(y_i = 1 \mid \mathbf{x}_i) < 0$.

SLIDE 3

Intro & Motivation

  • Provide a proper statistical foundation for the LPM
  • It becomes the true/correct specification
  • “Nonsense probabilities” become a useful indication of the structure of the underlying population
  • For the LPM this foundation is: the error term in the underlying latent regression follows a Uniform distribution (Amemiya 1981, p. 1489).
  • The Uniform distribution inherently has a bounded support.
  • The bounded support alone is the reason for the appearance of “nonsense probabilities”.


SLIDE 4

Intro & Motivation

The General Binary Choice model with bounded error

Latent regression

$$y_i^* = g(\mathbf{x}_i) + u_i, \qquad u_i \mid \mathbf{x}_i \sim F, \qquad u_i \in [-\alpha_1, \alpha_2], \qquad E(u_i \mid \mathbf{x}_i) = 0$$

where the bounded-support CDF $F$ may arise by truncating a parent distribution $F^*$ on $[-\alpha_1, \alpha_2]$:

$$F(u) = \frac{F^*(u) - F^*(-\alpha_1)}{F^*(\alpha_2) - F^*(-\alpha_1)}, \qquad F(-\alpha_1) = 0, \quad F(\alpha_2) = 1$$

SLIDE 5

Intro & Motivation

Define $y_i = I\{y_i^* > 0\}$, with $u_i \in [-\alpha_1, \alpha_2]$. Then

$$\Pr(y_i = 1 \mid \mathbf{x}_i) = \Pr(y_i^* > 0 \mid \mathbf{x}_i) = \Pr\big(u_i > -g(\mathbf{x}_i)\big) = 1 - F\big(-g(\mathbf{x}_i)\big)$$

Since the support is bounded, we have

$$g(\mathbf{x}_i) \ge \alpha_1 \;\Rightarrow\; \Pr(y_i = 1 \mid \mathbf{x}_i) = 1, \qquad g(\mathbf{x}_i) \le -\alpha_2 \;\Rightarrow\; \Pr(y_i = 1 \mid \mathbf{x}_i) = 0$$

SLIDE 6

Intro & Motivation

Complete model at the binary-variable level when the support of the underlying error term is bounded:

Is this a useful model for Econometrics?

$$\Pr(y_i = 1 \mid \mathbf{x}_i) \;=\; \begin{cases} 0, & g(\mathbf{x}_i) \le -\alpha_2 \\[4pt] 1 - F\big(-g(\mathbf{x}_i)\big), & -\alpha_2 < g(\mathbf{x}_i) < \alpha_1 \\[4pt] 1, & g(\mathbf{x}_i) \ge \alpha_1 \end{cases}$$
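The complete model at the binary-variable level can be sketched as a small function (a sketch; the name `prob_y1` and the argument names are illustrative, not from the presentation):

```python
def prob_y1(g_x, F, a1, a2):
    """Pr(y=1 | x) when the error term u has bounded support [-a1, a2]
    with CDF F: the probability is pinned to 1 or 0 in the two 'extreme'
    subgroups and equals 1 - F(-g(x)) in the familiar 'middle' one."""
    if g_x >= a1:        # disturbance cannot overturn the choice: y = 1
        return 1.0
    if g_x <= -a2:       # disturbance cannot overturn the choice: y = 0
        return 0.0
    return 1.0 - F(-g_x)

# Example with a Uniform(-1, 1) error: F(u) = (u + 1)/2
F_unif = lambda u: (u + 1.0) / 2.0
print(prob_y1(0.5, F_unif, 1.0, 1.0))   # 0.75
print(prob_y1(2.0, F_unif, 1.0, 1.0))   # 1.0
```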

SLIDE 7

Intro & Motivation

It reflects a population partitioned in three subgroups:

  • In the familiar “middle” one, the underlying disturbance affects and co-determines the binary choice as usual. No matter how strongly the systematic part points towards the one direction, the realization of the disturbance may even “overturn” the binary choice towards the other direction.
  • In the two “extreme” subgroups the underlying disturbance does not affect the binary choice – only the systematic factors do.
  • This does not imply individuals with different preferences.
  • But it does imply that for some individuals, their “core relevant” characteristics are so strongly realized that nothing else matters for the binary choice.


SLIDE 8

Intro & Motivation

Is it plausible and useful?

  • It is a plausible real-world situation, a recognition that non-negligible extremes may exist.
  • It is useful for businesses, which always look to segment the market, either to tailor their products or to increase efficiency in their marketing campaigns.
  • It is useful for policy, which for socio-political reasons attempts in many cases to take into account small but visible subsets of the population.


SLIDE 9

Intro & Motivation

NONSENSE PROBABILITIES, and what they tell us

Assume that the complete model holds. Initially we can only estimate the incomplete specification

$$\Pr(y_i = 1 \mid \mathbf{x}_i) = 1 - F\big(-g(\mathbf{x}_i)\big)$$

Suppose that for some observation we obtain

$$\widehat{\Pr}(y_i = 1 \mid \mathbf{x}_i) > 1$$

Then, however imperfectly, the estimator indicates that

$$1 - F\big(-g(\mathbf{x}_i)\big) > 1 = 1 - F(-\alpha_1) \;\Leftrightarrow\; F\big(-g(\mathbf{x}_i)\big) < F(-\alpha_1) \;\Leftrightarrow\; g(\mathbf{x}_i) > \alpha_1$$

But this is the condition to have

$$\Pr(y_i = 1 \mid \mathbf{x}_i) = 1$$

SLIDE 10

Intro & Motivation

The result depends solely on the very existence of a bounded support for the error term, and NOT on

  • whether the specification is linear
  • whether the distribution of the error term is symmetric around zero
  • whether it is symmetrically bounded/truncated around zero.

So this is not something specific to the Uniform distributional assumption. $F$ could e.g. be a Truncated Normal, or a Truncated Logistic, or some non-symmetric distribution.

SLIDE 11

Intro & Motivation

As regards estimated probabilities, the statistically justified action is to set the conditional probability equal to 1 or 0 whenever we obtain “nonsense probabilities”. (This has been proposed as an “ad hoc” correction in the LPM.) The whole matter would end here… but by misspecifying the model the estimator becomes inconsistent. To see this we re-cast the model as a mixture.

SLIDE 12

Intro & Motivation

First, a scope restriction. Assume for the remainder:

  • $g(\mathbf{x}_i)$ is affine: $g(\mathbf{x}_i) = \mathbf{x}_i'\boldsymbol{\beta}$
  • $F$ is symmetric around 0, so that $F(-g) = 1 - F(g)$
  • the support is symmetrically truncated around zero: $\alpha_1 = \alpha_2 = \alpha$

MIXTURE MODEL FORMULATION

Define the indicators

$$I_{1,i} = I\{g(\mathbf{x}_i) \ge \alpha\}, \qquad I_{r,i} = I\{|g(\mathbf{x}_i)| < \alpha\}, \qquad E(I_{r,i}) = p_r$$

Then the true model can be written

$$\Pr(y_i = 1 \mid \mathbf{x}_i) = E(y_i \mid \mathbf{x}_i) = p_r\, F\big(g(\mathbf{x}_i)\big) + (1 - p_r)\, I_{1,i}$$

SLIDE 13

Intro & Motivation

MIXTURE MODEL FORMULATION

The true model is

$$\Pr(y_i = 1 \mid \mathbf{x}_i) = E(y_i \mid \mathbf{x}_i) = p_r\, F\big(g(\mathbf{x}_i)\big) + (1 - p_r)\, I_{1,i}$$

  • Specifying instead $\Pr(y_i = 1 \mid \mathbf{x}_i) = F\big(g(\mathbf{x}_i)\big)$, or worse, the same with an untruncated distribution, leads to inconsistency.
  • This is due to the existence of the observations from the two extreme subgroups.
  • These obs are “irrelevant” as regards the estimation of the unknown coefficients.
  • We need to somehow separate and discard them, and use only the “relevant” ones.

SLIDE 14

An MM algorithm to clean the sample

A natural idea is to use what the estimator signals to us:

  • Start by (mis)specifying a model using a bounded/truncated distribution.
  • Discard those observations for which we obtain “nonsense probabilities”.
  • Re-estimate.
  • Discard any additional observations labeled “irrelevant” given the new estimates.
  • etc., until no new observations are discarded.

…but we are using a misspecified model, and an inconsistent estimator… Can we trust the algorithm? We can, if we use least-squares estimation.

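The iterate-and-discard loop above can be sketched for the linear (Unit/LPM) case as follows (a sketch under illustrative names; `mm_clean_lpm` is not from the presentation):

```python
import numpy as np

def mm_clean_lpm(y, X, max_iter=500):
    """Iterate: estimate by OLS, discard observations whose fitted value
    falls outside [0, 1], re-estimate; stop when nothing new is discarded."""
    keep = np.ones(len(y), dtype=bool)
    b = None
    for _ in range(max_iter):
        b, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        fitted = X[keep] @ b
        bad = (fitted < 0) | (fitted > 1)
        if not bad.any():
            break
        idx = np.flatnonzero(keep)
        keep[idx[bad]] = False   # label these obs "irrelevant" and drop them
    return b, keep
```

On exit, every retained observation has a fitted value inside [0, 1], so no “nonsense probabilities” remain in the cleaned sample.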

SLIDE 15

An MM algorithm to clean the sample

Why can we trust the algorithm? Because this is a Majorization/Minimization (MM) iteration algorithm.

Basic idea and initial formulation (Ortega & Rheinboldt 1970): if the true objective function is inconvenient or infeasible, find a “surrogate” function that “majorizes” the former, and minimize it instead.

The Expectation-Maximization (EM) algorithm (Dempster et al. 1977) is a particular instance of an MM algorithm (of the minorization/maximization variety). See Hunter & Lange (2004) for an exposition on MM and examples.

SLIDE 16

An MM algorithm to clean the sample

IN OUR CASE

Write $F\big(g(\mathbf{x}_i)\big) = F(\mathbf{x}_i'\boldsymbol{\beta})$ to reveal the unknown parameters. Let $\mathbf{b}^{[j]}$ be the estimates from the $j$-th iteration. Define the selection/counting function $n_r\big(\boldsymbol{\beta};\, \mathbf{b}^{[j]}\big)$. The first argument selects the “relevant” observations, those for which we have $0 < F(\mathbf{x}_i'\boldsymbol{\beta}) < 1$. The second argument determines the sample from which these observations are selected.

SLIDE 17

An MM algorithm to clean the sample

Then the “desired” objective function is

$$Q\big(\boldsymbol{\beta};\, \mathbf{b}^{[j]};\, \boldsymbol{\beta}\big) = \sum_{i=1}^{n_r(\boldsymbol{\beta};\, \mathbf{b}^{[j]})} \big(y_i - F(\mathbf{x}_i'\boldsymbol{\beta})\big)^2$$

to be minimized over the third argument. But $\boldsymbol{\beta}$ is unknown. So the feasible objective function is

$$\hat Q\big(\mathbf{b}^{[j]};\, \mathbf{b}^{[j]};\, \boldsymbol{\beta}\big) = \sum_{i=1}^{n_r(\mathbf{b}^{[j]};\, \mathbf{b}^{[j]})} \big(y_i - F(\mathbf{x}_i'\boldsymbol{\beta})\big)^2$$

…since we neither know which observations currently in the sample are relevant and which aren’t.

SLIDE 18

An MM algorithm to clean the sample

We can show that the surrogate majorizes the true objective,

$$\hat Q\big(\mathbf{b}^{[j]};\, \mathbf{b}^{[j]};\, \boldsymbol{\beta}\big) \;\ge\; Q\big(\boldsymbol{\beta};\, \mathbf{b}^{[j]};\, \boldsymbol{\beta}\big)$$

and that the crucial “descent property” holds:

$$Q\big(\boldsymbol{\beta};\, \mathbf{b}^{[j+1]};\, \boldsymbol{\beta}\big) \;\le\; \hat Q\big(\mathbf{b}^{[j]};\, \mathbf{b}^{[j+1]};\, \boldsymbol{\beta}\big) \;\le\; \hat Q\big(\mathbf{b}^{[j]};\, \mathbf{b}^{[j]};\, \boldsymbol{\beta}\big)$$

In words: when the surrogate function is minimized, the value of the true objective function gets lower. The algorithm moves “towards the right direction”, even though the model is misspecified and the estimator is inconsistent.

SLIDE 19

An MM algorithm to clean the sample

So filtering the sample using a least-squares estimator does discard irrelevant observations. This reduces contamination and improves the quality of the estimates.

  • Asymptotically the algorithm fully cleans the sample of irrelevant observations, under a mild sufficient condition (that the algorithm fails to detect irrelevant observations only a finite number of times). This re-instates consistency of the estimator.

NOTE: we can show that the algorithm can never discard the whole sample, so it will stop after a finite number of iterations. This is due to the binary nature of the dependent variable and the algebraic properties of least-squares.

SLIDE 20

An MM algorithm to clean the sample

SAMPLE CONSIDERATIONS

From an i.i.d. sample $\{(y_i, \mathbf{x}_i)\},\ i = 1, \dots, n$, we move to the selected sample $\big\{(y_i, \mathbf{x}_i) : 0 < F(\mathbf{x}_i'\hat{\mathbf{b}}) < 1\big\},\ i = 1, \dots, n_r$.

  • The marginal and joint distribution of the sample changes.
  • Correlation among regressors (inside each observation) is induced, even if it didn’t exist.
  • The sample is no longer representative of the population.

None of the above affect LS or our research target:

  • It is still identically distributed.
  • It is only asymptotically independent.

which is enough for the asymptotic properties of LS to hold.

SLIDE 21

The Unit model

Latent regression:

$$y_i^* = \gamma + \mathbf{x}_i'\boldsymbol{\beta} + u_i, \qquad u_i \sim U(-\alpha, \alpha)$$

Binary Choice model:

$$F\big(g(\mathbf{x}_i)\big) = \frac{1}{2} + \frac{\gamma}{2\alpha} + \frac{1}{2\alpha}\,\mathbf{x}_i'\boldsymbol{\beta}$$

Mixture formulation:

$$y_i = (1 - p_r)\, I_{1,i} + p_r\left(\frac{1}{2} + \frac{\gamma}{2\alpha} + \frac{1}{2\alpha}\,\mathbf{x}_i'\boldsymbol{\beta}\right) + p_r\, e_i$$

SLIDE 22

The Unit model

Unit Binary Choice model – Mixture formulation:

$$y_i = (1 - p_r)\, I_{1,i} + p_r\, F\big(g(\mathbf{x}_i)\big) + p_r\, e_i, \qquad I_{1,i} = I\{g(\mathbf{x}_i) \ge \alpha\}$$

  • It looks like an “error components” model with only a varying intercept, of the kind used with Panel Data (see e.g. Swamy (1971), or Hsiao (2003), ch. 3).
  • Here, however, we are looking at a cross-sectional sample, and also, the varying intercept is a function of the regressors.

SLIDE 23

The Unit model

Unit Binary Choice model – Mixture formulation. If we specify

$$y_i = \mathbf{x}_i'\boldsymbol{\delta} + v_i, \qquad v_i = p_r\, e_i + (1 - p_r)\, I_{1,i}$$

we have the correspondence

$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = p_r\,\boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{v}$$

$$\operatorname{plim}\, \mathbf{b} = p_r\,\boldsymbol{\beta} + (1 - p_r)\,\big[E(\mathbf{x}_i\mathbf{x}_i')\big]^{-1} E\big(\mathbf{x}_i\, I_{1,i}\big)$$

SLIDE 24

The Unit model- Attenuation Bias

Start from

$$\operatorname{plim}\, \mathbf{b} = p_r\,\boldsymbol{\beta} + (1 - p_r)\,\big[E(\mathbf{x}_i\mathbf{x}_i')\big]^{-1} E\big(\mathbf{x}_i\, I_{1,i}\big)$$

Case A) The case where in reality the “upper” extreme group does not exist: $I_{1,i} = 0$. Here

$$\operatorname{plim}\, \mathbf{b} = p_r\,\boldsymbol{\beta} \;\Rightarrow\; \big|\operatorname{plim}\, \mathbf{b}\big| < |\boldsymbol{\beta}|$$

Case B) $E\big(I_{1,i}\,\mathbf{x}_i\big) = E(I_{1,i})\,E(\mathbf{x}_i)$. Here we obtain attenuation bias for the slope coefficients.

SLIDE 25

The Unit model- Attenuation Bias

We anticipate attenuation bias to hold more generally in the contaminated sample. But this is good news: when the length of the coefficient vector is smaller, it will tend to provide conservative fitted values, which reduces the possibility of falsely labeling relevant observations as “irrelevant”.

SLIDE 26

The Unit model- Estimation

  • Decontaminate using the MM algorithm.
  • Account for the inherent heteroskedasticity:
  • either with a two-stage GLS estimator, which is consistent (McGillivray 1970) and has asymptotically the same distribution as the maximum likelihood one (Amemiya 1977); but this may change the estimates and produce another cycle of discarding observations;
  • or using OLS with a robust variance-covariance matrix estimator.

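The second option can be sketched as follows (a sketch; the simple HC0 “sandwich” variant is shown for brevity, and the function name `ols_hc0` is illustrative):

```python
import numpy as np

def ols_hc0(y, X):
    """OLS point estimates with an HC0 (White 'sandwich')
    heteroskedasticity-robust covariance matrix."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ (X.T @ y)
    e = y - X @ b                              # residuals
    meat = X.T @ (X * (e ** 2)[:, None])       # sum_i e_i^2 x_i x_i'
    V = XtX_inv @ meat @ XtX_inv
    return b, np.sqrt(np.diag(V))              # coefficients, robust s.e.
```

For a binary dependent variable the error variance is $F_i(1-F_i)$, which varies with $\mathbf{x}_i$, so robust standard errors are the natural minimal correction.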

SLIDE 27

The Unit model- A real data set

We use part of a data set from Mroz (1987) to examine the labor-force participation decision of US women. We have 753 observations relating to 1975. The specification we implement has been used in Wooldridge (2002), pp. 455-456, to showcase the Linear Probability Model and the appearance of “nonsense probabilities”. The first estimation replicated Wooldridge’s results and returned 33 estimated probabilities outside the [0,1] range.

SLIDE 28

The Unit model- A Real data set

| Variable               | Iter 1  | Iter 2  | Iter 3  | Iter 4 (final) |
| const                  | 0.5855  | 0.5753  | 0.5505  | 0.5505  |
| nwifeinc               | -0.0034 | -0.0040 | -0.0046 | -0.0046 |
| educ                   | 0.0380  | 0.0417  | 0.0442  | 0.0443  |
| exper                  | 0.0395  | 0.0407  | 0.0413  | 0.0413  |
| expersq                | -0.0006 | -0.0006 | -0.0006 | -0.0006 |
| age                    | -0.0161 | -0.0168 | -0.0168 | -0.0168 |
| kidslt6                | -0.2618 | -0.2799 | -0.2815 | -0.2815 |
| kidsge6                | 0.0130  | 0.0135  | 0.0144  | 0.0144  |
| n                      | 753     | 720     | 712     | 708     |
| ghat > 1               | 17      | 3       | 2       | (Σ) 22  |
| ghat < 0               | 16      | 5       | 2       | (Σ) 23  |
| Total dropped per Iter | 33      | 8       | 4       | (Σ) 45  |

We observe attenuation bias for the slope coefficients.

SLIDE 29

The Unit model- A Real data set

Sub-sample averages

| Variable               | Coefficient | DoI | Dtrmns (y=0) | Disturb (y=0) | Disturb (y=1) | Dtrmns (y=1) | Sample  |
| nwifeinc               | -0.0046     | neg | 28.9         | 21.3          | 19.0          | 15.3         | 20.1    |
| educ                   | 0.0443      | pos | 9.6          | 12.0          | 12.5          | 15.9         | 12.3    |
| exper                  | 0.0413      | pos | 1.4          | 7.9           | 12.8          | 18.5         | 10.6    |
| expersq                | -0.0006     | neg | 5.1          | 110.5         | 227.6         | 376.7        | 178.0   |
| age                    | -0.0168     | neg | 46.2         | 43.1          | 42.1          | 39.5         | 42.5    |
| kidslt6                | -0.2815     | neg | 1.0          | 0.3           | 0.1           | 0.0          | 0.2     |
| kidsge6                | 0.0144      | pos | 1.2          | 1.4           | 1.4           | 1.4          | 1.4     |
| ghat (except constant) |             |     | -0.68        | -0.11         | 0.12          | 0.51         | 0.01    |
| constant               |             |     | 0.55         | 0.55          | 0.55          | 0.55         | 0.55    |
| ghat                   |             |     | -0.13        | 0.44          | 0.67          | 1.06         | 0.56    |
| n                      |             |     | 23           | 303           | 405           | 22           | 753     |
| rel. freq.             |             |     | 3.1%         | 40.2%         | 53.8%         | 2.9%         | 100.0%  |

SLIDE 30

The Unit model- An intriguing result

We generated 10,000 i.i.d. observations from the following model

$$y_i^* = -0.5 + 1.5\,B_i - 0.6\,X_i + u_i, \qquad B \sim \text{Bernoulli}(0.7), \quad X \sim \chi^2_{(3)}, \quad u \sim U(-1, 1)$$

which leads to the true binary-variable model

$$\Pr(y_i = 1 \mid \mathbf{x}_i) = 0.25 + 0.75\,B_i - 0.3\,X_i$$

We discarded all the RELEVANT observations, and were left with 5,191 irrelevant observations. We then applied the MM algorithm on a fully irrelevant sample. Let’s see what happened.

SLIDE 31

The Unit model- An intriguing result

True model:

$$\Pr(y_i = 1 \mid \mathbf{x}_i) = 0.25 + 0.75\,B_i - 0.3\,X_i$$

Looks like a magic trick… but it’s not.

SLIDE 32

The Unit model- An intriguing result

In a fully irrelevant sample from

$$y_i^* = \mathbf{x}_i'\boldsymbol{\gamma} + u_i, \qquad u_i \sim U(-\alpha, \alpha)$$

it holds that $|\mathbf{x}_i'\boldsymbol{\gamma}| \ge \alpha$ for all observations kept. Then the true binary-variable model for this sample can be written

$$y_i = I\{\mathbf{x}_i'\boldsymbol{\gamma} \ge \alpha\}$$

while we specify as usual

$$y_i = \mathbf{x}_i'\boldsymbol{\delta} + e_i$$

SLIDE 33

The Unit model- An intriguing result

True model: $y_i = I\{\mathbf{x}_i'\boldsymbol{\gamma} \ge \alpha\}$, with $|\mathbf{x}_i'\boldsymbol{\gamma}| \ge \alpha$ throughout the sample.

If the sample is large enough, we expect that there will be some irrelevant observations lying close to the boundary: $(\mathbf{x}_i'\boldsymbol{\gamma} \le -\alpha,\ y_i = 0)$ with fitted value close to 0, or $(\mathbf{x}_i'\boldsymbol{\gamma} \ge \alpha,\ y_i = 1)$ with fitted value close to 1.

The MM algorithm, being conservative (attenuation bias), starts by discarding the observations whose fitted values lie furthest outside the bounds. So we are gradually left with observations for which the fitted value nearly equals $y_i$.

SLIDE 34

The Unit model- An intriguing result

But for the observations kept, $y_i = I\{\mathbf{x}_i'\boldsymbol{\gamma} \ge \alpha\}$ with fitted values near $y_i$, so in matrix notation

$$\mathbf{X}_c\,\boldsymbol{\beta} \simeq \mathbf{y}_c$$

Moreover, as the number of observations gets reduced, the regressor matrix tends to become square, and so invertible. When we are close to the exact-identification limit, for the OLS estimator we get

$$\hat{\boldsymbol{\delta}}_{OLS} = (\mathbf{X}_c'\mathbf{X}_c)^{-1}\mathbf{X}_c'\,\mathbf{y}_c \;\simeq\; (\mathbf{X}_c'\mathbf{X}_c)^{-1}\mathbf{X}_c'\,\mathbf{X}_c\,\boldsymbol{\beta} \;=\; \boldsymbol{\beta}$$

SLIDE 35

The Unit model- An intriguing result

This is not just a fun result. We can write the OLS estimator of a now mixed sample (relevant observations with subscript $r$, remaining irrelevant ones with subscript $c$) as

$$\hat{\mathbf{b}} = (\mathbf{X}'\mathbf{X})^{-1}\big(\mathbf{X}_c'\mathbf{X}_c\,\hat{\boldsymbol{\delta}} + \mathbf{X}_r'\mathbf{X}_r\,\boldsymbol{\beta} + \mathbf{X}_r'\mathbf{e}_r\big), \qquad \hat{\boldsymbol{\delta}} = (\mathbf{X}_c'\mathbf{X}_c)^{-1}\mathbf{X}_c'\,\mathbf{y}_c$$

The result just proven was $\hat{\boldsymbol{\delta}} \to \boldsymbol{\beta}$.

  • Shrinking $\mathbf{X}_c'\mathbf{X}_c$ relative to $\mathbf{X}_r'\mathbf{X}_r$ is the “passive” contribution of the decontamination algorithm (discarding irrelevant obs).
  • $\hat{\boldsymbol{\delta}} \to \boldsymbol{\beta}$ is an additional “active” contribution (forcing the remaining irrelevant obs to provide a coefficient vector closer to the true one), and we tend to get

$$\hat{\mathbf{b}} \;\to\; \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_r'\,\mathbf{e}_r$$

SLIDE 36

Non-linear models

Latent regression with a truncated error term:

$$y_i^* = g(\mathbf{x}_i, \boldsymbol{\beta}) + u_i, \qquad u_i \mid \mathbf{x}_i \sim F, \qquad u_i \in [-\alpha, \alpha], \qquad E(u_i \mid \mathbf{x}_i) = 0$$

where $F$ is obtained by symmetric truncation of a parent distribution $F^*$ (itself symmetric around zero):

$$F(u) = \frac{F^*(u) - F^*(-\alpha)}{2F^*(\alpha) - 1}, \qquad u \in [-\alpha, \alpha]$$

SLIDE 37

Non-linear models

The model at the binary-variable level becomes

$$y_i = \begin{cases} 0, & g(\mathbf{x}_i, \boldsymbol{\beta}) \le -\alpha \\[4pt] F\big(g(\mathbf{x}_i, \boldsymbol{\beta})\big) + e_i, & \big|g(\mathbf{x}_i, \boldsymbol{\beta})\big| < \alpha \\[4pt] 1, & g(\mathbf{x}_i, \boldsymbol{\beta}) \ge \alpha \end{cases}$$

It appears problematic to use maximum likelihood with the above model (we are still exploring the prospect, though).

We opt for non-linear least squares (NLS), for which we again have a valid MM decontamination algorithm.

SLIDE 38

Non-linear models

We set

$$\theta = \frac{1}{2F^*(\alpha) - 1}$$

and estimate $\theta$ instead of $\alpha$ (if truncation does not really exist, $\alpha$ is infinity, and so not a proper object of estimation). NLS minimizes

$$\min_{\boldsymbol{\beta},\, \theta}\; \sum_{i=1}^{n} \big(y_i - C_i\big)^2, \qquad C_i = \frac{1}{2} + \frac{\theta}{2}\big(2F^*(\mathbf{x}_i'\boldsymbol{\beta}) - 1\big)$$

Note that by construction $\theta \ge 1$. But we do NOT impose the constraint $\theta \ge 1$, in order not to lose asymptotic normality in the case where in reality $\theta = 1$.

SLIDE 39

Non-linear models

The additional orthogonality condition here is

$$E\big(F^*(\mathbf{x}_i'\boldsymbol{\beta})\, e_i\big) = 0$$

which is validated by the true population relation $E(e_i \mid \mathbf{x}_i) = 0$.

SAMPLE FILTERING. We discard observations for which the fitted probability

$$\hat C_i = \frac{1}{2} + \frac{\hat\theta_{NLS}}{2}\big(2F^*(\mathbf{x}_i'\hat{\boldsymbol{\beta}}_{NLS}) - 1\big)$$

satisfies $\hat C_i \ge 1$ or $\hat C_i \le 0$.

SLIDE 40

Non-linear models

NLS ASYMPTOTICS

Let $\mathbf{Z}$ be the matrix with typical row $\theta\, f^*(\mathbf{x}_i'\boldsymbol{\beta})\,\mathbf{x}_i'$ and $\mathbf{q}$ the column vector with typical element

$$q_i = \tfrac{1}{2}\big(2F^*(\mathbf{x}_i'\boldsymbol{\beta}) - 1\big)$$

(the gradients of the NLS regression function with respect to $\boldsymbol{\beta}$ and $\theta$ respectively). Form $\mathbf{J} = [\,\mathbf{Z} : \mathbf{q}\,]$. Then asymptotically

$$\big(\hat{\boldsymbol{\beta}}_{NLS},\, \hat\theta_{NLS}\big)' \;\overset{asympt.}{=}\; \big(\boldsymbol{\beta},\, \theta\big)' + (\mathbf{J}'\mathbf{J})^{-1}\mathbf{J}'\mathbf{e}$$

(see for example Judge et al. 1985, pp. 199-200)

SLIDE 41

Non-linear models

NLS ASYMPTOTICS

$$\sqrt{n}\,\big(\hat{\boldsymbol{\beta}}_{NLS} - \boldsymbol{\beta}\big) \;\overset{d}{\to}\; N\big(\mathbf{0},\, \boldsymbol{\Sigma}\big), \qquad \boldsymbol{\Sigma} = \big[E(n^{-1}\mathbf{J}'\mathbf{J})\big]^{-1}\, E\big(n^{-1}\mathbf{J}'\mathbf{V}_e\mathbf{J}\big)\, \big[E(n^{-1}\mathbf{J}'\mathbf{J})\big]^{-1}$$

$$\mathbf{V}_e = E(\mathbf{e}\mathbf{e}' \mid \mathbf{X}) = \operatorname{diag}\big\{F_i(1 - F_i)\big\}$$

SLIDE 42

Non-linear models

NLS ASYMPTOTICS for $\sqrt{n}\,\big(\hat\theta_{NLS} - \theta\big)$:

$$\sqrt{n}\,\big(\hat\theta_{NLS} - \theta\big) \;\overset{d}{\to}\; N\big(0,\, v^*\big), \qquad v^* = \frac{E\big(n^{-1}\mathbf{q}'\mathbf{M}_Z\mathbf{V}_e\mathbf{M}_Z\mathbf{q}\big)}{\big[E\big(n^{-1}\mathbf{q}'\mathbf{M}_Z\mathbf{q}\big)\big]^2}$$

$$\mathbf{M}_Z = \mathbf{I} - \mathbf{P}_Z, \qquad \mathbf{P}_Z = \mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'$$

with feasible estimator

$$\hat v^* = n\,\frac{\hat{\mathbf{q}}'\hat{\mathbf{M}}_Z\hat{\mathbf{V}}_e\hat{\mathbf{M}}_Z\hat{\mathbf{q}}}{\big(\hat{\mathbf{q}}'\hat{\mathbf{M}}_Z\hat{\mathbf{q}}\big)^2}$$

There is a convenient way to calculate the asymptotic variance without using the $n \times n$ matrix $\mathbf{M}_Z$.

SLIDE 43

Non-linear models

Note that $\hat{\mathbf{M}}_Z\hat{\mathbf{q}}$ is the residual series from regressing $\hat{\mathbf{q}}$ on $\hat{\mathbf{Z}}$ using OLS.

  • Estimate the Truncated-error binary choice model.
  • Transform the regressors by multiplying them by the derivative $\hat f_i^* = f^*(\mathbf{x}_i'\hat{\boldsymbol{\beta}})$ and scaling by $\hat\theta$, to obtain $\hat{\mathbf{Z}}$ (note that it does not contain a constant term).
  • Create also the series $\hat q_i = \tfrac{1}{2}\big(2F^*(\mathbf{x}_i'\hat{\boldsymbol{\beta}}) - 1\big)$.
  • Regress $\hat{\mathbf{q}}$ on $\hat{\mathbf{Z}}$ by OLS (without adding any constant term in the specification) and obtain the residual series, say $\hat{\boldsymbol{\varepsilon}}$.

Then

$$\hat v^* = n\,\frac{\sum_{i=1}^{n} \hat\varepsilon_i^2\, \hat F_i\big(1 - \hat F_i\big)}{\Big(\sum_{i=1}^{n} \hat\varepsilon_i^2\Big)^2}$$

SLIDE 44

Non-linear models

TESTING FOR THE EXISTENCE OF TRUNCATION

$$H_0: \theta = 1, \qquad \frac{n\,\big(\hat\theta_{NLS} - 1\big)^2}{\hat v} \;\overset{d}{\to}\; \chi^2(1)$$

NOTE: The test must be performed using the whole initial sample, NOT the decontaminated one.
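The test can be coded directly (a sketch; `truncation_test` is an illustrative name, and it assumes the statistic is $n(\hat\theta - 1)^2/\hat v$ with a $\chi^2(1)$ null distribution, as the slide indicates; the p-value uses the identity $P(\chi^2_1 > s) = \operatorname{erfc}(\sqrt{s/2})$):

```python
import math

def truncation_test(theta_hat, v_hat, n):
    """Wald-type test of H0: theta = 1 (no truncation).
    The statistic n*(theta_hat - 1)^2 / v_hat is asymptotically
    chi-squared with 1 degree of freedom under H0."""
    stat = n * (theta_hat - 1.0) ** 2 / v_hat
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi2(1) survival function
    return stat, p_value

stat, p = truncation_test(theta_hat=1.2, v_hat=1.0, n=100)
print(stat, p)   # approximately 4.0 and 0.0455
```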

SLIDE 45

Non-linear models

TESTING FOR THE EXISTENCE OF TRUNCATION

$$H_0: \theta = 1, \qquad \frac{n\,\big(\hat\theta_{NLS} - 1\big)^2}{\hat v} \;\overset{d}{\to}\; \chi^2(1)$$

We have two options for the variance estimator:

  • the one provided by NLS (multiplied by $n$);
  • the other, $\hat v^*$, from the exact expression for the asymptotic variance.

Their difference is that the latter uses the true functional form for the heteroskedastic variance.

SLIDE 46

Non-linear models

TESTING FOR THE EXISTENCE OF TRUNCATION

Initial Monte Carlo simulations for finite samples:

  • The NLS robust estimator for the variance (using the HC2 or HC3 variant) is a tad conservative; it tends to under-reject the true null.
  • The $\hat v^*$ estimator tends to over-reject, by relatively more.

In any case, a caveat is in order…

SLIDE 47

Non-linear models

TESTING FOR THE EXISTENCE OF TRUNCATION

  • We don’t expect “small” values for alpha.
  • Say, for $F^*(\hat\alpha) = 0.95$, 10% of probability mass is reallocated to the centre. This is not small.
  • But for such values, the test will have low power.
  • So, perhaps failure to reject the null of no truncation should not, on its own alone, lead to the abandonment of the truncated-error specification.

SLIDE 48

Non-linear models

ESTIMATING THE Truncated-error MODEL

  • After decontaminating the sample using NLS, should we switch to maximum likelihood (to improve on efficiency)?
  • We may experience numerical optimization issues.
  • The last column of $\mathbf{J}$, with typical element $\tfrac{1}{2}\big(2F^*(\mathbf{x}_i'\boldsymbol{\beta}) - 1\big)$, is closely approximated by an affine function of all the rest (Taylor).
  • Not perfect multicollinearity, but possible ill-conditioning, especially if truncation is severe.

SLIDE 49

Non-linear models

ESTIMATING THE Truncated-error MODEL. NLS or MLE on the decontaminated sample?

  • The NLS algorithm is usually Levenberg-Marquardt (akin to what a “ridge” estimator does to deal with ill-conditioning).
  • MLE is usually implemented using methods like Newton-Raphson, which require inversion of the possibly ill-conditioned matrix as is.

We can always implement both and see what happens.

SLIDE 50

References

Amemiya, T. (1977). "Some theorems in the linear probability model". International Economic Review, 645-650.
Amemiya, T. (1981). "Qualitative response models: A survey". Journal of Economic Literature, 19(4), 1483-1536.
Davidson, R., & MacKinnon, J. G. (2004). Econometric Theory and Methods. Oxford University Press.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). "Maximum likelihood from incomplete data via the EM algorithm". Journal of the Royal Statistical Society, Series B (Methodological), 1-38.
Heckman, J. J. (1976). "The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models". Annals of Economic and Social Measurement, 5(4), 475-492. NBER.
Hill, R. C., & Adkins, L. C. (2001). "Collinearity". Ch. 12 in A Companion to Theoretical Econometrics (ed. B. H. Baltagi), 256-278. Blackwell.
Hsiao, C. (2003). Analysis of Panel Data. Cambridge University Press.
Hunter, D. R., & Lange, K. (2004). "A tutorial on MM algorithms". The American Statistician, 58(1), 30-37.
Judge, G. G., Griffiths, W. E., Hill, R. C., Lütkepohl, H., & Lee, T. C. (1985). The Theory and Practice of Econometrics, 2nd ed. Wiley.
McGillivray, R. G. (1970). "Estimating the linear probability function". Econometrica, 38(5), 775.
Mroz, T. A. (1987). "The sensitivity of an empirical model of married women's hours of work to economic and statistical assumptions". Econometrica, 55(4), 765-799.
Ortega, J. M., & Rheinboldt, W. C. (1970). Iterative Solution of Nonlinear Equations in Several Variables. Republished 2000, Society for Industrial and Applied Mathematics.
Swamy, P. A. V. B. (1971). Statistical Inference in Random Coefficient Regression Models. Springer.
Wooldridge, J. (2002). Econometric Analysis of Cross Section and Panel Data. MIT Press.

SLIDE 51

Economics Research Workshop

March 2017

Making sense of “nonsense probabilities”: binary choice models when the underlying error term has bounded support.

Alecos Papadopoulos PhD Candidate

THANK YOU !