Binary Choice Matthieu de Lapparent matthieu.delapparent@epfl.ch - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Binary Choice

Matthieu de Lapparent

matthieu.delapparent@epfl.ch

Transport and Mobility Laboratory, School of Architecture, Civil and Environmental Engineering, Ecole Polytechnique Fédérale de Lausanne

Transport and Mobility Laboratory Binary Choice 1 / 53

slide-2
SLIDE 2

Outline

1. Model specification
2. Applying the model
3. Maximum likelihood estimation
4. Estimation output
5. Back to the scale

slide-3
SLIDE 3

Model specification

Example

Data
Unit of analysis: travelers (simulated observations)
Choice set: choice of car (C) or transit (T)
Independent variable: travel time

Ben-Akiva & Lerman (1985) Discrete Choice Analysis: Theory and Applications to Travel Demand, MIT Press (p.88)

slide-4
SLIDE 4

Model specification

Example

Data from 21 decision makers

  #   Time auto (min)   Time transit (min)   Choice
  1        52.9                4.4              T
  2         4.1               28.5              T
  3         4.1               86.9              C
  4        56.2               31.6              T
  5        51.8               20.2              T
  6         0.2               91.2              C
  7        27.6               79.7              C
  8        89.9                2.2              T
  9        41.5               24.5              T
 10        95.0               43.5              T
 11        99.1                8.4              T
 12        18.5               84.0              C
 13        82.0               38.0              C
 14         8.6                1.6              T
 15        22.5               74.1              C
 16        51.4               83.8              C
 17        81.0               19.2              T
 18        51.0               85.0              C
 19        62.2               90.1              C
 20        95.1               22.2              T
 21        41.6               91.5              C

slide-5
SLIDE 5

Model specification

Binary choice model

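Specification of utility functions
\[ U_C = \beta_1 T_C + \varepsilon_C, \qquad U_T = \beta_1 T_T + \varepsilon_T \]
where $T_C$ is the travel time by car (min) and $T_T$ the travel time by transit (min).

Choice model
\[
\begin{aligned}
P(C \mid \{C, T\}) &= P(U_C \ge U_T) \\
&= P(\beta_1 T_C + \varepsilon_C \ge \beta_1 T_T + \varepsilon_T) \\
&= P(\beta_1 T_C - \beta_1 T_T \ge \varepsilon_T - \varepsilon_C) \\
&= P(\varepsilon \le \beta_1 (T_C - T_T))
\end{aligned}
\]
where $\varepsilon = \varepsilon_T - \varepsilon_C$.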

slide-6
SLIDE 6

Model specification

Error term

Three questions about the random variables $\varepsilon_T$ and $\varepsilon_C$:
1. What is their distribution?
2. What are their moments?
   1. Mean?
   2. Variance?

Note
For binary choice it is sufficient to make assumptions about $\varepsilon = \varepsilon_T - \varepsilon_C$.

slide-7
SLIDE 7

Model specification

First-order moment: mean

Note
Adding the same constant $\mu$ to all utility functions does not affect the choice model:
\[ \Pr(U_C \ge U_T) = \Pr(U_C + \mu \ge U_T + \mu) \quad \forall \mu \in \mathbb{R}. \]
Why? A utility function is defined up to a monotone increasing transformation.

slide-8
SLIDE 8

Model specification

First-order moment: mean, cont.

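Change of variables
Assume that $E[\varepsilon_C] = \beta_C$ and $E[\varepsilon_T] = \beta_T$. Define $\varepsilon'_C = \varepsilon_C - \beta_C$ and $\varepsilon'_T = \varepsilon_T - \beta_T$, so that $E[\varepsilon'_C] = E[\varepsilon'_T] = 0$.

Choice model
\[
\begin{aligned}
P(C \mid \{C, T\}) &= \Pr(\beta_1 (T_C - T_T) \ge \varepsilon_T - \varepsilon_C) \\
&= \Pr(\beta_1 (T_C - T_T) \ge \varepsilon'_T + \beta_T - \varepsilon'_C - \beta_C) \\
&= \Pr(\beta_1 (T_C - T_T) + (\beta_C - \beta_T) \ge \varepsilon'_T - \varepsilon'_C) \\
&= \Pr(\beta_1 (T_C - T_T) + \beta_0 \ge \varepsilon')
\end{aligned}
\]
where $\beta_0 = \beta_C - \beta_T$ and $\varepsilon' = \varepsilon'_T - \varepsilon'_C$.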

slide-9
SLIDE 9

Model specification

First-order moment: mean, cont.

Mean
The mean of $\varepsilon$ can be included as a parameter of the deterministic part of utility.
Only the mean of the difference of the error terms is meaningful.

Alternative Specific Constant (ASC)
\[ U_C = \beta_1 T_C + \varepsilon_C, \qquad U_T = \beta_1 T_T + \beta_0 + \varepsilon_T \]
or
\[ U_C = \beta_1 T_C - \beta_0 + \varepsilon_C, \qquad U_T = \beta_1 T_T + \varepsilon_T \]
In practice, one needs to associate an ASC with all alternatives but one: an exclusion constraint that defines a one-to-one mapping between the vector of parameters and the choice probabilities.

slide-10
SLIDE 10

Model specification

Second-order moment: the variance

Utility is ordinal
Utilities can be scaled up or down without changing the choice probability:
\[ \Pr(U_C \ge U_T) = \Pr(\alpha U_C \ge \alpha U_T) \quad \forall \alpha > 0 \]
Repeat once more! A utility function is defined up to a monotone increasing transformation.

Link with the variance
\[ \mathrm{Var}(\alpha U_C) = \alpha^2 \mathrm{Var}(U_C), \qquad \mathrm{Var}(\alpha U_T) = \alpha^2 \mathrm{Var}(U_T) \]

Variance is not identified
As any $\alpha$ can be selected arbitrarily, any variance can be assumed.
No way to identify the variance of the error terms from data.
The scale has to be arbitrarily defined: normalization constraint to

slide-11
SLIDE 11

Model specification

Practical summary

Only differences in levels of utility matter
It is not possible to estimate all ASCs, only their differences.
Choose one of the ASCs arbitrarily as the reference and fix it to 0: the estimated ASCs are then differences with respect to this reference.

Scale is arbitrary
For a linear utility function, this means that the values of the parameters are not meaningful in themselves.
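
The two points of the practical summary can be checked numerically. The sketch below is only an illustration: it borrows the binary logit formula that these slides introduce later, and the utility values and the ASC value are hypothetical numbers chosen for the example.

```python
import math


def logit_prob_car(v_car: float, v_transit: float) -> float:
    """Binary logit probability of choosing car, given systematic utilities."""
    return 1.0 / (1.0 + math.exp(-(v_car - v_transit)))


# Hypothetical systematic utilities, chosen only for illustration.
v_car, v_transit = -0.41, -2.85

base = logit_prob_car(v_car, v_transit)

# Adding the same constant to both utilities leaves the probability unchanged.
shifted = logit_prob_car(v_car + 3.0, v_transit + 3.0)

# An ASC added to transit, or subtracted from car, gives the same difference
# of utilities, hence the same probability: only one ASC can be identified.
asc = 0.5
asc_on_transit = logit_prob_car(v_car, v_transit + asc)
asc_on_car = logit_prob_car(v_car - asc, v_transit)

print(f"base           = {base:.6f}")
print(f"shifted        = {shifted:.6f}")
print(f"ASC on transit = {asc_on_transit:.6f}")
print(f"ASC on car     = {asc_on_car:.6f}")
```

Only the difference of utilities enters the probability, which is why one ASC has to be fixed to 0 before estimation.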

slide-12
SLIDE 12

Model specification

The normal distribution

Assumption 1
$\varepsilon_T$ and $\varepsilon_C$ are the sum of many r.v. capturing unobservable attributes (e.g. mood, experience), measurement and specification errors.

Central-limit theorem
The sum of many i.i.d. random variables approximately follows a normal distribution: $N(\mu, \sigma^2)$.

Assumed distribution
$\varepsilon_C \sim N(0, 1)$, $\varepsilon_T \sim N(0, 1)$, $\varepsilon_C \perp \varepsilon_T$

slide-13
SLIDE 13

Model specification

The normal distribution, cont.

Probability density function (pdf):
\[ f(t) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(t - \mu)^2}{2\sigma^2}} \]
Cumulative distribution function (CDF):
\[ P(c \ge \varepsilon) = F(c) = \int_{-\infty}^{c} f(t)\, dt \]
No closed form.

[Figure: normal pdf, c * exp(-x*x/2.0), and normal CDF, norm(x), plotted for x from -3 to 3.]

slide-14
SLIDE 14

Model specification

The normal distribution, cont.

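$\varepsilon = \varepsilon_T - \varepsilon_C$
From the properties of the normal distribution, we have
\[ \varepsilon_C \sim N(0, 1), \quad \varepsilon_T \sim N(0, 1), \quad \varepsilon = \varepsilon_T - \varepsilon_C \sim N(0, 2) \]
As the variance is arbitrary, we may also assume
\[ \varepsilon_C \sim N(0, 0.5), \quad \varepsilon_T \sim N(0, 0.5), \quad \varepsilon = \varepsilon_T - \varepsilon_C \sim N(0, 1) \]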

slide-15
SLIDE 15

Model specification

The binary probit model

Choice model
\[ P(C \mid \{C, T\}) = \Pr(\beta_1 (T_C - T_T) + \beta_0 \ge \varepsilon) = F_\varepsilon(\beta_1 (T_C - T_T) + \beta_0) \]
The binary probit model
\[ P(C \mid \{C, T\}) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\beta_1 (T_C - T_T) + \beta_0} e^{-\frac{1}{2} t^2}\, dt \]
Not a closed form expression.
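
Although the probit probability has no closed form, it is easy to evaluate numerically. The sketch below is a minimal illustration that writes the standard normal CDF with the error function from Python's `math` module. It assumes the normalization Var(ε) = 1 and the sign convention of the choice model above; the parameter values are illustrative, not estimates.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF, expressed with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def probit_prob_car(t_car: float, t_transit: float,
                    beta0: float, beta1: float) -> float:
    """P(C | {C, T}) = Phi(beta1 * (T_C - T_T) + beta0), with Var(eps) = 1."""
    return std_normal_cdf(beta1 * (t_car - t_transit) + beta0)


# Illustrative parameter values (assumptions, not estimates).
p_car = probit_prob_car(t_car=52.9, t_transit=4.4, beta0=0.0, beta1=-0.03)
print(f"P(car) = {p_car:.4f}")
```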

slide-16
SLIDE 16

Model specification

The binary probit model

The distribution If the error terms are assumed to follow a normal distribution, the corresponding model is called Probability Unit Model or Probit Model.


slide-17
SLIDE 17

Model specification

The Gumbel distribution

Assumption 2
$\varepsilon_T$ and $\varepsilon_C$ are the maximum of many r.v. capturing unobservable attributes (e.g. mood, experience), measurement and specification errors.

Gumbel theorem
The maximum of many i.i.d. random variables approximately follows an Extreme Value distribution: $EV(\eta, \mu)$.

Assumed distribution
$\varepsilon_C \sim EV(0, 1)$, $\varepsilon_T \sim EV(0, 1)$, $\varepsilon_C \perp \varepsilon_T$

slide-18
SLIDE 18

Model specification

The type 1 Extreme Value distribution EV1(η, µ)

Probability density function (pdf)
\[ f(t) = \mu e^{-\mu (t - \eta)} e^{-e^{-\mu (t - \eta)}} \]
Cumulative distribution function (CDF)
\[ P(c \ge \varepsilon) = F(c) = \int_{-\infty}^{c} f(t)\, dt = e^{-e^{-\mu (c - \eta)}} \]

slide-19
SLIDE 19

Model specification

The type 1 Extreme Value distribution

[Figure: Gumbel PDF (mu=0, sigma=1) and Gumbel CDF (mu=0, sigma=1), plotted from -3 to 3.]

slide-20
SLIDE 20

Model specification

The type 1 Extreme Value distribution

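Properties
If $\varepsilon \sim EV(\eta, \mu)$ then
\[ E[\varepsilon] = \eta + \frac{\gamma}{\mu} \quad \text{and} \quad \mathrm{Var}[\varepsilon] = \frac{\pi^2}{6 \mu^2} \]
where $\gamma$ is Euler's constant.

Euler's constant
\[ \gamma = \lim_{k \to \infty} \left( \sum_{i=1}^{k} \frac{1}{i} - \ln k \right) = -\int_0^\infty e^{-x} \ln x \, dx \approx 0.5772 \]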
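
These properties can be checked by simulation. The sketch below is one possible check, not part of the original material: it draws from EV(0, 1) by inverting the CDF $F(c) = e^{-e^{-\mu (c - \eta)}}$ and compares the sample mean and variance with $\gamma / \mu$ and $\pi^2 / (6 \mu^2)$. The helper name `sample_ev1`, the seed, and the sample size are arbitrary choices.

```python
import math
import random


def sample_ev1(eta: float, mu: float, rng: random.Random) -> float:
    """Draw from EV1(eta, mu) by inverting F(c) = exp(-exp(-mu * (c - eta)))."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); this happens with negligible probability
        u = rng.random()
    return eta - math.log(-math.log(u)) / mu


rng = random.Random(0)  # fixed seed so that the run is repeatable
eta, mu, n = 0.0, 1.0, 200_000
draws = [sample_ev1(eta, mu, rng) for _ in range(n)]

sample_mean = sum(draws) / n
sample_var = sum((x - sample_mean) ** 2 for x in draws) / (n - 1)

# Euler's constant from its definition as a limit, truncated at k = 10**6.
k = 1_000_000
euler_gamma = sum(1.0 / i for i in range(1, k + 1)) - math.log(k)

print(f"Euler's constant (truncated limit): {euler_gamma:.5f}")
print(f"sample mean     = {sample_mean:.4f}  vs  eta + gamma/mu  = {eta + euler_gamma / mu:.4f}")
print(f"sample variance = {sample_var:.4f}  vs  pi^2/(6 mu^2)   = {math.pi ** 2 / (6 * mu ** 2):.4f}")
```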

slide-21
SLIDE 21

Model specification

Difference of independent type 1 Extreme Value distributions

$\varepsilon = \varepsilon_T - \varepsilon_C$
From the properties of the extreme value distribution, we have
\[ \varepsilon_C \sim EV(0, 1), \quad \varepsilon_T \sim EV(0, 1), \quad \varepsilon \sim \mathrm{Logistic}(0, 1) \]

slide-22
SLIDE 22

Model specification

The Logistic distribution: Logistic(η,µ)

Probability density function (pdf)
\[ f(t) = \frac{\mu e^{-\mu (t - \eta)}}{\left(1 + e^{-\mu (t - \eta)}\right)^2} \]
Cumulative distribution function (CDF)
\[ P(c \ge \varepsilon) = F(c) = \int_{-\infty}^{c} f(t)\, dt = \frac{1}{1 + e^{-\mu (c - \eta)}} \]
with $\mu > 0$.
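
The link between the two distributions can be illustrated by simulation. The sketch below is an assumption-laden check rather than a proof: it draws independent EV(0, 1) error terms by inverse-CDF sampling, forms their difference, and compares the empirical CDF of the difference with the Logistic(0, 1) CDF at a few points. The seed, sample size, and evaluation points are arbitrary.

```python
import math
import random


def sample_ev01(rng: random.Random) -> float:
    """Draw from EV(0, 1) by inverting F(c) = exp(-exp(-c))."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); this happens with negligible probability
        u = rng.random()
    return -math.log(-math.log(u))


def logistic_cdf(c: float, eta: float = 0.0, mu: float = 1.0) -> float:
    """CDF of Logistic(eta, mu)."""
    return 1.0 / (1.0 + math.exp(-mu * (c - eta)))


rng = random.Random(1)  # fixed seed so that the run is repeatable
n = 200_000
# eps = eps_T - eps_C with independent EV(0, 1) terms
diffs = [sample_ev01(rng) - sample_ev01(rng) for _ in range(n)]

max_gap = 0.0
for c in (-2.0, -1.0, 0.0, 1.0, 2.0):
    empirical = sum(d <= c for d in diffs) / n
    max_gap = max(max_gap, abs(empirical - logistic_cdf(c)))
    print(f"c = {c:+.1f}  empirical = {empirical:.4f}  logistic = {logistic_cdf(c):.4f}")
```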

slide-23
SLIDE 23

Model specification

The binary logit model

Choice model
\[ P(C \mid \{C, T\}) = \Pr(\beta_1 (T_C - T_T) + \beta_0 \ge \varepsilon) = F_\varepsilon(\beta_1 (T_C - T_T) + \beta_0) \]
The binary logit model
\[ P(C \mid \{C, T\}) = \frac{1}{1 + e^{-(\beta_1 (T_C - T_T) + \beta_0)}} = \frac{e^{\beta_1 T_C + \beta_0}}{e^{\beta_1 T_C + \beta_0} + e^{\beta_1 T_T}} \]
The binary logit model
\[ P(C \mid \{C, T\}) = \frac{e^{V_C}}{e^{V_C} + e^{V_T}} \]

slide-24
SLIDE 24

Model specification

Logit curve

[Figure: logit curve for non-limiting cases, $P_n(i)$ as a function of $V_{in} - V_{jn}$, with upper bound 1.]

slide-25
SLIDE 25

Model specification

Logit curve: limiting cases

[Figure: $P_n(i)$ as a function of $V_{in} - V_{jn}$, showing the logit curve for non-limiting cases and the two limiting cases: scale ⟶ 0 (P = 0.5) and scale ⟶ ∞ (deterministic).]
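
The limiting cases can be reproduced with a few lines of code. This is a sketch under the assumption that the scale multiplies the difference of systematic utilities, as $\mu$ does in the Logistic CDF; the function name `logit_prob` and the scale values are chosen for illustration.

```python
import math


def logit_prob(v_diff: float, scale: float = 1.0) -> float:
    """P_n(i) = 1 / (1 + exp(-scale * (V_in - V_jn))), written to avoid overflow."""
    x = scale * v_diff
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


for scale in (1e-6, 1.0, 1e6):
    row = ", ".join(f"{logit_prob(v, scale):.4f}" for v in (-1.0, 0.0, 1.0))
    print(f"scale = {scale:g}: P at V_in - V_jn = -1, 0, 1 -> {row}")
```

With a very small scale the probability stays near 0.5 whatever the utility difference; with a very large scale the model behaves like a deterministic rule that always picks the alternative with the higher utility.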

slide-26
SLIDE 26

Applying the model

Back to the example

Remember the data from our 21 decision makers?

  #   Time auto (min)   Time transit (min)   Choice
  1        52.9                4.4              T
  2         4.1               28.5              T
  3         4.1               86.9              C
  4        56.2               31.6              T
  5        51.8               20.2              T
  6         0.2               91.2              C
  7        27.6               79.7              C
  8        89.9                2.2              T
  9        41.5               24.5              T
 10        95.0               43.5              T
 11        99.1                8.4              T
 12        18.5               84.0              C
 13        82.0               38.0              C
 14         8.6                1.6              T
 15        22.5               74.1              C
 16        51.4               83.8              C
 17        81.0               19.2              T
 18        51.0               85.0              C
 19        62.2               90.1              C
 20        95.1               22.2              T
 21        41.6               91.5              C

slide-27
SLIDE 27

Applying the model

First individual

Parameters
Let's assume that $\beta_0 = 0.5$ and $\beta_1 = -0.1$.

Variables
Let's consider the first observation:
$T_{C1} = 52.9$
$T_{T1} = 4.4$
Choice = transit: $y_{auto,1} = 0$, $y_{transit,1} = 1$

Choice
What is the probability given by the model that this individual indeed chooses transit?

slide-28
SLIDE 28

Applying the model

First individual

Utility functions
\[ V_{C1} = \beta_1 T_{C1} = -5.29, \qquad V_{T1} = \beta_1 T_{T1} + \beta_0 = 0.06 \]
Choice model
\[ P_1(\text{transit}) = \frac{e^{V_{T1}}}{e^{V_{T1}} + e^{V_{C1}}} = \frac{e^{0.06}}{e^{0.06} + e^{-5.29}} \approx 1 \]
Comments
The model fits the observation very well.
Consistent with the assumption that travel time is the only explanatory variable.

slide-29
SLIDE 29

Applying the model

Second individual

Parameters
Let's assume that $\beta_0 = 0.5$ and $\beta_1 = -0.1$.

Variables
$T_{C2} = 4.1$
$T_{T2} = 28.5$
Choice = transit: $y_{auto,2} = 0$, $y_{transit,2} = 1$

Choice
What is the probability given by the model that this individual indeed chooses transit?

slide-30
SLIDE 30

Applying the model

Second individual

Utility functions
\[ V_{C2} = \beta_1 T_{C2} = -0.41, \qquad V_{T2} = \beta_1 T_{T2} + \beta_0 = -2.35 \]
Choice model
\[ P_2(\text{transit}) = \frac{e^{V_{T2}}}{e^{V_{T2}} + e^{V_{C2}}} = \frac{e^{-2.35}}{e^{-2.35} + e^{-0.41}} \approx 0.13 \]
Comment
The model fits the observation poorly.
But the assumption is that travel time is the only explanatory variable.
Still, the probability is not small.
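
The two worked examples can be reproduced with a short script. The sketch below follows the utility functions used in this example, where $\beta_0$ enters the transit utility, together with the assumed values $\beta_0 = 0.5$ and $\beta_1 = -0.1$; the function name `prob_transit` is a label chosen here.

```python
import math

BETA0, BETA1 = 0.5, -0.1  # parameter values assumed in the example


def prob_transit(t_car: float, t_transit: float,
                 beta0: float = BETA0, beta1: float = BETA1) -> float:
    """Binary logit probability of transit with V_C = beta1*T_C, V_T = beta1*T_T + beta0."""
    v_car = beta1 * t_car
    v_transit = beta1 * t_transit + beta0
    return math.exp(v_transit) / (math.exp(v_transit) + math.exp(v_car))


p1 = prob_transit(52.9, 4.4)   # first individual, chose transit
p2 = prob_transit(4.1, 28.5)   # second individual, chose transit

print(f"P1(transit) = {p1:.4f}")
print(f"P2(transit) = {p2:.4f}")
print(f"P1(transit) * P2(transit) = {p1 * p2:.4f}")
```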

slide-31
SLIDE 31

Applying the model

Back to the example

Two observations
The probability that the model reproduces both observations is
\[ P_1(\text{transit}) P_2(\text{transit}) = 0.13 \]
All observations
The probability that the model reproduces all observations is
\[ P_1(\text{transit}) P_2(\text{transit}) \cdots P_{21}(\text{auto}) = 4.62 \times 10^{-4} \]
In general
\[ L^* = \prod_n P_n(\text{auto})^{y_{auto,n}} P_n(\text{transit})^{y_{transit,n}} \]
where $y_{j,n}$ is 1 if individual $n$ has chosen alternative $j$, 0 otherwise.

slide-32
SLIDE 32

Maximum likelihood estimation

Back to the example

$L^*$ is called the likelihood of the sample for a given model.
It is the probability that the model fits all observations.
It is a function of the parameters.

Examples for some values of $\beta_0$ and $\beta_1$

 β0      β1      L*
 0       0       4.77 × 10^-7
 0      -1       1.97 × 10^-30
 0      -0.1     4.1 × 10^-4
 0.5    -0.1     4.62 × 10^-4
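
The likelihood of the sample can be evaluated directly from the 21 observations. The sketch below is one way to do it, assuming the utility functions of the worked example ($V_C = \beta_1 T_C$, $V_T = \beta_1 T_T + \beta_0$); the names `DATA`, `prob_chosen` and `likelihood` are chosen for this illustration.

```python
import math

# (time by auto, time by transit, choice) for the 21 decision makers.
DATA = [
    (52.9, 4.4, "T"), (4.1, 28.5, "T"), (4.1, 86.9, "C"),
    (56.2, 31.6, "T"), (51.8, 20.2, "T"), (0.2, 91.2, "C"),
    (27.6, 79.7, "C"), (89.9, 2.2, "T"), (41.5, 24.5, "T"),
    (95.0, 43.5, "T"), (99.1, 8.4, "T"), (18.5, 84.0, "C"),
    (82.0, 38.0, "C"), (8.6, 1.6, "T"), (22.5, 74.1, "C"),
    (51.4, 83.8, "C"), (81.0, 19.2, "T"), (51.0, 85.0, "C"),
    (62.2, 90.1, "C"), (95.1, 22.2, "T"), (41.6, 91.5, "C"),
]


def prob_chosen(beta0, beta1, t_car, t_transit, choice):
    """Logit probability of the alternative that was actually chosen."""
    v_car = beta1 * t_car
    v_transit = beta1 * t_transit + beta0
    if choice == "C":
        v_chosen, v_other = v_car, v_transit
    else:
        v_chosen, v_other = v_transit, v_car
    return 1.0 / (1.0 + math.exp(v_other - v_chosen))


def likelihood(beta0, beta1):
    """L* = product over the sample of the probability of the observed choice."""
    result = 1.0
    for t_car, t_transit, choice in DATA:
        result *= prob_chosen(beta0, beta1, t_car, t_transit, choice)
    return result


for beta0, beta1 in [(0.0, 0.0), (0.0, -1.0), (0.0, -0.1), (0.5, -0.1)]:
    print(f"beta0 = {beta0:4.1f}  beta1 = {beta1:5.1f}  L* = {likelihood(beta0, beta1):.3g}")
```

A plain product is fine for 21 observations; with larger samples the product underflows quickly, which is one practical reason for working with the log likelihood instead.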

slide-33
SLIDE 33

Maximum likelihood estimation

Likelihood function

[Figure: likelihood function as a surface and as contour lines, over $\beta_1$ from -0.3 to 0 and $\beta_0$ from -2 to 2.5; likelihood values up to about 0.0025.]

slide-34
SLIDE 34

Maximum likelihood estimation

Likelihood function (zoom)

[Figure: zoom on the likelihood function, over $\beta_1$ from -0.08 to -0.03 and $\beta_0$ from 0 to 0.5; likelihood values between about 0.0006 and 0.0022.]

slide-35
SLIDE 35

Maximum likelihood estimation

Maximum likelihood estimation

Estimators for the parameters
Parameters that achieve the maximum likelihood:
\[ \max_\beta \prod_n P_n(\text{auto}; \beta)^{y_{auto,n}} P_n(\text{transit}; \beta)^{y_{transit,n}} \]
Log likelihood
Alternatively, we prefer to maximize the log likelihood:
\[ \max_\beta \ln \prod_n P_n(\text{auto})^{y_{auto,n}} P_n(\text{transit})^{y_{transit,n}} = \max_\beta \sum_n \ln \left( y_{auto,n} P_n(\text{auto}) + y_{transit,n} P_n(\text{transit}) \right) \]

slide-36
SLIDE 36

Maximum likelihood estimation

Maximum likelihood estimation

[Figure: log-likelihood surface over the parameter ranges -0.3 to 0 and -2 to 2.5; vertical axis from -30 to -5.]

slide-37
SLIDE 37

Maximum likelihood estimation

Solving the optimization problem

Unconstrained nonlinear optimization
Iterative methods
Designed to identify a local maximum
When the function is concave, a local maximum is also a global maximum
For binary logit, the log-likelihood is concave
Use the derivatives of the objective function
Example: package CFSQP used in BIOGEME
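
To make the idea of an iterative, derivative-based method concrete, here is a minimal sketch of maximum likelihood estimation for the example. It is not the CFSQP algorithm used in BIOGEME: it swaps in a plain Newton-Raphson iteration with step halving, written with the standard library only. It assumes the utility functions of the worked example ($V_C = \beta_1 T_C$, $V_T = \beta_1 T_T + \beta_0$); all function names are chosen for this illustration.

```python
import math

# (time by auto, time by transit, choice) for the 21 decision makers.
DATA = [
    (52.9, 4.4, "T"), (4.1, 28.5, "T"), (4.1, 86.9, "C"),
    (56.2, 31.6, "T"), (51.8, 20.2, "T"), (0.2, 91.2, "C"),
    (27.6, 79.7, "C"), (89.9, 2.2, "T"), (41.5, 24.5, "T"),
    (95.0, 43.5, "T"), (99.1, 8.4, "T"), (18.5, 84.0, "C"),
    (82.0, 38.0, "C"), (8.6, 1.6, "T"), (22.5, 74.1, "C"),
    (51.4, 83.8, "C"), (81.0, 19.2, "T"), (51.0, 85.0, "C"),
    (62.2, 90.1, "C"), (95.1, 22.2, "T"), (41.6, 91.5, "C"),
]


def log_prob(beta0, beta1, t_car, t_transit, choice):
    """ln P(choice), with V_C = beta1*T_C and V_T = beta1*T_T + beta0."""
    x = beta1 * (t_car - t_transit) - beta0   # V_C - V_T
    s = x if choice == "C" else -x            # V_chosen - V_other
    # ln(1 / (1 + exp(-s))), written to avoid overflow for large |s|.
    return -math.log1p(math.exp(-s)) if s >= 0 else s - math.log1p(math.exp(s))


def log_likelihood(beta0, beta1):
    return sum(log_prob(beta0, beta1, *row) for row in DATA)


def gradient_and_hessian(beta0, beta1):
    g0 = g1 = h00 = h01 = h11 = 0.0
    for t_car, t_transit, choice in DATA:
        d = t_car - t_transit
        p_car = math.exp(log_prob(beta0, beta1, t_car, t_transit, "C"))
        resid = (1.0 if choice == "C" else 0.0) - p_car
        w = p_car * (1.0 - p_car)
        g0 -= resid        # d(V_C - V_T) / d beta0 = -1
        g1 += resid * d    # d(V_C - V_T) / d beta1 = T_C - T_T
        h00 -= w
        h01 += w * d
        h11 -= w * d * d
    return (g0, g1), (h00, h01, h11)


def newton(beta0=0.0, beta1=0.0, tol=1e-8, max_iter=100):
    ll = log_likelihood(beta0, beta1)
    for _ in range(max_iter):
        (g0, g1), (h00, h01, h11) = gradient_and_hessian(beta0, beta1)
        if math.hypot(g0, g1) < tol:          # stopping criterion: gradient near 0
            break
        det = h00 * h11 - h01 * h01
        step0 = -(h11 * g0 - h01 * g1) / det  # Newton direction: -H^{-1} g
        step1 = -(h00 * g1 - h01 * g0) / det
        t = 1.0
        while t > 1e-10:                      # halve the step until ln L does not decrease
            if log_likelihood(beta0 + t * step0, beta1 + t * step1) >= ll - 1e-12:
                break
            t *= 0.5
        beta0, beta1 = beta0 + t * step0, beta1 + t * step1
        ll = log_likelihood(beta0, beta1)
    return beta0, beta1, ll


beta0_hat, beta1_hat, ll_hat = newton()
print(f"beta0 = {beta0_hat:.4f}, beta1 = {beta1_hat:.4f}, ln L = {ll_hat:.3f}")
```

Because the binary logit log-likelihood is concave, the stationary point found by the iteration is the global maximum, whatever the starting point.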

slide-38
SLIDE 38

Maximum likelihood estimation

Example of algorithm

Tests with CFSQP package within BIOGEME

 Prec.      β0*            β1*            L*(β*)    ∇L*(β*)
 1.0        +0.0000e+00    +1.4901e-08    -14.56    456.05
 1.0e-01    +2.5810e-01    -5.5361e-02    -6.172    4.9646
 1.0e-02    +2.4274e-01    -5.2330e-02    -6.167    1.9711
 1.0e-03    +2.3732e-01    -5.3146e-02    -6.166    0.089982
 1.0e-04    +2.3758e-01    -5.3110e-02    -6.166    0.0015384
 1.0e-05    +2.3757e-01    -5.3110e-02    -6.166    0.0015384

slide-39
SLIDE 39

Maximum likelihood estimation

Example of algorithm: CFSQP


slide-40
SLIDE 40

Maximum likelihood estimation

Nonlinear optimization

Things to be aware of...
Iterative methods terminate when a given stopping criterion is verified, based on the fact that, if $\beta^*$ is the optimum, $\nabla \ln L(\beta^*) = 0$
Stopping criteria vary across optimization packages (based on required precision) → slightly different solutions
Most methods are sensitive to the conditioning of the problem
A well-conditioned problem → all parameters have almost the same magnitude

slide-41
SLIDE 41

Maximum likelihood estimation

Nonlinear optimization

[Figure: two log-likelihood surfaces for the same model, one with time in min. and one with time in sec.]
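
The effect of the unit of time on conditioning can be quantified. The sketch below is an illustration under simplifying assumptions: it builds the negative Hessian of the binary logit log-likelihood at $\beta = 0$, where every choice probability is 0.5, and compares its condition number with travel times in minutes and in seconds. The names `TIMES` and `condition_number` are chosen here.

```python
import math

# (time by auto, time by transit) in minutes for the 21 decision makers.
TIMES = [
    (52.9, 4.4), (4.1, 28.5), (4.1, 86.9), (56.2, 31.6), (51.8, 20.2),
    (0.2, 91.2), (27.6, 79.7), (89.9, 2.2), (41.5, 24.5), (95.0, 43.5),
    (99.1, 8.4), (18.5, 84.0), (82.0, 38.0), (8.6, 1.6), (22.5, 74.1),
    (51.4, 83.8), (81.0, 19.2), (51.0, 85.0), (62.2, 90.1), (95.1, 22.2),
    (41.6, 91.5),
]


def condition_number(time_factor: float) -> float:
    """Condition number of the negative Hessian of ln L at beta = 0.

    time_factor converts minutes to the unit used in the model
    (1.0 for minutes, 60.0 for seconds).
    """
    a = b = c = 0.0
    for t_car, t_transit in TIMES:
        d = time_factor * (t_car - t_transit)
        w = 0.25          # p * (1 - p) with p = 0.5 at beta = 0
        a += w            # entry for (beta0, beta0)
        b -= w * d        # entry for (beta0, beta1)
        c += w * d * d    # entry for (beta1, beta1)
    # Eigenvalues of the symmetric 2x2 matrix [[a, b], [b, c]].
    lam_max = 0.5 * (a + c) + math.hypot(0.5 * (a - c), b)
    lam_min = (a * c - b * b) / lam_max   # product of eigenvalues = determinant
    return lam_max / lam_min


cond_minutes = condition_number(1.0)
cond_seconds = condition_number(60.0)
print(f"condition number, time in min.: {cond_minutes:.3g}")
print(f"condition number, time in sec.: {cond_seconds:.3g}")
```

Expressing time in seconds makes $\beta_1$ much smaller in magnitude than $\beta_0$, and the condition number grows accordingly. Rescaling variables so that parameters have comparable magnitudes is a common remedy.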

slide-42
SLIDE 42

Maximum likelihood estimation

Nonlinear optimization

Things to be aware of...
Convergence may be very slow or even fail if the likelihood function is flat
It happens when the model is not identifiable
Structural flaw in the model (e.g. full set of alternative specific constants)
Lack of variability in the data (e.g. all prices are the same across the sample)

slide-43
SLIDE 43

Maximum likelihood estimation

Nonlinear programming

[Figure: surface plot over the ranges -0.08 to -0.03 and 0.1 to 0.5; vertical axis from -15 to -14.5.]

slide-44
SLIDE 44

Estimation output

Output of the estimation

Solution of $\max_{\beta \in \mathbb{R}^K} L(\beta)$
$\beta^*$
$\ln L(\beta^*)$

Case study
$\beta_0^* = 0.2376$
$\beta_1^* = -0.0531$
$\ln L(\beta_0^*, \beta_1^*) = -6.166$

slide-45
SLIDE 45

Estimation output

Second derivatives

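Information about the quality of the estimators. Let
\[
\nabla^2 \ln L(\beta^*) =
\begin{pmatrix}
\frac{\partial^2 \ln L}{\partial \beta_1^2} & \frac{\partial^2 \ln L}{\partial \beta_1 \partial \beta_2} & \cdots & \frac{\partial^2 \ln L}{\partial \beta_1 \partial \beta_K} \\
\frac{\partial^2 \ln L}{\partial \beta_2 \partial \beta_1} & \frac{\partial^2 \ln L}{\partial \beta_2^2} & \cdots & \frac{\partial^2 \ln L}{\partial \beta_2 \partial \beta_K} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 \ln L}{\partial \beta_K \partial \beta_1} & \frac{\partial^2 \ln L}{\partial \beta_K \partial \beta_2} & \cdots & \frac{\partial^2 \ln L}{\partial \beta_K^2}
\end{pmatrix}
\]
$-\nabla^2 \ln L(\beta^*)^{-1}$ is a consistent estimator of the variance-covariance matrix of the estimates... if the assumed distribution is "the true one"!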

slide-46
SLIDE 46

Estimation output

Statistics

Statistics on the parameters

 Parameter    Value     Std Err.   t-test
 β0            0.2376   0.7505      0.32
 β1           -0.0531   0.0206     -2.57

Summary statistics
$\ln L(\beta^*) = -6.166$
$\ln L(0) = -14.556$
$-2(\ln L(0) - \ln L(\beta^*)) = 16.780$
$\rho^2 = 0.576$, $\bar{\rho}^2 = 0.439$
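
The standard errors and t-tests can be recovered from the second derivatives of the log-likelihood. The sketch below is a minimal illustration for the binary logit case: it evaluates the negative Hessian at the reported estimates, inverts the 2x2 matrix by hand, and takes square roots of the diagonal. It assumes the utility functions of the worked example ($V_C = \beta_1 T_C$, $V_T = \beta_1 T_T + \beta_0$); variable names are chosen here.

```python
import math

# (time by auto, time by transit) for the 21 decision makers. For binary
# logit the Hessian does not depend on the observed choices.
TIMES = [
    (52.9, 4.4), (4.1, 28.5), (4.1, 86.9), (56.2, 31.6), (51.8, 20.2),
    (0.2, 91.2), (27.6, 79.7), (89.9, 2.2), (41.5, 24.5), (95.0, 43.5),
    (99.1, 8.4), (18.5, 84.0), (82.0, 38.0), (8.6, 1.6), (22.5, 74.1),
    (51.4, 83.8), (81.0, 19.2), (51.0, 85.0), (62.2, 90.1), (95.1, 22.2),
    (41.6, 91.5),
]

BETA0, BETA1 = 0.2376, -0.0531  # estimates reported in the estimation output

# Entries of -Hessian of ln L at the estimates: [[a, b], [b, c]].
a = b = c = 0.0
for t_car, t_transit in TIMES:
    d = t_car - t_transit
    x = BETA1 * d - BETA0               # V_C - V_T
    p_car = 1.0 / (1.0 + math.exp(-x))
    w = p_car * (1.0 - p_car)
    a += w                              # (beta0, beta0)
    b -= w * d                          # (beta0, beta1)
    c += w * d * d                      # (beta1, beta1)

# Inverse of the 2x2 matrix gives the estimated variance-covariance matrix.
det = a * c - b * b
var_beta0, var_beta1, cov_beta01 = c / det, a / det, -b / det

se_beta0, se_beta1 = math.sqrt(var_beta0), math.sqrt(var_beta1)
t_beta0, t_beta1 = BETA0 / se_beta0, BETA1 / se_beta1

print(f"beta0: std err = {se_beta0:.4f}, t-test = {t_beta0:.2f}")
print(f"beta1: std err = {se_beta1:.4f}, t-test = {t_beta1:.2f}")
```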

slide-47
SLIDE 47

Estimation output

Null log likelihood

$\ln L(0)$: sample log likelihood with a trivial model where all parameters are zero, that is, a model always predicting
\[ P(1 \mid \{1, 2\}) = P(2 \mid \{1, 2\}) = \frac{1}{2} \]
Purely a function of sample size:
\[ \ln L(0) = \log\left(\frac{1}{2^N}\right) = -N \log(2) \]

slide-48
SLIDE 48

Estimation output

Likelihood ratio

\[ -2(\ln L(0) - \ln L(\beta^*)) \]
\[ \log \frac{L(0)}{L(\beta^*)} = \log L(0) - \log L(\beta^*) = \ln L(0) - \ln L(\beta^*) \]

Likelihood ratio test
H0: the two models are equivalent
Under H0, $-2(\ln L(0) - \ln L(\beta^*))$ is asymptotically distributed as $\chi^2$ with $K$ degrees of freedom ($K$ is the difference between the number of parameters in the full model and the number of parameters in the restricted model; the two models need to be nested).
Similar to the F test in regression models

slide-49
SLIDE 49

Estimation output

Rho (bar) squared

$\rho^2$
\[ \rho^2 = 1 - \frac{\ln L(\beta^*)}{\ln L(0)} \]
Similar to the $R^2$ in regression models

$\bar{\rho}^2$
\[ \bar{\rho}^2 = 1 - \frac{\ln L(\beta^*) - K}{\ln L(0)} \]
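
As a worked check, the summary statistics of the case study follow from two numbers: the sample size and the final log likelihood. The sketch below simply applies the formulas above with $N = 21$, $K = 2$ and $\ln L(\beta^*) = -6.166$ as reported in the estimation output; variable names are chosen here.

```python
import math

n_obs = 21          # sample size of the example
k_params = 2        # beta0 and beta1
ll_final = -6.166   # ln L(beta*) reported in the estimation output

ll_null = -n_obs * math.log(2.0)                  # ln L(0) = -N ln 2
lr_stat = -2.0 * (ll_null - ll_final)             # likelihood ratio statistic
rho2 = 1.0 - ll_final / ll_null
rho2_bar = 1.0 - (ll_final - k_params) / ll_null

# 5.991 is the 5% critical value of the chi-square distribution with
# 2 degrees of freedom.
reject_h0 = lr_stat > 5.991

print(f"ln L(0)      = {ll_null:.3f}")
print(f"LR statistic = {lr_stat:.3f}  (reject H0 at 5%: {reject_h0})")
print(f"rho^2        = {rho2:.3f}")
print(f"rho-bar^2    = {rho2_bar:.3f}")
```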

slide-50
SLIDE 50

Back to the scale

Comparing models

Arbitrary scale may be problematic when comparing models
Binary probit: $\sigma^2 = \mathrm{Var}(\varepsilon_i - \varepsilon_j) = 1$
Binary logit: $\mathrm{Var}(\varepsilon_i - \varepsilon_j) = \pi^2 / (3\mu^2) = \pi^2 / 3$ for $\mu = 1$
$\mathrm{Var}(\alpha U) = \alpha^2 \mathrm{Var}(U)$.
Logit coefficients are therefore larger than probit coefficients by a factor of $\pi / \sqrt{3}$.

slide-51
SLIDE 51

Back to the scale

Comparing models

Estimation results

         Probit    Logit     Probit × π/√3
 L       -6.165    -6.166
 β0       0.064     0.238     0.117
 β1      -0.030    -0.053    -0.054

Note: $\pi / \sqrt{3} \approx 1.814$
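
The rescaling in the last column is a one-line computation. The sketch below multiplies the rounded probit estimates by $\pi / \sqrt{3}$ so that they can be read on the logit scale; because it starts from rounded values, the last digit may differ slightly from the table.

```python
import math

scale = math.pi / math.sqrt(3.0)   # ratio of logit to probit scale

# Rounded estimates taken from the estimation results.
probit = {"beta0": 0.064, "beta1": -0.030}
logit = {"beta0": 0.238, "beta1": -0.053}

print(f"pi / sqrt(3) = {scale:.3f}")
for name in ("beta0", "beta1"):
    rescaled = probit[name] * scale
    print(f"{name}: probit = {probit[name]:+.3f}, "
          f"probit * pi/sqrt(3) = {rescaled:+.3f}, logit = {logit[name]:+.3f}")
```

After rescaling, the travel time coefficient of the probit model is close to the logit one, while the constants still differ; both constants have large standard errors in this small sample.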

slide-52
SLIDE 52

Back to the scale

Appendix


slide-53
SLIDE 53

Back to the scale

Maximum likelihood for binary logit

Let $C_n = \{i, j\}$
Let $y_{in} = 1$ if $i$ is chosen by $n$, 0 otherwise
Let $y_{jn} = 1$ if $j$ is chosen by $n$, 0 otherwise
Obviously, $y_{in} = 1 - y_{jn}$

Log-likelihood of the sample
\[ \sum_{n=1}^{N} \left( y_{in} \ln \frac{e^{V_{in}}}{e^{V_{in}} + e^{V_{jn}}} + y_{jn} \ln \frac{e^{V_{jn}}}{e^{V_{in}} + e^{V_{jn}}} \right) \]