
Mixture Models: Simulation-based Estimation (Michel Bierlaire)



  1. Mixture Models — Simulation-based Estimation Michel Bierlaire michel.bierlaire@epfl.ch Transport and Mobility Laboratory Mixture Models — Simulation-based Estimation – p. 1/72

  2. Outline
     • Mixtures
     • Capturing correlation
     • Alternative specific variance
     • Taste heterogeneity
     • Latent classes
     • Simulation-based estimation

  3. Mixtures
     In statistics, a mixture probability distribution function is a convex combination of other probability distribution functions. If $f(\varepsilon, \theta)$ is a distribution function, and if $w(\theta)$ is a nonnegative function such that
     $$\int_\theta w(\theta)\, d\theta = 1,$$
     then
     $$g(\varepsilon) = \int_\theta w(\theta)\, f(\varepsilon, \theta)\, d\theta$$
     is also a distribution function. We say that $g$ is a $w$-mixture of $f$.
     • If $f$ is a logit model, $g$ is a continuous $w$-mixture of logit.
     • If $f$ is a MEV model, $g$ is a continuous $w$-mixture of MEV.

  4. Mixtures
     Discrete mixtures are also possible. If $w_i$, $i = 1, \ldots, n$, are nonnegative weights such that
     $$\sum_{i=1}^{n} w_i = 1,$$
     then
     $$g(\varepsilon) = \sum_{i=1}^{n} w_i\, f(\varepsilon, \theta_i)$$
     is also a distribution function, where $\theta_i$, $i = 1, \ldots, n$, are parameters. We say that $g$ is a discrete $w$-mixture of $f$.

  5. Example: discrete mixture of normal distributions
     [Figure: densities of N(5, 0.16) and N(8, 1), and of the mixture 0.6 N(5, 0.16) + 0.4 N(8, 1), plotted for values from 4 to 11.]
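
The mixture density in this example can be evaluated numerically. The following is a minimal Python sketch, not taken from the slides. It assumes that the second argument of N(·, ·) is a standard deviation (the slide does not say whether it is a standard deviation or a variance), and the names `normal_pdf`, `mixture_pdf`, and `integrate` are illustrative.

```python
import math
from typing import Callable, List, Tuple


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x: float, components: List[Tuple[float, float, float]]) -> float:
    """Discrete mixture: sum of weight * density over (weight, mean, sd) triples."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in components)


def integrate(f: Callable[[float], float], lo: float, hi: float, n: int = 20000) -> float:
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))


# Assumption: 0.16 and 1 are read as standard deviations.
COMPONENTS = [(0.6, 5.0, 0.16), (0.4, 8.0, 1.0)]

# Because the weights are nonnegative and sum to one, the mixture integrates to one.
TOTAL_MASS = integrate(lambda x: mixture_pdf(x, COMPONENTS), 0.0, 16.0)
```

Under this reading, `TOTAL_MASS` is numerically indistinguishable from 1, which is the property that makes the convex combination a valid density.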

  6. Example: discrete mixture of binary logit models
     [Figure: P(1|s=1,x) and P(1|s=2,x), and the mixture 0.4 P(1|s=1,x) + 0.6 P(1|s=2,x), plotted for x from -4 to 4.]

  7. Mixtures
     • General motivation: generate flexible distributional forms
     • For discrete choice:
         • correlation across alternatives
         • alternative specific variances
         • taste heterogeneity
         • ...

  8. Back to the telephone example
     Budget measured:    $U_{BM} = \alpha_{BM} + \beta X_{BM} + \varepsilon_{BM}$
     Standard measured:  $U_{SM} = \alpha_{SM} + \beta X_{SM} + \varepsilon_{SM}$
     Local flat:         $U_{LF} = \alpha_{LF} + \beta X_{LF} + \varepsilon_{LF}$
     Extended area flat: $U_{EF} = \alpha_{EF} + \beta X_{EF} + \varepsilon_{EF}$
     Metro area flat:    $U_{MF} = \beta X_{MF} + \varepsilon_{MF}$
     Distributions for $\varepsilon$: logit, probit, nested logit

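  9. Back to the telephone example
     Covariance of $U$ (rows and columns ordered BM, SM, LF, EF, MF)
     Logit:
     $$\begin{pmatrix} \sigma^2 & 0 & 0 & 0 & 0 \\ 0 & \sigma^2 & 0 & 0 & 0 \\ 0 & 0 & \sigma^2 & 0 & 0 \\ 0 & 0 & 0 & \sigma^2 & 0 \\ 0 & 0 & 0 & 0 & \sigma^2 \end{pmatrix}$$
     Probit:
     $$\begin{pmatrix} \sigma^2_{BM} & \sigma_{BM,SM} & \sigma_{BM,LF} & \sigma_{BM,EF} & \sigma_{BM,MF} \\ \sigma_{BM,SM} & \sigma^2_{SM} & \sigma_{SM,LF} & \sigma_{SM,EF} & \sigma_{SM,MF} \\ \sigma_{BM,LF} & \sigma_{SM,LF} & \sigma^2_{LF} & \sigma_{LF,EF} & \sigma_{LF,MF} \\ \sigma_{BM,EF} & \sigma_{SM,EF} & \sigma_{LF,EF} & \sigma^2_{EF} & \sigma_{EF,MF} \\ \sigma_{BM,MF} & \sigma_{SM,MF} & \sigma_{LF,MF} & \sigma_{EF,MF} & \sigma^2_{MF} \end{pmatrix}$$
     Nested logit:
     $$\frac{\pi^2}{6\mu^2} \begin{pmatrix} 1 & \rho_M & 0 & 0 & 0 \\ \rho_M & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & \rho_F & \rho_F \\ 0 & 0 & \rho_F & 1 & \rho_F \\ 0 & 0 & \rho_F & \rho_F & 1 \end{pmatrix}, \qquad \rho_i = 1 - \frac{\mu^2}{\mu_i^2}$$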

  10. Continuous mixtures of logit
     • Combining probit and logit
     • Error decomposed into two parts:
       $$U_{in} = V_{in} + \xi + \nu$$
         • $\nu$: i.i.d. EV (logit): tractability
         • $\xi$: normal distribution (probit): flexibility

  11. Logit
     • Utility:
       $$U_{auto} = \beta X_{auto} + \nu_{auto}$$
       $$U_{bus} = \beta X_{bus} + \nu_{bus}$$
       $$U_{subway} = \beta X_{subway} + \nu_{subway}$$
     • $\nu$ i.i.d. extreme value
     • Probability:
       $$\Pr(auto \mid X) = \frac{e^{\beta X_{auto}}}{e^{\beta X_{auto}} + e^{\beta X_{bus}} + e^{\beta X_{subway}}}$$
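
The logit probability above can be computed directly. This is a minimal Python sketch; the value of `BETA` and the travel times in `TRAVEL_TIME` are hypothetical numbers chosen for illustration and do not come from the slides.

```python
import math
from typing import Dict


def logit_probabilities(utilities: Dict[str, float]) -> Dict[str, float]:
    """Logit choice probabilities exp(V_i) / sum_j exp(V_j).

    The largest utility is subtracted before exponentiating; this leaves the
    probabilities unchanged and avoids overflow for large utilities.
    """
    v_max = max(utilities.values())
    exp_v = {alt: math.exp(v - v_max) for alt, v in utilities.items()}
    total = sum(exp_v.values())
    return {alt: e / total for alt, e in exp_v.items()}


# Hypothetical data: one attribute X (travel time in minutes) and one coefficient.
BETA = -0.1
TRAVEL_TIME = {"auto": 20.0, "bus": 35.0, "subway": 25.0}

PROBS = logit_probabilities({alt: BETA * x for alt, x in TRAVEL_TIME.items()})
```

With a negative coefficient on travel time, the fastest alternative receives the largest probability, and the three probabilities sum to one.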

  12. Normal mixture of logit
     • Utility:
       $$U_{auto} = \beta X_{auto} + \xi_{auto} + \nu_{auto}$$
       $$U_{bus} = \beta X_{bus} + \xi_{bus} + \nu_{bus}$$
       $$U_{subway} = \beta X_{subway} + \xi_{subway} + \nu_{subway}$$
     • $\nu$ i.i.d. extreme value, $\xi \sim N(0, \Sigma)$
     • Probability:
       $$\Pr(auto \mid X, \xi) = \frac{e^{\beta X_{auto} + \xi_{auto}}}{e^{\beta X_{auto} + \xi_{auto}} + e^{\beta X_{bus} + \xi_{bus}} + e^{\beta X_{subway} + \xi_{subway}}}$$
       $$P(auto \mid X) = \int_\xi \Pr(auto \mid X, \xi)\, f(\xi)\, d\xi$$

  13. Simulation
     $$P(auto \mid X) = \int_\xi \Pr(auto \mid X, \xi)\, f(\xi)\, d\xi$$
     • Integral has no closed form.
     • Monte Carlo simulation must be used.

  14. Simulation
     • In order to approximate
       $$P(i \mid X) = \int_\xi \Pr(i \mid X, \xi)\, f(\xi)\, d\xi$$
     • Draw from $f(\xi)$ to obtain $r_1, \ldots, r_R$
     • Compute
       $$P(i \mid X) \approx \tilde{P}(i \mid X) = \frac{1}{R} \sum_{k=1}^{R} P(i \mid X, r_k) = \frac{1}{R} \sum_{k=1}^{R} \frac{e^{V_{1n} + r_k}}{e^{V_{1n} + r_k} + e^{V_{2n} + r_k} + e^{V_{3n}}}$$
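
The simulation steps above (draw, evaluate the conditional logit probability, average) can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's implementation: each alternative receives its own independent normal error component (a diagonal $\Sigma$), and the names `simulated_probability`, `v`, and `sigma` are hypothetical.

```python
import math
import random
from typing import Dict


def logit_probability(utilities: Dict[str, float], chosen: str) -> float:
    """Conditional logit probability of `chosen`, given fixed utilities."""
    v_max = max(utilities.values())
    exp_v = {alt: math.exp(u - v_max) for alt, u in utilities.items()}
    return exp_v[chosen] / sum(exp_v.values())


def simulated_probability(
    v: Dict[str, float],
    sigma: Dict[str, float],
    chosen: str,
    n_draws: int = 1000,
    seed: int = 0,
) -> float:
    """Monte Carlo approximation of the mixture-of-logit probability.

    Assumption: xi_i = sigma_i * (standard normal draw), independent across
    alternatives. The result is the average of the conditional logit
    probabilities over the R = n_draws draws.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        perturbed = {alt: v[alt] + sigma[alt] * rng.gauss(0.0, 1.0) for alt in v}
        total += logit_probability(perturbed, chosen)
    return total / n_draws


# Hypothetical systematic utilities and error-component standard deviations.
V = {"auto": -2.0, "bus": -3.5, "subway": -2.5}
SIGMA = {"auto": 0.0, "bus": 1.0, "subway": 1.0}
P_AUTO = simulated_probability(V, SIGMA, "auto", n_draws=2000, seed=42)
```

Reusing the same seed for every alternative means the same draws are used throughout, so the simulated probabilities sum to one across alternatives up to floating-point error.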

  15. Capturing correlations: nesting
     • Utility:
       $$U_{auto} = \beta X_{auto} + \nu_{auto}$$
       $$U_{bus} = \beta X_{bus} + \sigma_{transit}\, \eta_{transit} + \nu_{bus}$$
       $$U_{subway} = \beta X_{subway} + \sigma_{transit}\, \eta_{transit} + \nu_{subway}$$
     • $\nu$ i.i.d. extreme value, $\eta_{transit} \sim N(0, 1)$, $\sigma^2_{transit} = \mathrm{cov}(bus, subway)$
     • Probability:
       $$\Pr(auto \mid X, \eta_{transit}) = \frac{e^{\beta X_{auto}}}{e^{\beta X_{auto}} + e^{\beta X_{bus} + \sigma_{transit} \eta_{transit}} + e^{\beta X_{subway} + \sigma_{transit} \eta_{transit}}}$$
       $$P(auto \mid X) = \int_\eta \Pr(auto \mid X, \eta)\, f(\eta)\, d\eta$$
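
The claim that the shared term $\sigma_{transit}\, \eta_{transit}$ induces a covariance of $\sigma^2_{transit}$ between bus and subway can be checked by simulation. The sketch below is illustrative only: the value 1.5 for $\sigma_{transit}$ is hypothetical, and the function names are not from the slides. Extreme value draws are generated by inverting the standard Gumbel distribution function.

```python
import math
import random
from typing import List, Tuple


def gumbel(rng: random.Random) -> float:
    """Standard extreme value (Gumbel) draw by inversion: -log(-log(u))."""
    u = rng.random()
    while u <= 0.0:  # guard against log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def covariance(xs: List[float], ys: List[float]) -> float:
    """Unbiased sample covariance of two equally long samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def simulate_errors(
    sigma_transit: float, n_draws: int = 100_000, seed: int = 0
) -> Tuple[List[float], List[float], List[float]]:
    """Draw the random parts of U_auto, U_bus, and U_subway.

    Bus and subway share the same eta draw; auto has only its own nu.
    """
    rng = random.Random(seed)
    auto, bus, subway = [], [], []
    for _ in range(n_draws):
        eta = rng.gauss(0.0, 1.0)
        auto.append(gumbel(rng))
        bus.append(sigma_transit * eta + gumbel(rng))
        subway.append(sigma_transit * eta + gumbel(rng))
    return auto, bus, subway


SIGMA_TRANSIT = 1.5  # hypothetical value
AUTO, BUS, SUBWAY = simulate_errors(SIGMA_TRANSIT)
COV_BUS_SUBWAY = covariance(BUS, SUBWAY)  # close to SIGMA_TRANSIT ** 2
COV_AUTO_BUS = covariance(AUTO, BUS)      # close to 0
```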

  16. Nesting structure
     Example: residential telephone

          ASC_BM  ASC_SM  ASC_LF  ASC_EF  BETA_C         σ_M   σ_F
     BM   1       0       0       0       ln(cost(BM))   η_M   0
     SM   0       1       0       0       ln(cost(SM))   η_M   0
     LF   0       0       1       0       ln(cost(LF))   0     η_F
     EF   0       0       0       1       ln(cost(EF))   0     η_F
     MF   0       0       0       0       ln(cost(MF))   0     η_F

  17. Nesting structure
     Identification issues:
     • If there are two nests, only one σ is identified
     • If there are more than two nests, all σ's are identified
     Walker (2001)
     Results with 5000 draws.

  18. Results (NL: nested logit; NML: normal mixture of logit; L: final log likelihood)

                  NL                NML                NML (σ_F = 0)      NML (σ_M = 0)      NML (σ_F = σ_M)
     L            -473.219          -472.768           -473.146           -472.779           -472.846

                  Value    Scaled   Value     Scaled   Value     Scaled   Value     Scaled   Value     Scaled
     ASC_BM       -1.784   1.000    -3.81247  1.000    -3.79131  1.000    -3.80999  1.000    -3.81327  1.000
     ASC_EF       -0.558   0.313    -1.19899  0.314    -1.18549  0.313    -1.19711  0.314    -1.19672  0.314
     ASC_LF       -0.512   0.287    -1.09535  0.287    -1.08704  0.287    -1.0942   0.287    -1.0948   0.287
     ASC_SM       -1.405   0.788    -3.01659  0.791    -2.9963   0.790    -3.01426  0.791    -3.0171   0.791
     B_LOGCOST    -1.490   0.835    -3.25782  0.855    -3.24268  0.855    -3.2558   0.855    -3.25805  0.854
     FLAT         2.292
     MEAS         2.063
     σ_F                            3.02027            0                  3.06144            2.17138
     σ_M                            0.52875            3.024833           0                  2.17138
     σ_F² + σ_M²                    9.402              9.150              9.372              9.430
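
The arithmetic behind this table can be reproduced in a few lines. The sketch below copies the reported estimates for the NL column and the unnormalized NML column. It assumes that "Scaled" means each estimate divided by the ASC_BM estimate of the same model; this reading is inferred from the numbers, not stated on the slide. The helper names are illustrative.

```python
from typing import Dict, Tuple

# Estimates copied from the results table (NL and unnormalized NML columns).
NL: Dict[str, float] = {
    "ASC_BM": -1.784, "ASC_EF": -0.558, "ASC_LF": -0.512,
    "ASC_SM": -1.405, "B_LOGCOST": -1.490,
}
NML: Dict[str, float] = {
    "ASC_BM": -3.81247, "ASC_EF": -1.19899, "ASC_LF": -1.09535,
    "ASC_SM": -3.01659, "B_LOGCOST": -3.25782,
}

# (sigma_F, sigma_M) for each NML normalization, as reported in the table.
SIGMAS: Dict[str, Tuple[float, float]] = {
    "unnormalized": (3.02027, 0.52875),
    "sigma_F = 0": (0.0, 3.024833),
    "sigma_M = 0": (3.06144, 0.0),
    "sigma_F = sigma_M": (2.17138, 2.17138),
}


def scale_by(estimates: Dict[str, float], reference: str = "ASC_BM") -> Dict[str, float]:
    """Assumed definition of the 'Scaled' columns: ratio to the reference estimate."""
    return {name: value / estimates[reference] for name, value in estimates.items()}


def total_variance(sigma_f: float, sigma_m: float) -> float:
    """The quantity in the last row of the table: sigma_F^2 + sigma_M^2."""
    return sigma_f ** 2 + sigma_m ** 2


NL_SCALED = scale_by(NL)
NML_SCALED = scale_by(NML)
VARIANCE_SUMS = {name: total_variance(*pair) for name, pair in SIGMAS.items()}
```

Under this reading, the scaled values are nearly identical across models even though the raw estimates differ by a factor of about two, and the sum of squared σ's is close to 9.4 in all four NML specifications.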

  19. Comments
     • The scale of the parameters is different between NL and the mixture model
     • Normalization can be performed in several ways:
         • σ_F = 0
         • σ_M = 0
         • σ_F = σ_M
     • Final log likelihood should be the same
     • But... estimation relies on simulation
     • Only an approximation of the log likelihood is available
     • Final log likelihood with 50000 draws:
         Unnormalized: -472.872
         σ_M = σ_F:    -472.875
         σ_F = 0:      -472.884
         σ_M = 0:      -472.901

  20. Cross nesting
     [Figure: tree with two nests. Nest 1 contains Bus, Train, and Car; Nest 2 contains Car, Ped., and Bike.]
     $$U_{bus} = V_{bus} + \xi_1 + \varepsilon_{bus}$$
     $$U_{train} = V_{train} + \xi_1 + \varepsilon_{train}$$
     $$U_{car} = V_{car} + \xi_1 + \xi_2 + \varepsilon_{car}$$
     $$U_{ped} = V_{ped} + \xi_2 + \varepsilon_{ped}$$
     $$U_{bike} = V_{bike} + \xi_2 + \varepsilon_{bike}$$
     $$P(car) = \int_{\xi_1} \int_{\xi_2} P(car \mid \xi_1, \xi_2)\, f(\xi_1)\, f(\xi_2)\, d\xi_2\, d\xi_1$$

  21. Identification issue
     • Not all parameters can be identified
     • For logit, one ASC has to be constrained to zero
     • Identification of NML is important and tricky
     • See Walker, Ben-Akiva & Bolduc (2007) for a detailed analysis

  22. Alternative specific variance
     • Error terms in logit are i.i.d. and, in particular, have the same variance:
       $$U_{in} = \beta^T x_{in} + ASC_i + \varepsilon_{in}$$
     • $\varepsilon_{in}$ i.i.d. extreme value $\Rightarrow \mathrm{Var}(\varepsilon_{in}) = \pi^2 / (6\mu^2)$
     • In order to allow for different variances, we use mixtures:
       $$U_{in} = \beta^T x_{in} + ASC_i + \sigma_i \xi_i + \varepsilon_{in}$$
       where $\xi_i \sim N(0, 1)$
     • Variance:
       $$\mathrm{Var}(\sigma_i \xi_i + \varepsilon_{in}) = \sigma_i^2 + \frac{\pi^2}{6\mu^2}$$
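
The variance formula above can be checked by simulation. The following Python sketch is illustrative: the values 0.5 for $\sigma_i$ and 2.0 for $\mu$ are hypothetical, and the function names are not from the slides. An extreme value draw with scale parameter $\mu$ is generated as a standard Gumbel draw divided by $\mu$.

```python
import math
import random


def theoretical_variance(sigma_i: float, mu: float) -> float:
    """sigma_i^2 + pi^2 / (6 mu^2), the variance of sigma_i * xi_i + eps_in."""
    return sigma_i ** 2 + math.pi ** 2 / (6.0 * mu ** 2)


def gumbel(rng: random.Random, mu: float) -> float:
    """Extreme value draw with scale parameter mu (variance pi^2 / (6 mu^2))."""
    u = rng.random()
    while u <= 0.0:  # guard against log(0)
        u = rng.random()
    return -math.log(-math.log(u)) / mu


def simulated_variance(sigma_i: float, mu: float, n_draws: int = 100_000, seed: int = 0) -> float:
    """Unbiased sample variance of sigma_i * xi_i + eps_in over n_draws draws."""
    rng = random.Random(seed)
    draws = [sigma_i * rng.gauss(0.0, 1.0) + gumbel(rng, mu) for _ in range(n_draws)]
    mean = sum(draws) / n_draws
    return sum((d - mean) ** 2 for d in draws) / (n_draws - 1)


# Hypothetical values for illustration.
SIGMA_I, MU = 0.5, 2.0
EXACT = theoretical_variance(SIGMA_I, MU)
APPROX = simulated_variance(SIGMA_I, MU)
```

Setting `sigma_i` to zero recovers the plain logit case, where every alternative has the same error variance $\pi^2 / (6\mu^2)$.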

  23. Alternative specific variance
     Identification issue:
     • Not all σ's are identified
     • One of them must be constrained to zero
     • Not necessarily the one associated with the ASC constrained to zero
     • In theory, the smallest σ must be constrained to zero
     • In practice, we don't know a priori which one it is
     • Solution:
         1. Estimate a model with a full set of σ's
         2. Identify the smallest one and constrain it to zero.

  24. Alternative specific variance
     Example with Swissmetro

                  ASC_CAR  ASC_SBB  ASC_SM  B_COST  B_FR   B_TIME
     Car          1        0        0       cost    0      time
     Train        0        0        0       cost    freq.  time
     Swissmetro   0        0        1       cost    freq.  time

     + alternative specific variance
