

  1. Mixtures of models
Michel Bierlaire
michel.bierlaire@epfl.ch
Transport and Mobility Laboratory
Mixtures of models – p. 1/70

  2. Mixtures
In statistics, a mixture density is a pdf which is a convex linear combination of other pdfs. If $f(\varepsilon, \theta)$ is a pdf, and if $w(\theta)$ is a nonnegative function such that
$$\int_\theta w(\theta) \, d\theta = 1,$$
then
$$g(\varepsilon) = \int_\theta w(\theta) f(\varepsilon, \theta) \, d\theta$$
is also a pdf. We say that $g$ is a mixture of $f$.
• If $f$ is the pdf of a logit model, $g$ is a mixture of logit.
• If $f$ is the pdf of a MEV model, $g$ is a mixture of MEV.
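As a numerical illustration of this definition, the sketch below checks that the mixture $g$ is itself a pdf. Both the kernel $f$ (a normal density with mean $\theta$) and the mixing density $w$ (standard normal) are illustrative choices, not taken from the slides.

```python
import math

# Continuous-mixture sketch: g(eps) = integral of w(theta) f(eps, theta) dtheta.
# Kernel f: normal density with mean theta; mixing density w: standard normal.
# (Illustrative choices, not from the slides.)
def normal_pdf(x, mean=0.0, std=1.0):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def mixture_pdf(eps, step=0.05, width=8.0):
    # Approximate the integral over theta with a Riemann sum on [-width, width].
    n = int(2.0 * width / step)
    return sum(
        normal_pdf(-width + k * step) * normal_pdf(eps, mean=-width + k * step) * step
        for k in range(n + 1)
    )

# Numerical check that g integrates to ~1, i.e. that g is itself a pdf.
mass = sum(mixture_pdf(k * 0.1) * 0.1 for k in range(-100, 101))
print(round(mass, 2))  # close to 1.0
```

Here the mixture is analytically known (a normal with inflated variance), which makes the unit-mass check easy to confirm.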

  3. Mixtures
Discrete mixtures are also possible. If $f(\varepsilon, \theta)$ is a pdf, and if $w_i$, $i = 1, \ldots, n$, are nonnegative weights such that
$$\sum_{i=1}^{n} w_i = 1,$$
associated with parameter values $\theta_i$, $i = 1, \ldots, n$, then
$$g(\varepsilon) = \sum_{i=1}^{n} w_i f(\varepsilon, \theta_i)$$
is also a pdf. We say that $g$ is a discrete mixture of $f$.
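The discrete case is even simpler to sketch. The kernel (a normal density with mean $\theta_i$), the weights, and the parameter values below are illustrative choices, not from the slides.

```python
import math

# Discrete-mixture sketch: g(eps) = sum_i w_i f(eps, theta_i).
# Kernel f: normal density with mean theta_i (illustrative choice).
def normal_pdf(x, mean=0.0, std=1.0):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

weights = [0.3, 0.5, 0.2]   # nonnegative, sum to one
thetas = [-1.0, 0.0, 2.0]   # one parameter value per component

def discrete_mixture_pdf(eps):
    return sum(w * normal_pdf(eps, mean=t) for w, t in zip(weights, thetas))

# Numerical check that g is a pdf: nonnegative, with total mass ~1.
mass = sum(discrete_mixture_pdf(k * 0.01) * 0.01 for k in range(-1000, 1001))
print(round(mass, 3))  # close to 1.0
```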

  4. Mixtures
Two important motivations:
• Define more complex error terms:
  • heteroscedasticity
  • correlation across alternatives
• Capture taste heterogeneity

  5. Capturing correlations
Logit: $U_{in} = V_{in} + \varepsilon_{in}$, where the $\varepsilon_{in}$ are i.i.d. EV.
Idea for the derivation of the nested logit model:
$$U_{in} = V_{in} + \varepsilon_m + \varepsilon_{in},$$
where $\varepsilon_m$ is the error term specific to nest $m$.
Assumptions for the nested logit model:
• $\varepsilon_m$ are independent across $m$,
• $\varepsilon_m + \varepsilon'_m \sim \mathrm{EV}(0, \mu)$, and
• $\varepsilon'_m = \max_{i \in \mathcal{C}_m} (V_i + \varepsilon_{im}) - \frac{1}{\mu_m} \ln \sum_{i \in \mathcal{C}_m} e^{\mu_m V_i}$

  6. Capturing correlations
• Assumptions are convenient for the derivation of the model.
• They are not natural or intuitive.
Consider a trinomial model, where alternatives 1 and 2 are correlated:
$$U_{1n} = V_{1n} + \varepsilon_m + \varepsilon_{1n}$$
$$U_{2n} = V_{2n} + \varepsilon_m + \varepsilon_{2n}$$
$$U_{3n} = V_{3n} + \varepsilon_{3n}$$
If the $\varepsilon_{in}$ are i.i.d. EV and $\varepsilon_m$ is given, we have
$$P_n(1 \mid \varepsilon_m, \mathcal{C}_n) = \frac{e^{V_{1n} + \varepsilon_m}}{e^{V_{1n} + \varepsilon_m} + e^{V_{2n} + \varepsilon_m} + e^{V_{3n}}}$$

  7. Capturing correlations
But... $\varepsilon_m$ is not given! If we know its density function, we have
$$P_n(1 \mid \mathcal{C}_n) = \int_{\varepsilon_m} P_n(1 \mid \varepsilon_m, \mathcal{C}_n) f(\varepsilon_m) \, d\varepsilon_m$$
This is a mixture of logit models. In general, it is hopeless to obtain an analytical form for $P_n(1 \mid \mathcal{C}_n)$; simulation must be used.

  8. Simulation: reminders
Pseudo-random number generators: although deterministically generated, the numbers exhibit the properties of random draws.
• Uniform distribution
• Standard normal distribution
• Transformations of the standard normal
• Inverse CDF
• Multivariate normal

  9. Simulation: uniform distribution
• Almost all programming languages provide generators for a uniform $U(0,1)$.
• If $r$ is a draw from a $U(0,1)$, then $s = (b - a) r + a$ is a draw from a $U(a, b)$.
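A quick sanity check of the rescaling formula, using Python's built-in generator; the bounds $a$ and $b$ below are illustrative.

```python
import random

random.seed(42)

# Rescaling sketch: if r ~ U(0,1), then s = (b - a) r + a ~ U(a, b).
a, b = 2.0, 5.0  # illustrative bounds
draws = [(b - a) * random.random() + a for _ in range(100000)]

sample_mean = sum(draws) / len(draws)
print(round(sample_mean, 1))  # close to (a + b) / 2 = 3.5
```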

  10. Simulation: standard normal
If $r_1$ and $r_2$ are independent draws from $U(0,1)$, then
$$s_1 = \sqrt{-2 \ln r_1} \, \sin(2 \pi r_2), \qquad s_2 = \sqrt{-2 \ln r_1} \, \cos(2 \pi r_2)$$
are independent draws from $N(0,1)$.
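This is the Box-Muller transform; the sketch below implements the slide's two formulas and checks the sample moments.

```python
import math
import random

random.seed(0)

# Box-Muller transform: two independent U(0,1) draws r1, r2
# yield two independent N(0,1) draws s1, s2.
def box_muller(r1, r2):
    s1 = math.sqrt(-2.0 * math.log(r1)) * math.sin(2.0 * math.pi * r2)
    s2 = math.sqrt(-2.0 * math.log(r1)) * math.cos(2.0 * math.pi * r2)
    return s1, s2

draws = []
for _ in range(50000):
    r1 = 1.0 - random.random()  # in (0, 1], so log(r1) is finite
    s1, s2 = box_muller(r1, random.random())
    draws.append(s1)
    draws.append(s2)

mean = sum(draws) / len(draws)
var = sum(d * d for d in draws) / len(draws) - mean ** 2
print(round(mean, 2), round(var, 2))  # close to the N(0,1) values 0 and 1
```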

  11. Simulation: transformations of the standard normal
• If $r$ is a draw from $N(0,1)$, then $s = b r + a$ is a draw from $N(a, b^2)$.
• If $r$ is a draw from $N(a, b^2)$, then $e^r$ is a draw from a lognormal $LN(a, b^2)$ with mean $e^{a + b^2/2}$ and variance $e^{2a + b^2} (e^{b^2} - 1)$.
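The lognormal transformation can be verified by simulation; the values of $a$ and $b$ below are illustrative.

```python
import math
import random

random.seed(0)

# Lognormal sketch: if r ~ N(0,1), then exp(b*r + a) ~ LN(a, b^2)
# with mean exp(a + b^2/2). Values of a and b are illustrative.
a, b = 0.5, 0.25
draws = [math.exp(b * random.gauss(0.0, 1.0) + a) for _ in range(200000)]

empirical_mean = sum(draws) / len(draws)
theoretical_mean = math.exp(a + b ** 2 / 2.0)
print(round(empirical_mean, 2), round(theoretical_mean, 2))  # should agree
```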

  12. Simulation: inverse CDF
• Consider a univariate r.v. with CDF $F(\varepsilon)$.
• If $F$ is invertible and if $r$ is a draw from $U(0,1)$, then $s = F^{-1}(r)$ is a draw from the given r.v.
• Example: EV, with $F(\varepsilon) = e^{-e^{-\varepsilon}}$ and $F^{-1}(r) = -\ln(-\ln r)$.
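The EV example can be checked by simulation: the mean of the standard EV (Gumbel) distribution is the Euler-Mascheroni constant, approximately 0.5772 (a known fact, not stated on the slide).

```python
import math
import random

random.seed(0)

# Inverse-CDF sketch for the EV example: F(eps) = exp(-exp(-eps))
# gives F^{-1}(r) = -ln(-ln r), mapping uniform draws to EV draws.
def ev_draw(r):
    return -math.log(-math.log(r))

# Guard against a draw of exactly 0, for which log would be undefined.
draws = [ev_draw(max(random.random(), 1e-12)) for _ in range(200000)]

mean = sum(draws) / len(draws)
print(round(mean, 2))  # close to the Euler constant, about 0.5772
```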

  13. Simulation: multivariate normal
• If $r_1, \ldots, r_n$ are independent draws from $N(0,1)$, and $r = (r_1, \ldots, r_n)^T$,
• then
$$s = a + L r$$
is a vector of draws from the $n$-variate normal $N(a, L L^T)$, where
• $L$ is lower triangular, and
• $L L^T$ is the Cholesky factorization of the variance-covariance matrix.

  14. Simulation: multivariate normal
Example:
$$L = \begin{pmatrix} \ell_{11} & 0 & 0 \\ \ell_{21} & \ell_{22} & 0 \\ \ell_{31} & \ell_{32} & \ell_{33} \end{pmatrix}$$
$$s_1 = \ell_{11} r_1$$
$$s_2 = \ell_{21} r_1 + \ell_{22} r_2$$
$$s_3 = \ell_{31} r_1 + \ell_{32} r_2 + \ell_{33} r_3$$
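A minimal two-dimensional version of this scheme, with a hand-rolled Cholesky factor; the mean vector and covariance matrix below are illustrative, not from the slides.

```python
import math
import random

random.seed(0)

# Cholesky sketch: s = a + L r gives draws from N(a, L L^T).
# The 2x2 mean and covariance below are illustrative choices.
a = [1.0, -1.0]
cov = [[2.0, 0.6],
       [0.6, 1.0]]

# Hand-rolled Cholesky factor of a 2x2 matrix: cov = L L^T, L lower triangular.
l11 = math.sqrt(cov[0][0])
l21 = cov[1][0] / l11
l22 = math.sqrt(cov[1][1] - l21 ** 2)

def mvn_draw():
    r1, r2 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    return (a[0] + l11 * r1, a[1] + l21 * r1 + l22 * r2)

draws = [mvn_draw() for _ in range(200000)]
m1 = sum(d[0] for d in draws) / len(draws)
m2 = sum(d[1] for d in draws) / len(draws)
c12 = sum((d[0] - m1) * (d[1] - m2) for d in draws) / len(draws)
print(round(m1, 1), round(m2, 1), round(c12, 1))  # means and covariance recovered
```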

  15. Simulation for mixtures of logit
• In order to approximate
$$P_n(1 \mid \mathcal{C}_n) = \int_{\varepsilon_m} P_n(1 \mid \varepsilon_m, \mathcal{C}_n) f(\varepsilon_m) \, d\varepsilon_m$$
• Draw from $f(\varepsilon_m)$ to obtain $r_1, \ldots, r_R$.
• Compute
$$P_n(1 \mid \mathcal{C}_n) \approx \tilde{P}_n(1 \mid \mathcal{C}_n) = \frac{1}{R} \sum_{k=1}^{R} P_n(1 \mid r_k, \mathcal{C}_n) = \frac{1}{R} \sum_{k=1}^{R} \frac{e^{V_{1n} + r_k}}{e^{V_{1n} + r_k} + e^{V_{2n} + r_k} + e^{V_{3n}}}$$
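The steps above can be sketched directly for the trinomial example. The systematic utilities $V$ and the $N(0,1)$ mixing distribution for $\varepsilon_m$ are illustrative choices, not from the slides.

```python
import math
import random

random.seed(0)

# Simulation sketch for the trinomial mixture: alternatives 1 and 2
# share a nest-specific term eps_m. V and the N(0,1) mixing
# distribution are illustrative choices.
V = [0.5, 0.2, 0.0]

def logit_prob_1(eps_m):
    # Conditional logit probability of alternative 1, given eps_m.
    e1 = math.exp(V[0] + eps_m)
    e2 = math.exp(V[1] + eps_m)
    e3 = math.exp(V[2])
    return e1 / (e1 + e2 + e3)

# Average the conditional probability over R draws of eps_m.
R = 100000
p1 = sum(logit_prob_1(random.gauss(0.0, 1.0)) for _ in range(R)) / R
print(round(p1, 2))
```

Note that each term of the average is an exact logit probability; only the integral over $\varepsilon_m$ is replaced by a Monte Carlo sum.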

  16. Maximum simulated likelihood
$$\max_\theta \mathcal{L}(\theta) = \sum_{n=1}^{N} \sum_{j=1}^{J} y_{jn} \ln \tilde{P}_n(j \mid \theta, \mathcal{C}_n)$$
where $y_{jn} = 1$ if individual $n$ has chosen alternative $j$, and 0 otherwise.
The vector of parameters $\theta$ contains:
• the usual (fixed) parameters of the choice model,
• the parameters of the density of the random parameters.
For instance, if $\beta_j \sim N(\mu_j, \sigma_j^2)$, then $\mu_j$ and $\sigma_j$ are parameters to be estimated.

  17. Maximum simulated likelihood
Warning: $\tilde{P}_n(j \mid \theta, \mathcal{C}_n)$ is an unbiased estimator of $P_n(j \mid \theta, \mathcal{C}_n)$:
$$E[\tilde{P}_n(j \mid \theta, \mathcal{C}_n)] = P_n(j \mid \theta, \mathcal{C}_n)$$
but $\ln \tilde{P}_n(j \mid \theta, \mathcal{C}_n)$ is not an unbiased estimator of $\ln P_n(j \mid \theta, \mathcal{C}_n)$:
$$\ln E[\tilde{P}_n(j \mid \theta, \mathcal{C}_n)] \neq E[\ln \tilde{P}_n(j \mid \theta, \mathcal{C}_n)]$$
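This is Jensen's inequality at work, and it can be seen in a toy setup (not from the slides): take a true probability $P = 0.4$ and an unbiased simulator that averages $R = 2$ noisy evaluations drawn uniformly in $(P - 0.3, P + 0.3)$.

```python
import math
import random

random.seed(0)

# Jensen's-inequality toy: P_tilde is unbiased for P, yet
# E[ln P_tilde] < ln E[P_tilde] = ln P, so the simulated
# log likelihood is biased downward. Setup is illustrative.
P = 0.4

def simulated_prob(R=2):
    # Average of R unbiased but noisy evaluations, always positive here.
    return sum(random.uniform(P - 0.3, P + 0.3) for _ in range(R)) / R

trials = 50000
mean_log = sum(math.log(simulated_prob()) for _ in range(trials)) / trials
print(round(mean_log, 3), round(math.log(P), 3))  # E[ln P_tilde] below ln P
```

Increasing $R$ shrinks the variance of the simulator and hence the downward bias, which is the intuition behind the consistency conditions on the next slide.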

  18. Maximum simulated likelihood
Properties of MSL:
• If $R$ is fixed, MSL is inconsistent.
• If $R$ rises at any rate with $N$, MSL is consistent.
• If $R$ rises faster than $\sqrt{N}$, MSL is asymptotically equivalent to ML.

  19. Modeling
$$P_n(1 \mid \mathcal{C}_n) = \int_{\varepsilon_m} P_n(1 \mid \varepsilon_m, \mathcal{C}_n) f(\varepsilon_m) \, d\varepsilon_m$$
Depending on the role of $\varepsilon_m$ in the kernel model, mixtures of logit can be used to model:
• heteroscedasticity,
• nesting structures,
• taste variations,
• and many more...

  20. Heteroscedasticity
• Error terms in logit are i.i.d. and, in particular, homoscedastic:
$$U_{in} = \beta^T x_{in} + \mathrm{ASC}_i + \varepsilon_{in}$$
• In order to introduce heteroscedasticity in the model, we use random ASCs,
$$\mathrm{ASC}_i \sim N(\overline{\mathrm{ASC}}_i, \sigma_i^2),$$
so that
$$U_{in} = \beta^T x_{in} + \overline{\mathrm{ASC}}_i + \sigma_i \xi_i + \varepsilon_{in}$$
where $\xi_i \sim N(0,1)$.

  21. Heteroscedasticity
Identification issue:
• Not all $\sigma$'s are identified.
• One of them must be constrained to zero.
• Not necessarily the one associated with the ASC constrained to zero.
• In theory, the smallest $\sigma$ must be constrained to zero.
• In practice, we don't know a priori which one it is.
• Solution:
  1. Estimate a model with a full set of $\sigma$'s.
  2. Identify the smallest one and constrain it to zero.

  22. Heteroscedastic model
Example with Swissmetro:

             ASC_CAR  ASC_SBB  ASC_SM  B_COST  B_FR   B_TIME
Car             1        0       0      cost     0     time
Train           0        0       0      cost   freq.   time
Swissmetro      0        0       1      cost   freq.   time

Heteroscedastic model: the ASCs are random.

  23. Heteroscedastic model: estimation results

                    Logit            Hetero           Hetero norm.
L                  -5315.39         -5241.01         -5242.10
                   Value   Scaled   Value   Scaled   Value   Scaled
ASC_CAR_SP          0.189   1.000    0.248   1.000    0.241   1.000
ASC_SM_SP           0.451   2.384    0.903   3.637    0.882   3.657
B_COST             -0.011  -0.057   -0.018  -0.072   -0.018  -0.073
B_FR               -0.005  -0.028   -0.008  -0.031   -0.008  -0.032
B_TIME             -0.013  -0.067   -0.017  -0.069   -0.017  -0.071
SIGMA_CAR_SP                         0.020
SIGMA_SBB_SP                        -0.039           -0.061
SIGMA_SM_SP                         -3.224           -3.180

  24. Nesting structure
• The structure of nested logit can be mimicked with error components.
• For each nest $m$, define a random term $\sigma_m \xi_m$, where $\sigma_m \in \mathbb{R}$ and $\xi_m \sim N(0,1)$.
• $\sigma_m$ represents the standard error of the random term $\sigma_m \xi_m$.
• If alternative $i$ belongs to nest $m$, its utility is written
$$U_{in} = V_{in} + \sigma_m \xi_m + \varepsilon_{in}$$
where $\varepsilon_{in}$ is, as usual, i.i.d. EV.

  25. Nesting structure
Example: residential telephone.

      ASC_BM  ASC_SM  ASC_LF  ASC_EF  BETA_C         σ_M   σ_F
BM       1       0       0       0    ln(cost(BM))   ξ_M    0
SM       0       1       0       0    ln(cost(SM))   ξ_M    0
LF       0       0       1       0    ln(cost(LF))    0    ξ_F
EF       0       0       0       1    ln(cost(EF))    0    ξ_F
MF       0       0       0       0    ln(cost(MF))    0    ξ_F

  26. Nesting structure
Identification issues:
• If there are two nests, only one $\sigma$ is identified.
• If there are more than two nests, all $\sigma$'s are identified.
Walker (2001)
Results with 5000 draws...

  27. Nesting structure: estimation results

                  NL              MLogit          MLogit σ_F=0    MLogit σ_M=0    MLogit σ_F=σ_M
L                -473.219        -472.768        -473.146        -472.779        -472.846
                 Value   Scaled  Value    Scaled Value    Scaled Value    Scaled Value    Scaled
ASC_BM           -1.784  1.000   -3.81247 1.000  -3.79131 1.000  -3.80999 1.000  -3.81327 1.000
ASC_EF           -0.558  0.313   -1.19899 0.314  -1.18549 0.313  -1.19711 0.314  -1.19672 0.314
ASC_LF           -0.512  0.287   -1.09535 0.287  -1.08704 0.287  -1.0942  0.287  -1.0948  0.287
ASC_SM           -1.405  0.788   -3.01659 0.791  -2.9963  0.790  -3.01426 0.791  -3.0171  0.791
B_LOGCOST        -1.490  0.835   -3.25782 0.855  -3.24268 0.855  -3.2558  0.855  -3.25805 0.854
FLAT              2.292
MEAS              2.063
σ_F                              -3.02027         0              -3.06144        -2.17138
σ_M                              -0.52875         3.024833        0              -2.17138
σ_F^2 + σ_M^2                     9.402           9.150           9.372           9.430
