SLIDE 1

Mixture Models — Simulation-based Estimation

Michel Bierlaire

michel.bierlaire@epfl.ch

Transport and Mobility Laboratory

SLIDE 2

Outline

  • Mixtures
  • Capturing correlation
  • Alternative specific variance
  • Taste heterogeneity
  • Latent classes
  • Simulation-based estimation

SLIDE 3

Mixtures

In statistics, a mixture probability distribution function is a convex combination of other probability distribution functions. If f(ε, θ) is a distribution function, and if w(θ) is a non-negative function such that

∫_θ w(θ) dθ = 1,

then

g(ε) = ∫_θ w(θ) f(ε, θ) dθ

is also a distribution function. We say that g is a w-mixture of f.

  • If f is a logit model, g is a continuous w-mixture of logit.
  • If f is an MEV model, g is a continuous w-mixture of MEV.

SLIDE 4

Mixtures

Discrete mixtures are also possible. If w_i, i = 1, …, n are non-negative weights such that

∑_{i=1}^{n} w_i = 1,

then

g(ε) = ∑_{i=1}^{n} w_i f(ε, θ_i)

is also a distribution function, where θ_i, i = 1, …, n are parameters. We say that g is a discrete w-mixture of f.

SLIDE 5

Example: discrete mixture of normal distributions

[Figure: densities of N(5, 0.16) and N(8, 1), and the discrete mixture 0.6 N(5, 0.16) + 0.4 N(8, 1).]
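For illustration (not part of the original slides), a minimal Python sketch of this mixture density; the evaluation grid is arbitrary:

    import numpy as np
    from scipy.stats import norm

    # Discrete mixture of two normals: g(x) = 0.6 N(5, 0.16) + 0.4 N(8, 1).
    # The slide writes N(mean, variance); scipy's scale is the std. deviation.
    x = np.linspace(3, 11, 400)
    g = 0.6 * norm.pdf(x, loc=5, scale=np.sqrt(0.16)) + 0.4 * norm.pdf(x, loc=8, scale=1.0)

    # A convex combination of densities integrates to 1.
    print(np.trapz(g, x))  # ~1.0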

SLIDE 6

Example: discrete mixture of binary logit models

[Figure: the binary logit probabilities P(1|s=1, x) and P(1|s=2, x), and the discrete mixture 0.4 P(1|s=1, x) + 0.6 P(1|s=2, x), as functions of x.]

SLIDE 7

Mixtures

  • General motivation: generate flexible distributional forms
  • For discrete choice:
      • correlation across alternatives
      • alternative specific variances
      • taste heterogeneity
      • ...

SLIDE 8

Back to the telephone example

Budget measured:       U_BM = α_BM + β X_BM + ε_BM
Standard measured:     U_SM = α_SM + β X_SM + ε_SM
Local flat:            U_LF = α_LF + β X_LF + ε_LF
Extended area flat:    U_EF = α_EF + β X_EF + ε_EF
Metro area flat:       U_MF = β X_MF + ε_MF

Distributions for ε: logit, probit, nested logit

SLIDE 9

Back to the telephone example

Covariance of U

Logit (i.i.d. error terms):

Cov(U) = σ² I,  with σ² = π²/6µ²

Probit:

Cov(U) =
⎡ σ²_BM     σ_BM,SM   σ_BM,LF   σ_BM,EF   σ_BM,MF ⎤
⎢ σ_BM,SM   σ²_SM     σ_SM,LF   σ_SM,EF   σ_SM,MF ⎥
⎢ σ_BM,LF   σ_SM,LF   σ²_LF     σ_LF,EF   σ_LF,MF ⎥
⎢ σ_BM,EF   σ_SM,EF   σ_LF,EF   σ²_EF     σ_EF,MF ⎥
⎣ σ_BM,MF   σ_SM,MF   σ_LF,MF   σ_EF,MF   σ²_MF   ⎦

Nested logit:

Cov(U) = (π²/6µ²) ·
⎡ 1    ρ_M  0    0    0   ⎤
⎢ ρ_M  1    0    0    0   ⎥
⎢ 0    0    1    ρ_F  ρ_F ⎥
⎢ 0    0    ρ_F  1    ρ_F ⎥
⎣ 0    0    ρ_F  ρ_F  1   ⎦

with ρ_i = 1 − µ²/µ_i²

SLIDE 10

Continuous Mixtures of logit

  • Combining probit and logit
  • Error decomposed into two parts:

U_in = V_in + ξ_in + ν_in

  • ν: i.i.d. EV (logit), for tractability
  • ξ: normal distribution (probit), for flexibility

SLIDE 11

Logit

  • Utility:

U_auto = β X_auto + ν_auto
U_bus = β X_bus + ν_bus
U_subway = β X_subway + ν_subway

  • ν i.i.d. extreme value
  • Probability:

Pr(auto|X) = e^{β X_auto} / (e^{β X_auto} + e^{β X_bus} + e^{β X_subway})

SLIDE 12

Normal mixture of logit

  • Utility:

U_auto = β X_auto + ξ_auto + ν_auto
U_bus = β X_bus + ξ_bus + ν_bus
U_subway = β X_subway + ξ_subway + ν_subway

  • ν i.i.d. extreme value, ξ ∼ N(0, Σ)
  • Probability:

Pr(auto|X, ξ) = e^{β X_auto + ξ_auto} / (e^{β X_auto + ξ_auto} + e^{β X_bus + ξ_bus} + e^{β X_subway + ξ_subway})

P(auto|X) = ∫_ξ Pr(auto|X, ξ) f(ξ) dξ

SLIDE 13

Simulation

P(auto|X) = ∫_ξ Pr(auto|X, ξ) f(ξ) dξ

  • The integral has no closed form.
  • Monte Carlo simulation must be used.

SLIDE 14

Simulation

  • In order to approximate

P(i|X) = ∫_ξ Pr(i|X, ξ) f(ξ) dξ

  • Draw from f(ξ) to obtain r_1, …, r_R
  • Compute

P(i|X) ≈ P̃(i|X) = (1/R) ∑_{k=1}^{R} Pr(i|X, r_k) = (1/R) ∑_{k=1}^{R} e^{V_1n + r_k} / (e^{V_1n + r_k} + e^{V_2n + r_k} + e^{V_3n})
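A minimal Python sketch of this approximation (not from the original slides); as in the kernel above, the draw enters alternatives 1 and 2 only, and the utilities, σ and R are arbitrary illustrative values:

    import numpy as np

    rng = np.random.default_rng(42)

    def simulated_prob(V, sigma, R=1000):
        # Monte Carlo approximation of a normal mixture of logit.
        # The draw sigma*xi enters alternatives 1 and 2 only, as in the
        # kernel above; alternative 3 has no error component.
        draws = sigma * rng.standard_normal(R)           # r_1, ..., r_R
        u = np.column_stack([V[0] + draws,
                             V[1] + draws,
                             np.full(R, V[2])])
        eu = np.exp(u)
        probs = eu / eu.sum(axis=1, keepdims=True)       # logit kernel, per draw
        return probs.mean(axis=0)                        # average over the R draws

    print(simulated_prob(V=[0.1, 0.5, -0.2], sigma=1.0))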

SLIDE 15

Capturing correlations: nesting

  • Utility:

U_auto = β X_auto + ν_auto
U_bus = β X_bus + σ_transit η_transit + ν_bus
U_subway = β X_subway + σ_transit η_transit + ν_subway

  • ν i.i.d. extreme value, η_transit ∼ N(0, 1), σ_transit² = cov(bus, subway)
  • Probability:

Pr(auto|X, η_transit) = e^{β X_auto} / (e^{β X_auto} + e^{β X_bus + σ_transit η_transit} + e^{β X_subway + σ_transit η_transit})

P(auto|X) = ∫_η Pr(auto|X, η) f(η) dη
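The same simulation machinery with a draw shared by the two transit modes reproduces the nest (again my sketch, with made-up utilities and σ_transit):

    import numpy as np

    rng = np.random.default_rng(0)

    def nest_by_error_component(V_auto, V_bus, V_subway, sigma_transit, R=1000):
        # One N(0,1) draw shared by bus and subway creates cov(bus, subway).
        eta = rng.standard_normal(R)
        u = np.column_stack([np.full(R, V_auto),
                             V_bus + sigma_transit * eta,
                             V_subway + sigma_transit * eta])
        eu = np.exp(u)
        return (eu / eu.sum(axis=1, keepdims=True)).mean(axis=0)

    # Larger sigma_transit -> stronger bus/subway correlation.
    print(nest_by_error_component(0.0, 0.2, 0.3, sigma_transit=2.0))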

SLIDE 16

Nesting structure

Example: residential telephone

Alt.   ASC       cost                    error component
BM     ASC_BM    BETA_C · ln(cost(BM))   σ_M · η_M
SM     ASC_SM    BETA_C · ln(cost(SM))   σ_M · η_M
LF     ASC_LF    BETA_C · ln(cost(LF))   σ_F · η_F
EF     ASC_EF    BETA_C · ln(cost(EF))   σ_F · η_F
MF     —         BETA_C · ln(cost(MF))   σ_F · η_F

SLIDE 17

Nesting structure

Identification issues:

  • If there are two nests, only one σ is identified
  • If there are more than two nests, all σ’s are identified

See Walker (2001).

Results with 5000 draws:

SLIDE 18

            NL                NML (unnorm.)      NML, σF = 0        NML, σM = 0        NML, σF = σM
L           −473.219          −472.768           −473.146           −472.779           −472.846
            Value (Scaled)    Value (Scaled)     Value (Scaled)     Value (Scaled)     Value (Scaled)
ASC BM      −1.784 (1.000)    −3.81247 (1.000)   −3.79131 (1.000)   −3.80999 (1.000)   −3.81327 (1.000)
ASC EF      −0.558 (0.313)    −1.19899 (0.314)   −1.18549 (0.313)   −1.19711 (0.314)   −1.19672 (0.314)
ASC LF      −0.512 (0.287)    −1.09535 (0.287)   −1.08704 (0.287)   −1.0942 (0.287)    −1.0948 (0.287)
ASC SM      −1.405 (0.788)    −3.01659 (0.791)   −2.9963 (0.790)    −3.01426 (0.791)   −3.0171 (0.791)
B LOGCOST   −1.490 (0.835)    −3.25782 (0.855)   −3.24268 (0.855)   −3.2558 (0.855)    −3.25805 (0.854)
FLAT         2.292
MEAS         2.063
σF                            3.02027            —                  3.06144            2.17138
σM                            0.52875            3.024833           —                  2.17138
σ²F + σ²M                     9.402              9.150              9.372              9.430

SLIDE 19

Comments

  • The scale of the parameters is different between NL and the mixture model
  • Normalization can be performed in several ways:
      • σF = 0
      • σM = 0
      • σF = σM
  • Final log likelihood should be the same
  • But... estimation relies on simulation
  • Only an approximation of the log likelihood is available
  • Final log likelihood with 50000 draws:

      Unnormalized:  −472.872
      σM = σF:       −472.875
      σF = 0:        −472.884
      σM = 0:        −472.901

SLIDE 20

Cross nesting

[Figure: cross-nested structure; Nest 1 = {Bus, Train, Car}, Nest 2 = {Car, Ped., Bike}, with Car belonging to both nests.]

U_bus = V_bus + ξ_1 + ε_bus
U_train = V_train + ξ_1 + ε_train
U_car = V_car + ξ_1 + ξ_2 + ε_car
U_ped = V_ped + ξ_2 + ε_ped
U_bike = V_bike + ξ_2 + ε_bike

P(car) = ∫_{ξ_1} ∫_{ξ_2} P(car|ξ_1, ξ_2) f(ξ_1) f(ξ_2) dξ_2 dξ_1

SLIDE 21

Identification issue

  • Not all parameters can be identified
  • For logit, one ASC has to be constrained to zero
  • Identification of NML is important and tricky
  • See Walker, Ben-Akiva & Bolduc (2007) for a detailed analysis

SLIDE 22

Alternative specific variance

  • Error terms in logit are i.i.d. and, in particular, have the same variance:

U_in = βᵀ x_in + ASC_i + ε_in

  • ε_in i.i.d. extreme value ⇒ Var(ε_in) = π²/6µ²
  • In order to allow for different variances, we use mixtures:

U_in = βᵀ x_in + ASC_i + σ_i ξ_i + ε_in

where ξ_i ∼ N(0, 1)

  • Variance:

Var(σ_i ξ_i + ε_in) = σ_i² + π²/6µ²

SLIDE 23

Alternative specific variance

Identification issue:

  • Not all σ's are identified
  • One of them must be constrained to zero
  • Not necessarily the one associated with the ASC constrained to zero
  • In theory, the smallest σ must be constrained to zero
  • In practice, we don't know a priori which one it is
  • Solution:
      1. Estimate a model with a full set of σ's
      2. Identify the smallest one and constrain it to zero.

SLIDE 24

Alternative specific variance

Example with Swissmetro

Alt.         ASC        cost            freq.          time
Car          ASC_CAR    B_COST · cost   —              B_TIME · time
Train        —          B_COST · cost   B_FR · freq.   B_TIME · time
Swissmetro   ASC_SM     B_COST · cost   B_FR · freq.   B_TIME · time

+ alternative specific variance

SLIDE 25

            Logit            ASV              ASV norm.
L           −5315.39         −5241.01         −5242.10
            Value (Scaled)   Value (Scaled)   Value (Scaled)
ASC CAR     0.189 (1.000)    0.248 (1.000)    0.241 (1.000)
ASC SM      0.451 (2.384)    0.903 (3.637)    0.882 (3.657)
B COST      −0.011 (−0.057)  −0.018 (−0.072)  −0.018 (−0.073)
B FR        −0.005 (−0.028)  −0.008 (−0.031)  −0.008 (−0.032)
B TIME      −0.013 (−0.067)  −0.017 (−0.069)  −0.017 (−0.071)
SIGMA CAR                    0.020
SIGMA TRAIN                  0.039            0.061
SIGMA SM                     3.224            3.180

SLIDE 26

Identification issue: process

Examine the variance-covariance matrix:

  1. Specify the model of interest
  2. Take the differences in utilities
  3. Apply the order condition: necessary condition
  4. Apply the rank condition: sufficient condition
  5. Apply the equality condition: verify equivalence

SLIDE 27

Heteroscedastic: specification

U_1 = β x_1 + σ_1 ξ_1 + ε_1
U_2 = β x_2 + σ_2 ξ_2 + ε_2
U_3 = β x_3 + σ_3 ξ_3 + ε_3
U_4 = β x_4 + σ_4 ξ_4 + ε_4

where ξ_i ∼ N(0, 1) and ε_i ∼ EV(0, µ), so that Var(ε_i) = γ/µ² with γ = π²/6.

Cov(U) = diag( σ_1² + γ/µ²,  σ_2² + γ/µ²,  σ_3² + γ/µ²,  σ_4² + γ/µ² )

SLIDE 28

Heteroscedastic: differences

U_1 − U_4 = β(x_1 − x_4) + (σ_1 ξ_1 − σ_4 ξ_4) + (ε_1 − ε_4)
U_2 − U_4 = β(x_2 − x_4) + (σ_2 ξ_2 − σ_4 ξ_4) + (ε_2 − ε_4)
U_3 − U_4 = β(x_3 − x_4) + (σ_3 ξ_3 − σ_4 ξ_4) + (ε_3 − ε_4)

Cov(∆U) =
⎡ σ_1² + σ_4² + 2γ/µ²   σ_4² + γ/µ²            σ_4² + γ/µ²          ⎤
⎢ σ_4² + γ/µ²           σ_2² + σ_4² + 2γ/µ²    σ_4² + γ/µ²          ⎥
⎣ σ_4² + γ/µ²           σ_4² + γ/µ²            σ_3² + σ_4² + 2γ/µ²  ⎦
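This matrix is easy to verify by simulation (my check, not from the slides); Gumbel draws with scale 1/µ have variance γ/µ² = π²/6µ², and the σ_i below are arbitrary:

    import numpy as np

    rng = np.random.default_rng(1)
    sig = np.array([0.5, 1.0, 1.5, 2.0])            # arbitrary sigma_1, ..., sigma_4
    mu = 1.0
    n = 500_000

    U = (rng.standard_normal((n, 4)) * sig          # sigma_i * xi_i
         + rng.gumbel(scale=1.0 / mu, size=(n, 4))) # eps_i ~ EV(0, mu)
    dU = U[:, :3] - U[:, 3:]                        # U_i - U_4, i = 1, 2, 3

    g = np.pi**2 / 6
    theory = np.full((3, 3), sig[3]**2 + g / mu**2) + np.diag(sig[:3]**2 + g / mu**2)
    print(np.round(np.cov(dU, rowvar=False), 2))    # empirical Cov(dU)
    print(np.round(theory, 2))                      # the formula above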


SLIDE 29

Heteroscedastic: order condition

  • S is the number of estimable parameters
  • J is the number of alternatives

S ≤ J(J − 1)/2 − 1

  • J(J − 1)/2 is the number of entries in the lower part of the (symmetric) var-cov matrix
  • minus 1 for the scale
  • J = 4 implies S ≤ 5

SLIDE 30

Heteroscedastic: rank condition

Idea:

  • Number of estimable parameters = number of linearly independent equations, minus 1 for the scale

Cov(∆U) =
⎡ σ_1² + σ_4² + 2γ/µ²   σ_4² + γ/µ²            σ_4² + γ/µ²          ⎤
⎢ σ_4² + γ/µ²           σ_2² + σ_4² + 2γ/µ²    σ_4² + γ/µ²          ⎥
⎣ σ_4² + γ/µ²           σ_4² + γ/µ²            σ_3² + σ_4² + 2γ/µ²  ⎦

The repeated off-diagonal entries make some equations linearly dependent, and one parameter is absorbed by the scale.

SLIDE 31

Heteroscedastic: rank condition

Three parameters out of five can be estimated. Formally:

  1. Identify the unique elements of Cov(∆U):

σ_1² + σ_4² + 2γ/µ²
σ_2² + σ_4² + 2γ/µ²
σ_3² + σ_4² + 2γ/µ²
σ_4² + γ/µ²

  2. Compute the Jacobian with respect to (σ_1², σ_2², σ_3², σ_4², γ/µ²):

⎡ 1  0  0  1  2 ⎤
⎢ 0  1  0  1  2 ⎥
⎢ 0  0  1  1  2 ⎥
⎣ 0  0  0  1  1 ⎦

  3. Compute the rank: the rank is 4, so

S = Rank − 1 = 3
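The rank computation itself is a one-liner (my sketch):

    import numpy as np

    # Rows: unique elements of Cov(dU);
    # columns: derivatives wrt sigma_1^2, sigma_2^2, sigma_3^2, sigma_4^2, gamma/mu^2.
    J = np.array([[1, 0, 0, 1, 2],    # sigma_1^2 + sigma_4^2 + 2 gamma/mu^2
                  [0, 1, 0, 1, 2],    # sigma_2^2 + sigma_4^2 + 2 gamma/mu^2
                  [0, 0, 1, 1, 2],    # sigma_3^2 + sigma_4^2 + 2 gamma/mu^2
                  [0, 0, 0, 1, 1]])   # sigma_4^2 + gamma/mu^2
    rank = np.linalg.matrix_rank(J)
    print(rank, "-> S =", rank - 1)   # 4 -> S = 3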

SLIDE 32

Heteroscedastic: equality condition

  1. We know how many parameters can be identified
  2. There are infinitely many normalizations
  3. The normalized model must be equivalent to the original one
  4. Obvious normalizations, like constraining extra parameters to 0 or another constant, may not be valid

SLIDE 33

Heteroscedastic: equality condition

U_n = βᵀ x_n + L_n ξ_n + ε_n

Cov(U_n) = L_n L_nᵀ + (γ/µ²) I

Cov(∆_j U_n) = ∆_j L_n L_nᵀ ∆_jᵀ + (γ/µ²) ∆_j ∆_jᵀ

Notation: ∆_j takes differences with respect to alternative j; with three alternatives, for instance,

∆_2 = ⎡ 1  −1  0 ⎤
      ⎣ 0  −1  1 ⎦

Write Cov(∆_j U_n) = Ω_n = Σ_n + Γ_n, and, for the normalized model, Ω_n^norm = Σ_n^norm + Γ_n^norm.

SLIDE 34

Heteroscedastic: equality condition

The following conditions must hold:

  • The covariance matrices must be equal:

Ω_n = Ω_n^norm

  • Σ_n^norm must be positive semi-definite

SLIDE 35

Heteroscedastic: equality condition

Example with 3 alternatives:

U_1 = β x_1 + σ_1 ξ_1 + ε_1
U_2 = β x_2 + σ_2 ξ_2 + ε_2
U_3 = β x_3 + σ_3 ξ_3 + ε_3

Cov(∆_3 U) = Ω =
⎡ σ_1² + σ_3² + 2γ/µ²   σ_3² + γ/µ²          ⎤
⎣ σ_3² + γ/µ²           σ_2² + σ_3² + 2γ/µ²  ⎦

  • Parameters: {σ_1, σ_2, σ_3, µ}
  • Rank condition: S = 2
  • µ is used for the scale

SLIDE 36

Heteroscedastic: equality condition

  • Denote ν_i = σ_i² µ² (scaled parameters)
  • Normalization condition: ν_3 = K

Ω = (1/µ²) ⎡ ν_1 + ν_3 + 2γ   ν_3 + γ         ⎤
           ⎣ ν_3 + γ          ν_2 + ν_3 + 2γ  ⎦

Ω^norm = (1/µ_N²) ⎡ ν_1^N + K + 2γ   K + γ           ⎤
                  ⎣ K + γ            ν_2^N + K + 2γ  ⎦

  • where the index N stands for "normalized"

SLIDE 37

Heteroscedastic: equality condition

First equality condition: Ω = Ω^norm

(ν_3 + γ)/µ² = (K + γ)/µ_N²
(ν_1 + ν_3 + 2γ)/µ² = (ν_1^N + K + 2γ)/µ_N²
(ν_2 + ν_3 + 2γ)/µ² = (ν_2^N + K + 2γ)/µ_N²

that is, writing the normalized parameters as functions of the others,

µ_N² = µ² (K + γ)/(ν_3 + γ)
ν_1^N = (K + γ)(ν_1 + ν_3 + 2γ)/(ν_3 + γ) − K − 2γ
ν_2^N = (K + γ)(ν_2 + ν_3 + 2γ)/(ν_3 + γ) − K − 2γ

SLIDE 38

Heteroscedastic: equality condition

Second equality condition:

Σ^norm = (1/µ_N²) diag( ν_1^N, ν_2^N, K )

must be positive semi-definite, that is,

µ_N > 0,  ν_1^N ≥ 0,  ν_2^N ≥ 0,  K ≥ 0.

Putting everything together, we obtain

K ≥ (ν_3 − ν_i) γ / (ν_i + γ),  i = 1, 2

SLIDE 39

Heteroscedastic: equality condition

K ≥ (ν_3 − ν_i) γ / (ν_i + γ),  i = 1, 2

  • If ν_3 ≤ ν_i, i = 1, 2, then the right-hand side is negative, and any K ≥ 0 would do. Typically, K = 0.
  • If not, K must be chosen large enough.
  • In practice, always select the alternative with minimum variance for the normalization.

SLIDE 40

Taste heterogeneity

  • Population is heterogeneous
  • Taste heterogeneity is captured by segmentation
  • Deterministic segmentation is desirable but not always possible
  • Distribution of a parameter in the population

SLIDE 41

Random parameters

U_i = β_t T_i + β_c C_i + ε_i
U_j = β_t T_j + β_c C_j + ε_j

Let β_t ∼ N(β̄_t, σ_t²) or, equivalently, β_t = β̄_t + σ_t ξ, with ξ ∼ N(0, 1). Then

U_i = β̄_t T_i + σ_t ξ T_i + β_c C_i + ε_i
U_j = β̄_t T_j + σ_t ξ T_j + β_c C_j + ε_j

If ε_i and ε_j are i.i.d. EV and ξ is given, we have

P(i|ξ) = e^{β̄_t T_i + σ_t ξ T_i + β_c C_i} / (e^{β̄_t T_i + σ_t ξ T_i + β_c C_i} + e^{β̄_t T_j + σ_t ξ T_j + β_c C_j}),

and

P(i) = ∫_ξ P(i|ξ) f(ξ) dξ.
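A small simulation sketch of this binary random coefficient model (mine; the times, costs and parameter values are invented):

    import numpy as np

    rng = np.random.default_rng(7)

    def prob_random_beta(Ti, Ci, Tj, Cj, bt_mean, bt_sd, bc, R=5000):
        # beta_t ~ N(bt_mean, bt_sd^2): one draw per repetition.
        beta_t = bt_mean + bt_sd * rng.standard_normal(R)
        Vi = beta_t * Ti + bc * Ci
        Vj = beta_t * Tj + bc * Cj
        # Logit kernel P(i|xi) = 1 / (1 + exp(Vj - Vi)), averaged over draws.
        return float(np.mean(1.0 / (1.0 + np.exp(Vj - Vi))))

    print(prob_random_beta(Ti=20, Ci=2.0, Tj=30, Cj=1.5,
                           bt_mean=-0.02, bt_sd=0.015, bc=-0.1))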

SLIDE 42

Random parameters

Example with Swissmetro

Alt.         ASC        cost            freq.          time
Car          ASC_CAR    B_COST · cost   —              B_TIME · time
Train        —          B_COST · cost   B_FR · freq.   B_TIME · time
Swissmetro   ASC_SM     B_COST · cost   B_FR · freq.   B_TIME · time

B_TIME randomly distributed across the population, normal distribution

SLIDE 43

Random parameters

                   Logit      RC
L                 −5315.4    −5198.0
ASC_CAR_SP         0.189      0.118
ASC_SM_SP          0.451      0.107
B_COST            −0.011     −0.013
B_FR              −0.005     −0.006
B_TIME            −0.013     −0.023
S_TIME                        0.017
Prob(B_TIME ≥ 0)              8.8%
χ²                            234.84

SLIDE 44

Random parameters

[Figure: density of the estimated normal distribution of B_TIME.]

SLIDE 45

Random parameters

Example with Swissmetro

Alt.         ASC        cost            freq.          time
Car          ASC_CAR    B_COST · cost   —              B_TIME · time
Train        —          B_COST · cost   B_FR · freq.   B_TIME · time
Swissmetro   ASC_SM     B_COST · cost   B_FR · freq.   B_TIME · time

B_TIME randomly distributed across the population, log normal distribution

SLIDE 46

Random parameters

[Utilities]
11  SBB_SP  TRAIN_AV_SP  ASC_SBB_SP * one + B_COST * TRAIN_COST + B_FR * TRAIN_FR
21  SM_SP   SM_AV        ASC_SM_SP * one + B_COST * SM_COST + B_FR * SM_FR
31  Car_SP  CAR_AV_SP    ASC_CAR_SP * one + B_COST * CAR_CO

[GeneralizedUtilities]
11  - exp( B_TIME [ S_TIME ] ) * TRAIN_TT
21  - exp( B_TIME [ S_TIME ] ) * SM_TT
31  - exp( B_TIME [ S_TIME ] ) * CAR_TT

SLIDE 47

Random parameters

             Logit      RC-norm.   RC-logn.
L           −5315.4    −5198.0    −5215.81
ASC_CAR_SP   0.189      0.118      0.122
ASC_SM_SP    0.451      0.107      0.069
B_COST      −0.011     −0.013     −0.014
B_FR        −0.005     −0.006     −0.006
B_TIME      −0.013     −0.023     −4.033 / −0.038
S_TIME                  0.017      1.242 / 0.073
Prob(β > 0)             8.8%       0.0%
χ²                      234.84     199.16

For RC-logn., the time coefficient is −exp(B_TIME) with B_TIME ∼ N(−4.033, 1.242²); the second value in each cell is the implied mean (−0.038), resp. standard deviation (0.073), of the coefficient.

SLIDE 48

Random parameters

[Figure: normal and lognormal distributions of B_TIME, with the MNL estimate and the means of the two distributions indicated.]

SLIDE 49

Random parameters

Example with Swissmetro

Alt.         ASC        cost            freq.          time
Car          ASC_CAR    B_COST · cost   —              B_TIME · time
Train        —          B_COST · cost   B_FR · freq.   B_TIME · time
Swissmetro   ASC_SM     B_COST · cost   B_FR · freq.   B_TIME · time

B_TIME randomly distributed across the population, discrete distribution

P(β_time = β̂) = ω_1,  P(β_time = 0) = ω_2 = 1 − ω_1

SLIDE 50

Random parameters

[DiscreteDistributions]
B_TIME < B_TIME_1 ( W1 ) B_TIME_2 ( W2 ) >

[LinearConstraints]
W1 + W2 = 1.0

SLIDE 51

Random parameters

             Logit      RC-norm.   RC-logn.          RC-disc.
L           −5315.4    −5198.0    −5215.8           −5191.1
ASC_CAR_SP   0.189      0.118      0.122             0.111
ASC_SM_SP    0.451      0.107      0.069             0.108
B_COST      −0.011     −0.013     −0.014            −0.013
B_FR        −0.005     −0.006     −0.006            −0.006
B_TIME      −0.013     −0.023     −4.033 / −0.038   −0.028 / 0.000
S_TIME                  0.017      1.242 / 0.073
W1                                                   0.749
W2                                                   0.251
Prob(β > 0)             8.8%       0.0%              0.0%
χ²                      234.84     199.16            248.6

For RC-disc., the two values of B_TIME are the two mass points of the discrete distribution, with weights W1 and W2.

SLIDE 52

Latent classes

  • Latent classes capture unobserved heterogeneity
  • They can represent different:
      • choice sets
      • decision protocols
      • tastes
      • model structures
      • etc.

SLIDE 53

Latent classes

P(i) = ∑_{s=1}^{S} Pr(i|s) Q(s)

  • Pr(i|s) is the class-specific choice model:
      • the probability of choosing i given that the individual belongs to class s
  • Q(s) is the class membership model:
      • the probability of belonging to class s
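A minimal latent class computation (my sketch; two invented classes with logit class-specific models):

    import numpy as np

    def logit(V):
        eV = np.exp(V)
        return eV / eV.sum()

    # Class-specific choice models Pr(i|s): two classes with different
    # (invented) systematic utilities over 3 alternatives.
    V_class = [np.array([-1.0, -0.5, -0.8]),   # class s = 1
               np.array([-0.2, -1.5, -0.9])]   # class s = 2
    Q = np.array([0.6, 0.4])                   # class membership model Q(s)

    # P(i) = sum_s Pr(i|s) Q(s)
    P = sum(q * logit(V) for q, V in zip(Q, V_class))
    print(P, P.sum())                          # probabilities; sum to 1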

SLIDE 54

Summary

  • Logit mixture models:
      • computationally more complex than MEV
      • allow for more flexibility than MEV
  • Continuous mixtures: alternative specific variance, nesting structures, random parameters

P(i) = ∫_ξ Pr(i|ξ) f(ξ) dξ

  • Discrete mixtures: well-defined latent classes of decision makers

P(i) = ∑_{s=1}^{S} Pr(i|s) Q(s).

SLIDE 55

Tips for applications

  • Be careful: simulation can mask specification and identification issues
  • Do not forget about the systematic portion

SLIDE 56

Simulation

P(i) = ∫_ξ Pr(i|ξ) f(ξ) dξ

No closed form formula.

  • Randomly draw numbers such that their frequency matches the density f(ξ)
  • Let ξ_1, …, ξ_R be these numbers
  • The choice model can be approximated by

P(i) ≈ (1/R) ∑_{r=1}^{R} Pr(i|r),  since  lim_{R→∞} (1/R) ∑_{r=1}^{R} Pr(i|r) = ∫_ξ Pr(i|ξ) f(ξ) dξ

SLIDE 57

Simulation

P(i) ≈ (1/R) ∑_{r=1}^{R} Pr(i|r).

The kernel is a logit model, easy to compute:

Pr(i|r) = e^{V_1n + r} / (e^{V_1n + r} + e^{V_2n + r} + e^{V_3n})

Therefore, it amounts to generating the appropriate draws.

SLIDE 58

Appendix: Simulation

Pseudo-random number generators: although deterministically generated, the numbers exhibit the properties of random draws.

  • Uniform distribution
  • Standard normal distribution
  • Transformations of the standard normal
  • Inverse CDF
  • Multivariate normal

SLIDE 59

Appendix: Simulation: uniform distribution

  • Almost all programming languages provide generators for a uniform U(0, 1)
  • If r is a draw from a U(0, 1), then

s = (b − a) r + a

is a draw from a U(a, b)

SLIDE 60

Appendix: Simulation: standard normal

  • If r_1 and r_2 are independent draws from U(0, 1), then

s_1 = √(−2 ln r_1) sin(2π r_2)
s_2 = √(−2 ln r_1) cos(2π r_2)

are independent draws from N(0, 1) (the Box-Muller transform)
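A direct implementation (my sketch):

    import numpy as np

    rng = np.random.default_rng(3)

    def box_muller(n):
        # Returns 2n independent N(0,1) draws from n pairs of U(0,1) draws.
        r1 = rng.random(n)
        r2 = rng.random(n)
        s1 = np.sqrt(-2.0 * np.log(r1)) * np.sin(2.0 * np.pi * r2)
        s2 = np.sqrt(-2.0 * np.log(r1)) * np.cos(2.0 * np.pi * r2)
        return np.concatenate([s1, s2])

    draws = box_muller(50_000)
    print(draws.mean(), draws.std())  # ~0, ~1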

SLIDE 61

Appendix: Simulation: standard normal

[Figure: histogram of 100 random samples from a univariate Gaussian PDF with unit variance and zero mean (scaled bin frequency), against the Gaussian p.d.f.]

SLIDE 62

Appendix: Simulation: standard normal

[Figure: histogram of 500 random samples from a univariate Gaussian PDF with unit variance and zero mean (scaled bin frequency), against the Gaussian p.d.f.]

SLIDE 63

Appendix: Simulation: standard normal

[Figure: histogram of 1000 random samples from a univariate Gaussian PDF with unit variance and zero mean (scaled bin frequency), against the Gaussian p.d.f.]

SLIDE 64

Appendix: Simulation: standard normal

[Figure: histogram of 5000 random samples from a univariate Gaussian PDF with unit variance and zero mean (scaled bin frequency), against the Gaussian p.d.f.]

SLIDE 65

Appendix: Simulation: standard normal

[Figure: histogram of 10000 random samples from a univariate Gaussian PDF with unit variance and zero mean (scaled bin frequency), against the Gaussian p.d.f.]

SLIDE 66

Appendix: Simulation: transformations of the standard normal

  • If r is a draw from N(0, 1), then

s = b r + a

is a draw from N(a, b²)

  • If r is a draw from N(a, b²), then

e^r

is a draw from a lognormal LN(a, b²) with mean

e^{a + b²/2}

and variance

e^{2a + b²} (e^{b²} − 1)

SLIDE 67

Appendix: Simulation: inverse CDF

  • Consider a univariate r.v. with CDF F(ε)
  • If F is invertible and if r is a draw from U(0, 1), then

s = F⁻¹(r)

is a draw from the given r.v.

  • Example: EV with

F(ε) = e^{−e^{−ε}},  F⁻¹(r) = −ln(−ln r)
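Inverse-CDF sampling of the EV distribution takes two lines (my sketch), with a check against the theoretical variance π²/6:

    import numpy as np

    rng = np.random.default_rng(5)

    r = rng.random(100_000)        # U(0,1) draws
    s = -np.log(-np.log(r))        # F^{-1}(r): extreme value draws

    print(s.mean())                # ~0.577 (Euler-Mascheroni constant)
    print(s.var(), np.pi**2 / 6)   # both ~1.645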

SLIDE 68

Appendix: Simulation: inverse CDF

[Figure: CDF of the extreme value distribution.]

SLIDE 69

Appendix: Simulation: multivariate normal

  • If r_1, …, r_n are independent draws from N(0, 1), and r = (r_1, …, r_n)ᵀ, then

s = a + L r

is a vector of draws from the n-variate normal N(a, L Lᵀ), where

      • L is lower triangular, and
      • L Lᵀ is the Cholesky factorization of the variance-covariance matrix

SLIDE 70

Appendix: Simulation: multivariate normal

Example:

L = ⎡ ℓ_11             ⎤
    ⎢ ℓ_21  ℓ_22       ⎥
    ⎣ ℓ_31  ℓ_32  ℓ_33 ⎦

s_1 = ℓ_11 r_1
s_2 = ℓ_21 r_1 + ℓ_22 r_2
s_3 = ℓ_31 r_1 + ℓ_32 r_2 + ℓ_33 r_3
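In practice L is obtained by a Cholesky decomposition of the target covariance; a short numpy check (my sketch, with an arbitrary covariance matrix):

    import numpy as np

    rng = np.random.default_rng(9)

    a = np.array([1.0, -2.0, 0.5])              # target mean
    Sigma = np.array([[2.0, 0.5, 0.3],
                      [0.5, 1.0, 0.2],
                      [0.3, 0.2, 1.5]])         # target covariance
    L = np.linalg.cholesky(Sigma)               # lower triangular, L @ L.T == Sigma

    r = rng.standard_normal((100_000, 3))       # independent N(0,1) draws
    s = a + r @ L.T                             # s = a + L r, one row per draw

    print(np.round(s.mean(axis=0), 2))          # ~a
    print(np.round(np.cov(s, rowvar=False), 2)) # ~Sigma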

SLIDE 71

Appendix: Simulation for mixtures of logit

  • In order to approximate

P(i) = ∫_ξ Pr(i|ξ) f(ξ) dξ

  • Draw from f(ξ) to obtain r_1, …, r_R
  • Compute

P(i) ≈ P̃(i) = (1/R) ∑_{k=1}^{R} Pr(i|r_k) = (1/R) ∑_{k=1}^{R} e^{V_1n + r_k} / (e^{V_1n + r_k} + e^{V_2n + r_k} + e^{V_3n})

SLIDE 72

Appendix: Maximum simulated likelihood

max_θ L(θ) = ∑_{n=1}^{N} ∑_{j=1}^{J} y_jn ln P̃(j; θ)

  • where y_jn = 1 if individual n has chosen alternative j, 0 otherwise.

The vector of parameters θ contains:

  • the usual (fixed) parameters of the choice model
  • the parameters of the density of the random parameters
  • For instance, if β_j ∼ N(µ_j, σ_j²), then µ_j and σ_j are parameters to be estimated
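A compact sketch of the simulated log likelihood for the binary random coefficient model of the earlier slides (my illustration with fabricated data; in practice a package such as Biogeme does this):

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(11)
    xi = rng.standard_normal(500)          # fixed draws, reused across iterations

    def neg_simulated_loglik(theta, T, C, y):
        # theta = (mean and std dev of beta_t, and beta_c); y = 1 if alt. i chosen.
        m, s, bc = theta
        beta_t = m + s * xi
        dV = np.outer(T[:, 0] - T[:, 1], beta_t) + bc * (C[:, 0] - C[:, 1])[:, None]
        P_i = 1.0 / (1.0 + np.exp(-dV))    # P(i|draw), one row per obs, column per draw
        P = np.clip(P_i.mean(axis=1), 1e-12, 1 - 1e-12)   # simulated P~(i; theta)
        return -np.sum(y * np.log(P) + (1 - y) * np.log(1 - P))

    # Fabricated data: times, costs and choices for 200 binary observations.
    T = rng.uniform(10, 40, size=(200, 2))
    C = rng.uniform(1, 5, size=(200, 2))
    y = rng.integers(0, 2, size=200)

    res = minimize(neg_simulated_loglik, x0=[-0.02, 0.01, -0.1], args=(T, C, y))
    print(res.x)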

SLIDE 73

Appendix: Maximum simulated likelihood

Warning:

  • P̃(j; θ) is an unbiased estimator of P(j; θ):

E[P̃_n(j; θ)] = P(j; θ)

  • ln P̃(j; θ) is not an unbiased estimator of ln P(j; θ):

ln E[P̃(j; θ)] ≠ E[ln P̃(j; θ)]

  • Under some conditions, it is a consistent (asymptotically unbiased) estimator, so that many draws are necessary.

SLIDE 74

Appendix: Maximum simulated likelihood

Properties of MSL:

  • If R is fixed, MSL is inconsistent
  • If R rises at any rate with N, MSL is consistent
  • If R rises faster than √N, MSL is asymptotically equivalent to ML.
