Decision-aid methodologies in transportation
Lecture 3: Logistic regression and probabilistic metrics – PowerPoint PPT Presentation


  1. CIVIL-557 Decision-aid methodologies in transportation Lecture 3: Logistic regression and probabilistic metrics Tim Hillel Transport and Mobility Laboratory TRANSP-OR École Polytechnique Fédérale de Lausanne EPFL

  2. Case study

  3. Mode choice

  4. Last week Data science process Dataset Deterministic methods – K-Nearest Neighbours (KNN) – Decision Tree (DT) Discrete metrics

  5. Today Theory of probabilistic classification Probabilistic metrics Probabilistic classifiers – Logistic regression

  6. But first… …a bit more feature processing

  7. Feature processing Last week - scaling – Crucial when algorithms consider distance – Standard scaling – zero-mean unit-variance What about missing values? And categorical data?
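As a reminder of how standard scaling is applied in practice, here is a minimal sketch using scikit-learn (the tooling is assumed from the lab notebooks; the feature values are illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative features: e.g. trip distance (km) and traveller age.
X_train = np.array([[0.5, 23.0], [12.0, 41.0], [3.2, 35.0]])

# Fit the scaler on the training data only, then reuse it on the test
# set, so the test set remains unseen during preprocessing.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # zero mean, unit variance per column

X_test = np.array([[7.5, 29.0]])
X_test_scaled = scaler.transform(X_test)  # reuses the training mean/variance
```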

  8. Missing values

  9. Missing values Possible solutions? Remove rows (instances) Remove columns (features) Assume default value – Zero – Mean – Random value – Other?

  10. Missing values Removing rows – can introduce sampling bias! Removing columns – reduces available features Default value – needs to make sense
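The three options can be sketched as follows, assuming pandas/scikit-learn and hypothetical column names:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"distance_km": [1.2, np.nan, 3.4, 0.8],
                   "car_ownership": [1.0, 0.0, np.nan, 1.0]})

# Option 1: remove rows (instances) - risks sampling bias.
df_rows = df.dropna(axis=0)

# Option 2: remove columns (features) - reduces available features.
df_cols = df.dropna(axis=1)

# Option 3: assume a default value that makes sense, e.g. the column mean.
imputer = SimpleImputer(strategy="mean")
X_imputed = imputer.fit_transform(df)
```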

  11. Categorical variables

  12. Categorical data All data must be numerical – possible solutions? Numerical encoding Binary encoding Remove feature?

  13. Numerical encoding Blue → 1, Silver → 2, Red → 3 Implies order (Blue < Silver < Red) and distance (Red − Silver = Silver − Blue)

  14. Binary encoding

  Car  Colour  Colour:Blue  Colour:Silver  Colour:Red
  1    Silver  0            1              0
  2    Blue    1            0              0
  3    Red     0            0              1
  4    Blue    1            0              0
  …    …       …            …              …

  A lot more features!
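Both encodings in a pandas sketch (assumed tooling; `get_dummies` produces the one-hot columns in the table above):

```python
import pandas as pd

df = pd.DataFrame({"colour": ["Silver", "Blue", "Red", "Blue"]})

# Numerical encoding: implies a spurious order and distance between colours.
df["colour_code"] = df["colour"].map({"Blue": 1, "Silver": 2, "Red": 3})

# Binary (one-hot) encoding: one 0/1 column per category - more features,
# but no artificial ordering.
one_hot = pd.get_dummies(df["colour"], prefix="colour")
```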

  15. Validation schemes Train-test split: the dataset is divided into a train set and a test set. The test set must be unseen data. How to sample the test set? How to test model hyperparameters?

  16. Sampling the test set Previously we used random sampling, which assumes the data is independent. Not always a good assumption – sampling bias/recording errors, hierarchical data. Instead, use external validation: validate the model on data sampled separately from the training data (e.g. a separate year)
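A sketch of external validation, assuming the data carries a survey year to split on (hypothetical records and column names):

```python
import pandas as pd

# Hypothetical trip records with a survey year.
trips = pd.DataFrame({"distance_km": [1.2, 5.0, 0.4, 8.1],
                      "mode": ["walk", "drive", "walk", "drive"],
                      "year": [2015, 2015, 2016, 2016]})

# External validation: hold out data sampled separately from the
# training data - here, an entire survey year.
train = trips[trips["year"] < 2016]
test = trips[trips["year"] == 2016]
```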

  17. Testing model hyperparameters Can only be performed on the training data. Possible solution – split again into train-validate-test: the training data is further divided into a train set and a validation set, with the test set kept aside
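One way to produce the train-validate-test split, sketched with scikit-learn's `train_test_split` on synthetic data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 5)             # illustrative features
y = np.random.randint(0, 4, size=100)  # illustrative class labels

# Split off the test set first, then split the remainder again, so that
# hyperparameters are tuned on the validation set and the test set stays
# unseen. This yields a 60/20/20 split.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0)
```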

  18. Decision trees [Diagram: decision tree – Trip < 1 km? Yes → Walk; No → Owns car? Yes → Drive; No → Bus]

  19. DT metrics Gini impurity: $H(q) = \sum_{j=1}^{K} q_j (1 - q_j) = 1 - \sum_{j=1}^{K} q_j^2$ Entropy: $I(q) = -\sum_{j=1}^{K} q_j \log_2 q_j$ where $q_j$ is the proportion of class $j$
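Both impurity measures are easy to compute from the class proportions; a minimal sketch:

```python
import numpy as np

def gini_impurity(q):
    """H(q) = 1 - sum_j q_j^2, where q_j is the proportion of class j."""
    q = np.asarray(q)
    return 1.0 - np.sum(q ** 2)

def entropy(q):
    """I(q) = -sum_j q_j * log2(q_j); classes with q_j = 0 contribute 0."""
    q = np.asarray(q)
    q = q[q > 0]
    return -np.sum(q * np.log2(q))

# Example: a node containing 80% walk and 20% drive trips.
print(gini_impurity([0.8, 0.2]))  # 0.32
print(entropy([0.8, 0.2]))        # ~0.722
```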

  20. Decision trees Remember: decision trees only see the ranking of feature values… …therefore scaling does not need to be applied!

  21. Lab Notebook 1: Categorical features

  22. Probabilistic classification Previously we considered deterministic classifiers – the classifier predicts a discrete class $\hat{z}$. Now consider probabilistic classification – the classifier predicts a continuous probability for each class

  23. Notation Dataset $E$ contains $O$ elements and $K$ classes. Each element $o$ is associated with: – a feature vector $y_o$ of $l$ real features, $y_o \in \mathbb{R}^l$ – a set of class indicators $z_o \in \{0,1\}^K$ with $\sum_{j=1}^{K} z_{jo} = 1$ – the selected class (ground truth) $j_o = \sum_{j=1}^{K} j \, z_{jo}$

  24. Notation continued Classifier $Q$ maps the feature vector $y_o$ to a probability distribution across the $K$ classes – $Q: \mathbb{R}^l \to [0,1]^K$ Probability for each class $j$: – $Q(j|y_o) > 0$ – $\sum_{j=1}^{K} Q(j|y_o) = 1$ Therefore, the probability for the selected (ground-truth) class is $Q(j_o|y_o)$

  25. Example Four modes (walk, cycle, pt, drive) – $K = 4$ For a pt trip: – $z_o = (0, 0, 1, 0)$ – $j_o = 3$ Predicted probabilities for an arbitrary classifier $Q$: – $Q(y_o) = (0.1, 0.2, 0.4, 0.3)$ – $Q(j_o|y_o) = 0.4$

  26. Likelihood

  o   j_o   Q(1|y_o)  Q(2|y_o)  Q(3|y_o)  Q(4|y_o)  Q(j_o|y_o)
  1   2     0.11      0.84      0.03      0.02      0.84
  2   1     0.82      0.04      0.06      0.08      0.82
  3   4     0.11      0.18      0.05      0.66      0.66
  4   1     0.57      0.22      0.12      0.09      0.57
  5   3     0.10      0.03      0.75      0.12      0.75

  Likelihood: $\prod_{o=1}^{O} Q(j_o|y_o) = 0.84 \times 0.82 \times 0.66 \times 0.57 \times 0.75 = 0.1943$

  27. Likelihood

  o   j_o   Q(1|y_o)  Q(2|y_o)  Q(3|y_o)  Q(4|y_o)  Q(j_o|y_o)
  1   4     0.11      0.84      0.03      0.02      0.02
  2   1     0.82      0.04      0.06      0.08      0.82
  3   4     0.11      0.18      0.05      0.66      0.66
  4   1     0.57      0.22      0.12      0.09      0.57
  5   3     0.10      0.03      0.75      0.12      0.75

  Likelihood: $\prod_{o=1}^{O} Q(j_o|y_o) = 0.02 \times 0.82 \times 0.66 \times 0.57 \times 0.75 = 0.0046$
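The two likelihood calculations can be reproduced in a few lines (values taken from the tables above):

```python
import numpy as np

# Q(j_o | y_o) for the five trips, first with every trip predicted well...
q_good = np.array([0.84, 0.82, 0.66, 0.57, 0.75])
# ...then with trip 1 mispredicted (ground truth is class 4, Q = 0.02).
q_bad = np.array([0.02, 0.82, 0.66, 0.57, 0.75])

print(np.prod(q_good))  # ~0.1943
print(np.prod(q_bad))   # ~0.0046 - one bad prediction collapses the product
```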

  28. Log-likelihood The likelihood is bounded between 0 and 1 and tends to 0 as $O$ increases – computational issues with small numbers. Use the log-likelihood instead: $\ln \left( \prod_{o=1}^{O} Q(j_o|y_o) \right) = \sum_{o=1}^{O} \ln Q(j_o|y_o)$

  29. Log-likelihood

  o   j_o   Q(1|y_o)  Q(2|y_o)  Q(3|y_o)  Q(4|y_o)  Q(j_o|y_o)
  1   2     0.11      0.84      0.03      0.02      0.84
  2   1     0.82      0.04      0.06      0.08      0.82
  3   4     0.11      0.18      0.05      0.66      0.66
  4   1     0.57      0.22      0.12      0.09      0.57
  5   3     0.10      0.03      0.75      0.12      0.75

  Log-likelihood: $\sum_{o=1}^{O} \ln Q(j_o|y_o) = \ln(0.84 \times 0.82 \times 0.66 \times 0.57 \times 0.75) = -1.638$

  30. Log-likelihood

  o   j_o   Q(1|y_o)  Q(2|y_o)  Q(3|y_o)  Q(4|y_o)  Q(j_o|y_o)
  1   4     0.11      0.84      0.03      0.02      0.02
  2   1     0.82      0.04      0.06      0.08      0.82
  3   4     0.11      0.18      0.05      0.66      0.66
  4   1     0.57      0.22      0.12      0.09      0.57
  5   3     0.10      0.03      0.75      0.12      0.75

  Log-likelihood: $\sum_{o=1}^{O} \ln Q(j_o|y_o) = \ln(0.02 \times 0.82 \times 0.66 \times 0.57 \times 0.75) = -5.376$

  31. Cross-entropy loss The log-likelihood is bounded between $-\infty$ and 0 – it tends to $-\infty$ as $O$ increases, so it can't be compared between different dataset sizes. Normalise (divide by $O$) and take the negative to get the cross-entropy loss (CEL): $M = -\frac{1}{O} \sum_{o=1}^{O} \ln Q(j_o|y_o)$
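A sketch of the full chain from the probability table to the cross-entropy loss; scikit-learn's `log_loss` computes the same quantity:

```python
import numpy as np
from sklearn.metrics import log_loss

# Ground-truth probabilities Q(j_o|y_o) for the five trips above.
q_true_class = np.array([0.84, 0.82, 0.66, 0.57, 0.75])

log_likelihood = np.sum(np.log(q_true_class))  # ~ -1.638
cel = -log_likelihood / len(q_true_class)      # ~ 0.328

# Equivalently, from the full probability table and ground-truth classes:
Q = np.array([[0.11, 0.84, 0.03, 0.02],
              [0.82, 0.04, 0.06, 0.08],
              [0.11, 0.18, 0.05, 0.66],
              [0.57, 0.22, 0.12, 0.09],
              [0.10, 0.03, 0.75, 0.12]])
j = np.array([2, 1, 4, 1, 3])
print(log_loss(j, Q))  # ~0.328
```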

  32. Cross-entropy loss Can also be derived from Shannon's cross entropy $I(q, r)$ (hence the name). Minimising the cross-entropy loss is equivalent to maximising the likelihood of the data under the model – maximum likelihood estimation (MLE). Other metrics exist, e.g. the Brier score (MSE), but CEL is the gold standard

  33. Probabilistic classifiers Need a model which generates a multiclass probability distribution from the feature vector $y_o$. Lots of possibilities! Today: logistic regression

  34. Linear regression Logistic regression is an extension of linear regression – both are often called linear models. $g(y) = \sum_{l=1}^{L} \gamma_l y_l + \gamma_p$ Find the parameters ($\gamma$) using gradient descent – typically by minimising the MSE
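A toy sketch of fitting a linear regression with gradient descent on the MSE (synthetic data; the learning rate and iteration count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([1.5, -2.0]) + 0.5 + rng.normal(scale=0.1, size=100)

gamma = np.zeros(2)   # feature parameters (gamma_l)
gamma_p = 0.0         # intercept (gamma_p in the slide notation)
lr = 0.1
for _ in range(500):
    err = X @ gamma + gamma_p - y          # prediction error
    gamma -= lr * 2 * X.T @ err / len(y)   # dMSE/dgamma_l
    gamma_p -= lr * 2 * err.mean()         # dMSE/dgamma_p

print(gamma, gamma_p)  # ~[1.5, -2.0] and ~0.5
```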

  35. Logistic regression Pass the linear regression through a function $\tau$: $g(y) = \tau \left( \sum_{l=1}^{L} \gamma_l y_l + \gamma_p \right)$ $\tau$ needs to take a real value and return a value in $[0,1]$: – $\tau(z) \in [0,1] \; \forall z \in \mathbb{R}$

  36. Binary case – sigmoid function $\tau(z) = \frac{1}{1 + e^{-z}}$

  37. Multinomial case – softmax function $\tau(z_j) = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}$
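Both link functions in a minimal sketch (the max-subtraction in softmax is a standard numerical-stability trick, not part of the slide formula):

```python
import numpy as np

def sigmoid(z):
    """Binary case: maps a real value to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Multinomial case: maps K real values to K probabilities summing to 1."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

print(sigmoid(0.0))              # 0.5
print(softmax([2.0, 1.0, 0.1]))  # ~[0.66, 0.24, 0.10]
```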

  38. Logistic regression A separate set of parameters ($\gamma$) for each class $j$ – normalised to zero for one class. A separate value $z_j$ for each class – these play the role of utilities in Discrete Choice Models. Minimise the cross-entropy loss (MLE). Solve for the parameters using gradient descent: – $\gamma^* = \arg\min_\gamma M(\gamma)$
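In scikit-learn this whole estimation pipeline is available off the shelf; a sketch on synthetic data (random labels, so the fit is only illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))     # 5 features
y = rng.integers(0, 4, size=200)  # 4 mode classes

# scikit-learn fits the parameters by minimising the (regularised)
# cross-entropy loss, i.e. maximum likelihood estimation.
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)
proba = clf.predict_proba(X)  # one probability per class; rows sum to 1
```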

  39. Issues Overconfidence – with linearly separable data, the parameters tend to infinity (unrealistic probability estimates). Outliers – outliers can have a disproportionate effect on the parameters (high variance). Multicollinearity – highly correlated features can cause numerical instability

  40. Solution: Regularisation Penalise the model for large parameter values – reduces variance (and therefore overfitting). Regularisation: – $\gamma^* = \arg\min_\gamma \{ M(\gamma) + D \times g(\gamma) \}$ Two candidates for $g$: – L1 norm (LASSO): $\sum_j |\gamma_j|$ (similar to Manhattan distance) – L2 norm (Ridge): $\sum_j \gamma_j^2$ (similar to Euclidean distance)

  41. Regularisation L1 regularisation shrinks less important parameters to zero – Feature selection? L2 regularisation penalises larger parameters more

  42. Regularisation $D$ and the choice of regularisation (L1/L2) are hyperparameters. The larger $D$, the more regularisation (higher bias, lower variance)
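In scikit-learn the penalty type is a constructor argument; note that its `C` parameter is the inverse of the penalty weight $D$ above, so a smaller `C` means more regularisation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.integers(0, 3, size=100)

# L2 (Ridge) penalty: penalises larger parameters more.
ridge = LogisticRegression(penalty="l2", C=0.1).fit(X, y)

# L1 (LASSO) penalty: needs a solver that supports it; tends to shrink
# less important parameters exactly to zero (feature selection).
lasso = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X, y)
```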

  43. Logistic regression [Diagram: logistic regression as a network – a feature vector $y$ of 5 features is mapped through linear values $z$ and a softmax to probabilities over 3 classes]

  44. Homework Notebook 2: Logistic regression
