SLIDE 1

Tour-Based Mode Choice Modeling: Using An Ensemble of (Un-) Conditional Data-Mining Classifiers

James P. Biagioni
Piotr M. Szczurek
Peter C. Nelson, Ph.D.
Abolfazl Mohammadian, Ph.D.

SLIDE 2

Agenda

  • Background
  • Data-Mining
  • (Un-) Conditional Classifiers
  • Implementation
  • Data
  • Performance Measures
  • Experimental Results
  • Conclusions
SLIDE 3

Background

  • Mode choice modeling is an integral part of the 4-step travel demand forecasting procedure

  • Process:

– Estimating the distribution of mode choices given a set of trip attributes

  • Input:

– Set of attributes related to the trip, person, and household

  • Output:

– Probability distribution across set of mode choices

SLIDE 4

Background

  • Discrete choice models (e.g. multinomial logit) have historically dominated this area of research

– A major problem with discrete choice models is their limited predictive capability

  • Increasing attention is being paid to data-mining techniques borrowed from the artificial intelligence and machine learning communities

– Historically, these techniques have shown competitive performance

SLIDE 5

Background

  • However, most data-mining approaches have treated trips within a tour as independent

– With the exception of Miller et al. (2005), who built an agent-based mode-choice model that explicitly treats the dependence between trips

  • Our approach follows in the vein of Miller, but avoids developing an explicit framework

SLIDE 6

Data-Mining

  • Process of extracting hidden patterns from data
  • Example uses:

– Marketing, fraud detection and scientific discovery

  • Classifiers: map attributes to labels (mode)

– Decision Trees, Naïve Bayes, Simple Logistic, Support Vector Machines

  • Ensemble Method
SLIDE 7

Decision Trees

  • Repeated attribute partitioning

– To maximize class homogeneity
– Heuristic function, e.g. information gain

  • Partitions form If-Then rules
  • High degree of interpretability

Outlook = Rain ∧ Windy = False => Play
Outlook = Sunny ∧ Humidity > 70 => Don’t Play
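As a minimal sketch (not the authors' actual pipeline), Weka's J48 implementation of C4.5 can be trained and its If-Then rules printed as follows; the "tours.arff" dataset name is a hypothetical placeholder:

    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class DecisionTreeDemo {
        public static void main(String[] args) throws Exception {
            // Load a dataset; "tours.arff" is a hypothetical file whose
            // last attribute is the chosen mode (the class label).
            Instances data = new DataSource("tours.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1);

            // J48 is Weka's implementation of the C4.5 learner; it splits
            // on the attribute with the highest information gain ratio.
            J48 tree = new J48();
            tree.buildClassifier(data);

            // Printing the model shows the induced If-Then partition rules.
            System.out.println(tree);
        }
    }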

SLIDE 8

Naïve Bayes

  • Purely probabilistic approach
  • Estimate class posterior probabilities

– For an example d (a vector of attributes)
– Compute Pr(C = cj | d = <A1 = a1, A2 = a2, …, An = an>) for all classes cj
– Using Bayes’ rule and attribute independence: Pr(C = cj | d) ∝ Pr(C = cj) ∏i Pr(Ai = ai | C = cj)

  • Pr(C = cj) and Pr(Ai = ai | C = cj) can be estimated from data by occurrence counts

  • Select class with highest probability
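Written out, the selection rule the last bullet describes is

    c* = argmax_cj Pr(C = cj) ∏i Pr(Ai = ai | C = cj)

where the product form follows from the naïve assumption that the attributes are independent given the class.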
SLIDE 9

Simple Logistic

  • Based on linear regression methods
  • Built on the LogitBoost algorithm

– Fits a succession of logistic models
– Each successive model learns from previous classification mistakes
– Model parameters are fine-tuned to find the best (least-error) fit
– The best attributes are automatically selected using cross-validation
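A minimal Weka sketch of this learner, again assuming the hypothetical "tours.arff" dataset:

    import weka.classifiers.functions.SimpleLogistic;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class SimpleLogisticDemo {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource("tours.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1);

            // SimpleLogistic fits additive logistic regression models via
            // LogitBoost; internal cross-validation chooses the number of
            // boosting iterations, which performs the attribute selection
            // described above.
            SimpleLogistic model = new SimpleLogistic();
            model.buildClassifier(data);
            System.out.println(model);
        }
    }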

SLIDE 10

Support Vector Machines

  • Linear learning
  • Binary classifier
  • Finds the maximum-margin hyperplane that separates two classes
  • Soft margins for non-linearly separable data
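The soft-margin formulation referenced here is the standard one: minimize (1/2)||w||² + C Σi ξi subject to yi (w · xi + b) ≥ 1 − ξi and ξi ≥ 0, where the slack variables ξi absorb margin violations and C trades off margin width against training error.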

SLIDE 11

Support Vector Machines (cont.)

  • Kernel functions can be used to allow for non-linear boundaries
  • Transformation into a higher-dimensional space
  • Idea: non-linear data will become linearly separable

φ : X → F, x ↦ φ(x)
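In practice the mapping φ is used only implicitly, through a kernel that computes inner products in F, i.e. K(x, x′) = <φ(x), φ(x′)>; the RBF kernel K(x, x′) = exp(−γ ||x − x′||²) is a standard example (the specific kernel used in this work is not stated on the slide).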

SLIDE 12

Ensemble Method

  • Build multiple classifiers and use their outputs as a form of voting for final class selection
  • AdaBoost

– Trains a sequence of classifiers
– Each one is dependent on the previous classifier
– The dataset is re-weighted in order to focus on the previous classifier’s errors

  • Final classification is performed by passing each instance through the set of classifiers and combining their weighted outputs
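A minimal sketch using Weka's AdaBoostM1 over a C4.5 base learner (matching the AB-C4.5 models discussed later); the dataset name and iteration count are illustrative:

    import weka.classifiers.meta.AdaBoostM1;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class AdaBoostDemo {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource("tours.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1);

            // AdaBoostM1 re-weights the training data after each round so
            // that later base classifiers focus on earlier mistakes.
            AdaBoostM1 booster = new AdaBoostM1();
            booster.setClassifier(new J48());   // e.g. AdaBoost over C4.5
            booster.setNumIterations(10);
            booster.buildClassifier(data);

            // Per-class probabilities for one instance, obtained by
            // combining the weighted votes of all base classifiers.
            double[] dist = booster.distributionForInstance(data.instance(0));
            for (double p : dist) System.out.printf("%.3f%n", p);
        }
    }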

SLIDE 13

(Un-) Conditional Classifiers

  • Notion of “anchor mode” is used in this study

– The mode selected when departing from an anchor point (e.g. home)

[Tour diagram: Home → Work → Store]

SLIDE 14

(Un-) Conditional Classifiers

  • Un-conditional classifier: for first trip on tour

– Calculates P(mode = anchor mode | attributes)

  • Conditional classifier: for each subsequent trip

– Calculates P(mode = i | attributes, anchor mode = j)

  • Classifier outputs are combined probabilistically

– P(mode = i) = Σj P(mode = i | attributes, anchor mode = j) * P(anchor mode = j)
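A minimal sketch of this combination under stated assumptions: two trained Weka classifiers over the same set of modes, and a hypothetical anchorIdx giving the position of the anchor-mode attribute in the conditional model's instances. This is illustrative, not the authors' code:

    import weka.classifiers.Classifier;
    import weka.core.Instance;

    public class TourModeCombiner {
        // P(mode = i) = Σj P(mode = i | attrs, anchor = j) * P(anchor = j)
        public static double[] modeDistribution(Classifier unconditional,
                                                Classifier conditional,
                                                Instance firstTrip,
                                                Instance laterTrip,
                                                int anchorIdx) throws Exception {
            // Un-conditional classifier: distribution over anchor modes
            // for the tour's first trip.
            double[] pAnchor = unconditional.distributionForInstance(firstTrip);
            // Assumes the anchor-mode set equals the trip-mode set.
            double[] pMode = new double[pAnchor.length];
            for (int j = 0; j < pAnchor.length; j++) {
                // Condition the later trip on anchor mode j (a nominal
                // attribute at the assumed index anchorIdx; note this
                // mutates laterTrip in place).
                laterTrip.setValue(anchorIdx, j);
                double[] pCond = conditional.distributionForInstance(laterTrip);
                for (int i = 0; i < pMode.length; i++) {
                    pMode[i] += pCond[i] * pAnchor[j];
                }
            }
            return pMode;
        }
    }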

SLIDE 15

Implementation

  • Data-Mining classifiers

– Developed Java application to perform (un-) conditional classification
– Leveraged Weka Data Mining Toolkit API for implementations of all data-mining algorithms

  • Discrete Choice Model

– Biogeme modeling software used to develop (un-) conditional multinomial logit (MNL) models
– Developed experimental framework in Java to evaluate MNL models in identical manner
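As a minimal sketch of how such an evaluation can be run through the Weka API (the dataset name, fold count, and seed are illustrative, and Naïve Bayes stands in for any of the classifiers above):

    import weka.classifiers.Evaluation;
    import weka.classifiers.bayes.NaiveBayes;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    import java.util.Random;

    public class EvaluateDemo {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource("tours.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1);

            // 10-fold cross-validation of a mode-choice classifier; the
            // summary reports accuracy, and per-class precision/recall
            // are available via eval.precision(i) and eval.recall(i).
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(new NaiveBayes(), data, 10, new Random(1));
            System.out.println(eval.toSummaryString());
        }
    }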

SLIDE 16

Data

  • Models were developed using the Chicago Travel Tracker Survey (2007-2008) data
  • Consists of 1- and 2-day activity diaries from 32,118 people among 14,315 households in the 11 counties neighboring Chicago
  • Data used for experimentation contained 19,118 tours decomposed into 116,666 trip links

SLIDE 17

Performance Measures

  • Three metrics from the information-retrieval literature are leveraged:

– Mean Precision
– Mean Recall
– Accuracy

  • Precision/recall are used when interest centers on classification performance for particular classes
  • Accuracy complements precision/recall with aggregate performance across classes

SLIDE 18

Performance Measures

  • Precision
  • Recall
  • Accuracy
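In standard information-retrieval notation, with TPk, FPk, and FNk the true positives, false positives, and false negatives for mode k, and N the total number of instances:

    Precision_k = TPk / (TPk + FPk)
    Recall_k    = TPk / (TPk + FNk)
    Accuracy    = (Σk TPk) / N

Mean precision and mean recall average the per-class values across the set of modes.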
SLIDE 19

Performance Measures

  • For purposes of evaluating mode choice prediction, recall is the most important metric

– Mode choice is not so much a classification task as a problem of distribution estimation
– Recall captures the sum of the deviations of each mode from the real distribution

SLIDE 20

Experimental Results

  • To test the usefulness of the anchor mode attribute, classifiers were built with and without knowing the anchor mode
  • While the anchor mode will never be known with 100% certainty, these tests provided an upper bound for any expected performance gain
  • Classifiers tested were: C4.5 decision trees, Naïve Bayes, Simple Logistic and SVM

SLIDE 21

Experimental Results

SLIDE 22

Experimental Results

  • Anchor mode improves the classification performance
  • A second stage of testing was performed using (un-) conditional models
  • Best performance achieved using different algorithms for the conditional and un-conditional models

SLIDE 23

Experimental Results

SLIDE 24

Experimental Results

  • The AdaBoost-NaiveBayes un-conditional / AdaBoost-C4.5 conditional model (AB-NB/AB-C4.5) is considered the “best” performing

– Marginally lower recall than the best, but much higher precision and better accuracy
– The combination of simultaneously high accuracy and recall makes it the best overall classifier

SLIDE 25

Experimental Results

  • Conditional and un-conditional MNL models were built and evaluated
  • Attribute selection was based on t-test significance
  • Adjusted rho-squared (ρ²) values were 0.684 and 0.691 for the un-conditional and conditional models, respectively

SLIDE 26

Experimental Results

SLIDE 27

Conclusions

  • The AB-NB/AB-C4.5 combination of classifiers achieved a high level of accuracy, precision and recall, outperforming the MNL models

– Importantly, recall performance is higher by a large margin

  • The performance advantage over MNL is larger than may have been previously thought
  • It may be advantageous to consider using both techniques as complementary tools

SLIDE 28

Contributions

  • Showing the superiority of data-mining models
  • Use of the anchor mode with (un-) conditional classifiers
  • Arguing for mean recall as the best metric to use
  • Showing that the AB-NB/AB-C4.5 combination has the best overall performance

SLIDE 29

Thank You!

Questions?