Taxi Travel Time Prediction Assignment 1 - Outcome Lecture - PowerPoint PPT Presentation

Taxi Travel Time Prediction Assignment 1 - Outcome Lecture Sebastian Caldas and Nicholay Topin

This lecture has 3 objectives:
● Understand how the assignment relates to the course’s goals
● Provide the appropriate context for the next assignment
● Socialize the students’ solutions to the assignment

Ifigeneia Apostolopoulou

Ian Char

Global summary

“Familiarize yourself with the data; identify potential difficulties and required cleaning/pre-processing”
● More than 67 million samples
● 7 features
○ Vendor ID {1,2}
○ Tpep_pickup_datetime (date-time format)
○ Tpep_dropoff_datetime (date-time format)
○ Passenger count [1,9]
○ PULocationID [1,265]
○ DOLocationID [1,265]
○ Payment_type [1,5]

“Familiarize yourself with the data; identify potential difficulties and required cleaning/pre-processing”
● Locations:
○ Most trips start in Manhattan (61m), Queens (4m), Unknown (1m), and Brooklyn (1m)
○ Most trips end in Manhattan (59m), Queens (3m), Brooklyn (3m), and Unknown (1m)
○ 20 most common locations are in Manhattan (all except LaGuardia and JFK Airport)

“Familiarize yourself with the data; identify potential difficulties and required cleaning/pre-processing”
● Data is given only for certain months (Jan-July) of 2017
○ Any data from another year or month should be removed
● Outliers:
○ Trips with time less than 0 (some students suggested trips under X minutes were outliers)
○ Trips with time more than 60 or 120 or 720 minutes (no trip across NYC is more than 6h)
○ Trips before 2017 / Trips outside of expected month range
○ Trips with 0 passengers (maybe trips with >7 passengers)
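The outlier rules above can be sketched as a single filter. This is only an illustration using the standard library: the function names, the 120-minute cap (one of the three candidates on the slide), the choice to also drop zero-length trips, and the choice to check only the pickup time against the month range are all assumptions, not the course's reference cleaning.

```python
from datetime import datetime

# Assumed thresholds: the slide lists 60, 120, or 720 minutes as candidate caps.
MAX_MINUTES = 120
YEAR, FIRST_MONTH, LAST_MONTH = 2017, 1, 7  # data covers Jan-July 2017


def trip_minutes(pickup: datetime, dropoff: datetime) -> float:
    """Trip duration in minutes (negative if dropoff precedes pickup)."""
    return (dropoff - pickup).total_seconds() / 60.0


def is_valid_trip(pickup, dropoff, passenger_count, max_minutes=MAX_MINUTES):
    """Return True if the trip passes the outlier checks sketched above."""
    # Assumption: only the pickup time is checked against the expected range.
    in_range = pickup.year == YEAR and FIRST_MONTH <= pickup.month <= LAST_MONTH
    minutes = trip_minutes(pickup, dropoff)
    # Assumption: zero-length trips are dropped along with negative ones.
    return in_range and 0 < minutes <= max_minutes and passenger_count > 0


# Made-up example rows: (pickup, dropoff, passenger_count)
trips = [
    (datetime(2017, 3, 1, 8, 0), datetime(2017, 3, 1, 8, 25), 1),        # kept
    (datetime(2017, 3, 1, 8, 0), datetime(2017, 3, 1, 7, 50), 1),        # negative duration
    (datetime(2016, 12, 31, 23, 0), datetime(2016, 12, 31, 23, 20), 2),  # wrong year
    (datetime(2017, 5, 2, 9, 0), datetime(2017, 5, 2, 9, 30), 0),        # zero passengers
    (datetime(2017, 6, 1, 0, 0), datetime(2017, 6, 1, 13, 0), 1),        # longer than the cap
]
clean = [t for t in trips if is_valid_trip(*t)]
```

On the full dataset the same predicate would more likely be applied as a vectorized filter; the per-row form is used here only to keep each rule visible.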

“Familiarize yourself with the data; identify potential difficulties and required cleaning/pre-processing”
● Students found that trip time correlated with features such as day of the week, pick up hour and (to a lesser extent) passenger count.
Figure by Kin Gutierrez

“Familiarize yourself with the data; identify potential difficulties and required cleaning/pre-processing”
● 99.95% of trips are between a pair of zones which has at least 5 occurrences.
Figure by Jonathon Byrd

Any other interesting findings?

“Formulate a machine learning problem that will help the domain expert achieve their goal”
● Regression problem
○ MSE
○ Root Mean Squared Log Error
■ Avoids large travel times having too large an impact
■ Penalizes underestimates more than overestimates
○ MSE weighted with an underestimate loss
○ MAE, MAPE
○ Huber loss
○ Discretized accuracy (e.g., % within some ‘d’ of actual time)
● To avoid over-penalizing some samples, we can cap the loss.
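Several of the losses listed above can be written in a few lines each. The sketch below uses only the standard library; the function names and the default values for `delta`, `cap`, and `d` are illustrative assumptions, and `log1p` is used in the RMSLE so that a zero duration stays finite.

```python
import math


def rmsle(y_true, y_pred):
    """Root mean squared log error on log(1 + x)."""
    sq = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(sq) / len(sq))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def huber(y_true, y_pred, delta=5.0):
    """Huber loss: quadratic for small errors, linear beyond delta."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        e = abs(p - t)
        total += 0.5 * e * e if e <= delta else delta * (e - 0.5 * delta)
    return total / len(y_true)


def capped_squared_error(y_true, y_pred, cap=100.0):
    """Mean squared error with each sample's contribution capped."""
    return sum(min((p - t) ** 2, cap) for t, p in zip(y_true, y_pred)) / len(y_true)


def within_d_accuracy(y_true, y_pred, d=2.0):
    """Fraction of predictions within d minutes of the actual time."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(p - t) <= d)
    return hits / len(y_true)


# For a 20-minute trip, a 10-minute underestimate costs more under RMSLE
# than a 10-minute overestimate.
under = rmsle([20.0], [10.0])
over = rmsle([20.0], [30.0])
```

Comparing `under` and `over` shows the asymmetry noted on the slide: the same absolute error is penalized more when the prediction is too low.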

“Formulate a machine learning problem that will help the domain expert achieve their goal”
● Dealing with the dataset’s size:
○ Subsample plus ensembling
○ Divide into distinct tasks (e.g., split 6am-10am predictions into own task, task per pick up location)
○ Use methods with low overhead (data + method fit in memory)
○ Use online methods (e.g., gradient descent)
● External factors:
○ Add external information about weather and holidays
● Train/Val/Test splits:
○ Strangely, people suggested random splits
○ Some suggested withholding last part only (correct!)
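A split that withholds the last part of the data, as recommended above, can be sketched as follows. The cutoff dates (train on January to May, validate on June, test on July) and the row layout are assumptions chosen for illustration.

```python
from datetime import datetime


def chronological_split(rows, train_end, val_end):
    """Split rows by pickup time: train < train_end <= val < val_end <= test.

    Each row is assumed to be a tuple whose first element is the pickup datetime.
    """
    train = [r for r in rows if r[0] < train_end]
    val = [r for r in rows if train_end <= r[0] < val_end]
    test = [r for r in rows if r[0] >= val_end]
    return train, val, test


# Made-up rows: one trip in the middle of each month, Jan-July 2017.
rows = [(datetime(2017, m, 15, 12, 0), 10.0 + m) for m in range(1, 8)]

train, val, test = chronological_split(
    rows,
    train_end=datetime(2017, 6, 1),  # assumed cutoff
    val_end=datetime(2017, 7, 1),    # assumed cutoff
)
```

Unlike a random split, every validation and test trip here happens after every training trip, which mirrors how the model would be used on future trips.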

Any comments?

“Propose a detailed analytical pipeline to solve the machine learning problem”
1. Preprocessing
○ Remove outliers
○ Extract travel time from “datetime” columns
2. Feature engineering
○ Distance between locations
○ Split “datetime” columns into day of week and hour of day.
○ Treat “vendor ID” and “payment type” columns as categorical
○ Treat “passenger count” as continuous
○ Remove “payment type” and “vendor ID”
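The preprocessing and feature-engineering steps above might look like this for a single trip. The timestamp format string, the function name, and the one-hot encoding of vendor ID are assumptions; distance between locations is omitted because the data only provides zone IDs.

```python
from datetime import datetime

# Assumed timestamp format for the pickup/dropoff columns.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def engineer_features(pickup_str, dropoff_str, vendor_id, passenger_count):
    """Build a feature dict and the travel-time target for one trip."""
    pickup = datetime.strptime(pickup_str, TIME_FORMAT)
    dropoff = datetime.strptime(dropoff_str, TIME_FORMAT)
    return {
        # Target: travel time extracted from the two datetime columns.
        "travel_minutes": (dropoff - pickup).total_seconds() / 60.0,
        # Datetime split into day of week (Monday=0) and hour of day.
        "day_of_week": pickup.weekday(),
        "hour_of_day": pickup.hour,
        # Vendor ID treated as categorical via one-hot encoding.
        "vendor_is_1": int(vendor_id == 1),
        "vendor_is_2": int(vendor_id == 2),
        # Passenger count treated as continuous.
        "passenger_count": float(passenger_count),
    }


features = engineer_features("2017-03-01 08:15:00", "2017-03-01 08:45:00", 2, 3)
```

Only the pickup time feeds the features here, since the dropoff time is not known when a prediction is requested.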

“Propose a detailed analytical pipeline to solve the machine learning problem”
3. Split into train/val/test sets
○ Normalize within each set
4. Potential methods:
○ Linear regression / polynomial regression
○ LASSO
○ Random forests
○ Gradient boosting
○ Nearest neighbor matching
○ Shallow feed-forward neural network
○ ARIMA
○ Bayesian regression (assume log-normal distribution)

“Propose a detailed analytical pipeline to solve the machine learning problem”
5. Evaluate
6. Diagnose
○ For which locations does your pipeline work well?
○ Use different stratifications
7. Iterate!
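One way to carry out the diagnosis step is to compute the error separately for each stratum, for example per pickup location. The record layout and field names below are hypothetical.

```python
from collections import defaultdict


def mae_by_group(records, group_key):
    """Mean absolute error per group.

    records: iterable of dicts with 'actual' and 'predicted' minutes plus
    whatever field is named by group_key (all field names are assumptions).
    """
    errors = defaultdict(list)
    for r in records:
        errors[r[group_key]].append(abs(r["predicted"] - r["actual"]))
    return {group: sum(e) / len(e) for group, e in errors.items()}


# Made-up predictions for two pickup zones.
records = [
    {"pu_location": 132, "actual": 40.0, "predicted": 30.0},
    {"pu_location": 132, "actual": 50.0, "predicted": 44.0},
    {"pu_location": 161, "actual": 10.0, "predicted": 11.0},
    {"pu_location": 161, "actual": 12.0, "predicted": 11.0},
]
per_zone = mae_by_group(records, "pu_location")
```

Passing a different `group_key` (hour of day, day of week) gives the other stratifications mentioned on the slide.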

Any comments?

“Design an experiment to evaluate the effectiveness of your approach”
● Baselines:
○ Most previous methods need finer location information
○ The baselines should be run on the same data
○ A common suggested approach was to use the average trip duration for each pair of pick up and drop off destinations
■ Use a global average for pairs with too little data
● Ultimately, a practitioner will have a real business need that needs to be addressed and should evaluate how the overall solution addresses these needs
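The commonly suggested baseline above can be sketched in a few lines. The function names and the `min_count=5` threshold for "too little data" are assumptions made for illustration; this is not the baseline provided for Assignment 2.

```python
from collections import defaultdict


def fit_pair_baseline(trips, min_count=5):
    """Fit per-pair mean durations with a global-mean fallback.

    trips: iterable of (pickup_zone, dropoff_zone, minutes).
    Returns (pair_means, global_mean); pairs seen fewer than min_count
    times are left out of pair_means so they fall back to the global mean.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    total, n = 0.0, 0
    for pu, do, minutes in trips:
        sums[(pu, do)] += minutes
        counts[(pu, do)] += 1
        total += minutes
        n += 1
    global_mean = total / n
    pair_means = {
        pair: sums[pair] / count
        for pair, count in counts.items()
        if count >= min_count
    }
    return pair_means, global_mean


def predict(pair_means, global_mean, pu, do):
    """Predict the pair's mean duration, or the global mean if unavailable."""
    return pair_means.get((pu, do), global_mean)


# Made-up training trips: one frequent pair and one rare pair.
train = [(1, 2, 10.0)] * 5 + [(3, 4, 40.0)]
pair_means, global_mean = fit_pair_baseline(train)
```

With this toy data, `predict(pair_means, global_mean, 1, 2)` uses the pair's own average, while the rare pair `(3, 4)` and any unseen pair fall back to the global average.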

Any comments?

Typical Steps of Applied Data Analysis
● Overview of research
● Some research questions the data might answer
● Description of data
● Data checks / transfer
● Return to questions and translating them
● Present to collaborators
-----------
● Simple methods to give preliminary answers
● Present to collaborators
-----------
● Do better / Iterate
● Present to collaborators

Assignment 2 will focus on the implementation of a preliminary pipeline
● We will provide you with a preprocessed version of the data
● We will not impose any restrictions on which pipeline you decide to implement but you can only use the given data
○ Any engineered features must come from this data
○ You should not use any external data (e.g., from other years)
● We will provide a baseline which you should beat

Assignment 2 will have two deadlines
● By the first deadline, you should have a Kaggle submission that beats our proposed baseline
○ Failing to do so will impact your grade
● By the second deadline, you should improve your model and write your report
○ This second deadline is the one previously specified in the course’s calendar
● The first deadline will be one week before the second
● The Kaggle competition is meant to incentivize you
○ Your grade will not be negatively affected based on your ranking
○ The only exception is failing to beat the given baseline

We want you guys to do great on Assignment 2!
● We will provide you with sample submissions from last semester
○ Different problem
○ Different assignment
○ Still, they give a rough idea of what we are expecting
● For the students that didn’t do so well on Assignment 1:
○ Look at the sample submissions and come to office hours
