PREDICTING CARRIER LOAD CANCELLATIONS
AUTHORS: Ali Al-Habib, Nicolas Favier
ADVISOR: Dr. Christopher Mejia
MIT Center for Transportation & Logistics
Research Fest
May 22, 2018
AGENDA
INTRODUCTION
Trucking industry background & load cancellation impacts
DATA ANALYSIS
Descriptive analytics of load cancellation over three-year dataset
MODELING
Predictive models applied on the dataset to identify main cancellation drivers
RESULTS
Model results presented as confusion matrices, followed by analysis of the results
CONCLUSION
Recommended actions and future research challenges
400 Million Truckloads
185 Million FTL Truckloads
32 Million Cancellations
~$145 per cancellation
Source: Freight Facts and Figures, by U.S. Department of Transportation Bureau of Transportation Statistics 2015; CSCMP’s Annual State of Logistics Report, by AT Kearney; & Data Analysis from the sponsor company
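As a rough back-of-the-envelope reading of the figures above, the sketch below multiplies them out. It assumes, which the slide does not state, that the 32 million cancellations are counted among the 185 million FTL truckloads and that ~$145 is an average cost that applies to every cancellation.

```python
# Figures quoted on the slide.
FTL_TRUCKLOADS = 185_000_000
CANCELLATIONS = 32_000_000
COST_PER_CANCELLATION_USD = 145  # approximate ("~$145")

# Assumption: cancellations are a subset of the FTL truckloads.
cancellation_share = CANCELLATIONS / FTL_TRUCKLOADS

# Assumption: the per-cancellation cost is an average over all cancellations.
total_cost_usd = CANCELLATIONS * COST_PER_CANCELLATION_USD

print(f"Cancellation share of FTL loads: {cancellation_share:.1%}")
print(f"Implied total cost: ${total_cost_usd / 1e9:.2f} billion")
```

Under these assumptions the share comes out at roughly 17% of FTL loads and the implied total at about $4.64 billion.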
3-YEAR DATASET: 3.6M records of full truckloads from 2015, 2016, and 2017
MAIN DRIVERS FOR TRUCKLOAD CANCELLATIONS: descriptive analytics to identify the main cancellation drivers
PREDICTIVE MODEL TO PREDICT CANCELLATION PROBABILITY: evaluation of different models to predict future load cancellations
CANCELLATION DRIVERS (cause-and-effect diagram)
Load Impact (Price, Load Characteristics, Trip Characteristics, Contract Characteristics): Day of the Week, Book Time, Load Time, Load ID, Origin, Destination, Number of Stops, Load Cost, Load Rate, Spot Price, Appointment Type, Lead Time, Empty Time, High Risk, High Value, Book Lead Time, Service Level, On-Time Delivery, On-Time Pickup, Equipment Type, Deadhead, Length of Haul, Duration, Weight, Loading Time, Unloading Time, Contract Type, Load Changes, Carrier Conference
Carrier Impact (Carrier History, Carrier Characteristics, Carrier Issues): Carrier Size, Carrier Type, Loads/Year, Bounce/Carrier, Carrier ID, Carrier Length of Relationship, Safety Rating, Number of Claims/Incidence
Shipper Impact (Facility, Shipper Characteristics, Shipment History): Shipper ID, Facility Industry, Shipper Length of Relationship, Shipper Size, Facility Dwell Time, Shipments/Year
Other Impacts (Internal Factors, External Factors): Carrier Rep, Rep Tenure, Weather, Natural Disaster, Geography
Cancellation Ratios over time
[Chart: monthly Contract, Spot, and Total Cancellation Ratios, 2015-1 through 2017-10; vertical axis 0% to 25%]
Loads & Cancellation Ratios by city
Cancellation Ratios by shipper industry
Cancellation Ratios by carrier length
Cancellation Ratios by duration between booking & load pickup
Cancellation Ratios by pickup time
Cancellation Ratios by day of the week
Correlation
Remove correlated attributes using Correlation & Multi-Collinearity Analysis
Outliers Processing
Remove outlier records to avoid undesired impact
Load-Level Data
Convert data from stop to load level data
Predictor Screening
Identify the most significant predictors in the data
Build the Model
Build multiple models to predict cancellations & assess results
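The first three preparation steps above (load-level conversion, outlier removal, correlation filtering) can be sketched as follows. This is a minimal illustration only: the slides do not say which methods were used, so the column names, the toy rows, the 1.5 x IQR outlier rule, and the 0.9 correlation cutoff are all assumptions made for the example.

```python
from math import sqrt
from statistics import quantiles


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical stop-level rows: (load_id, miles, weight_lb, cost_usd).
stops = [
    ("L1", 100, 20_000, 300), ("L1", 150, 20_000, 450),
    ("L2", 400, 35_000, 1_200),
    ("L3", 250, 30_000, 750), ("L3", 50, 30_000, 150),
    ("L4", 9_000, 25_000, 27_000),  # implausibly long haul
    ("L5", 320, 15_000, 960),
    ("L6", 280, 40_000, 840),
]

# 1. Load-level data: collapse stop rows into one record per load.
loads = {}
for load_id, miles, weight, cost in stops:
    rec = loads.setdefault(load_id, {"miles": 0, "weight_lb": 0, "cost_usd": 0})
    rec["miles"] += miles
    rec["cost_usd"] += cost
    rec["weight_lb"] = max(rec["weight_lb"], weight)

# 2. Outliers processing: drop loads outside the 1.5 x IQR fences on miles
#    (an assumed rule, not necessarily the one used in the study).
q1, _, q3 = quantiles([rec["miles"] for rec in loads.values()], n=4)
lower_fence, upper_fence = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
kept_loads = {
    load_id: rec for load_id, rec in loads.items()
    if lower_fence <= rec["miles"] <= upper_fence
}

# 3. Correlation: keep an attribute only if it is not strongly correlated
#    (|r| > 0.9, an assumed cutoff) with an attribute already kept.
kept_features = []
for name in ("miles", "weight_lb", "cost_usd"):
    column = [rec[name] for rec in kept_loads.values()]
    correlated = any(
        abs(pearson(column, [rec[k] for rec in kept_loads.values()])) > 0.9
        for k in kept_features
    )
    if not correlated:
        kept_features.append(name)

print(sorted(kept_loads))
print(kept_features)
```

In this toy data the cost column is exactly proportional to miles, so it is dropped as redundant, and load L4 is removed as an outlier.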
LOGISTIC REGRESSION: Categorical Output; Self-Explanatory; Used as Main Model
MACHINE LEARNING (NEURAL NETWORKS, RANDOM FOREST, K-NEAREST NEIGHBOR): Multiple Algorithms; Harder to Explain; Used to Validate Logistic Regression Results
MODEL RESULTS AND PREDICTOR SCREENING: AVAILABLE DATASET
Logistic regression confusion matrix:
              Predicted No   Predicted Yes     Total
Actual No          652,501           2,956   655,457
Actual Yes         129,727           1,971   131,698
Total              782,228           4,927   787,155
Error: 16.86%; Missed Bounces: 98.50%
Model                 Error %   Missed Bounces
Neural Networks        16.73%           99.95%
Random Forest          16.61%           99.48%
K-Nearest Neighbor     19.90%           84.44%
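The two summary figures can be reproduced from the four cells of the logistic regression confusion matrix above. The sketch below assumes that "Error" means misclassified loads over all loads and that "Missed Bounces" means actual cancellations predicted as "No" over all actual cancellations; the function and variable names are illustrative.

```python
def confusion_metrics(tn, fp, fn, tp):
    """Return (error rate, missed-bounce rate) from confusion-matrix counts."""
    total = tn + fp + fn + tp
    error_rate = (fp + fn) / total      # share of all loads misclassified
    missed_bounces = fn / (fn + tp)     # share of actual bounces not flagged
    return error_rate, missed_bounces


# Counts from the available-dataset confusion matrix.
error_rate, missed_bounces = confusion_metrics(
    tn=652_501, fp=2_956, fn=129_727, tp=1_971
)
print(f"Error: {error_rate:.2%}  Missed Bounces: {missed_bounces:.2%}")

# For context: the error of a model that always predicts "No".
always_no_error = 131_698 / 787_155
print(f"Always-'No' baseline error: {always_no_error:.2%}")
```

The baseline line shows why the error rate alone says little here: predicting "No" for every load already yields an error of about 16.73%, so the missed-bounce rate is the more informative figure.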
ENRICHED DATASET: CANCELLATION RATIOS and SEVERE WEATHER DATA*
Example: Carrier (80887) & City (Rochelle): Bounce Ratio = 1/12 = 0.08333
Aggregated CarrierCity Bounce Ratio: average of the CarrierCity Bounce Ratio for each stop
Repeated loads are counted only once for the ratio calculation
*Source: National Centers for Environmental Information
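The carrier-city bounce ratio described above can be sketched as follows. The row layout, the function names, the second city, and the default of 0.0 for unseen carrier-city pairs are assumptions for illustration; only the 1/12 example for carrier 80887 in Rochelle comes from the slide.

```python
from collections import defaultdict


def carrier_city_bounce_ratios(stop_rows):
    """Map (carrier_id, city) to bounced loads / distinct loads.

    Each row is (carrier_id, city, load_id, bounced). A load that appears
    on several rows for the same pair is counted only once.
    """
    loads_seen = defaultdict(set)
    bounced_seen = defaultdict(set)
    for carrier_id, city, load_id, bounced in stop_rows:
        key = (carrier_id, city)
        loads_seen[key].add(load_id)
        if bounced:
            bounced_seen[key].add(load_id)
    return {
        key: len(bounced_seen[key]) / len(load_ids)
        for key, load_ids in loads_seen.items()
    }


def aggregated_ratio(carrier_id, stop_cities, ratios, default=0.0):
    """Average the carrier-city ratio over the stops of one load."""
    values = [ratios.get((carrier_id, city), default) for city in stop_cities]
    return sum(values) / len(values)


# Toy history: 12 distinct Rochelle loads for carrier 80887, one bounced.
rows = [(80887, "Rochelle", f"R{i}", i == 0) for i in range(12)]
rows.append((80887, "Rochelle", "R0", True))  # repeated load, counted once
# Hypothetical second city: 4 distinct loads, one bounced.
rows += [(80887, "Chicago", f"C{i}", i == 0) for i in range(4)]

ratios = carrier_city_bounce_ratios(rows)
print(round(ratios[(80887, "Rochelle")], 5))

# Load-level feature for a load with stops in both cities.
load_feature = aggregated_ratio(80887, ["Rochelle", "Chicago"], ratios)
print(round(load_feature, 5))
```

Using sets of load IDs is one simple way to make sure a repeated load does not inflate either the numerator or the denominator.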
MODEL RESULTS AND PREDICTOR SCREENING: ENRICHED DATASET
Logistic regression confusion matrix:
              Predicted No   Predicted Yes     Total
Actual No          638,652          16,880   655,532
Actual Yes          52,155          79,468   131,623
Total              690,807          96,348   787,155
Error: 8.77%; Missed Bounces: 39.62%
Model                 Error %   Missed Bounces
Neural Networks         8.67%           39.04%
Random Forest           8.70%           42.13%
K-Nearest Neighbor      9.33%           44.32%
NEW DATASET (additional 3-month data) compared with the ENRICHED DATASET
Diagram labels: Dataset (~3-year data); Training (80%); Testing (20%); Cancellation Ratios Calculation (100%); Ratios; Additional 3-month data
Logistic regression confusion matrix, NEW DATASET:
              Predicted No   Predicted Yes     Total
Actual No           59,883           3,735    63,618
Actual Yes           8,903           1,722    10,625
Total               68,786           5,457    74,243
Error: 17.02%; Missed Bounces: 83.79%
For comparison, ENRICHED DATASET: Error: 8.77%; Missed Bounces: 39.62%
Model                 Error %   Missed Bounces
Neural Networks        16.78%           84.70%
Random Forest          16.19%           87.98%
K-Nearest Neighbor     16.41%           86.66%
PREDICTION TIME HORIZON
Additional 3-month data: 7-day Horizon (3%)
              Predicted No   Predicted Yes     Total
Actual No            2,147              31     2,178
Actual Yes             176              44       220
Total                2,323              75     2,398
Error: 8.63%; Missed Bounces: 80.00%
AVAILABLE HISTORICAL DATA
Additional 3-month data: <= 10 Historical Records (67%); > 10 Historical Records (33%)
              Predicted No   Predicted Yes     Total
Actual No           21,449             368    21,817
Actual Yes           2,222             542     2,764
Total               23,671             910    24,581
Error: 10.54%; Missed Bounces: 80.39%
Scenario                                              Test Error   Missed Bounces
Logistic Regression (Threshold=0.5) - Base Scenario       17.02%           83.79%
Cost Clustering
  Low Cost (<= $500)                                      18.20%           99.06%
  Mid Cost                                                16.67%           98.46%
  High Cost (>= $6000)                                     8.49%          100.00%
Miles Clustering
  Same day delivery (<= 250 mi)                           16.07%           99.18%
  Next Day delivery                                       18.08%           98.18%
  Long Haul (>= 550 mi)                                   18.08%           98.18%
Book To Pickup Hours Clustering
  Less than 24h                                            8.53%          100.00%
  Between 24h and 48h                                     16.91%          100.00%
  Between 48h and 72h                                     20.58%           99.99%
  More than 72h                                           22.33%           99.58%
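The clustering scenarios above segment loads by cost, miles, and book-to-pickup hours before a separate model is evaluated on each segment. A minimal sketch of the segment assignment follows; the function names and example loads are hypothetical, the Mid Cost and Next Day ranges are inferred as whatever lies between the stated limits, and the treatment of loads falling exactly on 24h, 48h, or 72h is an assumption.

```python
def cost_cluster(cost_usd):
    """Cost segment, using the limits shown in the table."""
    if cost_usd <= 500:
        return "Low Cost"
    if cost_usd >= 6000:
        return "High Cost"
    return "Mid Cost"  # assumed: everything between the two limits


def miles_cluster(miles):
    """Distance segment, using the limits shown in the table."""
    if miles <= 250:
        return "Same day delivery"
    if miles >= 550:
        return "Long Haul"
    return "Next Day delivery"  # assumed: everything between the two limits


def book_to_pickup_cluster(hours):
    """Lead-time segment; boundary handling at 24/48/72h is an assumption."""
    if hours < 24:
        return "Less than 24h"
    if hours < 48:
        return "Between 24h and 48h"
    if hours <= 72:
        return "Between 48h and 72h"
    return "More than 72h"


# Hypothetical loads: (cost_usd, miles, book_to_pickup_hours).
example_loads = [(450, 120, 10), (2_300, 400, 30), (7_500, 900, 96)]
labels = [
    (cost_cluster(cost), miles_cluster(miles), book_to_pickup_cluster(hours))
    for cost, miles, hours in example_loads
]
for row in labels:
    print(row)
```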
[Chart: FN (Missed Bounces), FP (Missed Not Bounces), and Bounces Predicted Correctly (%) plotted against Threshold (0.20 to 0.60); axes: Loads (10,000 to 40,000) and % of Bounces predicted correctly of total Bounces (0% to 80%)]
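The trade-off shown in the chart can be sketched by sweeping the classification threshold over predicted cancellation probabilities. The probabilities and labels below are made up for illustration, and the function name is hypothetical; the sketch only shows how FN, FP, and the share of bounces predicted correctly move as the threshold changes.

```python
def sweep_thresholds(probabilities, actual, thresholds):
    """Return (threshold, FN, FP, share of bounces caught) for each threshold."""
    rows = []
    for threshold in thresholds:
        # A load is predicted to bounce when its probability meets the threshold.
        predicted = [p >= threshold for p in probabilities]
        fn = sum(a and not p for a, p in zip(actual, predicted))
        fp = sum(p and not a for a, p in zip(actual, predicted))
        tp = sum(a and p for a, p in zip(actual, predicted))
        caught = tp / (tp + fn) if (tp + fn) else 0.0
        rows.append((threshold, fn, fp, caught))
    return rows


# Hypothetical predicted cancellation probabilities and true outcomes.
probabilities = [0.10, 0.25, 0.35, 0.45, 0.55, 0.65, 0.30, 0.15]
actual = [False, False, True, True, True, True, False, False]

rows = sweep_thresholds(probabilities, actual, [0.2, 0.3, 0.4, 0.5, 0.6])
for threshold, fn, fp, caught in rows:
    print(f"threshold={threshold:.2f}  FN={fn}  FP={fp}  caught={caught:.0%}")
```

Lowering the threshold catches more bounces at the cost of more false alarms, which is the direction of change the chart illustrates.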
THRESHOLD CHANGE FURTHER RESEARCH
(predicted cancellation : actual cancellation)
LOAD SEQUENCE SCENARIO
OVERBOOKING SCENARIO
[Diagram labels: COMPANY A, COMPANY B, COMPANY C, SELECTED ROUTE]