Applying Cost Curves to Marine Cargo Container Inspection

SLIDE 1

Applying Cost Curves to Marine Cargo Container Inspection

DIMACS / DyDAn / LPS Workshop on Port Security, Safety, Inspection, Risk Analysis and Modeling

Rutgers University, November 18, 2008

Richard Hoshino, Laboratory and Scientific Services Directorate, Canada Border Services Agency

SLIDE 2

Context

  • Each year in Canada, approximately 20,000 marine containers are referred for a full examination.
  • Some of these containers have been fumigated with chemical compounds to kill invasive alien species.
  • If these marine containers are not ventilated properly, fumigants may pose a risk to the health and safety of border service officers.

SLIDE 3

Flowchart of Current Process

[Figure: flowchart of the current process. Nodes: Start, Test, Ventilate, Test, Exam, Exam. Branch labels: Positive, Negative.]

SLIDE 4

A Simple Yet Powerful Insight

  • We can create a mathematical model that predicts whether a container has been fumigated.
  • For containers predicted to have been fumigated, we ventilate prior to testing.
  • Deploying a reliable binary classifier would reduce the overall costs of inspection, creating a more efficient and effective port.

SLIDE 5

Flowchart of Proposed Process

[Figure: flowchart of the proposed process. Nodes: Start, Model, Test, Ventilate, Test, Exam, Exam. Branch labels: Predict Fumigated, Predict Non-Fumigated, Positive, Negative.]

  • Predict Non-Fumigated: Test first
  • Predict Fumigated: Ventilate first

SLIDE 6

Misclassification Cost

  • Here #P and #N are the numbers of fumigated (positive) and non-fumigated (negative) containers, $C− is the cost of a false negative, and $C+ is the cost of a false positive.
  • The misclassification cost of the Status Quo is

M1 = #P × $C−

  • The misclassification cost of the Binary Classifier is

M2 = #FN × $C− + #FP × $C+
   = (FNR × #P) × $C− + (FPR × #N) × $C+
   = (1 − TPR) × #P × $C− + FPR × #N × $C+

  • Given a predictive model, its optimal binary classifier is the classifier that minimizes the misclassification cost M2.
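The two cost formulas on this slide can be checked numerically. The following is a minimal Python sketch; the function names and the example counts and dollar values are hypothetical, chosen only for illustration, and are not taken from the presentation.

```python
def status_quo_cost(n_pos, cost_fn):
    """M1 = #P * C-.

    The status quo treats every container as non-fumigated, so each of
    the #P fumigated containers incurs the false-negative cost.
    """
    return n_pos * cost_fn


def classifier_cost(n_pos, n_neg, tpr, fpr, cost_fn, cost_fp):
    """M2 = (1 - TPR) * #P * C- + FPR * #N * C+."""
    false_negatives = (1.0 - tpr) * n_pos
    false_positives = fpr * n_neg
    return false_negatives * cost_fn + false_positives * cost_fp


# Hypothetical numbers, for illustration only.
m1 = status_quo_cost(n_pos=100, cost_fn=400.0)
m2 = classifier_cost(n_pos=100, n_neg=600, tpr=0.5, fpr=0.25,
                     cost_fn=400.0, cost_fp=100.0)
print(m1, m2)
```

A classifier that never predicts "fumigated" (TPR = 0, FPR = 0) gives M2 = M1, which matches the "Same as Status Quo" case with FPR = 0 and FNR = 1.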

SLIDE 7

The Improvement Curve

  • We introduce the improvement curve, inspired by the theory of cost curves (Drummond & Holte, 2000).
  • Improvement curves measure a model’s performance
    – Over all possible class distributions (#P vs. #N)
    – Over all possible misclassification costs ($C− and $C+)
  • Define the improvement to be I = (M1 − M2) ÷ M1.

[Figure: improvement scale from 0% to 100%. 0% = same as Status Quo (FPR = 0, FNR = 1); 100% = Perfect Model (FPR = 0, FNR = 0); this model’s best classifier falls between the two.]

SLIDE 8

Definition of x-axis and y-axis

  • The x-axis of the improvement curve is the following expression, denoted as probability times cost:

x = PC(+) = (#P × $C−) ÷ (#P × $C− + #N × $C+).

  • The y-axis is the improvement, the percentage reduction in misclassification cost achieved by replacing the status quo with the model’s optimal classifier:

y = I(x) = (M1 − M2) ÷ M1 = TPR − [ FPR × (#N × $C+) ÷ (#P × $C−) ].

  • It is straightforward to show that 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1. (y ≥ 0 because the optimal classifier is never worse than the status quo, which corresponds to FPR = 0, TPR = 0.)
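Because (#N × $C+) ÷ (#P × $C−) = (1 − x) ÷ x, the y-axis can be written purely in terms of x as y = TPR − FPR × (1 − x) ÷ x. The Python sketch below checks that this form agrees with the direct definition (M1 − M2) ÷ M1. The function names and all numeric inputs are hypothetical, used only to illustrate the algebra.

```python
def pc_plus(n_pos, n_neg, cost_fn, cost_fp):
    """x-axis: PC(+) = (#P * C-) / (#P * C- + #N * C+)."""
    return (n_pos * cost_fn) / (n_pos * cost_fn + n_neg * cost_fp)


def improvement(tpr, fpr, x):
    """y-axis: I = TPR - FPR * (1 - x) / x, defined for 0 < x <= 1."""
    return tpr - fpr * (1.0 - x) / x


# Hypothetical inputs, for illustration only.
n_pos, n_neg = 200, 800
cost_fn, cost_fp = 300.0, 100.0
tpr, fpr = 0.6, 0.2

x = pc_plus(n_pos, n_neg, cost_fn, cost_fp)

# Direct definition: I = (M1 - M2) / M1.
m1 = n_pos * cost_fn
m2 = (1.0 - tpr) * n_pos * cost_fn + fpr * n_neg * cost_fp
direct = (m1 - m2) / m1

via_x = improvement(tpr, fpr, x)
print(x, direct, via_x)
```

Both routes give the same improvement up to floating-point rounding, so a curve plotted against x needs only the ROC operating points (FPR, TPR) and not the raw counts or costs.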

SLIDE 9

Illustrating the Theory

  • We built a simple predictive model based on a data set of 4,200 containers. The model’s four features are:
    – Origin Country
    – Canadian Port of Arrival
    – HS section (e.g. Section 5 = Mineral Products)
    – HS chapter (e.g. Chapter 26 = Ores, Slag, Ash)
  • The model consists of 2^4 = 16 disjoint classes.
  • The data was split 70/30 for Training/Testing.

SLIDE 10

ROC Curve of Model

[Figure: ROC curve of the model. ROC AUC: Training = 0.75, Testing = 0.74.]

SLIDE 11

Improvement Curve of Model

[Figure: improvement curve of the model. Training = Red, Testing = Blue.]
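One way to trace a curve of this kind from a model's ROC operating points is to take, for each x, the largest improvement over all available classifiers. The sketch below is an illustration under that assumption and is not necessarily how the figure on this slide was produced; the function name and the ROC points are hypothetical.

```python
def improvement_curve(roc_points, xs):
    """For each x in xs, return the best improvement over the ROC points.

    roc_points is a list of (FPR, TPR) pairs. The trivial point (0, 0),
    which reproduces the status quo, is always included, so the result
    is never below 0.
    """
    points = [(0.0, 0.0)] + list(roc_points)
    curve = []
    for x in xs:
        ratio = (1.0 - x) / x  # equals (#N * C+) / (#P * C-)
        best = max(tpr - fpr * ratio for fpr, tpr in points)
        curve.append(best)
    return curve


# Hypothetical ROC operating points, for illustration only.
roc = [(0.1, 0.4), (0.3, 0.7), (0.6, 0.9), (1.0, 1.0)]
xs = [0.1, 0.25, 0.5, 0.75, 1.0]
ys = improvement_curve(roc, xs)

for x, y in zip(xs, ys):
    print(f"x = {x:.2f}  improvement = {y:.3f}")
```

In this sketch the status quo is retained at a given x exactly when no operating point satisfies TPR > FPR × (1 − x) ÷ x, in which case the curve sits at 0.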

SLIDE 12

Improvement Curve Interpretation

  • Suppose #N ÷ #P = 6 and $C− ÷ $C+ = 4. Then,

x = PC(+) = (#P × $C−) ÷ (#P × $C− + #N × $C+) = 4 ÷ (4 + 6) = 0.4.

  • From the improvement curve, we have y = 28% (reading from the Testing Set).
  • This simple 4-feature model would have reduced our misclassification cost by 28%.

[Figure: improvement scale from 0% to 100%. 0% = same as Status Quo (FPR = 0, FNR = 1); 100% = Perfect Model (FPR = 0, FNR = 0); this model’s best classifier (FPR = 20.3%, TPR = 62.5%) is marked at 28%.]
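The arithmetic on this slide can be reproduced in a few lines of Python. The ratios and the operating point are the ones quoted on the slide; the variable names are hypothetical. Evaluating the y-axis formula at the quoted operating point gives roughly 32%, somewhat above the 28% read from the testing curve. The slides do not state which data split the quoted operating point comes from, so both figures are left as presented.

```python
n_over_p = 6.0        # #N / #P, from the slide
cneg_over_cpos = 4.0  # $C- / $C+, from the slide

# Divide numerator and denominator of PC(+) by (#P * $C+).
x = cneg_over_cpos / (cneg_over_cpos + n_over_p)

# Operating point quoted on the slide for the model's best classifier.
fpr, tpr = 0.203, 0.625

# y = TPR - FPR * (#N * C+) / (#P * C-) = TPR - FPR * (1 - x) / x
y_at_point = tpr - fpr * (1.0 - x) / x

print(f"x = {x:.2f}")
print(f"y at quoted operating point = {y_at_point:.4f}")
```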

SLIDE 13

Conclusion

  • The Improvement Curve does the following:
    – Addresses the limitations of ROC curves and the ROC AUC.
    – Measures performance over all possible values of PC(+).
    – Determines a simple condition for when the status quo should be retained.
    – Compares the optimal classifiers of two predictive models.
  • The Improvement Curve is an evaluation metric that
    – Is extremely accessible to a non-specialist.
    – Has numerous applications to operations research beyond marine container inspection.