SLIDE 1

EXPLAINING DATASETS THROUGH HIGH-ACCURACY REGIONS

Ina Fiterau, Carnegie Mellon University Artur Dubrawski, Carnegie Mellon University

Women in Machine Learning Workshop 12th of December 2011


Work under review at the SIAM Data Mining Conference

SLIDE 2

OUTLINE

 Motivation: the need for interpretability
 Explanation-Oriented Partitioning (EOP)
 Evaluation of EOP

SLIDE 3

EXAMPLE APPLICATION: NUCLEAR THREAT DETECTION

 Border control: vehicles are scanned
 A human in the loop interprets the results

[Diagram: vehicle → scan → prediction → feedback]

SLIDE 4

BOOSTED DECISION STUMPS

 Accurate, but hard to interpret

How is the prediction derived from the input? (Image obtained with the AdaBoost applet.)

SLIDE 5

DECISION TREE – MORE INTERPRETABLE

[Decision tree: Radiation > x%? → Payload type = ceramics? → Uranium level > max. admissible for ceramics? → consider balance of Th232, Ra226, and Co60 → Threat / Clear]

SLIDE 6

MOTIVATION

Many users are willing to trade accuracy for a better understanding of the system's results

 Need: a simple, interpretable model
 Need: an explanatory prediction process

SLIDE 7

EXPLANATION-ORIENTED PARTITIONING (EOP)

SLIDE 8

EXPLANATION-ORIENTED PARTITIONING (EOP) EXECUTION EXAMPLE – 3D DATA

[(X,Y) scatter plots of the 3D data: two Gaussians and a uniform cube]

SLIDE 9

EOP EXECUTION EXAMPLE – 3D DATA

Step 1: Select a projection - (X1,X2)
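The slides do not spell out how the projection is chosen, so the sketch below is only one plausible illustration: score every 2D feature pair by how well a simple axis-aligned stump does on it and keep the best pair. `stump_accuracy` and `select_projection` are hypothetical names, and EOP may use a different base classifier and criterion.

```python
from itertools import combinations

def stump_accuracy(xs, ys):
    """Best training accuracy of a 1D threshold stump on values xs
    (a simple stand-in scorer; EOP can use any base classifier)."""
    best, n = 0.0, len(ys)
    for t in sorted(set(xs)):
        for sign in (1, -1):
            correct = sum((sign * (x - t) > 0) == bool(y)
                          for x, y in zip(xs, ys))
            best = max(best, correct / n)
    return best

def select_projection(X, y):
    """Step 1 sketch: pick the feature pair whose projection a simple
    classifier handles best."""
    d = len(X[0])
    score = {(i, j): max(stump_accuracy([r[i] for r in X], y),
                         stump_accuracy([r[j] for r in X], y))
             for i, j in combinations(range(d), 2)}
    return max(score, key=score.get)
```

On data where only the first feature is informative, the selected pair contains feature 0.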

SLIDE 10

EOP EXECUTION EXAMPLE – 3D DATA

Step 1: Select a projection - (X1,X2)

SLIDE 11

EOP EXECUTION EXAMPLE – 3D DATA

Step 2: Choose a good classifier - call it h1

SLIDE 12

EOP EXECUTION EXAMPLE – 3D DATA

Step 2: Choose a good classifier - call it h1

SLIDE 13

EOP EXECUTION EXAMPLE – 3D DATA

Step 3: Estimate accuracy of h1 at each point
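The slide does not say how the accuracy of h1 at each point is estimated. One simple nonparametric possibility, shown purely as an illustration (the paper's estimator may differ), is to average the classifier's 0/1 correctness indicator over each point's k nearest neighbours; `pointwise_accuracy` is a hypothetical name.

```python
import math

def pointwise_accuracy(points, correct, k=3):
    """Estimate a classifier's accuracy around each point by averaging the
    0/1 correctness indicator over its k nearest neighbours
    (illustrative estimator only)."""
    out = []
    for p in points:
        nearest = sorted(range(len(points)),
                         key=lambda j: math.dist(points[j], p))[:k]
        out.append(sum(correct[j] for j in nearest) / k)
    return out
```

Points in a cluster where h1 is right get estimates near 1, points in a cluster where it is wrong get estimates near 0.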

SLIDE 14

EOP EXECUTION EXAMPLE – 3D DATA

Step 3: Estimate accuracy of h1 for each point

SLIDE 15

EOP EXECUTION EXAMPLE – 3D DATA

Step 4: Identify high-accuracy regions

SLIDE 16

EOP EXECUTION EXAMPLE – 3D DATA

Step 4: Identify high-accuracy regions

SLIDE 17

EOP EXECUTION EXAMPLE – 3D DATA

Step 5: Training points in the identified regions are removed from consideration

SLIDE 18

EOP EXECUTION EXAMPLE – 3D DATA

Step 5: Training points in the identified regions are removed from consideration

SLIDE 19

EOP EXECUTION EXAMPLE – 3D DATA

Finished first iteration

SLIDE 20

EOP EXECUTION EXAMPLE – 3D DATA

Iterate until all data is accounted for or the error cannot be decreased
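Putting steps 1-5 together, the loop described on the preceding slides can be sketched as below. The three helper callables are placeholders for the components the slides describe (projection choice, classifier fitting, region finding); all names here are hypothetical, not the paper's API.

```python
def eop_fit(X, y, select_projection, fit_classifier, find_region,
            max_iters=20):
    """Sketch of the EOP loop: repeatedly pick a projection, fit a
    classifier on it, keep its high-accuracy region, and drop the covered
    training points; stop when all data is accounted for or no region can
    be found (i.e. the error can no longer be decreased)."""
    model, remaining = [], list(range(len(y)))
    for _ in range(max_iters):
        if not remaining:                      # all data accounted for
            break
        proj = select_projection(X, y, remaining)
        h = fit_classifier(X, y, remaining, proj)
        region = find_region(X, y, remaining, proj, h)
        covered = [i for i in remaining if region(X[i])]
        if not covered:                        # error cannot be decreased
            break
        model.append((proj, h, region))
        remaining = [i for i in remaining if i not in covered]
    return model, remaining
```

With trivial stand-in helpers on a toy 1D problem, one iteration covers everything and a one-piece model comes back.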

SLIDE 21

LEARNED MODEL – PROCESSING QUERY [X1X2X3]

[x1,x2] in R1?  yes → h1(x1,x2)
 no → [x2,x3] in R2?  yes → h2(x2,x3)
  no → [x1,x3] in R3?  yes → h3(x1,x3)
   no → Default Value
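The learned model is an ordered decision list: test the regions in turn, and let the first region that contains the projected query supply the prediction. A minimal sketch (hypothetical names; classifiers and regions are represented as plain callables):

```python
def eop_predict(model, x, default=0):
    """Process a query as in the slide: the first region containing the
    projected query supplies the prediction; otherwise fall back on the
    default value."""
    for proj, h, region in model:
        xp = tuple(x[i] for i in proj)   # e.g. project [x1,x2,x3] onto (x1,x2)
        if region(xp):
            return h(xp)
    return default

# toy model: region R1 over (x1,x2), region R2 over (x2,x3)
model = [((0, 1), lambda p: 1, lambda p: p[0] > 0),
         ((1, 2), lambda p: 0, lambda p: p[1] > 0)]
```

A query matching R1 gets h1's answer, one matching only R2 gets h2's, and one matching neither gets the default.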

SLIDE 22

PARAMETRIC REGIONS OF HIGH CONFIDENCE (BOUNDING POLYHEDRA)

 Enclose points in simple convex shapes (multiple per iteration)

Grow contour while train error is ≤ ε

SLIDE 23

PARAMETRIC REGIONS OF HIGH CONFIDENCE (BOUNDING POLYHEDRA)

 Enclose points in simple convex shapes (multiple per iteration)

Grow contour while train error is ≤ ε

 Calibration on a hold-out set - remove shapes that contain no calibration points or over which the classifier is not accurate

SLIDE 24

PARAMETRIC REGIONS OF HIGH CONFIDENCE (BOUNDING POLYHEDRA)

 Enclose points in simple convex shapes (multiple per iteration)

Grow contour while train error is ≤ ε

 Calibration on a hold-out set - remove shapes that contain no calibration points or over which the classifier is not accurate
 Intuitive, visually appealing shapes - hyper-rectangles/spheres
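A rough sketch of the "grow the contour while the train error is ≤ ε" step for hyper-rectangles, one of the shapes the slides mention: start from the correctly classified point nearest the seeds' centre, enlarge the box point by point, and stop as soon as the training error inside it would exceed ε. The exact growth schedule in the paper may differ; `grow_rectangle` is a hypothetical name.

```python
import math

def grow_rectangle(points, correct, eps=0.05):
    """Grow an axis-aligned box outward from the centre of the correctly
    classified points, keeping each enlargement only while the training
    error inside the box stays <= eps."""
    seeds = [p for p, c in zip(points, correct) if c]
    centre = [sum(v) / len(seeds) for v in zip(*seeds)]
    order = sorted(range(len(points)),
                   key=lambda i: math.dist(points[i], centre))
    lo = list(points[order[0]])
    hi = list(points[order[0]])
    box = None
    for i in order:
        cand_lo = [min(a, v) for a, v in zip(lo, points[i])]
        cand_hi = [max(a, v) for a, v in zip(hi, points[i])]
        inside = [j for j in range(len(points))
                  if all(cand_lo[d] <= points[j][d] <= cand_hi[d]
                         for d in range(len(cand_lo)))]
        err = sum(1 for j in inside if not correct[j]) / len(inside)
        if err > eps:
            break                    # stop growing: error would exceed eps
        lo, hi = cand_lo, cand_hi
        box = (tuple(lo), tuple(hi))
    return box
```

On a line of three correctly classified points with one misclassified outlier, the box stops just before swallowing the outlier.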

SLIDE 25

OUTLINE

 Motivation: the need for interpretability
 Explanation-Oriented Partitioning (EOP)
 Evaluation of EOP
 Summary

SLIDE 26

BENEFITS OF EOP

  • AVOIDING NEEDLESS COMPLEXITY

Typical XOR dataset

SLIDE 27

BENEFITS OF EOP

  • AVOIDING NEEDLESS COMPLEXITY

Typical XOR dataset

CART

  • is accurate
  • takes many iterations
  • does not uncover or leverage the structure of the data

SLIDE 28

BENEFITS OF EOP

  • AVOIDING NEEDLESS COMPLEXITY

Typical XOR dataset

EOP

  • equally accurate
  • uncovers structure
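A quick numerical check of this claim on illustrative XOR-style data: a single stump h1(x) = [x1 > 0.5] is only 60% accurate overall, yet perfectly accurate inside the region x2 < 0.5, which is exactly the kind of structure EOP keeps (the grid and region here are my own toy construction, not the paper's dataset).

```python
# XOR-style labels on a 5x5 grid over the unit square:
# label 1 iff exactly one coordinate exceeds 0.5
data = [((a / 4, b / 4), int((a / 4 > 0.5) != (b / 4 > 0.5)))
        for a in range(5) for b in range(5)]

stump = lambda x: int(x[0] > 0.5)          # h1: threshold on x1 alone

acc_all = sum(stump(x) == y for x, y in data) / len(data)
in_r1 = [(x, y) for x, y in data if x[1] < 0.5]   # region R1: x2 < 0.5
acc_r1 = sum(stump(x) == y for x, y in in_r1) / len(in_r1)
```

Two stumps with their two regions thus cover the XOR data exactly, mirroring the two EOP iterations shown.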

[Plots: EOP regions after Iteration 1 and Iteration 2 on the XOR data]
SLIDE 29

COMPARISON TO BOOSTING

 What is the price of understandability?
 Why boosting?
   it is an [arguably] good black-box classifier
   it learns an ensemble using any type of base classifier
   it iteratively targets data misclassified earlier
 Criterion: complexity of the resulting model = the number of vector operations needed to make a prediction

SLIDE 30

COMPARISON TO BOOSTING - SETUP

 Problem: binary classification
 10D Gaussians/uniform cubes for each class
 Statistical significance: repeat the experiment with several datasets and compute paired t-test p-values

 Results obtained through 5-fold cross validation
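The paired t-test referred to above compares two methods' scores on the same datasets. A minimal sketch of the statistic follows; the accuracy values are made up for illustration, and in practice one would use e.g. `scipy.stats.ttest_rel`, which also returns the p-value.

```python
import math

def paired_t_statistic(a, b):
    """t = mean(d) / (sd(d) / sqrt(n)), where d holds the per-dataset
    differences between the two methods' scores."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)   # sample variance
    return mean / math.sqrt(var / n)

# hypothetical accuracies of two methods on five paired datasets
boosting = [0.95, 0.93, 0.96, 0.94, 0.95]
eop      = [0.93, 0.92, 0.93, 0.92, 0.93]
t = paired_t_statistic(boosting, eop)
```

A large |t| over the paired differences is what turns into the small complexity p-values reported on the next slides.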

SLIDE 31

EOP VS ADABOOST - SVM BASE CLASSIFIERS

 EOP is often slightly less accurate, but not significantly so
 the reduction in complexity is statistically significant

Accuracy p-value: 0.832 Complexity p-value: 0.003

[Charts over 10 datasets: accuracy (0.85-1.0) and complexity (100-300 vector operations) for Boosting vs. EOP (nonparametric)]

SLIDE 32

EOP (STUMPS AS BASE CLASSIFIERS) VS CART - DATA FROM THE UCI REPOSITORY

[Charts: accuracy and complexity on BCW, MB, V, and BT for CART, EOP N. (nonparametric), and EOP P. (parametric)]

32

Dataset         # of Features   # of Points
Breast Tissue   10              1006
Vowel           9               990
MiniBOONE       10              5000
Breast Cancer   10              596

 Parametric EOP yields the simplest models
 CART is the most accurate

SLIDE 33

EXPLAINING REAL DATA - SPAMBASE

1st Iteration
 the classifier labels everything as spam
 the high-confidence regions do enclose mostly spam; within them:
   the incidence of the word ‘your’ is low
   the length of text in capital letters is high

SLIDE 34

EXPLAINING REAL DATA - SPAMBASE

2nd Iteration
 the threshold for the incidence of ‘your’ is lowered
 the required incidence of capitals is increased
 the square region on the left also encloses examples that will be marked as ‘not spam’

SLIDE 35

EXPLAINING REAL DATA - SPAMBASE

3rd Iteration
 the classifier marks everything as spam
 the frequency of ‘your’ and ‘hi’ determines the regions

SLIDE 36

SUMMARY

 EOP maintains classification accuracy but uses less complex models when compared to Boosting
 EOP with decision stumps finds less complex models than CART at the price of a small decrease in accuracy
 EOP gives interpretable high-accuracy regions
 We are currently testing EOP in a range of practical application scenarios

SLIDE 37

THANK YOU

SLIDE 38

EXTRA RESULTS

SLIDE 39

EXPLAINING REAL DATA - FUEL
