Trade-offs in Explanatory Model Learning - Data Analysis Project, Madalina Fiterau (DAP) - PowerPoint PPT Presentation




SLIDE 1

Trade-offs in Explanatory Model Learning

Data Analysis Project
Madalina Fiterau
DAP Committee: Artur Dubrawski, Jeff Schneider, Geoff Gordon
21st of February 2012

1

SLIDE 2

Outline

  • Motivation: need for interpretable models
  • Overview of data analysis tools
  • Model evaluation – accuracy vs complexity
  • Model evaluation – understandability
  • Example applications
  • Summary

2

SLIDE 3

Example Application: Nuclear Threat Detection

  • Border control: vehicles are scanned
  • Human in the loop interpreting results

[Diagram: vehicle scan → prediction → feedback]

3

SLIDE 4

Boosted Decision Stumps

  • Accurate, but hard to interpret

How is the prediction derived from the input?

4
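To see why boosted stumps resist interpretation, here is a minimal sketch (not the talk's actual model): the prediction is the sign of a weighted vote over many one-feature threshold rules, so no single rule explains the answer. The weights and thresholds below are hypothetical.

```python
# Illustrative boosted-stumps predictor: a weighted vote of one-feature
# threshold rules, as an AdaBoost-style learner would produce.

def stump(feature_idx, threshold):
    """One-feature threshold rule returning +1 or -1."""
    return lambda x: 1 if x[feature_idx] > threshold else -1

# Hypothetical (weight, stump) pairs standing in for a learned ensemble.
ensemble = [
    (0.9, stump(0, 0.5)),
    (0.6, stump(1, 1.2)),
    (0.4, stump(0, 2.0)),
]

def predict(x):
    score = sum(w * s(x) for w, s in ensemble)  # weighted vote
    return 1 if score > 0 else -1

print(predict([1.0, 0.0]))  # prints -1: the votes sum to 0.9 - 0.6 - 0.4 < 0
```

With hundreds of stumps instead of three, tracing which rules drove a given prediction becomes impractical, which is the slide's point.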

SLIDE 5

Decision Tree – More Interpretable

[Decision tree diagram. Nodes: Radiation > x%? / Payload type = ceramics? / Uranium level > max. admissible for ceramics? / Consider balance of Th232, Ra226 and Co60. Leaves: Threat, Clear]

5
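The tree on this slide reads naturally as a chain of if/else tests. The sketch below is a hedged paraphrase: the branch order, threshold value, and the isotope-balance test are assumptions for illustration, not the talk's actual rules.

```python
# Hedged sketch of the slide's threat-detection tree; thresholds and the
# exact branch structure are illustrative assumptions.

RADIATION_THRESHOLD = 5.0  # hypothetical stand-in for the slide's "x%"

def classify(radiation_pct, payload, uranium_level, max_u_ceramics,
             isotope_balance_ok):
    if radiation_pct <= RADIATION_THRESHOLD:   # "Radiation > x%?" -> no
        return "Clear"
    if payload == "ceramics":                  # "Payload type = ceramics?"
        if uranium_level > max_u_ceramics:     # above admissible level
            return "Threat"
        return "Clear"
    # Otherwise weigh the balance of Th232, Ra226 and Co60.
    return "Clear" if isotope_balance_ok else "Threat"

print(classify(1.0, "ceramics", 0.2, 0.5, True))  # prints Clear (low radiation)
```

Each prediction comes with the path of tests that produced it, which is exactly the interpretability the boosted ensemble lacks.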

SLIDE 6

Motivation

6

Many users are willing to trade accuracy for a better understanding of the results the system yields

Need: simple, interpretable model
Need: explanatory prediction process

SLIDE 7

Analysis Tools – Black-box

Random Forests

  • Very accurate tree ensemble
  • L. Breiman, ‘Random Forests’, 2001

Boosting

  • Guarantee: decreases training error
  • R. Schapire, ‘The boosting approach to machine learning’

Multi-boosting

  • Bagged boosting
  • G. Webb, ‘MultiBoosting: A Technique for Combining Boosting and Wagging’

7

SLIDE 8

Analysis Tools – White-box

CART

  • Decision tree based on the Gini impurity criterion

Feating

  • Decision tree with leaf classifiers
  • K. Ting, G. Webb, ‘FaSS: Ensembles for Stable Learners’

Subspacing

  • Ensemble: each discriminator trained on a random subset of features
  • R. Bryll, ‘Attribute bagging’

EOP

  • Builds a decision list that selects the classifier to deal with a query point

8

SLIDE 9

Explanation-Oriented Partitioning

[Figure: (X,Y) scatter plots of synthetic data: 2 Gaussians and a uniform cube]

9

SLIDE 10

EOP Execution Example – 3D data

Step 1: Select a projection - (X1,X2)

10

SLIDE 11

Step 1: Select a projection - (X1,X2)

11

EOP Execution Example – 3D data

SLIDE 12

Step 2: Choose a good classifier - call it h1

12

EOP Execution Example – 3D data

SLIDE 13

Step 2: Choose a good classifier - call it h1

13

EOP Execution Example – 3D data

SLIDE 14

Step 3: Estimate accuracy of h1 at each point

[Figure legend: OK / NOT OK]

14

EOP Execution Example – 3D data

SLIDE 15

Step 3: Estimate accuracy of h1 at each point

15

EOP Execution Example – 3D data

SLIDE 16

Step 4: Identify high accuracy regions

16

EOP Execution Example – 3D data

SLIDE 17

Step 4: Identify high accuracy regions

17

EOP Execution Example – 3D data

SLIDE 18

Step 5: Training points in the region are removed from consideration

18

EOP Execution Example – 3D data

SLIDE 19

19

Step 5: Training points in the region are removed from consideration

EOP Execution Example – 3D data

SLIDE 20

Finished first iteration

20

EOP Execution Example – 3D data

SLIDE 21

21

EOP Execution Example – 3D data

Finished second iteration

SLIDE 22

Iterate until all data is accounted for or the error cannot be decreased

22

EOP Execution Example – 3D data
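The five steps walked through on the preceding slides can be sketched as a training loop. This is a paraphrase under stated assumptions, not the talk's code: the function names, signatures, and the region/classifier representations are all illustrative.

```python
# Hedged sketch of the EOP training loop (steps 1-5): pick a low-dimensional
# projection, fit a simple classifier on it, keep the region where it is
# accurate, and drop the training points that region covers.
import random

def train_eop(points, labels, projections, fit, accuracy_at, find_region,
              max_iters=10):
    model = []                                    # (projection, region, classifier)
    remaining = list(range(len(points)))
    for _ in range(max_iters):
        if not remaining:                         # all data accounted for
            break
        proj = random.choice(projections)         # Step 1: select a projection
        h = fit(points, labels, remaining, proj)              # Step 2: fit h
        ok = accuracy_at(h, points, labels, remaining, proj)  # Step 3: per-point accuracy
        region = find_region(points, remaining, ok, proj)     # Step 4: high-accuracy region
        model.append((proj, region, h))
        remaining = [i for i in remaining         # Step 5: remove covered points
                     if not region([points[i][j] for j in proj])]
    return model
```

The helpers passed in (`fit`, `accuracy_at`, `find_region`) stand in for whatever base learner and region estimator are used; the loop structure is the part the slides specify.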

SLIDE 23

Learned Model – Processing query [x1 x2 x3]

[x1 x2] in R1? yes → h1(x1 x2)
  no → [x2 x3] in R2? yes → h2(x2 x3)
    no → [x1 x3] in R3? yes → h3(x1 x3)
      no → Default Value

23
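The decision list above can be sketched as a few lines of code. This is an illustrative reading, with assumed representations: a projection is a tuple of feature indices, and a region is a membership predicate over the projected point.

```python
# Sketch of how a learned EOP model answers a query: walk the decision list,
# and the first region containing the projected query picks the classifier;
# otherwise fall back to a default value. Names are illustrative.

def eop_predict(model, x, default=0):
    for projection, region, h in model:      # e.g. projection = (0, 1) for [x1, x2]
        xp = [x[i] for i in projection]      # project the query point
        if region(xp):                       # "[x1 x2] in R1?"
            return h(xp)                     # answer with that region's classifier
    return default                           # no region matched

# Toy model: one region covering x1 in [0, 1] on projection (x1, x2),
# whose classifier labels by the sign of x2.
model = [((0, 1), lambda p: 0 <= p[0] <= 1, lambda p: 1 if p[1] > 0 else -1)]
print(eop_predict(model, [0.5, 2.0, 7.0]))  # prints 1: region matches, h fires
```

Because the first matching region answers the query, the explanation for any prediction is just one region test plus one simple classifier.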

SLIDE 24

Parametric / Nonparametric Regions

Bounding Polyhedra
  • Enclose points in convex shapes (hyper-rectangles / spheres)
  • Easy to test inclusion
  • Visually appealing
  • Inflexible

Nearest-neighbor Score
  • Consider the k nearest neighbors
  • Region: { X | Score(X) > t }, t – learned threshold
  • Easy to test inclusion
  • Can look insular
  • Deals with irregularities

24

[Figure: query point p with nearest neighbors n1–n5 marked as correctly/incorrectly classified, and the resulting decision]
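A minimal sketch of the nonparametric region test, assuming one plausible scoring rule (the slide does not spell it out): score a query by the fraction of its k nearest training points the classifier got right, and include it in the region when Score(X) > t.

```python
# Nearest-neighbor score region: { X | Score(X) > t } with a learned
# threshold t. Details beyond the slide (Euclidean distance, the fraction-
# correct score) are assumptions for illustration.
import math

def knn_score(x, train, correct, k=3):
    """correct[i] is True when the classifier was right on train[i]."""
    order = sorted(range(len(train)), key=lambda i: math.dist(x, train[i]))
    nearest = order[:k]
    return sum(correct[i] for i in nearest) / k

def in_region(x, train, correct, t=0.5, k=3):
    return knn_score(x, train, correct, k) > t

train = [[0, 0], [0, 1], [1, 0], [5, 5]]
correct = [True, True, False, False]
print(in_region([0.1, 0.2], train, correct))  # prints True: 2 of 3 neighbors correct
```

This makes the slide's trade-off concrete: inclusion is still a cheap test, and the region adapts to irregular data at the cost of looking insular around isolated points.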

SLIDE 25

Feating and EOP

25

Decision structures that pick the right classification model:

EOP
  • Decision List
  • Flexible regions
  • Models trained on subspaces

Feating
  • Decision Tree
  • Tiles in feature space
  • Models trained on all features

SLIDE 26

Outline

  • Motivation: need for interpretable models
  • Overview of data analysis tools
  • Model evaluation – accuracy vs complexity
  • Model evaluation – understandability
  • Example applications
  • Summary

26

SLIDE 27

Overview of datasets

  • Real-valued features, binary output
  • Artificial data – 10 features

▫ Low-dimensional Gaussians / uniform cubes

  • UCI repository
  • Application-related datasets
  • Results by k-fold cross-validation

▫ Complexity = expected number of vector operations performed for a classification task
27
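The complexity measure defined above can be made concrete with a small sketch. This is one hedged reading of "expected number of vector operations": count one operation per region test plus one per classifier call, averaged over a sample of queries; the exact operation counts used in the talk may differ.

```python
# Illustrative complexity estimate for an EOP-style decision list: average,
# over queries, the region tests performed plus the one classifier call made
# by the first matching region. Operation costs are assumed parameters.

def expected_complexity(model, queries, ops_per_region_test=1,
                        ops_per_classifier=1):
    total = 0
    for x in queries:
        for projection, region, _h in model:
            total += ops_per_region_test          # one test per region visited
            if region([x[i] for i in projection]):
                total += ops_per_classifier       # classifier runs once
                break
    return total / len(queries)

model = [((0,), lambda p: p[0] > 0.5, lambda p: 1)]
queries = [[0.9], [0.1]]
print(expected_complexity(model, queries))  # prints 1.5: (2 ops + 1 op) / 2
```

Under this reading, short decision lists with cheap region tests score well, which is why the later slides can compare EOP's complexity directly against tree and ensemble sizes.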

SLIDE 28

EOP vs AdaBoost - SVM base classifiers

  • EOP is often less accurate, but not significantly so
  • The reduction in complexity is statistically significant

p-value of 2-sided test (accuracy): 0.832
p-value of 2-sided test (complexity): 0.003

[Charts: accuracy (0.85–1) and complexity (100–300) across 10 cross-validation folds, Boosting vs EOP (nonparametric)]

28

mean diff in accuracy: 0.5%
mean diff in complexity: 85

SLIDE 29

EOP (stumps as base classifiers) vs CART on data from the UCI repository

[Charts: accuracy and complexity on BCW, MB, V and BT for CART, nonparametric EOP and parametric EOP]

Dataset        # Features  # Points
Breast Tissue      10        1006
Vowel               9         990
MiniBOONE          10        5000
Breast Cancer      10         596

  • CART is the most accurate
  • Parametric EOP yields the simplest models

29

SLIDE 30

Why are EOP models less complex?

Typical XOR dataset

30

SLIDE 31

Why are EOP models less complex?

Typical XOR dataset

CART
  • is accurate
  • takes many iterations
  • does not uncover or leverage the structure of the data

31

SLIDE 32

Why are EOP models less complex?

Typical XOR dataset

CART
  • is accurate
  • takes many iterations
  • does not uncover or leverage the structure of the data

EOP
  • equally accurate
  • uncovers structure

[Figure: Iteration 1 and Iteration 2 of EOP on the XOR data]

32

SLIDE 33

Error Variation with Model Complexity for EOP and CART

[Chart: error (0.1–0.5) vs depth of decision tree/list (1–8) for CART and EOP on Breast Cancer Wisconsin, MiniBOONE, Breast Tissue and Vowel]

  • At low complexities, EOP is typically more accurate

33

SLIDE 34

UCI data – Accuracy

[Chart: accuracy (0.2–1.2) on BCW, MB, BT and Vow for R-EOP, N-EOP, CART, Feating, Sub-spacing, Multiboosting and Random Forests]

34

SLIDE 35

UCI data – Model complexity

[Chart: model complexity (20–80) on BCW, MB, BT and Vow for R-EOP, N-EOP, CART, Feating, Sub-spacing and Multiboosting]

35

Complexity of Random Forests is huge (thousands of nodes)

SLIDE 36

Robustness

  • Accuracy-targeting EOP

▫ identifies which portions of the data can be confidently classified at a given accuracy rate

36

[Chart: accuracy of EOP when regions do not include noisy data; x-axis: max allowed error, y-axis: accuracy]

SLIDE 37

Outline

  • Motivation: need for interpretable models
  • Overview of data analysis tools
  • Model evaluation – accuracy vs complexity
  • Model evaluation – understandability
  • Example applications
  • Summary

37

SLIDE 38

Metrics of Explainability

38

  • Lift
  • Bayes Factor
  • J-Score
  • Normalized Mutual Information
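For the four metrics named above, here are standard textbook formulations computed from binary confusion counts. The exact definitions used in the talk may differ (e.g. the NMI normalization), so treat these as hedged reference versions.

```python
# Common formulations of the slide's explainability metrics, from the
# confusion counts tp, fp, fn, tn of a binary classifier.
import math

def lift(tp, fp, fn, tn):
    """Precision divided by the positive base rate."""
    n = tp + fp + fn + tn
    return (tp / (tp + fp)) / ((tp + fn) / n)

def bayes_factor(tp, fp, fn, tn):
    """Positive likelihood ratio: P(pred+ | y+) / P(pred+ | y-)."""
    return (tp / (tp + fn)) / (fp / (fp + tn))

def j_score(tp, fp, fn, tn):
    """Youden's J: sensitivity + specificity - 1."""
    return tp / (tp + fn) + tn / (tn + fp) - 1

def nmi(tp, fp, fn, tn):
    """Mutual information of prediction and label, normalized by sqrt(H*H)."""
    n = tp + fp + fn + tn
    def H(*counts):
        return -sum(c / n * math.log(c / n) for c in counts if c)
    h_pred, h_true, h_joint = H(tp + fp, fn + tn), H(tp + fn, fp + tn), H(tp, fp, fn, tn)
    return (h_pred + h_true - h_joint) / math.sqrt(h_pred * h_true)

print(round(j_score(40, 10, 10, 40), 2))  # prints 0.6
```

Higher is better for all four, matching the reading of the table on the next slide.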

SLIDE 39

Evaluation with usefulness metrics

  • For 3 out of 4 metrics, EOP beats CART

             CART                           EOP
       BF      L      J    NMI       BF      L      J    NMI
MB   1.982  0.004  0.389  0.040    1.889  0.007  0.201  0.502
BCW  1.057  0.007  0.004  0.011    2.204  0.069  0.150  0.635
BT   0.000  0.009  0.210  0.000      Inf  0.021  0.088  0.643
V      Inf  0.020  0.210  0.010    2.166  0.040  0.177  0.383
Mean 1.520  0.010  0.203  0.015    2.047  0.034  0.154  0.541

BF = Bayes Factor. L = Lift. J = J-score. NMI = Normalized Mutual Information.

39

Higher values are better

SLIDE 40

Outline

  • Motivation: need for interpretable models
  • Overview of data analysis tools
  • Model evaluation – accuracy vs complexity
  • Model evaluation – understandability
  • Example application
  • Summary

40

SLIDE 41

Spam Detection (UCI ‘SPAMBASE’)

  • 10 features: frequencies of misc. words in e-mails
  • Output: spam or not

[Chart: accuracy (0.65–0.9) vs number of splits / complexity (10–100)]

41

SLIDE 42

Spam Detection – Iteration 1

▫ classifier labels everything as spam
▫ high-confidence regions enclose mostly spam, where:

 Incidence of the word ‘your’ is low
 Length of text in capital letters is high

42

SLIDE 43

Spam Detection – Iteration 2

▫ the required incidence of capitals is increased
▫ the square region on the left also encloses examples that will be marked as ‘not spam’

43

SLIDE 44

Spam Detection – Iteration 3

44

[Figure: regions over the word_frequency_hi feature]

▫ Classifier marks everything as spam
▫ Frequency of ‘your’ and ‘hi’ determine the regions

SLIDE 45

Effects of Cell Treatment

  • Monitored population of cells
  • 7 features: cycle time, area, perimeter ...
  • Task: determine which cells were treated

[Chart: accuracy (0.70–0.80) vs number of splits / complexity (5–25)]

45

SLIDE 46

46

SLIDE 47

MIMIC Medication Data

  • Information about administered medication
  • Features: dosage for each drug
  • Task: predict patient return to ICU

[Chart: accuracy (0.9915–0.9945) vs number of splits / complexity (5–25)]

47

SLIDE 48

48

SLIDE 49

Predicting Fuel Consumption

  • 10 features: vehicle and driving style characteristics
  • Output: fuel consumption level (high/low)

[Chart: accuracy (0.1–0.8) vs number of splits / complexity (5–25)]

49

SLIDE 50

50

SLIDE 51

Nuclear threat detection data

  • Random Forests accuracy: 0.94
  • Rectangular EOP accuracy: 0.881

… but regions found in 1st iteration for Fold 0:

▫ incident.riidFeatures.SNR [2.90, 9.2]
▫ incident.riidFeatures.gammaDose [0, 1.86]×10^-8

Regions found in 2nd iteration for Fold 1:

▫ incident.rpmFeatures.gamma.sigma [2.5, 17.381]
▫ incident.rpmFeatures.gammaStatistics.skewdose [1.31, …]

51

No match

SLIDE 52

Summary

  • White-box models (CART, Feating, Sub-spacing)

▫ about as accurate as typical black-box models (Boosting, Multi-boosting)

  • In most cases EOP:

▫ maintains accuracy
▫ reduces complexity
▫ identifies useful aspects of the data

  • EOP wins in terms of expressiveness
  • Trade-offs:

▫ Accuracy vs Complexity
▫ Accuracy vs Coverage

  • Open questions:

▫ What if no good low-dimensional projections are found?
▫ What to do with inconsistent models in different folds of cross-validation?

52