Trade-offs in Explanatory Model Learning
Data Analysis Project
Madalina Fiterau
DAP Committee: Artur Dubrawski, Jeff Schneider, Geoff Gordon
21st of February 2012
Outline
Motivation: need for interpretable models
Overview of data
[Diagram: vehicle scan, prediction, feedback]
How is the prediction derived from the input?
[Decision diagram with yes/no branches]
Tests: Radiation > x%; Payload type = ceramics; Uranium level > max. admissible for ceramics
Further step: Consider balance of Th232, Ra226 and Co60
Outcomes: Clear; Threat
approach to machine learning’
Technique for Combining Boosting and Weighted Bagging’
Impurity criterion
for Stable Learners’
classifier to deal with a query point
[Figure: (X,Y) plot; 2 Gaussians, uniform cube; axis ticks 1 to 5]
[Figure labels: OK, NOT OK]
Bounding Polyhedra vs. Nearest-neighbor Score
Bounding Polyhedra: Enclose points in convex shapes (hyper-rectangles/spheres). Easy to test inclusion. Visually appealing. Inflexible.
Nearest-neighbor Score: Consider the k-nearest neighbors. Region: { X | Score(X) > t }, where t is a learned threshold. Easy to test inclusion. Can look insular. Deals with irregularities.
[Figure: query point p and its neighbors n1 to n5; legend: incorrectly classified, correctly classified, query point; decision]
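The two region types compared above can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not define Score(X), so the sketch assumes it is the fraction of the k nearest training points that the classifier labelled correctly, and the function names (`bounding_box`, `in_box`, `knn_score`, `in_score_region`) and the toy data are hypothetical.

```python
import math

def bounding_box(points):
    """Smallest axis-aligned hyper-rectangle that contains all points."""
    dims = range(len(points[0]))
    lows = [min(p[d] for p in points) for d in dims]
    highs = [max(p[d] for p in points) for d in dims]
    return lows, highs

def in_box(x, box):
    """Inclusion test for the bounding hyper-rectangle."""
    lows, highs = box
    return all(lo <= xi <= hi for xi, lo, hi in zip(x, lows, highs))

def knn_score(x, train_points, correct_flags, k=5):
    """ASSUMED score: fraction of the k nearest training points that were
    correctly classified. The slides do not give the actual definition."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(x, train_points[i]))
    nearest = order[:k]
    return sum(correct_flags[i] for i in nearest) / len(nearest)

def in_score_region(x, train_points, correct_flags, t, k=5):
    """Inclusion test for the region { X | Score(X) > t }."""
    return knn_score(x, train_points, correct_flags, k) > t

# Hypothetical toy data: four correctly classified points near the origin,
# two incorrectly classified points further away.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
correct = [True, True, True, True, False, False]
box = bounding_box(points[:4])
```

Both inclusion tests are cheap, which matches the "easy to test inclusion" entry for each approach. The box is fixed by its corners, while the score region follows wherever the correctly classified neighbors happen to lie.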
[Figure: Boosting vs. EOP (nonparametric), two panels]
p-value of 2-sided test: 0.832
p-value of 2-sided test: 0.003
mean diff in accuracy: 0.5%
mean diff in complexity: 85
[Figure: CART, EOP N. and EOP P. on BCW, MB, V and BT]
Dataset        # of Features  # of Points
Breast Tissue  10             1006
Vowel          9              990
MiniBOONE      10             5000
Breast Cancer  10             596
CART is
Parametric
Typical XOR dataset (points marked + and o)
CART
leverage structure of data
EOP
Iteration 1 Iteration 2
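As background on why XOR data is a standard test case for tree learners, the sketch below checks a well-known general property: on the four canonical XOR points, no single axis-aligned split does better than chance, while two nested splits separate the classes. It is not a reproduction of the experiment on the slides, and it implements neither CART nor EOP; the helper names are hypothetical.

```python
from itertools import product

# The four canonical XOR points and their labels (label = x0 XOR x1).
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [a ^ b for a, b in X]

def stump_accuracy(feature, threshold, left_label):
    """Accuracy of one axis-aligned split: predict left_label when
    x[feature] <= threshold, and the other label otherwise."""
    preds = [left_label if x[feature] <= threshold else 1 - left_label
             for x in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)

# Try both features and both label orientations at the only useful threshold.
best_single_split = max(
    stump_accuracy(feature, 0.5, label)
    for feature, label in product((0, 1), (0, 1))
)

def two_level_rule(x):
    """Two nested axis-aligned splits: first on feature 0, then on feature 1."""
    if x[0] <= 0.5:
        return 1 if x[1] > 0.5 else 0
    return 0 if x[1] > 0.5 else 1

two_level_accuracy = sum(two_level_rule(x) == t for x, t in zip(X, y)) / len(y)
```

Because every single split leaves both sides evenly mixed, a greedy impurity criterion sees no gain at the first step, which is why the dataset is a useful stress test.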
[Figure: x-axis: depth of decision tree/list; series: CART and EOP on Breast Cancer Wis, MiniBOONE, Breast Tissue and Vowel]
[Figure: R-EOP, N-EOP, CART, Feating, Sub-spacing, Multiboosting and Random Forests on BCW, MB, BT and Vow]
[Figure: R-EOP, N-EOP, CART, Feating, Sub-spacing and Multiboosting on BCW, MB, BT and Vow]
Complexity of Random Forests is huge
BF = Bayes Factor. L = Lift. J = J-score. NMI = Normalized Mutual Information.
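The slide names these measures without defining them. As a rough guide, two of them can be computed from a 2x2 table of "rule fires / does not fire" against "class positive / negative". The sketch below uses the common textbook definitions: lift as P(rule and class) / (P(rule) P(class)), and NMI as mutual information divided by the geometric mean of the two entropies. Other NMI normalizations exist, and the variant used in the talk is not stated, so treat this choice as an assumption. Bayes Factor and J-score are left out because their exact form here is not given.

```python
import math

def lift(n11, n10, n01, n00):
    """Lift of a rule for the positive class.
    n11: rule fires, class positive    n10: rule fires, class negative
    n01: rule silent, class positive   n00: rule silent, class negative"""
    n = n11 + n10 + n01 + n00
    p_rule = (n11 + n10) / n
    p_class = (n11 + n01) / n
    return (n11 / n) / (p_rule * p_class)

def entropy(probs):
    """Shannon entropy in nats; zero-probability cells contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def nmi(n11, n10, n01, n00):
    """Mutual information between rule firing and class, normalized by
    sqrt(H(rule) * H(class)). ASSUMED normalization; others are in use."""
    n = n11 + n10 + n01 + n00
    joint = [n11 / n, n10 / n, n01 / n, n00 / n]
    rule = [(n11 + n10) / n, (n01 + n00) / n]
    cls = [(n11 + n01) / n, (n10 + n00) / n]
    mutual_info = entropy(rule) + entropy(cls) - entropy(joint)
    denom = math.sqrt(entropy(rule) * entropy(cls))
    return mutual_info / denom if denom > 0 else 0.0
```

With these definitions, a rule that fires independently of the class has lift 1 and NMI 0, while a rule that fires exactly on the positive class in a balanced dataset has lift 2 and NMI 1.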
[Figure: accuracy (0.65 to 0.9) vs. complexity (splits, 10 to 100)]
Incidence of the word ‘your’ is low
Length of text in capital letters is high
word_frequency_hi
[Figure: accuracy (0.7 to 0.8) vs. complexity (splits, 5 to 25)]
[Figure: accuracy (0.9915 to 0.9945) vs. complexity (splits, 5 to 25)]
[Figure: accuracy (0.1 to 0.8) vs. complexity (splits, 5 to 25)]
No match
▫ What if no good low-dimensional projections are found?
▫ What to do with inconsistent models in different folds of cross-validation?