

SLIDE 1

Trading off coverage for accuracy in forecasts: Applications to clinical data analysis

Michael J Pazzani, Patrick Murphy, Kamal Ali, and David Schulenburg Department of Information and Computer Science University of California Irvine, CA 92717 {pazzani, pmurphy, ali, schulenb}@ics.uci.edu Research supported by Air Force Office of Scientific Research Grant, F49620-92-J-0430

AIM-94 Thursday, June 30, 1994 1

SLIDE 2

Inductive Learning of Classification Procedures

  • Given:

A set of training examples

  • a. Attribute-value pairs: { (age, 24) (gender, female) ... }
  • b. A class label: pregnant
  • Create

A classification procedure to infer the class label of an example represented as a set of attribute-value pairs

  • Decision Tree
  • Weights of neural network
  • Conditional probability of a class given an attribute
  • Rules
  • Rule with “confidence factors”

Typical evaluation of a learning algorithm:

  • Divide available data into a training and test set
  • Infer procedure from data in training set.
  • Estimate accuracy of procedure on data in the test set.


SLIDE 3

Trading off coverage for accuracy

Learners usually infer the classification of all test examples.

  • Give learner ability to say “I don’t know” on some examples
  • Goal: Learner is more accurate when it makes a classification.

Possible applications:

  • Human-computer interfaces: learning “macros”
  • Learning rules to translate from Japanese to English
  • Analysis of medical databases
  • Let learner automatically handle the typical cases
  • Refer hard cases to a human specialist

Evaluation:

  • T: total number of test examples
  • P: number of examples for which the learner makes a prediction
  • C: number of examples whose class is inferred correctly

Coverage = P / T        Accuracy = C / P
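To make these definitions concrete, here is a minimal Python sketch (the function name and the use of None to encode "I don't know" are my own conventions, not from the slides):

```python
def coverage_accuracy(predictions, labels):
    """Coverage = P/T and Accuracy = C/P for a learner that may abstain.

    An abstention ("I don't know") is encoded as None.
    """
    T = len(predictions)                        # total test examples
    made = [(p, y) for p, y in zip(predictions, labels) if p is not None]
    P = len(made)                               # predictions actually made
    C = sum(1 for p, y in made if p == y)       # correct predictions
    coverage = P / T if T else 0.0
    accuracy = C / P if P else 0.0
    return coverage, accuracy
```

For example, abstaining on one of four test examples gives coverage 0.75, and accuracy is then computed over the remaining three.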

SLIDE 4

Trading off coverage for accuracy

[Figure: Lymphography, Backprop: accuracy and coverage as a function of the activation threshold]

SLIDE 5

Goals of this research

Modify learning algorithms to trade off coverage for accuracy

  • Learners typically have an internal measure of hypothesis quality
  • Use hypothesis quality measure to determine whether to classify

Experimentally evaluate trading off coverage for accuracy on databases from the UCI Archive of Machine Learning Databases. Train on 2/3 of the data, test on the remaining 1/3; results are averages over 20 trials.

  • Breast Cancer (699 examples; distinguish benign from malignant tumors)
  • Lymphography (148 examples; identify malignant tumors)
  • DNA Promoter (106 examples; Leave-one-out testing)

Describe how a sparse clinical database (diabetes data sets) can be analyzed by classification learners.


SLIDE 6

Neural Networks

  • One output unit per class.
  • An output unit's activation is between 0 and 1.
  • Assign an example to the class with the highest activation.


[Figure: network with input units Fever, Gender, Age, Bloodshot eyes, Headache, Nausea, and Swollen glands, and output units Pregnant and Cancer]

SLIDE 7

Trading off coverage for accuracy in Neural Networks

  • 1. Assign an example to the class with the highest activation, provided that activation is above a threshold.
  • 2. Assign an example to the class with the highest activation, provided the difference between that activation and the next highest activation is above a threshold. (This didn't make a significant difference in our experiments.)
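Both thresholding schemes can be sketched as follows (a hedged illustration; the parameter names `theta` and `margin` are my assumptions, not the paper's notation):

```python
import numpy as np

def classify_with_threshold(activations, theta=0.8, margin=None):
    """Return the index of the most active output unit, or None ("I don't know").

    Scheme 1: abstain when the highest activation is below theta.
    Scheme 2 (optional): also abstain when the gap between the highest and
    second-highest activations is below margin.
    """
    a = np.asarray(activations, dtype=float)
    best = int(np.argmax(a))
    if a[best] < theta:
        return None
    if margin is not None:
        second = np.partition(a, -2)[-2]        # second-highest activation
        if a[best] - second < margin:
            return None
    return best
```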


SLIDE 8

[Figure: Breast Cancer, Backprop: accuracy and coverage as a function of the activation threshold]

SLIDE 9

[Figure: DNA Promoter, Backprop: accuracy and coverage as a function of the activation threshold]

SLIDE 10

Bayesian Classifier

  • An example is assigned to the class that maximizes the probability of that class, given the example.
  • If we assume the attributes are independent, this probability can be estimated from the training data.
  • Trading off coverage for accuracy (like backprop): only make a prediction if the probability of the most likely class is above some threshold.


P(C_i \mid A_1{=}V_{1j} \wedge \dots \wedge A_n{=}V_{nj}) = P(C_i) \prod_k \frac{P(C_i \mid A_k{=}V_{kj})}{P(C_i)}
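A minimal sketch of a Bayesian classifier with a prediction threshold, in the spirit of the slide's formula (the fallback to the prior for unseen attribute values and the 0.9 default threshold are my assumptions, not the paper's):

```python
from collections import Counter, defaultdict

def fit(examples, labels):
    """examples: list of attribute-value tuples. Returns count tables."""
    class_n = Counter(labels)                   # class frequencies
    cond_n = defaultdict(Counter)               # (attr index, value) -> class counts
    for x, y in zip(examples, labels):
        for k, v in enumerate(x):
            cond_n[(k, v)][y] += 1
    return class_n, cond_n

def predict(x, class_n, cond_n, threshold=0.9):
    """Most probable class, or None when its probability is below the threshold."""
    total = sum(class_n.values())
    scores = {}
    for c, n in class_n.items():
        p_c = n / total                         # P(Ci)
        score = p_c
        for k, v in enumerate(x):
            seen = cond_n.get((k, v), Counter())
            denom = sum(seen.values())
            # multiply by P(Ci | Ak=Vkj) / P(Ci); skip unseen attribute values
            score *= (seen[c] / denom) / p_c if denom else 1.0
        scores[c] = score
    z = sum(scores.values())
    best = max(scores, key=scores.get)
    if z == 0 or scores[best] / z < threshold:
        return None                             # "I don't know"
    return best
```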

SLIDE 11

[Figure: Breast Cancer, Bayesian Classifier: accuracy and coverage as a function of the ln(probability) threshold]

SLIDE 12

A Decision Tree (for determining suitability of contact lenses)

  • Leaf nodes assign classes (n = no, h = hard, s = soft)
  • Different leaves can be more reliable.


[Figure: decision tree that first tests Tears (reduced/normal), then Age (<15, 15-55, >55), Astigmatic (yes/no), and Prescription (hyper/myope); each leaf is labeled with its class and its training counts, e.g. 15n 0h 0s or 1h 3s]

SLIDE 13

Trading off coverage for accuracy in decision trees

  • Estimate the probability that an example belongs to some class, given that it is classified by a particular leaf.

Two possibilities:

  • Divide the training data into a learning set and a probability estimation set
    * Unbiased estimate of the probability, but not the most accurate tree
  • Estimate the probability from the training data
    * Use the Laplace estimate of the probability of a class given a leaf
    * e.g., a leaf with 3 soft, 1 hard, 0 none gives P(soft) = 4/7


p(\text{class} = i) = \frac{N_i + 1}{k + \sum_j N_j}

where k is the number of classes and N_j is the number of training examples of class j at the leaf.
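The Laplace estimate and the resulting leaf-level abstention decision can be sketched as follows (the 0.7 default threshold is an arbitrary illustration, not a value from the paper):

```python
def laplace_estimate(counts):
    """counts: dict mapping class -> number of training examples at a leaf.

    Returns the Laplace-smoothed P(class) = (N_i + 1) / (k + sum_j N_j).
    """
    k = len(counts)                             # number of classes
    n = sum(counts.values())                    # training examples at this leaf
    return {c: (n_i + 1) / (k + n) for c, n_i in counts.items()}

def classify_leaf(counts, threshold=0.7):
    """Predict the leaf's majority class only if its estimate clears the threshold."""
    probs = laplace_estimate(counts)
    best = max(probs, key=probs.get)
    return best if probs[best] >= threshold else None
```

The slide's example leaf with 3 soft, 1 hard, 0 none gives P(soft) = (3+1)/(3+4) = 4/7.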

SLIDE 14

[Figure: Breast Cancer, ID3: accuracy and coverage as a function of the maximum-probability threshold]

SLIDE 15

First Order Combined Learner

  • Learns a set of first order Horn Clauses (Like Quinlan’s FOIL)

no_payment_due(P) :- enlisted(P, Org) & armed_forces(Org).
no_payment_due(P) :- longest_absence_from_school(P,A) & 6 > A & enrolled(P,S,U) & U > 5.
no_payment_due(P) :- unemployed(P).

  • Negation as failure
  • Selects literal that maximizes information gain
  • Averaging Multiple Models

Learn several different rule sets (stochastically selecting literals); assign an example to the class predicted by the majority of the rule sets.

  • Trading off coverage for accuracy

Only make a prediction if at least k of the rule sets agree.


\text{Gain} = p_1 \left( \log_2 \frac{p_1}{p_1 + n_1} - \log_2 \frac{p_0}{p_0 + n_0} \right)
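The multiple-model voting rule above (predict only when at least k of the learned rule sets agree) can be sketched as:

```python
from collections import Counter

def vote_with_abstention(rule_set_predictions, k):
    """rule_set_predictions: one class label per learned rule set.

    Predict the majority class only if at least k rule sets agree;
    otherwise abstain (None).
    """
    tally = Counter(rule_set_predictions)
    winner, votes = tally.most_common(1)[0]
    return winner if votes >= k else None
```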

SLIDE 16

[Figure: Breast Cancer, FOCL: accuracy and coverage as a function of the number of voters]

SLIDE 17

[Figure: Promoter, FOCL: accuracy and coverage as a function of the number of voters]

SLIDE 18

HYDRA

  • Learns a contrasting set of rules

no_payment_due(P) :- enlisted(P, Org) & armed_forces(Org). [LS = 4.0]
no_payment_due(P) :- longest_absence_from_school(P,A) & 6 > A & enrolled(P,S,U) & U > 5. [LS = 3.2]
no_payment_due(P) :- unemployed(P). [LS = 2.1]
payment_due(P) :- longest_absence_from_school(P,A) & A > 36 [LS = 2.7]
payment_due(P) :- not (enrolled(P,_,_)) & not (unemployed(P)) [LS = 4.1]

  • Attaches a measure of reliability to clauses (logical sufficiency)
  • Assigns an example to the class of the satisfied clause with the highest logical sufficiency
  • Trading off coverage for accuracy: only make a prediction if the ratio of logical sufficiency is greater than a threshold


ls_{ij} = \frac{p(\text{clause}_{ij}(t) = \text{true} \mid t \in \text{class}_i)}{p(\text{clause}_{ij}(t) = \text{true} \mid t \notin \text{class}_i)}
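A sketch of HYDRA-style classification by logical sufficiency. The clause representation (predicate function, class label, LS value) and the comparison of log(LS) against the threshold are my assumptions, suggested by the log(LS Ratio) axis of the next slide's graph, not the paper's exact procedure:

```python
import math

def classify_by_ls(example, clauses, threshold=1.0):
    """clauses: list of (predicate, class_label, ls) triples.

    Assign the class of the satisfied clause with the highest logical
    sufficiency, abstaining when log(LS) is below the threshold.
    """
    satisfied = [(ls, cls) for pred, cls, ls in clauses if pred(example)]
    if not satisfied:
        return None                             # no clause covers the example
    ls, cls = max(satisfied)                    # highest-LS satisfied clause
    return cls if math.log(ls) >= threshold else None
```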

SLIDE 19

[Figure: Breast Cancer, HYDRA: accuracy and coverage as a function of the log(LS ratio) threshold]

SLIDE 20

Analysis of the diabetes data sets with classification learners

Sample records (date, time, code, value, description):

02-01-1989   8:00  58  154  Pre-breakfast blood glucose
02-01-1989   8:00  33  006  Regular insulin dose
02-01-1989   8:00  34  016  NPH insulin dose
02-01-1989  11:30  60  083  Pre-lunch blood glucose
02-01-1989  11:30  33  004  Regular insulin dose
02-01-1989  16:30  62  102  Pre-supper blood glucose
02-01-1989  16:30  33  004  Regular insulin dose
02-01-1989  23:00  48  076  Unspecified blood glucose

Problems with applying machine learning classifiers:

  • 1. There is not a fixed, small number of classes
  • 2. The data isn’t divided into a fixed number of attributes
  • 3. We know very little about medicine, diabetes, blood glucose

If you have a hammer, everything looks like a nail:

  • 1. Predict whether a blood glucose measurement is above the mean for the patient
  • 2. Create attributes and values from coded data
  • 3. Come to AIM-94 and be willing to learn


SLIDE 21

Converting the diabetes data set into attribute-value format

  • Current glucose measurement: (above or below).
  • CGT: Current glucose time: (in hours) numeric.
  • CGM: Current glucose meal: (unspecified, breakfast, lunch, supper, or snack).
  • CGP: Current glucose period: (unspecified, pre, or post).
  • LGV: Last glucose measurement: numeric.
  • ELGV: Elapsed time since last glucose measurement: (in hours) numeric.
  • LGM: Last glucose meal: (unspecified, breakfast, lunch, supper, or snack).
  • LGP: Last glucose period: (unspecified, pre, or post).
  • ENPH: Elapsed time since last NPH insulin: (in hours) numeric.
  • NPH: Last NPH dose: numeric.
  • EREG: Elapsed time since last regular insulin: (in hours) numeric.
  • LREG: Last regular dose: numeric.

Ran experiments with patients 20 and 27; trained on 450 examples, tested on 150. Example converted records (one per line, class label last):

155  PRE     BREAKFAST  7.17  PRE  LUNCH      16  7.17  6  7.17  15.2  Below
80   PRE     LUNCH      2.83  PRE  SUPPER     16  10.0  4  2.83  18.0  Below
101  UNSPEC  UNSPEC     59.0  PRE  BREAKFAST  16  72.0  6  62.0   8.0  Above


SLIDE 22

Backpropagation results

[Figure: Backpropagation, Patient 27: accuracy and coverage as a function of the activation threshold]

SLIDE 23

FOCL results

[Figure: FOCL, Patient 27: accuracy and coverage as a function of the number of votes]

SLIDE 24

Decision tree results

[Figure: Decision tree, Patient 27: accuracy and coverage as a function of the maximum-probability threshold]

SLIDE 25

Example of Rule learned by FOCL

ABOVE if ENPH ≥ 12.0 & LGV ≥ 131 & ENPH < 24.0 & CGM ≥ SUPPER
ABOVE if LGV ≥ 132 & LREG < 6.5 & CGT ≥ 23.0
ABOVE if ELGV ≥ 6.5 & LGV < 130 & LGV ≥ 121
ABOVE if ELGV < 56.0 & LGV < 83 & ENPH ≥ 24.0
ABOVE if LGV ≥ 163 & CGP = UNSPECIFIED & LGV < 181
ABOVE if LGV ≥ 131 & LGV < 147 & CGM = LUNCH
ABOVE if ENPH ≥ 12.0 & LGV ≥ 131 & CGT ≥ 8.0 & LGV < 142
ABOVE if ELGV ≥ 6.5 & LGV ≥ 191 & ELGV < 10.5
ABOVE if ELGV ≥ 6.5 & CGT ≥ 8.0 & LGV < 90
ABOVE if LGV ≥ 96 & LGV < 118 & ELGV ≥ 4.5 & ENPH ≥ 14.5
ABOVE if LGV ≥ 128 & ENPH ≥ 5.0 & ELGV < 5.5 & LGV < 147
ABOVE if ENPH ≥ 5.0 & LGV ≥ 189 & ELGV < 4.0
ABOVE if LREG ≥ 7.5 & ENPH < 11.5 & LGV < 147
ABOVE if LGV ≥ 128 & ELGV ≥ 33.5 & CGT < 7.5

In English, the rule "ABOVE if ENPH ≥ 5.0 & LGV ≥ 189 & ELGV < 4.0" reads: predict ABOVE if at least 5 hours have elapsed since the last NPH insulin, the last glucose measurement was above 189, and it has been less than 4 hours since the last measurement.


SLIDE 26

Conclusions

  • Experimentally evaluated trading off coverage for accuracy in machine learning classifiers
  • Rather than forcing problems to be classification problems, an important issue is to identify new classes of learning problems:
    • Different goals
    • Different example representations
  • We also do research in:
    • Reducing the cost of misclassification errors
    • Knowledge-guided induction
