The Evaluation Issues
Slides by Jian Pei, CMPT 741/459: Classification (2)


1. The Evaluation Issues
• The accuracy of a classifier can be evaluated using a test data set
  – The test set is a part of the available labeled data set
• But how can we evaluate the accuracy of a classification method?
  – A classification method can generate many classifiers
• What if the available labeled data set is too small?

2. Holdout Method
• Partition the available labeled data set into two disjoint subsets: the training set and the test set
  – Common splits: 50–50, or 2/3 for training and 1/3 for testing
• Build a classifier using the training set
• Evaluate the accuracy using the test set
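As a concrete illustration (not part of the original slides), here is a minimal holdout sketch using scikit-learn; the data set and classifier are arbitrary choices, since the slide does not prescribe any:

```python
# Holdout evaluation: split once, train on one part, test on the other.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)

# 2/3 for training, 1/3 for testing, as suggested on the slide.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("holdout accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```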

3. Limitations of the Holdout Method
• Fewer labeled examples are available for training
• The classifier depends heavily on the composition of the training and test sets
  – The smaller the training set, the larger the variance
• If the test set is too small, the evaluation is not reliable
• The training and test sets are not independent

4. Cross-Validation
• Each record is used the same number of times for training and exactly once for testing
• k-fold cross-validation
  – Partition the data into k equal-sized subsets
  – In each round, use one subset as the test set and the remaining subsets together as the training set
  – Repeat k times
  – The total error is the sum of the errors over the k rounds
• Leave-one-out: k = n
  – Utilizes as much data as possible for training
  – Computationally expensive
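A hedged sketch of the k-fold procedure (the slide describes it abstractly; this is one realization using scikit-learn's KFold, with k = 10 chosen arbitrarily):

```python
# k-fold cross-validation: accumulate the error over the k rounds.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
errors = 0
kf = KFold(n_splits=10, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(X):
    clf = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    # Count the misclassified test records in this round; the total
    # error is the sum over the k rounds, as on the slide.
    errors += np.sum(clf.predict(X[test_idx]) != y[test_idx])
print("total errors:", errors, "error rate:", errors / len(y))
```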

5. Confidence Interval for Accuracy
• Suppose a classifier C is tested on a test set of n cases, and the accuracy is acc
• How much confidence can we have in acc?
• We need to estimate the confidence interval of a given model accuracy
  – An interval within which one is sufficiently sure that the true population value lies, or, equivalently, a bound on the probable error of the estimate
• A confidence interval procedure uses the data to determine an interval with the property that, viewed before the sample is selected, the interval has a given high probability of containing the true population value

6. Binomial Experiments
• When a coin is flipped, it has probability p of turning up heads
• If the coin is flipped N times, what is the probability that we see heads X = v times?

  P(X = v) = C(N, v) · p^v · (1 − p)^(N − v), where C(N, v) is the binomial coefficient

  – Expectation (mean): Np
  – Variance: Np(1 − p)
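A direct transcription of the formula above in Python (the fair-coin numbers are illustrative, not from the slides):

```python
# Probability of seeing heads v times in N flips of a p-biased coin.
from math import comb  # comb(N, v) is the binomial coefficient

def binom_pmf(v: int, N: int, p: float) -> float:
    return comb(N, v) * p**v * (1 - p)**(N - v)

# e.g., a fair coin flipped 10 times: P(X = 5)
print(binom_pmf(5, 10, 0.5))  # ≈ 0.246
# Mean and variance, matching the slide: Np and Np(1 - p)
N, p = 10, 0.5
print("mean:", N * p, "variance:", N * p * (1 - p))
```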

7. Confidence Level and Approximation
• Approximate the binomial using the normal distribution; Zα/2 and Z1−α/2 are the bounds at confidence level (1 − α), i.e., the area between them is 1 − α:

  P(Zα/2 < (acc − p) / √(p(1 − p)/N) < Z1−α/2) = 1 − α

• Solving for p gives the confidence interval for the true accuracy:

  p = (2N·acc + Z²α/2 ± Zα/2·√(Z²α/2 + 4N·acc − 4N·acc²)) / (2(N + Z²α/2))
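The interval formula above can be implemented directly (this is the Wilson score interval; the example accuracy and test-set size below are made up for illustration):

```python
# Confidence interval for accuracy, per the formula on this slide.
from math import sqrt
from statistics import NormalDist

def accuracy_confidence_interval(acc: float, n: int, alpha: float = 0.05):
    z = NormalDist().inv_cdf(1 - alpha / 2)  # the Z bound for level 1 - alpha
    center = 2 * n * acc + z * z
    spread = z * sqrt(z * z + 4 * n * acc - 4 * n * acc * acc)
    denom = 2 * (n + z * z)
    return (center - spread) / denom, (center + spread) / denom

# e.g., observed accuracy 0.80 on 100 test cases, 95% confidence
print(accuracy_confidence_interval(0.80, 100))  # roughly (0.71, 0.87)
```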

8. Accuracy Can Be Misleading …
• Consider a data set with 99% negative class and 1% positive class
• A classifier that predicts everything as negative has an accuracy of 99%, though it does not work for the positive class at all!
• Imbalanced class distributions are common in many applications
  – Medical applications, fraud detection, …

9. Performance Evaluation Matrix
• Confusion matrix (contingency table, error matrix): used for imbalanced class distributions

                           PREDICTED CLASS
                           Class=Yes   Class=No
  ACTUAL CLASS  Class=Yes  a (TP)      b (FN)
                Class=No   c (FP)      d (TN)

  Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + TN + FP + FN)
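A plain-Python sketch of the confusion-matrix counts and the accuracy formula above (the tiny label lists are made-up examples):

```python
# Confusion-matrix counts (a, b, c, d) from true labels and predictions.
def confusion_counts(y_true, y_pred, positive="Yes"):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fn, fp, tn

y_true = ["Yes", "Yes", "No", "No", "No"]
y_pred = ["Yes", "No",  "No", "Yes", "No"]
tp, fn, fp, tn = confusion_counts(y_true, y_pred)
print("accuracy:", (tp + tn) / (tp + fn + fp + tn))  # (1 + 2) / 5 = 0.6
```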

10. Performance Evaluation Matrix
• Using the confusion matrix above:
  – True positive rate (TPR, sensitivity) = TP / (TP + FN)
  – True negative rate (TNR, specificity) = TN / (TN + FP)
  – False positive rate (FPR) = FP / (TN + FP)
  – False negative rate (FNR) = FN / (TP + FN)

11. Recall and Precision
• Used when the target class is more important than the other classes:
  – Precision p = TP / (TP + FP)
  – Recall r = TP / (TP + FN)
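Continuing the small sketch from slide 9, the rates of slides 10–11 fall out directly from the four counts (the values below are the counts from that made-up example):

```python
# Rates from the confusion-matrix counts; (tp, fn, fp, tn) from before.
tp, fn, fp, tn = 1, 1, 1, 2

tpr = tp / (tp + fn)   # sensitivity = recall
tnr = tn / (tn + fp)   # specificity
fpr = fp / (tn + fp)
fnr = fn / (tp + fn)
precision = tp / (tp + fp)
print(f"TPR={tpr:.2f} TNR={tnr:.2f} FPR={fpr:.2f} "
      f"FNR={fnr:.2f} precision={precision:.2f}")
```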

12. Fallout
• Type I errors – false positives: a negative object is classified as positive
  – Fallout: the type I error rate, FP / (FP + TN), i.e., the false positive rate
• Type II errors – false negatives: a positive object is classified as negative
  – Captured by recall

13. Fβ Measure
• How can we summarize precision and recall into one metric?
  – Use the harmonic mean of the two:

  F-measure (F) = 2rp / (r + p) = 2TP / (2TP + FP + FN)

• The Fβ measure generalizes this:

  Fβ = (β² + 1)·rp / (r + β²p) = (β² + 1)·TP / ((β² + 1)·TP + β²·FN + FP)

  – β = 0: Fβ is the precision
  – β = ∞: Fβ is the recall
  – 0 < β < ∞: Fβ is a tradeoff between the precision and the recall
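A direct implementation of Fβ from precision and recall, with the β = 0 and β → ∞ limits from the slide checked on made-up values:

```python
# F-beta, matching the formula above.
def f_beta(p: float, r: float, beta: float) -> float:
    b2 = beta * beta
    return (b2 + 1) * r * p / (r + b2 * p)

p, r = 0.75, 0.60
print(f_beta(p, r, 1.0))   # F1, the harmonic mean ≈ 0.667
print(f_beta(p, r, 0.0))   # equals the precision: 0.75
print(f_beta(p, r, 1e9))   # approaches the recall: 0.60
```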

14. Weighted Accuracy
• A more general metric:

  Weighted Accuracy = (w1·a + w4·d) / (w1·a + w2·b + w3·c + w4·d)

  Measure     w1       w2   w3   w4
  Recall      1        1    0    0
  Precision   1        0    1    0
  Fβ          β² + 1   β²   1    0
  Accuracy    1        1    1    1
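The weight rows of the table can be instantiated directly; the counts below are made up, and a = TP, b = FN, c = FP, d = TN as in the matrix on slide 9:

```python
# Weighted accuracy, with the weight rows from the table above.
def weighted_accuracy(a, b, c, d, w1, w2, w3, w4):
    return (w1 * a + w4 * d) / (w1 * a + w2 * b + w3 * c + w4 * d)

a, b, c, d = 30, 10, 20, 40
print("recall:   ", weighted_accuracy(a, b, c, d, 1, 1, 0, 0))  # a / (a + b)
print("precision:", weighted_accuracy(a, b, c, d, 1, 0, 1, 0))  # a / (a + c)
print("accuracy: ", weighted_accuracy(a, b, c, d, 1, 1, 1, 1))
```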

15. ROC Curve
• Receiver Operating Characteristic (ROC)
• [Figure: a 1-dimensional data set containing 2 classes; any point located at x > t is classified as positive]

16. ROC Curve
• Points (TPR, FPR) on the curve:
  – (0, 0): declare everything to be the negative class
  – (1, 1): declare everything to be the positive class
  – (1, 0): ideal
• Diagonal line:
  – Random guessing
  – Below the diagonal line: the prediction is the opposite of the true class
• Figure from [Tan, Steinbach, Kumar]
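A plain-Python sketch of tracing ROC points by sweeping the threshold t over classifier scores, as in the 1-d example on the previous slide (scores and labels are made up):

```python
# ROC points (TPR, FPR): predict positive when score > t.
def roc_points(scores, labels):
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    # t = -inf declares everything positive, giving the (1, 1) corner;
    # the largest score as t declares everything negative, giving (0, 0).
    for t in [float("-inf")] + sorted(set(scores)):
        tp = sum(s > t and y == 1 for s, y in zip(scores, labels))
        fp = sum(s > t and y == 0 for s, y in zip(scores, labels))
        points.append((tp / pos, fp / neg))  # (TPR, FPR)
    return points

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1,   1,   0,   1,   0,    0]
print(roc_points(scores, labels))
```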

17. Comparing Two Classifiers
• [Figure from Tan, Steinbach, Kumar]

18. Cost-Sensitive Learning
• In some applications, misclassifying some classes may be disastrous
  – Tumor detection, fraud detection
• Use a cost matrix:

                           PREDICTED CLASS
                           Class=Yes   Class=No
  ACTUAL CLASS  Class=Yes  -1          100
                Class=No   1           0
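A small sketch of scoring predictions under the cost matrix above (the label lists are the same made-up example as on slide 9):

```python
# Total misclassification cost: each (actual, predicted) pair is
# charged its cost-matrix entry.
COST = {("Yes", "Yes"): -1, ("Yes", "No"): 100,
        ("No",  "Yes"):  1, ("No",  "No"):   0}

def total_cost(y_true, y_pred):
    return sum(COST[(t, p)] for t, p in zip(y_true, y_pred))

y_true = ["Yes", "Yes", "No", "No", "No"]
y_pred = ["Yes", "No",  "No", "Yes", "No"]
print(total_cost(y_true, y_pred))  # -1 + 100 + 0 + 1 + 0 = 100
```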

19. Sampling for Imbalanced Classes
• Consider a data set containing 100 positive examples and 1,000 negative examples
• Undersampling: use a random sample of 100 negative examples together with all positive examples
  – Some useful negative examples may be lost
  – Run undersampling multiple times and use the ensemble of the resulting base classifiers
  – Focused undersampling: remove negative examples that are not useful for classification, e.g., those far away from the decision boundary
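A plain-Python sketch of undersampling with the counts from the slide (the tuples stand in for real examples):

```python
# Undersampling: all positives plus an equal-sized random sample of
# negatives, giving a balanced training set.
import random

random.seed(0)
positives = [("pos", i) for i in range(100)]
negatives = [("neg", i) for i in range(1000)]

train = positives + random.sample(negatives, len(positives))
print(len(train))  # 200 examples, balanced 100 vs. 100
```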

20. Oversampling
• Replicate the positive examples until the training set has an equal number of positive and negative examples
• For noisy data, this may cause overfitting
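The flip side of the undersampling sketch, again with the counts from slide 19:

```python
# Oversampling: replicate positives (sampling with replacement) until
# the classes are balanced.
import random

random.seed(0)
positives = [("pos", i) for i in range(100)]
negatives = [("neg", i) for i in range(1000)]

extra = [random.choice(positives)
         for _ in range(len(negatives) - len(positives))]
train = positives + extra + negatives
print(len(train))  # 2000 examples, balanced 1000 vs. 1000
```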

21. Significance Tests
• Are two algorithms different in effectiveness?
  – The null hypothesis: there is NO difference
  – The alternative hypothesis: there is a difference, e.g., B is better than A (the baseline method)
• Matched pair experiments: the rankings being compared are based on the same set of queries for both algorithms
• Possible errors of significance tests
  – Type I: the null hypothesis is rejected when it is true
  – Type II: the null hypothesis is accepted when it is false
• The power of a hypothesis test: the probability that the test will correctly reject the null hypothesis
  – Higher power means fewer type II errors

22. Procedure of Comparison
• Use a set of data sets
• Procedure (see the sketch below):
  – Compute the effectiveness measure for every data set
  – Compute a test statistic based on a comparison of the effectiveness measures across the data sets
    • E.g., the t-test, the Wilcoxon signed-rank test, or the sign test
  – Compute a P-value: the probability of observing a test statistic value at least that extreme if the null hypothesis were true
  – Reject the null hypothesis if the P-value ≤ α, where α is the significance level, chosen to control the type I error rate
• One-sided (one-tailed) tests ask whether B is better than A (the baseline method)
  – Two-sided tests ask whether A and B are different; the P-value is doubled
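A hedged sketch of the procedure using SciPy's paired tests; the per-data-set effectiveness scores for A and B below are made up for illustration:

```python
# Matched-pair significance tests over per-data-set effectiveness scores.
from scipy.stats import ttest_rel, wilcoxon

scores_A = [0.71, 0.69, 0.75, 0.80, 0.68, 0.73, 0.76, 0.72]
scores_B = [0.74, 0.72, 0.77, 0.79, 0.73, 0.78, 0.80, 0.75]

# Paired t-test; alternative="less" makes it one-sided, testing
# whether B is better than the baseline A.
t_stat, p_value = ttest_rel(scores_A, scores_B, alternative="less")
print(f"t = {t_stat:.2f}, P-value = {p_value:.4f}")

# Nonparametric alternative on the same matched pairs.
w_stat, p_value = wilcoxon(scores_A, scores_B, alternative="less")
print(f"W = {w_stat:.2f}, P-value = {p_value:.4f}")

# Reject the null hypothesis if the P-value <= alpha (e.g., 0.05).
```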

23. Distribution of Test Statistics
• [Figure: distributions of the test statistics]

24. T-test
• Assume the data values are sampled from normal distributions
  – In a matched pair experiment, assume the differences between the effectiveness values are a sample from a normal distribution
• The null hypothesis: the mean of the distribution of differences is 0

  t = mean(B − A) / σ(B−A) · √N

  – mean(B − A) is the mean of the differences, σ(B−A) is the standard deviation of the differences:

  σ² = (1/N) · Σ_{i=1..N} (x_i − x̄)²
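A direct implementation of the matched-pair t statistic as defined above (note the 1/N variance from the slide); the differences below are illustrative, not the data behind the example on the next slide:

```python
# Matched-pair t statistic over per-query differences B - A.
from math import sqrt

def matched_pair_t(diffs):
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((x - mean) ** 2 for x in diffs) / n  # 1/N, per the slide
    return mean / sqrt(var) * sqrt(n)

diffs = [12, -4, 30, 8, 22]  # hypothetical B - A values
print(matched_pair_t(diffs))  # ≈ 2.60
```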

25. Example
• mean(B − A) = 21.4, σ(B−A) = 29.1, t = 2.33
• P-value = 0.02: significant at the α = 0.05 level – the null hypothesis can be rejected
