ROC Analysis for Evaluation of Machine Learning Algorithms



  1. ROC Analysis for Evaluation of Machine Learning Algorithms
     Larry Holder
     School of Electrical Engineering and Computer Science
     Washington State University

  2. References
     - Provost et al., “The Case Against Accuracy Estimation for Comparing Induction Algorithms,” International Conference on Machine Learning, 1998.
     - Rob Holte’s talk on ROC analysis at www.cs.ualberta.ca/~holte/Learning/ROCtalk/

  3. Motivation
     - Most comparisons of machine learning algorithms use classification accuracy
     - Problems with this approach
       - False positive and false negative errors may carry different costs
       - Training data may not reflect the true class distribution
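To make the cost problem concrete, here is a small worked example. The confusion counts and the 10:1 cost ratio are invented for illustration only and do not come from the slides or the cited paper; the helper names are likewise hypothetical.

```python
def accuracy(tp, fn, fp, tn):
    """Fraction of all examples classified correctly."""
    return (tp + tn) / (tp + fn + fp + tn)


def total_cost(fn, fp, cost_fn, cost_fp):
    """Total misclassification cost for given per-error costs."""
    return fn * cost_fn + fp * cost_fp


# Hypothetical counts on 1000 examples (50 positives, 950 negatives).
a = dict(tp=10, fn=40, fp=5, tn=945)    # classifier A: rarely predicts positive
b = dict(tp=45, fn=5, fp=100, tn=850)   # classifier B: predicts positive more often

acc_a = accuracy(**a)
acc_b = accuracy(**b)

# Assumed costs: a false negative is 10 times as expensive as a false positive.
cost_a = total_cost(a["fn"], a["fp"], cost_fn=10, cost_fp=1)
cost_b = total_cost(b["fn"], b["fp"], cost_fn=10, cost_fp=1)

print(acc_a, acc_b)    # A is the more accurate classifier
print(cost_a, cost_b)  # yet A is the more costly one under these costs
```

Under these assumed numbers, ranking by accuracy and ranking by cost disagree, which is the situation the slide warns about.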

  4. Motivation
     - Perhaps maximizing accuracy is still okay
       - Alter the class distribution to reduce FP/FN costs
     - Problems
       - Only works in the 2-class case
       - Assigning true costs is difficult
       - The true class distribution is uncertain
     - So, we must show that classifier L1 is better than L2 under more general conditions

  5. ROC Analysis
     - Receiver Operating Characteristic (ROC)
       - Originated in signal detection theory
       - Common in medical diagnosis
       - Becoming common in ML evaluations
     - ROC curves assess predictive behavior independent of error costs or class distributions

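  6. Confusion Matrix

     | True Class | Classified As Positive | Classified As Negative |
     |------------|------------------------|------------------------|
     | Positive   | # TP                   | # FN                   |
     | Negative   | # FP                   | # TN                   |

     - True positive rate: TP = # TP / # P, where # P = # TP + # FN
     - False positive rate: FP = # FP / # N, where # N = # FP + # TN
     - Both rates are independent of the class distribution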
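The two rates can be computed directly from the four confusion-matrix counts. The sketch below uses invented counts and a hypothetical helper name, `rates`; it also illustrates the independence claim by multiplying the negative class by 10 while keeping the classifier's per-class behavior fixed.

```python
def rates(tp, fn, fp, tn):
    """Return (TP rate, FP rate) from confusion-matrix counts."""
    tp_rate = tp / (tp + fn)   # # TP / # P
    fp_rate = fp / (fp + tn)   # # FP / # N
    return tp_rate, fp_rate


def accuracy(tp, fn, fp, tn):
    """Fraction of all examples classified correctly."""
    return (tp + tn) / (tp + fn + fp + tn)


# Hypothetical balanced sample: 100 positives, 100 negatives.
balanced = dict(tp=80, fn=20, fp=30, tn=70)
# Same per-class behavior, but ten times as many negatives.
skewed = dict(tp=80, fn=20, fp=300, tn=700)

rates_balanced = rates(**balanced)
rates_skewed = rates(**skewed)
acc_balanced = accuracy(**balanced)
acc_skewed = accuracy(**skewed)

print(rates_balanced, rates_skewed)  # the rate pairs match
print(acc_balanced, acc_skewed)      # accuracy shifts with the class mix
```

Each rate is normalized within a single true class, so changing the ratio of positives to negatives leaves both unchanged, whereas accuracy mixes the two classes and moves.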

  7. ROC Curves
     - ROC space
       - False positive (FP) rate on the X axis
       - True positive (TP) rate on the Y axis
     - Each classifier is represented by a point in ROC space corresponding to its (FP, TP) pair
     - For continuous-output models, classifiers are defined by varying the threshold on the output
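For a continuous-output model, each threshold yields one classifier and therefore one (FP, TP) point. A minimal sketch of that sweep follows, assuming binary labels coded as 1 (positive) and 0 (negative) and the rule "predict positive when score >= threshold". The function name `roc_points` and the data are illustrative, not taken from the slides.

```python
def roc_points(scores, labels):
    """Return one (FP rate, TP rate) point per distinct threshold.

    Thresholds are swept from the highest score down, so the points
    run from (0, 0) toward (1, 1).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical model outputs and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
for fp_rate, tp_rate in points:
    print(f"FP={fp_rate:.2f}  TP={tp_rate:.2f}")
```

This version recounts the data at every threshold for clarity; a single pass over the sorted scores would give the same points more efficiently.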

  8. Example ROC Curve
     [Figure: ROC curves for Learner L1, Learner L2, Learner L3, and a random classifier. X axis: false positive rate (0 to 1.0); Y axis: true positive rate (0 to 1.0).]

  9. Domination in ROC Space
     - Learner L1 dominates L2 if L2’s ROC curve is beneath L1’s curve
     - If L1 dominates L2, then L1 is better than L2 for all possible costs and class distributions
     - If neither dominates (L2 and L3), then there are times when L2 maximizes accuracy but does not minimize cost
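A domination check can be sketched as follows, assuming each ROC curve is given as a piecewise-linear list of (FP, TP) points with strictly increasing FP values running from (0, 0) to (1, 1). The curves, the helper names `tp_at` and `dominates`, and the choice to treat ties as domination are all assumptions made for illustration.

```python
def tp_at(curve, fp):
    """Linearly interpolate the TP rate of a piecewise-linear ROC curve at fp.

    Assumes curve is sorted by FP, strictly increasing, and spans [0, 1].
    """
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if x0 <= fp <= x1:
            return y0 + (y1 - y0) * (fp - x0) / (x1 - x0)
    raise ValueError("fp is outside the range covered by the curve")


def dominates(curve_a, curve_b, eps=1e-12):
    """True if curve_a is at or above curve_b at every FP value.

    For piecewise-linear curves it is enough to compare at the union of
    both curves' breakpoints, since the gap is linear in between.
    """
    grid = sorted({x for x, _ in curve_a} | {x for x, _ in curve_b})
    return all(tp_at(curve_a, x) + eps >= tp_at(curve_b, x) for x in grid)


# Hypothetical curves, not the ones drawn in the slides.
l1 = [(0.0, 0.0), (0.25, 0.75), (1.0, 1.0)]
l2 = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]              # the diagonal
l3 = [(0.0, 0.0), (0.1, 0.5), (0.5, 0.6), (1.0, 1.0)]  # crosses l1

print(dominates(l1, l2))                     # l1 lies above the diagonal everywhere
print(dominates(l1, l3), dominates(l3, l1))  # the curves cross: neither dominates
```

When two curves cross, as `l1` and `l3` do here, each has a region of ROC space where it sits on top, so which learner is preferable depends on the costs and class distribution.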

  10. Expected ROC Curve
      - Perform k-fold cross-validation on each learner
      - Treat the ROC curve from each fold $i$ as a function $R_i$ such that $TP = R_i(FP)$
      - $\hat{R}(FP) = \mathrm{mean}_i\,(R_i(FP))$
      - Generate the ROC curve by evenly sampling $\hat{R}$ along the FP axis
      - Compute confidence intervals according to the binomial distribution over the resulting TP values
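The averaging step can be sketched as below, assuming each fold's curve is a piecewise-linear list of (FP, TP) points with strictly increasing FP. The slide does not say how the binomial interval is computed, so the normal-approximation half-width and the sample size `n` passed to it are assumptions, as are all function names and the toy fold curves.

```python
import math


def tp_at(curve, fp):
    """Linearly interpolate the TP rate of a piecewise-linear ROC curve at fp."""
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if x0 <= fp <= x1:
            return y0 + (y1 - y0) * (fp - x0) / (x1 - x0)
    raise ValueError("fp is outside the range covered by the curve")


def expected_roc(fold_curves, n_samples=11):
    """Average the fold curves vertically at evenly spaced FP values."""
    grid = [i / (n_samples - 1) for i in range(n_samples)]
    return [
        (fp, sum(tp_at(curve, fp) for curve in fold_curves) / len(fold_curves))
        for fp in grid
    ]


def binomial_half_width(p, n, z=1.96):
    """Normal-approximation half-width of a binomial interval.

    Assumption: n is the number of examples behind the TP estimate p;
    the slides do not specify this choice.
    """
    return z * math.sqrt(p * (1 - p) / n)


# Two hypothetical fold curves.
folds = [
    [(0.0, 0.0), (0.5, 1.0), (1.0, 1.0)],
    [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)],
]

mean_curve = expected_roc(folds, n_samples=3)
half_width = binomial_half_width(mean_curve[1][1], n=100)

print(mean_curve)
print(half_width)
```

Averaging along the TP axis at fixed FP values matches the slide's definition of the mean curve as a function of FP; averaging along thresholds instead would be a different procedure.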

  11. Accuracy vs. ROC Curves
      - Hypothesis
        - Standard learning algorithms produce dominating ROC models
      - Answer: No
        - Results on 10 datasets from the UCI repository show only one instance of a dominating model
      - Thus, learners that maximize accuracy typically do not dominate in ROC space
        - Thus, they are worse than others for some costs and class distributions
      - Non-dominating ROC curves can still provide regions of superiority for different learners

  12. Summary
      - Results comparing the accuracy of learning algorithms are questionable
        - Especially in scenarios with non-uniform costs and class distributions
      - ROC curves provide a better look at where different learners minimize cost
      - Recommendation: use proper ROC analysis when comparing learning algorithms
