Optimizing Abstaining Classifiers using ROC Analysis - PowerPoint PPT Presentation


SLIDE 1

IBM Zurich Research Laboratory, GSAL

Optimizing Abstaining Classifiers using ROC Analysis

Tadek Pietraszek /ˈtʌ·dek pɪe·ˈtrʌ·ʃek/

pie@zurich.ibm.com

ICML 2005 August 9, 2005

SLIDE 2

ICML2005 2 August 9, 2005

“To classify, or not to classify: that is the question.”

SLIDE 3

Motivation

  • Abstaining classifiers are classifiers that can, in certain cases, refrain from classification; they resemble human experts who can say "I don't know".
  • In many domains such experts are preferred to ones that always make a decision and are sometimes wrong (think "doctor").
  • Machine learning has frequently used abstaining classifiers ([FH04], [GL00], [PMAS94], [Tort00]), also implicitly (e.g., active learning, delegating classifiers, triskels (ICML05)).
  • Q1: How do we optimally select abstaining classifiers?
  • Q2: How do we compare normal and abstaining classifiers?

SLIDE 4

Outline

  • 1. ROC Background
  • 2. Tri-State Classifier

    1. Cost-Based Model
    2. Bounded-Abstention Model
    3. Bounded-Improvement Model

  • 3. Experiments, Results
  • 4. Summary
SLIDE 5

1. ROC Background 2. Abstaining Classifier Cost-Based Bounded-Abstention Bounded-Improvement 3. Experiments, Results 4. Summary

Notation

  • Binary classifier C is a function C : I → {+, −}, where i ∈ I is an instance.
  • Ranker R (a.k.a. scoring classifier) is a function attaching a rank to an instance, R : I → ℝ; it can be converted to a binary classifier Cτ using ∀i : Cτ(i) = + ⇔ R(i) ≥ τ.
  • Abstaining binary classifier A is a classifier that can, in certain cases, refrain from classification. We denote this as attaching a third class "?".
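The notation above can be made concrete with a small sketch (the function names are mine, not from the talk): a ranker plus one threshold gives the binary classifier Cτ, and the abstaining classifier of the later slides simply uses two thresholds, emitting "?" between them.

```python
def binary_classifier(rank, tau):
    """C_tau(i) = '+'  iff  R(i) >= tau."""
    return '+' if rank >= tau else '-'

def abstaining_classifier(rank, tau_low, tau_high):
    """Abstains ('?') when the rank falls between the two thresholds."""
    assert tau_low <= tau_high
    if rank >= tau_high:
        return '+'
    if rank < tau_low:
        return '-'
    return '?'
```

With tau_low = tau_high this degenerates to the plain binary classifier.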

SLIDE 6

ROC Background

  • Evaluates model performance under all class and cost distributions – a 2D plot (X – false positive rate, Y – true positive rate).
  • A classifier C corresponds to a single point (fp, tp) on the ROC curve.
  • A classifier Cτ (or a machine learning method Lτ) has a parameter τ; varying it produces multiple points.
  • Therefore we consider a ROC curve a function f : τ → (fpτ, tpτ).
  • We can find an inverse function f⁻¹ : (fpτ, tpτ) → τ.
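A minimal sketch of the mapping f : τ → (fpτ, tpτ) (helper name assumed, not from the slides): enumerate the distinct scores as thresholds and count the rates. Because the mapping is stored explicitly, the inverse f⁻¹ is just a reverse lookup from a point back to its τ.

```python
def roc_as_function_of_tau(scores, labels):
    """Map each threshold tau to its ROC point (fp_tau, tp_tau).

    labels are '+'/'-'; C_tau predicts '+' iff score >= tau.
    """
    P = sum(1 for l in labels if l == '+')
    N = len(labels) - P
    f = {}
    for tau in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= tau and l == '+') / P
        fp = sum(1 for s, l in zip(scores, labels) if s >= tau and l == '-') / N
        f[tau] = (fp, tp)
    return f
```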

SLIDE 7

ROC Background

  • ROC Convex Hull (ROCCH) – a piecewise-linear, convex-down curve fR with the following properties:
    – fR(0) = 0, fR(1) = 1.
    – The slope of fR is monotonically non-increasing.
  • Assume that for any slope value m there exists a point where fR has slope m [PF98]:
    – Vertices have "slopes" assuming all values between the slopes of the adjacent edges.
    – Assume sentinel edges: a 0th edge with slope ∞ and an (n+1)th edge with slope 0.
  • We will use the ROCCH instead of the ROC curve.
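The ROCCH itself can be computed with a standard upper-hull sweep. This sketch (my own code, not from the talk) adds the (0, 0) and (1, 1) endpoints described above and drops any point lying on or below the hull.

```python
def _cross(o, a, b):
    """> 0 for a counter-clockwise (left) turn o -> a -> b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def rocch(points):
    """Upper convex hull of ROC points, including (0, 0) and (1, 1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Pop vertices that would make the chain turn left (or go straight):
        # the upper hull keeps only right turns, i.e. non-increasing slopes.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull
```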

SLIDE 8

Some Definitions

  • Confusion matrix (A = actual, C = classified as):

    A\C    +     −
    +      TP    FN      P = TP + FN
    −      FP    TN      N = FP + TN

    tp = TP / (TP + FN),  fp = FP / (FP + TN),  fn = FN / (TP + FN)

  • Cost matrix (correct classifications cost 0):

    A\C    +     −
    +      0     c12
    −      c21   0

    CR = c21 / c12

SLIDE 9

Cost Minimizing Criteria for One Classifier

  • Known iso-performance lines [PF98]:

    f′ROC(fp) = (N/P) · CR
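Sweeping an iso-performance line of slope (N/P)·CR over the ROCCH picks the cost-optimal point; equivalently (and simpler to code) one can evaluate the expected cost at every hull vertex. A sketch with names of my own choosing:

```python
def best_binary_vertex(hull, N, P, c12, c21):
    """Vertex of the ROCCH minimizing the expected cost per instance.

    rc = (c21 * FP + c12 * FN) / (N + P), with FP = fp * N, FN = (1 - tp) * P.
    Equivalent to sweeping an iso-performance line of slope (N/P) * (c21/c12).
    """
    def rc(fp, tp):
        return (c21 * fp * N + c12 * (1.0 - tp) * P) / (N + P)
    return min(hull, key=lambda v: rc(*v))
```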

SLIDE 10

Outline

  • 1. ROC Background
  • 2. Tri-State Classifier

    1. Cost-Based Model
    2. Bounded-Abstention Model
    3. Bounded-Improvement Model

  • 3. Experiments, Results
  • 4. Summary
SLIDE 11

Metaclassifier Aα,β

  • IDEA: Construct the classifier from two binary classifiers Cα, Cβ as follows:

    Aα,β(x) = +  if Cα(x) = +
    Aα,β(x) = ?  if Cα(x) = − ∧ Cβ(x) = +
    Aα,β(x) = −  if Cβ(x) = −

    where Cα, Cβ are such that (the combination Cα(x) = + ∧ Cβ(x) = − is impossible):

    ∀x : (Cα(x) = + ⇒ Cβ(x) = +) ∧ (Cβ(x) = − ⇒ Cα(x) = −)

  • Can we optimally select Cα, Cβ?
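Read as code (a sketch; Cα and Cβ are any binary classifiers satisfying the compatibility condition):

```python
def metaclassifier(c_alpha, c_beta):
    """Build A_{alpha,beta} from two compatible binary classifiers."""
    def a(x):
        ca, cb = c_alpha(x), c_beta(x)
        # The fourth combination is ruled out by the requirement
        # C_alpha(x) = '+'  =>  C_beta(x) = '+'.
        assert not (ca == '+' and cb == '-'), "incompatible classifiers"
        if ca == '+':
            return '+'
        return '?' if cb == '+' else '-'
    return a
```

Two thresholds τα ≥ τβ on one ranker R always satisfy the condition, since R(x) ≥ τα implies R(x) ≥ τβ.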

SLIDE 12

Requirements on the ROC Curve

Requirement: for a ROC curve and any two classifiers Cα and Cβ corresponding to points (fpα, tpα) and (fpβ, tpβ) with fpα ≤ fpβ:

∀x : (Cα(x) = + ⇒ Cβ(x) = +) ∧ (Cβ(x) = − ⇒ Cα(x) = −)

  • These conditions are the same as those used by [FlachWu03] and are met, in particular, if Cα and Cβ are constructed from a single ranker R.

SLIDE 13

“Optimal” Metaclassifier Aα,β

  • How do we compare binary classifiers and abstaining classifiers? How do we select an optimal classifier?
  • No clear answer – either:
    – use a cost-based model (Cost-Based Model), or
    – use boundary conditions:
      • a maximum number of instances classified as "?" (Bounded-Abstention Model), or
      • a maximum misclassification cost (Bounded-Improvement Model).
SLIDE 14

Cost-Based Model

  • Cost matrix – 2x3, with an abstention column (A = actual, C = classified as):

    A\C    +     −     ?
    +      0     c12   c13
    −      c21   0     c23

  • Confusion matrices of Cα (TPα, FNα, FPα, TNα) and Cβ (TPβ, FNβ, FPβ, TNβ).
  • Important properties:

    fpα ≤ fpβ ⇒ FPα ≤ FPβ   and   fnα ≥ fnβ ⇒ FNα ≥ FNβ

SLIDE 15

Selecting the Optimal Classifier

  • Similar criteria – minimize the cost:

    rc = 1/(N+P) · [ c12·FNβ + c21·FPα + c13·(FNα − FNβ) + c23·(FPβ − FPα) ]

    (the first two terms count misclassified instances, the last two the abstained instances)

    ∂rc/∂fpα = 0 ∧ ∂rc/∂fpβ = 0 ⇒
    f′ROC(fpα) = (c21 − c23)/c13 · N/P,   f′ROC(fpβ) = c23/(c12 − c13) · N/P
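The selection can also be checked by brute force: evaluate rc for every ordered pair of ROCCH vertices and keep the minimum. This sketch (my own helper names) implements the rc formula above; note the exact optimum may also lie inside hull edges rather than at vertices.

```python
def rc_cost_based(a, b, N, P, c12, c21, c13, c23):
    """Per-instance cost of A_{alpha,beta} for ROC points a = (fp_a, tp_a),
    b = (fp_b, tp_b) with fp_a <= fp_b, under the 2x3 cost matrix."""
    (fpa, tpa), (fpb, tpb) = a, b
    FPa, FPb = fpa * N, fpb * N
    FNa, FNb = (1 - tpa) * P, (1 - tpb) * P
    # misclassified: FN_beta and FP_alpha; abstained: the two differences
    return (c12 * FNb + c21 * FPa + c13 * (FNa - FNb) + c23 * (FPb - FPa)) / (N + P)

def best_pair(hull, N, P, c12, c21, c13, c23):
    """Exhaustive search over ordered vertex pairs of the ROCCH."""
    pairs = [(a, b) for a in hull for b in hull if a[0] <= b[0]]
    return min(pairs, key=lambda ab: rc_cost_based(ab[0], ab[1], N, P, c12, c21, c13, c23))
```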

SLIDE 16

Cost-Based Model – a Simulated Example

[Figure: ROC curve with two optimal classifiers (X: FP, Y: TP), marking Classifier A and Classifier B]

[Figure: Misclassification cost for different combinations of A and B, as a surface over FP(a) and FP(b)]

    f′ROC(fpα) = (c21 − c23)/c13 · N/P,   f′ROC(fpβ) = c23/(c12 − c13) · N/P

SLIDE 17

Understanding Cost Matrices

  • The 2x2 cost matrix is well known. 2x3 cost matrices have some interesting properties: e.g., under which conditions is the optimal classifier an abstaining classifier?
  • Our derivation is valid for

    (c21 ≥ c23) ∧ (c12 > c13) ∧ (c12·c21 ≥ c13·c21 + c23·c12)   (*)

    and we can prove that if this condition is not met, the classifier is a trivial binary classifier.
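Dividing the last inequality of (*) as reconstructed above by c12·c21 gives c13/c12 + c23/c21 ≤ 1: abstention has to be cheap enough relative to the errors it avoids. A tiny predicate (my naming):

```python
def abstention_can_be_optimal(c12, c21, c13, c23):
    """Condition (*): only when it holds can the cost-optimal
    classifier be a non-trivial abstaining classifier."""
    return (c21 >= c23) and (c12 > c13) and (c12 * c21 >= c13 * c21 + c23 * c12)
```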

SLIDE 18

Cost Matrices – Interesting Cases

  • How do we set c13, c23 so that the classifier is a non-trivial abstaining classifier?
  • Two interesting cases:
    – Symmetric case (c13 = c23):

      c13 = c23 ≤ (c12·c21) / (c12 + c21)

    – Proportional case (c13 / c23 = c12 / c21):

      c13 ≤ c12/2 ⇔ c23 ≤ c21/2

SLIDE 19

Bounded Models

  • Problem: a 2x3 cost matrix is not always given and would have to be estimated. However, the classifier is very sensitive to c13, c23.
  • Goal: find other optimization criteria for an abstaining classifier using a standard 2x2 cost matrix.
    – Calculate the misclassification cost per classified instance.
  • Follow the same reasoning to find the optimal classifier.

SLIDE 20

Bounded Models Equation

  • We obtain the following equations, determining the relationship between the abstention fraction k and the cost rc as a function of the classifiers Cα, Cβ:

    rc = (c21·FPα + c12·FNβ) / ((N + P)·(1 − k))
    k = [ N·(fpβ − fpα) + P·(fnα − fnβ) ] / (N + P)

    – Constrain k, minimize rc → bounded-abstention model
    – Constrain rc, minimize k → bounded-improvement model
  • There is no algebraic solution; we need to optimize numerically.

SLIDE 21

Bounded-Abstention Model

  • Among classifiers abstaining on no more than a fraction kMAX of instances, find the one that minimizes rc.
  • Useful, e.g., in real-time processing, where the non-classified instances will be processed by another classifier with a limited processing speed.
  • We can prove that the solution is not limited to the vertices of the ROCCH.

SLIDE 22

Bounded-Abstention Model – a Simulated Example

[Figure: ROC curve with two optimal classifiers (X: FP, Y: TP), marking Classifier A and Classifier B]

[Figure: Misclassification cost for different combinations of A and B, bounded case |?| ≤ 0.2]

SLIDE 23

Bounded-Improvement Model

  • Among classifiers having a misclassification cost not higher than rcMAX, find the one that abstains on the smallest number of instances.
  • Useful in, e.g., the medical domain, where, given a test, we want to achieve a certain lower misclassification cost while allowing for non-classified instances.
  • For the evaluation we use f such that rcMAX = (1 − f)·rc, where rc is the cost of the optimal binary classifier.
  • We can prove that the solution is not limited to the vertices of the ROCCH.
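The dual search can be sketched the same way (assumed names; self-contained on purpose, again over vertex pairs only): keep the pairs whose cost meets rc_max and return the one that abstains least.

```python
def bounded_improvement(hull, rc_max, N, P, c12, c21):
    """Among vertex pairs with cost rc <= rc_max, abstain on the fewest instances."""
    best = None
    for (fpa, tpa) in hull:
        for (fpb, tpb) in hull:
            if fpa > fpb:
                continue
            # abstention fraction and cost per classified instance
            k = (N * (fpb - fpa) + P * (tpb - tpa)) / (N + P)
            if k >= 1.0:
                rc = 0.0  # abstains on everything, trivially zero cost
            else:
                rc = (c21 * fpa * N + c12 * (1 - tpb) * P) / ((N + P) * (1 - k))
            if rc <= rc_max and (best is None or k < best[0]):
                best = (k, rc, (fpa, tpa), (fpb, tpb))
    return best
```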

SLIDE 24

Bounded-Improvement Model – a Simulated Example

[Figure: ROC curve with two optimal classifiers (X: FP, Y: TP), marking Classifier A and Classifier B]

[Figure: Fraction of skipped instances for different combinations of A and B, as a surface over FP(a) and FP(b)]

SLIDE 25

Experiments

  • Tested with 15 UCI KDD datasets, using averaged cross-validation.
  • In each model we used one independent parameter: c13 = c23, k, or f.
  • Classifier – the Bayesian classifier from Weka [WF00].
  • Numerical calculations and optimization in R.
  • Showing results for one representative dataset.

SLIDE 26

Building an Abstaining Classifier

[Flow diagram: from a binary classifier to an abstaining classifier. Build Classifier → Build ROC (n-fold cross-validation on the training instances, training set / testing set per fold; repeat m times and average) → Find Thresholds (inputs: (1) a 2x3 cost matrix, or (2) a 2x2 cost matrix plus a fraction k or f) → Construct Tri-State Classifier → Classify → Collect Statistics]

SLIDE 27

Results – Cost-Based Model

[Figure: ionosphere.arff – cost improvement vs. cost value c13 = c23]

[Figure: ionosphere.arff – fraction of instances skipped vs. cost value c13 = c23]

[Figure: ionosphere.arff – cost improvement vs. fraction of instances skipped]

SLIDE 28

Results – Bounded-Abstention Model

[Figure: ionosphere.arff – relative cost improvement vs. fraction skipped (k)]

[Figure: ionosphere.arff – misclassification cost (rc) vs. fraction skipped (k)]

SLIDE 29

Results – Bounded-Improvement Model

[Figure: ionosphere.arff – fraction skipped (k) vs. relative cost improvement (f)]

[Figure: ionosphere.arff – fraction skipped (k) vs. misclassification cost (rc)]

SLIDE 30

Summary

  • Abstaining classifier as a metaclassifier:
    – Cost-based model
    – Bounded-improvement model
    – Bounded-abstention model
  • Methodically tested and showed that it works (in all three models):
    – Multiple data sets (UCI KDD)
    – Cross-validation
  • The idea fits our alert-classification system (see: Pietraszek 2004, "Using Adaptive Alert Classification to Reduce False Positives in Intrusion Detection").

SLIDE 31

IBM Zurich Research Laboratory, GSAL

END

pie@zurich.ibm.com http://tadek.pietraszek.org/

SLIDE 32

Bibliography (1)

  • [Chow70] Chow, C. (1970). On optimum recognition error and reject tradeoff. IEEE Transactions on Information Theory, 16, 41--46.
  • [Dietterich98] Dietterich, T. G. (1998). Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10, 1895--1923.
  • [Fawcett03] Fawcett, T. (2003). ROC graphs: Notes and practical considerations for researchers (HPL-2003-4). Technical report, HP Laboratories.
  • [FFH04] Ferri, C., Flach, P., Hernandez-Orallo, J. (2004). Delegating classifiers. Proceedings of the 21st International Conference on Machine Learning (ICML'04) (pp. 106--110). Alberta, Canada: Omnipress.
  • [FerriHernandez04] Ferri, C., Hernandez-Orallo, J. (2004). Cautious classifiers. Proceedings of ROC Analysis in Artificial Intelligence, 1st International Workshop (ROCAI-2004) (pp. 27--36). Valencia, Spain.
  • [FlachWu03] Flach, P. A., Wu, S. (2003). Repairing concavities in ROC curves. Proc. 2003 UK Workshop on Computational Intelligence (pp. 38--44). Bristol, UK.
  • [GambergerLavrac00] Gamberger, D., Lavrac, N. (2000). Reducing misclassification costs. Principles of Data Mining and Knowledge Discovery, 4th European Conference (PKDD 2000) (pp. 34--43). Lyon, France: Springer-Verlag.
  • [HettichBay99] Hettich, S., Bay, S. D. (1999). The UCI KDD Archive. Web page at http://kdd.ics.uci.edu.
  • [LewisCatlett94] Lewis, D. D., Catlett, J. (1994). Heterogeneous uncertainty sampling for supervised learning. Proceedings of ICML-94, 11th International Conference on Machine Learning (pp. 148--156). San Francisco, US: Morgan Kaufmann.

SLIDE 33

Bibliography (2)

  • [NelderMead65] Nelder, J., Mead, R. (1965). A simplex method for function minimization. Computer Journal, 7, 308--313.
  • [PMAS94] Pazzani, M. J., Murphy, P., Ali, K., Schulenburg, D. (1994). Trading off coverage for accuracy in forecasts: Applications to clinical data analysis. Proceedings of the AAAI Symposium on AI in Medicine (pp. 106--110). Stanford, CA.
  • [ProvostFawcett98] Provost, F., Fawcett, T. (1998). Robust classification systems for imprecise environments. Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-98) (pp. 706--713). AAAI Press.
  • [Tortorella00] Tortorella, F. (2000). An optimal reject rule for binary classifiers. Advances in Pattern Recognition, Joint IAPR International Workshops SSPR 2000 and SPR 2000 (pp. 611--620). Alicante, Spain: Springer-Verlag.
  • [WittenFrank00] Witten, I. H., Frank, E. (2000). Data Mining: Practical Machine Learning Tools with Java Implementations. San Francisco: Morgan Kaufmann.

SLIDE 34

Further Improvements in the Bounded-Abstention and Bounded-Improvement Models

  • In previous work, we used general numerical methods to find the solution.
  • But:
    – The ROCCH is not an arbitrary function; it has special properties.
    – Thus we can do much better, and understand tri-state classifiers better.
  • We propose an algorithm and a proof (see paper).

SLIDE 35

Optimal Classifier Path

[Figure: Optimal classifier path – bounded-abstention; cost surface over FP(a) and FP(b)]

SLIDE 36

Algorithm – Bounded-Abstention Model

[Figure: Smallest relative gradient path – bounded-abstention; cost surface over FP(a) and FP(b)]

SLIDE 37

Algorithm – Bounded-Improvement Model

[Figure: Optimal classifier path – bounded-improvement; abstention fraction k over FP(a) and FP(b)]

SLIDE 38

Selecting the Optimal Classifier

  • Criteria – minimize the misclassification cost:

    rc = 1/(N+P) · (c21·FP + c12·FN),   with tp = fROC(fp)
       = 1/(N+P) · (c21·FP + c12·(P − TP))
       = 1/(N+P) · (c21·FP + c12·P·(1 − fROC(FP/N)))

    d rc / d FP = 1/(N+P) · (c21 − c12·(P/N)·f′ROC(FP/N)) = 0
    ⇒ f′ROC(fp) = (N/P)·(c21/c12) = (N/P)·CR

SLIDE 39

Cost Matrices

  • Theorem. If (*) is not met, the classifier is a trivial binary classifier.

    (c21 ≥ c23) ∧ (c12 > c13) ∧ (c12·c21 ≥ c13·c21 + c23·c12)   (*)

  • Proof (sketch):
    – Show that for an optimal classifier fR′(fp*α) ≥ fR′(fp*) ≥ fR′(fp*β), where fp* corresponds to an optimal binary classifier.
    – Show that if (*) is not met, ∂rc/∂fpα is positive for fp*α < fp*, and ∂rc/∂fpβ is positive for fp*β > fp*.
    – Therefore fp*α = fp* = fp*β.