
Optimizing Abstaining Classifiers using ROC Analysis



  1. IBM Zurich Research Laboratory, GSAL
     Optimizing Abstaining Classifiers using ROC Analysis
     Tadek Pietraszek /'tʌ·dek pɪe·'trʌ·ʃek/, pie@zurich.ibm.com
     ICML 2005, August 9, 2005

  2. “To classify, or not to classify: that is the question.”

  3. Motivation
     • Abstaining classifiers are classifiers that, in certain cases, can refrain from classification; they resemble human experts who can say “I don’t know”.
     • In many domains such experts are preferred to ones that always make a decision and are sometimes wrong (think of a doctor).
     • Machine learning has frequently used abstaining classifiers ([FH04], [GL00], [PMAS94], [Tort00]), also implicitly (e.g., active learning, delegating classifiers, triskels (ICML05)).
     • Q1: How do we optimally select abstaining classifiers?
     • Q2: How do we compare normal and abstaining classifiers?

  4. Outline
     1. ROC Background
     2. Tri-State Classifier
        1. Cost-Based Model
        2. Bounded-Abstention Model
        3. Bounded-Improvement Model
     3. Experiments, Results
     4. Summary

  5. Notation
     • A binary classifier $C$ is a function $C: I \to \{+,-\}$, $i \mapsto C(i)$, where $i \in I$ is an instance.
     • A ranker $R$ (a.k.a. scoring classifier) is a function $R: I \to \mathbb{R}$ attaching a rank to an instance. It can be converted to a binary classifier $C_\tau$ using $\forall i: C_\tau(i) = + \Leftrightarrow R(i) \ge \tau$.
     • An abstaining binary classifier $A$ is a classifier that in certain cases can refrain from classification. We denote this as attaching a third class “?”.

  6. ROC Background
     • ROC analysis evaluates model performance under all class and cost distributions:
       - a 2D plot (X: false positive rate, Y: true positive rate);
       - a classifier $C$ corresponds to a single point $(fp, tp)$ on the ROC curve.
     • A classifier $C_\tau$ (or a machine learning method $L_\tau$) has a parameter $\tau$; varying it produces multiple points.
     • Therefore we consider a ROC curve a function $f: \tau \mapsto (fp_\tau, tp_\tau)$.
     • We can find an inverse function $f^{-1}: (fp_\tau, tp_\tau) \mapsto \tau$.
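
A minimal sketch of how sweeping $\tau$ over a ranker's scores traces the points $f(\tau) = (fp_\tau, tp_\tau)$, and how the inverse $f^{-1}$ can be kept as a lookup table. It assumes scores are plain floats and labels are the strings '+' and '-'; the names `roc_points` and `inverse_roc` are hypothetical, and the quadratic counting loop is kept for readability (sorting once and accumulating counts would be faster).

```python
def roc_points(scores, labels):
    """Return (fp, tp, tau) triples, one per distinct threshold tau.

    C_tau(i) = '+' iff R(i) >= tau; lowering tau moves the point
    from (0, 0) towards (1, 1).
    """
    P = sum(1 for y in labels if y == '+')
    N = len(labels) - P
    points = [(0.0, 0.0, float('inf'))]  # tau = +inf: nothing is classified '+'
    for tau in sorted(set(scores), reverse=True):
        TP = sum(1 for s, y in zip(scores, labels) if s >= tau and y == '+')
        FP = sum(1 for s, y in zip(scores, labels) if s >= tau and y == '-')
        points.append((FP / N, TP / P, tau))
    return points


def inverse_roc(points):
    """f^{-1}: map each (fp, tp) point back to the threshold that produced it."""
    return {(fp, tp): tau for fp, tp, tau in points}


pts = roc_points([0.9, 0.8, 0.7, 0.6], ['+', '+', '-', '-'])
print([p[:2] for p in pts])  # [(0.0, 0.0), (0.0, 0.5), (0.0, 1.0), (0.5, 1.0), (1.0, 1.0)]
print(inverse_roc(pts)[(0.0, 1.0)])  # 0.8
```

Each lower threshold classifies at least one more instance as '+', so distinct thresholds give distinct points and the dictionary is a well-defined inverse.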

  7. ROC Background
     • ROC Convex Hull (ROCCH) [PF98]: a piecewise-linear, convex-down curve $f_R$ with the following properties:
       - $f_R(0) = 0$, $f_R(1) = 1$;
       - the slope of $f_R$ is monotonically non-increasing;
       - assume that for any value $m$ there exists $x$ such that $f_R'(x) = m$: vertices have “slopes” taking values between the slopes of their adjacent edges;
       - assume sentinel edges: a 0th edge with slope $\infty$ and an $(n+1)$th edge with slope 0.
     • We will use the ROCCH instead of the ROC curve.
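
One way to compute the ROCCH from a set of $(fp, tp)$ points is an upper convex hull (Andrew's monotone chain, swapped in here as a standard technique; it is not claimed to be the method of [PF98]). The sketch also attaches to each vertex the range of “slopes” between its adjacent edges, using the sentinel slopes $\infty$ and 0. The function names are hypothetical.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); negative means a clockwise (right) turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def rocch(points):
    """Upper convex hull of (fp, tp) points, including (0, 0) and (1, 1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Drop the last vertex while it does not make a right turn,
        # which keeps only the upper (convex-down) part of the hull.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def edge_slopes(hull):
    """Slopes of consecutive hull edges (inf for a vertical edge at fp = 0)."""
    return [(y2 - y1) / (x2 - x1) if x2 > x1 else float('inf')
            for (x1, y1), (x2, y2) in zip(hull, hull[1:])]


def vertex_slope_range(hull, k):
    """(low, high) slopes taken by vertex k, using sentinel edges inf and 0."""
    s = [float('inf')] + edge_slopes(hull) + [0.0]
    return s[k + 1], s[k]


hull = rocch([(0.1, 0.6), (0.3, 0.7), (0.4, 0.5), (0.5, 0.9)])
print(hull)  # [(0.0, 0.0), (0.1, 0.6), (0.5, 0.9), (1.0, 1.0)]
print(edge_slopes(hull))  # non-increasing, roughly [6.0, 0.75, 0.2]
```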

  8. Some Definitions
     • Confusion matrix (A = actual, C = classified as):

         A/C   +     -
         +     TP    FN    P = TP + FN
         -     FP    TN    N = FP + TN

       with rates $tp = \frac{TP}{TP+FN}$, $fp = \frac{FP}{FP+TN}$, $fn = \frac{FN}{TP+FN}$.
     • Cost matrix, with cost ratio $CR = \frac{c_{21}}{c_{12}}$:

         A/C   +     -
         +     0     c12
         -     c21   0
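
The definitions above translate directly into a few helper functions. This is a minimal sketch with hypothetical names; it assumes the cost matrix is given as the two off-diagonal entries $c_{12}$ and $c_{21}$, with zero cost on the diagonal.

```python
def rates(TP, FN, FP, TN):
    """tp, fp and fn rates from a 2x2 confusion matrix."""
    P, N = TP + FN, FP + TN  # actual positives and negatives
    return {"tp": TP / P, "fp": FP / N, "fn": FN / P}


def misclassification_cost(FN, FP, c12, c21):
    """Total cost under the 2x2 cost matrix [[0, c12], [c21, 0]].

    c12: cost of classifying an actual '+' as '-' (false negative)
    c21: cost of classifying an actual '-' as '+' (false positive)
    """
    return c12 * FN + c21 * FP


def cost_ratio(c12, c21):
    return c21 / c12  # CR


print(rates(TP=40, FN=10, FP=5, TN=45))                   # {'tp': 0.8, 'fp': 0.1, 'fn': 0.2}
print(misclassification_cost(FN=10, FP=5, c12=1, c21=4))  # 30
print(cost_ratio(c12=1, c21=4))                           # 4.0
```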

  9. Cost-Minimizing Criteria for One Classifier
     • Minimizing $c_{12} FN + c_{21} FP$ with $FN = P(1 - f_{ROC}(fp))$ and $FP = N \cdot fp$ gives the known iso-performance lines [PF98]:
       $$f'_{ROC}(fp) = CR \cdot \frac{N}{P}$$
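
On a ROCCH the optimum is a vertex, so the criterion can be checked two ways: by directly minimizing $c_{12} FN + c_{21} FP$ over the vertices, or by picking the vertex whose slope range (with sentinel slopes $\infty$ and 0) contains $CR \cdot N/P$. A minimal sketch with hypothetical function names and an illustrative hull, not data from the slides:

```python
def optimal_vertex(hull, P, N, c12, c21):
    """Hull vertex minimizing c12*FN + c21*FP, with FN = P*(1 - tp), FP = N*fp."""
    return min(hull, key=lambda v: c12 * P * (1 - v[1]) + c21 * N * v[0])


def optimal_vertex_by_slope(hull, P, N, c12, c21):
    """Hull vertex whose slope range contains the iso-performance slope CR * N / P."""
    m = (c21 / c12) * (N / P)
    s = [float('inf')]  # sentinel 0th edge
    s += [(y2 - y1) / (x2 - x1) if x2 > x1 else float('inf')
          for (x1, y1), (x2, y2) in zip(hull, hull[1:])]
    s.append(0.0)  # sentinel (n+1)th edge
    for k, v in enumerate(hull):
        if s[k] >= m >= s[k + 1]:
            return v


hull = [(0.0, 0.0), (0.1, 0.6), (0.5, 0.9), (1.0, 1.0)]  # illustrative ROCCH
for c21 in (0.1, 1.0, 4.0, 10.0):
    print(c21, optimal_vertex(hull, 100, 100, 1.0, c21),
          optimal_vertex_by_slope(hull, 100, 100, 1.0, c21))
```

Both functions agree on this hull: cheap false positives push the optimum towards (1, 1), expensive ones towards (0, 0).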

  10. Outline
     1. ROC Background
     2. Tri-State Classifier
        1. Cost-Based Model
        2. Bounded-Abstention Model
        3. Bounded-Improvement Model
     3. Experiments, Results
     4. Summary

  11. Metaclassifier $A_{\alpha,\beta}$
     • IDEA: construct the classifier as follows:
       $$A_{\alpha,\beta}(x) = \begin{cases} + & C_\alpha(x) = + \\ ? & C_\alpha(x) = - \wedge C_\beta(x) = + \\ - & C_\beta(x) = - \end{cases}$$

         C_α   C_β   Result
         +     +     +
         -     +     ?
         -     -     -
         +     -     impossible

       where $C_\alpha$, $C_\beta$ are such that
       $$\forall x: (C_\alpha(x) = + \Rightarrow C_\beta(x) = +) \wedge (C_\beta(x) = - \Rightarrow C_\alpha(x) = -)$$
     • Can we optimally select $C_\alpha$, $C_\beta$?
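
The table above can be sketched as a combinator over two binary classifiers. This assumes both are built by thresholding one ranker, with $\tau_\alpha \ge \tau_\beta$ so that $C_\alpha$ is the more conservative one; the ranker `R` and the function names are hypothetical.

```python
def threshold_classifier(R, tau):
    """C_tau(x) = '+' iff R(x) >= tau."""
    return lambda x: '+' if R(x) >= tau else '-'


def metaclassifier(c_alpha, c_beta):
    """A_{alpha,beta}: '+' if both say '+', '-' if both say '-', '?' if only C_beta says '+'."""
    def A(x):
        a, b = c_alpha(x), c_beta(x)
        if a == '+' and b == '+':
            return '+'
        if a == '-' and b == '-':
            return '-'
        if a == '-' and b == '+':
            return '?'
        # C_alpha = '+', C_beta = '-' is the "impossible" row of the table.
        raise ValueError("C_alpha and C_beta violate the nesting condition")
    return A


def R(x):
    return x  # hypothetical ranker: the score is the instance itself


A = metaclassifier(threshold_classifier(R, 0.7), threshold_classifier(R, 0.3))
print([A(x) for x in (0.9, 0.5, 0.1)])  # ['+', '?', '-']
```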

  12. Requirements on the ROC Curve
     • Requirement: for a ROC curve and any two classifiers $C_\alpha$ and $C_\beta$ corresponding to points $(fp_\alpha, tp_\alpha)$ and $(fp_\beta, tp_\beta)$ such that $fp_\alpha \le fp_\beta$:
       $$\forall x: (C_\alpha(x) = + \Rightarrow C_\beta(x) = +) \wedge (C_\beta(x) = - \Rightarrow C_\alpha(x) = -)$$
     • These conditions are the same as those used by [FlachWu03] and are met in particular if classifiers $C_\alpha$ and $C_\beta$ are constructed from a single ranker $R$: with thresholds $\tau_\alpha \ge \tau_\beta$, $R(x) \ge \tau_\alpha$ implies $R(x) \ge \tau_\beta$.
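
The requirement can be checked on a finite set of predictions. A minimal sketch, assuming predictions are lists of '+'/'-' strings; `nested` and `classify` are hypothetical names. For binary outputs the second clause is the contrapositive of the first, so checking both is for readability only.

```python
def nested(preds_alpha, preds_beta):
    """Check (C_a(x)='+' => C_b(x)='+') and (C_b(x)='-' => C_a(x)='-') for all x."""
    return all((a != '+' or b == '+') and (b != '-' or a == '-')
               for a, b in zip(preds_alpha, preds_beta))


def classify(scores, tau):
    """Threshold one ranker's scores: '+' iff score >= tau."""
    return ['+' if s >= tau else '-' for s in scores]


scores = [0.95, 0.8, 0.6, 0.4, 0.2]
# Two thresholds on one ranker with tau_alpha >= tau_beta: nested.
print(nested(classify(scores, 0.7), classify(scores, 0.3)))  # True
# Swapping the roles breaks the condition (the instance scored 0.6).
print(nested(classify(scores, 0.3), classify(scores, 0.7)))  # False
```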

  13. “Optimal” Metaclassifier $A_{\alpha,\beta}$
     • How do we compare binary classifiers and abstaining classifiers? How do we select an optimal classifier?
     • There is no clear answer:
       - use a cost-based model (Cost-Based Model);
       - use boundary conditions:
         · maximum number of instances classified as “?” (Bounded-Abstention Model);
         · maximum misclassification cost (Bounded-Improvement Model).

  14. Cost-Based Model
     • Confusion matrices of $C_\alpha$ and $C_\beta$ (A = actual, C = classified as):

         C_α   A/C   +      -            C_β   A/C   +      -
               +     TP_α   FN_α               +     TP_β   FN_β
               -     FP_α   TN_α               -     FP_β   TN_β

     • 2x3 cost matrix:

         A/C   +     -     ?
         +     0     c12   c13
         -     c21   0     c23

     • Important properties (reading $FP_\alpha$, $FN_\alpha$, ... as the sets of false positives and false negatives): $FP_\alpha \subseteq FP_\beta \Rightarrow fp_\beta \ge fp_\alpha$ and $FN_\beta \subseteq FN_\alpha \Rightarrow fn_\alpha \ge fn_\beta$.

  15. Selecting the Optimal Classifier
     • Similar criterion: minimize the cost
       $$rc = \frac{1}{N+P}\Big(\underbrace{c_{12} FN_\beta + c_{21} FP_\alpha}_{\text{misclassified}} + \underbrace{c_{23}(FP_\beta - FP_\alpha) + c_{13}(FN_\alpha - FN_\beta)}_{\text{disagreement (“?”)}}\Big)$$
     • Substituting $FP = N \cdot fp$ and $FN = P(1 - f_{ROC}(fp))$ and setting $\frac{\partial rc}{\partial FP_\beta} = 0 \wedge \frac{\partial rc}{\partial FP_\alpha} = 0$ gives
       $$f'_{ROC}(fp_\beta) = \frac{c_{23}}{c_{12} - c_{13}} \cdot \frac{N}{P}, \qquad f'_{ROC}(fp_\alpha) = \frac{c_{21} - c_{23}}{c_{13}} \cdot \frac{N}{P}$$
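
A numeric check of the two slope conditions: evaluate $rc$ for every pair of ROCCH vertices with $fp_\alpha \le fp_\beta$ and keep the cheapest. This brute-force search over vertex pairs is a simplification assumed here, not the derivation on the slide; the hull, class sizes and costs are illustrative and the names are hypothetical.

```python
def rc(alpha, beta, P, N, c12, c21, c13, c23):
    """Cost per instance of A_{alpha,beta}; alpha, beta are (fp, tp) ROC points."""
    FP_a, FP_b = N * alpha[0], N * beta[0]
    FN_a, FN_b = P * (1 - alpha[1]), P * (1 - beta[1])
    misclassified = c12 * FN_b + c21 * FP_a
    disagreement = c23 * (FP_b - FP_a) + c13 * (FN_a - FN_b)  # classified '?'
    return (misclassified + disagreement) / (N + P)


def optimal_pair(hull, P, N, c12, c21, c13, c23):
    """Brute-force search over hull vertex pairs with fp_alpha <= fp_beta."""
    best = None
    for i, alpha in enumerate(hull):
        for beta in hull[i:]:  # hull is sorted by fp
            cost = rc(alpha, beta, P, N, c12, c21, c13, c23)
            if best is None or cost < best[0]:
                best = (cost, alpha, beta)
    return best


hull = [(0.0, 0.0), (0.1, 0.6), (0.5, 0.9), (1.0, 1.0)]  # illustrative ROCCH
P, N = 100, 100
c12, c21, c13, c23 = 1.0, 1.0, 0.2, 0.2
cost, alpha, beta = optimal_pair(hull, P, N, c12, c21, c13, c23)
print(alpha, beta)  # (0.1, 0.6) (0.5, 0.9)
# Slope targets from the slide; they fall in the slope ranges of these two vertices.
print((c21 - c23) / c13 * N / P, c23 / (c12 - c13) * N / P)
```

The target slopes (about 4 and 0.25) lie in the slope ranges of (0.1, 0.6), which is [0.75, 6], and (0.5, 0.9), which is [0.2, 0.75], matching the brute-force result.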

  16. Cost-Based Model: a Simulated Example
     [Figure: left, a ROC curve (FP vs. TP) with two optimal classifiers, Classifier A and Classifier B; middle and right, misclassification cost for different combinations of A and B (axes FP(a), FP(b), Cost).]

  17. Understanding Cost Matrices
     • The 2x2 cost matrix is well known. 2x3 cost matrices have some interesting properties, e.g., the conditions under which the optimal classifier is an abstaining classifier.
     • Our derivation is valid for
       $$(c_{21} \ge c_{23}) \wedge (c_{12} > c_{13}) \wedge (c_{21} c_{12} \ge c_{21} c_{13} + c_{23} c_{12})$$
       For $c_{13} > 0$, the last clause is equivalent to $f'_{ROC}(fp_\alpha) \ge f'_{ROC}(fp_\beta)$, which the non-increasing ROCCH slope requires for $fp_\alpha \le fp_\beta$. We can prove that if this condition is not met, the optimal classifier is a trivial binary classifier.
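
A small sketch that checks the condition and compares it with the two slope targets from the Cost-Based Model. The function names and the cost matrices tried are hypothetical examples; the slope comparison assumes $c_{12} > c_{13} > 0$.

```python
def nontrivial_abstention(c12, c21, c13, c23):
    """Condition from the slide under which the derivation applies."""
    return c21 >= c23 and c12 > c13 and c21 * c12 >= c21 * c13 + c23 * c12


def slope_targets(c12, c21, c13, c23, N, P):
    """(f'(fp_alpha), f'(fp_beta)) from the Cost-Based Model; needs c12 > c13 > 0."""
    return (c21 - c23) / c13 * N / P, c23 / (c12 - c13) * N / P


for costs in [(1.0, 1.0, 0.2, 0.2), (1.0, 1.0, 0.6, 0.6), (1.0, 5.0, 0.3, 0.5)]:
    a, b = slope_targets(*costs, N=100, P=100)
    # For c12 > c13 > 0 the third clause agrees with a >= b.
    print(costs, nontrivial_abstention(*costs), a >= b)
```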

  18. Cost Matrices: Interesting Cases
     • How should $c_{13}$, $c_{23}$ be set so that the classifier is a non-trivial abstaining classifier?
     • Two interesting cases:
       - symmetric case ($c_{13} = c_{23}$): $c_{13} = c_{23} \le \frac{c_{12} c_{21}}{c_{12} + c_{21}}$;
       - proportional case ($c_{13}/c_{23} = c_{12}/c_{21}$): $c_{13} \le \frac{c_{12}}{2} \Leftrightarrow c_{23} \le \frac{c_{21}}{2}$.
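
Both bounds follow from substituting into the last clause of the condition: with $c_{13} = c_{23} = c$ it reads $c_{21} c_{12} \ge c(c_{21} + c_{12})$, and with $c_{13} = k c_{12}$, $c_{23} = k c_{21}$ it reads $c_{21} c_{12} \ge 2k\, c_{21} c_{12}$, i.e. $k \le 1/2$. A numeric sketch under those assumptions (hypothetical names, example costs $c_{12} = 1$, $c_{21} = 3$):

```python
def nontrivial_abstention(c12, c21, c13, c23):
    return c21 >= c23 and c12 > c13 and c21 * c12 >= c21 * c13 + c23 * c12


def symmetric_bound(c12, c21):
    """Largest c13 = c23 that keeps the abstaining classifier non-trivial."""
    return c12 * c21 / (c12 + c21)


c12, c21 = 1.0, 3.0
c = symmetric_bound(c12, c21)  # 0.75
print(nontrivial_abstention(c12, c21, c, c))                # True (on the boundary)
print(nontrivial_abstention(c12, c21, c + 0.05, c + 0.05))  # False

# Proportional case: c13 = k*c12 and c23 = k*c21; the bound is k <= 1/2.
for k in (0.5, 0.6):
    print(k, nontrivial_abstention(c12, c21, k * c12, k * c21))
```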

  19. Bounded Models
     • Problem: the 2x3 cost matrix is not always given and would have to be estimated. However, the classifier is very sensitive to $c_{13}$ and $c_{23}$.
     • Find other optimization criteria for an abstaining classifier using a standard (2x2) cost matrix:
       - calculate misclassification costs per classified instance.
     • Follow the same reasoning to find the optimal classifier.
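
The slides derive the bounded optima analytically; the brute-force sketch below is only a numeric stand-in for the Bounded-Abstention Model. It assumes the bound is a maximum fraction `k_max` of instances classified as “?” and measures the misclassification cost per classified instance using only $c_{12}$ and $c_{21}$. The function name, hull and costs are hypothetical.

```python
def bounded_abstention(hull, P, N, c12, c21, k_max):
    """Minimize misclassification cost per classified instance subject to
    an abstention rate of at most k_max (brute force over hull vertex pairs)."""
    best = None
    for i, (fp_a, tp_a) in enumerate(hull):
        for fp_b, tp_b in hull[i:]:  # hull sorted by fp: fp_alpha <= fp_beta
            FP_a, FP_b = N * fp_a, N * fp_b
            FN_a, FN_b = P * (1 - tp_a), P * (1 - tp_b)
            abstained = (FP_b - FP_a) + (FN_a - FN_b)  # instances classified '?'
            classified = N + P - abstained
            if abstained > k_max * (N + P) or classified <= 0:
                continue
            cost = (c12 * FN_b + c21 * FP_a) / classified
            if best is None or cost < best[0]:
                best = (cost, (fp_a, tp_a), (fp_b, tp_b))
    return best


hull = [(0.0, 0.0), (0.1, 0.6), (0.5, 0.9), (1.0, 1.0)]  # illustrative ROCCH
print(bounded_abstention(hull, 100, 100, 1.0, 1.0, k_max=0.0))  # binary optimum
print(bounded_abstention(hull, 100, 100, 1.0, 1.0, k_max=0.5))
```

With `k_max=0` no abstention is allowed and the search returns the binary optimum ($\alpha = \beta$); relaxing the bound lets the pair spread apart and lowers the cost per classified instance.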
