Optimizing the Partial AUC Harikrishna Narasimhan and Shivani - - PowerPoint PPT Presentation



SLIDE 1

Support Vector Algorithms for Optimizing the Partial AUC

Harikrishna Narasimhan and Shivani Agarwal

Department of Computer Science and Automation Indian Institute of Science, Bangalore

Based on work in ICML 2013 and KDD 2013

SLIDE 2

Binary Classification vs. Bipartite Ranking

Binary classification: e.g., Spam vs. Non-Spam

Bipartite ranking: e.g., ranking of documents; performance is measured by the Area Under the ROC Curve (AUC)

ROC: Receiver Operating Characteristic Curve

SLIDE 3

Partial AUC?

[Figure: full AUC vs. partial AUC]

SLIDE 4

Ranking

http://www.google.com/

SLIDE 5

Medical Diagnosis

KDD Cup 2008

http://en.wikipedia.org/

SLIDE 6

Bioinformatics

– Drug Discovery
– Gene Prioritization
– Protein Interaction Prediction
– ……

http://en.wikipedia.org/wiki http://commons.wikimedia.org/ http://www.google.com/imghp

SLIDE 7

Partial Area Under the ROC Curve is critical to many applications

SLIDE 8

Partial AUC Optimization

  • Asymmetric SVM:

– Wu, S.-H., Lin, K.-P., Chen, C.-M., and Chen, M.-S. Asymmetric support vector machines: low false-positive learning under the user tolerance. KDD, 2008.

  • Boosting-style algorithms:

– Komori, O. and Eguchi, S. A boosting method for maximizing the partial area under the ROC curve. BMC Bioinformatics, 11:314, 2010.
– Takenouchi, T., Komori, O., and Eguchi, S. An extension of the receiver operating characteristic curve and AUC-optimal classification. Neural Computation, 24(10):2789–2824, 2012.

  • Several heuristic approaches:

– Pepe, M. S. and Thompson, M. L. Combining diagnostic test results to increase accuracy. Biostatistics, 1(2):123–140, 2000.
– Ricamato, M. T. and Tortorella, F. Partial AUC maximization in a linear combination of dichotomizers. Pattern Recognition, 44(10-11):2669–2677, 2011.

SLIDE 9

Partial AUC Optimization

  • Many of the existing approaches are either heuristic or solve special cases of the problem.

  • Our contribution: new support vector methods for optimizing the general partial AUC measure.
  • Based on Joachims’ Structural SVM approach for optimizing full AUC, but leads to a trickier inner combinatorial optimization problem.

– Joachims, T. A Support Vector Method for Multivariate Performance Measures. ICML, 2005.
– Joachims, T. Training linear SVMs in linear time. KDD, 2006.

  • Improvements over baselines on several real-world applications

SLIDE 10

Outline

  • Problem Setup
  • First cut: Structural SVM Approach for Optimizing Partial AUC
  • Better Formulation: Tighter Upper Bound on the Partial AUC Loss
  • Experiments
SLIDE 11

Receiver Operating Characteristic Curve

Training set: positive instances x1+, x2+, x3+, …, xm+ and negative instances x1−, x2−, x3−, …, xn−

GOAL? Learn a scoring function f, then:
  • Rank objects by their scores (e.g., x3+, x5+, x6+, …, xn−)
  • Build a classifier by choosing a threshold on the ranked list (threshold assignment)

Quality of scoring function?
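Thresholding a scoring function and sweeping the threshold is what produces the ROC curve. The following is a minimal Python sketch of that step, assuming labels coded as 1 (positive) and 0 (negative) and a classifier that predicts positive when score ≥ threshold; `roc_points` is a hypothetical helper name, not something from the slides.

```python
from typing import List, Sequence, Tuple

def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return the (FPR, TPR) points obtained by thresholding the scores.

    labels: 1 = positive, 0 = negative. Every distinct score is tried as a
    threshold t; an instance is classified positive iff its score >= t.
    """
    m = sum(1 for y in labels if y == 1)  # number of positive instances
    n = len(labels) - m                    # number of negative instances
    points = [(0.0, 0.0)]                  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        points.append((fp / n, tp / m))
    return points

# Hypothetical scores assigned by f to five training instances.
print(roc_points([0.9, 0.8, 0.7, 0.6, 0.5], [1, 0, 1, 0, 0]))
```

Each threshold corresponds to one classifier; the ROC curve summarizes all of them at once, which is why it measures the quality of the scoring function rather than of a single classifier.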

SLIDE 12

ROC Curve

[Figure: Receiver Operating Characteristic curve traced out by the scores assigned by f (example scores: 20, 15, 14, 13, 11, 9, 8, 6, 5, 3, 2), with the Area Under the Curve (AUC) and the partial AUC marked]

AUC = pair-wise accuracy
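The "pair-wise accuracy" view says the AUC equals the fraction of positive-negative pairs that f ranks correctly. A minimal Python sketch, assuming ties are counted as half-correct (a common convention, not stated on the slide); `auc_pairwise` is a hypothetical name.

```python
from itertools import product
from typing import Sequence

def auc_pairwise(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A pair counts 1 if the positive scores higher, 1/2 on a tie, 0 otherwise.
    """
    correct = 0.0
    for sp, sn in product(pos_scores, neg_scores):
        if sp > sn:
            correct += 1.0
        elif sp == sn:
            correct += 0.5
    return correct / (len(pos_scores) * len(neg_scores))

# Hypothetical scores: 5 of the 6 positive/negative pairs are ordered correctly.
print(auc_pairwise([0.9, 0.7], [0.8, 0.6, 0.5]))
```

This double loop is O(mn); it is meant to show the definition, not an efficient evaluation.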

SLIDE 13

Partial AUC Optimization

Structural SVM Based Approach

Minimize: convex upper bound on “1 – pAUC” + regularizer
(“1 – pAUC” itself is discrete and non-differentiable, so it is replaced by a convex upper bound.)

  • Extends Joachims’ approach for full AUC optimization, but leads to a trickier combinatorial optimization step.
  • Efficient solver with the same or lower time complexity compared to that for full AUC.

SLIDE 14

Outline

  • Problem Setup
  • First cut: Structural SVM Approach for Optimizing Partial AUC
  • Better Formulation: Tighter Upper Bound on the Partial AUC Loss
  • Experiments
SLIDE 15

Structural SVM Based Approach

Ordering of {x1, x2, …, xs}: represented as an m × n matrix of 0/1 entries and compared with the IDEAL ordering

Minimize: regularizer + pAUC loss term, giving an upper bound on (1 – pAUC)

Problem: exponential number of output matrices!
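The optimization problem on this slide can be written, in the standard structural SVM form of Joachims (2005), roughly as follows. This is a hedged reconstruction: the ordering matrix π ∈ {0,1}^{m×n}, the joint feature map φ(S, π), the ideal ordering π*, and the trade-off constant C are notation assumed here, and the ICML 2013 paper's exact definitions may differ in detail.

```latex
\min_{w,\ \xi \ge 0} \quad \frac{1}{2}\|w\|^2 + C\,\xi
\quad \text{s.t.} \quad
w^\top\big(\phi(S,\pi^*) - \phi(S,\pi)\big) \;\ge\; \Delta_{\mathrm{pAUC}}(\pi^*,\pi) - \xi,
\quad \forall\, \pi \in \{0,1\}^{m \times n}
```

There is one constraint per ordering matrix π, which is where the exponential number of output matrices comes from; by the usual structural SVM argument, the slack ξ upper-bounds 1 – pAUC of the learned scoring function.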

SLIDE 16

Break down!

Cutting-plane Solver

Repeat: 1. Solve the optimization problem (OP) for a subset of constraints. 2. Add the most violated constraint.

Converges in a constant number of iterations

[Figure: ordering matrices for full AUC and partial AUC; how is the most violated constraint found for partial AUC?]
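For full AUC, the "add the most violated constraint" step is easy to sketch. The Python below is a minimal illustration under stated assumptions: scores f(x) are already computed, π_ij = 1 means positive i is placed below negative j, Δ(π*, π) = (1/mn) Σ π_ij with π* the all-zero matrix, and the (hypothetical) feature map satisfies w^⊤φ(S, π) = (1/mn) Σ (1 − π_ij)(f(x_i^+) − f(x_j^−)). Under these assumptions the objective splits into independent per-pair terms, so each entry can be chosen on its own. For partial AUC this independence breaks, because which negatives count depends on π. All function names are hypothetical.

```python
from itertools import product
from typing import List, Sequence

Matrix = List[List[int]]

def pair_margins(pos: Sequence[float], neg: Sequence[float]) -> List[List[float]]:
    """margins[i][j] = f(x_i^+) - f(x_j^-) for every positive/negative pair."""
    return [[sp - sn for sn in neg] for sp in pos]

def objective(pi: Matrix, margins: List[List[float]]) -> float:
    """Delta(pi*, pi) + w^T phi(S, pi) - w^T phi(S, pi*) under the assumed maps,
    which simplifies to (1/mn) * sum_ij pi_ij * (1 - margin_ij)."""
    m, n = len(margins), len(margins[0])
    total = sum(pi[i][j] * (1.0 - margins[i][j]) for i in range(m) for j in range(n))
    return total / (m * n)

def most_violated_full_auc(pos: Sequence[float], neg: Sequence[float]) -> Matrix:
    """Each pair contributes independently, so set pi_ij = 1 exactly when
    1 - margin_ij > 0, i.e. when the pair violates a unit margin."""
    return [[1 if mg < 1.0 else 0 for mg in row] for row in pair_margins(pos, neg)]

def brute_force(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Exhaustive max over all 2^(mn) matrices; only for tiny sanity checks."""
    margins = pair_margins(pos, neg)
    m, n = len(pos), len(neg)
    best = float("-inf")
    for bits in product([0, 1], repeat=m * n):
        pi = [list(bits[i * n:(i + 1) * n]) for i in range(m)]
        best = max(best, objective(pi, margins))
    return best

pos, neg = [2.0, 0.5], [1.0, 0.0]
print(most_violated_full_auc(pos, neg), brute_force(pos, neg))
```

At the maximizer the objective equals the average pairwise hinge loss (1/mn) Σ max(0, 1 − margin_ij). The per-pair rule costs O(mn); the brute-force search is exponential and only serves as a check on tiny inputs.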

SLIDE 17

Trickier Optimization Problem

Full AUC: all pairs

Partial AUC: subset of negative instances in the FPR range [α, β], which changes with the ordering

[Figure: ordering matrices for full AUC (sum over all pairs) and partial AUC]
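The key difficulty is that the set of relevant negatives depends on the ranking itself. A minimal Python sketch of a simplified, discrete partial AUC in [α, β], assuming that only negatives ranked (by descending score) in positions j_α + 1 through j_β count, with j_α = ⌈αn⌉ and j_β = ⌊βn⌋; fractional boundary contributions are ignored, and the papers' exact treatment of the interval endpoints may differ. `partial_auc` is a hypothetical name.

```python
import math
from typing import Sequence

def partial_auc(pos: Sequence[float], neg: Sequence[float], alpha: float, beta: float) -> float:
    """Simplified partial AUC over the FPR range [alpha, beta].

    Assumption: only negatives at ranks j_alpha+1 .. j_beta (highest score
    first) count, with j_alpha = ceil(alpha*n) and j_beta = floor(beta*n).
    """
    n = len(neg)
    j_alpha, j_beta = math.ceil(alpha * n), math.floor(beta * n)
    if j_beta <= j_alpha:
        raise ValueError("FPR range [alpha, beta] contains no negatives")
    # Which negatives are relevant depends on the current scores (the ordering).
    ranked_neg = sorted(neg, reverse=True)[j_alpha:j_beta]
    correct = 0.0
    for sp in pos:
        for sn in ranked_neg:
            correct += 1.0 if sp > sn else (0.5 if sp == sn else 0.0)
    return correct / (len(pos) * len(ranked_neg))

pos, neg = [0.9, 0.7], [0.8, 0.6, 0.5, 0.4]
print(partial_auc(pos, neg, 0.0, 0.5), partial_auc(pos, neg, 0.5, 1.0))
```

Because `ranked_neg` is recomputed from the scores, changing f changes which negatives enter the sum; this is why the inner combinatorial problem does not decompose over pairs the way it does for full AUC.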

SLIDE 18

Trickier Optimization Problem

Full AUC: all pairs

Partial AUC: subset of negative instances in the FPR range [α, β], which changes with the ordering

[Figure: ordering matrices for full AUC and partial AUC]

SLIDE 19

Trickier Optimization Problem

Full AUC: all pairs

Partial AUC: subset of negative instances in the FPR range [α, β], which changes with the ordering

[Figure: ordering matrices for full AUC and partial AUC]

Partial AUC: optimize rows independently; can be implemented in O((m+n) log(m+n)) time

– H. Narasimhan and S. Agarwal. A Structural SVM Based Approach for Optimizing Partial AUC. ICML, 2013.
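Complexity bounds of this kind come from sorting the scores once and then handling each row (one positive instance) with a fast lookup. As a hedged illustration of that idea, not the paper's partial AUC routine, the sketch below computes for every positive the total pairwise hinge loss against all negatives in O(log n) per row after a single sort, instead of O(n) per row. `row_hinge_sums` is a hypothetical helper.

```python
import bisect
from itertools import accumulate
from typing import List, Sequence

def row_hinge_sums(pos: Sequence[float], neg: Sequence[float]) -> List[float]:
    """For each positive score sp, return sum_j max(0, 1 - (sp - neg[j])).

    One sort of the negatives (O(n log n)) plus one binary search per row
    (O(log n)) gives O((m + n) log(m + n)) overall instead of O(mn).
    """
    sorted_neg = sorted(neg)                       # ascending scores
    prefix = [0.0] + list(accumulate(sorted_neg))  # prefix[k] = sum of the k smallest
    n = len(sorted_neg)
    sums = []
    for sp in pos:
        # Only negatives with score > sp - 1 have a positive hinge term.
        k = bisect.bisect_right(sorted_neg, sp - 1.0)
        count = n - k
        score_total = prefix[n] - prefix[k]
        # sum over those negatives of (1 - sp + sn)
        sums.append(count * (1.0 - sp) + score_total)
    return sums

print(row_hinge_sums([2.0, 0.5], [1.0, 0.0]))
```

Summing the rows and dividing by mn gives the average pairwise hinge loss; each row is computed independently of the others.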

SLIDE 20

Outline

  • Problem Setup
  • First cut: Structural SVM Approach for Optimizing Partial AUC
  • Better Formulation: Tighter Upper Bound on the Partial AUC Loss
  • Experiments
SLIDE 21

Better Formulation

  • Characterize the upper bound on the pAUC loss
  • Rewrite the pAUC loss as a max over subsets of negative instances, then bound it using a truncated form of the earlier objective
  • Tighter upper bound on the partial AUC loss
  • Less time for finding the most-violated constraint!
  • Better guarantee on the number of cutting-plane iterations!

– H. Narasimhan and S. Agarwal. SVM_pAUC^tight: A New Support Vector Method for Optimizing Partial AUC Based on a Tight Convex Upper Bound. KDD, 2013. To appear.
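The "max over subsets of negative instances" idea can be illustrated for the [0, β] case. A minimal Python sketch, assuming the truncated inner objective is the average pairwise hinge loss over positives × Z, where Z ranges over subsets of j_β = ⌊βn⌋ negatives; this is an illustration, not the KDD 2013 algorithm, and it does not cover the [α, β] case. All names are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence

def hinge_col_total(neg_score: float, pos: Sequence[float]) -> float:
    """Total pairwise hinge loss of one negative against all positives."""
    return sum(max(0.0, 1.0 - (sp - neg_score)) for sp in pos)

def tight_bound_top(pos: Sequence[float], neg: Sequence[float], beta: float) -> float:
    """Max over |Z| = j_beta of the average hinge loss on pos x Z.

    Each negative's contribution is nondecreasing in its score, so the
    maximizing subset is simply the j_beta highest-scored negatives.
    """
    j_beta = math.floor(beta * len(neg))
    if j_beta < 1:
        raise ValueError("beta * n must be at least 1")
    top = sorted(neg, reverse=True)[:j_beta]
    return sum(hinge_col_total(sn, pos) for sn in top) / (len(pos) * j_beta)

def tight_bound_brute(pos: Sequence[float], neg: Sequence[float], beta: float) -> float:
    """Same quantity by enumerating every subset; only for small checks."""
    j_beta = math.floor(beta * len(neg))
    return max(sum(hinge_col_total(sn, pos) for sn in Z) / (len(pos) * j_beta)
               for Z in combinations(neg, j_beta))

pos, neg = [0.3, 1.7, -0.2, 0.9], [0.1, 0.8, -1.0, 1.2, 0.4, -0.3]
print(tight_bound_top(pos, neg, 0.5), tight_bound_brute(pos, neg, 0.5))
```

Since each hinge term max(0, 1 − s⁺ + s⁻) grows with the negative's score, the maximization over subsets reduces to a sort; the brute-force version only confirms this on small inputs.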

SLIDE 22

Outline

  • Problem Setup
  • First cut: Structural SVM Approach for Optimizing Partial AUC
  • Better Formulation: Tighter Upper Bound on the Partial AUC Loss
  • Experiments
SLIDE 23

SVM_pAUC^struct vs. Baseline Methods

Interval [0, β]

Drug Discovery: 50 active compounds / 2092 inactive compounds

Protein-Protein Interaction Prediction: ~3×10³ interacting pairs / ~2×10⁵ non-interacting pairs

SLIDE 24

SVM_pAUC^struct vs. Baseline Methods

Interval [α, β]

KDD Cup 2008 Breast Cancer Detection: ~600 malignant ROIs / ~10⁵ benign ROIs

SLIDE 25

SVM_pAUC^tight vs. SVM_pAUC^struct

[Figure panels: partial AUC in [0, β]; partial AUC in [α, β]]

SLIDE 26

Run-time Analysis

Interval [0, β]

Repeat: 1. Solve OP for a subset of constraints. 2. Add the most violated constraint.

SLIDE 27

Conclusions

  • A new structural SVM based approach for optimizing partial AUC
  • Efficient algorithm for solving the inner combinatorial optimization step
  • Improved algorithm that optimizes a tighter upper bound on the partial AUC loss
  • Experimental results confirm the effectiveness of our methods
SLIDE 28

Questions?