A Structural SVM Based Approach for Optimizing the Partial AUC. Harikrishna Narasimhan (Joint work with Shivani Agarwal). A paper on this work has been accepted in ICML 2013.


slide-1
SLIDE 1

A Structural SVM Based Approach for Optimizing the Partial AUC

Harikrishna Narasimhan

(Joint work with Shivani Agarwal) A paper on this work has been accepted in ICML 2013

slide-7
SLIDE 7

Learning with Binary Supervision

Good evaluation metric?

http://www.google.com/imghp

slide-14
SLIDE 14

Learning with Binary Supervision

Training Set

  • Positive Instances: x1+, x2+, x3+, …, xm+
  • Negative Instances: x1−, x2−, x3−, …, xn−

GOAL? Learn a scoring function

  • Rank objects

[Figure: instances ordered by score (x3+, x5+, x6+, x1−, xn−, ….)]

  • Build a classifier

[Figure: the same ordering (x3+, x5+, x6+, x1−, xn−, ….) with a Threshold]

Quality of score function?

Threshold Assignment
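
The two uses of a learned scoring function named on this slide can be sketched in a few lines of Python. This is only an illustration: the score values, the instance names, and the threshold 0.5 are made up here and do not come from the talk.

```python
# Hypothetical scores f(x) for five instances (illustration only, not from the slides).
scores = {"x1": 0.2, "x3": 0.9, "x5": 0.7, "x6": 0.6, "xn": 0.1}


def rank_objects(scores):
    """Return instance names sorted from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


def classify(scores, threshold):
    """Predict +1 for instances scoring above the threshold, -1 otherwise."""
    return {name: (1 if s > threshold else -1) for name, s in scores.items()}


ranking = rank_objects(scores)                 # use 1: rank objects
predictions = classify(scores, threshold=0.5)  # use 2: build a classifier
print(ranking)
print(predictions)
```

The same scoring function serves both uses; only the classifier needs the extra threshold choice.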

slide-17
SLIDE 17

Receiver Operating Characteristic Curve

Captures how well a prediction model discriminates between positive and negative examples

Full AUC vs. Partial AUC
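
For reference, one common way to write the two quantities is given below. The notation is an assumption made here, not taken from the slides: TPR(u) denotes the true positive rate attained at false positive rate u, [α, β] is the false positive range of interest, and the partial AUC is normalized by the width of that range.

```latex
\mathrm{AUC} = \int_{0}^{1} \mathrm{TPR}(u)\, du,
\qquad
\mathrm{pAUC}(\alpha, \beta) = \frac{1}{\beta - \alpha} \int_{\alpha}^{\beta} \mathrm{TPR}(u)\, du,
\qquad 0 \le \alpha < \beta \le 1 .
```

With α = 0 and β = 1 the partial AUC reduces to the full AUC.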

slide-18
SLIDE 18

Ranking

http://www.google.com/

slide-20
SLIDE 20

Medical Diagnosis

http://www.google.com/imghp

slide-22
SLIDE 22

Bioinformatics

  • Drug Discovery
  • Gene Prioritization
  • Protein Interaction Prediction
  • ……

http://www.google.com/imghp

slide-28
SLIDE 28

Partial AUC Optimization

  • Many existing approaches are either heuristic or solve special cases of the problem.
  • Our contribution: A new support vector method for optimizing the general partial AUC measure.
  • Based on Joachims’ Structural SVM approach for optimizing full AUC, but leads to a trickier inner combinatorial optimization problem.
  • Improvements over baselines on several real-world applications.

Partial Area Under the ROC Curve is critical to many applications

slide-29
SLIDE 29

ROC Curve

Receiver Operating Characteristic Curve

Scores assigned by f: 20, 15, 14, 13, 11, 9, 8, 6, 5, 3, 2
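
A minimal Python sketch of how an ROC curve is traced out from a list of scores such as the one above. The scores are the ones on the slide; the +1 / −1 labels are hypothetical, because the transcript does not preserve which scores belong to positive and which to negative instances.

```python
def roc_points(scores, labels):
    """Return the (FPR, TPR) points obtained by lowering a threshold through the scores.

    Assumes all scores are distinct, as in the list on the slide.
    """
    m = sum(1 for y in labels if y == 1)   # number of positives
    n = len(labels) - m                    # number of negatives
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i in order:
        if labels[i] == 1:
            tp += 1   # a positive moves the curve one step up
        else:
            fp += 1   # a negative moves the curve one step right
        points.append((fp / n, tp / m))
    return points


def area_under(points):
    """Trapezoidal area under a piecewise-linear curve given as (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


scores = [20, 15, 14, 13, 11, 9, 8, 6, 5, 3, 2]   # scores from the slide
labels = [1, 1, -1, 1, 1, -1, 1, -1, -1, 1, -1]   # hypothetical labels (assumption)
points = roc_points(scores, labels)
auc = area_under(points)
print(points)
print(auc)
```

The area under the resulting step curve equals the fraction of (positive, negative) pairs in which the positive instance receives the higher score.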

slide-40
SLIDE 40

Partial AUC Optimization

Minimize: (1 – pAUC)

  • Discrete and non-differentiable, so minimize instead: Convex Upper Bound on “(1 – pAUC)” + Regularizer (Structural SVM)
  • Extends Joachims’ approach for full AUC optimization, but leads to a trickier combinatorial optimization step.
  • Efficient solver with the same time complexity as that for full AUC.

  • T. Joachims, “A Support Vector Method for Multivariate Performance Measures”, ICML 2005.
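
To make the "discrete" remark concrete, here is a hedged Python sketch of an empirical partial AUC computed as a count of correctly ordered (positive, negative) pairs. It is a simplified illustration, not the exact definition used in the paper: it assumes that n·α and n·β are whole numbers, so the false positive range [α, β] corresponds to a whole block of the highest-scoring negatives. The function name and the example scores are made up for this sketch.

```python
import math


def partial_auc(pos_scores, neg_scores, alpha, beta):
    """Empirical partial AUC over the false positive range [alpha, beta].

    Simplifying assumption: n*alpha and n*beta are integers, so the range
    covers the negatives ranked n*alpha+1 .. n*beta by decreasing score.
    """
    m, n = len(pos_scores), len(neg_scores)
    j_lo, j_hi = round(n * alpha), round(n * beta)
    if not (math.isclose(j_lo, n * alpha) and math.isclose(j_hi, n * beta)
            and j_lo < j_hi):
        raise ValueError("this sketch needs integer n*alpha < n*beta")
    # Negatives sorted from highest to lowest score; keep those inside the range.
    selected_negs = sorted(neg_scores, reverse=True)[j_lo:j_hi]
    # Count pairs in which the positive is scored strictly above the negative.
    correct = sum(1 for p in pos_scores for q in selected_negs if p > q)
    return correct / (m * (j_hi - j_lo))


pos = [20, 15, 13, 11, 8, 3]   # hypothetical positive scores
neg = [14, 9, 6, 5, 2]         # hypothetical negative scores
print(partial_auc(pos, neg, 0.0, 1.0))   # full AUC as the special case [0, 1]
print(partial_auc(pos, neg, 0.0, 0.4))   # only the two highest-scoring negatives count
```

Because the value changes only when the relative order of a positive and a selected negative flips, it is piecewise constant in the model parameters, which is why a convex surrogate is needed.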
slide-48
SLIDE 48

Structural SVM Based Approach

[Figure: an Ordering of {x1, x2, …, xs} encoded as an m × n output matrix with entries +1 / −1, compared with the IDEAL matrix whose entries are all +1]

Labels on the optimization problem (formula not preserved in this transcript): pAUC Loss, Regularizer, Upper Bound on (1 – pAUC)

Exponential Number of Output Matrices!!
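
The formula itself is missing from the transcript. As a rough guide, the generic structural SVM problem from Joachims (ICML 2005), written for output matrices, has the shape below. The symbols are assumptions introduced here: w is the weight vector, C a trade-off parameter, ξ a slack variable, π an output matrix, π* the ideal matrix, Π_{m,n} the set of all output matrices, φ a joint feature map, and Δ_pAUC the pAUC loss. The exact feature map and loss used in the talk are not reproduced.

```latex
\min_{w,\; \xi \ge 0} \quad \frac{1}{2}\lVert w \rVert^{2} + C\,\xi
\qquad \text{s.t.} \qquad
\forall\, \pi \in \Pi_{m,n}:\quad
w^{\top}\bigl(\phi(S, \pi^{*}) - \phi(S, \pi)\bigr) \;\ge\; \Delta_{\mathrm{pAUC}}(\pi^{*}, \pi) - \xi .
```

In this form the first term plays the role of the regularizer, the slack ξ plays the role of the upper bound on the loss, and there is one constraint per output matrix, which is where the exponential number of constraints comes from.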

slide-57
SLIDE 57
Optimization Solver

Repeat:
1. Solve OP for a subset of constraints.
2. Add the most violated constraint.

Converges in constant number of iterations

  • T. Joachims, “Training linear SVMs in linear time”, KDD 2006.

Break down!
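
The repeat loop above can be sketched generically in Python. This is a skeleton under stated assumptions, not the solver from the talk: `solve_restricted` and `most_violated` are hypothetical callables supplied by the caller, and the toy problem used to exercise the loop (find the smallest slack that dominates a list of losses) merely stands in for the real quadratic program.

```python
def cutting_plane(solve_restricted, most_violated, eps=1e-6, max_iter=100):
    """Generic constraint-generation loop.

    solve_restricted(working_set) -> solution of the problem restricted to working_set
    most_violated(solution)       -> (constraint, violation) for the worst constraint
    """
    working_set = []
    solution = solve_restricted(working_set)
    for _ in range(max_iter):
        constraint, violation = most_violated(solution)
        if violation <= eps:            # every constraint holds up to eps: stop
            break
        working_set.append(constraint)  # step 2: add the most violated constraint
        solution = solve_restricted(working_set)  # step 1: re-solve on the subset
    return solution, working_set


# Toy stand-in problem: minimize xi >= 0 subject to xi >= loss for every loss in a list.
losses = [0.3, 0.9, 0.1, 0.5]


def solve_restricted(working_set):
    return max(working_set, default=0.0)


def most_violated(xi):
    worst = max(losses)
    return worst, worst - xi


xi, working_set = cutting_plane(solve_restricted, most_violated)
print(xi, working_set)
```

The point of the pattern is that only the constraints that were actually violated are ever materialized, so the full constraint set never has to be enumerated.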

[Figure: two output matrices with entries +1 / −1, one labeled Full AUC and the other labeled Partial AUC]

Can be implemented in O((m+n) log (m+n)) time complexity
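
The slides do not spell out the procedure behind this bound, so the following Python sketch is not the authors' algorithm for finding the most violated constraint. It only illustrates the general idea that a quantity defined over all m × n (positive, negative) pairs can be computed by sorting instead of enumerating the pairs. The function names and the example scores are assumptions.

```python
from bisect import bisect_left


def auc_by_sorting(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs with the positive scored higher.

    Sorts the negatives once and binary-searches per positive, which costs
    O((m + n) log n) rather than the O(m * n) of a double loop.
    """
    negs = sorted(neg_scores)
    # bisect_left(negs, p) is the number of negatives scored strictly below p.
    correct = sum(bisect_left(negs, p) for p in pos_scores)
    return correct / (len(pos_scores) * len(neg_scores))


def auc_naive(pos_scores, neg_scores):
    """Reference implementation that enumerates all m * n pairs."""
    correct = sum(1 for p in pos_scores for q in neg_scores if p > q)
    return correct / (len(pos_scores) * len(neg_scores))


pos = [20, 15, 13, 11, 8, 3]   # hypothetical positive scores
neg = [14, 9, 6, 5, 2]         # hypothetical negative scores
print(auc_by_sorting(pos, neg), auc_naive(pos, neg))
```

Both functions return the same value; only the amount of work differs as m and n grow.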

slide-61
SLIDE 61

Experimental Results

  • Interval [0, β]: Drug Discovery, Protein Interaction Prediction
  • Interval [α, β]: KDD Cup 2008 Breast Cancer Detection

slide-63
SLIDE 63

Conclusions

  • A new support vector algorithm for optimizing partial AUC
  • Efficient algorithm for solving the inner combinatorial optimization step
  • Experimental results confirm the efficacy of the algorithm
  • Future work:
      – Characterize upper bound on partial AUC?
      – Tighter upper bound on partial AUC?
      – Statistical consistency?