Support Vector Algorithms for Optimizing the Partial Area Under the ROC Curve. Harikrishna Narasimhan, Department of Computer Science and Automation, Indian Institute of Science, Bangalore. Joint work with Shivani Agarwal, IISc; Mitra Biotech team.


slide-1
SLIDE 1

Harikrishna Narasimhan

Department of Computer Science and Automation Indian Institute of Science, Bangalore

Support Vector Algorithms for Optimizing the Partial Area Under the ROC Curve

Joint work with Shivani Agarwal, IISc; Mitra Biotech team

slide-2
SLIDE 2

allspammedup.com

slide-3
SLIDE 3

allspammedup.com pascal-network.org

slide-4
SLIDE 4

allspammedup.com pascal-network.org fusionsedge.com

slide-5
SLIDE 5

allspammedup.com optimum7.com pascal-network.org fusionsedge.com

slide-6
SLIDE 6

Spam or Non-spam?

Model

slide-7
SLIDE 7

[Figure: Receiver Operating Characteristic curve; axes: False Positive Rate, True Positive Rate, each up to 1]

slide-8
SLIDE 8

[Figure: Receiver Operating Characteristic curve and the Area Under the ROC Curve (AUC); axes: False Positive Rate, True Positive Rate, each up to 1]

slide-9
SLIDE 9

Partial AUC?

Full AUC

slide-10
SLIDE 10

Partial AUC?

Full AUC vs. Partial AUC

slide-11
SLIDE 11

Ranking

slide-13
SLIDE 13

Biometric Screening

slide-15
SLIDE 15

Medical Diagnosis

http://en.wikipedia.org/

slide-16
SLIDE 16

Medical Diagnosis

http://en.wikipedia.org/

KDD Cup 2008

slide-17
SLIDE 17

Bioinformatics

― Drug Discovery
― Gene Prioritization
― Protein Interaction Prediction
― ……

http://en.wikipedia.org/wiki http://commons.wikimedia.org/ http://www.google.com/imghp

slide-19
SLIDE 19

Partial AUC Optimization

New support vector method for directly optimizing the partial AUC measure

Narasimhan, H. and Agarwal, S. “A structural SVM based approach for optimizing partial AUC”, ICML 2013.

slide-20
SLIDE 20

Partial AUC Optimization

New support vector method for directly optimizing the partial AUC measure

Narasimhan, H. and Agarwal, S. “A structural SVM based approach for optimizing partial AUC”, ICML 2013.

Based on an earlier structural SVM based approach for full AUC optimization (Joachims 2005; 2006)

slide-21
SLIDE 21

Algorithm

Application

ROC Curve & Partial AUC

[Figure: ROC curve; axes: False Positive Rate, True Positive Rate]

slide-22
SLIDE 22

Setting

Training set:

Positive instances: x1+, x2+, x3+, ..., xm+

Negative instances: x1-, x2-, x3-, ..., xn-

slide-23
SLIDE 23

Setting

Training set:

Positive instances: x1+, x2+, x3+, ..., xm+

Negative instances: x1-, x2-, x3-, ..., xn-

GOAL?

Model

slide-24
SLIDE 24

Model

Score Model & threshold

Spam or Non-spam?

slide-25
SLIDE 25

Model

Score Model & threshold

Spam or Non-spam?

[Figure: ROC curve; axes: FPR, TPR, each up to 1]
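A minimal sketch of how a score model plus a threshold yields one point on the ROC curve. The function name and all score values are hypothetical and not taken from the slides.

```python
def rates_at_threshold(pos_scores, neg_scores, threshold):
    """Return (FPR, TPR) when every instance scoring above `threshold` is flagged as spam."""
    true_pos = sum(1 for s in pos_scores if s > threshold)
    false_pos = sum(1 for s in neg_scores if s > threshold)
    return false_pos / len(neg_scores), true_pos / len(pos_scores)


# Hypothetical scores, for illustration only.
spam_scores = [0.9, 0.8, 0.4]
non_spam_scores = [0.7, 0.3, 0.2, 0.1]

fpr, tpr = rates_at_threshold(spam_scores, non_spam_scores, threshold=0.5)
print(f"FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```

Sweeping the threshold from high to low moves this point from (0, 0) to (1, 1), which traces out the full curve.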

slide-26
SLIDE 26

Receiver Operating Characteristic Curve Illustration

Scores assigned by score model: 20, 15, 14, 13, 11, 9, 8, 6, 5, 3, 2

[Figure: ROC curve; axes: False Positives, True Positives]

slide-27
SLIDE 27

Receiver Operating Characteristic Curve Illustration

Scores: 20, 15, 14, 13, 11, 9, 8, 6, 5, 3, 2

[Figure: ROC curve; axes: False Positives, True Positives]

slide-30
SLIDE 30

Receiver Operating Characteristic Curve Illustration

Scores: 20, 15, 14, 13, 11, 9, 8, 6, 5, 3, 2

[Figure: ROC curve and the Area Under the ROC Curve (AUC); axes: False Positives, True Positives]

Joachims (2005)
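The construction illustrated here can be sketched in a few lines: rank the instances by score, step up for each positive and right for each negative, then take the area under the resulting curve. The score values are the ones shown on the slide; the positive/negative labels are an assumption, since they did not survive extraction, and the function names are hypothetical.

```python
def roc_points(pos_scores, neg_scores):
    """Return the ROC points (FPR, TPR) traced as the threshold is lowered past each score.

    Assumes there are no tied scores.
    """
    m, n = len(pos_scores), len(neg_scores)
    ranked = sorted(
        [(s, True) for s in pos_scores] + [(s, False) for s in neg_scores],
        key=lambda pair: pair[0],
        reverse=True,
    )
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for _, is_positive in ranked:
        if is_positive:
            tp += 1
        else:
            fp += 1
        points.append((fp / n, tp / m))
    return points


def area_under(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Score values from the slide; the +/- labels are hypothetical.
pos = [20, 15, 13, 9, 5]
neg = [14, 11, 8, 6, 3, 2]

curve = roc_points(pos, neg)
auc = area_under(curve)
print(f"AUC = {auc:.4f}")
```

With these assumed labels, 23 of the 30 positive-negative pairs are ordered correctly, so the area equals 23/30.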

slide-31
SLIDE 31

Receiver Operating Characteristic Curve Illustration

Scores: 20, 15, 14, 13, 11, 9, 8, 6, 5, 3, 2

[Figure: ROC curve with the Area Under the ROC Curve (AUC) and the Partial AUC; axes: False Positives, True Positives]

Joachims (2005)

slide-32
SLIDE 32

Observation 1: Best ROC Curve

Ranked list: + + + + + + – – – – – –

[Figure: ROC curve; axes: False Positives, True Positives]

slide-33
SLIDE 33

Observation 2: Worst ROC Curve

Ranked list: – – – – – – + + + + + +

[Figure: ROC curve; axes: False Positives, True Positives]

slide-34
SLIDE 34

Observation 3: Top Fraction of Negatives

Scores: 20, 15, 14, 13, 11, 9, 8, 6, 5, 3, 2

[Figure: ROC curve; axes: False Positives, True Positives]

slide-35
SLIDE 35

Observation 3: Top Fraction of Negatives

Scores: 20, 15, 14, 13, 11, 9, 8, 6, 5, 3, 2

[Figure: ROC curve; axes: False Positives, True Positives]

Score Model

?
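One way to read Observation 3: the partial AUC over a false positive range [0, β] is determined by how the positives compare against the top-scoring β fraction of negatives only. The sketch below assumes no tied scores and that n·β is a whole number; the labels attached to the slide's scores are hypothetical, as is the function name.

```python
def partial_auc_top(pos_scores, neg_scores, beta):
    """Normalised partial AUC over the FPR interval [0, beta].

    Sketch only: assumes no tied scores and that n * beta is a positive integer.
    """
    n = len(neg_scores)
    j_beta = round(n * beta)
    if j_beta < 1 or abs(n * beta - j_beta) > 1e-9:
        raise ValueError("this sketch needs n * beta to be a positive integer")
    # Only the j_beta highest-scoring negatives enter the computation.
    top_negatives = sorted(neg_scores, reverse=True)[:j_beta]
    correct_pairs = sum(1 for p in pos_scores for q in top_negatives if p > q)
    return correct_pairs / (len(pos_scores) * j_beta)


# Score values from the slide; the +/- labels are hypothetical.
pos = [20, 15, 13, 9, 5]
neg = [14, 11, 8, 6, 3, 2]
pauc = partial_auc_top(pos, neg, beta=0.5)

# Changing the scores of the lower-ranked negatives leaves the value unchanged.
neg_changed = [14, 11, 8, 7, 1, 0]
pauc_changed = partial_auc_top(pos, neg_changed, beta=0.5)
print(pauc, pauc_changed)
```

Both calls give the same value because the three top-ranked negatives are identical in the two lists.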

slide-36
SLIDE 36

Algorithm

Application

ROC Curve & Partial AUC

[Figure: ROC curve; axes: False Positive Rate, True Positive Rate]

slide-37
SLIDE 37

Classification SVM

slide-38
SLIDE 38

SVM for Full AUC

B A

Higher score to A than B
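The requirement "higher score to A than B" for every positive A and negative B is commonly turned into a convex objective through a pairwise hinge loss. The sketch below shows that surrogate for a linear score model. It is an assumption about the general idea, not the exact formulation used in the talk, and the weight vector and feature vectors are hypothetical.

```python
def pairwise_hinge_loss(w, positives, negatives):
    """Average hinge loss over all (positive, negative) pairs for the linear score w . x."""
    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x))

    total = 0.0
    for x_pos in positives:
        for x_neg in negatives:
            # Zero loss only when the positive outscores the negative by a margin of 1.
            total += max(0.0, 1.0 - (score(x_pos) - score(x_neg)))
    return total / (len(positives) * len(negatives))


# Hypothetical weight vector and feature vectors.
w = [1.0, 0.0]
positives = [[2.0, 0.0], [0.5, 1.0]]
negatives = [[0.0, 1.0], [1.0, 0.0]]

loss = pairwise_hinge_loss(w, positives, negatives)
print(loss)
```

Because each hinge term upper-bounds the 0/1 indicator of a mis-ordered pair, this average upper-bounds the fraction of mis-ordered pairs, which is 1 minus the AUC.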

slide-39
SLIDE 39

SVM for Full AUC

Score Model

slide-41
SLIDE 41

SVM for Partial AUC

Score Model

[Figure: ROC curve; axes: False Positives, True Positives]

slide-42
SLIDE 42

SVM for Partial AUC

Score Model

slide-44
SLIDE 44

SVM for Partial AUC

Score Model

Structural SVM

GOAL?

slide-45
SLIDE 45

SVM for Partial AUC

slide-46
SLIDE 46

SVM for Partial AUC

[Figure: m × n matrix with 0/1 entries encoding the ordering of examples in the training set]

slide-47
SLIDE 47

SVM for Partial AUC

[Figure: m × n matrix with 0/1 entries encoding the ordering of examples in the training set, compared with the IDEAL ordering]

slide-49
SLIDE 49

SVM for Partial AUC

[Figure: m × n matrix with 0/1 entries encoding the ordering of examples in the training set, compared with the IDEAL ordering]

pAUC Loss

Upper Bound on (1 – pAUC)
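A rough sketch of how a pAUC loss can be read off an ordering matrix. It assumes the convention that entry (i, j) is 1 when positive i is ranked below negative j, so the ideal matrix is all zeros. It also restricts attention to the interval [0, β] with n·β a whole number. These conventions and the function names are assumptions, since the slide's matrix did not survive extraction. The value computed is the loss itself (1 minus the pAUC of the ordering), not the convex upper bound that the method optimizes.

```python
def ordering_matrix(pos_scores, neg_scores):
    """pi[i][j] = 1 if positive i is scored below negative j, else 0 (assumed convention)."""
    return [[1 if p < q else 0 for q in neg_scores] for p in pos_scores]


def pauc_loss(pi, beta):
    """Loss of ordering `pi` against the ideal all-zero ordering, for FPR range [0, beta].

    Sketch only: assumes `pi` comes from a total order and n * beta is a positive integer.
    """
    m, n = len(pi), len(pi[0])
    j_beta = round(n * beta)
    if j_beta < 1 or abs(n * beta - j_beta) > 1e-9:
        raise ValueError("this sketch needs n * beta to be a positive integer")
    # A higher-ranked negative has at least as many positives below it,
    # so the top j_beta negatives are the columns with the largest sums.
    column_sums = [sum(pi[i][j] for i in range(m)) for j in range(n)]
    top_columns = sorted(column_sums, reverse=True)[:j_beta]
    return sum(top_columns) / (m * j_beta)


# Hypothetical labels attached to the score values used earlier in the slides.
pos = [20, 15, 13, 9, 5]
neg = [14, 11, 8, 6, 3, 2]

pi = ordering_matrix(pos, neg)
ideal = [[0] * len(neg) for _ in pos]
print(pauc_loss(pi, beta=0.5), pauc_loss(ideal, beta=0.5))
```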

slide-50
SLIDE 50

Cutting-plane Solver

Repeat: 1. Solve OP for a subset of constraints. 2. Add the most violated constraint.

slide-52
SLIDE 52

Break down!

Cutting-plane Solver

Repeat: 1. Solve OP for a subset of constraints. 2. Add the most violated constraint.

slide-53
SLIDE 53

Break down!

Cutting-plane Solver

Repeat: 1. Solve OP for a subset of constraints. 2. Add the most violated constraint.

[Figure: ordering matrix with 0/1 entries, labelled Full AUC]
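For the full AUC, step 2 of the cutting-plane loop breaks down entry by entry. The sketch below assumes the usual structural SVM form, in which a candidate ordering matrix contributes the average over pairs of pi[i][j] * (1 - (s_i - s_j)), where s_i and s_j are the current scores of positive i and negative j. The slides do not show this formula, so treat it, and the function name, as assumptions rather than the authors' exact procedure.

```python
def most_violated_full_auc(pos_scores, neg_scores):
    """Return the ordering matrix maximising the assumed violation term, and its value.

    Each entry is chosen independently: setting pi[i][j] = 1 helps exactly when
    the positive does not beat the negative by a margin of 1.
    """
    m, n = len(pos_scores), len(neg_scores)
    pi = [[1 if (p - q) < 1.0 else 0 for q in neg_scores] for p in pos_scores]
    violation = sum(
        pi[i][j] * (1.0 - (pos_scores[i] - neg_scores[j]))
        for i in range(m)
        for j in range(n)
    ) / (m * n)
    return pi, violation


# Hypothetical current scores under the model being trained.
pos_scores = [2.0, 0.5]
neg_scores = [0.0, 1.0]

pi, violation = most_violated_full_auc(pos_scores, neg_scores)
print(pi, violation)
```

For the partial AUC this entrywise decomposition no longer applies, because which negatives count depends on their relative order.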

slide-54
SLIDE 54

Break down!

Cutting-plane Solver

Repeat: 1. Solve OP for a subset of constraints. 2. Add the most violated constraint.

[Figure: ordering matrices with 0/1 entries, labelled Full AUC and Partial AUC]

Optimize rows independently

slide-55
SLIDE 55

Break down!

Cutting-plane Solver

Repeat: 1. Solve OP for a subset of constraints. 2. Add the most violated constraint.

[Figure: ordering matrices with 0/1 entries, labelled Full AUC and Partial AUC]

Optimize rows independently

Can be implemented in O((m+n) log (m+n)) time.

H. Narasimhan and S. Agarwal. A Structural SVM Based Approach for Optimizing Partial AUC. ICML, 2013.
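To give a feel for where an O((m+n) log (m+n)) bound can come from, the sketch below counts correctly ordered positive-negative pairs by sorting and binary search instead of enumerating all m·n pairs. This is not the paper's routine for finding the most violated constraint; it only illustrates the sorting idea on the simpler task of computing the AUC, with hypothetical labels and function names.

```python
from bisect import bisect_left


def auc_by_sorting(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs with the positive scored strictly higher.

    Sorting costs O(n log n) and each of the m lookups costs O(log n).
    """
    neg_sorted = sorted(neg_scores)
    # bisect_left gives the number of negatives strictly below each positive score.
    correct_pairs = sum(bisect_left(neg_sorted, p) for p in pos_scores)
    return correct_pairs / (len(pos_scores) * len(neg_scores))


def auc_brute_force(pos_scores, neg_scores):
    """Same quantity by enumerating all m * n pairs, for comparison."""
    correct_pairs = sum(1 for p in pos_scores for q in neg_scores if p > q)
    return correct_pairs / (len(pos_scores) * len(neg_scores))


# Hypothetical labels attached to the score values used earlier in the slides.
pos = [20, 15, 13, 9, 5]
neg = [14, 11, 8, 6, 3, 2]

print(auc_by_sorting(pos, neg), auc_brute_force(pos, neg))
```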

slide-56
SLIDE 56

Experimental Results

  • Baseline Methods:

– Full AUC Optimization (Joachims, 2005)

Vs

slide-57
SLIDE 57

Experimental Results

  • Baseline Methods:

– Full AUC Optimization (Joachims, 2005)
– Asymmetric SVM (Wu et al., 2008)
– Boosting Style Method (Komori & Eguchi, 2010)
– Greedy Heuristic Method (Ricamato & Tortorella, 2011)

Vs

slide-58
SLIDE 58

Experimental Results

Drug Discovery

50 active compounds / 2092 inactive compounds

Method              Partial AUC in [0, 0.1]
SVMpAUC             65.25
SVM-AUC             62.64
ASVM                63.80
pAUCBoost           43.89
Greedy Heuristic    8.33

Interval [0, 0.1]

slide-59
SLIDE 59

Experimental Results

Protein-Protein Interaction Prediction

~3×10^3 interacting pairs / ~2×10^5 non-interacting pairs

Method              Partial AUC in [0, 0.1]
SVMpAUC             51.79
SVM-AUC             39.72
ASVM                44.51
pAUCBoost           48.65
Greedy Heuristic    47.33

Interval [0, 0.1]

slide-60
SLIDE 60

Experimental Results

Interval [α, β]

KDD Cup 2008 Breast Cancer Detection

~600 malignant ROIs / ~10^5 benign ROIs

Method              Partial AUC in [0.2s, 0.3s]
SVMpAUC             51.44
SVM-AUC             50.50
pAUCBoost           48.06
Greedy Heuristic    46.99

slide-61
SLIDE 61

Experimental Results

Cutting-plane Method Repeat: 1. Solve OP for a subset of constraints. 2. Add the most violated constraint.

Run Time Analysis

slide-62
SLIDE 62

Experimental Results

Cutting-plane Method Repeat: 1. Solve OP for a subset of constraints. 2. Add the most violated constraint.

Run Time Analysis

Time taken per iteration Total number of iterations

slide-63
SLIDE 63

Total number of iterations

Experimental Results

Run Time Analysis

slide-64
SLIDE 64

Time taken per iteration

Experimental Results

Run Time Analysis

slide-65
SLIDE 65

Improved Formulation

Narasimhan, H. and Agarwal, S. “SVM_pAUC^tight: A new support vector method for optimizing partial AUC based on a tight convex upper bound”, KDD 2013.

slide-66
SLIDE 66

Improved Formulation

  • Better Formulation: Tighter Approximation

Narasimhan, H. and Agarwal, S. “SVM_pAUC^tight: A new support vector method for optimizing partial AUC based on a tight convex upper bound”, KDD 2013.

slide-67
SLIDE 67

Improved Formulation

  • Better Formulation: Tighter Approximation

– Improved Accuracy
– Better Run-time Guarantee

Narasimhan, H. and Agarwal, S. “SVM_pAUC^tight: A new support vector method for optimizing partial AUC based on a tight convex upper bound”, KDD 2013.

slide-68
SLIDE 68

Algorithm

Application

ROC Curve & Partial AUC

[Figure: ROC curve; axes: False Positive Rate, True Positive Rate]

slide-69
SLIDE 69

[Figure: pre-dose and post-dose panels for the response categories CR, NR and PR]

Courtesy: Mitra Biotech

slide-70
SLIDE 70

[Figure: pre-dose and post-dose panels for the response categories CR, NR and PR]

CR: Complete Response

Courtesy: Mitra Biotech

slide-71
SLIDE 71

[Figure: pre-dose and post-dose panels for the response categories CR, NR and PR]

CR: Complete Response
PR: Partial Response

Courtesy: Mitra Biotech

slide-72
SLIDE 72

[Figure: pre-dose and post-dose panels for the response categories CR, NR and PR]

CR: Complete Response
PR: Partial Response
NR: No Response

Courtesy: Mitra Biotech

slide-73
SLIDE 73

Predicting Anticancer Drug Response

Drug therapy 1, Drug therapy 2, Drug therapy 3, ..., Drug therapy N

slide-74
SLIDE 74

Predicting Anticancer Drug Response

Drug therapy 1, Drug therapy 2, Drug therapy 3, ..., Drug therapy N

Complete Response (CR), Partial Response (PR), No Response (NR)

slide-75
SLIDE 75

Drug therapy 1, Drug therapy 2, Drug therapy 3, ..., Drug therapy N

Predicting Anticancer Drug Response

Majumder, B., Radhakrishnan, P., Narasimhan, H., et al. “Predicting anti-cancer drug response using heterogeneous tumor ecosystems”. In preparation.

slide-76
SLIDE 76

Drug therapy 1, Drug therapy 2, Drug therapy 3, ..., Drug therapy N

Tumor Ecosystem

Predicting Anticancer Drug Response

Majumder, B., Radhakrishnan, P., Narasimhan, H., et al. “Predicting anti-cancer drug response using heterogeneous tumor ecosystems”. In preparation.

slide-77
SLIDE 77

Drug therapy 1, Drug therapy 2, Drug therapy 3, ..., Drug therapy N

Tumor Ecosystem

Anti-cancer drug Features

Predicting Anticancer Drug Response

Majumder, B., Radhakrishnan, P., Narasimhan, H., et al. “Predicting anti-cancer drug response using heterogeneous tumor ecosystems”. In preparation.

slide-78
SLIDE 78

Predicting Anticancer Drug Response

Responders predicted as responders

slide-79
SLIDE 79

Predicting Anticancer Drug Response

Non-responders predicted as responders Responders predicted as responders

slide-80
SLIDE 80

Experimental Results

  • 164 patients:
      – Head-and-neck cancer
      – Colorectal cancer
  • Classifier built by thresholding learnt scoring function:
      – False positive rate within 25%
      – True positive rate: 100%

slide-81
SLIDE 81

Experimental Results

  • 164 patients:
      – Head-and-neck cancer
      – Colorectal cancer
  • Classifier built by thresholding learnt scoring function:
      – False positive rate within 25%
      – True positive rate: 100%

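[Confusion matrix, Training Data: Predicted vs. Actual over classes NR, PR, CR; extracted cell values (cell positions not recoverable): 36, 11, 36, 5, 17, 2, 2]

[Confusion matrix, Test Data: Predicted vs. Actual over classes NR, PR, CR; extracted cell values (cell positions not recoverable): 22, 1, 1, 17, 4, 9, 1]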
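A minimal sketch of building a classifier by thresholding a learnt scoring function subject to a cap on the false positive rate. The scores below are hypothetical and are NOT the patient data from the study; the function names are also illustrative assumptions.

```python
def threshold_for_max_fpr(neg_scores, max_fpr):
    """Pick a threshold so that at most a `max_fpr` fraction of negatives score above it."""
    ranked = sorted(neg_scores, reverse=True)
    allowed = int(max_fpr * len(ranked))  # negatives permitted above the threshold
    if allowed >= len(ranked):
        return float("-inf")  # every negative may be flagged
    return ranked[allowed]


def rate_above(scores, threshold):
    """Fraction of `scores` strictly above `threshold`."""
    return sum(1 for s in scores if s > threshold) / len(scores)


# Hypothetical scores, NOT the patient data from the study.
responder_scores = [0.95, 0.8, 0.7, 0.55]
non_responder_scores = [0.9, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

t = threshold_for_max_fpr(non_responder_scores, max_fpr=0.25)
fpr = rate_above(non_responder_scores, t)
tpr = rate_above(responder_scores, t)
print(f"threshold = {t}, FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```

In practice the threshold would be chosen on training data and the rates then checked on held-out data, as the training and test matrices on this slide suggest.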

slide-82
SLIDE 82

Algorithm

Application

ROC Curve & Partial AUC

[Figure: ROC curve; axes: False Positive Rate, True Positive Rate]

slide-83
SLIDE 83

Performance Measures

Algorithms

?

slide-84
SLIDE 84

http://www.tagxedo.com

slide-85
SLIDE 85

Acknowledgements

Machine Learning and Learning Theory Group @ IISc

Shivani Agarwal, Harish Guruprasad Ramasamy, Siddarth Ramamohan, Arun Rajkumar, Rohit Vaish, Arpit Agarwal, Saneem Ahmed, Suprovat Ghoshal, Aadirupa Saha

slide-86
SLIDE 86

Acknowledgements

Collaborators

Pradip K. Majumder, Mitra Biotech, Bangalore
Biswanath Majumder, Mitra Biotech, Bangalore
Padhma Radhakrishnan, Mitra Biotech, Bangalore
Shiladitya Sengupta, Harvard Medical School, Boston
Mallikarjun Sundaram, Mitra Biotech, Bangalore

slide-87
SLIDE 87

Questions?