Harikrishna Narasimhan
Department of Computer Science and Automation Indian Institute of Science, Bangalore
Support Vector Algorithms for Optimizing the Partial Area Under the ROC Curve
Joint work with Shivani Agarwal, IISc; Mitra Biotech team
allspammedup.com pascal-network.org fusionsedge.com
Full AUC vs. Partial AUC
http://en.wikipedia.org/
KDD Cup 2008
― Drug Discovery
― Gene Prioritization
― Protein Interaction Prediction
― ……
http://en.wikipedia.org/wiki http://commons.wikimedia.org/ http://www.google.com/imghp
New support vector method for directly optimizing the partial AUC measure
Narasimhan, H. and Agarwal, S. “A structural SVM based approach for optimizing partial AUC”, ICML 2013.
Based on an earlier structural SVM based approach for full AUC optimization (Joachims 2005; 2006)
[Figure: ROC curve; x-axis: False Positive Rate, y-axis: True Positive Rate]
Training Set
Positive Instances: x1+, x2+, x3+, ……, xm+
Negative Instances: x1-, ……, xn-
GOAL?
Scores assigned by score model: 20 15 14 13 11 9 8 6 5 3 2
[Figure: ROC curve; x-axis: False Positives, y-axis: True Positives]
Area Under the ROC Curve (AUC): Joachims (2005)
Partial AUC
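To make the two measures concrete, here is a minimal Python sketch that computes the full AUC and the partial AUC over the false positive range [0, β] by counting correctly ordered positive-negative pairs. The split of the eleven scores into positives and negatives is hypothetical (the labels on the slide are not recoverable), the function names are illustrative, and the sketch assumes no tied scores and that n·β is a whole number.

```python
def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    correct = sum(p > q for p in pos_scores for q in neg_scores)
    return correct / (len(pos_scores) * len(neg_scores))


def partial_auc_top(pos_scores, neg_scores, beta):
    """Normalized partial AUC over the false positive range [0, beta].

    Only the j_beta = n * beta highest-scoring negatives are compared
    against the positives. Assumes n * beta is a whole number >= 1.
    """
    j_beta = round(len(neg_scores) * beta)
    if j_beta < 1:
        raise ValueError("beta is too small for this number of negatives")
    top_neg = sorted(neg_scores, reverse=True)[:j_beta]
    correct = sum(p > q for p in pos_scores for q in top_neg)
    return correct / (len(pos_scores) * j_beta)


# Hypothetical labelling of the scores 20 15 14 13 11 9 8 6 5 3 2.
pos = [20, 15, 13, 9, 8, 3]
neg = [14, 11, 6, 5, 2]

full = auc(pos, neg)                      # 21 of 30 pairs are ordered correctly
partial = partial_auc_top(pos, neg, 0.4)  # 5 of 12 pairs against the top 2 negatives
print(full, partial)
```

On this toy data the partial AUC is lower than the full AUC because the pairs that the model gets wrong are concentrated among the highest-scoring negatives, which is exactly the region the partial measure looks at.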
+ + + + + + – – – – – –
– – – – – – + + + + + +
[Figure: ROC curves for the two orderings above; axes: False Positives, True Positives]
B A
Higher score to A than B
Score Model
[Figure: ROC curve; x-axis: False Positives, y-axis: True Positives]
GOAL?
Ordering of examples in training set, compared with IDEAL
[Figure: m × n ordering matrix with 0/1 entries, one row per positive and one column per negative]
pAUC Loss
Upper Bound on (1 – pAUC)
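The ordering-matrix view can be sketched in a few lines of Python. The convention used here is an assumption: entry pi[i][j] is 1 when positive i is ranked below negative j, so the ideal ordering is the all-zeros matrix. The data, the function names, and the use of scores to pick the top negatives are illustrative simplifications, with no ties and a whole-number n·β assumed.

```python
def ordering_matrix(pos_scores, neg_scores):
    """m x n matrix; entry (i, j) is 1 if positive i is ranked below negative j."""
    return [[1 if p < q else 0 for q in neg_scores] for p in pos_scores]


def pauc_loss(pi, neg_scores, beta):
    """Fraction of mis-ordered pairs among the top n * beta negatives.

    With no ties this equals 1 - pAUC over the false positive range [0, beta].
    """
    m, n = len(pi), len(neg_scores)
    j_beta = round(n * beta)
    # Columns of the j_beta highest-scoring negatives.
    top_cols = sorted(range(n), key=lambda j: neg_scores[j], reverse=True)[:j_beta]
    errors = sum(pi[i][j] for i in range(m) for j in top_cols)
    return errors / (m * j_beta)


# Hypothetical scores for 6 positives and 5 negatives.
pos = [20, 15, 13, 9, 8, 3]
neg = [14, 11, 6, 5, 2]

pi = ordering_matrix(pos, neg)
loss = pauc_loss(pi, neg, 0.4)  # 7 of 12 pairs against the top 2 negatives are mis-ordered
print(pi, loss)
```

Setting beta to 1 recovers the full AUC loss, which is simply the mean of all entries of the matrix.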
Repeat:
1. Solve OP for a subset of constraints.
2. Add the most violated constraint.
Break down!
[Figure: ordering matrices for Full AUC and Partial AUC]
Optimize rows independently
Can be implemented in O((m+n) log (m+n)) time (Narasimhan and Agarwal, ICML 2013).
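For intuition about step 2, the sketch below finds a most violated constraint in the simpler full-AUC case. It rests on an assumed formulation: pi[i][j] = 1 marks positive i as ranked below negative j, and the quantity being maximized is (1/mn) Σ pi[i][j] · (1 − (s_i − t_j)), where s_i and t_j are the current model scores of positive i and negative j. Scaling conventions differ between papers, so the threshold of 1 should be read as part of that assumption. Under this form every entry can be chosen on its own. The version here is a naive O(mn) loop, not the O((m+n) log (m+n)) procedure mentioned above, and it does not cover the partial-AUC case.

```python
from itertools import product


def objective(pi, pos_scores, neg_scores):
    """Assumed objective: (1/mn) * sum of pi[i][j] * (1 - (s_i - t_j))."""
    m, n = len(pos_scores), len(neg_scores)
    total = 0.0
    for i in range(m):
        for j in range(n):
            total += pi[i][j] * (1.0 - (pos_scores[i] - neg_scores[j]))
    return total / (m * n)


def most_violated_full_auc(pos_scores, neg_scores):
    """Set pi[i][j] = 1 exactly when its coefficient 1 - (s_i - t_j) is positive.

    The result is the ordering induced by shifting every positive score
    down by 1, so it corresponds to a valid ranking.
    """
    return [[1 if p - q < 1.0 else 0 for q in neg_scores] for p in pos_scores]


def brute_force_best(pos_scores, neg_scores):
    """Best objective over all 0/1 matrices; only feasible for tiny inputs."""
    m, n = len(pos_scores), len(neg_scores)
    best = float("-inf")
    for bits in product([0, 1], repeat=m * n):
        pi = [list(bits[i * n:(i + 1) * n]) for i in range(m)]
        best = max(best, objective(pi, pos_scores, neg_scores))
    return best


# Hypothetical model scores for 2 positives and 2 negatives.
pos = [0.9, 0.2]
neg = [0.5, -0.4]
pi_hat = most_violated_full_auc(pos, neg)
print(pi_hat, objective(pi_hat, pos, neg), brute_force_best(pos, neg))
```

The brute-force check is only there to confirm, on a tiny input, that the entry-by-entry rule reaches the maximum of the assumed objective.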
Vs
– Full AUC Optimization (Joachims, 2005)
– Asymmetric SVM (Wu et al., 2008)
– Boosting Style Method (Komori & Eguchi, 2010)
– Greedy Heuristic Method (Ricamato & Tortorella, 2011)
Drug Discovery
50 active compounds / 2092 inactive compounds
Partial AUC in [0, 0.1]
SVMpAUC             65.25
SVM-AUC             62.64
ASVM                63.80
pAUCBoost           43.89
Greedy Heuristic     8.33
Interval [0, 0.1]
Protein-Protein Interaction Prediction
~3×10^3 interacting pairs / ~2×10^5 non-interacting pairs
Partial AUC in [0, 0.1]
SVMpAUC             51.79
SVM-AUC             39.72
ASVM                44.51
pAUCBoost           48.65
Greedy Heuristic    47.33
Interval [0, 0.1]
Interval [α, β]
KDD Cup 2008 Breast Cancer Detection
~600 malignant ROIs / ~10^5 benign ROIs
Partial AUC in [0.2s, 0.3s]
SVMpAUC             51.44
SVM-AUC             50.50
pAUCBoost           48.06
Greedy Heuristic    46.99
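When the false positive range does not start at 0, the partial AUC only involves the negatives ranked between positions n·α and n·β. The following Python sketch shows one way to compute it by pair counting. The function name and data are hypothetical, and the sketch assumes no tied scores and that n·α and n·β are whole numbers (the general definition also weights the two boundary negatives fractionally, which is omitted here).

```python
def partial_auc(pos_scores, neg_scores, alpha, beta):
    """Normalized partial AUC over the false positive range [alpha, beta].

    Compares every positive with the negatives ranked n*alpha + 1 through
    n*beta when negatives are sorted by decreasing score.
    """
    m, n = len(pos_scores), len(neg_scores)
    j_alpha = round(n * alpha)
    j_beta = round(n * beta)
    if not 0 <= j_alpha < j_beta <= n:
        raise ValueError("need 0 <= alpha < beta <= 1 with at least one negative in range")
    ranked_neg = sorted(neg_scores, reverse=True)
    band = ranked_neg[j_alpha:j_beta]
    correct = sum(p > q for p in pos_scores for q in band)
    return correct / (m * len(band))


# Hypothetical scores for 6 positives and 5 negatives.
pos = [20, 15, 13, 9, 8, 3]
neg = [14, 11, 6, 5, 2]

value = partial_auc(pos, neg, 0.2, 0.6)  # compares against the negatives scored 11 and 6
print(value)
```

With alpha = 0 and beta = 1 the band contains every negative and the function returns the full AUC.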
Run Time Analysis
Cutting-plane Method
Repeat:
1. Solve OP for a subset of constraints.
2. Add the most violated constraint.
Time taken per iteration
Total number of iterations
Narasimhan, H. and Agarwal, S. “SVM_pAUC^tight: A new support vector method for
– Improved Accuracy
– Better Run-time Guarantee
Pre-dose / Post-dose
CR: Complete Response
PR: Partial Response
NR: No Response
Courtesy: Mitra Biotech
Drug therapy 1, Drug therapy 2, Drug therapy 3, . . . , Drug therapy N
Complete Response (CR) / Partial Response (PR) / No Response (NR)
100
Tumor Ecosystem
Anti-cancer drug
Features
Majumder, B., Radhakrishnan, P., Narasimhan, H., et al. “Predicting anti-cancer drug response using heterogeneous tumor ecosystems”. In preparation.
Responders predicted as responders
Non-responders predicted as responders
– Head-and-neck cancer
– Colorectal cancer
– False positive rate within 25%
– True positive rate: 100%
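Training Data: confusion matrix, Predicted vs. Actual over classes NR, PR, CR (cell positions not recoverable): 36 11 36 5 17 2 2
Test Data: confusion matrix, Predicted vs. Actual over classes NR, PR, CR (cell positions not recoverable): 22 1 1 17 4 9 1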
http://www.tagxedo.com
Shivani Agarwal, Harish Guruprasad Ramasamy, Siddarth Ramamohan, Arun Rajkumar, Rohit Vaish, Arpit Agarwal, Saneem Ahmed, Suprovat Ghoshal, Aadirupa Saha
Pradip K. Majumder, Mitra Biotech, Bangalore
Biswanath Majumder, Mitra Biotech, Bangalore
Padhma Radhakrishnan, Mitra Biotech, Bangalore
Shiladitya Sengupta, Harvard Medical School, Boston
Mallikarjun Sundaram, Mitra Biotech, Bangalore