Fast Training of Support Vector Machines for Survival Analysis

SLIDE 1

Computer Aided Medical Procedures

Fast Training of Support Vector Machines for Survival Analysis

Sebastian Pölsterl1, Nassir Navab1,2, and Amin Katouzian1

1: Technische Universität München, Munich, Germany 2: Johns Hopkins University, Baltimore MD, USA

SLIDE 2

Sebastian Pölsterl, ECML PKDD, 7-11 September 2015, Porto, Portugal 2

Survival Analysis

  • Objective: establish a connection between covariates and the time from the start of the study until an event occurs.
  • Possible formulation: rank subjects according to observed survival time.
  • Usually, parts of survival data can only be partially observed – they are censored.
  • Survival data consists of n triplets:
    – a d-dimensional feature vector,
    – the time of an event or the time of censoring,
    – an event indicator.
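A minimal sketch of how such triplets might be held in code (NumPy arrays; all names and values are illustrative, not from the paper):

```python
import numpy as np

# One survival dataset of n = 5 records, each a triplet
# (feature vector x_i, observed time y_i, event indicator delta_i).
# delta_i = True means the event was observed, False means censored.
rng = np.random.default_rng(0)

X = rng.normal(size=(5, 3))                     # d = 3 features per record
y = np.array([2.0, 4.0, 5.0, 7.0, 9.0])         # event or censoring time
delta = np.array([1, 0, 1, 1, 0], dtype=bool)   # event indicator

for yi, di in zip(y, delta):
    print(f"time={yi:4.1f}  {'event' if di else 'censored'}")
```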

SLIDE 3

Right Censoring

[Figure: timelines of five patients (A–E), shown once in calendar time (months 2–12, with the end of study marked) and once in time since enrollment (months 1–6); B and D experience the event (†), A is lost to follow-up, C drops out.]

  • Only events that occur while the study is running can be recorded (records are uncensored).
  • For individuals that remained event-free during the study period, it is unknown whether an event has or has not occurred after the study ended (records are right censored).
SLIDE 4

Right Censoring

[Figure: the same patient timelines; an incomparable pair is highlighted – when the record with the smaller observed time is censored, the two records cannot be ordered.]

SLIDE 5

Right Censoring

[Figure: the same patient timelines; a comparable pair is highlighted – when the record with the smaller observed time is an observed event, the pair can be ordered.]
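The comparability rule illustrated above can be made concrete in a few lines of Python (a naive O(n²) sketch; function and variable names are mine, not the paper's):

```python
import numpy as np

def comparable_pairs(time, event):
    """Enumerate comparable pairs (i, j): record i has a strictly
    smaller observed time than record j AND record i is an actual
    event (uncensored), so we know subject i failed before subject j.
    Illustrative helper, not the authors' implementation."""
    pairs = []
    n = len(time)
    for i in range(n):
        for j in range(n):
            if time[i] < time[j] and event[i]:
                pairs.append((i, j))
    return pairs

# Five records: observed time and event indicator (1 = event, 0 = censored).
time  = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
event = np.array([1,   0,   1,   1,   0])

print(comparable_pairs(time, event))
```

Note that censored records still contribute: they can appear as the longer-lived member j of a pair, just never as the shorter-lived member i.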

SLIDE 6

Overview

  • Problem:
    – Naive training algorithms for linear Survival Support Vector Machines require O(n^4) time and O(n^2) space (Van Belle et al., 2007; Evers et al., 2008).
  • Proposed Solution:
    – Perform optimization in the primal using truncated Newton optimization.
    – Use order statistics trees to lower time and space requirements.
    – The approach extends to a hybrid ranking-regression objective function as well as the non-linear Survival SVM.

SLIDE 7

Survival SVM

  • The objective function depends on a quadratic number of pairwise comparisons.
  • It is closely related to RankSVM (Herbrich et al., 2000).
  • Ties in survival time are uncommon, i.e., the number of relevance levels r for RankSVM is O(n).
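One way to write the ranking objective down (a sketch in the spirit of Van Belle et al.; the paper's exact formulation may differ in constants and notation – the squared hinge loss is what makes Newton optimization applicable):

```latex
\min_{w}\ \frac{1}{2}\lVert w\rVert_2^2
  \;+\; \frac{\gamma}{2} \sum_{(i,j)\in\mathcal{P}}
        \max\bigl(0,\, 1 - w^{\top}(x_i - x_j)\bigr)^2,
\qquad
\mathcal{P} = \{(i,j) \mid y_i > y_j,\ \delta_j = 1\}
```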

SLIDE 8

Related Work – Survival SVM

  • Van Belle et al., 2007: Explicitly construct all pairwise comparisons of samples to transform the ranking problem into a classification problem, then use a standard dual SVM solver.
  • Van Belle et al., 2008: Reduce the number of samples n by clustering data according to survival times using k-nearest neighbor search.

SLIDE 9

Related Work – Rank SVM

  • Airola et al., 2011: Combines cutting plane optimization with a red-black tree based approach to subgradient calculations.
  • Lee et al., 2014: Combines truncated Newton optimization with order statistics trees to compute the gradient and Hessian.

SLIDE 10

The Objective Function (1)

  • The matrix of pairwise comparisons is sparse: each row has one entry equal to 1, one entry equal to -1, and the remainder all zeros.
  • The number of support vectors appears in the objective function.
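Such a comparison matrix can be sketched with SciPy (a naive construction for illustration; the function name is hypothetical, and the rows correspond to the comparable pairs defined earlier):

```python
import numpy as np
from scipy.sparse import csr_matrix

def comparison_matrix(time, event):
    """One row per comparable pair (i, j): +1 in column j and -1 in
    column i, so row r of A @ (X @ w) is the score difference
    w^T x_j - w^T x_i. Illustrative sketch, not the paper's code."""
    n = len(time)
    rows, cols, vals = [], [], []
    r = 0
    for i in range(n):
        for j in range(n):
            if time[i] < time[j] and event[i]:
                rows.extend([r, r])
                cols.extend([j, i])
                vals.extend([1.0, -1.0])
                r += 1
    return csr_matrix((vals, (rows, cols)), shape=(r, n))

A = comparison_matrix([2.0, 4.0, 5.0], [1, 0, 1])
print(A.toarray())   # two comparable pairs -> two rows
```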
SLIDE 11

The Objective Function (2)

  • Each row of the sparse comparison matrix has one entry equal to 1, one entry equal to -1, and the remainder all zeros.
SLIDE 12

Truncated Newton Optimization

  • Problem: Explicitly storing the Hessian matrix can be prohibitive for high-dimensional survival data.
  • Proposed Solution:
    – Optimize in the primal.
    – Avoid constructing the Hessian matrix by using truncated Newton optimization, which only requires the computation of Hessian-vector products.
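The idea can be sketched on a toy quadratic objective (regularized least squares standing in for the survival objective; all names are illustrative). The Hessian is never formed – only products H·v are supplied to a conjugate gradient solver:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

# Toy objective f(w) = 0.5*||Xw - y||^2 + 0.5*alpha*||w||^2.
# Its Hessian H = X^T X + alpha*I is never materialized.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = rng.normal(size=50)
alpha = 1.0
w = np.zeros(10)

grad = X.T @ (X @ w - y) + alpha * w

def hessp(v):
    # Hessian-vector product via two matrix-vector products,
    # O(nd) work instead of building a d x d matrix.
    return X.T @ (X @ v) + alpha * v

H = LinearOperator((10, 10), matvec=hessp)
d, info = cg(H, -grad)   # "truncated": CG may be stopped early
w = w + d                # Newton step; exact for a quadratic objective
```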

SLIDE 13

Calculation of Search Direction (1)

  • In each iteration of Newton's method, the gradient and Hessian-vector product have to be recomputed due to their dependency on the current estimate of w.

SLIDE 14

Calculation of Search Direction (2)

  • In each iteration of Newton's method, the gradient and Hessian-vector product have to be recomputed due to their dependency on the current estimate of w.

SLIDE 15

Calculation of Search Direction (3)

  • The gradient and Hessian-vector product can be expressed compactly in terms of per-sample counts and sums over comparable pairs.
SLIDE 16

Calculation of Search Direction (4)

  • Assume the required per-sample counts and sums have been computed.
  • The Hessian-vector product can then be computed without enumerating all comparable pairs.

SLIDE 17

Order Statistics Trees

  • Problem: The order depends on the survival times and the predicted scores.
  • Solution:
    – Sort the survival data.
    – Incrementally add the predicted scores to an order statistics tree (a balanced binary search tree augmented with subtree sizes).
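As a stand-in for the red-black/AVL trees used in the talk, an order-statistics structure can be sketched with a Fenwick (binary indexed) tree, which answers "how many previously inserted scores are smaller than this one?" in O(log n) per query (all names illustrative):

```python
class FenwickTree:
    """Binary indexed tree used as a simple order-statistics structure:
    supports 'insert value with rank r' and 'count inserted values with
    rank < r' in O(log n). A stand-in for the balanced search trees in
    the talk, not the authors' implementation."""
    def __init__(self, n):
        self.f = [0] * (n + 1)

    def add(self, r):            # insert element with rank r (1-based)
        while r < len(self.f):
            self.f[r] += 1
            r += r & -r

    def count_less(self, r):     # number of inserted ranks < r
        s, r = 0, r - 1
        while r > 0:
            s += self.f[r]
            r -= r & -r
        return s

scores = [6, 8, 5, 9, 2, 1, 3, 7]   # predicted scores, as on the slides
rank = {v: i + 1 for i, v in enumerate(sorted(scores))}
t = FenwickTree(len(scores))
smaller_before = []
for v in scores:                    # process in survival-time order
    smaller_before.append(t.count_less(rank[v]))
    t.add(rank[v])
print(smaller_before)
```

Processing the samples in sorted order while querying the tree yields all pairwise counts in O(n log n) total, instead of O(n²) explicit comparisons.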

SLIDES 18–23

Order Statistics Trees

[Figure sequence: an order statistics tree is built up incrementally from predicted scores such as 6, 8, 5, 9, 2, 1, 3, 7, with censored records marked; each insertion keeps the tree balanced while maintaining the order information needed for the pairwise counts.]

SLIDE 24

Efficient Hessian-vector Product

  • Before: the Hessian-vector product required examining all pairwise comparisons.
  • Now: after sorting according to the predicted scores, the required counts and sums can be obtained from order statistics trees.
  • The Hessian-vector product no longer requires constructing a matrix with one row per comparable pair.
SLIDE 25

Overall Complexity

  • Time complexity: O(n log n) instead of O(n^4).
  • Space complexity: O(n) instead of O(n^2).

No need to explicitly construct all pairwise differences.

SLIDE 26

Training Time (in seconds)

  Dataset size | Proposed (red-black tree) | Proposed (AVL tree) | Improved            | Simple
  1,000        | 0.3                       | 0.3                 | 0.2                 | 0.2
  2,500        | 0.6                       | 0.7                 | 0.7                 | 1.4
  5,000        | 1.2                       | 1.2                 | 3.2                 | 6.7
  10,000       | 4.4                       | 4.5                 | 15.9                | 38.5
  20,000       | 5.5                       | 6.4                 | 68.8                | 152.1
  100,000      | 34.8                      | 35.8                | insufficient memory | insufficient memory
  1,000,000    | 424.2                     | 441.7               | insufficient memory | insufficient memory

SLIDE 27

Extensions

  • Non-linear Survival SVM
    – Transform the data with Kernel PCA before training in the primal (Chapelle & Keerthi, 2009).
  • Hybrid ranking-regression
    – The ranking approach cannot be used to predict the exact time of an event.
    – Use an objective function that combines a ranking and a regression loss.
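A bare-bones sketch of the Kernel PCA transformation step (NumPy only; the RBF kernel, bandwidth, and number of components are illustrative choices, not the paper's settings). A linear Survival SVM would then be trained in the primal on the transformed features:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))           # toy data

# RBF kernel matrix K_ij = exp(-gamma * ||x_i - x_j||^2)
gamma = 0.5
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-gamma * sq_dists)

# Center the kernel matrix in feature space
n = K.shape[0]
H = np.eye(n) - np.ones((n, n)) / n
Kc = H @ K @ H

# Project onto the top k principal components in feature space:
# training-point projections are eigenvectors scaled by sqrt(eigenvalue).
k = 10
vals, vecs = np.linalg.eigh(Kc)         # ascending eigenvalues
top = np.argsort(vals)[::-1][:k]
Z = vecs[:, top] * np.sqrt(vals[top])   # (n, k) transformed features
```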

SLIDE 28

Conclusion

  • Time complexity could be lowered from O(n^4) to O(n log n).
  • Space complexity reduces from O(n^2) to O(n).
  • The same optimization scheme can be applied to the non-linear Survival SVM and the hybrid ranking-regression objective.
  • The implementation is available online at https://github.com/tum-camp.

SLIDE 29

Bibliography

  • Airola et al.: Training linear ranking SVMs in linearithmic time using red-black trees. Pattern Recogn. Lett. 32(9), 1328–36 (2011)
  • Chapelle & Keerthi: Efficient algorithms for ranking with SVMs. Information Retrieval 13(3), 201–15 (2009)
  • Evers et al.: Sparse kernel methods for high-dimensional survival data. Bioinformatics 24(14), 1632–8 (2008)
  • Herbrich et al.: Large Margin Rank Boundaries for Ordinal Regression. In: Advances in Large Margin Classifiers, pp. 115–32 (2000)
  • Lee et al.: Large-Scale Linear RankSVM. Neural Comput. 26(4), 781–817 (2014)
  • Van Belle et al.: Support Vector Machines for Survival Analysis. In: Proc. 3rd Int. Conf. Comput. Intell. Med. Healthc., pp. 1–8 (2007)
  • Van Belle et al.: Survival SVM: a practical scalable algorithm. In: Proc. 16th European Symposium on Artificial Neural Networks, pp. 89–94 (2008)