
Fast Training of Support Vector Machines for Survival Analysis


  1. Computer Aided Medical Procedures
     Fast Training of Support Vector Machines for Survival Analysis
     Sebastian Pölsterl (1), Nassir Navab (1,2), and Amin Katouzian (1)
     (1) Technische Universität München, Munich, Germany
     (2) Johns Hopkins University, Baltimore MD, USA

  2. Survival Analysis
     ● Objective: to establish a connection between covariates and the time between the start of the study and an event.
     ● Possible formulation: rank subjects according to observed survival time.
     ● Usually, some survival times can only be partially observed: they are censored.
     ● Survival data consists of n triplets:
       – a d-dimensional feature vector
       – time of event or time of censoring
       – event indicator
     Sebastian Pölsterl, ECML PKDD, 7-11 September 2015, Porto, Portugal

  3. Right Censoring
     [Figure: two timeline plots of patients A to E, with x-axes "Time in months" (ticks 2 to 12) and "Time since enrollment in months" (ticks 1 to 6). Annotations: A "Lost", C "Dropped out", † markers, "End of Study".]
     ● Only events that occur while the study is running can be recorded (records are uncensored).
     ● For individuals that remained event-free during the study period, it is unknown whether an event has or has not occurred after the study ended (records are right censored).

  4. Right Censoring
     [Figure: the same two timeline plots of patients A to E, now with the annotation "Incomparable" added.]

  5. Right Censoring
     [Figure: the same two timeline plots of patients A to E, now with the annotations "Incomparable" and "Comparable" added.]
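
The distinction between comparable and incomparable pairs can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the usual rule that a pair (i, j) is comparable when subject i was observed longer than subject j and subject j experienced an event. The function name and the toy data are hypothetical.

```python
from typing import List, Tuple


def comparable_pairs(time: List[float], event: List[bool]) -> List[Tuple[int, int]]:
    """Return pairs (i, j) where subject i is known to have outlived subject j.

    Assumed rule: (i, j) is comparable if time[i] > time[j] and subject j
    experienced an event (its record is uncensored).
    """
    pairs = []
    for j, (t_j, e_j) in enumerate(zip(time, event)):
        if not e_j:
            # A censored record with the shorter time tells us nothing
            # about the order of the two true event times.
            continue
        for i, t_i in enumerate(time):
            if t_i > t_j:
                pairs.append((i, j))
    return pairs


# Hypothetical toy data (not taken from the slides): times in months.
time = [2.0, 5.0, 3.0, 6.0, 4.0]
event = [False, True, False, True, True]
print(comparable_pairs(time, event))
```

Subjects 0 and 2 are censored early, so they never appear in any pair here: nothing is known about their event time relative to the others.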

  6. Overview
     ● Problem:
       – Naive training algorithms for linear Survival Support Vector Machines require $O(n^4)$ time and $O(n^2)$ space (Van Belle et al., 2007; Evers et al., 2008).
     ● Proposed Solution:
       – Perform optimization in the primal using truncated Newton optimization.
       – Use order statistics trees to lower time and space requirements.
       – The approach extends to a hybrid ranking-regression objective function as well as to the non-linear Survival SVM.

  7. Survival SVM
     ● Objective function depends on a quadratic number of pairwise comparisons
     ● Closely related to RankSVM (Herbrich et al., 2000), where [formula missing]
     ● Ties in survival time are not common, i.e., the number of relevance levels r for RankSVM is $O(n)$.
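
To make the "quadratic number of pairwise comparisons" concrete, here is a naive evaluation of a ranking objective. The slide's own formula is not preserved, so the form used here is an assumption: an L2 regularizer plus a squared hinge loss over comparable pairs, as is common for primal RankSVM training. The names `naive_objective` and `gamma` are illustrative only.

```python
from typing import List, Sequence


def naive_objective(w: Sequence[float], X: List[Sequence[float]],
                    time: Sequence[float], event: Sequence[bool],
                    gamma: float = 1.0) -> float:
    """Evaluate 0.5*||w||^2 + 0.5*gamma * sum of squared hinge losses.

    Assumed loss per comparable pair (i, j), i.e. time[i] > time[j] and
    event[j] is True:  max(0, 1 - (score_i - score_j))^2.
    The double loop visits O(n^2) pairs, which is the cost the slides
    set out to avoid.
    """
    scores = [sum(wk * xk for wk, xk in zip(w, x)) for x in X]
    loss = 0.0
    for j in range(len(X)):
        if not event[j]:
            continue
        for i in range(len(X)):
            if time[i] > time[j]:
                margin = 1.0 - (scores[i] - scores[j])
                loss += max(0.0, margin) ** 2
    reg = 0.5 * sum(wk * wk for wk in w)
    return reg + 0.5 * gamma * loss


# Hypothetical toy data: one feature, three subjects, the last one censored.
X = [[0.0], [1.0], [2.0]]
time = [1.0, 2.0, 3.0]
event = [True, True, False]
print(naive_objective([0.0], X, time, event))
print(naive_objective([1.0], X, time, event))
```

With `w = [0.0]` all three comparable pairs violate the margin; with `w = [1.0]` every pair is separated by at least 1, so only the regularizer remains.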

  8. Related Work – Survival SVM
     ● Van Belle et al., 2007: Explicitly construct all pairwise comparisons of samples to transform the ranking problem into a classification problem and use a standard dual SVM solver.
     ● Van Belle et al., 2008: Reduce the number of samples n by clustering data according to survival times using k-nearest neighbor search.

  9. Related Work – Rank SVM
     ● Airola et al., 2011: Combine cutting plane optimization with a red-black tree based approach to subgradient calculations.
     ● Lee et al., 2014: Combine truncated Newton optimization with order statistics trees to compute gradient and Hessian.

  10. The Objective Function (1)
      [formula missing]
      ● [symbol missing] is a sparse matrix with each row having one entry that is 1, one entry that is -1, and the remainder all zeros.
      ● [symbol missing] denotes the number of support vectors: [formula missing]

  11. The Objective Function (2)
      ● [symbol missing] is a sparse matrix with each row having one entry that is 1, one entry that is -1, and the remainder all zeros.
      ● Example: [example missing]
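
Since the slide's example is not preserved, the following sketch shows what such a pair matrix could look like. It assumes one row per comparable pair (i, j), with +1 in column i and -1 in column j, so that multiplying by the vector of predicted scores yields all pairwise score differences. Dense Python lists are used for readability; a real implementation would use a sparse format. All names and data are hypothetical.

```python
from typing import List, Tuple


def pair_matrix(pairs: List[Tuple[int, int]], n: int) -> List[List[int]]:
    """Build one row per pair (i, j): +1 at column i, -1 at column j."""
    rows = []
    for i, j in pairs:
        row = [0] * n
        row[i] = 1
        row[j] = -1
        rows.append(row)
    return rows


def mat_vec(A: List[List[int]], v: List[float]) -> List[float]:
    """Plain matrix-vector product for list-of-lists matrices."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


# Hypothetical comparable pairs among five subjects, and predicted scores.
pairs = [(3, 1), (1, 4), (3, 4)]
A = pair_matrix(pairs, n=5)
scores = [0.1, 0.7, 0.2, 0.9, 0.4]

for row in A:
    print(row)
print(mat_vec(A, scores))  # score_i - score_j for each pair
```

The matrix has as many rows as there are comparable pairs, which is why constructing it explicitly becomes expensive for large n.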

  12. Truncated Newton Optimization
      ● Problem: Explicitly storing the Hessian matrix can be prohibitive for high-dimensional survival data.
      ● Proposed Solution:
        – Optimization in primal.
        – Avoid constructing the Hessian matrix by using truncated Newton optimization, which only requires computation of the Hessian-vector product: [formula missing]
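
The idea that a Newton step needs only Hessian-vector products can be illustrated with a small conjugate gradient routine. This is a generic sketch, not the authors' implementation: `hess_vec` is a hypothetical callback that returns H·v for a symmetric positive definite H, and stopping after a limited number of iterations is what makes the Newton step "truncated".

```python
from typing import Callable, List


def conjugate_gradient(hess_vec: Callable[[List[float]], List[float]],
                       grad: List[float],
                       max_iter: int = 50,
                       tol: float = 1e-10) -> List[float]:
    """Approximately solve H s = -grad using only products H v.

    The matrix H itself is never formed or stored.
    """
    s = [0.0] * len(grad)
    r = [-g for g in grad]            # residual of H s = -grad at s = 0
    p = list(r)                       # first search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old < tol:              # squared residual norm small enough
            break
        Hp = hess_vec(p)
        alpha = rs_old / sum(pi * hpi for pi, hpi in zip(p, Hp))
        s = [si + alpha * pi for si, pi in zip(s, p)]
        r = [ri - alpha * hpi for ri, hpi in zip(r, Hp)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return s


# Toy problem: H = [[4, 1], [1, 3]], supplied only through its action on v.
def toy_hess_vec(v: List[float]) -> List[float]:
    return [4.0 * v[0] + 1.0 * v[1], 1.0 * v[0] + 3.0 * v[1]]


step = conjugate_gradient(toy_hess_vec, grad=[-1.0, -2.0])
print(step)  # Newton direction for this toy quadratic
```

In the setting of the slides, `hess_vec` would be the place where the efficient Hessian-vector product is plugged in.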

  13. Calculation of Search Direction (1)
      ● In each iteration of Newton's method, [symbol missing] has to be recomputed due to its dependency on [symbol missing]

  14. Calculation of Search Direction (2)
      ● In each iteration of Newton's method, [symbol missing] has to be recomputed due to its dependency on [symbol missing]

  15. Calculation of Search Direction (3)
      ● [symbol missing] can compactly be expressed as: [formula missing]

  16. Calculation of Search Direction (4)
      ● Assume that [symbols missing] have been computed.
      ● Hessian-vector product can be computed in [formula missing] instead of [formula missing]

  17. Order Statistics Trees
      ● Problem: Order depends on survival times and predicted scores [formula missing]
      ● Solution:
        – Sort survival data according to [symbol missing].
        – Incrementally add [symbol missing] and [symbol missing] to an order statistics tree (balanced binary search tree).

  18. Order Statistics Trees
      [Figure: order statistics tree; values shown: 6, 5, 9 (censored), 1]

  19. Order Statistics Trees
      [Figure: order statistics tree; values shown: 6, 5, 8 (censored), 2, 9, 1]

  20. Order Statistics Trees
      [Figure: order statistics tree; values shown: 6, 5, 8, 2, 9, 1]

  21. Order Statistics Trees
      [Figure: order statistics tree; values shown: 6, 5, 8 (censored), 2, 9, 1]

  22. Order Statistics Trees
      [Figure: order statistics tree; values shown: 6, 5, 8, 2, 9, 1]

  23. Order Statistics Trees
      [Figure: order statistics tree; values shown: 6, 5, 8, 2, 7, 9, 1, 3]
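
The following sketch shows how sorting by predicted score and inserting survival times incrementally into an order-statistics structure avoids enumerating all pairs. It counts the pairs that violate the margin, under the assumption that a pair (i, j) is comparable when time[i] > time[j] and event[j] is true, and is violated when score[i] < score[j] + 1. A Fenwick tree (binary indexed tree) over time ranks is swapped in for the balanced binary search tree named on the slides, since it answers the same rank queries in O(log n) with less code. All names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


class FenwickTree:
    """Multiset of integer ranks 1..size supporting O(log n) rank queries."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.tree = [0] * (size + 1)
        self.total = 0

    def add(self, pos: int) -> None:
        self.total += 1
        while pos <= self.size:
            self.tree[pos] += 1
            pos += pos & -pos

    def count_le(self, pos: int) -> int:
        """Number of inserted ranks that are <= pos."""
        count = 0
        while pos > 0:
            count += self.tree[pos]
            pos -= pos & -pos
        return count


def count_violated_pairs(score: Sequence[float], time: Sequence[float],
                         event: Sequence[bool]) -> int:
    """Count comparable pairs with score[i] < score[j] + 1 in O(n log n)."""
    n = len(score)
    uniq = sorted(set(time))
    rank = [bisect_right(uniq, t) for t in time]       # 1-based time ranks
    order = sorted(range(n), key=lambda k: score[k])   # ascending score
    tree = FenwickTree(len(uniq))
    nxt = 0
    violated = 0
    for j in order:
        # Insert every sample whose score lies below score[j] + 1.
        while nxt < n and score[order[nxt]] < score[j] + 1.0:
            tree.add(rank[order[nxt]])
            nxt += 1
        if event[j]:
            # Among inserted samples, count those with a longer time.
            violated += tree.total - tree.count_le(rank[j])
    return violated


def count_violated_pairs_naive(score, time, event) -> int:
    """Reference O(n^2) count used to check the fast version."""
    n = len(score)
    return sum(1 for j in range(n) if event[j] for i in range(n)
               if time[i] > time[j] and score[i] < score[j] + 1.0)


# Hypothetical toy data.
score = [0.0, 2.5, 1.0, 3.0, 0.5]
time = [2.0, 5.0, 3.0, 6.0, 4.0]
event = [False, True, False, True, True]
print(count_violated_pairs(score, time, event),
      count_violated_pairs_naive(score, time, event))
```

Because the threshold score[j] + 1 only grows as j advances through the sorted order, each sample is inserted once, and each query costs O(log n).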

  24. Efficient Hessian-vector Product
      ● Before: Hessian-vector product required [formula missing]
      ● Now: After sorting according to predicted scores, [symbols missing] can be obtained in [formula missing]
      ● Hessian-vector product does not require constructing a matrix of size [formula missing] anymore
      ● Hessian-vector product requires [formula missing]

  25. Overall Complexity
      ● Time complexity: [formula missing]
      ● Space complexity: [formula missing]
      No need to explicitly construct all pairwise differences.

  26. Training Time (in seconds)
      [Figure: bar charts of training time in seconds, one panel per dataset size (1,000; 2,500; 5,000; 10,000; 20,000; 100,000; 1,000,000), comparing four methods: Proposed (Red-black tree), Proposed (AVL tree), Improved, and Simple. The label "Insufficient Memory" appears twice in the panels for the larger datasets.]

  27. Extensions
      ● Non-linear Survival SVM
        – Transform data with Kernel PCA before training in primal (Chapelle & Keerthi, 2009).
      ● Hybrid ranking-regression
        – Ranking approach cannot be used to predict exact time of event.
        – Use objective function that combines ranking and regression loss.
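
One way such a combined objective could look is sketched below. The slides do not give the formula, so everything here is an assumption: the ranking part is a squared hinge loss over comparable pairs, the regression part is a squared error that penalizes censored records only when the prediction falls below the censoring time, and a hypothetical weight `alpha` in [0, 1] mixes the two.

```python
from typing import List, Sequence


def hybrid_objective(w: Sequence[float], b: float, X: List[Sequence[float]],
                     time: Sequence[float], event: Sequence[bool],
                     gamma: float = 1.0, alpha: float = 0.5) -> float:
    """0.5*||w||^2 + 0.5*gamma*(alpha*rank_loss + (1 - alpha)*reg_loss)."""
    pred = [sum(wk * xk for wk, xk in zip(w, x)) + b for x in X]

    # Ranking part: squared hinge over comparable pairs (bias b cancels).
    rank_loss = 0.0
    for j in range(len(X)):
        if not event[j]:
            continue
        for i in range(len(X)):
            if time[i] > time[j]:
                rank_loss += max(0.0, 1.0 - (pred[i] - pred[j])) ** 2

    # Regression part: a censored time is only a lower bound on the true
    # event time, so predicting beyond it is not penalized.
    reg_loss = 0.0
    for y, p, e in zip(time, pred, event):
        residual = y - p
        reg_loss += residual ** 2 if e else max(0.0, residual) ** 2

    reg = 0.5 * sum(wk * wk for wk in w)
    return reg + 0.5 * gamma * (alpha * rank_loss + (1.0 - alpha) * reg_loss)


# Hypothetical toy data: the last subject is censored at time 5.
X = [[1.0], [2.0], [3.0]]
time = [1.0, 2.0, 5.0]
event = [True, True, False]
print(hybrid_objective([1.0], 0.0, X, time, event))
```

Setting `alpha = 1.0` recovers a pure ranking objective, which orders subjects but does not calibrate predictions to actual event times; smaller values pull predictions toward the observed times.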

  28. Conclusion
      ● Time complexity could be lowered from $O(n^4)$ to [formula missing]
      ● Space complexity reduces from $O(n^2)$ to [formula missing]
      ● Same optimization scheme can be applied to non-linear Survival SVM and hybrid ranking-regression.
      ● Implementation is available online at https://github.com/tum-camp.
