Fast Training of Support Vector Machines for Survival Analysis

SLIDE 1

Computer Aided Medical Procedures

Fast Training of Support Vector Machines for Survival Analysis

Sebastian Pölsterl1, Nassir Navab1,2, and Amin Katouzian1

1: Technische Universität München, Munich, Germany 2: Johns Hopkins University, Baltimore MD, USA

SLIDE 2

Sebastian Pölsterl, ECML PKDD, 7-11 September 2015, Porto, Portugal 2

Survival Analysis

  • Objective: establish a connection between covariates and the time from the start of the study until an event occurs.
  • Possible formulation: rank subjects according to observed survival time.
  • Usually, parts of survival data can only be partially observed – they are censored.
  • Survival data consists of n triplets:
    – a d-dimensional feature vector,
    – the time of an event or the time of censoring,
    – an event indicator.
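A minimal sketch of how such triplets might be held in code (NumPy arrays; all names and values are illustrative, not from the paper):

```python
import numpy as np

# One survival dataset of n = 5 records, each a triplet
# (feature vector x_i, observed time y_i, event indicator delta_i).
# delta_i = True means the event was observed, False means censored.
rng = np.random.default_rng(0)

X = rng.normal(size=(5, 3))                     # d = 3 features per record
y = np.array([2.0, 4.0, 5.0, 7.0, 9.0])         # event or censoring time
delta = np.array([1, 0, 1, 1, 0], dtype=bool)   # event indicator

for yi, di in zip(y, delta):
    print(f"time={yi:4.1f}  {'event' if di else 'censored'}")
```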

SLIDE 3

Right Censoring

[Figure: timelines of five patients (A–E), shown once in calendar time (months 2–12, with the end of study marked) and once in time since enrollment (months 1–6); B and D experience the event (†), A is lost to follow-up, C drops out.]

  • Only events that occur while the study is running can be recorded (records are uncensored).
  • For individuals that remained event-free during the study period, it is unknown whether an event has or has not occurred after the study ended (records are right censored).
SLIDE 4

Right Censoring

[Figure: the same patient timelines; an incomparable pair is highlighted – when the record with the smaller observed time is censored, the two records cannot be ordered.]

SLIDE 5

Right Censoring

[Figure: the same patient timelines; a comparable pair is highlighted – when the record with the smaller observed time is an observed event, the pair can be ordered.]
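The comparability rule illustrated above can be made concrete in a few lines of Python (a naive O(n²) sketch; function and variable names are mine, not the paper's):

```python
import numpy as np

def comparable_pairs(time, event):
    """Enumerate comparable pairs (i, j): record i has a strictly
    smaller observed time than record j AND record i is an actual
    event (uncensored), so we know subject i failed before subject j.
    Illustrative helper, not the authors' implementation."""
    pairs = []
    n = len(time)
    for i in range(n):
        for j in range(n):
            if time[i] < time[j] and event[i]:
                pairs.append((i, j))
    return pairs

# Five records: observed time and event indicator (1 = event, 0 = censored).
time  = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
event = np.array([1,   0,   1,   1,   0])

print(comparable_pairs(time, event))
```

Note that censored records still contribute: they can appear as the longer-lived member j of a pair, just never as the shorter-lived member i.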

SLIDE 6

Overview

  • Problem:
    – Naive training algorithms for linear Survival Support Vector Machines require O(n^4) time and O(n^2) space (Van Belle et al., 2007; Evers et al., 2008).
  • Proposed Solution:
    – Perform optimization in the primal using truncated Newton optimization.
    – Use order statistics trees to lower time and space requirements.
    – The approach extends to a hybrid ranking-regression objective function as well as the non-linear Survival SVM.

SLIDE 7

Survival SVM

  • The objective function depends on a quadratic number of pairwise comparisons.
  • It is closely related to RankSVM (Herbrich et al., 2000).
  • Ties in survival time are uncommon, i.e., the number of relevance levels r for RankSVM is O(n).
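One way to write the ranking objective down (a sketch in the spirit of Van Belle et al.; the paper's exact formulation may differ in constants and notation – the squared hinge loss is what makes Newton optimization applicable):

```latex
\min_{w}\ \frac{1}{2}\lVert w\rVert_2^2
  \;+\; \frac{\gamma}{2} \sum_{(i,j)\in\mathcal{P}}
        \max\bigl(0,\, 1 - w^{\top}(x_i - x_j)\bigr)^2,
\qquad
\mathcal{P} = \{(i,j) \mid y_i > y_j,\ \delta_j = 1\}
```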

SLIDE 8

Related Work – Survival SVM

  • Van Belle et al., 2007: Explicitly construct all pairwise comparisons of samples to transform the ranking problem into a classification problem, then use a standard dual SVM solver.
  • Van Belle et al., 2008: Reduce the number of samples n by clustering data according to survival times using k-nearest neighbor search.

SLIDE 9

Related Work – Rank SVM

  • Airola et al., 2011: Combines cutting plane optimization with a red-black tree based approach to subgradient calculations.
  • Lee et al., 2014: Combines truncated Newton optimization with order statistics trees to compute the gradient and Hessian.

SLIDE 10

The Objective Function (1)

  • The matrix of pairwise comparisons is sparse: each row has one entry equal to 1, one entry equal to -1, and the remainder all zeros.
  • The number of support vectors appears in the objective function.
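Such a comparison matrix can be sketched with SciPy (a naive construction for illustration; the function name is hypothetical, and the rows correspond to the comparable pairs defined earlier):

```python
import numpy as np
from scipy.sparse import csr_matrix

def comparison_matrix(time, event):
    """One row per comparable pair (i, j): +1 in column j and -1 in
    column i, so row r of A @ (X @ w) is the score difference
    w^T x_j - w^T x_i. Illustrative sketch, not the paper's code."""
    n = len(time)
    rows, cols, vals = [], [], []
    r = 0
    for i in range(n):
        for j in range(n):
            if time[i] < time[j] and event[i]:
                rows.extend([r, r])
                cols.extend([j, i])
                vals.extend([1.0, -1.0])
                r += 1
    return csr_matrix((vals, (rows, cols)), shape=(r, n))

A = comparison_matrix([2.0, 4.0, 5.0], [1, 0, 1])
print(A.toarray())   # two comparable pairs -> two rows
```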
SLIDE 11

The Objective Function (2)

  • Each row of the sparse comparison matrix has one entry equal to 1, one entry equal to -1, and the remainder all zeros.
SLIDE 12

Truncated Newton Optimization

  • Problem: Explicitly storing the Hessian matrix can be prohibitive for high-dimensional survival data.
  • Proposed Solution:
    – Optimize in the primal.
    – Avoid constructing the Hessian matrix by using truncated Newton optimization, which only requires the computation of Hessian-vector products.
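The idea can be sketched on a toy quadratic objective (regularized least squares standing in for the survival objective; all names are illustrative). The Hessian is never formed – only products H·v are supplied to a conjugate gradient solver:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

# Toy objective f(w) = 0.5*||Xw - y||^2 + 0.5*alpha*||w||^2.
# Its Hessian H = X^T X + alpha*I is never materialized.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = rng.normal(size=50)
alpha = 1.0
w = np.zeros(10)

grad = X.T @ (X @ w - y) + alpha * w

def hessp(v):
    # Hessian-vector product via two matrix-vector products,
    # O(nd) work instead of building a d x d matrix.
    return X.T @ (X @ v) + alpha * v

H = LinearOperator((10, 10), matvec=hessp)
d, info = cg(H, -grad)   # "truncated": CG may be stopped early
w = w + d                # Newton step; exact for a quadratic objective
```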

SLIDE 13

Calculation of Search Direction (1)

  • In each iteration of Newton's method, the gradient and Hessian-vector product have to be recomputed due to their dependency on the current estimate of w.

SLIDE 14

Calculation of Search Direction (2)

  • In each iteration of Newton's method, the gradient and Hessian-vector product have to be recomputed due to their dependency on the current estimate of w.

SLIDE 15

Calculation of Search Direction (3)

  • The gradient and Hessian-vector product can be expressed compactly in terms of per-sample counts and sums over comparable pairs.
SLIDE 16

Calculation of Search Direction (4)

  • Assume the required per-sample counts and sums have been computed.
  • The Hessian-vector product can then be computed without enumerating all comparable pairs.

SLIDE 17

Order Statistics Trees

  • Problem: The order depends on the survival times and the predicted scores.
  • Solution:
    – Sort the survival data.
    – Incrementally add the predicted scores to an order statistics tree (a balanced binary search tree augmented with subtree sizes).
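As a stand-in for the red-black/AVL trees used in the talk, an order-statistics structure can be sketched with a Fenwick (binary indexed) tree, which answers "how many previously inserted scores are smaller than this one?" in O(log n) per query (all names illustrative):

```python
class FenwickTree:
    """Binary indexed tree used as a simple order-statistics structure:
    supports 'insert value with rank r' and 'count inserted values with
    rank < r' in O(log n). A stand-in for the balanced search trees in
    the talk, not the authors' implementation."""
    def __init__(self, n):
        self.f = [0] * (n + 1)

    def add(self, r):            # insert element with rank r (1-based)
        while r < len(self.f):
            self.f[r] += 1
            r += r & -r

    def count_less(self, r):     # number of inserted ranks < r
        s, r = 0, r - 1
        while r > 0:
            s += self.f[r]
            r -= r & -r
        return s

scores = [6, 8, 5, 9, 2, 1, 3, 7]   # predicted scores, as on the slides
rank = {v: i + 1 for i, v in enumerate(sorted(scores))}
t = FenwickTree(len(scores))
smaller_before = []
for v in scores:                    # process in survival-time order
    smaller_before.append(t.count_less(rank[v]))
    t.add(rank[v])
print(smaller_before)
```

Processing the samples in sorted order while querying the tree yields all pairwise counts in O(n log n) total, instead of O(n²) explicit comparisons.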

SLIDES 18–23

Order Statistics Trees

[Figure sequence: an order statistics tree is built up incrementally from predicted scores such as 6, 8, 5, 9, 2, 1, 3, 7, with censored records marked; each insertion keeps the tree balanced while maintaining the order information needed for the pairwise counts.]

SLIDE 24

Efficient Hessian-vector Product

  • Before: the Hessian-vector product required examining all pairwise comparisons.
  • Now: after sorting according to the predicted scores, the required counts and sums can be obtained from order statistics trees.
  • The Hessian-vector product no longer requires constructing a matrix with one row per comparable pair.
SLIDE 25

Overall Complexity

  • Time complexity: O(n log n) instead of O(n^4).
  • Space complexity: O(n) instead of O(n^2).

No need to explicitly construct all pairwise differences.

SLIDE 26

Training Time (in seconds)

  Dataset size | Proposed (red-black tree) | Proposed (AVL tree) | Improved            | Simple
  1,000        | 0.3                       | 0.3                 | 0.2                 | 0.2
  2,500        | 0.6                       | 0.7                 | 0.7                 | 1.4
  5,000        | 1.2                       | 1.2                 | 3.2                 | 6.7
  10,000       | 4.4                       | 4.5                 | 15.9                | 38.5
  20,000       | 5.5                       | 6.4                 | 68.8                | 152.1
  100,000      | 34.8                      | 35.8                | insufficient memory | insufficient memory
  1,000,000    | 424.2                     | 441.7               | insufficient memory | insufficient memory

SLIDE 27

Extensions

  • Non-linear Survival SVM
    – Transform the data with Kernel PCA before training in the primal (Chapelle & Keerthi, 2009).
  • Hybrid ranking-regression
    – The ranking approach cannot be used to predict the exact time of an event.
    – Use an objective function that combines a ranking and a regression loss.
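A bare-bones sketch of the Kernel PCA transformation step (NumPy only; the RBF kernel, bandwidth, and number of components are illustrative choices, not the paper's settings). A linear Survival SVM would then be trained in the primal on the transformed features:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))           # toy data

# RBF kernel matrix K_ij = exp(-gamma * ||x_i - x_j||^2)
gamma = 0.5
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-gamma * sq_dists)

# Center the kernel matrix in feature space
n = K.shape[0]
H = np.eye(n) - np.ones((n, n)) / n
Kc = H @ K @ H

# Project onto the top k principal components in feature space:
# training-point projections are eigenvectors scaled by sqrt(eigenvalue).
k = 10
vals, vecs = np.linalg.eigh(Kc)         # ascending eigenvalues
top = np.argsort(vals)[::-1][:k]
Z = vecs[:, top] * np.sqrt(vals[top])   # (n, k) transformed features
```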

SLIDE 28

Conclusion

  • Time complexity could be lowered from O(n^4) to O(n log n).
  • Space complexity reduces from O(n^2) to O(n).
  • The same optimization scheme can be applied to the non-linear Survival SVM and the hybrid ranking-regression objective.
  • The implementation is available online at https://github.com/tum-camp.

SLIDE 29

Bibliography

  • Airola et al.: Training linear ranking SVMs in linearithmic time using red-black trees. Pattern Recogn. Lett. 32(9), 1328–36 (2011)
  • Chapelle & Keerthi: Efficient algorithms for ranking with SVMs. Information Retrieval 13(3), 201–15 (2009)
  • Evers et al.: Sparse kernel methods for high-dimensional survival data. Bioinformatics 24(14), 1632–8 (2008)
  • Herbrich et al.: Large Margin Rank Boundaries for Ordinal Regression. In: Advances in Large Margin Classifiers, pp. 115–32 (2000)
  • Lee et al.: Large-Scale Linear RankSVM. Neural Comput. 26(4), 781–817 (2014)
  • Van Belle et al.: Support Vector Machines for Survival Analysis. In: Proc. 3rd Int. Conf. Comput. Intell. Med. Healthc., pp. 1–8 (2007)
  • Van Belle et al.: Survival SVM: a practical scalable algorithm. In: Proc. 16th European Symposium on Artificial Neural Networks, pp. 89–94 (2008)