

slide-1
SLIDE 1

Introduction to Survival Analysis

Kan Ren

Apex Data and Knowledge Management Lab Shanghai Jiao Tong University

Seminar Tutorial at Apex Lab

Kan Ren (Shanghai Jiao Tong University) Introduction to Survival Analysis Seminar Tutorial at Apex Lab 1 / 32

slide-2
SLIDE 2

Outline

1 Background
    Probability
    Censored Data
    Challenges

2 Methodology
    Non-parametric Models
        Kaplan-Meier Estimator
        Survival Tree
    Parametric Model
        Cox Proportional Hazards Model
        Deep Survival Analysis

3 Evaluation

slide-4
SLIDE 4

Background Probability

Probability

Probability Density Function (P.D.F.):

\[ p_t(t) = \Pr(T = t). \tag{1} \]

Cumulative Distribution Function (C.D.F.):

\[ w_t(t) = \Pr(T < t) = \int_0^t p_t(v)\,dv. \tag{2} \]
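As a rough numerical illustration of (2), the sketch below integrates a density with the trapezoidal rule. The exponential density and the helper name `cdf_from_pdf` are assumptions chosen for the example; they are not part of the tutorial.

```python
import math

def cdf_from_pdf(pdf, t, steps=10_000):
    """Approximate w(t), the integral of pdf(v) over [0, t], by the trapezoidal rule."""
    h = t / steps
    total = 0.5 * (pdf(0.0) + pdf(t))
    for k in range(1, steps):
        total += pdf(k * h)
    return total * h

# Illustrative density (assumption): exponential with rate 1,
# whose C.D.F. is known in closed form as 1 - exp(-t).
exp_pdf = lambda v: math.exp(-v)
approx = cdf_from_pdf(exp_pdf, 1.0)
exact = 1.0 - math.exp(-1.0)
```

The numerical and closed-form values should agree closely, which is a quick sanity check on any hand-written density.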

slide-6
SLIDE 6

Background Censored Data

Censored Data

Right Censored Data
The event happens after the observation time.
    E: the event; t_obsv: the observation time.
    Observed data: {(x, t_obsv, e = True/False)}.
    Full data: {(x, T_E)}, where T_E is the logged time at which the event happens.

Example
    Patient's survival time.
    The true winning price of a bidding auction.
    The next visit time of the user.
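One possible way to hold records of the form (x, t_obsv, e) is sketched below. The class name `Observation`, the helper `observe`, and the fixed observation window are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass
class Observation:
    x: Sequence[float]   # covariates
    t_obsv: float        # observed time: event time, or end of observation if censored
    e: bool              # True if the event was observed, False if right-censored

def observe(x, t_event, window_end):
    """Turn a true event time into what is visible within the observation window."""
    if t_event <= window_end:
        return Observation(x, t_event, True)
    # The event happens after the window closes: only a lower bound is known.
    return Observation(x, window_end, False)

seen = observe([1.0], 3.0, 5.0)      # event inside the window
censored = observe([1.0], 8.0, 5.0)  # event after the window
```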

slide-8
SLIDE 8

Background Challenges

Challenges

Partial data usage: approaches that learn only from uncensored cases discard a large share of the data.
Right censorship: we only know that the event time is greater than the observation time window.
Evaluation: a proper evaluation metric is needed.


slide-9
SLIDE 9

Background Challenges

Modeling Right Censored Data in Display Ads

Losing and Winning in 2nd-price Auction


slide-10
SLIDE 10

Background Challenges

Modeling Right Censored Data

Right Censored

Right Censorship
As in a 2nd-price auction, if you lose, you only know that the market price is higher than your bidding price, which results in right censorship.
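In survival terms, the price axis plays the role of time. A minimal sketch of that mapping follows; the function name and the tie rule (a bid equal to the market price wins) are assumptions.

```python
def to_survival_record(bid, market_price):
    """Map one second-price auction to (observed_price, event_observed)."""
    if bid >= market_price:          # assumption: ties are won
        return market_price, True    # win: the market price is revealed
    return bid, False                # loss: market price > bid, right-censored at the bid

# (bid, true market price) pairs; the true price is hidden from a losing bidder.
logs = [to_survival_record(b, z) for b, z in [(5, 3), (2, 3), (4, 4)]]
```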

slide-12
SLIDE 12

Methodology Non-parametric Models

Kaplan Meier Estimator

Preliminaries
$S(t) = \Pr(t < T_E)$: survival rate.
$F(t) = 1 - S(t)$: failing rate.

Algorithm
The estimator for an individual is given by

\[ S(t) = \prod_{i:\, t_i \le t} \left( 1 - \frac{d_i}{n_i} \right), \tag{3} \]

where $d_i$ is the number of events at time $t_i$ and $n_i$ is the total number of individuals at risk at time $t_i$.
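The product in (3) can be sketched in a few lines of plain Python. The function name and the input layout (parallel lists of observed times and event flags) are assumptions; this is an illustration, not a replacement for a tested library.

```python
def kaplan_meier(times, events):
    """Return [(t_i, S(t_i))] at each distinct event time.

    times  -- observed time per individual (event time or censoring time)
    events -- True if the event was observed, False if right-censored
    """
    survival = 1.0
    curve = []
    for t in sorted({u for u, e in zip(times, events) if e}):
        n_i = sum(1 for u in times if u >= t)                        # at risk at t
        d_i = sum(1 for u, e in zip(times, events) if e and u == t)  # events at t
        survival *= 1.0 - d_i / n_i
        curve.append((t, survival))
    return curve

curve = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])
```

Censored individuals never contribute to `d_i`, but they stay in `n_i` until their censoring time, which is how the estimator uses partial information.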

slide-14
SLIDE 14

Methodology Non-parametric Models

Survival Tree with Kaplan Meier Methods

Cons of KM

Coarse-grained: the same for all individuals.
Statistical method: cannot provide personalized forecasting.

Question

How to apply an appropriate clustering method for one individual?


slide-15
SLIDE 15

Methodology Non-parametric Models

Tree-based Mapping

Goal
Given the auction feature x, forecast the market price distribution $p_x(z)$ [a].

[a] Yuchen Wang, Kan Ren, Weinan Zhang, Yong Yu. Functional Bid Landscape Forecasting for Display Advertising. ECML-PKDD, 2016.


slide-16
SLIDE 16

Methodology Non-parametric Models

Tree-based Mapping

Methodology


slide-17
SLIDE 17

Methodology Non-parametric Models

Node Splitting

slide-20
SLIDE 20

Methodology Non-parametric Models

Node Splitting

KLD and Clustering

Kullback-Leibler Divergence (KLD)
A measure of the difference between two probability distributions P and Q.

Node Splitting (one step)
Divide all the category values included in this node into two sets, maximizing the KLD between the two resulting sets.

Algorithm
Use K-means clustering according to KLD values.
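The one-step split can be illustrated on a toy node. Note the substitutions: the sketch below uses an exhaustive search over two-set splits instead of K-means, and a symmetrized KLD so that the score does not depend on which set is called P. Both choices, along with all function names and the histogram input format, are assumptions for illustration only.

```python
import math
from itertools import combinations

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def pooled_distribution(histograms):
    """Sum per-category count histograms and normalize to a distribution."""
    totals = [sum(col) for col in zip(*histograms)]
    n = sum(totals)
    return [c / n for c in totals]

def best_binary_split(hist_by_category):
    """Return (score, left_set, right_set) with maximal symmetrized KLD."""
    cats = sorted(hist_by_category)
    best = (float("-inf"), None, None)
    for r in range(1, len(cats) // 2 + 1):
        for left in combinations(cats, r):
            right = [c for c in cats if c not in left]
            p = pooled_distribution([hist_by_category[c] for c in left])
            q = pooled_distribution([hist_by_category[c] for c in right])
            score = kl_divergence(p, q) + kl_divergence(q, p)
            if score > best[0]:
                best = (score, set(left), set(right))
    return best

# Toy market-price histograms (counts per price bucket) for four category values.
hists = {"A": [8, 1, 1], "B": [7, 2, 1], "C": [1, 1, 8], "D": [1, 2, 7]}
score, left, right = best_binary_split(hists)
```

Exhaustive search is only practical for a handful of category values, which is presumably why a clustering heuristic is used in practice.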


slide-22
SLIDE 22

Methodology Non-parametric Models

Handling Censorship

Survival Model

For winning auctions: we have the true market price value.
For lost auctions: we only know our proposed bid price and that the true market price is higher than it.

Intuition
Most related works focus only on the winning auctions and ignore the lost auctions, which also contain information for inferring the true distribution.

\[ (b_i, w_i, m_i)_{i=1,2,\cdots,M} \longrightarrow (b_j, d_j, n_j)_{j=1,2,\cdots,N} \]

with $b_j < b_{j+1}$, where $d_j$ is the number of winning auctions by $b_j - 1$ and $n_j$ is the number of lost auctions by $b_j - 1$. Here $d_j$ and $n_j$ play the roles of the event count and the at-risk count in the Kaplan-Meier estimator (3). So

\[ w(b_x) = 1 - \prod_{b_j < b_x} \frac{n_j - d_j}{n_j}, \qquad p(z) = w(z + 1) - w(z). \tag{4} \]
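Equation (4) can be evaluated directly once the aggregated table is available. The sketch below takes the (b_j, d_j, n_j) triples as given; how they are counted from raw bid logs is deliberately left out, and the function names and toy numbers are assumptions.

```python
def win_probability(table, b_x):
    """w(b_x) = 1 - product over b_j < b_x of (n_j - d_j) / n_j."""
    lose = 1.0
    for b_j, d_j, n_j in table:
        if b_j < b_x:
            lose *= (n_j - d_j) / n_j
    return 1.0 - lose

def market_price_pmf(table, z):
    """p(z) = w(z + 1) - w(z), for integer prices z."""
    return win_probability(table, z + 1) - win_probability(table, z)

# Toy aggregated table of (b_j, d_j, n_j), sorted by b_j.
table = [(1, 2, 10), (2, 3, 8), (3, 1, 5)]
w3 = win_probability(table, 3)
p2 = market_price_pmf(table, 2)
```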


slide-23
SLIDE 23

Methodology Non-parametric Models

Survival Model

slide-25
SLIDE 25

Methodology Parametric Model

Cox Proportional Hazards Model

Hazard Rate
The rate of the event happening at time t, given that it has not happened before.

Hazard Function
The function $\lambda(t \mid x)$ that predicts the hazard rate w.r.t. the covariate input x.

Proportional Hazards Model
A hazard function in which the covariates act multiplicatively on a base hazard: $\lambda(t \mid x) = \lambda_0(t) \exp(h(x))$.

Example
Linear Cox model: $h(x) = \beta x$.
Question: What if $h(x)$ is non-linear?
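To make the "proportional" part concrete, the sketch below evaluates the linear model for two individuals and shows that their hazard ratio is the same at every t. The base hazard, coefficients, and covariates are hypothetical values picked for the example.

```python
import math

def cox_hazard(t, x, beta, baseline_hazard):
    """lambda(t | x) = lambda_0(t) * exp(h(x)) with linear h(x) = beta . x."""
    h = sum(b * xi for b, xi in zip(beta, x))
    return baseline_hazard(t) * math.exp(h)

baseline = lambda t: 0.1 * t    # hypothetical base hazard lambda_0(t)
beta = [0.5, -1.0]              # hypothetical coefficients
x_a, x_b = [1.0, 0.0], [0.0, 1.0]

# The base hazard cancels, so the ratio depends only on h(x_a) - h(x_b).
ratios = [cox_hazard(t, x_a, beta, baseline) / cox_hazard(t, x_b, beta, baseline)
          for t in (1.0, 5.0)]
```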


slide-26
SLIDE 26

Methodology Parametric Model

Discussion

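Relationship among the hazard rate $\lambda$, the P.D.F. $p(z)$, and the survival function $S(b)$:

\[
\begin{aligned}
\lambda(b) &= \lim_{db \to 0} \frac{\Pr(b \le z \le b + db \mid z > b)}{db}
= \lim_{db \to 0} \frac{\Pr(b \le z \le b + db) / \Pr(z > b)}{db} \\
&= \lim_{db \to 0} \frac{\bigl( w_z(b + db) - w_z(b) \bigr) / S(b)}{db}
= \frac{p_z(b)}{S(b)} = -\frac{S'(b)}{S(b)}.
\end{aligned}
\tag{5}
\]

Solving (5) for $S$ gives $S(t \mid x) = \exp\left( -\int_0^t \lambda(v \mid x)\,dv \right)$, hence

\[
p_t(t \mid x) = \frac{\partial w_t(t \mid x)}{\partial t} = -\frac{\partial S(t \mid x)}{\partial t}
= -\frac{\partial \exp\left( -\int_0^t \lambda(v \mid x)\,dv \right)}{\partial t}
= \exp\left( -\int_0^t \lambda(v \mid x)\,dv \right) \lambda(t \mid x).
\tag{6}
\]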
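The identity $\lambda = -S'/S$ can be checked numerically with a finite difference. The survival function $S(t) = e^{-t^2}$ used below is an arbitrary choice for illustration; its hazard is $2t$.

```python
import math

def hazard_from_survival(S, t, dt=1e-6):
    """Approximate lambda(t) = -S'(t) / S(t) with a central difference."""
    dS = (S(t + dt) - S(t - dt)) / (2 * dt)
    return -dS / S(t)

S = lambda t: math.exp(-t * t)       # illustrative survival function (assumption)
lam = hazard_from_survival(S, 1.5)   # close to 2 * 1.5
pdf = lam * S(1.5)                   # density recovered as hazard times survival
```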


slide-27
SLIDE 27

Methodology Parametric Model

Cost Function: Partial Likelihood

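\[
\text{Likelihood}_i = \frac{\lambda(t_i \mid x_i)}{\sum_{j:\, t_j \ge t_i} \lambda(t_i \mid x_j)}
= \frac{\lambda_0(t_i)\, e^{h(x_i)}}{\sum_{j:\, t_j \ge t_i} \lambda_0(t_i)\, e^{h(x_j)}}
= \frac{e^{h(x_i)}}{\sum_{j:\, t_j \ge t_i} e^{h(x_j)}}.
\tag{7}
\]

The sums run over the risk set, i.e. the individuals still at risk at $t_i$, which includes $i$ itself.

\[
L_{PL} = -\log \prod_{i:(x_i, t_i)} \text{Likelihood}_i
= -\sum_{i:(x_i, t_i)} \left( h(x_i) - \log \sum_{j:\, t_j \ge t_i} e^{h(x_j)} \right).
\tag{8}
\]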
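A direct, unoptimized sketch of the partial-likelihood loss follows. Two conventions are assumed here rather than taken from the slide: the risk set is {j : t_j >= t_i}, so that individual i appears in its own denominator, and only individuals with an observed event contribute a term. The function name and input layout are also assumptions.

```python
import math

def neg_log_partial_likelihood(h, times, events):
    """h[i] = h(x_i), times[i] = t_i, events[i] = True if the event was observed."""
    loss = 0.0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored individuals only appear inside risk sets
        risk = sum(math.exp(h[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        loss -= h[i] - math.log(risk)
    return loss

# With equal scores, each term is the log of the risk-set size.
all_events = neg_log_partial_likelihood([0.0, 0.0, 0.0], [1, 2, 3], [True, True, True])
one_censored = neg_log_partial_likelihood([0.0, 0.0, 0.0], [1, 2, 3], [True, False, True])
```

The base hazard never appears in the code, which mirrors the cancellation in the last step of the likelihood.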


slide-28
SLIDE 28

Methodology Parametric Model

Base Hazard Function

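Example
Weibull distribution, with P.D.F.

\[ p(t) = \frac{k}{\eta} \left( \frac{t}{\eta} \right)^{k-1} e^{-(t/\eta)^k}, \]

and corresponding base hazard $\lambda_0(t) = \frac{k}{\eta} \left( \frac{t}{\eta} \right)^{k-1}$.

Question: the base hazard assumes a fixed functional form and does not consider x.

Figure: Probability Density Function
Figure: Cumulative Distribution Function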
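For reference, the expression with the exponential factor is the Weibull P.D.F.; the Weibull hazard is the same expression without that factor, and the survival function is the exponential factor alone. The sketch below writes out all three so the identity p = hazard × survival can be checked; the parameter values are arbitrary.

```python
import math

def weibull_pdf(t, k, eta):
    return (k / eta) * (t / eta) ** (k - 1) * math.exp(-((t / eta) ** k))

def weibull_survival(t, k, eta):
    return math.exp(-((t / eta) ** k))

def weibull_hazard(t, k, eta):
    return (k / eta) * (t / eta) ** (k - 1)

t, k, eta = 2.0, 1.5, 3.0   # arbitrary illustrative values
gap = weibull_pdf(t, k, eta) - weibull_hazard(t, k, eta) * weibull_survival(t, k, eta)
```

With k = 1 the hazard is the constant 1/eta, which recovers the exponential distribution.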


slide-29
SLIDE 29

Methodology Parametric Model

Deep Survival Analysis

NN-based Cox Model
Use a deep neural network to model h(x) [a][b][c].

[a] Faraggi D, Simon R. A neural network model for survival data. Statistics in Medicine, 1995.
[b] Ranganath R, Perotte A, Elhadad N, et al. Deep Survival Analysis. Machine Learning for Healthcare Conference, 2016.
[c] Luck M, Sylvain T, Cardinal H, et al. Deep Learning for Patient-Specific Kidney Graft Survival Analysis. arXiv preprint arXiv:1705.10245, 2017.


slide-30
SLIDE 30

Methodology Parametric Model

Deep Survival Analysis

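Generative NN-based Survival Time Estimation [a]

[a] Deep Multi-task Gaussian Processes for Survival Analysis with Competing Risks, NIPS 2017.

\[
\begin{aligned}
f_Z &\sim \mathcal{GP}(0, K_{\Theta_Z}), \quad f_T \sim \mathcal{GP}(0, K_{\Theta_T}), \\
Z_i &\sim \mathcal{N}(f_Z(X_i), \sigma_Z^2 I), \\
T_i &\sim \mathcal{N}(f_T(X_i), \sigma_T^2 I), \\
T_i &= \min(T_i^1, \ldots, T_i^K).
\end{aligned}
\tag{9}
\]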


slide-31
SLIDE 31

Methodology Parametric Model

Deep Survival Analysis

DeepHit (Lee et al. AAAI 2018)

DeepHit: A Deep Learning Approach to Survival Analysis with Competing Risks. Lee et al. AAAI 2018.

slide-32
SLIDE 32

Methodology Parametric Model

Deep Survival Analysis

DeepHit (Lee et al. AAAI 2018)

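\[ w_z(b \mid x) = \Pr(z \le b \mid x) = \sum_{j=0}^{b} \Pr(z = z_j \mid x), \tag{10} \]

so that the survival function is $S(b \mid x) = 1 - w_z(b \mid x)$.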
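The sum in (10) is a cumulative sum over a discrete output distribution. The sketch below assumes the model emits one logit per discretized value and that a softmax turns them into probabilities; the logits are made up, and no actual network is involved.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cumulative_probability(pmf, b):
    """Pr(z <= z_b | x) as the sum of pmf[0..b]."""
    return sum(pmf[: b + 1])

pmf = softmax([0.0, 0.0, 0.0, 0.0])      # hypothetical network output for one x
half = cumulative_probability(pmf, 1)    # first two of four equal buckets
full = cumulative_probability(pmf, 3)    # all buckets
```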


slide-33
SLIDE 33

Evaluation

Evaluation

Log-Likelihood

\[ \bar{P} = -\frac{1}{N} \sum_{(x_i, z_i) \in D_{test}} \log p'_z(z_i \mid x_i), \tag{11} \]

where $N = |D_{test}|$ is the size of the test dataset and $p'_z(z \mid x)$ is the learned P.D.F.
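The metric in (11) is the average negative log-likelihood over the test set. A minimal sketch, assuming the learned P.D.F. is available as a callable `learned_pdf(z, x)` and the test set is a list of (x, z) pairs; both are hypothetical interfaces.

```python
import math

def mean_negative_log_likelihood(learned_pdf, test_set):
    """Average of -log p'(z_i | x_i) over the (x_i, z_i) pairs in the test set."""
    return -sum(math.log(learned_pdf(z, x)) for x, z in test_set) / len(test_set)

# Hypothetical learned model: uniform over 4 price buckets, regardless of x.
uniform_pdf = lambda z, x: 0.25
score = mean_negative_log_likelihood(uniform_pdf, [((1.0,), 0), ((2.0,), 3)])
```

Lower is better: a model that puts more probability mass on the observed values gets a smaller score.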


slide-34
SLIDE 34

Evaluation

Evaluation

Relationship between hazard and P.D.F.

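\[
\begin{aligned}
\lambda(b) &= \lim_{db \to 0} \frac{\Pr(b \le z \le b + db \mid z > b)}{db}
= \lim_{db \to 0} \frac{\Pr(b \le z \le b + db) / \Pr(z > b)}{db} \\
&= \lim_{db \to 0} \frac{\bigl( w_z(b + db) - w_z(b) \bigr) / S(b)}{db}
= \frac{p_z(b)}{S(b)} = -\frac{S'(b)}{S(b)}.
\end{aligned}
\tag{12}
\]

\[
p_t(t \mid x) = \frac{\partial w_t(t \mid x)}{\partial t} = -\frac{\partial S(t \mid x)}{\partial t}
= -\frac{\partial \exp\left( -\int_0^t \lambda(v \mid x)\,dv \right)}{\partial t}
= \exp\left( -\int_0^t \lambda(v \mid x)\,dv \right) \lambda(t \mid x).
\tag{13}
\]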


slide-35
SLIDE 35

Evaluation

Evaluation

Concordance Index (C-index)
Considering all possible pairs $(T_i, E_i)$, $(T_j, E_j)$ for $i < j$, the C-index is the number of pairs correctly ordered by the model divided by the total number of admissible pairs.
Admissible: the pair can be ordered in a meaningful way.
Admissible pairs: (uncensored, uncensored); (uncensored, right-censored), where the uncensored event time is earlier than the censoring time.
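A quadratic-time sketch of the C-index follows. Several conventions are assumed rather than taken from the slide: a higher risk score means an earlier expected event, pairs with equal observed times are skipped, and tied scores count as half a concordant pair.

```python
def concordance_index(times, events, risk_scores):
    """Fraction of admissible pairs that the risk scores order correctly."""
    concordant = 0.0
    admissible = 0
    n = len(times)
    for i in range(n):
        for j in range(i + 1, n):
            # a is the individual with the earlier observed time
            a, b = (i, j) if times[i] <= times[j] else (j, i)
            if times[a] == times[b] or not events[a]:
                continue  # the earlier time must be an observed event
            admissible += 1
            if risk_scores[a] > risk_scores[b]:
                concordant += 1.0
            elif risk_scores[a] == risk_scores[b]:
                concordant += 0.5
    if admissible == 0:
        raise ValueError("no admissible pairs")
    return concordant / admissible

times = [1, 2, 3, 4]
events = [True, True, False, True]
perfect = concordance_index(times, events, [4, 3, 2, 1])
reversed_order = concordance_index(times, events, [1, 2, 3, 4])
```

In this toy data the pair formed by the last two individuals is not admissible, because the earlier of the two observed times is a censoring time.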
