BoXHED: Boosted eXact Hazard Estimator with Dynamic covariates
With Donald K.K. Lee (Emory U.), Bobak J. Mortazavi (TAMU), Arash Pakbin (TAMU), Hongyu Zhao (Yale U.)
Xiaochen Wang Yale University
Motivation: Dynamic Features in
Longitudinal data from clinical studies
High-frequency health vitals in the ICU
Mobile data and wearable devices
Behavioral data in financial risk assessment
Ranganath et al. 16; Bellot & van der Schaar 18, 19; Lee et al. 19)
dependent features. https://github.com/BoXHED
hazards with time-dependent covariates” (2017)
Each participant $i$ is represented by a triplet $(\{X_i(t)\}_{t \in [0, T_i]}, \Delta_i, T_i)$.
instance.
$T_i = \begin{cases} \text{event time} & \text{if } \Delta_i = 1 \\ \text{censoring time} & \text{if } \Delta_i = 0 \end{cases}$
Goal: Given the above information for $n$ participants, we want to estimate the log-hazard function $F(t, x)$.
$$R(F) = \frac{1}{n} \sum_{i=1}^{n} \left\{ \int_0^{T_i} e^{F(t, X_i(t))}\,dt - \Delta_i F(T_i, X_i(T_i)) \right\}$$
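A minimal Python sketch of how the risk above could be evaluated numerically. It assumes (this is not stated on the slides) that each trajectory is stored as a list of epochs `(t_start, t_end, x)` on which the covariate is constant, and that the log-hazard `F` is any callable of `(t, x)`. The function name `empirical_risk` and the toy data are hypothetical and are not part of the BoXHED package.

```python
import math

def empirical_risk(participants, F, steps=200):
    """Approximate (1/n) * sum_i [ int_0^{T_i} exp(F(t, X_i(t))) dt - Delta_i * F(T_i, X_i(T_i)) ].

    participants: list of (epochs, delta, T), where epochs is a list of
    (t_start, t_end, x) with the covariate held constant on each epoch
    (an assumption made for this sketch).
    """
    total = 0.0
    for epochs, delta, T in participants:
        integral = 0.0
        for t_start, t_end, x in epochs:
            h = (t_end - t_start) / steps
            # Midpoint rule on each epoch.
            for s in range(steps):
                integral += math.exp(F(t_start + (s + 0.5) * h, x)) * h
        # Assumption: X_i(T_i) is the covariate value in the last epoch.
        x_at_T = epochs[-1][2]
        total += integral - delta * F(T, x_at_T)
    return total / len(participants)

# Toy data: one observed event at t = 2.0 and one censored participant at t = 1.5.
participants = [
    ([(0.0, 1.0, 0.2), (1.0, 2.0, 0.5)], 1, 2.0),
    ([(0.0, 1.5, 0.1)], 0, 1.5),
]

# A constant hazard of 0.5, i.e. a constant log-hazard of log(0.5).
constant_log_hazard = lambda t, x: math.log(0.5)
risk = empirical_risk(participants, constant_log_hazard)
```

With a constant hazard the integral reduces to hazard times follow-up time, which gives a simple check on the numerical integration.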
traditional techniques. Solution provided in Lee, Chen, Ishwaran 17.
Tree Construction Demo
[Figure: trajectory $X_i(t)$ plotted against time, with candidate splits on time and feature dividing the region into $A_1$ and $A_2$.]
What's the risk reduction if we split here?
$$d = \sum_{k=1}^{2} V_k \left(1 + \log\frac{V_k}{U_k}\right) - (V_1 + V_2)\left(1 + \log\frac{V_1 + V_2}{U_1 + U_2}\right),$$
where
$$U_k = \sum_{i=1}^{n} \int_0^{T_i} e^{F_m(t, X_i(t))}\, I_{A_k}(t, X_i(t))\,dt, \qquad V_k = \#\text{ of observed events in } A_k.$$
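The split score can be sketched in a few lines of Python. The sketch takes as given, for each candidate region, the integrated hazard mass `U` and the observed event count `V`, and evaluates the difference between the split and no-split terms. The function name `split_score` and the convention that a region with no events contributes zero are assumptions made for this illustration, not taken from the slides.

```python
import math

def split_score(U1, V1, U2, V2):
    """Score for splitting a region into two parts.

    U1, U2: integrated hazard mass in each part (assumed > 0).
    V1, V2: number of observed events in each part.
    """
    def term(V, U):
        # V * (1 + log(V / U)); assumed to be 0 when V == 0 (0 * log 0 := 0).
        return V * (1.0 + math.log(V / U)) if V > 0 else 0.0

    return term(V1, U1) + term(V2, U2) - term(V1 + V2, U1 + U2)

# Both parts have the same events-to-mass ratio: splitting gains nothing.
no_gain = split_score(1.0, 2, 1.0, 2)

# The ratios differ (1/2 versus 3/1): splitting gives a positive score.
gain = split_score(2.0, 1, 1.0, 3)
```

The "1 +" terms cancel because the event counts of the two parts add up to the count of the whole region, so the score depends only on the `V * log(V / U)` terms.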
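Four hazard functions (Pérez et al. 13):
$\lambda_1(t, x_t) = \mathrm{Beta}(t, 2, 2) \times \mathrm{Beta}(x_t, 2, 2)$, $t \in (0, 1]$;
$\lambda_2(t, x_t) = \mathrm{Beta}(t, 4, 4) \times \mathrm{Beta}(x_t, 4, 4)$, $t \in (0, 1]$;
$\lambda_3(t, x_t) = \dfrac{1}{t} \dfrac{\phi(\log t - x_t)}{\Phi(x_t - \log t)}$, $t \in (0, 5]$;
$\lambda_4(t, x_t) = \dfrac{3}{2}\, t^{0.5} \exp\left(-\dfrac{1}{2}\cos(2\pi x_t) - \dfrac{3}{2}\right)$, $t \in (0, 5]$.
0, 20, and 40 irrelevant features drawn from the standard normal distribution are added to each of the four hazards above.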
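For readers who want to reproduce the simulation setup, here is a standard-library Python sketch of the four hazards as read from the slide: products of beta densities for the first two, a log-normal hazard for the third, and a Weibull-type hazard modulated by a cosine for the fourth. The function names are hypothetical, and the reading of the garbled exponent as $t^{0.5}$ is an assumption.

```python
import math

def beta_pdf(z, a, b):
    """Beta(a, b) density at z; returns 0 outside (0, 1), valid here since a, b > 1."""
    if not 0.0 < z < 1.0:
        return 0.0
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * z ** (a - 1) * (1.0 - z) ** (b - 1)

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def hazard_1(t, x):
    return beta_pdf(t, 2, 2) * beta_pdf(x, 2, 2)

def hazard_2(t, x):
    return beta_pdf(t, 4, 4) * beta_pdf(x, 4, 4)

def hazard_3(t, x):
    # Log-normal hazard with log-scale mean x and unit variance; requires t > 0.
    return normal_pdf(math.log(t) - x) / (t * normal_cdf(x - math.log(t)))

def hazard_4(t, x):
    # Assumes the exponent on t is 0.5.
    return 1.5 * math.sqrt(t) * math.exp(-0.5 * math.cos(2.0 * math.pi * x) - 1.5)
```

Each function takes the current time `t` and the current covariate value `x` (the slide's $x_t$), so a time-dependent covariate is handled by passing its value at time `t`.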
Method comparison. Columns: Can handle time-dependent features? | Nonparametric? | Variable selection | Parameter tuning
BoXHED: √ √ √; cross-validated on training data
Kernel Smoothing: √ √; kernel bandwidth tuned directly to test data
FlexSurv: √ √; best parametric family for test data
Black-boost: √; best parametric family and number of iterations for test data
RMSE with 95% confidence interval.
The kernel function is a beta density, resembling $\lambda_1$ and $\lambda_2$. flexsurv is correctly specified for $\lambda_3$ (log-normal distribution).
AUC versus time $t$ for the estimators when applied to data simulated from $\lambda_1$. Larger AUC values are better. Left: no irrelevant covariates; right: 20 irrelevant covariates.
2017.
every two years.
pressure (DBP), total cholesterol (TC), smoking, diabetes, and BMI.
Conflicting clinical literature on how SBP affects CVD risk.
BoXHED identified novel interaction effects that may partially explain these conflicting findings.
responsible for the reported clinical findings on SBP and CVD risk.
conventional odds ratio analyses.
estimation that is
findings on CVD risk in clinical literature.
https://github.com/BoXHED