Incorporating Grouping Information Into Bayesian Decision Tree - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Incorporating Grouping Information Into Bayesian Decision Tree Ensembles

Junliang Du Antonio R. Linero

1 / 7

slide-2
SLIDE 2

Grouping Structures

Common scenarios: omics, with groups corresponding to groups of genes or groups of SNPs.

1 / 7

slide-3
SLIDE 3

Additive Models

Assume the target $f(x)$ decomposes additively as

$f(x) = \sum_{t=1}^{m} g(x; T_t, M_t)$

for some adaptively chosen basis functions $g(x; T_t, M_t)$. BART: the basis functions are decision trees; similar in many respects to gradient boosting with decision trees.
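As a minimal sketch of the additive form above, the following evaluates a sum of toy decision trees. The `Node` structure, the convention of going left when the feature is at or below the threshold, and the hand-picked trees are assumptions made for illustration; they are not taken from the slides or from any BART implementation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """A node of a binary decision tree (hypothetical minimal structure)."""
    feature: Optional[int] = None   # predictor index used by the split; None for a leaf
    threshold: float = 0.0          # go left when x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: float = 0.0              # leaf prediction (one entry of M_t)


def tree_predict(node: Node, x: Sequence[float]) -> float:
    """Evaluate g(x; T_t, M_t): route x down to a leaf and return its value."""
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


def ensemble_predict(trees: Sequence[Node], x: Sequence[float]) -> float:
    """Evaluate f(x) as the sum of the m tree outputs."""
    return sum(tree_predict(tree, x) for tree in trees)


# Two toy trees (m = 2) with hand-picked splits and leaf values.
tree_a = Node(feature=0, threshold=0.5, left=Node(value=-1.0), right=Node(value=1.0))
tree_b = Node(feature=1, threshold=0.0, left=Node(value=0.25), right=Node(value=0.75))
trees = [tree_a, tree_b]

print(ensemble_predict(trees, [0.2, 1.0]))
```

In BART the trees and leaf values are sampled from a posterior rather than fixed by hand; the sketch only shows how a single draw of the ensemble turns into a prediction.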

2 / 7

slide-4
SLIDE 4

Variable Importance

Define the variable importance $s_j$ of predictor $j$ as Pr(a given decision rule uses predictor $j$). For example, the probability of splitting on $x_2$ and $x_3$ in the tree shown on the slide is $s_2 \cdot s_3$. A nearly sparse $s$ $\Longrightarrow$ only a small subset of predictors is used.
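The product rule in the example can be checked numerically. The sketch below assumes that each decision rule draws its predictor independently from $s$, which is what the product $s_2 \cdot s_3$ suggests; the particular vector `s` and the helper `rule_sequence_probability` are hypothetical.

```python
import numpy as np

# Hypothetical near-sparse importance vector over P = 5 predictors (sums to 1).
s = np.array([0.02, 0.45, 0.45, 0.04, 0.04])


def rule_sequence_probability(s, predictors):
    """Probability that independent split draws pick these predictors, in order."""
    return float(np.prod(s[list(predictors)]))


# Two decision rules that split on x2 and x3 (0-based indices 1 and 2).
p_tree = rule_sequence_probability(s, [1, 2])

# Monte Carlo check: draw the predictor for each of the two rules from s.
rng = np.random.default_rng(0)
draws = rng.choice(len(s), size=(100_000, 2), p=s)
p_mc = float(np.mean((draws[:, 0] == 1) & (draws[:, 1] == 2)))

print(p_tree, p_mc)
```

Because most of the mass of `s` sits on two predictors, almost every sampled rule uses one of them, which is the sense in which a nearly sparse $s$ restricts the ensemble to a small subset of predictors.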

3 / 7

slide-7
SLIDE 7

Overlapping Group BART

LDA-like model: predictor $j$ is sampled by

  1. sampling a group according to $\pi$; and
  2. sampling a predictor within that group according to $w_g$.

Set $s = W\pi$, with $\pi \in S_{G-1}$ and $w_g \in S_{P-1}$. Incorporate grouping information into the sparsity pattern of $w_g = (w_{g1}, \ldots, w_{gP})$. Sparsity-inducing priors on $\pi$ and $w_g$ $\Longrightarrow$ bi-level selection!
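The two-stage sampling scheme and the identity s = Wπ can be sketched as follows. The group memberships, the value of `pi`, and the Dirichlet draws used to fill `W` are all hypothetical; the sketch also assumes that column g of W is the vector w_g and that w_g is zero outside the members of group g, which is one reading of "sparsity pattern" here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: P = 6 predictors in G = 3 overlapping groups.
groups = [[0, 1, 2], [2, 3], [3, 4, 5]]   # predictors 2 and 3 each sit in two groups
P, G = 6, len(groups)

# Sparse group weights: the third group is switched off entirely.
pi = np.array([0.7, 0.3, 0.0])

# Column g of W holds w_g, supported only on the members of group g.
W = np.zeros((P, G))
for g, members in enumerate(groups):
    W[members, g] = rng.dirichlet(np.ones(len(members)))

# Marginal split probabilities: s_j = sum_g pi_g * w_gj.
s = W @ pi


def sample_predictor(rng):
    """Two-stage draw: first a group from pi, then a predictor from w_g."""
    g = rng.choice(G, p=pi)
    return rng.choice(P, p=W[:, g])


# Empirical frequencies of the two-stage draw should approximate s.
draws = np.array([sample_predictor(rng) for _ in range(10_000)])
s_mc = np.bincount(draws, minlength=P) / len(draws)

print(np.round(s, 3))
print(np.round(s_mc, 3))
```

Setting a component of `pi` to zero removes every predictor that belongs only to that group, while zeros inside a column of `W` remove individual predictors within a group that is still active; these are the two levels of selection.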

4 / 7

slide-8
SLIDE 8

Simulation Studies

Nonparametric ground truth (one relevant group, 5 relevant predictors, 50 members of group, 500 predictors).

[Figure: panels FP, RMSE, F1, and FN; axis label σ; legend GB-Correct(1,1), GB-Correct(10,10), GB-Wrong(1,1), GB-Wrong(10,10), SB.]
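The slides do not define the four reported quantities. A minimal sketch, assuming the usual definitions (FP and FN as counts of falsely selected and missed predictors, F1 as the harmonic mean of precision and recall, RMSE as root mean squared prediction error), might look like this; the function names and the toy inputs are hypothetical.

```python
import math


def selection_metrics(selected, relevant):
    """Count false positives and false negatives, and compute F1, for a selected set."""
    selected, relevant = set(selected), set(relevant)
    tp = len(selected & relevant)
    fp = len(selected - relevant)
    fn = len(relevant - selected)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"FP": fp, "FN": fn, "F1": f1}


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - b) ** 2 for a, b in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Toy example: 5 truly relevant predictors, 4 selected, 3 of them correct.
metrics = selection_metrics(selected=[0, 1, 2, 7], relevant=[0, 1, 2, 3, 4])
error = rmse([0.0, 0.0], [3.0, 4.0])

print(metrics, error)
```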

5 / 7

slide-9
SLIDE 9

Breast Cancer Data

Cross-validation suggests encouraging performance on the breast cancer dataset of Van De Vijver et al. (2002) (classification of metastatic/non-metastatic tumors).

Method     Average Heldout Deviance
OG-BART    620
SBART      646 (0.005)
OG-Lasso   797 (< 0.0001)
cMCP       698 (0.014)
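The slides do not say how the held-out deviance is computed. For a binary outcome, a common choice is minus twice the Bernoulli log-likelihood of the held-out labels under the predicted probabilities; the sketch below assumes that definition, and the function name, the clipping constant `eps`, and the toy data are hypothetical.

```python
import numpy as np


def binomial_deviance(y, p, eps=1e-12):
    """Return -2 * Bernoulli log-likelihood of labels y under probabilities p."""
    y = np.asarray(y, dtype=float)
    # Clip probabilities away from 0 and 1 so the logarithms stay finite.
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    return float(-2.0 * np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))


# Toy held-out fold: an uninformative classifier that always predicts 0.5.
y_heldout = [1, 0, 1, 0]
p_heldout = [0.5, 0.5, 0.5, 0.5]
dev = binomial_deviance(y_heldout, p_heldout)

print(dev)
```

Lower values indicate better-calibrated predictions on the held-out data, which is the direction in which the table above should be read under this assumed definition.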

6 / 7

slide-10
SLIDE 10

Thanks!

7 / 7

slide-11
SLIDE 11

Bleich, J., Kapelner, A., George, E. I., and Jensen, S. T. (2014). Variable selection for BART: An application to gene regulation. The Annals of Applied Statistics, 8(3):1750–1781.

Van De Vijver, M. J., He, Y. D., Van’t Veer, L. J., Dai, H., Hart, A. A., Voskuil, D. W., Schreiber, G. J., Peterse, J. L., Roberts, C., and Marton, M. J. (2002). A gene-expression signature as a predictor of survival in breast cancer. New England Journal of Medicine, 347(25):1999–2009.

7 / 7