Reproducibility and Cross-study Replicability of Prognostic Signatures from High Throughput Genomic Data



SLIDE 1

Reproducibility and Cross-study Replicability of Prognostic Signatures from High Throughput Genomic Data

giovanni parmigiani@dfci.harvard.edu 2014 Rutgers Statistics Symposium

SLIDE 2

signatures and prognosis

Sorlie PNAS 03

SLIDE 3

CuratedOvarianData

Ganzfried et al. 2012

SLIDE 4

comprehensive signature validation

Waldron et al. 2014

SLIDE 5

cross-study performance matrix

Validation Statistics for 14 Models in 10 Datasets

Rows: implemented models, plus the dataset average. Columns: expression datasets:
  (1) Dressman
  (2) Yoshihara 2012A
  (3) Mok
  (4) Tothill
  (5) Konstantinopoulos
  (6) Bonome
  (7) Bentink
  (8) TCGA
  (9) Crijns
  (10) Yoshihara 2010

Model                (1)   (2)   (3)   (4)   (5)   (6)   (7)   (8)   (9)   (10)
Hernandez10          0.68  0.55  1.07  0.71  0.86  1.21  0.79  1.04  0.9   1.03
Konstantinopoulos10  1.34  1.01  0.82  1.07  2.05  1.07  1     1.15  0.97  1.09
Kang12               2.14  1.19  0.81  0.85  1.21  1.46  1.17  1.55  1.02  0.73
Denkert09            2.6   0.76  1.33  1.31  2.25  1.04  1.29  1.08  1.15  0.79
Sabatier11           1.95  1.15  1.17  1.41  1.72  1.07  1.19  1.11  1.3   0.73
Bonome08_572genes    0.8   1.89  1.1   1.41  2.29  2.27  1.47  1.07  1.35  0.84
Mok09                1.54  1.82  3.18  1.71  0.89  1.58  1.28  0.98  0.95  1.39
Bonome08_263genes    1.3   2.73  2     1.32  0.53  2.01  1.45  1.17  1.03  0.77
Bentink12            1.94  1.01  1.89  1.44  1.2   1.14  1.62  1.16  1.26  1.45
Crijns09             1.22  1.92  1.49  1.51  1.2   1.44  1.28  1.1   3.04  1.21
Kernagis12           2.65  1.39  2.91  2.08  1.45  1.39  1.25  1.23  1.32  0.87
Yoshihara10          2.69  1.38  1.15  1.93  1.45  1.57  1.47  1.33  0.7   7.27
Yoshihara12          2.44  9.65  1.21  1.8   1.02  1.77  1.97  1.21  1.35  1.42
TCGA11               2.05  2.28  1.58  1.85  1.36  1.64  1.97  1.94  1.07  1.53
Dataset Average      1.81  1.47  1.43  1.41  1.39  1.37  1.35  1.14  1.11  1.04

SLIDE 6

meta-analysis of cross-study performance

[Figure: the matrix of Validation Statistics for 14 Models in 10 Datasets from the previous slide, repeated with a per-model summary. Color key: Hazard Ratio, 0.89 to 2.00. Legend: author training set; author test set; model validation; summary (95% CI); excl. author test sets.]
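
The per-model summary with a 95% CI can be illustrated with a standard inverse-variance meta-analysis of log hazard ratios. The sketch below is a minimal illustration, not the analysis behind the slide: the function name `pool_log_hr`, the DerSimonian-Laird estimator for the random-effects variant, and the standard errors in the usage lines are all assumptions (the slides report hazard ratios but no standard errors).

```python
import math


def pool_log_hr(hrs, ses):
    """Pool hazard ratios from several validation datasets.

    hrs: hazard ratio estimates, one per dataset.
    ses: standard errors of the corresponding log hazard ratios.
    Returns fixed-effects and DerSimonian-Laird random-effects summaries.
    """
    y = [math.log(h) for h in hrs]
    w = [1.0 / se ** 2 for se in ses]  # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

    # Cochran's Q and the DerSimonian-Laird between-study variance tau^2.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    random_est = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)

    def summary(est, weights):
        se = math.sqrt(1.0 / sum(weights))
        return {
            "hr": math.exp(est),
            "ci95": (math.exp(est - 1.96 * se), math.exp(est + 1.96 * se)),
        }

    return {
        "fixed": summary(fixed, w),
        "random": summary(random_est, w_re),
        "tau2": tau2,
    }


# Hypothetical inputs: three validation hazard ratios with assumed standard errors.
result = pool_log_hr([1.2, 1.5, 1.1], [0.15, 0.20, 0.25])
print(result["fixed"], result["random"])
```

Excluding author test sets, as in the figure legend, would amount to dropping those datasets from `hrs` and `ses` before pooling.
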
SLIDE 7

using c-stat instead

Columns:
  (1) Fixed effects, ComBat batch correction
  (2) Random effects, ComBat batch correction
  (3) Fixed effects, Dressman excluded
  (4) Fixed effects
  (5) Random effects
  (6) Random effects, Dressman excluded
  (7) Fixed effects, strict sample exclusion
  (8) Random effects, strict sample exclusion

Model                (1)   (2)   (3)   (4)   (5)   (6)   (7)   (8)
Hernandez10          0.94  0.92  0.95  0.93  0.93  0.95  0.94  0.93
Konstantinopoulos10  1.13  1.13  1.08  1.09  1.09  1.08  1.07  1.07
Sabatier11           1.25  1.25  1.16  1.19  1.19  1.16  1.16  1.16
Kang12               1.17  1.16  1.1   1.15  1.14  1.1   1.17  1.16
Denkert09            1.15  1.17  1.13  1.17  1.19  1.13  1.13  1.13
Bonome08_572genes    1.22  1.22  1.26  1.23  1.24  1.26  1.18  1.2
Mok09                1.2   1.2   1.22  1.24  1.27  1.26  1.25  1.29
Crijns09             1.25  1.26  1.3   1.3   1.3   1.3   1.29  1.29
Bentink12            1.3   1.3   1.23  1.26  1.26  1.23  1.24  1.24
Bonome08_263genes    1.31  1.35  1.26  1.26  1.28  1.28  1.3   1.34
Yoshihara10          1.42  1.42  1.34  1.39  1.4   1.33  1.39  1.42
Kernagis12           1.45  1.48  1.34  1.38  1.42  1.35  1.37  1.4
Yoshihara12          1.45  1.46  1.45  1.48  1.51  1.46  1.48  1.52
TCGA11               1.62  1.62  1.62  1.65  1.65  1.62  1.74  1.74

Color key: Value, 0.8 to 1.6
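
For readers unfamiliar with the C-statistic named in this slide's title, the following is a minimal sketch of Harrell's concordance index for right-censored survival data. It is a generic textbook version, not the implementation used in the talk; the function name `c_statistic` is hypothetical, and pairs with tied survival times are simply skipped, which is a simplifying assumption.

```python
def c_statistic(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times: observed follow-up times.
    events: 1 if the event was observed, 0 if censored.
    risk_scores: model output, higher meaning higher predicted risk.

    A pair is usable when the subject with the shorter time had an event.
    It is concordant when that subject also has the higher risk score.
    Ties in risk score count as half concordant.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i] == 1:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical toy data: the third patient is censored at time 7.
print(c_statistic([2, 5, 7, 9], [1, 1, 0, 1], [0.9, 0.4, 0.6, 0.1]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk.
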

SLIDE 8

sensitivity analysis

SLIDE 9

do predictors rank patients similarly?

SLIDE 10

selection bias in choice of validation study?

SLIDE 11

signatures

SLIDE 12

lessons: genomic signatures

  • template for meta-analytic signature evaluation
  • published ovarian cancer signatures and predictors largely withstand cross-study analysis
  • published ovarian cancer signatures and predictors are not very clinically useful

SLIDE 13

multi-study comparison of classification algorithms

Bernau et al. 2014

No.  Name  Adjuvant therapy  # patients  # ER+  3Q survival [mo.]  Median follow-up [mo.]  Original identifiers ‡  Reference
1    CAL   chemo, hormonal   118         75     42                 82                      CAL                     Chin et al. (2006)
2    MNZ   none              200         162    120                94                      MAINZ                   Schmidt et al. (2008)
3    MSK   combination       99          57     76                 82                      MSK                     Minn et al. (2005)
4    ST1   hormonal          512∗        507∗   114                106                     MDA5, TAM, VDX3         Foekens et al. (2006)
5    ST2   hormonal          517         325    126                121                     EXPO, TAM               Symmans et al. (2010)
6    TRB   none              198         134    143                171                     TRANSBIG                Desmedt et al. (2007)
7    UNT   none              133         86     151                105                     UNT                     Sotiriou et al. (2006)
8    VDX   none              344         209    44                 107                     VDX                     Minn et al. (2007)

Table 1. Public microarray datasets of breast cancer patients as curated and summarized by Haibe-Kains et al. (2012). Datasets are referred to using the

[Figure: Cross-study C-statistics, Ridge Regression. Heatmap with studies CAL, MNZ, MSK, SP1, SP2, TRB, UNT, VDX on both axes; color key and histogram, values 0.5 to 0.8.]

SLIDE 14

simulations for method comparison

Bernau et al. 2014

[Figure: two heatmaps with studies CAL, MNZ, MSK, SP1, SP2, TRB, UNT, VDX on both axes, each with a color key and histogram; values 0.5 to 0.8 in the first and 0.4 to 0.9 in the second.]

SLIDE 15

cross (study) validation

[Figure (a): Distribution of C-index under cross-validation (CV) and cross-study validation (CSV) for Lasso, Ridge, Plusminus, Unicox, SuperPC, CoxBoost; C-index axis from 0.50 to 0.70.]
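
Cross-study validation trains a learner on one study and validates it on every other study, giving a matrix with one entry per (training, validation) pair. The sketch below shows that loop under loudly stated assumptions: the names `cross_study_matrix`, `fit_line` and `mse` are hypothetical, the learner is a one-covariate least-squares line rather than any of the survival methods on the slide, and the score is mean squared error rather than a C-index.

```python
from statistics import mean


def cross_study_matrix(studies, fit, score):
    """Return z[s][v]: score of the model trained on study s, validated on study v.

    Only s != v is filled in. The diagonal would require within-study
    cross-validation and is left out of this sketch.
    """
    names = list(studies)
    z = {}
    for s in names:
        model = fit(studies[s])
        z[s] = {v: score(model, studies[v]) for v in names if v != s}
    return z


def fit_line(data):
    """Toy learner: least-squares line through (x, y) pairs."""
    xs, ys = zip(*data)
    xbar, ybar = mean(xs), mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sum((x - xbar) * (y - ybar) for x, y in data) / sxx
    return slope, ybar - slope * xbar


def mse(model, data):
    """Toy validation statistic: mean squared prediction error."""
    slope, intercept = model
    return mean((y - (intercept + slope * x)) ** 2 for x, y in data)


# Hypothetical studies: A and B share one relationship, C is shifted by 1.
studies = {
    "A": [(0, 0), (1, 2), (2, 4)],
    "B": [(3, 6), (4, 8), (5, 10)],
    "C": [(0, 1), (1, 3), (2, 5)],
}
z = cross_study_matrix(studies, fit_line, mse)
for s, row in z.items():
    print(s, row)
```

A CSV summary for a learner could then be taken as the average of the off-diagonal entries, to be compared with its within-study CV estimate.
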

SLIDE 16

estimated model ranks

[Figure: Distribution of ranks (1 to 6) under CV and CSV for Lasso, Ridge, Plusminus, Unicox, SuperPC, CoxBoost.]

SLIDE 17

[Figure: three panels comparing CSV and CV by Kendall correlation (axis from −0.5 to 1.0): Local Ranking (across), Local Ranking (self), Global Ranking.]
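
The ranking comparisons on this slide use Kendall correlation. As a minimal sketch of that statistic only, the code below computes Kendall's tau-a between two rankings of the six algorithms. The function name `kendall_tau` and the example ranks are hypothetical; the slides do not say which tau variant or tie handling was used.

```python
from itertools import combinations


def kendall_tau(a, b):
    """Kendall's tau-a between two equal-length score or rank vectors.

    Counts concordant minus discordant pairs, divided by the number of pairs.
    Tied pairs contribute zero to the numerator.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        sign = (a[i] - a[j]) * (b[i] - b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical ranks of Lasso, Ridge, Plusminus, Unicox, SuperPC, CoxBoost.
ranks_cv = [1, 2, 3, 4, 5, 6]
ranks_csv = [2, 1, 3, 4, 5, 6]
print(kendall_tau(ranks_cv, ranks_csv))  # one swapped pair out of 15: 13/15
```
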

SLIDE 18

clustering studies

Trippa et al. 201X

  • STAGE 1: Approximate the distribution of the centered array

\[ Z_{s,v} - E_{p_s,p_v}(Z_{s,v}), \qquad s, v = 1, \ldots, S \]

  • STAGE 2: Model-based clustering of the studies 1, ..., S

SLIDE 19

simulations to illustrate clustering

Logistic regression, N=300, 100 covariates, with β1, ..., β10 strictly positive. Z: misclassification rates. Measurement errors: {1, 2, 3} LOW, {4, 5, 6} MEDIUM, {7, 8, 9} HIGH.

[Figure: panels (A) Observed, (B) TRUE MEANS, (C) Estimates: 9 x 9 heatmaps over studies 1 to 9, color key 0.2 to 0.28. Reported MSE ratios: 0.942, 0.618, 0.49, 0.649, 0.678, 0.569, 0.572, 0.834, 0.578.]
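
The data-generating step of this simulation can be sketched as follows. Only N=300, the 100 covariates, the 10 positive coefficients and the LOW, MEDIUM and HIGH grouping of studies 1 to 9 come from the slide. Everything else is an assumption made for illustration: standard normal covariates, a common effect size of 0.5, additive Gaussian measurement error, the three noise standard deviations, and the names `simulate_study` and `misclassification_rate`.

```python
import math
import random


def simulate_study(n=300, p=100, n_signal=10, noise_sd=0.0, rng=None):
    """Simulate one study: binary outcomes from a logistic model.

    The first n_signal coefficients are positive (assumed 0.5), the rest zero.
    Covariates are returned with additive measurement error of SD noise_sd.
    """
    rng = rng or random.Random(0)
    beta = [0.5] * n_signal + [0.0] * (p - n_signal)
    x_obs, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        eta = sum(b * xi for b, xi in zip(beta, x))
        prob = 1.0 / (1.0 + math.exp(-eta))
        y.append(1 if rng.random() < prob else 0)
        # The outcome depends on the true covariates; only noisy ones are observed.
        x_obs.append([xi + rng.gauss(0.0, noise_sd) for xi in x])
    return x_obs, y


def misclassification_rate(y_true, y_pred):
    """Fraction of observations whose predicted class is wrong."""
    return sum(t != q for t, q in zip(y_true, y_pred)) / len(y_true)


# Hypothetical noise levels for the three groups of studies.
NOISE_SD = {"LOW": 0.1, "MEDIUM": 0.5, "HIGH": 1.0}
LEVEL = {1: "LOW", 2: "LOW", 3: "LOW",
         4: "MEDIUM", 5: "MEDIUM", 6: "MEDIUM",
         7: "HIGH", 8: "HIGH", 9: "HIGH"}

studies = {
    s: simulate_study(noise_sd=NOISE_SD[LEVEL[s]], rng=random.Random(s))
    for s in range(1, 10)
}
print({s: sum(y) / len(y) for s, (x, y) in studies.items()})
```

Fitting a classifier on each study and scoring it on every other study with `misclassification_rate` would fill the 9 x 9 array Z shown in the panels.
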

SLIDE 20

[Figure: panels (A) Observed, (B) TRUE MEANS, (C) Estimates: 9 x 9 heatmaps over studies 1 to 9, color key 0.2 to 0.28. Panel (D): empirical estimate against model-based estimate (axes 0.15 to 0.30), training dataset in C1, validation dataset in C2, MSE ratio = 0.618. Panel (E): MAE of Empirical Estimates against MAE of Bayesian Estimates (axes 0.009 to 0.019), by training dataset in C1, C2, C3 and validation dataset in C1, C2, C3.]

SLIDE 21

ridge regression in the ovarian studies, by n

[Figure: Ridge regression, $\bar{Z}_{B(s),s}$: C-statistics (0.55 to 0.65) against n (100 to 600), for validation datasets 1 to 6.]

SLIDE 22

lessons: model evaluations

  • Model evaluation should be context specific
  • Simulations to evaluate methodology should be based on modeling of predictive distributions
  • Multi-study model evaluations are different from single-study evaluations and more faithfully represent scientific reproducibility

SLIDE 23

conclusions

  • Prognostic model validation in precision medicine is a meta-analysis problem
  • The foundation of research on statistical learning should emphasize empirical multi-study reproducibility

SLIDE 24

Credits

Mike Birrer, Lorenzo Trippa, Curtis Huttenhower
Levi Waldron, Markus Riester, Ben Ganzfried, Christopher Bernau
Benjamin Haibe-Kains, Aedín C. Culhane, Jie Ding, Xin Victoria Wang, Mahnaz Ahmadifar, Svitlana Tyekucheva, Thomas Risch