SLIDE 1

Estimation of Optimally-Combined-Biomarker Accuracy in the Absence of a Gold-Standard Reference Test


  • L. Garcia Barrado (1)
  • E. Coart (2)
  • T. Burzykowski (1,2)

(1) Interuniversity Institute for Biostatistics and Statistical Bioinformatics (I-Biostat)
(2) International Drug Development Institute (IDDI)

SLIDE 2

Outline

◮ Problem setting
    ◮ Accuracy definition
    ◮ Optimal combination of biomarkers
    ◮ Absence of gold-standard reference
◮ Bayesian latent-class mixture model
    ◮ "Naive" prior definition
    ◮ Controlled prior definition
◮ Simulation study
    ◮ Data
    ◮ Results
◮ Conclusions

SLIDE 4

Problem setting

Establish the accuracy of a combination of biomarkers in the absence of a gold-standard reference test

◮ Area under the Receiver Operating Characteristic (ROC) curve (AUC) as the measure of accuracy
◮ Choose the combination of biomarkers that maximizes the AUC
◮ An imperfect reference test leads to biased estimates of accuracy

=> To this end, a Bayesian latent-class mixture model is proposed

SLIDE 5

Area under the Receiver Operating Characteristic curve

[Figure: two classification examples, each showing the Control and Disease densities against biomarker value, and the corresponding ROC curves (Se against 1−Sp), with AUC = 0.98 and AUC = 0.76.]

SLIDE 6

Data assumptions and notation

Underlying true biomarker distribution

◮ Mixture of two K-variate normal distributions by true disease status (D)
    ◮ $Y \mid D=0 \sim N_K(\mu_0, \Sigma_0)$
    ◮ $Y \mid D=1 \sim N_K(\mu_1, \Sigma_1)$
◮ Se: unknown sensitivity of the reference test (T)
◮ Sp: unknown specificity of the reference test (T)
◮ θ: unknown true prevalence of disease in the data set
◮ Reference test is imperfect
    ◮ Conditionally on true disease status, misclassification is independent of the biomarker value
    ◮ Ignoring the misclassification will UNDERESTIMATE the performance of the biomarker
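The assumptions above can be illustrated with a small simulation sketch. All numeric settings here (K = 2, the means, identity covariances, θ = 0.5, Se = Sp = 0.85, the seed) are illustrative assumptions rather than values from the slides, and `simulate` and `empirical_auc` are hypothetical helper names. The sketch draws the true status D, the biomarkers Y given D, and a reference test T that misclassifies independently of Y, then compares the empirical AUC of one biomarker against D and against T.

```python
import numpy as np

def simulate(n, theta, se, sp, mu0, mu1, cov0, cov1, rng):
    """Draw (Y, D, T) under the latent-class assumptions (illustrative sketch)."""
    d = rng.random(n) < theta                       # D ~ Bernoulli(theta)
    y0 = rng.multivariate_normal(mu0, cov0, size=n)
    y1 = rng.multivariate_normal(mu1, cov1, size=n)
    y = np.where(d[:, None], y1, y0)                # Y | D ~ N_K(mu_D, Sigma_D)
    u = rng.random(n)
    # T depends on D only: P(T=1 | D=1) = Se and P(T=1 | D=0) = 1 - Sp
    t = np.where(d, u < se, u < 1.0 - sp)
    return y, d.astype(int), t.astype(int)

def empirical_auc(score, label):
    """Mann-Whitney estimate of P(score of a case > score of a control)."""
    cases, controls = score[label == 1], score[label == 0]
    return float((cases[:, None] > controls[None, :]).mean())

rng = np.random.default_rng(0)
y, d, t = simulate(2000, 0.5, 0.85, 0.85,
                   mu0=[0.0, 0.0], mu1=[1.0, 1.0],
                   cov0=np.eye(2), cov1=np.eye(2), rng=rng)
auc_true = empirical_auc(y[:, 0], d)   # against the (normally unobserved) true status
auc_obs = empirical_auc(y[:, 0], t)    # against the imperfect reference test
print(auc_true, auc_obs)
```

Under these settings the AUC measured against T should come out lower than the AUC measured against D, which is the underestimation the slide warns about.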

SLIDE 7

ROC parameters of the optimal combination of biomarkers

According to Su and Liu (1993), the linear combination $a'Y$ maximizing the AUC satisfies:

$$a'Y \mid D=0 \sim N(a'\mu_0,\; a'\Sigma_0 a), \qquad a'Y \mid D=1 \sim N(a'\mu_1,\; a'\Sigma_1 a)$$

with

$$a \propto (\Sigma_0 + \Sigma_1)^{-1}(\mu_1 - \mu_0)$$

Area under the ROC curve:

$$\mathrm{AUC}_{\mathrm{OptComb}} = \Phi\left(\left[(\mu_1 - \mu_0)'(\Sigma_0 + \Sigma_1)^{-1}(\mu_1 - \mu_0)\right]^{1/2}\right)$$

This all holds under the assumption of a gold-standard reference test. We propose to extend it to the imperfect-reference-test case.
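A minimal numerical sketch of these formulas, assuming a gold-standard setting with known parameters. The means and covariance matrices are illustrative assumptions, and `optimal_combination` and `auc_of` are hypothetical helper names. It computes the coefficient vector and the AUC of the optimal combination, and lets one compare it with the binormal AUC of any other linear combination.

```python
import math
import numpy as np

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def optimal_combination(mu0, mu1, cov0, cov1):
    """Return (a, AUC) for the AUC-maximizing linear combination a'Y."""
    diff = np.asarray(mu1, float) - np.asarray(mu0, float)
    # a is proportional to (Sigma0 + Sigma1)^{-1} (mu1 - mu0)
    a = np.linalg.solve(np.asarray(cov0) + np.asarray(cov1), diff)
    # AUC is Phi of the square root of the quadratic form
    return a, phi(math.sqrt(diff @ a))

# Illustrative values (assumptions, not taken from the slides)
mu0 = np.array([0.0, 0.0])
mu1 = np.array([1.0, 0.5])
cov0 = np.array([[1.0, 0.3], [0.3, 1.0]])
cov1 = np.array([[1.5, 0.2], [0.2, 1.0]])

a, auc_comb = optimal_combination(mu0, mu1, cov0, cov1)

def auc_of(direction):
    """Binormal AUC of the linear combination direction'Y."""
    w = np.asarray(direction, float)
    return phi((w @ (mu1 - mu0)) / math.sqrt(w @ (cov0 + cov1) @ w))

print(auc_comb, auc_of([1.0, 0.0]), auc_of([0.0, 1.0]))
```

The combined AUC should be at least as large as the AUC of either biomarker on its own, since each single biomarker is itself a linear combination.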

SLIDE 9

Underlying versus observed data

Ignoring misclassification by an imperfect reference test leads to biased estimates of accuracy:

[Figure: true distributions versus observed data. Densities against biomarker value for Observed Control (T=0), Observed Disease (T=1), True Control (D=0) and True Disease (D=1).]

◮ In the example: conditionally independent misclassification
◮ Misclassification in the reference test causes skewed observed distributions
◮ Goal: retrieve the accuracy of the true underlying biomarker from the observed data

SLIDE 11

Full data likelihood

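$$L(\mu_0, \mu_1, \Sigma_0, \Sigma_1, \theta, Se, Sp \mid Y, T, D) = \prod_{i=1}^{N} \left[\theta\, Se^{t_i} (1 - Se)^{(1 - t_i)} \frac{1}{(2\pi)^{K/2} |\Sigma_1|^{1/2}} \exp\left(-\tfrac{1}{2} (Y_i - \mu_1)' \Sigma_1^{-1} (Y_i - \mu_1)\right)\right]^{d_i} \times \left[(1 - \theta)(1 - Sp)^{t_i} Sp^{(1 - t_i)} \frac{1}{(2\pi)^{K/2} |\Sigma_0|^{1/2}} \exp\left(-\tfrac{1}{2} (Y_i - \mu_0)' \Sigma_0^{-1} (Y_i - \mu_0)\right)\right]^{(1 - d_i)}$$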
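As a sketch of how this likelihood can be evaluated, the function below computes its logarithm for given parameter values and a given (normally latent) disease indicator. The names `mvn_logpdf` and `complete_data_loglik` are hypothetical, and the K-variate normal density is used with its usual normalizing constant; this is an illustration, not the authors' implementation.

```python
import numpy as np

def mvn_logpdf(y, mu, cov):
    """Row-wise log density of N_K(mu, cov) for an (n, K) array y."""
    k = y.shape[1]
    resid = y - mu
    _, logdet = np.linalg.slogdet(cov)
    quad = np.einsum("ij,ij->i", resid @ np.linalg.inv(cov), resid)
    return -0.5 * (k * np.log(2.0 * np.pi) + logdet + quad)

def complete_data_loglik(y, t, d, mu0, mu1, cov0, cov1, theta, se, sp):
    """Log of the full data likelihood, given Y, T and the disease indicator D."""
    y = np.asarray(y, float)
    t = np.asarray(t)
    d = np.asarray(d)
    # Contribution of a diseased observation (d_i = 1)
    diseased = (np.log(theta) + t * np.log(se) + (1 - t) * np.log(1.0 - se)
                + mvn_logpdf(y, np.asarray(mu1, float), np.asarray(cov1, float)))
    # Contribution of a non-diseased observation (d_i = 0)
    healthy = (np.log(1.0 - theta) + t * np.log(1.0 - sp) + (1 - t) * np.log(sp)
               + mvn_logpdf(y, np.asarray(mu0, float), np.asarray(cov0, float)))
    return float(np.sum(d * diseased + (1 - d) * healthy))
```

In a Bayesian fit the indicators d_i are unknown and are sampled along with the parameters, so a function like this would be evaluated inside the sampler rather than maximized directly.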

SLIDE 12

"Naive" prior definition

Hyperprior

$\theta \sim \mathrm{Uniform}(0.1, 0.9)$

Priors

$D_i \sim \mathrm{Bernoulli}(\theta)$ (observation $i = 1, \ldots, N$)

$\mu_{kj} \sim N(0, 10^6)$ (disease indicator $j = 0, 1$; biomarker $k = 1, \ldots, K$)

$\Sigma_j^{-1} \sim \mathrm{Wish}(S, K)$ (disease indicator $j = 0, 1$), with $S$ the variance-covariance matrix of the observed control group

$Se = Sp \sim \mathrm{Beta}(1, 1)\,T(0.51, \infty)$ [non-informative]
OR
$Se = Sp \sim \mathrm{Beta}(10, 1.764706)\,T(0.51, \infty)$ [informative]

SLIDE 13

Se/Sp Beta(10, 1.764706) prior

Mean = 0.85
Var = 0.009988479
Equal-tail 95% probability interval: 0.6078 to 0.9834

[Figure: informative Se/Sp prior. Density against Se or Sp over the range 0.5 to 1.0.]
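The mean and variance quoted above follow from the standard Beta moment formulas. A short check, ignoring the T(0.51, ∞) truncation (an assumption made here for simplicity; `beta_mean_var` is a hypothetical helper name):

```python
def beta_mean_var(a, b):
    """Mean and variance of an untruncated Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, var

mean, var = beta_mean_var(10.0, 1.764706)
print(mean, var)
```

The second shape parameter 1.764706 is approximately 30/17, which is the value that makes the mean equal to 0.85 when the first shape parameter is 10.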

SLIDE 14

Implied priors

Variances and correlations

[Figure: simulated inverse-Wishart draws (scale matrix = S, df = 3). Density panels for the covariance matrix components Sigma11, Sigma22 and Sigma33, and for the correlation components Cor12, Cor13 and Cor23.]

SLIDE 15

Implied priors

AUC

[Figure: implied AUC prior. Density against AUC over the range 0 to 1.]

◮ This prior specification is commonly used (e.g. O'Malley and Zou (2006))
◮ Uninformative priors on the mixture components lead to a prior for the AUC that is close to a point mass at 1
◮ Extremely informative prior for the component of interest!
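The concentration of the implied AUC prior near 1 can be reproduced with a rough Monte Carlo sketch. Several choices below are assumptions: K = 3, the second argument of N(0, 10^6) is read as a variance, the Wishart scale is set to the identity matrix as a stand-in for S, and the Wishart draw is built from K Gaussian vectors. `draw_naive_auc` is a hypothetical helper name.

```python
import math
import numpy as np

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def draw_naive_auc(k, scale, rng):
    """One draw of the optimal-combination AUC implied by the "naive" priors."""
    # mu1 - mu0, each mean component drawn with variance 10^6 (sd 1000)
    diff = rng.normal(0.0, 1000.0, k) - rng.normal(0.0, 1000.0, k)
    cov_sum = np.zeros((k, k))
    for _ in range(2):                                   # j = 0, 1
        x = rng.multivariate_normal(np.zeros(k), scale, size=k)
        precision = x.T @ x                              # Wishart draw, k degrees of freedom
        cov_sum += np.linalg.inv(precision)              # Sigma_j
    quad = diff @ np.linalg.solve(cov_sum, diff)
    return phi(math.sqrt(max(quad, 0.0)))

rng = np.random.default_rng(1)
aucs = np.array([draw_naive_auc(3, np.eye(3), rng) for _ in range(2000)])
share_near_one = float(np.mean(aucs > 0.99))
print(share_near_one)
```

Because the mean differences are on the order of thousands while the covariance components are typically far smaller, almost every draw has an AUC indistinguishable from 1.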

SLIDE 16

Controlled prior definition (Σ)

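Set $\Sigma_j = V_j R_j V_j$ (Wei and Higgins, 2013), where $V_j$ is the diagonal matrix of the standard deviations $\sigma_{k,j}$ and $R_j$ is a correlation matrix [$j = 0, 1$; $k = 1, \ldots, K$]. Let $C_j$ be the Cholesky factor of $R_j$.

$\sigma_{k,j} \sim \mathrm{Uniform}(0, 1000)$

Say $K = 3$, then:

$C_{j,12} = \rho_{j,12} \sim \mathrm{Uniform}(-1, 1)$
$C_{j,13} = \rho_{j,13} \sim \mathrm{Uniform}(-1, 1)$
$C_{j,23} \sim \mathrm{Uniform}\left(-\sqrt{1 - \rho_{j,13}^2},\; \sqrt{1 - \rho_{j,13}^2}\right)$
$\rho_{j,23} = \rho_{j,12}\,\rho_{j,13} + C_{j,22}\,C_{j,23}$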
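The construction for K = 3 can be sketched as follows, assuming the Cholesky factor is taken as upper-triangular with R = C'C and unit first diagonal entry (the slide does not state the convention). `draw_correlation_3x3` is a hypothetical helper name. The remaining diagonal entries of C are fixed by requiring a unit diagonal in R.

```python
import numpy as np

def draw_correlation_3x3(rng):
    """Draw a 3x3 correlation matrix R = C'C from the constrained uniform priors."""
    rho12 = rng.uniform(-1.0, 1.0)                 # C_12 = rho_12
    rho13 = rng.uniform(-1.0, 1.0)                 # C_13 = rho_13
    bound = np.sqrt(1.0 - rho13 ** 2)
    c23 = rng.uniform(-bound, bound)               # keeps C_33 real
    c22 = np.sqrt(1.0 - rho12 ** 2)
    c33 = np.sqrt(max(1.0 - rho13 ** 2 - c23 ** 2, 0.0))
    c = np.array([[1.0, rho12, rho13],
                  [0.0, c22, c23],
                  [0.0, 0.0, c33]])                # upper-triangular factor
    return c.T @ c, c

rng = np.random.default_rng(2)
r, c = draw_correlation_3x3(rng)
sd = rng.uniform(0.0, 1000.0, size=3)              # sigma_k ~ Uniform(0, 1000)
sigma = np.diag(sd) @ r @ np.diag(sd)              # Sigma = V R V
print(np.round(r, 3))
```

The bound on C_23 is what guarantees a valid correlation matrix: with it, the third column of C has norm 1 and R is positive semi-definite by construction.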

SLIDE 17

Controlled prior definition (AUC)

Set $\Delta = L(\mu_1 - \mu_0)$, where $L$ is the Cholesky factor of $(\Sigma_0 + \Sigma_1)^{-1}$.

$\Delta \sim N_K(\kappa, \Psi)$
$\mu_{0k} \sim N(0, 10^6)$ ($k = 1, \ldots, K$)
$\mu_1 = L^{-1}\Delta + \mu_0$
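A sketch of why a prior on ∆ controls the AUC prior: if L is taken as the upper-triangular factor with L'L = (Σ0 + Σ1)^{-1} (an assumption about the convention), then ∆'∆ equals the quadratic form inside Φ, so the AUC of the optimal combination is Φ(‖∆‖). The covariance matrices and the choice of Ψ (standard deviation 0.7, correlation 0.6) below are illustrative assumptions.

```python
import math
import numpy as np

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

rng = np.random.default_rng(3)
cov0 = np.array([[1.0, 0.3, 0.1], [0.3, 1.0, 0.2], [0.1, 0.2, 1.0]])  # illustrative
cov1 = np.array([[2.0, 0.5, 0.0], [0.5, 1.5, 0.4], [0.0, 0.4, 1.0]])  # illustrative
mu0 = np.zeros(3)

precision_sum = np.linalg.inv(cov0 + cov1)
upper = np.linalg.cholesky(precision_sum).T        # L with L'L = (Sigma0 + Sigma1)^{-1}

kappa = np.zeros(3)
psi = 0.49 * (0.6 * np.ones((3, 3)) + 0.4 * np.eye(3))   # sd 0.7, correlation 0.6
delta = rng.multivariate_normal(kappa, psi)        # Delta ~ N_K(kappa, Psi)
mu1 = np.linalg.solve(upper, delta) + mu0          # mu1 = L^{-1} Delta + mu0

auc_from_delta = phi(float(np.linalg.norm(delta)))
diff = mu1 - mu0
auc_closed_form = phi(math.sqrt(diff @ np.linalg.solve(cov0 + cov1, diff)))
print(auc_from_delta, auc_closed_form)
```

Since ‖∆‖ is non-negative, the implied AUC lies between 0.5 and 1, and its spread is governed directly by κ and Ψ rather than by the vague priors on the means.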

SLIDE 18

Implied priors

Variances and correlations

[Figure: histograms of the implied priors. Variance panels for components 11, 22 and 33 (range 0 to 1e+06), and correlation panels for Corr12, Corr13 and Corr23 (range −1 to 1).]

SLIDE 19

Implied priors

AUC

For $\kappa = (0, 0, 0)'$, $\sigma_i = 0.7$ and $\rho_{ij} = 0.6$ [$i, j = 1, \ldots, K$]

[Figure: implied AUC prior for Kappa = (0.0, 0.0, 0.0), Sd = (0.7, 0.7, 0.7), Cor = (0.6, 0.6, 0.6). Density of the simulated AUC and its expansion approximation over the AUC range 0.5 to 1.0.]

◮ Less informative prior distribution for the AUC
◮ The prior on ∆ gives control over the informativeness of the AUC prior

SLIDE 21

400 datasets for 3 independent biomarkers

N = 100, 400 or 600
θ = 0.5
Se = Sp = 0.85

Mixture component parameters set such that:

◮ AUC of biomarker 1 = 0.75
◮ AUC of biomarker 2 = 0.75
◮ AUC of biomarker 3 = 0.75
◮ AUC of the optimal combination = 0.88
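The combined AUC follows from the single-biomarker AUCs when the biomarkers are independent: Σ0 + Σ1 is then diagonal, so the quadratic form inside Φ is the sum of the three squared standardized separations. A quick check, assuming each biomarker has an AUC of exactly 0.75:

```python
import math
from statistics import NormalDist

std_normal = NormalDist()
z_single = std_normal.inv_cdf(0.75)        # standardized separation of one biomarker
z_combined = math.sqrt(3.0) * z_single     # three equal, independent contributions
auc_combined = std_normal.cdf(z_combined)
print(round(auc_combined, 3))
```

This gives a value of about 0.879, in line with the 0.88 stated for the optimal combination.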

SLIDE 24

AUC Results (Average of median posterior AUC)

True AUC = 0.8786

                                  Sample size
Prior formulation   Se/Sp prior   N=100             N=400             N=600
GS                  /             0.7710 (0.0361)   0.7661 (0.0210)   0.7614 (0.0157)
Naive               Non-Inf       0.9241 (0.0279)   0.8890 (0.0279)   0.8836 (0.0262)
Naive               Inf           0.9068 (0.0344)   0.8827 (0.0286)   0.8785 (0.0263)
Controlled          Non-Inf       0.8907 (0.0347)   0.8803 (0.0290)   0.8773 (0.0271)
Controlled          Inf           0.8728 (0.0388)   0.8741 (0.0292)   0.8722 (0.0269)

◮ Gold Standard model fit leads to severe underestimation
◮ Naive AUC prior specification causes slight overestimation
    ◮ Increased sample size reduces overestimation and decreases standard errors
    ◮ Informative Se/Sp prior also reduces this bias, but seems to increase standard errors
◮ Controlled AUC prior reduces overestimation compared to the Naive-prior case
    ◮ Increased sample size decreases standard errors
    ◮ Informative Se/Sp prior has no substantial effect

SLIDE 26

Conclusions

◮ Bayesian latent-class mixture model:
    ◮ Takes the unknown true disease status into account
    ◮ Incorporates information from the reference test while acknowledging its imperfection
    ◮ Provides estimates of the accuracy of the reference test
◮ Simulation study
    ◮ Model is able to retrieve the true AUC
◮ Careful prior specification
    ◮ Complex function of uninformative prior distributions => informative prior => biased estimates
    ◮ Controlled prior specification is proposed

SLIDE 27

Further considerations

◮ Sensitivity to misspecified Se/Sp prior distribution
◮ Extend to incorporate non-normally distributed biomarkers
◮ Evaluate impact of conditional independence assumption

SLIDE 28

References

◮ O'Malley, A.J., Zou, K.H.: Bayesian multivariate hierarchical transformation models for ROC analysis. Statistics in Medicine 25, 459–479 (2006)
◮ Su, J.Q., Liu, J.S.: Linear combinations of multiple diagnostic markers. Journal of the American Statistical Association 88, 1350–1355 (1993)
◮ Wei, Y., Higgins, J.P.T.: Bayesian multivariate meta-analysis with multiple outcomes. Statistics in Medicine (2013). doi: 10.1002/sim.5745

SLIDE 29

Thank you for your attention!