
The Analysis of Placement Values for Evaluating Discriminatory Measures



  1. The Analysis of Placement Values for Evaluating Discriminatory Measures. Margaret Sullivan Pepe & Tianxi Cai, Biometrics (2004). Allison Meisner, May 27, 2014

  2. Overview
     When we have a continuous test $Y$ and a binary outcome $D$, the ROC curve plots the (FPR, TPR) pairs for each possible cutoff of the test.
     Problem: The ROC curve may differ by patient characteristics. Identifying such variability helps us apply the test optimally.
     Solution: ROC regression with placement values.
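As a minimal sketch of what "plotting (FPR, TPR) pairs for each cutoff" means, the following computes an empirical ROC curve from two small samples. The function name `empirical_roc` and the marker values are made up for illustration, and the rule "test positive when $Y \ge c$" is an assumption about the direction of the test.

```python
def empirical_roc(controls, cases):
    """Return (FPR, TPR) pairs, one per candidate cutoff c, calling Y >= c positive."""
    cutoffs = sorted(set(controls) | set(cases), reverse=True)
    points = [(0.0, 0.0)]  # a cutoff above every observation flags nobody
    for c in cutoffs:
        fpr = sum(y >= c for y in controls) / len(controls)
        tpr = sum(y >= c for y in cases) / len(cases)
        points.append((fpr, tpr))
    return points

controls = [0.2, 0.5, 0.9, 1.1]  # hypothetical marker values, D = 0
cases = [0.8, 1.0, 1.5, 2.0]     # hypothetical marker values, D = 1
roc_points = empirical_roc(controls, cases)
```

Lowering the cutoff can only add positives, so both coordinates are non-decreasing along the returned list.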

  3. Motivating Example
     Prostate-specific antigen (PSA) is a popular, though controversial, way to screen men for prostate cancer (PCa). The biology of PSA and PCa has implications for the usefulness of PSA as a screening tool:
     ◮ PSA levels differ by age: older men typically have higher PSA, regardless of PCa status
     ◮ Age can potentially affect the ability of PSA to discriminate PCa cases
     ◮ Among PCa cases, PSA measured closer to diagnosis does a better job of discriminating PCa

  4. Background: FPR, TPR, ROC

  5. Background: FPR, TPR, ROC

  6. Background: FPR, TPR, ROC

  7. Background: FPR, TPR, ROC

  8. Background: Effect of Covariates on ROC

  9. Background: Effect of Covariates on ROC

  10. Background: Effect of Covariates on ROC

  11. Background: Effect of Covariates on ROC

  12. Background: Effect of Covariates on ROC

  13. Background: Effect of Covariates on ROC

  14. Background: Effect of Covariates on ROC

  15. Background: Effect of Covariates on ROC
     Recall, $\mathrm{ROC}(u)$ = (TPR at FPR = $u$).

  16. ROC Model
     ◮ ROC model (Pepe, 1997): $\mathrm{ROC}_{Z_D}(u) = g(\beta^T Z_D + H_\alpha(u))$
     ◮ $\alpha$ = underlying shape of the ROC curve
     ◮ $\beta$ = impact of $Z_D$ on the shape of the ROC curve
     ◮ Problem: estimation
     ◮ Pepe (2000) and Alonzo and Pepe (2002) create indicators $I(Y_{Di} \ge F_{\bar D}^{-1}(1-u))$ for some set of FPRs $u$ and then use binary regression techniques
     ◮ Pepe & Cai propose using placement values and what is known about their distribution to estimate the parameters more efficiently

  17. Placement Values
     ◮ Definitions
       ◮ Placement values: $U_{Di} = 1 - F_{\bar D}(Y_{Di})$ for the $i$th diseased subject. In words, the placement value for the $i$th diseased subject is the proportion of the reference (non-diseased) population with marker values above $Y_{Di}$.
       ◮ If $Z_{\bar D}$ affects the distribution of $Y$ in the reference population, $U_{Di} = 1 - F_{\bar D, Z_{\bar D}}(Y_{Di})$.
       ◮ ROC curve: $\mathrm{ROC}(u) = P(Y_D \ge F_{\bar D}^{-1}(1-u))$ = (TPR at FPR = $u$)
     ◮ Relationship between ROC and placement values
       $$\mathrm{ROC}(u) = P(Y_D \ge F_{\bar D}^{-1}(1-u)) = P(1-u \le F_{\bar D}(Y_D)) = P(1 - F_{\bar D}(Y_D) \le u) = P(U_D \le u)$$
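The identity $\mathrm{ROC}(u) = P(U_D \le u)$ can be checked on a toy sample. This is a sketch only: the function names and data are hypothetical, and $F_{\bar D}$ is replaced by the empirical distribution of the controls.

```python
def placement_values(cases, controls):
    """Empirical U_Di: share of controls with a marker value strictly above Y_Di."""
    n = len(controls)
    return [sum(y0 > y1 for y0 in controls) / n for y1 in cases]

def roc_from_placement(pv, u):
    """Empirical ROC(u) = P(U_D <= u), i.e. the CDF of the placement values."""
    return sum(p <= u for p in pv) / len(pv)

controls = [0.2, 0.5, 0.9, 1.1]  # hypothetical non-diseased marker values
cases = [0.8, 1.0, 1.5, 2.0]     # hypothetical diseased marker values
pv = placement_values(cases, controls)
tpr_at_fpr_25 = roc_from_placement(pv, 0.25)
```

Here the cutoff $c = 1.0$ gives an FPR of 0.25 (one control in four is at or above it) and a TPR of 0.75, which is the value `roc_from_placement` returns at $u = 0.25$.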

  18. Placement Values

  19. Proposed Method
     ◮ ROC model (Pepe, 1997): $\mathrm{ROC}_{Z_D}(u) = g(\beta^T Z_D + H_\alpha(u))$
     ◮ Proposed model: $H_\alpha(U_D) = -\beta^T Z_D + \epsilon$, where $\epsilon \sim g$
     ◮ Proof of equivalence (the first step uses that $H_\alpha$ is increasing):
       $$\Pr(U_D \le u) = \Pr(H_\alpha(U_D) \le H_\alpha(u)) = \Pr(-\beta^T Z_D + \epsilon \le H_\alpha(u)) = \Pr(\epsilon \le \beta^T Z_D + H_\alpha(u)) = g(\beta^T Z_D + H_\alpha(u)) = \mathrm{ROC}_{Z_D}(u)$$
     Recall that if $Z_{\bar D}$ affects the distribution of $Y$ in the reference population, $U_{Di} = 1 - F_{\bar D, Z_{\bar D}}(Y_{Di})$; then we may write
       $$H_\alpha(U_D) = -\beta^T Z_D + \epsilon \iff \mathrm{ROC}_{Z_D, Z_{\bar D}}(u) = g(\beta^T Z_D + H_\alpha(u))$$
     ◮ In our example, $Z_{\bar D}$ = age and $Z_D$ = (age, time).

  20. Proposed Method: Algorithm
     Since $\Pr(U_D \le u) = g(\beta^T Z_D + H_\alpha(u))$, the density of $U_D$ is
     $$f(u) = \frac{\partial g(\beta^T Z_D + H_\alpha(u))}{\partial u}.$$
     Then, for $[a, b] \subset (0, 1)$, the log likelihood is
     $$\ell(\theta) = \sum_{i=1}^{n_D} \Big[ I(U_{Di} < a)\log\{g(\beta^T Z_{Di} + H_\alpha(a))\} + I(U_{Di} > b)\log\{1 - g(\beta^T Z_{Di} + H_\alpha(b))\} + I(U_{Di} \in (a, b))\log f(U_{Di}) \Big],$$
     where $\theta = (\alpha, \beta)$.
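A minimal sketch of this log likelihood, assuming the binormal form $g = \Phi$ and $H_\alpha(u) = \alpha_0 + \alpha_1 \Phi^{-1}(u)$ that appears later in the simulations. Under that assumption the density is $f(u) = \phi(\beta^T z + \alpha_0 + \alpha_1 \Phi^{-1}(u))\,\alpha_1 / \phi(\Phi^{-1}(u))$. The function name `log_lik` and the parameter layout are illustrative, not the authors' code.

```python
from math import log
from statistics import NormalDist

STD = NormalDist()  # standard normal: cdf = Phi, inv_cdf = Phi^{-1}, pdf = phi

def log_lik(theta, pv, covariates, a=0.01, b=0.99):
    """Censored log likelihood; theta = (alpha0, alpha1, beta_1, ..., beta_p)."""
    alpha0, alpha1, *beta = theta
    total = 0.0
    for u, z in zip(pv, covariates):
        lin = sum(bk * zk for bk, zk in zip(beta, z))  # beta^T z

        def roc(t):
            return STD.cdf(lin + alpha0 + alpha1 * STD.inv_cdf(t))

        if u < a:                       # left-censored: contributes log ROC(a)
            total += log(roc(a))
        elif u > b:                     # right-censored: contributes log{1 - ROC(b)}
            total += log(1.0 - roc(b))
        else:                           # inside the interval (endpoints included here)
            q = STD.inv_cdf(u)
            total += log(STD.pdf(lin + alpha0 + alpha1 * q) * alpha1 / STD.pdf(q))
    return total

flat = log_lik([0.0, 1.0, 0.0], [0.3, 0.5], [[1.0], [0.0]])
```

With $\alpha_0 = 0$, $\alpha_1 = 1$, $\beta = 0$ the ROC curve is the diagonal, the placement values are uniform, and every uncensored term is $\log 1 = 0$. In practice this function would be handed to a numerical optimizer with the estimated placement values in place of `pv`.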

  21. Proposed Method: Algorithm
     Estimating $F_{\bar D, Z_{\bar D}}$
     ◮ Pepe and Cai advise estimating $F_{\bar D, Z_{\bar D}}$ nonparametrically if $Z_{\bar D}$ is discrete and semiparametrically otherwise.
     ◮ For semiparametric estimation, Pepe and Cai recommend the semiparametric regression quantile estimation procedure developed by Heagerty and Pepe (1999).
     The estimates of the placement values, $\hat U_{Di}$, are substituted into $\ell(\theta)$, yielding a pseudo-log-likelihood, which is maximized to estimate $\theta$.

  22. Competing Method: Algorithm
     Alonzo and Pepe proposed an algorithm for fitting ROC regression based on binary regression methods.
     1. For $[a, b] \subset (0, 1)$, let $T = \{u_1, \ldots, u_{n_T}\} = \{1 - j/n_{\bar D};\ j = 1, \ldots, n_{\bar D} - 1\} \cap [a, b]$ (the maximal set).
     2. Then for each diseased subject $i$, the $n_T$ binary variables $B_{ui}$ are calculated: $B_{ui} = I[\hat U_{Di} \le u]$, $u \in T$.
     3. The binary generalized linear regression model $E\{B_{ui}\} = g\{\beta^T Z_D + H_\alpha(u)\}$ is fit using standard techniques.
     The Pepe and Cai method is claimed to be more efficient than that of Alonzo and Pepe.
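Steps 1 and 2 of the binary regression approach can be sketched as follows. The helper names are hypothetical, and the FPR grid is built from the number of non-diseased subjects, which is an assumption about how the maximal set is indexed. Step 3 (fitting the binary GLM) is left to any standard routine.

```python
def maximal_fpr_set(n_controls, a, b):
    """T = {1 - j/n_controls : j = 1, ..., n_controls - 1} intersected with [a, b]."""
    grid = [1 - j / n_controls for j in range(1, n_controls)]
    return sorted(u for u in grid if a <= u <= b)

def binary_records(pv_hat, covariates, fpr_set):
    """One row (B_ui, u, z_i) per diseased subject i and FPR u, B_ui = I[U_hat_Di <= u]."""
    return [(int(p <= u), u, z)
            for p, z in zip(pv_hat, covariates)
            for u in fpr_set]

fpr_set = maximal_fpr_set(4, 0.01, 0.5)
rows = binary_records([0.5, 0.0], [(60,), (70,)], fpr_set)  # made-up values
```

Each diseased subject contributes $n_T$ correlated rows, which is why standard errors from a naive GLM fit on these rows would need adjustment (for example by bootstrapping subjects).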

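  23. Simulations
     Set-up
     ◮ $Y_D = \alpha_1^{-1}\{\alpha_0 + \beta_1 Z_1 + (\beta_2 + 0.5\alpha_1) Z_2 + \epsilon_D\}$, $\quad Y_{\bar D} = 0.5 Z_2 + \epsilon_{\bar D}$
     ◮ $Z_1 \sim \mathrm{Bernoulli}(0.5)$, $Z_2 \sim \mathrm{Uniform}(0, 1)$
     ◮ $\epsilon_D \sim N(0, 1)$, $\epsilon_{\bar D} \sim N(0, 1)$
     Induced ROC curve:
     $$\begin{aligned}
     \mathrm{ROC}_{Z_D, Z_{\bar D}}(u) &= \Pr(U_D \le u) = \Pr(1 - F_{\bar D}(Y_D) \le u) \\
     &= \Pr\big(F_{\bar D}^{-1}(1-u) \le \alpha_1^{-1}\{\alpha_0 + \beta_1 z_1 + (\beta_2 + 0.5\alpha_1) z_2 + \epsilon_D\}\big) \\
     &= \Pr\big(\Phi^{-1}(1-u) + 0.5 z_2 \le \alpha_1^{-1}\{\alpha_0 + \beta_1 z_1 + (\beta_2 + 0.5\alpha_1) z_2 + \epsilon_D\}\big) \\
     &= \Pr\big(\epsilon_D \le -\alpha_1 \Phi^{-1}(1-u) + \alpha_0 + \beta_1 z_1 + \beta_2 z_2\big) \quad \text{(by symmetry of } \epsilon_D \text{ about 0)} \\
     &= \Phi\big(\alpha_1 \Phi^{-1}(u) + \alpha_0 + \beta_1 z_1 + \beta_2 z_2\big) \\
     &= g(\beta^T Z_D + H_\alpha(u))
     \end{aligned}$$
     Recall, $\alpha$ = shape of ROC, $\beta$ = effects of $Z_D$ on ROC.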
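The induced ROC curve can be sanity-checked by simulation. The sketch below fixes one covariate pattern ($z_1 = 1$, $z_2 = 0.5$, chosen arbitrarily), draws diseased outcomes from the set-up above, computes their true placement values under $Y_{\bar D} = 0.5 z_2 + N(0,1)$, and compares the share at or below $u = 0.2$ with the closed form. Function names, the seed, and the sample size are illustrative choices.

```python
import random
from statistics import NormalDist

STD = NormalDist()

def induced_roc(u, z1, z2, a0=1.0, a1=1.0, b1=0.5, b2=0.7):
    """Closed form: Phi(a1 * Phi^{-1}(u) + a0 + b1*z1 + b2*z2)."""
    return STD.cdf(a1 * STD.inv_cdf(u) + a0 + b1 * z1 + b2 * z2)

def simulate_case_placements(n, z1, z2, a0=1.0, a1=1.0, b1=0.5, b2=0.7, seed=0):
    """Draw Y_D for fixed (z1, z2) and return the true placement values."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        y = (a0 + b1 * z1 + (b2 + 0.5 * a1) * z2 + rng.gauss(0.0, 1.0)) / a1
        out.append(1.0 - STD.cdf(y - 0.5 * z2))  # 1 - F_Dbar(y) given z2
    return out

pv = simulate_case_placements(20000, z1=1, z2=0.5)
empirical = sum(p <= 0.2 for p in pv) / len(pv)
closed_form = induced_roc(0.2, z1=1, z2=0.5)
```

The two quantities should agree up to Monte Carlo error, which is on the order of 0.003 at this sample size.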

  24. Simulations
     Note that here $Z_{\bar D} = Z_2$ and $Z_D = (Z_1, Z_2)$.
     Despite their recommendations, Pepe and Cai did not use the semiparametric method of Heagerty and Pepe to estimate placement values. Instead, Pepe and Cai regress $Y$ on $Z_2$ among the non-diseased subjects:
     $$E(Y_{\bar D} \mid Z_2 = z_2) = \gamma_0 + \gamma_1 z_2 \;\Rightarrow\; \hat\epsilon_{\bar D i} = Y_{\bar D i} - \hat\gamma_0 - \hat\gamma_1 z_{2 \bar D i}.$$
     Then the placement value for subject $i$ was estimated to be
     $$\hat U_{Di} = \frac{1}{n_{\bar D}} \sum_{j=1}^{n_{\bar D}} I\big(\hat\epsilon_{\bar D j} > Y_{Di} - \hat\gamma_0 - \hat\gamma_1 z_{2 Di}\big).$$
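A minimal sketch of this two-step estimator: fit a least-squares line to the non-diseased subjects, then rank each diseased subject's standardized value among the control residuals. The helper names and the tiny data set are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = g0 + g1 * x; returns (g0, g1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    g1 = sxy / sxx
    return my - g1 * mx, g1

def adjusted_placement_values(y_case, z_case, y_ctrl, z_ctrl):
    """U_hat_Di = share of control residuals strictly above the case's residual."""
    g0, g1 = fit_line(z_ctrl, y_ctrl)
    resid = [y - g0 - g1 * z for y, z in zip(y_ctrl, z_ctrl)]
    return [sum(r > y - g0 - g1 * z for r in resid) / len(resid)
            for y, z in zip(y_case, z_case)]

z_ctrl = [0.0, 1.0, 0.0, 1.0]   # made-up covariate values for controls
y_ctrl = [0.0, 1.0, 1.0, 2.0]   # made-up marker values for controls
pv_hat = adjusted_placement_values([1.5, 3.0, -1.0], [1.0, 0.0, 1.0], y_ctrl, z_ctrl)
```

For these data the fitted line is $\hat\gamma_0 = 0.5$, $\hat\gamma_1 = 1$, and the control residuals are $\pm 0.5$, so a case with residual 0 has half of the controls above it.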

  25. Simulations
     Two sets of simulations (1000 simulations each):
     1. Pepe and Cai method only
       ◮ Bias
       ◮ Empirical SE
       ◮ Mean estimated SE
       ◮ Empirical coverage probability
       ◮ Note: $\alpha_0 = 1$, $\alpha_1 = 1$, $\beta_1 = 0.5$, $\beta_2 = 0.7$ throughout
       ◮ Considered $[a, b] = [0.01, 0.99]$ and $[a, b] = [0.01, 0.20]$
     2. Pepe and Cai vs. Alonzo and Pepe
       ◮ Bias
       ◮ MSE
       ◮ Two sets of parameter values considered
         ◮ $\alpha_0 = 1$, $\alpha_1 = 1$, $\beta_1 = 0.5$, $\beta_2 = 0.7$
         ◮ $\alpha_0 = 1.5$, $\alpha_1 = 0.9$, $\beta_1 = 0.5$, $\beta_2 = 0.7$
       ◮ Considered $[a, b] = [0.01, 0.99]$ and $[a, b] = [0.01, 0.50]$

  26. Simulations: Pepe & Cai
     ◮ $[a, b] = [0.01, 0.99]$

  27. Simulations: Pepe & Cai vs. Alonzo & Pepe
     ◮ $\alpha_0 = 1$, $\alpha_1 = 1$, $\beta_1 = 0.5$, $\beta_2 = 0.7$
     ◮ $[a, b] = [0.01, 0.99]$

  28. Application
     The proposed method was applied to data from a study on PSA and PCa screening.
     ◮ 88 PCa cases, 88 age-matched controls
     ◮ Recall, $Z_{\bar D}$ = age and $Z_D$ = (age, time)
     ◮ Model: $\mathrm{ROC}_{Z_D, Z_{\bar D}}(u) = \Phi(\alpha_0 + \alpha_1 \Phi^{-1}(u) + \beta_1\,\text{time} + \beta_2\,\text{age})$
     ◮ SE estimates from the bootstrap (500 replications)
     Estimate (SE):
     ◮ $\alpha_0$: 4.30 (0.93)
     ◮ $\alpha_1$: 0.84 (0.09)
     ◮ $\beta_1$: -0.16 (0.03)
     ◮ $\beta_2$: -0.04 (0.01)

  29. Conclusions
     ◮ The proposed method has nice intuition behind it and makes full use of the data through placement values, as opposed to creating indicators.
     ◮ Implementation of the proposed method is less straightforward and is not particularly computationally efficient.
     ◮ In most scenarios, the proposed method is more statistically efficient than the binary regression technique.
     ◮ Both methods are susceptible to misspecification in both the estimation of $F_{\bar D}$ and the form of the ROC model.

  30. Effects of Misspecification
     What happens when $Y_{\bar D} = 0.5 Z_2^2 + N(0, (Z_2 + 0.5)^2)$ but we still assume $Y_{\bar D} = 0.5 Z_2 + N(0, 1)$? This will impact
     1. estimates of placement values
     2. form of the induced ROC curve (used in the likelihood calculation)
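To see the first effect concretely, the sketch below compares the population placement value of a single diseased observation under the true and the assumed reference models. It reads the second argument of $N(\cdot, \cdot)$ as a variance, so the true standard deviation is taken to be $Z_2 + 0.5$; that reading, the function names, and the example point ($y = 2$, $z_2 = 1$) are assumptions.

```python
from statistics import NormalDist

STD = NormalDist()

def true_pv(y, z2):
    """1 - F_Dbar(y) when Y_Dbar = 0.5*z2^2 + N(0, (z2 + 0.5)^2)."""
    return 1.0 - STD.cdf((y - 0.5 * z2 ** 2) / (z2 + 0.5))

def assumed_pv(y, z2):
    """1 - F_Dbar(y) under the working model Y_Dbar = 0.5*z2 + N(0, 1)."""
    return 1.0 - STD.cdf(y - 0.5 * z2)

gap = true_pv(2.0, 1.0) - assumed_pv(2.0, 1.0)
```

At $z_2 = 1$ both models have the same mean (0.5), but the true spread is 1.5 rather than 1, so the working model places the observation further into the upper tail of the reference distribution than it really is.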

  31. Effects of Misspecification
     ◮ $\alpha_0 = 1$, $\alpha_1 = 1$, $\beta_1 = 0.5$, $\beta_2 = 0.7$

  32. Effects of Misspecification
     ◮ $\alpha_0 = 1.5$, $\alpha_1 = 0.9$, $\beta_1 = 0.5$, $\beta_2 = 0.7$

  33. Conclusions
     ◮ The proposed method has nice intuition behind it and makes full use of the data through placement values, as opposed to creating indicators.
     ◮ Implementation of the proposed method is less straightforward and is not particularly computationally efficient.
     ◮ In most scenarios, the proposed method is more statistically efficient than the binary regression technique.
     ◮ Both methods are susceptible to misspecification in both the estimation of $F_{\bar D}$ and the form of the ROC model.

  34. Simulations: Pepe & Cai
     ◮ $[a, b] = [0.01, 0.20]$

  35. Simulations: Pepe & Cai vs. Alonzo & Pepe
     ◮ $\alpha_0 = 1$, $\alpha_1 = 1$, $\beta_1 = 0.5$, $\beta_2 = 0.7$
     ◮ $[a, b] = [0.01, 0.50]$

  36. Simulations: Pepe & Cai vs. Alonzo & Pepe
     ◮ $\alpha_0 = 1.5$, $\alpha_1 = 0.9$, $\beta_1 = 0.5$, $\beta_2 = 0.7$
     ◮ $[a, b] = [0.01, 0.99]$

  37. Simulations: Pepe & Cai vs. Alonzo & Pepe
     ◮ $\alpha_0 = 1.5$, $\alpha_1 = 0.9$, $\beta_1 = 0.5$, $\beta_2 = 0.7$
     ◮ $[a, b] = [0.01, 0.50]$
