
Part I: Variable Selection with Interval-Censored Responses



  1. Variable Selection and the Assessment of Predictive Accuracy with Interval-Censored Responses
     Richard Cook, Statistics and Actuarial Science, University of Waterloo
     Statistical Issues in Biomarker and Drug Co-Development, Toronto, Ontario, November 8, 2014
     Joint work with Ying Wu and Ker-Ai Lee

  2. Part I: Variable Selection with Interval-Censored Responses. Outline

  3. Prognostic Human Leukocyte Antigens in Psoriatic Arthritis
     • The University of Toronto Psoriatic Arthritis Clinic is a tertiary referral clinic comprising 1300 patients with extensive longitudinal follow-up on disease progression and collection of genetic and serum samples.
     • Patients with psoriatic arthritis are classified as suffering from arthritis mutilans if they have 5 or more damaged joints.
     • Patients are scheduled to be radiologically assessed every two years.
     • The time to development of arthritis mutilans is not observed exactly because it is subject to interval-censoring.
     Immediate Goal: Interest lies in identifying HLA markers that predict onset of arthritis mutilans.

  4. Joint Damage and Marker Values in Continuous Time
     [Figure: total number of damaged joints (left axis) and the ESR marker of inflammation (right axis) plotted against time since onset of psoriatic arthritis; HLA markers are recorded at clinic entry.]

  5. Joint Damage and Marker Values in Continuous Time
     [Figure: as on the previous slide, with the time T of onset of arthritis mutilans marked on the time axis after clinic entry.]

  6. Available Data Due to Intermittent Assessments
     [Figure: the same trajectories, observed only at the follow-up assessment times $s_1, \ldots, s_6$ (marked X) after clinic entry; T is the time of onset of arthritis mutilans, and the horizontal axis is time since onset of psoriatic arthritis.]

  7. Data for Response Model
     [Timeline: starting from PsA onset, the censoring interval has endpoints L and R; HLA data (X) are available.]
     Data for Assessment Process
     • $Z(s_j)$ denotes the marker of inflammation
     • $w_j = s_j - s_{j-1}$, $j = 1, 2, \ldots$, are waiting times
     [Timeline: assessment times $s_1, \ldots, s_6$ after PsA onset, with marker values $Z(s_1), \ldots, Z(s_6)$ and HLA data (X).]

  8. Semi-Parametric Estimates of Waiting Time Distributions
     [Figure: cumulative probability against time in years (0 to 40), one curve for each waiting time: diagnosis to 1st X-ray, 1st to 2nd X-ray, and so on up to 9th to 10th X-ray.]

  9. Estimate [1] of Distribution of Time to Arthritis Mutilans
     [Figure: Turnbull estimate of the cumulative probability of arthritis mutilans, with a pointwise 95% confidence band, against years since diagnosis of PsA (0 to 40).]
     [1] Turnbull BW (1976). The empirical distribution function with arbitrarily grouped, censored and truncated data. Journal of the Royal Statistical Society. Series B (Methodological), 38, 290–295.
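     The estimate on this slide can be illustrated with a self-consistency (EM) iteration of the kind Turnbull (1976) describes. The following is a minimal sketch, not the authors' code: the function name `turnbull_masses`, the choice of candidate cells between consecutive sorted endpoints, and the fixed iteration count are all assumptions made for illustration.

```python
def turnbull_masses(intervals, n_iter=500):
    """Self-consistency iteration for interval-censored data.

    intervals: list of (left, right) pairs with left < right; the event
    time is known only to lie in (left, right].
    Returns (cells, masses): candidate cells between consecutive sorted
    endpoints and the estimated probability mass in each cell.
    """
    points = sorted({e for pair in intervals for e in pair})
    cells = list(zip(points[:-1], points[1:]))
    # alpha[i][j] = 1 if cell j lies inside subject i's censoring interval
    alpha = [[1.0 if left <= lo and hi <= right else 0.0 for lo, hi in cells]
             for left, right in intervals]
    n = len(intervals)
    p = [1.0 / len(cells)] * len(cells)
    for _ in range(n_iter):
        new_p = [0.0] * len(cells)
        for row in alpha:
            denom = sum(a * pj for a, pj in zip(row, p))
            for j, a in enumerate(row):
                # expected share of subject i's unit mass falling in cell j
                new_p[j] += a * p[j] / denom / n
        p = new_p
    return cells, p


cells, masses = turnbull_masses([(0.0, 2.0), (1.0, 3.0), (2.0, float("inf"))])
cdf = [sum(masses[: j + 1]) for j in range(len(masses))]
```

     The cumulative sums in `cdf` give the estimated distribution function at the right endpoint of each cell.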

  10. Penalized Regression for Failure Time Data
     • $\log L(\beta)$ is the log likelihood or log partial likelihood
     • Consider a penalized “likelihood” function
       $$\log L_{PEN}(\beta) = \log L(\beta) - \sum_{j=1}^{p} \pi_{\gamma,\lambda}(\beta_j) \qquad (1.1)$$
     • $\pi_{\gamma,\lambda}(\cdot)$ is a penalty function
     • $(\gamma, \lambda)$ are tuning parameters
     • $\lambda = (\lambda_1, \ldots, \lambda_p)'$ if we use different penalties for each variable

  11. Some Particular Penalty Functions
     • The $L_2$ penalty $\pi_\lambda(|\beta|) = \lambda |\beta|^2$ gives ridge regression [2]
     • The $L_1$ penalty $\pi_\lambda(|\beta|) = \lambda |\beta|$ yields the LASSO [3]
     Smoothly Clipped Absolute Deviation (SCAD) Penalty
     • The smoothly clipped absolute deviation (SCAD) penalty [4] has the form introduced by Fan and Li (2001)
     Adaptive LASSO
     • The adaptive LASSO [5] has the penalty $\pi_\lambda(|\beta_j|) = \lambda |\beta_j| \tau_j$, with small weights $\tau_j$ chosen for large coefficients and large weights for small coefficients
     [2] Hoerl AE and Kennard RW (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1), 55–67.
     [3] Tibshirani R (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1), 267–288.
     [4] Fan J and Li R (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456), 1348–1360.
     [5] Zou H (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476), 1418–1429.
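     The penalties listed on this slide can be written as small functions. This is a sketch for illustration only: the function names are hypothetical, the SCAD expression is the standard integrated form from Fan and Li (2001) with their suggested default a = 3.7 (the slide's own SCAD formula is not in the transcript), and the adaptive-LASSO weight is supplied by the caller (a common choice, assumed here, is the reciprocal of the absolute value of an initial estimate).

```python
def ridge_penalty(beta, lam):
    """L2 penalty: lam * |beta|^2."""
    return lam * beta ** 2


def lasso_penalty(beta, lam):
    """L1 penalty: lam * |beta|."""
    return lam * abs(beta)


def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty (Fan and Li, 2001), requires a > 2.

    Linear like the LASSO near zero, quadratic in the middle range,
    and constant beyond a * lam, so large coefficients are not shrunk.
    """
    b = abs(beta)
    if b <= lam:
        return lam * b
    if b <= a * lam:
        return (2.0 * a * lam * b - b ** 2 - lam ** 2) / (2.0 * (a - 1.0))
    return lam ** 2 * (a + 1.0) / 2.0


def adaptive_lasso_penalty(beta, lam, tau):
    """Weighted L1 penalty: lam * |beta| * tau, with tau chosen per coefficient."""
    return lam * abs(beta) * tau


def total_penalty(betas, lam, penalty=lasso_penalty):
    """Sum of the penalty over all coefficients, as in equation (1.1)."""
    return sum(penalty(b, lam) for b in betas)
```

     Subtracting `total_penalty` from a log-likelihood value gives the penalized objective of equation (1.1).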

  12. Penalized Regression with Interval-Censored Data
     • For individual $i$, $D_i = \{(L_i, R_i), X_i\}$, where $X_i$ is a $p \times 1$ covariate vector
     • The data consist of $D = \{D_i, i = 1, 2, \ldots, m\}$
     Observed Data Log-Likelihood
       $$\log L \propto \sum_{i=1}^{m} \log\left[F(L_i \mid X_i) - F(R_i \mid X_i)\right]$$
     where $F(s \mid X)$ is the survivor function
     Penalized Observed Data Log-Likelihood
       $$\log L_{penalized} \propto \sum_{i=1}^{m} \log\left[F(L_i \mid X_i) - F(R_i \mid X_i)\right] - \sum_{j=1}^{p} \pi_{\gamma,\lambda}(\beta_j)$$
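     The observed-data log-likelihood sums, over individuals, the log of the probability that the event falls between L_i and R_i. A minimal sketch follows, assuming a Weibull proportional hazards survivor function purely for illustration (the slides do not commit to this model here); the function names, the `shape` and `scale` defaults, and the L1 penalty are all hypothetical choices.

```python
import math


def survivor(s, x, beta, shape=1.5, scale=10.0):
    """Assumed Weibull PH survivor function F(s | x) = exp(-(s/scale)^shape * exp(x'beta))."""
    if math.isinf(s):
        return 0.0
    lin_pred = sum(b * xj for b, xj in zip(beta, x))
    return math.exp(-((s / scale) ** shape) * math.exp(lin_pred))


def observed_loglik(data, beta):
    """data: list of ((L_i, R_i), x_i); R_i may be math.inf for right-censoring."""
    total = 0.0
    for (left, right), x in data:
        total += math.log(survivor(left, x, beta) - survivor(right, x, beta))
    return total


def penalized_loglik(data, beta, lam):
    """Observed-data log-likelihood minus an L1 (LASSO) penalty on beta."""
    return observed_loglik(data, beta) - lam * sum(abs(b) for b in beta)


example = [((2.0, 4.0), [1.0, 0.0]), ((6.0, math.inf), [0.0, 1.0])]
value = penalized_loglik(example, beta=[0.3, -0.2], lam=0.5)
```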

  13. Penalized Regression with Interval-Censored Data
     [Timeline: breakpoints $b_0, b_1, b_2, b_3, \ldots, b_{k-1}, b_k$ mark off the intervals $B_1, B_2, B_3, \ldots, B_k$.]
     Breakpoints $0 = b_0 < \cdots < b_K = \infty$ define $B_k = [b_{k-1}, b_k)$, $k = 1, \ldots, K$.
     If $I_k(u) = I(u \in B_k)$ and $S_k(u) = \int_0^u I(v \in B_k)\, dv$, then the piecewise-constant hazard is
       $$h(u; \theta) = \prod_{k=1}^{K} \left(\rho_k \exp(X_i'\beta)\right)^{I_k(u)}$$
     where $\theta = (\rho', \beta')'$, $\rho = (\rho_1, \ldots, \rho_K)'$ and $\beta = (\beta_1, \ldots, \beta_p)'$
     Complete Data Log-Likelihood
       $$\log L_c(\theta) = \sum_{i=1}^{m} \sum_{k=1}^{K} \left\{ I_k(u_i)\left[\log(\rho_k) + X_i'\beta\right] - S_k(u_i)\, \rho_k \exp(X_i'\beta) \right\}$$
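     The quantities on this slide are easy to compute directly. The sketch below uses hypothetical function names and example breakpoints; `lin_pred` stands for the linear predictor X_i'β. It evaluates the indicators I_k(u), the exposures S_k(u), the hazard, the survivor function, and one individual's contribution to the complete-data log-likelihood.

```python
import math


def indicators(u, breaks):
    """I_k(u) = 1 if u lies in B_k = [b_{k-1}, b_k), else 0."""
    return [1.0 if breaks[k] <= u < breaks[k + 1] else 0.0
            for k in range(len(breaks) - 1)]


def exposures(u, breaks):
    """S_k(u): length of time spent in B_k up to time u."""
    return [max(0.0, min(u, breaks[k + 1]) - breaks[k])
            for k in range(len(breaks) - 1)]


def hazard(u, lin_pred, rho, breaks):
    """Piecewise-constant hazard: rho_k * exp(lin_pred) on the piece containing u."""
    return sum(r * math.exp(lin_pred) * i
               for r, i in zip(rho, indicators(u, breaks)))


def survivor(u, lin_pred, rho, breaks):
    """exp(-cumulative hazard), the cumulative hazard being sum_k rho_k S_k(u) exp(lin_pred)."""
    cum = sum(r * s for r, s in zip(rho, exposures(u, breaks))) * math.exp(lin_pred)
    return math.exp(-cum)


def complete_loglik_term(u, lin_pred, rho, breaks):
    """One individual's term of log L_c when the event time u is known."""
    return sum(i * (math.log(r) + lin_pred) - s * r * math.exp(lin_pred)
               for r, i, s in zip(rho, indicators(u, breaks), exposures(u, breaks)))


breaks = [0.0, 5.0, 10.0, math.inf]   # example breakpoints, K = 3
rho = [0.05, 0.10, 0.20]              # example baseline hazard levels
```

     For a known event time, `complete_loglik_term` equals the log hazard plus the log survivor function at that time, which is the log density.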

  14. An EM Algorithm [6] with Penalized Regression
     The Expectation Step
     Take the conditional expectation of the penalized complete data log-likelihood
       $$Q(\theta; \theta^{r-1}) = E\left[\log L_c(\theta) \mid D; \theta^{r-1}\right] - \sum_{j=1}^{p} \pi_{\gamma,\lambda}(\beta_j)$$
     If
       $$\hat{g}^r_{ik} = E\left[I_k(u_i) \mid D_i; \theta^{r-1}\right], \qquad \hat{S}^r_{ik} = E\left[S_k(u_i) \mid D_i; \theta^{r-1}\right]$$
     then
       $$Q(\theta; \theta^{r-1}) = \sum_{i=1}^{m} \sum_{k=1}^{K} \left\{ \hat{g}^r_{ik}\left(\log(\rho_k) + X_i'\beta\right) - \hat{S}^r_{ik}\, \rho_k \exp(X_i'\beta) \right\} - \sum_{j=1}^{p} \pi_{\gamma,\lambda}(\beta_j)$$
     [6] Dempster AP, Laird NM and Rubin DB (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1), 1–38.
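     Under the piecewise-constant hazard, the two conditional expectations in the E-step have closed forms. The sketch below is one way to compute them for a single individual; it is an illustration rather than the authors' implementation, and the function name and argument layout are hypothetical. It splits the censoring interval at the breakpoints and uses the fact that, within a piece, the event time has a truncated exponential distribution.

```python
import math


def e_step_subject(left, right, lin_pred, rho, breaks):
    """Return (g_hat, s_hat) for one individual whose event time lies in (left, right].

    g_hat[k] = E[I_k(u) | left < u <= right], s_hat[k] = E[S_k(u) | left < u <= right],
    both evaluated at the current parameter values (rho, lin_pred = x'beta).
    """
    K = len(rho)
    rate = [r * math.exp(lin_pred) for r in rho]

    def surv(u):
        if math.isinf(u):
            return 0.0
        cum = sum(rate[k] * max(0.0, min(u, breaks[k + 1]) - breaks[k]) for k in range(K))
        return math.exp(-cum)

    mass = surv(left) - surv(right)          # P(left < u <= right)
    g_hat, s_hat = [0.0] * K, [0.0] * K
    for j in range(K):
        lo, hi = max(left, breaks[j]), min(right, breaks[j + 1])
        if lo >= hi:
            continue                          # piece j does not meet the interval
        p_j = surv(lo) - surv(hi)             # P(u falls in this segment)
        g_hat[j] = p_j / mass
        for k in range(j):                    # earlier pieces are fully traversed
            s_hat[k] += (breaks[k + 1] - breaks[k]) * p_j / mass
        c, r = lo - breaks[j], rate[j]        # time already spent in piece j at lo
        if math.isinf(hi):
            partial = c + 1.0 / r
        else:
            d = hi - lo
            e = math.exp(-r * d)
            partial = c * (1.0 - e) + (1.0 - e * (1.0 + r * d)) / r
        s_hat[j] += surv(lo) * partial / mass
    return g_hat, s_hat


g, s = e_step_subject(4.0, 12.0, 0.3, [0.05, 0.10, 0.20], [0.0, 5.0, 10.0, math.inf])
```

     The values in `g` sum to one, since the event time must fall in exactly one piece.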

  15. Maximization Step
     Let
     • $Z_{ij} = I(j = k)$, $j = 2, \ldots, K$, $Z_{ik} = (1, Z_{i2}, \ldots, Z_{iK})'$
     • $\alpha_1 = \log(\rho_1)$, $\alpha_j = \log(\rho_j) - \log(\rho_1)$, $j = 2, \ldots, K$
     Then $Q(\theta; \theta^{r-1})$ is
       $$\sum_{i=1}^{m} \sum_{k=1}^{K} \left\{ \hat{g}^r_{ik}\left(Z_{ik}'\alpha + X_i'\beta\right) - \hat{S}^r_{ik} \exp\left(Z_{ik}'\alpha + X_i'\beta\right) \right\} - \sum_{j=1}^{p} \pi_{\gamma,\lambda}(\beta_j)$$
     With a pseudo dataset we can maximize $Q(\theta; \theta^{r-1})$ using standard software for penalized regression (e.g. glmnet(.), SIS(.))
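     The unpenalized part of Q has the same kernel as a Poisson log-likelihood with response ĝ_ik, offset log Ŝ_ik, and linear predictor Z_ik'α + X_i'β, which is one way to read the "pseudo dataset" remark. The sketch below builds such a dataset and evaluates the penalized Q; the function names and the row layout are hypothetical, and the L1 penalty on β stands in for whichever penalty is used. Fitting would then be handed to a penalized Poisson routine that accepts an offset.

```python
import math


def build_pseudo_data(g_hat, s_hat, X):
    """One row per (individual i, piece k) with positive expected exposure.

    Each row holds the pseudo-response g_hat[i][k], the offset log(s_hat[i][k]),
    and the design vector (Z_ik, X_i), where Z_ik = (1, I(k=2), ..., I(k=K)).
    """
    rows = []
    K = len(g_hat[0])
    for i, x in enumerate(X):
        for k in range(K):
            if s_hat[i][k] <= 0.0:
                continue                      # no exposure: the term contributes nothing
            z = [1.0] + [1.0 if j == k else 0.0 for j in range(1, K)]
            rows.append({"y": g_hat[i][k],
                         "offset": math.log(s_hat[i][k]),
                         "design": z + list(x)})
    return rows


def penalized_q(rows, coef, n_alpha, lam):
    """Q evaluated at coef = (alpha, beta); only beta (after n_alpha entries) is penalized."""
    total = 0.0
    for row in rows:
        eta = sum(c * d for c, d in zip(coef, row["design"]))
        total += row["y"] * eta - math.exp(row["offset"] + eta)
    return total - lam * sum(abs(b) for b in coef[n_alpha:])
```

     Leaving α unpenalized, as done here, keeps the baseline hazard levels free while shrinkage acts only on the covariate effects; that split is an assumption of this sketch.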

  16. Selection of Optimal Penalty $\lambda_{OPT}$
     • The criterion for selecting the optimal $\lambda$ is similar to traditional cross-validation.
     • We partition the dataset $T$ into $R$ subsamples $T_1, \ldots, T_R$.
     • $T_r$ and $T - T_r$ are the $r$th testing and training sets.
     • For a given $\lambda$, the cross-validation statistic is
       $$\widehat{CV}(\lambda) = \sum_{r=1}^{R} \left[ \log L(\theta_{-r}(\lambda)) - \log L_{-r}(\theta_{-r}(\lambda)) \right].$$
     • $L_{-r}$ is the observed likelihood for the $r$th training dataset.
     • $\theta_{-r}(\lambda)$ is the estimate for the $r$th training data.
     • The optimal $\lambda$ maximizes $\widehat{CV}(\lambda)$.
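     The cross-validation statistic above can be sketched generically, with the fitting routine and the log-likelihood passed in as functions. Everything named here is hypothetical: `cv_statistic`, `select_lambda`, the interleaved fold assignment, and the toy normal-mean model with a ridge penalty, which is used only so the example runs and is not the interval-censored model of the slides.

```python
def cv_statistic(data, n_folds, lam, fit, loglik):
    """Sum over folds of log L(theta_{-r}) - log L_{-r}(theta_{-r}).

    fit(train, lam) returns the penalized estimate from the training set;
    loglik(subset, theta) returns the (unpenalized) log-likelihood on a subset.
    """
    total = 0.0
    for r in range(n_folds):
        train = [d for i, d in enumerate(data) if i % n_folds != r]
        theta = fit(train, lam)
        total += loglik(data, theta) - loglik(train, theta)
    return total


def select_lambda(data, n_folds, grid, fit, loglik):
    """Return the grid value of lambda that maximizes the CV statistic."""
    return max(grid, key=lambda lam: cv_statistic(data, n_folds, lam, fit, loglik))


# Toy model (assumption for illustration): x ~ N(theta, 1) with penalty lam * theta^2.
def toy_fit(train, lam):
    return sum(train) / (len(train) + 2.0 * lam)


def toy_loglik(subset, theta):
    return -0.5 * sum((x - theta) ** 2 for x in subset)


sample = [1.0, 2.0, 0.5, 1.5, 3.0, 0.2]
best = select_lambda(sample, 3, [0.0, 0.5, 1.0, 2.0], toy_fit, toy_loglik)
```

     When individuals contribute independently to the likelihood, each summand is the log-likelihood of the held-out set T_r evaluated at the training estimate, so the statistic rewards penalties whose fits predict the test data well.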
