nhanes data management with r
play

NHANES Data: Management with R David Winsemius, MD. MPH Heritage - PowerPoint PPT Presentation

NHANES Data: Management with R David Winsemius, MD. MPH Heritage Laboratories Olathe, KS West Hartford, CT Acknowledgements and Credits: External R Development Core Team (2009). R: A language and environment for statistical computing. R


  1. NHANES Data: Management with R David Winsemius, MD. MPH Heritage Laboratories Olathe, KS West Hartford, CT

  2. Acknowledgements and Credits: External • R Development Core Team (2009). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, • http://www.R-project.org. • Harrell FE: “Regression Modeling Strategies With Applications to Linear Models, Logistic Regression, and Survival Analysis.” New York: Springer, 2001. – Alzola CF, Harrell FE: An Introduction to S and the Hmisc and Design Libraries . Freely available electronic book. • The survival Package: Title “Survival analysis, including penalised likelihood.” – Maintainer Thomas Lumley <tlumley@u.washington.edu> – Author: Terry Therneau, original R port by Thomas Lumley 2

  3. http://www.cdc.gov/nchs/nhanes.htm

  4. http://www.cdc.gov/nchs/nhanes/nhanes2007-2008/nhanes07_08.ht m

  5. > library(foreign) #Demographics with subject ID, age > merge0304 <- read.xport("/Users/davidwinsemius/Documents/Heritage/ NHANES-continuous/nchs/nhanes/nhanes2003-2004/demo_c.xpt") # Followed by various laboratory values from 2003-2004 cycle > merge0304 <- merge(merge0304, read.xport( "/Users/davidwinsemius/Documents/Heritage/ NHANES-continuous/nchs/nhanes/nhanes2003-2004/l02_c.xpt") ) > merge0304 <- merge(merge0304, read.xport( "/Users/davidwinsemius/Documents/Heritage/ NHANES-continuous/nchs/nhanes/nhanes2003-2004/l10_c.xpt") ) > merge0304 <- merge(merge0304, read.xport( "/Users/davidwinsemius/Documents/Heritage/ NHANES-continuous/nchs/nhanes/nhanes2003-2004/l13_c.xpt") ) > merge0304 <- merge(merge0304, read.xport( "/Users/davidwinsemius/Documents/Heritage/ NHANES-continuous/nchs/nhanes/nhanes2003-2004/l16_c.xpt") ) > merge0304 <- merge(merge0304, read.xport( "/Users/davidwinsemius/Documents/Heritage/ NHANES-continuous/nchs/nhanes/nhanes2003-2004/l40_c.xpt") ) # and finally append to the cases assembled from prior years > merge0904 <- rbind(merge0902[ , names(merge0902)[ names(merge0902) %in% names(merge0304)] ], + merge0304[ , names(merge0902)[ names(merge0902) %in% names(merge0304)] ])

  6. Option of using Hmisc sasxport.get > require(Hmisc) # I usually type require(rms) > w <- sasxport.get('/Users/davidwinsemius/Documents/Heritage/NHANES-continuous/nchs/ nhanes/nhanes2003-2004/demo_c.xpt') Processing SAS dataset DEMO_C .. > str(w) 'data.frame': 10122 obs. of 43 variables: $ seqn :Class 'labelled' atomic [1:10122] 21005 21006 21007 21008 21009 21010 21011 21012 21013 21014 ... .. ..- attr(*, "label")= chr "Respondent sequence number" $ sddsrvyr:Class 'labelled' atomic [1:10122] 3 3 3 3 3 3 3 3 3 3 ... .. ..- attr(*, "label")= chr "Data Release Number" $ ridstatr:Class 'labelled' atomic [1:10122] 2 2 2 2 2 2 2 2 2 2 ... .. ..- attr(*, "label")= chr "Interview/Examination Status" $ riagendr:Class 'labelled' atomic [1:10122] 1 2 2 1 1 2 1 1 2 1 ... > contents(w) .. ..- attr(*, "label")= chr "Gender - Adjudicated" Data frame:w 10122 observations and 43 variables Maximum # NAs:8678 Labels Storage NAs seqn Respondent sequence number integer 0 sddsrvyr Data Release Number integer 0 ridstatr Interview/Examination Status integer 0 riagendr Gender - Adjudicated integer 0 ridageyr Age at Screening Adjudicated - Recode integer 0 ridagemn Age in Months - Recode integer 223 ridageex Exam Age in Months - Recode integer 692 ridreth1 Race/Ethnicity - Recode integer 0 ridreth2 Linked NH3 Race/Ethnicity - Recode integer 0 dmqmilit Veteran/Military Status integer 4196

  7. > load("/Users/davidwinsemius/Documents/Heritage/merge9906.Rdta") > require(foreign) > #demographics 2007-2008 > merge0304 <- read.xport("/Users/davidwinsemius/Documents/Heritage/ NHANES/DEMO_E.xpt") > # hep c > merge0304 <- merge(merge0304, read.xport("/Users/davidwinsemius/Documents/Heritage/ NHANES/HEPC_E.xpt") ) > # Glycohemoglobin > merge0304 <- merge(merge0304, read.xport("/Users/davidwinsemius/Documents/Heritage/ NHANES/GHB_E.xpt") ) > # Cholesterol and HDL-Cholesterol > merge0304 <- merge(merge0304, read.xport("/Users/davidwinsemius/Documents/Heritage/ NHANES/TCHOL_E.xpt") ) > merge0304 <- merge(merge0304, read.xport("/Users/davidwinsemius/Documents/Heritage/ NHANES/HDL_E.xpt") ) > #Urinary Albumin and Urinary Creatinine > merge0304 <- merge(merge0304, read.xport("/Users/davidwinsemius/Documents/Heritage/ NHANES/ALB_CR_E.xpt") ) > #Biochemistry Profile > merge0304 <- merge(merge0304, read.xport("/Users/davidwinsemius/Documents/Heritage/ NHANES/BIOPRO_E.xpt") ) > str(merge0304) 'data.frame': 6917 obs. of 85 variables: $ SEQN : num 41475 41477 41479 41481 41482 ... $ SDDSRVYR: num 5 5 5 5 5 5 5 5 5 5 ... $ RIDSTATR: num 2 2 2 2 2 2 2 2 2 2 ... $ RIDEXMON: num 2 2 1 1 2 1 1 1 1 1 ...

  8. > describe(merge0304$LBDHCV) merge0304$LBDHCV n missing unique Mean 6416 501 3 1.990 1 (114, 2%), 2 (6286, 98%), 5 (16, 0%) > merge0708 <- merge0304 > rm(merge0304) >merge9908 <- rbind(merge9906[, names(merge9906)[names(merge9906) %in% names(merge0708)] ], merge0708[, names(merge9906)[names(merge9906) %in% names(merge0708)] ])

  9. > str(merge9908) 'data.frame': 27456 obs. of 57 variables: $ SEQN : num 1000 1003 1005 1009 1010 ... $ SDDSRVYR: num 1 1 1 1 1 1 1 1 1 1 ... $ RIDSTATR: num 2 2 2 2 2 2 2 2 2 2 ... <snip> $ DMDHRAGE: num 53 78 40 39 85 26 49 41 85 64 ... $ DMDHREDU: num 4 5 3 5 3 3 1 2 1 2 ... $ DMDHRMAR: num 1 NA 1 1 1 1 NA 99 1 1 ... $ DMDHSEDU: num 4 NA 3 4 3 3 NA NA 2 3 ... $ WTINT2YR: num 9488 29627 2867 105753 21555 ... $ WTMEC2YR: num 9946 31456 2964 110068 26756 ... $ SDMVPSU : num 2 1 2 1 1 1 1 2 1 2 ... $ SDMVSTRA: num 13 8 3 8 1 7 13 2 11 11 ... $ LBDHCV : num 2 2 2 2 2 2 2 2 2 2 ... $ LBXSAL : num 4.3 4.5 5 4.4 4.1 4.6 5.2 4.9 4.6 4.6 ... $ LBDSALSI: num 43 45 50 44 41 46 52 49 46 46 ... $ LBXSATSI: num 37 19 13 23 12 27 18 12 11 69 ... $ LBXSASSI: num 18 21 19 18 19 25 17 15 17 43 ... $ LBXSBU : num 28 31 14 11 31 10 11 8 17 10 ... $ LBDSBUSI: num 10 11.1 5 3.9 11.1 3.6 3.9 2.9 6.1 3.6 ... $ LBXSCH : num 265 133 131 223 218 233 176 212 215 170 ...

  10. Variabl e tracking :

  11. Or from the NHANES mailing list (citation OK: stetzer@uwm.edu): https://pantherfile.uwm.edu/stetzer/www/NHANES_filelocator_20100414.xlsx

  12. Or from the NHANES mailing list (citation OK: stetzer@uwm.edu): https://pantherfile.uwm.edu/stetzer/www/NHANES_filelocator_20100414.xlsx

  13. > with(merge9908[merge9908$LBDHCV!=1,], plot(hcv.scr3, LBXSATSI, ylab="ALT", cex=0.1, col=LBDHCV, xlim=c(-5,5), ylim=c(0,300))) > with(merge9908[merge9908$LBDHCV==1,], points(hcv.scr3, LBXSATSI, col=LBDHCV)) > abline(v=-0.8926, lty=4) > abline(h=49, lty=4) > title(main="NHANES 1999-2008: Compare ALT and New HCV.scr for ability\n to classify HCV positives (open circles): 90th%ile lines") > text(-3.5,200, labels="46 HCV cases w/ \nHCV.sc1 low, ALT high") > text(3.5,10, labels="72 HCV cases w/ \nHCV.sc1 high, ALT low") > with(merge9908[merge9908$LBDHCV==1, ], table(HCVscr=hcv.scr1>5.2, ALT= LBXSATSI > 45)) ALT HCVscr FALSE TRUE FALSE 147 42 TRUE 68 119 > save(merge9908, HCVHIV, build.detect, build.detect2, ddH, file="HCVApr2010.Rdta")

  14. Now look at ... what motivated this effort. Not from NHANES , but rather from an insurance derived dataset

  15. Assessment: Is the HCV Score also an HCV Mortality Score? • Predicts mortality in both HCV positive and HCV negative applicants • Much better at predicting mortality in HCV positive

  16. Not from NHANES , but rather from an insurance derived dataset # require(rms) # ... prior creation of Cox model with interacting rcs() spline terms > bplot( Predict(HCVALT.mort.test2, hcv.scr2=seq(-5,2,by=0.1), ALT=seq(0, 120, by=1), hepC3="Negative", fun=exp), xlab="HCV score (revised)", ylab="ALT (mIU/mL)", main= "Mortality: HCV-score x ALT in HepC='Negative': \n Age-Sex-Smoking-ALT Adjusted", lwd=3, scales=list(alternating=FALSE)) > # 3way.HCVneg.rcs(hcv-scr.ALT).pdf

  17. Assessment: Is ALT also an HCV Mortality Score when used in combination? • Well, yes, er, um, ... it is, but ... • It has extremely paradoxical risk relationship when used in combination with the HCV Score • BUT … as everyone should know, plotting marginal effects when these effects are included in interactions is just statistically wrong!

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend