r packages for rank based estimates
play

R Packages for Rank-based Estimates John Kloke 1 Joseph McKean 2 1 - PowerPoint PPT Presentation

R Packages for Rank-based Estimates John Kloke 1 Joseph McKean 2 1 University of Wisconsin - Madison 2 Western Michigan University 10 July 2013 Kloke, McKean R Packages for Rank-based Estimates Outline Rank-based (R) estimation for linear


  1. R Packages for Rank-based Estimates John Kloke 1 Joseph McKean 2 1 University of Wisconsin - Madison 2 Western Michigan University 10 July 2013 Kloke, McKean R Packages for Rank-based Estimates

  2. Outline ◮ Rank-based (R) estimation for linear models ◮ Rfit ◮ Joint ranking (JR) estimation for cluster correlated data ◮ jrfit Kloke, McKean R Packages for Rank-based Estimates

  3. Notation Usual iid Linear Model y i = α + x T i β + e i for i = 1 , . . . , n iid Assume e i ∼ F , f are continuous errors with I ( f ) < ∞ . Matrix Formulation y = α 1 n + X β + e WLOG assume X is centered ( X T 1 n = 0 ) and has full rank and design conditions hold. Kloke, McKean R Packages for Rank-based Estimates

  4. Rank Estimation Jureˇ ckov´ a (1971) & Jaeckel (1972) The rank-based estimator of β is defined as � = Argmin � y − X β � ϕ β ϕ n � a [ R ( y i − x T i β )]( y i − x T = Argmin i β ) i =1 ◮ � v � ϕ = � n v ∈ R n , is Jaeckel’s (1972) i =1 a [ R ( v i )] v i dispersion function . ◮ R denotes Rank. � � t ◮ Scores are generated as a [ t ] = ϕ . ( n +1) ◮ Score function : ϕ ( u ) is a nondecreasing, square-integrable � function defined on (0 , 1) such that ϕ ( u ) du = 0 and � ϕ 2 ( u ) du = 1. Kloke, McKean R Packages for Rank-based Estimates

  5. Asymptotic Theory and Inference � ϕ ( X T X ) − 1 � ˆ β , τ 2 β ϕ ˙ ∼ N p where τ ϕ is a scale parameter which depends on the error distribution ( f ) and the score function ( ϕ ). A consistent estimate of τ ϕ was proposed by Koul, Sievers, McKean (1987). Inference ◮ Simple confidence intervals and tests of hypothesis for the slope parameters should be based on a t n − p − 1 where p = ncol( X ) (see Hettmansperger, McKean 2009). ◮ A reduction in dispersion which is akin to the least squares reduced model test is based on the statistic F RD = ( � y − ˆ y F � ϕ − � y − ˆ y R � ) / q ˆ τ ϕ / 2 where ˆ y F is the full model fit, ˆ y R is the reduced model fit, and q is the number of constraints. Kloke, McKean R Packages for Rank-based Estimates

  6. Rfit Kloke, McKean 2012 > args(rfit.default) function (formula, data = list(), yhat0 = NULL, scores = ws symmetric = FALSE, intercept = NULL, TAU = "F0", ...) ◮ yhat0 : initial fit. Defaults to an L 1 estimate ( quantreg , Koenker 2013). ◮ scores : object of class ‘scores’. Several available. Defaults to Wilcoxon. ◮ symmetric : use median (F) or Hodges-Lehmann (T) to estimate intercept ◮ intercept : should an intercept be added to the model? Default is yes unless already in column space of the design matrix. ◮ TAU : F0 = fortran; R = R; N = NA Kloke, McKean R Packages for Rank-based Estimates

  7. Example 1: Uric Acid & Cardiovascular Risk Factors Heritier, et. al. 2009 ◮ Investigating the association between uric acid levels and various cardiovascular risk factors in developing countries. ◮ Outcome variable: Uric Acid ( µ mol/L) ◮ Risk Factors: body mass index (bmi), systolic blood pressure (sys), low-density lipoprotein cholesterol (ldl), total cholesterol (choles), male (sex) ◮ 998 Participants (474 men & 524 women) aged 25 to 64. Kloke, McKean R Packages for Rank-based Estimates

  8. > summary(rfit(Uric~bmi+sys+choles+ldl+sex,data=CardioRiskFactor Call: rfit.default(formula = Uric ~ bmi + sys + choles + ldl + sex, data = CardioRiskFactors) Coefficients: Estimate Std. Error t.value p.value -122.54244 19.10680 -6.4136 2.197e-10 *** bmi 5.16547 0.49141 10.5115 < 2.2e-16 *** sys 0.67100 0.10480 6.4027 2.352e-10 *** choles 60.37128 5.10748 11.8202 < 2.2e-16 *** ldl -53.30736 5.33107 -9.9994 < 2.2e-16 *** sex 128.74061 5.00197 25.7380 < 2.2e-16 *** --- Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1 Multiple R-squared (Robust): 0.4688407 Reduction in Dispersion Test: 175.1226 p-value: 0 Kloke, McKean R Packages for Rank-based Estimates

  9. > fitF<-rfit(Uric~bmi+sys+choles+ldl+sex +smok+alco+apoa+trig+age,data=CardioRiskFactors) > fitR<-rfit(Uric~bmi+sys+choles+ldl+sex, data=CardioRiskFactors) > drop.test(fitF,fitR) Drop in Dispersion Test F-Statistic p-value 82.193 0.000 Kloke, McKean R Packages for Rank-based Estimates

  10. Wilcoxon Studentized Residuals > qqnorm(rstudent(fit)) Normal Q−Q Plot 20 15 Sample Quantiles 10 5 0 −5 −3 −2 −1 0 1 2 3 Theoretical Quantiles Kloke, McKean R Packages for Rank-based Estimates

  11. Cluster Correlated Data Consider an experiment done over m blocks or clusters. For block k ( k = 1 , . . . , m ) we have the linear model y k = α 1 n k + X k β + e k ◮ Assume e 1 , . . . , e m are independent random vectors. ◮ N = � m k =1 n k . Matrix Formulation y = α 1 N + X β + e m ] T is a N × p design matrix where X = [ X T 1 · · · X T Kloke, McKean R Packages for Rank-based Estimates

  12. JR Estimator Kloke, McKean, Rashid 2009 Assume equal marginals: e kj ∼ f , F for k = 1 , . . . , m , j = 1 , . . . n k The joint rankings (JR) estimator of β is ˆ β JR = Argmin � y − X β � ϕ , where � N � v � ϕ = a ( R ( v t )) v t t =1 is Jaeckel’s (1972) dispersion function. Kloke, McKean R Packages for Rank-based Estimates

  13. Asymptotic Distribution Using a result from Brunner and Denker (1994) � � − 1 � � � − 1 � ˆ β , τ 2 X T X X T X β JR ˙ ∼ N p V ϕ ϕ ◮ V ϕ = � m k =1 X T k Σ ϕ, k X k ◮ Σ ϕ, k = var( ϕ ( F ( e k ))) ◮ ϕ ( F ( e k )) = [ ϕ ( F ( e k 1 )) , . . . , ϕ ( F ( e kn k ))] T Kloke, McKean R Packages for Rank-based Estimates

  14. Estimates of V ϕ = � m k =1 X T k Σ ϕ X k 1. Assume Σ ϕ is compound symmetric . Then � Σ ϕ = (1 − ˆ ρ ϕ ) I + ˆ ρ ϕ J where � m � e kj )), M = � m � n k � 1 ρ ϕ = ˆ i > j a ( R (ˆ e ki )) a ( R (ˆ . k =1 k =1 M − p 2 2. Sandwich estimator. � m k =1 X T e k )) T X k k a ( R (ˆ e k )) a ( R (ˆ 1. Valid under exchangeable within block errors. 2. Requires a large number of clusters (large m ). Kloke, McKean R Packages for Rank-based Estimates

  15. jrfit > args(jrfit) function (x, y, block, yhat0 = NULL, scores = wscores, fitint = NULL, var.type = "sandwich", fitblock = NULL, ...) ◮ x : N × p design matrix ◮ y : N × 1 vector of responses ◮ block : N × 1 vector denoting block membership ◮ var.type : ‘sandwich’, ‘cs’, or ‘user’. ◮ fitblock : should block be fit as fixed effect? Default T for ‘cs’ and F for ‘sandwich’ Kloke, McKean R Packages for Rank-based Estimates

  16. Example 2: Simulated Cluster-Correlated ◮ Model: y ij = α + w i ∆ + x ij β + b j + e ij ◮ m = 160 blocks ◮ n = 4 observations (repeated measures) per block ◮ ∆ = 0 . 5 is the treatment effect ◮ β = 0 is the effect of a (baseline) covariate ◮ e ij ∼ t 5 and b j ∼ t 3 Kloke, McKean R Packages for Rank-based Estimates

  17. > fit<-jrfit(cbind(x,w),y,block) > summary(fit) Coefficients: Estimate Std. Error t-ratio p.value x 0.0904889 0.1390482 0.6507737 0.5161260 w 0.7094427 0.2578163 2.7517367 0.0066124 > > fit<-jrfit(cbind(x,w),y,block,var.type=’cs’,fitblock=F) > summary(fit) Coefficients: Estimate Std. Error t-ratio p.value x 0.0904889 0.1182933 0.7649543 0.4454256 w 0.7094427 0.2642782 2.6844538 0.0080298 Kloke, McKean R Packages for Rank-based Estimates

  18. Summary ◮ Rfit provides robust rank-based inference for iid linear model. Available at CRAN. ◮ jrfit provides robust rank-based inference for linear model w/ cluster correlated errors. Available at http://www.biostat.wisc.edu/~kloke/ . Kloke, McKean R Packages for Rank-based Estimates

  19. Future Work ◮ Regression through the origin ◮ High-breakdown R estimates (+ diagnostics) ◮ Generalized JR estimate based on working covariance of var( y i ). ◮ “Nonparametric Statistical Methods Using R” (2014) - Chapman Hall Kloke, McKean R Packages for Rank-based Estimates

  20. References ◮ Brunner, E. and Denker, M. (1994), Rank statistics under dependent observations and applications to factorial designs, Journal of Statistical Inference and Planning , 42, 353-378. ◮ Hettmansperger, T. P. and McKean, J. W. (2011), Robust Nonparametric Statistical Methods, 2nd Edition , Boca Raton, FL: CRC Press. ◮ Heritier, S. Cantoni, E., Copt, S. and Victoria-Feser, M. (2009), Robust Methods in Biostatistics , New York: Wiley. ◮ Jaeckel, L. A. (1972), Estimating regression coefficients by minimizing the dispersion of the residuals, Annals of Mathematical Statistics , 43, 1449-1458. ◮ Jureˇ akov´ a, J. (1971), Nonparametric estimate of regression coefficients, Annals of Mathematical Statistics , 42, 1328:1338. ◮ Kloke, JD, McKean, JW (2012). Rfit: Rank-based Estimation for Linear Models. R Journal , 4/2, 57-64. ◮ Kloke, J.D., McKean, J.W., and Rashid, M. (2009), Rank-Based Estimation and Associated Inferences for Linear Models with Cluster Correlated Errors, Journal of the American Statistical Association , 104, 485:384-390. ◮ Koenker, R. (2013). quantreg: Quantile Regression. R package version 4.98. http://CRAN.R-project.org/package=quantreg. ◮ Koul, H.L., Sievers, G.L. and McKean, J.W. (1987). An estimator of the Scale parameter for the Rank Analysis of Linear Models under General Score Functions. Scand J. Statist. 14, 131 - 141. Kloke, McKean R Packages for Rank-based Estimates

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend