Analysis of variance and regression November 22, 2007

Parametrisations:
• Choice of parameters
• Comparison of models
• Test for linearity
• Linear splines

Lene Theil Skovgaard
Dept. of Biostatistics, Institute of Public Health, University of Copenhagen
e-mail: L.T.Skovgaard@biostat.ku.dk
http://staff.pubhealth.ku.dk/~lts/regression07_2

Parametrisations, November 2007

Parameter: an unknown quantity that we want to estimate (i.e. for which we want to provide a good guess), for example
• the decrease in blood pressure following treatment A, or the difference in decrease between treatment A and placebo
• the increase in insulin-like growth factor 1 (IGF-1) with age

Parametrisation: the choice of which parameters enter the model description.

Re-parametrisation: a shift to a new set of parameters.

The best-known choices of parametrisation:
• Change of scale/units
  Do we measure height in cm or in m? Take the relation between lung capacity (fev1) and height:
  $\text{fev1} = \alpha + \beta \times \text{height}$
  If we change from measuring height in cm to m, we also change the regression coefficient (the parameter) from $\beta$ to $\beta^* = 100\beta$.
• Change of origin/intercept
  – choice of another reference group in ANOVA
  – subtracting e.g. 170 cm from all height measurements

Re-parametrisations do not change the model as such!
• same fitted values
• same confidence and prediction limits
• but a different parametrisation may allow interpretations of specific interest
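The claim that a change of units or origin leaves the model unchanged is easy to check numerically. The following is a minimal Python sketch (an illustration added here, not part of the original SAS material); `fit_line` is a hypothetical helper doing ordinary least squares by hand, and the height and fev1 values are made-up toy numbers, not data from the lecture.

```python
# Ordinary least squares for y = a + b*x, written out by hand (hypothetical helper).
def fit_line(x, y):
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b

# Made-up toy values (assumption): height in cm, fev1 in litres.
height_cm = [160.0, 165.0, 172.0, 180.0, 188.0]
fev1 = [2.9, 3.1, 3.6, 4.0, 4.5]

# Three parametrisations of the same straight-line model.
a_cm, b_cm = fit_line(height_cm, fev1)                    # height in cm
a_m, b_m = fit_line([h / 100 for h in height_cm], fev1)   # height in m
a_c, b_c = fit_line([h - 170 for h in height_cm], fev1)   # cm, centred at 170

print("slope per cm:", b_cm)
print("slope per m: ", b_m, "(= 100 x slope per cm)")
print("intercept at 170 cm:", a_c, "(= a_cm + 170 * b_cm)")

# Fitted values are the same in all three parametrisations.
fit_cm = [a_cm + b_cm * h for h in height_cm]
fit_m = [a_m + b_m * h / 100 for h in height_cm]
fit_c = [a_c + b_c * (h - 170) for h in height_cm]
print("max difference in fitted values:", max(abs(p - q) for p, q in zip(fit_cm, fit_m)))
```

Only the parameters change (slope multiplied by 100, intercept moved to the value at 170 cm); the fitted line itself is identical.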

What makes us choose a specific parametrisation?
• Convenience: the program has certain default parametrisations
• Estimation of specific quantities:
  – the potency of a drug, ED50 or ED90
• Tests of specific hypotheses:
  – difference between treatment and placebo
  – difference in height between boys and girls at the age of 14

In more advanced situations (beyond linearity), such as non-linear regression, logistic regression and correlated observations:
• Knowledge of distributional assumptions:
  – Some parameter estimates may be closer to normally distributed than others (and we would like to be able to construct symmetric confidence intervals, using the standard error)

In linear models the estimates are exactly normally distributed (provided the model assumptions are met, of course...).

Example: A group of 45 patients with rheumatoid arthritis is randomised to one of 6 possible treatments (treat):
• Placebo
• Aspirin
• One of 4 doses (dose) of an active anti-inflammatory drug, which we shall denote X

Outcome: An index (Index) summing up the effectiveness of the treatment (decrease in various symptoms).

Outcome: Index values.

Reference: Woolson, R.F. & Clarke, W.R.: Statistical methods for the analysis of biomedical data, 2nd ed., Wiley, 2002 (Exercise 10.4, page 409).

How do we represent these data in SAS?

Obs   group     type      dose   index
  1   placebo   placebo      0     6.2
  2   placebo   placebo      0     5.8
  3   placebo   placebo      0     9.5
  4   placebo   placebo      0    10.2
  5   placebo   placebo      0     8.3
  6   placebo   placebo      0     7.9
  7   placebo   placebo      0     9.2
  ...
 38   x20       active      20    29.5
 39   x20       active      20    34.6
 40   x20       active      20    31.9
 41   x25       active      25    41.8
 42   x25       active      25    45.2
 43   x25       active      25    43.2
 44   x25       active      25    46.5
 45   x25       active      25    41.7

Summary statistics

The MEANS Procedure
Analysis Variable : index

           N
group    Obs    N          Mean      Std Dev      Minimum      Maximum
-----------------------------------------------------------------------
aspirin   11   11    23.2545455    2.9561338   17.1000000   26.6000000
placebo    9    9     8.6222222    1.8369661    5.8000000   11.6000000
x10        5    5     5.9600000    0.7635444    5.2000000    6.9000000
x15        9    9    17.9444444    1.0607911   16.4000000   19.5000000
x20        6    6    33.1333333    3.0051068   29.5000000   37.2000000
x25        5    5    43.6800000    2.1182540   41.7000000   46.5000000
-----------------------------------------------------------------------

We start by looking at the 4 X groups only.

Figure: the outcome Index plotted against dose group.

Comparison of 4 dose groups: one-way ANOVA

The model written as a multiple regression:
$Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \varepsilon_i$
where the x's are so-called "dummy" variables:
• $x_{1i}$ is 1 if subject i belongs to the first group, and 0 otherwise
• $x_{2i}$ is 1 if subject i belongs to the second group, and 0 otherwise
• $x_{3i}$ is 1 if subject i belongs to the third group, and 0 otherwise

With this parametrisation,
• $\beta_0$ corresponds to the level of the last group (the reference group, here group 4)
• $\beta_1$ is the difference in level between group 1 and group 4
• $\beta_2$ is the difference in level between group 2 and group 4
• $\beta_3$ is the difference in level between group 3 and group 4
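Because every subject in a group has the same row of dummy variables, the least-squares normal equations can be built from the group sizes and group means alone. A minimal Python sketch under that assumption (`design_row` and `solve` are hypothetical helpers; the inputs are copied from the MEANS output), which should recover the reference-group parametrisation: $\beta_0$ = mean of group 4 and $\beta_g$ = mean of group g minus mean of group 4.

```python
# Group sizes and mean Index for the four X dose groups (from the MEANS output).
# Group order: x10, x15, x20, x25; x25 (index 3) is the reference group.
n = [5, 9, 6, 5]
mean = [5.96, 17.9444444, 33.1333333, 43.68]

def design_row(g):
    """Row (1, x1, x2, x3) for a subject in group g (0..3).
    x1, x2, x3 indicate groups x10, x15, x20; x25 (g = 3) gets all zeros."""
    return [1.0] + [1.0 if g == j else 0.0 for j in range(3)]

# All subjects in group g share one design row, so
# X'X = sum_g n_g r_g r_g'  and  X'y = sum_g (n_g * mean_g) r_g.
p = 4
xtx = [[0.0] * p for _ in range(p)]
xty = [0.0] * p
for g in range(4):
    r = design_row(g)
    for a in range(p):
        xty[a] += n[g] * mean[g] * r[a]
        for b in range(p):
            xtx[a][b] += n[g] * r[a] * r[b]

def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a non-singular square A."""
    m = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (M[r][m] - sum(M[r][c] * x[c] for c in range(r + 1, m))) / M[r][r]
    return x

beta = solve(xtx, xty)
labels = ["beta0 (x25 level)", "beta1 (x10 - x25)",
          "beta2 (x15 - x25)", "beta3 (x20 - x25)"]
for name, b in zip(labels, beta):
    print(f"{name}: {b:.6f}")
```

Dropping the dummy for the last group is what makes X'X non-singular here; SAS instead keeps all four dummies and uses a generalized inverse, which ends up with the same reference-group estimates.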

Traditional one-way ANOVA in SAS:

proc glm data=drug;
  where type='active';
  class group;
  model index=group / solution;
run;

which yields the output:

The GLM Procedure
Class Level Information
Class    Levels    Values
group         4    x10 x15 x20 x25

Number of Observations Used    25

The GLM Procedure
Dependent Variable: index

                           Sum of
Source            DF      Squares    Mean Square    F Value    Pr > F
Model              3  4391.364444    1463.788148     412.97    <.0001
Error             21    74.435556       3.544550
Corrected Total   24  4465.800000

R-Square    Coeff Var    Root MSE    index Mean
0.983332     7.734994    1.882698      24.34000

Source    DF    Type I SS      Mean Square    F Value    Pr > F
group      3    4391.364444    1463.788148     412.97    <.0001

Source    DF    Type III SS    Mean Square    F Value    Pr > F
group      3    4391.364444    1463.788148     412.97    <.0001
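As a sketch, the sums of squares in this table can be reproduced from the summary statistics alone: the Error SS is $\sum_g (n_g-1)s_g^2$ and the Model SS is $\sum_g n_g(\bar y_g - \bar y)^2$. The Python below assumes the rounded means and standard deviations printed by PROC MEANS, so agreement is only up to rounding.

```python
import math

# Summary statistics for the four X groups (from the MEANS output).
n = [5, 9, 6, 5]
mean = [5.96, 17.9444444, 33.1333333, 43.68]
sd = [0.7635444, 1.0607911, 3.0051068, 2.1182540]

N, k = sum(n), len(n)
grand_mean = sum(ni * mi for ni, mi in zip(n, mean)) / N

# Between-group (Model) and within-group (Error) sums of squares.
ss_model = sum(ni * (mi - grand_mean) ** 2 for ni, mi in zip(n, mean))
ss_error = sum((ni - 1) * si ** 2 for ni, si in zip(n, sd))

df_model, df_error = k - 1, N - k
ms_model, ms_error = ss_model / df_model, ss_error / df_error
f_value = ms_model / ms_error
r_square = ss_model / (ss_model + ss_error)
root_mse = math.sqrt(ms_error)

print(f"Model  SS={ss_model:.4f}  df={df_model}  MS={ms_model:.4f}")
print(f"Error  SS={ss_error:.4f}  df={df_error}  MS={ms_error:.6f}")
print(f"F={f_value:.2f}  R-square={r_square:.6f}  Root MSE={root_mse:.6f}")
```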

                                Standard
Parameter          Estimate        Error    t Value    Pr > |t|
Intercept     43.68000000 B   0.84196796      51.88      <.0001
group x10    -37.72000000 B   1.19072251     -31.68      <.0001
group x15    -25.73555556 B   1.05011855     -24.51      <.0001
group x20    -10.54666667 B   1.14003001      -9.25      <.0001
group x25      0.00000000 B    .               .          .

NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

The 'B' to the right of the estimates is explained in the NOTE. It simply means that the estimates depend on the chosen parametrisation: by renaming the group levels, we may get a different reference group and hence a different parametrisation!
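The standard errors in this table follow from the Error Mean Square and the group sizes: the intercept is the mean of the reference group (x25), with SE $\sqrt{MSE/n_{25}}$, and each group estimate is a difference between two independent means, with SE $\sqrt{MSE(1/n_g + 1/n_{25})}$. A small Python check, assuming the rounded values printed above:

```python
import math

# Error mean square and group summaries from the output above.
mse = 3.544550          # Error Mean Square, 21 df
n = {"x10": 5, "x15": 9, "x20": 6, "x25": 5}
mean = {"x10": 5.96, "x15": 17.9444444, "x20": 33.1333333, "x25": 43.68}
ref = "x25"             # last level alphabetically = reference level here

# Intercept = mean of the reference group, SE = sqrt(MSE / n_ref).
se_intercept = math.sqrt(mse / n[ref])
print(f"Intercept  {mean[ref]:.6f}  SE {se_intercept:.6f}")

# Each group estimate = mean_g - mean_ref, SE = sqrt(MSE * (1/n_g + 1/n_ref)).
results = {}
for g in ("x10", "x15", "x20"):
    est = mean[g] - mean[ref]
    se = math.sqrt(mse * (1 / n[g] + 1 / n[ref]))
    results[g] = (est, se, est / se)
    print(f"group {g}  {est:.6f}  SE {se:.6f}  t {est / se:.2f}")
```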

Here we disregard the problem of variance heterogeneity, even though it is present:

proc glm data=drug;
  where type='active';
  class group;
  model index=group / noint solution;
  means group / hovtest=levene;
run;

from which we get

Levene's Test for Homogeneity of index Variance
ANOVA of Squared Deviations from Group Means

                 Sum of       Mean
Source    DF    Squares     Square    F Value    Pr > F
group      3      192.7    64.2320       5.92    0.0043
Error     21      228.0    10.8585

A clear indication that the variance increases with dose. Logarithms?

Same model, now parametrised with one level for each group:

proc glm data=drug;
  where type='active';
  class group;
  model index=group / noint solution;
run;

now yielding instead:

Dependent Variable: index

                             Sum of
Source              DF      Squares    Mean Square    F Value    Pr > F
Model                4  19202.25444     4800.56361    1354.35    <.0001
Error               21     74.43556        3.54455
Uncorrected Total   25  19276.69000

R-Square    Coeff Var    Root MSE    index Mean
0.983332     7.734994    1.882698      24.34000

Source    DF    Type I SS      Mean Square    F Value    Pr > F
group      4    19202.25444    4800.56361     1354.35    <.0001

Source    DF    Type III SS    Mean Square    F Value    Pr > F
group      4    19202.25444    4800.56361     1354.35    <.0001

                              Standard
Parameter        Estimate        Error    t Value    Pr > |t|
group x10      5.96000000   0.84196796       7.08      <.0001
group x15     17.94444444   0.62756587      28.59      <.0001
group x20     33.13333333   0.76860808      43.11      <.0001
group x25     43.68000000   0.84196796      51.88      <.0001

The tests now refer to the hypothesis that each group level is zero (which is not interesting).
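In the one-level-per-group parametrisation each estimate is simply a group mean, with standard error Root MSE$/\sqrt{n_g}$, and since there is no intercept the Model SS is the uncorrected $\sum_g n_g \bar y_g^2$ on 4 df. A Python sketch reproducing these numbers from the printed summaries (rounded inputs, so agreement is approximate):

```python
import math

# One level mu_g per group (the noint parametrisation).
n = [5, 9, 6, 5]
mean = [5.96, 17.9444444, 33.1333333, 43.68]
root_mse = 1.882698     # Root MSE, unchanged by the re-parametrisation

# Each level is estimated by its group mean, SE = Root MSE / sqrt(n_g);
# the t value tests mu_g = 0, which is rarely of interest.
se = [root_mse / math.sqrt(ng) for ng in n]
t = [m / s for m, s in zip(mean, se)]

# Without an intercept the Model SS is uncorrected: sum_g n_g * mean_g^2.
ss_model_uncorrected = sum(ng * m ** 2 for ng, m in zip(n, mean))

for g, m, s, tv in zip(["x10", "x15", "x20", "x25"], mean, se, t):
    print(f"group {g}  {m:.6f}  SE {s:.6f}  t {tv:.2f}")
print(f"Uncorrected Model SS: {ss_model_uncorrected:.5f}")
```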

Parametrisations in one-way ANOVA
• One level ($\mu_4$) for the reference group (the last one, numerically or alphabetically), supplemented with differences from this reference group to each of the remaining groups ($\beta_1, \beta_2, \beta_3$):
  $Y_{gi} = \mu_4 + \beta_g + \varepsilon_{gi}$, where $\beta_g = \mu_g - \mu_4$ (so $\beta_4 = 0$)
  – good for testing equality of the groups and certain pairwise comparisons
• One level for each group:
  $Y_{gi} = \mu_g + \varepsilon_{gi}$
  – good for estimation, not suited for testing!

Estimate statements in GLM

If we want to compare dose 10 with dose 15:

proc glm data=drug;
  where type='active';
  class group;
  model index=group / noint solution;
  estimate 'dose 15 vs. dose 10' group -1 1 0 0;
run;

from which we get

                                      Standard
Parameter               Estimate         Error    t Value    Pr > |t|
dose 15 vs. dose 10   11.9844444    1.05011855      11.41      <.0001
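An ESTIMATE statement with coefficients $c_g$ estimates $\sum_g c_g\mu_g$; with independent groups and a common variance its standard error is $\sqrt{MSE\sum_g c_g^2/n_g}$. The Python function `estimate` below is a hypothetical helper written for illustration (it is not a SAS or library API); it reproduces the 'dose 15 vs. dose 10' line from the rounded group means:

```python
import math

# Group sizes and means in the order x10, x15, x20, x25.
n = [5, 9, 6, 5]
mean = [5.96, 17.9444444, 33.1333333, 43.68]
mse = 3.544550          # Error Mean Square from the ANOVA, 21 df

def estimate(coef):
    """Hypothetical helper: sum_g c_g * mean_g with SE sqrt(MSE * sum_g c_g^2 / n_g),
    valid for independent groups with a common variance."""
    est = sum(c * m for c, m in zip(coef, mean))
    se = math.sqrt(mse * sum(c ** 2 / ng for c, ng in zip(coef, n)))
    return est, se, est / se

# Same coefficients as in the ESTIMATE statement: group -1 1 0 0
est, se, t = estimate([-1, 1, 0, 0])
print(f"dose 15 vs. dose 10: {est:.7f}  SE {se:.8f}  t {t:.2f}")
```

The same contrast could be obtained from the reference-group parametrisation as $\beta_2 - \beta_1$; the estimate and its standard error do not depend on which parametrisation is used.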

We return to the scatter plot, now with a linear regression line added.

Can we use a simple model, saying that the dose effect is linear?
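One standard way to address this question is the classical lack-of-fit F test (sketched here; the lecture may proceed differently): compare the linear regression on dose with the one-way ANOVA, which fits each group freely. Because dose is constant within each group, the fitted line can be computed from group sizes and means; the lack-of-fit SS then has 4 - 2 = 2 degrees of freedom and is compared with the ANOVA Error SS on 21 df. The Python below assumes the rounded summary values printed earlier and does not compute a p-value (the standard library has no F distribution); the resulting F should be referred to an F(2, 21) distribution. The test also assumes equal variances, which Levene's test questioned.

```python
# Dose, group size and mean Index for the four X groups.
dose = [10, 15, 20, 25]
n = [5, 9, 6, 5]
mean = [5.96, 17.9444444, 33.1333333, 43.68]
ss_pure_error, df_pure_error = 74.435556, 21   # Error from the one-way ANOVA

N = sum(n)
dbar = sum(ni * d for ni, d in zip(n, dose)) / N
ybar = sum(ni * m for ni, m in zip(n, mean)) / N

# Least-squares line on dose for all N subjects. Dose is constant within
# groups, so the usual sums reduce to group-level sums weighted by n_g.
sxx = sum(ni * (d - dbar) ** 2 for ni, d in zip(n, dose))
sxy = sum(ni * (d - dbar) * (m - ybar) for ni, d, m in zip(n, dose, mean))
slope = sxy / sxx
intercept = ybar - slope * dbar

# Lack of fit: weighted distance between group means and the fitted line.
ss_lack = sum(ni * (m - (intercept + slope * d)) ** 2
              for ni, d, m in zip(n, dose, mean))
df_lack = len(dose) - 2
f_lack = (ss_lack / df_lack) / (ss_pure_error / df_pure_error)
print(f"slope {slope:.4f}, lack-of-fit SS {ss_lack:.3f} on {df_lack} df, F = {f_lack:.2f}")
```

The between-group SS from the ANOVA splits exactly into the regression SS ($S_{xy}^2/S_{xx}$, 1 df) plus the lack-of-fit SS (2 df), which is why the linear model is nested within the one-way ANOVA model.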
