Day 4: Linear regression, Susanne Rosthøj, Section of Biostatistics

SLIDE 1

University of Copenhagen, Department of Biostatistics

Day 4: Linear regression

Susanne Rosthøj

Section of Biostatistics Department of Public Health University of Copenhagen sr@biostat.ku.dk

SLIDE 2

Example: vitamin D as a function of BMI

SLIDE 3

Linear regression model

n independent observations $Y_1, Y_2, \ldots, Y_n$.

We assume

$Y_i = a + b \cdot X_i + \epsilon_i$, with $\epsilon_i \sim N(0, \sigma^2)$ independent,

where $\epsilon_i$ is the residual error and $X_i$ is a quantitative explanatory variable.

The outcome $Y_i$ has a normal distribution with
mean = $a + b \cdot X_i$
variance = $\sigma^2$
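To see what the model assumes, one can simulate data from it and check that the regression recovers the parameters. The following is a minimal sketch only: the parameter values, the sample size, the range of x and the dataset name sim are hypothetical choices for illustration and are not taken from the vitamin D data.

```sas
* Simulate n=50 observations from Y = a + b*X + error (all values hypothetical);
data sim;
  call streaminit(2024);           * fix the random seed for reproducibility;
  a = 100; b = -2; sigma = 15;     * illustrative intercept, slope and residual SD;
  do i = 1 to 50;
    x = 20 + 15*rand('uniform');   * explanatory variable between 20 and 35;
    y = a + b*x + rand('normal', 0, sigma); * outcome with normal residual error;
    output;
  end;
run;

* Fit the linear regression to the simulated data;
proc glm data=sim;
  model y = x / solution clparm;
run;
quit;
```

The estimates of the intercept and slope should lie close to the values used in the simulation, with confidence intervals that usually cover them.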

SLIDE 4

Estimation

Estimated regression line: vitaminD = 111.05 − 2.39 · BMI

SE of the effect of BMI: 0.69. 95% CI (−3.78; −1.00).

Test of the effect of BMI by a t-test: $t = -2.39 / 0.69 = -3.47$, p = 0.001.

Interpretation: For a 1 unit increase in BMI, vitamin D is lowered by 2.39 nmol/L (95% CI 1.00 to 3.78), p = 0.001.
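The test statistic and the confidence interval can be retraced from the estimate and its standard error. The sketch below assumes the usual t-based interval with n − 2 degrees of freedom, which the slide does not state. The quantile 2.01 is back-calculated from the reported limits, not taken from the output.

```latex
\[
t = \frac{\hat{b}}{\mathrm{SE}(\hat{b})} = \frac{-2.39}{0.69} \approx -3.46
\]
\[
\hat{b} \pm t_{0.975,\,n-2} \cdot \mathrm{SE}(\hat{b})
  \approx -2.39 \pm 2.01 \cdot 0.69 = (-3.78;\ -1.00)
\]
```

With the rounded numbers the ratio is −3.46; the reported −3.47 presumably comes from the unrounded estimate and standard error.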

SLIDE 5

Analysis in SAS

We use proc glm (General Linear Model):

    * Read the data from the course website, starting at line 2 of the file;
    data vitamin;
      infile 'http://publicifsv.sund.ku.dk/~sr/MPH/datasets/vitamin.txt' URL firstobs=2;
      input country vitd age bmi sunexp intake;
    run;

    * Regress vitamin D on BMI, restricted to country 4;
    proc glm data=vitamin plots=DiagnosticsPanel;
      model vitd = bmi / solution clparm;
      where country=4; * Ireland;
    run;

Discuss the output in the handout and find the numbers on the previous slide.

SLIDE 6

Does the model fit the data?

Our conclusions based on the model are valid only if the model is valid.

Assumptions:
1) Independence between observations
2) Linearity
3) Normality (of residual errors $\epsilon$)
4) Homogeneity of variance (of residual errors $\epsilon$)

Normality and homogeneity are assessed through the residuals.

SLIDE 7

Assessment of 2) Linearity

Extend the model:

$Y_i = a + b \cdot \mathrm{BMI}_i + c \cdot \mathrm{BMI}_i^2 + \epsilon_i$

A test of linearity: $H_0: c = 0$.

    * Add the squared BMI term and test whether its coefficient is zero;
    proc glm data=vitamin plots=DiagnosticsPanel;
      model vitd = bmi bmi*bmi / solution clparm;
      where country=4; * Ireland;
    run;

Is the linear model plausible?

SLIDE 8

Predicted values and residuals

The predicted or fitted values:

$\hat{Y}_i = 111.05 - 2.39 \cdot \mathrm{BMI}_i$

Expected vitamin D level for a woman with BMI = 21.8:

$111.05 - 2.39 \times 21.8 = 58.9$

For each woman we determine the residual

$r_i = Y_i - (111.05 - 2.39 \cdot \mathrm{BMI}_i)$

as the difference between the observed and the predicted value.

Residual for the woman with BMI = 21.8 and Y = vitamin D = 89.1:

$89.1 - (111.05 - 2.39 \cdot 21.8) = 89.1 - 58.9 = 30.2$
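In SAS, the predicted values and residuals for all women can be saved with an output statement in proc glm. This is a minimal sketch, assuming the dataset vitamin has been read in as on the "Analysis in SAS" slide. The dataset name fit and the variable names predicted and residual are illustrative choices, not part of the original slides.

```sas
* Fit the model for country 4 and save fitted values and residuals;
proc glm data=vitamin;
  model vitd = bmi / solution clparm;
  where country=4; * Ireland;
  output out=fit p=predicted r=residual; * illustrative names for the new dataset and variables;
run;
quit;

* List BMI, observed vitamin D, fitted value and residual for the first 10 observations;
proc print data=fit(obs=10);
  var bmi vitd predicted residual;
run;
```

For the woman with BMI = 21.8 and vitamin D = 89.1, the listed values should match the hand calculation up to rounding of the coefficients.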

SLIDE 9

Residuals

[Figure: vitamin D (axis 20 to 100) plotted against BMI (axis 20 to 35)]

SLIDE 10

Assessment of 3) Normality

QQ-plot of residuals (plot 4 in the output), possibly supplemented by a histogram (plot 7).
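The same plots can also be produced separately from saved residuals. This is a sketch under the assumption that the dataset vitamin has been read in as on the "Analysis in SAS" slide. The names fit and residual are illustrative.

```sas
* Save the residuals from the linear model for country 4;
proc glm data=vitamin;
  model vitd = bmi;
  where country=4; * Ireland;
  output out=fit r=residual; * illustrative dataset and variable names;
run;
quit;

* QQ-plot and histogram of the residuals with a fitted normal distribution;
proc univariate data=fit;
  var residual;
  qqplot residual / normal(mu=est sigma=est);
  histogram residual / normal;
run;
```

Points close to the reference line in the QQ-plot support the normality assumption; systematic curvature suggests skewness or heavy tails.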

SLIDE 11

Assessment of 4) Homogeneity

Plot residuals as a function of predicted values (plot 1): Constant variance? Trumpet-shape?
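A residual plot of this kind can also be drawn by hand from saved values. This is a minimal sketch, assuming the dataset vitamin has been read in as on the "Analysis in SAS" slide. The names fit, predicted and residual are illustrative.

```sas
* Save fitted values and residuals from the linear model for country 4;
proc glm data=vitamin;
  model vitd = bmi;
  where country=4; * Ireland;
  output out=fit p=predicted r=residual; * illustrative dataset and variable names;
run;
quit;

* Residuals against predicted values with a horizontal reference line at zero;
proc sgplot data=fit;
  scatter x=predicted y=residual;
  refline 0 / axis=y;
run;
```

A band of roughly constant width around zero is consistent with homogeneity of variance; a spread that widens with the predicted value gives the trumpet shape.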
