
Course 02429 Analysis of correlated data: Mixed Linear Models
Module 6: Model diagnostics
Per Bruun Brockhoff

DTU Compute, Building 324, room 220
Technical University of Denmark
2800 Lyngby, Denmark
e-mail: perbb@dtu.dk

Per Bruun Brockhoff (perbb@dtu.dk) Mixed Linear Models, Module 6 Fall 2014 1 / 32

Overview of this module

1. Linear model diagnostics
   - The model assumptions
   - Residuals
   - Normality investigation
   - Variance homogeneity
   - New basic model
   - Outliers
   - Influential observations
   - Data transformation and back transformation
2. Diagnostics for mixed models
   - Residuals
   - Influential observations
   - Random effects normality
3. Drying of beech wood data - a case study, part II

Linear model diagnostics

The model assumptions

- The model structure should capture the systematic effects in the data.
- Normality of residuals
- Variance homogeneity of residuals
- Independence of residuals

Also focus on:

- Outliers
- Influential observations


Residuals

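Predicted values:

$$\hat{y}_i = \hat{\mu} + \hat{\alpha}(\text{width}_i) + \hat{\beta}(\text{depth}_i) + \hat{\delta}(\text{plank}_i)$$

Residuals:

$$\hat{\epsilon}_i = y_i - \hat{y}_i$$

Standardized residuals, where $h_i$ is the leverage of observation $i$:

$$\frac{y_i - \hat{y}_i}{\hat{\sigma}\sqrt{1 - h_i}}$$

Or studentized residuals, where $\hat{\sigma}_{(i)}$ is estimated with observation $i$ left out:

$$\frac{y_i - \hat{y}_i}{\hat{\sigma}_{(i)}\sqrt{1 - h_i}}$$

Later we will use raw residuals from the mixed model:

$$r = y - (X\hat{\beta} + Z\hat{u})$$

But standardization is more difficult in mixed models (and is simply ignored here).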
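As a minimal sketch of the residual definitions above, the standardized and studentized residuals can be computed by hand and compared with R's `rstandard()` and `rstudent()`. The data frame `toy` and its columns are simulated for illustration only; they are not the planks data.

```r
# Simulated toy data (an assumption for illustration, not the planks data)
set.seed(1)
toy <- data.frame(x = rnorm(30))
toy$y <- 1 + 2 * toy$x + rnorm(30)
fit <- lm(y ~ x, data = toy)

e <- resid(fit)                      # raw residuals y_i - yhat_i
h <- hatvalues(fit)                  # leverages h_i
sigma_hat <- summary(fit)$sigma      # residual standard deviation
sigma_loo <- influence(fit)$sigma    # sigma estimated with observation i left out

std_manual  <- e / (sigma_hat * sqrt(1 - h))   # standardized residuals
stud_manual <- e / (sigma_loo * sqrt(1 - h))   # studentized residuals

# Both comparisons should return TRUE
all.equal(std_manual, rstandard(fit))
all.equal(stud_manual, rstudent(fit))
```

The only difference between the two versions is the estimate of the residual standard deviation: the studentized version leaves out the observation being judged, so a gross outlier cannot inflate its own yardstick.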


Linear model residuals in R

The raw residuals:

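    # Read the data: three factor columns and one numeric column
    planks <- read.table("planks.txt", colClasses = c(rep("factor", 3), "numeric"),
                         header = TRUE, sep = ",")
    # Additive fixed effects model
    lm1 <- lm(humidity ~ depth + plank + width, data = planks)
    rawresiduals <- resid(lm1)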

Or the standardized residuals:

    standresid <- rstandard(lm1)

Or even the so-called studentized residuals:

    studresid <- rstudent(lm1)


Quick in R:

    par(mfrow = c(2, 2), mar = c(1, 1, 3, 3))
    plot(lm1, which = 1:4)

[Figure: 2 x 2 panel of diagnostic plots for lm1: Residuals vs Fitted, Normal Q-Q, Scale-Location and Cook's distance. Observations 183, 184 and 192 are labelled in each panel.]

Normality investigation

    par(mfrow = c(1, 1))
    plot(lm1, which = 2)

[Figure: Normal Q-Q plot of the standardized residuals against the theoretical quantiles for lm(humidity ~ depth + plank + width). Observations 183, 184 and 192 are labelled.]
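To make explicit what the Normal Q-Q plot displays, the sketch below rebuilds its coordinates by hand: each standardized residual is paired with the normal quantile that matches its rank. The data are simulated and the names `toy`, `fit` and `theoretical` are illustrative assumptions.

```r
# Simulated toy data (an assumption for illustration, not the planks data)
set.seed(7)
toy <- data.frame(x = rnorm(50))
toy$y <- 1 + toy$x + rnorm(50)
fit <- lm(y ~ x, data = toy)

z <- rstandard(fit)   # the residuals that plot(fit, which = 2) displays
n <- length(z)

# Theoretical quantile for each observation, matched by rank
theoretical <- qnorm(ppoints(n))[order(order(z))]

# qqnorm() returns the same coordinates without drawing when plot.it = FALSE
qq <- qqnorm(z, plot.it = FALSE)
all.equal(unname(qq$x), theoretical)
```

If the normality assumption holds, the points (theoretical, z) fall close to a straight line; systematic curvature or a few points far off the line are what to look for.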

Checking for variance homogeneity

    par(mfrow = c(1, 2))
    plot(lm1, which = c(1, 3))

[Figure: two panels for lm1, both against fitted values: Residuals vs Fitted and Scale-Location. Observations 183, 184 and 192 are labelled.]

Checking for variance homogeneity

    par(mfrow = c(2, 2))
    # Studentized residuals against predicted values and against each factor
    plot(studresid ~ predict(lm1))
    with(planks, plot(studresid ~ plank, col = heat.colors(20)))
    with(planks, plot(studresid ~ width, col = rainbow(3)))
    with(planks, plot(studresid ~ depth, col = rainbow(5)))

[Figure: 2 x 2 panel of studresid plotted against predict(lm1), plank, width and depth.]
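A numerical companion to these plots can be sketched as follows: compare the spread of the residuals in the upper and lower half of the fitted values. The data are simulated with multiplicative errors, and the helper `spread_ratio()` is a hypothetical name introduced here; neither comes from the course material.

```r
# Simulated data with multiplicative error (an assumption for illustration)
set.seed(4)
toy <- data.frame(x = runif(200, 1, 5))
toy$y <- exp(0.5 * toy$x + rnorm(200, sd = 0.3))

# Hypothetical helper: ratio of residual SDs, upper vs lower half of fitted values
spread_ratio <- function(fit) {
  upper <- fitted(fit) > median(fitted(fit))
  sd(resid(fit)[upper]) / sd(resid(fit)[!upper])
}

raw_ratio <- spread_ratio(lm(y ~ x, data = toy))        # well above 1: "trumpet shape"
log_ratio <- spread_ratio(lm(log(y) ~ x, data = toy))   # close to 1 after the log
c(raw = raw_ratio, log = log_ratio)
```

A ratio far from 1 on the raw scale that moves close to 1 on the log scale is the numerical counterpart of a trumpet shape disappearing from the residual plot.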

New basic model

Factor structure diagram (random factors in brackets):

    Factor              Levels    DF
    [I]                    300   152
    depth × width           15     8
    [depth × plank]        100    76
    [width × plank]         60    38
    depth                    5     4
    width                    3     2
    [plank]                 20    19
    0                        1     1

New basic model

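Corresponding to:

$$\log Y_i = \mu + \alpha(\text{width}_i) + \beta(\text{depth}_i) + \gamma(\text{width}_i, \text{depth}_i) + d(\text{plank}_i) + f(\text{width}_i, \text{plank}_i) + g(\text{depth}_i, \text{plank}_i) + \epsilon_i$$

where

$$d(j) \sim N(0, \sigma^2_{\text{Plank}}), \quad f(k) \sim N(0, \sigma^2_{\text{Plank} * \text{width}}), \quad g(l) \sim N(0, \sigma^2_{\text{Plank} * \text{depth}}), \quad \epsilon_i \sim N(0, \sigma^2)$$

    planks$loghum <- log(planks$humidity)
    lm3 <- lm(loghum ~ depth * plank + depth * width + plank * width, data = planks)
    studresid <- rstudent(lm3)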


Diagnostics - Extended model

    par(mfrow = c(2, 2))
    plot(lm3, which = 1:4)

[Figure: 2 x 2 panel of diagnostic plots for lm3: Residuals vs Fitted, Normal Q-Q, Scale-Location and Cook's distance. Observations 131, 239 and 240 are labelled in each panel.]

Diagnostics - Extended model

    par(mfrow = c(2, 2))
    # Studentized residuals from lm3 against predicted values and each factor
    plot(studresid ~ predict(lm3))
    with(planks, plot(studresid ~ plank, col = heat.colors(20)))
    with(planks, plot(studresid ~ width, col = rainbow(3)))
    with(planks, plot(studresid ~ depth, col = rainbow(5)))

[Figure: 2 x 2 panel of studresid (from lm3) plotted against predict(lm3), plank, width and depth.]

Outliers

Definition: An observation that deviates unusually much from its expected value.

Look at all the previous plots!

What to do about outliers:

1. Identify the outlying observations.
2. Check whether some of these may be due to errors, or may be explained and excluded for some external/atypical reason.
3. Investigate the influence of non-explainable outliers.
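The identify-then-investigate steps above can be sketched in R as follows. The data are simulated with one planted gross error, and the cut-off of 3 on the absolute studentized residual is an assumption chosen for illustration, not a rule from the course.

```r
# Simulated toy data with one planted gross error (an assumption for illustration)
set.seed(5)
toy <- data.frame(x = 1:40)
toy$y <- 2 + 0.5 * toy$x + rnorm(40)
toy$y[17] <- toy$y[17] + 8

fit <- lm(y ~ x, data = toy)

# Step 1: identify candidates (the threshold 3 is an illustrative assumption)
flagged <- which(abs(rstudent(fit)) > 3)

# Step 3: investigate their influence by refitting without them
fit_wo <- lm(y ~ x, data = toy[-flagged, ])
rbind(all_data = coef(fit), without_flagged = coef(fit_wo))
```

Step 2, checking whether a flagged observation can be explained by an error or an atypical circumstance, cannot be automated; it requires going back to how the data were collected.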


Check for influential observations

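DFFIT, a measure of influence for each observation:

$$f_i = \frac{\hat{y}_i - \hat{y}_{(i)}}{\hat{\sigma}_{(i)}\sqrt{h_i}}$$

Cook's distance, another measure of influence for each observation:

$$D_i = \frac{\sum_{j=1}^{n} (\hat{y}_j - \hat{y}_{j(i)})^2}{p\,\hat{\sigma}^2}$$

Here the subscript $(i)$ denotes a quantity computed with observation $i$ left out, and $p$ is the number of mean parameters in the model.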
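Both measures can be reproduced by literally leaving out one observation at a time and refitting. The sketch below does this on simulated data and compares the result with R's `dffits()` and `cooks.distance()`; all object names are illustrative assumptions.

```r
# Simulated toy data (an assumption for illustration, not the planks data)
set.seed(2)
toy <- data.frame(x = rnorm(25))
toy$y <- 0.5 + 1.5 * toy$x + rnorm(25)
fit <- lm(y ~ x, data = toy)

n <- nrow(toy)
p <- length(coef(fit))           # number of mean parameters
s2 <- summary(fit)$sigma^2       # residual variance from the full fit
h <- hatvalues(fit)
yhat <- fitted(fit)

cook_manual <- numeric(n)
dffit_manual <- numeric(n)
for (i in seq_len(n)) {
  fit_i <- lm(y ~ x, data = toy[-i, ])      # refit without observation i
  yhat_i <- predict(fit_i, newdata = toy)   # predictions for all n observations
  cook_manual[i] <- sum((yhat - yhat_i)^2) / (p * s2)
  dffit_manual[i] <- (yhat[i] - yhat_i[i]) / (summary(fit_i)$sigma * sqrt(h[i]))
}

# Both comparisons should return TRUE
all.equal(cook_manual, unname(cooks.distance(fit)))
all.equal(dffit_manual, unname(dffits(fit)))
```

The built-in functions obtain the same numbers from the leverages without refitting, so the explicit loop is only meant to show what the formulas measure.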


Check for influential observations

    lm0 <- lm(loghum ~ width, data = planks)
    infl.lm0 <- influence.measures(lm0)
    dim(infl.lm0$infmat)
    ## [1] 300   7
    head(infl.lm0$infmat)
    ##        dfb.1_  dfb.wdt2    dfb.wdt3     dffit   cov.r     cook.d  hat
    ## 1 -1.9577e-01  0.138432  1.3843e-01 -0.195772 0.98212 0.01265651 0.01
    ## 2 -3.8198e-02  0.027010  2.7010e-02 -0.038198 1.01888 0.00048778 0.01
    ## 3 -2.9551e-02  0.020896  2.0896e-02 -0.029551 1.01948 0.00029198 0.01
    ## 4 -3.8198e-02  0.027010  2.7010e-02 -0.038198 1.01888 0.00048778 0.01
    ## 5 -1.2533e-01  0.088624  8.8624e-02 -0.125333 1.00446 0.00522638 0.01
    ## 6  6.2970e-16 -0.096027 -9.5819e-17 -0.135802 1.00172 0.00613037 0.01
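In a one-way model with balanced groups, the leverage of every observation is one over the group size, which would explain why every row of the output above shows hat = 0.01 (three width levels with 100 observations each). The sketch below checks this on simulated data with the same layout; the response values are random numbers, not the planks measurements.

```r
# Simulated response on the same layout: 3 width levels, 100 observations each
# (an assumption for illustration, not the planks data)
set.seed(6)
toy <- data.frame(width = gl(3, 100))
toy$loghum <- rnorm(300, mean = 1.7, sd = 0.3)

fit0 <- lm(loghum ~ width, data = toy)
infl <- influence.measures(fit0)

dim(infl$infmat)              # 300 rows, 7 columns as in the output above
range(infl$infmat[, "hat"])   # every leverage equals 1/100

# Rows that influence.measures() flags on at least one of its criteria
flagged <- which(apply(infl$is.inf, 1, any))
```

Because the leverages are constant in such a design, differences in `dffit` and `cook.d` between observations come from the size of the residuals alone.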


Check for influential observations

    par(mfrow = c(1, 1))
    plot(lm1, which = 4)

[Figure: Cook's distance against observation number for lm(humidity ~ depth + plank + width). Observations 183, 184 and 192 are labelled.]

Data transformation and back transformation

Most common: the log-transformation ("trumpet shape" of the residuals vs predicted plot).

Box-Cox transformations:

$$Z_i = \begin{cases} Y_i^{\lambda}, & \lambda \neq 0 \\ \log Y_i, & \lambda = 0 \end{cases}$$

The Box-Cox method chooses $\lambda$ by maximum likelihood, which optimizes the choice of transformation.

Presentation of results: back transformation is often needed (wanted!).

For the log-transformation, the following is OK:

- Back transformation of LSMEANS
- Back transformation of differences of LSMEANS
- Back transformation of confidence limits of either of the above

For power transformations:

- Back transformation of LSMEANS and their confidence limits
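A minimal sketch of back transformation after a log-transformation, using simulated two-group data (the names `toy`, `g` and `fit_log` are illustrative assumptions). It shows that a back-transformed mean from the log scale is a geometric mean on the original scale, and that a back-transformed difference is a ratio.

```r
# Simulated two-group data with multiplicative error (an assumption for illustration)
set.seed(3)
toy <- data.frame(g = gl(2, 15, labels = c("A", "B")))
toy$y <- exp(1 + 0.4 * (toy$g == "B") + rnorm(30, sd = 0.2))

fit_log <- lm(log(y) ~ g, data = toy)

# Group means and confidence limits on the log scale, then back transformed
new <- data.frame(g = factor(c("A", "B")))
pred <- predict(fit_log, newdata = new, interval = "confidence")
back <- exp(pred)   # columns fit, lwr, upr on the original scale

# The back-transformed mean equals the geometric mean of each group
geo_mean <- tapply(toy$y, toy$g, function(v) exp(mean(log(v))))

# A back-transformed difference on the log scale is a ratio on the original scale
ratio_B_vs_A <- exp(coef(fit_log)[["gB"]])
```

This is why differences can be back transformed after a log-transformation: they become ratios, which have a direct interpretation. A difference on a power-transformed scale has no such counterpart on the original scale, which fits with the list above restricting power transformations to means and their confidence limits.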


Box-Cox in R

    model1.5 <- lm(humidity ~ depth * plank + depth * width + plank * width, data = planks)
    library(MASS)
    par(mfrow = c(1, 2))
    plot(boxcox(model1.5))

[Figure: two panels. Left: profile log-likelihood against λ over the range -2 to 2, with the 95% limit marked. Right: boxcox(model1.5)$y against boxcox(model1.5)$x.]

Diagnostics for mixed models

Residuals from an lmer result

Raw residuals from the mixed model:

$$r = y - (X\hat{\beta} + Z\hat{u})$$

    library(lmerTest)
    # Mixed model: fixed depth * width, random plank and plank interactions
    lmer3 <- lmer(loghum ~ depth * width + (1 | plank) + (1 | depth:plank) + (1 | plank:width),
                  data = planks)
    lmerresid <- resid(lmer3)
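The raw residual formula can be checked directly from the model matrices. The sketch below is an illustration under stated assumptions: it uses the `sleepstudy` data that ships with lme4 (the package on which lmerTest builds) instead of the planks data, and it runs only if lme4 is installed.

```r
# Sketch on lme4's built-in sleepstudy data (an assumption; not the planks data)
if (requireNamespace("lme4", quietly = TRUE)) {
  library(lme4)
  fit <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

  X <- getME(fit, "X")         # fixed effects design matrix
  Z <- getME(fit, "Z")         # random effects design matrix (sparse)
  beta_hat <- fixef(fit)       # estimated fixed effects
  u_hat <- getME(fit, "b")     # predicted random effects

  # r = y - (X beta_hat + Z u_hat)
  fitted_manual <- as.vector(X %*% beta_hat) + as.vector(Z %*% u_hat)
  r_manual <- sleepstudy$Reaction - fitted_manual

  # Largest absolute difference from resid(); should be numerically zero
  max_diff <- max(abs(r_manual - resid(fit)))
}
```

These residuals are conditional on the predicted random effects, so they reflect only the residual error term and not the variation between the levels of the random factors.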


Residuals versus predicted and Normal Q-Q plot for the mixed model

    par(mfrow = c(1, 2))
    plot(sqrt(abs(lmerresid)) ~ predict(lmer3))
    qqnorm(lmerresid)

[Figure: two panels. Left: sqrt(abs(lmerresid)) against predict(lmer3). Right: Normal Q-Q plot of lmerresid (sample quantiles against theoretical quantiles).]

Residuals versus predicted and factor levels for the mixed model

    par(mfrow = c(2, 2))
    # Mixed model residuals against predicted values and against each factor
    plot(lmerresid ~ predict(lmer3))
    with(planks, plot(lmerresid ~ plank, col = heat.colors(20)))
    with(planks, plot(lmerresid ~ width, col = rainbow(3)))
    with(planks, plot(lmerresid ~ depth, col = rainbow(5)))

[Figure: 2 x 2 panel of lmerresid plotted against predict(lmer3), plank, width and depth.]

Cook’s distance for the mixed model

    library(influence.ME)
    lmer3.infl <- influence(lmer3, obs = TRUE)
    par(mfrow = c(1, 1))
    plot(cooks.distance(lmer3.infl))

[Figure: cooks.distance(lmer3.infl) against observation index.]

Check for random effects normality

Assumptions:

$$d(j) \sim N(0, \sigma^2_{\text{Plank}}), \quad f(k) \sim N(0, \sigma^2_{\text{Plank} * \text{width}}), \quad g(l) \sim N(0, \sigma^2_{\text{Plank} * \text{depth}})$$

Use the predicted/expected values of the random effects from the model:

- BLUPs = Best Linear Unbiased Predictions
- The 20 plank predictions
- The 100 depth*plank predictions
- The 60 width*plank predictions


Checking for random effects normality in new model for transformed data

    par(mfrow = c(1, 3))
    qqnorm(ranef(lmer3)$`depth:plank`[, 1])
    qqnorm(ranef(lmer3)$`plank:width`[, 1])
    qqnorm(ranef(lmer3)$plank[, 1])

[Figure: three Normal Q-Q plots (sample quantiles against theoretical quantiles) of the predicted random effects for depth:plank, plank:width and plank.]

Drying of beech wood data - a case study, part II

From fixed model:

    Source of variation    DF   Sums of squares   Mean squares          F     P-value
    depth                   4        4.28493467     1.07123367   (217.14)   (< .0001)
    width                   2        0.79785186     0.39892593    (80.86)   (< .0001)
    depth*width             8        0.08877653     0.01109707       2.25      0.0268
    plank                  19        9.12355684     0.48018720    (97.34)   (< .0001)
    depth*plank            76        0.51331023     0.00675408       1.37      0.0521
    width*plank            38        1.74239118     0.04585240       9.29     < .0001
    Error                 152        0.74986837     0.00493334

From mixed model:

    Source of variation    Numerator DF   Denominator DF        F   P-value
    depth                             4               76   158.61    <.0001
    width                             2               38     8.70    0.0008
    depth*width                       8              152     2.25    0.0268
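The mixed-model F values can be reproduced from the mean squares of the fixed-model table by changing the denominator: depth is tested against depth*plank, width against width*plank, and depth*width against the residual. The sketch below does this arithmetic with the numbers from the table above; the bracketed F values in the fixed-model table appear to be the corresponding ratios against the residual mean square. The object names are illustrative assumptions.

```r
# Mean squares and degrees of freedom copied from the fixed-model table
ms <- c(depth = 1.07123367, width = 0.39892593, "depth*width" = 0.01109707,
        "depth*plank" = 0.00675408, "width*plank" = 0.04585240, Error = 0.00493334)
df <- c(depth = 4, width = 2, "depth*width" = 8,
        "depth*plank" = 76, "width*plank" = 38, Error = 152)

# Denominator term for each fixed effect in the mixed model
denominator <- c(depth = "depth*plank", width = "width*plank", "depth*width" = "Error")

f_mixed <- ms[names(denominator)] / ms[denominator]
p_mixed <- pf(f_mixed, df[names(denominator)], df[denominator], lower.tail = FALSE)

round(f_mixed, 2)
```

The depth and width effects are therefore judged against the plank-to-plank variation in those effects rather than against the residual variation, which is why their F values drop from 217.14 and 80.86 to 158.61 and 8.70.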


Back transformed expected values

    mylsmeans <- lsmeans(lmer3, "depth:width")$lsmeans.table
    with(mylsmeans, interaction.plot(depth, width, exp(Estimate), col = 2:4))

[Figure: interaction plot of the mean of exp(Estimate) against depth (1, 3, 5, 7, 9), with one line per width (1, 2, 3).]

Results

    Width 1            Width 2            Width 3
    Depth 1  4.50a     Depth 9  4.80a     Depth 9  4.31a
    Depth 9  4.63a     Depth 1  4.91a     Depth 1  4.44a
    Depth 3  5.78b     Depth 7  5.97b     Depth 3  5.28b
    Depth 7  5.91bc    Depth 3  6.26b     Depth 7  5.33b
    Depth 5  6.19bc    Depth 5  6.41b     Depth 5  5.59b

    Depth 1            Depth 3            Depth 5            Depth 7            Depth 9
    Width 3  4.44a     Width 3  5.28a     Width 3  5.59a     Width 3  5.33a     Width 3  4.31a
    Width 1  4.50ab    Width 1  5.78b     Width 1  6.19b     Width 1  5.91b     Width 1  4.63ab
    Width 2  4.91b     Width 2  6.26b     Width 2  6.41b     Width 2  5.97b     Width 2  4.80b
