Week 8, Lectures 1 & 2 : Fixed-, Random-, and Mixed-Effects - - PowerPoint PPT Presentation



SLIDE 1

Week 8, Lectures 1 & 2: Fixed-, Random-, and Mixed-Effects models

  • 1. The repeated measures design, where each of n Ss is measured k times, is a popular one in Psych. We approach this design in 2 ways:
    1. As a generalisation of the paired t-test
    2. As an expansion from 1-way to 2-way designs

  • 2. Fixed & random effects; nested & repeated measures designs
  • 3. Using EMS tables to define appropriate F ratios for certain designs
  • 4. Options for modeling random effects in R
SLIDE 2

Outline of Lectures 1 & 2 on Mixed-Effects models

  • 5. 'kv0.csv'; 'skv1.r'
  • 6. Expanded HW-5:
    1. range of possible models
    2. Use AIC and log(Likelihood) to compare models
  • 7. Refs: Howell, Chap. 14;
    http://www.ats.ucla.edu/stat/R/seminars/Repeated_Measures/repeated_measures.htm ('ucla')
    http://www.personality-project.org/R/r.anova.html ('nwu')

SLIDE 3

Reprise of paired t-test

  • Each S gives 2 scores, one in Group = 0, and the other in Group = 1.
  • Wrong analysis wd be res1 = t.test(x0, x1, var.equal = T), because it treats the 2 groups as independent!
  • Correct analysis is res2 = t.test(x0, x1, var.equal = T, paired = T)


This latter analysis aptly finesses the problem introduced by the correlation across Ss between x0i and x1i.

SLIDE 4

Reprise of paired t-test

  • For the i'th S, compute the difference: di = x1i − x0i.
  • Let s = s.d. of the {di}. Then

    t(n−1) = d̄ / (s/√n);  or  F(1,n−1) = t(n−1)² = n·d̄²/s².

  • Here d̄ = x̄1 − x̄0, and the grand mean is (x̄1 + x̄0)/2 ≡ x̄. Because k − 1 = 1, MSGroup = SSGroup, and

    SSGroup = n[(x̄1 − x̄)² + (x̄0 − x̄)²] = n·d̄²/2,

    since x̄1 − x̄ = d̄/2 = −(x̄0 − x̄). The factor ½ is matched by a factor ½ in the error MS (next slide), so

    F(1,n−1) = n·d̄²/s² = MSGroup / (s²/2).
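The equivalence F = t² claimed above can be checked numerically. A minimal sketch (Python here, with made-up scores, since the deck's own code is R; x0 and x1 stand for each S's two within-S scores):

```python
# Check that the paired t statistic satisfies F = t^2 = n * dbar^2 / s^2,
# where d_i = x1_i - x0_i and s is the s.d. of the differences.
from statistics import mean, stdev
from math import sqrt

x0 = [3.0, 5.0, 4.0, 6.0, 2.0]       # hypothetical Group = 0 scores
x1 = [5.0, 6.0, 4.0, 9.0, 3.0]       # same Ss in Group = 1
n = len(x0)

d = [b - a for a, b in zip(x0, x1)]  # within-S differences d_i
dbar, s = mean(d), stdev(d)          # s = s.d. of the {d_i}

t = dbar / (s / sqrt(n))             # paired t with n - 1 df
F = n * dbar**2 / s**2               # F with (1, n - 1) df

print(abs(t**2 - F) < 1e-9)          # True: the two formulas agree
```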

SLIDE 5

Reprise of paired t-test

  • How best to interpret s²?
  • di is also the slope of each line in the Fig. Thus variation in di is an index of a Subject x Group interaction. Indeed s² = var(di) = 2·MSSub*Group, so the Subj x Group interaction supplies the error term:

    F(1,n−1) = n·d̄²/s² = MSGroup / MSSub*Group.

This expression for F generalises to the case, k > 2.

SLIDE 6

Introduction to Mixed Models

  • An alternative way to finesse the problem of cor(x0i, x1i) is to use the package lme4 (= Linear Mixed Effects, with S4 classes), and the function, lmer(). First, arrange data in 'long form', d1 (as for lm()).
  • res3 = lmer(score ~ Group + (1 | suid), data = d1).
  • NOT res3i = lm(score ~ Group, data = d1), which is incorrect, because it ignores the grouping/correlation resulting from the repeated measures design!
  • Of course the linear mixed model, lmer(), can do much more than paired t-tests! [Recent articles tout its efficacy: JPSP 2012, by Judd, Westfall & Kenny; Science, Oct 2013, "Biology's Dry Future", by R. Service.]
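Why ignoring the grouping is wrong can be seen numerically: for positively correlated repeated measures, the variance of the within-S differences is smaller than the sum of the two group variances, so the paired analysis uses a smaller (correct) standard error. A sketch (Python, made-up paired scores):

```python
# For paired data, var(d_i) = var(x0) + var(x1) - 2*cov(x0, x1); with
# cor(x0_i, x1_i) > 0 this is smaller than var(x0) + var(x1), which is
# what an independent-groups analysis would implicitly use.
from statistics import variance, mean

x0 = [3.0, 5.0, 4.0, 6.0, 2.0, 7.0]
x1 = [4.0, 6.0, 4.0, 8.0, 3.0, 8.0]   # same Ss re-tested: strongly correlated

n = len(x0)
cov = sum((a - mean(x0)) * (b - mean(x1)) for a, b in zip(x0, x1)) / (n - 1)
var_d = variance([b - a for a, b in zip(x0, x1)])

# the identity holds exactly for sample (n - 1) variances:
print(abs(var_d - (variance(x0) + variance(x1) - 2 * cov)) < 1e-9)  # True
print(var_d < variance(x0) + variance(x1))                          # True
```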

SLIDE 7

Passage from 1- to 2- & 3-way designs

  • Example 1: How does the Score on a memory test depend on the length of the study period, T (= 1, 2, or 3 units)? We could use a 1-way between-groups design in which, say, 24 participants (Ss) are randomly assigned, n = 8 to each level of T.
  • Even though the data in the table are in rows, there is no 'Row' factor because the 3 Ss in each row have nothing in common.

  T = 1   T = 2   T = 3
    3       4       3
    5       5       7
    …       …       …
    1       6       6

  Source    df   SS   MS    F
  Between    2        MSb   MSb/MSw
  Within    21        MSw
  Total     23

SLIDE 8

  • Example 2: Same as Ex. 1, except that you now worry that Score might also depend on Ss' verbal Ability (A). So divide Ss into 2 levels of A, 'lo' and 'hi', 12 Ss at each level. At each level of ability, randomly assign n = 4 Ss to each level of T. This is a 2-way between-groups factorial design.
  • We have a 'Row' (A) & a 'Col' (T) factor, so we can define the A*T interaction. The 4 independent obs in each of the 6 cells are used to estimate MSw, which is the denominator in all 3 F ratios.

  A    T = 1    T = 2    T = 3
  lo   3,5,…    4,5,…    3,7,…
  hi   5,1,…    5,6,…    7,6,…

  Source    df   SS   MS      F
  T          2        MST     MST/MSw
  A          1        MSA     MSA/MSw
  A * T      2        MSA*T   MSA*T/MSw
  Within    18        MSw
  Total     23

SLIDE 9

  • Example 3: Same as Ex. 2, except that you now decide that the best way to control for Ability (A) is to use each S as her or his own control, and to measure each S's score at all 3 levels of T. Suppose we have 8 Ss. This is a 2-way within-group design, with S and T as the 2 factors.
  • Because n = 1 obs per cell, we cannot estimate the within-cell variance, MSw, separately from the interaction MS. Hence we lump the two sources of variation into MSres, and use this as the denominator in the F ratios for the 2 main effects.

  S    T = 1   T = 2   T = 3
  1      3       4       7
  2      5       5       6
  …      …       …       …
  8      1       2       5

  Source     df   SS   MS     F
  T           2        MST    MST/MSres
  S           7        MSS    MSS/MSres
  Residual   14        MSres
  Total      23

SLIDE 10

  • Example 3 (cont'd): One way to finesse the problem that the interaction MS and MSw are confounded is to assume that the interaction MS is 0 and, therefore, that MSres = MSw. Assuming that the interaction MS is 0 is equivalent to assuming that S and T have additive effects. The lm() model would be: rs3 = lm(score ~ subid + time, data=d0), rather than rs3a = lm(score ~ subid * time, data=d0), which wd leave no residual df for the F tests!

  S    T = 1   T = 2   T = 3
  1      3       4       7
  2      5       5       6
  …      …       …       …
  8      1       2       5

  Source     df   SS   MS     F
  T           2        MST    MST/MSres
  S           7        MSS    MSS/MSres
  Residual   14        MSres
  Total      23

SLIDE 11

ANOVA Table for the additive model

  Source                 df                              MS                     F             df of F
  Row, e.g., gift        r−1                             MSr                    MSr/MSerror   r−1, dferror
  Column, e.g., income   c−1                             MSc                    MSc/MSerror   c−1, dferror
  Within Cell, or
  Error, or Residual     dferror = rc(n−1) + (r−1)(c−1)  MSerror (or MSresid)
  Total                  N − 1 = rcn − 1

The formulae for the expected MS (EMS) tell us how to define F for testing each effect. n obs per cell.

SLIDE 12

ANOVA Table for the interactive model:

**Note reduction in dferror when we test for RxC interaction; and that dferror = 0 if n = 1

  Source             df                    MS      F              df of F
  Row                r−1                   MSr     MSr/MSerror    r−1, dferror
  Column             c−1                   MSc     MSc/MSerror    c−1, dferror
  RxC interaction    (r−1)(c−1)            MSrc    MSrc/MSerror   (r−1)(c−1), dferror
  Within cell        dferror = rc(n−1) **  MSerror
  Total              N − 1 = rcn − 1

SLIDE 13

CAVEAT: When n = 1, one cannot test for interaction!

  cat('Interactive Model')
  rs3a = lm(score ~ subid * time, data=d0)
  print(anova(rs3a))

  Response: score
               Df  Sum Sq  Mean Sq  F value  Pr(>F)
  ability       2   2.0000  1.0000
  time          3  25.6667  8.5556
  ability:time  6   3.3333  0.5556
  Residuals     0   0.0000

  # Note: ALL 12 residuals are 0: no residual degrees of freedom!

  Warning message:
  In anova.lm(rs3a) : ANOVA F-tests on an essentially perfect fit are unreliable

SLIDE 14

Design Differences between Ex. 2 & 3

  • Both are 2-way factorial designs: A by T, and S by T. However, because we are rarely interested in S per se as an explanatory factor (Why?), but are interested in A, we refer to the S by T design as a 1-factor, within-S design!
  • The number of levels of A is 'small'; that of S is 'large' (more a symptom than a principled diff).

  A    T = 1    T = 2    T = 3
  lo   3,5,…    4,5,…    3,7,…
  hi   5,1,…    5,6,…    7,6,…

  S    T = 1   T = 2   T = 3
  1      3       4       7
  2      5       5       6
  …      …       …       …
  8      1       2       5

SLIDE 15

Design Differences between Ex. 2 & 3

  • We are rarely interested in the levels of S per se (i.e., in 'john' vs 'mary' vs …); these are merely random selections (or 'effects') from a 'large' popn of possible values. We wish to make inferences about this popn of S levels.
  • We are interested in the levels of A ('lo' vs 'hi'). These are fixed effects to be interpreted.

  A    T = 1    T = 2    T = 3
  lo   3,5,…    4,5,…    3,7,…
  hi   5,1,…    5,6,…    7,6,…

  S    T = 1   T = 2   T = 3
  1      3       4       7
  2      5       5       6
  …      …       …       …
  8      1       2       5

SLIDE 16

Design Differences between Ex. 2 & 3

  • In the A by T design, all obs at each level of A come from different Ss and, therefore, are statistically independent (i.e., uncorrelated).
  • In the S by T design, all obs at each level of S come from the same S and, therefore, are correlated. That is, the correlation between scores when, e.g., T=1 and T=2 shd be positive.

  A    T = 1    T = 2    T = 3
  lo   3,5,…    4,5,…    3,7,…
  hi   5,1,…    5,6,…    7,6,…

  S    T = 1   T = 2   T = 3
  1      3       4       7
  2      5       5       6
  …      …       …       …
  8      1       2       5

SLIDE 17

Design Differences between Ex. 2 & 3

  • The S by T design is also called a repeated measures design. The within-S correl between scores introduces complexities into the calculation of F ratios for this design.
  • To solve these complexities, we assume that the within-S correls satisfy certain simplifying conditions, e.g., Compound Symmetry.

  A    T = 1    T = 2    T = 3
  lo   3,5,…    4,5,…    3,7,…
  hi   5,1,…    5,6,…    7,6,…

  S    T = 1   T = 2   T = 3
  1      3       4       7
  2      5       5       6
  …      …       …       …
  8      1       2       5

SLIDE 18

Defn of Compound Symmetry (CS)

  • We assume that the correlations across Ss between the scores at the various times are all equal: cor(x1i, x2i) = cor(x1i, x3i) = cor(x2i, x3i).
  • If there is a between-Ss factor, A, we also assume that the correls, cor(x1i, x2i), cor(x1i, x3i) and cor(x2i, x3i), are the same when A = 1 as when A = 2.
  • An almost equivalent set of conditions, known as sphericity, is that the variance of the differences between scores at T = i and T = j is the same for all i and j.
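The link from compound symmetry to sphericity can be verified by direct calculation. A sketch (Python; v and c are hypothetical values of the common variance and covariance):

```python
# Under compound symmetry (equal variances v, common covariance c at all
# time pairs), var(x_i - x_j) = v + v - 2c = 2*(v - c) is the same for
# every pair (i, j) - which is exactly the sphericity condition.
v, c = 4.0, 1.5                       # hypothetical variance / covariance
cov = [[v, c, c],
       [c, v, c],
       [c, c, v]]                     # compound-symmetric covariance matrix

def var_diff(i, j):
    # var(x_i - x_j) = var(x_i) + var(x_j) - 2 cov(x_i, x_j)
    return cov[i][i] + cov[j][j] - 2 * cov[i][j]

pairs = [(0, 1), (0, 2), (1, 2)]
print([var_diff(i, j) for i, j in pairs])   # [5.0, 5.0, 5.0]
```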


SLIDE 19

Mixed Models

  • Fixed effect factors, e.g., 'gender', 'A (lo/med/hi)'. For quant B, fixed effects are the linear ('slope') and quadratic effects of B.
  • Random effect factors, e.g., 'subject (suid)', 'word', 'school', 'classroom'.
  • Mixed models (MM) contain fixed and random effects. F ratios have to be defined carefully.
  • 'Subject' is often present in a mixed model; subjects usually give > 1 obs (i.e., there is a within-S factor), and are often nested within a between-S factor. Hence MM analysis is often complicated by nesting and within-S correls.


SLIDE 20

Example 4: 1 between-S factor, A, 1 within-S factor, T; Ss are thus nested within A

  • The research questions are the same as before. Ss are classified as Low or High ability (A); n Ss at each A level, and each S is tested at all levels of T. A varies between Ss; and T varies within Ss - thus we have a repeated measures design.
  • In addition to the fixed effects factors, A and T, there is the (unmentioned) random effects factor, S. But S1 at Low A is not the same person as S1 at High Ability. Thus, S is not crossed with A. We say that Subject is nested within Ability.

SLIDE 21

  • We need to be able to recognise a nested design (as opposed to a factorial design). Some R and SPSS functions require that we specify the nested factors; other functions do not. The data for n = 3 might be:

  Ability   Subject   T = 1   T = 2   T = 3   T = 4
  Low          1        1       4       1       2
               2        2       5       2       1
               3        2       6       3       1
  High         1        3       5       4       2
               2        2       7       4       3
               3        3       6       2       2

SLIDE 22

Structural (or formal) models are useful for calculating the EMS

The structural model for the 2-way additive, fixed effects model is:

Xijk = µ + ai + bj + eijk,

where ai is the fixed effect of Row i, bj is the fixed effect of Column j, Σi ai = 0, Σj bj = 0, etc. var(eijk) is labeled σe². The overall Row and Column effects are defined by parameters, θa and θb, that are quasi-variances:

  θa = Σ(i=1..a) ai² / (a − 1),   θb = Σ(j=1..b) bj² / (b − 1)

SLIDE 23

Under H0, each ai = 0, implying θa = 0, and each bj = 0, implying θb = 0.

Under H1, θa > 0. Indeed θa is the key determinant of E(MSr), the Expected Mean Square for Rows. It can be shown that:

  E(MSr) = σe² + nb·θa, and
  E(MSc) = σe² + na·θb.

Whether H0 is true or not, E(MSerror) = σe².

Good estimates of θa and θb are:

  θ̂a = (MSr − MSerror)/(nb), and θ̂b = (MSc − MSerror)/(na).

In random effects models, ai is a random variable and θa is replaced by var(ai) = σa², etc.
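The quasi-variance definition and its moment estimator can be illustrated with toy numbers. A sketch (Python; the row effects, σe², n, and b are all made up):

```python
# theta_a = sum(a_i^2) / (a - 1) for fixed row effects summing to 0, and
# the estimator theta_a_hat = (MSr - MSerror) / (n*b) from the slide.
a_effects = [-1.0, 0.0, 1.0]          # hypothetical fixed row effects, sum to 0
a = len(a_effects)
theta_a = sum(ai**2 for ai in a_effects) / (a - 1)
print(theta_a)                        # 1.0

# If E(MSr) = sigma_e^2 + n*b*theta_a, then with sigma_e^2 = 2, n = 4
# obs per cell and b = 3 columns, MSr = 2 + 12*1 = 14 in expectation,
# and the moment estimator recovers theta_a from the two mean squares:
sigma_e2, n, b = 2.0, 4, 3
MSr = sigma_e2 + n * b * theta_a
print((MSr - sigma_e2) / (n * b))     # 1.0: theta_a recovered
```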

SLIDE 24

Mixed-Effects Modeling in R

  • Various functions in R can be used to analyse mixed models:
    – aov() in the base package; ideal for balanced designs with no missing data
    – lme() and gls() in 'nlme', with p-values!
    – lmer() in 'lme4', with the latest tricks, no p's!
      • Use pvals.fnc {languageR} or {lmerTest} for p-values
    – fastrml.lmm() in 'lmm', with improved est of the random effects
  • Each function has its own syntax for distinguishing fixed from random effects
  • See R script, 'srepmeas0.r'

SLIDE 25

  Drug   Responses
   1     1, 0, 1, 2, 1, -2, 3, 0
   2     2, 3, 2, 0, 6, 1, 1, 5
   3     4, 6, 3, 5, 3, 2, 5, 6

Drug may be a fixed- or random-effects factor, as discussed earlier.

SLIDE 26

  • 'srepmeas0.r'

  library(nlme)
  library(lme4)
  library(languageR)  # see also {lmerTest}
  plot(d0)
  cat('1. Fixed Effects model\n')
  rs.fix = aov(score ~ drug, d0)
  print(summary(rs.fix))
  print(model.tables(rs.fix, 'means', digits = 3))
  cat('2. Random Effects model with aov()\n')
  rs.ran = aov(score ~ drug + Error(drug), d0)  # Drug random
  print(summary(rs.ran))

SLIDE 27

  cat('3. Random Effects model with lme()\n')
  rs.lme1 = lme(score ~ 1, random = ~ 1 | drug, d0)  # Package 'nlme'
  print(summary(rs.lme1))
  cat('4. Random Effects model with lmer()\n')
  rs.lmer1 = lmer(score ~ (1 | drug), d0)  # Package 'lme4'
  print(summary(rs.lmer1))
  # To get p-values
  rs.mcmc = pvals.fnc(rs.lmer1, nsim = 10000, addPlot = T)
  print(rs.mcmc)

SLIDE 28

Compare outputs from different functions in mixed-models analysis

  • Our preferred function for analysing mixed models is lmer {lme4}. However, it is helpful to have some knowledge of others, esp lme {nlme}, because its syntax is sometimes used in yet other functions.
  • A broad survey shows that the fixed effects are characterised by estimates and p-values, and the random effects by variances and covariances/correlations.


SLIDE 29

  • 1. Drug as a fixed effects factor, using aov()

               Df  Sum Sq  Mean Sq  F value    Pr(>F)
    drug        2  56.333  28.1667    9.315  0.001271 **
    Residuals  21  63.500   3.0238

  • 2. Redo aov() with Drug as a random effects factor

    Error: drug
               Df  Sum Sq  Mean Sq
    drug        2  56.333   28.167

    Error: Within
               Df  Sum Sq  Mean Sq  F value  Pr(>F)
    Residuals  21    63.5   3.0238

  Notes: MSdrug = 28.17 and MSresid = 3.024 in both analyses. In 1-way designs, the F ratio is the same for the 2 models.

SLIDE 30

  • 3. Redo random effects model with lme()

    Linear mixed-effects model fit by REML (default: method = 'REML')
    Data: d0
         AIC      BIC    logLik
    104.3624 107.7689 -49.18119

    Random effects:
    Formula: ~1 | drug
            (Intercept) Residual
    StdDev:    1.772811 1.738910

    Fixed effects: score ~ 1
                   Value Std.Error DF  t-value p-value
    (Intercept) 2.416667  1.083333 21 2.230769  0.0367

  Notes: MSresid = 1.7389² = 3.024, as found previously. Recall that E(MSb) = σe² + n·σb² = 3.024 + 8*1.7728² = 28.2, as found previously.
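The arithmetic in the note can be checked directly. A quick sketch (Python, using the MS and StdDev values printed in the aov() and lme() outputs):

```python
# With MSw = 3.0238, n = 8 obs per drug, and lme()'s between-drug
# StdDev 1.772811, E(MSb) = sigma_e^2 + n*sigma_b^2 should reproduce
# MSdrug = 28.167; equivalently sigma_b = sqrt((MSdrug - MSw)/n).
MS_within, MS_drug, n = 3.0238, 28.167, 8
sd_drug = 1.772811                    # lme()'s estimated StdDev for drug

print(round(MS_within + n * sd_drug**2, 2))          # 28.17, matching MSdrug
print(round(((MS_drug - MS_within) / n) ** 0.5, 4))  # 1.7728, matching lme()
```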

SLIDE 31

  • 4. Output from lmer() (default is 'REML=T')

    Formula: score ~ 1 + (1 | drug)
    Data: d0
    [deviance = -2*logLik]
      AIC   BIC logLik deviance REMLdev
    104.4 107.9 -49.18    100.3   98.36

    Random effects:
    Groups   Name        Variance Std.Dev.
    drug     (Intercept) 3.1429   1.7728
    Residual             3.0238   1.7389
    Number of obs: 24, groups: drug, 3

    Fixed effects:
                Estimate Std. Error t value
    (Intercept)    2.417      1.083   2.231

  These results are consistent with earlier ones.

SLIDE 32

  $fixed
    Estimate MCMCmean HPD95lower HPD95upper pMCMC Pr(>|t|)
  1    2.417    2.431     0.1647      4.726 0.039   0.0357

  $random
  Groups   Name  StDev MCMCmedn MCMCmean HPD95lower HPD95upr
  drug     (Int) 1.773   1.4184   1.6403     0.0000   3.4063
  Residual       1.739   1.8363   1.8777     1.3567   2.5386

  [Comments on logLik, deviance, REMLdev, Highest Posterior Density (HPD); AIC = deviance + 2(p + 1)]

  Output from lme():
       AIC      BIC    logLik
  104.3624 107.7689 -49.18119

  Output from lmer():
    AIC   BIC logLik deviance REMLdev
  104.4 107.9 -49.18    100.3   98.36

SLIDE 33

Week 8, Lec 2: Issues in Mixed Models

  • Review of various issues, preliminary to the introduction of lmer(), that arose in Lec. 1.
  • Describe 'kv0.csv' in HW-5: DV = score; 'attn' is a between-S factor (2 levels); 'nsol' is a within-S factor (3 levels); 10 Ss nested in each level of 'attn'.
  • What is the set of plausible (mixed) models for use in analysing the data? What is the theoretical rationale for these models?
  • How to implement and test/compare models in R?


SLIDE 34

Preliminary Issues in Mixed Models

  • Subject (suid) is usually a random effects factor (unless we're interested in the roles of the specific Ss); is the variance, σa², of these random effects (e.g., of the overall mean or 'intercept' of each S) significantly different from 0? σa² is a parameter to be est.
  • We are interested in the effect associated with each level of T; thus T is a fixed effects factor. If T is categorical with 2 df, there is a fixed effect associated with each df; this is a parameter to be est. If T is quantitative, the linear effect of T (i.e., the 'slope') is a fixed effect with 1 df; and the quadratic effect of T (i.e., the 'non-linearity') is also a fixed effect with 1 df; these are parameters to be est.

  S    T = 1   T = 2   T = 3
  1      3       4       7
  2      5       5       6
  …      …       …       …
  8      1       2       5

  Source     df   SS   MS     F
  T           2        MST    MST/MSres
  S           7        MSS    MSS/MSres
  Residual   14        MSres
  Total      23

SLIDE 35

Preliminary Issues in Mixed Models

  • When we have repeated obs for each S, this introduces a correl across Ss between, e.g., score at T=1 and score at T=2. This correl complicates the definition of test statistics for the different null hypotheses.
  • lmer() is designed to take account of these correls in defining appropriate testing procedures.
  • Models with only fixed effects are analysed using lm(). Models with both fixed and random effects are called mixed models and are analysed using lmer() (or lme() or aov(), …).
  • Subject (or suid) is sometimes referred to as the 'grouping' factor. In linguistic studies, 'stimuli' (e.g., words) can also be a random effect if each 'speaker' responds to each word.
  • Subjects can vary randomly in their overall average score (or 'intercept'); but they can also vary in their sensitivity to T (or 'slope'). How to specify and test these models with lmer()?

  S    T = 1   T = 2   T = 3
  1      3       4       7
  2      5       5       6
  …      …       …       …
  8      1       2       5

SLIDE 36

Preliminary Issues in Mixed Models

  • The issue of p-values. We will discuss this issue more fully later on. Note that lme() gives p-values for fixed and random effects parameters, whereas lmer() gives t statistics, but no p-values, and only for fixed effects parameters. This is sometimes inconvenient (but never more than that!)
  • For 'intercept only' mixed models (the simplest type), pvals.fnc() {languageR} does give p-values. See also {lmerTest}.
  • The recommendation is that we use, e.g., anova(rs1, rs2), to compare nested models, one containing the variance parameter of interest, and the other omitting this parameter. This comparison yields a p-value. Or the rule-of-thumb: Reject if |t| > 2!

SLIDE 37

A simple example: 1-way design

  • In this simple, 1-way design, the results are the same whether we regard drug as a fixed- or a random-effects factor. This is far from true for more complex models.
  • Let us discuss the terms in the lmer() output.


  rs.lmer1 = lmer(score ~ (1 | drug), d0)
  print(summary(rs.lmer1))
  # To get p-values
  rs.mcmc = pvals.fnc(rs.lmer1, nsim = 10000, addPlot = T)
  print(rs.mcmc)

SLIDE 38

  Formula: score ~ 1 + (1 | drug)
  Data: d0
    AIC   BIC logLik deviance REMLdev
  104.4 107.9 -49.18    100.3   98.36

  Random effects:
  Groups   Name        Variance Std.Dev.
  drug     (Intercept) 3.1429   1.7728
  Residual             3.0238   1.7389
  Number of obs: 24, groups: drug, 3

  Fixed effects:
              Estimate Std. Error t value
  (Intercept)    2.417      1.083   2.231

SLIDE 39

  rs.mcmc = pvals.fnc(rs.lmer1, nsim = 10000, addPlot = T)

  $fixed
    Estimate MCMCmean HPD95lower HPD95upper pMCMC Pr(>|t|)
  1    2.417    2.431     0.1647      4.726 0.039   0.0357

  $random
  Groups   Name  StDev MCMCmedn MCMCmean HPD95lower HPD95upr
  drug     (Int) 1.773   1.4184   1.6403     0.0000   3.4063
  Residual       1.739   1.8363   1.8777     1.3567   2.5386

  [Comments on logLik, deviance, REMLdev, Highest Posterior Density (HPD)]

SLIDE 40

Assessing an Intervention

  • 50 patients enrolled in a weight loss program note their weights at the start of an intervention, at the end of Week 1, and at the end of Week 2. Let W0, W1 and W2 be the initial, end-Wk1, and end-Wk2 weights. Data in 'wtloss1.csv'.
  • R script in 'srepmeas3.r'
  • Weights at the 3 time-points are correlated, because each person gives 3 weights, i.e., 3 repeated measures. So it wd be incorrect to treat this design as a 1-way design with 'time' as 'group'.


SLIDE 41

  id  w1  w2  w3
   1  72  76  71
   2  80  76  65
   3  79  70  63
   .  ..  ..  ..
  48  56  51  57
  49  56  53  68
  50  72  60  55

How to test H0: E(W1) = E(W2) = E(W3), taking account of cor(Wi, Wj), i ≠ j?

  • Ans. Reshape the data into long form, and use lmer().

SLIDE 42

melt(), lm(), lmer()

  library(reshape2)  # provides melt()
  d1 = melt(d0, id.vars = c('id'),
            measure.vars = c('w1', 'w2', 'w3'),
            variable.name = 'time', value.name = 'weight')
  d1$id = factor(d1$id)  # The random effects factors have to be defined as factors
  d1$time = as.numeric(d1$time)  # Change 'time' from factor to quantitative
  # variable, 1:3; for use in ggplot(). But CHECK that this line of code
  # gives the correct coding for 'time'!
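What melt() does here can be sketched in plain Python: each wide row (id, w1, w2, w3) becomes three long rows (id, time, weight). The wide rows below are the first lines of the (hypothetical) data table:

```python
# Wide-to-long reshape, as performed by melt() in the R script.
wide = [(1, 72, 76, 71),
        (2, 80, 76, 65),
        (3, 79, 70, 63)]              # (id, w1, w2, w3) rows

long = [(sid, time, weight)
        for sid, *weights in wide
        for time, weight in enumerate(weights, start=1)]

print(long[:3])   # [(1, 1, 72), (1, 2, 76), (1, 3, 71)]
print(len(long))  # 9 = 3 ids x 3 times
```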


SLIDE 43

  id  w1  w2  w3
   1  72  76  71
   2  80  76  65
   3  79  70  63
   .  ..  ..  ..
  48  56  51  57
  49  56  53  68
  50  72  60  55

Data in long form, using melt():

  id  time  weight
   1    1      72
   2    1      80
   3    1      79
  ..   ..      ..
   1    2      57
   2    2      73
   3    2      68
  ..   ..      ..
   1    3      70
   2    3      82
   3    3      66
  ..   ..      ..
  50    3      82

  res2a = lmer(weight ~ time + (1 | id), d1)

SLIDE 44

Plot data for each Subject to gauge 'how' Ss vary: Is there appreciable variation in 'intercept', as well as in 'slope'?

  • Data has to be in long form for ggplot2

  g2 = (qplot(time, weight, facets = ~ id,
              geom = c('point', 'smooth'), method = lm, data = d1)
        + theme_bw() + ylim(40, 90))
  ggsave('wtloss1a.pdf', plot = g2)


SLIDE 45

[Figure: lattice of per-subject panels (Ss 1–50) plotting weight (y, 40–90) against time (x, 1–3), each with a fitted regression line.]

SLIDE 46

Mixed models analyses with lmer()

  • There appears to be appreciable variation in both 'intercept' and 'slope' across Ss. We start with 'intercept-only' mixed models, then consider an 'intercept + slope' model. ('slope' = linear effect of 'time' for each S. 'intercept' is approx the mean wt of each S.)

  res3 = lmer(weight ~ time + (1 | id), d1)
  print(summary(res3))
  res4 = lmer(weight ~ poly(time, 2) + (1 | id), d1)
  print(summary(res4))
  print(anova(res3, res4))

SLIDE 47

Discuss results!

  res3 = lmer(weight ~ time + (1 | id), d1)
  print(summary(res3))

  Random effects:
  Groups   Name        Variance Std.Dev.
  id       (Intercept) 37.795   6.1478
  Residual             39.188   6.2601
  Number of obs: 150, groups: id, 50

  Fixed effects:
              Estimate Std. Error t value
  (Intercept)   74.780      1.608   46.51
  time          -3.630      0.626   -5.80


SLIDE 48

Discuss results!

  res4 = lmer(weight ~ poly(time, 2) + (1 | id), d1)
  print(summary(res4))

  Random effects:
  Groups   Name        Variance Std.Dev.
  id       (Intercept) 37.662   6.1369
  Residual             39.588   6.2919
  Number of obs: 150, groups: id, 50

  Fixed effects:
                 Estimate Std. Error t value
  (Intercept)     67.5200     1.0085   66.95
  poly(time, 2)1 -36.3000     6.2919   -5.77
  poly(time, 2)2   0.1732     6.2919    0.03


SLIDE 49

Compare res3 & res4: Is the quadratic effect of 'time' sig?

  • No, because the coeff of poly(time, 2)2 in the lmer() output is not sig (t < 1).
  • For another answer: print(anova(res3, res4))

  Models:
  res3: weight ~ time + (1 | id)
  res4: weight ~ poly(time, 2) + (1 | id)
       Df    AIC    BIC  logLik  Chisq ChiDf Pr(>Chisq)
  res3  4 1049.9 1061.9 -520.94
  res4  5 1051.9 1066.9 -520.94 0.0024     1     0.9605

res4 is not sig better than res3, so keep simpler model, res3.
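The AIC column follows from the logLik column by AIC = −2·logLik + 2·Df; a quick check (Python) of the values in the anova() output:

```python
# res3 (Df = 4) and res4 (Df = 5) have the same logLik (-520.94), so
# their AICs differ by exactly 2 - the price of the unused quadratic term.
def aic(loglik, df):
    return -2 * loglik + 2 * df

print(round(aic(-520.94, 4), 1))   # 1049.9, as reported for res3
print(round(aic(-520.94, 5), 1))   # 1051.9, as reported for res4
```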


SLIDE 50

‘Intercept + slope’ model

  res5 = lmer(weight ~ time + (1 + time | id), d1)
  print(summary(res5))
  print(anova(res3, res5))

  Models:
  res3: weight ~ time + (1 | id)
  res5: weight ~ time + (1 + time | id)
       Df    AIC    BIC  logLik  Chisq ChiDf Pr(>Chisq)
  res3  4 1049.9 1061.9 -520.94
  res5  6 1033.8 1051.8 -510.88 20.132     2    4.3e-05 ***

The variation in 'slope' is sig, and this parameter shd be included in mixed models analyses of these data.


SLIDE 51

Summary of mixed models analyses

  • The individual-S plots show that the 'time' effect is approx linear for all Ss. A formal test of the relevant coefficient, and a formal model comparison using anova() show that the quadratic effect of 'time' is not sig.
  • The individual-S plots show appreciable variation in both 'intercept' and 'slope' across Ss. A formal model comparison using anova() shows that the variation in 'slope' is sig, and shd be included in any mixed models analysis of these data.


SLIDE 52

HW-5 using ‘kv0.csv’

  • The fictitious data set, 'kv0.csv', is based on Dr. Katerina Velanova's FYP data that she shared with us when she was a Psych 252 student.
  • I found this acknowledgement in an old handout: Data and analyses courtesy of Katerina Velanova, to whom Psy 252 owes a debt of gratitude [11/8/96].
  • Relevant R script: ‘skv1.r’
SLIDE 53

  • Ten subjects were run in a divided attention condition (attn = 1), and 10 different subjects were run in a focused attention condition (attn = 2). Each subject was then tested on a word task (anagrams) that had a unique solution (numsol = 1), two solutions (numsol = 2), or multiple solutions (numsol = 3). The dependent measure was a memory score (higher numbers reflect better performance).
  • This design involves both between- and within-factors and is probably the most useful design for psychologists.

SLIDE 54

  Attention  SubjID  Numsol=1  Numsol=2  Numsol=3
  Attn = 1      1        2         4         7
                2        3         4         5
                3        3         5         6
                4        5         7         5
                5        4         5         8
                6        5         5         6
                7        5         4.5       6
                8        5         7         8
                9        2         3         7
               10        6         5         6
  Attn = 2      1        6         5         6
                2        8         9         8
                3        6         5         9
                4        8         8         7
                5        8         8         7
                6        6         8         7
                7        7         7         6
                8        7         8         6
                9        5         6         6
               10        6         6         5

SLIDE 55

SLIDE 56

  # Scripts to analyse kv0.csv; in 'skv1.r'
  source('makerm.r')
  library(lme4)
  library(nlme)
  d0 = read.csv('kv0.csv', header = TRUE)
  con1 = cbind(c(-2,1,1), c(0,-1,1))
  # Convert d0 into long form with make.rm()
  # (Now we know how to use melt() {reshape2}!)

  dl1 = make.rm(constant = 1:2, repeated = 3:5, data = d0)
  colnames(dl1) = c('suid','atten','score','nsol')

SLIDE 57

  dl1$nsol = factor(dl1$nsol, labels = c('one','two','many'))
  dl1$atten = factor(dl1$atten, labels = c('div','foc'))
  dl1$suid = factor(dl1$suid)
  contrasts(dl1$atten) = c(-1,1)
  contrasts(dl1$nsol) = cbind(lin = c(-1,0,1), quad = c(-1,2,-1))

SLIDE 58

  rs.aov = aov(score ~ atten*nsol + Error(suid), dl1)
  print(summary(rs.aov))

  Error: suid
             Df  Sum Sq  Mean Sq  F value     Pr(>F)
  atten       1  42.504   42.504   17.598  0.0005442 ***
  Residuals  18  43.475    2.415

  Error: Within
              Df  Sum Sq  Mean Sq  F value    Pr(>F)
  nsol         2  14.408   7.2042   6.5909  0.003639 **
  atten:nsol   2  15.408   7.7042   7.0483  0.002612 **
  Residuals   36  39.350   1.0931

SLIDE 59

  rs.lme1 = lmer(score ~ atten*nsol + (1|suid), dl1)
  print(summary(rs.lme1))

  Linear mixed model fit by REML
  Formula: score ~ atten * nsol + (1 | suid)
  Data: dl1
    AIC   BIC logLik deviance REMLdev
  213.5 230.2 -98.73    185.1   197.5
  Random effects:
  Groups   Name        Variance Std.Dev.
  suid     (Intercept) 0.44074  0.66388
  Residual             1.09306  1.04549
  Number of obs: 60, groups: suid, 20
  Fixed effects:
                   Estimate Std. Error t value
  (Intercept)      5.958333   0.200628  29.698
  atten1           0.841667   0.200628   4.195
  nsollin          0.600000   0.165307   3.630
  nsolquad         0.008333   0.095440   0.087
  atten1:nsollin  -0.600000   0.165307  -3.630
  atten1:nsolquad  0.091667   0.095440   0.960

SLIDE 60

  rs.lme2a = lmer(score ~ atten * nsol + (1 + nsol|suid), dl1)
  print(summary(rs.lme2a))

  Linear mixed model fit by REML
  Formula: score ~ atten * nsol + (1 + nsol | suid)
  Data: dl1
    AIC   BIC logLik deviance REMLdev
  215.5 242.8 -94.77    176.4   189.5
  Random effects:
  Groups   Name        Variance  Std.Dev.  Corr
  suid     (Intercept) 0.663354  0.81447
           nsollin     0.459614  0.67795  -0.401
           nsolquad    0.069408  0.26345   0.494 -0.804
  Residual             0.425219  0.65209
  Number of obs: 60, groups: suid, 20

SLIDE 61

  Fixed effects:
                   Estimate Std. Error t value
  (Intercept)      5.958333   0.200636  29.697
  atten1           0.841667   0.200636   4.195
  nsollin          0.600000   0.183334   3.273
  nsolquad         0.008333   0.083751   0.100
  atten1:nsollin  -0.600000   0.183334  -3.273
  atten1:nsolquad  0.091667   0.083751   1.095

Describe the data!