Analysis of variance and regression May 13, 2008 Repeated - - PowerPoint PPT Presentation

analysis of variance and regression may 13 2008 repeated
SMART_READER_LITE
LIVE PREVIEW

Analysis of variance and regression May 13, 2008 Repeated - - PowerPoint PPT Presentation

Analysis of variance and regression May 13, 2008 Repeated measurements over time Presentation of data Traditional ways of analysis Variance component model (the dogs revisited) Random regression Baseline considerations Lene


slide-1
SLIDE 1

Analysis of variance and regression May 13, 2008

slide-2
SLIDE 2

Repeated measurements over time

  • Presentation of data
  • Traditional ways of analysis
  • Variance component model

(the dogs revisited)

  • Random regression
  • Baseline considerations
slide-3
SLIDE 3

Lene Theil Skovgaard,

  • Dept. of Biostatistics,

Institute of Public Health, University of Copenhagen e-mail: L.T.Skovgaard@biostat.ku.dk http://staff.pubhealth.ku.dk/~pd/regression08_1

slide-4
SLIDE 4

Repeated measurements, May 2008 1

Traditional presentation of longitudinal data: Ex: Aspirin absorption for healthy and ill subjects (Matthews et.al.,1990)

Comparison of groups for each time:

  • mass significance

problem

  • tests are not

independent

  • interpretation

may be difficult

slide-5
SLIDE 5

Repeated measurements, May 2008 2

What is the purpose of the investigation?

  • Description of time course
  • Comparison of groups

– in which respect? level, trend,...

  • verall pattern
slide-6
SLIDE 6

Repeated measurements, May 2008 3

Why is this difficult? – or at least different from usual analyses

  • We have several measurements on each individual

– traditional independence assumption is violated – repeated observations on the same individual are correlated (look alike) – ignoring this correlation may lead to bias, wrong standard error and therefore potentially misleading conclusions

  • Time course may be quite irregular, with no obvious structure,

to be treated as a class-variable (using many parameters) in ANOVA-type models (variance component models)

  • Time course may vary between individuals

Random regression

slide-7
SLIDE 7

Repeated measurements, May 2008 4

Notation from multi-level models: level unit covariate 1 single observations time effects 2 individuals treatment effects If we fail to take this correlation into account, we will experience:

  • possible bias in the mean value structure
  • low efficiency (type 2 error) for evaluation of level 1 covariates

(time-related effects)

  • too small standard errors (type 1 error) for estimates of level 2

effects (treatments)

slide-8
SLIDE 8

Repeated measurements, May 2008 5

Possible bias?

Individual time courses Average curve sometimes referred to as the healthy worker effect

slide-9
SLIDE 9

Repeated measurements, May 2008 6

Missing values

  • MCAR

Missing completely at random

  • MAR

Missing at random

  • may depend on past observations
  • NR

Informative missing (non-random)

  • depends on the missing value itself
slide-10
SLIDE 10

Repeated measurements, May 2008 7

Level 1 covariates (unit: single observations), i.e.

  • Time itself
  • Covariates varying with time:

blood pressure, heart rate, age If correlation is not taken into account, we ignore the paired situation, leading to low efficiency, i.e. too large P-values (type 2 error) Effects may go undetected!

slide-11
SLIDE 11

Repeated measurements, May 2008 8

Level 2 covariates (unit: individuals), i.e.

  • Treatment
  • Gender, age

If correlation is ignored, we act as if we have (a lot) more information than we actually have, leading to too small P-values (type 1 error) ’Noise’ may be taken to be real effects!

slide-12
SLIDE 12

Repeated measurements, May 2008 9

Average curves may hide important structures!

  • They give no indication of the variation in the time profiles
  • Comparisons between groups

should not be performed for each time point separately

  • Comparisons between time points cannot be judged from the

curves (they are paired)

slide-13
SLIDE 13

Repeated measurements, May 2008 10

The model must describe the characteristic differences between individuals, and the rest (noise, error) should be of an unsystematic, random nature.

  • Do not average over individual profiles, unless these have

identical shapes, i.e. only shifts in level are seen between individuals.

  • Alternative:

Calculate individual characteristics

slide-14
SLIDE 14

Repeated measurements, May 2008 11

Individual time profiles (spaghettiogram) - divided into groups Do we see time profiles of identical shape? Are the averages representative?

slide-15
SLIDE 15

Repeated measurements, May 2008 12

Commonly used characteristics

  • The response for selected times, e.g. endpoint
  • Average over a specific period of time
  • The slope, perhaps for a specific period
  • Peak value
  • Time to peak
  • The area under the curve (AUC).
  • A measure of cyclic behaviour.

These are analysed as new observations.

slide-16
SLIDE 16

Repeated measurements, May 2008 13

Ex: Aspirin

  • time to peak
  • peak value

Conclusion: P=0.02 for identity of peak values. Quantifications!

slide-17
SLIDE 17

Repeated measurements, May 2008 14

Example: 2 groups of dogs (5 resp. 6 dogs). Average profiles:

  • f osmolality, measured 4 times

(including treatments along the way)

slide-18
SLIDE 18

Repeated measurements, May 2008 15

Do we have ’identical’ repetitions (except for level)?

slide-19
SLIDE 19

Repeated measurements, May 2008 16

Model control

Residual plot for 2-way ANOVA in (dog, treatment) We see a clear trumpet shape, because dogs with a high level also vary more than dogs with a low level. Multiplicative structure Solution: Make a logarithmic transformation!

slide-20
SLIDE 20

Repeated measurements, May 2008 17

Profiles on logarithmic scale, with corresponding residual plot:

slide-21
SLIDE 21

Repeated measurements, May 2008 18

Multilevel model structure: level/niveau 1 2 unit single measurements individuals variation within individuals between individuals σ2

W

ω2

B

covariates x z time, grp*time grp Multilevel models are part of the broader class of models: variance component models (which are not necessarily hierarchical)

slide-22
SLIDE 22

Repeated measurements, May 2008 19

Two-level model:

  • Observations Ygdt (group, dog, time)
  • Random dog-level, Var(agd) = ω2

B

  • Residual variation, within dogs, Var(εgdt) = σ2

W

  • Systematic effect of time and grp

proc mixed data=dog; class grp time_no dog; model losmol=grp time_no grp*time_no / ddfm=satterth; random dog(grp); run;

slide-23
SLIDE 23

Repeated measurements, May 2008 20

This model assumes the socalled compound symmetry, i.e. that all measurements on the same individual are equally correlated: Corr(Ygdt1, Ygdt2) = ρ = ω2

B

ω2

B + σ2 W

This means that the distance in time is not taken into account!!

slide-24
SLIDE 24

Repeated measurements, May 2008 21

Two-level model with random dog level:

Class Levels Values grp 2 1 2 time_no 4 1 2 3 4 dog 11 1 2 3 4 5 6 7 8 9 10 11 Covariance Parameter Estimates Standard Z Cov Parm Estimate Error Value Pr Z dog(grp) 0.06587 0.03532 1.86 0.0311 Residual 0.03554 0.009672 3.67 0.0001 Type 3 Tests of Fixed Effects Num Den Effect DF DF F Value Pr > F grp 1 9 2.85 0.1257 time_no 3 27 21.35 <.0001 grp*time_no 3 27 2.50 0.0805

P=0.08 for test of interaction, i.e. no convincing indication of this.

slide-25
SLIDE 25

Repeated measurements, May 2008 22

Factor diagram:

[I] = [Dog ∗ Time] Grp ∗ Time [Dog] Time Grp

❍❍❍ ❍ ❥ ✟✟✟ ✟ ✯ ✑✑✑✑ ✑ ✸ ✲ ✲ We have used the notation [ ] for the random effects, corresponding to variance components. We may note the following:

  • The effect of Grp*Time is evaluated against Dog*Time
  • If Grp*Time is not considered significant, we thereafter evaluate

– Time against Dog*Time – Grp against Dog(Grp)

slide-26
SLIDE 26

Repeated measurements, May 2008 23

The variance component model with random dog level specifies the covariance structure:

B B B B @ ω2

B + σ2 W

ω2

B

ω2

B

ω2

B

ω2

B

ω2

B + σ2 W

ω2

B

ω2

B

ω2

B

ω2

B

ω2

B + σ2 W

ω2

B

ω2

B

ω2

B

ω2

B

ω2

B + σ2 W

1 C C C C A = (ω2

B+σ2 W )

B B B B @ 1 ρ ρ ρ ρ 1 ρ ρ ρ ρ 1 ρ ρ ρ ρ 1 1 C C C C A

called the compound symmetry structure. The correlation ρ is here estimated to ρ = Corr(Ygdt1, Ygdt2) = ω2

B

ω2

B + σ2 W

≈ 0.06587 0.06587 + 0.03554 = 0.65

slide-27
SLIDE 27

Repeated measurements, May 2008 24

Note, that the specification ’random dog(grp);’ can be written in two other ways: random intercept / subject=dog(grp); repeated time / type=CS subject=dog(grp); In the following, we shall see generalisations

  • f the constructions above.
slide-28
SLIDE 28

Repeated measurements, May 2008 25

Compound symmetry analysis

proc mixed data=dog; class grp time dog; model losmol=grp time grp*time / ddfm=satterth; repeated time / type=cs subject=dog(grp) rcorr; run;

Estimated R Correlation Matrix for dog(grp) 1 1 Row Col1 Col2 Col3 Col4 1 1.0000 0.6496 0.6496 0.6496 2 0.6496 1.0000 0.6496 0.6496 3 0.6496 0.6496 1.0000 0.6496 4 0.6496 0.6496 0.6496 1.0000 Covariance Parameter Estimates Cov Parm Subject Estimate CS dog(grp) 0.06587 Residual 0.03554 Fit Statistics

  • 2 Res Log Likelihood

14.8 AIC (smaller is better) 18.8 Type 3 Tests of Fixed Effects Num Den Effect DF DF F Value Pr > F grp 1 9 2.85 0.1257 time_no 3 27 21.35 <.0001 grp*time_no 3 27 2.50 0.0805

slide-29
SLIDE 29

Repeated measurements, May 2008 26

The option ddfm=satterth (- or kenwardrogers):

  • When the distributions are exact, they have no effect

– in balanced situations

  • When approximations are necessary, these are considered best

– in unbalanced situations, i.e for almost all observational designs – in case of missing observations

  • It may give rise to fractional degrees of freedom
  • The computations may require a little more time,

but in most cases this will not be noticable

  • When in doubt, use it!
slide-30
SLIDE 30

Repeated measurements, May 2008 27

Since the interaction was not significant, we omit it from the model:

Covariance Parameter Estimates Standard Z Cov Parm Estimate Error Value Pr Z dog(grp) 0.06453 0.03534 1.83 0.0339 Residual 0.04088 0.01056 3.87 <.0001 Solution for Fixed Effects Standard Effect grp time_no Estimate Error DF t Value Pr > |t| Intercept 0.5422 0.1235 9 4.39 0.0017 grp 1 0.2795 0.1656 9 1.69 0.1257 grp 2 . . . . time_no 1 0.1215 0.08621 30 1.41 0.1691 time_no 2

  • 0.2173

0.08621 30

  • 2.52

0.0173 time_no 3

  • 0.4608

0.08621 30

  • 5.35

<.0001 time_no 4 . . . . Type 3 Tests of Fixed Effects Num Den Effect DF DF F Value Pr > F grp 1 9 2.85 0.1257 time_no 3 30 17.66 <.0001

slide-31
SLIDE 31

Repeated measurements, May 2008 28

The variance component model (compound symmetry) with random dog level specifies the covariance structure:

(ω2

B + σ2 W )

B B B B B @ 1 ρ ρ ρ ρ 1 ρ ρ ρ ρ 1 ρ ρ ρ ρ 1 1 C C C C C A

But: The assumption of equal correlation for all pairs of observations taken on the same individual is not necessarily reasonable! Observations taken close to each other in time will often be more closely correlated than observations taken further apart!

slide-32
SLIDE 32

Repeated measurements, May 2008 29

In the dog example, the empirical correlation matrix is        1 0.60 0.60 0.48 0.60 1 0.73 0.63 0.60 0.73 1 0.95 0.48 0.63 0.95 1        Rather large differences are seen between individual correlations. So what?

slide-33
SLIDE 33

Repeated measurements, May 2008 30

Unstructured covariance If we do not assume any special structure for the covariance, we may let it be arbitrary = unstructured This is done in MIXED by using ’type=UN’ and remembering the

  • ption hlm:

proc mixed data=dog; class grp dog time_no; model losmol=grp time_no grp*time_no / ddfm=satterth; repeated time_no / type=UN hlm subject=dog(grp) rcorr; run;

slide-34
SLIDE 34

Repeated measurements, May 2008 31 Estimated R Correlation Matrix for dog(grp) 1 1 Row Col1 Col2 Col3 Col4 1 1.0000 0.6010 0.5978 0.4817 2 0.6010 1.0000 0.7310 0.6336 3 0.5978 0.7310 1.0000 0.9464 4 0.4817 0.6336 0.9464 1.0000 Fit Statistics

  • 2 Res Log Likelihood

2.3 AIC (smaller is better) 22.3 Type 3 Hotelling-Lawley-McKeon Statistics Num Den Effect DF DF F Value Pr > F time 3 7 90.97 <.0001 grp*time_no 3 7 4.91 0.0381

slide-35
SLIDE 35

Repeated measurements, May 2008 32

Advantages with unstructured covariance

  • We do not force a wrong covariance structure upon our
  • bservations.
  • We gain some insight in the actual structure of the covariance.

Drawbacks of the unstructured covarianc

  • We use quite a lot of parameters to describe the covariance
  • structure. The result may therefore be unstable.
  • It cannot be used for small data sets
  • It can only be used in case of balanced data

(all subjects have to be measured at identical times) Can we do something ’in between’?

slide-36
SLIDE 36

Repeated measurements, May 2008 33

Comparison of models, using the likelihood

  • Default likelihood is the REML-likelihood, where the mean

value structure has been ’eliminated’

  • The traditional likelihood may be obtained using an extra option:

proc mixed method=ml;

  • Use differences in −2 log L (∆ = −2 log Q) and compare to χ2

with degrees of freedom equal to the difference in parameters

  • Comparison of covariance structures:

Use either of the two likelihoods

  • Comparison of mean value structures:

Use only the traditional likelihood (ML) Comparison of compound symmetry and unstructured covariance:

−2 log Q = 14.8 − 2.3 = 12.5 ∼ χ2(10 − 2) = χ2(8) ⇒ P = 0.13.

slide-37
SLIDE 37

Repeated measurements, May 2008 34

Autoregressive structure of first order In case of equidistant times, this specifies the following covariance structure σ2        1 ρ ρ2 ρ3 ρ 1 ρ ρ2 ρ2 ρ 1 ρ ρ3 ρ2 ρ 1        i.e. the correlation decreases (in powers) with the distance between

  • bservations.

The non-equidistant analogue is Corr(Ygdt1, Ygdt2) = ρ|t1−t2|

slide-38
SLIDE 38

Repeated measurements, May 2008 35

Autoregressive structure of first order (TYPE=AR(1))

Estimated R Correlation Matrix for dog(grp) 1 1 Row Col1 Col2 Col3 Col4 1 1.0000 0.7950 0.6321 0.5025 2 0.7950 1.0000 0.7950 0.6321 3 0.6321 0.7950 1.0000 0.7950 4 0.5025 0.6321 0.7950 1.0000 Covariance Parameter Estimates Standard Z Cov Parm Subject Estimate Error Value Pr Z AR(1) dog(dog) 0.7950 0.09035 8.80 <.0001 Residual 0.1114 0.04188 2.66 0.0039 Fit Statistics

  • 2 Res Log Likelihood

9.8 AIC (smaller is better) 13.8 Type 3 Tests of Fixed Effects Num Den Effect DF DF F Value Pr > F grp 1 8.89 2.49 0.1497 time_no 3 25.6 29.97 <.0001 grp*time_no 3 25.6 2.94 0.0522

slide-39
SLIDE 39

Repeated measurements, May 2008 36

Note: Comparison of models with different covariance structures using a χ2-test on −2 log Q, the difference between −2 log L’s requires, that the models are nested This is not the case for CS and AR(1)! Therefore, we have to compare both of them with the model which combines the two covariance structures:

proc mixed data=dog; class grp dog time_no; model losmol = grp time_no grp*time_no / ddfm=satterth; random intercept / subject=dog(grp) vcorr; repeated time_no / type=AR(1) subject=dog(grp); run;

slide-40
SLIDE 40

Repeated measurements, May 2008 37

Estimated V Correlation Matrix for dog(grp) 1 1 Row Col1 Col2 Col3 Col4 1 1.0000 0.7930 0.6381 0.5222 2 0.7930 1.0000 0.7930 0.6381 3 0.6381 0.7930 1.0000 0.7930 4 0.5222 0.6381 0.7930 1.0000 Covariance Parameter Estimates Cov Parm Subject Estimate dog(grp) 0.01966 AR(1) dog(grp) 0.7483 Residual 0.09103 Fit Statistics

  • 2 Res Log Likelihood

9.8 AIC (smaller is better) 15.8 Type 3 Tests of Fixed Effects Num Den Effect DF DF F Value Pr > F grp 1 8.88 2.49 0.1493 time_no 3 17.2 29.53 <.0001 grp*time_no 3 17.2 2.93 0.0633

slide-41
SLIDE 41

Repeated measurements, May 2008 38

Comparison of covariance structures

cov. Model

  • 2 log L

par. ∆ df P CS=random dog 14.8 2 5.0 1 0.025 AR(1) 9.8 2 0.0 1 1.00 both 9.8 3 7.5 7 0.38 UN 2.3 10

  • Conclusions?
  • The autoregressive structure is probably the best!
  • Our data set is too small!
slide-42
SLIDE 42

Repeated measurements, May 2008 39

What, if we had had double or triple measurements at each time?

  • If we always have the same number of repetitions,

the correct and optimal approach is to analyze averages

  • If the number of repetitions vary, analysis of averages may still

be valid (depends on the reason for the unbalance), although not optimal

  • The optimal approach is to modify the random-statement to:

random dog dog*time_no;

slide-43
SLIDE 43

Repeated measurements, May 2008 40

Actually, the times are not equidistant! Then what?? The non-equidistant analogue to the autoregressive structure is Corr(Ygdt1, Ygdt2) = ρ|t1−t2| which is written as TYPE=SP(POW)(time) For technical reasons, we have to rescale time to hours=time/60

proc mixed covtest data=dog; class grp hours dog; model losmol=grp hours grp*hours / s ddfm=satterth; repeated hours / subject=dog(grp) type=sp(exp)(hours) r; run;

slide-44
SLIDE 44

Repeated measurements, May 2008 41

Class Level Information Class Levels Values grp 2 1 2 hours 4 0.8333333333 1.8333333333 2.8333333333 4.8333333333 dog 11 1 2 3 4 5 6 7 8 9 10 11 Estimated R Matrix for dog(grp) 1 1 Row Col1 Col2 Col3 Col4 1 0.1138 0.09177 0.07400 0.04811 2 0.09177 0.1138 0.09177 0.05967 3 0.07400 0.09177 0.1138 0.07400 4 0.04811 0.05967 0.07400 0.1138 Type 3 Tests of Fixed Effects Num Den Effect DF DF F Value Pr > F grp 1 9.31 2.56 0.1433 hours 3 25.5 23.23 <.0001 grp*hours 3 25.5 2.78 0.0614

slide-45
SLIDE 45

Repeated measurements, May 2008 42

Example: Calcium supplement for adolescent women, to improve the rate of bone gain A total of 112 11-year old girls were randomized to receive either calcium or placebo. Outcome: BMD=bone mineral density, in

g cm2 ,

measured every 6 months (5 visits)

slide-46
SLIDE 46

Repeated measurements, May 2008 43

Factor diagram:

[I] = [Girl ∗ V isit] Grp ∗ V isit [Girl] V isit Grp

❍❍❍ ❍ ❥ ✟✟✟ ✟ ✯ ✑✑✑✑ ✑ ✸ ✲ ✲

Two-level model with:

  • Observations Ygit (group, girl=individual, visit)
  • Random girl-level, Var(agi) = ω2

B

  • Residual variation, within girls, Var(εgit) = σ2

W

slide-47
SLIDE 47

Repeated measurements, May 2008 44

Variance component model: Ygit = µ + αg + βt + γgt + agi + εgit where Var(agi) = ω2

B,

Var(εgit) = σ2

W

Like previously, we have assumed compound symmetry, i.e. that all measurements on the same girl are equally correlated: Corr(Ygit1, Ygit2) = ρ = ω2

B

ω2

B + σ2 W

slide-48
SLIDE 48

Repeated measurements, May 2008 45

Empirical correlation structure:

Row COL1 COL2 COL3 COL4 COL5 1 1.00000000 0.96987049 0.94138162 0.92499715 0.89865454 2 0.96987049 1.00000000 0.97270895 0.95852788 0.93987185 3 0.94138162 0.97270895 1.00000000 0.98090996 0.95919348 4 0.92499715 0.95852788 0.98090996 1.00000000 0.97553849 5 0.89865454 0.93987185 0.95919348 0.97553849 1.00000000

Is compound symmetry reasonable? Other possibilities:

  • Unstructured: T (T +1)

2

covariance parameters

  • ’patterned’, e.g. an autoregressive structure
  • random regression
slide-49
SLIDE 49

Repeated measurements, May 2008 46

Compound symmetry results for the calcium example:

Covariance Parameter Estimates (REML) Cov Parm Estimate GIRL(GRP) 0.00443925 Residual 0.00023471 Tests of Fixed Effects Source NDF DDF Type III F Pr > F GRP 1 110 2.63 0.1078 VISIT 4 382 619.42 0.0001 GRP*VISIT 4 382 5.30 0.0004

No doubt, we see an interaction GRP*VISIT, or?

slide-50
SLIDE 50

Repeated measurements, May 2008 47

Autoregressive covariance structure: σ2           1 ρ ρ2 ρ3 ρ4 ρ 1 ρ ρ2 ρ3 ρ2 ρ 1 ρ ρ2 ρ3 ρ2 ρ 1 ρ ρ4 ρ3 ρ2 ρ 1           Results:

Covariance Parameter Estimates (REML) Cov Parm Subject Estimate AR(1) GIRL(GRP) 0.97083335 Residual 0.00441242 Tests of Fixed Effects Source NDF DDF Type III F Pr > F GRP 1 110 2.74 0.1005 VISIT 4 381 233.91 0.0001 GRP*VISIT 4 381 2.86 0.0232

slide-51
SLIDE 51

Repeated measurements, May 2008 48

Comparison of test results for the test of no interaction GRP*VISIT: Covariance structure Test statistic ∼ distribution P value Independence 0.35 ∼ F(4,491) 0.84 Compound symmetry 5.30 ∼ F(4,382) 0.0004 Autoregressive 5.30 ∼ F(4,381) 0.023 Unstructured 2.72 ∼ F(4,107) 0.034

slide-52
SLIDE 52

Repeated measurements, May 2008 49

Predicted profiles for the unstructured covariance:

  • The

evolution

  • ver

time looks pretty linear

  • Include

time=visit as a quantitative covariate?

  • What about the

baseline difference?

slide-53
SLIDE 53

Repeated measurements, May 2008 50

Test of linear time trend (time=visit):

proc mixed data=calcium; class grp girl visit; model bmd=grp time grp*time visit grp*visit / ddfm=satterth; repeated visit / type=UN subject=girl(grp) r; run; proc mixed data=calcium; class grp girl visit; model bmd=grp time grp*time visit / s ddfm=satterth; repeated visit / type=UN subject=girl(grp) r; run;

Type 3 Tests of Fixed Effects Num Den Effect DF DF F Value Pr > F grp 1 110 0.36 0.5485 time . . . time*grp . . . visit 3 97.7 3.61 0.0160 grp*visit 3 97.7 1.03 0.3849

slide-54
SLIDE 54

Repeated measurements, May 2008 51

Solution for Fixed Effects Standard Effect grp visit Estimate Error DF t Value Pr > |t| Intercept 0.8699 0.01220 138 71.29 <.0001 grp C 0.006565 0.01131 109 0.58 0.5629 grp P . . . . time 0.01755 0.001825 118 9.62 <.0001 time*grp C 0.004330 0.001520 97.2 2.85 0.0054 time*grp P . . . . visit 1

  • 0.01765

0.006013 95.8

  • 2.94

0.0042 visit 2

  • 0.01384

0.004246 95.1

  • 3.26

0.0016 visit 3

  • 0.00680

0.002370 93.6

  • 2.87

0.0050 visit 4 . . . . visit 5 . . . . Type 3 Tests of Fixed Effects Num Den Effect DF DF F Value Pr > F grp 1 109 0.34 0.5629 time . . . time*grp 1 97.2 8.12 0.0054 visit 3 98.8 3.65 0.0151

There is some deviation from linearity (P=0.0151)

slide-55
SLIDE 55

Repeated measurements, May 2008 52

The time course is reasonably linear, but maybe the girls have have different growth rates (slopes)? If we let Ygpt denote BMD for the p’th girl (in the g’th group) at time t (t=1,· · · ,5), we have the model: ygpt = agp + bgpt + εgpt, εgpt ∼ N(0, σ2

W )

slide-56
SLIDE 56

Repeated measurements, May 2008 53

We generalize the idea of a random level to Random regression We let each individual (girl) have

  • her own level agp
  • her own slope bgp
slide-57
SLIDE 57

Repeated measurements, May 2008 54

But we bind these individual ’parameters’ (agp and bgp) together by normal distributions   agp bgp   ∼ N2     αg βg   , G   G =   τ 2

a

ω ω τ 2

b

  =   τ 2

a

ρτaτb ρτaτb τ 2

b

  G describes the population variation of the lines, i.e. the inter-individual variation.

slide-58
SLIDE 58

Repeated measurements, May 2008 55

We estimate in this model by writing:

proc mixed covtest data=calcium; class grp girl; model bmd=grp time time*grp / ddfm=satterth s; random intercept time / type=un subject=girl(grp) g v vcorr; run;

Estimated G Matrix Row Effect grp girl Col1 Col2 1 Intercept C 101 0.004105 3.733E-6 2 time C 101 3.733E-6 0.000048 Estimated V Matrix for girl(grp) 101 C Row Col1 Col2 Col3 Col4 Col5 1 0.004285 0.004211 0.004263 0.004314 0.004366 2 0.004211 0.004435 0.004410 0.004509 0.004608 3 0.004263 0.004410 0.004681 0.004703 0.004850 4 0.004314 0.004509 0.004703 0.005022 0.005092 5 0.004366 0.004608 0.004850 0.005092 0.005459

slide-59
SLIDE 59

Repeated measurements, May 2008 56

Estimated V Correlation Matrix for girl(grp) 101 C Row Col1 Col2 Col3 Col4 Col5 1 1.0000 0.9660 0.9518 0.9300 0.9027 2 0.9660 1.0000 0.9677 0.9553 0.9364 3 0.9518 0.9677 1.0000 0.9700 0.9594 4 0.9300 0.9553 0.9700 1.0000 0.9725 5 0.9027 0.9364 0.9594 0.9725 1.0000 Covariance Parameter Estimates Standard Z Cov Parm Subject Estimate Error Value Pr Z UN(1,1) girl(grp) 0.004105 0.000575 7.13 <.0001 UN(2,1) girl(grp) 3.733E-6 0.000053 0.07 0.9435 UN(2,2) girl(grp) 0.000048 8.996E-6 5.30 <.0001 Residual 0.000125 0.000010 11.99 <.0001 Fit Statistics

  • 2 Res Log Likelihood
  • 2341.6

AIC (smaller is better)

  • 2333.6
slide-60
SLIDE 60

Repeated measurements, May 2008 57

Solution for Fixed Effects Standard Effect grp Estimate Error DF t Value Pr > |t| Intercept 0.8471 0.008645 110 97.98 <.0001 grp C 0.007058 0.01234 110 0.57 0.5685 grp P . . . . time 0.02242 0.001098 95.8 20.42 <.0001 time*grp C 0.004494 0.001571 96.4 2.86 0.0052 time*grp P . . . . Type 3 Tests of Fixed Effects Num Den Effect DF DF F Value Pr > F grp 1 110 0.33 0.5685 time 1 96.4 985.55 <.0001 time*grp 1 96.4 8.18 0.0052

Thus, we find an extra increase in BMD of 0.0045(0.0016) g pr. cm3

  • pr. half year, when giving calcium supplement.
slide-61
SLIDE 61

Repeated measurements, May 2008 58

Note concerning MIXED-notation

  • It is necessary to use TYPE=UN in the RANDOM-statement

in order to allow intercept and slope to be arbitrarily correlated

  • Default option in RANDOM is TYPE=VC, which only specifies

variance components with different variances

  • If TYPE=UN is omitted, we may experience convergence problems

and sometimes totally incomprehensible results. In this particular case, the correlation between intercept and slope is not that impressive (intercept is not completely out of range in this example).

slide-62
SLIDE 62

Repeated measurements, May 2008 59

It turns out, that

  • the girls are only seen approximately twice a year

the actual dates are available and are translated into ctime, the internal date representation in SAS, denoting days since ....

We can no longer use the construction type=UN, but still the random-statement (CS). A lot of other covariance structures will still be possible, e.g. the autoregressive type=AR(1)

  • the girls were not precisely 11 years at the first visit

As a covariate, we ought to have the specific age of the girl, but unfortunately, these are not available.

slide-63
SLIDE 63

Repeated measurements, May 2008 60

If we use the newly constructed ctime:

proc mixed covtest data=calcium; class grp girl; model bmd=grp ctime ctime*grp / ddfm=satterth s; random intercept ctime / type=un subject=girl(grp) g; run;

Iteration History Iteration Evaluations

  • 2 Res Log Like

Criterion 1

  • 1221.35800531

1 2

  • 2316.64715219

0.02023229 2 1

  • 2316.64847895

0.02011117 3 1

  • 2316.64847962

0.02010938 4 1

  • 2316.64848338

0.02010936 5 1

  • 2316.64852100

0.02010918 46 1

  • 2317.30142017

0.01737561 47 1

  • 2317.30142024

0.01737561 48 1

  • 2317.30142030

0.01737561 49 1

  • 2317.30142036

0.01737561 50 1

  • 2317.30142043

0.01737561 WARNING: Did not converge.

slide-64
SLIDE 64

Repeated measurements, May 2008 61

The variablen ctime has much too large values, with a very small range, and we get numerical instability. We normalise, to approximate age or age11: age=(ctime-11475)/365.25+12; age11=age-11;

Variable N Mean Std Dev Minimum Maximum

  • ctime

501 11475.08 258.4199017 11078.00 11931.00 bmd 501 0.9219202 0.0767447 0.7460000 1.1260000 visit 560 3.0000000 1.4154779 1.0000000 5.0000000 age 501 12.0002186 0.7075151 10.9130732 13.2484600 age11 501 1.0002186 0.7075151

  • 0.0869268

2.2484600

slide-65
SLIDE 65

Repeated measurements, May 2008 62

Random regression, covariate age: ygpt = agp + bgp(age-11) + εgpt

slide-66
SLIDE 66

Repeated measurements, May 2008 63

Random regression, using actual age (age11=age-11):

proc mixed covtest data=calcium; class grp girl; model bmd=grp age11 age11*grp / ddfm=satterth s outpm=predicted_mean; random intercept age11 / type=un subject=girl(grp) g vcorr; run;

Estimated G Matrix Row Effect grp girl Col1 Col2 1 Intercept C 101 0.004215 0.000095 2 age11 C 101 0.000095 0.000180 Estimated V Correlation Matrix for girl(grp) 101 C Row Col1 Col2 Col3 Col4 Col5 1 1.0000 0.9664 0.9537 0.9321 0.9056 2 0.9664 1.0000 0.9687 0.9566 0.9385 3 0.9537 0.9687 1.0000 0.9697 0.9590 4 0.9321 0.9566 0.9697 1.0000 0.9723 5 0.9056 0.9385 0.9590 0.9723 1.0000

slide-67
SLIDE 67

Repeated measurements, May 2008 64

Covariance Parameter Estimates Standard Z Cov Parm Subject Estimate Error Value Pr Z UN(1,1) girl(grp) 0.004215 0.000580 7.26 <.0001 UN(2,1) girl(grp) 0.000095 0.000104 0.91 0.3617 UN(2,2) girl(grp) 0.000180 0.000034 5.21 <.0001 Residual 0.000124 0.000010 12.01 <.0001 Solution for Fixed Effects Standard Effect grp Estimate Error DF t Value Pr > |t| Intercept 0.8667 0.008688 110 99.75 <.0001 grp C 0.01113 0.01240 110 0.90 0.3715 grp P . . . . age11 0.04529 0.002152 96 21.05 <.0001 age11*grp C 0.008891 0.003076 96.6 2.89 0.0048 age11*grp P . . . .

In this model, we quantify the effect of a calcium supplement to 0.0089 (0.0031) g per cm3 per year.

slide-68
SLIDE 68

Repeated measurements, May 2008 65

Results from random regression: Group level at age 11 slope P 0.8667 (0.0087) 0.0453 (0.0022) C 0.8778 (0.0088) 0.0542 (0.0022) difference 0.0111 (0.0124) 0.0089 (0.0031) P 0.37 0.0048

slide-69
SLIDE 69

Repeated measurements, May 2008 66

Comparison of slopes for different covariance structures: Covariance −2 log L cov.par. AIC Difference structure

  • f slopes

P Independence

  • 1245.0

1

  • 1243.0

0.0094 (0.0086) 0.27 Compound

  • 2251.7

2

  • 2247.7

0.0089 (0.0020) < 0.0001 symmetry Exponential

  • 2372.0

2

  • 2368.0

0.0094 (0.0032) 0.0038 (autoregressive) Random

  • 2350.1

4

  • 2342.1

0.0089 (0.0031) 0.0048 regression

slide-70
SLIDE 70

Repeated measurements, May 2008 67

Predicted values from random regression It looks as if there is a difference right from the start (although we have previously seen this to be insignificant, P=0.37). Baseline adjustment?

slide-71
SLIDE 71

Repeated measurements, May 2008 68

It the first visit is a baseline measurement:

  • The two groups are known to be equal at baseline
  • To include this measurement in the comparison between groups

may therefore weaken a possible difference between these (type 2 error)

  • Dissimilarities may be present in small studies
  • For ’slowly varying’ outcomes, even a small difference may

produce non-treatment related differences, i.e. bias

slide-72
SLIDE 72

Repeated measurements, May 2008 69

Approaches for handling baseline differences:

  • Use follow-up data only (exclude baseline from analysis)
  • only reasonable if correlation between repeated measurements

is very low

  • Subtract baseline from successive measurements
  • only reasonable if correlation between repeated measurements

is very high

  • Use baseline measurement as a covariate
  • may be used for any degree of correlation
slide-73
SLIDE 73

Repeated measurements, May 2008 70

Baseline included as a covariate

  • will hardly change the results for the slopes

– since these are within-individual quantities A small change is expected because of the exclusion of visit 1 from the analysis.

  • may affect the difference between groups at fixed ages

– e.g. endpoint age of 13 years

slide-74
SLIDE 74

Repeated measurements, May 2008 71

Excluding baseline (4 visits only), without baseline as covariate:

proc mixed covtest noclprint data=calcium; where visit>1; class grp girl; model bmd=grp age13 grp*age13 / ddfm=satterth s; random intercept age13 / type=un subject=girl(grp) g; run;

Solution for Fixed Effects Standard Effect grp Estimate Error DF t Value Pr > |t| Intercept 0.9574 0.009721 102 98.49 <.0001 grp C 0.02474 0.01383 102 1.79 0.0765 grp P . . . . age13 0.04634 0.002288 92.3 20.25 <.0001 age13*grp C 0.007456 0.003277 92.5 2.28 0.0252 age13*grp P . . . .

Estimated gain at the age 13: 0.0247 (0.0138) g per cm3

slide-75
SLIDE 75

Repeated measurements, May 2008 72

Including baseline as covariate

proc mixed covtest noclprint data=calcium; where visit>1; class grp girl; model bmd=baseline grp age13 grp*age13 / ddfm=satterth s; random intercept age13 / type=un subject=girl(grp) g; run;

Solution for Fixed Effects Standard Effect grp Estimate Error DF t Value Pr > |t| Intercept 0.01825 0.02690 106 0.68 0.4989 baseline 1.0797 0.03054 102 35.36 <.0001 grp C 0.01728 0.006236 101 2.77 0.0067 grp P . . . . age13 0.04597 0.002287 93.1 20.11 <.0001 age13*grp C 0.007419 0.003276 93.2 2.26 0.0258 age13*grp P . . . .

Estimated gain at the age 13: 0.0173 (0.0062) g per cm3

slide-76
SLIDE 76

Repeated measurements, May 2008 73

Including baseline as covariate

  • explains some (but not all) of the difference between groups

at age 13 without baseline: 0.0247 (0.0138) baseline as covariate: 0.0173 (0.0062)

  • increases the precision of the estimated difference

(standard error becomes smaller)

slide-77
SLIDE 77

Repeated measurements, May 2008 74

Example from Vickers, A.J. & Altman, D.G.: Analysing controlled clinical trials with baseline and follow-up measurements. British Medical Journal 2001; 323: 1123-24.: 52 patients with shoulder pain are randomized to either

  • Acupuncture (n=25)
  • Placebo (n=27)

Pain is evaluated on a 100 point scale before and after treatment. High scores are good

slide-78
SLIDE 78

Repeated measurements, May 2008 75

Results: Pain score (mean and SD) placebo acupuncture difference (n=27) (n=25) (95% CI) P-value baseline 53.9 (14.0) 60.4 (12.3) 6.5

  • Type of analysis

follow-up 62.3 (17.9) 79.6 (17.1) 17.3 (7.5; 27.1) 0.0008 changes* 8.4 (14.6) 19.2 (16.1) 10.8 (2.3; 19.4) 0.014 ancova 12.7 (4.1; 21.3) 0.005 * results published

slide-79
SLIDE 79

Repeated measurements, May 2008 76

Baseline

  • The placebo group lies somewhat below acupuncture

Follow-up

  • We would expect the placebo group to be lower also after

treatment

  • Therefore, the comparison is unreasonable

(we see too big a difference) Change

  • Low baseline implies an expected large positive change

(regression to the mean)

  • The placebo group is therefore expected to increase the most
  • Therefore, the comparison is unreasonable

(we see too small a difference)

slide-80
SLIDE 80

Repeated measurements, May 2008 77

What to do in such a situation? Ancova Analysis of covariance, a special case of multiple regression:

  • Outcome: follow-up data
  • Covariates

– treatment (factor: acupuncture/placebo) – baseline measurement (quantitative)

slide-81
SLIDE 81

Repeated measurements, May 2008 78

When can we use follow-up data?

  • when we have a control group and proper randomisation
  • when the correlation is low

When can we use differences?

  • when we have a control group and proper randomisation
  • when the correlation is large

When can we use analysis of covariance?

  • allways
slide-82
SLIDE 82

Repeated measurements, May 2008 79

Specification of mixed models:

  • Systematic variation:

– Between-individual covariates: treatment, sex, age, baseline value... – Within-individual covariates: time, cumulative dose, temperature...

  • Random variation
slide-83
SLIDE 83

Repeated measurements, May 2008 80

Sources of random variation:

  • 1. Random effects:
  • 2. Serial correlation:
  • 3. Measurement error:
slide-84
SLIDE 84

Repeated measurements, May 2008 81

SAS, PROC MIXED

  • model

describes the systematic part (fixed effects, mean value structure)

  • random

describes the random effects

  • repeated

describes the process covariance structure

  • local

adds an additional measurement error

slide-85
SLIDE 85

Repeated measurements, May 2008 82

Explained variation in percent, R2 We have two (or more) different variances to explain!

  • residual variation (variation within individuals, σ2

W )

– decreases when we include an important x covariate (level 1) – may decrease when we include an important z covariate (level 2)

  • variation between individuals , ω2

B

– decreases when we include an important z covariate (level 2) – may increase, when we include an important x covariate (level 1)

slide-86
SLIDE 86

Repeated measurements, May 2008 83

Hypothetical example: The x’s vary between individuals, but the average outcomes (¯ y) are almost identical: Levels of y, for fixed x are very different!