Performing repeated measures analysis, Graeme L. Hickey (PowerPoint presentation)

SLIDE 1

Performing repeated measures analysis

Graeme L. Hickey

@graemeleehickey

www.glhickey.com graeme.hickey@liverpool.ac.uk

SLIDE 2

Conflicts of interest

None
Assistant Editor (Statistical Consultant) for EJCTS and ICVTS

SLIDE 3

What are “repeated measures” data?

Figure: subjects A, B, D each score three conditions
“Condition”: chocolate cake; measurement: taste score
“Condition”: lemon cake; measurement: taste score
“Condition”: cheesecake; measurement: taste score

Same people score each condition

SLIDE 4

What are “repeated measures” data?

Figure: subjects A, B, D at three follow-up appointments; measurement at each: systolic BP

Same people provide BP at every follow-up appointment

SLIDE 5

Why do we need special methodology?

Data are not independent: repeated observations on the same individual will be more similar to each other than to observations on other individuals
Guidelines for reporting mortality and morbidity after cardiac valve interventions also propose the use of longitudinal data analysis for repeated measurement data
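To put a number on why non-independence matters, here is a minimal sketch of the classical design effect. It assumes (an illustration, not something stated on the slides) that every subject contributes m measurements with the same within-subject correlation rho. Under that assumption the variance of the overall mean is 1 + (m − 1)·rho times larger than it would be if all observations were independent. The function names are hypothetical.

```python
import math


def design_effect(m: int, rho: float) -> float:
    """Variance inflation for the mean of m equally correlated measurements per subject."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return 1.0 + (m - 1) * rho


def naive_vs_correct_se(sigma: float, n_subjects: int, m: int, rho: float) -> tuple[float, float]:
    """SE of the overall mean: (treating all n*m values as independent, allowing for correlation)."""
    n_obs = n_subjects * m
    naive = sigma / math.sqrt(n_obs)
    correct = naive * math.sqrt(design_effect(m, rho))
    return naive, correct


if __name__ == "__main__":
    # Hypothetical inputs: 20 patients, 5 visits each, SD 10, within-patient correlation 0.5
    naive, correct = naive_vs_correct_se(sigma=10.0, n_subjects=20, m=5, rho=0.5)
    print(f"naive SE = {naive:.3f}, correlation-aware SE = {correct:.3f}")
```

With these hypothetical inputs the naive standard error (1.000) understates the correlation-aware one (about 1.732), so tests that ignore the correlation would be too optimistic.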

SLIDE 6

Simplest case: 2 measurement times

Figure: subjects A, B, D measured twice; measurement: AV gradient, pre-surgery and post-surgery

Suitable methods: paired t-test or Wilcoxon signed-rank test
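The paired t-test works on the within-subject differences, which removes the between-subject variation. A minimal stdlib sketch of the test statistic, using made-up AV gradients (the slide gives no data):

```python
import math
import statistics


def paired_t(pre: list[float], post: list[float]) -> tuple[float, int]:
    """Paired t-test statistic and degrees of freedom for two measurements per subject."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need the same number (>= 2) of pre and post values")
    diffs = [b - a for a, b in zip(pre, post)]   # one post - pre difference per subject
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)               # sample SD (n - 1 denominator)
    t = mean_d / (sd_d / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


if __name__ == "__main__":
    # Hypothetical AV gradients (mmHg) for five patients
    pre = [40, 45, 50, 38, 42]
    post = [12, 15, 14, 10, 13]
    t, df = paired_t(pre, post)
    print(f"t = {t:.2f} on {df} df")
```

The p-value needs the t distribution; if SciPy is available, `scipy.stats.ttest_rel` and, for the non-parametric alternative, `scipy.stats.wilcoxon` handle this.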

SLIDE 7

What if we have treatment groups?

Figure: subjects A, B, D (placebo) and E, F, H (active treatment), each measured before and after treatment

Question: if patients are randomised to treatment arms, how can we test whether active treatment is more effective than placebo?

SLIDE 8

Methods: shoulder pain example

Source: Vickers & Altman. BMJ. 2001; 323: 1123–4.

Values are mean (SD); differences are acupuncture minus placebo.

               Placebo (n = 27)   Acupuncture (n = 25)   Difference between means (95% CI)   P
Follow-up      62.3 (17.9)        79.6 (17.1)            17.3 (7.5 to 27.1)                  <0.001
Change score   8.4 (14.6)         19.2 (16.1)            10.8 (2.3 to 19.4)                  0.014
ANCOVA         -                  -                      12.7 (4.1 to 21.3)                  0.005

General rule of thumb: analysis of covariance (ANCOVA) has the highest statistical power
Note: never use percentage change scores!
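The three rows of the table are three different estimators of the same treatment effect. The sketch below computes all three on small made-up data (not the trial data, which are not reproduced here). For ANCOVA it uses the closed form of an ordinary least squares fit of follow-up on baseline plus a group indicator: the difference in follow-up means minus the pooled within-group slope times the difference in baseline means. The function name and data are hypothetical.

```python
import statistics


def treatment_effects(base0, follow0, base1, follow1):
    """Treatment effect (group 1 minus group 0) by three methods, plus the ANCOVA slope."""
    # 1. Follow-up only: difference in mean follow-up scores
    follow_up = statistics.mean(follow1) - statistics.mean(follow0)
    # 2. Change score: difference in mean (follow-up - baseline)
    change = (statistics.mean(f - b for b, f in zip(base1, follow1))
              - statistics.mean(f - b for b, f in zip(base0, follow0)))
    # 3. ANCOVA: common (pooled within-group) slope of follow-up on baseline
    sxy = sxx = 0.0
    for base, follow in ((base0, follow0), (base1, follow1)):
        mb, mf = statistics.mean(base), statistics.mean(follow)
        sxy += sum((b - mb) * (f - mf) for b, f in zip(base, follow))
        sxx += sum((b - mb) ** 2 for b in base)
    slope = sxy / sxx
    # Adjust the follow-up difference for any chance baseline imbalance
    ancova = follow_up - slope * (statistics.mean(base1) - statistics.mean(base0))
    return {"follow_up": follow_up, "change_score": change, "ancova": ancova, "slope": slope}


if __name__ == "__main__":
    # Hypothetical pain scores: group 0 = placebo, group 1 = active
    res = treatment_effects([50, 60, 70], [55, 62, 73], [52, 58, 66], [70, 75, 82])
    for name, value in res.items():
        print(f"{name}: {value:.2f}")
```

The closed form shows how the methods relate: a slope of 0 reproduces the follow-up analysis and a slope of 1 reproduces the change-score analysis, while ANCOVA estimates the slope from the data.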

SLIDE 9

More general scenario

We record measurements of each patient more than 2 times
Two (or more) treatment groups

SLIDE 10

Design considerations

Balanced versus unbalanced
Balanced follow-up (e.g. baseline, 1-hr, 2-hr, 8-hr, 16-hr, 24-hr)
Unbalanced (e.g. patient A visits their physician on days 1, 4, 6, 9, and 12, and patient B visits only on days 5, 9, and 15)

Missing data
E.g. patient fails to attend a scheduled follow-up appointment
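A minimal sketch for checking these two design features in practice, assuming visit times are stored per subject in a plain dictionary (a hypothetical layout). The first function flags unbalanced designs like the patient A / patient B example above; the second lists scheduled visits each subject missed.

```python
def is_balanced(visit_times: dict[str, list[float]]) -> bool:
    """True if every subject was measured at exactly the same set of times."""
    schedules = {tuple(sorted(times)) for times in visit_times.values()}
    return len(schedules) <= 1


def missed_visits(schedule: list[float], visit_times: dict[str, list[float]]) -> dict[str, list[float]]:
    """Scheduled times each subject did not attend (subjects with none missing are omitted)."""
    missed = {}
    for subject, times in visit_times.items():
        gaps = sorted(set(schedule) - set(times))
        if gaps:
            missed[subject] = gaps
    return missed


if __name__ == "__main__":
    # Unbalanced example from the slide (days of visits)
    print(is_balanced({"A": [1, 4, 6, 9, 12], "B": [5, 9, 15]}))   # False
    # Hypothetical balanced schedule (hours) where patient B misses the 8-hr visit
    schedule = [0, 1, 2, 8, 16, 24]
    print(missed_visits(schedule, {"A": schedule, "B": [0, 1, 2, 16, 24]}))
```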

SLIDE 11

How not to proceed

Multiple testing issues
No account of the same patients being measured ⇒ successive observations likely correlated
Visualization + reporting issues

Source: Matthews et al. BMJ. 1990; 300: 230–5.

SLIDE 12

Data format / collection

Wide format (good for balanced datasets)

Subject   Jan 01   Aug 30   Dec 08
A         120      113      115
B         94       94       110
C         140      145      160
D         100      101      100

Long format (good for unbalanced datasets)

Subject   Date     BP (mmHg)
A         Jan 01   120
A         Aug 30   113
A         Dec 08   115
B         Jan 01   94
B         Aug 30   94
B         Dec 08   110
⋮         ⋮        ⋮
D         Aug 30   101
D         Dec 08   100
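Converting between the two layouts is mechanical. A minimal stdlib sketch using the BP values from the wide table above, with dictionaries standing in for a data frame (the function names are hypothetical):

```python
WIDE = {
    "A": {"Jan 01": 120, "Aug 30": 113, "Dec 08": 115},
    "B": {"Jan 01": 94, "Aug 30": 94, "Dec 08": 110},
    "C": {"Jan 01": 140, "Aug 30": 145, "Dec 08": 160},
    "D": {"Jan 01": 100, "Aug 30": 101, "Dec 08": 100},
}


def wide_to_long(wide: dict[str, dict[str, int]]) -> list[tuple[str, str, int]]:
    """One (subject, date, BP) row per measurement."""
    return [(subject, date, bp) for subject, row in wide.items() for date, bp in row.items()]


def long_to_wide(rows: list[tuple[str, str, int]]) -> dict[str, dict[str, int]]:
    """Back to one row per subject; subjects may end up with different sets of dates."""
    wide: dict[str, dict[str, int]] = {}
    for subject, date, bp in rows:
        wide.setdefault(subject, {})[date] = bp
    return wide


if __name__ == "__main__":
    for row in wide_to_long(WIDE)[:3]:
        print(row)
```

In pandas, `DataFrame.melt` (with `id_vars`, `var_name` and `value_name`) performs the same wide-to-long reshaping. Long format copes naturally with unbalanced data because a subject with fewer visits simply has fewer rows.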

SLIDE 13

First step (always!): visualize the data

Source: Gueorguieva & Krystal. Arch Gen Psychiatry. 2004; 61: 310–317.
Figure: mean profile plot
Source: Matthews et al. BMJ. 1990; 300: 230–5.
Figures: individual panel plots; individual plots grouped by treatment
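The numbers behind a mean profile plot are simply the mean outcome at each time point within each treatment group. A minimal sketch, assuming long-format rows of (subject, group, time, value) (a hypothetical layout); plotting is left to whatever graphics package is at hand.

```python
from collections import defaultdict
from statistics import mean


def mean_profiles(rows):
    """Return {group: [(time, mean value), ...]} sorted by time, from long-format rows."""
    cells = defaultdict(list)
    for _subject, group, time, value in rows:
        cells[(group, time)].append(value)
    profiles = defaultdict(list)
    for (group, time), values in sorted(cells.items()):
        profiles[group].append((time, mean(values)))
    return dict(profiles)


if __name__ == "__main__":
    # Hypothetical data: two placebo patients and one active patient, two visits each
    rows = [("A", "placebo", 0, 10), ("B", "placebo", 0, 14),
            ("A", "placebo", 1, 12), ("B", "placebo", 1, 16),
            ("C", "active", 0, 11), ("C", "active", 1, 20)]
    print(mean_profiles(rows))
```

Means alone hide how individual trajectories vary, which is why the individual plots on this slide are worth drawing as well.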

SLIDE 14

Analysis options

Repeated measures analysis of variance (RM-ANOVA)
Linear mixed models (LMMs)
Summary statistics / data-reduction techniques
Multivariate analysis of variance (MANOVA)
Generalized least squares (GLS)
Generalized estimating equations
Non-linear mixed effects models
Empirical Bayes methods
…

SLIDE 15

RM-ANOVA

Total variation
  Between-subjects variation
    Treatment
    Error due to subjects within treatment
  Within-subjects variation
    Time
    Treatment × Time
    Error

Test for: treatment effect, time effect, interaction effect
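A minimal sketch of the within-subjects partition for the simplest case: a single group (no treatment factor) with complete, balanced data. Between-subjects variation is therefore not split further, and the time effect is tested against the within-subjects error. This simplification and the function name are assumptions made for illustration.

```python
from statistics import mean


def one_way_rm_anova(data: list[list[float]]) -> dict[str, float]:
    """Sums of squares and F for time; rows = subjects, columns = the k measurement times."""
    n, k = len(data), len(data[0])
    grand = mean(v for row in data for v in row)
    subject_means = [mean(row) for row in data]
    time_means = [mean(row[j] for row in data) for j in range(k)]
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)   # between subjects
    ss_time = n * sum((m - grand) ** 2 for m in time_means)          # within: time
    ss_error = ss_total - ss_subjects - ss_time                      # within: residual
    df_time, df_error = k - 1, (n - 1) * (k - 1)
    f_time = (ss_time / df_time) / (ss_error / df_error)
    return {"ss_total": ss_total, "ss_subjects": ss_subjects, "ss_time": ss_time,
            "ss_error": ss_error, "df_time": df_time, "df_error": df_error, "F_time": f_time}


if __name__ == "__main__":
    # Hypothetical scores: 3 subjects x 3 times
    print(one_way_rm_anova([[1, 2, 3], [2, 3, 5], [3, 4, 4]]))
```

Removing the subject sum of squares from the error term is what distinguishes this from an ordinary one-way ANOVA. For real analyses, statsmodels provides `AnovaRM` (in `statsmodels.stats.anova`) for balanced repeated measures designs.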

SLIDE 16

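Sphericity

RM-ANOVA depends on the usual assumptions for ANOVA…
… and the assumption of sphericity: the differences between every pair of measurement times have the same variance (equivalently, the same SD)

$$\mathrm{SD}_{T_2 - T_1} \approx \mathrm{SD}_{T_3 - T_1} \approx \mathrm{SD}_{T_3 - T_2} \approx \cdots$$

Restrictive for longitudinal data ⇒ measurements taken close together are often more correlated than those taken at larger time intervals

Test for sphericity using Mauchly’s test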
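Before any formal test, the quantities in the sphericity condition can simply be computed and eyeballed. A minimal descriptive sketch (this is not Mauchly's test) that returns the SD of every pairwise difference between time points; the function name and data are hypothetical.

```python
from itertools import combinations
from statistics import stdev


def difference_sds(data: list[list[float]]) -> dict[str, float]:
    """SD of T_b - T_a for every pair of times; rows = subjects, columns = times T1..Tk."""
    k = len(data[0])
    return {f"T{b + 1} - T{a + 1}": stdev([row[b] - row[a] for row in data])
            for a, b in combinations(range(k), 2)}


if __name__ == "__main__":
    # Hypothetical scores: 3 subjects x 3 times
    print(difference_sds([[1, 2, 3], [2, 3, 5], [3, 4, 4]]))
```

Markedly unequal SDs, for example small ones for adjacent times and large ones for distant times, suggest that sphericity is doubtful.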

Tomorrow (14:15 – 15:45): Checking model assumptions with regression diagnostics

SLIDE 17

When sphericity is violated

If sphericity is violated, type I error rates are inflated and interaction term effects are biased; this is a serious problem

Mauchly’s test may not reject sphericity if the sample size is small, even if the variances are vastly different

Correction proposal:
1. Calculate the epsilon statistic
   i. Greenhouse-Geisser
   ii. Huynh-Feldt
2. Multiply the F-statistic degrees of freedom by epsilon
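A minimal sketch of step 1 (Greenhouse-Geisser only) and step 2 for a single-group design. It uses the standard formula ε = tr(S*)² / ((k − 1) tr(S*²)), where S* is the double-centred k × k covariance matrix of the repeated measurements; ε equals 1 under sphericity and cannot fall below 1/(k − 1). Huynh-Feldt is not shown, and the function names are hypothetical.

```python
from statistics import mean


def sample_covariance(data: list[list[float]]) -> list[list[float]]:
    """k x k sample covariance matrix of the repeated measurements (rows = subjects)."""
    n, k = len(data), len(data[0])
    col_means = [mean(row[j] for row in data) for j in range(k)]
    return [[sum((row[a] - col_means[a]) * (row[b] - col_means[b]) for row in data) / (n - 1)
             for b in range(k)] for a in range(k)]


def greenhouse_geisser_epsilon(cov: list[list[float]]) -> float:
    """Greenhouse-Geisser epsilon from a k x k covariance matrix (double-centring form)."""
    k = len(cov)
    row_means = [sum(row) / k for row in cov]
    col_means = [sum(cov[i][j] for i in range(k)) / k for j in range(k)]
    grand = sum(row_means) / k
    # Double-centre: equivalent to projecting onto k - 1 orthonormal contrasts
    dc = [[cov[i][j] - row_means[i] - col_means[j] + grand for j in range(k)] for i in range(k)]
    trace = sum(dc[i][i] for i in range(k))
    trace_of_square = sum(dc[i][j] * dc[j][i] for i in range(k) for j in range(k))
    return trace ** 2 / ((k - 1) * trace_of_square)


def corrected_df(epsilon: float, n_subjects: int, k: int) -> tuple[float, float]:
    """Epsilon-adjusted (numerator, denominator) df for the within-subject time effect."""
    return epsilon * (k - 1), epsilon * (k - 1) * (n_subjects - 1)


if __name__ == "__main__":
    # Hypothetical AR(1)-like covariance: correlation decays with distance in time
    ar1 = [[1.0, 0.8, 0.64], [0.8, 1.0, 0.8], [0.64, 0.8, 1.0]]
    eps = greenhouse_geisser_epsilon(ar1)
    print(f"epsilon = {eps:.3f}, corrected df = {corrected_df(eps, n_subjects=10, k=3)}")
```

The AR(1)-like matrix, where nearby times are more correlated than distant ones (exactly the pattern the previous slide warns about), gives ε below 1. The F-statistic itself is unchanged; only its degrees of freedom, and hence the p-value, are adjusted.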

SLIDE 18

Linear mixed models

Generalizes linear regression to account for correlation in repeated measures within subjects
Also described as random effects models, mixed effects models, random growth models, multi-level models, hierarchical models, …

SLIDE 19

Figure: outcome versus time

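SLIDE 20

$$y_{ij} = \beta_0 + \beta_1 t_{ij} + \varepsilon_{ij}$$

where $y_{ij}$ is the outcome for subject $i$ at measurement occasion $j$, taken at time $t_{ij}$, and $\varepsilon_{ij}$ is the residual error

Fixed effects regression line

Figure: outcome versus time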

SLIDE 21

$$y_{ij} = \beta_{0i} + \beta_1 t_{ij} + \varepsilon_{ij}$$

Fixed effects regression line + within-subject intercepts

Figure: outcome versus time

SLIDE 22

Within-subjects fixed effects regression lines

$$y_{ij} = \beta_{0i} + \beta_{1i} t_{ij} + \varepsilon_{ij}$$

Figure: outcome versus time

SLIDE 23

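Linear mixed models

A compromise is the model

$$y_{ij} = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\, t_{ij} + \varepsilon_{ij}$$

$b_{0i}$ and $b_{1i}$ are called subject-specific random effects (a random intercept and a random slope, respectively), jointly distributed as $(b_{0i}, b_{1i})^\top \sim N_2(\mathbf{0}, \Sigma)$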

Observations within subjects are more correlated than observations between subjects

Can be adjusted for other (possibly time-varying) covariates and baseline measurements
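The claim that within-subject observations are correlated follows directly from the model. Assuming, as is usual but not stated on the slide, that the residuals $\varepsilon_{ij}$ are independent $N(0, \sigma^2)$ and independent of the random effects, two measurements on the same subject have covariance Var(b0) + (s + t)·Cov(b0, b1) + s·t·Var(b1), plus σ² on the diagonal. Measurements on different subjects have covariance 0. A minimal sketch of the implied covariance and correlation matrices (parameter names hypothetical):

```python
import math


def implied_covariance(times, var_b0, cov_b01, var_b1, var_eps):
    """Model-implied covariance of one subject's measurements at the given times."""
    return [[var_b0 + (s + t) * cov_b01 + s * t * var_b1 + (var_eps if a == b else 0.0)
             for b, t in enumerate(times)]
            for a, s in enumerate(times)]


def to_correlation(cov):
    """Convert a covariance matrix to a correlation matrix."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))] for i in range(len(cov))]


if __name__ == "__main__":
    # Random intercept only (hypothetical values): constant within-subject correlation 4 / (4 + 1)
    print(to_correlation(implied_covariance([0, 1, 2], var_b0=4.0, cov_b01=0.0, var_b1=0.0, var_eps=1.0)))
    # Adding a random slope makes variances and correlations change with time
    print(to_correlation(implied_covariance([0, 1, 2], var_b0=4.0, cov_b01=0.0, var_b1=1.0, var_eps=1.0)))
```

To fit such a model rather than explore it, one option is statsmodels' `mixedlm` formula interface with a `groups` argument for the subject identifier and `re_formula="~time"` for a random slope.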

SLIDE 24

Summary statistics

A two-stage approach:
1. Reduce the repeated measurements for each subject to a single value
2. Apply routine statistical methods on these summary values to compare treatments, e.g. using an independent samples t-test, ANOVA, Mann-Whitney U-test, …

Benefits
Easy to do, and conceptually easy to understand
Can be used to contrast different features of the data
Encourages researchers to think in advance about the features of the data most important to them

Choice of summary statistic depends on the data
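A minimal sketch of the two stages: stage 1 reduces each subject to one number with any chosen summary function (the mean here), and stage 2 compares the groups. For stage 2 the sketch uses Welch's unequal-variance t-statistic with Satterthwaite degrees of freedom, one variant of the independent samples t-test. The data and function names are hypothetical.

```python
import math
from statistics import mean, variance


def summarise(subjects: dict[str, list[float]], summary=mean) -> dict[str, float]:
    """Stage 1: one summary value per subject."""
    return {sid: summary(values) for sid, values in subjects.items()}


def welch_t(x: list[float], y: list[float]) -> tuple[float, float]:
    """Stage 2: Welch t-statistic (mean x - mean y) and Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


if __name__ == "__main__":
    active = {"A1": [14, 15, 16], "A2": [13, 15, 14], "A3": [16, 17, 18]}
    placebo = {"P1": [10, 12, 11], "P2": [9, 9, 12], "P3": [11, 13, 12]}
    t, df = welch_t(list(summarise(active).values()), list(summarise(placebo).values()))
    print(f"t = {t:.3f} on {df:.2f} df")
```

Swapping `summary=mean` for another function (maximum, slope, area under the curve, and so on) changes which feature of the data is compared, which is the flexibility the benefits above describe.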

SLIDE 25

If the data display a ‘peaked curve’ trend…

Figure: outcome versus time (T0 to T4), one panel per summary measure: area under the curve; maximum measurement ($y_{max}$); time to reach maximum; mean follow-up minus baseline ($y_{post} - y_{pre}$)
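A minimal sketch of the four peaked-curve summaries for one subject, assuming times are sorted and the first measurement is the baseline (assumptions, as is the function name). The area under the curve uses the trapezoidal rule.

```python
from statistics import mean


def peaked_summaries(times: list[float], values: list[float]) -> dict[str, float]:
    """Summary measures for one subject's 'peaked' profile (first value = baseline)."""
    # Trapezoidal area under the curve
    auc = sum((t1 - t0) * (v0 + v1) / 2
              for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:]))
    peak = max(values)
    time_to_peak = times[values.index(peak)] - times[0]   # first time the maximum is reached
    mean_followup_minus_baseline = mean(values[1:]) - values[0]
    return {"auc": auc, "max": peak, "time_to_max": time_to_peak,
            "mean_followup_minus_baseline": mean_followup_minus_baseline}


if __name__ == "__main__":
    # Hypothetical profile measured at T0..T4
    print(peaked_summaries([0, 1, 2, 3, 4], [10, 20, 30, 20, 10]))
```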

SLIDE 26

If the data display a ‘growth curve’ trend…

Figure: outcome versus time (T0 to T4), one panel per summary measure: change score ($y_{change}$); final value ($y_{final}$); time to a certain % increase/decrease; slope
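The growth-curve summaries can be computed per subject in the same way. A minimal sketch, assuming sorted times, a positive baseline as the first value, and a least-squares line for the slope (all assumptions; the function name is hypothetical). A positive `pct` looks for an increase and a negative one for a decrease; `None` means the threshold was never reached.

```python
from statistics import mean


def growth_summaries(times: list[float], values: list[float], pct: float) -> dict:
    """Change score, final value, least-squares slope and time to a pct% change from baseline."""
    t_bar, v_bar = mean(times), mean(values)
    slope = (sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
             / sum((t - t_bar) ** 2 for t in times))
    target = values[0] * (1 + pct / 100)
    time_to_pct = None
    for t, v in zip(times[1:], values[1:]):   # follow-up measurements only
        if (pct >= 0 and v >= target) or (pct < 0 and v <= target):
            time_to_pct = t - times[0]
            break
    return {"change": values[-1] - values[0], "final": values[-1],
            "slope": slope, "time_to_pct": time_to_pct}


if __name__ == "__main__":
    # Hypothetical profile; time to a 20% increase over baseline
    print(growth_summaries([0, 1, 2, 3, 4], [10, 11, 13, 14, 16], pct=20))
```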

SLIDE 27

Missing data

Method               Can it handle missing data?                                      Can it handle unbalanced data?
RM-ANOVA             No – typically exclude patients with 1 or more missing values    No
LMM                  Yes – for data that are missing (completely) at random           Yes
Summary statistics   Depends on the choice of summary statistic                       Depends on the choice of summary statistic
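To see what excluding patients with missing values costs, the sketch below contrasts the subjects kept by complete-case (listwise) deletion with all observed rows in long format, which a likelihood-based LMM can use. It only counts data and fits no model. The row layout, with `None` for a missed visit, and the function names are hypothetical.

```python
from collections import defaultdict


def complete_case_subjects(rows, schedule):
    """Subjects with an observed value at every scheduled time (long rows: subject, time, value)."""
    observed = defaultdict(set)
    for subject, time, value in rows:
        if value is not None:
            observed[subject].add(time)
    return sorted(s for s, times in observed.items() if set(schedule) <= times)


def count_usable(rows, schedule):
    """(observations kept by complete-case analysis, observations available in total)."""
    keep = set(complete_case_subjects(rows, schedule))
    observed = [r for r in rows if r[2] is not None]
    return sum(1 for r in observed if r[0] in keep), len(observed)


if __name__ == "__main__":
    # Hypothetical data: patient B misses the visit at time 2
    rows = [("A", 0, 5.1), ("A", 1, 5.6), ("A", 2, 6.0),
            ("B", 0, 4.8), ("B", 1, 5.0), ("B", 2, None),
            ("C", 0, 5.5), ("C", 1, 5.9), ("C", 2, 6.4)]
    print(complete_case_subjects(rows, [0, 1, 2]), count_usable(rows, [0, 1, 2]))
```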

SLIDE 28

Software

All methods are implemented in standard statistical software
Summary statistics usually require ‘manual’ calculation, but this can be done easily in Microsoft Excel or programmed in a statistics software package

SLIDE 29

Thank you for listening… any questions?

Slides available (shortly) from: www.glhickey.com

Statistical Primer article to be published soon!