Comparing Multiple Comparisons - Phil Ender, Culver City, California



SLIDE 1

Prologue

Comparing Multiple Comparisons

Phil Ender

Culver City, California

Stata Conference Chicago - July 29, 2016

Phil Ender Comparing Multiple Comparisons 1/ 23

SLIDE 2

Prologue

In ANOVA, a significant omnibus F-test indicates only that there is a significant effect somewhere among the group means. It does not indicate where the significant effects can be found. This is why many, if not most, significant ANOVAs with more than two levels are followed by post-hoc multiple comparisons.

SLIDE 3

What’s the Problem?

Computing multiple comparisons increases the probability of making a Type I error. The more comparisons you make, the greater the chance of Type I errors. Multiple comparison techniques are designed to control the probability of these Type I errors.

SLIDE 4

What’s the Problem? Part 2

If n independent contrasts are each tested at α, then the probability of making at least one Type I error is $1 - (1 - \alpha)^n$. The table below gives the probability of making at least one Type I error for different numbers of comparisons when α = 0.05:

   n   probability
   1   0.0500
   2   0.0975
   3   0.1426
   5   0.2262
  10   0.4013
  15   0.5367
  20   0.6415

The above probabilities apply to independent contrasts. However, most sets of contrasts are not independent.
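The table can be reproduced directly from the formula. The following is a minimal Python sketch; the function name `familywise_error` is made up for this illustration, and the independence of the contrasts is assumed, as on the slide.

```python
def familywise_error(n: int, alpha: float = 0.05) -> float:
    """Probability of at least one Type I error across n independent tests,
    each tested at level alpha: 1 - (1 - alpha)^n."""
    return 1.0 - (1.0 - alpha) ** n


# Same numbers of comparisons as in the table above.
for n in (1, 2, 3, 5, 10, 15, 20):
    print(f"{n:>2}  {familywise_error(n):.4f}")
```

With correlated contrasts the true probability generally differs from these values, so the numbers are best read as a reference case.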

SLIDE 5

What is the solution?

Adjust the critical values or p-values to reduce the probability of a false positive. The goal is to protect the familywise (experimentwise) error rate in the strong sense, i.e., whichever of the null hypotheses happen to be true.

Multiple comparison techniques such as Dunnett, Tukey HSD, Bonferroni, Šidák, or Scheffé do a reasonably good job of protecting the familywise error rate.

Techniques such as Fisher’s least significant difference (LSD), Student-Newman-Keuls, and Duncan’s multiple range test fail to strongly protect the familywise error rate. Such procedures are said to protect the familywise error rate only in a weak sense; avoid them if possible.

SLIDE 6

Outline of Multiple Comparisons

I. Planned Comparisons

  • A. Planned Orthogonal Comparisons
  • B. Planned Non-orthogonal Comparisons

II. Post-hoc Comparisons

  • A. All Pairwise
  • B. Pairwise versus control group
  • C. Non-pairwise Comparisons

III. Other Comparisons

SLIDE 7

I. Planned Comparisons

SLIDE 12

Planned Orthogonal Comparisons

These are among the most powerful hypothesis tests available. Two stringent requirements:

  • 1. Comparisons must be planned
  • 2. Comparisons must be orthogonal

Example: group 1 vs. group 2, group 3 vs. group 4, and the average of groups 1 & 2 vs. the average of groups 3 & 4.

Downside: Comparisons of interest may not be orthogonal.
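Orthogonality of a set of contrasts can be checked from their coefficients. The sketch below assumes four groups with equal cell sizes, in which case two contrasts are orthogonal when the sum of the products of their coefficients is zero (with unequal cell sizes each product would be divided by the group's n). The coefficient vectors and function names are illustrative choices, not taken from the slides.

```python
from itertools import combinations

# Contrast coefficients over groups 1-4 (illustrative coding of the example).
contrasts = {
    "1 vs 2": [1, -1, 0, 0],
    "3 vs 4": [0, 0, 1, -1],
    "avg(1,2) vs avg(3,4)": [0.5, 0.5, -0.5, -0.5],
}


def is_orthogonal(a, b, tol=1e-12):
    """Two contrasts are orthogonal (equal n) if their dot product is zero."""
    return abs(sum(x * y for x, y in zip(a, b))) < tol


def all_orthogonal(coefficient_sets):
    """True if every pair of contrasts in the set is orthogonal."""
    return all(is_orthogonal(a, b) for a, b in combinations(coefficient_sets, 2))


print(all_orthogonal(list(contrasts.values())))
```

By contrast, "1 vs 2" and "1 vs 3" share group 1 and are not orthogonal, which illustrates the downside noted above.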

SLIDE 13

Planned Non-orthogonal Comparisons

Use either the Dunn or the Šidák-Dunn adjustment. Consider C contrasts:

Dunn: $\alpha_{Dunn} = \alpha_{EW} / C$

Šidák-Dunn: $\alpha_{SD} = 1 - (1 - \alpha_{EW})^{1/C}$

If C = 5 and $\alpha_{EW} = .05$, then $\alpha_{Dunn} = .01$ and $\alpha_{SD} = .010206$. Basically, just Bonferroni and Šidák adjustments.
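The two per-contrast alpha levels can be computed in a few lines. This is a minimal sketch; the function names `dunn_alpha` and `sidak_dunn_alpha` are invented for the illustration.

```python
def dunn_alpha(alpha_ew: float, c: int) -> float:
    """Dunn (Bonferroni) per-contrast alpha: alpha_EW / C."""
    return alpha_ew / c


def sidak_dunn_alpha(alpha_ew: float, c: int) -> float:
    """Sidak-Dunn per-contrast alpha: 1 - (1 - alpha_EW)^(1/C)."""
    return 1.0 - (1.0 - alpha_ew) ** (1.0 / c)


# The example from the slide: C = 5 contrasts, experimentwise alpha = .05.
print(f"{dunn_alpha(0.05, 5):.6f}")
print(f"{sidak_dunn_alpha(0.05, 5):.6f}")
```

The Šidák-Dunn level is always slightly larger than the Dunn level, so it is marginally less conservative.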

SLIDE 14

Planned Non-orthogonal Comparisons: Pairwise vs Control

Special case: pairwise versus control group. Dunnett’s test is used to compare k − 1 treatment groups with a control group. It does not require an omnibus F-test. Dunnett’s test is a t-test with critical values derived by Dunnett (1955). The critical value depends on the number of groups and the denominator degrees of freedom.

SLIDE 15

II. Post-hoc Comparisons

SLIDE 16

Post-hoc Comparisons: All pairwise

Tukey’s HSD (honestly significant difference) is the perennial favorite for performing all possible pairwise comparisons among group means. With k groups there are k(k − 1)/2 possible pairwise contrasts. Tukey’s HSD uses quantiles of the studentized range statistic to adjust for the number of comparisons. All pairwise contrasts with large k may look like a fishing expedition.

SLIDE 17

Post-hoc Comparisons: All pairwise

Tukey HSD test:

$$q_{HSD} = \frac{\bar{Y}_{m_i} - \bar{Y}_{m_j}}{\sqrt{MS_{error}/n}}$$

Note the single n in the denominator. Tukey’s HSD requires that all groups have the same number of observations.
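As a rough illustration, the HSD statistic can be computed for every pair of groups once the group means, the ANOVA error mean square, and the common cell size are known. The numbers below are hypothetical and chosen only to make the arithmetic easy; the function names are likewise made up. The sketch computes q only; each |q| would still be compared with a studentized range critical value for k groups and the error degrees of freedom, which is not computed here.

```python
import math
from itertools import combinations


def q_hsd(mean_i, mean_j, ms_error, n):
    """Tukey HSD statistic for one pair of means (same n in every group)."""
    return (mean_i - mean_j) / math.sqrt(ms_error / n)


def all_pairwise_q(means, ms_error, n):
    """q_HSD for every pair of groups; returns {(i, j): q}."""
    return {
        (i, j): q_hsd(means[i], means[j], ms_error, n)
        for i, j in combinations(sorted(means), 2)
    }


# Hypothetical group means, MS_error, and cell size (illustration only).
means = {1: 10.0, 2: 12.0, 3: 15.0, 4: 11.0}
qs = all_pairwise_q(means, ms_error=8.0, n=8)
for pair, q in qs.items():
    print(pair, round(q, 3))
```

With k = 4 groups the dictionary holds k(k − 1)/2 = 6 entries, one per pairwise contrast.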

SLIDE 18

What if the cell sizes are not equal?

Harmonic mean, the old-school approach (shown for k = 4 groups):

$$n = \frac{k}{1/n_1 + 1/n_2 + 1/n_3 + 1/n_4}$$

Spjøtvoll and Stoline’s modification of the HSD test:

$$q_{SS} = \frac{\bar{Y}_{m_i} - \bar{Y}_{m_j}}{\sqrt{MS_{error}/n_{min}}}$$

Uses the minimum n of the two groups. Uses the Studentized Augmented Range distribution for k and error df.
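Both workarounds are simple to compute. The sketch below uses invented function names and hypothetical cell sizes; it returns the harmonic mean of the cell sizes and the Spjøtvoll-Stoline statistic, without looking up the corresponding critical values.

```python
import math


def harmonic_n(sizes):
    """Harmonic mean of the cell sizes: k / (1/n_1 + ... + 1/n_k)."""
    return len(sizes) / sum(1.0 / n for n in sizes)


def q_ss(mean_i, mean_j, ms_error, n_i, n_j):
    """Spjotvoll-Stoline statistic: uses the smaller of the two cell sizes."""
    return (mean_i - mean_j) / math.sqrt(ms_error / min(n_i, n_j))


# Hypothetical unequal cell sizes for four groups (illustration only).
print(round(harmonic_n([4, 6, 12, 12]), 4))
print(q_ss(12.0, 10.0, ms_error=8.0, n_i=8, n_j=20))
```

When all cell sizes are equal, the harmonic mean is just the common n, so the old-school approach reduces to the ordinary HSD test.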

SLIDE 19

More on unequal cell sizes

Tukey-Kramer modification of the HSD test:

$$q_{TK} = \frac{\bar{Y}_{m_i} - \bar{Y}_{m_j}}{\sqrt{MS_{error}\,(1/n_i + 1/n_j)/2}}$$

Use the Studentized Range distribution for k means with ν error degrees of freedom.
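A short sketch of the Tukey-Kramer statistic follows; the function name and the input values are hypothetical. It also shows that the statistic reduces to the ordinary HSD statistic when the two cell sizes are equal, since (1/n + 1/n)/2 = 1/n.

```python
import math


def q_tk(mean_i, mean_j, ms_error, n_i, n_j):
    """Tukey-Kramer statistic: averages 1/n_i and 1/n_j in the denominator."""
    return (mean_i - mean_j) / math.sqrt(ms_error * (1.0 / n_i + 1.0 / n_j) / 2.0)


# Unequal cell sizes (hypothetical values chosen for easy arithmetic).
print(q_tk(12.0, 10.0, ms_error=6.0, n_i=4, n_j=12))

# Equal cell sizes: same value as the HSD formula, (12 - 10) / sqrt(8 / 8).
print(q_tk(12.0, 10.0, ms_error=8.0, n_i=8, n_j=8))
```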

SLIDE 20

Post-hoc Comparisons: Pairwise vs Control

I know Dunnett’s test is for planned comparisons of k − 1 treatment groups with a control group. However, it is also used for post-hoc comparisons. It is marginally more powerful than the Tukey HSD because there are fewer contrasts. Dunnett’s test is a t-test with critical values derived by Dunnett (1955). The critical value depends on the number of groups (k) and the ANOVA error degrees of freedom.

SLIDE 21

Post-hoc Comparisons: Non-pairwise Comparisons

Example: Average of groups 1 & 2 versus the mean of group 3. Use the Scheffé adjustment. Scheffé is a very conservative adjustment that makes use of the F distribution. The Scheffé critical value is

$$F_{Crit} = (k - 1)\,F_{\alpha;\,k-1,\,\nu_{error}}$$

where k is the total number of groups and $\nu_{error}$ is the error degrees of freedom.

SLIDE 22

III. Other Comparisons

SLIDE 26

If you absolutely, positively have to make a few comparisons, but ...

... they don’t fit any of the approaches we’ve seen so far? Say, 15 regressions on 15 separate response variables.

Try a Bonferroni or Šidák adjustment. Good protection but low power.
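For a small family of unrelated tests, both adjustments can be applied directly to the raw p-values. The sketch below uses invented function names and made-up p-values; with m tests, the Bonferroni-adjusted p-value is min(1, m·p) and the Šidák-adjusted p-value is 1 − (1 − p)^m.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]


def sidak(pvals):
    """Sidak-adjusted p-values: 1 - (1 - p)^m."""
    m = len(pvals)
    return [1.0 - (1.0 - p) ** m for p in pvals]


# Made-up p-values for three tests (illustration only).
raw = [0.001, 0.02, 0.5]
print(bonferroni(raw))
print(sidak(raw))
```

The Šidák-adjusted value is never larger than the Bonferroni-adjusted one, but the gain is small, which fits the "good protection but low power" summary.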

SLIDE 30

What if you want to make a huge number of contrasts, ...

say 10,000 or more?

Try a false discovery rate (FDR) method such as Benjamini-Hochberg. FDR control offers a way to increase power while maintaining a principled bound on error. Note that when the FDR is controlled at .05, the expected proportion of rejected tests that are spurious is at most 5%.
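The Benjamini-Hochberg step-up procedure is short enough to sketch in plain Python. This is an illustration, not a reference implementation: the function name is invented, the p-values in the usage lines are made up, and the FDR guarantee is assumed to rest on the usual condition of independent (or positively dependent) tests. The procedure sorts the m p-values, finds the largest rank k with p_(k) ≤ (k/m)·q, and rejects the k smallest.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Return a list of booleans (same order as pvals): True = rejected."""
    m = len(pvals)
    # Indices of the p-values in ascending order of p.
    order = sorted(range(m), key=lambda i: pvals[i])
    # Largest rank whose p-value is at or below its threshold (rank / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Made-up p-values (illustration only).
print(benjamini_hochberg([0.9, 0.001, 0.5]))
print(benjamini_hochberg([0.02, 0.03, 0.035, 0.04]))
```

In the second call the smallest p-values miss their own thresholds, but all four tests are rejected because a larger rank meets its threshold; this step-up behaviour is where the extra power relative to Bonferroni comes from.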

SLIDE 33

What if you don’t want to be bothered making any adjustments for multiple comparisons?

Analyze your experiment using Bayesian methods. All comparisons are made from a single posterior distribution. For each difference in means, see whether the region of equivalence falls outside of the 95% highest posterior density (HPD) credible interval.

SLIDE 34

References

Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B (Methodological), 57(1), 289-300.

Kirk, R.E. (1995). Experimental design: Procedures for the behavioral sciences (3rd Edition). Pacific Grove, CA: Brooks/Cole.

Kruschke, J.K. (2015). Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan (2nd Edition). Amsterdam: Elsevier.

SLIDE 35

¿Questions?
