

SLIDE 1

Lecture 7.1: Multiple Comparisons

(A 'non-quiz' topic)

  • Examples of the need for multiple comparisons
  • The problem with multiple comparisons post hoc; an outline of a solution
  • Specific solutions: Fisher LSD, Tukey HSD, Holm, Dunn/Bonferroni, Ryan (REGWQ; related to the False Discovery Rate, FDR)

SLIDE 2

Multiple Comparisons

  • Occasionally, e.g., at the start of a research project, we do not have a priori theories and contrasts and, therefore, cannot use the 'surgical' approach of planned comparisons. We simply want to see whether the different 'treatments' are all the same.
  • If the omnibus F ratio is significant, we may want to know after the fact (or post hoc) which treatments seem to 'work'. This leads to multiple comparisons, e.g., (a) between every 'treatment' and the 'control' group (k-1 comps), or (b) between every pair of 'treatments' (k(k-1)/2 comps).

SLIDE 3

  • Ex: Ss are randomly assigned to one of 3 conditions: No organiser ('no.org'), Organiser before lecture ('pre.org'), and Organiser after lecture ('post.org').
  • We might plan to examine 2 orthog contrasts, but we might also wish to compare 'post' with 'no', even though we have only 2 df between groups.

              some.no   pre.post   no.post
  no.org        -2          0        -1
  pre.org        1         -1         0
  post.org       1          1         1

SLIDE 4

A Paw-Licking Example

  • Morphine (M) reduces a rat's sensitivity to pain – under M for the 1st time, it takes them longer to lick their paws (signalling pain) when they are put on an uncomfortably warm surface. So 'time to lick' is also an index of M-tolerance (= 0 on 1st trial).
  • Group MM receives M for 3 trials, then M on the critical 4th trial in the same lab setting. M-tolerance has developed, so RT is 'normal'.
  • Group MS receives Saline on the 4th trial – they expected M but got S, so they are hypersensitive to pain and RT is very short.

SLIDE 5

A Paw-Licking Example

  • Group MM' receives Morphine on the 4th trial, but in a different setting. The usual cues are absent on the 4th trial, so the rat shd not show M tolerance, and RT shd be long.
  • Group SM receives Saline for 3 trials and Morphine on the 4th trial, but in the same setting. Rat shd not show M tolerance, and RT shd be long.
  • The 5th group was SS. Predictions for RT are: SM = MM' > MM ? SS > MS
  • Tr = M vs S on 1st 3 trials; Test = M vs S on 4th trial

SLIDE 6

A Paw-Licking Example

The five (train → test) groups:

  Morphine → Saline (MS)
  Morphine → Morphine (MM)
  Saline → Saline (SS)
  Saline → Morphine (SM)
  Morphine → Morphine, New Envt (MM')

Contrast 1: new v same; Contrast 2: Tr, M v S; Contrast 3: Test, M v S; Contrast 4: Tr * Test; Contrast 5: NA!

(After Siegel, 1975 – See Howell, 6th ed., p. 346)

SLIDE 7

Orthogonal contrasts for a (2x2 + 1) = 5-group design

  • The (train, test) groups in the 'paw-lick' study are MM, MS, SM, SS and MM' (where M' = M in a new context). The 1st 4 groups conform to a tidy 2x2 design. Interpret each contrast below!

  Group    lcon   ltr   lte   lT*T
  1=MM       1     1     1     1
  2=MS       1     1    -1    -1
  3=SM       1    -1     1    -1
  4=SS       1    -1    -1     1
  5=MM'     -4     0     0     0
SLIDE 8

  • Ex: 'Paw-lick' study of Morphine tolerance, with (train, test) groups, MM, MS, SM, SS, MM' (where M' = M in a new context; S = saline). The 1st 4 groups conform to a tidy 2x2 design, and yield 3 orthog contrasts. But we might be interested also in comparing MM with MM'.

  Group    lcon   ltr   lte   lT*T   lKara
  1=MM       1     1     1     1       1
  2=MS       1     1    -1    -1       0
  3=SM       1    -1     1    -1       0
  4=SS       1    -1    -1     1       0
  5=MM'     -4     0     0     0      -1
SLIDE 9

The problem of Type I errors

  • Measure 10 variables on n = 100 Ss, and examine the correl matrix for sig correls. Assume true r = 0. How many observed r's do we expect to be sig (where |r|crit = 0.20, p = .05)? (Ans. E = Np = 45*0.05 = 2.25. Why?)
  • What is the P(at least 1 sig correl)? Ans. P(at least 1) = 1 – P(none) = 1 – (.95)^45 = .90. We're almost certain to find at least 1 sig r! This is the problem with multiple comparisons!
  • Suppose we used α = .001, instead of .05. Then |r|crit = 0.32, and P(at least 1 sig r) = 1 – (.999)^45 = .044, which is much more acceptable.
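These probabilities are easy to verify; a quick sketch in Python (my own check of the slide's arithmetic, not code from the lecture):

```python
# Expected number of 'significant' correlations, and the chance of at
# least one, when all 45 true correlations are 0 and the 45 tests are
# treated as independent (an approximation, since the r's share variables).
m = 45  # 10 variables -> 10*9/2 = 45 correlations

def p_at_least_one(alpha, m):
    """P(at least 1 false alarm) = 1 - P(no false alarms)."""
    return 1 - (1 - alpha) ** m

print(m * 0.05)                            # expected count E = m*alpha = 2.25
print(round(p_at_least_one(0.05, m), 3))   # ~0.90: nearly certain
print(round(p_at_least_one(0.001, m), 3))  # ~0.044: much more acceptable
```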

SLIDE 10

The problem of Type I errors

  • Decreasing the Type I error rate from .05 to α = .001 raises the critical value from 0.2 to |r|crit = 0.32.
  • But then we would retain H0 in cases of 'seemingly large' r, e.g., r = 0.27! That is, we would fail to detect violations of H0 more often; i.e., our power would decrease.
  • How to decrease α without sacrificing too much power (assuming that sample size, n, is fixed)?
  • Recall that power depends on (i) α, (ii) the difference in parameter value (e.g., µ, ρ) between H0 and H1, and (iii) measurement error.

SLIDE 11

SLIDE 12

The classical approach to multiple comparisons relies on the concepts of Type I and Type II errors. False Discovery Rate (FDR) is a new approach to the multiple comparisons problem. Instead of controlling the chance of any false positives, i.e., Prob(at least 1 false positive) [as Bonferroni or other methods do], FDR controls the expected proportion of false positives among voxels that are judged to be suprathreshold. This turns out to be a relatively lenient metric for false positives, and it leads to an increase in power. The FDR approach is well-suited to the case of “many, many tests.” Later we will show how FDR thresholds are determined from the observed p-value distribution.

SLIDE 13

Outline of a Solution

  • To ensure that P(at least 1 sig r) is acceptably low (e.g., .05 or .10), each individual test has to be done with a very stringent level of α (e.g., .01 or .001).
  • To proceed formally, let us label P(at least 1 sig r) as the family-wise Type I error rate, αF; α is, as before, the Type I error rate for each individual test. If we wish αF to be 'small' (e.g., .1), what should α be? If we set α at, e.g., .01, what is the resulting αF?
  • In sum, what is the relationship between αF and α? Which approaches 'optimise' this reln?

SLIDE 14

R packages, with examples

  • Most post-hoc comparisons fall into 1 of 2 categories.
  • Compare every 'treatment' to a 'control' group: download with install.packages('multcomp'), and use Dunnett's test.
  • Compare each treatment with every other treatment: use TukeyHSD(model) and pairwise.t.test(score, group).
  • Other approaches include Fisher's Least Significant Difference (LSD) approach, and the use of the False Discovery Rate (FDR).

SLIDE 15

Table 1: Mean outcome judgments as a function of Procedure (Voice vs No voice) and Outcome of Other Participant (Expt. 1)

                                    Outcome of other participant
Dependent Variable                Unknown   Better   Worse   Equal
Outcome satisfaction   Voice       5.1a,b    2.6c     4.1b    5.4a
                       No voice    3.1d      2.8c     4.2b    5.3a
Outcome fairness       Voice       5.1b      2.3c     2.0c    6.1a
                       No voice    3.0d      2.4c,d   2.1c    6.1a

Note: For each dependent variable, means with no subscripts in common differ significantly, as indicated by a least significant difference test for multiple comparisons between means (p < .05).

SLIDE 16

# Organiser study: Tukey HSD approach
contrasts(d00$group, 2) = contr.treatment(3, base=2, contrasts=TRUE)
rs3 = aov(score ~ group, data=d00)
rs30 = TukeyHSD(rs3)
print(rs30)
[You may need to define a 'group' variable]

  Tukey multiple comparisons of means
    95% family-wise confidence level
Fit: aov(formula = score ~ group, data = d00)
$group
                 diff         lwr      upr     p adj
pre.org-no.org    0.1 -1.57075607 1.770756 0.9879376
post.org-no.org   1.7  0.02924393 3.370756 0.0455236
post.org-pre.org  1.6 -0.07075607 3.270756 0.0624878

SLIDE 17

  • plot(rs30)
SLIDE 18

# Organiser study: Holm's approach
rs31 = pairwise.t.test(d00$score, d00$group)
print(rs31)

  Pairwise comparisons using t tests with pooled SD
data: d00$score and d00$group
         no.org pre.org
pre.org  0.883  -
post.org 0.054  0.054
P value adjustment method: holm

(Holm’s procedure for controlling the familywise Type I error rate will be introduced in a later slide.)

SLIDE 19

Error Rates in Multiple Hypothesis Testing

  • For a single test of a null hypothesis, H0,
  • α = P(Reject H0 | H0 true), the Type I error rate, and
  • β = P(Retain H0 | H0 false), the Type II error rate
  • Power = 1 – β
  • How to define "error rate" when we test m hypotheses simultaneously?

SLIDE 20

                          Decision
            Retain                    Reject
H0 True     Correct Retention         False Alarm (Type I error)
H0 False    Miss (Type II error)      Correct Rejection

False Alarm aka False Discovery or False Rejection. Miss aka False Non-Discovery.

α = False Alarm rate = P(Reject H0 | H0 True)
β = P(Retain H0 | H0 False); 1 – β = Power

SLIDE 21

Testing m null hypotheses

  • If we test the (45) correlations among 10 variables for significance, with α = .05, we wd expect about 5% of them, i.e., about 2 or 3 r's, to be significant, even if H0 is true everywhere; and the prob of at least 1 False Alarm wd be much greater than 0.05.
  • The prob of at least 1 Type I error when testing m null hypotheses is called the familywise Type I error rate, αF. What is the relation between α and αF?

SLIDE 22

m independent tests

Suppose all m H0's are true, and 'rejection' = 'rej'. α is P(rej) on each test, and the tests are indep. We wish the prob of at least 1 rej in m tests, αF, which equals 1 – P(no rej's):

αF = 1 – P(no rej's) = 1 – (1 – α)^m; implying that α = 1 – (1 – αF)^(1/m).

Below we show αF as a function of α and m.

            m = 2      5       10      20      50
α = .050   0.0975   0.2262   0.4013   0.6415   0.9231
α = .010   0.0199   0.0490   0.0956   0.1821   0.3950
α = .001   0.0020   0.0050   0.0100   0.0198   0.0488

SLIDE 23

  • To a good approx for small α, αF = mα.
  • To protect against familywise Type I errors, we need to set very conservative single-test α levels (e.g., .01 or .001, rather than .05).
  • However, using very conservative α levels leads to failure to reject some false null hypotheses - i.e., it reduces our power! Are there ways to protect the familywise rate while keeping power at reasonably high levels?

SLIDE 24

  • αF = 1 – (1 – α)^m ≈ mα, for small α, when the tests are independent.
  • When the tests are not necessarily independent, Bonferroni's Inequality states that the prob of at least 1 of m events occurring is less than the sum of the probs of occurrence of the individual events; i.e., αF ≤ mα.
  • Dunn's test, aka Bonferroni's Test, uses sig level α = αF/m for each individual test, so as to control the familywise rate at αF.

SLIDE 25

  • Dunn's Test is conservative and, therefore, has relatively low power. Holm's Test is a multistage version of Dunn's (or Bonferroni's) Test that has more power. Holm's procedure is:
    – Rank the p-values of the m tests from smallest (i.e., most sig) to largest.
    – Compare p(1), the smallest, with αF/m.
    – If p(1) > αF/m, retain H0 for all tests.
    – If p(1) < αF/m, reject H0 for the 1st test and compare p(2) with αF/(m – 1). And so on.
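Holm's step-down rule is easy to sketch in code. A minimal Python illustration (function and variable names are my own, not from the slides):

```python
def holm_reject(pvals, alpha_F=0.05):
    """Step-down Holm test: returns reject/retain decisions in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    reject = [False] * m
    for step, i in enumerate(order):              # step = 0, 1, ..., m-1
        if pvals[i] <= alpha_F / (m - step):      # alphaF/m, alphaF/(m-1), ...
            reject[i] = True
        else:
            break                                 # first retention stops testing
    return reject

# Smallest p clears alphaF/4 = .0125; the next fails alphaF/3 = .0167,
# so it and everything after it are retained.
print(holm_reject([0.001, 0.020, 0.030, 0.400]))  # [True, False, False, False]
```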

SLIDE 26

  • The rationale for Holm's adjustment of p-values is that, after we have rejected the 1st H0, we have only m – 1 possibly true H0's, so we shd replace m by m – 1. Thus the critical p-value used in the 2nd test is greater (i.e., less conservative) than that used in the 1st test.
  • The critical p-value at the i'th test (i = 1, 2, …, m) = αF/(m – i + 1),
    – which increases from αF/m to αF.

SLIDE 27

Ryan Test (REGWQ; using a range, or Q, statistic due to Ryan, Einot, Gabriel & Welsch)

  • In the Ryan procedure, the critical p-value at the i'th test (i = 1, 2, …, m) = (i*αF)/m,
    – which also increases from αF/m to αF,
    – but it does so at a constant rate whereas, under the Holm procedure, the rate of increase is increasing. The desirable properties have been established by simulations.
    – A slightly better rule (REGWQ) is to replace (i*αF)/m by 1 – (1 – αF)^(i/m).

SLIDE 28

At one extreme, if each test uses a critical value of FWE/m, or αF/m, the FWE is at most αF. This is a v conservative procedure. At the other extreme, using a critical value of FWE, or αF, for each test yields a FWE of at most m*αF, a value that can be very large. The Ryan and Holm procedures are intermediate, with the latter being more conservative. We now reprise the 'morphine tolerance', or 'paw-licking', study.

SLIDE 29

Morphine Tolerance (Howell, Ch. 12)

5 groups of rats (MM, SS, SM, MS, MM') have different levels of induced tolerance of pain, and we wish to test which induction procedures were effective. There are 10 paired comparisons of group means, each yielding a t statistic and a corresponding p-value. We rank the p-values, then test which pairs of groups are significantly different - using the 4 procedures from the previous slide. In this example, the results are similar across procedures. This will not be true when m is very large.

SLIDE 30

Pair       t      p0     pdb    pun   phm    prq    ddb  dun  dhm  drq
MM-SS    0.354  0.725  0.005  0.05  0.050  0.050    0    0    0    0
SM-McM   1.768  0.086  0.005  0.05  0.025  0.045    0    0    0    0
MS-MM    2.121  0.041  0.005  0.05  0.017  0.040    0    1    0    0
MS-SS    2.475  0.018  0.005  0.05  0.012  0.035    0    1    0    1
SS-SM    4.596  0.000  0.005  0.05  0.010  0.030    1    1    1    1
MM-SM    4.950  0.000  0.005  0.05  0.008  0.025    1    1    1    1
SS-McM   6.364  0.000  0.005  0.05  0.007  0.020    1    1    1    1
MM-McM   6.717  0.000  0.005  0.05  0.006  0.015    1    1    1    1
MS-SM    7.071  0.000  0.005  0.05  0.006  0.010    1    1    1    1
MS-McM   8.839  0.000  0.005  0.05  0.005  0.005    1    1    1    1

FWE = 0.05. 'db' = Dunn/Bonferroni; 'un' = Uncontrolled FWE; 'hm' = Holm; 'rq' = Ryan/FDR. The 'p' columns give the critical p-values vs. p-rank for the 4 procs; the 'd' columns give the resulting reject (1) / retain (0) decisions. Note that 'rq' & 'hm' lie between 'db' and 'un'. See Howell, p. 397, Table 12.7.
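The four columns of critical p-values can be regenerated from the rules on the earlier slides; a sketch in Python for m = 10 tests at FWE = 0.05 (ranks counted from the smallest p):

```python
m, alpha_F = 10, 0.05
# For the i-th smallest p-value (i = 1 is the smallest):
#   Dunn/Bonferroni: alphaF/m for every test
#   Uncontrolled:    alphaF for every test
#   Holm:            alphaF/(m - i + 1)
#   Ryan/FDR:        i*alphaF/m
for i in range(m, 0, -1):   # table rows run from largest p (i = m) down
    pdb = alpha_F / m
    pun = alpha_F
    phm = alpha_F / (m - i + 1)
    prq = i * alpha_F / m
    print(i, round(pdb, 3), pun, round(phm, 3), round(prq, 3))
```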

SLIDE 31

False Discovery Rate:

Howell, pp. 396-7; http://en.wikipedia.org/wiki/False_discovery_rate

SLIDE 32

  • m0 is the number of true null hypotheses
  • m – m0 is the number of false null hypotheses
  • U is the number of true negatives
  • V is the number of false positives
  • T is the number of false negatives
  • S is the number of true positives
  • H1...Hm are the null hypotheses being tested
SLIDE 33

  • In m hypothesis tests of which m0 are true null hypotheses, R is an observable random variable, and S, T, U, and V are unobservable random variables.
  • The false discovery rate is given by E(V/R); and one wants to keep this value below a threshold α*. (V/R is defined to be 0 when R = 0.) The Type I error rate, α = E(V/m0).

SLIDE 34

FDR versus α

  • FDR = E(V/R), whereas α = E(V/m0).
  • The traditional focus on α as the error rate to be controlled (e.g., at .05 or .01) is attributed mainly to Fisher and Tukey. But we can agree that it wd be sensible to focus on controlling FDR instead, especially if this 'works' better!
  • This seems to be the case, especially when m is 'large'.

SLIDE 35

A simple FDR procedure

  • Suppose we follow the Ryan procedure in which the critical p-value at the i'th test (i = 1, 2, …, m) = (i*αF)/m.
  • Benjamini & Hochberg have shown that the FDR is at most αF.
  • The behavior of this Ryan/FDR procedure was shown in an earlier slide. In imaging and microarray analyses, FDR has proven to be useful.
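The Benjamini–Hochberg step-up rule behind this FDR control can be sketched in Python (illustrative names of my own, not code from the slides):

```python
def bh_reject(pvals, alpha_F=0.05):
    """Benjamini-Hochberg: with p's sorted ascending, reject the hypotheses
    at ranks 1..k for the largest k with p(k) <= k*alphaF/m.
    Controls the FDR at alphaF."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha_F / m:
            k = rank                      # remember the largest passing rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject

# More lenient than Holm on the same p-values: the thresholds are
# .0125, .025, .0375, .05, and the step-up rule rejects the first three.
print(bh_reject([0.001, 0.020, 0.030, 0.400]))  # [True, True, True, False]
```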

SLIDE 36

Uber-Reference for FDR

  • http://www.biomedcentral.com/1471-2105/9/303
  • "A unified approach to false discovery rate estimation", by Korbinian Strimmer, University of Leipzig
    – BMC Bioinformatics 2008, 9:303; doi:10.1186/1471-2105-9-303
    – This article is quite technical
  • See also Howell, Ch. 12, p. 396, for a sufficient treatment

SLIDE 37

Homogeneity of variance, a key assumption in lm()

  • The t-test and tests done with lm() assume that all groups have the same within-group variance. What remedial steps to take if this assn is violated? (Handout 1, pp. 8-13)

SLIDE 38

  • One solution is to transform the DV and hope that stabilizes the variance.
    – log(Y), or log(Y + ε) to remove 0's (ε = .5?)
    – sqrt(Y)
    – 1/Y, for Y = Reaction Time, is 'speed'
    – arcsin(Y), if Y is a proportion

lphapp = log(Pasthapp + .5); sphapp = sqrt(Pasthapp)
rs3a = bartlett.test(Pasthapp ~ memtype, na.action=na.omit, data=dat0)  # compare variances on score for all 3 groups
print(rs3a)
rs3b = bartlett.test(lphapp ~ memtype, na.action=na.omit, data=dat0)
print(rs3b)
rs3c = bartlett.test(sphapp ~ memtype, na.action=na.omit, data=dat0)
print(rs3c)

SLIDE 39

SLIDE 40

Transforming variables in Regression

  • In a regression context (i.e., X quantitative, instead of categorical as in ANOVA), how to linearise a non-linear relation? Should we transform X or Y, and how?
  • The "Rule of the Bulge" is a rule-of-thumb (pardon the mixed metaphor) that helps us pick the transformation that is likely to 'linearise' a non-linear plot:

SLIDE 41

  • Consider the circle below as 4 curves, 1 in each of quadrants I-IV. Find the curve that has the same shape as your scatterplot. Then transform X or Y in the direction indicated by the 'sign' of X or Y in the quadrant containing the matching curve.

SLIDE 42

  • Transform X or Y in the direction indicated by the 'sign' of X or Y in the quadrant containing the matching curve.
  • For example, if the curve in quadrant II resembles your plot, transform X 'up' (because, in II, x > 0), or transform Y 'down' (because, in II, y < 0).
  • An example of an 'up' (i.e., convex) transform is X' = X^2. Examples of 'down' (i.e., concave) transforms are X' = log(X), X' = sqrt(X), and X' = -1/X.

SLIDE 43

Contrasts

  • In a new area of research, we may simply want to know if a given manipulation (e.g., the grouping variable in a 1-way design) affects the dependent variable. The omnibus F-test answers this question.
  • However, in many situations, we expect a significant F and are more interested in specific comparisons among the groups. In this case, we ought to define:
    – a linear contrast to test each specific comparison,
    – the contrasts so that they are, if possible, mutually orthogonal.

SLIDE 44

  • A contrast is a weighted sum (or average) of certain group means, where the weights sum to 0. It is a specific explanatory variable (with 1 df) for accounting for differences among the groups.
  • If the Group factor has k levels, then Group has k – 1 df, and there are at most k – 1 mutually independent contrasts that can be defined.
  • Each contrast can be tested separately for statistical significance. Defining contrasts so that they are orthogonal (to be defined) assures that the separate tests of the contrasts are approximately independent.

SLIDE 45

In the 'organizer' study, we might be interested in the difference between the pre-organizer and post-organizer population means; the weights, a1 = 0, a2 = 1, a3 = -1, specify the appropriate contrast: l1 = 0*µ1 + 1*µ2 + (-1)*µ3 = µ2 – µ3

SLIDE 46

  • If we are interested in the difference between no organizer and some organizer, the appropriate weights would be a1 = 2, a2 = -1, a3 = -1 (or 1, -.5, -.5):
  • l2 = 2*µ1 + (-1)*µ2 + (-1)*µ3 = 2*µ1 – (µ2 + µ3)
  • For both contrasts, the {ai} sum to 0.
  • Multiplying all {ai} by a constant affects only the scale or size (and perhaps the sign) of the contrast, not its primary meaning.
  • In general, if Level 1 = control, Level 2 = treatment 1, and Level 3 = treatment 2, then the weights (-2, 1, 1) and (0, 1, -1) define, respectively, a contrast of 'treatments' vs 'control,' and a contrast of 'treat-1' vs 'treat-2'. Also, as we shall see, these contrasts are orthogonal.

SLIDE 47

Orthogonal Contrasts

Two contrasts, l1 = ∑ai*xi and l2 = ∑bi*xi (sums over i = 1, …, k), are orthogonal if the group sample sizes are equal, i.e., if ni = n, and if ∑ai*bi = 0.

(The definition of orthogonality for the case of unequal ni is complicated.) Orthogonal contrasts answer "independent" questions. Knowing if one contrast is significant provides no information about the value of an orthogonal contrast.

SLIDE 48

"Orthogonal" = "Uncorrelated"

Group   X1   X2
1       a1   b1
2       a2   b2
…       …    …
k       ak   bk

Suppose that X1 and X2 are the weights defining 2 contrasts among the k groups. What is the 'correlation' between X1 and X2? We need the "Sum of Products", SP, which we wd put in the Numerator of a familiar formula, r = SP/√(SS1*SS2). r = 0 iff SP = 0.

SP = ∑ai*bi – (∑ai)(∑bi)/k (sums over i = 1, …, k).

Now, if X1 and X2 are true contrasts, ∑ai = 0 = ∑bi, and SP = ∑ai*bi. Thus the condition for orthogonality, ∑ai*bi = 0, is also the condition for SP = 0, i.e., r = 0.
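The SP condition is easy to verify numerically; a small Python sketch using the organiser weights from an earlier slide (helper names are mine):

```python
def sum_of_products(a, b):
    """SP = sum(ai*bi) - sum(ai)*sum(bi)/k; SP = 0 means r = 0."""
    k = len(a)
    return sum(x * y for x, y in zip(a, b)) - sum(a) * sum(b) / k

a = (-2, 1, 1)   # 'treatments' vs 'control' -- a true contrast (sums to 0)
b = (0, 1, -1)   # 'treat-1' vs 'treat-2'    -- also a contrast
print(sum(a), sum(b), sum_of_products(a, b))   # sums are 0, SP = 0: orthogonal

d1, d2 = (0, 1, 0), (0, 0, 1)   # dummy codes: not contrasts (each sums to 1)
print(sum_of_products(d1, d2))  # -1/3, so the dummy codes are correlated
```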

SLIDE 49

           pre.org  post.org
no.org        0         0
pre.org       1         0
post.org      0         1

  • Even though ∑ai*bi = 0 in dummy coding, 'pre.org' and 'post.org' are not orthogonal, because they are not contrasts; indeed ∑ai = ∑bi = 1 ≠ 0. Intuitively, the 2 effects are correlated, i.e. not orthogonal, because they have a common term, namely, the mean of the baseline group.
  • SP = ∑ai*bi – (∑ai*∑bi)/k = -1/3 ≠ 0; thus the correlation is not 0.

SLIDE 50

Different sets of orthogonal contrasts when k = 4 groups:

(a) A is quantitative. (b) 2 levels of A, 2 levels of B (e.g., 1g. of A, 2 of A, 1 of B, 2 of B). (c) 2x2 factorial (A,B) design.

SLIDE 51

Much more on the Algebra of regression equations in the ANOVA context; Simple Effects; Contrasts, t-tests, …

SLIDE 52

  • score ~ train + d.lin + d.quad + train:d.lin + train:d.quad

Predictor        Coefficient
Intercept        b0
train2           b1
d.lin            b2
d.quad           b3
train2:d.lin     b4
train2:d.quad    b5

SLIDE 53

  • Suppose, when 'difficulty' = k, d.lin = 0 = d.quad. Then score = b0 + b1*train2 describes the simple effect of 'train' when 'difficulty' = k. To obtain this simple effect, use I(difficulty - k) in lm(). If k = mean(difficulty), can also use scale(difficulty) in lm().

SLIDE 54

  • Suppose, when 'train' = target, train2 = 0. Then score = b0 + b2*d.lin + b3*d.quad describes the simple effect of 'difficulty' for the target group. To obtain this simple effect, code 'train' so that the target group has train2 = 0.

SLIDE 55

Review of the t-test

In a 1-sample t-test, s² is the estimate of the popn variance, σ², and s²/n is the estimate of the variance of the sample mean. The s.e. of a statistic is the square root of its variance. Hence,

t_df = (X̄ – µ)/est.se(X̄), where df = df of the estimate of s.e.(X̄).
est.se(X̄) = √(s²/n) = s/√n, with df = n – 1, and t_{n-1} = (X̄ – µ)/(s/√n).
t_df² = F(1, df), so F can also be used to test H0.

SLIDE 56

t-tests in Regression

In regression, the statistic of interest is the estimated regression coefficient, bi, and we wish to test if E(bi) = 0. Computer output gives s.e.(bi), as well as t. The df of t is the df of SSresid, and this too is given in the ANOVA table from the regression. In general, the df of SSresid = N – p – 1, where N is the sample size and p is the number of predictors. An example of output with p = 1 follows.

t_df = bi/est.se(bi), with est.se(bi) given in the output.

SLIDE 57

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    4.500      0.297  15.151 5.08e-15 ***
grpone1        0.300      0.210   1.428    0.164

Residual standard error: 1.627 on 28 degrees of freedom
Multiple R-squared: 0.06792, Adjusted R-squared: 0.035
F-statistic: 2.04 on 1 and 28 DF, p-value: 0.1642

Note: t² = 1.428² = 2.04 = F! 2*(1 - pt(1.428, 28)) = 0.1644, as given above.

SLIDE 58

t-tests in ANOVA

  • Does giving students explicit training on how to organise material (an "organiser") in a class improve scores on tests?
  • Ss are randomly assigned to one of 3 conditions: No organiser ('no.org'), Organiser before lecture ('pre.org'), and Organiser after lecture ('post.org').
  • This is a between-S design, and a 1-way ANOVA is appropriate.
  • Analysis of SS: SSt = SSb + SSw, F = MSb/MSw, etc.

SLIDE 59

Data, d, in long form

   score    group
1      5   no.org
2      4   no.org
3      6   no.org
4      2   no.org
5      2   no.org
6      2   no.org
7      6   no.org
8      4   no.org
9      3   no.org
10     5   no.org
11     4  pre.org
12     5  pre.org
13     3  pre.org
…      …        …
27     6 post.org
28     4 post.org
29     4 post.org
30     7 post.org

# Custom-made function
mean.sd = function(x) {
  c(mean = mean(x, na.rm=T), sd = sd(x, na.rm=T))
}
# Data summary by group
rs0 = by(d$score, d$group, mean.sd)

SLIDE 60

d$group: no.org
    mean       sd
3.900000 1.595131
d$group: pre.org
    mean       sd
4.000000 1.333333
d$group: post.org
    mean       sd
5.600000 1.577621

Each of the sd's, 1.595, 1.33 and 1.578, is an independent estimate of the within-group sd, σ. MSw is the best, pooled est of σ².

To test if mean scores are the same for 'post.org' and 'pre.org', we would do an indep samples t-test. But what estimate of within-group variance should we use?

  • Ans. Better to use MSw than the pooled est from only the 'post.org' and 'pre.org' groups. The df of the latter is n1 + n2 – 2 = 18; the df of MSw is N – k = 30 – 3 = 27 > 18! (More precise)

SLIDE 61

Analysis of Variance Table
Response: score
          Df Sum Sq Mean Sq F value  Pr(>F)
group      2  18.20    9.10  4.0082 0.02991 *
Residuals 27  61.30    2.27

Est of var(X̄1 – X̄2) = MSw*(1/n1 + 1/n2) = MSw/5 = 2.27/5 = 0.454.
Est of se(X̄1 – X̄2) = √0.454 = 0.674.
For 'post.org' vs 'pre.org', t27 = (5.6 – 4.0)/0.674 = 2.37.

2*(1 - pt(2.37, 27)) = 0.025; so p = .025, 2-tailed.

We haven’t yet considered if this t-test was planned or post-hoc!
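The arithmetic above can be checked directly; a Python sketch with MSw = 2.27 and n = 10 per group:

```python
import math

# Pooled within-group variance from the ANOVA table, 10 Ss per group
MSw, n1, n2 = 2.27, 10, 10

var_diff = MSw * (1 / n1 + 1 / n2)   # = MSw/5 = 0.454
se_diff = math.sqrt(var_diff)        # ~ 0.674
t27 = (5.6 - 4.0) / se_diff          # ~ 2.37, on the 27 df of MSw
print(round(var_diff, 3), round(se_diff, 3), round(t27, 2))
```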

SLIDE 62

The 3 group means are: no-org, 3.9; pre-org, 4.0; and post-org, 5.6.

rs1 = lm(score ~ group, data=d00)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)     3.9000     0.4765   8.185 8.64e-09 ***
grouppre.org    0.1000     0.6739   0.148   0.8831
grouppost.org   1.7000     0.6739   2.523   0.0178 *

# Default coding is Dummy Coding
print(contrasts(d00$group))
         pre.org post.org
no.org         0        0
pre.org        1        0
post.org       0        1

SLIDE 63

         pre.org post.org
no.org         0        0
pre.org        1        0
post.org       0        1

  • The implied regression in lm() is: Y = b0 + b1*pre.org + b2*post.org + e
  • Use this eqn. to get the mean score, mj, for the j'th group, j = 1, 2, 3:

no.org:   m1 = b0 + b1*0 + b2*0; b0 = m1
pre.org:  m2 = b0 + b1*1 + b2*0; b1 = m2 – m1
post.org: m3 = b0 + b1*0 + b2*1; b2 = m3 – m1
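The coefficient algebra can be confirmed with the three group means from the output; a quick Python sketch:

```python
# Group means: no.org (baseline), pre.org, post.org
m1, m2, m3 = 3.9, 4.0, 5.6

b0 = m1        # Intercept = baseline mean
b1 = m2 - m1   # 'grouppre.org'  = pre.org mean minus baseline mean
b2 = m3 - m1   # 'grouppost.org' = post.org mean minus baseline mean
print(b0, round(b1, 1), round(b2, 1))  # 3.9 0.1 1.7, matching the lm() output
```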

SLIDE 64

  • Y = b0 + b1*pre.org + b2*post.org + e

         pre.org post.org
no.org         0        0
pre.org        1        0
post.org       0        1

  • Define Dummy Coding:
    – If 'group' has k levels, there are k – 1 predictors, X1, …, Xk-1.
    – Designate one level, say, level 1, as the baseline. Each obs in level 1, the control condition, is assigned a value of 0 on all predictors - X1 = 0, …, Xk-1 = 0.
    – Each obs in level 2 is assigned a value of 1 on X1, and 0 on all other predictors - X1 = 1, X2 = 0, …, Xk-1 = 0.
    – Each obs in level 3 is assigned a value of 1 on X2, and 0 on all other predictors - X1 = 0, X2 = 1, …, Xk-1 = 0. And so on.
SLIDE 65

Compare output from anova() and summary():

rs1 = lm(score ~ group, data=d00)
print(anova(rs1))

Analysis of Variance Table
Response: score
          Df Sum Sq Mean Sq F value  Pr(>F)
group      2  18.20    9.10  4.0082 0.02991 *
Residuals 27  61.30    2.27

print(summary(rs1))

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)     3.9000     0.4765   8.185 8.64e-09 ***
grouppre.org    0.1000     0.6739   0.148   0.8831
grouppost.org   1.7000     0.6739   2.523   0.0178 *

SLIDE 66

  • anova(rs1) gives the familiar omnibus F. There are no regression coefficients, nor any of the interpretive issues associated with them!
  • summary(lm()) yields also a set of regression coefficients and associated p-values. What are the 'predictor' variables in this regression, and what algebraic interpretation do the coefficients have?
  • The interpretation of the coeffs depends critically on how the predictors are defined.
  • The coefficients in the regression will often correspond to the contrasts among group means that we have already discussed.

SLIDE 67

  • The predictors will, in general, not look like contrasts among means. However, when the contrasts are orthogonal, the predictors are the same as the contrasts (Benoît).
  • Note that R calls predictors 'contrasts'.
  • What predictors or 'contrasts' are used in lm() as the default? Ans: 'grouppre.org' and 'grouppost.org'.
  • How to (a) specify our own 'contrasts', and (b) derive algebraically the interpretation of the coeffs in terms of the group means?

SLIDE 68

  • The default set of predictors or 'contrasts' used in lm() is obtained with contr.treatment, a particular kind of dummy coding in which level 1 of 'treatment' is treated as the baseline or control group (base = 1). If another level of 'treatment', e.g., 2, is the baseline, we would specify base = 2; the predictors or 'contrasts' would then be 'group1' and 'group3'.
  • With contr.treatment, the resulting effects are not really contrasts (the {ai} do not always sum to 0), and they are not orthogonal, as will be shown.
SLIDE 69

contrasts(d00$group, 2) = contr.treatment(3, base=2, contrasts=TRUE)
rs1 = lm(score ~ group, data=d00)
print(summary(rs1))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   4.0000     0.4765   8.395 5.25e-09 ***
group1       -0.1000     0.6739  -0.148   0.8831
group3        1.6000     0.6739   2.374   0.0249 *

  • When the contrasts are defined by 'contr.treatment', the Intercept is equal to the mean of the baseline level - level 1 (3.9) in the 1st analysis, and level 2 (4.0) above.