Analysis of variance (ANOVA) Lecture 4
Objectives
By actively following the lecture and practical and carrying out the independent study, the successful student will be able to:
- Select One-way or Two-way ANOVA, or their non-parametric equivalent tests, appropriately and apply them in R
- Explain the rationale and principles of One-way and Two-way ANOVA
- Interpret and report the results of One-way and Two-way ANOVA and their non-parametric equivalents
Continuous~categorical tests…
How many samples?
- 1 sample: One sample t-test
  H0: µ1 = µ0; H1: µ1 ≠ µ0
  Compares mean to a hypothesized mean (µ0)
- 2 samples: Data in pairs? (e.g. multiple measurements on same organism)
  - Yes: Paired sample t-test
    H0: µ1 - µ2 = 0; H1: µ1 - µ2 ≠ 0
    Compares mean difference to zero
  - No: Two sample t-test
    H0: µ1 = µ2; H1: µ1 ≠ µ2
    Compares two means
- >2 samples: ANOVA
Continuous ~ Categorical: more than two groups
Analysis of Variance tests allow us to consider the differences between more than two groups. They have nonparametric alternatives for when the assumptions are not met.
Example: does genotype affect blood pressure? Replicates of 3 different genotypes, each with a measure of systolic blood pressure.
genotype Val/Val Met/Val Met/Met … … …
Why ANOVA and not several t-tests?
Doing lots of comparisons inflates the type I error rate (rejecting the null hypothesis when it is true).
- For a statistical test with α = 0.05, if the null hypothesis is true then the probability of not obtaining a significant result is 0.95.
- Comparing 4 groups (A, B, C, D) pairwise takes 6 tests (α = 0.05 for each). Treating the tests as independent, the probability of obtaining no significant result in any of them is (0.95)^6 = 0.74.
- Your chance of incorrectly rejecting at least one null hypothesis (a type I error) is therefore about 1 in 4 instead of 1 in 20!
ANOVA compares all means simultaneously and maintains the type I error probability at the designated level instead of inflating it.
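The arithmetic behind the "1 in 4" figure can be reproduced in R. This is a minimal sketch that assumes the six pairwise tests are independent (an approximation); the variable names are illustrative only.

```r
# Family-wise type I error rate for all pairwise t-tests among 4 groups,
# assuming (as an approximation) that the tests are independent.
alpha    <- 0.05
n_groups <- 4

# Number of pairwise comparisons: A-B, A-C, A-D, B-C, B-D, C-D
n_tests <- choose(n_groups, 2)

# Probability that none of the tests is significant when every H0 is true
p_no_false_positive <- (1 - alpha)^n_tests

# Probability of at least one false positive (the family-wise error rate)
familywise_error <- 1 - p_no_false_positive

round(c(tests = n_tests,
        p_none = p_no_false_positive,
        familywise = familywise_error), 3)
```

Changing n_groups shows how quickly the error rate grows as more groups are compared.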
Same principles: t-tests & ANOVA
These work in fundamentally the same way, using measures of variation.
- t-tests: is the difference big relative to the variation?
- ANOVA: is the variation between groups big relative to the variation within groups?
ANOVA also has assumptions based on the normal distribution: normality and equal variance.
ANOVA terminology
- The categorical explanatory variable: Factor or Treatment (e.g. genotype)
- The different groups: Levels of the factor (Val/Val, Met/Val, Met/Met)
- Variance: MS, the Mean Square (“mean of the squared deviations from the mean”)
- Total variation: Total MS
- Variation between groups: Treatment MS or Factor MS
- Variation within the groups: Residual MS or Error MS
One-way ANOVA: example
Which of three media is best for growing bacterial cultures?
- One factor: media
- Three levels: Control; Control + sugar; Control + sugar + amino acids
- Continuous response: colony diameters (mm)
Data in long format: Response ~ explanatory
Test: H0: F = 1 vs H1: F > 1
Interpretation: H0: mean1 = mean2 = mean3 vs H1: at least two means differ
One-way ANOVA: example
Checking assumptions before running the ANOVA
Normality
tapply(culture$diameter, culture$medium, shapiro.test)
$control
    Shapiro-Wilk normality test
data:  X[[1L]]
W = 0.9347, p-value = 0.4955

$`with sugar`
    Shapiro-Wilk normality test
data:  X[[2L]]
W = 0.9429, p-value = 0.5857

$`with sugar + amino acids`
    Shapiro-Wilk normality test
data:  X[[3L]]
W = 0.9284, p-value = 0.4322
No evidence that the normality assumption is not met (all p > 0.05).
One-way ANOVA: example
Checking assumptions before running the ANOVA
Equal variance
bartlett.test(culture$diameter, culture$medium)
    Bartlett test of homogeneity of variances
data:  diameter and medium
Bartlett's K-squared = 2.3986, df = 2, p-value = 0.3014
No evidence that the equal-variance assumption is not met (p = 0.30).
Running the test
mod <- aov(diameter ~ medium, data = culture)
summary(mod)
            Df Sum Sq Mean Sq F value  Pr(>F)
medium       2 10.495  5.2473  6.1129 0.00646 **
Residuals   27 23.177  0.8584
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
- aov(): the ANOVA function
- mod: the output is saved to mod
- diameter ~ medium: Response ~ Explanatory, the ‘model formula’
One-way ANOVA: example
            Df Sum Sq Mean Sq F value  Pr(>F)
medium       2 10.495  5.2473  6.1129 0.00646 **   <- Between groups
Residuals   27 23.177  0.8584                      <- Within groups
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
One-way ANOVA: example
            Df Sum Sq Mean Sq F value  Pr(>F)
medium       2 10.495  5.2473  6.1129 0.00646 **
Residuals   27 23.177  0.8584
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
- Df, medium: no. of levels - 1 = 3 - 1 = 2
- Df, Residuals: (no. in each level - 1) x no. of levels = (10 - 1) x 3 = 27
- Sum Sq, medium: sum of squared deviations between each group mean and the overall mean, multiplied by the number in each group
- Sum Sq, Residuals: sum of squared deviations between each value and its group mean
One-way ANOVA: example
Mean Square (aka variance) = SS / d.f.
F = Medium MS / Residual MS
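As a check on these definitions, the mean squares, F and p can be recomputed from the Sum Sq and Df values reported in the summary table. This is a sketch using only the rounded figures printed in the lecture, so the results agree with the table only to rounding; the variable names are illustrative.

```r
# Recompute the ANOVA table entries from the reported Sum Sq values.
ss_medium <- 10.495   # between-group (medium) sum of squares, from the table
ss_resid  <- 23.177   # within-group (residual) sum of squares, from the table

df_medium <- 3 - 1          # no. of levels - 1
df_resid  <- (10 - 1) * 3   # (no. in each level - 1) x no. of levels

# Mean Square = SS / d.f.
ms_medium <- ss_medium / df_medium
ms_resid  <- ss_resid / df_resid

# F = Medium MS / Residual MS
f_value <- ms_medium / ms_resid

# p-value: upper tail of the F distribution with (2, 27) d.f.
p_value <- pf(f_value, df_medium, df_resid, lower.tail = FALSE)

round(c(MS_medium = ms_medium, MS_resid = ms_resid,
        F = f_value, p = p_value), 4)
```

The upper tail is used because only large values of F (between-group variation large relative to within-group variation) count as evidence against H0.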
Reporting the result
There was a significant effect of media on the diameter of bacterial colonies (ANOVA: F = 6.11; d.f. = 2, 27; p = 0.006).
A complete report covers: Significance, Direction, Statistics.
But not quite finished reporting: this sentence gives the significance and the statistics, not yet the direction.
One-way ANOVA: example
Checking assumptions after running the ANOVA
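Use the residuals: the assumptions ‘really’ apply to the residuals.
plot(mod)
- Residuals vs fitted plot: spread should be similar in each group (equal variance)
- Normal Q-Q plot: points should fall approximately on the 1:1 line (normality)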
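The residual checks can also be run numerically. The lecture's culture data are not reproduced here, so this sketch uses a simulated stand-in (culture_sim, with made-up group means and spread); only the workflow is meant to carry over.

```r
# Simulated stand-in for the culture data: 3 media x 10 colonies each.
# The means and sd below are invented for illustration only.
set.seed(42)
media <- c("control", "with sugar", "with sugar + amino acids")
culture_sim <- data.frame(
  medium   = factor(rep(media, each = 10), levels = media),
  diameter = c(rnorm(10, mean = 10.0, sd = 1),
               rnorm(10, mean = 10.2, sd = 1),
               rnorm(10, mean = 11.3, sd = 1))
)

mod_sim <- aov(diameter ~ medium, data = culture_sim)

# Residuals: each value minus its group mean
res <- residuals(mod_sim)

# Normality of the residuals as a whole
shapiro.test(res)

# Spread of the residuals within each group (should be similar)
tapply(res, culture_sim$medium, var)

# Diagnostic plots: residuals vs fitted (1) and normal Q-Q (2)
# plot(mod_sim, which = 1:2)
```

Testing the pooled residuals uses all 30 values at once, rather than 10 per group as in the pre-test checks.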
One-way ANOVA: example
Reporting the result
But not quite finished reporting yet: which means differ?
This requires a “post-hoc” test, e.g. Tukey.
One-way ANOVA: example
Reporting the result: which means differ
TukeyHSD(aov(diameter ~ medium, data = culture))
  Tukey multiple comparisons of means
    95% family-wise confidence level
Fit: aov(formula = diameter ~ medium)
$medium
                                     diff       lwr      upr     p adj
with sugar-control                  0.170 -0.857331 1.197331 0.9116894
with sugar + amino acids-control    1.331  0.303669 2.358331 0.0092052
with sugar + amino acids-with sugar 1.161  0.133669 2.188331 0.0243794
plot(TukeyHSD(aov(diameter ~ medium, data = culture)))
In the plot, each bar is the 95% CI for one comparison and the vertical line marks a difference of zero.
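Reading the Tukey output comes down to asking whether each interval excludes zero. The sketch below retypes the reported values into a data frame (the column name excludes_zero is illustrative) and recomputes the interval half-width from the studentized range distribution, assuming 10 colonies per medium and the residual MS of 0.8584 from the ANOVA table.

```r
# Tukey intervals as reported in the lecture output
tukey <- data.frame(
  comparison = c("with sugar-control",
                 "with sugar + amino acids-control",
                 "with sugar + amino acids-with sugar"),
  diff = c(0.170, 1.331, 1.161),
  lwr  = c(-0.857331, 0.303669, 0.133669),
  upr  = c(1.197331, 2.358331, 2.188331)
)

# A comparison is significant at the 5% family-wise level
# when its 95% interval does not contain zero.
tukey$excludes_zero <- tukey$lwr > 0 | tukey$upr < 0
tukey

# Half-width of each interval for equal group sizes:
# q(0.95; 3 means, 27 residual d.f.) * sqrt(Residual MS / n)
half_width <- qtukey(0.95, nmeans = 3, df = 27) * sqrt(0.8584 / 10)
half_width
```

The recomputed half-width should be close to upr - diff in the reported table; small differences come from the rounded residual MS.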
One-way ANOVA: example
Illustrating (order factors going from lowest to highest mean)
Figure 1. Mean colony diameter for bacteria grown on different media. Error bars are +/- S.E. Means that do not differ significantly under post-hoc comparison are labelled with the same letter code.
One-way ANOVA: example
Or: anything is possible with ggplot
- + geom_jitter(): show all data points
- + annotate(): add lines and text to the plot
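A possible shape for such a figure is sketched below. It assumes the ggplot2 package is installed and uses simulated data (culture_sim, with invented means) in place of the lecture's culture data; the annotation text is a placeholder, not a post-hoc result.

```r
library(ggplot2)

# Simulated stand-in for the culture data (values invented for illustration)
set.seed(1)
media <- c("control", "with sugar", "with sugar + amino acids")
culture_sim <- data.frame(
  medium   = factor(rep(media, each = 10), levels = media),
  diameter = c(rnorm(10, mean = 10.0),
               rnorm(10, mean = 10.2),
               rnorm(10, mean = 11.3))
)

# Height at which to draw the annotation, just above the data
y_top <- max(culture_sim$diameter)

p <- ggplot(culture_sim, aes(x = medium, y = diameter)) +
  # all data points, jittered horizontally only
  geom_jitter(width = 0.1, height = 0, colour = "grey50") +
  # group mean +/- 1 standard error
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
  stat_summary(fun = mean, geom = "point", size = 3) +
  # a line spanning the first two groups, with placeholder text above it
  annotate("segment", x = 1, xend = 2, y = y_top + 0.5, yend = y_top + 0.5) +
  annotate("text", x = 1.5, y = y_top + 0.8, label = "placeholder label") +
  labs(x = "Medium", y = "Colony diameter (mm)")

# print(p) draws the figure
```

Factor levels are set explicitly so that the groups appear in the intended order on the x axis.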
Reporting the result: finished
There was a significant effect of media on the diameter of bacterial colonies (ANOVA: F = 6.11; d.f. = 2, 27; p = 0.006) with colonies shown, by post-hoc comparison, to grow significantly better when both sugar and amino acids were added to the medium (see Figure 1). The addition of sugar alone did not significantly increase growth.
One-way ANOVA: nonparametric equivalent
When: residuals are heteroscedastic (unequal variance) and/or not normal, especially when there is a combination of unequal sample sizes and heteroscedasticity.
Kruskal-Wallis
Uses ranks
H0: mean rank g1 = mean rank g2 = mean rank g3 etc. vs H1: at least 2 mean ranks differ
Kruskal-Wallis - example
Running the test
Here it is used on the same data, for comparison of power.
kruskal.test(diameter ~ medium, data = culture)
    Kruskal-Wallis rank sum test
data:  diameter by medium
Kruskal-Wallis chi-squared = 8.1005, df = 2, p-value = 0.01742
For comparison, the ANOVA on the same data:
            Df Sum Sq Mean Sq F value  Pr(>F)
medium       2 10.495  5.2473  6.1129 0.00646 **
Residuals   27 23.177  0.8584
Reporting the result
There was a significant effect of media on the diameter of bacterial colonies (Kruskal-Wallis: χ² = 8.1; d.f. = 2; p = 0.017).
Post-hoc test?
Kruskal-Wallis - example
Reporting the result: which groups differ
Difference between the mean of the ranks
ranked <- rank(culture$diameter)
tapply(ranked, culture$medium, mean)
                 control               with sugar with sugar + amino acids
                   11.85                    12.70                    21.95
library(pgirmess)
kruskalmc(culture$diameter, culture$medium, probs = 0.05)
Multiple comparison test after Kruskal-Wallis
p.value: 0.05
Comparisons
                                    obs.dif critical.dif difference
control-with sugar                     0.85     9.425108      FALSE
control-with sugar + amino acids      10.10     9.425108       TRUE
with sugar-with sugar + amino acids    9.25     9.425108      FALSE
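Both the Kruskal-Wallis statistic and the critical difference can be approximated from the mean ranks reported above. This sketch assumes 10 colonies per medium, omits the tie correction that kruskal.test() applies (so H comes out slightly below the reported 8.1005), and uses the standard rank-based multiple-comparison formula, which is assumed here to be what kruskalmc() computes.

```r
# Mean ranks as reported in the lecture output
mean_rank <- c("control" = 11.85,
               "with sugar" = 12.70,
               "with sugar + amino acids" = 21.95)
n_per_group <- 10                   # assumed: 10 colonies per medium
k <- length(mean_rank)              # number of groups
N <- n_per_group * k                # total number of observations

# Kruskal-Wallis H without tie correction:
# H = 12 / (N (N + 1)) * sum n_i * (mean rank_i - (N + 1) / 2)^2
H <- 12 / (N * (N + 1)) * sum(n_per_group * (mean_rank - (N + 1) / 2)^2)
p_value <- pchisq(H, df = k - 1, lower.tail = FALSE)

# Critical difference in mean ranks at alpha = 0.05 over all k(k-1)/2 pairs
alpha  <- 0.05
z_crit <- qnorm(1 - alpha / (k * (k - 1)))
critical_dif <- z_crit * sqrt(N * (N + 1) / 12 * (2 / n_per_group))

# Observed absolute differences between mean ranks
obs_dif <- abs(outer(mean_rank, mean_rank, "-"))

round(c(H = H, p = p_value, critical_dif = critical_dif), 4)
obs_dif > critical_dif
```

Only the control vs with sugar + amino acids difference (10.10) exceeds the critical difference, matching the TRUE/FALSE column in the output.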
Kruskal-Wallis - example
Figure 1. Median (heavy lines) colony diameter for bacteria grown on different media.
Two-way ANOVA
What if we want to see the effects of more than one categorical variable on a continuous variable?
- Sex: Male, Female
- Species: F.flappa, F.concocti, I.lepidoptera
Two-way ANOVA
Two-way Analysis of variance allows us to examine the effects of two variables simultaneously, and whether those two variables act independently in their effect on the response variable.
A two-way ANOVA tests for:
- The effect of categorical variable one
- The effect of categorical variable two
- The independence of effects (requires replication)
Running and interpreting the test
mod <- aov(winglen ~ sex * spp, data = butter)
summary(mod)
            Df Sum Sq Mean Sq F value   Pr(>F)
sex          1 145.16 145.161  9.2717 0.004334 **
spp          1 168.92 168.921 10.7893 0.002280 **
sex:spp      1  67.08  67.081  4.2846 0.045692 *
Residuals   36 563.63  15.656
- There is an effect of sex (difference between sexes)
- There is an effect of species (difference between species)
- There is an interaction between sex and species
Model formula: Response ~ Explanatory 1 * Explanatory 2
- + : just the main effects (no interaction)
- * : main effects and the interaction; sex * spp is equivalent to sex + spp + sex:spp
- Total d.f. = no. of values - 1
- Interaction d.f. = the d.f. of each factor multiplied together
- Residual d.f. = total d.f. - all other d.f.
- Residual MS: the ‘error’ MS for all three tests
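The formula equivalence and the degrees-of-freedom rules can be checked on a small balanced design. The butter data are not available here, so the sketch below simulates a stand-in (butter_sim: 2 sexes x 2 species x 10 replicates, with invented wing lengths); only the structure matches the lecture example.

```r
# Simulated stand-in for the butter data (wing lengths are invented)
set.seed(7)
butter_sim <- expand.grid(
  replicate = 1:10,
  sex = c("females", "males"),
  spp = c("F.concocti", "F.flappa")
)
butter_sim$sex <- factor(butter_sim$sex)
butter_sim$spp <- factor(butter_sim$spp)
butter_sim$winglen <- rnorm(nrow(butter_sim), mean = 25, sd = 4)

# The * shorthand and the written-out formula fit the same model
mod_star <- aov(winglen ~ sex * spp, data = butter_sim)
mod_long <- aov(winglen ~ sex + spp + sex:spp, data = butter_sim)
tab_star <- summary(mod_star)[[1]]
tab_long <- summary(mod_long)[[1]]
identical_tables <- isTRUE(all.equal(tab_star, tab_long))

# Degrees of freedom by the rules above
df_total       <- nrow(butter_sim) - 1                   # no. of values - 1
df_sex         <- nlevels(butter_sim$sex) - 1
df_spp         <- nlevels(butter_sim$spp) - 1
df_interaction <- df_sex * df_spp                        # factor d.f. multiplied
df_residual    <- df_total - df_sex - df_spp - df_interaction

c(total = df_total, sex = df_sex, spp = df_spp,
  interaction = df_interaction, residual = df_residual)
```

With 40 values this gives 39 total and 36 residual d.f., the same as in the lecture's summary table.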
Two-way ANOVA with replication: Example
Interpreting the interaction
interaction.plot(butter$spp, butter$sex, butter$winglen)
Do not include this plot in the report, but it helps you understand the interaction: ‘the effect of one factor depends on the level of another’.
Two-way ANOVA with replication: Example
Reporting the result
summarySE(data = butter, measurevar = "winglen", groupvars = c("sex","spp"))
      sex        spp  N winglen       sd        se       ci
1 females F.concocti 10   31.37 4.275265 1.3519574 3.058340
2 females   F.flappa 10   24.67 3.270423 1.0341986 2.339520
3   males F.concocti 10   24.97 4.957609 1.5677337 3.546460
4   males   F.flappa 10   23.45 3.012290 0.9525696 2.154862
F.concocti had significantly longer wings than F.flappa (ANOVA: F = 10.79; d.f. = 1,36; p = 0.002) and females were significantly bigger than males (F = 9.27; d.f. = 1,36; p = 0.004). However, there was also a significant interaction between sex and species (F = 4.28; d.f. = 1,36; p = 0.046) with a much greater difference between males and females in F.concocti than in F.flappa (Figure 1).
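The se and ci columns in this summary can be recovered from the reported sd and N. The sketch assumes that ci is the half-width of a 95% confidence interval based on the t distribution with N - 1 d.f., which is consistent with the printed values; the vector names are illustrative.

```r
# Standard deviations as reported in the summarySE() output
n <- 10
sd_winglen <- c(females_concocti = 4.275265,
                females_flappa   = 3.270423,
                males_concocti   = 4.957609,
                males_flappa     = 3.012290)

# Standard error of the mean: sd / sqrt(N)
se <- sd_winglen / sqrt(n)

# Half-width of the 95% confidence interval: t(0.975, N - 1) * se
ci <- qt(0.975, df = n - 1) * se

round(cbind(sd = sd_winglen, se = se, ci = ci), 4)
```

Error bars of +/- 1 S.E. in a figure would use the se column, not ci.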
Two-way ANOVA with replication: Example
Illustrating result
Two-way ANOVA with replication: Example
- Sex: NS
- Spp: NS
- Interaction: Sig
But sex does have an effect! It is just reversed.
If you have a significant interaction, interpret main effects with care.
Summary…
- ANOVA: a parametric test to determine whether three or more population means are all equal
- Kruskal-Wallis is the non-parametric equivalent
- Post-hoc tests can be used to determine where significant differences lie between groups
- Two-way Analysis of variance allows us to examine the effects of two variables simultaneously and whether those two variables act independently in their effect on the response variable