Analysis of variance (ANOVA): Lecture 4



slide-1
SLIDE 1

Analysis of variance (ANOVA)

Lecture 4

slide-2
SLIDE 2

Objectives

By actively following the lecture and practical and carrying out the independent study the successful student will be able to:

  • Select One-way or Two-way ANOVA, or their non-parametric equivalent tests, appropriately and apply them in R
  • Explain the rationale and principles of One-way and Two-way ANOVA
  • Interpret and report the results of One-way and Two-way ANOVA and their non-parametric equivalents

slide-3
SLIDE 3

Continuous~categorical tests…

How many samples?

  • 1 sample: One sample t-test. H0: µ1 = µ0; H1: µ1 ≠ µ0. Compares mean to a hypothesized mean (µ0)
  • 2 samples: Data in pairs? (e.g. multiple measurements on same organism)
      • Yes: Paired sample t-test. H0: µ1 - µ2 = 0; H1: µ1 - µ2 ≠ 0. Compares mean difference to zero
      • No: Two sample t-test. H0: µ1 = µ2; H1: µ1 ≠ µ2. Compares two means
  • >2 samples: ANOVA

slide-4
SLIDE 4

Continuous ~ Categorical: more than two groups

Analysis of Variance tests allow us to consider the differences between more than two groups. They have nonparametric alternatives for when assumptions are not met.

Example: does genotype affect blood pressure? Replicates of 3 different genotypes and measures of systolic blood pressure.

genotype Val/Val Met/Val Met/Met … … …

slide-5
SLIDE 5

Why ANOVA and not several t-tests?

Doing lots of comparisons inflates the type I error rate (rejecting the null hypothesis when it is true).

  • For a statistical test with α = 0.05, if the null hypothesis is true then the probability of not obtaining a significant result is 0.95.
  • If you compare 4 groups (A, B, C, D) you need 6 pairwise tests (α = 0.05 for each). Treating the tests as independent, the probability of not obtaining a significant result in any of them is (0.95)^6 = 0.74.
  • Your chance of incorrectly rejecting the null hypothesis at least once (a type I error) is about 1 in 4 instead of 1 in 20!

ANOVA compares all means simultaneously and maintains the type I error probability at the designated level rather than inflating it.
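The arithmetic on this slide can be checked in a couple of lines of R. This is a minimal sketch that treats the six tests as independent, as the slide does; the variable names are illustrative only.

```r
# Family-wise type I error rate when every pairwise t-test uses alpha = 0.05.
# Assumption: the tests are treated as independent, as on the slide.
alpha    <- 0.05
n_groups <- 4

n_tests             <- choose(n_groups, 2)        # number of pairwise comparisons
p_no_false_positive <- (1 - alpha)^n_tests        # no test is significant by chance
family_wise_error   <- 1 - p_no_false_positive    # at least one false positive

round(c(tests          = n_tests,
        p_none         = p_no_false_positive,
        p_at_least_one = family_wise_error), 3)
```

Changing `n_groups` shows how quickly the inflation grows as more groups are compared.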

slide-6
SLIDE 6

Same principles: t-tests & ANOVA

These work in fundamentally the same way, using measures of variation:

  • t-tests: is the difference big relative to the variation?
  • ANOVA: is the variation between groups big relative to the variation within groups?

ANOVA also has assumptions based on the normal distribution: normality and equal variance.

slide-7
SLIDE 7

ANOVA terminology

  • The categorical explanatory variable: Factor, Treatment (e.g. genotype)
  • The different groups: Levels of the factor (Val/Val, Met/Val, Met/Met)
  • Variance: MS, Mean square, the “mean of the squared deviations from the mean”
  • Total variation: Total MS
  • Variation between groups: Treatment MS or Factor MS
  • Variation within the groups: Residual MS or Error MS
slide-8
SLIDE 8

One-way ANOVA: example

Which of three media is best for growing bacterial cultures?

  • One factor: media
  • Three levels: Control; Control + sugar; Control + sugar + amino acids
  • Continuous response: colony diameters (mm)

slide-9
SLIDE 9

The data are in long format: Response ~ explanatory

  • Test: H0: F = 1 vs H1: F > 1
  • Interpretation: H0: mean1 = mean2 = mean3 vs H1: at least two means differ

One-way ANOVA: example
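To make "long format" concrete, here is a small sketch. The lecture's `culture` data are not reproduced in this transcript, so `culture_sim` is a simulated stand-in with made-up values; only the layout (one row per colony, one column per variable) is the point.

```r
# Simulated stand-in for the lecture's data: the values are made up.
set.seed(1)
medium_levels <- c("control", "with sugar", "with sugar + amino acids")
culture_sim <- data.frame(
  medium   = factor(rep(medium_levels, each = 10), levels = medium_levels),
  diameter = rnorm(30, mean = rep(c(10, 10.2, 11.3), each = 10), sd = 0.9)
)

str(culture_sim)       # one row per colony, one column per variable
head(culture_sim, 3)

# Group means, using the same Response ~ Explanatory formula
aggregate(diameter ~ medium, data = culture_sim, FUN = mean)
```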

slide-10
SLIDE 10

Checking assumptions before running the ANOVA: Normality

tapply(culture$diameter, culture$medium, shapiro.test)

$control
        Shapiro-Wilk normality test
data:  X[[1L]]
W = 0.9347, p-value = 0.4955

$`with sugar`
        Shapiro-Wilk normality test
data:  X[[2L]]
W = 0.9429, p-value = 0.5857

$`with sugar + amino acids`
        Shapiro-Wilk normality test
data:  X[[3L]]
W = 0.9284, p-value = 0.4322

One-way ANOVA: example

No evidence that assumptions are not met

slide-11
SLIDE 11

bartlett.test(culture$diameter, culture$medium)

        Bartlett test of homogeneity of variances
data:  diameter and medium
Bartlett's K-squared = 2.3986, df = 2, p-value = 0.3014

One-way ANOVA: example

Checking assumptions before running the ANOVA: Equal variance

No evidence that assumptions are not met
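Both pre-checks can be collected into a short script. The sketch below runs on a simulated stand-in (`culture_sim`, made-up values) rather than the lecture's data, and pulls out just the p-values so they are easy to scan.

```r
# Simulated stand-in for the lecture's data: the values are made up.
set.seed(1)
medium_levels <- c("control", "with sugar", "with sugar + amino acids")
culture_sim <- data.frame(
  medium   = factor(rep(medium_levels, each = 10), levels = medium_levels),
  diameter = rnorm(30, mean = rep(c(10, 10.2, 11.3), each = 10), sd = 0.9)
)

# Normality within each group: one Shapiro-Wilk test per level
normality <- tapply(culture_sim$diameter, culture_sim$medium, shapiro.test)
sapply(normality, function(test) test$p.value)

# Equal variance across groups (formula interface of bartlett.test)
equal_var <- bartlett.test(diameter ~ medium, data = culture_sim)
equal_var$p.value
```

A p-value above 0.05 in these tests means no evidence against the assumption, which is how the slides word it; it is not proof that the assumption holds.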

slide-12
SLIDE 12

Running the test

mod <- aov(diameter ~ medium, data = culture)
summary(mod)

            Df Sum Sq Mean Sq F value  Pr(>F)
medium       2 10.495  5.2473  6.1129 0.00646 **
Residuals   27 23.177  0.8584
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

  • aov(): the anova function
  • mod: the output is saved to mod
  • diameter ~ medium: Response ~ Explanatory, the ‘model formula’

One-way ANOVA: example

slide-13
SLIDE 13

            Df Sum Sq Mean Sq F value  Pr(>F)
medium       2 10.495  5.2473  6.1129 0.00646 **
Residuals   27 23.177  0.8584
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

  • medium row: between groups
  • Residuals row: within groups

One-way ANOVA: example

slide-14
SLIDE 14

            Df Sum Sq Mean Sq F value  Pr(>F)
medium       2 10.495  5.2473  6.1129 0.00646 **
Residuals   27 23.177  0.8584
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

  • Df, medium: no. of levels – 1, so 3 - 1 = 2
  • Df, Residuals: (no. in each level - 1) x no. of levels, so (10 - 1) x 3 = 27
  • Sum Sq, medium: sum of squared deviations between each group mean and the overall mean, multiplied by the number in each group
  • Sum Sq, Residuals: sum of squared deviations between each value and its group mean

One-way ANOVA: example

slide-15
SLIDE 15

  • Mean Square (aka variance) = SS / d.f.
  • F = Medium MS / Residual MS

One-way ANOVA: example

            Df Sum Sq Mean Sq F value  Pr(>F)
medium       2 10.495  5.2473  6.1129 0.00646 **
Residuals   27 23.177  0.8584
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
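
The table entries can be rebuilt by hand from the definitions on the last two slides. The sketch below does so on a simulated stand-in (`culture_sim`, made-up values, since the lecture's data are not in this transcript) and compares the result with `aov()`; the final lines plug the slide's own Mean Sq and F values into the same formulas.

```r
# Simulated stand-in for the lecture's data: the values are made up.
set.seed(1)
medium_levels <- c("control", "with sugar", "with sugar + amino acids")
culture_sim <- data.frame(
  medium   = factor(rep(medium_levels, each = 10), levels = medium_levels),
  diameter = rnorm(30, mean = rep(c(10, 10.2, 11.3), each = 10), sd = 0.9)
)

grand_mean  <- mean(culture_sim$diameter)
group_means <- tapply(culture_sim$diameter, culture_sim$medium, mean)
group_sizes <- tapply(culture_sim$diameter, culture_sim$medium, length)

# Between groups: squared deviations of group means from the overall mean,
# weighted by the number in each group
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
# Within groups: squared deviations of each value from its own group mean
ss_within  <- sum((culture_sim$diameter -
                   ave(culture_sim$diameter, culture_sim$medium))^2)

df_between <- nlevels(culture_sim$medium) - 1                  # 3 - 1
df_within  <- nrow(culture_sim) - nlevels(culture_sim$medium)  # 30 - 3

ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within
f_value    <- ms_between / ms_within
p_value    <- pf(f_value, df_between, df_within, lower.tail = FALSE)

# Same quantities from aov(), for comparison
aov_table <- summary(aov(diameter ~ medium, data = culture_sim))[[1]]
c(by_hand = f_value, aov = aov_table[["F value"]][1])

# The slide's own numbers: F from the two mean squares, then its p-value
slide_f <- 5.2473 / 0.8584
slide_p <- pf(6.1129, df1 = 2, df2 = 27, lower.tail = FALSE)
```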
slide-16
SLIDE 16

Reporting the result

            Df Sum Sq Mean Sq F value  Pr(>F)
medium       2 10.495  5.2473  6.1129 0.00646 **
Residuals   27 23.177  0.8584

There was a significant effect of media on the diameter of bacterial colonies (ANOVA: F = 6.11; d.f. = 2, 27; p = 0.006).

But not quite finished reporting…..

Significance Direction Statistics

One-way ANOVA: example

slide-17
SLIDE 17

Checking assumptions after running the ANOVA

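Use the residuals: this is the ‘real’ assumption, because normality and equal variance are assumed of the residuals.

plot(mod)

  • Residuals vs fitted values: spread should be similar in each group (equal variance)
  • Normal Q-Q plot: points should lie approximately on the 1:1 line (normality)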

One-way ANOVA: example
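The same checks can be made numerically on the residuals, as a complement to the plots. This is a sketch on a simulated stand-in (`culture_sim` and `mod_sim` are illustrative names with made-up values), not the lecture's own analysis.

```r
# Simulated stand-in for the lecture's data: the values are made up.
set.seed(1)
medium_levels <- c("control", "with sugar", "with sugar + amino acids")
culture_sim <- data.frame(
  medium   = factor(rep(medium_levels, each = 10), levels = medium_levels),
  diameter = rnorm(30, mean = rep(c(10, 10.2, 11.3), each = 10), sd = 0.9)
)

mod_sim <- aov(diameter ~ medium, data = culture_sim)
res     <- residuals(mod_sim)   # each value minus its group mean

# Normality of the residuals, pooled over all groups
shapiro.test(res)$p.value

# Spread of the residuals in each group: these should be broadly similar
tapply(res, culture_sim$medium, var)
```

Pooling the residuals gives one normality test on 30 values instead of three tests on 10 values each.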

slide-18
SLIDE 18

Reporting the result

But not quite finished reporting yet…..

Which means differ? This requires a “post-hoc” test, e.g., Tukey.

Significance Direction Statistics

One-way ANOVA: example

slide-19
SLIDE 19

Reporting the result: which means differ

TukeyHSD(aov(diameter ~ medium, data = culture))

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = diameter ~ medium)

$medium
                                      diff       lwr      upr     p adj
with sugar-control                   0.170 -0.857331 1.197331 0.9116894
with sugar + amino acids-control     1.331  0.303669 2.358331 0.0092052
with sugar + amino acids-with sugar  1.161  0.133669 2.188331 0.0243794

plot(TukeyHSD(aov(diameter ~ medium, data = culture)))

Plot annotations: a 95% CI for each comparison; a reference line at a difference of zero.

One-way ANOVA: example
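Reading the Tukey table comes down to one rule: a comparison is significant at the 5% level when its 95% interval excludes zero. The sketch below applies that rule to a simulated stand-in (`culture_sim`, made-up values); the added column names are illustrative.

```r
# Simulated stand-in for the lecture's data: the values are made up.
set.seed(1)
medium_levels <- c("control", "with sugar", "with sugar + amino acids")
culture_sim <- data.frame(
  medium   = factor(rep(medium_levels, each = 10), levels = medium_levels),
  diameter = rnorm(30, mean = rep(c(10, 10.2, 11.3), each = 10), sd = 0.9)
)

mod_sim <- aov(diameter ~ medium, data = culture_sim)
tukey   <- TukeyHSD(mod_sim)

# One row per pairwise comparison: diff, lwr, upr, p adj
comparisons <- as.data.frame(tukey$medium)

# Two equivalent ways to read the table
comparisons$excludes_zero <- comparisons$lwr > 0 | comparisons$upr < 0
comparisons$significant   <- comparisons[["p adj"]] < 0.05
comparisons
```

In the lecture's output the same rule picks out the two comparisons involving "with sugar + amino acids", whose intervals lie entirely above zero.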

slide-20
SLIDE 20

Illustrating (order the factor levels from lowest to highest mean)

Figure 1. Mean colony diameter for bacteria grown on different media. Error bars are +/- S.E. Means that do not differ significantly under post-hoc comparison are labelled with the same letter code.

One-way ANOVA: example

slide-21
SLIDE 21

Or: anything is possible with ggplot

  • + geom_jitter(): show all data points
  • + annotate(): add lines and text to the plot
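A minimal sketch of how these two layers might be combined, assuming the ggplot2 package is installed. The data are a simulated stand-in (`culture_sim`, made-up values), and the annotation positions and label text are placeholders chosen only to show the mechanics.

```r
library(ggplot2)

# Simulated stand-in for the lecture's data: the values are made up.
set.seed(1)
medium_levels <- c("control", "with sugar", "with sugar + amino acids")
culture_sim <- data.frame(
  medium   = factor(rep(medium_levels, each = 10), levels = medium_levels),
  diameter = rnorm(30, mean = rep(c(10, 10.2, 11.3), each = 10), sd = 0.9)
)

p <- ggplot(culture_sim, aes(x = medium, y = diameter)) +
  # every data point, spread sideways only so the y values stay exact
  geom_jitter(width = 0.1, height = 0) +
  # a horizontal line spanning the first two groups (positions are placeholders)
  annotate("segment", x = 1, xend = 2, y = 13.5, yend = 13.5) +
  # text above that line (label is a placeholder)
  annotate("text", x = 1.5, y = 13.8, label = "illustrative label") +
  labs(x = "Medium", y = "Colony diameter (mm)")

p
```

On a discrete x axis the groups sit at positions 1, 2, 3, which is why the annotations use numeric x values.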

slide-22
SLIDE 22

Reporting the result: finished

There was a significant effect of media on the diameter of bacterial colonies (ANOVA: F = 6.11; d.f. = 2, 27; p = 0.006) with colonies shown, by post-hoc comparison, to grow significantly better when both sugar and amino acids were added to the medium (see Figure 1). The addition of sugar alone did not significantly increase growth.

Significance Direction Statistics

One-way ANOVA: example

slide-23
SLIDE 23

One-way ANOVA: nonparametric equivalent

When: residuals are heteroscedastic (unequal variance) and/or not normal, especially when there is a combination of unequal sample sizes and heteroscedasticity.

Kruskal-Wallis

  • Uses ranks
  • H0: mean rank g1 = mean rank g2 = mean rank g3 etc. vs H1: at least 2 mean ranks differ

slide-24
SLIDE 24

Kruskal-Wallis - example

Running the test

Here used on the same data, for comparison of power.

kruskal.test(data = culture, diameter ~ medium)

        Kruskal-Wallis rank sum test
data:  diameter by medium
Kruskal-Wallis chi-squared = 8.1005, df = 2, p-value = 0.01742

The one-way ANOVA table for the same data:

            Df Sum Sq Mean Sq F value  Pr(>F)
medium       2 10.495  5.2473  6.1129 0.00646 **
Residuals   27 23.177  0.8584
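The side-by-side comparison can be scripted. The sketch below runs both tests on a simulated stand-in (`culture_sim`, made-up values), then recovers the slide's Kruskal-Wallis p-value from its chi-squared statistic.

```r
# Simulated stand-in for the lecture's data: the values are made up.
set.seed(1)
medium_levels <- c("control", "with sugar", "with sugar + amino acids")
culture_sim <- data.frame(
  medium   = factor(rep(medium_levels, each = 10), levels = medium_levels),
  diameter = rnorm(30, mean = rep(c(10, 10.2, 11.3), each = 10), sd = 0.9)
)

# Rank-based test and parametric test on the same data
kw      <- kruskal.test(diameter ~ medium, data = culture_sim)
anova_p <- summary(aov(diameter ~ medium, data = culture_sim))[[1]][["Pr(>F)"]][1]
c(kruskal_wallis = kw$p.value, anova = anova_p)

# The slide's own result: the p-value is the upper tail of a chi-squared
# distribution with (number of groups - 1) degrees of freedom
slide_p <- pchisq(8.1005, df = 2, lower.tail = FALSE)
slide_p
```

In the lecture's output the Kruskal-Wallis p-value (0.01742) is larger than the ANOVA p-value (0.00646) on the same data, which is the power comparison the slide refers to.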

slide-25
SLIDE 25

Reporting the result

kruskal.test(data = culture, diameter ~ medium)

        Kruskal-Wallis rank sum test
data:  diameter by medium
Kruskal-Wallis chi-squared = 8.1005, df = 2, p-value = 0.01742

There was a significant effect of media on the diameter of bacterial colonies (Kruskal-Wallis: χ² = 8.1; d.f. = 2; p = 0.017).

Post-hoc test?

Significance Direction Statistics

Kruskal-Wallis - example

slide-26
SLIDE 26

Reporting the result: which groups differ

Difference between the mean of the ranks

ranked <- rank(culture$diameter)
tapply(ranked, culture$medium, mean)

                 control               with sugar with sugar + amino acids
                   11.85                    12.70                    21.95

library(pgirmess)
kruskalmc(culture$diameter, culture$medium, probs = 0.05)

Multiple comparison test after Kruskal-Wallis
p.value: 0.05
Comparisons
                                    obs.dif critical.dif difference
control-with sugar                     0.85     9.425108      FALSE
control-with sugar + amino acids      10.10     9.425108       TRUE
with sugar-with sugar + amino acids    9.25     9.425108      FALSE

Kruskal-Wallis - example
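The `critical.dif` column can be rebuilt from the numbers already on the slides. The sketch below assumes the post-hoc procedure is the usual normal-approximation critical difference for mean ranks (a z quantile at α / (k(k - 1)) multiplied by a standard error based on N and the group sizes); that the package uses exactly this formula is an assumption, although it does reproduce the value shown. The group size of 10 comes from the degrees-of-freedom slide.

```r
# Values taken from the slides
alpha       <- 0.05
k           <- 3      # number of groups
n_per_group <- 10     # from (10 - 1) x 3 = 27 residual d.f.
n_total     <- k * n_per_group
mean_ranks  <- c("control"                  = 11.85,
                 "with sugar"               = 12.70,
                 "with sugar + amino acids" = 21.95)

# Assumed formula: z quantile adjusted for k(k - 1)/2 two-sided comparisons,
# times the standard error of a difference between two mean ranks
z_crit       <- qnorm(1 - alpha / (k * (k - 1)))
critical_dif <- z_crit * sqrt(n_total * (n_total + 1) / 12 *
                              (1 / n_per_group + 1 / n_per_group))

# Observed differences between mean ranks, for every pair of groups
pairs   <- combn(names(mean_ranks), 2)
obs_dif <- abs(mean_ranks[pairs[1, ]] - mean_ranks[pairs[2, ]])
names(obs_dif) <- paste(pairs[1, ], pairs[2, ], sep = "-")

data.frame(obs_dif, critical_dif, difference = obs_dif > critical_dif)
```

Only the control vs "with sugar + amino acids" difference exceeds the critical difference, which matches the single TRUE in the lecture's output.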

slide-27
SLIDE 27

Figure 1. Median (heavy lines) colony diameter for bacteria grown on different media.

Kruskal-Wallis - example

slide-28
SLIDE 28

Two-way ANOVA

What if we want to see the effects of more than one categorical variable on a continuous variable?

Sex Species

F.flappa F.concocti I.lepidoptera Male Female

slide-29
SLIDE 29

Two-way ANOVA

Two-way Analysis of variance allows us to examine the effects of two variables simultaneously and whether those two variables act independently in their effect on the response variable.

A two-way ANOVA tests for:

  • The effect of categorical variable one
  • The effect of categorical variable two
  • The independence of effects (requires replication)

slide-30
SLIDE 30

Running and interpreting the test

mod <- aov(winglen ~ sex * spp, data = butter)
summary(mod)

            Df Sum Sq Mean Sq F value   Pr(>F)
sex          1 145.16 145.161  9.2717 0.004334 **
spp          1 168.92 168.921 10.7893 0.002280 **
sex:spp      1  67.08  67.081  4.2846 0.045692 *
Residuals   36 563.63  15.656

  • There is an effect of sex (difference between sexes)
  • There is an effect of species (difference between species)
  • There is an interaction between sex and species…..

The model formula: Response ~ Explanatory 1 * Explanatory 2

  • + gives just the main effects (no interaction)
  • * gives the main effects and the interaction; sex * spp is equivalent to sex + spp + sex:spp

Degrees of freedom and mean squares:

  • Total d.f. = no. of values – 1
  • Interaction d.f. = d.f. of each factor multiplied together
  • Residual d.f. = total d.f. – all the other d.f.
  • Residual MS: the ‘error’ MS for all three tests

Two-way ANOVA with replication: Example
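The formula shorthand and the degrees-of-freedom bookkeeping can both be checked on a toy data set. The `butter` data are not reproduced in this transcript, so `butter_sim` below is a simulated stand-in with made-up cell means; only the design (2 sexes x 2 species x 10 replicates) mirrors the lecture.

```r
# Simulated stand-in for the lecture's data: the values are made up.
set.seed(2)
butter_sim <- expand.grid(rep = 1:10,
                          sex = c("females", "males"),
                          spp = c("F.concocti", "F.flappa"))
cell_mean <- with(butter_sim,
                  24 + 2 * (sex == "females") + 2 * (spp == "F.concocti") +
                    3 * (sex == "females" & spp == "F.concocti"))
butter_sim$winglen <- rnorm(nrow(butter_sim), mean = cell_mean, sd = 4)

full     <- aov(winglen ~ sex * spp, data = butter_sim)             # main effects + interaction
expanded <- aov(winglen ~ sex + spp + sex:spp, data = butter_sim)   # same model, written out
additive <- aov(winglen ~ sex + spp, data = butter_sim)             # main effects only

full_table     <- summary(full)[[1]]
expanded_table <- summary(expanded)[[1]]

# The d.f. column: sex, spp, sex:spp, Residuals
full_table$Df
# Total d.f. = no. of values - 1
sum(full_table$Df) == nrow(butter_sim) - 1
# Dropping the interaction returns its 1 d.f. to the residuals
df.residual(additive)
```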

slide-31
SLIDE 31

Interpreting the interaction

interaction.plot(butter$spp, butter$sex, butter$winglen)

Do not include this plot in the report, but it helps you understand that the ‘effect of one factor depends on the level of another’.

Two-way ANOVA with replication: Example

slide-32
SLIDE 32

Reporting the result

summarySE(data = butter, measurevar = "winglen", groupvars = c("sex","spp"))

      sex        spp  N winglen       sd        se       ci
1 females F.concocti 10   31.37 4.275265 1.3519574 3.058340
2 females   F.flappa 10   24.67 3.270423 1.0341986 2.339520
3   males F.concocti 10   24.97 4.957609 1.5677337 3.546460
4   males   F.flappa 10   23.45 3.012290 0.9525696 2.154862

F.concocti had significantly longer wings than F.flappa (ANOVA: F = 10.79; d.f. = 1,36; p = 0.002) and females were significantly bigger than males (F = 9.27; d.f. = 1,36; p = 0.004). However, there was also a significant interaction between sex and species (F = 4.28; d.f. = 1,36; p = 0.046) with a much greater difference between males and females in F.concocti than in F.flappa (Figure 1).

Significance Direction Statistics

Two-way ANOVA with replication: Example
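`summarySE()` is not part of base R, so it has to come from an add-on package that the slides do not name. As a hedged alternative, the same N, mean, sd and se columns can be sketched with `aggregate()`. The data below are a simulated stand-in (`butter_sim`, made-up values); the last line checks the first se on the slide against its sd and N.

```r
# Simulated stand-in for the lecture's data: the values are made up.
set.seed(2)
butter_sim <- expand.grid(rep = 1:10,
                          sex = c("females", "males"),
                          spp = c("F.concocti", "F.flappa"))
butter_sim$winglen <- rnorm(nrow(butter_sim), mean = 25, sd = 4)

# N, mean, sd and standard error for every sex x species cell
cell_stats <- aggregate(
  winglen ~ sex + spp, data = butter_sim,
  FUN = function(x) c(N = length(x), mean = mean(x),
                      sd = sd(x), se = sd(x) / sqrt(length(x)))
)
# aggregate() returns the four statistics as one matrix column; flatten it
cell_stats <- do.call(data.frame, cell_stats)
cell_stats

# Standard error = sd / sqrt(N), using the first row of the slide's table
slide_se <- 4.275265 / sqrt(10)
```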

slide-33
SLIDE 33

Illustrating result

Two-way ANOVA with replication: Example

slide-34
SLIDE 34

  • Sex: NS
  • Spp: NS
  • Int: Sig

But sex does have an effect! It is just reversed.

If you have a significant interaction, interpret main effects with care.

Two-way ANOVA with replication: Example

slide-35
SLIDE 35

Summary…

  • ANOVA: parametric test to determine if 3(+) population means are all equal
  • Kruskal-Wallis is the non-parametric equivalent
  • Post-hoc tests can be used to determine where significant differences lie between groups
  • Two-way Analysis of variance allows us to examine the effects of two variables simultaneously and whether those two variables act independently in their effect on the response variable