Analysis of variance (ANOVA): Lecture 4

Objectives
By actively following the lecture and practical and carrying out the independent study the successful student will be able to:
● Select One-way or Two-way ANOVA, or their non-parametric equivalent tests, appropriately and apply them in R
● Explain the rationale and principles of One-way and Two-way ANOVA
● Interpret and report the results of One-way and Two-way ANOVA and their non-parametric equivalents

Continuous ~ categorical tests: how many samples?
● 1 sample: One-sample t-test. H0: µ1 = µ0; H1: µ1 ≠ µ0. Compares a mean to a hypothesized mean (µ0).
● 2 samples: are the data in pairs (e.g. multiple measurements on the same organism)?
    ○ Yes: Paired-sample t-test. H0: µ1 - µ2 = 0; H1: µ1 - µ2 ≠ 0. Compares the mean difference to zero.
    ○ No: Two-sample t-test. H0: µ1 = µ2; H1: µ1 ≠ µ2. Compares two means.
● >2 samples: ANOVA.

Continuous ~ Categorical: more than two groups
Analysis of Variance tests allow us to consider the differences between more than two groups. They have non-parametric alternatives for when the assumptions are not met.
Example: does genotype affect blood pressure? Replicates of 3 different genotypes, with systolic blood pressure measured on each.

    genotype:   Val/Val   Met/Val   Met/Met
                …         …         …

Why ANOVA and not several t-tests?
Doing lots of comparisons inflates the type I error rate (rejecting the null hypothesis when it is true).
● For a statistical test with α = 0.05, if the null hypothesis is true then the probability of not obtaining a significant result is 0.95.
● Comparing 4 groups (A, B, C, D) pairwise takes 6 tests (α = 0.05 for each). If every null hypothesis is true and the tests are treated as independent, the probability of obtaining no significant result is 0.95^6 = 0.74.
Your chance of incorrectly rejecting at least one null hypothesis (a type I error) is therefore about 1 in 4 instead of 1 in 20!
ANOVA compares all means simultaneously and keeps the type I error probability at the designated level rather than inflating it.
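The arithmetic on this slide can be checked in R. This is a minimal sketch that treats the six pairwise tests as independent, which is only an approximation because the tests share groups; the variable names are illustrative.

```r
# Family-wise error rate for all pairwise t-tests among 4 groups
alpha    <- 0.05
n_groups <- 4

# Number of pairwise comparisons: "4 choose 2"
n_tests <- choose(n_groups, 2)

# Probability that none of the tests is significant when every H0 is true
p_no_false_positive <- (1 - alpha)^n_tests

# Probability of at least one type I error across the family of tests
fwer <- 1 - p_no_false_positive

print(n_tests)                        # 6
print(round(p_no_false_positive, 2))  # 0.74
print(round(fwer, 2))                 # 0.26
```

Changing `n_groups` shows how quickly the problem grows: the number of pairwise tests increases roughly with the square of the number of groups.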

Same principles: t-tests & ANOVA
These work in fundamentally the same way, using measures of variation.
● t-tests: is the difference big relative to the variation?
● ANOVA: is the variation between groups big relative to the variation within groups?
ANOVA also has assumptions based on the normal distribution: normality and equal variance.

ANOVA terminology
● The categorical explanatory variable: Factor or Treatment (e.g. genotype)
● The different groups: Levels of the factor (Val/Val, Met/Val, Met/Met)
● Variance: MS, the Mean Square, the "mean of the squared deviations from the mean"
● Total variation: Total MS
● Variation between groups: Treatment MS or Factor MS
● Variation within the groups: Residual MS or Error MS

One-way ANOVA: example
Which of three media is best for growing bacterial cultures?
● One factor: media
● Three levels: Control; Control + sugar; Control + sugar + amino acids
● Continuous response: colony diameters (mm)

One-way ANOVA: example
● Data are in long format: one column for the response and one for the explanatory variable (Response ~ explanatory)
● Test: H0: F = 1 vs H1: F > 1
● Interpretation: H0: mean1 = mean2 = mean3 vs H1: at least two means differ

One-way ANOVA: example
Checking assumptions before running the ANOVA: normality

    tapply(culture$diameter, culture$medium, shapiro.test)

    $control
        Shapiro-Wilk normality test
    data:  X[[1L]]
    W = 0.9347, p-value = 0.4955

    $`with sugar`
        Shapiro-Wilk normality test
    data:  X[[2L]]
    W = 0.9429, p-value = 0.5857

    $`with sugar + amino acids`
        Shapiro-Wilk normality test
    data:  X[[3L]]
    W = 0.9284, p-value = 0.4322

No evidence that the assumptions are not met.

One-way ANOVA: example
Checking assumptions before running the ANOVA: equal variance

    bartlett.test(culture$diameter, culture$medium)

        Bartlett test of homogeneity of variances
    data:  diameter and medium
    Bartlett's K-squared = 2.3986, df = 2, p-value = 0.3014

No evidence that the assumptions are not met.

One-way ANOVA: example
Running the test
● aov() is the ANOVA function
● Response ~ Explanatory is the 'model formula'
● The output is saved to mod

    mod <- aov(diameter ~ medium, data = culture)
    summary(mod)

                Df Sum Sq Mean Sq F value  Pr(>F)
    medium       2 10.495  5.2473  6.1129 0.00646 **
    Residuals   27 23.177  0.8584
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
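The `culture` data set itself is not reproduced in these slides, so the workflow can only be sketched on stand-in data. The example below simulates a long-format data frame with the same layout (3 media, 10 colonies each); `culture_sim`, `mod_sim`, the group means and the standard deviation are all made up for illustration, so the numbers it prints will not match the table above.

```r
# Simulated stand-in for the lecture's `culture` data (values are invented)
set.seed(1)
media <- c("control", "with sugar", "with sugar + amino acids")

# Long format: one row per colony, one column per variable
culture_sim <- data.frame(
  medium   = factor(rep(media, each = 10), levels = media),
  diameter = rnorm(30, mean = rep(c(10, 10.2, 11.3), each = 10), sd = 0.9)
)
print(head(culture_sim))

# Response ~ Explanatory, with the fitted model saved for later use
mod_sim <- aov(diameter ~ medium, data = culture_sim)

# summary() returns a list; its first element is the ANOVA table
anova_table <- summary(mod_sim)[[1]]
print(anova_table)
```

Whatever values are simulated, the degrees of freedom depend only on the design: 2 for medium and 27 for the residuals.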

One-way ANOVA: example

                Df Sum Sq Mean Sq F value  Pr(>F)
    medium       2 10.495  5.2473  6.1129 0.00646 **    <- between groups
    Residuals   27 23.177  0.8584                       <- within groups
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

One-way ANOVA: example

                Df Sum Sq Mean Sq F value  Pr(>F)
    medium       2 10.495  5.2473  6.1129 0.00646 **
    Residuals   27 23.177  0.8584
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

● medium Df: no. of levels - 1, so 3 - 1 = 2
● medium Sum Sq: sum of squared deviations between each group mean and the overall mean, multiplied by the number in each group
● Residuals Df: (no. in each level - 1) x no. of levels, so (10 - 1) x 3 = 27
● Residuals Sum Sq: sum of squared deviations between each value and its group mean

One-way ANOVA: example
● Mean Square (aka variance) = SS / d.f.
● F = Medium MS / Residual MS

                Df Sum Sq Mean Sq F value  Pr(>F)
    medium       2 10.495  5.2473  6.1129 0.00646 **
    Residuals   27 23.177  0.8584
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
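The relationships between the columns can be verified from the printed table alone. A minimal sketch, using the rounded sums of squares shown above (so results agree with the table only to about three significant figures); the variable names are illustrative.

```r
# Values read from the ANOVA table (rounded as printed)
ss_medium   <- 10.495
ss_resid    <- 23.177
n_levels    <- 3    # three media
n_per_level <- 10   # ten colonies per medium

# Degrees of freedom
df_medium <- n_levels - 1                   # 3 - 1 = 2
df_resid  <- (n_per_level - 1) * n_levels   # (10 - 1) x 3 = 27

# Mean squares: sum of squares divided by degrees of freedom
ms_medium <- ss_medium / df_medium
ms_resid  <- ss_resid / df_resid

# F is the ratio of between-group to within-group variance
f_value <- ms_medium / ms_resid

# Upper-tail probability of the F distribution gives Pr(>F)
p_value <- pf(f_value, df_medium, df_resid, lower.tail = FALSE)

print(round(c(MS_medium = ms_medium, MS_resid = ms_resid, F = f_value), 3))
print(signif(p_value, 3))
```

The upper tail is used because only a large F, meaning more variation between groups than within them, counts as evidence against H0.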

One-way ANOVA: example
Reporting the result

                Df Sum Sq Mean Sq F value  Pr(>F)
    medium       2 10.495  5.2473  6.1129 0.00646 **
    Residuals   27 23.177  0.8584

There was a significant effect of media on the diameter of bacterial colonies (ANOVA: F = 6.11; d.f. = 2, 27; p = 0.006).
But not quite finished reporting…
A complete report covers: Significance, Direction, Statistics.

One-way ANOVA: example
Checking assumptions after running the ANOVA
Use the residuals: the 'real' assumptions are about them.

    plot(mod)

● Residuals vs fitted plot: the spread should be similar in each group (equal variance)
● Normal Q-Q plot: the points should lie approximately on the 1:1 line (normality)
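A sketch of the residual checks, again on simulated stand-in data because `culture` is not available here. `culture_sim`, `mod_sim` and the simulation settings are invented for illustration, and applying `shapiro.test()` to the residuals is an addition to what the slide shows, not something taken from the lecture.

```r
# Simulated stand-in for the lecture's `culture` data (values are invented)
set.seed(2)
media <- c("control", "with sugar", "with sugar + amino acids")
culture_sim <- data.frame(
  medium   = factor(rep(media, each = 10), levels = media),
  diameter = rnorm(30, mean = rep(c(10, 10.2, 11.3), each = 10), sd = 0.9)
)
mod_sim <- aov(diameter ~ medium, data = culture_sim)

# Residuals: each value minus its group mean
res <- residuals(mod_sim)

# One normality test on all residuals, rather than one per group
print(shapiro.test(res))

# When run as a script, draw to a null device instead of writing Rplots.pdf
if (!interactive()) pdf(NULL)

# The two diagnostic plots discussed on the slide
plot(mod_sim, which = 1)  # residuals vs fitted: look for similar spread
plot(mod_sim, which = 2)  # normal Q-Q: look for points near the line
```

Testing the pooled residuals uses all 30 observations at once, which gives the check more information than three separate tests on 10 values each.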

One-way ANOVA: example
Reporting the result
But not quite finished reporting yet…
A complete report covers: Significance, Direction, Statistics.
Which means differ? This requires a "post-hoc" test, e.g. Tukey.

One-way ANOVA: example
Reporting the result: which means differ

    TukeyHSD(aov(diameter ~ medium, data = culture))

      Tukey multiple comparisons of means
        95% family-wise confidence level

    Fit: aov(formula = diameter ~ medium)

    $medium
                                          diff       lwr      upr     p adj
    with sugar-control                   0.170 -0.857331 1.197331 0.9116894
    with sugar + amino acids-control     1.331  0.303669 2.358331 0.0092052
    with sugar + amino acids-with sugar  1.161  0.133669 2.188331 0.0243794

    plot(TukeyHSD(aov(diameter ~ medium, data = culture)))

Each row is one comparison; lwr and upr are the limits of its 95% CI. The plot marks a difference of zero: a comparison whose interval excludes zero differs significantly.
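With equal group sizes, every Tukey interval has the same half-width, which can be reconstructed from the residual mean square using the studentized range distribution. A minimal sketch using the rounded values printed on the slides; the variable names are illustrative.

```r
# Values read from the ANOVA table and the design
ms_resid    <- 0.8584
df_resid    <- 27
n_levels    <- 3
n_per_level <- 10

# Critical value of the studentized range for 3 means and 27 residual d.f.
q_crit <- qtukey(0.95, nmeans = n_levels, df = df_resid)

# Half-width of each 95% family-wise interval (equal group sizes)
half_width <- q_crit * sqrt(ms_resid / n_per_level)

# Differences in means as printed by TukeyHSD()
diffs <- c("with sugar-control"                  = 0.170,
           "with sugar + amino acids-control"    = 1.331,
           "with sugar + amino acids-with sugar" = 1.161)

ci <- cbind(diff = diffs,
            lwr  = diffs - half_width,
            upr  = diffs + half_width)
print(round(ci, 3))

# A pair differs significantly when its interval excludes zero
differs <- ci[, "lwr"] > 0 | ci[, "upr"] < 0
print(differs)
```

The intervals are wider than those from three separate t-tests would be, which is how the procedure keeps the family-wise confidence level at 95%.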

One-way ANOVA: example
Illustrating (order the factor levels from lowest to highest mean)
Figure 1. Mean colony diameter for bacteria grown on different media. Error bars are +/- S.E. Means that do not differ significantly under post-hoc comparison are labelled with the same letter code.

Or… anything is possible with ggplot:
● + geom_jitter(): show all data points
● + annotate(): add lines and text to the plot

One-way ANOVA: example
Reporting the result: finished
There was a significant effect of media on the diameter of bacterial colonies (ANOVA: F = 6.11; d.f. = 2, 27; p = 0.006) with colonies shown, by post-hoc comparison, to grow significantly better when both sugar and amino acids were added to the medium (see Figure 1). The addition of sugar alone did not significantly increase growth.
The report now covers: Significance, Direction, Statistics.

One-way ANOVA: non-parametric equivalent
When: the residuals are heteroscedastic (unequal variance) and/or not normal, especially when unequal sample sizes are combined with heteroscedasticity.
Kruskal-Wallis
● Uses ranks
● H0: mean rank g1 = mean rank g2 = mean rank g3 etc. vs H1: at least 2 mean ranks differ

Kruskal-Wallis - example
Running the test
Here it is used on the same data, for comparison of power.

    kruskal.test(data = culture, diameter ~ medium)

        Kruskal-Wallis rank sum test
    data:  diameter by medium
    Kruskal-Wallis chi-squared = 8.1005, df = 2, p-value = 0.01742

For comparison, the ANOVA on the same data gave:

                Df Sum Sq Mean Sq F value  Pr(>F)
    medium       2 10.495  5.2473  6.1129 0.00646 **
    Residuals   27 23.177  0.8584
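The Kruskal-Wallis p-value comes from comparing the test statistic with a chi-squared distribution on (number of groups - 1) degrees of freedom. A minimal sketch reproducing it from the printed statistic; the variable names are illustrative.

```r
# Test statistic as printed by kruskal.test()
h_stat   <- 8.1005
n_groups <- 3

# Degrees of freedom: number of groups minus one
df_kw <- n_groups - 1

# Upper-tail chi-squared probability
p_kw <- pchisq(h_stat, df = df_kw, lower.tail = FALSE)
print(signif(p_kw, 4))  # 0.01742

# p-value of the parametric ANOVA on the same data, as printed above
p_anova <- 0.00646
print(p_kw > p_anova)   # TRUE
```

Both tests reject H0 at α = 0.05, but the rank-based test gives the larger p-value here, which is the power comparison the slide refers to.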

Kruskal-Wallis - example
Reporting the result

    kruskal.test(data = culture, diameter ~ medium)

        Kruskal-Wallis rank sum test
    data:  diameter by medium
    Kruskal-Wallis chi-squared = 8.1005, df = 2, p-value = 0.01742

There was a significant effect of media on the diameter of bacterial colonies (Kruskal-Wallis: χ² = 8.1; d.f. = 2; p = 0.017).
A complete report covers: Significance, Direction, Statistics.
Post-hoc test?

Kruskal-Wallis - example
Reporting the result: which groups differ

    library(pgirmess)
    kruskalmc(culture$diameter, culture$medium, probs = 0.05)

    Multiple comparison test after Kruskal-Wallis
    p.value: 0.05
    Comparisons
                                        obs.dif critical.dif difference
    control-with sugar                     0.85     9.425108      FALSE
    control-with sugar + amino acids      10.10     9.425108       TRUE
    with sugar-with sugar + amino acids    9.25     9.425108      FALSE

obs.dif is the difference between the mean ranks of the two groups:

    ranked <- rank(culture$diameter)
    tapply(ranked, culture$medium, mean)

                     control               with sugar with sugar + amino acids
                       11.85                    12.70                    21.95
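The comparison table can be rebuilt from the three mean ranks. The sketch below uses a normal-approximation critical difference with a Bonferroni-style adjustment over k(k - 1) one-sided comparisons; that this is the formula `kruskalmc()` applies is an assumption, supported only by the fact that it gives the same 9.425 as the output above. The variable names are illustrative.

```r
# Mean ranks as printed by tapply() on the slide
mean_ranks <- c("control"                  = 11.85,
                "with sugar"               = 12.70,
                "with sugar + amino acids" = 21.95)
n_total     <- 30
n_per_group <- 10
k           <- length(mean_ranks)
alpha       <- 0.05

# All pairs of groups, in the same order as the kruskalmc() output
pairs   <- combn(names(mean_ranks), 2)
obs_dif <- abs(mean_ranks[pairs[1, ]] - mean_ranks[pairs[2, ]])
names(obs_dif) <- paste(pairs[1, ], pairs[2, ], sep = "-")

# Assumed critical difference: adjusted normal quantile times the
# standard error of a difference between two mean ranks
z_crit       <- qnorm(1 - alpha / (k * (k - 1)))
critical_dif <- z_crit * sqrt(n_total * (n_total + 1) / 12 *
                                (1 / n_per_group + 1 / n_per_group))

result <- data.frame(obs.dif      = obs_dif,
                     critical.dif = critical_dif,
                     difference   = obs_dif > critical_dif)
print(result)
```

Only control vs sugar + amino acids exceeds the critical difference, whereas Tukey's test after the ANOVA also separated the two sugar treatments; this is consistent with the rank-based procedure having less power on these data.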

Kruskal-Wallis - example
Figure 1. Median (heavy lines) colony diameter for bacteria grown on different media.

Two-way ANOVA
What if we want to see the effects of more than one categorical variable on a continuous variable?
● Species: F.flappa, F.concocti, I.lepidoptera
● Sex: Male, Female
