201ab Quantitative methods L.13: ANOVA (b) ANalysis Of VAriance

201ab Quantitative methods L.13: ANOVA (b) “ANalysis Of VAriance”
Ed Vul | UCSD Psychology
Psych 201ab: Quantitative methods

Three ways to think about factors

Cell organization: This is the common way to write out our data if we are going to do ANOVA calculation by hand. This way it’s easy to see how to sum things in a given cell, what a cell mean is, how to sum across cells, etc. We are going to avoid all this hand calculation, but conceptually, this way of thinking about data is helpful to keep track of what we are going to be estimating.

Data frame/table: This is how we will generally see our data. This representation is not directly used for analysis (technically), but can be transformed into either of the other two representations.

Matrix notation: This is what R/SPSS/JMP/etc. do to your data to carry out an ANOVA analysis. It is easier to think in this notation to figure out different variable coding schemes.
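A minimal sketch of the three representations, assuming a small made-up data set (the values and the name `heights` are hypothetical, not the lecture's data). It starts from the data frame, regroups it by cell, and then asks R for the design matrix it would build for a one-factor model.

```r
# Hypothetical toy data (not the lecture's data set): heights by country.
heights <- data.frame(
  country = factor(c("Netherlands", "Netherlands", "North K.", "North K.", "USA", "USA")),
  height  = c(72, 74, 65, 67, 70, 71)
)

# 1. Data frame/table: one row per observation (how the data usually arrive).
print(heights)

# 2. Cell organization: observations grouped by cell, handy for cell sums and means.
cells      <- split(heights$height, heights$country)
cell_means <- sapply(cells, mean)
print(cell_means)

# 3. Matrix notation: the design matrix that lm() constructs for height ~ country.
X <- model.matrix(~ country, data = heights)
print(X)
```

The design matrix has one intercept column plus one indicator column per non-reference level, which is why a 3-level factor yields 3 columns here.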

How does R encode categories?

summary(lm(height~country))
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)       71.6960     0.7247  98.925  < 2e-16 ***
countryNorth K.   -6.2374     0.9167  -6.804 1.53e-10 ***
countrySouth K.   -2.3837     0.9588  -2.486   0.0138 *
countryUSA        -1.5696     0.8876  -1.768   0.0787 .

(Intercept): Mean height of Netherlands. Significance: comparison of Neth. mean to 0.

[Figure: categories shown are Netherlands, North K., South K., USA]
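The coding can be checked on a toy example. The sketch below assumes R's default treatment contrasts and uses made-up heights (the numbers are hypothetical); it shows that the intercept is the mean of the reference level and every other coefficient is that level's mean minus the reference mean.

```r
# Hypothetical data: three observations per country (not the lecture's data).
heights <- data.frame(
  country = factor(rep(c("Netherlands", "North K.", "South K.", "USA"), each = 3)),
  height  = c(71, 72, 73,  64, 66, 65,  69, 70, 68,  70, 71, 69)
)

fit <- lm(height ~ country, data = heights)
b   <- coef(fit)

# Group means computed directly, as a plain named vector.
group_means <- sapply(split(heights$height, heights$country), mean)

# Under treatment coding the first level (alphabetically: Netherlands) is the reference.
offsets <- group_means[-1] - group_means[["Netherlands"]]

print(b)        # intercept = Netherlands mean; others = offsets from it
print(offsets)
```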

One way ANOVA SS partitioning.

anova(lm(height~country))
Response: height
           Df  Sum Sq …
country     3  64.782 …
Residuals  14 281.414 …

SST = SS[country] + SS[residuals]
SST: Variability of all heights around mean height.
SS[country]: Variability “between” country means (deviations of country means from overall mean, scaled by n).
SS[residuals]: Variability “within” country (deviations of observations from country mean).
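Written out, the partition looks as follows. The index names are assumptions made for this sketch: i indexes countries, k indexes observations within a country, n_i is the number of observations in country i, \bar{y}_i is a country mean and \bar{y} the overall mean.

```latex
\underbrace{\sum_{i}\sum_{k}\left(y_{ik}-\bar{y}\right)^{2}}_{SST}
  \;=\;
\underbrace{\sum_{i} n_{i}\left(\bar{y}_{i}-\bar{y}\right)^{2}}_{SS[\text{country}]}
  \;+\;
\underbrace{\sum_{i}\sum_{k}\left(y_{ik}-\bar{y}_{i}\right)^{2}}_{SS[\text{residuals}]}
```

With the numbers in the table above, SST = 64.782 + 281.414 = 346.196.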

Does the mean vary with a factor?

summary(lm(height~country))
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)        73.296      2.589  28.316 9.25e-14 ***
countryNorth K.    -5.849      3.274  -1.786   0.0957 .
countrySouth K.    -3.666      3.424  -1.070   0.3025
countryUSA         -4.057      3.170  -1.280   0.2214

The coefficient tests compare various offsets. Not our question.

anova(lm(height~country))
Response: height
           Df  Sum Sq Mean Sq F value Pr(>F)
country     3  64.782  21.594  1.0743 0.3917
Residuals  14 281.414  20.101

ANOVA asks: does mean vary across countries?
Country df=3 (3 coefficients encode differences among 4 categories)
F = (SSR[country] / (4-1)) / (SSE / (n-4))
p = 1-pf(F, 4-1, n-4)

Significance means: more variability in mean height across countries than expected by chance if means are truly the same (therefore accounting for mean differences explains more variance than expected under that null).
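One way to see that the factor test is a different question from the coefficient tests is to phrase it as a comparison of two nested models: an intercept-only model against the model with country. The sketch below uses made-up heights (hypothetical values) and shows that the two routes give the same F.

```r
# Hypothetical data: four observations per country (not the lecture's data).
heights <- data.frame(
  country = factor(rep(c("Netherlands", "North K.", "South K.", "USA"), each = 4)),
  height  = c(71, 75, 73, 77,  64, 70, 66, 68,  69, 72, 66, 73,  70, 74, 68, 72)
)

null_fit <- lm(height ~ 1,       data = heights)  # all means equal
full_fit <- lm(height ~ country, data = heights)  # one mean per country

# Route 1: the factor's row in the ANOVA table.
F_factor <- anova(full_fit)["country", "F value"]

# Route 2: explicit comparison of the nested models.
F_compare <- anova(null_fit, full_fit)[2, "F"]

# Route 3: by hand from the residual sums of squares.
rss0   <- deviance(null_fit)
rss1   <- deviance(full_fit)
F_hand <- ((rss0 - rss1) / (4 - 1)) / (rss1 / (nrow(heights) - 4))
p_hand <- 1 - pf(F_hand, 4 - 1, nrow(heights) - 4)

print(c(F_factor = F_factor, F_compare = F_compare, F_hand = F_hand, p = p_hand))
```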

Factor significance

anova(lm(height~country))
Response: height
            Df  Sum Sq Mean Sq F value    Pr(>F)
country      3  923.72 307.906   19.54 5.567e-11 ***
Residuals  176 2773.38  15.758

$$F(p_{\text{SOURCE}},\; n - p_{\text{FULL}}) = \frac{SSR_{\text{SOURCE}} \,/\, p_{\text{SOURCE}}}{SSE_{\text{FULL}} \,/\, (n - p_{\text{FULL}})}$$

F.Country = (923/3) / (2773/176) = 19.5
p.Country = 1-pf(19.54, 3, 176) = 5e-11

F statistic measures how much variance is explained by factor. More “signal variance” always means bigger F, so we do a one-tailed test.

[Figure: F distribution with our F statistic marked; not representative of stats above]
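The arithmetic can be reproduced directly from the printed table. The variable names below are chosen for this sketch; `lower.tail = FALSE` is used as an equivalent of `1 - pf(...)` that tends to be more accurate for very small upper-tail probabilities.

```r
# Values copied from the ANOVA table above.
ss_source <- 923.72   # Sum Sq for country
df_source <- 3        # p_SOURCE
ss_error  <- 2773.38  # Sum Sq for residuals
df_error  <- 176      # n - p_FULL

# F is the ratio of the two mean squares.
F_country <- (ss_source / df_source) / (ss_error / df_error)

# One-tailed test: probability of an F at least this large under the null.
p_country <- pf(F_country, df_source, df_error, lower.tail = FALSE)

print(c(F = F_country, p = p_country))
```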

Analysis of Variance
• Coding factors in regression (general linear model)
  – “Design matrix” in regression
  – Categorical coding and indicator variables
• Indicator variable coefficients and significance
• Factor sums of squares and significance
• Factorial ANOVA – main effects.
  – Unbalanced designs and multicollinearity
• Factorial ANOVA – interactions.
  – Interpreting interactions
• Sums of squares in full factorial ANOVA.
  – No interactions with one observation per cell
• ANOVA effect size and power.

Factorial designs

Multiple factors crossed in one design/model.
Factor A: Country (index: i)
Factor B: Gender (index: j)

[Table: heights y i,j,k laid out by cell, with columns North Korea, USA, South Korea, Netherlands (i=1 to i=4) and rows Male (j=1) and Female (j=2); cells contain different numbers of observations.]

Why do factorial designs? (rather than doing multiple single factor studies)
• You can investigate more effects with same data.
• You gain power by accounting for the variance that arises from the other factors, thus reducing error.
• Somewhat stronger evidence for generalizability of effects.
• You can test for interactions.

Don’t go crazy, 3+ factors is often a bad idea.
• Number of cells (and sample size req.) multiply.
• Interpretation of interactions becomes impenetrable.
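The power argument can be illustrated with a small constructed example. The data below are hypothetical and deliberately built to be perfectly additive with fixed "noise", so the result is exact rather than simulated; the point is only that adding the second factor moves its variance out of the error term.

```r
# Hypothetical balanced design: 4 countries x 2 sexes x 3 observations per cell.
d <- expand.grid(
  rep     = 1:3,
  country = c("N.Korea", "Netherlands", "S.Korea", "USA"),
  sex     = c("f", "m")
)

# Made-up additive effects plus a fixed within-cell spread of -2, 0, +2.
country_shift <- c("N.Korea" = 0, "Netherlands" = 5, "S.Korea" = 4, "USA" = 5)
sex_shift     <- c(f = 0, m = 5.5)
d$height <- 60 +
  country_shift[as.character(d$country)] +
  sex_shift[as.character(d$sex)] +
  c(-2, 0, 2)[d$rep]

n_cells <- nlevels(d$country) * nlevels(d$sex)  # cells multiply: 4 * 2

# Residual (error) sum of squares with and without the second factor.
rss_country <- deviance(lm(height ~ country,       data = d))
rss_both    <- deviance(lm(height ~ country + sex, data = d))

print(c(cells = n_cells, rss_country = rss_country, rss_both = rss_both))
```

With sex ignored, its variance is counted as error, which inflates the denominator of the country F ratio.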

Representing factorial designs

Coding just for “main effects”: additive effects of a factor.

Main effect of sex: average difference between men and women.
Main effect of country: average differences between countries.

summary(lm(height~country+sex))
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)           58.437      1.429  40.891  < 2e-16 ***
countryNetherlands     5.555      1.745   3.183  0.00300 **
countryS.Korea         3.905      1.818   2.148  0.03855 *
countryUSA             5.256      1.818   2.892  0.00646 **
sexm                   5.517      1.243   4.439 8.22e-05 ***

So, the model predicts different cell means to be:
N.K. females        = B0             (intercept)
Netherlands females = B0 + B1        (intercept) + (countryNetherlands)
S.K. females        = B0 + B2        (intercept) + (countryS.Korea)
USA females         = B0 + B3        (intercept) + (countryUSA)
N.K. males          = B0 + B4        (intercept) + (sexm)
Netherlands males   = B0 + B1 + B4   (intercept) + (countryNetherlands) + (sexm)
S.K. males          = B0 + B2 + B4   (intercept) + (countryS.Korea) + (sexm)
USA males           = B0 + B3 + B4   (intercept) + (countryUSA) + (sexm)

“Main effects”: effect of maleness is additive with effect of country. Difference between males and females is the same for every country, and differences among countries are the same within males and within females.
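Plugging the printed estimates into these sums gives the eight predicted cell means. This is a sketch using only the coefficients shown above; the vector names and the level labels are chosen here for illustration.

```r
# Coefficients copied from the summary() output above.
b0 <- 58.437
country_offset <- c("N.Korea" = 0, "Netherlands" = 5.555, "S.Korea" = 3.905, "USA" = 5.256)
sex_offset     <- c(f = 0, m = 5.517)

# Additive model: each cell mean = intercept + country offset + sex offset.
cell_means <- b0 + outer(country_offset, sex_offset, "+")
print(cell_means)

# The male-female difference is identical in every country under this model.
sex_gap <- cell_means[, "m"] - cell_means[, "f"]
print(sex_gap)
```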

Coding just for “main effects”: additive effects of a factor.

anova(lm(height~country+sex))
Response: height
           Df Sum Sq Mean Sq F value    Pr(>F)
country     3 196.18  65.394  4.1827   0.01223 *
sex         1 308.09 308.095 19.7060 8.217e-05 ***
Residuals  36 562.84  15.635

Significance of main effects (in ANOVA) says variation in average height across country is significantly greater than 0. Similarly, variation in average height across sex is greater than 0.
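For a factor with a single degree of freedom that is entered last, the ANOVA F test and the coefficient t test coincide (F = t squared). The sketch below checks this with the printed values for sex; small discrepancies come from rounding in the printed output. The 3-df country test has no single t counterpart.

```r
# Values copied from the summary() and anova() output for height ~ country + sex.
t_sex    <- 4.439     # t value for sexm
F_sex    <- 19.7060   # F value for sex
df_resid <- 36        # residual degrees of freedom

# Two-tailed t test and one-tailed F test give the same p-value.
p_from_t <- 2 * pt(-abs(t_sex), df_resid)
p_from_F <- pf(F_sex, 1, df_resid, lower.tail = FALSE)

print(c(t_squared = t_sex^2, F = F_sex, p_from_t = p_from_t, p_from_F = p_from_F))
```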

What does a sig. main effect mean?
1. Amount of variance accounted for by factor levels is bigger than chance.
2. Variance of means across factor levels is greater than zero.
3. Evidence that not all factor level means are equal.

[Figure: compare mean of left vs right, and mean of red vs blue…]

What does a sig. main effect mean?

What it does not mean:
– That there is a uniform additive offset of factor level. (just one rogue cell would do)
– Or that the means vary in any other particular pattern. (mean changes might not coincide with your prediction)

[Figure: Ugh: main effects will show up, but they aren’t consistent with intuitive interpretation.]
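The "one rogue cell" case can be sketched with a constructed 2 x 2 example. The factor names A and B, the level labels, and the values are all hypothetical; three cells share the same mean and one cell is shifted, yet the main-effects model reports both factors as significant.

```r
# Hypothetical 2 x 2 design with 5 observations per cell.
cells <- expand.grid(rep = 1:5, A = c("a1", "a2"), B = c("b1", "b2"))

# Fixed within-cell spread, so the example is exact rather than simulated.
noise <- c(-1, -0.5, 0, 0.5, 1)[cells$rep]

# Only the (a2, b2) cell differs from the others.
cells$y <- ifelse(cells$A == "a2" & cells$B == "b2", 10, 0) + noise

cell_means <- tapply(cells$y, list(cells$A, cells$B), mean)
print(cell_means)

# Main-effects-only model: both factors come out significant.
tab <- anova(lm(y ~ A + B, data = cells))
p_A <- tab["A", "Pr(>F)"]
p_B <- tab["B", "Pr(>F)"]
print(tab)
```

Neither factor has a uniform additive offset here, so the significant main effects say only that the marginal means differ, not how.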

