

SLIDE 1

Analysis of Variance

October 16, 2019

October 16, 2019 1 / 23

SLIDE 2

ANOVA and the F-test

Question: is the variability in the sample means so large that it seems unlikely to be from chance alone? We call this variability the mean square between groups (MSG) or mean square for treatment (MST).

Section 7.5 October 16, 2019 2 / 23

SLIDE 3

Mean Square Between Groups

This acts as a measure of variability for the k group means. It has degrees of freedom df_G = k − 1. If H0 is true, we expect this variability to be small.

SLIDE 4

Mean Square Between Groups

MSG = \frac{1}{df_G} SSG = \frac{1}{k - 1} \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2

where SSG is the sum of squares between groups and n_i is the sample size of group i.
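As a sketch of how MSG is computed, here is a stdlib-only Python version using small hypothetical groups (the data values below are illustrative, not from the slides); SSG weights each group's squared deviation by its sample size n_i:

```python
from statistics import mean

# Hypothetical data (illustrative values only): k = 3 groups
groups = [[4.0, 5.0, 6.0],
          [5.5, 6.5, 7.5, 8.5],
          [3.0, 4.0, 5.0]]

k = len(groups)                          # number of groups
all_x = [x for g in groups for x in g]
grand_mean = mean(all_x)                 # mean of all observations

# SSG: squared deviation of each group mean from the grand mean,
# weighted by that group's sample size n_i
ssg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
msg = ssg / (k - 1)                      # divide by df_G = k - 1
```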

SLIDE 5

Mean Square Between Groups

...but MSG isn’t very useful on its own.

SLIDE 6

Mean Square Error

We need an idea of how much variability would be expected (or normal) if H0 were true. This is done using a pooled variance estimate, called the mean square error (MSE). This is a measure of variability within groups. MSE has degrees of freedom df_E = n − k.

SLIDE 7

Mean Square Error

MSE = \frac{1}{df_E} SSE = \frac{1}{n - k} \sum_{i=1}^{k} (n_i - 1) s_i^2

where SSE is the sum of squares for error and s_i is the standard deviation for the observations in group i.
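A matching stdlib-only sketch of MSE, again with small hypothetical groups (illustrative values only); SSE pools the sample variances, weighting each by its own degrees of freedom n_i − 1:

```python
from statistics import variance   # sample variance (divides by n_i - 1)

# Hypothetical data (illustrative values only)
groups = [[4.0, 5.0, 6.0],
          [5.5, 6.5, 7.5, 8.5],
          [3.0, 4.0, 5.0]]

k = len(groups)                          # number of groups
n = sum(len(g) for g in groups)          # total sample size

# SSE pools within-group variability: sum over groups of (n_i - 1) * s_i^2
sse = sum((len(g) - 1) * variance(g) for g in groups)
mse = sse / (n - k)                      # divide by df_E = n - k
```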

SLIDE 8

Sum of Squares Total

It’s also useful to think of a sum of squares total (SST),

SST = SSG + SSE,

and total degrees of freedom

df_T = df_G + df_E = (k − 1) + (n − k) = n − 1.
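The decomposition SST = SSG + SSE can be checked numerically. A stdlib-only sketch, using small hypothetical groups (illustrative values only):

```python
from statistics import mean, variance

# Hypothetical data (illustrative values only)
groups = [[4.0, 5.0, 6.0],
          [5.5, 6.5, 7.5, 8.5],
          [3.0, 4.0, 5.0]]

all_x = [x for g in groups for x in g]
grand_mean = mean(all_x)

sst = sum((x - grand_mean) ** 2 for x in all_x)                    # total
ssg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)    # between
sse = sum((len(g) - 1) * variance(g) for g in groups)              # within

gap = sst - (ssg + sse)   # should be zero up to floating-point error
```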

SLIDE 9

Mean Square Total

If we were to find the mean square total,

MST = \frac{1}{df_T} SST = \frac{1}{n - 1} (SSG + SSE) = \frac{1}{n - 1} \sum_{j=1}^{n} (x_j - \bar{x})^2,

we would get the variance across all observations!

SLIDE 10

ANOVA

The ANOVA breaks the variance down into within-group (random) variability (MSE) and between-group (means) variability (MSG).

SLIDE 11

ANOVA

We want to know how much variability is due to differences between groups relative to the within-group variability. So our test statistic is

F = \frac{MSG}{MSE}

SLIDE 12

Example

For our baseball example,

                        OF      IF      C
  Sample size (n_i)     160     205     64
  Sample mean (x̄_i)    0.320   0.318   0.302
  Sample sd (s_i)       0.043   0.038   0.038

MSG = 0.00803 and MSE = 0.00158. Find the degrees of freedom and the F statistic.
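One way to work this through, using only the numbers given on the slide (a sketch in Python):

```python
# Group sample sizes from the baseball example (OF, IF, C)
n_i = [160, 205, 64]
k = len(n_i)                  # 3 groups
n = sum(n_i)                  # 429 players total

df_g = k - 1                  # degrees of freedom for MSG
df_e = n - k                  # degrees of freedom for MSE

msg, mse = 0.00803, 0.00158   # given on the slide
F = msg / mse                 # the ANOVA F statistic, about 5.08
```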

SLIDE 13

The F Test

With our F distribution comes the F-test. Using the F-distribution, we calculate F_α(df_1, df_2) critical values and p-values.

SLIDE 14

The F Test

If the between-group variability is high relative to the within-group variability (MSG > MSE), F will be large. Large values of F represent stronger evidence against the null.

SLIDE 15

The F Test

This is the F(2, 426) distribution from our baseball example. F-test p-values will always be from the upper tail area. We no longer have one- or two-sided tests to worry about. The critical value is F0.05(2, 426) = 3.0169.
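Both numbers can be reproduced without statistical tables. When df_1 = 2 (as here), the upper-tail probability of the F distribution has a simple closed form, so a stdlib-only sketch suffices (for general df_1 one would use a library such as scipy.stats.f instead):

```python
# Closed form, valid only for df1 = 2:
#   P(F > f) = (1 + 2 f / df2) ** (-df2 / 2)

def f_upper_tail(f, df2):
    """Upper-tail probability of an F(2, df2) distribution."""
    return (1 + 2 * f / df2) ** (-df2 / 2)

def f_critical(alpha, df2):
    """Upper-alpha critical value of an F(2, df2) distribution
    (the closed form above, inverted for f)."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)

crit = f_critical(0.05, 426)         # ~3.0169, matching F_0.05(2, 426)
p_value = f_upper_tail(5.0766, 426)  # ~0.0066, the baseball example's p-value
```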

SLIDE 16

Example

What can we conclude about the baseball field positions? Recall F0.05(2, 426) = 3.0169.

SLIDE 17

Reading an ANOVA Table

Typically we will run ANOVA using software. Fortunately there is a standard output for this analysis. Let’s take some time to write out the ANOVA table.

SLIDE 18

Reading an ANOVA Table from Software

This is the ANOVA output from R for the MLB example.

               Df  Sum Sq  Mean Sq  F value  Pr(>F)
  position      2  0.0161  0.0080   5.0766   0.0066
  Residuals   426  0.6740  0.0016

What can we conclude based on the table?
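The table's columns are internally consistent, which makes a good sanity check: each mean square is its sum of squares divided by its df, and F is their ratio. A quick sketch using only the table's entries:

```python
# Entries from the R ANOVA table
df_g, ss_g = 2, 0.0161        # position row
df_e, ss_e = 426, 0.6740      # Residuals row

msg = ss_g / df_g             # mean square = sum of squares / df, ~0.0080
mse = ss_e / df_e             # ~0.0016
F = msg / mse                 # ~5.09; the table's 5.0766 differs only by rounding
```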

SLIDE 19

Example

Suppose we have 10 data points from each of 5 groups of interest.

  Source   df   SS   MS   F
  Group     3
  Error
  Total    20

Fill in the missing information from the ANOVA table.
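Taking the two degrees-of-freedom entries printed in the table at face value, the Error df follows from the df partition df_T = df_G + df_E (a sketch; the SS, MS, and F columns would additionally need the sums of squares, which are not shown):

```python
# Degrees of freedom taken from the two entries shown in the table
df_group, df_total = 3, 20

# The df always partition: df_T = df_G + df_E
df_error = df_total - df_group   # fills in the Error row's df
```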

SLIDE 20

Graphical Diagnostics for ANOVA

There are three conditions for ANOVA:

1. Independence
2. Approximate normality
3. Constant variance

SLIDE 21

ANOVA Diagnostics: Independence

It is reasonable to assume independence if the data are a simple random sample. If the data are not a random sample, consider carefully whether independence is plausible.

In the MLB example, there is no clear reason why one player’s batting statistics would impact another player’s batting statistics.

SLIDE 22

ANOVA Diagnostics: Normality

Normality is especially important for small samples. For large samples, ANOVA is robust to deviations from normality.

SLIDE 23

ANOVA Diagnostics: Constant Variance

We can check this visually or by examining the standard deviations for each group. Constant variance is especially important when the sample sizes differ between groups.
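Using the group standard deviations from the baseball example, a quick numeric check might look like this (the "ratio under 2" rule of thumb is a common convention, not stated on the slides):

```python
# Group standard deviations from the baseball example (OF, IF, C)
sds = [0.043, 0.038, 0.038]

# Rule-of-thumb check (assumption, not from the slides): constant variance
# is plausible when the largest sd is less than about twice the smallest.
ratio = max(sds) / min(sds)   # ~1.13, comfortably under 2
```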
