Topic 9 - ANOVA

Oct 10, 2023

SLIDE 1

Topic 9 - ANOVA

Background
ANOVA

SLIDE 2

Comparing several means (some situations)

Does the average number of words per sentence in advertisements differ across magazine types?

Does the expected survival time vary for different types of cancer among patients treated with a specific drug?

Does the mean response time differ among three different types of circuits?

Is there a difference in average carry distance for baseballs stored at a variety of humidity levels?

Is there a statistical difference between the home run hitting ability of, say, Babe Ruth vs. Roger Maris vs. the modern-day Mark McGwire or Sammy Sosa?

SLIDE 3

Comparing several means

Suppose that instead of comparing two means we want to test for the equivalence of several means:

H0: μ1 = μ2 = … = μI
HA: at least two of the μi's are different

Each of the groups we are comparing is called a treatment (a level of the factor).

We make our decision based on samples from each of the I

treatment groups.

Let Xi,j represent the jth observation from the ith treatment group, with j = 1,…,ni.

We assume each sample comes from a Normal population with

common variance.

SLIDE 4

ANOVA – Analysis of Variance

We partition the total variability of the data into treatment (in our control, good) and error (out of our control, bad) components.
What you really want here is for SStrt to equal SStot.

That would mean there is no random error (SSerr = 0) and 100% of the variation in the data is explained by the treatments. While this would be a perfect result, it is rarely ever the case.

      

                   

    

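\[
\begin{aligned}
SS_{tot} &= \sum_{i=1}^{I} \sum_{j=1}^{n_i} (X_{i,j} - \bar{X})^2, & DF_{tot} &= \sum_{i=1}^{I} n_i - 1 \\
SS_{trt} &= \sum_{i=1}^{I} n_i (\bar{X}_i - \bar{X})^2, & DF_{trt} &= I - 1 \\
SS_{err} &= \sum_{i=1}^{I} \sum_{j=1}^{n_i} (X_{i,j} - \bar{X}_i)^2, & DF_{err} &= \sum_{i=1}^{I} n_i - I \\
SS_{tot} &= SS_{trt} + SS_{err}, & DF_{tot} &= DF_{trt} + DF_{err}
\end{aligned}
\]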

SLIDE 5

ANOVA - Mean squares

MStrt = SStrt/DFtrt, MSerr = SSerr/DFerr, F = MStrt/MSerr
If H0 is true (all the means are the same, or really close to being the same), then F should be close to 1. – The distribution means should be visually close and there should be a lot of “commonality” (overlap) among the distributions, meaning that it would be quite difficult to tell visually whether a specific value of X fell into distribution 1 or 2 or 3 or 4.

If H0 is false (at least two of the means are different), then F should be much larger than 1. – The distribution means should be separated and there should be minimal overlap or “commonality” of the distributions; it should be relatively easy to tell whether a specific value of X fell into distribution 1 or 2 or 3 or 4.

The lower the level of overlap in the distributions, the higher the F value and the more persuasive your result.
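
To get a feel for how F behaves, here is a small simulation sketch. Everything in it is an assumption made for illustration and not part of the slides: three groups of 10 Normal observations with common standard deviation 1, 2000 repetitions, and the function names `f_statistic` and `average_f`. It averages the F statistic once when all true means are equal and once when they are spread apart.

```python
import random


def f_statistic(samples):
    """One-way ANOVA F statistic (MStrt / MSerr) for a list of samples."""
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    group_means = [sum(s) / len(s) for s in samples]
    ss_trt = sum(len(s) * (m - grand_mean) ** 2
                 for s, m in zip(samples, group_means))
    ss_err = sum((x - m) ** 2
                 for s, m in zip(samples, group_means) for x in s)
    ms_trt = ss_trt / (len(samples) - 1)
    ms_err = ss_err / (n_total - len(samples))
    return ms_trt / ms_err


def average_f(true_means, n_per_group=10, sigma=1.0, reps=2000, seed=0):
    """Average F over many simulated datasets (illustrative settings only)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        samples = [[rng.gauss(mu, sigma) for _ in range(n_per_group)]
                   for mu in true_means]
        total += f_statistic(samples)
    return total / reps


avg_null = average_f([0.0, 0.0, 0.0])  # H0 true: all means equal
avg_alt = average_f([0.0, 1.0, 2.0])   # H0 false: means separated
print(f"average F when H0 is true:  {avg_null:.2f}")
print(f"average F when H0 is false: {avg_alt:.2f}")
```

With equal means the average F should hover around 1; with separated means it should be several times larger.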

SLIDE 6

ANOVA – Decision rule

Reject H0 if F > Fα,DFtrt,DFerr (the upper-α critical value of the F distribution with DFtrt and DFerr degrees of freedom).
Demonstration of F calculator.
Note: Since your F test statistic is the ratio of MStrt to MSerr, the higher that value the better. Larger values of the F test statistic are similar to larger test statistics for Z or T, inasmuch as they provide stronger evidence against H0 (a smaller p-value).
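
Comparing F to the critical value is equivalent to comparing the p-value P(F > observed F) to α. As a rough stand-in for an F calculator, the sketch below integrates the F density numerically with the midpoint rule. The function names `f_upper_tail` and `reject_h0`, the step count, and the restriction to DFtrt ≥ 2 are assumptions made for this illustration. The numbers plugged in (F = 9.911841 with 4 and 19 degrees of freedom) are taken from the worked example's ANOVA table elsewhere in this deck.

```python
import math


def f_upper_tail(f_stat, d1, d2, steps=100_000):
    """Approximate P(F > f_stat) for an F(d1, d2) distribution.

    Midpoint-rule integration of the density over [0, f_stat].
    Sketch only: assumes d1 >= 2 so the density stays bounded near 0.
    """
    if d1 < 2:
        raise ValueError("this sketch assumes d1 >= 2")
    if f_stat <= 0:
        return 1.0
    # log of the constant: (d1/d2)^(d1/2) / B(d1/2, d2/2)
    log_const = (math.lgamma((d1 + d2) / 2) - math.lgamma(d1 / 2)
                 - math.lgamma(d2 / 2) + (d1 / 2) * math.log(d1 / d2))
    h = f_stat / steps
    cdf = 0.0
    for k in range(steps):
        x = (k + 0.5) * h  # midpoint of the k-th subinterval
        log_pdf = (log_const + (d1 / 2 - 1) * math.log(x)
                   - ((d1 + d2) / 2) * math.log1p(d1 * x / d2))
        cdf += math.exp(log_pdf) * h
    return max(0.0, 1.0 - cdf)


def reject_h0(f_stat, d1, d2, alpha=0.05):
    """Decision rule: reject H0 when the p-value is below alpha."""
    return f_upper_tail(f_stat, d1, d2) < alpha


p_value = f_upper_tail(9.911841, 4, 19)
print(f"p-value = {p_value:.4f}")
print("reject H0 at alpha = 0.05:", reject_h0(9.911841, 4, 19))
```

The p-value should land near the 0.0002 reported in that table, so H0 is rejected at the 0.05 level.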

SLIDE 7

Example Calc of SStrt+SSerr=SStot (1)

TRT   OBS1  OBS2  OBS3  OBS4  OBS5  AVG
1     10    11    11    12    -     11.00
2     10    13    13    14    14    12.80
3     11    11    11    12    12    11.40
4     14    15    15    15    11    14.00
5     10    10    9     9     10    9.60
                                    11.79

Grand mean is the average of all values in the dataset = 11.79

SStrt is the summation of the squared differences between the treatment means and the grand mean, weighted by the number of observations for each treatment.

SStrt = (4(11-11.79)^2)+(5(12.8-11.79)^2)+(5(11.4-11.79)^2)+(5(14-11.79)^2)+(5(9.6-11.79)^2) = 56.7584

SSerr is the summation of the squared differences between the individual observations and their respective treatment means.

SSerr = (10-11)^2+(11-11)^2+(11-11)^2+(12-11)^2+(10-12.8)^2+…+(10-9.6)^2 = 27.2

SStot = (10-11.79)^2+(11-11.79)^2+…+(9-11.79)^2+(10-11.79)^2 = 83.9584

SLIDE 8

Example Calc of SStrt+SSerr=SStot (2)

SStrt = (4(11-11.79)^2)+(5(12.8-11.79)^2)+(5(11.4-11.79)^2)+(5(14-11.79)^2)+(5(9.6-11.79)^2) = 56.7584
SSerr = (10-11)^2+(11-11)^2+(11-11)^2+(12-11)^2+(10-12.8)^2+…+(10-9.6)^2 = 27.2
SStot = (10-11.79)^2+(11-11.79)^2+…+(9-11.79)^2+(10-11.79)^2 = 83.9584

Analysis of Variance results: Data stored in separate columns.

Column means
Column  n  Mean  Std. Error
Trt1    4  11    0.408248
Trt2    5  12.8  0.734847
Trt3    5  11.4  0.244949
Trt4    5  14    0.774597
Trt5    5  9.6   0.244949

ANOVA table
Source      df  SS        MS        F-Stat    P-value
Treatments  4   56.75834  14.18958  9.911841  0.0002
Error       19  27.2      1.431579
Total       23  83.95834
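
The hand calculation can be double-checked with a few lines of Python. This is a minimal sketch, not the StatCrunch procedure: the function name `one_way_anova` and the dictionary layout are made up for illustration, and the data are the 24 observations from the example table. Because it uses the unrounded grand mean (283/24) and not 11.79, its sums of squares should agree with the StatCrunch column (56.75834, 83.95834) more closely than with the hand-rounded 56.7584 and 83.9584.

```python
# Observations from the worked example (treatment 1 has only 4 values).
groups = {
    1: [10, 11, 11, 12],
    2: [10, 13, 13, 14, 14],
    3: [11, 11, 11, 12, 12],
    4: [14, 15, 15, 15, 11],
    5: [10, 10, 9, 9, 10],
}


def one_way_anova(groups):
    """Partition total variability into treatment and error components."""
    all_obs = [x for obs in groups.values() for x in obs]
    n_total = len(all_obs)
    grand_mean = sum(all_obs) / n_total
    means = {g: sum(obs) / len(obs) for g, obs in groups.items()}

    # Treatment SS: squared distance of each group mean from the grand
    # mean, weighted by the group size.
    ss_trt = sum(len(obs) * (means[g] - grand_mean) ** 2
                 for g, obs in groups.items())
    # Error SS: squared distance of each observation from its group mean.
    ss_err = sum((x - means[g]) ** 2
                 for g, obs in groups.items() for x in obs)
    # Total SS: squared distance of each observation from the grand mean.
    ss_tot = sum((x - grand_mean) ** 2 for x in all_obs)

    df_trt = len(groups) - 1
    df_err = n_total - len(groups)
    ms_trt = ss_trt / df_trt
    ms_err = ss_err / df_err
    return {
        "grand_mean": grand_mean,
        "ss_trt": ss_trt, "ss_err": ss_err, "ss_tot": ss_tot,
        "df_trt": df_trt, "df_err": df_err,
        "ms_trt": ms_trt, "ms_err": ms_err,
        "f": ms_trt / ms_err,
    }


result = one_way_anova(groups)
for name, value in result.items():
    print(f"{name:>10}: {value:.5f}")
```

The check SStrt + SSerr = SStot holds here up to floating-point rounding.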

SLIDE 9

ANOVA table

Source      df  SS          MS           F-Stat    P-value
Treatments  2   5.756057    2.8780284    64.97913  <0.0001
Error       6   0.26574945  0.044291575
Total       8   6.0218062

SLIDE 10

Magazine ads example

30 magazines were grouped by educational level:

– Group 1 – High educational level
– Group 2 – Medium educational level
– Group 3 – Low educational level

3 magazines randomly selected from each group:

– Group 1: 1. Scientific American, 2. Fortune, 3. The New Yorker
– Group 2: 4. Sports Illustrated, 5. Newsweek, 6. People
– Group 3: 7. National Enquirer, 8. Grit, 9. True Confessions

6 ads randomly selected from each of the 9 magazines and the variables below recorded:

– WDS - number of words in advertisement copy
– SEN - number of sentences in advertising copy
– 3SYL - number of 3+ syllable words in advertising copy
– MAG - magazine (1 through 9 as above)
– GROUP - educational level

SLIDE 11

Magazine Ads in StatCrunch

Is the average number of words per sentence the same across magazine groups?

– WDS/SEN
– Compare boxplots & QQ plots

What are the null and alternative hypotheses?

H0: μ1 = μ2 = μ3
HA: at least two groups have a different average words per sentence

Note: Remember to hold down the Ctrl key in StatCrunch when you want to add several ANOVA treatments.

SLIDE 12

Circuit example

Response times in milliseconds were recorded for three different types of circuits used in a shutoff mechanism. Does the data suggest at level 0.05 that all three circuits have the same mean response time?
Ho: The mean response times are all the same.
Ha: At least two of the mean response times are different.

SLIDE 13

Golf Ball Data

I play a lot of golf and I’m always looking for equipment to help me shoot lower scores. The problem is that I’m cheap.

One of the main factors in golf is to drive the ball as far as possible (assuming that you don’t create additional dispersion in the process), so if you can find a “longer ball”, it could be beneficial.

The link above shows sample driving distances for three types of balls under consideration (Trispeed, E6 and B330). Test to see if there’s a difference in driving distance (discuss method here).
Ho: The mean driving distance of all balls is the same.
Ha: At least two of the balls have different mean driving distances.

SLIDE 14

Multiple comparisons

If we reject H0 in favor of the alternative HA, then we are only

concluding that at least two of the means are different.

If we want to drill down to see which means are actually

different, we might be tempted to do two-sample t tests for all mean pairs.

The problem is that the overall level of significance is much higher than the level of significance for each pairwise test.

With 3 groups there are 3 pairwise comparisons. If each is tested at α = 0.05, the resulting overall alpha is

1 - 0.95^3 = 0.142625

which is way more than we wanted. This figure is itself only an approximation, because the 3 pairwise comparisons are not actually independent.

To do these multiple comparisons, we must use Tukey’s method to maintain an overall level of significance.
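
The overall-alpha arithmetic generalizes to any number of groups. The sketch below uses a made-up function name, `familywise_alpha`, and treats the pairwise tests as independent. The slide notes that they are not, so read the output as a rough guide and not an exact error rate.

```python
from math import comb


def familywise_alpha(num_groups, alpha=0.05):
    """Approximate overall alpha when every pair of groups is t-tested.

    Assumes the pairwise tests are independent, which is only an
    approximation for comparisons that share groups.
    """
    num_pairs = comb(num_groups, 2)  # number of pairwise comparisons
    return 1 - (1 - alpha) ** num_pairs


for k in (3, 4, 5):
    print(f"{k} groups, {comb(k, 2)} comparisons: "
          f"overall alpha ~ {familywise_alpha(k):.6f}")
```

For 3 groups this reproduces the 0.142625 above, and the inflation grows quickly as groups are added.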

SLIDE 15

Tukey’s interpretation of Golf Ball Data

Shows simultaneous confidence intervals for the pairwise differences in means at overall alpha = .05. If “0” is inside a confidence interval, the two listed populations are not significantly different. If it’s not, the two populations are statistically different. Here, both Trispeed and E6 are different from B330, but not from each other.