Outline One-Factor Analysis of Variance (11.1) Randomized Block - - PDF document



slide-1
SLIDE 1

2/8/2007 1

219323 Probability and Statistics for Software and Knowledge Engineers

Lecture 11: The Analysis of Variance

Monchai Sopitkamon, Ph.D.

Outline

One-Factor Analysis of Variance (11.1)

Randomized Block Designs (11.2)


One-Factor Analysis of Variance

One-Factor Layouts (11.1.1)

Partitioning the Total Sum of Squares (11.1.2)

The Analysis of Variance Table (11.1.3)

Pairwise Comparisons of the Factor Level Means (11.1.4)

Sample Size Determination (11.1.5)

Model Assumptions (11.1.6)

One-Factor Layouts I (11.1.1)

Comparing three or more population means

Objectives:

– To determine if the population means are unequal
– To determine which population means are different and by how much

Completely randomized design – a set of independent samples from a set of several populations

Uses the analysis of variance (ANOVA) technique to analyze such a design


One-Factor Layouts II (11.1.1): Randomization

[Figure: randomizer diagram; labels: A, B, C, D, computer programs]

One-Factor Layouts III (11.1.1): Randomization

[Figure: randomizer diagram; labels: A, B, C, D, computer programs]


One-Factor Layouts IV (11.1.1)

k populations with unknown population means μ1, μ2, …, μk

If k = 1, one-sample inference problems (Chapter 8)

If k = 2, two-sample comparison problems (Chapter 9)

If k ≥ 3, one-factor ANOVA problems (this chapter)

One-Factor Layouts V (11.1.1)

xij: the jth observation from the ith population

[Figure: One factor layout]

If n1 = n2 = … = nk, the data set is balanced; otherwise it is unbalanced


One-Factor Layouts VI (11.1.1)

Estimating the population (factor level) means

One-Factor Layouts VII (11.1.1)

Hypothesis testing

– Null hypothesis: H0 : μ1 = … = μk (the population means are all equal)
– Alternative hypothesis: HA : μi ≠ μj for some i and j (at least two of the population means are not equal)

Acceptance of H0 means that there is no evidence that any of the population means are unequal

Rejection of H0 means that there is evidence that some of the population means are unequal, and so it is not plausible to assume that the population means are all equal


Partitioning the Total Sum of Squares I (11.1.2)

Partition of total sum of squares for completely randomized one factor layout

Partitioning the Total Sum of Squares II (11.1.2) : SST

Total Sum of Squares (SST) – a measure of the total variability in the data set

$$SST = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_{..})^2$$

where

$\bar{x}_{..} = \frac{1}{n_T} \sum_{i=1}^{k} \sum_{j=1}^{n_i} x_{ij}$: overall or grand mean.

$x_{ij}$: j-th observation in group or level i.

$n_i$: number of observations in group or level i.

$n_T = \sum_{i=1}^{k} n_i$: total number of observations.


Partitioning the Total Sum of Squares III (11.1.2): SST Example

  • A, B, C, and D are numbers of processors.
  • Factor: number of processors.
  • Levels: A, B, C, and D.
  • Number in each column: running times (in seconds) of programs under each CPU configuration.

Page Replacement Algorithm

Number of Processors
 A    B    C    D
11   12   18   11
13   14   16   12
17   17   18   16
17   19   20   15
15   21   22   14
16   18   15   17
14   19   17   13
10   18   21   16
12   16   16   17
14   18   20   18

Partitioning the Total Sum of Squares IV (11.1.2) : SST Example

Same data as on the previous slide (levels A, B, C, and D, ten running times each).

Grand Mean = 16.075

SST = (11 − 16.075)^2 + (13 − 16.075)^2 + … + (18 − 16.075)^2 = 336.775


Partitioning the Total Sum of Squares V (11.1.2)

Partition of total sum of squares for completely randomized one factor layout

Partitioning the Total Sum of Squares VI (11.1.2) : SSTr

Treatment Sum of Squares (SSTr) – a measure of the variability between the factor levels.

$$SSTr = \sum_{i=1}^{k} n_i (\bar{x}_{i.} - \bar{x}_{..})^2$$

where

$\bar{x}_{..} = \frac{1}{n_T} \sum_{i=1}^{k} \sum_{j=1}^{n_i} x_{ij}$: overall or grand mean.

$\bar{x}_{i.}$: sample mean corresponding to group or level i.

$n_i$: number of observations in group or level i.


Partitioning the Total Sum of Squares VII (11.1.2) : SSTr Example

Page Replacement Algorithm

Number of Processors
         A      B      C      D
        11     12     18     11
        13     14     16     12
        17     17     18     16
        17     19     20     15
        15     21     22     14
        16     18     15     17
        14     19     17     13
        10     18     21     16
        12     16     16     17
        14     18     20     18
Mean    13.9   17.2   18.3   14.9

Grand Mean = 16.075

SSTr = 10 (13.9 − 16.075)^2 + 10 (17.2 − 16.075)^2 + 10 (18.3 − 16.075)^2 + 10 (14.9 − 16.075)^2 = 123.275

Partitioning the Total Sum of Squares VIII (11.1.2)

Partition of total sum of squares for completely randomized one factor layout


Partitioning the Total Sum of Squares IX (11.1.2) : SSE

Error Sum of Squares (SSE) – a measure of the variability within the factor levels.

$$SSE = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_{i.})^2$$

where

$\bar{x}_{i.}$: sample mean corresponding to group or level i.

$x_{ij}$: j-th observation in group or level i.

Partitioning the Total Sum of Squares X (11.1.2) : SSE Example

Same data as in the SSTr example (levels A, B, C, and D, ten running times each).

Level means: A = 13.9, B = 17.2, C = 18.3, D = 14.9

SSE = (11 − 13.9)^2 + … + (14 − 13.9)^2 + (12 − 17.2)^2 + … + (18 − 17.2)^2 + (18 − 18.3)^2 + … + (20 − 18.3)^2 + (11 − 14.9)^2 + … + (18 − 14.9)^2 = 213.5


Partitioning the Total Sum of Squares XI (11.1.2)

SST = SSTr + SSE = 123.275 + 213.5 = 336.775
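The partition can be checked numerically. The following is a minimal sketch, not part of the original slides: the function name `partition_one_factor` is made up for illustration, and the data are the running times from the example table, read column by column.

```python
# Running times (seconds) from the slides' example, one list per factor level.
data = {
    "A": [11, 13, 17, 17, 15, 16, 14, 10, 12, 14],
    "B": [12, 14, 17, 19, 21, 18, 19, 18, 16, 18],
    "C": [18, 16, 18, 20, 22, 15, 17, 21, 16, 20],
    "D": [11, 12, 16, 15, 14, 17, 13, 16, 17, 18],
}


def partition_one_factor(groups):
    """Return (SST, SSTr, SSE) for a dict mapping level -> list of observations."""
    all_obs = [x for obs in groups.values() for x in obs]
    grand_mean = sum(all_obs) / len(all_obs)

    # Total variability around the grand mean.
    sst = sum((x - grand_mean) ** 2 for x in all_obs)

    sstr = 0.0  # variability between the factor levels
    sse = 0.0   # variability within the factor levels
    for obs in groups.values():
        level_mean = sum(obs) / len(obs)
        sstr += len(obs) * (level_mean - grand_mean) ** 2
        sse += sum((x - level_mean) ** 2 for x in obs)
    return sst, sstr, sse


sst, sstr, sse = partition_one_factor(data)
print(f"SST = {sst:.3f}, SSTr = {sstr:.3f}, SSE = {sse:.3f}")
```

Up to floating-point rounding, the three values should agree with the slide figures 336.775, 123.275, and 213.5.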

Partitioning the Total Sum of Squares XII (11.1.2) : Conclusion

Interpretation of the sum of squares for treatments and the sum of squares for error


Partitioning the Total Sum of Squares XIII (11.1.2) : Conclusion

Dependence of p-value on the sum of squares for treatments and the sum of squares for error

The Analysis of Variance Table I (11.1.3)

  • Mean Squares for Treatments (MSTr)

$$MSTr = \frac{SSTr}{\text{degrees of freedom}} = \frac{SSTr}{k-1}$$

  • Mean Square Error (MSE)

$$MSE = \frac{SSE}{\text{degrees of freedom}} = \frac{SSE}{n_T-k}$$

  • A p-value for the null hypothesis that the factor level means μi are all equal is p-value = P(X ≥ F), where the F-statistic is

$$F = \frac{MSTr}{MSE}$$

and the RV X has an $F_{k-1,\,n_T-k}$ distribution


The Analysis of Variance Table II (11.1.3)

P-value calculation for one factor analysis of variance table

The Analysis of Variance Table III (11.1.3): ANOVA Example

Same data as in the SST example (Number of Processors A, B, C, and D).

              A          B          C          D          Total
Mean          13.9       17.2       18.3       14.9       Grand Mean 16.075
SSA           47.30625   12.65625   49.50625   13.80625   123.275

SSA (= SSTr): 123.275
SSW (= SSE): 213.5
SST: 336.775
MSA (= MSTr): 41.091667
MSW (= MSE): 5.9305556
F (= MSTr/MSE): 6.93
df numer. (k − 1 = 4 − 1): 3
df denom. (nT − k = 40 − 4): 36
Fu (= F-crit, from table, α = 0.05): 2.87

F > F-crit, so reject H0.

The numbers of processors A, B, C, and D have a significant difference at the 0.05 level of significance.
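The arithmetic behind this table can be sketched in a few lines. This is only an illustration: the helper name `one_factor_f` is hypothetical, the sums of squares are the values quoted on the slides, and the critical value 2.87 is taken from the slide's table lookup rather than computed.

```python
def one_factor_f(sstr, sse, k, n_total):
    """Return (MSTr, MSE, F) for a one-factor layout with k levels."""
    mstr = sstr / (k - 1)        # treatment mean square, k - 1 degrees of freedom
    mse = sse / (n_total - k)    # error mean square, n_T - k degrees of freedom
    return mstr, mse, mstr / mse


# Values quoted on the slides: SSTr = 123.275, SSE = 213.5, k = 4, n_T = 40.
mstr, mse, f_stat = one_factor_f(123.275, 213.5, k=4, n_total=40)

f_crit = 2.87  # table value quoted on the slide for alpha = 0.05, df = (3, 36)
reject_h0 = f_stat > f_crit

print(f"MSTr = {mstr:.4f}, MSE = {mse:.4f}, F = {f_stat:.2f}, reject H0: {reject_h0}")
```

An exact p-value would need the F distribution's tail probability, which the slides obtain from Excel; it is left out here to keep the sketch dependency-free.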


The Analysis of Variance Table IV (11.1.3): ANOVA With Excel

Since the p-value is less than α = 0.05, reject Ho.

The Analysis of Variance Table V (11.1.3)

Analysis of variance table for one factor layout

Reject H0 if F-statistic > F-critical (from Table IV or Excel's FINV function)

or Reject H0 if p-value < α (as in previous chapter)

Excel sheet


Pairwise Comparisons of the Factor Level Means I (11.1.4)

If H0 is rejected (not all population means are equal), we'd like to be able to tell which samples are different and by how much.

Need to do Tukey multiple comparisons to compare all groups simultaneously

– By computing the differences μi – μj for 1 ≤ i < j ≤ k among all k(k – 1)/2 pairs of factor level means

Compute confidence intervals

$$\mu_i - \mu_j \in \left( \bar{x}_{i.} - \bar{x}_{j.} \pm \frac{q_{\alpha,k,n_T-k}}{\sqrt{2}} \sqrt{MSE} \sqrt{\frac{1}{n_i} + \frac{1}{n_j}} \right)$$

Pairwise Comparisons of the Factor Level Means II (11.1.4)

If the CI for the difference μi – μj contains 0, then factor levels i and j are not significantly different.

If the CI for the difference μi – μj does not contain 0, then factor levels i and j are significantly different.

The CI indicates by how much the factor level means are shown to be different.


Pairwise Comparisons of the Factor Level Means III (11.1.4)

Pairwise confidence intervals for the carpet fiber blends experiment

Pairwise Comparisons of the Factor Level Means IV (11.1.4)

Schematic presentation of the results of the carpet fiber blends experiment

Conclusions

– Fiber types 1, 3, and 5 are indistinguishable from one another
– These three fibers are not significantly different
– Fiber types 2 and 6 are indistinguishable, as are fiber types 4 and 6, but type 2 has an average score larger than that of type 4
– The best fiber type is either type 2 or type 6


Sample Size Determination I (11.1.5)

The power of the test of H0 that the factor level means are all equal increases as the sample sizes (n1, …, nk) increase.

This power is the probability that H0 is rejected when it is not true.

An increase in sample sizes also results in a decrease in the lengths of the pairwise CIs.

The minimum sample size, n, at each factor level to achieve a maximum CI length, L, is

$$n \geq \frac{4 q_{\alpha,k,n_T-k}^2 \, MSE}{L^2}$$

Sample Size Determination II (11.1.5)

Ex.63 pg.507: Roadway Base Aggregates

Although H0 was accepted and the pairwise comparison CIs all contain 0, the lengths of the pairwise CIs are all equal to

$$L = \frac{2 q_{\alpha,k,n_T-k} \sqrt{MSE}}{\sqrt{n}} = \frac{2 \times 3.81 \times 1479}{\sqrt{10}} = 3564$$

To increase the sensitivity of the comparisons, additional sampling is desired. So, with desired shorter CI lengths of 2000,

$$n \cong \frac{4 q_{\alpha,k,n_T-k}^2 \, MSE}{L^2} = \frac{4 \times 3.81^2 \times 1479^2}{2000^2} = 31.8$$
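The two calculations in this example can be reproduced with a short sketch. The function names are hypothetical, and the inputs (q = 3.81, √MSE = 1479, n = 10, target length 2000) are the numbers that appear on the slide. The critical point q strictly depends on nT, so reusing 3.81 for the larger sample is the same approximation the slide makes.

```python
import math


def tukey_ci_length(q, mse, n):
    """Length of a pairwise Tukey CI when every level has n observations."""
    return 2 * q * math.sqrt(mse / n)


def required_sample_size(q, mse, max_length):
    """Approximate per-level sample size giving CI length at most max_length."""
    return 4 * q ** 2 * mse / max_length ** 2


q = 3.81            # critical point quoted on the slide
mse = 1479.0 ** 2   # the slide's arithmetic uses sqrt(MSE) = 1479

current_length = tukey_ci_length(q, mse, n=10)
n_needed = required_sample_size(q, mse, max_length=2000)

print(f"current CI length: {current_length:.0f}")
print(f"required n: {n_needed:.1f}, rounded up to {math.ceil(n_needed)}")
```

Rounding up gives the smallest whole number of observations per level that meets the target length under this approximation.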


Model Assumptions (11.1.6)

The ANOVA for a one-factor layout is based on the assumption that

– The observations are independent and have a normal distribution with the same variance.

Independence is ensured by randomization of experimental units between the k factor levels.

Normality: the ANOVA F test is robust as long as distributions are not extremely different from a normal distribution, particularly for large samples.

Homogeneity of variance: computed from sample variances $s_i^2$

– If variances are much different, make pairwise comparisons (using the general procedure from Chapter 9) between the k factor levels, one pair at a time, and use a small CI (90%).

Randomized Block Designs (11.2)

One-Factor Layouts with Blocks (11.2.1)

Partitioning the Total Sum of Squares (11.2.2)

The Analysis of Variance Table (11.2.3)

Pairwise Comparisons of the Factor Level Means (11.2.4)

Model Assumptions (11.2.5)


One-Factor Layouts with Blocks I (11.2.1)

A complete randomized block design – an extension of paired samples (comparisons within each pair of observations of different levels) to accommodate the comparison of k (k ≥ 3) population means or factor levels.

One-Factor Layouts with Blocks II (11.2.1): Randomization

[Figure: randomizer diagram; labels: A, B, C, D, computer programs]


One-Factor Layouts with Blocks III (11.2.1): Randomization

[Figure: randomizer diagram; labels: A, B, C, D, computer programs]

A randomized block design data set


One-Factor Layouts with Blocks V (11.2.1):

Number of Processors

b = 5 k = 4

Partitioning the Total Sum of Squares I (11.2.2)

Partition of the total sum of squares for a randomized block design

SST = SSTr+SSBl+SSE


Partitioning the Total Sum of Squares II (11.2.2) : SST

Total Sum of Squares (SST):

$$SST = \sum_{i=1}^{k} \sum_{j=1}^{b} (x_{ij} - \bar{x}_{..})^2$$

where

$\bar{x}_{..} = \frac{1}{kb} \sum_{i=1}^{k} \sum_{j=1}^{b} x_{ij}$: overall or grand mean.

$x_{ij}$: observation in j-th block and level i.

$b$: number of blocks.

Partitioning the Total Sum of Squares III (11.2.2) : SST Example

Number of Processors

Grand Mean = 15.61

SST = (11.0 − 15.61)^2 + (13.0 − 15.61)^2 + … + (15.0 − 15.61)^2 + … + (11.0 − 15.61)^2 + (12.0 − 15.61)^2 + … + (13.5 − 15.61)^2 = 232.44


Partitioning the Total Sum of Squares IV (11.2.2)

Partition of the total sum of squares for a randomized block design

SST = SSTr+SSBl+SSE

Partitioning the Total Sum of Squares V (11.2.2) : SSTr

Treatment Sum of Squares (SSTr)

$$SSTr = b \sum_{i=1}^{k} (\bar{x}_{i.} - \bar{x}_{..})^2$$

where

$\bar{x}_{i.} = \frac{1}{b} \sum_{j=1}^{b} x_{ij}$: group mean.


Partitioning the Total Sum of Squares VI (11.2.2) : SSTr Example

Number of Processors

SSTr = 5 × [(14.0 − 15.61)^2 + (15.1 − 15.61)^2 + … + (13.1 − 15.61)^2] = 155.018

Partitioning the Total Sum of Squares VII (11.2.2)

Partition of the total sum of squares for a randomized block design

SST = SSTr+SSBl+SSE


Partitioning the Total Sum of Squares VIII (11.2.2): SSBl

Block Sum of Squares (SSBl)

$$SSBl = k \sum_{j=1}^{b} (\bar{x}_{.j} - \bar{x}_{..})^2$$

where

$\bar{x}_{.j} = \frac{1}{k} \sum_{i=1}^{k} x_{ij}$: block mean.

Partitioning the Total Sum of Squares IX (11.2.2): SSBl Example

Number of Processors

SSBl = 4 × [(13.0 − 15.61)^2 + (14.5 − 15.61)^2 + … + (16.4 − 15.61)^2] = 76.133


Partitioning the Total Sum of Squares X (11.2.2)

Partition of the total sum of squares for a randomized block design

SST = SSTr+SSBl+SSE

Partitioning the Total Sum of Squares XI (11.2.2): SSE

Error Sum of Squares (SSE)

$$SSE = \sum_{i=1}^{k} \sum_{j=1}^{b} (x_{ij} - \bar{x}_{i.} - \bar{x}_{.j} + \bar{x}_{..})^2$$


Partitioning the Total Sum of Squares XII (11.2.2): SSE Example

Number of Processors

SSE = (11.0 − 14.0 − 13.0 + 15.61)^2 + (13.0 − 14.0 − 14.5 + 15.61)^2 + … + (11.0 − 13.1 − 13.0 + 15.61)^2 + (13.5 − 13.1 − 16.4 + 15.61)^2 = 1.287
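The four sums of squares for a randomized block design can be computed directly from their definitions. The sketch below is an assumption-laden illustration: the function name `partition_block_design` is hypothetical, and because the lecture's data table did not survive extraction, the small `toy` data set is made up purely to demonstrate the identity SST = SSTr + SSBl + SSE. It is not the lecture's data.

```python
def partition_block_design(x):
    """Return (SST, SSTr, SSBl, SSE) where x[i][j] is level i observed in block j."""
    k = len(x)       # number of factor levels
    b = len(x[0])    # number of blocks
    grand = sum(sum(row) for row in x) / (k * b)
    level_means = [sum(row) / b for row in x]
    block_means = [sum(x[i][j] for i in range(k)) / k for j in range(b)]

    sst = sum((x[i][j] - grand) ** 2 for i in range(k) for j in range(b))
    sstr = b * sum((m - grand) ** 2 for m in level_means)
    ssbl = k * sum((m - grand) ** 2 for m in block_means)
    sse = sum(
        (x[i][j] - level_means[i] - block_means[j] + grand) ** 2
        for i in range(k)
        for j in range(b)
    )
    return sst, sstr, ssbl, sse


# Made-up numbers for illustration only (3 levels, 3 blocks); NOT the lecture's data.
toy = [
    [10.0, 12.0, 11.0],
    [13.0, 15.0, 15.0],
    [9.0, 12.0, 10.0],
]

sst, sstr, ssbl, sse = partition_block_design(toy)
print(f"SST = {sst:.4f}, SSTr + SSBl + SSE = {sstr + ssbl + sse:.4f}")
```

Whatever data are supplied, the two printed quantities should match up to rounding, which is the partition shown on the slides.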

The Analysis of Variance Table I (11.2.3)

Mean Squares for Treatments (MSTr)

$$MSTr = \frac{SSTr}{\text{degrees of freedom}} = \frac{SSTr}{k-1}$$

Mean Squares for Blocks (MSBl)

$$MSBl = \frac{SSBl}{\text{degrees of freedom}} = \frac{SSBl}{b-1}$$

Mean Square Error (MSE)

$$MSE = \frac{SSE}{\text{degrees of freedom}} = \frac{SSE}{(k-1)(b-1)}$$


The Analysis of Variance Table II (11.2.3)

A p-value for the null hypothesis that the factor level means μi are all equal is

p-value = P(X ≥ FT)

where the F-statistic is $F_T = \frac{MSTr}{MSE}$ and the RV X has an $F_{k-1,\,(k-1)(b-1)}$ distribution

The corresponding p-value for the blocks is

p-value = P(X ≥ FB)

where the F-statistic is $F_B = \frac{MSBl}{MSE}$ and the RV X has an $F_{b-1,\,(k-1)(b-1)}$ distribution
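As a rough check of these formulas, the mean squares and both F-statistics can be computed from the sums of squares quoted in the earlier examples (SSTr = 155.018, SSBl = 76.133, SSE = 1.287, with k = 4 and b = 5). The function name `block_design_f` is hypothetical, and the resulting F values are derived here from those slide figures rather than quoted from the lecture.

```python
def block_design_f(sstr, ssbl, sse, k, b):
    """Return (MSTr, MSBl, MSE, F_T, F_B) for a randomized block design."""
    mstr = sstr / (k - 1)              # treatments: k - 1 degrees of freedom
    msbl = ssbl / (b - 1)              # blocks: b - 1 degrees of freedom
    mse = sse / ((k - 1) * (b - 1))    # error: (k - 1)(b - 1) degrees of freedom
    return mstr, msbl, mse, mstr / mse, msbl / mse


# Sums of squares quoted in the slides' randomized block example.
mstr, msbl, mse, f_t, f_b = block_design_f(155.018, 76.133, 1.287, k=4, b=5)

print(f"MSTr = {mstr:.3f}, MSBl = {msbl:.3f}, MSE = {mse:.5f}")
print(f"F_T = {f_t:.1f}, F_B = {f_b:.1f}")
```

Each statistic would then be compared with the matching F critical value, or converted to a p-value, as described above.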

The Analysis of Variance Table III (11.2.3)

Analysis of variance table for a randomized block design

Reject H0 if F-statistic > F-critical (from Table IV or Excel's FINV function)

or Reject H0 if p-value < α (as in previous chapter)

Excel sheet


The Analysis of Variance Table IV (11.2.3)

Number of Processors

[Figure: Excel ANOVA output annotated with SSTr, MSTr, MSE, F = MSTr/MSE, and F-crit]

Since the p-value is less than α = 0.05, reject H0.

Since F > F-critical, reject H0.


Pairwise Comparisons of the Factor Level Means I (11.2.4)

If H0 is rejected (not all population means are equal), we'd like to be able to tell which samples are different and by how much.

Need to do Tukey multiple comparisons to compare all groups simultaneously

– By computing the differences μi – μj for 1 ≤ i < j ≤ k among all k(k – 1)/2 pairs of factor level means

Compute confidence intervals

$$\mu_i - \mu_j \in \left( \bar{x}_{i.} - \bar{x}_{j.} \pm q_{\alpha,k,(k-1)(b-1)} \sqrt{\frac{MSE}{b}} \right)$$

Pairwise Comparisons of the Factor Level Means II (11.2.4)

Ex.55 pg.525: Heart Rate Reduction

q0.05,3,14 = 3.70 (From Table V)

$$q_{\alpha,k,(k-1)(b-1)} \sqrt{\frac{MSE}{b}} = 3.70 \times \sqrt{\frac{23.28}{8}} = 6.31$$

CIs of Diff    lower     upper
μ1 − μ2       −14.61    −1.99
μ1 − μ3       −17.11    −4.49
μ2 − μ3        −8.81     3.81

  • Med dose gives heart rate reduction between 2-14.6% compared to low dose
  • High dose gives reduction between 4.5-17.1% compared to low dose

No significant difference in reduction between high and med doses!
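The half-width and the significance decisions in this example can be sketched as follows. The function name is hypothetical. The slide does not list the sample mean differences, so the values in `diffs` are an assumption: they are recovered as the midpoints of the intervals in the table above.

```python
import math


def block_tukey_half_width(q, mse, b):
    """Half-width of a pairwise Tukey CI in a randomized block design with b blocks."""
    return q * math.sqrt(mse / b)


# Slide values: q_{0.05,3,14} = 3.70, MSE = 23.28, b = 8.
half_width = block_tukey_half_width(3.70, 23.28, 8)

# ASSUMPTION: mean differences recovered as midpoints of the slide's intervals.
diffs = {"mu1 - mu2": -8.30, "mu1 - mu3": -10.80, "mu2 - mu3": -2.50}

results = {}
for name, d in diffs.items():
    lower, upper = d - half_width, d + half_width
    # Levels differ significantly when the interval excludes 0.
    results[name] = (lower, upper, not (lower <= 0 <= upper))

for name, (lower, upper, significant) in results.items():
    print(f"{name}: ({lower:.2f}, {upper:.2f}) significant: {significant}")
```

Only the μ2 − μ3 interval contains 0, which matches the slide's conclusion that the high and medium doses are not significantly different.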


Pairwise Comparisons of the Factor Level Means III (11.2.4)

Schematic presentation of the results of the heart rate reductions example

Model Assumptions (11.2.5)

Modeling assumption

xij = μ + αi + βj + εij

where the εij are independently distributed error terms, εij ∼ N(0, σ2) – the normal distribution and common variance assumptions for the error terms are the same as for the completely randomized design

No interaction between the factor levels and the blocks (differences between factor level effects are the same for each of the blocks)