SLIDE 1

ST 370 Probability and Statistics for Engineers

Designed Experiments, One Factor with More Than Two Levels

The simplest experimental design problem is one in which only a single factor is studied. Having more than two levels brings some complications.

1 / 16 One Factor with More Than Two Levels

SLIDE 2

The tensile strengths for four different compositions of paper are an example of this design:

Response: tensile strength;
Factor: concentration of hardwood fiber in the pulp;
Levels: a = 4 levels: 5%, 10%, 15%, 20%;
Replication: n = 6 samples were tested at each level of hardwood concentration.

Note: Other factors could also affect the tensile strength of the paper:

Basis weight;
Coatings.

These were all held constant.

SLIDE 3

Randomization

If feasible, all 24 measurements should be made in random order, in order to prevent drifts in the measurements (caused, for example, by changing ambient humidity) from being confounded with the effect of changing the concentration of hardwood fiber. The design is then called a completely randomized design (CRD).
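A random run order for the 24 measurements can be generated in R. The following is only a minimal sketch: the data frame `runs`, the column names, and the seed are illustrative assumptions, not part of the course material.

```r
# Hypothetical run sheet: 4 hardwood levels x 6 replicates = 24 runs.
set.seed(370)  # arbitrary seed, used only so the sketch is reproducible

runs <- data.frame(Hardwood = rep(c(5, 10, 15, 20), each = 6))

# Assign each planned measurement a position in a random permutation of 1..24.
runs$RunOrder <- sample(nrow(runs))

# Sort by run order to get the sequence in which to make the measurements.
run_sheet <- runs[order(runs$RunOrder), ]
print(run_sheet, row.names = FALSE)
```

Because every permutation of the 24 runs is equally likely, a slow drift in the measuring conditions is spread across all four concentrations rather than lining up with one of them.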

SLIDE 4

Data Analysis

The R tools that we shall use to analyze data collected in factorial experiments are generally:

lm(), for fitting linear models;
aov(), for the analysis of variance.

We begin by asking whether the factor had any effect on the response; that is, we test the null hypothesis that it has no effect:

$$H_0 : \mu_1 = \mu_2 = \mu_3 = \mu_4$$

where $\mu_i$ is the population mean strength for level $i$, $i = 1, 2, 3, 4$.

SLIDE 5

For convenience when we use these methods with more than one factor, we write

$$\mu_i = \mu + \tau_i, \quad i = 1, 2, 3, 4$$

where:

$\mu$ is an overall typical value;
$\tau_i$, called the treatment effect of the $i$th level, is the deviation of $\mu_i$ from $\mu$.

In terms of these parameters, the null hypothesis is $\tau_1 = \tau_2 = \tau_3 = \tau_4$.

SLIDE 6

The four parameters $\mu_1$, $\mu_2$, $\mu_3$, and $\mu_4$ have been replaced by five: $\mu$, $\tau_1$, $\tau_2$, $\tau_3$, and $\tau_4$, which introduces a redundancy. The redundancy is eliminated by imposing a constraint on $\tau_1$, $\tau_2$, $\tau_3$, and $\tau_4$; possibilities include:

$\sum_{i=1}^{4} \tau_i = 0$, as in the book;
$\tau_4 = 0$, as in SAS and JMP;
$\tau_1 = 0$, as in R.

With the constraint $\tau_1 = 0$, $\mu_1 = \mu + \tau_1 = \mu$, so the “overall typical value” $\mu$ is in fact just $\mu_1$, the mean strength for the first level of the factor, called the baseline (or reference) level. For $i \neq 1$, $\tau_i = \mu_i - \mu = \mu_i - \mu_1$. That is, the $i$th treatment effect is the difference between the mean strength for level $i$ and that for the baseline level.
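The R convention can be checked by fitting the one-factor model with lm() and comparing its coefficients with the level means. This is a sketch under one stated assumption: the slides do not list the contents of `paper`, so the 24 strength values below are assumed (the usual textbook tabulation of this example); they were chosen because they reproduce the sums of squares in the aov() output on slide 12.

```r
# ASSUMPTION: strength values are not given in the slides; these are the
# commonly tabulated values for the hardwood-concentration example.
paper <- data.frame(
  Hardwood = rep(c(5, 10, 15, 20), each = 6),
  Strength = c( 7,  8, 15, 11,  9, 10,
               12, 17, 13, 18, 19, 15,
               14, 18, 19, 17, 16, 18,
               19, 25, 22, 23, 18, 20)
)

# Default treatment contrasts in R impose tau_1 = 0 (first level is baseline).
fit <- lm(Strength ~ factor(Hardwood), data = paper)
level_means <- tapply(paper$Strength, paper$Hardwood, mean)

coef(fit)                       # intercept, then tau_2, tau_3, tau_4
level_means - level_means[1]    # differences from the baseline (5%) mean
```

With these values the intercept equals the sample mean for the 5% level, and each remaining coefficient equals the difference between that level's sample mean and the 5% mean.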

SLIDE 7

Differences among means

Suppose that the sample mean strengths for the 4 levels of hardwood concentration are $\bar{y}_{1\cdot}$, $\bar{y}_{2\cdot}$, $\bar{y}_{3\cdot}$, and $\bar{y}_{4\cdot}$, respectively:

$$\bar{y}_{i\cdot} = \frac{1}{n} \sum_{j=1}^{n} y_{i,j}$$

where $y_{i,j}$ is the $j$th measurement for the $i$th level of hardwood concentration. To test the null hypothesis, we need a measure of how different these means are.

SLIDE 8

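With only two levels, this was easy: $|\hat{\delta}| = |\bar{y}_{2\cdot} - \bar{y}_{1\cdot}|$ is the obvious choice; with more than two levels, various measures could be used:

The range, $\max_{i,i'} |\bar{y}_{i\cdot} - \bar{y}_{i'\cdot}|$;
The sum, $\sum_{i,i'} |\bar{y}_{i\cdot} - \bar{y}_{i'\cdot}|$;
The sum of squares, $\sum_{i,i'} |\bar{y}_{i\cdot} - \bar{y}_{i'\cdot}|^2$.

The first measure, the range, is important, but the conventional measure is the third, in the form of the Treatment sum of squares:

$$SS_{\text{Treatments}} = n \sum_{i=1}^{a} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2$$

where $\bar{y}_{\cdot\cdot}$ is the grand mean of all $an$ measurements.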
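The level means and the Treatment sum of squares can be computed directly from their definitions. The sketch below assumes the same column names as the `paper` data frame used in these slides; the strength values themselves are not listed in the slides and are an assumption (the usual textbook tabulation of this example).

```r
# ASSUMPTION: strength values are not given in the slides; these are the
# commonly tabulated values for the hardwood-concentration example.
paper <- data.frame(
  Hardwood = rep(c(5, 10, 15, 20), each = 6),
  Strength = c( 7,  8, 15, 11,  9, 10,
               12, 17, 13, 18, 19, 15,
               14, 18, 19, 17, 16, 18,
               19, 25, 22, 23, 18, 20)
)

n <- 6                                                   # replicates per level
ybar_i <- tapply(paper$Strength, paper$Hardwood, mean)   # level means
ybar   <- mean(paper$Strength)                           # grand mean

# SSTreatments = n * sum over levels of (level mean - grand mean)^2
SSTreatments <- n * sum((ybar_i - ybar)^2)

ybar_i         # 10.00, 15.67, 17.00, 21.17
SSTreatments   # about 382.79
```

Under these assumed values, the result agrees with the 382.8 reported for factor(Hardwood) in the aov() output on slide 12.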

SLIDE 9

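When there are two levels, we standardize $|\hat{\delta}| = |\bar{y}_{2\cdot} - \bar{y}_{1\cdot}|$ by dividing by its standard error, to form the t-statistic. Now, with more than two levels, we also need to form a ratio, the F-statistic:

$$MS_{\text{Treatments}} = \frac{SS_{\text{Treatments}}}{a - 1}$$

$$SS_{\text{Errors}} = \sum_{i=1}^{a} \sum_{j=1}^{n} (y_{i,j} - \bar{y}_{i\cdot})^2$$

$$MS_{\text{Errors}} = \frac{SS_{\text{Errors}}}{a(n - 1)}$$

$$F_{\text{obs}} = \frac{MS_{\text{Treatments}}}{MS_{\text{Errors}}}$$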
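As a quick arithmetic check of these formulas, the sketch below plugs in the two sums of squares that appear (rounded) in the aov() output on slide 12, with a = 4 and n = 6. The variable names are illustrative.

```r
a <- 4   # number of levels
n <- 6   # replicates per level

# Rounded sums of squares as printed in the aov() output on slide 12.
SSTreatments <- 382.8
SSErrors     <- 130.2

MSTreatments <- SSTreatments / (a - 1)        # 127.6
MSErrors     <- SSErrors / (a * (n - 1))      # 6.51
Fobs         <- MSTreatments / MSErrors       # about 19.6

c(MSTreatments = MSTreatments, MSErrors = MSErrors, Fobs = Fobs)
```

The ratio comes out at about 19.60 rather than the 19.61 printed by R, because the inputs here are already rounded to one decimal place.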

SLIDE 10

The ANOVA table

The calculations are usually laid out in a table:

Source       Degrees of Freedom   Sum of Squares             Mean Square                F-ratio
Treatments   $a - 1$              $SS_{\text{Treatments}}$   $MS_{\text{Treatments}}$   $F_{\text{obs}}$
Errors       $a(n - 1)$           $SS_{\text{Errors}}$       $MS_{\text{Errors}}$
Total        $an - 1$             $SS_{\text{Total}}$

The sums of squares are related by the Analysis of Variance (ANOVA) equation:

$$SS_{\text{Total}} = \sum_{i=1}^{a} \sum_{j=1}^{n} (y_{i,j} - \bar{y}_{\cdot\cdot})^2 = SS_{\text{Treatments}} + SS_{\text{Errors}}$$
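The ANOVA equation can be verified numerically by computing each sum of squares from its definition. This is a sketch only: the strength values are not listed in the slides and are an assumption (the usual textbook tabulation of this example).

```r
# ASSUMPTION: strength values are not given in the slides; these are the
# commonly tabulated values for the hardwood-concentration example.
paper <- data.frame(
  Hardwood = rep(c(5, 10, 15, 20), each = 6),
  Strength = c( 7,  8, 15, 11,  9, 10,
               12, 17, 13, 18, 19, 15,
               14, 18, 19, 17, 16, 18,
               19, 25, 22, 23, 18, 20)
)

y      <- paper$Strength
ybar   <- mean(y)                    # grand mean
ybar_i <- ave(y, paper$Hardwood)     # each observation's own level mean

SSTotal      <- sum((y - ybar)^2)
SSTreatments <- sum((ybar_i - ybar)^2)   # same as n * sum over levels
SSErrors     <- sum((y - ybar_i)^2)

all.equal(SSTotal, SSTreatments + SSErrors)   # TRUE
```

Here ave() returns, for every observation, the mean of its own group, so summing over all 24 rows gives the factor of n in the Treatment sum of squares automatically.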

SLIDE 11

Finding the P-value

When we measure the extent of the differences among $\bar{y}_{1\cdot}$, $\bar{y}_{2\cdot}$, $\bar{y}_{3\cdot}$, and $\bar{y}_{4\cdot}$ by the F-ratio, we still need to find the probability of obtaining as large a value as $F_{\text{obs}}$, when the null hypothesis

$$H_0 : \tau_1 = \tau_2 = \tau_3 = \tau_4 = 0$$

is true. As before, we assume that the measurements follow the normal distribution with the same standard deviation, and then it can be shown that, when the null hypothesis is true, $F$ follows a related distribution called Snedecor’s (or Fisher’s) F-distribution, with $a - 1$ and $a(n - 1)$ degrees of freedom. The probability that $F \geq F_{\text{obs}}$ can be calculated as an area under the corresponding density function.
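In R, the upper-tail area of the F-distribution is given by pf() with lower.tail = FALSE. The sketch below uses the F value and degrees of freedom printed in the aov() output on slide 12; the variable names are illustrative.

```r
Fobs <- 19.61   # F value as printed (rounded) on slide 12
df1  <- 3       # a - 1
df2  <- 20      # a(n - 1)

# P(F >= Fobs) when the null hypothesis is true.
p_value <- pf(Fobs, df1 = df1, df2 = df2, lower.tail = FALSE)
p_value
```

The result is of the order of 1e-06, in line with the Pr(>F) column of that output; small differences in the last digits are expected because the input F value is rounded.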

SLIDE 12

In R

    summary(aov(Strength ~ factor(Hardwood), paper))
    # output:
    #                  Df Sum Sq Mean Sq F value   Pr(>F)
    # factor(Hardwood)  3  382.8  127.60   19.61 3.59e-06 ***
    # Residuals        20  130.2    6.51
    # ---
    # Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The P-value is extremely small: we have strong evidence that the null hypothesis should be rejected.

SLIDE 13

Comparing the sample means

When, as in the paper example, we reject the null hypothesis of no effect:

$$H_0 : \tau_1 = \tau_2 = \tau_3 = \tau_4 = 0,$$

the next question is: what is the effect? For instance: which levels of hardwood concentration give significantly higher tensile strength than other levels? We can answer this question either:

one pair of levels at a time;
for all pairs simultaneously.

SLIDE 14

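One pair at a time

For a given pair of means, say $\mu_1$ and $\mu_2$, we could test the null hypothesis $H_{0:1,2} : \mu_1 = \mu_2$ in the same way as when there were only two levels:

$$t_{\text{obs}:1,2} = \frac{\bar{y}_{1\cdot} - \bar{y}_{2\cdot}}{\text{standard error}}$$

where

$$\text{standard error} = \sqrt{\text{MSE} \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}.$$

Note that MSE (the error mean square, $MS_{\text{Errors}}$) replaces $s^2_{\text{pooled}}$; in fact, when $a = 2$, they are the same.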

SLIDE 15

We reject $H_{0:i,i'}$ when $|t_{\text{obs}:i,i'}| \geq t_{.025,20}$. For the paper strength data, we reject all the null hypotheses except $H_{0:2,3}$. That is, we cannot state that the strength of 10% hardwood paper is significantly different from that of 15% hardwood paper, but all other differences are significant.
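One way to run all six unadjusted comparisons at once is pairwise.t.test() with a pooled standard deviation and no multiplicity adjustment, which corresponds to the one-pair-at-a-time tests based on MSE with 20 degrees of freedom. This is a sketch, not necessarily how the course computes them, and the strength values are an assumption (they are not listed in the slides; these are the usual textbook tabulation of this example).

```r
# ASSUMPTION: strength values are not given in the slides; these are the
# commonly tabulated values for the hardwood-concentration example.
paper <- data.frame(
  Hardwood = rep(c(5, 10, 15, 20), each = 6),
  Strength = c( 7,  8, 15, 11,  9, 10,
               12, 17, 13, 18, 19, 15,
               14, 18, 19, 17, 16, 18,
               19, 25, 22, 23, 18, 20)
)

# Pooled SD across all four levels (sqrt of MSE), two-sided tests,
# no adjustment for multiple comparisons.
pw <- pairwise.t.test(paper$Strength, factor(paper$Hardwood),
                      p.adjust.method = "none", pool.sd = TRUE)

pw$p.value          # lower-triangular matrix of P-values
pw$p.value < 0.05   # TRUE where the pairwise null hypothesis is rejected
```

With these assumed values, the only P-value above 0.05 is the one for 10% versus 15%, matching the conclusion stated above.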

SLIDE 16

Equivalently, we could calculate confidence intervals for each pairwise difference $\mu_i - \mu_{i'}$. The null hypothesis $H_{0:i,i'}$ is rejected if and only if the confidence interval does not contain zero. Each confidence interval is of the form

$$\bar{y}_{i\cdot} - \bar{y}_{i'\cdot} \pm \text{LSD}$$

where LSD is Fisher’s Least Significant Difference:

$$\text{LSD} = t_{.025,20} \sqrt{\text{MSE} \left( \frac{1}{n_i} + \frac{1}{n_{i'}} \right)} = 3.07.$$
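The value 3.07 can be reproduced from quantities already shown in these slides: MSE = 6.51 with 20 degrees of freedom (slide 12) and n = 6 measurements per level. A minimal sketch, with illustrative variable names:

```r
MSE <- 6.51   # residual mean square from the aov() output on slide 12
df  <- 20     # error degrees of freedom, a(n - 1)
n   <- 6      # measurements at each level, so n_i = n_i' = 6

# t_{.025,20} is the upper 2.5% point, i.e. the 0.975 quantile.
LSD <- qt(0.975, df) * sqrt(MSE * (1 / n + 1 / n))
round(LSD, 2)   # 3.07
```

Any two sample means that differ by more than this amount give a 95% confidence interval for the pairwise difference that excludes zero.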
