ST 516 Experimental Statistics for Engineers II

Model Adequacy

Usual residual plots:
• Residuals versus predicted (fitted) values;
• Probability plot (q-q plot) of residuals;
• Residuals by treatment level.
And for blocks:
• Residuals by block.

R command for the usual plots:

plot(aov(Yield ~ factor(Batch) + factor(Pressure), graftLong))
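plot() on the aov fit draws the four panels shown on the following slides. Residuals by treatment level and by block can also be plotted directly from residuals(). Here is a minimal base-R sketch. The data frame d and its simulated yields are hypothetical stand-ins for graftLong, whose values are not listed in these notes.

```r
# Hypothetical RCBD data, simulated for illustration (not the graft data):
# 6 batches (blocks) x 4 pressures (treatments), one run per cell
set.seed(516)
d <- expand.grid(Batch = factor(1:6),
                 Pressure = factor(c(8500, 8700, 8900, 9100)))
batch_effect <- rnorm(6, sd = 3)
d$Yield <- 90 + batch_effect[as.integer(d$Batch)] -
  2 * as.integer(d$Pressure) + rnorm(nrow(d), sd = 2)

fit <- aov(Yield ~ Batch + Pressure, data = d)
r <- residuals(fit)

# The four standard panels drawn by plot() on an aov/lm fit
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)

# Residuals by treatment level and by block, drawn directly
op <- par(mfrow = c(1, 2))
stripchart(split(r, d$Pressure), vertical = TRUE,
           xlab = "Pressure", ylab = "Residual")
abline(h = 0, lty = 2)
stripchart(split(r, d$Batch), vertical = TRUE,
           xlab = "Batch", ylab = "Residual")
abline(h = 0, lty = 2)
par(op)
```

For the additive model, the residuals within each treatment and within each block sum to zero. These two plots are therefore mainly useful for spotting a level with unusually large spread or an outlier.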

1 / 26 Blocked Designs Model Adequacy

[Figure: Residuals vs Fitted for aov(Yield ~ factor(Batch) + factor(Pressure)). x-axis: Fitted values; y-axis: Residuals. Labeled points: 1.3, 4.2, 1.2.]

[Figure: Normal Q-Q plot for aov(Yield ~ factor(Batch) + factor(Pressure)). x-axis: Theoretical Quantiles; y-axis: Standardized residuals. Labeled points: 1.3, 4.2, 1.2.]

[Figure: Scale-Location plot for aov(Yield ~ factor(Batch) + factor(Pressure)). x-axis: Fitted values; y-axis: √|Standardized residuals|. Labeled points: 1.3, 4.2, 1.2.]

[Figure: Constant Leverage: Residuals vs Factor Levels for aov(Yield ~ factor(Batch) + factor(Pressure)). x-axis: Factor Level Combinations (factor(Batch) levels, in the order 2 3 5 4 1 6); y-axis: Standardized residuals. Labeled points: 1.3, 4.2, 1.2.]

General Regression Approach

Any design may be analyzed on the basis of its statistical model. For example, for the RCBD:

$$y_{i,j} = \mu + \tau_i + \beta_j + \epsilon_{i,j}, \qquad i = 1, 2, \dots, a;\ j = 1, 2, \dots, b.$$

Parameter estimates: use least squares.
• For a balanced design, least squares gives the usual estimates.
• For an unbalanced design (e.g., with missing values), it still gives optimal (best linear unbiased) estimates.
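This sketch illustrates the claim that least squares reproduces the usual estimates in a balanced RCBD. The R code below fits the model with lm() using sum-to-zero contrasts, which is an assumption: the constraints $\sum_i \tau_i = 0$ and $\sum_j \beta_j = 0$ are not stated on the slide. It uses hypothetical simulated data and compares $\hat\tau_i$ with $\bar y_{i\cdot} - \bar y_{\cdot\cdot}$.

```r
# Hypothetical balanced RCBD (simulated): a = 4 treatments, b = 6 blocks
set.seed(1)
a <- 4
b <- 6
d <- expand.grid(trt = factor(1:a), block = factor(1:b))
d$y <- 50 + 2 * as.integer(d$trt) + rnorm(b)[as.integer(d$block)] +
  rnorm(a * b)

# Least squares with sum-to-zero constraints on the tau and beta effects
fit <- lm(y ~ trt + block, data = d,
          contrasts = list(trt = "contr.sum", block = "contr.sum"))

mu_hat  <- unname(coef(fit)["(Intercept)"])
tau_hat <- unname(coef(fit)[paste0("trt", 1:(a - 1))])
tau_hat <- c(tau_hat, -sum(tau_hat))   # tau_a follows from sum(tau) = 0

# The "usual" estimates: treatment means minus the grand mean
usual_tau <- as.vector(tapply(d$y, d$trt, mean)) - mean(d$y)
round(rbind(least_squares = tau_hat, usual = usual_tau), 4)
```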

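General Regression Significance Test

For any fitted model, write

$$\sum y^2 = R(\text{parameters}) + SS_E,$$

where the sum runs over all observations (the uncorrected total sum of squares). For example, for the RCBD:

$$\sum_{i=1}^{a} \sum_{j=1}^{b} y_{i,j}^2 = R(\mu, \tau, \beta) + SS_E.$$

$R(\mu, \tau, \beta)$ is the “sum of squares explained by $\mu, \tau_1, \dots, \tau_a$, and $\beta_1, \dots, \beta_b$”.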
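Least-squares fitted values are orthogonal to the residuals. As a result, $R(\text{parameters})$ is simply the sum of the squared fitted values. Here is a quick numerical check on simulated (hypothetical) RCBD data:

```r
# Hypothetical RCBD data (simulated): 4 treatments x 5 blocks
set.seed(2)
d <- expand.grid(trt = factor(1:4), block = factor(1:5))
d$y <- 20 + as.integer(d$trt) + rnorm(nrow(d))

fit <- lm(y ~ trt + block, data = d)

total  <- sum(d$y^2)             # uncorrected total: sum of y_ij^2 over all cells
R_full <- sum(fitted(fit)^2)     # R(mu, tau, beta): sum of squared fitted values
SSE    <- sum(residuals(fit)^2)  # error sum of squares
c(total = total, R_plus_SSE = R_full + SSE)
```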

A hypothesis such as $H_0: \tau_1 = \tau_2 = \cdots = \tau_a = 0$ specifies a reduced model with those parameters omitted (but still containing $\mu$ and $\beta_1, \dots, \beta_b$). The “sum of squares associated with those parameters” is the difference between the explained sums of squares of the full and reduced models. For example,

$$R(\tau \mid \mu, \beta) = R(\mu, \tau, \beta) - R(\mu, \beta)$$

is “the sum of squares associated with $\tau_1, \dots, \tau_a$”.
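In R, $R(\tau \mid \mu, \beta)$ can be computed as the difference in explained sums of squares. Equivalently, because $\sum y^2$ is the same for both models, it is the increase in $SS_E$ when the $\tau_i$ are dropped. anova(reduced, full) reports the same quantity as "Sum of Sq". Here is a minimal sketch on simulated (hypothetical) RCBD data:

```r
# Hypothetical RCBD data (simulated): 4 treatments x 5 blocks
set.seed(3)
d <- expand.grid(trt = factor(1:4), block = factor(1:5))
d$y <- 20 + as.integer(d$trt) + rnorm(5)[as.integer(d$block)] +
  rnorm(nrow(d))

full    <- lm(y ~ block + trt, data = d)  # mu, tau, beta
reduced <- lm(y ~ block, data = d)        # H0: tau_1 = ... = tau_a = 0

# R(tau | mu, beta) = R(mu, tau, beta) - R(mu, beta)
R_tau <- sum(fitted(full)^2) - sum(fitted(reduced)^2)

# Equivalent forms: the drop in SSE, and the extra-sum-of-squares comparison
drop_in_SSE <- deviance(reduced) - deviance(full)
cmp <- anova(reduced, full)
cmp
```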

In a balanced design, $R(\tau \mid \mu, \beta)$ is exactly $SS_{\text{Treatments}}$, and $R(\beta \mid \mu, \tau)$ is exactly $SS_{\text{Blocks}}$. With unbalanced data (e.g., a balanced design with missing data), these equalities no longer hold in general, but $R(\tau \mid \mu, \beta)$ is still the correct numerator sum of squares for an F-statistic testing $H_0: \tau_1 = \tau_2 = \cdots = \tau_a = 0$. So in either case (balanced or unbalanced), $R(\tau \mid \mu, \beta)$ is the correct numerator.
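The sketch below uses simulated (hypothetical) data, with one run dropped to make the RCBD unbalanced. In R's sequential ANOVA, the treatment line equals $R(\tau \mid \mu, \beta)$ only when blocks are listed first. With treatments listed first, it is the unadjusted treatment sum of squares.

```r
# Hypothetical RCBD (simulated) with one observation removed -> unbalanced
set.seed(4)
d <- expand.grid(trt = factor(1:4), block = factor(1:6))
d$y <- 30 + 1.5 * as.integer(d$trt) +
  rnorm(6, sd = 2)[as.integer(d$block)] + rnorm(nrow(d))
d <- d[-5, ]  # drop one run (hypothetical missing value)

# Sequential (Type I) sums of squares depend on the order of the terms
blocks_first <- anova(lm(y ~ block + trt, data = d))
trt_first    <- anova(lm(y ~ trt + block, data = d))

# R(tau | mu, beta), computed directly from the two models
R_tau <- deviance(lm(y ~ block, data = d)) -
  deviance(lm(y ~ block + trt, data = d))

c(blocks_first = blocks_first["trt", "Sum Sq"],
  trt_first    = trt_first["trt", "Sum Sq"],
  R_tau        = R_tau)
```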

Missing Observation

Suppose the yield for 8700 psi is missing in batch 4. R commands:

graftCopy <- graftLong
graftCopy["2.4", "Yield"] <- NA
summary(aov(Yield ~ factor(Batch) + factor(Pressure), graftCopy))

Output

                 Df Sum Sq Mean Sq F value   Pr(>F)
factor(Batch)     5 190.12  38.024  5.2346 0.006448 **
factor(Pressure)  3 163.40  54.466  7.4981 0.003130 **
Residuals        14 101.70   7.264
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
1 observation deleted due to missingness

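The order matters! List Blocks before Treatments. summary() reports sequential sums of squares, each adjusted only for the terms listed before it. With Batch listed first, the factor(Pressure) line is therefore $R(\tau \mid \mu, \beta)$, the correct numerator for testing Pressure.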
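To avoid depending on the order of the terms, you can call drop1() on an lm fit. For each term, it reports the increase in $SS_E$ when that term alone is dropped from the full model, i.e. $R(\tau \mid \mu, \beta)$ and $R(\beta \mid \mu, \tau)$. Here is a sketch on simulated (hypothetical) unbalanced data:

```r
# Hypothetical unbalanced RCBD (simulated), as after losing one observation
set.seed(5)
d <- expand.grid(trt = factor(1:4), block = factor(1:6))
d$y <- 30 + 1.5 * as.integer(d$trt) +
  rnorm(6, sd = 2)[as.integer(d$block)] + rnorm(nrow(d))
d <- d[-10, ]

fit <- lm(y ~ trt + block, data = d)   # treatments deliberately listed first

# drop1() refits without each term in turn, so each row is adjusted for the
# other term regardless of the order in the formula
adj <- drop1(fit, test = "F")
adj

R_tau  <- deviance(lm(y ~ block, data = d)) - deviance(fit)  # R(tau | mu, beta)
R_beta <- deviance(lm(y ~ trt, data = d)) - deviance(fit)    # R(beta | mu, tau)
```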

Montgomery describes an approximate method for handling a missing data point: essentially, replace it by its predicted (“imputed”) value. In R, the predict() method gives the imputed value. R commands:

p <- predict(aov(Yield ~ factor(Batch) + factor(Pressure), graftCopy),
             newdata = graftCopy["2.4", ])
# p:
#   2.4
# 91.08
graftCopy["2.4", "Yield"] <- p
summary(aov(Yield ~ factor(Batch) + factor(Pressure), graftCopy))

Output

                 Df Sum Sq Mean Sq F value  Pr(>F)
factor(Batch)     5  189.5   37.90   5.591 0.00419 **
factor(Pressure)  3  166.1   55.38   8.169 0.00185 **
Residuals        15  101.7    6.78
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
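The imputed-value table has the same residual sum of squares (101.7) as the exact analysis, but one more residual degree of freedom. A small sketch on simulated (hypothetical) data, not the graft data, shows this is not a coincidence. The imputed value lies exactly on the fitted surface, so it adds nothing to $SS_E$, yet R still counts it as an observation.

```r
# Hypothetical RCBD (simulated): 6 batches x 4 pressures, one cell missing
set.seed(6)
d <- expand.grid(Batch = factor(1:6), Pressure = factor(1:4))
d$Yield <- 90 + rnorm(6, sd = 3)[as.integer(d$Batch)] -
  as.integer(d$Pressure) + rnorm(nrow(d))
d$Yield[10] <- NA

# Exact analysis: the incomplete row is dropped (23 runs)
fit_na <- aov(Yield ~ Batch + Pressure, data = d)

# Imputation: replace the missing value by its prediction, then refit (24 "runs")
d_imp <- d
d_imp$Yield[10] <- predict(fit_na, newdata = d[10, ])
fit_imp <- aov(Yield ~ Batch + Pressure, data = d_imp)

c(SSE_exact = deviance(fit_na), SSE_imputed = deviance(fit_imp),
  df_exact = fit_na$df.residual, df_imputed = fit_imp$df.residual)
```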

But note: the degrees of freedom for Residuals should be 14, not 15. The imputed value is not a real observation (only 23 yields were observed), so the table must be adjusted. By hand:

                 Df Sum Sq Mean Sq F value  Pr(>F)
factor(Batch)     5  189.5   37.90   5.218 0.00653
factor(Pressure)  3  166.1   55.38   7.624 0.00292
Residuals        14  101.7   7.264
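The by-hand adjustment only changes the denominator: $MS_E = 101.7/14$, and each F value and p-value is recomputed from it. The sketch below uses the rounded sums of squares printed in the imputed-value table, so the last digit of the F values can differ slightly from the adjusted table above.

```r
# Values copied from the imputed-value ANOVA table (rounded as printed)
ss       <- c(Batch = 189.5, Pressure = 166.1)
df_terms <- c(Batch = 5, Pressure = 3)
sse      <- 101.7
df_resid <- 15 - 1   # R's residual df for 24 "runs", minus 1 for the imputed value

mse   <- sse / df_resid
F_val <- (ss / df_terms) / mse
p_val <- pf(F_val, df_terms, df_resid, lower.tail = FALSE)
round(cbind(F = F_val, p = p_val), 5)
```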

Adjusting residual df using R:

a <- aov(Yield ~ factor(Batch) + factor(Pressure), graftCopy)
a$df.residual <- a$df.residual - 1   # one df lost to the imputed value
summary(a)

Output

                 Df Sum Sq Mean Sq F value  Pr(>F)
factor(Batch)     5  189.5   37.90   5.218 0.00653 **
factor(Pressure)  3  166.1   55.38   7.624 0.00292 **
Residuals        14  101.7    7.26
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

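Compare with the correct table above (the exact analysis with the missing observation omitted). The F values are close (5.218 vs. 5.2346 for Batch, 7.624 vs. 7.4981 for Pressure), but this is still only an approximation; for example, the Pressure sum of squares is 166.1 instead of 163.40. Moral: do it the correct way!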

Latin Squares

The RCBD allows you to remove variability due to one controllable nuisance factor. More complex designs are needed for more than one controllable nuisance factor: Latin Square for two nuisance factors; Graeco-Latin Square for three nuisance factors.

A Latin Square design has one treatment factor and two nuisance factors:
• Each factor has the same number p of levels.
• All $p^2$ combinations of levels of the nuisance factors are run.
• Treatment assignments are balanced across both nuisance factors: each treatment appears exactly once in each row and once in each column.

Examples (row = first nuisance factor, column = second nuisance factor, letter = level of treatment):

4 × 4:
A B D C
B C A D
C D B A
D A C B

5 × 5:
A D B E C
D A C B E
C B E D A
B E A C D
E C D A B
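The defining property (each letter exactly once in every row and every column) is easy to check mechanically. In this minimal base-R sketch, is_latin_square() checks the two example squares above. cyclic_square() and random_square() are hypothetical helper names: they build a square by the cyclic construction and then permute rows, columns and letter labels. Randomizing this way is a common practical choice, but it does not in general sample uniformly from all Latin squares of order p.

```r
# TRUE if m is a p x p matrix in which every row and every column
# contains each of p symbols exactly once
is_latin_square <- function(m) {
  p <- nrow(m)
  if (ncol(m) != p || length(unique(as.vector(m))) != p) return(FALSE)
  rows_ok <- all(apply(m, 1, function(v) anyDuplicated(v) == 0))
  cols_ok <- all(apply(m, 2, function(v) anyDuplicated(v) == 0))
  rows_ok && cols_ok
}

# The two example squares, entered row by row
sq4 <- matrix(strsplit("ABDCBCADCDBADACB", "")[[1]], nrow = 4, byrow = TRUE)
sq5 <- matrix(strsplit("ADBECDACBECBEDABEACDECDAB", "")[[1]],
              nrow = 5, byrow = TRUE)

# Cyclic square: cell (i, k) gets letter number ((i - 1) + (k - 1)) mod p, plus 1
cyclic_square <- function(p) {
  idx <- outer(0:(p - 1), 0:(p - 1), "+") %% p
  matrix(LETTERS[as.vector(idx) + 1], nrow = p)
}

# Randomize by permuting rows, columns and letter labels
random_square <- function(p) {
  sq <- cyclic_square(p)[sample(p), sample(p)]
  relabel <- setNames(sample(LETTERS[1:p]), LETTERS[1:p])
  matrix(relabel[as.vector(sq)], nrow = p)
}

set.seed(7)
design <- random_square(5)
print(design)
```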

E.g., Rocket Propellant (rocket-propellant.txt):

Batch Operator Formulation BurningRate
    1        1           a          24
    1        2           b          20
    1        3           c          19
    1        4           d          24
    1        5           e          24
    2        1           b          17
  . . .

R commands

rocketPropellant <- read.table("data/rocket-propellant.txt", header = TRUE)
summary(aov(BurningRate ~ factor(Batch) + factor(Operator) + Formulation,
            rocketPropellant))

Output

                 Df Sum Sq Mean Sq F value   Pr(>F)
factor(Batch)     4     68  17.000  1.5937 0.239059
factor(Operator)  4    150  37.500  3.5156 0.040373 *
Formulation       4    330  82.500  7.7344 0.002537 **
Residuals        12    128  10.667
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
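The Table 4.8 layout shown in the R note on reshape() lists all 25 burning rates, so this ANOVA can be checked without the data file. Here is a minimal sketch that types the values in directly; the object names rate, form and rocket are our own, hypothetical choices.

```r
# Burning rates from the Table 4.8 layout (rows = batches, columns = operators)
rate <- matrix(c(24, 20, 19, 24, 24,
                 17, 24, 30, 27, 36,
                 18, 38, 26, 27, 21,
                 26, 31, 26, 23, 22,
                 22, 30, 20, 29, 31), nrow = 5, byrow = TRUE)
# Formulation (treatment letter) used in each (batch, operator) cell
form <- matrix(c("A", "B", "C", "D", "E",
                 "B", "C", "D", "E", "A",
                 "C", "D", "E", "A", "B",
                 "D", "E", "A", "B", "C",
                 "E", "A", "B", "C", "D"), nrow = 5, byrow = TRUE)

# Long format: one row per cell (column-major order throughout)
rocket <- data.frame(
  Batch       = factor(as.vector(row(rate))),
  Operator    = factor(as.vector(col(rate))),
  Formulation = factor(as.vector(form)),
  BurningRate = as.vector(rate)
)

fit <- aov(BurningRate ~ Batch + Operator + Formulation, data = rocket)
tab <- summary(fit)[[1]]
print(tab)
```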

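Model equation:

$$y_{i,j,k} = \mu + \alpha_i + \tau_j + \beta_k + \epsilon_{i,j,k}, \qquad 1 \le i \le p,\ 1 \le j \le p,\ 1 \le k \le p.$$

Here $\alpha_i$ = $i$th row effect, $\beta_k$ = $k$th column effect, $\tau_j$ = $j$th treatment effect. But we have only one observation in the $(i, k)$ cell of the table; the corresponding value of $j$ can be read from the design. Only $p^2$ of the $p^3$ index combinations $(i, j, k)$ are therefore observed.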
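For a Latin square, the least-squares estimates are simple deviations of marginal means from the grand mean, e.g. $\hat\alpha_i = \bar y_{i\cdot\cdot} - \bar y_{\cdot\cdot\cdot}$. This assumes the usual sum-to-zero constraints, which are not stated on the slide. The sketch below computes these estimates by hand from the Table 4.8 values, reading $j$ from the design in each $(i, k)$ cell. It then recovers the ANOVA sums of squares, e.g. $SS_{\text{Batches}} = p\sum_i \hat\alpha_i^2$.

```r
p <- 5
rate <- matrix(c(24, 20, 19, 24, 24,
                 17, 24, 30, 27, 36,
                 18, 38, 26, 27, 21,
                 26, 31, 26, 23, 22,
                 22, 30, 20, 29, 31), nrow = p, byrow = TRUE)
form <- matrix(c("A", "B", "C", "D", "E",
                 "B", "C", "D", "E", "A",
                 "C", "D", "E", "A", "B",
                 "D", "E", "A", "B", "C",
                 "E", "A", "B", "C", "D"), nrow = p, byrow = TRUE)

y     <- as.vector(rate)
i_idx <- as.vector(row(rate))   # row (batch) index i
k_idx <- as.vector(col(rate))   # column (operator) index k
j_lab <- as.vector(form)        # treatment j, read from the design

grand     <- mean(y)
alpha_hat <- rowMeans(rate) - grand   # row effects
beta_hat  <- colMeans(rate) - grand   # column effects
tau_hat   <- sapply(LETTERS[1:p], function(L) mean(y[j_lab == L])) - grand

fitted_vals <- grand + alpha_hat[i_idx] + tau_hat[j_lab] + beta_hat[k_idx]
ss <- c(Batch       = p * sum(alpha_hat^2),
        Operator    = p * sum(beta_hat^2),
        Formulation = p * sum(tau_hat^2),
        Residuals   = sum((y - fitted_vals)^2))
print(ss)
```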

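Random Effects

If Batch is a random effect with variance $\sigma^2_{\text{Batches}}$, then the expected mean square is

$$E(MS_{\text{Batches}}) = \sigma^2 + p\,\sigma^2_{\text{Batches}}.$$

So we estimate $\sigma^2_{\text{Batches}}$ by

$$\hat\sigma^2_{\text{Batches}} = \frac{MS_{\text{Batches}} - MS_{\text{Residuals}}}{p} = \frac{17.000 - 10.667}{5} = 1.267.$$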
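The same moment estimate as a small function, taking the mean squares as Sum Sq / Df from the ANOVA table. Truncating a negative estimate at zero is a common convention and an assumption here; the slide does not discuss it.

```r
# ANOVA (method-of-moments) estimate of the batch variance component
# in a p x p Latin square, from E(MS_Batches) = sigma^2 + p * sigma^2_Batches
batch_variance_estimate <- function(ms_batches, ms_residuals, p) {
  est <- (ms_batches - ms_residuals) / p
  max(est, 0)  # common convention (an assumption here): report 0 if negative
}

ms_batches   <- 68 / 4     # Sum Sq / Df for factor(Batch)
ms_residuals <- 128 / 12   # Sum Sq / Df for Residuals
sigma2_batches <- batch_variance_estimate(ms_batches, ms_residuals, p = 5)
round(sigma2_batches, 3)
```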

R note: using reshape()

The data file rocket-propellant.txt has a different layout from Table 4.8 in the text. table-04-8.txt has essentially the same layout as Table 4.8 (Fn = formulation used by operator n, Rn = the corresponding burning rate):

Batch F1 R1 F2 R2 F3 R3 F4 R4 F5 R5
    1  A 24  B 20  C 19  D 24  E 24
    2  B 17  C 24  D 30  E 27  A 36
    3  C 18  D 38  E 26  A 27  B 21
    4  D 26  E 31  A 26  B 23  C 22
    5  E 22  A 30  B 20  C 29  D 31

Note that we need to reshape this with all the Fn columns stacked into a single Formulation column, and with all the Rn columns stacked into a single BurningRate column. Read it and reshape it like this:

a <- read.table("data/table-04-8.txt", header = TRUE)
rocketPropellant <- reshape(a,
  varying = list(c("F1", "F2", "F3", "F4", "F5"),
                 c("R1", "R2", "R3", "R4", "R5")),
  v.names = c("Formulation", "BurningRate"),
  timevar = "Operator", direction = "long")

Note that varying is now a list of two vectors of variable names, one per stacked column. v.names is a vector of names for the stacked columns, in the same order.
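To see what reshape() produces without the data file, the sketch below types in the wide layout of Table 4.8 as a data frame and runs the same reshape() call. The values are those listed above. Building the data frame inline instead of calling read.table() is our assumption, made so the sketch is self-contained.

```r
# Wide layout of Table 4.8, typed in instead of read from data/table-04-8.txt
a <- data.frame(
  Batch = 1:5,
  F1 = c("A", "B", "C", "D", "E"), R1 = c(24, 17, 18, 26, 22),
  F2 = c("B", "C", "D", "E", "A"), R2 = c(20, 24, 38, 31, 30),
  F3 = c("C", "D", "E", "A", "B"), R3 = c(19, 30, 26, 26, 20),
  F4 = c("D", "E", "A", "B", "C"), R4 = c(24, 27, 27, 23, 29),
  F5 = c("E", "A", "B", "C", "D"), R5 = c(24, 36, 21, 22, 31),
  stringsAsFactors = FALSE
)

# Stack F1..F5 into Formulation and R1..R5 into BurningRate;
# the position 1..5 within each vector becomes the Operator column
rocketPropellant <- reshape(a,
  varying = list(c("F1", "F2", "F3", "F4", "F5"),
                 c("R1", "R2", "R3", "R4", "R5")),
  v.names = c("Formulation", "BurningRate"),
  timevar = "Operator", direction = "long")

head(rocketPropellant)
```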

Two nuisance factors or one?

A different design for the rocket propellant experiment, also with 25 runs, is:

Batch Operator Formulation
    1        1           a
    1        1           b
    1        1           c
    1        1           d
    1        1           e
    2        2           a
    2        2           b
  . . .

This is a randomized complete block design (RCBD): in each block, a single operator tests every formulation using a single batch of raw material. If we observe significant block effects, we cannot distinguish operator effects from batch effects. But we get $(5-1)(5-1) = 16$ degrees of freedom for Residuals, instead of $(5-1)(5-2) = 12$ for the Latin Square, and hence a more powerful test for Formulations.
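The degrees-of-freedom comparison can be written out directly. The sketch below builds the alternative layout, assuming that block i pairs batch i with operator i; only the first rows are shown above. It confirms that Batch and Operator are completely confounded in this layout, and compares the 5% critical values of F for Formulations under the two designs.

```r
p <- 5   # batches = operators = formulations = 5; 25 runs in both designs

# Hypothetical full layout of the alternative design: block i uses batch i
# and operator i, and tests all p formulations
rcbd <- data.frame(Batch       = rep(1:p, each = p),
                   Operator    = rep(1:p, each = p),
                   Formulation = rep(letters[1:p], times = p))

# Residual degrees of freedom: total (p^2 - 1) minus the fitted effects
resid_df_latin <- (p^2 - 1) - 3 * (p - 1)   # rows, columns, treatments
resid_df_rcbd  <- (p^2 - 1) - 2 * (p - 1)   # blocks, treatments

# 5% critical values for the Formulation F test under each design
crit_latin <- qf(0.95, df1 = p - 1, df2 = resid_df_latin)
crit_rcbd  <- qf(0.95, df1 = p - 1, df2 = resid_df_rcbd)
c(latin = crit_latin, rcbd = crit_rcbd)
```

The smaller critical value is one source of the extra power. The other is the error variance, which depends on how well each design's blocking removes the batch and operator variation.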

So we do not need to use the Latin Square design just because we have two controllable nuisance factors. If the primary focus is on Formulations, the RCBD is better. But the RCBD only allows us to estimate

$$\sigma^2_\beta = \sigma^2_{\text{Operators}} + \sigma^2_{\text{Batches}}.$$

The Latin Square allows us to estimate $\sigma^2_{\text{Operators}}$ and $\sigma^2_{\text{Batches}}$ separately.
