SLIDE 1

ST 516 Experimental Statistics for Engineers II

Nonnormal Responses

We have usually assumed that experimental data are at least approximately normally distributed, with at least approximately constant variance. When either assumption is violated, we can try transforming the response to remove the violation, or using another model for the response distribution.

1 / 10 Other Topics Nonnormal Responses

SLIDE 2

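Box-Cox approach

The power transformations $y^* = y^\lambda$ are useful. Box and Cox developed a systematic approach to finding a good $\lambda$, based on

$$
y^{(\lambda)} =
\begin{cases}
\dfrac{y^\lambda - 1}{\lambda \, \dot{y}^{\lambda - 1}}, & \lambda \neq 0, \\[1ex]
\dot{y} \ln y, & \lambda = 0,
\end{cases}
$$

where $\dot{y} = \exp\left( \frac{1}{n} \sum \ln y \right)$ is the geometric mean response.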
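The scaled transformation can be written as a small R function. This is a minimal sketch: the name `box_cox`, the tolerance used to detect $\lambda = 0$, and the example vector `y` are illustrative choices, not part of the course material.

```r
# Scaled Box-Cox transformation y^(lambda), following the definition above.
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))              # the transformation needs positive responses
  gm <- exp(mean(log(y)))            # geometric mean of the responses
  if (abs(lambda) < 1e-8) {
    gm * log(y)                      # limiting case lambda = 0
  } else {
    (y^lambda - 1) / (lambda * gm^(lambda - 1))
  }
}

y <- c(1, 2, 4, 8)                   # made-up positive responses
box_cox(y, 1)                        # lambda = 1 reduces to y - 1
box_cox(y, 0)                        # lambda = 0 gives gm * log(y)
```

Dividing by $\dot{y}^{\lambda - 1}$ keeps the transformed responses on a comparable scale, so that SSE values for different $\lambda$ can be compared directly.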

SLIDE 3

Procedure

Fit model for various $\lambda$, and graph SSE against $\lambda$. Lowest SSE gives best $\lambda$. All $\lambda$ with $\mathrm{SSE}(\lambda) \le SS^*$ comprise a $100(1 - \alpha)\%$ confidence interval, where

$$
SS^* = \mathrm{SSE}(\lambda_{\mathrm{opt}}) \left( 1 + \frac{t^2_{\alpha/2,\, df_E}}{df_E} \right).
$$

Example

Peak discharge data (peak-discharge.txt), with R script peak-discharge-box-cox.R.
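The procedure can be sketched in R as follows. The peak discharge data are not reproduced here, so this example uses simulated data in their place; the variable names, the grid of $\lambda$ values, and the simple one-predictor model are all assumptions made for illustration.

```r
set.seed(1)
# Simulated stand-in data (NOT the peak discharge data): log(y) is linear in x.
x <- seq(1, 10, length.out = 30)
y <- exp(0.5 + 0.3 * x + rnorm(30, sd = 0.2))

# Scaled Box-Cox transformation (hypothetical helper name).
box_cox <- function(y, lambda) {
  gm <- exp(mean(log(y)))                      # geometric mean response
  if (abs(lambda) < 1e-8) gm * log(y)
  else (y^lambda - 1) / (lambda * gm^(lambda - 1))
}

# Fit the model for each lambda on a grid and record the SSE.
lambdas <- seq(-1, 1, by = 0.05)
sse <- sapply(lambdas, function(l) {
  fit <- lm(box_cox(y, l) ~ x)
  sum(resid(fit)^2)
})

lambda_opt <- lambdas[which.min(sse)]          # lowest SSE gives the best lambda
df_e  <- length(y) - 2                         # error df: n minus (intercept, slope)
alpha <- 0.05
ss_star <- min(sse) * (1 + qt(1 - alpha / 2, df_e)^2 / df_e)

# Grid values with SSE at or below SS*; range() assumes they form one interval.
ci <- range(lambdas[sse <= ss_star])

plot(lambdas, sse, type = "l", xlab = "lambda", ylab = "SSE")
abline(h = ss_star, lty = 2)                   # cutoff SS*
```

The interval found this way is only as fine as the grid; a smaller step in `lambdas` gives sharper endpoints.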

SLIDE 4

Generalized Linear Model

Sometimes a better approach is to use a different statistical model. E.g., for counted data, assume that $Y$ has the Poisson distribution. Replace the linear model

$$
E(Y) = \mu = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k = x'\beta
$$

by

$$
g(\mu) = x'\beta \iff E(Y) = \mu = g^{-1}(x'\beta)
$$

for some nonlinear link function $g(\cdot)$.
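As a minimal sketch of the Poisson case, the following fits a GLM with a log link to simulated counts. The data and the coefficient values 0.5 and 0.8 are invented for illustration.

```r
set.seed(2)
# Simulated counts (hypothetical data): log(mu) = 0.5 + 0.8 * x
x <- runif(200, 0, 2)
y <- rpois(200, lambda = exp(0.5 + 0.8 * x))

fit <- glm(y ~ x, family = poisson(link = "log"))
coef(fit)                                # estimates of beta0 and beta1

# mu = g^{-1}(x'beta): for the log link, g^{-1} is exp
eta <- predict(fit, type = "link")       # x'beta for each observation
mu  <- predict(fit, type = "response")   # fitted means
all.equal(mu, exp(eta))
```

The coefficients are estimated on the link scale, so fitted means are obtained by applying the inverse link to the linear predictor.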

SLIDE 5

If the distribution is in the exponential family and the link function is chosen to match it, estimation by maximum likelihood is relatively easy. In general, the variance of $Y$ also depends on $\mu$; examples from the exponential family:

Distribution             $g(\mu)$                   $V(\mu)$
Normal, $\sigma^2 = 1$   $\mu$                      $1$
Poisson                  $\log \mu$                 $\mu$
Gamma                    $1/\mu$                    $\mu^2$
Inverse Gaussian         $1/\mu^2$                  $\mu^3$
Binomial                 $\log \frac{\mu}{1-\mu}$   $\mu(1 - \mu)$
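The pairs in this table correspond to the default links and variance functions of R's family objects, which can be inspected directly. A short sketch, with `mu` as an arbitrary set of example means:

```r
mu <- c(0.2, 0.5, 0.8)                  # arbitrary example means in (0, 1)

fams <- list(gaussian(), poisson(), Gamma(), inverse.gaussian(), binomial())
for (f in fams) {
  cat(f$family, ": default link =", f$link, "\n")
}

# Variance functions V(mu) evaluated at the example means
gaussian()$variance(mu)                 # constant 1
poisson()$variance(mu)                  # mu
Gamma()$variance(mu)                    # mu^2
inverse.gaussian()$variance(mu)         # mu^3
binomial()$variance(mu)                 # mu * (1 - mu)
```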

SLIDE 6

Other combinations of distribution, $g(\cdot)$, and $V(\cdot)$ may also be used, but are not supported by standard software.

The binomial case is widely used:

$$
P(Y = 1) = \frac{e^{x'\beta}}{1 + e^{x'\beta}} = \frac{1}{1 + e^{-x'\beta}}.
$$

Example

Coupon redemption: $Y$ is the number of customers out of 1000 who redeem the coupon; three factors were used in a $2^3$ factorial design.
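The two forms of the binomial success probability are the same function, which R provides as `plogis()`, with `qlogis()` as the logit link itself. A quick numerical check, using arbitrary values of the linear predictor:

```r
eta <- c(-2, 0, 1.5)                 # arbitrary values of x'beta

p1 <- exp(eta) / (1 + exp(eta))      # first form
p2 <- 1 / (1 + exp(-eta))            # second form
p3 <- plogis(eta)                    # built-in inverse logit

all.equal(p1, p2)
all.equal(p1, p3)
qlogis(p3)                           # the logit link recovers eta
```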

SLIDE 7

R commands

Generalized linear models are fitted using glm():

    summary(glm(cbind(Redeemed, Customers - Redeemed) ~ A * B + A * C + B * C,
                data = coupon, family = "binomial"))

Output

    Call:
    glm(formula = cbind(Redeemed, Customers - Redeemed) ~ A * B + A * C + B * C,
        family = "binomial", data = coupon)

    Deviance Residuals:
          1        2        3        4        5        6        7        8
     0.4723  -0.4307  -0.4228   0.3949  -0.4572   0.4166   0.4238  -0.3987

SLIDE 8

Output, continued

    Coefficients:
                 Estimate Std. Error z value Pr(>|z|)
    (Intercept) -1.011545   0.025515 -39.645  < 2e-16 ***
    A            0.169208   0.025509   6.633 3.28e-11 ***
    B            0.169622   0.025515   6.648 2.97e-11 ***
    C            0.023317   0.025510   0.914    0.361
    A:B         -0.006285   0.025512  -0.246    0.805
    A:C         -0.002773   0.025432  -0.109    0.913
    B:C         -0.041020   0.025434  -1.613    0.107
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 93.0238  on 7  degrees of freedom
    Residual deviance:  1.4645  on 1  degrees of freedom
    AIC: 72.286

    Number of Fisher Scoring iterations: 3
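The coupon data file is not reproduced here, so the fit cannot be rerun exactly. The sketch below builds a hypothetical stand-in with the same column names (`A`, `B`, `C`, `Customers`, `Redeemed`) and fits the same model. The −1/+1 factor coding, the simulated counts, and the assumed true coefficients are all illustrative assumptions, so its numbers will not match the output above.

```r
set.seed(3)
# Hypothetical stand-in for the coupon data: a 2^3 design with factors coded
# -1/+1 (an assumption) and 1000 customers per run. Counts are simulated.
coupon <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
coupon$Customers <- 1000
eta <- -1 + 0.17 * coupon$A + 0.17 * coupon$B    # assumed true linear predictor
coupon$Redeemed <- rbinom(8, size = coupon$Customers, prob = plogis(eta))

# Main effects plus all two-factor interactions, as in the model above.
full <- glm(cbind(Redeemed, Customers - Redeemed) ~ A * B + A * C + B * C,
            data = coupon, family = binomial)
summary(full)
df.residual(full)    # 8 runs - 7 coefficients = 1 residual degree of freedom
```

The two-column response `cbind(successes, failures)` is how `glm()` accepts grouped binomial counts.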

SLIDE 9

Reduced model

The analyst decides to fit a reduced model including A, B, and BC (and, to keep it hierarchical, C):

    summary(glm(cbind(Redeemed, Customers - Redeemed) ~ A + B * C,
                data = coupon, family = "binomial"))

Output

    Call:
    glm(formula = cbind(Redeemed, Customers - Redeemed) ~ A + B * C,
        family = "binomial", data = coupon)

    Deviance Residuals:
          1        2        3        4        5        6        7        8
     0.3402  -0.3114  -0.3783   0.3531  -0.5142   0.4692   0.5509  -0.5171

SLIDE 10

Output, continued

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept) -1.01142    0.02551 -39.652  < 2e-16 ***
    A            0.16868    0.02542   6.635 3.25e-11 ***
    B            0.16912    0.02543   6.650 2.94e-11 ***
    C            0.02308    0.02543   0.908    0.364
    B:C         -0.04097    0.02543  -1.611    0.107
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 93.0238  on 7  degrees of freedom
    Residual deviance:  1.5360  on 3  degrees of freedom
    AIC: 68.358

    Number of Fisher Scoring iterations: 3
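One way to compare the reduced model with the full model is an analysis-of-deviance (likelihood ratio) test on the difference in residual deviances. The slides do not show this step, so treat it as a supplementary sketch. It uses only the residual deviances and degrees of freedom printed for the two fits.

```r
# Residual deviances and degrees of freedom copied from the two outputs
dev_full    <- 1.4645; df_full    <- 1
dev_reduced <- 1.5360; df_reduced <- 3

lr_stat <- dev_reduced - dev_full        # increase in deviance from dropping AB, AC
lr_df   <- df_reduced - df_full          # number of terms dropped
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

c(statistic = lr_stat, df = lr_df, p = p_value)
```

The statistic is 0.0715 on 2 degrees of freedom, giving a p-value of about 0.96, so dropping the AB and AC interactions costs essentially nothing in fit. This agrees with the lower AIC of the reduced model (68.358 versus 72.286). With both fitted model objects available, `anova(reduced, full, test = "Chisq")` would carry out the same comparison.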
