

Applied Statistics and Data Modeling
Part 3: Analysis of Variance - One way ANOVA

Luc Duchateau (1), Paul Janssen (2)
(1) Faculty of Veterinary Medicine, Ghent University, Belgium
(2) Center for Statistics, Hasselt University, Belgium
2020

L. Duchateau & P. Janssen (UH & UG), Applied Statistics and Data Modeling, 2020

Analysis of Variance Overview ANOVA

Overview

  • One way ANOVA
  • Two way ANOVA
  • ANOVA for blocked experiments

One way ANOVA Overview

Overview: One way ANOVA
  • Data
  • Models
  • Parameter estimation
  • Hypothesis testing
      - General
      - Contrast
      - Set of contrasts

One way ANOVA One way ANOVA: model

One way ANOVA

  • One way: one factor with more than 2 levels
  • ANOVA = ANalysis Of VAriance

One way ANOVA One way ANOVA: model

Example 1: Weight gain in 4 breeds of chicken

Breed 1   Breed 2   Breed 3   Breed 4
1.56      1.38      1.49      1.46
1.54      1.41      1.54      1.49
1.50      1.44      1.48      1.44
1.49      1.37      1.51      1.52
1.51      1.40      1.48      1.49
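The R code on later slides reads these data from chicken.csv, which is not reproduced here. As a self-contained stand-in, the sketch below types the 20 observations in directly. The column names `weight` and `breed` and the breed labels a-d are assumptions inferred from the R output (`breeda`, ..., `breedd`), not the confirmed layout of the file.

```r
# Stand-in for chicken.csv (assumed layout: one row per chicken,
# columns 'weight' and 'breed', breeds labelled a-d)
chicken <- data.frame(
  weight = c(1.56, 1.54, 1.50, 1.49, 1.51,   # breed a (Breed 1)
             1.38, 1.41, 1.44, 1.37, 1.40,   # breed b (Breed 2)
             1.49, 1.54, 1.48, 1.51, 1.48,   # breed c (Breed 3)
             1.46, 1.49, 1.44, 1.52, 1.49),  # breed d (Breed 4)
  breed  = factor(rep(c("a", "b", "c", "d"), each = 5))
)
str(chicken)

# Sample mean per breed
breed_means <- tapply(chicken$weight, chicken$breed, mean)
breed_means
```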

One way ANOVA One way ANOVA: model

Model specification: Cell means model

$$Y_{ij} = \mu_i + e_{ij}, \qquad i = 1, \ldots, a, \quad j = 1, \ldots, n_i, \qquad n_\cdot = \sum_{i=1}^{a} n_i$$

with
  • $Y_{ij}$: jth observation of breed i
  • $\mu_i$: population mean of breed i
  • $e_{ij}$: random error, independent and $N(0; \sigma^2)$

One way ANOVA One way ANOVA: model

Cell means model: Model assumptions

$Y_{ij} = \mu_i + e_{ij}$

Homogeneity of the variance
  $\mathrm{Var}(Y_{ij}) = \mathrm{Var}(e_{ij}) = \sigma^2$

Normality of the observations
  $Y_{ij}$ is normally distributed because $e_{ij}$ is normally distributed
  $E(Y_{ij}) = \mu_i$ because $E(e_{ij}) = 0$
  $\mathrm{Var}(Y_{ij}) = \sigma^2$
  $\Rightarrow Y_{ij} \sim N(\mu_i, \sigma^2)$

Independence of the observations
  The $Y_{ij}$'s are independent because the $e_{ij}$'s are independent

One way ANOVA One way ANOVA: model

Graphic representation of cell means model

One way ANOVA One way ANOVA: model

Normal distribution assumption

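Throughout the course we assume $Y \sim N(\mu, \sigma^2)$:

$$f_Y(y) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^2\right]$$

$$E(Y) = \int_{-\infty}^{+\infty} y\, f_Y(y)\, dy = \mu \qquad \mathrm{Var}(Y) = \int_{-\infty}^{+\infty} (y - \mu)^2 f_Y(y)\, dy = \sigma^2$$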
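The two integrals can be checked numerically. This is a sketch with assumed values µ = 1.5 and σ = 0.03: it integrates the density formula above with R's integrate() over µ ± 12σ (where essentially all of the probability mass lies) and compares the formula with the built-in dnorm().

```r
# Illustrative parameter values (assumed, roughly on the scale of Example 1)
mu    <- 1.5
sigma <- 0.03

# Normal density written out exactly as in the formula above
f_Y <- function(y) 1 / sqrt(2 * pi * sigma^2) * exp(-0.5 * ((y - mu) / sigma)^2)

# A finite range of mu +/- 12 sigma loses a negligible tail mass and avoids
# integrate() missing a narrow peak on (-Inf, Inf)
lo <- mu - 12 * sigma
hi <- mu + 12 * sigma
total <- integrate(f_Y, lo, hi, rel.tol = 1e-10)$value                         # ~ 1
EY    <- integrate(function(y) y * f_Y(y), lo, hi, rel.tol = 1e-10)$value      # ~ mu
VarY  <- integrate(function(y) (y - mu)^2 * f_Y(y), lo, hi, rel.tol = 1e-10)$value  # ~ sigma^2

c(total = total, EY = EY, VarY = VarY)
all.equal(f_Y(1.47), dnorm(1.47, mean = mu, sd = sigma))  # formula agrees with dnorm()
```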

One way ANOVA One way ANOVA: Estimation

Estimation using Least Squares (LS) technique

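Notation

$$Y_{i\cdot} = \sum_{j=1}^{n_i} Y_{ij}, \qquad \bar{Y}_{i\cdot} = \frac{\sum_{j=1}^{n_i} Y_{ij}}{n_i} = \frac{Y_{i\cdot}}{n_i}$$

$$Y_{\cdot\cdot} = \sum_{i=1}^{a}\sum_{j=1}^{n_i} Y_{ij}, \qquad \bar{Y}_{\cdot\cdot} = \frac{\sum_{i=1}^{a}\sum_{j=1}^{n_i} Y_{ij}}{n_\cdot} = \frac{Y_{\cdot\cdot}}{n_\cdot}$$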
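A minimal sketch of the dot notation in R, on the Example 1 data typed in by hand (column and level names assumed). Variable names such as `Y_i.` and `Ybar..` are illustrative, chosen only to mirror the notation.

```r
chicken <- data.frame(
  weight = c(1.56, 1.54, 1.50, 1.49, 1.51, 1.38, 1.41, 1.44, 1.37, 1.40,
             1.49, 1.54, 1.48, 1.51, 1.48, 1.46, 1.49, 1.44, 1.52, 1.49),
  breed  = factor(rep(c("a", "b", "c", "d"), each = 5))
)
y     <- chicken$weight
breed <- chicken$breed

Y_i.    <- tapply(y, breed, sum)      # Y_i.    : total of group i
n_i     <- tapply(y, breed, length)   # n_i     : size of group i
Ybar_i. <- Y_i. / n_i                 # Ybar_i. : mean of group i
Y..     <- sum(y)                     # Y..     : grand total
n.      <- sum(n_i)                   # n.      : total number of observations
Ybar..  <- Y.. / n.                   # Ybar..  : overall mean

rbind(total = Y_i., n = n_i, mean = Ybar_i.)
c(Y.. = Y.., n. = n., Ybar.. = Ybar..)
```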

One way ANOVA One way ANOVA: Estimation

LS technique for cell means model

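Estimate $\mu_i$ by minimising the LS criterion

$$Q = \sum_{i=1}^{a}\sum_{j=1}^{n_i} (Y_{ij} - \mu_i)^2$$

First take the partial derivative with respect to $\mu_i$ (only the terms of group $i$ depend on $\mu_i$):

$$\frac{\partial Q}{\partial \mu_i} = \sum_{j=1}^{n_i} (-2)(Y_{ij} - \mu_i)$$

Equate to 0:

$$-2\sum_{j=1}^{n_i} (Y_{ij} - \hat{\mu}_i) = 0 \;\Rightarrow\; \sum_{j=1}^{n_i} Y_{ij} = n_i\hat{\mu}_i$$

Parameter estimator for $\mu_i$: $\hat{\mu}_i = \bar{Y}_{i\cdot}$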
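As a numerical check of the derivation, one can minimise Q directly and compare the result with the breed means. Because $\mu_i$ only appears in the terms of group $i$, Q splits into one sum of squares per breed, so the sketch minimises each part separately with optimize(). The data are the Example 1 values typed in by hand (column and level names assumed).

```r
chicken <- data.frame(
  weight = c(1.56, 1.54, 1.50, 1.49, 1.51, 1.38, 1.41, 1.44, 1.37, 1.40,
             1.49, 1.54, 1.48, 1.51, 1.48, 1.46, 1.49, 1.44, 1.52, 1.49),
  breed  = factor(rep(c("a", "b", "c", "d"), each = 5))
)
groups <- split(chicken$weight, chicken$breed)

# Minimise the group-i part of Q, sum_j (Y_ij - m)^2, over m
ls_est <- sapply(groups, function(y) {
  optimize(function(m) sum((y - m)^2), interval = range(y), tol = 1e-10)$minimum
})
ybar <- tapply(chicken$weight, chicken$breed, mean)
rbind(least_squares = ls_est, sample_mean = ybar)

# The derivative dQ/dmu_i = -2 * sum_j (Y_ij - mu_i) vanishes at mu_i = Ybar_i.
dQ <- function(y, m) -2 * sum(y - m)
sapply(groups, function(y) dQ(y, mean(y)))
```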

One way ANOVA One way ANOVA: Estimation

The corresponding R code

setwd("c:/users/lduchate/docs/OC/onderwijs/ADEKUS2020/data")  # folder that contains chicken.csv
chicken <- read.table(file = "chicken.csv", header = TRUE, sep = ";",
                      stringsAsFactors = TRUE)  # read breed as a factor (default is FALSE since R 4.0.0)
cellmeans.chicken <- lm(weight ~ breed - 1, data = chicken)  # "- 1": no intercept, one mean per breed
summary(cellmeans.chicken)
##
## Call:
## lm(formula = weight ~ breed - 1, data = chicken)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -0.0400 -0.0200 -0.0050  0.0125  0.0400
##
## Coefficients:
##        Estimate Std. Error t value Pr(>|t|)
## breeda  1.52000    0.01265   120.2   <2e-16 ***
## breedb  1.40000    0.01265   110.7   <2e-16 ***
## breedc  1.50000    0.01265   118.6   <2e-16 ***
## breedd  1.48000    0.01265   117.0   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

One way ANOVA One way ANOVA: Estimation

Alternative model specification: Factor effects model

$$Y_{ij} = \mu + \alpha_i + e_{ij}, \qquad i = 1, \ldots, a, \quad j = 1, \ldots, n_i, \qquad n_\cdot = \sum_{i=1}^{a} n_i$$

with
  • $Y_{ij}$: jth observation of breed i
  • $\mu$: a constant, common to all observations
  • $\alpha_i$: a constant, the effect of the ith factor level
  • $e_{ij}$: the random error term, independent and $N(0, \sigma^2)$

One way ANOVA One way ANOVA: Estimation

Overparameterisation of factor effects model

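The cell means model has $a$ parameters $\mu_i$; the factor effects model has $a + 1$ parameters, $\mu$ and the $\alpha_i$ ⇒ overparameterisation!

Parameter restrictions make the meaning of the parameters clear and unique. Since $\mu_i = \mu + \alpha_i$:

$$\sum_{i=1}^{a} \alpha_i = 0 \;\Rightarrow\; \mu = \frac{\sum_{i=1}^{a} \mu_i}{a}, \quad\text{as}\quad \sum_{i=1}^{a} \mu_i = a\mu + \sum_{i=1}^{a} \alpha_i = a\mu$$

This is the sum restriction, and it is the one we will use.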
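The sketch below applies the sum restriction to the Example 1 breed means and then fits the factor effects model with sum-to-zero coding through lm()'s `contrasts` argument, a per-model alternative to setting options() globally. The data frame is the hand-typed Example 1 table (column and level names assumed).

```r
chicken <- data.frame(
  weight = c(1.56, 1.54, 1.50, 1.49, 1.51, 1.38, 1.41, 1.44, 1.37, 1.40,
             1.49, 1.54, 1.48, 1.51, 1.48, 1.46, 1.49, 1.44, 1.52, 1.49),
  breed  = factor(rep(c("a", "b", "c", "d"), each = 5))
)

mu_i  <- tapply(chicken$weight, chicken$breed, mean)  # cell means estimates
mu    <- mean(mu_i)     # sum restriction: unweighted mean of the mu_i
alpha <- mu_i - mu      # alpha_i = mu_i - mu
round(c(mu = mu, alpha), 3)
sum(alpha)              # zero up to rounding error, by construction

# Sum-to-zero coding: lm() estimates mu, alpha_1, ..., alpha_(a-1)
fit_sum <- lm(weight ~ breed, data = chicken,
              contrasts = list(breed = "contr.sum"))
coef(fit_sum)
```

With 5 chickens per breed, $\mu$ here also equals the overall mean 1.475; with unequal $n_i$ the unweighted mean of the $\mu_i$ and $\bar{Y}_{\cdot\cdot}$ would differ.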

One way ANOVA One way ANOVA: Estimation

The corresponding R code

options(contrasts = rep("contr.sum", 2))  # sum-to-zero coding for factors: the sum restriction
cellmeanscm.chicken <- lm(weight ~ breed, data = chicken)
summary(cellmeanscm.chicken)
##
## Call:
## lm(formula = weight ~ breed, data = chicken)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -0.0400 -0.0200 -0.0050  0.0125  0.0400
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  1.475000   0.006325 233.218  < 2e-16 ***
## breed1       0.045000   0.010954   4.108 0.000823 ***
## breed2      -0.075000   0.010954  -6.847 3.93e-06 ***
## breed3       0.025000   0.010954   2.282 0.036501 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##

The effect of the fourth breed is not printed; under the sum restriction it equals $\hat{\alpha}_4 = -(0.045 - 0.075 + 0.025) = 0.005$.


One way ANOVA One way ANOVA: Hypothesis testing

General hypothesis test: Sum of squares

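$H_0: \mu_1 = \mu_2 = \ldots = \mu_a$ vs $H_a$: not all $\mu_i$ are equal

ANOVA is based on sums of squares. The starting point is the deviation of the observation from the overall mean:

$$Y_{ij} - \bar{Y}_{\cdot\cdot} = (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}) + (Y_{ij} - \bar{Y}_{i\cdot})$$

Square and sum over all observations (the cross-product term vanishes because $\sum_{j=1}^{n_i}(Y_{ij} - \bar{Y}_{i\cdot}) = 0$ within each level $i$):

$$\underbrace{\sum_{i=1}^{a}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2}_{SS_{tot}} = \underbrace{\sum_{i=1}^{a} n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2}_{SS_{trt}} + \underbrace{\sum_{i=1}^{a}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2}_{SS_{err}}$$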
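A numerical check of the decomposition on the Example 1 data (typed in by hand; column and level names assumed). The cross-product term, which vanishes because deviations from a group mean sum to zero within each group, is computed as well.

```r
chicken <- data.frame(
  weight = c(1.56, 1.54, 1.50, 1.49, 1.51, 1.38, 1.41, 1.44, 1.37, 1.40,
             1.49, 1.54, 1.48, 1.51, 1.48, 1.46, 1.49, 1.44, 1.52, 1.49),
  breed  = factor(rep(c("a", "b", "c", "d"), each = 5))
)
y <- chicken$weight
g <- chicken$breed

ybar_i <- ave(y, g)      # Ybar_i. repeated for every observation of group i
ybar   <- mean(y)        # Ybar..

SStot <- sum((y - ybar)^2)
SStrt <- sum((ybar_i - ybar)^2)   # equals sum_i n_i (Ybar_i. - Ybar..)^2
SSerr <- sum((y - ybar_i)^2)
cross <- sum((ybar_i - ybar) * (y - ybar_i))   # cross-product term, ~ 0

c(SStot = SStot, SStrt = SStrt, SSerr = SSerr, SStrt_plus_SSerr = SStrt + SSerr)
```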

One way ANOVA One way ANOVA: Hypothesis testing

General hypothesis test: Mean sum of squares

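Mean sum of squares: $MS = \dfrac{SS}{df}$

df = number of degrees of freedom = number of independent terms in SS
  • $SS_{tot}$: $n_\cdot - 1$ independent terms, because $\sum_{i=1}^{a}\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\cdot\cdot}) = 0$ → $MS_{tot} = \dfrac{SS_{tot}}{n_\cdot - 1}$
  • $SS_{trt}$: $a - 1$ independent terms, because $\sum_{i=1}^{a} n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}) = 0$ → $MS_{trt} = \dfrac{SS_{trt}}{a - 1}$
  • $SS_{err}$: $n_\cdot - a$ independent terms, because $\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot}) = 0$ within each level $i$ of the factor → $MS_{err} = \dfrac{SS_{err}}{n_\cdot - a}$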
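The degrees of freedom follow from the linear constraints listed above. The sketch below, on the hand-typed Example 1 data (names assumed), evaluates each constraint (all numerically zero) and then divides each sum of squares by its df.

```r
chicken <- data.frame(
  weight = c(1.56, 1.54, 1.50, 1.49, 1.51, 1.38, 1.41, 1.44, 1.37, 1.40,
             1.49, 1.54, 1.48, 1.51, 1.48, 1.46, 1.49, 1.44, 1.52, 1.49),
  breed  = factor(rep(c("a", "b", "c", "d"), each = 5))
)
y  <- chicken$weight
g  <- chicken$breed
a  <- nlevels(g)
n. <- length(y)
n_i    <- tapply(y, g, length)
ybar_i <- tapply(y, g, mean)
ybar   <- mean(y)

# Linear constraints that remove degrees of freedom (all ~ 0)
sum(y - ybar)                    # 1 constraint on the terms of SStot
sum(n_i * (ybar_i - ybar))       # 1 constraint on the terms of SStrt
tapply(y - ave(y, g), g, sum)    # a constraints on SSerr, one per level

SStot <- sum((y - ybar)^2)
SStrt <- sum(n_i * (ybar_i - ybar)^2)
SSerr <- sum((y - ave(y, g))^2)

MS <- c(MStrt = SStrt / (a - 1),    # df = a - 1  = 3
        MSerr = SSerr / (n. - a),   # df = n. - a = 16
        MStot = SStot / (n. - 1))   # df = n. - 1 = 19
MS
```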
One way ANOVA One way ANOVA: Hypothesis testing

Expected value

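Definition

$$E(Y) = \int y\, f_Y(y)\, dy = \mu, \qquad E(S^2) = \sigma^2 \quad\text{with}\quad S^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y}_\cdot)^2}{n - 1}$$

Intuitive: repeated sampling
  sample: $y_1, \ldots, y_n \to \bar{y}_\cdot$; for $n \to \infty$: $\bar{y}_\cdot \to \mu$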
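The repeated-sampling intuition can be simulated: draw many samples of size n from an assumed N(10, 4) population and average the sample variances. With the n − 1 divisor the average settles near σ² = 4; dividing by n instead gives a value near σ²(n − 1)/n = 3.2. The values of µ, σ², n and the number of replicates are arbitrary choices for illustration.

```r
set.seed(2020)
mu <- 10; sigma2 <- 4; n <- 5   # illustrative population and sample size (assumed)
B  <- 20000                     # number of repeated samples

# var() uses the n - 1 divisor, i.e. it computes S^2
s2 <- replicate(B, var(rnorm(n, mean = mu, sd = sqrt(sigma2))))
mean(s2)                  # close to sigma2 = 4
mean(s2 * (n - 1) / n)    # divisor n instead: close to 3.2, biased downwards

# For a very large sample the sample mean settles near mu
ybar_big <- mean(rnorm(1e6, mean = mu, sd = sqrt(sigma2)))
ybar_big
```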
One way ANOVA One way ANOVA: Hypothesis testing

General hypothesis test: E(MS)

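The expected values of $MS_{err}$ and $MS_{trt}$ are given by:

$$E(MS_{err}) = \sigma^2, \qquad E(MS_{trt}) = \sigma^2 + \frac{\sum_{i=1}^{a} n_i (\mu_i - \mu_\cdot)^2}{a - 1} \quad\text{with}\quad \mu_\cdot = \frac{\sum_{i=1}^{a} n_i \mu_i}{n_\cdot}$$

⇒ $MS_{err}$ is always an unbiased estimator for $\sigma^2$
⇒ $MS_{trt}$ is also an unbiased estimator for $\sigma^2$ under $H_0$
Under $H_a$: $E(MS_{trt}) > \sigma^2$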
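The expected mean squares can be checked by simulation. The sketch assumes true means equal to the Example 1 breed means, $n_i = 5$ and $\sigma^2 = 0.0008$, repeats the experiment 5000 times, and compares the average $MS_{trt}$ and $MS_{err}$ with the formulas above. These "true" values are assumptions chosen for illustration, not estimates of the real population.

```r
set.seed(1)
mu_i  <- c(1.52, 1.40, 1.50, 1.48)   # assumed true breed means
n_i   <- c(5, 5, 5, 5)
sigma <- sqrt(0.0008)                # assumed true error SD
a  <- length(mu_i)
n. <- sum(n_i)
g  <- rep(seq_len(a), times = n_i)   # group index of each observation

mu_dot  <- sum(n_i * mu_i) / n.
E_MStrt <- sigma^2 + sum(n_i * (mu_i - mu_dot)^2) / (a - 1)
E_MSerr <- sigma^2

one_experiment <- function() {
  y     <- rnorm(n., mean = mu_i[g], sd = sigma)
  ybar  <- tapply(y, g, mean)
  SStrt <- sum(n_i * (ybar - mean(y))^2)
  SSerr <- sum((y - ybar[g])^2)
  c(MStrt = SStrt / (a - 1), MSerr = SSerr / (n. - a))
}
sims <- replicate(5000, one_experiment())   # 2 x 5000 matrix

rbind(simulated = rowMeans(sims), theory = c(E_MStrt, E_MSerr))
```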

One way ANOVA One way ANOVA: Hypothesis testing

General hypothesis test: Test statistic

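$H_0: \mu_1 = \mu_2 = \ldots = \mu_a$ vs $H_a$: not all $\mu_i$ equal

$$E(MS_{err}) = \sigma^2, \qquad E(MS_{trt}) = \sigma^2 + \frac{\sum_{i=1}^{a} n_i (\mu_i - \mu_\cdot)^2}{a - 1} \quad\text{with}\quad \mu_\cdot = \frac{\sum_{i=1}^{a} n_i \mu_i}{n_\cdot}$$

Test statistic: $F^* = \dfrac{MS_{trt}}{MS_{err}}$

Under $H_0$ we expect $F^* \approx 1$; under $H_a$ we expect $F^* > 1$.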

One way ANOVA One way ANOVA: Hypothesis testing

General hypothesis test: P-value

Distribution of $F^*$ under $H_0$: $F^* \sim F[a - 1, n_\cdot - a]$

P-value = probability of observing a result as extreme as, or more extreme than, the one observed in the experiment, given that $H_0$ is true
⇒ P-value $= P(F[a - 1, n_\cdot - a] \ge f^*)$, with $f^*$ the observed value of $F^*$

Reject $H_0$ if P-value $< \alpha$.
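In R the P-value is the upper tail of the F distribution, available through pf(). The sketch plugs in the values of Example 1 (a = 4 breeds, n. = 20 chickens, f* = 17.29 from that example's ANOVA table) and also shows the equivalent decision rule based on the critical value qf(1 − α, a − 1, n. − a).

```r
a <- 4; n. <- 20      # Example 1: 4 breeds, 20 chickens
f_star <- 17.29       # observed F* from the Example 1 ANOVA table
alpha  <- 0.05

p_value <- pf(f_star, df1 = a - 1, df2 = n. - a, lower.tail = FALSE)  # P(F >= f*)
f_crit  <- qf(1 - alpha, df1 = a - 1, df2 = n. - a)                   # rejection threshold

p_value            # about 2.8e-05
f_crit             # about 3.24
p_value < alpha    # same decision as the comparison below
f_star > f_crit
```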

One way ANOVA One way ANOVA: Hypothesis testing

General hypothesis test: ANOVA table

Term        SS       df        MS       f*
Treatment   SStrt    a − 1     MStrt    MStrt / MSerr
Error       SSerr    n. − a    MSerr
Total       SStot    n. − 1    MStot
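A minimal sketch of how the table is filled in, written as a small hypothetical helper `oneway_table()` (not part of any package) and applied to the Example 1 data typed in by hand (column and level names assumed). It should reproduce the ANOVA table of Example 1 and the output of anova() on the factor effects fit.

```r
chicken <- data.frame(
  weight = c(1.56, 1.54, 1.50, 1.49, 1.51, 1.38, 1.41, 1.44, 1.37, 1.40,
             1.49, 1.54, 1.48, 1.51, 1.48, 1.46, 1.49, 1.44, 1.52, 1.49),
  breed  = factor(rep(c("a", "b", "c", "d"), each = 5))
)

# Hypothetical helper: one way ANOVA table from the formulas above
oneway_table <- function(y, g) {
  g  <- factor(g)
  a  <- nlevels(g)
  n. <- length(y)
  fitted <- ave(y, g)                      # Ybar_i. for each observation
  SS  <- c(sum((fitted - mean(y))^2),      # SStrt
           sum((y - fitted)^2),            # SSerr
           sum((y - mean(y))^2))           # SStot
  dfs <- c(a - 1, n. - a, n. - 1)
  MS  <- SS / dfs
  f_star <- MS[1] / MS[2]
  data.frame(Term    = c("Treatment", "Error", "Total"),
             SS      = SS, df = dfs, MS = MS,
             f_star  = c(f_star, NA, NA),
             p_value = c(pf(f_star, dfs[1], dfs[2], lower.tail = FALSE), NA, NA))
}

tab <- oneway_table(chicken$weight, chicken$breed)
tab
anova(lm(weight ~ breed, data = chicken))   # for comparison
```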

One way ANOVA One way ANOVA: Hypothesis testing

General hypothesis test: Example 1

Example

Breed 1   Breed 2   Breed 3   Breed 4
1.56      1.38      1.49      1.46
1.54      1.41      1.54      1.49
1.50      1.44      1.48      1.44
1.49      1.37      1.51      1.52
1.51      1.40      1.48      1.49

$\bar{y}_{1\cdot} = 1.52$, $\bar{y}_{2\cdot} = 1.40$, $\bar{y}_{3\cdot} = 1.50$, $\bar{y}_{4\cdot} = 1.48$

ANOVA table

Term     SS       df   MS       f*      P(F ≥ f*)
Breeds   0.0415    3   0.0138   17.29   0.000028
Error    0.0128   16   0.0008
Total    0.0543   19

P-value < α ⇒ reject $H_0$

One way ANOVA One way ANOVA: Hypothesis testing

General hypothesis test: Example

Representation of the P-value as the area under the density function F[3,16]
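The figure itself is not reproduced here; this sketch recreates it approximately. It integrates the F[3,16] density df() from f* = (0.0415/3)/0.0008 ≈ 17.29 to infinity, checks that this area equals pf(..., lower.tail = FALSE), and plots the density with the tail area shaded. The plot range and styling are arbitrary choices.

```r
# Observed F* for Example 1: MStrt / MSerr
f_star <- (0.0415 / 3) / 0.0008

# P-value as an area: integrate the F[3,16] density from f* to infinity
area <- integrate(df, lower = f_star, upper = Inf, df1 = 3, df2 = 16,
                  rel.tol = 1e-8, abs.tol = 1e-14)$value
area
pf(f_star, df1 = 3, df2 = 16, lower.tail = FALSE)   # same value via the distribution function

# Density of F[3,16] with the tail beyond f* shaded
curve(df(x, df1 = 3, df2 = 16), from = 0, to = 20, n = 501,
      xlab = "F", ylab = "density")
xs <- seq(f_star, 20, length.out = 200)
polygon(c(f_star, xs, 20), c(0, df(xs, df1 = 3, df2 = 16), 0), col = "grey")
abline(v = f_star, lty = 2)
```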

One way ANOVA One way ANOVA: Hypothesis testing

The corresponding R code for cell means model

The F-statistic in the cell means model tests whether all population means are equal to zero:
$H_0: \mu_1 = \mu_2 = \ldots = \mu_a = 0$ vs $H_a$: not all $\mu_i = 0$

anova(cellmeans.chicken)
## Analysis of Variance Table
##
## Response: weight
##           Df Sum Sq Mean Sq F value    Pr(>F)
## breed      4 43.554 10.8885   13611 < 2.2e-16 ***
## Residuals 16  0.013  0.0008
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
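The 4 degrees of freedom and the sum of squares 43.554 in this table come from measuring the fitted means against 0 rather than against the overall mean. A sketch reproducing them from the hand-typed Example 1 data (column and level names assumed):

```r
chicken <- data.frame(
  weight = c(1.56, 1.54, 1.50, 1.49, 1.51, 1.38, 1.41, 1.44, 1.37, 1.40,
             1.49, 1.54, 1.48, 1.51, 1.48, 1.46, 1.49, 1.44, 1.52, 1.49),
  breed  = factor(rep(c("a", "b", "c", "d"), each = 5))
)
ybar_i <- tapply(chicken$weight, chicken$breed, mean)
n_i    <- tapply(chicken$weight, chicken$breed, length)
a  <- nlevels(chicken$breed)
n. <- nrow(chicken)

SS_model <- sum(n_i * ybar_i^2)    # deviations of the fitted means from 0, a = 4 df
MSerr    <- sum((chicken$weight - ave(chicken$weight, chicken$breed))^2) / (n. - a)
F_zero   <- (SS_model / a) / MSerr # tests mu_1 = ... = mu_a = 0

c(SS_model = SS_model, MS_model = SS_model / a, F = F_zero)
pf(F_zero, a, n. - a, lower.tail = FALSE)
```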

One way ANOVA One way ANOVA: Hypothesis testing

The corresponding R code for factor effects model

The F-statistic in the factor effects model tests whether the population means differ from each other:
$H_0: \mu_1 = \mu_2 = \ldots = \mu_a$ vs $H_a$: not all $\mu_i$ equal

anova(cellmeanscm.chicken)
## Analysis of Variance Table
##
## Response: weight
##           Df Sum Sq  Mean Sq F value    Pr(>F)
## breed      3 0.0415 0.013833  17.292 2.83e-05 ***
## Residuals 16 0.0128 0.000800
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
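One way to see the difference between the two F tests, offered as an illustration rather than as the method used in these slides: the factor effects F test is a comparison of two nested least-squares fits, a model with one common mean (weight ~ 1) and a model with one mean per breed. anova() applied to the two fits should reproduce F = 17.29 on 3 and 16 df. The data are typed in from Example 1 (names assumed).

```r
chicken <- data.frame(
  weight = c(1.56, 1.54, 1.50, 1.49, 1.51, 1.38, 1.41, 1.44, 1.37, 1.40,
             1.49, 1.54, 1.48, 1.51, 1.48, 1.46, 1.49, 1.44, 1.52, 1.49),
  breed  = factor(rep(c("a", "b", "c", "d"), each = 5))
)

m_common <- lm(weight ~ 1, data = chicken)           # H0: one common mean
m_breed  <- lm(weight ~ breed - 1, data = chicken)   # Ha: one mean per breed
cmp <- anova(m_common, m_breed)                      # F test on the drop in RSS
cmp
```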

One way ANOVA One way ANOVA: Contrasts

Distribution of a linear combination

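Say $L = \sum_{j=1}^{k} c_j Y_j$ with $Y_j \sim N(\mu_j; \sigma^2)$ and mutually independent; then

$$L \sim N\left(\sum_{j=1}^{k} c_j \mu_j;\; \sigma^2 \sum_{j=1}^{k} c_j^2\right)$$

Example, the sample mean:

$$\bar{Y}_{i\cdot} = \sum_{j=1}^{n_i} \frac{Y_{ij}}{n_i} = \sum_{j=1}^{n_i} \frac{1}{n_i} Y_{ij} \quad\text{with}\quad Y_{ij} \sim N(\mu_i; \sigma_i^2)$$

$$\Rightarrow k = n_i \text{ and } c_j = \frac{1}{n_i}$$

$$\Rightarrow \bar{Y}_{i\cdot} \sim N\left(\sum_{j=1}^{n_i} \frac{1}{n_i}\mu_i;\; \sigma_i^2 \sum_{j=1}^{n_i} \frac{1}{n_i^2}\right) \;\Rightarrow\; \bar{Y}_{i\cdot} \sim N\left(n_i \frac{1}{n_i}\mu_i;\; \sigma_i^2\, n_i \frac{1}{n_i^2}\right) \;\Rightarrow\; \bar{Y}_{i\cdot} \sim N\left(\mu_i; \frac{\sigma_i^2}{n_i}\right)$$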
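The result for the sample mean can be checked by simulation. With assumed values $\mu_i = 1.52$, $\sigma^2 = 0.0008$ and $n_i = 5$, the sketch draws 50000 samples and compares the mean and variance of the sample means with $\mu_i$ and $\sigma^2/n_i = 0.00016$.

```r
set.seed(27)
mu_i  <- 1.52; sigma <- sqrt(0.0008); n_i <- 5   # illustrative values (assumed)
B     <- 50000                                   # number of simulated samples

# Each column holds one sample of n_i observations from N(mu_i, sigma^2)
samples <- matrix(rnorm(n_i * B, mean = mu_i, sd = sigma), nrow = n_i)
ybar    <- colMeans(samples)                     # B sample means

c(mean = mean(ybar), var = var(ybar))            # simulated
c(mean = mu_i, var = sigma^2 / n_i)              # theory: N(mu_i, sigma^2 / n_i)
```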

One way ANOVA One way ANOVA: Contrasts

Specific comparisons: Contrasts

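Hypothesis for contrasts: $H_0: L = \mu_0$ vs $H_a: L \neq \mu_0$, with $L = \sum_{i=1}^{a} c_i \mu_i$ and $\sum_{i=1}^{a} c_i = 0$

Estimator of $L$ and its variance:

$$\hat{L} = \sum_{i=1}^{a} c_i \bar{Y}_{i\cdot}$$

$$\mathrm{Var}(\hat{L}) = \sum_{i=1}^{a} c_i^2\, \mathrm{Var}(\bar{Y}_{i\cdot}) = \sum_{i=1}^{a} c_i^2 \frac{\sigma^2}{n_i} = \sigma^2 \sum_{i=1}^{a} \frac{c_i^2}{n_i}$$

Replacing $\sigma^2$ by its unbiased estimator $MS_{err}$:

$$S^2(\hat{L}) = MS_{err} \sum_{i=1}^{a} \frac{c_i^2}{n_i}$$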
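These formulas translate directly into a few lines of R. `contrast_est()` is a hypothetical helper (not a package function); applied to the Example 1 data (typed in by hand, names assumed) with c = (0.5, 0.5, −0.5, −0.5), it should give the estimate −0.03 and the standard error 0.01265 that glht() reports for the same contrast.

```r
chicken <- data.frame(
  weight = c(1.56, 1.54, 1.50, 1.49, 1.51, 1.38, 1.41, 1.44, 1.37, 1.40,
             1.49, 1.54, 1.48, 1.51, 1.48, 1.46, 1.49, 1.44, 1.52, 1.49),
  breed  = factor(rep(c("a", "b", "c", "d"), each = 5))
)

# Hypothetical helper: L_hat = sum(c_i * Ybar_i.) and S(L_hat)
contrast_est <- function(y, g, cvec) {
  g <- factor(g)
  stopifnot(length(cvec) == nlevels(g), abs(sum(cvec)) < 1e-12)  # a contrast: sum(c_i) = 0
  ybar_i <- tapply(y, g, mean)
  n_i    <- tapply(y, g, length)
  MSerr  <- sum((y - ave(y, g))^2) / (length(y) - nlevels(g))
  L_hat  <- sum(cvec * ybar_i)
  S_L    <- sqrt(MSerr * sum(cvec^2 / n_i))
  c(L_hat = L_hat, S_L = S_L)
}

res <- contrast_est(chicken$weight, chicken$breed, c(0.5, 0.5, -0.5, -0.5))
res
```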

One way ANOVA One way ANOVA: Contrasts

Contrasts

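$$\hat{L} = \sum_{i=1}^{a} c_i \bar{Y}_{i\cdot}, \qquad S^2(\hat{L}) = MS_{err} \sum_{i=1}^{a} \frac{c_i^2}{n_i}$$

Test statistic: under $H_0: L = \mu_0$, $\dfrac{\hat{L} - \mu_0}{S(\hat{L})} \sim T[n_\cdot - a]$

$H_0: L = 0$ vs $H_a: L \neq 0$
Test statistic: $T^* = \dfrac{\hat{L}}{S(\hat{L})}$
P-value: $2 \times P(T[n_\cdot - a] \ge |t^*|)$
$(1 - \alpha)100\%$ confidence interval: $\hat{L} \pm T[1 - \alpha/2, n_\cdot - a]\, S(\hat{L})$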
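A sketch of the t test and confidence interval using pt() and qt(), applied to the contrast $(\mu_1 + \mu_2)/2 - (\mu_3 + \mu_4)/2$ of the example that follows, on the hand-typed Example 1 data (names assumed). For a single contrast these should match the glht() summary and confint() output reported for that example (t = −2.372, P = 0.0306, 95% CI from −0.0568 to −0.0032).

```r
chicken <- data.frame(
  weight = c(1.56, 1.54, 1.50, 1.49, 1.51, 1.38, 1.41, 1.44, 1.37, 1.40,
             1.49, 1.54, 1.48, 1.51, 1.48, 1.46, 1.49, 1.44, 1.52, 1.49),
  breed  = factor(rep(c("a", "b", "c", "d"), each = 5))
)
ybar_i <- tapply(chicken$weight, chicken$breed, mean)
n_i    <- tapply(chicken$weight, chicken$breed, length)
a  <- length(n_i)
n. <- sum(n_i)
MSerr <- sum((chicken$weight - ave(chicken$weight, chicken$breed))^2) / (n. - a)

cvec  <- c(0.5, 0.5, -0.5, -0.5)           # (mu_1 + mu_2)/2 - (mu_3 + mu_4)/2
L_hat <- sum(cvec * ybar_i)
S_L   <- sqrt(MSerr * sum(cvec^2 / n_i))

t_star  <- L_hat / S_L                                          # H0: L = 0
p_value <- 2 * pt(abs(t_star), df = n. - a, lower.tail = FALSE)
alpha   <- 0.05
ci      <- L_hat + c(-1, 1) * qt(1 - alpha / 2, df = n. - a) * S_L

c(t = t_star, p = p_value, lwr = ci[1], upr = ci[2])
```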

One way ANOVA One way ANOVA: Contrasts

Contrasts: example

$$H_0: \frac{\mu_1 + \mu_2}{2} = \frac{\mu_3 + \mu_4}{2} \quad\text{vs}\quad H_a: \frac{\mu_1 + \mu_2}{2} \neq \frac{\mu_3 + \mu_4}{2}$$
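Written as a contrast, this hypothesis uses $c = (1/2, 1/2, -1/2, -1/2)$, which sums to zero. Plugging in the breed means and $MS_{err} = 0.0008$ from Example 1 ($n_i = 5$ for every breed) gives the estimate, standard error and t statistic that glht() reports:

```latex
\begin{aligned}
L &= \tfrac{1}{2}\mu_1 + \tfrac{1}{2}\mu_2 - \tfrac{1}{2}\mu_3 - \tfrac{1}{2}\mu_4,
   \qquad \textstyle\sum_{i} c_i = 0 \\
\hat{L} &= \tfrac{1}{2}(1.52 + 1.40) - \tfrac{1}{2}(1.50 + 1.48) = 1.46 - 1.49 = -0.03 \\
S^2(\hat{L}) &= MS_{err} \sum_{i=1}^{4} \frac{c_i^2}{n_i}
   = 0.0008 \cdot \frac{4 \times 0.25}{5} = 0.00016,
   \qquad S(\hat{L}) \approx 0.01265 \\
t^* &= \frac{-0.03}{0.01265} \approx -2.372 \quad \text{on } n_\cdot - a = 16 \text{ df}
\end{aligned}
```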

One way ANOVA One way ANOVA: Contrasts

Contrasts: R code

library(multcomp)
chickenPlanned <- glht(cellmeans.chicken,
                       linfct = mcp(breed = c("0.5*a+0.5*b-0.5*c-0.5*d = 0")))

One way ANOVA One way ANOVA: Contrasts

summary(chickenPlanned)
##
##   Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: User-defined Contrasts
##
##
## Fit: lm(formula = weight ~ breed - 1, data = chicken)
##
## Linear Hypotheses:
##                                             Estimate Std. Error t value
## 0.5 * a + 0.5 * b - 0.5 * c - 0.5 * d == 0 -0.03000    0.01265  -2.372
##                                            Pr(>|t|)
## 0.5 * a + 0.5 * b - 0.5 * c - 0.5 * d == 0   0.0306 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

One way ANOVA One way ANOVA: Contrasts

confint(chickenPlanned)
##
##   Simultaneous Confidence Intervals
##
## Multiple Comparisons of Means: User-defined Contrasts
##
##
## Fit: lm(formula = weight ~ breed - 1, data = chicken)
##
## Quantile = 2.1199
## 95% family-wise confidence level
##
##
## Linear Hypotheses:
##                                             Estimate  lwr       upr
## 0.5 * a + 0.5 * b - 0.5 * c - 0.5 * d == 0 -0.030000 -0.056815 -0.003185

One way ANOVA One way ANOVA: Set of Contrasts

Set of contrasts

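Consider the following set of three contrasts:

$$\begin{cases} H_0: L_1 = \mu_a - \mu_b = 0 & \text{vs}\quad H_a: L_1 = \mu_a - \mu_b \neq 0 \\ H_0: L_2 = \mu_a - \mu_c = 0 & \text{vs}\quad H_a: L_2 = \mu_a - \mu_c \neq 0 \\ H_0: L_3 = \mu_a - \mu_d = 0 & \text{vs}\quad H_a: L_3 = \mu_a - \mu_d \neq 0 \end{cases} \qquad (1)$$

Testing these three contrasts simultaneously corresponds to testing the general hypothesis of no differences between the 4 populations, because $L_1 = L_2 = L_3 = 0$ holds exactly when $\mu_a = \mu_b = \mu_c = \mu_d$.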
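The equivalence with the general F test can be checked directly. The sketch computes the Wald-type F statistic for the three contrasts, $F = \hat{L}^\top \hat{V}^{-1} \hat{L} / q$ with $\hat{V} = MS_{err}\, C\, \mathrm{diag}(1/n_i)\, C^\top$ and $q = 3$, on the hand-typed Example 1 data (names assumed). This is the standard formula for testing a set of linear contrasts, used here as an assumption about what Ftest() computes; it should reproduce F = 17.29 on 3 and 16 df.

```r
chicken <- data.frame(
  weight = c(1.56, 1.54, 1.50, 1.49, 1.51, 1.38, 1.41, 1.44, 1.37, 1.40,
             1.49, 1.54, 1.48, 1.51, 1.48, 1.46, 1.49, 1.44, 1.52, 1.49),
  breed  = factor(rep(c("a", "b", "c", "d"), each = 5))
)
ybar_i <- as.numeric(tapply(chicken$weight, chicken$breed, mean))
n_i    <- as.numeric(tapply(chicken$weight, chicken$breed, length))
a  <- length(n_i)
n. <- sum(n_i)
MSerr <- sum((chicken$weight - ave(chicken$weight, chicken$breed))^2) / (n. - a)

# One row per contrast in (1): L1 = mu_a - mu_b, L2 = mu_a - mu_c, L3 = mu_a - mu_d
C <- rbind(c(1, -1,  0,  0),
           c(1,  0, -1,  0),
           c(1,  0,  0, -1))
L_hat <- C %*% ybar_i                              # estimated contrasts
V_hat <- MSerr * C %*% diag(1 / n_i) %*% t(C)      # estimated covariance of L_hat
q     <- nrow(C)

F_set <- drop(t(L_hat) %*% solve(V_hat, L_hat)) / q
p_set <- pf(F_set, q, n. - a, lower.tail = FALSE)
c(F = F_set, p = p_set)   # compare with the one way ANOVA F test
```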

One way ANOVA One way ANOVA: Set of Contrasts

Set of contrasts: R code

chickenPlannedSet <- glht(cellmeans.chicken,
                          linfct = mcp(breed = c("a-b = 0", "a-c = 0", "a-d = 0")))
summary(chickenPlannedSet, test = Ftest())
##
##   General Linear Hypotheses
##
## Multiple Comparisons of Means: User-defined Contrasts
##
##
## Linear Hypotheses:
##            Estimate
## a - b == 0     0.12
## a - c == 0     0.02
## a - d == 0     0.04
##
## Global Test:
##       F DF1 DF2   Pr(>F)
## 1 17.29   3  16 2.83e-05

One way ANOVA One way ANOVA: Set of Contrasts

All pairwise comparisons: R code

If the general hypothesis of no differences between the breeds is rejected, we often proceed to test all pairwise comparisons. We then need to adjust for multiple comparisons, which is often done with Tukey's adjustment technique.

mc.res <- glht(cellmeans.chicken, linfct = mcp(breed = "Tukey"))
summary(mc.res)
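Base R offers a classical alternative: TukeyHSD() in the stats package computes Tukey-adjusted pairwise differences from an aov() fit. For a balanced one way design like Example 1, its adjusted P-values should agree closely with the single-step glht() results (small differences can arise because glht() evaluates the multivariate t distribution numerically). The data are typed in by hand, with assumed column and level names.

```r
chicken <- data.frame(
  weight = c(1.56, 1.54, 1.50, 1.49, 1.51, 1.38, 1.41, 1.44, 1.37, 1.40,
             1.49, 1.54, 1.48, 1.51, 1.48, 1.46, 1.49, 1.44, 1.52, 1.49),
  breed  = factor(rep(c("a", "b", "c", "d"), each = 5))
)

fit_aov <- aov(weight ~ breed, data = chicken)
tk <- TukeyHSD(fit_aov, which = "breed", conf.level = 0.95)
round(tk$breed, 4)   # diff, lwr, upr and Tukey-adjusted p-value for each pair
```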

One way ANOVA One way ANOVA: Set of Contrasts

summary(mc.res)
##
##   Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: lm(formula = weight ~ breed - 1, data = chicken)
##
## Linear Hypotheses:
##            Estimate Std. Error t value Pr(>|t|)
## b - a == 0 -0.12000    0.01789  -6.708   <0.001 ***
## c - a == 0 -0.02000    0.01789  -1.118   0.6841
## d - a == 0 -0.04000    0.01789  -2.236   0.1557
## c - b == 0  0.10000    0.01789   5.590   <0.001 ***
## d - b == 0  0.08000    0.01789   4.472   0.0019 **
## d - c == 0 -0.02000    0.01789  -1.118   0.6841
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

One way ANOVA One way ANOVA: Set of Contrasts

plot(mc.res)

[Figure: 95% family-wise confidence intervals for the pairwise differences b − a, c − a, d − a, c − b, d − b and d − c; horizontal axis "Linear Function", from −0.15 to 0.15.]

One way ANOVA Problems

Problem 1. Which of the following statements are correct?
  • If we apply the sum restriction in a one way anova factor effects model with p treatments, µ can be interpreted as $\mu = \frac{\sum_{i=1}^{p} \mu_i}{p}$
  • Fitting the cell means model and using R to test, we test the null hypothesis that there are no treatment differences.
  • Depending on which model formulation is used ($y_{ij} = \mu_i + e_{ij}$ versus $y_{ij} = \mu + \alpha_i + e_{ij}$), the following hypotheses are equivalent:
    $H_0: \mu_1 = \mu_2 = \ldots = \mu_p$ and $H_0: \alpha_1 = \alpha_2 = \ldots = \alpha_p$
  • In a one way anova cell means model, all observations have the same random error term.

One way ANOVA Problems

Problem 2. Researchers want to investigate whether the time required to teach a dog a new trick depends on the way the dogs are rewarded during training. They make 3 groups and use a different way to reward the dogs in each group: "a dog snack", "petting the dog" and "voice encouragement". 5 dogs are randomly assigned to each group. All the dogs are trained by the same person on the same day of the week in the same playground. How many treatments are there in the experiment, how many levels are there, and what is the response variable?
  • 3 treatments, 5 levels, the response variable is day of week
  • 3 treatments, 3 levels, the response variable is the time required to teach a dog a new trick
  • 3 treatments, 5 levels, the response variable is the time required to teach a dog a new trick
  • 5 treatments, 3 levels, the response variable is the time required to teach a dog a new trick

One way ANOVA Problems

Problem 3. With n the number of observations and p the number of different treatment groups, which of the following statements are correct?
  • $S_p^2$ is an unbiased estimator for $\sigma^2$
  • SSE is an unbiased estimator for $\sigma^2$
  • MSE is an unbiased estimator for $\sigma^2$
  • $\dfrac{SSE}{n - p}$ is an unbiased estimator for $\sigma^2$

One way ANOVA Problems

Problem 4. Which of the following statements is correct? When using the least squares criterion to obtain parameter estimates,
  • The sum of the squared errors must be equal to zero
  • The sum of the squared errors must be minimal
  • The sum of the squared differences between each observation and its prediction must be minimal
  • The sum of the squared differences between each observation and its prediction must be maximal
  • None of the above

One way ANOVA Problems

Problem 5. Which of the following statements are correct? The null hypothesis in a one way anova asks whether
  • At least two population group means are the same
  • All population variances are equal
  • All population group means are the same
  • Two specifically chosen population group means are the same

One way ANOVA Problems

Problem 6. Which of the following statements are correct?
  • If the variability between groups is similar to the variability within groups, the F-statistic is close to 0
  • If the F-statistic is close to 1, we can reject the null hypothesis
  • If there is no difference between the groups, the population group means will be equal to the population overall mean
  • In order to be able to reject the null hypothesis, MST should be considerably larger than MSE
  • None of the above

One way ANOVA Problems

Problem 7. The moment at which broilers are transported to the slaughterhouse (early in the morning, during the day or at night) might influence the percentage of broilers that do not survive the transport. To investigate this, 4 transports are done early in the morning, 4 transports are done during the day and 4 transports are done at night. Upon arrival at the slaughterhouse, the percentage of dead broilers is calculated per transport. Fill in the blanks in the ANOVA table below.

Source of variation   df    SS        MS        f*    P-value
Trt                   ...   ...       ...       ...   ...
Error                 ...   ...       0.00346
Total                       2.56203

One way ANOVA Problems

Problem 8. The moment at which broilers are transported to the slaughterhouse (early in the morning, during the day or at night) might influence the percentage of broilers that do not survive the transport. To investigate this, 4 transports are done early in the morning, 4 transports are done during the day and 4 transports are done at night. Upon arrival at the slaughterhouse, the percentage of dead broilers is calculated per transport. Based on the P-value you obtained in the previous question and assuming a significance level of 5%, which of the following conclusions are true?
  • It is better to transport broilers early in the morning or at night
  • There is a difference in percentage DOA depending on the moment the broilers are transported
  • The DOA percentage is not influenced by the moment of transportation
  • Based on this P-value, no conclusion can be drawn

One way ANOVA Problems

Problem 9. Researchers want to investigate whether the weight of chickens depends on their breed. They include 4 breeds in their study. For each breed they measure the weight of 5 chickens at the age of 26 weeks. You can download the dataset "chicken.csv". Use R to answer the following question. Which of the following statements are true?
  • We can reject the null hypothesis that there is no difference between any of the breeds; the P-value is 2.8e-5
  • We can reject the null hypothesis that all variances are equal
  • Considering all pairwise comparisons, we can conclude that the weight of breed b is significantly different from the weight of breeds a, c and d
  • Based on the data exploration, we expect that the weight of breed b would be lower than the weight of breeds a, c and d

One way ANOVA Problems

Problem 10. The moment at which broilers are transported to the slaughterhouse (early in the morning, during the day or at night) might influence the percentage of broilers that do not survive the transport. To investigate this, 4 transports are done early in the morning, 4 transports are done during the day and 4 transports are done at night. Upon arrival at the slaughterhouse, the percentage of dead broilers is calculated per transport. You can download the dataset "chickentransport.csv". Use R to perform a full analysis.
