UGent

STATS

VM

Applied Statistics and Data Modeling

Part 3: Analysis of Variance - Two-Way ANOVA

Luc Duchateau (1), Paul Janssen (2)

(1) Faculty of Veterinary Medicine, Ghent University, Belgium

(2) Center for Statistics, Hasselt University, Belgium

2020

  • L. Duchateau & P.Janssen

(UG & UH) Applied Statistics and Data Modeling 2020 1 / 77


Overview

  • Introducing two-way data sets
  • Why factorial experiments?
  • Constructing models for two-way data
  • ANOVA for two-way data
  • Testing specific hypotheses for two-way data
  • ANOVA for two-way data with exactly 1 replication


Introducing two-way data sets

Example 1: Treatment effect on PCV for Boran and Holstein cows with trypanosomosis

| Cowid | Breed | Drug | PCV-before | PCV-after | PCV-difference |
|---|---|---|---|---|---|
| 1 | Boran | Berenil | 18.4 | 26.3 | 7.9 |
| 2 | Boran | Berenil | 20.3 | 28.1 | 7.8 |
| 3 | Boran | Berenil | 22.2 | 27.8 | 5.6 |
| 4 | Boran | Samorin | 16.3 | 30.1 | 13.8 |
| 5 | Boran | Samorin | 15.4 | 27.3 | 11.9 |
| 6 | Boran | Samorin | 19.2 | 32.7 | 13.5 |
| 7 | Holstein | Berenil | 21.3 | 28.3 | 7.0 |
| 8 | Holstein | Berenil | 17.4 | 26.8 | 9.4 |
| 9 | Holstein | Berenil | 18.2 | 25.8 | 7.6 |
| 10 | Holstein | Samorin | 22.2 | 38.1 | 15.9 |
| 11 | Holstein | Samorin | 19.8 | 32.3 | 12.5 |
| 12 | Holstein | Samorin | 20.4 | 30.8 | 10.4 |


Example 2: Milk production as a function of parity and inoculation dose

| Cowid | Parity | Inoculation dose | Milk0 | Milk48 | Reduction (%) |
|---|---|---|---|---|---|
| 1 | heifer | high | 32.4 | 30.2 | 6.79 |
| 2 | heifer | high | 33.6 | 32.3 | 3.87 |
| 3 | heifer | medium | 29.3 | 20.5 | 30.03 |
| 4 | heifer | medium | 34.4 | 21.3 | 38.08 |
| 5 | heifer | low | 31.3 | 14.5 | 53.67 |
| 6 | heifer | low | 35.3 | 13.4 | 62.04 |
| 7 | multiparous | high | 42.4 | 39.5 | 6.84 |
| 8 | multiparous | high | 43.3 | 39.7 | 8.31 |
| 9 | multiparous | medium | 45.2 | 23.9 | 47.12 |
| 10 | multiparous | medium | 44.4 | 24.8 | 44.14 |
| 11 | multiparous | low | 41.5 | 6.7 | 83.86 |
| 12 | multiparous | low | 45.2 | 4.1 | 90.93 |
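The table does not state how the Reduction (%) column is computed. It appears to be the relative drop in milk yield between time 0 and 48 h, 100 × (Milk0 − Milk48)/Milk0; this formula is our assumption, and the small base R sketch below only checks it against the listed values (the vector names are our own).

```r
# Milk yield at time 0 (Milk0) and 48 h later (Milk48) for cows 1-12 of Example 2
milk0  <- c(32.4, 33.6, 29.3, 34.4, 31.3, 35.3, 42.4, 43.3, 45.2, 44.4, 41.5, 45.2)
milk48 <- c(30.2, 32.3, 20.5, 21.3, 14.5, 13.4, 39.5, 39.7, 23.9, 24.8,  6.7,  4.1)

# Assumed definition: percentage reduction relative to the yield at time 0
reduction <- 100 * (milk0 - milk48) / milk0

# Reduction (%) column as printed in the table
reported <- c(6.79, 3.87, 30.03, 38.08, 53.67, 62.04,
              6.84, 8.31, 47.12, 44.14, 83.86, 90.93)

# TRUE when the assumed formula reproduces every printed value to 2 decimals
all(abs(round(reduction, 2) - reported) < 1e-8)
```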


Why factorial experiments?

Factorial versus 'one factor at a time'

Investigating the two factors separately, e.g.:
  → Compare heifers with multiparous cows at the high inoculation dose
  → Multiparous cows show a higher reduction, and are thus more appropriate as experimental model
  → Compare the inoculation doses in multiparous cows


Investigating the two factors jointly, e.g.:
  → Take 6 heifers and 6 multiparous cows
  → Assign the inoculation doses at random so that each dose is given to 2 heifers and 2 multiparous cows


Disadvantages of the 'one factor at a time' approach
  • Not all treatment combinations are evaluated
  • The interaction between the 2 factors cannot be evaluated
  • The treatment combinations of the first experiment are not randomly assigned with respect to those of the second experiment
  • Logistically more demanding, because 2 experiments are needed


Advantages of the factorial approach
  • More replications due to 'hidden' replication: in the absence of interaction, all subjects receiving a given level of one factor can be considered replicates for that factor level
  • Interaction can be evaluated
  • Results generalise more easily: in the absence of interaction, the results for one factor are valid at all levels of the other factor


Constructing models for two-way data

Assume that all population means are known. The population means of the treatment combinations are denoted µij, the population mean for level i of factor A and level j of factor B:

| Factor A: Parity \ Factor B: Inoculation dose | j = 1, low | j = 2, medium | j = 3, high | Row mean |
|---|---|---|---|---|
| i = 1, heifer | 75 (µ11) | 42 (µ12) | 3 (µ13) | 40 (µ1.) |
| i = 2, multiparous | 75 (µ21) | 42 (µ22) | 3 (µ23) | 40 (µ2.) |
| Column mean | 75 (µ.1) | 42 (µ.2) | 3 (µ.3) | 40 (µ..) |


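Population means of factor levels

µi. is the population mean for observations at level i of factor A:
$$\mu_{i\cdot} = \frac{1}{b}\sum_{j=1}^{b}\mu_{ij}$$
µ.j is the population mean for observations at level j of factor B:
$$\mu_{\cdot j} = \frac{1}{a}\sum_{i=1}^{a}\mu_{ij}$$
µ.. is the overall population mean:
$$\mu_{\cdot\cdot} = \frac{1}{ab}\sum_{i=1}^{a}\sum_{j=1}^{b}\mu_{ij} = \frac{1}{a}\sum_{i=1}^{a}\mu_{i\cdot} = \frac{1}{b}\sum_{j=1}^{b}\mu_{\cdot j}$$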
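As a small numerical illustration (not part of the original slides), the factor-level means of the first table of population means can be computed in base R with `rowMeans()`, `colMeans()` and `mean()`. The matrix layout (rows = parity i, columns = dose j) and the variable names are our own choices.

```r
# Population means mu_ij of the first example (rows: parity i, columns: dose j)
mu <- matrix(c(75, 42, 3,
               75, 42, 3),
             nrow = 2, byrow = TRUE,
             dimnames = list(parity = c("heifer", "multiparous"),
                             dose   = c("low", "medium", "high")))

row_means  <- rowMeans(mu)   # mu_i. : sum over j of mu_ij, divided by b
col_means  <- colMeans(mu)   # mu_.j : sum over i of mu_ij, divided by a
grand_mean <- mean(mu)       # mu_.. : sum of all mu_ij, divided by ab

row_means    # heifer 40, multiparous 40
col_means    # low 75, medium 42, high 3
# mu_.. equals both the mean of the row means and the mean of the column means
c(grand_mean, mean(row_means), mean(col_means))   # 40 40 40
```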


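Translation to factor effects

Main effect of level i of factor A: αi = µi. − µ..
Main effect of level j of factor B: βj = µ.j − µ..

As
$$\mu_{\cdot\cdot} = \frac{1}{a}\sum_{i=1}^{a}\mu_{i\cdot} \;\Rightarrow\; \sum_{i=1}^{a}\alpha_i = 0, \qquad \mu_{\cdot\cdot} = \frac{1}{b}\sum_{j=1}^{b}\mu_{\cdot j} \;\Rightarrow\; \sum_{j=1}^{b}\beta_j = 0$$
⇒ Sum restrictions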
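For the same first table of population means, the main effects follow directly from the definitions αi = µi. − µ.. and βj = µ.j − µ.., and the sum restrictions can be checked numerically. This is a minimal base R sketch; the variable names are our own.

```r
# Population means of the first example (rows: heifer, multiparous; columns: low, medium, high dose)
mu <- matrix(c(75, 42, 3,
               75, 42, 3), nrow = 2, byrow = TRUE)

grand_mean <- mean(mu)
alpha <- rowMeans(mu) - grand_mean   # main effects of factor A (parity)
beta  <- colMeans(mu) - grand_mean   # main effects of factor B (dose)

alpha                      # 0 0
beta                       # 35 2 -37
c(sum(alpha), sum(beta))   # 0 0: both sum restrictions hold
```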


What are α1, α2, β1, β2 and β3 for the population means in the first table above?

α1 = µ1. − µ.. = 40 − 40 = 0, α2 = µ2. − µ.. = 40 − 40 = 0
β1 = µ.1 − µ.. = 75 − 40 = 35, β2 = µ.2 − µ.. = 42 − 40 = 2, β3 = µ.3 − µ.. = 3 − 40 = −37


Additive factor effects

The factor effects are additive if the factor effects alone suffice to obtain the population means, i.e.,
µij = µ.. + αi + βj
This corresponds to
  → absence of interaction
  → the effect of one factor does not depend on the level of the other factor


Are the two factors additive for the population means in the first table above?

[Figure: milk reduction (%) versus parity (heifer, multiparous), one profile per inoculation dose (high, medium, low)]

No interaction effect: the factor effects are additive. Since µ1j = µ2j at every dose, parity has no effect at any dose level, and the dose effect is the same for both parity groups.


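Are the two factors additive in this example?

| Factor A: Parity \ Factor B: Inoculation dose | j = 1, low | j = 2, medium | j = 3, high | Row mean |
|---|---|---|---|---|
| i = 1, heifer | 64 (µ11) | 36 (µ12) | 5 (µ13) | 35 (µ1.) |
| i = 2, multiparous | 74 (µ21) | 46 (µ22) | 15 (µ23) | 45 (µ2.) |
| Column mean | 69 (µ.1) | 41 (µ.2) | 10 (µ.3) | 40 (µ..) |

[Figure: milk reduction (%) versus inoculation dose (low, medium, high), one profile per parity (heifer, multiparous)]

Yes: at every dose the reduction of multiparous cows is 10 percentage points higher than that of heifers, so the two profiles are parallel and µij = µ.. + αi + βj, with α1 = −5, α2 = 5, β1 = 29, β2 = 1 and β3 = −30.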
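Additivity can also be checked numerically: build the means implied by the additive model µ.. + αi + βj and subtract them from the µij. This base R sketch is our own illustration; `outer()` forms all sums αi + βj, and the names are hypothetical.

```r
# Population means of this example (rows: heifer, multiparous; columns: low, medium, high dose)
mu <- matrix(c(64, 36,  5,
               74, 46, 15), nrow = 2, byrow = TRUE)

grand_mean <- mean(mu)                 # mu_..
alpha <- rowMeans(mu) - grand_mean     # parity effects
beta  <- colMeans(mu) - grand_mean     # dose effects

# Means implied by the additive model mu_.. + alpha_i + beta_j
additive <- grand_mean + outer(alpha, beta, "+")

# Part of mu_ij not explained by the additive model (the interaction terms)
interaction <- mu - additive
alpha         # -5  5
beta          # 29  1 -30
interaction   # 2 x 3 matrix of zeros: the factor effects are additive
```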


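Interacting factors

If the two factors interact,
  → µij ≠ µ.. + αi + βj
  → an extra term is required to describe the population means µij:
$$(\alpha\beta)_{ij} = \mu_{ij} - (\mu_{\cdot\cdot} + \alpha_i + \beta_j)$$
with restrictions
$$\sum_{i=1}^{a}(\alpha\beta)_{ij} = 0,\ j = 1,\dots,b; \qquad \sum_{j=1}^{b}(\alpha\beta)_{ij} = 0,\ i = 1,\dots,a \qquad\Rightarrow\qquad \sum_{i=1}^{a}\sum_{j=1}^{b}(\alpha\beta)_{ij} = 0$$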


Are the two factors additive in this example?

| Factor A: Parity \ Factor B: Inoculation dose | j = 1, low | j = 2, medium | j = 3, high | Row mean |
|---|---|---|---|---|
| i = 1, heifer | 60 (µ11) | 31 (µ12) | 5 (µ13) | 32 (µ1.) |
| i = 2, multiparous | 86 (µ21) | 45 (µ22) | 7 (µ23) | 46 (µ2.) |
| Column mean | 73 (µ.1) | 38 (µ.2) | 6 (µ.3) | 39 (µ..) |

[Figure: milk reduction (%) versus inoculation dose (low, medium, high), one profile per parity (heifer, multiparous)]

What is, for example, (αβ)11?


We first calculate the main effects:

α1 = 32 − 39 = −7, α2 = 46 − 39 = 7
β1 = 73 − 39 = 34, β2 = 38 − 39 = −1, β3 = 6 − 39 = −33

and next the interaction effects:

(αβ)11 = 60 − (39 − 7 + 34) = 60 − 66 = −6
(αβ)21 = 86 − (39 + 7 + 34) = 86 − 80 = 6
(αβ)12 = 31 − (39 − 7 − 1) = 31 − 31 = 0
(αβ)22 = 45 − (39 + 7 − 1) = 45 − 45 = 0
(αβ)13 = 5 − (39 − 7 − 33) = 5 + 1 = 6
(αβ)23 = 7 − (39 + 7 − 33) = 7 − 13 = −6

The interaction terms sum to zero over each row and over each column, as required by the restrictions.
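The hand calculation above can be reproduced with a few lines of base R. This is a sketch of our own (the matrix layout and names are assumptions), applying (αβ)ij = µij − (µ.. + αi + βj) to the whole table at once.

```r
# Population means of this example (rows: heifer, multiparous; columns: low, medium, high dose)
mu <- matrix(c(60, 31, 5,
               86, 45, 7), nrow = 2, byrow = TRUE)

grand_mean <- mean(mu)               # 39
alpha <- rowMeans(mu) - grand_mean   # -7  7
beta  <- colMeans(mu) - grand_mean   # 34 -1 -33

# (alpha beta)_ij = mu_ij - (mu_.. + alpha_i + beta_j)
ab <- mu - (grand_mean + outer(alpha, beta, "+"))
ab            # rows: -6 0 6 and 6 0 -6
rowSums(ab)   # 0 0: restriction summing over j
colSums(ab)   # 0 0 0: restriction summing over i
```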


Find the parameters through R

```r
parity <- as.factor(rep(c(1, 2), 3)); dose <- as.factor(c(1, 1, 2, 2, 3, 3))
reduction <- c(60, 86, 31, 45, 5, 7)
# Take care of overparameterisation through the sum restriction
options(contrasts = rep("contr.sum", 2))
# Fit the model with main effects for parity and dose and their interaction
lm(reduction ~ parity * dose)
## 
## Call:
## lm(formula = reduction ~ parity * dose)
## 
## Coefficients:
##   (Intercept)        parity1          dose1          dose2  
##     3.900e+01     -7.000e+00      3.400e+01     -1.000e+00  
## parity1:dose1  parity1:dose2  
##    -6.000e+00      9.526e-15  
```

With the sum restriction, parity1 = α1 = −7, dose1 = β1 = 34, dose2 = β2 = −1, parity1:dose1 = (αβ)11 = −6 and parity1:dose2 = (αβ)12 = 0 (up to rounding error); the remaining parameters follow from the sum restrictions.

Types of interaction

Interactions come in different forms, e.g. interaction without main effects:

[Figure: milk reduction (%) versus inoculation dose (low, medium, high), one profile per parity (heifer, multiparous)]

  • Always test for interaction
  • When interaction is present, report the effect of a factor separately for each level of the other factor


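Negligible interaction

| Factor A: Parity \ Factor B: Inoculation dose | j = 1, low | j = 2, medium | j = 3, high | Row mean |
|---|---|---|---|---|
| i = 1, heifer | 64 (µ11) | 36 (µ12) | 5 (µ13) | 35 (µ1.) |
| i = 2, multiparous | 75 (µ21) | 46 (µ22) | 16 (µ23) | 45.7 (µ2.) |
| Column mean | 69.5 (µ.1) | 41 (µ.2) | 10.5 (µ.3) | 40.3 (µ..) |

[Figure: milk reduction (%) versus inoculation dose (low, medium, high), one profile per parity (heifer, multiparous)]

(µ21 − µ11) = (µ23 − µ13) = 11% and (µ22 − µ12) = 10%, close to the difference between the parity means, (µ2. − µ1.) = 10.7%: the parity effect is nearly the same at every dose, so the interaction is negligible.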


Now we add a random error term to obtain a statistical model. The cell means model is
$$Y_{ijk} = \mu_{ij} + \varepsilon_{ijk}, \qquad i = 1,\dots,a;\ j = 1,\dots,b;\ k = 1,\dots,n_{ij}$$
with
  • µij the population mean for level i of factor A and level j of factor B
  • εijk the error term, independent and N(0, σ²)


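The factor effects model is more relevant. Using sum restrictions we have
$$Y_{ijk} = \mu_{\cdot\cdot} + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$
with
$$\mu_{\cdot\cdot} = \frac{1}{ab}\sum_{i=1}^{a}\sum_{j=1}^{b}\mu_{ij} \quad \text{the overall mean (constant)}$$
$$\alpha_i = \mu_{i\cdot} - \mu_{\cdot\cdot} \quad \text{main effect of level } i \text{ of factor A (constant), with restriction } \sum_{i=1}^{a}\alpha_i = 0$$
$$\beta_j = \mu_{\cdot j} - \mu_{\cdot\cdot} \quad \text{main effect of level } j \text{ of factor B (constant), with restriction } \sum_{j=1}^{b}\beta_j = 0$$
$$(\alpha\beta)_{ij} = \mu_{ij} - \mu_{i\cdot} - \mu_{\cdot j} + \mu_{\cdot\cdot} \quad \text{interaction terms, with restrictions } \sum_{i=1}^{a}(\alpha\beta)_{ij} = 0,\ \sum_{j=1}^{b}(\alpha\beta)_{ij} = 0$$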


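Parameter estimates

Estimation is based on the least squares criterion
$$LS(\mu_{\cdot\cdot}, \ldots, (\alpha\beta)_{ab}) = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}\left(Y_{ijk} - \mu_{\cdot\cdot} - \alpha_i - \beta_j - (\alpha\beta)_{ij}\right)^2 = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}\varepsilon_{ijk}^2$$
subject to the sum restrictions
$$\sum_{i=1}^{a}\alpha_i = 0, \qquad \sum_{j=1}^{b}\beta_j = 0, \qquad \sum_{i=1}^{a}(\alpha\beta)_{ij} = 0, \qquad \sum_{j=1}^{b}(\alpha\beta)_{ij} = 0$$
Minimising LS(.) leads to the parameter estimates
$$\hat\mu_{\cdot\cdot} = \bar Y_{\cdot\cdot\cdot}, \qquad \hat\alpha_i = \bar Y_{i\cdot\cdot} - \bar Y_{\cdot\cdot\cdot}, \qquad \hat\beta_j = \bar Y_{\cdot j\cdot} - \bar Y_{\cdot\cdot\cdot}, \qquad \widehat{(\alpha\beta)}_{ij} = \bar Y_{ij\cdot} - \bar Y_{i\cdot\cdot} - \bar Y_{\cdot j\cdot} + \bar Y_{\cdot\cdot\cdot}$$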


Manual calculation of the parameter estimates for example 1

We first calculate the means of all treatment combinations, and the averages of these:

| Factor A: Breed \ Factor B: Drug | j = 1, Berenil | j = 2, Samorin | Row mean |
|---|---|---|---|
| i = 1, Boran | 7.100 (ȳ11.) | 13.067 (ȳ12.) | 10.083 (ȳ1..) |
| i = 2, Holstein | 8.000 (ȳ21.) | 12.933 (ȳ22.) | 10.467 (ȳ2..) |
| Column mean | 7.55 (ȳ.1.) | 13.00 (ȳ.2.) | 10.275 (ȳ...) |

These averages can then be used to calculate the parameter estimates:
$$\hat\mu_{\cdot\cdot} = \bar y_{\cdot\cdot\cdot} = 10.275$$
$$\hat\alpha_1 = \bar y_{1\cdot\cdot} - \bar y_{\cdot\cdot\cdot} = 10.083 - 10.275 = -0.192$$
$$\hat\beta_1 = \bar y_{\cdot 1\cdot} - \bar y_{\cdot\cdot\cdot} = 7.55 - 10.275 = -2.725$$
$$\widehat{(\alpha\beta)}_{11} = \bar y_{11\cdot} - \bar y_{1\cdot\cdot} - \bar y_{\cdot 1\cdot} + \bar y_{\cdot\cdot\cdot} = 7.1 - 10.083 - 7.55 + 10.275 = -0.258$$
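The same estimates can be obtained from the raw PCV differences of Example 1 without a model fit. This base R sketch rebuilds the data from the table (the data frame mirrors the column names used in the R code of this part, but its construction is our own) and applies the least squares formulas for balanced data.

```r
# PCV differences of Example 1 (3 cows per breed-by-drug combination)
tryps <- data.frame(
  Breed  = rep(c("Boran", "Holstein"), each = 6),
  Drug   = rep(rep(c("Berenil", "Samorin"), each = 3), times = 2),
  PCVdif = c(7.9, 7.8, 5.6, 13.8, 11.9, 13.5,
             7.0, 9.4, 7.6, 15.9, 12.5, 10.4)
)

# 2 x 2 table of cell means ybar_ij. (rows: Breed, columns: Drug)
cell <- with(tryps, tapply(PCVdif, list(Breed, Drug), mean))

grand    <- mean(cell)                # mu_.. hat = ybar_... (balanced data)
row_mean <- rowMeans(cell)            # ybar_i..
col_mean <- colMeans(cell)            # ybar_.j.
alpha    <- row_mean - grand          # alpha_i hat
beta     <- col_mean - grand          # beta_j hat
ab       <- cell - outer(row_mean, col_mean, "+") + grand   # (alpha beta)_ij hat

round(c(mu = grand, alpha1 = alpha[["Boran"]], beta1 = beta[["Berenil"]],
        ab11 = ab["Boran", "Berenil"]), 4)
```

These should agree with the manual calculation and with the R output under the sum restriction.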


Parameter estimates for example 1 using R with sum restriction

```r
setwd("c:/users/lduchate/docs/OC/onderwijs/MOOC/")
tryps <- read.table("tryps.csv", header = TRUE, sep = ";")
options(contrasts = rep("contr.sum", 2))
lm(PCVdif ~ Breed * Drug, data = tryps)
## 
## Call:
## lm(formula = PCVdif ~ Breed * Drug, data = tryps)
## 
## Coefficients:
##  (Intercept)        Breed1         Drug1  Breed1:Drug1  
##      10.2750       -0.1917       -2.7250       -0.2583  
```

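Analysis of variance for two-way data

The starting point is the deviation of each observation from the overall mean, which is split as
$$Y_{ijk} - \bar Y_{\cdot\cdot\cdot} = \left(\bar Y_{ij\cdot} - \bar Y_{\cdot\cdot\cdot}\right) + \left(Y_{ijk} - \bar Y_{ij\cdot}\right)$$
Square both sides and sum over all observations. The cross-products on the right-hand side sum to zero for balanced data (without proof), so that
$$\underbrace{\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}\left(Y_{ijk} - \bar Y_{\cdot\cdot\cdot}\right)^2}_{SS_{TOT}} = \underbrace{n\sum_{i=1}^{a}\sum_{j=1}^{b}\left(\bar Y_{ij\cdot} - \bar Y_{\cdot\cdot\cdot}\right)^2}_{SS_{TRT}} + \underbrace{\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n}\left(Y_{ijk} - \bar Y_{ij\cdot}\right)^2}_{SS_{ERR}}$$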


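The deviation of the cell mean Ȳij. from Ȳ... is further split into
$$\bar Y_{ij\cdot} - \bar Y_{\cdot\cdot\cdot} = \left(\bar Y_{i\cdot\cdot} - \bar Y_{\cdot\cdot\cdot}\right) + \left(\bar Y_{\cdot j\cdot} - \bar Y_{\cdot\cdot\cdot}\right) + \left(\bar Y_{ij\cdot} - \bar Y_{i\cdot\cdot} - \bar Y_{\cdot j\cdot} + \bar Y_{\cdot\cdot\cdot}\right)$$
Square both sides and sum over all observations. The cross-products on the right-hand side are zero for balanced data (without proof), so that
$$\underbrace{n\sum_{i=1}^{a}\sum_{j=1}^{b}\left(\bar Y_{ij\cdot} - \bar Y_{\cdot\cdot\cdot}\right)^2}_{SS_{TRT}} = \underbrace{nb\sum_{i=1}^{a}\left(\bar Y_{i\cdot\cdot} - \bar Y_{\cdot\cdot\cdot}\right)^2}_{SS_A} + \underbrace{na\sum_{j=1}^{b}\left(\bar Y_{\cdot j\cdot} - \bar Y_{\cdot\cdot\cdot}\right)^2}_{SS_B} + \underbrace{n\sum_{i=1}^{a}\sum_{j=1}^{b}\left(\bar Y_{ij\cdot} - \bar Y_{i\cdot\cdot} - \bar Y_{\cdot j\cdot} + \bar Y_{\cdot\cdot\cdot}\right)^2}_{SS_{AB}}$$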
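To see the decomposition at work, the sketch below computes each sum of squares for Example 1 directly from these formulas (data rebuilt from the Example 1 table; names are our own) and checks that SSTOT = SSA + SSB + SSAB + SSERR. The values should agree with the Sum Sq column of the ANOVA table for Example 1.

```r
# Example 1 data (balanced: n = 3 cows per cell, a = 2 breeds, b = 2 drugs)
tryps <- data.frame(
  Breed  = rep(c("Boran", "Holstein"), each = 6),
  Drug   = rep(rep(c("Berenil", "Samorin"), each = 3), times = 2),
  PCVdif = c(7.9, 7.8, 5.6, 13.8, 11.9, 13.5,
             7.0, 9.4, 7.6, 15.9, 12.5, 10.4)
)
n <- 3; a <- 2; b <- 2

y        <- tryps$PCVdif
cell     <- with(tryps, tapply(PCVdif, list(Breed, Drug), mean))  # ybar_ij.
row_mean <- rowMeans(cell)                                         # ybar_i..
col_mean <- colMeans(cell)                                         # ybar_.j.
grand    <- mean(y)                                                # ybar_...
cell_fit <- with(tryps, ave(PCVdif, Breed, Drug))  # each observation's own cell mean

SS_TOT <- sum((y - grand)^2)
SS_A   <- n * b * sum((row_mean - grand)^2)
SS_B   <- n * a * sum((col_mean - grand)^2)
SS_AB  <- n * sum((cell - outer(row_mean, col_mean, "+") + grand)^2)
SS_ERR <- sum((y - cell_fit)^2)

round(c(SS_A = SS_A, SS_B = SS_B, SS_AB = SS_AB, SS_ERR = SS_ERR), 3)
# For these balanced data the four parts add up to the total sum of squares
all.equal(SS_TOT, SS_A + SS_B + SS_AB + SS_ERR)
```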


Expected mean sums of squares

A mean sum of squares is obtained by dividing an SS by its number of independent terms. As
$$\sum_{k=1}^{n}\left(Y_{ijk} - \bar Y_{ij\cdot}\right) = 0$$
for each of the ab treatment combinations, it follows that
$$MS_{ERR} = \frac{SS_{ERR}}{(n-1)ab}$$
As
$$\sum_{i=1}^{a}\left(\bar Y_{i\cdot\cdot} - \bar Y_{\cdot\cdot\cdot}\right) = 0$$
it follows that
$$MS_A = \frac{SS_A}{a-1}$$


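As
$$\sum_{j=1}^{b}\left(\bar Y_{\cdot j\cdot} - \bar Y_{\cdot\cdot\cdot}\right) = 0$$
it follows that
$$MS_B = \frac{SS_B}{b-1}$$
Finally, the SS for the interaction contains ab terms, but they are subject to a + b − 1 independent restrictions (the interaction terms sum to zero over each row and each column), which leaves ab − (a + b − 1) = (a − 1)(b − 1) independent terms and thus
$$MS_{AB} = \frac{SS_{AB}}{(a-1)(b-1)}$$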


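The mean sums of squares have the following expected values:
$$E(MS_{ERR}) = \sigma^2$$
$$E(MS_A) = \sigma^2 + \frac{nb\sum_{i=1}^{a}\alpha_i^2}{a-1}$$
$$E(MS_B) = \sigma^2 + \frac{na\sum_{j=1}^{b}\beta_j^2}{b-1}$$
$$E(MS_{AB}) = \sigma^2 + \frac{n\sum_{i=1}^{a}\sum_{j=1}^{b}(\alpha\beta)_{ij}^2}{(a-1)(b-1)}$$
and are thus appropriate to construct F-tests: each effect mean square has expectation σ², the same as MSERR, when all the corresponding effects are zero, and a larger expectation otherwise.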


F-tests

We first test the interaction. Under the null hypothesis H0: all (αβ)ij = 0, we have E(MSAB) = E(MSERR), and
$$F_{AB} = \frac{MS_{AB}}{MS_{ERR}} \overset{H_0}{\sim} F_{(a-1)(b-1),\,(n-1)ab}$$
i.e., FAB is F-distributed with (a − 1)(b − 1) and (n − 1)ab degrees of freedom. The P-value is P = P(FAB ≥ fAB), with fAB the observed value of FAB.


Main effects can be tested in the same manner, e.g., for factor A. Under the null hypothesis H0: all αi = 0, it follows that E(MSA) = E(MSERR), and
$$F_A = \frac{MS_A}{MS_{ERR}} \overset{H_0}{\sim} F_{a-1,\,(n-1)ab}$$
i.e., FA is F-distributed with (a − 1) and (n − 1)ab degrees of freedom. The P-value is P(FA ≥ fA).
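A hand-rolled version of these F-tests can help to see where each entry of an ANOVA table comes from. This base R sketch (our own, with the Example 1 data rebuilt from the table) computes the mean squares with the degrees of freedom derived above and takes upper-tail probabilities from `pf()`; the P-values should match those of `anova()` for Example 1.

```r
tryps <- data.frame(
  Breed  = rep(c("Boran", "Holstein"), each = 6),
  Drug   = rep(rep(c("Berenil", "Samorin"), each = 3), times = 2),
  PCVdif = c(7.9, 7.8, 5.6, 13.8, 11.9, 13.5,
             7.0, 9.4, 7.6, 15.9, 12.5, 10.4)
)
n <- 3; a <- 2; b <- 2

cell     <- with(tryps, tapply(PCVdif, list(Breed, Drug), mean))
row_mean <- rowMeans(cell)
col_mean <- colMeans(cell)
grand    <- mean(cell)

# Sums of squares of the effects (balanced data) and their degrees of freedom
ss  <- c(A  = n * b * sum((row_mean - grand)^2),
         B  = n * a * sum((col_mean - grand)^2),
         AB = n * sum((cell - outer(row_mean, col_mean, "+") + grand)^2))
dof <- c(A = a - 1, B = b - 1, AB = (a - 1) * (b - 1))

# Error mean square with (n - 1)ab degrees of freedom
dof_err <- (n - 1) * a * b
ms_err  <- sum((tryps$PCVdif - with(tryps, ave(PCVdif, Breed, Drug)))^2) / dof_err

# F statistics and P-values P(F >= f) under H0
F_stat  <- (ss / dof) / ms_err
p_value <- pf(F_stat, dof, dof_err, lower.tail = FALSE)
round(cbind(Df = dof, F = F_stat, P = p_value), 5)
```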


Example 1: Effect of breed and drug on PCV of cows with trypanosomosis

```r
anova(lm(PCVdif ~ Breed * Drug, data = tryps))
## Analysis of Variance Table
## 
## Response: PCVdif
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## Breed       1  0.441   0.441   0.147 0.7114204    
## Drug        1 89.107  89.107  29.711 0.0006082 ***
## Breed:Drug  1  0.801   0.801   0.267 0.6193164    
## Residuals   8 23.993   2.999                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```


Example 2: Effect of parity and dose on milk reduction

```r
setwd("c://users//lduchate//docs//OC//onderwijs//MOOC")
mast <- read.table("mastitis.csv", header = TRUE, skip = 0, sep = ";")
mast.res <- lm(propreduction ~ parity * dose, data = mast)
mast.anova <- anova(mast.res)
mast.anova
## Analysis of Variance Table
## 
## Response: propreduction
##             Df Sum Sq Mean Sq F value    Pr(>F)    
## parity       1  626.7   626.7  36.790 0.0009111 ***
## dose         2 8757.8  4378.9 257.065 1.535e-06 ***
## parity:dose  2  384.9   192.5  11.299 0.0092356 ** 
## Residuals    6  102.2    17.0                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
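The file mastitis.csv is not distributed with these notes. As a hedged, reproducible alternative, the sketch below rebuilds a data frame from the Example 2 table (reductions rounded to 2 decimals, so results may differ slightly from the original file) and refits the same model. The column names follow the R call above, but the level labels and the data frame itself are our assumptions.

```r
# Example 2 rebuilt from the table (2 cows per parity-by-dose combination)
mast <- data.frame(
  parity = rep(c("heifer", "multiparous"), each = 6),
  dose   = rep(rep(c("high", "medium", "low"), each = 2), times = 2),
  propreduction = c(6.79, 3.87, 30.03, 38.08, 53.67, 62.04,
                    6.84, 8.31, 47.12, 44.14, 83.86, 90.93)
)

# Same two-way model with interaction; lm() converts the character columns to factors
mast.anova <- anova(lm(propreduction ~ parity * dose, data = mast))
mast.anova
```

Because the design is balanced, the sums of squares do not depend on the contrast coding and should match the table above up to rounding.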


Testing specific hypotheses for two-way data

Comparisons in the absence of interaction

  • Compare the levels of one factor averaged over the levels of the other factor
  • We define hypotheses in terms of the µi.'s and µ.j's or, alternatively, in terms of the αi's and βj's, as µi. = µ.. + αi and µ.j = µ.. + βj
  • Unbiased estimators for the µi.'s and µ.j's are the Ȳi..'s and Ȳ.j.'s, so we require the distributional properties of the estimators Ȳi.. and Ȳ.j.

In the remainder we only discuss the development of specific hypothesis tests for the µi.'s based on the Ȳi..'s; the same holds for specific hypothesis tests for the µ.j's based on the Ȳ.j.'s.


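The distributional property of the estimator Ȳi.. is
$$\bar Y_{i\cdot\cdot} \sim N\left(\mu_{i\cdot},\ \frac{\sigma^2}{bn}\right)$$
Interest is in a contrast of the form
$$L = \sum_{i=1}^{a} c_i\,\mu_{i\cdot} \quad \text{with} \quad \sum_{i=1}^{a} c_i = 0,$$
which is estimated by
$$\hat L = \sum_{i=1}^{a} c_i\,\bar Y_{i\cdot\cdot}$$
Since the Ȳi.. are based on disjoint sets of observations and are therefore independent, the distributional property of the contrast is
$$\hat L \sim N\left(\sum_{i=1}^{a} c_i\,\mu_{i\cdot},\ \frac{\sigma^2}{bn}\sum_{i=1}^{a} c_i^2\right)$$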


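Replacing σ² with its estimator MSERR, we get
$$\frac{\hat L - L}{\sqrt{\frac{MS_{ERR}}{bn}\sum_{i=1}^{a} c_i^2}} \sim t_{(n-1)ab}$$
The 95% confidence interval for L is thus given by
$$\hat L \pm t_{(n-1)ab;0.025}\sqrt{\frac{MS_{ERR}}{bn}\sum_{i=1}^{a} c_i^2}$$
where t(n−1)ab;0.025 denotes the upper 2.5% point of the t-distribution with (n − 1)ab degrees of freedom.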


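The statistic for testing the hypothesis H0: L = c versus Ha: L ≠ c is given by
$$T_L = \frac{\hat L - c}{\sqrt{\frac{MS_{ERR}}{bn}\sum_{i=1}^{a} c_i^2}}$$
which follows a t(n−1)ab distribution under H0. The P-value is given by 2P(TL ≥ |tL|), with tL the observed value of TL.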


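Example 1: Effect of breed on PCV of cows with trypanosomosis

Interest is in the contrast
$$L = \sum_{i=1}^{2} c_i\,\mu_{i\cdot} = 1 \times \mu_{1\cdot} - 1 \times \mu_{2\cdot}$$
We can estimate µ1. by
$$\bar y_{1\cdot\cdot} = \sum_{j=1}^{2}\sum_{k=1}^{3} y_{1jk}\,/\,(bn) = 10.0833333$$
and µ2. by
$$\bar y_{2\cdot\cdot} = \sum_{j=1}^{2}\sum_{k=1}^{3} y_{2jk}\,/\,(bn) = 10.4666667$$
The contrast L is thus estimated by
$$\hat L = \bar y_{1\cdot\cdot} - \bar y_{2\cdot\cdot} = -0.3833333$$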


In the expression of the 95% confidence interval we need n = 3, b = 2, |t(n−1)ab;0.025| = |t8;0.025| = 2.3060041 and MSERR = 2.9991667. Filling in the numbers, the 95% confidence interval of L is
$$\hat L \pm t_{(n-1)ab;0.025}\sqrt{\frac{MS_{ERR}}{bn}\sum_{i=1}^{a} c_i^2} = -0.3833333 \pm 2.3060041\sqrt{\frac{2.9991667}{6} \times 2} = [-2.6890172;\ 1.9223505]$$


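The P-value is given by
$$2 \times P\left(T_L \ge \left|\frac{\hat L - 0}{\sqrt{\frac{MS_{ERR}}{bn}\sum_{i=1}^{a} c_i^2}}\right|\right) \qquad \text{with } T_L \sim t_{(n-1)ab} = t_8$$
With n = 3, a = 2, b = 2 and MSERR = 2.9991667:
$$\text{P-value} = 2 \times P\left(T_L \ge \left|\frac{-0.3833333}{\sqrt{2 \times (2.9991667/6)}}\right|\right) = 2 \times P\left(T_L \ge |-0.3833866|\right) = 0.7114204$$
Because breed has only two levels, this t-test is equivalent to the F-test for the breed main effect (tL² = 0.147 = fA), which is why the P-value equals the one in the ANOVA table.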
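The confidence interval and P-value above can be reproduced in a few lines of base R from the summary numbers quoted in the text. This is a sketch under those inputs (the names are our own); `qt(0.975, 8)` gives the upper 2.5% point used above.

```r
# Summary quantities for Example 1 taken from the text
n <- 3; a <- 2; b <- 2
ms_err    <- 2.9991667                                     # MS_ERR from the ANOVA table
row_means <- c(Boran = 10.0833333, Holstein = 10.4666667)  # ybar_1.. and ybar_2..
coefs     <- c(1, -1)                                      # contrast coefficients c_i (sum to zero)

dof_err <- (n - 1) * a * b                        # 8
L_hat   <- sum(coefs * row_means)                 # estimate of mu_1. - mu_2.
se      <- sqrt(ms_err / (b * n) * sum(coefs^2))  # standard error of L_hat
t_crit  <- qt(0.975, dof_err)                     # upper 2.5% point of t_8

ci  <- L_hat + c(-1, 1) * t_crit * se             # 95% confidence interval
t_L <- (L_hat - 0) / se                           # test statistic for H0: L = 0
p   <- 2 * pt(abs(t_L), dof_err, lower.tail = FALSE)

round(c(L_hat = L_hat, lower = ci[1], upper = ci[2], t = t_L, p = p), 4)
```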


The corresponding R code

```r
library(multcomp)
tryps$bd <- with(tryps, interaction(Breed, Drug))
tryps.cell <- lm(PCVdif ~ bd - 1, data = tryps)
trypsPlanned <- glht(tryps.cell, linfct = mcp(bd = c(
  "0.5*Boran.Berenil+0.5*Boran.Samorin-0.5*Holstein.Berenil-0.5*Holstein.Samorin = 0")))
confint(trypsPlanned)
## 
##   Simultaneous Confidence Intervals
## 
## Multiple Comparisons of Means: User-defined Contrasts
## 
## 
## Fit: lm(formula = PCVdif ~ bd - 1, data = tryps)
## 
## Quantile = 2.306
## 95% family-wise confidence level
## 
## 
## Linear Hypotheses:
##    Estimate
## 0.5 * Boran.Berenil + 0.5 * Boran.Samorin - 0.5 * Holstein.Berenil - 0.5 * Holstein.Samorin == 0 -0.3833
##    lwr
## 0.5 * Boran.Berenil + 0.5 * Boran.Samorin - 0.5 * Holstein.Berenil - 0.5 * Holstein.Samorin == 0 -2.6890
##    upr
## 0.5 * Boran.Berenil + 0.5 * Boran.Samorin - 0.5 * Holstein.Berenil - 0.5 * Holstein.Samorin == 0 1.9224
```

SLIDE 57

```r
summary(trypsPlanned)
##
##   Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: User-defined Contrasts
##
##
## Fit: lm(formula = PCVdif ~ bd - 1, data = tryps)
##
## Linear Hypotheses:
##                                                                                                   Estimate Std. Error t value Pr(>|t|)
## 0.5 * Boran.Berenil + 0.5 * Boran.Samorin - 0.5 * Holstein.Berenil - 0.5 * Holstein.Samorin == 0  -0.3833     0.9999  -0.383    0.711
## (Adjusted p values reported -- single-step method)
```

SLIDE 58

Comparisons in the presence of interaction

- Compare levels of one factor at a fixed level of the other factor.
- We define hypotheses in terms of the $\mu_{ij}$'s.
- Unbiased estimators for the $\mu_{ij}$'s are the $\bar{Y}_{ij.}$'s.
- We thus require the distributional properties of the estimators $\bar{Y}_{ij.}$.

SLIDE 59

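The distribution of the estimator $\bar{Y}_{ij.}$ is

$$\bar{Y}_{ij.} \sim N\left(\mu_{ij},\ \sigma^2/n\right)$$

Interest is in a contrast of the form

$$L = \sum_{i=1}^{a}\sum_{j=1}^{b} c_{ij}\mu_{ij} \quad \text{with} \quad \sum_{i=1}^{a}\sum_{j=1}^{b} c_{ij} = 0,$$

which is estimated by

$$\hat{L} = \sum_{i=1}^{a}\sum_{j=1}^{b} c_{ij}\bar{Y}_{ij.}$$

Because the cell means $\bar{Y}_{ij.}$ are independent, the distribution of the estimated contrast is

$$\hat{L} \sim N\left(\sum_{i=1}^{a}\sum_{j=1}^{b} c_{ij}\mu_{ij},\ \frac{\sigma^2}{n}\sum_{i=1}^{a}\sum_{j=1}^{b} c_{ij}^2\right)$$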

SLIDE 60

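Replacing $\sigma^2$ with its estimator $MS_{ERR}$, we get

$$\frac{\hat{L} - L}{\sqrt{(MS_{ERR}/n)\sum_{i=1}^{a}\sum_{j=1}^{b} c_{ij}^2}} \sim t_{(n-1)ab}$$

The 95% confidence interval for $L$ is thus given by

$$\hat{L} \pm t_{(n-1)ab;0.025}\sqrt{(MS_{ERR}/n)\sum_{i=1}^{a}\sum_{j=1}^{b} c_{ij}^2}$$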
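The variance term $\frac{\sigma^2}{n}\sum_{i}\sum_{j} c_{ij}^2$ behind this interval can be checked by simulation. The Python sketch below is a hypothetical illustration: it assumes a 2 × 3 layout with $n = 2$ replicates, $\sigma = 1$ and made-up cell means, and uses the coefficients of the contrast $\mu_{11} - \mu_{21}$, for which the theoretical variance is $(1/2)(1^2 + (-1)^2) = 1$.

```python
import random
import statistics

def simulate_contrast(mu, coefs, sigma, n, reps, seed=1):
    """Monte Carlo mean and variance of L_hat = sum_ij c_ij * Ybar_ij.

    mu and coefs are a x b nested lists; each cell mean Ybar_ij is the average
    of n independent N(mu_ij, sigma^2) draws.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        l_hat = 0.0
        for mu_row, c_row in zip(mu, coefs):
            for mu_ij, c_ij in zip(mu_row, c_row):
                ybar = statistics.fmean(rng.gauss(mu_ij, sigma) for _ in range(n))
                l_hat += c_ij * ybar
        estimates.append(l_hat)
    return statistics.fmean(estimates), statistics.variance(estimates)

# Hypothetical 2 x 3 cell means (rows: factor A, columns: factor B).
mu = [[5.0, 35.0, 60.0],
      [8.0, 45.0, 85.0]]
coefs = [[1, 0, 0],
         [-1, 0, 0]]  # L = mu_11 - mu_21; the coefficients sum to zero
sigma, n = 1.0, 2
theory_var = sigma ** 2 / n * sum(c * c for row in coefs for c in row)
sim_mean, sim_var = simulate_contrast(mu, coefs, sigma, n, reps=20000)
print(theory_var, round(sim_mean, 2), round(sim_var, 2))
```

The simulated mean should be close to $\mu_{11} - \mu_{21} = -3$ and the simulated variance close to the theoretical value 1.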

SLIDE 61

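The statistic for testing the hypothesis $H_0: L = c$ versus $H_a: L \ne c$ is given by

$$T_L = \frac{\hat{L} - c}{\sqrt{(MS_{ERR}/n)\sum_{i=1}^{a}\sum_{j=1}^{b} c_{ij}^2}}$$

The P-value is given by $2P(T_L \ge |t_L|)$, where $t_L$ is the observed value of $T_L$ and $T_L \sim t_{(n-1)ab}$ under $H_0$.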

SLIDE 62

Example 2: Effect of parity on milk reduction at each inoculation dose

Interest is in contrasts of the form $L = \mu_{1j} - \mu_{2j}$ for $j = 1, 2, 3$. For ease of presentation, we only consider the first contrast $L = \mu_{11} - \mu_{21}$.

We can estimate $\mu_{11}$ by $\bar{y}_{11.} = \left(\sum_{k=1}^{2} y_{11k}\right)/n = 5.33$ and $\mu_{21}$ by $\bar{y}_{21.} = \left(\sum_{k=1}^{2} y_{21k}\right)/n = 7.575$.

The contrast $L$ is thus estimated by $\hat{L} = \bar{y}_{11.} - \bar{y}_{21.} = -2.245$.
SLIDE 63

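In the expression of the 95% confidence interval, we need $n = 2$, $a = 2$, $b = 3$, $t_{(n-1)ab;0.025} = t_{6;0.025} = -2.4469119$ and $MS_{ERR} = 17.0343333$. The contrast $L = \mu_{11} - \mu_{21}$ has coefficients $+1$ and $-1$, so $\sum_{i}\sum_{j} c_{ij}^2 = 2$. Filling in the numbers, we have for the 95% confidence interval of $L$

$$\hat{L} \pm t_{(n-1)ab;0.025}\sqrt{\frac{MS_{ERR}}{n}\sum_{i=1}^{a}\sum_{j=1}^{b} c_{ij}^2}$$

$$-2.245 \pm 2.4469119\sqrt{\frac{17.0343333}{2}\times 2} = [-12.3441;\ 7.8541]$$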
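A Python sketch of this interval (the function name is hypothetical and the quantile is copied from the slide) makes the role of $\sum_i\sum_j c_{ij}^2 = 2$ explicit. With the factor 2 the half-width is about 10.099, which gives the interval reported by `confint(mcmast1)`, [-12.3441; 7.8541]; dropping the factor would give a half-width of about 7.141 and the narrower interval [-9.3861; 4.8961].

```python
import math

def cell_contrast_ci(estimate, ms_err, n, coefs, t_quantile):
    """SE and CI for L = sum_ij c_ij mu_ij, with SE = sqrt(MS_ERR / n * sum c_ij^2)."""
    se = math.sqrt(ms_err / n * sum(c * c for c in coefs))
    half_width = abs(t_quantile) * se
    return se, (estimate - half_width, estimate + half_width)

# Example 2, L = mu_11 - mu_21: only the two non-zero coefficients matter.
se, (lower, upper) = cell_contrast_ci(-2.245, 17.0343333, n=2, coefs=(1, -1),
                                      t_quantile=-2.4469119)
print(round(se, 4), round(lower, 4), round(upper, 4))  # 4.1273 -12.3441 7.8541
```

The standard error 4.1273 matches the `Std. Error` column (4.127) in the multcomp output for this contrast.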

SLIDE 64

The P-value is given by

$$2 \times P\left(T_L \ge \left|\frac{\hat{L} - 0}{\sqrt{(MS_{ERR}/n)\sum_{i=1}^{a}\sum_{j=1}^{b} c_{ij}^2}}\right|\right)$$

with $T_L \sim t_{(n-1)ab} = t_6$. With $n = 2$, $a = 2$, $b = 3$ and $MS_{ERR} = 17.0343333$:

$$\text{P-value} = 2 \times P\left(T_L \ge \left|\frac{-2.245}{\sqrt{2 \times (17.0343333/2)}}\right|\right) = 2 \times P(T_L \ge |-0.5439435|) = 0.6060865$$
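When no closed form is at hand, the two-sided P-value can be obtained by numerically integrating the t density. A Python sketch using the composite Simpson rule, applied to the Example 2 statistic with 6 df (function names are hypothetical; in R one would simply use `2 * pt(-abs(t), df)`):

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_two_sided_p(t, df, steps=2000):
    """2 * P(T >= |t|) = 1 - 2 * integral of the density over [0, |t|].

    The integral is approximated with the composite Simpson rule; steps must be even.
    """
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 1.0 - 2.0 * h / 3.0 * total

# Example 2: t_L = -2.245 / sqrt(2 * 17.0343333 / 2) with (n - 1)ab = 6 df.
t_obs = -2.245 / math.sqrt(2 * 17.0343333 / 2)
print(round(t_obs, 4), round(t_two_sided_p(t_obs, 6), 4))  # -0.5439 0.6061
```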

SLIDE 65

The corresponding R code

```r
mast$pd <- with(mast, interaction(parity, dose))
cell <- lm(propreduction ~ pd - 1, data = mast)
library(multcomp)
mcmast1 <- glht(cell, linfct = mcp(pd = c("heifer.low-multiparous.low = 0")))
confint(mcmast1)
##
##   Simultaneous Confidence Intervals
##
## Multiple Comparisons of Means: User-defined Contrasts
##
##
## Fit: lm(formula = propreduction ~ pd - 1, data = mast)
##
## Quantile = 2.4469
## 95% family-wise confidence level
##
##
## Linear Hypotheses:
##                                   Estimate lwr      upr
## heifer.low - multiparous.low == 0  -2.2450 -12.3441   7.8541
```

SLIDE 66

```r
summary(mcmast1)
##
##   Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: User-defined Contrasts
##
##
## Fit: lm(formula = propreduction ~ pd - 1, data = mast)
##
## Linear Hypotheses:
##                                   Estimate Std. Error t value Pr(>|t|)
## heifer.low - multiparous.low == 0   -2.245      4.127  -0.544    0.606
## (Adjusted p values reported -- single-step method)
```

SLIDE 67

```r
mcmast2 <- glht(cell, linfct = mcp(pd = "Tukey"))
summary(mcmast2, test = adjusted(type = "none"))
##
##   Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: lm(formula = propreduction ~ pd - 1, data = mast)
##
## Linear Hypotheses:
##                                             Estimate Std. Error t value Pr(>|t|)
## multiparous.high - heifer.high == 0           29.540      4.127   7.157 0.000375 ***
## heifer.low - heifer.high == 0                -52.525      4.127 -12.726 1.44e-05 ***
## multiparous.low - heifer.high == 0           -50.280      4.127 -12.182 1.86e-05 ***
## heifer.medium - heifer.high == 0             -23.800      4.127  -5.767 0.001187 **
## multiparous.medium - heifer.high == 0        -12.225      4.127  -2.962 0.025217 *
## heifer.low - multiparous.high == 0           -82.065      4.127 -19.884 1.05e-06 ***
## multiparous.low - multiparous.high == 0      -79.820      4.127 -19.340 1.24e-06 ***
## heifer.medium - multiparous.high == 0        -53.340      4.127 -12.924 1.32e-05 ***
## multiparous.medium - multiparous.high == 0   -41.765      4.127 -10.119 5.41e-05 ***
## multiparous.low - heifer.low == 0              2.245      4.127   0.544 0.606087
## heifer.medium - heifer.low == 0               28.725      4.127   6.960 0.000437 ***
## multiparous.medium - heifer.low == 0          40.300      4.127   9.764 6.63e-05 ***
## heifer.medium - multiparous.low == 0          26.480      4.127   6.416 0.000677 ***
## multiparous.medium - multiparous.low == 0     38.055      4.127   9.220 9.18e-05 ***
## multiparous.medium - heifer.medium == 0       11.575      4.127   2.805 0.030979 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- none method)
```

SLIDE 68

```r
summary(mcmast2)
##
##   Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: lm(formula = propreduction ~ pd - 1, data = mast)
##
## Linear Hypotheses:
##                                             Estimate Std. Error t value Pr(>|t|)
## multiparous.high - heifer.high == 0           29.540      4.127   7.157  0.00294 **
## heifer.low - heifer.high == 0                -52.525      4.127 -12.726  < 0.001 ***
## multiparous.low - heifer.high == 0           -50.280      4.127 -12.182  < 0.001 ***
## heifer.medium - heifer.high == 0             -23.800      4.127  -5.767  0.00892 **
## multiparous.medium - heifer.high == 0        -12.225      4.127  -2.962  0.15311
## heifer.low - multiparous.high == 0           -82.065      4.127 -19.884  < 0.001 ***
## multiparous.low - multiparous.high == 0      -79.820      4.127 -19.340  < 0.001 ***
## heifer.medium - multiparous.high == 0        -53.340      4.127 -12.924  < 0.001 ***
## multiparous.medium - multiparous.high == 0   -41.765      4.127 -10.119  < 0.001 ***
## multiparous.low - heifer.low == 0              2.245      4.127   0.544  0.99160
## heifer.medium - heifer.low == 0               28.725      4.127   6.960  0.00332 **
## multiparous.medium - heifer.low == 0          40.300      4.127   9.764  < 0.001 ***
## heifer.medium - multiparous.low == 0          26.480      4.127   6.416  0.00519 **
## multiparous.medium - multiparous.low == 0     38.055      4.127   9.220  < 0.001 ***
## multiparous.medium - heifer.medium == 0       11.575      4.127   2.805  0.18232
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
```

SLIDE 69

```r
mcmast3 <- glht(cell, linfct = mcp(pd = c("heifer.low-multiparous.low = 0",
                                          "heifer.medium-multiparous.medium = 0",
                                          "heifer.high-multiparous.high = 0")))
summary(mcmast3, test = adjusted(type = "bonferroni"))
##
##   Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: User-defined Contrasts
##
##
## Fit: lm(formula = propreduction ~ pd - 1, data = mast)
##
## Linear Hypotheses:
##                                         Estimate Std. Error t value Pr(>|t|)
## heifer.low - multiparous.low == 0         -2.245      4.127  -0.544  1.00000
## heifer.medium - multiparous.medium == 0  -11.575      4.127  -2.805  0.09294 .
## heifer.high - multiparous.high == 0      -29.540      4.127  -7.157  0.00113 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- bonferroni method)
```
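The Bonferroni P-values above can be checked by hand: with three planned comparisons, each one is three times the unadjusted P-value of the same comparison in the Tukey-contrast output with `adjusted(type = "none")`, capped at 1. A small Python sketch (the helper name is hypothetical; the three input P-values are copied from that output):

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each raw P-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]

# Unadjusted P-values for heifer versus multiparous at the low, medium and high dose,
# copied from the summary(mcmast2, test = adjusted(type = "none")) output.
raw = [0.606087, 0.030979, 0.000375]
adjusted = bonferroni(raw)
print([round(p, 4) for p in adjusted])
```

This gives 1, about 0.0929 and about 0.0011, in line with the 1.00000, 0.09294 and 0.00113 reported by multcomp; the small differences come from the rounding of the copied P-values.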

SLIDE 70

ANOVA for two-way data with exactly 1 replication

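Assume we fit the model $Y_{ij} = \mu_{ij} + \epsilon_{ij} = \mu_{..} + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ij}$ with one observation per cell ($n = 1$). This model has as many free cell means as observations, so each fitted value $\hat{Y}_{ij}$ equals the observation $Y_{ij}$ and

$$SS_{ERR} = \sum_{i=1}^{a}\sum_{j=1}^{b}(Y_{ij} - \hat{Y}_{ij})^2 = 0, \qquad df_{ERR} = ab(n-1) = 0, \qquad MS_{ERR} = \frac{SS_{ERR}}{df_{ERR}} = \frac{0}{0} \text{ (undefined)}$$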

SLIDE 71

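We have to fit the simpler model $Y_{ij} = \mu_{..} + \alpha_i + \beta_j + \epsilon_{ij}$, i.e., we assume $(\alpha\beta)_{ij} = 0$ for $i = 1, \ldots, a$; $j = 1, \ldots, b$. The expected values of the mean squares are as follows:

$$E(MS_A) = \sigma^2 + \frac{nb\sum_{i=1}^{a}\alpha_i^2}{a-1}, \qquad E(MS_B) = \sigma^2 + \frac{na\sum_{j=1}^{b}\beta_j^2}{b-1}, \qquad E(MS_{AB}) = \sigma^2$$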

SLIDE 72

Under $H_0$: all $\alpha_i = 0$, it follows that $E(MS_A) = E(MS_{AB})$. We define the F statistic as

$$F_A^* = \frac{MS_A}{MS_{AB}},$$

which under $H_0$ satisfies $F_A^* \sim F_{(a-1),(a-1)(b-1)}$, with P-value $P(F_A^* \ge f_A^*)$.

SLIDE 73

What happens in the presence of interaction?

$$E(MS_{AB}) = \sigma^2 + \frac{n\sum_{i=1}^{a}\sum_{j=1}^{b}(\alpha\beta)_{ij}^2}{(a-1)(b-1)}$$

→ if some $(\alpha\beta)_{ij} \ne 0$, then $E(MS_{AB}) > \sigma^2$
→ $F_A^* = MS_A/MS_{AB}$ becomes smaller (↓)
→ smaller probability of rejection

⇒ CONSERVATIVE TEST = ACCEPTABLE!
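The conservativeness argument can be illustrated with a small Monte Carlo experiment. The Python sketch below is hypothetical: it assumes a 3 × 3 layout with one observation per cell (the setting of Problem 9), $\sigma = 1$, no factor A effect (so $H_0$ is true), and a made-up interaction matrix whose rows and columns sum to zero. It estimates how often $F_A^* = MS_A/MS_{AB}$ exceeds the 5% critical value of $F_{2,4}$, computed from the closed form $P(F_{2,\nu} \ge f) = (1 + 2f/\nu)^{-\nu/2}$.

```python
import random

def ms_a_and_ms_ab(y):
    """MS_A and MS_AB for an a x b table y with one observation per cell."""
    a, b = len(y), len(y[0])
    grand = sum(map(sum, y)) / (a * b)
    row = [sum(r) / b for r in y]
    col = [sum(y[i][j] for i in range(a)) / a for j in range(b)]
    ss_a = b * sum((m - grand) ** 2 for m in row)
    ss_ab = sum((y[i][j] - row[i] - col[j] + grand) ** 2
                for i in range(a) for j in range(b))
    return ss_a / (a - 1), ss_ab / ((a - 1) * (b - 1))

def rejection_rate(interaction, reps=20000, level=0.05, seed=2):
    """Share of simulated tables in which F*_A = MS_A / MS_AB rejects H0: all alpha_i = 0.

    H0 is true (alpha_i = 0), sigma = 1 and n = 1. mu.. and beta_j are omitted
    because they cancel from both MS_A and MS_AB; 'interaction' holds (alpha beta)_ij.
    """
    a, b = len(interaction), len(interaction[0])
    if a != 3:
        raise ValueError("the closed-form critical value below assumes a - 1 = 2")
    df2 = (a - 1) * (b - 1)
    # Upper tail of F with 2 numerator df: P(F >= f) = (1 + 2 f / df2) ** (-df2 / 2).
    f_crit = df2 / 2 * (level ** (-2 / df2) - 1)
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        y = [[interaction[i][j] + rng.gauss(0.0, 1.0) for j in range(b)]
             for i in range(a)]
        ms_a, ms_ab = ms_a_and_ms_ab(y)
        if ms_a / ms_ab >= f_crit:
            rejections += 1
    return rejections / reps

no_interaction = [[0.0] * 3 for _ in range(3)]
# Hypothetical interaction effects: every row and every column sums to zero.
with_interaction = [[2.0, -1.0, -1.0],
                    [-1.0, 2.0, -1.0],
                    [-1.0, -1.0, 2.0]]
rate_none = rejection_rate(no_interaction)
rate_int = rejection_rate(with_interaction)
print(rate_none, rate_int)
```

Without interaction the rejection rate should be close to the nominal 0.05; with the interaction matrix it drops well below 0.05, because the interaction inflates $MS_{AB}$ while leaving $MS_A$ unchanged.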

SLIDE 74

Example: ANOVA for mastitis data set

| Cowid | Parity | Inoculation dose | Milk0 | Milk48 | Reduction (%) |
|---|---|---|---|---|---|
| 1 | heifer | high | 32.4 | 30.2 | 6.79 |
| 2 | heifer | medium | 29.3 | 20.5 | 30.03 |
| 3 | heifer | low | 31.3 | 14.5 | 53.67 |
| 4 | multiparous | high | 42.4 | 39.5 | 6.84 |
| 5 | multiparous | medium | 45.2 | 23.9 | 47.12 |
| 6 | multiparous | low | 41.5 | 6.7 | 83.86 |

SLIDE 75

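ANOVA table with interaction

| Term | SS | df | MS | f* | P(F ≥ f*) |
|---|---|---|---|---|---|
| Dose | 3838.62 | 2 | 1919.31 | . | . |
| Parity | 373.35 | 1 | 373.35 | . | . |
| Dose*Parity | 228.40 | 2 | 114.20 | . | . |
| Error | . | | | | |
| Total | 4440.38 | 5 | | | |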

SLIDE 76

ANOVA table without interaction

| Term | SS | df | MS | f* | P(F ≥ f*) |
|---|---|---|---|---|---|
| Dose | 3838.62 | 2 | 1919.31 | 16.81 | 0.0562 |
| Parity | 373.35 | 1 | 373.35 | 3.27 | 0.2123 |
| Error | 228.40 | 2 | 114.20 | | |
| Total | 4440.38 | 5 | | | |
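The additive (no-interaction) analysis of the mastitis table can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the `reduction` values are the Reduction column of the mastitis table, the helper name is hypothetical, and the P-values use closed forms that only hold here because the error term has 2 degrees of freedom.

```python
import math

# Milk reduction (%) per cell of the one-replication mastitis data set:
# rows = parity, columns = inoculation dose.
reduction = {
    "heifer":      {"high": 6.79, "medium": 30.03, "low": 53.67},
    "multiparous": {"high": 6.84, "medium": 47.12, "low": 83.86},
}

def additive_anova(table):
    """Main-effects-only two-way ANOVA for a table with one observation per cell.

    Returns (SS, df, F) for rows and columns, (SS, df, MS) for the error term,
    and the total SS. Fitting the interaction as well would leave 0 error df,
    so the interaction SS, with (a - 1)(b - 1) df, serves as the error term.
    """
    rows = list(table)
    cols = list(next(iter(table.values())))
    a, b = len(rows), len(cols)
    grand = sum(table[r][c] for r in rows for c in cols) / (a * b)
    row_mean = {r: sum(table[r][c] for c in cols) / b for r in rows}
    col_mean = {c: sum(table[r][c] for r in rows) / a for c in cols}
    ss_rows = b * sum((row_mean[r] - grand) ** 2 for r in rows)
    ss_cols = a * sum((col_mean[c] - grand) ** 2 for c in cols)
    ss_total = sum((table[r][c] - grand) ** 2 for r in rows for c in cols)
    df_err = (a - 1) * (b - 1)
    ss_err = ss_total - ss_rows - ss_cols
    ms_err = ss_err / df_err
    return {
        "rows": (ss_rows, a - 1, ss_rows / (a - 1) / ms_err),
        "cols": (ss_cols, b - 1, ss_cols / (b - 1) / ms_err),
        "error": (ss_err, df_err, ms_err),
        "total": ss_total,
    }

res = additive_anova(reduction)
f_parity, f_dose = res["rows"][2], res["cols"][2]
# With 2 error df the F upper-tail probabilities have closed forms:
# P(F_{2,2} >= f) = 1 / (1 + f) and P(F_{1,2} >= f) = 1 - sqrt(f / (2 + f)).
p_dose = 1 / (1 + f_dose)
p_parity = 1 - math.sqrt(f_parity / (2 + f_parity))
print(round(f_dose, 2), round(p_dose, 4), round(f_parity, 2), round(p_parity, 4))
# 16.81 0.0562 3.27 0.2123
```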

SLIDE 77

Problem 1.

Hydroponics is a system whereby plants are grown in water to which nutrients are added. To investigate the effect of the phosphorus (P) and potassium (K) concentration in the water on the yield of tomatoes (in kg), three different doses of P, i.e., low (40 ppm P), medium (50 ppm P) and high (60 ppm P), and two different doses of K, i.e., low (180 ppm K) and high (240 ppm K), are used. In the greenhouse, we have 12 plots, each of 1 m², and the 6 treatment combinations are assigned at random to the plots in such a way that each treatment combination occurs twice. The N concentration is kept constant. We assume that the yield is normally distributed. The data can be obtained from the text file "hydroponics.csv".

SLIDE 78

Which of the following statements is correct?

- The 95% confidence interval for the difference between the low and the high dose of P averaged over the two doses of K equals [-8.813; -2.187]. This comparison makes sense for this dataset.
- The 95% confidence interval for the difference between the low and the high dose of P at the low dose of K equals [-5.186; 4.186]. This comparison makes sense for this dataset.
- The 95% confidence interval for the difference between the low and the high dose of K averaged over the three doses of P equals [10.628; 16.039]. This comparison makes sense for this dataset.
- The 95% confidence interval for the difference between the low and the high dose of K at the low dose of P equals [-12.686; -3.315]. This comparison makes sense for this dataset.

SLIDE 79

Problem 2.

The Cinchona tree has been used for centuries to produce quinine (from its bark), which can be used to treat malaria. There is an interest in maximizing the percentage of quinine in the bark of the Cinchona tree. Two factors are studied simultaneously: shadow (presence or absence) and altitude (low, medium and high). The population means are given in the table below.

| Shadow | Altitude low | Altitude medium | Altitude high |
|---|---|---|---|
| no | 5.6 | 6.4 | 7.2 |
| yes | 6.0 | 6.8 | 7.6 |

Which of the following statements is correct?

- There is an interaction between shadow and altitude.
- Shadow has an effect on quinine content.
- Altitude has an effect on quinine content.
- The interaction between shadow and altitude is negligible.

SLIDE 80

Problem 3.

Hydroponics is a system whereby plants are grown in water to which nutrients are added. To investigate the effect of the phosphorus (P) and potassium (K) concentration in the water on the yield of tomatoes (in kg), three different doses of P, i.e., low (40 ppm P), medium (50 ppm P) and high (60 ppm P), and two different doses of K, i.e., low (180 ppm K) and high (240 ppm K), are used. The population means are given in the table below.

| Dose K | Dose P low | Dose P medium | Dose P high |
|---|---|---|---|
| Low | 22.0 | 21.5 | 7.2 |
| High | 30.0 | 35.5 | 40.5 |

Assuming we use the sum restrictions for the parameters, which of the following statements is wrong?

- The effect of the low dose of K is 9.217.
- The effect of the medium dose of P is 2.383.
- The interaction effect between the low K dose and the medium P dose is -2.217.
- The interaction effect between the low K dose and the high P dose is -7.433.

SLIDE 81

Problem 4.

In Ethiopia there is a shortage of zinc in many food crops. In order to improve this situation, the zinc content (mg/100g) of Faba bean is studied as a function of variety (A and B) and zinc content in the soil (low, medium and high). The population means are presented in the following graph.

[Figure: population mean yield plotted against Zncontent (high, low, medium), one line per Variety (A, B)]

SLIDE 82

[Figure: the same graph of mean yield against Zncontent for Variety A and B]

Which of the following statements is wrong?

- There is an effect of variety.
- There is an interaction between variety and zinc content.
- The variety effect is larger at the low level of zinc content than at the high level of zinc content.
- The difference between the high and the low Zn content is higher for variety B than for variety A.

SLIDE 83

Problem 5.

A factorial experiment with two factors, factor A with 4 levels and factor B with 3 levels, is run. Each treatment combination is replicated 3 times, in random order. The sum of squares of factor A equals 90, the sum of squares of factor B equals 123, the sum of squares for the interaction equals 30 and the total sum of squares equals 273. Mark the correct statements below.

- The degrees of freedom for factor A equal 3 and the mean sum of squares for the interaction equals 5.
- The degrees of freedom for factor B equal 2 and the F test statistic for factor A equals 24.
- The degrees of freedom for the interaction between A and B equal 6 and the mean sum of squares for the error equals 1.25.
- The degrees of freedom for the error sum of squares equal 24 and the F test statistic for the interaction between A and B equals 4.

SLIDE 84

Problem 6.

The analysis of variance is based on the F-test. Which of the statements below regarding the F-statistic and the F-distribution is correct?

- The F-statistic can never be negative.
- Under the null hypothesis of no differences between the factor levels, the expected value of the F-statistic is 1.
- Both values of the F-statistic well below 1 and well above 1 will lead to rejection of the null hypothesis of no effect.
- An F-test can also be used as an alternative to the t-test for all possible alternative hypotheses.

SLIDE 85

Problem 7.

Hydroponics is a system whereby plants are grown in water to which nutrients are added. To investigate the effect of the phosphorus (P) and potassium (K) concentration in the water on the yield of tomatoes (in kg), three different doses of P, i.e., low (40 ppm P), medium (50 ppm P) and high (60 ppm P), and two different doses of K, i.e., low (180 ppm K) and high (240 ppm K), are used. In the greenhouse, we have 12 plots, each of 1 m², and the 6 treatment combinations are assigned at random to the plots in such a way that each treatment combination occurs twice. The N concentration is kept constant. We assume that the yield is normally distributed. The data were given in Question 2 and can also be obtained from the text file "hydroponics.csv". With $\mu_{11}$, $\mu_{21}$ and $\mu_{31}$ representing the population means of plots treated with the low K dose and the low, medium and high dose of P, respectively, which of the following statements about the contrast $L = \mu_{11} - 0.5\mu_{21} - 0.5\mu_{31}$ are correct?

- $L \sim N\left(\mu_{11} - 0.5\mu_{21} - 0.5\mu_{31},\ 0.75\sigma^2\right)$
- The 99% confidence interval is given by [-6.148; 6.148].
- The P-value to test whether the contrast equals zero is 0.5.
- The 90% confidence interval is given by [-4.06; 4.06].

SLIDE 86

Problem 8.

The Cinchona tree has been used for centuries to produce quinine (from its bark), which can be used to treat malaria. There is an interest in maximizing the percentage of quinine in the bark of the Cinchona tree. Two factors are studied simultaneously: shadow (presence or absence) and altitude (low, medium and high). We have 12 plots, each of 10 m², with 2 plots for each treatment combination. The data can be obtained from the text file "quinine.txt". With $\mu_{1.}$, $\mu_{2.}$ and $\mu_{3.}$ representing the population means of plots at low, medium and high altitude averaged over the two shadow levels, which of the following statements about the contrast $L = \mu_{1.} - 0.5\mu_{2.} - 0.5\mu_{3.}$ are correct?

- $L \sim N\left(\mu_{1.} - 0.5\mu_{2.} - 0.5\mu_{3.},\ 0.375\sigma^2\right)$
- The 99% confidence interval is given by [-1.534; -0.842].
- The P-value to test whether the contrast equals zero is 0.000155.
- The 90% confidence interval is given by [-1.462; -0.913].

SLIDE 87

Problem 9.

Assume we have a factorial experiment with two factors, A and B, each at three levels. For each treatment combination there is one observation. Which of the following statements is wrong?

- When we fit a model including interaction between the two factors, the estimate of the population variance $\sigma^2$ equals zero.
- A model that only includes the main effects of the two factors will lead to a lower type I error probability for testing the effects of the two factors if interaction is present.
- Under the assumption of no interaction between the two factors, the ratio of the mean sum of squares of A and the mean sum of squares of the interaction AB is F distributed with 2 and 4 degrees of freedom in the numerator and denominator, respectively.
- Assuming the absence of interaction, the P-value for testing whether the factor B has an effect is given by $P\left(F_{2,4} > MS_B/MS_{AB}\right)$.

SLIDE 88

Problem 10.

In Ethiopia there is a shortage of zinc in many food crops. In order to improve this situation, the zinc content (mg/100g) of Faba bean is studied as a function of variety (A and B) and zinc content in the soil (low, medium and high). We have 6 plots, each of 10 m², with 1 plot for each treatment combination. The data are given in the table below, and can also be obtained from the text file "zinc.txt". Which of the statements below are correct?

- When fitting a model including the interaction between Zncontent and Variety, the mean sum of squares for Variety equals 1.927 and that of the error term equals 0.
- When fitting a model without interaction between Zncontent and Variety, the P-value for testing whether the two varieties differ equals 0.1342; we cannot reject the null hypothesis at the 5% significance level.
- Based on the model without interaction, the 95% confidence interval for the difference between Variety A and B equals [-3.126; -0.859]. We therefore can reject the null hypothesis that the two varieties have the same yield.
- Based on the model without interaction, the P-value for testing whether there is a difference between the low and high Zn concentration in the soil equals 0.22. We therefore cannot reject the null hypothesis that low and high Zn concentration in the soil lead to the same yield.

SLIDE 89

Problem 11.

Hydroponics is a system whereby plants are grown in water to which nutrients are added. To investigate the effect of the phosphorus (P) and potassium (K) concentration in the water on the yield of tomatoes (in kg), three different doses of P, i.e., low (40 ppm P), medium (50 ppm P) and high (60 ppm P), and two different doses of K, i.e., low (180 ppm K) and high (240 ppm K), are used. In the greenhouse, we have 12 plots, each of 1 m², and the 6 treatment combinations are assigned at random to the plots in such a way that each treatment combination occurs twice. The N concentration is kept constant. We assume that the yield is normally distributed. The data can be obtained from the text file "hydroponics.csv". Perform a statistical analysis in R that compares all treatment combinations pairwise, using Tukey's multiple comparisons method for adjustment.

SLIDE 90

Which of the following statements is correct?

- The P-value with adjustment to compare the medium and low dose of P at the high dose of K equals 0.169. If this had been the only relevant comparison, so that no adjustment was needed, this comparison would be significant at the 5% significance level.
- The smallest P-value is found for the largest difference, which corresponds to the comparison between medium P and low K versus high P and K dose. This P-value is the same as the P-value for the comparison between low P and K dose versus high P and K dose.
- The 95% confidence interval with multiple comparisons adjustment for the difference between the medium and high P dose at the high K dose is given by [-12.623; 2.623]. Therefore we cannot claim that there is a difference between these two treatment combinations.
- The 95% confidence interval for the difference between the medium and high P dose at the high K dose is given by [-9.686; -0.315]. As we did not adjust for multiple comparisons, we cannot use the observation that zero is not contained in this interval to reject the null hypothesis that the two treatment combinations yield the same results.
