
Simple orthogonal block structures, nesting and marginality

R. A. Bailey
University of St Andrews / QMUL (emerita)

John Nelder Workshop in Methodological Statistics, Imperial College London, 28 March 2015

1/43

Abstract I

John Nelder introduced simple orthogonal block structures in one of his famous 1965 papers. They provide a compact description of many of the structures in common use in experiments, so much so that some people find it hard to understand a structure that cannot be expressed in this way. Terry Speed and I later generalized them to poset block structures. But there are still misunderstandings.

◮ If there are 5 blocks of 4 plots each, should the plot factor have 4 levels or 20?
◮ What is the difference between nesting and marginality?
◮ What is the difference between a factor, the effect of that factor (this effect may be called an interaction in some cases), and the smallest model which includes that factor whilst respecting marginality?

2/43

Abstract II

John himself expressed strong views about people who ignored marginality in the model-fitting process. My take on this is that there are three different partial orders involved: I will try to explain the difference.

3/43

Labelling plots in blocks

Suppose that there are five blocks of four plots each. How should we label them?

B  1  1  1  1  2  2  2  2  3  3  3  3  4  4  4  4  5  5  5  5
P  1  2  3  4  1  2  3  4  1  2  3  4  1  2  3  4  1  2  3  4
Q  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

Advantage of using P: fewer levels, so less computer storage space.

Advantage of using Q: if the data are analysed by someone who did not design the experiment, they cannot make the mistake of thinking that all plots ω with P(ω) = 1 have something in common.

4/43

Terminology

B  1  1  1  1  2  2  2  2  3  3  3  3  4  4  4  4  5  5  5  5
P  1  2  3  4  1  2  3  4  1  2  3  4  1  2  3  4  1  2  3  4
Q  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

We say that P is nested in B because the information that P(ω1) = P(ω2) is irrelevant unless B(ω1) = B(ω2).

We say that Q is finer than B because we know that if Q(ω1) = Q(ω2) then B(ω1) = B(ω2).

These relationships are different, and need different words, but many people confuse them.

P and Q are different types of thing, and play different roles, so I shall call P a pre-factor and Q a factor, but many people confuse them, or use different terminology.

5/43
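To make the distinction concrete, here is a minimal Python sketch (my own illustration, not from the talk; the function name is_finer is hypothetical). It builds the three labellings above and checks the one relation that is computable from the values alone: whether one factor is finer than another.

```python
# Build the labellings for 5 blocks of 4 plots each (units in slide order).
units = range(20)
B = [u // 4 + 1 for u in units]   # block: 1..5, four units each
P = [u % 4 + 1 for u in units]    # plot-within-block pre-factor: 1..4
Q = [u + 1 for u in units]        # plot factor: 1..20

def is_finer(F, G):
    """F is finer than G: equal F-levels force equal G-levels."""
    seen = {}
    for f, g in zip(F, G):
        if seen.setdefault(f, g) != g:
            return False
    return True

print(is_finer(Q, B))   # True: Q is finer than B
print(is_finer(P, B))   # False: P(w) = 1 occurs in every block, so the
                        # nesting of P in B is a design declaration, not
                        # something deducible from the values of P alone.
```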

Pre-factors and nesting

Write P ⊏ B to indicate that P is nested in B.
Write P ⊑ B to mean that either P ⊏ B or P = B.

Nesting is a partial order, which means that

◮ F ⊑ F for all pre-factors F;
◮ if F ⊑ G and G ⊑ F then F = G;
◮ if F ⊑ G and G ⊑ H then F ⊑ H.

6/43

Hasse diagram

Every partially ordered set (poset) can be shown on a Hasse diagram. Put a symbol for each object (here, a pre-factor). If F ⊏ G then the symbol for F is lower in the diagram than the symbol for G, and is joined to it by lines that are traversed upwards. Show the numbers of levels.

[Hasse diagram: B (5 levels) drawn above P (4 levels), joined by an edge, since P ⊏ B.]

If we have three rows (R) and eight columns (C) with no nesting then we get

[Hasse diagram: R (3 levels) and C (8 levels) drawn side by side, with no edge between them.]

7/43

Combining two factors or pre-factors

If A and B are two factors then their infimum A ∧ B is the factor whose levels are all combinations of levels of A and B that occur.

(A ∧ B)(ω) = (A(ω), B(ω))

Other notations: A.B or A : B.

8/43

Crossing and nesting

Operation: crossing. Formula: (3 R) ∗ (8 C).
Poset: [R (3) and C (8), incomparable.]
Experimental units: {1, 2, 3} × {1, 2, 3, 4, 5, 6, 7, 8}.
Factors: U with one level; R with 3 levels (1st coordinate); C with 8 levels (2nd coordinate); R ∧ C with 24 levels.

Operation: nesting. Formula: (5 B)/(4 P).
Poset: [B (5) drawn above P (4).]
Experimental units: {1, 2, 3, 4, 5} × {1, 2, 3, 4}.
Factors: U with one level; B with 5 levels (1st coordinate); B ∧ P with 20 levels.

9/43
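A small Python sketch of this construction (my own illustration, assuming nothing beyond the slide): units are tuples of coordinates, and each factor is the map picking out some of the coordinates.

```python
from itertools import product

# Crossing: (3 R) * (8 C).
units_cross = list(product(range(1, 4), range(1, 9)))   # {1,2,3} x {1,...,8}
R = {w: w[0] for w in units_cross}                      # 3 levels
C = {w: w[1] for w in units_cross}                      # 8 levels
RC = {w: (w[0], w[1]) for w in units_cross}             # R ^ C: 24 levels

# Nesting: (5 B)/(4 P).  P on its own is only a pre-factor;
# the genuine factors are U, B and B ^ P.
units_nest = list(product(range(1, 6), range(1, 5)))    # {1,...,5} x {1,...,4}
B = {w: w[0] for w in units_nest}                       # 5 levels
BP = {w: (w[0], w[1]) for w in units_nest}              # B ^ P: 20 levels

for name, F in [("R", R), ("C", C), ("R^C", RC), ("B", B), ("B^P", BP)]:
    print(name, len(set(F.values())), "levels")
```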


From crossing and nesting to simple orthogonal block structures

The key ingredient of John Nelder’s 1965 paper on ‘Block structure and the null analysis of variance’ was to realise that crossing and nesting could be iterated (maybe with some steps of each sort). He developed an almost-complete theory, notation and algorithms based on this. He called the resulting sets of experimental units with their factor lists simple orthogonal block structures.

10/43

Factors and refinement

If B and Q are factors on the same set, write Q ≺ B to indicate that Q is finer than B.
Write Q ≼ B to mean that either Q ≺ B or Q = B.

Refinement is another partial order, because

◮ F ≼ F for all factors F;
◮ if F ≼ G and G ≼ F then F = G;
◮ if F ≼ G and G ≼ H then F ≼ H.

(For simplicity here, I am ignoring the possibility of aliasing.) So we can show factors on a Hasse diagram too!

11/43

Crossing

Hasse diagram for pre-factors:

[R (3) and C (8), incomparable.]

Experimental units: {1, 2, 3} × {1, 2, 3, 4, 5, 6, 7, 8}.
Factors: U with one level; R with 3 levels (1st coordinate); C with 8 levels (2nd coordinate); R ∧ C with 24 levels.

Hasse diagram for factors:

[R ∧ C (24) at the top; R (3) and C (8) below it; U (1) at the bottom, below both.]

12/43

Nesting

Hasse diagram for pre-factors:

[B (5) drawn above P (4).]

Experimental units: {1, 2, 3, 4, 5} × {1, 2, 3, 4}.
Factors: U with one level; B with 5 levels (1st coordinate); B ∧ P with 20 levels.

Hasse diagram for factors:

[B ∧ P (20) above B (5) above U (1), a chain.]

13/43

Iteration: (20 Athletes) ∗ ((2 Sessions)/(4 Runs))

[Pre-factor Hasse diagram: A (20) on its own; S (2) drawn above R (4).]

[Factor Hasse diagram: A ∧ S ∧ R (160) at the top; A ∧ S (40) and S ∧ R (8) below it; A (20) below A ∧ S, and S (2) below both A ∧ S and S ∧ R; U (1) at the bottom.]

14/43
Start with the first poset

Terry Speed and I found that you can start with the nesting poset and use it to directly construct the set Ω of experimental units and its factors. Given pre-factors P1, . . . , Pm with n1, . . . , nm levels, and a nesting relation ⊏:

Ω = Ω1 × Ω2 × · · · × Ωm, where Ωi = {1, 2, . . . , ni}.

If A is any subset of {1, 2, . . . , m} satisfying

if i ∈ A and Pi ⊏ Pj then j ∈ A,

then include the factor ∧i∈A Pi (the infimum of those Pi with i ∈ A).

15/43
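A hedged Python sketch of this construction (my own illustration; the name poset_factors is hypothetical): enumerate the subsets A that are closed upwards under ⊏, and emit the corresponding coordinate-projection factors.

```python
from itertools import product

def poset_factors(levels, nested_in):
    """levels: dict name -> number of levels n_i.
    nested_in: set of pairs (i, j) meaning P_i is nested in P_j.
    Returns the units Omega and one factor per upward-closed subset A."""
    names = sorted(levels)
    Omega = list(product(*[range(1, levels[n] + 1) for n in names]))
    factors = {}
    for bits in product([False, True], repeat=len(names)):
        A = {n for n, b in zip(names, bits) if b}
        # keep A only if: i in A and P_i nested in P_j implies j in A
        if all(j in A for (i, j) in nested_in if i in A):
            key = "^".join(sorted(A)) or "U"
            idx = [names.index(n) for n in sorted(A)]
            factors[key] = {w: tuple(w[k] for k in idx) for w in Omega}
    return Omega, factors

# (5 B)/(4 P): P nested in B.
Omega, fs = poset_factors({"B": 5, "P": 4}, {("P", "B")})
for name, F in sorted(fs.items()):
    print(name, len(set(F.values())), "levels")   # B 5, B^P 20, U 1
```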

Poset block structures

These poset block structures have all John Nelder’s properties, even when the first poset cannot be made by iterated crossing and nesting.

[Pre-factor Hasse diagram: Weeks (4) and Labs (2) at the top; Samples (10) below both Weeks and Labs; Technicians (6) below Labs only.]

[Factor Hasse diagram: W ∧ L ∧ S ∧ T (480) at the top; W ∧ L ∧ S (80) and W ∧ L ∧ T (48) below it; then W ∧ L (8) and L ∧ T (12); then W (4) and L (2); U (1) at the bottom.]

16/43
slide-51
SLIDE 51

Too successful

John Nelder’s theory of simple orthogonal block structures, and the ensuing algorithms developed with Graham Wilkinson, have been enormously successful, but perhaps too much so.

17/43

slide-52
SLIDE 52

Too successful

John Nelder’s theory of simple orthogonal block structures, and the ensuing algorithms developed with Graham Wilkinson, have been enormously successful, but perhaps too much so. B 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 P 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 Q 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 As factors, B ∧ P = Q = B ∧ Q, but does your software think so?

17/43

slide-53
SLIDE 53

Too successful

John Nelder’s theory of simple orthogonal block structures, and the ensuing algorithms developed with Graham Wilkinson, have been enormously successful, but perhaps too much so. B 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 P 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 Q 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 As factors, B ∧ P = Q = B ∧ Q, but does your software think so? Some software cannot detect that Q ≺ B, because B is not in the name of Q.

17/43

slide-54
SLIDE 54

Too successful

John Nelder’s theory of simple orthogonal block structures, and the ensuing algorithms developed with Graham Wilkinson, have been enormously successful, but perhaps too much so. B 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 P 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 Q 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 As factors, B ∧ P = Q = B ∧ Q, but does your software think so? Some software cannot detect that Q ≺ B, because B is not in the name of Q. Some software thinks that B ∧ Q has 100 levels, and tries to make 100 × 100 matrices to deal with this.

17/43
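A minimal sketch of the check that avoids the 100-level mistake (my own illustration, not any particular package): count the level combinations of B and Q that actually occur, rather than multiplying the nominal level counts.

```python
B = [b for b in range(1, 6) for _ in range(4)]   # 5 blocks of 4 plots
Q = list(range(1, 21))                           # plots labelled 1..20

nominal = len(set(B)) * len(set(Q))              # 5 * 20 = 100
observed = len(set(zip(B, Q)))                   # combinations occurring: 20

print(nominal, observed)    # 100 20
# B ^ Q should be given the 20 observed levels (it equals Q),
# not the 100 nominal ones.
```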

Other orthogonal block structures

There are still other collections of mutually orthogonal factors which obey most of the theory but do not come from pre-factors. For example, the Rows (R), Columns (C) and Letters (L) of a 7 × 7 Latin square give the following.

[Hasse diagram: R ∧ C = R ∧ L = C ∧ L (49) at the top; R (7), L (7) and C (7) below it; U (1) at the bottom.]

18/43
slide-57
SLIDE 57

Combining two factors: II

If A and B are factors then their infimum A ∧ B satisfies:

◮ A ∧ B is finer than A, and A ∧ B is finer than B; ◮ if any other factor is finer than A and finer than B

then it is finer than A ∧ B.

19/43

slide-58
SLIDE 58

Combining two factors: II

If A and B are factors then their infimum A ∧ B satisfies:

◮ A ∧ B is finer than A, and A ∧ B is finer than B; ◮ if any other factor is finer than A and finer than B

then it is finer than A ∧ B. The supremum A ∨ B of factors A and B is defined to satisfy:

◮ A is finer than A ∨ B, and B is finer than A ∨ B; ◮ if there is any other factor C

with A finer than C and B finer than C, then A ∨ B is finer than C. Each level of factor A ∨ B combines levels of A and also combines levels of B, and has replication as small as possible subject to this.

19/43

slide-59
SLIDE 59

Combining two factors: II

If A and B are factors then their infimum A ∧ B satisfies:

◮ A ∧ B is finer than A, and A ∧ B is finer than B; ◮ if any other factor is finer than A and finer than B

then it is finer than A ∧ B. The supremum A ∨ B of factors A and B is defined to satisfy:

◮ A is finer than A ∨ B, and B is finer than A ∨ B; ◮ if there is any other factor C

with A finer than C and B finer than C, then A ∨ B is finer than C. Each level of factor A ∨ B combines levels of A and also combines levels of B, and has replication as small as possible subject to this. I claim that the supremum is even more important than the infimum in designed experiments and data analysis.

19/43
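The supremum can be computed as the connected components of the graph that links a level of A to a level of B whenever they occur together on a unit. A hedged Python sketch (the function name sup is my own):

```python
def sup(A, B):
    """Supremum A v B of two factors given as lists of levels per unit:
    the finest factor coarser than both, via union-find on levels."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    for a, b in zip(A, B):
        union(("A", a), ("B", b))            # a and b occur together
    return [find(("A", a)) for a in A]       # one class label per unit

# Rows and Columns of a fully crossed 3 x 3 layout: R v C = U.
R = [r for r in range(3) for _ in range(3)]
C = [c for _ in range(3) for c in range(3)]
print(len(set(sup(R, C))))    # 1
```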

Factorial treatments plus control

Chemical has five levels: Z (no chemical), N, S, K, M.
Dose has three levels: 0, 1, 2.
Z occurs only with dose 0, so there are nine treatments in all (see the Hasse diagram on the next slide).

Dose ∨ Chemical = Fumigant, which is the two-level factor distinguishing zero treatment from the rest.

If you do not fit Fumigant, its effect will be included in whichever of Dose and Chemical you fit first.

20/43
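Using the nine treatments just described, the supremum sketch above recovers Fumigant (an illustration under the same assumptions; it reuses sup from the earlier sketch):

```python
# Nine treatments: Z at dose 0, and each of N, S, K, M at doses 1 and 2.
chem = ["Z", "N", "N", "S", "S", "K", "K", "M", "M"]
dose = [0, 1, 2, 1, 2, 1, 2, 1, 2]

fumigant = sup(chem, dose)      # sup() as defined in the sketch above
print(len(set(fumigant)))       # 2: zero treatment versus some fumigant
```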

Hasse diagram including supremum

Chemical (five levels Z, N, S, K, M) and Dose (three levels), as on the previous slide.

[Hasse diagram: D ∧ C (9) at the top; C (5) and D (3) below it; F (2) below both; U (1) at the bottom.]

With F included, all the usual nice results apply.

Heiko Großmann’s software includes suprema (as well as checking which factors are finer than which others). Does yours?

21/43

Linear model for two factors

Given two treatment factors A and B, the linear model for response Yω on unit ω is often written as follows. If A(ω) = i and B(ω) = j then

Yω = µ + αi + βj + γij + εω,

where the εω are random variables with zero means and a covariance matrix whose eigenspaces we know.

Some authors: “Too many parameters! Let’s impose constraints.”

(a) ∑i αi = 0, and so on;
(b) ∑i ri αi = 0, where ri = |{ω : A(ω) = i}|, and so on.

22/43

Linear model with constraints: bad consequences

Yω = µ + αi + βj + γij + εω

(a) ∑i αi = 0, and so on;
(b) ∑i ri αi = 0, where ri = |{ω : A(ω) = i}|, and so on.

◮ It is too easy to give all parameters the same status, and then the conclusions “βj = 0 for all j” and “γij = 0 for all i and j” are comparable.

◮ If some parameters are, after testing, deemed to be zero, the estimated values of the others may not give the vector of fitted values. For example, if both main effects and interaction are deemed to be zero, then µ̂ under constraint (a) is not the fitted overall mean if replications are unequal.

Popular software allows both of these.

23/43
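A small numerical illustration of the second point, using a deliberately unbalanced one-factor toy layout of my own (the same phenomenon as the two-factor case): under constraint (a), µ̂ is the unweighted average of the level means, which differs from the overall mean of the observations.

```python
import numpy as np

# Unbalanced toy data (hypothetical): level 1 has 4 units, level 2 has 1.
y = np.array([10.0, 10.0, 10.0, 10.0, 20.0])
level = np.array([1, 1, 1, 1, 2])

level_means = np.array([y[level == i].mean() for i in (1, 2)])
mu_hat_a = level_means.mean()    # constraint (a): unweighted mean of means
grand_mean = y.mean()            # fitted overall mean

print(mu_hat_a, grand_mean)      # 15.0 12.0 -- they disagree
```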

Say goodbye to linear models with constraints

Yω = µ + αi + βj + γij + εω

(a) ∑i αi = 0, and so on;
(b) ∑i ri αi = 0, where ri = |{ω : A(ω) = i}|, and so on.

[On the slide, the model and the constraints are struck through.]

24/43

JAN’s approach to such linear models

Yω = µ + αi + βj + γij + εω

John Nelder had a rant about the constraints on parameters in his 1977 paper ‘A reformulation of linear models’ and various later papers too. Essentially he said:

◮ if γij = 0 for all i and j then the model simplifies to Yω = µ + αi + βj + εω, so that the expectation of Y lies in a subspace of dimension at most n + m − 1, where n and m are the numbers of levels of A and B;
◮ if βj = 0 for all j then the model does not simplify at all.

(I read this in one of his papers, but could not find it again when preparing these slides.)

25/43

RAB’s approach to such linear models

Yω = µ + αi + βj + γij + εω

This equation is a short-hand for saying that there are FIVE subspaces which we might suppose to contain the vector E(Y). Let us parametrize these subspaces separately, and consider the relationships between them. This is the approach which I always use in teaching and in consulting, and in my 2008 book.

26/43

Expectation subspaces

E(Y) ∈ VA ⇐⇒ there are constants αi such that E(Yω) = αi whenever A(ω) = i. dim(VA) = number of levels of A.

E(Y) ∈ VB ⇐⇒ there are constants βj such that E(Yω) = βj whenever B(ω) = j.

E(Y) ∈ VU ⇐⇒ there is a constant µ such that E(Yω) = µ for all ω.

E(Y) ∈ VA + VB ⇐⇒ there are constants θi and φj such that E(Yω) = θi + φj if A(ω) = i and B(ω) = j.

E(Y) ∈ VA∧B ⇐⇒ there are constants γij such that E(Yω) = γij if A(ω) = i and B(ω) = j.

27/43

Dimensions

For general factors A and B:

dim(VA + VB) = dim(VA) + dim(VB) − dim(VA ∩ VB).

If all combinations of levels of A and B occur, then VA ∩ VB = VU, which has dimension 1, so

dim(VA + VB) = dim(VA) + dim(VB) − 1.

For the rows-and-columns example, dim(VR + VC) = 3 + 8 − 1 = 10.

28/43
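A sketch verifying these dimensions numerically for the 3 × 8 rows-and-columns example (my own illustration): build indicator matrices whose column spans are VR and VC, and compare matrix ranks.

```python
import numpy as np
from itertools import product

units = list(product(range(3), range(8)))            # 24 units
XR = np.array([[r == i for i in range(3)] for r, c in units], float)
XC = np.array([[c == j for j in range(8)] for r, c in units], float)

rank = np.linalg.matrix_rank
print(rank(XR))                        # dim V_R = 3
print(rank(XC))                        # dim V_C = 8
print(rank(np.hstack([XR, XC])))       # dim (V_R + V_C) = 3 + 8 - 1 = 10
# so dim(V_R ∩ V_C) = 3 + 8 - 10 = 1: the span of the all-ones vector.
```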

slide-91
SLIDE 91

Another partial order; another Hasse diagram

The relation “is contained in” gives a partial order on subspaces of a vector space. So we can use a Hasse diagram to show the subspaces being considered to model the expectation of Y. Now it is helpful to show the dimension of each subspace on the diagram.

29/43

Hasse diagram for model subspaces

[Hasse diagram: VA∧B (nm) at the top: the full model. VA + VB (n + m − 1) below it: the additive model. VA (n) and VB (m) below that; for example, VB is the model in which only factor B makes any difference. VU (1) at the bottom: the null model.]

For complicated families of models, non-mathematicians may find the Hasse diagram easier to understand than the equations.

30/43

Diagram from a paper in Global Change Biology

[Hasse diagram of the 19 expectation models considered in the paper, each shown with its dimension: Composition × Temp. (45); Composition + Richness + Temp. + Type × Temp. (29); Composition + Type × Temp. (23); Composition + Richness × Temp. (23); Richness × Temp. + Type × Temp. (21); Composition + Temp. (17); Composition (15); Richness × Temp. + Type (15); Type × Temp. + Richness (15); Richness × Temp. (12); Type × Temp. (12); Rich + Type + Temp. (9); Richness + Type (7); Richness + Temp. (6); Type + Temp. (6); Type (4); Richness (4); Temp. (3); Constant (1).]

31/43

Main effects and interaction

[The same Hasse diagram of model subspaces: VA∧B (nm) at the top; VA + VB (n + m − 1); VA (n) and VB (m); VU (1) at the bottom.]

The vector of fitted values in VU has the grand mean in every coordinate.

The main effect of factor B is the difference between the vector of fitted values in VB and the vector of fitted values in VU.

The interaction between factors A and B is the difference between the vector of fitted values in VA∧B and the vector of fitted values in VA + VB.

32/43
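A minimal numpy sketch of these definitions for a balanced two-factor layout (my own illustration with hypothetical data): fitted values in each subspace are the corresponding means, and effects are differences of fitted-value vectors.

```python
import numpy as np

# Balanced 2 x 2 layout with 2 units per cell (hypothetical data).
A = np.array([0, 0, 0, 0, 1, 1, 1, 1])
B = np.array([0, 0, 1, 1, 0, 0, 1, 1])
y = np.array([6.0, 8.0, 11.0, 13.0, 9.0, 11.0, 20.0, 22.0])

def fit(*factors):
    """Fitted values: the mean within each level combination."""
    keys = list(zip(*factors)) if factors else [()] * len(y)
    return np.array([y[[k == key for k in keys]].mean() for key in keys])

fit_U = fit()                      # grand mean in every coordinate
fit_B = fit(B)                     # fitted values in V_B
main_effect_B = fit_B - fit_U      # main effect of B

# In the balanced, orthogonal case the fit in V_A + V_B is
# fit(A) + fit(B) - fit_U, so the interaction is:
interaction = fit(A, B) - (fit(A) + fit_B - fit_U)
print(main_effect_B)
print(interaction)
```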

Example with two treatment factors: feeding chickens

Four diets for feeding newly-hatched chickens were compared. The diets consisted of all combinations of two levels of Protein (groundnuts or soya bean) with two levels of Fishmeal (added or not). Each diet was fed to two chickens, and they were weighed at the end of six weeks.

[Hasse diagram: VP∧F (4) at the top; VP + VF (3); VProtein (2) and VFishmeal (2); VU (1) at the bottom.]

33/43

Chicken example: anova

Source               SS       df   MS        VR
Protein              4704.5    1   4704.50   35.57
Fishmeal             3120.5    1   3120.50   23.60
Protein ∧ Fishmeal    128.0    1    128.00    0.97
residual              529.0    4    132.25

You know how to interpret the anova table: do the scientists who did the experiment know how to?

34/43

Scaling the Hasse diagram of expectation subspaces

Suppose that V1 and V2 are expectation subspaces, with V1 < V2, and an edge joining V1 to V2. The mean square for the extra fit in V2 compared to the fit in V1 is

[SS(fitted values in V2) − SS(fitted values in V1)] / [dim(V2) − dim(V1)].

Scale the Hasse diagram so that each edge has length proportional to the relevant mean square, and show the residual mean square to give a scale.

35/43
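A sketch computing the edge lengths for the chicken example from the anova table above (my own illustration; I am assuming the orthogonality of the design, under which the extra sum of squares for each edge is the corresponding anova line):

```python
# Extra SS and extra df for each edge of the chicken Hasse diagram,
# read off the anova table (orthogonal design).
edges = {
    ("VU", "VP"): (4704.5, 1),
    ("VU", "VF"): (3120.5, 1),
    ("VP", "VP+VF"): (3120.5, 1),   # adding F after P: the Fishmeal line
    ("VF", "VP+VF"): (4704.5, 1),   # adding P after F: the Protein line
    ("VP+VF", "VP^F"): (128.0, 1),  # the interaction line
}
residual_ms = 132.25

for (lo, hi), (ss, df) in edges.items():
    ms = ss / df
    print(f"{lo} -- {hi}: length proportional to MS = {ms:.2f} "
          f"({ms / residual_ms:.2f} x residual MS)")
```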

Chickens: scaled Hasse diagram of expectation subspaces

[Scaled Hasse diagram with vertices VU, VFishmeal, VProtein, VP + VF and VP∧F; each edge is drawn with length proportional to its mean square, and a bar shows the residual mean square as the scale.]

There is no evidence of any interaction, so we can simplify to the additive model. Neither main effect is zero, so we cannot simplify further.

36/43

Example: an experiment about protecting metal

An experiment was conducted to compare two protective dyes for metal, both with each other and with no dye. Ten braided metal cords were broken into three pieces. The three pieces of each cord were randomly allocated to the three treatments. After the dyes had been applied, the cords were left to weather for a fixed time, then their strengths were measured, and recorded as a percentage of the nominal strength specification.

Factors: Dyes, with three levels (no dye, dye A, dye B); Cords, with ten levels; U, with one level; E, with 30 levels.

37/43

Cords: Hasse diagram of expectation subspaces

[Hasse diagram: Vcords + Vdyes (12) at the top; Vcords + VT (11) below it; Vcords (10) at the bottom.]

We assume that there are differences between cords, so all the models that we consider include Vcords. There is another factor T (To-dye-or-not-to-dye). It has one level on ‘no dye’ and another level on both real dyes.

38/43

Cords: Scaled Hasse diagram of expectation subspaces

[Scaled Hasse diagram of Vcords (10), Vcords + VT (11) and Vcords + Vdyes (12), with edge lengths proportional to the mean squares and the residual mean square shown as the scale.]

There is no evidence of a difference between dye A and dye B; but there is definitely a difference between no dye and real dyes.

39/43

Using scaled Hasse diagrams

I have found that non-mathematicians find scaled Hasse diagrams easier to interpret than anova tables, especially for complicated families of models. These diagrams can be extended to deal with non-orthogonal models, and with situations with more than one residual mean square (use different colours for the corresponding edges).

40/43

References I

◮ J. A. Nelder: The analysis of randomized experiments with orthogonal block structure. I. Block structure and the null analysis of variance. Proceedings of the Royal Society of London, Series A 283 (1965), 147–162.

◮ T. P. Speed & R. A. Bailey: On a class of association schemes derived from lattices of equivalence relations. In Algebraic Structures and Applications (eds. P. Schultz, C. E. Praeger & R. P. Sullivan), Marcel Dekker, New York, 1982, pp. 55–74.

◮ H. Großmann: Automating the analysis of variance of orthogonal designs. Computational Statistics and Data Analysis 70 (2014), 1–18.

41/43

References II

◮ J. A. Nelder: A reformulation of linear models (with discussion). Journal of the Royal Statistical Society, Series A 140 (1977), 48–77.

◮ J. A. Nelder: The statistics of linear models: back to basics. Statistics and Computing 4 (1994), 221–234.

◮ J. A. Nelder & P. W. Lane: The computer analysis of factorial experiments: In memoriam—Frank Yates. The American Statistician 49 (1995), 382–385.

◮ J. A. Nelder: The great mixed-model muddle is alive and flourishing, alas! Food Quality and Preference 9 (1998), 157–159.

◮ R. A. Bailey: Design of Comparative Experiments. Cambridge University Press, Cambridge, 2008.

42/43

References III

◮ D. M. Perkins, R. A. Bailey, M. Dossena, L. Gamfeldt, J. Reiss, M. Trimmer & G. Woodward: Higher biodiversity is required to sustain multiple ecosystem processes across temperature regimes. Global Change Biology 21 (2015), 396–406.

◮ R. A. Bailey & J. Reiss: Design and analysis of experiments testing for biodiversity effects in ecology. Journal of Statistical Planning and Inference 144 (2014), 69–80.

◮ K. J. Carpenter & J. Duckworth: Economies in the use of animal by-products in poultry rations. I. Vitamin and amino-acid provision for starting and growing chicks. Journal of Agricultural Science 41 (1941), 297–308.

◮ M. Crowder & A. Kimber: A score test for the multivariate Burr and other Weibull mixture distributions. Scandinavian Journal of Statistics 24 (1997), 419–432.

43/43