Advanced Mathematical Methods Part II Statistics GLM Analysis of - - PowerPoint PPT Presentation



slide-1
SLIDE 1

1

Advanced Mathematical Methods

Part II – Statistics GLM – Analysis of Variance Designs

Mel Slater

http://www.cs.ucl.ac.uk/staff/m.slater/Teaching/Statistics/

slide-2
SLIDE 2

2

Outline

  • Factorial Design
  • One Way Analysis of Variance
  • Two Way Analysis of Variance
  • Two Way Analysis of Variance with replications
  • Mixed Models – Analysis of Covariance

slide-3
SLIDE 3

3

Factorial Design

In regression analysis we assume that the ‘independent variables’ are numerical variables.

Often this is not the case –

  • for example, in a virtual reality experiment a possible influence on the response may be the ‘status’ of the person (undergraduate student, PhD student, staff, administrator, etc.)

Such ‘qualitative’ variables are called ‘factors’.

An experiment designed so that the independent variables are ‘factors’ is called a Factorial Design.

The corresponding models are often called Analysis of Variance (ANOVA) models.

slide-4
SLIDE 4

4

One Way ANOVA

This is the simplest design.

There is one factor, and the factor has k levels.

  • Gender, k = 2 (M, F)
  • Education, k = 5 (None, ‘A level’, BSc/BA, Masters, PhD)

  • Anxiety level, k=3 (low, moderate, high)
  • etc
slide-5
SLIDE 5

5

One Way ANOVA Model

The way to express this situation is

  • E(y_ij) = µ + α_i

– i = 1,2,…,k (number of levels of the factor)
– j = 1,2,…,n (number of observations at each level)

  • µ is the ‘grand mean’
  • α_i is the effect of being at the ith level
  • the y_ij are independent, normally distributed random variables with constant variance σ².

slide-6
SLIDE 6

6

1 Way ANOVA Analysis

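We can therefore use the GLM to find the LS (least squares) estimates:

\[ \mu^* = \bar{y}, \qquad \alpha_i^* = \bar{y}_{i\cdot} - \bar{y} \]

Here \bar{y} is the grand mean and \bar{y}_{i\cdot} is the mean of the observations at level i.

We can also construct an analysis of variance table.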
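The estimates and the analysis of variance table can be sketched in a few lines of base MATLAB. The data matrix below is hypothetical (it is not from the course data), with one column per factor level, and the variable names are chosen only for illustration.

```matlab
% Hypothetical data: k = 3 levels (columns), n = 4 observations per level (rows).
y = [4 7 1;
     5 8 2;
     6 9 3;
     5 8 2];
[n, k] = size(y);

mu_hat    = mean(y(:));            % grand mean, mu*
alpha_hat = mean(y, 1) - mu_hat;   % alpha_i* = (mean of level i) - grand mean

% Sums of squares for the analysis of variance table
level_means = repmat(mean(y, 1), n, 1);
ss_groups = n * sum(alpha_hat.^2);            % between levels, k-1 d.f.
ss_error  = sum(sum((y - level_means).^2));   % within levels, k*(n-1) d.f.
ss_total  = sum((y(:) - mu_hat).^2);          % n*k-1 d.f.

% F statistic for H0: all alpha_i = 0
F = (ss_groups / (k - 1)) / (ss_error / (k * (n - 1)));
```

Note that the estimated effects sum to zero, and that the groups and error sums of squares add up to the total.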

slide-7
SLIDE 7

7

1-Way ANOVA and GLM

This can easily be reformulated as a special case of the general linear model, E(y) = Xβ, with parameter vector

\[ \beta = (\mu, \alpha_1, \alpha_2, \ldots, \alpha_k)^T \]

X is a matrix that consists entirely of 0s and 1s. (What is it?)

Note that the X matrix is not of full rank and therefore (XᵀX) is singular.

An additional constraint must be put on the α_i:

  • Their sum = 0, OR
  • α_1 = 0 (GLIM convention)
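One possible answer to the "What is it?" question is sketched below, assuming the parameters are ordered as µ followed by the k level effects. The sizes and responses are hypothetical and chosen only to keep the matrices small.

```matlab
k = 3; n = 2;                        % hypothetical: 3 levels, 2 observations each
level = kron((1:k)', ones(n, 1));    % level of each observation: 1 1 2 2 3 3

% Column 1 multiplies mu; column i+1 is the 0/1 indicator of level i.
D = zeros(n*k, k);
for i = 1:k
    D(:, i) = (level == i);
end
X = [ones(n*k, 1), D];

r = rank(X);   % equals k, not k+1: column 1 is the sum of the indicator columns

% Constraint alpha_1 = 0: drop the indicator column of level 1.
Xc = X(:, [1, 3:end]);
y  = [5; 7; 9; 11; 2; 4];            % hypothetical responses
b  = Xc \ y;                         % least squares solution, now unique
```

Under the α_1 = 0 constraint the first coefficient is the mean of level 1 rather than the grand mean, and the remaining coefficients are differences from level 1.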

slide-8
SLIDE 8

8

Example

This example refers to the ‘paranoia’ data.

We will use 1-Way ANOVA to look at the influence of a factor on the response variable ‘vrtotal’

  • Gender

vrtotal is the ‘total’ paranoia experienced by subjects in the VR.

slide-9
SLIDE 9

9

MATLAB

Make the variables by extracting them from the relevant columns of the spreadsheet (see answers to exercises 3)

vrtotal = s.data(:,14);
sex = s.data(:,3);

slide-10
SLIDE 10

10

Influence of Gender

H0: no difference between the mean paranoia of males and females

[P,anovatab] = anova1(vrtotal,sex,'on')

  • vrtotal is the response
  • ‘sex’ is the gender factor
  • ‘on’ means that we want a graphical display
  • P is the p-value of the F test, i.e. the probability under H0 of an F statistic at least as large as the one observed
  • anovatab is the corresponding Analysis of Variance table that is output

slide-11
SLIDE 11

11

Influence of Gender

Source      SS        Df    MS        F         Prob>F
Groups      0.0417     1    0.0417    0.0024    0.9615
Error       384.9     22    17.50
Total       384.96    23

In this case we clearly do not reject the null hypothesis: the data give no evidence that gender influences the response.
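As a check on how the table entries relate to each other, the mean squares and the F statistic follow from the sums of squares and degrees of freedom quoted in the table (rounded as printed there):

```latex
\[
MS_{\text{Error}} = \frac{SS_{\text{Error}}}{df_{\text{Error}}} = \frac{384.9}{22} \approx 17.50,
\qquad
F = \frac{MS_{\text{Groups}}}{MS_{\text{Error}}} = \frac{0.0417}{17.50} \approx 0.0024
\]
```

Under H0 this statistic is compared with the F distribution on (1, 22) degrees of freedom, which gives the Prob>F value of 0.9615.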

slide-12
SLIDE 12

12

Using GLIM

$factor sex 2   !a factor with 2 levels
$fit $          !the deviance gives the total SS and d.f.
$fit sex        !the deviance gives the residual SS and d.f.

The rest can be computed from these two

  • Fitted SS = total SS – residual SS
  • etc
slide-13
SLIDE 13

13

GLIM Note on Factor Levels

Factor levels must start from 1, not 0.

Hence in GLIM for this data we would have to do:

$cal sex = sex+1

(since sex is coded as 0,1)

slide-14
SLIDE 14

14

Two Way ANOVA

Of course, just examining one factor is restrictive.

There may be several factors.

We will consider 2 factors influencing the response.

slide-15
SLIDE 15

15

2 Way ANOVA Model

The way to express this situation is

  • E(y_ij) = µ + α_i + β_j

– i = 1,2,…,m (number of rows – levels of factor 1)
– j = 1,2,…,n (number of columns – levels of factor 2)

  • µ is the ‘grand mean’
  • α_i is the effect of being at the ith level of factor 1
  • β_j is the effect of being at the jth level of factor 2
  • the y_ij are independent, normally distributed random variables with constant variance σ².

slide-16
SLIDE 16

16

2-Way ANOVA and GLM

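Again, this is a special case of GLM.

LS estimators are:

\[ \mu^* = \bar{y}, \qquad \alpha_i^* = \bar{y}_{i\cdot} - \bar{y}, \qquad \beta_j^* = \bar{y}_{\cdot j} - \bar{y} \]

  • α_i*: mean of ith row – grand mean
  • β_j*: mean of jth column – grand mean

A similar ANOVA table can be constructed.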
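The ANOVA table for this design rests on the standard decomposition of the total sum of squares into row, column and error parts. It is written here in the notation of the estimators, with m rows and n columns and one observation per cell:

```latex
\[
\sum_{i=1}^{m}\sum_{j=1}^{n} (y_{ij} - \bar{y})^2
= n \sum_{i=1}^{m} (\bar{y}_{i\cdot} - \bar{y})^2
+ m \sum_{j=1}^{n} (\bar{y}_{\cdot j} - \bar{y})^2
+ \sum_{i=1}^{m}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y})^2
\]
\[
mn - 1 = (m - 1) + (n - 1) + (m - 1)(n - 1)
\]
```

The second line gives the matching split of the degrees of freedom (total = rows + columns + error).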

slide-17
SLIDE 17

17

Example – Fear of Public Speaking in Virtual Reality

Two Factors

  • Immersion – 2 levels: Desktop or Head Mounted Display (HMD)
  • Group – 3 levels: had a neutral, positive or negative virtual audience.

Objective – to see how anxiety varies with Group, Immersion, and prior tendency to fear of public speaking (FOPS)

Response variables – various measures of anxiety and comfort.
slide-18
SLIDE 18

18

2-Way ANOVA Example

Response variable ‘interested’

  • This is the person’s own assessment of the ‘interest’ of the virtual audience

– “How interested was the audience in what you had to say?” scored on a 1-7 scale with 1 = not at all, 7 = very much

  • We take the average of the interest scores over the people in each cell of the factorial table….

slide-19
SLIDE 19

19

Factorial Table for ‘Interested’

Average of interested

Immersion      Group 1   Group 2   Group 3   Grand Total
1 Desktop        2.7       1.0       6.0        3.2
2 HMD            4.2       1.2       6.7        4.0
Grand Total      3.4       1.1       6.3        3.6

Rows are the Immersion levels; columns are the Group levels (neutral, positive or negative audience).

Each entry is the average ‘interested’ score for the 6 people in that cell.

slide-20
SLIDE 20

20

ANOVA Using MATLAB

y = [2.7 1.0 6.0;
     4.2 1.2 6.7];

anova2(y,1,'on')

  • y is the response
  • 1 is the number of observations (replications) in each cell
  • ‘on’ means a graphical display is output
slide-21
SLIDE 21

21

2-Way ANOVA Table

Source      SS         Df    MS         F          Prob>F
Columns     27.6633     2    13.8317    64.3333    0.0153
Rows         0.9600     1     0.9600     4.4651    0.1689
Error        0.4300     2     0.2150
Total       29.0533     5

H0a: All row means equal – not rejected at 5% level
H0b: All column means equal – rejected at 5% level
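The sums of squares and F statistics in this table can be reproduced from the 2×3 table of cell means using base MATLAB only. This is a sketch of the arithmetic, not of what anova2 does internally, and the variable names are chosen for illustration.

```matlab
y = [2.7 1.0 6.0;
     4.2 1.2 6.7];           % cell means: rows = immersion, columns = group
[m, n] = size(y);            % m = 2 rows, n = 3 columns

gm      = mean(y(:));        % grand mean, mu*
row_eff = mean(y, 2) - gm;   % alpha_i* = row mean - grand mean
col_eff = mean(y, 1) - gm;   % beta_j*  = column mean - grand mean

ss_rows  = n * sum(row_eff.^2);
ss_cols  = m * sum(col_eff.^2);
ss_total = sum((y(:) - gm).^2);
ss_error = ss_total - ss_rows - ss_cols;

df_error = (m - 1) * (n - 1);
F_rows = (ss_rows / (m - 1)) / (ss_error / df_error);
F_cols = (ss_cols / (n - 1)) / (ss_error / df_error);
```

The Prob>F column would additionally need the F distribution function, which base MATLAB does not provide.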

slide-22
SLIDE 22

22

Using GLIM

$units 6
$data x
$read 2.7 1.0 6.0 4.2 1.2 6.7
$cal row = %gl(2,3) $
$c this generates 2 levels with 3 copies at each level
$cal col = %gl(3,1)
$c this calculates the column factor levels
$factor row 2 col 3
$fit $c the deviance and df are for the total SS
$fit row+col $c the deviance and df are for the residual SS
$fit -row $c the change in deviance and df are the row SS
$fit row + col $c the full model again
$fit -col $c the change in deviance and df are the col SS

$c from these the complete table can be constructed.

slide-23
SLIDE 23

23

Example FOPS Continued

Averaging the response variable within each cell throws data away.

Instead we can deal with the actual replications.

This enhances the theoretical model, since we can then include an ‘interaction’ term between row and column effects.

This interaction term models the non-additivity, i.e., it allows for the possibility that the row and column factors together produce an effect that differs from the sum of their separate effects.

slide-24
SLIDE 24

24

2-Way ANOVA with p replications per cell

The way to express this situation is

  • E(y_ijk) = µ + α_i + β_j + γ_ij

– i = 1,2,…,m (number of rows – levels of factor 1)
– j = 1,2,…,n (number of columns – levels of factor 2)
– k = 1,2,…,p (number of replications in each cell)

  • µ is the ‘grand mean’
  • α_i is the effect of being at the ith level of factor 1
  • β_j is the effect of being at the jth level of factor 2
  • γ_ij is the interaction effect in the (i,j)th cell
  • the y_ijk are independent, normally distributed random variables with constant variance σ².

slide-25
SLIDE 25

25

Using MATLAB

The data must be put in the form of the m*n table, but each cell must consist of p rows of replications.

There will therefore be (mp) rows and n columns.

[p,table] = anova2(y,6,'on');

will produce the table …
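A minimal sketch of building such an (mp)-by-n matrix from long-format data is given below. The vectors rowf, colf and resp are hypothetical names and values (not the FOPS data), with only p = 2 replications per cell to keep the example short.

```matlab
% Hypothetical long-format data: m = 2 row levels, n = 3 column levels,
% p = 2 replications per cell.
rowf = [1 1 1 1 1 1 2 2 2 2 2 2]';   % row factor level of each observation
colf = [1 1 2 2 3 3 1 1 2 2 3 3]';   % column factor level of each observation
resp = [3 2 1 1 6 5 4 5 1 2 7 6]';   % response
m = 2; n = 3; p = 2;

y = zeros(m*p, n);
for i = 1:m
    for j = 1:n
        cellvals = resp(rowf == i & colf == j);   % the p replications in cell (i,j)
        y((i-1)*p + (1:p), j) = cellvals;         % rows (i-1)*p+1 ... i*p hold row level i
    end
end
% y is now (m*p)-by-n and could be passed on as anova2(y, p, 'on').
```

The loop assumes a balanced design, i.e. exactly p observations in every cell; with unequal cell sizes the assignment would fail.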

slide-26
SLIDE 26

26

ANOVA Table

Source         SS          Df    MS         F          Prob>F
Columns        166.0556     2    83.0278    72.5485    0.0000
Rows             5.4444     1     5.4444     4.7573    0.0371
Interaction      2.7222     2     1.3611     1.1893    0.3184
Error           34.3333    30     1.1444
Total          208.5556    35

The hypothesis that all column means (groups) are equal would be rejected.

The hypothesis that all row means (immersion) are equal would be rejected at the 5% level.

There is, however, no significant interaction effect.
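The degrees of freedom and F statistics in this table can be checked by hand, taking m = 2 immersion levels, n = 3 groups and p = 6 replications per cell:

```latex
\[
df_{\text{Rows}} = m - 1 = 1, \quad
df_{\text{Columns}} = n - 1 = 2, \quad
df_{\text{Interaction}} = (m-1)(n-1) = 2, \quad
df_{\text{Error}} = mn(p-1) = 30, \quad
df_{\text{Total}} = mnp - 1 = 35
\]
\[
MS_{\text{Error}} = \frac{34.3333}{30} \approx 1.1444, \qquad
F_{\text{Columns}} = \frac{83.0278}{MS_{\text{Error}}} \approx 72.55, \qquad
F_{\text{Rows}} = \frac{5.4444}{MS_{\text{Error}}} \approx 4.757, \qquad
F_{\text{Interaction}} = \frac{1.3611}{MS_{\text{Error}}} \approx 1.189
\]
```

Each F statistic uses the error mean square as its denominator.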

slide-27
SLIDE 27

27

Using GLIM

We can read the data file for GLIM directly without having to organise the data into the rows and columns.

We read in the variables immersion, group and interested.

$factor immersion 2 group 3$
$c declares the factors

slide-28
SLIDE 28

28

GLIM

$echo
$input 10 132
$echo
File name? fops3006.txt
$units 36
$data ID Immersion Group w Age sex Ethnic Language Occupation Games PRCS FNE SAD Comfortable pleased Audience People Computer aware impression friendly interested again selfrating somatic MPRCS
$read
Data goes here

slide-29
SLIDE 29

29

GLIM

$yvar interest$
$factor immersion 2 group 3$
$fit $c deviance = total SS
$fit immersion*group $c deviance = residual SS
$fit -immersion.group $c change in deviance = interaction SS
$fit -immersion $c change in deviance = immersion (row) SS
$fit +immersion $c model is immersion+group
$fit -group $c change in deviance = group (col) SS

slide-30
SLIDE 30

30

GLIM Notation

Suppose there are two factors, row and col.

$fit row.col is the interaction effect

$fit row*col is the same as

  • $fit row+col + row.col
slide-31
SLIDE 31

31

Using GLIM for Means

$tab the interest mean for immersion;group$

            GROUP      1        2        3
IMMERSIO
    1                2.667    1.000    6.000
    2                4.167    1.167    6.667

$tab the interest mean for immersion$

IMMERSIO      1        2
MEAN        3.222    4.000

$tab the interest mean for group$

GROUP         1        2        3
MEAN        3.417    1.083    6.333

slide-32
SLIDE 32

32

Mixed Models: Analysis of Covariance

With GLIM it is very easy to combine factors and variables in the fit.

In our example we might want to also include the potential impact of age on the results.

In general suppose that y is the response and the factors are A, B and x is another variable.

slide-33
SLIDE 33

33

Mixed Models

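$fit row*col+x       !the same slope everywhere

\[ E(y_{ijk}) = \mu + \alpha_i + \beta_j + \gamma_{ij} + \lambda x \]

$fit row*col+row.x   !slope can differ in each row

\[ E(y_{ijk}) = \mu + \alpha_i + \beta_j + \gamma_{ij} + \lambda_i x \]

$fit row*col+col.x   !slope can differ in each column

\[ E(y_{ijk}) = \mu + \alpha_i + \beta_j + \gamma_{ij} + \lambda_j x \]

$fit row*col*x       !slope can differ in each cell

\[ E(y_{ijk}) = \mu + \alpha_i + \beta_j + \gamma_{ij} + \lambda_{ij} x \]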
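The difference between a common slope and a slope that varies with a factor can be sketched in base MATLAB with a design matrix. For brevity this assumes a single two-level factor g instead of the row and column factors, uses the α_1 = 0 constraint, and the data are hypothetical values constructed to lie exactly on two parallel lines.

```matlab
% Hypothetical data: one factor g with 2 levels plus a covariate x.
g = [1 1 1 2 2 2]';
x = [1 2 3 1 2 3]';
y = [3 5 7 6 8 10]';    % y = 1 + 2*x at level 1, y = 4 + 2*x at level 2

% Common slope (analogue of $fit g + x): intercept, level-2 effect, slope.
X = [ones(6, 1), double(g == 2), x];
b = X \ y;

% Separate slopes (analogue of $fit g + g.x): add a column for the extra
% slope at level 2.
Xs = [X, double(g == 2) .* x];
bs = Xs \ y;
```

Here the last entry of bs estimates how much the slope at level 2 differs from the slope at level 1. It comes out as zero because the two lines were constructed to be parallel.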

slide-34
SLIDE 34

34

Note on Missing Data

Often during an experiment data is lost, or damaged in some way.

Analysis can still be carried out.

Be careful, because missing data is often coded as some number such as ‘99999’.

If you don’t watch for this then you can completely invalidate an analysis.

slide-35
SLIDE 35

35

GLIM and Missing Data

In GLIM it is very easy to exclude a data record that has missing data.

Make a vector w which has 1 for each unit of data which does not contain missing values, and 0 otherwise.

w is the same length as the number of units – e.g., the number of subjects in an experiment.

$weight w$

  • will only include in the analysis those units (people) that do not have missing data.

$weight $

  • will restore the default, equivalent to all w=1
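The same 0/1 weight idea can be sketched in base MATLAB. The data and the missing-value code 99999 are assumptions made for illustration. The sketch shows how a single missing-value code distorts a mean unless the affected unit is excluded.

```matlab
% Hypothetical response; 99999 is assumed to be the missing-value code.
y = [3; 5; 99999; 4; 6];
MISSING = 99999;

w = double(y ~= MISSING);      % 1 = complete unit, 0 = unit with a missing value

naive_mean = mean(y);          % includes the missing-value code as if it were data
clean_mean = mean(y(w == 1));  % uses only the units with w = 1
```

With several variables per unit, w would be set to 0 for any unit in which at least one variable carries the missing-value code.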