1
Advanced Mathematical Methods
Part II – Statistics GLM – Analysis of Variance Designs
Mel Slater
http://www.cs.ucl.ac.uk/staff/m.slater/Teaching/Statistics/
2
Outline
– Factorial Design
– One Way Analysis of Variance
– Two Way Analysis of Variance
– Two Way Analysis of Variance with …
– Mixed Models – Analysis of …
3
In regression analysis we assume that the ‘independent variables’ are numerical variables.
Often this is not the case; for example, a variable with an impact on the response may be the ‘status’ of the person (undergraduate student, PhD student, staff, administrator, etc.).
Such ‘qualitative’ variables are called ‘factors’. An experiment in which the independent variables are factors is called a factorial design.
The corresponding models are often called Analysis of Variance (ANOVA) models.
4
This is the simplest design. There is one factor, and the factor …
5
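The way to express this situation is
\( y_{ij} = \mu + \alpha_i + \varepsilon_{ij} \)
– i = 1,2,…,k (number of levels of the factor)
– j = 1,2,…,n (number of observations at each level)
– the errors \( \varepsilon_{ij} \) are independent, with mean 0 and constant variance σ².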
6
We can therefore use the GLM to
We can also construct an analysis of
7
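This can easily be reformulated as a special case of the general linear model, with:
– X a matrix that consists entirely of 0s and 1s (What is it?)
Note that the X matrix is not of full rank, and therefore \( X^T X \) is singular.
An additional constraint must be put on the \( \alpha_i \):
\( n_1\alpha_1 + n_2\alpha_2 + \cdots + n_k\alpha_k = 0 \), where \( n_i \) is the number of observations at level i (here every \( n_i = n \)).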
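To see why \( X^T X \) is singular, here is a minimal Python sketch (standard library only; the function names and the sizes k = 3, n = 2 are illustrative assumptions, not from the slides). It builds X with a column of 1s for μ followed by one 0/1 indicator column per level, then computes ranks with exact fractions. The indicator columns add up to the first column, so the rank is k rather than k + 1.

```python
from fractions import Fraction


def one_way_design(k, n):
    """Design matrix for a one-way layout, rows ordered level by level.

    Column 0 is the constant (mu); column i (1..k) is the indicator of level i.
    """
    rows = []
    for level in range(k):
        for _ in range(n):
            rows.append([1] + [1 if c == level else 0 for c in range(k)])
    return rows


def matrix_rank(matrix):
    """Rank by Gauss-Jordan elimination over exact fractions (no rounding error)."""
    m = [[Fraction(v) for v in row] for row in matrix]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no usable pivot: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


X = one_way_design(k=3, n=2)
n_cols = len(X[0])
XtX = [[sum(row[a] * row[b] for row in X) for b in range(n_cols)] for a in range(n_cols)]
print(n_cols, matrix_rank(X), matrix_rank(XtX))
```

Both X and \( X^T X \) have rank 3 with 4 columns, so the normal equations have no unique solution until a constraint on the \( \alpha_i \) is added.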
8
This example refers to the ‘paranoia’
We will use 1-Way ANOVA to look
vrtotal is the ‘total’ paranoia
9
Make the variables by extracting
vrtotal = s.data(:,14); sex = s.data(:,3);
10
H0: no difference between mean paranoia of
[P,anovatab] = anova1(vrtotal,sex,'on')
table that is output
11
Source    SS       Df   MS       F        Prob>F
Groups    0.0417    1   0.0417   0.0024   0.9615
Error     384.9    22   17.50
Total     384.96   23
In this case we clearly do not reject the null hypothesis (p = 0.9615). We conclude that there is no evidence that gender influences the response.
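As a minimal sketch of what `anova1` computes (Python, standard library only; the function name and the data are hypothetical, since the paranoia data are not reproduced here): the total SS is taken about the grand mean, the error SS about each level's own mean, and the groups SS is their difference. The Prob>F column needs the F distribution and is omitted to avoid a third-party dependency.

```python
def one_way_anova(groups):
    """One-way ANOVA table from raw data.

    groups: one list of observations per factor level (sizes may differ).
    Returns (SS, df, MS, F) for Groups, (SS, df, MS) for Error, (SS, df) for Total.
    """
    all_obs = [y for g in groups for y in g]
    n_total, k = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n_total
    level_means = [sum(g) / len(g) for g in groups]

    ss_total = sum((y - grand_mean) ** 2 for y in all_obs)
    # Residual SS: squared deviations from each level's own mean.
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, level_means) for y in g)
    ss_groups = ss_total - ss_error

    df_groups, df_error = k - 1, n_total - k
    ms_groups, ms_error = ss_groups / df_groups, ss_error / df_error
    return {
        "Groups": (ss_groups, df_groups, ms_groups, ms_groups / ms_error),
        "Error": (ss_error, df_error, ms_error),
        "Total": (ss_total, n_total - 1),
    }


# Hypothetical data: two levels, three observations each.
table = one_way_anova([[4, 5, 6], [7, 8, 9]])
for source, row in table.items():
    print(source, row)
```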
12
$factor sex 2 ! a factor with 2 levels
$fit $ ! the deviance gives the total SS, and
$fit sex ! the deviance gives the residual SS
The rest of the table can be computed from these two.
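The arithmetic that completes the table from these two fits can be sketched in Python (the function name and the deviance values are hypothetical, for illustration only). For a normal-errors model the GLIM deviance is the residual sum of squares, so the between-groups SS is the drop in deviance from `$fit $` to `$fit sex`, and its df is the drop in df.

```python
def complete_table(dev_total, df_total, dev_resid, df_resid):
    """Fill in a one-way ANOVA table from two GLIM fits.

    dev_total, df_total: deviance and df after `$fit $` (the total SS).
    dev_resid, df_resid: deviance and df after `$fit sex` (the residual SS).
    """
    ss_groups = dev_total - dev_resid   # change in deviance = between-groups SS
    df_groups = df_total - df_resid     # change in df = number of levels - 1
    ms_groups = ss_groups / df_groups
    ms_error = dev_resid / df_resid
    return {
        "Groups": (ss_groups, df_groups, ms_groups, ms_groups / ms_error),
        "Error": (dev_resid, df_resid, ms_error),
        "Total": (dev_total, df_total),
    }


# Hypothetical deviances, for illustration only.
print(complete_table(10.0, 11, 4.0, 10))
```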
13
Factor levels must start from 1, not 0. Hence in GLIM, for this data, we use
$cal sex = sex+1 (since sex is coded 0, 1)
14
Of course just examining one factor
There may be several factors. We will consider two factors.
15
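The way to express this situation is
\( y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij} \)
– i = 1,2,…,m (number of rows – levels of factor 1)
– j = 1,2,…,n (number of columns – levels of factor 2)
– the errors \( \varepsilon_{ij} \) are independent, with mean 0 and constant variance σ².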
16
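Again, this is a special case of the GLM. The LS estimators are:
\( \hat{\mu} = \bar{y}_{\cdot\cdot} \) (grand mean)
\( \hat{\alpha}_i = \bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot} \) (mean of ith row – grand mean)
\( \hat{\beta}_j = \bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot} \) (mean of jth column – grand mean)
A similar ANOVA table can be constructed.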
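The estimators are just differences of means, as this minimal Python sketch shows (the function name and the 2 × 3 table of values are hypothetical). Row effects are row means minus the grand mean and column effects are column means minus the grand mean, so each set of estimates sums to zero automatically.

```python
def two_way_estimates(y):
    """LS estimates for y_ij = mu + alpha_i + beta_j + e_ij, one observation per cell."""
    m, n = len(y), len(y[0])
    grand_mean = sum(sum(row) for row in y) / (m * n)
    alpha = [sum(row) / n - grand_mean for row in y]                             # row mean - grand mean
    beta = [sum(y[i][j] for i in range(m)) / m - grand_mean for j in range(n)]   # column mean - grand mean
    return grand_mean, alpha, beta


# Hypothetical 2 x 3 table (rows = levels of factor 1, columns = levels of factor 2).
y = [[1.0, 2.0, 3.0],
     [3.0, 4.0, 5.0]]
mu_hat, alpha_hat, beta_hat = two_way_estimates(y)
print(mu_hat, alpha_hat, beta_hat)
```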
17
Two Factors
Mounted Display (HMD)
negative virtual audience.
Objective – to see how anxiety varies with
Response variables – various measures
18
Response variable ‘interested’
– “How interested was the audience in what you had to say?” scored on a 1-7 scale with 1 = not at all, 7 = very much
19
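Average of interested
Immersion      Group 1   Group 2   Group 3   Grand Total
1 Desktop        2.7       1.0       6.0        3.2
2 HMD            4.2       1.2       6.7        4.0
Grand Total      3.4       1.1       6.3        3.6
Group: the audience type (Neutral, Positive or Negative), coded 1–3.
Each cell entry is the average ‘interested’ score for the 6 people in that immersion × group combination; the Grand Total row and column are the corresponding marginal averages.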
20
y = [2.7 1.0 6.0; 4.2 1.2 6.7];
anova2(y,1,'on')
21
Source    SS        Df   MS        F         Prob>F
Columns   27.6633    2   13.8317   64.3333   0.0153
Rows       0.9600    1    0.9600    4.4651   0.1689
Error      0.4300    2    0.2150
Total     29.0533    5
H0a (all row means equal): not rejected at the 5% level (p = 0.1689)
H0b (all column means equal): rejected at the 5% level (p = 0.0153)
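As a check on the table above, here is a minimal Python sketch (standard library only; the function name is illustrative) of the two-way ANOVA without replication, applied to the cell averages of slide 19. It decomposes the total SS into row, column and error parts, where the error is what the additive model leaves unexplained, with (m − 1)(n − 1) df.

```python
def two_way_anova_no_rep(y):
    """Two-way ANOVA without replication (one observation per cell, additive model)."""
    m, n = len(y), len(y[0])
    grand_mean = sum(sum(row) for row in y) / (m * n)
    row_means = [sum(row) / n for row in y]
    col_means = [sum(y[i][j] for i in range(m)) / m for j in range(n)]

    ss_total = sum((v - grand_mean) ** 2 for row in y for v in row)
    ss_rows = n * sum((r - grand_mean) ** 2 for r in row_means)
    ss_cols = m * sum((c - grand_mean) ** 2 for c in col_means)
    ss_error = ss_total - ss_rows - ss_cols   # left unexplained by rows + columns

    df_rows, df_cols, df_error = m - 1, n - 1, (m - 1) * (n - 1)
    ms_error = ss_error / df_error
    return {
        "Columns": (ss_cols, df_cols, ss_cols / df_cols, ss_cols / df_cols / ms_error),
        "Rows": (ss_rows, df_rows, ss_rows / df_rows, ss_rows / df_rows / ms_error),
        "Error": (ss_error, df_error, ms_error),
        "Total": (ss_total, m * n - 1),
    }


y = [[2.7, 1.0, 6.0],   # Desktop
     [4.2, 1.2, 6.7]]   # HMD
for source, row in two_way_anova_no_rep(y).items():
    print(source, [round(v, 4) for v in row])
```

The SS, df, MS and F values agree with the MATLAB output; the p-values would additionally need the F distribution.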
22
$units 6
$data x
$read 2.7 1.0 6.0 4.2 1.2 6.7
$cal row = %gl(2,3) $ $c this generates 2 levels with 3 copies at each level
$cal col = %gl(3,1) $c this calculates the column factor levels
$factor row 2 col 3
$fit $c the deviance and df are for the total SS
$fit row+col $c the deviance and df are for the residual SS
$fit -row $c the change in deviance and df give the row SS and df
$fit row+col $c the full model again
$fit -col $c the change in deviance and df give the col SS and df
$c from these the complete table can be constructed.
23
Averaging the response variable within each cell throws data away.
Instead we can work with the actual replications. This enriches the model, since
we can then include an ‘interaction’ term between the row and column effects.
The interaction term models non-additivity,
i.e., it allows for the possibility that the row and column factors together produce an effect that differs from the sum of their separate effects.
24
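The way to express this situation is
\( y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk} \)
– i = 1,2,…,m (number of rows – levels of factor 1)
– j = 1,2,…,n (number of columns – levels of factor 2)
– k = 1,2,…,p (number of replications in each cell)
– \( \gamma_{ij} \) is the row × column interaction term, and the errors \( \varepsilon_{ijk} \) are independent, with mean 0 and constant variance σ².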
25
The data must be put in the form of
There will therefore be (mp) rows
[p,table] = anova2(y,6,'on'); will produce the table …
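MATLAB's `anova2(y, reps)` expects the columns of y to be the levels of one factor and each block of `reps` consecutive rows to be the replicates for one level of the other factor. A minimal Python sketch of that rearrangement follows; the record format (row level, column level, value, with levels numbered from 1) and the function name are assumptions for illustration. For the experiment here, m = 2, n = 3 and p = 6 give the 12 × 3 matrix passed to `anova2(y,6,'on')`.

```python
def to_anova2_layout(records, m, n, p):
    """Arrange (row_level, col_level, value) records, levels numbered from 1,
    into an (m*p) x n matrix: rows 1..p hold the replicates for row level 1,
    rows p+1..2p those for row level 2, and so on; column j is column level j."""
    layout = [[None] * n for _ in range(m * p)]
    placed = [[0] * n for _ in range(m)]   # replicates placed so far in each cell
    for row_level, col_level, value in records:
        i, j = row_level - 1, col_level - 1
        k = placed[i][j]
        if k == p:
            raise ValueError(f"cell ({row_level}, {col_level}) has more than {p} replicates")
        layout[i * p + k][j] = value
        placed[i][j] = k + 1
    if any(count != p for row in placed for count in row):
        raise ValueError("unbalanced design: every cell needs exactly p replicates")
    return layout


# Hypothetical records: 2 row levels, 2 column levels, 2 replicates per cell.
recs = [(1, 1, 10), (1, 2, 20), (2, 1, 30), (2, 2, 40),
        (1, 1, 11), (1, 2, 21), (2, 1, 31), (2, 2, 41)]
for row in to_anova2_layout(recs, m=2, n=2, p=2):
    print(row)
```

Raising an error for unbalanced cells is a deliberate choice: this layout only makes sense when every cell has the same number of replications.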
26
Source         SS         Df   MS        F         Prob>F
Columns        166.0556    2   83.0278   72.5485   0.0000
Rows             5.4444    1    5.4444    4.7573   0.0371
Interaction      2.7222    2    1.3611    1.1893   0.3184
Error           34.3333   30    1.1444
Total          208.5556   35
The hypothesis that all column means (groups) are equal would be rejected (p < 0.0001). The hypothesis that all row means (immersion) are equal would also be rejected at the 5% level (p = 0.0371). However, there is no significant interaction effect (p = 0.3184).
27
We can read the data file for GLIM
We read in the variables immersion,
$factor immersion 2 group 3$ $c declares the factors
28
$echo $input 10 132 $echo
File name? fops3006.txt
$units 36
$data ID Immersion Group w Age sex Ethnic Language Occupation Games PRCS FNE SAD Comfortable pleased Audience People Computer aware impression friendly interested again selfrating somatic MPRCS
$read
Data goes here
29
$yvar interest$
$factor immersion 2 group 3$
$fit $c deviance = total SS
$fit immersion*group $c deviance = residual SS
$fit -immersion.group $c change in deviance = interaction SS
$fit -immersion $c change in deviance = immersion (row) SS
$fit +immersion $c model is immersion+group
$fit -group $c change in deviance = group (col) SS
30
Suppose there are two factors row,
$fit row.col (the term row.col is the interaction effect)
$fit row*col is the same as $fit row+col+row.col
31
$tab the interest mean for immersion;group$
            GROUP
              1       2       3
IMMERSIO
  1         2.667   1.000   6.000
  2         4.167   1.167   6.667
$tab the interest mean for immersion$
IMMERSIO      1       2
MEAN        3.222   4.000
$tab the interest mean for group$
GROUP         1       2       3
MEAN        3.417   1.083   6.333
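In a balanced design (p replications in every cell) the row, column and interaction SS depend only on the cell means, so the means printed by `$tab` can be used to check the table on slide 26. A minimal Python sketch (the function name is illustrative): because the printed means are rounded to three decimals, the results agree with that table only to about two decimal places. The error SS cannot be obtained this way; it needs the individual observations.

```python
def between_cell_ss(cell_means, p):
    """Rows, columns and interaction SS for a balanced two-way design.

    cell_means: m x n table of cell means; p: replications per cell.
    The error SS is not computable from the means: it needs the raw data.
    """
    m, n = len(cell_means), len(cell_means[0])
    grand_mean = sum(sum(row) for row in cell_means) / (m * n)   # valid because every cell has p units
    row_means = [sum(row) / n for row in cell_means]
    col_means = [sum(cell_means[i][j] for i in range(m)) / m for j in range(n)]
    ss_rows = n * p * sum((r - grand_mean) ** 2 for r in row_means)
    ss_cols = m * p * sum((c - grand_mean) ** 2 for c in col_means)
    ss_cells = p * sum((v - grand_mean) ** 2 for row in cell_means for v in row)
    return ss_rows, ss_cols, ss_cells - ss_rows - ss_cols


# Cell means printed by $tab (rows: immersion 1, 2; columns: group 1, 2, 3), 6 people per cell.
means = [[2.667, 1.000, 6.000],
         [4.167, 1.167, 6.667]]
print([round(ss, 2) for ss in between_cell_ss(means, p=6)])
```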
32
With GLIM it is very easy to combine
In our example we might want to
In general suppose that y is the
33
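$fit row*col+x ! the same slope everywhere
\( y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \delta x_{ijk} + \varepsilon_{ijk} \)
$fit row*col+row.x ! the slope can be different in each row
\( y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \delta_i x_{ijk} + \varepsilon_{ijk} \)
$fit row*col+col.x ! the slope can be different in each column
\( y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \delta_j x_{ijk} + \varepsilon_{ijk} \)
$fit row*col*x ! the slope can be different in each cell
\( y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \delta_{ij} x_{ijk} + \varepsilon_{ijk} \)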
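The first and last of these models can be estimated by hand, as in this minimal Python sketch (function names and data are invented for illustration; GLIM does this fitting itself). With a full row*col term every cell has its own intercept, so the least-squares common slope in `row*col+x` is the pooled within-cell slope \( S_{xy}/S_{xx} \); in `row*col*x` each cell gets its own intercept and slope. The row- and column-specific slope models are not covered here.

```python
def common_slope_fit(cells):
    """Separate intercept for every cell, one slope shared by all cells (row*col+x).

    cells: dict mapping a cell label, e.g. (row, col), to a list of (x, y) pairs.
    Each cell needs at least two distinct x values.
    """
    sxy = sxx = 0.0
    cell_means = {}
    for label, pairs in cells.items():
        x_bar = sum(x for x, _ in pairs) / len(pairs)
        y_bar = sum(y for _, y in pairs) / len(pairs)
        cell_means[label] = (x_bar, y_bar)
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in pairs)   # within-cell cross-products
        sxx += sum((x - x_bar) ** 2 for x, _ in pairs)            # within-cell squares
    slope = sxy / sxx
    intercepts = {label: y_bar - slope * x_bar for label, (x_bar, y_bar) in cell_means.items()}
    return slope, intercepts


def separate_slopes_fit(cells):
    """Intercept and slope estimated within each cell on its own (row*col*x)."""
    fits = {}
    for label, pairs in cells.items():
        slope, intercepts = common_slope_fit({label: pairs})
        fits[label] = (intercepts[label], slope)
    return fits


# Invented data: cells (1,1) and (1,2) have slope 2, cell (2,1) has slope 3.
cells = {
    (1, 1): [(0, 1.0), (1, 3.0), (2, 5.0)],
    (1, 2): [(0, 4.0), (1, 6.0), (2, 8.0)],
    (2, 1): [(0, 0.0), (1, 3.0), (2, 6.0)],
}
print(common_slope_fit(cells))
print(separate_slopes_fit(cells))
```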
34
Often during an experiment data is lost, or
Analysis can still be carried out. Be careful, because missing data is often
If you don’t watch for this then you can
35
In GLIM it is very easy to exclude a data record that has missing data.
Make a vector w which has 1 for each unit of data that does not contain missing values, and 0 otherwise.
w has the same length as the number of units, e.g., the number of subjects in an experiment.
$weight w$
Subsequent fits then use only the units that do not have missing data.
$weight $
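The effect of 0/1 weights can be mimicked in a few lines of Python (a minimal sketch; the record layout and function names are hypothetical, with `None` standing in for a missing value). A unit with weight 0 contributes nothing to any sum of squares, so weighting is equivalent to dropping the incomplete records while keeping w the same length as the data.

```python
def weight_vector(records):
    """1 for a unit with no missing values (None marks a missing value), 0 otherwise."""
    return [0 if any(value is None for value in record) else 1 for record in records]


def weighted_mean(values, w):
    """Mean over the units with weight 1; weight-0 units (possibly None) are skipped."""
    kept = [v for v, wi in zip(values, w) if wi == 1]
    return sum(kept) / len(kept)


# Hypothetical records: (group, score); the second subject's score is missing.
records = [(1, 3.0), (2, None), (1, 5.0), (2, 4.0)]
w = weight_vector(records)
scores = [score for _, score in records]
print(w, weighted_mean(scores, w))
```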