Contrast Coding
Or: One of These Levels is Not Like the Others
Scott Fraundorf (and Tuan Lam)
MLM Reading Group, 03.10.11
Administrivia
- 3/10 (TODAY): Contrast coding overview
- 4/7: Simple vs main effects
- 4/21: Principal components analysis
- 1st week of May: Harald Baayen visit
Outline
- Why use contrast coding?
- Example contrasts
- Contrast estimates
- Contrasts in R
- Multiple comparisons
- How does it work?
- Other kinds of coding
- Interactions
Why Use Contrast Coding?
- Scott's example study:
- Examining recall memory for spoken discourse as a function of:
- Location of disfluencies (categorical variable)
- Prior story knowledge (continuous variable)
RECALL = LOCATION OF DISFLUENCY + PRIOR KNOWLEDGE + SUBJECT + ITEM
Why Use Contrast Coding?
- Regression equation: Predicts values
- Could use this to predict whether or not something will be remembered
- But in cognitive psych:
- Often interested in the effect of specific levels
- Test which ones differ significantly
RECALL = LOCATION OF DISFLUENCY + PRIOR KNOWLEDGE + SUBJECT + ITEM
Contrast Coding
- Example: Fluent vs. disfluencies in typical locations vs. in atypical locations
- Which ones differ significantly?
[Figure: % of story recalled (axis from 0.4 to 0.8) for Typical, Atypical, and Fluent stories]
Contrast Coding
- Contrasts: Test differences between specific levels
– Same as a planned comparison in an ANOVA
– Also analogous to a post-hoc test
- Planned comparisons vs post-hoc tests
– If we are deciding tests post-hoc, greater chance of capitalizing on chance / spurious effect
– Contrasts are set before you fit the model, but it would be possible to go back and change the contrasts afterwards
– We are basically on the honor system here: no way to prove the comparison was planned ahead of time
Contrasts!
- Contrasts are like weighted sums of means
– In multiple regression / MLM context, also subject to other variables in the model
- Using your scale to test what's different
[Figure: % of story recalled (axis from 0.4 to 0.8) for Typical, Atypical, and Fluent stories]
Contrast Coding
It looks like the Fluent stories might not be remembered as well. Let's use a contrast to test this.
Contrasts
Question 1: Do disfluencies affect recall?
TYPICAL and ATYPICAL versus FLUENT
Contrasts
Contrast weights are assigned:
TYPICAL    ATYPICAL    FLUENT
.33        .33         -.66
- One side positive. One side negative. This determines which levels are being compared (+ versus -).
- Doesn't really matter which side you choose as the + side. It only affects the sign of the result, not its magnitude or statistical significance.
Contrasts
Contrast weights are assigned:
TYPICAL    ATYPICAL    FLUENT
.33        .33         -.66
- One side positive. One side negative.
- Codes add up to zero.
- Also nice to have the absolute values of the + code and the – code sum to 1. (We'll see why later.)
- abs(.33) + abs(-.66) = .99, i.e. 1 once the rounding of 1/3 and 2/3 is undone
Contrasts
Can conceptualize the comparison as:
Contrast 1: .33(Typical) + .33(Atypical) - .66(Fluent) (holding other variables constant)
TYPICAL    ATYPICAL    FLUENT
.33        .33         -.66
- One side positive. One side negative. Codes add up to zero.
- Does the contrast differ significantly from zero? If so, the difference between levels is significant.
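As a rough illustration of the weighted-sum idea, here is a minimal R sketch. The condition means are made-up numbers (not the study's results), and exact thirds stand in for the rounded .33 / -.66 weights.

```r
# Hypothetical condition means (illustrative only, NOT the study's data)
cell_means <- c(Typical = 0.70, Atypical = 0.66, Fluent = 0.60)

# Contrast 1 weights: disfluent stories (Typical, Atypical) versus Fluent
w1 <- c(Typical = 1/3, Atypical = 1/3, Fluent = -2/3)

# The weights add up to zero
weight_sum <- sum(w1)

# The contrast as a weighted sum of the condition means
contrast1 <- sum(w1 * cell_means)

# For comparison: mean of the two disfluent conditions minus the Fluent mean
raw_difference <- mean(cell_means[c("Typical", "Atypical")]) - cell_means[["Fluent"]]

contrast1        # positive: disfluent stories recalled better in this made-up example
raw_difference   # same sign; the weighted sum is 2/3 of this difference
```

With these weights the raw weighted sum is proportional to the difference between the disfluent and Fluent means (2/3 of it), so testing whether it differs from zero is the same question as testing whether the means differ.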
Contrasts
Contrast 1: .33(Typical) + .33(Atypical) - .66(Fluent)
TYPICAL    ATYPICAL    FLUENT
.33        .33         -.66
* (the contrast is significant)
Contrast Coding
[Figure: % of story recalled (axis from 0.4 to 0.8) for Typical, Atypical, and Fluent stories; the disfluent vs. Fluent comparison is marked * (significant)]
Our first contrast reveals that fluent stories are remembered worse. Now let's look at Typical vs Atypical.
We always have j – 1 contrasts, where j = the # of levels of the factor.
So, here 2 contrasts are needed to fully describe the differences among the 3 levels.
Contrasts
Question 2: Does location of disfluencies matter?
TYPICAL versus ATYPICAL
Contrasts
Contrast 2: .50(Typical) - .50(Atypical) + 0(Fluent)
TYPICAL    ATYPICAL    FLUENT
.50        -.50        0 (zeroed out here!)
- One side positive. One side negative. Codes add up to zero.
- Sum of absolute values of codes is 1.
Contrast Coding
[Figure: % of story recalled (axis from 0.4 to 0.8) for Typical, Atypical, and Fluent stories; disfluent vs. Fluent is marked * (significant), Typical vs. Atypical is marked n.s.]
One Important Point!
- Choice of contrasts doesn't affect total variance accounted for by variable
- Only about differences between levels
- Can divide this up in multiple different ways and still account for same total variance
LOCATION IN STORY
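A minimal sketch of this point on simulated data. Everything here is hypothetical: the numbers are invented, and lm() is used in place of lmer() only to keep the example free of extra packages.

```r
set.seed(2)

# Simulated recall scores for three story types (made-up data)
d <- data.frame(
  Location = factor(rep(c("Typical", "Atypical", "Fluent"), each = 20),
                    levels = c("Typical", "Atypical", "Fluent")),
  Recall = rnorm(60, mean = rep(c(0.70, 0.66, 0.60), each = 20), sd = 0.1)
)

# Same data, two different ways of dividing up the factor
d_planned <- d
contrasts(d_planned$Location) <- cbind(c(1/3, 1/3, -2/3), c(.5, -.5, 0))

d_dummy <- d
contrasts(d_dummy$Location) <- contr.treatment(3)

# Variance accounted for by the factor under each coding
r2_planned <- summary(lm(Recall ~ Location, data = d_planned))$r.squared
r2_dummy   <- summary(lm(Recall ~ Location, data = d_dummy))$r.squared

all.equal(r2_planned, r2_dummy)   # TRUE: only the individual estimates change
```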
Why -.5 and .5?
- Why [-.5 .5] instead of [-1 1]?
- Doesn't affect significance test
- Does affect β weight (estimate)
– Std error is also scaled accordingly
[Side-by-side model output: FILLER LOCATION coded [-1 1] vs. FILLER LOCATION coded [-.5 .5]]
Contrast Estimates
CONTRAST CODE: TYPICAL LOCATION = .5, ATYPICAL LOCATION = -.5 (difference between codes = 1)
- Beta weight (estimate) represents the effect of a 1-unit change in the contrast, holding everything else constant
- In this case, a 1-unit change in the contrast IS the difference between the levels' codes
- Thus, the contrast correctly represents .04825 as the difference between the conditions
Contrast Estimates
CONTRAST CODE: TYPICAL LOCATION = 1, ATYPICAL LOCATION = -1 (difference between codes = 2)
- Here, the total difference between the levels' codes is 2
- So, a 1-unit change in the contrast is only HALF the difference between the levels' codes
- Thus, the estimate of the contrast is .024 … only half the difference between the conditions
Contrast Estimates
Beta weight (estimate) represents the effect of a 1-unit change in the contrast
- CONTRAST CODE .5 / -.5 (difference between codes = 1): a 1-unit change in the contrast IS the difference between levels (.04825 in this case)
- CONTRAST CODE 1 / -1 (difference between codes = 2): a 1-unit change in the contrast is only half the difference between levels
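The scaling argument can be checked with a small simulation. This is a sketch on invented two-condition data, with lm() standing in for the mixed model; the variable names are hypothetical.

```r
set.seed(3)

# Simulated recall for two disfluency locations (made-up data)
d <- data.frame(
  Location = factor(rep(c("Typical", "Atypical"), each = 30),
                    levels = c("Typical", "Atypical")),
  Recall = rnorm(60, mean = rep(c(0.70, 0.65), each = 30), sd = 0.1)
)

# Same factor, coded [.5 -.5] and [1 -1]
d_half <- d
contrasts(d_half$Location) <- matrix(c(.5, -.5), ncol = 1)
d_one <- d
contrasts(d_one$Location) <- matrix(c(1, -1), ncol = 1)

# Second row of the coefficient table is the contrast (first row is the intercept)
est_half <- coef(summary(lm(Recall ~ Location, data = d_half)))[2, ]
est_one  <- coef(summary(lm(Recall ~ Location, data = d_one)))[2, ]

# Observed difference between the condition means
cond_means <- tapply(d$Recall, d$Location, mean)
mean_diff <- cond_means[["Typical"]] - cond_means[["Atypical"]]

est_half   # Estimate equals mean_diff
est_one    # Estimate and Std. Error are halved; t value and p value are unchanged
```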
So Why -.5 and .5?
- Better tells you about the difference in means!
– The actual difference between conditions is .048
– It would be perfectly correct to describe .024 as half the difference between levels, and you could even put a CI around it … it's just less intuitive for your readers
- Both contrasts would account for the same amount of variance
- This is just another case of deciding the scale of a variable
– Akin to measuring temperature in C versus F … both account for the same variance, but the numbers are on different scales
Imbalanced Designs
- You may have an unequal number of observations per cell
– e.g. some data lost, or responses not codable
- Correct for this in your contrast codes if you want things centered
– Ask Tuan or Scott about how to do this :)
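One common way to center codes in an imbalanced design is to mean-center a numeric copy of the code. This is only a sketch under that assumption and may not be the procedure the presenters have in mind; the cell counts are hypothetical.

```r
# Hypothetical imbalanced design: 30 Typical but only 20 Atypical observations
Location <- factor(c(rep("Typical", 30), rep("Atypical", 20)),
                   levels = c("Typical", "Atypical"))

# Numeric version of the [.5 -.5] code, one value per observation
code <- ifelse(Location == "Typical", .5, -.5)
mean(code)            # not zero, because the cells are unequal

# Subtract the observed mean so the predictor is centered in THIS data set
code_centered <- code - mean(code)
mean(code_centered)   # zero (up to floating-point error)

# The two levels are still exactly 1 unit apart, so the estimate
# keeps its "difference between conditions" interpretation
code_gap <- code_centered[Location == "Typical"][1] -
  code_centered[Location == "Atypical"][1]
code_gap
```

The centered variable would then be entered in the model formula as a numeric predictor in place of the factor.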
Contrasts in R
- To check what the current contrasts are:
– contrasts(YourDataFrame$VariableName)
- To set the contrasts:
– contrasts(YourDataFrame$VariableName) <- cbind(c(.33,.33,-.66), c(.50,-.50,0))
- Each c(xx,yy,zz) is the weights for one of the contrasts you want to run
- e.g. (.33, .33, -.66) is one contrast
- Weights are matched to levels in the order given by levels(YourDataFrame$VariableName), which is alphabetical unless you set it yourself
- After setting contrasts, run the lmer model to get the results of the contrasts
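Putting these steps together, a minimal end-to-end sketch. The data are simulated, the column and contrast names are hypothetical, and lm() is used so the example runs without lme4; with real data the same contrasts() call would precede an lmer() fit.

```r
set.seed(1)

# Simulated data (made up). Level order is set explicitly so that the weights
# line up as Typical, Atypical, Fluent rather than alphabetically.
d <- data.frame(
  Location = factor(rep(c("Typical", "Atypical", "Fluent"), each = 20),
                    levels = c("Typical", "Atypical", "Fluent")),
  Recall = rnorm(60, mean = rep(c(0.70, 0.66, 0.60), each = 20), sd = 0.1)
)

contrasts(d$Location)   # check the current (default) contrasts

# Set the two planned contrasts; column names become part of the coefficient names
contrasts(d$Location) <- cbind(DisfluentVsFluent = c(1/3, 1/3, -2/3),
                               TypicalVsAtypical = c(.5, -.5, 0))

fit <- lm(Recall ~ Location, data = d)
coef(summary(fit))      # one row per contrast, plus the intercept

# Each estimate matches the corresponding difference between condition means
cond_means <- tapply(d$Recall, d$Location, mean)
```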
Contrasts in R
- Should have j – 1 contrasts, where j = # of levels of the factor
- If using a subset of data, some levels of the factor may no longer be present
– e.g. you dropped a condition
– But, R still “remembers” that these levels exist and will get mad you didn't specify enough contrasts
– Fix this by reconverting to a factor:
- YourDataFrame$Variable <- factor(YourDataFrame$Variable)
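A small sketch of the dropped-level problem and the fix, on a made-up data frame (names are hypothetical). Base R's droplevels() is an equivalent alternative to calling factor() again.

```r
# Tiny made-up data set with three conditions
d <- data.frame(
  Location = factor(c("Typical", "Atypical", "Fluent", "Typical", "Atypical")),
  Recall = c(0.7, 0.6, 0.5, 0.8, 0.7)
)

# Drop the Fluent condition
sub <- d[d$Location != "Fluent", ]

# R still remembers the unused level
levels_before <- levels(sub$Location)
levels_before

# Reconvert to a factor so only the levels actually present remain
sub$Location <- factor(sub$Location)   # or: droplevels(sub$Location)
levels_after <- levels(sub$Location)
levels_after

# Now a single contrast (j - 1 = 1) is the right number
contrasts(sub$Location) <- matrix(c(-.5, .5), ncol = 1)   # Atypical = -.5, Typical = .5
```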
Another R Tip
- To see the mean of each level of an I.V.:
– tapply(YourDataFrame$DVName, YourDataFrame$IVName, mean)
– Could also do median, sd, etc.
- For a 2-way (or more!) table
– tapply(YourDataFrame$DVName, list(YourDataFrame$IVName1, YourDataFrame$IVName2), mean)
- Doesn't work if you have missing values (any cell that contains an NA comes out as NA)
– But Tuan has made a version of tapply that fixes this problem
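Tuan's version is not reproduced here. As a base-R alternative, extra arguments given to tapply() are passed on to the summary function, so na.rm = TRUE can be handed to mean(). A minimal sketch with made-up values:

```r
# Made-up data with one missing recall score
d <- data.frame(
  Location = c("Typical", "Typical", "Atypical", "Atypical", "Fluent", "Fluent"),
  Recall   = c(0.70, NA, 0.66, 0.64, 0.60, 0.58)
)

# Default behaviour: the Typical cell is NA because it contains a missing value
means_default <- tapply(d$Recall, d$Location, mean)
means_default

# Passing na.rm = TRUE through to mean() ignores the missing observation
means_narm <- tapply(d$Recall, d$Location, mean, na.rm = TRUE)
means_narm
```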
Multiple Comparisons (Here Comes Trouble!)
Multiple Comparisons
- Lots of comparisons you can run
- Suppose we tested both young & older adults on the disfluency task:
            YOUNGER               OLDER
FLUENT      FLUENT / YOUNGER      FLUENT / OLDER
TYPICAL     TYPICAL / YOUNGER     TYPICAL / OLDER
ATYPICAL    ATYPICAL / YOUNGER    ATYPICAL / OLDER
Multiple Comparisons
- Some comparisons are (wholly or partially) redundant
- Suppose we find typical > fluent, but typical and atypical don't reliably differ
- Should expect atypical > fluent (to at least some degree)
- Or, we find a main effect of age
- Would expect to find an effect of age within at least some conditions if we looked at them individually
Multiple Comparisons
- Some comparisons are (wholly or partially) redundant
- j – 1 contrasts actually describe everything
- j = # of levels
FLUENT vs. MEAN OF Typical and Atypical: difference = .35730
TYPICAL vs. ATYPICAL: difference = .04825
Can calculate all differences between levels based on these!
Multiple Comparisons
- Want to avoid multiple comparisons
- Error rate increases if you run overlapping, redundant tests
- Suppose we have the wrong value for one of the means (due to sampling error, etc.)
- In a single test, we set alpha so there is a 5% chance of incorrectly rejecting H0 (α = .05)
Multiple Comparisons
- But now we run a 2nd test comparing that same “bad” condition to another condition
- Outcome of this test is correlated with the previous one since they both refer to one of the same conditions
- Not an independent 5% chance of error
- Multiple tests compound Type I error rate
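To make the compounding concrete, here is a rough simulation sketch. It assumes three conditions with identical true means (so every significant result is a Type I error) and runs all three pairwise t-tests; the sample sizes and number of simulations are arbitrary choices.

```r
set.seed(4)
n_sims <- 2000

# For each simulated experiment, record whether ANY pairwise test is "significant"
any_sig <- replicate(n_sims, {
  # Three conditions drawn from the same distribution: no true differences
  y <- list(rnorm(20), rnorm(20), rnorm(20))
  p <- c(t.test(y[[1]], y[[2]])$p.value,
         t.test(y[[1]], y[[3]])$p.value,
         t.test(y[[2]], y[[3]])$p.value)
  any(p < .05)
})

# Proportion of experiments with at least one false positive
familywise_rate <- mean(any_sig)
familywise_rate   # noticeably above the nominal .05 for a single test
```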
Orthogonality
- Avoid this issue w/ orthogonal contrasts
– Products of weights (across contrasts) sum to 0
– Matrix of contrasts is made up of orthogonal vectors
– Can think of this as the contrasts being uncorrelated with each other
Orthogonality
- Avoid this issue w/ orthogonal contrasts
– Products of weights (across contrasts) sum to 0
            CONTRAST 1      CONTRAST 2      PRODUCT
TYPICAL       .33       x     .50       =    .165
ATYPICAL      .33       x    -.50       =   -.165
FLUENT       -.66       x      0        =     0
Sum of products: .165 + (-.165) + 0 = 0 (orthogonal)
Orthogonality
- Avoid this issue w/ orthogonal contrasts
– Products of weights (across contrasts) sum to 0
            CONTRAST 1      CONTRAST 2      PRODUCT
TYPICAL       .50       x     .50       =    .25
ATYPICAL       0        x    -.50       =    .0
FLUENT       -.50       x      0        =    .0
Sum of products: .25 + .0 + .0 = .25 (not orthogonal)
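The orthogonality check is easy to run in R. In this sketch the first pair is the two contrasts used throughout (with exact thirds); the second pair, Typical vs. Fluent together with Typical vs. Atypical, is an assumed example of a non-orthogonal set whose products sum to .25.

```r
lv <- c("Typical", "Atypical", "Fluent")

# Orthogonal pair: disfluent vs. Fluent, and Typical vs. Atypical
orth <- cbind(C1 = c(1/3, 1/3, -2/3), C2 = c(.5, -.5, 0))
rownames(orth) <- lv

# Assumed non-orthogonal pair: Typical vs. Fluent, and Typical vs. Atypical
nonorth <- cbind(C1 = c(.5, 0, -.5), C2 = c(.5, -.5, 0))
rownames(nonorth) <- lv

# Multiply the weights level by level, then sum across levels
sum_orth    <- sum(orth[, "C1"] * orth[, "C2"])         # 0
sum_nonorth <- sum(nonorth[, "C1"] * nonorth[, "C2"])   # 0.25

# crossprod() gives all pairwise sums of products at once;
# orthogonal contrasts have zeros off the diagonal
crossprod(orth)
crossprod(nonorth)
```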
Corrections
- “But, Scott, I really want to do more than j – 1 comparisons”
- Can apply corrections to control Type I error
- Bonferroni: Multiply p value by # of comparisons
– Worst case scenario
- Less conservative corrections may be available
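Base R's p.adjust() implements the Bonferroni correction and some less conservative alternatives such as Holm's method. A minimal sketch with hypothetical p-values:

```r
# Hypothetical uncorrected p-values from three comparisons
p_raw <- c(0.010, 0.030, 0.040)

# Bonferroni: each p-value multiplied by the number of comparisons (capped at 1)
p_bonf <- p.adjust(p_raw, method = "bonferroni")
p_bonf   # 0.03 0.09 0.12

# Holm: a step-down correction that is never more conservative than Bonferroni
p_holm <- p.adjust(p_raw, method = "holm")
p_holm   # 0.03 0.06 0.06
```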
How Does it Work?
RECALL = LOCATION OF DISFLUENCY + PRIOR KNOWLEDGE + SUBJECT + ITEM
Behind the scenes...
How Does it Work?
Y = β0 + β1X1 + β2X2 + β3X3 + ...
- Each categorical factor gets coded as j - 1 variables
- j = number of levels in that factor
- Number of contrasts you have
RECALL = LOCATION OF DISFLUENCY + PRIOR KNOWLEDGE + SUBJECT + ITEM
How Does it Work?
- Each coded variable represents one of your contrasts
Y = β0 + β1X1 + β2X2 + β3X3 + ...
CONTRAST 1:
X2 = .33 if typical location for disfluencies
X2 = .33 if atypical
X2 = -.66 if fluent
Value of contrast: β2
- Sig. difference between levels if β differs from 0
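The coded variables can be inspected directly with model.matrix(). In this sketch (factor and contrast names are hypothetical) the columns LocationC1 and LocationC2 play the role of coded variables such as X2 on the slide.

```r
# One observation per level is enough to see the coding
Location <- factor(c("Typical", "Atypical", "Fluent"),
                   levels = c("Typical", "Atypical", "Fluent"))

# Attach the two contrasts (exact thirds stand in for .33 / -.66)
contrasts(Location) <- cbind(C1 = c(1/3, 1/3, -2/3), C2 = c(.5, -.5, 0))

# The design matrix R builds behind the scenes: an intercept column
# plus one numeric column per contrast (j - 1 = 2 columns for 3 levels)
X <- model.matrix(~ Location)
X
```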
Other Kinds of Coding
- Dummy/Treatment Coding
– Compare all levels to a baseline level
– Doesn't allow direct comparisons between non-baseline levels
– R does this by default :(
[Table: dummy-coded variables X2 and X3 for Typical, Atypical, and Fluent; each non-baseline level is coded 1 on its own variable]
Other Kinds of Coding
- Dummy/Treatment Coding
– Compare all levels to a baseline level
– Doesn't allow direct comparisons between non-baseline levels
– R does this by default :(
- Sum/Effects Coding
– Test whether each level differs from overall mean or from chance
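Both schemes are available as base-R functions, contr.treatment() and contr.sum(). The sketch below just prints the coding matrices for the three story types; the level order is an assumption, and the baseline for treatment coding is whichever level comes first.

```r
lv <- c("Typical", "Atypical", "Fluent")

# Dummy/treatment coding: the first level is the baseline (all zeros)
trt <- contr.treatment(lv)
trt

# Sum/effects coding: each column sums to zero; the last level is coded -1
eff <- contr.sum(lv)
eff

# R's current default coding scheme for unordered and ordered factors
getOption("contrasts")
```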
Contrasts & Interactions
- Contrasts also apply in cases where we have interactions between variables
- Interaction term represents whether the value of the contrast depends on another variable
- We'll see some examples on the next slides
Interaction Example
- Suppose we also sampled different age groups in the disfluency experiment
– 3 x 2 design
- What are possible patterns of results?
Story Type    YOUNG ADULTS                    OLDER ADULTS
FLUENT        Fluent, young                   Fluent, older
TYPICAL       Typical disfluencies, young     Typical disfluencies, older
ATYPICAL      Atypical disfluencies, young    Atypical disfluencies, older
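A sketch of how such a 3 x 2 design could be set up in R, using simulated data with no built-in effects. The names and the [-.5 .5] age code are assumptions, and lm() again stands in for lmer(). The point is only to show which terms the model contains.

```r
set.seed(5)

# Simulated 3 x 2 design, 20 observations per cell (made-up data)
d <- expand.grid(Location = c("Typical", "Atypical", "Fluent"),
                 Age = c("Young", "Older"),
                 rep = 1:20)
d$Location <- factor(d$Location, levels = c("Typical", "Atypical", "Fluent"))
d$Age <- factor(d$Age, levels = c("Young", "Older"))
d$Recall <- rnorm(nrow(d), mean = 0.65, sd = 0.1)

# Two contrasts for story type, one for age group
contrasts(d$Location) <- cbind(C1 = c(1/3, 1/3, -2/3), C2 = c(.5, -.5, 0))
contrasts(d$Age) <- cbind(OlderVsYoung = c(-.5, .5))

# Main effects plus interactions: each contrast gets its own "x AGE" term
fit <- lm(Recall ~ Location * Age, data = d)
term_names <- names(coef(fit))
term_names
```

The two interaction coefficients correspond to the C1 x AGE and C2 x AGE rows: each asks whether the value of that contrast differs between age groups.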
Possible Result 1
- Contrast 1 significant
– Effect of disfluencies
- Contrast 2 non-sig.
– Location irrelevant
- No effect of age at all in this case
– Everything the same for both age groups
[Figure: panels for YOUNG and OLDER; x-axis labels: Before Plot Point, After Plot Point, Rest of Story; y-axis ticks 1 to 9]
SIGNIFICANT?
CONTRAST 1: yes
CONTRAST 2: no
AGE: no
C1 x AGE: no
C2 x AGE: no
Possible Result 2
- Contrast 2 is now significant
– Typical > atypical
- Still no effect of AGE
[Figure: panels for YOUNG and OLDER; x-axis labels: Before Plot Point, After Plot Point, Rest of Story; y-axis ticks 1 to 9]
SIGNIFICANT?
CONTRAST 1: yes
CONTRAST 2: yes
AGE: no
C1 x AGE: no
C2 x AGE: no
Possible Result 3
- Now, AGE effect
– Older adults remember more across the board
- But, no interaction
– Disfluency effect is the same in both age groups
[Figure: panels for YOUNG and OLDER; x-axis labels: Before Plot Point, After Plot Point, Rest of Story; y-axis ticks 1 to 9]
SIGNIFICANT?
CONTRAST 1: yes
CONTRAST 2: yes
AGE: yes
C1 x AGE: no
C2 x AGE: no
Possible Result 4
- Contrast 1 interacts with AGE
– Effect of the presence of disfluencies differs across age
- Effect only for young adults
- Contrast 2 (location) still same in all cases
[Figure: panels for YOUNG and OLDER; x-axis labels: Before Plot Point, After Plot Point, Rest of Story; y-axis ticks 1 to 9]
SIGNIFICANT?
CONTRAST 1: yes
CONTRAST 2: yes
AGE: yes
C1 x AGE: yes
C2 x AGE: no
Possible Result 5
- Now, Contrast 2 also interacts with AGE
– Reversal of Typical vs Atypical effect across age
[Figure: panels for YOUNG and OLDER; x-axis labels: Before Plot Point, After Plot Point, Rest of Story; y-axis ticks 1 to 9]
SIGNIFICANT?
CONTRAST 1: yes
CONTRAST 2: yes
AGE: yes
C1 x AGE: yes
C2 x AGE: yes
Possible Result 6
- Contrast 2 interaction but not Contrast 1
– Typical vs Atypical comparison does depend on age
– Overall effect of having fillers does not depend on age
[Figure: panels for YOUNG and OLDER; x-axis labels: Before Plot Point, After Plot Point, Rest of Story; y-axis ticks 1 to 9]
SIGNIFICANT?
CONTRAST 1: yes
CONTRAST 2: yes
AGE: yes
C1 x AGE: no
C2 x AGE: yes
Interactions in R
- Implementing interactions in an R model formula (lmer or otherwise):
– A + B
- Main effects of A and B, no interaction
– A * B
- All possible interactions and main effects of A and B
– A : B
- Interaction of A and B, no main effect (unless you add it separately)
- In, say, a corpus analysis with 20 predictors, you wouldn't want to test a 20-way interaction … but this lets you control what to include
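To see exactly which terms a formula expands to, without fitting anything, one option is to inspect it with terms(). The variable names y, A, B, and C below are placeholders; the last line illustrates the ^2 shorthand for limiting a model to two-way interactions.

```r
# Main effects only
terms_plus <- attr(terms(y ~ A + B), "term.labels")
terms_plus    # "A" "B"

# Main effects and their interaction
terms_star <- attr(terms(y ~ A * B), "term.labels")
terms_star    # "A" "B" "A:B"

# Interaction only, no main effects
terms_colon <- attr(terms(y ~ A : B), "term.labels")
terms_colon   # "A:B"

# With more predictors: all main effects and two-way interactions, nothing higher
terms_two_way <- attr(terms(y ~ (A + B + C)^2), "term.labels")
terms_two_way # "A" "B" "C" "A:B" "A:C" "B:C"
```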