Contrast Coding Or: One of These Levels is Not Like the Others




slide-1
SLIDE 1

Contrast Coding

Or: One of These Levels is Not Like the Others

Scott Fraundorf (and Tuan Lam)
MLM Reading Group – 03.10.11

slide-2
SLIDE 2

Administrivia

  • 3/10 (TODAY): Contrast coding overview
  • 4/7: Simple vs main effects
  • 4/21: Principal components analysis
  • 1st week of May: Harald Baayen visit
slide-3
SLIDE 3

Outline

  • Why use contrast coding?
  • Example contrasts
  • Contrast estimates
  • Contrasts in R
  • Multiple comparisons
  • How does it work?
  • Other kinds of coding
  • Interactions
slide-4
SLIDE 4

Why Use Contrast Coding?

  • Scott's example study:
  • Examining recall memory for spoken discourse as a function of:
  • Location of disfluencies (categorical variable)
  • Prior story knowledge (continuous variable)

[Model diagram: recall = LOCATION OF DISFLUENCY + PRIOR KNOWLEDGE + SUBJECT + ITEM]

slide-5
SLIDE 5

Why Use Contrast Coding?

  • Regression equation: Predicts values
  • Could use this to predict whether or not something will be remembered
  • But in cognitive psych:
  • Often interested in the effect of specific levels
  • Test which ones differ significantly

[Model diagram: recall = LOCATION OF DISFLUENCY + PRIOR KNOWLEDGE + SUBJECT + ITEM]

slide-6
SLIDE 6

Outline

  • Why use contrast coding?
  • Example contrasts
  • Contrast estimates
  • Contrasts in R
  • Multiple comparisons
  • How does it work?
  • Other kinds of coding
  • Interactions
slide-7
SLIDE 7

Contrast Coding

  • Example: Fluent vs. disfluencies in typical locations vs. in atypical locations
  • Which ones differ significantly?

[Figure: bar chart of % of story recalled (y-axis from 0.4 to 0.8) for Typical, Atypical, and Fluent stories]

slide-8
SLIDE 8

Contrast Coding

  • Contrasts: Test differences between specific levels

– Same as a planned comparison in an ANOVA
– Also analogous to a post-hoc test

  • Planned comparisons vs post-hoc tests

– If we are deciding tests post-hoc, greater chance of capitalizing on chance / spurious effect
– Contrasts are set before you fit the model, but it would be possible to go back and change the contrasts afterwards
– We are basically on the honor system here: no way to prove the comparison was planned ahead of time

slide-9
SLIDE 9

Contrasts!

  • Contrasts are like weighted sums of means

– In a multiple regression / MLM context, the estimates are also adjusted for the other variables in the model

  • Using your scale to test what's different
slide-10
SLIDE 10

Contrast Coding

[Figure: bar chart of % of story recalled (y-axis from 0.4 to 0.8) for Typical, Atypical, and Fluent stories]

It looks like the Fluent stories might not be remembered as well. Let's use a contrast to test this.

slide-11
SLIDE 11

Contrasts

TYPICAL

ATYPICAL

FLUENT

Question 1: Do disfluencies affect recall?

slide-12
SLIDE 12

Contrasts

Contrast weights are assigned:

TYPICAL: .33    ATYPICAL: .33    FLUENT: -.66

One side positive. One side negative. This determines which levels are being compared (+ versus -). It doesn't really matter which side you choose as the + side: that affects the sign of the result, but not its magnitude or statistical significance.

slide-13
SLIDE 13

Contrasts

Contrast weights are assigned:

TYPICAL: .33    ATYPICAL: .33    FLUENT: -.66

One side positive. One side negative. Codes add up to zero. Also nice to have the absolute values of the + code and the – code sum to 1. (We'll see why later.)

abs(.33) + abs(-.66) = 1 (approximately, since .33 and .66 are rounded from 1/3 and 2/3)

slide-14
SLIDE 14

Contrasts

Can conceptualize the comparison as:

Contrast 1: .33(Typical) + .33(Atypical) - .66(Fluent) (holding other variables constant)

TYPICAL: .33    ATYPICAL: .33    FLUENT: -.66

One side positive. One side negative. Codes add up to zero. Does the contrast differ significantly from zero? If so, the difference between levels is significant.

slide-15
SLIDE 15

Contrasts

Contrast 1: .33(Typical) + .33(Atypical) - .66(Fluent)

TYPICAL: .33    ATYPICAL: .33    FLUENT: -.66

* (the contrast is significant)
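
As a rough illustration of "contrast as a weighted sum of means", the sketch below applies the Contrast 1 weights to three made-up condition means. The means are hypothetical (not the study's data), and the exact fractions 1/3 and 2/3 stand in for the rounded .33 and .66.

```r
# Hypothetical condition means (proportion of story recalled); not real data.
means <- c(Typical = 0.70, Atypical = 0.66, Fluent = 0.60)

# Contrast 1 weights: disfluent conditions (+) versus fluent (-).
w1 <- c(Typical = 1/3, Atypical = 1/3, Fluent = -2/3)

# The weights sum to zero, so the contrast is 0 when all means are equal.
print(sum(w1))

# The contrast value is the weighted sum of the condition means.
contrast1 <- sum(w1 * means)
print(contrast1)
```

A positive value here means the disfluent conditions sit above the fluent condition; the model then asks whether that value differs reliably from zero.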

slide-16
SLIDE 16

Contrast Coding

[Figure: bar chart of % of story recalled (y-axis from 0.4 to 0.8) for Typical, Atypical, and Fluent stories, with a significance marker (*)]

Our first contrast reveals that fluent stories are remembered worse. Now let's look at Typical vs Atypical.

We always have j – 1 contrasts, where j = the # of levels of the factor.

So, here 2 contrasts are needed to fully describe the 3-level factor.

slide-17
SLIDE 17

Contrasts

TYPICAL

ATYPICAL

Question 2: Does location of disfluencies matter?

slide-18
SLIDE 18

Contrasts

Contrast 2: .50(Typical) - .50(Atypical) + 0(Rest)

TYPICAL: .50    ATYPICAL: -.50    FLUENT: 0 (zeroed out here!)

One side positive. One side negative. Codes add up to zero. Sum of absolute values of codes is 1.

slide-19
SLIDE 19

Contrast Coding

[Figure: bar chart of % of story recalled (y-axis from 0.4 to 0.8) for Typical, Atypical, and Fluent stories, with markers * and n.s.]

slide-20
SLIDE 20

One Important Point!

  • Choice of contrasts doesn't affect the total variance accounted for by the variable
  • Contrasts are only about differences between levels
  • Can divide this up in multiple different ways and still account for the same total variance

[Diagram: LOCATION IN STORY]
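
One way to see this point is to fit the same model under two different codings and compare the variance explained. This is a sketch on simulated data: the data frame d, its column names, and the use of plain lm() instead of lmer are all assumptions made to keep it self-contained.

```r
set.seed(1)
# Simulated data (hypothetical): 20 observations per condition.
d <- data.frame(
  location = factor(rep(c("Typical", "Atypical", "Fluent"), each = 20),
                    levels = c("Typical", "Atypical", "Fluent")),
  recall = rnorm(60, mean = rep(c(0.70, 0.66, 0.60), each = 20), sd = 0.1)
)

# Coding A: the two contrasts from the slides.
dA <- d
contrasts(dA$location) <- cbind(c(1/3, 1/3, -2/3), c(1/2, -1/2, 0))

# Coding B: treatment (dummy) coding.
dB <- d
contrasts(dB$location) <- contr.treatment(3)

# R-squared for the factor is the same under either coding.
r2A <- summary(lm(recall ~ location, data = dA))$r.squared
r2B <- summary(lm(recall ~ location, data = dB))$r.squared
print(c(codingA = r2A, codingB = r2B))
```

Only the individual coefficients (and what each one tests) change between the two fits; the overall fit does not.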

slide-21
SLIDE 21

Outline

  • Why use contrast coding?
  • Example contrasts
  • Contrast estimates
  • Contrasts in R
  • Multiple comparisons
  • How does it work?
  • Other kinds of coding
  • Interactions
slide-22
SLIDE 22

Why -.5 and .5?

  • Why [-.5 .5] instead of [-1 1]?
  • Doesn't affect significance test
  • Does affect β weight (estimate)

– Std error is also scaled accordingly

[Two model outputs shown: FILLER LOCATION coded [-1 1] and FILLER LOCATION coded [-.5 .5]]

slide-23
SLIDE 23

Contrast Estimates

[Diagram: CONTRAST CODE: TYPICAL LOCATION = .5, ATYPICAL LOCATION = -.5; the two codes are 1 unit apart]

Beta weight (estimate) represents the effect of a 1-unit change in the contrast, holding everything else constant.

In this case, a 1-unit change in the contrast IS the difference between the levels' codes.

Thus, the contrast correctly represents .04825 as the difference between the conditions.

slide-24
SLIDE 24

Contrast Estimates

[Diagram: CONTRAST CODE: TYPICAL LOCATION = 1, ATYPICAL LOCATION = -1; the two codes are 2 units apart]

Here, the total difference between the levels' codes is 2.

So, a 1-unit change in the contrast is only HALF the difference between the levels' codes.

Thus, the estimate of the contrast is .024 … only half the difference between the conditions.

slide-25
SLIDE 25

Contrast Estimates

Beta weight (estimate) represents the effect of a 1-unit change in the contrast.

[Diagram, left: CONTRAST CODE: TYPICAL LOCATION = .5, ATYPICAL LOCATION = -.5; codes 1 unit apart]

1 unit change in contrast IS the difference between levels (.04825 in this case).

[Diagram, right: CONTRAST CODE: TYPICAL LOCATION = 1, ATYPICAL LOCATION = -1; codes 2 units apart]

1 unit change in contrast IS only half the difference between levels.

slide-26
SLIDE 26

So Why -.5 and .5?

  • [-.5 .5] codes better tell you about the difference in means!

– The actual difference between conditions is .048
– It would be perfectly correct to describe .024 as half the difference between levels, and you could even put a CI around it … it's just less intuitive for your readers

[Two model outputs shown: FILLER LOCATION coded [-1 1] and FILLER LOCATION coded [-.5 .5]]

slide-27
SLIDE 27

So Why -.5 and .5?

  • [-.5 .5] codes better tell you about the difference in means!

– The actual difference between conditions is .048
– It would be perfectly correct to describe .024 as half the difference between levels, and you could even put a CI around it … it's just less intuitive for your readers

  • Both contrasts would account for the same amount of variance
  • This is just another case of deciding the scale of a variable

– Akin to measuring temperature in C versus F … both account for the same variance, but the numbers are on different scales
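
The scaling argument can be checked numerically. The sketch below uses simulated two-condition data and plain lm() (both assumptions; the deck's own models are fit with lmer), with the hypothetical column names half and unit for the two codings.

```r
set.seed(2)
# Simulated two-condition data (hypothetical values).
d <- data.frame(
  location = rep(c("Typical", "Atypical"), each = 30),
  recall = rnorm(60, mean = rep(c(0.70, 0.65), each = 30), sd = 0.1)
)
d$half <- ifelse(d$location == "Typical", 0.5, -0.5)  # codes [-.5 .5]
d$unit <- ifelse(d$location == "Typical", 1, -1)      # codes [-1 1]

fit_half <- summary(lm(recall ~ half, data = d))$coefficients
fit_unit <- summary(lm(recall ~ unit, data = d))$coefficients

# Observed difference in condition means (Typical minus Atypical).
cell_means <- tapply(d$recall, d$location, mean)
mean_diff <- unname(cell_means["Typical"] - cell_means["Atypical"])

# Same t value in both rows; the estimate and std. error are halved under [-1 1].
print(rbind(half = fit_half["half", ], unit = fit_unit["unit", ]))
print(mean_diff)
```

Under [-.5 .5] the estimate matches the raw difference in means, which is the reason the slides prefer that scale.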

slide-28
SLIDE 28

Imbalanced Designs

  • You may have an unequal number of observations per cell, e.g. some data lost, or responses not codable
  • Correct for this in your contrast codes if you want things centered

Ask Tuan or Scott about how to do this :)

slide-29
SLIDE 29

Outline

  • Why use contrast coding?
  • Example contrasts
  • Contrast estimates
  • Contrasts in R
  • Multiple comparisons
  • How does it work?
  • Other kinds of coding
  • Interactions
slide-30
SLIDE 30

Contrasts in R

  • To check what the current contrasts are:

– contrasts(YourDataFrame$VariableName)

  • To set the contrasts:

– contrasts(YourDataFrame$VariableName) <- cbind(c(.33,.33,-.66), c(.50,-.50,0))

  • Each c(xx,yy,zz) holds the weights for one of the contrasts you want to run
  • e.g. c(.33, .33, -.66) is one contrast
  • Weights are matched to levels in the order given by levels(YourDataFrame$VariableName)
  • After setting contrasts, run the lmer model to get the results of the contrasts
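
A minimal sketch of the steps above, using a made-up data frame. The names YourDataFrame and Location are placeholders, and the contrast column labels are hypothetical. The explicit levels = argument matters because R otherwise sorts levels alphabetically (Atypical, Fluent, Typical), which would attach the weights to the wrong conditions.

```r
# Hypothetical data frame; the names are placeholders, not the study's.
YourDataFrame <- data.frame(
  Location = c("Typical", "Atypical", "Fluent", "Typical", "Atypical", "Fluent")
)

# Fix the level order explicitly so the weights line up with the right levels.
YourDataFrame$Location <- factor(YourDataFrame$Location,
                                 levels = c("Typical", "Atypical", "Fluent"))

# One column per contrast; rows follow the level order set above.
codes <- cbind(DisfluentVsFluent = c(1/3, 1/3, -2/3),
               TypicalVsAtypical = c(1/2, -1/2, 0))
contrasts(YourDataFrame$Location) <- codes

# Check what R will actually use.
print(contrasts(YourDataFrame$Location))
```

Printing the contrasts before fitting is a cheap check that each weight landed on the intended level.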

slide-31
SLIDE 31

Contrasts in R

  • Should have j – 1 contrasts, where j = # of levels of the factor
  • If using a subset of data, some levels of the factor may no longer be present

– e.g. you dropped a condition
– But, R still “remembers” that these levels exist and will get mad that you didn't specify enough contrasts
– Fix this by reconverting to a factor:

  • YourDataFrame$Variable <- factor(YourDataFrame$Variable)
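
The "remembered" level is easy to see on a toy factor. The data frame d below is hypothetical; droplevels() is shown only as an equivalent base R alternative to re-calling factor().

```r
# Hypothetical three-level factor.
d <- data.frame(Variable = factor(c("Typical", "Atypical", "Fluent", "Typical")))

# Subsetting removes the rows but keeps the unused level.
sub <- d[d$Variable != "Fluent", , drop = FALSE]
levels_before <- nlevels(sub$Variable)
print(levels_before)

# Re-creating the factor discards unused levels.
sub$Variable <- factor(sub$Variable)
print(levels(sub$Variable))

# droplevels() gives the same result on a fresh subset.
sub2 <- droplevels(d[d$Variable != "Fluent", , drop = FALSE])
print(levels(sub2$Variable))
```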

slide-32
SLIDE 32

Another R Tip

  • To see the mean of each level of an I.V.:

– tapply(YourDataFrame$DVName, YourDataFrame$IVName, mean)
– Could also do median, sd, etc.

  • For a 2-way (or more!) table

– tapply(YourDataFrame$DVName, list(YourDataFrame$IVName1, YourDataFrame$IVName2), mean)

  • Returns NA for any cell that contains missing values

– But Tuan has made a version of tapply that fixes this problem
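
Tuan's version is not shown in the slides, so the sketch below is not that function. It only illustrates a base R route: extra arguments given to tapply() are passed on to the summary function, so na.rm = TRUE reaches mean(). The data frame and its values are made up.

```r
# Hypothetical data with one missing response.
d <- data.frame(
  IV = c("Typical", "Typical", "Atypical", "Atypical", "Fluent", "Fluent"),
  DV = c(0.70, NA, 0.60, 0.70, 0.50, 0.60)
)

# Without na.rm, the cell containing the NA has mean NA.
raw_means <- tapply(d$DV, d$IV, mean)
print(raw_means)

# Extra arguments to tapply() are passed on to the function (here, mean).
cell_means <- tapply(d$DV, d$IV, mean, na.rm = TRUE)
print(cell_means)
```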

slide-33
SLIDE 33

Outline

  • Why use contrast coding?
  • Example contrasts
  • Contrast estimates
  • Contrasts in R
  • Multiple comparisons
  • How does it work?
  • Other kinds of coding
  • Interactions
slide-34
SLIDE 34

Multiple Comparisons (Here Comes Trouble!)

slide-35
SLIDE 35

Multiple Comparisons

  • Lots of comparisons you can run
  • Suppose we tested both young & older adults on the disfluency task:

FLUENT / YOUNGER      FLUENT / OLDER
TYPICAL / YOUNGER     TYPICAL / OLDER
ATYPICAL / YOUNGER    ATYPICAL / OLDER

slide-36
SLIDE 36

Multiple Comparisons

  • Some comparisons are (wholly or partially) redundant
  • Suppose we find typical > fluent, but typical and atypical don't reliably differ
  • Should expect atypical > fluent (to at least some degree)
  • Or, we find a main effect of age
  • Would expect to find an effect of age within at least some conditions if we looked at them individually

slide-37
SLIDE 37

Multiple Comparisons

  • Some comparisons are (wholly or partially) redundant
  • j – 1 contrasts actually describe everything
  • j = # of levels

[Diagram: a brace labeled .35730 between FLUENT and the MEAN OF Typical and Atypical; a brace labeled .04825 between TYPICAL and ATYPICAL]

Can calculate all differences between levels based on this!

slide-38
SLIDE 38

Multiple Comparisons

  • Want to avoid multiple comparisons
  • Error rate increases if you run overlapping, redundant tests
  • Suppose we have the wrong value for one of the means (due to sampling error, etc.)
  • In a single test, we set alpha (.05) so there is a 5% chance of incorrectly rejecting H0

slide-39
SLIDE 39

Multiple Comparisons

  • But now we run a 2nd test comparing that same “bad” condition to another condition
  • Outcome of this test is correlated with the previous one since they both refer to one of the same conditions
  • Not an independent 5% chance of error
  • Multiple tests compound Type I error rate
slide-40
SLIDE 40

Orthogonality

  • Avoid this issue w/ orthogonal contrasts

– Products of weights (across contrasts) sum to 0
– Matrix of contrasts is made up of orthogonal vectors
– Can think of this as the contrasts being uncorrelated with each other

slide-41
SLIDE 41

Orthogonality

  • Avoid this issue w/ orthogonal contrasts

– Products of weights (across contrasts) sum to 0

            CONTRAST 1   x   CONTRAST 2   =   PRODUCT
TYPICAL        .33            .50              .165
ATYPICAL       .33           -.50             -.165
FLUENT        -.66             0                 0

Sum of products: .165 + (-.165) + 0 = 0

slide-42
SLIDE 42

Orthogonality

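  • Avoid this issue w/ orthogonal contrasts

– Products of weights (across contrasts) sum to 0

            CONTRAST 1   x   CONTRAST 2   =   PRODUCT
TYPICAL        .50            .50              .25
ATYPICAL      -.50             0                .0
FLUENT          0            -.50               .0

Sum of products: .25 + .0 + .0 = .25, so these two contrasts are not orthogonal.

(The placement of the zero weights is inferred from the products shown on the slide.)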
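
The orthogonality check is a one-liner in R. In this sketch, c1 and c2 are the deck's two contrasts; c3 is a hypothetical Typical-vs-Fluent contrast added only to show an overlapping, non-orthogonal pair, and is_orthogonal is a helper name made up for the example.

```r
# Weights in the order Typical, Atypical, Fluent.
c1 <- c(0.33, 0.33, -0.66)   # disfluent vs. fluent
c2 <- c(0.50, -0.50, 0)      # typical vs. atypical
c3 <- c(0.50, 0, -0.50)      # hypothetical: typical vs. fluent (overlaps with c2)

# Two contrasts are orthogonal when the products of their weights sum to zero.
is_orthogonal <- function(a, b) abs(sum(a * b)) < 1e-8

print(is_orthogonal(c1, c2))  # products: .165, -.165, 0
print(is_orthogonal(c2, c3))  # products: .25, 0, 0
print(sum(c2 * c3))
```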

slide-43
SLIDE 43

Corrections

  • “But, Scott, I really want to do more than j – 1 comparisons”
  • Can apply corrections to control Type I error
  • Bonferroni: Multiply p value by # of comparisons

– Worst case scenario

  • Less conservative corrections may be available
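
Base R's p.adjust() implements the Bonferroni correction described above. The p values below are made up, and Holm is shown only as one example of a less conservative method that p.adjust() also provides; the slides do not name a specific alternative.

```r
# Hypothetical p values from three comparisons.
p <- c(0.010, 0.020, 0.400)

# Bonferroni: multiply each p value by the number of comparisons, capped at 1.
p_bonf <- p.adjust(p, method = "bonferroni")
print(p_bonf)

# Holm: a step-down method that is never more conservative than Bonferroni.
p_holm <- p.adjust(p, method = "holm")
print(p_holm)
```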

slide-44
SLIDE 44

Outline

  • Why use contrast coding?
  • Example contrasts
  • Contrast estimates
  • Contrasts in R
  • Multiple comparisons
  • How does it work?
  • Other kinds of coding
  • Interactions
slide-45
SLIDE 45

How Does it Work?

[Model diagram: recall = LOCATION OF DISFLUENCY + PRIOR KNOWLEDGE + SUBJECT + ITEM]

Behind the scenes...

slide-46
SLIDE 46

How Does it Work?

Y = β0 + β1X1 + β2X2 + β3X3 + ...

  • Each categorical factor gets coded as j - 1 variables
  • j = number of levels in that factor
  • j - 1 = number of contrasts you have

[Model diagram: recall = LOCATION OF DISFLUENCY + PRIOR KNOWLEDGE + SUBJECT + ITEM]

slide-47
SLIDE 47

How Does it Work?

  • Each coded variable represents one of your contrasts

Y = β0 + β1X1 + β2X2 + β3X3 + ...

CONTRAST 1:

X2 =  .33 if typical location for disfluencies
      .33 if atypical
     -.66 if fluent

Value of contrast: β2

  • Sig. difference between levels if β differs from 0
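
To look "behind the scenes" directly, model.matrix() prints the numeric columns R builds from a factor. This is a sketch with one made-up observation per condition; the labels C1 and C2 are hypothetical names for the two contrasts.

```r
# One observation per condition, for illustration only.
Location <- factor(c("Typical", "Atypical", "Fluent"),
                   levels = c("Typical", "Atypical", "Fluent"))
contrasts(Location) <- cbind(C1 = c(1/3, 1/3, -2/3),
                             C2 = c(1/2, -1/2, 0))

# The regression sees an intercept plus j - 1 = 2 coded variables.
X <- model.matrix(~ Location)
print(X)
```

Each coded column gets its own β in the regression equation, which is why a 3-level factor contributes two coefficients.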

slide-48
SLIDE 48

Outline

  • Why use contrast coding?
  • Example contrasts
  • Contrast estimates
  • Contrasts in R
  • Multiple comparisons
  • How does it work?
  • Other kinds of coding
  • Interactions
slide-49
SLIDE 49

Other Kinds of Coding

  • Dummy/Treatment Coding

– Compare all levels to a baseline level
– Doesn't allow direct comparisons between non-baseline levels
– R does this by default :(

[Table: dummy (0/1) codes for Typical, Atypical, and Fluent on the coded variables; only the two 1 entries survive in this transcript]

slide-50
SLIDE 50

Other Kinds of Coding

  • Dummy/Treatment Coding

– Compare all levels to a baseline level
– Doesn't allow direct comparisons between non-baseline levels
– R does this by default :(

  • Sum/Effects Coding

– Test whether each level differs from the overall mean or from chance
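
Both schemes are built into base R, so the coding matrices can be printed and compared. A small sketch, assuming the three levels in the order Typical, Atypical, Fluent (so Typical acts as the baseline under treatment coding):

```r
lv <- c("Typical", "Atypical", "Fluent")

# Treatment (dummy) coding: the first level is the baseline (all zeros).
treat <- contr.treatment(lv)
print(treat)

# Sum (effects) coding: each column sums to zero; the last level is coded -1.
eff <- contr.sum(lv)
print(eff)
```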

slide-51
SLIDE 51

Outline

  • Why use contrast coding?
  • Example contrasts
  • Contrast estimates
  • Contrasts in R
  • Multiple comparisons
  • How does it work?
  • Other kinds of coding
  • Interactions
slide-52
SLIDE 52

Contrasts & Interactions

  • Contrasts also apply in cases where we have interactions between variables
  • Interaction term represents whether the value of the contrast depends on another variable
  • We'll see some examples on the next slides

slide-53
SLIDE 53

Interaction Example

  • Suppose we also sampled different age groups in the disfluency experiment

– 3 x 2 design

  • What are possible patterns of results?

[Figure: Story Type (FLUENT, TYPICAL, ATYPICAL) by Group (YOUNG ADULTS, OLDER ADULTS). Cells: Fluent, young; Typical disfluencies, young; Atypical disfluencies, young; Fluent, older; Typical disfluencies, older; Atypical disfluencies, older]

slide-54
SLIDE 54

Possible Result 1

  • Contrast 1 significant

– Effect of disfluencies

  • Contrast 2 non-sig.

– Location irrelevant

  • No effect of age at all in this case

– Everything the same for both age groups

[Figure: two panels, YOUNG and OLDER; categories Before Plot Point, After Plot Point, Rest of Story; axis ticks 1 to 9]

SIGNIFICANT?
CONTRAST 1: yes
CONTRAST 2: no
AGE: no
C1 x AGE: no
C2 x AGE: no

slide-55
SLIDE 55

Possible Result 2

  • Contrast 2 is now significant

– Typical > atypical

  • Still no effect of AGE

[Figure: two panels, YOUNG and OLDER; categories Before Plot Point, After Plot Point, Rest of Story; axis ticks 1 to 9]

SIGNIFICANT?
CONTRAST 1: yes
CONTRAST 2: yes
AGE: no
C1 x AGE: no
C2 x AGE: no

slide-56
SLIDE 56

Possible Result 3

  • Now, AGE effect

– Older adults remember more across the board

  • But, no interaction

– Disfluency effect is the same for both age groups

[Figure: two panels, YOUNG and OLDER; categories Before Plot Point, After Plot Point, Rest of Story; axis ticks 1 to 9]

SIGNIFICANT?
CONTRAST 1: yes
CONTRAST 2: yes
AGE: yes
C1 x AGE: no
C2 x AGE: no

slide-57
SLIDE 57

Possible Result 4

  • Contrast 1 interacts with AGE

– Effect of the presence of disfluencies differs across age

  • Effect only for young adults
  • Contrast 2 (location) still same in all cases

[Figure: two panels, YOUNG and OLDER; categories Before Plot Point, After Plot Point, Rest of Story; axis ticks 1 to 9]

SIGNIFICANT?
CONTRAST 1: yes
CONTRAST 2: yes
AGE: yes
C1 x AGE: yes
C2 x AGE: no

slide-58
SLIDE 58

Possible Result 5

  • Now, Contrast 2 also interacts with AGE

– Reversal of Typical vs Atypical effect across age

[Figure: two panels, YOUNG and OLDER; categories Before Plot Point, After Plot Point, Rest of Story; axis ticks 1 to 9]

SIGNIFICANT?
CONTRAST 1: yes
CONTRAST 2: yes
AGE: yes
C1 x AGE: yes
C2 x AGE: yes

slide-59
SLIDE 59

Possible Result 6

  • Contrast 2 interaction but not Contrast 1

– Typical vs Atypical comparison does depend on age
– Overall effect of having fillers does not

[Figure: two panels, YOUNG and OLDER; categories Before Plot Point, After Plot Point, Rest of Story; axis ticks 1 to 9]

SIGNIFICANT?
CONTRAST 1: yes
CONTRAST 2: yes
AGE: yes
C1 x AGE: no
C2 x AGE: yes

slide-60
SLIDE 60

Interactions in R

  • Implementing interactions in an R model formula (lmer or otherwise):

– A + B

  • Main effects of A and B, no interaction

– A * B

  • All possible interactions and main effects of A and B

– A : B

  • Interaction of A and B, no main effect (unless you add it separately)
  • In, say, a corpus analysis with 20 predictors, you wouldn't want to test a 20-way interaction … but this lets you control what to include
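
The three operators can be compared without fitting anything, by asking R which terms each formula expands to. In this sketch y, A, and B are placeholder names and labels_of is a helper made up for the example.

```r
# Return the term labels a model formula expands to.
labels_of <- function(f) attr(terms(f), "term.labels")

print(labels_of(y ~ A + B))   # main effects only
print(labels_of(y ~ A * B))   # main effects plus the interaction
print(labels_of(y ~ A : B))   # interaction term only
```

The same check scales to larger formulas, which helps confirm exactly which interactions a model with many predictors will include.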