Individual Differences & Item Effects: How to test them, & how to test them well

Individual Differences & Item Effects
Properties of subjects: cognitive abilities (WM task scores, inhibition), gender, age, L2 proficiency, task strategy
Properties of items: lexical frequency, segmental properties, plausibility
Two Challenges
Subject & item properties are not at the level of individual trials
How to implement in your model? What do they mean statistically?
Subject & item properties often not experimentally manipulated
How to best investigate?
Example Study: Fraundorf et al. (2010)
Both the British and the French biologists had been searching Malaysia and Indonesia for the endangered monkeys. Finally, the British spotted one of the monkeys in MALAYSIA and planted a radio tag on it.
Memory test: In Malaysia or in Indonesia? Did the British find it, or the French?
Manipulate presentational vs contrastive accents
Finally, the British spotted one of the monkeys in MALAYSIA...
Finally, the BRITISH spotted one of the monkeys in Malaysia...
Finally, the BRITISH spotted one of the monkeys in MALAYSIA...
Finally, the British spotted one of the monkeys in Malaysia...
Original Results
Contrastive (L+H*) accent benefits memory
No effect of accent on the other item
Effects seem localized
Implementing in R
New experiment: do these effects vary with individual differences in working memory?
Need trial-level and subject-level variables in the same dataframe
Then, you can add it to the model just like any other factor:
glmer(Correct ~ Accent * WM_Score + (1|Subject) + (1|StoryID), family=binomial)
(In current lme4, binomial models are fit with glmer(); lmer() no longer takes a family argument.)
R automatically figures out it's subject-level: each subject always has the same score
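A minimal sketch of the data shape this requires; the subject names, scores, and trial values below are invented for illustration:

```r
# Hypothetical trial-level data: each row is one trial, and the
# subject-level WM score is simply repeated on every row for that subject
trials <- data.frame(
  Subject  = c("S1", "S1", "S2", "S2"),
  StoryID  = c("Knight", "Monkey", "Knight", "Monkey"),
  Accent   = c("Contrastive", "Presentational", "Contrastive", "Presentational"),
  Correct  = c(1, 1, 0, 1),
  WM_Score = c(85, 85, 62, 62)   # constant within subject
)

# Sanity check: every subject has exactly one WM score
scores_per_subject <- tapply(trials$WM_Score, trials$Subject,
                             function(x) length(unique(x)))
stopifnot(all(scores_per_subject == 1))

# With the data in this shape, the subject-level predictor goes into the
# formula like any other factor (requires the lme4 package):
# glmer(Correct ~ Accent * WM_Score + (1 | Subject) + (1 | StoryID),
#       data = trials, family = binomial)
```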
Merging Dataframes
What if trials & subjects are in separate files? Load them both into R and use merge:
FullDataframe = merge(Data1, Data2, all.x=TRUE)
Data1: trial-level; Data2: subject-level
Need some column that has the same name in both data frames
Can specify WHICH columns to use with the by parameter; see ?merge for more details
The default is to delete subjects if they can't be matched across data frames; all.x = TRUE fills in NA values instead so you can track these subjects
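A toy, self-contained example of this merge; the data frames and values are invented for illustration:

```r
# Hypothetical separate files: trials in one data frame, subject measures in another
Data1 <- data.frame(Subject = c("S1", "S1", "S2", "S2", "S3"),
                    Correct = c(1, 0, 1, 1, 0))    # trial-level
Data2 <- data.frame(Subject = c("S1", "S2"),
                    WM = c(85, 62))                # subject-level

# merge() matches rows on the shared "Subject" column; all.x = TRUE keeps
# subject S3's trials (with WM = NA) instead of silently dropping them
FullDataframe <- merge(Data1, Data2, all.x = TRUE)

nrow(FullDataframe)              # 5: no trials lost
sum(is.na(FullDataframe$WM))     # 1: the unmatched trial is flagged with NA
```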
What's Going On Statistically?
LEVEL 2: Subjects, Items
LEVEL 1: Trials (e.g., each subject's trials from the Knight story and the Monkey story)
What's Going On Statistically?
LEVEL 2: Subjects, Items
We have random effects of our subjects & items. These result in residuals:
Eun-Kyung accuracy: 80% (+4 vs. the mean)
Tuan accuracy: 72% (-4 vs. the mean)
Level 2 factors may help us explain this variation
What's Going On Statistically?
Model without WM: unexplained variance between subjects
Model with main effect of WM: unexplained subject variance reduced
Fixed effects unchanged, because these were manipulated within subjects
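The variance-reduction idea can be sketched in base R with simulated subject-level data; the numbers are invented, and this illustrates the principle rather than the actual mixed model:

```r
# Simulate 40 subjects whose accuracy depends partly on their WM score
set.seed(1)
WM  <- rnorm(40)
acc <- 0.75 + 0.05 * WM + rnorm(40, sd = 0.03)

# Between-subject variance left unexplained, without vs. with WM
var_without_WM <- var(residuals(lm(acc ~ 1)))    # intercept-only model
var_with_WM    <- var(residuals(lm(acc ~ WM)))   # WM as a main effect

var_with_WM < var_without_WM    # TRUE: WM soaks up some subject variance
```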
Random Slopes &
Adding main effects at Level 2 will not change fixed effects at Level 1
But you can also add INTERACTIONS with trial-level factors
These help explain the random slopes
Effect of Subject-Level Variables
Remember random slopes? Variance between subjects in a fixed effect
[Plot: Memory Accuracy when the other item has a presentational vs. contrastive accent, one line per subject (e.g., Alison, Zhenghan); the slopes differ across subjects]
Random Slopes &
Adding main effects at Level 2 will not change fixed effects at Level 1, but you can also add INTERACTIONS with trial-level factors
These help explain the random slopes, and may be more interesting, theoretically
e.g. people with low WM scores DO show a penalty to memory if something else in the story gets a contrastive accent
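One way to see how a subject-level variable can explain random slopes, sketched with simulated data in base R; all names and numbers here are invented:

```r
# Simulate subjects whose penalty from a contrastive accent elsewhere
# (a trial-level effect) depends on their WM score
set.seed(2)
n_subj <- 30
WM     <- rnorm(n_subj)
slope  <- -0.10 + 0.08 * WM    # low-WM subjects: larger memory penalty

dat <- do.call(rbind, lapply(seq_len(n_subj), function(s) {
  contrastive <- rep(0:1, each = 10)    # 20 trials per subject
  acc <- 0.80 + slope[s] * contrastive + rnorm(20, sd = 0.05)
  data.frame(subj = s, contrastive = contrastive, acc = acc)
}))

# Estimate one slope per subject, then relate the slopes to WM
subj_slopes <- sapply(split(dat, dat$subj),
                      function(d) coef(lm(acc ~ contrastive, data = d))[2])
cor(subj_slopes, WM)    # positive: WM accounts for the by-subject slopes
```

In a mixed model, the same relationship would surface as an Accent-by-WM cross-level interaction that shrinks the by-subject random slope variance.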
Random Slopes &
It's illogical to have a by-subject random slope for something at the subject level: there isn't a separate WM effect for each subject
lmer lets you fit this … but I'm not sure what it represents
Individual Differences: How to Do Them Well
What Scott has learned from the individual differences literature
Example study: pitch accenting as a cue to reference resolution (deaccented referents are usually given)
Can we predict individual differences in the use of this cue?
Discriminant Validity
Many individual differences are correlated
e.g. some subjects may just try harder than others; consequently, they would do better on both the WM task & the eye-tracking task
Usually not theoretically interesting
Principle #1: Include more than one construct so we know what really matters
Discriminant Validity
How to deal with correlated predictors? Simple solution: regress one on the other:
ModelWM <- lm(WMMean ~ PSpeed, data=Cyclops)
Then use the residuals as the new measure:
Cyclops$ResidWM <- residuals(ModelWM)
“The part of WM we couldn't explain from perceptual speed”
Better solutions: path analysis & structural equation modeling
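A self-contained sketch of the residualization trick; the Cyclops data and effect sizes here are simulated, not the real measures:

```r
# Hypothetical correlated measures: WM partly reflects perceptual speed
set.seed(6)
PSpeed <- rnorm(80)
WMMean <- 0.6 * PSpeed + rnorm(80, sd = 0.8)
Cyclops <- data.frame(PSpeed = PSpeed, WMMean = WMMean)

# Regress one measure on the other and keep the residuals
ModelWM <- lm(WMMean ~ PSpeed, data = Cyclops)
Cyclops$ResidWM <- residuals(ModelWM)

# The residualized measure is (by construction) uncorrelated with PSpeed
round(cor(Cyclops$ResidWM, Cyclops$PSpeed), 10)   # 0
```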
Discriminant Validity
Some people asked about how to get these colored scatterplots...
Need to download & load the package gclus. Then:
Cyclops.short <- subset(Cyclops, select=c('PSpeed', 'GoodProsody', 'ResidWM'))
Cyclops.r <- abs(cor(Cyclops.short, use="pairwise.complete.obs"))
Cyclops.col <- dmat.color(Cyclops.r)
Cyclops.o <- order.single(Cyclops.r)
cpairs(Cyclops.short, Cyclops.o, panel.colors=Cyclops.col, gap=.5)
Here the subset() call is where you select which variables go in the scatterplot
Reliability
Not all individual measures are good measures:
Measures may be noisy
Measures may not measure a stable or meaningful characteristic
Suppose you found vocab predicted the outcome but not WM. Maybe you just had a bad WM measure!
Reliability
Good tests produce consistent scores
Measuring something real about a person
Can test this yourself with >1 assessment … or split halves
Calculate Pearson's r: cor.test(Cyclops$PSpeed1, Cyclops$PSpeed2)
Scatterplot: plot(Cyclops$PSpeed1, Cyclops$PSpeed2)
A typical standard: r = .70–.80 needed for “good” reliability
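A runnable sketch of this check with simulated split-half scores; the reliability here is built into the simulation:

```r
# Simulate two halves of the same test: both reflect a common true score
set.seed(3)
true_score <- rnorm(50)
PSpeed1 <- true_score + rnorm(50, sd = 0.4)
PSpeed2 <- true_score + rnorm(50, sd = 0.4)

r <- cor(PSpeed1, PSpeed2)      # split-half correlation
cor.test(PSpeed1, PSpeed2)      # adds a p-value and confidence interval
plot(PSpeed1, PSpeed2)          # visual check for outliers or nonlinearity
r
```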
Reliability
Principle #2: Check the reliability of your measures!
r = .77 Good! r = .16 Bad!
Latent Variables
Some things can be measured directly
e.g. gender of a subject, segmental properties of a word
Many things in psychology are measured indirectly
e.g. working memory: the ability to do tasks in spite of interference
Alphabet Span Task (read words & recall them alphabetically)
Latent Variables
But, few tasks are process pure
[Diagram: Alphabet Span taps both alphabet knowledge and working memory; Reading Span taps both working memory and reading ability]
Latent Variables
Principle #3: Overcome task-specific factors with multiple measures of the same construct
Simple analysis: use the sum or average as your predictor
Advanced techniques:
Verify the measures are related with factor analysis
Examine only the common variance: latent variable analysis, structural equation modeling
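Both the simple composite and the factor-analysis check can be done in base R; the three WM tasks below are hypothetical, simulated so that they share one construct:

```r
# Simulate three tasks that all tap the same construct plus task-specific noise
set.seed(4)
wm <- rnorm(100)
tasks <- data.frame(AlphabetSpan  = wm + rnorm(100, sd = 0.5),
                    ReadingSpan   = wm + rnorm(100, sd = 0.5),
                    OperationSpan = wm + rnorm(100, sd = 0.5))

# Simple analysis: average the standardized scores into one composite predictor
composite <- rowMeans(scale(tasks))

# Verify the measures are related: one-factor analysis (stats::factanal)
fa <- factanal(tasks, factors = 1)
fa$loadings    # all three tasks load strongly on the common factor
```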
Continuous Predictors
Many individual differences are continuous
Good to include continuous variation if you have the full range
Splits are needed in ANOVA, but splitting throws away information and is less powerful
Histogram:
hist(Cyclops$WM, breaks=20)
Continuous Predictors
Don't want to treat a predictor as continuous if the sampling was dichotomous
In this case, we didn't sample middle-aged people
[Plots: several different patterns through the middle are consistent with the sampled endpoints]
We have no info. about what should be in the middle: here there be dragons!
Comparing Predictors
How do we tell which predictor has a stronger effect?
Perceptual Speed: measured as # of same/different judgments in 2 min. Beta = 6.03, so 1 additional trial completed: prosody score + 6
Vocab: measured as # of multiple-choice questions correct out of 40. Beta = 14.69, so 1 additional correct word: prosody score + 15
Example vocab item: TEMERITY. (A) rashness (B) timidity (C) desire (D) kindness
Comparing Predictors
Issue: measures are often on different scales
Perceptual Speed: Beta = 6.03; Range: 82 to 236; Mean: 160; Std. Dev.: 28.75
Vocab: Beta = 14.69; Range: 12.00 to 32.00; Mean: 20.80; Std. Dev.: 5.30
Comparing Predictors
Issue: measures are often on different scales. Solution: standardize the predictors so you are comparing z scores:
Cyclops$Vocab_z = scale(Cyclops$Vocab, center=TRUE, scale=TRUE)  # center so mean = 0, scale so SD = 1
This changes your parameter estimates but not your hypothesis tests
Perceptual speed: standardized beta = .31; Vocab: standardized beta = .14
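A self-contained illustration of standardizing and comparing betas; the data are simulated to roughly match the means and SDs above, and the effect sizes are invented:

```r
# Two predictors on very different scales
set.seed(5)
PSpeed  <- rnorm(60, mean = 160, sd = 28.75)
Vocab   <- rnorm(60, mean = 20.8, sd = 5.3)
Prosody <- as.numeric(0.31 * scale(PSpeed) + 0.14 * scale(Vocab) +
                      rnorm(60, sd = 1))

# scale() centers (mean = 0) and scales (SD = 1)
PSpeed_z <- as.numeric(scale(PSpeed, center = TRUE, scale = TRUE))
Vocab_z  <- as.numeric(scale(Vocab,  center = TRUE, scale = TRUE))

mean(PSpeed_z)    # ~0
sd(PSpeed_z)      # 1

# The z-scored predictors' betas are now directly comparable
coef(lm(Prosody ~ PSpeed_z + Vocab_z))
```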