Applying Signal Detection Theory to Multi-Level Modeling: When "Accuracy" Isn't Always Accurate (PowerPoint PPT Presentation)


SLIDE 1

Applying Signal Detection Theory to Multi-Level Modeling: When “Accuracy” Isn't Always Accurate

December 5th, 2011 Scott Fraundorf & Jason Finley

SLIDE 2

Outline

  • Why do we need SDT?
  • Sensitivity vs. Response Bias
  • Example SDT Analyses
  • Terminology & Theory
  • Logit & probit
  • Extensions
SLIDE 3

Categorical Decisions

  • Lots of paradigms involve asking participants to make a categorical decision
    – Could be explicit or implicit
    – Here, we're focusing on cases where there are just two categories … but this generalizes to >2

SLIDE 4

Some Categorical Decisions

  • Comprehension questions: "Did Anna dress the baby? (D) Yes (K) No"
  • Interpreting an ambiguous sentence: "The cop saw the spy with the binoculars."
  • Choosing to include optional words like "that": "The coach knew (that) you missed practice."
  • Assigning a meaning to a novel word like "bouba"
  • Baby looking at one screen or another
  • Choosing a referent in a perspective-taking task

SLIDE 5

Some Categorical Decisions

  • Recognition item memory – did you see this word or face in the study list? "VIKING: (1) Seen (4) New"
  • Source memory – was this word said by a male or female talker? "VIKING: (1) Male talker (4) Female talker"
  • Detecting whether or not a faint signal is present
  • Did something change between the two displays?

SLIDE 6

What is Signal Detection Theory?

  • For experiments with categorical judgments
    – Part method for analyzing judgments
    – Part theory about how people make judgments
  • Originally developed for psychophysics
    – Operators trying to detect radar signals amidst noise
  • Purpose:
    – Better metric properties than ANOVA on proportions (logistic regression has already taken care of this for us)
    – Distinguish sensitivity from response bias

SLIDE 7

Outline

  • Why do we need SDT?
  • Sensitivity vs. Response Bias
  • Example SDT Analyses
  • Terminology & Theory
  • Logit & probit
  • Extensions
SLIDE 8

Study: POTATO SLEEP RACCOON WITCH NAPKIN BINDER Test: SLEEP POTATO BINDER WITCH RACCOON NAPKIN

Early recognition memory experiments: Study a list of words, then see the same list again. Circle the ones you remember. Problem: People might realize that all of these are words they studied! Could circle them all even if they don't really remember them.

SLIDE 9

Study: POTATO SLEEP RACCOON WITCH NAPKIN BINDER Test: POTATO HEDGE WITCH BINDER SHELL SLEEP MONKEY OATH

Later experiments: Add foils or lures that aren't words you studied. Here we see someone circled half of the studied words … but they circled half of the lures, too. No real ability to tell apart new & old items. They're just circling 50% of everything.
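The arithmetic behind this slide can be made concrete. A minimal sketch in Python, with illustrative counts assumed from the slide (half of each item type circled); the deck's own analyses are in R:

```python
# Illustrative counts assumed from the slide: 4 studied words and 4 lures
# on the test, with half of each type circled.
studied_on_test = 4
lures_on_test = 4
circled_studied = 2   # endorsements of old items ("hits")
circled_lures = 2     # endorsements of new items ("false alarms")

hit_rate = circled_studied / studied_on_test
false_alarm_rate = circled_lures / lures_on_test

# Circling half of everything makes the two rates identical:
# no real ability to tell old items from new ones.
print(hit_rate, false_alarm_rate)  # 0.5 0.5
```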

SLIDE 10

Study: POTATO SLEEP RACCOON WITCH NAPKIN BINDER Test: POTATO HEDGE WITCH BINDER SHELL SLEEP MONKEY OATH

What we want is a measure that examines correct endorsements relative to incorrect endorsements … and is not influenced by an overall bias to circle things

SLIDE 11

Sensitivity vs. Response Bias

  • Response bias: "C is the most common answer in multiple choice exams"
  • Sensitivity (or discrimination): knowing which answers are C and which aren't

SLIDE 12

Sensitivity vs. Response Bias

  • Imagine asking L2 learners of English to judge grammaticality...
  • It appears that our participants are more accurate at accepting grammatical items than rejecting ungrammatical ones...

                               Grammatical cond.   Ungrammatical cond.
  Group A  ACCURACY                  70%                  30%
           SAID "GRAMMATICAL"        70%                  70%

SLIDE 13

Sensitivity vs. Response Bias

  • But, really, they are just judging all sentences as "grammatical" 70% of the time – a response bias
  • No evidence they're showing sensitivity to grammaticality here
  • "Accuracy" can confound these 2 influences

                               Grammatical cond.   Ungrammatical cond.
  Group A  ACCURACY                  70%                  30%
           SAID "GRAMMATICAL"        70%                  70%

SLIDE 14

Sensitivity vs. Response Bias

  • Now imagine we have speakers of two different first languages...

                               Grammatical cond.   Ungrammatical cond.
  Group A  ACCURACY                  70%                  30%
           SAID "GRAMMATICAL"        70%                  70%

  Group B  ACCURACY                  60%                  40%

SLIDE 15

Sensitivity vs. Response Bias

  • It looks like Group B is better at rejecting ungrammatical sentences...
  • But the groups just have different biases

                               Grammatical cond.   Ungrammatical cond.
  Group A  ACCURACY                  70%                  30%
           SAID "GRAMMATICAL"        70%                  70%

  Group B  ACCURACY                  60%                  40%
           SAID "GRAMMATICAL"        60%                  60%
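The pattern here can be checked with a few lines of code: each group's accuracy is fully explained by its flat rate of saying "grammatical." A sketch in Python, with the rates taken from the slide (the deck's analyses themselves are in R):

```python
# Rates from the slide for each group.
groups = {
    "A": {"acc_grammatical": 0.70, "acc_ungrammatical": 0.30, "said_grammatical": 0.70},
    "B": {"acc_grammatical": 0.60, "acc_ungrammatical": 0.40, "said_grammatical": 0.60},
}

for name, g in groups.items():
    # If a group says "grammatical" at a fixed rate regardless of the item,
    # accuracy on grammatical items equals that rate, and accuracy on
    # ungrammatical items equals 1 minus that rate.
    pure_bias = (g["acc_grammatical"] == g["said_grammatical"]
                 and abs(g["acc_ungrammatical"] - (1 - g["said_grammatical"])) < 1e-9)
    print(name, pure_bias)  # True for both groups: bias differs, sensitivity is zero
```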

SLIDE 16

Sensitivity vs. Response Bias

  • This would be particularly misleading if we only looked at ungrammatical items
  • No way to distinguish response bias vs. sensitivity in that case!

                               Ungrammatical cond.
  Group A  ACCURACY                  30%
           SAID "GRAMMATICAL"        70%

  Group B  ACCURACY                  40%
           SAID "GRAMMATICAL"        60%

SLIDE 17

Sensitivity vs. Response Bias

  • We see participants can give the "right" answer without really knowing it
  • Comparisons to "chance" attempt to deal with this
  • But "chance" = 50% assumes both responses are equally likely
    – Probably not true for, e.g., attachment ambiguities
    – People have an overall bias to answer questions with "yes"

SLIDE 18

Sensitivity vs. Response Bias

  • Common to balance the frequency of intended responses
    – e.g. 50% true statements, 50% false
  • But bias may still exist for other reasons
    – Prior frequency
      • e.g. low attachments are more common in English than high attachments … might create a bias even if they're equally common in your experiment
    – Motivational factors (e.g., one error "less bad" than another)
      • Better to suggest a healthy patient undergo additional screening for a disease than miss someone w/ a disease

SLIDE 19

Outline

  • Why do we need SDT?
  • Sensitivity vs. Response Bias
  • Example SDT Analyses
  • Terminology & Theory
  • Logit & probit
  • Extensions
SLIDE 20

Fraundorf, Watson, & Benjamin (2010)

Hear a recorded discourse, with a presentational or contrastive pitch accent:

"Both the British and the French biologists had been searching Malaysia and Indonesia for the endangered monkeys. Finally, the British spotted one of the monkeys in Malaysia and planted a radio tag on it."

Then, later, get a true/false memory test.

SLIDE 21

The British scientists spotted the endangered monkey and tagged it.

(D) TRUE    (K) FALSE

SLIDE 22

The French scientists spotted the endangered monkey and tagged it.

(D) TRUE    (K) FALSE

N.B. The actual experiment had multiple types of false probes … an important part of the actual experiment, but not needed for this demonstration.

SLIDE 23

SDT & Multi-Level Models

  • Traditional logistic regression model:

    Accuracy of Response (CORRECT MEMORY or INCORRECT MEMORY) = Probe Type x Pitch Accent

  • Accuracy confounds sensitivity and response bias
    – The manipulation might just make you say true to everything more

SLIDE 24

SDT & Multi-Level Models

  • Traditional logistic regression model:

    Accuracy of Response (CORRECT MEMORY or INCORRECT MEMORY) = Probe Type x Pitch Accent

  • SDT model:

    Response Made (JUDGED TRUE vs JUDGED FALSE; or JUDGED GRAMMATICAL vs JUDGED UNGRAMMATICAL) = Probe Type x Pitch Accent

  The SDT model involves changing the way your DV is parameterized.

SLIDE 25

Respond correctly or respond incorrectly? True statement or false statement?

This better reflects the actual judgment we are asking participants to make. They are deciding whether to say "this is true" or "this is false" … not whether to respond accurately or respond inaccurately.

SLIDE 26

SDT & Multi-Level Models

  • SDT model (w/ centered predictors):

    Said "TRUE" = Intercept + Actually is TRUE

    – Intercept: baseline rate of responding TRUE (response bias)
    – Actually is TRUE: does the item being true make you more likely to say TRUE? (sensitivity)

  At this point, we haven't looked at any differences between conditions (e.g. contrastive vs presentational accent, or L1 vs L2). We are just analyzing overall performance.
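As a sketch of what this parameterization does, here is the model's prediction function in Python. The coefficients are made-up for illustration (the deck's actual models are fit in R); the point is that the intercept sets the baseline rate of saying TRUE (bias) while the slope on the centered item-status predictor pulls hits up and false alarms down (sensitivity):

```python
import math

def inv_logit(x):
    """Convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical coefficients, not fitted to any real data:
intercept = 0.4       # response bias: baseline log odds of saying TRUE
beta_is_true = 1.5    # sensitivity: effect of the item actually being TRUE

# Centered predictor: +0.5 for actually-TRUE probes, -0.5 for FALSE probes
p_say_true_given_true = inv_logit(intercept + beta_is_true * 0.5)
p_say_true_given_false = inv_logit(intercept + beta_is_true * -0.5)

print(round(p_say_true_given_true, 2))   # hit rate implied by the model
print(round(p_say_true_given_false, 2))  # false-alarm rate implied by the model
```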

SLIDE 27

SDT & Multi-Level Models

  • SDT model (w/ centered predictors):

    Said "TRUE" = Intercept + Actually is TRUE + Contrastive Accent + Accent x TRUE

    – Intercept: baseline rate of responding TRUE (response bias)
    – Actually is TRUE: does the item being true make you more likely to say TRUE? (sensitivity)
    – Contrastive Accent: does contrastive accent change the overall rate of saying TRUE? (effect on bias)
    – Accent x TRUE: does accent especially increase TRUE responses to true items? (effect on sensitivity)

SLIDE 28

SDT & Multi-Level Models

  • SDT model:

    Said "TRUE" = Intercept + Actually is TRUE* + Contrastive Accent + Accent x TRUE*

    – Intercept: baseline rate of responding TRUE (response bias)
    – Actually is TRUE (*): sensitivity
    – Contrastive Accent: effect on bias
    – Accent x TRUE (*): effect on sensitivity

  Contrastive accent improves actual sensitivity. No effect on response bias.

SLIDE 29

SDT & Multi-Level Models

  • SDT model:

    Said "TRUE" = Intercept + Actually is TRUE* + Contrastive Accent + Accent x TRUE*

  General heuristic:
    – Effects that don't interact with item type = effects on bias
    – Effects that do involve item type = effects on sensitivity

SLIDE 30

Ferreira & Dell (2000)

  • Are people sensitive to ambiguity in language production?
  • Ambiguous: "The coach knew you...."
    – "The coach knew you for a long time."
      • Here, you is the direct object of knew
    – "The coach knew you missed practice."
      • Here, you is actually the subject of an embedded sentence (you missed practice). Confusing if you were expecting a direct object!
  • "The coach knew that you missed practice."
    – Including that avoids this ambiguity

SLIDE 31

Ferreira & Dell (2000)

  • Task: Read & recall sentences
  • Ambiguous: "The coach knew (that) you...."
  • Unambiguous: "The coach knew (that) I..."
    – This has to be a sentential complement (it would be The coach knew me if a direct object)
  • Will people produce "that" in the ambiguous conditions?
    – Especially if task instructions emphasize being clear?

SLIDE 32

SDT & Multi-Level Models

  • SDT model:

    Said "that" = Intercept + Ambiguity + Instructions + Instructions x Ambiguity

    – Intercept: baseline rate of producing "that" (response bias)
    – Ambiguity: do people produce "that" more for you (ambig.) than I (unambig.)? (sensitivity)
    – Instructions: are people told to avoid ambiguity? (effect on bias)
    – Instructions x Ambiguity: do instructions especially increase use of "that" for ambiguous items? (effect on sensitivity)

SLIDE 33

SDT & Multi-Level Models

  • SDT model:

    Said "that" = Intercept + Ambiguity

    – Intercept: baseline rate of producing "that" (response bias)
    – Ambiguity: do people produce "that" more for you (ambig.) than I (unambig.)? (sensitivity)

  People show little sensitivity to ambiguity!

SLIDE 34

SDT & Multi-Level Models

    Said "that" = Intercept + Ambiguity + Instructions + Instructions x Ambiguity

  Instructions to be clear don't increase sensitivity to ambiguity. They just increase the overall rate of "that," in all conditions. An interesting response bias effect that tells us about participants' strategy! People try to increase clarity by inserting the complementizer everywhere.

SLIDE 35

Other Designs

  • Imagine if critical comprehension questions should all be answered "yes"
  • Not possible to tease apart sensitivity & response bias
    – Could say "yes" 85% of the time because you knew the correct answer 85% of the time (sensitivity)
    – But maybe you would just say "yes" to 85% of everything (response bias)
    – Like the memory test that only has studied words
  • Would need "no" probes to do an SDT analysis
  • A common design in psycholinguistics … but limited in the conclusions we can draw from it

SLIDE 36

Other Designs

  • This concern holds even if we manipulate some other variable...
  • The priming manipulation...
    – Might improve sensitivity at realizing these questions should be answered "yes"
    – Might just increase bias to say "yes"
  • Again, we would need some "no" questions to distinguish these hypotheses

  RATE OF "YES" RESPONSES
    "Yes" comprehension questions – NOT PRIMED:  81%
    "Yes" comprehension questions – PRIMED:      93%

SLIDE 37

Outline

  • Why do we need SDT?
  • Sensitivity vs. Response Bias
  • Example SDT Analyses
  • Terminology & Theory
  • Logit & probit
  • Extensions

These slides courtesy Jason Finley

SLIDE 38

Signal Detection Performance

For each trial:
  – Hit
  – Miss (a Type II error in stats)
  – False Alarm (a Type I error)
  – Correct Rejection

Summary statistics:
  Hit Rate (HR) = # Hits / # Signal Trials
  False Alarm Rate (FAR) = # False Alarms / # Noise Trials
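A minimal sketch in Python of computing these summary statistics from trial-level outcomes (the trial list is made up for illustration):

```python
# Each trial: (was a signal present?, did the participant respond "yes"?)
trials = [
    (True, True),    # hit
    (True, False),   # miss
    (True, True),    # hit
    (False, True),   # false alarm
    (False, False),  # correct rejection
    (False, False),  # correct rejection
]

hits = sum(1 for signal, yes in trials if signal and yes)
signal_trials = sum(1 for signal, _ in trials if signal)
false_alarms = sum(1 for signal, yes in trials if not signal and yes)
noise_trials = sum(1 for signal, _ in trials if not signal)

hit_rate = hits / signal_trials                 # HR = # hits / # signal trials
false_alarm_rate = false_alarms / noise_trials  # FAR = # false alarms / # noise trials
print(hit_rate, false_alarm_rate)
```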

SLIDE 39

Theory

[Figure: overlapping evidence distributions for NOISE trials and SIGNAL trials]

  1. Trials = events
  2. Strength of evidence: a continuous dimension
  3. Conditional probability distributions for noise, signal
  4. Decision/response criterion
  5. Evidence has an arbitrary scale. By convention, the noise distribution has mean 0, variance 1

  Respond "yes" if evidence is above the criterion; respond "no" if evidence is below the criterion.
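The assumptions above can be simulated directly. A sketch in Python, where the distance between the distribution means and the criterion are assumed values rather than estimates from any experiment:

```python
import random
from statistics import NormalDist

random.seed(1)

d_prime = 1.0     # assumed distance between the noise and signal means
criterion = 0.5   # assumed decision criterion

n = 100_000
noise_evidence = (random.gauss(0.0, 1.0) for _ in range(n))       # noise ~ N(0, 1)
signal_evidence = (random.gauss(d_prime, 1.0) for _ in range(n))  # signal ~ N(d', 1)

# Decision rule from the slide: respond "yes" iff evidence is above the criterion
far = sum(e > criterion for e in noise_evidence) / n
hr = sum(e > criterion for e in signal_evidence) / n

# Simulated rates approximate the theoretical ones: 1 - Phi(c) and 1 - Phi(c - d')
phi = NormalDist().cdf
print(far, 1 - phi(criterion))
print(hr, 1 - phi(criterion - d_prime))
```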

SLIDE 40

Correct Rejection: a noise trial, and evidence is below the decision criterion

SLIDE 41

False Alarm: a noise trial, but evidence is above the decision criterion

SLIDE 42

Miss: a signal trial, but evidence is below the decision criterion

SLIDE 43

Hit: a signal trial, and evidence is above the decision criterion

SLIDE 44

Response Bias

The optimal criterion depends on:
  • signal probability
  • payoff structure

SLIDE 45

Response Bias

A high criterion will increase correct rejections, but also increase misses. (This is the kind of criterion intended in null hypothesis significance testing.)

SLIDE 46

Response Bias

A low criterion will increase hits, but also false alarms. Good if misses are especially bad (the medical screening example).

SLIDE 47

Response Bias

c = -.5[z(HR) + z(FAR)]

The c parameter describes the location of the criterion.

SLIDE 48

Sensitivity

d' = z(HR) – z(FAR)

The traditional SDT measure of sensitivity is d' … measuring the distance between the peaks of the noise and signal distributions.
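Both formulas are two-liners given the inverse normal CDF. A sketch using Python's standard library (the hit and false-alarm rates are made-up for illustration):

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # the z-transform: inverse CDF of the standard normal

def d_prime(hit_rate, fa_rate):
    """d' = z(HR) - z(FAR): distance between the distribution peaks."""
    return z(hit_rate) - z(fa_rate)

def criterion_c(hit_rate, fa_rate):
    """c = -.5[z(HR) + z(FAR)]: location of the decision criterion."""
    return -0.5 * (z(hit_rate) + z(fa_rate))

# Hypothetical rates for illustration:
hr, far = 0.85, 0.20
print(round(d_prime(hr, far), 2))      # sensitivity
print(round(criterion_c(hr, far), 2))  # near 0 = roughly neutral bias
```

Note that equal hit and false-alarm rates (like the 70%/70% grammaticality example earlier) give d' = 0, whatever the shared rate is.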

SLIDE 49

Lower Sensitivity

Sensitivity

result of:

  • external factors
  • internal factors
SLIDE 50

Higher Sensitivity

Sensitivity

result of:

  • external factors
  • internal factors
SLIDE 51

Outline

  • Why do we need SDT?
  • Sensitivity vs. Response Bias
  • Example SDT Analyses
  • Terminology & Theory
  • Logit & probit
  • Extensions
SLIDE 52

Probit vs Logit

  • How to make a binomial response continuous?
  • Logit = log odds
    – lnOR = log([p(hit)/p(miss)] / [p(FA)/p(CR)])
  • Probit = cumulative distribution function (CDF) of the normal distribution
    – d' = z[p(hit)] – z[p(FA)]
    – The CDF at x is the area under the curve from -Inf to point x
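The two link functions can be compared side by side with the standard library. A sketch in Python (the probe probabilities are chosen for illustration):

```python
import math
from statistics import NormalDist

def logit(p):
    """Log odds of probability p."""
    return math.log(p / (1 - p))

probit = NormalDist().inv_cdf  # inverse CDF of the standard normal

# Both map (0, 1) onto the whole real line and agree at p = 0.5;
# the logit scale is stretched relative to the probit scale.
for p in (0.10, 0.50, 0.90, 0.99):
    print(p, round(logit(p), 2), round(probit(p), 2))
```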

SLIDE 53

  • Very similar!
    – Probit changes more quickly in the middle of the distribution, more slowly at the tails
    – Logit has a somewhat easier interpretation (can convert to odds / odds ratios)
    – Probably, you will get qualitatively similar results with both
    – Could try both & see which fits your dataset better
    – Literatures differ in which is used more commonly

  Figures from http://www.indiana.edu/~statmath/stat/all/cdvm/cdvm1.html

SLIDE 54

  • Can pick one or the other; in lme4, binomial models are fit with glmer, and the link is specified inside the family:
    – glmer(Y ~ X + (1|Subject), family=binomial(link="logit"))
      • Default, used if you don't specify a link
    – glmer(Y ~ X + (1|Subject), family=binomial(link="probit"))

SLIDE 55

  • Both the logit and probit are undefined if you have a probability of 0 or 1 in a cell
    – Can apply some type of adjustment (e.g. the empirical logit)
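One common version of that adjustment adds 0.5 to each cell of the count table before taking log odds. A sketch in Python (the 0.5 constant is the usual choice; this is an illustration, not the deck's own code):

```python
import math

def empirical_logit(successes, n):
    """Log odds with 0.5 added to each cell, so 0% and 100% stay finite."""
    return math.log((successes + 0.5) / (n - successes + 0.5))

# The plain logit would be undefined (infinite) for these two cells:
print(empirical_logit(10, 10))  # every response a success: large but finite
print(empirical_logit(0, 10))   # no successes: large negative but finite
```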

SLIDE 56

Outline

  • Why do we need SDT?
  • Sensitivity vs. Response Bias
  • Example SDT Analyses
  • Terminology & Theory
  • Logit & probit
  • Extensions
SLIDE 57

Extensions

  • Generalizes to >2 ordered categories:
    – Traditionally, collapse over participants or items & use d_a
    – MLM needs a multinomial model, currently available in SAS but not R

SLIDE 58

Extensions

  • Variance in parameters over participants, items
    – e.g. different sensitivity, different response bias
    – Captured by random slopes
    – Likely that such variance exists!

SLIDE 59

Extensions

  • Unequal variance
    – So far, variability of the response (evidence) is a constant: d' = B0 + B1*X1 + (1|Subject) + ε
    – Definitely not true for recognition memory (although lots of debate about why this is)
  • Noisy criterion (Benjamin, Diaz, & Wee, 2009)
    – Typically, all error is in the response (evidence)
    – But the criterion could vary from trial to trial

  (from Wixted, 2002) N.B. This is definitely not the only account of this difference! (see Yonelinas, 2002)