Training to Improve Judgmental Expertise by Using Decompositions of Judgment Accuracy Measures
Eric R. Stone, Wake Forest University
Overarching Goal
- How can we improve judgment accuracy?
- Two types of judgments:
- 1. Judgments of discrete events, typically in probabilistic form
What is the probability the Cardinals will win the World Series?
- 2. Quantitative judgments of continuous quantities
How many users of Facebook will there be at the end of 2011?
Overarching Approach
- Rather than develop training techniques designed to increase
judgment accuracy generally, our approach targets specific aspects (components) of judgment accuracy.
- Each of these components is related to a different skill, and these skills are typically relatively unrelated to each other.
- By focusing intervention efforts on these specific skills, we
can train each of the elements underlying overall judgment accuracy, leading to maximal improvement.
Today's Plan
- 1. Discrete events
- Training of the component measures (Stone & Opel, 2000)
- 2. Continuous events
- Accuracy measures, as seen in Extended-MSE analysis (Lee & Yates,
1992)
- Training of the component measures (Youmans & Stone, 2005)
- 3. ACES Project
- (Preliminary) instantiation of these ideas in an applied forecasting situation
Judgments of Discrete Events
PS = Σ(f − d)² / n

where f = probability judgment
      d = outcome (0 if event does not happen; 1 if it does)
- Example Problem: What is the probability that the home team
(e.g., Rangers) will win?
- d = 1 if Rangers win; 0 if Rangers lose
- f = judged probability of Rangers winning (0 to 1)
Judgments of Discrete Events
PS = Σ(f − d)² / n
Mean Probability Score
- Make judgments of the same type repeatedly
Game 1 – p(HT wins) = .90; home team does win
Game 2 – p(HT wins) = .60; home team does not win
Game 3 – p(HT wins) = .20; home team does not win
PS = [(.90 − 1)² + (.60 − 0)² + (.20 − 0)²] / 3
   = (.01 + .36 + .04) / 3 ≈ .14
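The three-game arithmetic above can be checked with a short script. This is a minimal sketch; the function name is ours, not from the talk.

```python
def mean_probability_score(forecasts, outcomes):
    """Mean of squared differences between probability judgments (f) and
    outcomes (d = 1 if the event happened, 0 otherwise)."""
    return sum((f - d) ** 2 for f, d in zip(forecasts, outcomes)) / len(forecasts)

# Game 1: p = .90, home team won; Games 2-3: p = .60 and .20, home team lost
ps = mean_probability_score([0.90, 0.60, 0.20], [1, 0, 0])
print(round(ps, 2))  # 0.14
```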
Judgments of Discrete Events
PS = Σ(f − d)² / n
- discrimination (sometimes referred to as resolution) reflects
“substantive expertise” – domain-specific knowledge about the events being judged
- calibration reflects “calibration expertise” – the ability to assign
probabilities that match the percentage of times that the target event actually occurs
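The calibration and discrimination components named here can be computed directly from a set of forecasts. The sketch below uses the standard Murphy-style decomposition of the mean probability score (the function name and data are ours); it bins trials by the exact judged probability.

```python
from collections import defaultdict

def calibration_discrimination(forecasts, outcomes):
    """Calibration (lower is better) and discrimination (higher is better)
    components of the mean probability score, binned by judged probability."""
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    bins = defaultdict(list)
    for f, d in zip(forecasts, outcomes):
        bins[f].append(d)  # group trials sharing the same judged probability
    cal = sum(len(ds) * (f - sum(ds) / len(ds)) ** 2 for f, ds in bins.items()) / n
    disc = sum(len(ds) * (sum(ds) / len(ds) - base_rate) ** 2 for ds in bins.values()) / n
    return cal, disc

# Ten hypothetical home-game forecasts: five at .9 (four wins), five at .1 (one win)
forecasts = [.9] * 5 + [.1] * 5
outcomes = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
cal, disc = calibration_discrimination(forecasts, outcomes)
# cal = .01, disc = .09; PS = base-rate variance + cal - disc = .25 + .01 - .09 = .17
```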
Judgments of Discrete Events
Calibration
[Calibration graph: judged probability of the home team winning (x-axis) vs. proportion of home-team wins in each judgment category (y-axis)]
Judgments of Discrete Events
Calibration
- Types of poor calibration
1) Over (under) estimation 2) Over (under) confidence
Judgments of Discrete Events
Calibration
[Calibration graphs illustrating the types of poor calibration above: judged probability of the home team winning (x-axis) vs. proportion of home-team wins in each judgment category (y-axis)]
Judgments of Discrete Events
Discrimination
[Discrimination graphs: judged probability of the home team winning (x-axis) vs. proportion of home-team wins in each judgment category (y-axis)]
Judgments of Discrete Events
Summary of Decomposition Analysis
- Judgment expertise reflects an ability to make well-calibrated and well-discriminated probability judgments
Next Questions
- What judgment skills underlie good calibration and good discrimination?
Judgments of Discrete Events
Judgment Skills Underlying Good Calibration
- translation of a “feeling of confidence” into a probability judgment (Ferrell & McGoey, 1980; Suantak, Bolger, & Ferrell, 1996)
- “the forecaster's ability to assign the ‘right’ labels to his or her forecasts” (Yates, 1982)
- we refer to this ability as “calibration expertise” (Stone & Hoffman, 1999; Stone & Opel, 2000)
Judgment Skills Underlying Good Discrimination
- “the ability … to discriminate individual occasions on which the event of interest will and will not take place” (Yates, 1982)
- requires substantive knowledge about the events of interest
- we refer to this ability as “substantive expertise” (Stone & Hoffman,
1999; Stone & Opel, 2000)
Judgments of Discrete Events
Judgment Training: Calibration
- Many types of poor calibration (e.g., overconfidence) are resistant to training techniques (e.g., Sieck & Arkes, 2005).
- In particular, providing general advice seems to have little effect,
presumably because people dismiss this advice as not relevant to them.
- The approach that seems to be most fruitful is to provide performance feedback about past judgment sessions (e.g., Lichtenstein & Fischhoff, 1980). This performance feedback entails providing more than outcome feedback; typically it entails providing calibration graphs of one's performance.
- The approach is particularly useful in reducing judgments that are overly extreme (Lichtenstein, Fischhoff, & Phillips, 1982).
- To maximize the effectiveness of this approach, we both present people with their calibration graphs and provide assistance in interpreting them (Stone & Opel, 2000; Stone, Rittmayer, & Parker, 2004).
Judgments of Discrete Events
Judgment Training: Discrimination
- To improve discrimination, one needs to provide substantive information related to the task at hand, or to train people to better use the information they have.
- Because it requires actual substantive information, discrimination is
sometimes referred to as a more fundamental skill (e.g., Yates, 1982).
- Thus, discrimination training entails providing environmental feedback, i.e., information about the environment in which one is making predictions.
Judgments of Discrete Events
Judgment Training: Stone & Opel (2000)
- Goal: Do calibration expertise (i.e., calibration) and substantive expertise (i.e., discrimination) reflect two conceptually distinct skills that need to be trained separately?
- Basic Approach: Provide performance feedback to train calibration and
environmental feedback to train discrimination, and examine the effect of each on the other measure.
- Performance feedback – Present participants with information related to
their performance, in terms of a calibration diagram and accompanying individual feedback (e.g., you were overconfident…).
- Environmental feedback – Present participants with substantive information
regarding the task at hand
Judgments of Discrete Events
Design Judgment Training: Stone & Opel (2000)
- All participants responded twice, once during pretraining
(baseline) and once during posttraining.
- Between pretraining and posttraining participants received either: 1) No feedback, 2) Performance feedback, or 3) Environmental feedback
Judgments of Discrete Events
Materials Judgment Training: Stone & Opel (2000)
- Example question:
  What period was this slide from?
  a) Medieval (earlier period)
  b) Renaissance (later period)
  Probability from the later time period: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
Judgments of Discrete Events
Materials Judgment Training: Stone & Opel (2000)
- 100 easy slides (mean = 80% from the pretest).
- 100 hard slides (mean = 60% from the pretest).
- Participants saw 50 easy and 50 hard slides in both the pretraining and posttraining test phases.
Judgments of Discrete Events
Procedure Judgment Training: Stone & Opel (2000)
- Participants arrived in groups of 15, and received a brief (10-15
minute) lecture on calibration and discrimination.
- All participants went through the pretraining phase, responding
to 50 hard and 50 easy slides.
- Participants were split into groups of 5, and underwent the
appropriate training technique.
Judgments of Discrete Events
Procedure Judgment Training: Stone & Opel (2000)
- Performance feedback group -- Provided calibration diagrams,
and individualized feedback.
- Environmental feedback group -- Given lecture on art history.
- No Feedback -- No intervention.
- Participants reconvened in the main room, and responded to
another 50 hard and 50 easy slides.
Judgments of Discrete Events
Results – Hard Slides: Mean Probability Score Judgment Training: Stone & Opel (2000)
- No Feedback: scores stayed the same, going from .286 to .273
- Performance Feedback: scores decreased from .293 to .259 **
- Environmental Feedback: scores decreased from .287 to .233 **
** indicates p < .01
Judgments of Discrete Events
Results – Hard Slides: Calibration Judgment Training: Stone & Opel (2000)
- No Feedback: scores stayed the same, going from .094 to .093
- Performance Feedback: scores decreased from .095 to .067 **
- Environmental Feedback: scores stayed the same, going from .094 to .090
Judgments of Discrete Events
Results – Hard Slides: Overconfidence Judgment Training: Stone & Opel (2000)
- No Feedback: scores stayed the same, going from .188 to .168
- Performance Feedback: scores decreased from .188 to .106 **
- Environmental Feedback: scores stayed the same, going from .176 to .184
Judgments of Discrete Events
Results – Hard Slides: Discrimination Judgment Training: Stone & Opel (2000)
- No Feedback: scores stayed the same, going from .055 to .056
- Performance Feedback: scores stayed the same, going from .048 to .039
- Environmental Feedback: scores increased from .054 to .092 **
Judgments of Discrete Events
Results – Hard Slides: Percent Correct Judgment Training: Stone & Opel (2000)
- No Feedback: scores stayed the same, going from 58.2% to 60.4% correct
- Performance Feedback: scores stayed the same, going from 56.2% to 58.3% correct
- Environmental Feedback: scores increased from 57.0% to 71.4% correct **
Judgments of Discrete Events
Results – Easy Slides Judgment Training: Stone & Opel (2000)
The results with the easy slides were similar to those with the hard slides. In particular:
- Performance feedback reduced overconfidence (and improved calibration generally, although not significantly), but did not influence discrimination or percent correct.
- Environmental feedback improved discrimination and percent correct, but did not improve calibration or overconfidence. Additionally, the provision of environmental feedback actually led to an increase in overconfidence. This result is in keeping with many studies (e.g., Oskamp, 1965) showing that providing information can increase perception of knowledge more than actual knowledge.
Judgments of Discrete Events
Conclusions Judgment Training: Stone & Opel (2000)
- Calibration and discrimination appear to reflect two separate skills (which we call calibration expertise and substantive expertise, respectively), and these skills require separate training techniques.
- Further, there is some risk that training of one skill can actually cause
decrements in the other skill.
Judgments of Continuous Events
- Overall accuracy – Mean Squared Error (MSE)
MSE = Σ(f − d)² / n

where f = quantity judgment
      d = quantitative outcome
- Example Problem: What will be the high temperature on
Tuesday October 23rd?
- d = actual temperature
- f = judged temperature
Judgments of Continuous Events
MSE = Σ(f − d)² / n
Mean Squared Error
- Make judgments of the same type repeatedly
Day 1 – predicted high = 71 degrees; actual high = 75
Day 2 – predicted high = 68 degrees; actual high = 74
Day 3 – predicted high = 67 degrees; actual high = 66
MSE = [(71 − 75)² + (68 − 74)² + (67 − 66)²] / 3
    = (16 + 36 + 1) / 3 ≈ 17.67
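The three-day arithmetic above can be checked directly:

```python
# MSE for the three-day temperature example
predicted = [71, 68, 67]
actual    = [75, 74, 66]
mse = sum((f - d) ** 2 for f, d in zip(predicted, actual)) / len(predicted)
print(round(mse, 2))  # 17.67
```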
Judgments of Continuous Events

MSE = Σ(f − d)² / n

- Extended-MSE analysis (Lee & Yates, 1992):

MSE = (Ȳ_s − Ȳ_e)² + (S_s − S_e)² + 2 S_s S_e (1 − r_a)

where
Ȳ_s = the average judged value
Ȳ_e = the average criterion value
S_s = the standard deviation of the judge's values
S_e = the standard deviation of the criterion values
r_a = achievement
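This decomposition can be verified numerically; with population (rather than sample) standard deviations it holds exactly. The data and helper names below are ours, chosen only for illustration.

```python
import math

def moments(xs):
    """Mean and population standard deviation (the decomposition is exact with ddof = 0)."""
    m = sum(xs) / len(xs)
    return m, math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def pearson(xs, ys):
    (mx, sx), (my, sy) = moments(xs), moments(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / len(xs)
    return cov / (sx * sy)

# Hypothetical five-day temperature forecasts and outcomes
judged    = [71, 68, 67, 73, 65]
criterion = [75, 74, 66, 70, 64]

mse = sum((f - d) ** 2 for f, d in zip(judged, criterion)) / len(judged)

(m_s, s_s), (m_e, s_e) = moments(judged), moments(criterion)
r_a = pearson(judged, criterion)
emse = (m_s - m_e) ** 2 + (s_s - s_e) ** 2 + 2 * s_s * s_e * (1 - r_a)
# mse and emse agree: direct MSE equals the Lee & Yates (1992) decomposition
```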
Judgments of Continuous Events
- Combines the decomposition below with a lens model approach to decompose achievement.

MSE = (Ȳ_s − Ȳ_e)² + (S_s − S_e)² + 2 S_s S_e (1 − r_a)

where Ȳ_s, Ȳ_e = the average judged and criterion values; S_s, S_e = the standard deviations of the judged and criterion values; r_a = achievement
Judgments of Continuous Events
Lens Model Analysis
[Lens model diagram: cues x1, x2, x3 are linked to the criterion Y_e by cue validities and to the judgment Y_s by cue utilizations; the diagram is annotated with R_e, R_s, G, C, and r_a]
Judgments of Continuous Events
[Lens model diagram repeated, with its statistics defined as follows]

r_a = achievement, the correlation between the judgments and the criterion values
G = linear knowledge, the correlation between the best linear prediction of the judge and the best linear prediction of the criterion
C = non-linear knowledge, the correlation between the residuals from the best linear prediction of the judge and the residuals from the best linear prediction of the criterion
R_s = linear consistency, the correlation between the judgments and the best linear prediction of the judgments
R_e = environmental predictability, the correlation between the criterion and the best linear prediction of the criterion
Judgments of Continuous Events
- From a lens model analysis:

r_a = G R_s R_e + C (1 − R_s²)^½ (1 − R_e²)^½

- Recall:

MSE = (Ȳ_s − Ȳ_e)² + (S_s − S_e)² + 2 S_s S_e (1 − r_a)

- Thus:

MSE = (Ȳ_s − Ȳ_e)² + (S_s − S_e)² + 2 S_s S_e [1 − G R_s R_e − C (1 − R_s²)^½ (1 − R_e²)^½]
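The lens model identity for achievement can be verified numerically when the judge's predictions and the criterion are both regressed on the same cue set. The sketch below is ours (made-up cue data, hypothetical helper names) and fits the regressions with plain normal equations.

```python
import math

def stats(xs):
    m = sum(xs) / len(xs)
    return m, math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def corr(xs, ys):
    (mx, sx), (my, sy) = stats(xs), stats(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / len(xs)
    return cov / (sx * sy)

def linear_fit(y, cues):
    """Least-squares fitted values of y on the cue columns (with intercept),
    solved via the normal equations with Gaussian elimination."""
    n = len(y)
    cols = [[1.0] * n] + cues
    k = len(cols)
    A = [[sum(cols[i][t] * cols[j][t] for t in range(n)) for j in range(k)] for i in range(k)]
    b = [sum(cols[i][t] * y[t] for t in range(n)) for i in range(k)]
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(A[r][i]))  # partial pivoting
        A[i], A[p] = A[p], A[i]
        b[i], b[p] = b[p], b[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            for c in range(i, k):
                A[r][c] -= f * A[i][c]
            b[r] -= f * b[i]
    w = [0.0] * k
    for i in reversed(range(k)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, k))) / A[i][i]
    return [sum(w[j] * cols[j][t] for j in range(k)) for t in range(n)]

# Hypothetical data: two cues, a judge's predictions (ys), and criterion values (ye)
x1 = [3, 1, 4, 1, 5, 9, 2, 6]
x2 = [2, 7, 1, 8, 2, 8, 1, 8]
ys = [10, 12, 11, 14, 13, 22, 8, 19]
ye = [11, 13, 12, 15, 14, 23, 9, 18]

yhat_s, yhat_e = linear_fit(ys, [x1, x2]), linear_fit(ye, [x1, x2])
res_s = [a - b for a, b in zip(ys, yhat_s)]
res_e = [a - b for a, b in zip(ye, yhat_e)]

r_a = corr(ys, ye)           # achievement
G   = corr(yhat_s, yhat_e)   # linear knowledge
C   = corr(res_s, res_e)     # non-linear knowledge
R_s = corr(ys, yhat_s)       # linear consistency
R_e = corr(ye, yhat_e)       # environmental predictability

identity_rhs = G * R_s * R_e + C * math.sqrt((1 - R_s ** 2) * (1 - R_e ** 2))
# r_a equals identity_rhs (up to floating-point error)
```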
Judgments of Continuous Events

MSE = (Ȳ_s − Ȳ_e)² + (S_s − S_e)² + 2 S_s S_e [1 − G R_s R_e − C (1 − R_s²)^½ (1 − R_e²)^½]

- Controllable elements:
1) Ȳ_s = the average judged value
2) S_s = the standard deviation of the judge's values
3) G = linear knowledge
4) R_s = linear consistency
5) C? (non-linear knowledge)
Judgments of Continuous Events

MSE = (Ȳ_s − Ȳ_e)² + (S_s − S_e)² + 2 S_s S_e [1 − G R_s R_e − C (1 − R_s²)^½ (1 − R_e²)^½]

1) Ȳ_s = the average judged value
- The goal here is to match the average judged value to the average criterion value to the extent possible
- Thus, helping the judge become aware of the average criterion value should reduce bias
Judgments of Continuous Events

MSE = (Ȳ_s − Ȳ_e)² + (S_s − S_e)² + 2 S_s S_e [1 − G R_s R_e − C (1 − R_s²)^½ (1 − R_e²)^½]

2) S_s = the standard deviation of the judge's values
- The standard deviation influences accuracy in two ways:
  - in comparison to the criterion standard deviation
  - in an absolute sense
- In combination, the standard deviation of the judge's values should generally be smaller than the standard deviation of the criterion values, but how much smaller depends on achievement. Specifically, MSE is minimized when S_s = r_a × S_e (Gigone & Hastie, 1997).
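The Gigone & Hastie result can be demonstrated with a small script: take a judge's (bias-free, centered) scores, rescale them to different standard deviations, and compare the resulting MSEs. The data are ours, chosen only for illustration.

```python
import math

def stats(xs):
    m = sum(xs) / len(xs)
    return m, math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

criterion = [75, 74, 66, 70, 64, 72]
signal    = [71, 68, 67, 73, 65, 70]   # the judge's raw scores

m_e, s_e = stats(criterion)
m_s, s_s = stats(signal)
cov = sum((a - m_s) * (b - m_e) for a, b in zip(signal, criterion)) / len(signal)
r_a = cov / (s_s * s_e)

def mse_with_sd(target_sd):
    """MSE after centering the judgments at the criterion mean (removing bias)
    and rescaling them to a chosen standard deviation."""
    scaled = [m_e + target_sd * (x - m_s) / s_s for x in signal]
    return sum((f - d) ** 2 for f, d in zip(scaled, criterion)) / len(criterion)

optimal = r_a * s_e
# mse_with_sd(optimal) is no larger than mse_with_sd at any other SD,
# e.g. 0.5 * s_e, s_e itself, or 1.5 * s_e
```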
Judgments of Continuous Events
MSE = (Ȳ_s − Ȳ_e)² + (S_s − S_e)² + 2 S_s S_e [1 − G R_s R_e − C (1 − R_s²)^½ (1 − R_e²)^½]
3) G = linear knowledge
- To have good knowledge, one needs to have diagnostic
information in the relevant domain.
- Thus, the same factors that influence discrimination
(environmental feedback) should influence knowledge.
Judgments of Continuous Events
MSE = (Ȳ_s − Ȳ_e)² + (S_s − S_e)² + 2 S_s S_e [1 − G R_s R_e − C (1 − R_s²)^½ (1 − R_e²)^½]

4) R_s = linear consistency
- Linear consistency can be low for at least two reasons:
  - applying a judgment policy inconsistently (i.e., having random error in one's judgments)
  - including irrelevant cues in one's judgments
- Thus, any interventions that target either of the above
should improve consistency.
Judgments of Continuous Events
- Goal: Investigate the effects of task information and cognitive information
feedback on each of the controllable measures in E-MSE analysis
Judgment Training: Youmans & Stone (2005)
- Basic Approach: We provided participants with a prediction task, and then
gave them feedback in terms of either task information (TI), cognitive information (CI), or both. Participants then made another set of judgments.
- Task information (TI) – Information about the policies that should be
followed to make good judgments (a type of environmental feedback).
- Cognitive information (CI) – Information about an individual‟s judgment
policy.
Judgments of Continuous Events
Design
- All participants responded twice, once during pretraining
(baseline) and once during posttraining.
- Between pretraining and posttraining participants received either: 1) No feedback, 2) TI feedback, 3) CI feedback, or 4) TI + CI feedback
Judgment Training: Youmans & Stone (2005)
Judgments of Continuous Events
The Task
- Participants judged the income level of respondents to the General Social
Survey on a scale from 1 (under $5,000) to 12 ($75,000 or over).
Judgment Training: Youmans & Stone (2005)
- Participants were told the average income level of all participants.
- To make their predictions, participants were given information about three cues:
1) education level (diagnostic cue) 2) time spent socializing with relatives (non-diagnostic cue) 3) attitude toward easy listening music (non-diagnostic cue)
Judgments of Continuous Events
Feedback
- TI Group: Participants were given the non-standardized regression weights between the cues and the criterion. Specifically, these were .52 for education, .007 for time with relatives, and −.08 for easy listening music.
Judgment Training: Youmans & Stone (2005)
- CI Group: Participants were given the non-standardized regression weights
corresponding to their predictions, i.e., their linear judgment policy
- TI + CI Group: Participants were given both pieces of information above
(i.e., the actual regression weights and their judgment policy)
Judgments of Continuous Events
Results: Standard deviation of participants’ judgments Judgment Training: Youmans & Stone (2005)
Change in S_s (standard deviation of the judge's values):

         No TI     TI
No CI    .058     −.438 **
CI       .014     −.775 **

- At pretest, the average standard deviation was 2.42 (optimal level = 1.09)
Judgments of Continuous Events
Results: Linear knowledge Judgment Training: Youmans & Stone (2005)
Change in G (linear knowledge):

         No TI     TI
No CI    .004     .006
CI       .000     .019 *

- At pretest, linear knowledge was .98.
Judgments of Continuous Events
Results: Linear consistency Judgment Training: Youmans & Stone (2005)
Change in R_s (linear consistency):

         No TI      TI
No CI    .031 **   .100 **
CI       .032 **   .097 **

- At pretest, linear consistency was .87.
Judgments of Continuous Events
Conclusions Judgment Training: Youmans & Stone (2005)
- CI alone did not produce improvements on any of the measures.
- TI alone produced improvements in the standard deviation of the
judgments and in linear consistency.
- CI, when added to TI, led to improvement in the standard deviation of the judgments and in knowledge vs. TI alone.
- These results thus provide a starting point for learning how to train
the various components in E-MSE analysis.
Judgments of Continuous Events
Limitations Judgment Training: Youmans & Stone (2005)
- Although TI (and environmental feedback more generally) is helpful for training various components, constructing this feedback is not straightforward in many applied situations.
- Thus, the development of other techniques for training specific
components would be very beneficial.
ACES
- Response to an IARPA announcement designed to improve
intelligence forecasting.
- Provides an important applied situation for testing many of the ideas
described previously.
(PI: Dirk Warnaar, Applied Research Associates)
- In particular, our approach is to provide training related to the various components. Our main focus so far has been on discrete events, but we will extend training to continuous events in the future.
- All training techniques have to be part of the ACES architecture and
thus be automated. This provides considerable challenges in adapting what we have done previously.
Automated Calibration Training
Eric Stone (WFU), Jason Luu (WFU), & Ben Simpkins (ARA)
Purpose
- Develop an automated procedure to provide the feedback given in Stone and Opel (2000)
- Determine if this feedback is successful in a forecasting task
Additional Goals
- Doing so would allow us to utilize this feedback in the ACES project
- Examine the impact on aggregated forecasts as well as on individual forecasts
Task
- Predict the winner of baseball games
- What is the probability of the home team winning?
Experimental Design
- Independent variable: Provision of calibration training (yes vs. no)
- Dependent variables: calibration, overconfidence (both individual and aggregated)
Automated Calibration Training
Example of individualized feedback – accompanying text
[Note: This text was provided below the calibration diagram, so both were simultaneously visible]
Your responses indicate substantial overconfidence: you were much too certain about which team would win. This overconfidence can be seen in results that are typically below the line for judgments greater than 50%, and/or above the line for judgments less than 50%. To increase the accuracy of your predictions, we recommend that you make less confident predictions. In particular, your responses were too extreme, especially when you stated you were certain the home team would win or lose. Specifically, as you can see on your calibration diagram, when you said you were 100% sure the home team would win, the home team won only 40% of the time. The same observation holds when you said you were very sure the home team would *not* win. When you said there was no chance the home team would win, the home team actually won 40% of the time. This was also true when you said there was a 90% or 10% chance of the home team winning, but the overconfidence was most serious when you stated you were certain the home team would win or lose. To improve your predictions, we recommend that you be more reluctant to make these types of highly confident predictions.
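An automated system like this needs a numeric trigger for overconfidence feedback. The sketch below is ours, not the ACES implementation: it scores overconfidence as mean confidence in the favored outcome minus the hit rate, a common summary index.

```python
def overconfidence(forecasts, outcomes):
    """Over/underconfidence index: mean confidence in the favored outcome
    minus the proportion of favored outcomes that actually occurred.
    Positive values indicate overconfidence."""
    conf = [max(f, 1 - f) for f in forecasts]  # confidence in the favored team
    # a forecast >= .5 counts as favoring the home team
    hits = [int((f >= 0.5) == bool(d)) for f, d in zip(forecasts, outcomes)]
    return sum(conf) / len(conf) - sum(hits) / len(hits)

# Five hypothetical home-game forecasts and outcomes
oc = overconfidence([0.9, 0.8, 1.0, 0.2, 0.1], [1, 0, 0, 0, 1])
# oc > 0 here; feedback text like the paragraph above can key off this value
```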
Future Goals
Training Discrimination
- We are presently testing information sharing procedures, which should increase
substantive knowledge and hence discrimination, at least at an individual level.
- We have discussed additional procedures for improving discrimination,
including 1) a database of relevant questions and answers for an item, and 2) providing contributors the option of asking questions about an item that can be answered by other contributors.
Improving Judgment of Continuous Quantities
- Many of the procedures designed to improve discrimination should also
increase linear knowledge.
- At the same time that we are trying to increase access to relevant information, we
also are trying to implement procedures for decreasing use of irrelevant information, which should improve linear consistency in particular.
- Due to the use of aggregation procedures in this project, however, linear
consistency may be of less relevance than other components.