On the Large Scale Assessment of Academic Achievement: The Role of Performance Assessment
Richard J. Shavelson, Stanford University
Invited Address, Congress of the German Society for Educational Research, Göttingen University, September 21, 2000
7
Students are asked to:
Part I: Examine four powders using five tests (sight, touch, water, vinegar, and iodine).
Part II: Find the content in two mystery powders based on their
11
Flashlight Sticky Towers Student Notebooks and Pencils
12
PULSE
At this station you should have
A watch
A step on the floor to climb on
Read ALL directions carefully. Your task:
Find out how your pulse changes when you climb up and down on a step for 5 minutes.
This is what you should do:
ASK A TEACHER FOR HELP
1. Make a table and write down the times at which you measured your pulse and the measurements you made.
2. How did your pulse change during the exercise?
3. Why do you think your pulse changed in this way?
15
Tasks: Comparative Investigation, Component Identification, Classification, Observation, Others
Scoring systems: Procedure-Based, Evidence-Based, Dimension-Based, Data Accuracy-Based, Rubric, Others
Rubric types: Analytic, Holistic
Example: Astronomy (CAP Assessment)
16
Declarative Knowledge (Knowing the “that”)
Procedural Knowledge (Knowing the “how”)
Strategic Knowledge (Knowing the “which,” “when,” and “why”)
Proficiency: Low to High
Extent (How much?)
Structure (How is it organized?)
Others (Precision? Efficiency? Automaticity?)
Cognitive Tools: Planning, Monitoring
Domain-specific content
Domain-specific production systems
Problem schemata/strategies/
17
Declarative Knowledge, Procedural Knowledge, Strategic Knowledge
Assessment methods: Performance Assessments, Concept Maps, Procedure Maps, Models/Mental Maps
Extent, Structure, Others
18
Declarative Knowledge Declarative vs. Procedural Knowledge
20
Standard
Science as Inquiry: “Design and Conduct a Scientific Investigation”
Construct
Declarative, Procedural, Strategic
Extent, Structure, ?
Observed Behavior
Task/Response Sampled
Domain: Force & Motion (Friction)
Define, Sample
Generalizable to Other Tasks in the Domain?
21
A score assigned to a student is but one possible sample from a large domain of possible scores that the student might have received if a different sample of assessment tasks were included, if different judges evaluated performance, and the like...
Is a score assigned generalizable, for example, across:
Validity Reliability
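The question on this slide can be written in generalizability-theory notation. As a sketch, assuming the fully crossed person (p) x rater (r) x task (t) x occasion (o) design named in Table 1, the variance of an observed score splits into a person component plus one component for each facet and interaction. The symbol X_{prto} for a single observed score is notation introduced here for illustration and does not appear on the slides.

```latex
\begin{aligned}
\sigma^2(X_{prto}) ={}& \sigma^2_{p} + \sigma^2_{r} + \sigma^2_{t} + \sigma^2_{o} \\
 &+ \sigma^2_{pr} + \sigma^2_{pt} + \sigma^2_{po}
  + \sigma^2_{rt} + \sigma^2_{ro} + \sigma^2_{to} \\
 &+ \sigma^2_{prt} + \sigma^2_{pro} + \sigma^2_{pto} + \sigma^2_{rto}
  + \sigma^2_{prto,e}
\end{aligned}
```

A score generalizes well across raters, tasks, and occasions when the person component is large relative to the components that involve those facets.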
23
Table 1 Variance Component Estimates for the Person x Rater x Task x Occasion G Study Using the Science Data (from Shavelson, Baxter & Gao, 1993)

Source of Variability    n    Variance Component    Percent of Total Variability
Person (p)              26    0.07                   4
Rater (r)                2    0.00a
Task (t)                 2    0.00a
Occasion (o)             2    0.01                   1
pr                            0.01                   1
pt                            0.63                  32
po                            0.00a
rt                            0.00
ro                            0.00
to                            0.00a
prt                           0.00a
pro                           0.01
pto                           1.16                  59
rto                           0.00a
prto,e                        0.08                   4
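To make the table easier to read, the sketch below recomputes each source's share of total variability and a relative generalizability coefficient from the Table 1 estimates. It assumes the standard G-theory formula for a fully crossed p x r x t x o design, in which every person-by-facet interaction is divided by the number of conditions sampled for its facets. The function names and the decision-study sample sizes (`n_r`, `n_t`, `n_o`) are hypothetical and are not taken from the slides; entries marked 0.00a in the table are entered as 0.00.

```python
# Variance component estimates transcribed from Table 1
# (person x rater x task x occasion G study).
# Entries shown as 0.00a in the table are entered here as 0.00.
components = {
    "p": 0.07, "r": 0.00, "t": 0.00, "o": 0.01,
    "pr": 0.01, "pt": 0.63, "po": 0.00,
    "rt": 0.00, "ro": 0.00, "to": 0.00,
    "prt": 0.00, "pro": 0.01, "pto": 1.16, "rto": 0.00,
    "prto,e": 0.08,
}


def percent_of_total(components):
    """Return each component as a percentage of the summed components."""
    total = sum(components.values())
    return {source: 100.0 * var / total for source, var in components.items()}


def relative_g_coefficient(components, n_r=1, n_t=1, n_o=1):
    """Relative G coefficient for a hypothetical decision study.

    Relative error is the sum of every interaction that involves persons,
    each divided by the product of the sample sizes of its other facets.
    """
    sizes = {"r": n_r, "t": n_t, "o": n_o}
    relative_error = 0.0
    for source, var in components.items():
        facets = source.split(",")[0]  # "prto,e" -> "prto"
        if facets.startswith("p") and len(facets) > 1:
            divisor = 1
            for facet in facets[1:]:
                divisor *= sizes[facet]
            relative_error += var / divisor
    person_var = components["p"]
    return person_var / (person_var + relative_error)


if __name__ == "__main__":
    for source, pct in percent_of_total(components).items():
        print(f"{source:8s} {pct:5.1f}%")
    print(relative_g_coefficient(components))
    print(relative_g_coefficient(components, n_t=10, n_o=2))
```

Under these assumptions, a single rater, task, and occasion gives a coefficient of roughly 0.04, and sampling ten tasks on two occasions raises it to about 0.33. Adding raters changes the result very little, because the rater-related components in Table 1 are near zero while the pt and pto components dominate.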
24
H1, H2, C1, C2
rH1H2 = .53
rC1C2 = ?
rH1C1 = .52
rH2C2 = ?
rH1C2 = ?
rC1H2 = .45
25
ISSUE STUDY FOCUS FINDINGS
different tasks.
interaction
raters
domain expertise
assessments to the curriculum characteristics
across tasks. Task sampling variability is large at both individual and school level.
reliably estimate performance.
across occasions even though they receive about the same scores.
variability.
generally higher than .80. However, coefficients lower than .70 have been
important disagreements among raters.
limited due to volatility in students’ performance across occasions.
sensitive than proximal-curriculum assessments to changes in students’ performance.
26
– Task x Scoring System Classification (Shavelson, Ruiz-Primo, Baxter)
– Content x Process Characterization (Baxter & Glaser)
– Basic, Quantitative and Spatial Reasoning (Ayala, Shavelson, & Ruiz-Primo)