

SLIDE 1

Benchmarked standard setting

Steve Ferrara, Measured Progress
June 28, 2018
Presentation at the National Conference on Student Assessment, San Diego, CA
SLIDE 2

Overview, this session

  • Background on this session
  • Benchmarked standard setting (BSS)
  • Embedded standard setting (ESS): Rob Cook, ACT
  • Excellent discussant: Mary Pitoniak, ETS
  • Comments, discussion
  • 15 minutes each segment

SLIDE 3

Background, this session

  • Principled approaches to assessment provide frameworks for designing, producing, and implementing assessments
  • A claim in state assessment:
  • Valid score interpretations are predicated on achievement level descriptors (ALDs)…
  • …and items whose response demands align with the ALD that corresponds to their location on a score scale (Ferrara, Lai, Reilly, & Nichols, 2016)
  • Item-ALD alignment (Ferrara, 2017)

SLIDE 4

What is BSS?

  • “Benchmarking performance standards requires defining policy goals for student achievement through achievement level descriptors that are based on criteria external to a test as the first step in setting standards” (Ferrara, Lewis, & D’Brot, 2017, p. 2)
  • Keys
  • Define policy goal (e.g., NAEP-like standards)
  • ALDs based on the external criterion
  • Usually, a statistical link to the external criterion

SLIDE 5

Why BSS?

  • Policy makers may use benchmarked performance standards as a tool to achieve a policy goal
  • E.g., higher performance standards than the current standards
  • Or they may want to demonstrate that a state program’s performance standards are as rigorous as an external benchmark
  • E.g., College and Career Readiness standards, national standards (e.g., NAEP), or international standards

SLIDE 6

Examples of BSSs

  • International benchmarking for NAEP (Pashley & Phillips, 1993)
  • Interpretation of NAEP scales (Phillips et al., 1993)
  • Integrate results from multiple standard settings (Green, Trimble, & Lewis, 2003)
  • Support articulation of cut scores across grades (Lewis & Haug, 2005)
  • Integrating content, performance, and other information (Haertel, 2002, 2012)
  • Evidence-based standard setting (McClarty, Way, Porter, Beimers, & Miles, 2013)
  • Smarter Balanced, PARCC
  • NAEP-like standards in MO, WV
  • NAEP grade 12 preparedness standards

All citations in Ferrara, Lewis, & D’Brot (2017)

SLIDE 7

Features of typical standard settings that are, in effect, benchmarking

  • Introduction of impact data
  • To content experts, not policy people…
  • Vertical articulation
  • To achieve a policy goal…
  • Policy adjustments after standard setting
  • To achieve political acceptance

SLIDE 8

Again, why BSS?

  • Build policy goals into the standard setting process from the beginning
  • Instead of making policy adjustments that undermine content based judgments by content experts
  • Instead of muddying the waters in a standard setting process
  • Maintain clear lines of responsibility and expertise
  • Policy makers make policy judgments on behalf of the public
  • Content experts make content based judgments

SLIDE 9

Steps in BSS process

  • Policy makers choose an external criterion that represents their policy goal
  • Content experts write ALDs that reflect the assessment’s and external criterion’s knowledge and skill demands
  • Psychometricians create a statistical link to the external criterion
  • And identify one or more benchmarked cut scores that correspond to performance standards on the criterion
  • Content experts in a BSS workshop recommend retaining or adjusting the benchmarked cut scores
  • Write content based rationales for retaining or adjusting the target cut scores
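The statistical-link step can be sketched in a few lines. The slides do not specify the linking method, so this is a minimal illustration assuming a simple mean-sigma (linear) linking; every scale statistic and score below is invented for the example.

```python
# Hedged sketch of the statistical-link step in BSS. Assumes a simple
# mean-sigma (linear) linking; all numbers are invented for illustration.

def linear_link(x, mean_from, sd_from, mean_to, sd_to):
    """Map a score from one scale to another by matching means and SDs."""
    z = (x - mean_from) / sd_from
    return mean_to + z * sd_to

# Performance standard on the external criterion's scale (assumed value).
criterion_benchmark = 430.0

# Benchmarked cut score located on the state test's scale.
benchmarked_cut = linear_link(
    criterion_benchmark,
    mean_from=450.0, sd_from=90.0,  # criterion scale stats (assumed)
    mean_to=600.0, sd_to=40.0,      # state test scale stats (assumed)
)
print(round(benchmarked_cut, 1))  # → 591.1
```

The cut located this way is only the starting point: per the steps above, content experts then retain or adjust it with content based rationales.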

SLIDE 10

BSS workshop August 2017

  • eMPower
  • Reading, Writing & Language, Mathematics, grades 3-8
  • Interim assessments (fall, winter, spring)
  • Vertical scales
  • Link to PSAT scale and CCR Benchmarks


Not intended as a validation of eMPower standards or an advertisement. A real demo.

SLIDE 11

BSS process for eMPower

  • Content experts wrote initial ALDs
  • Focus on the eMPower content standards
  • Psychometricians linked the grade 8 eMPower scale to the PSAT grade 9 scale
  • Identified the eMPower grade 8 score that corresponds to the grade 8 PSAT CCR Benchmark
  • Cascaded this Proficient standard to grades 7, 6, …, 3
  • Identified Advanced as 1 SD above Proficient
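The cascade and the 1-SD Advanced rule can be illustrated with a toy example. The slides do not state the cascade rule, so this assumes one plausible reading: each lower grade's Proficient cut sits at the same relative (z-score) position in that grade's vertical-scale distribution. All means, SDs, and cut values are invented.

```python
# Illustrative sketch of cascading a benchmarked Proficient cut down the
# grades on a vertical scale, with Advanced set 1 SD above Proficient.
# Cascade rule is an assumption; all numbers are invented.

grade8_proficient = 591.0      # assumed benchmarked grade 8 cut
grade_stats = {                # assumed vertical-scale (mean, SD) by grade
    8: (585.0, 40.0),
    7: (560.0, 42.0),
    6: (535.0, 44.0),
}

# Relative position of the grade 8 cut in the grade 8 distribution.
z = (grade8_proficient - grade_stats[8][0]) / grade_stats[8][1]

cuts = {}
for grade, (mean, sd) in sorted(grade_stats.items(), reverse=True):
    proficient = mean + z * sd
    cuts[grade] = {
        "Proficient": round(proficient, 1),
        "Advanced": round(proficient + sd, 1),  # Advanced = Proficient + 1 SD
    }

print(cuts[7])  # → {'Proficient': 566.3, 'Advanced': 608.3}
```

Keeping the z-score constant is only one way to carry the standard down the grades; a real implementation would be chosen and documented by the psychometricians.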

SLIDE 12

BSS process for eMPower

  • Content developers examined item-ALD alignment and refined ALDs
  • Standard setting panelists reviewed the benchmarked cut scores and recommended retaining or adjusting these cut scores
  • Wrote content based rationales for their recommendations

SLIDE 13

Results

  • Results relevant to the “validity” of this (and other) BSS processes
  • i.e., Does BSS work? Does it enable reasonable, supportable standards?
  • Procedural, internal, and external validity evidence (Hambleton & Pitoniak, 2006)
  • Evidence
  • Item-ALD alignment by content development expert
  • Benchmarked and final cut scores
  • Confidence
  • Coercion

SLIDE 14

Evidence: Item-ALD alignment

ItemID   OIB  rp50  Item-ALD alignment
401872   23   0.06  Identifying details...aligns to a Basic ALD.
401296   24   0.07  Identifying a central idea...Basic ALD.
401318   25   0.13  Citing (two pieces of) textual evidence...barely aligns to a Proficient ALD.
401808   26   0.16  Identify how events...Proficient ALD.
128731A  27   0.16  Citing (one piece of) textual evidence...Basic ALD...low cognitive demand and cognitive complexity. The distractors are very "tasty," which is the only reason this item seems so difficult.

Proficient benchmark = 0.106

Reading grade 7 as illustration
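If rp50 is read as an item's RP50 location — the point on the score scale where a student has a 50% chance of responding correctly — it can be computed from item parameters. The slides do not name the IRT model, so this sketch assumes a 3PL with invented parameters.

```python
# Hedged sketch of locating an item's RP50 value under an assumed 3PL
# IRT model. The slides don't state the model; parameters are invented.
import math

def p_correct(theta, a, b, c):
    """3PL item response function (with the 1.7 scaling constant)."""
    return c + (1 - c) / (1 + math.exp(-1.7 * a * (theta - b)))

def rp50(a, b, c):
    """Scale location where P(correct) = .50, in closed form (requires c < .5)."""
    # 0.5 = c + (1-c)/(1+E), with E = exp(-1.7*a*(theta-b))
    # => E = (1-c)/(0.5-c) - 1, then solve for theta.
    E = (1 - c) / (0.5 - c) - 1
    return b - math.log(E) / (1.7 * a)

# With no guessing (c = 0), RP50 equals the difficulty b; guessing pulls it lower.
loc = rp50(a=1.2, b=0.30, c=0.20)
assert abs(p_correct(loc, 1.2, 0.30, 0.20) - 0.5) < 1e-9
```

Under this reading, an item whose RP50 falls below the Proficient benchmark (0.106 above) would be expected to align to a Basic ALD, as in the first two rows of the table.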

SLIDE 15

Evidence: Benchmarked, interim, and final cut scores


Performance level        REA07_R  REA07_R 1  REA07_R 2  REA07_R 3  REA07_R 4
Advanced Cut             40       41         41         41         41
Advanced                 25.59    18.4       18.4       18.4       18.4
Proficient Cut           20       21         21         21         21
Proficient               36.14    43.33      43.33      43.33      43.33
Proficient + Advanced    61.7     61.7       61.7       61.7       61.7

Reading grade 7 as illustration
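The percentage rows in the table ("impact data") follow directly from the cut scores and the observed score distribution. A toy sketch with an invented score list, not the actual grade 7 data:

```python
# Toy sketch of deriving impact percentages (percent of students in each
# performance level) from cut scores. The score list is invented.

scores = [18, 22, 25, 30, 35, 38, 41, 44, 19, 27, 33, 42]  # assumed scores

def impact(scores, proficient_cut, advanced_cut):
    n = len(scores)
    advanced = 100 * sum(s >= advanced_cut for s in scores) / n
    proficient = 100 * sum(proficient_cut <= s < advanced_cut for s in scores) / n
    return {
        "Advanced": round(advanced, 1),
        "Proficient": round(proficient, 1),
        "Proficient + Advanced": round(advanced + proficient, 1),
    }

print(impact(scores, proficient_cut=21, advanced_cut=41))
# → {'Advanced': 25.0, 'Proficient': 58.3, 'Proficient + Advanced': 83.3}
```

Moving a cut by even one point shifts these percentages, which is why impact data are shown alongside each round of panelist recommendations.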

SLIDE 16

Evidence: Confidence


Strongly Disagree Disagree Undecided Agree Strongly Agree

I understood the goals of the standard setting workshop. 2 7
I understood the procedures we used to recommend standards. 1 8
I understood how to use the standard setting materials. 2 7

Reading 6-8 as illustration

SLIDE 17

Evidence: Coercion


Strongly Disagree Disagree Undecided Agree Strongly Agree

I understood how to think about the benchmarked cut scores. 3 6
I understood that I could retain or adjust the benchmarked Proficient and Advanced cut scores. 1 8
I understood how to write content based rationales for my recommended Proficient and Advanced cut scores on the Content Based Rationales form. 1 3 5

Not Useful at All Extremely Useful

The Achievement Level Descriptors (ALDs): Overall descriptors 1 8
My answers to the two questions about each item 1 5 3
My judgments about the match of items to ALDs 1 1 7
My experience working with students 2 6

Reading 6-8 as illustration

SLIDE 18

References

Ferrara, S. (2017, April 28). Aligning item response demands with knowledge and skill requirements in achievement level descriptors: An approach to achieving full alignment and engineering cut scores. In D. Lewis (Chair), Engineered cut scores: Aligning standard setting methodology with contemporary assessment design principles. Coordinated session conducted at the annual meeting of the National Council on Measurement in Education, San Antonio, TX.

Ferrara, S., Lai, E., Reilly, A., & Nichols, P. (2016). Principled approaches to assessment design, development, and implementation: Cognition in score interpretation and use. In A. A. Rupp & J. P. Leighton (Eds.), The handbook of cognition and assessment: Frameworks, methodologies, and applications (pp. 41-74). Malden, MA: Wiley.

Ferrara, S., Lewis, D., & D’Brot, J. (2017). Setting benchmarked performance standards: A method, procedures, and empirical results. Manuscript submitted for publication.

Hambleton, R. K., & Pitoniak, M. J. (2006). Setting performance standards. In R. L. Brennan (Ed.), Educational measurement (4th ed.). Westport, CT: American Council on Education and Praeger.

SLIDE 19

Thank you.

ferrara.steve@measuredprogress.org +1 603-749-9102, ext. 7065