  1. Benchmarked standard setting. Steve Ferrara, Measured Progress. June 28, 2018. Presentation at the National Conference on Student Assessment, San Diego, CA.

  2. Overview, this session
     • Background on this session
     • Benchmarked standard setting (BSS)
     • Embedded standard setting (ESS): Rob Cook, ACT
     • Excellent discussant: Mary Pitoniak, ETS
     • Comments, discussion
     • 15 minutes each segment

  3. Background, this session
     • Principled approaches to assessment provide frameworks for designing, producing, and implementing assessments
     • A claim in state assessment:
        • Valid score interpretations are predicated on achievement level descriptors (ALDs)…
        • …and items whose response demands align with the ALD that corresponds to their location on a score scale (Ferrara, Lai, Reilly, & Nichols, 2016)
     • Item-ALD alignment (Ferrara, 2017)

  4. What is BSS?
     • “Benchmarking performance standards requires defining policy goals for student achievement through achievement level descriptors that are based on criteria external to a test as the first step in setting standards” (Ferrara, Lewis, & D’Brot, 2017, p. 2)
     • Keys:
        • Define the policy goal (e.g., NAEP-like standards)
        • ALDs based on the external criterion
        • Usually, a statistical link to the external criterion

  5. Why BSS?
     • Policy makers may use benchmarked performance standards as a tool to achieve a policy goal
        • E.g., higher performance standards than the current standards
     • Or they may want to demonstrate that a state program’s performance standards are as rigorous as an external benchmark
        • E.g., College and Career Readiness standards, national standards (e.g., NAEP), or international standards

  6. Examples of BSSs
     • International benchmarking for NAEP (Pashley & Philips, 1993)
     • Interpretation of NAEP scales (Philips et al., 1993)
     • Integrating results from multiple standard settings (Green, Trimble, & Lewis, 2003)
     • Supporting articulation of cut scores across grades (Lewis & Haug, 2005)
     • Integrating content, performance, and other information (Haertel, 2002, 2012)
     • Evidence based standard setting (McClarty, Way, Porter, Beimers, & Miles, 2013)
     • Smarter Balanced, PARCC
     • NAEP-like standards in MO, WV
     • NAEP grade 12 preparedness standards
     All citations appear in Ferrara, Lewis, & D’Brot (2017)

  7. Features of typical standard settings that amount to benchmarking
     • Introduction of impact data: to content experts, not policy people…
     • Vertical articulation: to achieve a policy goal…
     • Policy adjustments after standard setting: to achieve political acceptance

  8. Again, why BSS?
     • Build policy goals into the standard setting process from the beginning
        • Instead of making policy adjustments that undermine content based judgments by content experts
        • Instead of muddying the waters in a standard setting process
     • Maintain clear lines of responsibility and expertise
        • Policy makers make policy judgments on behalf of the public
        • Content experts make content based judgments

  9. Steps in the BSS process
     • Policy makers choose an external criterion that represents their policy goal
     • Content experts write ALDs that reflect the assessment’s and the external criterion’s knowledge and skill demands
     • Psychometricians create a statistical link to the external criterion (a linking sketch follows this list)…
        • …and identify one or more benchmarked cut scores that correspond to performance standards on the criterion
     • Content experts in a BSS workshop recommend retaining or adjusting the benchmarked cut scores…
        • …and write content based rationales for retaining or adjusting the target cut scores
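The statistical-link step can be implemented several ways; the minimal sketch below shows one common choice, mean/sigma linear linking based on a group with scores on both tests. Everything in this example is invented for illustration (the simulated scores, the scale constants, and the criterion cut of 460); it is not the procedure or data from this presentation.

```python
import numpy as np

# Simulated scores for a group with results on both tests (illustrative only).
rng = np.random.default_rng(0)
state = rng.normal(500, 50, 2000)                        # state-test scale scores
criterion = 0.8 * state + 120 + rng.normal(0, 20, 2000)  # external criterion scores

# Mean/sigma linear linking: criterion ~= A * state + B
A = criterion.std(ddof=1) / state.std(ddof=1)
B = criterion.mean() - A * state.mean()

# Map the criterion's performance standard back onto the state scale;
# the result is the benchmarked cut score that panelists later review.
criterion_cut = 460                                      # hypothetical CCR benchmark
benchmarked_cut = (criterion_cut - B) / A
print(f"Benchmarked cut on the state scale: {benchmarked_cut:.1f}")
```

Other linking designs (equipercentile linking, concordance tables) plug into the same step; in every case the output is a benchmarked cut on the state scale for panelists to review.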

  10. BSS workshop, August 2017
     • eMPower
        • Reading, Writing & Language, and Mathematics, grades 3-8
        • Interim assessments (fall, winter, spring)
        • Vertical scales
        • Link to the PSAT scale and CCR Benchmarks
     • Not intended as a validation of eMPower standards or an advertisement; a real demo

  11. BSS process for eMPower
     • Content experts wrote initial ALDs
        • Focus on the eMPower content standards
     • Psychometricians linked the grade 8 eMPower scale to the PSAT grade 9 scale
        • Identified the eMPower grade 8 score that corresponds to the grade 8 PSAT CCR Benchmark
        • Cascaded this Proficient standard to grades 7, 6, …, 3 (see the cascading sketch below)
        • Identified Advanced as 1 SD above Proficient
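The deck does not spell out the cascading rule, so this sketch assumes one simple possibility: hold the grade 8 cut's normative position (z-score) constant on the vertical scale at each lower grade, then set Advanced 1 SD above Proficient as the slide describes. All means, SDs, and the grade 8 cut below are hypothetical, not eMPower values.

```python
# Hypothetical vertical-scale means and SDs by grade (not eMPower values).
grade_stats = {8: (680, 40), 7: (660, 40), 6: (640, 42),
               5: (615, 44), 4: (590, 45), 3: (560, 46)}

g8_proficient = 700  # benchmarked grade 8 Proficient cut from the PSAT link (invented)

# Assumed cascading rule: keep the cut's z-score position fixed across grades.
mean8, sd8 = grade_stats[8]
z = (g8_proficient - mean8) / sd8

for grade in sorted(grade_stats, reverse=True):
    mean, sd = grade_stats[grade]
    proficient = mean + z * sd
    advanced = proficient + sd  # Advanced identified as 1 SD above Proficient
    print(f"Grade {grade}: Proficient {proficient:.0f}, Advanced {advanced:.0f}")
```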

  12. BSS process for eMPower
     • Content developers examined item-ALD alignment and refined the ALDs
     • Standard setting panelists reviewed the benchmarked cut scores and recommended retaining or adjusting these cut scores
        • Wrote content based rationales for their recommendations

  13. Results
     • Results relevant to the “validity” of this (and other) BSS processes
        • I.e., does BSS work? Does it enable reasonable, supportable standards?
     • Procedural, internal, and external validity evidence (Hambleton & Pitoniak, 2006)
     • Evidence:
        • Item-ALD alignment by a content development expert
        • Benchmarked and final cut scores
        • Confidence
        • Coercion

  14. Evidence: Item-ALD alignment

      ItemID   OIB  rp50  Item-ALD alignment
      401872   23   0.06  Identifying details... aligns to a Basic ALD.
      401296   24   0.07  Identifying a central idea... Basic ALD.
      401318   25   0.13  Citing (two pieces of) textual evidence... barely aligns to a Proficient ALD.
      401808   26   0.16  Identify how events... Proficient ALD.
      128731A  27   0.16  Citing (one piece of) textual evidence... Basic ALD... low cognitive demand and cognitive complexity. The distractors are very “tasty,” which is the only reason this item seems so difficult.

      Proficient benchmark = 0.106. Reading grade 7 as illustration.
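As a concrete reading of the table's logic, the sketch below compares each item's rp50 location with the Proficient benchmark (0.106) to get the item's statistical level, then flags items whose content-based ALD alignment disagrees. The item data come from the slide; the comparison logic is a plausible reconstruction, not the presenter's documented procedure.

```python
# Item data from the slide; the comparison logic is a plausible reconstruction.
PROFICIENT_BENCHMARK = 0.106

items = {  # item_id: (rp50 location, content-based ALD alignment)
    "401872": (0.06, "Basic"),
    "401296": (0.07, "Basic"),
    "401318": (0.13, "Proficient"),
    "401808": (0.16, "Proficient"),
    "128731A": (0.16, "Basic"),
}

for item_id, (rp50, content_ald) in items.items():
    statistical = "Proficient" if rp50 >= PROFICIENT_BENCHMARK else "Basic"
    flag = "" if statistical == content_ald else "  <-- content/statistics disagree"
    print(f"{item_id}: rp50={rp50:.2f}, statistical level={statistical}, "
          f"content ALD={content_ald}{flag}")
```

Running this flags only 128731A, matching the slide's commentary: the item sits at a Proficient location statistically, but its response demands align with a Basic ALD.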

  15. Evidence: Benchmarked, interim, and final cut scores

      Performance level          REA07_R0  REA07_R1  REA07_R2  REA07_R3  REA07_R4
      Advanced cut                  40        41        41        41        41
      Advanced (%)                  25.59     18.4      18.4      18.4      18.4
      Proficient cut                20        21        21        21        21
      Proficient (%)                36.14     43.33     43.33     43.33     43.33
      Proficient + Advanced (%)     61.7      61.7      61.7      61.7      61.7

      Reading grade 7 as illustration.
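The Advanced, Proficient, and Proficient + Advanced rows are impact data: the percentages of students each set of cuts places in each level. The sketch below shows how such percentages are computed; the score distribution is simulated, and only the round-1 cuts (21 and 41) are taken from the table.

```python
import numpy as np

# Simulated score distribution (not eMPower data).
rng = np.random.default_rng(1)
scores = rng.normal(30, 8, 10_000)

proficient_cut, advanced_cut = 21, 41  # round-1 grade 7 reading cuts from the table

pct_advanced = np.mean(scores >= advanced_cut) * 100
pct_proficient = np.mean((scores >= proficient_cut) & (scores < advanced_cut)) * 100
print(f"Proficient: {pct_proficient:.1f}%, Advanced: {pct_advanced:.1f}%, "
      f"Proficient + Advanced: {pct_proficient + pct_advanced:.1f}%")
```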

  16. Evidence: Confidence

      (Counts of panelist responses: SD = strongly disagree, D = disagree, U = undecided, A = agree, SA = strongly agree)

      Statement                                                     SD  D  U  A  SA
      I understood the goals of the standard setting workshop.       0  0  0  2   7
      I understood the procedures we used to recommend standards.    0  0  0  1   8
      I understood how to use the standard setting materials.        0  0  0  2   7

      Reading 6-8 as illustration.

  17. Evidence: Coercion

      (Counts of panelist responses: SD = strongly disagree, D = disagree, U = undecided, A = agree, SA = strongly agree)

      Statement                                                                   SD  D  U  A  SA
      I understood how to think about the benchmarked cut scores.                  0  0  0  3   6
      I understood that I could retain or adjust the benchmarked Proficient
        and Advanced cut scores.                                                   0  0  0  1   8
      I understood how to write content based rationales for my recommended
        Proficient and Advanced cut scores on the Content Based Rationales form.   0  1  0  3   5

      (Counts on a usefulness scale from Not Useful at All (1) to Extremely Useful (5))

      Resource                                                                     1  2  3  4  5
      The Achievement Level Descriptors (ALDs): overall descriptors                0  0  1  0  8
      My answers to the two questions about each item                              0  0  1  5  3
      My judgments about the match of items to ALDs                                0  0  1  1  7
      My experience working with students                                          0  0  0  2  6

      Reading 6-8 as illustration.

  18. References
      Ferrara, S. (2017, April 28). Aligning item response demands with knowledge and skill requirements in achievement level descriptors: An approach to achieving full alignment and engineering cut scores. In D. Lewis (Chair), Engineered cut scores: Aligning standard setting methodology with contemporary assessment design principles. Coordinated session conducted at the annual meeting of the National Council on Measurement in Education, San Antonio, TX.
      Ferrara, S., Lai, E., Reilly, A., & Nichols, P. (2016). Principled approaches to assessment design, development, and implementation: Cognition in score interpretation and use. In A. A. Rupp & J. P. Leighton (Eds.), The handbook of cognition and assessment: Frameworks, methodologies, and applications (pp. 41-74). Malden, MA: Wiley.
      Ferrara, S., Lewis, D., & D’Brot, J. (2017). Setting benchmarked performance standards: A method, procedures, and empirical results. Manuscript submitted for publication.
      Hambleton, R. K., & Pitoniak, M. J. (2006). Setting performance standards. In R. L. Brennan (Ed.), Educational measurement (4th ed.). Westport, CT: American Council on Education/Praeger.

  19. Thank you. ferrara.steve@measuredprogress.org +1 603-749-9102, ext. 7065
