

SLIDE 1

Benchmarked standard setting

Steve Ferrara, Measured Progress
June 28, 2018
Presentation at the National Conference on Student Assessment, San Diego, CA
SLIDE 2

Overview, this session

  • Background on this session
  • Benchmarked standard setting (BSS)
  • Embedded standard setting (ESS): Rob Cook, ACT
  • Excellent discussant: Mary Pitoniak, ETS
  • Comments, discussion
  • 15 minutes each segment

SLIDE 3

Background, this session

  • Principled approaches to assessment provide frameworks for designing, producing, and implementing assessments
  • A claim in state assessment:
  • Valid score interpretations are predicated on achievement level descriptors (ALDs)…
  • …and items whose response demands align with the ALD that corresponds to their location on a score scale (Ferrara, Lai, Reilly, & Nichols, 2016)
  • Item-ALD alignment (Ferrara, 2017)

SLIDE 4

What is BSS?

  • “Benchmarking performance standards requires defining policy goals for student achievement through achievement level descriptors that are based on criteria external to a test as the first step in setting standards” (Ferrara, Lewis, & D’Brot, 2017, p. 2)
  • Keys
  • Define policy goal (e.g., NAEP-like standards)
  • ALDs based on the external criterion
  • Usually, a statistical link to the external criterion

SLIDE 5

Why BSS?

  • Policy makers may use benchmarked performance standards as a tool to achieve a policy goal
  • E.g., higher performance standards than the current standards
  • Or they may want to demonstrate that a state program’s performance standards are as rigorous as an external benchmark
  • E.g., College and Career Readiness standards, national standards (e.g., NAEP), or international standards

SLIDE 6

Examples of BSSs

  • International benchmarking for NAEP (Pashley & Phillips, 1993)
  • Interpretation of NAEP scales (Phillips et al., 1993)
  • Integrate results from multiple standard settings (Green, Trimble, & Lewis, 2003)
  • Support articulation of cut scores across grades (Lewis & Haug, 2005)
  • Integrating content, performance, and other information (Haertel, 2002, 2012)
  • Evidence-based standard setting (McClarty, Way, Porter, Beimers, & Miles, 2013)
  • Smarter Balanced, PARCC
  • NAEP-like standards in MO, WV
  • NAEP grade 12 preparedness standards

All citations in Ferrara, Lewis, & D’Brot (2017)

SLIDE 7

Features of typical standard settings that are, in effect, benchmarking

  • Introduction of impact data
  • To content experts, not policy people…
  • Vertical articulation
  • To achieve a policy goal…
  • Policy adjustments after standard setting
  • To achieve political acceptance

SLIDE 8

Again, why BSS?

  • Build policy goals into the standard setting process from the beginning
  • Instead of making policy adjustments that undermine content based judgments by content experts
  • Instead of muddying the waters in a standard setting process
  • Maintain clear lines of responsibility and expertise
  • Policy makers make policy judgments on behalf of the public
  • Content experts make content based judgments

SLIDE 9

Steps in BSS process

  • Policy makers choose an external criterion that represents their policy goal
  • Content experts write ALDs that reflect the assessment’s and external criterion’s knowledge and skill demands
  • Psychometricians create a statistical link to the external criterion
  • And identify one or more benchmarked cut scores that correspond to performance standards on the criterion
  • Content experts in a BSS workshop recommend retaining or adjusting the benchmarked cut scores
  • Write content based rationales for retaining or adjusting the target cut scores
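The statistical-link step can be sketched in a few lines. The slides do not specify the linking method, so this is a minimal illustration assuming a simple mean-sigma (linear) linking; every scale statistic and score below is invented for the example.

```python
# Hedged sketch of the statistical-link step in BSS. Assumes a simple
# mean-sigma (linear) linking; all numbers are invented for illustration.

def linear_link(x, mean_from, sd_from, mean_to, sd_to):
    """Map a score from one scale to another by matching means and SDs."""
    z = (x - mean_from) / sd_from
    return mean_to + z * sd_to

# Performance standard on the external criterion's scale (assumed value).
criterion_benchmark = 430.0

# Benchmarked cut score located on the state test's scale.
benchmarked_cut = linear_link(
    criterion_benchmark,
    mean_from=450.0, sd_from=90.0,  # criterion scale stats (assumed)
    mean_to=600.0, sd_to=40.0,      # state test scale stats (assumed)
)
print(round(benchmarked_cut, 1))  # → 591.1
```

The cut located this way is only the starting point: per the steps above, content experts then retain or adjust it with content based rationales.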

SLIDE 10

BSS workshop August 2017

  • eMPower
  • Reading, Writing & Language, Mathematics, grades 3-8
  • Interim assessments (fall, winter, spring)
  • Vertical scales
  • Link to PSAT scale and CCR Benchmarks


Not intended as a validation of eMPower standards or an advertisement. A real demo.

SLIDE 11

BSS process for eMPower

  • Content experts wrote initial ALDs
  • Focus on the eMPower content standards
  • Psychometricians linked the grade 8 eMPower scale to the PSAT grade 9 scale
  • Identified the eMPower grade 8 score that corresponds to the grade 8 PSAT CCR Benchmark
  • Cascaded this Proficient standard to grades 7, 6, …, 3
  • Identified Advanced as 1 SD above Proficient
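The cascade and the 1-SD Advanced rule can be illustrated with a toy example. The slides do not state the cascade rule, so this assumes one plausible reading: each lower grade's Proficient cut sits at the same relative (z-score) position in that grade's vertical-scale distribution. All means, SDs, and cut values are invented.

```python
# Illustrative sketch of cascading a benchmarked Proficient cut down the
# grades on a vertical scale, with Advanced set 1 SD above Proficient.
# Cascade rule is an assumption; all numbers are invented.

grade8_proficient = 591.0      # assumed benchmarked grade 8 cut
grade_stats = {                # assumed vertical-scale (mean, SD) by grade
    8: (585.0, 40.0),
    7: (560.0, 42.0),
    6: (535.0, 44.0),
}

# Relative position of the grade 8 cut in the grade 8 distribution.
z = (grade8_proficient - grade_stats[8][0]) / grade_stats[8][1]

cuts = {}
for grade, (mean, sd) in sorted(grade_stats.items(), reverse=True):
    proficient = mean + z * sd
    cuts[grade] = {
        "Proficient": round(proficient, 1),
        "Advanced": round(proficient + sd, 1),  # Advanced = Proficient + 1 SD
    }

print(cuts[7])  # → {'Proficient': 566.3, 'Advanced': 608.3}
```

Keeping the z-score constant is only one way to carry the standard down the grades; a real implementation would be chosen and documented by the psychometricians.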

SLIDE 12

BSS process for eMPower

  • Content developers examined item-ALD alignment and refined ALDs
  • Standard setting panelists reviewed the benchmarked cut scores and recommended retaining or adjusting these cut scores
  • Wrote content based rationales for their recommendations

SLIDE 13

Results

  • Results relevant to the “validity” of this (and other) BSS processes
  • i.e., Does BSS work? Does it enable reasonable, supportable standards?
  • Procedural, internal, and external validity evidence (Hambleton & Pitoniak, 2006)
  • Evidence
  • Item-ALD alignment by content development expert
  • Benchmarked and final cut scores
  • Confidence
  • Coercion

SLIDE 14

Evidence: Item-ALD alignment

ItemID   OIB  rp50  Item-ALD alignment
401872   23   0.06  Identifying details...aligns to a Basic ALD.
401296   24   0.07  Identifying a central idea...Basic ALD.
401318   25   0.13  Citing (two pieces of) textual evidence...barely aligns to a Proficient ALD.
401808   26   0.16  Identify how events...Proficient ALD.
128731A  27   0.16  Citing (one piece of) textual evidence...Basic ALD...low cognitive demand and cognitive complexity. The distractors are very "tasty," which is the only reason this item seems so difficult.

Proficient benchmark = 0.106

Reading grade 7 as illustration
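If rp50 is read as an item's RP50 location — the point on the score scale where a student has a 50% chance of responding correctly — it can be computed from item parameters. The slides do not name the IRT model, so this sketch assumes a 3PL with invented parameters.

```python
# Hedged sketch of locating an item's RP50 value under an assumed 3PL
# IRT model. The slides don't state the model; parameters are invented.
import math

def p_correct(theta, a, b, c):
    """3PL item response function (with the 1.7 scaling constant)."""
    return c + (1 - c) / (1 + math.exp(-1.7 * a * (theta - b)))

def rp50(a, b, c):
    """Scale location where P(correct) = .50, in closed form (requires c < .5)."""
    # 0.5 = c + (1-c)/(1+E), with E = exp(-1.7*a*(theta-b))
    # => E = (1-c)/(0.5-c) - 1, then solve for theta.
    E = (1 - c) / (0.5 - c) - 1
    return b - math.log(E) / (1.7 * a)

# With no guessing (c = 0), RP50 equals the difficulty b; guessing pulls it lower.
loc = rp50(a=1.2, b=0.30, c=0.20)
assert abs(p_correct(loc, 1.2, 0.30, 0.20) - 0.5) < 1e-9
```

Under this reading, an item whose RP50 falls below the Proficient benchmark (0.106 above) would be expected to align to a Basic ALD, as in the first two rows of the table.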

SLIDE 15

Evidence: Benchmarked, interim, and final cut scores


Performance level        REA07_R  REA07_R 1  REA07_R 2  REA07_R 3  REA07_R 4
Advanced Cut             40       41         41         41         41
Advanced                 25.59    18.4       18.4       18.4       18.4
Proficient Cut           20       21         21         21         21
Proficient               36.14    43.33      43.33      43.33      43.33
Proficient + Advanced    61.7     61.7       61.7       61.7       61.7

Reading grade 7 as illustration
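The percentage rows in the table ("impact data") follow directly from the cut scores and the observed score distribution. A toy sketch with an invented score list, not the actual grade 7 data:

```python
# Toy sketch of deriving impact percentages (percent of students in each
# performance level) from cut scores. The score list is invented.

scores = [18, 22, 25, 30, 35, 38, 41, 44, 19, 27, 33, 42]  # assumed scores

def impact(scores, proficient_cut, advanced_cut):
    n = len(scores)
    advanced = 100 * sum(s >= advanced_cut for s in scores) / n
    proficient = 100 * sum(proficient_cut <= s < advanced_cut for s in scores) / n
    return {
        "Advanced": round(advanced, 1),
        "Proficient": round(proficient, 1),
        "Proficient + Advanced": round(advanced + proficient, 1),
    }

print(impact(scores, proficient_cut=21, advanced_cut=41))
# → {'Advanced': 25.0, 'Proficient': 58.3, 'Proficient + Advanced': 83.3}
```

Moving a cut by even one point shifts these percentages, which is why impact data are shown alongside each round of panelist recommendations.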

SLIDE 16

Evidence: Confidence


Strongly Disagree Disagree Undecided Agree Strongly Agree

I understood the goals of the standard setting workshop. 2 7
I understood the procedures we used to recommend standards. 1 8
I understood how to use the standard setting materials. 2 7

Reading 6-8 as illustration

SLIDE 17

Evidence: Coercion


Strongly Disagree Disagree Undecided Agree Strongly Agree

I understood how to think about the benchmarked cut scores. 3 6
I understood that I could retain or adjust the benchmarked Proficient and Advanced cut scores. 1 8
I understood how to write content based rationales for my recommended Proficient and Advanced cut scores on the Content Based Rationales form. 1 3 5

Not Useful at All Extremely Useful

The Achievement Level Descriptors (ALDs): Overall descriptors 1 8
My answers to the two questions about each item 1 5 3
My judgments about the match of items to ALDs 1 1 7
My experience working with students 2 6

Reading 6-8 as illustration

SLIDE 18

References

Ferrara, S. (2017, April 28). Aligning item response demands with knowledge and skill requirements in achievement level descriptors: An approach to achieving full alignment and engineering cut scores. In D. Lewis (Chair), Engineered cut scores: Aligning standard setting methodology with contemporary assessment design principles. Coordinated session conducted at the annual meeting of the National Council on Measurement in Education, San Antonio, TX.

Ferrara, S., Lai, E., Reilly, A., & Nichols, P. (2016). Principled approaches to assessment design, development, and implementation: Cognition in score interpretation and use. In A. A. Rupp & J. P. Leighton (Eds.), The handbook of cognition and assessment: Frameworks, methodologies, and applications (pp. 41-74). Malden, MA: Wiley.

Ferrara, S., Lewis, D., & D’Brot, J. (2017). Setting benchmarked performance standards: A method, procedures, and empirical results. Manuscript submitted for publication.

Hambleton, R. K., & Pitoniak, M. J. (2006). Setting performance standards. In R. L. Brennan (Ed.), Educational measurement (4th ed.). Westport, CT: American Council on Education and Praeger.

SLIDE 19

Thank you.

ferrara.steve@measuredprogress.org +1 603-749-9102, ext. 7065