Engineering PLDs and Test Content to Engineer Cut Scores


SLIDE 1

Engineering PLDs and Test Content to Engineer Cut Scores
OR Aligning Test Development with Intended Score Interpretations

Steve Ferrara. A presentation in Engineered Cut Scores: Aligning Standard Setting Methodology with Contemporary Assessment Design Principles, a session at the National Conference on Student Assessment, June 22, 2016.

SLIDE 2

 Define and illustrate test content-PLD alignment
 Principled design and validity argumentation call for this alignment
 Engineering PLDs
 Engineering test forms
 Engineering items
 The concept is not new; engineering PLDs, test forms, and items is a work in progress (WIP)

Overview


SLIDE 3

 Need a broader conception of alignment to engineer cut scores
 Proposed: Alignment is the degree to which an item's response demands are consistent with the knowledge and skill requirements described in the corresponding PLD
 Aside: A broader type of alignment is needed to provide evidence to support intended score interpretations and uses, as well

Premise

Response demands: content, cognitive, and linguistic features of items that are related to item difficulty and quality (i.e., discrimination)
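To make the premise concrete, here is a minimal sketch of an alignment check in this spirit; the demand codes, requirement ranges, and all-or-none logic are invented for illustration, not taken from the presentation.

```python
# Minimal sketch: an item is "aligned" when each of its coded response
# demands falls within the KSA requirements of the targeted PLD.
# All codes, ranges, and level names below are hypothetical.

item_demands = {
    "dok": 2,                        # skill/concept
    "relational_complexity": 5,      # facts/concepts/skills to process
    "text_complexity": "moderate",
}

level_3_requirements = {
    "dok": range(2, 4),              # skill/concept through strategic thinking
    "relational_complexity": range(4, 7),
    "text_complexity": {"moderate"},
}

def is_aligned(demands: dict, requirements: dict) -> bool:
    """True when every coded demand is consistent with the PLD requirement."""
    return all(demands[key] in requirements[key] for key in requirements)

print(is_aligned(item_demands, level_3_requirements))  # True
```

Treating alignment as all-or-none is a simplification; the proposed definition ("degree to which") also admits a graded alignment score.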


SLIDE 4

What is test content-PLD alignment?

PARCC PLDs, grade 6 Reading; http://www.parcconline.org/assessments/test-design/ela-literacy/ela-performance-level-descriptors


[Figure: item response demands (content, cognitive, and linguistic features), ordered from high to low, are aligned with the KSAs in the Level 4, Level 3, and Level 2 PLDs, and adjacent levels are articulated with one another.]

| PLD | Very complex text | Moderately complex text | Readily accessible text |
|---|---|---|---|
| Level 4: Meets Expectations for assessed content | General accuracy and understanding | General accuracy and understanding | Mostly accurate analyses, showing understanding |
| Level 3: Approaches Expectations for assessed content | Minimal accuracy and understanding | General accuracy, basic understanding | Mostly accurate analyses, showing understanding |
| Level 2: Partially Meets Expectations for assessed standards | Inaccurate analysis, limited understanding | Minimal accuracy and understanding | Partial accuracy and understanding |

SLIDE 5

Test content-PLD alignment and articulation are not automatic

[Figure: multiple instances of misalignment between item response demands and PLDs.]

Ferrara, Svetina, Skucha, & Davidson, 2011

Alignment within and across grades


SLIDE 6


Linking reading comprehension items to the Common European Framework of Reference (CEFR)

Figueras, N., Kaftandjieva, F., & Takala, S. (2013)

SLIDE 7

PRINCIPLED APPROACHES


SLIDE 8

 Several names, several conceptualizations
 Common elements, varying details (Ferrara, Lai, Reilly, & Nichols, 2016)

  • Evidence Centered Design (ECD)
  • Assessment Engineering (AE)
  • Cognitive Design Systems (CDS)
  • BEAR Assessment System
  • Principled Design for Efficacy (PDE)

Principled approaches to assessment design, development, and implementation


SLIDE 9

Principled approaches: Design (etc.) for alignment

Table 1. Foundation and Organizing Elements of Principled Approaches to Assessment Design, Development, and Implementation and Their Relationship to the Assessment Triangle

| Framework element | Assessment Triangle alignment |
|---|---|
| Organizing element: ongoing accumulation of evidence to support validity arguments | Overall evidentiary reasoning goal |
| Foundational element: clearly defined assessment targets | Cognition |
| Foundational element: statement of intended score interpretations and uses | Cognition |
| Foundational element: model of cognition, learning, or performance | Cognition |
| Foundational element: aligned measurement models and reporting scales | Interpretation |
| Foundational element: manipulation of assessment activities to align with assessment targets and intended score interpretations and uses | Observation |

From Ferrara, Lai, Reilly, & Nichols (2017). The principled approaches covered are Assessment Engineering, the BEAR Assessment System, the Cognitive Design System, Evidence Centered Design, and Principled Design for Efficacy.

SLIDE 10

ENGINEERING PLDS


SLIDE 11

Logic of inferences

[Figure: PLDs sit at the center of the design; everything else, including items, is aligned to the PLDs (Ferrara, Lai, Reilly, & Nichols, 2016, Fig. 1).]


SLIDE 12

Framework for developing PLDs

[Table: framework for developing PLDs; from Egan, Schneider, & Ferrara, 2012, Table 2.]

SLIDE 13

 Develop PLDs to guide test development, articulate standards across grades, and link to learning trajectories (Bejar, Braun, & Tannenbaum, 2006, 2007)
 Policy definitions
  • Generic, written by policy makers (Loomis & Bourque, 2001; Perie, 2008)
  • Number of levels; labels and their meaning (Beck, 2003; Burt & Stapleton, 2010; Cizek & Bunch, 2007; Egan, Schneider, & Ferrara, 2012; Perie, 2008; Zieky, Perie, & Livingston, 2008)
 Range and other PLDs
  • Explicit about content knowledge (Egan et al., 2012; Mills & Jaeger, 1998; Perie, 2008; US Department of Education, 2004)
  • Explicit about cognitive processes (Egan et al., 2012; Perie, 2008)
  • Nouns and verbs, defining phrases (Egan et al., 2012)
 Would like to find guidance on working from policy PLDs to content standards to range PLDs

Guidance on engineering PLDs


SLIDE 14


PURSUING PLD-ITEM ALIGNMENT

SLIDE 15

 Code items for response demands
 Determine which items are aligned with the KSA requirements in the corresponding PLD
 Build aligned test forms
  • Select only aligned items (see the sketch below)
 Item development
  • Develop item specifications and item writer training to improve alignment
  • Re-field-test items that are not aligned to PLDs

How can we pursue alignment?


WIP: Next study, I hope
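As a sketch of "build aligned test forms: select only aligned items," the following splits a hypothetical pool into an operational form and a re-field-test set; the item IDs echo the examples later in the deck, but the alignment flags and target levels are invented.

```python
# Hypothetical illustration of form assembly from alignment reviews.
from dataclasses import dataclass

@dataclass
class Item:
    item_id: str
    target_level: int   # PLD level the item is intended to measure
    aligned: bool       # outcome of the response-demand alignment review

pool = [
    Item("3874", 3, True),
    Item("3182", 2, False),   # misaligned: revise and re-field-test
    Item("3175", 2, True),
]

form = [item for item in pool if item.aligned]          # operational form
refield = [item for item in pool if not item.aligned]   # back to development

print([item.item_id for item in form])     # ['3874', '3175']
print([item.item_id for item in refield])  # ['3182']
```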

SLIDE 16

ENGINEERING TEST FORMS FOR ALIGNMENT: ILLUSTRATION


SLIDE 17

 Question Type
  • The cognitive task an item poses (e.g., explain, analyze)
 Depth of Knowledge
  • Recall, skill/concept, strategic thinking
 Relational Complexity
  • Number of facts, concepts, and skills to be processed
 Linguistic Complexity
  • Number of prepositional phrases, as a proxy
 Command of Textual Evidence
  • Single or multiple pieces of text
 Response Mode
  • Selected response, multiple responses, constructed response

Selected item response demand codes
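One way to operationalize the coding scheme above is a record per item; a minimal sketch, with field values echoing item 3874 on the next slide and simplified value sets.

```python
from dataclasses import dataclass

@dataclass
class ResponseDemands:
    question_type: str          # cognitive task posed, e.g., "USE", "INF"
    dok: int                    # 1 = recall, 2 = skill/concept, 3 = strategic thinking
    relational_complexity: int  # number of facts, concepts, and skills to process
    n_prepositions: int         # proxy for linguistic complexity
    textual_evidence: str       # command of textual evidence, coded "Low" here
    response_mode: str          # coded "Low" in the examples that follow

item_3874 = ResponseDemands(
    question_type="USE",
    dok=2,
    relational_complexity=5,
    n_prepositions=5,
    textual_evidence="Low",
    response_mode="Low",
)
```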


SLIDE 18

Excerpt from Of Fat, Feathers…; item 3874



SLIDE 19

Item 3874

Item response demands:
  • Question Type: USE
  • DOK level: 2 (skill/concept)
  • Relational Complexity: 5
  • Number of Prepositions: 5
  • Command of Textual Evidence: Low
  • Response Mode: Low

PLD alignment target: moderately complex text, minimal accuracy and understanding. Do correct responses to this item support this intended interpretation (or claim)?
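The preposition count above is the linguistic-complexity proxy from slide 17 and is the kind of coding that can be semi-automated. A toy sketch follows; the preposition list is deliberately incomplete and the item stem is invented, and a production coder would use a part-of-speech tagger instead.

```python
import re

# Deliberately small word list; a real coder would POS-tag the text.
PREPOSITIONS = {
    "about", "above", "across", "after", "at", "before", "between", "by",
    "during", "for", "from", "in", "into", "of", "on", "over", "through",
    "to", "under", "with", "without",
}

def count_prepositions(text: str) -> int:
    """Count tokens that appear in the preposition list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(token in PREPOSITIONS for token in tokens)

stem = "According to the passage, why do feathers keep birds warm in winter?"
print(count_prepositions(stem))  # 2 ("to" and "in")
```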


SLIDE 20

Excerpt from Turn, Turn My Wheel and item 3182

Prior knowledge is not going to help

SLIDE 21

Item 3182

Item response demands:
  • Question Type: USE and INF
  • DOK level: 3 (strategic thinking)
  • Relational Complexity: 6 or more
  • Number of Prepositions: 7
  • Command of Textual Evidence: Low-Moderate
  • Response Mode: Low

PLD alignment target: readily accessible text, partial accuracy and understanding. Do correct responses to this item support this intended interpretation (or claim)?



SLIDE 22

Excerpt from A Single Shard: item 3175

Prior knowledge of “arid” might help, but “arid bones” is figurative language.


“Arid” seems pretty clear; “cold,” “long,” and “rotten” seem unlikely. So what makes this relatively difficult?

SLIDE 23

Item 3175

Item response demands:
  • Question Type: USE or INF
  • DOK level: 2 (skill/concept)
  • Relational Complexity: 5 or more
  • Number of Prepositions: 4
  • Command of Textual Evidence: Low
  • Response Mode: Low

PLD alignment target: very complex text, inaccurate analysis and limited understanding. Do correct responses to this item support this intended interpretation (or claim)?



SLIDE 24

ENGINEERING ITEMS FOR ALIGNMENT


SLIDE 25

 Content, cognitive, and linguistic response demands frameworks
  • Select the item response demand codes most appropriate for the test and PLDs you are working with
 Empirical support
  • Reliable coding (some of it semi-automated)
  • Important predictors of difficulty and discrimination
  • R² between .15 and .30: more work needed (e.g., additional or better frameworks, learning science literature reviews, the OTL question); see the regression sketch below
 Still working out PLD-item alignment criteria
  • For now, normative criteria

Framework for engineering items
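A sketch of the kind of empirical check behind the R² figures above: regress item difficulty on coded response demands and compute R². The data here are simulated placeholders, not results from the presentation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items = 200

# Hypothetical demand codes: DOK, relational complexity, preposition count
X = np.column_stack([
    rng.integers(1, 4, n_items),
    rng.integers(1, 8, n_items),
    rng.integers(0, 10, n_items),
])

# Simulated difficulty: partly driven by demands, partly noise
true_weights = np.array([0.4, 0.15, 0.05])
difficulty = X @ true_weights + rng.normal(0.0, 1.0, n_items)

# Ordinary least squares with an intercept column
A = np.column_stack([np.ones(n_items), X])
coef, *_ = np.linalg.lstsq(A, difficulty, rcond=None)
predicted = A @ coef
ss_res = np.sum((difficulty - predicted) ** 2)
ss_tot = np.sum((difficulty - difficulty.mean()) ** 2)
print(f"R^2 = {1 - ss_res / ss_tot:.2f}")
```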


SLIDE 26


Engineering items to target PLDs

SLIDE 27


 Theoretically, once you know some things about the relationship between item response demands and item difficulty, you can:
  • Create item templates to guide development of more items with similar difficulties (a sketch follows below)
  • Engineer existing items to hit difficulty targets
 We're just getting started
  • Gierl, M. J., & Haladyna, T. M. (Eds.). (2012). Automated item generation: Theory and practice. New York: Routledge.

Engineering items (cont.)
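A minimal sketch of the item-template idea, in the spirit of automated item generation (Gierl & Haladyna, 2012): a fixed stem with variable slots, chosen so that generated items keep similar response demands and, in principle, similar difficulty. The template and fill values are invented for illustration.

```python
from itertools import product

# A fixed syntactic frame holds question type, DOK, and linguistic
# complexity roughly constant across the generated items.
TEMPLATE = ('Based on the passage, what does the word "{word}" '
            "suggest about the {subject}?")

words = ["arid", "brittle"]
subjects = ["setting", "narrator's mood"]

for word, subject in product(words, subjects):
    print(TEMPLATE.format(word=word, subject=subject))
```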

SLIDE 28


We-are-not-Alone (WANA) Alignment Model

[Figure: Infrastructure <-A-> Policy <-A-> PLDs <-A-> Test Content, spanning the local and state levels, with enactment and implementation underneath.]

  • Infrastructure (local): professional development, curriculum materials, instructional approaches
  • Policy (state): CCSS, NGSS
  • PLDs: range PLDs
  • Test content: standards targeted, item types, item response demands

A = alignment (Coburn, Hill, & Spillane, 2016). Enactment and implementation: McDonnell & Weatherford (2016).

SLIDE 29

Steve Ferrara sferrara1951@gmail.com

Thanks!


SLIDE 30

Beck, M. (2003, April). Standard setting: If it is science, it's sociology and linguistics, not psychometrics. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.

Bejar, I. I., Braun, H. I., & Tannenbaum, R. (2006). A prospective approach to standard setting. Paper presented in Assessing and modeling development in school: Intellectual growth and standard setting, October 19-20, University of Maryland, College Park.

Bejar, I. I., Braun, H. I., & Tannenbaum, R. (2007). A prospective approach to standard setting. In R. Lissitz (Ed.), Assessing and modeling cognitive development in school (pp. ***). Maple Grove, MN: JAM Press.

Burt, W. M., & Stapleton, L. M. (2010). Connotative meanings of student performance labels used in standard setting. Educational Measurement: Issues and Practice, 29(4), 28-38.

Cizek, G. J., & Bunch, M. B. (2007). Standard setting: A guide to establishing and evaluating performance standards on tests. Thousand Oaks, CA: Sage.

Coburn, C. E., Hill, H. C., & Spillane, J. P. (2016). Alignment and accountability in policy design and implementation: The Common Core State Standards and implementation research. Educational Researcher, 45(4), 243-251.

Egan, K. L., Schneider, M. C., & Ferrara, S. (2012). Performance level descriptors: History, practice, and a proposed framework. In G. J. Cizek (Ed.), Setting performance standards: Foundations, methods, and innovations (2nd ed., pp. 79-106). New York: Routledge.

Ferrara, S., Lai, E., Reilly, A., & Nichols, P. (2017, in press). Principled approaches to assessment design, development, and implementation: Cognition in score interpretation and use. In A. A. Rupp & J. P. Leighton (Eds.), The handbook of cognition and assessment: Frameworks, methodologies, and applications (pp. 41-74). Malden, MA: Wiley.

References


SLIDE 31


Ferrara, S., Svetina, D., Skucha, S., & Davidson, A. (2011). Test design with performance standards and achievement growth in mind. Educational Measurement: Issues and Practice, 30(4), 3-15.

Loomis, S. C., & Bourque, M. L. (2001). From tradition to innovation: Standard setting on the National Assessment of Educational Progress. In G. J. Cizek (Ed.), Setting performance standards: Concepts, methods, and perspectives (pp. 175-217). Mahwah, NJ: Lawrence Erlbaum Associates.

McDonnell, L. M., & Weatherford, M. S. (2016). Recognizing the political in implementation research. Educational Researcher, 45(4), 233-242.

Mills, C. N., & Jaeger, R. M. (1998). Creating descriptions of desired student achievement when setting performance standards. In L. Hansche (Ed.), Handbook for the development of performance standards: Meeting the requirements of Title I (pp. 73-85). Washington, DC: Council of Chief State School Officers.

Perie, M. (2008). A guide to understanding and developing performance-level descriptors. Educational Measurement: Issues and Practice, 27(4), 15-29.

U.S. Department of Education, Office of Elementary and Secondary Education. (2004). Standards and assessments peer review guidance: Information and examples for meeting requirements of the No Child Left Behind Act of 2001. Washington, DC: U.S. Department of Education, Office of Elementary and Secondary Education.

Zieky, M., Perie, M., & Livingston, S. (2008). Cut scores: A manual for setting performance standards on educational and occupational tests. Princeton, NJ: Educational Testing Service.

References (cont.)