SLIDE 1

Comparability Evaluation Options for the Innovative Assessment and Accountability Demonstration Authority

Susan Lyons & Scott Marion

Center for Assessment

CCSSO’s NCSA 2017

June 28, 2017

SLIDE 2

Project Goals

1. Articulate a framework for comparability for the Demonstration Authority under ESSA
2. Expand the comparability options in the draft regulations
3. Support states in planning innovative assessment pilots

Thank you to the William and Flora Hewlett Foundation for funding this work.

2 Lyons & Marion_Comparability Options for the Innovative Pilot_July 28, 2017

SLIDE 3

Innovative Assessment and Accountability

  • Allows for a pilot for up to seven (7) states to use competency-based or other innovative assessment approaches in making accountability determinations
  • Initial demonstration period of three (3) years, with a two (2) year extension based on a satisfactory report from the director of the Institute of Education Sciences (IES), plus a potential 2-year waiver
  • Rigorous assessment, participation, and reporting requirements, subject to a peer review process
  • May be used with a subset of districts based on strict “guardrails,” with a plan to move statewide by the end of the extension

SLIDE 4
Innovative Assessment and Accountability

What does “innovative” mean?

May Pilot in a Subset of Districts
  • Approved states may pilot with a subset of districts before scaling the system statewide by the end of the Demonstration Authority.

Can Be Entirely Performance-Based
  • Approved states may design an assessment or system of assessments that consists entirely of performance tasks, portfolios, or extended learning tasks.

Can Administer When Students Are Ready
  • Approved states may assess students when they are ready to demonstrate mastery of standards and competencies, as applicable, so long as states can also report grade-level information.

SLIDE 5

Purpose of ESEA

“From the beginning, Title I of ESEA included assessment and accountability requirements as a safeguard to ensure that the federal money being allocated to programs to improve the achievement of the disadvantaged was being spent wisely.” (DePascale, 2015)

  • The purpose of ESEA accountability is to ensure that public tax dollars are resulting in improved educational programming and the intended student outcomes related to achievement and equity (Bailey & Mosher, 1968).

Page 5 • Lyons • NCME 2017

SLIDE 6

Why Should We Care About Comparability?

  • 1. Fairness: States must use assessment results from the pilot districts in the state accountability system.
  • 2. Equity in Opportunity to Learn: Ensure that the pilot districts are not getting a “hall pass”; all students are held to the same expectations.

SLIDE 7

Too Narrow a Focus on Comparability

A narrow focus on pilot to non-pilot comparability misses the bigger picture in two important ways:
  • by failing to address additional, and potentially more important, comparability questions, and
  • by potentially inhibiting innovation.

SLIDE 8

Building an Evidence Base for Score Comparability

  • Scoring calibration sessions, external audits of inter-rater reliability, audits of the generalizability of the local scores, and reviews of local assessment quality and alignment.
  • Social moderation comparability audits on common and local tasks, standard setting, and validating pilot performance standards with samples of student work.
  • Common achievement level descriptors and common assessments in select grades/subjects.

[Diagram: Comparable Annual Determinations supported by Pilot Results (District A and District B results, each with within-district results) and Non-pilot Results; the pilot to non-pilot comparison is the focus of the regulations.]

SLIDE 9

Threat to Real Innovation

Legitimate reasons for non-comparability:

  • 1. To measure the state-defined learning targets more efficiently (e.g., reduced testing time);
  • 2. To measure the learning targets more flexibly (e.g., when students are ready to demonstrate “mastery”);
  • 3. To measure the learning targets more deeply; or
  • 4. To measure targets more completely (e.g., listening, speaking, extended research, scientific investigations).

SLIDE 10

Threat to Real Innovation

Legitimate reasons for non-comparability:

  • 1. To measure the state-defined learning targets more efficiently (e.g., reduced testing time);
  • 2. To measure the learning targets more flexibly (e.g., when students are ready to demonstrate “mastery”);
  • 3. To measure the learning targets more deeply; or
  • 4. To measure targets more completely (e.g., listening, speaking, extended research, scientific investigations).

“Perfect agreement would be an indication of failure.” – Dr. Robert Brennan

SLIDE 11

Comparability by Design

  • How does the design of the innovative assessment system yield evidence to support comparability claims?
  • How will the state evaluate the degree of comparability achieved across differing assessment conditions?
  • If comparability is not achieved, how will the state adjust the classification scale to account for systematic differences across assessment systems?

SLIDE 12

Comparability by Design

  • How does the design of the innovative assessment system yield evidence to support comparability claims?
  • How will the state evaluate the degree of comparability achieved across differing assessment conditions?
  • If comparability is not achieved, how will the state adjust the classification scale to account for systematic differences across assessment systems?

The focus of the regulations

SLIDE 13

What’s Our Inference?

  • Many comparability studies focus on item- and score-level interchangeability
  • The innovative pilot requires comparability at the level of the annual determination
    – In other words, would a student considered proficient in one district also be considered proficient in another district given the same level of work?
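This determination-level framing can be quantified with classification-agreement statistics. A minimal sketch with invented student data: exact agreement (the share of students receiving the same annual determination under both systems) and Cohen's kappa (agreement corrected for chance). The achievement levels and determinations below are hypothetical, not drawn from any real pilot.

```python
from collections import Counter

# Hypothetical annual determinations for the same ten students under
# the innovative system and the statewide assessment.
LEVELS = ["Below", "Approaching", "Proficient", "Advanced"]
innovative = ["Proficient", "Advanced", "Approaching", "Proficient",
              "Below", "Proficient", "Advanced", "Approaching",
              "Proficient", "Below"]
statewide  = ["Proficient", "Proficient", "Approaching", "Proficient",
              "Below", "Approaching", "Advanced", "Approaching",
              "Proficient", "Approaching"]

n = len(innovative)

# Exact agreement: share of students with the same determination.
exact = sum(a == b for a, b in zip(innovative, statewide)) / n

# Cohen's kappa: agreement corrected for the chance rate implied by
# each system's marginal distribution of determinations.
count_i = Counter(innovative)
count_s = Counter(statewide)
p_chance = sum((count_i[lv] / n) * (count_s[lv] / n) for lv in LEVELS)
kappa = (exact - p_chance) / (1 - p_chance)

print(f"exact agreement = {exact:.2f}, kappa = {kappa:.2f}")
```

Note that exact agreement alone can look healthy simply because most students cluster in one or two levels; kappa discounts that chance agreement, which is why both are worth reporting.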

SLIDE 14

Expanding our notions of comparability

[Figure: framework for expanded notions of comparability, adapted from Winter (2010)]

SLIDE 15

Two Major Categories of Evidence

  • 1. The alignment of the assessment systems to the content standards
    – We strongly recommend that evidence of alignment for the two assessment systems come from alignment to the content standards rather than from alignment to one another.
  • 2. The consistency of achievement classifications across the two systems.

SLIDE 16

Two Major Categories of Evidence

  • 1. The alignment of the assessment systems to the content standards
    – We strongly recommend that evidence of alignment for the two assessment systems come from alignment to the content standards rather than from alignment to one another.
  • 2. The consistency of achievement classifications across the two systems.


The focus of the regulations

SLIDE 17

Comparability Options in the Regulations

Audit
  • Administering both the innovative and statewide assessments to all students in pilot schools at least once in any grade span

Sample
  • Administering full assessments from both the innovative and statewide assessment systems to a demographically representative sample of students at least once every grade span

Common Items
  • Including common items in both the statewide and innovative assessment systems

Other
  • This is where we come in. We needed to offer additional options!

SLIDE 18

16 Design Options for Evaluating Pilot to Non-Pilot Comparability in Rigor of Performance Standards

The design matrix crosses the students the two systems have in common (All Students, Some Students, No Students in Common) with the measures they have in common (Both Measures, Some Measures, Third Measure in Common, Other).

SLIDE 19

16 Design Options for Evaluating Pilot to Non-Pilot Comparability in Rigor of Performance Standards

Both Measures
  • Concurrent (in past): “Pre-equating”

Some Measures
  • Concurrent: Embedded common items across both systems

Third Measure in Common
  • Concurrent: Common independent assessment

Other

SLIDE 20

16 Design Options for Evaluating Pilot to Non-Pilot Comparability in Rigor of Performance Standards

Both Measures
  • Concurrent (in past): “Pre-equating”
  • Not Concurrent: Statewide assessment once per grade span in lieu of innovative assessment

Some Measures
  • Concurrent: Embedded common items across both systems

Third Measure in Common
  • Concurrent: Common independent assessment

Other

SLIDE 21

16 Design Options for Evaluating Pilot to Non-Pilot Comparability in Rigor of Performance Standards

Both Measures
  • Concurrent (in past): “Pre-equating”
  • Not Concurrent: Statewide assessment once per grade span in lieu of innovative assessment
  • Concurrent: Random assignment of assessment system to classrooms

Some Measures
  • Concurrent: Embedded common items across both systems

Third Measure in Common
  • Concurrent: Common independent assessment
  • Concurrent: Propensity score matching

Other
  • Concurrent: Standard setting design
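Of the options above, “Concurrent: Propensity score matching” has the most moving parts. A minimal sketch of the idea under invented data: estimate each student's probability of being in a pilot district from background covariates, then compare pilot and non-pilot students in matched pairs so that systematic enrollment differences are not mistaken for assessment differences. The data-generating process, the hand-rolled logistic fit, and the greedy 1:1 matching below are all illustrative assumptions, not a prescribed procedure.

```python
import numpy as np

# Invented data: 200 students with a standardized prior-year score and a
# demographic flag. Pilot-district membership depends on both, so a raw
# pilot vs. non-pilot comparison would be confounded.
rng = np.random.default_rng(0)
n = 200
prior = rng.normal(size=n)
disadv = rng.integers(0, 2, size=n)
p_pilot = 1 / (1 + np.exp(-(-0.5 + 0.8 * prior - 0.3 * disadv)))
pilot = (rng.random(n) < p_pilot).astype(int)

# Estimate propensity scores P(pilot | covariates) with a logistic model
# fit by plain gradient ascent on the average log-likelihood.
X = np.column_stack([np.ones(n), prior, disadv])
w = np.zeros(3)
for _ in range(3000):
    pred = 1 / (1 + np.exp(-X @ w))
    w += 0.5 * X.T @ (pilot - pred) / n
pscore = 1 / (1 + np.exp(-X @ w))

# Greedy 1:1 nearest-neighbor matching on the propensity score.
controls = np.where(pilot == 0)[0].tolist()
pairs = []
for t in np.where(pilot == 1)[0]:
    c = min(controls, key=lambda j: abs(pscore[t] - pscore[j]))
    pairs.append((t, c))
    controls.remove(c)

t_idx = np.array([t for t, _ in pairs])
c_idx = np.array([c for _, c in pairs])

# Matching should shrink the pilot/non-pilot gap in prior achievement,
# leaving a fairer basis for comparing performance-standard rigor.
gap_before = abs(prior[pilot == 1].mean() - prior[pilot == 0].mean())
gap_after = abs(prior[t_idx].mean() - prior[c_idx].mean())
print(f"prior-score gap: {gap_before:.3f} before matching, {gap_after:.3f} after")
```

In practice a state would match on richer covariates and check balance diagnostics before comparing matched students' annual determinations across the two systems.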

SLIDE 22

How Comparable Is Comparable Enough?

A sequence of questions, where each YES leads to the next:

  1. Do the differences exceed in magnitude those typically seen within assessment programs due to variations in administration conditions?
  2. Do the differences pose a significant threat to the validity of the accountability system, or to equity in opportunity to learn?
  3. Do the results potentially disadvantage specific subgroups or institutions?
  4. Is the disadvantage consequential enough that it is not offset by potential gains in other important dimensions that might justify that loss (e.g., positive impact on teaching and learning)?

SLIDE 23

So, did ED listen to us?

Comment: Clarify that not every assessment within an innovative assessment system must meet the peer review guidelines, but that there must be sufficient validity evidence to support the annual determinations resulting from the assessment system for their intended uses.

Changes by ED: Clarification made!

SLIDE 24

So, did ED listen to us?

Comment: Clarify that comparability be established at the level of the summative annual determinations, not at the raw or scale score levels.

Changes by ED: Clarification made!

SLIDE 25

So, did ED listen to us?

Comment: In addition to evidence of consistency in performance classifications, states should be required to submit evidence of alignment to the content standards as part of their comparability argument.

Changes by ED: No changes; ED feels the regulations as written provide sufficient clarity that the innovative system must be aligned to the content standards.

SLIDE 26

So, did ED listen to us?

Comment: As the system scales statewide, comparability among pilot districts becomes much more relevant than comparability from pilot to non-pilot districts.

Changes by ED: Added a regulation to require that the innovative assessment system generate results that are comparable among pilot schools and LEAs.

SLIDE 27

So, did ED listen to us?

Comment: Provide a multitude of examples of comparability designs in non-regulatory guidance instead of in the regulations, and allow a state to develop an evaluation methodology for establishing comparability that is consistent with the design and context of its innovative assessment.

Changes by ED: ED feels the regulations as written provide sufficient flexibility for states to pursue alternate methods of gathering comparability evidence, but they did clarify one of their listed methods and add an additional method.

SLIDE 28

So, did ED listen to us?

Comment: Once strong evidence of comparability is established across assessment systems, it does not need to be re-established annually unless either of the two systems changes.

Changes by ED: ED does not feel it is overly burdensome to demonstrate comparability annually as the system scales statewide.

SLIDE 29

Where are we now?

  • ESSA says that the “Secretary may” release an application for the demonstration authority
  • Regulations are still in place (have not been rescinded)
  • ED has indicated that they will not release an application until next year
  • States do not appear to be clamoring to apply:
    – Concerns about scaling statewide
    – Concerns about technical requirements
    – Concerns about resources and capacity

SLIDE 30

Thanks to This “Who’s Who” of Comparability

External Experts
  • Bob Brennan, U of Iowa
  • Randy Bennett, ETS
  • Henry Braun, B.C.
  • Derek Briggs, U of CO
  • Linda Cook, ETS, retired
  • Joan Herman, CRESST
  • Stuart Kahl, Measured Progress
  • Ric Luecht, U of NC
  • Laurie Wise, HumRRO

Center for Assessment
  • Scott Marion
  • Susan Lyons
  • Nathan Dadey
  • Juan D’Brot
  • Chris Domaleski
  • Erika Hall
  • Joseph Martineau