

SLIDE 1

NCME COMMITTEE ON ASSESSMENT POLICY AND PRACTICE

Presentation at CCSSO's NCSA Conference, June 2011

Can the Assessment Consortia Meet the Intended Comparability Goals?

OR What Types of Comparability Goals Can Be Met?

SLIDE 2

NCME Committee on Assessment Policy


• Formed as an ad hoc committee in May 2010
• Became a standing committee in April 2011
• Purposes:
  • To use the measurement expertise within NCME to influence, and hopefully improve, educational policies based on assessment results
  • To increase the visibility of NCME so that it might be seen as a "go-to" organization for assessment-related policy issues
• The Committee:
  • Co-Chairs: Kristen Huff and Scott Marion
  • Members: Judy Koenig, Joseph Martineau, Cornelia Orr, Christina Schneider, Zachary Warner

SLIDE 3

The Committee’s Initial Approach


• Recognizing that the committee is composed entirely of volunteers and that NCME doesn't have a staff for this work, the committee decided to start modestly.
• Focus on a single issue, using the following approaches:
  • An invited symposium at the NCME annual meeting with leading researchers
  • A symposium at CCSSO-NCSA
  • A follow-up policy brief
  • Other?

SLIDE 4

This Year’s Focus


The challenges of designing for and producing comparable score inferences across states, consortia, and countries

• NCME Symposium Participants:
  • Authors: Mike Kolen, Suzanne Lane, Joseph Martineau
  • Discussants: Bob Brennan and Deb Harris

SLIDE 5

A Simplified View of Comparability


• The Problem:
  • State assessments produced results that were not comparable across states
  • Positive state results were clearly not trusted
• The "Solution":
  • Common content standards
  • Common large-scale assessments

SLIDE 6

A Simplified Theory of Action


• Accurate cross-state comparisons
• Can be used to benchmark within-state performance
• To motivate and guide reforms (and policies)
• And lead to improved student achievement
• Trying to put a positive spin on the call for comparability

SLIDE 7

What the policy makers wanted…


• Governors and state chiefs (many of them) wanted a single assessment consortium
• Not clear about all the reasons, but clearly some had to do with efficiency of effort, lack of competition, and facilitating comparisons across states…
• What they got…
  • Two consortia with some very different ideas about large-scale assessment

SLIDE 8

What is Meant by “Comparability” (Brennan)?

• The Public and Policy-makers:
  • Doesn't matter which form is used
  • Same score means the same thing for all students
  • Math is math is math…
• Psychometricians:
  • All sorts of things (equating, vertical scaling, scaling to achieve comparability, projection, moderation, concordance, judgmental standard setting)
  • Some degree/type of comparability (i.e., linking) is attainable in practically any context
  • Comparability "scale": very strong to very weak
  • Ideal: "matter of indifference" criterion
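Among these methods, equipercentile linking is the workhorse arithmetic: a score on form X is mapped to the form-Y score holding the same percentile rank. Whether the result deserves to be called "equating" (very strong) or merely "concordance" (weaker) depends on the forms and their blueprints, not on the formula. A minimal, unsmoothed sketch on hypothetical data (function names, scores, and form lengths are ours, not the consortia's; operational equating presmooths distributions and reports standard errors):

```python
import numpy as np

def percentile_ranks(scores, points):
    """Percentile rank (midpoint convention) at each integer score point."""
    scores = np.asarray(scores)
    below = np.array([(scores < p).sum() for p in points])
    at = np.array([(scores == p).sum() for p in points])
    return (below + 0.5 * at) / len(scores)

def equipercentile_link(form_x, form_y, max_score):
    """Map each form-X score point to the form-Y score with the same
    percentile rank. No smoothing: ties at unobserved score points
    are left as-is, which operational programs would not do."""
    points = np.arange(max_score + 1)
    pr_x = percentile_ranks(form_x, points)
    pr_y = percentile_ranks(form_y, points)
    # Invert form Y's percentile-rank curve by linear interpolation
    return np.interp(pr_x, pr_y, points)

# Hypothetical score distributions on two 50-point forms
rng = np.random.default_rng(0)
form_x = rng.binomial(50, 0.55, size=2000)
form_y = rng.binomial(50, 0.60, size=2000)
conversion = equipercentile_link(form_x, form_y, 50)  # X-to-Y table
```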


SLIDE 9

What comparisons?


• The term "comparability" is getting thrown around a lot, and many appear focused on across-consortium and international comparability
• Brennan: What should be the comparability goals for the two consortia, and how should these goals be pursued?

SLIDE 10

What comparisons?


• Many appear to believe that because of a "common assessment," within-consortium comparability is a given
• It's not! Following Brennan's reformulation, we argue that until within-consortium comparability can be assured, we should not distract ourselves with across-consortium comparability
• Further, most accountability designs, especially those that incorporate measures of student growth, must be based on strong within-consortium (actually, within-state) year-to-year comparability

SLIDE 11

Key Interpretive Challenges


• The central challenges that Mislevy (1992) outlined in Linking Educational Assessments are still with us in the current Race to the Top environment:
  • "discerning the relationships among the evidence the assessments provide about conjectures of interest, and
  • figuring out how to interpret this evidence correctly" (p. 21).
• In other words, just because we can statistically link doesn't mean we will be able to interpret these links

SLIDE 12

Conditions for comparability


• Mislevy and Holland focused considerable attention on the match between the two test blueprints, and for good reason: this is a crucial concern if we are to compare student-level scores from two different sets of tests
• There is a good reason that most end-of-year state tests are referred to as "standardized tests"

SLIDE 13

What do we know about Comparability?

• Score interchangeability is (approximately) achievable only under the strongest form of comparability (equating in a technical sense)
• Weaker types of comparability often/usually do not lead to group invariance
• The degree of comparability desired/required should reflect the intended use of scores
• Score interchangeability and group invariance:
  • Crucial sometimes
  • Often not achievable
  • Not so important sometimes
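To make "group invariance" concrete: a linking is population invariant when re-estimating it within subgroups yields (nearly) the same conversion. A minimal sketch of that check, using simple linear (mean/SD) linking on hypothetical single-group data; all names and numbers here are illustrative, and operational analyses use weighted, equipercentile-based indices in the spirit of Dorans and Holland's population-invariance work:

```python
import numpy as np

def linear_link(x, y):
    """Linear linking: match means and SDs, giving y ≈ a*x + b."""
    a = y.std() / x.std()
    return a, y.mean() - a * x.mean()

def invariance_rmsd(x, y, groups, points):
    """Root-mean-square gap (in form-Y score units) between subgroup
    linkings and the total-group linking, over the given score points."""
    a, b = linear_link(x, y)
    total = a * points + b
    gaps = []
    for g in np.unique(groups):
        a_g, b_g = linear_link(x[groups == g], y[groups == g])
        gaps.append((a_g * points + b_g - total) ** 2)
    return np.sqrt(np.mean(gaps))

# Hypothetical single-group data: every examinee has a score on both forms
rng = np.random.default_rng(1)
x = rng.normal(30, 6, size=4000)
y = 1.1 * x + rng.normal(0, 4, size=4000)  # noisy relation: weak-link territory
groups = rng.integers(0, 2, size=4000)
print(invariance_rmsd(x, y, groups, np.arange(0, 51)))  # near 0 => invariant
```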


SLIDE 14

Minimum Set of Conditions


• Common test blueprint
• Specified test administration windows
• Common administration rules/procedures
• Standard accommodation policies
• Same inclusion rules
• Clearly specified and enforced security protocols
• Specific computer hardware and software (for CBT and CAT), or at least a narrow range of specifications

SLIDE 15

What’s working for within-consortium comparability?


• Common, fairly well-articulated content standards that all consortium states are adopting
• All states within each consortium have promised to implement a common test (or tests)
• All states within each consortium have promised to adopt common performance-level descriptors and common achievement cut scores

SLIDE 16

Challenges for within-year, within-consortium comparability


• High-stakes uses of test scores
• Significantly different fidelity of implementation of the CCSS within states and across states
• Time frame is very short
• Demand for innovative item types is unrelenting
• Huge testing windows
• Large variability in school calendars
• Assessments designed to measure a wide range of knowledge and skills
• Differences in inclusion policy (potentially?)
• Differences in accommodations policies, and certainly in practices
• Differences in mode (computer vs. paper)
• Differences in hardware platforms

SLIDE 17

Challenges across years (within consortium)


• All of the previous, plus:
  • Mixed-format assessments, especially through-course assessments
  • For CR prompts/performance tasks, a large person × task (p × t) interaction is pervasive and a small number of prompts is common. These two facts virtually guarantee that scores from CR forms can be only weakly comparable (made precise in the aside after this list)
  • Quality of field testing (high-quality sampling) needed to produce stable item parameters for pre-equating designs (at least for SBAC)
• These are fairly significant challenges
  • We have seen serious problems with year-to-year equating within states
  • Doing this in 20+ states is a not-insignificant extension
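A brief technical aside making the p × t point precise, using standard generalizability theory (not anything specific to either consortium): in a persons-crossed-with-tasks design, the generalizability coefficient for relative decisions with $n_t$ tasks is

$$E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{pt,e}/n_t}$$

where $\sigma^2_p$ is true person variance and $\sigma^2_{pt,e}$ is the confounded person × task interaction plus residual. With few prompts and a large interaction, the coefficient collapses: for example, $\sigma^2_p = 1$, $\sigma^2_{pt,e} = 2$, and $n_t = 2$ give $E\rho^2 = 1/(1+1) = 0.50$. Any year-to-year link built on such scores inherits that instability.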

SLIDE 18

What Does All of This Suggest About How to Proceed?

• Need to help policy makers prioritize comparability goals
• Trying to assure comparability on the back end will fail if we have not designed the assessments with the types of comparisons we'd like to make in mind
  • In attempting to achieve comparability, no type/amount of statistical manipulation of data can make up for mediocre test development
• Don't promise more than can be delivered, but do deliver a high degree of comparability for scores that will be used for accountability (i.e., within-consortium, across years)


SLIDE 19

How to Proceed (continued)

• For accountability scores, data must be collected that:
  • Permit obtaining score "equivalence" tables/relationships
  • Facilitate examining the degree to which comparability goals have been attained
• Provide users with readily interpretable statements about comparability (a sketch of one such report follows this list)
• View the attainment of comparability as a goal to be pursued, more than an end product to be attained at one point in time
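One way to make "readily interpretable statements about comparability" operational is to publish the equivalence table together with its sampling uncertainty. A minimal sketch, assuming paired (single-group) data and simple linear linking; everything here (names, data, report format) is illustrative rather than either consortium's actual design:

```python
import numpy as np

def linear_link_table(x, y, points, n_boot=500, seed=0):
    """Equivalence table with bootstrap standard errors: for each form-X
    score point, the linked form-Y score and how stable that link is."""
    rng = np.random.default_rng(seed)

    def link(xs, ys):
        a = ys.std() / xs.std()
        return a * points + (ys.mean() - a * xs.mean())

    est = link(x, y)
    boots = np.empty((n_boot, len(points)))
    for b in range(n_boot):
        idx = rng.integers(0, len(x), size=len(x))  # resample examinee pairs
        boots[b] = link(x[idx], y[idx])
    return est, boots.std(axis=0)

# Hypothetical paired scores; rows read "X=30 maps to Y=34.1 (SE 0.21)"
rng = np.random.default_rng(2)
x = rng.normal(30, 6, size=3000)
y = 1.1 * x + rng.normal(0, 4, size=3000)
points = np.arange(10, 51, 10)
yhat, se = linear_link_table(x, y, points)
for p, m, s in zip(points, yhat, se):
    print(f"X={p:2d} -> Y={m:5.1f} (SE {s:.2f})")
```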


SLIDE 20

Across-Consortium Comparability


• Tukey: "It is better to have an approximate answer to the right question than an exact answer to the wrong question."
• In other words, we need to be humble about the types of comparisons that we can and should make in this arena

SLIDE 21

An Embedded Research Program

• In a good testing program there is a healthy tension between:
  • the need for stability, and
  • the desire for, and pursuit of, innovation
• Stability helps to ensure comparability; innovation helps to ensure relevance
• Both goals need to be pursued simultaneously on a CONTINUING basis
• Achieving comparability (to the greatest extent possible) and innovation at the same time requires an active, embedded, and ongoing PROGRAM OF RESEARCH


SLIDE 22

Bottom line


• Become crystal clear about the types of comparisons we'd like to make, and why
• Design for these types of comparability up front to the extent possible
• Focus intently on within-consortium, across-year comparability! If we cannot do this, everything else is irrelevant.
• This does not mean that we must restrict ourselves to traditional psychometric forms of comparability; we might need to think about non-U.S. forms of comparability as well