

slide-1
SLIDE 1

Fleshing out Systems of Assessment in the Context of the Next Generation Science Standards

National Conference On Student Assessment San Diego, CA June 27th, 2018

Nathan Dadey Center for Assessment

This work is licensed under a Creative Commons Attribution 4.0 International License.

slide-2
SLIDE 2

Section 1: Systems of Assessment

  • Definition & Context
  • Re-framing
slide-3
SLIDE 3

Why?

slide-4
SLIDE 4

Why? Why systems of assessment?

slide-5
SLIDE 5

Too many assessments given by too many people for too many purposes, all providing too little information. The alternative? Work towards developing a system of assessments.

slide-6
SLIDE 6

One Starting Point

Pellegrino, Chudowsky & Glaser, 2001

Call for “coordinated systems of multiple assessments that work together, along with curriculum and instruction, to promote learning” (p. 242).

slide-7
SLIDE 7

Conceptual work defining “balanced”, “next generation” or “comprehensive” assessment systems followed.

slide-8
SLIDE 8

Conceptual work defining “balanced”, “next generation” or “comprehensive” assessment systems followed, which led to the National Research Council report Developing Assessments for the Next Generation Science Standards.

slide-9
SLIDE 9

Systems & the NGSS

National Research Council, 2014

A system of assessments will be needed to measure the NGSS performance expectations and provide students, teachers, administrators, policy makers, and the public with the information each needs about student learning.

slide-10
SLIDE 10

Systems & the NGSS

National Research Council, 2014

A system of assessments will be needed to measure the NGSS performance expectations and provide students, teachers, administrators, policy makers, and the public with the information each needs about student learning.

“It will not be feasible to assess all of the performance expectations for a given grade level during a single assessment occasion. Students will need multiple—and varied—assessment opportunities”

slide-11
SLIDE 11

So what exactly is a system of assessments? Here’s my working definition.

slide-12
SLIDE 12

  • Multiple assessments with potentially different designs, sponsored by different people who are at different levels of control
  • Coordinated by a common theory of learning
  • Working together to meet a specific use or uses

slide-13
SLIDE 13

Despite all of this work, much of the vision laid out by the NRC report and others has not been realized.

slide-14
SLIDE 14

A Response

  • Instead of working from the perspective that a system is tightly defined and purpose-built,
  • Consider all of the assessments students take as a system

– Then consider the ways in which the levels, or layers, of that system can be made to work together, i.e., complement one another

slide-15
SLIDE 15

Operationalizing in Terms of Layers

Let’s consider one way to visualize the layers of an assessment system.

slide-16
SLIDE 16

Operationalizing in Terms of Layers

[Table: Level or Layer | Purpose | Timing]

slide-17
SLIDE 17

Operationalizing in Terms of Layers

[Table: Level or Layer | Purpose | Timing]

There is no one “system.” Instead, within each layer decision makers build, or at least implement, their own set of assessments.

slide-18
SLIDE 18

Operationalizing in Terms of Layers

[Table: Level or Layer | Purpose | Timing]

Thus each classroom’s, school’s or district’s layer may look different, meaning that within any given state there are numerous “systems” when taken as a whole.

slide-19
SLIDE 19

Operationalizing in Terms of Layers

[Table: Level or Layer | Purpose | Timing]

By juxtaposing the layers, we can hopefully start thinking about ways the layers can complement one another.

slide-20
SLIDE 20

This approach doesn’t fully meet the vision for a system of assessments, but it is a practical start.

slide-21
SLIDE 21

By showing what is actually going on, we can then start to think about ways in which each layer can be made more complementary, along the lines of comprehensiveness, coherence and continuity.

slide-22
SLIDE 22

A Starting Point

  • Define the state layer as concretely as possible, then examine where and how the district layer can be made more complementary (and vice-versa).
  • Consider key chunks of each layer:

– Use
– Assessment Design
– Assessment Framework (i.e., Domain)
– Measurement

slide-23
SLIDE 23

A Starting Point

  • Define the district layer as concretely as possible, then examine where and how the district layer can be made more complementary (and vice-versa).
  • Consider key chunks of each layer:

– Use
– Assessment Framework (i.e., Domain)
– Assessment Design
– Measurement

slide-24
SLIDE 24

Section 2: The District Layer

  • Assessment Framework
  • Assessment Design
slide-25
SLIDE 25

Assessment Framework

Possible focuses of district assessment:

  • The same domain as the state level…

– exactly
– at a lesser degree of complexity

  • A different domain than the state level…

– progressions or precursors to the state-level domain
– parts of the standards not covered by the state-level domain

What does the assessment measure and how does it do so?

slide-26
SLIDE 26

Complementarity

[Figure: The Same Domain Model (State, District)]

slide-27
SLIDE 27

Complementarity

[Figure: The Different Domain Model (State, District)]

slide-28
SLIDE 28

Assessment Design

Often, sensitivity to curriculum and instruction is paramount for district-level assessments. One way to make assessments better match local curriculum and instruction is through modular design.

What does the assessment(s) look like?

slide-29
SLIDE 29

Design continuum, from Fixed to Modular:

  • Fixed Designs: A single assessment that measures the entire domain
  • Block Designs: Multiple assessments, each measuring a chunk of the domain
  • Modular Designs: Multiple assessments, each measuring a very small chunk of the domain
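
As a rough illustration of the three designs above, the sketch below splits one domain into assessments of different sizes. The `partition_domain` helper, the placeholder labels and the chunk sizes are assumptions made for illustration; the slides do not specify how large a "chunk" is.

```python
from typing import List


def partition_domain(domain: List[str], chunk_size: int) -> List[List[str]]:
    """Split a domain (a list of standards) into consecutive assessments."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [domain[i:i + chunk_size] for i in range(0, len(domain), chunk_size)]


# Hypothetical domain of six placeholder labels (not real NGSS codes).
domain = ["PE-1", "PE-2", "PE-3", "PE-4", "PE-5", "PE-6"]

fixed = partition_domain(domain, len(domain))  # one assessment, whole domain
block = partition_domain(domain, 3)            # a few assessments, one chunk each
modular = partition_domain(domain, 1)          # many assessments, very small chunks

for name, design in [("fixed", fixed), ("block", block), ("modular", modular)]:
    print(name, "->", len(design), "assessment(s):", design)
```

The only thing that changes across the three designs here is the chunk size, which is one way to read the continuum from fixed to modular.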

slide-30
SLIDE 30

Supplemental Slides

slide-31
SLIDE 31

The State Layer

  • Uses
  • Assessment Framework
  • Measurement
slide-32
SLIDE 32

Uses

  • For better or worse, state-level assessments are often “one use, and one use only”

– As part of a system of school identification

  • Unlike Math and Reading/English Language Arts, Science is not a mandatory part of state accountability plans[1] under ESSA.

– And, possibly, to signal the importance of a given set of standards

  • Any other uses are unlikely to be directly supported via the state layer alone.

– The massive use of assessments by districts is testament to this.

How are the assessment results used?

[1] Although an initial look at state plans did suggest many states were including science in their accountability systems.

slide-33
SLIDE 33

Uses

  • For any intended use, a key question is one of sufficiency:

– Is the use of the state-level assessment results likely to lead to the intended outcome (hopefully when couched in terms of a theory of action)?

How are the assessment results used?

slide-34
SLIDE 34

Defining the use should help us determine what we want to measure, and vice versa.

slide-35
SLIDE 35

Assessment Framework

  • At the state level, the domain is almost always premised on an approximation of the domain sampling approach
  • Even under this approach, defining the domain in any subject requires a judgmental process to translate standards into assessment specifications

– As a reflection of the nature of the NGSS, the judgments for the NGSS seem particularly complex

What does the assessment measure and how?

slide-36
SLIDE 36

Assessment Framework

In terms of the NGSS, much of this translation involves the Performance Expectations (PEs), including whether:

  • The state considers the PEs as the standards or as examples of valuable intersections

– If the latter, will new PEs be defined?

  • All PEs vs. Full Sampling (aka Matrix, within or across years) vs. a Limited Sample are assessed

NGSS Specific
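
The three sampling options above can be compared with a small sketch. It assumes hypothetical PE labels, a round-robin assignment for the matrix option and an arbitrary size for the limited sample; none of these details come from the slides, and a "form" here can stand for either a test form within a year or an administration year.

```python
from typing import Dict, List


def all_pes(pes: List[str], n_forms: int) -> Dict[int, List[str]]:
    """Every form (or year) assesses every PE."""
    return {form: list(pes) for form in range(n_forms)}


def matrix_sample(pes: List[str], n_forms: int) -> Dict[int, List[str]]:
    """Spread PEs round-robin so the forms jointly cover all PEs."""
    return {form: pes[form::n_forms] for form in range(n_forms)}


def limited_sample(pes: List[str], n_forms: int, n_kept: int) -> Dict[int, List[str]]:
    """Every form assesses the same subset; the remaining PEs are never assessed."""
    return {form: pes[:n_kept] for form in range(n_forms)}


# Nine placeholder labels (not real NGSS codes) and three forms or years.
pes = [f"PE-{i}" for i in range(1, 10)]
plans = {
    "all PEs": all_pes(pes, 3),
    "matrix sample": matrix_sample(pes, 3),
    "limited sample": limited_sample(pes, 3, n_kept=4),
}

for name, plan in plans.items():
    covered = set().union(*plan.values())
    per_form = [len(assigned) for assigned in plan.values()]
    print(f"{name}: {len(covered)} of {len(pes)} PEs covered, PEs per form: {per_form}")
```

The trade-off the sketch exposes is between per-form burden and coverage: matrix sampling keeps full coverage with fewer PEs per form, while a limited sample reduces burden by leaving part of the standards unassessed.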

slide-37
SLIDE 37

Assessment Framework

Also key is the way in which the PEs are mapped to items and item clusters, including:

– Whether and how PEs are bundled to create item clusters
– Phenomena also play a substantial role in cluster design

NGSS Specific

slide-38
SLIDE 38

These types of questions help better define the domain. A detailed domain definition comes from a back and forth between design choices and domain definition.

slide-39
SLIDE 39

These types of questions help better define the domain. A detailed domain definition comes from a back and forth between design choices and domain definition.

[Figure: Domain Definition → Design Decisions → Domain Definition → Design Decisions]

slide-40
SLIDE 40

These types of questions help better define the domain. A detailed domain definition comes from a back and forth between design choices and domain definition.

[Figure: Domain Definition → Design Decisions → Domain Definition → Design Decisions]

Without this type of refinement, the domain often remains vaguely and unhelpfully defined.

slide-41
SLIDE 41

Looking at what aspects of the standards are being emphasized, or foregrounded, also provides some clarity around the domain definition.

slide-42
SLIDE 42

Foregrounding

  • Any aspect of the standards could be “foregrounded,” or emphasized, to provide additional structure to the claims, achievement level descriptors, score reports, blueprints, item clusters, or any combination thereof

– For example, reporting three subscores based on the Disciplinary Core Idea (DCI) domains will foreground these domains to those interpreting the assessment reports, potentially signaling that the DCI topic domains are important
– In other content areas the overarching standards are generally foregrounded

slide-43
SLIDE 43

Ideally, the domain definition should be specific enough to let us know whether the measurement model employed is a good fit.

slide-44
SLIDE 44

Measurement

  • Unidimensional IRT vs. Multidimensional IRT

– Bifactor Model & Cluster Effects
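
For reference, one common way to write the two model families named above is sketched here, using the slope-intercept form of a two-parameter logistic model. The notation is an assumption introduced for illustration and does not come from the slides: $\theta_i$ is the proficiency of student $i$, $a_j$ and $d_j$ are the slope and intercept of item $j$, $G$ marks the general factor, and $c(j)$ is the item cluster that contains item $j$.

```latex
% Unidimensional 2PL: a single proficiency drives every item response.
\[
P(X_{ij} = 1 \mid \theta_i)
  = \frac{1}{1 + \exp\left[-\left(a_j \theta_i + d_j\right)\right]}
\]

% Bifactor: each item loads on the general factor and on exactly one
% cluster-specific factor, the one for its own cluster c(j).
\[
P(X_{ij} = 1 \mid \theta_{iG}, \theta_{i,c(j)})
  = \frac{1}{1 + \exp\left[-\left(a_{jG}\,\theta_{iG}
      + a_{j,c(j)}\,\theta_{i,c(j)} + d_j\right)\right]}
\]
```

In the bifactor form the general and cluster-specific factors are typically assumed to be uncorrelated, so the cluster-specific slopes absorb the dependence among items that share a stimulus or phenomenon, which is one way to represent a cluster effect.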

slide-45
SLIDE 45

Is the bifactor model the best model? It all depends on the domain definition and intended use.

G?

  • General 3D Science Achievement?
  • Transfer of 3D Learning to New Contexts?
  • Understanding Phenomena?

slide-46
SLIDE 46

Measurement

  • Much of the innovation in large-scale educational measurement has come from outside of state assessment of Mathematics and Reading/English Language Arts:

– Special populations (e.g., DLM, ELPA21)
– National & International Comparisons (e.g., NAEP)

  • Could some other model also work with the NGSS? Likely, but this is unexplored and needs to be examined in terms of the assessment framework!

slide-47
SLIDE 47

Criteria for Systems

  • Comprehensiveness: multiple sources of information collected through different processes are considered.
  • Coherence: the models of student learning underlying the assessments are compatible and also align with curriculum and instruction.
  • Continuity: assessments measure student progress in relation to a model of student progression.

Pellegrino et al. (2001)

slide-48
SLIDE 48

The District Layer

  • Assessment Framework
  • Assessment Design
slide-49
SLIDE 49

There is, rightly, much concern about going into this layer as a state. Some states have been entirely hands off, while others have:

– Provided access to a commercial product
– Developed an assessment or assessments for district use

slide-50
SLIDE 50

There is, rightly, much concern about going into this layer as a state. Some states have been entirely hands off, while others have:

– Provided access to a commercial product
– Developed an assessment or assessments for district use

Simply providing assessments alone will not ensure that the use of assessment results will lead to intended outcomes.

slide-51
SLIDE 51

Uses

  • Uses tend to proliferate
  • Uses of assessment results are often conflated

– Frequently assessments are touted as having relevance to teaching and learning, when, at best, these assessments are distal measures of learning with loose and undefined connections to classroom instruction

  • Recommendations around ways assessment results should, and should not, be used could be valuable

How are the assessment results used?

slide-52
SLIDE 52

District Layer Measurement

  • Treating more modular forms of assessment as if they have come from the same assessment does not appear to be terribly problematic

– A better question is whether the samples of students assessed are reflecting the patterns of learning hypothesized in the assessment development

slide-53
SLIDE 53

District Layer Measurement

  • Does the use require a measurement model?

– Depending on the use, a model might not be needed (e.g., for performance tasks, common rubrics might be enough)
– If a measurement model is used, should the model be the same as the state-level model?
– The same issues that arise at the state level are likely to also arise at the district level (i.e., cluster effects)