Developing Scale Scores & Cut Scores for On-Demand Assessments of Individual Standards


SLIDE 1

Developing Scale Scores & Cut Scores for On-Demand Assessments of Individual Standards

Nathan Dadey¹, Shuqin Tao², and Leslie Keng¹


NCME, New York, NY, April 16th, 2018

SLIDE 2

Context

  • Much work has been done on improving a single assessment, in terms of efficiency and information.
    – Although the definition of an “assessment” continues to blur.
  • This work takes a different tack, instead examining how scale scores and cut scores can be developed for a set of assessments, motivated by ideas around the concept of a system of assessments.

SLIDE 3

Context, Continued (Grade 4 Math)

Key to this set of assessments is the idea of modularity.

SLIDE 4

Context, Continued (Grade 4 Math)

Key to this set of assessments is the idea of modularity. Consider this hypothetical example:

1: Place Value

Say a student takes a quiz, or “mini-assessment,” on place value at the beginning of the year.

SLIDE 5

Context, Continued (Grade 4 Math)

Key to this set of assessments is the idea of modularity. Consider this hypothetical example:

1: Place Value 2: Compare Whole Numbers

Then the student takes another mini-assessment, on comparing whole numbers.
SLIDE 6

Context, Continued (Grade 4 Math)

Key to this set of assessments is the idea of modularity. Consider this hypothetical example:

1: Place Value 2: Compare Whole Numbers 3: Add and Subtract Whole Numbers

And so on…

SLIDE 7

Context, Continued (Grade 4 Math)

Key to this set of assessments is the idea of modularity. Consider this hypothetical example:

Let’s say the student also takes a “general” purpose assessment that surveys the full set of standards.

SLIDE 8

Context, Continued (Grade 4 Math)

Key to this set of assessments is the idea of modularity. Consider this hypothetical example: the full set of assessments this hypothetical student takes might look like ↓

[Figure: the accumulated set of mini-assessments plus the “general” assessment taken by the hypothetical student]


SLIDE 10

Given data like this, how can we make sense of it? In particular, how can we develop scale scores and achievement-level classifications?

SLIDE 11

Research Questions

  1. In what ways can the mini-assessments be scaled?
  2. How can provisional mastery classifications be created based on the mini-assessment results?

This work is exploratory and presents a picture of our first efforts to tackle this unique type of assessment in the context of fourth grade mathematics.

SLIDE 12

Measures

  • Assessments of fourth grade mathematics based on the Common Core State Standards
  • Two types of on-demand, computer-administered assessments:
    – 31 “mini-assessments” aligned to individual standards
    – A “general assessment” of the standards broadly (adaptive and vertically scaled)

SLIDE 13

Mini-Assessments (31)                      | General Assessment
-------------------------------------------|---------------------------------------------
Individual standards (e.g., 4.NBT.A.1)     | CCSS fourth grade mathematics
Flexibly administered                      | Flexibly administered
Open access to items                       | Secure
Short & fixed form (7 items)               | Longer & adaptive (66 items max)
Machine scored, instant reporting          | Scale scores, CCSS domain subscores, & classifications on individual standards
Non-overlapping (no common items)          | Adaptive from the same item pool

SLIDE 14

Data

  • 2016-2017 academic year
  • 91,440 students took at least one mini-assessment and the general assessment
  • Mini-assessments:
    – Approximate number of administrations per mini-assessment: ranges from 3,000 to 47,000, with a mean of 12,000 and a median of 8,000
    – Approximate number of forms per student: ranges from 1 to 80, with a median of 6 and a mean of 7.6 (including re-tests)

SLIDE 15

RQ1: Scaling the Mini-Assessments
SLIDE 16

One Set of Possible Approaches

Conduct Rasch scaling, placing the mini-assessments onto:

  • the scale of the general assessment (via a fixed theta calibration approach),
  • a single scale across all mini-assessments,
  • CCSS domain-specific scales (5 in all), or
  • individual scales for each mini-assessment.
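All four options rest on the dichotomous Rasch model, under which the probability that student $n$ answers item $i$ correctly is

$$P(x_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)},$$

where $\theta_n$ is the student’s ability and $b_i$ the item’s difficulty. Fixed theta calibration, as the name suggests, holds the person parameters (taken from the general assessment’s scale) fixed and estimates only the mini-assessment item difficulties, anchoring those items to the existing scale.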
SLIDE 18

Domain Scaling Approach

  • Create unidimensional scales for each CCSS domain using the Rasch model
  • Use a pooled item response matrix (item responses from different time points and different administration patterns)
    – Best case for detecting multidimensionality
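A minimal sketch of what such a calibration could look like in Python (numpy only), under our own assumptions about data layout, not the operational calibration software: `responses` is a hypothetical pooled persons x items matrix of 0/1 responses, with NaN wherever a student never saw an item (the pooled matrix is very sparse).

```python
import numpy as np

def rasch_jmle(responses, n_iter=100, tol=1e-4):
    """Joint maximum likelihood (JMLE) calibration of the Rasch model.

    Sketch only: persons with all-correct or all-incorrect records should
    be dropped first, since their ability estimates diverge.
    """
    mask = ~np.isnan(responses)                # administered cells
    x = np.nan_to_num(responses)               # 0/1, with 0 where missing
    theta = np.zeros(responses.shape[0])       # person abilities
    b = np.zeros(responses.shape[1])           # item difficulties
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
        info = np.where(mask, p * (1.0 - p), 0.0)
        resid = np.where(mask, x - p, 0.0)
        # One Newton-Raphson step for persons, then one for items
        theta_step = resid.sum(axis=1) / np.maximum(info.sum(axis=1), 1e-9)
        theta += theta_step
        b_step = -resid.sum(axis=0) / np.maximum(info.sum(axis=0), 1e-9)
        b += b_step
        b -= b.mean()                          # identification: mean difficulty 0
        if max(np.abs(theta_step).max(), np.abs(b_step).max()) < tol:
            break
    return theta, b
```

One such calibration per CCSS domain (five in all) yields the domain scales described above.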

SLIDE 19

Domain Scaling Approach

  • Examine results in terms of:
    – Unidimensionality, via principal components analysis of item residuals
    – Model fit (unweighted and weighted mean squared fit statistics)
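Both checks fall out of the calibration sketch above; here is one plausible way to compute them (again an illustration, not the authors' code), reusing `rasch_jmle` and the same hypothetical `responses` matrix:

```python
import numpy as np

theta, b = rasch_jmle(responses)
p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
w = np.where(np.isnan(responses), np.nan, p * (1.0 - p))  # information weights
z = (responses - p) / np.sqrt(w)                          # standardized residuals

# (1) Unidimensionality: PCA of the item-by-item residual correlations.
# A large first eigenvalue would signal a secondary dimension.
zc = np.where(np.isnan(z), 0.0, z)
eigvals = np.sort(np.linalg.eigvalsh(np.corrcoef(zc, rowvar=False)))[::-1]
print("first residual contrast, % of variance:", 100 * eigvals[0] / eigvals.sum())

# (2) Item fit: unweighted (outfit) and weighted (infit) mean squares.
outfit = np.nanmean(z**2, axis=0)
infit = np.nansum((responses - p) ** 2, axis=0) / np.nansum(w, axis=0)
```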

SLIDE 20

Results - PCA

[Figure: variance explained by the first principal component of item residuals, by domain scale; it does not exceed 2% for any domain.]

SLIDE 21

Results – Item Fit (Weighted MS)

Domain                             % < 0.75   % > 1.33   # Items
Operations & Algebraic Thinking       0%         1%        72
Numbers & Operations - Base Ten       0%         0%        72
Numbers & Operations - Fractions      0%         0%       108
Measurement & Data                    0%         2%        84
Geometry                              3%         3%        36
Max                                   3%         3%
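For reference, the two mean square fit statistics summarized here are the usual Rasch outfit (unweighted) and infit (weighted) measures; for item $i$ across $N$ students,

$$\text{Outfit}_i = \frac{1}{N}\sum_{n=1}^{N}\frac{(x_{ni} - P_{ni})^2}{P_{ni}(1 - P_{ni})}, \qquad \text{Infit}_i = \frac{\sum_{n=1}^{N}(x_{ni} - P_{ni})^2}{\sum_{n=1}^{N} P_{ni}(1 - P_{ni})},$$

where $P_{ni}$ is the model-implied probability of a correct response. Both have expectation 1 under good fit, so the columns above count items falling below 0.75 (overly deterministic) or above 1.33 (overly noisy).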

SLIDE 22

Future Directions

  • Additional dimensionality investigations
    – EFA
    – DIMTEST & DETECT
    – Comparison data
  • Modeling approaches
    – Multigroup on time (e.g., month)
    – Selecting data that best matches recommended instructional sequences
    – Other models (e.g., treating the tests as attributes in a “system level DCM”; longitudinal Rasch model)

SLIDE 23

RQ2: Creating Classifications
SLIDE 24

One Set of Possible Approaches

Create preliminary cut scores, and thus student classifications, based on:

  • Cluster analysis (e.g., what DCMs devolve into with one attribute; see the sketch after this list)
  • Content expert judgments
  • The relationship between each mini-assessment and the matching standard classification from the general assessment
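To make the first option concrete: with a single attribute, a DCM reduces to a two-class mixture, which a two-cluster k-means on raw scores approximates. A minimal sketch, where `raw_scores` is a hypothetical array of raw scores (0-7) on one mini-assessment:

```python
import numpy as np
from sklearn.cluster import KMeans

# raw_scores: hypothetical 1-D array of 0-7 raw scores on one mini-assessment
X = np.asarray(raw_scores, dtype=float).reshape(-1, 1)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

centers = km.cluster_centers_.ravel()
master = int(np.argmax(centers))     # higher-mean cluster = provisional "can do"
can_do = km.labels_ == master
cut = centers.mean()                 # implied cut score: midpoint of the centers
```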

SLIDE 25

The Prediction Approach

  • Predict the probability of the “can do” classification from the general assessment using the raw scores from the mini-assessment.
  • To do so, conduct quantile regression where:
    – The dependent variable is the probability of classification from the general assessment administration closest to the student’s mini-assessment administration
    – The independent variables are the mini-assessment raw score and the difference between administrations (in days)
  • Evaluate at multiple probabilities & quantiles
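A minimal sketch of this regression in Python with statsmodels, assuming a hypothetical DataFrame `df` with one row per matched pair of administrations: `p_can_do` (classification probability from the nearest general assessment), `raw` (mini-assessment raw score), and `gap_days` (days between the two administrations):

```python
import statsmodels.formula.api as smf

# Fit the same conditional model at several quantiles
fits = {q: smf.quantreg("p_can_do ~ raw + gap_days", df).fit(q=q)
        for q in (0.25, 0.50, 0.75)}

# Invert the median fit: the raw score at which predicted P("can do") = 0.50,
# holding the gap between administrations at zero days
b = fits[0.50].params
cut_score = (0.50 - b["Intercept"]) / b["raw"]
```

Evaluating `cut_score` at several target probabilities (e.g., 0.50 and 0.67) and quantiles gives the range of candidate cuts shown on the next slides.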
SLIDE 26

[Figure: probability of “Can Do” or indicator mastery vs. Mini-Assessment 1A (Place Value) total score, with reference probabilities 0.67 and 0.50; the implied cut score shown is 7.2.]

This value seems reasonable, but the value for P = 0.67 is outside the range of most of the quantiles. To investigate further, we looked at the same relationship using only data from the second half of the year.

SLIDE 27

[Figure: the same relationship, restricted to administrations after January 1st, 2017, with reference probabilities 0.67 and 0.50; the implied cut score shown is 5.5.]

But… the quantile regression controlled for time?

SLIDE 28

What’s going on?

In general, the probability of a “can do” classification from the general assessment increases over the year, while the mini-assessment total scores do not.

[Figure: trends over the year, shown separately for the General Assessment and the Mini-Assessment.]

It comes down to the use case for each type of assessment.

SLIDE 29

Future Directions

Further examine the time issue:

  • Re-sample to have equal numbers of administrations by month? (see the sketch below)
  • Look at changes in scores on the mini-assessments?
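A sketch of the first idea with pandas, assuming a hypothetical DataFrame `admin` of mini-assessment administrations with a `month` column; every month is down-sampled to the size of the smallest month:

```python
import pandas as pd

# admin: hypothetical DataFrame of administrations with a "month" column
n = admin.groupby("month").size().min()   # size of the smallest month
balanced = (admin.groupby("month", group_keys=False)
                 .apply(lambda g: g.sample(n=n, random_state=0)))
```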