
SLIDE 1

Developing Scale Scores & Cut Scores for On-Demand Assessments of Individual Standards

Nathan Dadey1, Shuqin Tao2, and Leslie Keng1


NCME - New York, NY April 16th, 2018


SLIDE 2

Context

4/16/2018 On-Demand Assessments of Individual Standards 2

Much work has been done on improving a single assessment in terms of efficiency and information.

– Although the definition of an “assessment” continues to blur.

This work takes a different tack, instead examining how scale scores and cut scores can be developed for a set of assessments, motivated by the concept of a system of assessments.

SLIDE 3

Context, Continued (Grade 4 Math)

Key to this set of assessments is the idea of modularity.

SLIDE 4

Context, Continued (Grade 4 Math)

Key to this set of assessments is the idea of modularity. Consider this hypothetical example:

1: Place Value

Say a student takes a quiz, or “mini-assessment,” on place value at the beginning of the year.

SLIDE 5

Context, Continued (Grade 4 Math)

Key to this set of assessments is the idea of modularity. Consider this hypothetical example:

1: Place Value 2: Compare Whole Numbers

The student then takes another mini-assessment on whole numbers.

SLIDE 6

Context, Continued (Grade 4 Math)

Key to this set of assessments is the idea of modularity. Consider this hypothetical example:

1: Place Value 2: Compare Whole Numbers 3: Add and Subtract Whole Numbers

…

And so on…

SLIDE 7

Context, Continued (Grade 4 Math)

Key to this set of assessments is the idea of modularity. Consider this hypothetical example:

Let’s say the student also takes a “general” purpose assessment that surveys the full set of standards.

SLIDE 8

Context, Continued (Grade 4 Math)

Key to this set of assessments is the idea of modularity. Consider this hypothetical example: the full set of assessments for this hypothetical student might then look like this ↓

SLIDE 9

Context, Continued (Grade 4 Math)

Key to this set of assessments is the idea of modularity. Consider this hypothetical example: the full set of assessments for this hypothetical student might then look like this ↓

SLIDE 10

Given data like this, how can we make sense of it? In particular, how can we develop scale scores and achievement-level classifications?


SLIDE 11

Research Questions


1. In what ways can the mini-assessments be scaled?
2. How can provisional mastery classifications be created from the mini-assessment results?

This work is exploratory and presents a picture of our first efforts to tackle this unique type of assessment in the context of fourth grade mathematics.


SLIDE 12

Measures


Assessments of fourth grade mathematics based on the Common Core State Standards (CCSS)
Two types of on-demand, computer-administered assessments:

– 31 “mini-assessments,” each aligned to an individual standard
– A “general assessment” of the standards broadly (adaptive and vertically scaled)

SLIDE 13

Mini-Assessments (31) | General Assessment
Individual standards (e.g., 4.NBT.A.1) | CCSS Fourth Grade Mathematics
Flexibly administered | Flexibly administered
Open access to items | Secure
Short & fixed form (7 items) | Longer & adaptive (66 items max)
Machine scored, instant reporting | Machine scored, instant reporting
Non-overlapping (no common items) | Adaptive from the same item pool
 | Scale scores, CCSS domain subscores, & classifications on individual standards


SLIDE 14

Data


2016-2017 academic year
91,440 students took at least one mini-assessment and the general assessment

Mini-Assessments

– Approximate number of administrations per mini-assessment: ranges from 3,000 to 47,000, with a mean of 12,000 and a median of 8,000
– Approximate number of forms per student: ranges from 1 to 80, with a median of 6 and a mean of 7.6 (including re-tests)

SLIDE 15

RQ1: Scaling the Mini-Assessments


SLIDE 16

One Set of Possible Approaches


Conduct Rasch scaling, placing the mini-assessments onto:
– the scale of the general assessment (via a fixed theta calibration approach);
– a single scale across all mini-assessments;
– CCSS domain-specific scales (5 in all); or
– individual scales for each mini-assessment.
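As a rough illustration of the first option, the sketch below assumes that a “fixed theta calibration” means treating each student's general-assessment ability estimate as known and estimating each mini-assessment item's Rasch difficulty by maximum likelihood with Newton-Raphson steps. The function names and the simulated data are hypothetical; this is not the presenters' code.

```python
import math
import random


def rasch_prob(theta, b):
    """Rasch model: probability of a correct response given ability theta and difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def calibrate_item_fixed_theta(thetas, responses, tol=1e-8, max_iter=100):
    """Maximum-likelihood difficulty for one item, holding person abilities fixed.

    thetas: ability estimates treated as known (assumed here to come from the
        general assessment); responses: 0/1 scores on one mini-assessment item.
    """
    n_correct = sum(responses)
    if n_correct in (0, len(responses)):
        raise ValueError("difficulty is not estimable for an all-correct or all-incorrect item")
    b = 0.0
    for _ in range(max_iter):
        probs = [rasch_prob(t, b) for t in thetas]
        gradient = sum(probs) - n_correct                # d logL / d b
        information = sum(p * (1.0 - p) for p in probs)  # -d^2 logL / d b^2
        step = gradient / information                    # Newton-Raphson update
        b += step
        if abs(step) < tol:
            break
    return b


# Usage with simulated data (true difficulty 0.75), not the study data.
random.seed(2018)
thetas = [random.gauss(0.0, 1.0) for _ in range(5000)]
responses = [1 if random.random() < rasch_prob(t, 0.75) else 0 for t in thetas]
b_hat = calibrate_item_fixed_theta(thetas, responses)
```

Because the abilities stay fixed on the general-assessment scale, difficulties estimated this way land on that scale as well. Re-tests and the time gap between the two administrations are ignored in this sketch.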


SLIDE 17

One Set of Possible Approaches


Conduct Rasch scaling, placing the mini-assessments onto:
– the scale of the general assessment (via a fixed theta calibration approach);
– a single scale across all mini-assessments;
– CCSS domain-specific scales (5 in all); or
– individual scales for each mini-assessment.


SLIDE 18

Domain Scaling Approach


Create a unidimensional scale for each CCSS domain using the Rasch model.

Use a pooled item response matrix (item responses from different time points and different administration patterns).

– Best case for detecting multidimensionality


SLIDE 19

Domain Scaling Approach


Examine results in terms of:

– Unidimensionality, via principal components analysis (PCA) of item residuals

– Model fit (unweighted and weighted mean-square fit statistics)
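The two fit statistics can be sketched with their usual Rasch definitions: the unweighted mean square (outfit) averages squared standardized residuals, and the weighted mean square (infit) divides the summed squared residuals by the summed response variances. Skipping `None` cells to mimic a sparse pooled matrix, the 0.75/1.33 flagging helper, and all names are assumptions for illustration; the presenters' software and exact computations are not stated.

```python
import math
import random


def rasch_prob(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(thetas, responses, b):
    """Outfit (unweighted) and infit (weighted) mean squares for one item.

    responses may contain None where the item was not administered, as in a
    sparse pooled response matrix; those cells are skipped.
    """
    sq_std_resid = []
    sq_resid_sum = 0.0
    var_sum = 0.0
    for theta, x in zip(thetas, responses):
        if x is None:
            continue
        p = rasch_prob(theta, b)
        var = p * (1.0 - p)                  # binomial variance of the response
        sq_std_resid.append((x - p) ** 2 / var)
        sq_resid_sum += (x - p) ** 2
        var_sum += var
    outfit = sum(sq_std_resid) / len(sq_std_resid)  # mean squared standardized residual
    infit = sq_resid_sum / var_sum                  # variance-weighted mean square
    return outfit, infit


def percent_flagged(infit_values, low=0.75, high=1.33):
    """Percent of items with weighted mean square below `low` and above `high`."""
    n = len(infit_values)
    pct_low = 100.0 * sum(v < low for v in infit_values) / n
    pct_high = 100.0 * sum(v > high for v in infit_values) / n
    return pct_low, pct_high


# Usage: data simulated to fit the Rasch model should give mean squares near 1.
random.seed(4)
thetas = [random.gauss(0.0, 1.0) for _ in range(4000)]
responses = [1 if random.random() < rasch_prob(t, 0.3) else 0 for t in thetas]
outfit, infit = item_fit(thetas, responses, 0.3)
```

Values near 1 indicate responses about as noisy as the model expects; values below 0.75 suggest overly predictable responses and values above 1.33 suggest misfit.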


SLIDE 20

Results - PCA


Does not exceed 2%


SLIDE 21

Results – Item Fit (Weighted MS)


Domain | % Weighted MS < 0.75 | % Weighted MS > 1.33 | # Items
Operations & Algebraic Thinking | 0% | 1% | 72
Numbers & Operations - Base Ten | 0% | 0% | 72
Numbers & Operations - Fractions | 0% | 0% | 108
Measurement & Data | 0% | 2% | 84
Geometry | 3% | 3% | 36
Max | 3% | 3% |


SLIDE 22

Future Directions


Additional Dimensionality Investigations

– EFA
– DIMTEST & DETECT
– Comparison data

Modeling Approaches

– Multigroup on time (e.g., month)
– Selecting data that best matches recommended instructional sequences
– Other models (e.g., treating the tests as attributes in a “system-level DCM”; longitudinal Rasch model)

SLIDE 23

RQ2: Creating Classifications


SLIDE 24

One Set of Possible Approaches


Create preliminary cut scores, and thus student classifications, based on:

– Cluster analysis (e.g., what diagnostic classification models (DCMs) reduce to with a single attribute)
– Content expert judgments
– The relationship between each mini-assessment and the matching standard classification from the general assessment
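With a single attribute, a DCM has only two latent classes (mastery and non-mastery), so the cluster-analysis option can be sketched as a two-class latent class model for binary item responses fit by EM. The starting values, the stopping rule, the 0.5 posterior cutoff, and the function name are assumptions for illustration only, and the data below are simulated.

```python
import math
import random


def fit_two_class_lca(data, max_iter=200, tol=1e-6, seed=0):
    """EM for a two-class latent class model with binary item responses.

    data: list of 0/1 response vectors, one per student.
    Returns (class_props, item_probs, posteriors); class 1 is ordered to be
    the higher-performing ("mastery") class.
    """
    rng = random.Random(seed)
    n_items = len(data[0])
    props = [0.5, 0.5]
    # Start one class low and one class high so the two classes separate.
    item_probs = [[rng.uniform(0.2, 0.4) for _ in range(n_items)],
                  [rng.uniform(0.6, 0.8) for _ in range(n_items)]]
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: posterior class membership for each student.
        posts, ll = [], 0.0
        for x in data:
            logs = []
            for c in range(2):
                lp = math.log(props[c])
                for p, xj in zip(item_probs[c], x):
                    lp += math.log(p if xj else 1.0 - p)
                logs.append(lp)
            top = max(logs)
            w = [math.exp(v - top) for v in logs]
            total = sum(w)
            ll += top + math.log(total)  # log-likelihood at the current parameters
            posts.append([wc / total for wc in w])
        # M-step: update class proportions and per-class item probabilities.
        for c in range(2):
            n_c = sum(post[c] for post in posts)
            props[c] = n_c / len(data)
            for j in range(n_items):
                s = sum(post[c] * x[j] for post, x in zip(posts, data))
                item_probs[c][j] = min(max(s / n_c, 1e-6), 1.0 - 1e-6)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    # Order classes so class 1 has the higher mean item probability.
    if sum(item_probs[0]) > sum(item_probs[1]):
        props.reverse()
        item_probs.reverse()
        posts = [post[::-1] for post in posts]
    return props, item_probs, posts


# Usage: 7-item form, 60% simulated masters (p = 0.85) vs. non-masters (p = 0.30).
random.seed(7)
students = []
for _ in range(1500):
    p_correct = 0.85 if random.random() < 0.6 else 0.30
    students.append([1 if random.random() < p_correct else 0 for _ in range(7)])
props, item_probs, posts = fit_two_class_lca(students)
classified_master = [post[1] > 0.5 for post in posts]
```

A raw-score cut can then be read off as the lowest total score whose students are mostly classified into the mastery class.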


SLIDE 25

The Prediction Approach


Predict the probability of the “can do” classification from the general assessment using the raw scores from the mini-assessment.

To do so, conduct quantile regression where:

– The dependent variable is the probability of classification from the general assessment administration closest to the student’s mini-assessment administration
– The independent variables are the mini-assessment raw score and the difference between the two administrations (in days)

Evaluate at multiple probabilities & quantiles.
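A minimal sketch of this step, assuming a linear model in raw score and days: fit a quantile regression at several quantiles, then solve each fitted line for the raw score at which the predicted probability reaches a target such as 0.50 or 0.67. Iteratively reweighted least squares is swapped in here as a simple, dependency-free way to minimize the quantile check loss; the presenters' estimation software is not stated. All names are hypothetical and the data are simulated.

```python
import random


def solve_linear(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def quantile_regression(X, y, tau, n_iter=200, eps=1e-6):
    """Linear quantile regression at quantile tau via iteratively reweighted least squares.

    X: rows of predictors (an intercept is prepended); y: outcomes.
    Weights are chosen so that w * r**2 equals the check loss rho_tau(r)
    at the current residuals r.
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        w = []
        for row, yi in zip(rows, y):
            r = yi - sum(bj * xj for bj, xj in zip(beta, row))
            w.append((tau if r > 0 else 1.0 - tau) / max(abs(r), eps))
        xtwx = [[sum(wi * row[a] * row[c] for wi, row in zip(w, rows)) for c in range(k)]
                for a in range(k)]
        xtwy = [sum(wi * row[a] * yi for wi, row, yi in zip(w, rows, y)) for a in range(k)]
        new_beta = solve_linear(xtwx, xtwy)
        converged = max(abs(nb - ob) for nb, ob in zip(new_beta, beta)) < 1e-10
        beta = new_beta
        if converged:
            break
    return beta


def cut_score(beta, target_prob, days):
    """Raw score at which the fitted line reaches target_prob, for a given gap in days."""
    b0, b_raw, b_days = beta
    return (target_prob - b0 - b_days * days) / b_raw


# Usage with simulated data: raw scores 0-7, gaps of 0-60 days.
random.seed(11)
X, y = [], []
for _ in range(400):
    raw = random.randint(0, 7)
    days = random.randint(0, 60)
    X.append([raw, days])
    y.append(0.10 + 0.10 * raw - 0.001 * days + random.uniform(-0.05, 0.05))

for tau in (0.25, 0.50, 0.75):
    beta = quantile_regression(X, y, tau)
    for target in (0.50, 0.67):
        print(tau, target, round(cut_score(beta, target, days=0), 2))
```

A cut that falls above the maximum raw score (7 on these forms) signals that the target probability is not reached within the observed score range for that quantile.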

SLIDE 26

[Figure: Probability of “Can Do” or Indicator Mastery vs. Mini-Assessment 1A - Place Value Total Score; marked probabilities 0.67 and 0.50; labeled value 7.2]

This value seems reasonable, but the value for P = 0.67 falls outside the range of most of the quantiles. To investigate further, we examined the relationship using only data from the second half of the year.

SLIDE 27

[Figure: Probability of “Can Do” or Indicator Mastery vs. Mini-Assessment 1A - Place Value Total Score, data after January 1st, 2017 only; marked probabilities 0.67 and 0.50; labeled value 5.5]

But… the quantile regression controlled for time?


SLIDE 28

What’s going on?


In general, the probability of a “can do” classification on the general assessment increases over the year, while the mini-assessment total scores do not.

[Figure panels: General Assessment; Mini-Assessment]

It comes down to the use case for each type of assessment.


SLIDE 29

Future Directions


Further examine the time issue.

– Re-sample to have equal numbers of administrations by month?
– Look at changes in scores on the mini-assessments?
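One possible reading of the re-sampling idea is to down-sample administrations so that every calendar month contributes the same number of records, which keeps months with heavy testing from dominating the fitted relationship. The record layout (dicts with a `"date"` field) and the function name are hypothetical; this is a sketch, not a procedure from the study.

```python
import random
from collections import Counter, defaultdict
from datetime import date


def resample_equal_by_month(records, seed=0):
    """Down-sample administrations so each calendar month contributes equally.

    records: dicts with a "date" key holding a datetime.date (hypothetical schema).
    Each month is sampled without replacement down to the size of the smallest month.
    """
    by_month = defaultdict(list)
    for rec in records:
        by_month[(rec["date"].year, rec["date"].month)].append(rec)
    n_per_month = min(len(group) for group in by_month.values())
    rng = random.Random(seed)
    sample = []
    for month in sorted(by_month):
        sample.extend(rng.sample(by_month[month], n_per_month))
    return sample


# Usage with toy records: 5 in Sep 2016, 3 in Oct 2016, 8 in Jan 2017.
records = ([{"date": date(2016, 9, d)} for d in range(1, 6)]
           + [{"date": date(2016, 10, d)} for d in range(1, 4)]
           + [{"date": date(2017, 1, d)} for d in range(1, 9)])
balanced = resample_equal_by_month(records)
month_counts = Counter((r["date"].year, r["date"].month) for r in balanced)
```

Matching to the smallest month discards data; sampling with replacement or weighting by inverse month size are alternatives with different trade-offs.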