Integrated Profile Method: An Innovative Approach in Standard - - PowerPoint PPT Presentation


Integrated Profile Method: An Innovative Approach in Standard Setting National Conference on Student Assessment Dr. Liru Zhang liru.zhang@doe.k12.de.us Theresa Bennett Theresa.bennett@doe.k12.de.us June 27-29, 2018 Standard Setting for


SLIDE 1

Integrated Profile Method: An Innovative Approach in Standard Setting

National Conference on Student Assessment

Dr. Liru Zhang, liru.zhang@doe.k12.de.us
Theresa Bennett, Theresa.bennett@doe.k12.de.us
June 27-29, 2018

SLIDE 2

Standard Setting for Performance Assessments

  • Standard setting is a critical step in the design of high-stakes testing programs (Kane, 2001). With advancing technology and the increasing use of performance assessments, standard setting methods have evolved over the past decades to meet new challenges, such as verifying the defensibility of cut scores and providing evidence to validate the process (Zieky, 2001).
  • For battery tests and performance tasks (e.g., CR items, direct writing, and scientific simulations), multiple scores are usually reported to capture all aspects of an examinee’s performance. Those scores generally form a profile.
  • Judgmental policy capturing (Jaeger, 1994) is a judgment-centered method based on an overall review of score profiles to classify each profile into a pass or fail category.

SLIDE 3

Profile Approach

  • The profile approach has been used to set performance standards for the Kindergarten Readiness Assessment in GA (Donahue, et al., 2000), the California Alternate Performance Assessment (Morgan, 2004), and the redesigned AP tests (Morgan, et al., 2015).
  • In practice, the profile-based approach provides a visual representation of student performance in the form of a profile to facilitate panelists’ review and evaluation. In some cases, multiple profiles are arranged by raw score into an ordered profile packet in which panelists can place bookmarks for achievement levels.
  • With the profile approach, the decision-making strategy, such as compensatory or non-compensatory, conjunctive or disjunctive, should be decided.
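The decision strategies named above can be contrasted in a short sketch. This is an illustration only: the function names, the cut values, and the sample profile are hypothetical and are not taken from any of the cited studies.

```python
from typing import Sequence


def compensatory(scores: Sequence[int], total_cut: int) -> bool:
    """Pass if the summed score reaches the cut; strong areas offset weak ones."""
    return sum(scores) >= total_cut


def conjunctive(scores: Sequence[int], minimums: Sequence[int]) -> bool:
    """Pass only if every score meets its own minimum (non-compensatory)."""
    return all(s >= m for s, m in zip(scores, minimums))


def disjunctive(scores: Sequence[int], minimums: Sequence[int]) -> bool:
    """Pass if at least one score meets its own minimum."""
    return any(s >= m for s, m in zip(scores, minimums))


# Hypothetical three-score profile: strong, weak, strong.
profile = (8, 2, 8)
print(compensatory(profile, total_cut=15))       # sum is 18, so the weak score is offset
print(conjunctive(profile, minimums=(4, 4, 4)))  # the 2 falls below its minimum
print(disjunctive(profile, minimums=(4, 4, 4)))  # one score at or above 4 is enough
```

The same profile passes under the compensatory and disjunctive rules and fails under the conjunctive rule, which is why the strategy has to be settled before panelists classify profiles.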

SLIDE 4

Integrated Profile Method (1)

  • The Integrated Profile Method (IPM) is an innovative approach to setting cut scores for performance-based assessments with a limited number of tasks.
  • A performance task (PT) is typically designed to measure different components or dimensions in terms of content categories at various cognitive complexity levels. A PT often yields multiple scores.
  • Those scores may or may not be aggregated, depending on the test design and the scoring process.

SLIDE 5

Integrated Profile Method (2)

  • To implement the hybrid model, a general two-phase procedure was developed. In Phase I, participants review a sample of responses spanning the full performance continuum, component by component, to identify the minimally acceptable score for each component under the non-compensatory approach. In Phase II, participants review the overall performance by task to distinguish proficient from non-proficient profiles under the compensatory approach. Two examples follow.

SLIDE 6

Two Examples

Assessment A: One Task

  • Component 1
  • Component 2
  • Component 3

Assessment B: Two Tasks

  • T1: Comp. 1a, Comp. 1b, Comp. 1c
  • T2: Comp. 2a, Comp. 2b, Comp. 2c

SLIDE 7

Integrated Profile Method (3)

The notion of replicability is central to standard setting regardless of the specific context or standard setting method (AERA, APA, NCME, 1999).

  • The IPM is designed to be replicated with multiple groups to examine the generalizability of the performance standards (or cut scores) and provide validity evidence for the standard setting process.
  • The IPM employs a two-phase design that enables participants to establish the decision rules more efficiently.
  • The IPM comprises a two-round process in each phase to give participants sufficient time for individual review and multiple opportunities to discuss professional judgments before making decisions.

SLIDE 8

An Application of IPM

  • The redesigned SAT Essay measures how well students understand a passage and use it as the basis for a well-written, thought-out discussion. In operation, examinees read a passage (topic) adapted from previous publications, analyze information and evidence, and write an essay within 50 minutes.
  • The quality of an essay is evaluated in three categories, with 1-4 points awarded for each. Examinees receive three non-aggregated dimension scores for reading, analysis, and writing, each ranging from 2 to 8 points across two raters.
  • Due to the inconsistency of hand scoring between raters and misclassifications on the narrow raw-score scale (2-8), only two achievement levels, Proficient and Non-Proficient, are reported.
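The scoring arithmetic described above can be sketched as follows. The sketch assumes that the two raters' 1-4 ratings are summed per dimension, which is consistent with the 2-8 range stated on the slide; the function and variable names are hypothetical.

```python
DIMENSIONS = ("reading", "analysis", "writing")


def dimension_scores(rater_a: dict, rater_b: dict) -> dict:
    """Combine two raters' 1-4 ratings into one 2-8 score per dimension.

    Assumption: the reported dimension score is the sum of the two ratings.
    """
    for rating in (rater_a, rater_b):
        for dim in DIMENSIONS:
            if not 1 <= rating[dim] <= 4:
                raise ValueError(f"{dim} rating must be between 1 and 4")
    return {dim: rater_a[dim] + rater_b[dim] for dim in DIMENSIONS}


# Hypothetical ratings for one essay.
first = {"reading": 3, "analysis": 2, "writing": 3}
second = {"reading": 2, "analysis": 2, "writing": 3}
profile = dimension_scores(first, second)
print(profile)                # the three non-aggregated dimension scores
print(sum(profile.values()))  # a summed score would fall between 6 and 24
```

A one-point disagreement between raters moves a dimension score by one step on a seven-point scale, which illustrates why the narrow 2-8 range is sensitive to rater inconsistency.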

SLIDE 9

The Essay

SLIDE 10

Achievement Level Descriptors

SLIDE 11

Achievement Level Descriptor – Level 1

SLIDE 12

Achievement Level Descriptor – Level 2

SLIDE 13

Achievement Level Descriptor – Level 3

SLIDE 14

Achievement Level Descriptor – Level 4

SLIDE 15

SAT Essay Standard Setting

On February 23-24, 2017:

  • Educators representing 15 districts and charters engaged in standard setting
  • Administrators from 6 districts and charters served as observers to give feedback on the process
  • Institutes of Higher Education were included in the process
  • Literacy Cadre Instructional Coaches served as content-expert facilitators to ensure content aligns to the standards

SLIDE 16

Roles and Responsibilities of Table Leaders

  • Made sure all participants completed a confidentiality agreement
  • Facilitated both small-group and large-group discussions
  • Made sure participants understood and stayed on task
  • Documented results of each phase of standard setting
  • Disseminated and collected all materials
  • Gave feedback on the process
  • Reported breaches of confidentiality

SLIDE 17

Process and Results (1)

  • In 2016, Delaware adopted the SAT as its high-school assessment. Essay scores are used as a supplemental indicator for high-stakes accountability. The two-day standard setting was held in February 2017.
  • The panel consisted of 26 participants from public schools and higher education. The majority were classroom teachers with expertise in writing at the high-school level. The participants were assigned to four groups of 6-7 each.
  • A sample of 141 essays was randomly selected based on observed profiles from grade 11 on the 2016 School Day. The essay sample was split into four packages of 30-35 essays each, with 5-7 overlapping essays.
  • Three half-day trainings were designed to fit the needs: Group Leader training, Phase I training, and Phase II training.

SLIDE 18

Standard Setting (2)

  • In Phase I, the panelists built a better understanding of the dimension scores, the scoring rubric, and student performance through a careful review of each dimension of the essays. From the first round to the second, the range of minimally acceptable dimension scores narrowed noticeably (Table 1). The median of the panel ratings changed from 4, 4, 4 to 5, 4, 5 for reading, analysis, and writing, respectively.
  • The identified dimension scores served as the starting points in Phase II, which efficiently reduced the scope of profiles and facilitated the process. The review of overall essay quality helped panelists comprehend the uniqueness of each dimension and its connection and contribution to quality writing. The panel focused on borderline performance and meaningful profiles to arrive at the decision rules (Table 2).

SLIDE 19

Summary of Phase I

Dimension   Round One, N (%)                        Round Two, N (%)
Score       Reading     Analysis    Writing     Reading     Analysis    Writing
3           0 (0.0)     6 (23.1)    0 (0.0)     0 (0.0)     4 (15.4)    0 (0.0)
4           14 (53.8)   14 (53.8)   16 (61.5)   5 (19.2)    18 (69.2)   12 (46.2)
5           9 (34.6)    5 (19.2)    9 (34.6)    21 (80.0)   4 (15.4)    14 (53.8)
6           3 (11.5)    1 (3.8)     1 (3.8)     0 (0.0)     0 (0.0)     0 (0.0)
Median      4 (4-6)     4 (3-6)     4 (4-6)     5 (4-5)     4 (3-5)     5 (4-5)

In the Median row, the range of panel ratings is given in parentheses.
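As a check on the summary, the median and range of the panel ratings can be recomputed from the panelist counts. The sketch below copies the counts from the Phase I summary; the variable and function names are hypothetical.

```python
from statistics import median

# Number of panelists choosing each minimally acceptable score,
# copied from the Phase I summary (26 panelists per dimension).
ROUND_ONE = {
    "reading": {4: 14, 5: 9, 6: 3},
    "analysis": {3: 6, 4: 14, 5: 5, 6: 1},
    "writing": {4: 16, 5: 9, 6: 1},
}
ROUND_TWO = {
    "reading": {4: 5, 5: 21},
    "analysis": {3: 4, 4: 18, 5: 4},
    "writing": {4: 12, 5: 14},
}


def summarize(counts: dict) -> tuple:
    """Return (median, lowest, highest) rating for one dimension."""
    # Expand the frequency table into one rating per panelist.
    ratings = [score for score, n in sorted(counts.items()) for _ in range(n)]
    return median(ratings), min(ratings), max(ratings)


for label, table in (("Round One", ROUND_ONE), ("Round Two", ROUND_TWO)):
    for dimension, counts in table.items():
        print(label, dimension, summarize(counts))
```

The recomputed medians move from 4, 4, 4 in Round One to 5, 4, 5 in Round Two, and the highest rating drops from 6 to 5 in every dimension, matching the narrowing described for Phase I.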

SLIDE 20

Impact Data in Phase I

Score   Reading                  Analysis                 Writing
        N      c%     Impact     N      c%     Impact     N      c%     Impact
2       427    5.7               2321   31.2              691    9.3
3       786    16.3              1402   50.0              911    21.5
4       2113   44.7              1735   73.3   50%        2029   48.8
5       1848   69.5   55%        1082   87.9              1594   70.2   51%
6       1794   93.6              657    96.7              1802   94.4
7       376    98.7              186    99.2              324    98.8
8       99     100.0             60     100.0             92     100.0
Total   7443                     7443                     7443
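The c% and Impact columns can be recomputed from the frequency counts. The sketch below assumes that c% is the cumulative percentage at or below each score and that Impact is the percentage of examinees at or above the median Phase I cut (5, 4, 5); the slide does not define either column, so both readings are inferences that happen to reproduce the reported figures. The names are hypothetical.

```python
from itertools import accumulate

SCORES = range(2, 9)  # dimension scores 2 through 8

# Frequency counts copied from the Phase I impact data.
COUNTS = {
    "reading": [427, 786, 2113, 1848, 1794, 376, 99],
    "analysis": [2321, 1402, 1735, 1082, 657, 186, 60],
    "writing": [691, 911, 2029, 1594, 1802, 324, 92],
}


def cumulative_percent(counts: list) -> list:
    """Percentage of examinees at or below each score, to one decimal."""
    total = sum(counts)
    return [round(100 * running / total, 1) for running in accumulate(counts)]


def impact(counts: list, cut: int) -> float:
    """Percentage of examinees scoring at or above the cut."""
    total = sum(counts)
    at_or_above = sum(n for score, n in zip(SCORES, counts) if score >= cut)
    return 100 * at_or_above / total


for dimension, cut in (("reading", 5), ("analysis", 4), ("writing", 5)):
    print(dimension, cumulative_percent(COUNTS[dimension]), round(impact(COUNTS[dimension], cut)))
```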

SLIDE 21

Summary of Phase II

Round One

Condition 1                    Condition 2      Sum             N (%)
DS ≥ 3                                          13, 13-14       7 (.27)
DS ≥ 3                         one condition    13, 12-14       5 (.19)
AS ≥ 3                                          11, 12 11-12    5 (.19)
Other or multiple conditions                    12-14           9 (.35)

Round Two

Condition 1                    Condition 2      Sum             N (%)
DS ≥ 3                                          13              8 (.31)
DS ≥ 3                                          13-14           6 (.23)
DS ≥ 3                                          14              2 (.08)
Other or multiple conditions                    14              10 (.38)

DS ≥ 3    13-14

SLIDE 22

Impact Data in Phase II

Descriptive Statistics for 2016 SAT on School Day

Score      N      Minimum   Maximum   Mean   SD
Reading    7443   2         8         4.7    1.305
Analysis   7443   2         8         3.6    1.463
Writing    7443   2         8         4.6    1.393
HES        7443   6         24        12.9   3.817

Sum ≥ 14 and Dimension Scores ≥ 3: 42% Proficient
Sum ≥ 13 and Dimension Scores ≥ 3: 49% Proficient
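The two candidate rules above combine a non-compensatory floor on each dimension score with a compensatory cut on the sum. A minimal sketch of applying such a rule follows. The profiles in it are made up for illustration: the slides report only the marginal score distributions, so the 42% and 49% figures cannot be reproduced here. The function names are hypothetical.

```python
def is_proficient(profile: tuple, sum_cut: int, min_dimension: int = 3) -> bool:
    """Hybrid rule: every dimension score meets the floor AND the sum meets the cut."""
    return min(profile) >= min_dimension and sum(profile) >= sum_cut


def percent_proficient(profiles: list, sum_cut: int, min_dimension: int = 3) -> float:
    """Percentage of profiles classified as proficient under the rule."""
    passed = sum(is_proficient(p, sum_cut, min_dimension) for p in profiles)
    return 100 * passed / len(profiles)


# Hypothetical (reading, analysis, writing) profiles, NOT observed data.
profiles = [(5, 3, 5), (6, 2, 6), (4, 4, 4), (5, 4, 5), (3, 3, 3)]

print(percent_proficient(profiles, sum_cut=13))  # share passing the lower sum cut
print(percent_proficient(profiles, sum_cut=14))  # share passing the higher sum cut
```

In this toy set, the profile (6, 2, 6) has a sum of 14 yet is classified as non-proficient because one dimension falls below 3, which is the kind of case the dimension floor is meant to catch.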

SLIDE 23

Two-Year Impact Data

Test Form     2016                               2017
              N      Mean    SD     Prof. (%)    N      Mean    SD     Prof. (%)
Major Forms   8041   12.7    3.84   47           8566   13.17   3.46   55
Form 1        204    11.01   3.13   28           235    12.40   3.84   43
Form 2        394    9.83    3.21   17           550    10.21   3.18   19
Form 3        7443   12.9    3.82   49           7671   13.46   3.35   58
Form 4        -      -       -      -            110    9.88    2.98   16

SLIDE 24

Final Notes

  • As with other aspects of an assessment program, it is important to provide information and evidence regarding the generalizability of the results of a standard setting (Kingston, 2001).
  • Additional validity evidence should be collected from participants to support the process and the performance standards.
  • It is important to note that the sample profiles should be randomly selected from observed profiles, including typical and unique profiles, observed and non-observed profiles.
  • The more tasks, the more components per task, and the more score points per component, the more potential profiles there are.
  • Standard setting is not a process for ‘re-scoring’. Carefully selected and pre-reviewed sample responses are needed.
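The note above about the growth in potential profiles can be made concrete: the number of possible profiles is the product of the number of score points across all components. The sketch below applies this to the essay described earlier (three dimensions, each scored 2-8); the function name is hypothetical.

```python
from itertools import product
from math import prod


def count_profiles(score_points_per_component: list) -> int:
    """Number of distinct profiles: the product of score points per component."""
    return prod(score_points_per_component)


# One task with three dimensions, each with 7 score points (2 through 8).
essay_profiles = list(product(range(2, 9), repeat=3))
print(len(essay_profiles), count_profiles([7, 7, 7]))

# Adding a second task with three more 7-point components multiplies the count.
print(count_profiles([7] * 6))
```

Even the single-task essay allows several hundred possible profiles, far more than the 141 sampled essays, so the choice of which observed profiles to put in front of panelists matters.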
