SLIDE 1

Efficiency of Scoring Innovative Items in Educational Assessment

Shudong Wang, NWEA

Paper presented at the NCSA National Conference on Student Assessment, June 24-26, 2019, Orlando, Florida

SLIDE 2

  • I. Introduction

▪ Choosing Item Format/Type in Assessment

✓ Selected-response/Objective scoring
✓ Constructed-response/Objective scoring
✓ Constructed-response/Subjective scoring

▪ Computer Use in Education and Technology-enhanced Item (TEI) Types (Zenisky & Sireci, 2002; Bennett, 1993)

✓ Selection/identification (drag-and-drop, hot-spot)
✓ Reordering/rearrangement (concept-mapping, create-a-tree)
✓ Completion (graphical modeling, mathematical expressions)
✓ Construction (generating examples, formulating hypotheses, essay/short answer, passage-editing)
✓ Presentation (problem-solving vignettes, role play)
SLIDE 3

[Example TEI screenshots: Graphic Gap Match; Multiple Choice]

SLIDE 4

▪ Advantages and Disadvantages of Multiple-choice (MC) Items and TEIs

MC:

✓ Advantages: efficient administration, automated scoring, broad content coverage, and high reliability
✓ Disadvantages: difficult to write MC items that evoke complex cognitive processes

TEI:

✓ Advantages: improved construct representation; facilitates more authentic and direct measurement of knowledge, skills, and abilities (KSA) than the MC format allows; higher fidelity
✓ Disadvantages: a source of construct-irrelevant variance, such as computer literacy

▪ Five Dimensions of TEI

✓ Item format
✓ Response action
✓ Media inclusion
✓ Level of interactivity
✓ Scoring method

SLIDE 5

▪ Relationship between Score (D vs. P) and Item Type (MC vs. TEI)

[Diagram: item types (MC, TEI) crossed with score types (Dichotomous (D), Polytomous (P), D & P), yielding the combinations D_D, D_P, and P_P]

There are three commonly used scoring methods for TEIs (N is the number of components; see the sketch after Table 1 on the next slide):

1. N Method
2. N/2 Method
3. All-or-Nothing Method (AONM)

SLIDE 6

AONM:

0 = 0; 1 = 1, 2, 3, 4
0 = 0, 1; 1 = 2, 3, 4
0 = 0, 1, 2; 1 = 3, 4
0 = 0, 1, 2, 3; 1 = 4

Table 1. Examples of Different Scoring Methods

Score:                Dichotomous (D): 1    Polytomous (P): 1, 2, 3, 4
Total Components (N): D: 1                  P: 4
Total Categories:     D: 2                  P: 5

Response | D1 | D2 | D3 | D4 | N | N/2
1        | 1  |    |    |    | 1 | 0
2        | 1  | 1  |    |    | 2 | 1
3        | 1  | 1  | 1  |    | 3 | 1
4        | 1  | 1  | 1  | 1  | 4 | 2
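To make the three methods concrete, here is a minimal Python sketch of scoring one component-based response (this is an illustration, not the paper's code; the floor-halving N/2 rule follows the Table 1 values above):

```python
def score_item(components, method):
    """Score one TEI response given a list of 0/1 component results."""
    n_correct = sum(components)
    if method == "N":
        return n_correct                           # polytomous: 0..N
    if method == "N2":
        return n_correct // 2                      # halved categories (floor, per Table 1)
    if method == "AONM":
        return int(n_correct == len(components))   # all-or-nothing: 0/1
    raise ValueError(method)

# Example: three of four components correct
print(score_item([1, 1, 1, 0], "N"))     # 3
print(score_item([1, 1, 1, 0], "N2"))    # 1
print(score_item([1, 1, 1, 0], "AONM"))  # 0
```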

SLIDE 7

▪ Review of Research on Scoring Methods for TEI Types

Table 2. Types of Research

Type of Research | Response Time Involved | Item Level | Test Level | Results (Efficiency*)
1. Relationship between Dichotomous (D) and Polytomous (P) | No | Yes | Yes | P is better than D
2. Relationship between Dichotomous (D) and Polytomous (P) | Yes | Yes | Yes | D is better than P
3. Partial Credit Scoring Method | Yes/No | Yes | Yes | Optimal is better than both the N and N/2 methods
4. Relationship between Dichotomous-D (D-D) and Dichotomous-P (D-P) | No | Yes | Yes | ?

* Efficiency is defined as the mean weighted item information divided by the average time spent on an item within an item type (Wan & Henly, 2012).
1 & 2: Ripkey & Case, 1996; Jiao et al., 2012; Bauer et al., 2011; Ben-Simon et al., 1997; Wan & Henly, 2012.
3: Muckle, Becker, & Wu, 2011; Becker & Soni, 2013; Lorié, 2014; Clyne, 2015; Tao, 2018; Tao & Mix, 2017.
4: Current research.
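Restating the starred footnote as a formula (a paraphrase; the symbols are our shorthand, with $\bar{I}_w$ the mean weighted item information and $\bar{t}$ the average time spent per item within an item type):

$$\mathrm{Efficiency} = \frac{\bar{I}_w}{\bar{t}}$$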

SLIDE 8

Purpose of This Study:

To investigate the efficiency of scoring methods for TEIs in educational assessments

SLIDE 9

  • II. Method

1. The Monte Carlo technique is an appropriate choice for this problem; both descriptive methods and inferential procedures are used in this study.

2. Independent Variable

Scoring method (MC, CR3, 1CR4, 2CR4, 1CR5, 2CR5, 3CR5), as defined in Table 3.

3. Dependent Variables

p-Value, point-biserial, KR20 reliability, test information, and test efficiency (the ratio of test information between two tests)

Table 3. Scoring Methods

Scoring Method | Type of Item | N of Categories | Original Response String (ORS) | New Response String (NRS) | Collapse Rule to Generate NRS
MC   | MC  | 2 | 0,1       | 0,1 | None
CR3  | CR3 | 3 | 0,1,2     | 0,1 | 0=(0); 1=(1,2)
1CR4 | CR4 | 4 | 0,1,2,3   | 0,1 | 0=(0); 1=(1,2,3)
2CR4 | CR4 | 4 | 0,1,2,3   | 0,1 | 0=(0,1); 1=(2,3)
1CR5 | CR5 | 5 | 0,1,2,3,4 | 0,1 | 0=(0); 1=(1,2,3,4)
2CR5 | CR5 | 5 | 0,1,2,3,4 | 0,1 | 0=(0,1); 1=(2,3,4)
3CR5 | CR5 | 5 | 0,1,2,3,4 | 0,1 | 0=(0,1,2); 1=(3,4)
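The collapse rules in Table 3 are simple thresholds on the original score; a minimal Python sketch (the names COLLAPSE_CUTS and collapse are ours):

```python
# Threshold encoding of Table 3: an original score at or above the cut
# point becomes 1 in the new (dichotomous) response string.
COLLAPSE_CUTS = {
    "CR3": 1,   # 0=(0);     1=(1,2)
    "1CR4": 1,  # 0=(0);     1=(1,2,3)
    "2CR4": 2,  # 0=(0,1);   1=(2,3)
    "1CR5": 1,  # 0=(0);     1=(1,2,3,4)
    "2CR5": 2,  # 0=(0,1);   1=(2,3,4)
    "3CR5": 3,  # 0=(0,1,2); 1=(3,4)
}

def collapse(ors, method):
    """Collapse an original response string (polytomous scores) into a
    dichotomous new response string."""
    if method == "MC":
        return list(ors)  # already 0/1, no collapsing
    cut = COLLAPSE_CUTS[method]
    return [int(score >= cut) for score in ors]

# Example: the five possible CR5 scores collapsed under the 2CR5 rule
print(collapse([0, 1, 2, 3, 4], "2CR5"))  # [0, 0, 1, 1, 1]
```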

SLIDE 10

  • 4. Major Steps of the Simulation

Step 1: Generate person parameters (2,000) and item parameters (20 per scoring method) for each of the tests: MC(20) + CR3(20), MC(20) + CR4(20), and MC(20) + CR5(20)
Step 2: Generate item responses based on the Rasch and PCM models for each 40-item test
Step 3: Collapse the original CR response strings into dichotomous (D_P) response strings using the collapsing rules of the different scoring methods
Step 4: Calibrate item parameters with person parameters fixed
Step 5: Repeat Steps 2-4 100 times (100 simulated tests); person parameters differ across the 100 replications while item parameters are fixed
Step 6: Calculate item and test statistics with the CTT and IRT methods (the five types of dependent variables) based on the results obtained in Steps 4 and 5
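A condensed Python sketch of Steps 1-2 under the stated models (the parameter distributions and seed are illustrative assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def p_rasch(theta, b):
    """Rasch probability of a correct (1) response."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def pcm_probs(theta, b, taus):
    """Partial Credit Model category probabilities for categories 0..len(taus)."""
    steps = np.concatenate(([0.0], np.cumsum(theta - b - np.asarray(taus))))
    expd = np.exp(steps - steps.max())
    return expd / expd.sum()

# Step 1: person and item parameters (distributions are assumptions)
thetas = rng.normal(0, 1, 2000)          # 2,000 simulees
b_mc = rng.normal(0, 1, 20)              # 20 MC item difficulties
b_cr5 = rng.normal(0, 1, 20)             # 20 CR5 item difficulties
taus_cr5 = rng.normal(0, 0.5, (20, 4))   # 4 step parameters per CR5 item

# Step 2: responses for one MC(20) + CR5(20) test
mc_resp = (rng.random((2000, 20)) < p_rasch(thetas[:, None], b_mc[None, :])).astype(int)
cr5_resp = np.array([[rng.choice(5, p=pcm_probs(t, b, tau))
                      for b, tau in zip(b_cr5, taus_cr5)] for t in thetas])
```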

SLIDE 11

  • III. Results
  • 1. Item/Test Analysis Results from CTT

Table 4. Overall Means (20 Items) of p-Value, Point-biserial, and KR20 for Different Scoring Methods

Scoring Method | p-Value | Point-biserial | KR20
D_MC   | 0.52 | 0.44 | 0.78
D_CR3  | 0.67 | 0.46 | 0.81
D_1CR4 | 0.71 | 0.50 | 0.84
D_2CR4 | 0.50 | 0.55 | 0.88
D_1CR5 | 0.72 | 0.48 | 0.80
D_2CR5 | 0.58 | 0.51 | 0.84
D_3CR5 | 0.47 | 0.50 | 0.84
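For reference, a minimal Python sketch of the three CTT statistics in Table 4 (the function name and the uncorrected point-biserial are our choices):

```python
import numpy as np

def ctt_stats(X):
    """CTT statistics for a persons-by-items matrix of 0/1 scores."""
    n_persons, n_items = X.shape
    total = X.sum(axis=1)
    p = X.mean(axis=0)  # item p-values (proportion correct)
    # Uncorrected point-biserial: correlation of each item with the total score
    pbis = np.array([np.corrcoef(X[:, j], total)[0, 1] for j in range(n_items)])
    # KR-20 reliability: (k/(k-1)) * (1 - sum(p*q) / total-score variance)
    kr20 = (n_items / (n_items - 1)) * (1 - (p * (1 - p)).sum() / total.var(ddof=1))
    return p, pbis, kr20
```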

SLIDE 12

  • 2. Item/Test Analysis Results from IRT

Figure 1. Person Test Information from Both Dichotomous and Polytomous Responses Based on True Item Parameters for a Given Test 1 (Replication 1)
[Line plot: test information vs. theta for Inf_MC, Inf_CR3, Inf_CR4, Inf_CR5 (dichotomous to polytomous)]
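Curves like those in Figures 1 and 2 follow from standard IRT results: Rasch item information is P(1 - P), and PCM item information is the conditional variance of the item score. A sketch reusing pcm_probs (and numpy as np) from the simulation snippet above:

```python
def rasch_test_info(theta, bs):
    """Test information for dichotomous Rasch items: sum of P(1 - P)."""
    p = 1.0 / (1.0 + np.exp(-(theta - np.asarray(bs))))
    return float(np.sum(p * (1.0 - p)))

def pcm_item_info(theta, b, taus):
    """PCM item information: Var(X | theta) over the category distribution."""
    probs = pcm_probs(theta, b, taus)  # helper defined in the earlier sketch
    cats = np.arange(len(probs))
    mean = np.sum(cats * probs)
    return float(np.sum((cats - mean) ** 2 * probs))
```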
SLIDE 13

Figure 2. Person Test Information from Dichotomous Responses Based on Estimated Item Parameters by Different Scoring Methods for a Given Test 80 (Replication 80)
[Line plot: test information vs. theta for Inf_D_MC, Inf_D_CR3, Inf_D_1CR4, Inf_D_2CR4, Inf_D_1CR5, Inf_D_2CR5, Inf_D_3CR5]

SLIDE 14

Figure 3. Relative Efficiency of Person Tests with Non-MC Dichotomous-Response Items over MC Responses, Based on Estimated Item Parameters by Different Scoring Methods, for a Given Test 80 (Replication 80)
[Line plot: relative efficiency vs. theta for EF_D_CR3, EF_D_1CR4, EF_D_2CR4, EF_D_1CR5, EF_D_2CR5, EF_D_3CR5]
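The relative efficiency curves are pointwise ratios of test information; a sketch building on rasch_test_info above (the calibrated difficulty arrays here are hypothetical placeholders, not the study's estimates):

```python
# Hypothetical calibrated difficulties for the MC test and a collapsed D_CR3 test
b_mc_hat = np.linspace(-2, 2, 20)
b_d_cr3_hat = np.linspace(-1.5, 2.5, 20)

theta_grid = np.linspace(-4, 4, 81)
# Pointwise ratio of test information: values below 1 mean the collapsed
# scoring is less efficient than MC at that theta (cf. Figure 3, Table 5).
rel_eff = np.array([rasch_test_info(t, b_d_cr3_hat) / rasch_test_info(t, b_mc_hat)
                    for t in theta_grid])
```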

SLIDE 15

Table 5. Overall Averages of Test Information and Efficiency for Different Scoring Methods

Dependent Variable | Type | Scoring Method | N | MIN | MAX | MEAN | STD | SEM
Information | I   | inf_MC     | 100 | 3.54  | 3.63  | 3.58  | 0.01 | 0.53
Information | I   | inf_CR3    | 100 | 6.79  | 6.91  | 6.84  | 0.02 | 0.38
Information | I   | inf_CR4    | 100 | 11.60 | 11.91 | 11.73 | 0.05 | 0.29
Information | I   | inf_CR5    | 100 | 12.43 | 12.56 | 12.49 | 0.02 | 0.28
Information | I   | inf_D_MC   | 100 | 3.58  | 3.68  | 3.62  | 0.02 | 0.53
Information | II  | inf_D_CR3  | 100 | 2.92  | 3.02  | 2.96  | 0.02 | 0.58
Information | II  | inf_D_1CR4 | 100 | 2.69  | 2.83  | 2.76  | 0.03 | 0.60
Information | II  | inf_D_2CR4 | 100 | 3.20  | 3.25  | 3.22  | 0.01 | 0.56
Information | II  | inf_D_1CR5 | 100 | 2.23  | 2.70  | 2.47  | 0.07 | 0.64
Information | II  | inf_D_2CR5 | 100 | 2.24  | 2.51  | 2.39  | 0.11 | 0.65
Information | II  | inf_D_3CR5 | 100 | 2.18  | 2.25  | 2.21  | 0.01 | 0.67
Efficiency  | I   | EF_CR3     | 100 | 1.92  | 1.94  | 1.93  | 0.00 |
Efficiency  | I   | EF_CR4     | 100 | 3.25  | 3.27  | 3.26  | 0.00 |
Efficiency  | I   | EF_CR5     | 100 | 3.53  | 3.60  | 3.57  | 0.01 |
Efficiency  | II  | EF_D_CR3   | 100 | 0.81  | 0.84  | 0.82  | 0.01 |
Efficiency  | II  | EF_D_1CR4  | 100 | 0.75  | 0.79  | 0.77  | 0.01 |
Efficiency  | II  | EF_D_2CR4  | 100 | 0.89  | 0.91  | 0.90  | 0.00 |
Efficiency  | II  | EF_D_1CR5  | 100 | 0.63  | 0.76  | 0.69  | 0.02 |
Efficiency  | II  | EF_D_2CR5  | 100 | 0.63  | 0.72  | 0.68  | 0.03 |
Efficiency  | II  | EF_D_3CR5  | 100 | 0.61  | 0.65  | 0.64  | 0.01 |
Efficiency  | III | EF_D_CR3M  | 100 | 0.43  | 0.45  | 0.44  | 0.00 |
Efficiency  | III | EF_D_1CR4M | 100 | 0.24  | 0.25  | 0.24  | 0.00 |
Efficiency  | III | EF_D_2CR4M | 100 | 0.28  | 0.28  | 0.28  | 0.00 |
Efficiency  | III | EF_D_1CR5M | 100 | 0.18  | 0.22  | 0.20  | 0.01 |
Efficiency  | III | EF_D_2CR5M | 100 | 0.18  | 0.20  | 0.19  | 0.01 |
Efficiency  | III | EF_D_3CR5M | 100 | 0.17  | 0.18  | 0.18  | 0.00 |

SLIDE 16

3. Inferential Statistics Results

Statistical Hypotheses: The scoring method has no effect on any of the dependent variables under the different simulation conditions. All hypotheses were rejected; that is, the scoring method makes a difference for every dependent variable.

Summary of Results

▪ Efficiency of person scores increases as the number of item response categories increases
▪ On average, information from D_P responses is less than that from D_D responses
▪ Under the simulation conditions, the optimal number of categories for D_P scoring methods is 4, not 5
SLIDE 17

  • IV. Conclusions

1. Different scoring methods have an impact on the efficiency of scores
2. Scoring TEIs as MC does not increase efficiency
3. A large number of categories (or components) is not necessarily the best choice for the D_P scoring method

SLIDE 18

Thank you! For any questions:

Shudong.wang@NWEA.org
