Data Forensics: Review of Findings and Use of Realistic Simulation




SLIDE 1

Data Forensics: Review of Findings and Use of Realistic Simulation

  • Mayuko Simon
  • Christie Plackner
  • David Chayer

Data Recognition Corporation

  • June, 2014
SLIDE 2
  • Introduction
  • What we have learned thus far
  • Experimentation with realistic simulation

SLIDE 3
  • Supporting clients with multiple measures to allow for more information and perspective on the data

  • Focus is not on students
  • Aware of emerging guidelines and best practices
  • TILSA Test Security Guidebook (Olson & Fremer, 2013)
  • CCSSO Operational Best Practices, part 2 (September, 2013)
  • Testing Integrity Symposium: Issues and Recommendations for Best Practice (U.S. Dept. of Education, Institute of Education Sciences, National Center for Education Statistics, 2013)
  • Testing and Data Integrity in the Administration of Statewide Student Assessment Programs (NCME, October 2012)

  • Handbook of Test Security (Wollack & Fremer, 2013)
  • Conference on the Statistical Detection of Test Fraud
  • Test Fraud: Statistical Detection and Methodology (Kingston & Clark, 2014)
SLIDE 4

A Sample of Forensic Methods

  • Erasure
  • Scale Score
  • Pattern Analysis
  • Model Fit
  • Local Outlier Detection
SLIDE 5

WR Erasure Distribution

  • Wrong-to-right (WR) erasure rate higher than expected from random events
  • The baseline for the erasure analysis is the state average
  • Statistical test resulting in an Outlier Score
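A minimal sketch of how such an outlier score might be computed, assuming a one-sided z-test of a school's mean WR erasure count against the state average. The slide does not name the statistical test, so the function name, its inputs, and the normal approximation are all assumptions made for illustration.

```python
import math
from statistics import NormalDist

def wr_outlier_score(school_wr_counts, state_mean, state_sd):
    """One-sided z-test of a school's mean WR erasure count against the state baseline.

    Assumption: per-student WR counts are roughly independent, so the school
    mean has standard error state_sd / sqrt(n). Returns (z, upper-tail p-value).
    """
    n = len(school_wr_counts)
    if n == 0 or state_sd <= 0:
        raise ValueError("need at least one student and a positive state SD")
    school_mean = sum(school_wr_counts) / n
    z = (school_mean - state_mean) / (state_sd / math.sqrt(n))
    p_upper = 1.0 - NormalDist().cdf(z)
    return z, p_upper

# Hypothetical numbers, not taken from the slides.
z, p = wr_outlier_score([3, 5, 4, 6, 2, 7, 5, 4], state_mean=1.5, state_sd=2.0)
print(f"z = {z:.2f}, upper-tail p = {p:.4g}")
```

Only the upper tail is tested because the concern on this slide is a WR rate that is higher than expected.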

SLIDE 6

Erasure Map: Typical Behavior

Secure ID  Total WR  Math WR  Read WR  MLevel  MLevel*  RLevel  RLevel*  50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73
21 3 2 1 Bel Bel Pro Pro 4* 3 A 1 A* 3 2 D C 4 1 B 3 A B 3 2 D A 3 3 2 4 3
22 3 2 1 Bel Bel Bel Bel 2 1 3 4* 2 2 A 3 2 4 1 3 D 2 4 1 D 3 A C* C D 2 C
23 3 3 0 Adv Adv Adv Adv C* 1 3 1 A 2* 2 D C C D B D A B B A* D B C B C 2* A
24 3 1 2 Adv Adv Adv Adv C B A B A 2 A D C C D B 2 A B B A 2 B C B C D A
25 3 1 2 Bel Bel Bas Bas 1 B A 4 2* 1 4 3 4 2 D 4 D 1 1 C B 4 1 A B 1 1 1
26 3 2 1 Pro Pro Bas Bel C B 3 1 A D 2 D 2 C 1 B D 4 A B D B* A C C 2 C 1
27 3 1 2 Bas Bas Bel Bel C B A B A 1 A D 4 1 1 4 1 1 2 4 2 A C 4 1 D 2 4
28 3 2 1 Bel Bel Bel Bel C* B 2 4 2 1 2 D 4 4 D* 3 1 A 4 4 2 4 1 2 C 3 B A
29 3 2 1 Bel Bel Bel Bel 2* 1 3 B 2 2 2 1 C 2 1 1 D 1 B* C 3 B C 4 B B A
30 2 2 0 Bas Bas Bel Bel C* B 4 B A 1 2* D 1 2 D B 2 A B 4 B D 4 D 2 A 1 2
31 2 2 0 Adv Pro Bel Bel 2 1 2 4 2 2 4 1 2 1 3 B D 2 4 3 D 4 C A C D 2 B
32 2 2 0 Bel Bel Bel Bel 1 1 2 B 3 2 3 D C 4 D B 3 4* 2 4 2 2 D C 1 4 3 2
33 2 2 Adv Adv Adv Adv C B A B A 2* A D C C D B D A B B A D A A D D B A
34 2 1 1 Bas Bel Pro Pro C B 2 B 4 2 3 1 2 C D 1 2 2 4 C A 3 C C D 1 B 4
35 2 2 0 Bel Bel Bel Bel C* B 3 B 2 D 3 2 4 1 3 B 3 4 B 4 3 1 B 3 2 A 1 3
36 2 2 0 Bas Bel Pro Pro 2 4 4 3 4 2 3 D 2 1 2 B D A 1 4 B D* 4 3 2 A D 3
37 2 2 Bel Bel Bas Bel 4 3 2 3 2 1 3 D C 2 1 B D 3 B 3 A D 3 3 D D 1 4
38 2 1 1 Bel Bel Bas Bas 2 4 3 1 4 2 2 D C 2 1 4 3 3 2 4 B 4 3 A 4 3 B 1
39 2 2 0 Bas Bas Bel Bel 2 1 2 4 2 2 3 2 C 2 1 B 2 C B 1 4 3 2 4 3 2 1 C
40 1 1 Bas Bas Pro Pro C B 3 3 A 2 3 1 2 4 1 B D 4 B 4 B D 1 2 2 4 D A
Math Session 3

SLIDE 7

Erasure Map: Atypical Behavior

Secure ID  Total WR  Math WR  Read WR  MLevel  MLevel*  RLevel  RLevel*  50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73
1 21 16 5 Pro Bas Pro Pro 2* B A* B A 3 A D* 2 4 1 1 1 3 B 1 A 1 3 C B 4 D A
2 20 11 9 Adv Adv Adv Pro C* B A B A* 2 2 D C C D B D* 4 1 C 1 D B 2* A* B B A
3 20 20 0 Adv Bas Bel Bel C 4 A* B* A D A 2 C* C* D B D 2 C 4 2 1 D* 2 C B B 4
4 19 18 1 Pro Bas Adv Adv 2 4 3 B A* 2* 4 D* C C* D 3 D 2 B* B A 3* B* C* 4 C* D A
5 19 13 6 Adv Pro Adv Pro C B A B A* D 2 D 2 C* D B D* A B 4 3 D 3 3 C 4 D 2
6 19 14 5 Adv Pro Pro Pro C* B* 2 1 A* D A D C* 4 D B D 3 1 2 3 1 B C A B B 3
7 18 8 10 Adv Pro Adv Pro 1 B 3 B A D 3* D 2 C 1 4 3 C B 4 3 2 2 C D C 1 4
8 18 12 6 Adv Pro Adv Pro C B* A 1 A 2 A D* C 4 2 B D A C 4 D* 4 3 C C B* B 4*
9 18 11 7 Adv Pro Adv Pro 2 1 A 3 4 2 A 1 1 2 D B D 4 1 4 B 3 4 2 2 A 3 4
10 17 11 6 Adv Pro Adv Pro C 3 A B A 2 A 1 C* C* D B 1 C A B D B A C C D C 1
11 17 8 9 Adv Adv Adv Adv C B A B* A D 3 D C C D B D C 2 B D A C 2 C D C B
12 17 11 6 Adv Pro Adv Pro C B A* B A* 2 A* D* 2 C D B D* C A 1 D B A C C D C C
13 16 9 7 Adv Pro Adv Adv 4 B A 4 A D 2 D C C D B 1 C A B D A C 2 1 D C B
14 16 3 13 Adv Adv Adv Pro 1 B A 3 2 D A D C C D B D A B C B D* B 2 2 A D A
15 16 11 5 Adv Adv Adv Adv C* B A B A* 2* A D C* C* D B D 4 B C D 3 B C A B B A
16 16 4 12 Adv Adv Adv Pro C* B A 3 A D A D C C D B D C A B D B A C C D C C
17 16 10 6 Adv Adv Adv Adv C* B A 1 A D 2 D 2 C D B D A B B A D B C B C D A
18 16 8 8 Bel Bel Bas Bel 1 3* A 3 A 3 A 3 4 C 2 4 1 3* B C 1 3 B C 2 4 B 4
19 16 13 3 Adv Pro Adv Adv C B A B A 1 A D C C D B D 3 C 2 B* A D A 3 D B 2
20 15 14 1 Adv Pro Pro Pro C* B A* B A D A D C* C D B* D 3 2 A 3 2 2 C 4 3 B 3
Math Session 3

SLIDE 8

Erasure by Test Mode

  • Erasure behavior could be different by mode

Primoli & Liassou (2013)

SLIDE 9

Scale Score Changes

  • Scale score changes statistically higher or lower than the previous year

  • Cohort and Non-cohort
  • Statistical test resulting in an Outlier Score
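A rough sketch of a cohort scale score change, assuming "cohort" means the same students matched by ID across two years and that the scores sit on a comparable scale in both years. The slide does not specify the test behind the Outlier Score, so the standardization against the spread of changes across schools is an assumption, and all names and numbers are hypothetical.

```python
from statistics import mean, stdev

def cohort_change(prev_year, curr_year):
    """Mean scale-score change for students present in both years (matched by ID)."""
    matched = prev_year.keys() & curr_year.keys()
    if not matched:
        raise ValueError("no students matched across years")
    return mean(curr_year[s] - prev_year[s] for s in matched)

def change_z_scores(changes_by_school):
    """Standardize each school's change against the spread of changes across schools.

    A large positive or negative value marks a change that is unusually
    higher or lower than the other schools' changes.
    """
    values = list(changes_by_school.values())
    mu, sd = mean(values), stdev(values)
    return {school: (c - mu) / sd for school, c in changes_by_school.items()}

# Hypothetical data: student ID -> scale score, one dict per year.
prev = {"a": 300, "b": 310, "c": 290}
curr = {"a": 320, "b": 330, "d": 400}  # "c" left, "d" is new: both are ignored
print(cohort_change(prev, curr))
print(change_z_scores({"S1": 2.0, "S2": 4.0, "S3": 6.0}))
```

A non-cohort version would instead compare this year's students in a grade with last year's students in the same grade, without matching IDs.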
SLIDE 10

Pattern Analysis

  • Modified Jacob and Levitt
  • Combination of two indicators:

    – Index 1: unexpected test score fluctuations across years using a cohort of students, and
    – Index 2: unexpected patterns in student answers
  • Modified application of Jacob and Levitt (2003)
    – 2 years of data
    – Sample size

SLIDE 11

Measurement Model Misfit

  • Performed better or worse than expected
  • Rasch residuals summed across operational items and students
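A minimal sketch of summing Rasch residuals over items and students, assuming dichotomous items and known ability and difficulty estimates. The Rasch probability formula is standard; dividing the summed residual by the square root of the summed binomial variances is an assumption, since the slide does not say how the sum is scaled.

```python
import math

def rasch_p(theta, b):
    """Rasch probability of a correct response for ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def summed_residual(responses, thetas, difficulties):
    """Sum observed-minus-expected scores over all students and items.

    responses[i][j] is 0 or 1 for student i on item j.
    Returns (raw_sum, standardized_sum); positive values mean the group
    performed better than the model expected, negative values worse.
    """
    raw = 0.0
    variance = 0.0
    for row, theta in zip(responses, thetas):
        for x, b in zip(row, difficulties):
            p = rasch_p(theta, b)
            raw += x - p
            variance += p * (1.0 - p)
    return raw, raw / math.sqrt(variance)

# Hypothetical group: two students of average ability, two items of average difficulty.
raw, std = summed_residual([[1, 1], [1, 1]], thetas=[0.0, 0.0], difficulties=[0.0, 0.0])
print(raw, std)
```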
SLIDE 12

Regression Based Local Outlier Detection

  • We wish to find schools that are very similar to their peers in most respects (in terms of most independent variables) but differ significantly in the current year's score (the dependent variable).
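The idea of comparing a school with its closest peers can be illustrated with a simple nearest-neighbour sketch. This is not the RegLOD algorithm itself, whose details the slides do not give; it only shows the "similar on the independent variables, different on the dependent variable" logic. The function name, the choice of k, and the data are hypothetical.

```python
import math
from statistics import mean, stdev

def local_outlier_scores(ivs, dv, k=3):
    """For each school, standardize its DV against its k nearest peers in IV space.

    ivs: one list of independent-variable values per school.
    dv:  the dependent variable (current year's score) per school.
    Requires k >= 2 so that the peers have a spread.
    """
    cols = list(zip(*ivs))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) or 1.0 for c in cols]  # guard against constant columns
    z = [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in ivs]
    scores = []
    for i in range(len(ivs)):
        # Peers are the k schools closest to school i on the standardized IVs.
        dists = sorted((math.dist(z[i], z[j]), j) for j in range(len(ivs)) if j != i)
        peers = [dv[j] for _, j in dists[:k]]
        spread = stdev(peers) or 1.0
        scores.append((dv[i] - mean(peers)) / spread)
    return scores

# Hypothetical schools: similar prior-year scores, one unusual current-year score.
ivs = [[300, 310], [302, 311], [298, 309], [301, 312], [299, 310]]
dv = [305, 306, 304, 330, 305]
scores = local_outlier_scores(ivs, dv)
print(max(range(len(scores)), key=lambda i: scores[i]))  # index of the most unusual school
```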

SLIDE 13

RegLOD Example: Grade 4 Reading

DV: 2011 Reading (G4)
IV: 2010 Reading (G4), 2010 Cohort Math (G3), 2010 Cohort Reading (G3), 2010 Math (G4), 2011 Math (G4)

R² = 0.99

SLIDE 14

RegLOD Findings

  • RegLOD has shown great promise
  • Its applicability is not limited to cheating detection in educational testing
  • Its robust, model-based design (the concept of dependent and independent variables in data mining) and its ability to adapt make it applicable to a wide range of outlier detection problems
  • We continue to study its capabilities and to extend and apply it to other contexts and tasks

SLIDE 15

Multiple Methods Comparison

  • Used PCA to determine if multiple methods can be reduced for an efficient approach
  • All methods seem to account for variation in detecting test taking irregularities
  • Methods accounting for the most variation:
    – Cohort regression
    – Cohort scale score change
    – Cohort performance level change
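A small sketch of the kind of PCA check described here, assuming each school has one outlier score per method. It extracts only the first principal component of the correlation matrix by power iteration, using the standard library alone; the data are hypothetical and the slides do not describe how the authors ran their PCA.

```python
from statistics import mean, pstdev

def correlation_matrix(rows):
    """Correlation matrix of the columns of `rows` (one row per school, one column per method)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]  # assumes no column is constant
    z = [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]
    n, p = len(rows), len(cols)
    return [[sum(zr[a] * zr[b] for zr in z) / n for b in range(p)] for a in range(p)]

def first_component(corr, iters=200):
    """Largest eigenvalue and its eigenvector, found by power iteration."""
    p = len(corr)
    # Uniform start vector: adequate when the scores are mostly positively correlated.
    v = [1.0 / p ** 0.5] * p
    for _ in range(iters):
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eig = sum(v[a] * sum(corr[a][b] * v[b] for b in range(p)) for a in range(p))
    return eig, v

# Hypothetical outlier scores for four schools from three methods.
scores = [[1, 1, 3], [2, 2, 1], [3, 3, 4], [4, 4, 2]]
eig, loadings = first_component(correlation_matrix(scores))
print(f"first component explains {eig / len(loadings):.0%} of the variance")
print("loadings:", [round(x, 3) for x in loadings])
```

Methods with large loadings on the leading components are the ones accounting for most of the variation; methods that load on the same component carry overlapping information.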

SLIDE 16

What We Need to Try…

  • Using empirical data has a drawback: we don't know how accurately we are detecting aberrant behavior
  • A typical simulation study uses simulated data, which is not real data
  • Solution: use real data and simulate aberrant behavior to examine the sensitivity of methods.
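One way to picture this approach is a function that takes a real response string and changes some wrong answers to right. This is a sketch under stated assumptions: the slides do not describe how the aberrant behavior was injected, and the function name, the `n_items` parameter, and the random choice of items are all hypothetical.

```python
import random

def inject_wr_changes(responses, key, n_items, rng):
    """Return a tampered copy of one student's responses plus the changed positions.

    Up to n_items currently wrong answers are replaced by the keyed answer,
    mimicking wrong-to-right changes. The original list is left untouched.
    """
    wrong = [i for i, (r, k) in enumerate(zip(responses, key)) if r != k]
    chosen = rng.sample(wrong, min(n_items, len(wrong)))
    tampered = list(responses)
    for i in chosen:
        tampered[i] = key[i]
    return tampered, sorted(chosen)

# Hypothetical answer key and one real-looking response string.
key = list("ABCDABCD")
original = list("ABDDACCA")
tampered, changed = inject_wr_changes(original, key, n_items=2, rng=random.Random(0))
print("".join(tampered), changed)
```

Because the tampered records are known, the share of them that each detection method flags can then be counted directly.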

slide-17
SLIDE 17

Realistic Simulation Design

SLIDE 18

Detection Techniques

1. Erasure analysis
2. Scale score (SS) analysis (non-cohort)
3. Cohort scale score (SSco) analysis
4. Measurement Model Misfit
5. Modified Jacob and Levitt: Index 2

SLIDE 19

Before and After

SLIDE 20

Sensitivity: Consistent Cheating

Median

      Erasure   SS  SSco  Rasch  MJL
 6        100    7            1   24
12        100   13     4      7   40
18        100   29    30     10   44

SLIDE 21

Sensitivity: Copying Cheating

Median

      Erasure   SS  SSco  Rasch  MJL
 6         72                 1   18
12         97    3            1   32
18         99    8     1      3   41

SLIDE 22

Sensitivity: Random Cheating

Median

      Erasure   SS  SSco  Rasch  MJL
 6        100   20    21           6
12        100   63    68          31
18        100   88    93          44

SLIDE 23

Sensitivity: Ability Cheating

Median

      Erasure   SS  SSco  Rasch  MJL
 6         89    3                 9
12         99    7     1          21
18        100   14     8          28

SLIDE 24

Realistic Simulation Results

  • WR erasure analysis is the most sensitive method, provided all the erasures are actually captured
  • The scale score and cohort scale score methods work well with random cheating, with sensitivity approximately twice that of MJL
  • MJL works well with copying behavior, which the scale score methods fail to detect
  • Realistic simulation made the sensitivity differences among detection techniques clear
  • The results support the argument for using multiple techniques to detect test fraud
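The sensitivity comparison above can be sketched as follows, assuming sensitivity means the share of the simulated (known) tampered units that a method flags. The unit labels and flag sets are hypothetical and are not the study's data; the union of flags illustrates why combining techniques can catch more than any single one.

```python
def sensitivity(flagged, tampered):
    """Share of known tampered units that appear in a method's flagged set."""
    if not tampered:
        raise ValueError("no tampered units to detect")
    return len(flagged & tampered) / len(tampered)

# Hypothetical classes with simulated tampering, and each method's flags.
tampered = {"c1", "c2", "c3", "c4"}
erasure_flags = {"c1", "c2", "c3"}
mjl_flags = {"c3", "c4", "c9"}  # "c9" is a false alarm and does not affect sensitivity

print(sensitivity(erasure_flags, tampered))
print(sensitivity(mjl_flags, tampered))
print(sensitivity(erasure_flags | mjl_flags, tampered))  # flagged by either method
```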

SLIDE 25

Conclusion

  • With real data, we can be more certain how well we are detecting aberrant behavior.
  • We can also tell which methods work in which situations.