Developing Automated Scoring for Large-scale Assessments of Three-dimensional Learning
Jay Thomas (1), Ellen Holste (2), Karen Draney (3), Shruti Bathia (3), and Charles W. Anderson (2)
1. ACT, Inc.
2. Michigan State University
3. UC Berkeley, BEAR Center
Based on the NRC report Developing Assessments for the Next Generation Science Standards (Pellegrino et al., 2014)
Example of a Carbon TIME Item
The item gives us confidence that we have classified students correctly, targets a practice that we are focusing on, and requires constructed response (CR) to assess the construct fully.
The Scoring Workflow
1. Item Development
2. Students respond to Items
3. WEW (Rubric) Development
4. Using WEW (Human scoring) to create training set
5. Creating Machine Learning (ML) Models
6. Using ML Model (Computer scoring)
7. Backcheck coding (human)
8. QWK Check for Reliability
9. Psychometric Analysis (IRT, WLE)
10. Interpretation by larger research group
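Steps 4-6 and 8 of this flow can be sketched in a few lines of code. The sketch below is illustrative only, not the project's actual implementation: it assumes a simple TF-IDF bag-of-words classifier, and `responses` and `human_scores` are hypothetical inputs (student response texts and their human-assigned WEW levels).

```python
# Illustrative sketch: train a text classifier on human-scored (WEW)
# responses, then check quadratic weighted kappa (QWK) on a held-out
# set. The real system's features and learner may differ.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

def train_and_check(responses, human_scores):
    """responses: list of response strings; human_scores: WEW levels."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        responses, human_scores, test_size=0.2, random_state=0)
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                          LogisticRegression(max_iter=1000))
    model.fit(X_tr, y_tr)
    qwk = cohen_kappa_score(y_te, model.predict(X_te), weights="quadratic")
    return model, qwk  # low QWK feeds back into item/rubric revision
```

Step 9 feeds machine scores into item response theory (IRT) analysis with weighted likelihood estimates (WLE) of student proficiency. Below is a minimal sketch of Warm's WLE for a dichotomous Rasch model, assuming known item difficulties; the actual analysis would use dedicated IRT software and likely polytomous models for the WEW levels.

```python
import numpy as np

def rasch_wle(x, b, n_iter=100, tol=1e-8):
    """Warm's weighted likelihood estimate of ability theta, given
    dichotomous responses x (0/1 array) and item difficulties b."""
    x, b = np.asarray(x, float), np.asarray(b, float)
    theta = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(theta - b)))   # P(correct | theta)
        info = np.sum(p * (1 - p))               # test information I
        j = np.sum(p * (1 - p) * (1 - 2 * p))    # bias-correction term J
        f = np.sum(x - p) + j / (2 * info)       # WLE estimating equation
        step = f / info                          # Newton-style update
        theta += step
        if abs(step) < tol:
            break
    return theta
```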
The flow shows processes moving towards final interpretation, with feedback loops that indicate when a question, rubric, or model needs to be addressed. We increase the power of the statistics through back-checking samples and revising models (see the sketch below). The kind of assessment envisioned by Pellegrino et al. (2014) for NGSS can thus be reached at scale with low cost.
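As an illustration of the back-checking loop: draw a random sample of machine-scored responses, have humans re-score them, and flag the model when human-machine QWK drops. The function names, sample size, and 0.7 floor below are assumptions for the sketch, not the project's actual parameters.

```python
import random
from sklearn.metrics import cohen_kappa_score

def backcheck(machine_scored, human_rescore, sample_size=200, qwk_floor=0.7):
    """machine_scored: list of (response_text, machine_score) pairs.
    human_rescore: callable standing in for the human re-scoring step.
    Returns the human-machine QWK and whether the model passes."""
    sample = random.sample(machine_scored, min(sample_size, len(machine_scored)))
    machine = [score for _, score in sample]
    human = [human_rescore(text) for text, _ in sample]
    qwk = cohen_kappa_score(human, machine, weights="quadratic")
    return qwk, qwk >= qwk_floor  # a failing check triggers revision
```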
Responses Scored as of March 6, 2019

School Year    Responses Scored
2015-16          175,265
2016-17          532,825
2017-18          693,086
2018-19          227,041
TOTAL          1,628,217

Cost Savings and Scalability
Labor hours needed to human score all responses (at 100 responses per hour): ~16,282
Labor cost per hour (undergraduate students): $18
Cost to human score all responses: ~$293,000 (16,282 hours x $18/hour)
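These figures follow directly from the numbers above; the arithmetic, using only the rates stated on the slide:

```python
# Back-of-envelope arithmetic using only the figures stated above.
total_responses = 1_628_217   # responses scored, 2015-16 through 2018-19
responses_per_hour = 100      # human scoring rate
wage_per_hour = 18            # dollars per hour, undergraduate scorers

hours = total_responses / responses_per_hour   # ~16,282 labor hours
cost = hours * wage_per_hour                   # ~$293,079
print(f"{hours:,.0f} labor hours, ${cost:,.0f} to human score everything")
```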
Written versus interview item: 0.81, p < 0.01, n = 49
jay.thomas@act.org
kdraney@berkeley.edu
holste@msu.edu
shruti_bathia@berkeley.edu