 
              Combing Item Response Theory and Diagnostic Classification Models: A Psychometric Model for Scaling Ability and Diagnosing Misconceptions Laine P. Bradshaw James Madison University Jonathan Templin University of Georgia Author Note Correspondence concerning this article should be addressed to Laine Bradshaw, Department of Graduate Psychology, James Madison University, MSC 6806, 821 S. Main St., Harrisonburg, VA 22807. Email: laineb@uga.edu. This research was funded by National Science Foundation grants DRL-0822064; SES-0750859; and SES-1030337.
Abstract The Scaling Individuals and Classifying Misconceptions (SICM) model is presented as a combination of an item response theory (IRT) model and a diagnostic classification model (DCM). Common modeling and testing procedures utilize unidimensional IRT to provide an estimate of a student’s overall ability. Recent advances in psychometric s have focused on measuring multiple dimensions to provide more detailed feedback for students, teachers, and other stakeholders. DCMs provide multidimensional feedback by using multiple categorical variables that represent skills underlying a test that students may or may not have mastered. The SICM model combines an IRT model with a DCM that uses categorical variables that represent misconceptions instead of skills. In addition to the type of information common testing procedures provide about an examinee — an overall continuous ability, the SICM model also is able to provide multidimensional, diagnostic feedback in the form of statistical estimates of misconceptions. This additional feedback can be used by stakeholders to tailor instruction for students’ needs. Results of a simulation study demonstrate that the SICM MCMC estimation algorithm yields reasonably accurate estimates under large-scale testing conditions. Results of an empirical data analysis highlight the need to address statistical considerations of the model from the onset of the assessment development process.
Running head: Scaling Ability and Diagnosing Misconceptions 3 Combining Item Response Theory and Diagnostic Classification Models: A Psychometric Model for Scaling Ability and Diagnosing Misconceptions The need for more fine-grained feedback from assessments that can be used to understand students’ strengths and weakness has been emphasized at all levels of education. Educational policy (No Child Left Behind, 2001), modern curriculum standards (i.e., Common Core Standards; National Research Council, 2010), and classroom teachers (Huff & Goodman, 2007) have described multidimensional, diagnostic feedback as essential for tailoring instruction to students’ specific needs and making educational progress. In spite of this need, most state- level educational tests have been, and continue to be, designed from a unidimensional Item Response Theory (IRT; e.g., Hambleton, Swaminathan, & Rogers, 1991) perspective. This perspective optimizes the statistical estimation of a single continuous variable representing an overall ability and provides a single, composite score to describe a students’ performance with respect to an entire academic course. A common solution to providing ―diagnostic‖ feedback from IRT-designed state-level tests has been to report summed scores on sub-sections of a test. These subscores, because they are based on a small number of items, often lack of reliability. Decisions based on unreliable subscores counterproductively may misguide instructional strategies and resources (Wainer, Vevea, Camacho, Reeve, Rosa, Nelson, Swygert, & Thissen, 2001). These types of subscores are computed from items selected to be on the test due to their correlation with the other items on the test, so, as expected, they are highly related to the total score and thus do not provide distinct information from or additional information beyond the total score (Haberman, 2005; Harris & Hanson, 1991; Haberman, et al., 2009; Sinharay & Haberman, 2007).
Scaling Ability and Diagnosing Misconceptions 4 The new psychometric model presented in this paper presents a means to providing reliable multidimensional feedback within the framework of prevailing unidimensional IRT methods. The model capitalizes on advances regarding multidimensional measurement models which recently have been at the forefront of psychometric research because they promise detailed feedback for students, teachers, and other stake holders. Diagnostic classification models (DCMs; e.g., Rupp, Templin, & Henson, 2010) provide one approach to the measurement of multiple dimensions. DCMs use categorical latent attributes to represent skills or content components underlying a test that students may or may not have mastered. Using DCMs, the focus of the results of the assessment is shifted to identify which components each student has mastered. Attributes students have mastered can be viewed as areas in which students do not need further instruction. Similarly, attributes that students have not mastered indicate areas in which instruction or remediation should be focused. Thus, the attribute pattern can provide feedback to students and teachers with respect to more fine-grain components of a content area, which can be used to tailor instruction to students’ speci fic needs. DCMs sacrifice fine-grained measurement to provide multidimensional measurements. Instead of finely locating each examinee along a set of continuous traits as a multidimensional IRT model does, DCMs coarsely classify each examinee with respect to each trait (i.e., as masters or non-masters of the trait). This trade-off enables DCMs to provide diagnostic, multidimensional feedback with reasonable data demands (i.e., few number of items and examinee, Bradshaw & Cohen, 2010). DCMs’ low cost in terms of data demands and high benefits in terms of diagnostic information make them very attractive models in an educational setting where time for testing is limited but multidimensional feedback is needed to reflect the realities of the multifaceted nature of the objectives of educational courses. However, given the
Scaling Ability and Diagnosing Misconceptions 5 current reliance of testing on measuring an overall ability, DCMs, while efficient for providing detailed feedback, may not fulfill all the needs of policy-driven assessment systems centered on scaling examinee ability. In this paper, we propose a new nominal response psychometric model, the Scaling Individuals and Classifying Misconceptions (SICM) model, that blends the IRT and DCM frameworks. The SICM model alters traditional DCM practices by defining the attributes for a nominal response DCM as misconceptions that students have instead of as abilities (skills) that students have. The SICM model alters traditional nominal response IRT (NR IRT; Bock 1970) practices by having these categorical misconceptions predict the incorrect item response while a continuous ability predicts the correct response. When coupled together through the SICM model, the IRT and DCM components provide a more thorough description of the traits possessed by students. The SICM model both describes a measure of composite ability measured by the correctness of the responses and identifies distinct errors in understanding manifested through specific incorrect responses. The model therefore serves dual purposes of scaling examinee ability for comparative and accountability purposes and also diagnosing misconceptions for remediation purposes. The following section overviews the measurement of misconceptions through previous assessment development projects. The next section provides the statistical specifications of the SICM model and is followed by an illustration of the model through a contrast with two more familiar models. Then results of a simulation study are provided to establish the efficacy of the model and a real data analysis is given to illustrate the use of the model in practice. Measuring Misconceptions A key feature needed by the SICM model is that the incorrect alternatives for multiple
Recommend
More recommend