
Can the Assessment Consortia Meet the Intended Comparability Goals?



1. Can the Assessment Consortia Meet the Intended Comparability Goals? OR What Types of Comparability Goals Can Be Met?
NCME Committee on Assessment Policy and Practice
Presentation at CCSSO's NCSA Conference, June 2011

2. NCME Committee on Assessment Policy
• Formed as an ad hoc committee in May 2010
• Became a standing committee in April 2011
• Purposes:
  - To use the measurement expertise within NCME to influence and hopefully improve educational policies based on assessment results
  - To increase the visibility of NCME so that it might be seen as a "go to" organization for assessment-related policy issues
• The Committee:
  - Co-Chairs: Kristen Huff and Scott Marion
  - Members: Judy Koenig, Joseph Martineau, Cornelia Orr, Christina Schneider, Zachary Warner

3. The Committee's Initial Approach
• Because the committee is made up entirely of volunteers and NCME does not have staff to support this work, the committee decided to start modestly.
• Focus on a single issue, using the following approaches:
  - An invited symposium at the NCME annual meeting with leading researchers
  - A symposium at CCSSO-NCSA
  - A follow-up policy brief
  - Other?

4. This Year's Focus
• The challenges of designing for and producing comparable score inferences across states, consortia, and countries
• NCME Symposium Participants:
  - Authors: Mike Kolen, Suzanne Lane, Joseph Martineau
  - Discussants: Bob Brennan and Deb Harris

5. A Simplified View of Comparability
• The Problem:
  - State assessments produced results that were not comparable across states
  - Positive state results were clearly not trusted
• The "Solution":
  - Common content standards
  - Common large-scale assessments

6. A Simplified Theory of Action
• Accurate cross-state comparisons
• Can be used to benchmark within-state performance
• To motivate and guide reforms (and policies)
• And lead to improved student achievement
• Trying to put a positive spin on the call for comparability

7. What the policy makers wanted…
• Governors and state chiefs (many of them) wanted a single assessment consortium
• The reasons are not entirely clear, but some clearly had to do with efficiency of effort, lack of competition, and the desire to facilitate comparisons across states…
• What they got…
  - Two consortia with some very different ideas about large-scale assessment

8. What is Meant by "Comparability" (Brennan)?
• The public and policy makers:
  - Doesn't matter which form is used
  - Same score means the same thing for all students
  - Math is math is math…
• Psychometricians:
  - All sorts of things (equating, vertical scaling, scaling to achieve comparability, projection, moderation, concordance, judgmental standard setting)
  - Some degree/type of comparability (i.e., linking) is attainable in practically any context
  - Comparability "scale": very strong to very weak
  - Ideal: the "matter of indifference" criterion

9. What comparisons?
• The term "comparability" is getting thrown around a lot, and many appear focused on across-consortium and international comparability
• Brennan: What should be the comparability goals for the two consortia, and how should these goals be pursued?

10. What comparisons?
• Many appear to believe that, because of a "common assessment," within-consortium comparability is a given
• It's not! Following Brennan's reformulation, we argue that until within-consortium comparability can be assured, we should not distract ourselves with across-consortium comparability
• Further, most accountability designs, especially those that incorporate measures of student growth, must be based on strong within-consortium (actually, within-state) year-to-year comparability

11. Key Interpretive Challenges
• The central challenges that Mislevy (1992) outlined in Linking Educational Assessments are still with us in the current Race to the Top environment:
  - "discerning the relationships among the evidence the assessments provide about conjectures of interest, and
  - figuring out how to interpret this evidence correctly" (p. 21)
• In other words, just because we can statistically link scores doesn't mean we will be able to interpret those links

12. Conditions for comparability
• Mislevy and Holland focused considerable attention on the match between the two test blueprints, and for good reason: this is a crucial concern if we are to compare student-level scores from two different sets of tests
• There is a good reason that most end-of-year state tests are referred to as "standardized tests"

13. What do we know about Comparability?
• Score interchangeability is (approximately) achievable only under the strongest form of comparability (equating in a technical sense); see the sketch below
• Weaker types of comparability often/usually do not lead to group invariance
• The degree of comparability desired/required should reflect the intended use of scores
• Score interchangeability and group invariance:
  - Crucial sometimes
  - Often not achievable
  - Not so important sometimes
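
To make this point concrete, here is a minimal, hypothetical sketch (not anything from the consortia): it performs a crude equipercentile linking of two simulated test forms and then re-estimates the linking within two simulated subgroups as a group-invariance check. All data and numbers are invented for illustration.

```python
# A minimal, hypothetical sketch of equipercentile linking and a group-invariance
# check. All scores are simulated; nothing here reflects actual consortium data.
import numpy as np

rng = np.random.default_rng(0)

def equipercentile_link(x_scores, y_scores, x_points):
    """Map scores on form X to the form-Y scale by matching percentile ranks."""
    # Percentile rank of each x_point within the form-X distribution (0-100).
    ranks = [100.0 * np.mean(x_scores <= x) for x in x_points]
    # Form-Y score with the same percentile rank (linear interpolation).
    return np.array([np.percentile(y_scores, r) for r in ranks])

# Simulated total scores on two forms taken by a common population.
group_a_x = rng.normal(30, 6, 5000)   # subgroup A, form X
group_a_y = rng.normal(32, 6, 5000)   # subgroup A, form Y
group_b_x = rng.normal(26, 7, 5000)   # subgroup B, form X
group_b_y = rng.normal(28, 7, 5000)   # subgroup B, form Y

x_all = np.concatenate([group_a_x, group_b_x])
y_all = np.concatenate([group_a_y, group_b_y])

score_points = list(range(10, 51, 5))
overall = equipercentile_link(x_all, y_all, score_points)
within_a = equipercentile_link(group_a_x, group_a_y, score_points)
within_b = equipercentile_link(group_b_x, group_b_y, score_points)

# Group invariance: the linking function should be (nearly) the same whether it
# is estimated in the whole population or within each subgroup.
for x, o, a, b in zip(score_points, overall, within_a, within_b):
    print(f"X={x:2d}  overall->Y={o:5.1f}  groupA->Y={a:5.1f}  groupB->Y={b:5.1f}")
```

In operational work one would use purpose-built equating software and far more careful designs; the sketch is meant only to show that interchangeability claims rest on checks of this kind.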

14. Minimum Set of Conditions
• Common test blueprint
• Specified test administration windows
• Common administration rules/procedures
• Standard accommodation policies
• Same inclusion rules
• Clearly specified and enforced security protocols
• Specific computer hardware and software (for CBT and CAT), or at least a narrow range of specifications

15. What's working for within-consortium comparability?
• Common, fairly well-articulated content standards that all consortium states are adopting
• All states within each consortium have promised to implement a common test (or tests)
• All states within each consortium have promised to adopt common performance-level descriptors and common achievement cut scores

16. Challenges for within-year, within-consortium comparability
• High-stakes uses of test scores
• Significantly different fidelity of implementation of the CCSS within and across states
• The time frame is very short
• The demand for innovative item types is unrelenting
• Huge testing windows
• Large variability in school calendars
• Assessments designed to measure a wide range of knowledge and skills
• Differences in inclusion policies (potentially?)
• Differences in accommodations policies, and certainly practices
• Differences in mode (computer vs. paper)
• Differences in hardware platforms

17. Challenges across years (within consortium)
• All of the previous, plus:
  - Mixed-format assessments, especially through-course assessments
  - For CR prompts/performance tasks, a large person × task (p × t) interaction is pervasive and a small number of prompts is common. Together, these two facts virtually guarantee that scores from CR forms can be only weakly comparable (see the illustration below)
  - The quality of field testing (high-quality sampling) needed to produce stable item parameters for pre-equating designs (at least for SBAC)
• These are fairly significant challenges
  - We have seen serious problems with year-to-year equating within states
  - Doing this in 20+ states is not an insignificant extension
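
A back-of-the-envelope generalizability calculation helps show why a large p × t interaction combined with only a handful of prompts caps comparability. The variance components below are purely hypothetical; only the qualitative pattern matters.

```python
# Hypothetical illustration: generalizability coefficient for a person x task
# (p x t) design, E(rho^2) = var_p / (var_p + var_pt / n_tasks).
# The variance components are made up; only the pattern across n_tasks matters.
var_p = 0.30   # person (true-score) variance
var_pt = 0.70  # person-by-task interaction variance (large, as is common for CR tasks)

for n_tasks in (1, 2, 4, 8, 16):
    g_coef = var_p / (var_p + var_pt / n_tasks)
    print(f"{n_tasks:2d} tasks -> generalizability coefficient {g_coef:.2f}")

# With only one or two tasks per form, the p x t interaction swamps the person
# variance, so scores from different CR forms can be only weakly comparable.
```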

18. What Does All of This Suggest About How to Proceed?
• We need to help policy makers prioritize comparability goals
• Trying to assure comparability on the back end will fail if we have not designed the assessments with the types of comparisons we'd like to make in mind
• In attempting to achieve comparability, no type/amount of statistical manipulation of data can make up for mediocre test development
• Don't promise more than can be delivered, but do deliver a high degree of comparability for scores that will be used for accountability (i.e., within-consortium, across years)

19. How to Proceed (continued)
• For accountability scores, data must be collected that:
  - Permit obtaining score "equivalence" tables/relationships (see the sketch below)
  - Facilitate examining the degree to which comparability goals have been attained
  - Provide users with readily interpretable statements about comparability
• View the attainment of comparability as a goal to be pursued, more than an end product to be attained at one point in time
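
As a rough illustration of the first two bullets, the hypothetical sketch below tabulates an estimated linking function as a score equivalence table and summarizes departures from population invariance with a simple root-mean-square difference index (loosely in the spirit of Dorans' invariance indices). The function names, numbers, and threshold comment are assumptions for illustration, not consortium specifications.

```python
# Hypothetical sketch: turn an estimated linking function into an equivalence
# table and summarize how far subgroup linkings deviate from the overall one.
import numpy as np

def equivalence_table(score_points, linked_scores):
    """Return (form X score, linked form Y score) pairs, rounded for reporting."""
    return [(int(x), round(float(y), 1)) for x, y in zip(score_points, linked_scores)]

def rmsd_invariance(overall, subgroup_links, weights=None):
    """Root-mean-square difference between overall and subgroup linking functions,
    averaged over score points; a rough index of (lack of) population invariance."""
    diffs = np.array([np.asarray(sub) - np.asarray(overall) for sub in subgroup_links])
    if weights is None:
        weights = np.ones(diffs.shape[0]) / diffs.shape[0]
    # Weight subgroups, then average squared differences over score points.
    weighted_sq = np.tensordot(weights, diffs ** 2, axes=1)
    return float(np.sqrt(weighted_sq.mean()))

# Example with made-up numbers (e.g., reusing output like the earlier sketch's).
score_points = [10, 20, 30, 40, 50]
overall      = [12.1, 22.3, 32.0, 41.8, 51.5]
group_a      = [12.4, 22.6, 32.2, 41.9, 51.3]
group_b      = [11.7, 21.9, 31.7, 41.6, 51.8]

for x, y in equivalence_table(score_points, overall):
    print(f"Form X score {x:2d}  ~  Form Y score {y}")

index = rmsd_invariance(overall, [group_a, group_b])
print(f"RMSD invariance index: {index:.2f} score points")
# A value small relative to, say, half a reported-scale unit would support
# treating the linked scores as comparable for the intended use.
```

Reporting both the table and an invariance summary like this is one way to give users the "readily interpretable statements about comparability" the slide calls for.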

20. Across-Consortium Comparability
• Tukey: "It is better to have an approximate answer to the right question than an exact answer to the wrong question."
• In other words, we need to be humble about the types of comparisons that we can and should make in this arena
