Handbook for Professional Development in Assessment Literacy: A Resource for States and School Districts
NCSA Conference, June 20, 2013
Project of the Title I Comprehensive Assessment Systems (T1-CAS) CCSSO SCASS
Wayne Neuburger, Ph.D., Advisor
REVISION OF THE HANDBOOK FOR PROFESSIONAL DEVELOPMENT IN ASSESSMENT LITERACY
T1-CAS Mission
- The Title I Comprehensive Assessment Systems SCASS supports states in efforts to use assessment and accountability systems to support the improvement of education in schools using ESEA funds.
- As a national consortium of assessment and Title I professionals, T1-CAS addresses issues in standards, assessment, and accountability systems, and the effects of these systems on the education of Title I students.
Presenters
- Moderator: Wayne Neuburger, Consultant
- Project Director: Jan Sheinker, Consultant
- Author: Doris Redfield, Consultant
- Reviewer: Beth Cipoletti, West Virginia Dept of Ed
- Discussant: Elizabeth Davis, Alaska Dept of Ed
Jan Sheinker, Ed.D., and Doris Redfield, Ph.D.
Developed under the direction of Phoebe C. Winter, Project Director, and the Professional Development for Assessment Literacy Study Group
Comprehensive Assessment Systems for IASA Title I State Collaborative on Assessment and Student Standards, Council of Chief State School Officers
Purpose 2001 and 2013
- It is intended to provide a resource to states and districts as they deploy state and district assessment systems aligned with standards for purposes of improving student learning through accountability and school improvement.
- We hope that this document also serves as a resource for informing many constituents of education about the purposes of assessment and the importance of an aligned assessment system to the overall educational system.
SCASS GROUPS PROVIDING INPUT
- Formative Assessment for Students and Teachers (FAST)
- Assessing Special Education Students (ASES)
- English Language Learners (ELL)
- Accountability Systems and Reporting (ASR)
- Technical Issues in Large-Scale Assessment (TILSA)
Need for Revision
- Significant changes have occurred in assessment systems since 2001.
- Accountability systems have changed since 2001.
- Assessment technical issues have progressed to address new assessment and accountability systems.
- Distribution systems are more sophisticated.
Organization and Content of the Handbook
Jan Sheinker jansheinker@gmail.com
Organization – Actually FOUR documents
Documents / Users / Uses:
- Original document – states, districts, schools, and other interested audiences – to preserve the original document
- Word and PPT (static & animated) – state and district PD presenters – for customization by individual states, districts, & schools as PD scripts and handouts for PD presenters
- Word – individual states, districts, & schools – for customization by individual states, districts, & schools as district and school newsletter inserts to inform parents and community
- PowerPoint – for use in trainings & presentations
Table of Contents
Guide to Using the Handbook
Chapter One: Why Build an Assessment System?
Chapter Two: What Is Technical Quality?
Chapter Three: How Are the Purposes of Assessment Related to Technical Quality?
Chapter Four: How Are the Uses of Assessment Related to Technical Quality?
Chapter Five: How Do Schools/States Report Results in Proper Context?
Chapter Six: How Should Results Be Used to Make Decisions?
Appendices
Glossary
References
Guide to Using the Handbook
Using the Handbook
Audiences for the Handbook:
- State Department Personnel
- Schoolwide Planning Participants
- Local School Boards and Administrators
- Legislative Committees
- Regional Professional Development Participants
- Members of Professional Associations
- Pre-Service and Graduate Students in Education
- Community Members
Customizing
Conclusion
Chapter One: Why Build an Assessment System?
- Standards Aligned Assessment Systems
– How have aligned assessment systems evolved?
– How have Common Core State Standards influenced the development of assessment systems?
- Relationships of Tests to Assessment Systems to Accountability Systems
– What are comprehensive assessment systems?
– What are the purposes and relationships among formative assessment strategies and classroom, school, district, and state tests?
– How do assessment systems relate to accountability systems?
– Why use an assessment system instead of individual tests to set goals and make decisions?
- Developing a Comprehensive System
– What is the role of formative assessment strategies in a comprehensive system?
– What is the role of classroom tests in a comprehensive system?
– What is the role of interim and benchmark tests in a comprehensive system?
– What is the role of district tests in a comprehensive system?
– What is the role of English language proficiency tests in a comprehensive system?
– What is the role of state tests in a comprehensive system?
- Assessment Systems and the Classroom
– How do assessment and accountability systems relate to the way we do business in classrooms?
– Who does what in standards-based schools?
Chapter Two: What is Technical Quality?
- Aspects of Technical Quality
- Validity
– Why is validity important?
– How can validity be increased?
- Reliability
- Fairness, Bias and Accessibility
- Comparability
- Procedures for test administration, scoring, data analysis, and reporting
- Evaluation of the technical quality of accommodations
Chapter Three: How are the Purposes of Assessment Related to Technical Quality?
- Alignment of Assessments with Standards
– What is alignment with the purpose of the assessment?
– Why are specific characteristics important to alignment?
– Why are vertical and horizontal alignment important?
- Accountability Purposes
– Why is accountability needed?
– Who is accountable for student learning?
– For what should schools be held accountable?
– What is accountability for growth?
- Assessment Purposes
– Why do assessment systems include different types or combinations of tests?
– What are norm-referenced tests?
– What are standards-based tests?
– What are augmented assessment systems?
– What are computer adaptive tests?
– What are APIP-enabled technology-enhanced assessment systems?
Chapter Four: How are the Uses of Assessment Results Related to Technical Quality?
- Using Results Appropriately to Avoid Misuses
– What are appropriate uses and potential misuses of standards-based tests?
– What are appropriate uses and potential misuses of norm-referenced tests?
– What are appropriate uses and potential misuses of interim/benchmark tests?
– What are appropriate uses and potential misuses of classroom tests?
– What are appropriate uses and potential misuses of formative assessment strategies?
- Usability of Results
– What should be considered in determining the usability of results?
– What factors affect the credibility of results?
– What factors affect the accuracy of score interpretation?
– What results are needed to adjust instruction?
– What factors affect the usefulness of results for teacher and leader evaluation?
Chapter Five: How Do Schools/States Report Results in Proper Context?
Reporting Results
- How are indicators selected for reporting?
- What indicators provide direct measures of student achievement of standards?
- What other indicators provide direct measures of student knowledge?
- What student learning indicators provide indirect measures of student performance?
Reporting Related Indicators
- What indicators provide measures of opportunity to learn?
- What context variables affect student learning?
Cautions for Reporting Results: Sampling and Sample Size
- How does sampling affect the technical quality of a test?
- How do population and sample size affect the technical quality of a test?
Chapter Six: How Should Results Be Used to Make Decisions?
- Using Results to Make Decisions
- Using Results to Make Decisions About School Improvement
– How can the results be used to profile student performance?
– How is the profile used to set school improvement goals?
– What is considered in developing a plan to achieve the goals?
– How are school improvement results monitored and documented?
- Using Results to Make Decisions About Policy Changes and Evaluating Program Effectiveness
– How can the system be monitored and evaluated?
– How can results be used to make policy decisions concerning resources?
– How can the delivery system be altered to improve results?
- Using Results to Make Decisions About Rewards and Sanctions
Technical Quality (Chapter 2: Handbook for Professional Development in Assessment Literacy)
Doris Redfield, Ph.D., dredfield@suddenlink.net, 304-344-3083
Contents
- Aspects of Technical Quality
- Validity
– Importance
– Ways to increase
- Reliability
- Fairness, Bias, & Accessibility
- Comparability
- Procedures: Test Administration, Scoring, Data Analysis, Reporting
- Evaluation of Accommodations Use
What is Technical Quality (TQ)?
- The integrity of each assessment, instrument, process, & procedure for contributing to fair and defensible decisions.
Aspects of TQ
- Validity (accuracy – the test measures what it purports to measure)
- Reliability (consistency – whatever is measured is measured consistently across circumstances)
- Other
– Fairness & accessibility
– Comparability
– Procedures for test administration, scoring, data analysis & reporting
– Interpretation & use of results
TECHNICAL QUALITY IN ASSESSMENT SYSTEMS
- Rigorous standards and standards-aligned tests
- Valid, reliable, comparable, fair, and accessible tests that include ALL students
- Technically sound administration and scoring
- Accurate reporting and interpretation of results
Additions and/or Increased Emphasis
- Methodologies
– Comparability
– Generalizability
– Standard Setting
- Attention to
– Peer Review Guidance
– Joint Standards for Educational & Psychological Testing
– Evaluation of the TQ of Accommodations
VALIDITY
- Are we measuring what we say we are measuring?
- Can we make valid interpretations/inferences?
Validity
- Four broad categories (Standards for Educational & Psychological Testing):
1. Test content (content validity)
2. Relationship to other variables or criteria
3. Student response processes
4. Internal structure of the assessment
Validity: Peer Review Guidance Questions (Section 4.2)
Has the State
- Specified the purposes of the assessments . . . ?
- Ascertained that the assessments measure the knowledge and skills described in its academic content standards . . . ?
- Ascertained that its assessment items are tapping the intended cognitive processes & are at the appropriate grade level?
- Ascertained that scoring and reporting structures are consistent with the sub-domain structures of its academic content standards?
- Ascertained that test & item scores are related to outside variables as intended?
- Ascertained that the decisions based on assessment results are consistent with the purposes for which the assessments were designed?
- Ascertained whether the assessment produces intended and unintended consequences?
KEY: Provide clear & explicit documentation of evidence
INCREASING VALIDITY
- Multiple measures
- Face validity
- Content validity: breadth, depth, cognitive complexity, and range of knowledge & skills, including thinking skills, represented by the content standards
- Construct validity
– Evidence of convergent criterion validity or the relationship of particular items to the other items on the same test/assessment (internal consistency reliability; see the sketch below)
– Evidence of divergent criterion validity
– Expert review
Reliability
- Test reliability: reliability coefficients; standard errors of measurement (SEMs); confidence bands (see the sketch below)
- Rater reliability: inter- & intra-rater
- Standard setting and reliability of cut scores
– At least 3 levels required
– Descriptors for each level required
– Methods: Angoff (& modifications); Bookmark; Body of Work
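The SEM ties a reliability coefficient to the uncertainty around an individual score: SEM = SD x sqrt(1 - reliability), and an approximate confidence band is score ± z x SEM. A minimal sketch; the scale SD, reliability, and score values are illustrative assumptions, not figures from the Handbook:

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def confidence_band(score: float, sd: float, reliability: float, z: float = 1.96):
    """Approximate 95% confidence band around an observed score."""
    half_width = z * sem(sd, reliability)
    return score - half_width, score + half_width

# Illustrative values: scale SD 15, reliability 0.90, observed score 100.
print(round(sem(15, 0.90), 2))         # 4.74
print(confidence_band(100, 15, 0.90))  # about (90.7, 109.3)
```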
STANDARD SETTING METHODS
- Angoff: panel of content experts; sum of item averages; raw cut score (see the sketch below)
- Bookmark: statistical item difficulty; ordered item booklets reviewed by a panel of raters; panel "bookmarks" cut points; IRT-based cut scores
- Body of Work: student response booklets reviewed by a panel of content experts; identify knowledge and skills; assign achievement level; final cut score recommendations
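In the Angoff method, each panelist judges, for each item, the probability that a minimally proficient student would answer correctly; averaging over panelists and summing over items yields the raw cut score. A minimal sketch; the panel size and ratings below are invented for illustration:

```python
import numpy as np

# Invented ratings: rows = panelists, columns = items. Each value is the
# judged probability that a minimally proficient student answers correctly.
ratings = np.array([
    [0.70, 0.55, 0.80, 0.40],
    [0.65, 0.60, 0.75, 0.50],
    [0.75, 0.50, 0.85, 0.45],
])

item_averages = ratings.mean(axis=0)  # average judgment per item
raw_cut_score = item_averages.sum()   # sum of item averages
print(item_averages.round(2), round(raw_cut_score, 2))  # [0.7 0.55 0.8 0.45] 2.5
```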
Reliability Cont’d . . .
- Generalizability: extent to which reliability findings can be replicated across situations
- Purpose specific
- Peer Review Guidance Questions (Section 4.2) – Has the State
– Determined the reliability of the scores it reports . . . ?
– Quantified & documented the conditional SEM and student classification consistency at each cut score in the State's academic achievement standards (see the simulation sketch below)?
– Reported generalizability evidence for all relevant sources . . . ?
KEY: Provide clear & explicit documentation of evidence
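Classification consistency at a cut score asks how often two administrations of the same test would place a student in the same achievement level. One way to estimate it is by simulation: draw true scores, add independent measurement error twice, and compare the two level assignments. A sketch under assumed values; the cut scores, SEM, and score distribution are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

cut_scores = np.array([220, 240, 260])  # invented cut scores (4 levels)
sem = 5.0                               # assumed standard error of measurement
true_scores = rng.normal(245, 20, size=100_000)  # assumed true-score distribution

def level(scores: np.ndarray) -> np.ndarray:
    """Achievement-level index (0..3) implied by the cut scores."""
    return np.searchsorted(cut_scores, scores, side="right")

# Two simulated administrations, each with independent measurement error.
first = level(true_scores + rng.normal(0, sem, true_scores.size))
second = level(true_scores + rng.normal(0, sem, true_scores.size))

print("classification consistency:", (first == second).mean())
```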
Fairness, Bias, & Accessibility
Peer Review Guidance Questions (Section 4.3) – Has the State
- Ensured that the assessments provide an appropriate variety of accommodations . . . ?
- Ensured that the assessments provide an appropriate variety of linguistic accommodations . . . ?
- Taken steps to ensure fairness in the development of the assessments?
- Used accommodations &/or alternate assessments to yield meaningful scores?
Most Likely Sources of Unfairness
(Standards for Educational & Psychological Testing)
- Items or tasks do not provide an equal opportunity for all students to demonstrate their knowledge & skills fully.
- The assessments are not administered in ways that ensure fairness.
- The results are not reported in ways that ensure fairness.
- The results are not interpreted or used in ways that lead to equitable treatment of those affected by the results.
Key Sources of Challenge to Fairness, Bias, & Accessibility
- Language loading
- Background knowledge requirements
- Cultural bias
- Other factors specific to certain students
FAIRNESS AND BIAS
- Fairness of assessment: for purpose; for group; for use
- Bias of items/results: for male/female; for English language learners; for racial or ethnic minorities; for students with disabilities
Fairness and Access
- Access through accommodations
– For students with disabilities
– For English language learners
– That yield meaningful scores
- Fairness in
– Item/task design
– Test administration
– Reporting results
– Interpretation of results
Fairness, Access & Bias in Items & Tasks
- Language loading: use of vocabulary not required by the content that impedes access to assessment items and tasks for students who lack proficiency in language, whether because they have another primary language, a learning disability, or delayed language development.
- Background knowledge requirements: design or content of assessment items or tasks that impedes access based on assumptions about what knowledge students bring to the assessment situation apart from instruction.
- Cultural bias: inclusion of situations and contexts in assessment items and tasks that may be misinterpreted by students from different cultural backgrounds in ways that interfere with performance.
- Other: incorporation of other factors in assessment item and task design that increase the challenge of performing correctly in ways unrelated to the content being assessed.
Comparability
- An aspect of reliability that allows for comparisons from year to year, student to student, school to school, form to form, etc.
- Methodologies (see the equating sketch below)
– Linking: the relationship between test scores on two different tests that are not necessarily built to have the same content or level of difficulty
– Equating: a type of linking that provides the strongest possible linking relationship, rendering test scores across different forms of the same test interchangeable
– Scaling: a process for transforming raw scores to scores on another scale (e.g., SAT scores); Item Response Theory (IRT) is one example of a type of scaling
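One simple equating approach is mean-sigma (linear) equating: a score x on form X maps to the form Y score with the same standardized position, y = mean_Y + (sd_Y / sd_X)(x - mean_X). A minimal sketch; the form statistics are invented, and operational equating designs are considerably more elaborate:

```python
def linear_equate(x: float, mean_x: float, sd_x: float,
                  mean_y: float, sd_y: float) -> float:
    """Mean-sigma linear equating: place x at the same
    standardized position on form Y's scale."""
    return mean_y + (sd_y / sd_x) * (x - mean_x)

# Invented form statistics: form X (mean 30, SD 6), form Y (mean 32, SD 5).
# A raw score of 36 on form X (one SD above the mean) maps to 37 on form Y.
print(linear_equate(36, mean_x=30, sd_x=6, mean_y=32, sd_y=5))  # 37.0
```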
Comparability: Peer Review Guidance Questions (Section 4.4)
- Has the State taken steps to ensure consistency of test forms over time?
- If the State administers both an online and a paper-and-pencil test, has the State documented the comparability of the electronic and paper forms?
Procedures: Test Administration, Scoring, Data Analysis, & Reporting
- Peer Review Guidance Questions (Section 4.5)
– Has the State established clear criteria for the administration, scoring, analysis, and reporting components of the assessment system, including all alternate assessments?
– Does the State have a system for monitoring and improving the on-going quality of its assessment system?
CONSISTENT TEST PROCEDURES
Administration, Scoring, Analysis & Reporting: administer → score → analyze → report
- Uniform presentation
- Structured accommodations
- Consistently applied scoring rules
- Uniform data analysis
- Consistently applied reporting rules and templates
- Procedures specified in policies and guidelines
- Implementation monitored and findings documented
- Security prescribed and enforced
- Improvements planned and implemented
Evaluating the TQ of Accommodations
- States must provide for the use of appropriate accommodations AND must have conducted studies confirming that scores from accommodated assessments can be meaningfully combined with scores from non-accommodated administrations of the assessment.
TEST ACCOMMODATIONS are
- Changes in setting, schedule, timing, presentation, response, etc.
- Intended to equalize opportunity to perform
- Consistent with instructional experiences
- Consistently and securely administered
- Valid and reliable
Peer Review Guidance Questions (Section 4.6)
How has the State
- Ensured that appropriate accommodations are available to students with disabilities & students covered by Section 504 AND that these accommodations are used in a manner consistent with the student's IEP or 504 plan?
- Determined that scores for students with disabilities based on accommodated test administrations will allow for valid inferences about the student's knowledge & skills AND can be combined meaningfully with scores from non-accommodated administrations?
- Ensured that appropriate accommodations are available to limited English proficient (LEP) students . . . ?
- Determined that scores for LEP students based on accommodated administrations will allow for valid inferences . . . ?
Using the Handbook to Guide your Assessment
Beth Cipoletti, Ed.D., Assistant Director, Office of Assessment and Accountability
Directions
Anticipated
- Easy trip
– Have directions
– Have GPS
– Experience with route
– Know best places to stop
– Be in front of the storm
- Reading, PA by 6:30 p.m.
Reality
- Office meeting ran longer than expected
– Departure from Charleston later than planned
- Storm started earlier than forecast
– Snow in the mountains
– Rain to the east
- Limited visibility
- Poor road conditions
Changing Landscape
- New world of assessment and accountability systems
- Growth to standards adds a new dimension
- Valid systems are more complex
- Different tools to measure school and teacher effectiveness
Anticipated Work
- Stakeholder information
- Systems monitoring
- Validation of technical adequacy of assessments
- Peer Review
- Changes in federal requirements
Reality
- Work does not start on time
- Changes happen faster than anticipated
- Inaccurate vision of the future
- More time is needed to finish than expected
Handbook (GPS)
- Resource on assessment and accountability
- Straightforward and plainspoken language
- Overview of relevant topics
- References and links to documents for more in-depth study
Using the Handbook
- Determine course of action
- Respond to specific questions
- Include sections and parts in responses to inquiries or newsletters
– May qualify, modify, or make state specific
- Incorporate PowerPoint slides into presentations
Elizabeth Davis, Discussant, Alaska Department of Education, elizabeth.davis@alaska.gov
Next Steps
- Finalize revisions with T1-CAS
- CCSSO review and publication
- Handbook made available to T1-CAS members
- CCSSO makes Handbook an official publication for distribution
- Possible webinar to acquaint states with Handbook
For information on joining T1-CAS, please contact
- Joe Crawford, Program Associate, CCSSO, joec@ccsso.org, 202-312-6436 (office), 330-687-1185 (mobile)
For information on activities of T1-CAS, please contact
- Wayne Neuburger, T1-CAS Program Director, wayne-maryn@comcast.net, 503-390-8045, 503-580-5779 (mobile)