Handbook for Professional Development in Assessment Literacy: A Resource for States and School Districts
NCSA Conference, June 20, 2013
Project of the Title I Comprehensive Assessment Systems (T1-CAS) CCSSO SCASS
Wayne Neuburger, Ph.D., Advisor
REVISION OF THE HANDBOOK FOR PROFESSIONAL DEVELOPMENT IN ASSESSMENT LITERACY
T1-CAS Mission
- The Title I Comprehensive Assessment Systems SCASS supports states in efforts to use assessment and accountability systems to support the improvement of education in schools using ESEA funds.
- As a national consortium of assessment and Title I professionals, T1-CAS addresses issues in standards, assessment, and accountability systems, and the effects of these systems on the education of Title I students.
Presenters
- Moderator: Wayne Neuburger, Consultant
- Project Director: Jan Sheinker, Consultant
- Author: Doris Redfield, Consultant
- Reviewer: Beth Cipoletti, West Virginia Dept of Ed
- Discussant: Elizabeth Davis, Alaska Dept of Ed
Jan Sheinker, Ed.D., and Doris Redfield, Ph.D.
Developed under the direction of Phoebe C. Winter, Project Director, and the Professional Development for Assessment Literacy Study Group
Comprehensive Assessment Systems for IASA Title I State Collaborative on Assessment and Student Standards, Council of Chief State School Officers
Purpose 2001 and 2013
- It is intended to provide a resource to states and districts as they deploy state and district assessment systems aligned with standards for purposes of improving student learning through accountability and school improvement.
- We hope that this document also serves as a resource for informing many constituents of education about the purposes of assessment and the importance of an aligned assessment system to the overall educational system.
SCASS GROUPS PROVIDING INPUT
- Formative Assessment for Students and Teachers (FAST)
- Assessing Special Education Students (ASES)
- English Language Learners (ELL)
- Accountability Systems and Reporting (ASR)
- Technical Issues in Large-Scale Assessment (TILSA)
Need for Revision
- Significant changes have occurred in assessment systems since 2001.
- Accountability systems have changed since 2001.
- Assessment technical issues have progressed to address new assessment and accountability systems.
- Distribution systems are more sophisticated.
Organization and Content of the Handbook
Jan Sheinker jansheinker@gmail.com
Organization – Actually FOUR documents
Documents / Users / Uses:
- Original document – states, districts, schools, and other interested audiences – to preserve the original document
- Word and PPT (static & animated) – state and district PD presenters – for customization by individual states, districts, & schools as PD scripts and handouts for PD presenters
- Word – individual states, districts, & schools – for customization by individual states, districts, & schools as district and school newsletter inserts to inform parents and community
- PowerPoint – for use in trainings & presentations
Table of Contents
Guide to Using the Handbook
Chapter One: Why Build an Assessment System?
Chapter Two: What Is Technical Quality?
Chapter Three: How Are the Purposes of Assessment Related to Technical Quality?
Chapter Four: How Are the Uses of Assessment Related to Technical Quality?
Chapter Five: How Do Schools/States Report Results in Proper Context?
Chapter Six: How Should Results Be Used to Make Decisions?
Appendices
Glossary
References
Guide to Using the Handbook
Using the Handbook
Audiences for the Handbook:
- State Department Personnel
- Schoolwide Planning Participants
- Local School Boards and Administrators
- Legislative Committees
- Regional Professional Development Participants
- Members of Professional Associations
- Pre-Service and Graduate Students in Education
- Community Members
Customizing
Conclusion
Chapter One: Why Build an Assessment System?
- Standards Aligned Assessment Systems
– How have aligned assessment systems evolved?
– How have Common Core State Standards influenced the development of assessment systems?
- Relationships of Tests to Assessment Systems to Accountability Systems
– What are comprehensive assessment systems?
– What are the purposes and relationships among formative assessment strategies and classroom, school, district, and state tests?
– How do assessment systems relate to accountability systems?
– Why use an assessment system instead of individual tests to set goals and make decisions?
- Developing a Comprehensive System
– What is the role of formative assessment strategies in a comprehensive system?
– What is the role of classroom tests in a comprehensive system?
– What is the role of interim and benchmark tests in a comprehensive system?
– What is the role of district tests in a comprehensive system?
– What is the role of English language proficiency tests in a comprehensive system?
– What is the role of state tests in a comprehensive system?
- Assessment Systems and the Classroom
– How do assessment and accountability systems relate to the way we do business in classrooms?
– Who does what in standards-based schools?
Chapter Two: What is Technical Quality?
- Aspects of Technical Quality
- Validity
– Why is validity important?
– How can validity be increased?
- Reliability
- Fairness, Bias and Accessibility
- Comparability
- Procedures for test administration, scoring, data analysis, and reporting
- Evaluation of the technical quality of accommodations
Chapter Three: How are the Purposes of Assessment Related to Technical Quality?
- Alignment of Assessments with Standards
– What is alignment with the purpose of the assessment?
– Why are specific characteristics important to alignment?
– Why are vertical and horizontal alignment important?
- Accountability Purposes
– Why is accountability needed?
– Who is accountable for student learning?
– For what should schools be held accountable?
– What is accountability for growth?
- Assessment Purposes
– Why do assessment systems include different types or combinations of tests?
– What are norm-referenced tests?
– What are standards-based tests?
– What are augmented assessment systems?
– What are computer adaptive tests?
– What are APIP-enabled technology-enhanced assessment systems?
Chapter Four: How are the Uses of Assessment Results Related to Technical Quality?
- Using Results Appropriately to Avoid Misuses
– What are appropriate uses and potential misuses of standards-based tests?
– What are appropriate uses and potential misuses of norm-referenced tests?
– What are appropriate uses and potential misuses of interim/benchmark tests?
– What are appropriate uses and potential misuses of classroom tests?
– What are appropriate uses and potential misuses of formative assessment strategies?
- Usability of Results
– What should be considered in determining the usability of results?
– What factors affect the credibility of results?
– What factors affect the accuracy of score interpretation?
– What results are needed to adjust instruction?
– What factors affect the usefulness of results for teacher and leader evaluation?
Chapter Five: How Do Schools/States Report Results in Proper Context?
Reporting Results
- How are indicators selected for reporting?
- What indicators provide direct measures of student achievement of standards?
- What other indicators provide direct measures of student knowledge?
- What student learning indicators provide indirect measures of student performance?
Reporting Related Indicators
- What indicators provide measures of opportunity to learn?
- What context variables affect student learning?
Cautions for Reporting Results: Sampling and Sample Size
- How does sampling affect the technical quality of a test?
- How do population and sample size affect the technical quality of a test?
Chapter Six: How Should Results Be Used to Make Decisions?
- Using Results to Make Decisions
- Using Results to Make Decisions About School Improvement
– How can the results be used to profile student performance?
– How is the profile used to set school improvement goals?
– What is considered in developing a plan to achieve the goals?
– How are school improvement results monitored and documented?
- Using Results to Make Decisions About Policy Changes and Evaluating Program Effectiveness
– How can the system be monitored and evaluated?
– How can results be used to make policy decisions concerning resources?
– How can the delivery system be altered to improve results?
- Using Results to Make Decisions About Rewards and Sanctions
Technical Quality (Chapter 2: Handbook for Professional Development in Assessment Literacy)
Doris Redfield, Ph.D., dredfield@suddenlink.net, 304-344-3083
Contents
- Aspects of Technical Quality
- Validity
– Importance
– Ways to increase
- Reliability
- Fairness, Bias, & Accessibility
- Comparability
- Procedures: Test Administration, Scoring, Data Analysis, Reporting
- Evaluation of Accommodations Use
What is Technical Quality (TQ)?
- The integrity of each assessment, instrument, process, & procedure for contributing to fair and defensible decisions.
Aspects of TQ
- Validity (accuracy – the test measures what it purports to measure)
- Reliability (consistency – whatever is measured is measured consistently across circumstances)
- Other
– Fairness & accessibility
– Comparability
– Procedures for test administration, scoring, data analysis & reporting
– Interpretation & use of results
TECHNICAL QUALITY IN ASSESSMENT SYSTEMS
- Rigorous standards and standards-aligned tests
- Valid, reliable, comparable, fair, and accessible tests that include ALL students
- Technically sound administration and scoring
- Accurate reporting and interpretation of results
Additions and/or Increased Emphasis
- Methodologies
– Comparability
– Generalizability
– Standard Setting
- Attention to
– Peer Review Guidance
– Joint Standards for Educational & Psychological Testing
– Evaluation of the TQ of Accommodations
VALIDITY
- Are we measuring what we say we are measuring?
- Can we make valid interpretations/inferences?
Validity
- Four broad categories (Standards for Educational & Psychological Testing):
1. Test content (content validity)
2. Relationship to other variables or criteria
3. Student response processes
4. Internal structure of the assessment
Validity: Peer Review Guidance Questions (Section 4.2)
Has the State
- Specified the purposes of the assessments . . . ?
- Ascertained that the assessments measure the knowledge and skills described in its academic content standards . . . ?
- Ascertained that its assessment items are tapping the intended cognitive processes & are at the appropriate grade level?
- Ascertained that scoring and reporting structures are consistent with the sub-domain structures of its academic content standards?
- Ascertained that test & item scores are related to outside variables as intended?
- Ascertained that the decisions based on assessment results are consistent with the purposes for which the assessments were designed?
- Ascertained whether the assessment produces intended and unintended consequences?
KEY: Provide clear & explicit documentation of evidence
INCREASING VALIDITY
- Multiple measures
- Face validity
- Content validity: breadth, depth, cognitive complexity, and range of knowledge & skills, including thinking skills, represented by the content standards
- Construct validity
– Evidence of convergent criterion validity or the relationship of particular items to the other items on the same test/assessment (internal consistency reliability; see the sketch below)
– Evidence of divergent criterion validity
– Expert review
Reliability
- Test reliability: reliability coefficients; standard errors of measurement (SEMs); confidence bands (see the sketch below)
- Rater reliability: inter- & intra-rater
- Standard setting and reliability of cut scores
– At least 3 levels required
– Descriptors for each level required
– Methods: Angoff (& modifications); Bookmark; Body of Work
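The SEM ties a reliability coefficient to the uncertainty around an individual score: SEM = SD x sqrt(1 - reliability), and an approximate confidence band is score ± z x SEM. A minimal sketch; the scale SD, reliability, and score values are illustrative assumptions, not figures from the Handbook:

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def confidence_band(score: float, sd: float, reliability: float, z: float = 1.96):
    """Approximate 95% confidence band around an observed score."""
    half_width = z * sem(sd, reliability)
    return score - half_width, score + half_width

# Illustrative values: scale SD 15, reliability 0.90, observed score 100.
print(round(sem(15, 0.90), 2))         # 4.74
print(confidence_band(100, 15, 0.90))  # about (90.7, 109.3)
```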
STANDARD SETTING METHODS
- Angoff: panel of content experts; sum of item averages; raw cut score (see the sketch below)
- Bookmark: statistical item difficulty; ordered item booklets reviewed by a panel of raters; panel "bookmarks" cut points; IRT-based cut scores
- Body of Work: student response booklets reviewed by a panel of content experts; identify knowledge and skills; assign achievement level; final cut score recommendations
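In the Angoff method, each panelist judges, for each item, the probability that a minimally proficient student would answer correctly; averaging over panelists and summing over items yields the raw cut score. A minimal sketch; the panel size and ratings below are invented for illustration:

```python
import numpy as np

# Invented ratings: rows = panelists, columns = items. Each value is the
# judged probability that a minimally proficient student answers correctly.
ratings = np.array([
    [0.70, 0.55, 0.80, 0.40],
    [0.65, 0.60, 0.75, 0.50],
    [0.75, 0.50, 0.85, 0.45],
])

item_averages = ratings.mean(axis=0)  # average judgment per item
raw_cut_score = item_averages.sum()   # sum of item averages
print(item_averages.round(2), round(raw_cut_score, 2))  # [0.7 0.55 0.8 0.45] 2.5
```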
Reliability Cont’d . . .
- Generalizability: extent to which reliability findings can be replicated across situations
- Purpose specific
- Peer Review Guidance Questions (Section 4.2) – Has the State
– Determined the reliability of the scores it reports . . . ?
– Quantified & documented the conditional SEM and student classification consistency at each cut score in the State's academic achievement standards (see the simulation sketch below)?
– Reported generalizability evidence for all relevant sources . . . ?
KEY: Provide clear & explicit documentation of evidence
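Classification consistency at a cut score asks how often two administrations of the same test would place a student in the same achievement level. One way to estimate it is by simulation: draw true scores, add independent measurement error twice, and compare the two level assignments. A sketch under assumed values; the cut scores, SEM, and score distribution are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

cut_scores = np.array([220, 240, 260])  # invented cut scores (4 levels)
sem = 5.0                               # assumed standard error of measurement
true_scores = rng.normal(245, 20, size=100_000)  # assumed true-score distribution

def level(scores: np.ndarray) -> np.ndarray:
    """Achievement-level index (0..3) implied by the cut scores."""
    return np.searchsorted(cut_scores, scores, side="right")

# Two simulated administrations, each with independent measurement error.
first = level(true_scores + rng.normal(0, sem, true_scores.size))
second = level(true_scores + rng.normal(0, sem, true_scores.size))

print("classification consistency:", (first == second).mean())
```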
Fairness, Bias, & Accessibility
Peer Review Guidance Questions (Section 4.3) – Has the State
- Ensured that the assessments provide an appropriate variety of accommodations . . . ?
- Ensured that the assessments provide an appropriate variety of linguistic accommodations . . . ?
- Taken steps to ensure fairness in the development of the assessments?
- Used accommodations &/or alternate assessments to yield meaningful scores?
Most Likely Sources of Unfairness
(Standards for Educational & Psychological Testing)
- Items or tasks do not provide an equal opportunity for all students to demonstrate their knowledge & skills fully.
- The assessments are not administered in ways that ensure fairness.
- The results are not reported in ways that ensure fairness.
- The results are not interpreted or used in ways that lead to equitable treatment of those affected by the results.
Key Sources of Challenge to Fairness, Bias, & Accessibility
- Language loading
- Background knowledge requirements
- Cultural bias
- Other factors specific to certain students
FAIRNESS AND BIAS
- Fairness of assessment: for purpose; for group; for use
- Bias of items/results: for male/female; for English language learners; for racial or ethnic minorities; for students with disabilities
Fairness and Access
- Access through accommodations
– For students with disabilities
– For English language learners
– That yield meaningful scores
- Fairness in
– Item/task design
– Test administration
– Reporting results
– Interpretation of results
Fairness, Access & Bias in Items & Tasks
- Language loading: use of vocabulary not required by the content that impedes access to assessment items and tasks for students who lack proficiency in language, whether because they have another primary language, a learning disability, or delayed language development.
- Background knowledge requirements: design or content of assessment items or tasks that impedes access based on assumptions about what knowledge students bring to the assessment situation apart from instruction.
- Cultural bias: inclusion of situations and contexts in assessment items and tasks that may be misinterpreted by students from different cultural backgrounds in ways that interfere with performance.
- Other: incorporation of other factors in assessment item and task design that increase the challenge of performing correctly in ways unrelated to the content being assessed.
Comparability
- An aspect of reliability that allows for comparisons from year to year, student to student, school to school, form to form, etc.
- Methodologies (see the equating sketch below)
– Linking: the relationship between test scores on two different tests that are not necessarily built to have the same content or level of difficulty
– Equating: a type of linking that provides the strongest possible linking relationship, rendering test scores across different forms of the same test interchangeable
– Scaling: a process for transforming raw scores to scores on another scale (e.g., SAT scores); Item Response Theory (IRT) is one example of a type of scaling
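One simple equating approach is mean-sigma (linear) equating: a score x on form X maps to the form Y score with the same standardized position, y = mean_Y + (sd_Y / sd_X)(x - mean_X). A minimal sketch; the form statistics are invented, and operational equating designs are considerably more elaborate:

```python
def linear_equate(x: float, mean_x: float, sd_x: float,
                  mean_y: float, sd_y: float) -> float:
    """Mean-sigma linear equating: place x at the same
    standardized position on form Y's scale."""
    return mean_y + (sd_y / sd_x) * (x - mean_x)

# Invented form statistics: form X (mean 30, SD 6), form Y (mean 32, SD 5).
# A raw score of 36 on form X (one SD above the mean) maps to 37 on form Y.
print(linear_equate(36, mean_x=30, sd_x=6, mean_y=32, sd_y=5))  # 37.0
```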
Comparability: Peer Review Guidance Questions (Section 4.4)
- Has the State taken steps to ensure consistency of test forms over time?
- If the State administers both an online and a paper-and-pencil test, has the State documented the comparability of the electronic and paper forms?
Procedures: Test Administration, Scoring, Data Analysis, & Reporting
- Peer Review Guidance Questions (Section 4.5)
– Has the State established clear criteria for the administration, scoring, analysis, and reporting components of the assessment system, including all alternate assessments?
– Does the State have a system for monitoring and improving the on-going quality of its assessment system?
CONSISTENT TEST PROCEDURES
Administration, Scoring, Analysis & Reporting: administer → score → analyze → report
- Uniform presentation
- Structured accommodations
- Consistently applied scoring rules
- Uniform data analysis
- Consistently applied reporting rules and templates
- Procedures specified in policies and guidelines
- Implementation monitored and findings documented
- Security prescribed and enforced
- Improvements planned and implemented
Evaluating the TQ of Accommodations
- States must provide for the use of appropriate accommodations AND must have conducted studies confirming that scores from accommodated assessments can be meaningfully combined with scores from non-accommodated administrations of the assessment.
TEST ACCOMMODATIONS are
- Changes in setting, schedule, timing, presentation, response, etc.
- Intended to equalize opportunity to perform
- Consistent with instructional experiences
- Consistently and securely administered
- Valid and reliable
Peer Review Guidance Questions (Section 4.6)
How has the State
- Ensured that appropriate accommodations are available to students with disabilities & students covered by Section 504 AND that these accommodations are used in a manner consistent with the student's IEP or 504 plan?
- Determined that scores for students with disabilities based on accommodated test administrations will allow for valid inferences about the student's knowledge & skills AND can be combined meaningfully with scores from non-accommodated administrations?
- Ensured that appropriate accommodations are available to limited English proficient (LEP) students . . . ?
- Determined that scores for LEP students based on accommodated administrations will allow for valid inferences . . . ?
Using the Handbook to Guide your Assessment
Beth Cipoletti, Ed.D., Assistant Director, Office of Assessment and Accountability
Directions
Anticipated
- Easy trip
– Have directions
– Have GPS
– Experience with route
– Know best places to stop
– Be in front of the storm
- Reading, PA by 6:30 p.m.
Reality
- Office meeting ran longer than expected
– Departure from Charleston later than planned
- Storm started earlier than forecast
– Snow in the mountains
– Rain to the east
- Limited visibility
- Poor road conditions
Changing Landscape
- New world of assessment and accountability systems
- Growth to standards adds a new dimension
- Valid systems are more complex
- Different tools to measure school and teacher effectiveness
Anticipated Work
- Stakeholder information
- Systems monitoring
- Validation of technical adequacy of assessments
- Peer Review
- Changes in federal requirements
Reality
- Work does not start on time
- Changes happen faster than anticipated
- Inaccurate vision of the future
- More time is needed to finish than expected
Handbook (GPS)
- Resource on assessment and accountability
- Straightforward and plainspoken language
- Overview of relevant topics
- References and links to documents for more in-depth study
Using the Handbook
- Determine course of action
- Respond to specific questions
- Include sections and parts in responses to inquiries or newsletters
– May qualify, modify, or make state specific
- Incorporate PowerPoint slides into presentations
Elizabeth Davis, Discussant, Alaska Department of Education, elizabeth.davis@alaska.gov
Next Steps
- Finalize revisions with T1-CAS
- CCSSO review and publication
- Handbook made available to T1-CAS members
- CCSSO makes Handbook an official publication for distribution
- Possible webinar to acquaint states with Handbook
For information on joining T1-CAS, please contact
- Joe Crawford, Program Associate, CCSSO, joec@ccsso.org, 202-312-6436 (office), 330-687-1185 (mobile)
For information on activities of T1-CAS, please contact
- Wayne Neuburger, T1-CAS Program Director, wayne-maryn@comcast.net, 503-390-8045, 503-580-5779 (mobile)