EMPIRICAL ANALYSIS OF THE NYS APPR SYSTEM - Brenda Myers, Superintendent of Schools - PowerPoint PPT Presentation


SLIDE 1

EMPIRICAL ANALYSIS OF THE NYS APPR SYSTEM

SLIDE 2

Brenda Myers
Superintendent of Schools, Valhalla UFSD

SLIDE 3

Education Analytics’ Work with the Lower Hudson Council

• Analyzed state growth model methods and policy
• Acquired data from many districts in the council
• Analyzed results for these members
• Unique cross-district data collaboration
  • Allows for a better understanding of how state policy is affecting local decisions
  • Individual district data is not enough for a broad picture

SLIDE 4

Goals for Today

• Present a high-level discussion of data findings
• Examine how APPR rating policy may affect measurement
• Look at where this all fits in with other states

SLIDE 5

Andrew Rice
Vice President of Research and Operations, Education Analytics

SLIDE 6

EA Mission

• Founded in 2012 by Dr. Robert Meyer, director of the Value-Added Research Center (VARC) at the University of Wisconsin-Madison
• “Conducting research and developing policy and management analytics to support reform and continuous improvement in American education”
• Developing and implementing analytic tools for systems of education, based on research innovations developed in academia

SLIDE 7

What Are Our Biases?

• Support research- and data-based policy
• Scientific perspective on decision making
• Respect (not expertise) for the political process
• If the data say:
  • the emperor has no hat
  • the emperor has no shoes
  • the emperor has no robe
• We would conclude:
  • it may be the case that the emperor has no clothes

SLIDE 8

Who We Work With

• Districts
• States
• Foundations (Walton, Gates, Dell)
• Unions (NEA, AFT)
• Understanding the data is useful to everyone

SLIDE 9

Measures vs. Ratings

• A Measure
  • Has technical validity
  • Can be evaluated by scientific inquiry
  • SGP, Charlotte Danielson Rubric, survey results, etc.
• A Rating
  • Is a policy judgment
  • Cannot be evaluated without policy judgment
  • APPR Categories: “Effective”, “Developing”, etc.

SLIDE 10

Measure to Rating Conversion

SLIDE 11

HEDI Scales: State Growth, Comparable, Locally Selected

• State Growth Model
• Comparable Growth & Locally Selected Measures

[Chart: two identical 1-20 scales, one for the State Growth Model and one for Comparable Growth & Locally Selected Measures, each divided into Ineffective, Developing, Effective, and Highly Effective bands]
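In code, the measure-to-rating step is just a banding of the 0-20 subcomponent score. A minimal Python sketch; the band boundaries here are assumptions for illustration, since the actual cut scores are set in NYSED regulation:

```python
# Convert a 0-20 subcomponent score into a HEDI rating.
# Band boundaries are illustrative assumptions, not official cut scores.
SUBCOMPONENT_BANDS = [
    (2, "Ineffective"),        # 0-2   (assumed)
    (8, "Developing"),         # 3-8   (assumed)
    (17, "Effective"),         # 9-17  (assumed)
    (20, "Highly Effective"),  # 18-20 (assumed)
]

def subcomponent_rating(points: int) -> str:
    """Map a 0-20 subcomponent score onto a HEDI band."""
    if not 0 <= points <= 20:
        raise ValueError("subcomponent scores run from 0 to 20")
    for upper, label in SUBCOMPONENT_BANDS:
        if points <= upper:
            return label

print(subcomponent_rating(12))  # "Effective" under these assumed bands
```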

SLIDE 12

If HEDI Scales were Consistent

• State Growth Model
• Comparable Growth & Locally Selected Measures
• Observation Rubrics and Practice Measures
• Hypothetically Aligned Composite Rating

[Chart: all four scales shown with identically proportioned Ineffective, Developing, Effective, and Highly Effective bands]

SLIDE 13

The Actual Composite Rating

• Hypothetically Aligned Composite Rating
• Actual Composite Rating

[Chart: two 0-100 composite scales marked in five-point increments. The hypothetically aligned scale divides its Ineffective, Developing, Effective, and Highly Effective bands evenly; on the actual scale the Developing, Effective, and Highly Effective bands are compressed into the upper end of the range]
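The contrast on this slide can be made concrete with two banding tables. A sketch: the "actual" cut scores below (64/74/90) reflect our reading of the 2012-13 composite bands and should be treated as an assumption, not a citation of the regulation:

```python
# Hypothetically aligned bands split the 0-100 composite scale evenly;
# the "actual" bands below are an assumed reading of the NY composite
# scale, in which the upper categories are compressed.
ALIGNED_BANDS = [(24, "Ineffective"), (49, "Developing"),
                 (74, "Effective"), (100, "Highly Effective")]
ACTUAL_BANDS = [(64, "Ineffective"), (74, "Developing"),
                (90, "Effective"), (100, "Highly Effective")]

def composite_rating(points: int, bands) -> str:
    """Map a 0-100 composite score onto HEDI bands."""
    for upper, label in bands:
        if points <= upper:
            return label
    raise ValueError("composite scores run from 0 to 100")

# The same composite score lands in different categories:
print(composite_rating(70, ALIGNED_BANDS))  # "Effective"
print(composite_rating(70, ACTUAL_BANDS))   # "Developing"
```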

SLIDE 14

Impact on Observation and Practice Measures Rating Scale

• State Growth Model
• Comparable Growth & Locally Selected Measures
• Observation Rubrics and Practice Measures
• Actual Composite Rating

[Chart: the State Growth and Comparable/Locally Selected scales keep their original band proportions, while the Observation Rubrics and Practice Measures scale is compressed to fit the actual composite bands]

SLIDE 15

Alignment of Actual Lower Hudson Scores to Compressed Scale

SLIDE 16

Observation Rubrics and Practice Measures Scores

• Districts are responding to a particular set of rules that requires them to abandon almost all of the rating scale
  • it would be optimal if they did not have to
• Does your district retain the “measures” for decision making?
  • Report “ratings” as compressed
  • Effort is not wasted as long as the information retains value

SLIDE 17

State Growth Model Study Findings

SLIDE 18

SGP Model

• The NY SGP model is rigorous and attempts to deal with many growth-modeling issues
• In phase 1 (2011/2012) we were concerned with strong relationships between incoming test performance and SGP
• In phase 2 (2012/2013) we note that this has been changed through the addition of classroom-average characteristics, or “peer effect” variables (see the sketch below)
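To make the "peer effect" idea concrete, here is an illustrative sketch, not NYSED's actual specification: a growth regression on simulated data that includes the classroom-average prior score, with an SGP-like percentile computed from teacher-level residuals:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: 1000 students in 50 classrooms (not real scores).
n_students, n_classes = 1000, 50
classroom = rng.integers(0, n_classes, n_students)
prior = rng.normal(0.0, 1.0, n_students)
current = 0.7 * prior + rng.normal(0.0, 0.5, n_students)

# "Peer effect" covariate: classroom average of incoming performance,
# so a teacher's growth estimate is less confounded with class composition.
class_mean_prior = np.array(
    [prior[classroom == c].mean() for c in range(n_classes)])[classroom]

# Least-squares fit: current ~ intercept + prior + classroom mean prior
X = np.column_stack([np.ones(n_students), prior, class_mean_prior])
beta, *_ = np.linalg.lstsq(X, current, rcond=None)
residual = current - X @ beta

# Teacher-level growth: mean residual per classroom, expressed as a
# percentile rank across teachers (an SGP-like 1-99 scale).
teacher_effect = np.array(
    [residual[classroom == c].mean() for c in range(n_classes)])
ranks = teacher_effect.argsort().argsort()
percentiles = 1 + 98 * ranks / (n_classes - 1)
print(np.round(percentiles[:5]))
```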

SLIDE 19

Distribution of State Growth Scores

SLIDE 20

State Growth Model

• Distributions are largely spread out over the scale
• High-growth teachers and low-growth teachers in almost every district
• Peer effects have evened out scores between high- and low-proficiency regions
• Class size has some impact, but it is mitigated in the translation from measurement to rating

SLIDE 21

Comparable Measures Study Findings

SLIDE 22

Distribution of Comparable Measures Scores

SLIDE 23

Comparable Measures

• Substantial differences between districts in either
  • the way their policies measure effectiveness with Comparable Measures ratings, OR
  • their teachers’ ability to score highly on these metrics
• NYSED policy allows variance in the implementation of comparable measures
  • rating comparability between districts is therefore suspect
• It appears impossible for a teacher to attain every score from 0-20, as regulation requires

SLIDE 24

Local Measures Study Findings

SLIDE 25

Distribution of Local Measures Scores

SLIDE 26

Local Measures

• Substantial differences between districts in either
  • the way their policies measure effectiveness with Local Measures ratings, OR
  • their teachers’ ability to score highly on these metrics
• NYSED policy allows variance in the implementation of Local measures
  • rating comparability between districts is therefore suspect
• It appears impossible for a teacher to attain every score from 0-20, as regulation requires

SLIDE 27

Student Outcome Measures Comparability Across Districts

Flexibility at the local level seems to have produced ratings that are not comparable across districts

SLIDE 28

[Chart: the three composite components: Observation / Practice, Local Measure, and State Growth or Comparable Growth]

SLIDE 29

Overall System

SLIDE 30

What’s Driving Differentiation of Scores?

[Charts: composite components (Observation and Practice, Local Measure, State Growth or Comparable Growth) stacked on a 0-100 scale with reference points at 75, 83, 92, and 100. Panel 1, “Each Measure Distributed”, shows all three components varying; Panel 2, “State Growth Drives Differences”, shows differentiation coming almost entirely from the growth component]

SLIDE 31

In Lower Hudson

• Only 10% of variance is driven by Observations (see the sketch below)
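One way to see why a compressed observation scale contributes so little differentiation is a variance-share calculation on the composite index. A sketch with hypothetical numbers, not the Lower Hudson data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Hypothetical component scores: the observation scale is compressed
# near its top, so it varies far less than the growth component.
growth = rng.integers(0, 21, n)           # 0-20, spread out
local = rng.integers(9, 21, n)            # 9-20, clustered high
observation = rng.integers(56, 61, n)     # 56-60 of a 0-60 scale
composite = growth + local + observation  # 0-100 composite index

# With independent components, each one's share of composite variance
# is its own variance over the composite's variance.
for name, part in [("growth", growth), ("local", local),
                   ("observation", observation)]:
    share = np.var(part) / np.var(composite)
    print(f"{name}: {share:.0%} of composite variance")
```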

SLIDE 32

Summary of Findings

• Strong variation between district implementations
• SLOs 3 points higher than MGP
• Local measures 4 points higher than MGP
• Observation ratings show almost no differentiation
• Likely driven by the rules set forth in the composite score HEDI bands

SLIDE 33

Nationwide Context

SLIDE 34

Total system issues

• Two big rating systems
  • Index: weighted points
  • Matrix: categories based on multiple positions of measures
• A visual (rows: Measure 2, columns: Measure 1; a code sketch follows):

              Measure 1
              1  2  3  4
  Measure 2
          3   E  H  H  H
          2   I  D  E  H
          1   I  I  D  H
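A sketch of the two systems side by side: the matrix lookup reproduces the slide's visual, while the index weights and bands are assumed placeholders rather than New York's actual values:

```python
# Matrix system: the category is looked up from the joint position of
# two measures (row = Measure 2, column = Measure 1, as on the slide).
MATRIX = {
    3: {1: "E", 2: "H", 3: "H", 4: "H"},
    2: {1: "I", 2: "D", 3: "E", 4: "H"},
    1: {1: "I", 2: "I", 3: "D", 4: "H"},
}

def matrix_rating(measure1: int, measure2: int) -> str:
    return MATRIX[measure2][measure1]

# Index system: weighted points summed to one scale, then banded.
# Equal weights and these bands are assumptions for illustration.
def index_rating(measure1: float, measure2: float) -> str:
    score = 0.5 * measure1 + 0.5 * measure2
    if score < 1.5:
        return "I"
    if score < 2.5:
        return "D"
    if score < 3.5:
        return "E"
    return "H"

# The two systems can rate the same pair of measures differently: the
# compensatory index averages away a disagreement, the matrix does not.
print(index_rating(4, 1), matrix_rating(4, 1))  # "E" vs "H"
```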

SLIDE 35

Pro/Con of Rating Systems

• Index
  • Pro: easy to calculate
  • Pro: easy to communicate
  • Con: a compensatory model may incent cross-component influence
• Matrix
  • Pro: more flexible
  • Con: more difficult to explain
  • Pro: may allow disagreeing measures to be dealt with in a different way than an index

SLIDE 36

What About the Other 49 States?

• Much experimentation
  • High weights on growth
  • High weights on observations
  • Student surveys
  • Assessment system redesigns
  • Index and Matrix approaches
• Who gets it right?

SLIDE 37

Exemplars

• Developing field
• No state has it right
• Some components work
  • Growth-on-assessment measurement in NY is good
  • No state has gotten SLOs right (RI is getting there)
  • Observations are coming under fire for poor implementation and possible bias (some great work on the measures, not yet on ratings)
• Total system scoring and policy is all over the map

SLIDE 38

Louis Wool
Superintendent of Schools, Harrison Central School District