EMPIRICAL ANALYSIS OF THE NYS APPR SYSTEM
Brenda Myers
Superintendent of Schools, Valhalla UFSD
Education Analytics’ Work with the Lower Hudson Council
Analyzed state growth model methods and policy
Acquired data from many districts in the council
Analyzed results for these members
Unique cross-district data collaboration
Allows for a better understanding of how state policy is affecting local decisions
Individual district data are not enough for a broad picture
Goals for Today
Present a high-level discussion of data findings
Examine how APPR rating policy may affect measurement
Look at where this all fits in with other states
Andrew Rice
Vice President of Research and Operations, Education Analytics
EA Mission
Founded in 2012 by Dr. Robert Meyer, director of the Value-Added Research Center (VARC) at the University of Wisconsin-Madison
"Conducting research and developing policy and management analytics to support reform and continuous improvement in American education"
Developing and implementing analytic tools for systems of education, based on research innovations developed in academia
What Are Our Biases?
Support research- and data-based policy
Scientific perspective on decision making
Respect (not expertise) for the political process
If the data say: the emperor has no hat, the emperor has no shoes, the emperor has no robe
We would conclude: it may be the case that the emperor has no clothes
Who We Work With
Districts
States
Foundations (Walton, Gates, Dell)
Unions (NEA, AFT)
Understanding the data is useful to everyone
Measures vs. Ratings
A Measure
Has technical validity
Can be evaluated by scientific inquiry
Examples: SGP, the Charlotte Danielson rubric, survey results, etc.
A Rating
Is a policy judgment
Cannot be evaluated without policy judgment
Examples: the APPR categories ("Effective", "Developing", etc.)
Measure to Rating Conversion
HEDI Scales: State Growth, Comparable, Locally Selected
[Figure: the four HEDI bands (Ineffective, Developing, Effective, Highly Effective) laid out across the 20-point scales used for the State Growth Model and for Comparable Growth & Locally Selected Measures]
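To make the conversion concrete, below is a minimal sketch of how a 20-point subcomponent score could map to a HEDI category. The cut points are illustrative assumptions for this sketch, not the statutory values, which varied by subcomponent.

```python
# Minimal sketch: mapping a 20-point subcomponent score to a HEDI category.
# The cut points below are illustrative assumptions, not the statutory ones.
HEDI_BANDS = [
    (0, 2, "Ineffective"),
    (3, 8, "Developing"),
    (9, 17, "Effective"),
    (18, 20, "Highly Effective"),
]

def hedi_rating(points: int) -> str:
    """Convert a 0-20 subcomponent score into a HEDI rating label."""
    for low, high, label in HEDI_BANDS:
        if low <= points <= high:
            return label
    raise ValueError(f"score out of range: {points}")

print(hedi_rating(12))  # -> Effective
```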
If HEDI Scales were Consistent
[Figure: hypothetically aligned HEDI bands across all four scales: State Growth Model; Comparable Growth & Locally Selected Measures; Observation Rubrics and Practice Measures; Composite Rating. Each scale spans Ineffective, Developing, Effective, and Highly Effective in equal proportions]
The Actual Composite Rating
[Figure: the hypothetically aligned composite bands compared with the actual composite bands on the 0-100 composite scale, shown in five-point bins. On the actual scale, the Developing, Effective, and Highly Effective bands are compressed into the upper end of the range]
Impact on Observation and Practice Measures Rating Scale
[Figure: the actual composite bands mapped back onto the State Growth Model, Comparable Growth & Locally Selected Measures, and Observation Rubrics and Practice Measures scales, showing how the composite bands narrow the usable range of the observation scale]
Alignment of Actual Lower Hudson Scores to Compressed Scale
[Figure: actual Lower Hudson Observation Rubrics and Practice Measures scores plotted against the compressed scale]
Districts are responding to a particular set of rules that requires them to abandon almost all of the rating scale; it would be optimal if they did not have to.
Does your district retain the "measures" for decision making?
Report "ratings" as compressed; the effort is not wasted as long as the information retains value. The arithmetic behind the compression is sketched below.
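The sketch assumes the 20 (growth) + 20 (local) + 60 (observation) composite point split and an illustrative Effective cut score of 75 on the 0-100 composite scale; both figures are assumptions here. Under them, a teacher sitting mid-band on both student measures needs nearly all of the observation points to land an Effective composite.

```python
# Sketch: why composite bands push observation scores to the top of the scale.
# Assumes the 20 (growth) + 20 (local) + 60 (observation) point split and an
# illustrative Effective composite cut score of 75; both are assumptions here.
EFFECTIVE_CUT = 75
OBS_MAX = 60

def observation_points_needed(growth: int, local: int) -> int:
    """Observation points required to reach the Effective composite band."""
    return max(0, EFFECTIVE_CUT - growth - local)

needed = observation_points_needed(growth=9, local=9)
print(f"{needed} of {OBS_MAX} observation points needed")  # -> 57 of 60
```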
State Growth Model Study Findings
SGP Model
The NY SGP model is rigorous and attempts to deal with many growth modeling issues.
In phase 1 (2011/2012) we were concerned about strong relationships between incoming test performance and SGP.
In phase 2 (2012/2013) we note that this has been addressed through the addition of classroom-average characteristics, or "peer effect" variables (illustrated in the sketch below).
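As a rough illustration of the phase 2 change, the sketch below adds a classroom-average prior score (a "peer effect" covariate) to an ordinary least squares growth regression on simulated data. This is a schematic stand-in for the idea, not the state's actual model.

```python
# Schematic growth model with a "peer effect" covariate, on simulated data.
# An OLS stand-in used only to illustrate the idea; not the state's model.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
prior = rng.normal(size=n)                # each student's prior-year score
classroom = rng.integers(0, 40, size=n)   # classroom assignment
class_mean_prior = np.array(
    [prior[classroom == c].mean() for c in classroom]
)                                         # classroom-average prior score
current = 0.7 * prior + 0.2 * class_mean_prior + rng.normal(0, 0.5, n)

# Regress the current score on the prior score plus the classroom-average
# covariate; residuals measure growth net of individual AND peer effects.
X = np.column_stack([np.ones(n), prior, class_mean_prior])
beta, *_ = np.linalg.lstsq(X, current, rcond=None)
growth_residual = current - X @ beta
print(beta)  # intercept, prior-score, and peer-effect coefficients
```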
Distribution of State Growth Scores
State Growth Model
Distributions are largely spread out over the scale
High-growth and low-growth teachers appear in almost every district
Peer effects have evened out scores between high- and low-proficiency regions
Class size has some impact, but it is mitigated in the translation from measurement to rating
Comparable Measures Study Findings
Distribution of Comparable Measures Scores
Comparable Measures
Substantial differences exist between districts in either the way their policies measure effectiveness with Comparable Measures ratings OR their teachers' ability to score highly on these metrics.
NYSED policy allows variance in the implementation of Comparable Measures, so rating comparability between districts is suspect.
It does not seem possible for a teacher to attain every one of the scores from 0 to 20, as regulation requires.
Local Measures Study Findings
Distribution of Local Measures Scores
Local Measures
Substantial differences exist between districts in either the way their policies measure effectiveness with Local Measures ratings OR their teachers' ability to score highly on these metrics.
NYSED policy allows variance in the implementation of Local Measures, so rating comparability between districts is suspect.
It does not seem possible for a teacher to attain every one of the scores from 0 to 20, as regulation requires.
Flexibility at the local level seems to have produced ratings that are not comparable across districts.
Student Outcome Measures Comparability Across Districts
[Figure: comparability across districts of each measure (Observation/Practice, Local Measure, State Growth or Comparable Growth) and of the overall system]
What’s Driving Differentiation of Scores?
Each Measure Distributed
[Figure: distributions of the Observation and Practice, Local Measure, and State Growth or Comparable Growth scores across the 0-100 point scale, with marked values at 75, 83, 92, and 100]
State Growth Drives Differences
[Figure: contribution of each measure (Observation and Practice, Local Measure, State Growth or Comparable Growth) to composite score differentiation]
In Lower Hudson, only 10% of the variance is driven by Observations (a toy decomposition follows).
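That kind of claim can be checked with a toy variance decomposition: each component's covariance with the composite, taken as a share of composite variance (the shares sum to one). The simulated spreads below are assumptions chosen to mimic the pattern described, not the Lower Hudson data.

```python
# Toy decomposition of composite variance into per-component shares.
# Simulated spreads are assumptions mimicking the described pattern, not data.
import numpy as np

rng = np.random.default_rng(1)
n = 500
growth = rng.normal(10, 4, size=n)   # widely spread growth scores
local = rng.normal(13, 2, size=n)    # somewhat spread local scores
obs = rng.normal(58, 1, size=n)      # observation scores compressed near 60
composite = growth + local + obs

for name, part in [("growth", growth), ("local", local), ("observation", obs)]:
    share = np.cov(part, composite)[0, 1] / composite.var(ddof=1)
    print(f"{name}: {share:.0%} of composite variance")
```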
Summary of Findings
Strong variation between district implementations
SLOs 3 points higher than MGP
Local measures 4 points higher than MGP
Observation ratings show almost no differentiation
Likely driven by the rules set forth in the composite score HEDI bands
Nationwide Context
Total system issues
Two big rating systems
Index: weighted points
Matrix: categories based on the joint position of multiple measures
A visual (rows = Measure 2, columns = Measure 1):

Measure 2 = 3:  E  H  H  H
Measure 2 = 2:  I  D  E  H
Measure 2 = 1:  I  I  D  H
Measure 1:      1  2  3  4
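Below is a minimal sketch of both systems. The matrix lookup reproduces the visual above; the index weights and cut points are illustrative assumptions.

```python
# Sketch of the two rating systems; the matrix reproduces the visual above,
# while the index weights and cut points are illustrative assumptions.
MATRIX = {  # MATRIX[measure_2][measure_1] -> rating
    3: {1: "E", 2: "H", 3: "H", 4: "H"},
    2: {1: "I", 2: "D", 3: "E", 4: "H"},
    1: {1: "I", 2: "I", 3: "D", 4: "H"},
}

def matrix_rating(measure_1: int, measure_2: int) -> str:
    """Rating from the joint position of the two measures."""
    return MATRIX[measure_2][measure_1]

def index_rating(measure_1: float, measure_2: float) -> str:
    """Compensatory model: weighted sum, then cut points on the total."""
    total = 0.6 * measure_1 + 0.4 * measure_2  # high m1 can offset low m2
    if total < 1.5:
        return "I"
    if total < 2.5:
        return "D"
    if total < 3.5:
        return "E"
    return "H"

print(matrix_rating(4, 1), index_rating(4, 1))  # -> H E (systems can disagree)
```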
Pro/Con of Rating Systems
Index
Pro: easy to calculate
Pro: easy to communicate
Con: the compensatory model may incent cross-component influence
Matrix
Pro: more flexible
Con: more difficult to explain
Pro: may allow disagreeing measures to be dealt with in a different way than an index
What About the Other 49 States?
Much experimentation
High weights on growth
High weights on observations
Student surveys
Assessment system redesigns
Index and matrix approaches
Who gets it right?
Exemplars
Developing field
No state has it right
Some components work
Growth-on-assessment measurement in NY is good
No state has gotten SLOs right (RI is getting there)
Observations are coming under fire for poor implementation and possible bias (some great work on the measures, not yet on ratings)
Total system scoring and policy is all over the map