Growth in Student Achievement: Issues of Measurement, Longitudinal Analyses & Accountability
Damian W. Betebenner, NCIEA
CCSSO NCSA, June 23, 2010

Discussions of student growth lie at the intersection of three topics:
Longitudinal Data Analysis/Applied Statistics
Accountability/Education Policy/Data Use
Measurement/Psychometrics
Overview
Measurement/Psychometrics
Examining student growth requires multiple measurements of the same individual
Growth in what?
How much growth? (How is scaling involved in answering this question?)
Is it enough growth?
Overview
Longitudinal Data Analysis/Applied Statistics
Many methods exist for the analysis of longitudinal data
What are the relevant questions?
Are the analytic techniques capable of answering those questions?
Do the data possess properties sufficient for the analytic techniques employed (e.g., a vertical scale)?
Does the analysis sustain the inferences made from the data?
Overview
Accountability/Education Policy/Data Use
Education policy & accountability have many goals and purposes
Why growth in accountability?
What are the goals and purposes of accountability?
What is the theory of action behind accountability?
How can we judge the validity of the accountability system?
What about the current policy context?
Measurement/Psychometric Issues
Technical Considerations
Growth in what? How much growth?
Scales for measuring growth: ordinal (within-year, across-year), interval (within-year, across-year), vertical
Growth magnitude versus growth norm
Is it enough growth? Norm- versus criterion-referencing (the intersection of accountability and measurement)
Growth in what?
Beneath any notion of change (i.e., growth) is a construct that is changing over time
Height and weight are common points of reference
Constructs in education are "slippery"
Need, at a minimum, an underlying semantic referent (e.g., reading or math)
How much growth?
Are growth magnitudes possible in education?
If calculable, are they interpretable absent some norm?
Approaches to growth magnitudes:
Performance standards
Vertical scale with interval properties
Learning progressions (qualitative growth)
How much growth?
Performance Standards
Strengths:
Anchor reference points for discussions about performance
Growth is embedded in the accountability metric
Limitations:
Few levels mask the substantial range within levels, thus masking student growth within a level
Vary greatly in stringency from state to state, so that "proficient" performance lacks meaning
How much growth?
Scale Scores
Strengths:
Semi-continuous scores (many score points)
Can be used to create vertical scales across grade levels
Give the appearance of the interval scales needed by some analytical models
Limitations:
Difficult to interpret or explain to users
Vertical scales are hard to defend
Claims of interval measurement properties don't hold up to close scrutiny
How much growth?
Vertical Scale
Vertical and interval scales are required for some analytic techniques:
Gain score calculation (magnitude of growth)
Growth curve analysis (rate of growth) (e.g., Singer & Willett, 2003)
Vertical and interval scales are required for some questions:
Matthew effects: Do higher achievers grow faster than lower achievers?
Growth rates relative to student age: Do students grow more in later grades than in earlier grades?
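As an illustration, the gain score computation mentioned above is just a difference of two scores, though interpreting its magnitude leans on the scale having interval and vertical properties. A minimal sketch with hypothetical scale scores (the students and numbers are invented):

```python
# Hypothetical grade 3 and grade 4 scale scores on a vertical scale
# (illustrative values only, not real assessment data).
grade3 = {"A": 420, "B": 455, "C": 390}
grade4 = {"A": 450, "B": 470, "C": 430}

# Gain score: current score minus prior score. Interpreting these
# differences as growth magnitudes assumes the scale is interval
# and vertically linked across grades.
gains = {s: grade4[s] - grade3[s] for s in grade3}
```

Note that the arithmetic is trivial; the measurement questions on this slide are about whether the differences mean what they appear to mean.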
How much growth?
Vertical Scale
Vertical and/or interval scales are NOT required for some analytic techniques:
Value-added analyses: most require an interval, but not a vertical, scale (see Ballou, 2008; Briggs & Betebenner, 2009)
Auto-regressive analyses, growth norms
Vertical and/or interval scales are NOT required for some questions:
Is a student's progress (ab)normal?
Is a student's growth sufficient to put them on track to reach/maintain proficiency?
See Yen (2007) for an excellent list of questions
How much growth?
Magnitudes versus Norms
Physical growth: a 9-year-old boy grew 5 inches in the past year; the average increase in height for boys between ages 8 and 9 is 4 inches
Achievement growth: a 4th grader grew 25 scale score points since 3rd grade; the average 4th grade scale score is 21 points higher than the average 3rd grade score
Two growth quantities:
Magnitude of growth
Relative amount of growth
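The two quantities can be computed side by side, using the slide's own illustrative numbers:

```python
# Numbers from the slide's achievement-growth example (illustrative).
student_gain = 25   # 4th grader's gain in scale score points since 3rd grade
average_gain = 21   # average 4th grade score minus average 3rd grade score

# Magnitude of growth: the raw gain itself.
magnitude = student_gain

# Relative amount of growth: the gain compared against the norm.
relative_to_norm = student_gain - average_gain  # positive = above-average growth
```

The point of the slide is that the second quantity, not the first, is what makes the gain interpretable.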
How much growth?
People expect an answer of magnitude
People need magnitude embedded within a norm
How much growth?
Growth norms
Although normative comparisons are spurned by criterion-referenced and standards-based measurement advocates, norms can provide a useful interpretive framework, especially in the interpretation of student growth
“Scratch a criterion and you find a norm”
- W. H. Angoff (1974)
Longitudinal Data Analysis Issues
Many Questions
How much annual growth did this student (these students) make in reading?
Is this student (are these students) making sufficient growth to reach/maintain desired achievement targets? (growth-to-standard & the Growth Model Pilot Program)
Are students in particular subgroups (e.g., minority students) making as much progress as other students?
How much did this teacher/school contribute to students' growth over the last year? (value-added)
Again, see Yen (2007) for an excellent list of questions
Many Techniques
Numerous data analysis techniques exist for use with longitudinal data:
Gain scores (a suitable scale is required)
Cross-tabulation based upon prior and current categorical achievement level attainment (e.g., value tables, transition matrices)
Regression-based approaches: growth-curve analysis (HLM), fixed/mixed-effects models, growth norms
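The value-table/transition-matrix technique listed above is simply a cross-tabulation of prior and current achievement levels. A minimal standard-library sketch with hypothetical levels for five students:

```python
from collections import Counter

# Hypothetical prior-year and current-year achievement levels
# for five students (invented for illustration).
prior   = ["Basic", "Proficient", "Basic", "Advanced", "Proficient"]
current = ["Proficient", "Proficient", "Basic", "Advanced", "Advanced"]

# Transition (value) table: count of students moving from each
# prior level to each current level.
transition_table = Counter(zip(prior, current))
```

Each cell of the table can then be assigned a policy-determined value (hence "value table") to score the transitions, without requiring a vertical scale.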
Questions 1st, Analyses 2nd
Different growth analysis techniques often address different questions
Different questions lead to different conversations, which lead to different uses and outcomes
“It is better to have an approximate answer to the right question than a precise answer to the wrong question.”
- J. W. Tukey
Model Purpose
Three general uses are associated with statistical models (Berk, 2004):
Description: an account of the data. The model is true to the extent that it is useful; model quality is judged by craftsmanship (de Leeuw, 2004)
Inference: sample to population. The model is true to the extent that the assumed chance process reflects reality (super-population fallacy)
Causality: A causes B to happen. The model is true to the extent that a plausible causal theory exists and design criteria are met
Models are rarely descriptive despite the minimal requirements
Inference and causality require information external to the data and can't be validated solely from the data
Models are often causal in nature but rarely meet the rigorous criteria necessary for such inferences
Value-Added Models
Causality
Value-added models (e.g., EVAAS) are a frequently discussed type of growth model
Value-added models attempt to quantify the portion of student progress attributable, usually, to a teacher or school
Value-added is about the inferences made, not the actual model
Causal attributions make value-added models well suited for accountability discussions
In the absence of random assignment, causal attributions are always suspect and subject to challenges (see, for example, Raudenbush, 2004; Rubin, Stuart & Zanutto, 2004)
Value-Added Models
Causality
Value-added models return norm-referenced effectiveness quantities
With regard to schools, the quantities indicate whether a school is significantly more or less effective than the mean school effectiveness in the district or state
In a standards-based assessment environment, how much effectiveness is enough?
This is especially important in light of universal proficiency policy mandates
Growth-to-standard models were created to provide criterion-referenced growth models
Growth Model Pilot Program
Growth-to-standard
In response to requests for growth model use as part of AYP, USED allowed states to apply to use growth models
Fifteen states had models accepted
Models were required to adhere to the "bright line principle" of universal proficiency (growth-to-standard)
Yen (2009) provides an excellent overview of the models
Growth-to-standard models returned, in general, results that closely aligned with AYP status results
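The growth-to-standard logic can be sketched arithmetically: project whether a student's observed growth, if sustained, would reach the proficiency cut by the target year. All numbers below are hypothetical:

```python
# Hypothetical values: the cut score, time horizon, and gains
# are illustrative, not taken from any state's model.
current_score = 430
proficient_cut = 500        # proficiency cut score at the target grade
years_remaining = 3         # years until the student must reach the cut
observed_annual_gain = 20   # scale score points gained last year

# Growth needed per year to reach the cut on time.
required_annual_gain = (proficient_cut - current_score) / years_remaining

# Is the student's current rate of growth sufficient (on track)?
on_track = observed_annual_gain >= required_annual_gain
```

This is the criterion-referenced question: growth is judged against a fixed destination rather than against other students.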
Growth versus Value-Added Models
Description & Causality
Growth measures are descriptive
Accountability has skewed discussions of growth from description toward responsibility (i.e., causality)
All measures (even VAM) are potentially descriptive; however, some measures are specially crafted for causal inference/attribution
Good descriptive measures are interpretable, informative, and capable of multiple uses
Growth versus Value-Added Models
Description: Colorado Growth Model
The Colorado Growth Model uses student growth percentiles to quantify student growth
Percentiles are familiar to stakeholders
Separating description from responsibility has led to broad public acceptance, including by teachers' unions
Asking which schools or teachers are associated with students demonstrating the highest growth percentiles moves from description toward value-added
Growth versus Value-Added Models
Description: Colorado Growth Model
The analysis employs quantile regression to calculate conditional quantile relationships between current and prior achievement
Student growth percentiles can also be criterion-referenced to accommodate growth-to-standard
This approach formed the basis of Colorado's successful application as part of the Growth Model Pilot Program
Student growth percentiles provide a bridge connecting value-added and criterion-referenced interests
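The student growth percentile idea can be illustrated without the full quantile-regression machinery: conditional on prior achievement, a student's growth percentile is roughly the percentile rank of their current score among academic peers with similar score histories. The sketch below is a simplified stand-in for that machinery; the peer scores, function name, and percentile-rank convention are all invented for illustration:

```python
from bisect import bisect_left

# Hypothetical current-year scores of a student's academic peers
# (students with the same prior-year score), sorted ascending.
peer_scores = [410, 425, 430, 440, 455, 460, 470, 480, 495, 500]

def growth_percentile(current_score, sorted_peer_scores):
    """Percentile rank: share of peers scoring strictly below current_score."""
    below = bisect_left(sorted_peer_scores, current_score)
    return round(100 * below / len(sorted_peer_scores))
```

The actual model estimates conditional quantiles via regression so that every prior-score history, not just exact matches, defines the peer group; the normative interpretation is the same.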
Validating Models
There is no "gold standard" against which to judge value-added or growth model results
Statistical model specification goes only part way toward validation
Results should have credibility (i.e., face validity)
Because of their importance in accountability, utility is a primary component of model validity
“All models are wrong but some are useful.”
- G. E. P. Box
Accountability/Policy/Data Use
Accountability Systems
Purpose and requirements
Intended to improve education:
Increase student achievement
Reduce achievement gaps
Increase efficiencies
Externally mandated and designed to hold educators responsible for student learning
Impose sanctions and rewards based upon results from large-scale assessment outcomes
Accountability and Growth
What type of growth?
Value-added provides a norm-referenced lens, judging growth/effectiveness against district/state averages
Growth-to-standard provides a criterion-referenced lens, judging growth toward community-endorsed achievement goals
Inferences about education quality based upon value-added make judgments relative to students reaching a statistical expectation
Inferences about education quality based upon growth-to-standard make judgments relative to students reaching criterion-referenced destinations
Accountability and Growth
What type of growth?
States currently employ a variety of growth models in service of accountability
Current policy mandates like NCLB are criterion-referenced, establishing achievement targets/destinations for all students
BOTH norm- and criterion-referenced growth are needed to reconcile individual-focused policies like NCLB with imperatives to judge education quality at the group level (e.g., teacher or school)
Accountability Systems
Theory of action
A theory of action connects interpretations, uses, and consequences (Gong, 2008)
The details connecting punishments/rewards and outcomes are usually vague/incomplete
It is exactly these details that are critical to validating the theory of action associated with the accountability system's use of any growth/value-added metric
Accountability System Validity
Systemic Validity
“Assessment practices and systems of accountability are systemically valid if they generate useful information and constructive responses that support one or more policy goals (Access, Quality, Equity, Efficiency) within an education system, without causing undue deterioration with respect to other goals.”
- H. Braun (2008)
Descriptive Accountability
Building in systemic validity
"Accountability system results can have value without making causal inferences about school quality, solely from the results of student achievement measures and demographic characteristics. Treating the results as descriptive information and for identification of schools that require more intensive investigation of organizational and instructional process characteristics are potentially of considerable value. Rather than using the results of the accountability system as the sole determiner of sanctions for schools, they could be used to flag schools that need more intensive investigation to reach sound conclusions about needed improvements or judgments about quality."
- R. L. Linn (2008)
Descriptive Accountability
Part of a broader research program
Helpful in spotting provocative associations
A part of advocacy/informative discussions (e.g., growth gaps by ethnicity)
Informs policy goals and initiatives
The descriptive growth norms of the Colorado Growth Model are an example of this type of accountability metric
Bibliography

Angoff, W. H. (1974). Criterion-referencing, norm-referencing and the SAT. College Board Review, 92:2–5.
Ballou, D. (2008). Test scaling and value-added measurement. Presented at the National Conference on Value-Added Modeling, Madison, Wisconsin.
Berk, R. A. (2004). Regression analysis: A constructive critique. Sage Publications, Thousand Oaks, CA.
Braun, H. (2008). Vicissitudes of the validators. Presented at the Reidy Interactive Lecture Series, Portsmouth, New Hampshire.
Breiter, A. & Light, D. (2006). Data for school improvement: Factors for designing effective information systems to support decision-making in schools. Educational Technology & Society, 9(3), 206–217.
Briggs, D. C. & Betebenner, D. W. (2009). Is growth in student achievement scale dependent? Paper presented at the NCME annual conference, April 2009, San Diego, CA.
Campbell, D. T. (1976). Assessing the impact of planned social change. The Public Affairs Center, Dartmouth College, Hanover, New Hampshire. December 1976.
De Leeuw, J. (2004). Preface to Berk's "Regression analysis: A constructive critique". Sage Publications, Thousand Oaks, CA.
Downloaded January 20th, 2009 from http://www.education-consumers.com/VAAA/Poverty%20vs%20School%20Effectivness%202006%20chart.pdf
Downloaded January 20th, 2009 from http://www.education-consumers.org/tnproject/poverty_vs_effectiveness_2008.pdf
Edley, C. (2006). Educational "opportunity" is the highest civil rights priority. So what should researchers and lawyers do about it? Retrieved June 22, 2006 from http://www.softconference.com/MEDIA/WMP/260407/#43.010
Gong, B. (2008). Validating accountability systems. Presented at the Reidy Interactive Lecture Series, Portsmouth, New Hampshire.
Linn, R. L. (2000). Assessments and accountability. Educational Researcher, 29(2), 4–16.
Linn, R. L. (2008). Educational accountability systems. In K. E. Ryan & L. A. Shepard (Eds.), The Future of Test-Based Educational Accountability, pages 3–24. Taylor & Francis, New York.
Mintrop, H. & Sunderman, G. L. (2009). Why high stakes accountability sounds good but doesn't work—and why we keep on doing it anyway. Los Angeles, CA: The Civil Rights Project/Proyecto Derechos Civiles at UCLA.
Raudenbush, S. W. (2004). What are value-added models estimating and what does this imply for statistical practice? Journal of Educational and Behavioral Statistics, 29(1), 121–129.
Rubin, D. B., Stuart, E. A., & Zanutto, E. L. (2004). A potential outcomes view of value-added assessment in education. Journal of Educational and Behavioral Statistics, 29(1), 103–116.
Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis: Modeling change and event occurrence. Oxford University Press, USA.
Yen, W. M. (2007). Vertical scaling and No Child Left Behind. In N. J. Dorans, M. Pommerich, & P. W. Holland (Eds.), Linking and Aligning Scores and Scales, pages 273–283. Springer, New York.
Yen, W. M. (2009, March). Growth models approved for the NCLB