Growth in Student Achievement: Issues of Measurement, Longitudinal Analyses & Accountability


SLIDE 1

Growth in Student Achievement: Issues of Measurement, Longitudinal Analyses & Accountability Damian W. Betebenner NCIEA CCSSO NCSA, June 23, 2010

SLIDE 2

Discussions of student growth lie at the intersection of three topics:

- Longitudinal Data Analysis/Applied Statistics
- Accountability/Education Policy/Data Use
- Measurement/Psychometrics

Overview

SLIDE 3

Measurement/Psychometrics

Examining student growth requires multiple measurements of the same individual.

- Growth in what?
- How much growth? (How is scaling involved in answering this question?)
- Is it enough growth?

SLIDE 4

Longitudinal Data Analysis/Applied Statistics

There are many methods for the analysis of longitudinal data.

- What are the relevant questions?
- Are the analytic techniques capable of answering those questions?
- Do the data possess properties sufficient for the analytic techniques employed (e.g., a vertical scale)?
- Does the analysis sustain the inferences made from the data?

SLIDE 5

Accountability/Education Policy/Data Use

Education policy and accountability have many goals and purposes.

- Why growth in accountability?
- What are the goals and purposes of accountability?
- What is the theory of action behind accountability?
- How can we judge the validity of the accountability system?
- What about the current policy context?

SLIDE 6

Measurement/Psychometric Issues

Technical Considerations

SLIDE 7

Measurement/Psychometric Issues

- Growth in what?
- How much growth?
- Scales for measuring growth:
  - Ordinal (within-year, across-year)
  - Interval (within-year, across-year)
  - Vertical
- Growth magnitude versus growth norm
- Is it enough growth? Norm- versus criterion-referencing (the intersection of Accountability and Measurement)

SLIDE 8

Growth in what?

- Beneath any notion of change (i.e., growth) is a construct that is changing over time.
- Height and weight are common points of reference.
- Constructs in education are "slippery."
- At a minimum, an underlying semantic referent is needed (e.g., reading or math).

SLIDE 9

How much growth?

- Are growth magnitudes possible in education?
- If calculable, are they interpretable absent some norm?
- Approaches to growth magnitudes:
  - Performance standards
  - Vertical scale with interval properties
  - Learning progressions (qualitative growth)

SLIDE 10

How much growth? Performance Standards

Strengths:
- Provide anchor/reference points for discussions about performance
- Growth is embedded in the accountability metric

Limitations:
- Few levels mask the substantial score range within each level, thereby masking student growth within a level
- Stringency varies greatly from state to state, so "proficient" performance lacks a common meaning

SLIDE 11

How much growth? Scale Scores

Strengths:
- Semi-continuous scores (many score points)
- Can be used to create vertical scales across grade levels
- Give the appearance of the interval scales needed by some analytical models

Limitations:
- Difficult to interpret or explain to users
- Vertical scales are hard to defend
- Claims of interval measurement properties don't hold up to close scrutiny

SLIDE 12

How much growth? Vertical Scale

Vertical and interval scales are required for some analytic techniques:
- Gain-score calculation (magnitude of growth)
- Growth-curve analysis (rate of growth) (e.g., Singer & Willett, 2003)

Vertical and interval scales are required for some questions:
- Matthew effects: Do higher achievers grow faster than lower achievers?
- Growth rates relative to student age: Do students grow more in later grades than in earlier grades?
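
Both techniques can be sketched in a few lines. The grade levels and vertical-scale scores below are hypothetical, and the least-squares slope is just the single-student piece of a growth-curve analysis, on the assumption that the scale really has interval properties:

```python
# Hypothetical vertical-scale scores for one student in grades 3-5.
grades = [3, 4, 5]
scores = [420.0, 445.0, 462.0]

# Gain scores: magnitude of growth between adjacent grades
# (meaningful only if the scale is vertical and interval).
gains = [b - a for a, b in zip(scores, scores[1:])]

# Growth-curve slope: least-squares rate of growth in points per grade,
# the per-student building block of a growth-curve analysis.
n = len(grades)
g_mean = sum(grades) / n
s_mean = sum(scores) / n
slope = (sum((g - g_mean) * (s - s_mean) for g, s in zip(grades, scores))
         / sum((g - g_mean) ** 2 for g in grades))

print(gains)   # [25.0, 17.0]
print(slope)   # 21.0 points per grade
```

If the interval claim fails, neither the gains nor the slope are comparable across the score range, which is exactly the concern raised about interval claims above.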

SLIDE 13

How much growth? Vertical Scale

Vertical and/or interval scales are NOT required for some analytic techniques:
- Value-added analyses: most require an interval, but not a vertical, scale (see Ballou, 2008; Briggs & Betebenner, 2009)
- Auto-regressive analyses, growth norms

Vertical and/or interval scales are NOT required for some questions:
- Is a student's progress (ab)normal?
- Is a student's growth sufficient to put them on track to reach/maintain proficiency?

See Yen (2007) for an excellent list of questions.
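
The "on track" question lends itself to a simple projection. This is a deliberately naive sketch with entirely hypothetical numbers; operational growth-to-standard models are considerably more sophisticated:

```python
# Hypothetical numbers: current score, a recent annual gain, and the
# proficiency cut score the student must reach within three years.
current_score = 430.0
annual_gain = 20.0      # points per year, if recent growth is sustained
proficient_cut = 480.0  # cut score in the target grade
years_left = 3

projected = current_score + annual_gain * years_left
on_track = projected >= proficient_cut
print(projected, on_track)  # 490.0 True
```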

SLIDE 14

How much growth? Magnitudes versus Norms

Physical growth:
- A 9-year-old boy grew 5 inches in the past year
- The average increase in height for boys between ages 8 and 9 is 4 inches

Achievement growth:
- A 4th grader grew 25 scale-score points since 3rd grade
- The average 4th-grade scale score is 21 points higher than the average 3rd-grade score

Two growth quantities:
- Magnitude of growth
- Relative amount of growth

How much growth?
- People expect an answer of magnitude
- People need the magnitude embedded within a norm
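
The contrast can be made concrete: the same gain reads one way as a bare magnitude and another way once embedded in a norm. The peer gains below are invented for illustration:

```python
# A student's gain (the slide's 25 scale-score points) as a magnitude...
student_gain = 25

# ...and the same gain embedded in a norm: gains of a hypothetical
# reference group of 4th graders on the same test.
peer_gains = [12, 15, 18, 19, 21, 21, 22, 24, 26, 30]

# Percentile rank: the share of peers who gained less.
pct = 100 * sum(g < student_gain for g in peer_gains) / len(peer_gains)
print(pct)  # 80.0 -- "more growth than 80% of peers"
```

"More growth than 80% of peers" is interpretable even when "25 points" alone is not.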

SLIDE 15

How much growth? Growth Norms

Although normative comparisons are spurned by criterion-referenced and standards-based measurement advocates, norms can provide a useful interpretive framework, especially in the interpretation of student growth.

"Scratch a criterion and you find a norm." (W. H. Angoff, 1974)

SLIDE 16

Longitudinal Data Analysis Issues

SLIDE 17

Many Questions

- How much annual growth did this student (or these students) make in reading?
- Is this student (are these students) making sufficient growth to reach/maintain desired achievement targets? (growth-to-standard and the Growth Model Pilot Program)
- Are students in particular subgroups (e.g., minority students) making as much progress as other students?
- How much did this teacher/school contribute to students' growth over the last year? (value-added)
- Again, see Yen (2007) for an excellent list of questions.

SLIDE 18

Many Techniques

Numerous data-analysis techniques exist for use with longitudinal data:

- Gain scores (a suitable scale is required)
- Cross-tabulation based upon prior and current categorical achievement-level attainment (e.g., value tables, transition matrices)
- Regression-based approaches: growth-curve analysis (HLM), fixed/mixed-effects models, growth norms
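
The cross-tabulation approach can be sketched as a transition table built from (prior-year, current-year) achievement-level pairs. The level names and student records below are hypothetical:

```python
from collections import Counter

# Hypothetical achievement levels and (prior-year, current-year) records.
levels = ["Unsatisfactory", "Partially Proficient", "Proficient", "Advanced"]
records = [
    ("Proficient", "Proficient"),
    ("Unsatisfactory", "Partially Proficient"),
    ("Partially Proficient", "Proficient"),
    ("Proficient", "Advanced"),
    ("Partially Proficient", "Partially Proficient"),
    ("Advanced", "Proficient"),
]

# Transition (value) table: rows = prior level, columns = current level.
counts = Counter(records)
table = [[counts[(prior, curr)] for curr in levels] for prior in levels]
for prior, row in zip(levels, table):
    print(f"{prior:22s} {row}")
```

A value table then attaches point values to the cells (e.g., rewarding moves up a level); no interval or vertical scale is needed, only the categorical levels.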

SLIDE 19

Questions 1st, Analyses 2nd

- Different growth-analysis techniques often address different questions.
- Different questions lead to different conversations, which lead to different uses and outcomes.

"It is better to have an approximate answer to the right question than a precise answer to the wrong question." (J. W. Tukey)

SLIDE 20

Model Purpose

Three general uses are associated with statistical models (Berk, 2004):

- Description: an account of the data. The model is true to the extent that it is useful; model quality is judged by craftsmanship (de Leeuw, 2004).
- Inference: sample to population. The model is true to the extent that the assumed chance process reflects reality (beware the super-population fallacy).
- Causality: A causes B to happen. The model is true to the extent that a plausible causal theory exists and design criteria are met.

Caveats:
- Models are rarely merely descriptive, despite description's minimal requirements.
- Inference and causality require information external to the data; they cannot be validated solely from the data.
- Models are often causal in nature but rarely meet the rigorous criteria necessary for such inferences.

SLIDE 21

Value-Added Models: Causality

- Value-added models (e.g., EVAAS) are a frequently discussed type of growth model.
- Value-added models attempt to quantify the portion of student progress attributable, usually, to a teacher or school.
- Value-added is about the inferences made, not the actual model.
- Causal attributions make value-added models well suited for accountability discussions.
- In the absence of random assignment, causal attributions are always suspect and subject to challenge (see, for example, Raudenbush, 2004; Rubin, Stuart & Zanutto, 2004).

SLIDE 22

Value-Added Models: Causality

- Value-added models return norm-referenced effectiveness quantities.
- With regard to schools, the quantities indicate whether a school is significantly more or less effective than the mean school effectiveness in the district or state.
- In a standards-based assessment environment, how much effectiveness is enough?
- This question is especially important in light of universal-proficiency policy mandates.
- Growth-to-standard models were created to provide criterion-referenced growth models.

SLIDE 23

Growth Model Pilot Program: Growth-to-Standard

- In response to requests for growth-model use as part of AYP, USED allowed states to apply to use growth models.
- Fifteen states had models accepted.
- Models were required to adhere to the "bright line principle" of universal proficiency (growth-to-standard).
- Yen (2009) provides an excellent overview of the models.
- Growth-to-standard models returned, in general, results that closely aligned with AYP status results.

SLIDE 24

Growth versus Value-Added Models: Description & Causality

- Growth measures are descriptive.
- Accountability has skewed discussions of growth from description toward responsibility (i.e., causality).
- All measures (even VAM) are potentially descriptive; however, some measures are specially crafted for causal inference/attribution.
- Good descriptive measures are interpretable, informative, and capable of multiple uses.

SLIDE 25

Growth versus Value-Added Models: Description (Colorado Growth Model)

- The Colorado Growth Model uses student growth percentiles to quantify student growth.
- Percentiles are familiar to stakeholders.
- Separating description from responsibility has led to broad public acceptance, including by teachers' unions.
- Asking which schools or teachers are associated with students demonstrating the highest growth percentiles moves from description toward value-added.

SLIDE 26

Growth versus Value-Added Models: Description (Colorado Growth Model)

- The analysis employs quantile regression to calculate conditional quantile relationships between current and prior achievement.
- Student growth percentiles can also be criterion-referenced to accommodate growth-to-standard.
- This approach formed the basis of Colorado's successful application to the Growth Model Pilot Program.
- Student growth percentiles provide a bridge connecting value-added and criterion-referenced interests.
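
To give a feel for what a conditional percentile is, here is a deliberately crude stand-in for the quantile-regression machinery: rank a student's current score among simulated peers with similar prior scores. All data below are simulated, and the operational model's estimation is far more refined:

```python
import random

# Simulated statewide data: prior-year and current-year scores,
# where typical growth is about 20 points.
random.seed(1)
prior = [random.gauss(450, 30) for _ in range(2000)]
current = [p + 20 + random.gauss(0, 15) for p in prior]

def growth_percentile(stu_prior, stu_current, prior_scores, current_scores,
                      band=10.0):
    """Percentile of the student's current score among peers whose prior
    score falls within `band` points of the student's prior score."""
    peers = [c for p, c in zip(prior_scores, current_scores)
             if abs(p - stu_prior) <= band]
    return 100.0 * sum(c < stu_current for c in peers) / len(peers)

# A student who grew by exactly the typical 20 points lands near the middle.
sgp = growth_percentile(450.0, 470.0, prior, current)
print(round(sgp))
```

Quantile regression replaces this ad hoc neighborhood with smooth conditional quantile functions, so every student receives a percentile conditional on prior achievement.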

SLIDE 27

Validating Models

- There is no "gold standard" against which to judge value-added or growth-model results.
- Statistical model specification goes only part way toward validation.
- Results should have credibility (i.e., face validity).
- Because of their importance in accountability, utility is a primary component of model validity.

"All models are wrong but some are useful." (G. E. P. Box)

SLIDE 28

Accountability/Policy/Data Use

SLIDE 29

Accountability Systems: Purpose and Requirements

- Intended to improve education:
  - Increase student achievement
  - Reduce achievement gaps
  - Increase efficiencies
- Externally mandated and designed to hold educators responsible for student learning
- Impose sanctions and rewards based upon results from large-scale assessments

SLIDE 30

Accountability and Growth: What Type of Growth?

- Value-added provides a norm-referenced lens, judging growth/effectiveness against district/state averages.
- Growth-to-standard provides a criterion-referenced lens, judging growth toward community-endorsed achievement goals.
- Inferences about education quality based upon value-added make judgments relative to students reaching a statistical expectation.
- Inferences about education quality based upon growth-to-standard make judgments relative to students reaching criterion-referenced destinations.

SLIDE 31

Accountability and Growth: What Type of Growth?

- States currently employ a variety of growth models in service of accountability.
- Current policy mandates like NCLB are criterion-referenced, establishing achievement targets/destinations for all students.
- BOTH norm- and criterion-referenced growth are needed to reconcile individually focused policies like NCLB with imperatives to judge education quality at the group level (e.g., teacher or school).

SLIDE 32

Accountability Systems: Theory of Action

- A theory of action connects interpretations, uses, and consequences (Gong, 2008).
- The details connecting punishments/rewards and outcomes are usually vague or incomplete.
- It is exactly these details that are critical to validating the theory of action associated with the accountability system's use of any growth/value-added metric.

SLIDE 33

Accountability System Validity: Systemic Validity

"Assessment practices and systems of accountability are systemically valid if they generate useful information and constructive responses that support one or more policy goals (Access, Quality, Equity, Efficiency) within an education system, without causing undue deterioration with respect to other goals." (H. Braun, 2008)

SLIDE 34

Descriptive Accountability: Building in Systemic Validity

"Accountability system results can have value without making causal inferences about school quality, solely from the results of student achievement measures and demographic characteristics. Treating the results as descriptive information and for identification of schools that require more intensive investigation of organizational and instructional process characteristics are potentially of considerable value. Rather than using the results of the accountability system as the sole determiner of sanctions for schools, they could be used to flag schools that need more intensive investigation to reach sound conclusions about needed improvements or judgments about quality." (R. L. Linn, 2008)

SLIDE 35

Descriptive Accountability

- Part of a broader research program
- Helpful in spotting provocative associations
- A part of advocacy/informative discussions (e.g., growth gaps by ethnicity)
- Informs policy goals and initiatives

The descriptive growth norms of the Colorado Growth Model are an example of this type of accountability metric.

SLIDE 36

Bibliography

- Angoff, W. H. (1974). Criterion-referencing, norm-referencing and the SAT. College Board Review, 92, 2–5.
- Ballou, D. (2008). Test scaling and value-added measurement. Presented at the National Conference on Value-Added Modeling, Madison, Wisconsin.
- Berk, R. A. (2004). Regression analysis: A constructive critique. Sage Publications, Thousand Oaks, CA.
- Braun, H. (2008). Vicissitudes of the validators. Presented at the Reidy Interactive Lecture Series, Portsmouth, New Hampshire.
- Breiter, A., & Light, D. (2006). Data for school improvement: Factors for designing effective information systems to support decision-making in schools. Educational Technology & Society, 9(3), 206–217.
- Briggs, D. C., & Betebenner, D. W. (2009). Is growth in student achievement scale dependent? Paper presented at the NCME annual conference, April 2009, San Diego, CA.
- Campbell, D. T. (1976). Assessing the impact of planned social change. The Public Affairs Center, Dartmouth College, Hanover, New Hampshire, USA. December 1976.
- De Leeuw, J. (2004). Preface to Berk's "Regression analysis: A constructive critique." Sage Publications, Thousand Oaks, CA.

SLIDE 37

Bibliography

- Downloaded January 20, 2009 from http://www.education-consumers.com/VAAA/Poverty%20vs%20School%20Effectivness%202006%20chart.pdf
- Downloaded January 20, 2009 from http://www.education-consumers.org/tnproject/poverty_vs_effectiveness_2008.pdf
- Edley, C. (2006). Educational "Opportunity" is the highest civil rights priority. So what should researchers and lawyers do about it? Retrieved June 22, 2006, from http://www.softconference.com/MEDIA/WMP/260407/#43.010
- Gong, B. (2008). Validating accountability systems. Presented at the Reidy Interactive Lecture Series, Portsmouth, New Hampshire.
- Linn, R. L. (2000). Assessments and accountability. Educational Researcher, 29(2), 4–16.
- Linn, R. L. (2008). Educational accountability systems. In K. E. Ryan & L. A. Shepard (Eds.), The Future of Test-Based Educational Accountability, pages 3–24. Taylor & Francis, New York.
- Mintrop, H., & Sunderman, G. L. (2009). Why high stakes accountability sounds good but doesn't work—and why we keep on doing it anyway. Los Angeles, CA: The Civil Rights Project/Proyecto Derechos Civiles at UCLA.

SLIDE 38

Bibliography

- Raudenbush, S. W. (2004). What are value-added models estimating and what does this imply for statistical practice? Journal of Educational and Behavioral Statistics, 29(1), 121–129.
- Rubin, D. B., Stuart, E. A., & Zanutto, E. L. (2004). A potential outcomes view of value-added assessment in education. Journal of Educational and Behavioral Statistics, 29(1), 103–116.
- Singer, J. D., & Willett, J. B. (2003). Applied longitudinal data analysis: Modeling change and event occurrence. Oxford University Press, USA.
- Yen, W. M. (2007). Vertical scaling and No Child Left Behind. In N. J. Dorans, M. Pommerich, & P. W. Holland (Eds.), Linking and Aligning Scores and Scales, pages 273–283. Springer, New York.
- Yen, W. M. (2009, March). Growth models approved for the NCLB Growth Model Pilot. ETS.