Measurement in the context of formative classroom assessment - PowerPoint PPT Presentation

SLIDE 1

Measurement in the context of formative classroom assessment

Mark Wilson, UC Berkeley
Presented at the University of California, Berkeley, March 7, 2017

SLIDE 2

Abstract

This talk will survey the last 15 years of work that we have been doing at the BEAR Center on the BEAR Assessment System (BAS). It will begin by noting the initial motivations for developing this approach to measurement/assessment, focusing on the question of what the measurement demands are in the context of formative classroom assessment. This will be followed by a brief description of the BAS, accompanied by discussion of how it reflects a response to this question. Following that, it will also explore the following: (a) What are the most important developments in the BAS since 2000? (b) What are some important examples of BAS assessments? (c) How should the BAS interface with state testing? (d) What are the challenges and opportunities?

SLIDE 3

Outline

  • Why should we think of classroom assessment as "measurement"?
  • What should we want from an Assessment System?
  • How is the BEAR Assessment System (BAS) a response to the question?
  • What are the most important developments in the BAS since 2000?
  • What are some important examples of BAS assessments?
  • How should BAS interface with state testing?
  • What are the challenges and opportunities?
SLIDE 5

A bit of my own background

  • Q1. How can educational measurement help classroom assessment?
  • Worried about this in the 1970s, and since:
    – Saw the need to give teachers better assessment information in the classroom
    – Saw the need for the assessment to track the "logic behind the curriculum"
    – Because that way the teachers can understand what to do with that information

SLIDE 6

A bit of my own background

  • Q2. Where does the scale (aka the continuum, the construct, the learning progression) in tests come from?
  • Worked on this in the 1980s and early 90s:
    – Discovering and interpreting effects on item difficulty for "wild" items
    – Looking for a consistent story about why items are ordered the way they are
  • Published 10 papers on these topics during that period …

SLIDE 7

  • Wilson, M., & Bock, R.D. (1985). Spellability: A linearly-ordered content domain. American Educational Research Journal, 22(2), 297-307.
  • Wilson, M. (1989). Empirical examination of a learning hierarchy using an item response theory model. Journal of Experimental Education, 57(4), 357-371.
  • Wilson, M. (1989). Saltus: A psychometric model of discontinuity in cognitive development. Psychological Bulletin, 105(2), 276-289.
  • Wilson, M. (1990). Measuring a van Hiele geometry sequence: A reanalysis. Journal for Research in Mathematics Education, 21(3), 230-237.
  • Wilson, M. (1990). Investigation of structured problem solving items. In G. Kulm (Ed.), Assessing higher order thinking in mathematics. Washington, DC: American Association for the Advancement of Science.
  • Wilson, M. (1992). The ordered partition model: An extension of the partial credit model. Applied Psychological Measurement, 16(3), 309-325.
  • Masters, G.N., Adams, R.J., & Wilson, M. (1990). Charting of student progress. In T. Husen & T.N. Postlethwaite (Eds.), International Encyclopedia of Education: Research and Studies. Supplementary Volume 2 (pp. 628-634). Oxford: Pergamon Press. Reprinted in: T. Husen & T.N. Postlethwaite (Eds.), (1994). International Encyclopedia of Education (2nd ed.) (pp. 5783-5791). Oxford: Pergamon Press.
  • Wilson, M. (1990). Measurement of developmental levels. In T. Husen & T.N. Postlethwaite (Eds.), International Encyclopedia of Education: Research and Studies. Supplementary Volume 2. Oxford: Pergamon Press.
  • Wilson, M. (1992). Measurement models for new forms of assessment in mathematics education. In J.F. Izard & M. Stephens (Eds.), Reshaping assessment practices: Assessment in the mathematical sciences under challenge. Hawthorn, Australia: ACER.
  • Wilson, M. (1992). Measuring levels of mathematical understanding. In T. Romberg (Ed.), Mathematics assessment and evaluation: Imperatives for mathematics educators. New York: SUNY Press.

From: Wilson, 2010.

SLIDE 10

More of my own background

  • Concluded that there was little enlightenment about learning progressions in the results from these analyses:
    – The item sets were too diverse, hence too sparse in what they conveyed about student learning
    – The items differed in idiosyncratic ways
  • Gave up doing that
    – Needed to find another way ...

SLIDE 11

Things I learned …

  • Maxim No. 1. Assessment content specialists ("item writers") do not know why they build items the way they do. Corollary: they cannot predict the difficulty of their items.
  • Maxim No. 2. "Wild" items confound deeper cognitive/structuralist/diagnostic underpinnings with surface features (such as item wording, etc.).

SLIDE 12

Things I learned …

  • But: Good curriculum developers create their curricula using a developmental way of thinking about learning.
  • Hence: We need to engage with curriculum developers, not item writers.
  • But: Curriculum developers do not know how to write items.

SLIDE 13

More of my own background

  • Changed my entire research agenda:
    – Became an item-designer, not just a secondary data analyst
    – Became a "sort of" curriculum designer (a learning progression designer)
    – Became a developer of new types of item response models ("explanatory")

SLIDE 14

Learned that there is a further question...

  • Q3. How can we create "summative tests" if we do not know what classroom tests are testing?
  • That is ...
  • Q3. How can classroom assessment help educational measurement?

SLIDE 15

Outline

  • Why should we think of classroom assessment as "measurement"?
  • What should we want from an Assessment System?
  • How is the BEAR Assessment System (BAS) a response to the question?
  • What are the most important developments in the BAS since 2000?
  • What are some important examples of BAS assessments?
  • How should BAS interface with state testing?
  • What are the next steps?
SLIDE 16

What should we want from an Assessment System?

  • Assessment and accountability information should be available and useful to important audiences:
  • Classroom
    – Teachers, students, parents
  • School site
    – Other teachers and specialists, administrators
  • School district
    – Administrators
  • State and public
    – School Board, legislators, State Dept. of Education, the public

SLIDE 17

Audiences, and what they ought to get

Audiences, what they need/want and ought to get, and what the State now provides:

  • Teachers: results useful for (a) individual student feedback, (b) planning for individual progress, and (c) planning for class progress. From the State they now get: SBAC, etc.
  • Schools, school districts, and the State and public: results useful for (a) school, school-district, and State planning, and (b) school, school-district, and State accountability. From the State they now get: SBAC, etc.

SLIDE 18

What they ought to get

First, some background.

  • 1. Different sorts of assessment:
    – Assessment to assist learning: only in classrooms
    – Summative assessment of individuals: some in classrooms, some from the State
    – Assessment to evaluate programs: mainly State (or school districts, federal, etc.)
  • 2. What should be the relationships among these different sorts of assessment?

"...teachers' goals for learning should be consistent with those of large-scale assessments and vice-versa... one challenge is to make stronger connections between the two so they work together to support a common set of learning goals." (Knowing What Students Know, National Research Council, 2001, p. 41)

SLIDE 19
  • 3. Assessment in education is a major industry
    – The State spends millions on it; testing companies make millions out of it.
    – Assessment is one of the major components of every classroom: not "State" assessments, but teachers' assessments.
    – The instructional cycle: Curriculum → Instruction → Assessment

SLIDE 20

Assessment is ...

An activity that goes on at almost every minute of every school day, in every classroom, in every school in this State. An activity that is crucial to the success of every piece of instruction that every teacher carries out.

This assessment enterprise dwarfs the assessment system of the State: in the time it takes, in its effect on the students, in its cost, in how it affects the practices of teachers, and hence in how it affects student learning. What will teachers do when they have to comply with State tests that are inconsistent with what they do in their instruction?

SLIDE 21

Thus, the real question is ...

NOT

  • how to construct a system of State assessments that will carry out the program evaluation tasks of the State,

BUT

  • how to construct a system that (a) helps in the classroom by (i) defining classroom learning goals, (ii) serving as a resource for classroom assessment, and (iii) focusing instruction; and (b) helps channel useful information to (i) summative individual assessment and (ii) the program evaluation tasks of the State.

SLIDE 22

How do we do that?

  • (a) Need a learning framework that allows communication among the different audiences for assessment information
  • (b) Need methods of gathering data that are acceptable and useful to all audiences
  • (c) Need ways to evaluate (score) responses that are consistent with the framework in (a)
  • (d) Need a technique of interpreting data that allows meaningful reporting to multiple audiences

SLIDE 23

How do we do that?

  • (a) Need a learning framework that allows communication among the different audiences for assessment information
  • (b) Need methods of gathering data that are acceptable and useful to all audiences
  • (c) Need ways to evaluate (score) responses that are consistent with the framework in (a)
  • (d) Need a technique of interpreting data that allows meaningful reporting to multiple audiences
  • AKA ... the 4 "Building Blocks"
SLIDE 24

Outline

  • Why should we think of classroom assessment as "measurement"?
  • What should we want from an Assessment System?
  • How is the BEAR Assessment System (BAS) a response to the question?
  • What are the most important developments in the BAS since 2000?
  • What are some important examples of BAS assessments?
  • How should BAS interface with state testing?
  • What are the challenges and opportunities?
SLIDE 25

The BEAR Assessment System (BAS)

  • Herb Thier, Lawrence Hall of Science
    – Science Education for Public Understanding Program (SEPUP)
    – Issues, Evidence and You (IEY) curriculum
  • Unhappy that:
    – Teachers love the IEY curriculum, but there are no gains on standardized tests
    – Teachers don't really know whether their students have learned the "stuff" (despite "unit tests")
    – Students don't know what they have learned
  • Looking for the "assessable moment"

SLIDE 26

Image of a Curriculum/Learning Progression

SLIDE 27

Relating measurement dimensions to a curriculum?

SLIDE 28

SEPUP’s IEY Curriculum (circa 1990)

  • Understanding Concepts (UC)--understanding scientific concepts (such as properties and interactions of materials, energy, or thresholds) in order to apply the relevant scientific concepts to the solution of problems.
  • Designing and Conducting Investigations (DCI)--designing a scientific experiment, carrying through a complete scientific investigation, performing laboratory procedures to collect data, recording and organizing data, and analyzing and interpreting the results of an experiment.
  • Evidence and Tradeoffs (ET)--identifying objective scientific evidence, as well as evaluating the advantages and disadvantages of different possible solutions to a problem based on the available evidence.
  • Communicating Scientific Information (CM)--organizing and presenting results in a way that is free of technical errors and effectively communicates with the chosen audience.

From: Wilson & Sloane, 2000

SLIDE 29

Using Evidence: Response uses objective reason(s) based on relevant evidence to support choice.

  • Beyond correct (score 4): Response accomplishes Level 3 AND goes beyond in some significant way, such as questioning or justifying the source, validity, and/or quantity of evidence.
  • Correct (score 3): Response provides major objective reasons AND supports each with relevant and accurate evidence.
  • Partially complete (ii): some objective reasons and some evidence (score 2): Response provides some objective reasons AND some supporting evidence, BUT at least one reason is missing and/or part of the evidence is incomplete.
  • Partially complete (i): subjective and/or inaccurate reasons (score 1): Response provides only subjective reasons (opinions) for choice and/or uses inaccurate or irrelevant evidence from the activity.
  • Missing or irrelevant (score 0): No response; illegible response; response offers no reasons AND no evidence to support choice made.
  • No opportunity (score X): Student had no opportunity to respond.

The IEY Evidence and Tradeoffs construct map.
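Read as a data structure, this scoring guide maps each score category to a construct-map description. A minimal Python sketch of that lookup (the code organization and names are illustrative, not the BAS software):

```python
# Sketch: the "Using Evidence" scoring guide as a score-to-description lookup.
# Level wordings are abbreviated from the guide above; structure is illustrative.
ET_USING_EVIDENCE = {
    4: "Beyond correct: meets Level 3 AND questions/justifies the source, "
       "validity, and/or quantity of the evidence.",
    3: "Correct: major objective reasons, each supported by relevant and "
       "accurate evidence.",
    2: "Partially complete (ii): some objective reasons and some evidence, "
       "but a reason is missing and/or evidence is incomplete.",
    1: "Partially complete (i): only subjective reasons (opinions) and/or "
       "inaccurate or irrelevant evidence.",
    0: "Missing or irrelevant: no reasons and no evidence offered.",
}

def describe(score) -> str:
    """Return the construct-map description for a scored response."""
    if score == "X":
        return "No opportunity to respond."
    return ET_USING_EVIDENCE[score]

print(describe(3))
```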

SLIDE 30

[Figure: student responses to items arranged along the construct in order of increasing sophistication in using evidence: no relevant response → subjective or inaccurate evidence → something important missing → all major points included → goes beyond complete and correct.]

SLIDE 31

Relating measurement dimensions to a curriculum?

SLIDE 32

Items design

You are a public health official who works in the Water Department. Your supervisor has asked you to respond to the public's concern about water chlorination at the next City Council meeting. Prepare a written response explaining the issues raised in the newspaper articles. Be sure to discuss the advantages and disadvantages of chlorinating drinking water in your response, and then explain your recommendation about whether the water should be chlorinated.

SLIDE 33

“As an edjucated employee of the Grizzelyville water company, I am well aware of the controversy surrounding the topic of the chlorination of our drinking water. I have read the two articals regarding the pro’s and cons of chlorinated water. I have made an informed decision based on the evidence presented the articals entitled “The Peru Story” and “700 Extra People May bet Cancer in the US.” It is my recommendation that our towns water be chlorin treated. The risks of infecting our citizens with a bacterial diseease such as cholera would be inevitable if we drink nontreated water. Our town should learn from the country of Peru. The artical “The Peru Story” reads thousands of inocent people die of cholera epidemic. In just months 3,500 people were killed and more infected with the diease. On the other hand if we do in fact chlorine treat our drinking water a risk is posed. An increase in bladder and rectal cancer is directly related to drinking chlorinated water. Specifically 700 more people in the US may get cancer. However, the cholera risk far outweighs the cancer risk for 2 very important reasons. Many more people will be effected by cholera where as the chance of one of our citizens getting cancer due to the water would be very minimal. Also cholera is a spreading diease where as cancer is not. If our town was infected with cholera we could pass it on to millions of others. And so, after careful consideration it is my opion that the citizens of Grizzelyville drink chlorine treated water.”

SLIDE 34

Using Evidence: Response uses objective reason(s) based on relevant evidence to support choice.

  • Beyond correct (score 4): Response accomplishes Level 3 AND goes beyond in some significant way, such as questioning or justifying the source, validity, and/or quantity of evidence.
  • Correct (score 3): Response provides major objective reasons AND supports each with relevant and accurate evidence.
  • Partially complete (ii): some objective reasons and some evidence (score 2): Response provides some objective reasons AND some supporting evidence, BUT at least one reason is missing and/or part of the evidence is incomplete.
  • Partially complete (i): subjective and/or inaccurate reasons (score 1): Response provides only subjective reasons (opinions) for choice and/or uses inaccurate or irrelevant evidence from the activity.
  • Missing or irrelevant (score 0): No response; illegible response; response offers no reasons AND no evidence to support choice made.
  • No opportunity (score X): Student had no opportunity to respond.

The IEY Evidence and Tradeoffs outcome space.

SLIDE 35

A student’s profile on the IEY constructs.

SLIDE 36

A student’s longitudinal profile on the DCI construct.

SLIDE 37

General Strategy: the 4 "Building Blocks" ...

  • (a) Need a framework that allows teachers to interpret development in the construct
    – "Construct map"
  • (b) Need methods of gathering data that are acceptable and useful to all audiences
    – "Items design"
  • (c) Need a way to value what we see in student work
    – "Outcome space"
  • (d) Need a technique of interpreting data that allows meaningful reporting to multiple audiences
    – "Measurement model" (a minimal sketch follows)

SLIDE 38

The BEAR Assessment System (2000 AD)

SLIDE 39

The BEAR Assessment System (2000 AD)

Reliability & Validity

SLIDE 40

The BEAR Assessment System (2000 AD)

[Diagram: the four building blocks (Construct, Item Responses, Outcome Space, Measurement Model), with causality running from the construct to the item responses, and inference running back to the construct. A reflective model. From: Wilson, 2005.]

SLIDE 41

Outline

  • Why should we think of classroom assessment as "measurement"?
  • What should we want from an Assessment System?
  • How is the BEAR Assessment System (BAS) a response to the question?
  • What are the most important developments in the BAS since 2000?
  • What are some important examples of BAS assessments?
  • How should BAS interface with state testing?
  • What are the challenges and opportunities?
SLIDE 42

What are the most important developments in BAS since 2000?

  • Qualitative Inner Triangle
  • Feedback Loops, etc.
  • OMC
  • Banding
  • Dimensional Comparability
  • Standard Setting
  • Learning Progressions
SLIDE 43

  • 1. Qualitative Inner Triangle

SLIDE 44
  • 2. Feedback Loops, etc.
SLIDE 45

From: Brown & Wilson, 2011.

SLIDE 46
  • 3. Ordered Multiple Choice (OMC)

Which is the best explanation for why it gets dark at night?

  • A. The Moon blocks the Sun at night. [Level 1 response]
  • B. The Earth rotates on its axis once a day. [Level 4 response]
  • C. The Sun moves around the Earth once a day. [Level 2 response]
  • D. The Earth moves around the Sun once a day. [Level 3 response]
  • E. The Sun and Moon switch places to create night. [Level 2 response]

From: Briggs, Alonzo, Schwab & Wilson, 2006. See also: Wilson & Adams, 1992.
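Because each option is linked a priori to a construct-map level, an OMC response can be scored diagnostically rather than simply right or wrong. A minimal sketch, with the option-to-level links taken from the item above (the function name is illustrative):

```python
# Option-to-level links for the "dark at night" OMC item above.
OMC_LEVELS = {"A": 1, "B": 4, "C": 2, "D": 3, "E": 2}

def omc_level(response: str) -> int:
    """Return the construct-map level diagnosed by the chosen option."""
    return OMC_LEVELS[response.strip().upper()]

# A student choosing D is placed at level 3, not merely marked "wrong".
print(omc_level("D"))  # 3
```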

SLIDE 47
  • 4. Banding
  • Finding levels in complex assessments...
SLIDE 48

What we hoped to see:


SLIDE 49

What we got


SLIDE 50

New ECC Wright Maps

[Wright maps shown in three panels: Strand A, Strand B, Strand C.]

SLIDE 51

Modelling: CoS Wright Map

SLIDE 52

Revised CoS Map

[Wright map for the revised CoS construct: the former level labels (NL, CoS1, CoS2, CoS3, CoS4) are relabelled (NL, CoS1A, CoS1B, CoS2*, CoS3*), with items located against the person distribution.]

SLIDE 53
  • 5. Dimensional comparability
  • Standard assumption in IRT:
    – Mean of the person distribution is 0.0, or
    – Mean of the item difficulties is 0.0
  • In multidimensional models, we make the same assumptions (as in factor analysis).
  • Consequences? (A sketch of the two conventions follows.)
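Either convention only fixes the origin of the scale, so person-item differences within one calibration are unchanged; but scales from separate calibrations need not line up, which is exactly the comparability problem addressed below. A sketch of the two conventions (all values illustrative):

```python
import numpy as np

theta = np.array([0.8, -0.2, 1.1, 0.3])  # person proficiencies (illustrative)
delta = np.array([0.5, -0.7, 0.2])       # item difficulties (illustrative)

# Convention 1: center the person distribution at 0.0.
theta_c1, delta_c1 = theta - theta.mean(), delta - theta.mean()

# Convention 2: center the item difficulties at 0.0.
theta_c2, delta_c2 = theta - delta.mean(), delta - delta.mean()

# Either way, person-item differences (hence probabilities) are unchanged:
assert np.allclose(theta_c1[0] - delta_c1[1], theta_c2[0] - delta_c2[1])
```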
SLIDE 54

Multidimensional Wright Map


SLIDE 55

Delta Dimensional Alignment (DDA)

  • Transforms the item locations and step parameters from the 7-dimensional analysis, using the results from the unidimensional analysis
  • Then, re-run the 7-dimensional analysis using these transformed item estimates and step parameters as anchored values.

$$\tau_{ij}^{\text{(transformed)}} = \tau_{ij}^{\text{(multi)}} \left( \frac{\sigma_{d}^{\text{(uni)}}}{\sigma_{d}^{\text{(multi)}}} \right)$$

$$\delta_{i}^{\text{(transformed)}} = \delta_{i}^{\text{(multi)}} \left( \frac{\sigma_{d}^{\text{(uni)}}}{\sigma_{d}^{\text{(multi)}}} \right) + \mu_{d}^{\text{(uni)}}$$

$$\delta_{ij} = \delta_{i} + \tau_{ij}$$

where $\delta_i$ is the location of item $i$, $\tau_{ij}$ is its $j$th step parameter, and $\mu_d$ and $\sigma_d$ are the mean and standard deviation of the item locations on dimension $d$ in the indicated (unidimensional or multidimensional) analysis.
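A sketch of the DDA transformation in code, following the equations above directly; the function name and NumPy implementation are mine, not the original software:

```python
import numpy as np

def delta_dimensional_alignment(delta_multi, tau_multi,
                                sigma_uni, sigma_multi, mu_uni):
    """Transform item locations (delta) and step parameters (tau) for one
    dimension d of the multidimensional analysis onto the unidimensional
    metric, using that dimension's item-location mean and SD."""
    scale = sigma_uni / sigma_multi
    delta_t = np.asarray(delta_multi) * scale + mu_uni  # item locations
    tau_t = np.asarray(tau_multi) * scale               # step parameters
    # These serve as anchored values when re-running the multidimensional model.
    return delta_t, tau_t
```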

SLIDE 56

Multidimensional Wright Map


SLIDE 57
  • 6. Standard-Setting
  • Setting cut-points between reporting levels ...
SLIDE 58
  • 6. Construct-Mapping Software
  • Dynamic display of the item map
  • Displays, for any chosen proficiency level:
    – probability of passing all multiple-choice items
    – probability of attaining every level on open-ended items
    – expected total score on the multiple-choice section; expected score on each open-ended item
  • Allows choice of weights for item types (a sketch of the underlying computation follows)
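A sketch of the computation such a display needs underneath, assuming a partial credit model parameterization for the open-ended items (function names and step difficulties are illustrative):

```python
import math

def pcm_probs(theta, deltas):
    """Partial credit model: P(score k), k = 0..K, on one open-ended item,
    given proficiency theta and step difficulties deltas[0..K-1]."""
    cum, run = [0.0], 0.0
    for d in deltas:
        run += theta - d
        cum.append(run)
    exps = [math.exp(c) for c in cum]
    total = sum(exps)
    return [e / total for e in exps]

def expected_score(theta, deltas):
    """Expected item score at the chosen proficiency level."""
    return sum(k * p for k, p in enumerate(pcm_probs(theta, deltas)))

# Probability of attaining each level, and the expected score, at 0.5 logits.
steps = [-1.0, 0.2, 1.5]  # illustrative step difficulties
print([round(p, 2) for p in pcm_probs(0.5, steps)])
print(round(expected_score(0.5, steps), 2))
```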
SLIDE 59

[Item map: the GSE scale from 420 to 750, with multiple-choice items and written-response items WR 1 and WR 2 placed at their scale locations, and the probability of success on each at the chosen proficiency level.]

SLIDE 60

[Standards map: the person distribution (percentiles and histogram) plotted against the GSE scale from about 410 to 740, with multiple-choice and written-response item locations marked alongside.]

Final Standards Map

SLIDE 61

[Detail of the standards map: the upper portion of the GSE scale, from about 515 to 740.]

SLIDE 62

[Detail of the standards map: the lower portion of the GSE scale, from about 410 to 635.]

SLIDE 63

Example of resulting "mapping matrix"

[Mapping matrix: rows give the multiple-choice number-correct score (1 to 26) and columns give the written-response problem scores; each cell holds the corresponding reporting level, rising monotonically from 1 at the lowest score combinations to 6 at the highest.]
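A sketch of how such a matrix is used in scoring; the cell values below are invented for the illustration (the actual values are the ones summarized above):

```python
import numpy as np

# Illustrative mapping matrix: rows indexed by multiple-choice
# number-correct, columns by written-response score; each cell is the
# combined reporting level, non-decreasing in both margins.
MAPPING = np.array([
    [1, 1, 2, 2],
    [1, 2, 2, 3],
    [2, 2, 3, 3],
    [2, 3, 3, 4],
])

def reporting_level(num_correct: int, wr_score: int) -> int:
    return int(MAPPING[num_correct, wr_score])

print(reporting_level(2, 3))  # 3
```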

SLIDE 64
  • 7. Learning Progressions
  • Learning progressions are descriptions of the successively more sophisticated ways of thinking about an important domain of knowledge and practice that can follow one another as children learn about and investigate a topic over a broad span of time. They are crucially dependent on instructional practices if they are to occur. (Corcoran, Mosher, & Rogat, 2009, p. 37)

SLIDE 65

Image of a Learning Progression

From: Wilson, 2009

SLIDE 66

Image of a Construct Map

SLIDE 67

One Possible Relationship:

the levels of the learning progression are levels of several construct maps

SLIDE 68

Another possible relationship:

the levels are staggered

SLIDE 69

Possibility of qualitative change


SLIDE 70

An extreme case

SLIDE 71

A “within-levels” case

[Figure: several construct maps shown side by side, each a table of levels (A through F) against "What the Student Knows"; levels D and E share a single band, illustrating the "within-levels" case.]
SLIDE 72

A relationship with complex dependencies:

SLIDE 73

A more complicated relationship

SLIDE 74

Outline

  • Why should we think of classroom assessment as "measurement"?
  • What should we want from an Assessment System?
  • How is the BEAR Assessment System (BAS) a response to the question?
  • What are the most important developments in the BAS since 2000?
  • What are some important examples of BAS assessments?
  • How should BAS interface with state testing?
  • What are the challenges and opportunities?
SLIDE 75

What are some important examples of BAS assessments?

  • SEPUP (with Herb Thier, UC Berkeley)
  • ADM (with Rich Lehrer & Leona Schauble, Vanderbilt University)
  • DRDP (with Peter Mangione, WestEd)
  • Scientific Reasoning Project (Karen Draney, UC Berkeley, with Brian Reiser, Northwestern University)
  • Carbon Cycle Project (Karen Draney, UC Berkeley, with Andy Anderson, Michigan State University)
  • LPS
    – Atomic Molecular Model (with Paul Black, Kings College)
    – Argumentation (with Jonathan Osborne, Stanford University)

SLIDE 76

What are some important examples of BAS assessments? Out-of-normal-range applications

  • Fidelity measurement
    – with Rich Lehrer and
  • Linked constructs/SCMs
    – with Rich Lehrer
    – with Kathleen Metz (UC Berkeley)

SLIDE 77

Structured Construct Models (SCMs)

  • The learning progression involves cross-dimensional links:

[Diagram: a Requirement variable R (Levels 1 through 4) and a Target variable T (Levels 1 through 4), with links running between levels of R and levels of T.]

SLIDE 78

Relationships between dimensions

  • In order to go:
    – from level 2 of target variable T
    – to level 3 of target variable T
  • You need to be at:
    – level t-1 of variable T (here, level 2), and
    – level 2 of variable R

[Diagram repeated from the previous slide: Requirement variable R linked to Target variable T.]
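Read computationally, the link makes advancement on the target dimension conditional on the requirement dimension. A minimal sketch of that gating rule, encoding the slide's example literally (the function is illustrative, not the estimated SCM):

```python
def can_advance(target_level: int, requirement_level: int,
                new_target_level: int) -> bool:
    """Gating rule from the slide: to move from level 2 to level 3 of target
    variable T, a student must be at the prior level of T AND at level 2 of
    requirement variable R; other moves only require the prior level of T."""
    if new_target_level != target_level + 1:
        return False           # can only advance one level at a time
    if new_target_level == 3:  # the gated transition shown on the slide
        return requirement_level >= 2
    return True

print(can_advance(target_level=2, requirement_level=2, new_target_level=3))  # True
print(can_advance(target_level=2, requirement_level=1, new_target_level=3))  # False
```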

SLIDE 79

NCME 2016 Symposium

  • Wilson, M. (2016, April). Introduction to the concept of a structured constructs model (SCM). Paper presented at the NCME annual meeting, Washington, DC.
  • Irribarra, D. T., & Diakow, R. (2016, April). Modeling structured constructs as non-symmetric relations between ordinal latent variables. Paper presented at the NCME annual meeting, Washington, DC.
    – Latent class modeling
  • Choi, I.-H., & Wilson, M. (2016, April). A structured constructs model for continuous latent traits with discontinuity parameters. Paper presented at the NCME annual meeting, Washington, DC.
    – Continuous modeling with fixed cut-points
  • Shin, H.-J., & Wilson, M. (2016, April). A structured constructs model based on change-point analysis. Paper presented at the NCME annual meeting, Washington, DC.
    – Continuous modeling with estimated cut-points

SLIDE 80

The full ADM view …


SLIDE 81

Outline

  • Why should we think of classroom assessment as "measurement"?
  • What should we want from an Assessment System?
  • How is the BEAR Assessment System (BAS) a response to the question?
  • What are the most important developments in the BAS since 2000?
  • What are some important examples of BAS assessments?
  • How should BAS interface with state testing?
  • What are the challenges and opportunities?
SLIDE 82
  • See: Wilson, M., & Santelices, M.V. (in press). Weaknesses of the traditional view of standard setting and a suggested alternative. In S. Blomeke & J.-E. Gustafsson (Eds.), Standard Setting: International State of Research and Practices in the Nordic Countries. Dordrecht, The Netherlands: Springer.

SLIDE 83

Outline

  • Why should we think of classroom assessment as "measurement"?
  • What should we want from an Assessment System?
  • How is the BEAR Assessment System (BAS) a response to the question?
  • What are the most important developments in the BAS since 2000?
  • What are some important examples of BAS assessments?
  • How should BAS interface with state testing?
  • What are the challenges and opportunities?
SLIDE 84

BAS Challenges and Opportunities: For Measurement

  • "Standard" challenges
    – Defining variables well
    – Creating items in a design-specific way
    – Developing sound coding/scoring systems
    – Applying uni- and multidimensional models to interrogate the data appropriately
  • Should-be-"standard" challenges
    – Making reports useful for teachers
    – Helping teachers design/adapt assessments

SLIDE 85

BAS Challenges and Opportunities: For Measurement

  • "Non-standard" challenges
    – Incorporating the metric in multidimensional models
    – Representing/modelling "links" across dimensions
    – Developing new models that well represent the latent continuum and/or latent classes in LPs
    – Representing/modeling longitudinal change across an LP

SLIDE 86

BAS Challenges and Opportunities: From Measurement

  • To the development of curricula (C&I, policymakers, etc.)
    – Readiness (and expectancy) to be proven wrong by empirical observations
    – Thinking of Learning Progressions/Curricula as hypotheses
  • To fidelity measurement
    – ditto
    – Provides a way to transcend "the obvious"
  • To professional development
    – (following on from the previous) provides a new way to link teacher practices to large-scale assessment and accountability

SLIDE 87

Outline

  • Why should we think of classroom assessment as "measurement"?
  • What should we want from an Assessment System?
  • How is the BEAR Assessment System (BAS) a response to the question?
  • What are the most important developments in the BAS since 2000?
  • What are some important examples of BAS assessments?
  • How should BAS interface with state testing?
  • What are the challenges and opportunities?
SLIDE 88

References

  • Briggs, D., Alonzo, A., Schwab, C., & Wilson, M. (2006). Diagnostic assessment with ordered multiple-choice items. Educational Assessment, 11(1), 33-63.
  • Brown, N. J. S., & Wilson, M. (2011). Model of cognition: The missing cornerstone of assessment. Educational Psychology Review, 23(2), 221-234.
  • Corcoran, T., Mosher, F. A., & Rogat, A. (2009, May). Learning progressions in science: An evidence-based approach to reform (CPRE Research Report #RR-63). Philadelphia, PA: Consortium for Policy Research in Education.
  • Wilson, M. (1992). The ordered partition model: An extension of the partial credit model. Applied Psychological Measurement, 16(3), 309-325.
  • Wilson, M. (2009). Measuring progressions: Assessment structures underlying a learning progression. Journal for Research in Science Teaching, 46(6), 716-730.
  • Wilson, M. (2010, March). Discussant notes. Discussant at a symposium on "Cognition and Valid Inferences about Student Achievement" at the annual meeting of the American Educational Research Association, Denver, Colorado.
  • Wilson, M., & Sloane, K. (2000). From principles to practice: An embedded assessment system. Applied Measurement in Education, 13(2), 181-208.