Measurement in the context of formative classroom assessment
Mark Wilson, UC Berkeley
Presented at the University of California, Berkeley, March 7, 2017
Abstract
This talk will survey the last 15 years of work that we have been doing at the BEAR Center on the BEAR Assessment System (BAS). It will begin by noting the initial motivations for developing this approach to measurement/assessment, focusing on the question of what are the measurement demands in the context of formative classroom assessments. This will be followed by a brief description of the BAS, accompanied by discussion of how it reflects a response to this question. Following that, it will also explore: (a) What are the most important developments in the BAS since 2000? (b) What are some important examples of BAS assessments? (c) How should BAS interface with state testing? (d) What are the challenges and opportunities?
Outline
- Why should we think of classroom assessment as
“measurement”?
- What should we want from an Assessment System?
- How is the BEAR Assessment System (BAS) a response
to the question?
- What are the most important developments in the
BAS since 2000?
- What are some important examples of BAS
assessments?
- How should BAS interface with state testing?
- What are the challenges and opportunities?
A bit of my own background
- Q1. How can educational measurement help
classroom assessment?
- Worried about this in the 1970s, and since
– Saw the need to give teachers better assessment information in the classroom
– Saw the need for the assessment to track the “logic behind the curriculum”
– Because that way the teachers can understand what to do with that information.
A bit of my own background
- Q2. Where does the scale (aka, the continuum,
the construct, the learning progression) in tests come from?
- Worked on this in the 1980s and early 90s
– Discovering and interpreting effects on item difficulty for “wild” items
– Looking for a consistent story about why items are ordered the way they are
- Published 10 papers on these topics during that period …
- Wilson, M., & Bock, R.D. (1985). Spellability: A linearly-ordered content domain. American Educational Research Journal, 22(2), 297-307.
- Wilson, M. (1989). Empirical examination of a learning hierarchy using an item response theory model. Journal of Experimental Education, 57(4), 357-371.
- Wilson, M. (1989). Saltus: A psychometric model of discontinuity in cognitive development. Psychological Bulletin, 105(2), 276-289.
- Wilson, M. (1990). Measuring a van Hiele geometry sequence: A reanalysis. Journal for Research in Mathematics Education, 21(3), 230-237.
- Wilson, M. (1990). Investigation of structured problem solving items. In G. Kulm (Ed.), Assessing higher order thinking in mathematics. Washington, DC: American Association for the Advancement of Science.
- Wilson, M. (1992). The ordered partition model: An extension of the partial credit model. Applied Psychological Measurement, 16(3), 309-325.
- Masters, G.N., Adams, R.J., & Wilson, M. (1990). Charting of student progress. In T. Husen & T.N. Postlethwaite (Eds.), International Encyclopedia of Education: Research and Studies, Supplementary Volume 2 (pp. 628-634). Oxford: Pergamon Press. Reprinted in: T. Husen & T.N. Postlethwaite (Eds.), (1994). International Encyclopedia of Education (2nd ed.) (pp. 5783-5791). Oxford: Pergamon Press.
- Wilson, M. (1990). Measurement of developmental levels. In T. Husen & T.N. Postlethwaite (Eds.), International Encyclopedia of Education: Research and Studies, Supplementary Volume 2. Oxford: Pergamon Press.
- Wilson, M. (1992). Measurement models for new forms of assessment in mathematics education. In J.F. Izard & M. Stephens (Eds.), Reshaping assessment practices: Assessment in the mathematical sciences under challenge. Hawthorn, Australia: ACER.
- Wilson, M. (1992). Measuring levels of mathematical understanding. In T. Romberg (Ed.), Mathematics assessment and evaluation: Imperatives for mathematics educators. New York: SUNY Press.
From: Wilson, 2010.
More of my own background
- Concluded that there was little enlightenment about learning progressions in the results from these analyses
– The item sets were too diverse, hence too sparse in what they conveyed about student learning
– The items differed in idiosyncratic ways
- Gave up doing that
– needed to find another way ...
Things I learned …
- Maxim No. 1.
Assessment content specialists (“item writers”) do not know why they build items the way they do. Corollary: They cannot predict the difficulty of their items.
- Maxim No. 2.
“Wild” items confound deeper cognitive/structuralist/diagnostic underpinnings with surface features (such as item wording, etc.).
Things I learned …
- But
Good curriculum developers create their curricula using a developmental way of thinking about learning.
- Hence
Need to engage with curriculum developers, not item writers
- But
Curriculum developers do not know how to write items.
More of my own background
- Changed my entire research agenda
– Became an item-designer, not just a secondary data analyst
– Became a “sort of” curriculum designer: a learning progression designer
– Became a developer of new types of item response models (“explanatory”)
Learned that there is a further question...
- Q3. How can we create “summative tests” if
we do not know what classroom tests are testing?
- that is...
- Q3. How can classroom assessment help
educational measurement?
Outline
- Why should we think of classroom assessment as
“measurement”?
- What should we want from an Assessment System?
- How is the BEAR Assessment System (BAS) a response
to the question?
- What are the most important developments in the
BAS since 2000?
- What are some important examples of BAS
assessments?
- How should BAS interface with state testing?
- What are the next steps?
What should we want from an Assessment System?
- Assessment and accountability information available and useful to important audiences:
Audiences
- Classroom
– Teachers, Students, Parents
- school site
– other teachers and specialists, administrators
- school district
– administrators
- State and public
– School Board, legislators, State Dept. of Education, Public
Audiences, and what they ought to get
Audiences          What they need/want & ought to get                       What they now get from the State
Teachers           classroom results useful for:                            SBAC, etc.
                   (a) individual student feedback
                   (b) planning for individual progress
                   (c) planning for class progress
Schools /          results useful for:                                      SBAC, etc.
school districts   (a) school, school district, and State planning
                   (b) school, school district, and State accountability
State and public   results useful for accountability                        SBAC, etc.
What they ought to get
First, some background.
- 1. Different sorts of assessment
– Assessment to assist learning: only in classrooms
– Summative assessment of individuals: some in classrooms, some from the State
– Assessment to evaluate programs: mainly State (or school districts, federal, etc.)
- 2. What should be the relationships among these different sorts of assessment?
“...teachers’ goals for learning should be consistent with those of large-scale assessments and vice-versa... one challenge is to make stronger connections between the two so they work together to support a common set of learning goals.” (Knowing What Students Know, National Research Council, 2001, p. 41)
- 3. Assessment in education is a major industry
The State spends millions on it; testing companies make millions out of it. Assessment is one of the major components of every classroom. Not “State” assessments... teachers’ assessments. The instructional cycle...
[Diagram: the instructional cycle linking Curriculum, Instruction, and Assessment.]
Assessment is ...
An activity that goes on at almost every minute of every school day, in every classroom, in every school in this State. An activity that is crucial to the success of every piece of instruction that every teacher carries out.
This assessment enterprise dwarfs the assessment system of the State: in the time it takes, in its effect on the students, in its cost, in how it shapes the practices of teachers, and hence in how it affects student learning. What will teachers do when they have to comply with State tests that are inconsistent with what they do in their instruction?
Thus, the real question is ...
NOT
- how to construct a system of State assessments that will
carry out the program evaluation tasks of the State? BUT
- how to construct a system that
(a) helps in the classroom by (i) defining classroom learning goals, (ii) serving as a resource for classroom assessment, and (iii) focusing instruction, and
(b) helps channel useful information to (i) summative individual assessment and (ii) the program evaluation tasks of the State
How do we do that?
- (a) Need a learning framework that allows
communication among the different audiences for assessment information
- (b) Need methods of gathering data that are
acceptable and useful to all audiences
- (c) Need ways to evaluate (score) responses that are
consistent with the framework in (a)
- (d) Need a technique of interpreting data that
allows meaningful reporting to multiple audiences
- AKA...4 “Building Blocks”
Outline
- Why should we think of classroom assessment as
“measurement”?
- What should we want from an Assessment System?
- How is the BEAR Assessment System (BAS) a response
to the question?
- What are the most important developments in the
BAS since 2000?
- What are some important examples of BAS
assessments?
- How should BAS interface with state testing?
- What are the challenges and opportunities?
The BEAR Assessment System (BAS)
- Herb Thier, Lawrence Hall of Science
– Science Education for Public Understanding Program (SEPUP)
– Issues, Evidence and You (IEY) Curriculum
- Unhappy that:
– Teachers love the IEY curriculum, but no gains on standardized tests
– Teachers don’t really know whether their students have learned the “stuff” (despite “unit tests”)
– Students don’t know what they have learned
– Looking for the “assessable moment”
Image of a Curriculum/ Learning Progression
Relating measurement dimensions to a curriculum?
SEPUP’s IEY Curriculum (circa 1990)
- Understanding Concepts (UC)--understanding scientific concepts
(such as properties and interactions of materials, energy, or thresholds) in order to apply the relevant scientific concepts to the solution of problems.
- Designing and Conducting Investigations (DCI)--designing a
scientific experiment, carrying through a complete scientific investigation, performing laboratory procedures to collect data, recording and organizing data, and analyzing and interpreting results of an experiment.
- Evidence and Tradeoffs (ET)--identifying objective scientific
evidence as well as evaluating the advantages and disadvantages
of different possible solutions to a problem based on the available
evidence.
- Communicating Scientific Information (CM)--organizing and
presenting results in a way that is free of technical errors and effectively communicates with the chosen audience.
From: Wilson & Sloane, 2000
Using Evidence: Response uses objective reason(s) based on relevant evidence to support choice.

Score  Category                   Criteria
4      Beyond correct             Response accomplishes Level 3 AND goes beyond in some significant way, such as questioning or justifying the source, validity, and/or quantity of evidence.
3      Correct                    Response provides major objective reasons AND supports each with relevant & accurate evidence.
2      Partially complete (ii):   Response provides some objective reasons AND some supporting evidence, BUT at least one reason is missing and/or part of the evidence is incomplete.
       some objective reasons and some evidence
1      Partially complete (i):    Response provides only subjective reasons (opinions) for choice and/or uses inaccurate or irrelevant evidence from the activity.
       subjective and/or inaccurate reasons
0      Missing or irrelevant      No response; illegible response; response offers no reasons AND no evidence to support choice made.
X      No opportunity             Student had no opportunity to respond.
The IEY Evidence and Tradeoffs construct map.
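The scoring guide can also be encoded as a simple data structure for recording and validating rater scores. This is an illustrative sketch only: the dictionary and function names are mine, not from BAS software, and the score of 0 for “Missing or irrelevant” is inferred from the guide’s ordering.

```python
# The "Using Evidence" categories keyed by score (labels quoted from the
# scoring guide; this encoding is illustrative, not BAS software).
USING_EVIDENCE = {
    4: "Beyond correct",
    3: "Correct",
    2: "Partially complete (ii)",
    1: "Partially complete (i)",
    0: "Missing or irrelevant",   # score value inferred
    "X": "No opportunity",
}

def label(score):
    """Return the category label for a rater-assigned score."""
    if score not in USING_EVIDENCE:
        raise ValueError(f"invalid score: {score!r}")
    return USING_EVIDENCE[score]

print(label(4), "|", label("X"))  # Beyond correct | No opportunity
```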
Direction of increasing sophistication in using evidence (bottom to top). Student responses to items, from least to most sophisticated: no relevant response; subjective or inaccurate evidence; something important missing; all major points included; goes beyond complete and correct.
Relating measurement dimensions to a curriculum?
Items design
You are a public health official who works in the Water Department. Your supervisor has asked you to respond to the public's concern about water chlorination at the next City Council meeting. Prepare a written response explaining the issues raised in the newspaper articles. Be sure to discuss the advantages and disadvantages of chlorinating drinking water in your response, and then explain your recommendation about whether the water should be chlorinated.
“As an edjucated employee of the Grizzelyville water company, I am well aware of the controversy surrounding the topic of the chlorination of our drinking water. I have read the two articals regarding the pro’s and cons of chlorinated water. I have made an informed decision based on the evidence presented the articals entitled “The Peru Story” and “700 Extra People May bet Cancer in the US.” It is my recommendation that our towns water be chlorin treated. The risks of infecting our citizens with a bacterial diseease such as cholera would be inevitable if we drink nontreated water. Our town should learn from the country of Peru. The artical “The Peru Story” reads thousands of inocent people die of cholera epidemic. In just months 3,500 people were killed and more infected with the diease. On the other hand if we do in fact chlorine treat our drinking water a risk is posed. An increase in bladder and rectal cancer is directly related to drinking chlorinated water. Specifically 700 more people in the US may get cancer. However, the cholera risk far outweighs the cancer risk for 2 very important reasons. Many more people will be effected by cholera where as the chance of one of our citizens getting cancer due to the water would be very minimal. Also cholera is a spreading diease where as cancer is not. If our town was infected with cholera we could pass it on to millions of others. And so, after careful consideration it is my opion that the citizens of Grizzelyville drink chlorine treated water.”
The IEY Evidence and Tradeoffs outcome space.
A student’s profile on the IEY constructs.
A student’s longitudinal profile on the DCI construct.
General Strategy: the 4 “Building Blocks”...
- (a) Need a framework that allows teachers to interpret
development in the construct
– "Construct map”
- (b) Need methods of gathering data that are
acceptable and useful to all audiences
– "Items design”
- (c) Need a way to value what we see in student work
– "Outcome space”
- (d) Need a technique of interpreting data that allows
meaningful reporting to multiple audiences
– "Measurement model"
The BEAR Assessment System (2000 AD)
Reliability & Validity
[Diagram: the four building blocks (Construct, Item Responses, Outcome Space, Measurement Model) linked in a cycle, with arrows marking the directions of causality and of inference.]
A reflective model. From: Wilson, 2005.
Outline
- Why should we think of classroom assessment as
“measurement”?
- What should we want from an Assessment System?
- How is the BEAR Assessment System (BAS) a response
to the question?
- What are the most important developments in BAS
since 2000?
- What are some important examples of BAS
assessments?
- How should BAS interface with state testing?
- What are the challenges and opportunities?
What are the most important developments in BAS since 2000?
- Qualitative Inner Triangle
- Feedback Loops, etc.
- OMC
- Banding
- Dimensional Comparability
- Standard Setting
- Learning Progressions
- 1. Qualitative Inner Triangle
- 2. Feedback Loops, etc.
From: Brown & Wilson, 2011.
- 3. Ordered Multiple Choice: OMC
Which is the best explanation for why it gets dark at night?
- A. The Moon blocks the Sun at night. [Level 1 response]
- B. The Earth rotates on its axis once a day. [Level 4 response]
- C. The Sun moves around the Earth once a day. [Level 2 response]
- D. The Earth moves around the Sun once a day. [Level 3 response]
- E. The Sun and Moon switch places to create night. [Level 2 response]
From: Briggs, Alonzo, Schwab & Wilson, 2006 See also: Wilson & Adams, 1992
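The OMC idea above can be sketched as a scoring routine: each response option is linked to a construct-map level, so a student's choice yields an ordered score rather than a right/wrong mark. The option-to-level mapping restates the example item; the function and variable names are illustrative, not from any published BAS software.

```python
# Sketch of OMC scoring: each option carries the construct-map level of the
# thinking it reflects, so the chosen option itself is the score.
DARKNESS_ITEM = {
    "A": 1,  # The Moon blocks the Sun at night.
    "B": 4,  # The Earth rotates on its axis once a day.
    "C": 2,  # The Sun moves around the Earth once a day.
    "D": 3,  # The Earth moves around the Sun once a day.
    "E": 2,  # The Sun and Moon switch places to create night.
}

def omc_score(item: dict, choice: str) -> int:
    """Return the construct-map level attached to the chosen option."""
    return item[choice]

print(omc_score(DARKNESS_ITEM, "B"))  # 4
```

Note that two options (C and E) sit at the same level: OMC distractors diagnose *which* level of understanding produced the response, not merely that it was wrong.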
- 4. Banding
- Finding levels in complex assessments...
What we hoped to see:
[Wright map figure]
What we got:
[Wright map figure]
New ECC Wright Maps
[Figure: Wright maps for Strand-A, Strand-B, Strand-C]
Modelling: CoS Wright Map
[Figure]
Revised CoS Map
[Wright map relating the former level labels (NL, CoS1–CoS4) to the new labels (NL, CoS1A, CoS1B, CoS2*, CoS3*), with item locations and the person distribution.]
- 5. Dimensional comparability
- Standard assumption in IRT:
– Mean of person distribution is 0.0, or
– Mean of item difficulties is 0.0
- In multidimensional models, make the same
assumptions (as in Factor Analysis).
- Consequences?
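One consequence can be seen in a toy example (person locations hypothetical): centering each dimension's estimates at zero, as the identification constraint requires, wipes out any real difference in average proficiency between the dimensions, so locations are not directly comparable across dimensions.

```python
# Toy illustration: each dimension is centered separately (mean forced to
# 0.0), so the resulting locations carry no information about how the
# dimensions' average proficiencies compare.
def center(xs):
    """Shift values so their mean is exactly 0.0 (the usual IRT constraint)."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

dim1 = [0.5, 1.0, 1.5]    # hypothetical: genuinely higher on average
dim2 = [-1.0, 0.0, 1.0]   # hypothetical: lower on average

print(center(dim1))  # [-0.5, 0.0, 0.5]
print(center(dim2))  # [-1.0, 0.0, 1.0]
```

After centering, the fact that dimension 1 started a full logit higher is gone; both distributions sit at zero.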
Multidimensional Wright Map
Delta Dimensional Alignment (DDA)
- Transforms the item locations and step parameters from the 7-dimensional analysis, using the results from the unidimensional analysis
- Then, re-run the 7-dimensional analysis using these transformed item estimates and step parameters as anchored values.

τij(transformed) = τij(multi) × (σd(uni) / σd(multi))

δi(transformed) = δi(multi) × (σd(uni) / σd(multi)) + μd(uni)

where δij = δi + τij
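The rescaling can be sketched as follows: a minimal sketch for one dimension d, where sigma_uni and sigma_multi are the unidimensional and multidimensional standard deviations and mu_uni is the unidimensional mean. Function and variable names are mine, not from BAS software.

```python
# Minimal sketch of the DDA rescaling for one dimension d.
def dda_transform(tau_multi, delta_i_multi, sigma_uni, sigma_multi, mu_uni):
    """Return (transformed step parameters tau_ij, transformed location delta_i)."""
    ratio = sigma_uni / sigma_multi
    tau_trans = [t * ratio for t in tau_multi]    # tau_ij(transformed)
    delta_trans = delta_i_multi * ratio + mu_uni  # delta_i(transformed)
    return tau_trans, delta_trans

def step_difficulty(delta_i, tau_ij):
    """Compose a step difficulty: delta_ij = delta_i + tau_ij."""
    return delta_i + tau_ij

# Hypothetical values: two step parameters and one item location.
taus, delta = dda_transform([-0.2, 0.2], 0.4,
                            sigma_uni=1.2, sigma_multi=0.8, mu_uni=0.1)
```

Note that only the item side is rescaled and then anchored; the multidimensional analysis is re-run with these values fixed, which puts the dimensions on scales comparable to the unidimensional one.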
Multidimensional Wright Map
- 6. Standard-Setting
- Setting cut-points between reporting levels ...
- Construct-Mapping Software
- Dynamic display of item map
- Displays, for any chosen proficiency level
– probability of passing all multiple-choice items
– probability of attaining every level on open-ended items
– expected total score on multiple-choice section; expected score on each open-ended item
- Allows choice of weights for item types
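As an illustration of what such a display computes, here is a sketch under the Rasch model with hypothetical item difficulties; this is not the actual software, just the underlying arithmetic for the multiple-choice probabilities and expected total.

```python
import math

# For a chosen proficiency theta, compute the Rasch probability of success
# on each dichotomous multiple-choice item and the expected total score.
def rasch_prob(theta: float, delta: float) -> float:
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

deltas = [-1.0, 0.0, 1.0]    # hypothetical item difficulties (logits)
theta = 0.5                  # chosen proficiency level
probs = [rasch_prob(theta, d) for d in deltas]
expected_total = sum(probs)  # expected number-correct at this proficiency

print([round(p, 2) for p in probs], round(expected_total, 2))
# [0.82, 0.62, 0.38] 1.82
```

Sliding theta up the scale and recomputing is what makes the item-map display dynamic: every probability and expected score updates for the chosen proficiency level.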
[Item map display: the GSE scale with multiple-choice item locations and, for a chosen proficiency, the probability of attaining each level of the two written-response items (WR 1, WR 2).]
[Person-item map: a percentile histogram of persons on the GSE scale alongside the multiple-choice and written-response item locations.]
Final Standards Map
[Final standards map: the person distribution (percentile histogram) and item locations on the GSE scale.]
Example of resulting "mapping matrix"
[Mapping matrix: each multiple-choice number-correct score (1–26) is matched to corresponding score levels on the written-response problems, with higher number-correct totals mapping to higher written-response levels.]
- 7. Learning Progressions
- Learning progressions are descriptions of the
successively more sophisticated ways of thinking about an important domain of knowledge and practice that can follow one another as children learn about and investigate a topic over a broad span of time. They are crucially dependent on instructional practices if they are to occur. (Corcoran, Mosher, & Rogat, 2009, p. 37)
Image of a Learning Progression
From: Wilson, 2009
Image of a Construct Map
One Possible Relationship:
the levels of the learning progression are levels of several construct maps
Another possible relationship:
the levels are staggered
Possibility of qualitative change
An extreme case
A “within-levels” case
[Schematic construct maps: panels listing levels A–F with placeholder text for “what the student knows” at each level, repeated to contrast the different possible relationships.]
A relationship with complex dependencies:
A more complicated relationship
Outline
- Why should we think of classroom assessment as
“measurement”?
- What should we want from an Assessment System?
- How is the BEAR Assessment System (BAS) a response
to the question?
- What are the most important developments in the
BAS since 2000?
- What are some important examples of BAS
assessments?
- How should BAS interface with state testing?
- What are the challenges and opportunities?
What are some important examples of BAS assessments?
- SEPUP (with Herb Thier, UC Berkeley)
- ADM (with Rich Lehrer & Leona Schauble, Vanderbilt
University)
- DRDP (with Peter Mangione, WestEd)
- Scientific Reasoning Project (Karen Draney, UC
Berkeley, with Brian Reiser, Northwestern University)
- Carbon Cycle Project (Karen Draney, UC Berkeley, with
Andy Andersen, Michigan State University)
- LPS
– Atomic Molecular Model (with Paul Black, King's College London)
– Argumentation (with Jonathan Osborne, Stanford University)
What are some important examples of BAS assessments?
Out-of-normal range applications
- Fidelity measurement
– with Rich Lehrer and
- Linked constructs/SCMs
– with Rich Lehrer
– with Kathleen Metz (UC Berkeley)
Structured Construct Models (SCMs)
- The learning progression involves cross-
dimensional links:
[Diagram: Requirement variable R (Levels 1–4) linked to Target variable T (Levels 1–4)]
Relationships between dimensions
- In order to go:
– from level 2 of target variable T
– to level 3 of target variable T
- You need to be at
– level t-1 of variable T (here, level 2), and
– level 2 of variable R
[Diagram: Requirement variable R (Levels 1–4) linked to Target variable T (Levels 1–4)]
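The prerequisite rule above can be sketched as a simple predicate. This is a hypothetical illustration of the cross-dimensional link, not code from the BAS software; the names `Student` and `can_advance` are made up for this sketch.

```python
from dataclasses import dataclass

@dataclass
class Student:
    level_T: int  # current level on the target variable T
    level_R: int  # current level on the requirement variable R

def can_advance(student: Student, target_level: int, required_R: int) -> bool:
    """A student can move to `target_level` on T only if they are already at
    the immediately preceding level of T (level t-1) AND have reached the
    required level on the requirement variable R."""
    return (student.level_T == target_level - 1
            and student.level_R >= required_R)

# Moving from level 2 to level 3 of T requires level 2 of R:
print(can_advance(Student(level_T=2, level_R=2), target_level=3, required_R=2))  # True
print(can_advance(Student(level_T=2, level_R=1), target_level=3, required_R=2))  # False
```

The point of the predicate is that progress on T is conditional on R, so the two dimensions cannot be modeled independently.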
NCME 2016 Symposium
- Wilson, M. (2016, April). Introduction to the concept of a structured
constructs model (SCM). Paper presented at the NCME annual meeting, Washington, DC.
- Irribarra, D. T., & Diakow, R. (2016, April). Modeling structured constructs
as non-symmetric relations between ordinal latent variables. Paper presented at the NCME annual meeting, Washington, DC.
– Latent class modeling
- Choi, I-H., & Wilson, M. (2016, April). A structured constructs model for
continuous latent traits with discontinuity parameters. Paper presented at the NCME annual meeting, Washington, DC.
– Continuous modeling with fixed cut-points
- Shin, H.-J., & Wilson, M. (2016, April). A structured constructs model based on change-point analysis. Paper presented at the NCME annual meeting, Washington, DC.
– Continuous modeling with estimated cut-points
The full ADM view …
Outline
- Why should we think of classroom assessment as
“measurement”?
- What should we want from an Assessment System?
- How is the BEAR Assessment System (BAS) a response
to the question?
- What are the most important developments in the
BAS since 2000?
- What are some important examples of BAS
assessments?
- How should BAS interface with state testing?
- What are the challenges and opportunities?
- See:
Wilson, M., & Santelices, M. V. (in press). Weaknesses of the traditional view of standard setting and a suggested alternative. In S. Blomeke & J.-E. Gustafsson (Eds.), Standard Setting: International State of Research and Practices in the Nordic Countries. Dordrecht, The Netherlands: Springer.
Outline
- Why should we think of classroom assessment as
“measurement”?
- What should we want from an Assessment System?
- How is the BEAR Assessment System (BAS) a response
to the question?
- What are the most important developments in the
BAS since 2000?
- What are some important examples of BAS
assessments?
- How should BAS interface with state testing?
- What are the challenges and opportunities?
BAS Challenges and Opportunities: For Measurement
- “Standard” challenges
– Defining variables well
– Creating items in a design-specific way
– Developing sound coding/scoring systems
– Applying uni- and multidimensional models to interrogate the data appropriately
- Should-be “standard” challenges
– Making reports useful for teachers
– Helping teachers design/adapt assessments
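As a concrete instance of the last "standard" challenge, the simplest unidimensional model used to interrogate such data is the Rasch model, where the probability of a correct response depends only on the difference between person ability (theta) and item difficulty (delta). This is a generic textbook sketch, not BAS-specific code.

```python
import math

def rasch_probability(theta: float, delta: float) -> float:
    """Rasch (1PL) item response function:
    P(correct) = exp(theta - delta) / (1 + exp(theta - delta))."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

# A person whose ability equals the item difficulty answers correctly
# half the time:
print(rasch_probability(theta=0.5, delta=0.5))  # 0.5
```

Multidimensional and structured-constructs extensions replace the single theta with a vector of abilities, one per construct.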
BAS Challenges and Opportunities: For Measurement
- “Non-Standard” challenges
– Incorporating the metric in multidimensional models
– Representing/modeling “links” across dimensions
– Developing new models that well represent the latent continuum and/or latent classes in LPs
– Representing/modeling longitudinal change across a LP
BAS Challenges and Opportunities: From Measurement
- To the development of curricula (C&I, policymakers, etc.)
– Readiness (and expectancy) to be proven wrong by empirical
observations
– Thinking of Learning Progressions/Curricula as hypotheses
- To fidelity measurement
– (ditto)
– Provides a way to transcend “the obvious”
- To professional development
– (following on from the previous) provides a new way to link teacher practices to large-scale assessment and accountability
Outline
- Why should we think of classroom assessment as
“measurement”?
- What should we want from an Assessment System?
- How is the BEAR Assessment System (BAS) a response
to the question?
- What are the most important developments in the
BAS since 2000?
- What are some important examples of BAS
assessments?
- How should BAS interface with state testing?
- What are the challenges and opportunities?
References
- Briggs, D., Alonzo, A., Schwab, C., & Wilson, M. (2006). Diagnostic assessment with ordered multiple-choice items. Educational Assessment, 11(1), 33-63.
- Brown, N. J. S., & Wilson, M. (2011). Model of cognition: The missing cornerstone of assessment. Educational Psychology Review, 23(2), 221-234.
- Corcoran, T., Mosher, F. A., & Rogat, A. (2009, May). Learning progressions in science: An evidence-based approach to reform (CPRE Research Report #RR-63). Philadelphia, PA: Consortium for Policy Research in Education.
- Wilson, M. (1992). The ordered partition model: An extension of the partial credit model. Applied Psychological Measurement, 16, 309-325.
- Wilson, M. (2009). Measuring progressions: Assessment structures underlying a learning progression. Journal of Research in Science Teaching, 46(6), 716-730.
- Wilson, M. (2010, March). Discussant Notes. Discussant at a symposium on
“Cognition and Valid Inferences about Student Achievement” at the annual meeting of the American Educational Research Association, Denver, Colorado.
- Wilson, M. & Sloane, K. (2000). From principles to practice: An embedded
assessment system. Applied Measurement in Education, 13(2), 181-208.