
SLIDE 1

http://www.springer.com/us/book/9783319673035

SLIDE 2

The Springer Series in Measurement Science and Technology

The Springer Series in Measurement Science and Technology comprehensively covers the science and technology of measurement, addressing all aspects of the subject from the fundamental principles through to the state-of-the-art in applied and industrial metrology, as well as in the social sciences. Volumes published in the series cover theoretical developments, experimental techniques and measurement best practice, devices and technology, data analysis, uncertainty, and standards, with application to physics, chemistry, materials science, engineering and the life and social sciences.

SLIDE 3

William P. Fisher, Jr.

University of California, Berkeley, CA, USA

BEAR Seminar UC Berkeley Graduate School of Education 30 January 2018

Discontinuous Levels of Complexity in Coherent Educational Measurement: The Roles of KidMaps, Wright Maps, and Construct Maps

SLIDE 4

Thanks to colleagues

Emily Oon and Mei Zhou at University of Macau
Fisher, W. P., Jr., Oon, E. P.-T., & Zhou, M. (2018). Assessment coherence across information complexity contexts: Coordinating classroom and international assessments. Journal of Educational Measurement, in review.
Mark Wilson at University of California, Berkeley
National Research Council. (2006). Systems for state science assessment (M. R. Wilson & M. W. Bertenthal, Eds.). Washington, DC: The National Academies Press.
Wilson, M. (Ed.). (2004). National Society for the Study of Education Yearbooks. Vol. 103, Part II: Towards coherence between classroom assessment and accountability. Chicago, Illinois: University of Chicago Press.

SLIDE 5

The problem of coherence in educational assessment

Wilson (2004; NRC, 2006) asks:
What kind of information infrastructure is needed to coherently coordinate meaningful and comparable formative, interim, and summative assessments within and across classrooms?

Applications and reports would have to function within common frames of reference across developmental, horizontal, and vertical comparisons.

SLIDE 6

Developmental, horizontal and vertical forms of coherence

SLIDE 7

Coherence: Forced conformity, or an unexplored alternative?

Moss (2004), in a chapter included in Wilson’s (2004) NSSE Yearbook, fears that coherence in educational assessment will become another instance of a “high modern” scheme that systematically homogenizes human variation into bureaucratically manageable forms.

She cites Scott’s (1998) account of the history of failed governmental efforts at improving the human condition, but does not mention Scott’s concluding suggestion that language could provide a model for a new kind of standard that functions as a means of continually adapting broad principles to novel circumstances.

SLIDE 8

Multiple levels of complexity in language and information infrastructures

Language is, Scott (1998, p. 357) says, “a structure of meaning and continuity that is never still and ever open to the improvisations of its speakers.”

Star and Ruhleder (1996) similarly point out that “The competing requirements of openness and malleability, coupled with structure and navigability, create a fascinating design challenge—even a new science.”

The design of information infrastructures providing both structure and openness “is highly challenging technically, requiring new forms of computability that are both socially situated and abstract enough to travel across time and space” (Star & Ruhleder, 1996, p. 132).

SLIDE 9

Levels of complexity in language

(Star & Ruhleder, 1996; following Bateson, 1972)

justified given the content and difficulties of the questions.

SLIDE 10

Levels of complexity in language

“The cat on the mat” points at something real and tangible.
Pointing at the word ‘cat’ refers to an abstract concept.
It applies to all small domesticated felines.
It has an invariant meaning in the English language.
It came into use via an evolutionary process not controlled by any person or group.

SLIDE 11

Levels of complexity in language

The score of 28 on the assessment points at something real and tangible: questions answered correctly and incorrectly.
The number word ‘28’ is supposed to be abstract.
But the meaning of an assessment score of ‘28’ is tied to a particular set of questions.
It means something different across tests.
It came into use via a process controlled by an individual person or group.
Used to indicate a learning outcome, ‘28’ does not have a general and invariant meaning.
The theoretical justification for failure based on the score is contained in a privately organized information system not open to contestation or confirmation by others.

SLIDE 12

What happens when we ignore levels of complexity in language?

We find ourselves:
“…with organizations which are split and confused, systems which are unused or circumvented, and a set of circumstances of our own creation which more deeply impress disparities on the organizational landscape” (Star & Ruhleder, 1996, p. 118).

Sounds like Scott’s (1998) history of failed “high modern” schemes
Also resonates with Ladd’s (2017) documentation of the flawed U.S. NCLB proficiency standards.

Ladd, H. F. (2017). No Child Left Behind: A deeply flawed federal policy. Journal of Policy Analysis and Management, 36(2), 461-469.

SLIDE 13

What’s the alternative?

Can number words be connected with concrete observations and abstract meanings that remain invariant throughout the language?

Can number words emerge from a group-level process not under the control of any individual?

Can publicly reproducible justifications for uses of number words provide independent validation of the inferences made?

SLIDE 14

Levels of complexity in education

Denotative: statements about learning
You answered these questions correctly and incorrectly.
Your score on the test was a particular count of correct responses.
Kidmap display
Metalinguistic: learning about learning
We observe a pattern of consistently increasing difficulty in items.
Similar patterns of invariance emerge across assessments.
Wright map display
Metacommunicative: theories about learning
We see item features that cause items to be easy or hard.
We design tests from specifications, and they function as expected.
Construct map display and construct specification equation

SLIDE 15

Levels of complexity in education

Denotative: Concrete statements about learning
You answered these questions correctly and incorrectly.
Your score on the test was 28.
Kidmaps

[Figure: Kidmap sorting the student's responses into four quadrants: correct or incorrect, by easy or hard items.]
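
A minimal sketch of how the four kidmap quadrants could be assembled, assuming the item difficulties and the student's measure are already expressed in one common unit. The function name, the item labels, the easy/hard cut at the student's own measure, and all numbers are hypothetical and serve only as illustration.

```python
def kidmap_quadrants(student_measure, item_difficulties, responses):
    """Sort items into the four kidmap quadrants.

    An item counts as EASY for this student when its difficulty is at or
    below the student's measure, and HARD otherwise (an assumed convention).
    """
    quadrants = {
        ("CORRECT", "EASY"): [],
        ("CORRECT", "HARD"): [],
        ("INCORRECT", "EASY"): [],
        ("INCORRECT", "HARD"): [],
    }
    for item, difficulty in item_difficulties.items():
        outcome = "CORRECT" if responses[item] == 1 else "INCORRECT"
        level = "EASY" if difficulty <= student_measure else "HARD"
        quadrants[(outcome, level)].append(item)
    return quadrants


# Hypothetical data: difficulties and the student measure share one unit.
difficulties = {"q1": -1.5, "q2": -0.3, "q3": 0.4, "q4": 1.8}
answers = {"q1": 1, "q2": 0, "q3": 1, "q4": 0}
result = kidmap_quadrants(0.0, difficulties, answers)
for (outcome, level), items in result.items():
    print(f"{outcome:9} {level:4} {items}")
```

The two unexpected cells, incorrect answers to easy items and correct answers to hard items, are the ones a teacher would likely want to look at first.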

SLIDE 16

Levels of complexity in education

Metalinguistic: Abstract learning about learning

We observe a pattern of self-organized conjoint order:
consistently increasing item difficulties, and
consistently increasing student abilities.
Similar patterns of spontaneous invariance emerge across tests.

Wright maps
Equating
Item banks

[Figure: Wright map. Person measures (left of the axis, from <more> at the top to <less> at the bottom) and item measures (right of the axis, from <rare> to <freq>) are plotted as histograms against a common vertical scale running from about 7 down to -6. M, S, and T on the axis mark each distribution's mean, one SD, and two SDs.]
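
A rough sketch of how a text Wright map of this kind could be drawn from person and item measures that share one unit. The function name, the whole-logit binning, and the sample measures are assumptions made for illustration; this is not the output format of any particular software.

```python
from collections import Counter


def wright_map(person_measures, item_measures, low=-3, high=3):
    """Return a text Wright map: persons left of the axis, items right.

    Measures are binned to the nearest whole logit; each '#' is one
    person or one item. Values outside [low, high] are clamped.
    """
    def bin_counts(measures):
        return Counter(min(high, max(low, round(m))) for m in measures)

    persons = bin_counts(person_measures)
    items = bin_counts(item_measures)
    width = max(persons.values(), default=0)
    lines = []
    for level in range(high, low - 1, -1):  # top of the map is the high end
        left = ("#" * persons[level]).rjust(width)
        right = "#" * items[level]
        lines.append(f"{level:>3} {left} + {right}".rstrip())
    return lines


# Hypothetical measures in logits.
people = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.1, 2.2]
questions = [-2.1, -0.8, 0.2, 1.4, 1.6]
lines = wright_map(people, questions)
print("\n".join(lines))
```

Because persons and items sit on the same axis, a reader can see at a glance which items fall above or below any given student.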

SLIDE 17

Levels of complexity in education

Metacommunicative: theories about learning
We see item features that cause items to be easy or hard.
We can design tests from specifications, and they function as expected.
Construct maps and specification equations

SLIDE 18

Wilson, M. (2014). BEAR Assessment System Software. BEAR Center, UC Berkeley Graduate School of Education.

SLIDE 19

Levels of complexity in education

Metacommunicative: theories about learning
Construct specification equation

Reading difficulty (or readability) = A*log(MSL) - B*log(WF) + C, where MSL is the mean sentence length, WF is the word frequency, and A, B, and C are constants.

Burdick, H., & Stenner, A. J. (1996). Theoretical prediction of test items. Rasch Measurement Transactions, 10(1), 475.
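
As an illustration, the specification equation can be evaluated directly. This is a sketch only: the slide does not give values for A, B, and C or state the logarithm base, so the constants below are placeholders and the natural logarithm is an assumption, as are the function name and the sample texts.

```python
import math


def theoretical_difficulty(mean_sentence_length, word_frequency, a, b, c):
    """Evaluate A*log(MSL) - B*log(WF) + C using natural logarithms."""
    if mean_sentence_length <= 0 or word_frequency <= 0:
        raise ValueError("MSL and WF must be positive to take logarithms")
    return a * math.log(mean_sentence_length) - b * math.log(word_frequency) + c


# Placeholder constants, not the published values.
A, B, C = 1.0, 1.0, 0.0

# Two hypothetical texts: short sentences with common words versus
# long sentences with rare words.
easier = theoretical_difficulty(8.0, 500.0, A, B, C)
harder = theoretical_difficulty(22.0, 20.0, A, B, C)
print(easier < harder)
```

With positive A and B, longer sentences raise the predicted difficulty and more frequent words lower it, which is the direction the equation's signs imply.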

SLIDE 20

Theoretical vs Empirical Reading Item Estimates

SLIDE 21

Theoretical vs Empirical Mathematics Item Estimates

Fisher, W. P., Jr., Seeratan, K., Draney, K., Wilson, M., Murray, B., Saldarriaga, C. et al. (2012, April). Predicting mathematics test item difficulties: Results of a preliminary study. Presented at the Fifteenth International Objective Measurement Workshops, Vancouver, Canada.

SLIDE 22

“There is nothing so practical as a good theory."

(Lewin, 1951, p. 169)

Meaningful and practical explanatory power is obtained when phenomena are understood well enough to predict their behaviors.

Efficiencies at a new order of magnitude come to bear when the analysis and reporting of response data from tests and assessments are integrated with learning materials in immediate formative feedback.

SLIDE 23

LO: Student articulates basic properties of matter

Score report for an individual student


SLIDE 24

Developmental Coherence

Measures over time

[Figure: Text chart of student measure distributions (students identified by letter) for weeks one, three, five, seven, and nine, plotted on a common measurement scale running from 360 to 760. Rows beneath the scale give the uncertainty at each scale position (80 and 70 toward the ends, falling to 30 near the center), markers for the mean (M), one SD (S), and two SDs (T), and the overall student percentile (0 to 99).]
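
The T, S, M, S, T markers on such a report can be located from the measures themselves. The following is a small sketch, assuming the sample standard deviation is intended; the function name and the five measures are hypothetical.

```python
import statistics


def summary_markers(measures):
    """Return scale positions for the T, S, M, S, T markers.

    M is the mean; S marks one standard deviation and T two standard
    deviations on either side of it. The sample standard deviation is
    used here, which is an assumption.
    """
    mean = statistics.mean(measures)
    sd = statistics.stdev(measures)
    return [mean - 2 * sd, mean - sd, mean, mean + sd, mean + 2 * sd]


# Hypothetical student measures on the 360 to 760 scale.
markers = summary_markers([480, 520, 560, 600, 640])
for label, position in zip("TSMST", markers):
    print(label, round(position))
```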

SLIDE 25

Progress map for a classroom

School Term

Developmentally coherent

SLIDE 26

Horizontal Coherence

Score report for multiple classrooms, week xx

[Figure: Text chart of student measure distributions (students identified by letter) for classrooms one through five, plotted on a common measurement scale running from 360 to 760. Rows beneath the scale give the uncertainty at each scale position (80 and 70 toward the ends, falling to 30 near the center), markers for the mean (M), one SD (S), and two SDs (T), and the overall student percentile (0 to 99).]

SLIDE 27

Vertical Coherence

[Figure: End-of-semester report for the eleventh grade on the common measurement scale (360 to 760). It shows the overall student measure distribution with mean (M), SD (S), and 2 SD (T) markers, percentiles, and uncertainty; district-wide class and school measures; proficiency standards (D | P | S) with proficiency percentages of 15%, 53%, and 32%; last year's mean; a PISA/TIMSS/ICILS reference point; and SAT equivalents from 400 to 800.]

SLIDE 28

Vertical Coherence

[Figure: End-of-semester report for the fifth grade on the common measurement scale (360 to 760). It shows the overall district student measure distribution with mean (M), SD (S), and 2 SD (T) markers and percentiles; Student X's measures for weeks one, three, five, seven, and nine; proficiency standards (D | P | S) with proficiency percentages of 15%, 53%, and 32%; last year's mean; a PISA/TIMSS/ICILS reference point; SAT equivalents from 400 to 800; and the overall student percentile at grade level.]

SLIDE 29

Taking language as a model

Provides guidelines to the complex and discontinuous denotative, metalinguistic, and metacommunicative structures we need to connect number words with formal theories, abstract concepts, and concrete things in the world.

Rasch measurement theory’s kidmaps, Wright maps, and construct maps and specification equations provide the tools we need for productively integrating data, instruments, and theory in a new art and science.

Rasch provides the “new form of computability” that is “both socially situated and abstract enough to travel across space and time,” as called for by Star and Ruhleder (1996).

SLIDE 30

Alliances and Translations for Coherence

(Adapted from Star & Griesemer, 1989, p. 390)

Metacommunicative (theory): Construct map and specification equation
Metalinguistic (instrument): Different Wright maps showing separate samples of students and items in same unit
Denotative (data): Unique kidmaps

Golinski (2012, p. 35):

"Practices of translation, replication, and metrology have taken the place of the universality that used to be assumed as an attribute of singular science."

SLIDE 31

Linguistic Complexity in Research & Practice

Level of Complexity | Bottom-Up Research | Top-Down Practice | Visual Display (Users)
Metacommunicative | Construct specification equations and explanatory theory | Metrological traceability to metric system standard units | Construct Map (Theoreticians)
Metalinguistic | Invariance scaling models | Applied research innovations and quality improvement applications | Wright Map (Psychometricians)
Denotative | Qualitative observations | Contextualized information supporting caring arts and sciences | KidMap (Teachers)

SLIDE 32

An English language reading measurement network

100+ English language reading tests across the world measure in a common unit.

Over 30 million student measures in the U.S. annually are interpreted relative to 250,000 book measures and 200 million article measures, where matching student and text measures predict a 75 percent comprehension rate.

Books, articles, assessments, and students have been brought into a common frame of reference in a process now almost 30 years old and still accelerating.

Text complexity corresponds with reading learning progressions, enabling individualized instruction.
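
The 75 percent figure can be illustrated with a Rasch-form logistic function. This is a sketch under stated assumptions: the offset is simply the logit, ln(3), that yields 0.75 when the student and text measures match, and `units_per_logit` is a hypothetical scale factor with a made-up value, not a published constant.

```python
import math

MATCH_OFFSET = math.log(3)  # logit offset that gives 0.75 when measures match


def forecast_comprehension(student_measure, text_measure, units_per_logit):
    """Rasch-form forecast of comprehension for a student-text pairing."""
    logit = (student_measure - text_measure) / units_per_logit + MATCH_OFFSET
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical measures and scale factor.
matched = forecast_comprehension(800, 800, units_per_logit=200)
easier_text = forecast_comprehension(800, 600, units_per_logit=200)
harder_text = forecast_comprehension(800, 1000, units_per_logit=200)
print(round(matched, 2))
print(harder_text < matched < easier_text)
```

Under this form, a text measured below the student forecasts comprehension above 75 percent and a text measured above the student forecasts less, which is what makes matching texts to students possible.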

SLIDE 33

Developmental Coherence

Fisher, W. P., Jr., & Stenner, A. J. (2016). Theory-based metrological traceability in education: A reading measurement network. Measurement, 92, 489-496

SLIDE 34

Coherence in reading measurement

Student measures are tracked over time and across grade levels, instantiating developmental coherence.

Teachers are able to compare learning outcomes within their own and across each other’s classes, realizing horizontal coherence.

State end-of-year or graduation tests report in the common unit, providing parents, students, teachers, principals, librarians, researchers, and the public with the vertical coherence needed for connecting classroom formative assessments with accountability standards.

SLIDE 35

Philosophically…

…we are taking up the problem of how to realize a full and non-contradictory integration of global human identity and unique human singularity.

“…a social ethic cannot spring from a system but from a paradox. It aims at two opposed things: human totality and human singularity. I want both.” (Ricoeur, 1974, p. 166)

Ricoeur, P. (1974). The project of a social ethic. In D. Stewart & J. Bien, (Eds.). Political and social essays (pp. 160-175). Athens, Ohio: Ohio University Press.

SLIDE 36

Towards a social ethic

A social ethic capable of integrating human totality and human singularity:

will emerge only from resolution of the paradox of inclusively addressing the needs of humanity as a whole

while also vigorously personalizing to the maximum relationships that tend to become anonymous and inhuman in the wake of the quest for a shared human identity.

SLIDE 37

Thank you!

William P. Fisher, Jr.
University of California, Berkeley
wfisher@berkeley.edu
