
SLIDE 1

USING CITATION-MAPPING TO ASSESS ECONOMIC MODELS OF SCIENCE

Mike Thicke
PhD, IHPST, University of Toronto (2016)
Bard College, Bard Prison Initiative
mikethicke@gmail.com

www.mikethicke.com

SLIDE 2

INTRODUCTION

Dissertation: Consequences of importing economic ideas and methods into philosophy of science.
Formal models of the division of cognitive labor in science substitute plausibility and robustness for empirical data.
Without empirical data to establish representational or predictive accuracy, only weak inferences about science can be drawn.
Citation analysis is one way to inform models with data.
Two examples from my project on the cognitive division of labor (CDL) in climate science.
Advantages and challenges of citation analysis.

SLIDE 3

TWO WAYS TO ASSESS MODELS WITH DATA

DATA → MODEL → DATA

Representational Accuracy (data to model), Predictive Accuracy (model to data)

SLIDE 4

WEISBERG: REPRESENTATIONAL ACCURACY

Volterra principle: “A general pesticide will increase abundance of prey and decrease abundance of predators.”
Data at the beginning: populations can be “described by coupled differential equations.”
Model explores consequences of that.
Robustness analysis at the end confirms results of model.

SLIDE 5

SCHELLING: PREDICTIVE ACCURACY

Racial segregation can result from “mild” racial preferences.
Individuals move if too many neighbours are of different race.
Plausibility at beginning, confirmed by data at end.

SLIDE 6

ASSESSMENT IN FORMAL MODELS OF SCIENCE

MODEL

Representational Accuracy Predictive Accuracy

DATA DATA

SLIDE 7

ASSESSMENT IN FORMAL MODELS OF SCIENCE

PLAUSIBILITY

MODEL

ROBUSTNESS

Representational Accuracy Predictive Accuracy

SLIDE 8

PLAUSIBILITY: THOMA ON WEISBERG & MULDOON

Weisberg and Muldoon: research communities composed of mavericks and followers.
Thoma: Implausible that anyone would employ follower strategy:
Scientists can easily learn about the success of nearby approaches without investigating themselves.
Why would anyone be motivated to duplicate work for no epistemic benefit?

SLIDE 9

ROBUSTNESS: WEISBERG & MULDOON ON KITCHER & STREVENS

Kitcher & Strevens: Self-interested scientists can achieve optimal divisions of labour between two research projects.
Weisberg and Muldoon: Result not robust to changes in scientists’ knowledge of each others’ work.
As radius of vision decreases, community diverges from optimal allocation.

SLIDE 10

WHY IS DATA IMPORTANT?

Robustness analysis epistemically significant only to the extent that the model is representationally accurate.
Plausibility only weakly establishes representational accuracy.
Plausibility epistemically significant only to the extent that the model is predictively accurate.
Robustness only weakly establishes predictive accuracy.
Even if plausibility+robustness are informative about target systems, impossible to establish magnitude of effects without data.
To make normative claims about scientific practice, need to establish magnitudes.

SLIDE 11

MY PROJECT: COGNITIVE DIVISION OF LABOR IN CLIMATE SCIENCE

SLIDE 12

SUNDBERG'S CLAIMS

Climate models are an obligatory passage point to climate policy.
Data flows from experiments to models through parameterizations.
Experimentalists often fail to translate their results into parameterizations that are useful to modelers.
Climate science faces a coordination problem.

Sundberg, “Parameterizations as Boundary Objects on the Climate Arena” (2007).

SLIDE 13

RESEARCH QUESTIONS

Is there really a coordination problem in climate science between modelers and experimentalists?

What is the magnitude of this problem?
If there is a problem, what is the cause?
Problem of education / communication?
Problem of incentives?

SLIDE 14

[Figure: citation links among three groups: CLIMATE MODEL (1636), AEROSOL (5687), PARAMETERIZATION (851)]

CITATION COUNTS: STANDARD DEVIATIONS ABOVE MEAN

SLIDE 15

[Figure labels: 1 SD, 6 SD, 571 citations, 240 citations]

PARAMETERIZATION→AEROSOL CITATIONS COMPARED TO PARAMETERIZATION→RANDOM CITATIONS
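The "standard deviations above mean" measure on these two slides can be illustrated with a minimal sketch. It assumes the comparison works by counting citations from one group to a target group and setting that count against the counts obtained for several random sets of papers; the slides do not spell out the procedure, so the function name `sd_above_mean` and every number below are hypothetical.

```python
from statistics import mean, stdev


def sd_above_mean(observed, baseline_counts):
    """Return how many sample standard deviations `observed` lies above the baseline mean."""
    mu = mean(baseline_counts)
    sigma = stdev(baseline_counts)
    if sigma == 0:
        raise ValueError("baseline counts have zero variance")
    return (observed - mu) / sigma


# Hypothetical baseline: citation counts from one group to five random paper sets.
baseline = [10, 12, 8, 11, 9]

# Hypothetical observed count of citations from the same group to the target group.
z = sd_above_mean(16, baseline)
print(f"{z:.2f} SD above the baseline mean")
```

A large positive value would suggest the two groups cite each other more than chance alone predicts, while a value near zero would suggest little coordination between them.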

SLIDE 16

MODELING THE CAUSE

Assume there is a coordination problem. What is the cause?
Observation: Citation counts follow power laws.
Hypothesis: Rational scientists seeking to maximize citations will target papers narrowly.
Paper quality is group-relative: widely targeted papers will have medium quality for many groups, while narrowly targeted papers will have high quality for one group and low quality for others.
Maximizing quality relative to one group at the expense of others will maximize total citations.
It is easier to target a paper narrowly at one’s own discipline.
Few papers will be targeted outside of home discipline.

SLIDE 17

PAPERS: 4997
MEAN: 5.6
MEDIAN: 3
PERCENTILES: 10%, 90% = 13, 99% = 41, 99.9% = 146

CITATIONS OF “AEROSOL” PAPERS

Very long tail
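Summary statistics of this kind (paper count, mean, median, upper percentiles) can be computed from a list of per-paper citation counts. The sketch below is illustrative only: the helper names are hypothetical, the nearest-rank percentile definition is an assumption rather than the method used for the slide, and `toy_counts` is made-up data, not the aerosol dataset.

```python
import math
from statistics import mean, median


def nearest_rank_percentile(sorted_counts, p):
    """Nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    rank = max(1, math.ceil(p * len(sorted_counts) / 100))
    return sorted_counts[rank - 1]


def citation_summary(counts, percentiles=(10, 90, 99, 99.9)):
    """Summarize a list of per-paper citation counts."""
    ordered = sorted(counts)
    summary = {
        "papers": len(ordered),
        "mean": mean(ordered),
        "median": median(ordered),
    }
    for p in percentiles:
        summary[f"{p}%"] = nearest_rank_percentile(ordered, p)
    return summary


# Made-up counts with one heavily cited paper, to mimic a long tail.
toy_counts = [0, 1, 1, 2, 3, 3, 5, 8, 13, 120]
summary = citation_summary(toy_counts)
print(summary)
```

In a long-tailed distribution the mean sits well above the median, which is the pattern the slide reports.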

SLIDE 18

A SIMPLE MODEL

$Q, Q_\omega, Q_\psi \in (0, 1)$: quality, internal quality, external quality

$q_{\omega,i} = q_i^{a}$

$A \in \{1/5, 1/4, 1/3, 1/2, 1, 2, 3, 4, 5\}$: degree of specialization (1/5 and 5 are high)

$q_{\psi,i} = q_i^{1/a}$

specializing trades off between internal and external quality

$C, C_\omega, C_\psi$: total, internal, and external citation counts

$c_i = c_{\omega,i} + c_{\psi,i}$

𝜇, 𝜆 parameters of Pareto (long-tailed) distribution. Total citations is sum of internal and external citations.

$c_{\omega,i} = \lambda (1 - q_{\omega,i})^{-1/\kappa} - 1$

$c_{\psi,i} = \lambda (1 - q_{\psi,i})^{-1/\kappa} - 1$
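A minimal simulation sketch of this model follows. The expression λ(1 − q)^(−1/κ) is the inverse CDF of a Pareto distribution with scale λ and shape κ, so uniformly distributed quality yields long-tailed citation counts. Several things here are assumptions not given on the slides: the parameter values `lam=1.0` and `kappa=1.5`, quality drawn uniformly at random, and the three specialization strategies (none, random, extreme), which are only one possible reading of how authors choose `a`. All function names are hypothetical.

```python
import random

# Specialization levels from the slide; a = 1 means no specialization.
SPECIALIZATIONS = [1/5, 1/4, 1/3, 1/2, 1, 2, 3, 4, 5]


def audience_citations(quality, lam, kappa):
    """Citations from one audience: lam * (1 - quality) ** (-1 / kappa) - 1."""
    return lam * (1.0 - quality) ** (-1.0 / kappa) - 1.0


def total_citations(q, a, lam=1.0, kappa=1.5):
    """Total citations: internal plus external, with q_internal = q**a and q_external = q**(1/a)."""
    q_internal = q ** a
    q_external = q ** (1.0 / a)
    return (audience_citations(q_internal, lam, kappa)
            + audience_citations(q_external, lam, kappa))


def simulate(n_papers, choose_a, seed=0):
    """Draw uniform quality per paper, pick a via choose_a(rng), return sorted totals."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_papers):
        # Keep q strictly below 1 so that (1 - quality) never reaches zero.
        q = min(rng.random(), 1.0 - 1e-9)
        totals.append(total_citations(q, choose_a(rng)))
    return sorted(totals)


# Assumed strategies: no specialization, random specialization, extreme specialization.
no_specialization = simulate(10_000, lambda rng: 1)
random_a = simulate(10_000, lambda rng: rng.choice(SPECIALIZATIONS))
extreme_a = simulate(10_000, lambda rng: rng.choice([1/5, 5]))

for name, totals in [("no specialization", no_specialization),
                     ("random a", random_a),
                     ("extreme a", extreme_a)]:
    print(name, "median citations:", round(totals[len(totals) // 2], 2))
```

Under these assumed parameters, a paper of middling quality (q = 0.5) collects more total citations when it specializes heavily in either direction than when it does not specialize, which is the trade-off the hypothesis relies on.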

SLIDE 19

PERCENTILE: 10%, 50%, 90%, 99%
UNIFORM Q: 3, 16, 38
RANDOM A: 1, 5, 17, 41
EXTREME A: 2, 6, 18, 41

SLIDE 20

OTHER POSSIBLE MODELS

Alternative causes (e.g., making papers useful to wider audiences takes more time).
Alternative models of specialization.
Agent-based simulations (papers accrue citations through time, papers take time to produce, authors have varying utility functions, authors have varying talent, authors discover papers through previous citation, adjustable reward structure).

SLIDE 21

CITATIONS AS DATA: ADVANTAGES

Can parameterize/fit models with empirical data.
Can test model predictions against empirical data.
Can measure effect sizes.

SLIDE 22

CITATIONS AS DATA: CHALLENGES

Time consuming.
Long execution times.
Data access can be difficult.
Never get full coverage.
Even with good datasets (e.g., Web of Science), tracking citations can be difficult.
Messy data.
Limited range of questions that can be answered.
Don’t have access to counterfactual world (hard to use data at both ends of model).

SLIDE 23

A WEB OF SCIENCE RECORD

SLIDE 24

SLIDE 25

COUNTERFACTUALS

Model requires specifying 𝜇, 𝜆 parameters for each distribution.
Currently based on real data.
Alternatively, use regression.
Can’t double-dip: compare predictions with same data used to parameterize model.
How to assess predictive accuracy?
Need data other than citations at one end or the other, or substitute plausibility / robustness.

$c_{\omega,i} = \lambda (1 - q_{\omega,i})^{-1/\kappa} - 1$

$c_{\psi,i} = \lambda (1 - q_{\psi,i})^{-1/\kappa} - 1$

SLIDE 26

REFERENCES

Weisberg, Michael. “Robustness Analysis.” Philosophy of Science (2006).

Thoma, Johanna. “The Epistemic Division of Labor Revisited.” Philosophy of Science (2015).

Weisberg, Michael, and Ryan Muldoon. “Epistemic Landscapes and the Division of Cognitive Labor.” Philosophy of Science (2009).

Muldoon, Ryan, and Michael Weisberg. “Robustness and Idealization in Models of Cognitive Labor.” Synthese (2010).

Sundberg, Mikaela. “Parameterizations as