Tracking science in real-time from large-scale usage data

SLIDE 1

Introduction MESUR Mapping Science Metrics Survey Discussion

Tracking science in real-time from large-scale usage data

Johan Bollen - jbollen@indiana.edu

Indiana University
School of Informatics and Computing
Center for Complex Networks and Systems Research
Cognitive Science Program

July 29, 2010

Johan Bollen - jbollen@indiana.edu Tracking science in real-time from large-scale usage data

SLIDE 2

Outline

1 Introduction
    Problem statement
    Usage data
2 MESUR
    MESUR overview
    Creating MESUR’s reference data set
3 Mapping Science
4 Metrics Survey
5 Discussion
    Overview
    Future Research
    Relevant papers

SLIDE 3

Why study science?

Tremendous importance
    Enormous amount of resources and people involved
    Allocation of resources: funding agencies, policy makers, the public, corporations, the scientists themselves
    Social systems research: ability to learn about general properties of similar social systems

[Image: unrelated picture of my daughter]

SLIDE 4

Science as a social system

Actors: scientists, stakeholders
Relations (often focused on exchange of knowledge)
    Informal interactions: meetings, workshops, conversations, messages, etc.
    Formal: affiliations, collaborations, projects, publications
Artifacts: articles, journals, reports, data, software

Xiaoming Liu, Johan Bollen, Michael L. Nelson, and Herbert Van de Sompel. Co-authorship networks in the digital library research community. Information Processing and Management, 41(6):1462-1480, December 2005

SLIDE 7

The measure of science in the print era

A word on the traditional way

Gold standard: citation data
Extracted from “publication” data.
From print comes citation:
    Science is a gift economy
    Steal text, but not ideas.
    Currency is acknowledgement of influence: citations
    More citations, more influence
    Track scientific activity from citation data
Citations are derived from peer-reviewed publications

SLIDE 10

From this springs an entire industry

Applications of citation data
End-user services and bibliometrics
Web of Science, Scopus, publishers, institutions, etc.

Citation-based services
    Reference linking
    Variety of end-user services
Citation-based assessment and bibliometrics
    Impact metrics
    Analytics
    Mapping

SLIDE 11

Two problems with present citation-based approach

Data (1) and metrics (2)

Issues:

1 Inherent features of citation data due to its origins in published material
2 Existing metrics fail to acknowledge (1), and ignore network properties of science as a social system.

SLIDE 12

Citation data

Delays and partiality

Citation problem 1: Data
Citation data is a late and partial indicator of scientific activity

[Figure: timeline from discovery through research and publication to citation, with time marks from 0 to +3.5 years; citation data is marked with "?"]

Publication delays
    $\text{publication}_t \to \text{publication}_{t-1} \to$ citation DB
Domain-dependent practices
Community: publishing authors only

SLIDE 13

Citation data

Metrics vs. networks

Citation problem 2: networks
Ignoring network properties of science

[Figure: journal citation network map. Region labels as extracted: Livestock Food Chemistry Material Sciences Physics Computer Science Geology Sports Medicine ObGyn Medicine Resource Management Dermatology. Individual journal labels are not recoverable. Annotations: "bad", "good".]

Measuring impact, influence, prestige from citation data:
    More citations are better than fewer citations
    In the citation network, a central position is better than the outskirts.

SLIDE 16

Citation problem no. 2

Metrics vs. networks

This approach is epitomized in the Thomson-Reuters Impact Factor:

Standard citation graph representation
    $G = (V, E, W)$
    $E \subseteq V^2$
    $W : E \to \mathbb{N}^+$
However: $\forall (n_i, n_j) \in E : \mathrm{pubtime}(n_i) > \mathrm{pubtime}(n_j)$
Impact Factor: normalized in-degree
    $IF_j = \frac{\sum_i w_{ij}}{N_j}$
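The normalized in-degree definition can be illustrated with a minimal Python sketch. It assumes citations arrive as (citing, cited, weight) triples and that N_j is supplied as a count of citable items per journal; the function name, the toy data, and the omission of any citation time window are assumptions made for illustration, not details from the talk.

```python
from collections import defaultdict


def impact_factors(citations, n_items):
    """Return {journal: summed incoming citation weight / N_j}.

    citations: iterable of (citing, cited, weight) edges, i.e. w_ij.
    n_items:   {journal: N_j}, the number of citable items per journal.
    """
    in_weight = defaultdict(int)
    for _citing, cited, weight in citations:
        in_weight[cited] += weight  # accumulate sum_i w_ij for journal j
    # Journals with N_j == 0 are skipped to avoid division by zero.
    return {j: in_weight[j] / n for j, n in n_items.items() if n > 0}


# Hypothetical toy data, not taken from the talk.
edges = [("A", "B", 30), ("C", "B", 10), ("B", "A", 5)]
published = {"A": 10, "B": 20, "C": 5}
scores = impact_factors(edges, published)
print(scores)
```

Note that the score only counts incoming weight: where a citation comes from plays no role, which is the limitation the following slides discuss.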

SLIDE 20

Citation problem no. 2

Between Big Star and Britney Spears

Is more always better? Why context matters

Britney Spears
    Sold 50M records
    Influenced who?
    High popularity, low prestige.
Big Star
    Sold 50k records
    Influenced REM, B52s, Teenage Fanclub, etc.
    Low popularity, high influence

SLIDE 21

Citation problem no. 2

Between Big Star and Britney Spears

Silly? That’s how we evaluate scientific impact right now!

The Impact Factor and lots of other citation-based metrics are presently being used to assess the quality of publications, journals, authors, institutions, and even entire countries by proxy.

Note: Lots of developments on better metrics (cf. Eigenfactor: Bergstrom and Rosvall), but still reliance on citation data.

SLIDE 22

New developments

PageRank for citation graphs

[Figure: example citation graph with nodes v1, v2, v3 and numbered nodes 1 to 4, weighted edges, and unknown values marked "?"]

Impact Factor: $IF_j = \frac{\sum_i w_{ij}}{N_j}$: normalized citation count
    origin of citation disregarded

A different route:
    $IF(v_i) \simeq \lambda \sum_j IF_j$
    $IF(v_i) \simeq \lambda \sum_j IF_j \times \frac{1}{O(v_j)}$
    $PR(v_i) \simeq \lambda \sum_j PR(v_j) \times \frac{1}{O(v_j)}$
    $PR(v_i) \simeq \frac{1-\lambda}{N} + \lambda \sum_j PR(v_j) \times \frac{1}{O(v_j)}$
    $PR_w(v_i) = \frac{1-\lambda}{N} + \lambda \sum_j PR_w(v_j) \times w(v_j, v_i)$
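The weighted PageRank recursion can be sketched as a simple power iteration. This is an illustration under stated assumptions rather than the implementation behind the talk: w(v_j, v_i) is taken to be the share of journal j's outgoing citation weight that goes to journal i, λ defaults to 0.85, journals without outgoing citations spread their rank evenly, and the function name and toy graph are hypothetical.

```python
def weighted_pagerank(weights, lam=0.85, iterations=100):
    """Power iteration for PR_w(v_i) = (1 - lam)/N + lam * sum_j PR_w(v_j) * w(v_j, v_i).

    weights[j][i] is the raw citation count from journal j to journal i.
    w(v_j, v_i) is assumed to be that count divided by j's total outgoing count.
    """
    nodes = sorted(set(weights) | {i for out in weights.values() for i in out})
    n = len(nodes)
    out_total = {j: sum(weights.get(j, {}).values()) for j in nodes}
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        nxt = {v: (1.0 - lam) / n for v in nodes}
        for j in nodes:
            if out_total[j] == 0:
                # Assumption: a journal that cites nothing spreads its rank evenly.
                for v in nodes:
                    nxt[v] += lam * pr[j] / n
            else:
                for i, count in weights[j].items():
                    nxt[i] += lam * pr[j] * count / out_total[j]
        pr = nxt
    return pr


# Hypothetical three-journal citation graph.
toy = {"v1": {"v2": 15, "v3": 5}, "v2": {"v3": 30}, "v3": {"v1": 20}}
ranks = weighted_pagerank(toy)
print(sorted(ranks, key=ranks.get, reverse=True))
```

Unlike the Impact Factor, the score of a journal here depends on the scores of the journals citing it, so the origin of a citation matters.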

SLIDE 23

New developments, contd

PageRank for citation graphs

        ISI IF                            PRw
rank    value    Journal                  value (x 10^3)    Journal
1       52.28    ANNU REV IMMUNOL         16.78             NATURE
2       37.65    ANNU REV BIOCHEM         16.39             J BIOL CHEM
3       36.83    PHYSIOL REV              16.38             SCIENCE
4       35.04    NAT REV MOL CELL BIO     14.49             P NATL ACAD SCI USA
5       34.83    NEW ENGL J MED           8.41              PHYS REV LETT
6       30.98    NATURE                   5.76              CELL
7       30.55    NAT MED                  5.70              NEW ENGL J MED
8       29.78    SCIENCE                  4.67              J AM CHEM SOC
9       28.18    NAT IMMUNOL              4.46              J IMMUNOL
10      28.17    REV MOD PHYS             4.28              APPL PHYS LETT

Table: The highest ranking journals according to ISI IF and Weighted PageRank (JCR2003)

SLIDE 25

Post-print, online era, also known as “today”: usage data?

Most use of scholarly resources is now mediated by online services

[Figure: timeline from discovery through research and publication to citation, with time marks from 0 to +3.5 years. Indicators shown, from early to late: usage data, search queries, text data, citation data.]

Recorded by nearly every scholarly service
Captures early phases of scientific activity
Includes a wide variety of resource types
Activities of larger scientific communities, e.g. not limited only to authors
Scale: cf. Elsevier announced 1B downloads in 2006 vs. 650M citations in WoS.

SLIDE 26

Issues with usage data

Lots of interest in leveraging usage data for models of science, tracking scientific activity, metrics, etc. Cf. COUNTER, Ex Libris bX, and many more.

However, many different issues:
    Silo: recorded for a particular service for a particular user community
    Lack of standards: recorded in different manners, for different systems
    Lack of research: how to leverage usage data for scientific tracking, modeling, metrics, etc.

SLIDE 30

MESUR project: survey the potential of usage data at very large-scale

Studying scientific activity
    Can we model and study patterns of scientific activity from large-scale, representative usage data?
    What can we learn about impact and structure from patterns of scientific activity from usage data?

SLIDE 36

MESUR history

2006-2008: Andrew W. Mellon Foundation
2008-2009: Los Alamos National Laboratory
2009-: Indiana University, School of Informatics and Computing
2009-2013: National Science Foundation
Team: PI, co-PI, 2 scientists, 2 full-time developers, and 1 PhD student.

SLIDE 37

MESUR objective

SLIDE 38

MESUR dataflow

[Figure: MESUR dataflow. Elements: data providers (publishers, aggregators, institutions, ...); usage data; ingestion (Parser 1, Parser 2, Parser 3); filtering & deduplication; MESUR ontology; MESUR reference data set; models; impact metrics.]

SLIDE 39

Creating MESUR’s data set

[Figure: MESUR dataflow, repeated from the previous slide]

[Chart: number of usage events by date, 2004 to 2009; vertical axis from 0.0e+00 to 1.5e+07]

Providers 2006-2010: BMC, Blackwell, UC, CSU (23), EBSCO, ELSEVIER, EMERALD, INGENTA, JSTOR, LANL, MIMAS/ZETOC, THOMSON, UPENN (9), UTEXAS
Events: 1,000,000,000 usage events
Citations: +500,000,000 citations
Articles and serials: +50M articles, +-100,000 serials

SLIDE 40

Data requirements

Minimum requirements
    Separate user requests
    Date-time stamp down to second
    Document identifier or sufficient metadata to de-duplicate
    Session ID (or anonymized user ID)
    Request type identifier

http://tweety.lanl.gov/public/schemas/2007-01/mesur.owl
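As a rough illustration of these minimum requirements, the sketch below models one usage event as a small Python record and rejects rows that lack a required field. The class name, the input keys ('session', 'time', 'doc', 'type'), and the timestamp format are all hypothetical; the actual MESUR schema is the OWL ontology linked above, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass
from datetime import datetime

REQUIRED_KEYS = ("session", "time", "doc", "type")  # hypothetical input keys


@dataclass(frozen=True)
class UsageEvent:
    session_id: str      # session ID or anonymized user ID
    timestamp: datetime  # date-time stamp, resolved to the second
    document_id: str     # identifier usable for de-duplication
    request_type: str    # e.g. abstract view or full-text download


def parse_event(row):
    """Build a UsageEvent from a dict, or raise ValueError if a field is missing."""
    missing = [key for key in REQUIRED_KEYS if not row.get(key)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    stamp = datetime.strptime(row["time"], "%Y-%m-%d %H:%M:%S")
    return UsageEvent(row["session"], stamp, row["doc"], row["type"])


# Hypothetical log row.
event = parse_event(
    {"session": "s1", "time": "2006-03-01 12:30:07", "doc": "doc-42", "type": "fulltext"}
)
print(event)
```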

SLIDE 41

MESUR’s OWL/RDF ontology (1)

[Figure: MESUR’s OWL/RDF ontology]

(1) Rodriguez, Bollen & Van de Sompel. A Practical Ontology for the Large-Scale Modeling of Scholarly Artifacts and their Usage. JCDL07 - Based on OntologyX work.

SLIDE 46

Reference data set

Subsetting
    Common time period: March 1st 2006 - February 1st 2007
    Providers: Thomson Scientific (Web of Science), Elsevier (Scopus), JSTOR, Ingenta, University of Texas (9 campuses, 6 health institutions), and California State University (23 campuses)
    Scale: 346,312,045 usage events, 97,532 serials (many of which are not journals)

Domain             Usage    UC Degrees    JCR
Natural Science    37%      39%           92.8%
Social Sciences    45%      46%           7.2%
Humanities         14%      15%

Source: http://www.ucop.edu/ucophome/uwnews/stat/statsum/fall2007/statsumm2007.pdf (table 9)

SLIDE 47

Article clickstreams

usage data log: $U = \{u_1, u_2, \cdots, u_n\}$
$u = \{s, t, a\}$, where $s$ = session identifier, $t$ = a date-time and $a$ = article
$f \in F$ = clickstreams extracted from $U$
$f \subset U$, where $f = (\forall u \in U, \exists s : s(u) \wedge t(u_i) < t(u_{i+1}))$
$s(u)$ and $t(u)$: session identifier and date-time of interaction $u$.

[Figure: user interactions i1 to i4 over time, grouped into session 1 and session 2, mapped to online articles a1 to a4 and their journals v1 to v3, yielding clickstream c1 and clickstream c2]
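The clickstream definition amounts to grouping interactions by session and ordering each group by time. A minimal sketch, assuming each log entry is a (session, timestamp, article) tuple with comparable timestamps; the function name and the toy log are hypothetical.

```python
from collections import defaultdict


def extract_clickstreams(log):
    """Group interactions u = (s, t, a) by session s and order them by time t.

    Returns {session: [article, article, ...]} with articles in request order.
    """
    by_session = defaultdict(list)
    for session, timestamp, article in log:
        by_session[session].append((timestamp, article))
    return {
        session: [article for _, article in sorted(events, key=lambda e: e[0])]
        for session, events in by_session.items()
    }


# Hypothetical log; integer timestamps stand in for date-times.
log = [
    ("s1", 3, "a4"),
    ("s1", 1, "a1"),
    ("s2", 2, "a1"),
    ("s1", 2, "a2"),
    ("s2", 5, "a4"),
]
streams = extract_clickstreams(log)
print(streams)
```

Replacing each article by the journal it appeared in turns an article clickstream into a journal clickstream.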

SLIDE 48

Journal Clickstreams

article clickstream: $f_a = (a_1, a_2, \cdots, a_k)$
journal clickstream: $f_v = (v_1, v_2, \cdots, v_k)$
We observe: $N(v_i, v_j)$ for all pairs $(v_i, v_j)$ in which $j = i + 1$
From which follows: $P(v_i, v_j) = \frac{N(v_i, v_j)}{\sum_j N(v_i, v_j)}$
and $M$ whose entries $m_{i,j} = P(v_i, v_j)$.

[Figure: example with journals v1 to v4; edge counts 15, 5, 30 normalize to transition probabilities 0.3, 0.1, 0.6]
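The normalization from counts N(v_i, v_j) to probabilities P(v_i, v_j) can be sketched as follows. The function name is hypothetical, and the example assumes that the three counts in the figure (15, 5, 30) all leave journal v1; which target journal receives which count is a guess made only so the example has concrete edges.

```python
from collections import Counter, defaultdict


def transition_probabilities(clickstreams):
    """Return {(v_i, v_j): P(v_i, v_j)} from journal clickstreams.

    N(v_i, v_j) counts how often v_j directly follows v_i in a clickstream;
    each count is divided by the total number of transitions leaving v_i.
    """
    counts = Counter()
    for stream in clickstreams:
        for vi, vj in zip(stream, stream[1:]):  # consecutive pairs, j = i + 1
            counts[(vi, vj)] += 1
    row_totals = defaultdict(int)
    for (vi, _vj), n in counts.items():
        row_totals[vi] += n
    return {(vi, vj): n / row_totals[vi] for (vi, vj), n in counts.items()}


# Assumed assignment of the figure's counts to edges leaving v1.
example = [["v1", "v2"]] * 15 + [["v1", "v3"]] * 5 + [["v1", "v4"]] * 30
probs = transition_probabilities(example)
print(probs)
```

Each row of the resulting matrix M sums to 1, so M can be read as a first-order Markov chain over journals.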

SLIDE 49

Examples of prominent connections

Columns: v_j, p(v_i, v_j), N(v_i, v_j). Each group is headed by v_i and N(v_i).

v_i: American Journal of International Law, N(v_i) = 448,034
    International Organization                     0.0207    9,292
    International Affairs                          0.0184    8,254
    International and Comparative Law Quarterly    0.0171    7,654
    Foreign Policy                                 0.0167    7,500
    American Political Science Association         0.0140    6,291

v_i: Journal of Educational Sociology, N(v_i) = 83,419
    American Journal of Sociology                  0.0334    2,790
    Journal of Higher Education                    0.0303    2,529
    Journal of Negro Education                     0.0286    2,389
    American Sociological Review                   0.0276    2,303
    Social Forces                                  0.0249    2,076

v_i: Surface Science, N(v_i) = 36,282
    Physical Review B                              0.0704    2,555
    Applied Surface Science                        0.0341    1,239
    Physical Review Letters                        0.0339    1,230
    Journal of Chemical Physics                    0.0333    1,207
    Applied Physics Letters                        0.0327    1,188

v_i: Journal of Organic Chemistry, N(v_i) = 47,439
    Journal of the American Chemical Society       0.0873    4,141
    Tetrahedron Letters                            0.0865    4,105
    Tetrahedron                                    0.0602    2,857
    Organic Letters                                0.0532    2,526
    Angewandte Chemie                              0.0305    1,448

v_i: Ecological Applications, N(v_i) = 141,481
    Ecology                                        0.0965    13,659
    Conservation Biology                           0.0524    7,408
    Bioscience                                     0.0215    3,043
    Annual Review of Ecology and Systematics       0.0215    3,043
    Clinical and Experimental Allergy              0.0191    2,699

v_i: Annals of Mathematics, N(v_i) = 76,526
    American Journal of Mathematics                0.0705    5,392
    American Mathematical Monthly                  0.0579    4,432
    PNAS                                           0.0156    1,195
    Econometrica                                   0.0082    624
    Mathematics Magazine                           0.0077    587

SLIDE 50

Network parameters

                                                  Network matrix
Parameter                                         M            M′
Journals                                          97,532       2,307
Edges                                             6,783,552    50,000
Matrix density                                    0.071%       0.939%
Strongly Connected Components (SCC)               16,474       236
Journals in SCC                                   80,934       1,944
Average journal clustering coefficient (SCC)      0.285        0.514
Diameter of largest SCC                           37           14
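The matrix density row can be checked with a short calculation. The sketch assumes density is defined as the number of edges divided by the number of cells in the square journal-by-journal matrix (directed edges, diagonal included); the slides do not state the definition, but this one appears to reproduce both reported percentages.

```python
def matrix_density(n_edges, n_journals):
    """Fraction of non-zero cells in an n_journals x n_journals matrix.

    Assumption: directed edges, diagonal cells counted as possible entries.
    """
    return n_edges / (n_journals ** 2)


density_m = matrix_density(6_783_552, 97_532)    # full matrix M
density_m_prime = matrix_density(50_000, 2_307)  # pruned matrix M'
print(f"M:  {density_m:.3%}")
print(f"M': {density_m_prime:.3%}")
```

Pruning to the most reliable edges removes most journals and edges, yet the remaining matrix is more than ten times denser.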

slide-51
SLIDE 51

Introduction MESUR Mapping Science Metrics Survey Discussion

[Figure: data-processing pipeline from log data to the map and rankings. Labels: MESUR log data, 1B interactions; common date range, 346M interactions; matrix M, all edges, 97k journals, 6.7M edges; matrix M′, most reliable edges, 2,307 journals, 50k edges; outputs: map, rankings. Inset plot: N(vi, vj) against edge rank.]

Layout: Fruchterman and Reingold (1991). Graph Drawing by Force-directed Placement. Software: Practice and Experience, 21(11):1129-1164.

Classification code: tree with the labels root, Dewey, JCR, AAT taxonomy (levels 1 to 5) and Journals.

SLIDE 52

[Figure: clickstream map of science. Cluster labels: Applied physics, Organic chemistry, Analytical Chemistry, Biochemistry, Social and personality psychology, Polymers, Microbiology, Biotechnology, Music, Plant biology, Biodiversity, Hydrology, Genetics, Brain research, Geology, Agriculture, Alternative energy, Toxicology, Plant agriculture, Animal Behavior, Environmental Science, Chemical Engineering, Tourism, Nutrition, Neurology, Sports medicine, Classical studies, Nursing, Psychology, Geography, Social work, Dermatology, Clinical Pharmacology, International studies, Statistics, Archeology, Demographics, Economics, Child Psychology, Human geography, Production research, Manufacturing, Material science, Engineering, Clinical trials, Education, Anthropology, Public health, Law, Sociology, Philosophy, Asian studies, Religion, Mineralogy, Acoustics, Statistical physics, Thermodynamics, Physical chemistry, Pharmaceutical research, Electrochemistry, Ecology, Language, Cognitive Science, Brain studies, Soil/Marine biology, Physiology, Plant genetics, Architecture, Design.]

SLIDE 53

The Art & Architecture Thesaurus (Getty Research Institute)

Table: Distance from AAT root (α) and number of classifications Nc at that level. Each increase in α produces a finer-grained separation of scientific disciplines.

Distance (α) | Nc | Example classifications
1 | 4 | Natural sciences, social sciences, humanities, ...
2 | 8 | Biology, chemistry, physics, ...
3 | 31 | Classics, communication, engineering, ...
4 | 195 | Allergy, anesthesiology, applied linguistics, ...

SLIDE 54

[Figure: classification tree with the labels root, Dewey, JCR, AAT taxonomy (levels 1 to 5) and Journals.]

SLIDE 55

Cross-validating the clickstream map

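\[
f(v_i, v_j, \alpha) =
\begin{cases}
1 & \text{if } C_\alpha(v_i) = C_\alpha(v_j) \\
0 & \text{if } C_\alpha(v_i) \neq C_\alpha(v_j)
\end{cases}
\]

Log data: P(v_i, v_j) gives the matrix

\[
M = \begin{pmatrix} p_{0,0} & \cdots & p_{0,n} \\ \vdots & \ddots & \vdots \\ p_{n,0} & \cdots & p_{n,n} \end{pmatrix},
\qquad F_1:\; N_{i,j} > \mu_{0.5}(N_k) \;\text{ versus }\; N_{i,j} \le \mu_{0.5}(N_k)
\]

AAT: f(v_i, v_j, α) gives the matrix

\[
A_\alpha = \begin{pmatrix} a_{0,0} & \cdots & a_{0,n} \\ \vdots & \ddots & \vdots \\ a_{n,0} & \cdots & a_{n,n} \end{pmatrix},
\qquad F_2:\; a_{i,j} = 0 \;\text{ versus }\; a_{i,j} = 1
\]

F_1 and F_2 form the contingency table at α, which is tested with χ².

α = 1: p < 0.0001
α = 2: p < 0.0001
α = 3: p < 0.0001
α = 4: p < 0.0001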
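
The slide reports only the p-values of the χ² test. As a minimal sketch of the kind of test involved, the code below computes Pearson's χ² statistic for a 2 × 2 contingency table whose rows stand for F1 (edge count above or not above the threshold) and whose columns stand for F2 (same or different AAT classification). The cell counts are hypothetical and are not the MESUR data; the p-value uses the identity that, for one degree of freedom, the χ² survival function equals erfc(sqrt(χ²/2)).

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    Returns (statistic, p_value) with one degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # For df = 1, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts: rows = F1 (strong edge, weak edge),
# columns = F2 (same classification, different classification).
toy_table = [[30, 20], [10, 40]]
chi2, p = chi_square_2x2(toy_table)
print(f"chi2 = {chi2:.3f}, p = {p:.6f}")
```

One table of this shape would be built per value of α, since the classification function f depends on the AAT level.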

SLIDE 56

Betweenness centrality

\[
C_b(v_k) = \sum_{i \neq j \neq k} \frac{\sigma_{i,j}(v_k)}{\sigma_{i,j}} \tag{1}
\]

Table: Ranking of journals from M′ according to betweenness centrality.

Rank | Journal | Top-level AAT classification
1 | Science | Natural Sciences
2 | Proceedings of the National Academy of Sciences | Natural Sciences
3 | Environmental Health Perspectives | Natural Sciences
4 | Chemosphere | Natural Sciences
5 | Journal of Advanced Nursing | Natural Sciences
6 | Nature | Natural Sciences
7 | Ecology | Natural Sciences
8 | Milbank Quarterly | Natural Sciences
9 | Applied and Environmental Microbiology | Natural Sciences
10 | Child Development | Social Sciences
11 | Behavioral Ecology and Sociobiology | Social Sciences
12 | Journal of Colloid and Interface Science | Natural Sciences
13 | American Anthropologist | Social Sciences
14 | Journal of Biogeography | Natural Sciences
15 | Materials Science and Technology | Natural Sciences
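
Equation (1) counts, for every ordered pair of other nodes, the fraction of shortest paths that pass through v_k. A minimal sketch of that computation follows, using Brandes' algorithm on an unweighted directed graph. The slides do not say which algorithm or software MESUR used, so this is an illustration only, and the five-node graph is hypothetical.

```python
from collections import deque


def betweenness(graph):
    """Unnormalized betweenness for an unweighted directed graph.

    graph maps every node to a list of its successors (all nodes must be keys).
    """
    scores = {v: 0.0 for v in graph}
    for source in graph:
        order = []                               # nodes in order of discovery
        predecessors = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}            # number of shortest paths
        dist = {v: -1 for v in graph}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:                             # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    predecessors[w].append(v)
        delta = {v: 0.0 for v in graph}          # dependency accumulation
        while order:
            w = order.pop()
            for v in predecessors[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                scores[w] += delta[w]
    return scores


# Hypothetical toy graph: two journals feed a hub, which feeds two others.
toy_graph = {
    "in_a": ["hub"],
    "in_b": ["hub"],
    "hub": ["out_a", "out_b"],
    "out_a": [],
    "out_b": [],
}
scores = betweenness(toy_graph)
print(scores)
```

In the toy graph every shortest path between an "in" node and an "out" node crosses the hub, which is the bridging role that betweenness is meant to capture.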

SLIDE 57

PageRank

\[
PR(v_i) = \frac{1 - \lambda}{N} + \lambda \sum_{j} \frac{PR(v_j)}{O(v_j)} \tag{2}
\]

Table: Ranking of journals from M′ according to PageRank (λ = 0.85).

Rank | Journal | Top-level AAT classification
1 | Applied Physics Letters | Natural Sciences
2 | Journal of Advanced Nursing | Natural Sciences
3 | Journal of the American Chemical Society | Natural Sciences
4 | Ecology | Natural Sciences
5 | Nature | Natural Sciences
6 | Physical Review B | Natural Sciences
7 | Journal of Applied Physics | Natural Sciences
8 | American Economic Review | Social Sciences
9 | American Historical Review | Social Sciences
10 | Physical Review Letters | Natural Sciences
11 | Science | Natural Sciences
12 | Langmuir | Natural Sciences
13 | Journal of Chemical Physics | Natural Sciences
14 | American Anthropologist | Social Sciences
15 | Annals of the American Academy of Political and Social Science | Social Sciences
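
Equation (2) can be solved by repeated substitution (power iteration). The sketch below assumes that the sum runs over the journals v_j that link to v_i, that O(v_j) is the out-degree of v_j, and that every node has at least one out-link. The three-node graph is hypothetical; λ = 0.85 matches the table caption.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Power iteration for PR(vi) = (1 - d)/N + d * sum_j PR(vj) / O(vj).

    out_links maps each node to the list of nodes it links to.
    Assumes every node has at least one out-link (no dangling nodes).
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for vj, targets in out_links.items():
            share = damping * rank[vj] / len(targets)  # PR(vj) / O(vj)
            for vi in targets:
                new_rank[vi] += share
        rank = new_rank
    return rank


# Hypothetical toy graph, not MESUR data.
toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy_links)
for node, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{node}: {value:.4f}")
```

Node C receives links from both other nodes and ends up ranked first, while B, which is reached only through half of A's out-links, ranks last.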

SLIDE 58

4 Metrics Survey

SLIDE 59

Metrics

ID | Type | Measure | Source | Network parameters | PC1 | PC2 | ρ̄
1 | Citation | Scimago Journal Rank | Scimago/Scopus | | -0.974 | -8.296 | 0.556⋆
2 | Citation | Immediacy Index | JCR 2007 | | 1.659 | -7.046 | 0.508⋆
3 | Citation | Closeness centrality | JCR 2007 | Undirected, weighted | 0.339 | -6.284 | 0.565⋆
4 | Citation | Cites per doc | Scimago/Scopus | | -1.311 | -6.192 | 0.588⋆
5 | Citation | Journal Impact Factor | JCR 2007 | | -1.854 | -5.937 | 0.592⋆
6 | Citation | Closeness centrality | JCR 2007 | Undirected, unweighted | -1.388 | -4.827 | 0.619
7 | Citation | Out-degree centrality | JCR 2007 | Directed, weighted | -3.191 | -4.215 | 0.642
8 | Citation | Out-degree centrality | JCR 2007 | Directed, unweighted | -2.703 | -4.015 | 0.640
9 | Citation | Degree centrality | JCR 2007 | Undirected, weighted | -4.850 | -2.834 | 0.690
10 | Citation | Degree centrality | JCR 2007 | Undirected, unweighted | -4.398 | -2.643 | 0.691
11 | Citation | H-Index | Scimago/Scopus | | -3.326 | -2.003 | 0.681
12 | Citation | Scimago Total cites | Scimago/Scopus | | -4.926 | -1.722 | 0.712
13 | Citation | Journal Cite Probability | JCR 2007 | | -5.389 | -1.647 | 0.710
14 | Citation | In-degree centrality | JCR 2007 | Directed, unweighted | -5.302 | -1.429 | 0.717
15 | Citation | In-degree centrality | JCR 2007 | Directed, weighted | -5.380 | -1.554 | 0.712
16 | Citation | PageRank | JCR 2007 | Directed, unweighted | -4.476 | 0.108 | 0.693
17 | Citation | PageRank | JCR 2007 | Undirected, unweighted | -4.929 | 0.731 | 0.726
18 | Citation | PageRank | JCR 2007 | Undirected, weighted | -4.160 | 0.864 | 0.696
19 | Citation | PageRank | JCR 2007 | Directed, weighted | -3.103 | 0.333 | 0.659
20 | Citation | Y-factor | JCR 2007 | Directed, weighted | -2.971 | 0.317 | 0.657
21 | Citation | Betweenness centrality | JCR 2007 | Undirected, weighted | -0.462 | 0.872 | 0.643
22 | Citation | Betweenness centrality | JCR 2007 | Undirected, unweighted | -0.474 | 1.609 | 0.642
23 | Citation | Citation Half-Life | JCR 2007 | | / | / | 0.037
24 | Usage | Closeness centrality | MESUR 2007 | Undirected, weighted | 3.130 | 2.683 | 0.703
25 | Usage | Closeness centrality | MESUR 2007 | Undirected, unweighted | 3.100 | 3.899 | 0.731
26 | Usage | Degree centrality | MESUR 2007 | Undirected, unweighted | 3.271 | 3.873 | 0.729
27 | Usage | PageRank | MESUR 2007 | Undirected, unweighted | 3.327 | 4.192 | 0.728
28 | Usage | PageRank | MESUR 2007 | Directed, unweighted | 3.463 | 4.336 | 0.727
29 | Usage | In-degree centrality | MESUR 2007 | Directed, unweighted | 3.463 | 4.015 | 0.728
30 | Usage | Out-degree centrality | MESUR 2007 | Directed, unweighted | 3.484 | 3.994 | 0.727
31 | Usage | PageRank | MESUR 2007 | Directed, weighted | 3.780 | 4.217 | 0.710
32 | Usage | PageRank | MESUR 2007 | Undirected, weighted | 3.813 | 4.223 | 0.710
33 | Usage | Betweenness centrality | MESUR 2007 | Undirected, unweighted | 3.988 | 4.271 | 0.699
34 | Usage | Betweenness centrality | MESUR 2007 | Undirected, weighted | 3.957 | 3.698 | 0.693
35 | Usage | Degree centrality | MESUR 2007 | Undirected, weighted | 5.293 | 3.528 | 0.683
36 | Usage | Out-degree centrality | MESUR 2007 | Directed, weighted | 5.302 | 3.518 | 0.683
37 | Usage | In-degree centrality | MESUR 2007 | Directed, weighted | 5.286 | 3.531 | 0.683
38 | Usage | Journal Use Probability | MESUR 2007 | | 8.914 | 1.833 | 0.593
39 | Usage | Usage Impact Factor | MESUR 2007 | | / | / | 0.279

SLIDE 60

\[
R_{10 \times 10} =
\begin{pmatrix}
1.00 & 0.71 & 0.77 & 0.52 & 0.79 & 0.55 & 0.69 & 0.63 & 0.60 & 0.18 \\
0.71 & 0.99 & 0.52 & 0.69 & 0.79 & 0.85 & 0.49 & 0.44 & 0.49 & 0.22 \\
0.77 & 0.52 & 1.00 & 0.62 & 0.63 & 0.39 & 0.70 & 0.73 & 0.68 & 0.20 \\
0.52 & 0.69 & 0.62 & 1.00 & 0.68 & 0.78 & 0.49 & 0.56 & 0.65 & 0.06 \\
0.79 & 0.79 & 0.63 & 0.68 & 1.00 & 0.82 & 0.66 & 0.62 & 0.66 & 0.15 \\
0.55 & 0.85 & 0.39 & 0.78 & 0.82 & 1.00 & 0.40 & 0.40 & 0.50 & 0.13 \\
0.69 & 0.49 & 0.70 & 0.49 & 0.66 & 0.40 & 1.00 & 0.89 & 0.85 & 0.53 \\
0.63 & 0.44 & 0.73 & 0.56 & 0.62 & 0.40 & 0.89 & 1.00 & 0.97 & 0.45 \\
0.60 & 0.49 & 0.68 & 0.65 & 0.66 & 0.50 & 0.85 & 0.97 & 1.00 & 0.42 \\
0.18 & 0.22 & 0.20 & 0.06 & 0.15 & 0.13 & 0.53 & 0.45 & 0.42 & 1.00
\end{pmatrix}
\]

Rows and columns, in order:
19: Citation PageRank
5: Journal Impact Factor
22: Citation Betweenness
6: Citation Closeness
11: Citation H-index
1: Citation Scimago Journal Rank
31: Usage PageRank
34: Usage Betweenness
24: Usage Closeness
39: Usage Impact Factor

SLIDE 61

[Figure: schematic representation of data sources and processing. Numbers are impact measure identifiers.
Data sources: 2007 JCR citation data; MESUR usage log data; Scimago citation data.
Networks: citation network, 7,388 x 7,388; usage network, 7,575 x 7,575.
Impact measures by source: 2, 5, 23 (JCR); 3, 6-10, 13-22 (citation network); 24-39 (MESUR usage); 1, 4, 11, 12 (Scimago).
Output: 39 x 39 correlation matrix.
Other labels: intersection, 7,584 journals; 12,751 journals.]

SLIDE 62

Principal Component Analysis

Component | PC1 | PC2 | PC3 | PC4 | PC5
Proportion of Variance | 66.1% | 17.3% | 9.2% | 4.8% | 0.9%
Cumulative Proportion | 66.1% | 83.4% | 92.6% | 97.4% | 98.3%
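
In a PCA on a correlation matrix, the proportion of variance of a component is its eigenvalue divided by the trace of the matrix. The sketch below illustrates this with the 10 × 10 matrix R shown on the earlier correlation slide, using plain power iteration for the largest eigenvalue. This is only an illustration: the proportions in the table come from the full set of measures, so the share computed here is not expected to equal 66.1%. Function names are illustrative.

```python
# Correlation matrix copied from the R10x10 slide (measure IDs 19, 5, 22, 6,
# 11, 1, 31, 34, 24, 39, in that order). It is a 10-measure subset, not the
# data behind the reported proportions of variance.
R = [
    [1.00, 0.71, 0.77, 0.52, 0.79, 0.55, 0.69, 0.63, 0.60, 0.18],
    [0.71, 0.99, 0.52, 0.69, 0.79, 0.85, 0.49, 0.44, 0.49, 0.22],
    [0.77, 0.52, 1.00, 0.62, 0.63, 0.39, 0.70, 0.73, 0.68, 0.20],
    [0.52, 0.69, 0.62, 1.00, 0.68, 0.78, 0.49, 0.56, 0.65, 0.06],
    [0.79, 0.79, 0.63, 0.68, 1.00, 0.82, 0.66, 0.62, 0.66, 0.15],
    [0.55, 0.85, 0.39, 0.78, 0.82, 1.00, 0.40, 0.40, 0.50, 0.13],
    [0.69, 0.49, 0.70, 0.49, 0.66, 0.40, 1.00, 0.89, 0.85, 0.53],
    [0.63, 0.44, 0.73, 0.56, 0.62, 0.40, 0.89, 1.00, 0.97, 0.45],
    [0.60, 0.49, 0.68, 0.65, 0.66, 0.50, 0.85, 0.97, 1.00, 0.42],
    [0.18, 0.22, 0.20, 0.06, 0.15, 0.13, 0.53, 0.45, 0.42, 1.00],
]


def matvec(matrix, vec):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]


def dominant_eigenvalue(matrix, iterations=500):
    """Largest eigenvalue of a symmetric matrix by power iteration."""
    n = len(matrix)
    vec = [1.0 / n ** 0.5] * n
    for _ in range(iterations):
        product = matvec(matrix, vec)
        norm = sum(x * x for x in product) ** 0.5
        vec = [x / norm for x in product]
    # Rayleigh quotient of the final unit vector.
    return sum(v * p for v, p in zip(vec, matvec(matrix, vec)))


total_variance = sum(R[i][i] for i in range(len(R)))  # trace of R
pc1_share = dominant_eigenvalue(R) / total_variance
print(f"PC1 share of variance in this sub-matrix: {pc1_share:.1%}")
```

The remaining components could be obtained by deflation or by a library eigensolver; power iteration is used here only to keep the sketch dependency-free.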

SLIDE 63

[Figure: plot of the impact measures, labeled by measure ID (1 to 22 and 24 to 38). Axis labels are not recoverable.]

SLIDE 64

[Figure: measures plotted on PC1 and PC2, with each axis marked from - to +. Labels: Usage (Betweenness, PageRank); Citation (Betweenness, PageRank); Total Cites; Cites per Doc; JIF06; Usage Probability; Citation Immediacy.]

SLIDE 65

[Figure: dendrogram of the impact measures, height axis from 0.0 to 2.5. Leaf order: 38: Journal Use Probability; 31: Usage PageRank; 32: Usage PageRank; 36: Usage Out-degree; 37: Usage In-degree centr.; 35: Usage Degree centr.; 24: Usage Closen. centr.; 33: Usage Betw. centr.; 34: Usage Betw. centr.; 25: Usage Closen. centr.; 26: Usage Degree centr.; 28: Usage PageRank; 27: Usage PageRank; 29: Usage In-degree centr.; 30: Usage Out-degree centr.; 11: Scimago H-index; 12: Scimago Total Cites; 14: Citation In-degree centr.; 15: Citation In-degree centr.; 13: Cite Probability; 17: Citation PageRank; 10: Citation Degree centr.; 9: Citation Degree centr.; 22: Citation Between. centr.; 21: Citation Between. centr.; 19: Citation PageRank; 20: Citation Y-factor; 16: Citation PageRank; 18: Citation PageRank; 3: Citation Closen. centr.; 6: Citation Closen. centr.; 8: Citation Out-degree centr.; 7: Citation Out-degree centr.; 2: Citation Immediacy Index; 1: Scimago Journal Rank; 5: Journal Impact Factor; 4: Cites per doc.]

Cluster | Measures | Interpretation
1 | 38 | Journal Use Probability
2 | 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37 | Usage measures
3 | 1, 2, 3, 4, 5 | JIF, SJR, Cites per Document measures
4 | 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 | Total Citation rates and distributions
5 | 16, 17, 18, 19, 20, 21, 22 | Citation Betweenness and PageRank

SLIDE 66

5 Discussion

SLIDE 67

MESUR: so far

Usage data: single largest reference data set of usage, citation and bibliographic data
  More than 1,000,000,000 usage events loaded
  Multiple publishers, aggregators and institutions
  Infrastructure for research program

Usage graphs: track scientific flow of activity
  Real-time studies of science
  Inclusive of larger “scholarly community”

Metrics: valid, vetted indicators of scientific impact
  Different facets of scholarly impact
  Simple metrics, good results. Law of diminishing returns?
  Hybrid, consensus metrics?

SLIDE 68

MESUR: the bad

Sustainability: burden of data collection
  Ad hoc, customized agreements with data providers
  Restrictive agreements with regard to data sharing
  High costs of maintaining infrastructure: funding

Research program: too much to do
  3rd party access: better science
  Many eyes on data

Metrics and services: community support and acceptance
  Can’t be limited to academic exercise
  Investigations need to be useful
  Accepted by community, become part of scholarly assessment system

SLIDE 69

Future Research: Two directions

Longitudinal research and dynamic models of science

Bursty behavior of science
  Time series of reads are bursty
  Coordinated: social networks at work?

Modeling
  Stochastic and agent-based models of scientific activity
  Citation following? Group-think?
  Find parameters of human information-searching behavior

SLIDE 70

Bursty behavior

[Figure: three panels for the journal APPLIED STATISTICS, each plotted against a time index from 0 to 350. Panel titles: time series; residuals percentiles; residuals smoothed.]

SLIDE 71

Contagion?

[Figure: usage over time for physics-related journals, one row per journal, with time-axis ticks every 7 units from 7 to 336. Rows run from PERCEPTION & PSYCHOPHYSICS to JOURNAL OF HIGH ENERGY PHYSICS and include, among others, PHYSICS LETTERS B, NUCLEAR PHYSICS A, JOURNAL OF CHEMICAL PHYSICS, NATURE PHYSICS, PHYSICS TODAY and ASTRONOMY & ASTROPHYSICS.]

SLIDE 72

Contagion?

[Figure: usage over time for genetics journals, one row per journal, on the same kind of time axis as the previous slide. Rows include, for example, human molecular genetics and human genetics.]

SLIDE 73

Relevant papers

Johan Bollen, Herbert Van de Sompel, Aric Hagberg and Ryan Chute. A Principal Component Analysis of 39 Scientific Impact Measures. PLoS ONE, June 2009. URL: http://dx.plos.org/10.1371/journal.pone.0006022.

Michael Kurtz and Johan Bollen. Usage bibliometrics. Annual Review of Information Science and Technology, 2010.

Johan Bollen, Herbert Van de Sompel, Aric Hagberg, Luis Bettencourt, Ryan Chute, Marko A. Rodriguez and Lyudmila Balakireva. Clickstream data yields high-resolution maps of science. PLoS ONE, February 2009.
