Filippo Radicchi
Graph-based ranking algorithms
BioCenit Research Lab @ Universitat Rovira i Virgili
Collaboration networks, citation networks
Why analyze bibliographic data?
Scientific motivation
Electronic databases store a huge amount of information about scientific publications (in 2006 alone: ~10^4 journals, ~10^6 papers, ~10^7 citations)
Practical motivation
Citations represent the fundamental units used to measure the scientific relevance of papers, journals, scientists, research groups and institutions
Indicators by discipline (Mathematics, Physics, Chemistry, Biology, Computer Sci.) and academic position (Researcher, Associate Professor, Full Professor)
P = # publications, T = period of activity, C = total # of citations
source: "Indicatori di Attività Scientifica e di Ricerca" (Indicators of Scientific and Research Activity), Consiglio Universitario Nazionale (CUN), Dec. 2008
A practical example
source: "Abilitazione Scientifica Nazionale – La normalizzazione degli indicatori per l'età accademica" (National Scientific Habilitation: normalizing the indicators by academic age), ANVUR, Jul. 2012
1) Number of papers: $I(N_p, AA) = 10\,N_p / AA$
2) Total number of citations: $I(N_C, AA) = N_C / AA$
3) Contemporary h-index $h_c$: the h-index computed on the age-weighted scores $S(i, t_i, t) = \frac{4\,C(i, t_i, t)}{t - t_i + 1}$
where $N_p$ = total number of papers, $N_C$ = total number of citations, $AA$ = academic age, and $C(i, t_i, t)$ = citations accumulated by paper $i$, published in year $t_i$ and measured in year $t$
- A. Sidiropoulos et al., Scientometrics 72, 253 (2007)
- R. Mannella and P. Rossi, arXiv:1207.3499 (2012)
calculated over a population of 1400 Italian physicists
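The three indicators can be sketched in Python as below. This is a minimal illustration, assuming publication years and citation counts are already available per paper; the function names are hypothetical, while the factor 4 and the 1/(t − t_i + 1) age weighting follow the formula above.

```python
from typing import Sequence


def papers_indicator(n_papers: int, academic_age: float) -> float:
    """I(N_p, AA) = 10 N_p / AA, i.e. papers per ten years of academic age."""
    return 10 * n_papers / academic_age


def citations_indicator(n_citations: int, academic_age: float) -> float:
    """I(N_C, AA) = N_C / AA, i.e. citations per year of academic age."""
    return n_citations / academic_age


def h_index(scores: Sequence[float]) -> int:
    """Largest h such that h items have a score of at least h."""
    h = 0
    for rank, score in enumerate(sorted(scores, reverse=True), start=1):
        if score < rank:
            break
        h = rank
    return h


def contemporary_h_index(papers: Sequence[tuple[int, int]], t: int) -> int:
    """h_c: the h-index of the scores S(i, t_i, t) = 4 C(i, t_i, t) / (t - t_i + 1).

    Each entry of `papers` is (t_i, C(i, t_i, t)): publication year and the
    citations accumulated by that paper up to year t.
    """
    return h_index([4 * c / (t - t_i + 1) for t_i, c in papers])


# Hypothetical researcher, evaluated in t = 2012.
papers = [(2000, 50), (2005, 30), (2010, 8), (2011, 2)]
print(h_index([c for _, c in papers]))     # 3
print(contemporary_h_index(papers, 2012))  # 4
```

For this made-up record the plain h-index is 3 while the contemporary h-index is 4: the age weighting lifts the most recent paper's score from 2 to 4, while the oldest paper's score drops from 50 to about 15.4.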
On a larger sample of scientists
35,000 profiles on Google Scholar Citations
The network structure of citation data is often neglected in research evaluation
D. J. de Solla Price, Science 149, 510 (1965)
            "Standard" approach      Network approach
papers      Citation counts          CiteRank
journals    Impact factor            Eigenfactor
scientists  h-index, g-index, ...    ?
Physical Review Series I (PRI), Physical Review (PR), Physical Review Letters (PRL), Physical Review A (PRA), Physical Review B (PRB), Physical Review C (PRC), Physical Review D (PRD), Physical Review E (PRE) and Reviews of Modern Physics (RMP), between 1893 and 2006
Paper Citation Network / Weighted Author Citation Network
Graph-based ranking of scientists
Keywords: "complex network", "scale-free network", "small-world network", etc.
Weighted author citation network
Divide the 8,783,994 total references into $M_I$ homogeneous intervals of $M_R$ references each
$M_I$ = # of intervals
$M_R$ = # of references in each interval
Interval boundaries (cumulative number of references): $M_R, 2M_R, \dots, (M_I-2)M_R, (M_I-1)M_R, M_I M_R$
Interval midpoints: $M_R/2, 3M_R/2, \dots, (M_I-3/2)M_R, (M_I-1/2)M_R$
$M_R \approx 488{,}000$, $M_I = 18$; time span 1893-2006, with the first interval covering 1893-1966
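A minimal sketch of the interval division, assuming the references are given as a list already sorted by the publication date of the citing paper (the slide does not state the ordering or how ties are broken). The helper names are hypothetical.

```python
import math
from typing import Sequence, TypeVar

T = TypeVar("T")


def split_into_intervals(references: Sequence[T], n_intervals: int) -> list[list[T]]:
    """Split time-ordered references into n_intervals consecutive chunks of
    M_R = ceil(len(references) / n_intervals) references (the last may be shorter)."""
    m_r = math.ceil(len(references) / n_intervals)
    return [list(references[k * m_r:(k + 1) * m_r]) for k in range(n_intervals)]


def interval_midpoints(m_r: int, n_intervals: int) -> list[float]:
    """Midpoints M_R/2, 3M_R/2, ..., (M_I - 1/2) M_R on the cumulative-reference axis."""
    return [(k + 0.5) * m_r for k in range(n_intervals)]


print(math.ceil(8_783_994 / 18))  # 488000
```

With 8,783,994 references and 18 intervals, the ceiling gives M_R = 488,000, consistent with M_R ~ 488,000; the last interval then holds the remaining 487,994 references.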
Dynamical representation
Diffusion equation
weight of the arc from j to i
out-strength of the node j
each paper carries a "scientific credit", equally divided among its authors
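One way to build the weighted author citation network from paper-level data, assuming each citation carries one unit of credit split equally among every (citing author, cited author) pair, i.e. 1/(n_a n_b) for a citing paper with n_a authors and a cited paper with n_b authors. This splitting rule is our assumption for illustration; self-citations are kept as self-loops here, which the original analysis may treat differently. The function name is hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Iterable, Mapping, Sequence


def author_citation_network(
    citations: Iterable[tuple[str, str]],
    authors: Mapping[str, Sequence[str]],
) -> dict[tuple[str, str], float]:
    """Hypothetical helper: weighted author citation network.

    citations: (citing_paper, cited_paper) pairs.
    authors:   paper id -> list of its authors.
    Returns {(citing_author, cited_author): weight}; the arc points from the
    citing author to the cited author, i.e. in the direction credit flows.
    """
    weights: dict[tuple[str, str], float] = defaultdict(float)
    for citing, cited in citations:
        citing_authors, cited_authors = authors[citing], authors[cited]
        # One unit of credit per citation, split equally over all author pairs.
        share = 1.0 / (len(citing_authors) * len(cited_authors))
        for i, j in product(citing_authors, cited_authors):
            weights[(i, j)] += share  # self-citations become self-loops (i == j)
    return dict(weights)
```

Because every citation contributes exactly one unit in total, the sum of all arc weights equals the number of citations.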
Science Author Rank Algorithm (SARA)
SARA scores depend on the choice of the redistribution probability q
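The diffusion of credit can be sketched as a PageRank-style power iteration on the weighted author network. We assume the standard form with three terms: diffusion along arcs in proportion to w_ji / s_j^out, uniform random relocation with probability q, and uniform redistribution of the credit held by dangling nodes (authors with zero out-strength). This illustrates the technique and is not a reproduction of the published SARA code; the default q and the convergence threshold are arbitrary.

```python
def sara_scores(
    weights: dict[tuple[str, str], float],
    q: float = 0.1,
    tol: float = 1e-12,
    max_iter: int = 1000,
) -> dict[str, float]:
    """PageRank-style credit diffusion (assumed form, not the published code).

    weights: {(j, i): w_ji}, weight of the arc from j to i (positive).
    q:       redistribution probability; the default is only illustrative.
    """
    nodes = sorted({v for arc in weights for v in arc})
    n = len(nodes)
    out_strength = {v: 0.0 for v in nodes}
    for (j, _), w in weights.items():
        out_strength[j] += w
    p = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Credit held by dangling nodes (zero out-strength) is spread uniformly.
        dangling = sum(p[v] for v in nodes if out_strength[v] == 0.0)
        # Random relocation with probability q, plus the dangling-node correction.
        new = {v: q / n + (1 - q) * dangling / n for v in nodes}
        # Diffusion: j passes a fraction w_ji / s_j^out of its credit to i.
        for (j, i), w in weights.items():
            new[i] += (1 - q) * p[j] * w / out_strength[j]
        if sum(abs(new[v] - p[v]) for v in nodes) < tol:
            return new
        p = new
    return p
```

The scores sum to 1 by construction, so rankings obtained with different values of q can be compared directly.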
Benchmarking SARA
Considered prizes: Nobel prize, Wolf prize, Boltzmann medal, Dirac medal and Planck medal
Comparison with different metrics
NP= Nobel prize, WP= Wolf prize, BM= Boltzmann medal, DM= Dirac medal, and PM= Planck medal
Best physicists according to SARA
physauthorsrank.org
Ranking tennis players
ATP points distribution as of 2009
Grand Slams (4): Australian Open, Roland Garros, Wimbledon, US Open
Masters 1000 (9): Indian Wells, Miami, Monte Carlo, Madrid, Rome, Canada, Cincinnati, Shanghai, Paris
500 Series (11): Rotterdam, Memphis, Acapulco, Dubai, Barcelona, Hamburg, Washington, Beijing, Tokyo, Basel, Valencia
250 Series (40): Doha, Chennai, Brisbane, Sydney, Auckland, ...
Best results in 18 tournaments: 4 Grand Slams, 8 Masters 1000, best 4 results in 500 Series and best 2 results in 250 Series
source: wikipedia.org
ATP World Tour Finals: reserved to the best 8 players in the ranking
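The counting rule can be sketched as follows, assuming each result is given as a (category, points) pair. This is a simplification: it keeps only the best results per category (4 + 8 + 4 + 2 = 18) and ignores real ATP details such as mandatory events counting even when missed and the ATP World Tour Finals bonus. The category labels and point values are illustrative.

```python
import heapq

# Number of best results counted per category (4 + 8 + 4 + 2 = 18 tournaments).
COUNTED = {"Grand Slam": 4, "Masters 1000": 8, "500 Series": 4, "250 Series": 2}


def ranking_points(results: list[tuple[str, int]]) -> int:
    """Simplified ATP total: sum of the best COUNTED[category] results per category.

    results: (category, points) pairs for one player; categories must be keys of COUNTED.
    """
    by_category: dict[str, list[int]] = {}
    for category, points in results:
        by_category.setdefault(category, []).append(points)
    return sum(
        sum(heapq.nlargest(COUNTED[category], points))
        for category, points in by_category.items()
    )
```

For example, two Grand Slam results worth 2000 and 1200 points plus three 250 Series results worth 250, 150 and 90 points give 3600 ranking points, because only the best two 250 Series results count.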
Figure: ATP points 2008 vs. ATP points 2009
ATP data cover all tournaments since 1968
The Open Era
3700 players, 3600 tournaments, 133000 matches
Tennis contact graph
Each match is a directed edge from the loser to the winner; edges are weighted, with $w_{i,j}$ = total number of matches between i and j won by j
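Building the tennis contact graph from a match list reduces to counting arcs, assuming each match is recorded as a (winner, loser) pair; the function name and the player labels in the example are hypothetical.

```python
from collections import Counter
from typing import Iterable


def contact_graph(matches: Iterable[tuple[str, str]]) -> Counter:
    """Weighted directed contact graph from (winner, loser) match records.

    Each match adds 1 to the arc (loser, winner), so the weight of (i, j) is
    w_ij = number of matches between i and j won by j.
    """
    return Counter((loser, winner) for winner, loser in matches)
```

For instance, `contact_graph([("A", "B"), ("A", "B"), ("B", "A")])` gives weight 2 on the arc ("B", "A") and weight 1 on ("A", "B"): B lost twice to A, and A lost once to B.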
Top players in Grand Slams
Only players with at least two Grand Slam titles between 1968 and 2010
Tennis is “complex”
Matthew effect in career longevity, A. M. Petersen et al., Proc. Natl. Acad. Sci. USA 108, 18 (2011)
Prestige score
Terms: diffusion, random relocation, correction for dangling nodes
for a Grand Slam tournament for a tournament
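The three labeled terms correspond to the standard PageRank-style form written below, which we assume here because the equation itself did not survive extraction. N is the number of players, $s_j^{\mathrm{out}} = \sum_i w_{j,i}$ is the out-strength of player j, $\delta(x)$ equals 1 if $x = 0$ and 0 otherwise, and q is the redistribution probability.

```latex
P_i = \underbrace{(1-q) \sum_j P_j \, \frac{w_{j,i}}{s_j^{\mathrm{out}}}}_{\text{diffusion}}
    + \underbrace{\frac{q}{N}}_{\text{random relocation}}
    + \underbrace{\frac{1-q}{N} \sum_j P_j \, \delta\!\left(s_j^{\mathrm{out}}\right)}_{\text{correction for dangling nodes}}
```

Summing over i shows that $\sum_i P_i = 1$ is preserved: the dangling-node term returns to the system exactly the credit that players with no losses cannot pass on.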
#3: John McEnroe
Career prize money: $12,547,797
Career record: 875–198 (81.55%)
Career titles: 104 including 77 listed by the ATP
#2: Ivan Lendl
Career prize money: $21,262,417
Career record: 1071–239 (81.8%)
Career titles: 144 including 94 listed by the ATP
#1: Jimmy Connors
Career prize money: $8,641,040
Career record: 1241–277 (81.75%)
Career titles: 148 including 109 listed by the ATP
Prestige Rank
Relation with other scores
2009 ATP year-end rank
Best player of the year
Best players in Grand Slams
What did people think about this ranking?
What did players think about this ranking?
journalist: “There is a weird study by an American physician...” Pete: “Who is this guy!?!?!?!”
References
- F. Radicchi, "Who is the best player ever? A complex network analysis of the history of professional tennis", PLoS ONE 6, e17249 (2011)
- F. Radicchi, S. Fortunato, B. Markines and A. Vespignani, "Diffusion of scientific credits and the ranking of scientists", Phys. Rev. E 80, 056103 (2009)
- F. Radicchi, S. Fortunato and A. Vespignani, in Models of Science Dynamics: Encounters Between Complexity Theory and Information Sciences, eds. A. Scharnhorst, K. Börner and P. van den Besselaar (Springer, 2012)
Citation networks
Can bibliographic data be used for research evaluation?
discipline: scientist (h-index)
Physics: Edward Witten (110), Marvin Cohen (94), Philip W. Anderson (91), Manuel Cardona (86), Frank Wilczek (68)
Chemistry: George Whitesides (135), Martin Karplus (132), Kurt Wüthrich (129), Elias J. Corey (114), Alan Heeger (113)
Computer science: Hector Garcia-Molina, Ian Foster, Deborah Estrin, Scott Shenker, Jeffrey D. Ullman, Don Towsley (h-index values 70, 68, 67, 65)
- P. Ball, Nature 448, 737 (2007)
Papers are classified into 172 scientific disciplines (from Acoustics to Zoology)
Fields: publication year, number of citations
Papers, Journals, Subject Categories
Description of the dataset
Source of data March 2008