Filippo Radicchi: Graph-based ranking algorithms



slide-1
SLIDE 1

Filippo Radicchi

BioCenit Research Lab @ Universitat Rovira i Virgili

Graph-based ranking algorithms

slide-2
SLIDE 2

Collaboration networks

Citation networks

Scientific motivation

Electronic databases store a huge amount of information about scientific publications (in 2006 alone: journals $\sim 10^4$, papers $\sim 10^6$, citations $\sim 10^7$)

Why analyze bibliographic data?

slide-3
SLIDE 3

Practical motivation

Citations represent the fundamental units used to measure the scientific relevance of papers, journals, scientists, research groups and institutions

Why analyze bibliographic data?

slide-4
SLIDE 4

[Table by discipline (Mathematics, Physics, Chemistry, Biology, Computer Sci.) and researcher rank (Ass. Professor, Full Professor)]

P = # publications, T = period of activity, C = total # of citations

source: "Indicatori di Attività Scientifica e di Ricerca", Consiglio Universitario Nazionale (CUN), Dec. 2008

A practical example

slide-5
SLIDE 5

A practical example

source: "Abilitazione Scientifica Nazionale – La normalizzazione degli indicatori per l'età accademica" (ANVUR), Jul. 2012

1) Number of papers: $I(N_p, AA) = 10\,\frac{N_p}{AA}$

2) Total number of citations: $I(N_C, AA) = \frac{N_C}{AA}$

3) Contemporary h-index $h_c$, based on the scores $S(i, t_i, t) = \frac{4}{t - t_i + 1}\, C(i, t_i, t)$

$N_p$ = total number of papers

$AA$ = academic age

$N_C$ = total number of citations

$C(i, t_i, t)$ = citations accumulated by paper $i$ published in year $t_i$ and measured in year $t$

  • A. Sidiropoulos et al., Scientometrics 72, 253 (2007)
  • R. Mannella and P. Rossi, arXiv:1207.3499 (2012)

calculated over a population of 1400 Italian physicists
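
The three indicators above can be sketched in a few lines. This is a minimal illustration and not the ANVUR implementation: the function names and the toy publication record are hypothetical, and $h_c$ is taken, as for the ordinary h-index, to be the largest $h$ such that $h$ papers have a score $S \geq h$.

```python
def normalized_indicators(n_papers, n_citations, academic_age):
    """Return (I(N_p, AA), I(N_C, AA)): papers and citations per academic age."""
    return 10 * n_papers / academic_age, n_citations / academic_age


def contemporary_h_index(papers, year):
    """papers: iterable of (publication_year, citations) pairs.

    Each paper gets the age-discounted score S = 4 / (t - t_i + 1) * C.
    h_c is the largest h such that h papers have S >= h.
    """
    scores = sorted(
        (4 * c / (year - t_i + 1) for t_i, c in papers), reverse=True
    )
    h_c = 0
    for rank, score in enumerate(scores, start=1):
        if score >= rank:
            h_c = rank
        else:
            break
    return h_c


# Hypothetical record: three papers from 2000 with 5 citations each.
toy_record = [(2000, 5), (2000, 5), (2000, 5)]
print(normalized_indicators(n_papers=3, n_citations=15, academic_age=10))
print(contemporary_h_index(toy_record, year=2009))
```

In the toy record every paper has $S = 4 \cdot 5 / 10 = 2$, so only two papers can satisfy $S \geq h$, whereas the plain h-index of the same record would be 3.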

slide-6
SLIDE 6

On a larger sample of scientists

35,000 profiles on Google Scholar Citations

slide-7
SLIDE 7

The network structure of citation data is often neglected in research evaluation

D.J. de Solla Price, Science 149, 510 (1965)

"Standard" approach vs. network approach

papers: citation counts vs. CiteRank

journals: Impact factor vs. Eigenfactor

scientists: h-index, g-index, ... vs. ?

slide-8
SLIDE 8

Physical Review Series I (PRI), Physical Review (PR), Physical Review Letters (PRL), Physical Review A (PRA), Physical Review B (PRB), Physical Review C (PRC), Physical Review D (PRD), Physical Review E (PRE), Reviews of Modern Physics (RMP)

between 1893 and 2006

Paper Citation Network

Weighted Author Citation Network

Graph-based ranking of scientists

slide-9
SLIDE 9

key-words: "complex network", "scale-free network", "small-world network", etc.

Weighted author citation network

slide-10
SLIDE 10

Divide the 8,783,994 total references into homogeneous intervals

$M_I$ = # of intervals

$M_R$ = # of references in each interval

[Figure: reference axis with interval boundaries at $M_R$, $2M_R$, ..., $(M_I-2)M_R$, $(M_I-1)M_R$, $M_I M_R$ and interval midpoints at $M_R/2$, $3M_R/2$, ..., $(M_I-3/2)M_R$, $(M_I-1/2)M_R$; time labels 1893-1966 and 2006]

$M_R \sim 488,000$, $M_I = 18$

Dynamical representation
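
A minimal sketch of the splitting step, assuming each reference is stamped with a year and that the references are simply cut into chunks of equal size after sorting. The function name and the toy list of years are hypothetical; only the totals 8,783,994 and 18 come from the slide.

```python
def split_into_intervals(reference_years, n_intervals):
    """Split references, sorted by year, into n_intervals chunks that hold
    (almost) the same number of references each."""
    ordered = sorted(reference_years)
    total = len(ordered)
    # Integer boundaries k * total / n_intervals, rounded down.
    bounds = [k * total // n_intervals for k in range(n_intervals + 1)]
    return [ordered[bounds[k]:bounds[k + 1]] for k in range(n_intervals)]


# Figures quoted on the slide: total references and number of intervals.
M_I = 18
total_references = 8_783_994
print(round(total_references / M_I))  # references per interval, M_R

# Toy data (hypothetical years), split into 3 intervals of 4 references.
toy_years = [1900, 1950, 1960, 1990, 1995, 2000, 2003, 2004, 2005, 2006, 2006, 2006]
for chunk in split_into_intervals(toy_years, 3):
    print(chunk[0], "-", chunk[-1], "references:", len(chunk))
```

Because the number of references per interval is fixed, early intervals span many years while recent ones span few, which matches the labels 1893-1966 and 2006 on the slide.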

slide-11
SLIDE 11

Diffusion equation

weight of the arc from j to i

out-strength of the node j

each paper carries a "scientific credit", equally divided among its authors

SARA scores depend on the choice of the redistribution probability q

Science Author Rank Algorithm
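
The credit-splitting idea can be illustrated with a small sketch of how a weighted author citation network might be assembled. It assumes that a citation from a paper with n authors to a paper with m authors adds 1/(n·m) to each of the n·m author-to-author arcs, so that every citation carries one unit of credit in total. That rule, the function names and the toy data are assumptions made for illustration and are not taken from the slides.

```python
from collections import defaultdict


def author_citation_network(papers, citations):
    """papers: dict paper_id -> list of author names.
    citations: iterable of (citing_paper, cited_paper) pairs.
    Returns dict (citing_author, cited_author) -> arc weight."""
    weights = defaultdict(float)
    for citing, cited in citations:
        sources, targets = papers[citing], papers[cited]
        # Assumed rule: one unit of credit per citation, split equally
        # over all pairs (citing author, cited author).
        share = 1.0 / (len(sources) * len(targets))
        for a in sources:
            for b in targets:
                weights[(a, b)] += share
    return dict(weights)


def out_strength(weights):
    """Sum of the weights of the arcs leaving each author."""
    strength = defaultdict(float)
    for (source, _target), w in weights.items():
        strength[source] += w
    return dict(strength)


# Hypothetical toy data: three papers, three citations.
papers = {"p1": ["Ann", "Bob"], "p2": ["Cid"], "p3": ["Ann"]}
citations = [("p1", "p2"), ("p3", "p2"), ("p2", "p1")]
w = author_citation_network(papers, citations)
print(w)
print(out_strength(w))
```

Under this rule the total arc weight equals the number of citations, and the ratio of an arc weight to the out-strength of its source node gives the fraction of that author's credit passed along the arc in one diffusion step.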

slide-12
SLIDE 12

Science Author Rank Algorithm

slide-13
SLIDE 13

Benchmarking SARA

Considered prizes: Nobel prize, Wolf prize, Boltzmann medal, Dirac medal and Planck medal

Comparison with different metrics

slide-14
SLIDE 14

NP= Nobel prize, WP= Wolf prize, BM= Boltzmann medal, DM= Dirac medal, and PM= Planck medal

Best physicists according to SARA

slide-15
SLIDE 15

physauthorsrank.org

slide-16
SLIDE 16

Ranking tennis players

slide-17
SLIDE 17

ATP points distribution as of 2009

Grand Slams (4): Australian Open, Roland Garros, Wimbledon, US Open

Masters 1000 (9): Indian Wells, Miami, Monte Carlo, Madrid, Rome, Canada, Cincinnati, Shanghai, Paris

500 Series (11): Rotterdam, Memphis, Acapulco, Dubai, Barcelona, Hamburg, Washington, Beijing, Tokyo, Basel, Valencia

250 Series (40): Doha, Chennai, Brisbane, Sydney, Auckland, ...

Best results in 18 tournaments: 4 Grand Slams, 8 Masters 1000, best 4 results in 500 Series and best 2 results in 250 Series
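
The 18-tournament rule amounts to a capped sum per category. The sketch below is a simplification: it takes the highest results in every category, whereas the actual ATP rules may count some events regardless of the result. The function name, the category keys and the point values in the example are illustrative assumptions.

```python
def ranking_points(results):
    """results: dict category -> list of points earned in each tournament.

    Counts at most 4 Grand Slam results, 8 Masters 1000 results, the best
    4 results in the 500 Series and the best 2 in the 250 Series.
    """
    quota = {"grand_slam": 4, "masters_1000": 8, "500": 4, "250": 2}
    total = 0
    for category, limit in quota.items():
        best = sorted(results.get(category, []), reverse=True)[:limit]
        total += sum(best)
    return total


# Hypothetical season with illustrative point values.
season = {
    "grand_slam": [2000, 720],
    "masters_1000": [1000, 600],
    "500": [500, 300, 180, 90, 45],  # fifth result is dropped
    "250": [250, 150, 90],           # third result is dropped
}
print(ranking_points(season))
```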

source: wikipedia.org

ATP World Tour Finals: reserved for the top 8 players in the ranking

slide-18
SLIDE 18

ATP points 2008 ATP points 2009

slide-19
SLIDE 19

ATP data cover all tournaments since 1968

slide-20
SLIDE 20

The Open Era

3,700 players, 3,600 tournaments, 133,000 matches

slide-21
SLIDE 21

Tennis contact graph

Each match is a directed edge from the loser to the winner.

Edges are weighted: $w_{ij}$ = total number of matches between i and j won by j.
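
The construction of the contact graph can be sketched as follows, assuming match records are available as (winner, loser) pairs. The function name and the toy matches are hypothetical.

```python
from collections import defaultdict


def contact_graph(matches):
    """matches: iterable of (winner, loser) pairs.

    Returns a nested dict w with w[i][j] = number of matches between
    i and j that were won by j, i.e. an edge from loser i to winner j.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for winner, loser in matches:
        graph[loser][winner] += 1
    return {i: dict(out) for i, out in graph.items()}


# Hypothetical results: A beats B twice, B beats A once, C beats A once.
toy_matches = [("A", "B"), ("A", "B"), ("B", "A"), ("C", "A")]
print(contact_graph(toy_matches))
```

A player who never lost a match has no outgoing edges and therefore does not appear as a key of the outer dict (player C in the toy data).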

slide-22
SLIDE 22

Top players in Grand Slams

Only players with at least two Grand Slam titles between 1968 and 2010

slide-23
SLIDE 23

Tennis is “complex”

Matthew effect in career longevity, A.M. Petersen et al., Proc. Natl. Acad. Sci. USA 108, 18 (2011)

slide-24
SLIDE 24

Prestige score

diffusion random relocation

correction for dangling nodes

for a Grand Slam tournament for a tournament
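
The equation on this slide did not survive extraction; only the labels of its terms remain. The sketch below is a standard PageRank-style iteration with one term for each label (diffusion, random relocation, correction for dangling nodes). It is an assumption about the form of the score rather than a transcription of the slide: the function name, the default q = 0.15 and the toy graph are illustrative.

```python
def prestige_scores(graph, q=0.15, tol=1e-12, max_iter=1000):
    """graph: dict loser -> dict winner -> weight (edge loser -> winner).

    Iterates, for every player i out of N,
        P_i = (1 - q) * sum_j P_j * w_ji / s_j_out      (diffusion)
            + q / N                                     (random relocation)
            + (1 - q) / N * sum of P_j over dangling j  (dangling correction)
    where s_j_out is the total weight of the edges leaving j.
    """
    nodes = set(graph)
    for out in graph.values():
        nodes.update(out)
    nodes = sorted(nodes)
    n = len(nodes)
    s_out = {j: sum(graph.get(j, {}).values()) for j in nodes}
    p = {i: 1.0 / n for i in nodes}
    for _ in range(max_iter):
        # Players without outgoing edges (never lost) are dangling nodes.
        dangling = sum(p[j] for j in nodes if s_out[j] == 0)
        new = {i: q / n + (1 - q) * dangling / n for i in nodes}
        for j, out in graph.items():
            for i, w in out.items():
                new[i] += (1 - q) * p[j] * w / s_out[j]
        change = sum(abs(new[i] - p[i]) for i in nodes)
        p = new
        if change < tol:
            break
    return p


# Hypothetical contact graph: B lost twice to A; A lost once to B and once to C.
toy_graph = {"B": {"A": 2}, "A": {"B": 1, "C": 1}}
for player, score in sorted(prestige_scores(toy_graph).items()):
    print(player, round(score, 4))
```

The scores sum to one at every iteration, so they can be read as the stationary distribution of a random walk in which credit flows from losers to winners.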

slide-25
SLIDE 25

#3 : John McEnroe

Career prize money: $12,547,797 Career record: 875–198 (81.55%) Career titles: 104 including 77 listed by the ATP

slide-26
SLIDE 26

#2 : Ivan Lendl

Career prize money: $21,262,417 Career record: 1071–239 (81.8%) Career titles: 144 including 94 listed by the ATP

slide-27
SLIDE 27

#1 : Jimmy Connors

Career prize money: $8,641,040 Career record: 1241–277 (81.75%) Career titles: 148 including 109 listed by the ATP

slide-28
SLIDE 28

Prestige Rank

slide-29
SLIDE 29

Relation with other scores

slide-30
SLIDE 30

Relation with other scores

2009 ATP year-end rank

slide-31
SLIDE 31

Best player of the year

slide-32
SLIDE 32

Best player of the year

slide-33
SLIDE 33

Best players in Grand Slams

slide-34
SLIDE 34

What did people think about this ranking?

slide-35
SLIDE 35

What did players think about this ranking?

journalist: “There is a weird study by an American physician...” Pete: “Who is this guy!?!?!?!”

slide-36
SLIDE 36

References

  • F. Radicchi, "Who is the best player ever? A complex network analysis of the history of professional tennis", PLoS ONE 6, e17249 (2011)

  • F. Radicchi, S. Fortunato, B. Markines and A. Vespignani, "Diffusion of scientific credits and the ranking of scientists", Phys. Rev. E 80, 056103 (2009)

  • F. Radicchi, S. Fortunato and A. Vespignani, "Citation networks", in Models of Science Dynamics: Encounters Between Complexity Theory and Information Sciences, eds. A. Scharnhorst, K. Börner and P. van den Besselaar (Springer, 2012)

slide-37
SLIDE 37
slide-38
SLIDE 38

Can bibliographic data be used for research evaluation?

discipline / scientists / h-index values

Physics: Edward Witten, Marvin Cohen, Philip W. Anderson, Manuel Cardona, Frank Wilczek; h-index values: 110, 94, 91, 86, 68

Chemistry: George Whitesides, Martin Karplus, Kurt Wüthrich, Elias J. Corey, Alan Heeger; h-index values: 135, 132, 129, 114, 113

Computer science: Hector Garcia-Molina, Ian Foster, Deborah Estrin, Scott Shenker, Jeffrey D. Ullman, Don Towsley; h-index values: 70, 68, 67, 65

  • P. Ball, Nature 448, 737 (2007)
slide-39
SLIDE 39

Papers are classified in 172 scientific disciplines (from Acoustics to Zoology)

Papers: publication year, number of citations

Papers → Journals → Subject Categories

Description of the dataset

slide-40
SLIDE 40

Source of data March 2008

Different scientific disciplines

slide-41
SLIDE 41

Different scientific disciplines

slide-42
SLIDE 42

Different scientific disciplines