Collaboration Signatures Reveal Scientific Impact



SLIDE 1

Collaboration Signatures Reveal Scientific Impact

Yuxiao Dong, Reid A. Johnson, Yang Yang, Nitesh V. Chawla
Interdisciplinary Center for Network Science and Applications
Department of Computer Science and Engineering
University of Notre Dame

SLIDE 2

Collaboration is an integral element of the scientific process that often leads to findings with significant impact.

SLIDE 3

A real-world academic dataset from AMiner [1, 2].

  • 1. J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, Z. Su. ArnetMiner: Extraction and Mining of Academic Social Networks. KDD’08.

  • 2. https://aminer.org/billboard/AMinerNetwork.

  • 1,712,433 authors
  • 2,092,356 papers
  • 4,258,615 collaborators

SLIDE 4

[Figure: Year vs. Number of Authors per Publication (x-axis: year, 1970 to 2010; y-axis: #authors per publication).]

Research collaborations are becoming increasingly prevalent.

SLIDE 5

[Figure: Year (1950 to 2010) vs. three quantities: Number of Publications (#papers, log scale); Number of Authors (#authors and #new-authors, log scale); Avg. of Paper/Author (#papers per author and #authors per paper).]

Average publication output per author has remained roughly constant, while collaboration has expanded substantially.
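The yearly quantities plotted in these panels can be computed along the following lines. This is a minimal Python sketch, not the authors' pipeline; the input format (a list of `(year, author_list)` pairs) and the function name `yearly_stats` are hypothetical.

```python
from collections import defaultdict


def yearly_stats(papers):
    """Per-year quantities behind the three panels.

    `papers` is a list of (year, author_list) pairs (hypothetical input format).
    Returns {year: (#papers, #authors, authors per paper, papers per author)}.
    """
    n_papers = defaultdict(int)     # publications per year
    authorships = defaultdict(int)  # (paper, author) pairs per year
    active = defaultdict(set)       # distinct authors publishing that year
    for year, author_list in papers:
        names = set(author_list)
        n_papers[year] += 1
        authorships[year] += len(names)
        active[year].update(names)
    stats = {}
    for year in sorted(n_papers):
        n_authors = len(active[year])
        stats[year] = (
            n_papers[year],
            n_authors,
            authorships[year] / n_papers[year],  # authors per paper
            authorships[year] / n_authors,       # papers per author
        )
    return stats


toy = [(2000, ["a", "b"]), (2000, ["a"]), (2001, ["c", "d", "e"])]
print(yearly_stats(toy))  # {2000: (2, 2, 1.5, 1.5), 2001: (1, 3, 3.0, 1.0)}
```

Both averages divide the same authorship total, so they can move apart only when the number of papers and the number of active authors grow at different rates.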

SLIDE 6

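u’s collaboration ego network consists of the ego u and u’s collaboration relationships, including u’s self-collaboration (the tie between u and u).

Tie weight:

$$w_{uv} = \sum_{p \in P} \frac{1}{n_p}$$

Tie strength:

$$s_{uv} = \frac{w_{uv}}{\sum_{k \in \Gamma(u)} w_{uk}}$$

P: set of publications that u and v co-authored; n_p: number of authors of publication p; Γ(u): u’s collaborations in the ego network.

[Figure: example collaboration ego network of u (nodes u, b, c, d, e, f, g), with tie weight w_uv labeled on the edge between u and a collaborator v.]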
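The tie weight and tie strength definitions above can be sketched in Python for a single ego u. The input format (a list of author lists) and the name `ego_tie_strengths` are hypothetical. The slides do not say how the self-tie weight w_uu is computed; this sketch assumes it accumulates 1/n_p from every paper u authored, solo papers included.

```python
from collections import defaultdict


def ego_tie_strengths(u, papers):
    """Return (w, s): tie weights w[v] = w_uv and tie strengths s[v] = s_uv.

    `papers` is a list of author lists. ASSUMPTION: the self-tie w_uu also
    receives 1/n_p from every paper u authored, solo papers included.
    """
    w = defaultdict(float)
    for author_list in papers:
        authors = set(author_list)
        if u not in authors:
            continue  # only publications u co-authored contribute
        n_p = len(authors)
        for v in authors:  # v == u is the self-collaboration tie
            w[v] += 1.0 / n_p
    total = sum(w.values())  # denominator: sum of w_uk over k in Gamma(u)
    s = {v: weight / total for v, weight in w.items()}
    return dict(w), s


papers = [["u", "v"], ["u", "v", "b"], ["u"]]
w, s = ego_tie_strengths("u", papers)
print(round(w["v"], 4), round(s["u"], 4))  # 0.8333 0.6111
```

Under this assumption every paper adds exactly 1 to u's total weight, so the strengths s_uv sum to 1 and can be read as a distribution over Γ(u).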

SLIDE 7

Sociability: the number of collaborators, $|\Gamma(u)|$. This metric examines how many collaboration relationships researchers maintain throughout their academic careers.

Sociability Dependence Diversity Self-Collaboration
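Sociability reduces to counting distinct co-authors. A minimal Python sketch, with a hypothetical input format (a list of author lists) and the assumption that u itself is not counted as a collaborator:

```python
def sociability(u, papers):
    """|Gamma(u)|: number of distinct collaborators of u.

    `papers` is a list of author lists. ASSUMPTION: u itself is not counted.
    """
    collaborators = set()
    for author_list in papers:
        if u in author_list:
            collaborators.update(a for a in author_list if a != u)
    return len(collaborators)


papers = [["u", "v"], ["u", "v", "b"], ["u"], ["c", "d"]]
print(sociability("u", papers))  # 2
```

Passing only a subset of papers (for example, those up to a given career year) yields a time-indexed value; which time window the authors used is not stated on the slides.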

SLIDE 8

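Dependence: the fraction of a researcher’s collaborators v satisfying $s_{uv} > s_{vu}$, that is, collaborators who account for a larger share of u’s collaboration than u accounts for in theirs:

$$\text{Dependence}(u) = \frac{\sum_{v \in \Gamma(u)} I(s_{uv} > s_{vu})}{|\Gamma(u)|}$$

where I(·) is the indicator function. This metric indicates the level of one’s research dependence.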
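The dependence formula can be sketched in Python as follows, assuming tie strengths for both directions of each tie are already available in a dictionary keyed by `(a, b)` (a hypothetical format). Whether Γ(u) includes the self-tie in the denominator is not specified on the slide; this sketch excludes it.

```python
def dependence(u, strength):
    """Fraction of u's collaborators v with s_uv > s_vu.

    `strength` maps (a, b) to the tie strength s_ab computed in a's ego network;
    both (u, v) and (v, u) must be present for every collaborator v.
    ASSUMPTION: the self-tie (u, u) is excluded from Gamma(u) here.
    """
    collaborators = [b for (a, b) in strength if a == u and b != u]
    if not collaborators:
        return 0.0
    dependent = sum(1 for v in collaborators if strength[(u, v)] > strength[(v, u)])
    return dependent / len(collaborators)


# Toy strengths (illustrative values only).
toy = {("u", "v"): 0.30, ("v", "u"): 0.10,
       ("u", "b"): 0.10, ("b", "u"): 0.50,
       ("u", "u"): 0.60}
print(dependence("u", toy))  # 0.5
```

Here u depends on v (v holds 30% of u's strength, u only 10% of v's) but not on b, giving a dependence of 0.5.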

SLIDE 9

Diversity: the Shannon entropy of the collaboration-strength distribution:

$$\text{Diversity}(u) = -\sum_{v \in \Gamma(u)} s_{uv} \log s_{uv}$$

This metric investigates how researchers distribute their scientific collaborations among different collaborators.
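The entropy above is a one-liner once the strengths s_uv are known. A minimal Python sketch; the base of the logarithm is not given on the slide, so the natural log is an assumption here.

```python
import math


def diversity(strengths):
    """Shannon entropy -sum_v s_uv * log(s_uv) over u's tie strengths.

    `strengths` maps each v in Gamma(u) to s_uv. ASSUMPTION: natural log.
    Zero strengths are skipped, following the convention 0 * log 0 = 0.
    """
    return -sum(s * math.log(s) for s in strengths.values() if s > 0)


even = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
skewed = {"a": 0.7, "b": 0.1, "c": 0.1, "d": 0.1}
print(round(diversity(even), 4), round(diversity(skewed), 4))  # 1.3863 0.9404
```

Spreading collaboration evenly over four ties gives the maximum entropy log 4, while concentrating it on one collaborator lowers the score.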

SLIDE 10

Self-Collaboration: the fraction of u’s tie strength that is self-collaboration, $s_{uu}$. This metric measures the share of effort devoted to independent research, as opposed to collaborative endeavors.
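A minimal Python sketch of s_uu. The slides do not define the self-tie weight w_uu, so this assumes that on each paper with n_p authors, u's self-tie receives 1/n_p just as each co-author does; every paper then adds exactly 1 to u's total weight. The input format and function name are hypothetical.

```python
def self_collaboration(u, papers):
    """s_uu: the share of u's total tie weight that sits on the self-tie.

    ASSUMPTION: on each paper with n_p authors, u's self-tie gets 1/n_p and
    each co-author gets 1/n_p, so every paper adds exactly 1 to u's total.
    """
    own = [set(a) for a in papers if u in a]
    if not own:
        return 0.0
    w_uu = sum(1.0 / len(authors) for authors in own)
    return w_uu / len(own)  # total weight equals the number of u's papers


papers = [["u", "v"], ["u", "v", "b"], ["u"]]
print(round(self_collaboration("u", papers), 4))  # 0.6111
print(self_collaboration("u", [["u"], ["u"]]))    # 1.0
```

A researcher with only solo papers scores 1.0; each additional co-author on a paper dilutes that paper's contribution to s_uu.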

SLIDE 11

What are Turing Award winners’ collaboration signatures? Are they distinct from other researchers’? Do we have distinctive collaboration signatures conditioned on our scientific impact?

– Turing Award winners
– h-index
– Number of top-venue publications
– Big-hit publications

SLIDE 12

  • J. E. Hirsch. An Index to Quantify an Individual’s Scientific Research Output. PNAS 102(45). 2005.

h-index

A researcher’s h-index can be used to quantify his/her scientific impact.
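Hirsch's h-index is the largest h such that a researcher has h papers with at least h citations each. A minimal Python sketch (the function name is illustrative):

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break  # counts only decrease from here, so h cannot grow
    return h


print(h_index([10, 8, 5, 4, 3]))  # 4
print(h_index([25, 8, 5, 3, 3]))  # 3
```

Sorting in descending order means the first rank whose citation count falls below the rank ends the search.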

SLIDE 13

[Figure: sociability vs. career year (years 1 to 30), one curve each for Turing Award winners and h-index groups [1, 9], [10, 19], [20, 29], [30, 39], [40, 49], [50, 59], and [60, 123].]

Researchers with higher h-indices have relatively greater sociability, though sociability increases to a peak for all groups.

Given a researcher’s h-index in 2012 and the year of his/her first publication, his/her collaboration signature at each year is characterized.

x-axis: the xth year of one’s research career.
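The per-group curves described above can be sketched by binning researchers on their 2012 h-index (using the legend's bins) and averaging a signature at each career year. This is a minimal Python sketch: the input format is hypothetical, and both the career-year indexing (x = year − first year + 1) and the use of the arithmetic mean are assumptions.

```python
from collections import defaultdict
from statistics import mean

# h-index bins from the figure legend (Turing Award winners form a separate curve).
H_BINS = [(1, 9), (10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 123)]


def h_group(h):
    """Label of the legend bin containing h, or None if h is outside all bins."""
    for lo, hi in H_BINS:
        if lo <= h <= hi:
            return f"h-index [{lo}, {hi}]"
    return None


def group_curves(researchers):
    """Mean signature value per (h-index group, career year).

    `researchers` holds (h_index_2012, first_year, {year: value}) tuples, a
    hypothetical format. ASSUMPTION: career year x = year - first_year + 1.
    """
    buckets = defaultdict(list)
    for h, first_year, series in researchers:
        group = h_group(h)
        if group is None:
            continue
        for year, value in series.items():
            buckets[(group, year - first_year + 1)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


curves = group_curves([(12, 2000, {2000: 3, 2001: 5}), (15, 1990, {1990: 1})])
print(curves)
```

Aligning researchers by career year rather than calendar year is what lets researchers who started in different decades share one curve.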

SLIDE 14

[Figure: h-index (y-axis) vs. number of years since first publication (x-axis).]

h-indices range from 25 to 83 in 2012.

SLIDE 15

[Figure: dependence vs. career year (years 1 to 30).]

Researchers’ dependence scores generally decrease in the early career stages and take time to increase again.


SLIDE 16

[Figure: diversity vs. career year (years 1 to 30).]

The diversity values of researchers in different h-index groups tend to diverge over time and eventually stabilize.


SLIDE 17

[Figure: self-collaboration vs. career year (years 1 to 30).]

Among researchers with different h-indices, long-term differences in self-collaboration are identifiable early in their careers.


SLIDE 18

Top venues are drawn from 8 computer science focus areas, with the top 3 venues chosen for each area.

Top Venues

A researcher’s number of top-venue publications can be used to quantify his/her scientific impact.

SLIDE 19

[Figure: sociability vs. #top-venue papers (log scale), one curve per area: AI, IR, CV, ML, TH, DB, DM, NLP.]

Regardless of research area, the degree of sociability exhibited by researchers tends to increase with top-venue publications.

Artificial Intelligence (AI)

  • IJCAI, AAAI, ECAI

Information Retrieval (IR)

  • SIGIR, ECIR, TREC

Computer Vision (CV)

  • CVPR, ICCV, ECCV

Machine Learning (ML)

  • ICML, NIPS, ECML

Theory (TH)

  • FOCS, STOC, SODA

Databases (DB)

  • SIGMOD, VLDB, ICDE

Data Mining (DM)

  • KDD, ICDM, SDM

Natural Language Processing (NLP)

  • ACL, EMNLP, COLING
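Counting a researcher's top-venue publications per area amounts to a lookup against the venue list above. A minimal Python sketch; the function name and input format (one venue string per paper) are hypothetical, and exact string matching is an assumption, since real venue names would need normalization first.

```python
from collections import Counter

# Top-3 venues per area, as listed on the slide.
TOP_VENUES = {
    "AI": {"IJCAI", "AAAI", "ECAI"},
    "IR": {"SIGIR", "ECIR", "TREC"},
    "CV": {"CVPR", "ICCV", "ECCV"},
    "ML": {"ICML", "NIPS", "ECML"},
    "TH": {"FOCS", "STOC", "SODA"},
    "DB": {"SIGMOD", "VLDB", "ICDE"},
    "DM": {"KDD", "ICDM", "SDM"},
    "NLP": {"ACL", "EMNLP", "COLING"},
}
VENUE_TO_AREA = {v: area for area, venues in TOP_VENUES.items() for v in venues}


def top_venue_counts(venues):
    """Count a researcher's top-venue papers per area.

    `venues` lists the venue of each paper. ASSUMPTION: names already match
    the strings above exactly; papers in other venues are ignored.
    """
    return Counter(VENUE_TO_AREA[v] for v in venues if v in VENUE_TO_AREA)


counts = top_venue_counts(["KDD", "ICDM", "SIGIR", "WWW"])
print(counts["DM"], counts["IR"], sum(counts.values()))  # 2 1 3
```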

SLIDE 20

[Figure: dependence vs. #top-venue papers (log scale), one curve per area: AI, IR, CV, ML, TH, DB, DM, NLP.]

Regardless of research area, research dependence decreases as the number of top-venue publications grows.


SLIDE 21

[Figure: diversity vs. #top-venue papers (log scale), one curve per area: AI, IR, CV, ML, TH, DB, DM, NLP.]

Regardless of research area, the degree of diversity exhibited by researchers tends to increase with top-venue publications.


SLIDE 22

Big-Hit Papers

A researcher’s most cited publication can be used to quantify his/her scientific impact.
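The big-hit measure and the citation bins used in the following figures ([10, 100), [100, 1000), [1000, 10000), [10000, +)) can be sketched in Python. Function names are hypothetical, and researchers whose big hit has fewer than 10 citations are assumed to fall outside the plotted groups.

```python
def big_hit(citations):
    """Citation count of a researcher's most cited paper (0 if none)."""
    return max(citations, default=0)


def big_hit_bin(c):
    """Map a big-hit citation count to the legend's decade bins.

    ASSUMPTION: counts below 10 are outside every plotted bin.
    """
    if c < 10:
        return None
    if c >= 10_000:
        return "[10000, +)"
    lo = 10
    while lo * 10 <= c:  # integer arithmetic avoids log10 rounding issues
        lo *= 10
    return f"[{lo}, {lo * 10})"


print(big_hit_bin(big_hit([3, 250, 40])))  # [100, 1000)
```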

SLIDE 23

[Figure: sociability vs. year (1975 to 2000), one curve per big-hit group: [10, 100), [100, 1000), [1000, 10000), and [10000, +) citations.]

Researchers with high sociability tend to have big-hit publications.

SLIDE 24

[Figure: dependence vs. year (1975 to 2000), one curve per big-hit group: [10, 100), [100, 1000), [1000, 10000), and [10000, +) citations.]

Researchers with big-hit publications tend to have relatively low dependence.

SLIDE 25

[Figure: diversity vs. year (1975 to 2000), one curve per big-hit group: [10, 100), [100, 1000), [1000, 10000), and [10000, +) citations.]

Researchers with high diversity tend to have big-hit publications.

SLIDE 26

Based on these findings, we use collaboration signatures to predict scientific impact.
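The slides do not state which model maps collaboration signatures to impact. As one hedged illustration, ordinary least squares (a swapped-in technique, not necessarily the authors' choice) can regress an impact measure on the four signatures (sociability, dependence, diversity, self-collaboration). The function names and the toy data below are hypothetical.

```python
def fit_ols(X, y):
    """Least-squares weights [intercept, w1, ..., wd] via the normal equations.

    Solves (A^T A) w = A^T y by Gaussian elimination with partial pivoting,
    where A is X with a leading column of ones. Collinear features make
    A^T A singular, which may raise ZeroDivisionError or give unstable weights.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    k = len(A[0])
    M = [[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * t for r, t in zip(A, y)) for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        b[col], b[pivot] = b[pivot], b[col]
        for i in range(col + 1, k):
            factor = M[i][col] / M[col][col]
            for j in range(col, k):
                M[i][j] -= factor * M[col][j]
            b[i] -= factor * b[col]
    w = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        w[i] = (b[i] - sum(M[i][j] * w[j] for j in range(i + 1, k))) / M[i][i]
    return w


def predict(w, x):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


# Toy rows: (sociability, dependence, diversity, self_collaboration), illustrative only.
X = [[0, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0],
     [0, 0, 0, 1], [1, 1, 1, 1], [2, 1, 0, 3]]
y = [1 + 2 * a - b + 0.5 * c + 3 * d for a, b, c, d in X]
w = fit_ols(X, y)
print([round(c, 6) for c in w])  # [1.0, 2.0, -1.0, 0.5, 3.0]
```

Because the toy targets are an exact linear function of the features, the fit recovers the generating coefficients; on real data the target might be the (log) h-index, as the later log-log plots suggest.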

SLIDE 27

[Figure: Predictiveness vs. First Year Publishing (x-axis: year, 1950 to 2000; y-axis: R² and PCC).]

Scientific impact can be reasonably inferred from our four simple collaboration signatures even across generations of researchers.

SLIDE 28

[Figure: Predictiveness vs. Number of Years Publishing (x-axis: career year, 1 to 30; y-axis: R² and PCC).]

With longer collaboration signatures, future scientific impact can be predicted with increasing fidelity, as measured by R² and the Pearson correlation coefficient (PCC).
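The two fidelity measures can be computed directly from actual and predicted values. A minimal Python sketch using the standard definitions (R² = 1 − SS_res / SS_tot; PCC = covariance divided by the product of standard deviations); function names are illustrative.

```python
import math


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


actual = [1, 2, 3, 4]
scaled = [2, 4, 6, 8]  # perfectly correlated, but on the wrong scale
print(round(r_squared(actual, scaled), 6), round(pearson(actual, scaled), 6))  # -5.0 1.0
```

The example shows why both measures are worth reporting: predictions that track the ranking perfectly but miss the scale get PCC = 1 yet a negative R².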

SLIDE 29

[Figure: Actual vs. Predicted Author h-indices, log-log axes (data vs. predicted), in three panels: 1970-1974, 1980-1984, 1990-1994.]

Strong correlation between collaboration signatures and future scientific impact.

SLIDE 30

[Figure: Actual vs. Predicted Author h-indices, log-log axes (data vs. predicted), in three panels: First 5 Years, First 15 Years, First 25 Years.]

Strong correlation between collaboration signatures and future scientific impact.

SLIDE 31

Army Research Laboratory (ARL), U.S. Air Force Office of Scientific Research (AFOSR), Defense Advanced Research Projects Agency (DARPA), National Science Foundation (NSF)

SLIDE 32

Collaboration signatures reveal scientific impact.

– Scholars with dissimilar impact produce distinctive collaboration signatures.
– Scientific impact (e.g., h-index) can be inferred from collaboration signatures.

Turing Award winners display unique collaboration signatures.

– Low level of sociability and diversity in collaborations.
– High level of self-collaboration.
– Lifetime stability of collaboration signatures.

SLIDE 33

Q & A

Will your ASONAM’15 paper or next paper increase your h-index?

Dong, Johnson, Chawla. Will This Paper Increase Your h-index? Scientific Impact Prediction. In ACM WSDM’15.