Collaboration Signatures Reveal Scientific Impact
Yuxiao Dong, Reid A. Johnson, Yang Yang, Nitesh V. Chawla
Interdisciplinary Center for Network Science and Applications, Department of Computer Science and Engineering, University of Notre Dame
2
Collaboration is an integral element of the scientific process that often leads to findings with significant impact.
3
A real-world academic dataset from AMiner [1, 2].
1. J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, Z. Su. ArnetMiner: Extraction and Mining of Academic Social Networks. KDD’08.
2. https://aminer.org/billboard/AMinerNetwork
Dataset statistics: 1,712,433 authors; 2,092,356 papers; 4,258,615 collaboration relationships (co-author pairs).
4
Figure: Year vs. Number of Authors per Publication (x-axis: year, 1970-2010; y-axis: #authors per publication).
Research collaborations are becoming increasingly prevalent.
5
Figure: Year (1950-2010) vs. (a) Number of Publications (#papers), (b) Number of Authors (#authors, #new-authors), and (c) Average Values (#papers per author, #authors per paper).
The average number of papers per author has remained roughly constant, while collaboration (authors per paper) has expanded substantially.
6
u’s collaboration ego network consists of the ego u and u’s collaboration relationships, including the self-collaboration of u with itself.
P: the set of publications that u and v co-authored; n_p: the number of authors of publication p; Γ(u): u’s collaborations in the ego network.
Tie Weight:
$$w_{uv} = \sum_{p \in P} \frac{1}{n_p}$$
Tie Strength:
$$s_{uv} = \frac{w_{uv}}{\sum_{k \in \Gamma(u)} w_{uk}}$$
Figure: example ego network of u with collaborators v, b, c, d, e, f, g; each tie is labeled with its weight w_uv.
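The tie weight and tie strength definitions can be sketched in Python. This is a minimal illustration, not the authors' implementation: papers are assumed to be given as lists of author names, and it is a hypothetical convention (flagged in the code) that every paper also adds 1/n_p to each author's self-collaboration tie w_uu, so that Γ(u) contains u itself.

```python
from collections import defaultdict


def tie_weights(papers):
    """Return w[u][v] = sum of 1/n_p over the papers p that u and v co-authored.

    `papers` is a list of author-name lists, one per publication.
    Hypothetical assumption: each paper also adds 1/n_p to the
    self-collaboration tie w[u][u] of every one of its authors.
    """
    w = defaultdict(lambda: defaultdict(float))
    for authors in papers:
        authors = list(dict.fromkeys(authors))  # de-duplicate names, keep order
        n_p = len(authors)
        for u in authors:
            for v in authors:  # v == u gives the self-collaboration tie
                w[u][v] += 1.0 / n_p
    return w


def tie_strengths(w):
    """Return s[u][v] = w[u][v] / sum_k w[u][k], normalized within u's ego network."""
    s = {}
    for u, row in w.items():
        total = sum(row.values())
        s[u] = {v: weight / total for v, weight in row.items()}
    return s


papers = [["u", "v"], ["u", "v", "b"], ["u"]]
w = tie_weights(papers)
s = tie_strengths(w)
print(round(w["u"]["v"], 4))  # 1/2 + 1/3 -> 0.8333
print(round(s["u"]["v"], 4))  # 0.8333 / 3.0 -> 0.2778
print(round(s["v"]["u"], 4))  # 0.8333 / 2.0 -> 0.4167
```

Because w is symmetric but each row is normalized separately, s_uv and s_vu generally differ: in this toy example v devotes a larger share of its collaboration to u than u devotes to v.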
7
Sociability: the number of collaborators. This metric examines the number of collaboration relationships that researchers can maintain throughout their academic careers.
Sociability Dependence Diversity Self-Collaboration
$$\text{Sociability}(u) = |\Gamma(u)|$$
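A minimal sketch of sociability, assuming a hypothetical input format of one author list per paper. Whether the self-collaboration tie counts toward |Γ(u)| is not spelled out on the slide; this sketch assumes it does not, so only distinct co-authors are counted.

```python
def sociability(papers, u):
    """Count u's distinct co-authors across all papers.

    Assumption: the self-collaboration tie is excluded from the count.
    """
    coauthors = set()
    for authors in papers:
        if u in authors:
            coauthors.update(a for a in authors if a != u)
    return len(coauthors)


papers = [["u", "v"], ["u", "v", "b"], ["u"], ["c", "d"]]
print(sociability(papers, "u"))  # 2 (v and b)
```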
8
Dependence: the fraction of a researcher’s collaborators v that satisfy s_uv > s_vu. This metric indicates the level of one’s research dependence.
$$\text{Dependence}(u) = \frac{\sum_{v \in \Gamma(u)} I(s_{uv} > s_{vu})}{|\Gamma(u)|}$$
where I(·) is the indicator function.
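The dependence formula can be sketched as follows, taking tie strengths as a nested dictionary s[u][v]. The strengths below are a hand-computed toy example (papers {u, v}, {u, v, b}, {u}, under the hypothetical convention that each paper also adds 1/n_p to the self tie), not data from the study; the self tie is assumed to be excluded from the collaborator set.

```python
def dependence(s, u):
    """Fraction of u's collaborators v for which s[u][v] > s[v][u].

    Assumption: u's own self-collaboration tie is not treated as a collaborator.
    """
    collaborators = [v for v in s[u] if v != u]
    if not collaborators:
        return 0.0
    dependent = sum(1 for v in collaborators if s[u][v] > s[v].get(u, 0.0))
    return dependent / len(collaborators)


# Toy tie strengths (each row sums to 1); hypothetical values for illustration.
s = {
    "u": {"u": 11 / 18, "v": 5 / 18, "b": 1 / 9},
    "v": {"v": 5 / 12, "u": 5 / 12, "b": 1 / 6},
    "b": {"b": 1 / 3, "u": 1 / 3, "v": 1 / 3},
}
for author in ("u", "v", "b"):
    print(author, dependence(s, author))
# u 0.0, v 0.5, b 1.0
```

Here b, who appears only on a paper shared with both u and v, gets the maximum dependence score, while u, who also publishes alone, gets zero.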
9
Diversity: the Shannon entropy of the collaboration strength distribution. This metric investigates how researchers distribute scientific collaborations among different collaborators.
$$\text{Diversity}(u) = -\sum_{v \in \Gamma(u)} s_{uv} \log s_{uv}$$
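Diversity is the Shannon entropy of one row of the tie-strength matrix. A minimal sketch; the logarithm base is not given on the slide, so the natural logarithm is an assumption here, and every entry of the row (including the self tie, if present) is summed, as in the formula over Γ(u).

```python
import math


def diversity(strengths):
    """Shannon entropy -sum_v s_uv * ln(s_uv) of u's tie-strength distribution.

    `strengths` maps each member of Gamma(u) to s_uv; zero entries are skipped
    because p * log(p) -> 0 as p -> 0. The natural log is an assumed base.
    """
    return -sum(p * math.log(p) for p in strengths.values() if p > 0)


even = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
skewed = {"a": 0.7, "b": 0.1, "c": 0.1, "d": 0.1}
print(round(diversity(even), 4))  # ln 4 -> 1.3863
print(round(diversity(skewed), 4))
```

Spreading strength evenly over the same collaborators maximizes diversity; concentrating it on one collaborator lowers it, as the second call shows.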
10
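Self-Collaboration: the strength of u’s self-collaboration tie, s_uu, i.e., the fraction of u’s total tie weight that goes to self-collaboration:
$$s_{uu} = \frac{w_{uu}}{\sum_{k \in \Gamma(u)} w_{uk}}$$
This metric measures the effort devoted to independent research, as opposed to collaborative endeavors.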
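Under the hypothetical convention that every paper adds 1/n_p to the author's self tie (an assumption, not stated on the slide), each paper contributes a total weight of n_p × 1/n_p = 1 to its author's row, so the denominator of s_uu is simply u's paper count and s_uu becomes the average of 1/n_p over u's papers. A sketch under that assumption:

```python
def self_collaboration(papers, u):
    """s_uu under the assumed convention that each paper adds 1/n_p to w_uu.

    Then sum_k w_uk equals u's number of papers, so s_uu is the mean of
    1/n_p over the papers u authored (1.0 means every paper is solo).
    """
    shares = [1.0 / len(set(authors)) for authors in papers if u in authors]
    return sum(shares) / len(shares) if shares else 0.0


papers = [["u", "v"], ["u", "v", "b"], ["u"]]
print(round(self_collaboration(papers, "u"), 4))  # (1/2 + 1/3 + 1) / 3 -> 0.6111
```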
11
What are Turing Award winners’ collaboration signatures? Are they distinct from other researchers’? Do we have distinctive collaboration signatures conditioned on our scientific impact?
– Turing Award winners
– h-index
– Number of top-venue publications
– Big-hit publications
12
- J. E. Hirsch. An Index to Quantify an Individual’s Scientific Research Output. PNAS 102(45). 2005.
h-index
A researcher’s h-index can be used to quantify his/her scientific impact.
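Hirsch's h-index is the largest h such that the researcher has at least h papers with at least h citations each. A short sketch computing it from a list of per-paper citation counts:

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


print(h_index([10, 8, 5, 4, 3]))  # 4
print(h_index([25, 8, 5, 3, 3]))  # 3
```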
13
Figure: Sociability vs. career year (x-axis: career years 1-30) for Turing Award winners and for researchers grouped by h-index: [1, 9], [10, 19], [20, 29], [30, 39], [40, 49], [50, 59], [60, 123].
Researchers with higher h-indices have relatively greater sociability, though sociability increases to a peak for all groups.
Given a researcher’s h-index in 2012 and the year of his/her first publication, his/her collaboration signature at each year is characterized.
x-axis: the xth year of one’s research career.
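The per-year characterization described above amounts to grouping researchers by their 2012 h-index, re-indexing each calendar year as a career year, and averaging a signature metric within each (group, career year) cell. The sketch assumes a hypothetical input layout (one dict per researcher with keys h2012, first_year, and metric_by_year) and takes the h-index groups from the figure legend; it is an illustration, not the authors' pipeline.

```python
from collections import defaultdict

# h-index groups taken from the figure legend.
H_GROUPS = [(1, 9), (10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 123)]


def h_group(h):
    """Return the (low, high) group containing h, or None if h falls outside all groups."""
    for low, high in H_GROUPS:
        if low <= h <= high:
            return (low, high)
    return None


def group_curves(researchers):
    """Average a yearly signature metric per (h-index group, career year).

    Each researcher is a dict with hypothetical keys:
      'h2012': h-index in 2012
      'first_year': year of first publication
      'metric_by_year': {calendar year: metric value}
    Career year 1 is the year of the first publication.
    """
    cells = defaultdict(list)
    for r in researchers:
        group = h_group(r["h2012"])
        if group is None:
            continue
        for year, value in r["metric_by_year"].items():
            career_year = year - r["first_year"] + 1
            cells[(group, career_year)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in cells.items()}


researchers = [
    {"h2012": 12, "first_year": 1990, "metric_by_year": {1990: 2, 1991: 4}},
    {"h2012": 15, "first_year": 2000, "metric_by_year": {2000: 4, 2001: 8}},
    {"h2012": 45, "first_year": 1985, "metric_by_year": {1985: 10}},
]
curves = group_curves(researchers)
print(curves[((10, 19), 1)], curves[((10, 19), 2)])  # 3.0 6.0
```

Aligning by career year rather than calendar year lets researchers who started decades apart be compared at the same career stage.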
14
Figure: h-index (y-axis) vs. number of years since first publication (x-axis).
h-indices range from 25 to 83 in 2012.
15
Figure: Dependence vs. career year (x-axis: career years 1-30).
Researchers’ dependence scores generally decrease at the initial career stages and take time to increase.
16
Figure: Diversity vs. career year (x-axis: career years 1-30).
The diversity values of researcher groups with different h-indices tend to diverge over time before eventually stabilizing.
17
Figure: Self-collaboration vs. career year (x-axis: career years 1-30).
Between groups of researchers with different h-indices, a long-term difference in self-collaboration is identifiable early in their careers.
18
Venues are drawn from 8 computer science focus areas, with the top 3 venues chosen for each area.
Top Venues
A researcher’s number of top-venue publications can be used to quantify his/her scientific impact.
19
Figure: Sociability vs. number of top-venue papers (log scale), one curve per area: AI, IR, CV, ML, TH, DB, DM, NLP.
Regardless of research area, the degree of sociability exhibited by researchers tends to increase with top-venue publications.
Artificial Intelligence (AI)
- IJCAI, AAAI, ECAI
Information Retrieval (IR)
- SIGIR, ECIR, TREC
Computer Vision (CV)
- CVPR, ICCV, ECCV
Machine Learning (ML)
- ICML, NIPS, ECML
Theory (TH)
- FOCS, STOC, SODA
Databases (DB)
- SIGMOD, VLDB, ICDE
Data Mining (DM)
- KDD, ICDM, SDM
Natural Language Processing (NLP)
- ACL, EMNLP, COLING
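Counting a researcher's top-venue publications per area is a lookup against the venue list above. A minimal sketch, assuming (hypothetically) that each paper's venue has already been normalized to one of these acronyms; real venue strings usually need cleaning first.

```python
from collections import Counter

# Top-3 venues per area, as listed on the slide.
TOP_VENUES = {
    "AI": {"IJCAI", "AAAI", "ECAI"},
    "IR": {"SIGIR", "ECIR", "TREC"},
    "CV": {"CVPR", "ICCV", "ECCV"},
    "ML": {"ICML", "NIPS", "ECML"},
    "TH": {"FOCS", "STOC", "SODA"},
    "DB": {"SIGMOD", "VLDB", "ICDE"},
    "DM": {"KDD", "ICDM", "SDM"},
    "NLP": {"ACL", "EMNLP", "COLING"},
}
# Invert the table so each venue maps to its area.
VENUE_TO_AREA = {venue: area for area, venues in TOP_VENUES.items() for venue in venues}


def top_venue_counts(paper_venues):
    """Count papers per area whose (normalized) venue is a listed top venue."""
    return Counter(VENUE_TO_AREA[v] for v in paper_venues if v in VENUE_TO_AREA)


print(top_venue_counts(["KDD", "ICDM", "SIGIR", "WWW", "KDD"]))
```

Venues not on the list (WWW in this example) are simply ignored.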
20
Figure: Dependence vs. number of top-venue papers (log scale), one curve per area: AI, IR, CV, ML, TH, DB, DM, NLP.
Regardless of research area, research dependence decreases as the number of top-venue publications grows.
21
Figure: Diversity vs. number of top-venue papers (log scale), one curve per area: AI, IR, CV, ML, TH, DB, DM, NLP.
Regardless of research area, the degree of diversity exhibited by researchers tends to increase with top-venue publications.
22
Big-Hit Papers
A researcher’s most cited publication can be used to quantify his/her scientific impact.
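The big-hit measure takes the citation count of a researcher's most-cited paper and places it in one of the bins shown in the big-hit figures ([10, 100), [100, 1000), [1000, 10000), [10000, +∞)). A short sketch; leaving researchers whose best paper has fewer than 10 citations unbinned is an assumption.

```python
# Citation bins for the most-cited ("big-hit") paper, as in the figure legends.
BIG_HIT_BINS = [(10, 100), (100, 1000), (1000, 10000), (10000, float("inf"))]


def big_hit_bin(citations):
    """Return the [low, high) bin of the researcher's most-cited paper, or None.

    Assumption: researchers whose best paper has fewer than 10 citations are left unbinned.
    """
    if not citations:
        return None
    top = max(citations)
    for low, high in BIG_HIT_BINS:
        if low <= top < high:
            return (low, high)
    return None


print(big_hit_bin([3, 45, 1200]))  # (1000, 10000)
print(big_hit_bin([2, 5]))         # None
```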
23
Figure: Sociability vs. year (1975-2000) for researchers grouped by the citation count of their big-hit paper: [10, 100), [100, 1000), [1000, 10000), [10000, +∞).
Researchers with high sociability tend to have big-hit publications.
24
Figure: Dependence vs. year (1975-2000) for researchers grouped by the citation count of their big-hit paper: [10, 100), [100, 1000), [1000, 10000), [10000, +∞).
Researchers with big-hit publications tend to have relatively low dependence.
25
Figure: Diversity vs. year (1975-2000) for researchers grouped by the citation count of their big-hit paper: [10, 100), [100, 1000), [1000, 10000), [10000, +∞).
Researchers with high diversity tend to have big-hit publications.
26
Based on these findings, we use collaboration signatures to predict scientific impact.
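The slides do not specify which predictive model is used. As a hedged stand-in, the sketch below fits an ordinary least-squares regression from the four signature values (sociability, dependence, diversity, self-collaboration) to h-index with NumPy. The feature rows are invented toy numbers chosen so the design matrix has full rank, and the targets are generated from a known linear rule so the fit can be checked.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit of y ~ b0 + X @ b; returns [b0, b1, ..., bk]."""
    X = np.asarray(X, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coef


def predict(coef, X):
    """Apply intercept coef[0] and weights coef[1:] to each feature row."""
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]


# Toy rows: [sociability, dependence, diversity, self-collaboration] (hypothetical values).
X = [
    [10, 0.20, 1.5, 0.30],
    [25, 0.20, 1.5, 0.30],
    [10, 0.60, 1.5, 0.30],
    [10, 0.20, 2.5, 0.30],
    [10, 0.20, 1.5, 0.50],
    [30, 0.10, 2.2, 0.15],
]
true_coef = np.array([1.0, 0.5, -3.0, 2.0, -4.0])
y = predict(true_coef, X)  # synthetic targets from a known linear rule
coef = fit_ols(X, y)
print(np.round(coef, 3))
```

On real signatures the fit is not exact, which is why predictiveness is reported with R² and PCC.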
27
Figure: Predictiveness (R² and PCC) vs. First Year Publishing (1950-2000).
Scientific impact can be reasonably inferred from our four simple collaboration signatures even across generations of researchers.
28
Figure: Predictiveness (R² and PCC) vs. Number of Years Publishing (career years 1-30).
With longer collaboration signatures, future scientific impact can be predicted with increasing fidelity, as measured by the coefficient of determination (R²) and the Pearson correlation coefficient (PCC).
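For reference, R² is taken here as the coefficient of determination 1 - SS_res/SS_tot between actual and predicted values, and PCC as the Pearson correlation coefficient; these are the standard definitions, and the slides do not say whether they are computed on raw or log-scaled h-indices. A plain-Python sketch with toy numbers:

```python
import math


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]
print(round(r_squared(actual, predicted), 3))  # 0.948
print(round(pearson(actual, predicted), 3))    # 0.975
```

The two measures answer different questions: PCC ignores systematic offsets and scaling, whereas R² penalizes any gap between predicted and actual values.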
29
Figure: Actual vs. Predicted Author h-indices (log-log axes; panels: 1970-1974, 1980-1984, 1990-1994).
Strong correlation between collaboration signatures and future scientific impact.
30
Figure: Actual vs. Predicted Author h-indices (log-log axes; panels: First 5 Years, First 15 Years, First 25 Years).
Strong correlation between collaboration signatures and future scientific impact.
31
Army Research Laboratory (ARL) U.S. Air Force Office of Scientific Research (AFOSR) Defense Advanced Research Projects Agency (DARPA) National Science Foundation (NSF)
32
Collaboration signatures reveal scientific impact.
– Scholars with dissimilar impact produce distinctive collaboration signatures. – Scientific impact (e.g., h-index) can be inferred from collaboration signatures.
Turing Award winners display unique collaboration signatures.
– Low level of sociability and diversity in collaborations. – High level of self-collaboration. – Lifetime stability of collaboration signatures.
33
Q & A
Will your ASONAM’15 paper, or your next paper, increase your h-index?
Dong, Johnson, Chawla. Will This Paper Increase Your h-index? Scientific Impact Prediction. In ACM WSDM’15.