Modeling co-authorship and citation networks

Analytical models:
1) Continuum models
2) Master Equation method
3) Rate Equation method
4) Generating Function
Other models:
1) de Solla Price's modification of the Simon model
2) TARL Model
3) Group based Yule model

Why are there so many models? Different models emphasize different aspects of citation networks. No single model is able to reproduce the citation patterns of different datasets, and parameters have to be tuned to fit each dataset.

Citation properties that we want to model:
1) Papers with more citations tend to be cited more.
2) After a certain period of time, the citations a paper receives drop.
3) Papers are “rediscovered” after lying dormant for a certain period of time.
The first is modeled by preferential attachment. The second has to be modeled by including an “aging” bias in the system.
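The first two properties can be illustrated with a toy growth simulation. This is a minimal sketch, not a model taken from the slides: it assumes one new paper per time step, one citation per new paper, a citation weight of (citations + 1), and an exponential aging factor exp(-age / aging_scale). The function name `simulate_citations` and the parameter `aging_scale` are hypothetical.

```python
import math
import random


def simulate_citations(n_papers, aging_scale=None, seed=0):
    """Grow a toy citation network; return the citation count of each paper.

    Assumptions (not from the slides): one paper is added per step, each new
    paper cites exactly one earlier paper, and the chance of being cited is
    proportional to (citations + 1) * exp(-age / aging_scale).
    With aging_scale=None the aging factor is switched off.
    """
    rng = random.Random(seed)
    birth = [0]       # time step at which each paper appeared
    citations = [0]   # citations received so far
    for t in range(1, n_papers):
        weights = []
        for born, cites in zip(birth, citations):
            weight = cites + 1.0  # preferential attachment term
            if aging_scale is not None:
                weight *= math.exp(-(t - born) / aging_scale)  # aging bias
            weights.append(weight)
        target = rng.choices(range(len(citations)), weights=weights, k=1)[0]
        citations[target] += 1
        birth.append(t)
        citations.append(0)
    return citations


if __name__ == "__main__":
    no_aging = simulate_citations(500)
    with_aging = simulate_citations(500, aging_scale=5.0)
    print("max citations without aging:", max(no_aging))
    print("max citations with aging:   ", max(with_aging))
```

Comparing the two runs shows how an aging bias limits the advantage of early papers. The third property, rediscovery of dormant papers, is not covered by this sketch.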

Continuum approach (Barabasi-Albert). Variants:
● Linear preferential attachment (B-A)
● Aging with exponential or power-law decay (Zhu et al.; Sen et al.)
● Accelerating growth
● (B-A) + link length dependence (M-M-S)
● Multiplicative node fitness (B-B)
● Application to citation networks by P. Sen

Master-Equation method (Dorogovtsev et al.)
● Initial attractiveness + preferential attachment (D-M-S)
● Edge inheritance (D-M-S)
● Aging included (D-M): the scale-free structure disappears at particular values (time function taken to be a power law). Generalized by P.S and K.B.H, which reinstates the scale-free structure.

TARL model (Topic, Aging and Recursive Linking)
An evolving bipartite network:
Vertices: authors and articles
Edges: undirected (author-author), directed (author-article), directed (article-article)
Assumptions:
● Each paper has a fixed number of authors and a fixed number of references
● Each author and each paper has exactly one topic
● Consumed-produced relationships among papers and authors are restricted to authors and papers within the same topic
● A single fixed number of papers per author per year is assumed

TARL Model
The modeling process:
A set of authors and a set of papers with randomly assigned topics are generated. A predefined number of coauthors sharing the same topic is randomly selected and assigned to each paper via ‘produced by’ links.
● All papers have authors, but there are authors without papers
● Initially there are no coauthor or paper citation links, making it advantageous to start the model 1 year earlier than the period of interest

TARL Model
The modeling process (cont.):
● At each time step (a year), a specified number of authors is created and added to the set of existing authors
● Each author in the new set randomly identifies a set of coauthors, reads a specified number of randomly selected papers from within his/her topic, and produces a specified number of new papers
● Each new paper cites a fixed number of existing papers. To select the papers cited, authors consume (read) only a small set of papers because of time constraints
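The steps above can be sketched as a small simulation. This is a simplified illustration under stated assumptions, not the TARL implementation: the names `Paper`, `pick_team` and `run_tarl_sketch` and all default parameter values are hypothetical, a paper takes its topic from a randomly chosen lead author (so that every paper has at least one author), every author in the updated set leads one paper per year, and the aging bias and recursive linking (following the references of the papers read) are left out.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Paper:
    topic: int
    year: int
    authors: list                                    # author indices
    references: list = field(default_factory=list)  # indices of cited papers


def pick_team(lead, author_topic, team_size, rng):
    """Return the lead author plus randomly chosen coauthors with the same topic."""
    peers = [a for a, topic in enumerate(author_topic)
             if topic == author_topic[lead] and a != lead]
    return [lead] + rng.sample(peers, min(team_size - 1, len(peers)))


def run_tarl_sketch(n_years=5, n_topics=3, init_authors=12, init_papers=12,
                    new_authors_per_year=4, authors_per_paper=2,
                    papers_read=4, refs_per_paper=2, seed=0):
    rng = random.Random(seed)
    # Initial authors with randomly assigned topics.
    author_topic = [rng.randrange(n_topics) for _ in range(init_authors)]
    # Initial papers (year 0): they have authors but no citation links.
    papers = []
    for _ in range(init_papers):
        lead = rng.randrange(init_authors)
        team = pick_team(lead, author_topic, authors_per_paper, rng)
        papers.append(Paper(author_topic[lead], 0, team))

    for year in range(1, n_years + 1):
        # New authors join the existing set.
        author_topic.extend(rng.randrange(n_topics)
                            for _ in range(new_authors_per_year))
        n_existing = len(papers)  # only papers from earlier years can be read
        new_papers = []
        for lead, topic in enumerate(author_topic):
            team = pick_team(lead, author_topic, authors_per_paper, rng)
            # Read a small random set of same-topic papers, then cite from it.
            pool = [i for i in range(n_existing) if papers[i].topic == topic]
            read = rng.sample(pool, min(papers_read, len(pool)))
            refs = rng.sample(read, min(refs_per_paper, len(read)))
            new_papers.append(Paper(topic, year, team, refs))
        papers.extend(new_papers)
    return author_topic, papers


if __name__ == "__main__":
    topics, papers = run_tarl_sketch()
    print(len(topics), "authors,", len(papers), "papers")
```

Because the initial papers carry no references, citation counts for the first simulated years are low, which is the reason for starting the model a year early.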

TARL Model
The probability of citing a paper written t years ago was fit by a Weibull distribution in which the parameter b controls the rightward extension of the curve. As b increases, the probability of citing older papers increases. For the present purposes, a small value of b represents a strong aging bias that favors citing papers that have been published recently.
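The effect of b can be checked numerically. The sketch below assumes the standard two-parameter Weibull density, (a/b)(t/b)^(a-1) exp[-(t/b)^a], multiplied by a scale factor c; the exact parameterization used for the fit is not given here, so the formula, the function names and the parameter values are assumptions for illustration only.

```python
import math


def weibull_citation_weight(age, a, b, c=1.0):
    """Weight for citing a paper that is `age` years old.

    Assumed form: c * (a / b) * (age / b)**(a - 1) * exp(-(age / b)**a).
    """
    if age <= 0:
        return 0.0
    return c * (a / b) * (age / b) ** (a - 1) * math.exp(-((age / b) ** a))


def mean_cited_age(a, b, max_age=50):
    """Average age of a cited paper when ages 1..max_age are weighted as above."""
    ages = range(1, max_age + 1)
    weights = [weibull_citation_weight(t, a, b) for t in ages]
    return sum(t * w for t, w in zip(ages, weights)) / sum(weights)


if __name__ == "__main__":
    for b in (2.0, 4.0, 8.0):
        print(f"b = {b}: mean cited age = {mean_cited_age(2.0, b):.2f} years")
```

A larger b moves the weight toward older papers, so the mean cited age grows with b; a small b concentrates citations on recent papers.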

Model Validation
To validate the TARL model, a 20-year (1982–2001) PNAS data set was used. It contains 45,120 regular articles written by 105,915 unique authors. Note that the citation counts, particularly for younger papers, are artificially low because those papers have not existed in the literature long enough to garner many citations.
Table 1. PNAS statistics for each year: total number of papers (#p), unique authors (#a), references (#r), citations received per paper (#c), number of coauthors per paper (#ca), and number of citations within the PNAS data set (#cwin).

Model Validation
The PNAS dataset suggested systematic deviations from a power law: the most cited papers are cited less often than predicted by a power law, and the less cited papers are cited more often than predicted. => AGING

Statistic
Total number of actual and simulated papers (#p) and authors (#a) in panel (a), and received citations (#cwin) in panel (b). The fit for the first 2 years is poor because the model has neither initial citation links nor a record of papers before 1981. (How can this be avoided?)

Simple model to incorporate age bias using the Weibull function.
The probability that a vertex born at time s has degree k at time t is denoted p(k,s,t). The evolution of each individual vertex is then given by the master equation

$$p(k,s,t+1) = p(k-1)\,p(t-s)\,p(k-1,s,t) + \left[1 - p(k)\,p(t-s)\right] p(k,s,t)$$

where

$$p(k) = \frac{k}{2t}, \qquad p(t-s) = C\,\frac{a\,(t-s)^{a-1}}{b^{a}}\,\exp\!\left[-\left(\frac{t-s}{b}\right)^{a}\right]$$

Solving the difference-differential equation obtained from the master equation gives, for the case a = 2, a closed-form probability p(k,s,t) in terms of C, (t − s), b, powers 2k − 1 and the factorial (k − 1)!.
To obtain the distribution P(k) at a particular t, we sum over all possible values of s.
To obtain the total citations, we compute the average k for each year and multiply it by the total number of papers in that year.
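The master equation and the sum over s can also be evaluated numerically rather than in closed form. The sketch below is an illustration under assumptions: it reads the master equation as p(k,s,t+1) = p(k-1) p(t-s) p(k-1,s,t) + [1 - p(k) p(t-s)] p(k,s,t) with p(k) = k/2t, takes the aging term to be a Weibull density in the age t - s scaled by C, starts every vertex at degree k = 1, and averages over birth times assuming one new paper per time step. Function names and parameter values are hypothetical.

```python
import math


def weibull(age, a, b, C=1.0):
    """Assumed aging term: C * a * age**(a-1) / b**a * exp(-(age/b)**a)."""
    if age <= 0:
        return 0.0
    return C * a * age ** (a - 1) / b ** a * math.exp(-((age / b) ** a))


def degree_probabilities(s, t_end, a=2.0, b=5.0, C=1.0):
    """Iterate the master equation for a vertex born at time s (s >= 1).

    Returns a list p where p[k] approximates p(k, s, t_end).
    """
    k_max = t_end - s + 2           # degree cannot exceed 1 + (t_end - s)
    p = [0.0] * (k_max + 1)
    p[1] = 1.0                      # each new vertex comes with one edge
    for t in range(s, t_end):
        aging = weibull(t - s, a, b, C)
        new = [0.0] * (k_max + 1)
        for k in range(1, k_max + 1):
            gain = (k - 1) / (2 * t) * aging * p[k - 1]   # moved up from k-1
            stay = (1 - k / (2 * t) * aging) * p[k]       # stayed at k
            new[k] = gain + stay
        p = new
    return p


def mean_degree(p):
    """Average k for one birth time, given p[k]."""
    return sum(k * pk for k, pk in enumerate(p))


def degree_distribution(t_end, a=2.0, b=5.0, C=1.0):
    """P(k) at time t_end: average of p(k, s, t_end) over s = 1..t_end."""
    total = [0.0] * (t_end + 2)
    for s in range(1, t_end + 1):
        for k, pk in enumerate(degree_probabilities(s, t_end, a, b, C)):
            total[k] += pk
    return [x / t_end for x in total]


if __name__ == "__main__":
    P = degree_distribution(40)
    for k in range(1, 6):
        print(k, round(P[k], 4))
```

Multiplying `mean_degree` for each birth year by the number of papers born in that year gives the per-year totals described above. Dividing by `t_end` in `degree_distribution` is a normalization choice made for this sketch.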

Initial conditions considered:
● At t = 1, the first vertex (paper) is created.
● Each new vertex comes with one edge.
● The number of papers increases at a constant rate with time.
As a result, we get systematic departures from the power law, as observed in the PNAS data. However, because of our assumption that the first paper was created at t = 1, the total number of citations does not match the data.

Much more remains to be done:
● Proper initialization has to be performed.
● Co-evolution of authors and articles has to be modeled.
● We still have to introduce a parameter into the model that captures the “rediscovery” of a dormant paper.
● Validation with different datasets.
