


  1. Do Citations and Readership Predict Excellent Publications?
     Dasha Herrmannova, The Open University, UK
     Robert Patton, Oak Ridge National Laboratory, USA
     Petr Knoth, The Open University, UK
     Chris Stahl, Oak Ridge National Laboratory, USA

  2. Research question
     Q: Are current research evaluation metrics sufficient for identifying highly influential papers?

  3. Why care about metrics?
     ● Researchers: What to read? Where to publish? Who to collaborate with?
     ● Funding agencies: Who to fund? What are the returns on investment?
     ● Institutions: Are we doing well? What to subscribe to?
     ● Citation analysis and altmetrics link these questions to research papers

  4. Finding what works
     ● ML approach
       ○ Evaluate all methods in terms of precision-recall/accuracy/…
       ○ Requirement: ground truth
     ● Research evaluation
       ○ No ground truth
       ○ Authority often established axiomatically
       ○ JIF, h-index, etc.
     ● Can we build a ground truth dataset?

  5. Our understanding of "impact"
     Low impact vs high impact

  6. Our understanding of "impact"
     ● Seminal works: "Strongly influencing later developments"
     ● Survey papers: "A general view, examination or description of someone or something"

  7. Creating a dataset
     ● Online questionnaire
       ○ Discipline?
       ○ Reference to a survey paper
       ○ Reference to a seminal paper
     ● Collected 314 papers
       ○ Labels (seminal, survey)
       ○ Title, authors, year of publication, abstract, DOI, ...
     ● Available online
       ○ http://trueid.semantometrics.org
     ● Analysis
       ○ Seminal papers on average 10 years older
       ○ Seminal papers cited on average 5 times more

  8. Do citations/readership predict excellent papers?
     ● Classify papers using citations and readership as features
     ● Model
       ○ Select a threshold t
       ○ If cit(d) ≥ t → label as seminal
       ○ Else → label as survey
       ○ Use the threshold with the best accuracy on the training set
     ● Leave-one-out cross-validation
     ● 3 experiments
       ○ Aggregate
       ○ Per discipline
       ○ Per year
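The threshold model on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the toy citation counts in the usage example are invented for demonstration, not drawn from the TrueID dataset.

```python
def best_threshold(counts, labels):
    """Pick the threshold t that maximizes training accuracy of the rule:
    cit(d) >= t -> 'seminal', else 'survey'."""
    candidates = sorted(set(counts)) + [max(counts) + 1]

    def accuracy(t):
        preds = ["seminal" if c >= t else "survey" for c in counts]
        return sum(p == y for p, y in zip(preds, labels)) / len(labels)

    return max(candidates, key=accuracy)


def loocv_accuracy(counts, labels):
    """Leave-one-out CV: train the threshold on n-1 papers, test on the held-out one."""
    correct = 0
    for i in range(len(counts)):
        train_c = counts[:i] + counts[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        t = best_threshold(train_c, train_y)
        pred = "seminal" if counts[i] >= t else "survey"
        correct += pred == labels[i]
    return correct / len(counts)


# Toy data (hypothetical): citation counts and seminal/survey labels.
counts = [500, 320, 40, 15, 700, 25]
labels = ["seminal", "seminal", "survey", "survey", "seminal", "survey"]
acc = loocv_accuracy(counts, labels)
```

Even on this toy set the cross-validated accuracy can fall below what the best threshold achieves on the full data, which is why the results slide reports an upper bound alongside each accuracy.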

  9. Results

     Model            | Data       | Accuracy | Upper bound
     -----------------|------------|----------|------------
     Baseline         | Citations  | -        | 52.87%
     Baseline         | Readership | -        | 52.87%
     Aggregate        | Citations  | 63.06%   | 63.38%
     Aggregate        | Readership | 42.68%   | 52.87%
     Discipline based | Citations  | 45.28%   | 68.11%
     Discipline based | Readership | 42.13%   | 62.60%
     Year based       | Citations  | 55.23%   | 68.62%
     Year based       | Readership | 51.05%   | 65.27%
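The slides do not spell out how the "Upper bound" column is computed. One natural reading is the best accuracy any single threshold can reach when it is chosen on the entire dataset (i.e., with the test labels visible), which no cross-validated threshold model can exceed. Under that assumption, a sketch with invented toy data:

```python
def upper_bound_accuracy(counts, labels):
    """Best accuracy achievable by any single threshold chosen on the full data
    (assumed definition of the table's upper bound, not stated on the slide)."""
    candidates = sorted(set(counts)) + [max(counts) + 1]

    def accuracy(t):
        preds = ["seminal" if c >= t else "survey" for c in counts]
        return sum(p == y for p, y in zip(preds, labels)) / len(labels)

    return max(accuracy(t) for t in candidates)


# Hypothetical counts/labels, not the TrueID data: here one threshold
# separates the two classes perfectly, so the bound is 1.0.
counts = [500, 320, 40, 15, 700, 25]
labels = ["seminal", "seminal", "survey", "survey", "seminal", "survey"]
bound = upper_bound_accuracy(counts, labels)
```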

  10. Conclusion
      ● Both citations and readership provide an improvement over the baseline
      ● Neither of the two metrics is optimal

  11. What next?
      ● Ideal dataset
        ○ Multi-disciplinary
        ○ Time span
        ○ Publication types
        ○ Peer review judgement
      ● Better metrics
        ○ Citation context
        ○ Analyzing content

  12. Thank you! Questions? http://trueid.semantometrics.org
