Who is more likely to gain a large number of citations 2


  1. Who is more likely to gain a large number of citations? • Predicting future influential researchers in a big scholarly network

  2. MENU
     1 Introduction
     2 Dataset & Preprocessing
     3 Predicting
     4 Results & Conclusion

  3. 1 Introduction

  4. Introduction: Whether a paper (or other work) is accepted or recognized often depends on its influence. One of the essential indicators of a scientific work's influence is how often it is cited. In this project, we first introduce a threshold model to get some idea of how information diffuses in real conditions, and then we introduce several regression models that fit the features to the citation counts in order to make predictions.

  5. 2 Dataset & Preprocessing

  6. Dataset: Citation Network Dataset. The citation data is extracted from DBLP, ACM, MAG (Microsoft Academic Graph), and other sources. It can be used for clustering with network and side information, studying influence in the citation network, finding the most influential papers, topic modeling analysis, and so on.

  7. Preprocess
     Record format:
       #*        --- paper title
       #@        --- authors
       #year     --- year
       #conf     --- publication venue
       #citation --- citation number
       #!        --- abstract
       …
     Extraction: extract the text features from the original text.
     Iteration: iterate over the citations of papers by the same author or in the same publication venue to convert these features into numeric values.
     Parsing: use word2vec in Gensim to extract features from the titles and abstracts.
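The record format above can be read with a small parser. The sketch below is written in Python (the slides do not show the authors' preprocessing code) and assumes one tagged field per line, a blank line between records, comma-separated author names, and integer year and citation fields. The helper names `parse_records` and `mean_citations_by` are hypothetical; `mean_citations_by` shows one possible reading of the Iteration step, encoding a venue as the mean citation count of its papers.

```python
from collections import defaultdict
from typing import Dict, List

# Field tags as listed on the slide; the dict keys on the right are our own (assumed) names.
TAGS = {
    "#*": "title",
    "#@": "authors",
    "#year": "year",
    "#conf": "venue",
    "#citation": "citation",
    "#!": "abstract",
}


def parse_records(text: str) -> List[Dict[str, object]]:
    """Parse tagged records (assumed: one field per line, blank line between records)."""
    records: List[Dict[str, object]] = []
    current: Dict[str, object] = {}
    for line in text.splitlines() + [""]:  # the trailing "" flushes the last record
        line = line.strip()
        if not line:
            if current:
                records.append(current)
                current = {}
            continue
        for tag, key in TAGS.items():
            if line.startswith(tag):
                value = line[len(tag):].strip()
                if key == "authors":
                    # Assumption: author names are comma-separated.
                    current[key] = [a.strip() for a in value.split(",") if a.strip()]
                elif key in ("year", "citation"):
                    current[key] = int(value)
                else:
                    current[key] = value
                break  # lines with unknown tags fall through and are ignored


    return records


def mean_citations_by(records: List[Dict[str, object]], key: str) -> Dict[str, float]:
    """Encode a single-valued field (e.g. "venue") as the mean citation count of its papers."""
    totals: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for r in records:
        totals[r[key]] += r["citation"]
        counts[r[key]] += 1
    return {k: totals[k] / counts[k] for k in totals}
```

Lines with unrecognized tags are skipped, so extra fields in a raw dump do not break parsing.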

  8. Parsing
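The slides say word2vec in Gensim is used to turn titles and abstracts into numeric features, but not how the per-word vectors are combined into one feature per paper. A common choice, assumed here, is to average the vectors of the words in the text. In this Python sketch a plain dict of toy vectors stands in for a trained model; in Gensim 4, a trained model's `model.wv` supports the same `word in vectors` and `vectors[word]` lookups. `tokenize` and `average_vector` are hypothetical helpers.

```python
from typing import Dict, List, Sequence

PUNCT = ".,:;!?()"


def tokenize(text: str) -> List[str]:
    """Very simple lower-case whitespace tokenizer (an assumption, not the authors' preprocessing)."""
    return [w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)]


def average_vector(tokens: Sequence[str], vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of in-vocabulary tokens; return a zero vector if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]
```

Out-of-vocabulary words are simply skipped, so a title made only of unseen words maps to the zero vector.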

  9. Analysis Results: We plot several diagrams to get a clearer picture of the preprocessed features.

  10. 3 Predicting

  11. Threshold Model: Generally, earlier publications will have less influence in the future. Each node v has an information acceptance threshold θv and is affected by all of its active neighbor nodes A(v). Node v is activated when the combined influence of its active neighbors A(v) reaches its threshold θv.
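Because the activation condition is cut off on the slide, the following Python sketch assumes the standard linear threshold model: each edge from u to v carries an influence weight, and v becomes active once the total weight from its active neighbors A(v) reaches θv. The function name and the deterministic, fixed-threshold setup are illustrative assumptions.

```python
from typing import Dict, Set


def linear_threshold(
    in_weights: Dict[str, Dict[str, float]],  # in_weights[v][u]: influence weight of neighbor u on v
    thresholds: Dict[str, float],             # thresholds[v]: acceptance threshold theta_v
    seeds: Set[str],                          # initially active nodes
) -> Set[str]:
    """Run linear threshold diffusion until no further node becomes active."""
    active = set(seeds)  # copy so the caller's seed set is not modified
    changed = True
    while changed:
        changed = False
        for v, neighbors in in_weights.items():
            if v in active:
                continue
            # Total weight contributed by the currently active neighbors A(v).
            influence = sum(w for u, w in neighbors.items() if u in active)
            if influence >= thresholds[v]:
                active.add(v)
                changed = True
    return active
```

Because activation is monotone (an active node never deactivates), the final active set does not depend on the order in which nodes are checked.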

  12. Different Models
      Linear Regression & NLR: use regress() to obtain the weight vector for multiple variables, then compute the predictions. NLR (non-linear regression) models add extra features formed by multiplying existing ones.
      SVM: use fitrsvm to fit the model and predict the result. The related errors are calculated as well.
      Regression Tree: use fitrtree to fit the model and predict the result. The related errors are calculated as well.
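The slides fit these models with MATLAB (`regress`, `fitrsvm`, `fitrtree`). As a dependency-free illustration of the Linear Regression and NLR bullets only, the Python sketch below solves the least-squares normal equations directly and builds NLR-style features by multiplying pairs of existing features. Which products the authors actually added is not stated, so the all-pairs choice and the helper names are assumptions.

```python
from itertools import combinations
from typing import List


def add_product_features(rows: List[List[float]]) -> List[List[float]]:
    """Append every pairwise product x_i * x_j (i < j) to each row (assumed NLR features)."""
    return [row + [row[i] * row[j] for i, j in combinations(range(len(row)), 2)] for row in rows]


def fit_least_squares(X: List[List[float]], y: List[float]) -> List[float]:
    """Solve (A^T A) w = A^T y, where A is X with a leading column of ones (intercept)."""
    A = [[1.0] + list(r) for r in X]
    n = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [
        [sum(a[i] * a[j] for a in A) for j in range(n)] + [sum(a[i] * t for a, t in zip(A, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]  # [intercept, w_1, ..., w_k]


def predict(w: List[float], X: List[List[float]]) -> List[float]:
    """Apply the fitted weights (intercept first) to each feature row."""
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)) for row in X]
```

For real data a library solver (for example, QR-based least squares) is numerically safer than forming the normal equations; they are used here only to keep the sketch self-contained.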

  13. Common Steps
      · Data preprocessing: first, we import the data ("output.csv") and extract the citation column as the training target.
      · Split: we then split the data set into a training set and a testing set.
      · Fitting: we fit each model to the features and citations to obtain the optimal weight vector.
      · Post-processing: we sum the citations of each author's papers to represent that author's future impact.
      · Compare: we calculate the errors to compare the models' performance.
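The Split, Post-processing, and Compare steps can be sketched as follows in Python. The slides do not say which split ratio or error measures were used, so the shuffled index split, the mean absolute error (MAE), and the root-mean-square error (RMSE) here are assumptions, and all function names are hypothetical.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def train_test_split(n: int, test_fraction: float, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle row indices with a fixed seed and split them into (train, test) index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_fraction))
    return idx[n_test:], idx[:n_test]


def accumulate_by_author(authors: Sequence[Sequence[str]], citations: Sequence[float]) -> Dict[str, float]:
    """Sum (actual or predicted) citations over each author's papers."""
    totals: Dict[str, float] = defaultdict(float)
    for names, c in zip(authors, citations):
        for name in names:
            totals[name] += c
    return dict(totals)


def errors(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Mean absolute error and root-mean-square error between two equal-length sequences."""
    diffs = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {"mae": mae, "rmse": rmse}
```

Applying `accumulate_by_author` to predicted citations gives each author's predicted future impact, which can then be compared with the accumulated actual citations using `errors`.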

  14. Specific Procedure

  15. Specific Procedure

  16. 4 Results & Conclusion

  17. Comparison of Performance

  18. Conclusion: We quantify a researcher's impact as the number of times their work is cited. Comparing the performance of the different models, we find that the non-linear regression (NLR) and SVM models give better predictions than the other two of the four models. In the future, we expect to implement more complex and accurate algorithms to study the prediction of a researcher's future impact in more depth.

  19. 2018.05
