ScholarBase: Towards a Cross-Domain Knowledgebase for Linked Scholarly Data

Workshop on Scholarly Web Mining – WSDM 2017 ScholarBase: Towards a Cross-Domain Knowledgebase for Linked Scholarly Data Mahmoud Elbattah

What is ScholarBase about?
• ScholarBase aims to serve as a Linked Data repository for cross-domain scholarly data.
• ScholarBase can be conceived as a knowledgebase that weaves links among:
  • scholars,
  • institutions,
  • research areas,
  • publications, and
  • geographical locations.

Exemplary Queries
1. Who are the scholars that co-authored publications related to both ML and Bioinformatics?
2. Who are the top-cited scholars in the field of ML who are affiliated with institutions located in the UK?
3. Who are the scholars that contributed to ML, are affiliated with institutions located in the UK, and co-authored publications with scholars affiliated with institutions located outside the UK?
4. What are the institutions that are associated with the top-cited scholars in ML and are located outside the USA?
5. What are the interdisciplinary research areas that bring together scholars from different backgrounds?
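Queries of this kind can be sketched over a small in-memory set of links. The following is a minimal illustration of query 1 only, reading it as "scholars who share a publication tagged with both areas". The publication ids, scholar ids, data layout, and the helper `coauthors_in_both` are all hypothetical and are not taken from ScholarBase.

```python
# Toy data (hypothetical): publication -> research areas and authors.
PUBLICATIONS = {
    "pub1": {"areas": {"ML", "Bioinformatics"}, "authors": {"scholar_a", "scholar_b"}},
    "pub2": {"areas": {"ML"}, "authors": {"scholar_a", "scholar_c"}},
    "pub3": {"areas": {"Bioinformatics"}, "authors": {"scholar_d"}},
}


def coauthors_in_both(publications, area_x, area_y):
    """Return scholars who co-authored a publication tagged with both areas."""
    result = set()
    for pub in publications.values():
        has_both_areas = {area_x, area_y} <= pub["areas"]
        # A co-authored publication has at least two authors.
        is_coauthored = len(pub["authors"]) >= 2
        if has_both_areas and is_coauthored:
            result |= pub["authors"]
    return result


print(sorted(coauthors_in_both(PUBLICATIONS, "ML", "Bioinformatics")))
```

In a Linked Data setting the same question would more naturally be posed as a graph query over the links among scholars, publications, and research areas; the set-based version above only shows the shape of the lookup.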

Data Source: Google Scholar Profiles

Implementation Challenges
• Absence of Google Scholar APIs.
• Data inconsistencies and ambiguities.
• Missing data.

Overview

Stage 1: Data Collection

Data Collection Strategy
Stage 1 - Random Walk: Find initial seeds (i.e. scholar profiles) based on random search queries.
Stage 2 - Collect Keywords: Collect data describing research keywords and institutions from the seed profiles gathered at Stage 1.
Stage 3 - Focused Search: Find scholars based on focused queries using the keywords gathered at Stage 2.
Stage 4 - Catch All: Collect scholars associated with the keywords / institutions gathered at Stage 2 and Stage 3.
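The four stages above can be sketched as a small pipeline. This is only an illustration under stated assumptions: because Google Scholar offers no API, `search_profiles` is a hypothetical stand-in that looks up a toy in-memory table, and the profile ids, keywords, and institution names are invented for the example. It does not reproduce the actual web scraper.

```python
# Toy profile table (hypothetical); stands in for scraped search results.
PROFILES = {
    "p1": {"keywords": {"machine learning", "bioinformatics"}, "institution": "inst_a"},
    "p2": {"keywords": {"machine learning"}, "institution": "inst_b"},
    "p3": {"keywords": {"bioinformatics", "genomics"}, "institution": "inst_a"},
    "p4": {"keywords": {"genomics"}, "institution": "inst_c"},
}


def search_profiles(term):
    """Return ids of toy profiles whose keywords or institution match term."""
    return {pid for pid, p in PROFILES.items()
            if term in p["keywords"] or term == p["institution"]}


def collect(random_queries):
    # Stage 1 - Random Walk: seed profiles from random search queries.
    seeds = set()
    for query in random_queries:
        seeds |= search_profiles(query)

    # Stage 2 - Collect Keywords: keywords and institutions of the seeds.
    keywords, institutions = set(), set()
    for pid in seeds:
        keywords |= PROFILES[pid]["keywords"]
        institutions.add(PROFILES[pid]["institution"])

    # Stage 3 - Focused Search: query by the keywords gathered at Stage 2.
    found = set(seeds)
    for keyword in keywords:
        found |= search_profiles(keyword)

    # Stage 4 - Catch All: include keywords / institutions seen at Stage 3,
    # then collect every scholar associated with any of them.
    for pid in found:
        keywords |= PROFILES[pid]["keywords"]
        institutions.add(PROFILES[pid]["institution"])
    for term in keywords | institutions:
        found |= search_profiles(term)
    return found


print(sorted(collect(["machine learning"])))
```

Each stage only widens the set of collected profiles, which is why a single seed query can eventually reach profiles that share no keyword with it.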

Stage 2: Reconciliation of Data Inconsistencies

Research Keywords Inconsistency
• Variation of keywords.
• Lack of specification.
• Excessive specification.
• Vagueness of acronyms.
• Variation of languages.
• Missing keywords.
• Misspelled keywords.

Reconciliation of Keywords Inconsistencies

Example of Keyword Reconciliation

Scholar Name: Lotfi A. Zadeh
GS Keywords: Fuzzy Logic, Soft Computing, Artificial Intelligence, Human-Level Machine Intelligence
Keywords Extracted by AlchemyAPI
  Concepts: Fuzzy Logic, Fuzzy Set
  Taxonomy: /technology and computing; /science/computer science/artificial intelligence

Scholar Name: Andrew P. Feinberg
GS Keywords: Epigenetics, Epigenomics
Keywords Extracted by AlchemyAPI
  Concepts: Cancer, Epigenetics, DNA, DNA Methylation, Oncology
  Taxonomy: /health and fitness/disease/cancer

Affiliation Inconsistencies

Scholar           Affiliation
David Karger      MIT
N. P. Suh         M.I.T.
David Pesetsky    Massachusetts Institute of Technology

Affiliation Inconsistencies (cont’d)

Scholar           Affiliation
David Karger      MIT
N. P. Suh         M.I.T.
David Pesetsky    Massachusetts Institute of Technology

Verified email at: mit.edu
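One way to read this slide is that the verified email domain can serve as a key for merging affiliation variants. The sketch below illustrates that idea only. It assumes the mit.edu domain applies to all three scholars listed, and the rule of picking the longest variant as the canonical name is a heuristic introduced here for illustration, not the method described in the slides. The function name `reconcile_affiliations` is hypothetical.

```python
from collections import defaultdict

# Rows from the slide; the shared email domain is an assumption.
PROFILES = [
    ("David Karger", "MIT", "mit.edu"),
    ("N. P. Suh", "M.I.T.", "mit.edu"),
    ("David Pesetsky", "Massachusetts Institute of Technology", "mit.edu"),
]


def reconcile_affiliations(profiles):
    """Map each scholar to one canonical affiliation per email domain."""
    by_domain = defaultdict(list)
    for _name, affiliation, domain in profiles:
        by_domain[domain].append(affiliation)
    # Heuristic (assumption): the longest variant is the most spelled out.
    canonical = {domain: max(variants, key=len)
                 for domain, variants in by_domain.items()}
    return {name: canonical[domain] for name, _aff, domain in profiles}


for scholar, affiliation in reconcile_affiliations(PROFILES).items():
    print(scholar, "->", affiliation)
```

A domain-keyed merge like this would not separate institutions that share a mail domain, so it is best treated as a first pass.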

Stage 3: Semantification

Semantification
Subject: https://scholar.google.com/citations?user=S6H-0RAAAAAJ
Predicate: http://xmlns.com/foaf/spec/#term_topic_interest
Object: http://dbpedia.org/page/Fuzzy_logic
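The triple above can be written out in a standard RDF serialization. The following minimal sketch emits it as one N-Triples statement using only string formatting; the helper `to_ntriple` is hypothetical and its validity check is deliberately simplistic. The three URIs are copied from the slide unchanged, although a published dataset would more commonly use the FOAF term IRI under http://xmlns.com/foaf/0.1/ and a DBpedia /resource/ IRI rather than the documentation and page URLs.

```python
def to_ntriple(subject, predicate, obj):
    """Serialize three IRIs as a single N-Triples statement."""
    for iri in (subject, predicate, obj):
        # Simplified check: reject characters that cannot appear raw in an IRI.
        if any(ch in iri for ch in ' <>"'):
            raise ValueError(f"not a valid IRI: {iri!r}")
    return f"<{subject}> <{predicate}> <{obj}> ."


triple = to_ntriple(
    "https://scholar.google.com/citations?user=S6H-0RAAAAAJ",
    "http://xmlns.com/foaf/spec/#term_topic_interest",
    "http://dbpedia.org/page/Fuzzy_logic",
)
print(triple)
```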

Stage 3: Linking to LOD

What is different about ScholarBase?
• ScholarBase might be the first initiative towards structuring the data of GS profiles.
• Unlike other endeavours that focused on specific domains in science (e.g. semantic DBLP) or on conferences (e.g. ESWC and ISWC), ScholarBase aims to be a knowledgebase of cross-domain scholarly data.
• Having consistent descriptions of research keywords and affiliations can help in understanding the dynamics of research areas and in answering complex queries about scholars.

Limitations
• The scholar entities within ScholarBase are tightly coupled to the presence of a GS profile.
• In other words, if a scholar does not have a GS profile, that scholar will not be included in the ScholarBase dataset.
• It is difficult to test the comprehensiveness of the data collected by the web scraper, since we could not find any official reports from GS about the number of existing profiles.
• The web scraper cannot find GS profiles that are not set to be publicly visible.

THANK YOU!
Mahmoud Elbattah
m.elbattah1@nuigalway.ie
