Scientific Papers in Social Sciences, Alexander Garcia / Philipp Mayr



SLIDE 1

Simple Semantic Enrichment of Scientific Papers in Social Sciences

Alexander Garcia / Philipp Mayr / Leyla Jael Garcia Florida State University / GESIS / biotea.ws

SLIDE 2

Outline

- Motivation
  - What data do we have?
  - Why are we doing this?
  - What are we doing? What do we aim to achieve?
- RDF generation
  - Metadata and content
  - Content enrichment
- Consuming and delivering the data
  - A first approach

SWIB 2012, Köln, 12/4/2012

SLIDE 3

Motivation

What data do we have?

- GESIS
  - Leibniz Institute for the Social Sciences
  - Support for the research cycle
  - Journals: ISI, MDA
- MDA (Methods, Data, Analysis)
  - Journal for Empirical Social Science Research
  - Focus on
    - Survey methodologies
    - Methods in empirical social research
  - Open-access, full-text

SLIDE 4

Motivation

Why are we doing this?

- The World Wide Web
  - Dissemination infrastructure: scientific and non-scientific contributions
  - Information:
    - Still locked up in discrete documents
    - Not interconnected, not machine-processable
- RDF technology:
  - Connective tissue
  - But how does it affect scientific communication?

SLIDE 5

Motivation

What are we doing? What do we aim to achieve?

- Question: How can scientific publications be delivered into the Semantic Web?
- Our approach
  - RDF for research articles
    - Entry point to the Web of Data
    - Part of Linked Open Data
  - Semantic enrichment
    - Interoperable with online data
  - Richer user interface
    - A different reading experience
    - Interconnected with related external elements
    - Collaborative environment

SLIDE 6

RDF Generation

Metadata and Content

[Diagram: processing pipeline. Labels: MDA PDF; http://pdfx.cs.man.ac.uk/; MDA XML; RDF Generation; BIBO; Metadata + Content + References; Reference Enrichment]
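
The RDF generation step on this slide can be illustrated with a minimal sketch that serialises article metadata and references as N-Triples. The slide only names BIBO as the vocabulary; the specific terms used here (bibo:AcademicArticle, bibo:cites, dcterms:title, dcterms:creator), the example.org IRIs, and the input dictionary layout are assumptions made for illustration, not the project's actual mapping.

```python
# Hypothetical sketch: emit N-Triples for one article using BIBO and DC Terms.
# The property choices and all example.org IRIs are assumptions.
BIBO = "http://purl.org/ontology/bibo/"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(text):
    """Escape a Python string as an N-Triples plain literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def iri(value):
    """Wrap an IRI string in angle brackets."""
    return f"<{value}>"


def article_to_ntriples(article):
    """Turn a metadata dictionary into N-Triples lines (type, title, authors, citations)."""
    subject = iri(article["iri"])
    triples = [
        (subject, iri(RDF_TYPE), iri(BIBO + "AcademicArticle")),
        (subject, iri(DCTERMS + "title"), literal(article["title"])),
    ]
    for name in article.get("authors", []):
        triples.append((subject, iri(DCTERMS + "creator"), literal(name)))
    for ref in article.get("references", []):
        triples.append((subject, iri(BIBO + "cites"), iri(ref)))
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Made-up input standing in for metadata extracted from the MDA XML.
example = {
    "iri": "http://example.org/mda/article/1",
    "title": 'A "hypothetical" survey study',
    "authors": ["A. Author", "B. Author"],
    "references": ["http://example.org/mda/article/2"],
}
print(article_to_ntriples(example))
```

A production pipeline would more likely use an RDF library and model authors and references as resources rather than plain literals; the hand-rolled serialiser only keeps the sketch dependency-free.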

SLIDE 7

RDF Generation

Content enrichment

[Diagram. Labels: Metadata + Content + References; Automatic Annotation; Automatically Annotated RDF; Manual Annotation; Manually Annotated RDF]

SLIDE 8

Lessons learnt

- Biotea, a similar project in the biomedical domain
  - XML to RDF works well
  - RDF annotation works well, but annotators are not perfect
  - Formatting is not translated
    - bold, italics
  - Modeling tables is not easy
  - Dictionary-based entity recognition tools work better
- This project
  - PDF to XML is not perfect
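
To make the remark about dictionary-based entity recognition concrete, the following is a minimal sketch of such an annotator: it looks up labels from a fixed dictionary in the text and returns the matching concept for each hit. The dictionary entries, the example.org concept IRIs, and the sample sentence are invented for illustration; the slides do not say which tools or vocabularies were used.

```python
import re


def build_matcher(dictionary):
    """Compile one case-insensitive pattern covering every dictionary label."""
    # Longest labels first so multi-word terms win over their shorter prefixes.
    labels = sorted(dictionary, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(label) for label in labels) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


def annotate(text, dictionary):
    """Return (start, end, matched text, concept IRI) for each dictionary hit."""
    lookup = {label.lower(): concept for label, concept in dictionary.items()}
    matcher = build_matcher(dictionary)
    return [
        (m.start(), m.end(), m.group(0), lookup[m.group(0).lower()])
        for m in matcher.finditer(text)
    ]


# Hypothetical dictionary: label -> concept IRI (all IRIs are placeholders).
dictionary = {
    "survey": "http://example.org/concept/survey",
    "survey methodology": "http://example.org/concept/survey-methodology",
    "nonresponse": "http://example.org/concept/nonresponse",
}
text = "Survey methodology studies nonresponse in every survey."

for start, end, surface, concept in annotate(text, dictionary):
    print(start, end, surface, concept)
```

Exact string lookup like this misses inflected forms and synonyms not listed in the dictionary, which is one way an annotator ends up "not perfect".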

SLIDE 9

Consuming and delivering the data

- What does it make possible?
  - How similar are two articles?
    - based on concepts
    - semantic similarity
  - Which articles use this reference in a section titled "Results"?
  - Which annotation co-occurs most often with this "X" annotation?
  - Which articles include term "A" but not term "B"?
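
Three of the questions on this slide can be sketched over a simple in-memory stand-in for the annotated RDF: a mapping from article identifiers to the set of concepts annotated in each. The article identifiers and concepts below are made up, and Jaccard overlap is only one assumed way to read "similar based on concepts"; the slides do not specify the similarity measure or the query mechanism.

```python
from collections import Counter


def jaccard(a, b):
    """Concept-based similarity: shared concepts divided by all concepts."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cooccurring(annotations, target):
    """Count which annotations appear in the same articles as `target`."""
    counts = Counter()
    for concepts in annotations.values():
        if target in concepts:
            counts.update(concepts - {target})
    return counts.most_common()


def with_but_without(annotations, include, exclude):
    """Articles annotated with `include` but not with `exclude`."""
    return sorted(
        article
        for article, concepts in annotations.items()
        if include in concepts and exclude not in concepts
    )


# Hypothetical annotation data: article id -> set of annotated concepts.
annotations = {
    "article-1": {"survey", "nonresponse", "weighting"},
    "article-2": {"survey", "nonresponse"},
    "article-3": {"interview", "weighting"},
}

print(jaccard(annotations["article-1"], annotations["article-2"]))
print(cooccurring(annotations, "survey"))
print(with_but_without(annotations, "weighting", "survey"))
```

Against an actual triple store, the same questions would typically be expressed as SPARQL queries over the annotation triples; the question about references in a "Results" section additionally needs the section structure, which this flat mapping does not capture.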

SLIDE 10

Consuming and delivering the data

A first approach

SLIDE 11

Consuming and delivering the data

A first approach

SLIDE 12

- Contact
  - Alex García, alexgarciac@gmail.com
  - Philipp Mayr, philipp.mayr@gesis.org