SparqPlug: Generating Linked Data from Legacy HTML, SPARQL, and the DOM (Peter Coetzee)




SLIDE 1

SparqPlug: Generating Linked Data from Legacy HTML, SPARQL, and the DOM

Peter Coetzee, Imperial College London
Tom Heath, Talis Information Ltd.
Enrico Motta, Knowledge Media Institute

SLIDE 2

• The Problem
• Current Approaches
• SparqPlug's Background
• SparqPlug's Approach
• Linked Data
• Anatomy of a Job
• Maintenance
• Wrap-Up

SLIDE 3

The Problem

• Bootstrapping the Web of Data
• Inertia for webmasters to convert
• Risks of doing so blindly – Good Linked Data!
• Difficult and time-consuming

• Triplify
• SquirrelRDF
• etc

SLIDE 4

Current Approaches

• Piggy Bank & Thresher: Easy to use screen scrapers → RDF Silo
• Sponger & Triplr: Requires a marked up source
• SPAT: Great approach, implementations??
• XSLT with XQuery: Another language to learn, could be more expressive and flexible

SLIDE 5

SparqPlug's Background

• Developed in Summer of 2007
• Funded by OpenKnowledge Project, development took place at KMi
• Currently hosted at KMi
• Built on Java, Jena, Tomcat, MySQL, NG4J
• http://sparqplug.rdfize.com/

SLIDE 6

SparqPlug's Approach

• Tidy and DOM2RDF
• Query the DOM directly with SPARQL
    • All the expressivity of a declarative query language
    • Proprietary extensions – e.g. Property Functions
• DOM2SPARQL
• Let SparqPlug manage the entire process, from extraction to de-referencing
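
To make the "DOM as RDF" idea concrete, the following is a minimal sketch in Python using only the standard library. It is not SparqPlug's DOM2RDF implementation (which is built on Java and Jena): the `DOM` vocabulary URI, the predicate names (`nodeName`, `hasChild`, `text`, `attr-*`), and the helper names `dom_to_triples` and `link_targets` are all assumptions made for illustration. The tolerant standard-library parser stands in for the Tidy step, and a plain Python filter stands in for a SPARQL query over the resulting triples.

```python
from html.parser import HTMLParser

# Hypothetical vocabulary for DOM nodes; not SparqPlug's actual ontology.
DOM = "http://example.org/dom#"

# Elements that never have a closing tag, so they must not stay "open".
VOID = {"br", "img", "hr", "input", "meta", "link"}


class DomToTriples(HTMLParser):
    """Walk an HTML document and record (subject, predicate, object) triples."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._stack = []  # (tag, node id) for each currently open element
        self._count = 0

    def handle_starttag(self, tag, attrs):
        self._count += 1
        node = f"_:n{self._count}"  # blank-node style identifier
        self.triples.append((node, DOM + "nodeName", tag))
        if self._stack:
            self.triples.append((self._stack[-1][1], DOM + "hasChild", node))
        for name, value in attrs:
            self.triples.append((node, DOM + "attr-" + name, value or ""))
        if tag not in VOID:
            self._stack.append((tag, node))

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, if there is one.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text and self._stack:
            self.triples.append((self._stack[-1][1], DOM + "text", text))


def dom_to_triples(html):
    parser = DomToTriples()
    parser.feed(html)
    parser.close()
    return parser.triples


def link_targets(triples):
    """Find href values of <a> elements, as a basic graph pattern would."""
    anchors = {s for s, p, o in triples if p == DOM + "nodeName" and o == "a"}
    return [o for s, p, o in triples if s in anchors and p == DOM + "attr-href"]
```

Once the page is expressed as triples, extracting data becomes a matter of matching patterns over nodes, attributes and text, which is the role SPARQL plays in the approach described on this slide.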

SLIDE 7

SparqPlug's Approach

SLIDE 8

Linked Data

• Content Negotiation handled automatically
• URIs generated in a separate namespace and forwarded through Tomcat to the SparqPlug application
• Property Functions to help process data
• SPARQL endpoint automatically created for each data set
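
Content negotiation means choosing a representation from the client's HTTP `Accept` header. The sketch below shows one simplified way to make that choice in Python; it is not how SparqPlug or Tomcat implement it. The function names `parse_accept` and `choose_representation`, the two media-type sets, and the decision to fall back to HTML are assumptions for illustration, and wildcard ranges such as `*/*` are deliberately left to the fallback.

```python
# Media types treated as "give me data" vs "give me a page" (assumed sets).
RDF_TYPES = {"application/rdf+xml", "text/turtle"}
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def parse_accept(header):
    """Return (media_type, q) pairs ordered by descending q value."""
    prefs = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media = pieces[0].lower()
        if not media:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media, q))
    # sorted() is stable, so equal q values keep their header order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def choose_representation(accept_header):
    """Return "rdf" or "html" for the client's most preferred known type."""
    for media, q in parse_accept(accept_header):
        if q <= 0:
            continue  # q=0 marks a type as not acceptable
        if media in RDF_TYPES:
            return "rdf"
        if media in HTML_TYPES:
            return "html"
    return "html"  # assumed fallback, e.g. for browsers sending */*
```

For example, `choose_representation("text/html;q=0.5, application/rdf+xml")` selects the RDF representation, while a typical browser header that lists `text/html` first selects the HTML page.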

SLIDE 9

Anatomy of a Job

• You give:
    • Prototypical Query (SPARQL)
    • Link Query (SPARQL)
    • Graph Name Generator (RegExp)
• We create:
    • Maintenance data
    • Linked Data constructs
    • RDF!
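
The slide does not say how the Graph Name Generator's regular expression is applied, so the following Python sketch is only one plausible reading: a pattern is matched against the source page URL and its captured groups are substituted into a template to produce a named-graph URI. The function name `graph_name`, the example URLs, and the pattern/template pair are all hypothetical.

```python
import re


def graph_name(source_url, pattern, template):
    """Derive a named-graph URI from a source URL, or None if no match.

    `pattern` is a regular expression searched for in the URL; `template`
    may refer to its groups with backreferences such as \\1.
    """
    match = re.search(pattern, source_url)
    if match is None:
        return None
    return match.expand(template)


# Hypothetical job configuration: one graph per artist page.
PATTERN = r"/artists/(\d+)\.html$"
TEMPLATE = r"http://data.example.org/graph/artist/\1"

name = graph_name("http://example.org/artists/42.html", PATTERN, TEMPLATE)
```

Under these assumptions, every page that matches the pattern gets its own named graph, and pages that do not match are simply skipped.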

SLIDE 10

Maintenance

• Source graph hashed at SPARQL CONSTRUCT time
• Hash then checked periodically for updated data
• Graph regenerated and UNION'd with existing RDF in each named graph
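
The maintenance cycle described above (hash, compare, regenerate, union) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not SparqPlug's code: graphs are modelled as plain sets of string triples, `graph_hash` and `refresh` are hypothetical names, SHA-256 is an arbitrary choice of hash, and `construct` stands in for running the job's SPARQL CONSTRUCT query. The sketch also ignores blank-node relabelling, which a real graph hash would need to handle.

```python
import hashlib


def graph_hash(triples):
    """Return an order-independent SHA-256 digest of a collection of triples."""
    digest = hashlib.sha256()
    for triple in sorted(triples):
        # A unit separator keeps ("ab", "c", "d") distinct from ("a", "bc", "d").
        digest.update("\x1f".join(triple).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def refresh(stored_hash, source_triples, named_graph, construct):
    """Re-run `construct` only if the source changed since `stored_hash`.

    Returns the (possibly new) hash and the (possibly extended) named graph.
    New triples are unioned with the existing ones rather than replacing them.
    """
    new_hash = graph_hash(source_triples)
    if new_hash == stored_hash:
        return stored_hash, named_graph
    return new_hash, named_graph | construct(source_triples)
```

A periodic job would call `refresh` for each source with the hash recorded at the last CONSTRUCT; an unchanged hash means no work is done. Because the result is a union, triples that disappear from the source remain in the named graph under this reading of the slide.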

SLIDE 11

Wrap-Up

• SparqPlug offers a simple, partially automated and scalable solution to the problem of creation and maintenance of RDF data from an arbitrary HTML data source
• http://sparqplug.rdfize.com/
• Questions?
• peter @ coetzee . org