Building a High Performance Environment for RDF Publishing (Pascal Christoph)

SLIDE 1

Building a High Performance Environment for RDF Publishing

Pascal Christoph

SLIDE 2

These slides, all graphics made by the author, and those taken from https://openclipart.org/ are dedicated to the public domain: https://creativecommons.org/about/cc0 . All marks mentioned may be trademarks or registered trademarks of their respective owners.

Read about the license of "The Scream" by Edvard Munch at https://en.wikipedia.org/wiki/File:The_Scream.jpg . Linking Open Data cloud diagram by Richard Cyganiak and Anja Jentzsch: http://lod-cloud.net/

SLIDE 3

Overview

Publishing is for Consuming

  • Mandatory
  • Nice to have

Story so far - experiences with lobid.org

  • What is lobid.org ?
  • Storing the data
  • Getting the data

Publishing RDF through elasticsearch

  • Benefits
  • Some more details
  • Caveats

Future prospects

SLIDE 5

Publishing is for Consuming

SLIDE 6

Mandatory

A resource:

SLIDE 7

Mandatory

A resource: gets a dereferenceable URI:

SLIDE 8

Mandatory

A resource: gets a dereferenceable URI: which provides RDF:

<http://lobid.org/resource/HT002948556> <http://purl.org/dc/terms/title> "With reference to reference" .
<http://lobid.org/resource/HT002948556> <http://purl.org/dc/terms/issued> "1983" .
<http://lobid.org/resource/HT002948556> <http://purl.org/ontology/bibo/isbn13> "9780915145539" .
<http://lobid.org/resource/HT002948556> <http://purl.org/dc/elements/1.1/creator> <http://d-nb.info/gnd/135539897> .

SLIDE 9

Mandatory

=> Basic LOD publishing is very simple: you just need a web server.
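The "you just need a web server" point can be illustrated with a minimal sketch, assuming Python's standard http.server module. The RESOURCES table, the lookup helper, and the port are hypothetical names chosen for illustration; this is not how lobid.org is actually served.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory store: request path -> N-Triples document.
RESOURCES = {
    "/resource/HT002948556": (
        "<http://lobid.org/resource/HT002948556> "
        '<http://purl.org/dc/terms/title> "With reference to reference" .\n'
        "<http://lobid.org/resource/HT002948556> "
        '<http://purl.org/dc/terms/issued> "1983" .\n'
    ),
}


def lookup(path):
    """Return (status, content_type, body) for a request path."""
    body = RESOURCES.get(path)
    if body is None:
        return 404, "text/plain; charset=utf-8", "not found\n"
    return 200, "application/n-triples", body


class RdfHandler(BaseHTTPRequestHandler):
    """Answers GET requests with the RDF stored for the requested path."""

    def do_GET(self):
        status, content_type, body = lookup(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Start the server (blocks until interrupted). The port is an arbitrary choice."""
    HTTPServer(("", port), RdfHandler).serve_forever()
```

Calling serve() and requesting /resource/HT002948556 would return the triples with the N-Triples media type; everything on the "nice to have" list is deliberately left out.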

SLIDE 10

Nice to have

  • Dumps
  • Content Negotiation (different RDF serializations)
  • SPARQL
  • Human readable representation (best: RDFa in HTML)
  • Data searchable
  • Timely updates
  • High Availability
  • Versioning
  • Web developers want simple APIs providing JSON
  • ...

SLIDE 11

SPARQL Endpoint

  • (Dumps)
  • Content Negotiation (different RDF serializations)
  • SPARQL
  • Human readable representation (best: RDFa in HTML)
  • Data searchable
  • Timely updates
  • High Availability
  • Versioning
  • Web developers want simple APIs providing JSON
  • ...

SLIDE 12

SPARQL Endpoint

  • (Dumps): but may be painfully slow with lots of data
  • Content Negotiation (different RDF serializations)
  • SPARQL
  • Human readable representation (best: RDFa in HTML)
  • (Data searchable): may be painfully slow
  • Timely updates
  • High Availability
  • Versioning
  • Web developers want simple APIs providing JSON
      • most triple stores provide JSON/RDF
      • Simple powerful API: too powerful/complex?
  • ...

SLIDE 13

Nice to have

In principle, web developers already have simple APIs:

LOD is the API!

SLIDE 14

Nice to have

In principle, web developers already have simple APIs:

Remember:

SLIDE 15

Mandatory

A resource: gets a dereferenceable URI: which provides the data (in RDF):

<http://lobid.org/resource/HT002948556> <http://purl.org/dc/terms/title> "With reference to reference" .
<http://lobid.org/resource/HT002948556> <http://purl.org/dc/terms/issued> "1983" .
<http://lobid.org/resource/HT002948556> <http://purl.org/ontology/bibo/isbn13> "9780915145539" .
<http://lobid.org/resource/HT002948556> <http://purl.org/dc/elements/1.1/creator> <http://d-nb.info/gnd/135539897> .

SLIDE 16

Nice to have

In principle, web developers already have powerful APIs:

RESTful SPARQL

SLIDE 17

RESTful SPARQL example

Getting all data of all resources having a particular ISBN:

curl -H "Accept: application/json" --data-urlencode 'query=
prefix bibo: <http://purl.org/ontology/bibo/>
SELECT * WHERE {
  ?s bibo:isbn13 "9780851706238" ;
     ?p ?o .
} LIMIT 100
' http://lobid.org/sparql/
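The same request can be prepared from a script. This is a minimal sketch using only Python's standard library; the endpoint URL and the Accept header are taken from the slide, while the helper names isbn_query and build_request are made up for illustration. The request object is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://lobid.org/sparql/"  # endpoint URL as shown on the slide


def isbn_query(isbn13):
    """Build the SPARQL query from the slide for one ISBN-13."""
    if not (isbn13.isdigit() and len(isbn13) == 13):
        raise ValueError("expected a 13-digit ISBN")
    return (
        "PREFIX bibo: <http://purl.org/ontology/bibo/>\n"
        "SELECT * WHERE {\n"
        f'  ?s bibo:isbn13 "{isbn13}" ;\n'
        "     ?p ?o .\n"
        "} LIMIT 100"
    )


def build_request(isbn13):
    """Return a POST request with a form-encoded 'query' field, like curl --data-urlencode."""
    data = urlencode({"query": isbn_query(isbn13)}).encode("ascii")
    return Request(ENDPOINT, data=data, headers={"Accept": "application/json"})
```

Passing the result to urllib.request.urlopen would send it. Validating the ISBN before interpolating it keeps arbitrary text out of the query string.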

SLIDE 18

Nice to have

SLIDE 19

RESTful SPARQL example

... and the JSON/RDF result:

{
  "head": { "vars": [ "s", "p", "o" ] },
  "results": {
    "bindings": [
      {
        "o": { "type": "uri", "value": "http://openlibrary.org/works/OL2109573W" },
        "p": { "type": "uri", "value": "http://rdvocab.info/RDARelationshipsWEMI/workManifested" },
        "s": { "type": "uri", "value": "http://lobid.org/resource/HT007824357" }
      },
      { "o": { ...
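A consumer then has to unpack this structure before using it. The following is a small sketch, assuming a complete result document in the layout shown on the slide; SAMPLE contains only the first binding from the slide, closed off so that it is valid JSON, and the function name is hypothetical.

```python
import json

# First binding from the slide, completed into a valid JSON document.
SAMPLE = """
{
  "head": {"vars": ["s", "p", "o"]},
  "results": {
    "bindings": [
      {
        "o": {"type": "uri", "value": "http://openlibrary.org/works/OL2109573W"},
        "p": {"type": "uri", "value": "http://rdvocab.info/RDARelationshipsWEMI/workManifested"},
        "s": {"type": "uri", "value": "http://lobid.org/resource/HT007824357"}
      }
    ]
  }
}
"""


def bindings_to_triples(result_json):
    """Turn a SPARQL JSON result with variables s, p, o into (s, p, o) value tuples."""
    result = json.loads(result_json)
    return [
        tuple(row[var]["value"] for var in ("s", "p", "o"))
        for row in result["results"]["bindings"]
    ]
```

Even this small step shows the extra unwrapping a web developer has to do compared with a plain JSON record.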

SLIDE 20

Nice to have

As it is, web developers don't like SPARQL.

(Image: web developer)

SLIDE 21

Nice to have

Web developers want APIs like:

http://lobid.org/resources/api/isbn/$isbn

SLIDE 22

Happy web developer

SLIDE 24

What is lobid.org?

lobid.org

SLIDE 25

  • lobid := linking open bibliographic data
  • LOD services of the hbz
  • lobid-resources:
      • exposes 85% of the hbz cooperative catalogue
      • entries coming from > 200 German academic libraries
      • ~ 16 M records with 700 M triples
      • with links to ~ 5 M other resources
      • with links to ~ 32 M items (consisting of 300 M triples)
  • lobid-organisations:
      • exposes the German Sigelverzeichnis and the MARC-ISIL directory
      • ~ 40 k descriptions of institutions

SLIDE 26

What's missing?

  • Dumps
  • Content Negotiation (different RDF serializations)
  • SPARQL
  • Human readable representation (RDFa in HTML)
  • Data searchable
  • Timely updates
  • High Availability
  • Versioning
  • Web developers want simple APIs providing JSON
  • ...

SLIDE 28

Storing the data

2010 - 2011, lobid-organisations

Filesystem:

  + easy to maintain
  + reliable
  + fast
  - no search
  - no SPARQL
  - ...

SLIDE 29

lobid today

Triple Store (4store):

  + power of SPARQL
  +/- depending on the query: fast to horribly slow
  +/- search (but string searches are often slow and limited)
  - sometimes gets stuck!

SLIDE 30

lobid today

Search engine (elasticsearch):

  + fast search
  + stemming, linguistics ...
  + wildcard searching
  + facets
  + geo search
  + JSON
  + schema-less
  + simple RESTful API
  + many plugins
  + ...
  + easy to achieve High Availability
  + scales nicely

SLIDE 31

Storing / getting the data

lobid today

SLIDE 33

Getting the data

lobid: technology/dependency stack

Components: Search Engine, Webapp, Triple Store

SLIDE 34

lobid: technology/dependency stack

Components: Search Engine, Webapp, Triple Store

Annotations in the diagram: "highly available! we can do that" and "sometimes gets stuck!"

SLIDE 37

Storing / getting the data

Variant 1: technology/dependency stack

  • Triple Store: for external access. Sometimes gets stuck!
  • Triple Store: closed, internal. Will be safe from malicious queries.

SLIDE 38

Variant 1: technology/dependency stack

  • Triple Store: for external access. Sometimes gets stuck!
  • Triple Store: closed, internal. Will be safe from malicious queries.

redundant, complex ...

SLIDE 40

Publishing LOD with elasticsearch

Variant 2: technology/dependency stack

  • Search Engine, Webapp: highly available! We can do that.
  • Triple Store: for external access and some fancy nice-to-have stuff. Sometimes gets stuck!

Basic LOD functionality (and some other APIs) is highly available.

SLIDE 41

Benefits

  • Dumps
  • Content Negotiation (different RDF serializations)
  • SPARQL
  • Human readable representation (RDFa in HTML)
  • Data searchable
  • Near Real Time updates
  • High Availability
  • (Versioning)
  • Web developers want simple APIs returning JSON
  • ...

SLIDE 42

Benefits

fast, scalable search engine

SLIDE 43

Performance test

Data: 10 M records <=> 300 M triples
Case-insensitive query: "beach"

SELECT ?s WHERE { ?s <http://purl.org/dc/terms/title> ?o FILTER regex(str(?o), "beach", "i") }
# => SPARQL execution time for Q8316: 108.7s, returned 2815 rows.

http://$ip:9200/_search?q=beach&from=0&size=2800
# => Elasticsearch needed 0.4s

=> Elasticsearch is roughly 250 times faster (108.7 s / 0.4 s is about 270)

SLIDE 44

Performance test

(4store has support for text indexing; this has not been tested.)

SLIDE 45

Performance test

Elasticsearch: 18 M records, 6 GB RAM: 5 hours
4store: 1 B triples, 72 GB RAM: 7 hours

CPU: quad core at 2.4 GHz with Hyper-Threading => 8 CPUs
HD: 6 x 2.5" 10k rpm, 146 GB each

(Don't take benchmarks too seriously; they just give a clue!)

SLIDE 47

Benefits

built to be easily made highly available!

SLIDE 49

Benefits

Versioning with elasticsearch: not out-of-the-box, but it comes at least with, e.g.:

  * concurrency control
  * documents have a version number

=> implementing versioning is not hard
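How version numbers and concurrency control combine into versioning can be sketched independently of any particular store. The following is an illustrative in-memory model, not the Elasticsearch API: the class and method names are hypothetical, and a real setup would rely on the version information the search engine keeps per document.

```python
class VersionConflict(Exception):
    """Raised when a write is based on an outdated version."""


class VersionedStore:
    """Keeps every version of a document; writes must name the version they build on."""

    def __init__(self):
        self._history = {}  # doc_id -> list of documents, index 0 holds version 1

    def current_version(self, doc_id):
        """Return the latest version number, or 0 if the document does not exist."""
        return len(self._history.get(doc_id, []))

    def put(self, doc_id, doc, expected_version):
        """Store a new version if expected_version matches the current one."""
        actual = self.current_version(doc_id)
        if expected_version != actual:
            raise VersionConflict(f"expected {expected_version}, found {actual}")
        self._history.setdefault(doc_id, []).append(dict(doc))
        return actual + 1

    def get(self, doc_id, version=None):
        """Return the requested version, or the latest one if version is None."""
        versions = self._history[doc_id]
        if version is None:
            version = len(versions)
        if not 1 <= version <= len(versions):
            raise KeyError(f"no version {version} of {doc_id}")
        return versions[version - 1]
```

The version check in put is the optimistic concurrency control; keeping the old entries in the list is the part that has to be added on top to get actual versioning.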

SLIDE 50

Benefits, relying on elasticsearch as basic LOD storage

  • Dumps
  • Content Negotiation (different RDF serializations)
  • SPARQL
  • Human readable representation (RDFa in HTML)
  • Data searchable
  • Near Real Time updates
  • High Availability
  • Versioning
  • Web developers want:
      • JSON (LD)
      • Simple APIs
  • ...

SLIDE 52

Why JSON-LD?

JSON is:

  • stored natively by many tools (e.g. elasticsearch)
  • loved by consumers (web developers)

JSON-LD is:

  • supported by RDF libraries (e.g. transforming to N-Triples)
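The relation between a JSON-LD record and its triples can be shown with a deliberately tiny sketch. It handles only a flat document whose context maps each term to a full IRI; it is not a JSON-LD processor, and a real pipeline would use an RDF library for this step. The term names in DOC are chosen for illustration; the values come from the example resource shown earlier in the deck.

```python
import json

# Flat JSON-LD document for the example resource (term names are illustrative).
DOC = {
    "@context": {
        "title": "http://purl.org/dc/terms/title",
        "issued": "http://purl.org/dc/terms/issued",
        "creator": "http://purl.org/dc/elements/1.1/creator",
    },
    "@id": "http://lobid.org/resource/HT002948556",
    "title": "With reference to reference",
    "issued": "1983",
    "creator": {"@id": "http://d-nb.info/gnd/135539897"},
}


def to_ntriples(doc):
    """Serialize a flat JSON-LD document (string or {"@id": ...} values only) as N-Triples."""
    context = doc["@context"]
    subject = doc["@id"]
    lines = []
    for term, value in doc.items():
        if term.startswith("@"):
            continue  # skip JSON-LD keywords such as @context and @id
        predicate = context[term]
        if isinstance(value, dict):
            obj = f'<{value["@id"]}>'  # reference to another resource
        else:
            obj = json.dumps(value, ensure_ascii=False)  # quoted, escaped literal
        lines.append(f"<{subject}> <{predicate}> {obj} .")
    return "\n".join(lines)
```

The same JSON object can be stored as-is in a document store and still be read as RDF, which is the point of the slide.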

SLIDE 54

Benefits

RESTful elasticsearch API, e.g.:

http://lobid.org/resources/_search?q=isbn:$isbn

SLIDE 55

Benefits

  • ... and many other nice things come with elasticsearch
  • geo-search: "Query only libraries/items residing up to 10 km from me."
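Both kinds of request can be prepared with a few lines of code. This is a sketch under stated assumptions: the search URL is the one from the slide, the geo_distance filter follows the query DSL of recent Elasticsearch versions, and the field name "location" (which would have to be mapped as a geo point) and the helper names are hypothetical.

```python
from urllib.parse import urlencode


def isbn_search_url(isbn, base="http://lobid.org/resources/_search"):
    """Build the query-string search URL from the slide for one ISBN."""
    return base + "?" + urlencode({"q": f"isbn:{isbn}"})


def geo_distance_query(lat, lon, km=10, field="location"):
    """Build a request body that keeps only hits within `km` kilometres of a point.

    `field` is an assumed name for a geo point field in the index.
    """
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {
                        "distance": f"{km}km",
                        field: {"lat": lat, "lon": lon},
                    }
                }
            }
        }
    }
```

The dictionary returned by geo_distance_query would be sent as the JSON body of a search request; the colon in the URL is percent-encoded, which is equivalent to the form on the slide.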

SLIDE 56

Benefits

  • Dumps
  • Content Negotiation (different RDF serializations)
  • SPARQL
  • Human readable representation (RDFa in HTML)
  • Data searchable
  • Near Real Time updates
  • High Availability
  • (Versioning)
  • Web developers want simple APIs providing JSON
  • ...

Mission accomplished!

SLIDE 57

(... ok, something is left to be done!)

  • Dumps
  • Content Negotiation (different RDF serializations)
  • SPARQL
  • Human readable representation (RDFa in HTML)
  • Data searchable
  • Near Real Time updates
  • High Availability
  • Versioning
  • Web developers want simple APIs providing JSON
  • ...

SLIDE 58

Overview

Publishing is for Consuming

  • Mandatory
  • Nice to have

Story so far - experiences with lobid.org

  • What is lobid.org ?
  • Storing the data
  • Getting the data

Publishing RDF through elasticsearch

  • Benefits
  • Caveats
  • Auto suggest demo

Conclusion

SLIDE 59

Caveats !?

  • Dumps
  • Content Negotiation (different RDF serializations)
  • SPARQL
  • Human readable representation ( RDFa in HTML)
  • Data searchable
  • Near Real Time updates
  • High Availability
  • Versioning
  • Web developers want simple APIs providing JSON
  • ...

SLIDE 60

Caveats

How to integrate semantic search into a document storage?

dct:contributor --------> dct:creator -------> dc:creator
\---------> dc:contributor
\--------> bibo:translator

There is no inferencing, as comes with SPARQL!
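One way to approximate the missing inferencing is to expand a property into its sub-properties before building the search request. The sketch below assumes one possible reading of the diagram (the SUB_PROPERTIES table is an assumption, not taken verbatim from the slide), uses prefixed property names as field names purely for illustration, and builds a bool/should query in the Elasticsearch query DSL.

```python
# Assumed hierarchy: property -> direct sub-properties (one reading of the diagram).
SUB_PROPERTIES = {
    "dc:contributor": ["dct:contributor"],
    "dc:creator": ["dct:creator"],
    "dct:contributor": ["dct:creator", "bibo:translator"],
}


def expand(prop):
    """Return the property plus all of its transitive sub-properties, sorted."""
    seen, stack = set(), [prop]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUB_PROPERTIES.get(current, []))
    return sorted(seen)


def expanded_query(prop, text):
    """Build a query that matches `text` in the property or any of its sub-properties."""
    return {
        "query": {
            "bool": {
                "should": [{"match": {field: text}} for field in expand(prop)],
                "minimum_should_match": 1,
            }
        }
    }
```

The expansion happens in the web application at query time; the alternative would be to materialize the inferred fields when the documents are indexed.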

SLIDE 61

Caveats

Our data flow: from records to RDF triples to records

SLIDE 63

Caveats

From records to RDF triples to records !?

SLIDE 64

Caveats

From records to RDF triples
|-----> graph-database
'------> computing ---> record-database

MARC/MAB/PICA ... JSON-LD

SLIDE 65

Caveats

Tree-based vs. graph-based:

Pre-render the whole document?

What is the document?

SLIDE 66

Caveats

SLIDE 67

Caveats

What is the document? Only the top-level node?

SLIDE 68

Caveats

What is the document? Only the top-level node?

... but then you couldn't even search for the author's name!

SLIDE 69

Caveats

Searching needs integration of some fields from subgraphs into the document.
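What "integrating fields from subgraphs" can look like is sketched below: the creator node is embedded into the record together with its name, so a full-text search on the record can find it. This is an illustration only. The use of foaf:name as the name property, the function name, and the author name in TRIPLES are assumptions made up for the example, not data from lobid.

```python
from collections import defaultdict

TITLE = "http://purl.org/dc/terms/title"
CREATOR = "http://purl.org/dc/elements/1.1/creator"
NAME = "http://xmlns.com/foaf/0.1/name"  # assumed property for a person's name

RESOURCE = "http://lobid.org/resource/HT002948556"
PERSON = "http://d-nb.info/gnd/135539897"

# Sample graph; the name literal is a placeholder, not real authority data.
TRIPLES = [
    (RESOURCE, TITLE, "With reference to reference"),
    (RESOURCE, CREATOR, PERSON),
    (PERSON, NAME, "Example, Author"),
]


def build_document(subject, triples, embed_names_for=(CREATOR,)):
    """Build a searchable record for `subject`, embedding names of linked nodes."""
    by_subject = defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append((p, o))

    doc = {"@id": subject}
    for p, o in by_subject.get(subject, []):
        if p in embed_names_for:
            node = {"@id": o}
            names = [v for q, v in by_subject.get(o, []) if q == NAME]
            if names:
                node["name"] = names[0]  # copied from the linked node's subgraph
            doc.setdefault(p, []).append(node)
        else:
            doc.setdefault(p, []).append(o)
    return doc
```

Which properties get embedded, and how deep, is exactly the "what is the document?" decision from the previous slides.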

SLIDE 71

Auto suggest

Authority IDs must be easily found

SLIDE 72

Auto suggest

Authority IDs must be easily found => in need of auto suggest

SLIDE 73

Auto suggest

Auto suggest needs fast searching

SLIDE 74

Auto suggest

Demo

SLIDE 76

Auto suggest

RESTful APIs:

http://demo.lobid.org/search?format=short&index=gnd-index&author=Schmidt%2C+Karl
http://demo.lobid.org/search?format=page&index=gnd-index&author=Schmidt%2C+Karl
http://demo.lobid.org/search?format=full&index=gnd-index&author=Schmidt%2C+Karl

API usage:

GET /search?format=<page|full|short>&index=<lobid-index|gnd-index>&author=<query>

Easy to enhance with the Play framework and the elasticsearch API
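A client-side helper for this API could look like the following sketch. The base URL, parameter names, and allowed values are taken from the usage line on the slide; the function name and the validation are additions for illustration.

```python
from urllib.parse import urlencode

FORMATS = ("page", "full", "short")
INDEXES = ("lobid-index", "gnd-index")


def suggest_url(author, fmt="short", index="gnd-index",
                base="http://demo.lobid.org/search"):
    """Build a search URL for the auto-suggest API described on the slide."""
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {FORMATS}")
    if index not in INDEXES:
        raise ValueError(f"index must be one of {INDEXES}")
    params = {"format": fmt, "index": index, "author": author}
    return base + "?" + urlencode(params)
```

For example, suggest_url("Schmidt, Karl") reproduces the first URL listed on the slide, including the encoded comma and space.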

SLIDE 77

Auto suggest

RESTful APIs: http://demo.lobid.org/search?format=short&index=gnd-index&author=Schmidt%2C+Karl

[
  "Schmidt, Karl (1894-1945)",
  "Schmidt, Karl",
  "Schmidt, Karl (1910-)",
  "Schmidt, Karl (1846-1928)",
  "Schmidt, Karl (1913-)",
  "Schmidt, Karl (1899-)",
  "Schmidt, Karl (1924-)",
  "Schmidt, Karl (1836-1888)",
  "Schmidt, L. F. Karl",
  "Schmidt, Karl (1902-1945)",
  "Schmidt, Karl J.",
  "Schmidt, Karl (1848-1905)",
  "Schmidt, Karl (1817-1882)",
  "Schmidt, Karl R.",
  "Schmidt, Karl (1954-)",
  "Schmidt, Karl (1888-)",
  "Schmidt, Karl (1867-)",
  ...
]

SLIDE 78

Auto suggest

GND authority file in lobid-resources

SLIDE 82

Conclusion: a highly customizable/reliable/feature-rich LOD service

  • Search Engine, Webapp: highly available! We can do that.
  • Triple Store: for external access and some fancy nice-to-have stuff. Sometimes gets stuck!

Basic LOD functionality (and some other APIs) is highly available.

SLIDE 83

The software is Open Source:

https://github.com/lobid/
http://elasticsearch.org/
https://hadoop.apache.org/
http://www.playframework.org/
http://4store.org/

SLIDE 84

Any questions?

Pascal Christoph
semweb@hbz-nrw.de
christoph@hbz-nrw.de

SLIDE 85

Using a dark background, this presentation saves maybe 70% of energy