Compressed RDF: Practical Uses & Hands-on
Antonio Fariña, Javier D. Fernández and Miguel A. Martinez-Prieto
23RD AUGUST 2017
3rd KEYSTONE Training School: Keyword search in Big Linked Data
General agenda: Session I (09:00-10:30)
images: zurb.com
select distinct ?x { ?x rdfs:label "Axel Polleres" }
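The query above is a single triple pattern: a fixed predicate and object with a variable subject. The following is a minimal sketch of how such a pattern can be evaluated, assuming an in-memory list of triples whose terms are plain strings; the `match` helper and the `ex:` sample resources are hypothetical and not part of any tool discussed in this talk.

```python
from typing import Iterable, Iterator, Optional

Triple = tuple[str, str, str]


def match(triples: Iterable[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> Iterator[Triple]:
    """Yield the triples matching the pattern (s, p, o); None acts as a variable."""
    for ts, tp, to in triples:
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o):
            yield (ts, tp, to)


# Hypothetical sample data; literals keep their quotes, as in N-Triples.
data = [
    ("ex:axel", "rdfs:label", '"Axel Polleres"'),
    ("ex:apolleres", "rdfs:label", '"Axel Polleres"'),
    ("ex:javier", "rdfs:label", '"Javier D. Fernández"'),
    ("ex:axel", "foaf:knows", "ex:javier"),
]

# select distinct ?x { ?x rdfs:label "Axel Polleres" }
# The set comprehension plays the role of DISTINCT on the bound ?x values.
answers = sorted({x for x, _, _ in match(data, p="rdfs:label", o='"Axel Polleres"')})
print(answers)  # ['ex:apolleres', 'ex:axel']
```

Answering this over the whole Web of Data is the hard part: the same pattern has to be matched against every dataset that might contain such a label.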
6
LOD Laundromat: Linked Open Data served as Dataset 1 … Dataset 650K, each as N-Triples (zip), plus a SPARQL endpoint (metadata)
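Consuming such a dump means downloading the archive and parsing N-Triples line by line. A minimal sketch using only the Python standard library, assuming the archive holds `.nt` files; the regular expression covers only common N-Triples terms (IRIs, blank nodes, literals with an optional language tag or datatype) and is not a complete parser. `read_ntriples_zip` is a hypothetical helper.

```python
import io
import re
import zipfile
from typing import Iterator

# One N-Triples term: IRI, blank node, or literal with optional @lang / ^^<datatype>.
# Simplified on purpose: this is not a complete, validating N-Triples parser.
TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
STATEMENT = re.compile(rf'^{TERM}\s+{TERM}\s+{TERM}\s*\.$')


def read_ntriples_zip(archive: bytes) -> Iterator[tuple[str, str, str]]:
    """Yield (s, p, o) strings from every .nt file in a zip archive."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for name in zf.namelist():
            if not name.endswith(".nt"):
                continue
            with zf.open(name) as raw:
                for line in io.TextIOWrapper(raw, encoding="utf-8"):
                    line = line.strip()
                    if not line or line.startswith("#"):
                        continue  # skip blank lines and comments
                    m = STATEMENT.match(line)
                    if m is None:
                        raise ValueError(f"unsupported line in {name}: {line!r}")
                    yield m.group(1), m.group(2), m.group(3)


# Usage: build a tiny zipped dump in memory and read it back.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("dataset.nt",
                '<http://ex.org/a> <http://www.w3.org/2000/01/rdf-schema#label> "A"@en .\n'
                '<http://ex.org/a> <http://ex.org/p> _:b0 .\n')
triples = list(read_ntriples_zip(buf.getvalue()))
print(len(triples))  # 2
```

Note that the consumer must decompress and parse the whole file before it can answer even a single triple-pattern lookup.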
consume
http://www.kunsan.af.mil/News/Article/413995/serving-the-masses/
Oh man, I’m hungry and I don’t even know if I will like whatever you are cooking
https://hwy55burgers.wordpress.com/tag/food-challenge/
LOD-a-lot: 28B triples (kudos to Javier D. Fernández, Wouter Beek, Miguel A. Martínez-Prieto, and Mario Arias)
(LOD-a-lot creation took 64 h and 170 GB of RAM; building HDT-FoQ took 8 h and 250 GB of RAM.)
https://datahub.io/dataset/lod-a-lot http://purl.org/HDT/lod-a-lot
[Figure: number of sources (10^0 to 10^6) vs. update rate (second, minute, hour, day, week, month, year) for DBpedia, BTC, DyLDO, LOD-a-lot, the Internet and Virtual/Augmented Reality, with the open question "versions?". Source: Andreas Harth, "Stream Reasoning in Mixed Reality Applications", Stream Reasoning Workshop 2015.]
Most Semantic Web/Linked Data tools focus on this “static view” and do not consider versioning or evolution.
Sindice, SWSE, Swoogle, LOD Cache, LOD Laundromat… so far, no versions!
Research projects: Managing the Evolution and Preservation of the Data Web (FP7); Preserving Linked Data (FP7)
Archives, tools and benchmarking: BEAR (BEnchmark of RDF ARchives)
a) Independent Copies/Snapshots (IC)
V1: ex:C1 ex:hasProfessor ex:P1 . ex:S1 ex:study ex:C1 . ex:S2 ex:study ex:C1 .
V2: ex:C1 ex:hasProfessor ex:P1 . ex:S1 ex:study ex:C1 . ex:S3 ex:study ex:C1 .
V3: ex:C1 ex:hasProfessor ex:P2 . ex:C1 ex:hasProfessor ex:S2 . ex:S1 ex:study ex:C1 . ex:S3 ex:study ex:C1 .

b) Change-based approach (CB)
V1: ex:C1 ex:hasProfessor ex:P1 . ex:S1 ex:study ex:C1 . ex:S2 ex:study ex:C1 .
V1→V2 added: ex:S3 ex:study ex:C1 .
V1→V2 deleted: ex:S2 ex:study ex:C1 .
V2→V3 added: ex:C1 ex:hasProfessor ex:P2 . ex:C1 ex:hasProfessor ex:S2 .
V2→V3 deleted: ex:C1 ex:hasProfessor ex:P1 .

c) Timestamp-based approach (TB)
V1,2,3:
ex:C1 ex:hasProfessor ex:P1 [V1,V2] .
ex:C1 ex:hasProfessor ex:P2 [V3] .
ex:C1 ex:hasProfessor ex:S2 [V3] .
ex:S1 ex:study ex:C1 [V1,V2,V3] .
ex:S2 ex:study ex:C1 [V1] .
ex:S3 ex:study ex:C1 [V2,V3] .

Each approach is queried through its own retrieval mediator.
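To make the three policies concrete, here is a sketch that encodes the running example under IC, CB and TB and rebuilds every version from each representation. It is an in-memory illustration, assuming triples are string tuples and versions are numbered from 1; all names are hypothetical and this is not how any benchmarked archiving system actually stores data.

```python
Triple = tuple[str, str, str]

# The six distinct triples of the running example.
HAS_P1 = ("ex:C1", "ex:hasProfessor", "ex:P1")
HAS_P2 = ("ex:C1", "ex:hasProfessor", "ex:P2")
HAS_S2 = ("ex:C1", "ex:hasProfessor", "ex:S2")
S1_STUDIES = ("ex:S1", "ex:study", "ex:C1")
S2_STUDIES = ("ex:S2", "ex:study", "ex:C1")
S3_STUDIES = ("ex:S3", "ex:study", "ex:C1")

# a) IC: one full snapshot per version (index 0 holds V1).
IC = [
    {HAS_P1, S1_STUDIES, S2_STUDIES},
    {HAS_P1, S1_STUDIES, S3_STUDIES},
    {HAS_P2, HAS_S2, S1_STUDIES, S3_STUDIES},
]

# b) CB: the V1 snapshot plus one (added, deleted) delta per later version.
CB_BASE = {HAS_P1, S1_STUDIES, S2_STUDIES}
CB_DELTAS = [
    ({S3_STUDIES}, {S2_STUDIES}),   # V1 -> V2
    ({HAS_P2, HAS_S2}, {HAS_P1}),   # V2 -> V3
]

# c) TB: each distinct triple annotated with the versions where it holds.
TB = {
    HAS_P1: {1, 2}, HAS_P2: {3}, HAS_S2: {3},
    S1_STUDIES: {1, 2, 3}, S2_STUDIES: {1}, S3_STUDIES: {2, 3},
}


def materialize_ic(v: int) -> set[Triple]:
    return set(IC[v - 1])


def materialize_cb(v: int) -> set[Triple]:
    # Start from V1 and replay the first v-1 deltas in order.
    current = set(CB_BASE)
    for added, deleted in CB_DELTAS[: v - 1]:
        current = (current - deleted) | added
    return current


def materialize_tb(v: int) -> set[Triple]:
    return {t for t, versions in TB.items() if v in versions}


# All three policies must agree on the content of every version.
for v in (1, 2, 3):
    assert materialize_ic(v) == materialize_cb(v) == materialize_tb(v)
```

The example already shows the trade-off: IC stores ex:S1 ex:study ex:C1 three times, CB stores each change once but must replay deltas to rebuild a version, and TB stores each distinct triple once together with its set of versions.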
https://aic.ai.wu.ac.at/qadlod/bear.html
based on IC, CB and TB policies.
SELECT * WHERE { Q :[v1] }
[1] Antoine Zimmermann, Nuno Lopes, Axel Polleres, and Umberto Straccia. A general framework for representing, reasoning and querying with annotated Semantic Web data. Journal of Web Semantics (JWS), 12:72--95, March 2012.
SELECT * WHERE { { { {Q :[v1]} MINUS {Q :[v2]} } BIND (v1 AS ?V) } UNION { { {Q :[v2]} MINUS {Q :[v1]} } BIND (v2 AS ?V) } }
SELECT * WHERE { Q :?V }
SELECT * WHERE { {Q :[v1]} {Q :[v2]} }
SELECT ?V1 ?V2 WHERE { {{Q :?V1 } MINUS {Q :?V2}} UNION {{Q :?V2 } MINUS {Q :?V1}} FILTER( abs(?V1-?V2) = 1 ) }
Open question: what is the right query syntax for archive queries?
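The version query (Q :?V) and the change query with abs(?V1-?V2) = 1 can both be read as operations over the TB annotations of the running example. A minimal sketch, assuming a triple pattern with None as the variable; `answer`, `version_query` and `changed_pairs` are hypothetical names. Unlike the SPARQL form, which may return both orders of (?V1, ?V2), this sketch reports each consecutive pair once as (v, v+1).

```python
from typing import Optional

Triple = tuple[str, str, str]
Pattern = tuple[Optional[str], Optional[str], Optional[str]]

# TB annotations of the running example: triple -> versions where it holds.
TB: dict[Triple, set[int]] = {
    ("ex:C1", "ex:hasProfessor", "ex:P1"): {1, 2},
    ("ex:C1", "ex:hasProfessor", "ex:P2"): {3},
    ("ex:C1", "ex:hasProfessor", "ex:S2"): {3},
    ("ex:S1", "ex:study", "ex:C1"): {1, 2, 3},
    ("ex:S2", "ex:study", "ex:C1"): {1},
    ("ex:S3", "ex:study", "ex:C1"): {2, 3},
}
NUM_VERSIONS = 3


def matches(t: Triple, pattern: Pattern) -> bool:
    # None plays the role of a SPARQL variable.
    return all(q is None or q == x for q, x in zip(pattern, t))


def answer(pattern: Pattern, v: int) -> set[Triple]:
    """Q :[v]: matches of the pattern in version v."""
    return {t for t, vs in TB.items() if matches(t, pattern) and v in vs}


def version_query(pattern: Pattern) -> list[tuple[Triple, int]]:
    """Q :?V: every match paired with each version it holds in."""
    return sorted((t, v) for t, vs in TB.items() if matches(t, pattern) for v in vs)


def changed_pairs(pattern: Pattern) -> list[tuple[int, int]]:
    """Consecutive versions (v, v+1) whose answers to the pattern differ."""
    return [(v, v + 1) for v in range(1, NUM_VERSIONS)
            if answer(pattern, v) != answer(pattern, v + 1)]


print(changed_pairs((None, "ex:study", "ex:C1")))          # [(1, 2)]
print(changed_pairs((None, "ex:hasProfessor", None)))      # [(2, 3)]
```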
Materialize (s,?,? ; version)
diff(?,?,o ; version_0 ; version_t)
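These two operations restrict archive queries to a triple pattern plus one or two versions. A sketch over IC snapshots of the running example, assuming versions are numbered from 1 and `?` is written as None; the function names mirror the notation above but are otherwise hypothetical.

```python
from typing import Iterable, Optional

Triple = tuple[str, str, str]
Pattern = tuple[Optional[str], Optional[str], Optional[str]]

# IC archive of the running example: index i holds version i+1.
SNAPSHOTS: list[set[Triple]] = [
    {("ex:C1", "ex:hasProfessor", "ex:P1"), ("ex:S1", "ex:study", "ex:C1"),
     ("ex:S2", "ex:study", "ex:C1")},
    {("ex:C1", "ex:hasProfessor", "ex:P1"), ("ex:S1", "ex:study", "ex:C1"),
     ("ex:S3", "ex:study", "ex:C1")},
    {("ex:C1", "ex:hasProfessor", "ex:P2"), ("ex:C1", "ex:hasProfessor", "ex:S2"),
     ("ex:S1", "ex:study", "ex:C1"), ("ex:S3", "ex:study", "ex:C1")},
]


def select(triples: Iterable[Triple], pattern: Pattern) -> set[Triple]:
    # Keep the triples whose fixed positions equal the pattern's constants.
    return {t for t in triples if all(q is None or q == x for q, x in zip(pattern, t))}


def materialize(pattern: Pattern, version: int) -> set[Triple]:
    """Materialize(pattern ; version): matches in a single version."""
    return select(SNAPSHOTS[version - 1], pattern)


def diff(pattern: Pattern, v0: int, vt: int) -> tuple[set[Triple], set[Triple]]:
    """diff(pattern ; v0 ; vt): (added, deleted) matches going from v0 to vt."""
    before, after = materialize(pattern, v0), materialize(pattern, vt)
    return after - before, before - after


# Materialize(s,?,? ; version) with s = ex:C1 and version 3.
print(sorted(materialize(("ex:C1", None, None), 3)))
# diff(?,?,o ; version_0 ; version_t) with o = ex:C1, from V1 to V3.
added, deleted = diff((None, None, "ex:C1"), 1, 3)
```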
[Figure: version bitmaps for 5 triples and 3 versions. tpv: one bitmap per version (Bv1, Bv2, Bv3) with one bit per triple (1 to 5). vpt: one bitmap per triple (Bt1 to Bt5) with one bit per version (1 to 3).]
[2] Ana Cerdeira-Pena, Antonio Fariña, Javier D. Fernández, and Miguel A. Martínez-Prieto. Self-Indexing RDF Archives. Data Compression Conference (DCC), 2016.
The self-indexed archive of [2] resolves queries more than one order of magnitude faster than Jena-TDB.
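The figure contrasts two layouts for version bitmaps. A minimal sketch of both over the six distinct triples of the running example, assuming plain Python lists as bitmaps (a real compressed index would store compact bitmaps instead) and sorting as a stand-in for triple IDs; `triples_in` and `versions_of` are hypothetical names. With tpv, "which triples are in version v" is answered by reading one bitmap Bv; with vpt, "in which versions does triple t hold" is answered by reading one bitmap Bt.

```python
Triple = tuple[str, str, str]

# Running example as TB annotations: triple -> versions (1-based) where it holds.
ARCHIVE: dict[Triple, set[int]] = {
    ("ex:C1", "ex:hasProfessor", "ex:P1"): {1, 2},
    ("ex:C1", "ex:hasProfessor", "ex:P2"): {3},
    ("ex:C1", "ex:hasProfessor", "ex:S2"): {3},
    ("ex:S1", "ex:study", "ex:C1"): {1, 2, 3},
    ("ex:S2", "ex:study", "ex:C1"): {1},
    ("ex:S3", "ex:study", "ex:C1"): {2, 3},
}
NUM_VERSIONS = 3

# Fix an order for the distinct triples; position i stands in for triple ID i+1.
TRIPLES = sorted(ARCHIVE)

# tpv: bitmap Bv for each version, one bit per triple.
TPV = [[1 if v in ARCHIVE[t] else 0 for t in TRIPLES]
       for v in range(1, NUM_VERSIONS + 1)]

# vpt: bitmap Bt for each triple, one bit per version.
VPT = [[1 if v in ARCHIVE[t] else 0 for v in range(1, NUM_VERSIONS + 1)]
       for t in TRIPLES]


def triples_in(version: int) -> list[Triple]:
    """Scan Bv of the version (tpv layout)."""
    return [TRIPLES[i] for i, bit in enumerate(TPV[version - 1]) if bit]


def versions_of(triple: Triple) -> list[int]:
    """Scan Bt of the triple (vpt layout)."""
    row = VPT[TRIPLES.index(triple)]
    return [i + 1 for i, bit in enumerate(row) if bit]


print(TPV[0])                                         # [1, 0, 0, 1, 1, 0]
print(versions_of(("ex:S3", "ex:study", "ex:C1")))    # [2, 3]
```

Both layouts hold the same bits (one is the transpose of the other); they differ only in which question can be answered by reading a single bitmap.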
[Figure: the Linked Open Data Cloud (e.g., dbpedia) next to a Linked Closed Data Cloud of graphs G1a to G4a, G1b to G3b, G1c and G2c: the “Deep Semantic Web”.]
Javier D. Fernández, Sabrina Kirrane, Axel Polleres, and Simon Steyskal. Self-Enforcing Access Control for Encrypted RDF. ESWC 2017.
https://aic.ai.wu.ac.at/qadlod/presentations/keystoneHandsOn2017.pdf
https://aic.ai.wu.ac.at/qadlod/presentations/codeKeystone2017
OR
OR convert your RDF dataset with the tool.
rdfhdt.org: HDT-C++ and HDT-Java
Command line tools: HDT-C++ and HDT-Java
TP search: HDT-C++ and HDT-Java
Full SPARQL (Jena, Fuseki), parametrizable compression, LDF
simplicity in the installation).
hdt, e.g.
Java)
will be the “pocket” data tomorrow
Linked Data
= Cheap, scalable consumers