 
Compressed RDF: Practical Uses & Hands-on
Antonio Fariña, Javier D. Fernández and Miguel A. Martinez-Prieto
3rd KEYSTONE Training School: Keyword search in Big Linked Data
23rd August 2017
General agenda
Session I (09:00 - 10:30) "Basics of Compression for Big Linked Data Management"
  - Big (Linked) Semantic Data Compression: motivation & challenges
  - Compact Data Structures
Session II (13:30 - 15:00) "RDF Compression"
  - RDF Compression. HDT
  - RDF Dictionaries
  - RDF Triples
Session III (15:30 - 17:00) "Compressed RDF: Practical Uses & Hands-on"
  - Practical Uses (LOD-a-lot, RDF Archiving, etc.)
  - Hands on
Agenda of this session
Practical uses
  - LOD-a-lot: Web-scale queries in your pocket
  - RDF archiving
  - Linked Data markets (Linked Close Data)
Hands on
  - HDT-it
  - Command line tools
  - HDT and Fuseki
  - HDT and Linked Data Fragments
  - HDT and C++/Java
  - HDT and Jena
Use case 1: LOD-a-lot
Still… what about Web-scale queries?
E.g. retrieve all entities in LOD with the label "Axel Polleres":
  select distinct ?x { ?x rdfs:label "Axel Polleres" }
Options:
  - Crawl and index LOD locally (-no-)
  - Follow-your-nose (where should I start?)
  - Federated querying (as good as the endpoints you query)
  - Use LOD Laundromat as a "good approximation" (still querying 650K datasets)
LOD Laundromat
[Figure: Linked Open Data; LOD Laundromat; SPARQL endpoint (metadata); Dataset 1 ... Dataset 650K, each as N-Triples (zip)]
But what about Web-scale queries?
LOD-a-lot (flashback)
The real motivation
"Oh man, I'm hungry and I don't even know if I will like whatever you are cooking"
[Figure label: consume. Image: http://www.kunsan.af.mil/News/Article/413995/serving-the-masses/]
But what about Web-scale queries?
But one could be really hungry: LOD-a-lot
[Image: https://hwy55burgers.wordpress.com/tag/food-challenge/]
LOD-a-lot
[Figure: Linked Open Data; LOD Laundromat; SPARQL endpoint (metadata); Dataset 1 ... Dataset 650K, each as N-Triples (zip); LOD-a-lot: 28B triples]
Kudos: Javier D. Fernandez, Wouter Beek, Miguel A. Martínez-Prieto, and Mario Arias
LOD-a-lot (some numbers)
Disk size:
  - HDT: 304 GB
  - HDT-FoQ (additional indexes): 133 GB
  - 305 €
Memory footprint (to query):
  - 15.7 GB of RAM (3% of the size)
  - 144 seconds loading time
  - 8 cores (2.6 GHz), RAM 32 GB, SATA HDD on Ubuntu 14.04.5 LTS
LDF page resolution in milliseconds.
(LOD-a-lot creation took 64 h & 170 GB RAM. HDT-FoQ took 8 h & 250 GB RAM)
LOD-a-lot
  - http://purl.org/HDT/lod-a-lot
  - https://datahub.io/dataset/lod-a-lot
LOD-a-lot (some use cases)
  - Query resolution at Web scale
  - Evaluation and Benchmarking
      - No excuse
  - RDF metrics and analytics
[Figure labels: subjects, predicates, objects]
ACKs LOD-a-lot
Use case 2: Archiving
So far so good... But RDF is evolving
[Figure: update rate (second, minute, hour, day, week, month, year) against number of sources (10^0 to 10^6). Labels: Virtual/Augmented Reality, Internet of Things, Dyldo, LOD-a-lot, DBpedia, BTC, "versions?". Source: Andreas Harth, Stream Reasoning in Mixed Reality Applications, Stream Reasoning Workshop 2015]
Linked Data Archives: The missing link in the RDF evolution
Most Semantic Web/Linked Data tools focus on this "static view" and do not consider versioning/evolution.
Sindice, SWSE, Swoogle, LOD Cache, LOD-Laundromat … so far, no versions!
Preservation matters
Web archives: Common Crawl, Internet Memory, Internet Archive, …
…in the last few years: RDF evolution at scale, one of the fundamental problems in the Web of Data
Research projects:
  - Managing the Evolution and Preservation of the Data Web (FP7)
  - Preserving Linked Data (FP7)
Archives
Tools: v-RDFCSA
Benchmarking: BEnchmark of RDF ARchives (BEAR)
RDF Archiving. Archiving policies
Running example with three versions (V1, V2, V3); each policy is accessed through a retrieval mediator.

a) Independent Copies/Snapshots (IC): every version is stored in full.
  V1:
    ex:C1 ex:hasProfessor ex:P1 .
    ex:S1 ex:study ex:C1 .
    ex:S2 ex:study ex:C1 .
  V2:
    ex:C1 ex:hasProfessor ex:P1 .
    ex:S1 ex:study ex:C1 .
    ex:S3 ex:study ex:C1 .
  V3:
    ex:C1 ex:hasProfessor ex:P2 .
    ex:C1 ex:hasProfessor ex:S2 .
    ex:S1 ex:study ex:C1 .
    ex:S3 ex:study ex:C1 .

b) Change-based approach (CB): V1 is stored in full, followed by the changes between consecutive versions.
  V1:
    ex:C1 ex:hasProfessor ex:P1 .
    ex:S1 ex:study ex:C1 .
    ex:S2 ex:study ex:C1 .
  V1 to V2:
    deleted: ex:S2 ex:study ex:C1 .
    added:   ex:S3 ex:study ex:C1 .
  V2 to V3:
    deleted: ex:C1 ex:hasProfessor ex:P1 .
    added:   ex:C1 ex:hasProfessor ex:P2 .
    added:   ex:C1 ex:hasProfessor ex:S2 .

c) Timestamp-based approach (TB): each triple is stored once, annotated with the versions in which it holds (V1,2,3).
    ex:C1 ex:hasProfessor ex:P1 [V1,V2].
    ex:C1 ex:hasProfessor ex:P2 [V3].
    ex:C1 ex:hasProfessor ex:S2 [V3].
    ex:S1 ex:study ex:C1 [V1,V2,V3].
    ex:S2 ex:study ex:C1 [V1].
    ex:S3 ex:study ex:C1 [V2,V3].
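The three policies can be sketched in a few lines of Python. This is only an illustration over the toy versions of the slide, not how Jena-TDB or HDT store archives: triples are plain tuples, and the function names (`independent_copies`, `change_based`, `timestamp_based`) are hypothetical names chosen for this sketch.

```python
# Toy versions from the slide; triples are (subject, predicate, object) tuples.
V1 = {("ex:C1", "ex:hasProfessor", "ex:P1"),
      ("ex:S1", "ex:study", "ex:C1"),
      ("ex:S2", "ex:study", "ex:C1")}
V2 = {("ex:C1", "ex:hasProfessor", "ex:P1"),
      ("ex:S1", "ex:study", "ex:C1"),
      ("ex:S3", "ex:study", "ex:C1")}
V3 = {("ex:C1", "ex:hasProfessor", "ex:P2"),
      ("ex:C1", "ex:hasProfessor", "ex:S2"),
      ("ex:S1", "ex:study", "ex:C1"),
      ("ex:S3", "ex:study", "ex:C1")}
VERSIONS = [V1, V2, V3]


def independent_copies(versions):
    """IC: keep every version as a full, independent snapshot."""
    return [set(v) for v in versions]


def change_based(versions):
    """CB: keep the first snapshot plus one (added, deleted) delta per step."""
    base = set(versions[0])
    deltas = [(cur - prev, prev - cur)
              for prev, cur in zip(versions, versions[1:])]
    return base, deltas


def timestamp_based(versions):
    """TB: store each triple once, annotated with the versions where it holds."""
    annotated = {}
    for number, snapshot in enumerate(versions, start=1):
        for triple in snapshot:
            annotated.setdefault(triple, []).append(number)
    return annotated


if __name__ == "__main__":
    for triple, held_in in sorted(timestamp_based(VERSIONS).items()):
        print(" ".join(triple), held_in)
```

The trade-off is visible even at this size: IC repeats unchanged triples in every snapshot, CB stores only what changed but must replay deltas to rebuild a version, and TB stores each triple once at the cost of carrying version annotations.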
BEAR
https://aic.ai.wu.ac.at/qadlod/bear.html
BEAR: Benchmarking the Efficiency of RDF Archiving
Queries and systems
  - We implemented and evaluated archiving systems on Jena-TDB and HDT, based on the IC, CB and TB policies.
  - They serve as an initial baseline to compare archiving systems.
  - More info: https://aic.ai.wu.ac.at/qadlod/bear.html
Benchmarking: Define the queries
Instantiation of archive queries in AnQL [1]

  - Mat(Q,V1): version materialization
      SELECT * WHERE { Q :[v1] }

  - Diff(Q,V1,V2): delta materialization
      SELECT * WHERE {
        { { {Q :[v1]} MINUS {Q :[v2]} } BIND (v1 AS ?V) }
        UNION
        { { {Q :[v2]} MINUS {Q :[v1]} } BIND (v2 AS ?V) }
      }

  - Ver(Q): results of Q annotated with the version
      SELECT * WHERE { Q :?V }

  - join(Q1,v1,Q2,v2): join of two queries evaluated on two versions
      SELECT * WHERE { {Q1 :[v1]} {Q2 :[v2]} }

  - Change(Q): returns consecutive versions in which Diff of a query is not null
      SELECT ?V1 ?V2 WHERE {
        { {Q :?V1} MINUS {Q :?V2} }
        UNION
        { {Q :?V2} MINUS {Q :?V1} }
        FILTER( abs(?V1-?V2) = 1 )
      }

Open question remains: What is the right query syntax for archive queries?

[1] Antoine Zimmermann, Nuno Lopes, Axel Polleres, and Umberto Straccia. A general framework for representing, reasoning and querying with annotated Semantic Web data. Journal of Web Semantics (JWS), 12:72--95, March 2012.
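To make the semantics of these query types concrete, here is a minimal Python sketch over a timestamp-based store holding the toy archive of the policies slide. It is not AnQL and not the BEAR implementation: Q is simplified to a single triple pattern with `None` as wildcard, the function names are hypothetical, `change` reports each consecutive pair once in ascending order, and the cross-version join is left out.

```python
# Timestamp-based store: triple -> set of versions in which the triple holds.
TB = {
    ("ex:C1", "ex:hasProfessor", "ex:P1"): {1, 2},
    ("ex:C1", "ex:hasProfessor", "ex:P2"): {3},
    ("ex:C1", "ex:hasProfessor", "ex:S2"): {3},
    ("ex:S1", "ex:study", "ex:C1"): {1, 2, 3},
    ("ex:S2", "ex:study", "ex:C1"): {1},
    ("ex:S3", "ex:study", "ex:C1"): {2, 3},
}


def matches(triple, pattern):
    """True if the triple fits the pattern; None in the pattern is a wildcard."""
    return all(p is None or p == t for t, p in zip(triple, pattern))


def mat(store, q, v):
    """Mat(Q,V): results of Q in version v."""
    return {t for t, held_in in store.items() if v in held_in and matches(t, q)}


def diff(store, q, v1, v2):
    """Diff(Q,V1,V2): results present in only one of the two versions,
    each tagged with the version in which it appears."""
    a, b = mat(store, q, v1), mat(store, q, v2)
    return {(t, v1) for t in a - b} | {(t, v2) for t in b - a}


def ver(store, q):
    """Ver(Q): results of Q annotated with every version where they hold."""
    return {(t, v) for t, held_in in store.items() if matches(t, q)
            for v in held_in}


def change(store, q):
    """Change(Q): consecutive version pairs whose Diff for Q is not empty."""
    last = max(v for held_in in store.values() for v in held_in)
    return [(v, v + 1) for v in range(1, last) if diff(store, q, v, v + 1)]


if __name__ == "__main__":
    students = (None, "ex:study", "ex:C1")
    print(sorted(mat(TB, students, 2)))
    print(change(TB, students))
```

With a TB store, Mat and Ver reduce to filtering on the version annotation, which is why this policy suits version-annotated queries; on IC or CB stores the same functions would first have to locate or rebuild the requested snapshots.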
Time-based access. Queries
  - Materialize (s,?,? ; version)
  - diff(?,?,o ; version0 ; version t)
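As a rough illustration of these two access patterns, the sketch below answers them over a change-based store by replaying deltas. The data is the toy archive of the policies slide; the store layout and the function names (`snapshot`, `materialize_subject`, `diff_object`) are assumptions made for this example, not part of HDT or BEAR.

```python
# Change-based store: the first version in full, then (added, deleted) deltas.
BASE = {("ex:C1", "ex:hasProfessor", "ex:P1"),
        ("ex:S1", "ex:study", "ex:C1"),
        ("ex:S2", "ex:study", "ex:C1")}
DELTAS = [
    # V1 -> V2
    ({("ex:S3", "ex:study", "ex:C1")},
     {("ex:S2", "ex:study", "ex:C1")}),
    # V2 -> V3
    ({("ex:C1", "ex:hasProfessor", "ex:P2"),
      ("ex:C1", "ex:hasProfessor", "ex:S2")},
     {("ex:C1", "ex:hasProfessor", "ex:P1")}),
]


def snapshot(version):
    """Rebuild a version (1-based) by replaying deltas on top of the base."""
    current = set(BASE)
    for added, deleted in DELTAS[:version - 1]:
        current = (current - deleted) | added
    return current


def materialize_subject(s, version):
    """Materialize (s,?,? ; version): triples with subject s in that version."""
    return {t for t in snapshot(version) if t[0] == s}


def diff_object(o, v0, vt):
    """diff(?,?,o ; v0 ; vt): (added, deleted) triples with object o
    between version v0 and version vt."""
    before = {t for t in snapshot(v0) if t[2] == o}
    after = {t for t in snapshot(vt) if t[2] == o}
    return after - before, before - after


if __name__ == "__main__":
    print(sorted(materialize_subject("ex:C1", 3)))
    print(diff_object("ex:C1", 1, 3))
```

Replaying deltas keeps the store small, but the cost of materializing a version grows with its distance from the base snapshot.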