
SLIDE 1

Compressed RDF: Practical Uses & Hands-on

Antonio Fariña, Javier D. Fernández and Miguel A. Martínez-Prieto

23RD AUGUST 2017

3rd KEYSTONE Training School Keyword search in Big Linked Data

SLIDE 2
  • Session I (09:00 - 10:30) "Basics of Compression for Big Linked Data Management"
    • Big (Linked) Semantic Data Compression: motivation & challenges
    • Compact Data Structures
  • Session II (13:30 - 15:00) "RDF Compression"
    • RDF Compression. HDT
    • RDF Dictionaries
    • RDF Triples
  • Session III (15:30 - 17:00) "Compressed RDF: Practical Uses & Hands-on"
    • Practical Uses (LOD-a-lot, RDF Archiving, etc.)
    • Hands on


General agenda


SLIDE 3
  • Practical uses
    • LOD-a-lot: Web-scale queries in your pocket
    • RDF archiving
    • Linked Data markets (Linked Closed Data)
  • Hands on
    • HDT-it!
    • Command line tools
    • HDT and Fuseki
    • HDT and Linked Data Fragments
    • HDT and C++/Java
    • HDT and Jena


Agenda of this session


SLIDE 4

LOD-a-lot

Use case 1

SLIDE 5
  • E.g. retrieve all entities in LOD with the label "Axel Polleres"
  • Options:
    • Crawl and index LOD locally (-no-)
    • Follow-your-nose (where should I start?)
    • Federated querying (as good as the endpoints you query)
    • Use LOD Laundromat as a "good approximation" (still querying 650K datasets)


Still… what about Web-scale queries?

select distinct ?x { ?x rdfs:label "Axel Polleres" }

SLIDE 6


[Diagram: LOD Laundromat crawls the Linked Open Data cloud, cleans each of its 650K datasets (Dataset 1 … Dataset 650K) into N-Triples (zip) dumps, and offers a SPARQL endpoint over the metadata]

SLIDE 7

LOD-a-lot

But what about Web-scale queries?

- flashback -
SLIDE 8

The real motivation

consume

SLIDE 9

The real motivation


Oh man, I’m hungry and I don’t even know if I will like whatever you are cooking

consume


SLIDE 11

LOD-a-lot


But what about Web-scale queries? And one could be really hungry…


SLIDE 12

[Diagram: the 650K cleaned N-Triples (zip) datasets of LOD Laundromat, plus its SPARQL endpoint (metadata), are integrated into a single HDT file: LOD-a-lot]

Kudos: Javier D. Fernández, Wouter Beek, Miguel A. Martínez-Prieto, and Mario Arias

28B triples

SLIDE 13
  • Disk size:
    • HDT: 304 GB
    • HDT-FoQ (additional indexes): 133 GB
  • Memory footprint (to query):
    • 15.7 GB of RAM (3% of the size)
    • 144 seconds loading time
    • 8 cores (2.6 GHz), 32 GB RAM, SATA HDD on Ubuntu 14.04.5 LTS
  • LDF page resolution in milliseconds.


LOD-a-lot (some numbers)

305€ (approximate cost of a machine able to query it)

(LOD-a-lot creation took 64 h & 170GB RAM. HDT-FoQ took 8 h & 250GB RAM)

SLIDE 14


LOD-a-lot

https://datahub.io/dataset/lod-a-lot
http://purl.org/HDT/lod-a-lot

SLIDE 15
  • Query resolution at Web scale
  • Evaluation and Benchmarking
  • No excuse :)
  • RDF metrics and analytics


LOD-a-lot (some use cases)

[Charts: distribution of subjects, predicates and objects]
SLIDE 16


ACKs LOD-a-lot

SLIDE 17

Archiving

Use case 2

SLIDE 18

Andreas Harth, "Stream Reasoning in Mixed Reality Applications", Stream Reasoning Workshop 2015

So far so good... But RDF is evolving

[Chart: number of sources (10^0 to 10^6) vs. update rate (from yearly down to per-second), placing DBpedia, BTC, Dyldo, the Internet of Things and Virtual/Augmented Reality; and where do versions fit? What about LOD-a-lot?]

SLIDE 19


Most semantic Web/Linked Data tools are focused on this "static view" but do not consider versioning/evolution

Linked Data Archives: The missing link in the RDF evolution

Sindice, SWSE, Swoogle, LOD Cache, LOD-Laundromat… so far, no versions!

SLIDE 20
  • Web archives: Common Crawl, Internet Memory, Internet Archive, …


Preservation matters

SLIDE 21


…in the last few years:

  • Research projects: Managing the Evolution and Preservation of the Data Web (FP7), Preserving Linked Data (FP7)
  • Archives and tools: v-RDFCSA, RDF evolution at scale
  • Benchmarking: BEAR (BEnchmark of RDF ARchives)

One of the fundamental problems in the Web of Data.


SLIDE 23


RDF Archiving. Archiving policies

a) Independent Copies/Snapshots (IC): every version is stored in full.
   V1: ex:C1 ex:hasProfessor ex:P1 . ex:S1 ex:study ex:C1 . ex:S2 ex:study ex:C1 .
   V2: ex:C1 ex:hasProfessor ex:P1 . ex:S1 ex:study ex:C1 . ex:S3 ex:study ex:C1 .
   V3: ex:C1 ex:hasProfessor ex:P2 . ex:C1 ex:hasProfessor ex:S2 . ex:S1 ex:study ex:C1 . ex:S3 ex:study ex:C1 .

b) Change-based approach (CB): V1 is stored in full; each subsequent version stores only the triples added and deleted with respect to the previous one, and a RETRIEVAL MEDIATOR replays the deltas to reconstruct a version.

c) Timestamp-based approach (TB): each distinct triple is stored once, annotated with the versions in which it holds (a RETRIEVAL MEDIATOR filters by version):
   ex:C1 ex:hasProfessor ex:P1 [V1,V2]. ex:C1 ex:hasProfessor ex:P2 [V3]. ex:C1 ex:hasProfessor ex:S2 [V3]. ex:S1 ex:study ex:C1 [V1,V2,V3]. ex:S2 ex:study ex:C1 [V1]. ex:S3 ex:study ex:C1 [V2,V3].
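To make the CB policy concrete, here is a toy sketch (not from the slides; triples are plain strings and all names are made up) of how a retrieval mediator materializes a version from the V1 snapshot plus the per-version deltas above:

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CBMediator {

    // Materialize version v (1-based) from the full V1 snapshot
    // by replaying the added/deleted deltas of V2..Vv
    static Set<String> materialize(Set<String> v1,
                                   List<Set<String>> added,
                                   List<Set<String>> deleted,
                                   int v) {
        Set<String> result = new HashSet<>(v1);
        for (int i = 0; i < v - 1; i++) {
            result.addAll(added.get(i));
            result.removeAll(deleted.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> v1 = new HashSet<>(Arrays.asList(
                "ex:C1 ex:hasProfessor ex:P1", "ex:S1 ex:study ex:C1", "ex:S2 ex:study ex:C1"));
        // Deltas V1->V2 and V2->V3, as in the example above
        List<Set<String>> added = Arrays.asList(
                new HashSet<>(Arrays.asList("ex:S3 ex:study ex:C1")),
                new HashSet<>(Arrays.asList("ex:C1 ex:hasProfessor ex:P2", "ex:C1 ex:hasProfessor ex:S2")));
        List<Set<String>> deleted = Arrays.asList(
                new HashSet<>(Arrays.asList("ex:S2 ex:study ex:C1")),
                new HashSet<>(Arrays.asList("ex:C1 ex:hasProfessor ex:P1")));
        System.out.println(materialize(v1, added, deleted, 3)); // prints the V3 triples
    }
}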

SLIDE 24


BEAR

https://aic.ai.wu.ac.at/qadlod/bear.html

SLIDE 25
  • Queries and systems
  • We implemented and evaluated archiving systems on Jena TDB and HDT, based on the IC, CB and TB policies.
  • These serve as an initial baseline to compare archiving systems
  • More info: https://aic.ai.wu.ac.at/qadlod/bear.html


BEAR: Benchmarking the Efficiency of RDF Archiving

SLIDE 26

RDF Archiving. Archiving policies (recap: the IC, CB and TB policies of SLIDE 23)

SLIDE 27
  • Instantiation of archive queries in AnQL [1]
  • Mat(Q,V1): version materialization
  • Diff(Q,V1,V2)
  • Ver(Q)
  • join(Q1,vi,Q2,vj)
  • Change(Q)


Benchmarking: Define the queries

SELECT * WHERE { Q :[v1] }

[1] Antoine Zimmermann, Nuno Lopes, Axel Polleres, and Umberto Straccia. A general framework for representing, reasoning and querying with annotated Semantic Web data. Journal of Web Semantics (JWS), 12:72–95, March 2012.

SLIDE 28
  • Instantiation of archive queries in AnQL
  • Mat(Q,V1)
  • Diff(Q,V1,V2): delta materialization
  • Ver(Q)
  • join(Q1,vi,Q2,vj)
  • Change(Q)


Benchmarking: Define the queries

SELECT * WHERE { { { {Q :[v1]} MINUS {Q :[v2]} } BIND (v1 AS ?V) } UNION { { {Q :[v2]} MINUS {Q :[v1]} } BIND (v2 AS ?V) } }

SLIDE 29
  • Instantiation of archive queries in AnQL
  • Mat(Q,V1)
  • Diff(Q,V1,V2)
  • Ver(Q): results of Q annotated with the version
  • join(Q1,vi,Q2,vj)
  • Change(Q)


Benchmarking: Define the queries

SELECT * WHERE { Q :?V }

SLIDE 30
  • Instantiation of archive queries in AnQL
  • Mat(Q,V1)
  • Diff(Q,V1,V2)
  • Ver(Q)
  • join(Q1,v1,Q2,v2): join the results of Q1 in version v1 with those of Q2 in version v2
  • Change(Q)


Benchmarking: Define the queries

SELECT * WHERE { {Q1 :[v1]} {Q2 :[v2]} }

SLIDE 31
  • Instantiation of archive queries in AnQL
  • Mat(Q,V1)
  • Diff(Q,V1,V2)
  • Ver(Q)
  • join(Q1,vi,Q2,vj)
  • Change(Q): returns the consecutive versions in which the Diff of a query is not empty


Benchmarking: Define the queries

SELECT ?V1 ?V2 WHERE { {{Q :?V1 } MINUS {Q :?V2}} UNION {{Q :?V2 } MINUS {Q :?V1}} FILTER( abs(?V1-?V2) = 1 ) }

An open question remains: what is the right query syntax for archive queries?

SLIDE 32


Time-based access. Queries

Materialize (s,?,? ; version)

SLIDE 33


Time-based access. Queries

diff(?,?,o ; version0 ; version t)

SLIDE 34
  • RDFCSA: Compressed Suffix Array
  • v-RDFCSA [2] is designed as a lightweight TB approach
  • Version information encoding:
    • Any triple can be identified by the position of its subject within the suffix array
    • Let N be the number of distinct versions and n the number of version-oblivious triples
    • Two encoding strategies:
      • tpv (triples per version): N bitsequences Bv_j[1, n] encoding which triples appear in version j
      • vpt (versions per triple): n bitsequences Bt_k[1, N] encoding the versions in which the k-th triple occurs

Self-Indexing RDF Archives: v-RDFCSA

[Figure: tpv stores one bitsequence per version (Bv_1, Bv_2, Bv_3) over the triples 1..5; vpt stores one bitsequence per triple (Bt_1 .. Bt_5) over the versions 1..3]

[2] Ana Cerdeira-Pena, Antonio Fariña, Javier D. Fernández, and Miguel A. Martínez-Prieto. Self-Indexing RDF Archives. Data Compression Conference (DCC), 2016.

v-RDFCSA performs more than one order of magnitude faster than Jena TDB for query resolution
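As a toy illustration of the duality between the two encodings, here is a sketch using plain java.util.BitSet stand-ins over a made-up archive (v-RDFCSA itself uses compressed bitsequences, not BitSets):

import java.util.BitSet;

public class VersionBitmaps {
    public static void main(String[] args) {
        final int n = 5; // version-oblivious triples (0-based ids 0..4)
        final int N = 3; // versions (0-based ids 0..2)

        // tpv: one bitsequence per version; bit k set iff triple k is alive in that version
        BitSet[] tpv = new BitSet[N];
        // vpt: one bitsequence per triple; bit j set iff the triple is alive in version j
        BitSet[] vpt = new BitSet[n];
        for (int j = 0; j < N; j++) tpv[j] = new BitSet(n);
        for (int k = 0; k < n; k++) vpt[k] = new BitSet(N);

        // Made-up archive: triple k enters at version (k % N) and never leaves
        for (int k = 0; k < n; k++) {
            for (int j = k % N; j < N; j++) {
                tpv[j].set(k);
                vpt[k].set(j);
            }
        }

        System.out.println("Triples alive in V2: " + tpv[1]);  // Mat-style question
        System.out.println("Versions of triple 3: " + vpt[2]); // Ver-style question
    }
}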

SLIDE 35

Linked Open/Closed Data (Linked Data markets)

Use case 3

SLIDE 36

So far so good but… Linked Open/Closed Data

[Diagram: the Linked Open Data Cloud (dbpedia, G1a, G2a, G3a, G4a) alongside a Linked Closed Data Cloud (G1b, G2b, G3b, G1c, G2c): the "Deep Semantic Web"]

SLIDE 37

Linked Open/Closed Data

SLIDE 38
  • A) Efficient Exchange: Compression + Encryption (hdtcrypt)


Linked Open/Closed Data

SLIDE 39
  • B) A secure LD Endpoint


Linked Open/Closed Data

Self-Enforcing Access Control for Encrypted RDF Javier D. Fernández, Sabrina Kirrane, Axel Polleres and Simon Steyskal. In ESWC’17

Future work:

SLIDE 40

Hands on!

Find these slides in:

https://aic.ai.wu.ac.at/qadlod/presentations/keystoneHandsOn2017.pdf
https://aic.ai.wu.ac.at/qadlod/presentations/codeKeystone2017

SLIDE 41
  • 1) Desktop tool HDT-it!
  • Thanks to Mario Arias

Consuming HDT

SLIDE 42
  • 1) Desktop tool HDT-it!
  • Download the tool for your OS: http://www.rdfhdt.org/downloads/
  • Get an HDT dataset from the web:
    • http://www.rdfhdt.org/datasets/ OR
    • http://lodlaundromat.org/wardrobe/ OR
    • convert your RDF dataset with the tool.
  • As a suggestion of small datasets: SWDF (242K triples) or the bigger DBLP (55M triples)

Consuming HDT

SLIDE 43
  • 2) Command line Tools (C++ and Java)

Consuming HDT

rdfhdt.org libraries:

Feature                       HDT-C++   HDT-Java
Command line tools            X         X
TP (triple pattern) search    X         X
Full SPARQL                   -         X (with Jena)
Parametrizable compression    X         -
Full text support             X         -
Practical uses                LDF       Jena, Fuseki

SLIDE 44
  • 2) Command line Tools (C++ and Java)
  • For simplicity, in this lecture we will use Java
  • Download the hdt-java library from https://github.com/rdfhdt/hdt-java/
    • git clone https://github.com/rdfhdt/hdt-java.git
    • or download https://github.com/rdfhdt/hdt-java/archive/master.zip
  • Install the library with maven:
    • mvn install
  • Query an HDT file:
    • Go to hdt-java-cli and execute:
    • ./bin/hdtSearch.sh /path/to/your/hdt
    • This will open a simple console where you can query triple patterns
  • Export/Import:
    • $> rdf2hdt file.nt output.hdt
    • $> hdt2rdf file.hdt output.nt

Consuming HDT

SLIDE 45
  • 3) Set up a SPARQL Endpoint with HDT and Fuseki
  • Go to hdt-fuseki and compile, adding the dependencies:
    • mvn package dependency:copy-dependencies
  • Run fuseki:
    • ./bin/hdtEndpoint.sh --hdt path/to/dataset.hdt /mydataset
  • Open your Web browser and go to http://localhost:3030
  • Select Control Panel / Dataset / mydataset and click Select
  • Type your SPARQL query and see the results.
  • Be careful with the number of results: unlike e.g. Virtuoso, there is no built-in limit on the number of results, so use LIMIT:
    • SELECT * WHERE { ?s ?p ?o } LIMIT 400

Consuming HDT
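Once the endpoint is up, you can also query it over the standard SPARQL protocol from the command line; a sketch, assuming the service path /mydataset/sparql that Fuseki derives from the dataset name above:

# Ask for the first 10 triples, results as JSON
curl -G http://localhost:3030/mydataset/sparql \
     --data-urlencode 'query=SELECT * WHERE { ?s ?p ?o } LIMIT 10' \
     -H 'Accept: application/sparql-results+json'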

SLIDE 46
  • 4) Set up a Linked Data Fragments Endpoint with HDT
  • Download the LDF Server (the Node.js one is the best, but we will use the Java one for simplicity of installation):
    • git clone https://github.com/LinkedDataFragments/Server.Java.git
    • or download https://github.com/LinkedDataFragments/Server.Java/archive/master.zip
  • Install the server, skipping the tests (they fail :):
    • mvn install -Dmaven.test.skip=true
  • Open the file config-example.json and modify the settings to point to your HDT (see the sketch after this slide), e.g.:
    • "settings": { "file": "/home/user/myfile.hdt" }
  • Run the server:
    • java -jar target/ldf-server.jar
  • Access http://localhost:8080

Consuming HDT
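A minimal config-example.json might then look like this (a sketch modeled on the repository's example configuration; the datasource name and titles are placeholders):

{
  "title": "My Linked Data Fragments server",
  "datasources": {
    "mydataset": {
      "title": "My HDT dataset",
      "type": "HdtDatasource",
      "settings": { "file": "/home/user/myfile.hdt" }
    }
  }
}

The fragments of that datasource would then be served under http://localhost:8080/mydataset.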

SLIDE 47
  • 5) Access with the HDT C++/Java libraries (again, we restrict ourselves to Java here)
  • JAVADOC:
    • http://purl.org/HDT/javadoc/api
    • http://purl.org/HDT/javadoc/core
  • I will refer to Eclipse and Maven, but you can use your preferred environment

Consuming HDT

SLIDE 48
  • Setting up the environment…
  • Create a new maven project

Consuming HDT / HDT-java library

SLIDE 49
  • Setting up the environment…
  • Create a new maven project
  • Select to create a simple project (skip archetype selection)

Consuming HDT / HDT-java library

SLIDE 50
  • Setting up the environment…
  • Create a new maven project
  • With a simple archetype
  • And any metadata

Consuming HDT / HDT-java library

SLIDE 51
  • Setting up the environment…
  • Include the maven dependency of hdt-java-core in the pom.xml

Consuming HDT / HDT-java library
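The dependency block would look like this (the version shown is illustrative; check Maven Central for the current hdt-java-core release):

<dependency>
  <groupId>org.rdfhdt</groupId>
  <artifactId>hdt-java-core</artifactId>
  <version>2.1.2</version>
</dependency>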

SLIDE 52
  • Setting up the environment…
  • Include the maven dependency of hdt-java-core in the pom.xml
  • Finally, let’s create a new class and query our HDT (a sketch follows at the end of this slide)

Consuming HDT / HDT-java library

  • Test other queries
  • get the S, P, O of the solution
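A minimal sketch of such a class, using the hdt-java API (the file path is a placeholder): it searches the pattern (?, ?, ?) and prints the S, P and O of each solution, which also covers the exercise above:

import org.rdfhdt.hdt.hdt.HDT;
import org.rdfhdt.hdt.hdt.HDTManager;
import org.rdfhdt.hdt.triples.IteratorTripleString;
import org.rdfhdt.hdt.triples.TripleString;

public class HDTQuery {
    public static void main(String[] args) throws Exception {
        // Map the HDT file (and its .index, creating it if needed) into memory
        HDT hdt = HDTManager.mapIndexedHDT("/path/to/your.hdt", null);
        try {
            // Empty strings act as wildcards: this is the triple pattern (?, ?, ?)
            IteratorTripleString it = hdt.search("", "", "");
            while (it.hasNext()) {
                TripleString ts = it.next();
                System.out.println(ts.getSubject() + " " + ts.getPredicate() + " " + ts.getObject());
            }
        } finally {
            hdt.close();
        }
    }
}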
SLIDE 53
  • Let’s access the dictionary of terms in HDT

Consuming HDT / HDT-java library

  • Open two HDT files
  • Use the dictionaries to get the common predicates used in both
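A possible sketch for this exercise, assuming the Dictionary and DictionarySection interfaces of hdt-java (the class name and file paths are made up):

import org.rdfhdt.hdt.dictionary.Dictionary;
import org.rdfhdt.hdt.hdt.HDT;
import org.rdfhdt.hdt.hdt.HDTManager;

import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

public class CommonPredicates {
    // Collect all predicate terms of one HDT dictionary into a set
    static Set<String> predicates(HDT hdt) {
        Set<String> set = new HashSet<>();
        Dictionary dict = hdt.getDictionary();
        Iterator<? extends CharSequence> it = dict.getPredicates().getSortedEntries();
        while (it.hasNext()) {
            set.add(it.next().toString());
        }
        return set;
    }

    public static void main(String[] args) throws Exception {
        HDT hdt1 = HDTManager.mapHDT("/path/to/first.hdt", null);
        HDT hdt2 = HDTManager.mapHDT("/path/to/second.hdt", null);
        Set<String> common = predicates(hdt1);
        common.retainAll(predicates(hdt2)); // set intersection
        common.forEach(System.out::println);
        hdt1.close();
        hdt2.close();
    }
}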
SLIDE 54
  • Let’s access the terms as IDs

Consuming HDT / HDT-java library

  • Use the estimation of results to count the cardinality of all subjects
  • We can build a histogram and see the distribution (a sketch follows)
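A sketch of the first step (per-subject cardinality estimates, from which the histogram can be binned), assuming the low-level Triples/TripleID API of hdt-java; the file path is a placeholder:

import org.rdfhdt.hdt.hdt.HDT;
import org.rdfhdt.hdt.hdt.HDTManager;
import org.rdfhdt.hdt.triples.IteratorTripleID;
import org.rdfhdt.hdt.triples.TripleID;

public class SubjectCardinalities {
    public static void main(String[] args) throws Exception {
        HDT hdt = HDTManager.mapIndexedHDT("/path/to/your.hdt", null);
        long nSubjects = hdt.getDictionary().getNsubjects();
        // Pattern (s, ?, ?) at the ID level: 0 means wildcard
        for (int s = 1; s <= nSubjects; s++) {
            IteratorTripleID it = hdt.getTriples().search(new TripleID(s, 0, 0));
            System.out.println(s + "\t" + it.estimatedNumResults());
        }
        hdt.close();
    }
}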
SLIDE 55
  • 6) Query full SPARQL with Jena and HDT
  • First, include the hdt-jena dependency in pom.xml

Consuming HDT

SLIDE 56
  • 6) Query full SPARQL with Jena and HDT
  • First, include the hdt-jena dependency in pom.xml
  • Import HDT into a model and query! (a sketch follows)

Consuming HDT

  • Test other queries over your data
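A minimal sketch, assuming hdt-jena's HDTGraph and Jena 3 package names (the file path and query are placeholders):

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSetFormatter;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.rdfhdt.hdt.hdt.HDT;
import org.rdfhdt.hdt.hdt.HDTManager;
import org.rdfhdt.hdtjena.HDTGraph;

public class SparqlOverHDT {
    public static void main(String[] args) throws Exception {
        // Expose the HDT file as a read-only Jena model
        HDT hdt = HDTManager.mapIndexedHDT("/path/to/your.hdt", null);
        Model model = ModelFactory.createModelForGraph(new HDTGraph(hdt));

        QueryExecution qe = QueryExecutionFactory.create(
                "SELECT * WHERE { ?s ?p ?o } LIMIT 10", model);
        try {
            ResultSetFormatter.out(System.out, qe.execSelect());
        } finally {
            qe.close();
            hdt.close();
        }
    }
}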
SLIDE 57
  • +) Query LOD-a-lot
  • First, get the correct hdt-java branch to deal with really long IDs:
    • git clone -b long-dict-id https://github.com/rdfhdt/hdt-java/
  • Install, skipping the tests:
    • mvn install -Dmaven.test.skip=true
  • Increase the Java heap space:
    • export MAVEN_OPTS="-Xmx25G"
  • In hdt-java-cli:
    • ./bin/hdtSearch.sh /media/javi/data/lod-a-lot/LOD_a_lot_v1.hdt

Consuming HDT

SLIDE 58

Let the lecture… end

SLIDE 59
  • We are currently facing Big Linked Data challenges
    • Generation, publication and consumption
  • Thanks to compression, the Big Linked Data of today will be the "pocket" data of tomorrow
  • Compression is not just about space:
    • Fast exchange
    • Fast processing/management
    • Fast querying
  • Compression democratizes the access to Big Linked Data = cheap, scalable consumers


Take-home messages

SLIDE 60

Thank you!

Let the lecture… end