Triples compression and Indexing - Antonio Fariña, Javier D. Fernández

Triples compression and Indexing
Antonio Fariña, Javier D. Fernández and Miguel A. Martinez-Prieto
3rd KEYSTONE Training School: Keyword search in Big Linked Data
23rd August 2017

Agenda
• RDF management overview
• K²-Tree data structure
• K²-Triples
• Compressed Suffix Array (CSA)
• RDF-CSA

RDF management overview
“Recall that we can map the strings of an RDF dataset into a dictionary and then handle a set of RDF triples as a set of ID-based triples.”
BIG (LINKED) SEMANTIC DATA COMPRESSION

RDF management overview: Dictionary Encoding

Dictionary
SO (terms used as both subject and object): 1 London, 2 SPIRE
S (subjects only): 3 A.Gionis, 4 M.Lalmas, 5 R.Raman
O (objects only): 3 Finland, 4 inv-speaker, 5 UK
P (predicates): 1 attends, 2 capital of, 3 held on, 4 lives in, 5 position, 6 works in

Original Triplets → Id-based Triplets
(SPIRE, held on, London) → (2,3,1)
(London, capital of, UK) → (1,2,5)
(A.Gionis, attends, SPIRE) → (3,1,2)
(R.Raman, attends, SPIRE) → (5,1,2)
(M.Lalmas, attends, SPIRE) → (4,1,2)
(M.Lalmas, lives in, UK) → (4,4,5)
(M.Lalmas, works in, London) → (4,6,1)
(A.Gionis, lives in, Finland) → (3,4,3)
(R.Raman, lives in, UK) → (5,4,5)
(R.Raman, position, inv-speaker) → (5,5,4)

[Figure: the same triples drawn as a labelled graph]
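The dictionary encoding on this slide can be sketched in a few lines of Python. This is only an illustration: the function names `build_dictionary` and `encode` are hypothetical, and the case-insensitive alphabetical ordering inside each dictionary section is an assumption chosen because it reproduces the IDs shown on the slide.

```python
# Triples from the slide, in the order they are listed there.
TRIPLES = [
    ("SPIRE", "held on", "London"),
    ("London", "capital of", "UK"),
    ("A.Gionis", "attends", "SPIRE"),
    ("R.Raman", "attends", "SPIRE"),
    ("M.Lalmas", "attends", "SPIRE"),
    ("M.Lalmas", "lives in", "UK"),
    ("M.Lalmas", "works in", "London"),
    ("A.Gionis", "lives in", "Finland"),
    ("R.Raman", "lives in", "UK"),
    ("R.Raman", "position", "inv-speaker"),
]


def build_dictionary(triples):
    """Return (subject_ids, predicate_ids, object_ids) as term -> ID maps."""
    subjects = {s for s, _, _ in triples}
    objects = {o for _, _, o in triples}
    predicates = {p for _, p, _ in triples}

    def order(terms):
        # Assumption: case-insensitive alphabetical order within a section.
        return sorted(terms, key=str.lower)

    # SO section: terms used both as subject and as object share one ID.
    shared = order(subjects & objects)
    s_ids = {term: i for i, term in enumerate(shared, 1)}
    o_ids = dict(s_ids)

    # S-only and O-only sections both continue numbering after the SO section.
    start = len(shared) + 1
    s_ids.update({t: i for i, t in enumerate(order(subjects - objects), start)})
    o_ids.update({t: i for i, t in enumerate(order(objects - subjects), start)})

    # Predicates have their own independent ID space.
    p_ids = {t: i for i, t in enumerate(order(predicates), 1)}
    return s_ids, p_ids, o_ids


def encode(triples):
    """Replace every term of every triple by its dictionary ID."""
    s_ids, p_ids, o_ids = build_dictionary(triples)
    return [(s_ids[s], p_ids[p], o_ids[o]) for s, p, o in triples]


if __name__ == "__main__":
    for original, ids in zip(TRIPLES, encode(TRIPLES)):
        print(original, "->", ids)
```

Sharing IDs in the SO section is what lets a later join match the object of one triple against the subject of another by comparing integers only.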

K²-tree data structure
“A k²-tree permits a compact representation of an adjacency matrix.”

K²-Tree Motivation
• Structure for representing an adjacency matrix
• Originally designed for web graphs
• Simple directed graph
[Figure: an 11-node directed graph and its 11 × 11 adjacency matrix]

K²-Tree Construction process
Example with K=2
[Figure: 16 × 16 adjacency matrix and the resulting k²-tree; leaf groups 0100 0011 0010 0010 1010 1000 0110 0010 0100]
T = 101111010100100011001000000101011110
L = 010000110010001010101000011000100100
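A minimal sketch of the construction, assuming the usual level-by-level layout: each node stores k² bits in row-major order, internal levels are concatenated into T and the last level into L. The function name `build_k2tree` is hypothetical, and the list of 1-cells is not taken from the figure directly: it was decoded from the slide's T and L bitmaps, so treat it as an assumption about the example matrix.

```python
# (row, column) positions of the 1-cells, decoded from the slide's T and L.
CELLS = [
    (0, 1), (1, 2), (1, 3), (1, 4), (7, 6), (8, 6),
    (8, 9), (9, 6), (9, 8), (9, 10), (10, 6), (10, 9),
]


def build_k2tree(cells, n, k=2):
    """Build the (T, L) bit strings of a k2-tree for an n x n binary matrix.

    n must be a power of k. Each pass of the loop emits one tree level.
    """
    levels = []
    # Every frontier entry is a non-empty block: (first row, first col, cells).
    frontier = [(0, 0, set(cells))]
    size = n
    while size > 1:
        sub = size // k
        bits, next_frontier = [], []
        for row0, col0, block in frontier:
            for i in range(k):          # child blocks in row-major order
                for j in range(k):
                    r, c = row0 + i * sub, col0 + j * sub
                    inside = {
                        (x, y) for (x, y) in block
                        if r <= x < r + sub and c <= y < c + sub
                    }
                    bits.append("1" if inside else "0")
                    if inside:          # only non-empty blocks get children
                        next_frontier.append((r, c, inside))
        levels.append("".join(bits))
        frontier = next_frontier
        size = sub
    # All levels except the last go to T; the last level holds the cells (L).
    return "".join(levels[:-1]), levels[-1]


if __name__ == "__main__":
    T, L = build_k2tree(CELLS, 16)
    print("T =", T)
    print("L =", L)
```

Empty blocks cost a single 0 bit and are never expanded, which is why sparse matrices compress well in this representation.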

K²-Tree Direct neighbor operation
[Figure: 16 × 16 adjacency matrix with rows and columns numbered 0 to 15, and its k²-tree]
T = 101111010100100011001000000101011110 (positions 0 to 35)
L = 010000110010001010101000011000100100 (positions 36 onward)
children(2) = rank1(T,2) * k² = 2*4 = 8
children(9) = rank1(T,9) * k² = 7*4 = 28
children(31) = rank1(T,31) * k² = 14*4 = 56
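The navigation rule children(i) = rank1(T, i) * k² is enough to answer a direct neighbor query. The sketch below is an illustration only: `rank1` is a naive linear count rather than the constant-time rank structure a real implementation would use, and the function names are hypothetical. T and L are the bitmaps from the slide.

```python
T = "101111010100100011001000000101011110"
L = "010000110010001010101000011000100100"


def rank1(bits, i):
    """Number of 1 bits in bits[0..i], both ends included (naive version)."""
    return bits[: i + 1].count("1")


def direct_neighbors(T, L, n, row, k=2):
    """Return the columns holding a 1 in the given row of the n x n matrix."""
    bits = T + L            # positions >= len(T) fall inside L
    found = []

    def visit(size, r, col0, first_child):
        # r is the row relative to the current block of side `size`;
        # first_child is the position of the block's first child bit.
        sub = size // k
        i = r // sub        # which row of children contains the query row
        for j in range(k):
            pos = first_child + i * k + j
            if pos >= len(T):
                # Leaf level: each bit is one matrix cell.
                if bits[pos] == "1":
                    found.append(col0 + j)
            elif T[pos] == "1":
                # Internal node: jump to its children with the rank formula.
                visit(sub, r % sub, col0 + j * sub, rank1(T, pos) * k * k)

    visit(n, row, 0, 0)
    return found


if __name__ == "__main__":
    print(rank1(T, 2) * 4, rank1(T, 9) * 4, rank1(T, 31) * 4)
    print(direct_neighbors(T, L, 16, 1))
    print(direct_neighbors(T, L, 16, 9))
```

A reverse neighbor query is symmetric: fix the column instead of the row and follow the k children of one child column at each level.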

K²-triples
“k²-triples applies vertical partitioning of an RDF dataset by predicate. Then |P| k²-trees, one per predicate, represent all the triples: each tree holds the triples involving its predicate.”

K²-Triples Data structure
• Dictionary encoding
• Triples as a set of identifiers
[Figure labels: RDF triples, Dictionary, Mapped triples]

K²-Triples Data structure
• Vertical partitioning (by predicates)
• One K²-tree per predicate
(S,P,O): (8,5,4), (4,2,3), (4,4,6), (4,1,7), (7,2,3), (3,3,5), (5,2,1), (1,3,5), (6,2,2), (2,3,5)
[Figure: one subject × object binary matrix for each predicate P1 to P5]
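Vertical partitioning itself is a simple grouping step. The sketch below uses the ten ID triples from this slide and keeps each partition as a plain Python set of (subject, object) cells. That set is a stand-in for the per-predicate k²-tree, not the compressed structure itself. The name `partition_by_predicate` is hypothetical.

```python
from collections import defaultdict

# ID-based triples (S, P, O) as listed on the slide.
TRIPLES = [
    (8, 5, 4), (4, 2, 3), (4, 4, 6), (4, 1, 7), (7, 2, 3),
    (3, 3, 5), (5, 2, 1), (1, 3, 5), (6, 2, 2), (2, 3, 5),
]


def partition_by_predicate(triples):
    """Group triples by predicate: predicate -> set of (subject, object)."""
    parts = defaultdict(set)
    for s, p, o in triples:
        # Each (s, o) pair is one 1-cell of the predicate's binary matrix.
        parts[p].add((s, o))
    return dict(parts)


if __name__ == "__main__":
    for predicate, cells in sorted(partition_by_predicate(TRIPLES).items()):
        print(f"P{predicate}: {sorted(cells)}")
```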

K²-Triples operations: solving triple patterns
Query: (4,2,3)
• SPO → checking a cell
• SP?
• ?PO
• S?O
• S??
• ??O
• ?P?
[Figure: k²-tree of P2]
Result: (4,2,3)

K²-Triples operations: solving triple patterns
Query: (4,2,?)
• SPO → checking a cell
• SP? → direct neighbours
• ?PO
• S?O
• S??
• ??O
• ?P?
[Figure: k²-tree of P2]
Result: (4,2,3)

K²-Triples operations: solving triple patterns
Query: (?,2,3)
• SPO → checking a cell
• SP? → direct neighbours
• ?PO → reverse neighbours
• S?O
• S??
• ??O
• ?P?
[Figure: k²-tree of P2]
Result: (4,2,3), (7,2,3)

K²-Triples operations: solving triple patterns
Query: (4,?,6)
• SPO → checking a cell
• SP? → direct neighbours
• ?PO → reverse neighbours
• S?O → checking |P| cells
• S??
• ??O
• ?P?
[Figure: k²-trees of P1 to P5]
Result: (4,4,6)

K²-Triples operations: solving triple patterns
Query: (4,?,?)
• SPO → checking a cell
• SP? → direct neighbours
• ?PO → reverse neighbours
• S?O → checking |P| cells
• S?? → |P| direct neighbours
• ??O
• ?P?
[Figure: k²-trees of P1 to P5]
Result: (4,1,7), (4,2,3), (4,4,6)

K²-Triples operations: solving triple patterns
Query: (?,?,4)
• SPO → checking a cell
• SP? → direct neighbours
• ?PO → reverse neighbours
• S?O → checking |P| cells
• S?? → |P| direct neighbours
• ??O → |P| reverse neighbours
• ?P?
[Figure: k²-trees of P1 to P5]
Result: (8,5,4)

K²-Triples operations: solving triple patterns
Query: (?,2,?)
• SPO → checking a cell
• SP? → direct neighbours
• ?PO → reverse neighbours
• S?O → checking |P| cells
• S?? → |P| direct neighbours
• ??O → |P| reverse neighbours
• ?P? → full adjacency matrix
[Figure: k²-tree of P2]
Result: (4,2,3), (5,2,1), (6,2,2), (7,2,3)
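The seven pattern types above can be exercised with a small stand-in. This is a sketch under stated assumptions: each predicate's k²-tree is replaced by a plain set of (subject, object) cells, so it shows which partitions are touched for each pattern but not the cost of the real cell, direct neighbour, reverse neighbour or range operations. The name `resolve` is hypothetical, and `None` plays the role of "?".

```python
from collections import defaultdict

# ID-based triples (S, P, O) from the k2-triples example.
TRIPLES = [
    (8, 5, 4), (4, 2, 3), (4, 4, 6), (4, 1, 7), (7, 2, 3),
    (3, 3, 5), (5, 2, 1), (1, 3, 5), (6, 2, 2), (2, 3, 5),
]

# One set of (subject, object) cells per predicate (stand-in for a k2-tree).
PARTS = defaultdict(set)
for _s, _p, _o in TRIPLES:
    PARTS[_p].add((_s, _o))


def resolve(parts, s=None, p=None, o=None):
    """Resolve a triple pattern; None marks an unbound position."""
    # Bound predicate: one partition. Unbound: all |P| partitions.
    predicates = [p] if p is not None else sorted(parts)
    results = []
    for q in predicates:
        for subj, obj in parts.get(q, ()):
            # s and o bound   -> cell check
            # only s bound    -> direct neighbours (one row)
            # only o bound    -> reverse neighbours (one column)
            # neither bound   -> full adjacency matrix
            if (s is None or subj == s) and (o is None or obj == o):
                results.append((subj, q, obj))
    return sorted(results)


if __name__ == "__main__":
    print(resolve(PARTS, 4, 2, 3))        # SPO
    print(resolve(PARTS, 4, 2, None))     # SP?
    print(resolve(PARTS, None, 2, 3))     # ?PO
    print(resolve(PARTS, 4, None, 6))     # S?O
    print(resolve(PARTS, 4, None, None))  # S??
    print(resolve(PARTS, None, None, 4))  # ??O
    print(resolve(PARTS, None, 2, None))  # ?P?
```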

K²-Triples SP & OP indexes
• Weakness of vertical partitioning → unbounded predicates: (S,?,?), (?,?,O), (S,?,O)
• Checking the |P| K²-trees!
• They proposed indexes SP and OP
• Statistically compressed
• Direct access with DAC

(S,P,O): (8,5,4), (4,2,3), (4,4,6), (4,1,7), (7,2,3), (3,3,5), (5,2,1), (1,3,5), (6,2,2), (2,3,5)

SP INDEX
S   Predicates
1   3
2   3
3   3
4   1,2,4
5   2
6   2
7   2
8   5

K²-Triples SP & OP indexes
Query (4,?,?): Subject 4?
SP INDEX → Predicate list: 1,2,4
Only the k²-trees of P1, P2 and P4 need to be checked.
[Figure: k²-trees of P1 to P5]
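A sketch of how the SP and OP indexes prune the search for unbounded-predicate patterns. It models only the lookup logic: the indexes are plain dictionaries here, so the statistical compression and DAC-based direct access mentioned on the slide are not represented. All function names are hypothetical.

```python
from collections import defaultdict

# ID-based triples (S, P, O) from the k2-triples example.
TRIPLES = [
    (8, 5, 4), (4, 2, 3), (4, 4, 6), (4, 1, 7), (7, 2, 3),
    (3, 3, 5), (5, 2, 1), (1, 3, 5), (6, 2, 2), (2, 3, 5),
]


def build_sp_op(triples):
    """Return (SP, OP): subject -> predicates and object -> predicates."""
    sp, op = defaultdict(set), defaultdict(set)
    for s, p, o in triples:
        sp[s].add(p)
        op[o].add(p)
    return (
        {s: sorted(ps) for s, ps in sp.items()},
        {o: sorted(ps) for o, ps in op.items()},
    )


def predicates_to_check(sp, op, s=None, o=None):
    """Predicates whose partition must be visited for (S,?,?), (?,?,O), (S,?,O)."""
    if s is not None and o is not None:
        # (S,?,O): only predicates related to both the subject and the object.
        return sorted(set(sp.get(s, [])) & set(op.get(o, [])))
    if s is not None:
        return sp.get(s, [])
    if o is not None:
        return op.get(o, [])
    raise ValueError("at least one of s, o must be bound")


if __name__ == "__main__":
    SP, OP = build_sp_op(TRIPLES)
    print(predicates_to_check(SP, OP, s=4))        # (4,?,?)
    print(predicates_to_check(SP, OP, o=4))        # (?,?,4)
    print(predicates_to_check(SP, OP, s=4, o=6))   # (4,?,6)
```

For the query (4,?,?) this returns the predicate list 1, 2, 4 from the slide, so three partitions are visited instead of all five.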

K²-Triples Joins
Query: (8,5,?X) (?X,2,?)
• They implemented three join strategies, taking advantage of the K²-triples structure:
  • Independent join (merge-join)
  • Chain join (index-join)
  • Interactive join
• Best strategy depends on the dataset and the type of join
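The slide only names the strategies, so the following is a hedged reading rather than the authors' implementation. It assumes an independent join resolves both patterns separately and intersects the bindings of ?X, and that a chain join resolves the first pattern and substitutes each binding into the second. The interactive join, which would traverse both k²-trees together, is not sketched. Partitions are plain sets standing in for k²-trees, and all function names are hypothetical.

```python
from collections import defaultdict

# ID-based triples (S, P, O) from the k2-triples example.
TRIPLES = [
    (8, 5, 4), (4, 2, 3), (4, 4, 6), (4, 1, 7), (7, 2, 3),
    (3, 3, 5), (5, 2, 1), (1, 3, 5), (6, 2, 2), (2, 3, 5),
]

# Predicate -> set of (subject, object) cells.
PARTS = defaultdict(set)
for _s, _p, _o in TRIPLES:
    PARTS[_p].add((_s, _o))


def independent_join(parts, s1, p1, p2):
    """Solve (s1, p1, ?X) and (?X, p2, ?) separately, then intersect on ?X."""
    left = {o for s, o in parts[p1] if s == s1}   # objects of (s1, p1, ?X)
    right = {s for s, _ in parts[p2]}             # subjects of (?X, p2, ?)
    return sorted(left & right)


def chain_join(parts, s1, p1, p2):
    """Solve (s1, p1, ?X), then run (x, p2, ?) once per binding x of ?X."""
    bindings = sorted(o for s, o in parts[p1] if s == s1)
    # Each probe corresponds to one direct neighbour query on the p2 partition.
    return [x for x in bindings if any(s == x for s, _ in parts[p2])]


if __name__ == "__main__":
    # Query from the slide: (8,5,?X) (?X,2,?)
    print(independent_join(PARTS, 8, 5, 2))
    print(chain_join(PARTS, 8, 5, 2))
```

On this example both strategies bind ?X to 4. The chain join probes the P2 partition once, while the independent join first materialises all four subjects of P2, which illustrates why the best strategy depends on the dataset and the type of join.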
