
Triples compression and Indexing, Antonio Fariña, Javier D. Fernández - PowerPoint PPT Presentation



  1. Triples compression and indexing. Antonio Fariña, Javier D. Fernández and Miguel A. Martinez-Prieto. 3rd KEYSTONE Training School, "Keyword search in Big Linked Data", 23 August 2017.

  2. Agenda  RDF management overview  K 2 -Tree data structure  K 2 -Triples  Compressed Suffix Array (CSA)  RDF-CSA PAGE 2 images: zurb.com

  3. RDF management overview. Recall that we can map the strings of an RDF dataset into a dictionary and then handle the set of RDF triples as a set of ID-based triples. [BIG (LINKED) SEMANTIC DATA COMPRESSION]

  4. RDF management overview. Dictionary encoding of a small example graph.
      Dictionary:
        SO (terms that are both subjects and objects): 1 London, 2 SPIRE
        S (subject-only terms): 3 A.Gionis, 4 M.Lalmas, 5 R.Raman
        O (object-only terms): 3 Finland, 4 inv-speaker, 5 UK
        P (predicates): 1 attends, 2 capital of, 3 held on, 4 lives in, 5 position, 6 works in
      Original triples → ID-based triples:
        (SPIRE, held on, London) → (2,3,1)
        (London, capital of, UK) → (1,2,5)
        (A.Gionis, attends, SPIRE) → (3,1,2)
        (R.Raman, attends, SPIRE) → (5,1,2)
        (M.Lalmas, attends, SPIRE) → (4,1,2)
        (M.Lalmas, lives in, UK) → (4,4,5)
        (M.Lalmas, works in, London) → (4,6,1)
        (A.Gionis, lives in, Finland) → (3,4,3)
        (R.Raman, lives in, UK) → (5,4,5)
        (R.Raman, position, inv-speaker) → (5,5,4)
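The dictionary encoding above can be sketched in Python. This is a minimal illustration, not the encoder used in the talk. It assumes that terms occurring as both subject and object (SO) take the lowest IDs, that subject-only and object-only IDs both continue after the SO block, and that each section is sorted case-insensitively, which happens to reproduce the IDs on the slide. The function names `build_dictionary` and `encode` are hypothetical.

```python
def build_dictionary(triples):
    """Assign integer IDs to RDF terms, split into SO / S / O / P sections.

    Assumption: each section is sorted case-insensitively; this reproduces
    the slide's IDs but is not necessarily the order a real encoder uses.
    """
    subjects = {s for s, _, _ in triples}
    objects = {o for _, _, o in triples}
    predicates = {p for _, p, _ in triples}
    key = str.lower

    shared = sorted(subjects & objects, key=key)   # SO: IDs 1..|SO|
    s_only = sorted(subjects - objects, key=key)   # S: IDs from |SO|+1
    o_only = sorted(objects - subjects, key=key)   # O: IDs from |SO|+1

    so_ids = {t: i for i, t in enumerate(shared, start=1)}
    s_ids = {**so_ids, **{t: i for i, t in enumerate(s_only, start=len(shared) + 1)}}
    o_ids = {**so_ids, **{t: i for i, t in enumerate(o_only, start=len(shared) + 1)}}
    p_ids = {t: i for i, t in enumerate(sorted(predicates, key=key), start=1)}
    return s_ids, p_ids, o_ids


def encode(triples):
    """Replace every (subject, predicate, object) string triple by its ID triple."""
    s_ids, p_ids, o_ids = build_dictionary(triples)
    return [(s_ids[s], p_ids[p], o_ids[o]) for s, p, o in triples]


TRIPLES = [
    ("SPIRE", "held on", "London"),
    ("London", "capital of", "UK"),
    ("A.Gionis", "attends", "SPIRE"),
    ("R.Raman", "attends", "SPIRE"),
    ("M.Lalmas", "attends", "SPIRE"),
    ("M.Lalmas", "lives in", "UK"),
    ("M.Lalmas", "works in", "London"),
    ("A.Gionis", "lives in", "Finland"),
    ("R.Raman", "lives in", "UK"),
    ("R.Raman", "position", "inv-speaker"),
]

if __name__ == "__main__":
    print(encode(TRIPLES))
    # [(2, 3, 1), (1, 2, 5), (3, 1, 2), (5, 1, 2), (4, 1, 2), (4, 4, 5), (4, 6, 1), (3, 4, 3), (5, 4, 5), (5, 5, 4)]
```

Because London and SPIRE share IDs 1 and 2 in both roles, a subject ID and an object ID can be compared directly when joining on a shared term.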

  5. Agenda  RDF management overview  K 2 -Tree data structure  K 2 -Triples  Compressed Suffix Array (CSA)  RDF-CSA PAGE 5 images: zurb.com

  6. K²-tree data structure. A k²-tree provides a compact representation of an adjacency matrix.

  7. K²-Tree motivation. A structure for representing adjacency matrices, originally designed for web graphs.
      [Figure: a simple directed graph with 11 nodes and its 11×11 adjacency matrix.]

  8. K²-Tree construction process. Example with K=2: the adjacency matrix is padded to 16×16 and recursively split into K² = 4 submatrices; a node stores 1 if its submatrix contains at least one 1, and only those nodes are split further. Rows 11-15 and columns 11-15 are padding.
      Row   Columns 0-15
        0   0100 0000 0000 0000
        1   0011 1000 0000 0000
      2-6   all zeros
        7   0000 0010 0000 0000
        8   0000 0010 0100 0000
        9   0000 0010 1010 0000
       10   0000 0010 0100 0000
    11-15   all zeros
      T = 101111010100100011001000000101011110 (level 1: 1011; level 2: 1101 0100 1000; level 3: 1100 1000 0001 0101 1110)
      L = 010000110010001010101000011000100100 (leaf submatrices: 0100 0011 0010 0010 1010 1000 0110 0010 0100)
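The construction can be sketched as a level-by-level (breadth-first) traversal. This is a naive sketch under stated assumptions: the matrix side must be a power of k, each submatrix is tested for 1s by scanning it, and the helper `build_k2_tree` and the `EDGES` list (the twelve 1-cells of the 16×16 example, 0-based) are illustrative rather than taken from the slides. The last level goes to L and all earlier levels go to T.

```python
def build_k2_tree(matrix, k=2):
    """Return (T, L) bit lists for a square 0/1 matrix whose side is a power of k.

    Naive sketch: each submatrix is scanned with any(), fine for a 16x16
    example but far too slow for real graphs.
    """
    n = len(matrix)
    assert n >= k
    side = n
    while side > 1:
        assert side % k == 0, "matrix side must be a power of k"
        side //= k

    T, L = [], []
    level = [(0, 0, n)]  # non-empty submatrices: (top row, left column, side)
    while level:
        sub = level[0][2] // k            # side of the children at this level
        bits, next_level = [], []
        for r0, c0, _ in level:           # parents in breadth-first order
            for i in range(k):            # their k^2 children in row-major order
                for j in range(k):
                    r, c = r0 + i * sub, c0 + j * sub
                    bit = int(any(matrix[x][y]
                                  for x in range(r, r + sub)
                                  for y in range(c, c + sub)))
                    bits.append(bit)
                    if bit and sub > 1:   # only non-empty submatrices are split
                        next_level.append((r, c, sub))
        (L if sub == 1 else T).extend(bits)
        level = next_level
    return T, L


EDGES = [(0, 1), (1, 2), (1, 3), (1, 4), (7, 6), (8, 6), (8, 9),
         (9, 6), (9, 8), (9, 10), (10, 6), (10, 9)]
MATRIX = [[0] * 16 for _ in range(16)]
for r, c in EDGES:
    MATRIX[r][c] = 1

T, L = build_k2_tree(MATRIX)
print("".join(map(str, T)))  # 101111010100100011001000000101011110
print("".join(map(str, L)))  # 010000110010001010101000011000100100
```

Empty submatrices cost a single 0 bit and are never expanded, which is where the compression on sparse matrices comes from.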

  9. K²-Tree direct neighbour operation. Over the same 16×16 matrix (rows and columns numbered 0-15), with
      T = 101111010100100011001000000101011110
      L = 010000110010001010101000011000100100
    the children of the 1-bit at position x of T start at position children(x) = rank1(T, x) · k² of the concatenation T:L, where rank1(T, x) counts the 1s in T[0..x]. A root-to-leaf path:
      children(2) = rank1(T,2) · k² = 2 · 4 = 8
      children(9) = rank1(T,9) · k² = 7 · 4 = 28
      children(31) = rank1(T,31) · k² = 14 · 4 = 56
    Position 56 lies beyond the end of T (|T| = 36), so it refers to L[56 - 36] = L[20].
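A sketch of this navigation in Python, assuming rank1 is inclusive (it counts the 1s in T[0..x]) as the slide's numbers imply. Here rank1 is a naive scan, and the names `children` and `direct_neighbours` are hypothetical. At each node the traversal jumps to the k children that cover the queried row (offset k · (row // sub)) and descends only through 1-bits.

```python
T = [int(b) for b in "101111010100100011001000000101011110"]
L = [int(b) for b in "010000110010001010101000011000100100"]


def rank1(bits, x):
    """Number of 1s in bits[0..x] (inclusive). Naive O(x) scan; compact
    rank structures answer this in constant time."""
    return sum(bits[: x + 1])


def children(T, x, k=2):
    """Position in T:L of the first of the k^2 children of the 1-bit at x."""
    return rank1(T, x) * k * k


def direct_neighbours(T, L, n, row, k=2):
    """Columns c with cell (row, c) set to 1, in increasing order."""
    out = []

    def visit(side, r, col, x):
        if x >= len(T):                  # leaf bit, stored in L
            if L[x - len(T)]:
                out.append(col)
            return
        if x == -1 or T[x]:              # x == -1 stands for the implicit root
            sub = side // k
            first = 0 if x == -1 else children(T, x, k)
            y = first + k * (r // sub)   # first child in the row's band
            for j in range(k):
                visit(sub, r % sub, col + j * sub, y + j)

    visit(n, row, 0, -1)
    return out


print(children(T, 2), children(T, 9), children(T, 31))  # 8 28 56
print(direct_neighbours(T, L, 16, 10))                  # [6, 9]
```

For row 10 the search passes through positions 2, 9 and 31 (the path on the slide) to reach column 6, and through positions 3, 12 and 34 to reach column 9.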

  10. Agenda  RDF management overview  K 2 -Tree data structure  K 2 -Triples  Compressed Suffix Array (CSA)  RDF-CSA PAGE 10 images: zurb.com

  11. K²-triples. k²-triples applies vertical partitioning to an RDF dataset by predicate: |P| k²-trees are built, one per predicate, each representing all the triples that involve that predicate.

  12. K²-Triples data structure. Dictionary encoding: the RDF triples are mapped through the dictionary, so the dataset becomes a set of ID-based triples.

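  13. K²-Triples data structure. Vertical partitioning (by predicate): one k²-tree per predicate, with subjects as rows and objects as columns.
      ID-based triples (S,P,O): (8,5,4), (4,2,3), (4,4,6), (4,1,7), (7,2,3), (3,3,5), (5,2,1), (1,3,5), (6,2,2), (2,3,5)
      Resulting (S,O) cells per predicate:
        P1: (4,7)
        P2: (4,3), (5,1), (6,2), (7,3)
        P3: (1,5), (2,5), (3,5)
        P4: (4,6)
        P5: (8,4)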
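Vertical partitioning itself is a simple grouping step. The sketch below, with hypothetical helper names, groups the slide's ID-based triples by predicate and builds each predicate's subject-by-object adjacency matrix (n = 8 here, the largest ID in the example). In k²-triples each such matrix would then be stored as a k²-tree.

```python
from collections import defaultdict

TRIPLES = [(8, 5, 4), (4, 2, 3), (4, 4, 6), (4, 1, 7), (7, 2, 3),
           (3, 3, 5), (5, 2, 1), (1, 3, 5), (6, 2, 2), (2, 3, 5)]


def vertical_partition(triples):
    """Group ID-based triples by predicate: predicate -> sorted (subject, object) pairs."""
    parts = defaultdict(list)
    for s, p, o in triples:
        parts[p].append((s, o))
    return {p: sorted(pairs) for p, pairs in sorted(parts.items())}


def to_matrix(pairs, n):
    """n x n 0/1 adjacency matrix with subjects as rows and objects as columns.
    IDs start at 1, so ID i is stored at index i - 1."""
    m = [[0] * n for _ in range(n)]
    for s, o in pairs:
        m[s - 1][o - 1] = 1
    return m


parts = vertical_partition(TRIPLES)
for p, pairs in parts.items():
    print(f"P{p}: {pairs}")
# P1: [(4, 7)]
# P2: [(4, 3), (5, 1), (6, 2), (7, 3)]
# P3: [(1, 5), (2, 5), (3, 5)]
# P4: [(4, 6)]
# P5: [(8, 4)]
```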

  14. K²-Triples operations: solving triple patterns. Query: (4,2,3), a fully bound SPO pattern: check a single cell, (4,3), in the k²-tree of predicate P2. Result: (4,2,3).

  15. K²-Triples operations: solving triple patterns. Query: (4,2,?), an SP? pattern: retrieve the direct neighbours of subject 4 (row 4) in the P2 k²-tree. Result: (4,2,3).

  16. K²-Triples operations: solving triple patterns. Query: (?,2,3), a ?PO pattern: retrieve the reverse neighbours of object 3 (column 3) in the P2 k²-tree. Result: (4,2,3), (7,2,3).

  17. K²-Triples operations: solving triple patterns. Query: (4,?,6), an S?O pattern: check cell (4,6) in each of the |P| k²-trees (P1 to P5). Result: (4,4,6).

  18. K²-Triples operations: solving triple patterns. Query: (4,?,?), an S?? pattern: retrieve the direct neighbours of subject 4 in each of the |P| k²-trees. Result: (4,1,7), (4,2,3), (4,4,6).

  19. K²-Triples operations: solving triple patterns. Query: (?,?,4), a ??O pattern: retrieve the reverse neighbours of object 4 in each of the |P| k²-trees. Result: (8,5,4).

  20. K²-Triples operations: solving triple patterns. Query: (?,2,?), a ?P? pattern: retrieve the full adjacency matrix of the P2 k²-tree. Result: (4,2,3), (5,2,1), (6,2,2), (7,2,3).
      Triple patterns and the k²-tree operations that solve them:
      - SPO: checking a cell
      - SP?: direct neighbours
      - ?PO: reverse neighbours
      - S?O: checking |P| cells
      - S??: |P| direct neighbours
      - ??O: |P| reverse neighbours
      - ?P?: full adjacency matrix
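The mapping from triple patterns to k²-tree operations can be sketched as a small dispatcher. To keep it self-contained, each predicate's k²-tree is replaced by a Python set of (subject, object) pairs, and `check_cell`, `direct_neighbours`, `reverse_neighbours` and `full_matrix` are hypothetical stand-ins that scan the set; a real implementation would run the corresponding k²-tree traversals. `None` marks an unbound position, and an unbound predicate loops over all |P| trees.

```python
from collections import defaultdict

TRIPLES = [(8, 5, 4), (4, 2, 3), (4, 4, 6), (4, 1, 7), (7, 2, 3),
           (3, 3, 5), (5, 2, 1), (1, 3, 5), (6, 2, 2), (2, 3, 5)]


def partition(triples):
    """predicate -> set of (subject, object): stand-in for one k2-tree per predicate."""
    trees = defaultdict(set)
    for s, p, o in triples:
        trees[p].add((s, o))
    return dict(trees)


# Hypothetical stand-ins for the k2-tree operations (naive set scans).
def check_cell(tree, s, o):
    return (s, o) in tree


def direct_neighbours(tree, s):      # row s
    return sorted(o for x, o in tree if x == s)


def reverse_neighbours(tree, o):     # column o
    return sorted(s for s, y in tree if y == o)


def full_matrix(tree):               # every 1-cell
    return sorted(tree)


def solve(trees, s=None, p=None, o=None):
    """Answer a triple pattern; None marks an unbound position."""
    preds = [p] if p is not None else sorted(trees)   # unbound P: all |P| trees
    out = []
    for q in preds:
        tree = trees.get(q, set())
        if s is not None and o is not None:           # SPO / S?O
            if check_cell(tree, s, o):
                out.append((s, q, o))
        elif s is not None:                           # SP? / S??
            out.extend((s, q, y) for y in direct_neighbours(tree, s))
        elif o is not None:                           # ?PO / ??O
            out.extend((x, q, o) for x in reverse_neighbours(tree, o))
        else:                                         # ?P?
            out.extend((x, q, y) for x, y in full_matrix(tree))
    return sorted(out)


trees = partition(TRIPLES)
print(solve(trees, p=2, o=3))  # [(4, 2, 3), (7, 2, 3)]
```

The `preds` line makes the cost asymmetry visible: every pattern with an unbound predicate touches all |P| trees.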

  21. K²-Triples SP & OP indexes. Weakness of vertical partitioning: patterns with an unbound predicate, (S,?,?), (?,?,O) and (S,?,O), require checking all |P| k²-trees. The k²-triples authors therefore proposed two additional indexes, SP and OP. The SP index stores, for each subject, the list of predicates it appears with (the OP index does the same for objects); the lists are statistically compressed and support direct access with DAC (Directly Addressable Codes). SP index for the example triples:
      S → predicates: 1 → 3; 2 → 3; 3 → 3; 4 → 1,2,4; 5 → 2; 6 → 2; 7 → 2; 8 → 5

  22. K²-Triples SP & OP indexes. Query (4,?,?): the SP index returns the predicate list of subject 4, namely 1, 2, 4, so only the k²-trees of P1, P2 and P4 are traversed, while P3 and P5 are skipped.
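The effect of the SP index on (S,?,?) queries can be sketched as follows. Plain Python lists replace the statistically compressed, DAC-encoded predicate lists, and sets of (subject, object) pairs stand in for the k²-trees; `build_trees`, `build_index` and `solve_subject` are hypothetical names. The OP index is built the same way, keyed on objects.

```python
from collections import defaultdict

TRIPLES = [(8, 5, 4), (4, 2, 3), (4, 4, 6), (4, 1, 7), (7, 2, 3),
           (3, 3, 5), (5, 2, 1), (1, 3, 5), (6, 2, 2), (2, 3, 5)]


def build_trees(triples):
    """predicate -> set of (subject, object): stand-in for the per-predicate k2-trees."""
    trees = defaultdict(set)
    for s, p, o in triples:
        trees[p].add((s, o))
    return trees


def build_index(triples, key):
    """key='s' builds the SP index (subject -> predicates); key='o' the OP index."""
    pos = 0 if key == "s" else 2
    index = defaultdict(set)
    for t in triples:
        index[t[pos]].add(t[1])
    return {term: sorted(preds) for term, preds in index.items()}


def solve_subject(trees, sp_index, s):
    """(S,?,?): visit only the trees listed for s instead of all |P| trees."""
    visited, out = [], []
    for p in sp_index.get(s, []):
        visited.append(p)
        # Naive scan standing in for a direct-neighbours query on tree p.
        out.extend((s, p, o) for x, o in sorted(trees[p]) if x == s)
    return out, visited


results, visited = solve_subject(build_trees(TRIPLES), build_index(TRIPLES, "s"), 4)
print(visited)   # [1, 2, 4]
print(results)   # [(4, 1, 7), (4, 2, 3), (4, 4, 6)]
```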

  23. K²-Triples joins. The k²-triples authors implemented three join strategies that take advantage of the k²-triples structure: independent join (merge join), chain join (index join) and interactive join. Example query: (8,5,?X) (?X,2,?). The best strategy depends on the dataset and the type of join.
