a main memory index structure to query linked data

A Main Memory Index Structure to Query Linked Data Olaf Hartig - PowerPoint PPT Presentation


  1. A Main Memory Index Structure to Query Linked Data
     Olaf Hartig (http://olafhartig.de/foaf.rdf#olaf, @olafhartig)
     Frank Huber
     Database and Information Systems Research Group, Humboldt-Universität zu Berlin

  2. The Issue
     [Figure: hit rate, number of query results, and query execution time (in seconds) for ContactInfoPhillipe (Query No. 36), UnsetPropsPhillipe (Query No. 37), 2ndDegree1Phillipe (Query No. 38), 2ndDegree2Phillipe (Query No. 39), and IncomingPhillipe (Query No. 40); series: no reuse, given order]
     Olaf Hartig - A Main Memory Index Structure to Query Linked Data

  3. The Issue
     [Figure: hit rate, number of query results, and query execution time (in seconds) for Query No. 36 to Query No. 40; series: no reuse, given order]
     Descriptor objects in the query-local dataset after query execution: 172, 533

  4. [Diagram: the query-local dataset as the logical representation of Linked Data from the Web; the physical representation of Linked Data from the Web is marked "?"]
     What data structure do we use to physically represent the query-local dataset?

  5. Outline
     1. Requirements + Existing Work
     2. Data Structures
     3. Evaluation

  6. Requirements
     ● (Consecutively) build and use ad hoc collections of many small sets of RDF triples
     ● Four main operations:
       ● Find … matching triples for a triple pattern in all descriptor objects
       ● Add, Remove, Replace … descriptor objects
     ● Support of concurrent access (i.e. isolation)
     ● Non-relevant properties:
       ● Querying descriptor objects individually is not necessary
       ● No need to write data back to the Web
       ● ACID properties not required for complete queries

  8. Existing Work
     ● Disk-based storage solutions for RDF data
       ● Unsuitable due to very costly I/O operations
     ● Main-memory-based data structures in the literature
       ● Focus on a large, single set of RDF triples
       ● Optimized for complete graph pattern queries or path queries
     ● Main-memory-based data structures in RDF frameworks
       ● Focus on Jena, ARQ and NG4J
       ● Inefficient (see evaluation)

  9. Outline
     1. Requirements + Existing Work
     2. Data Structures
     3. Evaluation

  10. Hash-Based Index for RDF Data
      [Diagram: a logical representation mapped to a physical representation that consists of a dictionary (Dict) and six hash tables: S, P, O, SP, PO, SO]
      ● Dictionary:
        ● Two-way mapping between RDF terms and numerical identifiers
      ● 6 hash tables:
        ● Each hash table contains all ID-encoded triples
        ● Efficient support for all types of triple patterns
      *Similar to Harth and Decker, 2005

  11. Hash-Based Index for RDF Data
      [Diagram: a logical representation mapped to a physical representation that consists of a dictionary (Dict) and six hash tables: S, P, O, SP, PO, SO]
      ● Dictionary:
        ● Two-way mapping between RDF terms and numerical identifiers
      ● 6 hash tables:
        ● Each hash table contains all ID-encoded triples, $t_{id} = (id_s, id_p, id_o)$
        ● Efficient support for all types of triple patterns
      *Similar to Harth and Decker, 2005

  12. Hash-Based Index for RDF Data
      [Diagram: a logical representation mapped to a physical representation that consists of a dictionary (Dict) and six hash tables: S, P, O, SP, PO, SO]
      Example operation: Find (?acq, knows, http://bob.name)
      ● Dictionary:
        ● Two-way mapping between RDF terms and numerical identifiers
      ● 6 hash tables:
        ● Each hash table contains all ID-encoded triples, $t_{id} = (id_s, id_p, id_o)$
        ● Efficient support for all types of triple patterns
      *Similar to Harth and Decker, 2005
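
The dictionary plus six hash tables can be sketched roughly as follows. This is a minimal illustration in Python, not the authors' implementation: the class names `Dictionary` and `HashIndex`, the `lookup` helper, the extra `all_triples` set, and the example URIs other than http://bob.name are assumptions made for this sketch.

```python
from collections import defaultdict


class Dictionary:
    """Two-way mapping between RDF terms and numerical identifiers."""

    def __init__(self):
        self._term_to_id = {}
        self._id_to_term = []

    def encode(self, term):
        # Assign the next free ID the first time a term is seen.
        if term not in self._term_to_id:
            self._term_to_id[term] = len(self._id_to_term)
            self._id_to_term.append(term)
        return self._term_to_id[term]

    def lookup(self, term):
        # Return the ID of a known term, or None without creating one.
        return self._term_to_id.get(term)

    def decode(self, term_id):
        return self._id_to_term[term_id]


class HashIndex:
    """Six hash tables; each one holds all ID-encoded triples."""

    # Table name -> positions of the triple that form the hash key.
    KEYS = {"S": (0,), "P": (1,), "O": (2,),
            "SP": (0, 1), "PO": (1, 2), "SO": (0, 2)}

    def __init__(self, dictionary):
        self.dict = dictionary
        self.tables = {name: defaultdict(set) for name in self.KEYS}
        # Assumption of this sketch: a plain set serves the two patterns
        # with zero or three bound positions.
        self.all_triples = set()

    def add(self, s, p, o):
        t = tuple(self.dict.encode(x) for x in (s, p, o))
        self.all_triples.add(t)
        for name, positions in self.KEYS.items():
            self.tables[name][tuple(t[i] for i in positions)].add(t)

    def find(self, s=None, p=None, o=None):
        """Return decoded triples matching the pattern; None is a variable."""
        pattern = (s, p, o)
        bound = tuple(i for i, x in enumerate(pattern) if x is not None)
        ids = tuple(self.dict.lookup(pattern[i]) for i in bound)
        if None in ids:
            return set()  # a bound term that was never stored cannot match
        if len(bound) == 0:
            matches = self.all_triples
        elif len(bound) == 3:
            matches = {ids} if ids in self.all_triples else set()
        else:
            name = "".join("SPO"[i] for i in bound)
            matches = self.tables[name].get(ids, set())
        return {tuple(self.dict.decode(i) for i in t) for t in matches}


index = HashIndex(Dictionary())
index.add("http://alice.name", "knows", "http://bob.name")
index.add("http://bob.name", "knows", "http://carol.name")
# Find (?acq, knows, http://bob.name): answered from the PO table.
acquaintances = index.find(p="knows", o="http://bob.name")
```

Each pattern with one or two bound positions is answered by a single hash lookup in the table named after those positions, which is what makes all pattern types cheap at the price of storing every triple six times.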

  13. Individual Indexing
      [Diagram: the query-local dataset (logical representation) mapped to a physical representation with one dictionary (Dict) and a separate set of hash tables (S, P, O, SP, PO, SO) per descriptor object]
      ● Idea: Index each descriptor object separately
      ● Implementation of the four operations:
        ● Add, Remove, and Replace are straightforward
        ● Find requires iterating over all indexes

  14. Individual Indexing
      [Diagram: the query-local dataset (logical representation) mapped to a physical representation with one dictionary (Dict) and a separate set of hash tables (S, P, O, SP, PO, SO) per descriptor object]
      Example operation: Find (?acq, knows, http://bob.name)
      ● Idea: Index each descriptor object separately
      ● Implementation of the four operations:
        ● Add, Remove, and Replace are straightforward
        ● Find requires iterating over all indexes
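
A rough sketch of individual indexing, under simplifying assumptions: the hypothetical `TripleIndex` class stands in for the per-object hash index, skips dictionary encoding, and uses one hash table keyed by pattern instead of six separate tables. The class names, descriptor object IDs, and example data are illustrative only.

```python
from collections import defaultdict


class TripleIndex:
    """Simplified stand-in for the index of one descriptor object."""

    def __init__(self, triples):
        self.tables = defaultdict(set)
        for t in triples:
            # Register the triple under every pattern it matches;
            # None marks an unbound position.
            for mask in range(8):
                key = tuple(x if (mask >> i) & 1 else None
                            for i, x in enumerate(t))
                self.tables[key].add(t)

    def find(self, s=None, p=None, o=None):
        return self.tables.get((s, p, o), set())


class IndividualIndexing:
    """One separate index per descriptor object."""

    def __init__(self):
        self.indexes = {}  # descriptor object ID -> TripleIndex

    def add(self, obj_id, triples):
        self.indexes[obj_id] = TripleIndex(triples)

    def remove(self, obj_id):
        self.indexes.pop(obj_id, None)

    def replace(self, obj_id, triples):
        # Swapping in a freshly built index replaces the old one as a whole.
        self.indexes[obj_id] = TripleIndex(triples)

    def find(self, s=None, p=None, o=None):
        # Find has to visit every per-object index and merge the results.
        result = set()
        for index in self.indexes.values():
            result |= index.find(s, p, o)
        return result


dataset = IndividualIndexing()
dataset.add("d1", [("http://alice.name", "knows", "http://bob.name")])
dataset.add("d2", [("http://carol.name", "knows", "http://bob.name"),
                   ("http://alice.name", "knows", "http://bob.name")])
found = dataset.find(p="knows", o="http://bob.name")
```

Add, Remove, and Replace touch exactly one index, while the cost of Find grows with the number of descriptor objects in the dataset.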

  15. Combined Indexing
      [Diagram: the query-local dataset (logical representation) mapped to a physical representation with one dictionary (Dict) and a single set of hash tables (S, P, O, SP, PO, SO)]
      ● Idea: Use a single index for all descriptor objects
      ● src: maps each triple to a set of descriptor object IDs

  16. Combined Indexing
      [Diagram: the query-local dataset (logical representation) mapped to a physical representation with one dictionary (Dict) and a single set of hash tables (S, P, O, SP, PO, SO)]
      ● Idea: Use a single index for all descriptor objects
      ● src: maps each triple to a set of descriptor object IDs
      $t_{id} = (id_s, id_p, id_o)$

  17. Combined Indexing
      [Diagram: the query-local dataset (logical representation) mapped to a physical representation with one dictionary (Dict) and a single set of hash tables (S, P, O, SP, PO, SO)]
      ● Idea: Use a single index for all descriptor objects
      ● src: maps each triple to a set of descriptor object IDs
      $t_{id} = (id_s, id_p, id_o)$ + $src(t_{id})$ = { [descriptor object], [descriptor object] }
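
Combined indexing might look roughly like the sketch below. It is an illustration under assumptions, not the presented implementation: dictionary encoding is omitted, one pattern-keyed hash table replaces the six tables, and `remove` scans `src` linearly, which a real implementation would likely avoid. The class name `CombinedIndexing` and the example data are hypothetical.

```python
from collections import defaultdict


class CombinedIndexing:
    """A single index for all descriptor objects, plus the src mapping."""

    def __init__(self):
        self.tables = defaultdict(set)  # pattern key -> matching triples
        self.src = defaultdict(set)     # triple -> descriptor object IDs

    @staticmethod
    def _keys(t):
        # Every pattern the triple matches; None marks an unbound position.
        for mask in range(8):
            yield tuple(x if (mask >> i) & 1 else None
                        for i, x in enumerate(t))

    def add(self, obj_id, triples):
        for t in triples:
            if not self.src[t]:
                # First descriptor object with this triple: index it once.
                for key in self._keys(t):
                    self.tables[key].add(t)
            self.src[t].add(obj_id)

    def remove(self, obj_id):
        affected = [t for t, ids in self.src.items() if obj_id in ids]
        for t in affected:
            self.src[t].discard(obj_id)
            if not self.src[t]:
                # No descriptor object contains the triple any more.
                del self.src[t]
                for key in self._keys(t):
                    self.tables[key].discard(t)

    def replace(self, obj_id, triples):
        self.remove(obj_id)
        self.add(obj_id, triples)

    def find(self, s=None, p=None, o=None):
        # One lookup, independent of the number of descriptor objects.
        return set(self.tables.get((s, p, o), set()))


dataset = CombinedIndexing()
shared = ("http://alice.name", "knows", "http://bob.name")
dataset.add("d1", [shared])
dataset.add("d2", [shared, ("http://carol.name", "knows", "http://bob.name")])
sources = set(dataset.src[shared])
found = dataset.find(p="knows", o="http://bob.name")
```

A triple that occurs in several descriptor objects is stored once, and `src` records which objects still contain it, so the triple disappears from the index only when its last source is removed.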

  18. Quad Indexing
      [Diagram: the query-local dataset (logical representation) mapped to a physical representation with one dictionary (Dict) and a single set of hash tables (S, P, O, SP, PO, SO)]
      ● Idea: Use a single quad index for all descriptor objects
      ● quad = ID-encoded triple + descriptor object ID
      $q = ((id_s, id_p, id_o), [\text{descriptor object ID}])$
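
A comparable sketch for quad indexing, again with assumptions stated up front: no dictionary encoding, one pattern-keyed hash table instead of six tables, and a `remove` that scans all buckets. The class name `QuadIndexing` and the example data are hypothetical.

```python
from collections import defaultdict


class QuadIndexing:
    """A single index whose entries are quads: (triple, descriptor object ID)."""

    def __init__(self):
        self.tables = defaultdict(set)  # pattern key -> matching quads

    @staticmethod
    def _keys(t):
        # Every pattern the triple matches; None marks an unbound position.
        for mask in range(8):
            yield tuple(x if (mask >> i) & 1 else None
                        for i, x in enumerate(t))

    def add(self, obj_id, triples):
        for t in triples:
            quad = (t, obj_id)
            # No src lookup is needed: the quad carries its own source.
            for key in self._keys(t):
                self.tables[key].add(quad)

    def remove(self, obj_id):
        for quads in self.tables.values():
            quads.difference_update({q for q in quads if q[1] == obj_id})

    def replace(self, obj_id, triples):
        self.remove(obj_id)
        self.add(obj_id, triples)

    def find(self, s=None, p=None, o=None):
        # Drop the descriptor object ID; the set removes duplicate triples.
        return {t for t, _ in self.tables.get((s, p, o), set())}


dataset = QuadIndexing()
shared = ("http://alice.name", "knows", "http://bob.name")
dataset.add("d1", [shared])
dataset.add("d2", [shared])
found = dataset.find(p="knows", o="http://bob.name")
```

In this sketch, adding a descriptor object needs no check against a separate `src` map, at the cost of storing one entry per (triple, descriptor object) pair and deduplicating in Find.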

  19. Outline
      1. Requirements + Existing Work
      2. Data Structures
      3. Evaluation

  20. Experiment Setup
      Does this affect the overall execution time for link traversal based query executions?

  21. Experiment Setup
      Does this affect the overall execution time for link traversal based query executions?
      ● Simulation of the Web of Data
        ● Linked Data server publishes the BSBM dataset (scaling factor: 50)
        ● Adjusted BSBM queries link to the simulation server
      ● Experiment:
        ● Sequence of 200 query mixes
        ● Reuse of the query-local dataset for the whole sequence
        ● IndIR, CombIR, and QuadIR (as presented), engine: SQUIN
        ● NamedGraphSetImpl (NG4J/Jena), engine: SemWeb Client

  22. Execution Time
      [Figure: two plots over the query mix (0 to 200): execution time in seconds, and overall number of descriptor objects in the queried dataset; series: NG4J (SWClLib), IndIR (m=4), CombIR (m=12), CombQuadIR (m=12)]

  24. Summary
      ● Three hash index based data structures:
        ● Individual indexing
        ● Combined indexing
        ● Quad indexing
      ● Findings:
        ● A single index improves query performance significantly
        ● Smaller load times with quads
      ● Also applicable to other use cases of ad hoc storing of Linked Data that is
        ● Consecutively retrieved from remote sources
        ● Used for immediate local processing

  25. Backup Slides
