Linked Data Indexing Methods: A Survey
Martin Svoboda, Irena Mlýnková
Charles University in Prague, The Czech Republic
21st October 2011, SWWS@OTM, Crete, Greece
Outline
- Introduction
- Dimensions
- Approaches
- Observations
- Challenges
- Conclusion
Introduction
- Motivation
- Web of Documents
- Web of Data
- Linked Data
- Principles
‒ Unique identifiers (URIs)
‒ Useful description (HTTP, RDF)
‒ Links
Introduction
- RDF (Resource Description Framework)
- Triples
‒ Subject Predicate Object.
- Graph
‒ Directed labeled multigraph
‒ Vertices for subjects and objects
‒ Edges for particular triples
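As a rough illustration of the triple and graph views above, the following Python sketch stores a few triples as tuples and derives a directed labeled multigraph from them. The `ex:` names are made up for illustration, and the adjacency-list layout is just one possible representation.

```python
from collections import defaultdict

# Hypothetical example triples in (subject, predicate, object) form.
triples = [
    ("ex:John", "ex:knows", "ex:Peter"),
    ("ex:John", "ex:worksWith", "ex:Peter"),  # parallel edge, different label
    ("ex:John", "ex:livesIn", "ex:Prague"),
    ("ex:Peter", "ex:livesIn", "ex:Prague"),
]


def build_graph(triples):
    """Return adjacency lists: subject -> list of (predicate, object) edges."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return graph


def vertices(triples):
    """Vertices are all terms that occur as a subject or as an object."""
    return {s for s, _, _ in triples} | {o for _, _, o in triples}


graph = build_graph(triples)
```

Each triple becomes exactly one labeled edge, so two triples between the same pair of vertices yield two parallel edges, which is why the structure is a multigraph.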
Intent
- Querying framework
- Architecture
‒ Compromise between local and distributed approaches
- Issues
‒ Physical storage
‒ Index structures
‒ Query processor
- Problems
‒ Data scalability, distribution and dynamicity
Intent
- Architecture
- Local
‒ Efficient processing
‒ Independent data
‒ Storage requirements
- Distributed
‒ Runtime requests
‒ Up-to-date data
‒ Network throughput
Dimensions
- Aspects
- Data
- Index
- Querying
- Dimensions
- Not all combinations make sense
Dimensions
- Data distribution
- Local, distributed or global data
- Data units
- Triples, quads, documents or other sources
- Data dynamicity
- Durable, changeable or volatile data
- Index organization
- Local or distributed model
Dimensions
- Index items
- Keywords, triples, quads, trees, paths or areas
- Index content
- Pure data, statistics or summaries about data
- Index dynamicity
- Dynamic or static structures
- Access patterns
- Universal or limited approaches
Dimensions
- Querying layer
- Syntactic, structural or semantic querying
- Query models
- Full text querying or graph patterns
- Query evaluation
- Local or distributed processing
- Query results
- Complete or incomplete results
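To make the two query models above concrete, here is a minimal Python sketch that contrasts a graph pattern query with a full text query over the same in-memory triples. The data, the `?x` variable convention and the function names are assumptions chosen for illustration; the evaluation is a naive nested-loop join, not how any surveyed system works.

```python
# Hypothetical example data.
triples = [
    ("John", "knows", "Peter"),
    ("Peter", "livesIn", "Prague"),
    ("John", "livesIn", "Brno"),
]


def is_var(term):
    """Variables are written with a leading question mark, e.g. '?x'."""
    return term.startswith("?")


def match_pattern(triple, pattern, binding):
    """Extend a binding so the pattern matches the triple, or return None."""
    extended = dict(binding)
    for value, term in zip(triple, pattern):
        if is_var(term):
            if extended.setdefault(term, value) != value:
                return None  # variable already bound to a different value
        elif term != value:
            return None  # constant does not match
    return extended


def evaluate(triples, patterns):
    """Evaluate a list of triple patterns as a conjunctive graph pattern."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(triple, pattern, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


def keyword_search(triples, keyword):
    """Full text style lookup: triples with any term containing the keyword."""
    needle = keyword.lower()
    return [t for t in triples if any(needle in term.lower() for term in t)]


people_and_cities = evaluate(
    triples, [("?x", "knows", "?y"), ("?y", "livesIn", "?city")]
)
prague_hits = keyword_search(triples, "prague")
```

The graph pattern returns variable bindings that satisfy every pattern at once, while the keyword lookup only returns triples that mention the search string.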
Categories
- Main approach types
- Querying systems
‒ Local or distributed data
‒ Structural queries
‒ Complete results
- Search engines
‒ Global data cloud
‒ Full text queries
‒ Imprecise results
Approaches
- Source selection
‒ Andreas Harth et al.: Data Summaries for On-Demand Queries over Linked Data
- Data transformation
‒ 3-dimensional space
‒ Hash functions
- Q-trees based on R-trees
‒ Overlapping bounding boxes
‒ Buckets with summaries
[Figure: example Q-tree bucket; labels: Dataset A 87, Dataset B 14; coordinates (5, 10, 5) and (15, 20, 25)]
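The source selection idea above can be sketched in Python as follows. This is a simplified stand-in, not the Q-tree from the cited paper: triples are hashed into a 3-dimensional integer space, points are grouped into buckets by a fixed grid instead of a tree, and each bucket keeps a bounding box plus per-source triple counts. The hash function (CRC32), the axis size `SPACE`, the grid size `CELL` and all names are assumptions made for illustration.

```python
import zlib

SPACE = 32  # assumed number of coordinates per axis
CELL = 8    # assumed grid cell size used to group points into buckets


def h(term):
    """Map an RDF term to a coordinate; CRC32 is an arbitrary choice."""
    return zlib.crc32(term.encode("utf-8")) % SPACE


def to_point(triple):
    """Transform a triple into a point in the 3-dimensional space."""
    s, p, o = triple
    return (h(s), h(p), h(o))


class Bucket:
    """A bounding box plus a summary: triple counts per source."""

    def __init__(self, point):
        self.lo = point
        self.hi = point
        self.counts = {}

    def add(self, point, source):
        self.lo = tuple(min(a, b) for a, b in zip(self.lo, point))
        self.hi = tuple(max(a, b) for a, b in zip(self.hi, point))
        self.counts[source] = self.counts.get(source, 0) + 1

    def overlaps(self, qlo, qhi):
        return all(
            self.lo[i] <= qhi[i] and qlo[i] <= self.hi[i] for i in range(3)
        )


def build_summary(data):
    """Build buckets from an iterable of (source, triple) pairs."""
    grid = {}
    for source, triple in data:
        point = to_point(triple)
        cell = tuple(c // CELL for c in point)
        if cell not in grid:
            grid[cell] = Bucket(point)
        grid[cell].add(point, source)
    return list(grid.values())


def relevant_sources(buckets, pattern):
    """Return sources that may hold matches; None is a wildcard position."""
    qlo = tuple(0 if t is None else h(t) for t in pattern)
    qhi = tuple(SPACE - 1 if t is None else h(t) for t in pattern)
    found = set()
    for bucket in buckets:
        if bucket.overlaps(qlo, qhi):
            found.update(bucket.counts)
    return found


# Hypothetical example: two sources with one triple each.
data = [
    ("Dataset A", ("ex:John", "ex:knows", "ex:Peter")),
    ("Dataset B", ("ex:Anna", "ex:livesIn", "ex:Prague")),
]
buckets = build_summary(data)
candidates = relevant_sources(buckets, ("ex:John", "ex:knows", None))
```

Because hashing and bounding boxes lose information, the result may include sources without any real match, but a source that does contain a matching triple is never missed.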
Approaches
- BitMat index
‒ Medha Atre et al.: Matrix "Bit"loaded: A Scalable Lightweight Join Query Processor for RDF Data
- 3-dimensional matrix
‒ Bit values 0 or 1
- 2-dimensional slices
‒ S-O, O-S, P-O, P-S slices
- Implementation
‒ Compressed bit runs
[Figure: bit matrix with subjects, predicates and objects axes; example labels "knows", "lives in", "John", "Peter"; set bits shown as 1]
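A toy version of the 3-dimensional bit matrix and its slices might look like the Python sketch below. The tiny vocabularies, the nested-list layout and the run-length format are assumptions for illustration only; the actual BitMat storage layout is described in the cited paper and is not reproduced here.

```python
# Hypothetical vocabularies; positions in these lists are the matrix indices.
subjects = ["John", "Peter"]
predicates = ["knows", "livesIn"]
objects = ["Peter", "Prague"]

triples = [
    ("John", "knows", "Peter"),
    ("John", "livesIn", "Prague"),
    ("Peter", "livesIn", "Prague"),
]


def build_cube(triples):
    """cube[s][p][o] is 1 if the triple (s, p, o) exists, otherwise 0."""
    cube = [[[0] * len(objects) for _ in predicates] for _ in subjects]
    for s, p, o in triples:
        cube[subjects.index(s)][predicates.index(p)][objects.index(o)] = 1
    return cube


def so_slice(cube, predicate):
    """Subject-by-object matrix for one fixed predicate."""
    pi = predicates.index(predicate)
    return [[cube[si][pi][oi] for oi in range(len(objects))]
            for si in range(len(subjects))]


def po_slice(cube, subject):
    """Predicate-by-object matrix for one fixed subject."""
    return [row[:] for row in cube[subjects.index(subject)]]


def run_lengths(bits):
    """Compress a bit row into (first bit, list of run lengths)."""
    if not bits:
        return (0, [])
    runs, current, length = [], bits[0], 0
    for bit in bits:
        if bit == current:
            length += 1
        else:
            runs.append(length)
            current, length = bit, 1
    runs.append(length)
    return (bits[0], runs)


cube = build_cube(triples)
```

For example, `so_slice(cube, "livesIn")` marks which subjects live in which objects, and `run_lengths` shows how long runs of equal bits in sparse rows can be stored compactly.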
Observations
- String compression
- Repeating string values
‒ URIs and literals
- Unique integer identifiers
‒ Efficient processing
‒ Space requirements
- Translation maps
‒ Both directions
‒ Based on B-trees
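The string compression observation above can be illustrated with a small dictionary encoder in Python. Note the deliberate simplification: the slides mention translation maps based on B-trees, whereas this sketch uses an in-memory `dict` and `list` as stand-ins. The class and method names are hypothetical.

```python
class TermDictionary:
    """Two-way translation between RDF terms and integer identifiers."""

    def __init__(self):
        self._to_id = {}    # term -> identifier
        self._to_term = []  # identifier -> term

    def encode(self, term):
        """Return the identifier of a term, assigning a new one if needed."""
        ident = self._to_id.get(term)
        if ident is None:
            ident = len(self._to_term)
            self._to_id[term] = ident
            self._to_term.append(term)
        return ident

    def decode(self, ident):
        """Return the term that belongs to an identifier."""
        return self._to_term[ident]


def encode_triples(dictionary, triples):
    """Replace every URI or literal in the triples by its identifier."""
    return [tuple(dictionary.encode(t) for t in triple) for triple in triples]


# Hypothetical example with a repeated URI.
dictionary = TermDictionary()
encoded = encode_triples(dictionary, [
    ("http://example.org/John", "http://example.org/knows", "http://example.org/Peter"),
    ("http://example.org/Peter", "http://example.org/knows", "http://example.org/John"),
])
```

Each distinct string is stored once, and the triples themselves shrink to small integer tuples that are cheaper to compare and join.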
Observations
- Data pruning
- Idea
‒ Query optimization
‒ Relevant data
- Methods
‒ Filtering selections
‒ Join ordering
- Problem
‒ Partial knowledge
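One simple way to picture the pruning methods above is to filter triples per pattern and then order the patterns by how few triples they select, so that the most restrictive pattern is joined first. The Python sketch below does this with exact counts over local data; with only partial knowledge of remote sources, such counts would have to be replaced by estimates. The data and function names are assumptions for illustration.

```python
# Hypothetical example data.
triples = [
    ("John", "livesIn", "Prague"),
    ("Peter", "livesIn", "Prague"),
    ("Anna", "livesIn", "Brno"),
    ("John", "knows", "Peter"),
]


def is_var(term):
    """Variables are written with a leading question mark, e.g. '?x'."""
    return term.startswith("?")


def matches(triple, pattern):
    """A triple matches if every constant in the pattern is equal to it."""
    return all(is_var(q) or q == t for t, q in zip(triple, pattern))


def select(triples, pattern):
    """Filtering selection: keep only triples relevant to one pattern."""
    return [t for t in triples if matches(t, pattern)]


def order_patterns(triples, patterns):
    """Join ordering: evaluate the most selective patterns first."""
    return sorted(patterns, key=lambda p: len(select(triples, p)))


query = [("?x", "livesIn", "?city"), ("?x", "knows", "?y")]
plan = order_patterns(triples, query)
```

Here the `knows` pattern selects one triple and the `livesIn` pattern selects three, so the plan starts with `knows` and keeps intermediate results small.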
Challenges
- Data distribution
- Motivation
‒ Datasets are distributed
‒ Appropriate compromise
- Problems
‒ Network drawbacks
‒ Space requirements
‒ Independent datasets
Challenges
- Data scalability
- Motivation
‒ Web of Data size explosion
- September 2011:
- 295 datasets, 31 billion triples, 504 million links
- Problems
‒ Scalable storage and indices
‒ Efficient query evaluation
‒ Quality, provenance and trust
Challenges
- Data dynamicity
- Motivation
‒ Data tends to become outdated
- Problems
‒ Continuous updates
‒ Dynamic structures
Conclusion
- Problem
- Linked Data indexing methods
- Contributions
- Approaches comparison