Linked Data Indexing Methods: A Survey. Martin Svoboda, Irena Mlýnková. Charles University in Prague, The Czech Republic. 21st October 2011, SWWS@OTM, Crete, Greece.


SLIDE 1

Linked Data Indexing Methods: A Survey

Martin Svoboda, Irena Mlýnková

Charles University in Prague The Czech Republic 21st October 2011 SWWS@OTM, Crete, Greece

SLIDE 2

Outline

  • Introduction
  • Dimensions
  • Approaches
  • Observations
  • Challenges
  • Conclusion
SLIDE 3

Introduction

  • Motivation
  • Web of Documents
  • Web of Data
  • Linked Data
  • Principles

‒ Unique identifiers (URIs)
‒ Useful description (HTTP, RDF)
‒ Links

SLIDE 4

Introduction

  • RDF (Resource Description Framework)
  • Triples

‒ Subject Predicate Object.

  • Graph

‒ Directed labeled multigraph
‒ Vertices for subjects and objects
‒ Edges for particular triples
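
The graph view of RDF described above can be illustrated with a minimal Python sketch. The example triples and the `ex:` names are hypothetical and serve only as illustration; the point is that two vertices may be connected by several labeled edges, which is what makes the structure a multigraph.

```python
from collections import defaultdict

# Hypothetical example triples (subject, predicate, object).
triples = [
    ("ex:John", "ex:knows", "ex:Peter"),
    ("ex:John", "ex:livesIn", "ex:Prague"),
    ("ex:Peter", "ex:livesIn", "ex:Prague"),
    ("ex:John", "ex:worksWith", "ex:Peter"),  # second edge between the same two vertices
]

def build_graph(triples):
    """Return (vertices, edges); edges maps (subject, object) to its list of predicate labels."""
    vertices = set()
    edges = defaultdict(list)
    for s, p, o in triples:
        vertices.update((s, o))   # subjects and objects become vertices
        edges[(s, o)].append(p)   # each triple becomes one directed, labeled edge
    return vertices, dict(edges)

vertices, edges = build_graph(triples)
```

Here `edges[("ex:John", "ex:Peter")]` holds two labels, one per triple, while the direction of each edge runs from subject to object.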

SLIDE 5

Intent

  • Querying framework
  • Architecture

‒ Compromise between local and distributed approaches

  • Issues

‒ Physical storage
‒ Index structures
‒ Query processor

  • Problems

‒ Data scalability, distribution and dynamicity

SLIDE 6

Intent

  • Architecture
  • Local

‒ Efficient processing
‒ Independent data
‒ Storage requirements

  • Distributed

‒ Runtime requests
‒ Up-to-date data
‒ Network throughput

SLIDE 7

Dimensions

  • Aspects
  • Data
  • Index
  • Querying
  • Dimensions
  • Not all combinations make sense
SLIDE 8

Dimensions

  • Data distribution
  • Local, distributed or global data
  • Data units
  • Triples, quads, documents or other sources
  • Data dynamicity
  • Durable, changeable or volatile data
  • Index organization
  • Local or distributed model
SLIDE 9

Dimensions

  • Index items
  • Keywords, triples, quads, trees, paths or areas
  • Index content
  • Pure data, statistics or summaries about data
  • Index dynamicity
  • Dynamic or static structures
  • Access patterns
  • Universal or limited approaches
SLIDE 10

Dimensions

  • Querying layer
  • Syntactic, structural or semantic querying
  • Query models
  • Full text querying or graph patterns
  • Query evaluation
  • Local or distributed processing
  • Query results
  • Complete or incomplete results
SLIDE 11

Categories

  • Main approach types
  • Querying systems

‒ Local or distributed data
‒ Structural queries
‒ Complete results

  • Search engines

‒ Global data cloud
‒ Full text queries
‒ Imprecise results

SLIDE 12

Approaches

  • Source selection

‒ Andreas Harth et al.: Data Summaries for On-Demand Queries over Linked Data

  • Data transformation

‒ 3-dimensional space
‒ Hash functions

  • Q-trees based on R-trees

‒ Overlapping bounding boxes
‒ Buckets with summaries

[Figure: example bucket summary; labels: Dataset A 87, Dataset B 14; points (5, 10, 5) and (15, 20, 25)]
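
The source-selection idea on this slide (hash triples into a 3-dimensional space, then keep per-bucket summaries of which sources contribute data) can be sketched roughly as follows. This is not the Q-tree from the cited paper: a fixed grid stands in for the tree with overlapping bounding boxes, and `SPACE`, `CELL`, `GridSummary` and the CRC32 hash are all assumptions chosen for brevity.

```python
import zlib
from collections import Counter, defaultdict

SPACE = 1000  # assumed coordinate range per dimension
CELL = 250    # assumed bucket edge length (fixed grid, not a real Q-tree)

def coord(term):
    """Deterministically hash an RDF term to an integer coordinate."""
    return zlib.crc32(term.encode("utf-8")) % SPACE

def to_point(triple):
    """Map a (subject, predicate, object) triple to a point in 3-dimensional space."""
    return tuple(coord(term) for term in triple)

class GridSummary:
    """Per-bucket counts of triples contributed by each source."""

    def __init__(self):
        self.buckets = defaultdict(Counter)  # grid cell -> Counter(source -> count)

    def add(self, source, triple):
        cell = tuple(c // CELL for c in to_point(triple))
        self.buckets[cell][source] += 1

    def candidate_sources(self, pattern):
        """pattern is (s, p, o) with None for variables; returns possibly relevant sources."""
        wanted = [None if t is None else coord(t) // CELL for t in pattern]
        result = Counter()
        for cell, counts in self.buckets.items():
            if all(w is None or w == c for w, c in zip(wanted, cell)):
                result.update(counts)
        return result

summary = GridSummary()
summary.add("Dataset A", ("ex:John", "ex:knows", "ex:Peter"))
summary.add("Dataset A", ("ex:John", "ex:livesIn", "ex:Prague"))
summary.add("Dataset B", ("ex:Peter", "ex:livesIn", "ex:Prague"))
candidates = summary.candidate_sources((None, "ex:knows", None))
```

Because several terms can hash into the same cell, the summary may report sources that hold no matching triple, but it never misses a source that does. That trade-off is what allows a compact summary to decide which sources are worth contacting at query time.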

SLIDE 13

Approaches

  • BitMat index

‒ Medha Atre et al.: Matrix "Bit"loaded: A Scalable Lightweight Join Query Processor for RDF Data

  • 3-dimensional matrix

‒ Bit values 0 or 1

  • 2-dimensional slices

‒ S-O, O-S, P-O, P-S slices

  • Implementation

‒ Compressed bit runs

[Figure: 3-dimensional bit matrix with axes subjects, predicates and objects; labels: John, Peter, knows, lives in; set bits shown as 1]
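
A small Python sketch may help picture the bit matrix and its slices. The terms below are loosely modeled on the figure labels, but the data, the function names and the run-length encoding shown here are assumptions; the exact compression scheme of the cited BitMat paper may differ.

```python
from itertools import groupby

# Hypothetical vocabulary; positions in these lists act as matrix coordinates.
subjects = ["John", "Peter"]
predicates = ["knows", "livesIn"]
objects = ["Peter", "Prague"]
triples = [
    ("John", "knows", "Peter"),
    ("John", "livesIn", "Prague"),
    ("Peter", "livesIn", "Prague"),
]

s_id = {term: i for i, term in enumerate(subjects)}
p_id = {term: i for i, term in enumerate(predicates)}
o_id = {term: i for i, term in enumerate(objects)}

def build_bitmat(triples):
    """cube[s][p][o] is 1 if the triple (s, p, o) is present, else 0."""
    cube = [[[0] * len(objects) for _ in predicates] for _ in subjects]
    for s, p, o in triples:
        cube[s_id[s]][p_id[p]][o_id[o]] = 1
    return cube

def so_slice(cube, predicate):
    """S-O slice for a fixed predicate: one row per subject."""
    p = p_id[predicate]
    return [cube[s][p] for s in range(len(subjects))]

def os_slice(cube, predicate):
    """O-S slice: the transposed S-O slice, one row per object."""
    return [list(column) for column in zip(*so_slice(cube, predicate))]

def po_slice(cube, subject):
    """P-O slice for a fixed subject: one row per predicate."""
    return cube[s_id[subject]]

def run_lengths(row):
    """Compress a bit row into (bit, run length) pairs."""
    return [(bit, len(list(group))) for bit, group in groupby(row)]

cube = build_bitmat(triples)
```

For instance, `so_slice(cube, "livesIn")` yields `[[0, 1], [0, 1]]`, and `run_lengths([0, 0, 1, 1, 1, 0])` yields `[(0, 2), (1, 3), (0, 1)]`. Long runs of zeros in sparse rows are what make run-based compression attractive.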

SLIDE 14

Observations

  • String compression
  • Repeating string values

‒ URIs and literals

  • Unique integer identifiers

‒ Efficient processing
‒ Space requirements

  • Translation maps

‒ Both directions
‒ Based on B-trees
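
The string-compression observation amounts to dictionary encoding, which a minimal sketch can make concrete. The class name `TermDictionary` is hypothetical, and an in-memory dict and list stand in for the B-tree based translation maps mentioned on the slide.

```python
class TermDictionary:
    """Bidirectional map between RDF terms (URIs, literals) and integer identifiers."""

    def __init__(self):
        self._to_id = {}    # term -> identifier
        self._to_term = []  # identifier -> term (the list index is the identifier)

    def encode(self, term):
        """Return the identifier of a term, assigning a new one on first sight."""
        if term not in self._to_id:
            self._to_id[term] = len(self._to_term)
            self._to_term.append(term)
        return self._to_id[term]

    def decode(self, ident):
        """Return the term stored under an identifier."""
        return self._to_term[ident]

dictionary = TermDictionary()
raw = [
    ("http://example.org/John", "http://example.org/knows", "http://example.org/Peter"),
    ("http://example.org/Peter", "http://example.org/knows", "http://example.org/John"),
]
# Each repeated string is stored once; triples keep only small integers.
encoded = [tuple(dictionary.encode(term) for term in triple) for triple in raw]
decoded = [tuple(dictionary.decode(i) for i in triple) for triple in encoded]
```

With this input, `encoded` is `[(0, 1, 2), (2, 1, 0)]`: three distinct strings are stored once each, and joins can compare integers instead of long URIs.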

SLIDE 15

Observations

  • Data pruning
  • Idea

‒ Query optimization
‒ Relevant data

  • Methods

‒ Filtering selections
‒ Join ordering

  • Problem

‒ Partial knowledge
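
The two pruning methods named above can be sketched together: evaluate the most selective triple pattern first, so later patterns only extend bindings that survived the filter. This is a simplified illustration, not a method from any surveyed system; the data, the `?`-prefixed variable convention and the function names are assumptions, and the cardinalities here are exact counts, whereas real systems typically have only estimates (the "partial knowledge" problem).

```python
# Hypothetical data set of (subject, predicate, object) triples.
triples = [
    ("ex:John", "ex:knows", "ex:Peter"),
    ("ex:John", "ex:livesIn", "ex:Prague"),
    ("ex:Peter", "ex:livesIn", "ex:Prague"),
    ("ex:Mary", "ex:livesIn", "ex:Brno"),
]

def matches(pattern, triple, binding):
    """Extend binding so that pattern matches triple; return None on mismatch."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):                     # variable
            if new.setdefault(term, value) != value:
                return None                          # conflicts with an earlier binding
        elif term != value:                          # constant must match exactly
            return None
    return new

def cardinality(pattern, triples):
    """Number of triples matching a single pattern (exact here, estimated in practice)."""
    return sum(1 for t in triples if matches(pattern, t, {}) is not None)

def evaluate(patterns, triples):
    """Join patterns in ascending order of cardinality; return (order used, bindings)."""
    ordered = sorted(patterns, key=lambda p: cardinality(p, triples))
    bindings = [{}]
    for pattern in ordered:
        bindings = [b2 for b in bindings for t in triples
                    if (b2 := matches(pattern, t, b)) is not None]
    return ordered, bindings

query = [("?x", "ex:livesIn", "?c"), ("?x", "ex:knows", "ex:Peter")]
ordered, results = evaluate(query, triples)
```

The `ex:knows` pattern matches one triple and the `ex:livesIn` pattern three, so the former is evaluated first and the join only ever carries a single intermediate binding. Ordering by cardinality alone ignores whether consecutive patterns share variables, so a production optimizer would need more than this.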

SLIDE 16

Challenges

  • Data distribution
  • Motivation

‒ Datasets are distributed
‒ Appropriate compromise

  • Problems

‒ Network drawbacks
‒ Space requirements
‒ Independent datasets

SLIDE 17

Challenges

  • Data scalability
  • Motivation

‒ Web of Data size explosion

  • September 2011:
  • 295 datasets, 31 billion triples, 504 million links
  • Problems

‒ Scalable storage and indices
‒ Efficient query evaluation
‒ Quality, provenance and trust

SLIDE 18

Challenges

  • Data dynamicity
  • Motivation

‒ Data tends to age

  • Problems

‒ Continuous updates
‒ Dynamic structures

SLIDE 19

Conclusion

  • Problem
  • Linked Data indexing methods
  • Contributions
  • Approaches comparison

‒ Dimensions
‒ Observations
‒ Challenges

SLIDE 20

Thank you for your attention…

Faculty of Mathematics and Physics

Charles University in Prague