SPARQLing Kleene – Fast Property Paths in RDF-3X
Andrey Gubichev, TU Munich
Stephan Seufert, MPI
Srikanta Bedathur, IIIT-Delhi
June 23, 2013

Motivation
◮ RDF data is a graph
◮ SPARQL 1.1 introduced property paths
◮ select * where { Munich yago:isLocatedIn* ?place }
◮ Which entities can be reached from Munich via yago:isLocatedIn?
◮ We could answer this with joins and unions over the triple store
◮ Can we do better with a bit of indexing?
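The "joins and unions" baseline can be sketched as follows. This is only an illustration and not RDF-3X code: each round joins the current frontier with the triples of one predicate and unions the result into the answer. The triples and the function name `reachable` are made up for the example.

```python
from collections import deque

# Hypothetical toy data; a real store would hold millions of triples.
TRIPLES = [
    ("Munich", "yago:isLocatedIn", "Bavaria"),
    ("Bavaria", "yago:isLocatedIn", "Germany"),
    ("Germany", "yago:isLocatedIn", "Europe"),
    ("Berlin", "yago:isLocatedIn", "Germany"),
    ("Munich", "hasName", "Munich_name"),
]


def reachable(triples, start, predicate):
    """Evaluate `start predicate* ?x` by repeated frontier expansion."""
    # Adjacency list restricted to the requested predicate.
    adjacency = {}
    for s, p, o in triples:
        if p == predicate:
            adjacency.setdefault(s, []).append(o)

    # The Kleene star also matches the zero-length path, so the start
    # node itself is part of the answer.
    seen = {start}
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        for successor in adjacency.get(node, ()):  # one "join" step
            if successor not in seen:               # "union" into the result
                seen.add(successor)
                frontier.append(successor)
    return seen


places = reachable(TRIPLES, "Munich", "yago:isLocatedIn")
```

The cost of this baseline grows with the size of the reachable subgraph for every query, which is the motivation for precomputing a reachability index.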

Semantics of Property Paths
◮ Originally, one could also count the number of paths between the start and end points
◮ However, these counting semantics lead to #P-hard problems (M. Arenas, WWW’12)
◮ The W3C standard now only allows checking reachability, not counting paths

Previous Work: RDF-3X
◮ A triple store
◮ Extensive indexing
◮ Join ordering with dynamic programming
◮ Accurate cardinality estimation for common types of queries
◮ T. Neumann et al., SIGMOD 2009

Previous Work: Reachability Index FERRARI
◮ FERRARI index: based on tree interval labeling; assigns exact and approximate interval labels to nodes (ICDE 2013)
◮ Runtime: use the index plus a limited DFS
◮ FERRARI:
  ◮ indexes 100 million triples of YAGO in 90 seconds
  ◮ takes 210 MB
  ◮ answers a reachability query for (start, end) in microseconds
  ◮ (all numbers measured on an off-the-shelf laptop)
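To make "exact and approximate labels plus limited DFS" concrete, here is a heavily simplified sketch of interval labeling on a DAG. It is not the FERRARI implementation: the merging policy, the `budget` parameter, the function names, and the toy graph are all assumptions chosen for illustration. Each node gets intervals over post-order IDs; an exact interval covers only reachable nodes, while an approximate one may also cover unreachable nodes and therefore needs a DFS to confirm.

```python
def compress(intervals, budget):
    """Merge intervals; keep at most `budget` of them (assumed policy)."""
    intervals = sorted(intervals)
    merged = [intervals[0]]
    for lo, hi, exact in intervals[1:]:
        plo, phi, pexact = merged[-1]
        if lo <= phi + 1:  # overlapping or adjacent: no gap is introduced
            merged[-1] = (plo, max(phi, hi), pexact and exact)
        else:
            merged.append((lo, hi, exact))
    while len(merged) > budget:
        # Close the smallest gap; the result may cover unreachable IDs,
        # so it is marked approximate (exact=False).
        i = min(range(len(merged) - 1),
                key=lambda j: merged[j + 1][0] - merged[j][1])
        merged[i:i + 2] = [(merged[i][0], merged[i + 1][1], False)]
    return merged


def build_labels(graph, budget=2):
    """graph: dict node -> list of successors; must be acyclic."""
    post, tree_interval, order = {}, {}, []

    def visit(u):
        low = len(order) + 1  # smallest ID assigned inside u's subtree
        for v in graph.get(u, ()):
            if v not in post:
                visit(v)
        order.append(u)
        post[u] = len(order)
        tree_interval[u] = (low, post[u], True)

    for u in graph:
        if u not in post:
            visit(u)

    labels = {}
    for u in order:  # in a DAG, successors are finished before u
        intervals = [tree_interval[u]]
        for v in graph.get(u, ()):
            intervals.extend(labels[v])
        labels[u] = compress(intervals, budget)
    return post, labels


def reaches(graph, post, labels, u, v):
    if u == v:
        return True
    target = post[v]
    for lo, hi, exact in labels[u]:
        if lo <= target <= hi:
            if exact:
                return True
            # Approximate hit: confirm with a DFS guided by the labels.
            return any(reaches(graph, post, labels, w, v)
                       for w in graph.get(u, ()))
    return False  # outside every interval: definitely not reachable


# Hypothetical toy DAG.
GRAPH = {"r": ["x", "y"], "x": ["z"], "y": [], "z": [], "q": ["z", "y"]}
POST, LABELS = build_labels(GRAPH, budget=2)
```

In this toy graph, node `q` ends up with one approximate and one exact interval, so a query such as `reaches(GRAPH, POST, LABELS, "q", "x")` hits the approximate interval and is only rejected after the DFS step.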

Our Contribution
How to use FERRARI in RDF-3X:
◮ Query optimization
◮ A runtime technique to speed up query execution

QO: Getting the Logical Operator
A property path triple may correspond to:
◮ a filter, if one of subject or object is constant
  select * where { Munich yago:isLocatedIn* ?place }
◮ a scan, if one of subject or object is not bound
  select * where { ?city yago:isLocatedIn* ?place }
◮ a join, otherwise
  ◮ Reachability Join: similar to a Hash Join (build and probe parts)
  select * where { ?city yago:isLocatedIn* ?place. ?city hasName "Munich". ?place type ?type. }
In the last case, there is one more join opportunity (reflected in the query graph)
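The three cases can be read as a small decision procedure. The sketch below encodes one possible reading of the slide, not the RDF-3X optimizer: a constant endpoint gives a filter, an endpoint variable that no other triple pattern mentions gives a scan, and two variables that are both bound by other patterns give a join. The function names and the tuple representation of patterns are hypothetical.

```python
def is_variable(term):
    """SPARQL variables start with '?'; everything else is a constant."""
    return term.startswith("?")


def classify_path_pattern(subject, obj, other_patterns):
    """Pick the logical operator for `subject path* obj`.

    other_patterns: the remaining (s, p, o) triple patterns of the query.
    This encodes an assumed interpretation of the slide's three cases.
    """
    if not is_variable(subject) or not is_variable(obj):
        return "filter"
    bound_elsewhere = {
        term for pattern in other_patterns for term in pattern
        if is_variable(term)
    }
    if subject not in bound_elsewhere or obj not in bound_elsewhere:
        return "scan"
    return "join"


# The three example queries from the slide.
case_filter = classify_path_pattern("Munich", "?place", [])
case_scan = classify_path_pattern("?city", "?place", [])
case_join = classify_path_pattern(
    "?city", "?place",
    [("?city", "hasName", '"Munich"'), ("?place", "type", "?type")],
)
```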

QO: Plan Generation
To use dynamic programming, we extend the cost model:
◮ The estimated cardinality of the scan is provided immediately by the index
◮ Cardinality estimation for the join: independence assumption + index information

Runtime: A typical execution plan
select ?city ?p ?type where { ?city hasName "Munich". ?city hasPopulation ?p. ?city locatedIn*/type ?type. }
[Plan diagram: a reachability join ⋈R(?c, ?o) at the root, with a merge join ⋈MJ (c1 = c2) and an index scan of (?o, type, ?type) as inputs; the merge join combines PS and POS index scans over (?c1, name, Munich) and (?c2, population, ?p)]
◮ Individual triple patterns are very unselective
◮ We can pass gap information between different index scans, so that most of the data can be skipped (indirectly)
◮ (With some restrictions) this idea extends to Reachability Joins

Sideways Information Passing for Property Paths
Build phase: construct domain filters for the observed attribute values, using approximate intervals from FERRARI:
◮ min
◮ max
◮ Bloom filter (1024 bytes)
Probe phase: pass the Bloom filter to the right index scan, which can then skip values
Example for the reachability join ⋈RJ (x1 = x2), with hash function v mod 7:
FERRARI index
  ID  Intervals
  3   [1, 1]
  4   [8, 8], [9, 9]
Build side x1: 3, 4
Domain for ?o: min = 1, max = 9, Bloom = 011000
Probe side x2: 1, 3, 4, 6, 8, of which 3, 4 and 6 are crossed out (skipped)
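The slide's example can be replayed with a short sketch. The interval table, the build-side IDs, the probe-side values, and the hash function v mod 7 come from the slide; everything else (function names, the seven-slot bit list, enumerating every value of an interval) is an assumption made for illustration and not the RDF-3X implementation. The slide prints the bit vector as 011000; the sketch uses seven slots so that every residue of v mod 7 has a position.

```python
# Interval labels from the slide: node ID -> list of (low, high) intervals.
FERRARI_INTERVALS = {3: [(1, 1)], 4: [(8, 8), (9, 9)]}
NUM_BITS = 7  # assumption: one slot per residue of "v mod 7"


def build_domain_filter(build_ids, intervals, num_bits=NUM_BITS):
    """Build phase: summarize the intervals of all build-side nodes."""
    lows, highs = [], []
    bits = [0] * num_bits
    for node_id in build_ids:
        for low, high in intervals[node_id]:
            lows.append(low)
            highs.append(high)
            # Toy approach: hash every value covered by the interval.
            for value in range(low, high + 1):
                bits[value % num_bits] = 1
    return min(lows), max(highs), bits


def may_match(value, domain):
    """Probe phase: False means the scan can safely skip this value."""
    low, high, bits = domain
    return low <= value <= high and bits[value % len(bits)] == 1


DOMAIN = build_domain_filter([3, 4], FERRARI_INTERVALS)
probe_values = [1, 3, 4, 6, 8]
kept = [v for v in probe_values if may_match(v, DOMAIN)]
skipped = [v for v in probe_values if not may_match(v, DOMAIN)]
```

As with any Bloom filter, a value that passes the check may still be a false positive, so the join itself must verify reachability for the values that are kept.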

Choke points
How to formulate interesting queries to test property path support? What are the hard things?
◮ Choosing the right build part
◮ Comparing cardinalities of different property paths
◮ Comparing cardinalities of property paths vs. index scans
We suggested some queries and evaluated our solution (against Virtuoso)

Conclusions
We have:
◮ Support for property paths in RDF-3X
◮ A full-fledged system: query optimization, sideways information passing
◮ Choke points, queries, and an evaluation
Future work:
◮ Updates
