
Optimization of Regular Path Queries in Large Graphs. Nikolay Yakovets, in collaboration with Parke Godfrey and Jarek Gryz



  1. Optimization of Regular Path Queries in Large Graphs. Nikolay Yakovets, in collaboration with Parke Godfrey and Jarek Gryz

  2. Optimization of RPQs: scalable & efficient evaluation of regular path queries
     [Figure: topic overview with labels RPQs, Semantics, Evaluation, Linked Data, Implementation, Optimization, WAVEGUIDE, Plans, Costs]

  3. Graph Query Languages
     ‣ Adjacency Query: list all neighbours, find the k-neighbourhood of a node
     ‣ Pattern Matching Query: find all sub-graphs in a database that are isomorphic to a given query pattern graph
     ‣ Summarization Query: summarize or operate on query results, e.g. aggregation: avg(), min(), max(), etc.
     ‣ Reachability/Path Query: navigational query that deals with paths in a graph; tests whether nodes are reachable in a graph; paths of fixed or arbitrary lengths

  4. SPARQL - Query Language (covers adjacency, pattern matching, summarization)
     SPARQL Protocol and RDF Query Language (SPARQL)
     ‣ declarative, based on pattern matching
     ‣ graph patterns describe subgraphs of the queried RDF graphs
     ‣ those subgraphs that match a description yield a result
     Query: SELECT ?pop WHERE { :Oakville :population ?pop } (?pop is a variable; the WHERE block is the graph pattern)
     Graph: ny:nikolay foaf:name "Nikolay Yakovets" ; ny:nikolay foaf:based_near dbpedia:Oakville ; dbpedia:Oakville dp:population "182520"
     Result: ?pop = "182520"

  5. SPARQL Property Paths
     ‣ Part of the SPARQL 1.1 W3C recommendation
     ‣ Allow regular expressions to describe paths between nodes:
         p1 | p2   disjunction
         p1 / p2   concatenation
         p?        zero or one
         ^p        inverted path
         !iri      negated
         p+        one or more
         p*        Kleene star
     ‣ Useful in many application domains: social networks, biological, encyclopedic
     ‣ Convenient declarative mechanism to answer queries without prior knowledge of underlying data paths

  6. SPARQL Property Paths
     ‣ Example: DBPedia snippet, part of a LOD dataset
     ‣ Two datasets, English and Japanese, interlinked with OWL terms
     [Figure: graph G with nodes en:Gundam, en:Daiba, en:Tokyo, en:Japan, jp:ガンダム, jp:お台場, jp:東京, jp:関東地方, jp:本州, jp:日本, connected by :isLocatedIn and :sameAs edges]
     Q: select ?place { en:Gundam (:sameAs*/:isLocatedIn)+/:sameAs* ?place . }
     ‣ Query: Where is the Gundam statue located?
     ‣ Solution: Need to resolve equivalent data entities (:sameAs) and traverse the spatial hierarchy (:isLocatedIn) to fully utilize the richer spatial information in the Japanese dataset

  7. Formal Evaluation
     ‣ Property Paths in SPARQL are essentially Regular Path Queries (RPQs)
     ‣ RPQs were well studied before the advent of RDF and SPARQL
     ‣ Formal def.: Q = (x, L(r), y), where x and y are free variables and L(r) is a regular language
     ‣ Semantics of evaluation: [[Q]]_G, an evaluation of Q over graph database G, is a collection of pairs (s, t) such that there exists a path p in G between s and t that conforms to regex r, i.e. the path-induced string λ(p) ∈ L(r)
         the collection is a bag (allow duplicates), aka solution counting (∀), or a set (discard duplicates), aka existential semantics (∃)
         the path is simple or arbitrary
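The difference between bag (counting) and set (existential) semantics can be illustrated with a minimal Python sketch. The four-node DAG and the regex a+ are assumptions chosen for illustration, not taken from the slides; path enumeration terminates only because the toy graph is acyclic.

```python
from collections import Counter

# Hypothetical DAG where every edge carries label "a": 1->2, 1->3, 2->4, 3->4.
A_EDGES = {1: [2, 3], 2: [4], 3: [4], 4: []}

def paths_from(source):
    """Yield the end node of every non-empty path starting at source (DAG only)."""
    stack = [source]
    while stack:
        node = stack.pop()
        for nxt in A_EDGES[node]:
            yield nxt          # one conforming path for a+ ends here
            stack.append(nxt)  # keep extending that path

# Bag semantics: one entry per conforming path, so duplicates are kept.
bag = Counter((s, t) for s in A_EDGES for t in paths_from(s))

# Set semantics: a pair is reported once if at least one path exists.
as_set = set(bag)
```

Here the pair (1, 4) is reached by two distinct paths, so it is counted twice in `bag` but appears once in `as_set`.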

  8. Paths in SPARQL
     ‣ regular ∀, simple ∀: counting procedures are #P-complete on general graphs (Arenas et al., Losemann et al., 2013); tractable on DAGs, or with a restricted compatible regex
     ‣ simple ∃: evaluation of simple paths is NP-complete on general graphs (Mendelzon et al., 1987); tractable on DAGs, or with a restricted compatible regex
     ‣ regular ∃: SPARQL (W3C proposal for RDF query language) supports RPQs through SPARQL 1.1 property paths

  9. RPQ Evaluation
     [[Q]]_G: an evaluation of Q over graph database G, considering existential semantics on regular paths
     ‣ FA-based: use finite state machines in evaluation (Mendelzon et al., 1987)
     ‣ α-RA-based: use relational algebra extended with the alpha-operator, which computes transitive closure (Losemann et al., 2013)

  10. FA-based Evaluation
     Q: select ?place { en:Gundam (:sameAs*/:isLocatedIn)+/:sameAs* ?place . }
     1. From a parse tree, construct a query ε-NFA
     2. Minimize the query automaton, if necessary
     3. Construct a product P of the query and graph automata
     4. Check P for reachable accepting states to produce an answer to the query
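A minimal Python sketch of steps 3 and 4, exploring the product of a query automaton and a graph with breadth-first search. The automaton is hand-built rather than derived from a parse tree, and the edge layout of the toy graph is an assumption for illustration (the node names come from the Gundam example, but the edges are not read from the original figure).

```python
from collections import deque

# Toy graph as (subject, label, object) triples. The edge layout is assumed.
EDGES = [
    ("en:Gundam", "isLocatedIn", "en:Daiba"),
    ("en:Daiba", "sameAs", "jp:お台場"),
    ("jp:お台場", "isLocatedIn", "jp:東京"),
    ("en:Gundam", "sameAs", "jp:ガンダム"),
]

# Hand-built automaton for (sameAs*/isLocatedIn)+/sameAs*:
# q0 = no isLocatedIn edge seen yet, q1 = at least one seen (accepting).
DELTA = {
    ("q0", "sameAs"): {"q0"},
    ("q0", "isLocatedIn"): {"q1"},
    ("q1", "sameAs"): {"q1"},
    ("q1", "isLocatedIn"): {"q1"},
}
START, ACCEPTING = "q0", {"q1"}

def evaluate(source):
    """Return every node reachable from source along a path accepted by DELTA."""
    out = {}
    for s, label, o in EDGES:
        out.setdefault(s, []).append((label, o))
    seen = {(source, START)}      # visited states of the product automaton
    queue = deque(seen)
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in ACCEPTING:
            answers.add(node)
        for label, nxt in out.get(node, []):
            for nstate in DELTA.get((state, label), ()):
                if (nxt, nstate) not in seen:
                    seen.add((nxt, nstate))
                    queue.append((nxt, nstate))
    return answers
```

The product is never materialized in full: only the (node, state) pairs reachable from the source are visited, which is why the `seen` set doubles as the termination guard on cyclic graphs.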

  11. α-RA-based Evaluation
     Q: select ?place { en:Gundam (:sameAs*/:isLocatedIn)+/:sameAs* ?place . }
     ‣ Have SPRJU-RA extended with α
     ‣ α computes the least fixpoint, i.e. the transitive closure of a given relation
       [Formula: least-fixpoint definition of α]
     1. From a parse tree, construct an RA tree
     [Figure: Q parse tree → Q RA tree → favourite RDBMS]
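A minimal Python sketch of an α-style operator, assuming the textbook fixpoint T = R ∪ (T ⋈ R) for transitive closure; the exact formula on the slide is not available, so this is an illustration rather than the presentation's definition. The `LOCATED_IN` relation is hypothetical.

```python
def compose(r, s):
    """Join two binary relations: (x, z) whenever (x, y) is in r and (y, z) is in s."""
    index = {}
    for y, z in s:
        index.setdefault(y, set()).add(z)
    return {(x, z) for x, y in r for z in index.get(y, ())}

def alpha(r):
    """Transitive closure of r, computed as the least fixpoint of T = r ∪ (T ⋈ r)."""
    closure = set(r)
    while True:
        step = closure | compose(closure, r)
        if step == closure:   # nothing new was derived: fixpoint reached
            return closure
        closure = step

# Hypothetical containment chain, standing in for an :isLocatedIn relation.
LOCATED_IN = {("Daiba", "Tokyo"), ("Tokyo", "Kanto"), ("Kanto", "Honshu")}
CLOSURE = alpha(LOCATED_IN)
```

In a relational setting the same computation would typically be delegated to the database engine; the loop here only makes the fixpoint iteration explicit.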

  12. Comparing Approaches
     Th.: FA and α-RA are incomparable plan spaces
     Pf.: translation into Datalog; examine the induced sequence of joins
     e.g. (?x, (a/b)+, ?y)
       P_FA  = ((((a ⋈ b) ⋈ a) ⋈ b) ⋈ a)..
       P_αRA = (a ⋈ b) ⋈ (a ⋈ b) ⋈ (a ⋈ b)..
     P_αRA ∉ FA and P_FA ∉ α-RA, hence α-RA ⊈ FA and FA ⊈ α-RA
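The two join sequences for (?x, (a/b)+, ?y) can be sketched in Python as follows. Both functions are illustrative assumptions about how each plan shape might be executed over in-memory relations; they return the same answers but build different intermediate results.

```python
def join(r, s):
    """Join two binary relations: (x, z) whenever (x, y) is in r and (y, z) is in s."""
    index = {}
    for y, z in s:
        index.setdefault(y, set()).add(z)
    return {(x, z) for x, y in r for z in index.get(y, ())}

def fa_style(a, b):
    """Left-deep chain ((((a ⋈ b) ⋈ a) ⋈ b) ..): extend paths one edge at a time."""
    answers = set()           # pairs reached after a complete a/b repetition
    seen_after_a = set()      # pairs already expanded after an 'a' edge
    delta = set(a)
    while delta:
        seen_after_a |= delta
        after_b = join(delta, b) - answers
        answers |= after_b
        delta = join(after_b, a) - seen_after_a
    return answers

def alpha_ra_style(a, b):
    """(a ⋈ b) ⋈ (a ⋈ b) ..: compute a ⋈ b once, then close over that relation."""
    ab = join(a, b)
    closure = set(ab)
    while True:
        step = closure | join(closure, ab)
        if step == closure:
            return closure
        closure = step

# Hypothetical edge relations for labels a and b.
A = {(1, 2), (3, 4)}
B = {(2, 3), (4, 5)}
```

The FA-style plan joins with base relations a and b at every step, while the α-RA-style plan joins only with the precomputed a ⋈ b, which is the structural difference the theorem relies on.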

  13. WAVEGUIDE
     Goal: need to consider both FA and α-RA plan spaces
     Search is driven by a waveplan, which guides a number of wavefronts that iteratively explore the graph (guided, iterative graph search)
     [Figure: waveplan P_ab+ with wavefronts W_ab (transitions ·b, a·; seed U) and W_ab+ (transitions W_ab·; seed U)]

  14. wavefronts
     a wavefront W_l
     • an expanding search unit
     • guided by a wavefront automaton W_l = (l, S, q0, Q, δ, E, L, F), with label l, seed S, starting state q0, transition function δ, edge labels E, wavefront labels L, and accepting states F
     • labeled with the regex it evaluates
     • seeded with set of states
     a transition function δ
     • appending and prepending transitions
     • δ : Q × ((E ∪ L) × {·, ·} ∪ {ε}) → 2^Q, where the two markers distinguish appending from prepending
     • transitions over graphs and views (graph edges or wavefront labels)
     pipeline
     • edge incoming into accepting state in W_l
     a seed S
     • defined with an RPQ, a wavefront or by construction
     • can be universal (U), any node in a graph

  15. a waveplan P_Q
     • produces an answer to a given query Q
     • an ordered set of wavefront automata
     • order defines which labels can be used in the seed and transitions over a view
     • higher wavefronts can use lower wavefronts as their labels and seeds, but not vice-versa
     • query answered by the highest wavefront
     e.g., query (?x, (a/b)+, ?y)
     • W_ab produces an answer for the (a/b) regex
     • W_ab+ uses W_ab as a view to compute (a/b)+
     [Figure: waveplan P_ab+ as a set of wavefronts with ordering <_P_ab+; W_ab (transitions ·b, a·; seed U) below W_ab+ (transitions W_ab·; seed U)]

  16. WAVEGUIDE - iterative search
     Exploration procedure based on semi-naive evaluation
     Intermediate search results are kept in the search cache; the cache keeps track of end-nodes and corresponding states in a plan
     • seed specifies node pairs to start from
     loop while new tuples are discovered:
     • crank advances simultaneously in a graph and automaton
     • reduce prunes the delta, handles unbounded computation
     • cache materializes according to the specified strategy
     • extract produces answers
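The seed / crank / reduce / cache / extract loop can be approximated with a small Python sketch. This is a simplified illustration under stated assumptions, not the WAVEGUIDE implementation: the function `run_wavefront`, its parameters, and the tuple layout of the cache are hypothetical, only appending transitions are modelled, and the cache always materializes everything.

```python
def run_wavefront(delta, start, accepting, relations, seeds):
    """Semi-naive search for one wavefront.

    delta:     (state, label) -> set of next states (appending transitions only)
    relations: label -> set of (subject, object) pairs; a label may name a
               graph edge or the result of a lower wavefront used as a view
    seeds:     nodes to start from
    """
    cache = {(n, n, start) for n in seeds}    # (seed node, end node, state)
    frontier = set(cache)
    while frontier:                           # loop while new tuples are discovered
        cranked = set()
        for seed, node, state in frontier:    # crank: step in graph and automaton
            for (st, label), nexts in delta.items():
                if st != state:
                    continue
                for s, o in relations.get(label, ()):
                    if s == node:
                        cranked.update((seed, o, q) for q in nexts)
        frontier = cranked - cache            # reduce: keep only unseen tuples
        cache |= frontier                     # cache: materialize the delta
    # extract: answers are tuples that ended in an accepting state
    return {(seed, node) for seed, node, state in cache if state in accepting}

# Hypothetical graph and a two-wavefront plan for (?x, (a/b)+, ?y).
GRAPH = {"a": {(1, 2), (3, 4)}, "b": {(2, 3), (4, 5)}}
NODES = {1, 2, 3, 4, 5}                       # universal seed U

# Lower wavefront: evaluates a/b directly over graph edges.
W_AB = run_wavefront(
    {("q0", "a"): {"q1"}, ("q1", "b"): {"q2"}}, "q0", {"q2"}, GRAPH, NODES)

# Higher wavefront: iterates over the lower wavefront's result as a view.
W_AB_PLUS = run_wavefront(
    {("p0", "W_ab"): {"p1"}, ("p1", "W_ab"): {"p1"}}, "p0", {"p1"},
    {"W_ab": W_AB}, NODES)
```

Subtracting the cache from each newly cranked delta is what bounds the computation on cyclic graphs: a (seed, node, state) tuple is expanded at most once.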

  17. challenges!
     • plan space: size? enumerator?
     • cost model: analysis?
     • optimizations enabled by WAVEGUIDE?
     • efficient? optimal? vs. other techniques?

  18. WAVEGUIDE Plan Space WP
     • subsumes both FA and α-RA
     • adds exclusive new plans: α-RA ∪ FA ⊂ WP
     • e.g., (?x, (a/b/c)+, ?y)

  19. WAVEGUIDE Plan Space WP
     • subsumes both FA and α-RA
     • adds exclusive new plans: α-RA ∪ FA ⊂ WP
     • e.g., (?x, (a/b/c)+, ?y)
     [Figure: waveplan P_(abc)+ with a single wavefront W_(abc)+ (transitions a·, b·, c·; seed U)]

  20. WAVEGUIDE Plan Space WP
     • subsumes both FA and α-RA
     • adds exclusive new plans: α-RA ∪ FA ⊂ WP
     • e.g., (?x, (a/b/c)+, ?y)
     [Figure: waveplan P_(abc)+ marked α, with wavefronts W_abc (transitions a·, b·, c·; seed U) and W_(abc)+ (transitions W_abc·; seed U), shown next to an RA tree with selections σ_p=a, σ_p=b, σ_p=c over T and joins on o = s]

  21. WAVEGUIDE Plan Space WP
     • subsumes both FA and α-RA
     • adds exclusive new plans: α-RA ∪ FA ⊂ WP
     • e.g., (?x, (a/b/c)+, ?y)
     [Figure: waveplan P_(abc)+ with wavefronts W_bc (transitions ·b, c·; seed U) and W_(abc)+ (transitions a·, W_bc·; seed U)]
