SLIDE 1
The Tractability Frontier of Well-designed SPARQL Queries
Miguel Romero (University of Oxford)
ACM PODS 2018, 12 June, Houston-USA
SLIDE 2 Well-designed SPARQL
SPARQL: standard query language for RDF graphs
Well-designed SPARQL (Perez, Arenas, Gutierrez 2006)
- Evaluation is coNP-complete (PSPACE-complete for SPARQL)
This work:
- Well-designed SPARQL restricted to AND, OPTIONAL, UNION
SLIDE 3 Tractable evaluation
Evaluating well-designed SPARQL becomes tractable for some classes
- Most general condition: local tractability
(Letelier, Perez, Pichler, Skritek 2013; Barceló, Pichler, Skritek 2015)
Main Question: Which classes of well-designed SPARQL queries can be evaluated in polynomial time?
Our Contribution: The tractable classes are precisely those of bounded domination width
SLIDE 4
Well-designed Pattern Trees/Forests
Well-designed SPARQL queries with AND, OPTIONAL, UNION = Well-designed Pattern Forests
Well-designed SPARQL queries with AND, OPTIONAL = Well-designed Pattern Trees
(Letelier, Perez, Pichler, Skritek 2013)
In this talk: We focus on (well-designed) pattern forests
SLIDE 5
Basics of RDF graphs and pattern trees/forests
SLIDE 6 RDF Graphs
Fix: set of identifiers I, set of variables V
RDF Graph = finite set of triples (s, p, o) from I x I x I
SLIDE 7 Conjunctive Queries (CQs) over RDF graphs
Fix: set of identifiers I, set of variables V
Conjunctive query (CQ) = AND of triples from (I U V) x (I U V) x (I U V) + free variables
q(?y, ?z) = (?x, p, o) AND (?y, ?x, a) AND (o, ?z, ?y) AND (p, ?w, ?w)
Answer of a CQ q(X) over an RDF graph G: q(G) = {h|X : h is a homomorphism from q to G}
- Full CQ = All variables are free (no projection)
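The definition q(G) = {h|X : h is a homomorphism from q to G} can be tried out with a small brute-force sketch. The encoding is an assumption made for illustration only: triples are Python tuples, variables are strings starting with "?", and the graph G below is a made-up example. The search enumerates all assignments, so it is exponential in the number of variables.

```python
from itertools import product

def is_var(term):
    # Assumed convention: variables are strings that start with "?"
    return isinstance(term, str) and term.startswith("?")

def homomorphisms(triples, graph):
    """Yield every mapping h (variable -> identifier) that sends each triple of the CQ into graph."""
    variables = sorted({t for triple in triples for t in triple if is_var(t)})
    identifiers = sorted({x for triple in graph for x in triple})
    for values in product(identifiers, repeat=len(variables)):
        h = dict(zip(variables, values))
        # Apply h to variables, leave identifiers unchanged
        image = {tuple(h.get(t, t) for t in triple) for triple in triples}
        if image <= graph:
            yield h

def evaluate_cq(triples, free_vars, graph):
    """q(G): restrict every homomorphism to the free variables X (as tuples ordered like free_vars)."""
    return {tuple(h[x] for x in free_vars) for h in homomorphisms(triples, graph)}

# The CQ q(?y, ?z) from the slide, over a hypothetical RDF graph G
q = [("?x", "p", "o"), ("?y", "?x", "a"), ("o", "?z", "?y"), ("p", "?w", "?w")]
G = {("s", "p", "o"), ("b", "s", "a"), ("o", "c", "b"), ("p", "d", "d")}
print(evaluate_cq(q, ["?y", "?z"], G))  # prints {('b', 'c')}
```

Here ?x and ?w are projected away: the only homomorphism maps ?x to s, ?y to b, ?z to c and ?w to d.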
SLIDE 8 Well-designed Pattern Trees
Well-designed Pattern Tree = (T, pat), where T is a rooted tree and pat is a function
mapping each node of T to a full CQ such that
- For each variable ?x, the set {t in T | ?x in pat(t)} is connected in T
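The connectedness condition can be checked directly. The sketch below assumes a hypothetical encoding: `parent` maps each node to its parent (None for the root) and `pat` maps each node to a list of triples, with variables written as "?"-prefixed strings. It relies on the fact that a set of nodes in a rooted tree is connected exactly when one of its nodes has its parent outside the set.

```python
def is_var(term):
    # Assumed convention: variables are strings that start with "?"
    return isinstance(term, str) and term.startswith("?")

def variables(triples):
    return {t for triple in triples for t in triple if is_var(t)}

def is_well_designed(parent, pat):
    """Check that, for each variable, the nodes mentioning it form a connected part of the tree."""
    all_vars = set().union(*(variables(pat[t]) for t in pat))
    for x in all_vars:
        nodes = {t for t in pat if x in variables(pat[t])}
        # Topmost nodes of the induced forest: their parent is missing from the set
        tops = [t for t in nodes if parent[t] not in nodes]
        if len(tops) != 1:
            return False
    return True

# Hypothetical example trees over the same shape: root r with children t1 and t2
parent = {"r": None, "t1": "r", "t2": "r"}
good = {"r": [("?x", "p", "?y")], "t1": [("?y", "q", "?z")], "t2": [("?x", "q", "?w")]}
bad = {"r": [("?x", "p", "?y")], "t1": [("?z", "q", "a")], "t2": [("?z", "q", "b")]}
print(is_well_designed(parent, good), is_well_designed(parent, bad))
```

In `bad`, the variable ?z occurs in both children but not in the root, so its nodes are not connected.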
SLIDE 9
Well-designed Pattern Trees: semantics
P=(T, pat)
Subtree T’ of P = subtree of T containing the root
pat(T’) = AND of all the CQs in {pat(t) | t in T’}
SLIDE 10
Well-designed Pattern Trees: semantics
P=(T, pat)
Subtree T’ of P = subtree of T containing the root
pat(T’) = AND of all the CQs in {pat(t) | t in T’}
Child of T’ = node not in T’ whose parent is in T’
SLIDE 11
Well-designed Pattern Trees: semantics
P=(T, pat)
h is in P(G) iff there is a subtree T’ such that
- h is a homomorphism from pat(T’) to G
- for each child t of T’, h cannot be extended to a homomorphism from pat(T’) AND pat(t) to G
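The two conditions above translate into a direct membership test. This is a brute-force sketch, not the polynomial-time procedure of the talk: it enumerates every subtree containing the root. Assumptions made here: the tree is encoded by a hypothetical `parent` dictionary (None for the root) and a `pat` dictionary of triple lists, variables start with "?", and the domain of h is required to be exactly the variables of pat(T’).

```python
from itertools import combinations, product

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def variables(triples):
    return {t for triple in triples for t in triple if is_var(t)}

def is_homomorphism(h, triples, graph):
    return all(tuple(h.get(t, t) for t in triple) in graph for triple in triples)

def can_extend(h, triples, graph):
    """True if some extension of h is a homomorphism from triples to graph."""
    new_vars = sorted(variables(triples) - set(h))
    identifiers = sorted({x for triple in graph for x in triple})
    return any(
        is_homomorphism({**h, **dict(zip(new_vars, values))}, triples, graph)
        for values in product(identifiers, repeat=len(new_vars))
    )

def rooted_subtrees(parent):
    """Yield every node set that contains the root and is closed under taking parents."""
    root = next(t for t in parent if parent[t] is None)
    others = [t for t in parent if t != root]
    for size in range(len(others) + 1):
        for chosen in combinations(others, size):
            nodes = {root, *chosen}
            if all(parent[t] in nodes for t in chosen):
                yield nodes

def in_answer(h, parent, pat, graph):
    """Decide whether h is in P(G), following the two conditions of the semantics."""
    for nodes in rooted_subtrees(parent):
        pat_sub = [triple for t in nodes for triple in pat[t]]
        if variables(pat_sub) != set(h) or not is_homomorphism(h, pat_sub, graph):
            continue
        children = [t for t in parent if t not in nodes and parent[t] in nodes]
        if not any(can_extend(h, pat_sub + pat[t], graph) for t in children):
            return True
    return False

# Hypothetical example: the root asks for "knows", the optional child asks for "email"
parent = {"r": None, "t": "r"}
pat = {"r": [("?x", "knows", "?y")], "t": [("?y", "email", "?e")]}
G = {("a", "knows", "b"), ("b", "email", "m"), ("c", "knows", "d")}
print(in_answer({"?x": "a", "?y": "b"}, parent, pat, G))              # prints False
print(in_answer({"?x": "a", "?y": "b", "?e": "m"}, parent, pat, G))   # prints True
print(in_answer({"?x": "c", "?y": "d"}, parent, pat, G))              # prints True
```

The first mapping is rejected because it can be extended to the child t, which is exactly the behaviour of OPTIONAL.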
SLIDE 12
Well-designed Pattern Forests
Well-designed Pattern Forest = Union of well-designed pattern trees
Answer of F={P1,…,Pm} over RDF graph G: F(G) = P1(G) U … U Pm(G)
SLIDE 13
The Evaluation Problem
Let C be a class of well-designed pattern forests
EVAL(C)
Instance: well-designed pattern forest F in C, RDF graph G, mapping h
Question: does h belong to F(G)?
SLIDE 14
Domination width and main theorem
SLIDE 15 Main Theorem
Theorem:
Assume FPT ≠ W[1]. Let C be a recursively enumerable class of
well-designed pattern forests. Then the following are equivalent:
- EVAL(C) can be solved in polynomial time
- C has bounded domination width
Proof based on the corresponding characterisation for conjunctive queries
(Dalmau, Kolaitis, Vardi 2002; Grohe 2003)
Treewidth of a CQ = measure of tree-likeness
ctw(q(X)) := treewidth of the core of q(X)
SLIDE 16 The case of Conjunctive Queries
Theorem (Dalmau, Kolaitis, Vardi 2002; Grohe 2003)
Assume FPT ≠ W[1]. Let C be a recursively enumerable class of
conjunctive queries of bounded arity. Then the following are equivalent:
- CQ-EVAL(C) can be solved in polynomial time
- C has bounded ctw
Tractability part via the existential k-pebble game (Kolaitis, Vardi 1995)
- Relaxation for checking existence of homomorphisms (complete, but not correct in general)
- Existence of a winning strategy for the Duplicator can be decided in polynomial time
- Always correct for conjunctive queries q with ctw(q) < k
Hardness part via a reduction from the clique problem (W[1]-hardness)
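A common way to decide whether the Duplicator wins is to compute the largest family of partial homomorphisms on at most k variables that is closed under restrictions and has the "forth" (extension) property; the Duplicator wins iff that family is non-empty. The sketch below follows this idea under illustrative assumptions: triples are tuples, variables start with "?", and the function name `duplicator_wins` is hypothetical. For fixed k the family has polynomial size, but the code is written for clarity rather than speed.

```python
from itertools import combinations, product

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def variables(triples):
    return {t for triple in triples for t in triple if is_var(t)}

def duplicator_wins(triples, graph, k):
    """Existential k-pebble game from the CQ given by triples to graph."""
    all_vars = sorted(variables(triples))
    ids = sorted({x for triple in graph for x in triple})

    def is_partial_hom(f):
        # Only triples whose variables are all assigned by f are checked
        return all(
            tuple(f.get(t, t) for t in triple) in graph
            for triple in triples
            if variables([triple]) <= set(f)
        )

    # Start from all partial homomorphisms on at most k variables
    family = set()
    for size in range(min(k, len(all_vars)) + 1):
        for dom in combinations(all_vars, size):
            for values in product(ids, repeat=size):
                f = dict(zip(dom, values))
                if is_partial_hom(f):
                    family.add(frozenset(f.items()))

    # Remove maps until restriction closure and the forth property both hold
    changed = True
    while changed:
        changed = False
        for f in list(family):
            dom = {x for x, _ in f}
            keep = all(frozenset(p for p in f if p[0] != y) in family for y in dom)
            if keep and len(dom) < k:
                keep = all(
                    x in dom or any(f | {(x, a)} in family for a in ids)
                    for x in all_vars
                )
            if not keep:
                family.discard(f)
                changed = True
    return frozenset() in family

# A directed triangle query against a graph that is a directed 2-cycle: no homomorphism exists
triangle = [("?x", "e", "?y"), ("?y", "e", "?z"), ("?z", "e", "?x")]
two_cycle = {("a", "e", "b"), ("b", "e", "a")}
print(duplicator_wins(triangle, two_cycle, 2))  # prints True
print(duplicator_wins(triangle, two_cycle, 3))  # prints False
```

The triangle has treewidth 2, so the game with k = 2 gives a false positive, while k = 3 gives the right answer, in line with the condition ctw(q) < k.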
SLIDE 17 The case of Conjunctive Queries
Theorem (Dalmau, Kolaitis, Vardi 2002; Grohe 2003)
Assume FPT ≠ W[1]. Let C be a recursively enumerable class of
conjunctive queries of bounded arity. Then the following are equivalent:
- CQ-EVAL(C) can be solved in polynomial time
- C has bounded ctw
Can be extended to unions of CQs (UCQs) Q(X)={q1(X),…,qm(X)}
ctw(Q(X)) = minimum k such that for every qi(X), there is qj(X) such that
ctw(qj(X)) is at most k and
qj(X) can be mapped to qi(X) via a homomorphism
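The definition of ctw for a UCQ can be restated as a max-min expression, which may be easier to compute with. In the block below, the arrow notation for "can be mapped via a homomorphism" is an assumption introduced here, not notation from the slides.

```latex
% q_j(X) \to q_i(X) : there is a homomorphism from q_j(X) to q_i(X)  (assumed notation)
\[
  \mathrm{ctw}(Q(X)) \;=\; \max_{1 \le i \le m}\;
  \min \bigl\{\, \mathrm{ctw}(q_j(X)) \;:\; 1 \le j \le m,\ q_j(X) \to q_i(X) \,\bigr\}
\]
% The inner set is never empty because q_i(X) \to q_i(X) via the identity, hence
\[
  \mathrm{ctw}(Q(X)) \;\le\; \max_{1 \le i \le m} \mathrm{ctw}(q_i(X)).
\]
```

Each disjunct qi(X) may thus be "dominated" by a disjunct qj(X) of smaller ctw, which is the intuition behind the name domination width.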
SLIDE 18
Domination width
P=(T, pat)
h in P(G) ?
Is h a “potential solution”?
- can be checked in polynomial time
SLIDE 19
Domination width
P=(T, pat)
h in P(G) ?
Subtree T’ with children t1, …, ti, …, tn
X := vars(T’)
CQ qti(X) := (pat(T’) AND pat(ti))(X)
UCQ QT’(X) := {qt1(X),…,qtn(X)}
h is not in P(G) iff h is in QT’(G)
SLIDE 20
Domination width
P=(T, pat)
h in P(G) ?
Subtree T’ with children t1, …, ti, …, tn
X := vars(T’)
CQ qti(X) := (pat(T’) AND pat(ti))(X)
UCQ QT’(X) := {qt1(X),…,qtn(X)}
h is not in P(G) iff h is in QT’(G)
dw(P) := maximum ctw(QT’(X)), over all subtrees T’
SLIDE 21
Domination width
P=(T, pat)
CQ qti(X) := (pat(T’) AND pat(ti))(X)
dw(P) := maximum ctw(QT’(X)), over all subtrees T’
dw(P) < k: for every child ti of T’ there is a child tj of T’ such that
ctw(qtj(X)) < k and qtj(X) can be mapped to qti(X) via a homomorphism
SLIDE 22 Domination width
P=(T, pat)
CQ qti(X) := (pat(T’) AND pat(ti))(X)
dw(P) := maximum ctw(QT’(X)), over all subtrees T’
Given dw(P) < k: h in P(G) ?
SLIDE 23
Domination width
F={P1,…,Pm}
h in F(G) ?
[Figure: subtrees T’ and T’’ of trees in F, each mapped by h into the RDF graph G]
Combine T’ and T’’ into T’ AND T’’ (rename new variables)
SLIDE 24 Domination width
F={P1,…,Pm}
h in F(G) ?
X := vars(T’) = vars(T’’) = dom(h)
Q{T’,T’’}(X) := {pat(T’) AND pat(T’’) + choice of children (and renaming)}
h is not in F(G) iff h is in Q{T’,T’’}(G)
dw(F) := maximum ctw(QS(X)), over all sets S of subtrees
over the same set of variables X and satisfying a certain closure property
SLIDE 25 Main Theorem
Theorem:
Assume FPT ≠ W[1]. Let C be a recursively enumerable class of
well-designed pattern forests. Then the following are equivalent:
- EVAL(C) can be solved in polynomial time
- C has bounded domination width
Tractability part:
Application of the existential k-pebble game
as for the case of conjunctive queries (Dalmau, Kolaitis, Vardi 2002)
Hardness part:
Reduction from clique (Grohe 2003)
+ some basic properties of pattern forests with large dw
SLIDE 26
The case of UNION-free queries (pattern trees)
SLIDE 27
Branch Treewidth
P=(T, pat)
[Figure: pattern tree P with root r, a node t with pattern pat(t), and the branch Bt of t]
Branch Bt of t
SLIDE 28
Branch Treewidth
P=(T, pat)
Branch Bt of t
X := vars(Bt)
CQ bt(X) := (pat(Bt) AND pat(t))(X)
bw(P) := maximum ctw(bt(X)) over all nodes t of T
Proposition:
For every well-designed pattern tree P, we have dw(P)=bw(P)
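The branch CQs bt(X) can be assembled mechanically from the tree. The sketch below rests on an assumption the slides leave to the figure: the branch Bt of t is read as the path from the root r down to the parent of t. The `parent` and `pat` dictionaries and the "?"-prefix for variables are likewise illustrative conventions. Computing ctw(bt(X)) itself (core plus treewidth) is not attempted here.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def variables(triples):
    return {t for triple in triples for t in triple if is_var(t)}

def branch(parent, t):
    """Nodes from the root down to the parent of t (assumed reading of 'Branch Bt of t')."""
    path = []
    node = parent[t]
    while node is not None:
        path.append(node)
        node = parent[node]
    return list(reversed(path))

def branch_cq(parent, pat, t):
    """Return (triples, free_vars) for bt(X) := (pat(Bt) AND pat(t))(X) with X := vars(Bt)."""
    branch_triples = [triple for s in branch(parent, t) for triple in pat[s]]
    free_vars = sorted(variables(branch_triples))
    return branch_triples + pat[t], free_vars

# Hypothetical pattern tree that is a single path r - t1 - t2
parent = {"r": None, "t1": "r", "t2": "t1"}
pat = {
    "r": [("?x", "p", "?y")],
    "t1": [("?y", "q", "?z")],
    "t2": [("?z", "s", "?w")],
}
print(branch(parent, "t2"))          # prints ['r', 't1']
print(branch_cq(parent, pat, "t2"))
```

For t2 the free variables are ?x, ?y and ?z, while ?w, which occurs only in pat(t2), is existentially quantified.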
SLIDE 29 Final Remarks
Characterisation of tractable classes of pattern forests
(well-designed SPARQL restricted to AND, OPTIONAL, UNION)
- Dichotomy: A class C is either tractable or W[1]-hard
The {AND, OPTIONAL, UNION} fragment is maximal with this property:
- Dichotomy fails when we add FILTER (CQs with inequalities) and
SELECT (Kroll, Pichler, Skritek 2016)
Open problem: Characterise fixed-parameter tractable classes of queries with SELECT
- Running time f(|q|) · |G|^c
(Recent characterisation for simple queries, Mengel, Skritek 2018)
Thank you!