SLIDE 1 Graph Database Querying vs String Constraints
Pablo Barceló
- Millennium Institute for Foundational Research on Data &
DCC, University of Chile
SLIDE 2
INTRODUCTION
SLIDE 3 Graph DBs and applications
- Graph DBs are crucial when topology is as important as data itself
SLIDE 4 Graph DBs and applications
- Graph DBs are crucial when topology is as important as data itself
- They have gained renewed interest in recent years due to trendy applications:
◮ Web (semantic) ◮ Social networks ◮ Chemical and biological networks ◮ Software bug localization ◮ . . .
SLIDE 5 Graph DBs and applications
- Graph DBs are crucial when topology is as important as data itself
- They have gained renewed interest in recent years due to trendy applications:
◮ Web (semantic) ◮ Social networks ◮ Chemical and biological networks ◮ Software bug localization ◮ . . .
- They are an active area of research and industrial application:
◮ Amazon Neptune, Neo4J, Facebook GraphQL, Google Knowledge
Graph, Oracle Graph DBMS, RDF Virtuoso, Apache Jena, ...
SLIDE 6 Features of the query languages we study
Languages we study express essential features for querying graph DBs
◮ Navigation: Recursively traverse the edges of the graph ◮ Pattern matching: Check if a pattern appears in the graph DB ◮ Path comparisons: Based on relations over words
SLIDE 7 Features of the query languages we study
Languages we study express essential features for querying graph DBs
◮ Navigation: Recursively traverse the edges of the graph ◮ Pattern matching: Check if a pattern appears in the graph DB ◮ Path comparisons: Based on relations over words
Some of these features form the basis of recently formalized graph DB query languages:
◮ LDBC Proposal: G-CORE: A Core for Future Graph Query
Languages (SIGMOD’18)
◮ Neo4J Proposal: Cypher: An Evolving Query Language for Property
Graphs (SIGMOD’18)
◮ Survey: Foundations of Modern Query Languages for Graph
Databases (ACM Comput. Surv.’17)
SLIDE 8
Problems we study:
Expressiveness: What can be said in a query language L?
SLIDE 9 Problems we study:
Expressiveness: What can be said in a query language L?
Complexity of evaluation: We study the problem:
Problem: Eval(L)
Input: A graph DB G, a tuple t̄ of objects, an L-query Q.
Question: Is t̄ ∈ Q(G)?
◮ Combined complexity: Both G and Q are part of the input. ◮ Data complexity: Only G is part of the input and Q is fixed.
SLIDE 10
THE GRAPH DATA MODEL
SLIDE 11 Graph data model
Different apps have given rise to a myriad of different graph DB models
(see the survey by Angles and Gutiérrez (2008))
SLIDE 12 Graph data model
Different apps have given rise to a myriad of different graph DB models
(see the survey by Angles and Gutiérrez (2008))
We work with a simple graph data model: finite, directed, edge-labeled graphs
SLIDE 13 Graph data model
Different apps have given rise to a myriad of different graph DB models
(see the survey by Angles and Gutiérrez (2008))
We work with a simple graph data model: finite, directed, edge-labeled graphs
Despite the simplicity of the model:
◮ It is flexible enough to accommodate many other more complex
models and express interesting practical scenarios
◮ The most fundamental theoretical issues related to querying graph
DBs appear in full force for it
SLIDE 14 Graph databases
Definition
A graph DB G over a finite alphabet Σ is a pair (V, E), where:
◮ V is a finite set of node ids
◮ E is a set of edges of the form v1 −[a]→ v2 (v1, v2 ∈ V, a ∈ Σ)
SLIDE 15 Graph databases
Definition
A graph DB G over a finite alphabet Σ is a pair (V, E), where:
◮ V is a finite set of node ids
◮ E is a set of edges of the form v1 −[a]→ v2 (v1, v2 ∈ V, a ∈ Σ)
- A path in G is a sequence of the form:
ρ = v1 −[a1]→ v2 −[a2]→ v3 · · · vk −[ak]→ vk+1
- The label of ρ, denoted λ(ρ), is the string a1a2 · · · ak ∈ Σ∗
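The definitions above can be made concrete with a small sketch. It assumes a graph DB is stored as a set of (source, label, target) triples and a path as a list of consecutive edges; the names `Edge` and `path_label` and the toy graph are illustrative only and do not come from the talk.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str, str]  # (source node, label, target node)

def path_label(edges: Set[Edge], path: List[Edge]) -> str:
    """Return the label of a path given as a list of consecutive edges."""
    for i, edge in enumerate(path):
        if edge not in edges:
            raise ValueError(f"{edge} is not an edge of the graph")
        # Each edge must start where the previous one ended.
        if i > 0 and path[i - 1][2] != edge[0]:
            raise ValueError("consecutive edges do not share a node")
    return "".join(label for _, label, _ in path)

# Toy graph DB over the alphabet {a, b} (hypothetical data).
E = {("v1", "a", "v2"), ("v2", "b", "v3"), ("v3", "a", "v1")}
rho = [("v1", "a", "v2"), ("v2", "b", "v3"), ("v3", "a", "v1")]
label = path_label(E, rho)
```

A path with k edges yields a label of length k; the empty path has the empty word as its label.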
SLIDE 16 Graph DBs: Example
A graph DB representation of a fragment of DBLP
[Figure: graph DB for a DBLP fragment. Nodes: authors (:Ronald Fagin, :John E. Hopcroft, :Robert E Tarjan, :Jeffrey Ullman, :Moshe Y. Vardi, :Leonid Libkin, :Limsoon Wong), papers (Pods:FaginUV83, Pods:Ullman89, Pods:Libkin95, Pods:Vardi95, Focs:HopU67a, Jacm:HopcroftT74, IPL:LibkinW95), proceedings (inPods:83, inPods:89, inPods:95, inFocs:FOCS8) and venues (conf:pods, conf:focs, journal:jacm, journal:IPL). Edge labels: creator, partOf, series, journal.]
SLIDE 17 Graph DBs: Example
A path in this graph DB
[Figure: the DBLP graph DB with one path highlighted.]
SLIDE 18 Graph DBs: Example
The label of such path
[Figure: the DBLP graph DB with the label of the highlighted path marked.]
SLIDE 19 Graph DBs vs NFAs
Important: Graph DBs can be naturally seen as NFAs.
◮ Nodes are states
◮ Edges u −[a]→ v are transitions
◮ There are no initial and final states
SLIDE 20
BASIC LANGUAGES FOR GRAPH DBs:
Tractability for a big class of languages
SLIDE 21 Regular path queries
Basic building block for graph queries: Regular path queries (RPQs)
◮ First studied by Mendelzon and Wood (1989) ◮ RPQs = Regular expressions over Σ ◮ Evaluation L(G) of RPQ L on graph DB G = (V , E):
- Pairs of nodes (v, v′) in V linked by a path whose label is in L
SLIDE 22 RPQs with inverse
More often studied is their extension with inverses, the 2RPQs
◮ First studied by Calvanese, de Giacomo, Lenzerini, Vardi (2000) ◮ 2RPQs = RPQs over Σ±, where:
- Σ± = Σ extended with the inverse a− of each a ∈ Σ
SLIDE 23 RPQs with inverse
More often studied is their extension with inverses, the 2RPQs
◮ First studied by Calvanese, de Giacomo, Lenzerini, Vardi (2000) ◮ 2RPQs = RPQs over Σ±, where:
- Σ± = Σ extended with the inverse a− of each a ∈ Σ
Evaluation L(G) of 2RPQ L over graph DB G = (V, E):
◮ Pairs of nodes in G that satisfy RPQ L(G±), where
- G± is obtained from G by adding u −[a−]→ v for each v −[a]→ u ∈ E
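As a minimal sketch of the construction of G±, assume edges are (source, label, target) triples and that the inverse of a label a is written by appending "-" to it. Both conventions, the helper name `with_inverses`, and the toy edges are assumptions made for illustration.

```python
def with_inverses(edges):
    """Return the edge set of G±: for each v -a-> u in E, also add u -(a-)-> v."""
    inverse_edges = {(u, a + "-", v) for (v, a, u) in edges}
    return set(edges) | inverse_edges

# Hypothetical two-edge graph DB.
E = {("paper1", "creator", "alice"), ("paper1", "partOf", "proc1")}
E_pm = with_inverses(E)
```

The construction at most doubles the number of edges, so |G±| stays linear in |G|.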
SLIDE 24 Example of 2RPQ
The 2RPQ
creator− · ((partOf · series) ∪ journal)
computes pairs (a, v) such that author a published in conference or journal v
[Figure: the DBLP graph DB.]
SLIDE 25 Example of 2RPQ
The 2RPQ
creator− · ((partOf · series) ∪ journal)
computes pairs (a, v) such that author a published in conference or journal v
[Figure: the DBLP graph DB with one answer highlighted: a = :Ronald Fagin, v = conf:pods.]
SLIDE 26 Example of 2RPQ
Example: The 2RPQ
creator− · ((partOf · series) ∪ journal)
computes pairs (a, v) such that author a published in conference or journal v
[Figure: the DBLP graph DB with one answer highlighted: a = :Robert E Tarjan, v = journal:jacm.]
SLIDE 27
2RPQ evaluation
Problem: Eval(2RPQ)
Input: A graph DB G, nodes v, v′ in G, a 2RPQ L
Question: Is (v, v′) ∈ L(G)?
SLIDE 28
2RPQ evaluation
Problem: Eval(2RPQ)
Input: A graph DB G, nodes v, v′ in G, a 2RPQ L
Question: Is (v, v′) ∈ L(G)?
It boils down to:
Problem: RegularPath
Input: A graph DB G, nodes v, v′ in G, a regular expression L over Σ±
Question: Is there a path ρ from v to v′ in G± such that λ(ρ) ∈ L?
SLIDE 29
Complexity of finding regular paths
Theorem (Folklore)
RegularPath can be solved in time O(|G| · |L|)
SLIDE 30 Complexity of finding regular paths
Theorem (Folklore)
RegularPath can be solved in time O(|G| · |L|) Proof idea:
◮ Compute in linear time from L an equivalent NFA A ◮ Compute in linear time (G±, v, v ′): NFA obtained from G± by
setting v and v ′ as initial and final states, respectively
◮ Then (v, v ′) ∈ L(G) iff NFA (G±, v, v ′) × A is nonempty ◮ The latter can be checked in time O(|G±| · |A|) = O(|G| · |L|)
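The proof idea can be sketched as a breadth-first search over the product of the graph and the automaton. This is an illustration under stated assumptions, not the talk's implementation: the NFA is passed in already built (epsilon-free, one initial state), the translation from the regular expression is skipped, and the function name, the inverse-label convention "creator-", and the toy data are all hypothetical.

```python
from collections import deque

def regular_path(edges, nfa_delta, nfa_init, nfa_final, v, v_prime):
    """Decide whether some path from v to v_prime has a label accepted by the NFA.

    edges: (source, label, target) triples of G± (inverse edges already added).
    nfa_delta: dict mapping (state, label) to a set of successor states.
    """
    out = {}
    for (s, a, t) in edges:
        out.setdefault(s, []).append((a, t))
    start = (v, nfa_init)
    seen = {start}
    queue = deque([start])
    while queue:
        node, state = queue.popleft()
        # A product state is accepting when the node is v' and the NFA state is final.
        if node == v_prime and state in nfa_final:
            return True
        for (a, target) in out.get(node, []):
            for next_state in nfa_delta.get((state, a), ()):
                pair = (target, next_state)
                if pair not in seen:
                    seen.add(pair)
                    queue.append(pair)
    return False

# Hand-built NFA for creator− · ((partOf · series) ∪ journal).
delta = {
    ("q0", "creator-"): {"q1"},
    ("q1", "partOf"): {"q2"},
    ("q2", "series"): {"q3"},
    ("q1", "journal"): {"q3"},
}
# Toy G± (hypothetical nodes), including the inverse creator edge.
G_pm = {
    ("paper1", "creator", "alice"),
    ("alice", "creator-", "paper1"),
    ("paper1", "partOf", "proc1"),
    ("proc1", "series", "conf1"),
}
found = regular_path(G_pm, delta, "q0", {"q3"}, "alice", "conf1")
```

Each product state (node, NFA state) is enqueued at most once, which is where the O(|G| · |L|) flavor of the bound comes from in this sketch.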
SLIDE 35
Complexity of 2RPQ evaluation
Corollary
Eval(2RPQ) can be solved in linear time O(|G| · |L|)
SLIDE 36 Data complexity of 2RPQ evaluation
Data complexity of 2RPQs belongs to a parallelizable class:
Proposition
Let L be a fixed 2RPQ. There is an NLogspace procedure that computes L(G) for each G
Proof idea:
◮ Construct (G±, v, v′) from G in Logspace
◮ Check nonemptiness of (G±, v, v′) × A in NLogspace, where A is an NFA equivalent to L
SLIDE 37 Conjunctive regular path queries (CRPQs)
RPQs still do not express arbitrary patterns over graph DBs.
◮ To do this we need to close RPQs under joins and projection
SLIDE 38 Conjunctive regular path queries (CRPQs)
RPQs still do not express arbitrary patterns over graph DBs.
◮ To do this we need to close RPQs under joins and projection
This is the class of conjunctive regular path queries (CRPQs).
◮ Extended with inverses as C2RPQs in [Calvanese et al. (2000)]
SLIDE 39 Example of C2RPQ
The C2RPQ
Ans(x, u) ← (x, creator−, y), (y, partOf · series, z), (y, creator, u)
computes pairs (a1, a2) that are coauthors of a conference paper
[Figure: the DBLP graph DB.]
SLIDE 40 Example of C2RPQ
The C2RPQ
Ans(x, u) ← (x, creator−, y), (y, partOf · series, z), (y, creator, u)
computes pairs (a1, a2) that are coauthors of a conference paper
[Figure: the DBLP graph DB with a match for the variables x, y, z, u highlighted.]
SLIDE 41 Example of C2RPQ
The C2RPQ
Ans(x, u) ← (x, creator−, y), (y, partOf · series, z), (y, creator, u)
computes pairs (a1, a2) that are coauthors of a conference paper
[Figure: the DBLP graph DB with one answer (a1, a2) highlighted.]
SLIDE 42 C2RPQ: Formal definition
C2RPQ over Σ: Rule of the form
Ans(z̄) ← (x1, L1, y1), . . . , (xm, Lm, ym),
such that
◮ the xi, yi are variables,
◮ each Li is a 2RPQ over Σ,
◮ the output z̄ has some variables among the xi, yi's
SLIDE 43 C2RPQ: Formal definition
C2RPQ over Σ: Rule of the form
Ans(z̄) ← (x1, L1, y1), . . . , (xm, Lm, ym),
such that
◮ the xi, yi are variables,
◮ each Li is a 2RPQ over Σ,
◮ the output z̄ has some variables among the xi, yi's
CRPQ: C2RPQ without inverse
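The definition can be read as "join the atoms, then project onto the output variables". The sketch below assumes the binary relation Li(G) of every atom has already been computed and simply tries all assignments of nodes to variables. The function name and the toy data are hypothetical, and the brute-force loop is chosen for clarity, not efficiency.

```python
from itertools import product

def eval_c2rpq(nodes, atoms, output_vars):
    """Evaluate Ans(output_vars) <- atoms by enumerating variable assignments.

    atoms: list of (x, pairs, y), where pairs is the precomputed set L_i(G).
    """
    variables = sorted({x for x, _, _ in atoms} | {y for _, _, y in atoms})
    answers = set()
    for values in product(nodes, repeat=len(variables)):
        assignment = dict(zip(variables, values))
        # Keep the assignment only if every atom is satisfied (the join).
        if all((assignment[x], assignment[y]) in pairs for x, pairs, y in atoms):
            answers.add(tuple(assignment[z] for z in output_vars))  # projection
    return answers

# Coauthor query Ans(x, u) <- (x, creator−, y), (y, partOf·series, z), (y, creator, u)
nodes = {"alice", "bob", "paper1", "conf1"}
creator_inv = {("alice", "paper1"), ("bob", "paper1")}
in_conference = {("paper1", "conf1")}
creator = {("paper1", "alice"), ("paper1", "bob")}
atoms = [("x", creator_inv, "y"), ("y", in_conference, "z"), ("y", creator, "u")]
coauthors = eval_c2rpq(nodes, atoms, ["x", "u"])
```

The loop runs over |V|^k assignments for k variables: polynomial when the query is fixed, exponential when the query is part of the input.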
SLIDE 44
Complexity of evaluation of C2RPQs
Increase in expressiveness from RPQs has a cost in evaluation
Proposition
Eval(C2RPQ) is NP-complete, even if restricted to CRPQs
SLIDE 45
Complexity of evaluation of C2RPQs
Increase in expressiveness from RPQs has a cost in evaluation
Proposition
Eval(C2RPQ) is NP-complete, even if restricted to CRPQs
But adding conjunctions is free in data complexity
Proposition
Eval(C2RPQ) can be solved in NLogspace in data complexity
SLIDE 46
PATH QUERIES:
The power of comparisons
SLIDE 47 CRPQs and path queries
CRPQs fall short of expressive power for applications that need:
◮ to include paths in the output of a query, and ◮ to define complex relationships among labels of paths
SLIDE 48 CRPQs and path queries
CRPQs fall short of expressive power for applications that need:
◮ to include paths in the output of a query, and ◮ to define complex relationships among labels of paths
Examples:
◮ Semantic Web queries:
- establish semantic associations among paths
◮ Biological applications:
- compare paths based on similarity
◮ Route-finding applications:
- compare paths based on length or number of occurrences of labels
◮ Data provenance and semantic search over the Web:
- require returning paths to the user
SLIDE 49 Path comparisons
We use a set S of relations on words.
◮ Example: S may contain
- Unary relations: Regular, context-free languages, etc.
- Binary relations: prefix, equal length, subsequence, etc.
◮ Comparisons among labels of paths = Membership in some S ∈ S
- Example: w1 is a substring of w2
◮ We assume S contains all regular languages
SLIDE 50 Extended CRPQs
The S-extended CRPQs (ECRPQ(S)) are rules obtained from a CRPQ:
Ans(z̄) ← (x1, L1, y1), . . . , (xm, Lm, ym),
◮ by joining each pair (xi, yi) with a path variable πi,
◮ comparing labels of paths in π̄j wrt Sj ∈ S, for π̄j a tuple of path variables among the πi's,
◮ projecting some of the πi's as a tuple χ̄ in the output
SLIDE 51 Extended CRPQs
The S-extended CRPQs (ECRPQ(S)) are rules obtained from a CRPQ:
Ans(z̄) ← (x1, π1, y1), . . . , (xm, πm, ym),
◮ by joining each pair (xi, yi) with a path variable πi,
◮ comparing labels of paths in π̄j wrt Sj ∈ S, for π̄j a tuple of path variables among the πi's,
◮ projecting some of the πi's as a tuple χ̄ in the output
SLIDE 52 Extended CRPQs
The S-extended CRPQs (ECRPQ(S)) are rules obtained from a CRPQ:
Ans(z̄) ← (x1, π1, y1), . . . , (xm, πm, ym), ⋀_{1≤j≤t} Sj(π̄j)
◮ by joining each pair (xi, yi) with a path variable πi,
◮ comparing labels of paths in π̄j wrt Sj ∈ S, for π̄j a tuple of path variables among the πi's,
◮ projecting some of the πi's as a tuple χ̄ in the output
SLIDE 53 Extended CRPQs
The S-extended CRPQs (ECRPQ(S)) are rules obtained from a CRPQ:
Ans(z̄, χ̄) ← (x1, π1, y1), . . . , (xm, πm, ym), ⋀_{1≤j≤t} Sj(π̄j)
◮ by joining each pair (xi, yi) with a path variable πi,
◮ comparing labels of paths in π̄j wrt Sj ∈ S, for π̄j a tuple of path variables among the πi's,
◮ projecting some of the πi's as a tuple χ̄ in the output
SLIDE 54 Extended CRPQs and our requirements
ECRPQs meet our requirements:
Ans(z̄, χ̄) ← (x1, π1, y1), . . . , (xm, πm, ym), ⋀_{1≤j≤t} Sj(π̄j)
SLIDE 55 Extended CRPQs and our requirements
ECRPQs meet our requirements:
Ans(z̄, χ̄) ← (x1, π1, y1), . . . , (xm, πm, ym), ⋀_{1≤j≤t} Sj(π̄j)
◮ They allow paths to be exported in the output
◮ They allow labels of paths to be compared with relations Sj ∈ S
SLIDE 57 Considerations about ECRPQ(S)
- ECRPQ(S) extends the class of CRPQs
◮ Ans(z̄) ← ⋀_i (xi, Li, yi) = Ans(z̄) ← ⋀_i (xi, πi, yi), Li(πi)
- Expressiveness and complexity of ECRPQ(S):
◮ Depends on the class S
- We study two such classes with roots in formal language theory:
◮ Regular relations [Elgot, Mezei (1965)] ◮ Rational relations [Nivat (1968)]
SLIDE 58
COMPARING PATHS WITH REGULAR RELATIONS:
Preserving tractable data complexity
SLIDE 59 Introduction
- Regular relations: Regular languages for relations of any arity
◮ REG: Class of regular relations
ECRPQ(REG): Reasonable expressiveness and complexity
SLIDE 60
Regular relations
n-ary regular relation: Set of n-tuples (w1, . . . , wn) of strings accepted by a synchronous automaton over Σ^n
SLIDE 61 Regular relations
n-ary regular relation: Set of n-tuples (w1, . . . , wn) of strings accepted by a synchronous automaton over Σ^n
◮ The input strings are written on the n tapes
◮ Shorter strings are padded with the symbol ⊥
◮ At each step: the automaton simultaneously reads the next symbol on each tape
SLIDE 62
Synchronous automata
w1 = a a b · · · a b c
w2 = a b a · · · a
w3 = b b · · ·
. . .
wn = a b b · · · a c
SLIDE 63
Synchronous automata
w1 = a a b · · · a b c
w2 = a b a · · · a ⊥ ⊥
w3 = b b ⊥ · · · ⊥ ⊥ ⊥
. . .
wn = a b b · · · a c ⊥
SLIDE 64
Synchronous automata
w1 = a a b · · · a b c
w2 = a b a · · · a ⊥ ⊥
w3 = b b ⊥ · · · ⊥ ⊥ ⊥
. . .
wn = a b b · · · a c ⊥
⇑ (the reading head, which moves one column to the right at each step)
SLIDE 65
Synchronous automata
w1 = a a b · · · a b c w2 = a b a · · · a ⊥ ⊥ w3 = b b ⊥ · · · ⊥ ⊥ ⊥ . . . . . . wn = a b b · · · a c ⊥ ⇑
SLIDE 66
Synchronous automata
w1 = a a b · · · a b c w2 = a b a · · · a ⊥ ⊥ w3 = b b ⊥ · · · ⊥ ⊥ ⊥ . . . . . . wn = a b b · · · a c ⊥ ⇑
SLIDE 67
Synchronous automata
w1 = a a b · · · a b c w2 = a b a · · · a ⊥ ⊥ w3 = b b ⊥ · · · ⊥ ⊥ ⊥ . . . . . . wn = a b b · · · a c ⊥ ⇑
SLIDE 68
Synchronous automata
w1 = a a b · · · a b c w2 = a b a · · · a ⊥ ⊥ w3 = b b ⊥ · · · ⊥ ⊥ ⊥ . . . . . . wn = a b b · · · a c ⊥ ⇑
SLIDE 69
Synchronous automata
w1 = a a b · · · a b c w2 = a b a · · · a ⊥ ⊥ w3 = b b ⊥ · · · ⊥ ⊥ ⊥ . . . . . . wn = a b b · · · a c ⊥ ⇑
SLIDE 70 Examples of regular relations
- All regular languages
- The prefix relation defined by: (⋃_{a∈Σ} (a, a))∗ · (⋃_{a∈Σ} (a, ⊥))∗
- The equal length relation defined by: (⋃_{a,b∈Σ} (a, b))∗
- Pairs of strings at edit distance at most k, for fixed k ≥ 0
SLIDE 71 Examples of regular relations
- All regular languages
- The prefix relation defined by: (⋃_{a∈Σ} (a, a))∗ · (⋃_{a∈Σ} (a, ⊥))∗
- The equal length relation defined by: (⋃_{a,b∈Σ} (a, b))∗
- Pairs of strings at edit distance at most k, for fixed k ≥ 0
Proposition
The subsequence, subword and suffix relations are not regular
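A rough sketch of how the padded, column-by-column reading works for the prefix and equal length examples above. The helper names and the choice of "⊥" as a Python string are assumptions; `accepts_prefix` follows the expression exactly as written on the slide, under which the second word is a prefix of the first.

```python
from itertools import zip_longest

PAD = "⊥"  # padding symbol, assumed not to occur in the alphabet

def convolution(*words):
    """Pad shorter words with PAD and return the list of columns read by the automaton."""
    return list(zip_longest(*words, fillvalue=PAD))

def accepts_prefix(w1, w2):
    """Two-state synchronous automaton for (a, a)* · (a, ⊥)*."""
    state = "same"
    for (a, b) in convolution(w1, w2):
        if state == "same" and a == b:
            continue  # still in the (a, a)* part
        if b == PAD and a != PAD:
            state = "tail"  # second word is exhausted: the (a, ⊥)* part
            continue
        return False
    return True

def accepts_equal_length(w1, w2):
    """One-state automaton for (a, b)*: no column may contain the padding symbol."""
    return all(PAD not in column for column in convolution(w1, w2))

columns = convolution("abc", "ab")
```

Because both heads advance in lockstep, the automaton only ever compares symbols at the same position, which is the intuition behind why relations such as suffix fall outside this class.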
SLIDE 72 ECRPQ(REG)
ECRPQ(REG): Class of queries of the form
Ans(z̄, χ̄) ← ⋀_i (xi, πi, yi), ⋀_j Sj(π̄j),
where each Sj is a regular relation [B., Libkin, Lin, Wood (2012)]
SLIDE 73 ECRPQ(REG)
ECRPQ(REG): Class of queries of the form
Ans(z̄, χ̄) ← ⋀_i (xi, πi, yi), ⋀_j Sj(π̄j),
where each Sj is a regular relation [B., Libkin, Lin, Wood (2012)]
Example: The ECRPQ(REG) query
Ans(x, y) ← (x, π1, z), (z, π2, y), a∗(π1), b∗(π2), equal_length(π1, π2)
computes pairs of nodes linked by a path labeled in {a^n b^n | n ≥ 0}
SLIDE 74 ECRPQ(REG)
ECRPQ(REG): Class of queries of the form
Ans(z̄, χ̄) ← ⋀_i (xi, πi, yi), ⋀_j Sj(π̄j),
where each Sj is a regular relation [B., Libkin, Lin, Wood (2012)]
Example: The ECRPQ(REG) query
Ans(x, y) ← (x, π1, z), (z, π2, y), a∗(π1), b∗(π2), equal_length(π1, π2)
computes pairs of nodes linked by a path labeled in {a^n b^n | n ≥ 0}
Corollary
ECRPQ(REG) properly extends the class of CRPQs
SLIDE 75 Complexity of evaluation of ECRPQ(REG)
- Extending CRPQs with regular relations is free in data complexity
- Combined complexity is that of FO over relational databases
Theorem (B., Libkin, Lin, Wood (2012))
◮ Eval(ECRPQ(REG)) is Pspace-complete
◮ Eval(ECRPQ(REG)) is in NLogspace in data complexity
SLIDE 76 Complexity of evaluation of ECRPQ(REG)
- Extending CRPQs with regular relations is free in data complexity
- Combined complexity is that of FO over relational databases
Theorem (B., Libkin, Lin, Wood (2012))
◮ Eval(ECRPQ(REG)) is Pspace-complete
◮ Eval(ECRPQ(REG)) is in NLogspace in data complexity
Proof idea:
◮ Convert into RPQ evaluation over G^m, for m = size of the ECRPQ
◮ For data complexity m is fixed
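To illustrate the product idea on the a^n b^n query from the earlier example, here is a sketch that searches G × G, moving one edge in each component per step so that both paths have equal length. The function name and the toy graph are hypothetical, and this is an illustration for that single query, not the general algorithm from the cited paper.

```python
from collections import deque

def anbn_pairs(nodes, edges):
    """Answers of Ans(x, y) <- (x, π1, z), (z, π2, y), a*(π1), b*(π2), equal_length(π1, π2)."""
    a_succ, b_succ = {}, {}
    for (s, label, t) in edges:
        if label == "a":
            a_succ.setdefault(s, set()).add(t)
        elif label == "b":
            b_succ.setdefault(s, set()).add(t)
    answers = set()
    for x in nodes:
        for z in nodes:
            # State (u, w): π1 has gone from x to u along a-edges and
            # π2 from z to w along b-edges, using the same number of edges.
            start = (x, z)
            seen = {start}
            queue = deque([start])
            while queue:
                u, w = queue.popleft()
                if u == z:  # π1 ends at the join node z
                    answers.add((x, w))
                for u2 in a_succ.get(u, ()):
                    for w2 in b_succ.get(w, ()):
                        if (u2, w2) not in seen:
                            seen.add((u2, w2))
                            queue.append((u2, w2))
    return answers

# Toy graph: 1 -a-> 2 -a-> 3 -b-> 4 -b-> 5
nodes = {1, 2, 3, 4, 5}
edges = {(1, "a", 2), (2, "a", 3), (3, "b", 4), (4, "b", 5)}
result = anbn_pairs(nodes, edges)
```

For a fixed query the product has polynomially many states in |G|, which matches the claim that data complexity stays low.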
SLIDE 77 Expressiveness of ECRPQ(REG)
Understanding the expressive power of ECRPQ(REG) is difficult.
Proposition
Let L be a language of words. The following are equivalent:
◮ L is expressible by a binary ECRPQ(REG) formula
◮ L is definable by a word equation with constraints in REG
SLIDE 78
COMPARING PATHS WITH RATIONAL RELATIONS:
The struggle for decidability and efficiency
SLIDE 79 Introduction
ECRPQ(REG) queries are still short of expressive power.
◮ RDF or biological networks:
- Compare strings based on subsequence and subword relations
◮ These relations are rational: Accepted by asynchronous automata
- RAT: Class of rational relations
Bottom line:
◮ ECRPQ(RAT) evaluation:
- Undecidable or very high complexity
◮ Restricting the syntactic shape of queries yields tractability
SLIDE 80
Rational relations
n-ary rational relation: Set of n-tuples (w1, . . . , wn) of strings accepted by asynchronous automaton with n heads.
SLIDE 81 Rational relations
n-ary rational relation: Set of n-tuples (w1, . . . , wn) of strings accepted by asynchronous automaton with n heads.
◮ The input strings are written on the n tapes
◮ At each step: the automaton enters a new state and moves some tape heads
SLIDE 82 Rational relations
n-ary rational relation: Set of n-tuples (w1, . . . , wn) of strings accepted by asynchronous automaton with n heads.
◮ The input strings are written on the n tapes
◮ At each step: the automaton enters a new state and moves some tape heads
n-ary rational relation: Described by a regular expression over the alphabet (Σ ∪ {ε})^n
SLIDE 83 Examples of rational relations
- All regular relations
- The subsequence relation ss defined by: ((⋃_{a∈Σ} (a, ε))∗ · ⋃_{b∈Σ} (b, b))∗ · (⋃_{a∈Σ} (a, ε))∗
- The subword relation sw defined by: (⋃_{a∈Σ} (a, ε))∗ · (⋃_{b∈Σ} (b, b))∗ · (⋃_{a∈Σ} (a, ε))∗
SLIDE 84 Examples of rational relations
- All regular relations
- The subsequence relation ss defined by: ((⋃_{a∈Σ} (a, ε))∗ · ⋃_{b∈Σ} (b, b))∗ · (⋃_{a∈Σ} (a, ε))∗
- The subword relation sw defined by: (⋃_{a∈Σ} (a, ε))∗ · (⋃_{b∈Σ} (b, b))∗ · (⋃_{a∈Σ} (a, ε))∗
Proposition
The set of pairs (w1, w2) such that w1 is the reversal of w2 is not rational.
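For intuition about the subsequence and subword relations, the sketch below checks both on concrete strings. The subsequence check mimics an asynchronous two-head scan in which one head may advance while the other stays put. The function and parameter names are illustrative assumptions.

```python
def is_subsequence(short: str, long: str) -> bool:
    """Two-head scan: the head on `long` always advances; the head on
    `short` advances only when both heads read the same symbol."""
    i = 0
    for symbol in long:
        if i < len(short) and short[i] == symbol:
            i += 1
    return i == len(short)

def is_subword(short: str, long: str) -> bool:
    """`short` occurs in `long` as a contiguous block."""
    return short in long

examples = (is_subsequence("ace", "abcde"), is_subword("bcd", "abcde"))
```

Checking a single pair of strings is easy; the hardness results in this part of the talk come from quantifying over all paths of a graph DB, not from the membership test itself.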
SLIDE 85 ECRPQ(RAT)
ECRPQ(RAT): Class of queries of the form
Ans(z̄, χ̄) ← ⋀_i (xi, πi, yi), ⋀_j Sj(π̄j),
where each Sj is a rational relation [B., Figueira, Libkin (2012)]
Example: The ECRPQ(RAT) query
Ans(x, y) ← (x, π1, z), (y, π2, w), π1 ss π2
computes x, y that are origins of paths ρ1 and ρ2 such that:
◮ λ(ρ1) is a subsequence of λ(ρ2)
SLIDE 86 Evaluation of ECRPQ(RAT) queries
Evaluation of queries in ECRPQ(RAT) is undecidable, but:
◮ Is this still the case if we allow only practically motivated rational relations?
SLIDE 87 Evaluation of ECRPQ(RAT) queries
Evaluation of queries in ECRPQ(RAT) is undecidable, but:
◮ Is this still the case if we allow only practically motivated rational relations?
Adding subword relation to ECRPQ(REG) leads to undecidability:
Theorem (B., Figueira, Libkin (2012))
Eval(ECRPQ(REG ∪{sw})) is undecidable (even in data complexity)
SLIDE 88 Evaluation of ECRPQ(RAT) queries
Evaluation of queries in ECRPQ(RAT) is undecidable, but:
◮ Is this still the case if we allow only practically motivated rational relations?
Adding subword relation to ECRPQ(REG) leads to undecidability:
Theorem (B., Figueira, Libkin (2012))
Eval(ECRPQ(REG ∪ {sw})) is undecidable (even in data complexity)
Adding subword to CRPQ leads to intractability in data complexity:
Theorem (B., Muñoz (2014))
Eval(CRPQ(sw)) is Pspace-complete in data complexity
◮ But Eval(CRPQ(suff)) is in NLogspace in data complexity
SLIDE 89 Consequences for word equations
Observation 1: Pspace upper bound for CRPQ(sw)
◮ Uses Pspace procedure for word equations with regular expressions
SLIDE 90 Consequences for word equations
Observation 1: Pspace upper bound for CRPQ(sw)
◮ Uses Pspace procedure for word equations with regular expressions
Observation 2: There exists a fixed word equation e such that
◮ solving e under a single constraint in REG is undecidable ◮ solving e with regular language constraints is Pspace-complete
SLIDE 91 Evaluation of ECRPQ(RAT) queries
Adding subsequence to ECRPQ preserves decidability at a very high cost:
Theorem (B., Figueira, Libkin (2012))
Eval(ECRPQ(REG ∪{ss})) is decidable, but non-primitive-recursive.
◮ This holds even in data complexity.
SLIDE 92 Evaluation of ECRPQ(RAT) queries
Adding subsequence to ECRPQ preserves decidability at a very high cost:
Theorem (B., Figueira, Libkin (2012))
Eval(ECRPQ(REG ∪{ss})) is decidable, but non-primitive-recursive.
◮ This holds even in data complexity.
Adding subsequence to CRPQ leads to intractability in data complexity:
Theorem (B., Muñoz (2014))
Eval(CRPQ(ss)) is NP-complete in data complexity
SLIDE 93 Evaluation of ECRPQ(RAT) queries
Adding subsequence to ECRPQ preserves decidability at a very high cost:
Theorem (B., Figueira, Libkin (2012))
Eval(ECRPQ(REG ∪{ss})) is decidable, but non-primitive-recursive.
◮ This holds even in data complexity.
Adding subsequence to CRPQ leads to intractability in data complexity:
Theorem (B., Muñoz (2014))
Eval(CRPQ(ss)) is NP-complete in data complexity
Observation 3: Word equations + ss are undecidable [Halfon et al (2017)]
◮ Is this also the case for Eval(CRPQ(ss ∪ sw))?
SLIDE 94 Acyclic CRPQ(RAT) queries
Acyclic CRPQ(RAT) queries yield tractable data complexity.
◮ Queries of the form
Ans(z̄) ← ⋀_{1≤i≤k} ((xi, πi, yi), Li(πi)), ⋀_j Sj(πj1, πj2),
where the graph on {1, . . . , k} defined by edges (πj1, πj2) is acyclic
SLIDE 95 Acyclic CRPQ(RAT) queries
Acyclic CRPQ(RAT) queries yield tractable data complexity.
◮ Queries of the form
Ans(z̄) ← ⋀_{1≤i≤k} ((xi, πi, yi), Li(πi)), ⋀_j Sj(πj1, πj2),
where the graph on {1, . . . , k} defined by edges (πj1, πj2) is acyclic
Acyclic ECRPQ(RAT) is not more expensive than ECRPQ(REG):
Theorem (B., Figueira, Libkin (2012))
◮ Evaluation of acyclic ECRPQ(RAT) queries is Pspace-complete ◮ It is in NLogspace in data complexity
SLIDE 96
STRING SOLVING:
Applying previous ideas
SLIDE 97 The problem we study
We study satisfiability for conjunctions of:
◮ Atomic relational constraints:
y = x1 · · · xn | R(x, y)
◮ Boolean combinations of regular expressions:
L(x) | ϕ ∧ ψ | ¬ϕ
SLIDE 98 The problem we study
We study satisfiability for conjunctions of:
◮ Atomic relational constraints:
y = x1 · · · xn | R(x, y)
◮ Boolean combinations of regular expressions:
L(x) | ϕ ∧ ψ | ¬ϕ
Example: x = w1yw2zw3 ∧ R(y, z) ∧ ¬S(z)
SLIDE 99 The problem we study
We study satisfiability for conjunctions of:
◮ Atomic relational constraints:
y = x1 · · · xn | R(x, y)
◮ Boolean combinations of regular expressions:
L(x) | ϕ ∧ ψ | ¬ϕ
Example: x = w1yw2zw3 ∧ R(y, z) ∧ ¬S(z)
This class is
◮ Useful: Encodes transductions often used in web security
applications, e.g., replace all
◮ Very expressive: Subsumes word equations with rational constraints
SLIDE 100
In full generality the problem is undecidable
Proposition
Satisfiability of expressions R(x, x) is undecidable
SLIDE 101
In full generality the problem is undecidable
Proposition
Satisfiability of expressions R(x, x) is undecidable
Idea: Use acyclicity restrictions as we did for ECRPQ(RAT)
SLIDE 102 In full generality the problem is undecidable
Proposition
Satisfiability of expressions R(x, x) is undecidable
Idea: Use acyclicity restrictions as we did for ECRPQ(RAT)
But not just on the graph defined by rational relations ...
◮ R(x, x) is equivalent to x = y ∧ R(x, y)
◮ Satisfiability of formulas of the form x = yz ∧ R(x, z), for R a regular relation, is undecidable [B., Figueira, Libkin (2013)]
SLIDE 103 In full generality the problem is undecidable
Proposition
Satisfiability of expressions R(x, x) is undecidable
Idea: Use acyclicity restrictions as we did for ECRPQ(RAT)
But not just on the graph defined by rational relations ...
◮ R(x, x) is equivalent to x = y ∧ R(x, y)
◮ Satisfiability of formulas of the form x = yz ∧ R(x, z), for R a regular relation, is undecidable [B., Figueira, Libkin (2013)]
The notion of acyclicity needs to consider expressions y = x1 · · · xn
SLIDE 104 Acyclicity restriction
We write R(x, y) as y = R(x) The straight line (SL) fragment:
⋀_{i=1}^{m} xi = P(x1, . . . , xi−1),
such that P(x1, . . . , xi−1) is either R(xj) or xj1 · · · xjn, for {xj, xj1, . . . , xjn} ⊆ {x1, . . . , xi−1}.
SLIDE 105 Acyclicity restriction
We write R(x, y) as y = R(x) The straight line (SL) fragment:
⋀_{i=1}^{m} xi = P(x1, . . . , xi−1),
such that P(x1, . . . , xi−1) is either R(xj) or xj1 · · · xjn, for {xj, xj1, . . . , xjn} ⊆ {x1, . . . , xi−1}.
Example: The formula x = yz ∧ R(x, y) is not in SL, while the formula x = w1yw2zw3 ∧ R(y, z) is in SL
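One way to read the straight-line condition is that every variable is defined at most once and the "depends on" graph between variables has no cycle, so the conjuncts can be ordered as x1, . . . , xm. The sketch below checks exactly that reading. The encoding of a formula as (defined variable, variables on the right-hand side) pairs is an assumption made here, the constant words w1, w2, w3 are left out because they are not variables, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter, CycleError

def is_straight_line(assignments):
    """assignments: list of (defined variable, variables used on the right-hand side)."""
    deps = {}
    for lhs, rhs_vars in assignments:
        if lhs in deps or lhs in rhs_vars:
            return False  # defined twice, or defined in terms of itself
        deps[lhs] = set(rhs_vars)
    try:
        # A valid ordering x1, ..., xm exists iff the dependency graph is acyclic.
        list(TopologicalSorter(deps).static_order())
    except CycleError:
        return False
    return True

# x = yz ∧ R(x, y), with R(x, y) read as y = R(x): x and y depend on each other.
cyclic = [("x", ["y", "z"]), ("y", ["x"])]
# x = w1 y w2 z w3 ∧ R(y, z), with R(y, z) read as z = R(y).
acyclic = [("x", ["y", "z"]), ("z", ["y"])]
verdicts = (is_straight_line(cyclic), is_straight_line(acyclic))
```

The two inputs correspond to the two formulas in the example above and agree with the slide: the first is rejected, the second accepted.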
SLIDE 106
The main result
Theorem (Lin, B. (2016))
Satisfiability of expressions in SL is Expspace-complete
SLIDE 107 The main result
Theorem (Lin, B. (2016))
Satisfiability of expressions in SL is Expspace-complete
Proof idea for upper bound:
◮ Replace concatenations in the expression ϕ with “exponentially big” DNF expressions consisting exclusively of regular expressions and regular relations x = y
◮ If ϕ ∈ SL, then the resulting expression ϕ′ is acyclic in the sense studied for ECRPQ(RAT)
◮ Check satisfiability of ϕ′ in Pspace, i.e., in Expspace in terms of the size of the input ϕ
SLIDE 108 A better behaved fragment
SLk: Restriction of SL to expressions of depth k ≥ 1
◮ Depth of a variable x is number of variables on which x depends ◮ Depth of an expression is maximum depth of a variable
SLIDE 109 A better behaved fragment
SLk: Restriction of SL to expressions of depth k ≥ 1
◮ Depth of a variable x is number of variables on which x depends ◮ Depth of an expression is maximum depth of a variable
Theorem (Lin, B. (2016))
Satisfiability of expressions in SLk is Pspace-complete
SLIDE 110
FINAL REMARKS
SLIDE 111 Graph DB query languages and string verification share:
◮ interest in expressing complex interactions among words ◮ understanding which restrictions on such problems can lead to
practical tools in real-world applications
SLIDE 112 Graph DB query languages and string verification share:
◮ interest in expressing complex interactions among words ◮ understanding which restrictions on such problems can lead to
practical tools in real-world applications
I presented some interactions between graph DBs, string verification, and word equations, but others are also possible.
◮ Graph QLs with arithmetic expressions:
◮ Require applying tools based on Presburger arithmetic and
bounded-reversal counter automata [B., Libkin, Lin, Wood (2012)]
◮ Monadic decomposability:
◮ Can a regular relation be expressed as a Boolean combination of
products of regular languages? [B., Hong, Le, Li, Niskanen (2019)]
◮ Related to boundedness problems for recursive query languages
SLIDE 113
THANKS