Graph Database Querying vs String Constraints, Pablo Barceló (slide presentation)



slide-1
SLIDE 1

Graph Database Querying vs String Constraints

Pablo Barceló

Millennium Institute for Foundational Research on Data & DCC, University of Chile

slide-2
SLIDE 2

INTRODUCTION

slide-5
SLIDE 5

Graph DBs and applications

  • Graph DBs are crucial when topology is as important as data itself
  • They have gained renewed interest in recent years due to trendy applications:

◮ Web (semantic)
◮ Social networks
◮ Chemical and biological networks
◮ Software bug localization
◮ . . .

  • They are an active area of research and industrial application:

◮ Amazon Neptune, Neo4J, Facebook GraphQL, Google Knowledge Graph, Oracle Graph DBMS, RDF Virtuoso, Apache Jena, ...

slide-7
SLIDE 7

Features of the query languages we study

Languages we study express essential features for querying graph DBs

◮ Navigation: Recursively traverse the edges of the graph
◮ Pattern matching: Check if a pattern appears in the graph DB
◮ Path comparisons: Based on relations over words

Some of these features form the basis of recently formalized graph DB query languages:

◮ LDBC Proposal: G-CORE: A Core for Future Graph Query Languages (SIGMOD’18)
◮ Neo4J Proposal: Cypher: An Evolving Query Language for Property Graphs (SIGMOD’18)
◮ Survey: Foundations of Modern Query Languages for Graph Databases (ACM Comput. Surv.’17)

slide-9
SLIDE 9

Problems we study:

Expressiveness: What can be said in a query language L?

Complexity of evaluation: We study the problem:

Problem: Eval(L)
Input: A graph DB G, a tuple t̄ of objects, an L-query Q.
Question: Is t̄ ∈ Q(G)?

◮ Combined complexity: Both G and Q are part of the input.
◮ Data complexity: Only G is part of the input and Q is fixed.

slide-10
SLIDE 10

THE GRAPH DATA MODEL

slide-13
SLIDE 13

Graph data model

Different apps have given rise to a myriad of different graph DB models

  • (see Angles and Gutiérrez (2008))

We work with a simple graph data model: finite, directed, edge-labeled graphs.

Despite the simplicity of the model:

◮ It is flexible enough to accommodate many other more complex models and express interesting practical scenarios
◮ The most fundamental theoretical issues related to querying graph DBs appear in full force for it

slide-15
SLIDE 15

Graph databases

Definition

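A graph DB $G$ over a finite alphabet $\Sigma$ is a pair $(V, E)$, where:

◮ $V$ is a finite set of node ids
◮ $E$ is a set of edges of the form $v_1 \xrightarrow{a} v_2$ (with $v_1, v_2 \in V$ and $a \in \Sigma$)

  • A path in $G$ is a sequence of the form:

$\rho = v_1 \xrightarrow{a_1} v_2 \xrightarrow{a_2} v_3 \cdots v_k \xrightarrow{a_k} v_{k+1}$

  • The label of $\rho$, denoted $\lambda(\rho)$, is the string $a_1 a_2 \cdots a_k \in \Sigma^*$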
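
The definitions above can be made concrete with a minimal Python sketch. The representation (edges as triples) and the toy graph are assumptions for illustration, not part of the slides.

```python
from typing import List, Set, Tuple

# An edge v1 -a-> v2 is stored as the triple (v1, a, v2). This encoding is an assumption.
Edge = Tuple[str, str, str]

def is_path(edges: Set[Edge], path: List[Edge]) -> bool:
    """True if every step is an edge of the graph and consecutive steps chain head-to-tail."""
    if any(e not in edges for e in path):
        return False
    return all(path[i][2] == path[i + 1][0] for i in range(len(path) - 1))

def label(path: List[Edge]) -> List[str]:
    """The label of the path: the sequence of edge labels a1, a2, ..., ak."""
    return [a for (_, a, _) in path]

# Hypothetical toy graph DB over the alphabet {a, b} (not taken from the slides).
E = {("v1", "a", "v2"), ("v2", "b", "v3"), ("v3", "a", "v1")}
rho = [("v1", "a", "v2"), ("v2", "b", "v3"), ("v3", "a", "v1")]
```

A path with k edges has a label of length k, which is why the label ends at the k-th edge label.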
slide-16
SLIDE 16

Graph DBs: Example

A graph DB representation of a fragment of DBLP

[Figure: graph DB fragment of DBLP. Node labels include :Ronald Fagin, :John E. Hopcroft, :Robert E Tarjan, :Jeffrey Ullman, :Moshe Y. Vardi, :Leonid Libkin, :Limsoon Wong, Pods:FaginUV83, Pods:Ullman89, Pods:Libkin95, Pods:Vardi95, Focs:HopU67a, Jacm:HopcroftT74, IPL:LibkinW95, inPods:83, inPods:89, inPods:95, inFocs:FOCS8, conf:pods, conf:focs, journal:jacm, journal:IPL. Edge labels are creator, partOf, series, and journal.]

slide-17
SLIDE 17

Graph DBs: Example

A path in this graph DB

[Figure: the same DBLP graph DB fragment, with one path highlighted.]

slide-18
SLIDE 18

Graph DBs: Example

The label of such a path

[Figure: the same DBLP graph DB fragment, showing the label of the highlighted path.]

slide-19
SLIDE 19

Graph DBs vs NFAs

Important: Graph DBs can be naturally seen as NFAs.

◮ Nodes are states
◮ Edges $u \xrightarrow{a} v$ are transitions
◮ There are no initial and final states

slide-20
SLIDE 20

BASIC LANGUAGES FOR GRAPH DBs:

Tractability for a big class of languages

slide-21
SLIDE 21

Regular path queries

Basic building block for graph queries: Regular path queries (RPQs)

◮ First studied by Mendelzon and Wood (1989)
◮ RPQs = Regular expressions over Σ
◮ Evaluation L(G) of RPQ L on graph DB G = (V, E):

  • Pairs (v, v′) of nodes in V linked by a path whose label is in L
slide-23
SLIDE 23

RPQs with inverse

Their extension with inverses, known as 2RPQs, is studied more often.

◮ First studied by Calvanese, de Giacomo, Lenzerini, Vardi (2000)
◮ 2RPQs = RPQs over Σ±, where:

  • Σ± = Σ extended with the inverse a− of each a ∈ Σ

Evaluation L(G) of 2RPQ L over graph DB G = (V, E):

◮ Pairs of nodes in G that satisfy the RPQ L over G±, i.e., L(G±), where

  • G± is obtained from G by adding $u \xrightarrow{a^-} v$ for each $v \xrightarrow{a} u \in E$

slide-24
SLIDE 24

Example of 2RPQ

The 2RPQ

creator− · ((partOf · series) ∪ journal)

computes the pairs (a, v) such that author a published in conference or journal v.

[Figure: the DBLP graph DB fragment.]

slide-25
SLIDE 25

Example of 2RPQ

[Figure: the DBLP graph DB fragment with one answer pair (a, v) of the 2RPQ creator− · ((partOf · series) ∪ journal) highlighted.]

slide-26
SLIDE 26

Example of 2RPQ

[Figure: the DBLP graph DB fragment with another answer pair (a, v) of the same 2RPQ highlighted.]

slide-28
SLIDE 28

2RPQ evaluation

Problem: Eval(2RPQ)
Input: A graph DB G, nodes v, v′ in G, a 2RPQ L
Question: Is (v, v′) ∈ L(G)?

It boils down to:

Problem: RegularPath
Input: A graph DB G, nodes v, v′ in G, a regular expression L over Σ±
Question: Is there a path ρ from v to v′ in G± such that λ(ρ) ∈ L?

slide-30
SLIDE 30

Complexity of finding regular paths

Theorem (Folklore)

RegularPath can be solved in time O(|G| · |L|)

Proof idea:

◮ Compute in linear time from L an equivalent NFA A
◮ Compute in linear time (G±, v, v′): the NFA obtained from G± by setting v and v′ as initial and final states, respectively
◮ Then (v, v′) ∈ L(G) iff the language of the product NFA (G±, v, v′) × A is nonempty
◮ The latter can be checked in time O(|G±| · |A|) = O(|G| · |L|)
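
The proof idea can be sketched in Python as a breadth-first search over the product of G± and the NFA. This is a minimal sketch under stated assumptions: the NFA is given directly (no regex-to-NFA translation, no epsilon moves), the inverse of a label `a` is written `a-`, and the toy edges are only loosely modeled on the DBLP figure (their exact direction is assumed, not recovered from the slides).

```python
from collections import deque

def inverse_closure(edges):
    """G±: for every edge u -a-> v also add v -a-> u under the inverse label 'a-'."""
    return set(edges) | {(v, a + "-", u) for (u, a, v) in edges}

def regular_path(edges, v, v_prime, nfa_delta, nfa_init, nfa_final):
    """Is there a path from v to v_prime in G± whose label is accepted by the NFA?

    nfa_delta maps (state, label) to a set of successor states.
    Each product state (node, state) is visited at most once.
    """
    out = {}
    for (u, a, w) in inverse_closure(edges):
        out.setdefault(u, []).append((a, w))
    start = (v, nfa_init)
    seen = {start}
    queue = deque([start])
    while queue:
        node, q = queue.popleft()
        if node == v_prime and q in nfa_final:
            return True
        for (a, w) in out.get(node, []):
            for q2 in nfa_delta.get((q, a), ()):
                if (w, q2) not in seen:
                    seen.add((w, q2))
                    queue.append((w, q2))
    return False

# Assumed toy fragment: papers point to their authors and venues.
edges = {
    ("Pods:FaginUV83", "creator", ":Ronald Fagin"),
    ("Pods:FaginUV83", "partOf", "inPods:83"),
    ("inPods:83", "series", "conf:pods"),
    ("Jacm:HopcroftT74", "creator", ":Robert E Tarjan"),
    ("Jacm:HopcroftT74", "journal", "journal:jacm"),
}
# Hand-built NFA for creator- . ((partOf . series) U journal); state 3 is final.
delta = {(0, "creator-"): {1}, (1, "partOf"): {2}, (2, "series"): {3}, (1, "journal"): {3}}
```

Calling `regular_path(edges, ":Ronald Fagin", "conf:pods", delta, 0, {3})` asks whether the author is linked to the venue by a path matching the 2RPQ from the earlier example.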

slide-35
SLIDE 35

Complexity of 2RPQ evaluation

Corollary

Eval(2RPQ) can be solved in time O(|G| · |L|), i.e., linear in both |G| and |L|

slide-36
SLIDE 36

Data complexity of 2RPQ evaluation

Data complexity of 2RPQs belongs to a parallelizable class:

Proposition

Let L be a fixed 2RPQ. There is an NLogspace procedure that computes L(G) for each G.

Proof idea:

◮ Construct (G±, v, v′) from G in Logspace
◮ Check nonemptiness for (G±, v, v′) × A in NLogspace

slide-38
SLIDE 38

Conjunctive regular path queries (CRPQs)

RPQs still do not express arbitrary patterns over graph DBs.

◮ To do this we need to close RPQs under joins and projection

This is the class of conjunctive regular path queries (CRPQs).

◮ Extended with inverses as C2RPQs in [Calvanese et al. (2000)]

slide-39
SLIDE 39

Example of C2RPQ

The C2RPQ

Ans(x, u) ← (x, creator−, y), (y, partOf · series, z), (y, creator, u)

computes pairs (a1, a2) that are coauthors of a conference paper

[Figure: the DBLP graph DB fragment.]
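
The C2RPQ above can be evaluated naively by computing one binary relation per atom and joining them. The following Python sketch assumes a toy fragment whose edges are only loosely modeled on the figure, writes the inverse of `creator` as `creator-`, and handles only the simple path expressions that occur in this query.

```python
def step(edges, label):
    """Pairs (u, v) related by one label of Σ±; a trailing '-' denotes the inverse."""
    if label.endswith("-"):
        return {(v, u) for (u, a, v) in edges if a == label[:-1]}
    return {(u, v) for (u, a, v) in edges if a == label}

def compose(r, s):
    """Relational composition: pairs (u, w) with (u, v) in r and (v, w) in s."""
    return {(u, w) for (u, v) in r for (v2, w) in s if v == v2}

def coauthors_of_conference_paper(edges):
    atom1 = step(edges, "creator-")                                 # (x, y)
    atom2 = compose(step(edges, "partOf"), step(edges, "series"))   # (y, z)
    atom3 = step(edges, "creator")                                  # (y, u)
    papers_in_series = {y for (y, _z) in atom2}                     # z is projected away
    return {(x, u)
            for (x, y) in atom1 if y in papers_in_series
            for (y2, u) in atom3 if y2 == y}

# Assumed toy fragment (edge directions are an assumption).
edges = {
    ("Pods:FaginUV83", "creator", ":Ronald Fagin"),
    ("Pods:FaginUV83", "creator", ":Jeffrey Ullman"),
    ("Pods:FaginUV83", "creator", ":Moshe Y. Vardi"),
    ("Pods:FaginUV83", "partOf", "inPods:83"),
    ("inPods:83", "series", "conf:pods"),
    ("Jacm:HopcroftT74", "creator", ":John E. Hopcroft"),
    ("Jacm:HopcroftT74", "creator", ":Robert E Tarjan"),
    ("Jacm:HopcroftT74", "journal", "journal:jacm"),
}
answers = coauthors_of_conference_paper(edges)
```

Since the query does not require x and u to be distinct, every author of a conference paper is also paired with themselves.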

slide-40
SLIDE 40

Example of C2RPQ

[Figure: the DBLP graph DB fragment with a match of the variables x, y, z, u of the C2RPQ highlighted.]

slide-41
SLIDE 41

Example of C2RPQ

[Figure: the DBLP graph DB fragment with the resulting answer pair (a1, a2) highlighted.]

slide-43
SLIDE 43

C2RPQ: Formal definition

C2RPQ over Σ: Rule of the form

Ans(z̄) ← (x1, L1, y1), . . . , (xm, Lm, ym),

such that

◮ the xi, yi are variables,
◮ each Li is a 2RPQ over Σ,
◮ the output z̄ has some variables among the xi, yi’s

CRPQ: C2RPQ without inverse

slide-45
SLIDE 45

Complexity of evaluation of C2RPQs

Increase in expressiveness from RPQs has a cost in evaluation

Proposition

Eval(C2RPQ) is NP-complete, even if restricted to CRPQs

But adding conjunctions is free in data complexity

Proposition

Eval(C2RPQ) can be solved in NLogspace in data complexity

slide-46
SLIDE 46

PATH QUERIES:

The power of comparisons

slide-48
SLIDE 48

CRPQs and path queries

CRPQs fall short of expressive power for applications that need:

◮ to include paths in the output of a query, and
◮ to define complex relationships among labels of paths

Examples:

◮ Semantic Web queries:

  • establish semantic associations among paths

◮ Biological applications:

  • compare paths based on similarity

◮ Route-finding applications:

  • compare paths based on length or number of occurrences of labels

◮ Data provenance and semantic search over the Web:

  • require returning paths to the user
slide-49
SLIDE 49

Path comparisons

We use a set S of relations on words.

◮ Example: S may contain

  • Unary relations: Regular, context-free languages, etc.
  • Binary relations: prefix, equal length, subsequence, etc.

◮ Comparisons among labels of paths = Membership in some S ∈ S

  • Example: w1 is a substring of w2

◮ We assume S contains all regular languages

slide-53
SLIDE 53

Extended CRPQs

The S-extended CRPQs (ECRPQ(S)) are rules obtained from a CRPQ:

$\mathrm{Ans}(\bar{z}, \bar{\chi}) \leftarrow (x_1, \pi_1, y_1), \ldots, (x_m, \pi_m, y_m), \bigwedge_{1 \le j \le t} S_j(\bar{\pi}_j)$

◮ by joining each pair (xi, yi) with a path variable πi,
◮ comparing labels of paths in π̄j wrt Sj ∈ S, for π̄j a tuple of path variables among the πi’s,
◮ projecting some of the πi’s as a tuple χ̄ in the output

slide-55
SLIDE 55

Extended CRPQs and our requirements

ECRPQs meet our requirements:

$\mathrm{Ans}(\bar{z}, \bar{\chi}) \leftarrow (x_1, \pi_1, y_1), \ldots, (x_m, \pi_m, y_m), \bigwedge_{1 \le j \le t} S_j(\bar{\pi}_j)$

◮ They allow exporting paths in the output
◮ They allow comparing labels of paths with relations Sj ∈ S

slide-57
SLIDE 57

Considerations about ECRPQ(S)

  • ECRPQ(S) extends the class of CRPQs

◮ $\mathrm{Ans}(\bar{z}) \leftarrow \bigwedge_i (x_i, L_i, y_i) \;=\; \mathrm{Ans}(\bar{z}) \leftarrow \bigwedge_i (x_i, \pi_i, y_i), L_i(\pi_i)$

  • Expressiveness and complexity of ECRPQ(S):

◮ Depends on the class S

  • We study two such classes with roots in formal language theory:

◮ Regular relations [Elgot, Mezei (1965)]
◮ Rational relations [Nivat (1968)]

slide-58
SLIDE 58

COMPARING PATHS WITH REGULAR RELATIONS:

Preserving tractable data complexity

slide-59
SLIDE 59

Introduction

  • Regular relations: Regular languages for relations of any arity

◮ REG: Class of regular relations

  • Bottom line: ECRPQ(REG) offers reasonable expressiveness and complexity

slide-61
SLIDE 61

Regular relations

n-ary regular relation: Set of n-tuples (w1, . . . , wn) of strings accepted by a synchronous automaton over $(\Sigma \cup \{\bot\})^n$

◮ The input strings are written on the n tapes
◮ Shorter strings are padded with the symbol ⊥
◮ At each step: the automaton simultaneously reads the next symbol on each tape

slide-62
SLIDE 62

Synchronous automata

w1 = a a b · · · a b c
w2 = a b a · · · a
w3 = b b · · ·
. . .
wn = a b b · · · a c

slide-63
SLIDE 63

Synchronous automata

w1 = a a b · · · a b c
w2 = a b a · · · a ⊥ ⊥
w3 = b b ⊥ · · · ⊥ ⊥ ⊥
. . .
wn = a b b · · · a c ⊥

slide-64
SLIDE 64

Synchronous automata

w1 = a a b · · · a b c
w2 = a b a · · · a ⊥ ⊥
w3 = b b ⊥ · · · ⊥ ⊥ ⊥
. . .
wn = a b b · · · a c ⊥

⇑ (the head reads one column, i.e., one symbol per tape, at each step)
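
The padding step shown above can be sketched in a few lines of Python. The function name `convolution` is an assumption (the slides do not name this operation); it returns the sequence of columns that a synchronous automaton would read.

```python
def convolution(words, pad="⊥"):
    """Pad all words with `pad` to the length of the longest one and return the columns."""
    n = max((len(w) for w in words), default=0)
    padded = [list(w) + [pad] * (n - len(w)) for w in words]
    # Column i holds the i-th symbol of every (padded) word.
    return [tuple(column) for column in zip(*padded)]

columns = convolution(["aab", "ab", "b"])
```

Each element of `columns` is a single letter of the product alphabet, so a synchronous automaton is an ordinary finite automaton over these columns.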

slide-71
SLIDE 71

Examples of regular relations

  • All regular languages
  • The prefix relation defined by:

$\Big(\bigcup_{a \in \Sigma} (a, a)\Big)^* \cdot \Big(\bigcup_{a \in \Sigma} (a, \bot)\Big)^*$

  • The equal length relation defined by:

$\Big(\bigcup_{a, b \in \Sigma} (a, b)\Big)^*$

  • Pairs of strings at edit distance at most k, for fixed k ≥ 0

Proposition

The subsequence, subword and suffix relations are not regular
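
As an illustration of the first two examples, here is a minimal Python sketch of deterministic synchronous automata for the prefix and equal length relations. The function names and the state numbering are assumptions. With the expression as written, the second component is the padded one, so the pair (w1, w2) is accepted when w2 is a prefix of w1.

```python
PAD = "⊥"

def run_sync(delta, init, finals, w1, w2):
    """Run a deterministic synchronous automaton on the padded pair (w1, w2)."""
    state = init
    for i in range(max(len(w1), len(w2))):
        a = w1[i] if i < len(w1) else PAD
        b = w2[i] if i < len(w2) else PAD
        state = delta(state, a, b)
        if state is None:          # no transition: reject
            return False
    return state in finals

def prefix_delta(state, a, b):
    # State 0 reads columns (a, a); state 1 reads columns (a, ⊥).
    if state == 0 and a == b and a != PAD:
        return 0
    if a != PAD and b == PAD:
        return 1
    return None

def equal_length_delta(state, a, b):
    # A single state that accepts only columns without padding.
    return 0 if a != PAD and b != PAD else None

def is_prefix_pair(w1, w2):
    return run_sync(prefix_delta, 0, {0, 1}, w1, w2)

def has_equal_length(w1, w2):
    return run_sync(equal_length_delta, 0, {0}, w1, w2)
```

Both automata use a constant number of states and read each column once, which is what makes these relations regular.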

slide-74
SLIDE 74

ECRPQ(REG)

ECRPQ(REG): Class of queries of the form

$\mathrm{Ans}(\bar{z}, \bar{\chi}) \leftarrow \bigwedge_i (x_i, \pi_i, y_i), \bigwedge_j S_j(\bar{\pi}_j)$,

where each Sj is a regular relation [B., Libkin, Lin, Wood (2012)]

Example: The ECRPQ(REG) query

Ans(x, y) ← (x, π1, z), (z, π2, y), a∗(π1), b∗(π2), equal_length(π1, π2)

computes pairs of nodes linked by a path labeled in $\{a^n b^n \mid n \ge 0\}$

Corollary

ECRPQ(REG) properly extends the class of CRPQs

slide-76
SLIDE 76

Complexity of evaluation of ECRPQ(REG)

  • Extending CRPQs with regular relations is free in data complexity
  • Combined complexity is that of FO over relational databases

Theorem (B., Libkin, Lin, Wood (2012))

◮ Eval(ECRPQ(REG)) is Pspace-complete
◮ Eval(ECRPQ(REG)) is in NLogspace in data complexity

Proof idea:

◮ Convert into RPQ evaluation over G^m, for m = size of the ECRPQ
◮ For data complexity, m is fixed
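
To illustrate the proof idea for m = 2, the following Python sketch evaluates the query Ans(x, y) ← (x, π1, z), (z, π2, y), a∗(π1), b∗(π2), equal_length(π1, π2) by searching pairs of nodes in lockstep. It is a hand-specialized sketch for this one query, not the general construction; the function name and the toy chain graph are assumptions.

```python
from collections import deque

def anbn_pairs(edges):
    """Pairs (x, y) such that some z has an a^n path x -> z and a b^n path z -> y."""
    nodes = {u for (u, _, _) in edges} | {v for (_, _, v) in edges}
    a_succ, b_succ = {}, {}
    for (u, lab, v) in edges:
        if lab == "a":
            a_succ.setdefault(u, set()).add(v)
        elif lab == "b":
            b_succ.setdefault(u, set()).add(v)
    answers = set()
    for x in nodes:
        for z in nodes:
            # (p, q): current endpoints of pi1 (from x) and pi2 (from z), same length so far.
            start = (x, z)
            seen = {start}
            queue = deque([start])
            while queue:
                p, q = queue.popleft()
                if p == z:                 # pi1 has reached z, so pi2 ends at the answer y = q
                    answers.add((x, q))
                for p2 in a_succ.get(p, ()):
                    for q2 in b_succ.get(q, ()):
                        if (p2, q2) not in seen:
                            seen.add((p2, q2))
                            queue.append((p2, q2))
    return answers

# Hypothetical chain 0 -a-> 1 -a-> 2 -b-> 3 -b-> 4.
chain = {(0, "a", 1), (1, "a", 2), (2, "b", 3), (3, "b", 4)}
result = anbn_pairs(chain)
```

Advancing both components in the same step is what enforces equal length; each search explores at most |V|² pairs, which is polynomial for a fixed query.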

slide-77
SLIDE 77

Expressiveness of ECRPQ(REG)

Understanding the expressive power of ECRPQ(REG) is difficult.

Proposition

Let L be a language of words. TFAE:

◮ L is expressible by a binary ECRPQ(REG) formula
◮ L is definable by a word equation with constraints in REG

slide-78
SLIDE 78

COMPARING PATHS WITH RATIONAL RELATIONS:

The struggle for decidability and efficiency

slide-79
SLIDE 79

Introduction

ECRPQ(REG) queries are still short of expressive power.

◮ RDF or biological networks:

  • Compare strings based on subsequence and subword relations

◮ These relations are rational: Accepted by asynchronous automata

  • RAT: Class of rational relations

Bottom line:

◮ ECRPQ(RAT) evaluation:

  • Undecidable or very high complexity

◮ Restricting the syntactic shape of queries yields tractability

slide-82
SLIDE 82

Rational relations

n-ary rational relation: Set of n-tuples (w1, . . . , wn) of strings accepted by an asynchronous automaton with n heads.

◮ The input strings are written on the n tapes
◮ At each step: the automaton enters a new state and moves some tape heads

Equivalently, an n-ary rational relation is described by a regular expression over the alphabet $(\Sigma \cup \{\varepsilon\})^n$

slide-84
SLIDE 84

Examples of rational relations

  • All regular relations
  • The subsequence relation ss defined by

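$\Big(\big(\bigcup_{a \in \Sigma} (a, \varepsilon)\big)^* \, \bigcup_{b \in \Sigma} (b, b)\Big)^* \cdot \big(\bigcup_{a \in \Sigma} (a, \varepsilon)\big)^*$

  • The subword relation sw defined by

$\big(\bigcup_{a \in \Sigma} (a, \varepsilon)\big)^* \cdot \big(\bigcup_{b \in \Sigma} (b, b)\big)^* \cdot \big(\bigcup_{a \in \Sigma} (a, \varepsilon)\big)^*$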
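
The two expressions above can be read as two-head automata: a letter (a, ε) moves only the first head, and a letter (b, b) moves both heads over the same symbol. The Python sketch below simulates the nondeterministic choices by memoized search over head positions. The function names are assumptions, and with the expressions as written the pair (w1, w2) is accepted when w2 is a subsequence (respectively, subword) of w1.

```python
from functools import lru_cache

def in_ss(w1, w2):
    """(w1, w2) is in ss: w2 results from w1 by deleting some symbols."""
    @lru_cache(maxsize=None)
    def accept(i, j):
        if i == len(w1):
            return j == len(w2)
        if accept(i + 1, j):                       # letter (a, ε): skip w1[i]
            return True
        return j < len(w2) and w1[i] == w2[j] and accept(i + 1, j + 1)  # letter (b, b)
    return accept(0, 0)

def in_sw(w1, w2):
    """(w1, w2) is in sw: w2 is a contiguous block of w1."""
    @lru_cache(maxsize=None)
    def accept(i, j, phase):
        # phase 0: first (a, ε) block, phase 1: (b, b) block, phase 2: last (a, ε) block
        if i == len(w1):
            return j == len(w2)
        if phase != 1 and accept(i + 1, j, phase):
            return True
        if phase != 2 and j < len(w2) and w1[i] == w2[j] and accept(i + 1, j + 1, 1):
            return True
        return phase == 1 and accept(i + 1, j, 2)  # leave the middle block
    return accept(0, 0, 0)
```

The heads advance at different speeds, which is exactly what a synchronous automaton cannot do; this sketch is suited to short words only, since it recurses once per symbol.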

Proposition

The set of pairs (w1, w2) such that w1 is the reversal of w2 is not rational.

slide-85
SLIDE 85

ECRPQ(RAT)

ECRPQ(RAT): Class of queries of the form

$\mathrm{Ans}(\bar{z}, \bar{\chi}) \leftarrow \bigwedge_i (x_i, \pi_i, y_i), \bigwedge_j S_j(\bar{\pi}_j)$,

where each Sj is a rational relation [B., Figueira, Libkin (2012)]

Example: The ECRPQ(RAT) query

Ans(x, y) ← (x, π1, z), (y, π2, w), π1 ss π2

computes x, y that are origins of paths ρ1 and ρ2 such that:

◮ λ(ρ1) is a subsequence of λ(ρ2)

slide-88
SLIDE 88

Evaluation of ECRPQ(RAT) queries

Evaluation of queries in ECRPQ(RAT) is undecidable, but:

◮ Is this still the case if we allow only practically motivated rational relations?

  • For example, ss and sw

Adding subword relation to ECRPQ(REG) leads to undecidability:

Theorem (B., Figueira, Libkin (2012))

Eval(ECRPQ(REG ∪ {sw})) is undecidable (even in data complexity)

Adding subword to CRPQ leads to intractability in data complexity:

Theorem (B., Muñoz (2014))

Eval(CRPQ(sw)) is Pspace-complete in data complexity

◮ But Eval(CRPQ(suff)) is in NLogspace in data complexity

slide-90
SLIDE 90

Consequences for word equations

Observation 1: Pspace upper bound for CRPQ(sw)

◮ Uses Pspace procedure for word equations with regular expressions

Observation 2: There exists a fixed word equation e such that

◮ solving e under a single constraint in REG is undecidable
◮ solving e with regular language constraints is Pspace-complete

slide-93
SLIDE 93

Evaluation of ECRPQ(RAT) queries

Adding subsequence to ECRPQ preserves decidability at a very high cost:

Theorem (B., Figueira, Libkin (2012))

Eval(ECRPQ(REG ∪{ss})) is decidable, but non-primitive-recursive.

◮ This holds even in data complexity.

Adding subsequence to CRPQ leads to intractability in data complexity:

Theorem (B., Muñoz (2014))

Eval(CRPQ(ss)) is NP-complete in data complexity

Observation 3: Word equations + ss are undecidable [Halfon et al. (2017)]

◮ Is this also the case for Eval(CRPQ(ss ∪ sw))?

slide-95
SLIDE 95

Acyclic CRPQ(RAT) queries

Acyclic CRPQ(RAT) queries yield tractable data complexity.

◮ Queries of the form

$\mathrm{Ans}(\bar{z}) \leftarrow \bigwedge_{i \le k} (x_i, \pi_i, y_i), L_i(\pi_i), \bigwedge_j S_j(\pi_{j_1}, \pi_{j_2})$,

where the graph on {1, . . . , k} defined by the edges $(j_1, j_2)$, one for each atom $S_j(\pi_{j_1}, \pi_{j_2})$, is acyclic

Acyclic ECRPQ(RAT) is not more expensive than ECRPQ(REG):

Theorem (B., Figueira, Libkin (2012))

◮ Evaluation of acyclic ECRPQ(RAT) queries is Pspace-complete
◮ It is in NLogspace in data complexity

slide-96
SLIDE 96

STRING SOLVING:

Applying previous ideas

slide-99
SLIDE 99

The problem we study

We study satisfiability for conjunctions of:

◮ Atomic relational constraints:

y = x1 · · · xn | R(x, y)

◮ Boolean combinations of regular expressions:

L(x) | ϕ ∧ ψ | ¬ϕ

Example: x = w1yw2zw3 ∧ R(y, z) ∧ ¬S(z)

This class is

◮ Useful: Encodes transductions often used in web security applications, e.g., replace-all
◮ Very expressive: Subsumes word equations with rational constraints

slide-103
SLIDE 103

In full generality the problem is undecidable

Proposition

Satisfiability of expressions R(x, x) is undecidable

Idea: Use acyclicity restrictions as we did for ECRPQ(RAT)

But not just on the graph defined by rational relations ...

◮ R(x, x) is equivalent to x = y ∧ R(x, y)
◮ Satisfiability of formulas of the form x = yz ∧ R(x, z), for R a regular relation, is undecidable [B., Figueira, Libkin (2013)]

The notion of acyclicity needs to consider expressions y = x1 · · · xn

slide-105
SLIDE 105

Acyclicity restriction

We write R(x, y) as y = R(x)

The straight line (SL) fragment:

$\bigwedge_{i=1}^{m} x_i = P(x_1, \ldots, x_{i-1})$,

such that $P(x_1, \ldots, x_{i-1})$ is either $L(x_j)$ or $x_{j_1} \cdots x_{j_n}$, for $\{x_j, x_{j_1}, \ldots, x_{j_n}\} \subseteq \{x_1, \ldots, x_{i-1}\}$.

Example: The formula x = yz ∧ R(x, y) is not in SL, while the formula x = w1yw2zw3 ∧ R(y, z) is in SL
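
One way to read the SL condition is as acyclicity of the "is defined from" dependencies among variables. The Python sketch below checks that reading on the two example formulas. The input encoding (a list of definitions, each giving a variable and the variables on its right-hand side) and the function name are assumptions; constants such as w1, w2, w3 are ignored, and this is a simplification rather than the exact syntactic test of the cited work.

```python
def is_straight_line(definitions):
    """definitions: list of (defined_variable, [variables used to build it]).

    Returns True if every variable is defined at most once and the variables can be
    ordered x1, ..., xm so that each definition only uses earlier variables.
    """
    deps = {}
    for var, used in definitions:
        if var in deps:                 # defined twice: not of the shape x_i = P(...)
            return False
        deps[var] = set(used)

    state = {}                          # var -> "visiting" or "done"

    def visit(v):
        if state.get(v) == "done":
            return True
        if state.get(v) == "visiting":  # back edge: cyclic dependency
            return False
        state[v] = "visiting"
        ok = all(visit(u) for u in deps.get(v, ()))
        state[v] = "done"
        return ok

    return all(visit(v) for v in deps)

# x = yz and R(x, y), written as y = R(x): x depends on y and y depends on x.
not_sl = [("x", ["y", "z"]), ("y", ["x"])]
# x = w1 y w2 z w3 and R(y, z), written as z = R(y).
sl = [("x", ["y", "z"]), ("z", ["y"])]
```

Here `is_straight_line(not_sl)` detects the cycle between x and y, while `sl` admits the order y, z, x.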

slide-107
SLIDE 107

The main result

Theorem (Lin, B. (2016))

Satisfiability of expressions in SL is Expspace-complete

Proof idea for upper bound:

◮ Replace concatenations in the expression ϕ with “exponentially big” DNF expressions consisting exclusively of regular expressions and regular relations x = y
◮ If ϕ ∈ SL, then the resulting expression ϕ′ is acyclic in the sense studied for ECRPQ(RAT)
◮ Check satisfiability of ϕ′ in Pspace, i.e., in Expspace in terms of the size of the input ϕ

slide-109
SLIDE 109

A better behaved fragment

SLk: Restriction of SL to expressions of depth k ≥ 1

◮ Depth of a variable x is the number of variables on which x depends
◮ Depth of an expression is the maximum depth of a variable

Theorem (Lin, B. (2016))

Satisfiability of expressions in SLk is Pspace-complete

slide-110
SLIDE 110

FINAL REMARKS

slide-112
SLIDE 112

Graph DB query languages and string verification share:

◮ interest in expressing complex interactions among words
◮ understanding which restrictions on such problems can lead to practical tools in real-world applications

I presented some interactions between graph DBs, string verification, and word equations, but others are also possible.

◮ Graph QLs with arithmetic expressions:

  • Require applying tools based on Presburger arithmetic and bounded-reversal counter automata [B., Libkin, Lin, Wood (2012)]

◮ Monadic decomposability:

  • Can a regular relation be expressed as a Boolean combination of products of regular languages? [B., Hong, Le, Li, Niskanen (2019)]
  • Related to boundedness problems for recursive query languages

slide-113
SLIDE 113

THANKS