SLIDE 1

Static analysis over tree-structured data using graph decompositions

Filip Murlak

University of Warsaw, Poland

Contains joint work with Mikołaj Bojańczyk, Wojciech Czerwiński, Claire David, Filip Mazowiecki, Paweł Parys, and Adam Witkowski.

ALCOP 2017 Glasgow, Scotland

SLIDE 2

Problems
Old solutions
New solution
More problems with solutions
Some problems without solutions

SLIDE 3

Data

SLIDE 5

Data trees

[Figure: a data tree whose nodes carry (label, data value) pairs: (a, 2), (c, 7), (c, 3), (b, 7), (a, 1), (a, 5), (a, 1), (b, 0)]

trees: finite, unranked, ordered
labels: a, b, c, ... from a finite alphabet (tags)
data values: 0, 1, 2, ... from an infinite data domain (contents)
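
A minimal sketch of a data tree as a Python structure may make the definition concrete. The class name `DataNode`, the helper `nodes`, and the shape of the example tree are assumptions made for illustration; only the (label, value) pairs come from the figure.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    label: str                                     # tag from a finite alphabet
    value: int                                     # data value from an infinite domain
    children: list = field(default_factory=list)   # ordered, any number (unranked)


def nodes(tree):
    """Yield all nodes of the tree in pre-order."""
    yield tree
    for child in tree.children:
        yield from nodes(child)


# Hypothetical arrangement of the eight (label, value) pairs from the figure.
tree = DataNode("a", 2, [
    DataNode("c", 7, [DataNode("a", 1)]),
    DataNode("c", 3),
    DataNode("b", 7, [DataNode("a", 5), DataNode("a", 1), DataNode("b", 0)]),
])

labels_and_values = [(n.label, n.value) for n in nodes(tree)]
```

Two distinct nodes may carry the same data value (here 7 and 1 each occur twice), which is exactly what the relation ∼ used later compares.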

SLIDE 7

Schemas describe allowed shapes of data trees

Define several types of trees, each specified (recursively) by

◮ the label of the root,
◮ possible sequences of immediate subtree types (regexp);

and choose some of the types as allowed.

Example: a-only path from root to leaf, b’s elsewhere

◮ type τ: root label a, immediate subtree types σ∗τσ∗ + ε;
◮ type σ: root label b, immediate subtree types σ∗;
◮ choose: τ.
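
The example schema can be checked mechanically. The sketch below is one possible encoding, not the talk's: type names are the single letters `t` (for τ) and `s` (for σ), child sequences are matched with Python's `re` module, and the code assumes that distinct types have distinct root labels, which holds in this example but not for schemas in general.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


# Type name -> (root label, regular expression over the children's type names).
SCHEMA = {
    "t": ("a", r"(s*ts*)?"),   # τ: σ*τσ* + ε
    "s": ("b", r"s*"),         # σ: σ*
}
ALLOWED = {"t"}


def type_of(node):
    """Return the type of the subtree rooted at `node`, or None if it has none.

    Assumption: each root label belongs to at most one type, so the type is unique.
    """
    child_types = [type_of(c) for c in node.children]
    if None in child_types:
        return None
    word = "".join(child_types)
    for name, (label, children_regex) in SCHEMA.items():
        if node.label == label and re.fullmatch(children_regex, word):
            return name
    return None


def conforms(tree):
    return type_of(tree) in ALLOWED


good = Node("a", [Node("b"), Node("a"), Node("b", [Node("b")])])
bad = Node("a", [Node("a"), Node("a")])
```

Here `conforms(good)` holds because the a-labelled nodes form a single root-to-leaf path, while `bad` has two a-children and is rejected.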

SLIDE 8

Conjunctive queries over data trees

[Figure: a tree pattern with nodes labelled a, a, c, matched in the data tree with nodes (a, 2), (c, 7), (c, 3), (b, 7), (a, 1), (a, 5), (a, 1), (b, 0)]

∃x1 · · · ∃x5 child(x1, x2) ∧ child(x2, x3) ∧ child(x3, x4) ∧ desc(x1, x5) ∧ desc(x5, x4) ∧ a(x1) ∧ a(x4) ∧ c(x5) ∧ x2 ∼ x3
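
To see what satisfaction of such a query means, the sketch below evaluates the query above by brute force on a small tree. The tree (given by the dictionaries `parent`, `label`, `value`) is a hypothetical example, not the one from the figure, and the exhaustive search is only meant to mirror the semantics, not to be efficient.

```python
from itertools import product

# Hypothetical data tree: a path 1 - 2 - 3 - 4 plus an extra child 5 of the root.
parent = {1: None, 2: 1, 3: 2, 4: 3, 5: 1}
label = {1: "a", 2: "c", 3: "b", 4: "a", 5: "b"}
value = {1: 2, 2: 7, 3: 7, 4: 1, 5: 0}


def child(u, v):
    return parent[v] == u


def desc(u, v):
    """True iff v is a proper descendant of u."""
    while parent[v] is not None:
        v = parent[v]
        if v == u:
            return True
    return False


def query(x1, x2, x3, x4, x5):
    return (child(x1, x2) and child(x2, x3) and child(x3, x4)
            and desc(x1, x5) and desc(x5, x4)
            and label[x1] == "a" and label[x4] == "a" and label[x5] == "c"
            and value[x2] == value[x3])          # x2 ∼ x3


def satisfied():
    """Try every assignment of tree nodes to the variables x1, ..., x5."""
    return any(query(*xs) for xs in product(parent, repeat=5))
```

On this tree the assignment x1 = 1, x2 = x5 = 2, x3 = 3, x4 = 4 is a witness; variables need not be mapped to distinct nodes.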

SLIDE 9

Datalog on data trees

[Figure: a tree with nodes labelled a, a, ..., a, b and c, c, c, a, c, b]

p(x) ← a(x) ∧ desc(x, y) ∧ c(y) ∧ x ∼ y ∧ child(x, z) ∧ p(z)
p(x) ← b(x)

extensional predicates: child, desc, ∼, a, b, c, ...;
intensional predicates: defined recursively using conjunctive queries;
monadic: only unary intensional predicates;
linear: at most one intensional atom per rule.
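
The two rules for p can be evaluated by a naive fixpoint iteration: keep adding nodes to p until no rule fires any more. The sketch below hard-codes the two rules above; the tree encoded in `parent`, `label`, `value` is a hypothetical example.

```python
# Hypothetical data tree: 1 -> 2 -> {3, 4}, 3 -> 5.
parent = {1: None, 2: 1, 3: 2, 4: 2, 5: 3}
label = {1: "a", 2: "a", 3: "b", 4: "c", 5: "c"}
value = {1: 5, 2: 3, 3: 0, 4: 3, 5: 5}


def child(u, v):
    return parent[v] == u


def desc(u, v):
    """True iff v is a proper descendant of u."""
    while parent[v] is not None:
        v = parent[v]
        if v == u:
            return True
    return False


def evaluate_p():
    """Least fixpoint of the two rules defining the intensional predicate p."""
    p = set()
    changed = True
    while changed:
        changed = False
        for x in parent:
            if x in p:
                continue
            base = label[x] == "b"                                   # p(x) ← b(x)
            step = (label[x] == "a"                                  # recursive rule
                    and any(desc(x, y) and label[y] == "c" and value[y] == value[x]
                            for y in parent)
                    and any(child(x, z) and z in p for z in parent))
            if base or step:
                p.add(x)
                changed = True
    return p
```

On this tree the iteration adds node 3 first (label b), then node 2, then node 1, each a-node finding a c-descendant with its own data value.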

SLIDE 11

Static analysis problems

Satisfiability: Is query P (CQ, UCQ, Datalog, FO, etc.) satisfied in some data tree (conforming to a given schema)?
Equivalence: Are queries P, Q equivalent on all data trees?
Containment: Does P imply Q on all data trees?

The staple of data management: query optimization, consistency tests, evaluation modulo constraints, constraint entailment, ...

By Trakhtenbrot’s theorem, all undecidable for FO queries.

P sat iff not P ⇔ ⊥ iff not P ⇒ ⊥
P ∧ ¬Q, Q ∧ ¬P unsat iff P ⇔ Q iff P ⇒ Q, Q ⇒ P
P ∧ ¬Q unsat iff P ⇔ P ∧ Q iff P ⇒ Q

SLIDE 12

Old solutions

SLIDE 15

Containment of CQs over arbitrary structures

[Chandra, Merlin ’77]

Def: for Q ∈ CQ, the structure A_Q has universe Var_Q and relations given by the atoms of Q.
Fact: A ⊨ Q iff there exists a homomorphism h: A_Q → A.
Thm: P ⇒ Q iff there exists a homomorphism g: A_Q → A_P.

[Diagram: A_Q → A_P → A]

(⇐) If g: A_Q → A_P and h: A_P → A, then h ∘ g: A_Q → A.
(⇒) A_P ⊨ P and P ⇒ Q, so A_P ⊨ Q. Hence there exists h: A_Q → A_P.

To decide containment, test existence of a homomorphism.
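
The homomorphism test can be sketched directly. In the code below a Boolean CQ is a list of atoms `(relation, variables)`, so the list itself plays the role of the structure built from the query; the function names and the two example queries are illustrative assumptions, and the search is plain brute force.

```python
from itertools import product


def variables(atoms):
    return sorted({v for _, args in atoms for v in args})


def homomorphism_exists(source, target):
    """Is there a map from the variables of `source` to the variables of `target`
    that sends every atom of `source` to an atom of `target`?"""
    target_atoms = set(target)
    src_vars, tgt_vars = variables(source), variables(target)
    for image in product(tgt_vars, repeat=len(src_vars)):
        h = dict(zip(src_vars, image))
        if all((rel, tuple(h[v] for v in args)) in target_atoms
               for rel, args in source):
            return True
    return False


def contained(p, q):
    """P implies Q iff the structure of Q maps homomorphically into that of P."""
    return homomorphism_exists(q, p)


# Hypothetical queries: P asks for a directed triangle, Q for a single edge.
P = [("E", ("x", "y")), ("E", ("y", "z")), ("E", ("z", "x"))]
Q = [("E", ("u", "v"))]
```

Every structure with a triangle has an edge, so `contained(P, Q)` holds, while `contained(Q, P)` does not: a triangle cannot be folded onto a single directed edge.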

SLIDE 18

Containment for UCQs over trees without data

[Miklau, Suciu ’04] Each UCQ is equivalent to a union of tree-shaped CQs:

[Figure: pattern b, a, c ≡ pattern a, b, c ∨ pattern a, c, b]

For a tree-shaped CQ π build an equivalent tree automaton:

◮ it computes bottom-up the set of matched subtrees of π;
◮ knowing which subtrees of π match at the children of node v or strictly below, one can tell which match at v or strictly below.

Tree automata are effectively closed under Boolean combinations.
Test emptiness of the automaton corresponding to P ∧ ¬Q.
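
The bottom-up computation can be sketched as a recursive function whose return value plays the role of the automaton state. The pattern encoding (`PATTERN`, with `child` and `desc` edges) and the example trees are assumptions for illustration; the state here records both the subpatterns matched at the node and those matched at the node or strictly below.

```python
from dataclasses import dataclass, field


@dataclass
class Tree:
    label: str
    children: list = field(default_factory=list)


# Tree-shaped CQ: pattern node i -> (label, edges to other pattern nodes).
# Hypothetical pattern: an a-node with a c-child and a b-descendant.
PATTERN = {
    0: ("a", [("child", 1), ("desc", 2)]),
    1: ("c", []),
    2: ("b", []),
}


def run(tree):
    """Return (at, below): pattern nodes matched at the root of `tree`,
    and pattern nodes matched at the root or strictly below it."""
    results = [run(c) for c in tree.children]
    at_children = set().union(*(at for at, _ in results))
    below_children = set().union(*(below for _, below in results))
    at = set()
    for i, (label, edges) in PATTERN.items():
        if label == tree.label and all(
            j in (at_children if axis == "child" else below_children)
            for axis, j in edges
        ):
            at.add(i)
    return at, at | below_children


def matches(tree, root_pattern=0):
    """The query holds iff its root pattern node is matched somewhere in the tree."""
    return root_pattern in run(tree)[1]


yes = Tree("a", [Tree("c"), Tree("d", [Tree("b")])])
no = Tree("a", [Tree("d", [Tree("c"), Tree("b")])])
```

Since the state at a node depends only on the node's label and the states of its children, and there are finitely many possible states, this is a finite bottom-up tree automaton.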

SLIDE 20

Containment for UCQs over data trees

[Björklund, Martens, Schwentick ’08]

Can restrict to trees with data values c_1, ..., c_|P| and distinct nulls.

◮ Let T be a tree satisfying P and not Q.
◮ P touches ≤ |P| data values in T; replace them with c_1, ..., c_|P|.
◮ In each node not touched by P put a unique fresh data value.
◮ The resulting tree T′ still satisfies P and not Q.

In such trees, x ∼ y holds iff either x = y, or x ∼ c_i and y ∼ c_i for some i.
By considering all possibilities, replace P, Q with P′, Q′ using only x = y, x ∼ c_i, y ∼ c_i.
Check containment over the finite alphabet Σ × {⊥, c_1, ..., c_|P|}.
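
The relabelling step can be sketched as follows. The function `normalise`, the encoding of constants as `("c", i)` and of nulls as `("null", node)`, and the example values are all assumptions for illustration; `touched` stands for the set of nodes used by some match of P.

```python
def normalise(value, touched):
    """Replace data values so that only nodes touched by the match share values.

    value:   dict node -> original data value
    touched: set of nodes in the image of a match of P
    """
    constants = {}          # original value -> ("c", i)
    result = {}
    for node, v in value.items():
        if node in touched:
            constants.setdefault(v, ("c", len(constants) + 1))
            result[node] = constants[v]
        else:
            result[node] = ("null", node)   # unique fresh value per untouched node
    return result


def same_value(result, x, y):
    """x ∼ y in the normalised tree."""
    return result[x] == result[y]


# Hypothetical example: nodes 2 and 3 are touched and share the value 7.
original = {1: 2, 2: 7, 3: 7, 4: 1, 5: 7}
normalised = normalise(original, touched={2, 3})
```

Node 5 originally carried the value 7 as well, but it is not touched, so it receives its own null. Removing equalities in this way cannot make a positive query such as Q become true.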

SLIDE 23

Equivalence for Datalog

Equivalence for Datalog is undecidable:

◮ with descendant [Abiteboul, Bourhis, Muscholl, Wu 2013]
◮ for non-linear programs [Mazowiecki, Murlak, Witkowski 2014]
◮ for non-monadic programs (descendant is easily simulated).

Theorem (Mazowiecki, Murlak, Witkowski 2014)

Equivalence for linear monadic Datalog without desc is decidable.

Can’t we restrict reused data values like before?

◮ Let T be a tree satisfying P and not Q.
◮ Then T satisfies some CQ P0, an unravelling of P.
◮ P0 touches ≤ |P0| data values in T, like before,
◮ but P0 can be arbitrarily large...
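
To illustrate why unravellings are unbounded, the sketch below unfolds the recursive rule of the program p from the "Datalog on data trees" slide n times and then applies the base rule. The function name `unravel` and the variable naming scheme are assumptions; atoms are produced as plain strings.

```python
def unravel(n):
    """CQ obtained by applying the recursive rule for p n times, then p(x) ← b(x)."""
    atoms = []
    for i in range(n):
        atoms += [
            f"a(x{i})",
            f"desc(x{i}, y{i})",
            f"c(y{i})",
            f"x{i} ~ y{i}",
            f"child(x{i}, x{i + 1})",
        ]
    atoms.append(f"b(x{n})")
    return atoms


def data_comparisons(n):
    """Number of ∼ atoms in the n-th unravelling."""
    return sum(1 for atom in unravel(n) if "~" in atom)
```

The n-th unravelling has 5n + 1 atoms and n data comparisons, so no single bound on the number of touched data values works for all unravellings.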

SLIDE 24

Example

[Figure: a data tree for N = 3 with nodes labelled a and b, including (a, 1), (b, 2), (a, 3), (b, 4), (a, 5), (b, 6), (a, 7), (b, 8), and nodes (c, 1), ..., (c, 8)]

N = 3

P ← DOWN_0(x)
DOWN_i(x) ← child(x, y) ∧ a(y) ∧ DOWN_{i+1}(y)
DOWN_N(x) ← UP_N(x) ∧ (N+1)-parent(x, y) ∧ child(y, z) ∧ c(z) ∧ x ∼ z
UP_i(x) ← a(x) ∧ parent(x, y) ∧ child(y, z) ∧ b(z) ∧ DOWN_i(z)
UP_i(x) ← b(x) ∧ parent(x, y) ∧ UP_{i−1}(y)
UP_0(x) ← true

Q ← x ∼ y ∧ i-parent(x, x′) ∧ i-parent(y, y′) ∧ a(x′) ∧ b(y′)

SLIDE 25

New solution

SLIDE 26

Clique-width

Instead of processing structures, process their hierarchical decompositions (derivations).

Construct (derive) coloured structures using operations:

i – create a new node of colour i;
R(i1, ..., ir) – add to R all tuples of nodes with colours (i1, ..., ir);
i → j – change colour i to j;
⊕ – take disjoint union of two structures.

clique-width(A) = least number of colours sufficient to construct A
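
The four operations are easy to implement for structures with a single binary relation. The sketch below is an illustrative encoding (the class `Coloured` and the function names are assumptions); it builds a strict linear order on n elements using two colours, following the yellow/red derivation shown in the examples that follow.

```python
from itertools import count

_fresh = count()          # supplies fresh node identifiers


class Coloured:
    """A coloured structure with one binary relation."""

    def __init__(self, colour_of, relation):
        self.colour_of = dict(colour_of)    # node -> colour
        self.relation = set(relation)       # set of pairs of nodes


def new(i):
    """i: create a new node of colour i."""
    return Coloured({next(_fresh): i}, set())


def add(s, i, j):
    """R(i, j): add all pairs (u, v) with u of colour i and v of colour j."""
    pairs = {(u, v)
             for u, cu in s.colour_of.items() if cu == i
             for v, cv in s.colour_of.items() if cv == j}
    return Coloured(s.colour_of, s.relation | pairs)


def recolour(s, i, j):
    """i → j: change colour i to j."""
    return Coloured({u: (j if c == i else c) for u, c in s.colour_of.items()},
                    s.relation)


def union(s, t):
    """⊕: disjoint union (node identifiers are fresh, hence disjoint)."""
    return Coloured({**s.colour_of, **t.colour_of}, s.relation | t.relation)


YELLOW, RED = 0, 1


def linear_order(n):
    s = new(YELLOW)
    for _ in range(n - 1):
        s = union(s, new(RED))        # yellow ⊕ red
        s = add(s, YELLOW, RED)       # every yellow node is below the red one
        s = recolour(s, RED, YELLOW)  # red → yellow
    return s
```

Only two colours are ever in use, whatever the value of n, which is why linear orders have clique-width 2.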

SLIDE 39

Examples

Linear orders: clique-width 2
  repeat for each new element: yellow ⊕ red; yellow ≤ red; red → yellow
Paths: clique-width 3
Trees: clique-width 3
Cographs: clique-width 2
Distance-hereditary graphs: clique-width 3
Graphs of tree-width k: clique-width at most 3 · 2^(k−1)

SLIDE 40

Bounded clique-width means simple

Many NP-complete problems are in P for graphs of bounded clique-width.

Fixed-parameter tractable with clique-width as parameter: time f(k) · n^c on inputs of size n and clique-width at most k, where f is some function, and c is an absolute constant.

Hamiltonicity: Is there a path in graph G that visits each node exactly once?
3-colorability: Can nodes of the graph G be coloured so that each edge connects nodes of different colours?

SLIDE 41

Courcelle’s theorem

Monadic second order logic (MSO)

ϕ, ψ ::= R(x1, ..., xr) | ¬ϕ | ϕ ∧ ψ | ϕ ∨ ψ | ∃x ϕ | ∀x ϕ | ∃X ϕ | ∀X ϕ | X(x)

3-colorability:

∃X1 ∃X2 ∃X3 ( ∀x (X1(x) ∨ X2(x) ∨ X3(x)) ∧ ∀x ∀y (E(x, y) ⇒ ⋀_i ¬(Xi(x) ∧ Xi(y))) )

Theorem (Courcelle)

For every k ∈ N and ϕ ∈ MSO one can construct an automaton recognizing k-derivations yielding models of ϕ.
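
For intuition, the 3-colorability sentence can be evaluated literally on a small graph by enumerating all triples of sets X1, X2, X3. This brute-force sketch only illustrates the semantics of the formula; it is not the automaton construction of the theorem, and the function name and example graphs are assumptions.

```python
from itertools import combinations, product

# All subsets of {1, 2, 3}: for a node x, the set of indices i with Xi(x).
SUBSETS = [frozenset(c) for r in range(4) for c in combinations((1, 2, 3), r)]


def three_colourable(nodes, edges):
    nodes = list(nodes)
    for choice in product(SUBSETS, repeat=len(nodes)):
        member = dict(zip(nodes, choice))          # encodes X1, X2, X3 at once
        covered = all(member[x] for x in nodes)    # ∀x X1(x) ∨ X2(x) ∨ X3(x)
        proper = all(not (member[x] & member[y])   # ∀x ∀y E(x, y) ⇒ ⋀_i ¬(Xi(x) ∧ Xi(y))
                     for x, y in edges)
        if covered and proper:
            return True
    return False


triangle = [(0, 1), (1, 2), (2, 0)]
k4 = [(u, v) for u in range(4) for v in range(4) if u < v]
```

The enumeration takes 8^n steps for n nodes, which is exactly the kind of cost that running an automaton over a derivation avoids.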

SLIDE 42

Courcelle’s theorem applied to parametrized complexity

Theorem (Courcelle)

For every k ∈ N and ϕ ∈ MSO one can construct an automaton recognizing k-derivations yielding models of ϕ.

Corollary

Each set of structures definable in MSO can be decided in polynomial time over graphs of bounded clique-width.

◮ Compute a k-derivation e for the input structure (poly-time);
◮ construct the automaton A for k and the defining formula ϕ;
◮ run the automaton A on e.
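
Running an automaton on a derivation can be illustrated with a very simple property, ∃x ∃y R(x, y), for which the automaton is easy to write by hand. The term encoding (nested tuples tagged `new`, `add`, `recolour`, `union`) and the choice of property are assumptions for illustration; the automaton given by Courcelle's theorem for a general ϕ is far larger.

```python
def state(term):
    """Bottom-up run on a derivation term.

    The state is (set of colours present, whether R is non-empty); it is
    computed without ever building the derived structure.
    """
    op = term[0]
    if op == "new":                       # ("new", i)
        return frozenset([term[1]]), False
    if op == "add":                       # ("add", i, j, subterm)
        colours, nonempty = state(term[3])
        return colours, nonempty or (term[1] in colours and term[2] in colours)
    if op == "recolour":                  # ("recolour", i, j, subterm)
        colours, nonempty = state(term[3])
        if term[1] in colours:
            colours = (colours - {term[1]}) | {term[2]}
        return colours, nonempty
    if op == "union":                     # ("union", left, right)
        c1, n1 = state(term[1])
        c2, n2 = state(term[2])
        return c1 | c2, n1 or n2
    raise ValueError(f"unknown operation {op!r}")


def accepts(term):
    return state(term)[1]


with_edge = ("add", 0, 1, ("union", ("new", 0), ("new", 1)))
without_edge = ("add", 0, 1, ("union", ("new", 0), ("new", 0)))
```

For a fixed number k of colours there are finitely many states, and each operation updates the state in constant time, so the run is linear in the size of the derivation.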

SLIDE 43

Courcelle’s theorem applied to static analysis

Theorem (Courcelle)

For every k ∈ N and ϕ ∈ MSO one can construct an automaton recognizing k-derivations yielding models of ϕ.

Corollary

For every k ∈ N, it is decidable if given ϕ ∈ MSO has a model of clique-width at most k.

◮ Construct the automaton A for k and the formula ϕ;
◮ test emptiness of the automaton A (poly-time).
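
Emptiness of a bottom-up automaton is a reachability question: saturate the set of states that some derivation can produce and check whether an accepting one is among them. The sketch below does this for a hand-written automaton whose states are pairs (colours present, whether R is non-empty); this automaton and the function names are assumptions chosen for illustration.

```python
from itertools import product


def reachable_states(k):
    """All states produced by some derivation using colours 0, ..., k-1."""
    states = {(frozenset([i]), False) for i in range(k)}          # new nodes
    while True:
        new = set(states)
        for colours, nonempty in states:
            for i, j in product(range(k), repeat=2):
                # R(i, j): the relation becomes non-empty if both colours occur
                new.add((colours, nonempty or (i in colours and j in colours)))
                # i → j: colour i disappears and colour j appears
                if i in colours:
                    new.add(((colours - {i}) | {j}, nonempty))
        for (c1, n1), (c2, n2) in product(states, repeat=2):
            new.add((c1 | c2, n1 or n2))                            # ⊕
        if new == states:
            return states
        states = new


def nonempty_language(k, accepting):
    """Does the automaton accept at least one k-derivation?"""
    return any(accepting(s) for s in reachable_states(k))
```

The saturation adds at least one state per round and the state space is finite, so it stops after polynomially many rounds in the size of the automaton.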

SLIDE 44

Datalog containment via bounded clique-width

[Bojańczyk, Murlak, Witkowski ’15]

Theorem

Let P, Q be monadic, linear Datalog programs without descendant. If P ∧ ¬Q is satisfiable, it is satisfiable in a data tree of clique-width at most 10 · |P|^2.

Corollary

Containment for linear monadic Datalog programs without descendant is decidable.

◮ Rewrite monadic programs P, Q into ϕ_P, ϕ_Q ∈ MSO.
◮ Write ϕ_datatree ∈ MSO saying that the structure is a data tree.
◮ Test satisfiability of ϕ_P ∧ ¬ϕ_Q ∧ ϕ_datatree.
◮ For tight complexity, adjust Courcelle’s theorem to Datalog.

SLIDE 45

More problems with solutions

SLIDE 46

Containment for downward Datalog

[Bojańczyk, Murlak, Witkowski ’15]

A monadic Datalog program is downward if in all rules for S(x), all mentioned nodes are descendants of x.

Theorem

Let P, Q be downward Datalog programs. If P ∧ ¬Q is satisfiable, it is satisfiable in a data tree of clique-width at most 5 · |P|.

Corollary

Containment for downward Datalog programs is decidable.

SLIDE 47

Non-mixing constraints

[Czerwiński, David, Murlak, Parys ’16]

In database systems, correctness of data is expressed with integrity constraints:

ϕ(x̄) ⇒ α∼(x̄) and ϕ(x̄) ⇒ α≁(x̄)

with ϕ ∈ UCQ(child, desc, Σ), α∼ ∈ UCQ(∼), α≁ ∈ UCQ(≁).

Validity: Does each data tree of schema S satisfy the set ∆ of non-mixing constraints?
Entailment: Does each data tree of schema S that satisfies ∆ also satisfy the constraint δ?

Theorem

Both problems allow counter-examples of bounded clique-width.

SLIDE 48

Some problems without solutions

SLIDE 49

Open problems

Containment of Datalog programs

◮ in the presence of a schema;
◮ with sibling order.

Non-mixing constraints with

◮ free use of comparisons with constants;
◮ Skolem functions.