Static analysis over tree-structured data using graph decompositions

Static analysis over tree-structured data using graph decompositions
Filip Murlak, University of Warsaw, Poland
Contains joint work with Mikołaj Bojańczyk, Wojciech Czerwiński, Claire David, Filip Mazowiecki, Paweł Parys, and Adam Witkowski.
ALCOP 2017, Glasgow, Scotland

◮ Problems
◮ Old solutions
◮ New solution
◮ More problems with solutions
◮ Some problems without solutions

Data trees
[Figure: a data tree whose nodes carry label-value pairs (a, 2), (c, 7), (a, 1), (c, 3), (b, 7), (b, 0), (a, 1), (a, 5).]
trees: finite, unranked, ordered
labels: a, b, c, ... from a finite alphabet (tags)
data values: 0, 1, 2, ... from an infinite data domain (contents)

Schemas describe allowed shapes of data trees
Define several types of trees, each specified (recursively) by
◮ the label of the root,
◮ possible sequences of immediate subtree types (regexp);
and choose some of the types as allowed.
Example: a-only path from root to leaf, b's elsewhere
◮ type $\tau$: root label a, immediate subtree types $\sigma^* \tau \sigma^* + \varepsilon$;
◮ type $\sigma$: root label b, immediate subtree types $\sigma^*$;
◮ choose: $\tau$.
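The example schema can be checked mechanically. The following is a minimal sketch, not taken from the talk: it assumes trees are encoded as `(label, children)` pairs, writes the types $\tau$ and $\sigma$ as the single letters `t` and `s` so that Python's `re` module can play the role of the regexp, and tries all type assignments for the children by brute force. The names `SCHEMA`, `ALLOWED`, `types_of` and `conforms` are illustrative.

```python
import re
from itertools import product

# Hypothetical encoding of the example schema:
# type name -> (root label, regexp over the types of the immediate subtrees).
SCHEMA = {
    "t": ("a", r"s*ts*|"),  # tau: sigma* tau sigma* + epsilon
    "s": ("b", r"s*"),      # sigma: sigma*
}
ALLOWED = {"t"}  # the types chosen as allowed


def types_of(tree):
    """Return the set of schema types that the tree (label, children) has."""
    label, children = tree
    child_types = [types_of(child) for child in children]
    result = set()
    for name, (root_label, regex) in SCHEMA.items():
        if label != root_label:
            continue
        # Brute force: pick one type per child and test the resulting word.
        for word in product(*child_types):
            if re.fullmatch(regex, "".join(word)):
                result.add(name)
                break
    return result


def conforms(tree):
    """A tree conforms to the schema if it has at least one allowed type."""
    return bool(types_of(tree) & ALLOWED)
```

Data values are left out because the schema only constrains labels and shape. The brute-force product is exponential in the number of children; a practical checker would run the regexp automaton over sets of types instead.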

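Conjunctive queries over data trees
[Figure: a tree pattern with nodes labelled a, c, a mapped into a data tree with nodes (a, 2), (c, 7), (a, 1), (c, 3), (b, 7), (b, 0), (a, 1), (a, 5).]
$$\exists x_1 \cdots \exists x_5\; \mathrm{child}(x_1, x_2) \land \mathrm{child}(x_2, x_3) \land \mathrm{child}(x_3, x_4) \land \mathrm{desc}(x_1, x_5) \land \mathrm{desc}(x_5, x_4) \land a(x_1) \land a(x_4) \land c(x_5) \land x_2 \sim x_3$$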

Datalog on data trees
[Figure: tree patterns over labels a, b, c illustrating the rules below.]
$p(x) \leftarrow a(x) \land \mathrm{desc}(x, y) \land c(y) \land x \sim y \land \mathrm{child}(x, z) \land p(z)$
$p(x) \leftarrow b(x)$
extensional predicates: child, desc, $\sim$, a, b, c, ...;
intensional predicates: defined recursively using conjunctive queries;
monadic: only unary intensional predicates;
linear: at most one intensional atom per rule.
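To make the semantics of the two rules concrete, here is a small sketch of a naive fixpoint evaluation of this particular program. It is an illustration under stated assumptions rather than the method from the talk: the tree is given as a dictionary `node id -> (label, data value, parent id)`, `desc` is read as proper descendant, and the function name `evaluate` is made up for the example.

```python
def evaluate(nodes):
    """Naive fixpoint for the program
         p(x) <- a(x), desc(x, y), c(y), x ~ y, child(x, z), p(z)
         p(x) <- b(x)
    nodes: id -> (label, data value, parent id or None).
    Returns the set of node ids for which p holds."""
    # Extensional relations: child, and desc as its transitive closure.
    child = {(parent, v) for v, (_, _, parent) in nodes.items() if parent is not None}
    desc = set(child)
    while True:
        extra = {(x, z) for (x, y) in desc for (y2, z) in child if y == y2} - desc
        if not extra:
            break
        desc |= extra

    p = set()
    while True:
        # Second rule: every b-node satisfies p.
        derived = {x for x, (label, _, _) in nodes.items() if label == "b"}
        # First rule: an a-node with an equal-valued c-descendant and a child in p.
        derived |= {
            x
            for x, (label, value, _) in nodes.items()
            if label == "a"
            and any((x, y) in desc and nodes[y][0] == "c" and nodes[y][1] == value
                    for y in nodes)
            and any((x, z) in child and z in p for z in nodes)
        }
        if derived <= p:
            return p
        p |= derived
```

The program is monadic (`p` is unary) and linear (each rule body contains at most one occurrence of `p`), which is why a single set `p` suffices as the state of the iteration.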

Static analysis problems
Satisfiability: Is query P (CQ, UCQ, Datalog, FO, etc.) satisfied in some data tree (conforming to a given schema)?
Equivalence: Are queries P, Q equivalent on all data trees?
Containment: Does P imply Q on all data trees?
The staple of data management: query optimization, consistency tests, evaluation modulo constraints, constraint entailment, ...
By Trakhtenbrot's theorem, all undecidable for FO queries.
◮ $P$ sat iff not $P \Leftrightarrow \bot$ iff not $P \Rightarrow \bot$
◮ $P \land \lnot Q$, $Q \land \lnot P$ unsat iff $P \Leftrightarrow Q$ iff $P \Rightarrow Q$, $Q \Rightarrow P$
◮ $P \land \lnot Q$ unsat iff $P \Leftrightarrow P \land Q$ iff $P \Rightarrow Q$

Containment of CQs over arbitrary structures [Chandra, Merlin '77]
Def: For $Q \in \mathrm{CQ}$, the structure $A_Q$ has universe $\mathrm{Var}_Q$ and relations given by the atoms of $Q$.
Fact: $A \models Q$ iff there exists a homomorphism $h : A_Q \to A$.
Thm: $P \Rightarrow Q$ iff there exists a homomorphism $g : A_Q \to A_P$.
($\Leftarrow$) If $g : A_Q \to A_P$ and $h : A_P \to A$, then $h \circ g : A_Q \to A$.
($\Rightarrow$) $A_P \models P$ and $P \Rightarrow Q$, so $A_P \models Q$. Hence there exists $h : A_Q \to A_P$.
To decide containment, test existence of a homomorphism.
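The homomorphism test can be sketched in a few lines. This is a brute-force illustration for Boolean CQs over arbitrary structures, so all relations are uninterpreted; the encoding of a query as a list of `(predicate, argument tuple)` atoms and the names `variables` and `contained_in` are assumptions made for the example.

```python
from itertools import product


def variables(cq):
    """Universe of the canonical structure A_Q: the variables of the query."""
    return sorted({v for _, args in cq for v in args})


def contained_in(p, q):
    """Decide P => Q by searching for a homomorphism g : A_Q -> A_P.

    A query is a list of atoms (predicate, (var, ...)). The canonical
    structure is the query itself, read as a set of facts over its variables."""
    p_atoms = set(p)
    q_vars, p_vars = variables(q), variables(p)
    # Try every mapping from the variables of Q to the variables of P.
    for image in product(p_vars, repeat=len(q_vars)):
        g = dict(zip(q_vars, image))
        if all((pred, tuple(g[v] for v in args)) in p_atoms for pred, args in q):
            return True
    return False
```

For instance, a directed triangle implies the existence of an edge, but not the other way round. The search is exponential in the size of Q, in line with the NP-completeness of CQ containment.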

Containment for UCQs over trees without data [Miklau, Suciu '04]
Each UCQ is equivalent to a union of tree-shaped CQs:
[Figure: a pattern over labels a, b, c shown equivalent ($\equiv$) to a disjunction ($\lor$) of tree-shaped patterns.]
For a tree-shaped CQ $\pi$ build an equivalent tree automaton:
◮ it computes bottom-up the set of matched subtrees of $\pi$;
◮ knowing which subtrees of $\pi$ match at the children of node $v$ or strictly below, one can tell which match at $v$ or strictly below.
Tree automata are effectively closed under Boolean combinations.
Test emptiness of the automaton corresponding to $P \land \lnot Q$.
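The bottom-up computation can be illustrated by running it directly on a single tree. The sketch below is not the automaton construction itself: it only computes the automaton's state at each node, assuming a pattern is a nested tuple `(label, ((axis, subpattern), ...))` with `axis` either `"child"` or `"desc"`, a tree is `(label, children)`, and sibling order is ignored. The names `subpatterns`, `run` and `satisfies` are illustrative.

```python
def subpatterns(pattern):
    """Enumerate the pattern and all of its subpatterns."""
    yield pattern
    for _, sub in pattern[1]:
        yield from subpatterns(sub)


def run(tree, pattern):
    """Return the state at the root of the tree as a pair (at, below):
    at    = subpatterns of `pattern` matched exactly at this node,
    below = subpatterns matched at this node or strictly below it."""
    label, children = tree
    states = [run(child, pattern) for child in children]
    at = set()
    for p_label, edges in set(subpatterns(pattern)):
        if p_label != label:
            continue
        # A child edge needs a match at some child;
        # a desc edge needs a match at some child or strictly below it.
        if all(
            any(sub in (s[0] if axis == "child" else s[1]) for s in states)
            for axis, sub in edges
        ):
            at.add((p_label, edges))
    below = set(at)
    for s in states:
        below |= s[1]
    return at, below


def satisfies(tree, pattern):
    """The Boolean query holds if the whole pattern matches somewhere in the tree."""
    return pattern in run(tree, pattern)[1]
```

The state depends only on the node's label and on the states of its children, which is what allows the same computation to be packaged as a finite tree automaton whose states are pairs of sets of subpatterns.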

Containment for UCQs over data trees [Björklund, Martens, Schwentick '08]
Can restrict to trees with data values $c_1, \dots, c_{\|P\|}$ and distinct nulls.
◮ Let $T$ be a tree satisfying $P$ and not $Q$.
◮ $P$ touches $\le \|P\|$ data values in $T$; replace with $c_1, \dots, c_{\|P\|}$.
◮ In each node not touched by $P$ put a unique fresh data value.
◮ The resulting tree $T'$ still satisfies $P$ and not $Q$.
In such trees, $x \sim y$ holds iff either $x = y$ or $x \sim c_i$ and $y \sim c_i$ for some $i$.
By considering all possibilities, replace $P$, $Q$ with $P'$, $Q'$ using only $x = y$, $x \sim c_i$, $y \sim c_i$.
Check containment over the finite alphabet $\Sigma \times \{\bot, c_1, \dots, c_n\}$.
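The relabelling step in the bullets above can be sketched as follows. The encoding is an assumption for the sake of illustration: data values are given as a dictionary from node ids to values, the nodes touched by a match of P are given as a set, constants are written as strings `c1`, `c2`, ..., and each fresh null is named after its node. The function name `normalise` is hypothetical.

```python
def normalise(data, touched):
    """Rename the data values of a tree as in the normalisation argument.

    data:    node id -> data value
    touched: ids of the nodes used by some match of P
    Returns node id -> new value, where touched nodes carry constants
    c1, c2, ... (equal old values give equal constants) and every other
    node carries its own fresh null."""
    constants = {}
    result = {}
    for node in sorted(data):
        if node in touched:
            value = data[node]
            if value not in constants:
                constants[value] = f"c{len(constants) + 1}"
            result[node] = constants[value]
        else:
            # A unique fresh value: no other node shares it.
            result[node] = f"null@{node}"
    return result
```

Equalities among touched nodes are preserved, so the match of P survives. All other equalities disappear, and since Q only uses $\sim$ positively, removing equalities cannot create a match of Q.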

Equivalence for Datalog
Equivalence for Datalog is undecidable:
◮ with descendant [Abiteboul, Bourhis, Muscholl, Wu 2013]
◮ for non-linear programs [Mazowiecki, Murlak, Witkowski 2014]
◮ for non-monadic programs (descendant is easily simulated).
Theorem (Mazowiecki, Murlak, Witkowski 2014): Equivalence for linear monadic Datalog without desc is decidable.
Can't we restrict reused data values like before?
◮ Let $T$ be a tree satisfying $P$ and not $Q$.
◮ Then $T$ satisfies some CQ $P_0$, an unravelling of $P$.
◮ $P_0$ touches $\le \|P_0\|$ data values in $T$, like before,
◮ but $\|P_0\|$ can be arbitrarily large...

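Example
[Figure: a data tree for $N = 3$ with c-nodes including (c, 1) and (c, 8), a-nodes (a, 1), (a, 3), (a, 5), (a, 7) and b-nodes (b, 2), (b, 4), (b, 6), (b, 8).]
$P \leftarrow \mathrm{DOWN}_0(x)$
$\mathrm{DOWN}_i(x) \leftarrow \mathrm{child}(x, y) \land a(y) \land \mathrm{DOWN}_{i+1}(y)$
$\mathrm{DOWN}_N(x) \leftarrow \mathrm{UP}_N(x) \land (N{+}1)\text{-parent}(x, y) \land \mathrm{child}(y, z) \land c(z) \land x \sim z$
$\mathrm{UP}_i(x) \leftarrow a(x) \land \mathrm{parent}(x, y) \land \mathrm{child}(y, z) \land b(z) \land \mathrm{DOWN}_i(z)$
$\mathrm{UP}_i(x) \leftarrow b(x) \land \mathrm{parent}(x, y) \land \mathrm{UP}_{i-1}(y)$
$\mathrm{UP}_0(x) \leftarrow \mathrm{true}$
$Q \leftarrow x \sim y \land i\text{-parent}(x, x') \land i\text{-parent}(y, y') \land a(x') \land b(y')$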

Clique-width
Instead of processing structures, process their hierarchical decompositions (derivations).
Construct (derive) coloured structures using operations:
◮ $i$: create a new node of colour $i$;
◮ $R(i_1, \dots, i_r)$: add to $R$ all tuples of nodes with colours $(i_1, \dots, i_r)$;
◮ $i \mapsto j$: change colour $i$ to $j$;
◮ $\oplus$: take disjoint union of two structures.
clique-width($A$) = least number of colours sufficient to construct $A$

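Examples
Linear orders: clique-width 2
Derivation tree, listed from the root down:
◮ red $\mapsto$ yellow
◮ yellow $\le$ red
◮ $\oplus$ (second argument: red)
◮ red $\mapsto$ yellow
◮ yellow $\le$ red
◮ $\oplus$ (arguments: yellow, red)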
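The derivation of a linear order with two colours can be replayed in code. The following sketch assumes a coloured structure is a pair `(colours, relation)` with `colours` a dictionary from node ids to colour names and `relation` a set of pairs; the function names are illustrative, and the relation built is the strict part of the order (pairs of distinct nodes only).

```python
from itertools import count

_fresh = count()  # global supply of node ids, so that unions are disjoint


def new(colour):
    """Operation i: a structure with a single new node of the given colour."""
    return ({next(_fresh): colour}, set())


def add(structure, i, j):
    """Operation R(i, j): add all pairs (u, v) with u of colour i and v of colour j."""
    colours, relation = structure
    pairs = {(u, v) for u in colours for v in colours
             if colours[u] == i and colours[v] == j}
    return (colours, relation | pairs)


def recolour(structure, i, j):
    """Operation i -> j: change colour i to colour j."""
    colours, relation = structure
    return ({v: (j if c == i else c) for v, c in colours.items()}, relation)


def union(s, t):
    """Operation (+): disjoint union of two structures."""
    return ({**s[0], **t[0]}, s[1] | t[1])


def linear_order(n):
    """Derive a linear order on n >= 1 nodes using only yellow and red."""
    s = new("yellow")
    for _ in range(n - 1):
        s = union(s, new("red"))          # the new maximum arrives in red
        s = add(s, "yellow", "red")       # everything so far lies below it
        s = recolour(s, "red", "yellow")  # merge it into the yellow part
    return s
```

Each round of the loop corresponds to one block of three operations in the derivation above: only the newest node is ever red, so two colours are enough regardless of `n`.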
