
A Theory of Regular Queries, Moshe Y. Vardi, Rice University


  1. A Theory of Regular Queries Moshe Y. Vardi Rice University

  2. Theory of Regular Languages, I
     Regular Languages - Robust Definability:
     • Regular expressions
     • DFA
     • NFA
     • 2NFA
     • AFA
     • 2AFA
     • Regular grammar
     • MSO
     • . . .
     But: Succinctness Gaps: e.g., NFA < RE, NFA < DFA, AFA < NFA, MSO < AFA, . . .

  3. NFA
     $A = (\Sigma, S, S_0, \rho, F)$
     • Alphabet: $\Sigma$
     • States: $S$
     • Initial states: $S_0 \subseteq S$
     • Nondeterministic transition function: $\rho : S \times \Sigma \to 2^S$
     • Accepting states: $F \subseteq S$
     Input word: $a_0, a_1, \ldots, a_{n-1}$
     Run: $s_0, s_1, \ldots, s_n$
     • $s_0 \in S_0$
     • $s_{i+1} \in \rho(s_i, a_i)$ for $0 \le i < n$
     Acceptance: $s_n \in F$
     Recognition: $L(A)$ – words accepted by $A$.
     Example: [state diagram of a two-state NFA over {0, 1}] – ends with 1's
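
The run-based acceptance condition above can be sketched in Python by tracking the set of states reachable after each letter. This is a minimal illustration, not part of the slides: the dictionary encoding of $\rho$, the state names `p` and `q`, and the example automaton (nonempty words over {0, 1} whose last letter is 1) are all assumptions made for the sketch.

```python
from typing import Dict, Set, Tuple

# Assumed encoding: rho maps (state, letter) to the set of successor states.
Rho = Dict[Tuple[str, str], Set[str]]


def accepts(rho: Rho, initial: Set[str], accepting: Set[str], word: str) -> bool:
    """Return True iff some run s_0, ..., s_n on `word` ends in an accepting state."""
    current = set(initial)  # all states s_i reachable after reading i letters
    for letter in word:
        current = {t for s in current for t in rho.get((s, letter), set())}
    return bool(current & accepting)


# Hypothetical example NFA: stay in p on any letter, guess the final 1 by moving to q.
RHO_ENDS_IN_1: Rho = {
    ("p", "0"): {"p"},
    ("p", "1"): {"p", "q"},
}

print(accepts(RHO_ENDS_IN_1, {"p"}, {"q"}, "0011"))
print(accepts(RHO_ENDS_IN_1, {"p"}, {"q"}, "10"))
```

Tracking a set of states rather than enumerating individual runs keeps the simulation linear in the word length for a fixed automaton.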

  4. Theory of Regular Languages, II
     Regular Languages - Robust Closure:
     • Union
     • Intersection
     • Complement
     • Concatenation
     • Kleene star
     • Reverse
     • Homomorphism
     • Inverse homomorphism
     • . . .

  5. NFA Intersection
     Given:
     • $A_1 = (\Sigma, S_1, S_1^0, \rho_1, F_1)$
     • $A_2 = (\Sigma, S_2, S_2^0, \rho_2, F_2)$
     Define: $A_1 \cap A_2 = (\Sigma, S_1 \times S_2, S_1^0 \times S_2^0, \rho, F_1 \times F_2)$, where:
     • $\rho((s, t), a) = \{ (s', t') : s' \in \rho_1(s, a) \text{ and } t' \in \rho_2(t, a) \}$
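
As a rough illustration of the product construction, the sketch below builds $A_1 \cap A_2$ from two NFAs given as tuples `(states, initial, rho, accepting)`. The tuple layout, the helper names `intersect` and `accepts`, and the two example automata (words ending in 1, words containing a 0) are assumptions chosen for the example, not taken from the slides.

```python
from itertools import product


def intersect(nfa1, nfa2, alphabet):
    """Product NFA: states are pairs, and both components move on the same letter."""
    s1, i1, r1, f1 = nfa1
    s2, i2, r2, f2 = nfa2
    states = set(product(s1, s2))
    rho = {}
    for (s, t) in states:
        for a in alphabet:
            succ = set(product(r1.get((s, a), set()), r2.get((t, a), set())))
            if succ:
                rho[((s, t), a)] = succ
    return states, set(product(i1, i2)), rho, set(product(f1, f2))


def accepts(nfa, word):
    """Simulate an NFA by tracking the set of reachable states."""
    _states, initial, rho, accepting = nfa
    current = set(initial)
    for a in word:
        current = {t for s in current for t in rho.get((s, a), set())}
    return bool(current & accepting)


# Hypothetical example automata over the alphabet {0, 1}.
ENDS_IN_1 = ({"p", "q"}, {"p"},
             {("p", "0"): {"p"}, ("p", "1"): {"p", "q"}}, {"q"})
HAS_0 = ({"u", "v"}, {"u"},
         {("u", "0"): {"v"}, ("u", "1"): {"u"},
          ("v", "0"): {"v"}, ("v", "1"): {"v"}}, {"v"})

BOTH = intersect(ENDS_IN_1, HAS_0, "01")
print(accepts(BOTH, "1001"), accepts(BOTH, "11"))
```

The product has $|S_1| \cdot |S_2|$ states, so intersection is polynomial, in contrast to complementation.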

  6. NFA Complementation
     Run Forest of $A$ on $w$:
     • Roots: elements of $S_0$.
     • Children of $s$ at level $i$: elements of $\rho(s, a_i)$.
     • Rejection: no leaf is accepting.
     Key Observation: collapse forest into a DAG – at most one copy of a state at a level; width of DAG is at most $|S|$.
     Subset Construction [Rabin-Scott, 1959]:
     • $A^c = (\Sigma, 2^S, \{S_0\}, \rho^c, F^c)$
     • $F^c = \{ T : T \cap F = \emptyset \}$
     • $\rho^c(T, a) = \bigcup_{t \in T} \rho(t, a)$
     • $L(A^c) = \Sigma^* - L(A)$
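
The subset construction can be sketched as follows. One deviation from the slide is worth flagging: instead of materializing all of $2^S$, this sketch only builds the subsets reachable from $\{S_0\}$, which accepts the same language. The function names and the example NFA (words ending in 1) are assumptions for illustration.

```python
def complement(alphabet, initial, rho, accepting):
    """Build the reachable part of A^c: states are frozensets T of NFA states."""
    start = frozenset(initial)
    dfa_rho = {}
    seen = {start}
    todo = [start]
    while todo:
        T = todo.pop()
        for a in alphabet:
            # rho^c(T, a) is the union of rho(t, a) over all t in T
            U = frozenset(u for t in T for u in rho.get((t, a), ()))
            dfa_rho[(T, a)] = U
            if U not in seen:
                seen.add(U)
                todo.append(U)
    # F^c: subsets that contain no accepting state of the original NFA
    co_accepting = {T for T in seen if not (T & accepting)}
    return seen, start, dfa_rho, co_accepting


def dfa_accepts(start, dfa_rho, final, word):
    """Run the deterministic automaton produced by `complement`."""
    T = start
    for a in word:
        T = dfa_rho[(T, a)]
    return T in final


# Hypothetical example: NFA for words over {0, 1} that end in 1.
RHO = {("p", "0"): {"p"}, ("p", "1"): {"p", "q"}}
SUBSETS, START, DRHO, FINAL = complement("01", {"p"}, RHO, {"q"})

print(dfa_accepts(START, DRHO, FINAL, "10"))
print(dfa_accepts(START, DRHO, FINAL, "01"))
```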

  7. Complementation Blow-Up
     $A = (\Sigma, S, S_0, \rho, F)$, $|S| = n$
     $A^c = (\Sigma, 2^S, \{S_0\}, \rho^c, F^c)$
     Blow-Up: $2^n$ upper bound
     Can we do better?
     Lower Bound: $2^n$ [Sakoda-Sipser 1978, Birget 1993]
     $L_n = (0+1)^* \, 1 \, (0+1)^{n-1} \, 0 \, (0+1)^*$
     • $L_n$ is easy for NFA
     • $\overline{L_n}$ is hard for NFA

  8. Theory of Regular Languages, III
     Regular Languages - Robust Decidability:
     Emptiness: $L(A) = \emptyset$
     Nonemptiness Problem: Decide if given $A$ is nonempty.
     NFA Nonemptiness: Directed graph $G_A = (S, E)$ of NFA $A = (\Sigma, S, S_0, \rho, F)$:
     • Nodes: $S$
     • Edges: $E = \{ (s, t) : t \in \rho(s, a) \text{ for some } a \in \Sigma \}$
     Lemma: $A$ is nonempty iff there is a path in $G_A$ from $S_0$ to $F$.
     • Decidable in time linear in the size of $A$, using breadth-first search or depth-first search.
     • Complexity: NLOGSPACE-complete.
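
The lemma translates directly into a graph search. The sketch below forgets the letters to obtain $G_A$ and runs breadth-first search from $S_0$; the dictionary encoding of $\rho$ and the example automaton are assumptions for illustration.

```python
from collections import deque


def is_nonempty(initial, rho, accepting):
    """Return True iff some accepting state is reachable from an initial state."""
    # Build G_A: drop the letters, keep only the edges (s, t).
    edges = {}
    for (s, _letter), targets in rho.items():
        edges.setdefault(s, set()).update(targets)

    seen = set(initial)
    queue = deque(initial)
    while queue:
        s = queue.popleft()
        if s in accepting:
            return True
        for t in edges.get(s, ()):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return False


# Hypothetical example: q is reachable from p, r is not.
RHO = {("p", "0"): {"p"}, ("p", "1"): {"p", "q"}}
print(is_nonempty({"p"}, RHO, {"q"}))
print(is_nonempty({"p"}, RHO, {"r"}))
```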

  9. NFA Containment
     Containment: $L(A_1) \subseteq L(A_2)$
     Lemma: $L(A_1) \subseteq L(A_2)$ iff $A_1 \cap A_2^c$ is empty.
     • Decidable in exponential time.
     • Complexity: PSPACE-complete [Stockmeyer&Meyer, 1973]
     • Result holds also for RE containment.
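
A minimal sketch of the lemma: search the product of $A_1$ with the subset automaton of $A_2$ on the fly, and report non-containment as soon as a pair $(s, T)$ with $s \in F_1$ and $T \cap F_2 = \emptyset$ is reached. The tuple layout `(initial, rho, accepting)` and the two example automata are assumptions for illustration; the search is exponential in the worst case and does not attempt the polynomial-space bound.

```python
from collections import deque


def contained(alphabet, nfa1, nfa2):
    """Return True iff L(A1) is a subset of L(A2), via emptiness of A1 ∩ A2^c."""
    init1, rho1, acc1 = nfa1
    init2, rho2, acc2 = nfa2
    t0 = frozenset(init2)
    seen = {(s, t0) for s in init1}
    queue = deque(seen)
    while queue:
        s, T = queue.popleft()
        # (s, T) is accepting in A1 ∩ A2^c: A1 accepts, and no run of A2 does.
        if s in acc1 and not (T & acc2):
            return False
        for a in alphabet:
            U = frozenset(u for t in T for u in rho2.get((t, a), ()))
            for s2 in rho1.get((s, a), ()):
                if (s2, U) not in seen:
                    seen.add((s2, U))
                    queue.append((s2, U))
    return True


# Hypothetical examples over {0, 1}: words ending in 1, and words containing a 1.
ENDS_IN_1 = ({"p"}, {("p", "0"): {"p"}, ("p", "1"): {"p", "q"}}, {"q"})
HAS_1 = ({"u"}, {("u", "0"): {"u"}, ("u", "1"): {"v"},
                 ("v", "0"): {"v"}, ("v", "1"): {"v"}}, {"v"})

print(contained("01", ENDS_IN_1, HAS_1))
print(contained("01", HAS_1, ENDS_IN_1))
```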

  10. Database Query Languages
      • Standard database query languages (e.g., SQL 2.0) are essentially 1st-order.
      • Aho&Ullman, 1979: 1st-order languages are weak – add recursion
      • Gallaire&Minker, 1978: add recursion via logic programs
      • SQL 3.0, 1999: recursion added
      Expressiveness/complexity trade-off:
      • 1st-order queries: Data complexity – LOGSPACE
      • Recursive queries: Data complexity – PTIME

  11. Datalog
      Datalog [Maier&Warren, 1988]:
      • Function-free logic programs
      • Existential, positive fixpoint logic
      • Select-project-join-union-recurse queries
      Example: Transitive Closure
        Path(x, y) :- Edge(x, y)
        Path(x, y) :- Path(x, z), Path(z, y)
      Example: Impressionable Shopper
        Buys(x, y) :- Trendy(x), Buys(z, y)
        Buys(x, y) :- Likes(x, y)
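
To make the fixpoint reading of the transitive-closure program concrete, here is a small sketch of naive bottom-up evaluation: start from the first rule and apply the recursive rule until no new facts appear. The function name and the sample `EDGE` relation are assumptions for illustration; real Datalog engines use more refined strategies such as semi-naive evaluation.

```python
def transitive_closure(edge):
    """Naive evaluation of: Path(x,y) :- Edge(x,y).  Path(x,y) :- Path(x,z), Path(z,y)."""
    path = set(edge)  # first rule: every Edge fact is a Path fact
    while True:
        # second rule: join Path with itself on the shared variable z
        derived = {(x, y) for (x, z) in path for (z2, y) in path if z == z2}
        if derived <= path:  # fixpoint reached: nothing new was derived
            return path
        path |= derived


# Hypothetical sample database: a chain 1 -> 2 -> 3 -> 4.
EDGE = {(1, 2), (2, 3), (3, 4)}
PATH = transitive_closure(EDGE)
print(sorted(PATH))
```

Each round can only add pairs over the finite active domain, which is why the iteration terminates in polynomially many steps, matching the PTIME data complexity of recursive queries.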

  12. Query Containment, I
      Query Optimization: Given $Q$, find $Q'$ such that:
      • $Q \equiv Q'$
      • $Q'$ is “easier” than $Q$
      Query Containment: $Q_1 \sqsubseteq Q_2$ if $Q_1(B) \subseteq Q_2(B)$ for all databases $B$.
      Fact: $Q \equiv Q'$ iff $Q \sqsubseteq Q'$ and $Q' \sqsubseteq Q$
      Consequence: Query containment is a key database problem.

  13. Query Containment, II
      Other applications:
      • query reuse
      • query reformulation
      • information integration
      • cooperative query answering
      • integrity checking
      • . . .
      Consequence: Query containment is the fundamental database-reasoning problem.

  14. Query Containment, III
      Decidability of Query Containment:
      • SQL: undecidable
        – Folk Theorem (unsolvability of FO)
        – Poor theory and practice of optimization
      • SPJU Queries: decidable
        – Chandra&Merlin–1977, Sagiv&Yannakakis–1982
        – Rich theory and practice of optimization
      Select-Project-Join-Union Queries:
      • Existential positive FO: conjunction, disjunction, existential quantification
      • Covers the vast majority of real-life database queries
      Example:
        Triangle(x, y) :- Edge(x, y), Edge(y, z), Edge(z, x)

  15. Query Containment, IV
      Datalog Containment:
      • Complexity: undecidable
        – Shmueli–1987: easy reduction from CFG containment
      • Difficult theory and practice of optimization
      Unfortunately, most decision problems involving Datalog are undecidable: there are very few interesting, well-behaved fragments.
      Reminder: Datalog = SPJU + Recursion
      Question: Can we limit recursion to recover decidability?

  16. 1990s: Graph Databases
      WWW:
      • Nodes
      • Edges
      • Labels
      Semistructured Data: WWW, SGML documents, library catalogs, XML documents, metadata, . . .
      Graph Databases: $(D, E, \lambda)$
      • $D$ – nodes
      • $E \subseteq D^2$ – edges
      • $\lambda : E \to \Lambda$ – labels (alt., also node labels)

  17. Figure 1: Graph Database

  18. Path Queries
      Active Research Topic: What is the right query language for graph databases? (“No SQL”)
      Basic Element of all proposals: path queries
      • $Q(x, y)$ :- $x \, L \, y$
      • $L$: formal language over labels
      • $Q(a, b)$ holds if the database contains a path $a \xrightarrow{l_1 \cdots l_k} b$ with $l_1 \cdots l_k \in L$
      Example: Regular Path Query
        $Q(x, y)$ :- $x \, (\mathit{Wing} \cdot \mathit{Part}^+ \cdot \mathit{Nut}) \, y$
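
One common way to evaluate a regular path query is to search the product of the graph database with an NFA for $L$. The sketch below does this for the Wing · Part⁺ · Nut example. The edge-triple encoding, the NFA state names `s0` to `s3`, and the sample parts database are assumptions for illustration; the slides do not prescribe an evaluation algorithm.

```python
from collections import deque


def eval_rpq(edges, initial, rho, accepting, source):
    """Return all nodes b such that some path source -> b spells a word of L."""
    out = {}
    for u, label, v in edges:
        out.setdefault(u, []).append((label, v))

    # Breadth-first search over pairs (database node, NFA state).
    seen = {(source, s) for s in initial}
    queue = deque(seen)
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in accepting:
            answers.add(node)
        for label, nxt in out.get(node, ()):
            for t in rho.get((state, label), ()):
                if (nxt, t) not in seen:
                    seen.add((nxt, t))
                    queue.append((nxt, t))
    return answers


# Hypothetical NFA for Wing . Part+ . Nut
RHO = {("s0", "Wing"): {"s1"}, ("s1", "Part"): {"s2"},
       ("s2", "Part"): {"s2"}, ("s2", "Nut"): {"s3"}}

# Hypothetical graph database of labeled edges (node, label, node).
EDGES = [("plane", "Wing", "w1"), ("w1", "Part", "flap"),
         ("flap", "Part", "hinge"), ("hinge", "Nut", "n7"),
         ("w1", "Nut", "n1")]

print(eval_rpq(EDGES, {"s0"}, RHO, {"s3"}, "plane"))
```

In this sample database, `n1` is not an answer for `plane` because the path Wing · Nut skips the mandatory Part step.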

  19. Regular Path Queries
      Observations:
      • A fragment of Transitive-Closure Logic (FO+TC)
      • A fragment of binary Datalog
        – Concatenation:
            E(x, y) :- E1(x, z), E2(z, y)
        – Union:
            E(x, y) :- E1(x, y)
            E(x, y) :- E2(x, y)
        – Transitive Closure:
            P(x, y) :- E(x, y)
            P(x, y) :- P(x, z), E(z, y)

  20. Path-Query Containment
      $Q_1(x, y)$ :- $x \, L_1 \, y$
      $Q_2(x, y)$ :- $x \, L_2 \, y$
      Language-Theoretic Lemma 1: $Q_1 \sqsubseteq Q_2$ iff $L_1 \subseteq L_2$
      Proof: Consider a database consisting of a single path $a \xrightarrow{l_1 \cdots l_k} b$ with $l_1 \cdots l_k \in L_1$.
      Corollary: Path-Query Containment is
      • undecidable for context-free path queries
      • PSPACE-complete for regular path queries (via RE containment).

  21. Two-Way RPQs
      Extended Alphabet: $\Lambda^- = \{ a^- : a \in \Lambda \}$, $\Lambda' = \Lambda \cup \Lambda^-$
      Inverse Roles:
      • $\mathit{Part}(x, y)$: $y$ is part of $x$
      • $\mathit{Part}^-(x, y)$: $x$ is part of $y$
      Example: (1/2)*Siblings
        $Q(x, y)$ :- $x \, [(\mathit{father}^- \cdot \mathit{father}) + (\mathit{mother}^- \cdot \mathit{mother})]^+ \, y$
      Containment: Use 2NFA?
      • Hopcroft and Ullman, 1979: 2DFA
      • Hopcroft, Motwani and Ullman, 2000: ???

  22. 2NFA
      $A = (\Sigma, S, S_0, \rho, F)$
      • $\Sigma$ – finite alphabet
      • $S$ – finite state set
      • $S_0 \subseteq S$ – initial states
      • $F \subseteq S$ – final states
      • $\rho : S \times \Sigma \to 2^{S \times \{-1, 0, +1\}}$ – transition function
      Theorem [Rabin&Scott, Shepherdson, 1959]: 2NFA ≡ 1NFA

  23. 2RPQ Containment
      Difficulties:
      • 2NFA → 1NFA: exponential blow-up
        – Consequence: Doubly exponential complementation
      • Difference between query and language containment
        – $Q_1(x, y)$ :- $x \, \mathit{Parent} \, y$
        – $Q_2(x, y)$ :- $x \, \mathit{Parent} \cdot \mathit{Parent}^- \cdot \mathit{Parent} \, y$
        – $Q_1 \sqsubseteq Q_2$ but $L(\mathit{Parent}) \not\subseteq L(\mathit{Parent} \cdot \mathit{Parent}^- \cdot \mathit{Parent})$

  24. Back to Basics: 2NFA → 1NFA
      Theorem [Vardi, 1988]: Let $A = (\Sigma, S, S_0, \rho, F)$ be a 2NFA. There is a 1NFA $A^c$ such that
      • $L(A^c) = \Sigma^* - L(A)$
      • $\|A^c\| \in 2^{O(\|A\|)}$
      Proof: Guess a subset-sequence counterexample.
      $a_0 \cdots a_{k-1} \notin L(A)$ iff there is a sequence $T_0, T_1, \ldots, T_k$ of subsets of $S$ such that
      1. $S_0 \subseteq T_0$ and $T_k \cap F = \emptyset$.
      2. If $s \in T_i$ and $(t, +1) \in \rho(s, a_i)$, then $t \in T_{i+1}$, for $0 \le i < k$.
      3. If $s \in T_i$ and $(t, 0) \in \rho(s, a_i)$, then $t \in T_i$, for $0 \le i < k$.
      4. If $s \in T_i$ and $(t, -1) \in \rho(s, a_i)$, then $t \in T_{i-1}$, for $0 < i < k$.
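
The characterization can be checked on a single word by computing the least sequence $T_0, \ldots, T_k$ that contains $S_0$ in $T_0$ and is closed under the three move conditions; the word is rejected exactly when that least $T_k$ misses $F$. The sketch below does this. It assumes conventions read off from the conditions rather than stated on the slide: the head starts at position 0, the automaton accepts by reaching position $k$ in a final state, and moves off the left end are ignored. The example 2NFA (words over {a, b} containing "ab", checked by stepping back one position after seeing a b) is hypothetical.

```python
def two_way_run(initial, rho, accepting, word):
    """Return (accepted, T) where T[i] is the least set of states closed at position i."""
    k = len(word)
    T = [set() for _ in range(k + 1)]
    T[0] = set(initial)
    todo = [(s, 0) for s in initial]
    while todo:
        s, i = todo.pop()
        if i == k:  # past the last letter: nothing left to read
            continue
        for t, move in rho.get((s, word[i]), ()):
            j = i + move
            if 0 <= j <= k and t not in T[j]:
                T[j].add(t)
                todo.append((t, j))
    return bool(T[k] & accepting), T


# Hypothetical 2NFA: scan right; on a 'b', optionally step back to check for an 'a'.
RHO = {
    ("scan", "a"): {("scan", +1)},
    ("scan", "b"): {("scan", +1), ("back", -1)},
    ("back", "a"): {("fwd", +1)},
    ("fwd", "a"): {("done", +1)},
    ("fwd", "b"): {("done", +1)},
    ("done", "a"): {("done", +1)},
    ("done", "b"): {("done", +1)},
}

accepted, T = two_way_run({"scan"}, RHO, {"done"}, "aab")
print(accepted, T)
```

For a rejected word, the returned list `T` is itself a witness sequence in the sense of the proof; the 1NFA $A^c$ guesses such a sequence letter by letter instead of computing it.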

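  25. Foldings
      Definition: Let $u, v \in \Lambda'^*$. We say that $u$ folds onto $v$, denoted $u \rightsquigarrow v$, if $u$ can be “folded” onto $v$, e.g., $a b b^- b c \rightsquigarrow a b c$.
      Pictorially, $\xrightarrow{a} \cdot \xrightarrow{b} \cdot \xleftarrow{b} \cdot \xrightarrow{b} \cdot \xrightarrow{c} \;\rightsquigarrow\; \xrightarrow{a} \cdot \xrightarrow{b} \cdot \xrightarrow{c}$
      Definition: Let $E$ be an RE over $\Lambda'$. Then $\mathit{fold}(E) = \{ v : u \rightsquigarrow v, \; u \in L(E) \}$.
      Language-Theoretic Lemma 2: Let
        $Q_1(x, y)$ :- $x \, E_1 \, y$
        $Q_2(x, y)$ :- $x \, E_2 \, y$
      be 2RPQs. Then $Q_1 \sqsubseteq Q_2$ iff $L(E_1) \subseteq \mathit{fold}(E_2)$.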

  26. 2RPQ containment
      Theorem: Let $E$ be an RE over $\Lambda'$. There is a 2NFA $\tilde{A}_E$ such that
      • $L(\tilde{A}_E) = \mathit{fold}(E)$
      • $\|\tilde{A}_E\| \in O(\|E\|)$
      Containment:
        $Q_1(x, y)$ :- $x \, E_1 \, y$
        $Q_2(x, y)$ :- $x \, E_2 \, y$
      TFAE:
      • $Q_1 \sqsubseteq Q_2$
      • $L(E_1) \subseteq \mathit{fold}(E_2)$
      • $L(E_1) \subseteq L(\tilde{A}_{E_2})$
      • $L(E_1) \cap L(\tilde{A}^c_{E_2}) = \emptyset$
      • $L(A_{E_1} \cap \tilde{A}^c_{E_2}) = \emptyset$
      Bottom-line: 2RPQ containment is PSPACE-complete.
