query answering is the most fundamental problem in db
play

Query answering is the most fundamental problem in DB Query Q Result - PowerPoint PPT Presentation

Query answering is the most fundamental problem in DB Query Q Result Q ( D ) Database D SELECT R.A, S.C R A B FROM R, S Q(D) A C u x WHERE R.B = S.B u y v x u z S B C v y x y v z x z Three crucial problems for query


  1. Query answering is the most fundamental problem in DB Query Q Result Q ( D ) Database D SELECT R.A, S.C R A B FROM R, S Q(D) A C u x WHERE R.B = S.B u y v x u z S B C v y x y v z x z

  2. Three crucial problems for query answering Q(D) A C R A B SELECT R.A, S.C u x u y FROM R, S WHERE R.B = S.B v x u z S B C v y x y x z v z 1. Enumeration ( u , y ) , ( u , z ) , ( v , y ) , ( v , z ) 2. Uniform generation ( u , y ) ∶ 1 4 , ( u , z ) ∶ 1 4 , ( v , y ) ∶ 1 4 , ( v , z ) ∶ 1 4 3. Counting ∣ Q ( D )∣ = 4

  3. In this paper, we study log-space complexity classes We consider the class RelationNL and show that it has good algorithmic properties in terms of: Enumeration. Approximate counting. Approximate uniform generation. We consider the subclass RelationUL and show that it has better algorithmic properties in terms of: Constant delay enumeration (polynomial time preprocessing). Exact counting. Exact uniform generation. We show applications of these results in information extraction, graph databases, and among others.

  4. Efficient log-space classes for enumeration, counting, and uniform generation Marcelo Arenas Luis Alberto Croqueville Cristian Riveros Rajesh Jayaram PUC & IMFD Chile Carnegie Mellon University

  5. Outline The class RelationNL FPRAS for RelationNL Conclusions

  6. Outline The class RelationNL FPRAS for RelationNL Conclusions

  7. Relations as instances of problems Let Σ be a finite alphabet. Definitions A problem is a relation R ⊆ Σ ∗ × Σ ∗ . If ( x , y ) ∈ R , then x is an input and y is a solution . We restrict to p -relations R where for every ( x , y ) ∈ Σ ∗ × Σ ∗ : 1. if ( x , y ) ∈ R , then y is of polynomial size with respect to x . 2. ( x , y ) ∈ R can be verified in polynomial time.

  8. Three main problems associated to a p -relation Given an input x we denote by W R ( x ) the set of solutions or witnesses : W R ( x ) = { y ∈ Σ ∗ ∣ ( x , y ) ∈ R } Problem: Enum ( R ) A word x ∈ Σ ∗ Input: Output: Enumerate all y ∈ W R ( x ) without repetitions Problem: Count ( R ) Input: A word x ∈ Σ ∗ Output: The size ∣ W R ( x )∣ Problem: Gen ( R ) A word x ∈ Σ ∗ Input: Output: Generate uniformly at random a word in W R ( x ) .

  9. A log-space complexity class: RelationNL Non-deterministic NL transducer M q 3 ⋱ q 2 q n q 1 q 0 Read/Write ⊢ W O R K log-space Read only ⊢ I N P U T T A P E Write only ⊢ O U T P U T T A P E

  10. A log-space complexity class: RelationNL Given an NL-transducer M and an input x , we define its set of outputs : M ( x ) = { y ∈ Σ ∗ ∣ there exists a run of M on x that halts in an accepting state with y in the output } Definition of RelationNL A relation R is in RelationNL iff there exists an NL-transducer M s.t.: R = {( x , y ) ∈ Σ ∗ × Σ ∗ ∣ y ∈ M ( x )}

  11. Main results for RelationNL Theorem If R ∈ RelationNL then: 1. Enum ( R ) can be solved with polinomial delay . 2. Count ( R ) admits an FPRAS (fully polynomial-time randomized approximation scheme). 3. Gen ( R ) admits a polynomial time “Las Vegas” uniform generator . We introduce a subclass RelationUL that has good properties w.r.t. constant delay enumeration, exact counting, and uniform gen.

  12. Outline The class RelationNL FPRAS for RelationNL Conclusions

  13. A complete problem for RelationNL a b a b 000000000 ⋯ 00 a r s t �ÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜ�ÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜ� n b How many words of length n are accepted by a non-deterministic finite state automaton (NFA)?

  14. A complete problem for RelationNL a b a b 000000000 ⋯ 00 a r s t �ÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜ�ÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜ� n b Problem: # NFA A NFA A = ( Q , Σ , ∆ , q 0 , F ) and 0 n . Input: Output: ∣{ w ∣ w ∈ L(A) and ∣ w ∣ = n }∣ . Proposition For every R ∈ RelationNL , there exists a parsimonious reduction from Count ( R ) to # NFA . If we find an FPRAS for # NFA , we have an FPRAS for every R ∈ RelationNL .

  15. Main ideas of FPRAS: Unfold the NFA until level n n -levels a a a a a ⋯ a r r 0 r 1 r 2 r 3 r n a a a a a a b b b b b s 0 s 1 s 2 s 3 ⋯ s n s b b b b b b b b b b b b b ⋯ a t 0 t 1 t 2 t 3 t n t a a a a a

  16. Main ideas of FPRAS: Unfold the NFA until level n n -levels a a a a a ⋯ a r r 0 r 1 r 2 r 3 r n a a a a a a b b b b b s 0 s 1 s 2 s 3 ⋯ s n s b b b b b b b b b b b b b ⋯ a t 0 t 1 t 2 t 3 t n t a a a a a

  17. Main ideas of FPRAS: Unfold the NFA until level n n -levels a a a a ⋯ a r r 0 r 1 r 2 r 3 r n a a a a a b b b s 1 s 2 s 3 ⋯ s b b b b b b b b b ⋯ a t 2 t 3 t n t a a a The problem is reduced to approximate the number of label-paths from the initial state to the final states.

  18. Main ideas of FPRAS: languages at level k Level- k r k ⋯ . . . ⋯ s k ⋯ ⋯ ⋯ t k Let Q k be the set of states at level k . For each P ⊆ Q k : L( P ) = all words that reach any state in P from the initial state. We want to approximate the size ∣L( P )∣ for any P ⊆ Q k . . . . we want to approximate ∣L( F )∣ where F ⊆ Q n .

  19. Main ideas of FPRAS: a sketch for each level Level- k For every q ∈ Q k r k ⋯ . . . N ( q ) ∶ N ( q ) ∼ ∣ L ( q )∣ an ( 1 ± ǫ ) -approximation. ⋯ s k ⋯ S ( q ) ∶ S ( q ) ⊆ L ( q ) uniform sample of poly-size. ⋯ ⋯ t k For every P ⊆ Q k and for any total order < of P : ∣ L ( q )∣ ⋅ ∣ L ( q ) / L ({ p ∈ P ∣ p < q })∣ ∣ L ( P )∣ = ∑ ∣ L ( q )∣ q ∈ P N ( q ) ⋅ ∣ S ( q ) / L ({ p ∈ P ∣ p < q })∣ ∼ ∑ ∣ S ( q )∣ q ∈ P This approximation can be computed in poly-time from N ( q ) and S ( q )

  20. Main ideas of FPRAS: a sketch for each level Level- k For every q ∈ Q k ⋯ r k . . . N ( q ) ∶ N ( q ) ∼ ∣L( q )∣ an ( 1 ± ǫ ) -approximation. ⋯ ⋯ s k S ( q ) ∶ S ( q ) ⊆ L( q ) uniform sample of poly-size. ⋯ ⋯ t k For every P ⊆ Q k and for any total order < of P : N ( q ) ⋅ ∣ S ( q ) / L({ p ∈ P ∣ p < q })∣ ∣L( P )∣ ∼ N ( P ) = ∑ ∣ S ( q )∣ q ∈ P For every P ⊆ Q k and q ∈ Q k − P (by Hoeffding’s inequality): ∣ ∣ S ( q ) / L( P )∣ − ∣L( q ) / L( P )∣ ∣ ≤ ǫ ∣ S ( q )∣ ∣L( q )∣ with (exponentially) high prob.

  21. Main ideas of FPRAS: update the sketch to the next level Level- k + 1 Level- k For every q ∈ Q k ⋯ a r k r k + 1 . . . N ( q ) ∶ N ( q ) ∼ ∣L( q )∣ a an ( 1 ± ǫ ) -approximation. ⋯ ⋯ b s k s k + 1 S ( q ) ∶ S ( q ) ⊆ L( q ) b b uniform sample of poly-size. ⋯ ⋯ t k t k + 1 a For every q ∈ Q k + 1 let P c = { p ∈ Q k ∣ ( p , c , q ) ∈ ∆ } for c ∈ { a , b } : N ( q ) = N ( P a ) + N ( P b ) To generate S ( q ) we use a technique from Jerrum, Valiant, and Vazirani for generating a uniform sample by using the ( 1 ± ǫ ) -approximations: { N ( P )} P ⊆ Q k ′ for every k ′ ≤ k .

  22. Outline The class RelationNL FPRAS for RelationNL Conclusions

  23. Conclusions and future work 1. We provide complexity classes that has good properties in terms of enumeration , counting , and uniform generation . 2. RelationNL is the first complexity class with a simple definition based on TM and where each problem admits an FPRAS . Future work: 1. Find an FPRAS for #NFA that can be used in practice with better polynomial factors and constants. 2. Find an FPRAS for #CFG. Thanks!

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend