Query answering is the most fundamental problem in DB


slide-1
SLIDE 1

Query answering is the most fundamental problem in DB

Database D:

R:  A  B
    u  x
    v  x

S:  B  C
    x  y
    x  z

Query Q:

SELECT R.A, S.C FROM R, S WHERE R.B = S.B

Result Q(D):

A  C
u  y
u  z
v  y
v  z

slide-2
SLIDE 2

Three crucial problems for query answering

R:  A  B
    u  x
    v  x

S:  B  C
    x  y
    x  z

SELECT R.A, S.C FROM R, S WHERE R.B = S.B

Q(D):  A  C
       u  y
       u  z
       v  y
       v  z

  • 1. Enumeration

(u, y), (u, z), (v, y), (v, z)

  • 2. Uniform generation

(u, y) : 1/4, (u, z) : 1/4, (v, y) : 1/4, (v, z) : 1/4

  • 3. Counting

∣Q(D)∣ = 4
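The three problems can be made concrete on the example database. The following is a minimal sketch, assuming set semantics for Q(D) (as the slide's result table suggests) and using a naive nested-loop join; the function name `enumerate_answers` is hypothetical and not from the talk.

```python
import random

# Database D from the slides: R(A, B) and S(B, C).
R = [("u", "x"), ("v", "x")]
S = [("x", "y"), ("x", "z")]

def enumerate_answers(r, s):
    """Yield each answer of SELECT R.A, S.C FROM R, S WHERE R.B = S.B once."""
    seen = set()
    for a, b in r:
        for b2, c in s:
            if b == b2 and (a, c) not in seen:
                seen.add((a, c))
                yield (a, c)

answers = list(enumerate_answers(R, S))  # 1. enumeration
sample = random.choice(answers)          # 2. uniform generation (probability 1/4 each)
count = len(answers)                     # 3. counting
```

Here generation and counting are derived from a full enumeration, which is only viable because Q(D) is tiny; the point of the talk is to solve each problem efficiently without materializing all answers.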

slide-3
SLIDE 3

In this paper, we study log-space complexity classes

We consider the class RelationNL and show that it has good algorithmic properties in terms of:

  • 1. Enumeration.
  • 2. Approximate counting.
  • 3. Approximate uniform generation.

We consider the subclass RelationUL and show that it has better algorithmic properties in terms of:

  • 1. Constant delay enumeration (with polynomial-time preprocessing).
  • 2. Exact counting.
  • 3. Exact uniform generation.

We show applications of these results in information extraction and graph databases, among others.

slide-4
SLIDE 4

Efficient log-space classes for enumeration, counting, and uniform generation

Marcelo Arenas, Luis Alberto Croquevielle, Cristian Riveros (PUC & IMFD Chile)
Rajesh Jayaram (Carnegie Mellon University)

slide-5
SLIDE 5

Outline

  • 1. The class RelationNL
  • 2. FPRAS for RelationNL
  • 3. Conclusions

slide-7
SLIDE 7

Relations as instances of problems

Let Σ be a finite alphabet.

Definitions

A problem is a relation R ⊆ Σ∗ × Σ∗. If (x, y) ∈ R, then x is an input and y is a solution.

We restrict to p-relations R, where for every (x, y) ∈ Σ∗ × Σ∗:

  • 1. if (x, y) ∈ R, then y is of polynomial size with respect to x.
  • 2. (x, y) ∈ R can be verified in polynomial time.
slide-8
SLIDE 8

Three main problems associated to a p-relation

Given an input x, we denote by W_R(x) the set of solutions or witnesses:

W_R(x) = {y ∈ Σ∗ ∣ (x, y) ∈ R}

Problem: Enum(R)
Input: A word x ∈ Σ∗
Output: Enumerate all y ∈ W_R(x) without repetitions

Problem: Count(R)
Input: A word x ∈ Σ∗
Output: The size ∣W_R(x)∣

Problem: Gen(R)
Input: A word x ∈ Σ∗
Output: Generate uniformly at random a word in W_R(x)
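To illustrate the three problems, here is a brute-force sketch over a toy p-relation. Everything in it is an assumption made for illustration: the relation checked by `verify`, the alphabet {0, 1}, and the parameter `max_len`, which stands in for the polynomial bound on the size of a solution.

```python
import itertools
import random

def verify(x, y):
    """Toy p-relation: y has the length of x and has a 1 only where x has a 1."""
    return len(y) == len(x) and all(b == "0" or a == "1" for a, b in zip(x, y))

def enum(x, max_len):
    """Enum(R): yield every witness of x exactly once, trying all candidates."""
    for n in range(max_len + 1):
        for letters in itertools.product("01", repeat=n):
            y = "".join(letters)
            if verify(x, y):
                yield y

def count(x, max_len):
    """Count(R): the number of witnesses of x."""
    return sum(1 for _ in enum(x, max_len))

def gen(x, max_len, rng=random):
    """Gen(R): a witness of x chosen uniformly at random, or None if there is none."""
    witnesses = list(enum(x, max_len))
    return rng.choice(witnesses) if witnesses else None
```

This baseline takes time exponential in `max_len`; it only fixes the input and output behavior of Enum(R), Count(R), and Gen(R), not an efficient way to solve them.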

slide-9
SLIDE 9

A log-space complexity class: RelationNL

[Figure: an NL-transducer M. A non-deterministic log-space machine with states q0, q1, ..., qn and three tapes: an input tape (read only), a work tape (read/write), and an output tape (write only).]

slide-10
SLIDE 10

A log-space complexity class: RelationNL

Given an NL-transducer M and an input x, we define its set of outputs:

M(x) = {y ∈ Σ∗ ∣ there exists a run of M on x that halts in an accepting state with y on the output tape}

Definition of RelationNL

A relation R is in RelationNL iff there exists an NL-transducer M such that:

R = {(x, y) ∈ Σ∗ × Σ∗ ∣ y ∈ M(x)}

slide-11
SLIDE 11

Main results for RelationNL

Theorem

If R ∈ RelationNL then:

  • 1. Enum(R) can be solved with polynomial delay.
  • 2. Count(R) admits an FPRAS (fully polynomial-time randomized approximation scheme).
  • 3. Gen(R) admits a polynomial-time “Las Vegas” uniform generator.

We introduce a subclass RelationUL that has good properties w.r.t. constant delay enumeration, exact counting, and uniform generation.
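For reference, the usual textbook meaning of an FPRAS for Count(R) can be stated as follows. The algorithm name $\mathcal{A}$, the error parameter $\varepsilon$, and the success probability 3/4 follow the standard definition and are not taken from the slides: a randomized algorithm $\mathcal{A}$ is an FPRAS if, for every input $x$ and every $\varepsilon \in (0, 1)$, it runs in time polynomial in $|x|$ and $1/\varepsilon$ and satisfies

```latex
\[
\Pr\Bigl[\, (1 - \varepsilon)\,|W_R(x)| \;\le\; \mathcal{A}(x, \varepsilon) \;\le\; (1 + \varepsilon)\,|W_R(x)| \,\Bigr] \;\ge\; \frac{3}{4}.
\]
```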

slide-12
SLIDE 12

Outline

  • 1. The class RelationNL
  • 2. FPRAS for RelationNL
  • 3. Conclusions

slide-13
SLIDE 13

A complete problem for RelationNL

[Figure: an NFA with states r, s, t and transitions labeled a and b, together with the word 0^n, which gives the length n in unary.]

How many words of length n are accepted by a non-deterministic finite state automaton (NFA)?

slide-14
SLIDE 14

A complete problem for RelationNL


Problem: #NFA
Input: An NFA A = (Q, Σ, ∆, q0, F) and a length n given in unary as 0^n.
Output: ∣{w ∣ w ∈ L(A) and ∣w∣ = n}∣.

Proposition

For every R ∈ RelationNL, there exists a parsimonious reduction from Count(R) to #NFA. Hence, if we find an FPRAS for #NFA, we have an FPRAS for Count(R) for every R ∈ RelationNL.
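To make the #NFA problem concrete, here is an exact counter that serves as a baseline only. It is not the FPRAS from the talk: it tracks the set of states reached by each group of words (a subset construction), so its table can grow exponentially in the number of states. The automaton `DELTA` is a hypothetical example (it accepts the words over {a, b} that contain "ab") and is not the one drawn on the slide.

```python
def count_accepted(delta, initial, finals, alphabet, n):
    """Count exactly the words of length n accepted by an NFA.

    delta maps (state, letter) to the set of successor states.
    table maps each reachable set of states to the number of words leading to it.
    """
    table = {frozenset([initial]): 1}
    for _ in range(n):
        nxt = {}
        for states, words in table.items():
            for c in alphabet:
                succ = frozenset(q for p in states for q in delta.get((p, c), ()))
                if succ:  # words with no run are dropped
                    nxt[succ] = nxt.get(succ, 0) + words
        table = nxt
    return sum(words for states, words in table.items() if states & finals)

# Hypothetical NFA: initial state r, final state t.
DELTA = {
    ("r", "a"): {"r", "s"}, ("r", "b"): {"r"},
    ("s", "b"): {"t"},
    ("t", "a"): {"t"}, ("t", "b"): {"t"},
}

accepted_3 = count_accepted(DELTA, "r", {"t"}, "ab", 3)  # aab, aba, abb, bab
```

Counting accepting runs instead of words would be easy with a per-state table, but a word with several accepting runs would then be counted more than once; avoiding that overcount is what makes #NFA hard.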

slide-15
SLIDE 15

Main ideas of FPRAS: Unfold the NFA until level n

[Figure: the NFA with states r, s, t unfolded into levels 0 to n. Each level k holds copies r_k, s_k, t_k of the states, and the transitions labeled a and b connect consecutive levels.]

slide-17
SLIDE 17

Main ideas of FPRAS: Unfold the NFA until level n

[Figure: the unfolded NFA after removing useless states. Level 0 keeps only r_0, level 1 keeps r_1 and s_1, the intermediate levels keep r_k, s_k, t_k, and level n keeps r_n and t_n.]

The problem reduces to approximating the number of label-paths from the initial state to the final states.
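One way to read the unfolding step is sketched below: a state survives at level k if it is reachable from the initial state in k steps and can still reach a final state in the remaining n - k steps. This pruning rule and the automaton `DELTA` are assumptions made for illustration; the slide's own automaton cannot be recovered from the transcript.

```python
def unfold_and_prune(delta, initial, finals, alphabet, n):
    """Return a list of n + 1 sets: the states kept at levels 0..n."""
    # forward[k]: states reachable from the initial state in exactly k steps.
    forward = [{initial}]
    for _ in range(n):
        forward.append({q for p in forward[-1] for c in alphabet
                        for q in delta.get((p, c), ())})
    # backward[j]: states that can reach a final state in exactly j steps.
    backward = [set(finals)]
    for _ in range(n):
        backward.append({p for (p, c), succ in delta.items()
                         if succ & backward[-1]})
    backward.reverse()  # now backward[k] means "n - k steps remain"
    return [f & b for f, b in zip(forward, backward)]

# Hypothetical NFA accepting the words over {a, b} that contain "ab".
DELTA = {
    ("r", "a"): {"r", "s"}, ("r", "b"): {"r"},
    ("s", "b"): {"t"},
    ("t", "a"): {"t"}, ("t", "b"): {"t"},
}

levels = unfold_and_prune(DELTA, "r", {"t"}, "ab", 3)
```

Every word accepted at length n corresponds to at least one path through the kept states, which is why counting words becomes counting distinct label-paths in this layered graph.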

slide-18
SLIDE 18

Main ideas of FPRAS: languages at level k

[Figure: level k of the unfolded NFA, with states r_k, s_k, t_k.]

Let Q_k be the set of states at level k. For each P ⊆ Q_k:

L(P) = all words that reach some state in P from the initial state.

We want to approximate the size ∣L(P)∣ for any P ⊆ Q_k. In particular, we want to approximate ∣L(F)∣, where F ⊆ Q_n.

slide-19
SLIDE 19

Main ideas of FPRAS: a sketch for each level

[Figure: level k of the unfolded NFA, with states r_k, s_k, t_k.]

For every q ∈ Q_k we keep:

N(q): an estimate N(q) ∼ ∣L(q)∣ that is a (1 ± ε)-approximation.
S(q): a uniform sample S(q) ⊆ L(q) of polynomial size.

For every P ⊆ Q_k and for any total order < of P:

\[
|L(P)| \;=\; \sum_{q \in P} |L(q)| \cdot \frac{|L(q) \setminus L(\{p \in P \mid p < q\})|}{|L(q)|}
\;\sim\; \sum_{q \in P} N(q) \cdot \frac{|S(q) \setminus L(\{p \in P \mid p < q\})|}{|S(q)|}
\]

This approximation can be computed in polynomial time from N(q) and S(q).
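The exact identity behind this estimate (each word of L(P) is counted once, at the first state q in the order that it reaches) can be checked by brute force on a small automaton. The sketch below assumes a hypothetical NFA `DELTA` with initial state r and computes every language explicitly, so it is only feasible for tiny k; it checks the identity, not the sampling step.

```python
import itertools

# Hypothetical NFA accepting the words over {a, b} that contain "ab".
DELTA = {
    ("r", "a"): {"r", "s"}, ("r", "b"): {"r"},
    ("s", "b"): {"t"},
    ("t", "a"): {"t"}, ("t", "b"): {"t"},
}

def language(state_set, k):
    """L(P): all words of length k that reach some state of state_set from r."""
    words = set()
    for letters in itertools.product("ab", repeat=k):
        current = {"r"}
        for c in letters:
            current = {q for p in current for q in DELTA.get((p, c), ())}
        if current & state_set:
            words.add("".join(letters))
    return words

def union_size_by_order(P, k):
    """Sum over q in P, in sorted order, of |L(q) minus L({p in P : p < q})|."""
    order = sorted(P)
    total = 0
    for i, q in enumerate(order):
        earlier = language(set(order[:i]), k)
        total += len(language({q}, k) - earlier)
    return total
```

For P = {s, t} and k = 3, L(s) is the 4 words ending in a and L(t) is the 4 words containing "ab"; they share the word aba, so both sides of the identity equal 7.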

slide-20
SLIDE 20

Main ideas of FPRAS: a sketch for each level

[Figure: level k of the unfolded NFA, with states r_k, s_k, t_k.]

For every q ∈ Q_k we keep:

N(q): an estimate N(q) ∼ ∣L(q)∣ that is a (1 ± ε)-approximation.
S(q): a uniform sample S(q) ⊆ L(q) of polynomial size.

For every P ⊆ Q_k and for any total order < of P:

\[
|L(P)| \;\sim\; N(P) \;=\; \sum_{q \in P} N(q) \cdot \frac{|S(q) \setminus L(\{p \in P \mid p < q\})|}{|S(q)|}
\]

For every P ⊆ Q_k and q ∈ Q_k − P (by Hoeffding’s inequality):

\[
\left| \frac{|S(q) \setminus L(P)|}{|S(q)|} - \frac{|L(q) \setminus L(P)|}{|L(q)|} \right| \;\le\; \varepsilon
\]

with (exponentially) high probability.
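The Hoeffding step can be spelled out under one simplifying assumption that the slides do not state explicitly: S(q) consists of $m$ words $w_1, \dots, w_m$ drawn independently and uniformly from $L(q)$. Let $X_i = 1$ if $w_i \notin L(P)$ and $X_i = 0$ otherwise. Each $X_i$ takes values in $[0, 1]$ and has expectation $|L(q) \setminus L(P)| / |L(q)|$, so Hoeffding's inequality gives

```latex
\[
\Pr\left[\, \left| \frac{1}{m} \sum_{i=1}^{m} X_i \;-\; \frac{|L(q) \setminus L(P)|}{|L(q)|} \right| \ge \varepsilon \,\right]
\;\le\; 2\, e^{-2 m \varepsilon^{2}} .
\]
```

The failure probability decays exponentially in $m$, so a sample of size polynomial in $1/\varepsilon$ and the input size leaves room for a union bound over the sets P that the algorithm actually queries.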

slide-21
SLIDE 21

Main ideas of FPRAS: update the sketch to the next level

[Figure: levels k and k + 1 of the unfolded NFA, with states r_k, s_k, t_k and r_{k+1}, s_{k+1}, t_{k+1}, connected by transitions labeled a and b.]

For every q ∈ Q_k we already have:

N(q): an estimate N(q) ∼ ∣L(q)∣ that is a (1 ± ε)-approximation.
S(q): a uniform sample S(q) ⊆ L(q) of polynomial size.

For every q ∈ Q_{k+1}, let P_c = {p ∈ Q_k ∣ (p, c, q) ∈ ∆} for c ∈ {a, b}. Then:

N(q) = N(P_a) + N(P_b)

To generate S(q) we use a technique from Jerrum, Valiant, and Vazirani for generating a uniform sample by using the (1 ± ε)-approximations {N(P)}_{P ⊆ Q_{k′}} for every k′ ≤ k.
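The update rule for N(q) rests on a simple observation, written out here for the two-letter alphabet of the example. A word $wc$ reaches $q \in Q_{k+1}$ exactly when $w$ reaches some predecessor $p \in Q_k$ with $(p, c, q) \in \Delta$, that is, when $w \in L(P_c)$. Words ending in $a$ and words ending in $b$ are distinct, so the union is disjoint:

```latex
\[
L(q) \;=\; \{\, w a \mid w \in L(P_a) \,\} \;\uplus\; \{\, w b \mid w \in L(P_b) \,\}
\quad\Longrightarrow\quad
|L(q)| \;=\; |L(P_a)| + |L(P_b)| \;\sim\; N(P_a) + N(P_b).
\]
```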

slide-22
SLIDE 22

Outline

  • 1. The class RelationNL
  • 2. FPRAS for RelationNL
  • 3. Conclusions

slide-23
SLIDE 23

Conclusions and future work

  • 1. We provide complexity classes that have good properties in terms of enumeration, counting, and uniform generation.
  • 2. RelationNL is the first complexity class with a simple definition based on Turing machines (TM) in which every problem admits an FPRAS.

Future work:

  • 1. Find an FPRAS for #NFA that can be used in practice, with better polynomial factors and constants.
  • 2. Find an FPRAS for #CFG.

Thanks!