
Challenges for Efficient Query Evaluation on Structured Probabilistic Data

Challenges for Efficient Query Evaluation on Structured Probabilistic Data. SUM 2016, September 23, 2016, Nice. Antoine Amarilli, Silviu Maniu, Mikaël Monet.


  1. Challenges for Efficient Query Evaluation on Structured Probabilistic Data SUM2016 SEPTEMBER 23, 2016, NICE Antoine Amarilli, Silviu Maniu, Mikaël Monet

  4. A probabilistic database in the TID (tuple-independent database) model: every fact carries its own independent probability. R: (a, d) 0.2; (f, c) 0.9; (a, e) 0.7; (c, e) 0.23; (b, e) 0.81. S: (d, e) 0.005; (f, e) 0.7; (d, a) 0.13. Q: (c, e, f) 0.66.

  7. Probability of a possible world. TID instance: R: (a, d) 0.2; (f, c) 0.9; (a, e) 0.7; (c, e) 0.23; (b, e) 0.81. S: (d, e) 0.005; (f, e) 0.7; (d, a) 0.13. Q: (c, e, f) 0.66. A possible world I keeps the facts S(f, e), S(d, a), R(c, e) and Q(c, e, f) and drops the others. Probability Pr(I) of this possible world = 0.7 × 0.13 × 0.23 × 0.66 × (1 − 0.2) × (1 − 0.81) × (1 − 0.005) × (1 − 0.9) × (1 − 0.7).
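The product on this slide can be checked mechanically. The following is a minimal sketch, assuming the fact probabilities as read from the slide's flattened table (the split of tuples between R and S is a reading of that table). The helper name `world_probability` is hypothetical and does not come from the talk.

```python
from math import prod

# Fact probabilities transcribed from the slide (assumed reading of the table).
tid = {
    ("R", "a", "d"): 0.2,
    ("R", "f", "c"): 0.9,
    ("R", "a", "e"): 0.7,
    ("R", "c", "e"): 0.23,
    ("R", "b", "e"): 0.81,
    ("S", "d", "e"): 0.005,
    ("S", "f", "e"): 0.7,
    ("S", "d", "a"): 0.13,
    ("Q", "c", "e", "f"): 0.66,
}

def world_probability(tid, world):
    """Probability of the world that keeps exactly the facts in `world`.

    Facts are independent: a kept fact contributes p, a dropped one 1 - p.
    """
    return prod(p if fact in world else 1.0 - p for fact, p in tid.items())

# The possible world I shown on the slide.
world = {("S", "f", "e"), ("S", "d", "a"), ("R", "c", "e"), ("Q", "c", "e", "f")}
pr = world_probability(tid, world)
```

Here `pr` is the nine-factor product written on the slide: four kept facts and five dropped ones.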

  11. Probabilistic query evaluation (PQE) • Focus on Boolean queries (yes/no) • Probability of a query Q on a probabilistic instance I: $P(Q) = \sum_{J \subseteq I,\; J \models Q} \Pr(J)$ • Problem: #P-hard in general
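The definition of P(Q) translates directly into a brute-force procedure: enumerate every possible world, keep those satisfying the query, and add up their probabilities. The sketch below is illustrative only; the function name `pqe_bruteforce`, the two-fact instance and the two queries are assumptions made for the example, and the exponential enumeration is exactly what makes this approach impractical.

```python
from itertools import product
from math import prod

def pqe_bruteforce(tid, query):
    """Sum Pr(J) over every possible world J of `tid` on which `query(J)` is true."""
    facts = list(tid)
    total = 0.0
    # Each fact is independently kept or dropped: 2^n worlds for n facts.
    for keep in product([False, True], repeat=len(facts)):
        world = {f for f, k in zip(facts, keep) if k}
        if query(world):
            total += prod(tid[f] if k else 1.0 - tid[f] for f, k in zip(facts, keep))
    return total

# Hypothetical two-fact instance (not the one from the slides).
tiny = {("R", "a"): 0.5, ("S", "a"): 0.4}

def both(world):
    """Boolean query: R(a) and S(a)."""
    return ("R", "a") in world and ("S", "a") in world

def either(world):
    """Boolean query: R(a) or S(a)."""
    return ("R", "a") in world or ("S", "a") in world

p_both = pqe_bruteforce(tiny, both)      # equals 0.5 * 0.4
p_either = pqe_bruteforce(tiny, either)  # equals 1 - 0.5 * 0.6
```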

  15. 1) Approximate probability computation • Monte Carlo sampling • Drawback: running time is quadratic in the desired precision (on the order of 1/ε² samples for an additive error ε) ⇒ not adequate for low probabilities.
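A minimal sketch of the sampling approach, under stated assumptions: worlds are drawn by flipping one independent coin per fact, and the sample count comes from the standard Hoeffding bound. The names `pqe_monte_carlo` and `samples_needed`, the tiny instance and the query are hypothetical. The bound shows the quadratic cost: halving ε roughly quadruples the number of samples, and ε has to be smaller than P(Q) itself for the estimate to be meaningful.

```python
import math
import random

def samples_needed(eps, delta):
    """Hoeffding bound: enough samples for additive error <= eps
    with probability >= 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps * eps))

def pqe_monte_carlo(tid, query, n_samples, rng):
    """Estimate P(Q) as the fraction of sampled worlds that satisfy the query."""
    hits = 0
    for _ in range(n_samples):
        # Sample a world: keep each fact independently with its probability.
        world = {f for f, p in tid.items() if rng.random() < p}
        if query(world):
            hits += 1
    return hits / n_samples

# Hypothetical two-fact instance and query (not from the slides).
tiny = {("R", "a"): 0.5, ("S", "a"): 0.4}

def both(world):
    return ("R", "a") in world and ("S", "a") in world

rng = random.Random(0)  # fixed seed so the run is repeatable
estimate = pqe_monte_carlo(tiny, both, 20000, rng)  # exact value is 0.2

n_coarse = samples_needed(0.01, 0.05)
n_fine = samples_needed(0.005, 0.05)  # half the error, about four times the samples
```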

  20. 2) Restricting the class of queries • [Dalvi and Suciu 2012] show the following dichotomy for any UCQ Q: • either PQE is #P-hard on all instances, • or PQE is in PTIME on all instances. • The simple conjunctive query ∃x,y R(x), T(x,y), S(y) is already #P-hard! • The criterion is too crisp.
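For contrast with the #P-hard query on this slide, here is a sketch of the tractable side of the dichotomy. The query ∃x,y R(x), S(x,y) is not on the slide; it is chosen here as an assumed example of a query whose probability factorizes over independent facts. The function names and the small instance are hypothetical, and the brute-force routine is included only to cross-check the closed form.

```python
from itertools import product
from math import prod

def lifted_rs(r, s):
    """P(exists x,y: R(x), S(x,y)) on a TID, computed in polynomial time.

    r maps x -> Pr[R(x)], s maps (x, y) -> Pr[S(x, y)].
    Different values of x involve disjoint, hence independent, facts.
    """
    no_witness = 1.0
    for x, px in r.items():
        no_s = prod(1.0 - p for (x2, _), p in s.items() if x2 == x)
        no_witness *= 1.0 - px * (1.0 - no_s)
    return 1.0 - no_witness

def brute_rs(r, s):
    """Same probability by enumerating all possible worlds (exponential)."""
    facts = [("R", x) for x in r] + [("S", xy) for xy in s]
    probs = list(r.values()) + list(s.values())
    total = 0.0
    for keep in product([False, True], repeat=len(facts)):
        world = {f for f, k in zip(facts, keep) if k}
        if any(("R", x) in world and ("S", (x, y)) in world for (x, y) in s):
            total += prod(p if k else 1.0 - p for p, k in zip(probs, keep))
    return total

# Hypothetical instance for the cross-check.
r = {"a": 0.5, "b": 0.9}
s = {("a", "c"): 0.4, ("a", "d"): 0.3, ("b", "c"): 0.2}
p_lifted = lifted_rs(r, s)
p_brute = brute_rs(r, s)
```

No such factorization is available for ∃x,y R(x), T(x,y), S(y), because the same R and S facts are shared between many (x, y) pairs.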

  21. 3) Restricting the shape of the instances • Bound the treewidth of instances by a constant. • Treewidth: a measure of how far a graph is from being a tree.
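To make the notion concrete, the sketch below checks a candidate tree decomposition and reports its width (largest bag size minus one); the treewidth of a graph is the smallest width over all its decompositions. This is only an illustration: the function names and example graphs are assumptions, the code does not search for an optimal decomposition, and it trusts that the given bag tree really is a tree.

```python
def decomposition_width(bags):
    """Width of a decomposition: largest bag size minus one."""
    return max(len(bag) for bag in bags.values()) - 1

def is_tree_decomposition(graph_edges, bags, tree_edges):
    """Check the two defining conditions of a tree decomposition.

    Assumes `tree_edges` forms a tree over the bag ids (not checked here).
    """
    # Condition 1: every graph edge is contained in some bag.
    for u, v in graph_edges:
        if not any(u in bag and v in bag for bag in bags.values()):
            return False
    neighbours = {node: set() for node in bags}
    for a, b in tree_edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Condition 2: the bags containing a given vertex are connected in the tree.
    for vertex in set().union(*bags.values()):
        holders = {node for node, bag in bags.items() if vertex in bag}
        start = next(iter(holders))
        seen, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for other in neighbours[node] & holders:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        if seen != holders:
            return False
    return True

# A path a-b-c-d is already a tree: a decomposition of width 1 exists.
path_edges = [("a", "b"), ("b", "c"), ("c", "d")]
path_bags = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c", "d"}}
path_tree = [(0, 1), (1, 2)]

# Closing the path into a cycle: this decomposition has width 2.
cycle_edges = path_edges + [("d", "a")]
cycle_bags = {0: {"a", "b", "c"}, 1: {"a", "c", "d"}}
cycle_tree = [(0, 1)]

# Invalid: vertex b sits in bags 0 and 2 but not in bag 1 between them.
broken_bags = {0: {"a", "b"}, 1: {"c"}, 2: {"b", "c"}}
```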

  29. [Diagram] Instance I of treewidth k → tree decomposition T, in O(EXP(k)·|I|). MSO query q and int k → automaton A, in O(f(q,k)). A and T → provenance circuit C (in Bool[X]), in O(|A|·|T|). C → probability.

  30. Provenance circuits: some fixed query q, some instance I with facts {f1, f2, f3, f4, f5, f6}
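The circuit drawn on this slide did not survive extraction, so the sketch below uses a made-up one. It illustrates the idea of a provenance circuit: a Boolean circuit whose inputs are the facts and whose output is true exactly on the worlds where the query holds. The circuit `(f1 AND f2) OR f3`, the probabilities and the function names are all assumptions, and the brute-force probability computation is for illustration only; it is not the method of the talk.

```python
from itertools import product
from math import prod

# Hypothetical circuit (not the one from the slide): output = (f1 AND f2) OR f3.
# Any name that is not a key of `circuit` is an input gate, i.e. a fact.
circuit = {
    "g_and": ("and", ["f1", "f2"]),
    "g_out": ("or", ["g_and", "f3"]),
}

def evaluate(circuit, gate, world):
    """Evaluate `gate` when exactly the facts in `world` are present."""
    if gate not in circuit:
        return gate in world
    kind, inputs = circuit[gate]
    values = [evaluate(circuit, g, world) for g in inputs]
    if kind == "and":
        return all(values)
    if kind == "or":
        return any(values)
    if kind == "not":
        return not values[0]
    raise ValueError(f"unknown gate kind: {kind}")

def circuit_probability(circuit, output, probs):
    """Probability that `output` is true, by enumerating all worlds (exponential)."""
    facts = list(probs)
    total = 0.0
    for keep in product([False, True], repeat=len(facts)):
        world = {f for f, k in zip(facts, keep) if k}
        if evaluate(circuit, output, world):
            total += prod(probs[f] if k else 1.0 - probs[f] for f, k in zip(facts, keep))
    return total

probs = {"f1": 0.5, "f2": 0.5, "f3": 0.2}  # assumed fact probabilities
p_out = circuit_probability(circuit, "g_out", probs)  # 1 - (1 - 0.25) * (1 - 0.2)
```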

  37. Problems. [Diagram] Instance I of treewidth k → tree decomposition T, in O(EXP(k)·|I|). MSO query q and int k → automaton A, in O(f(q,k)). A and T → provenance circuit C, in O(|A|·|T|). C → probability. • Non-elementary complexity in general (compiling the MSO query to an automaton) • Are real datasets treelike?

  41. Current work • From bottom-up tree automata to alternating two-way automata • Introduce Intensionally-Clique-Guarded Datalog (ICG-Datalog), parameterized by body size • Provenance as a cyclic circuit (cycluit)!

  46. Provenance cycluits: some ICG-Datalog program, some instance I with facts {f1, f2, f3, f4}, plus (stratified) negations
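The cycluit drawn on this slide is lost, so the sketch below uses a made-up one. It assumes a least-fixpoint reading of cyclic circuits restricted to monotone AND/OR gates: all gates start at false and are switched to true until nothing changes, so a cycle cannot justify itself. Stratified negation, mentioned on the slide, is left out. The gate names and the function `evaluate_cycluit` are hypothetical.

```python
def evaluate_cycluit(gates, true_inputs):
    """Least-fixpoint evaluation of a monotone cyclic circuit.

    `gates` maps a gate name to ("and" | "or", list of input names).
    Names that are not gates are facts; a fact is true iff it is in `true_inputs`.
    """
    value = {g: False for g in gates}

    def val(name):
        return value[name] if name in gates else name in true_inputs

    changed = True
    while changed:
        changed = False
        for g, (kind, inputs) in gates.items():
            if value[g]:
                continue  # monotone gates never go back to false
            if kind == "and":
                new = all(val(i) for i in inputs)
            else:
                new = any(val(i) for i in inputs)
            if new:
                value[g] = True
                changed = True
    return value

# Hypothetical cycluit with a cycle reach_b -> reach_c -> back -> reach_b.
gates = {
    "reach_b": ("or", ["f1", "back"]),
    "reach_c": ("and", ["reach_b", "f2"]),
    "back": ("and", ["reach_c", "f3"]),
}

without_f1 = evaluate_cycluit(gates, {"f2", "f3"})  # the cycle alone proves nothing
with_f1 = evaluate_cycluit(gates, {"f1", "f2"})     # f1 starts the derivation
```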

  49. [Diagram] Instance I of treewidth k → tree decomposition T, in O(EXP(k)·|I|). ICG-Datalog program P of body size c and int k → two-way automaton A, in O(f(c,k)·|P|). A and T → provenance cycluit C, in O(|A|·|T|). C → probability (2EXPTIME upper bound).

  50. Bad news… • We proved that evaluating path queries on tree instances (treewidth = 1) is already #P-hard (by reduction from #MONOTONE-2-SAT). • Still, we obtain a 2EXPTIME combined complexity upper bound.

  54. Treelike datasets • Transportation networks • Partial decompositions • Query-specific decompositions

  55. [Diagram] Instance I of treewidth k → tree decomposition T, in O(EXP(k)·|I|). ICG-Datalog program P of body size c and int k → two-way automaton A, in O(f(c,k)·|P|). A and T → provenance cycluit C, in O(|A|·|T|).
