
Probabilistic Query Evaluation on Bounded-Treewidth Instances


  1. Probabilistic Query Evaluation on Bounded-Treewidth Instances
     SIGMOD/PODS Ph.D. Symposium, June 26, 2016, San Francisco
     Mikaël Monet, supervised by Pierre Senellart

  2. Context
     - Boolean queries (yes/no) on relational instances
     - We want the answer to contain more information than just "yes/no":
         - Add uncertainty
         - Obtain provenance information
     - We need restrictions for all of this to be tractable

  5. A probabilistic database (TID model)
     In the TID (tuple-independent database) model, each fact is present independently with the probability shown next to it.

       R               S                Q
       a  d  0.2       d  e  0.005      c  e  f  0.66
       f  c  0.9       f  e  0.7
       a  e  0.7       d  a  0.13
       c  e  0.23
       b  e  0.81

  8. Probability of a possible world
     A possible world I of the probabilistic database keeps the facts S(f, e), S(d, a), R(c, e) and Q(c, e, f), and drops the other five.
     Probability Pr(I) of this possible world = 0.7*0.13*0.23*0.66 * (1-0.2)*(1-0.81)*(1-0.005)*(1-0.9)*(1-0.7)
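The product on this slide can be checked with a few lines of Python. This is a minimal sketch: the dictionary layout and the names `tid`, `world` and `world_probability` are assumptions made for illustration, and the assignment of tuples to the relations R, S and Q follows the slide's tables as read from the transcript.

```python
from math import prod

# Fact probabilities of the TID instance, keyed by (relation, tuple).
# The key layout is an assumption of this sketch, not something from the slides.
tid = {
    ("R", ("a", "d")): 0.2,
    ("R", ("f", "c")): 0.9,
    ("R", ("a", "e")): 0.7,
    ("R", ("c", "e")): 0.23,
    ("R", ("b", "e")): 0.81,
    ("S", ("d", "e")): 0.005,
    ("S", ("f", "e")): 0.7,
    ("S", ("d", "a")): 0.13,
    ("Q", ("c", "e", "f")): 0.66,
}


def world_probability(tid, world):
    """Multiply p for each kept fact and (1 - p) for each dropped fact."""
    return prod(p if fact in world else 1.0 - p for fact, p in tid.items())


# The possible world I of the slide: four facts kept, five dropped.
world = {
    ("S", ("f", "e")),
    ("S", ("d", "a")),
    ("R", ("c", "e")),
    ("Q", ("c", "e", "f")),
}
pr_world = world_probability(tid, world)
```

Under these assumptions `pr_world` is roughly 6.27e-5, which shows how small the probability of any single world becomes even on a nine-fact instance.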

  11. Probabilistic query evaluation (PQE)
      - Focus on Boolean queries (yes/no)
      - Probability of a query Q on probabilistic instance 𝖀: $P(Q) = \sum_{J \subseteq \mathfrak{U},\ J \models Q} \Pr(J)$
      - Problem: in general #P-hard
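The definition of P(Q) can be read directly as an algorithm: enumerate every possible world, keep those that satisfy the query, and add up their probabilities. The sketch below does exactly that, which takes time exponential in the number of facts and is only meant to make the definition concrete. The two-fact instance `tiny` and the query `q_join` are hypothetical examples, not taken from the slides.

```python
from itertools import product


def pqe_bruteforce(tid, query):
    """Sum Pr(J) over all possible worlds J that satisfy the Boolean query."""
    facts = list(tid)
    total = 0.0
    for kept in product([False, True], repeat=len(facts)):
        world = {f for f, k in zip(facts, kept) if k}
        pr = 1.0
        for f, k in zip(facts, kept):
            pr *= tid[f] if k else 1.0 - tid[f]
        if query(world):
            total += pr
    return total


# Hypothetical two-fact TID instance, chosen only for illustration.
tiny = {("R", ("a",)): 0.5, ("S", ("a", "b")): 0.4}


def q_join(world):
    """Boolean query: there exist x, y with R(x) and S(x, y)."""
    return any(
        r == "R" and any(s == "S" and t2[0] == t1[0] for s, t2 in world)
        for r, t1 in world
    )


p = pqe_bruteforce(tiny, q_join)
```

Here the query holds only in the world that keeps both facts, so `p` is 0.5 * 0.4 = 0.2. With n facts the loop visits 2^n worlds, which is why this approach does not scale.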

  14. 3 possible directions
      - Approximate
      - Restrict queries
      - Restrict instances

  15. 1) Approximate probability computation
      - Monte-Carlo sampling
      - Drawback: running time quadratic in the desired precision (on the order of 1/ε² samples for additive error ε)
        ⇒ Not adequate for low probabilities.
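A minimal sketch of the Monte-Carlo approach: sample possible worlds by flipping one independent coin per fact, and report the fraction of sampled worlds that satisfy the query. The helper `samples_needed` uses the standard Hoeffding bound to show the quadratic dependence on precision. All names, the instance `tiny` and the query `q_join` are hypothetical and chosen for illustration.

```python
import math
import random


def pqe_monte_carlo(tid, query, samples, seed=0):
    """Estimate P(Q) as the fraction of sampled worlds satisfying the query."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # Keep each fact independently with its own probability.
        world = {f for f, p in tid.items() if rng.random() < p}
        if query(world):
            hits += 1
    return hits / samples


def samples_needed(eps, delta):
    """Hoeffding bound: samples for additive error eps with confidence 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


# Hypothetical two-fact TID instance and query, for illustration only.
tiny = {("R", ("a",)): 0.5, ("S", ("a", "b")): 0.4}


def q_join(world):
    """Boolean query: there exist x, y with R(x) and S(x, y)."""
    return any(
        r == "R" and any(s == "S" and t2[0] == t1[0] for s, t2 in world)
        for r, t1 in world
    )


estimate = pqe_monte_carlo(tiny, q_join, samples=20000)
```

Halving `eps` roughly quadruples `samples_needed`. When the true probability is itself tiny, an additive error small enough to be meaningful requires a correspondingly huge number of samples, which is the limitation named on the slide.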

  19. 2) Restricting the class of queries
      - [Dalvi and Suciu 2012] show the following dichotomy for any UCQ Q:
          - Either PQE is PTIME on all instances
          - Or PQE is #P-hard on all instances
      - The simple conjunctive query ∃x,y R(x), S(x,y), T(y) is already #P-hard!
      - Criterion is too crisp

  24. 3) Restricting the shape of the instances
      - Bound the treewidth of instances by a constant
      - Treewidth: a measure of how far a graph is from being a tree

  25. Treewidth
      [Figure: the example instance (relations R, S and Q) over successive builds; final caption: "Divide and conquer!"]
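To make the notion of treewidth concrete, the sketch below checks the standard conditions for a tree decomposition of a graph and computes its width (largest bag size minus one). The treewidth of a graph is the smallest width over all of its tree decompositions. The function names and the two example graphs are hypothetical, and the checker assumes that `tree_edges` really forms a tree over the bag identifiers.

```python
def is_tree_decomposition(edges, bags, tree_edges):
    """Check the tree-decomposition conditions for the graph given by `edges`.

    bags: dict mapping a bag id to a set of graph vertices.
    tree_edges: edges between bag ids (assumed to form a tree).
    """
    vertices = {v for e in edges for v in e}
    # 1. Every vertex occurs in some bag.
    if not all(any(v in bag for bag in bags.values()) for v in vertices):
        return False
    # 2. Every edge is contained in some bag.
    if not all(any(u in bag and v in bag for bag in bags.values()) for u, v in edges):
        return False
    # 3. The bags containing a given vertex form a connected subtree.
    for v in vertices:
        nodes = {n for n, bag in bags.items() if v in bag}
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            n = stack.pop()
            for x, y in tree_edges:
                for s, t in ((x, y), (y, x)):
                    if s == n and t in nodes and t not in seen:
                        seen.add(t)
                        stack.append(t)
        if seen != nodes:
            return False
    return True


def width(bags):
    """Width of a decomposition: size of the largest bag, minus one."""
    return max(len(bag) for bag in bags.values()) - 1


# A path a-b-c-d has a decomposition of width 1 (it is a tree).
path_edges = [("a", "b"), ("b", "c"), ("c", "d")]
path_bags = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d"}}
path_ok = is_tree_decomposition(path_edges, path_bags, [(1, 2), (2, 3)])

# A 4-cycle needs bags of size 3, giving width 2.
cycle_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
cycle_bags = {1: {"a", "b", "c"}, 2: {"a", "c", "d"}}
cycle_ok = is_tree_decomposition(cycle_edges, cycle_bags, [(1, 2)])
```

Small bags are what make divide and conquer possible: each bag separates the graph, so a computation can proceed bag by bag along the tree while only tracking the few vertices in the current bag.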

  31. [Diagram: from instance and query to probability]
      - Instance I of treewidth k → tree decomposition T, in O(EXP(k)·|I|)
      - Query q (MSO) and int k → automaton A, in O(f(q, k))
      - T and A → provenance circuit C, in O(|A|·|T|)
      - Provenance circuit C → probability
      - The final builds add two "?" marks to the diagram.

  41. Provenance circuit C of query Q on instance I
      - Boolean circuit (AND, OR, NOT gates)
      - Inputs = the facts of I
      - For every valuation ν : I → {true, false}: ν(I) ⊨ Q iff ν(C) = 1, where ν(I) is the set of facts of I that ν maps to true
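The defining property of a provenance circuit can be tested by brute force on a small example: for every valuation of the facts, the circuit and the query must agree. In the sketch below, the nested-tuple circuit encoding, the three-fact instance and the query `q_join` are all hypothetical and chosen for illustration. The check enumerates all valuations, so it is only suitable for tiny instances.

```python
from itertools import product


def eval_circuit(gate, nu):
    """Evaluate a circuit given as nested tuples under valuation nu (fact -> bool)."""
    op = gate[0]
    if op == "var":
        return nu[gate[1]]
    if op == "not":
        return not eval_circuit(gate[1], nu)
    if op == "and":
        return all(eval_circuit(g, nu) for g in gate[1:])
    if op == "or":
        return any(eval_circuit(g, nu) for g in gate[1:])
    raise ValueError(f"unknown gate {op!r}")


def is_provenance_circuit(circuit, facts, query):
    """Check that nu(I) satisfies the query iff nu(C) is true, for every nu."""
    for bits in product([False, True], repeat=len(facts)):
        nu = dict(zip(facts, bits))
        world = {f for f in facts if nu[f]}
        if query(world) != eval_circuit(circuit, nu):
            return False
    return True


# Hypothetical instance: R(a), S(a, b), S(a, c).
facts = [("R", ("a",)), ("S", ("a", "b")), ("S", ("a", "c"))]


def q_join(world):
    """Boolean query: there exist x, y with R(x) and S(x, y)."""
    return any(
        r == "R" and any(s == "S" and t2[0] == t1[0] for s, t2 in world)
        for r, t1 in world
    )


# Candidate circuit: R(a) AND (S(a, b) OR S(a, c)).
circuit = ("and", ("var", facts[0]),
           ("or", ("var", facts[1]), ("var", facts[2])))
ok = is_provenance_circuit(circuit, facts, q_join)
```

The circuit records which combinations of facts make the query true, which is the extra information beyond yes/no that the context slide asks for.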

  45. Tree automata
      A bottom-up deterministic tree automaton on {a, b}-trees is a tuple A = (Q, F, 𝛋, 𝛆) where:
      - Q: finite set of states
      - F ⊆ Q: accepting states
      - 𝛋 : {a, b} → Q, determining the state for the leaves
      - 𝛆 : {a, b} × Q² → Q, determining the state for internal nodes

  46. Run of an automaton on a tree
      [Figure: an example automaton A with three states Q, accepting states F, leaf function 𝛋 on the labels a and b, and a transition table 𝛆 with columns lab, q1, q2, out, run on an example tree. The state symbols are not recoverable.]
      Steps shown on the slide:
      - Initialization of the leaves
      - Internal nodes
      - And so on…
      - This tree is in the language of A
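The run described on these slides (initialize the leaves, then compute the state of each internal node from the states of its two children, then test whether the root state is accepting) can be sketched as follows. The automaton here is a hypothetical two-state example that accepts the {a, b}-trees containing at least one "a". It is not the three-state automaton of the slide, whose states cannot be recovered. The tree encoding and all names are likewise assumptions.

```python
STATES = ("no_a", "has_a")

# Hypothetical automaton: accepts binary {a, b}-trees with at least one "a".
automaton = {
    "states": set(STATES),
    "accepting": {"has_a"},
    # Leaf function: label -> state.
    "leaf": {"a": "has_a", "b": "no_a"},
    # Transition function: (label, left state, right state) -> state.
    "delta": {
        (lab, q1, q2): ("has_a" if lab == "a" or "has_a" in (q1, q2) else "no_a")
        for lab in "ab"
        for q1 in STATES
        for q2 in STATES
    },
}


def run(automaton, tree):
    """Return the state reached at the root of `tree`.

    A leaf is a 1-tuple (label,); an internal node is (label, left, right).
    """
    if len(tree) == 1:
        return automaton["leaf"][tree[0]]
    label, left, right = tree
    q1 = run(automaton, left)
    q2 = run(automaton, right)
    return automaton["delta"][(label, q1, q2)]


def accepts(automaton, tree):
    """A tree is in the language if its root state is accepting."""
    return run(automaton, tree) in automaton["accepting"]


tree_with_a = ("b", ("b",), ("b", ("a",), ("b",)))
tree_without_a = ("b", ("b",), ("b",))
accepted = accepts(automaton, tree_with_a)
rejected = accepts(automaton, tree_without_a)
```

The recursion visits children before their parent, so states are assigned bottom-up exactly as in the slide's step-by-step run, and the whole run is linear in the size of the tree.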

  57. Major drawbacks
      - In general, computing the automaton has non-elementary complexity in the query
      - Exponential dependence on the instance treewidth
