
Probabilistic Query Evaluation: Towards Tractable Combined Complexity


  1. Probabilistic Query Evaluation: Towards Tractable Combined Complexity
     Mikaël Monet (1, 2), supervised by Pierre Senellart (2, 3) and Antoine Amarilli (1)
     May 31st, 2017
     (1) LTCI, Télécom ParisTech, Université Paris-Saclay; Paris, France
     (2) Inria Paris; Paris, France
     (3) École normale supérieure, PSL Research University; Paris, France

  3. Introduction
     • Uncertainty in data
       → Untrustworthy sources, automated information extraction, imperfect sensor precision in experimental sciences, etc.
     • Need a framework to model this uncertainty and reason about it
       → Probabilistic databases!

  7. Plan
     1) Define TID model and probabilistic query evaluation (PQE)
     2) Existing approaches (efficient PQE in the data)
     3) Efficient PQE in the query and the data
     4) Efficient PQE in the data, reasonable complexity in the query

  8. Tuple-independent databases (TID)
     • Probabilistic databases: model uncertainty about data
     • Simplest model: tuple-independent databases (TID)
       • A relational database I
       • A probability valuation π mapping each fact of I to [0, 1]
     • Semantics of a TID (I, π): a probability distribution on I′ ⊆ I:
       • Each fact F ∈ I is present with probability π(F) and absent with probability 1 − π(F)
       • Assume independence across facts

  14. Example: TID
      S
      a  b  .5
      a  c  .2
      This TID (I, π) represents the following probability distribution over subinstances:
      • {S(a, b), S(a, c)}: .5 × .2
      • {S(a, b)}: .5 × (1 − .2)
      • {S(a, c)}: (1 − .5) × .2
      • {} (no facts): (1 − .5) × (1 − .2)
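
The distribution on this slide can be reproduced mechanically. The following is a minimal Python sketch, not taken from the talk: the dictionary encoding of facts and the helper name `possible_worlds` are assumptions made for illustration. It enumerates every subset of the two facts and multiplies π(F) for kept facts and 1 − π(F) for dropped ones.

```python
from itertools import product

# TID from the example: each fact of relation S with its probability.
# The (relation, first, second) tuple encoding is an assumption of this sketch.
tid = {("S", "a", "b"): 0.5, ("S", "a", "c"): 0.2}

def possible_worlds(tid):
    """Yield (world, probability) for every subset of the facts."""
    facts = list(tid)
    for keep in product([True, False], repeat=len(facts)):
        world = frozenset(f for f, k in zip(facts, keep) if k)
        prob = 1.0
        for f, k in zip(facts, keep):
            # Independence across facts: multiply per-fact probabilities.
            prob *= tid[f] if k else 1.0 - tid[f]
        yield world, prob

worlds = dict(possible_worlds(tid))
for world, prob in worlds.items():
    print(sorted(world), prob)
```

The four probabilities sum to 1, as expected for a distribution over the subsets of I.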

  17. Probabilistic query evaluation (PQE)
      Let us fix:
      • Relational signature σ
      • Class $\mathcal{I}$ of relational instances on σ (e.g., acyclic, treelike)
      • Class $\mathcal{Q}$ of Boolean queries (e.g., paths, trees)
      Probabilistic query evaluation (PQE) problem for $\mathcal{Q}$ and $\mathcal{I}$:
      • Given a query $q \in \mathcal{Q}$
      • Given an instance $I \in \mathcal{I}$ and a probability valuation π
      • Compute the probability that (I, π) satisfies q
      → $\Pr((I, \pi) \models q) = \sum_{J \subseteq I,\ J \models q} \Pr(J)$
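
The sum in the PQE definition can be evaluated directly on small inputs. The sketch below is a naive illustration only, assuming a query is given as a Python predicate on a set of facts; the names `pqe_brute_force` and `q_exists_s` are hypothetical. It runs in time exponential in the number of facts, which is exactly why the complexity questions in this talk matter.

```python
from itertools import product

def pqe_brute_force(tid, query):
    """Sum Pr(J) over all J ⊆ I such that J satisfies the query."""
    facts = list(tid)
    total = 0.0
    for keep in product([True, False], repeat=len(facts)):
        world = {f for f, k in zip(facts, keep) if k}
        if query(world):
            prob = 1.0
            for f, k in zip(facts, keep):
                prob *= tid[f] if k else 1.0 - tid[f]
            total += prob
    return total

# TID from the earlier example slide.
tid = {("S", "a", "b"): 0.5, ("S", "a", "c"): 0.2}

def q_exists_s(world):
    # Hypothetical Boolean query for illustration: ∃x ∃y S(x, y).
    return any(fact[0] == "S" for fact in world)

p = pqe_brute_force(tid, q_exists_s)
print(p)
```

For this particular query the result agrees with the closed form 1 − (1 − .5) × (1 − .2), since the query fails only when both facts are absent.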

  18. Complexity of probabilistic query evaluation (PQE)
      Question: what is the (data, combined) complexity of PQE depending on the class $\mathcal{Q}$ of queries and the class $\mathcal{I}$ of instances?

  25. Data complexity results: related work
      • Existing data dichotomy result on queries [Dalvi & Suciu, 2012]
        • $\mathcal{I}$ is the class of all instances
        • There is a class S ⊆ UCQs of safe queries
          → PQE is PTIME for any q ∈ S
          → PQE is #P-hard for any q ∈ UCQs \ S
      • Existing data dichotomy result on instances
        → PQE for MSO on bounded-treewidth instances has linear data complexity [Amarilli, Bourhis, & Senellart, 2015]
        → There is an FO query for which PQE is #P-hard on any unbounded-treewidth graph family $\mathcal{I}$ (under some assumptions) [Amarilli, Bourhis, & Senellart, 2016]
      What about combined complexity?

  26. Wish list
      We want:
      • PQE tractable in combined complexity
      OR
      • PQE tractable in the data, reasonable in the query

  29. Restrict to CQs on graph signatures
      Query: ∃ x y z t R(x, y) ∧ S(y, z) ∧ S(t, z), viewed as a labeled graph: x -R-> y -S-> z <-S- t
      Instance, also viewed as a labeled graph with a probability on each edge:
      R
      a  b  .1
      b  c  .1
      c  d  .05
      d  a  1.
      d  b  .8
      S
      b  d  .7
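
Viewing both the query and the instance as edge-labeled graphs turns query satisfaction into a search for a homomorphism from the query graph to the instance graph. The following Python sketch illustrates this on the slide's example by brute force; it is not the method of the talk, and the function names `satisfies` and `pqe` as well as the tuple encoding of edges are assumptions of this sketch.

```python
from itertools import product

# Instance from the slide: (label, source, target) -> probability.
tid = {
    ("R", "a", "b"): 0.1, ("R", "b", "c"): 0.1, ("R", "c", "d"): 0.05,
    ("R", "d", "a"): 1.0, ("R", "d", "b"): 0.8, ("S", "b", "d"): 0.7,
}
# Query ∃x y z t R(x,y) ∧ S(y,z) ∧ S(t,z) as a list of labeled edges.
query = [("R", "x", "y"), ("S", "y", "z"), ("S", "t", "z")]

def satisfies(world, query):
    """True iff some variable assignment maps every query edge to a world edge."""
    variables = sorted({v for _, u, w in query for v in (u, w)})
    domain = sorted({c for _, u, w in world for c in (u, w)})
    for values in product(domain, repeat=len(variables)):
        h = dict(zip(variables, values))
        if all((label, h[u], h[w]) in world for label, u, w in query):
            return True
    return False

def pqe(tid, query):
    """Naive PQE: sum the probability of every world that satisfies the query."""
    facts = list(tid)
    total = 0.0
    for keep in product([True, False], repeat=len(facts)):
        world = {f for f, k in zip(facts, keep) if k}
        prob = 1.0
        for f, k in zip(facts, keep):
            prob *= tid[f] if k else 1.0 - tid[f]
        # Skip zero-probability worlds to avoid useless homomorphism searches.
        if prob > 0 and satisfies(world, query):
            total += prob
    return total

p = pqe(tid, query)
print(p)
```

On this instance the only S-edge is b -S-> d, so any match must send y and t to b and z to d, and needs an R-edge into b (from a or from d). The result therefore agrees with .7 × (1 − (1 − .1) × (1 − .8)).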

  33. Restrict instances to trees
      $\mathcal{Q}$ = one-way paths (1WP), $\mathcal{I}$ = polytrees (PT)
      [Figure: a polytree instance I and a one-way path query Q, with edges labeled S or T, + prob. for each edge of the instance]
      Proposition: PQE of 1WP on PT is #P-hard

  34. Our graph classes
      [Figure: example graphs with edges labeled R, S, T for the classes 1WP, 2WP, DWT, and PT]
      Inclusions: 1WP ⊆ 2WP ⊆ PT ⊆ Connected ⊆ All, and 1WP ⊆ DWT ⊆ PT

  36. Results
      [Two tables, one for ≥ 2 labels and one for no labels. Rows: query class $\mathcal{Q}$; columns: instance class $\mathcal{I}$; both range over 1WP, 2WP, DWT, PT, Connected. Each cell is marked PTIME or #P-hard.]

  40. Led to a publication in PODS’2017
      Contributions:
      • Detailed study of the combined complexity of PQE
      • Focus on CQs on arity-two signatures
      • Showed the importance of various features on the problem: labels, global orientation, branching, connectedness
      • Established the complexity for all combinations of the graph classes we considered
