

Shannon-type Inequalities, Submodular Width, and Disjunctive Datalog
Hung Q. Ngo (Stealth Mode), with Mahmoud Abo Khamis and Dan Suciu @ PODS 2017
Table of Contents: Connecting the Dots; Output Size Bounds and Information Theory; Shannon-flow


  4. Detour: Tree Decompositions, Informally
     $S() \leftarrow R(a,b,d) \wedge c < d \wedge T(c,b,d) \wedge U(b,e) \wedge V(c,e) \wedge b + e = f \wedge W(b,e,g) \wedge g/f = h \wedge X(i,j,h) \wedge e - b = k.$
     Bags: $\{b,c,e\}$, $\{a,b,c,d\}$, $\{b,e,f,g,h\}$, $\{h,i,j\}$, $\{e,b,k\}$
     ◮ Every relation is covered by some bag
     ◮ Bags containing a given variable are connected

  6. Detour: Tree Decompositions, Formally
     ◮ Hypergraph $\mathcal{H} = ([n], \mathcal{E})$
     ◮ A Tree Decomposition of $\mathcal{H}$ is a pair $(T, \chi)$ where
       ◮ $T = (V(T), E(T))$ is a tree
       ◮ $\chi : V(T) \to 2^{[n]}$ assigns a bag $\chi(v)$ to each tree-node $v$
       ◮ Every hyperedge $F \in \mathcal{E}$ is covered by some bag ($F \subseteq \chi(v)$)
       ◮ For every $i \in [n]$, the bags containing $i$ form a subtree
     See [Gottlob et al 2016], Gems of PODS.
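The two conditions in the formal definition can be checked mechanically. The following is a minimal Python sketch, not part of the talk: the function names are illustrative, and the tree shape used for the informal example (bag $\{b,c,e\}$ adjacent to $\{a,b,c,d\}$, $\{b,e,f,g,h\}$ and $\{e,b,k\}$, with $\{h,i,j\}$ hanging off $\{b,e,f,g,h\}$) is an assumption, since the slide's figure is not recoverable from the text.

```python
def _connected(subset, adj):
    """True if the tree nodes in `subset` induce a connected subgraph."""
    subset = set(subset)
    if not subset:
        return True
    start = next(iter(subset))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w in subset and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == subset


def is_tree_decomposition(vertices, hyperedges, tree_edges, bags):
    """bags maps each tree node v to its bag chi(v), given as a set of vertices."""
    nodes = set(bags)
    adj = {v: set() for v in nodes}
    for u, v in tree_edges:
        adj[u].add(v)
        adj[v].add(u)
    # T must be a tree: connected, with |V(T)| - 1 edges.
    if len(tree_edges) != len(nodes) - 1 or not _connected(nodes, adj):
        return False
    # Every hyperedge F is covered by some bag.
    if not all(any(set(F) <= bags[v] for v in nodes) for F in hyperedges):
        return False
    # For every vertex i, the bags containing i form a subtree.
    return all(_connected({v for v in nodes if i in bags[v]}, adj) for i in vertices)


# Hyperedges of the informal example: one per atom (relations and predicates).
edges = ["abd", "cd", "cbd", "be", "ce", "bef", "beg", "gfh", "ijh", "ebk"]
bags = {0: set("bce"), 1: set("abcd"), 2: set("befgh"), 3: set("hij"), 4: set("ebk")}
good_tree = [(0, 1), (0, 2), (0, 4), (2, 3)]   # assumed tree shape
bad_tree = [(0, 1), (0, 2), (2, 3), (3, 4)]    # bags holding b are no longer connected

print(is_tree_decomposition("abcdefghijk", edges, good_tree, bags))
print(is_tree_decomposition("abcdefghijk", edges, bad_tree, bags))
```

In the second tree, the bag $\{e,b,k\}$ is attached to $\{h,i,j\}$, which contains neither $b$ nor $e$, so the connectedness condition fails.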

  12. Option 2: A Single Tree Decomposition
      Fix $(T, \chi)$. Multiple Conjunctive Rules.
      [Diagram: Database $D$ → $Q_i$ (iTime) → $Q_i(D)$ → $Q_o$ (oTime) → Output (Answer)]
      $Q_i$: $\bigwedge_{v \in V(T)} T_{\chi(v)} \leftarrow \bigwedge_{F \in \mathcal{E}} R_F$
      ◮ $Q_i(D)$ is Olteanu’s factorized database (FDB)
      ◮ oTime $= |Q_i(D)| + |\text{answer}|$
        ◮ Yannakakis for join, FDB/InsideOut for aggregates
      ◮ iTime $= \max_{D \models DC} |Q_i(D)| \le \max_{D \models DC} \max_{v \in V(T)} |P_v(D)|$
        ◮ $P_v : T_{\chi(v)} \leftarrow \bigwedge_{F \in \mathcal{E}} R_F$, for $v \in V(T)$
      $\min_{(T,\chi)} \max_{D \models CC} \max_{v \in V(T)} |P_v(D)| \le N^{\text{fhtw}(\mathcal{H})} \le N^{\text{ghtw}(\mathcal{H})} \le N^{\text{tw}(\mathcal{H})+1}$

  15. Option 3: Multiple Tree Decompositions
      [Diagram: Database $D$ → $Q_i$ (iTime) → $Q_i(D)$ → $Q_o$ (oTime) → Output (Answer)]
      $Q_i$: $\bigvee_{(T,\chi)} \bigwedge_{v \in V(T)} T_{\chi(v)} \leftarrow \bigwedge_{F \in \mathcal{E}} R_F$ (How to evaluate this?)
      ◮ $\bigvee$ ranges over non-redundant TDs $(T, \chi)$
      ◮ oTime $= |Q_i(D)| + |\text{answer}|$
        ◮ Union of Yannakakis on all TDs
      ◮ iTime $= \max_{D \models DC} |Q_i(D)| \le\ ?$

  19. Example: $Q : S() \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$
      [Figure: the 4-cycle on $A_1, A_2, A_3, A_4$ with edges $R_{12}, R_{23}, R_{34}, R_{41}$, and the tree decompositions used below]
      $(T_{123} \wedge T_{134}) \vee (T_{124} \wedge T_{234}) \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$
      By distributivity, rewrite the head equivalently:
      $(T_{123} \vee T_{124}) \wedge (T_{123} \vee T_{234}) \wedge (T_{134} \vee T_{124}) \wedge (T_{134} \vee T_{234}) \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$
      (Each “clause” has one bag per TD)
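The distributivity step amounts to picking one bag from every tree decomposition in all possible ways. A minimal Python sketch, assuming bags are written as strings of attribute indices (the name `head_clauses` is illustrative, not from the talk):

```python
from itertools import product


def head_clauses(tds):
    """Turn a disjunction of TDs (each a conjunction of bags) into clauses.

    Each clause is a disjunction of bags containing exactly one bag per TD.
    """
    return list(product(*tds))


# The two tree decompositions of the 4-cycle from the example.
tds = [["123", "134"], ["124", "234"]]
clauses = head_clauses(tds)
for clause in clauses:
    print(" v ".join("T" + bag for bag in clause))
```

With $k$ tree decompositions of $m$ bags each, this produces up to $m^k$ clauses, each of which becomes one disjunctive datalog rule over the same body.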

  21. Option 3: Multiple Tree Decompositions
      Multiple Disjunctive Datalog Rules!
      [Diagram: Database $D$ → $Q_i$ (iTime) → $Q_i(D)$ → $Q_o$ (oTime) → Output (Answer)]
      $Q_i$: $\bigwedge_{\mathcal{B}} \bigvee_{B \in \mathcal{B}} T_B \leftarrow \bigwedge_{F \in \mathcal{E}} R_F$
      ◮ oTime $= |Q_i(D)| + |\text{answer}|$
        ◮ Union of Yannakakis on all TDs
      ◮ iTime $= \max_{D \models DC} |Q_i(D)| \le \max_{D \models DC} \max_{\mathcal{B}} |P_{\mathcal{B}}(D)|$
        ◮ $P_{\mathcal{B}} : \bigvee_{B \in \mathcal{B}} T_B \leftarrow \bigwedge_{F \in \mathcal{E}} R_F$ (disjunctive datalog rule)
      $\max_{\mathcal{B}} \max_{D \models CC} |P_{\mathcal{B}}(D)| \le N^{\text{subw}(\mathcal{H})} \le N^{\text{fhtw}(\mathcal{H})}$
      subw = submodular width (Daniel Marx, JACM’2013)

  27. Example: $Q : S() \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$
      Option 1: iTime $= \max_{D \models DC} |Q_i(D)| = N^2$
        $Q_i$: $T_{1234} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$
      Option 2: iTime $= \max_{D \models DC} |Q_i(D)| = N^2$
        either $Q_i$: $T_{123} \wedge T_{134} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$
        or $Q_i$: $T_{124} \wedge T_{234} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$
      Option 3: iTime $= \max_{D \models DC} |Q_i(D)| = N^{3/2}$
        $Q_i$: $(T_{123} \wedge T_{134}) \vee (T_{124} \wedge T_{234}) \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$
        Equivalent to:
        $P_{123,124}$: $T_{123} \vee T_{124} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$
        $P_{123,234}$: $T_{123} \vee T_{234} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$
        $P_{134,124}$: $T_{134} \vee T_{124} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$
        $P_{134,234}$: $T_{134} \vee T_{234} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$
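To see why a single fixed tree decomposition can be forced to $N^2$, it helps to look at concrete inputs. The Python sketch below is an illustration under assumptions, not taken from the slides: it builds two hypothetical instances with $N$ tuples per relation and reports the size of each bag when it is filled with the projection of the full join. The helper names and the instances are made up for this example.

```python
def join_4cycle(R12, R23, R34, R41):
    """Naive nested-loop join for Q(A1,A2,A3,A4) <- R12, R23, R34, R41."""
    r41 = set(R41)
    return [(a1, a2, a3, a4)
            for (a1, a2) in R12
            for (b2, a3) in R23 if b2 == a2
            for (b3, a4) in R34 if b3 == a3 and (a4, a1) in r41]


def bag_size(tuples, attrs):
    """Size of the projection onto 1-based attribute indices, e.g. "123"."""
    return len({tuple(t[int(i) - 1] for i in attrs) for t in tuples})


def bag_sizes(instance):
    out = join_4cycle(*instance)
    return {bag: bag_size(out, bag) for bag in ("123", "134", "124", "234")}


N = 20
# Instance 1: A2 and A4 are constant, A1 and A3 each range over N values.
inst1 = ([(i, 0) for i in range(N)], [(0, j) for j in range(N)],
         [(j, 0) for j in range(N)], [(0, i) for i in range(N)])
# Instance 2: A1 and A3 are constant, A2 and A4 each range over N values.
inst2 = ([(0, j) for j in range(N)], [(j, 0) for j in range(N)],
         [(0, l) for l in range(N)], [(l, 0) for l in range(N)])

sizes1 = bag_sizes(inst1)
sizes2 = bag_sizes(inst2)
print(sizes1)
print(sizes2)
```

On the first instance the decomposition with bags 123 and 134 has $N^2$ tuples per bag while 124 and 234 have only $N$; on the second instance the roles swap. Whichever decomposition Option 2 fixes in advance, one of these inputs is bad for it, whereas every disjunctive rule of Option 3 has a small bag available on both inputs.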

  31. Roadmap
      Given degree constraints DC, and a disjunctive datalog rule
      $P : \bigvee_{B \in \mathcal{B}} T_B \leftarrow \bigwedge_{F \in \mathcal{E}} R_F$
      Question (Worst-case Output Size Bound): Find a good upper bound for $\max_{D \models DC} |P(D)|$
      Question (Algorithm): Design an algorithm evaluating $P$ within the bound.
      Question (Gathering fruits): Plug bound/algorithm into Meta Algorithm, what do we get?

  32. Table of Contents
      ◮ Connecting the Dots
      ◮ Output Size Bounds and Information Theory
      ◮ Shannon-flow Inequalities and the PANDA Algorithm
      ◮ Wrapping it up
      ◮ Appendix

  33. High-level View of the Bound
      Given degree constraints DC, a disjunctive datalog rule
      $P : \bigvee_{B \in \mathcal{B}} T_B \leftarrow \bigwedge_{F \in \mathcal{E}} R_F$
      We shall prove bounds of the form
      $\max_{D \models DC} \log |P(D)| \le$ some function of $h$
      s.t. $h$ is (approximately) entropic and $h$ satisfies degree constraints

  43. An Idea From Gottlob-Lee-Valiant-Valiant, JACM’12
      [Figure: the 4-cycle on $A_1, A_2, A_3, A_4$ with edges $R_{12}, R_{23}, R_{34}, R_{41}$]
      $Q(A_1, A_2, A_3, A_4) \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}.$
      Output $Q$, with the uniform distribution on its tuples:
        A1  A2  A3  A4  Pr
        a   1   d   4   1/4
        b   1   c   3   1/4
        b   1   d   4   1/4
        b   2   c   3   1/4
      $H_{\text{unif}}(A_1A_2A_3A_4) = \log |Q|$
      Input relations, with the induced marginal probability of each tuple:
        R12 (A1, A2): (a, 1) 1/4; (b, 1) 2/4; (b, 2) 1/4
        R23 (A2, A3): (1, c) 1/4; (1, d) 2/4; (2, c) 1/4
        R34 (A3, A4): (c, 3) 2/4; (d, 4) 2/4; (d, 5) 0
        R41 (A4, A1): (3, b) 2/4; (4, a) 1/4; (4, b) 1/4
      $H_{\text{unif}}(A_1A_2) \le \log |R_{12}|$, $H_{\text{unif}}(A_2A_3) \le \log |R_{23}|$, $H_{\text{unif}}(A_3A_4) \le \log |R_{34}|$, ...
      $H_{\text{unif}}(A_2 \mid A_1 = \text{‘a’}) \le \log |\sigma_{A_1 = \text{‘a’}} R_{12}|$, $H_{\text{unif}}(A_2 \mid A_1 = \text{‘b’}) \le \log |\sigma_{A_1 = \text{‘b’}} R_{12}|$, ...
      $H_{\text{unif}}(A_2 \mid A_1) \le \log \underbrace{\max_x |\sigma_{A_1 = x} R_{12}|}_{\deg_{R_{12}}(A_2 \mid A_1)}$
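The entropy facts on this slide can be reproduced numerically. The following Python sketch assumes the four relation instances as read off the slide's tables; the helper names `H` and `degree` are illustrative. It computes the join, places the uniform distribution on the output, and compares marginal and conditional entropies (in bits) with the cardinality and degree of $R_{12}$.

```python
from collections import Counter
from math import log2

# Relation instances from the slide's tables.
R12 = {("a", 1), ("b", 1), ("b", 2)}
R23 = {(1, "c"), (1, "d"), (2, "c")}
R34 = {("c", 3), ("d", 4), ("d", 5)}
R41 = {(3, "b"), (4, "a"), (4, "b")}

# Q(A1,A2,A3,A4) <- R12, R23, R34, R41, by naive nested loops.
Q = [(a1, a2, a3, a4)
     for (a1, a2) in R12
     for (b2, a3) in R23 if b2 == a2
     for (b3, a4) in R34 if b3 == a3 and (a4, a1) in R41]


def H(positions):
    """Marginal entropy of the uniform distribution on Q, on 0-based positions."""
    counts = Counter(tuple(t[p] for p in positions) for t in Q)
    return -sum(c / len(Q) * log2(c / len(Q)) for c in counts.values())


def degree(R):
    """Maximum number of tuples of a binary relation sharing the same first value."""
    return max(Counter(x for x, _ in R).values())


print(len(Q), H((0, 1, 2, 3)))           # H(A1 A2 A3 A4) equals log |Q|
print(H((0, 1)), log2(len(R12)))         # H(A1 A2) <= log |R12|
print(H((0, 1)) - H((0,)), log2(degree(R12)))  # H(A2 | A1) <= log deg(A2 | A1)
```

The conditional entropy is computed through the chain rule $H(A_2 \mid A_1) = H(A_1A_2) - H(A_1)$.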

  51. Entropic Bound for Full Conjunctive Queries
      ◮ $Q : T_{[n]} \leftarrow \bigwedge_{F \in \mathcal{E}} R_F$, and degree constraints DC
      ◮ $\max_{D \models DC} \log |Q(D)| \le \sup h([n])$
      ◮ subject to (whatever $H_{\text{unif}}$ satisfies):
        ◮ $h$ is Entropic
          ◮ There is some distribution on $A_{[n]}$ such that $h(X)$ is the marginal entropy on $A_X$, for all $X$
        ◮ $h$ satisfies DC
          ◮ $h(Y \mid X) \stackrel{\text{def}}{=} h(Y) - h(X) \le \log N_{Y|X}$, for $X \subset Y \subseteq F \in \mathcal{E}$
      ◮ Good Bound, but not computable!

  52. Hierarchy of Set Functions
      $h : 2^{[n]} \to \mathbb{R}_+$, non-negative, monotone ($h(X) \le h(Y)$ if $X \subseteq Y$), $h(\emptyset) = 0$
      ◮ $SA_n := \{h \mid h \text{ is sub-additive}\}$: $h(X \cup Y) \le h(X) + h(Y)$
      ◮ $\Gamma_n := \{h \mid h \text{ is submodular}\}$: $h(X \cup Y) + h(X \cap Y) \le h(X) + h(Y)$
      ◮ $\overline{\Gamma}^*_n$: topological closure of $\Gamma^*_n$
      ◮ $\Gamma^*_n = \{h : h \text{ is entropic}\}$
      ◮ $M_n$: Modular, $h(X) = \sum_{x \in X} h(x)$
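For small $n$, membership in these classes can be tested by brute force over all pairs of subsets. A minimal Python sketch follows; the function `classify` and the three example set functions are illustrative choices, not from the talk. Entropic membership is deliberately left out, since it is not a finite check of this kind.

```python
from itertools import combinations


def subsets(n):
    """All subsets of {0, ..., n-1} as frozensets."""
    return [frozenset(c) for k in range(n + 1) for c in combinations(range(n), k)]


def classify(h, n, eps=1e-9):
    """Brute-force the defining inequalities for a set function h on [n]."""
    S = subsets(n)
    pairs = [(X, Y) for X in S for Y in S]
    return {
        # Base class of the hierarchy: h(empty) = 0 and monotone.
        "monotone": h(frozenset()) == 0
        and all(h(X) <= h(Y) + eps for X, Y in pairs if X <= Y),
        "sub-additive": all(h(X | Y) <= h(X) + h(Y) + eps for X, Y in pairs),
        "submodular": all(h(X | Y) + h(X & Y) <= h(X) + h(Y) + eps for X, Y in pairs),
        "modular": all(abs(h(X) - sum(h(frozenset([x])) for x in X)) <= eps for X in S),
    }


def capped(X):
    """min(|X|, 2): submodular but not modular."""
    return min(len(X), 2)


def lumpy(X):
    """0 on the empty set, 2 on the full set of size 3, otherwise 1."""
    return 0 if not X else (2 if len(X) == 3 else 1)


for name, h in [("cardinality", len), ("capped", capped), ("lumpy", lumpy)]:
    print(name, classify(h, 3))
```

The three examples sit at different levels: cardinality is modular, `capped` is submodular but not modular, and `lumpy` is sub-additive but violates submodularity on two overlapping pairs.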

  54. Bounds for Full Conjunctive Query
      ◮ $HDC \stackrel{\text{def}}{=} \{ h \mid h(Y \mid X) \le \log N_{Y|X}, \ \forall (X, Y, N_{Y|X}) \}$
      ◮ Then,
        $\max_{D \models DC} \log |Q(D)| \le \max_{h \in \overline{\Gamma}^*_n \cap HDC} h([n])$ (entropic bound)
        $\le \max_{h \in \Gamma_n \cap HDC} h([n])$ (polymatroid bound)
        $\le \max_{h \in SA_n \cap HDC} h([n])$ (sub-additive bound).

  58. Size Bounds for Full Conjunctive Queries
      Definition
        Entropic Bound: $\log |Q| \le \max_{h \in \overline{\Gamma}^*_n \cap HDC} h([n])$
        Polymatroid Bound: $\log |Q| \le \max_{h \in \Gamma_n \cap HDC} h([n])$
      CC only
        Entropic Bound: AGM bound (Tight) [Atserias et al. FOCS’08]
        Polymatroid Bound: AGM bound (Tight) [Atserias et al. FOCS’08]
      CC + FD only
        Entropic Bound: Entropic Bound for FD [Gottlob et al. JACM’12] (Tight [Gogacz et al. ICDT’17])
        Polymatroid Bound: Polymatroid Bound for FD [Gottlob et al. JACM’12] (Not tight [our work])
      DC
        Entropic Bound: Entropic Bound for DC (Tight [our work])
        Polymatroid Bound: Polymatroid Bound for DC (Not tight [our work])

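  59. Disjunctive Datalog: Size Bounds
      $P : \bigvee_{B \in \mathcal{B}} T_B(A_B) \leftarrow \bigwedge_{F \in \mathcal{E}} R_F(A_F)$
      $|P(D)| \stackrel{\text{def}}{=} \min_{T : T \models P} \max_{B \in \mathcal{B}} |T_B|$
      Theorem (our work)
      $\max_{D \models DC} \log |P(D)| \le \underbrace{\max_{h \in \overline{\Gamma}^*_n \cap HDC} \min_{B \in \mathcal{B}} h(B)}_{\text{Entropic bound}}$ (Tight)
      $\le \underbrace{\max_{h \in \Gamma_n \cap HDC} \min_{B \in \mathcal{B}} h(B)}_{\text{Polymatroid bound}}$ (Not Tight)
      Imply all known bounds for (Full) Conjunctive Queries!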

  61. Earlier Example
      [Figure: the 4-cycle on $A_1, A_2, A_3, A_4$ with edges $R_{12}, R_{23}, R_{34}, R_{41}$]
      $P : \bigvee_{B \in \mathcal{B}} T_B(A_B) \leftarrow \bigwedge_{F \in \mathcal{E}} R_F(A_F)$
      $|P(D)| \stackrel{\text{def}}{=} \min_{T : T \models P} \max_{B \in \mathcal{B}} |T_B|$
      CC: $|R_{12}| \le N$, $|R_{23}| \le N$, $|R_{34}| \le N$, $|R_{41}| \le N$.
      $P_{123,234} : T_{123} \vee T_{234} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}.$
      $\max_{D \models CC} \log |P_{123,234}(D)| \le \max_{h \in \Gamma_n \cap CC} \min \{ h(A_1A_2A_3), h(A_2A_3A_4) \}$
      $\le \max_{h \in \Gamma_n \cap CC} \frac{1}{2} [ h(A_1A_2A_3) + h(A_2A_3A_4) ]$
      $\le \max_{h \in \Gamma_n \cap CC} \frac{1}{2} [ h(A_1A_2) + h(A_2A_3) + h(A_3A_4) ]$
      $\le \frac{3}{2} \log N.$
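The step from the two 3-attribute bags to the three binary relations is stated without justification. One way to fill it in, using only the submodularity inequality from the hierarchy slide, is sketched in the LaTeX block below; this derivation is a reconstruction and is not shown on the slide.

```latex
\begin{align*}
h(A_1A_2A_3) + h(A_2) &\le h(A_1A_2) + h(A_2A_3)
  && \text{(submodularity, } X = A_1A_2,\ Y = A_2A_3\text{)} \\
h(A_2A_3A_4) + h(\emptyset) &\le h(A_2) + h(A_3A_4)
  && \text{(submodularity, } X = A_2,\ Y = A_3A_4\text{)} \\
\intertext{Adding the two lines and using $h(\emptyset) = 0$ cancels $h(A_2)$:}
h(A_1A_2A_3) + h(A_2A_3A_4) &\le h(A_1A_2) + h(A_2A_3) + h(A_3A_4) \le 3 \log N .
\end{align*}
```

The last inequality applies the cardinality constraints $h(A_iA_{i+1}) \le \log N$ to each of the three terms; halving both sides gives the $\frac{3}{2} \log N$ bound.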

  62. Table of Contents
      ◮ Connecting the Dots
      ◮ Output Size Bounds and Information Theory
      ◮ Shannon-flow Inequalities and the PANDA Algorithm
      ◮ Wrapping it up
      ◮ Appendix

  63. Roadmap
      Given degree constraints DC and a disjunctive datalog rule
      $P : \bigvee_{B \in \mathcal{B}} T_B \leftarrow \bigwedge_{F \in \mathcal{E}} R_F$
      Answer (Worst-case Output Size Bound): $\max_{D \models DC} \log |P(D)| \le \max_{h \in \Gamma_n \cap HDC} \min_{B \in \mathcal{B}} h(B) =$ polymatroid bound.
      Question (Algorithm): Compute a model for $P$ within $\tilde{O}(2^{\text{polymatroid bound}})$
      Question (Gathering fruits): Plug bound/algorithm into Meta Algorithm, what do we get?

  66. Connection to Shannon-flow Inequalities
      Lemma (Linearize it)
      There exists non-negative $\lambda = (\lambda_B)_{B \in \mathcal{B}}$, with $\|\lambda\|_1 = 1$, s.t.
      $\max_{h \in \Gamma_n \cap HDC} \min_{B \in \mathcal{B}} h(B) = \max_{h \in \Gamma_n \cap HDC} \sum_{B \in \mathcal{B}} \lambda_B h(B)$ (1)
      Lemma (Shannon-flow inequality)
      There exists $\delta \ge 0$ s.t. $2^{\text{polymatroid bound}} = \prod_{(X,Y,N_{Y|X})} N_{Y|X}^{\delta_{Y|X}}$, and
      $\sum_{B \in \mathcal{B}} \lambda_B \cdot h(B) \le \sum_{(X,Y,N_{Y|X})} \delta_{Y|X} \cdot h(Y \mid X), \ \forall h \in \Gamma_n$ (2)
      (2) is a (vast) generalization of Shearer’s lemma
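For the 4-cycle rule $P_{123,234}$ under cardinality constraints, one choice of $\lambda$ and $\delta$ that instantiates both lemmas is written out below in LaTeX. The specific coefficients are an illustration worked out for this example, not values given on the slides.

```latex
\begin{align*}
&\lambda_{123} = \lambda_{234} = \tfrac{1}{2}, \qquad
 \delta_{A_1A_2 \mid \emptyset} = \delta_{A_2A_3 \mid \emptyset}
   = \delta_{A_3A_4 \mid \emptyset} = \tfrac{1}{2}, \qquad
 \delta_{A_4A_1 \mid \emptyset} = 0, \\
&\tfrac{1}{2}\, h(A_1A_2A_3) + \tfrac{1}{2}\, h(A_2A_3A_4)
 \;\le\; \tfrac{1}{2}\, h(A_1A_2) + \tfrac{1}{2}\, h(A_2A_3) + \tfrac{1}{2}\, h(A_3A_4)
 \qquad \forall h \in \Gamma_n, \\
&2^{\text{polymatroid bound}} \;=\; N^{1/2} \cdot N^{1/2} \cdot N^{1/2} \;=\; N^{3/2}.
\end{align*}
```

The bound is attained by the modular function $h(X) = \frac{|X|}{2} \log N$, which satisfies every cardinality constraint with equality and gives $h(A_1A_2A_3) = h(A_2A_3A_4) = \frac{3}{2} \log N$.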

  68. PANDA (Proof-Assisted eNtropic Degree-Aware)
      ◮ What?
        ◮ Compute a model for our disjunctive datalog rule
        ◮ Run within $\tilde{O}\left(2^{\text{polymatroid bound}}\right) = \tilde{O}\left(\prod_{(X,Y,N_{Y|X})} N_{Y|X}^{\delta_{Y|X}}\right)$
      ◮ How? Proof as symbolic instructions
        ◮ Construct a Proof Sequence for the corresponding Shannon-flow inequality
        ◮ Proof steps → relational operators.
