Shannon-type Inequalities, Submodular Width, and Disjunctive Datalog

Detour: Tree Decompositions, Informally

$S() \leftarrow R(a,b,d) \wedge c < d \wedge T(c,b,d) \wedge U(b,e) \wedge V(c,e) \wedge b + e = f \wedge W(b,e,g) \wedge g/f = h \wedge X(i,j,h) \wedge e - b = k.$

Bags: {b, c, e}, {a, b, c, d}, {b, e, f, g, h}, {h, i, j}, {e, b, k}

◮ Every relation is covered by some bag
◮ Bags containing a given variable are connected (highlighted on the slide for the variable b)

Detour: Tree Decompositions, Formally

◮ Hypergraph $H = ([n], \mathcal{E})$
◮ A tree decomposition of $H$ is a pair $(T, \chi)$ where
  ◮ $T = (V(T), E(T))$ is a tree
  ◮ $\chi : V(T) \to 2^{[n]}$ assigns a bag $\chi(v)$ to each tree node $v$
  ◮ Every hyperedge $F \in \mathcal{E}$ is covered by some bag ($F \subseteq \chi(v)$)
  ◮ For every $i \in [n]$, the bags containing $i$ form a subtree

See [Gottlob et al 2016], Gems of PODS.
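The two conditions can be checked mechanically. The sketch below is a minimal illustration rather than anything from the talk: the function name `is_tree_decomposition` and the tree edges between the bags are assumptions (the slide's drawing of the tree did not survive), while the atoms and bags are those of the informal example.

```python
def is_tree_decomposition(hyperedges, bags, tree_edges):
    """Check that (tree_edges, bags) is a tree decomposition of the hyperedges.

    hyperedges: iterable of sets of variables (one per atom of the rule)
    bags:       dict mapping tree node -> set of variables (the map chi)
    tree_edges: list of (node, node) pairs
    """
    nodes = list(bags)
    adj = {v: set() for v in nodes}
    for u, v in tree_edges:
        adj[u].add(v)
        adj[v].add(u)

    def connected(subset):
        # Depth-first search restricted to the given set of tree nodes.
        subset = set(subset)
        if not subset:
            return True
        start = next(iter(subset))
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w in subset and w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen == subset

    # T must be a tree: connected with |V| - 1 edges.
    if len(tree_edges) != len(nodes) - 1 or not connected(nodes):
        return False
    # Every hyperedge is covered by some bag.
    if not all(any(set(F) <= bag for bag in bags.values()) for F in hyperedges):
        return False
    # For every variable, the bags containing it form a subtree.
    variables = set().union(*bags.values())
    return all(connected(v for v in nodes if x in bags[v]) for x in variables)


# Variable sets of the atoms in the informal example.
hyperedges = [set(s) for s in
              ["abd", "cd", "cbd", "be", "ce", "bef", "beg", "gfh", "ijh", "ebk"]]
bags = {0: set("bce"), 1: set("abcd"), 2: set("befgh"), 3: set("hij"), 4: set("ebk")}
# ASSUMPTION: one tree shape that is consistent with the bags above.
tree_edges = [(0, 1), (0, 2), (0, 4), (2, 3)]
print(is_tree_decomposition(hyperedges, bags, tree_edges))
```

Re-attaching the bag {h, i, j} to {a, b, c, d} instead of {b, e, f, g, h} breaks the subtree condition for h, which the checker reports.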

Option 2: A Single Tree Decomposition

Fix $(T, \chi)$. $Q_i$ consists of multiple conjunctive rules:

$Q_i : \bigwedge_{v \in V(T)} T_{\chi(v)} \leftarrow \bigwedge_{F \in \mathcal{E}} R_F$

[Diagram: Database $D$ → $Q_i$ → $Q_i(D)$ → $Q_o$ → Output (answer); $Q_i$ is charged iTime, $Q_o$ is charged oTime.]

◮ $Q_i(D)$ is Olteanu's factorized database (FDB)
◮ oTime $= |Q_i(D)| + |\text{answer}|$
  ◮ Yannakakis for join, FDB/InsideOut for aggregates
◮ iTime $= \max_{D \models DC} |Q_i(D)| \le \max_{D \models DC} \max_{v \in V(T)} |P_v(D)|$
  ◮ $P_v : T_{\chi(v)} \leftarrow \bigwedge_{F \in \mathcal{E}} R_F$
◮ $\min_{(T,\chi)} \max_{D \models CC} \max_{v \in V(T)} |P_v(D)| \le N^{\mathrm{fhtw}(H)} \le N^{\mathrm{ghtw}(H)} \le N^{\mathrm{tw}(H)+1}$

Option 3: Multiple Tree Decompositions

$Q_i : \bigvee_{(T,\chi)} \bigwedge_{v \in V(T)} T_{\chi(v)} \leftarrow \bigwedge_{F \in \mathcal{E}} R_F$   (How to evaluate this?)

[Diagram: Database $D$ → $Q_i$ → $Q_i(D)$ → $Q_o$ → Output (answer); $Q_i$ is charged iTime, $Q_o$ is charged oTime.]

◮ $\bigvee_{(T,\chi)}$ ranges over non-redundant TDs $(T, \chi)$
◮ oTime $= |Q_i(D)| + |\text{answer}|$
  ◮ Union of Yannakakis on all TDs
◮ iTime $= \max_{D \models DC} |Q_i(D)| \le\ ?$

Example: $Q : S() \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$

[Figure: the 4-cycle on $A_1, A_2, A_3, A_4$ with edges $R_{12}, R_{23}, R_{34}, R_{41}$, and its two tree decompositions, one with bags $\{A_1,A_2,A_3\}, \{A_1,A_3,A_4\}$ and one with bags $\{A_1,A_2,A_4\}, \{A_2,A_3,A_4\}$.]

$(T_{123} \wedge T_{134}) \vee (T_{124} \wedge T_{234}) \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$

By distributivity, rewrite the head into an equivalent form:

$(T_{123} \vee T_{124}) \wedge (T_{123} \vee T_{234}) \wedge (T_{134} \vee T_{124}) \wedge (T_{134} \vee T_{234}) \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$

(Each "clause" has one bag per TD)
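The distributivity step amounts to taking one bag from each tree decomposition in every possible way. A minimal sketch of that enumeration follows; the function name `head_clauses` and the string encoding of bags are illustrative choices, not notation from the slides.

```python
from itertools import product


def head_clauses(tree_decompositions):
    """Turn an OR of ANDs (one AND of bags per TD) into an AND of ORs.

    Each resulting clause picks exactly one bag from every TD.
    """
    return [frozenset(choice) for choice in product(*tree_decompositions)]


# The two tree decompositions of the 4-cycle, each given by its bags.
tds = [("T123", "T134"), ("T124", "T234")]
clauses = head_clauses(tds)
for clause in clauses:
    print(" v ".join(sorted(clause)))
```

With k tree decompositions of at most b bags each, this produces at most b^k clauses, one disjunctive rule per clause.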

Option 3: Multiple Tree Decompositions

Multiple Disjunctive Datalog Rules!

$Q_i : \bigwedge_{\mathcal{B}} \bigvee_{B \in \mathcal{B}} T_B \leftarrow \bigwedge_{F \in \mathcal{E}} R_F$

[Diagram: Database $D$ → $Q_i$ → $Q_i(D)$ → $Q_o$ → Output (answer); $Q_i$ is charged iTime, $Q_o$ is charged oTime.]

◮ oTime $= |Q_i(D)| + |\text{answer}|$
  ◮ Union of Yannakakis on all TDs
◮ iTime $= \max_{D \models DC} |Q_i(D)| \le \max_{D \models DC} \max_{\mathcal{B}} |P_{\mathcal{B}}(D)|$
  ◮ $P_{\mathcal{B}} : \bigvee_{B \in \mathcal{B}} T_B \leftarrow \bigwedge_{F \in \mathcal{E}} R_F$   (disjunctive datalog rule)
◮ $\max_{D \models CC} \max_{\mathcal{B}} |P_{\mathcal{B}}(D)| \le N^{\mathrm{subw}(H)} \le N^{\mathrm{fhtw}(H)}$

subw = submodular width (Daniel Marx, JACM'2013)

Example: $Q : S() \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$

Option 1: iTime $= \max_{D \models DC} |Q_i(D)| = N^2$
  $Q_i : T_{1234} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$

Option 2: iTime $= \max_{D \models DC} |Q_i(D)| = N^2$
  either $Q_i : T_{123} \wedge T_{134} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$
  or $Q_i : T_{124} \wedge T_{234} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$

Option 3: iTime $= \max_{D \models DC} |Q_i(D)| = N^{3/2}$
  $Q_i : (T_{123} \wedge T_{134}) \vee (T_{124} \wedge T_{234}) \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$
  Equivalent to:
  $P_{123,124} : T_{123} \vee T_{124} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$
  $P_{123,234} : T_{123} \vee T_{234} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$
  $P_{134,124} : T_{134} \vee T_{124} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$
  $P_{134,234} : T_{134} \vee T_{234} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$
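To see why a single fixed tree decomposition can cost $N^2$ in Option 2, it helps to look at one concrete instance. The instance below is an illustrative assumption, not one taken from the slides: every relation has $N$ tuples, the bags $\{1,2,3\}$ and $\{1,3,4\}$ each receive $N^2$ tuples, while the bags of the other decomposition receive only $N$.

```python
def four_cycle_join(R12, R23, R34, R41):
    """Naive evaluation of R12(A1,A2), R23(A2,A3), R34(A3,A4), R41(A4,A1)."""
    out = set()
    for a1, a2 in R12:
        for b2, a3 in R23:
            if b2 != a2:
                continue
            for b3, a4 in R34:
                if b3 == a3 and (a4, a1) in R41:
                    out.add((a1, a2, a3, a4))
    return out


def project(tuples, positions):
    """Project a set of tuples onto the given (0-based) positions."""
    return {tuple(t[p] for p in positions) for t in tuples}


N = 5
# ASSUMPTION: a hand-picked instance where A1 and A3 range freely, A2 = A4 = 0.
R12 = {(i, 0) for i in range(N)}
R23 = {(0, j) for j in range(N)}
R34 = {(j, 0) for j in range(N)}
R41 = {(0, i) for i in range(N)}

full = four_cycle_join(R12, R23, R34, R41)
bag_positions = {"123": (0, 1, 2), "134": (0, 2, 3), "124": (0, 1, 3), "234": (1, 2, 3)}
sizes = {bag: len(project(full, pos)) for bag, pos in bag_positions.items()}
print(len(full), sizes)
```

Swapping the roles of the variables gives a symmetric instance that is bad for the other decomposition, so neither fixed choice avoids the quadratic worst case.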

Roadmap

Given degree constraints DC, and a disjunctive datalog rule

$P : \bigvee_{B \in \mathcal{B}} T_B \leftarrow \bigwedge_{F \in \mathcal{E}} R_F$

Question (Worst-case Output Size Bound): Find a good upper bound for $\max_{D \models DC} |P(D)|$.

Question (Algorithm): Design an algorithm evaluating $P$ within the bound.

Question (Gathering fruits): Plug bound/algorithm into Meta Algorithm, what do we get?

Table of Contents
◮ Connecting the Dots
◮ Output Size Bounds and Information Theory
◮ Shannon-flow Inequalities and the PANDA Algorithm
◮ Wrapping it up
◮ Appendix

High-level View of the Bound

Given degree constraints DC and a disjunctive datalog rule

$P : \bigvee_{B \in \mathcal{B}} T_B \leftarrow \bigwedge_{F \in \mathcal{E}} R_F$

we shall prove bounds of the form

$\max_{D \models DC} \log |P(D)| \le$ some function of $h$

such that $h$ is (approximately) entropic and $h$ satisfies the degree constraints.

An Idea From Gottlob-Lee-Valiant-Valiant, JACM'12

$Q(A_1, A_2, A_3, A_4) \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}.$

[Figure: the 4-cycle on $A_1, A_2, A_3, A_4$ with edges $R_{12}, R_{23}, R_{34}, R_{41}$.]

Output $Q$, with the uniform distribution on its tuples:

  A1  A2  A3  A4  | Pr
  a   1   d   4   | 1/4
  b   1   c   3   | 1/4
  b   1   d   4   | 1/4
  b   2   c   3   | 1/4

$H_{\mathrm{unif}}(A_1 A_2 A_3 A_4) = \log |Q|$

Input relations, each tuple annotated with its marginal probability:

  R12: A1  A2 | Pr      R23: A2  A3 | Pr      R34: A3  A4 | Pr      R41: A4  A1 | Pr
       a   1  | 1/4          1   c  | 1/4          c   3  | 2/4          3   b  | 2/4
       b   1  | 2/4          1   d  | 2/4          d   4  | 2/4          4   a  | 1/4
       b   2  | 1/4          2   c  | 1/4          d   5  | 0            4   b  | 1/4

$H_{\mathrm{unif}}(A_1 A_2) \le \log |R_{12}|$, $H_{\mathrm{unif}}(A_2 A_3) \le \log |R_{23}|$, $H_{\mathrm{unif}}(A_3 A_4) \le \log |R_{34}|$, ...

$H_{\mathrm{unif}}(A_2 \mid A_1 = \text{'a'}) \le \log |\sigma_{A_1 = \text{'a'}} R_{12}|$, $H_{\mathrm{unif}}(A_2 \mid A_1 = \text{'b'}) \le \log |\sigma_{A_1 = \text{'b'}} R_{12}|$, ...

$H_{\mathrm{unif}}(A_2 \mid A_1) \le \log \max_x |\sigma_{A_1 = x} R_{12}| = \log \deg_{R_{12}}(A_2 \mid A_1)$
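The numbers on this slide can be reproduced directly. The following sketch hard-codes the four output tuples and the relation $R_{12}$ from the tables, computes entropies of the uniform distribution on the output, and checks the cardinality and degree inequalities; the helper name `h_unif` is an illustrative choice.

```python
from collections import Counter
from math import log2

# Output of the 4-cycle query on the slide's instance (columns A1, A2, A3, A4).
Q = [("a", 1, "d", 4), ("b", 1, "c", 3), ("b", 1, "d", 4), ("b", 2, "c", 3)]
R12 = {("a", 1), ("b", 1), ("b", 2)}


def h_unif(positions):
    """Entropy (in bits) of the marginal on the given columns, uniform over Q."""
    counts = Counter(tuple(t[p] for p in positions) for t in Q)
    n = len(Q)
    return -sum((c / n) * log2(c / n) for c in counts.values())


h_all = h_unif((0, 1, 2, 3))        # entropy of the full output tuple
h12 = h_unif((0, 1))                # marginal entropy on (A1, A2)
h2_given_1 = h12 - h_unif((0,))     # H(A2 | A1) = H(A1 A2) - H(A1)

# Maximum degree of A1 in R12: the largest number of A2-values per A1-value.
max_deg = max(Counter(a1 for a1, _ in R12).values())

print(h_all, log2(len(Q)))
print(h12, log2(len(R12)))
print(h2_given_1, log2(max_deg))
```

The marginal on $(A_1, A_2)$ has probabilities 1/4, 2/4, 1/4, matching the annotated $R_{12}$ table.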

Entropic Bound for Full Conjunctive Queries

◮ $Q : T_{[n]} \leftarrow \bigwedge_{F \in \mathcal{E}} R_F$, and degree constraints DC
◮ $\max_{D \models DC} \log |Q(D)| \le \sup h([n])$
◮ subject to (whatever $H_{\mathrm{unif}}$ satisfies):
  ◮ $h$ is entropic
    ◮ There is some distribution on $A_{[n]}$ such that $h(X)$ is the marginal entropy on $A_X$, for all $X$
  ◮ $h$ satisfies DC
    ◮ $h(Y \mid X) \stackrel{\mathrm{def}}{=} h(Y) - h(X) \le \log N_{Y|X}$, for $X \subset Y \subseteq F \in \mathcal{E}$
◮ Good bound, but not computable!

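Hierarchy of Set Functions

All classes consist of set functions $h : 2^{[n]} \to \mathbb{R}_+$ that are non-negative, monotone ($h(X) \le h(Y)$ if $X \subseteq Y$), and satisfy $h(\emptyset) = 0$. The classes are nested, listed here from largest to smallest:

◮ $\mathrm{SA}_n := \{h \mid h \text{ is sub-additive}\}$: $h(X \cup Y) \le h(X) + h(Y)$
◮ $\Gamma_n := \{h \mid h \text{ is submodular}\}$: $h(X \cup Y) + h(X \cap Y) \le h(X) + h(Y)$
◮ $\overline{\Gamma}^*_n$: topological closure of $\Gamma^*_n$
◮ $\Gamma^*_n = \{h : h \text{ is entropic}\}$
◮ $\mathrm{M}_n$: modular, $h(X) = \sum_{x \in X} h(x)$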
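For small ground sets, membership in the combinatorially defined classes can be tested by brute force. The sketch below is an illustration under that assumption (it enumerates all pairs of subsets, so it is only usable for small $n$); the function names and the three sample functions are not from the slides.

```python
from itertools import combinations

EPS = 1e-9  # tolerance for floating-point set functions


def subsets(ground):
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_monotone(h, ground):
    S = subsets(ground)
    return all(h(X) <= h(Y) + EPS for X in S for Y in S if X <= Y)


def is_subadditive(h, ground):
    S = subsets(ground)
    return all(h(X | Y) <= h(X) + h(Y) + EPS for X in S for Y in S)


def is_submodular(h, ground):
    S = subsets(ground)
    return all(h(X | Y) + h(X & Y) <= h(X) + h(Y) + EPS for X in S for Y in S)


def is_modular(h, ground):
    return all(abs(h(X) - sum(h(frozenset([x])) for x in X)) <= EPS
               for X in subsets(ground))


ground = {1, 2, 3}
card = lambda X: len(X)             # modular
rank = lambda X: min(len(X), 2)     # submodular but not modular
square = lambda X: len(X) ** 2      # monotone but not sub-additive

for name, h in [("card", card), ("rank", rank), ("square", square)]:
    print(name, is_monotone(h, ground), is_subadditive(h, ground),
          is_submodular(h, ground), is_modular(h, ground))
```

Deciding whether a function is entropic is a different matter and is not attempted here.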

Bounds for Full Conjunctive Query

◮ $\mathrm{HDC} \stackrel{\mathrm{def}}{=} \{\, h \mid h(Y \mid X) \le \log N_{Y|X},\ \forall (X, Y, N_{Y|X}) \in \mathrm{DC} \,\}$

Size Bounds for Full Conjunctive Queries

Bound         | Entropic Bound                                                        | Polymatroid Bound
Definition    | $\log |Q| \le \max_{h \in \overline{\Gamma}^*_n \cap \mathrm{HDC}} h([n])$ | $\log |Q| \le \max_{h \in \Gamma_n \cap \mathrm{HDC}} h([n])$
CC only       | AGM bound (Tight) [Atserias et al. FOCS'08]                           | AGM bound (Tight) [Atserias et al. FOCS'08]
CC + FD only  | Entropic Bound for FD [Gottlob et al. JACM'12] (Tight [Gogacz et al. ICDT'17]) | Polymatroid Bound for FD [Gottlob et al. JACM'12] (Not tight [our work])
DC            | Entropic Bound for DC (Tight [our work])                              | Polymatroid Bound for DC (Not tight [our work])

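Disjunctive Datalog: Size Bounds

$P : \bigvee_{B \in \mathcal{B}} T_B(A_B) \leftarrow \bigwedge_{F \in \mathcal{E}} R_F(A_F)$

$|P(D)| \stackrel{\mathrm{def}}{=} \min_{T :\, T \models P} \max_{B \in \mathcal{B}} |T_B|$

Theorem (our work)

$\max_{D \models DC} \log |P(D)| \le \max_{h \in \overline{\Gamma}^*_n \cap \mathrm{HDC}} \min_{B \in \mathcal{B}} h(B)$   (entropic bound, tight)

$\le \max_{h \in \Gamma_n \cap \mathrm{HDC}} \min_{B \in \mathcal{B}} h(B)$   (polymatroid bound, not tight)

These bounds imply all known bounds for (Full) Conjunctive Queries!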
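The definition of $|P(D)|$ asks for the smallest model: target tables such that every tuple satisfying the body lands in at least one $T_B$. For the rule $T_{123} \vee T_{234} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$, a textbook heavy/light partition on $A_2$ already yields a model with both tables of size at most $N^{3/2}$. The sketch below illustrates that idea only; it is not claimed to be the algorithm of the talk, and all function names and the random test instance are assumptions.

```python
import random
from collections import defaultdict
from math import isqrt


def is_model(T123, T234, R12, R23, R34, R41):
    """True iff every body tuple is covered by T123 or by T234."""
    a3s_of = defaultdict(list)
    for a2, a3 in R23:
        a3s_of[a2].append(a3)
    a4s_of = defaultdict(list)
    for a3, a4 in R34:
        a4s_of[a3].append(a4)
    for a1, a2 in R12:
        for a3 in a3s_of.get(a2, ()):
            for a4 in a4s_of.get(a3, ()):
                if ((a4, a1) in R41 and (a1, a2, a3) not in T123
                        and (a2, a3, a4) not in T234):
                    return False
    return True


def heavy_light_model(R12, R23, R34, N):
    """Build (T123, T234), assuming |R12|, |R23|, |R34| <= N."""
    threshold = isqrt(N)
    a1s_of = defaultdict(set)
    for a1, a2 in R12:
        a1s_of[a2].add(a1)
    # Heavy A2-values have more than sqrt(N) partners in R12; fewer than sqrt(N) exist.
    heavy = {a2 for a2, a1s in a1s_of.items() if len(a1s) > threshold}
    # Light A2: extend each R23 tuple by its (at most sqrt(N)) A1 partners.
    T123 = {(a1, a2, a3) for a2, a3 in R23 if a2 not in heavy
            for a1 in a1s_of.get(a2, ())}
    # Heavy A2: pair each heavy value with the R34 tuples that R23 allows.
    T234 = {(a2, a3, a4) for a2 in heavy for a3, a4 in R34 if (a2, a3) in R23}
    return T123, T234


# ASSUMPTION: a small random instance, seeded for reproducibility.
random.seed(0)
N, dom = 36, 8
pairs = [(x, y) for x in range(dom) for y in range(dom)]
R12, R23, R34, R41 = (set(random.sample(pairs, N)) for _ in range(4))

T123, T234 = heavy_light_model(R12, R23, R34, N)
print(len(T123), len(T234), is_model(T123, T234, R12, R23, R34, R41))
```

Note that $R_{41}$ is not needed to build the model; it only shrinks the set of body tuples that must be covered.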

Earlier Example

$P : \bigvee_{B \in \mathcal{B}} T_B(A_B) \leftarrow \bigwedge_{F \in \mathcal{E}} R_F(A_F)$

$|P(D)| \stackrel{\mathrm{def}}{=} \min_{T :\, T \models P} \max_{B \in \mathcal{B}} |T_B|$

[Figure: the 4-cycle on $A_1, A_2, A_3, A_4$ with edges $R_{12}, R_{23}, R_{34}, R_{41}$.]

CC: $|R_{12}| \le N$, $|R_{23}| \le N$, $|R_{34}| \le N$, $|R_{41}| \le N$.

$P_{123,234} : T_{123} \vee T_{234} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}.$

$\max_{D \models CC} \log |P_{123,234}(D)| \le \max_{h \in \Gamma_n \cap CC} \min\{h(A_1A_2A_3),\ h(A_2A_3A_4)\}$
$\le \max_{h \in \Gamma_n \cap CC} \tfrac{1}{2}\,[h(A_1A_2A_3) + h(A_2A_3A_4)]$
$\le \max_{h \in \Gamma_n \cap CC} \tfrac{1}{2}\,[h(A_1A_2) + h(A_2A_3) + h(A_3A_4)]$
$\le \tfrac{3}{2} \log N.$
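The middle steps of this chain are stated without justification. One possible expansion, using only "minimum is at most the average", submodularity of $h \in \Gamma_n$, and $h(\emptyset) = 0$, is sketched here:

```latex
\begin{align*}
\min\{h(A_1A_2A_3),\, h(A_2A_3A_4)\}
  &\le \tfrac12\bigl[h(A_1A_2A_3) + h(A_2A_3A_4)\bigr]
  && \text{(min $\le$ average)}\\
h(A_1A_2A_3) + h(A_2)
  &\le h(A_1A_2) + h(A_2A_3)
  && \text{(submodularity with $X = A_1A_2$, $Y = A_2A_3$)}\\
h(A_2A_3A_4)
  &\le h(A_2) + h(A_3A_4)
  && \text{(submodularity with $X = A_2$, $Y = A_3A_4$, $h(\emptyset) = 0$)}\\
h(A_1A_2A_3) + h(A_2A_3A_4)
  &\le h(A_1A_2) + h(A_2A_3) + h(A_3A_4)
  && \text{(sum of the two lines above)}\\
  &\le 3 \log N
  && \text{(cardinality constraints)}
\end{align*}
```

Halving the last two lines gives the $\tfrac32 \log N$ bound.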

Roadmap

Given degree constraints DC and a disjunctive datalog rule

$P : \bigvee_{B \in \mathcal{B}} T_B \leftarrow \bigwedge_{F \in \mathcal{E}} R_F$

Answer (Worst-case Output Size Bound): $\max_{D \models DC} \log |P(D)| \le \max_{h \in \Gamma_n \cap \mathrm{HDC}} \min_{B \in \mathcal{B}} h(B)$ = polymatroid bound.

Question (Algorithm): Compute a model for $P$ within $\tilde{O}(2^{\text{polymatroid bound}})$.

Question (Gathering fruits): Plug bound/algorithm into Meta Algorithm, what do we get?

Connection to Shannon-flow Inequalities

Lemma (Linearize it). There exists non-negative $\lambda = (\lambda_B)_{B \in \mathcal{B}}$, with $\|\lambda\|_1 = 1$, s.t.

$\max_{h \in \Gamma_n \cap \mathrm{HDC}} \min_{B \in \mathcal{B}} h(B) = \max_{h \in \Gamma_n \cap \mathrm{HDC}} \sum_{B \in \mathcal{B}} \lambda_B\, h(B)$   (1)

Lemma (Shannon-flow inequality). There exists $\delta \ge 0$ s.t. $2^{\text{polymatroid bound}} = \prod_{(X,Y,N_{Y|X}) \in \mathrm{DC}} N_{Y|X}^{\delta_{Y|X}}$, and

$\sum_{B \in \mathcal{B}} \lambda_B \cdot h(B) \le \sum_{(X,Y,N_{Y|X}) \in \mathrm{DC}} \delta_{Y|X} \cdot h(Y \mid X), \quad \forall h \in \Gamma_n$   (2)

(2) is a (vast) generalization of Shearer's lemma.
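For the 4-cycle rule $T_{123} \vee T_{234}$, the earlier derivation corresponds to $\lambda_{123} = \lambda_{234} = \tfrac12$ and $\delta = \tfrac12$ on each of the unconditional terms $h(A_1A_2)$, $h(A_2A_3)$, $h(A_3A_4)$. Since entropic functions are submodular, inequality (2) with these coefficients must hold for the entropy function of any joint distribution. The sketch below spot-checks this numerically on random distributions over four binary variables; it is a sanity check under these assumed coefficients, not a proof, and the helper names are illustrative.

```python
import itertools
import math
import random


def entropy_function(p):
    """Return h, where h(X) is the entropy (bits) of the marginal on positions X."""
    def h(X):
        marginal = {}
        for outcome, pr in p.items():
            key = tuple(outcome[i] for i in X)
            marginal[key] = marginal.get(key, 0.0) + pr
        return -sum(q * math.log2(q) for q in marginal.values() if q > 0)
    return h


def random_distribution(n, rng):
    """A random joint distribution over n binary variables."""
    outcomes = list(itertools.product((0, 1), repeat=n))
    weights = [rng.random() for _ in outcomes]
    total = sum(weights)
    return {o: w / total for o, w in zip(outcomes, weights)}


def shannon_flow_gap(h):
    """Right-hand side minus left-hand side of (2) for the 4-cycle example."""
    lhs = 0.5 * h((0, 1, 2)) + 0.5 * h((1, 2, 3))
    rhs = 0.5 * (h((0, 1)) + h((1, 2)) + h((2, 3)))
    return rhs - lhs


rng = random.Random(0)
gaps = [shannon_flow_gap(entropy_function(random_distribution(4, rng)))
        for _ in range(200)]
print(min(gaps) >= -1e-9)
```

A negative gap beyond floating-point tolerance would indicate a wrong choice of coefficients.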

PANDA (Proof-Assisted eNtropic Degree-Aware)

◮ What?
  ◮ Compute a model for our disjunctive datalog rule
  ◮ Run within $\tilde{O}\bigl(2^{\text{polymatroid bound}}\bigr) = \tilde{O}\Bigl(\prod_{(X,Y,N_{Y|X})} N_{Y|X}^{\delta_{Y|X}}\Bigr)$
◮ How? Proof as symbolic instructions
  ◮ Construct a proof sequence for the corresponding Shannon-flow inequality
  ◮ Proof steps → relational operators.

Hung Q. Ngo (Stealth Mode), with Mahmoud Abo Khamis and Dan Suciu @ PODS 2017
