Shannon-type Inequalities, Submodular Width, and Disjunctive Datalog (PowerPoint presentation)

SLIDE 1

Shannon-type Inequalities, Submodular Width, and Disjunctive Datalog

Hung Q. Ngo

(Stealth Mode) With Mahmoud Abo Khamis and Dan Suciu @ PODS 2017

SLIDE 2

Table of Contents

◮ Connecting the Dots
◮ Output Size Bounds and Information Theory
◮ Shannon-flow Inequalities and the PANDA Algorithm
◮ Wrapping it up
◮ Appendix

SLIDE 3

Table of Contents (section: Connecting the Dots)

SLIDE 4

Exciting “Recent” Results on Query Evaluation

SLIDE 11

A Query Evaluation Problem

◮ Hypergraph $H = ([n], E)$, $E \subseteq 2^{[n]}$
◮ Attribute $A_i$, $i \in [n]$; $A_J := \{A_j \mid j \in J\}$, $J \subseteq [n]$
◮ Relation $R_F(A_F)$ for each $F \in E$, written $R_F$ for short
  ◮ e.g. $R_{123}$ for $R_{123}(A_1, A_2, A_3)$
  ◮ e.g. $T_{25}$ for $T_{25}(A_2, A_5)$

Problem (Boolean Conjunctive Query (BCQ))

$Q : S() \leftarrow \bigwedge_{F \in E} R_F$   // can all $R_F$ be satisfied at once?

Question

How do we evaluate Q efficiently? Conjunctive, count, and aggregate queries are fine too.

SLIDE 13

What do we mean by “efficiency”?

Database-centric complexity framework

◮ Assumption 1: query size ≪ data size
  ◮ Data complexity
  ◮ Fixed-parameter tractability (e.g. parameter = some function of query size):
    $\tilde O(\text{something}) = O\big(f(|\text{query}|) \cdot \mathrm{polylog}(|\text{data}|) \cdot \text{something}\big)$
◮ Assumption 2: known constraints on input relations
  ◮ Cardinalities of materialized relations, or upper bounds: cardinality constraints (CC)
  ◮ Functional dependencies, the more the merrier: FD constraints (FDC)
  ◮ Degree bounds, the more the merrier: degree constraints (DC)

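SLIDE 16

Example

$Q : S() \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41} \wedge A_1 + A_2 = A_3$, with hypergraph $H = ([4], \{12, 23, 34, 14, 123\})$.

(Figure: the 4-cycle on $A_1, A_2, A_3, A_4$ with edges $R_{12}, R_{23}, R_{34}, R_{41}$.)

Input relations:
◮ R12(A1, A2) = {(2, 1), (4, 1), (4, 2), (5, 2)}
◮ R23(A2, A3) = {(1, c), (1, d), (2, c)}
◮ R34(A3, A4) = {(c, 3), (d, 4), (d, 5), (e, 6), (e, 7)}
◮ R41(A4, A1) = {(3, 2), (4, 4), (4, 5)}

◮ Cardinalities: $|R_{12}| = 4$, $|R_{23}| = 3$, . . .
◮ FD: $\{A_1, A_2\} \to A_3$, $\{A_3, A_2\} \to A_1$, . . .
◮ Degree bounds: $\deg_{34}(A_4 \mid A_3 = x) \overset{\text{def}}{=} |\sigma_{A_3 = x}(R_{34})| \le 2,\ \forall x$, . . .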

SLIDE 19

Degree constraints generalize cardinality and FD constraints

DC ⊇ CC ∪ FDC

◮ Degree Constraints (DC): $\deg_F(A_Y \mid A_X) \le N_{Y|X}$, $X \subset Y \subseteq F \in E$
◮ Cardinality Constraints (CC): $|R_F| \le N \iff \deg_F(A_F \mid A_\emptyset) \le N_{F|\emptyset} \overset{\text{def}}{=} N$
◮ Functional Dependencies (FDC): $A_X \to A_Y \iff \deg_F(A_Y \mid A_X) \le N_{Y|X} \overset{\text{def}}{=} 1$
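The definitions can be checked mechanically. The Python sketch below (the helper `deg` and the set-of-tuples encoding are our own illustrative assumptions, not part of the talk) computes $\deg_F(A_Y \mid A_X)$ on three relations from the example on slide 16, treating a cardinality constraint as the case $X = \emptyset$ and an FD as a degree bound of 1.

```python
from collections import defaultdict


def deg(rel, attrs, X, Y):
    """Return max over x of |pi_Y(sigma_{A_X = x}(rel))|, assuming X is a subset of Y.

    rel   -- set of tuples
    attrs -- attribute names, in tuple order
    X, Y  -- tuples of attribute names
    """
    pos = {a: i for i, a in enumerate(attrs)}
    groups = defaultdict(set)
    for t in rel:
        x = tuple(t[pos[a]] for a in X)
        groups[x].add(tuple(t[pos[a]] for a in Y))
    return max((len(ys) for ys in groups.values()), default=0)


# Input relations of the slide-16 example
R12 = {(2, 1), (4, 1), (4, 2), (5, 2)}                      # R12(A1, A2)
R23 = {(1, "c"), (1, "d"), (2, "c")}                         # R23(A2, A3)
R34 = {("c", 3), ("d", 4), ("d", 5), ("e", 6), ("e", 7)}     # R34(A3, A4)

# Cardinality constraint = degree constraint with X empty
card_R12 = deg(R12, ("A1", "A2"), (), ("A1", "A2"))          # 4
# Degree constraint deg34(A4 | A3) <= 2
deg34 = deg(R34, ("A3", "A4"), ("A3",), ("A3", "A4"))        # 2
# FD A_X -> A_Y holds iff the degree is at most 1
fd_A1_to_A2 = deg(R12, ("A1", "A2"), ("A1",), ("A1", "A2")) <= 1   # False: A1 = 4 has A2 in {1, 2}
fd_A4_to_A3 = deg(R34, ("A3", "A4"), ("A4",), ("A3", "A4")) <= 1   # True
```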

SLIDE 21

Islands of tractability (redrawn from D. Marx’s slides)

Prior results with cardinality constraints

(Figure: nested query classes: bounded treewidth, bounded (generalized) hypertree width, bounded fractional edge cover number, bounded fractional hypertree width, bounded submodular width; regions labelled PTIME, FPT, and not FPT.)

We want the same map with degree constraints.

SLIDE 27

A Meta Algorithm (i.e. Meta Query Plan)

(Figure: Database D → Datalog rule Qi, with body $\bigwedge_{F \in E} R_F$ and head a propositional formula on relations $T_j$, $j \in J$ [cost: iTime] → Qi(D) → Qo [cost: oTime] → Output Answer.)

◮ Overall time $= \tilde O(|\text{input}| + \text{iTime} + \text{oTime})$
  ◮ (a) oTime $= |Q_i(D)| + |\text{answer}|$
    ◮ i.e. Qo is evaluatable in linear time
  ◮ (b) iTime $= \max_{D \models DC} |Q_i(D)|$
    ◮ i.e. Qi is evaluatable within its worst-case output size
◮ Design Qi s.t. (a) holds and (b) is as small as possible

SLIDE 30

Option 1: Full Conjunctive Query

(Figure: Database D → full conjunctive query Qi, with body $\bigwedge_{F \in E} R_F$ and head $T_{[n]}$ [iTime] → Qi(D) → Qo [oTime] → Output Answer.)

◮ oTime $= |Q_i(D)| + |\text{answer}|$: trivially true
◮ iTime $= \max_{D \models DC} |Q_i(D)|$: worst-case optimal algorithm
  ◮ Known if all DC are cardinality constraints
    ◮ NPRR [Ngo, Porat, Ré, Rudra PODS’12]
    ◮ Leapfrog Triejoin [Veldhuizen ICDT’14]
    ◮ Generic Join [Ngo, Ré, Rudra SIGMOD Record 2013]
  ◮ Unknown for general DC until our work
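To illustrate the worst-case optimal idea, here is a minimal Python sketch in the spirit of Generic Join: bind one variable at a time and intersect the candidate values offered by every relation that mentions it. It is a simplification under our own assumptions (relations as `(attrs, set_of_tuples)` pairs, hypothetical function name `generic_join`); it rescans each relation instead of using the trie or hash indexes and smallest-first intersections that the worst-case runtime bound relies on, so it shows the control flow only.

```python
def generic_join(relations, order, binding=None):
    """Yield every satisfying assignment of a full conjunctive query.

    relations -- list of (attrs, tuples) pairs; attrs is a tuple of variable names
    order     -- global variable order; each variable must occur in some relation
    """
    if binding is None:
        binding = {}
    if len(binding) == len(order):
        yield dict(binding)
        return
    var = order[len(binding)]
    candidates = None
    for attrs, tuples in relations:
        if var not in attrs:
            continue
        i = attrs.index(var)
        # values of `var` in this relation that agree with the partial binding
        vals = {t[i] for t in tuples
                if all(t[j] == binding[a] for j, a in enumerate(attrs) if a in binding)}
        candidates = vals if candidates is None else candidates & vals
    for v in candidates:
        binding[var] = v
        yield from generic_join(relations, order, binding)
        del binding[var]


# Triangle query: (a, b, c) <- R(a, b) ∧ S(b, c) ∧ U(a, c)
R = (("a", "b"), {(1, 2), (1, 3), (2, 3)})
S = (("b", "c"), {(2, 3), (3, 1), (3, 4)})
U = (("a", "c"), {(1, 3), (1, 4), (2, 1)})
triangles = {(m["a"], m["b"], m["c"]) for m in generic_join([R, S, U], ("a", "b", "c"))}
# A BCQ only asks whether at least one assignment exists
bcq_answer = next(generic_join([R, S, U], ("a", "b", "c")), None) is not None
```

For a Boolean query it is enough to stop at the first assignment, as `bcq_answer` does.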

SLIDE 36

Detour: Tree Decompositions, Informally

S() ← R(a, b, d) ∧ c < d ∧ T(c, b, d) ∧ U(b, e) ∧ V(c, e) ∧ b + e = f ∧ W(b, e, g) ∧ g/f = h ∧ X(i, j, h) ∧ e − b = k.

Bags: {b, c, e}, {a, b, c, d}, {b, e, f, g, h}, {h, i, j}, {e, b, k}

◮ Every relation is covered by some bag
◮ Bags containing a given variable are connected

SLIDE 38

Detour: Tree Decompositions, Formally

◮ Hypergraph $H = ([n], E)$
◮ A Tree Decomposition of H is a pair $(T, \chi)$ where
  ◮ $T = (V(T), E(T))$ is a tree
  ◮ $\chi : V(T) \to 2^{[n]}$ assigns a bag $\chi(v)$ to each tree node v
  ◮ Every hyperedge $F \in E$ is covered by some bag ($F \subseteq \chi(v)$)
  ◮ For every $i \in [n]$, the bags containing i form a (connected) subtree

See [Gottlob et al 2016], Gems of PODS.
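Both conditions are easy to test for a given tree. A minimal Python sketch follows; the function name and the encoding of tree nodes as named dictionary keys are hypothetical, and the sketch assumes `tree_edges` already forms a tree rather than checking that. The examples use the 4-cycle $R_{12}, R_{23}, R_{34}, R_{41}$.

```python
def is_tree_decomposition(hyperedges, bags, tree_edges):
    """Check the TD conditions, assuming `tree_edges` already forms a tree over the keys of `bags`."""
    # (1) every hyperedge F is covered by some bag
    for F in hyperedges:
        if not any(set(F) <= bag for bag in bags.values()):
            return False
    # (2) for each vertex i, the tree nodes whose bag contains i are connected
    adj = {v: set() for v in bags}
    for u, w in tree_edges:
        adj[u].add(w)
        adj[w].add(u)
    for i in set().union(*map(set, hyperedges)):
        nodes = {v for v, bag in bags.items() if i in bag}
        start = next(iter(nodes))          # non-empty, since (1) holds
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in adj[u] & nodes:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        if seen != nodes:
            return False
    return True


cycle = [(1, 2), (2, 3), (3, 4), (4, 1)]
td1 = {"u": {1, 2, 3}, "v": {1, 3, 4}}
td2 = {"u": {1, 2, 4}, "v": {2, 3, 4}}
not_td = {"x": {1, 2}, "y": {2, 3, 4}, "z": {4, 1}}
ok1 = is_tree_decomposition(cycle, td1, [("u", "v")])                    # True
ok2 = is_tree_decomposition(cycle, td2, [("u", "v")])                    # True
bad = is_tree_decomposition(cycle, not_td, [("x", "y"), ("y", "z")])     # False: bags with 1 are x and z
```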

SLIDE 44

Option 2: A Single Tree Decomposition

Fix $(T, \chi)$.

(Figure: Database D → multiple conjunctive rules Qi, with body $\bigwedge_{F \in E} R_F$ and head $\bigwedge_{v \in V(T)} T_{\chi(v)}$ [iTime] → Qi(D) → Qo [oTime] → Output Answer.)

◮ Qi(D) is Olteanu’s factorized database (FDB)
◮ oTime $= |Q_i(D)| + |\text{answer}|$
  ◮ Yannakakis for join, FDB/InsideOut for aggregates
◮ iTime $= \max_{D \models DC} |Q_i(D)| \le \max_{D \models DC} \max_{v \in V(T)} |P_v(D)|$
  ◮ $P_v : T_{\chi(v)} \leftarrow \bigwedge_{F \in E} R_F$, $v \in V(T)$

$$\min_{(T, \chi)} \max_{D \models CC} \max_{v \in V(T)} |P_v(D)| \le N^{\mathrm{fhtw}(H)} \le N^{\mathrm{ghtw}(H)} \le N^{\mathrm{tw}(H) + 1}$$
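The “Yannakakis for join” step can be sketched for the Boolean case: once every bag relation $T_{\chi(v)}$ is materialized, a bottom-up pass of semijoins over the tree decides whether the join is non-empty. The Python below is an illustrative assumption (hypothetical names, bags encoded as `(attrs, set_of_tuples)`, a rooted tree given by a `children` map); the full algorithm adds a top-down semijoin pass and a final join to enumerate answers.

```python
def semijoin(left, right):
    """Keep the tuples of `left` that agree with some tuple of `right` on shared attributes."""
    (la, lt), (ra, rt) = left, right
    shared = [a for a in la if a in ra]
    keys = {tuple(t[ra.index(a)] for a in shared) for t in rt}
    return la, {t for t in lt if tuple(t[la.index(a)] for a in shared) in keys}


def yannakakis_bcq(bags, children, root):
    """Bottom-up semijoin pass over a rooted join tree; True iff the join is non-empty."""
    def reduced(v):
        rel = bags[v]
        for c in children.get(v, []):
            rel = semijoin(rel, reduced(c))
        return rel
    return len(reduced(root)[1]) > 0


# Path query R(a, b) ∧ S(b, c) ∧ T(c, d), with S as root and R, T as its children
bags = {
    "R": (("a", "b"), {(1, 2), (3, 4)}),
    "S": (("b", "c"), {(2, 5), (4, 6)}),
    "T": (("c", "d"), {(5, 7)}),
}
children = {"S": ["R", "T"]}
answer = yannakakis_bcq(bags, children, "S")   # True: (1, 2), (2, 5), (5, 7) join
```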

SLIDE 47

Option 3: Multiple Tree Decompositions

(Figure: Database D → Qi, with body $\bigwedge_{F \in E} R_F$ and head $\bigvee_{(T, \chi)} \bigwedge_{v \in V(T)} T_{\chi(v)}$, where $(T, \chi)$ ranges over non-redundant TDs [iTime; how to evaluate this?] → Qi(D) → Qo [oTime] → Output Answer.)

◮ oTime $= |Q_i(D)| + |\text{answer}|$
  ◮ Union of Yannakakis on all TDs
◮ iTime $= \max_{D \models DC} |Q_i(D)| \le\ ?$
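SLIDE 51

Example: $Q : S() \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$

(Figure: the 4-cycle on $A_1, A_2, A_3, A_4$ with edges $R_{12}, R_{23}, R_{34}, R_{41}$, and two tree decompositions: one with bags $A_1A_2A_3$ and $A_1A_3A_4$, the other with bags $A_2A_3A_4$ and $A_1A_2A_4$.)

$(T_{123} \wedge T_{134}) \vee (T_{124} \wedge T_{234}) \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$

By distributivity, rewrite the head equivalently as:

$(T_{123} \vee T_{124}) \wedge (T_{123} \vee T_{234}) \wedge (T_{134} \vee T_{124}) \wedge (T_{134} \vee T_{234}) \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$

(Each “clause” has one bag per TD.)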
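The distributivity step is mechanical: each clause picks one bag from every TD. A small Python sketch (function name hypothetical) that also drops any clause strictly containing another clause, since the smaller clause already implies it:

```python
from itertools import product


def dnf_to_cnf(tds):
    """Distribute a disjunction of TDs (each a list of bags) into CNF clauses,
    one bag per TD per clause, dropping clauses that strictly contain another."""
    clauses = {frozenset(choice) for choice in product(*tds)}
    return {c for c in clauses if not any(d < c for d in clauses)}


cnf = dnf_to_cnf([["123", "134"], ["124", "234"]])
# 4 clauses: {123, 124}, {123, 234}, {134, 124}, {134, 234}
```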

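SLIDE 53

Option 3: Multiple Tree Decompositions

(Figure: Database D → multiple disjunctive datalog rules! Qi, with body $\bigwedge_{F \in E} R_F$ and head $\bigwedge_{\mathcal B} \bigvee_{B \in \mathcal B} T_B$ [iTime] → Qi(D) → Qo [oTime] → Output Answer.)

◮ oTime $= |Q_i(D)| + |\text{answer}|$
  ◮ Union of Yannakakis on all TDs
◮ iTime $= \max_{D \models DC} |Q_i(D)| \le \max_{D \models DC} \max_{\mathcal B} |P_{\mathcal B}(D)|$
  ◮ $P_{\mathcal B} : \bigvee_{B \in \mathcal B} T_B \leftarrow \bigwedge_{F \in E} R_F$ is a disjunctive datalog rule

$$\max_{D \models CC} \max_{\mathcal B} |P_{\mathcal B}(D)| \le N^{\mathrm{subw}(H)} \le N^{\mathrm{fhtw}(H)}$$

subw = submodular width (Daniel Marx, JACM 2013)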

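SLIDE 59

Example: $Q : S() \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$

Option 1: iTime $= \max_{D \models DC} |Q_i(D)| = N^2$
$Q_i : T_{1234} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$

Option 2: iTime $= \max_{D \models DC} |Q_i(D)| = N^2$
either $Q_i : T_{123} \wedge T_{134} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$
or $Q_i : T_{124} \wedge T_{234} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$

Option 3: iTime $= \max_{D \models DC} |Q_i(D)| = N^{3/2}$
$Q_i : (T_{123} \wedge T_{134}) \vee (T_{124} \wedge T_{234}) \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$, equivalent to:
◮ $P_{123,124} : T_{123} \vee T_{124} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$
◮ $P_{123,234} : T_{123} \vee T_{234} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$
◮ $P_{134,124} : T_{134} \vee T_{124} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$
◮ $P_{134,234} : T_{134} \vee T_{234} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$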

SLIDE 63

Roadmap

Given degree constraints DC and a disjunctive datalog rule $P : \bigvee_{B \in \mathcal B} T_B \leftarrow \bigwedge_{F \in E} R_F$:

Question (Worst-case Output Size Bound)

Find a good upper bound for $\max_{D \models DC} |P(D)|$.

Question (Algorithm)

Design an algorithm evaluating P within the bound.

Question (Gathering fruits)

Plug the bound/algorithm into the Meta Algorithm: what do we get?

SLIDE 64

Table of Contents (section: Output Size Bounds and Information Theory)

SLIDE 65

High-level View of the Bound

Given degree constraints DC and a disjunctive datalog rule $P : \bigvee_{B \in \mathcal B} T_B \leftarrow \bigwedge_{F \in E} R_F$, we shall prove bounds of the form

$$\max_{D \models DC} \log |P(D)| \le \text{some function of } h$$

s.t. h is (approximately) entropic and h satisfies the degree constraints.

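SLIDE 75

An Idea From Gottlob-Lee-Valiant-Valiant, JACM’12

$Q(A_1, A_2, A_3, A_4) \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$ (figure: the 4-cycle on $A_1, A_2, A_3, A_4$)

Output of Q, with the uniform distribution (probability 1/4 per tuple):
Q(A1, A2, A3, A4) = {(a, 1, d, 4), (b, 1, c, 3), (b, 1, d, 4), (b, 2, c, 3)}

Inputs, with the induced marginal probabilities:
◮ R12(A1, A2): (a, 1): 1/4, (b, 1): 2/4, (b, 2): 1/4
◮ R23(A2, A3): (1, c): 1/4, (1, d): 2/4, (2, c): 1/4
◮ R34(A3, A4): (c, 3): 2/4, (d, 4): 2/4, (d, 5): 0
◮ R41(A4, A1): (3, b): 2/4, (4, a): 1/4, (4, b): 1/4

◮ $H_{\text{unif}}(A_1A_2A_3A_4) = \log |Q|$
◮ $H_{\text{unif}}(A_1A_2) \le \log |R_{12}|$, $H_{\text{unif}}(A_2A_3) \le \log |R_{23}|$, $H_{\text{unif}}(A_3A_4) \le \log |R_{34}|$, ...
◮ $H_{\text{unif}}(A_2 \mid A_1 = \text{‘a’}) \le \log |\sigma_{A_1 = \text{‘a’}} R_{12}|$, $H_{\text{unif}}(A_2 \mid A_1 = \text{‘b’}) \le \log |\sigma_{A_1 = \text{‘b’}} R_{12}|$, ...
◮ $H_{\text{unif}}(A_2 \mid A_1) \le \log \max_x |\sigma_{A_1 = x} R_{12}| = \log \deg_{R_{12}}(A_2 \mid A_1)$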
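These quantities can be computed directly for the data on this slide. A minimal Python sketch, assuming logarithms in base 2 (the slides leave the base unspecified, and the inequalities hold in any base) and a helper `H` of our own naming:

```python
from collections import Counter
from math import log2

# Output of Q on this slide's instance, as (A1, A2, A3, A4) tuples
Q = [("a", 1, "d", 4), ("b", 1, "c", 3), ("b", 1, "d", 4), ("b", 2, "c", 3)]


def H(tuples, cols):
    """Entropy in bits of the marginal on `cols` of the uniform distribution on `tuples`."""
    n = len(tuples)
    counts = Counter(tuple(t[i] for i in cols) for t in tuples)
    return -sum(c / n * log2(c / n) for c in counts.values())


h_all = H(Q, (0, 1, 2, 3))              # 2.0 = log2 |Q|
h_12 = H(Q, (0, 1))                     # 1.5 <= log2 |R12| = log2 3
h_2_given_1 = h_12 - H(Q, (0,))         # about 0.689 <= log2 deg_R12(A2|A1) = log2 2 = 1
```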
SLIDE 83

Entropic Bound for Full Conjunctive Queries

◮ $Q : T_{[n]} \leftarrow \bigwedge_{F \in E} R_F$, and degree constraints DC

$$\max_{D \models DC} \log |Q(D)| \le \sup h([n])$$

◮ subject to (whatever $H_{\text{unif}}$ satisfies):
  ◮ h is entropic: there is some distribution on $A_{[n]}$ such that $h(X)$ is the marginal entropy on $A_X$, for all X
  ◮ h satisfies DC: $h(Y \mid X) \overset{\text{def}}{=} h(Y) - h(X) \le \log N_{Y|X}$, $X \subset Y \subseteq F \in E$
◮ Good bound, but not computable!

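SLIDE 84

Hierarchy of Set Functions

All set functions $h : 2^{[n]} \to \mathbb R_+$ here are non-negative, monotone ($h(X) \le h(Y)$ if $X \subseteq Y$), and satisfy $h(\emptyset) = 0$.

◮ $SA_n := \{h \mid h \text{ is sub-additive}\}$: $h(X \cup Y) \le h(X) + h(Y)$
◮ $\Gamma_n := \{h \mid h \text{ is submodular}\}$: $h(X \cup Y) + h(X \cap Y) \le h(X) + h(Y)$
◮ $\bar\Gamma^*_n$: topological closure of $\Gamma^*_n$
◮ $\Gamma^*_n = \{h : h \text{ is entropic}\}$
◮ $M_n$: modular, $h(X) = \sum_{x \in X} h(x)$

$M_n \subseteq \Gamma^*_n \subseteq \bar\Gamma^*_n \subseteq \Gamma_n \subseteq SA_n$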
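A small Python check of membership in $\Gamma_n$ (polymatroids), as a sketch under our own encoding (set functions as dicts from `frozenset` to `float`, hypothetical helper names). On one instance it confirms that the entropy function of the uniform distribution on the 4-cycle output from the GLVV slide is a polymatroid, as $\Gamma^*_n \subseteq \Gamma_n$ predicts, and it rejects a monotone function that violates submodularity.

```python
from collections import Counter
from itertools import combinations
from math import log2


def subsets(ground):
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def is_polymatroid(h, ground, tol=1e-9):
    """Check h(empty) = 0, monotonicity and submodularity; h maps frozensets to floats."""
    S = subsets(ground)
    if abs(h[frozenset()]) > tol:
        return False
    for X in S:
        for Y in S:
            if X <= Y and h[X] > h[Y] + tol:                  # monotone
                return False
            if h[X | Y] + h[X & Y] > h[X] + h[Y] + tol:       # submodular
                return False
    return True


def entropy_function(tuples, n):
    """h(X) = entropy (bits) of the marginal on columns X of the uniform distribution on `tuples`."""
    N = len(tuples)
    h = {}
    for X in subsets(range(n)):
        counts = Counter(tuple(t[i] for i in sorted(X)) for t in tuples)
        h[X] = -sum(c / N * log2(c / N) for c in counts.values())
    return h


Q = [("a", 1, "d", 4), ("b", 1, "c", 3), ("b", 1, "d", 4), ("b", 2, "c", 3)]
entropic_ok = is_polymatroid(entropy_function(Q, 4), range(4))          # True
bad = {frozenset(): 0.0, frozenset({0}): 1.0, frozenset({1}): 1.0, frozenset({0, 1}): 3.0}
bad_ok = is_polymatroid(bad, range(2))                                  # False: not submodular
```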

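SLIDE 86

Bounds for Full Conjunctive Query

◮ $H_{DC} \overset{\text{def}}{=} \{h \mid h(Y \mid X) \le \log N_{Y|X},\ \forall (X, Y, N_{Y|X}) \in DC\}$
◮ Then,

$$\max_{D \models DC} \log |Q(D)| \le \underbrace{\max_{h \in \bar\Gamma^*_n \cap H_{DC}} h([n])}_{\text{entropic bound}} \le \underbrace{\max_{h \in \Gamma_n \cap H_{DC}} h([n])}_{\text{polymatroid bound}} \le \underbrace{\max_{h \in SA_n \cap H_{DC}} h([n])}_{\text{sub-additive bound}}.$$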

SLIDE 90

Size Bounds for Full Conjunctive Queries

| Bound | Entropic Bound | Polymatroid Bound |
|---|---|---|
| Definition | $\log \lvert Q \rvert \le \max_{h \in \bar\Gamma^*_n \cap H_{DC}} h([n])$ | $\log \lvert Q \rvert \le \max_{h \in \Gamma_n \cap H_{DC}} h([n])$ |
| CC only | AGM bound (tight) [Atserias et al. FOCS’08] | AGM bound (tight) [Atserias et al. FOCS’08] |
| CC + FD only | Entropic bound for FD [Gottlob et al. JACM’12] (tight [Gogacz et al. ICDT’17]) | Polymatroid bound for FD [Gottlob et al. JACM’12] (not tight [our work]) |
| DC | Entropic bound for DC (tight [our work]) | Polymatroid bound for DC (not tight [our work]) |

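SLIDE 91

Disjunctive Datalog: Size Bounds

$P : \bigvee_{B \in \mathcal B} T_B(A_B) \leftarrow \bigwedge_{F \in E} R_F(A_F)$, with $|P(D)| \overset{\text{def}}{=} \min_{T : T \models P} \max_{B \in \mathcal B} |T_B|$

Theorem (our work)

$$\max_{D \models DC} \log |P(D)| \le \underbrace{\max_{h \in \bar\Gamma^*_n \cap H_{DC}} \min_{B \in \mathcal B} h(B)}_{\text{entropic bound (tight)}} \le \underbrace{\max_{h \in \Gamma_n \cap H_{DC}} \min_{B \in \mathcal B} h(B)}_{\text{polymatroid bound (not tight)}}$$

These imply all known bounds for (full) conjunctive queries!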
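The definition of $|P(D)|$ can be evaluated by brute force on a toy instance: every body tuple must have its projection onto some bag $B$ stored in $T_B$, so it suffices to try every assignment of body tuples to bags. The exponential Python sketch below (hypothetical helper names, tiny inputs only, not the algorithm from the talk) uses the GLVV example data with $\mathcal B = \{A_1A_2A_3, A_2A_3A_4\}$.

```python
from itertools import product


def body_4cycle(R12, R23, R34, R41):
    """All (a1, a2, a3, a4) satisfying R12 ∧ R23 ∧ R34 ∧ R41 (naive nested loops)."""
    return [(a1, a2, a3, a4)
            for (a1, a2) in R12
            for (b2, a3) in R23 if b2 == a2
            for (b3, a4) in R34 if b3 == a3
            if (a4, a1) in R41]


def min_model_size(body, bags):
    """|P(D)|: minimum over models T of max_B |T_B|, by trying every assignment of
    body tuples to bags (bags are tuples of column indices). Exponential: toy inputs only."""
    best = None
    for choice in product(range(len(bags)), repeat=len(body)):
        T = [set() for _ in bags]
        for t, k in zip(body, choice):
            T[k].add(tuple(t[i] for i in bags[k]))
        size = max(len(Tk) for Tk in T)
        best = size if best is None else min(best, size)
    return best


# GLVV example data
R12 = {("a", 1), ("b", 1), ("b", 2)}
R23 = {(1, "c"), (1, "d"), (2, "c")}
R34 = {("c", 3), ("d", 4), ("d", 5)}
R41 = {(3, "b"), (4, "a"), (4, "b")}
body = body_4cycle(R12, R23, R34, R41)                   # 4 body tuples
p_size = min_model_size(body, [(0, 1, 2), (1, 2, 3)])    # 2: T_234 = {(1, d, 4)} covers two tuples
```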

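SLIDE 93

Earlier Example

$P : \bigvee_{B \in \mathcal B} T_B(A_B) \leftarrow \bigwedge_{F \in E} R_F(A_F)$, with $|P(D)| \overset{\text{def}}{=} \min_{T : T \models P} \max_{B \in \mathcal B} |T_B|$

(Figure: the 4-cycle on $A_1, A_2, A_3, A_4$ with edges $R_{12}, R_{23}, R_{34}, R_{41}$.)

CC: $|R_{12}| \le N$, $|R_{23}| \le N$, $|R_{34}| \le N$, $|R_{41}| \le N$.

$P_{123,234} : T_{123} \vee T_{234} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$.

$$\begin{aligned}
\max_{D \models CC} \log |P_{123,234}(D)| &\le \max_{h \in \Gamma_n \cap H_{CC}} \min\{h(A_1A_2A_3), h(A_2A_3A_4)\} \\
&\le \max_{h \in \Gamma_n \cap H_{CC}} \tfrac12 \big[h(A_1A_2A_3) + h(A_2A_3A_4)\big] \\
&\le \max_{h \in \Gamma_n \cap H_{CC}} \tfrac12 \big[h(A_1A_2) + h(A_2A_3) + h(A_3A_4)\big] \le \tfrac32 \log N.
\end{aligned}$$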

SLIDE 94

Table of Contents (section: Shannon-flow Inequalities and the PANDA Algorithm)

SLIDE 95

Roadmap

Given degree constraints DC and a disjunctive datalog rule $P : \bigvee_{B \in \mathcal B} T_B \leftarrow \bigwedge_{F \in E} R_F$:

Answer (Worst-case Output Size Bound)

$$\max_{D \models DC} \log |P(D)| \le \max_{h \in \Gamma_n \cap H_{DC}} \min_{B \in \mathcal B} h(B) = \text{polymatroid bound}.$$

Question (Algorithm)

Compute a model for P within $\tilde O(2^{\text{polymatroid bound}})$.

Question (Gathering fruits)

Plug the bound/algorithm into the Meta Algorithm: what do we get?

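SLIDE 98

Connection to Shannon-flow Inequalities

Lemma (Linearize it)

There exists a non-negative $\lambda = (\lambda_B)_{B \in \mathcal B}$ with $\|\lambda\|_1 = 1$ s.t.

$$\max_{h \in \Gamma_n \cap H_{DC}} \min_{B \in \mathcal B} h(B) = \max_{h \in \Gamma_n \cap H_{DC}} \sum_{B \in \mathcal B} \lambda_B \, h(B). \tag{1}$$

Lemma (Shannon-flow inequality)

There exists $\delta \ge 0$ s.t. $2^{\text{polymatroid bound}} = \prod_{(X, Y, N_{Y|X})} N_{Y|X}^{\delta_{Y|X}}$, and

$$\sum_{B \in \mathcal B} \lambda_B \cdot h(B) \le \sum_{(X, Y, N_{Y|X})} \delta_{Y|X} \cdot h(Y \mid X), \quad \forall h \in \Gamma_n. \tag{2}$$

(2) is a (vast) generalization of Shearer’s lemma.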
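For the running 4-cycle example, (2) with $\lambda = (\tfrac12, \tfrac12)$ and $\delta = \tfrac12$ on the constraints for $R_{12}, R_{23}, R_{34}$ reads $h(A_1A_2A_3) + h(A_2A_3A_4) \le h(A_1A_2) + h(A_2A_3) + h(A_3A_4)$ after multiplying by 2. Since every entropic function is a polymatroid, it must hold for the entropy of any joint distribution. The Python sketch below checks this numerically on random distributions over four ternary variables; it is a sanity check under our own choice of test distributions (hypothetical helper names), not a proof.

```python
import random
from itertools import product
from math import log2


def entropy(p, cols):
    """Marginal entropy (bits) on `cols` of the joint pmf p (dict: outcome tuple -> prob)."""
    marg = {}
    for x, px in p.items():
        key = tuple(x[i] for i in cols)
        marg[key] = marg.get(key, 0.0) + px
    return -sum(q * log2(q) for q in marg.values() if q > 0)


def random_pmf(n_vars=4, k=3, seed=None):
    """A random joint distribution over n_vars variables with k values each."""
    rng = random.Random(seed)
    outcomes = list(product(range(k), repeat=n_vars))
    w = [rng.random() for _ in outcomes]
    s = sum(w)
    return {x: wi / s for x, wi in zip(outcomes, w)}


def flow_gap(p):
    """RHS - LHS of h(A1A2A3) + h(A2A3A4) <= h(A1A2) + h(A2A3) + h(A3A4); A1..A4 are columns 0..3."""
    lhs = entropy(p, (0, 1, 2)) + entropy(p, (1, 2, 3))
    rhs = entropy(p, (0, 1)) + entropy(p, (1, 2)) + entropy(p, (2, 3))
    return rhs - lhs


gaps = [flow_gap(random_pmf(seed=s)) for s in range(200)]
```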

SLIDE 100

PANDA (Proof-Assisted eNtropic Degree-Aware)

◮ What?
  ◮ Compute a model for our disjunctive datalog rule
  ◮ Run within $\tilde O(2^{\text{polymatroid bound}}) = \tilde O\big(\prod_{(X, Y, N_{Y|X})} N_{Y|X}^{\delta_{Y|X}}\big)$
◮ How? Proof as symbolic instructions:
  ◮ Construct a proof sequence for the corresponding Shannon-flow inequality
  ◮ Proof steps → relational operators

SLIDE 102

Proof sequence

Shannon-flow inequality, where $h(Y \mid X) \overset{\text{def}}{=} h(Y) - h(X)$ for $X \subseteq Y$:

$$\sum_{B \in \mathcal B} \lambda_B \cdot h(B) \le \sum_{(X, Y, N_{Y|X})} \delta_{Y|X} \cdot h(Y \mid X)$$

Proof sequence: convert the RHS into the LHS using the following steps ($X \subseteq Y$):

◮ $h(X) + h(Y \mid X) = h(Y)$ gives the step $h(X) + h(Y \mid X) \to h(Y)$
◮ $h(Y) = h(X) + h(Y \mid X)$ gives the step $h(Y) \to h(X) + h(Y \mid X)$
◮ $h(Y) \ge h(X)$ gives the step $h(Y) \to h(X)$
◮ $h(Y \mid X) \ge h(Y \cup Z \mid X \cup Z)$ gives the step $h(Y \mid X) \to h(Y \cup Z \mid X \cup Z)$

Theorem

There is a proof sequence for every Shannon-flow inequality. Its length is data-independent.

slide-108
SLIDE 108

Proof Steps As Relational Operators

Shannon-flow inequality:

$$\sum_{B\in\mathcal{B}} \lambda_B \cdot h(B) \;\le\; \sum_{(X,Y,N_{Y|X})} \delta_{Y|X} \cdot h(Y|X)$$

Proof-sequence steps (X ⊆ Y) and the relational operators they become:

◮ h(X) + h(Y|X) → h(Y): join
◮ h(Y) → h(X) + h(Y|X): data partition
◮ h(Y) → h(X): projection
◮ h(Y|X) → h(Y ∪ Z|X ∪ Z): NOP

Theorem

PANDA solves any disjunctive datalog rule P in time $\tilde O\big(N + \mathrm{poly}(\log N)\cdot 2^{\text{polymatroid bound for } P}\big)$.

slide-122
SLIDE 122

Example: P: T123 ∨ T234 ← R12 ∧ R23 ∧ R34 ∧ R41.

|R12|, |R23|, |R34|, |R41| ≤ N ⇒ |P| ≤ N^{3/2}, since

$$\begin{aligned}
\log|P| &\le \min\big(h(A_1A_2A_3),\ h(A_2A_3A_4)\big) && \text{(polymatroid bound)}\\
&\le \tfrac{1}{2}\big(h(A_1A_2A_3) + h(A_2A_3A_4)\big) && \text{(linearize)}\\
&\le \tfrac{1}{2}\big(h(A_1A_2) + h(A_2A_3) + h(A_3A_4)\big) && \text{(Shannon-flow)}\\
&\le \tfrac{3}{2}\log N && \text{(cardinality constraints)}
\end{aligned}$$

Proof sequence for the Shannon-flow step (each proof step, followed by the resulting sum):

h(A1A2) + h(A2A3) + h(A3A4)
  • h(A3A4) → h(A4|A3) + h(A3)
h(A1A2) + h(A2A3) + h(A4|A3) + h(A3)
  • h(A4|A3) → h(A4|A2A3)
h(A1A2) + h(A2A3) + h(A4|A2A3) + h(A3)
  • h(A2A3) + h(A4|A2A3) → h(A2A3A4)
h(A1A2) + h(A2A3A4) + h(A3)
  • h(A1A2) → h(A1A2|A3)
h(A1A2|A3) + h(A2A3A4) + h(A3)
  • h(A1A2|A3) + h(A3) → h(A1A2A3)
h(A1A2A3) + h(A2A3A4)
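The proof sequence above can be replayed mechanically. Below is a minimal Python sketch, assuming terms h(Y|X) are encoded as pairs of frozensets (an unconditional h(Y) uses X = ∅) and the current sum as a multiset (`collections.Counter`). The helpers `term`, `compose`, `decompose`, `monotone` and `submod` are hypothetical names, not part of any PANDA implementation; each one checks that the terms it consumes are present, and its docstring names the relational operator the step becomes.

```python
from collections import Counter

def term(y, x=""):
    """Encode h(Y|X) = h(Y) - h(X) as (frozenset(Y), frozenset(X)), requiring X ⊆ Y."""
    Y, X = frozenset(y), frozenset(x)
    if not X <= Y:
        raise ValueError("need X ⊆ Y")
    return (Y, X)

def consume(state, t):
    """Remove one copy of term t from the current sum, failing if it is absent."""
    if state[t] <= 0:
        raise ValueError(f"term {t} is not available")
    state[t] -= 1
    if state[t] == 0:
        del state[t]

def compose(state, x, y):
    """h(X) + h(Y|X) -> h(Y); relational operator: join."""
    consume(state, term(x))
    consume(state, term(y, x))
    state[term(y)] += 1

def decompose(state, y, x):
    """h(Y) -> h(X) + h(Y|X); relational operator: data partition."""
    consume(state, term(y))
    state[term(x)] += 1
    state[term(y, x)] += 1

def monotone(state, y, x):
    """h(Y) -> h(X) for X ⊆ Y; relational operator: projection."""
    consume(state, term(y))
    state[term(x)] += 1

def submod(state, y, x, z):
    """h(Y|X) -> h(Y ∪ Z | X ∪ Z); relational operator: NOP."""
    consume(state, term(y, x))
    state[term(set(y) | set(z), set(x) | set(z))] += 1

# Attributes A1..A4 are written as the characters "1".."4".
state = Counter([term("12"), term("23"), term("34")])  # RHS of the Shannon-flow inequality
decompose(state, "34", "3")    # h(A3A4) -> h(A4|A3) + h(A3)
submod(state, "34", "3", "2")  # h(A4|A3) -> h(A4|A2A3)
compose(state, "23", "234")    # h(A2A3) + h(A4|A2A3) -> h(A2A3A4)
submod(state, "12", "", "3")   # h(A1A2) -> h(A1A2|A3)
compose(state, "3", "123")     # h(A1A2|A3) + h(A3) -> h(A1A2A3)
print(state == Counter([term("123"), term("234")]))  # True
```

The steps only inspect symbolic terms, never the data, which is why the length of a proof sequence is data-independent.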
slide-128
SLIDE 128

Example: P: T123 ∨ T234 ← R12 ∧ R23 ∧ R34 ∧ R41.

|R12|, |R23|, |R34|, |R41| ≤ N ⇒ |P| ≤ N^{3/2}

Proof step → relational operator (ℓ = light, h = heavy):
  • h(A3A4) → h(A4|A3) + h(A3): R34(A3, A4) → R34^(ℓ)(A3, A4), R3^(h)(A3) (data partition)
  • h(A4|A3) → h(A4|A2A3): R34^(ℓ)(A3, A4) → R34^(ℓ)(A3, A4) (NOP)
  • h(A2A3) + h(A4|A2A3) → h(A2A3A4): R23(A2, A3) ⋈ R34^(ℓ)(A3, A4) → T234(A2, A3, A4)
  • h(A1A2) → h(A1A2|A3): R12(A1, A2) → R12(A1, A2) (NOP)
  • h(A1A2|A3) + h(A3) → h(A1A2A3): R12(A1, A2) ⋈ R3^(h)(A3) → T123(A1, A2, A3)
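The five operators above can be sketched in Python for this query. This is a minimal sketch under stated assumptions, not PANDA itself: relations are Python sets of tuples, and the heavy/light threshold, which the slides leave implicit, is assumed to be ⌊√N⌋, so that both T123 and T234 stay around N^{3/2} tuples. The function name `four_cycle_model` is hypothetical.

```python
from collections import defaultdict
from math import isqrt

def four_cycle_model(R12, R23, R34):
    """Build (T123, T234) so every tuple of R12 ⋈ R23 ⋈ R34 ⋈ R41 has its A1A2A3-projection
    in T123 or its A2A3A4-projection in T234. R41 only decides which tuples must be
    covered, so it is not needed to build a model."""
    N = max(len(R12), len(R23), len(R34), 1)
    threshold = isqrt(N)  # assumed threshold: an A3 value with degree > threshold is heavy

    # h(A3A4) -> h(A4|A3) + h(A3): partition R34 by the degree of A3.
    deg = defaultdict(int)
    for a3, _ in R34:
        deg[a3] += 1
    R3_heavy = {a3 for a3, d in deg.items() if d > threshold}  # fewer than N / threshold values
    R34_light = defaultdict(list)  # light A3 value -> its A4 values (at most threshold each)
    for a3, a4 in R34:
        if a3 not in R3_heavy:
            R34_light[a3].append(a4)

    # h(A2A3) + h(A4|A2A3) -> h(A2A3A4): join R23 with the light part (<= N * threshold tuples).
    T234 = {(a2, a3, a4) for a2, a3 in R23 for a4 in R34_light.get(a3, ())}
    # h(A1A2|A3) + h(A3) -> h(A1A2A3): R12 times the heavy A3 values (< N * N / threshold tuples).
    T123 = {(a1, a2, a3) for a1, a2 in R12 for a3 in R3_heavy}
    return T123, T234
```

On the instance used in the semantics example of the Appendix (N = 3, threshold 1), A3 = d is the only heavy value, so the sketch returns T123 = {(a, 1, d), (b, 1, d), (b, 2, d)} and T234 = {(1, c, 3), (2, c, 3)}, which together cover all four body tuples.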

slide-130
SLIDE 130

Roadmap

Given degree constraints DC and a disjunctive datalog rule

$$P:\ \bigvee_{B\in\mathcal{B}} T_B \leftarrow \bigwedge_{F\in E} R_F$$

Answer (Worst-case Output Size Bound)

Polymatroid bound:

$$\max_{D\models DC}\log|P(D)| \;\le\; \max_{h\in\Gamma_n\cap H_{DC}}\ \min_{B\in\mathcal{B}} h(B)$$

Answer (Algorithm)

PANDA computes a model for P within $\tilde O\big(2^{\text{polymatroid bound}}\big)$.

Question (Gathering fruits)

Plug the bound and the algorithm into the Meta Algorithm: what do we get?

slide-131
SLIDE 131

Option 1: Full Conjunctive Query

[Figure: Database D → full conjunctive query Q_i(D) over ⋀_{F∈E} R_F, producing T_[n] (iTime) → Q_o (Time) → Output Answer]

$$\mathrm{iTime} = \max_{D\models DC}|Q_i(D)| \;\le\; \max_{h\in\Gamma_n\cap H_{DC}} 2^{h([n])} = \text{PANDA's runtime}$$

PANDA is worst-case optimal whenever the polymatroid bound is tight!

slide-132
SLIDE 132

Option 2: A Single Tree Decomposition

Let (T, χ) be a tree decomposition of H, to be chosen later.

[Figure: Database D → multiple conjunctive rules Q_i(D) over ⋀_{F∈E} R_F, producing T_{χ(v)} for every v ∈ V(T) (iTime) → Q_o (Time) → Output Answer]

$$\mathrm{iTime} = \max_{D\models DC}|Q_i(D)| \;\le\; \max_{D\models DC}\ \max_{v\in V(T)}|P_v(D)| \;\le\; 2^{\max_{v\in V(T)}\max_{h\in\Gamma_n\cap H_{DC}} h(\chi(v))} = \text{PANDA's runtime}$$

Pick the best (T, χ) before running PANDA:

$$\min_{(T,\chi)}\ \max_{v\in V(T)}\ \max_{h\in\Gamma_n\cap H_{CC}} h(\chi(v)) \;\le\; \log N\cdot \mathrm{fhtw}(Q)$$

PANDA evaluates Q within $\tilde O\big(N^{\mathrm{fhtw}(Q)}\big)$ time.

slide-133
SLIDE 133

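Option 3: Multiple Tree Decompositions

[Figure: Database D → multiple disjunctive datalog rules Q_i(D) over ⋀_{F∈E} R_F, producing ⋀_{𝓑} ⋁_{B∈𝓑} T_B (iTime) → Q_o (Time) → Output Answer]

$$\begin{aligned}
\log\mathrm{iTime} &= \max_{D\models DC}\log|Q_i(D)| \;\le\; \max_{D\models DC}\ \max_{\mathcal{B}}\ \log|P_{\mathcal{B}}(D)|\\
&\le \max_{\mathcal{B}}\ \max_{h\in\Gamma_n\cap H_{DC}}\ \min_{B\in\mathcal{B}} h(B) \;=\; \max_{h\in\Gamma_n\cap H_{DC}}\ \max_{\mathcal{B}}\ \min_{B\in\mathcal{B}} h(B)\\
&= \max_{h\in\Gamma_n\cap H_{DC}}\ \min_{(T,\chi)}\ \max_{v\in V(T)} h(\chi(v)) \;=\; \log(\text{PANDA's runtime})
\end{aligned}$$

$$\max_{h\in\Gamma_n\cap H_{DC}}\ \min_{(T,\chi)}\ \max_{v\in V(T)} h(\chi(v)) \;\le\; \log N\cdot\mathrm{subw}(Q)$$

PANDA evaluates Q within $\tilde O\big(N^{\mathrm{subw}(Q)}\big)$ time.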

slide-134
SLIDE 134

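Quantities of Interest

◮ X ∈ {Γ*_n, Γ_n, SA_n}
◮ Y ∈ {H_DC, H_FD, H_CC, log N · ED, log N · VD}

Define

$$\mathrm{LogSizeBound}_{X\cap Y}(P) \stackrel{\text{def}}{=} \max_{h\in X\cap Y}\ \min_{B\in\mathcal{B}} h(B)$$

$$\mathrm{Minimaxwidth}_{X\cap Y}(Q) \stackrel{\text{def}}{=} \min_{(T,\chi)}\ \max_{v\in V(T)}\ \max_{h\in X\cap Y} h(\chi(v))$$

$$\mathrm{Maximinwidth}_{X\cap Y}(Q) \stackrel{\text{def}}{=} \max_{h\in X\cap Y}\ \min_{(T,\chi)}\ \max_{v\in V(T)} h(\chi(v))$$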

slide-135
SLIDE 135

Summary of Bounds

X ∈ {Γ*_n, Γ_n, SA_n}; Y ∈ {H_DC, H_CC, ED · log N, VD · log N}

◮ LogSizeBound_{X∩Y}(Q): log2 VB(Q), log2 VB(Q), log2 VB(Q), ρ(Q) · log2 N, ρ*(Q) · log2 N, ρ*(Q) · log2 N, ρ(Q, (N_F)_{F∈E}), log2 AGM(Q), log2 AGM(Q), DAPB(Q), DAEB(Q)
◮ Minimaxwidth_{X∩Y}(Q): tw(Q) + 1, tw(Q) + 1, tw(Q) + 1, ghtw(Q), fhtw(Q), fhtw(Q), da-fhtw(Q), eda-fhtw(Q)
◮ Maximinwidth_{X∩Y}(Q): tw(Q) + 1, tw(Q) + 1, tw(Q) + 1, ghtw(Q), subw(Q), da-subw(Q), eda-subw(Q)

slide-141
SLIDE 141

Many Open Questions

◮ Is the entropic bound computable under CC ∪ HDC or DC?
◮ Worst-case optimal algorithm for full conjunctive queries under CC ∪ HDC or DC
◮ Worst-case optimal algorithm for disjunctive datalog rules under CC ∪ HDC or DC
◮ Remove the annoying poly-log factor from PANDA
◮ Other choices for the propositional formula in the Meta Algorithm, perhaps trading off iTime and ⊗?
◮ What about negations?

slide-142
SLIDE 142

Many Thanks! Questions?

slide-152
SLIDE 152

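Semantics of Our Disjunctive Datalog Rule

$$P:\ \bigvee_{B\in\mathcal{B}} T_B(A_B) \leftarrow \bigwedge_{F\in E} R_F(A_F)$$

[Figure: the 4-cycle hypergraph on A1, A2, A3, A4 with edges R12, R23, R34, R41]

P123,234: T123 ∨ T234 ← R12 ∧ R23 ∧ R34 ∧ R41.

R12(A1, A2) = {(a, 1), (b, 1), (b, 2)}
R23(A2, A3) = {(1, c), (1, d), (2, c)}
R34(A3, A4) = {(c, 3), (d, 4), (d, 5)}
R41(A4, A1) = {(3, b), (4, a), (4, b)}
Body tuples (A1, A2, A3, A4) = {(a, 1, d, 4), (b, 1, c, 3), (b, 1, d, 4), (b, 2, c, 3)}
A model: T123(A1, A2, A3) = {(b, 1, c), (b, 2, c)}, T234(A2, A3, A4) = {(1, d, 4), (1, c, 3), (2, d, 4)}

Inclusive Disjunction: every body tuple must have its projection in T123 or in T234, possibly in both.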

slide-153
SLIDE 153

Semantics of Our Disjunctive Datalog Rule

No minimal model requirement: a model may contain tuples that no body tuple requires, such as (2, d, 4) in the model T234 shown earlier.

slide-154
SLIDE 154

Semantics of Our Disjunctive Datalog Rule

Model size is max(|T123|, |T234|) = max(2, 3) = 3

slide-155
SLIDE 155

Semantics of Our Disjunctive Datalog Rule

The output size is the minimum model size over all models.

slide-157
SLIDE 157

Semantics of Our Disjunctive Datalog Rule

A minimum-sized model, of size 2:
T123(A1, A2, A3) = {(b, 1, c), (b, 2, c)}
T234(A2, A3, A4) = {(1, d, 4)}

Hence, the output size is 2.
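The definition of the output size can be checked by brute force on this instance. A minimal Python sketch, assuming relations are Python sets of tuples; `body_tuples` and `output_size` are hypothetical helper names. It relies on the observation that any model must cover each body tuple on at least one side, and shrinking a model to only the chosen projections never increases max(|T123|, |T234|), so trying every assignment of body tuples to sides suffices. The search is exponential in the number of body tuples and is only meant for toy inputs.

```python
from itertools import product

def body_tuples(R12, R23, R34, R41):
    """All (a1, a2, a3, a4) satisfying R12 ∧ R23 ∧ R34 ∧ R41 (naive nested-loop join)."""
    return sorted(
        (a1, a2, a3, a4)
        for a1, a2 in R12
        for b2, a3 in R23 if b2 == a2
        for b3, a4 in R34 if b3 == a3
        if (a4, a1) in R41
    )

def output_size(body):
    """Return (|P(D)|, T123, T234) for P: T123 ∨ T234 ← body, by trying every way of
    covering each body tuple with its A1A2A3- or its A2A3A4-projection."""
    best = None
    for sides in product(("123", "234"), repeat=len(body)):
        T123 = {(a1, a2, a3) for (a1, a2, a3, _), s in zip(body, sides) if s == "123"}
        T234 = {(a2, a3, a4) for (_, a2, a3, a4), s in zip(body, sides) if s == "234"}
        size = max(len(T123), len(T234))
        if best is None or size < best[0]:
            best = (size, T123, T234)
    return best

R12 = {("a", 1), ("b", 1), ("b", 2)}
R23 = {(1, "c"), (1, "d"), (2, "c")}
R34 = {("c", 3), ("d", 4), ("d", 5)}
R41 = {(3, "b"), (4, "a"), (4, "b")}
body = body_tuples(R12, R23, R34, R41)
size, T123, T234 = output_size(body)
print(len(body), size)  # 4 2
```

A size-1 model is impossible here: one T123 tuple covers one body tuple and one T234 tuple covers at most two, which leaves one of the four uncovered.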

slide-165
SLIDE 165

Output Size of a Disjunctive Datalog Rule

$$P:\ \bigvee_{B\in\mathcal{B}} T_B(A_B) \leftarrow \bigwedge_{F\in E} R_F(A_F), \qquad |P(D)| \stackrel{\text{def}}{=} \min_{T\,:\,T\models P}\ \max_{B\in\mathcal{B}} |T_B|$$

[Figure: the 4-cycle hypergraph on A1, A2, A3, A4 with edges R12, R23, R34, R41]

CC: |R12| ≤ N, |R23| ≤ N, |R34| ≤ N, |R41| ≤ N.

◮ P123,234: T123 ∨ T234 ← R12 ∧ R23 ∧ R34 ∧ R41, and $\max_{D\models CC}|P_{123,234}(D)| \le N^{3/2}$
◮ P′: T123 ← R12 ∧ R23 ∧ R34 ∧ R41, and |P′(D)| = N^2 for some D

Using Option 3, the 4-cycle query is answerable in $\tilde O(N^{3/2})$ time, matching [Alon, Yuster, Zwick'97].
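One instance achieving |P′(D)| = N^2 can be built by pinning A2 and A4 to a single value. This construction is our own illustration (the slides only assert that some such D exists), and `pinned_instance` is a hypothetical name. On the same instance the disjunctive rule P123,234 has a model of size N: leave T123 empty and let T234 hold the N projections onto (A2, A3, A4).

```python
def pinned_instance(N):
    """Hypothetical worst case for P′: A2 and A4 are pinned to 0; A1 and A3 take N values each."""
    R12 = {(i, 0) for i in range(N)}
    R23 = {(0, j) for j in range(N)}
    R34 = {(j, 0) for j in range(N)}
    R41 = {(0, i) for i in range(N)}
    return R12, R23, R34, R41

N = 10
R12, R23, R34, R41 = pinned_instance(N)
body = {
    (a1, a2, a3, a4)
    for a1, a2 in R12
    for b2, a3 in R23 if b2 == a2
    for b3, a4 in R34 if b3 == a3
    if (a4, a1) in R41
}
T123_only = {(a1, a2, a3) for a1, a2, a3, _ in body}  # the unique minimal model of P′
T234 = {(a2, a3, a4) for _, a2, a3, a4 in body}         # with T123 = ∅, a model of P123,234
print(len(body), len(T123_only), len(T234))  # 100 100 10
```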

slide-170
SLIDE 170

Polymatroid Bound: Examples

Q(A1, A2, A3, A4) ← R12(A1, A2), R23(A2, A3), R34(A3, A4), R41(A4, A1).

[Figure: the 4-cycle hypergraph on A1, A2, A3, A4 with edges R12, R23, R34, R41]

◮ |R12|, |R23|, |R34|, |R41| ≤ N ⇒ |Q| ≤ N^2:

$$\log|Q| = h(A_1A_2A_3A_4) \le h(A_1A_2) + h(A_3A_4) \le 2\log N$$

◮ In addition, $\deg_{12}(A_1A_2|A_1),\ \deg_{12}(A_1A_2|A_2) \le D$ ⇒ |Q| ≤ D · N^{3/2}:

$$2\log|Q| = 2h(A_1A_2A_3A_4) \le h(A_2A_3) + h(A_3A_4) + h(A_4A_1) + h(A_2|A_1) + h(A_1|A_2) \le 3\log N + 2\log D$$

◮ Functional dependencies A1 → A2 and A2 → A1 (the case D = 1) ⇒ |Q| ≤ N^{3/2}
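Every inequality used above is a Shannon-type inequality, so it holds for the entropy of any distribution, in particular the uniform distribution on the query output, where h(A1A2A3A4) = log|Q|. A small numeric sanity check in Python, assuming the four-tuple instance from the semantics example; the helper `entropy` and the shorthand `h` are hypothetical. It also checks the Shannon-flow inequality h(A1A2A3) + h(A2A3A4) ≤ h(A1A2) + h(A2A3) + h(A3A4) used for the disjunctive rule.

```python
from collections import Counter
from math import log2

def entropy(rows, attrs):
    """Entropy in bits of the projection onto attrs of the uniform distribution on rows."""
    counts = Counter(tuple(r[a] for a in attrs) for r in rows)
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in counts.values())

# Output of the 4-cycle query on the example instance, as (A1, A2, A3, A4) tuples.
Q = [("a", 1, "d", 4), ("b", 1, "c", 3), ("b", 1, "d", 4), ("b", 2, "c", 3)]

def h(attrs):
    """Shorthand: h("12") is h(A1A2), with attribute Ai stored at position i - 1."""
    return entropy(Q, [int(i) - 1 for i in attrs])

eps = 1e-9  # tolerance for floating-point error
checks = {
    "log|Q| = h(A1A2A3A4)": abs(h("1234") - log2(len(Q))) < eps,
    "h(A1A2A3A4) <= h(A1A2) + h(A3A4)": h("1234") <= h("12") + h("34") + eps,
    "degree-constraint inequality": 2 * h("1234")
        <= h("23") + h("34") + h("41") + (h("12") - h("1")) + (h("12") - h("2")) + eps,
    "Shannon-flow inequality": h("123") + h("234") <= h("12") + h("23") + h("34") + eps,
}
for name, ok in checks.items():
    print(name, ok)  # each line ends with True
```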

Polymatroid Bound is not tight!

Proof Strategy

◮ Take a non-Shannon inequality (modified from [Zhang-Yeung'98]):

$$\begin{aligned}
11h(ABXYC) \le\ & 3h(XY) + 3h(AX) + 3h(AY) + h(BX) + h(BY) + 5h(C)\\
&+ h(ABXYC|AB) + h(ABXYC|AC) + 4h(ABXYC|AXY) + h(ABXYC|BXY)\\
&+ 2h(ABXYC|XC) + 2h(ABXYC|YC)
\end{aligned} \tag{3}$$

◮ Construct a query Q where (3) gives the (entropic) bound

$$11\log|Q| = 11h(ABXYC) \le 11\log N^3 + 5\log N^2 = 43\log N$$

◮ Construct a polymatroid h satisfying Q's constraints such that 11h(ABXYC) = 44 log N, so the polymatroid bound (at least N^4) exceeds the entropic bound (at most N^{43/11})

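Beyond Worst-case Optimality

◮ Output-sensitive algorithms: $\tilde O\big(N^{d} + |\text{output}|\big)$, where N^d is the intrinsic cost and |output| is the output cost
◮ Submodular width as a candidate for d
  [Marx JACM'13]: FPT ⇔ bounded subw(Q); Boolean Q ⇒ $\tilde O\big(N^{\mathrm{subw}(Q)\times c}\big)$ for some constant c
◮ Our goals: any Q ⇒ $\tilde O\big(N^{\text{da-subw}(Q)\times 1} + |\text{output}|\big)$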
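Submodular Width

$$\begin{aligned}
\mathrm{fhtw}(Q) &\stackrel{\text{def}}{=} \min_{(T,\chi)}\ \max_{t\in V(T)}\ \rho^*(\chi(t))
= \min_{(T,\chi)}\ \max_{t\in V(T)}\ \max_{h\in ED\cap\Gamma_n} h(\chi(t))
= \min_{(T,\chi)}\ \max_{h\in ED\cap\Gamma_n}\ \max_{t\in V(T)} h(\chi(t))\\
\mathrm{subw}(Q) &\stackrel{\text{def}}{=} \max_{h\in ED\cap\Gamma_n}\ \min_{(T,\chi)}\ \max_{t\in V(T)} h(\chi(t))
\end{aligned}$$

subw(Q) ≤ fhtw(Q), since exchanging the outer min over (T, χ) with the max over h can only decrease the value (max-min ≤ min-max).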
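The inequality subw(Q) ≤ fhtw(Q) is an instance of the general fact max-min ≤ min-max. A toy Python illustration with made-up numbers (not computed from any real query): rows stand for polymatroids h, columns for tree decompositions, and each entry for max_{t∈V(T)} h(χ(t)). The gap appears when each h favors a different decomposition. The function names `minimax` and `maximin` are hypothetical.

```python
def minimax(cost):
    """min over decompositions (columns) of max over polymatroids (rows): the fhtw pattern."""
    return min(max(row[t] for row in cost) for t in range(len(cost[0])))

def maximin(cost):
    """max over polymatroids (rows) of min over decompositions (columns): the subw pattern."""
    return max(min(row) for row in cost)

# Hypothetical values: cost[i][t] = max over bags of h_i(χ(t)) for decomposition t.
cost = [
    [1.0, 2.0],  # h_0 is small on decomposition 0
    [2.0, 1.0],  # h_1 is small on decomposition 1
]
print(minimax(cost), maximin(cost))  # 2.0 1.0
```

With a single decomposition (one column) the two quantities coincide, which matches the intuition that subw gains over fhtw only by letting the choice of decomposition depend on h.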