

slide-1
SLIDE 1

1/22

What do Shannon-type Inequalities, Submodular Width, and Disjunctive Datalog have to do with one another?

Mahmoud Abo Khamis¹, Hung Q. Ngo¹, Dan Suciu¹,²

¹LogicBlox Inc. ²University of Washington

PODS 2017

slide-9
SLIDE 9

2/22

Contributions

◮ A Join Algorithm

  ◮ first to meet the Submodular Width bound!
  ◮ works for and relies on Disjunctive Datalog.
  ◮ fully utilizes Functional Dependencies and Degree Bounds.

◮ A Unified Framework for Join Bounds

  ◮ subsumes most known bounds.
  ◮ extends them to Functional Dependencies and Degree Bounds.

◮ Results on Shannon-type Inequalities

slide-10
SLIDE 10

3/22

Table of Contents

◮ Size Bounds for Full Conjunctive Queries
◮ Size Bounds for Disjunctive Datalog
◮ Algorithms for Disjunctive Datalog
◮ Algorithms for Conjunctive Queries

slide-21
SLIDE 21

5/22

Size Bounds for Full Conjunctive Queries

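Q(A1, A2, A3, A4) :- R12(A1, A2), R23(A2, A3), R34(A3, A4), R41(A4, A1).

[Figure: the query graph, a 4-cycle on the vertices A1, A2, A3, A4 with edges R12, R23, R34, R41.]

Output Q, with each tuple given probability 1/4:

  A1  A2  A3  A4  prob
  a   1   d   4   1/4
  b   1   c   3   1/4
  b   1   d   4   1/4
  b   2   c   3   1/4

Input relations, with the marginal probability of each tuple ((d, 5) carries no probability on the slide):

  R12:  A1  A2  prob
        a   1   1/4
        b   1   2/4
        b   2   1/4

  R23:  A2  A3  prob
        1   c   1/4
        1   d   2/4
        2   c   1/4

  R34:  A3  A4  prob
        c   3   2/4
        d   4   2/4
        d   5

  R41:  A4  A1  prob
        3   b   2/4
        4   a   1/4
        4   b   1/4

$$h(A_1 A_2 A_3 A_4) = \log |Q|$$

$$h(A_1 A_2) \le \log |R_{12}|, \quad h(A_2 A_3) \le \log |R_{23}|, \quad h(A_3 A_4) \le \log |R_{34}|, \quad \ldots$$

$$h(A_2 \mid A_1 = \text{‘a’}) \le \log |\sigma_{A_1 = \text{‘a’}} R_{12}|, \quad h(A_2 \mid A_1 = \text{‘b’}) \le \log |\sigma_{A_1 = \text{‘b’}} R_{12}|, \quad \ldots$$

$$h(A_2 \mid A_1) \le \log \underbrace{\max_x |\sigma_{A_1 = x} R_{12}|}_{\deg_{R_{12}}(A_2 \mid A_1)}$$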
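
The entropy argument on this slide can be checked numerically. The following is a minimal Python sketch, not code from the talk: it recomputes the join of the example relations, places the uniform distribution on the output, and compares the marginal entropies with the cardinality and degree bounds. The helper names `four_cycle_join` and `entropy` are illustrative assumptions.

```python
from collections import Counter
from math import log2

# Relations of the running example, in the attribute order used on the slide.
R12 = {("a", 1), ("b", 1), ("b", 2)}   # (A1, A2)
R23 = {(1, "c"), (1, "d"), (2, "c")}   # (A2, A3)
R34 = {("c", 3), ("d", 4), ("d", 5)}   # (A3, A4)
R41 = {(3, "b"), (4, "a"), (4, "b")}   # (A4, A1)

def four_cycle_join(r12, r23, r34, r41):
    """Naive nested-loop evaluation of Q(A1, A2, A3, A4)."""
    return {
        (a1, a2, a3, a4)
        for (a1, a2) in r12
        for (b2, a3) in r23 if b2 == a2
        for (b3, a4) in r34 if b3 == a3
        if (a4, a1) in r41
    }

def entropy(tuples, positions):
    """Base-2 entropy of the marginal on `positions` under the
    uniform distribution over `tuples`."""
    counts = Counter(tuple(t[i] for i in positions) for t in tuples)
    total = len(tuples)
    return -sum(c / total * log2(c / total) for c in counts.values())

Q = four_cycle_join(R12, R23, R34, R41)
h_all = entropy(Q, (0, 1, 2, 3))            # h(A1 A2 A3 A4)
h_12 = entropy(Q, (0, 1))                   # h(A1 A2)
h_2_given_1 = h_12 - entropy(Q, (0,))       # h(A2 | A1) = h(A1 A2) - h(A1)
deg_12 = max(Counter(a1 for a1, _ in R12).values())  # deg_R12(A2 | A1)

print(len(Q), h_all, h_12, h_2_given_1, deg_12)
```

Under these assumptions the full entropy equals log |Q|, while h(A1A2) and h(A2|A1) stay below log |R12| and log deg_R12(A2|A1) respectively.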
slide-32
SLIDE 32

6/22

Size Bounds: Input

◮ A full conjunctive query

$$Q(A_{[n]}) \text{ :- } \bigwedge_{F \in \mathcal{E}} R_F(A_F)$$

  ◮ $A_{[n]} = \{A_1, \ldots, A_n\}$ is the full set of attributes
  ◮ $A_F = \{A_f \mid f \in F\}$, with $F \subseteq [n]$, is a subset

◮ Degree Constraints (DC):

$$\deg_F(A_Y \mid A_X) \le N_{Y|X}, \quad X \subset Y \subseteq F \in \mathcal{E}$$

◮ Cardinality Constraints (CC):

$$|R_F| \le N_{F|\emptyset}$$

◮ Functional Dependencies (FD):

$$A_X \to A_Y$$

Bound Idea

$\log |Q| \le$ maximum of $h(A_1, \ldots, A_n)$ over all entropies $h$ such that $h$ satisfies the degree constraints of $Q$.
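
To make the three kinds of constraints concrete, here is a small Python sketch (the function name `degree` and its signature are assumptions for illustration, not part of the talk). It computes deg_F(A_Y | A_X) for a relation stored as a set of tuples; a cardinality constraint is the special case X = ∅, and a functional dependency A_X → A_Y corresponds to a degree of at most 1.

```python
from collections import defaultdict

def degree(relation, schema, y, x):
    """deg_R(A_Y | A_X): the largest number of distinct A_Y-values that
    share one A_X-value. Requires x ⊆ y ⊆ schema."""
    assert set(x) <= set(y) <= set(schema)
    x_pos = [schema.index(a) for a in x]
    y_pos = [schema.index(a) for a in y]
    groups = defaultdict(set)
    for t in relation:
        key = tuple(t[i] for i in x_pos)
        groups[key].add(tuple(t[i] for i in y_pos))
    return max((len(vals) for vals in groups.values()), default=0)

R12 = {("a", 1), ("b", 1), ("b", 2)}
schema = ("A1", "A2")

# Cardinality constraint: X = ∅, so the degree is simply |R12|.
card = degree(R12, schema, y=("A1", "A2"), x=())
# Degree constraint deg_R12(A2 | A1).
deg_a2_given_a1 = degree(R12, schema, y=("A1", "A2"), x=("A1",))
# The FD A1 -> A2 holds exactly when that degree is at most 1.
fd_holds = deg_a2_given_a1 <= 1

print(card, deg_a2_given_a1, fd_holds)
```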

slide-37
SLIDE 37

7/22

Size Bounds: Preliminaries

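◮ $\mathrm{HDC}$ is the set of functions $h : 2^{[n]} \to \mathbb{R}_+$ satisfying the degree constraints:

$$\mathrm{HDC} \stackrel{\text{def}}{=} \{\, h \mid h(Y \mid X) \le \log N_{Y|X}, \ \forall (X, Y, N_{Y|X}) \,\}$$

◮ $\Gamma^*_n$ is the set of entropic functions
◮ $\overline{\Gamma}^*_n$ is the topological closure of $\Gamma^*_n$
◮ $\Gamma_n$ is the set of polymatroids, i.e. functions $h : 2^{[n]} \to \mathbb{R}_+$ satisfying

$$h(X \cup Y) + h(X \cap Y) \le h(X) + h(Y), \quad X, Y \subseteq [n] \quad \text{(submodularity)}$$
$$h(X) \le h(Y), \quad X \subseteq Y \subseteq [n] \quad \text{(monotonicity)}$$
$$h(\emptyset) = 0 \quad \text{(strictness)}$$

$$\underbrace{\Gamma^*_n}_{\text{entropic functions}} \subset \underbrace{\overline{\Gamma}^*_n}_{\text{topological closure of } \Gamma^*_n} \subset \underbrace{\Gamma_n}_{\text{polymatroids}}$$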
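
The three polymatroid axioms can be tested by brute force on small examples. The sketch below is illustrative only (the names `subsets`, `is_polymatroid` and `entropic_function` are assumptions): it builds the entropic function of the uniform distribution over the four output tuples of the running example and checks that it is a polymatroid, as the inclusion chain above requires.

```python
from collections import Counter
from itertools import combinations
from math import log2

def subsets(n):
    """All subsets of {0, ..., n-1} as frozensets."""
    return [frozenset(c) for k in range(n + 1)
            for c in combinations(range(n), k)]

def is_polymatroid(h, n, eps=1e-9):
    """Check strictness, monotonicity and submodularity of h : 2^[n] -> R."""
    all_sets = subsets(n)
    if abs(h[frozenset()]) > eps:                        # strictness
        return False
    for x in all_sets:
        for y in all_sets:
            if x <= y and h[x] > h[y] + eps:             # monotonicity
                return False
            if h[x | y] + h[x & y] > h[x] + h[y] + eps:  # submodularity
                return False
    return True

def entropic_function(tuples, n):
    """h(S) = entropy of the marginal on S, uniform distribution on tuples."""
    total = len(tuples)
    h = {}
    for s in subsets(n):
        pos = sorted(s)
        counts = Counter(tuple(t[i] for i in pos) for t in tuples)
        h[s] = -sum(c / total * log2(c / total) for c in counts.values())
    return h

Q = [("a", 1, "d", 4), ("b", 1, "c", 3), ("b", 1, "d", 4), ("b", 2, "c", 3)]
h = entropic_function(Q, 4)
print(is_polymatroid(h, 4))

# Breaking monotonicity on purpose yields a non-polymatroid.
g = dict(h)
g[frozenset({0})] = 5.0
print(is_polymatroid(g, 4))
```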
slide-42
SLIDE 42

8/22

Size Bounds for Full Conjunctive Queries

$$\underbrace{\Gamma^*_n}_{\text{entropic functions}} \subset \underbrace{\overline{\Gamma}^*_n}_{\text{topological closure of } \Gamma^*_n} \subset \underbrace{\Gamma_n}_{\text{polymatroids}}$$

Bound        | Entropic Bound | Polymatroid Bound
-------------|----------------|------------------
Definition   | $\log \lvert Q \rvert \le \max_{h \in \Gamma^*_n \cap \mathrm{HDC}} h([n])$ | $\log \lvert Q \rvert \le \max_{h \in \Gamma_n \cap \mathrm{HDC}} h([n])$
CC only      | AGM bound (Tight) [Atserias et al. FOCS’08] | AGM bound (Tight) [Atserias et al. FOCS’08]
CC + FD only | Entropic Bound for FD [Gottlob et al. JACM’12] (Tight [Gogacz et al. ICDT’17]) | Polymatroid Bound for FD [Gottlob et al. JACM’12] (Not tight [Our work])
DC           | Entropic Bound for DC (Tight [Our work]) | Polymatroid Bound for DC (Not tight [Our work])
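
For the "CC only" row, the AGM bound is commonly stated through fractional edge covers: if weights w_F cover every attribute, then |Q| ≤ ∏ |R_F|^{w_F}. As a hedged illustration for the 4-cycle query (the names `EDGES`, `is_fractional_edge_cover` and `is_fractional_vertex_packing` are assumptions, and no LP solver is used), the sketch below certifies that the optimal cover value is 2 by exhibiting a cover and a dual vertex packing of equal value, which gives |Q| ≤ N² when every relation has at most N tuples.

```python
from fractions import Fraction

# Hypergraph of the 4-cycle query: each relation covers two attributes.
EDGES = {
    "R12": {"A1", "A2"},
    "R23": {"A2", "A3"},
    "R34": {"A3", "A4"},
    "R41": {"A4", "A1"},
}
VERTICES = {"A1", "A2", "A3", "A4"}

def is_fractional_edge_cover(weights):
    """Every attribute must receive total weight >= 1 from its relations."""
    return all(w >= 0 for w in weights.values()) and all(
        sum(w for e, w in weights.items() if v in EDGES[e]) >= 1
        for v in VERTICES
    )

def is_fractional_vertex_packing(y):
    """Dual LP: every relation may carry total attribute weight <= 1."""
    return all(val >= 0 for val in y.values()) and all(
        sum(y[v] for v in attrs) <= 1 for attrs in EDGES.values()
    )

half = Fraction(1, 2)
cover = {e: half for e in EDGES}
packing = {v: half for v in VERTICES}
cover_value = sum(cover.values())
packing_value = sum(packing.values())

# Equal primal and dual values certify optimality by LP duality,
# so the AGM exponent for the 4-cycle is 2.
print(cover_value, packing_value)
```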

slide-43
SLIDE 43

9/22

Table of Contents

◮ Size Bounds for Full Conjunctive Queries
◮ Size Bounds for Disjunctive Datalog
◮ Algorithms for Disjunctive Datalog
◮ Algorithms for Conjunctive Queries

slide-52
SLIDE 52

10/22

Disjunctive Datalog

$$P : \bigvee_{B \in \mathcal{B}} T_B(A_B) \text{ :- } \bigwedge_{F \in \mathcal{E}} R_F(A_F)$$

[Figure: the query graph, a 4-cycle on the vertices A1, A2, A3, A4 with edges R12, R23, R34, R41.]

P : T123(A1, A2, A3) ∨ T234(A2, A3, A4) :- R12(A1, A2), R23(A2, A3), R34(A3, A4), R41(A4, A1).

Input relations:

  R12:  A1  A2      R23:  A2  A3      R34:  A3  A4      R41:  A4  A1
        a   1             1   c             c   3             3   b
        b   1             1   d             d   4             4   a
        b   2             2   c             d   5             4   b

Tuples satisfying the body:

  A1  A2  A3  A4
  a   1   d   4
  b   1   c   3
  b   1   d   4
  b   2   c   3

A model:

  T123:  A1  A2  A3      T234:  A2  A3  A4
         b   1   c              1   d   4
         b   2   c              2   c   3
                                2   d   4

Model size is max(|T123|, |T234|) = 3

Output size is the minimum over all models

A minimum-sized model:

  T123:  A1  A2  A3      T234:  A2  A3  A4
         b   1   c              1   d   4
         b   2   c

A minimum-sized model of size 2 ⇒ Output size is 2
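
The claim that the output size is 2 can be confirmed by exhaustive search on this tiny instance. The following Python sketch is an illustration under stated assumptions (the names `is_model` and `min_model_size` are hypothetical): it only tries head tuples that are projections of body tuples, since other tuples cannot help cover the body, and it searches model sizes in increasing order.

```python
from itertools import combinations

# Tuples (A1, A2, A3, A4) satisfying the body of P on the example database.
Q = [("a", 1, "d", 4), ("b", 1, "c", 3), ("b", 1, "d", 4), ("b", 2, "c", 3)]

def is_model(q, t123, t234):
    """(T123, T234) is a model if every body tuple is covered by a head."""
    return all(t[:3] in t123 or t[1:] in t234 for t in q)

def min_model_size(q):
    """Smallest max(|T123|, |T234|) over all models, by brute force."""
    cand123 = sorted({t[:3] for t in q}, key=repr)
    cand234 = sorted({t[1:] for t in q}, key=repr)
    for size in range(len(q) + 1):
        # Adding tuples never hurts, so try sets that are as large as allowed.
        for t123 in combinations(cand123, min(size, len(cand123))):
            for t234 in combinations(cand234, min(size, len(cand234))):
                if is_model(q, set(t123), set(t234)):
                    return size, set(t123), set(t234)

size, t123, t234 = min_model_size(Q)
print(size, t123, t234)
```

The search may return a different model of size 2 than the one on the slide; only the size is determined.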

slide-62
SLIDE 62

11/22

Disjunctive Datalog: Output Size

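$$P : \bigvee_{B \in \mathcal{B}} T_B(A_B) \text{ :- } \bigwedge_{F \in \mathcal{E}} R_F(A_F), \qquad |P(D)| \stackrel{\text{def}}{=} \min_{T : T \models P} \max_{B \in \mathcal{B}} |T_B|$$

[Figure: the query graph, a 4-cycle on the vertices A1, A2, A3, A4 with edges R12, R23, R34, R41.]

D : |R12| ≤ N, |R23| ≤ N, |R34| ≤ N, |R41| ≤ N.

◮ P : T123(A1, A2, A3) ∨ T234(A2, A3, A4) :- R12(A1, A2), R23(A2, A3), R34(A3, A4), R41(A4, A1).

  $|P(D)| \le N^{3/2}$, for all D

◮ P′ : T123(A1, A2, A3) :- R12(A1, A2), R23(A2, A3), R34(A3, A4), R41(A4, A1).

  $|P'(D)| = N^2$, for some D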
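
One database witnessing the gap between P and P′ can be built explicitly. The sketch below is an illustrative construction, not taken from the talk (`instance` and `body` are hypothetical names): every relation has exactly N tuples, the projection of the body onto (A1, A2, A3) has N² tuples, which a single head T123 must contain in full, while the projection onto (A2, A3, A4) has only N tuples and already forms a model of P on its own.

```python
def instance(n):
    """A database D with |R12| = |R23| = |R34| = |R41| = n."""
    r12 = {(i, 0) for i in range(n)}   # (A1, A2)
    r23 = {(0, j) for j in range(n)}   # (A2, A3)
    r34 = {(j, 0) for j in range(n)}   # (A3, A4)
    r41 = {(0, i) for i in range(n)}   # (A4, A1)
    return r12, r23, r34, r41

def body(r12, r23, r34, r41):
    """All (A1, A2, A3, A4) satisfying the body of the 4-cycle rule."""
    return {
        (a1, a2, a3, a4)
        for (a1, a2) in r12
        for (b2, a3) in r23 if b2 == a2
        for (b3, a4) in r34 if b3 == a3
        if (a4, a1) in r41
    }

n = 5
q = body(*instance(n))
t123 = {t[:3] for t in q}   # forced content of the single head of P'
t234 = {t[1:] for t in q}   # on its own, a model of the disjunctive rule P

print(len(t123), len(t234))
```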

slide-68
SLIDE 68

12/22

Disjunctive Datalog: Size Bounds

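$$P : \bigvee_{B \in \mathcal{B}} T_B(A_B) \text{ :- } \bigwedge_{F \in \mathcal{E}} R_F(A_F), \qquad |P(D)| \stackrel{\text{def}}{=} \min_{T : T \models P} \max_{B \in \mathcal{B}} |T_B|$$

Recall that:

◮ $\mathrm{HDC}$ is the set of functions $h : 2^{[n]} \to \mathbb{R}_+$ satisfying the degree constraints

$$\underbrace{\Gamma^*_n}_{\text{entropic functions}} \subset \underbrace{\overline{\Gamma}^*_n}_{\text{topological closure of } \Gamma^*_n} \subset \underbrace{\Gamma_n}_{\text{polymatroids}}$$

Theorem

$$\log |P(D)| \le \underbrace{\max_{h \in \overline{\Gamma}^*_n \cap \mathrm{HDC}} \min_{B \in \mathcal{B}} h(B)}_{\text{entropic bound (asymptotically tight!)}} \le \underbrace{\max_{h \in \Gamma_n \cap \mathrm{HDC}} \min_{B \in \mathcal{B}} h(B)}_{\text{polymatroid bound}}$$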
slide-69
SLIDE 69

13/22

Table of Contents

◮ Size Bounds for Full Conjunctive Queries
◮ Size Bounds for Disjunctive Datalog
◮ Algorithms for Disjunctive Datalog
◮ Algorithms for Conjunctive Queries

slide-76
SLIDE 76

14/22

PANDA (Proof-Assisted eNtropic Degree-Aware)

◮ An algorithm for disjunctive datalog

  ◮ computes a model
  ◮ within the polymatroid bound:
  ◮ the worst-case size of the minimum model.

◮ Outline

  ◮ Construct a Proof Sequence for the bound.
  ◮ Interpret each proof step as an algorithmic step.

slide-82
SLIDE 82

15/22

PANDA

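◮ Polymatroid bound:

$$\max_{h \in \Gamma_n \cap \mathrm{HDC}} \min_{B \in \mathcal{B}} h(B)$$

◮

$$\max_{h \in \Gamma_n \cap \mathrm{HDC}} \min_{B \in \mathcal{B}} h(B) = \max_{h \in \Gamma_n \cap \mathrm{HDC}} \sum_{B \in \mathcal{B}} \lambda_B h(B)$$

$$\sum_{B \in \mathcal{B}} \lambda_B \cdot h(B) \le \sum_{(X, Y, N_{Y|X})} \delta_{Y|X} \cdot \underbrace{h(Y \mid X)}_{\le \log N_{Y|X}}$$

◮ Proof Sequence

Given $X \subseteq Y$:

  $h(X) + h(Y \mid X) \to h(Y)$  (join)
  $h(Y) \to h(X) + h(Y \mid X)$  (data partition)
  $h(Y) \to h(X)$  (projection)
  $h(Y \mid X) \to h(Y \cup Z \mid X \cup Z)$  (nothing)

◮ Algorithmic Sequence

Theorem

PANDA solves any disjunctive datalog rule P in time within the polymatroid bound of P.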

slide-100
SLIDE 100

16/22

PANDA: Example

P : T123(A1, A2, A3) ∨ T234(A2, A3, A4) :- R12(A1, A2), R23(A2, A3), R34(A3, A4), R41(A4, A1).

[Figure: query graph, the 4-cycle on A1, A2, A3, A4 with edges R12, R23, R34, R41]

|R12|, |R23|, |R34|, |R41| ≤ N ⇒ |P| ≤ N^{3/2}

log |P| ≤ min(h(A1A2A3), h(A2A3A4))
        ≤ (1/2) (h(A1A2A3) + h(A2A3A4))
        ≤ (1/2) (h(A1A2) + h(A2A3) + h(A3A4))
        ≤ (3/2) log N

h(A1A2) + h(A2A3) + h(A3A4)
    [h(A3A4) → h(A4|A3) + h(A3)]
→ h(A1A2) + h(A2A3) + h(A4|A3) + h(A3)
    [h(A4|A3) → h(A4|A2A3)]
→ h(A1A2) + h(A2A3) + h(A4|A2A3) + h(A3)
    [h(A2A3) + h(A4|A2A3) → h(A2A3A4)]
→ h(A1A2) + h(A2A3A4) + h(A3)
    [h(A1A2) → h(A1A2|A3)]
→ h(A1A2|A3) + h(A2A3A4) + h(A3)
    [h(A1A2|A3) + h(A3) → h(A1A2A3)]
→ h(A1A2A3) + h(A2A3A4)
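
The proof sequence on this slide can be sanity-checked numerically. The sketch below is only an illustration, not part of the talk: it assumes h is the Shannon entropy of the uniform distribution over a small, made-up set of tuples, and the helper names `entropy` and `proof_sequence_values` are hypothetical. It evaluates the sum after each of the five steps and checks that the value never increases, which is what each step (decomposition, submodularity, composition) promises.

```python
import math
from collections import Counter

VARS = ("A1", "A2", "A3", "A4")

def entropy(tuples, attrs):
    """Base-2 entropy of the marginal on `attrs`, uniform distribution over `tuples`."""
    idx = [VARS.index(a) for a in attrs]
    counts = Counter(tuple(t[i] for i in idx) for t in tuples)
    n = len(tuples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def proof_sequence_values(tuples):
    """Values of the six sums in the proof sequence, from first to last."""
    h = lambda *attrs: entropy(tuples, attrs)
    h3 = h("A3")
    h12, h23, h34 = h("A1", "A2"), h("A2", "A3"), h("A3", "A4")
    h123, h234 = h("A1", "A2", "A3"), h("A2", "A3", "A4")
    h4_given_3 = h34 - h3        # h(A4|A3)
    h4_given_23 = h234 - h23     # h(A4|A2A3)
    h12_given_3 = h123 - h3      # h(A1A2|A3)
    return [
        h12 + h23 + h34,
        h12 + h23 + h4_given_3 + h3,    # decomposition of h(A3A4)
        h12 + h23 + h4_given_23 + h3,   # submodularity
        h12 + h234 + h3,                # composition into h(A2A3A4)
        h12_given_3 + h234 + h3,        # submodularity
        h123 + h234,                    # composition into h(A1A2A3)
    ]

# Hypothetical sample data: nine distinct tuples over (A1, A2, A3, A4).
sample = [(a, b, (a + b) % 3, (a * b) % 2) for a in range(3) for b in range(3)]
values = proof_sequence_values(sample)
print(all(x >= y - 1e-9 for x, y in zip(values, values[1:])))
```

A passing check on one distribution proves nothing in general; it only illustrates that the sequence is a chain of valid Shannon-type steps.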
slide-106
SLIDE 106

17/22

PANDA: Example

P : T123(A1, A2, A3) ∨ T234(A2, A3, A4) :- R12(A1, A2), R23(A2, A3), R34(A3, A4), R41(A4, A1).

[Figure: query graph, the 4-cycle on A1, A2, A3, A4 with edges R12, R23, R34, R41]

|R12|, |R23|, |R34|, |R41| ≤ N ⇒ |P| ≤ N^{3/2}

Proof step                              Relational operation
h(A3A4) → h(A4|A3) + h(A3)              R34(A3, A4) → R34^(l)(A3, A4), R3^(h)(A3)
h(A4|A3) → h(A4|A2A3)                   R34^(l)(A3, A4) → R34^(l)(A3, A4)
h(A2A3) + h(A4|A2A3) → h(A2A3A4)        R23(A2, A3) ⋈ R34^(l)(A3, A4) → T234(A2, A3, A4)
h(A1A2) → h(A1A2|A3)                    R12(A1, A2) → R12(A1, A2)
h(A1A2|A3) + h(A3) → h(A1A2A3)          R12(A1, A2) ⋈ R3^(h)(A3) → T123(A1, A2, A3)
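
The five relational operations above can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function name `panda_four_cycle` is hypothetical, relations are plain Python sets of pairs, and the light/heavy split of R34 uses a degree threshold of √N, which the slide does not state explicitly.

```python
import math
from collections import defaultdict

def panda_four_cycle(R12, R23, R34, R41):
    """Compute (T123, T234) for T123 ∨ T234 :- R12, R23, R34, R41.

    R41 is accepted only to mirror the rule body; these steps never read it.
    """
    N = max(len(R12), len(R23), len(R34), len(R41), 1)
    threshold = math.sqrt(N)  # assumed split point between light and heavy

    # Decomposition step: split R34 by the degree of each A3 value.
    a4_by_a3 = defaultdict(list)
    for a3, a4 in R34:
        a4_by_a3[a3].append(a4)
    heavy_a3 = {a3 for a3, a4s in a4_by_a3.items() if len(a4s) > threshold}
    light_a4_by_a3 = {a3: a4s for a3, a4s in a4_by_a3.items() if a3 not in heavy_a3}

    # Composition step: T234 = R23 joined with the light part of R34.
    T234 = {(a2, a3, a4) for (a2, a3) in R23 for a4 in light_a4_by_a3.get(a3, ())}

    # Composition step: T123 = R12 paired with every heavy A3 value.
    T123 = {(a1, a2, a3) for (a1, a2) in R12 for a3 in heavy_a3}
    return T123, T234
```

Under the √N assumption there are fewer than √N heavy A3 values and every light A3 value has at most √N partners, so both outputs have at most N^{3/2} tuples. Any tuple satisfying the body either has a light (A3, A4) pair, and then lands in T234, or a heavy A3 value, and then lands in T123.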

slide-107
SLIDE 107

18/22

Table of Contents

Size Bounds for Full Conjunctive Queries
Size Bounds for Disjunctive Datalog
Algorithms for Disjunctive Datalog
Algorithms for Conjunctive Queries

slide-117
SLIDE 117

19/22

Beyond Worst-case Optimality

◮ Output-sensitive algorithms

Õ(N^d + |output|), where N^d is the Intrinsic Cost and |output| is the Output Cost

◮ Submodular width as a candidate for d

[Marx JACM’13] FPT ⇔ Bounded subw(Q)
Boolean Q ⇒ Õ(N^{subw(Q) × c})

◮ Our goals

Any Q ⇒ Õ(N^{da-subw(Q) × 1} + |output|)
slide-122
SLIDE 122

20/22

Submodular Width

fhtw(Q) := min_{(T,χ)} max_{t ∈ V(T)} ρ*(χ(t))
         = min_{(T,χ)} max_{t ∈ V(T)} max_{h ∈ ED ∩ Γ_n} h(χ(t))
         = min_{(T,χ)} max_{h ∈ ED ∩ Γ_n} max_{t ∈ V(T)} h(χ(t))

subw(Q) := max_{h ∈ ED ∩ Γ_n} min_{(T,χ)} max_{t ∈ V(T)} h(χ(t))

subw(Q) ≤ fhtw(Q)

slide-129
SLIDE 129

21/22

Submodular Width: Example

fhtw(Q) := min_{(T,χ)} max_{t ∈ V(T)} ρ*(χ(t)),
subw(Q) := max_{h ∈ ED ∩ Γ_n} min_{(T,χ)} max_{t ∈ V(T)} h(χ(t))

Q(A1, A2, A3, A4) :- R12(A1, A2), R23(A2, A3), R34(A3, A4), R41(A4, A1).

fhtw(Q) = 2
subw(Q) = 3/2

min(max(h(A1A2A3), h(A3A4A1)), max(h(A4A1A2), h(A2A3A4))) ≤ 3/2
    min(h(A1A2A3), h(A4A1A2)) ≤ 3/2
    min(h(A1A2A3), h(A2A3A4)) ≤ 3/2
    min(h(A3A4A1), h(A4A1A2)) ≤ 3/2
    min(h(A3A4A1), h(A2A3A4)) ≤ 3/2

[Figure: the 4-cycle query graph on A1, A2, A3, A4 with edges R12, R23, R34, R41, and two tree decompositions with bags {A1, A2, A3}, {A1, A4, A3} and {A2, A3, A4}, {A2, A1, A4}]
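
The gap between the two widths comes from the order of min and max. The sketch below illustrates this on the 4-cycle with three hand-picked modular functions, each given by per-variable weights; modular functions with non-negative weights are polymatroids, and these three are edge dominated. This is an assumption-laden toy and not a computation of the true widths, which range over all tree decompositions and all of ED ∩ Γ_n. The names `WEIGHTS`, `h`, and `width` are hypothetical.

```python
from fractions import Fraction

EDGES = [("A1", "A2"), ("A2", "A3"), ("A3", "A4"), ("A4", "A1")]

# The two tree decompositions from the slide, each as a list of bags.
TREE_DECOMPOSITIONS = [
    [("A1", "A2", "A3"), ("A1", "A4", "A3")],
    [("A2", "A3", "A4"), ("A2", "A1", "A4")],
]

HALF = Fraction(1, 2)
# Hand-picked modular functions: h(S) is the sum of the weights of S.
WEIGHTS = [
    {"A1": HALF, "A2": HALF, "A3": HALF, "A4": HALF},
    {"A1": 1, "A2": 0, "A3": 1, "A4": 0},
    {"A1": 0, "A2": 1, "A3": 0, "A4": 1},
]

def h(weights, attrs):
    return sum(weights[a] for a in attrs)

def width(weights, decomposition):
    """Largest bag value of this function in this tree decomposition."""
    return max(h(weights, bag) for bag in decomposition)

# Edge domination: every edge has value at most 1 (in units of log N).
assert all(h(w, e) <= 1 for w in WEIGHTS for e in EDGES)

# fhtw-style order: commit to one decomposition, then face the worst function.
minimax = min(max(width(w, td) for w in WEIGHTS) for td in TREE_DECOMPOSITIONS)
# subw-style order: see the function first, then pick the best decomposition.
maximin = max(min(width(w, td) for td in TREE_DECOMPOSITIONS) for w in WEIGHTS)

print(minimax, maximin)  # 2 3/2
```

On this restricted family the two orders give 2 and 3/2, matching the values of fhtw(Q) and subw(Q) stated on the slide: each skewed function is bad for one decomposition but harmless for the other, so choosing the decomposition after seeing h helps.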

slide-130
SLIDE 130

21/22

Submodular Width: Example

fhtw(Q) := min_{(T,χ)} max_{t ∈ V(T)} ρ*(χ(t)),
subw(Q) := max_{h ∈ ED ∩ Γ_n} min_{(T,χ)} max_{t ∈ V(T)} h(χ(t))

Q(A1, A2, A3, A4) :- R12(A1, A2), R23(A2, A3), R34(A3, A4), R41(A4, A1).

T123(A1, A2, A3) ∨ T412(A4, A1, A2) :- R12(A1, A2), R23(A2, A3), R34(A3, A4), R41(A4, A1).
T123(A1, A2, A3) ∨ T234(A2, A3, A4) :- R12(A1, A2), R23(A2, A3), R34(A3, A4), R41(A4, A1).
T341(A3, A4, A1) ∨ T412(A4, A1, A2) :- R12(A1, A2), R23(A2, A3), R34(A3, A4), R41(A4, A1).
T341(A3, A4, A1) ∨ T234(A2, A3, A4) :- R12(A1, A2), R23(A2, A3), R34(A3, A4), R41(A4, A1).

[Figure: the 4-cycle query graph on A1, A2, A3, A4 with edges R12, R23, R34, R41, and two tree decompositions with bags {A1, A2, A3}, {A1, A4, A3} and {A2, A3, A4}, {A2, A1, A4}]
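
The four rules above have one head atom per tree decomposition, which mirrors the identity min(max(a, b), max(c, d)) = max(min(a, c), min(a, d), min(b, c), min(b, d)). A minimal sketch of that enumeration follows; the helper names `target` and `disjunctive_rules` are hypothetical, and the bag lists are transcribed from the slide.

```python
from itertools import product

BODY = "R12(A1, A2), R23(A2, A3), R34(A3, A4), R41(A4, A1)"

# One list of bags per tree decomposition, in the order used by the slide's rules.
TREE_DECOMPOSITIONS = [
    [("A1", "A2", "A3"), ("A3", "A4", "A1")],
    [("A4", "A1", "A2"), ("A2", "A3", "A4")],
]

def target(bag):
    """Head atom for a bag, e.g. ('A1', 'A2', 'A3') -> 'T123(A1, A2, A3)'."""
    name = "T" + "".join(attr[1:] for attr in bag)
    return f"{name}({', '.join(bag)})"

def disjunctive_rules(decompositions, body):
    """One rule per way of choosing a single bag from every decomposition."""
    return [
        " ∨ ".join(target(bag) for bag in choice) + " :- " + body + "."
        for choice in product(*decompositions)
    ]

rules = disjunctive_rules(TREE_DECOMPOSITIONS, BODY)
for rule in rules:
    print(rule)
```

With two decompositions of two bags each, `product` yields 2 × 2 = 4 choices, in the same order as the rules listed on the slide.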

slide-131
SLIDE 131

22/22

Summary of Bounds

X Y Z

Γ*_n, Γ_n, SA_n, HDC, HCC, ED · log N, VD · log N

LogSizeBound_{X∩Y}(Q): log_2 VB(Q), log_2 VB(Q), log_2 VB(Q), ρ(Q) · log_2 N, ρ*(Q) · log_2 N, ρ*(Q) · log_2 N, ρ(Q, (N_F)_{F∈E}), log_2 AGM(Q), log_2 AGM(Q), DAPB(Q), DAEB(Q)

Minimaxwidth_{X∩Y}(Q): tw(Q) + 1, tw(Q) + 1, tw(Q) + 1, ghtw(Q), fhtw(Q), fhtw(Q), da-fhtw(Q), eda-fhtw(Q)

Maximinwidth_{X∩Y}(Q): tw(Q) + 1, tw(Q) + 1, tw(Q) + 1, ghtw(Q), subw(Q), da-subw(Q), eda-subw(Q)