Solving linear programs on factorized databases Nicolas Crosetti - - PowerPoint PPT Presentation

solving linear programs on factorized databases
SMART_READER_LITE
LIVE PREVIEW

Solving linear programs on factorized databases Nicolas Crosetti - - PowerPoint PPT Presentation

Solving linear programs on factorized databases Nicolas Crosetti joint work with F. Capelli, J. Niehren and J. Ramon December 17, 2019 Inria Lille (Links & Magnet) Motivating example Example: optimizing time spent by pairs of researchers


slide-1
SLIDE 1

Solving linear programs on factorized databases

Nicolas Crosetti joint work with F. Capelli, J. Niehren and J. Ramon December 17, 2019

Inria Lille (Links & Magnet)

slide-2
SLIDE 2

Motivating example

Example: optimizing time spent by pairs of researchers and developers on projects without overloading them.

  • Table Project(pname, field, language)
  • Table Researcher(rname, field)
  • Table Developer(dname, language)

All possible assignments are given by the query Q(p, r, d) = ∃f, ∃l, Project(p, f, l) ∧ Researcher(r, f) ∧ Developer(d, l)

1

slide-3
SLIDE 3

Motivating example: Linear program

Q(p, r, d) = ∃f, ∃l, Project(p, f, l) ∧ Researcher(r, f) ∧ Developer(d, l) maximize

  • (p,r,d)∈Q

ω(p, r, d) subject to ∀r ∈ Drname,

  • p,d

ω(p, r, d) ≤ 100. ∀d ∈ Ddname,

  • p,r

ω(p, r, d) ≤ 100.

2

slide-4
SLIDE 4

Motivating example: Complexity

Assume that each table is of size bounded by N, then the size of Q(D) is N3 in the worst case. Thus the linear program can have up to N3 variables in the worst case. Result Through a change of variables, and using knowledge compilation techniques, we are able to defjne an equivalent linear program with O(N) variables.

3

slide-5
SLIDE 5

Formal statement of the problem

Given a database D and a query Q, we want to solve a linear program LP whose variables are the answers of Q(D).

  • Can we avoid the blow-up of answering Q(D) explicitly?
  • Which class of linear programs are we able to handle?

4

slide-6
SLIDE 6

Formal statement of the problem

Given a database D and a query Q, we want to solve a linear program LP whose variables are the answers of Q(D).

  • Can we avoid the blow-up of answering Q(D) explicitly?
  • Which class of linear programs are we able to handle?

4

slide-7
SLIDE 7

CAS-LPs: defjnition

Which class of linear programs are we able to handle? Defjnition A CAS-LP on attributes X and domain D is a linear program on variables S(X, D) := {Sx,d}x∈X,d∈D, called the CAS-variables such that Sx,d =

  • τ∈R

τ(x)=d

ω(τ).

5

slide-8
SLIDE 8

CAS-LPs: example

Q(p, r, d) = ∃f, ∃l, Project(p, f, l) ∧ Researcher(r, f) ∧ Developer(d, l) maximize

  • (p,r,d)∈Q

ω(p, r, d) subject to ∀r ∈ Drname,

  • p,d

ω(p, r, d) ≤ 100. ∀d ∈ Ddname,

  • p,r

ω(p, r, d) ≤ 100.

6

slide-9
SLIDE 9

CAS-LPs: example

Q(p, r, d) = ∃f, ∃l, Project(p, f, l) ∧ Researcher(r, f) ∧ Developer(d, l) maximize

  • (p,r,d)∈Q

ω(p, r, d) subject to ∀v ∈ Drname, Sr,v ≤ 100 ∀d ∈ Ddname,

  • p,r

ω(p, r, d) ≤ 100.

6

slide-10
SLIDE 10

CAS-LPs: example

Q(p, r, d) = ∃f, ∃l, Project(p, f, l) ∧ Researcher(r, f) ∧ Developer(d, l) maximize

  • (p,r,d)∈Q

ω(p, r, d) subject to ∀v ∈ Drname, Sr,v ≤ 100 ∀v ∈ Ddname, Sd,v ≤ 100

6

slide-11
SLIDE 11

CAS-LPs: example

Q(p, r, d) = ∃f, ∃l, Project(p, f, l) ∧ Researcher(r, f) ∧ Developer(d, l) maximize

  • v∈pname

Sp,v subject to ∀v ∈ Drname, Sr,v ≤ 100 ∀v ∈ Ddname, Sd,v ≤ 100

6

slide-12
SLIDE 12

CAS-LPs: example

Q(p, r, d) = ∃f, ∃l, Project(p, f, l) ∧ Researcher(r, f) ∧ Developer(d, l) maximize

  • v∈pname

Sp,v subject to ∀v ∈ Drname, Sr,v ≤ 100 ∀v ∈ Ddname, Sd,v ≤ 100 However, CAS-LPs cannot be solved directly

6

slide-13
SLIDE 13

Solving CAS-LPs: counter-example

x y τ0 1 τ1 maximize Sx,0 subject to 0 ≤ Sy,0 ≤ 1 0 ≤ Sy,1 ≤ 1 maximize ω(τ0) + ω(τ1) subject to 0 ≤ ω(τ0) ≤ 1 0 ≤ ω(τ1) ≤ 1

7

slide-14
SLIDE 14
  • Can we avoid the blow-up of answering Q(D) explicitly?
  • Which class of linear programs are we able to handle?
slide-15
SLIDE 15

{⊎, ×}-circuits

Boolean circuits with:

  • inputs labelled with f/d for a variable f and d ∈ D

x y z 1 1 1 1 1 1 1 1 1 1 ∪ × × × y/0 x/1 ∪ y/1 x/0 z/1 z/0

8

slide-16
SLIDE 16

{⊎, ×}-circuits

Boolean circuits with:

  • inputs labelled with f = d for a variable f and d ∈ D
  • disjoint union ⊎
  • Cartesian product

⊎ × × × y/0 x/1 ⊎ y/1 x/0 z/1 z/0

9

slide-17
SLIDE 17

{⊎, ×}-circuits

Boolean circuits with:

  • inputs labelled with f = d for a variable f and d ∈ D
  • disjoint union ⊎
  • Cartesian product ×

⊎ × × × y/0 x/1 ⊎ y/1 x/0 z/1 z/0

9

slide-18
SLIDE 18

Our strategy

Query Database CAS-LP

10

slide-19
SLIDE 19

Our strategy

Query Database Circuit CAS-LP

compile

10

slide-20
SLIDE 20

Our strategy

Query Database Circuit CAS-LP LP

compile rewrite CAS-LP

10

slide-21
SLIDE 21

Our strategy

Query Database Circuit CAS-LP LP Result

compile rewrite CAS-LP solve

The number of variables in the rewritten LP is linear in the size

  • f the circuit.

10

slide-22
SLIDE 22

Our strategy

Query Database Circuit CAS-LP LP Result

compile rewrite CAS-LP solve

The number of variables in the rewritten LP is linear in the size

  • f the circuit.

Thus the bottleneck is the compilation phase.

10

slide-23
SLIDE 23

Compilation phase

Query Database Circuit CAS-LP LP Result

compile rewrite CAS-LP solve

A few classes of database queries can be compiled effjciently into succinct circuits. For example an acyclic conjunctive query can be compiled in linear time into a linear size circuit.

11

slide-24
SLIDE 24

Compilation phase

Query Database Circuit CAS-LP LP Result

compile rewrite CAS-LP solve

A few classes of database queries can be compiled effjciently into succinct circuits. For example an acyclic conjunctive query can be compiled in linear time into a linear size circuit.

11

slide-25
SLIDE 25

Constructing a smaller LP

We build an equivalent linear program with the edges of the circuit as variables. For every Sx,d in the LP, add a constraint Sx,d =

e∈Out(x/d) We.

For every

  • gate of C, Wi1

Win Wo1 Won. For every

  • gate of C, Wi1

Wi2 Wo1 Won.

p1 ... pn x/d

  • 1
  • n

12

slide-26
SLIDE 26

Constructing a smaller LP

For every Sx,d in the LP, add a constraint Sx,d =

e∈Out(x/d) We.

For every ⊎-gate of C, Wi1 + ... + Win = Wo1 + ... + Won. For every

  • gate of C, Wi1

Wi2 Wo1 Won.

p1 ... pn ⊎ c1 ... cm

  • 1
  • n

i1 im

12

slide-27
SLIDE 27

Constructing a smaller LP

For every Sx,d in the LP, add a constraint Sx,d =

e∈Out(x/d) We.

For every ⊎-gate of C, Wi1 + ... + Win = Wo1 + ... + Won. For every ×-gate of C, Wi1 = Wi2 = Wo1 + ... + Won.

p1 ... pn × c1 c2

  • 1
  • n

i1 i2

12

slide-28
SLIDE 28

Extensions of this result

Limitations:

  • CAS-LPs are restrictive
  • Input data is limited to the answer of a single query at a

time Further extensions:

  • Designing a language to express linear programs on

databases

  • Implementing our approach as a PostgreSQL plugin

13

slide-29
SLIDE 29

Thank you for your attention.

14