 
              Solving linear programs on factorized databases Nicolas Crosetti joint work with F. Capelli, J. Niehren and J. Ramon December 17, 2019 Inria Lille (Links & Magnet)
Motivating example Example: optimizing time spent by pairs of researchers and developers on projects without overloading them. • Table Project(pname, field, language) • Table Researcher(rname, field) • Table Developer(dname, language) All possible assignments are given by the query 1 Q ( p , r , d ) = ∃ f , ∃ l , Project ( p , f , l ) ∧ Researcher ( r , f ) ∧ Developer ( d , l )
Motivating example: Linear program maximize 2 Q ( p , r , d ) = ∃ f , ∃ l , Project ( p , f , l ) ∧ Researcher ( r , f ) ∧ Developer ( d , l ) � ω ( p , r , d ) ( p , r , d ) ∈ Q � subject to ∀ r ∈ D rname , ω ( p , r , d ) ≤ 100 . p , d � ∀ d ∈ D dname , ω ( p , r , d ) ≤ 100 . p , r
Motivating example: Complexity Assume that each table is of size bounded by N , worst case. Result Through a change of variables, and using knowledge compilation techniques, we are able to defjne an equivalent 3 then the size of Q ( D ) is N 3 in the worst case. Thus the linear program can have up to N 3 variables in the linear program with O ( N ) variables.
Formal statement of the problem Given a database D and a query Q , we want to solve a linear • Which class of linear programs are we able to handle? 4 program LP whose variables are the answers of Q ( D ) . • Can we avoid the blow-up of answering Q ( D ) explicitly?
Formal statement of the problem Given a database D and a query Q , we want to solve a linear • Which class of linear programs are we able to handle? 4 program LP whose variables are the answers of Q ( D ) . • Can we avoid the blow-up of answering Q ( D ) explicitly?
CAS-LPs: defjnition Which class of linear programs are we able to handle? Defjnition A CAS-LP on attributes X and domain D is a linear program on 5 variables S ( X , D ) := { S x , d } x ∈ X , d ∈ D , called the CAS-variables such that S x , d = � ω ( τ ) . τ ∈ R τ ( x )= d
CAS-LPs: example maximize 6 Q ( p , r , d ) = ∃ f , ∃ l , Project ( p , f , l ) ∧ Researcher ( r , f ) ∧ Developer ( d , l ) � ω ( p , r , d ) ( p , r , d ) ∈ Q � subject to ∀ r ∈ D rname , ω ( p , r , d ) ≤ 100 . p , d � ∀ d ∈ D dname , ω ( p , r , d ) ≤ 100 . p , r
CAS-LPs: example maximize 6 Q ( p , r , d ) = ∃ f , ∃ l , Project ( p , f , l ) ∧ Researcher ( r , f ) ∧ Developer ( d , l ) � ω ( p , r , d ) ( p , r , d ) ∈ Q subject to ∀ v ∈ D rname , S r , v ≤ 100 � ∀ d ∈ D dname , ω ( p , r , d ) ≤ 100 . p , r
CAS-LPs: example maximize 6 Q ( p , r , d ) = ∃ f , ∃ l , Project ( p , f , l ) ∧ Researcher ( r , f ) ∧ Developer ( d , l ) � ω ( p , r , d ) ( p , r , d ) ∈ Q subject to ∀ v ∈ D rname , S r , v ≤ 100 ∀ v ∈ D dname , S d , v ≤ 100
CAS-LPs: example maximize 6 Q ( p , r , d ) = ∃ f , ∃ l , Project ( p , f , l ) ∧ Researcher ( r , f ) ∧ Developer ( d , l ) � S p , v v ∈ pname subject to ∀ v ∈ D rname , S r , v ≤ 100 ∀ v ∈ D dname , S d , v ≤ 100
CAS-LPs: example maximize However, CAS-LPs cannot be solved directly 6 Q ( p , r , d ) = ∃ f , ∃ l , Project ( p , f , l ) ∧ Researcher ( r , f ) ∧ Developer ( d , l ) � S p , v v ∈ pname subject to ∀ v ∈ D rname , S r , v ≤ 100 ∀ v ∈ D dname , S d , v ≤ 100
Solving CAS-LPs: counter-example x y 0 0 0 1 7 τ 0 τ 1 maximize ω ( τ 0 ) + ω ( τ 1 ) maximize S x , 0 subject to 0 ≤ S y , 0 ≤ 1 subject to 0 ≤ ω ( τ 0 ) ≤ 1 0 ≤ S y , 1 ≤ 1 0 ≤ ω ( τ 1 ) ≤ 1
• Which class of linear programs are we able to handle? • Can we avoid the blow-up of answering Q ( D ) explicitly?
8 1 1 1 1 0 Boolean circuits with: 1 1 0 1 1 1 0 0 1 0 z y x { ⊎ , × }-circuits • inputs labelled with f / d for a variable f and d ∈ D ∪ × × × y/ 0 x/ 1 y/ 1 x/ 0 ∪ z/ 1 z/ 0
9 • Cartesian product Boolean circuits with: { ⊎ , × }-circuits • inputs labelled with f = d for a variable f and d ∈ D • disjoint union ⊎ ⊎ × × × y/ 0 x/ 1 y/ 1 x/ 0 ⊎ z/ 1 z/ 0
9 Boolean circuits with: { ⊎ , × }-circuits • inputs labelled with f = d for a variable f and d ∈ D • disjoint union ⊎ • Cartesian product × ⊎ × × × y/ 0 x/ 1 y/ 1 x/ 0 ⊎ z/ 1 z/ 0
Our strategy Query Database CAS-LP 10
Our strategy Query Database Circuit CAS-LP compile 10
Our strategy Query Database Circuit CAS-LP LP compile rewrite CAS-LP 10
Our strategy Query Database Circuit CAS-LP LP Result compile rewrite CAS-LP solve The number of variables in the rewritten LP is linear in the size of the circuit. 10
Our strategy Query Database Circuit CAS-LP LP Result compile rewrite CAS-LP solve The number of variables in the rewritten LP is linear in the size of the circuit. Thus the bottleneck is the compilation phase. 10
Compilation phase Query Database Circuit CAS-LP LP Result compile rewrite CAS-LP solve A few classes of database queries can be compiled effjciently into succinct circuits. For example an acyclic conjunctive query can be compiled in linear time into a linear size circuit. 11
Compilation phase Query Database Circuit CAS-LP LP Result compile rewrite CAS-LP solve A few classes of database queries can be compiled effjciently into succinct circuits. For example an acyclic conjunctive query can be compiled in linear time into a linear size circuit. 11
Constructing a smaller LP Wo n . ... Wo n . Wo 1 Wi 2 We build an equivalent linear program with the edges of the For every -gate of C , Wi 1 Wo 1 Wi n -gate of C , Wi 1 For every circuit as variables. 12 For every S x , d in the LP, add a constraint S x , d = � e ∈ Out ( x / d ) W e . p 1 p n o 1 o n x/d
Constructing a smaller LP Wi 2 ... Wo n . Wo 1 ... -gate of C , Wi 1 For every 12 For every S x , d in the LP, add a constraint S x , d = � e ∈ Out ( x / d ) W e . For every ⊎ -gate of C , Wi 1 + ... + Wi n = Wo 1 + ... + Wo n . p 1 p n o 1 o n ⊎ i m i 1 c 1 c m
... Constructing a smaller LP 12 For every S x , d in the LP, add a constraint S x , d = � e ∈ Out ( x / d ) W e . For every ⊎ -gate of C , Wi 1 + ... + Wi n = Wo 1 + ... + Wo n . For every × -gate of C , Wi 1 = Wi 2 = Wo 1 + ... + Wo n . p 1 p n o 1 o n × i 1 i 2 c 1 c 2
Extensions of this result Limitations: • CAS-LPs are restrictive • Input data is limited to the answer of a single query at a time Further extensions: • Designing a language to express linear programs on databases • Implementing our approach as a PostgreSQL plugin 13
Thank you for your attention. 14
Recommend
More recommend