In-Database Factorized Learning

Dan Olteanu Joint work with

  • M. Schleich, J. Zavodny & FDB Team
  • M. Abo-Khamis, H. Ngo, X. Nguyen

http://www.cs.ox.ac.uk/projects/FDB/

Recent Trends in Knowledge Compilation, Dagstuhl, Sept 2017


We Work on In-Database Analytics

In-database analytics = solve optimization problems inside the database engine. Why in-database analytics?

  • 1. Bring analytics close to data

⇒ Save non-trivial export/import time

  • 2. Large chunks of analytics code can be rewritten into database queries

⇒ Use scalable systems and low complexity for query processing

  • 3. Used by LogicBlox retail-planning and forecasting applications

Unified in-database analytics solution for a host of optimization problems.

Problem Formulation

A typical machine learning task is to solve θ∗ := arg min_θ J(θ), where

J(θ) := Σ_{(x,y)∈D} L(⟨g(θ), h(x)⟩, y) + Ω(θ).

  • θ = (θ1, . . . , θp) ∈ ℝ^p are the parameters of the learned model
  • D is the training dataset, with features x and response y
    ◮ Typically, D is the result of a feature extraction query over a database.
  • L is a loss function and Ω is the regularizer
  • g : ℝ^p → ℝ^m and h : ℝ^n → ℝ^m are functions over the parameters and over the n numeric features, respectively (m > 0)
    ◮ g = (gj)_{j∈[m]} is a vector of multivariate polynomials
    ◮ h = (hj)_{j∈[m]} is a vector of multivariate monomials

Example problems: ridge linear regression, degree-d polynomial regression, degree-d factorization machines; logistic regression, SVM; PCA.

Ridge Linear Regression

General problem formulation:

J(θ) := Σ_{(x,y)∈D} L(⟨g(θ), h(x)⟩, y) + Ω(θ).

Under square loss L and ℓ2-regularization, with data points x = (x0, x1, . . . , xn) and p = n + 1 parameters θ = (θ0, . . . , θn),

  ◮ x0 = 1 corresponds to the bias parameter θ0,

and with g and h the identity functions g(θ) = θ and h(x) = x,

  ◮ ⟨g(θ), h(x)⟩ = ⟨θ, x⟩ = Σ_{k=0}^{n} θk · xk,

we obtain the following formulation for ridge linear regression:

J(θ) := 1/(2|D|) · Σ_{(x,y)∈D} ( Σ_{k=0}^{n} θk · xk − y )² + (λ/2) · ‖θ‖₂².
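To make the objective concrete, here is a minimal numpy sketch that evaluates J(θ) directly over a materialized training dataset. The arrays X (with a bias column x0 = 1 prepended) and y and the value of λ are illustrative stand-ins, not part of the talk:

```python
import numpy as np

def ridge_objective(theta, X, y, lam):
    """J(theta) = 1/(2|D|) * sum((<theta, x> - y)^2) + lam/2 * ||theta||_2^2.

    X: |D| x (n+1) matrix whose first column is the bias feature x0 = 1.
    y: length-|D| response vector; lam: regularization strength.
    """
    residuals = X @ theta - y          # <theta, x> - y for every point in D
    return residuals @ residuals / (2 * len(X)) + lam / 2 * (theta @ theta)

# Illustrative toy data standing in for a materialized join result.
X = np.array([[1.0, 6.0], [1.0, 2.0], [1.0, 2.0], [1.0, 4.0]])
y = np.array([3.0, 1.0, 1.0, 2.0])
print(ridge_objective(np.zeros(2), X, y, lam=0.1))   # J at theta = 0: 1.875
```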

Rewriting the Objective Function J

We decouple the parameters θ from the data-dependent features x in J. We can rewrite the loss function

J(θ) := 1/(2|D|) · Σ_{(x,y)∈D} ( Σ_{k=0}^{n} θk · xk − y )² + (λ/2) · ‖θ‖₂²

as follows:

J(θ) = ½ · θ⊤Σθ − ⟨θ, c⟩ + sY/2 + (λ/2) · ‖θ‖₂²,

where

  Σ = (σi,j)_{i,j∈[n]}, with σi,j = 1/|D| · Σ_{(x,y)∈D} xi · xj,
  c = (ci)_{i∈[n]}, with ci = 1/|D| · Σ_{(x,y)∈D} y · xi,
  sY = 1/|D| · Σ_{(x,y)∈D} y².
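A sketch of the same objective after the rewriting, reusing the illustrative X, y, and λ from above: Σ, c, and sY are computed in one pass over the data, after which evaluating J(θ) never touches D again.

```python
import numpy as np

def precompute_aggregates(X, y):
    """One pass over the data: Sigma = E[x x^T], c = E[y x], sY = E[y^2]."""
    m = len(X)
    return X.T @ X / m, X.T @ y / m, y @ y / m

def ridge_objective_decoupled(theta, Sigma, c, sY, lam):
    """J(theta) = 1/2 theta^T Sigma theta - <theta, c> + sY/2 + lam/2 ||theta||^2."""
    return 0.5 * theta @ Sigma @ theta - theta @ c + sY / 2 + lam / 2 * (theta @ theta)

# Agrees with the direct evaluation of J for any theta.
X = np.array([[1.0, 6.0], [1.0, 2.0], [1.0, 2.0], [1.0, 4.0]])
y = np.array([3.0, 1.0, 1.0, 2.0])
Sigma, c, sY = precompute_aggregates(X, y)
print(ridge_objective_decoupled(np.zeros(2), Sigma, c, sY, lam=0.1))  # 1.875 again
```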

Batch Gradient Descent for Parameter Computation

Repeatedly update θ in the direction of the gradient until convergence:

θ := θ − α · ∇J(θ).

Since

J(θ) = ½ · θ⊤Σθ − ⟨θ, c⟩ + sY/2 + (λ/2) · ‖θ‖₂²,

the gradient vector ∇J(θ) becomes:

∇J(θ) = Σθ − c + λθ.
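A sketch of the resulting gradient descent loop, assuming the Sigma and c computed in the previous sketch; the step size α and stopping threshold are illustrative choices:

```python
import numpy as np

def batch_gradient_descent(Sigma, c, lam, alpha=0.01, tol=1e-8, max_iters=100_000):
    """Minimize J via theta := theta - alpha * grad(theta).

    grad(theta) = Sigma @ theta - c + lam * theta; the data D is never
    touched inside the loop, so one step costs O(n^2) regardless of |D|.
    """
    theta = np.zeros(len(c))
    for _ in range(max_iters):
        grad = Sigma @ theta - c + lam * theta
        if np.linalg.norm(grad) < tol:   # stop once the gradient (nearly) vanishes
            break
        theta -= alpha * grad
    return theta

# With Sigma and c from the previous sketch:
# theta_star = batch_gradient_descent(Sigma, c, lam=0.1)
```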

Key Insights

  • The computation of the training dataset entails a high degree of redundancy, which can be avoided by factorized joins.
  • Factorized joins are compressed, lossless representations of the query result; they are deterministic Decomposable Ordered Multi-Valued Diagrams (d-DOMDs).
  • Aggregates can be computed directly over factorized joins.

Factorization Example

Orders (O for short):

customer  day     dish
Elise     Monday  burger
Elise     Friday  burger
Steve     Friday  hotdog
Joe       Friday  hotdog

Dish (D for short):

dish    item
burger  patty
burger  onion
burger  bun
hotdog  bun
hotdog  onion
hotdog  sausage

Items (I for short):

item     price
patty    6
onion    2
bun      2
sausage  4

Consider the join of the above relations, O(customer, day, dish), D(dish, item), I(item, price):

customer  day     dish    item   price
Elise     Monday  burger  patty  6
Elise     Monday  burger  onion  2
Elise     Monday  burger  bun    2
Elise     Friday  burger  patty  6
Elise     Friday  burger  onion  2
Elise     Friday  burger  bun    2
. . .

A relational algebra expression encoding the above query result is:

  Elise × Monday × burger × patty × 6
∪ Elise × Monday × burger × onion × 2
∪ Elise × Monday × burger × bun × 2
∪ Elise × Friday × burger × patty × 6
∪ Elise × Friday × burger × onion × 2
∪ Elise × Friday × burger × bun × 2
∪ . . .

It uses relational product (×), union (∪), and data (singleton relations). The attribute names are not shown to avoid clutter.
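To quantify the redundancy that factorization removes, here is a small Python sketch over the example relations; the singleton counting is our illustration of the idea, not a measurement from the talk:

```python
# The example relations, hard-coded from the tables above.
O = [("Elise", "Monday", "burger"), ("Elise", "Friday", "burger"),
     ("Steve", "Friday", "hotdog"), ("Joe", "Friday", "hotdog")]
D = [("burger", "patty"), ("burger", "onion"), ("burger", "bun"),
     ("hotdog", "bun"), ("hotdog", "onion"), ("hotdog", "sausage")]
I = [("patty", 6), ("onion", 2), ("bun", 2), ("sausage", 4)]

# Flat join: every output tuple stores all five of its values.
flat = [(c, d, s, i, p)
        for (c, d, s) in O
        for (s2, i) in D if s2 == s
        for (i2, p) in I if i2 == i]
print(len(flat), "tuples =", 5 * len(flat), "singletons in the flat join")  # 12, 60

# Factorized over dish -> {day -> customer, item -> price}: count the
# singletons actually stored (one per value occurrence in the tree).
stored = 0
for dish in sorted({s for (_, _, s) in O}):
    stored += 1                                           # the dish value
    for day in sorted({d for (_, d, s) in O if s == dish}):
        stored += 1                                       # the day value
        stored += sum(1 for (c, d, s) in O if s == dish and d == day)  # customers
    stored += sum(2 for (s, i) in D if s == dish)         # each item + its price
print(stored, "singletons in the factorized join")        # 21
```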

This is What a Factorized Join Looks Like

[Figure: the factorized join as a tree over the variable order dish → {day → customer, item → price}: a root ∪ over the dish values burger and hotdog; under each dish, a × combining a ∪ of its days (each with a ∪ of its customers) and a ∪ of its items (each with a ∪ of its price).]

The figure shows the variable order and its grounding over the input database. There are several algebraically equivalent factorized joins, defined by the distributivity of product over union and their commutativity, as groundings of different join trees.

The factorized join is a deterministic Decomposable Ordered Multi-Valued Diagram (d-DOMD):

  • deterministic: each union has children representing distinct domain values of a variable
  • Decomposable: each product has children over disjoint sets of variables
  • Ordered: each path has values of variables following a global variable order
  • Multi-Valued: each variable has a finite (but not necessarily Boolean) domain
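As a concrete data structure, here is a minimal Python sketch of d-DOMD nodes; the class names and the encoding of union children as (value, subtree) pairs are our illustration, since the talk does not prescribe an implementation:

```python
from dataclasses import dataclass
from typing import Any, List, Tuple

@dataclass
class UnionNode:
    """A ∪ node: its children carry the distinct values of one variable
    (determinism), each paired with the grounding below that value."""
    var: str
    children: List[Tuple[Any, Any]]

@dataclass
class ProductNode:
    """A × node: its children ground disjoint sets of variables (decomposability)."""
    children: List[Any]

@dataclass
class LeafNode:
    """End of a root-to-leaf path: every variable on the path is grounded."""

# The hotdog branch of the factorized join above, spelled out:
hotdog = ProductNode([
    UnionNode("day", [
        ("Friday", ProductNode([
            UnionNode("customer", [("Joe", LeafNode()), ("Steve", LeafNode())]),
        ])),
    ]),
    UnionNode("item", [
        ("bun",     ProductNode([UnionNode("price", [(2, LeafNode())])])),
        ("onion",   ProductNode([UnionNode("price", [(2, LeafNode())])])),
        ("sausage", ProductNode([UnionNode("price", [(4, LeafNode())])])),
    ]),
])
```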

.. Now with Further Compression

[Figure: the same factorized join, with the ∪-of-price subtrees for bun and onion computed once and shared between burger and hotdog.]

Observation: price is under item, which is under dish, but price only depends on item, so the same price appears under an item regardless of the dish.

Idea: cache the price for a specific item and avoid the repetition!

Same Data, Different Factorization

[Figure: a factorization of the same join over the variable order day → customer → dish → item → price: a root ∪ over Monday and Friday, with the dish–item–price subtrees repeated under each customer.]

.. and Further Compressed

[Figure: the same factorization with the item–price subtrees cached per dish, so the burger and hotdog subtrees are stored once and shared across customers and days.]

Grounding Variable Orders to Factorized Joins

Our join O(customer, day, dish), D(dish, item), I(item, price) can be grounded to a factorized join as follows:

⋃_{dish: O(·,·,dish), D(dish,·)} dish
    × ( ⋃_{day: O(·,day,dish)} day × ⋃_{customer: O(customer,day,dish)} customer )
    × ( ⋃_{item: D(dish,item)} item × ⋃_{price: I(item,price)} price )

This grounding follows the variable order below:

        dish
       /    \
     day    item
      |       |
  customer  price

Relations are sorted following any topological order of the variable order. The intersection of relations O and D on dish takes time O(min(|πdish O|, |πdish D|)). The remaining operations are lookups in the relations, where we first fix the dish value and then the day and item values.
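A runnable sketch of the grounding procedure under simplifying assumptions: relations live in memory as lists of dicts and value lookups are recomputed by scanning, whereas the actual algorithm sorts the relations and intersects them within the stated time bounds. The variable order is passed as a nested dict:

```python
O = [{"customer": c, "day": d, "dish": s} for (c, d, s) in [
    ("Elise", "Monday", "burger"), ("Elise", "Friday", "burger"),
    ("Steve", "Friday", "hotdog"), ("Joe", "Friday", "hotdog")]]
D = [{"dish": s, "item": i} for (s, i) in [
    ("burger", "patty"), ("burger", "onion"), ("burger", "bun"),
    ("hotdog", "bun"), ("hotdog", "onion"), ("hotdog", "sausage")]]
I = [{"item": i, "price": p} for (i, p) in [
    ("patty", 6), ("onion", 2), ("bun", 2), ("sausage", 4)]]

RELATIONS = [O, D, I]
ORDER = {"dish": {"day": {"customer": {}}, "item": {"price": {}}}}

def matches(t, context):
    """Does tuple t agree with the values fixed for the ancestor variables?"""
    return all(t[v] == val for v, val in context.items() if v in t)

def ground(var, subtree, context):
    """Union over the values of var consistent with context, each value
    followed by the groundings of var's children in the variable order."""
    rels = [r for r in RELATIONS if var in r[0]]    # relations mentioning var
    values = set.intersection(
        *({t[var] for t in r if matches(t, context)} for r in rels))
    return [(v, [ground(child, child_subtree, {**context, var: v})
                 for child, child_subtree in subtree.items()])
            for v in sorted(values, key=str)]

# Nested (value, [children...]) lists mirroring the ∪/× structure above.
factorized = ground("dish", ORDER["dish"], {})
print(factorized[0][0], factorized[1][0])           # burger hotdog
```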

Factorizing the Computation of Aggregates (1/2)

[Figure: the compressed factorized join, with every value replaced by 1, every ∪ by +, and every × by ∗; the root then evaluates to COUNT(*) = 12.]

SQL aggregates can be computed in one pass over the factorization. COUNT(*):

  ◮ values → 1,
  ◮ ∪ → +,
  ◮ × → ∗.
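Continuing the node classes from the sketch above, COUNT(*) becomes a single bottom-up pass that applies exactly these three substitutions:

```python
def count(node):
    """COUNT(*) over a factorized join: values → 1, ∪ → +, × → ∗."""
    if isinstance(node, LeafNode):
        return 1
    if isinstance(node, UnionNode):
        # each value maps to 1, multiplied into the count of its subtree
        return sum(1 * count(child) for _value, child in node.children)
    if isinstance(node, ProductNode):
        result = 1
        for child in node.children:
            result *= count(child)
        return result

print(count(hotdog))  # 6: {Joe, Steve} × {bun, onion, sausage}
```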

Factorizing the Computation of Aggregates (2/2)

[Figure: the compressed factorized join, with dish values mapped through f, price values kept as themselves, and all other values replaced by 1; the root then evaluates to SUM(dish ∗ price) = 20 ∗ f(burger) + 16 ∗ f(hotdog).]

SQL aggregates can be computed in one pass over the factorization. SUM(dish ∗ price):

  ◮ Assume there is a function f that turns dish values into reals.
  ◮ All values except for dish & price → 1,
  ◮ ∪ → +,
  ◮ × → ∗.
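The same one-pass evaluator, again continuing the earlier sketches, with the value mapping specialized per variable; the encoding f of dish values is the assumed placeholder from the slide:

```python
def sum_dish_price(node, f):
    """SUM(dish ∗ price): dish → f(dish), price → its value, all else → 1."""
    if isinstance(node, LeafNode):
        return 1
    if isinstance(node, UnionNode):
        total = 0
        for value, child in node.children:
            if node.var == "dish":
                weight = f(value)        # dish values go through the encoding f
            elif node.var == "price":
                weight = value           # price values stand for themselves
            else:
                weight = 1               # every other value contributes 1
            total += weight * sum_dish_price(child, f)
        return total
    if isinstance(node, ProductNode):
        result = 1
        for child in node.children:
            result *= sum_dish_price(child, f)
        return result

# On the full tree with a root UnionNode("dish", ...), this yields
# 20·f(burger) + 16·f(hotdog); on the hotdog branch alone:
print(sum_dish_price(hotdog, f=lambda dish: 1.0))  # (1 + 1 customers) × (2+2+4) = 16
```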

Complexity

Ridge Linear Regression: Complexity

Given: a training dataset defined by a feature extraction join query Q with n variables over a database D. Let it denote the number of iterations (runs) until convergence, and fhtw(Q) the fractional hypertree width of Q.

Case 1: Q has continuous variables only.
  ◮ Learn time: O(n² · |D|^fhtw(Q) + n² · it).
  ◮ For acyclic joins, learn time is O(|D|) data complexity.

Case 2: Q has both continuous and categorical variables.
  ◮ Learn time: O(n² · |D|^(fhtw(Q)+1) + n² · |D|² · it).
  ◮ For acyclic joins, learn time is O(|D|²) data complexity.

Experiments

Learning Regression Models in Practice

Goal of the experiment: show the performance gap between systems for the same model accuracy.

Real-world retailer dataset: predict the amount of inventory units.

Competing systems (next slide reports times in seconds, running in one thread on one machine):

  • F: our learner over factorized joins
  • R: R (QR decomposition)
    ◮ Exact method, but the fastest among all available in R.
  • M: MADlib (ordinary least squares)
    ◮ Exact method, but the fastest among all available in MADlib.

Performance

Times are in seconds; number of runs in parentheses.

Retailer dataset (records)        excerpt (17M)   full (86M)

Linear regression
  Features (cont+categ)           33+55           33+3,653
  Aggregates (cont+categ)         595+2,418       595+145k
  M: Learn                        1,898.35        > 22h
  R: Join (PSQL)                  50.63           –
     Export/Import                308.83          –
     Learn                        490.13          –
  F: Aggregate+Join               25.51           380.31
     Converge (runs)              0.02 (343)      8.82 (366)

Polynomial regression, degree 2
  Features (cont+categ)           562+2,363       562+141k
  Aggregates (cont+categ)         158k+742k       158k+37M
  M: Learn                        > 22h           –
  F: Aggregate+Join               132.43          1,819.80
     Converge (runs)              3.27 (321)      219.51 (180)

Results At a Glance

Factorized joins (d-DOMDs) can be computed worst-case optimally.
  ◮ They can take arbitrarily less time and space than standard joins (DNFs).

Aggregates can be computed in linear time over factorized joins.
  ◮ There are restrictions on variable orders in case of free variables.

Our optimization problems are reducible to polynomially many aggregates in the number of query variables.
  ◮ The aggregates Σ, c, and sY need only be computed once over the factorized join.
  ◮ For linear regression, the n gradient aggregates Σθ can be computed together in O(n) time over the factorized join.

Functional dependencies in the input database can reduce the number of parameters of the optimization problem.

⇒ Orders-of-magnitude performance improvements for LogicBlox analytics over state-of-the-art systems.


Thank you!
