Joins → Aggregates → Optimization
Dan Olteanu, https://fdbresearch.github.io
PhD Open School, University of Warsaw, November 23, 2018
1 / 78
Acknowledgements

Some work reported in this course has been done in the context of the FDB project, LogicBlox, and RelationalAI by:
Zavodný, Schleich, Kara, Nikolic, Zhang, Ciucanu, and Olteanu (Oxford); Abo Khamis and Ngo (RelationalAI); Nguyen (U. Michigan)

Some of the following slides are derived from presentations by:
Abo Khamis (optimization diagrams), Kara (covers, IVMε, and many graphics), Ngo (functional aggregate queries), Schleich (performance and quizzes)

Lastly, Kara and Schleich proofread the slides. I would like to thank them for their support!
2 / 78
Introduction to a principled approach to in-database computation. This course starts where mainstream database courses finish.

Part 1: Joins
Part 2: Aggregates
◮ Important functionality of DB query languages and essential for applications
◮ Natural generalization of aggregates over joins can express problems across Computer Science in, e.g., DB, logic, probabilistic graphical models [ANR16]
◮ Algorithm with lowest known computational complexity [BKOZ13, ANR16]
◮ Aggregates under data updates [NO18, KNNOZ19]
Part 3: Optimization
3 / 78
We use the following notation (i ∈ [n] = {1, . . . , n}):
Xi are variables, xi are values in the discrete domain Dom(Xi)
x = (x1, . . . , xn) ∈ Dom(X1) × · · · × Dom(Xn)
For any S ⊆ [n], xS = (xi)_{i∈S} ∈ ∏_{i∈S} Dom(Xi)
e.g. x{2,5,8} = (x2, x5, x8) ∈ Dom(X2) × Dom(X5) × Dom(X8)
5 / 78
FAQ-expression (running example), over the hypergraph with factors ψ124(X1, X2, X4), ψ23(X2, X3), ψ13(X1, X3), ψ14(X1, X4):

ϕ(x3) = ⊕_{x1} ⊕_{x2} ⊕_{x4} ψ124(x1, x2, x4) ⊗ ψ23(x2, x3) ⊗ ψ13(x1, x3) ⊗ ψ14(x1, x4)
6 / 78
FAQ-expression (running example):

ϕ(x3) = ⊕_{x1} ⊕_{x2} ⊕_{x4} ψ124(x1, x2, x4) ⊗ ψ23(x2, x3) ⊗ ψ13(x1, x3) ⊗ ψ14(x1, x4)

All functions have the same range D. An FAQ has:
n variables X1, . . . , Xn (here n = 4)
a multi-hypergraph H = (V, E)
◮ Each vertex is a variable (notation overload: V = [n]); here V = {1, 2, 3, 4}
◮ To each hyperedge S ∈ E there corresponds a factor ψS : ∏_{i∈S} Dom(Xi) → D; here E = {{1, 4}, {1, 3}, {2, 3}, {1, 2, 4}}, and D can be R+, {true, false}, {0, 1}, 2^U, etc.
a set F ⊆ V of free variables (wlog, F = [f ] = {1, . . . , f }); here F = {3}
7 / 78
FAQ-expression (running example):

ϕ(x3) = ⊕_{x1} ⊕_{x2} ⊕_{x4} ψ124(x1, x2, x4) ⊗ ψ23(x2, x3) ⊗ ψ13(x1, x3) ⊗ ψ14(x1, x4)

All functions have the same range D.

Compute the function ϕ : ∏_{i∈F} Dom(Xi) → D, defined by the FAQ-expression

ϕ(x[f ]) = ⊕^(f+1)_{x_{f+1}∈Dom(X_{f+1})} · · · ⊕^(n−1)_{x_{n−1}∈Dom(X_{n−1})} ⊕^(n)_{x_n∈Dom(X_n)} ⊗_{S∈E} ψS(xS)

For each aggregate operator ⊕^(i):
◮ Either (D, ⊕^(i), ⊗) is a commutative semiring
◮ Or ⊕^(i) = ⊗
8 / 78
(D, ⊕, ⊗) is a commutative semiring when:

(D, ⊕) is a commutative monoid with identity element 0:
◮ (a ⊕ b) ⊕ c = a ⊕ (b ⊕ c) ◮ 0 ⊕ a = a ⊕ 0 = a ◮ a ⊕ b = b ⊕ a
(D, ⊗) is a commutative monoid with identity element 1:
◮ (a ⊗ b) ⊗ c = a ⊗ (b ⊗ c) ◮ 1 ⊗ a = a ⊗ 1 = a ◮ a ⊗ b = b ⊗ a
Multiplication distributes over addition:
◮ a ⊗ (b ⊕ c) = (a ⊗ b) ⊕ (a ⊗ c)
Multiplication by 0 annihilates D:
◮ 0 ⊗ a = a ⊗ 0 = 0

Common examples (there are many more!):
Boolean ({true, false}, ∨, ∧)
sum-product (R, +, ×)
max-product (R+, max, ×)
set (2^U, ∪, ∩)
9 / 78
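The four example semirings can be spot-checked in code. A minimal Python sketch, for illustration only (the names Semiring, check_axioms, and the small universe U are mine, not from the slides):

```python
from collections import namedtuple

# A commutative semiring packaged as its two identities and two operations.
Semiring = namedtuple("Semiring", ["zero", "one", "add", "mul"])

bool_sr = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)
sum_sr = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)
max_sr = Semiring(0.0, 1.0, max, lambda a, b: a * b)
U = frozenset({1, 2, 3})                      # a small universe for the set semiring 2^U
set_sr = Semiring(frozenset(), U, lambda a, b: a | b, lambda a, b: a & b)

def check_axioms(sr, samples):
    """Spot-check the semiring axioms on a few sample values."""
    for a in samples:
        assert sr.add(sr.zero, a) == a                     # 0 is the additive identity
        assert sr.mul(sr.one, a) == a                      # 1 is the multiplicative identity
        assert sr.mul(sr.zero, a) == sr.zero               # 0 annihilates D
        for b in samples:
            assert sr.add(a, b) == sr.add(b, a)            # commutativity of the addition
            assert sr.mul(a, b) == sr.mul(b, a)            # commutativity of the multiplication
            for c in samples:
                assert sr.mul(a, sr.add(b, c)) == \
                       sr.add(sr.mul(a, b), sr.mul(a, c))  # distributivity

check_axioms(sum_sr, [0, 1, 2, 3])
check_axioms(bool_sr, [False, True])
check_axioms(max_sr, [0.0, 0.5, 2.0])
check_axioms(set_sr, [frozenset(), frozenset({1}), U])
```

The checks are only on sample values, of course; they illustrate the axioms rather than prove them.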
Problem (SumProduct)

Given a commutative semiring (D, ⊕, ⊗), compute the function

ϕ(x1, . . . , xf ) = ⊕_{x_{f+1}} · · · ⊕_{x_n} ⊗_{S∈E} ψS(xS)

For ⊕ = + and ⊗ = ∗, ϕ can be expressed in SQL as:

SELECT x1, . . . , xf , SUM(R1.val ∗ · · · ∗ Rn.val)
FROM R1 NATURAL JOIN . . . NATURAL JOIN Rn
GROUP BY x1, . . . , xf ;

where each function ψi over variables XS is encoded as a relation Ri over XS with an additional attribute val to account for the values of ψi.

This formulation is equivalent to:
SumProduct [D99]
Marginalize a Product Function [AM00]
10 / 78
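SumProduct has a direct, if exponential, brute-force evaluator: for each binding of the free variables, aggregate the product of all factors over every binding of the bound variables. A Python sketch for illustration only (the function and data names are mine; a real engine would use variable elimination instead):

```python
from functools import reduce
from itertools import product

def sum_product(domains, free, factors, add=lambda a, b: a + b,
                mul=lambda a, b: a * b, zero=0):
    """Brute-force SumProduct over a semiring (default: sum-product).
    domains: dict var -> list of values; free: list of free variables;
    factors: list of (scope, fn), fn mapping a tuple of scope values to D."""
    bound = [v for v in domains if v not in free]
    out = {}
    for fvals in product(*(domains[v] for v in free)):
        acc = zero
        for bvals in product(*(domains[v] for v in bound)):
            env = {**dict(zip(free, fvals)), **dict(zip(bound, bvals))}
            term = reduce(mul, (fn(tuple(env[v] for v in scope))
                                for scope, fn in factors))
            acc = add(acc, term)
        out[fvals] = acc
    return out

# phi(x1) = sum_{x2,x3} R(x1,x2) * S(x2,x3), with sparse factors (missing -> 0):
R = {(0, 0): 1, (0, 1): 2, (1, 1): 3}
S = {(0, 0): 1, (1, 0): 1, (1, 1): 4}
dom = {"x1": [0, 1], "x2": [0, 1], "x3": [0, 1]}
factors = [(("x1", "x2"), lambda t: R.get(t, 0)),
           (("x2", "x3"), lambda t: S.get(t, 0))]
result = sum_product(dom, ["x1"], factors)
print(result)   # → {(0,): 11, (1,): 15}
```

Swapping in add=max reproduces max-product marginals on the same factors; the point of the rest of the course is that this exponential enumeration can be avoided.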
({true, false}, ∨, ∧)
◮ Constraint satisfaction problems ◮ Boolean conjunctive query evaluation ◮ SAT ◮ k-colorability ◮ etc.
(2^U, ∪, ∩)
◮ Conjunctive query evaluation
(R, +, ×)
◮ Permanent ◮ DFT ◮ Inference in probabilistic graphical models ◮ #CSP ◮ Matrix chain multiplication ◮ Aggregates in DB
(R+, max, ×)
◮ MAP queries in probabilistic graphical models
11 / 78
Boolean Conjunctive Queries: Boolean query Φ with set rels(Φ) of relation symbols. Each relation symbol R ∈ rels(Φ) has variables vars(R):

Φ = ∃X1 . . . ∃Xn : ⋀_{R∈rels(Φ)} R(vars(R))

FAQ encoding over the Boolean semiring ({true, false}, ∨, ∧):

φ = ⋁_{x1} · · · ⋁_{xn} ⋀_{S∈E} ψS(xS)

where φ has the hypergraph (V, E) with V = ⋃_{R∈rels(Φ)} vars(R) and E = {vars(R) | R ∈ rels(Φ)}.

For each S ∈ E, there is a factor ψS such that ψS(xS) = (xS ∈ R), i.e., ψS(xS) is true iff xS is a tuple in R.
12 / 78
Compute the product A = A1 · · · An of n matrices. Each matrix Ai is over a field F and has dimensions pi × pi+1.

FAQ encoding: We use n + 1 variables X1, . . . , Xn+1 with domains Dom(Xi) = [pi]. Each matrix Ai can be viewed as a function of two variables:

ψi,i+1 : Dom(Xi) × Dom(Xi+1) → F, where ψi,i+1(xi, xi+1) = (Ai)_{xi, xi+1}

The problem is now to compute the FAQ expression

φ(x1, xn+1) = ∑_{x2} · · · ∑_{xn} ∏_{i∈[n]} ψi,i+1(xi, xi+1).
13 / 78
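Under the sum-product semiring, eliminating x2, . . . , xn one at a time is exactly left-to-right matrix multiplication. A small pure-Python sketch (the matrix sizes and values are made up for illustration):

```python
def matmul(A, B):
    """Sum out the shared variable: C[i][k] = sum_j A[i][j] * B[j][k]."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]

def chain_product(mats):
    """phi(x1, x_{n+1}): eliminate x2, ..., xn left to right."""
    out = mats[0]
    for M in mats[1:]:
        out = matmul(out, M)
    return out

A1 = [[1, 2], [3, 4]]         # psi_{1,2}: 2 x 2
A2 = [[1, 0, 1], [0, 1, 1]]   # psi_{2,3}: 2 x 3
A3 = [[1], [1], [1]]          # psi_{3,4}: 3 x 1
print(chain_product([A1, A2, A3]))   # → [[6], [14]]
```

Different elimination orders correspond to different parenthesizations of the chain, which is where the classical matrix-chain optimization problem comes from.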
Discrete undirected graphical model represented by a hypergraph (V, E):
V = {X1, . . . , Xn} consists of n discrete random variables
There is a factor ψS : ∏_{i∈S} Dom(Xi) → R+ for each edge S ∈ E

FAQ expression to compute the marginal Maximum A Posteriori (MAP) estimates:

φ(x1, . . . , xf ) = max_{x_{f+1}∈Dom(X_{f+1})} · · · max_{x_n∈Dom(X_n)} ∏_{S∈E} ψS(xS)

FAQ expression to compute the marginal distribution of variables X1, . . . , Xf :

φ(x1, . . . , xf ) = ∑_{x_{f+1}} · · · ∑_{x_n} ∏_{S∈E} ψS(xS)

For conditional distributions prob(XA | XB = xB), we set XB to xB.
14 / 78
15 / 78
Find a variable order for ϕ, then compute ϕ by eliminating variables in the given order. This is a dynamic programming algorithm. We exemplify two variants:
◮ FDB: Top-down with memoization (caching) [BKOZ13]
◮ InsideOut: Bottom-up with indicator projections [ANR16]

The complexity is given by the width of the variable order: Given a database of size N, an FAQ ϕ, and a variable order for ϕ with width w, ϕ can be computed in time O(N^w + |OUT|), where |OUT| is the output size.
16 / 78
First attempt: Same variable order ∆ as for the join part of ϕ. One wrinkle: What if not all variables are free?

The free variables sit above the bound variables in ∆. [BKOZ13, OZ15]

Equivalent constraint for hypertree decompositions [ANR16]: Take a hypertree decomposition for the join part of ϕ such that the bags with the free variables form a connected subtree.

Implication on complexity: The width for ϕ is at least the width for its join part ⇒ ϕ may be more expensive than its join part if it has free variables. This new width is called the FAQ-width in the literature [ANR16].
17 / 78
[Figure: factorized join over the variable order dish – {day, item} – {customer, price}, with key sets day {dish}, item {dish}, customer {dish, day}, price {item}.]

ϕ = ∑_{customer, day, dish, item, price} O(customer, day, dish) · D(dish, item) · I(item, price)

In SQL: SELECT COUNT(*) FROM O NATURAL JOIN .. I;

We change the semiring to (N, +, ∗):
◮ values → 1, ∪ → +, × → ∗

[Figure: the same tree with values replaced by 1, ∪ by + and × by ∗; evaluating it bottom-up yields the count 12 at the root.]
19 / 78
ϕ = ∑_{customer, day, dish, item, price} f (dish) · price · O(customer, day, dish) · D(dish, item) · I(item, price)

In SQL: SELECT SUM(f (dish) * price) FROM O NATURAL JOIN .. I;
◮ Assume there is a function f that turns dish into reals or indicator vectors.
◮ All values except for dish & price → 1, ∪ → +, × → ∗.

[Figure: the factorized join with these replacements applied; bottom-up evaluation yields 20 ∗ f (burger) + 16 ∗ f (hotdog) at the root.]

If f turns dish into indicator vectors:

ϕ(dish) = ∑_{customer, day, item, price} price · O(customer, day, dish) · D(dish, item) · I(item, price)

In SQL: SELECT dish, SUM(price) FROM O NATURAL JOIN .. I GROUP BY dish;
22 / 78
Aggregates can be computed without materializing the factorized join. [OZ15, OS16, ANNOS18a+b]
The factorized join becomes the trace of the aggregate computation.
This is called factorized aggregate computation.
23 / 78
The 4-path count query Q4 on a graph with 4 copies of the edge relation E:

Q4() = ∑_{a,b,c,d,e} V1(a) · E1(a, b) · E2(b, c) · E3(c, d) · E4(d, e) · V2(e)

[Figure: hypergraph over A, B, C, D, E and a variable order with C at the root, B and D as its children, A below B and E below D; key(A) = {B}, key(E) = {D}, key(B) = key(D) = {C}, key(C) = ∅.]

Sizes for listing/factorized representations of the result of the join J of Q4:
ρ∗(J) = 3 ⇒ listing representation has size O(|E|^3).
fhtw(J) = 1 ⇒ factorization with caching has size O(|E|).
24 / 78
We would like to compute Q4:
in O(|E|) time (no free variables, so use the best variable order)
using optimized queries that are derived from the variable order of Q4
without materializing the factorized join J
25 / 78
[Figure: the variable order C – {B, D} – {A, E} is processed bottom-up; eliminating a variable replaces its subtree by one of the views U1, . . . , U5.]

U1(b) = ∑_a V1(a) · E1(b, a)
U2(c) = ∑_b E2(c, b) · U1(b)
U3(d) = ∑_e V2(e) · E4(d, e)
U4(c) = ∑_d E3(c, d) · U3(d)
U5 = ∑_c U2(c) · U4(c)
26 / 78
This computation strategy corresponds to the following query rewriting:

∑_{a,b,c,d,e} V1(a) · E1(b, a) · E2(c, b) · E3(c, d) · E4(d, e) · V2(e)
= ∑_c (∑_b E2(c, b) · (∑_a V1(a) · E1(b, a))) · (∑_d E3(c, d) · (∑_e E4(d, e) · V2(e)))
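The view sequence U1, . . . , U5 needs only one pass over the edge list per view. A Python sketch, assuming V1 and V2 are 1 on every vertex and following the argument order of Q4 (so it counts length-4 walks, i.e., homomorphisms, not only simple paths):

```python
from collections import defaultdict

def four_path_count(edges):
    """Count 4-paths a-b-c-d-e via the views U1..U5, in O(|E|) per view."""
    U1 = defaultdict(int)                  # U1(b) = sum_a V1(a) * E1(a, b)
    U3 = defaultdict(int)                  # U3(d) = sum_e E4(d, e) * V2(e)
    for a, b in edges:
        U1[b] += 1
        U3[a] += 1
    U2 = defaultdict(int)                  # U2(c) = sum_b E2(b, c) * U1(b)
    U4 = defaultdict(int)                  # U4(c) = sum_d E3(c, d) * U3(d)
    for u, v in edges:
        U2[v] += U1[u]
        U4[u] += U3[v]
    return sum(U2[c] * U4[c] for c in U2)  # U5 = sum_c U2(c) * U4(c)

# The directed path 1 -> 2 -> 3 -> 4 -> 5 contains exactly one 4-path:
print(four_path_count([(1, 2), (2, 3), (3, 4), (4, 5)]))   # → 1
```

Note how U2 and U4 meet at the root variable C, exactly as in the rewriting above.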
ϕ(x1, x2, x4) = ∑_{x3,x5,x6,x7,x8} ψ1(x1, x2, x3) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ4(x6, x8) · ψ5(x5, x7)

[Figure: hypergraph over X1, . . . , X8 and a variable order ∆ with X2 at the root; key(X2) = ∅, key(X1) = {X2}, key(X3) = {X1, X2}, key(X4) = {X2}, key(X5) = {X2, X4}, key(X6) = {X4, X5}, key(X8) = {X6}, key(X7) = {X5}.]

ρ∗(ϕ) = 4, s(ϕ) = 2, fhtw(ϕ) = 1. The above variable order ∆ has the free variables x1, x2, x4 on top of the others and fhtw(∆) = 1. The query result has size O(N) when factorized and O(N^2) when listed.
28 / 78
[Figure: at each elimination step, the eliminated variable is removed from the variable order.]

ϕ(x1, x2, x4) = ∑_{x3,x5,x6,x7,x8} ψ1(x1, x2, x3) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ4(x6, x8) · ψ5(x5, x7)

Eliminate x3: ψ6(x1, x2) = ∑_{x3} ψ1(x1, x2, x3)
ϕ(x1, x2, x4) = ∑_{x5,x6,x7,x8} ψ6(x1, x2) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ4(x6, x8) · ψ5(x5, x7)

Eliminate x8: ψ7(x6) = ∑_{x8} ψ4(x6, x8)
ϕ(x1, x2, x4) = ∑_{x5,x6,x7} ψ6(x1, x2) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ7(x6) · ψ5(x5, x7)

Eliminate x7: ψ8(x5) = ∑_{x7} ψ5(x5, x7)
ϕ(x1, x2, x4) = ∑_{x5,x6} ψ6(x1, x2) · ψ2(x2, x4, x5) · ψ3(x4, x5, x6) · ψ7(x6) · ψ8(x5)

Eliminate x6: ψ9(x4, x5) = ∑_{x6} ψ3(x4, x5, x6) · ψ7(x6)
ϕ(x1, x2, x4) = ∑_{x5} ψ6(x1, x2) · ψ2(x2, x4, x5) · ψ9(x4, x5) · ψ8(x5)

Eliminate x5: ψ10(x2, x4) = ∑_{x5} ψ2(x2, x4, x5) · ψ9(x4, x5) · ψ8(x5)
ϕ(x1, x2, x4) = ψ6(x1, x2) · ψ10(x2, x4)
39 / 78
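A single elimination step of this kind can be sketched generically: join the factors that mention the variable, then sum the variable out. A brute-force Python illustration over (R, +, ∗) (the names eliminate, scope, and table are mine; a real engine would use sorted or indexed joins):

```python
from collections import defaultdict
from itertools import product as cross

def eliminate(var, factors):
    """One variable-elimination step: join all factors whose scope mentions
    `var`, sum `var` out, and return the untouched factors plus the new one.
    factors: list of (scope, table); scope is a tuple of variable names,
    table maps value tuples (aligned with scope) to numbers."""
    touched = [(s, t) for s, t in factors if var in s]
    rest = [(s, t) for s, t in factors if var not in s]
    new_scope = tuple(dict.fromkeys(
        v for s, _ in touched for v in s if v != var))
    new_table = defaultdict(int)
    # Brute-force join: try every combination of one tuple per touched
    # factor and keep the combinations that agree on shared variables.
    for combo in cross(*(t.items() for _, t in touched)):
        env, val, ok = {}, 1, True
        for (scope, _), (tup, v) in zip(touched, combo):
            for x, a in zip(scope, tup):
                ok = ok and env.setdefault(x, a) == a
            val *= v
        if ok:
            new_table[tuple(env[x] for x in new_scope)] += val
    return rest + [(new_scope, dict(new_table))]

# Eliminating x8 from psi4(x6, x8) yields psi7(x6) = sum_{x8} psi4(x6, x8):
psi4 = {(0, 0): 1, (0, 1): 2, (1, 0): 3}
new_factors = eliminate("x8", [(("x6", "x8"), psi4)])
print(new_factors)   # → [(('x6',), {(0,): 3, (1,): 3})]
```

Applying eliminate repeatedly in the order of the derivation above reproduces the sequence ψ6, . . . , ψ10.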
ϕ(x1) = ∑_{x2,x3,x4,x5} ψ1(x1, x2) · ψ2(x2, x3) · ψ3(x3, x1) · ψ4(x1, x4) · ψ5(x4, x5) · ψ6(x5, x1)

[Figure: hypergraph of two triangles sharing X1, and a variable order with X1 at the root; key(X1) = ∅, key(X2) = {X1}, key(X3) = {X1, X2}, key(X4) = {X1}, key(X5) = {X1, X4}.]

ρ∗(ϕ) = 2.5, s(ϕ) = 1.5, fhtw(ϕ) = 1.5. The above variable order ∆ has the free variable x1 on top of the others and fhtw(∆) = 1.5. The (unary) query result has size O(N) when factorized or listed.
40 / 78
[Figure: at each step, the eliminated variable is removed from the variable order X1 – {X2, X4} – {X3, X5}.]

ϕ(x1) = ∑_{x2,x3,x4,x5} ψ1(x1, x2) · ψ2(x2, x3) · ψ3(x3, x1) · ψ4(x1, x4) · ψ5(x4, x5) · ψ6(x5, x1)

Eliminate x3: ψ7(x1, x2) = ∑_{x3} ψ′1(x1, x2) · ψ2(x2, x3) · ψ3(x3, x1) · ψ′4(x1) · ψ′6(x1)

Here ψ′1 is an indicator projection of ψ1 (similarly, ψ′4 and ψ′6): it has the same support as ψ1, i.e., the same tuples (x1, x2), but ψ′1(x1, x2) = 1 whenever ψ1(x1, x2) ≠ 0 and ψ′1(x1, x2) = 0 otherwise.

ϕ(x1) = ∑_{x2,x4,x5} ψ1(x1, x2) · ψ7(x1, x2) · ψ4(x1, x4) · ψ5(x4, x5) · ψ6(x5, x1)

Eliminate x2: ψ8(x1) = ∑_{x2} ψ1(x1, x2) · ψ7(x1, x2) · ψ′4(x1) · ψ′6(x1)

The indicator projections ψ′4 and ψ′6 are redundant here, as they were already used for computing ψ7.

ϕ(x1) = ∑_{x4,x5} ψ8(x1) · ψ4(x1, x4) · ψ5(x4, x5) · ψ6(x5, x1)

Eliminate x5: ψ9(x1, x4) = ∑_{x5} ψ′8(x1) · ψ′4(x1, x4) · ψ5(x4, x5) · ψ6(x5, x1)

ϕ(x1) = ∑_{x4} ψ8(x1) · ψ4(x1, x4) · ψ9(x1, x4)

Eliminate x4: ψ10(x1) = ∑_{x4} ψ′8(x1) · ψ4(x1, x4) · ψ9(x1, x4)

ϕ(x1) = ψ8(x1) · ψ10(x1)
49 / 78
A glimpse at performance experiments [ANNOS18b]

Retailer dataset (records)                    excerpt (17M)   full (86M)
PostgreSQL computing the join                 50.63 sec       216.56 sec
Aggregates for a linear regression model
  FDB computing join+aggregates               25.51 sec       380.31 sec
  Number of aggregates (scalar+group-by)      595+2,418       595+145k
Aggregates for a polynomial regression model
  FDB computing join+aggregates               132.43 sec      1,819.80 sec
  Number of aggregates (scalar+group-by)      158k+742k       158k+37M

In this experiment:
FDB only used one core of a commodity machine
For both PostgreSQL and FDB, the dataset was entirely in memory
The aggregates represent gradients (or parts thereof) used for learning degree-1 and degree-2 polynomial regression models
50 / 78
51 / 78
Maintain the triangle count Q under single-tuple updates to R, S, and T!

[Figure: triangle hypergraph over A, B, C with edges R(A, B), S(B, C), T(C, A).]

Q counts the number of tuples in the join of R, S, and T:

Q = ∑_{a,b,c} R(a, b) · S(b, c) · T(c, a)
52 / 78
R (A, B → multiplicity): (a1, b1) → 2, (a2, b1) → 3
S (B, C): (b1, c1) → 2, (b1, c2) → 1
T (C, A): (c1, a1) → 1, (c2, a1) → 3, (c2, a2) → 3

R · S · T (A, B, C):
(a1, b1, c1) → 2 · 2 · 1 = 4
(a1, b1, c2) → 2 · 1 · 3 = 6
(a2, b1, c2) → 3 · 1 · 3 = 9

Q(D) = 4 + 6 + 9 = 19

Update δR = {(a2, b1) → −2}: the multiplicity of (a2, b1) in R drops from 3 to 1, the last product becomes 1 · 1 · 3 = 3, and

Q(D) = 4 + 6 + 3 = 13
53 / 78
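The example can be replayed in code: classical IVM applies the delta rule δQ = δR(a′, b′) · ∑_c S(b′, c) · T(c, a′) instead of recomputing Q from scratch. A Python sketch over the ring (Z, +, ∗), using the relations from the example (q_scratch and update_R are illustrative names):

```python
# Relations as dicts from tuples to multiplicities over the ring (Z, +, *);
# a delete is an insert with negative multiplicity.
R = {("a1", "b1"): 2, ("a2", "b1"): 3}
S = {("b1", "c1"): 2, ("b1", "c2"): 1}
T = {("c1", "a1"): 1, ("c2", "a1"): 3, ("c2", "a2"): 3}

def q_scratch():
    """Q = sum_{a,b,c} R(a,b) * S(b,c) * T(c,a), recomputed from scratch."""
    return sum(mR * mS * T.get((c, a), 0)
               for (a, b), mR in R.items()
               for (b2, c), mS in S.items() if b2 == b)

def update_R(Q, a, b, m):
    """Maintain Q under a single-tuple update dR = {(a, b) -> m} using
    dQ = dR(a, b) * sum_c S(b, c) * T(c, a)."""
    dQ = m * sum(mS * T.get((c, a), 0)
                 for (b2, c), mS in S.items() if b2 == b)
    R[(a, b)] = R.get((a, b), 0) + m
    return Q + dQ

Q = q_scratch()                      # 19, as in the example
Q = update_R(Q, "a2", "b1", -2)      # delete two copies of (a2, b1)
print(Q)                             # → 13
```

The delta only touches the tuples of S and T that join with (a′, b′), which is the source of the savings over recomputation.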
Data updates can be inserts (tuples with positive multiplicity) and deletes (tuples with negative multiplicity). Semirings are enough if we only want inserts or no updates.

Recall that FAQs use commutative semirings (D, ⊕, ⊗):
(D, ⊕) is a commutative monoid with identity element 0:
◮ (a ⊕ b) ⊕ c = a ⊕ (b ⊕ c) ◮ 0 ⊕ a = a ⊕ 0 = a ◮ a ⊕ b = b ⊕ a
(D, ⊗) is a commutative monoid with identity element 1:
◮ (a ⊗ b) ⊗ c = a ⊗ (b ⊗ c) ◮ 1 ⊗ a = a ⊗ 1 = a ◮ a ⊗ b = b ⊗ a
Multiplication distributes over addition:
◮ a ⊗ (b ⊕ c) = (a ⊗ b) ⊕ (a ⊗ c)
Multiplication by 0 annihilates D:
◮ 0 ⊗ a = a ⊗ 0 = 0
54 / 78
We need a commutative ring (D, ⊕, ⊗) if we want to support deletes as well:
(D, ⊕) is an abelian group with identity element 0:
◮ (a ⊕ b) ⊕ c = a ⊕ (b ⊕ c) ◮ 0 ⊕ a = a ⊕ 0 = a ◮ a ⊕ b = b ⊕ a ◮ ∃ −a ∈ D : a ⊕ (−a) = (−a) ⊕ a = 0
(D, ⊗) is a commutative monoid with identity element 1:
◮ (a ⊗ b) ⊗ c = a ⊗ (b ⊗ c) ◮ 1 ⊗ a = a ⊗ 1 = a ◮ a ⊗ b = b ⊗ a
Multiplication distributes over addition:
◮ a ⊗ (b ⊕ c) = (a ⊗ b) ⊕ (a ⊗ c)
Multiplication by 0 annihilates D:
◮ 0 ⊗ a = a ⊗ 0 = 0

Examples: Z, Q, R, C, R^n, polynomial rings. We used the ring (Z, +, ∗) in our previous example.
55 / 78
[Figure: a chain of databases D0 → D1 → D2 → · · · connected by single-tuple updates; alongside it, an auxiliary data structure A0 → A1 → A2 and the triangle count Q(D0) → Q(D1) → Q(D2) are maintained step by step.]
Given a current database D and a single-tuple update, what are the time and space complexities for maintaining Q(D)?
56 / 78
The Triangle Query Served as Milestone in Many Fields

Worst-case optimal join algorithms [Algorithmica 1997, SIGMOD R. 2013]
Parallel query evaluation [Found. & Trends DB 2018]
Randomized approximation in static settings [FOCS 2015]
Randomized approximation in data streams [SODA 2002, COCOON 2005, PODS 2006, PODS 2016, Theor. Comput. Sci. 2017]

Investigation of Answering Queries under Updates
Theoretical developments [PODS 2017, ICDT 2018]
Systems developments [F. & T. DB 2012, VLDB J. 2014, SIGMOD 2017, 2018]
Lower bounds [STOC 2015, ICM 2018]
57 / 78
“Compute from scratch!” Given δR = {(a′, b′) → m}, apply the update and recompute

Q = ∑_{a,b,c} R(a, b) · S(b, c) · T(c, a)

Maintenance Complexity
Time: O(|D|^1.5) using worst-case optimal join algorithms
Space: O(|D|) to store input relations
58 / 78
“Compute the difference!” Given δR = {(a′, b′) → m}:

Q = Q + δR(a′, b′) · ∑_c S(b′, c) · T(c, a′)

Maintenance Complexity
Time: O(|D|) to intersect C-values from S and T
Space: O(|D|) to store input relations
59 / 78
“Compute the difference by using pre-materialized views!” Given δR = {(a′, b′) → m}, pre-materialize VST(b, a) = ∑_c S(b, c) · T(c, a):

Q = Q + δR(a′, b′) · VST(b′, a′)

Maintenance Complexity
Time for updates to R: O(1) to look up in VST
Time for updates to S and T: O(|D|) to maintain VST
Space: O(|D|^2) to store input relations and VST
60 / 78
Complexity bounds for the maintenance of the triangle count

Known Upper Bound
Maintenance Time: O(|D|)
Space: O(|D|)

Lower Bound
Amortized maintenance time: not O(|D|^{0.5−γ}) for any γ > 0 (under reasonable complexity theoretic assumptions)

Can the triangle count be maintained in sublinear time? Yes! IVMε [KNNOZ19]
Amortized maintenance time: O(|D|^0.5). This is worst-case optimal!
61 / 78
Given ε ∈ [0, 1] and a database D, IVMε maintains the triangle count with
O(|D|^max{ε,1−ε}) amortized update time
O(|D|^{1+min{ε,1−ε}}) space
O(|D|^{3/2}) preprocessing time
O(1) answer time.

[Figure: update time and space complexity as functions of ε, ranging between O(|D|^0.5) and O(|D|^1.5); worst-case optimality is achieved at ε = 0.5.]

Known maintenance approaches are recovered by IVMε.
62 / 78
Compute the difference like in classical IVM! Materialize views like in factorized IVM! New ingredient: Use adaptive processing based on data skew ⇒ treat heavy values differently from light values!
63 / 78
Partition R into a light part RL = {t ∈ R : |σ_{A=t.A} R| < |D|^ε} and a heavy part RH = R \ RL.

[Figure: R split on the degree of its A-values: in RL every A-value has fewer than |D|^ε B-partners; in RH it has at least |D|^ε.]

Derived bounds, for all A-values a:
|σ_{A=a} RL| < |D|^ε
|π_A RH| ≤ |D|^{1−ε}

Likewise, partition S = SL ∪ SH based on B, and T = TL ∪ TH based on C.

Q is the sum of the skew-aware views ∑_{a,b,c} RU(a, b) · SV(b, c) · TW(c, a) with U, V, W ∈ {L, H}.
64 / 78
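The heavy/light split is a simple degree threshold. A Python sketch (the function partition and its parameters are illustrative):

```python
from collections import Counter

def partition(R, db_size, eps, key):
    """Split relation R (a list of tuples) on the degree of its partition
    attribute: a tuple is light if its key value occurs fewer than
    |D|^eps times in R, and heavy otherwise."""
    threshold = db_size ** eps
    deg = Counter(key(t) for t in R)
    light = [t for t in R if deg[key(t)] < threshold]
    heavy = [t for t in R if deg[key(t)] >= threshold]
    return light, heavy

# A-value "a0" occurs 4 times; with |D| = 16 and eps = 0.5 the threshold
# is 16^0.5 = 4, so its tuples land in the heavy part.
R = [("a0", i) for i in range(4)] + [("a1", 0), ("a2", 0)]
light, heavy = partition(R, 16, 0.5, key=lambda t: t[0])
print(len(light), len(heavy))   # → 2 4
```

After an update the degree of a value can cross the threshold, so IVMε also has to rebalance the partitions periodically, which is where the amortization in its bounds comes from.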
Given an update δR∗ = {(a′, b′) → m}, compute the difference for each skew-aware view using different strategies:

  Skew-aware view                  Evaluation from left to right                      Time
  R∗(a, b) · SL(b, c) · TL(c, a)   δR∗(a′, b′) · Σ_c SL(b′, c) · TL(c, a′)            O(|D|^ε)
  R∗(a, b) · SH(b, c) · TH(c, a)   δR∗(a′, b′) · Σ_c TH(c, a′) · SH(b′, c)            O(|D|^{1−ε})
  R∗(a, b) · SL(b, c) · TH(c, a)   δR∗(a′, b′) · Σ_c SL(b′, c) · TH(c, a′)            O(|D|^ε)
                                   δR∗(a′, b′) · Σ_c TH(c, a′) · SL(b′, c)            O(|D|^{1−ε})
  R∗(a, b) · SH(b, c) · TL(c, a)   δR∗(a′, b′) · V_ST(b′, a′)                         O(1)

Overall update time: O(|D|^{max{ε, 1−ε}})

65 / 78
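The four delta rules above can be sketched as follows. This is a hypothetical implementation, not the paper's data structures: the S-parts are assumed indexed as S[b] = {c: multiplicity}, the T-parts as T[a] = {c: multiplicity}, and V_ST is a plain dict materializing the auxiliary view.

```python
def dot(s, t):
    # sum_c s[c] * t[c] over the common c-values, iterating the smaller index
    small, big = (s, t) if len(s) <= len(t) else (t, s)
    return sum(v * big.get(c, 0) for c, v in small.items())

def delta_triangles(a, b, m, SL, SH, TL, TH, V_ST):
    """Delta of the triangle count under the single-tuple update
    deltaR = {(a, b) -> m}.  V_ST[(b, a)] holds sum_c SH(b, c) * TL(c, a)."""
    d  = dot(SL.get(b, {}), TL.get(a, {}))  # S light, T light
    d += dot(SH.get(b, {}), TH.get(a, {}))  # S heavy, T heavy
    d += dot(SL.get(b, {}), TH.get(a, {}))  # S light, T heavy
    d += V_ST.get((b, a), 0)                # S heavy, T light: O(1) view lookup
    return m * d

# toy instance: light tuple S(1, 10), light tuples T(10, 0), T(11, 0),
# and a precomputed view entry V_ST(1, 0) = 5
SL, SH = {1: {10: 1}}, {}
TL, TH = {0: {10: 1, 11: 3}}, {}
V_ST = {(1, 0): 5}
change = delta_triangles(0, 1, 2, SL, SH, TL, TH, V_ST)  # 2 * (1 + 5) = 12
```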
Auxiliary views:
  V_RS(a, c) = Σ_b RH(a, b) · SL(b, c)
  V_ST(b, a) = Σ_c SH(b, c) · TL(c, a)
  V_TR(c, b) = Σ_a TH(c, a) · RL(a, b)

Maintenance of V_RS(a, c) = Σ_b RH(a, b) · SL(b, c):

  Update                    Difference for V_RS              Time
  δRH = {(a′, b′) → m}      δRH(a′, b′) · SL(b′, c)          O(|D|^ε)
  δSL = {(b′, c′) → m}      δSL(b′, c′) · RH(a, b′)          O(|D|^{1−ε})

Size of V_RS(a, c) = Σ_b RH(a, b) · SL(b, c):
  |V_RS| ≤ |RH| · max_b{|SL(b, c)|} = O(|D|^{1+ε})
  |V_RS| ≤ |SL| · max_b{|RH(a, b)|} = O(|D|^{1+(1−ε)})

Overall: update time O(|D|^{max{ε, 1−ε}}) and space O(|D|^{1+min{ε, 1−ε}})

66 / 78
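The first maintenance rule above can be sketched directly. A hypothetical `apply_delta_RH` helper is assumed, with SL indexed as SL[b] = {c: multiplicity} and V_RS as a dict keyed by (a, c):

```python
def apply_delta_RH(V_RS, SL, a, b, m):
    """Maintain V_RS(a, c) = sum_b RH(a, b) * SL(b, c) under the update
    deltaRH = {(a, b) -> m}: add m * SL(b, c) for every c paired with b in
    SL.  Fewer than |D|^eps entries are touched, matching the O(|D|^eps)
    bound."""
    for c, v in SL.get(b, {}).items():
        V_RS[(a, c)] = V_RS.get((a, c), 0) + m * v

V_RS = {}
SL = {1: {7: 2, 8: 1}}
apply_delta_RH(V_RS, SL, 0, 1, 3)  # insert RH(0, 1) with multiplicity 3
```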
Full details are available in the paper [KNNOZ19]. Updates can change the frequencies of values and the heavy/light threshold! This may require rebalancing the partitions:
⇒ Minor rebalancing: transfer tuples from one part of a relation to the other!
⇒ Major rebalancing: recompute the partitions and views from scratch!
Both forms of rebalancing take superlinear time, but their cost amortizes over sequences of updates.

67 / 78
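Minor rebalancing can be sketched as follows. This is a hypothetical `rebalance_value` helper under simplifying assumptions (partitioning attribute at position 0, list-based parts); the real system also tracks when |D| drifts far enough that the threshold |D|^ε itself changes, triggering major rebalancing.

```python
def rebalance_value(RL, RH, freq, key, threshold):
    """Minor rebalancing sketch: after updates changed freq[key], move all
    tuples with that A-value into the part its new frequency dictates."""
    src, dst = (RL, RH) if freq[key] >= threshold else (RH, RL)
    moved = [t for t in src if t[0] == key]
    for t in moved:
        src.remove(t)
        dst.append(t)

# "a" just became heavy (frequency 2 meets threshold 2): move it to RH
RL, RH = [("a", 1), ("a", 2), ("b", 3)], []
rebalance_value(RL, RH, {"a": 2, "b": 1}, "a", 2)
```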
The lower bound already holds for the Boolean Triangle Detection Problem, a special case of the Triangle Count.

For any γ > 0, there is no algorithm that incrementally maintains Triangle Detection with amortized update time O(|D|^{1/2−γ}) and answer time O(|D|^{1−γ}), unless the Online Vector-Matrix-Vector Multiplication (OuMv) Conjecture fails.

The OuMv Problem
Input: an n × n Boolean matrix M and n pairs (u1, v1), ..., (un, vn) of Boolean column vectors of size n, arriving one after the other.
Goal: after seeing each pair (ur, vr), output u_r^T M v_r.

The OuMv Conjecture [HKNS15]
For any γ > 0, there is no algorithm that solves the OuMv Problem in time O(n^{3−γ}).

68 / 78
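One OuMv round computed directly looks as follows (a sketch with a hypothetical `oumv_round` function). The point of the conjecture is that this Θ(n²)-per-round, Θ(n³)-total cost cannot be improved by a polynomial factor.

```python
def oumv_round(M, u, v):
    """Output u^T M v over the Boolean semiring: 1 iff some i, j exist
    with u[i] = M[i][j] = v[j] = 1."""
    n = len(M)
    return any(u[i] and M[i][j] and v[j] for i in range(n) for j in range(n))

M = [[0, 1],
     [0, 0]]
hit  = oumv_round(M, [1, 0], [0, 1])  # u selects row 0, v selects column 1
miss = oumv_round(M, [0, 1], [0, 1])  # row 1 of M is all zeros
```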
Assume there is an algorithm A that maintains Triangle Detection with amortized update time O(|D|^{1/2−γ}) and answer time O(|D|^{1−γ}) for some γ > 0.

Goal: Design an algorithm B that uses algorithm A as an oracle and solves OuMv in subcubic time.
⇒ This contradicts the OuMv Conjecture!

Algorithm B
Use relation S to encode the matrix M. In each round r ∈ [n]:
◮ Use relations R and T to encode ur and vr, respectively, such that u_r^T M v_r = 1 if and only if R ✶ S ✶ T ≠ ∅
◮ Check whether R ✶ S ✶ T contains a triangle.

69 / 78
Algorithm B in more detail:
(1) Insert at most n² tuples into S such that S(B, C) = {(i, j) | M(i, j) = 1}
(2) In each round r ∈ [n]:
◮ Delete all tuples in R(A, B) and T(C, A)
◮ Insert at most n tuples into R(A, B) and T(C, A) such that R = {(a, i) | ur(i) = 1} and T = {(i, a) | vr(i) = 1} for a constant a
◮ Check R ✶ S ✶ T ≠ ∅: this holds iff u_r^T M v_r = 1, iff ∃i, j ∈ [n] with ur(i) = 1, M(i, j) = 1, and vr(j) = 1

Time analysis: The database size is O(n²).
Time in (1): O(n² · (n²)^{1/2−γ}) = O(n² · n^{1−2γ}) = O(n^{3−2γ})
Time for each round in (2): O(4n · (n²)^{1/2−γ} + (n²)^{1−γ}) = O(n^{2−2γ})
  (update time for the at most 4n changes to R and T, plus one answer time)
Time for n rounds in (2): O(n · n^{2−2γ}) = O(n^{3−2γ})
Overall time: O(n^{3−2γ}), i.e., subcubic!

70 / 78
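Steps (1) and (2) can be sketched end to end. This is a hypothetical `solve_oumv` function in which a brute-force non-emptiness check stands in for the dynamic oracle A; the subcubic running time of course relies on the oracle, not on this stand-in.

```python
def solve_oumv(M, pairs):
    """Algorithm B as a sketch: encode M into S once, then per round encode
    u_r into R and v_r into T and ask whether R |><| S |><| T is non-empty."""
    n = len(M)
    S = {(i, j) for i in range(n) for j in range(n) if M[i][j]}   # step (1)
    a = "*"  # the single constant A-value shared by R and T
    answers = []
    for u, v in pairs:                                            # step (2)
        R = {(a, i) for i in range(n) if u[i]}
        T = {(j, a) for j in range(n) if v[j]}
        # stand-in for the oracle: does some triangle R(a,i), S(i,j), T(j,a) exist?
        answers.append(any((i, j) in S and (j, a) in T
                           for (_, i) in R for j in range(n)))
    return answers

M = [[0, 1], [0, 0]]
answers = solve_oumv(M, [([1, 0], [0, 1]), ([0, 1], [1, 1])])
```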
The hardness of many dynamic problems is based on the Online Matrix-Vector Multiplication (OMv) problem; OuMv is at least as hard as OMv. Examples [HKNS15]: source-target reachability, source-target shortest path (in unweighted graphs), transitive closure.

71 / 78
72 / 78
D99 Bucket Elimination: A Unifying Framework for Reasoning. Dechter. https://www.ics.uci.edu/~dechter/publications/r48b.pdf
AM00 The Generalized Distributive Law. Aji, McEliece. In IEEE Trans. Information Theory 46(2): 325-343 (2000). https://ieeexplore.ieee.org/document/825794/
CY12 Materialized Views. Chirkova, Yang. In Foundations & Trends in Databases 4(4): 295-405 (2012). https://ieeexplore.ieee.org/document/8187272?arnumber=8187272
BKOZ13 Aggregation and Ordering in Factorised Databases. Bakibayev, Kocisky, Olteanu, Zavodny. In PVLDB 2013. https://arxiv.org/abs/1307.0441
HKNS15 Unifying and Strengthening Hardness for Dynamic Problems via the Online Matrix-Vector Multiplication Conjecture. Henzinger, Krinninger, Nanongkai, Saranurak. In STOC 2015. https://arxiv.org/abs/1511.06773
ANR16 FAQ: Questions Asked Frequently. Abo Khamis, Ngo, Rudra. In PODS 2016. https://arxiv.org/abs/1504.04044
73 / 78
OS16 Factorized Databases. Olteanu, Schleich. In SIGMOD Record 45(2): 5-16 (2016). https://dl.acm.org/citation.cfm?doid=3003665.3003667
NO18 Incremental View Maintenance with Triple Lock Factorization Benefits. Nikolic, Olteanu. In SIGMOD 2018. https://arxiv.org/abs/1703.07484
ANNOS18b AC/DC: In-Database Learning Thunderstruck. Abo Khamis, Ngo, Nguyen, Olteanu, Schleich. In DEEM@SIGMOD 2018. https://arxiv.org/abs/1803.07480
KNNOZ19 Counting Triangles under Updates in Worst-Case Optimal Time. Kara, Ngo, Nikolic, Olteanu, Zhang. In ICDT 2019. http://arxiv.org/abs/1804.02780
74 / 78
75 / 78
For each of the following functional aggregate queries:
◮ If evaluated naively (as sums of products), what would be its time complexity? (Assume all functions have the same size.)
◮ Can you find a variable order that yields linear time complexity?

The n-hop query: ϕ(x1, x_{n+1}) = Σ_{x2, ..., xn} ψ1(x1, x2) · ψ2(x2, x3) · ψ3(x3, x4) · ... · ψn(xn, x_{n+1})

76 / 78
For each of the following functional aggregate queries:
◮ If evaluated naively (as sums of products), what would be its time complexity? (Assume all functions have the same size.)
◮ Can you find a variable order that yields linear time complexity?

Query: ϕ = Σ_{a, b, c, d, e, f} ψ1(a, b) · ψ2(a, c) · ψ3(c, d) · ψ4(b, c, e) · ψ5(e, f)

77 / 78
Give the update time and the space necessary for the maintenance of the following FAQs, as a function of the database size and the heavy/light threshold parameter ε ∈ [0, 1]:

ϕ1 = Σ_{a, b, c, d} R(a, b) · S(b, c) · T(c, d)
ϕ2 = Σ_{a, b, c, d} R(a, b) · S(b, c) · T(b, d)
ϕ3 = Σ_{a, b, c, d} R(a, b) · S(b, c) · T(c, d) · W(d, a)

78 / 78