
Background AGM Friedgut Optimal Algorithms Summary

Multi-join Query Evaluation on Big Data Lecture 1

Dan Suciu March, 2015

Dan Suciu Multi-Joins – Lecture 1 March, 2015 1 / 34

About Me

  • Originally from Romania
  • Had fun with Math: 1976 IMO
  • PhD from University of Pennsylvania: Parallel Query Languages
  • Bell Labs and AT&T Labs: Semistructured Data, XML
  • University of Washington: data privacy, probabilistic data, Big Data
  • Today’s topic: Big Data!

Course Organization

  • Four lectures (1.5h): slides available on the course website
  • Two sections (1h): mostly interactive
  • A problem set to pass the course: seven problems (simple to challenging); email me your solutions by April 30, 2015.
  • I hope you can attend all lectures and sections: you need them in order to solve the problems.

Multi-join Query Evaluation – Outline

  • Part 1: Optimal Sequential Algorithms. Thursday 14:15-15:45
  • Part 2: Lower Bounds for Parallel Algorithms. Friday 14:15-15:45
  • Part 3: Optimal Parallel Algorithms. Saturday 9-10:30
  • Part 4: Data Skew. Saturday 11-12


Bibliography

  • Ehud Friedgut: Hypergraphs, Entropy, and Inequalities. American Mathematical Monthly: 749-760 (2004)
  • Albert Atserias, Martin Grohe, Dániel Marx: Size Bounds and Query Plans for Relational Joins. SIAM J. Comput. 42(4): 1737-1767 (2013)
  • Hung Q. Ngo, Christopher Ré, Atri Rudra: Skew Strikes Back: New Developments in the Theory of Join Algorithms. SIGMOD Record 42(4): 5-16 (2013)
  • Paul Beame, Paraschos Koutris, Dan Suciu: Skew in Parallel Query Processing. PODS 2014: 212-223
  • Paul Beame, Paraschos Koutris, Dan Suciu: Communication Steps for Parallel Query Processing. PODS 2013: 273-284

Outline for Lecture 1

  • Background: Queries, Databases, Query Evaluation
  • The AGM inequality
  • Friedgut’s inequality
  • Worst-case optimal query evaluation
  • Summary

Relations and Databases

Person
Name   Age  City  Hobby
Alice  22   Łódź  knitting
Bob    33   Lyon  karate
Carol  44   Łódź  kayaking
David  33   Lima  karate
Eve    22   Lima  knitting

  • Schema: relation/table name Person; attribute/column names Name, Age, City, Hobby; key Name
  • Instance: set of tuples/rows/records, e.g. (Alice, 22, Łódź, knitting)
  • Size: number of tuples m = 5; note: a relation is a set
  • Database: a set of relations = a finite structure
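As an illustration of "a relation is a set of tuples", the Person instance can be sketched in Python. The class name `Person` and the variables `person`, `m` and `names` are chosen here for illustration only; they are not part of the lecture.

```python
from typing import NamedTuple


class Person(NamedTuple):
    """One tuple of the Person relation; the field names mirror the schema."""
    name: str
    age: int
    city: str
    hobby: str


# A relation instance is a set, so inserting the same tuple twice has no effect.
person = {
    Person("Alice", 22, "Łódź", "knitting"),
    Person("Bob", 33, "Lyon", "karate"),
    Person("Carol", 44, "Łódź", "kayaking"),
    Person("David", 33, "Lima", "karate"),
    Person("Eve", 22, "Lima", "knitting"),
    Person("Alice", 22, "Łódź", "knitting"),  # duplicate, absorbed by the set
}

m = len(person)                    # size of the relation
names = [p.name for p in person]   # Name is a key: no value may repeat
```

Because Name is a key, the number of distinct names equals the number of tuples.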


Basic Stuff that’s Good To Know

  • Relational database systems: Oracle, SQL Server, DB2, Postgres, SQLite, Dremel, Scope, Spark SQL
  • Relations are flat (atomic values only): 1st normal form.
  • Relations are persistent: stored in file systems, or in distributed file systems like Hadoop.
  • Physical data independence: the system is allowed to organize the relation however it wishes, e.g. indexes, column-oriented DBs, partitioning on distributed servers, replication.

Relational Algebra

  • Cartesian product / Join: $\bowtie$
  • Projection: $\Pi_A$
  • Selection: $\sigma_C$
  • Union: $\cup$
  • Set difference: $-$

This course: select-project-join

Join

$R \bowtie_{X=Y} S$

The set of pairs $(t_1, t_2)$, with $t_1 \in R$ and $t_2 \in S$, such that $t_1.X = t_2.Y$.

R
X   U
a1  b1
a1  b2
a2  b3
a3  b4

S
Y   V
a1  c1
a1  c2
a3  c3
a4  c4

$T = R \bowtie_{X=Y} S$
X   U   Y   V
a1  b1  a1  c1
a1  b1  a1  c2
a1  b2  a1  c1
a1  b2  a1  c2
a3  b4  a3  c3

Input schemas: R(X,U), S(Y,V)
Output schema: T(X,U,Y,V)

Natural Join

R ⋈ S

Joins R and S on all common attributes and removes duplicate attributes.

R
A   B
a1  b1
a1  b2
a2  b3
a3  b4

S
A   C
a1  c1
a1  c2
a3  c3
a4  c4

T = R ⋈ S
A   B   C
a1  b1  c1
a1  b1  c2
a1  b2  c1
a1  b2  c2
a3  b4  c3

Input schemas: R(A,B), S(A,C)
Output schema: T(A,B,C)


Natural Join Examples

Question: in each case below, what is the output schema? What does the join do?

  • R(A,B,E,G) ⋈ S(A,C,D,E,F)
    Returns Output(A,B,C,D,E,F,G) = $R \bowtie_{(R.A=S.A) \wedge (R.E=S.E)} S$
  • R(A,B) ⋈ S(C,D,E)
    Returns the cartesian product: Output(A,B,C,D,E) = R × S.
  • R(A,B) ⋈ S(A,B)
    Returns the intersection: Output(A,B) = R ∩ S.
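The three cases above can be checked with a small Python sketch. The function `natural_join` and its (schema, set-of-tuples) representation are assumptions made for illustration, and the nested loop is the simplest possible evaluation rather than an efficient one. The data are the relations R(A,B) and S(A,C) from the Natural Join slide.

```python
def natural_join(r_schema, r_rows, s_schema, s_rows):
    """Natural join of two relations, each given as (attribute list, set of tuples)."""
    common = [a for a in r_schema if a in s_schema]        # join attributes
    s_extra = [a for a in s_schema if a not in r_schema]   # attributes only in S
    out_rows = set()
    for t1 in r_rows:
        row1 = dict(zip(r_schema, t1))
        for t2 in s_rows:
            row2 = dict(zip(s_schema, t2))
            # keep the pair only if it agrees on every common attribute
            if all(row1[a] == row2[a] for a in common):
                out_rows.add(t1 + tuple(row2[a] for a in s_extra))
    return list(r_schema) + s_extra, out_rows


R = {("a1", "b1"), ("a1", "b2"), ("a2", "b3"), ("a3", "b4")}
S = {("a1", "c1"), ("a1", "c2"), ("a3", "c3"), ("a4", "c4")}

# One common attribute (A): an ordinary join.
schema, T = natural_join(["A", "B"], R, ["A", "C"], S)
# No common attribute: the cartesian product.
_, product = natural_join(["A", "B"], R, ["C", "D"], S)
# All attributes common: the intersection.
_, both = natural_join(["A", "B"], R, ["A", "B"], {("a1", "b1"), ("a9", "b9")})
```

With no common attributes the condition inside `all(...)` is vacuously true, which is why the join degenerates to R × S.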

Very Quick Review of Basic Join Algorithms

Compute $R \bowtie_{A=B} S$:

  • Nested-loop join
  • Hash-join
  • Merge-join

(To describe in class.)

Complexity: $O((|R| + |S| + |R \bowtie_{A=B} S|) \log(|R| + |S|))$
Ignoring log factors, complexity: $O(|\text{Input}| + |\text{Output}|)$
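The algorithms themselves are left for class; as a rough sketch of one of them, a hash-join could look like the following. The function name `hash_join` and the positional key arguments `r_key` and `s_key` are hypothetical, and the code assumes both inputs fit in memory.

```python
from collections import defaultdict


def hash_join(r_rows, s_rows, r_key, s_key):
    """Equi-join: all concatenations t1 + t2 with t1[r_key] == t2[s_key]."""
    # Build phase: one pass over S, grouping its tuples by join value.
    index = defaultdict(list)
    for t2 in s_rows:
        index[t2[s_key]].append(t2)
    # Probe phase: one pass over R; each match costs O(1) expected time.
    out = []
    for t1 in r_rows:
        for t2 in index.get(t1[r_key], []):
            out.append(t1 + t2)
    return out


# R(X,U) and S(Y,V) from the Join slide, joined on X = Y.
R = [("a1", "b1"), ("a1", "b2"), ("a2", "b3"), ("a3", "b4")]
S = [("a1", "c1"), ("a1", "c2"), ("a3", "c3"), ("a4", "c4")]
T = hash_join(R, S, r_key=0, s_key=0)
```

The build phase touches each tuple of S once and the probe phase does work proportional to |R| plus the number of output tuples, which matches the O(|Input| + |Output|) figure in expectation.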

Projection

$\Pi_{AC}(T)$

Projects T on the attributes A and C.

T
A   B   C
a1  b1  c1
a1  b1  c2
a1  b2  c1
a1  b2  c2
a3  b4  c3

$\Pi_{AC}(T)$
A   C
a1  c1
a1  c2
a3  c3

Note: projection does duplicate elimination.

Selection

$\sigma_{A=a}(R)$

Returns all rows where attribute A has value a.

R
A   B   C
a1  b1  c1
a1  b1  c2
a1  b2  c1
a1  b2  c2
a3  b4  c3

$\sigma_{C=c2}(R)$
A   B   C
a1  b1  c2
a1  b2  c2
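Projection and selection are both one-pass operators. A minimal Python sketch follows, with the hypothetical helper names `project` and `select_eq`, applied to the five-tuple relation used on the Projection and Selection slides.

```python
def project(schema, rows, attrs):
    """Projection: keep only the columns in attrs; the set removes duplicates."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(t[i] for i in idx) for t in rows}


def select_eq(schema, rows, attr, value):
    """Selection: keep the rows whose attribute attr equals value."""
    i = schema.index(attr)
    return {t for t in rows if t[i] == value}


schema = ["A", "B", "C"]
T = {
    ("a1", "b1", "c1"), ("a1", "b1", "c2"),
    ("a1", "b2", "c1"), ("a1", "b2", "c2"),
    ("a3", "b4", "c3"),
}

proj_ac = project(schema, T, ["A", "C"])   # five input rows collapse to three
sel_c2 = select_eq(schema, T, "C", "c2")   # two rows have C = c2
```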

Queries

Relational Algebra

Defined alternatively as:

  • Relational Algebra: {⋈,σ,Π,∪,−}, or
  • Relational Calculus, or
  • First Order Logic: {∧,∨,∃,∀,¬,=}, or
  • Non-recursive datalog with negation, or
  • A certain well-behaved fragment of SQL

Conjunctive queries

Defined as:

  • {⋈,σ,Π}, or
  • {∧,∃,=}, or
  • A single datalog rule, or
  • select-from-where SQL queries

This course: full conjunctive queries, meaning without Π.


Conjunctive Queries

Example: Q1(x,y,z,u) = R(x,y),S(y,z),T(z,u)

  • Relational Algebra: (R(x,y) ⋈ S(y,z)) ⋈ T(z,u)
  • First Order Logic: Q1 = {(x,y,z,u) | (x,y) ∈ R ∧ (y,z) ∈ S ∧ (z,u) ∈ T}

Example: Q2(x,u) = R(x,y),S(y,z),T(z,u)

  • Relational Algebra: Π_{x,u}((R(x,y) ⋈ S(y,z)) ⋈ T(z,u))
  • First Order Logic: Q2 = {(x,u) | ∃y∃z((x,y) ∈ R ∧ (y,z) ∈ S ∧ (z,u) ∈ T)}

Traditional Approach to Computing Conjunctive Queries

Q(x,y,z) = R(x,y),S(y,z),T(z,x)

The optimizer generates a query plan:

  Temp(x,y,z) = R(x,y) ⋈ S(y,z)
  Q(x,y,z) = Temp(x,y,z) ⋈ T(z,x)

The optimizer examines many possible plans and evaluates the cheapest one.

Problem: intermediate results may be large, and their sizes are very hard to estimate.
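To see how large the intermediate result can get, here is a small Python sketch on a hand-made instance. The instance (all R-tuples and S-tuples share the single y-value 0, and T is the diagonal) is an assumption picked to make the effect visible; it is not taken from the lecture.

```python
n = 100

# Every tuple of R joins with every tuple of S through the single value y = 0.
R = {(x, 0) for x in range(n)}    # R(x, y)
S = {(0, z) for z in range(n)}    # S(y, z)
T = {(k, k) for k in range(n)}    # T(z, x): only pairs with z == x

# Plan step 1: Temp(x,y,z) = R(x,y) join S(y,z)
temp = {(x, y, z) for (x, y) in R for (y2, z) in S if y == y2}

# Plan step 2: Q(x,y,z) = Temp(x,y,z) join T(z,x)
Q = {(x, y, z) for (x, y, z) in temp if (z, x) in T}
```

Each input has n = 100 tuples and the answer has 100 tuples, but Temp has n² = 10,000. Any plan that joins two relations first can be hit by an instance of this kind.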


Upper Bound on the Size of the Answer

Consider the join of two relations: Q(x,y,z) = R(x,y),S(y,z)

Question: if $|R| = m_1$, $|S| = m_2$, how large can $|Q|$ be?

  • Can be 0
  • Can be $m_1 m_2$

Answer: $0 \le |Q| \le m_1 m_2$.


Upper Bound on the Size of the Answer

Q(x,y,z) = R(x,y),S(y,z),T(z,x)

Question: if $|R| = m_1$, $|S| = m_2$, $|T| = m_3$, how large can the result be?

  • Naive answer: $\le m_1 m_2 m_3$ (why?)
  • Better answer: $\le m_1 m_2$ (why?)
  • But also: $\le m_1 m_3$, $\le m_2 m_3$


The Hypergraph of a Query

Definition: let Q be a full conjunctive query without self-joins. The hypergraph G of Q consists of:

  • Nodes(G) = Vars(Q), the set of variables of Q
  • HyperEdges(G) = Atoms(Q), the set of atoms of Q

Examples:

  • Q(x,y,z) = R(x,y),S(y,z),T(z,x)
    [Figure: hypergraph on nodes x, y, z]
  • Q(x,y,z,u) = R(x,y,z),S(x),T(y),K(z),M(x,u)
    [Figure: hypergraph on nodes x, y, z, u]
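The definition translates directly into a data structure. In the sketch below, a query is assumed to be given as a dictionary from relation names to variable tuples; that encoding and the function name `hypergraph` are illustrative choices only.

```python
def hypergraph(atoms):
    """Return (nodes, hyperedges) for a query given as {relation name: variables}."""
    nodes = sorted({v for variables in atoms.values() for v in variables})
    edges = {name: frozenset(variables) for name, variables in atoms.items()}
    return nodes, edges


triangle = {"R": ("x", "y"), "S": ("y", "z"), "T": ("z", "x")}
second = {"R": ("x", "y", "z"), "S": ("x",), "T": ("y",), "K": ("z",), "M": ("x", "u")}

tri_nodes, tri_edges = hypergraph(triangle)
sec_nodes, sec_edges = hypergraph(second)
```

Since the query has no self-joins, relation names identify the hyperedges uniquely, which is why a dictionary keyed by name is enough here.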

Fractional Edge Cover / Vertex Packing of a Hypergraph G

G = nodes $x_1, \ldots, x_k$ and hyperedges $R_1, \ldots, R_\ell$.

An edge cover = a subset of edges that together contain all nodes.

Definition: a fractional edge cover = a sequence of non-negative numbers $u_1, \ldots, u_\ell$ such that:

  $\forall i: \sum_{j : x_i \in R_j} u_j \ge 1$

Note: every edge cover is also a fractional edge cover (why?)

Definition: a fractional vertex packing = a sequence of non-negative numbers $v_1, \ldots, v_k$ such that:

  $\forall j: \sum_{i : x_i \in R_j} v_i \le 1$

Duality: $\min_u \sum_j u_j = \max_v \sum_i v_i = \rho^*$, the fractional edge covering number.
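Both definitions are easy to check mechanically. The following Python sketch tests them on the triangle hypergraph; the function names and the dictionary encoding of weights are assumptions made for illustration, and the weights are allowed to be zero so that ordinary edge covers qualify.

```python
from fractions import Fraction


def is_fractional_edge_cover(edges, u):
    """Every node must receive total weight >= 1 from the edges containing it."""
    nodes = set().union(*edges.values())
    return all(w >= 0 for w in u.values()) and all(
        sum(u[e] for e, members in edges.items() if x in members) >= 1
        for x in nodes
    )


def is_fractional_vertex_packing(edges, v):
    """Every edge must carry total node weight <= 1."""
    return all(w >= 0 for w in v.values()) and all(
        sum(v[x] for x in members) <= 1 for members in edges.values()
    )


edges = {"R": {"x", "y"}, "S": {"y", "z"}, "T": {"z", "x"}}
half = Fraction(1, 2)
u = {"R": half, "S": half, "T": half}   # candidate cover
v = {"x": half, "y": half, "z": half}   # candidate packing

cover_ok = is_fractional_edge_cover(edges, u)
packing_ok = is_fractional_vertex_packing(edges, v)
not_a_cover = is_fractional_edge_cover(edges, {"R": 1, "S": 0, "T": 0})
```

Here the cover and the packing both have total weight 3/2. Since any packing's total is at most any cover's total, equal totals certify that both are optimal, so ρ* = 3/2 for the triangle.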


AGM Inequality

Full conjunctive query: $Q(\mathbf{x}) = R_1(\mathbf{x}_1), \ldots, R_\ell(\mathbf{x}_\ell)$
Relation sizes: $|R_1| = m_1, \ldots, |R_\ell| = m_\ell$

Proposition (Simple!): let $R_{i_1}, \ldots, R_{i_u}$ be any edge cover. Then $|Q| \le m_{i_1} \cdot m_{i_2} \cdots m_{i_u}$ (proof in class).

Atserias, Grohe and Marx proved:

Theorem (AGM’13): let $u_1, \ldots, u_\ell$ be any fractional edge cover. Then

  $|Q| \le m_1^{u_1} \cdot m_2^{u_2} \cdots m_\ell^{u_\ell}$

We will prove it today. But first, let’s see examples.



AGM Inequality – A Simple Example

$AGM_u(Q) = m_1^{u_1} \cdot m_2^{u_2} \cdots m_\ell^{u_\ell}$

Q(x,y,z) = R(x,y),S(y,z),T(z,x), with $|R| = |S| = |T| = m$

A fractional edge cover: u = (1/2,1/2,1/2)

[Figure: triangle on nodes x, y, z with weight 1/2 on each edge]

It follows that $|Q| \le m^{1/2} m^{1/2} m^{1/2} = m^{3/2}$.

With m (typed) edges you can build at most $m^{3/2}$ triangles!
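The bound m^{3/2} is attained on some instances. The Python sketch below builds one such instance, assuming each relation is the full k × k grid over a domain of size k; this particular instance is an illustration and is not taken from the slides.

```python
k = 10
dom = range(k)

# Each relation is the complete k x k grid, so m = k * k.
R = {(x, y) for x in dom for y in dom}
S = {(y, z) for y in dom for z in dom}
T = {(z, x) for z in dom for x in dom}
m = len(R)

# Brute-force evaluation of Q(x,y,z) = R(x,y), S(y,z), T(z,x).
Q = {(x, y, z) for (x, y) in R for z in dom if (y, z) in S and (z, x) in T}
```

Here m = 100 and |Q| = k³ = 1000 = m^{3/2}. Comparing |Q|² with m³ keeps the check in integer arithmetic.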

AGM Bound

Definition: $AGM(Q) = \min_u m_1^{u_1} \cdot m_2^{u_2} \cdots m_\ell^{u_\ell}$, where u ranges over the fractional edge covers of Q.

Obviously: $|Q| \le AGM(Q)$.

Example: Q(x,y,z) = R(x,y),S(y,z),T(z,x), with $|R| = m_1$, $|S| = m_2$, $|T| = m_3$

  u               AGM_u(Q)
  (1,1,0)         $m_1 m_2$
  (1,0,1)         $m_1 m_3$
  (0,1,1)         $m_2 m_3$
  (1/2,1/2,1/2)   $(m_1 m_2 m_3)^{1/2}$

  AGM(Q) = the minimum of these four values.

Example: Q(x,y,z,v,w) = R(x,y),S(y,z),T(z,v),K(v,w)

  u           AGM_u(Q)
  (1,0,1,1)   $m_1 m_3 m_4$
  (1,1,0,1)   $m_1 m_2 m_4$

  AGM(Q) = the minimum of these two values.
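For the triangle query, the minimum over the four covers listed on the slide can be computed directly. The Python sketch below uses the hypothetical helper names `agm_u` and `agm_triangle`, and it only considers those four covers rather than solving the general optimization problem.

```python
import math


def agm_u(sizes, u):
    """AGM_u(Q): product of m_j ** u_j for one fractional edge cover u."""
    return math.prod(m ** w for m, w in zip(sizes, u))


def agm_triangle(m1, m2, m3):
    """Minimum of AGM_u over the four covers listed for the triangle query."""
    covers = [(1, 1, 0), (1, 0, 1), (0, 1, 1), (0.5, 0.5, 0.5)]
    return min(agm_u((m1, m2, m3), u) for u in covers)


balanced = agm_triangle(100, 100, 100)   # the fractional cover wins
skewed = agm_triangle(10, 10, 10000)     # the integral cover (1,1,0) wins
```

With equal sizes the fractional cover gives the smallest value, m^{3/2}. When one relation is much larger than the other two, an integral cover that avoids it is better.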


AGM Bound vs. Fractional Edge Covering Number

$AGM_u(Q) = m_1^{u_1} \cdot m_2^{u_2} \cdots m_\ell^{u_\ell}$

$AGM(Q) = \min_u AGM_u(Q)$, and $\log AGM(Q)$ is the optimal value of the linear program:

  minimize $\sum_j u_j \log m_j$
  subject to $\forall i: \sum_{j : x_i \in R_j} u_j \ge 1$ and $u_j \ge 0$

Notice: when $m_1 = \cdots = m_\ell = m$, then $AGM(Q) = m^{\rho^*}$.

Next: we will prove the AGM bound.
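As a worked instance, the linear program for the triangle query Q(x,y,z) = R(x,y),S(y,z),T(z,x) can be written out as follows. The names $u_R, u_S, u_T$ for the three weights are introduced here for readability.

```latex
\begin{aligned}
\text{minimize}\quad   & u_R \log m_1 + u_S \log m_2 + u_T \log m_3 \\
\text{subject to}\quad & u_R + u_T \ge 1 \quad (\text{node } x) \\
                       & u_R + u_S \ge 1 \quad (\text{node } y) \\
                       & u_S + u_T \ge 1 \quad (\text{node } z) \\
                       & u_R,\, u_S,\, u_T \ge 0
\end{aligned}
```

When $m_1 = m_2 = m_3 = m$ the objective is $(u_R + u_S + u_T) \log m$. Adding the three constraints gives $u_R + u_S + u_T \ge 3/2$, which is attained at $u = (1/2, 1/2, 1/2)$, so $\rho^* = 3/2$ and $AGM(Q) = m^{3/2}$.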


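Friedgut’s Inequality

Cauchy-Schwarz:

  $\sum_i a_i b_i \le \left(\sum_i a_i^2\right)^{1/2} \left(\sum_i b_i^2\right)^{1/2}$

Triangle:

  $\sum_{i,j,k} a_{ij} b_{jk} c_{ki} \le \left(\sum_{i,j} a_{ij}^2\right)^{1/2} \left(\sum_{j,k} b_{jk}^2\right)^{1/2} \left(\sum_{k,i} c_{ki}^2\right)^{1/2}$

Hölder ($u + v + w \ge 1$):

  $\sum_i a_i b_i c_i \le \left(\sum_i a_i^{1/u}\right)^u \left(\sum_i b_i^{1/v}\right)^v \left(\sum_i c_i^{1/w}\right)^w$

Theorem (Friedgut’04): let $Q(\mathbf{x}) = R_1(\mathbf{x}_1), \ldots, R_\ell(\mathbf{x}_\ell)$ be a query and $u_1, \ldots, u_\ell$ be a fractional edge cover. Then, for non-negative numbers $a_{j,\mathbf{x}_j}$:

  $\sum_{\mathbf{x}} a_{1,\mathbf{x}_1} \cdots a_{\ell,\mathbf{x}_\ell} \le \left(\sum_{\mathbf{x}_1} a_{1,\mathbf{x}_1}^{1/u_1}\right)^{u_1} \cdots \left(\sum_{\mathbf{x}_\ell} a_{\ell,\mathbf{x}_\ell}^{1/u_\ell}\right)^{u_\ell}$

What are the queries in the examples above?

  • $Q_{\text{Cauchy-Schwarz}}(x) = R(x), S(x)$
  • $Q_{\text{triangle}}(x,y,z) = R(x,y), S(y,z), T(z,x)$
  • $Q_{\text{Hölder}}(x) = R(x), S(x), T(x)$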
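The triangle case can be sanity-checked numerically. The Python sketch below evaluates both sides for random non-negative matrices and for all-ones matrices. The helper names `rand_matrix` and `triangle_sides` are hypothetical, and a numeric check of this kind illustrates the inequality but proves nothing.

```python
import random


def rand_matrix(n):
    """n x n matrix of random non-negative entries."""
    return [[random.random() for _ in range(n)] for _ in range(n)]


def triangle_sides(a, b, c, n):
    """Left and right sides of the triangle inequality for n x n matrices."""
    lhs = sum(a[i][j] * b[j][k] * c[k][i]
              for i in range(n) for j in range(n) for k in range(n))

    def norm(mat):
        # square root of the sum of squared entries
        return sum(v * v for row in mat for v in row) ** 0.5

    return lhs, norm(a) * norm(b) * norm(c)


random.seed(0)
n = 5
lhs, rhs = triangle_sides(rand_matrix(n), rand_matrix(n), rand_matrix(n), n)

# 0/1 weights: all-ones matrices correspond to three complete relations.
ones = [[1] * n for _ in range(n)]
lhs_ones, rhs_ones = triangle_sides(ones, ones, ones, n)
```

With 0/1 entries the left side counts triangles and each factor on the right is the square root of a relation's size, so the all-ones case gives n³ on both sides: equality, matching the m^{3/2} bound with m = n².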


slide-73
SLIDE 73

Background AGM Friedgut Optimal Algorithms Summary

Friedgut’s Inequality – Proof

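Query $Q(\mathbf{x}) = R_1(\mathbf{x}_1), \ldots, R_\ell(\mathbf{x}_\ell)$, fractional cover $u_1, \ldots, u_\ell$:

$$\sum_{\mathbf{x}} a_{1,\mathbf{x}_1}^{u_1} \cdots a_{\ell,\mathbf{x}_\ell}^{u_\ell} \le \Big(\sum_{\mathbf{x}_1} a_{1,\mathbf{x}_1}\Big)^{u_1} \cdots \Big(\sum_{\mathbf{x}_\ell} a_{\ell,\mathbf{x}_\ell}\Big)^{u_\ell}$$

Proof: by induction on $|\mathbf{x}|$.

Base Case. $|\mathbf{x}| = 1$: $Q(x) = R_1(x), \ldots, R_\ell(x)$, with $u_1 + \cdots + u_\ell \ge 1$. Prove:

$$\sum_{x} a_{1,x}^{u_1} \cdots a_{\ell,x}^{u_\ell} \le \Big(\sum_{x} a_{1,x}\Big)^{u_1} \cdots \Big(\sum_{x} a_{\ell,x}\Big)^{u_\ell}$$

This is Hölder's inequality.

Induction Step. Pick a variable $x$, and remove it. For example, $Q(x,y,z) = R(x,y), S(y,z), T(z,x)$ becomes $Q'(y,z) = R'(y), S(y,z), T'(z)$:

$$
\begin{aligned}
\sum_{x,y,z} a_{xy}^{u_1} b_{yz}^{u_2} c_{zx}^{u_3}
&= \sum_{y,z} b_{yz}^{u_2} \sum_{x} a_{xy}^{u_1} c_{zx}^{u_3} && \text{group by } \textstyle\sum_x \\
&\le \sum_{y,z} b_{yz}^{u_2} \Big(\sum_{x} a_{xy}\Big)^{u_1} \Big(\sum_{x} c_{zx}\Big)^{u_3} && \text{Hölder, } u_1 + u_3 \ge 1 \\
&= \sum_{y,z} b_{yz}^{u_2} A_y^{u_1} C_z^{u_3} \\
&\le \Big(\sum_{y,z} b_{yz}\Big)^{u_2} \Big(\sum_{y} A_y\Big)^{u_1} \Big(\sum_{z} C_z\Big)^{u_3} && \text{induction for } Q' \\
&= \Big(\sum_{y,z} b_{yz}\Big)^{u_2} \Big(\sum_{x,y} a_{xy}\Big)^{u_1} \Big(\sum_{z,x} c_{zx}\Big)^{u_3}
\end{aligned}
$$

Here $A_y = \sum_x a_{xy}$ and $C_z = \sum_x c_{zx}$ are the weights attached to $R'(y)$ and $T'(z)$.

QED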
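The inequality can be sanity-checked numerically for the triangle query with the fractional cover $(1/2, 1/2, 1/2)$. The sketch below is only an illustration, not part of the proof: the function name `friedgut_triangle_sides`, the domain size, and the random weights are assumptions made for this example.

```python
import itertools
import random


def friedgut_triangle_sides(a, b, c, n, u=(0.5, 0.5, 0.5)):
    """Return (lhs, rhs) of Friedgut's inequality for the triangle query.

    a[x][y], b[y][z], c[z][x] are non-negative weights over the domain
    {0, ..., n-1}; u is a fractional edge cover of the triangle.
    """
    u1, u2, u3 = u
    # Left-hand side: sum over all (x, y, z) of a_xy^u1 * b_yz^u2 * c_zx^u3.
    lhs = sum(
        a[x][y] ** u1 * b[y][z] ** u2 * c[z][x] ** u3
        for x, y, z in itertools.product(range(n), repeat=3)
    )

    def total(w):
        return sum(sum(row) for row in w)

    # Right-hand side: product of the total weights raised to the cover.
    rhs = total(a) ** u1 * total(b) ** u2 * total(c) ** u3
    return lhs, rhs


def random_weights(n):
    return [[random.random() for _ in range(n)] for _ in range(n)]


random.seed(0)
n = 5
checks = []
for _ in range(100):
    lhs, rhs = friedgut_triangle_sides(
        random_weights(n), random_weights(n), random_weights(n), n
    )
    checks.append(lhs <= rhs + 1e-9)  # small tolerance for rounding
print(all(checks))
```

With all weights equal to 1 the two sides coincide ($n^3$ on both sides), which is the case behind the tightness of the AGM bound.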


The AGM Inequality – Proof

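Query $Q(\mathbf{x}) = R_1(\mathbf{x}_1), \ldots, R_\ell(\mathbf{x}_\ell)$, fractional cover $u_1, \ldots, u_\ell$.

Sizes $|R_1| = m_1, \ldots, |R_\ell| = m_\ell$.

Prove $|Q| \le m_1^{u_1} \cdots m_\ell^{u_\ell}$.

Let Dom = the domain of all constants in the relations $R_1, \ldots, R_\ell$. For every $j = 1, \ldots, \ell$, and every tuple $\mathbf{x}_j \in \mathrm{Dom}^{|\mathbf{x}_j|}$, define:

$$a_{j,\mathbf{x}_j} = \begin{cases} 1 & \text{if the tuple } \mathbf{x}_j \text{ belongs to } R_j \\ 0 & \text{otherwise} \end{cases}$$

Then:

$$m_j = |R_j| = \sum_{\mathbf{x}_j \in \mathrm{Dom}^{|\mathbf{x}_j|}} a_{j,\mathbf{x}_j}, \qquad |Q| = \sum_{\mathbf{x} \in \mathrm{Dom}^{|\mathbf{x}|}} a_{1,\mathbf{x}_1} \cdots a_{\ell,\mathbf{x}_\ell}$$

Now use Friedgut's inequality. Since each $a_{j,\mathbf{x}_j} \in \{0,1\}$, raising it to a power $u_j > 0$ leaves it unchanged, so:

$$|Q| = \sum_{\mathbf{x}} a_{1,\mathbf{x}_1}^{u_1} \cdots a_{\ell,\mathbf{x}_\ell}^{u_\ell} \le \Big(\sum_{\mathbf{x}_1} a_{1,\mathbf{x}_1}\Big)^{u_1} \cdots \Big(\sum_{\mathbf{x}_\ell} a_{\ell,\mathbf{x}_\ell}\Big)^{u_\ell} = m_1^{u_1} \cdots m_\ell^{u_\ell}$$

QED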
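To make the bound concrete, the following sketch evaluates the triangle query naively and compares the output size with $m_1^{u_1} m_2^{u_2} m_3^{u_3}$ for the cover $(1/2, 1/2, 1/2)$. The helper names `triangle_join` and `agm_bound` and the complete test instance are assumptions chosen for illustration.

```python
import itertools
import math


def triangle_join(R, S, T):
    """Naive evaluation of Q(x,y,z) = R(x,y), S(y,z), T(z,x).

    R, S, T are sets of pairs. This is for checking sizes only; it is not
    a worst-case optimal algorithm.
    """
    return {
        (x, y, z)
        for (x, y) in R
        for (y2, z) in S
        if y2 == y and (z, x) in T
    }


def agm_bound(sizes, cover):
    """Product of m_j ** u_j for relation sizes m_j and cover weights u_j."""
    return math.prod(m ** u for m, u in zip(sizes, cover))


# Complete instance over a domain of size n: each relation has n^2 tuples.
n = 4
full = set(itertools.product(range(n), repeat=2))
out = triangle_join(full, full, full)
bound = agm_bound([len(full)] * 3, (0.5, 0.5, 0.5))
print(len(out), bound)
```

On this complete instance the output has $n^3 = 64$ triples and the bound is $(n^2)^{3/2} = 64$, so the inequality holds with equality.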


Computing Full Conjunctive Queries

Recall: all database systems compute one join at a time.

The intermediate results may be much larger than the maximum output size, AGM(Q).

Goal: design an algorithm that runs in time O(AGM(Q)).

Worst-case optimal algorithm: one that runs in time O(AGM(Q)).


Worst-Case Optimal Algorithm

History:

An algorithm that runs in time O(n ⋅ AGM(Q)) was given in [AGM’2013].

The first worst-case optimal algorithm to be published was the NPRR algorithm by Ngo, Porat, Ré, Rudra, in PODS’2012. It is complex.

An earlier algorithm is Leapfrog Trie-join (LFTJ), by LogicBlox. Veldhuizen proved in ICDT’2014 that LFTJ is also worst-case optimal.

Ngo, Ré, Rudra gave a very simple worst-case optimal algorithm, with a very simple optimality proof, in SIGMOD Record’2013. The algorithm is called Generic-Join.

Next: we discuss Generic-Join.


Generic Join

Compute $Q(\mathbf{x}) = R_1(\mathbf{x}_1), \ldots, R_\ell(\mathbf{x}_\ell)$

If $|\mathbf{x}| = 1$ then return $R_1 \cap \cdots \cap R_\ell$.

Otherwise, choose a variable $x$, occurring in atoms $R_{i_1}, \ldots, R_{i_k}$.

Compute $A = \Pi_x(R_{i_1}) \cap \cdots \cap \Pi_x(R_{i_k})$.

For each $a \in A$, compute $\mathrm{Result}_a = Q[a/x]$ using Generic-Join.

Return $\bigcup_a \mathrm{Result}_a$.

Runtime: $O(\mathrm{AGM}(Q))$ (Recall: we ignore log-factors)

Example: $Q(x,y,z) = R(x,y), S(y,z), T(z,x)$

Compute $A = \Pi_x(R) \cap \Pi_x(T) = \{a_1, \ldots, a_n\}$.

For each $a_i \in A$, denote $R'(y) = R(a_i, y)$, $T'(z) = T(z, a_i)$.

Compute $\mathrm{Result}_i(a_i, y, z) = R'(y), S(y,z), T'(z)$.

Return $\bigcup_i \mathrm{Result}_i$.

Runtime: $O(m^{3/2})$ assuming $|R| = |S| = |T| = m$.
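The recursion above can be sketched in a few lines of Python. This is a minimal illustration of the control flow only, under stated assumptions: the representation of atoms as `(variables, set of tuples)` pairs and the function name `generic_join` are choices made here, and the code scans whole relations for every projection and selection. It therefore does not achieve the O(AGM(Q)) runtime, which needs indexed (for example sorted or trie-based) access and intersections that cost about as much as the smallest set.

```python
def generic_join(atoms, variables):
    """Evaluate a full conjunctive query by eliminating one variable at a time.

    atoms: list of (vars, tuples), where vars is a tuple of variable names
           and tuples is a set of value tuples of the same length.
    variables: the order in which variables are eliminated.
    Returns a list of dicts mapping every variable to a value.
    """
    if not variables:
        # All variables are bound: the binding survives if no atom was emptied.
        return [{}] if all(tuples for _, tuples in atoms) else []
    x, rest = variables[0], variables[1:]

    # A = intersection of the projections on x of all atoms that contain x.
    candidates = None
    for vars_, tuples in atoms:
        if x in vars_:
            i = vars_.index(x)
            proj = {t[i] for t in tuples}
            candidates = proj if candidates is None else candidates & proj

    results = []
    for a in candidates or ():
        # Q[a/x]: keep tuples with x = a and drop the x column.
        reduced = []
        for vars_, tuples in atoms:
            if x in vars_:
                i = vars_.index(x)
                reduced.append((
                    vars_[:i] + vars_[i + 1:],
                    {t[:i] + t[i + 1:] for t in tuples if t[i] == a},
                ))
            else:
                reduced.append((vars_, tuples))
        for binding in generic_join(reduced, rest):
            results.append({x: a, **binding})
    return results


# Triangle query Q(x,y,z) = R(x,y), S(y,z), T(z,x) on a small instance.
R = {(1, 2), (1, 3), (2, 3)}
S = {(2, 3), (3, 1)}
T = {(3, 1), (1, 2)}
result = generic_join(
    [(("x", "y"), R), (("y", "z"), S), (("z", "x"), T)],
    ("x", "y", "z"),
)
print(sorted((r["x"], r["y"], r["z"]) for r in result))
```

The sketch recurses down to zero variables instead of stopping at $|\mathbf{x}| = 1$; the last recursive level then plays the role of the intersection $R_1 \cap \cdots \cap R_\ell$ in the base case.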


Discussion: Generic Join vs. Yannakakis’ Algorithm

[Yannakakis’82] described an algorithm for computing any acyclic query in time $O(|\mathrm{Input}| + |\mathrm{Output}|)$.

Basic idea: first perform a semijoin reduction to ensure that all intermediate results are $\le |\mathrm{Output}|$, then compute the query in standard fashion, one join at a time.

$Q(x_0,x_1,x_2,x_3,x_4,x_5) = R_1(x_0,x_1), R_2(x_1,x_2), R_3(x_2,x_3), R_4(x_3,x_4), R_5(x_4,x_5)$

$|R_1| = \cdots = |R_5| = m$, $\mathrm{AGM}(Q) = m^3$ (optimal cover: $(1,0,1,0,1)$).

There are instances where $Q = \emptyset$, hence Yannakakis’ algorithm takes time $O(m)$, yet Generic-Join takes time $\Omega(m^3)$ (discuss in class).

Newer work on instance-optimal join algorithms: [Ngo’2014].
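The semijoin reduction step can be sketched for the path query as follows. This is a simplified illustration under assumptions, not Yannakakis' algorithm in full generality: the function name `semijoin_reduce_path` is made up here, the code handles only path-shaped queries over binary relations, and the test instance with empty output is an arbitrary choice, not necessarily the one intended for the class discussion.

```python
def semijoin_reduce_path(relations):
    """Full semijoin reduction for a path query R1(x0,x1), R2(x1,x2), ...

    relations: list of sets of pairs, in path order.
    Returns reduced copies in which every surviving tuple takes part in at
    least one output tuple. With hash sets, each pass is linear in the input.
    """
    rels = [set(r) for r in relations]
    # Right-to-left pass: keep (a, b) in R_i only if b starts a tuple of R_{i+1}.
    for i in range(len(rels) - 2, -1, -1):
        firsts = {a for a, _ in rels[i + 1]}
        rels[i] = {(a, b) for a, b in rels[i] if b in firsts}
    # Left-to-right pass: keep (a, b) in R_{i+1} only if a ends a tuple of R_i.
    for i in range(len(rels) - 1):
        lasts = {b for _, b in rels[i]}
        rels[i + 1] = {(a, b) for a, b in rels[i + 1] if a in lasts}
    return rels


# Instance with |R1| = ... = |R5| = m and empty output: R5 never matches R4.
m = 4
R1 = {(i, 0) for i in range(m)}
R2 = {(0, j) for j in range(m)}
R3 = {(j, 0) for j in range(m)}
R4 = {(0, k) for k in range(m)}
R5 = {(m + k, 0) for k in range(m)}
reduced = semijoin_reduce_path([R1, R2, R3, R4, R5])
print([len(r) for r in reduced])
```

After the reduction every relation is empty, so the subsequent one-join-at-a-time phase has nothing left to do; this is how the total work stays linear in the input when $Q = \emptyset$.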


Summary of Lecture 1

Joins and conjunctive queries are very important: in SQL, in data analytics, everywhere.

All traditional query processing algorithms compute one join at a time (except LogicBlox!), which is suboptimal.

The AGM bound gives a tight upper bound on the query output size, expressed in terms of a fractional edge cover.

The Generic-Join algorithm computes the query in time bounded by the AGM bound (ignoring log-factors), hence it is worst-case optimal.
