DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY - - PowerPoint PPT Presentation

data analytics using deep learning
SMART_READER_LITE
LIVE PREVIEW

DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY - - PowerPoint PPT Presentation

DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY ARULRAJ L E C T U R E # 0 9 : Q U E R Y O P T I M I Z A T I O N administrivia Reminders Assignment 1: postponed to next Monday Sign up for discussion slots on Thursday


slide-1
SLIDE 1

DATA ANALYTICS USING DEEP LEARNING

GT 8803 // FALL 2019 // JOY ARULRAJ

L E C T U R E # 0 9 : Q U E R Y O P T I M I Z A T I O N

slide-2
SLIDE 2

GT 8803 // Fall 2019

administrivia

  • Reminders

– Assignment 1: postponed to next Monday – Sign up for discussion slots on Thursday – Proposal presentations on next Wednesday

2

slide-3
SLIDE 3

GT 8803 // Fall 2019

LAST CLASS

  • Query execution models

– Tuple-at-a-time – Operator-at-a-time – Vector-at-a-time

3

SELECT A.id, B.value FROM A, B WHERE A.id = B.id AND B.value > 100

A B

A.id=B.id value>100 A.id, B.value

s

p

slide-4
SLIDE 4

GT 8803 // Fall 2019

LAST CLASS

  • Access methods

– Sequential scan – Index scan – Multi-index scan

4

101 102 103 104

slide-5
SLIDE 5

GT 8803 // Fall 2019

LAST CLASS

  • Access methods

– Sequential scan – Index scan – Multi-index scan

5

101 102 103 104

Scan Direction

slide-6
SLIDE 6

GT 8803 // Fall 2019

LAST CLASS

  • Access methods

– Sequential scan – Index scan – Multi-index scan

6

101 102 103 104

Scan Direction

slide-7
SLIDE 7

GT 8803 // Fall 2019

LAST CLASS

  • Visual Query Execution Engine

– Filtering classifier, Sampling

7

OLD PLAN NEW PLAN

slide-8
SLIDE 8

GT 8803 // Fall 2019

TODAY’s AGENDA

  • Relational Algebra Equivalences
  • Plan Cost Estimation
  • Plan Enumeration
  • Visual Query Optimizer

8

slide-9
SLIDE 9

GT 8803 // Fall 2018

RELATIONAL ALGEBRA EQUIVALENCES

9

slide-10
SLIDE 10

GT 8803 // Fall 2019

ANATOMY OF A DATABASE SYSTEM

Connection Manager + Admission Control Query Parser Query Optimizer Query Executor Lock Manager (Concurrency Control) Access Methods (or Indexes) Buffer Pool Manager Log Manager Memory Manager + Disk Manager Networking Manager

10

Query Transactional Storage Manager Query Processor Shared Utilities Process Manager

Source: Anatomy of a Database System

slide-11
SLIDE 11

GT 8803 // Fall 2019

QUERY OPTIMIZATION

  • Remember that SQL is declarative.

– User tells the DBMS what answer they want, not how to get the answer.

  • There can be a big difference in performance

based on plan is used:

– 1.3 hours vs. 0.45 seconds

11

slide-12
SLIDE 12

GT 8803 // Fall 2019

IBM SYSTEM R

  • First implementation of a query optimizer.

People argued that the DBMS could never choose a query plan better than what a human could write.

  • A lot of the concepts from System R’s
  • ptimizer are still used today.

12

slide-13
SLIDE 13

GT 8803 // Fall 2019

QUERY OPTIMIZATION

  • Rule-based Optimizer

– Rewrite the query to remove inefficient things. – Does not require a cost model.

  • Cost-based Optimizer

– Use a cost model to evaluate multiple equivalent plans and pick the one with the lowest cost.

13

slide-14
SLIDE 14

GT 8803 // Fall 2018

QUERY OPTIMIZATION: OVERVIEW

14

slide-15
SLIDE 15

GT 8803 // Fall 2018

QUERY OPTIMIZATION: OVERVIEW

15

SQL Query

Parser

slide-16
SLIDE 16

GT 8803 // Fall 2018

QUERY OPTIMIZATION: OVERVIEW

16

SQL Query

Parser

Abstract Syntax Tree

Binder

slide-17
SLIDE 17

GT 8803 // Fall 2018

QUERY OPTIMIZATION: OVERVIEW

17

SQL Query

Parser

Abstract Syntax Tree

System Catalog

Binder

Name→Internal ID

slide-18
SLIDE 18

GT 8803 // Fall 2018

QUERY OPTIMIZATION: OVERVIEW

18

SQL Query

Parser

Abstract Syntax Tree Annotated AST

System Catalog Rewriter

(Optional)

Binder

Name→Internal ID

slide-19
SLIDE 19

GT 8803 // Fall 2018

QUERY OPTIMIZATION: OVERVIEW

19

SQL Query

Parser

Abstract Syntax Tree Annotated AST

System Catalog Rewriter

(Optional)

Binder Optimizer

Annotated AST Name→Internal ID

slide-20
SLIDE 20

GT 8803 // Fall 2018

QUERY OPTIMIZATION: OVERVIEW

20

SQL Query

Parser

Abstract Syntax Tree Annotated AST

System Catalog Rewriter

(Optional)

Binder Optimizer

Annotated AST Name→Internal ID

slide-21
SLIDE 21

GT 8803 // Fall 2018

QUERY OPTIMIZATION: OVERVIEW

21

SQL Query

Parser

Abstract Syntax Tree Annotated AST

Cost Model System Catalog Rewriter

(Optional)

Binder Optimizer

Annotated AST Name→Internal ID

slide-22
SLIDE 22

GT 8803 // Fall 2018

QUERY OPTIMIZATION: OVERVIEW

22

SQL Query

Parser

Abstract Syntax Tree Annotated AST

Query Plan Cost Model System Catalog Rewriter

(Optional)

Binder Optimizer

Annotated AST Name→Internal ID

slide-23
SLIDE 23

GT 8803 // Fall 2019

QUERY OPTIMIZATION IS NP-HARD

  • This is the hardest part of building a DBMS.
  • If you are good at this, you will get paid.
  • People are starting to look at employing ML

to improve the accuracy and efficacy of

  • ptimizers.

23

slide-24
SLIDE 24

GT 8803 // Fall 2019

RELATIONAL ALGEBRA EQUIVALENCES

  • Two relational algebra expressions are

equivalent if they generate the same set of tuples.

– The DBMS can identify better query plans without a cost model. – This is often called query rewriting.

24

slide-25
SLIDE 25

GT 8803 // Fall 2018

PREDICATE PUSHDOWN

25

student enrolled

s.sid=e.sid grade='A' s.name,e.cid

s

p

SELECT s.name, e.cid FROM student AS s, enrolled AS e WHERE s.sid = e.sid AND e.grade = 'A'

slide-26
SLIDE 26

GT 8803 // Fall 2018

PREDICATE PUSHDOWN

26

student enrolled

s.sid=e.sid grade='A' s.name,e.cid

s

p

student enrolled

s.sid=e.sid grade='A' s.name,e.cid

s

p

SELECT s.name, e.cid FROM student AS s, enrolled AS e WHERE s.sid = e.sid AND e.grade = 'A'

slide-27
SLIDE 27

GT 8803 // Fall 2018

RELATIONAL ALGEBRA EQUIVALENCES

27

πname, cid(σgrade='A'(student⋈enrolled)) πname, cid(student⋈(σgrade='A'(enrolled)))

=

SELECT s.name, e.cid FROM student AS s, enrolled AS e WHERE s.sid = e.sid AND e.grade = 'A'

slide-28
SLIDE 28

GT 8803 // Fall 2019

RELATIONAL ALGEBRA EQUIVALENCES

  • Selections:

– Perform filters as early as possible. – Reorder predicates so that the DBMS applies the most selective one first. – Break a complex predicate, and push down

σp1∧p2∧…pn(R) = σp1(σp2(…σpn(R)))

  • Simplify a complex predicate

– (X=Y AND Y=3) → X=3 AND Y=3

28

slide-29
SLIDE 29

GT 8803 // Fall 2019

RELATIONAL ALGEBRA EQUIVALENCES

  • Projections:

– Perform them early to create smaller tuples and reduce intermediate results (if duplicates are eliminated) – Project out all attributes except the ones requested

  • r required (e.g., joining keys)

29

slide-30
SLIDE 30

GT 8803 // Fall 2018

PROJECTION PUSHDOWN

30

student enrolled

s.sid=e.sid grade='A' s.name,e.cid

s

p

SELECT s.name, e.cid FROM student AS s, enrolled AS e WHERE s.sid = e.sid AND e.grade = 'A'

slide-31
SLIDE 31

GT 8803 // Fall 2018

PROJECTION PUSHDOWN

31

student enrolled

s.sid=e.sid grade='A' s.name,e.cid

s

p

student enrolled

s.sid=e.sid grade='A' s.name,e.cid

s

p

sid,cid

p

sid,namep

SELECT s.name, e.cid FROM student AS s, enrolled AS e WHERE s.sid = e.sid AND e.grade = 'A'

slide-32
SLIDE 32

GT 8803 // Fall 2019

MORE EXAMPLES

32

Source: Lukas Eder

SELECT * FROM A WHERE 1 = 0;

CREATE TABLE A ( id INT PRIMARY KEY, val INT NOT NULL );

slide-33
SLIDE 33

GT 8803 // Fall 2019

MORE EXAMPLES

33

Source: Lukas Eder

SELECT * FROM A WHERE 1 = 0;

CREATE TABLE A ( id INT PRIMARY KEY, val INT NOT NULL );

slide-34
SLIDE 34

GT 8803 // Fall 2019

MORE EXAMPLES

34

Source: Lukas Eder

SELECT * FROM A WHERE 1 = 0;X

CREATE TABLE A ( id INT PRIMARY KEY, val INT NOT NULL );

slide-35
SLIDE 35

GT 8803 // Fall 2019

MORE EXAMPLES

35

Source: Lukas Eder

SELECT * FROM A WHERE 1 = 0; SELECT * FROM A WHERE 1 = 1;

X

CREATE TABLE A ( id INT PRIMARY KEY, val INT NOT NULL );

slide-36
SLIDE 36

GT 8803 // Fall 2019

MORE EXAMPLES

36

Source: Lukas Eder

SELECT * FROM A WHERE 1 = 0; SELECT * FROM A WHERE 1 = 1;

X

CREATE TABLE A ( id INT PRIMARY KEY, val INT NOT NULL );

slide-37
SLIDE 37

GT 8803 // Fall 2019

MORE EXAMPLES

37

Source: Lukas Eder

SELECT * FROM A WHERE 1 = 0; SELECT * FROM A WHERE 1 = 1; SELECT * FROM A;

X

CREATE TABLE A ( id INT PRIMARY KEY, val INT NOT NULL );

slide-38
SLIDE 38

GT 8803 // Fall 2019

MORE EXAMPLES

  • Impossible / Unnecessary Predicates
  • Join Elimination

38

Source: Lukas Eder

SELECT * FROM A WHERE 1 = 0; SELECT A1.* FROM A AS A1 JOIN A AS A2 ON A1.id = A2.id; SELECT * FROM A WHERE 1 = 1; SELECT * FROM A;

X

CREATE TABLE A ( id INT PRIMARY KEY, val INT NOT NULL );

slide-39
SLIDE 39

GT 8803 // Fall 2019

MORE EXAMPLES

  • Impossible / Unnecessary Predicates
  • Join Elimination

39

Source: Lukas Eder

SELECT * FROM A WHERE 1 = 0; SELECT A1.* FROM A AS A1 JOIN A AS A2 ON A1.id = A2.id; SELECT * FROM A WHERE 1 = 1; SELECT * FROM A;

X

CREATE TABLE A ( id INT PRIMARY KEY, val INT NOT NULL );

slide-40
SLIDE 40

GT 8803 // Fall 2019

MORE EXAMPLES

  • Impossible / Unnecessary Predicates
  • Join Elimination

40

Source: Lukas Eder

SELECT * FROM A WHERE 1 = 0; SELECT * FROM A WHERE 1 = 1; SELECT * FROM A; SELECT * FROM A;

X

CREATE TABLE A ( id INT PRIMARY KEY, val INT NOT NULL );

slide-41
SLIDE 41

GT 8803 // Fall 2019

MORE EXAMPLES

41

Source: Lukas Eder

SELECT * FROM A AS A1 WHERE EXISTS(SELECT * FROM A AS A2 WHERE A1.id = A2.id);

CREATE TABLE A ( id INT PRIMARY KEY, val INT NOT NULL );

slide-42
SLIDE 42

GT 8803 // Fall 2019

MORE EXAMPLES

42

Source: Lukas Eder

SELECT * FROM A AS A1 WHERE EXISTS(SELECT * FROM A AS A2 WHERE A1.id = A2.id);

CREATE TABLE A ( id INT PRIMARY KEY, val INT NOT NULL );

slide-43
SLIDE 43

GT 8803 // Fall 2019

MORE EXAMPLES

43

Source: Lukas Eder

SELECT * FROM A;

CREATE TABLE A ( id INT PRIMARY KEY, val INT NOT NULL );

slide-44
SLIDE 44

GT 8803 // Fall 2019

MORE EXAMPLES

  • Ignoring Projections
  • Merging Predicates

44

Source: Lukas Eder

SELECT * FROM A WHERE val BETWEEN 1 AND 100 OR val BETWEEN 50 AND 150; SELECT * FROM A;

CREATE TABLE A ( id INT PRIMARY KEY, val INT NOT NULL );

slide-45
SLIDE 45

GT 8803 // Fall 2019

MORE EXAMPLES

  • Ignoring Projections
  • Merging Predicates

45

Source: Lukas Eder

SELECT * FROM A WHERE val BETWEEN 1 AND 100 OR val BETWEEN 50 AND 150; SELECT * FROM A;

CREATE TABLE A ( id INT PRIMARY KEY, val INT NOT NULL );

slide-46
SLIDE 46

GT 8803 // Fall 2019

MORE EXAMPLES

  • Ignoring Projections
  • Merging Predicates

46

Source: Lukas Eder

SELECT * FROM A WHERE val BETWEEN 1 AND 150; SELECT * FROM A;

CREATE TABLE A ( id INT PRIMARY KEY, val INT NOT NULL );

slide-47
SLIDE 47

GT 8803 // Fall 2019

RELATIONAL ALGEBRA EQUIVALENCES

  • Joins:

– Commutative, associative

R⋈S = S⋈R (R⋈S)⋈T = R⋈(S⋈T)

– How many different orderings are there for an n- way join?

47

slide-48
SLIDE 48

GT 8803 // Fall 2019

RELATIONAL ALGEBRA EQUIVALENCES

  • How many different orderings are there for an

n-way join?

  • Catalan number ≈4n

– Exhaustive enumeration will be too slow.

  • We’ll see in a second how an optimizer limits

the search space.

48

slide-49
SLIDE 49

GT 8803 // Fall 2018

PLAN COST ESTIMATION

49

slide-50
SLIDE 50

GT 8803 // Fall 2019

COST ESTIMATION

  • How long will a query take?

– CPU: Small cost; tough to estimate – Disk: # of block transfers – Memory: Amount of DRAM used

  • How many tuples will be read/written?
  • What statistics do we need to keep?

50

slide-51
SLIDE 51

GT 8803 // Fall 2019

STATISTICS

  • The DBMS stores internal statistics about

tables, attributes, and indexes in its internal catalog.

  • Different systems update them at different

times.

  • Manual invocations:

– Postgres/SQLite: ANALYZE – SQL Server: UPDATE STATISTICS

51

slide-52
SLIDE 52

GT 8803 // Fall 2019

STATISTICS

  • For each relation R, the DBMS maintains the

following information:

– NR: Number of tuples in R. – V(A,R): Number of distinct values for attribute A.

52

slide-53
SLIDE 53

GT 8803 // Fall 2019

DERIVABLE STATISTICS

53

slide-54
SLIDE 54

GT 8803 // Fall 2019

DERIVABLE STATISTICS

  • The selection cardinality SC(A,R) is the

average number of records with a value for an attribute A given NR/ V(A,R)

  • Note that this assumes data uniformity.

– 10,000 students, 10 colleges – how many students in SCS?

54

slide-55
SLIDE 55

GT 8803 // Fall 2019

SELECTION STATISTICS

55

slide-56
SLIDE 56

GT 8803 // Fall 2019

SELECTION STATISTICS

56

SELECT * FROM people WHERE id = 123

slide-57
SLIDE 57

GT 8803 // Fall 2019

SELECTION STATISTICS

  • Equality predicates on unique keys are easy to

estimate.

  • What about more complex predicates? What

is their selectivity?

57

SELECT * FROM people WHERE id = 123 SELECT * FROM people WHERE val > 1000 SELECT * FROM people WHERE age = 30 AND status = 'Lit'

slide-58
SLIDE 58

GT 8803 // Fall 2019

COMPLEX PREDICATES

  • The selectivity (sel) of a predicate P is the

fraction of tuples that qualify.

  • Formula depends on type of predicate:

– Equality – Range – Negation – Conjunction – Disjunction

58

slide-59
SLIDE 59

GT 8803 // Fall 2019

COMPLEX PREDICATES

  • The selectivity (sel) of a predicate P is the

fraction of tuples that qualify.

  • Formula depends on type of predicate:

– Equality – Range – Negation – Conjunction – Disjunction

59

slide-60
SLIDE 60

GT 8803 // Fall 2019

SELECTIONS – COMPLEX PREDICATES

60

SELECT * FROM people WHERE age = 2

slide-61
SLIDE 61

GT 8803 // Fall 2019

SELECTIONS – COMPLEX PREDICATES

  • Assume that V(age,people) has five distinct

values (0–4) and NR = 5

  • Equality Predicate: A=constant

– sel(A=constant) = SC(P) / NR – Example: sel(age=2) =

61

SELECT * FROM people WHERE age = 2

slide-62
SLIDE 62

GT 8803 // Fall 2019

SELECTIONS – COMPLEX PREDICATES

  • Assume that V(age,people) has five distinct

values (0–4) and NR = 5

  • Equality Predicate: A=constant

– sel(A=constant) = SC(P) / NR – Example: sel(age=2) =

62

1 2 3 4 count age

SELECT * FROM people WHERE age = 2

slide-63
SLIDE 63

GT 8803 // Fall 2019

SELECTIONS – COMPLEX PREDICATES

  • Assume that V(age,people) has five distinct

values (0–4) and NR = 5

  • Equality Predicate: A=constant

– sel(A=constant) = SC(P) / NR – Example: sel(age=2) =

63

1 2 3 4 count age

V(age,people)=5

SELECT * FROM people WHERE age = 2

slide-64
SLIDE 64

GT 8803 // Fall 2019

SELECTIONS – COMPLEX PREDICATES

  • Assume that V(age,people) has five distinct

values (0–4) and NR = 5

  • Equality Predicate: A=constant

– sel(A=constant) = SC(P) / NR – Example: sel(age=2) =

64

1 2 3 4 count age

V(age,people)=5 SC(age=2)=1

SELECT * FROM people WHERE age = 2

slide-65
SLIDE 65

GT 8803 // Fall 2019

SELECTIONS – COMPLEX PREDICATES

  • Assume that V(age,people) has five distinct

values (0–4) and NR = 5

  • Equality Predicate: A=constant

– sel(A=constant) = SC(P) / NR – Example: sel(age=2) =

65

1 2 3 4 count age

V(age,people)=5 SC(age=2)=1

SELECT * FROM people WHERE age = 2 1/5

slide-66
SLIDE 66

GT 8803 // Fall 2019 1 2 3 4 count age

SELECTIONS – COMPLEX PREDICATES

  • Range Query:

– sel(A>=a) = (Amax–a) / (Amax–Amin) – Example: sel(age >= 2)

66

SELECT * FROM people WHERE age >= 2

slide-67
SLIDE 67

GT 8803 // Fall 2019 1 2 3 4 count age

SELECTIONS – COMPLEX PREDICATES

  • Range Query:

– sel(A>=a) = (Amax–a) / (Amax–Amin) – Example: sel(age >= 2)

67

SELECT * FROM people WHERE age >= 2

slide-68
SLIDE 68

GT 8803 // Fall 2019 1 2 3 4 count age

SELECTIONS – COMPLEX PREDICATES

  • Range Query:

– sel(A>=a) = (Amax–a) / (Amax–Amin) – Example: sel(age >= 2)

68

agemin = 0 agemax = 4

SELECT * FROM people WHERE age >= 2

slide-69
SLIDE 69

GT 8803 // Fall 2019 1 2 3 4 count age

SELECTIONS – COMPLEX PREDICATES

  • Range Query:

– sel(A>=a) = (Amax–a) / (Amax–Amin) – Example: sel(age >= 2)

69

= (4–2) / (4–0) = 1/2

agemin = 0 agemax = 4

SELECT * FROM people WHERE age >= 2

slide-70
SLIDE 70

GT 8803 // Fall 2019 1 2 3 4 count age

SELECTIONS – COMPLEX PREDICATES

70

SELECT * FROM people WHERE age != 2

slide-71
SLIDE 71

GT 8803 // Fall 2019 1 2 3 4 count age

SELECTIONS – COMPLEX PREDICATES

71

SC(age=2)=1

SELECT * FROM people WHERE age != 2

slide-72
SLIDE 72

GT 8803 // Fall 2019 1 2 3 4 count age

SELECTIONS – COMPLEX PREDICATES

72

SC(age!=2)=2 SC(age!=2)=2

SELECT * FROM people WHERE age != 2

slide-73
SLIDE 73

GT 8803 // Fall 2019 1 2 3 4 count age

SELECTIONS – COMPLEX PREDICATES

73

= 1 – (1/5) = 4/5

SC(age!=2)=2 SC(age!=2)=2

SELECT * FROM people WHERE age != 2

slide-74
SLIDE 74

GT 8803 // Fall 2019 1 2 3 4 count age

SELECTIONS – COMPLEX PREDICATES

  • Negation Query:

– sel(not P) = 1 – sel(P) – Example: sel(age != 2)

  • Observation: Selectivity ≈ Probability

74

= 1 – (1/5) = 4/5

SC(age!=2)=2 SC(age!=2)=2

SELECT * FROM people WHERE age != 2

slide-75
SLIDE 75

GT 8803 // Fall 2019

SELECTIONS – COMPLEX PREDICATES

  • Conjunction:

– sel(P1 ⋀ P2) = sel(P1) · sel(P2) – sel(age=2 ⋀ name LIKE 'A%')

  • This assumes that the predicates are

independent.

75

SELECT * FROM people WHERE age = 2 AND name LIKE 'A%'

P1 P2

slide-76
SLIDE 76

GT 8803 // Fall 2019

SELECTIONS – COMPLEX PREDICATES

  • Conjunction:

– sel(P1 ⋀ P2) = sel(P1) · sel(P2) – sel(age=2 ⋀ name LIKE 'A%')

  • This assumes that the predicates are

independent.

76

SELECT * FROM people WHERE age = 2 AND name LIKE 'A%'

P1 P2

slide-77
SLIDE 77

GT 8803 // Fall 2019

SELECTIONS – COMPLEX PREDICATES

  • Conjunction:

– sel(P1 ⋀ P2) = sel(P1) · sel(P2) – sel(age=2 ⋀ name LIKE 'A%')

  • This assumes that the predicates are

independent.

77

SELECT * FROM people WHERE age = 2 AND name LIKE 'A%'

P1 P2

slide-78
SLIDE 78

GT 8803 // Fall 2019

SELECTIONS – COMPLEX PREDICATES

  • Disjunction:

– sel(P1 ⋁ P2) = sel(P1)+ sel(P2)–sel(P1⋁P2) = sel(P1)+ sel(P2)–sel(P1)· sel(P2) – sel(age=2 OR name LIKE 'A%')

  • This again assumes that the

selectivities are independent.

78

P1 P2

SELECT * FROM people WHERE age = 2 OR name LIKE 'A%'

slide-79
SLIDE 79

GT 8803 // Fall 2019

SELECTIONS – COMPLEX PREDICATES

  • Disjunction:

– sel(P1 ⋁ P2) = sel(P1)+ sel(P2)–sel(P1⋁P2) = sel(P1)+ sel(P2)–sel(P1)· sel(P2) – sel(age=2 OR name LIKE 'A%')

  • This again assumes that the

selectivities are independent.

79

P1 P2

SELECT * FROM people WHERE age = 2 OR name LIKE 'A%'

slide-80
SLIDE 80

GT 8803 // Fall 2019

RESULT SIZE ESTIMATION FOR JOINS

  • Given a join of R and S, what is the range of

possible result sizes in # of tuples?

  • In other words, for a given tuple of R, how

many tuples of S will it match?

80

slide-81
SLIDE 81

GT 8803 // Fall 2019

RESULT SIZE ESTIMATION FOR JOINS

  • General case: Rcols⋂Scols={A} where A is not a

key for either table.

– Match each R-tuple with S-tuples: estSize ≈ NR · NS / V(A,S) – Symmetrically, for S: estSize ≈ NR · NS / V(A,R)

  • Overall:

– estSize ≈ NR · NS / max({V(A,S), V(A,R)})

81

slide-82
SLIDE 82

GT 8803 // Fall 2019

COST ESTIMATIONS

  • Our formulas are nice but we assume that

data values are uniformly distributed.

82

2 4 6 8 10 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Uniform Approximation

slide-83
SLIDE 83

GT 8803 // Fall 2019

COST ESTIMATIONS

  • Our formulas are nice but we assume that

data values are uniformly distributed.

83

2 4 6 8 10 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Uniform Approximation

Distinct values of attribute # of occurrences

slide-84
SLIDE 84

GT 8803 // Fall 2019

COST ESTIMATIONS

  • Our formulas are nice but we assume that

data values are uniformly distributed.

84

2 4 6 8 10 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Non-Uniform Approximation

slide-85
SLIDE 85

GT 8803 // Fall 2019

COST ESTIMATIONS

  • Our formulas are nice but we assume that

data values are uniformly distributed.

85

2 4 6 8 10 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Non-Uniform Approximation

Bucket #1 Count=8 Bucket #2 Count=4 Bucket #3 Count=15 Bucket #4 Count=3 Bucket #5 Count=14

slide-86
SLIDE 86

GT 8803 // Fall 2019

COST ESTIMATIONS

  • Our formulas are nice but we assume that

data values are uniformly distributed.

86

Bucket #1 Count=8 Bucket #2 Count=4 Bucket #3 Count=15 Bucket #4 Count=3 Bucket #5 Count=14

5 10 15

1-3 4-6 7-9 10-12 13-15

Non-Uniform Approximation

Bucket Ranges

slide-87
SLIDE 87

GT 8803 // Fall 2019

HISTOGRAMS WITH QUANTILES

  • A histogram type wherein the "spread" of

each bucket is same.

87

2 4 6 8 10 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Equi-width Histogram (Quantiles)

slide-88
SLIDE 88

GT 8803 // Fall 2019

HISTOGRAMS WITH QUANTILES

  • A histogram type wherein the "spread" of

each bucket is same.

88

2 4 6 8 10 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Equi-width Histogram (Quantiles)

Bucket #1 Count=12 Bucket #2 Count=12 Bucket #3 Count=9 Bucket #4 Count=12

slide-89
SLIDE 89

GT 8803 // Fall 2019

HISTOGRAMS WITH QUANTILES

  • A histogram type wherein the "spread" of

each bucket is same.

89

5 10 15 1-5 6-8 9-13 14-15 Equi-width Histogram (Quantiles)

slide-90
SLIDE 90

GT 8803 // Fall 2019

HISTOGRAMS WITH QUANTILES

  • A histogram type wherein the "spread" of

each bucket is same.

90

5 10 15 1-5 6-8 9-13 14-15 Equi-width Histogram (Quantiles)

slide-91
SLIDE 91

GT 8803 // Fall 2019

SAMPLING

  • Modern DBMSs also collect samples from

tables to estimate selectivities.

  • Update samples when the underlying tables

changes significantly.

91

⋮ 1 billion tuples

SELECT AVG(age) FROM people WHERE age > 50

id name age status 1001 Obama 56 Rested 1002 Kanye 40 Weird 1003 Tupac 25 Dead 1004 Bieber 23 Crunk 1005 Andy 37 Lit

slide-92
SLIDE 92

GT 8803 // Fall 2019

SAMPLING

  • Modern DBMSs also collect samples from

tables to estimate selectivities.

  • Update samples when the underlying tables

changes significantly.

92

⋮ 1 billion tuples

SELECT AVG(age) FROM people WHERE age > 50

id name age status 1001 Obama 56 Rested 1002 Kanye 40 Weird 1003 Tupac 25 Dead 1004 Bieber 23 Crunk 1005 Andy 37 Lit

slide-93
SLIDE 93

GT 8803 // Fall 2019

SAMPLING

  • Modern DBMSs also collect samples from

tables to estimate selectivities.

  • Update samples when the underlying tables

changes significantly.

93

⋮ 1 billion tuples

SELECT AVG(age) FROM people WHERE age > 50

id name age status 1001 Obama 56 Rested 1002 Kanye 40 Weird 1003 Tupac 25 Dead 1004 Bieber 23 Crunk 1005 Andy 37 Lit 1001 Obama 56 Rested 1003 Tupac 25 Dead 1005 Andy 37 Lit

Table Sample

slide-94
SLIDE 94

GT 8803 // Fall 2019

SAMPLING

  • Modern DBMSs also collect samples from

tables to estimate selectivities.

  • Update samples when the underlying tables

changes significantly.

94

⋮ 1 billion tuples

sel(age>50) = SELECT AVG(age) FROM people WHERE age > 50

id name age status 1001 Obama 56 Rested 1002 Kanye 40 Weird 1003 Tupac 25 Dead 1004 Bieber 23 Crunk 1005 Andy 37 Lit 1001 Obama 56 Rested 1003 Tupac 25 Dead 1005 Andy 37 Lit

Table Sample

slide-95
SLIDE 95

GT 8803 // Fall 2019

SAMPLING

  • Modern DBMSs also collect samples from

tables to estimate selectivities.

  • Update samples when the underlying tables

changes significantly.

95

⋮ 1 billion tuples

sel(age>50) = SELECT AVG(age) FROM people WHERE age > 50

id name age status 1001 Obama 56 Rested 1002 Kanye 40 Weird 1003 Tupac 25 Dead 1004 Bieber 23 Crunk 1005 Andy 37 Lit 1001 Obama 56 Rested 1003 Tupac 25 Dead 1005 Andy 37 Lit

Table Sample

slide-96
SLIDE 96

GT 8803 // Fall 2019

SAMPLING

  • Modern DBMSs also collect samples from

tables to estimate selectivities.

  • Update samples when the underlying tables

changes significantly.

96

⋮ 1 billion tuples

1/3 sel(age>50) = SELECT AVG(age) FROM people WHERE age > 50

id name age status 1001 Obama 56 Rested 1002 Kanye 40 Weird 1003 Tupac 25 Dead 1004 Bieber 23 Crunk 1005 Andy 37 Lit 1001 Obama 56 Rested 1003 Tupac 25 Dead 1005 Andy 37 Lit

Table Sample

slide-97
SLIDE 97

GT 8803 // Fall 2019

OBSERVATION

  • Now that we can (roughly) estimate the

selectivity of predicates, what can we actually do with them?

97

slide-98
SLIDE 98

GT 8803 // Fall 2018

PLAN ENUMERATION

98

slide-99
SLIDE 99

GT 8803 // Fall 2019

QUERY OPTIMIZATION

  • After performing rule-based rewriting, the

DBMS will enumerate different plans for the query and estimate their costs.

– Single table. – Multiple tables.

  • It chooses the best plan it has seen for the

query after exhausting all plans or some timeout.

99

slide-100
SLIDE 100

GT 8803 // Fall 2019

SINGLE-TABLE QUERY PLANNING

  • Pick the best access method.

– Sequential Scan – Binary Search (clustered indexes) – Index Scan

  • Simple heuristics are often good enough for

this.

  • OLTP queries are especially easy.

100

slide-101
SLIDE 101

GT 8803 // Fall 2019

OLTP QUERY PLANNING

  • Query planning for OLTP queries is easy

because they are sargable.

– Search Argument Able – It is usually just picking the best index. – Joins are almost always on foreign key relationships with a small cardinality. – Can be implemented with simple heuristics.

101

slide-102
SLIDE 102

GT 8803 // Fall 2019

MULTI-TABLE QUERY PLANNING

  • As number of joins increases, number of

alternative plans grows rapidly

– We need to restrict search space.

  • Fundamental decision in System R: only left-

deep join trees are considered.

– Modern DBMSs do not always make this assumption anymore.

102

slide-103
SLIDE 103

GT 8803 // Fall 2019

MULTI-TABLE QUERY PLANNING

  • Fundamental decision in System R: Only

consider left-deep join trees.

103

⨝ ⨝ ⨝

A B C D

⨝ ⨝ ⨝

A B C D

⨝ ⨝ ⨝

A B C D

slide-104
SLIDE 104

GT 8803 // Fall 2019

MULTI-TABLE QUERY PLANNING

  • Fundamental decision in System R: Only

consider left-deep join trees.

104

⨝ ⨝ ⨝

A B C D

⨝ ⨝ ⨝

A B C D

⨝ ⨝ ⨝

A B C D

X X

slide-105
SLIDE 105

GT 8803 // Fall 2019

MULTI-TABLE QUERY PLANNING

  • Fundamental decision in System R: Only

consider left-deep join trees.

  • Allows for fully pipelined plans where

intermediate results are not written to temp files.

– Not all left-deep trees are fully pipelined.

105

slide-106
SLIDE 106

GT 8803 // Fall 2019

MULTI-TABLE QUERY PLANNING

106

slide-107
SLIDE 107

GT 8803 // Fall 2019

MULTI-TABLE QUERY PLANNING

  • Enumerate the orderings

– Example: Left-deep tree #1, Left-deep tree #2…

  • Enumerate the plans for each operator

– Example: Hash, Sort-Merge, Nested Loop…

  • Enumerate the access paths for each table

– Example: Index #1, Index #2, Seq Scan…

  • Use dynamic programming to reduce the

number of cost estimations.

107

slide-108
SLIDE 108

GT 8803 // Fall 2018

DYNAMIC PROGRAMMING

108

  • • •

R ⨝ S T T ⨝ S R R ⨝ S ⨝ T

SELECT * FROM R, S, T WHERE R.a = S.a AND S.b = T.b

R S T

slide-109
SLIDE 109

GT 8803 // Fall 2018

DYNAMIC PROGRAMMING

109

SortMerge Join

R.a=S.a

SortMerge Join

T.b=S.b

Hash Join

T.b=S.b

  • • •

R ⨝ S T T ⨝ S R R ⨝ S ⨝ T

Hash Join

R.a=S.a

SELECT * FROM R, S, T WHERE R.a = S.a AND S.b = T.b

R S T

slide-110
SLIDE 110

GT 8803 // Fall 2018

DYNAMIC PROGRAMMING

110

SortMerge Join

R.a=S.a

SortMerge Join

T.b=S.b

Hash Join

T.b=S.b

  • • •

R ⨝ S T T ⨝ S R R ⨝ S ⨝ T

Hash Join

R.a=S.a

SELECT * FROM R, S, T WHERE R.a = S.a AND S.b = T.b

Cost: 300 Cost: 400 Cost: 280 Cost: 200

R S T

slide-111
SLIDE 111

GT 8803 // Fall 2018

DYNAMIC PROGRAMMING

111

Hash Join

T.b=S.b

  • • •

R ⨝ S T T ⨝ S R R ⨝ S ⨝ T

Hash Join

R.a=S.a

SELECT * FROM R, S, T WHERE R.a = S.a AND S.b = T.b

Cost: 300 Cost: 200

R S T

slide-112
SLIDE 112

GT 8803 // Fall 2018

DYNAMIC PROGRAMMING

112

Hash Join

T.b=S.b

  • • •

R ⨝ S T T ⨝ S R R ⨝ S ⨝ T

Hash Join

R.a=S.a

Hash Join

S.b=T.b

SortMerge Join

S.b=T.b

SortMerge Join

S.a=R.a

Hash Join

S.a=R.a

SELECT * FROM R, S, T WHERE R.a = S.a AND S.b = T.b

Cost: 300 Cost: 200 Cost: 450 Cost: 300 Cost: 400 Cost: 380

R S T

slide-113
SLIDE 113

GT 8803 // Fall 2018

DYNAMIC PROGRAMMING

113

Hash Join

T.b=S.b

  • • •

R ⨝ S T T ⨝ S R R ⨝ S ⨝ T

Hash Join

R.a=S.a

Hash Join

S.b=T.b

SortMerge Join

S.a=R.a

SELECT * FROM R, S, T WHERE R.a = S.a AND S.b = T.b

Cost: 300 Cost: 200 Cost: 300 Cost: 380

R S T

slide-114
SLIDE 114

GT 8803 // Fall 2018

DYNAMIC PROGRAMMING

114

Hash Join

T.b=S.b

  • • •

R ⨝ S T T ⨝ S R R ⨝ S ⨝ T

SortMerge Join

S.a=R.a

SELECT * FROM R, S, T WHERE R.a = S.a AND S.b = T.b

Cost: 200 Cost: 300

R S T

slide-115
SLIDE 115

GT 8803 // Fall 2019

CANDIDATE PLAN EXAMPLE

  • How to generate plans for

search algorithm:

– Enumerate relation orderings – Enumerate join algorithm choices – Enumerate access method choices

  • No real DBMSs does it this way.

It’s actually more messy…

115

SELECT * FROM R, S, T WHERE R.a = S.a AND S.b = T.b

slide-116
SLIDE 116

GT 8803 // Fall 2019

CANDIDATE PLANS

  • Step #1: Enumerate table orderings

116

⨝ ⨝

T R S

⨝ ⨝

S T R

× ⨝

R S T

⨝ ⨝

R S T

⨝ ⨝

S R T

× ⨝

S T R

slide-117
SLIDE 117

GT 8803 // Fall 2019

CANDIDATE PLANS

  • Step #1: Enumerate table orderings

117

⨝ ⨝

T R S

⨝ ⨝

S T R

× ⨝

R S T

⨝ ⨝

R S T

⨝ ⨝

S R T

× ⨝

S T R

Prune plans with cross- products immediately!

slide-118
SLIDE 118

GT 8803 // Fall 2019

CANDIDATE PLANS

  • Step #1: Enumerate table orderings

118

⨝ ⨝

T R S

⨝ ⨝

S T R

× ⨝

R S T

⨝ ⨝

R S T

⨝ ⨝

S R T

× ⨝

S T R

X X

Prune plans with cross- products immediately!

slide-119
SLIDE 119

GT 8803 // Fall 2019

CANDIDATE PLANS

  • Step #1: Enumerate table orderings

119

⨝ ⨝

T R S

⨝ ⨝

S T R

× ⨝

R S T

⨝ ⨝

R S T

⨝ ⨝

S R T

× ⨝

S T R

X X

Prune plans with cross- products immediately!

slide-120
SLIDE 120

GT 8803 // Fall 2019

CANDIDATE PLANS

  • Step #2: Enumerate join algorithm choices

120

⨝ ⨝

R S T

slide-121
SLIDE 121

GT 8803 // Fall 2019

CANDIDATE PLANS

  • Step #2: Enumerate join algorithm choices

121

⨝ ⨝

R S T R S T

NLJ NLJ

R S T

HJ NLJ

R S T

NLJ HJ

R S T

HJ HJ

slide-122
SLIDE 122

GT 8803 // Fall 2019

CANDIDATE PLANS

  • Step #2: Enumerate join algorithm choices

122

⨝ ⨝

R S T

Do this for the other plans.

R S T

NLJ NLJ

R S T

HJ NLJ

R S T

NLJ HJ

R S T

HJ HJ

slide-123
SLIDE 123

GT 8803 // Fall 2019

CANDIDATE PLANS

  • Step #2: Enumerate join algorithm choices

123

⨝ ⨝

R S T

Do this for the other plans.

R S T

NLJ NLJ

R S T

HJ NLJ

R S T

NLJ HJ

R S T

HJ HJ

slide-124
SLIDE 124

GT 8803 // Fall 2019

CANDIDATE PLANS

  • Step #3: Enumerate access method choices

124

R S T

HJ HJ

slide-125
SLIDE 125

GT 8803 // Fall 2019

CANDIDATE PLANS

  • Step #3: Enumerate access method choices

125

R S T

HJ HJ HJ HJ SeqScan SeqScan SeqScan HJ HJ SeqScan IndexScan(S.b) SeqScan

slide-126
SLIDE 126

GT 8803 // Fall 2019

CANDIDATE PLANS

  • Step #3: Enumerate access method choices

126

R S T

HJ HJ

Do this for the other plans.

HJ HJ SeqScan SeqScan SeqScan HJ HJ SeqScan IndexScan(S.b) SeqScan

slide-127
SLIDE 127

GT 8803 // Fall 2019

POSTGRES QUERY OPTIMIZER

  • Examines all types of join trees

– Left-deep, Right-deep, bushy

  • Two optimizer implementations:

– Traditional Dynamic Programming Approach – Genetic Query Optimizer (GEQO)

  • Postgres uses the traditional algorithm when

# of tables in query is less than 12 and switches to GEQO when there are 12 or more.

127

slide-128
SLIDE 128

GT 8803 // Fall 2018

POSTGRES QUERY OPTIMIZER

128

1st Generation

R S T

NL NL

T R S

NL HJ

S R T

HJ HJ

slide-129
SLIDE 129

GT 8803 // Fall 2018

POSTGRES QUERY OPTIMIZER

129

1st Generation

R S T

NL NL

Cost:3 00

T R S

NL HJ

S R T

HJ HJ

Cost:2 00 Cost: 100

slide-130
SLIDE 130

GT 8803 // Fall 2018

POSTGRES QUERY OPTIMIZER

130

Best:100

1st Generation

R S T

NL NL

Cost:3 00

T R S

NL HJ

S R T

HJ HJ

Cost:2 00 Cost: 100

slide-131
SLIDE 131

GT 8803 // Fall 2018

POSTGRES QUERY OPTIMIZER

131

Best:100

1st Generation

R S T

NL NL

Cost:3 00

T R S

NL HJ

S R T

HJ HJ

X

Cost:2 00 Cost: 100

slide-132
SLIDE 132

GT 8803 // Fall 2018

POSTGRES QUERY OPTIMIZER

132

Best:100

1st Generation

R S T

NL NL

Cost:3 00

T R S

NL HJ

S R T

HJ HJ

X

Cost:2 00 Cost: 100

slide-133
SLIDE 133

GT 8803 // Fall 2018

POSTGRES QUERY OPTIMIZER

133

Best:100

1st Generation 2nd Generation

R S T

NL NL

Cost:3 00

T R S

NL HJ

S R T

HJ HJ

X

Cost:2 00 Cost: 100

S R T

HJ HJ

R T S

NL HJ

T R S

HJ HJ

slide-134
SLIDE 134

GT 8803 // Fall 2018

POSTGRES QUERY OPTIMIZER

134

Best:100

1st Generation 2nd Generation

R S T

NL NL

Cost:3 00

T R S

NL HJ

S R T

HJ HJ

X

Cost:2 00 Cost: 100

S R T

HJ HJ

R T S

NL HJ

T R S

HJ HJ

Cost: 80 Cost: 200 Cost: 110

slide-135
SLIDE 135

GT 8803 // Fall 2018

POSTGRES QUERY OPTIMIZER

135

1st Generation 2nd Generation

Best:80 R S T

NL NL

Cost:3 00

T R S

NL HJ

S R T

HJ HJ

X

Cost:2 00 Cost: 100

S R T

HJ HJ

R T S

NL HJ

T R S

HJ HJ

Cost: 80 Cost: 200 Cost: 110

slide-136
SLIDE 136

GT 8803 // Fall 2018

POSTGRES QUERY OPTIMIZER

136

1st Generation 2nd Generation

Best:80 R S T

NL NL

Cost:3 00

T R S

NL HJ

S R T

HJ HJ

X

Cost:2 00 Cost: 100

S R T

HJ HJ

R T S

NL HJ

T R S

HJ HJ

X

Cost: 80 Cost: 200 Cost: 110

slide-137
SLIDE 137

GT 8803 // Fall 2018

POSTGRES QUERY OPTIMIZER

137

1st Generation 2nd Generation 3rd Generation

Best:80 R S T

NL NL

Cost:3 00

T R S

NL HJ

S R T

HJ HJ

X

Cost:2 00 Cost: 100

S R T

HJ HJ

R T S

NL HJ

T R S

HJ HJ

X

Cost: 80 Cost: 200 Cost: 110

R S T

HJ HJ

R S T

HJ HJ

R T S

HJ HJ

Cost: 90 Cost:1 60 Cost: 120

slide-138
SLIDE 138

GT 8803 // Fall 2018

VISUAL QUERY OPTIMIZER

138

slide-139
SLIDE 139

GT 8803 // Fall 2019

VISUAL QUERY OPTIMIZATION

  • Queries only contain a complex predicate
  • Optimization techniques

– BlazeIt (Stanford): Rule-based optimization – PP (Microsoft Research): Cost-based optimization

139

SELECT frameID, vehType, vehColor FROM PROCESS(inputVideo) WHERE vehType=SUV ∧ vehColor=red;

slide-140
SLIDE 140

GT 8803 // Fall 2019

VISUAL QUERY OPTIMIZATION

  • Queries only contain a complex predicate
  • Optimization techniques

– BlazeIt (Stanford): Rule-based optimization – PP (Microsoft Research): Cost-based optimization

140

SELECT frameID, vehType, vehColor FROM PROCESS(inputVideo) WHERE vehType=SUV ∧ vehColor=red;

slide-141
SLIDE 141

GT 8803 // Fall 2019

BLAZEIT: RULE-BASED OPTIMIZER

  • Example: Content-based selection for red buses.

– Train a specialized NN to filter frames with buses – But the NN may not be accurate on every frame – Call the object detection model on uncertain frames – To account for this error rate, it uses held-out set of frames to estimate the selectivity and error rate.

  • Given an error budget, the optimizer selects

between the filters and uses rule-based

  • ptimization to select the fastest query plan

141

slide-142
SLIDE 142

GT 8803 // Fall 2019

BLAZEIT: RULE-BASED OPTIMIZER

  • Example: Choosing a filter
  • Consider two possible filters for redness:

– F1 :A filter which returns true if the over 80% of the pixels have a red-channel value of at least 200 – F2: A filter that returns the average of the red- channel values

142

slide-143
SLIDE 143

GT 8803 // Fall 2019

BLAZEIT: RULE-BASED OPTIMIZER

  • In estimating thresholds at the frame-level

based on frames from the held-out set, it learns that:

– sel(F1)= 0.9 and sel(F2)= 0.3

  • Which filter should it pick?

– More selective filter (F2)

143

slide-144
SLIDE 144

GT 8803 // Fall 2019

PP: COST-BASED OPTIMIZER

  • Decompose a complex predicate to

expressions over simple predicates

– Old: <vehType=SUV AND vehColor=red> – New: <vehType=SUV> ∧ <vehColor=red>

  • Rewrite rules (logical equivalences):

– p ∧ (Prest) ⇒ Filterp – Filterp∧q ⇒ Filterp ∧ Filterq – Filterp∨q ⇒ Filterp ∨ Filterq

144

slide-145
SLIDE 145

GT 8803 // Fall 2019

PP: COST-BASED OPTIMIZER

  • Sort the list of available filters based on:

– Filter evaluation cost (C) – Data reduction ratio (R[Accuracy])

  • Efficacy of filter = C / R[1]

– A smaller ratio of cost to data reduction indicates better performance

145

slide-146
SLIDE 146

GT 8803 // Fall 2019

PP: COST-BASED OPTIMIZER

  • Example: (p ∨ q) ∧ ¬r ∧ Prest
  • ⇒p ∨ q ⇒ Fp∨q ⇒Fp∨ Fq
  • ⇒¬r ⇒ F¬r
  • ⇒ F(p∨q)∧¬r ⇒ (Fp∨ Fq) ∧ F¬r
  • ⇒ F(p∧¬r)∨(q∧¬r) ⇒ Fp∧¬r ∨ Fq∧¬r

⇒ (Fp ∧ F¬r) ∨ (Fq ∧ F¬r)

146

slide-147
SLIDE 147

GT 8803 // Fall 2019

PP: COST-BASED OPTIMIZER

  • Pruning search space

– Limit the number of different filters to be a small constant (k)

  • Example:

– Available filters: {Fp∨q, Fp, Fp∧¬r, Fq∧¬r, Fq, F¬r} – Query requirements: {Fp∨q, Fp∨ Fq, F¬r, (Fp∨ Fq) ∧ F¬r, Fp∧¬r ∨ Fq∧¬r } – k = 2 – Candidate plans: {Fp∨q, F¬r, Fp∧¬r ∨ Fq∧¬r}

147

slide-148
SLIDE 148

GT 8803 // Fall 2019

PP: COST-BASED OPTIMIZER

  • Plan Enumeration

– First, explore different allocations of the query’s accuracy budget to individual filters. – Next, explore different orderings of filters within a conjunction or disjunction.

  • Cost Estimation

– Finally, after fixing both the accuracy thresholds and the order of filters, compute the cost and reduction rate of the resulting plan.

148

slide-149
SLIDE 149

GT 8803 // Fall 2019

PARTING THOUGHTS

  • Filter as early as possible.
  • Filter selectivity estimations

– Uniformity, Independence, Histograms

  • Dynamic programming for join orderings
  • Again, query optimization is super important.

149

slide-150
SLIDE 150

GT 8803 // Fall 2019

NEXT LECTURE

  • Convolutional neural networks

– Popular neural network architecture

150