DATA ANALYTICS USING DEEP LEARNING
GT 8803 // FALL 2019 // JOY ARULRAJ
L E C T U R E # 0 9 : Q U E R Y O P T I M I Z A T I O N
DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY - - PowerPoint PPT Presentation
DATA ANALYTICS USING DEEP LEARNING GT 8803 // FALL 2019 // JOY ARULRAJ L E C T U R E # 0 9 : Q U E R Y O P T I M I Z A T I O N administrivia Reminders Assignment 1: postponed to next Monday Sign up for discussion slots on Thursday
L E C T U R E # 0 9 : Q U E R Y O P T I M I Z A T I O N
GT 8803 // Fall 2019
– Assignment 1: postponed to next Monday – Sign up for discussion slots on Thursday – Proposal presentations on next Wednesday
2
GT 8803 // Fall 2019
– Tuple-at-a-time – Operator-at-a-time – Vector-at-a-time
3
SELECT A.id, B.value FROM A, B WHERE A.id = B.id AND B.value > 100
A.id=B.id value>100 A.id, B.value
GT 8803 // Fall 2019
– Sequential scan – Index scan – Multi-index scan
4
101 102 103 104
GT 8803 // Fall 2019
– Sequential scan – Index scan – Multi-index scan
5
101 102 103 104
GT 8803 // Fall 2019
– Sequential scan – Index scan – Multi-index scan
6
101 102 103 104
GT 8803 // Fall 2019
– Filtering classifier, Sampling
7
OLD PLAN NEW PLAN
GT 8803 // Fall 2019
8
GT 8803 // Fall 2018
9
GT 8803 // Fall 2019
Connection Manager + Admission Control Query Parser Query Optimizer Query Executor Lock Manager (Concurrency Control) Access Methods (or Indexes) Buffer Pool Manager Log Manager Memory Manager + Disk Manager Networking Manager
10
Source: Anatomy of a Database System
GT 8803 // Fall 2019
– User tells the DBMS what answer they want, not how to get the answer.
– 1.3 hours vs. 0.45 seconds
11
GT 8803 // Fall 2019
12
GT 8803 // Fall 2019
– Rewrite the query to remove inefficient things. – Does not require a cost model.
– Use a cost model to evaluate multiple equivalent plans and pick the one with the lowest cost.
13
GT 8803 // Fall 2018
14
GT 8803 // Fall 2018
15
SQL Query
GT 8803 // Fall 2018
16
SQL Query
Abstract Syntax Tree
GT 8803 // Fall 2018
17
SQL Query
Abstract Syntax Tree
System Catalog
Name→Internal ID
GT 8803 // Fall 2018
18
SQL Query
Abstract Syntax Tree Annotated AST
System Catalog Rewriter
(Optional)
Name→Internal ID
GT 8803 // Fall 2018
19
SQL Query
Abstract Syntax Tree Annotated AST
System Catalog Rewriter
(Optional)
Annotated AST Name→Internal ID
GT 8803 // Fall 2018
20
SQL Query
Abstract Syntax Tree Annotated AST
System Catalog Rewriter
(Optional)
Annotated AST Name→Internal ID
GT 8803 // Fall 2018
21
SQL Query
Abstract Syntax Tree Annotated AST
Cost Model System Catalog Rewriter
(Optional)
Annotated AST Name→Internal ID
GT 8803 // Fall 2018
22
SQL Query
Abstract Syntax Tree Annotated AST
Query Plan Cost Model System Catalog Rewriter
(Optional)
Annotated AST Name→Internal ID
GT 8803 // Fall 2019
23
GT 8803 // Fall 2019
– The DBMS can identify better query plans without a cost model. – This is often called query rewriting.
24
GT 8803 // Fall 2018
25
student enrolled
s.sid=e.sid grade='A' s.name,e.cid
SELECT s.name, e.cid FROM student AS s, enrolled AS e WHERE s.sid = e.sid AND e.grade = 'A'
GT 8803 // Fall 2018
26
student enrolled
s.sid=e.sid grade='A' s.name,e.cid
student enrolled
s.sid=e.sid grade='A' s.name,e.cid
SELECT s.name, e.cid FROM student AS s, enrolled AS e WHERE s.sid = e.sid AND e.grade = 'A'
GT 8803 // Fall 2018
27
SELECT s.name, e.cid FROM student AS s, enrolled AS e WHERE s.sid = e.sid AND e.grade = 'A'
GT 8803 // Fall 2019
– Perform filters as early as possible. – Reorder predicates so that the DBMS applies the most selective one first. – Break a complex predicate, and push down
– (X=Y AND Y=3) → X=3 AND Y=3
28
GT 8803 // Fall 2019
– Perform them early to create smaller tuples and reduce intermediate results (if duplicates are eliminated) – Project out all attributes except the ones requested
29
GT 8803 // Fall 2018
30
student enrolled
s.sid=e.sid grade='A' s.name,e.cid
SELECT s.name, e.cid FROM student AS s, enrolled AS e WHERE s.sid = e.sid AND e.grade = 'A'
GT 8803 // Fall 2018
31
student enrolled
s.sid=e.sid grade='A' s.name,e.cid
student enrolled
s.sid=e.sid grade='A' s.name,e.cid
sid,cid
sid,namep
SELECT s.name, e.cid FROM student AS s, enrolled AS e WHERE s.sid = e.sid AND e.grade = 'A'
GT 8803 // Fall 2019
32
Source: Lukas Eder
SELECT * FROM A WHERE 1 = 0;
CREATE TABLE A ( id INT PRIMARY KEY, val INT NOT NULL );
GT 8803 // Fall 2019
33
Source: Lukas Eder
SELECT * FROM A WHERE 1 = 0;
CREATE TABLE A ( id INT PRIMARY KEY, val INT NOT NULL );
GT 8803 // Fall 2019
34
Source: Lukas Eder
SELECT * FROM A WHERE 1 = 0;X
CREATE TABLE A ( id INT PRIMARY KEY, val INT NOT NULL );
GT 8803 // Fall 2019
35
Source: Lukas Eder
SELECT * FROM A WHERE 1 = 0; SELECT * FROM A WHERE 1 = 1;
CREATE TABLE A ( id INT PRIMARY KEY, val INT NOT NULL );
GT 8803 // Fall 2019
36
Source: Lukas Eder
SELECT * FROM A WHERE 1 = 0; SELECT * FROM A WHERE 1 = 1;
CREATE TABLE A ( id INT PRIMARY KEY, val INT NOT NULL );
GT 8803 // Fall 2019
37
Source: Lukas Eder
SELECT * FROM A WHERE 1 = 0; SELECT * FROM A WHERE 1 = 1; SELECT * FROM A;
CREATE TABLE A ( id INT PRIMARY KEY, val INT NOT NULL );
GT 8803 // Fall 2019
38
Source: Lukas Eder
SELECT * FROM A WHERE 1 = 0; SELECT A1.* FROM A AS A1 JOIN A AS A2 ON A1.id = A2.id; SELECT * FROM A WHERE 1 = 1; SELECT * FROM A;
CREATE TABLE A ( id INT PRIMARY KEY, val INT NOT NULL );
GT 8803 // Fall 2019
39
Source: Lukas Eder
SELECT * FROM A WHERE 1 = 0; SELECT A1.* FROM A AS A1 JOIN A AS A2 ON A1.id = A2.id; SELECT * FROM A WHERE 1 = 1; SELECT * FROM A;
CREATE TABLE A ( id INT PRIMARY KEY, val INT NOT NULL );
GT 8803 // Fall 2019
40
Source: Lukas Eder
SELECT * FROM A WHERE 1 = 0; SELECT * FROM A WHERE 1 = 1; SELECT * FROM A; SELECT * FROM A;
CREATE TABLE A ( id INT PRIMARY KEY, val INT NOT NULL );
GT 8803 // Fall 2019
41
Source: Lukas Eder
SELECT * FROM A AS A1 WHERE EXISTS(SELECT * FROM A AS A2 WHERE A1.id = A2.id);
CREATE TABLE A ( id INT PRIMARY KEY, val INT NOT NULL );
GT 8803 // Fall 2019
42
Source: Lukas Eder
SELECT * FROM A AS A1 WHERE EXISTS(SELECT * FROM A AS A2 WHERE A1.id = A2.id);
CREATE TABLE A ( id INT PRIMARY KEY, val INT NOT NULL );
GT 8803 // Fall 2019
43
Source: Lukas Eder
SELECT * FROM A;
CREATE TABLE A ( id INT PRIMARY KEY, val INT NOT NULL );
GT 8803 // Fall 2019
44
Source: Lukas Eder
SELECT * FROM A WHERE val BETWEEN 1 AND 100 OR val BETWEEN 50 AND 150; SELECT * FROM A;
CREATE TABLE A ( id INT PRIMARY KEY, val INT NOT NULL );
GT 8803 // Fall 2019
45
Source: Lukas Eder
SELECT * FROM A WHERE val BETWEEN 1 AND 100 OR val BETWEEN 50 AND 150; SELECT * FROM A;
CREATE TABLE A ( id INT PRIMARY KEY, val INT NOT NULL );
GT 8803 // Fall 2019
46
Source: Lukas Eder
SELECT * FROM A WHERE val BETWEEN 1 AND 150; SELECT * FROM A;
CREATE TABLE A ( id INT PRIMARY KEY, val INT NOT NULL );
GT 8803 // Fall 2019
– Commutative, associative
– How many different orderings are there for an n- way join?
47
GT 8803 // Fall 2019
– Exhaustive enumeration will be too slow.
48
GT 8803 // Fall 2018
49
GT 8803 // Fall 2019
– CPU: Small cost; tough to estimate – Disk: # of block transfers – Memory: Amount of DRAM used
50
GT 8803 // Fall 2019
– Postgres/SQLite: ANALYZE – SQL Server: UPDATE STATISTICS
51
GT 8803 // Fall 2019
– NR: Number of tuples in R. – V(A,R): Number of distinct values for attribute A.
52
GT 8803 // Fall 2019
53
GT 8803 // Fall 2019
– 10,000 students, 10 colleges – how many students in SCS?
54
GT 8803 // Fall 2019
55
GT 8803 // Fall 2019
56
SELECT * FROM people WHERE id = 123
GT 8803 // Fall 2019
57
SELECT * FROM people WHERE id = 123 SELECT * FROM people WHERE val > 1000 SELECT * FROM people WHERE age = 30 AND status = 'Lit'
GT 8803 // Fall 2019
– Equality – Range – Negation – Conjunction – Disjunction
58
GT 8803 // Fall 2019
– Equality – Range – Negation – Conjunction – Disjunction
59
GT 8803 // Fall 2019
60
SELECT * FROM people WHERE age = 2
GT 8803 // Fall 2019
– sel(A=constant) = SC(P) / NR – Example: sel(age=2) =
61
SELECT * FROM people WHERE age = 2
GT 8803 // Fall 2019
– sel(A=constant) = SC(P) / NR – Example: sel(age=2) =
62
1 2 3 4 count age
SELECT * FROM people WHERE age = 2
GT 8803 // Fall 2019
– sel(A=constant) = SC(P) / NR – Example: sel(age=2) =
63
1 2 3 4 count age
V(age,people)=5
SELECT * FROM people WHERE age = 2
GT 8803 // Fall 2019
– sel(A=constant) = SC(P) / NR – Example: sel(age=2) =
64
1 2 3 4 count age
V(age,people)=5 SC(age=2)=1
SELECT * FROM people WHERE age = 2
GT 8803 // Fall 2019
– sel(A=constant) = SC(P) / NR – Example: sel(age=2) =
65
1 2 3 4 count age
V(age,people)=5 SC(age=2)=1
SELECT * FROM people WHERE age = 2 1/5
GT 8803 // Fall 2019 1 2 3 4 count age
– sel(A>=a) = (Amax–a) / (Amax–Amin) – Example: sel(age >= 2)
66
SELECT * FROM people WHERE age >= 2
GT 8803 // Fall 2019 1 2 3 4 count age
– sel(A>=a) = (Amax–a) / (Amax–Amin) – Example: sel(age >= 2)
67
SELECT * FROM people WHERE age >= 2
GT 8803 // Fall 2019 1 2 3 4 count age
– sel(A>=a) = (Amax–a) / (Amax–Amin) – Example: sel(age >= 2)
68
agemin = 0 agemax = 4
SELECT * FROM people WHERE age >= 2
GT 8803 // Fall 2019 1 2 3 4 count age
– sel(A>=a) = (Amax–a) / (Amax–Amin) – Example: sel(age >= 2)
69
= (4–2) / (4–0) = 1/2
agemin = 0 agemax = 4
SELECT * FROM people WHERE age >= 2
GT 8803 // Fall 2019 1 2 3 4 count age
70
SELECT * FROM people WHERE age != 2
GT 8803 // Fall 2019 1 2 3 4 count age
71
SC(age=2)=1
SELECT * FROM people WHERE age != 2
GT 8803 // Fall 2019 1 2 3 4 count age
72
SC(age!=2)=2 SC(age!=2)=2
SELECT * FROM people WHERE age != 2
GT 8803 // Fall 2019 1 2 3 4 count age
73
= 1 – (1/5) = 4/5
SC(age!=2)=2 SC(age!=2)=2
SELECT * FROM people WHERE age != 2
GT 8803 // Fall 2019 1 2 3 4 count age
– sel(not P) = 1 – sel(P) – Example: sel(age != 2)
74
= 1 – (1/5) = 4/5
SC(age!=2)=2 SC(age!=2)=2
SELECT * FROM people WHERE age != 2
GT 8803 // Fall 2019
– sel(P1 ⋀ P2) = sel(P1) · sel(P2) – sel(age=2 ⋀ name LIKE 'A%')
75
SELECT * FROM people WHERE age = 2 AND name LIKE 'A%'
GT 8803 // Fall 2019
– sel(P1 ⋀ P2) = sel(P1) · sel(P2) – sel(age=2 ⋀ name LIKE 'A%')
76
SELECT * FROM people WHERE age = 2 AND name LIKE 'A%'
GT 8803 // Fall 2019
– sel(P1 ⋀ P2) = sel(P1) · sel(P2) – sel(age=2 ⋀ name LIKE 'A%')
77
SELECT * FROM people WHERE age = 2 AND name LIKE 'A%'
GT 8803 // Fall 2019
– sel(P1 ⋁ P2) = sel(P1)+ sel(P2)–sel(P1⋁P2) = sel(P1)+ sel(P2)–sel(P1)· sel(P2) – sel(age=2 OR name LIKE 'A%')
78
SELECT * FROM people WHERE age = 2 OR name LIKE 'A%'
GT 8803 // Fall 2019
– sel(P1 ⋁ P2) = sel(P1)+ sel(P2)–sel(P1⋁P2) = sel(P1)+ sel(P2)–sel(P1)· sel(P2) – sel(age=2 OR name LIKE 'A%')
79
SELECT * FROM people WHERE age = 2 OR name LIKE 'A%'
GT 8803 // Fall 2019
80
GT 8803 // Fall 2019
– Match each R-tuple with S-tuples: estSize ≈ NR · NS / V(A,S) – Symmetrically, for S: estSize ≈ NR · NS / V(A,R)
– estSize ≈ NR · NS / max({V(A,S), V(A,R)})
81
GT 8803 // Fall 2019
82
2 4 6 8 10 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Uniform Approximation
GT 8803 // Fall 2019
83
2 4 6 8 10 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Uniform Approximation
Distinct values of attribute # of occurrences
GT 8803 // Fall 2019
84
2 4 6 8 10 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Non-Uniform Approximation
GT 8803 // Fall 2019
85
2 4 6 8 10 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Non-Uniform Approximation
Bucket #1 Count=8 Bucket #2 Count=4 Bucket #3 Count=15 Bucket #4 Count=3 Bucket #5 Count=14
GT 8803 // Fall 2019
86
Bucket #1 Count=8 Bucket #2 Count=4 Bucket #3 Count=15 Bucket #4 Count=3 Bucket #5 Count=14
5 10 15
1-3 4-6 7-9 10-12 13-15
Non-Uniform Approximation
Bucket Ranges
GT 8803 // Fall 2019
87
2 4 6 8 10 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Equi-width Histogram (Quantiles)
GT 8803 // Fall 2019
88
2 4 6 8 10 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Equi-width Histogram (Quantiles)
Bucket #1 Count=12 Bucket #2 Count=12 Bucket #3 Count=9 Bucket #4 Count=12
GT 8803 // Fall 2019
89
5 10 15 1-5 6-8 9-13 14-15 Equi-width Histogram (Quantiles)
GT 8803 // Fall 2019
90
5 10 15 1-5 6-8 9-13 14-15 Equi-width Histogram (Quantiles)
GT 8803 // Fall 2019
91
SELECT AVG(age) FROM people WHERE age > 50
id name age status 1001 Obama 56 Rested 1002 Kanye 40 Weird 1003 Tupac 25 Dead 1004 Bieber 23 Crunk 1005 Andy 37 Lit
GT 8803 // Fall 2019
92
SELECT AVG(age) FROM people WHERE age > 50
id name age status 1001 Obama 56 Rested 1002 Kanye 40 Weird 1003 Tupac 25 Dead 1004 Bieber 23 Crunk 1005 Andy 37 Lit
GT 8803 // Fall 2019
93
SELECT AVG(age) FROM people WHERE age > 50
id name age status 1001 Obama 56 Rested 1002 Kanye 40 Weird 1003 Tupac 25 Dead 1004 Bieber 23 Crunk 1005 Andy 37 Lit 1001 Obama 56 Rested 1003 Tupac 25 Dead 1005 Andy 37 Lit
GT 8803 // Fall 2019
94
sel(age>50) = SELECT AVG(age) FROM people WHERE age > 50
id name age status 1001 Obama 56 Rested 1002 Kanye 40 Weird 1003 Tupac 25 Dead 1004 Bieber 23 Crunk 1005 Andy 37 Lit 1001 Obama 56 Rested 1003 Tupac 25 Dead 1005 Andy 37 Lit
GT 8803 // Fall 2019
95
sel(age>50) = SELECT AVG(age) FROM people WHERE age > 50
id name age status 1001 Obama 56 Rested 1002 Kanye 40 Weird 1003 Tupac 25 Dead 1004 Bieber 23 Crunk 1005 Andy 37 Lit 1001 Obama 56 Rested 1003 Tupac 25 Dead 1005 Andy 37 Lit
GT 8803 // Fall 2019
96
1/3 sel(age>50) = SELECT AVG(age) FROM people WHERE age > 50
id name age status 1001 Obama 56 Rested 1002 Kanye 40 Weird 1003 Tupac 25 Dead 1004 Bieber 23 Crunk 1005 Andy 37 Lit 1001 Obama 56 Rested 1003 Tupac 25 Dead 1005 Andy 37 Lit
GT 8803 // Fall 2019
97
GT 8803 // Fall 2018
98
GT 8803 // Fall 2019
– Single table. – Multiple tables.
99
GT 8803 // Fall 2019
– Sequential Scan – Binary Search (clustered indexes) – Index Scan
100
GT 8803 // Fall 2019
– Search Argument Able – It is usually just picking the best index. – Joins are almost always on foreign key relationships with a small cardinality. – Can be implemented with simple heuristics.
101
GT 8803 // Fall 2019
– We need to restrict search space.
– Modern DBMSs do not always make this assumption anymore.
102
GT 8803 // Fall 2019
103
A B C D
A B C D
A B C D
GT 8803 // Fall 2019
104
A B C D
A B C D
A B C D
GT 8803 // Fall 2019
– Not all left-deep trees are fully pipelined.
105
GT 8803 // Fall 2019
106
GT 8803 // Fall 2019
– Example: Left-deep tree #1, Left-deep tree #2…
– Example: Hash, Sort-Merge, Nested Loop…
– Example: Index #1, Index #2, Seq Scan…
107
GT 8803 // Fall 2018
108
R ⨝ S T T ⨝ S R R ⨝ S ⨝ T
SELECT * FROM R, S, T WHERE R.a = S.a AND S.b = T.b
R S T
GT 8803 // Fall 2018
109
SortMerge Join
R.a=S.a
SortMerge Join
T.b=S.b
Hash Join
T.b=S.b
R ⨝ S T T ⨝ S R R ⨝ S ⨝ T
Hash Join
R.a=S.a
SELECT * FROM R, S, T WHERE R.a = S.a AND S.b = T.b
R S T
GT 8803 // Fall 2018
110
SortMerge Join
R.a=S.a
SortMerge Join
T.b=S.b
Hash Join
T.b=S.b
R ⨝ S T T ⨝ S R R ⨝ S ⨝ T
Hash Join
R.a=S.a
SELECT * FROM R, S, T WHERE R.a = S.a AND S.b = T.b
Cost: 300 Cost: 400 Cost: 280 Cost: 200
R S T
GT 8803 // Fall 2018
111
Hash Join
T.b=S.b
R ⨝ S T T ⨝ S R R ⨝ S ⨝ T
Hash Join
R.a=S.a
SELECT * FROM R, S, T WHERE R.a = S.a AND S.b = T.b
Cost: 300 Cost: 200
R S T
GT 8803 // Fall 2018
112
Hash Join
T.b=S.b
R ⨝ S T T ⨝ S R R ⨝ S ⨝ T
Hash Join
R.a=S.a
Hash Join
S.b=T.b
SortMerge Join
S.b=T.b
SortMerge Join
S.a=R.a
Hash Join
S.a=R.a
SELECT * FROM R, S, T WHERE R.a = S.a AND S.b = T.b
Cost: 300 Cost: 200 Cost: 450 Cost: 300 Cost: 400 Cost: 380
R S T
GT 8803 // Fall 2018
113
Hash Join
T.b=S.b
R ⨝ S T T ⨝ S R R ⨝ S ⨝ T
Hash Join
R.a=S.a
Hash Join
S.b=T.b
SortMerge Join
S.a=R.a
SELECT * FROM R, S, T WHERE R.a = S.a AND S.b = T.b
Cost: 300 Cost: 200 Cost: 300 Cost: 380
R S T
GT 8803 // Fall 2018
114
Hash Join
T.b=S.b
R ⨝ S T T ⨝ S R R ⨝ S ⨝ T
SortMerge Join
S.a=R.a
SELECT * FROM R, S, T WHERE R.a = S.a AND S.b = T.b
Cost: 200 Cost: 300
R S T
GT 8803 // Fall 2019
– Enumerate relation orderings – Enumerate join algorithm choices – Enumerate access method choices
115
SELECT * FROM R, S, T WHERE R.a = S.a AND S.b = T.b
GT 8803 // Fall 2019
116
T R S
S T R
R S T
R S T
S R T
S T R
GT 8803 // Fall 2019
117
T R S
S T R
R S T
R S T
S R T
S T R
Prune plans with cross- products immediately!
GT 8803 // Fall 2019
118
T R S
S T R
R S T
R S T
S R T
S T R
Prune plans with cross- products immediately!
GT 8803 // Fall 2019
119
T R S
S T R
R S T
R S T
S R T
S T R
Prune plans with cross- products immediately!
GT 8803 // Fall 2019
120
R S T
GT 8803 // Fall 2019
121
R S T R S T
NLJ NLJ
R S T
HJ NLJ
R S T
NLJ HJ
R S T
HJ HJ
GT 8803 // Fall 2019
122
R S T
Do this for the other plans.
R S T
NLJ NLJ
R S T
HJ NLJ
R S T
NLJ HJ
R S T
HJ HJ
GT 8803 // Fall 2019
123
R S T
Do this for the other plans.
R S T
NLJ NLJ
R S T
HJ NLJ
R S T
NLJ HJ
R S T
HJ HJ
GT 8803 // Fall 2019
124
R S T
HJ HJ
GT 8803 // Fall 2019
125
R S T
HJ HJ HJ HJ SeqScan SeqScan SeqScan HJ HJ SeqScan IndexScan(S.b) SeqScan
GT 8803 // Fall 2019
126
R S T
HJ HJ
Do this for the other plans.
HJ HJ SeqScan SeqScan SeqScan HJ HJ SeqScan IndexScan(S.b) SeqScan
GT 8803 // Fall 2019
– Left-deep, Right-deep, bushy
– Traditional Dynamic Programming Approach – Genetic Query Optimizer (GEQO)
127
GT 8803 // Fall 2018
128
R S T
NL NL
T R S
NL HJ
S R T
HJ HJ
GT 8803 // Fall 2018
129
R S T
NL NL
Cost:3 00
T R S
NL HJ
S R T
HJ HJ
Cost:2 00 Cost: 100
GT 8803 // Fall 2018
130
Best:100
R S T
NL NL
Cost:3 00
T R S
NL HJ
S R T
HJ HJ
Cost:2 00 Cost: 100
GT 8803 // Fall 2018
131
Best:100
R S T
NL NL
Cost:3 00
T R S
NL HJ
S R T
HJ HJ
Cost:2 00 Cost: 100
GT 8803 // Fall 2018
132
Best:100
R S T
NL NL
Cost:3 00
T R S
NL HJ
S R T
HJ HJ
Cost:2 00 Cost: 100
GT 8803 // Fall 2018
133
Best:100
R S T
NL NL
Cost:3 00
T R S
NL HJ
S R T
HJ HJ
Cost:2 00 Cost: 100
S R T
HJ HJ
R T S
NL HJ
T R S
HJ HJ
GT 8803 // Fall 2018
134
Best:100
R S T
NL NL
Cost:3 00
T R S
NL HJ
S R T
HJ HJ
Cost:2 00 Cost: 100
S R T
HJ HJ
R T S
NL HJ
T R S
HJ HJ
Cost: 80 Cost: 200 Cost: 110
GT 8803 // Fall 2018
135
Best:80 R S T
NL NL
Cost:3 00
T R S
NL HJ
S R T
HJ HJ
Cost:2 00 Cost: 100
S R T
HJ HJ
R T S
NL HJ
T R S
HJ HJ
Cost: 80 Cost: 200 Cost: 110
GT 8803 // Fall 2018
136
Best:80 R S T
NL NL
Cost:3 00
T R S
NL HJ
S R T
HJ HJ
Cost:2 00 Cost: 100
S R T
HJ HJ
R T S
NL HJ
T R S
HJ HJ
Cost: 80 Cost: 200 Cost: 110
GT 8803 // Fall 2018
137
Best:80 R S T
NL NL
Cost:3 00
T R S
NL HJ
S R T
HJ HJ
Cost:2 00 Cost: 100
S R T
HJ HJ
R T S
NL HJ
T R S
HJ HJ
Cost: 80 Cost: 200 Cost: 110
R S T
HJ HJ
R S T
HJ HJ
R T S
HJ HJ
Cost: 90 Cost:1 60 Cost: 120
GT 8803 // Fall 2018
138
GT 8803 // Fall 2019
– BlazeIt (Stanford): Rule-based optimization – PP (Microsoft Research): Cost-based optimization
139
SELECT frameID, vehType, vehColor FROM PROCESS(inputVideo) WHERE vehType=SUV ∧ vehColor=red;
GT 8803 // Fall 2019
– BlazeIt (Stanford): Rule-based optimization – PP (Microsoft Research): Cost-based optimization
140
SELECT frameID, vehType, vehColor FROM PROCESS(inputVideo) WHERE vehType=SUV ∧ vehColor=red;
GT 8803 // Fall 2019
– Train a specialized NN to filter frames with buses – But the NN may not be accurate on every frame – Call the object detection model on uncertain frames – To account for this error rate, it uses held-out set of frames to estimate the selectivity and error rate.
141
GT 8803 // Fall 2019
– F1 :A filter which returns true if the over 80% of the pixels have a red-channel value of at least 200 – F2: A filter that returns the average of the red- channel values
142
GT 8803 // Fall 2019
– sel(F1)= 0.9 and sel(F2)= 0.3
– More selective filter (F2)
143
GT 8803 // Fall 2019
– Old: <vehType=SUV AND vehColor=red> – New: <vehType=SUV> ∧ <vehColor=red>
– p ∧ (Prest) ⇒ Filterp – Filterp∧q ⇒ Filterp ∧ Filterq – Filterp∨q ⇒ Filterp ∨ Filterq
144
GT 8803 // Fall 2019
– Filter evaluation cost (C) – Data reduction ratio (R[Accuracy])
– A smaller ratio of cost to data reduction indicates better performance
145
GT 8803 // Fall 2019
146
GT 8803 // Fall 2019
– Limit the number of different filters to be a small constant (k)
– Available filters: {Fp∨q, Fp, Fp∧¬r, Fq∧¬r, Fq, F¬r} – Query requirements: {Fp∨q, Fp∨ Fq, F¬r, (Fp∨ Fq) ∧ F¬r, Fp∧¬r ∨ Fq∧¬r } – k = 2 – Candidate plans: {Fp∨q, F¬r, Fp∧¬r ∨ Fq∧¬r}
147
GT 8803 // Fall 2019
– First, explore different allocations of the query’s accuracy budget to individual filters. – Next, explore different orderings of filters within a conjunction or disjunction.
– Finally, after fixing both the accuracy thresholds and the order of filters, compute the cost and reduction rate of the resulting plan.
148
GT 8803 // Fall 2019
– Uniformity, Independence, Histograms
149
GT 8803 // Fall 2019
– Popular neural network architecture
150