Reverse Engineering Top-k Join Queries
Kiril Panev
panev@cs.uni-kl.de
Sebastian Michel
smichel@cs.uni-kl.de
Nico Weisenauer
n_weisenau10@cs.uni-kl.de
Reverse Engineering Top-k Join Queries Kiril Panev - - PowerPoint PPT Presentation
Reverse Engineering Top-k Join Queries Kiril Panev panev@cs.uni-kl.de Nico Weisenauer n_weisenau10@cs.uni-kl.de Sebastian Michel smichel@cs.uni-kl.de SELECT c.name, max(o.price) Top-3 Customers FROM customers c, Bruce Campbell 1000
panev@cs.uni-kl.de
smichel@cs.uni-kl.de
n_weisenau10@cs.uni-kl.de
2
SELECT c.name, max(o.price) FROM customers c,
WHERE c.customer_id = o.customer_id AND c.country = ‘England’ GROUP BY c.name ORDER by max(o.price) DESC LIMIT 3
reverse engineer
3
SELECT c.name, max(o.price) FROM customers c,
WHERE c.customer_id = o.customer_id AND c.country = ‘England’ GROUP BY c.name ORDER by max(o.price) DESC LIMIT 5
Customer ID Name Country Balance 1 John Doe England 250.49 2 Adam Miller England 124.56 7 Sam Burns Scotland 154.67 12 Benjamin Smith Wales 1955.22 50 Bruce Campbell England 45.99 … … … … Order ID Customer ID Price Date 1 1 24.50 11/28/14 2 1 749.90 04/01/15 23 2 22.49 12/01/11 24 2 199.99 12/30/12 78 50 1.99 10/01/12 79 50 1000.00 02/27/15 … … … …
Customers Orders
entity score Bruce Campbell 1000.00 John Doe 749.90 Adam Miller 199.99
L
4
Customer ID Name Country Balance 1 John Doe England 250.49 2 Adam Miller England 124.56 7 Sam Burns Scotland 154.67 12 Benjamin Smith Wales 1955.22 50 Bruce Campbell England 45.99 … … … … Order ID Customer ID Price Date 1 1 24.50 11/28/14 2 1 749.90 04/01/15 23 2 22.49 12/01/11 24 2 199.99 12/30/12 78 50 1.99 10/01/12 79 50 1000.00 02/27/15 … … … …
Customers Orders
entity score Bruce Campbell 1000.00 John Doe 749.90 Adam Miller 199.99
L
entity score? score?
5
Customer ID Name Country Balance 1 John Doe England 250.49 2 Adam Miller England 124.56 7 Sam Burns Scotland 154.67 12 Benjamin Smith Wales 1955.22 50 Bruce Campbell England 45.99 … … … … Order ID Customer ID Price Date 1 1 24.50 11/28/14 2 1 749.90 04/01/15 23 2 22.49 12/01/11 24 2 199.99 12/30/12 78 50 1.99 10/01/12 79 50 1000.00 02/27/15 … … … …
Customers Orders
entity score Bruce Campbell 1000.00 John Doe 749.90 Adam Miller 199.99
L
entity score
SELECT c.name, max(o.price) FROM customers c,
WHERE c.customer_id = o.customer_id AND c.country = ‘England’ GROUP BY c.name ORDER by max(o.price) DESC LIMIT 5
6
7
8
SELECT entity, score FROM A, B, … WHERE A.id = B.a_id … AND P1 and P2 and … GROUP BY entity ORDER BY 2 DESC LIMIT k reverse engineer
join predicate
entity score Bruce Campbell 1000.00 John Doe 749.90 Adam Miller 199.99 [Panev, Michel, EDBT 2016]
Find Join Predicates Ranked Candidate Joins Top-k list Valid Queries Find Predicates and Ranking Criteria Candidate Query Validation
9
1 2 3 4 5 6
10
Find Predicates, Ranking Criteria, and Candidate Query Validation
Customer ID Name Country Balance 1 John Doe England 250.49 2 Adam Miller England 124.56 7 Sam Burns Scotland 154.67 12 Benjamin Smith Wales 1955.22 50 Bruce Campbell England 45.99 … … … … Order ID Customer ID Price Date 1 1 24.50 11/28/14 2 1 749.90 04/01/15 23 2 22.49 12/01/11 24 2 199.99 12/30/12 78 50 1.99 10/01/12 79 50 1000.00 02/27/15 … … … …
Customers Orders
entity score Bruce Campbell 1000.00 John Doe 749.90 Adam Miller 199.99
L
entity score? score?
11
12
Customer1 Nation1 Orders1 Region1 Customer2 Supplier1 Customer3 Lineitem1 ... ... ... ... ... ... ... ... ... ...
13
14
15
16
SELECT A.entity, * FROM A, B, C WHERE A.id = B.id AND B.id = C.id AND A.entity = ‘e1‘
17
L.v 100 ∘ ∈ {+,×}
18
19
20
Find Join Predicates Ranked Candidate Joins Top-k list Valid Queries Find Predicates and Ranking Criteria Candidate Query Validation
21
22
score: max(A), avg(A), sum(A), sum(A+B), sum(A∗B), no-agg
23
5 10 15 20 25 1 2 3 4 5 6 8 AVG
Baseline Advanced
24
0.4 0.8 1.2 1.6 2 2.4 2.8 3.2 3.6 Baseline Advanced
Time in s
Step1 Step2 S3+S4+S5 Instance Verification
25
TPC-H
10 20 30 40 50 60 BASELINE ADVANCED
Time in s
AVG MAX SUM S2P S2S NO
(a) 1-Join
50 100 150 200 250 300 350 400 BASELINE ADVANCED
Time in s
AVG MAX SUM S2P S2S NO
(b) 2-Joins
100 200 300 400 500 600 700 BASELINE ADVANCED
Time in s
AVG MAX SUM S2P S2S NO
(c) 3-Joins
26
27
Success_hero.jpg.aspx
28