CS4224/CS5424 Lecture 9 Distributed Query Processing Query - - PowerPoint PPT Presentation
CS4224/CS5424 Lecture 9 Distributed Query Processing Query - - PowerPoint PPT Presentation
CS4224/CS5424 Lecture 9 Distributed Query Processing Query Processing Translates query into a query plan that minimizes some cost function Minimize total cost CPU cost, I/O cost, & communication cost Minimize response time
Query Processing
- Translates query into a query plan that
minimizes some cost function
◮ Minimize total cost ⋆ CPU cost, I/O cost, & communication cost ◮ Minimize response time ⋆ Time elapsed for query execution CS4224/CS5424: Sem 1, 2019/20 Distributed Query Processing 2
Example
- Site A: Relations R(a, c, · · · ) & S(a, · · · )
- Site B: Relations T(b, c, · · · ) & U(b, · · · )
- Query at Site B:
SELECT * FROM R, S, T, U WHERE R.a = S.a AND T.b = U.b AND R.c = T.c
- Query Plans:
⊲ ⊳ ⊲ ⊳ R S ⊲ ⊳ T U
Site B
⊲ ⊳ ⊲ ⊳ ⊲ ⊳ R S T U
Site B
Plan 1 Plan 2
CS4224/CS5424: Sem 1, 2019/20 Distributed Query Processing 3
Example (cont.)
- Cost model:
◮ JC(X,Y) = CPU & I/O cost of joining relations X & Y ◮ CC(X) = Communication cost of sending relation X from one
site to another site
◮ JC(R, S) = 2000,
JC(T, U) = 2000
◮ JC(R ⊲
⊳ S, T) = 1000, JC(R ⊲ ⊳ S ⊲ ⊳ T, U) = 600
◮ JC(R ⊲
⊳ S, T ⊲ ⊳ U) = 100, CC(R ⊲ ⊳ S) = 200
⊲ ⊳ ⊲ ⊳ R S ⊲ ⊳ T U
Site B
⊲ ⊳ ⊲ ⊳ ⊲ ⊳ R S T U
Site B
Plan 1 Plan 2
CS4224/CS5424: Sem 1, 2019/20 Distributed Query Processing 4
Example (cont.)
- Cost model:
◮ JC(X,Y) = CPU & I/O cost of joining relations X & Y ◮ CC(X) = Communication cost of sending relation X from one
site to another site
◮ JC(R, S) = 2000,
JC(T, U) = 2000
◮ JC(R ⊲
⊳ S, T) = 1000, JC(R ⊲ ⊳ S ⊲ ⊳ T, U) = 600
◮ JC(R ⊲
⊳ S, T ⊲ ⊳ U) = 100, CC(R ⊲ ⊳ S) = 200
100 2000
R S
2000
T U
200
600 1000 2000
R S T U
200
Plan 1: Total Cost = 4300 Plan 2: Total Cost = 3800
CS4224/CS5424: Sem 1, 2019/20 Distributed Query Processing 5
Example (cont.)
- Cost model:
◮ JC(X,Y) = CPU & I/O cost of joining relations X & Y ◮ CC(X) = Communication cost of sending relation X from one
site to another site
◮ JC(R, S) = 2000,
JC(T, U) = 2000
◮ JC(R ⊲
⊳ S, T) = 1000, JC(R ⊲ ⊳ S ⊲ ⊳ T, U) = 600
◮ JC(R ⊲
⊳ S, T ⊲ ⊳ U) = 100, CC(R ⊲ ⊳ S) = 200
100 2000
R S
2000
T U
200
600 1000 2000
R S T U
200
Plan 1: Response Time = 2300 Plan 2: Response Time = 3800
CS4224/CS5424: Sem 1, 2019/20 Distributed Query Processing 6
Query Processing Steps
- Query rewriting
◮ Query decomposition ⋆ Translates query into relational algebra query ◮ Data localization ⋆ Rewrites distributed query into a fragment query
- Global query optimization
◮ Finds an optimal execution plan for query
- Distributed query execution
◮ Executes query plan to compute query result CS4224/CS5424: Sem 1, 2019/20 Distributed Query Processing 7
Query Decomposition
SELECT * FROM R, S WHERE R.a = S.a AND R.b = 20
− →
⊲ ⊳R.a=S.a σb=20 R S
- Normalization
◮ Rewrites query into some normal form
- Semantic Analysis
◮ Checks that query is semantically correct
- Simplification & Restructuring
◮ Rewrites query into simpler form (e.g., eliminates redundancy) CS4224/CS5424: Sem 1, 2019/20 Query Decomposition 8
Normalization
- A simple predicate defined on a relation R is of
the form “Ai op v” where Ai is an attribute of R,
- p ∈ {=, =, <, ≤, >, ≥} and v ∈ Domain(Ai)
- Conjunctive Normal Form (CNF)
(p11 ∨p12 · · ·∨p1n1) ∧ · · ·∧ (pm1 ∨pm2 · · ·∨pmnm)
- Disjunctive Normal Form (DNF)
(p11 ∧p12 · · ·∧p1n1) ∨ · · ·∨ (pm1 ∧pm2 · · ·∧pmnm)
- Each pij is a simple predicate
CS4224/CS5424: Sem 1, 2019/20 Query Decomposition: Normalization 9
Review of RA Equivalence Rules
attributes(R) = Set of attributes in schema of relation R attributes(p) = Set of attributes in predicate p
- 1. Commutativity of binary operators
1.1 R × S ≡ S × R 1.2 R ⊲ ⊳ S ≡ S ⊲ ⊳ R
- 2. Associativity of binary operators
2.1 (R × S) × T ≡ R × (S × T) 2.2 (R ⊲ ⊳ S) ⊲ ⊳ T ≡ R ⊲ ⊳ (S ⊲ ⊳ T)
- 3. Idempotence of unary operators
3.1 πL′(πL(R)) ≡ πL′(R) if L′ ⊆ L ⊆ attributes(R) 3.2 σp1(σp2(R)) ≡ σp1∧p2(R)
CS4224/CS5424: Sem 1, 2019/20 Query Decomposition: Restructuring 10
Review of RA Equivalence Rules (cont.)
- 4. Commutating selection with projection
4.1 πL(σp(R)) ≡ πL(σp(πL∪attributes(p)(R)))
- 5. Commutating selection with binary
- perators
5.1 σp(R × S) ≡ σp(R) × S if attributes(p) ⊆ attributes(R) 5.2 σp(R ⊲ ⊳p′ S) ≡ σp(R) ⊲ ⊳p′ S if attributes(p) ⊆ attributes(R) 5.3 σp(R ∪ S) ≡ σp(R) ∪ σp(S)
CS4224/CS5424: Sem 1, 2019/20 Query Decomposition: Restructuring 11
Review of RA Equivalence Rules (cont.)
- 6. Commutating projection with binary
- perators
Let L = LR ∪ LS, where LR ⊆ attributes(R) and LS ⊆ attributes(S)
6.1 πL(R × S) ≡ πLR(R) × πLS(S) 6.2 πL(R ⊲ ⊳p S) ≡ πLR(R) ⊲ ⊳p πLS(S) if attributes(p) ∩ attributes(R) ⊆ LR and attributes(p) ∩ attributes(S) ⊆ LS 6.3 πL(R ∪ S) ≡ πL(R) ∪ πL(S)
CS4224/CS5424: Sem 1, 2019/20 Query Decomposition: Restructuring 12
Example
Student (sid, sname, major) Course (cid, cname, area) Enrol (sid,cid, grade)
SELECT sname, cname FROM Student S, Course C, Enrol E WHERE E.sid = S.sid AND E.cid = C.cid AND major = "CS" AND area = "DB" AND grade = "F"
− →
πsname, cname σ(major=”CS”)∧(area=”DB”)∧(grade=”F”) ⊲ ⊳C.cid=E.cid Course C ⊲ ⊳E.sid=S.sid Enrol E Student S
CS4224/CS5424: Sem 1, 2019/20 Query Decomposition: Restructuring 13
Example (cont.)
πsname, cname σ(major=”CS”)∧(area=”DB”)∧(grade=”F”) ⊲ ⊳C.cid=E.cid Course C ⊲ ⊳E.sid=S.sid Enrol E Student S
3.2
− →
πsname, cname σmajor=”CS” σarea=”DB” σgrade=”F” ⊲ ⊳C.cid=E.cid Course C ⊲ ⊳E.sid=S.sid Enrol E Student S
σp1(σp2(R)) ≡ σp1∧p2(R)
CS4224/CS5424: Sem 1, 2019/20 Query Decomposition: Restructuring 14
Example (cont.)
πsname, cname σmajor=”CS” σarea=”DB” ⊲ ⊳C.cid=E.cid Course C σgrade=”F” ⊲ ⊳E.sid=S.sid Enrol E Student S
5.2
← −
πsname, cname σmajor=”CS” σarea=”DB” σgrade=”F” ⊲ ⊳C.cid=E.cid Course C ⊲ ⊳E.sid=S.sid Enrol E Student S
σp(R ⊲ ⊳p′ S) ≡ σp(R) ⊲ ⊳p′ S if attributes(p) ⊆ attributes(R)
CS4224/CS5424: Sem 1, 2019/20 Query Decomposition: Restructuring 14
Example (cont.)
πsname, cname σmajor=”CS” σarea=”DB” ⊲ ⊳C.cid=E.cid Course C σgrade=”F” ⊲ ⊳E.sid=S.sid Enrol E Student S
5.2
− →
πsname, cname σmajor=”CS” σarea=”DB” ⊲ ⊳C.cid=E.cid Course C ⊲ ⊳E.sid=S.sid σgrade=”F” Enrol E Student S
σp(R ⊲ ⊳p′ S) ≡ σp(R) ⊲ ⊳p′ S if attributes(p) ⊆ attributes(R)
CS4224/CS5424: Sem 1, 2019/20 Query Decomposition: Restructuring 14
Example (cont.)
πsname, cname σmajor=”CS” ⊲ ⊳C.cid=E.cid σarea=”DB” Course C ⊲ ⊳E.sid=S.sid σgrade=”F” Enrol E Student S
5.2
← −
πsname, cname σmajor=”CS” σarea=”DB” ⊲ ⊳C.cid=E.cid Course C ⊲ ⊳E.sid=S.sid σgrade=”F” Enrol E Student S
σp(R ⊲ ⊳p′ S) ≡ σp(R) ⊲ ⊳p′ S if attributes(p) ⊆ attributes(R)
CS4224/CS5424: Sem 1, 2019/20 Query Decomposition: Restructuring 14
Example (cont.)
πsname, cname σmajor=”CS” ⊲ ⊳C.cid=E.cid σarea=”DB” Course C ⊲ ⊳E.sid=S.sid σgrade=”F” Enrol E Student S
5.2
− →
πsname, cname ⊲ ⊳C.cid=E.cid σarea=”DB” Course C σmajor=”CS” ⊲ ⊳E.sid=S.sid σgrade=”F” Enrol E Student
σp(R ⊲ ⊳p′ S) ≡ σp(R) ⊲ ⊳p′ S if attributes(p) ⊆ attributes(R)
CS4224/CS5424: Sem 1, 2019/20 Query Decomposition: Restructuring 14
Example (cont.)
πsname, cname ⊲ ⊳C.cid=E.cid σarea=”DB” Course C ⊲ ⊳E.sid=S.sid σgrade=”F” Enrol E σmajor=”CS” Student S
5.2
← −
πsname, cname ⊲ ⊳C.cid=E.cid σarea=”DB” Course C σmajor=”CS” ⊲ ⊳E.sid=S.sid σgrade=”F” Enrol E Student
σp(R ⊲ ⊳p′ S) ≡ σp(R) ⊲ ⊳p′ S if attributes(p) ⊆ attributes(R)
CS4224/CS5424: Sem 1, 2019/20 Query Decomposition: Restructuring 14
Example (cont.)
πsname, cname ⊲ ⊳C.cid=E.cid σarea=”DB” Course C ⊲ ⊳E.sid=S.sid σgrade=”F” Enrol E σmajor=”CS” Student S
3.1
− →
πsname, cname πsname, cname, cid ⊲ ⊳C.cid=E.cid σarea=”DB” Course C ⊲ ⊳E.sid=S.sid σgrade=”F” Enrol E σmajor=” Student
πL′(πL(R)) ≡ πL′(R) if L′ ⊆ L ⊆ attributes(R)
CS4224/CS5424: Sem 1, 2019/20 Query Decomposition: Restructuring 14
Example (cont.)
πsname, cname ⊲ ⊳C.cid=E.cid πcname, cid σarea=”DB” Course C πsname, cid ⊲ ⊳E.sid=S.sid σgrade=”F” Enrol E σmajor=”CS” Student S
6.2
← −
πsname, cname πsname, cname, cid ⊲ ⊳C.cid=E.cid σarea=”DB” Course C ⊲ ⊳E.sid=S.sid σgrade=”F” Enrol E σmajor=” Student
πL(R ⊲ ⊳p S) ≡ πLR(R) ⊲ ⊳p πLS(S)
CS4224/CS5424: Sem 1, 2019/20 Query Decomposition: Restructuring 14
Example (cont.)
πsname, cname ⊲ ⊳C.cid=E.cid πcname, cid σarea=”DB” Course C πsname, cid ⊲ ⊳E.sid=S.sid σgrade=”F” Enrol E σmajor=”CS” Student S
3.1
− →
πsname, cname ⊲ ⊳C.cid=E.cid πcname, cid σarea=”DB” Course C πsname, cid πsname, cid, sid ⊲ ⊳E.sid=S.sid σgrade=”F” Enrol E σmajor=” Student
πL′(πL(R)) ≡ πL′(R) if L′ ⊆ L ⊆ attributes(R)
CS4224/CS5424: Sem 1, 2019/20 Query Decomposition: Restructuring 14
Example (cont.)
πsname, cname ⊲ ⊳C.cid=E.cid πcname, cid σarea=”DB” Course C πsname, cid ⊲ ⊳E.sid=S.sid πcid, sid σgrade=”F” Enrol E πsname, sid σmajor=”CS” Student S
6.2
← −
πsname, cname ⊲ ⊳C.cid=E.cid πcname, cid σarea=”DB” Course C πsname, cid πsname, cid, sid ⊲ ⊳E.sid=S.sid σgrade=”F” Enrol E σmajor=” Student
πL(R ⊲ ⊳p S) ≡ πLR(R) ⊲ ⊳p πLS(S)
CS4224/CS5424: Sem 1, 2019/20 Query Decomposition: Restructuring 14
Example (cont.)
πsname, cname σ(major=”CS”)∧(area=”DB”)∧(grade=”F”) ⊲ ⊳C.cid=E.cid Course C ⊲ ⊳E.sid=S.sid Enrol E Student S πsname, cname ⊲ ⊳C.cid=E.cid πcname, cid σarea=”DB” Course C πsname, cid ⊲ ⊳E.sid=S.sid πcid, sid σgrade=”F” Enrol E πsname, sid σmajor=”CS” Student S
Original Query Final Query
CS4224/CS5424: Sem 1, 2019/20 Query Decomposition: Restructuring 15
Query Localization
- Rewrites distributed query into a fragment query
- Uses data distribution information to determine
which fragments are involved
⊲ ⊳R.a=S.a σb=20 R S
− →
∪ ⊲ ⊳ S1 σb=20 R1 ⊲ ⊳ S2 σb=20 R2
Distributed Query Fragment Query
CS4224/CS5424: Sem 1, 2019/20 Query Localization 16
Localization Program
- A localization program for a fragmented relation
R is a reconstruction rule for R in terms of its fragments
- Example:
◮ Let {R1, · · · , Rn} be a complete & disjoint partitioning of R ◮ If each Ri = σFi(R), then localization program for R is
R1 ∪ · · · ∪ Rn
◮ If each Ri = πLi(R) & key(R) ∈ Li, then localization program for
R is R1 ⊲ ⊳ · · · ⊲ ⊳ Rn
CS4224/CS5424: Sem 1, 2019/20 Query Localization 17
Localized Query
- Localized query = query with each fragmented
relation replaced by its localization program
R1 = σa≤10(R), R2 = σa>10(R) S1 = σa≤10(S), S2 = σa>10(S) ⊲ ⊳R.a=S.a σb=20 R S
− →
⊲ ⊳R.a=S.a σb=20 ∪ R1 R2 ∪ S1 S2
Distributed Query Localized Query
CS4224/CS5424: Sem 1, 2019/20 Query Localization 18
Reduction Techniques
- Reduction techniques = rewriting techniques to
simplify localized queries
- Identify & eliminate query expressions on
fragments that do not contribute to query results
- Techniques:
◮ Reduction for primary horizontal fragmentation ⋆ Reduction with selection ⋆ Reduction with join ◮ Reduction for derived horizontal fragmentation ◮ Reduction for vertical fragmentation ◮ Reduction for hybrid fragmentation CS4224/CS5424: Sem 1, 2019/20 Query Localization 19
Reduction with Selection
Rule 1: σp(Ri) = ∅ if Ri = σFi(R) and Fi ∧ p = false
R = R1 ∪ R2 ∪ R3 R1 = σa<10(R) R2 = σa∈[10,70](R) R3 = σa>70(R) Q1 = σa=12(R) = σa=12(R1 ∪ R2 ∪ R3) = σa=12(R1) ∪ σa=12(R2) ∪ σa=12(R3) = σa=12(R2)
CS4224/CS5424: Sem 1, 2019/20 Query Localization 20
Reduction with Join
Rule 2: Ri ⊲ ⊳a Sj = ∅ if Ri = σFa∧F(R), Sj = σF ′
a∧F ′(S),
Fa&F ′
a are predicates on attribute a, and
Fa ∧ F ′
a = false
R = R1 ∪ R2 ∪ R3 R1 = σa<10(R) R2 = σa∈[10,70](R) R3 = σa>70(R) S = S1 ∪ S2 S1 = σa<10(S) S2 = σa≥10(S)
Q = R ⊲ ⊳a S = (R1 ∪ R2 ∪ R3) ⊲ ⊳a (S1 ∪ S2) = (R1 ⊲ ⊳a S1) ∪ (R1 ⊲ ⊳a S2) ∪ (R2 ⊲ ⊳a S1) ∪ (R2 ⊲ ⊳a S2) ∪ (R3 ⊲ ⊳a S1) ∪ (R3 ⊲ ⊳a S2) = (R1 ⊲ ⊳a S1) ∪ (R2 ⊲ ⊳a S2) ∪ (R3 ⊲ ⊳a S2)
CS4224/CS5424: Sem 1, 2019/20 Query Localization 21
Reduction for Derived Fragmentation
Rule 3: Si ⊲ ⊳a Rj = ∅ if Si = S ⋉a Ri is a derived horizontal fragmentation of S wrt R, and i = j Consider R(a, b, c), S(x, y, a) where S.a is a foreign key of R
- R1 = σb=10(R),
R2 = σb=10(R)
- S1 = S ⋉a R1,
S2 = S ⋉a R2 Q = σb=20(R) ⊲ ⊳a S = σb=20(R1 ∪ R2) ⊲ ⊳a (S1 ∪ S2) = σb=20(R2) ⊲ ⊳a (S1 ∪ S2) = (σb=20(R2) ⊲ ⊳a S1) ∪ (σb=20(R2) ⊲ ⊳a S2) = σb=20(R2) ⊲ ⊳a S2
CS4224/CS5424: Sem 1, 2019/20 Query Localization 22
Reduction for Vertical Fragmentation
Rule 4: πL(R1 ⊲ ⊳ R2 ⊲ ⊳ · · · ⊲ ⊳ Rn) = πL(R2 ⊲ ⊳ · · · ⊲ ⊳ Rn) if R1, · · · , Rn are vertical fragments of R and (attributes(R1) − key(R)) ∩ L = ∅ Consider R(a,b,c) where R = R1 ⊲ ⊳a R2 R1 = πa,b(R) R2 = πa,c(R) Q = πc(R) = πc(R1 ⊲ ⊳a R2) = πc(R2)
CS4224/CS5424: Sem 1, 2019/20 Query Localization 23
Reduction for Hybrid Fragmentation
R = (R1 ∪ R2) ⊲ ⊳a R3 R1 = πa,b(σa<10(R)) R2 = πa,b(σa≥10(R)) R3 = πa,c(R) Q = πb(σa=20(R)) = πb(σa=20((R1 ∪ R2) ⊲ ⊳a R3) = πb(σa=20(R1 ∪ R2)) = πb(σa=20(R1) ∪ σa=20(R2)) = πb(σa=20(R2))
CS4224/CS5424: Sem 1, 2019/20 Query Localization 24
Distributed Join Strategies for R ⊲ ⊳A S
- Assume that both R & S have been partitioned
- ver all the nodes
- Collocated/Local join Both R and S have been
partitioned on join key.
- Directed join Only table R has been partitioned
- n join key. Dynamically repartition S on join
key.
- Broadcast join Replicate table S to all nodes.
- Repartitioned join Neither of the tables has been
partitioned on join key. Repartition both tables
- ver all the nodes using the join key as
partitioning key.
CS4224/CS5424: Sem 1, 2019/20 Distributed Join Strategies 25
Join Strategies: Example
- Customers (cust#, cname, city)
- Orders (order#, cust#, odate)
- Suppliers (supp#, sname, city)
Customers cust# cname city 1 Alice Singapore 2 Bob Penang 3 Carol Bangkok 4 Dave Singapore 5 Eve Singapore 6 Fred Penang 7 George Bangkok Orders
- rder#
cust#
- date
302 1 June 2013 304 2 May 2013 307 3 Nov 2013 308 1 April 2013 309 5 May 2013 311 6 Dec 2013 312 3 July 2013 Suppliers supp# sname city 32 A Bangkok 33 B Singapore 34 C Singapore 36 D Bangkok 37 E Penang 38 F Penang 39 G Singapore CS4224/CS5424: Sem 1, 2019/20 Distributed Join Strategies 26
Collocated Join: Example
- Customers hash partitioned on cust# using hcust
- Orders hash partitioned on cust# using hcust
- hcust(c) = (c mod 3) + 1
Site 1 Customers1 cust# cname city 3 Carol Bangkok 6 Fred Penang Orders1
- rder#
cust#
- date
307 3 Nov 2013 311 6 Dec 2013 312 3 July 2013 Site 2 Customers2 cust# cname city 1 Alice Singapore 4 Dave Singapore 7 George Bangkok Orders2
- rder#
cust#
- date
302 1 June 2013 308 1 April 2013 Site 3 Customers3 cust# cname city 2 Bob Penang 5 Eve Singapore Orders3
- rder#
cust#
- date
304 2 May 2013 309 5 May 2013 CS4224/CS5424: Sem 1, 2019/20 Distributed Join Strategies 27
Collocated Join: Example (cont.)
- Query 1: Customers ⊲
⊳cust# Orders
Site 1 Customers1 cust# cname city 3 Carol Bangkok 6 Fred Penang Orders1
- rder#
cust#
- date
307 3 Nov 2013 311 6 Dec 2013 312 3 July 2013 Site 2 Customers2 cust# cname city 1 Alice Singapore 4 Dave Singapore 7 George Bangkok Orders2
- rder#
cust#
- date
302 1 June 2013 308 1 April 2013 Site 3 Customers3 cust# cname city 2 Bob Penang 5 Eve Singapore Orders3
- rder#
cust#
- date
304 2 May 2013 309 5 May 2013
3
- i=1
(Customersi ⊲ ⊳ Ordersi)
CS4224/CS5424: Sem 1, 2019/20 Distributed Join Strategies 28
Directed Join: Example
- Customers hash partitioned on cust# using hcust
- Orders hash partitioned on order# using horder
- hcust(c) = (c mod 3) + 1
- horder(o) = (o mod 3) + 1
Site 1 Customers1 cust# cname city 3 Carol Bangkok 6 Fred Penang Orders1
- rder#
cust#
- date
309 5 May 2013 312 3 July 2013 Site 2 Customers2 cust# cname city 1 Alice Singapore 4 Dave Singapore 7 George Bangkok Orders2
- rder#
cust#
- date
304 2 May 2013 307 3 Nov 2013 Site 3 Customers3 cust# cname city 2 Bob Penang 5 Eve Singapore Orders3
- rder#
cust#
- date
302 1 June 2013 308 1 April 2013 311 6 Dec 2013 CS4224/CS5424: Sem 1, 2019/20 Distributed Join Strategies 29
Directed Join: Example (cont.)
- Query 1: Customers ⊲
⊳cust# Orders
Site 1 Customers1 cust# cname city 3 Carol Bangkok 6 Fred Penang Orders1
- rder#
cust#
- date
309 5 May 2013 312 3 July 2013 Site 2 Customers2 cust# cname city 1 Alice Singapore 4 Dave Singapore 7 George Bangkok Orders2
- rder#
cust#
- date
304 2 May 2013 307 3 Nov 2013 Site 3 Customers3 cust# cname city 2 Bob Penang 5 Eve Singapore Orders3
- rder#
cust#
- date
302 1 June 2013 308 1 April 2013 311 6 Dec 2013 CS4224/CS5424: Sem 1, 2019/20 Distributed Join Strategies 30
Directed Join: Example (cont.)
- Query 1: Customers ⊲
⊳cust# Orders
Site 1 Customers1 cust# cname city 3 Carol Bangkok 6 Fred Penang Orders1
- rder#
cust#
- date
309 5 May 2013 312 3 July 2013 Site 2 Customers2 cust# cname city 1 Alice Singapore 4 Dave Singapore 7 George Bangkok Orders2
- rder#
cust#
- date
304 2 May 2013 307 3 Nov 2013 Site 3 Customers3 cust# cname city 2 Bob Penang 5 Eve Singapore Orders3
- rder#
cust#
- date
302 1 June 2013 308 1 April 2013 311 6 Dec 2013
Repartition Orders on cust# : Orders′
i = σhcust(cust#)=i(Orders)
3
- i=1
(Customersi ⊲ ⊳ Orders′
i )
CS4224/CS5424: Sem 1, 2019/20 Distributed Join Strategies 31
Broadcast Join: Example
- Customers hash partitioned on cust# using hcust
- Suppliers hash partitioned on supp# using hsupp
- hcust(c) = (c mod 3) + 1
- hsupp(s) = (s mod 3) + 1
Site 1 Customers1 cust# cname city 3 Carol Bangkok 6 Fred Penang Suppliers1 supp# sname city 33 B Singapore 36 D Bangkok 39 G Singapore Site 2 Customers2 cust# cname city 1 Alice Singapore 4 Dave Singapore 7 George Bangkok Suppliers2 supp# sname city 34 C Singapore 37 E Penang Site 3 Customers3 cust# cname city 2 Bob Penang 5 Eve Singapore Suppliers3 supp# sname city 32 A Bangkok 38 F Penang CS4224/CS5424: Sem 1, 2019/20 Distributed Join Strategies 32
Broadcast Join: Example (cont.)
- Query 2: Customers ⊲
⊳city Suppliers
Site 1 Customers1 cust# cname city 3 Carol Bangkok 6 Fred Penang Suppliers1 supp# sname city 33 B Singapore 36 D Bangkok 39 G Singapore Site 2 Customers2 cust# cname city 1 Alice Singapore 4 Dave Singapore 7 George Bangkok Suppliers2 supp# sname city 34 C Singapore 37 E Penang Site 3 Customers3 cust# cname city 2 Bob Penang 5 Eve Singapore Suppliers3 supp# sname city 32 A Bangkok 38 F Penang CS4224/CS5424: Sem 1, 2019/20 Distributed Join Strategies 33
Broadcast Join: Example (cont.)
Query 2: Customers ⊲ ⊳city Suppliers
Site 1 Customers1 cust# cname city 3 Carol Bangkok 6 Fred Penang Suppliers1 supp# sname city 33 B Singapore 36 D Bangkok 39 G Singapore Site 2 Customers2 cust# cname city 1 Alice Singapore 4 Dave Singapore 7 George Bangkok Suppliers2 supp# sname city 34 C Singapore 37 E Penang Site 3 Customers3 cust# cname city 2 Bob Penang 5 Eve Singapore Suppliers3 supp# sname city 32 A Bangkok 38 F Penang
Option 1: Broadcast Suppliers
3
- i=1
(Customersi ⊲ ⊳ Suppliers)
CS4224/CS5424: Sem 1, 2019/20 Distributed Join Strategies 34
Broadcast Join: Example (cont.)
- Query 2: Customers ⊲
⊳city Suppliers
Site 1 Customers1 cust# cname city 3 Carol Bangkok 6 Fred Penang Suppliers1 supp# sname city 33 B Singapore 36 D Bangkok 39 G Singapore Site 2 Customers2 cust# cname city 1 Alice Singapore 4 Dave Singapore 7 George Bangkok Suppliers2 supp# sname city 34 C Singapore 37 E Penang Site 3 Customers3 cust# cname city 2 Bob Penang 5 Eve Singapore Suppliers3 supp# sname city 32 A Bangkok 38 F Penang
Option 2: Broadcast Customers
3
- i=1
(Customers ⊲ ⊳ Suppliersi)
CS4224/CS5424: Sem 1, 2019/20 Distributed Join Strategies 35
Repartitioned Join: Example
- Customers hash partitioned on cust# using hcust
- Suppliers hash partitioned on supp# using hsupp
- hcust(c) = (c mod 3) + 1
- hsupp(s) = (s mod 3) + 1
Site 1 Customers1 cust# cname city 3 Carol Bangkok 6 Fred Penang Suppliers1 supp# sname city 33 B Singapore 36 D Bangkok 39 G Singapore Site 2 Customers2 cust# cname city 1 Alice Singapore 4 Dave Singapore 7 George Bangkok Suppliers2 supp# sname city 34 C Singapore 37 E Penang Site 3 Customers3 cust# cname city 2 Bob Penang 5 Eve Singapore Suppliers3 supp# sname city 32 A Bangkok 38 F Penang CS4224/CS5424: Sem 1, 2019/20 Distributed Join Strategies 36
Repartitioned Join: Example (cont.)
- Query 2: Customers ⊲
⊳city Suppliers
Site 1 Customers1 cust# cname city 3 Carol Bangkok 6 Fred Penang Suppliers1 supp# sname city 33 B Singapore 36 D Bangkok 39 G Singapore Site 2 Customers2 cust# cname city 1 Alice Singapore 4 Dave Singapore 7 George Bangkok Suppliers2 supp# sname city 34 C Singapore 37 E Penang Site 3 Customers3 cust# cname city 2 Bob Penang 5 Eve Singapore Suppliers3 supp# sname city 32 A Bangkok 38 F Penang CS4224/CS5424: Sem 1, 2019/20 Distributed Join Strategies 37
Repartitioned Join: Example (cont.)
- Query 2: Customers ⊲
⊳city Suppliers
- Repartition both tables using hcity
c hcity(c) Singapore 1 Penang 2 Bangkok 3
- Customers′
i = σhcity(city)=i(Customers)
- Suppliers′
i = σhcity(city)=i(Suppliers)
CS4224/CS5424: Sem 1, 2019/20 Distributed Join Strategies 38
Repartitioned Join: Example (cont.)
Query 2: Customers ⊲ ⊳city Suppliers
Site 1 Customers1 cust# cname city 3 Carol Bangkok 6 Fred Penang Suppliers1 supp# sname city 33 B Singapore 36 D Bangkok 39 G Singapore Site 2 Customers2 cust# cname city 1 Alice Singapore 4 Dave Singapore 7 George Bangkok Suppliers2 supp# sname city 34 C Singapore 37 E Penang Site 3 Customers3 cust# cname city 2 Bob Penang 5 Eve Singapore Suppliers3 supp# sname city 32 A Bangkok 38 F Penang
3
- i=1
(Customers′
i ⊲
⊳ Suppliers′
i )
CS4224/CS5424: Sem 1, 2019/20 Distributed Join Strategies 39
Comparison of join strategies for R ⊲ ⊳ S
Let the relations be partitioned over n nodes Join Strategy Communication Cost Collocated Directed size(R) if R is being repartitioned Broadcast (n − 1) × size(R) if R is being broadcast Repartitioned size(R) + size(S)
CS4224/CS5424: Sem 1, 2019/20 Distributed Join Strategies 40
Query Processing in Google’s F1
- Google’s early NewSQL system
◮ Distributed relational database system ◮ Hybrid of NoSQL & RDBMS ◮ NoSQL: High availability, scalability ◮ RDBMS: Consistency, usability
- Used by Google’s AdWords system since 2012
◮ 100s of applications & 1000s of users ◮ Database is over 100 TB, 105 requests/sec CS4224/CS5424: Sem 1, 2019/20 Distributed Join Strategies: Google’s F1 41
Query Processing in Google’s F1
SELECT agc.CampaignId, ac.Region, c.Language, SUM(ac.Clicks) FROM AdClick ac JOIN AdGroupCreative agc USING (AdGroupId, CreativeId) JOIN Creative c USING (CustomerId, CreativeId) WHERE c.Date = ’2013-02-23’ GROUP BY agc.CampaignId, ac.Region, c.Language
Creative (CreativeId, CustomerId, Language, ...) AdgroupCreative (AdGroupId, CreativeId, CampaignId, CustomerId, ...) Adclick (AdGroupId, CreativeId, Region, Date, Clicks, ...)
(Shute, et al., 2013)
CS4224/CS5424: Sem 1, 2019/20 Distributed Join Strategies: Google’s F1 42
References
- T. Özsu & P
. Valdureiz, Overview of Query Processing, Chapter 6, Principles of Distributed Database Systems, 3rd Edition, 2011
- T. Özsu & P
. Valdureiz, Query Decomposition and Data Localization, Chapter 7, Principles of Distributed Database Systems, 3rd Edition, 2011
- C. Baru, et al. DB2 Parallel Edition, IBM Systems Journal,
34(2), 1995.
- Google goes back to the future with SQL F1 database,
The Register, August 2013, https://www.theregister.co.uk/2013/08/30/google_f1_deepdive/
- F1: A Distributed SQL Database That Scales, VLDB
2013, https://research.google.com/pubs/pub41344.html
CS4224/CS5424: Sem 1, 2019/20 Distributed Query Processing 43