Research Collection https://doi.org/10.3929/ethz-b-000294819 - PDF document

Research Collection https://doi.org/10.3929/ethz-b-000294819 information please consult the Terms of use. This page was generated automatically upon download from the ETH Zurich Research Collection. For more Creative Commons Attribution 4.0 International Rights / License: http://doi.org/10.1145/3267809.3267822 Originally published in: Permanent Link: Conference Paper 2018-10 Publication Date: Marroquín, Renato; Müller, Ingo; Makreshanski, Darko; Alonso, Gustavo Author(s): Shared Query Execution Pay One, Get Hundreds for Free: Reducing Cloud Costs through ETH Library

Pay One, Get Hundreds for Free: Reducing Cloud Costs through Shared Query Execution Renato Marroquín , Ingo Müller, Darko Makreshanski, Gustavo Alonso | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 1

Motivation: Query-as-a-service  Pros  Disadvantages  No need to move data outside  Expensive with frequent usage the cloud  No intuitive way to optimize  No infrastructure deployment  Cost  No database maintenance  Throughput  Pricing model  Pay-per-byte-processed  Popular Systems | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 2

Motivation: Use cases Parameter exploration BI reporting Ad-hoc analytics  Observations:  High overlap among concurrent queries  Query burst from single user  Similar workload from multiple users Work sharing: carry out redundant work only once. | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 3

Our Goal Reduce query-as-a-service costs using work sharing techniques through query rewriting | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 4

Outline  Related work  Multi-query execution  Evaluation  Summary  Conclusions | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 5

Multi Query Execution: General sharing workflow Our Focus Collect Co-plan Separate Co-execute | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 6

Shared execution techniques 2010 – 2017 1988 – 2005 2005 – 2010 Scan Work MQO Sharing Sharing 1 st generation: Crescando 2 nd generation: SharedDB, MQJoin, BatchDB Name Age Sex query_id  Annotate tuples with Renato 30 m Q1,Q3 query_id attribute Darko 30 m Q1,Q2,Q3  Supports all relational Ingo 32 m Q1 operators | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 7

Shared execution techniques 2010 – 2017 1988 – 2005 2005 – 2010 Scan Work MQO Sharing Sharing  Previous approaches need dedicated engine  High implementation effort  Needs vendor support This work: rewrite shared query plans as SQL. | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 8

Outline  Related work  Multi-query execution | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 9

Multi Query Execution: Data-query model Name Age Sex SELECT * FROM Emp WHERE … Renato 30 m Darko 30 m Q1: … age < 40 Q2: … name LIKE ‘%k%’ Ingo 32 m Q3: … age <= 30 AND age > 20 Gustavo 50 m Name Age Sex query_id Renato 30 m { 1, 3 } Darko 30 m { 1, 2, 3 } Ingo 32 m { 1 } Gustavo 50 m { } | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 10

Multi Query Execution: The query_id attribute Name: Age: Sex: query_id: VARCHAR INT ENUM INTEGER ARRAY Renato 30 m { 1, 3 } Darko 30 m { 1, 2, 3 } Ingo 32 m { 1 }  Result of Qx is given by SELECT * FROM R WHERE x = ANY(query_id)  This is standard SQL! | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 11

Shared operators: Shared scan Name Age Sex SELECT * FROM Emp WHERE … Renato 30 m Q1: … age < 40 Darko 30 m Q2: … name LIKE ‘%k%’ Q3: … age <= 30 AND age > 20 Ingo 32 m Gustavo 50 m SELECT *, ARRAY_REMOVE( ARRAY[ CASE WHEN age < 40 THEN 1 ELSE 0 END, CASE WHEN name LIKE ‘%k%’ THEN 2 ELSE 0 END, CASE WHEN age <= 30 AND age > 20 THEN 3 ELSE 0 END ] , 0) AS query_id FROM Emp WHERE (age < 40) OR (name LIKE ‘%k%’) OR (age <= 30 AND age > 20) | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 12

Shared operators: Shared join Departments Employees Name Age Sex D_ID query_id D_ID D_Name query_id Renato 30 m 1 { 1, 3 } 1 Systems { 2, 3 } Darko 30 m 1 { 1, 2, 3 } 2 Algorithms { 2 } Ingo 32 m 1 { 1 } 3 ML { 1, 2, 3 } SELECT * FROM Emp e JOIN Dep d ON e.did = d.did WHERE … Name Age Sex D_ID D_Name query_id Renato 30 m 1 Systems { 3 } Darko 30 m 1 Systems { 2, 3 } | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 13

Shared operators: Shared join Departments Employees Name Age Sex D_ID query_id D_ID D_Name query_id Renato 30 m 1 { 1, 3 } 1 Systems { 2, 3 } Darko 30 m 1 { 1, 2, 3 } 2 Algorithms { 2 } Ingo 32 m 1 { 1 } 3 ML { 1, 2, 3 } WITH shared_emp AS (…), -- shared scan on emp shared_dep AS (…), -- shared scan on dep shared_join_helper AS ( SELECT R.name, R.age, R.sex, S.d_id, S.d_name, ARRAY_INTERSECT(R.query_id, S.query_id) AS query_id FROM shared_emp e JOIN shared_dep d ON e.d_id = d.d_id) SELECT * FROM shared_join_helper WHERE CARDINALITY(query_id) > 0 | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 14

Shared operators: Other shared operators  GROUP BY  Use UNNEST to duplicate tuples for each query  Then group by query_id  LIMIT / TOP K  PARTITION BY query_id  Then filter by RANK() Enough to express all TPC-H queries! | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 15

Evaluation: Query-as-a-Service systems  Systems-under-test  Amazon Athena  Google BigQuery  Run SQL queries against files in cloud storage  Apache Parquet  Google internal columnar storage  Micro-benchmarks  End-to-end query execution | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 17

Evaluation: Shared scan  Data  Lineitem from TPC-H Scale Factor 100  Workload  Parameterized queries  Queries in a group/batch have equal selectivity | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 18

Evaluation: Shared scan Queries Selectivity: Google BigQuery Execution time increases sublinearly with query count. 84GB table, native columnar format | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 19

Evaluation: Shared scan Queries Selectivity: Amazon Athena 27GB, Parquet format  Pay per processed byte Monetary cost (almost) independent of batch size. Google Big Query: constant 0.011$ | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 20

Evaluation: TPC-H  Data  TPC-H Scale Factor 100  Simulate multiple apps interacting with QaaS  Each emitting different TPC-H queries  Workload  128 instantiations of each query type | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 21

Evaluation: TPC-H Throughput improvement over Query-at-a-Time (batch size = 128) Athena BigQuery 100 Throughput Improvement 10 1 Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18 Q19 Q20 Q21 Q22 TPC-H Query Considerable speedup: up to ~128x. | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 22

Evaluation: TPC-H Cost compared to Query-at-a-Time (batch size = 128) Query-at-a-time SharedExecution 1000.00 100.00 Cost [$] 16x 10.00 128x 1.00 107x $80 $240 ~$14 <$1 $227 ~$1.7 0.10 Athena BigQuery (all) BigQuery(without Q10) Again: cost (almost) independent of batch size. | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 23

Summary  Shared query execution through query rewriting  Rewriting can be done with standard SQL  Evaluate our approach by executing queries end-to-end  Show improvements in execution cost and throughput Conclusion  Sharing can be implemented in:  Client  Library  Middleware | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 25

Thanks! | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 26

Backup Slides | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 27

Evaluation: TPC-H Relative cost compared to Query-at-a-Time (batch size = 128) 100 Cost Improvement 10 16x Query-at-a-time SharedExecution 107x 1 128x $80 $240 ~$14 <$1 $227 ~$1.7 0.1 Athena BigQuery (all) BigQuery (without Q10) Again: cost (almost) independent of batch size. | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 28

Shared execution techniques 2010 – 2017 1988 – 2005 2005 – 2010 Scan Work MQO Sharing Sharing row value First generation: Crescando  Index predicates to reduce work idx  Lower complexity of work  Only scans … q1 q2 q3 | | Marroquín, Müller, Makreshanski, Alonso 13.10.2018 29

Shared plans Sal Emp Dep Emp Dep Emp Dep 𝜏 𝜏 sex = “m” age > 30 𝜏 ⨝ ⨝ d_id d_id Q1 Q2 ⨝ ⨝ Sal Sal Emp Emp e_id d_id 𝜏 𝜏 sex = “f” age < 50 ⨝ ⨝ e_id e_id Q3,Q4 Q1,Q2 Q3 Q4 | | Marroquín, Müller, Makreshanski, Alonso 30.05.2018 30

Research Collection https://doi.org/10.3929/ethz-b-000294819 - PDF document

Research Collection https://doi.org/10.3929/ethz-b-000294819 information please consult the Terms of use. This page was generated automatically upon download from the ETH Zurich Research Collection. For more Creative Commons Attribution 4.0

Sunglasses SM001 Collection SM005 Collection YPC001 Collection(swimming goggles) SR001

Conference + Meeting Spaces Salt + Pepper TONON COLLECTION Macs Table TONON COLLECTION Pit

Conference + Meeting Spaces Salt + Pepper TONON COLLECTION Macs Table TONON COLLECTION Pit

New Collection Spring 2015 Spring Collection 2015 Categories Outdo tdoor or To Go Icons s

UWA Publications Collection 2013 Overview of the collection process Using Minerva Research

Data Collection and HIVe Current Data Collection For those collecting data, you are use to

Uniprocessor Garbage Collection Techniques Presented by: Shiri Dori Shai Erera Outline

Levy & Collection & Exemption Levy & Collection & Exemption from tax from tax

Digital Tachograph Data Collection & Analysis System 1 Outline Data Collection

Collection House Update Company Presentation June 2020 Disclaime laimer Collection House The

Amendment One Review Bulky Item Collection, Abandoned Waste Collection and Municipal Regional

Data Collection International Labour Office Department of Statistics Data Collection data

Japanese waste paper trend Japanese waste paper trend High collection & reuse High

What is the Sanborn Map Collection? How can you access this collection? 1 2 3 What can you

Detroit Water and Sewerage Department Department Sewer Collection System Sewer Collection System

BATTERY COLLECTION AND OUTREACH PLAN Modified Household Battery Collection/Handling Procedures

Project Management in the New Economy Technische Universitt Mnchen 21. Dezember 2000

and Income Volatility Prosper Canada Policy Research Symposium March 9, 2018 Ray Boshara

Challenges and opportunities for the Brazilian PPP Programme GRILO, Leonardo (1); HARDCASTLE, Cliff

PREPARING FOR THE NEXT RECESSION Mic ichael A. . Paga gano Dir irector, Government Fin

Quantum Monte Ca Carlo calculations s of neutron ma matter er wi with Ch Chiral E Effective

danrl ingoa @ danrl_com @ingoa Dan Ldtke is the Technical Lead of SRE at Ingo Averdunk is a

Use of Ontologies to Support Design Activities Dr. Oleg Lukibanov DaimlerChrysler AG Vehicle

Gallai-Ramsey Number of Graphs Yaping Mao School of Mathematics and Statistics Qinghai Normal

Research Collection https://doi.org/10.3929/ethz-b-000294819 - PDF document

Research Collection https://doi.org/10.3929/ethz-b-000294819 information please consult the Terms of use. This page was generated automatically upon download from the ETH Zurich Research Collection. For more Creative Commons Attribution 4.0

Sunglasses SM001 Collection SM005 Collection YPC001 Collection(swimming goggles) SR001

Conference + Meeting Spaces Salt + Pepper TONON COLLECTION Macs Table TONON COLLECTION Pit

Conference + Meeting Spaces Salt + Pepper TONON COLLECTION Macs Table TONON COLLECTION Pit

New Collection Spring 2015 Spring Collection 2015 Categories Outdo tdoor or To Go Icons s

UWA Publications Collection 2013 Overview of the collection process Using Minerva Research

Data Collection and HIVe Current Data Collection For those collecting data, you are use to

Uniprocessor Garbage Collection Techniques Presented by: Shiri Dori Shai Erera Outline

Levy &amp; Collection &amp; Exemption Levy &amp; Collection &amp; Exemption from tax from tax

Digital Tachograph Data Collection &amp; Analysis System 1 Outline Data Collection

Collection House Update Company Presentation June 2020 Disclaime laimer Collection House The

Amendment One Review Bulky Item Collection, Abandoned Waste Collection and Municipal Regional

Data Collection International Labour Office Department of Statistics Data Collection data

Japanese waste paper trend Japanese waste paper trend High collection &amp; reuse High

What is the Sanborn Map Collection? How can you access this collection? 1 2 3 What can you

Detroit Water and Sewerage Department Department Sewer Collection System Sewer Collection System

BATTERY COLLECTION AND OUTREACH PLAN Modified Household Battery Collection/Handling Procedures

Project Management in the New Economy Technische Universitt Mnchen 21. Dezember 2000

and Income Volatility Prosper Canada Policy Research Symposium March 9, 2018 Ray Boshara

Challenges and opportunities for the Brazilian PPP Programme GRILO, Leonardo (1); HARDCASTLE, Cliff

PREPARING FOR THE NEXT RECESSION Mic ichael A. . Paga gano Dir irector, Government Fin

Quantum Monte Ca Carlo calculations s of neutron ma matter er wi with Ch Chiral E Effective

danrl ingoa @ danrl_com @ingoa Dan Ldtke is the Technical Lead of SRE at Ingo Averdunk is a

Use of Ontologies to Support Design Activities Dr. Oleg Lukibanov DaimlerChrysler AG Vehicle

Gallai-Ramsey Number of Graphs Yaping Mao School of Mathematics and Statistics Qinghai Normal

Levy & Collection & Exemption Levy & Collection & Exemption from tax from tax

Digital Tachograph Data Collection & Analysis System 1 Outline Data Collection

Japanese waste paper trend Japanese waste paper trend High collection & reuse High