Peregrine: Workload Optimization for Cloud Query Engines




  1. Peregrine: Workload Optimization for Cloud Query Engines* Alekh Jindal, Gray Systems Lab. * Peregrine: Workload Optimization for Cloud Query Engines. Alekh Jindal, Hiren Patel, Abhishek Roy, Shi Qiao, Jarod Yin, Rathijit Sen, Subru Krishnan. SoCC 2019.

  2. DBA Workload Engine

  3. On-Premise DBA

  4. On-Premise DBA

  5. On-Premise: "Need to reach by 10, can we drive faster?" DBA: "Sure!"

  6. Cloud Query Engines • Setup, installation, maintenance taken care of • On-demand provisioning, pay as you go

  7. Cloud Query Engines .. ahhh! "Need to reach by 10, can we drive faster?" "Sorry, we don’t have a DBA." Reality check for customers: • Lots of services to choose from (even within Azure, GCP, AWS) • Lots of knobs to tune for good perf and low cost • Lack of control, and lack of expertise • And, the DBA is gone! Reality check for providers: • System developers == virtual DBAs! • Too many cloud users, compared to system developers • Too many support requests; often redundant • Less time for feature development

  8. Cosmos: big data infra at Microsoft • 100s of thousands of machines • Exabytes of data at rest; petabytes of ingress/egress daily • 500k+ batch jobs / day • 3B+ tasks executed / day • 10s of millions of interactive queries / day • 10s of thousands of SCOPE developers • 1000s of teams

  9. The missing DBA and the growing pain in Cosmos • Large number of knobs/hints at script, data, plan level • Only a few expert users • Rest need guidance • Survey: better tooling for improving SCOPE queries • Support challenge • 10s of thousands of incidents / year • 10 incidents per system developer on call • 100x users compared to system developers • ~10% growth in SCOPE workload in 2019

  10. On-premise pain -> Cloud pain [Diagram. On-premise: Customer 1, Customer 2, ..., Customer n, each with its own DB, workload, DBA, and users, supported by the database vendor's developers. Cloud: data services DS1, DS2, DS3, ..., DSn serving the workload and users, supported by developers. "Pain" is marked on both sides.]

  11. The cloud opportunity [Diagram: fragmented on-premise workloads vs. massive cloud workloads.]

  12. The Cosmos opportunity • Massive cloud workloads • Several TBs of metadata / day • Job metadata: name, user, account, submit/start/end times • Query plans: logical, physical, stage graph, estimates • Runtime statistics: operator-wise observables • Task level logs: start/end events • Machine counters: CPU, IO, etc.

  13. The case for a workload optimization platform • DBA-as-a-Service • Another service in the cloud (easier integration) • Based on cloud workloads at hand (instance optimization) • Engine agnostic • Not specific to any one query engine, e.g., SCOPE, Spark, SQL DW, etc. • E.g., view selection is still the same problem • Global optimizations • Cloud workloads are organized into data pipelines • People often care about end-to-end aggregate costs in the cloud

  14. Step 1: workload representation. Instrument, log, and collect workload characteristics

  15. Engine-agnostic workload representation [Diagram: logical plan, physical plan, stage graph, and tasks, each with log + metrics; anonymized and tagged with signatures; combined into a denormalized view (Workload IR).]
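
The idea of tagging plan nodes with signatures and flattening them into a denormalized view can be sketched as follows. This is a minimal illustration, not Peregrine's actual schema: the `Operator` class, the row fields, and the dataset names are hypothetical, and the definitions of "strict" and "recurring" signatures used here (slide 29 names them without defining them) are assumptions. The sketch assumes a strict signature hashes everything, including literal parameters and concrete inputs, while a recurring signature ignores both so that daily instances of the same template hash alike.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """Hypothetical plan node: children are Operators, leaves are dataset names."""
    name: str
    params: tuple = ()
    inputs: tuple = ()


def signature(op: Operator, strict: bool) -> str:
    """Hash a plan subtree bottom-up.

    Assumption: strict signatures include parameters and concrete input names;
    recurring signatures replace them with placeholders.
    """
    parts = [op.name]
    if strict:
        parts.append(repr(op.params))
    for child in op.inputs:
        if isinstance(child, Operator):
            parts.append(signature(child, strict))
        else:
            parts.append(child if strict else "<input>")
    return hashlib.sha256("|".join(parts).encode()).hexdigest()[:16]


def to_ir_rows(job_id: str, root: Operator) -> list:
    """Flatten a plan into one denormalized row per operator."""
    rows = []

    def visit(op: Operator) -> None:
        rows.append({
            "job_id": job_id,
            "operator": op.name,
            "strict_sig": signature(op, strict=True),
            "recurring_sig": signature(op, strict=False),
        })
        for child in op.inputs:
            if isinstance(child, Operator):
                visit(child)

    visit(root)
    return rows


def daily_plan(day: str) -> Operator:
    """Same query template applied to a different daily input (made-up names)."""
    scan = Operator("Scan", inputs=(f"clicks/{day}",))
    return Operator("Filter", params=("country = 'US'",), inputs=(scan,))


monday = to_ir_rows("job-1", daily_plan("2019-11-18"))
tuesday = to_ir_rows("job-2", daily_plan("2019-11-19"))
```

In this sketch the two daily jobs share a recurring signature but differ in their strict signature, which is the property a feature store would need in order to join observations across recurring instances.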

  16. Step 2: optimize for patterns

  17. Typical workload patterns • Consider a simplified 2D space of data and queries • Recurring: query templates appear over newer datasets • Similarity: queries over same datasets have similarities • Dependency: queries depend on datasets produced by previous queries
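
The three patterns above can be illustrated with a small sketch over job metadata. The `Job` fields, job names, and dataset names are hypothetical, and the detection rules are deliberately coarse assumptions: a shared template stands in for "recurring", a shared input dataset stands in for "similarity", and a producer/consumer link stands in for "dependency".

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    job_id: str
    template: str         # identifier of the query template (hypothetical)
    inputs: frozenset     # dataset names read
    outputs: frozenset    # dataset names written


def find_patterns(jobs):
    """Return (recurring, similarity, dependency) groupings for a list of jobs."""
    by_template = defaultdict(list)
    by_input = defaultdict(list)
    producer = {}
    for job in jobs:
        by_template[job.template].append(job.job_id)
        for dataset in job.inputs:
            by_input[dataset].append(job.job_id)
        for dataset in job.outputs:
            producer[dataset] = job.job_id

    # Recurring: the same template shows up in more than one job.
    recurring = {t: ids for t, ids in by_template.items() if len(ids) > 1}
    # Similarity (coarse proxy): several jobs read the same dataset.
    similarity = {d: ids for d, ids in by_input.items() if len(ids) > 1}
    # Dependency: a job reads a dataset that another job produced.
    dependency = [
        (producer[d], job.job_id)
        for job in jobs
        for d in sorted(job.inputs)
        if d in producer and producer[d] != job.job_id
    ]
    return recurring, similarity, dependency


jobs = [
    Job("etl-mon", "etl", frozenset({"raw/mon"}), frozenset({"clean/mon"})),
    Job("etl-tue", "etl", frozenset({"raw/tue"}), frozenset({"clean/tue"})),
    Job("report-a", "report_a", frozenset({"clean/tue"}), frozenset()),
    Job("report-b", "report_b", frozenset({"clean/tue"}), frozenset()),
]
recurring, similarity, dependency = find_patterns(jobs)
```

Here the two ETL jobs form a recurring group, the two reports overlap on `clean/tue`, and both reports depend on `etl-tue`.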

  18. Recurring pattern (query templates appear over newer datasets) • Majority of production workloads • There is a regular ETL needed before other things can happen • Opportunity to learn from the past • Examples • Learned cardinality • Learned cost models • Learned resources • Learned etc.

  19. Recap from NWDS’19: SCOPE Cardinality Estimation [Plot: cardinality estimation accuracy, with under-estimation and over-estimation on either side of the ideal.]

  20. Recap from NWDS’19: SCOPE Cardinality Estimation [Plot: fraction of subgraph instances (0 to 1) vs. estimated/actual cardinality ratio (10^-6 to 10^8, log scale) for Neural Network, Linear Regression, and Poisson Regression models, with under-estimation and over-estimation on either side of the ideal.] Towards a Learning Optimizer for Shared Clouds. Chenggang Wu, Alekh Jindal, Saeed Amizadeh, Hiren Patel, Wangchao Le, Shi Qiao, Sriram Rao. VLDB 2019.
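
A plot of this kind can be derived from logged estimates and actuals. The sketch below assumes the curve is a cumulative distribution (fraction of subgraph instances whose estimated/actual ratio is at or below a threshold); the slide does not say so explicitly, and the numbers are made up for illustration.

```python
def ratio_cdf(estimated, actual, thresholds):
    """Fraction of instances whose estimated/actual ratio is <= each threshold."""
    ratios = [e / a for e, a in zip(estimated, actual)]
    n = len(ratios)
    return [sum(r <= t for r in ratios) / n for t in thresholds]


# Made-up cardinalities for four subgraph instances.
estimated = [10, 100, 1000, 5]
actual = [1000, 100, 10, 5]

# A ratio of 1 is ideal; below 1 is under-estimation, above 1 is over-estimation.
fractions = ratio_cdf(estimated, actual, thresholds=[0.1, 1, 10, 1000])
```

A model whose curve jumps from 0 to 1 right at ratio 1 would be ideal; the wider the spread around 1, the worse the estimates.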

  21. SCOPE Cost Estimation • Cost models are orders of magnitude off! [Plots: cardinality estimation and cost estimation accuracy relative to ideal, each showing under-estimation and over-estimation, annotated "BAD!"; panels include c) Manually Improved Cost Model and d) Cost Models with Cardinality Feedback.]

  22. SCOPE Cost Estimation • Cost models are orders of magnitude off! • Pervasive use of user defined functions • Complexity of big data systems • Variance in the cloud environments [Plots: c) Manually Improved Cost Model (manually tuned cost model) and d) Cost Models with Cardinality Feedback (feeding perfect cardinalities), each showing under-estimation and over-estimation relative to ideal.]

  23. Why is cardinality not enough? • Incrementally add features • Error drops from 110% to 40% • Additional transformations needed • Hard to come up with such heuristics • Two sets of hash join instances • Different feature weights • Hard to instance optimize manually

  24. Ensemble of Models over Recurring Patterns [Diagram: four model types, Operator-subgraph, Operator-subgraphApprox, Operator-inputs, and Operator, with "no coverage" arrows between consecutive models and a coverage vs. accuracy trade-off across them. Plot: error of the Op-Subgraph, Op-SubgraphApprox, Op-Input, Operator, and Combined models.]
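
One simple reading of the "no coverage" arrows is a fallback chain: try the most specific model first and move to a more general one when it has never seen the pattern. The sketch below illustrates only that reading. It is an assumption, not the authors' method: the "Combined" model on the slide may be something else (for example a learned meta-model), real models would be regressors rather than lookup tables, and all keys and cost values are made up.

```python
def predict_cost(op, levels, default):
    """Return (model_name, cost) from the first level that covers this operator.

    `levels` is ordered from most specific (highest accuracy, lowest coverage)
    to most general (lowest accuracy, highest coverage).
    """
    for name, key_fn, table in levels:
        key = key_fn(op)
        if key in table:
            return name, table[key]
    return "default", default


# Hypothetical learned costs, keyed by increasingly general patterns.
levels = [
    ("op-subgraph", lambda op: op["subgraph_sig"], {"sigA": 12.0}),
    ("op-subgraph-approx", lambda op: op["approx_sig"], {"approxB": 20.0}),
    ("op-input", lambda op: op["name"] + "|" + op["input"], {"HashJoin|clicks": 30.0}),
    ("operator", lambda op: op["name"], {"HashJoin": 50.0, "Filter": 2.0}),
]

seen_subgraph = {"name": "HashJoin", "input": "clicks",
                 "subgraph_sig": "sigA", "approx_sig": "approxB"}
new_subgraph = {"name": "HashJoin", "input": "clicks",
                "subgraph_sig": "sigZ", "approx_sig": "approxZ"}
new_input = {"name": "Filter", "input": "orders",
             "subgraph_sig": "sigY", "approx_sig": "approxY"}
unknown_op = {"name": "Sort", "input": "orders",
              "subgraph_sig": "sigX", "approx_sig": "approxX"}

results = [predict_cost(op, levels, default=100.0)
           for op in (seen_subgraph, new_subgraph, new_input, unknown_op)]
```

The ordering encodes the trade-off on the slide: specific models are more accurate where they apply, general models apply everywhere.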

  25. SCOPE Cost Estimation • Can learn pretty accurate cost models! [Plots: estimation accuracy relative to ideal, showing under-estimation and over-estimation; panels include c) Manually Improved Cost Model and d) Cost Models with Cardinality Feedback.] Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings. Tarique Siddiqui, Alekh Jindal, Shi Qiao, Hiren Patel, Wangchao Le. SIGMOD 2020 (to appear).

  26. Similarity pattern (queries over same datasets have similarities) • Very typical in multi-user shared cloud environments • Cosmos, HDI, Ant Financial, ML workflows, etc. • Opportunity for multi-query optimization • SCOPE compute reuse [Chart: percentage (0 to 100) of overlapping jobs, users with overlapping jobs, and overlapping subgraphs for cluster1 through cluster5.] Computation Reuse in Analytics Job Service at Microsoft. Alekh Jindal, Shi Qiao, Hiren Patel, Jarod Yin, Jieming Di, Malay Bag, Marc Friedman, Yifung Lin, Konstantinos Karanasos, Sriram Rao. SIGMOD 2018.

  27. Spark Compute Reuse • Instrument application log • Analyze common subexpressions over Spark SQL plans • Optimizer rules to automatically materialize/reuse in future queries • Almost 30% improvement in total time on TPC-DS. SparkCruise: Handsfree Computation Reuse in Spark. Abhishek Roy, Alekh Jindal, Hiren Patel, Ashit Gosalia, Subru Krishnan, Carlo Curino. VLDB 2019 (Demo).
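
The "analyze common subexpressions" step can be sketched in a few lines. This is not SparkCruise's implementation and does not use Spark's plan classes: plans are modeled here as nested tuples `(operator, parameter..., child_plan...)`, the queries and table names are made up, and the later step of choosing which candidates are worth materializing (view selection) is not covered.

```python
from collections import defaultdict


def subtrees(plan):
    """Yield every operator subtree of a plan given as nested tuples."""
    yield plan
    for child in plan[1:]:
        if isinstance(child, tuple):   # strings are parameters, tuples are child plans
            yield from subtrees(child)


def common_subexpressions(plans, min_queries=2):
    """Map each subtree seen in at least `min_queries` queries to those query ids."""
    seen_in = defaultdict(set)
    for query_id, plan in plans.items():
        for sub in subtrees(plan):
            seen_in[sub].add(query_id)
    return {sub: sorted(ids) for sub, ids in seen_in.items()
            if len(ids) >= min_queries}


# Two made-up queries that share a filtered scan.
filtered = ("Filter", "year = 2019", ("Scan", "store_sales"))
plans = {
    "q1": ("Aggregate", "sum(price)", filtered),
    "q2": ("Join", "item_id", filtered, ("Scan", "item")),
}
candidates = common_subexpressions(plans)
```

Both the filtered scan and the underlying scan of `store_sales` come back as reuse candidates; a real system would then weigh the cost of materializing each candidate against its expected savings.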

  28. Step 3: feeding it back • Actions • Insights • Recommendations • Self-tuning

  29. Illustration: SCOPE and Spark query engines [Architecture diagram. Query engine: Query -> Compiler -> Optimizer -> Scheduler -> Runtime -> Result. Engine integration: modifications to compiler/optimizer (compiler flags; Optimizer Rule1: Online materialize; Optimizer Rule2: Computation Reuse) and pluggable extensions from outside (Jar). Peregrine: Workload Repository with connectors, parsers, and enumerators producing a query subexpressions IR; learned cardinality models; common subexpressions and view selection producing selected views; Feedback Service keyed by recurring signature and strict signature.]

  30. Peregrine Summary • Easier to add newer features • Easier to add newer engines • Easier for people to participate • Researchers, developers, interns • Abstracts the painful steps • Build on top of each other • Focus on workload optimizations • Gray Systems Lab: aka.ms/gsl • We are hiring! [Architecture diagram. Workload-aware query engines: SCOPE, Spark, Hive, ... Workload representation: instrumentation, ingest, parse query plan, enumerate; metadata, plans, statistics, signatures; Workload Feature Store; Workload Intermediate Representation (IR). Workload optimization: patterns (recurring, sharing, coordinating, ...); mathematical solvers, machine learning, graph analytics; learned optimizations, e.g., Learned Cardinality; dependency-driven optimizations, e.g., physical design for pipeline; multi-query optimization, e.g., CloudViews. Workload feedback: insights, recommendations, self-tuning; query annotations, dashboard, alerts; Feedback Service; users.]
