Peregrine: workload optimization for cloud query engines

  1. Peregrine: workload optimization for cloud query engines Alekh Jindal, Hiren Patel, Abhishek Roy, Shi Qiao, Zhicheng Yin, Rathijit Sen, Subru Krishnan

  2. DBA Workload Engine

  3. On-Premise DBA

  4. On-Premise DBA

  5. On-Premise DBA: "Need to reach by 10, can we drive faster?" "Sure!"

  6. Cloud Query Engines • Setup, installation, maintenance taken care of • On-demand provisioning, pay as you go

  7. Cloud Query Engines... ahhh! "Need to reach by 10, can we drive faster?" "Sorry, we don't have a DBA."
      • Reality check for customers:
         • Lots of services to choose from (even within Azure, GCP, AWS)
         • Lots of knobs to tune for good perf and low cost
         • Lack of control, and lack of expertise
         • And, the DBA is gone!
      • Reality check for providers:
         • System developers == virtual DBAs!
         • Too many cloud users compared to system developers
         • Too many support requests, often redundant
         • Less time for feature development

  8. Cosmos: big data infra at Microsoft
      • 100s of thousands of machines
      • Exabytes of data at rest; petabytes ingress/egress daily
      • 500k+ batch jobs / day
      • 3B+ tasks executed / day
      • 10s of millions of interactive queries / day
      • 10s of thousands of SCOPE developers
      • 1000s of teams

  9. The missing DBA and the growing pain in Cosmos
      • Large number of knobs/hints at the script, data, and plan levels
      • Only a few expert users
      • The rest need guidance
      • Survey: better tooling for improving SCOPE queries
      • Support challenge
         • 10s of thousands of incidents / year
         • 10 incidents per system developer on call
         • 100x more users than system developers
         • ~10% growth in SCOPE workload in 2019

  10. The cloud pain
      [Diagram: database vendor developers and data services DS1 ... DSn serving customers 1 ... n, each with DB, workload, DBA, and users; "Pain" is marked at the developers, the DBAs, and the users]

  11. The cloud opportunity
      [Diagram: fragmented on-premise workloads vs. massive cloud workloads]

  12. The Cosmos opportunity: massive cloud workloads, with several TBs of metadata / day
      • Job metadata: name, user, account, submit/start/end times
      • Query plans: logical, physical, stage graph, estimates
      • Runtime statistics: operator-wise observables
      • Task-level logs: start/end events
      • Machine counters: CPU, IO, etc.

  13. The case for a workload optimization platform
      • DBA-as-a-Service
         • Another service in the cloud (easier integration)
         • Based on the cloud workloads at hand (instance optimization)
      • Engine agnostic
         • Not specific to individual query engines, e.g., SCOPE, Spark, SQL DW, etc.
         • E.g., view selection is still the same problem
      • Global optimizations
         • Cloud workloads are organized into data pipelines
         • People often care about end-to-end aggregate costs in the cloud

  14. Step 1: workload representation
      • Instrument, log, and collect workload characteristics

  15. Engine-agnostic workload representation
      [Diagram: logical plan, physical plan, stage graph, and tasks, each with its log + metrics, plus signatures and anonymization, combined into a denormalized view (Workload IR)]
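
To make the idea of a denormalized view concrete, here is a minimal Python sketch that joins job metadata, plan operators, and runtime metrics into one flat row per operator. The record types and fields (`JobMeta`, `PlanNode`, `OpMetrics`, `build_workload_ir`) are hypothetical; the slide does not specify Peregrine's actual IR schema.

```python
from dataclasses import dataclass


# Hypothetical record types; field names are illustrative, not Peregrine's schema.
@dataclass
class JobMeta:
    job_id: str
    user: str
    submit_time: str


@dataclass
class PlanNode:
    job_id: str
    node_id: int
    operator: str
    signature: str  # hypothetical: hash of the subexpression rooted at this node


@dataclass
class OpMetrics:
    job_id: str
    node_id: int
    rows_out: int
    cpu_ms: int


def build_workload_ir(jobs, nodes, metrics):
    """Join job metadata, plan nodes, and runtime metrics into one flat row per operator."""
    jobs_by_id = {j.job_id: j for j in jobs}
    metrics_by_key = {(m.job_id, m.node_id): m for m in metrics}
    rows = []
    for n in nodes:
        job = jobs_by_id[n.job_id]
        m = metrics_by_key.get((n.job_id, n.node_id))
        rows.append({
            "job_id": n.job_id,
            "user": job.user,
            "submit_time": job.submit_time,
            "operator": n.operator,
            "signature": n.signature,
            # Operators without collected metrics keep None instead of being dropped.
            "rows_out": m.rows_out if m is not None else None,
            "cpu_ms": m.cpu_ms if m is not None else None,
        })
    return rows


jobs = [JobMeta("j1", "alice", "2019-11-20T08:00")]
nodes = [PlanNode("j1", 0, "Extract", "sig-a"), PlanNode("j1", 1, "Filter", "sig-b")]
metrics = [OpMetrics("j1", 0, 1000, 250)]
ir = build_workload_ir(jobs, nodes, metrics)
```

One wide row per operator trades storage for simpler analysis: pattern mining or model training can scan a single table instead of re-joining logs each time.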

  16. Step 2: optimize for patterns

  17. Typical workload patterns
      • Consider a simplified 2D space of data and queries, with three patterns:
         • Recurring: query templates appear over newer datasets
         • Similarity: queries over the same datasets have similarities
         • Dependency: queries depend on datasets produced by previous queries

  18. Recurring pattern
      • Majority of production workloads
      • A regular ETL is needed before other things can happen
      • Opportunity to learn from the past
      • Examples: learned cardinality*, learned cost models, learned resources, etc.
      * Towards a Learning Optimizer for Shared Clouds. Chenggang Wu, Alekh Jindal, Saeed Amizadeh, Hiren Patel, Wangchao Le, Shi Qiao, Sriram Rao. VLDB 2019.
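
As a simplified illustration of learning from recurring jobs, the sketch below keeps past (input rows, output rows) observations per recurring signature and extrapolates with a one-feature least-squares fit. This is not the model from the cited VLDB 2019 paper; the class `LearnedCardinality`, its methods, and the single input-size feature are assumptions made for illustration.

```python
from collections import defaultdict


class LearnedCardinality:
    """Hypothetical per-template cardinality model learned from past runs."""

    def __init__(self):
        # recurring signature -> list of (input_rows, output_rows) observations
        self.history = defaultdict(list)

    def observe(self, signature, input_rows, output_rows):
        self.history[signature].append((input_rows, output_rows))

    def estimate(self, signature, input_rows, default):
        obs = self.history.get(signature)
        if not obs:
            return default  # no history: fall back to the optimizer's own estimate
        n = len(obs)
        mean_x = sum(x for x, _ in obs) / n
        mean_y = sum(y for _, y in obs) / n
        var_x = sum((x - mean_x) ** 2 for x, _ in obs)
        if var_x == 0:
            return mean_y  # all past inputs had the same size: use the mean output
        cov = sum((x - mean_x) * (y - mean_y) for x, y in obs)
        slope = cov / var_x
        # Ordinary least squares line through the observations, clamped at zero rows.
        return max(0.0, mean_y + slope * (input_rows - mean_x))


model = LearnedCardinality()
model.observe("daily-filter", 1000, 100)
model.observe("daily-filter", 2000, 200)
estimate = model.estimate("daily-filter", 3000, default=1.0)
fallback = model.estimate("unseen-template", 3000, default=1.0)
```

Keying models by recurring signature is what makes the history useful: each new daily instance of a template reuses what earlier instances revealed about its selectivity.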

  19. Similarity pattern
      • Very typical in multi-user shared cloud environments
         • Cosmos, HDI, Ant Financial, ML workflows, etc.
      • Opportunity for multi-query optimization
      • Examples: CloudViews*, checkpointing, caching, etc.
      [Figure: percentage (0 to 100) of users with overlapping jobs, overlapping jobs, and overlapping subgraphs on cluster1 to cluster5]
      * Computation Reuse in Analytics Job Service at Microsoft. Alekh Jindal, Shi Qiao, Hiren Patel, Jarod Yin, Jieming Di, Malay Bag, Marc Friedman, Yifung Lin, Konstantinos Karanasos, Sriram Rao. SIGMOD 2018.
      * Selecting Subexpressions to Materialize at Datacenter Scale. Alekh Jindal, Konstantinos Karanasos, Sriram Rao, Hiren Patel. VLDB 2018.
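
The first step of reuse in the similarity pattern is finding subexpressions that several jobs share. The sketch below assumes a hypothetical plan encoding of `(operator, [children])` tuples, canonicalizes every subtree to text, hashes it into a signature, and reports subexpressions that appear in at least `min_jobs` jobs. It is only loosely in the spirit of the cited CloudViews work; real signatures would also cover operator parameters and input versions, and this canonical form does not match commutative operators whose inputs are swapped.

```python
import hashlib
from collections import defaultdict


def canonical(plan, found):
    """Post-order walk over a hypothetical plan = (operator, [children]).

    Returns the canonical text of the subtree and records every subexpression
    in `found` as signature -> canonical text.
    """
    op, children = plan
    inner = ", ".join(canonical(c, found) for c in children)
    text = f"{op}({inner})" if children else op
    # The signature is a compact ID for the subexpression; text is kept for readability.
    sig = hashlib.sha256(text.encode()).hexdigest()[:16]
    found[sig] = text
    return text


def overlapping_subexpressions(plans, min_jobs=2):
    """Map each subexpression shared by >= min_jobs jobs to the sorted job IDs using it."""
    jobs_per_sig = defaultdict(set)
    text_of = {}
    for job_id, plan in plans.items():
        found = {}
        canonical(plan, found)
        for sig, text in found.items():
            jobs_per_sig[sig].add(job_id)
            text_of[sig] = text
    return {text_of[s]: sorted(ids) for s, ids in jobs_per_sig.items() if len(ids) >= min_jobs}


plans = {
    "j1": ("Aggregate", [("Filter", [("Scan clicks", [])])]),
    "j2": ("Join", [("Filter", [("Scan clicks", [])]), ("Scan users", [])]),
}
shared = overlapping_subexpressions(plans)
```

Here both jobs share `Scan clicks` and `Filter(Scan clicks)`; a view selection step would then decide which of these candidates is worth materializing.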

  20. Dependency pattern
      • Queries are typically organized in pipelines
         • Smaller steps that are easier to build and maintain
      • Dependency-driven optimizations/analytics*
         • Relative importance of jobs for scheduling
         • Physical design tuning
         • Etc.
      * Dependency-driven analytics: A compass for uncharted data oceans. R. Mavlyutov, C. Curino, B. Asipov, and P. Cudré-Mauroux. CIDR 2017.
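
One simple proxy for the relative importance of jobs for scheduling is how many jobs transitively depend on a job's outputs. The sketch below builds producer/consumer edges from hypothetical `(inputs, outputs)` dataset lists and counts downstream jobs; it is an illustrative metric, not the technique from the cited CIDR 2017 paper.

```python
from collections import defaultdict


def build_dependencies(jobs):
    """jobs: job_id -> (inputs, outputs). Returns job_id -> set of direct consumer jobs."""
    producers = defaultdict(set)
    for job_id, (_, outputs) in jobs.items():
        for ds in outputs:
            producers[ds].add(job_id)
    consumers = {job_id: set() for job_id in jobs}
    for job_id, (inputs, _) in jobs.items():
        for ds in inputs:
            for p in producers.get(ds, ()):
                if p != job_id:
                    consumers[p].add(job_id)
    return consumers


def downstream_counts(consumers):
    """Number of jobs that transitively depend on each job (iterative DFS; tolerates cycles)."""
    counts = {}
    for job in consumers:
        seen = set()
        stack = list(consumers[job])
        while stack:
            j = stack.pop()
            if j == job or j in seen:
                continue
            seen.add(j)
            stack.extend(consumers[j])
        counts[job] = len(seen)
    return counts


# Hypothetical pipeline: job -> (input datasets, output datasets)
pipeline = {
    "ingest": ([], ["raw"]),
    "clean": (["raw"], ["clean"]),
    "daily_report": (["clean"], ["report"]),
    "ml_features": (["clean"], ["features"]),
}
counts = downstream_counts(build_dependencies(pipeline))
schedule_order = sorted(counts, key=counts.get, reverse=True)
```

Jobs early in the pipeline (here `ingest`, then `clean`) rank first, since delaying them delays everything downstream.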

  21. Step 3: feeding it back • Actions • Insights • Recommendations • Self-tuning

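  22. Self-tuning
      [Diagram: a query runs through the query engine (compiler, optimizer, scheduler, runtime) to produce a result; the workload is captured in a workload representation, optimized, and published as annotations to a feedback service; the engine performs a feedback lookup and applies the resulting actions (rules, configs)]
      • Annotation: signature --> actions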
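
A minimal sketch of the "signature --> actions" lookup: the workload optimizer publishes annotations keyed by signature, and the compiler merges the actions of any matching signature into its rules and configs. `Annotation`, `FeedbackService`, `compile_with_feedback`, the rule name, and the config keys are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Hypothetical set of actions attached to a signature by the workload optimizer."""
    rules_on: set = field(default_factory=set)
    configs: dict = field(default_factory=dict)


class FeedbackService:
    """Hypothetical in-memory stand-in for the feedback service."""

    def __init__(self):
        self._annotations = {}

    def publish(self, signature, annotation):
        # Written offline by the workload optimization step.
        self._annotations[signature] = annotation

    def lookup(self, signature):
        return self._annotations.get(signature)


def compile_with_feedback(signatures, base_config, service):
    """Merge the actions of every matching annotation into the query's config and rules."""
    config = dict(base_config)
    rules = set()
    for sig in signatures:
        ann = service.lookup(sig)
        if ann is None:
            continue  # no feedback for this subexpression: keep default behaviour
        rules |= ann.rules_on
        config.update(ann.configs)  # later matches override earlier ones
    return config, rules


service = FeedbackService()
service.publish("sig-agg", Annotation(rules_on={"ComputationReuse"},
                                      configs={"degree_of_parallelism": 50}))
config, rules = compile_with_feedback(
    ["sig-agg", "sig-unknown"], {"degree_of_parallelism": 100, "timeout_min": 60}, service)
```

Unmatched signatures fall through to the defaults, so a missing annotation never blocks compilation; a real service would also need a policy for conflicting annotations, which this sketch resolves as "last match wins".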

  23. Illustration: SCOPE and Spark query engines
      • Query engine changes: modifications to the SCOPE compiler/optimizer (optimizer rule 1: online materialize; optimizer rule 2: computation reuse; compiler flags), or pluggable extensions from outside (JAR)
      [Diagram: the workload repository feeds connectors, parsers, and enumerators into a common IR of query subexpressions with strict and recurring signatures; view selection and learned cardinality models return selected views and cardinalities through the feedback service to the query compiler, optimizer, scheduler, and runtime]
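
The slide distinguishes strict signatures from recurring signatures. A plausible reading is that a strict signature identifies an exact computation, while a recurring signature masks the parts that change between runs of the same job so that, for example, daily instances collide. The normalization rules below (masking ISO dates and integer literals before hashing) are assumptions for illustration, not SCOPE's actual signature definition.

```python
import hashlib
import re

# Hypothetical normalization rules for recurring signatures.
DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
NUMBER = re.compile(r"\b\d+\b")


def _digest(text):
    return hashlib.sha256(text.encode()).hexdigest()[:16]


def strict_signature(subexpr):
    """Exact match: same operators, same inputs, same parameter values."""
    return _digest(subexpr)


def recurring_signature(subexpr):
    """Template match: mask dates first, then remaining integer literals, then hash."""
    masked = DATE.sub("<date>", subexpr)
    masked = NUMBER.sub("<num>", masked)
    return _digest(masked)


mon = "Filter(Scan(/logs/2019-11-20/clicks.ss), latency > 100)"
tue = "Filter(Scan(/logs/2019-11-21/clicks.ss), latency > 250)"
other = "Filter(Scan(/logs/2019-11-20/views.ss), latency > 100)"
```

Under these rules, `mon` and `tue` differ by strict signature but share a recurring signature, while `other` reads a different stream and stays distinct under both.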

  24. The third axis: people
      • Easier for people to play with the query workloads
      • Abstracts many of the painful steps
      • Allows people to build on top of each other's work
      • Lets people focus more on the workload optimizations
      • Enabled several researchers, developers, and interns

  25. Summary
      • Workload-aware query engines: Hive, Spark, SCOPE, ...
      • Workload representation: instrumentation; ingest metadata, plans, statistics, and signatures; parse query plans; enumerate; feature store; Workload Intermediate Representation (IR)
      • Workload optimization: patterns (sharing, recurring, coordinating); mathematical solvers, machine learning, graph analytics
         • Learned optimizations, e.g., learned cardinality
         • Dependency-driven optimizations, e.g., physical design for pipelines
         • Multi-query optimization, e.g., CloudViews
      • Workload feedback: insights, recommendations, self-tuning; query annotations, dashboard, alerts, users; feedback service
      • Gray Systems Labs (GSL): https://azuredata.microsoft.com/labs/gsl
      • GSL@SoCC: 4 papers, 1 poster
      • We are hiring!