Peregrine: workload optimization for cloud query engines
Alekh Jindal, Hiren Patel, Abhishek Roy, Shi Qiao, Zhicheng Yin, Rathijit Sen, Subru Krishnan

On-Premise
[Figure labels: DBA, Workload, Engine]
"Need to reach by 10, can we drive faster?" "Sure!"

Cloud Query Engines
• Setup, installation, maintenance taken care of
• On-demand provisioning, pay as you go

Cloud Query Engines .. ahhh!
"Need to reach by 10, can we drive faster?" "Sorry, we don't have a DBA"
Reality check for customers:
• Lots of services to choose from (even within Azure, GCP, AWS)
• Lots of knobs to tune for good perf and low cost
• Lack of control, and lack of expertise
• And, the DBA is gone!
Reality check for providers:
• System developers == virtual DBAs!
• Too many cloud users compared to system developers
• Too many support requests, often redundant
• Less time for feature development

Cosmos: big data infra at Microsoft
• 100s of thousands of machines
• Exabytes of data at rest; Petabytes ingress/egress daily
• 500k+ batch jobs / day
• 3B+ tasks executed / day
• 10s of millions interactive queries / day
• 10s of thousands of SCOPE developers
• 1000s of teams

The missing DBA and the growing pain in Cosmos
• Large number of knobs/hints at script, data, plan level
• Only a few expert users
• Rest need guidance
• Survey: better tooling for improving SCOPE queries
• Support challenge
• 10s of thousands of incidents / year
• 10 incidents per system developer on call
• 100x users compared to system developers
• ~10% growth in SCOPE workload in 2019

The cloud pain
[Figure: a Database Vendor with Developers serving Customer 1, Customer 2, ..., Customer n, each with its own Workload, DB, DBA, and Users; cloud Data Services DS1, DS2, DS3, ..., DSn with Developers, a Workload, and Users. "Pain" labels mark the diagram.]

The cloud opportunity
[Figure: fragmented on-premise workloads versus massive cloud workloads]

The Cosmos opportunity
Massive cloud workloads: several TBs of metadata / day
• Job metadata: name, user, account, submit/start/end times
• Query plans: logical, physical, stage graph, estimates
• Runtime statistics: operator-wise observables
• Task level logs: start/end events
• Machine counters: CPU, IO, etc.

The case for a workload optimization platform
• DBA-as-a-Service
  • Another service in the cloud (easier integration)
  • Based on cloud workloads at hand (instance optimization)
• Engine agnostic
  • Not specific to any one query engine, e.g., SCOPE, Spark, SQL DW, etc.
  • E.g., view selection is still the same problem
• Global optimizations
  • Cloud workloads are organized into data pipelines
  • People often care about end-to-end aggregate costs in the cloud

Step 1: workload representation
Instrument, log, and collect workload characteristics

Engine-agnostic workload representation
[Figure: Logical plan, Physical plan, Stage graph, and Tasks, each with Log + metrics, are anonymized, tagged with Signatures, and combined into a Denormalized view (Workload IR)]
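A minimal sketch of what a signature-tagged, denormalized workload record could look like. Everything here is an assumption made for illustration: the names `PlanNode`, `WorkloadRecord`, `signature`, and `denormalize` are hypothetical, the plan is a toy tree, and metrics are keyed by operator name for brevity. The strict/recurring distinction (with or without operator arguments) is one plausible reading of the two signature kinds named later in the deck, not Peregrine's actual definition.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    """One operator in a toy plan tree (hypothetical structure)."""
    op: str                       # e.g. "Scan", "Filter", "Join"
    args: str = ""                # e.g. input name or predicate text
    children: list = field(default_factory=list)


def signature(node, strict=True):
    """Hash a subexpression bottom-up.

    strict=True also hashes operator arguments; strict=False ignores
    them, so the same template over different inputs hashes the same.
    """
    h = hashlib.sha256()
    h.update(node.op.encode())
    if strict:
        h.update(b"|" + node.args.encode())
    for child in node.children:
        h.update(b"(" + signature(child, strict).encode() + b")")
    return h.hexdigest()[:16]


@dataclass
class WorkloadRecord:
    """One denormalized row: a subexpression plus its observed metrics."""
    job_id: str
    op: str
    strict_sig: str
    recurring_sig: str
    metrics: dict


def denormalize(job_id, root, metrics_by_op):
    """Flatten a plan into one record per subexpression (root first)."""
    rows, stack = [], [root]
    while stack:
        node = stack.pop()
        rows.append(WorkloadRecord(
            job_id=job_id,
            op=node.op,
            strict_sig=signature(node, strict=True),
            recurring_sig=signature(node, strict=False),
            metrics=metrics_by_op.get(node.op, {}),
        ))
        stack.extend(node.children)
    return rows
```

Two runs of the same template over different inputs would then share a recurring signature but not a strict one, which is the property the later pattern analyses rely on.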

Step 2: optimize for patterns

Typical workload patterns
• Consider a simplified 2D space of data and queries
• Recurring: query templates appear over newer datasets
• Similarity: queries over same datasets have similarities
• Dependency: queries depend on datasets produced by previous queries

Recurring pattern
• Majority of production workloads
• There is a regular ETL needed before other things can happen
• Opportunity to learn from the past
• Examples
  • Learned cardinality*
  • Learned cost models
  • Learned resources
  • Etc.
* Towards a Learning Optimizer for Shared Clouds. Chenggang Wu, Alekh Jindal, Saeed Amizadeh, Hiren Patel, Wangchao Le, Shi Qiao, Sriram Rao. VLDB 2019.
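To make "learning from the past" concrete, here is a small sketch that fits one tiny model per recurring signature from past runs and uses it to estimate the output cardinality of the next run. This is an illustrative stand-in, not the model from the cited paper: the per-signature least-squares line, the function names `fit_cardinality_models` and `estimate`, and the `(signature, input_rows, output_rows)` history format are all assumptions.

```python
from collections import defaultdict


def fit_cardinality_models(history):
    """Fit output_rows ~ slope * input_rows + intercept per signature.

    history: iterable of (signature, input_rows, output_rows) tuples
    taken from past runs of recurring subexpressions.
    """
    groups = defaultdict(list)
    for sig, x, y in history:
        groups[sig].append((x, y))

    models = {}
    for sig, points in groups.items():
        n = len(points)
        mean_x = sum(x for x, _ in points) / n
        mean_y = sum(y for _, y in points) / n
        var_x = sum((x - mean_x) ** 2 for x, _ in points)
        if var_x == 0:
            # All runs saw the same input size: fall back to the mean.
            models[sig] = (0.0, mean_y)
        else:
            cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
            slope = cov / var_x
            models[sig] = (slope, mean_y - slope * mean_x)
    return models


def estimate(models, sig, input_rows, default):
    """Use the learned model if this signature recurs, else the default."""
    if sig not in models:
        return default
    slope, intercept = models[sig]
    return max(0.0, slope * input_rows + intercept)
```

The fallback to `default` matters in this sketch: a subexpression that has never been seen before keeps whatever estimate the optimizer would have produced on its own.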

Similarity pattern
• Very typical in multi-user shared cloud environments
• Cosmos, HDI, Ant Financial, ML workflows, etc.
• Opportunity for multi-query optimization
• Examples
  • CloudViews*
  • Checkpointing
  • Caching
  • Etc.
[Figure: bar chart of Percentage (0 to 100) for cluster1 through cluster5; series: Users with overlapping jobs, Overlapping jobs, Overlapping subgraphs]
* Computation Reuse in Analytics Job Service at Microsoft. Alekh Jindal, Shi Qiao, Hiren Patel, Jarod Yin, Jieming Di, Malay Bag, Marc Friedman, Yifung Lin, Konstantinos Karanasos, Sriram Rao. SIGMOD 2018.
* Selecting Subexpressions to Materialize at Datacenter Scale. Alekh Jindal, Konstantinos Karanasos, Sriram Rao, Hiren Patel. VLDB 2018.
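A rough sketch of how overlapping subgraphs might be found: hash every subexpression of every job and keep the hashes that show up in more than one job. The nested-tuple plan format and the names `subexpressions`, `sig`, and `overlapping_subexpressions` are hypothetical, and this exact-match hashing is only a simplification for illustration, not the CloudViews algorithm.

```python
import hashlib
from collections import defaultdict


def subexpressions(plan):
    """Yield every subtree of a plan written as (op, arg, *children).

    Children are tuples; plain strings are operator arguments.
    """
    yield plan
    for part in plan[1:]:
        if isinstance(part, tuple):
            yield from subexpressions(part)


def sig(plan):
    """Short hash identifying an exact subexpression."""
    return hashlib.sha256(repr(plan).encode()).hexdigest()[:12]


def overlapping_subexpressions(jobs, min_jobs=2):
    """Return {signature: sorted job ids} for subtrees shared across jobs.

    jobs: {job_id: plan}
    """
    seen = defaultdict(set)
    for job_id, plan in jobs.items():
        for sub in subexpressions(plan):
            seen[sig(sub)].add(job_id)
    return {s: sorted(ids) for s, ids in seen.items() if len(ids) >= min_jobs}
```

Each signature returned is a candidate for reuse (materialize once, read many times); deciding which candidates are worth the storage is the separate view selection problem.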

Dependency pattern
• Queries are typically organized in pipelines
• Smaller steps that are easier to build and maintain
• Dependency driven optimizations/analytics*
• Relative importance of jobs for scheduling
• Physical design tuning
• Etc.
* Dependency-driven analytics: A compass for uncharted data oceans. R. Mavlyutov, C. Curino, B. Asipov, and P. Cudré-Mauroux. CIDR 2017.
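One way to picture the dependency pattern is a sketch that links jobs through the datasets they read and write, then ranks each job by how many other jobs transitively depend on it. The `{job_id: (inputs, outputs)}` format, the function names, and the use of downstream count as a proxy for "relative importance" are assumptions made for this example.

```python
from collections import defaultdict


def build_dependencies(jobs):
    """Return {producer_job: set(consumer_jobs)}.

    jobs: {job_id: (input_datasets, output_datasets)}
    """
    producer = {}
    for job_id, (_, outputs) in jobs.items():
        for dataset in outputs:
            producer[dataset] = job_id

    edges = defaultdict(set)
    for job_id, (inputs, _) in jobs.items():
        for dataset in inputs:
            upstream = producer.get(dataset)
            if upstream is not None and upstream != job_id:
                edges[upstream].add(job_id)
    return edges


def downstream_count(jobs, edges):
    """Count the jobs that transitively depend on each job."""
    counts = {}
    for job_id in jobs:
        seen, stack = set(), list(edges.get(job_id, ()))
        while stack:
            nxt = stack.pop()
            if nxt not in seen:
                seen.add(nxt)
                stack.extend(edges.get(nxt, ()))
        counts[job_id] = len(seen)
    return counts
```

A scheduler could, for instance, favor jobs with a high downstream count, since a delay there holds up the rest of the pipeline.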

Step 3: feeding it back
• Actions
• Insights
• Recommendations
• Self-tuning

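Self-tuning
[Figure: a Query passes through the Query Engine (Compiler, Optimizer, Scheduler, Runtime) to a Result. The Workload feeds Workload Representation and Workload Optimization, which produce Query Annotations for the Feedback Service. Rules and Configs in the engine perform Feedback Lookup & Action against the Feedback Service.]
Annotation: signature --> actions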
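The "signature --> actions" annotation can be pictured as a keyed lookup made at compile time. The sketch below is a toy in-memory stand-in; the `FeedbackService` class, its `annotate` and `lookup` methods, `compile_query`, and the setting names used in the usage note are all hypothetical and say nothing about the real service's API.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackService:
    """In-memory stand-in for a feedback service (hypothetical API)."""
    annotations: dict = field(default_factory=dict)

    def annotate(self, signature, action, value):
        """Record an action for a signature, as workload analysis would."""
        self.annotations.setdefault(signature, {})[action] = value

    def lookup(self, signature):
        """Return a copy of the actions recorded for a signature."""
        return dict(self.annotations.get(signature, {}))


def compile_query(signature, defaults, feedback):
    """Overlay any recorded actions on the engine's default settings."""
    config = dict(defaults)
    config.update(feedback.lookup(signature))
    return config
```

For example, after `fb.annotate("sig1", "row_count_hint", 1200)`, calling `compile_query("sig1", {"row_count_hint": None, "max_tokens": 50}, fb)` overrides only the hint, while a query with an unseen signature compiles with the defaults untouched.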

Illustration: SCOPE and Spark query engines
[Figure: the Query Engine pipeline (Compiler, Optimizer, Scheduler, Runtime) from Query to Result, connected to the Feedback Service. Engine-side labels: Optimizer Rule1: Online materialize; Optimizer Rule2: Computation Reuse; Compiler flags; SCOPE: Modifications to compiler/optimizer; Extensions: Pluggable extensions from outside (Jar). Workload-side labels: Workload Repository, SCOPE Connectors, Parsers, Enumerators, Query Subexpressions IR, Recurring Signature, Strict Signature, Common Subexpressions, View Selection, Selected Views, Learn Cardinality, Cardinality Models.]

The third axis: people
• Easier for people to play with the query workloads
• Abstracts many of the painful steps
• Allows people to build on top of each other
• Focus more on the workload optimizations
• Enabled several
  • Researchers
  • Developers
  • Interns

Summary
[Figure: Peregrine overview.
Workload-aware Query Engines: Hive, Spark, SCOPE, ...
Workload Representation: Instrumentation, Ingest, Parse, Enumerate; Metadata, Plans, Statistics, Signatures; Query Plan; Feature Store; Workload Intermediate Representation (IR)
Workload Optimization: Patterns (Recurring, Sharing, Coordinating, ...); Machine Learning, Mathematical Solvers, Graph Analytics; Learned optimizations, e.g., Learned Cardinality; Multi-query Optimization, e.g., CloudViews; Dependency-driven optimizations, e.g., physical design for pipeline
Workload Feedback: Insights, Recommendations, Self-tuning; Dashboard, Alerts, Query Annotations; Users, Feedback Service]
• Gray Systems Lab (GSL): https://azuredata.microsoft.com/labs/gsl
• GSL@SoCC: 4 papers, 1 poster
• We are hiring!
