Peregrine: workload optimization for cloud query engines*
Alekh Jindal, Gray Systems Lab

* Peregrine: Workload Optimization for Cloud Query Engines. Alekh Jindal, Hiren Patel, Abhishek Roy, Shi Qiao, Jarod Yin, Rathijit Sen, Subru Krishnan. SOCC 2019.

DBA Workload Engine

On-Premise DBA

On-Premise
User: "Need to reach by 10, can we drive faster?"
DBA: "Sure!"

Cloud Query Engines
• Setup, installation, maintenance taken care of
• On-demand provisioning, pay as you go

Cloud Query Engines .. ahhh!
User: "Need to reach by 10, can we drive faster?"
Provider: "Sorry, we don’t have a DBA"
Reality check for customers:
• Lots of services to choose from (even within Azure, GCP, AWS)
• Lots of knobs to tune for good performance and low cost
• Lack of control, and lack of expertise
• And, the DBA is gone!
Reality check for providers:
• System developers == virtual DBAs!
• Too many cloud users, compared to system developers
• Too many support requests, often redundant
• Less time for feature development

Cosmos: big data infra at Microsoft
• 100s of thousands of machines
• Exabytes of data at rest; Petabytes ingress/egress daily
• 500k+ batch jobs / day
• 3B+ tasks executed / day
• 10s of millions interactive queries / day
• 10s of thousands of SCOPE developers
• 1000s of teams

The missing DBA and the growing pain in Cosmos
• Large number of knobs/hints at script, data, plan level
  • Only a few expert users
  • The rest need guidance
  • Survey: better tooling for improving SCOPE queries
• Support challenge
  • 10s of thousands of incidents / year
  • 10 incidents per system developer on call
  • 100x users compared to system developers
  • ~10% growth in SCOPE workload in 2019

On-premise pain -> Cloud pain
[Figure: the on-premise side shows a database vendor's developers and, for each customer (Customer 1 ... Customer n), users, a workload, a DB, and a DBA; the cloud side shows developers, data services DS1 ... DSn, a workload, and users. "Pain" labels mark both sides.]

The cloud opportunity
[Figure: fragmented on-premise workloads versus massive cloud workloads]

The Cosmos opportunity
Massive cloud workloads: several TBs of workload metadata / day
• Job metadata: name, user, account, submit/start/end times
• Query plans: logical, physical, stage graph, estimates
• Runtime statistics: operator-wise observables
• Task level logs: start/end events
• Machine counters: CPU, IO, etc.

The case for a workload optimization platform
• DBA-as-a-Service
  • Another service in the cloud (easier integration)
  • Based on cloud workloads at hand (instance optimization)
• Engine agnostic
  • Not specific to any one query engine, e.g., SCOPE, Spark, SQL DW, etc.
  • E.g., view selection is still the same problem
• Global optimizations
  • Cloud workloads are organized into data pipelines
  • People often care about end-to-end aggregate costs in the cloud

Step 1: workload representation
Instrument, log, and collect workload characteristics

Engine-agnostic workload representation
• Layers, each with its own log + metrics: logical plan, physical plan, stage graph, tasks
• Signatures; anonymized
• Denormalized view (Workload IR)
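The signatures mentioned on this slide can be illustrated with a minimal sketch. It assumes a signature is a hash computed bottom-up over a plan subgraph, and that there are two flavors (named "strict" and "recurring" later in the deck): one that includes parameters and input names, and one that ignores them. The `PlanNode` class, its fields, and the hashing scheme are hypothetical and not the actual Peregrine IR; in particular, treating every parameter as time-varying is a simplification.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """Hypothetical engine-agnostic plan operator (not the actual Peregrine IR)."""
    op: str                      # operator name, e.g. "Filter"
    params: str = ""             # literals / parameters of the operator
    inputs: str = ""             # concrete input dataset path (leaves only)
    children: List["PlanNode"] = field(default_factory=list)


def _digest(*parts: str) -> str:
    """Short stable hash of the given string parts."""
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()[:16]


def strict_signature(node: PlanNode) -> str:
    """Hash of the whole subgraph, including parameters and input names."""
    kids = [strict_signature(c) for c in node.children]
    return _digest(node.op, node.params, node.inputs, *kids)


def recurring_signature(node: PlanNode) -> str:
    """Hash of the subgraph shape only; parameters and inputs are ignored."""
    kids = [recurring_signature(c) for c in node.children]
    return _digest(node.op, *kids)


def daily_job(date: str) -> PlanNode:
    """Made-up daily job: scan one day of logs, filter, aggregate."""
    scan = PlanNode("Scan", inputs=f"/logs/{date}/clicks")
    filt = PlanNode("Filter", params=f"date = '{date}'", children=[scan])
    return PlanNode("Aggregate", params="count by user", children=[filt])


monday, tuesday = daily_job("2019-11-18"), daily_job("2019-11-19")
same_template = recurring_signature(monday) == recurring_signature(tuesday)
same_instance = strict_signature(monday) == strict_signature(tuesday)
print(same_template, same_instance)
```

Under these assumptions, two runs of the same daily job share a recurring signature (same template) but not a strict signature (different inputs), which is what lets past runs be matched to future ones.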

Step 2: optimize for patterns

Typical workload patterns
• Consider a simplified 2D space of data and queries
  • Recurring: query templates appear over newer datasets
  • Similarity: queries over same datasets have similarities
  • Dependency: queries depend on datasets produced by previous queries
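As a rough illustration of the three patterns, the sketch below checks pairs of jobs in a toy job log. The `Job` record, its field names, and the pairwise rules are assumptions made for this example; they are one simple reading of the slide, not how Peregrine detects patterns.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Job:
    """Hypothetical, simplified job record; field names are illustrative."""
    job_id: str
    template: str            # query template, e.g. a recurring signature
    inputs: frozenset        # datasets read
    outputs: frozenset       # datasets written


def find_patterns(jobs: List[Job]) -> Dict[str, Set[Tuple[str, str]]]:
    """Return pairs of job ids exhibiting each of the three patterns."""
    patterns: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for i, a in enumerate(jobs):
        for b in jobs[i + 1:]:
            pair = (a.job_id, b.job_id)
            # Recurring: same template, different (newer) input datasets.
            if a.template == b.template and a.inputs != b.inputs:
                patterns["recurring"].add(pair)
            # Similarity: different templates over shared input datasets.
            if a.template != b.template and a.inputs & b.inputs:
                patterns["similarity"].add(pair)
            # Dependency: one job reads what the other one wrote.
            if a.outputs & b.inputs or b.outputs & a.inputs:
                patterns["dependency"].add(pair)
    return dict(patterns)


jobs = [
    Job("etl_mon", "etl", frozenset({"raw/mon"}), frozenset({"clean/mon"})),
    Job("etl_tue", "etl", frozenset({"raw/tue"}), frozenset({"clean/tue"})),
    Job("report_a", "report_a", frozenset({"clean/tue"}), frozenset()),
    Job("report_b", "report_b", frozenset({"clean/tue"}), frozenset()),
]
patterns = find_patterns(jobs)
for name, pairs in sorted(patterns.items()):
    print(name, sorted(pairs))
```

The pairwise scan is quadratic in the number of jobs, which is fine for a toy log; a workload of the size described earlier would need indexing by template and by dataset instead.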

Recurring pattern
• Majority of production workloads
• There is a regular ETL needed before other things can happen
• Opportunity to learn from the past
• Examples
  • Learned cardinality
  • Learned cost models
  • Learned resources
  • Learned etc.
[Figure: recurring pattern, query templates appear over newer datasets]

Recap from NWDS’19: SCOPE Cardinality Estimation
[Figure: cardinality estimation panel marking the ideal, with under-estimation and over-estimation regions]
[Figure: fraction of subgraph instances (y-axis, 0 to 1) versus estimated/actual cardinality ratio (x-axis, 10^-6 to 10^8, ideal at 10^0), for Neural Network, Linear Regression, and Poisson Regression]
Towards a Learning Optimizer for Shared Clouds. Chenggang Wu, Alekh Jindal, Saeed Amizadeh, Hiren Patel, Wangchao Le, Shi Qiao, Sriram Rao. VLDB 2019.
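To make the estimated/actual cardinality ratio concrete, here is a minimal sketch of learning a cardinality model for one recurring subgraph template from its past runs. It uses an ordinary least-squares fit in log space, which is a stand-in chosen for brevity and not one of the models compared on the slide. The history values, the fixed 10% selectivity used as the "default" estimate, and the ground truth are all made up for illustration.

```python
import math
from typing import List, Tuple


def fit_log_linear(history: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of log(output) = a + b * log(input).

    `history` holds (input_cardinality, actual_output_cardinality) pairs
    observed for one recurring subgraph template.
    """
    xs = [math.log(i) for i, _ in history]
    ys = [math.log(o) for _, o in history]
    n = len(history)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = cov_xy / var_x if var_x > 0 else 0.0
    a = mean_y - b * mean_x
    return a, b


def predict(model: Tuple[float, float], input_card: float) -> float:
    """Predicted output cardinality for a new input cardinality."""
    a, b = model
    return math.exp(a + b * math.log(input_card))


# Made-up history: a filter that keeps roughly 1% of its input rows.
history = [(1e6, 1.0e4), (2e6, 2.1e4), (4e6, 3.9e4), (8e6, 8.2e4)]
model = fit_log_linear(history)

new_input = 1.6e7
learned = predict(model, new_input)
default = new_input * 0.1   # assumed fixed-selectivity guess, for contrast
actual = 1.6e5              # made-up ground truth for illustration
print(f"learned estimated/actual ratio: {learned / actual:.2f}")
print(f"default estimated/actual ratio: {default / actual:.2f}")
```

A ratio near 1 corresponds to the "ideal" line in the figure; ratios above 1 are over-estimation and ratios below 1 are under-estimation.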

SCOPE Cost Estimation
• Cost models are orders of magnitude off! BAD!
[Figure: panels for cardinality estimation, cost estimation, c) Manually Improved Cost Model, and d) Cost Models with Cardinality Feedback, each marking the ideal with under-estimation and over-estimation regions]

SCOPE Cost Estimation
• Cost models are orders of magnitude off!
  • Pervasive use of user defined functions
  • Complexity of big data systems
  • Variance in the cloud environments
[Figure: c) Manually Improved Cost Model (manually tuned cost model) and d) Cost Models with Cardinality Feedback (feeding perfect cardinalities), each marking the ideal with under-estimation and over-estimation regions]

Why is cardinality not enough?
• Incrementally add features
• Error drops from 110% to 40%
• Additional transformations needed
• Hard to come up with such heuristics
• Two sets of hash join instances
• Different feature weights
• Hard to instance optimize manually

Ensemble of Models over Recurring Patterns
• Models: Operator-subgraph, Operator-subgraphApprox, Operator-inputs, Operator
[Figure: coverage versus accuracy of the four models; panels a to f compare Op-Subgraph, Op-SubgraphApprox, Op-Input, Operator, and Combined]
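One simple way to picture the coverage versus accuracy trade-off is a fallback chain: try the most specific model first and fall back to more general ones when it has no coverage. The sketch below assumes exactly that. It is a plain first-hit fallback, not the "Combined" model from the slide, and the model contents, key names, and cost values are hypothetical.

```python
from typing import Dict, List, Tuple

Model = Dict[str, float]  # feature key -> learned cost (arbitrary units)

# Hypothetical learned models, ordered from most specific to most general.
# Each entry: (model name, field of the operator used as key, model).
CHAIN: List[Tuple[str, str, Model]] = [
    ("op-subgraph", "subgraph_sig", {"sg:9f2a": 12.0}),
    ("op-subgraph-approx", "approx_sig", {"ap:join-2-inputs": 15.0}),
    ("op-input", "input_sig", {"in:HashJoin:clicks,users": 18.0}),
    ("operator", "op_type", {"HashJoin": 25.0, "Filter": 2.0}),
]


def predict_cost(op: Dict[str, str], default: float) -> Tuple[str, float]:
    """Return (model name, cost) from the first model that covers `op`."""
    for name, key_field, model in CHAIN:
        key = op.get(key_field)
        if key in model:
            return name, model[key]
    return "default", default  # e.g. the engine's built-in cost estimate


seen = {"subgraph_sig": "sg:9f2a", "approx_sig": "ap:join-2-inputs",
        "input_sig": "in:HashJoin:clicks,users", "op_type": "HashJoin"}
unseen = {"subgraph_sig": "sg:0000", "approx_sig": "ap:other",
          "input_sig": "in:HashJoin:a,b", "op_type": "HashJoin"}
unknown = {"op_type": "Sort"}

for op in (seen, unseen, unknown):
    print(predict_cost(op, default=100.0))
```

The more specific models are expected to be more accurate but cover fewer operators; the operator-level model covers almost everything at lower accuracy.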

SCOPE Cost Estimation
• Can learn pretty accurate cost models!
[Figure: c) Manually Improved Cost Model and d) Cost Models with Cardinality Feedback, alongside the learned cost models, each marking the ideal with under-estimation and over-estimation regions]
Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings. Tarique Siddiqui, Alekh Jindal, Shi Qiao, Hiren Patel, Wangchao Le. SIGMOD 2020 (to appear).

Similarity pattern
• Very typical in multi-user shared cloud environments
  • Cosmos, HDI, Ant Financial, ML workflows, etc.
• Opportunity for multi-query optimization
  • SCOPE compute reuse
[Figure: similarity pattern, queries over same datasets have similarities]
[Figure: percentage (0 to 100) of overlapping jobs, users with overlapping jobs, and overlapping subgraphs for cluster1 to cluster5]
Computation Reuse in Analytics Job Service at Microsoft. Alekh Jindal, Shi Qiao, Hiren Patel, Jarod Yin, Jieming Di, Malay Bag, Marc Friedman, Yifung Lin, Konstantinos Karanasos, Sriram Rao. SIGMOD 2018.

Spark Compute Reuse
• Instrument application log
• Analyze common subexpressions over Spark SQL plans
• Optimizer rules to automatically materialize/reuse in future queries
• Almost 30% improvement in total time on TPC-DS
SparkCruise: Handsfree Computation Reuse in Spark. Abhishek Roy, Alekh Jindal, Hiren Patel, Ashit Gosalia, Subru Krishnan, Carlo Curino. VLDB 2019 (Demo).
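The "analyze common subexpressions" step can be sketched as counting subexpression signatures across queries and then choosing which ones to materialize. The example below assumes a greedy choice by saving per unit of storage under a storage budget; that heuristic, the `Subexpression` record, and all costs and sizes are assumptions for illustration, not SparkCruise's actual selection algorithm. It also ignores that materializing one subexpression reduces the benefit of the subexpressions nested inside it.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Subexpression:
    """Hypothetical summary of one subexpression seen in the workload."""
    signature: str
    compute_cost: float   # cost to compute once (arbitrary units)
    size: float           # storage needed if materialized (arbitrary units)


def select_views(queries: List[List[str]],
                 catalog: Dict[str, Subexpression],
                 budget: float) -> List[str]:
    """Greedy selection: best saving per unit of storage first."""
    # Count in how many queries each subexpression signature appears.
    freq = Counter(sig for q in queries for sig in set(q))
    candidates = []
    for sig, count in freq.items():
        if count < 2:
            continue  # appears once, nothing to reuse
        sub = catalog[sig]
        saving = (count - 1) * sub.compute_cost  # first run still pays
        candidates.append((saving / sub.size, sig))
    chosen, used = [], 0.0
    for _, sig in sorted(candidates, reverse=True):
        if used + catalog[sig].size <= budget:
            chosen.append(sig)
            used += catalog[sig].size
    return chosen


# Each query is listed as the signatures of its subexpressions (made up).
queries = [
    ["scan_clicks", "join_users", "agg_daily"],
    ["scan_clicks", "join_users", "agg_weekly"],
    ["scan_clicks", "top_k"],
]
catalog = {
    "scan_clicks": Subexpression("scan_clicks", compute_cost=10.0, size=100.0),
    "join_users": Subexpression("join_users", compute_cost=50.0, size=20.0),
}
small_budget = select_views(queries, catalog, budget=50.0)
large_budget = select_views(queries, catalog, budget=200.0)
print(small_budget, large_budget)
```

With a small budget only the expensive but compact join is kept; with a larger budget the cheap but bulky scan fits as well.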

Step 3: feeding it back
• Actions
• Insights
• Recommendations
• Self-tuning

Illustration: SCOPE and Spark query engines
[Figure: a query flows through the query engine (compiler, optimizer, scheduler, runtime) to a result. Engine-side labels: compiler flags; optimizer Rule1: online materialize; optimizer Rule2: computation reuse; modifications to compiler/optimizer; extensions jar (pluggable extensions from outside). Peregrine-side labels: connectors, parsers, enumerators, workload repository, query subexpressions IR, learn cardinality, cardinality models, common subexpressions, view selection, selected views, and a feedback service keyed by recurring signature and strict signature.]

Peregrine Summary
• Easier to add newer features
• Easier to add newer engines
• Easier for people to participate
  • Researchers, developers, interns
  • Abstracts the painful steps
  • Build on top of each other
• Focus on workload optimizations
• Gray Systems Lab: aka.ms/gsl
• We are hiring!
[Figure: Peregrine architecture, top to bottom.
Query engines: Hive, Spark, SCOPE, ...
Workload representation: instrumentation; ingest, parse query plan, enumerate; metadata, plans, statistics, signatures; workload feature store; workload intermediate representation (IR).
Workload optimization: patterns (recurring, sharing, coordinating, ...); mathematical solvers, machine learning, graph analytics; learned optimizations, e.g., learned cardinality; dependency-driven optimizations, e.g., physical design for pipeline; multi-query optimization, e.g., CloudViews.
Workload feedback: insights, recommendations, self-tuning; query annotations, dashboard, alerts; feedback service; users.]
