Towards a Learning Optimizer for Shared Clouds* (PowerPoint presentation)



SLIDE 1

Towards a Learning Optimizer for Shared Clouds*

Chenggang Wu, Alekh Jindal, Saeed Amizadeh, Hiren Patel, Wangchao Le, Shi Qiao, Sriram Rao

* C. Wu, A. Jindal, S. Amizadeh, H. Patel, W. Le, S. Qiao, and S. Rao. Towards a Learning Optimizer for Shared Clouds. In PVLDB, 12(3): 210–222, 2018.

February 8, 2019

SLIDE 2

Rise of Big Data Systems

BIG DATA SYSTEM

Hive Spark Flink Calcite BigQuery Big SQL HDInsight SCOPE Etc.

  • Declarative query interface
  • Cost-based query optimizer (CBO)

SELECT Customer.cname, Item.iname
FROM Customer
INNER JOIN Order ON Customer.cid == Order.cid
INNER JOIN Item ON Item.iid == Order.iid
WHERE Item.iprice > 100 AND Customer.cage < 18;

[Query plan: Customer JOIN Order ON cid, JOIN Item ON iid, with filters Age < 18 on Customer and Price > 100 on Item; per-operator cardinality estimates annotated]

Good plan => good performance
Problem: the CBO can make mistakes
  • especially in cardinality estimation
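A toy illustration (hypothetical numbers, not SCOPE's actual estimator) of the classic failure mode behind these mistakes: treating filter predicates as independent and multiplying their selectivities.

```python
# Toy sketch of the independence assumption used by many cost-based
# optimizers: estimated output rows = input rows * product of selectivities.

def estimate_filter_card(table_rows, selectivities):
    """Estimated output rows under the independence assumption."""
    card = float(table_rows)
    for s in selectivities:
        card *= s
    return card

# Two predicates that each keep 10% of rows. If they are correlated and in
# fact select the same rows, the true output is 100,000 rows, but the
# independence assumption predicts only 10,000: a 10x underestimate that
# compounds further up the plan.
estimated = estimate_filter_card(1_000_000, [0.1, 0.1])
```

Errors like this multiply across operators, which is why misestimates grow by orders of magnitude in deep plans.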


SLIDE 3

Rise of Big Data Systems

"The root of all evil, the Achilles Heel of query optimization, is the estimation of the size of intermediate results, known as cardinalities." – Guy Lohman, SIGMOD Blog 2014

CARDINALITY ESTIMATION



SLIDE 4

Rise of Big Data Systems

TUNING!


  • Collecting Statistics
  • Providing Query Hints
  • Database Administration


SLIDE 5

Rise of the Clouds

MANAGED


SERVERLESS

No admin, no expertise, no control: users can no longer tune by collecting statistics, providing query hints, or doing database administration


SLIDE 6

Rise of the Clouds


SELF TUNING!

SLIDE 7

Hope: Shared Cloud Infrastructures


  • Massive volumes of query logs
  • Shared data processing
  • Centrally visible query workload

SLIDE 8

Cosmos: shared cloud infra at Microsoft

  • SCOPE workloads:
    • Batch processing in a job service
    • 100Ks jobs; 1000s users; EBs data; 100Ks nodes
  • Cardinality estimation in SCOPE:
    • 1 day’s log from Asimov
    • Lots of constants for best-effort estimation
    • Big data, unstructured data, custom code
  • Workload patterns:
    • Recurring jobs
    • Shared query subgraphs
  • Can we learn cardinality models?
[Diagram: plans for queries Q1 and Q2 sharing the subgraph Filter(Price > 100) over Item; annotated cardinalities illustrate under- and over-estimation relative to the ideal]

SLIDE 9

Learning Cardinality Model

  • Strict: cache previously seen values
    • Low coverage
    • Online feedback
  • General: learning a single model
    • Hard to featurize
    • Hard to train
    • Prediction latency
    • Low accuracy
  • Template: learning a model per subgraph template

=> No one-size-fits-all

Subgraph Type   Logical Expression   Parameter Values   Data Inputs
Strict          Fixed                Fixed              Fixed
General         Variable             Variable           Variable
Template        Fixed                Variable           Variable
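The three granularities above can be sketched as a lookup cascade from narrowest to broadest. This is a hypothetical structure for illustration, not the paper's code: `strict_cache`, `template_models`, and `predict` are invented names.

```python
# Sketch of the three model granularities, ordered narrowest to broadest.

strict_cache = {}     # (logical expr, params, inputs) -> observed cardinality
template_models = {}  # logical expr (template) -> per-template model

def predict(logical_expr, params, inputs, general_model=None):
    # Strict: exact match on everything previously seen.
    # Highest accuracy, lowest coverage; suits online feedback.
    key = (logical_expr, tuple(params), tuple(inputs))
    if key in strict_cache:
        return strict_cache[key]
    # Template: same logical expression, varying parameters and inputs.
    if logical_expr in template_models:
        return template_models[logical_expr](params, inputs)
    # General: one model for all subgraphs. Broadest coverage, but hard
    # to featurize and train, and least accurate.
    if general_model is not None:
        return general_model(logical_expr, params, inputs)
    return None  # fall back to the optimizer's default estimate
```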

SLIDE 10

Learned Cardinality Models

  • Subgraph template:
    • Same logical subexpression
    • Different physical implementations
    • Different parameters and inputs
  • Feature selection
  • Model selection:
    • Generalized linear models, due to their interpretability
    • More complex models, such as multi-layer perceptrons, are harder to train

Table 3: The features used for learning cardinality.

Name                      Description
JobName                   Name of the job containing the subgraph
NormJobName               Normalized job name
InputCardinality          Total cardinality of all inputs to the subgraph
Pow(InputCardinality, 2)  Square of InputCardinality
Sqrt(InputCardinality)    Square root of InputCardinality
Log(InputCardinality)     Log of InputCardinality
AvgRowLength              Average output row length
InputDataset              Name of all input datasets to the subgraph
Parameters                One or more parameters in the subgraph
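A minimal per-template fit can be sketched as follows. The slide's choice is Poisson regression (a GLM with a log link); to keep this sketch self-contained, a least-squares fit in log space stands in for it, using only the Log(InputCardinality) feature from Table 3. The training data is hypothetical.

```python
import math

def fit_loglinear(cards, outs):
    # Least-squares fit of log(out) = a + b * log(card): a simplified,
    # self-contained stand-in for the Poisson regression (GLM) on the slide.
    xs = [math.log(c) for c in cards]
    ys = [math.log(o) for o in outs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

def predict_card(a, b, card):
    return math.exp(a + b * math.log(card))

# Hypothetical execution history for one subgraph template
# (a filter that keeps roughly 10% of its input rows):
cards = [1e3, 1e4, 1e5, 1e6]
outs = [100, 1_000, 10_000, 100_000]
a, b = fit_loglinear(cards, outs)
estimate = predict_card(a, b, 5e5)  # close to 50,000
```

In the real system, one such small model is trained per template over many past executions, with the name-based features one-hot encoded.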

[Diagram: two instances of one subgraph template: Customer filtered on Age < 18 joined with Order on cid, and Customer' filtered on Age < 20 joined with Order' on cid]

SLIDE 11

Accuracy: 10-fold cross validation

[CDF of the estimated/actual cardinality ratio (10^-6 to 10^8) over subgraph instances, for neural network, linear regression, and Poisson regression models; a ratio of 1 is ideal]

Model               75th Percentile Error   90th Percentile Error
Default SCOPE       74602%                  5931418%
Poisson Regression  1.5%                    32%

Note: the neural network overfits due to the small observation and feature space per model
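A sketch of the metric behind the table above, assuming relative error |estimated − actual| / actual reported at the 75th and 90th percentiles (the exact metric definition is an assumption here).

```python
import math

def percentile_error(estimates, actuals, pct):
    """Relative error at the given percentile (nearest-rank method)."""
    errs = sorted(abs(e - a) / a for e, a in zip(estimates, actuals))
    idx = max(0, math.ceil(pct / 100 * len(errs)) - 1)
    return errs[idx]
```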
SLIDE 12

Applicability: percentage of subgraphs having models

[Varying training window: applicability (%) of learned models for jobs and subgraphs as the training duration grows from 1 day to 1 month]

[Sliding test window: applicability (%) for jobs and subgraphs as the test window slides over 1 day, 1 week, and 1 month]

SLIDE 13

End-to-end Feedback Loop

[Feedback loop pipeline: a query flows through Compiler, Optimizer, Scheduler, and Runtime to produce a result; these stages emit compiled query DAGs, optimized plans & estimated statistics, execution graphs & resources, and actual runtime statistics. A Workload Analyzer feeds a Parallel Trainer, whose Cardinality Models are served back to the Optimizer (model lookup & prediction) by a Model Server.]

Figure 5: The feedback loop architecture.

  • Large number of smaller, highly accurate models
  • Trained offline over new batches of data
  • Easy to featurize, with low overhead
  • Annotation hints to the query optimizer
  • Accurate and easy to understand
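The model-lookup step can be sketched as a dictionary keyed by a hash of the subgraph's logical expression (its template). The class and method names below are illustrative, not the system's actual API.

```python
import hashlib

def template_hash(logical_expr: str) -> str:
    # Parameters and inputs are assumed to be normalized away before
    # hashing, so all instances of one template collide on the same key.
    return hashlib.sha1(logical_expr.encode()).hexdigest()

class ModelServer:
    """Hypothetical sketch of the model server in the feedback loop."""

    def __init__(self):
        self.models = {}  # template hash -> prediction function

    def register(self, logical_expr, model):
        self.models[template_hash(logical_expr)] = model

    def cardinality_hint(self, logical_expr, features):
        model = self.models.get(template_hash(logical_expr))
        # None means no learned model: the optimizer keeps its default
        # estimate instead of an annotation hint.
        return None if model is None else model(features)
```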

SLIDE 14

Performance

  • Subset of hourly jobs from Asimov
  • These queries process unstructured data, use SPJA operators, and a UDO
  • Re-ran the queries over the same production data, but with redirected output

[Bar charts over 8 queries comparing the default optimizer with CardLearner: end-to-end latency (s), total processing time (s), and number of vertices]

SLIDE 15

Avoiding Learning Bias

  • Learning only what is seen
  • Exploratory join ordering:
    • Actively try different join orders
    • Pruning: discard plans with subexpressions that are more expensive than at least one other plan
    • Maximize new observations when comparing plans
  • Execution strategies:
    • Static workload tuning
    • Using sample data
    • Leveraging recurring/overlapping jobs

[Diagram: three candidate join orders over X, Y, Z with estimated vs. actual cardinalities at each step: (a) Plan 1: (X join Y) join Z, (b) Plan 2: (X join Z) join Y, (c) Plan 3: (Y join Z) join X]
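The pruning rule can be sketched as follows: a candidate plan is discarded if any of its subexpressions already costs more than the total cost of at least one other complete plan. The data layout and cost numbers here are hypothetical (loosely following the three join orders above), not the paper's implementation.

```python
# Sketch of exploratory-join-ordering pruning: keep only plans whose every
# subexpression is no more expensive than the cheapest other complete plan.

def prune(plans):
    """plans: {name: {"subexpr_costs": [...], "total": total plan cost}}"""
    kept = {}
    for name, plan in plans.items():
        cheapest_other = min(p["total"] for n, p in plans.items() if n != name)
        if all(c <= cheapest_other for c in plan["subexpr_costs"]):
            kept[name] = plan
    return kept

# Hypothetical costs for the three join orders over X, Y, Z:
plans = {
    "plan1": {"subexpr_costs": [100], "total": 200},  # (X join Y) join Z
    "plan2": {"subexpr_costs": [75],  "total": 175},  # (X join Z) join Y
    "plan3": {"subexpr_costs": [200], "total": 300},  # (Y join Z) join X
}
surviving = prune(plans)  # plan3's subexpression (200) exceeds plan2's total (175)
```

The surviving plans can then be chosen among so as to maximize new cardinality observations rather than purely by estimated cost.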

SLIDE 16

Takeaways

  • Big data systems increasingly use cost-based optimization
  • Users cannot tune these systems in managed/serverless services
  • Hard to achieve a one-size-fits-all query optimizer
  • Instance-optimized systems are more feasible
  • Very promising results from SCOPE workloads:
  • Could achieve very high accuracy
  • Reasonably large applicability, could further apply exploration
  • Performance gains, most significant being less resource consumption
  • Learned cardinality models a step towards self-learning optimizers