CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics - PowerPoint PPT Presentation




SLIDE 1

CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics

  • O. Alipourfard et al.

Presented by Dmitry Kazhdan

SLIDE 2

Overview

  • Background
  • Prior work
  • CherryPick
  • Evaluation
  • Criticism
  • Recent work
  • Conclusions
  • Questions
SLIDE 3

Background

SLIDE 4

Background

Opportunities:

  • Cloud computing
  • Big data analytics
  • Cost savings
SLIDE 5

Background

Challenges:

  • Complex performance model
  • Cost model tradeoffs
  • Heterogeneous applications
  • Limited number of samples (from a large configuration space)
SLIDE 6

Prior Work

  • Ernest
  • Coordinate descent
  • Exhaustive search
  • Random search
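Coordinate descent, one of the baselines above, can be sketched as a simple loop that optimises one configuration dimension at a time while holding the others fixed. This is an illustrative sketch with a made-up configuration space and cost function, not the implementation the paper compares against:

```python
# Hypothetical configuration space (dimension names and values are
# illustrative, not taken from the paper).
SPACE = {
    "vm_type":      ["small", "large", "highmem"],
    "cluster_size": [4, 8, 16, 32],
}

def coordinate_descent(cost, space, start):
    """Optimise one dimension at a time, holding the others fixed,
    until a full pass over all dimensions yields no improvement."""
    config = dict(start)
    improved = True
    while improved:
        improved = False
        for dim, values in space.items():
            best_v = min(values, key=lambda v: cost({**config, dim: v}))
            if cost({**config, dim: best_v}) < cost(config):
                config[dim] = best_v
                improved = True
    return config

# Toy separable cost model: cheapest at ("small", 8).
def toy_cost(c):
    vm_price = {"small": 1, "large": 2, "highmem": 3}[c["vm_type"]]
    return vm_price + abs(c["cluster_size"] - 8) / 4

best = coordinate_descent(toy_cost, SPACE, {"vm_type": "highmem", "cluster_size": 32})
```

Because the toy cost is separable, coordinate descent reaches the global optimum here; on real cloud workloads the dimensions interact, which is one reason it can get stuck in local minima.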
SLIDE 7

CherryPick

SLIDE 8

CherryPick

  • Uses Bayesian Optimisation to build a performance model of the application
  • Finds optimal or near-optimal configurations within a small number of test runs
  • Uses an acquisition function (expected improvement) to decide which configuration to sample next
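The loop described above (surrogate model plus acquisition function) can be sketched in a few dozen lines: a Gaussian-process surrogate with an RBF kernel and an expected-improvement acquisition over a discrete set of candidate configurations. The kernel, lengthscale, sampling budget, and toy cost curve are all illustrative assumptions, not CherryPick's actual model:

```python
import math
import numpy as np

def rbf(a, b, ls=0.2):
    """Squared-exponential kernel between two 1-D point sets."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(x_obs, y_obs, x_all, noise=1e-6):
    """GP posterior mean/std at x_all, centring observations on their mean."""
    y_mean = y_obs.mean()
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = rbf(x_obs, x_all)
    alpha = np.linalg.solve(K, y_obs - y_mean)
    mu = Ks.T @ alpha + y_mean
    v = np.linalg.solve(K, Ks)
    var = np.clip(1.0 - np.sum(Ks * v, axis=0), 1e-12, None)
    return mu, np.sqrt(var)

def expected_improvement(mu, sigma, best_y):
    """EI for minimisation: expected amount by which we beat best_y."""
    z = (best_y - mu) / sigma
    Phi = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))  # normal CDF
    phi = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)          # normal PDF
    return (best_y - mu) * Phi + sigma * phi

candidates = np.linspace(0.0, 1.0, 21)          # 21 candidate "configurations"
def true_cost(x):                               # unknown to the optimiser
    return 50.0 * (x - 0.4) ** 2 + 10.0

sampled = [0, 20]                               # start from the two extremes
ys = [true_cost(candidates[i]) for i in sampled]
for _ in range(8):                              # small fixed sampling budget
    mu, sigma = gp_posterior(candidates[sampled], np.array(ys), candidates)
    ei = expected_improvement(mu, sigma, min(ys))
    ei[sampled] = -1.0                          # never re-sample a configuration
    nxt = int(np.argmax(ei))
    sampled.append(nxt)
    ys.append(true_cost(candidates[nxt]))

best_cost = min(ys)
```

With only 10 of the 21 candidates ever run, the search lands at (or very near) the cheapest configuration, which mirrors the slide's point about needing only a few test runs.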
SLIDE 9

CherryPick

Initial vs. modified objective function (formulas shown as figures on the slide)

SLIDE 10

CherryPick Workflow

SLIDE 11

CherryPick Implementation

  • Search Controller
  • Cloud Monitor
  • Bayesian Optimisation Engine
  • Cloud Controller
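One way to picture how the four components above might interact is as a driver loop with three collaborators. This is an illustrative sketch only: every class, method name, and stub behaviour here is invented for the example, not taken from the paper's implementation:

```python
import random

random.seed(0)  # deterministic for the sketch

class CloudController:
    """Launches a cluster with a given configuration and runs the workload."""
    def run_benchmark(self, config):
        # Stub cost model: cheaper near 8 nodes of the cheaper VM type.
        return abs(config["nodes"] - 8) + config["vm_type"]

class CloudMonitor:
    """Checks that a run was representative (e.g. no noisy neighbours)."""
    def run_is_reliable(self, cost):
        return cost >= 0   # stub: accept every run

class BayesOptEngine:
    """Suggests the next configuration and decides when to stop."""
    def __init__(self, space):
        self.space, self.history = space, []
    def record(self, config, cost):
        self.history.append((config, cost))
    def suggest(self):
        return random.choice(self.space)   # stand-in for real Bayesian Optimisation
    def confident_enough(self):
        return len(self.history) >= 6      # stand-in for a confidence-based stop

class SearchController:
    """Drives the search loop, wiring the other components together."""
    def __init__(self):
        space = [{"nodes": n, "vm_type": t} for n in (4, 8, 16) for t in (1, 2)]
        self.engine = BayesOptEngine(space)
        self.cloud = CloudController()
        self.monitor = CloudMonitor()
    def search(self):
        while not self.engine.confident_enough():
            config = self.engine.suggest()
            cost = self.cloud.run_benchmark(config)
            if self.monitor.run_is_reliable(cost):
                self.engine.record(config, cost)
        return min(self.engine.history, key=lambda h: h[1])[0]

best = SearchController().search()
```

The separation mirrors the slide: the controller owns the loop, the engine owns the model and stopping rule, and the cloud-facing pieces isolate benchmarking and reliability checks.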
SLIDE 12

Evaluation

SLIDE 13

Evaluation

  • Applications: TPC-DS, TPC-H, TeraSort, SparkReg, SparkKm
  • 66 cloud configurations
  • Objective: reduce cost of execution under runtime constraint
  • Compared with:
      • Exhaustive search
      • Coordinate descent
      • Random search (with a budget)
      • Ernest
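The evaluation objective above (minimise execution cost subject to a runtime constraint) is often handed to an optimiser as price × time with a penalty for missing the deadline. A hedged sketch of that encoding follows; the numbers and the multiplicative penalty form are illustrative, and the paper's exact formulation differs (it folds the constraint into the Bayesian model instead):

```python
def objective(price_per_hour, runtime_hours, max_runtime_hours, penalty=10.0):
    """Cost of a run, heavily penalised if it misses the runtime constraint.

    A multiplicative penalty is one common encoding for a constrained
    black-box objective; it is an assumption here, not the paper's formula.
    """
    cost = price_per_hour * runtime_hours
    if runtime_hours > max_runtime_hours:
        cost *= penalty
    return cost

# A slow-but-cheap configuration can lose to a fast-but-pricier one
# once the deadline is taken into account.
cheap_slow = objective(price_per_hour=1.0, runtime_hours=5.0, max_runtime_hours=4.0)
fast_fair  = objective(price_per_hour=2.0, runtime_hours=3.0, max_runtime_hours=4.0)
```

Here the deadline-violating configuration costs 50.0 after the penalty while the compliant one costs 6.0, so the optimiser is steered toward feasible configurations.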
SLIDE 14

Evaluation

  • Metric 1: the cost of running the job with the final selected configuration
  • Metric 2: the total cost of running all the configurations sampled during the search
  • 20 independent runs
  • 10th, 50th and 90th percentiles computed
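The reporting scheme above (20 independent runs, summarised by the 10th, 50th, and 90th percentiles) is straightforward to reproduce, for example with NumPy; the per-run costs below are synthetic stand-ins for illustration:

```python
import numpy as np

# Synthetic per-run costs standing in for 20 independent search runs.
costs = np.array([float(c) for c in range(1, 21)])   # 1.0 .. 20.0

# NumPy's default (linear interpolation between order statistics).
p10, p50, p90 = np.percentile(costs, [10, 50, 90])
```

Reporting the 10th/90th spread alongside the median shows how variable the search is across runs, which matters for a randomised method like Bayesian Optimisation.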
SLIDE 15

Evaluation

SLIDE 16

Evaluation

SLIDE 17

Evaluation

  • Investigated parameter tuning
  • Investigated performance behaviour
SLIDE 18

Evaluation

  • Handling workload variation
SLIDE 19

Criticism

SLIDE 20

Criticism/Discussion

“With 4x cost, random search can find similar configurations to CherryPick on the median”

SLIDE 21

Criticism/Discussion

SLIDE 22

Criticism/Discussion

  • Three of the four comparison baselines are easy to beat, so they offer little meaningful comparison
  • Does not use the available information efficiently
SLIDE 23

Recent Work

SLIDE 24

Recent Work

  • PARIS
  • Scout
  • Arrow
  • Micky
SLIDE 25

Conclusions

SLIDE 26

Conclusions

  • Introduced CherryPick
  • Compared to existing systems
  • Presented evaluation results
  • Criticism
SLIDE 27

Questions?