CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics - PowerPoint PPT Presentation




SLIDE 1

CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics

  • O. Alipourfard et al.

Presented by Dmitry Kazhdan

SLIDE 2

Overview

  • Background
  • Prior work
  • CherryPick
  • Evaluation
  • Criticism
  • Recent work
  • Conclusions
  • Questions
SLIDE 3

Background

SLIDE 4

Background

Opportunities:

  • Cloud computing
  • Big data analytics
  • Cost savings
SLIDE 5

Background

Challenges:

  • Complex performance model
  • Cost model tradeoffs
  • Heterogeneous applications
  • Limited number of samples (from a large configuration space)
SLIDE 6

Prior Work

  • Ernest
  • Coordinate descent
  • Exhaustive search
  • Random search
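Coordinate descent, one of the baselines above, can be sketched as a simple loop that optimises one configuration dimension at a time while holding the others fixed. This is an illustrative sketch with a made-up configuration space and cost function, not the implementation the paper compares against:

```python
# Hypothetical configuration space (dimension names and values are
# illustrative, not taken from the paper).
SPACE = {
    "vm_type":      ["small", "large", "highmem"],
    "cluster_size": [4, 8, 16, 32],
}

def coordinate_descent(cost, space, start):
    """Optimise one dimension at a time, holding the others fixed,
    until a full pass over all dimensions yields no improvement."""
    config = dict(start)
    improved = True
    while improved:
        improved = False
        for dim, values in space.items():
            best_v = min(values, key=lambda v: cost({**config, dim: v}))
            if cost({**config, dim: best_v}) < cost(config):
                config[dim] = best_v
                improved = True
    return config

# Toy separable cost model: cheapest at ("small", 8).
def toy_cost(c):
    vm_price = {"small": 1, "large": 2, "highmem": 3}[c["vm_type"]]
    return vm_price + abs(c["cluster_size"] - 8) / 4

best = coordinate_descent(toy_cost, SPACE, {"vm_type": "highmem", "cluster_size": 32})
```

Because the toy cost is separable, coordinate descent reaches the global optimum here; on real cloud workloads the dimensions interact, which is one reason it can get stuck in local minima.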
SLIDE 7

CherryPick

SLIDE 8

CherryPick

  • Uses Bayesian Optimisation to build a performance model of the application
  • Finds optimal or near-optimal configurations within a small number of test runs
  • Uses an acquisition function (expected improvement) to decide which configuration to sample next
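The loop described above (surrogate model plus acquisition function) can be sketched in a few dozen lines: a Gaussian-process surrogate with an RBF kernel and an expected-improvement acquisition over a discrete set of candidate configurations. The kernel, lengthscale, sampling budget, and toy cost curve are all illustrative assumptions, not CherryPick's actual model:

```python
import math
import numpy as np

def rbf(a, b, ls=0.2):
    """Squared-exponential kernel between two 1-D point sets."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(x_obs, y_obs, x_all, noise=1e-6):
    """GP posterior mean/std at x_all, centring observations on their mean."""
    y_mean = y_obs.mean()
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = rbf(x_obs, x_all)
    alpha = np.linalg.solve(K, y_obs - y_mean)
    mu = Ks.T @ alpha + y_mean
    v = np.linalg.solve(K, Ks)
    var = np.clip(1.0 - np.sum(Ks * v, axis=0), 1e-12, None)
    return mu, np.sqrt(var)

def expected_improvement(mu, sigma, best_y):
    """EI for minimisation: expected amount by which we beat best_y."""
    z = (best_y - mu) / sigma
    Phi = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))  # normal CDF
    phi = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)          # normal PDF
    return (best_y - mu) * Phi + sigma * phi

candidates = np.linspace(0.0, 1.0, 21)          # 21 candidate "configurations"
def true_cost(x):                               # unknown to the optimiser
    return 50.0 * (x - 0.4) ** 2 + 10.0

sampled = [0, 20]                               # start from the two extremes
ys = [true_cost(candidates[i]) for i in sampled]
for _ in range(8):                              # small fixed sampling budget
    mu, sigma = gp_posterior(candidates[sampled], np.array(ys), candidates)
    ei = expected_improvement(mu, sigma, min(ys))
    ei[sampled] = -1.0                          # never re-sample a configuration
    nxt = int(np.argmax(ei))
    sampled.append(nxt)
    ys.append(true_cost(candidates[nxt]))

best_cost = min(ys)
```

With only 10 of the 21 candidates ever run, the search lands at (or very near) the cheapest configuration, which mirrors the slide's point about needing only a few test runs.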
SLIDE 9

CherryPick

Initial vs. modified objective function (formulas shown as figures on the slide)

SLIDE 10

CherryPick Workflow

SLIDE 11

CherryPick Implementation

  • Search Controller
  • Cloud Monitor
  • Bayesian Optimisation Engine
  • Cloud Controller
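One way to picture how the four components above might interact is as a driver loop with three collaborators. This is an illustrative sketch only: every class, method name, and stub behaviour here is invented for the example, not taken from the paper's implementation:

```python
import random

random.seed(0)  # deterministic for the sketch

class CloudController:
    """Launches a cluster with a given configuration and runs the workload."""
    def run_benchmark(self, config):
        # Stub cost model: cheaper near 8 nodes of the cheaper VM type.
        return abs(config["nodes"] - 8) + config["vm_type"]

class CloudMonitor:
    """Checks that a run was representative (e.g. no noisy neighbours)."""
    def run_is_reliable(self, cost):
        return cost >= 0   # stub: accept every run

class BayesOptEngine:
    """Suggests the next configuration and decides when to stop."""
    def __init__(self, space):
        self.space, self.history = space, []
    def record(self, config, cost):
        self.history.append((config, cost))
    def suggest(self):
        return random.choice(self.space)   # stand-in for real Bayesian Optimisation
    def confident_enough(self):
        return len(self.history) >= 6      # stand-in for a confidence-based stop

class SearchController:
    """Drives the search loop, wiring the other components together."""
    def __init__(self):
        space = [{"nodes": n, "vm_type": t} for n in (4, 8, 16) for t in (1, 2)]
        self.engine = BayesOptEngine(space)
        self.cloud = CloudController()
        self.monitor = CloudMonitor()
    def search(self):
        while not self.engine.confident_enough():
            config = self.engine.suggest()
            cost = self.cloud.run_benchmark(config)
            if self.monitor.run_is_reliable(cost):
                self.engine.record(config, cost)
        return min(self.engine.history, key=lambda h: h[1])[0]

best = SearchController().search()
```

The separation mirrors the slide: the controller owns the loop, the engine owns the model and stopping rule, and the cloud-facing pieces isolate benchmarking and reliability checks.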
SLIDE 12

Evaluation

SLIDE 13

Evaluation

  • Applications: TPC-DS, TPC-H, TeraSort, SparkReg, SparkKm
  • 66 cloud configurations
  • Objective: reduce cost of execution under runtime constraint
  • Compared with:
      • Exhaustive search
      • Coordinate descent
      • Random search (with a budget)
      • Ernest
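The evaluation objective above (minimise execution cost subject to a runtime constraint) is often handed to an optimiser as price × time with a penalty for missing the deadline. A hedged sketch of that encoding follows; the numbers and the multiplicative penalty form are illustrative, and the paper's exact formulation differs (it folds the constraint into the Bayesian model instead):

```python
def objective(price_per_hour, runtime_hours, max_runtime_hours, penalty=10.0):
    """Cost of a run, heavily penalised if it misses the runtime constraint.

    A multiplicative penalty is one common encoding for a constrained
    black-box objective; it is an assumption here, not the paper's formula.
    """
    cost = price_per_hour * runtime_hours
    if runtime_hours > max_runtime_hours:
        cost *= penalty
    return cost

# A slow-but-cheap configuration can lose to a fast-but-pricier one
# once the deadline is taken into account.
cheap_slow = objective(price_per_hour=1.0, runtime_hours=5.0, max_runtime_hours=4.0)
fast_fair  = objective(price_per_hour=2.0, runtime_hours=3.0, max_runtime_hours=4.0)
```

Here the deadline-violating configuration costs 50.0 after the penalty while the compliant one costs 6.0, so the optimiser is steered toward feasible configurations.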
SLIDE 14

Evaluation

  • Metric 1: the cost of running the job with the final selected configuration
  • Metric 2: the total cost of running all the configurations sampled during the search
  • 20 independent runs
  • 10th, 50th and 90th percentiles computed
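The reporting scheme above (20 independent runs, summarised by the 10th, 50th, and 90th percentiles) is straightforward to reproduce, for example with NumPy; the per-run costs below are synthetic stand-ins for illustration:

```python
import numpy as np

# Synthetic per-run costs standing in for 20 independent search runs.
costs = np.array([float(c) for c in range(1, 21)])   # 1.0 .. 20.0

# NumPy's default (linear interpolation between order statistics).
p10, p50, p90 = np.percentile(costs, [10, 50, 90])
```

Reporting the 10th/90th spread alongside the median shows how variable the search is across runs, which matters for a randomised method like Bayesian Optimisation.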
SLIDE 15

Evaluation

SLIDE 16

Evaluation

SLIDE 17

Evaluation

  • Investigated parameter tuning
  • Investigated performance behaviour
SLIDE 18

Evaluation

  • Handling workload variation
SLIDE 19

Criticism

SLIDE 20

Criticism/Discussion

“With 4x cost, random search can find similar configurations to CherryPick on the median”

SLIDE 21

Criticism/Discussion

SLIDE 22

Criticism/Discussion

  • Three of the four comparison baselines are easy to beat, so they offer little meaningful comparison
  • Does not use the available information efficiently
SLIDE 23

Recent Work

SLIDE 24

Recent Work

  • PARIS
  • Scout
  • Arrow
  • Micky
SLIDE 25

Conclusions

SLIDE 26

Conclusions

  • Introduced CherryPick
  • Compared to existing systems
  • Presented evaluation results
  • Criticism
SLIDE 27

Questions?