Stratus: Cost-aware container scheduling in the public cloud



SLIDE 1

Stratus

Cost-aware container scheduling in the public cloud

Andrew Chung, Jun Woo Park, Greg Ganger

Parallel Data Laboratory, Carnegie Mellon University

SLIDE 2

Carnegie Mellon

Parallel Data Laboratory

http://www.pdl.cmu.edu/ Andrew Chung, SoCC 2018

Motivation

  • IaaS CSPs provide per-time VM rental of diverse offerings
      • VM types and sizes
      • Contract types (e.g., reliable/on-demand, dynamically-priced/spot, …)
  • Can add/remove VMs from virtual cluster (VC) at any time
  • VMs are paid for by the second while rented
      • Pay for full VM even if only partially used!
  • Management is complex, but scheduling research has not focused on both:
      1. Dynamically-sized clusters
      2. Clusters with wide diversity of instance types, sizes, and contracts


How can we take advantage of diverse offerings and virtual cluster elasticity to lower the cost of executing batch workloads?

SLIDE 4

Public cloud scheduling properties

  • Property 1: Wasted resource-time is wasted money
  • Money-saving key: Minimize resource-time “bubbles”
      1. Resource-cost-awareness: Pick right-sized, cost-efficient VMs
      2. Efficiently using rental time: Keep VMs highly utilized when rented, release VMs if no pending tasks

[Figure: an empty VM with three task slots, shown over time starting from now.]

SLIDE 5

Public cloud sched properties

[Figure: Example where VM resource-time is wasted. Tasks A, B, and C run on one VM, shown over time starting from now. Build annotations: “Looks well-packed here, but…” and “Bubbles: unused VM resources over time”.]
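To make the bubble idea concrete, the sketch below estimates how much of a VM's rented resource-time goes unused. The slot model, the `Task` record, and the example runtimes are assumptions made for illustration; none of them come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    slots: int      # task slots occupied on the VM (hypothetical unit)
    runtime: float  # time units the task runs, starting at t = 0


def wasted_fraction(tasks, vm_slots):
    """Fraction of rented slot-time left unused (the "bubbles").

    Assumes every task starts at t = 0 and the VM is released when the
    last task finishes, so the rental lasts max(runtime).
    """
    if sum(t.slots for t in tasks) > vm_slots:
        raise ValueError("tasks do not fit on the VM")
    rental = max(t.runtime for t in tasks)
    rented = vm_slots * rental
    used = sum(t.slots * t.runtime for t in tasks)
    return (rented - used) / rented


# Fully packed at t = 0 in both cases, but only one stays packed.
misaligned = [Task("A", 1, 8.0), Task("B", 1, 2.0), Task("C", 1, 1.0)]
aligned = [Task("A", 1, 8.0), Task("B", 1, 8.0), Task("C", 1, 8.0)]

print(round(wasted_fraction(misaligned, vm_slots=3), 3))  # 0.542
print(wasted_fraction(aligned, vm_slots=3))               # 0.0
```

Both VMs start out fully packed; only the runtimes differ, which is why packing by size alone is not enough to avoid paying for idle resources.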

SLIDE 9

Public cloud sched properties

  • Property 2: Possible to have no task queue time
      • Replaced by VM spin-up time
      • Allows bounded workload latency

SLIDE 10

Overview and goals

  • Stratus: VC scheduling middleware for public clouds
      • Suited for collections of batch jobs
      • Decides how to size the VC and where to place tasks
  • Goals: Lower the cost of executing batch workloads with minimum makespan impact
      • Cost-efficiency by reducing “resource bubbles”
      • Makespan-minimization by scheduling tasks as they arrive

SLIDE 11

Efficiently using rental time

  • Ideally, all tasks assigned to a VM finish at the same time
      • 0% utilized (new) → 100% utilized → 0% utilized → released
  • Stratus packs tasks on VMs to align task runtimes
      • Does so with a new technique: runtime binning

[Figure: Tasks A, B, and C on a VM, shown over time starting from now. Two builds: “Stratus: aligning task runtimes” and “Bad alignment of task runtimes”, the latter marked with bubbles.]

SLIDE 13

Runtime (RT) binning

  • RT bins: logical bins of disjoint time intervals, sized exponentially
      • [now = 0, 1), [1, 2), [2, 4), [4, 8), [8, 16), and so on
  • Task assigned to bin according to remaining runtime from now
      • Ex: Task A, which runs for 11 more time units, is in the blue bin ([8, 16))
  • VM assigned to bin based on longest remaining task RT
      • Ex: VM with only Task A is assigned to the blue bin → blue border

[Figure: timeline with bin boundaries at Now, 1, 2, 4, 8. Successive builds place Task A, then Task B, then Task C.]
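The bin boundaries above double at each step, so a task's bin can be read off the binary exponent of its remaining runtime. The following is a minimal sketch under that reading; the function names and the choice of integer bin indices are assumptions, not something the slides specify.

```python
import math


def runtime_bin(remaining):
    """Index of the bin that holds `remaining` time units of runtime.

    Bin 0 is [0, 1); bin i >= 1 is [2**(i-1), 2**i), which matches the
    intervals [1, 2), [2, 4), [4, 8), [8, 16), ...
    """
    if remaining < 0:
        raise ValueError("remaining runtime must be non-negative")
    if remaining < 1:
        return 0
    # frexp returns (m, e) with remaining == m * 2**e and 0.5 <= m < 1,
    # so e is exactly the index of the power-of-two interval.
    return math.frexp(remaining)[1]


def bin_interval(index):
    """Half-open interval [lo, hi) covered by a bin index."""
    return (0, 1) if index == 0 else (2 ** (index - 1), 2 ** index)


def vm_bin(task_remaining_runtimes):
    """A VM's bin is the bin of its longest-remaining task."""
    return runtime_bin(max(task_remaining_runtimes))


print(bin_interval(runtime_bin(11)))  # (8, 16): Task A from the example
print(vm_bin([11, 3, 0.5]))           # 4
```

Because bins are defined relative to now, a task drifts into lower bins as it runs, so bin membership has to be recomputed over time rather than stored once.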

SLIDE 18

Packing tasks to VMs

  • Packing preference for task in runtime bin β
      • VM in β > VM in greater RT bins > VM in lesser RT bins
      • Chosen for least impact on extending VM time-to-release
  • Only scale out as a last resort

[Figure: timeline with bin boundaries at Now, 1, 2, 4, 8. Task A is tried against VMs in turn; successive builds mark candidate VMs “Full” until the only remaining option is scale-out.]
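The preference order can be sketched as a ranking over the VMs that still have room. The `VM` record, the slot-based fit check, and the tie-break (trying the nearest bin first within the greater and lesser groups) are assumptions for illustration; the slides only state the three-way ordering and that scale-out comes last.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    bin_index: int   # runtime bin of the VM's longest-remaining task
    free_slots: int  # hypothetical capacity unit


def choose_vm(task_bin, task_slots, vms):
    """Return the preferred VM for a task, or None to signal scale-out."""
    fitting = [vm for vm in vms if vm.free_slots >= task_slots]

    def rank(vm):
        if vm.bin_index == task_bin:
            return (0, 0)                          # same bin: best
        if vm.bin_index > task_bin:
            return (1, vm.bin_index - task_bin)    # greater bins next
        return (2, task_bin - vm.bin_index)        # lesser bins last

    return min(fitting, key=rank, default=None)


vms = [VM("lesser", 2, 1), VM("same", 4, 0), VM("greater", 5, 2)]
choice = choose_vm(task_bin=4, task_slots=1, vms=vms)
print(choice.name)  # greater: the same-bin VM is full

print(choose_vm(task_bin=4, task_slots=3, vms=vms))  # None: scale out
```

Placing a task on a VM in a greater bin does not change when that VM can be released, whereas placing it on a VM in a lesser bin pushes the release time out, which is one way to read the "least impact" bullet.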

SLIDE 24

Resource-cost-awareness

  • Stratus performs dynamic selection of VC composition
      • Acquire new VMs only if tasks don’t fit on any VMs
      • Release VMs as soon as they become empty
  • Recall: diverse offerings and dynamic pricing of VMs
  • Key: Resource-cost-aware scale-out that considers both packing of pending tasks & dynamic rental costs
      • Evaluate packing of combinations of tasks in the same runtime bin onto candidate VMs, based on cost-per-resource-utilized
      • Considering packing and scaling in isolation from one another increases cost

SLIDE 25

Acquiring new VMs

[Figure: pending tasks shown against a timeline with bin boundaries at Now, 1, 2, 4, 8. Successive builds annotate candidate packings with $-per-resource-utilized values, in order: $0.9, $0.5, $2.0, $0.5, $0.9.]
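One way to read the scale-out step is: for each candidate VM type, pack pending tasks from one runtime bin onto it, score the result by price divided by resources actually used, and rent the type with the lowest score. The sketch below follows that reading. The `VMType` record, the greedy largest-first packing, the single-resource model, and all prices are assumptions; the talk does not spell out the exact formula or search procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMType:
    name: str
    capacity: int  # resource units offered (hypothetical unit)
    price: float   # current rental price per time unit (made-up values)


def pack(task_sizes, capacity):
    """Greedy largest-first packing onto one VM; returns units used."""
    used = 0
    for size in sorted(task_sizes, reverse=True):
        if used + size <= capacity:
            used += size
    return used


def best_vm_type(task_sizes, vm_types):
    """VM type with the lowest price per resource unit actually used."""
    best, best_score = None, float("inf")
    for vm in vm_types:
        used = pack(task_sizes, vm.capacity)
        if used == 0:
            continue  # nothing fits on this type
        score = vm.price / used
        if score < best_score:
            best, best_score = vm, score
    return best, best_score


pending = [2, 2, 1]  # sizes of pending tasks in one runtime bin
candidates = [
    VMType("small", 2, 1.0),    # uses 2 units: 0.50 per unit used
    VMType("medium", 4, 1.8),   # uses 4 units: 0.45 per unit used
    VMType("large", 16, 6.0),   # uses 5 units: 1.20 per unit used
]
vm, score = best_vm_type(pending, candidates)
print(vm.name, round(score, 2))  # medium 0.45
```

The large type has the lowest price per unit of capacity here (6.0 / 16), yet it scores worst once utilization is taken into account, which illustrates why packing and pricing are evaluated together.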

SLIDE 36

Runtime mis-estimates

  • Mis-estimates can lead to low resource utilization

[Figure: Example: 4 tasks on 2 instances, on a timeline with bin boundaries at Now, 1, 2, 4, 8. Build 1 labels: Task A (over-est), Task B (correct-est), Task C (under-est), Task D (correct-est). Build 2 labels: Bubble, Task B (correct-est), Task C (under-est), Task D (correct-est). Build 3 labels: Bubble, Task C (new est), Released.]

SLIDE 39

Runtime mis-estimates

  • Mis-estimates can lead to low resource utilization
  • RT binning mitigates mis-estimates to some degree
  • Adjusting mis-estimates
      • Over-estimates: No adjustment necessary (task done)
      • Under-estimates: Assume task has run for half of its runtime
  • Instance-clearing: If a VM experiences low utilization for an extended period of time, migrate tasks and re-distribute
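The under-estimate rule can be read as a doubling heuristic: once a task outlives its estimate, the elapsed time is treated as half of the true runtime. The sketch below encodes that reading; the function names are hypothetical, and treating a task that has run for exactly its estimate as under-estimated is an assumption.

```python
def adjusted_estimate(estimated_runtime, elapsed):
    """Total-runtime estimate for a task that is still running.

    If the task has outlived its estimate (an under-estimate), assume the
    elapsed time is half of the true runtime, so the new estimate is
    2 * elapsed. Otherwise the original estimate stands.
    """
    if elapsed >= estimated_runtime:
        return 2 * elapsed
    return estimated_runtime


def remaining_runtime(estimated_runtime, elapsed):
    """Remaining runtime implied by the (possibly adjusted) estimate."""
    return adjusted_estimate(estimated_runtime, elapsed) - elapsed


print(remaining_runtime(estimated_runtime=10, elapsed=3))  # 7
print(remaining_runtime(estimated_runtime=4, elapsed=6))   # 6
```

Under this reading, an under-estimated task's remaining runtime always equals its elapsed time, so the task moves to higher runtime bins the longer it overruns.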

SLIDE 40

Experimental setup

  • Simulation-based experiments
  • Workloads: Google and TwoSigma cluster traces
      • Focus on batch jobs
      • Filter out jobs running > 1 day
  • EC2 spot market for dynamically-priced markets
      • Same-family VMs for comparable performance

SLIDE 41

Evaluation: Normalized cost

  • Fleet (Spot Fleet + ECS, Amazon offerings)
      • LowestPrice + BinPack policy
  • Stratus
      • 17% (Google) and 22% (TwoSigma) cost reduction

[Figure: bar chart of normalized cost (axis ticks 0.25, 0.5, 0.75, 1) for Fleet and Stratus on the Google and TwoSigma traces.]

Note: Comparisons vs. other state-of-the-art schedulers left out for brevity. Stratus lowers cost vs. each compared scheduler by at least 17%.

SLIDE 43

Summary

  • Packing/scaling heuristics based on runtime binning
      • Allows for high utilization of resources during rental period
  • Scale VC by simultaneous consideration of possible packings and available instance types and prices
      • Independent consideration of packing/scaling leads to higher cost
  • ~17% cost reduction on Google and TwoSigma traces compared to next-best evaluated scheduler
      • Attains high resource utilization