DC-DRF: Adaptive Multi-Resource Sharing at Public Cloud Scale (PowerPoint PPT Presentation)


SLIDE 1

DC-DRF: Adaptive Multi-Resource Sharing at Public Cloud Scale


ACM Symposium on Cloud Computing 2018
Ian A. Kash, Greg O’Shea, Stavros Volos

SLIDE 2

Public Cloud DC hosting enterprise customers

  • O(100K) servers, mostly small tenants

SLIDE 3

Small customer: one VM accessing storage

[Figure: one VM in a compute rack reads/writes an SSD in a storage rack; the path crosses NIC TX/RX queues (TX1/RX1, TXb/RXb) and top-of-rack switches (VTOR1/VTOR2 on the compute side, VTORa/VTORb on the storage side)]

SLIDE 4

Small customer: one VM accessing storage

One VM in a compute server in a compute rack

[Same figure as Slide 3]

SLIDE 5

Small customer: one VM accessing storage

One VM in a compute server in a compute rack
One VHD in a storage server in a storage rack

[Same figure as Slide 3]

SLIDE 14

Result: a multi-resource “demand vector”

[Same figure as Slide 3]

SLIDE 16

Encodes resource id and proportions

[Same figure as Slide 3]

Any element could be a bottleneck to performance
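To make the encoding concrete, a minimal sketch of one tenant's demand vector as a sparse map from resource id to demanded fraction (the resource names follow the figure; the representation and numbers are illustrative, not from the deck):

    # A tenant's demand vector: a sparse map from resource id to the
    # fraction of that resource the tenant demands (numbers illustrative).
    demand_vector = {
        "TX1":   0.20,  # compute-side NIC transmit queue
        "VTOR1": 0.10,  # compute-rack top-of-rack switch
        "VTORa": 0.10,  # storage-rack top-of-rack switch
        "RXb":   0.20,  # storage-side NIC receive queue
        "SSDb":  0.55,  # the SSD holding the tenant's VHD
    }
    # Any of these entries could become the tenant's bottleneck.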

SLIDE 17

  • Demand vectors form a sparse demand matrix

[Figure: demand matrix with tenant rows n0-n9 and resource columns r0-r9]

SLIDE 18

  • Columns are shared physical resources

[Same matrix figure]

SLIDE 19

  • Rows are tenants’ demand vectors

[Same matrix figure]

SLIDE 20

  • Entries shown as fractions of a resource (e.g. 1.0, .92)

[Same matrix figure]

SLIDE 21

Large and very sparse matrix

[Same matrix figure, fully populated with fractional entries]

DC matrix: 100K by 100K; rows mostly empty

SLIDE 22

Provider has a multi-resource allocation problem

  • Goal: maintain an acceptable service level for all tenants
  • Acceptable means the tenant is always “willing to pay”
  • Avoid abrupt performance collapse for any tenant
  • Assuming aggressive (noisy) neighbors and oversubscription
  • DC-DRF builds on existing multi-resource algorithms
  • DRF [Ghodsi et al., NSDI ’11]
  • EDRF [Parkes et al., EC ’12]
  • Challenging at DC scale: EDRF iterates, and is slow at this scale

SLIDE 23

Systems aspects

SLIDE 24

Systems challenges

  • How to capture multi-resource demand vectors?
  • How to enforce multi-resource allocations?
  • DRF implies central SDN-like controller – good or bad?
  • Good: Simpler algorithm and global view
  • Bad: EDRF at Public Cloud DC scale

SLIDE 25

SIGCOMM 2015 demonstration

SLIDE 26

SIGCOMM 2015 demonstration

Central controller running EDRF
Pass 1: reservation-based SLAs
Pass 2: work conservation of the residual

SLIDE 27

SIGCOMM 2015 demonstration

4 tenants, 30 VMs each
Spread over 10 servers
R/W to 2x storage servers
40 Gb RDMA switch

SLIDE 28

SIGCOMM 2015 demonstration

Demand estimation and enforcement in Hyper-V

SLIDE 29

SIGCOMM 2015 demonstration

Aggressive red tenant

  • Performance collapses for blue, yellow, green

SLIDE 30

[video]

SLIDE 31

SIGCOMM 2015 demonstration

What did we learn from the prototype? Potentially very powerful, but the EDRF algorithm did not scale well.

SLIDE 32

The algorithms

  • To understand DC-DRF, first understand EDRF
  • To understand DRF, first understand max-min

SLIDE 33

Max-min fairness: mice before elephants

  • Maximize the minimum allocation across competing tenants
  • Allocate fractions of a single shared resource based on demand
  • No tenant gets a larger fraction than its demand
  • Tenants with unsatisfiable demand obtain equal share

Example: tenants A, B, C, D demand 0.1, 0.2, 0.5, 0.6 of a single shared resource (capacity 1.0).

Round 1: residual resource = 1.0, tenants remaining = 4, current share x_t = 1.0/4 = 0.25. A (0.1) and B (0.2) demand less than x_t, so they are allocated in full.

Round 2: residual resource = 1.0 - 0.1 - 0.2 = 0.7, tenants remaining = 2, x_t = 0.7/2 = 0.35. C and D each receive 0.35.

Final allocation: A = 0.1, B = 0.2, C = 0.35, D = 0.35.
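The same computation as a minimal runnable sketch (progressive filling; the function name and dict representation are mine, not the deck's):

    # Water-filling (max-min) for a single shared resource.
    # demands: tenant -> demanded fraction of the resource.
    def max_min(demands, capacity=1.0):
        alloc, remaining = {}, dict(demands)
        while remaining:
            share = capacity / len(remaining)       # equal split of the residual
            satisfied = {t: d for t, d in remaining.items() if d <= share}
            if not satisfied:                       # everyone wants more than the share
                for t in remaining:
                    alloc[t] = share
                return alloc
            for t, d in satisfied.items():          # small demands are met in full
                alloc[t] = d
                capacity -= d
                del remaining[t]
        return alloc

    # max_min({"A": 0.1, "B": 0.2, "C": 0.5, "D": 0.6})
    # -> {"A": 0.1, "B": 0.2, "C": 0.35, "D": 0.35}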

SLIDE 34

How to handle multiple resources?

[Same matrix figure as Slide 21]

SLIDE 35

Dominant Resource Fairness (DRF)

  • For each tenant, identify its Dominant Resource
  • The resource of which it demands the largest fraction
  • Apply max-min fairness across dominant shares
  • Maximize the smallest dominant share in the system
  • Then the second smallest, and so on…
  • Think: find the smallest mouse across all columns (see the sketch below)
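A small sketch of the dominant-resource step under these definitions (helper names are mine):

    # demand: resource id -> fraction of that resource the tenant wants
    def dominant_share(demand):
        # The dominant resource is the one with the largest demanded fraction.
        return max(demand.values())

    def normalize_by_dominant(demand):
        # Rescale so the dominant resource's entry becomes 1.0, as in the
        # "normalized by Dominant Resource" matrix on the next slide.
        d = dominant_share(demand)
        return {r: v / d for r, v in demand.items()}

    # normalize_by_dominant({"r2": 0.3, "r5": 0.6}) -> {"r2": 0.5, "r5": 1.0}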

SLIDE 36

Demand vectors normalized by Dominant Resource

[Same matrix figure]

SLIDE 37

Maximize (max-min) the smallest dominant share

x_tr = .37 .30 .246 .63 .33 .40 .35 .45 .35 .33 (one value per resource r0-r9)

[Same matrix figure]

SLIDE 38

Find the residual resource with the smallest x_tr

[Same matrix figure, with the x_tr row as on Slide 37]

SLIDE 39

Use x_r8 to allocate at every resource

[Same matrix figure]

SLIDE 40

Eliminate r8 if its residual capacity hits zero

[Same matrix figure]

SLIDE 41

And eliminate tenants demanding r8

[Same matrix figure]

SLIDE 42

Next round: find the new smallest x_tr, and so on…

[Same matrix figure]

SLIDE 43

Result: allocation matrix

[Figure: the resulting allocation matrix over tenants n0-n9 and resources r0-r9]

SLIDE 44

Result: allocation matrix

[Same allocation matrix figure]

Issue: EDRF is iterative. Sparsity implies slow elimination of tenants.
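To make the round structure concrete, a compressed sketch of one EDRF-style round (uniform weights, no approximation; the data layout and function name are mine, not the paper's):

    # demands: tenant -> {resource: demand normalized by dominant share}
    # residual: resource -> remaining capacity; alloc: tenant -> {resource: amount}
    def edrf_round(demands, residual, alloc, eps=1e-9):
        # For each resource, the equal increment its residual can still support.
        incr = {}
        for r, cap in residual.items():
            total = sum(d.get(r, 0.0) for d in demands.values())
            if total > 0.0:
                incr[r] = cap / total
        x = min(incr.values())  # increment set by the bottleneck resource
        for t, d in demands.items():
            for r, v in d.items():
                alloc[t][r] = alloc[t].get(r, 0.0) + x * v
                residual[r] -= x * v
        # Eliminate saturated resources, then every tenant demanding one of them.
        saturated = {r for r in incr if residual[r] <= eps}
        demands = {t: d for t, d in demands.items() if not (saturated & d.keys())}
        return demands, saturated

Iterating edrf_round until no tenants remain reproduces the rounds of Slides 37-42; with a sparse matrix, each round eliminates few tenants, which is exactly the scaling problem DC-DRF attacks.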

SLIDE 45

DC-DRF algorithm

SLIDE 46

Goal

  • Monitor and adjust shares at 10-30 second intervals
  • Matches resource-demand variation in datacentre traces [Angel et al., OSDI ’14]
  • Using demands plausibly realistic of a Public Cloud DC

SLIDE 47

DC-DRF: two tactics to improve scalability

  • 1. Algorithmic: extending EDRF
  • Operate to a time deadline chosen by the operator (the “control interval”)
  • Variable degree of approximation: trading resource utilization for time
  • 2. HPC: maximize the rate of computation
  • Parallelize where possible
  • Optimize for thread and NUMA locality
  • SIMD vector instructions

SLIDE 48

Algorithm: inner and outer loops

    OuterLoop(time t)                      // runs once per control interval
        Initialize demand matrix for this interval
        Set approximation control variable ε ∈ [0, 1]
        timeOut = InnerLoop()              // true if elapsed time exceeds t
        if timeOut then increase(ε) else decrease(ε)

    // Eliminate a resource when it is (1 - ε) full, e.g. ε = 0.01 eliminates at 99%.
    // Resources and tenants are then eliminated earlier and in fewer rounds.
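A runnable sketch of this loop (the multiplicative adjustment rule and all names are illustrative; the slide specifies only "increase ε on timeout, decrease otherwise"):

    # inner_loop(demands, eps, deadline_s) -> True if it hit the deadline.
    def outer_loop(deadline_s, ingest_demands, inner_loop):
        eps = 0.0                          # search for epsilon starts at zero
        while True:                        # one iteration per control interval
            demands = ingest_demands()     # latest observed demand matrix
            timed_out = inner_loop(demands, eps, deadline_s)
            if timed_out:                  # missed the deadline: approximate more
                eps = min(1.0, max(eps, 1e-3) * 2.0)
            else:                          # finished with slack: approximate less
                eps = max(0.0, eps / 2.0)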

SLIDE 49

Tactic #2: HPC

  • Goal: minimize the value of ε required to meet the deadline
  • Minimizes error due to approximation and maximizes utilization
  • Do this by extracting as much performance as we can from the platform

SLIDE 50

Parallelism: resource tiles over the large sparse matrix

[Same matrix figure, partitioned into column (resource) tiles]

SLIDE 51

Alternating with tenant tiles

[Same matrix figure, partitioned into row (tenant) tiles]

SLIDE 52

Alternating with tenant tiles

[Same matrix figure]

Carefully cache-aligned memory and bespoke memory barriers make this lock-free.
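A toy sketch of the alternating tiles (the executor and tile size are illustrative; the implementation described above is lock-free via cache-aligned layout and bespoke barriers, not a thread pool):

    from concurrent.futures import ThreadPoolExecutor

    def for_each_tile(n, tile, step, pool):
        # Disjoint index ranges: each worker owns its tile, so no locks needed.
        ranges = [(i, min(i + tile, n)) for i in range(0, n, tile)]
        list(pool.map(lambda r: step(*r), ranges))

    def one_inner_round(n_resources, n_tenants, resource_step, tenant_step):
        with ThreadPoolExecutor() as pool:
            # Phase 1: column (resource) tiles, e.g. computing per-resource x_tr.
            for_each_tile(n_resources, 1024, resource_step, pool)
            # Phase 2: row (tenant) tiles, e.g. applying this round's allocations.
            for_each_tile(n_tenants, 1024, tenant_step, pool)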

SLIDE 53

Single socket parallelisation

SLIDE 54

NUMA-aware aggregation and memory allocation

SLIDE 55

SIMD: AVX-512 vector instruction set

The deck shows precision-generic “_pr” intrinsics; below they are rendered in the single-precision (“_ps”) forms, with the deck's step-by-step captions as comments:

    // Identify 16 values in the 100K array: load their 32-bit indices.
    __m512i vindex_512 = _mm512_load_si512(ptr);
    // Pull them all into one 512-bit register (gather; scale = sizeof(float)).
    __m512 mu_tr = _mm512_i32gather_ps(vindex_512, pScratchR, 4);
    // Perform arithmetic on them all at once.
    mu_tr = _mm512_add_ps(mu_tr, A_irt);
    // Scatter them back into the 100K array, under write mask m.
    _mm512_mask_i32scatter_ps(pScratchR, m, vindex_512, mu_tr, 4);

SLIDE 60

Evaluation

SLIDE 61

Approach

  • Method: synthetic demands based on Azure traces [Cortez et al., SOSP ’17]
  • Synthetic demand for 100K resources x 1M tenants
  • Demand vector sizes in [2, 128] from a truncated Gaussian (most tenants small; see the sketch below)
  • Deadline for DC-DRF: 8 seconds
  • Compare to a baseline: single-threaded EDRF with unbounded time
  • Show overall results and a breakdown:
  • DC-DRF: both approximation and HPC
  • sDC-DRF: approximation only
  • pEDRF: HPC only, finish at deadline
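A sketch of drawing such a workload (the mean and standard deviation are illustrative; the deck specifies only the [2, 128] range and the truncated-Gaussian shape):

    import numpy as np

    rng = np.random.default_rng(0)

    def demand_vector_sizes(n_tenants, lo=2, hi=128, mean=8.0, sd=16.0):
        # Truncated Gaussian by resampling: redraw anything outside [lo, hi].
        sizes = np.empty(n_tenants, dtype=np.int64)
        filled = 0
        while filled < n_tenants:
            draw = rng.normal(mean, sd, n_tenants - filled)
            keep = draw[(draw >= lo) & (draw <= hi)]
            sizes[filled:filled + keep.size] = keep.astype(np.int64)
            filled += keep.size
        return sizes  # skewed low: most tenants demand only a few resources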

SLIDE 62

Utilization relative to baseline

[Charts: utilization of DC-DRF, sDC-DRF, and pEDRF relative to the unbounded-time EDRF baseline]

~10% of resources wasted

SLIDE 66

Outer loop: adapting epsilon

[Chart: epsilon over successive control intervals]

  • 8-second deadline
  • Search for epsilon starts at zero
  • Inner loop timed out: outer loop increases epsilon
  • Completed short of deadline: outer loop decreases epsilon

SLIDE 74

Summary

                EDRF       DC-DRF    util. drop vs. baseline
    1M x 100K:  15 mins    8 secs    0.0065%
    1M x 1M:    129 mins   8 secs    0.06%

SLIDE 76

Conclusion

DC-DRF enables multi-resource allocation to be calculated at Public Cloud scale in bounded time.

SLIDE 77

Thank you

SLIDE 78

Backup: video from the demo at SIGCOMM ’15

  • 4 tenants with 3 VMs each on 10 compute servers
  • Accessing 2x RAMD storage servers over RDMA
  • Demand estimation and vector rate limiters in Hyper-V drivers
  • Central controller using the EDRF algorithm in two passes:
  • Per-tenant aggregate reservation and intra-tenant work conservation
  • Inter-tenant work conservation

SLIDE 79

Specification

    Outer loop: find ε to meet the deadline
    while true do
        // ingest latest observed demands…
        Inner loop: ε-approximation of EDRF
        while … do
            …

  • Iterate until done or timeout
  • Smallest x_tr for this round
  • ε trades utilization for speed
  • Adjust ε to meet the deadline

SLIDE 84

DRF fairness properties

  • Formal fairness properties:
  • Sharing incentive: no tenant would prefer a simple resource partitioning
  • Strategy-proof: no benefit from falsified demands
  • Envy-free: no tenant would prefer another tenant’s allocation
  • Pareto-efficient: increasing one tenant’s allocation must decrease another’s

SLIDE 85

epsilon

SLIDE 86

epsilon

Choice of the 8-second deadline: find a point to the right of the knee

SLIDE 87

What worked in our prototypes

  • Distribute enforcement mechanisms into edge hypervisors
  • Classification, demand estimation, rate limiters
  • Central SDN-like controller calculating shares
  • Simpler algorithm: easier to build confidence
  • Complete information beats partial views (think: B4 and SWAN)
  • For detail see:
  • IoFlow: single-resource max-min [Thereska et al., SOSP ’13]
  • Pulsar: multi-resource EDRF [Angel et al., OSDI ’14]
  • Filo: distributed EDRF [Marandi et al., USENIX ATC ’16]
