DC-DRF: Adaptive Multi-Resource Sharing at Public Cloud Scale (PowerPoint PPT Presentation)


SLIDE 1

DC-DRF: Adaptive Multi-Resource Sharing at Public Cloud Scale


ACM Symposium on Cloud Computing 2018
Ian A. Kash, Greg O’Shea, Stavros Volos

SLIDE 2

Public Cloud DC hosting enterprise customers

  • O(100K) servers, mostly small tenants

SLIDE 3

Small customer: one VM accessing storage

[Figure: one VM in a compute rack reads/writes an SSD in a storage rack; the path crosses NIC TX/RX queues (TX1/RX1, TXb/RXb) and top-of-rack switches (VTOR1/VTOR2 on the compute side, VTORa/VTORb on the storage side)]

SLIDE 4

Small customer: one VM accessing storage

One VM in a compute server in a compute rack

[Same figure as Slide 3]

SLIDE 5

Small customer: one VM accessing storage

One VM in a compute server in a compute rack
One VHD in a storage server in a storage rack

[Same figure as Slide 3]

SLIDE 14

Result: a multi-resource “demand vector”

[Same figure as Slide 3]

SLIDE 16

Encodes resource id and proportions

[Same figure as Slide 3]

Any element could be a bottleneck to performance
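To make the encoding concrete, a minimal sketch of one tenant's demand vector as a sparse map from resource id to demanded fraction (the resource names follow the figure; the representation and numbers are illustrative, not from the deck):

    # A tenant's demand vector: a sparse map from resource id to the
    # fraction of that resource the tenant demands (numbers illustrative).
    demand_vector = {
        "TX1":   0.20,  # compute-side NIC transmit queue
        "VTOR1": 0.10,  # compute-rack top-of-rack switch
        "VTORa": 0.10,  # storage-rack top-of-rack switch
        "RXb":   0.20,  # storage-side NIC receive queue
        "SSDb":  0.55,  # the SSD holding the tenant's VHD
    }
    # Any of these entries could become the tenant's bottleneck.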

SLIDE 17

  • Demand vectors form a sparse demand matrix

[Figure: demand matrix with tenant rows n0-n9 and resource columns r0-r9]

SLIDE 18

  • Columns are shared physical resources

[Same matrix figure]

SLIDE 19

  • Rows are tenants’ demand vectors

[Same matrix figure]

SLIDE 20

  • Entries shown as fractions of a resource (e.g. 1.0, .92)

[Same matrix figure]

SLIDE 21

Large and very sparse matrix

[Same matrix figure, fully populated with fractional entries]

DC matrix: 100K by 100K; rows mostly empty

SLIDE 22

Provider has a multi-resource allocation problem

  • Goal: maintain an acceptable service level for all tenants
  • Acceptable means the tenant is always “willing to pay”
  • Avoid abrupt performance collapse for any tenant
  • Assuming aggressive (noisy) neighbors and oversubscription
  • DC-DRF builds on existing multi-resource algorithms
  • DRF [Ghodsi et al., NSDI ’11]
  • EDRF [Parkes et al., EC ’12]
  • Challenging at DC scale: EDRF iterates, and is slow at this scale

SLIDE 23

Systems aspects

SLIDE 24

Systems challenges

  • How to capture multi-resource demand vectors?
  • How to enforce multi-resource allocations?
  • DRF implies central SDN-like controller – good or bad?
  • Good: Simpler algorithm and global view
  • Bad: EDRF at Public Cloud DC scale

SLIDE 25

SIGCOMM 2015 demonstration

SLIDE 26

SIGCOMM 2015 demonstration

Central controller running EDRF
Pass 1: reservation-based SLAs
Pass 2: work conservation of the residual

SLIDE 27

SIGCOMM 2015 demonstration

4 tenants, 30 VMs each
Spread over 10 servers
R/W to 2x storage servers
40 Gb RDMA switch

SLIDE 28

SIGCOMM 2015 demonstration

Demand estimation and enforcement in Hyper-V

SLIDE 29

SIGCOMM 2015 demonstration

Aggressive red tenant

  • Performance collapses for blue, yellow, green

SLIDE 30

[video]

SLIDE 31

SIGCOMM 2015 demonstration

What did we learn from the prototype? Potentially very powerful, but the EDRF algorithm did not scale well.

SLIDE 32

The algorithms

  • To understand DC-DRF, first understand EDRF
  • To understand DRF, first understand max-min

SLIDE 33

Max-min fairness: mice before elephants

  • Maximize the minimum allocation across competing tenants
  • Allocate fractions of a single shared resource based on demand
  • No tenant gets a larger fraction than its demand
  • Tenants with unsatisfiable demand obtain equal share

Example: tenants A, B, C, D demand 0.1, 0.2, 0.5, 0.6 of a single shared resource (capacity 1.0).

Round 1: residual resource = 1.0, tenants remaining = 4, current share x_t = 1.0/4 = 0.25. A (0.1) and B (0.2) demand less than x_t, so they are allocated in full.

Round 2: residual resource = 1.0 - 0.1 - 0.2 = 0.7, tenants remaining = 2, x_t = 0.7/2 = 0.35. C and D each receive 0.35.

Final allocation: A = 0.1, B = 0.2, C = 0.35, D = 0.35.
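The same computation as a minimal runnable sketch (progressive filling; the function name and dict representation are mine, not the deck's):

    # Water-filling (max-min) for a single shared resource.
    # demands: tenant -> demanded fraction of the resource.
    def max_min(demands, capacity=1.0):
        alloc, remaining = {}, dict(demands)
        while remaining:
            share = capacity / len(remaining)       # equal split of the residual
            satisfied = {t: d for t, d in remaining.items() if d <= share}
            if not satisfied:                       # everyone wants more than the share
                for t in remaining:
                    alloc[t] = share
                return alloc
            for t, d in satisfied.items():          # small demands are met in full
                alloc[t] = d
                capacity -= d
                del remaining[t]
        return alloc

    # max_min({"A": 0.1, "B": 0.2, "C": 0.5, "D": 0.6})
    # -> {"A": 0.1, "B": 0.2, "C": 0.35, "D": 0.35}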

SLIDE 34

How to handle multiple resources?

[Same matrix figure as Slide 21]

SLIDE 35

Dominant Resource Fairness (DRF)

  • For each tenant, identify its Dominant Resource
  • The resource of which it demands the largest fraction
  • Apply max-min fairness across dominant shares
  • Maximize the smallest dominant share in the system
  • Then the second smallest, and so on…
  • Think: find the smallest mouse across all columns (see the sketch below)
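A small sketch of the dominant-resource step under these definitions (helper names are mine):

    # demand: resource id -> fraction of that resource the tenant wants
    def dominant_share(demand):
        # The dominant resource is the one with the largest demanded fraction.
        return max(demand.values())

    def normalize_by_dominant(demand):
        # Rescale so the dominant resource's entry becomes 1.0, as in the
        # "normalized by Dominant Resource" matrix on the next slide.
        d = dominant_share(demand)
        return {r: v / d for r, v in demand.items()}

    # normalize_by_dominant({"r2": 0.3, "r5": 0.6}) -> {"r2": 0.5, "r5": 1.0}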

SLIDE 36

Demand vectors normalized by Dominant Resource

[Same matrix figure]

SLIDE 37

Maximize (max-min) the smallest dominant share

x_tr = .37 .30 .246 .63 .33 .40 .35 .45 .35 .33 (one value per resource r0-r9)

[Same matrix figure]

SLIDE 38

Find the residual resource with the smallest x_tr

[Same matrix figure, with the x_tr row as on Slide 37]

SLIDE 39

Use x_r8 to allocate at every resource

[Same matrix figure]

SLIDE 40

Eliminate r8 if its residual capacity hits zero

[Same matrix figure]

SLIDE 41

And eliminate tenants demanding r8

[Same matrix figure]

SLIDE 42

Next round: find the new smallest x_tr, and so on…

[Same matrix figure]

SLIDE 43

Result: allocation matrix

[Figure: the resulting allocation matrix over tenants n0-n9 and resources r0-r9]

SLIDE 44

Result: allocation matrix

[Same allocation matrix figure]

Issue: EDRF is iterative. Sparsity implies slow elimination of tenants.
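To make the round structure concrete, a compressed sketch of one EDRF-style round (uniform weights, no approximation; the data layout and function name are mine, not the paper's):

    # demands: tenant -> {resource: demand normalized by dominant share}
    # residual: resource -> remaining capacity; alloc: tenant -> {resource: amount}
    def edrf_round(demands, residual, alloc, eps=1e-9):
        # For each resource, the equal increment its residual can still support.
        incr = {}
        for r, cap in residual.items():
            total = sum(d.get(r, 0.0) for d in demands.values())
            if total > 0.0:
                incr[r] = cap / total
        x = min(incr.values())  # increment set by the bottleneck resource
        for t, d in demands.items():
            for r, v in d.items():
                alloc[t][r] = alloc[t].get(r, 0.0) + x * v
                residual[r] -= x * v
        # Eliminate saturated resources, then every tenant demanding one of them.
        saturated = {r for r in incr if residual[r] <= eps}
        demands = {t: d for t, d in demands.items() if not (saturated & d.keys())}
        return demands, saturated

Iterating edrf_round until no tenants remain reproduces the rounds of Slides 37-42; with a sparse matrix, each round eliminates few tenants, which is exactly the scaling problem DC-DRF attacks.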

SLIDE 45

DC-DRF algorithm

SLIDE 46

Goal

  • Monitor and adjust shares at 10-30 second intervals
  • Matches resource-demand variation in datacentre traces [Angel et al., OSDI ’14]
  • Using demands plausibly realistic of a Public Cloud DC

SLIDE 47

DC-DRF: two tactics to improve scalability

  • 1. Algorithmic: extending EDRF
  • Operate to a time deadline chosen by the operator (the “control interval”)
  • Variable degree of approximation: trading resource utilization for time
  • 2. HPC: maximize the rate of computation
  • Parallelize where possible
  • Optimize for thread and NUMA locality
  • SIMD vector instructions

SLIDE 48

Algorithm: inner and outer loops

    OuterLoop(time t)                      // runs once per control interval
        Initialize demand matrix for this interval
        Set approximation control variable ε ∈ [0, 1]
        timeOut = InnerLoop()              // true if elapsed time exceeds t
        if timeOut then increase(ε) else decrease(ε)

    // Eliminate a resource when it is (1 - ε) full, e.g. ε = 0.01 eliminates at 99%.
    // Resources and tenants are then eliminated earlier and in fewer rounds.
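A runnable sketch of this loop (the multiplicative adjustment rule and all names are illustrative; the slide specifies only "increase ε on timeout, decrease otherwise"):

    # inner_loop(demands, eps, deadline_s) -> True if it hit the deadline.
    def outer_loop(deadline_s, ingest_demands, inner_loop):
        eps = 0.0                          # search for epsilon starts at zero
        while True:                        # one iteration per control interval
            demands = ingest_demands()     # latest observed demand matrix
            timed_out = inner_loop(demands, eps, deadline_s)
            if timed_out:                  # missed the deadline: approximate more
                eps = min(1.0, max(eps, 1e-3) * 2.0)
            else:                          # finished with slack: approximate less
                eps = max(0.0, eps / 2.0)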

SLIDE 49

Tactic #2: HPC

  • Goal: minimize the value of ε required to meet the deadline
  • Minimizes error due to approximation and maximizes utilization
  • Do this by extracting as much performance as we can from the platform

SLIDE 50

Parallelism: resource tiles over the large sparse matrix

[Same matrix figure, partitioned into column (resource) tiles]

SLIDE 51

Alternating with tenant tiles

[Same matrix figure, partitioned into row (tenant) tiles]

SLIDE 52

Alternating with tenant tiles

[Same matrix figure]

Carefully cache-aligned memory and bespoke memory barriers make this lock-free.
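A toy sketch of the alternating tiles (the executor and tile size are illustrative; the implementation described above is lock-free via cache-aligned layout and bespoke barriers, not a thread pool):

    from concurrent.futures import ThreadPoolExecutor

    def for_each_tile(n, tile, step, pool):
        # Disjoint index ranges: each worker owns its tile, so no locks needed.
        ranges = [(i, min(i + tile, n)) for i in range(0, n, tile)]
        list(pool.map(lambda r: step(*r), ranges))

    def one_inner_round(n_resources, n_tenants, resource_step, tenant_step):
        with ThreadPoolExecutor() as pool:
            # Phase 1: column (resource) tiles, e.g. computing per-resource x_tr.
            for_each_tile(n_resources, 1024, resource_step, pool)
            # Phase 2: row (tenant) tiles, e.g. applying this round's allocations.
            for_each_tile(n_tenants, 1024, tenant_step, pool)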

SLIDE 53

Single socket parallelisation

SLIDE 54

NUMA-aware aggregation and memory allocation

SLIDE 55

SIMD: AVX-512 vector instruction set

The deck shows precision-generic “_pr” intrinsics; below they are rendered in the single-precision (“_ps”) forms, with the deck's step-by-step captions as comments:

    // Identify 16 values in the 100K array: load their 32-bit indices.
    __m512i vindex_512 = _mm512_load_si512(ptr);
    // Pull them all into one 512-bit register (gather; scale = sizeof(float)).
    __m512 mu_tr = _mm512_i32gather_ps(vindex_512, pScratchR, 4);
    // Perform arithmetic on them all at once.
    mu_tr = _mm512_add_ps(mu_tr, A_irt);
    // Scatter them back into the 100K array, under write mask m.
    _mm512_mask_i32scatter_ps(pScratchR, m, vindex_512, mu_tr, 4);

SLIDE 60

Evaluation

SLIDE 61

Approach

  • Method: synthetic demands based on Azure traces [Cortez et al., SOSP ’17]
  • Synthetic demand for 100K resources x 1M tenants
  • Demand vector sizes in [2, 128] from a truncated Gaussian (most tenants small; see the sketch below)
  • Deadline for DC-DRF: 8 seconds
  • Compare to a baseline: single-threaded EDRF with unbounded time
  • Show overall results and a breakdown:
  • DC-DRF: both approximation and HPC
  • sDC-DRF: approximation only
  • pEDRF: HPC only, finish at deadline
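A sketch of drawing such a workload (the mean and standard deviation are illustrative; the deck specifies only the [2, 128] range and the truncated-Gaussian shape):

    import numpy as np

    rng = np.random.default_rng(0)

    def demand_vector_sizes(n_tenants, lo=2, hi=128, mean=8.0, sd=16.0):
        # Truncated Gaussian by resampling: redraw anything outside [lo, hi].
        sizes = np.empty(n_tenants, dtype=np.int64)
        filled = 0
        while filled < n_tenants:
            draw = rng.normal(mean, sd, n_tenants - filled)
            keep = draw[(draw >= lo) & (draw <= hi)]
            sizes[filled:filled + keep.size] = keep.astype(np.int64)
            filled += keep.size
        return sizes  # skewed low: most tenants demand only a few resources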

SLIDE 62

Utilization relative to baseline

[Charts: utilization of DC-DRF, sDC-DRF, and pEDRF relative to the unbounded-time EDRF baseline]

~10% of resources wasted

SLIDE 66

Outer loop: adapting epsilon

[Chart: epsilon over successive control intervals]

  • 8-second deadline
  • Search for epsilon starts at zero
  • Inner loop timed out: outer loop increases epsilon
  • Completed short of deadline: outer loop decreases epsilon

SLIDE 74

Summary

                EDRF       DC-DRF    util. drop vs. baseline
    1M x 100K:  15 mins    8 secs    0.0065%
    1M x 1M:    129 mins   8 secs    0.06%

SLIDE 76

Conclusion

DC-DRF enables multi-resource allocation to be calculated at Public Cloud scale in bounded time.

SLIDE 77

Thank you

SLIDE 78

Backup: video from the demo at SIGCOMM ’15

  • 4 tenants with 3 VMs each on 10 compute servers
  • Accessing 2x RAMD storage servers over RDMA
  • Demand estimation and vector rate limiters in Hyper-V drivers
  • Central controller using the EDRF algorithm in two passes:
  • Per-tenant aggregate reservation and intra-tenant work conservation
  • Inter-tenant work conservation

SLIDE 79

Specification

    Outer loop: find ε to meet the deadline
    while true do
        // ingest latest observed demands…
        Inner loop: ε-approximation of EDRF
        while … do
            …

  • Iterate until done or timeout
  • Smallest x_tr for this round
  • ε trades utilization for speed
  • Adjust ε to meet the deadline

SLIDE 84

DRF fairness properties

  • Formal fairness properties:
  • Sharing incentive: no tenant would prefer a simple resource partitioning
  • Strategy-proof: no benefit from falsified demands
  • Envy-free: no tenant would prefer another tenant’s allocation
  • Pareto-efficient: increasing one tenant’s allocation must decrease another’s

SLIDE 85

epsilon

SLIDE 86

epsilon

Choice of the 8-second deadline: find a point to the right of the knee

SLIDE 87

What worked in our prototypes

  • Distribute enforcement mechanisms into edge hypervisors
  • Classification, demand estimation, rate limiters
  • Central SDN-like controller calculating shares
  • Simpler algorithm: easier to build confidence
  • Complete information beats partial views (think: B4 and SWAN)
  • For detail see:
  • IoFlow: single-resource max-min [Thereska et al., SOSP ’13]
  • Pulsar: multi-resource EDRF [Angel et al., OSDI ’14]
  • Filo: distributed EDRF [Marandi et al., USENIX ATC ’16]
