Dominant Resource Fairness in Cloud Computing Systems with Heterogeneous Servers - PowerPoint PPT Presentation

SLIDE 1

Dominant Resource Fairness in Cloud Computing Systems with Heterogeneous Servers

Wei Wang, Baochun Li, Ben Liang
Department of Electrical and Computer Engineering, University of Toronto
April 30, 2014

SLIDE 2

Wei Wang, Department of Electrical and Computer Engineering, University of Toronto

Introduction

Cloud computing systems exhibit unprecedented heterogeneity in:

Server specifications
Resource demand profiles of computing tasks

SLIDE 3

Heterogeneous servers

Configurations of servers in one of Google’s clusters (CPU and memory units are normalized to the largest server)

Number of servers   CPUs   Memory
6732                0.50   0.50
3863                0.50   0.25
1001                0.50   0.75
795                 1.00   1.00
126                 0.25   0.25
52                  0.50   0.12
5                   0.50   0.03
5                   0.50   0.97
3                   1.00   0.50
1                   0.50   0.06

SLIDE 4

Heterogeneous resource demand

(Figure source: Ghodsi et al., NSDI ’11)

SLIDE 5

How should resources be allocated fairly and efficiently?

SLIDE 6

State-of-the-Art Resource Allocation Mechanisms

SLIDE 7

Single-resource abstraction

Partition a server’s resources into slots

E.g., a slot = (1 CPU core, 2 GB RAM)

Allocate resources to users at the granularity of slots

Examples: Hadoop Fair Scheduler and Capacity Scheduler; Dryad’s Quincy scheduler

Ignores the heterogeneity of both server specifications and demand profiles
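To make the last point concrete, here is a minimal Python sketch, assuming the slide's example slot of (1 CPU core, 2 GB RAM) and tasks that must be covered by whole slots. `slots_needed` and `idle_within_slots` are hypothetical helpers, not part of Hadoop or Quincy. The sketch counts the slots each job profile used in this deck would occupy and how much of the reserved capacity stays idle.

```python
import math

SLOT = (1, 2)  # (CPU cores, GB RAM) per slot, as in the slide's example

def slots_needed(demand, slot=SLOT):
    """Smallest number of whole slots that covers every resource in `demand`."""
    return max(math.ceil(d / s) for d, s in zip(demand, slot))

def idle_within_slots(demand, slot=SLOT):
    """Resources reserved by those slots but not used by the task."""
    k = slots_needed(demand, slot)
    return tuple(k * s - d for d, s in zip(demand, slot))

for demand in [(1, 4), (3, 1)]:  # the two per-job demands used in this deck
    print(demand, slots_needed(demand), idle_within_slots(demand))
# (1, 4) 2 (1, 0)
# (3, 1) 3 (0, 5)
```

A CPU-heavy (3 CPUs, 1 GB) job reserves 5 GB it never uses, while a memory-heavy (1 CPU, 4 GB) job strands a CPU core: a single-resource slot cannot express either profile.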

SLIDE 11

Dominant Resource Fairness (DRF)

Dominant resource

The resource of which a user’s job demands the largest share of the cluster’s total capacity

For example

A cluster: (9 CPUs, 18 GB RAM)
Job of user 1: (1 CPU, 4 GB RAM)
Job of user 2: (3 CPUs, 1 GB RAM)

DRF allocation

Equalize the dominant share each user receives:
3 jobs for User 1: (3 CPUs, 12 GB)
2 jobs for User 2: (6 CPUs, 2 GB)
Equalized dominant share = 2/3 (User 1’s dominant resource is memory, 3 × 4/18 = 2/3; User 2’s is CPU, 2 × 3/9 = 2/3)
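The DRF allocation above can be reproduced with progressive filling. The sketch below is a minimal Python illustration, assuming whole (indivisible) jobs, the all-in-one pooled capacity, ties broken by user index, and a stopping rule that ends when no user's next job fits; the function name `drf_progressive_filling` is hypothetical.

```python
from fractions import Fraction

def drf_progressive_filling(capacity, demands):
    """Assign whole jobs under DRF, treating the cluster as one pooled machine.

    capacity: total amount of each resource in the pool.
    demands:  per-job demand vector of each user.
    Returns the number of jobs given to each user.
    """
    used = [0] * len(capacity)
    jobs = [0] * len(demands)
    # Dominant share that a single job of each user consumes.
    per_job = [max(Fraction(d, c) for d, c in zip(dem, capacity)) for dem in demands]
    while True:
        # Users whose next job still fits into the remaining pooled resources.
        candidates = [i for i, dem in enumerate(demands)
                      if all(u + d <= c for u, d, c in zip(used, dem, capacity))]
        if not candidates:
            return jobs
        # Serve the candidate with the lowest current dominant share.
        i = min(candidates, key=lambda j: jobs[j] * per_job[j])
        jobs[i] += 1
        used = [u + d for u, d in zip(used, demands[i])]

capacity = [9, 18]            # (CPUs, GB RAM)
demands = [[1, 4], [3, 1]]    # User 1, User 2
print(drf_progressive_filling(capacity, demands))  # [3, 2]
```

Each step serves the user with the lowest current dominant share, so User 1 (2/9 per job) and User 2 (1/3 per job) both climb to a dominant share of 2/3 before the CPUs run out.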

SLIDE 14

Why DRF?

Addresses demand heterogeneity

Has highly attractive allocation properties [Ghodsi11]:

Pareto optimality, envy-freeness, truthfulness, sharing incentive, and more…

SLIDE 15

However…

DRF assumes an all-in-one resource model

The entire resource pool is modeled as one supercomputer

Ignores the heterogeneity of servers

Allocation depends only on the total amount of resources

May lead to an infeasible allocation

SLIDE 18

An infeasible DRF allocation

The same example

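A cluster: (9 CPUs, 18 GB)
Job of user 1: (1 CPU, 4 GB)
Job of user 2: (3 CPUs, 1 GB)

DRF allocation

3 jobs for User 1: (3 CPUs, 12 GB)
2 jobs for User 2: (6 CPUs, 2 GB)

The cluster actually consists of two servers: Server 1 (1 CPU, 14 GB) and Server 2 (8 CPUs, 4 GB)

User 1 can schedule at most 2 jobs! Server 1 has only 1 CPU and Server 2 only 4 GB of memory, so each server can host at most one (1 CPU, 4 GB) job.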
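Why the pooled DRF allocation fails can be checked by brute force. This Python sketch assumes Server 1 (1 CPU, 14 GB), Server 2 (8 CPUs, 4 GB), and whole jobs that cannot span servers. It tries every way of splitting a target number of jobs per user between the two servers; `fits` and `feasible` are illustrative names, and exhaustive search is only practical for toy examples like this one.

```python
from itertools import product

servers = [(1, 14), (8, 4)]   # (CPUs, GB) of Server 1 and Server 2
demands = [(1, 4), (3, 1)]    # per-job demand of User 1 and User 2

def fits(counts, server):
    """True if counts[i] jobs of each user i fit together on one server."""
    return all(sum(n * dem[r] for n, dem in zip(counts, demands)) <= server[r]
               for r in range(len(server)))

def feasible(target):
    """True if target[i] jobs of each user can be split across the two servers."""
    for on_first in product(*(range(t + 1) for t in target)):
        on_second = [t - k for t, k in zip(target, on_first)]
        if fits(on_first, servers[0]) and fits(on_second, servers[1]):
            return True
    return False

print(feasible([3, 2]))  # False: the DRF allocation has no valid placement
print(feasible([3, 0]))  # False: even alone, User 1 cannot run 3 jobs
print(feasible([2, 0]))  # True: one job per server
```

The allocation of 3 and 2 jobs fits the pooled totals (it uses 9 CPUs and 14 GB) but has no per-server placement, which is exactly what the all-in-one model cannot see.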

SLIDE 19

A quick fix of DRF

Per-Server DRF

For each server, allocate its resources among all users using DRF

However…

Per-server DRF may lead to an arbitrarily inefficient allocation (see the paper for details)

SLIDE 20

Can the attractiveness of DRF extend to a heterogeneous environment?

SLIDE 22

The ambiguity of dominant resource

The same example

A cluster: (9 CPUs, 18 GB)
Job of user 1: (1 CPU, 4 GB)

The cluster consists of two servers: Server 1 (1 CPU, 14 GB) and Server 2 (8 CPUs, 4 GB)

How to define dominant resource?

For Server 1, the dominant resource is CPU
For Server 2, the dominant resource is memory
For the entire resource pool, the dominant resource is memory
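The three answers follow from comparing the same job's demand against each capacity vector in turn. A minimal Python sketch, assuming exact fractions and the two-server example; `dominant_resource` is a hypothetical helper:

```python
from fractions import Fraction

RESOURCES = ("CPU", "memory")
demand = (1, 4)                       # User 1's job: (1 CPU, 4 GB)
capacities = {
    "Server 1": (1, 14),
    "Server 2": (8, 4),
    "Entire pool": (9, 18),           # Server 1 + Server 2
}

def dominant_resource(demand, capacity):
    """Resource whose demanded share of `capacity` is largest, plus that share."""
    shares = [Fraction(d, c) for d, c in zip(demand, capacity)]
    r = max(range(len(shares)), key=shares.__getitem__)
    return RESOURCES[r], shares[r]

for name, cap in capacities.items():
    print(name, *dominant_resource(demand, cap))
# Server 1 CPU 1
# Server 2 memory 1
# Entire pool memory 2/9
```

The answer depends on which capacity vector is used as the reference, which is the ambiguity the slide points out.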

SLIDE 23

Our answer: DRFH

A generalization of the DRF mechanism to heterogeneous environments

Equalizes every user’s global dominant share

Retains almost all the attractive allocation properties of DRF:

Pareto optimality, envy-freeness, truthfulness, weak sharing incentive, and more…

Easy to implement

SLIDE 24

DRFH Allocation

SLIDE 25

A global view of dominant resource

Global dominant resource

The resource of which a user’s job demands the largest share of the entire resource pool

The same example

A cluster: (9 CPUs, 18 GB)
Job of user 1: (1 CPU, 4 GB)

The cluster consists of two servers: Server 1 (1 CPU, 14 GB) and Server 2 (8 CPUs, 4 GB)

Memory is the global dominant resource: the job needs 4/18 = 2/9 of the pool’s memory but only 1/9 of its CPUs

SLIDE 26

Key intuition

Max-min fairness on the global dominant resources, subject to resource constraints per server

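$$\max_{\mathbf{A}} \; \min_{i \in U} \; G_i(\mathbf{A}_i) \quad \text{s.t.} \quad \sum_{i \in U} A_{ilr} \le c_{lr}, \quad \forall l \in S,\ r \in R,$$

where $G_i(\mathbf{A}_i)$ is the global dominant share of user $i$, $A_{ilr}$ is the allocation share of resource $r$ that user $i$ receives on server $l$, and $c_{lr}$ is the total availability of resource $r$ on server $l$.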
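To make the objective concrete, the sketch below evaluates it for two candidate allocations of the running example. It is a simplified Python illustration, not the paper's formulation: it uses absolute units instead of shares, counts whole tasks per server, and assumes a user's global dominant share equals the number of tasks it can run times the share of the pool's global dominant resource that one task needs. `global_dominant_share` and `respects_capacity` are hypothetical names.

```python
from fractions import Fraction

servers = [(1, 14), (8, 4)]                       # c_l in (CPUs, GB) for Server 1, 2
pool = tuple(sum(col) for col in zip(*servers))   # (9, 18)
demands = {"User 1": (1, 4), "User 2": (3, 1)}    # per-task demand

def tasks_on_server(bundle, demand):
    """Whole tasks a user can run with one server's bundle."""
    return min(a // d for a, d in zip(bundle, demand))

def global_dominant_share(bundles, demand):
    """Tasks run across all servers times the pool share of the global
    dominant resource needed by one task (simplified G_i)."""
    n = sum(tasks_on_server(b, demand) for b in bundles)
    return n * max(Fraction(d, c) for d, c in zip(demand, pool))

def respects_capacity(allocation):
    """Per-server constraint: total allocation of r on server l <= c_lr."""
    return all(sum(allocation[u][l][r] for u in allocation) <= cap
               for l, server in enumerate(servers)
               for r, cap in enumerate(server))

# Bundles (Server 1, Server 2) for each user in two candidate allocations.
A = {"User 1": [(1, 4), (0, 0)], "User 2": [(0, 0), (6, 2)]}
B = {"User 1": [(1, 4), (1, 4)], "User 2": [(0, 0), (0, 0)]}

for name, alloc in (("A", A), ("B", B)):
    shares = [global_dominant_share(alloc[u], demands[u]) for u in alloc]
    print(name, respects_capacity(alloc), min(shares))
# A True 2/9
# B True 0
```

Both allocations satisfy the per-server constraints, but the max-min objective prefers A, whose smallest global dominant share is 2/9 rather than 0.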

SLIDE 27

DRFH Properties

SLIDE 31

Fairness property

DRFH is envy-free

No user can schedule more computing tasks by taking another user’s resource allocation
No user envies another user’s allocation

DRFH is truthful

No user can schedule more computing tasks by misreporting its resource demand
Strategic behaviours are commonly seen in real systems [Ghodsi11]
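Envy-freeness can be tested directly from its definition: compare how many tasks a user runs with its own bundles against how many it could run with each other user's bundles. The following Python sketch assumes whole tasks that cannot span servers and uses two hand-made allocations of the running example; `tasks`, `is_envy_free`, `alloc`, and `skewed` are illustrative names, and neither allocation is claimed to be the DRFH output.

```python
demands = {"User 1": (1, 4), "User 2": (3, 1)}   # per-task (CPUs, GB)

# Per-server bundles (Server 1, Server 2) for each user; hand-made examples.
alloc = {"User 1": [(1, 4), (0, 0)], "User 2": [(0, 0), (6, 2)]}
skewed = {"User 1": [(0, 0), (1, 0)], "User 2": [(1, 14), (6, 4)]}

def tasks(bundles, demand):
    """Whole tasks runnable from per-server bundles (a task cannot span servers)."""
    return sum(min(a // d for a, d in zip(b, demand)) for b in bundles)

def is_envy_free(allocation):
    """True if no user could run more tasks with another user's bundles."""
    return all(tasks(allocation[i], demands[i]) >= tasks(allocation[j], demands[i])
               for i in allocation for j in allocation)

print(is_envy_free(alloc))   # True
print(is_envy_free(skewed))  # False: User 1 runs 0 tasks but could run 2 with User 2's bundles
```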

SLIDE 32

Resource utilization

DRFH is Pareto optimal

No user can schedule more tasks without decreasing the number of tasks scheduled for others

No resource that could be utilized to serve a user is left idle

SLIDE 34

Service isolation

Equal partition

Allocation A is an equal partition if it divides every resource evenly among all n users

$$\sum_{l \in S} A_{ilr} = \frac{1}{n}, \quad \forall r \in R,\ i \in U,$$

where $A_{ilr}$ is the allocation share of resource $r$ that user $i$ receives on server $l$.

Weak sharing incentive

There exists an equal partition A’ under which each user schedules fewer tasks than it does under DRFH
That is, all users unanimously prefer DRFH to this equal partition
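One concrete equal partition gives every user 1/n of every resource on every server. Because the definition only fixes each user's total share of each resource, many other equal partitions exist, which is why the weak sharing incentive only requires DRFH to beat one of them. The Python sketch below, assuming divisible tasks and the two-server example, counts the tasks each user could run under this particular partition; `divisible_tasks` is a hypothetical helper.

```python
from fractions import Fraction

servers = [(1, 14), (8, 4)]   # (CPUs, GB) per server
demands = [(1, 4), (3, 1)]    # per-task demand of User 1 and User 2
n = len(demands)

# One equal partition: each user gets 1/n of every resource on every server,
# so its total share of each resource is 1/n.
bundles = [tuple(Fraction(c, n) for c in server) for server in servers]

def divisible_tasks(bundles, demand):
    """Tasks runnable from per-server bundles when tasks are divisible."""
    return sum(min(a / d for a, d in zip(b, demand)) for b in bundles)

for i, demand in enumerate(demands, start=1):
    print(f"User {i}:", divisible_tasks(bundles, demand))
# User 1: 1
# User 2: 3/2
```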

SLIDE 36

Comparison

Property             DRFH   DRF (all-in-one model)
Pareto optimality    Yes    Yes
Envy-freeness        Yes    Yes
Truthfulness         Yes    Yes
Sharing incentive    Weak   Strong

DRFH retains almost all the attractive properties of DRF

SLIDE 37

Trace-Driven Simulation

SLIDE 38

Resource utilization

Figure: CPU utilization and memory utilization over time (min) under Best-Fit DRFH, First-Fit DRFH, and Slots
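The figure compares two DRFH heuristics, Best-Fit and First-Fit, against slot-based allocation. The Python sketch below is one plausible reading of such heuristics, not the authors' implementation. It assumes whole tasks and serves the user with the lowest global dominant share first. For best-fit, it assumes a fitness measure defined here as the L1 distance between the task's demand profile and the server's free-resource profile, both normalized by the pool. `drfh_heuristic`, `profile`, and `mismatch` are hypothetical names.

```python
from fractions import Fraction

def drfh_heuristic(servers, demands, policy="first"):
    """Greedy DRFH-style placement of whole tasks (illustrative sketch only).

    Repeatedly serves the active user with the lowest global dominant share
    and places one of its tasks on a server chosen by `policy`.
    """
    free = [list(s) for s in servers]                # remaining capacity per server
    pool = [sum(col) for col in zip(*servers)]       # total capacity per resource
    # Global dominant share consumed by one task of each user.
    share = [max(Fraction(d, c) for d, c in zip(dem, pool)) for dem in demands]
    tasks = [0] * len(demands)
    active = set(range(len(demands)))

    def profile(vec):
        """Pool-normalized resource vector, rescaled to sum to 1."""
        norm = [Fraction(v, c) for v, c in zip(vec, pool)]
        total = sum(norm)
        return [x / total for x in norm]

    def mismatch(demand, srv):
        """Assumed best-fit measure: L1 distance between the demand profile
        and the free-resource profile of server `srv`."""
        return sum(abs(a - b) for a, b in zip(profile(demand), profile(free[srv])))

    while active:
        # Lowest global dominant share first; ties go to the lower index.
        i = min(active, key=lambda u: (tasks[u] * share[u], u))
        fitting = [srv for srv, f in enumerate(free)
                   if all(d <= x for d, x in zip(demands[i], f))]
        if not fitting:
            active.discard(i)  # free capacity only shrinks, so user i is done
            continue
        if policy == "first":
            srv = fitting[0]
        else:
            srv = min(fitting, key=lambda s: mismatch(demands[i], s))
        free[srv] = [x - d for x, d in zip(free[srv], demands[i])]
        tasks[i] += 1
    return tasks

servers = [(8, 4), (1, 14)]   # the example's two servers, (8 CPUs, 4 GB) first
demands = [(1, 4), (3, 1)]    # per-task demand of User 1 and User 2
print(drfh_heuristic(servers, demands, "first"))  # [2, 0]
print(drfh_heuristic(servers, demands, "best"))   # [1, 2]
```

With the (8 CPUs, 4 GB) server listed first, first-fit puts User 1's first job there and exhausts its memory, so User 2 gets nothing; best-fit sends that job to the memory-rich server instead and lets User 2 run two jobs.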

SLIDE 39

Job completion times

Figure: completion time reduction by job size

Job size (tasks)   Completion time reduction
1-50               -1%
51-100             2%
101-500            25%
501-1000           43%
>1000              62%

SLIDE 40

Conclusions

We have studied a multi-resource fair allocation problem in a heterogeneous cloud computing system.
We have generalized DRF to DRFH and shown that it possesses a set of highly attractive allocation properties.
We have designed an effective heuristic algorithm that implements DRFH in a real-world system.

http://iqua.ece.toronto.edu/~weiwang/