
SNC-Meister: Admitting More Tenants With Tail Latency SLOs
Timothy Zhu (Carnegie Mellon University), Daniel S. Berger (University of Kaiserslautern), Mor Harchol-Balter (Carnegie Mellon University)
Presented By: Zane Ma & Shuo Feng


SLIDE 1

SNC-Meister: Admitting More Tenants with Tail Latency SLOs ▪︎ Zane Ma

SNC-Meister: Admitting More Tenants With Tail Latency SLOs

Timothy Zhu, Carnegie Mellon University
Daniel S. Berger, University of Kaiserslautern
Mor Harchol-Balter, Carnegie Mellon University
Presented By: Zane Ma & Shuo Feng

SLIDE 3

Cloud Request Latency

  • High-performance cloud computing in a single datacenter (e.g., MapReduce, Heron, HDFS)
  • Cloud networks provide latency service-level objectives (SLOs)
  • SLOs typically guarantee 99th- or 99.9th-percentile request latency, rather than packet latency

Goal: Achieve high tenancy while meeting tail latency SLOs

SLIDE 4

Latency Causes

[Diagram: Datacenter Network with Tenant VM 1, Tenant VM 2, Switch, Server VM, and two Queues]

  • Assumption: typical behavior (no hardware failures, flash crowds, etc.)
  • Short-lived bursts caused by network queues and services

SLIDE 6

Modeling Latency

Deterministic Network Calculus

  • Calculate fixed maximum rate/burst constraints from historical traces
  • Consider the worst-case scenario of adversarial coordination (i.e., 100th-percentile latency)
  • Used by Silo (SIGCOMM 2015), QJump (NSDI 2015), PriorityMeister (SoCC 2014)

Stochastic Network Calculus

  • Model maximum rate/burstiness as a probability distribution
  • Does not assume all tenants are adversarially correlated, so it can target a lower latency percentile (e.g., 99.9%)
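
The gap between planning for the worst case and planning for a high percentile can be illustrated with a toy calculation. This is only a sketch under invented assumptions, not network calculus itself: each of `N_TENANTS` hypothetical tenants is assumed to burst independently with probability `BURST_PROBABILITY` in a given instant, and all names and numbers are illustrative.

```python
from math import comb


def binomial_quantile(n: int, p: float, q: float) -> int:
    """Return the smallest k with P(X <= k) >= q for X ~ Binomial(n, p)."""
    cdf = 0.0
    for k in range(n + 1):
        cdf += comb(n, k) * p**k * (1 - p) ** (n - k)
        if cdf >= q:
            return k
    return n


# Hypothetical toy parameters (not taken from the paper or its traces).
N_TENANTS = 20
BURST_PROBABILITY = 0.1

# Deterministic view: plan for every tenant bursting at the same time.
worst_case_bursts = N_TENANTS

# Stochastic view: plan for the 99.9th percentile of simultaneous bursts,
# assuming tenants burst independently of one another.
p999_bursts = binomial_quantile(N_TENANTS, BURST_PROBABILITY, 0.999)

print(worst_case_bursts, p999_bursts)
```

In this toy model the percentile-based plan has to absorb far fewer simultaneous bursts than the worst-case plan, which is the intuition behind admitting more tenants when the SLO is a percentile rather than a hard maximum.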

SLIDE 8

Modeling Latency

[Figure: Deterministic Network Calculus and Stochastic Network Calculus; label: 99.9% latency]

SLIDE 12

SNC Example

[Diagram: Tenant VM 1, Tenant VM 2, Switch, Server VM, two Queues; Arrival Processes A1, A2, A3; Service Processes S1, S2]

SLIDE 14

SNC Example

[Diagram: Tenant VM 1, Tenant VM 2, Switch, Server VM, two Queues; A1, A2, A3; S1, S2]

Goal: Get 99% latency SLO bound between Tenant VM 1 and Server VM
Total latency = switch latency + server latency

SLIDE 15

SNC Example

[Diagram: Tenant VM 1, Tenant VM 2, Switch, Server VM, two Queues; A1, A2, A3; S1, S2]

Goal: Get 99% latency SLO bound between Tenant VM 1 and Server VM
Total latency = Latency(A1, S1, 0.99) + Latency(A3, S2, 0.99)

SLIDE 17

SNC Example

[Diagram: Tenant VM 1, Tenant VM 2, Switch, Server VM, two Queues; A1, A2, A3; S’1, S2]

Goal: Get 99% latency SLO bound between Tenant VM 1 and Server VM
Total latency = Latency(A1, S1, 0.99) + Latency(A3, S2, 0.99)
S1 slowed down by A2! -> S’1 = Leftover(S1, A2)

SLIDE 21

SNC Example

[Diagram: Tenant VM 1, Tenant VM 2, Switch, Server VM, two Queues; A1, A2, A3; S’1, S2]

Goal: Get 99% latency SLO bound between Tenant VM 1 and Server VM
Total latency = Latency(A1, S’1, 0.99) + Latency(A3, S2, 0.99)
S’1 = Leftover(S1, A2)
A3 = Output(A1, S’1)
Adding latencies does not preserve the SLO percentile!
Combine them with Convolution(L1, L2, 0.99)
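
Why adding two per-hop 99% latencies does not give a 99% end-to-end bound can be shown with a small counterexample. The sketch below uses 100 equally likely outcomes with invented latency values; the helper `percentile` and all numbers are hypothetical and are not part of SNC-Meister.

```python
def percentile(values, percent):
    """Smallest v such that at least `percent`% of the equally likely
    outcomes in `values` are <= v (integer math avoids float rounding)."""
    ordered = sorted(values)
    rank = max(1, -(-percent * len(ordered) // 100))  # ceil(percent * n / 100)
    return ordered[rank - 1]


# 100 equally likely outcomes (hypothetical numbers): the switch is slow in
# one outcome and the server is slow in a different one.
switch = [10] + [1] * 99
server = [1, 10] + [1] * 98
total = [a + b for a, b in zip(switch, server)]

p99_switch = percentile(switch, 99)
p99_server = percentile(server, 99)

# Naive end-to-end bound: add the two per-hop 99% latencies.
naive_bound = p99_switch + p99_server

# Fraction of outcomes in which the end-to-end latency meets the naive bound.
coverage = sum(t <= naive_bound for t in total) / len(total)

# True 99th percentile of the end-to-end latency.
p99_total = percentile(total, 99)

print(naive_bound, coverage, p99_total)
```

Each hop meets its own 99% bound, but the slow outcomes fall on different requests, so the summed bound only covers 98% of requests in this example. The combining step has to account for how the per-hop violation probabilities add up.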

SLIDE 22

SNC Operators

Operator               Meaning
Latency(A, S, N)       N% latency for a given A, S
Leftover(S, A)         S adjusted/reduced by A
Output(A, S)           Resultant output distribution of A and S
Convolution(L1, L2)    Combine latencies L1, L2
Aggregation(A1, A2)    Multiplexed A1 and A2
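
How these operators compose can be sketched with their deterministic counterparts. The code below is an assumption-laden analog, not the stochastic operators SNC-Meister uses: arrivals are token buckets (rate, burst), services are rate-latency curves, and the formulas are the standard deterministic network calculus results for those shapes. The class names, function names, and numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arrival:
    """Token bucket: at most burst + rate * t units arrive in any window t."""
    rate: float
    burst: float


@dataclass(frozen=True)
class Service:
    """Rate-latency curve: at least rate * max(0, t - delay) units served."""
    rate: float
    delay: float


def aggregation(a1: Arrival, a2: Arrival) -> Arrival:
    """Multiplex two flows: rates and bursts add."""
    return Arrival(a1.rate + a2.rate, a1.burst + a2.burst)


def leftover(s: Service, cross: Arrival) -> Service:
    """Service left over after the cross traffic takes its share."""
    if cross.rate >= s.rate:
        raise ValueError("cross traffic saturates the server")
    rate = s.rate - cross.rate
    return Service(rate, (cross.burst + s.rate * s.delay) / rate)


def latency(a: Arrival, s: Service) -> float:
    """Worst-case delay of flow a at server s (requires a.rate <= s.rate)."""
    if a.rate > s.rate:
        raise ValueError("arrival rate exceeds service rate")
    return s.delay + a.burst / s.rate


def output(a: Arrival, s: Service) -> Arrival:
    """Bound on the flow leaving s: same rate, burst grown by the delay."""
    return Arrival(a.rate, a.burst + a.rate * s.delay)


# Hypothetical numbers mirroring the example topology.
A1, A2 = Arrival(rate=2, burst=4), Arrival(rate=5, burst=10)
S1, S2 = Service(rate=10, delay=0), Service(rate=8, delay=0)

S1_left = leftover(S1, A2)          # switch service seen by Tenant VM 1
A3 = output(A1, S1_left)            # traffic arriving at the Server VM
total = latency(A1, S1_left) + latency(A3, S2)
print(S1_left, A3, total)
```

In this deterministic setting each per-hop value is a hard bound, so adding them is safe. The stochastic versions return percentile bounds, which is why a dedicated operator is needed to combine per-hop latencies.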

SLIDE 25

SNC Implementation Challenges

  • SNC order-of-operations optimizations
  • Tunable dependencies between tenants
  • Modeling burstiness with a Markov Modulated Poisson Process (switching between high and low phases)
  • Programming language abstraction for applying SNC operators
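
A two-phase Markov Modulated Poisson Process can be simulated in a few lines. The sketch below is a minimal illustration under assumed parameters: the function name `simulate_mmpp`, the arrival rates, and the switching rates are hypothetical and are not values from the paper.

```python
import random


def simulate_mmpp(rates, switch_rates, horizon, seed=0):
    """Simulate a two-phase MMPP on [0, horizon).

    rates[i] is the Poisson arrival rate in phase i; switch_rates[i] is the
    rate at which phase i ends. Returns a list of (time, phase) arrivals.
    """
    rng = random.Random(seed)
    t, phase, arrivals = 0.0, 0, []
    while t < horizon:
        # The phase lasts an exponentially distributed time.
        phase_end = min(horizon, t + rng.expovariate(switch_rates[phase]))
        # Within a phase, arrivals form a Poisson process at that phase's rate.
        next_arrival = t + rng.expovariate(rates[phase])
        while next_arrival < phase_end:
            arrivals.append((next_arrival, phase))
            next_arrival += rng.expovariate(rates[phase])
        # Switch between the low phase (0) and the high phase (1).
        t, phase = phase_end, 1 - phase
    return arrivals


# Hypothetical parameters: phase 0 is low (1 req/s), phase 1 is high (20 req/s).
arrivals = simulate_mmpp(rates=(1.0, 20.0), switch_rates=(0.1, 0.1),
                         horizon=1000.0, seed=1)
low = sum(1 for _, phase in arrivals if phase == 0)
high = sum(1 for _, phase in arrivals if phase == 1)
print(len(arrivals), low, high)
```

Because the exponential distribution is memoryless, restarting the arrival clock at each phase change does not bias the process. Most arrivals land in the high phase, which is the bursty behavior the model is meant to capture.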

SLIDE 27

Experimental Setup

  • Silo: DNC, fixed 1.5Kb bursts, trial-and-error manual bandwidth selection
  • Silo++: Silo with dynamic bandwidth selection
  • QJump: manual priority class assignment
  • QJump++: QJump with automatically assigned priority class
  • PriorityMeister: automatically derived rates from tenant trace
  • Real production 2015 traces from a large internet company

SLIDE 28

Results

[Figures: "More Tenants" and "High Network Utilization"]

SLIDE 29

Results

[Figures: "Scales to high SLO %" and "Scales to Cluster Size"; axis label: #Tenants]

SLIDE 30

Future Work / Discussion

  • Bootstrapping representative historical traces/logs is a chicken-and-egg problem. How can we improve the process?
  • How can we build fault tolerance into SNC-Meister? Any practical SLO mechanism should account for as many failure scenarios as possible.
  • The paper makes an assumption about latency within a single datacenter. Why do we need this assumption? What if it is not met?
  • When most of the tenants are dependent on one another, why does SNC show higher latency than DNC?

SLIDE 31

Backup Slides

SLIDE 32

SNC Operators