Load Balancing Guardrails: Keeping Your Heavy Traffic on the Road to Low Response Times



slide-1
SLIDE 1

Load Balancing Guardrails

Keeping Your Heavy Traffic on the Road to Low Response Times

Isaac Grosof (CMU) Ziv Scully (CMU) Mor Harchol-Balter (CMU)


slide-2
SLIDE 2

Goal: Optimal Load Balancing

Objective: Minimize mean response time E[T]

Q1: How to dispatch?
Q2: How to schedule?

Assumptions:

  • Stochastic arrivals
  • Known sizes
  • Preempt-resume

Theorem: SRPT is optimal

[Diagram: a dispatcher routing jobs to servers, each running SRPT]
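As a hedged illustration of the preempt-resume SRPT (Shortest Remaining Processing Time) discipline named above, the sketch below simulates a single server that always runs the job with the least remaining work. The function name `srpt_mean_response_time` and the `(arrival_time, size)` job format are assumptions made for this example, not part of the talk.

```python
import heapq

def srpt_mean_response_time(jobs):
    """Mean response time of one preempt-resume SRPT server.

    jobs: list of (arrival_time, size) pairs (hypothetical input format).
    """
    if not jobs:
        raise ValueError("need at least one job")
    jobs = sorted(jobs)
    remaining = []          # min-heap of (remaining_size, arrival_time)
    now = 0.0
    total_response = 0.0
    i, n = 0, len(jobs)
    while i < n or remaining:
        if not remaining:
            # Server is idle: jump ahead to the next arrival.
            now = max(now, jobs[i][0])
        # Admit every job that has arrived by the current time.
        while i < n and jobs[i][0] <= now:
            heapq.heappush(remaining, (jobs[i][1], jobs[i][0]))
            i += 1
        rem, arrived = heapq.heappop(remaining)
        next_arrival = jobs[i][0] if i < n else float("inf")
        if now + rem <= next_arrival:
            # The job finishes before anything new arrives.
            now += rem
            total_response += now - arrived
        else:
            # Pause at the next arrival and re-decide (preempt-resume).
            rem -= next_arrival - now
            now = next_arrival
            heapq.heappush(remaining, (rem, arrived))
    return total_response / n
```

For the two jobs `(0, 10)` and `(1, 1)`, this returns 6.0, whereas serving them in FCFS order would give a mean response time of 10.0.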

slide-3
SLIDE 3

Prior Work on Dispatching

  • FCFS: Tons of prior work
  • SRPT: Very little prior work

[Diagrams: a dispatcher feeding FCFS servers; a dispatcher feeding SRPT servers]

slide-4
SLIDE 4

Prior Work on Dispatching - FCFS

FCFS: Tons of prior work

  • Join-Shortest-Queue (JSQ): Winston, Weber, Whitt, Lin, Raghavendra, Foley, McDonald, Bramson, Lu, Prabhakar, Eschenfeldt, Gamarnik, …
  • Join-Shortest-of-d-Queues (JSQ-d): Vvedenskaya, Dobrushin, Karpelevich, Mitzenmacher, Bramson, Ying, Srikant, Kang, Mukherjee, Borst, Leeuwaarden, ...
  • Least-Work-Left (LWL): Lee, Longton, Kingman, Takahashi, Daly, Tijms, Van Hoorn, Ma, Mark, Breur, Hokstad, Kimura, Gupta, Harchol-Balter, Dai, Zwart, Osogami, Whitt, …
  • Size-Interval-Task-Assignment (SITA): Harchol-Balter, Crovella, Murta, Bachmat, Sarfati, Vesilo, Scheller-Wolf, …

[Diagram: a dispatcher feeding FCFS servers]
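For readers unfamiliar with the policy names, here is a minimal sketch of the dispatching rule behind each one, under their standard textbook definitions. The function names and argument shapes are hypothetical, and the SITA cutoffs are taken as given (SITA-E would choose them to equalize load across servers, which is not computed here).

```python
import bisect
import random

def jsq(queue_lengths):
    """Join-Shortest-Queue: the server with the fewest jobs."""
    return min(range(len(queue_lengths)), key=queue_lengths.__getitem__)

def jsq_d(queue_lengths, d, rng):
    """JSQ-d: sample d distinct servers, pick the one with the fewest jobs."""
    candidates = rng.sample(range(len(queue_lengths)), d)
    return min(candidates, key=queue_lengths.__getitem__)

def lwl(work_left):
    """Least-Work-Left: the server with the least remaining work."""
    return min(range(len(work_left)), key=work_left.__getitem__)

def sita(size, cutoffs):
    """Size-Interval-Task-Assignment: server i takes sizes in
    [cutoffs[i-1], cutoffs[i]); cutoffs must be sorted ascending."""
    return bisect.bisect_right(cutoffs, size)

def random_dispatch(num_servers, rng):
    """Random dispatch: a uniformly random server."""
    return rng.randrange(num_servers)
```

Each function returns a server index; ties in `jsq` and `lwl` go to the lowest index.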

slide-5
SLIDE 5

Prior Work on Dispatching - SRPT

SRPT: Very little prior work

  • Random dispatch: Trivial
  • First Policy Iteration (FPI) Heuristic [Hyytiä & Aalto ‘12]
  • Multilayered Round Robin [Down & Wu ‘06]

[Diagram: a dispatcher feeding SRPT servers]

slide-6
SLIDE 6

Good for FCFS ⇒ Good for SRPT?

Distribution: Bounded Pareto [1, 10^6], α=1.5. 10 servers.

[Plot: Mean response time E[T] for SRPT servers (axis ticks 50 to 200) versus Load (ρ) (axis ticks 0.7 to 1); legend: Random, SITA-E, LWL]
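To make the job size distribution concrete, the following is a minimal sketch of drawing Bounded Pareto sizes by inverting the CDF, reading the slide's range as [1, 10^6] with α=1.5. The function names and default arguments are assumptions for this example; the talk does not show how its samples were generated.

```python
import random

def bounded_pareto_sample(rng, low=1.0, high=1e6, alpha=1.5):
    """Draw one Bounded Pareto(low, high, alpha) sample by inverse CDF.

    CDF: F(x) = (1 - (low/x)**alpha) / (1 - (low/high)**alpha), low <= x <= high.
    """
    u = rng.random()                      # uniform on [0, 1)
    ratio = (low / high) ** alpha
    return low * (1.0 - u * (1.0 - ratio)) ** (-1.0 / alpha)

def bounded_pareto_mean(low=1.0, high=1e6, alpha=1.5):
    """Closed-form mean of the Bounded Pareto distribution (alpha != 1)."""
    norm = low ** alpha / (1.0 - (low / high) ** alpha)
    return norm * alpha / (alpha - 1.0) * (low ** (1.0 - alpha) - high ** (1.0 - alpha))
```

With these defaults the mean job size is about 2.997, even though individual jobs can be as large as 10^6; that gap between typical and largest sizes is what makes the placement of small jobs matter.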

slide-7
SLIDE 7

Good for FCFS ⇏ Good for SRPT

Distribution: Bounded Pareto [1, 10^6], α=1.5. 10 servers.

[Plot: Mean response time E[T] for SRPT servers (axis ticks 50 to 200) versus Load (ρ) (axis ticks 0.7 to 1); curves: Random, SITA-E, LWL]

slide-8
SLIDE 8

Our Contribution: Guardrails

  • Disp. Policy P, SRPT servers: Possibly very bad
  • Disp. Policy P + Guardrails, SRPT servers: Guaranteed heavy traffic optimal

[Diagram: each policy feeds three SRPT servers; a single SRPT server is also shown]

slide-9
SLIDE 9

Our Contribution: Guardrails

  • Disp. Policy P, SRPT servers: Possibly very bad
  • Disp. Policy P + Guardrails, SRPT servers: Guaranteed heavy traffic optimal

[Diagram labels: three SRPT servers each labeled 1, and a single SRPT server labeled k]

slide-10
SLIDE 10

Good for FCFS ⇏ Good for SRPT

Distribution: Bounded Pareto [1, 10^6], α=1.5. 10 servers.

[Plot: Mean response time E[T] for SRPT servers (axis ticks 50 to 200) versus Load (ρ) (axis ticks 0.7 to 1); curves: Random, SITA-E, LWL, G-Random, G-SITA-E, G-LWL]

slide-11
SLIDE 11

Dispatching to SRPT Servers

[Diagram: a dispatcher feeding four SRPT servers]

slide-12
SLIDE 12

Dispatching to SRPT Servers

[Diagram: two SRPT servers]

slide-13
SLIDE 13

Dispatching to SRPT Servers

[Diagram: two SRPT servers, with the callout "A small job needs me!"]

Leads to bad E[T]

slide-14
SLIDE 14

Problem: Small Job Imbalance

[Diagram: a dispatcher feeding two SRPT servers]

  • More small jobs left
  • More small jobs right
  • Balanced small jobs

slide-15
SLIDE 15

Problem: Small Job Imbalance

[Diagram: a dispatcher feeding two SRPT servers, with the callout "A small job needs me!"]

  • More small jobs left
  • More small jobs right
  • Balanced small jobs

slide-16
SLIDE 16

Guardrails

[Diagram: a dispatcher feeding two SRPT servers]

  • More small jobs left
  • More small jobs right
  • Balanced small jobs


slide-20
SLIDE 20

Guardrails

[Diagram: a dispatcher feeding two SRPT servers]

To Do:

  • >2 job sizes?
  • >2 servers?
slide-21
SLIDE 21

Guardrails

>2 job sizes?

[Diagram: a dispatcher feeding two SRPT servers]

[Plot: Prob. versus Size]

slide-22
SLIDE 22

Guardrails: Bucketing

[Diagram: a dispatcher feeding two SRPT servers]

slide-23
SLIDE 23

Guardrails: Bucketing

Size buckets: [1, 10], [10, 100], [100, 1000], …

[Diagram: two SRPT servers]

slide-24
SLIDE 24

Guardrails: Bucketing

Size buckets: [1, 10], [10, 100], [100, 1000], …

[Diagram: four SRPT servers]
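The bucketing idea can be sketched as a function that maps a job size to a bucket index. This is an illustration only: the name `size_rank` is hypothetical, and it assumes half-open buckets [c^r, c^(r+1)) with base c=10 so that a boundary size such as 10 belongs to exactly one bucket.

```python
import math

def size_rank(size, c=10.0):
    """Index r of the bucket [c**r, c**(r+1)) that contains `size`."""
    if size <= 0:
        raise ValueError("size must be positive")
    r = math.floor(math.log(size) / math.log(c))
    # Floating-point logs can be off by one at bucket boundaries
    # (for example at size 1000), so nudge r until the bucket fits.
    while c ** (r + 1) <= size:
        r += 1
    while c ** r > size:
        r -= 1
    return r
```

Under these assumptions, sizes 1 through just below 10 fall in bucket 0, sizes 10 through just below 100 in bucket 1, and so on.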

slide-25
SLIDE 25

Precise Dispatching Requirement

Job of size $x$ has rank $r$ ↔ $x \in [c^r, c^{r+1})$ *

$V_i^r(t)$ = Volume of rank $r$ work dispatched to server $i$ by time $t$.

Guardrail requirement: ∀ ranks $r$, ∀ servers $i, j$, ∀ times $t$,

$$|V_i^r(t) - V_j^r(t)| \le c^{r+1}$$

* $c$ is chosen as a function of load $\rho$.
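One simple way to enforce a requirement of this shape is to track, per rank, the volume of work sent to each server and to let the underlying policy choose only among servers that would stay within c^(r+1) of the least-loaded server for that rank. The sketch below is an assumption-laden illustration, not the authors' implementation: the class name, the `policy(size, allowed)` callback signature, and the omission of any counter resets are all choices made for this example.

```python
import math
from collections import defaultdict

class GuardrailedDispatcher:
    """Wrap a dispatching policy so per-rank volumes stay balanced."""

    def __init__(self, num_servers, c, policy):
        self.k = num_servers
        self.c = c                      # bucket base, c > 1
        self.policy = policy            # hypothetical: policy(size, allowed) -> index
        # volume[r][i]: rank-r work dispatched to server i so far
        self.volume = defaultdict(lambda: [0.0] * num_servers)

    def rank(self, size):
        """Rank r such that c**r <= size < c**(r+1)."""
        r = math.floor(math.log(size) / math.log(self.c))
        while self.c ** (r + 1) <= size:    # guard against float error
            r += 1
        while self.c ** r > size:
            r -= 1
        return r

    def dispatch(self, size):
        r = self.rank(size)
        vol = self.volume[r]
        limit = min(vol) + self.c ** (r + 1)
        # Servers that would still satisfy the requirement after this job.
        # The least-loaded server always qualifies, since size < c**(r+1).
        allowed = [i for i in range(self.k) if vol[i] + size <= limit]
        choice = self.policy(size, allowed)
        if choice not in allowed:
            raise ValueError("policy chose a server outside the guardrails")
        vol[choice] += size
        return choice
```

Even a deliberately poor policy, such as always picking the first allowed server, is forced to spread each rank's work: with `c=2` and three servers, three unit-size jobs go to servers 0, 0, and 1.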

slide-26
SLIDE 26

The Guardrail Theorem

  • Disp. Policy P, SRPT servers: Possibly very bad w.r.t. E[T]
  • Disp. Policy P + Guardrails, SRPT servers: Guaranteed heavy traffic optimal

[Diagram labels: SRPT servers each labeled 1, and a single SRPT server labeled k]

$$\lim_{\rho \to 1} \frac{E[\text{Resp. Time of Disp. Policy P with Guardrails}]}{E[\text{Resp. Time of Single SRPT Superserver}]} = 1, \quad \forall P$$