Load Balancing Guardrails
Keeping Your Heavy Traffic on the Road to Low Response Times
Isaac Grosof (CMU) Ziv Scully (CMU) Mor Harchol-Balter (CMU)
Goal: Optimal Load Balancing
Objective: Minimize mean response time E[T]
Q1: How to dispatch?
Q2: How to schedule?
Assumptions: Stochastic Arrivals, Known Sizes, Preempt-Resume
Theorem: SRPT is optimal
[Diagram: Dispatcher → Servers (SRPT, SRPT, SRPT)]
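A minimal sketch of the scheduling rule named on this slide, SRPT (Shortest Remaining Processing Time), for a single preempt-resume server with known job sizes. The function name `srpt_mean_response_time`, the unit server speed, and the (arrival time, size) input format are assumptions made for illustration; they are not taken from the slides.

```python
import heapq

def srpt_mean_response_time(jobs):
    """Mean response time of one speed-1 server running preemptive SRPT.

    jobs: non-empty list of (arrival_time, size) pairs (hypothetical input format).
    """
    jobs = sorted(jobs)
    heap = []              # entries are (remaining_size, arrival_time)
    now = 0.0
    total_response = 0.0
    i, n = 0, len(jobs)
    while i < n or heap:
        if not heap:
            now = max(now, jobs[i][0])        # server idles until the next arrival
        while i < n and jobs[i][0] <= now:    # admit every job that has arrived
            heapq.heappush(heap, (jobs[i][1], jobs[i][0]))
            i += 1
        remaining, arrival = heapq.heappop(heap)   # job with least remaining work
        next_arrival = jobs[i][0] if i < n else float("inf")
        run = min(remaining, next_arrival - now)   # run until done or next arrival
        now += run
        remaining -= run
        if remaining > 0:
            heapq.heappush(heap, (remaining, arrival))  # preempted; resumes later
        else:
            total_response += now - arrival
    return total_response / n

# A size-1 job arriving at time 1 preempts the size-5 job that started at time 0.
print(srpt_mean_response_time([(0, 5), (1, 1)]))
```

In this two-job example the small job finishes at time 2 and the large job at time 6, so the mean response time is 3.5; serving the same jobs in arrival order would give 5.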
Prior Work on Dispatching
Dispatcher → FCFS servers: Tons of prior work
Dispatcher → SRPT servers: Very little prior work
Prior Work on Dispatching - FCFS
FCFS: Tons of prior work
Join-Shortest-Queue (JSQ): Winston, Weber, Whitt, Lin, Raghavendra, Foley, McDonald, Bramson, Lu, Prabhakar, Eschenfeldt, Gamarnik, …
Join-Shortest-of-d-Queues (JSQ-d): Vvedenskaya, Dobrushin, Karpelevich, Mitzenmacher, Bramson, Ying, Srikant, Kang, Mukherjee, Borst, van Leeuwaarden, …
Least-Work-Left (LWL): Lee, Longton, Kingman, Takahashi, Daly, Tijms, Van Hoorn, Ma, Mark, Breuer, Hokstad, Kimura, Gupta, Harchol-Balter, Dai, Zwart, Osogami, Whitt, …
Size-Interval-Task-Assignment (SITA): Harchol-Balter, Crovella, Murta, Bachmat, Sarfati, Vesilo, Scheller-Wolf, …
Prior Work on Dispatching - SRPT
SRPT: Very little prior work
Random dispatch: Trivial
First Policy Iteration (FPI) Heuristic [Hyytiä & Aalto ‘12]
Multilayered Round Robin [Down & Wu ‘06]
[Diagram: Dispatcher → SRPT, SRPT, SRPT]
Good for FCFS ⇒ Good for SRPT?
Distribution: Bounded Pareto [1, 10^6], α=1.5. 10 servers
[Figure: mean response time E[T] for SRPT servers (y-axis, 50 to 200) vs. load ρ (x-axis, 0.7 to 1); policies: Random, SITA-E, LWL]
Good for FCFS ⇏ Good for SRPT
Distribution: Bounded Pareto [1, 10^6], α=1.5. 10 servers
[Figure: mean response time E[T] for SRPT servers (y-axis, 50 to 200) vs. load ρ (x-axis, 0.7 to 1); policies: Random, SITA-E, LWL]
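For readers who want to reproduce a workload like the one in this experiment, the following is a hedged sketch of drawing job sizes from a Bounded Pareto distribution by inverting its CDF. The function name `sample_bounded_pareto` is hypothetical, and the upper bound is taken here as 10^6, which is an assumption about the intended exponent; α=1.5 is from the slide.

```python
import random

def sample_bounded_pareto(rng, low=1.0, high=1e6, alpha=1.5):
    """Draw one job size from Bounded Pareto(low, high, alpha) by inverse CDF.

    CDF: F(x) = (1 - (low/x)**alpha) / (1 - (low/high)**alpha) for low <= x <= high.
    """
    u = rng.random()                       # uniform on [0, 1)
    tail = 1.0 - (low / high) ** alpha     # normalizing constant of the truncation
    return low / (1.0 - u * tail) ** (1.0 / alpha)

rng = random.Random(0)                     # fixed seed so runs are repeatable
sizes = [sample_bounded_pareto(rng) for _ in range(10000)]
print(min(sizes), max(sizes))
```

Most samples are close to the lower bound, while a few are several orders of magnitude larger; that mix of many small and few huge jobs is what makes small-job placement matter.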
Our Contribution: Guardrails
Possibly very bad dispatching policy + Guardrails → Guaranteed heavy traffic optimal
[Diagram: SRPT servers, each labeled 1, shown alongside a single SRPT server labeled k]
Good for FCFS ⇏ Good for SRPT
Distribution: Bounded Pareto [1, 10^6], α=1.5. 10 servers
[Figure: mean response time E[T] for SRPT servers (y-axis, 50 to 200) vs. load ρ (x-axis, 0.7 to 1); policies: Random, SITA-E, LWL, G-Random, G-SITA-E, G-LWL]
Dispatching to SRPT Servers
[Diagram: Dispatcher and SRPT servers. Callout from one server: "A small job needs me!"]
Problem: Small Job Imbalance
Leads to bad E[T]
[Diagram: Dispatcher and two SRPT servers, with an indicator reading "More small jobs left", "More small jobs right", or "Balanced small jobs". Callout: "A small job needs me!"]
Guardrails
[Diagram: Dispatcher and two SRPT servers, with an indicator reading "More small jobs left", "More small jobs right", or "Balanced small jobs"]
Guardrails
To Do: >2 job sizes?
[Diagram: Dispatcher and two SRPT servers]
[Figure: job size distribution; axes: Size, Prob.]
Guardrails: Bucketing
Size buckets: [1, 10], [10, 100], [100, 1000], …
[Diagram: Dispatcher and SRPT servers]
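The bucketing idea can be sketched as follows: each job size is mapped to the index of the power-of-ten interval that contains it, and work is then tallied per bucket. This is an illustration under stated assumptions: the names `bucket_index` and `work_per_bucket` are hypothetical, the base of 10 simply mirrors the intervals shown on the slide, and a size that falls exactly on a boundary is placed in the upper bucket.

```python
from collections import defaultdict

def bucket_index(size, base=10.0):
    """Return r such that base**r <= size < base**(r + 1). Requires size > 0."""
    if size <= 0:
        raise ValueError("job size must be positive")
    r = 0
    while base ** (r + 1) <= size:   # move up for large sizes
        r += 1
    while base ** r > size:          # move down for sizes below 1
        r -= 1
    return r

def work_per_bucket(sizes, base=10.0):
    """Total volume of work that falls into each bucket."""
    totals = defaultdict(float)
    for size in sizes:
        totals[bucket_index(size, base)] += size
    return dict(totals)

print(work_per_bucket([2, 5, 10, 50, 700]))
```

The loop-based index avoids floating-point logarithms, which can misplace sizes that sit exactly on a bucket boundary.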
Precise Dispatching Requirement
Job of size $x$ has rank $r$ ↔ $x \in [c^r, c^{r+1})$ *
$V_i^r(t)$ = Volume of rank $r$ work dispatched to server $i$ by time $t$.
Guardrail requirement: ∀ ranks $r$, ∀ servers $i, j$, ∀ times $t$,
$|V_i^r(t) - V_j^r(t)| \le c^{r+1}$
* $c$ is chosen as a function of load $\rho$.
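A minimal sketch of enforcing the guardrail requirement stated above on top of an arbitrary dispatching choice. It keeps one volume counter per (rank, server) pair and only allows a dispatch that keeps every pairwise counter gap within the bound for that rank. The class `GuardrailTracker`, the `preferred` argument (standing in for the underlying policy's choice), and the fallback to the allowed server with the least dispatched volume are all assumptions for illustration, not the authors' implementation.

```python
from collections import defaultdict

def rank_of(size, base):
    """Rank r with base**r <= size < base**(r + 1); assumes size > 0, base > 1."""
    r = 0
    while base ** (r + 1) <= size:
        r += 1
    while base ** r > size:
        r -= 1
    return r

class GuardrailTracker:
    """Tracks the volume of work dispatched so far, per rank and per server."""

    def __init__(self, num_servers, base):
        self.num_servers = num_servers
        self.base = base
        # rank -> list of per-server dispatched volume
        self.volume = defaultdict(lambda: [0.0] * num_servers)

    def allowed_servers(self, size):
        """Servers that can receive this job without violating the requirement."""
        r = rank_of(size, self.base)
        limit = self.base ** (r + 1)
        vols = self.volume[r]
        allowed = []
        for s in range(self.num_servers):
            others = [v for t, v in enumerate(vols) if t != s]
            # Only server s grows, so only its gap to the smallest other counter
            # can newly exceed the limit.
            if not others or vols[s] + size - min(others) <= limit:
                allowed.append(s)
        return allowed

    def dispatch(self, size, preferred):
        """Use the preferred server if allowed; otherwise fall back (assumption)."""
        r = rank_of(size, self.base)
        allowed = self.allowed_servers(size)
        if preferred in allowed:
            server = preferred
        else:
            server = min(allowed, key=lambda s: self.volume[r][s])
        self.volume[r][server] += size
        return server

tracker = GuardrailTracker(num_servers=2, base=10.0)
# A policy that always prefers server 0 is overridden once the gap would exceed 10.
print([tracker.dispatch(5, preferred=0) for _ in range(3)])
```

The server with the smallest counter for a rank is always allowed, because adding one job of that rank raises its counter by less than the bound, so the fallback never runs out of candidates.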
The Guardrail Theorem
Possibly very bad w.r.t. E[T] + Guardrails → Guaranteed heavy traffic optimal
[Diagram: SRPT servers, each labeled 1, shown alongside a single SRPT server labeled k]
$$\lim_{\rho \to 1} \frac{E[\text{Resp. Time of Disp. Policy } P \text{ with Guardrails}]}{E[\text{Resp. Time of Single SRPT Superserver}]} = 1, \quad \forall P$$