SLIDE 1 Fast and Accurate Load Balancing for Geo-Distributed Storage Systems
Kirill L. Bogdanov¹, Waleed Reda¹,², Gerald Q. Maguire Jr.¹, Dejan Kostic¹, Marco Canini³
¹KTH Royal Institute of Technology, ²Université Catholique de Louvain, ³KAUST
SLIDE 2
Geo-Distributed Services
[Figure: clients and datacenters]
Service Level Objective (SLO):
Request completion time at the target percentile (e.g., 30 ms at the 95th percentile)
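To make the SLO definition concrete, here is a minimal sketch that checks a set of request completion times against a 30 ms target at the 95th percentile. It assumes latencies in milliseconds and uses the nearest-rank percentile. The function names and sample data are hypothetical. Under nearest-rank, meeting the p95 target is roughly the same as keeping the fraction of requests above 30 ms at or below 5%.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile (p in (0, 100]) of a non-empty list."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

def meets_slo(latencies_ms, target_ms=30.0, pct=95.0):
    """True if the pct-th percentile completion time is within target_ms."""
    return percentile(latencies_ms, pct) <= target_ms

def violation_rate(latencies_ms, target_ms=30.0):
    """Fraction of requests whose completion time exceeds target_ms."""
    return sum(1 for t in latencies_ms if t > target_ms) / len(latencies_ms)

# Hypothetical sample: 96 fast requests and 4 slow ones.
latencies = [12.0] * 96 + [45.0] * 4
print(violation_rate(latencies))  # 0.04
print(meets_slo(latencies))       # True
```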
SLIDE 3
Geo-Distributed Services
[Figure: clients and datacenters]
Web-based services exhibit temporal and spatial variability in load.
Problem: it is difficult to meet strict SLOs while maintaining high resource utilization and low cost.
SLIDE 4 Approach 1 - Datacenter Elasticity
[Figure: Arrival rate (×1000 req/s)]
SLIDE 5 Approach 1 - Datacenter Elasticity
[Figure: Arrival rate (×1000 req/s)]
SLIDE 6 Approach 1 - Datacenter Elasticity
[Figure: Arrival rate (×1000 req/s)]
SLIDE 7 Approach 1 - Datacenter Elasticity
- Provisioning delay (minutes) due to the time needed to spawn and warm up a VM
- Workload is hard to predict far into the future
- Load spikes can be short-lived
These lead to SLO violations and unused capacity.
[Figure: Arrival rate (×1000 req/s), annotated with provisioning delays, SLO violations, and unused capacity]
SLIDE 8 Approach 2 - Geo-Distributed Load Balancing
- Excessive or insufficient redirection
- Redirection delays
- Inaccurate response time estimation
[Figure: two panels of Arrival rate (×1000 req/s), annotated with redirection delay, SLO violations, and excessive redirection]
How much to redirect?
SLIDE 9 Our Approach: Kurma
- Tames SLO violations at the target level
- Reacts to changes in load within seconds
- Accurately estimates the rate of SLO violations at remote datacenters
- Avoids unnecessary scaling out
[Figure: two panels of Arrival rate (×1000 req/s)]
SLIDE 10 Request Completion Time
[Figure: Server 1, Server 2, Datacenter Ireland, and Datacenter Frankfurt, connected over the Wide Area Network]
- Base Propagation: stable component associated with packet propagation along a network path
- Delay Variance: variable component associated with competing traffic and queuing
- Service Time: variable component associated with load on the server
Kurma solves a global optimization model that considers Base Propagation + Delay Variance + Service Time at all datacenters.
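One way to write this decomposition down, using notation introduced here for illustration only (the slides do not define these symbols): for a request issued at datacenter $i$ and served at datacenter $j$, which carries load $\lambda_j$, let $B_{ij}$ be the base propagation delay, $V_{ij}$ the delay variance, and $S_j(\lambda_j)$ the service time. The per-pair SLO violation probability is then the chance that their sum exceeds the target.

```latex
T_{ij} = B_{ij} + V_{ij} + S_j(\lambda_j), \qquad
P_{ij}(\lambda_j) = \Pr\left[\, T_{ij} > T_{\mathrm{SLO}} \,\right]
```

For a local request ($i = j$) the WAN terms are close to zero. For a remote one, a 30 ms target with $B_{ij} = 10$ ms leaves only 20 ms for $V_{ij} + S_j(\lambda_j)$.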
SLIDE 11 Understanding Service Time
[Figure: 5-server Cassandra cluster, Datacenter Frankfurt]
SLIDE 12 Understanding Service Time
[Figure: 5-server Cassandra cluster, Datacenter Frankfurt]
Challenge: how can the fraction of SLO violations for requests served by a remote datacenter be estimated accurately at runtime under variable network conditions?
SLIDE 13 Understanding Service Time
[Figure: 5-server Cassandra cluster, Datacenter Frankfurt]
SLIDE 14 Understanding Service Time
[Figure: 5-server Cassandra clusters in Datacenter Frankfurt and Datacenter Ireland, connected over the Wide Area Network]
Insight: the farther away a remote datacenter is, the less loaded it should be to serve remote requests within a given SLO target
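The insight can be illustrated with a small worked example. It assumes a hypothetical table of measured 95th-percentile service times per load level, and it treats the WAN round trip as a constant offset that ignores delay variance. Under that assumption, p95(RTT + service) = RTT + p95(service), so the service-time budget is simply the SLO minus the RTT. The table values and the function name are made up for illustration.

```python
# Hypothetical measurements: load (req/s) -> 95th-percentile service time (ms).
P95_SERVICE_MS = {1000: 8.0, 2000: 10.0, 3000: 14.0, 4000: 20.0, 5000: 29.0}

def max_load_within_slo(rtt_ms, slo_ms=30.0, table=P95_SERVICE_MS):
    """Highest measured load whose p95 service time fits in slo_ms - rtt_ms.

    Assumes the WAN round trip is a constant offset (no delay variance),
    so the p95 of the total equals rtt_ms + p95 of the service time.
    """
    budget = slo_ms - rtt_ms
    feasible = [load for load, p95 in table.items() if p95 <= budget]
    return max(feasible, default=0)

for rtt in (0.0, 10.0, 20.0):
    print(rtt, max_load_within_slo(rtt))
# 0.0 5000
# 10.0 4000
# 20.0 2000
```

The farther the serving datacenter, the smaller the service-time budget, and the lower the load at which it can still absorb redirected requests within the SLO.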
SLIDE 15 Understanding WAN Latency
- Base propagation delay
- Service time distribution recorded locally at a specific load
Monte Carlo Simulations
SLIDE 16 Understanding WAN Latency
- Base propagation delay
- Service time distribution recorded locally at a specific load
Monte Carlo Simulations
SLIDE 17 Understanding WAN Latency
- Base propagation delay
- Service time distribution recorded locally at a specific load
→ Gives the SLO violation rate for a specific load and WAN conditions
Monte Carlo Simulations
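The Monte Carlo step can be sketched as follows. The sketch assumes that WAN delay variance is available as a list of observed samples above the base delay, and that it is independent of the service time. It draws one sample of each, adds the base propagation delay, and counts how often the total exceeds the SLO. The function name `mc_violation_rate` and all sample values are hypothetical; the slides do not show Kurma's simulation code.

```python
import random

def mc_violation_rate(base_ms, jitter_samples_ms, service_samples_ms,
                      slo_ms=30.0, trials=100_000, seed=0):
    """Monte Carlo estimate of Pr[base + jitter + service > slo_ms].

    base_ms: base propagation delay to the serving datacenter.
    jitter_samples_ms: observed WAN delay variance above the base (assumed input).
    service_samples_ms: service times recorded locally at a specific load.
    Jitter and service time are sampled independently (an assumption).
    """
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        total = (base_ms
                 + rng.choice(jitter_samples_ms)
                 + rng.choice(service_samples_ms))
        if total > slo_ms:
            violations += 1
    return violations / trials

# Hypothetical inputs: 10% of requests take 25 ms to serve, 5% see +6 ms jitter.
service = [5.0] * 90 + [25.0] * 10
jitter = [0.0] * 95 + [6.0] * 5
print(mc_violation_rate(0.0, jitter, service))   # request served locally
print(mc_violation_rate(10.0, jitter, service))  # request redirected over the WAN
```

With these inputs the exact rates are 0.005 locally (only 25 + 6 exceeds 30) and 0.1 with a 10 ms base delay (every 25 ms service time violates), so the estimates should land close to those values. Sampling from recorded values rather than fitting a parametric distribution keeps the sketch close to "service time distribution recorded locally".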
SLIDE 18 Understanding WAN Latency
[Figure: Estimation Error vs. Base propagation delay]
SLIDE 19
Incorporating WAN and Load
SLIDE 20
Incorporating WAN and Load
SLIDE 21
Incorporating WAN and Load
SLIDE 22
Incorporating WAN and Load
SLIDE 23
Optimisation Model
Runtime load in each datacenter {λ1, λ2, λ3}
+
Optimisation Problem:
✓ Minimize global SLO violations (KurmaPerf)
✓ Minimize the cost of running a service (KurmaCost)
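The slides do not show how the optimisation problem is solved. As a rough illustration of the KurmaPerf objective only, the sketch below brute-forces the rate that datacenter 1 redirects to datacenter 2 so as to minimize the expected global SLO violation rate. `violation_prob`, with its capacity, exponent, and RTT "stretch" factor, is a made-up stand-in for the measured Monte Carlo estimates, and the grid search is a simple substitute for a real solver.

```python
def violation_prob(load, rtt_ms, capacity=5000.0, slo_ms=30.0, k=4):
    """Hypothetical violation model: grows with utilization, and a WAN round
    trip shrinks the time budget, which effectively inflates utilization."""
    if rtt_ms >= slo_ms:
        return 1.0
    stretch = slo_ms / (slo_ms - rtt_ms)
    return min(1.0, (load / capacity * stretch) ** k)

def kurma_perf_two_dc(lam1, lam2, rtt_ms, step=50.0):
    """Grid-search the rate x (req/s) redirected from datacenter 1 to 2
    that minimizes the expected fraction of requests violating the SLO."""
    best_x, best_rate = 0.0, float("inf")
    x = 0.0
    while x <= lam1:
        violations = ((lam1 - x) * violation_prob(lam1 - x, 0.0)   # kept in DC1
                      + lam2 * violation_prob(lam2 + x, 0.0)       # DC2's own clients
                      + x * violation_prob(lam2 + x, rtt_ms))      # redirected from DC1
        rate = violations / (lam1 + lam2)
        if rate < best_rate:
            best_x, best_rate = x, rate
        x += step
    return best_x, best_rate

print(kurma_perf_two_dc(1000.0, 1000.0, 10.0))  # balanced load: no redirection
print(kurma_perf_two_dc(4500.0, 1000.0, 10.0))  # DC1 overloaded: some redirection
```

Redirected requests pay the RTT penalty in `violation_prob`, which mirrors the earlier insight that a remote datacenter must be less loaded to serve them within the SLO.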
SLIDE 24 Implementation
Global View: latencies + loads … …
Each epoch: 2.5 s (0.4 Hz)
- Perform run-time WAN latency measurements
- Aggregate load information (rates of requests)
- Exchange metrics to obtain a global view
- Solve the decentralized performance model
[Figure: Datacenter London, Datacenter Frankfurt, Datacenter Stockholm]
SLIDE 25 Implementation
Each epoch: 2.5 s (0.4 Hz)
- Perform run-time WAN latency measurements
- Aggregate load information (rates of requests)
- Exchange metrics to obtain a global view
- Solve the decentralized performance model
- Enforce the computed rates of request redirection
[Figure: Datacenter London, Datacenter Frankfurt, Datacenter Stockholm]
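The slides do not say how the computed redirection rates are enforced. One plausible way, sketched here under that caveat, is to turn the solver's per-epoch output into fractions of local requests per destination (including the local datacenter itself) and to route each request by weighted random choice. `make_router` and the example fractions are hypothetical.

```python
import random
from collections import Counter

EPOCH_S = 2.5  # from the slide: one epoch every 2.5 s, i.e. 1 / 2.5 = 0.4 Hz

def make_router(fractions, seed=None):
    """Return a function that picks a destination datacenter per request.

    fractions: mapping datacenter name -> share of this datacenter's requests
    to send there during the current epoch (hypothetical solver output).
    """
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("redirection fractions must sum to 1")
    names = list(fractions)
    weights = [fractions[n] for n in names]
    rng = random.Random(seed)

    def route():
        return rng.choices(names, weights=weights, k=1)[0]

    return route

# Hypothetical epoch in Frankfurt: keep 80% local, redirect the rest.
route = make_router({"Frankfurt": 0.8, "London": 0.15, "Stockholm": 0.05}, seed=1)
print(Counter(route() for _ in range(10_000)))
```

Per-request random choice needs no shared state between client connections, and a new router can simply replace the old one at each epoch boundary.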
SLIDE 26 Evaluation Setup
Geo-distributed Cassandra cluster
- 3 Amazon EC2 datacenters (Ireland, Frankfurt, London)
- 5 x r5.large VMs per datacenter
- SLO: 30 ms at the 95th percentile
- Modified YCSB to replay workload traces
(World Cup http://ita.ee.lbl.gov/html/contrib/WorldCup.html)
Experiments:
- Minimizing SLO violations for reads
- Maintaining Target SLO (accuracy)
- Cost Savings for 1 min billing intervals (simulations)
- Reads and writes, scalability, etc. (see the paper)
SLIDE 27
Workload Trace
[Figure: workload trace with the load threshold for 5% SLO violations; no elastic scaling]
SLIDE 28
The numbers above the bars indicate the amount of inter-datacenter traffic transferred; whiskers show the 75th percentile. Kurma's SLO violations are at 2.4%.
[Figure: Cumulative Normalized SLO Violations]
SLIDE 29
The numbers above the bars indicate the amount of inter-datacenter traffic transferred; whiskers show the 75th percentile. Kurma's SLO violations are at 2.4%.
[Figure: Cumulative Normalized SLO Violations]
SLIDE 30 Average Provisioning Cost Over 30 Consecutive Days
[Figure: Total Cost [US$] per day for All Shared (WAN latency = 0 ms, bandwidth cost = $0) and All local]
- Reactive threshold-based elastic controller
- Minimum billing period of 1 minute
- Results obtained using simulations
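To show why pooling load across datacenters can reduce provisioning cost, here is a toy version of the comparison. It assumes a hypothetical reactive rule (enough VMs to keep per-VM load at or below a threshold, never fewer than a minimum), one scaling decision per minute so that every active VM is billed for the full minute, and an idealized "All Shared" case that splits the total load evenly across datacenters with no WAN latency or bandwidth cost. The thresholds, minimum VM count, and traces are made up, and this is not the controller used in the paper's simulations. Multiplying VM-minutes by a per-minute VM price gives the cost.

```python
import math

def reactive_vms(load, threshold_per_vm, min_vms):
    """Hypothetical reactive rule: keep per-VM load <= threshold_per_vm."""
    return max(min_vms, math.ceil(load / threshold_per_vm))

def vm_minutes_all_local(traces, threshold_per_vm, min_vms):
    """Each datacenter provisions for its own per-minute load."""
    return sum(reactive_vms(load, threshold_per_vm, min_vms)
               for trace in traces for load in trace)

def vm_minutes_all_shared(traces, threshold_per_vm, min_vms):
    """Idealized pooling: total load split evenly across datacenters."""
    n = len(traces)
    total = 0
    for minute_loads in zip(*traces):
        per_dc = sum(minute_loads) / n
        total += n * reactive_vms(per_dc, threshold_per_vm, min_vms)
    return total

# Two datacenters whose peaks are offset in time (spatial variability).
traces = [[8000, 2000], [2000, 8000]]  # req/s per minute, hypothetical
print(vm_minutes_all_local(traces, 1000, 5),
      vm_minutes_all_shared(traces, 1000, 5))  # 26 20
```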
SLIDE 31 Average Provisioning Cost Over 30 Consecutive Days
[Figure: Total Cost [US$] per day for All Shared (WAN latency = 0 ms, bandwidth cost = $0), KurmaCost, KurmaPerf, and All local]
- KurmaCost: keeps SLO violations under 5% (minimizes redirections while avoiding scaling out)
- KurmaPerf: minimizes SLO violations (no consideration for traffic usage)
- Reactive threshold-based elastic controller
- Minimum billing period of 1 minute
- Results obtained using simulations
SLIDE 32
Taming SLO Violations Under Elastic Threshold
[Figure label: No elastic scaling]
SLIDE 33 Conclusion
- Kurma: a fast and accurate load balancer for geo-distributed systems that takes advantage of spatial variability in load
- Decouples end-to-end response time into base propagation latency, network congestion, and service time distribution
- By operating at the granularity of a few seconds, Kurma reduces SLO violations or lowers the costs of running services by avoiding excessive global service
Contact: KIRILLB@kth.se