Fast and Accurate Load Balancing for Geo-Distributed Storage Systems

Kirill L. Bogdanov (1), Waleed Reda (1,2), Gerald Q. Maguire Jr. (1), Dejan Kostic (1), Marco Canini (3)
(1) KTH Royal Institute of Technology, (2) Université Catholique de Louvain, (3) KAUST

Geo-Distributed Services
Service Level Objective (SLO): request completion time at the target percentile (e.g., 30 ms at the 95th percentile)
[Diagram: clients and a datacenter]
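To make the SLO definition concrete, here is a minimal sketch of checking a percentile target against a batch of measured completion times. The function names and the nearest-rank percentile method are assumptions chosen for illustration; the slides do not say how the percentile is computed.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def meets_slo(samples_ms, target_ms=30.0, pct=95):
    """True if the pct-th percentile completion time is within the target."""
    return percentile(samples_ms, pct) <= target_ms

def violation_fraction(samples_ms, target_ms=30.0):
    """Fraction of requests slower than the target."""
    return sum(1 for s in samples_ms if s > target_ms) / len(samples_ms)
```

With a 95th-percentile target, up to 5% of requests may exceed 30 ms before the SLO counts as violated.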

Geo-Distributed Services
Web-based services demonstrate temporal and spatial variability in load.
Problem: it is difficult to meet strict SLOs while maintaining high resource utilization and low cost.
[Diagram: clients and a datacenter]

Approach 1 - Datacenter Elasticity
[Figure: arrival rate (1000x req/s), annotated with provisioning delays, unused capacity, and SLO violations]
Provisioning delays lead to overprovisioning!
- Provisioning delay (minutes) due to the time needed to spawn and warm up a VM
- Hard to predict workload far into the future
- Load spikes can be short-lived

Approach 2 - Geo-Distributed Load Balancing
[Figure: arrival rate (1000x req/s) at two datacenters, annotated with redirection, redirection delay, and SLO violations]
- Excessive redirection delays
- Inaccurate response time estimation
- How much to redirect? Excessive or insufficient redirection

Our Approach: Kurma
[Figure: arrival rate (1000x req/s) at two datacenters]
- Reacts to changes in load within seconds
- Avoids unnecessary scaling out
- Accurately estimates the remote rate of SLO violations
- Tames SLO violations at the target level

Request Completion Time
[Diagram: Server 1 in Datacenter Frankfurt and Server 2 in Datacenter Ireland, connected over a wide area network]
- Base Propagation: stable component associated with packet propagation along a network path
- Delay Variance: variable component associated with competing traffic and queuing
- Service Time: variable component associated with load on the server
Kurma solves a global optimization model while considering Base Propagation + Delay Variance + Service Time at all datacenters.
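The decomposition can be written compactly as a sum. The symbol names below are illustrative rather than notation from the slides, and the 30 ms, 95th-percentile target is the example SLO used elsewhere in the deck:

```latex
T_{\text{completion}} = T_{\text{base}} + T_{\text{var}} + T_{\text{service}},
\qquad
\text{SLO met if } P_{95}\!\left(T_{\text{completion}}\right) \le 30\,\text{ms}
```

Only $T_{\text{base}}$ is stable; the other two terms are random variables, which is why the percentile of the sum has to be estimated rather than added up term by term.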

Understanding Service Time
[Figure: Datacenter Frankfurt, 5-server Cassandra cluster]
Challenge: how to accurately estimate the remote fraction of SLO violations at runtime under variable network conditions?

Understanding Service Time
[Figure: Datacenter Frankfurt and Datacenter Ireland, each a 5-server Cassandra cluster, connected over a wide area network]
Insight: the farther away a remote datacenter is, the less loaded it should be to serve remote requests within a given SLO target.
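The insight can be illustrated with a rough budget calculation: whatever the WAN round trip consumes is no longer available for service time. The sketch below is a deliberate simplification (it subtracts a fixed round-trip time from the SLO and ignores delay variance), and the load profile in the usage note is hypothetical, not measured data from the talk.

```python
def max_remote_load(rtt_ms, slo_ms, p95_service_by_load):
    """Return the highest load (req/s) whose 95th-percentile service time
    still fits in the budget left after the WAN round trip, or None if
    no profiled load level fits.

    p95_service_by_load maps load (req/s) -> p95 service time (ms).
    """
    budget_ms = slo_ms - rtt_ms
    feasible = [load for load, p95 in p95_service_by_load.items()
                if p95 <= budget_ms]
    return max(feasible) if feasible else None
```

For a hypothetical profile `{1000: 5.0, 3000: 8.0, 5000: 14.0, 7000: 25.0}` and a 30 ms SLO, a datacenter 10 ms away can accept remote requests up to 5000 req/s, while one 24 ms away must stay at 1000 req/s.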

Understanding WAN Latency
- Base propagation delay
- Service time distribution recorded locally at a specific load
- Monte Carlo simulations: give the SLO violation rate at a given specific load and WAN conditions
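A minimal sketch of the Monte Carlo idea: draw a service time from the locally recorded distribution, add the WAN components, and count how often the total exceeds the SLO. The function name, parameters, and the choice to resample empirical lists uniformly are assumptions for illustration; the slides do not give the details of Kurma's simulation.

```python
import random

def estimate_violation_rate(service_ms, base_rtt_ms, jitter_ms,
                            slo_ms=30.0, trials=100_000, seed=0):
    """Estimate the fraction of remote requests that would miss the SLO.

    service_ms  -- service times recorded locally at one load level
    base_rtt_ms -- stable base propagation delay to the remote datacenter
    jitter_ms   -- observed samples of the variable WAN delay component
    """
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        total = base_rtt_ms + rng.choice(jitter_ms) + rng.choice(service_ms)
        if total > slo_ms:
            violations += 1
    return violations / trials
```

Running this once per recorded load level yields a table of violation rates indexed by load and WAN conditions, which is the kind of input an optimizer can consume.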

Understanding WAN Latency
[Figure: base propagation delay and estimation error]

Incorporating WAN and Load

Optimisation Model
Runtime load in each datacenter {λ1, λ2, λ3}
Optimisation problem:
✓ Minimize global SLO violations (KurmaPerf)
✓ Minimize the cost of running a service (KurmaCost)
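As a toy illustration of the "minimize global SLO violations" objective, the sketch below searches for the fraction of one datacenter's load to redirect to another. It is not Kurma's actual model: it covers only two datacenters, uses a plain grid search, and takes the violation-rate curves as caller-supplied functions (`local_viol` and `remote_viol` are hypothetical names).

```python
def best_redirect_fraction(load_a, load_b, local_viol, remote_viol, steps=100):
    """Grid-search the fraction of datacenter A's load to send to B.

    local_viol(load)  -- violation rate for requests served locally at `load`
    remote_viol(load) -- violation rate for redirected requests served by a
                         remote datacenter running at `load` (includes WAN)
    Returns (fraction, expected violations per second).
    """
    best_total, best_fraction = float("inf"), 0.0
    for i in range(steps + 1):
        fraction = i / steps
        moved = fraction * load_a
        kept = load_a - moved
        load_at_b = load_b + moved
        total = (kept * local_viol(kept)             # A's local requests
                 + load_b * local_viol(load_at_b)    # B's own requests
                 + moved * remote_viol(load_at_b))   # redirected requests
        if total < best_total:
            best_total, best_fraction = total, fraction
    return best_fraction, best_total
```

The search shows the trade-off named on the earlier slides: redirecting too little leaves the overloaded datacenter violating its SLO, while redirecting too much pays the WAN penalty on requests that could have been served locally.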

Implementation
Each epoch (2.5 sec → 0.4 Hz):
- Perform run-time WAN latency measurements
- Aggregate load information (rates of requests)
- Exchange metrics (latencies + loads) to obtain a global view
- Solve decentralized performance model
- Enforce computed rates of request redirection
[Diagram: Datacenter Stockholm, Datacenter London, Datacenter Frankfurt]
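The last step, enforcing the computed redirection rates, can be sketched as a probabilistic router whose fractions are refreshed once per epoch. The `Redirector` class and its methods are hypothetical names for illustration; the slides do not describe how Kurma enforces the rates.

```python
import random

EPOCH_SECONDS = 2.5  # from the slide: one epoch every 2.5 s, i.e. 1 / 2.5 = 0.4 Hz

class Redirector:
    """Routes each request locally or to a remote datacenter according to
    the redirection fractions computed for the current epoch."""

    def __init__(self, local, seed=None):
        self.local = local
        self.fractions = {}  # remote datacenter name -> fraction of requests
        self.rng = random.Random(seed)

    def update(self, fractions):
        """Install the fractions computed for the next epoch."""
        if any(f < 0 for f in fractions.values()):
            raise ValueError("fractions must be non-negative")
        if sum(fractions.values()) > 1.0 + 1e-9:
            raise ValueError("fractions must sum to at most 1")
        self.fractions = dict(fractions)

    def route(self):
        """Pick the datacenter that should serve the next request."""
        r = self.rng.random()
        cumulative = 0.0
        for datacenter, fraction in self.fractions.items():
            cumulative += fraction
            if r < cumulative:
                return datacenter
        return self.local
```

A control loop would call `update` once per epoch with the solver's output, while the request path calls `route` for every incoming request.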

Evaluation Setup
Geo-distributed Cassandra cluster
• 3 Amazon EC2 datacenters (Ireland, Frankfurt, London)
• 5 x r5.large VMs per datacenter
• SLO: 30 ms at the 95th percentile
• Modified YCSB to replay workload traces (World Cup, http://ita.ee.lbl.gov/html/contrib/WorldCup.html)
Experiments:
• Minimizing SLO violations for reads
• Maintaining target SLO (accuracy)
• Cost savings for 1 min billing intervals (simulations)
• Reads and writes, scalability, etc.

Workload Trace
[Figure: workload trace with no elastic scaling; load threshold for 5% SLO violations marked]

Cumulative Normalized SLO Violations
Kurma's SLO violations are at 2.4%.
The numbers shown above the bars indicate the amount of inter-datacenter traffic transferred; whiskers → 75th percentile.

Average Provisioning Cost Over 30 Consecutive Days
[Figure: total cost (US$) per day for All Shared, KurmaCost, KurmaPerf, and All Local]
Baseline assumptions: WAN latency = 0 ms, bandwidth cost = $0
KurmaCost: keeps SLO violations under 5% (minimizes redirections while avoiding scaling out)
KurmaPerf: minimizes SLO violations (no consideration for traffic usage)
- Reactive threshold-based elastic controller
- Minimum billing period of 1 minute
- Results obtained using simulations
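To show why pooling capacity can lower cost under per-minute billing, here is a rough sketch comparing two extremes: every datacenter provisioning for its own load, and all datacenters sharing load freely (which only makes sense under the idealized assumptions of zero WAN latency and zero bandwidth cost). The function names, the per-VM capacity, and the price are hypothetical, and the sketch ignores provisioning delay and controller lag, so it is not the simulation used in the talk.

```python
import math

def vms_needed(load, capacity_per_vm, min_vms=1):
    """VMs required to serve `load` req/s, never fewer than min_vms."""
    return max(min_vms, math.ceil(load / capacity_per_vm))

def cost_all_local(loads_by_dc, capacity_per_vm, price_per_vm_minute):
    """Each datacenter scales for its own load; billed per VM-minute."""
    total = 0.0
    for minute_loads in zip(*loads_by_dc.values()):
        vms = sum(vms_needed(load, capacity_per_vm) for load in minute_loads)
        total += vms * price_per_vm_minute
    return total

def cost_all_shared(loads_by_dc, capacity_per_vm, price_per_vm_minute):
    """Load is pooled across datacenters before sizing the fleet,
    keeping at least one VM per datacenter."""
    n_dcs = len(loads_by_dc)
    total = 0.0
    for minute_loads in zip(*loads_by_dc.values()):
        vms = vms_needed(sum(minute_loads), capacity_per_vm, min_vms=n_dcs)
        total += vms * price_per_vm_minute
    return total
```

Each list in `loads_by_dc` holds one load value per minute. Pooling rounds up once per minute instead of once per datacenter, so the shared cost never exceeds the local cost in this model.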

Taming SLO Violations
[Figure: SLO violations under elastic scaling and with no elastic scaling, with the threshold marked]

Conclusion
Kurma is a fast and accurate load balancer for geo-distributed systems that takes advantage of spatial variability in load.
It decouples end-to-end response time into components of base propagation latency, network congestion, and service time distribution.
By operating at the granularity of a few seconds, Kurma reduces SLO violations or lowers the cost of running services by avoiding excessive global service overprovisioning.
Contact: KIRILLB@kth.se
