LOAD BALANCING IS IMPOSSIBLE - PowerPoint PPT Presentation


SLIDE 1

LOAD BALANCING IS IMPOSSIBLE

SLIDE 2

LOAD BALANCING IS IMPOSSIBLE

Tyler McMullen

tyler@fastly.com @tbmcmullen


SLIDE 3

WHAT IS LOAD BALANCING?

SLIDE 4

[DIAGRAM DESCRIBING LOAD BALANCING]

SLIDE 5

[ALLEGORY DESCRIBING LOAD BALANCING]

SLIDE 6


Why Load Balance?

Three major reasons, the least of which is balancing load:

  • Abstraction: treat many servers as one; a single entry point; simplification
  • Failure: transparent failover; recover seamlessly; simplification
  • Balancing load: spread the load efficiently across servers

SLIDE 7

RANDOM

THE INGLORIOUS DEFAULT

AND BANE OF MY EXISTENCE

SLIDE 8


What’s good about random?


  • Simplicity
  • Few edge cases
  • Easy failover
  • Works identically when distributed
SLIDE 9


What’s bad about random?


  • Latency
  • Especially long-tail latency
  • Usable capacity
SLIDE 10

BALLS-INTO-BINS

SLIDE 11

If you throw m balls into n bins, what is the maximum load of any one bin?
SLIDE 12

SLIDE 13

import numpy.random as nr

n = 8     # number of servers
m = 1000  # number of requests

bins = [0] * n
for chosen_bin in nr.randint(0, n, m):
    bins[chosen_bin] += 1

print(bins)
# [129, 100, 134, 113, 117, 136, 148, 123]  (sample output; varies per run)

SLIDE 14

import numpy.random as nr

n = 8     # number of servers
m = 1000  # number of requests

bins = [0.0] * n
for weight in nr.uniform(0, 2, m):  # random request cost, mean 1.0
    chosen_bin = nr.randint(0, n)
    bins[chosen_bin] += weight

print([round(b, 1) for b in bins])
# [133.1, 133.9, 144.7, 124.1, 102.9, 125.4, 114.2, 121.3]  (sample output; varies per run)

SLIDE 15

How do you model request latency?

SLIDE 16

What do Erlang and getting kicked by a horse have in common?

SLIDE 17

POISSON PROCESS
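Request arrivals modeled as a Poisson process have exponentially distributed inter-arrival times. A minimal sketch in the same NumPy style as the deck's simulations (the `rate` of 100 requests/second is an illustrative assumption, not a number from the talk):

```python
import numpy as np
import numpy.random as nr

rate = 100.0  # assumed mean arrival rate, requests per second

# Inter-arrival times in a Poisson process are exponential with mean 1/rate.
gaps = nr.exponential(1.0 / rate, 100000)
arrivals = gaps.cumsum()

# The number of arrivals in each one-second window is Poisson(rate).
counts = np.bincount(arrivals.astype(int))[:-1]  # drop the partial last second
print(counts.mean())  # close to `rate`
```

The point of the model: even with a steady average rate, arrivals cluster, so a server sees bursts no matter how smooth the traffic looks on average.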

SLIDE 18

SLIDE 19

SLIDE 20

SLIDE 21

SLIDE 22

WHY IS THAT A PROBLEM?

SLIDE 23

50ms

SLIDE 24

SLIDE 25

Even if your application has perfect constant response time ... It doesn’t.

SLIDE 26

Log-normal Distribution

MEAN: 1.0

50th percentile: 0.6
75th percentile: 1.2
95th percentile: 3.1
99th percentile: 6.0
99.9th percentile: 14.1
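Percentiles of this shape can be approximated by sampling a log-normal rescaled to mean 1.0. A sketch assuming the mu = 0.0, sigma = 1.15 parameters used in the simulations that follow (sampled values will differ slightly from the slide):

```python
import math
import numpy as np
import numpy.random as nr

mu, sigma = 0.0, 1.15
lognorm_mean = math.e ** (mu + sigma ** 2 / 2)

# Sample request latencies and rescale so the mean is 1.0.
samples = nr.lognormal(mu, sigma, 1000000) / lognorm_mean

for p in (50, 75, 95, 99, 99.9):
    print(p, round(float(np.percentile(samples, p)), 1))
```

Note the signature of a heavy tail: the median sits well below the mean, while the 99.9th percentile is an order of magnitude above it.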

SLIDE 27

User-generated content, social, ad-serving, photos

SLIDE 28

import math
import numpy.random as nr

n = 8     # number of servers
m = 1000  # number of requests

mu = 0.0
sigma = 1.15
lognorm_mean = math.e ** (mu + sigma ** 2 / 2)
desired_mean = 1.0

def normalize(value):
    # Rescale so the log-normal request costs have mean 1.0.
    return value / lognorm_mean * desired_mean

bins = [0.0] * n
for weight in nr.lognormal(mu, sigma, m):
    chosen_bin = nr.randint(0, n)
    bins[chosen_bin] += normalize(weight)

print([round(b, 1) for b in bins])
# [128.7, 116.7, 136.1, 153.1, 98.2, 89.1, 125.4, 130.4]  (sample output; varies per run)

SLIDE 29

import math
import numpy.random as nr

n = 8     # number of servers
m = 1000  # number of requests

mu = 0.0
sigma = 1.15
lognorm_mean = math.e ** (mu + sigma ** 2 / 2)
desired_mean = 1.0
baseline = 0.05  # every request has some minimum cost

def normalize(value):
    return (value / lognorm_mean * (desired_mean - baseline) + baseline)

bins = [0.0] * n
for weight in nr.lognormal(mu, sigma, m):
    chosen_bin = nr.randint(0, n)
    bins[chosen_bin] += normalize(weight)

print([round(b, 1) for b in bins])
# [100.7, 137.5, 134.3, 126.2, 113.5, 175.7, 101.6, 113.7]  (sample output; varies per run)

SLIDE 30

THIS IS WHY PERFECTION IS IMPOSSIBLE

SLIDE 31

._.


SLIDE 32

WHAT EFFECT DOES IT HAVE?

SLIDE 33

[CHART: random simulation vs. actual distribution]

SLIDE 34

The probability of a single resource request avoiding the 99th percentile is 99%. The probability of all N resource requests in a page avoiding the 99th percentile is (99% ^ N).

99% ^ 69 = 49.9%
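The arithmetic here is plain independent-probability multiplication and checks out:

```python
# Chance that every one of N independent resource requests
# avoids the 99th-percentile latency: 0.99 ** N.
for n_resources in (1, 10, 69, 100):
    print(n_resources, round(0.99 ** n_resources, 3))
# At 69 resources the probability is ~0.5: half of all page loads
# hit the 99th percentile at least once.
```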

SLIDE 35

SLIDE 36

SO WHAT DO WE DO ABOUT IT?

SLIDE 37

[CHART: random simulation vs. JSQ simulation]

SLIDE 38

SLIDE 39

SLIDE 40

Join-shortest-queue
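A minimal join-shortest-queue sketch in the same balls-into-bins style as the earlier code (not the talk's own listing; "queue length" is approximated here by cumulative load, matching the other simulations):

```python
import numpy.random as nr

n = 8     # number of servers
m = 1000  # number of requests

bins = [0.0] * n
for weight in nr.lognormal(0.0, 1.15, m):
    # JSQ: send each request to the currently least-loaded server.
    chosen_bin = min(range(n), key=lambda i: bins[i])
    bins[chosen_bin] += weight

print([round(b, 1) for b in bins])
```

Picking the minimum requires a global, up-to-date view of every server's load, which is exactly what a distributed balancer does not have.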

SLIDE 41

LET'S THROW A WRENCH INTO THIS...

DISTRIBUTED LOAD BALANCING

AND WHY IT MAKES EVERYTHING HARDER

SLIDE 42

DISTRIBUTED RANDOM IS EXACTLY THE SAME

SLIDE 43

DISTRIBUTED JOIN-SHORTEST-QUEUE IS A NIGHTMARE

SLIDE 44

SLIDE 45

SLIDE 46

SLIDE 47

import math
import numpy.random as nr

n = 8     # number of servers
m = 1000  # number of requests

mu = 0.0
sigma = 1.15
lognorm_mean = math.e ** (mu + sigma ** 2 / 2)
desired_mean = 1.0
baseline = 0.05  # every request has some minimum cost

def normalize(value):
    return (value / lognorm_mean * (desired_mean - baseline) + baseline)

bins = [0.0] * n
for weight in nr.lognormal(mu, sigma, m):
    chosen_bin = nr.randint(0, n)
    bins[chosen_bin] += normalize(weight)

print([round(b, 1) for b in bins])
# [100.7, 137.5, 134.3, 126.2, 113.5, 175.7, 101.6, 113.7]  (sample output; varies per run)

SLIDE 48

import math
import numpy.random as nr

n = 8     # number of servers
m = 1000  # number of requests

mu = 0.0
sigma = 1.15
lognorm_mean = math.e ** (mu + sigma ** 2 / 2)
desired_mean = 1.0
baseline = 0.05

def normalize(value):
    return (value / lognorm_mean * (desired_mean - baseline) + baseline)

bins = [0.0] * n
for weight in nr.lognormal(mu, sigma, m):
    # Power of two choices: pick two servers at random,
    # send the request to the less-loaded one.
    a = nr.randint(0, n)
    b = nr.randint(0, n)
    chosen_bin = a if bins[a] < bins[b] else b
    bins[chosen_bin] += normalize(weight)

print([round(b, 1) for b in bins])
# [130.5, 131.7, 129.7, 132.0, 131.3, 133.2, 129.9, 132.6]  (sample output; varies per run)

SLIDE 49

Pure random:  [100.7, 137.5, 134.3, 126.2, 113.5, 175.7, 101.6, 113.7]  STANDARD DEVIATION: 22.9
Two choices:  [130.5, 131.7, 129.7, 132.0, 131.3, 133.2, 129.9, 132.6]  STANDARD DEVIATION: 1.18
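The two standard deviations can be recomputed directly from the result lists:

```python
import numpy as np

random_bins = [100.7, 137.5, 134.3, 126.2, 113.5, 175.7, 101.6, 113.7]
two_choice_bins = [130.5, 131.7, 129.7, 132.0, 131.3, 133.2, 129.9, 132.6]

print(round(float(np.std(random_bins)), 1))      # 22.9
print(round(float(np.std(two_choice_bins)), 2))  # 1.18
```

A roughly 20x reduction in spread, from one extra random probe per request.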

SLIDE 50

[CHART: random simulation vs. JSQ simulation vs. randomized JSQ simulation]

SLIDE 51

SLIDE 52

SLIDE 53

SLIDE 54

SLIDE 55

ANOTHER CRAZY IDEA

SLIDE 56

SLIDE 57

WRAP UP

SLIDE 58


THANKS BYE

tyler@fastly.com @tbmcmullen
