LOAD BALANCING IS IMPOSSIBLE - PowerPoint PPT Presentation


SLIDE 1

LOAD BALANCING IS IMPOSSIBLE

SLIDE 2

LOAD BALANCING IS IMPOSSIBLE

Tyler McMullen

tyler@fastly.com @tbmcmullen


SLIDE 3

WHAT IS LOAD BALANCING?

SLIDE 4

[DIAGRAM DESCRIBING LOAD BALANCING]

SLIDE 5

[ALLEGORY DESCRIBING LOAD BALANCING]

SLIDE 6


Why Load Balance?

Three major reasons, the least of which is balancing load:

  • Abstraction: treat many servers as one; a single entry point; simplification
  • Failure: transparent failover; recover seamlessly; simplification
  • Balancing load: spread the load efficiently across servers

SLIDE 7

RANDOM

THE INGLORIOUS DEFAULT

AND BANE OF MY EXISTENCE

SLIDE 8


What’s good about random?


  • Simplicity
  • Few edge cases
  • Easy failover
  • Works identically when distributed
SLIDE 9


What’s bad about random?


  • Latency
  • Especially long-tail latency
  • Usable capacity
SLIDE 10

BALLS-INTO-BINS

SLIDE 11

If you throw m balls into n bins, what is the maximum load of any one bin?
SLIDE 12

SLIDE 13

import numpy.random as nr

n = 8     # number of servers
m = 1000  # number of requests

bins = [0] * n
for chosen_bin in nr.randint(0, n, m):
    bins[chosen_bin] += 1

print(bins)
# [129, 100, 134, 113, 117, 136, 148, 123]  (sample output; varies per run)

SLIDE 14

import numpy.random as nr

n = 8     # number of servers
m = 1000  # number of requests

bins = [0.0] * n
for weight in nr.uniform(0, 2, m):  # random request cost, mean 1.0
    chosen_bin = nr.randint(0, n)
    bins[chosen_bin] += weight

print([round(b, 1) for b in bins])
# [133.1, 133.9, 144.7, 124.1, 102.9, 125.4, 114.2, 121.3]  (sample output; varies per run)

SLIDE 15

How do you model request latency?

SLIDE 16

What do Erlang and getting kicked by a horse have in common?

SLIDE 17

POISSON PROCESS
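Request arrivals modeled as a Poisson process have exponentially distributed inter-arrival times. A minimal sketch in the same NumPy style as the deck's simulations (the `rate` of 100 requests/second is an illustrative assumption, not a number from the talk):

```python
import numpy as np
import numpy.random as nr

rate = 100.0  # assumed mean arrival rate, requests per second

# Inter-arrival times in a Poisson process are exponential with mean 1/rate.
gaps = nr.exponential(1.0 / rate, 100000)
arrivals = gaps.cumsum()

# The number of arrivals in each one-second window is Poisson(rate).
counts = np.bincount(arrivals.astype(int))[:-1]  # drop the partial last second
print(counts.mean())  # close to `rate`
```

The point of the model: even with a steady average rate, arrivals cluster, so a server sees bursts no matter how smooth the traffic looks on average.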

SLIDE 18

SLIDE 19

SLIDE 20

SLIDE 21

SLIDE 22

WHY IS THAT A PROBLEM?

SLIDE 23

50ms

SLIDE 24

SLIDE 25

Even if your application has perfect constant response time ... It doesn’t.

SLIDE 26

Log-normal Distribution

MEAN: 1.0

50th percentile: 0.6
75th percentile: 1.2
95th percentile: 3.1
99th percentile: 6.0
99.9th percentile: 14.1
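Percentiles of this shape can be approximated by sampling a log-normal rescaled to mean 1.0. A sketch assuming the mu = 0.0, sigma = 1.15 parameters used in the simulations that follow (sampled values will differ slightly from the slide):

```python
import math
import numpy as np
import numpy.random as nr

mu, sigma = 0.0, 1.15
lognorm_mean = math.e ** (mu + sigma ** 2 / 2)

# Sample request latencies and rescale so the mean is 1.0.
samples = nr.lognormal(mu, sigma, 1000000) / lognorm_mean

for p in (50, 75, 95, 99, 99.9):
    print(p, round(float(np.percentile(samples, p)), 1))
```

Note the signature of a heavy tail: the median sits well below the mean, while the 99.9th percentile is an order of magnitude above it.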

SLIDE 27

User-generated content, social, ad-serving, photos

SLIDE 28

import math
import numpy.random as nr

n = 8     # number of servers
m = 1000  # number of requests

mu = 0.0
sigma = 1.15
lognorm_mean = math.e ** (mu + sigma ** 2 / 2)
desired_mean = 1.0

def normalize(value):
    # Rescale so the log-normal request costs have mean 1.0.
    return value / lognorm_mean * desired_mean

bins = [0.0] * n
for weight in nr.lognormal(mu, sigma, m):
    chosen_bin = nr.randint(0, n)
    bins[chosen_bin] += normalize(weight)

print([round(b, 1) for b in bins])
# [128.7, 116.7, 136.1, 153.1, 98.2, 89.1, 125.4, 130.4]  (sample output; varies per run)

SLIDE 29

import math
import numpy.random as nr

n = 8     # number of servers
m = 1000  # number of requests

mu = 0.0
sigma = 1.15
lognorm_mean = math.e ** (mu + sigma ** 2 / 2)
desired_mean = 1.0
baseline = 0.05  # every request has some minimum cost

def normalize(value):
    return (value / lognorm_mean * (desired_mean - baseline) + baseline)

bins = [0.0] * n
for weight in nr.lognormal(mu, sigma, m):
    chosen_bin = nr.randint(0, n)
    bins[chosen_bin] += normalize(weight)

print([round(b, 1) for b in bins])
# [100.7, 137.5, 134.3, 126.2, 113.5, 175.7, 101.6, 113.7]  (sample output; varies per run)

SLIDE 30

THIS IS WHY PERFECTION IS IMPOSSIBLE

SLIDE 31

._.


SLIDE 32

WHAT EFFECT DOES IT HAVE?

SLIDE 33

[CHART: random simulation vs. actual distribution]

SLIDE 34

The probability of a single resource request avoiding the 99th percentile is 99%. The probability of all N resource requests in a page avoiding the 99th percentile is (99% ^ N).

99% ^ 69 = 49.9%
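The arithmetic here is plain independent-probability multiplication and checks out:

```python
# Chance that every one of N independent resource requests
# avoids the 99th-percentile latency: 0.99 ** N.
for n_resources in (1, 10, 69, 100):
    print(n_resources, round(0.99 ** n_resources, 3))
# At 69 resources the probability is ~0.5: half of all page loads
# hit the 99th percentile at least once.
```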

SLIDE 35

SLIDE 36

SO WHAT DO WE DO ABOUT IT?

SLIDE 37

[CHART: random simulation vs. JSQ simulation]

SLIDE 38

SLIDE 39

SLIDE 40

Join-shortest-queue
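A minimal join-shortest-queue sketch in the same balls-into-bins style as the earlier code (not the talk's own listing; "queue length" is approximated here by cumulative load, matching the other simulations):

```python
import numpy.random as nr

n = 8     # number of servers
m = 1000  # number of requests

bins = [0.0] * n
for weight in nr.lognormal(0.0, 1.15, m):
    # JSQ: send each request to the currently least-loaded server.
    chosen_bin = min(range(n), key=lambda i: bins[i])
    bins[chosen_bin] += weight

print([round(b, 1) for b in bins])
```

Picking the minimum requires a global, up-to-date view of every server's load, which is exactly what a distributed balancer does not have.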

SLIDE 41

LET'S THROW A WRENCH INTO THIS...

DISTRIBUTED LOAD BALANCING

AND WHY IT MAKES EVERYTHING HARDER

SLIDE 42

DISTRIBUTED RANDOM IS EXACTLY THE SAME

SLIDE 43

DISTRIBUTED JOIN-SHORTEST-QUEUE IS A NIGHTMARE

SLIDE 44

SLIDE 45

SLIDE 46

SLIDE 47

import math
import numpy.random as nr

n = 8     # number of servers
m = 1000  # number of requests

mu = 0.0
sigma = 1.15
lognorm_mean = math.e ** (mu + sigma ** 2 / 2)
desired_mean = 1.0
baseline = 0.05  # every request has some minimum cost

def normalize(value):
    return (value / lognorm_mean * (desired_mean - baseline) + baseline)

bins = [0.0] * n
for weight in nr.lognormal(mu, sigma, m):
    chosen_bin = nr.randint(0, n)
    bins[chosen_bin] += normalize(weight)

print([round(b, 1) for b in bins])
# [100.7, 137.5, 134.3, 126.2, 113.5, 175.7, 101.6, 113.7]  (sample output; varies per run)

SLIDE 48

import math
import numpy.random as nr

n = 8     # number of servers
m = 1000  # number of requests

mu = 0.0
sigma = 1.15
lognorm_mean = math.e ** (mu + sigma ** 2 / 2)
desired_mean = 1.0
baseline = 0.05

def normalize(value):
    return (value / lognorm_mean * (desired_mean - baseline) + baseline)

bins = [0.0] * n
for weight in nr.lognormal(mu, sigma, m):
    # Power of two choices: pick two servers at random,
    # send the request to the less-loaded one.
    a = nr.randint(0, n)
    b = nr.randint(0, n)
    chosen_bin = a if bins[a] < bins[b] else b
    bins[chosen_bin] += normalize(weight)

print([round(b, 1) for b in bins])
# [130.5, 131.7, 129.7, 132.0, 131.3, 133.2, 129.9, 132.6]  (sample output; varies per run)

SLIDE 49

Pure random:  [100.7, 137.5, 134.3, 126.2, 113.5, 175.7, 101.6, 113.7]  STANDARD DEVIATION: 22.9
Two choices:  [130.5, 131.7, 129.7, 132.0, 131.3, 133.2, 129.9, 132.6]  STANDARD DEVIATION: 1.18
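The two standard deviations can be recomputed directly from the result lists:

```python
import numpy as np

random_bins = [100.7, 137.5, 134.3, 126.2, 113.5, 175.7, 101.6, 113.7]
two_choice_bins = [130.5, 131.7, 129.7, 132.0, 131.3, 133.2, 129.9, 132.6]

print(round(float(np.std(random_bins)), 1))      # 22.9
print(round(float(np.std(two_choice_bins)), 2))  # 1.18
```

A roughly 20x reduction in spread, from one extra random probe per request.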

SLIDE 50

[CHART: random simulation vs. JSQ simulation vs. randomized JSQ simulation]

SLIDE 51

SLIDE 52

SLIDE 53

SLIDE 54

SLIDE 55

ANOTHER CRAZY IDEA

SLIDE 56

SLIDE 57

WRAP UP

SLIDE 58


THANKS BYE

tyler@fastly.com @tbmcmullen
