
LOAD BALANCING IS IMPOSSIBLE

Tyler McMullen, tyler@fastly.com, @tbmcmullen


  1. LOAD BALANCING IS IMPOSSIBLE

  2. LOAD BALANCING IS IMPOSSIBLE. Tyler McMullen, tyler@fastly.com, @tbmcmullen

  3. WHAT IS LOAD BALANCING?

  4. [DIAGRAM DESCRIBING LOAD BALANCING]

  5. [ALLEGORY DESCRIBING LOAD BALANCING]

  6. Why Load Balance? Three major reasons, the least of which is balancing load.
     • Abstraction: treat many servers as one; single entry point; simplification
     • Failure: transparent failover; recover seamlessly; simplification
     • Balancing load: spread the load efficiently across servers

  7. RANDOM: THE INGLORIOUS DEFAULT AND BANE OF MY EXISTENCE

  8. What’s good about random?
     • Simplicity
     • Few edge cases
     • Easy failover
     • Works identically when distributed

  9. What’s bad about random?
     • Latency, especially long-tail latency
     • Usable capacity

  10. BALLS-INTO-BINS

  11. If you throw m balls into n bins, what is the maximum load of any one bin?

  12. import numpy as np
      import numpy.random as nr

      n = 8     # number of servers
      m = 1000  # number of requests
      bins = [0] * n

      # send each request to a uniformly random server
      for chosen_bin in nr.randint(0, n, m):
          bins[chosen_bin] += 1

      print(bins)
      # output on the slide:
      # [129, 100, 134, 113, 117, 136, 148, 123]
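
The spread in that output is roughly what the binomial distribution predicts. The following is a minimal sketch, not part of the talk, that computes the expected per-server count and its standard deviation for the slide's `n` and `m`, and compares them with the busiest server in the slide's output. The names `expected_load`, `load_stddev`, and `slide_counts` are introduced here for illustration.

```python
import math

n = 8      # number of servers (from the slide)
m = 1000   # number of requests (from the slide)

# Each request lands on a given server with probability 1/n, so the
# per-server count follows a Binomial(m, 1/n) distribution.
p = 1.0 / n
expected_load = m * p
load_stddev = math.sqrt(m * p * (1.0 - p))

# Counts copied from the slide's output.
slide_counts = [129, 100, 134, 113, 117, 136, 148, 123]
worst = max(slide_counts)

print("expected load per server:", expected_load)
print("standard deviation:", round(load_stddev, 2))
print("busiest server, in standard deviations above the mean:",
      round((worst - expected_load) / load_stddev, 2))
```

Even with identical requests, random assignment leaves the busiest of eight servers a couple of standard deviations above the mean in this one run.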

  13. import numpy as np
      import numpy.random as nr

      n = 8     # number of servers
      m = 1000  # number of requests
      bins = [0] * n

      # each request now carries a weight drawn uniformly from [0, 2)
      for weight in nr.uniform(0, 2, m):
          chosen_bin = nr.randint(0, n)
          bins[chosen_bin] += weight

      print([round(float(b), 1) for b in bins])
      # output on the slide:
      # [133.1, 133.9, 144.7, 124.1, 102.9, 125.4, 114.2, 121.3]

  14. How do you model request latency?

  15. What do Erlang and getting kicked by a horse have in common?

  16. POISSON PROCESS
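
Both halves of the preceding question lead to the Poisson distribution: Bortkiewicz's horse-kick death counts and Erlang's telephone traffic work are classic examples of it. As a rough sketch of what a Poisson arrival process looks like for requests, the code below draws exponentially distributed gaps between arrivals and counts how many requests fall in each one-second window. The rate of 100 requests per second, the 1000-second duration, and the function name `poisson_arrival_counts` are assumptions chosen for illustration, not values from the talk.

```python
import random

def poisson_arrival_counts(rate, duration, seed=0):
    """Count arrivals per one-second window for a Poisson process.

    rate: average arrivals per second (hypothetical value).
    duration: number of one-second windows to simulate.
    """
    rng = random.Random(seed)
    counts = [0] * duration
    # Gaps between arrivals in a Poisson process are exponential.
    t = rng.expovariate(rate)
    while t < duration:
        counts[int(t)] += 1
        t += rng.expovariate(rate)
    return counts

counts = poisson_arrival_counts(rate=100.0, duration=1000)
mean_count = sum(counts) / len(counts)

print("mean arrivals per second:", round(mean_count, 1))
print("quietest second:", min(counts))
print("busiest second:", max(counts))
```

The average matches the configured rate, but individual seconds are noticeably quieter or busier, which is the burstiness a load balancer has to absorb.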

  17. WHY IS THAT A PROBLEM?

  18. 50ms

  19. Even if your application has perfect constant response time ... It doesn’t.
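
To illustrate why a constant service time does not give constant response time, here is a small sketch of a single FIFO server fed by Poisson arrivals. It borrows the 50 ms figure from the earlier slide as a hypothetical constant service time; the arrival rate of 16 requests per second (80% utilization) and the function name `simulate_constant_service` are assumptions made for this example only.

```python
import random

def simulate_constant_service(arrival_rate, service_time, requests, seed=0):
    """Response times for one FIFO server with constant service time."""
    rng = random.Random(seed)
    now = 0.0             # arrival time of the current request
    server_free_at = 0.0  # time the server finishes its current backlog
    response_times = []
    for _ in range(requests):
        now += rng.expovariate(arrival_rate)
        # A request waits if the server is still busy when it arrives.
        start = max(now, server_free_at)
        server_free_at = start + service_time
        response_times.append(server_free_at - now)
    return response_times

times = sorted(simulate_constant_service(16.0, 0.050, 10000))
median = times[len(times) // 2]
p99 = times[int(len(times) * 0.99)]

print("fastest response (s):", round(times[0], 3))
print("median response (s):", round(median, 3))
print("99th percentile response (s):", round(p99, 3))
```

No response is faster than the service time, but queueing behind earlier arrivals pushes the tail well above it.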

  20. Log-normal Distribution. Mean: 1.0. Percentiles: 50th: 0.6, 75th: 1.2, 95th: 3.1, 99th: 6.0, 99.9th: 14.1
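
The slide does not say which parameters produced these figures. As a sketch, the code below computes analytic percentiles of a log-normal distribution rescaled to mean 1.0, assuming `mu = 0.0` and `sigma = 1.0`. That `sigma` is a guess made here, not a value from the talk (the code on the later slides uses 1.15). With it, the 50th, 75th, and 95th percentiles come out near the slide's 0.6, 1.2, and 3.1; the tail percentiles do not match exactly, so the slide's numbers may well come from a sample or from different parameters.

```python
import math
from statistics import NormalDist

mu = 0.0
sigma = 1.0  # assumption: not stated on the slide

# Mean of a log-normal distribution, used to rescale to mean 1.0.
lognorm_mean = math.exp(mu + sigma ** 2 / 2)

def percentile(q):
    """Analytic q-quantile of the log-normal, rescaled to mean 1.0."""
    z = NormalDist().inv_cdf(q)
    return math.exp(mu + sigma * z) / lognorm_mean

for q in (0.50, 0.75, 0.95, 0.99, 0.999):
    print("{:.1f}th percentile: {:.2f}".format(q * 100, percentile(q)))
```

Whatever the exact parameters, the shape is the point: the median sits below the mean while the far tail is many times larger.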

  21. User-Generated Content, Social, Ad-serving, Photos

  22. import math

      mu = 0.0
      sigma = 1.15
      lognorm_mean = math.e ** (mu + sigma ** 2 / 2)
      desired_mean = 1.0

      # rescale log-normal samples so their mean is desired_mean
      def normalize(value):
          return value / lognorm_mean * desired_mean

      # n, m, and nr as on slide 13
      bins = [0] * n
      for weight in nr.lognormal(mu, sigma, m):
          chosen_bin = nr.randint(0, n)
          bins[chosen_bin] += normalize(weight)

      # output on the slide:
      # [128.7, 116.7, 136.1, 153.1, 98.2, 89.1, 125.4, 130.4]

  23. import math

      mu = 0.0
      sigma = 1.15
      lognorm_mean = math.e ** (mu + sigma ** 2 / 2)
      desired_mean = 1.0
      baseline = 0.05

      # rescale so the mean stays desired_mean, with a fixed minimum cost
      def normalize(value):
          return (value / lognorm_mean * (desired_mean - baseline)
                  + baseline)

      # n, m, and nr as on slide 13
      bins = [0] * n
      for weight in nr.lognormal(mu, sigma, m):
          chosen_bin = nr.randint(0, n)
          bins[chosen_bin] += normalize(weight)

      # output on the slide:
      # [100.7, 137.5, 134.3, 126.2, 113.5, 175.7, 101.6, 113.7]
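
As a sanity check on the `normalize` function with a baseline, the sketch below draws a large sample and confirms that the rescaled weights still average about 1.0 and never drop below the baseline. It swaps NumPy for the standard library's `random.lognormvariate` so that it runs without extra dependencies; the sample size and seed are arbitrary choices made here.

```python
import math
import random

mu = 0.0
sigma = 1.15
lognorm_mean = math.exp(mu + sigma ** 2 / 2)
desired_mean = 1.0
baseline = 0.05

def normalize(value):
    # Scale the log-normal part to mean (desired_mean - baseline),
    # then add the fixed baseline so the overall mean is desired_mean.
    return value / lognorm_mean * (desired_mean - baseline) + baseline

rng = random.Random(0)  # arbitrary seed, for repeatability
samples = [normalize(rng.lognormvariate(mu, sigma)) for _ in range(200000)]
sample_mean = sum(samples) / len(samples)

print("sample mean:", round(sample_mean, 3))
print("smallest weight:", round(min(samples), 3))
print("largest weight:", round(max(samples), 1))
```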

  24. THIS IS WHY PERFECTION IS IMPOSSIBLE

  25. 1 ._. 2 4

  26. WHAT EFFECT DOES IT HAVE?

  27. [PLOTS: RANDOM SIMULATION, ACTUAL DISTRIBUTION]

  28. The probability of a single resource request avoiding the 99th percentile is 99%. The probability of all N resource requests in a page avoiding the 99th percentile is 0.99^N. For N = 69: 0.99^69 ≈ 49.98%, so roughly half of such page loads hit at least one 99th-percentile response.
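
The arithmetic can be checked in a few lines. This sketch assumes the requests in a page are independent, which is the assumption implicit in multiplying the probabilities; the function name `p_page_avoids_tail` is made up for this example. It also shows why 69 is a natural number to pick: it is the smallest page size for which the chance of avoiding the tail drops below one half.

```python
def p_page_avoids_tail(n_requests, percentile=0.99):
    """Probability that every request in a page avoids the tail.

    Assumes requests are independent of each other.
    """
    return percentile ** n_requests

p = p_page_avoids_tail(69)

# Smallest number of requests for which the page is more likely
# than not to contain at least one tail-latency response.
tipping_point = next(k for k in range(1, 1000)
                     if p_page_avoids_tail(k) < 0.5)

print("P(all 69 requests avoid the 99th percentile):", round(p, 4))
print("smallest N with probability below 50%:", tipping_point)
```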

  29. SO WHAT DO WE DO ABOUT IT?

  30. [PLOTS: RANDOM SIMULATION, JSQ SIMULATION]

  31. Join-shortest-queue
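
In the balls-into-bins model used on the earlier slides, join-shortest-queue can be approximated by always sending the next request to the least-loaded bin. The sketch below does that and compares the result with random assignment on the same weights. It is a simplification made here, not the talk's simulation: a real queue drains as requests complete, whereas these bins only accumulate. It also uses the standard library's `random` module in place of NumPy, and the function names are illustrative.

```python
import math
import random

n = 8      # number of servers
m = 1000   # number of requests
mu, sigma = 0.0, 1.15
lognorm_mean = math.exp(mu + sigma ** 2 / 2)

rng = random.Random(0)  # arbitrary seed, for repeatability
# Log-normal request weights rescaled to mean 1.0.
weights = [rng.lognormvariate(mu, sigma) / lognorm_mean for _ in range(m)]

def assign_random(weights, n, rng):
    bins = [0.0] * n
    for w in weights:
        bins[rng.randrange(n)] += w
    return bins

def assign_least_loaded(weights, n):
    bins = [0.0] * n
    for w in weights:
        # Stand-in for "shortest queue": the bin with the least total load.
        target = min(range(n), key=bins.__getitem__)
        bins[target] += w
    return bins

random_bins = assign_random(weights, n, rng)
jsq_bins = assign_least_loaded(weights, n)

print("random:       ", [round(b, 1) for b in random_bins])
print("least-loaded: ", [round(b, 1) for b in jsq_bins])
```

With least-loaded assignment, the gap between the fullest and emptiest bin can never exceed the largest single request weight, which is why the bins end up so even.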

  32. LET’S THROW A WRENCH INTO THIS... DISTRIBUTED LOAD BALANCING AND WHY IT MAKES EVERYTHING HARDER

  33. DISTRIBUTED RANDOM IS EXACTLY THE SAME

  34. DISTRIBUTED JOIN-SHORTEST-QUEUE IS A NIGHTMARE

  35. import math

      mu = 0.0
      sigma = 1.15
      lognorm_mean = math.e ** (mu + sigma ** 2 / 2)
      desired_mean = 1.0
      baseline = 0.05

      def normalize(value):
          return (value / lognorm_mean * (desired_mean - baseline)
                  + baseline)

      # n, m, and nr as on slide 13
      bins = [0] * n
      for weight in nr.lognormal(mu, sigma, m):
          # purely random choice of server
          chosen_bin = nr.randint(0, n)
          bins[chosen_bin] += normalize(weight)

      # output on the slide:
      # [100.7, 137.5, 134.3, 126.2, 113.5, 175.7, 101.6, 113.7]

  36. import math

      mu = 0.0
      sigma = 1.15
      lognorm_mean = math.e ** (mu + sigma ** 2 / 2)
      desired_mean = 1.0
      baseline = 0.05

      def normalize(value):
          return (value / lognorm_mean * (desired_mean - baseline)
                  + baseline)

      # n, m, and nr as on slide 13
      bins = [0] * n
      for weight in nr.lognormal(mu, sigma, m):
          # pick two servers at random, use the less loaded of the two
          a = nr.randint(0, n)
          b = nr.randint(0, n)
          chosen_bin = a if bins[a] < bins[b] else b
          bins[chosen_bin] += normalize(weight)

      # output on the slide:
      # [130.5, 131.7, 129.7, 132.0, 131.3, 133.2, 129.9, 132.6]

  37. Random: [100.7, 137.5, 134.3, 126.2, 113.5, 175.7, 101.6, 113.7]
      STANDARD DEVIATION: 22.9
      Best of two random choices: [130.5, 131.7, 129.7, 132.0, 131.3, 133.2, 129.9, 132.6]
      STANDARD DEVIATION: 1.18
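
The two standard deviations can be reproduced from the slide's own numbers. This short sketch uses the population standard deviation (`statistics.pstdev`), which is the variant that matches the figures shown; the variable names are introduced here.

```python
from statistics import pstdev

# Per-server loads copied from the slide.
random_bins = [100.7, 137.5, 134.3, 126.2, 113.5, 175.7, 101.6, 113.7]
two_choice_bins = [130.5, 131.7, 129.7, 132.0, 131.3, 133.2, 129.9, 132.6]

print("random:", round(pstdev(random_bins), 1))          # 22.9
print("two choices:", round(pstdev(two_choice_bins), 2))  # 1.18
```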

  38. [PLOTS: RANDOM SIMULATION, JSQ SIMULATION, RANDOMIZED JSQ SIMULATION]

  39. ANOTHER CRAZY IDEA

  40. WRAP UP

  41. THANKS BYE tyler@fastly.com @tbmcmullen
