genuinely distributed byzantine machine learning

Genuinely Distributed Byzantine Machine Learning El-Mahdi El-Mhamdi - PowerPoint PPT Presentation

Genuinely Distributed Byzantine Machine Learning. El-Mahdi El-Mhamdi, Rachid Guerraoui, Arsany Guirguis, Lê Nguyên Hoang, Sébastien Rouault (first.last@epfl.ch). Swiss Federal Institute of Technology (EPFL), August 6, 2020.


  1. Genuinely Distributed Byzantine Machine Learning. El-Mahdi El-Mhamdi, Rachid Guerraoui, Arsany Guirguis, Lê Nguyên Hoang, Sébastien Rouault (first.last@epfl.ch). Swiss Federal Institute of Technology (EPFL), August 6, 2020.

  2-8. The Big Picture. Machine learning (ML) tackles critical tasks, so ML should be made robust. Literature: robust when training the model (4y ago). Genuinely distributed, Byzantine ML.

  9-14. Machine learning (ML). Labels "Boat" and "Goat"; a model with ~1 to 100 million potentiometers. [Animation: the displayed label pair goes from "Krust" / "ZrOm" to "Brust" / "GOrm" to "Bost" / "GOat" and ends at "Boat" / "Goat".]

  15-17. Stochastic Gradient Descent (SGD). Training loop: 1. Estimate the gradient. 2. Turn the potentiometers (~1 to 100 million of them) following the gradient. 3. Loop back to step 1. [Figure: example gradient estimate (4.2, -0.5, -1.0, 0.8, -5.7, 0.3).]
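
The three-step training loop can be sketched in a few lines. This is a minimal illustration, not code from the talk: the linear model, the squared-error loss, the toy data and the names `estimate_gradient` and `sgd` are all assumptions made for the example, and the "potentiometers" are simply the entries of `params`.

```python
import random

def estimate_gradient(params, batch):
    """Gradient of the mean squared error of a linear model on one mini-batch."""
    grad = [0.0] * len(params)
    for x, y in batch:
        err = sum(p * xi for p, xi in zip(params, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / len(batch)
    return grad

def sgd(data, dim, steps=2000, lr=0.05, batch_size=8, seed=0):
    """Plain SGD: estimate the gradient, move against it, repeat."""
    rng = random.Random(seed)
    params = [0.0] * dim
    for _ in range(steps):                      # 3. loop back to step 1
        batch = rng.sample(data, batch_size)
        grad = estimate_gradient(params, batch)  # 1. estimate the gradient
        # 2. "turn the potentiometers": step against the gradient
        params = [p - lr * g for p, g in zip(params, grad)]
    return params

# Hypothetical toy data generated by y = 3*x0 - 2*x1 (noise-free).
data_rng = random.Random(1)
data = []
for _ in range(200):
    x = [data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)]
    data.append((x, 3.0 * x[0] - 2.0 * x[1]))

learned = sgd(data, dim=2)
print(learned)  # close to [3.0, -2.0]
```

A real model would have the ~1 to 100 million parameters mentioned on the slide; two are used here only to keep the example readable.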

  18-21. Distributed SGD. Workers and a parameter server, connected by a network, train a model with ~1 to 100 million potentiometers. [Animation: the workers hold gradient estimates such as (4.2, -0.5, -1.0, 0.8, -5.7, 0.4), (4.1, -0.5, -1.0, 0.7, -5.7, 0.3) and (4.3, -0.5, -0.9, 0.7, -5.7, 0.3), which then appear at the parameter server.]
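
One round of the parameter-server scheme can be sketched as follows, assuming (as is standard, though the transcript does not spell it out) that the server averages the workers' gradient estimates before updating the model. The function names, the learning rate and the all-zero starting parameters are hypothetical; the three gradient vectors are the ones shown on the slides.

```python
def average(gradients):
    """Coordinate-wise mean of the workers' gradient estimates."""
    n = len(gradients)
    return [sum(column) / n for column in zip(*gradients)]

def server_step(params, gradients, lr=0.1):
    """One parameter-server update: aggregate, then step against the aggregate."""
    aggregate = average(gradients)
    return [p - lr * g for p, g in zip(params, aggregate)]

# Gradient estimates received from three honest workers (values from the slides).
worker_gradients = [
    [4.2, -0.5, -1.0, 0.8, -5.7, 0.4],
    [4.1, -0.5, -1.0, 0.7, -5.7, 0.3],
    [4.3, -0.5, -0.9, 0.7, -5.7, 0.3],
]

aggregate = average(worker_gradients)
new_params = server_step([0.0] * 6, worker_gradients)
print(aggregate)
print(new_params)
```

With only honest workers the estimates are close to each other, so their mean is a reasonable gradient estimate.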

  22-25. Distributed, Byzantine SGD. Same setup (parameter server, workers, network). [Animation: some workers now hold the vector (-537, -752, 349, 412, 824, -153) instead of a gradient estimate such as (4.2, -0.5, -1.0, 0.8, -5.7, 0.4), and that vector reaches the parameter server along with the honest ones.]

  26-28. Byzantine-resilient SGD. Inputs at the parameter server: (4.2, -0.5, -1.0, 0.8, -5.7, 0.4), (4.1, -0.5, -1.0, 0.7, -5.7, 0.3) and (-537, -752, 349, 412, 824, -153). Average: output ≈ (-537, -752, 349, 412, 824, -153). MDA, Median, Krum, Bulyan, GeoMed: output ≈ (4.1, -0.5, -1.0, 0.7, -5.7, 0.3). The last frame keeps MDA only.
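
The contrast between plain averaging and a resilient rule can be reproduced on the slide's three vectors. The sketch below is an illustration under assumptions, not the authors' implementation: MDA is read here as minimum-diameter averaging (average the n - f received vectors that are closest together), the median is taken coordinate-wise, and `f` (the number of Byzantine inputs tolerated) is a parameter chosen for the example.

```python
from itertools import combinations
from math import dist
from statistics import median

def average(vectors):
    """Coordinate-wise mean; a single outlier can move it arbitrarily far."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def coordinate_median(vectors):
    """Coordinate-wise median of the received vectors."""
    return [median(column) for column in zip(*vectors)]

def mda(vectors, f):
    """Average the subset of n - f vectors with the smallest diameter.

    Brute force over all subsets, so only suitable for small n.
    """
    def diameter(subset):
        return max((dist(a, b) for a, b in combinations(subset, 2)), default=0.0)
    best = min(combinations(vectors, len(vectors) - f), key=diameter)
    return average(list(best))

received = [
    [4.2, -0.5, -1.0, 0.8, -5.7, 0.4],          # honest
    [4.1, -0.5, -1.0, 0.7, -5.7, 0.3],          # honest
    [-537.0, -752.0, 349.0, 412.0, 824.0, -153.0],  # Byzantine
]

avg = average(received)
med = coordinate_median(received)
robust = mda(received, f=1)
print(avg)     # dragged far away from the honest gradients
print(med)
print(robust)  # stays next to the honest gradients
```

Krum, Bulyan and GeoMed, also named on the slide, are not sketched here.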

  29-32. Problem: single point of failure. Solution: Byzantine Consensus. Nope: asynchronous network.

  33-44. Key problem: divergence. [Animation over nodes labelled A, B, C, D and 1, 2, 3; the 12 frames are identical in this transcript.]

  45. The goal. Can we keep the models (three copies shown, each with ~1 to 100 million potentiometers) "close" to each other, despite network asynchrony and Byzantine behaviors?

  46. Key approach. Can we bring the models back closer to each other, despite network asynchrony and Byzantine behaviors?

  47. Key approach: +1 round. [Figure: nodes labelled A, B, C, D and 1, 2, 3.]

  48-54. Key approach: toy example. 1-parameter model; nodes 1, 2, 3, 4 "& one" (the rest of that label is not in the transcript). [Animation: the diameter of the nodes' values is marked, then a reduced diameter; the later frames show each of 1, 2, 3, 4 twice.]
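
The idea of an extra round that reduces the diameter can be illustrated on a 1-parameter model. The sketch below is only a guess at the flavour of the toy example, under loudly stated assumptions: four honest nodes and one Byzantine node, a synchronous all-to-all exchange (the talk's setting is asynchronous), and a median as the contraction rule. The transcript does not say which rule the authors use, and `extra_round` is a hypothetical name.

```python
from statistics import median

def diameter(values):
    """Largest gap between two honest parameter values."""
    return max(values) - min(values)

def extra_round(honest, byzantine_msgs):
    """One extra exchange round on a 1-parameter model.

    Every honest node receives all honest values plus one value from the
    Byzantine node (which may send a different value to each node) and
    replaces its own parameter by the median of what it received.
    """
    return [median(honest + [b]) for b in byzantine_msgs]

honest = [1.0, 2.0, 3.0, 4.0]
# The Byzantine node tries to pull the honest nodes apart.
byzantine_msgs = [-100.0, 100.0, -100.0, 100.0]

after = extra_round(honest, byzantine_msgs)
print(diameter(honest), diameter(after))  # 3.0 1.0
```

With four honest values and a single Byzantine one, the median of the five always lies between the second and third smallest honest values, so in this simplified setting the diameter cannot grow whatever the Byzantine node sends.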

  55-57. Key approach: last remark. [Figure: the toy example's nodes 1 to 4 "& one", annotated with "×2".]
