
Distributed Deep Learning at Scale - Soumith Chintala, Facebook AI Research



  1. Distributed Deep Learning at Scale Soumith Chintala Facebook AI Research

  2. Overview • Deep Learning Research at FAIR • Deep Learning on GPUs • Deep Learning at scale • Emerging Trends

  3. Deep Learning Research at Facebook AI Research

  4. Image Intelligence: Classification

  5. Image Intelligence Language Translation from Visual Learning

  6. Image Intelligence : Detection

  7. Image Intelligence : Detection

  8. Image Intelligence : Detection

  9. Image Intelligence : Detection [Figure: network diagram. Input x: 3x224x224 passes through VGG to 512x14x14 features. Segmentation branch f_segm(x): 1x1 conv, 512x14x14, 512x1x1, 56x56, output 224x224. Score branch f_score(x): 2x2 pool, 512x7x7, 512x1x1, 1024x1x1, output 1x1]

  10. Image Intelligence : Detection [Figure labels: image, scores]

  11. Image Intelligence : Detection [Figure labels: image, scores (two panels)]

  12. Image Intelligence : Detection

  13. Image Intelligence https://code.facebook.com/posts/accessibility/

  14. Video Intelligence

  15. Image and Video Generation Predicting the Future

  16. Natural Language Understanding chatbots, personal assistants • Memory networks • Language Translation • Reading, Writing and answering Questions

  17. Deep Learning at Scale

  18. Deep Learning at Scale GPU-powered Convolution Neural Networks

  19. Deep Learning at Scale GPU-powered Convolution Neural Networks

  20. Deep Learning at Scale GPU-powered Convolution Neural Networks Alex Krizhevsky

  21. Deep Learning at Scale GPU-powered Convolution Neural Networks Alex Krizhevsky

  22. Deep Learning at Scale GPU-powered Convolution Neural Networks • Convolutions, GEMM take all the time • Faster Convolutions = faster research
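
One way to see why convolutions and GEMM are mentioned together on this slide: a convolution can be lowered to a single matrix multiply by unrolling input patches into rows (often called im2col). The slide does not say which lowering was used, so the following is only a minimal pure-Python sketch of that general idea; the function names `im2col`, `matmul`, `conv2d_direct`, and `conv2d_gemm` are hypothetical and chosen for illustration.

```python
def im2col(image, kh, kw):
    """Unroll every kh x kw patch of a 2-D image into one row."""
    h, w = len(image), len(image[0])
    return [
        [image[i + di][j + dj] for di in range(kh) for dj in range(kw)]
        for i in range(h - kh + 1)
        for j in range(w - kw + 1)
    ]


def matmul(a, b):
    """Plain matrix multiply (the GEMM step), a is m x k and b is k x n."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def conv2d_direct(image, kernel):
    """Reference: 'valid' cross-correlation, which deep learning calls convolution."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(w - kw + 1)
        ]
        for i in range(h - kh + 1)
    ]


def conv2d_gemm(image, kernel):
    """Same result as conv2d_direct, computed as im2col followed by one GEMM."""
    kh, kw = len(kernel), len(kernel[0])
    patches = im2col(image, kh, kw)                      # (out_h*out_w) x (kh*kw)
    flat_kernel = [[k] for row in kernel for k in row]   # (kh*kw) x 1
    flat_out = [r[0] for r in matmul(patches, flat_kernel)]
    out_w = len(image[0]) - kw + 1
    return [flat_out[i:i + out_w] for i in range(0, len(flat_out), out_w)]
```

With many filters, `flat_kernel` gains one column per filter, so the whole layer becomes one large GEMM, which is the kind of workload GPUs handle well.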

  23. Deep Learning at Scale GPU-powered Convolution Neural Networks

  24. Deep Learning at Scale GPU-powered Convolution Neural Networks Winograd transform based Convolutions
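
As an illustration of the Winograd idea named on this slide, the smallest textbook case is F(2,3): two outputs of a 3-tap 1-D filter computed with 4 multiplications instead of 6. This is a sketch of that standard minimal-filtering identity, not the GPU kernels the talk refers to; the function names are hypothetical.

```python
def direct_f23(d, g):
    """Two outputs of a 3-tap filter over 4 inputs: 6 multiplications."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Same two outputs using 4 multiplications (Winograd F(2,3))."""
    # Filter transform: depends only on g, so it can be precomputed once
    # and reused for every input tile.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Input transform (additions only), then the 4 element-wise products.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform (additions only).
    return [m1 + m2 + m3, m2 - m3 - m4]
```

For 3x3 filters in 2-D, the same construction is nested in both dimensions, which is where the larger savings in multiplications come from.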

  25. Deep Learning at Scale GPU-powered Convolution Neural Networks • The standard in deep learning: NVIDIA GPUs + CUDA + cuDNN

  26. Deep Learning at Scale GPU-powered Convolution Neural Networks • Exotic new hardware! • Custom chips (Yunji Chen et al., Nervana Systems)

  27. Deep Learning at Scale Multi-GPU Training • Use multiple GPUs on single machine

  28. Deep Learning at Scale Multi-GPU Training • Data parallel
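
A minimal sketch of what "data parallel" means in practice: every worker holds a full copy of the model, computes a gradient on its own shard of the batch, and the gradients are averaged before one shared update. The toy one-parameter model `y = w * x`, the function names, and the even split of the batch are assumptions made for illustration; real multi-GPU code would replace the Python `sum` with an all-reduce across devices.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean squared error for the one-parameter model y = w * x."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def data_parallel_step(w, xs, ys, num_workers, lr):
    """One SGD step where each worker sees an equal shard of the batch."""
    shard = len(xs) // num_workers  # assumes the batch divides evenly
    grads = [
        grad_mse(w, xs[k * shard:(k + 1) * shard], ys[k * shard:(k + 1) * shard])
        for k in range(num_workers)
    ]
    avg_grad = sum(grads) / num_workers  # stands in for an all-reduce average
    return w - lr * avg_grad
```

With equal-sized shards, the averaged gradient equals the full-batch gradient, so the update matches single-device training on the whole batch; only the gradient exchange is extra work.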

  29. Deep Learning at Scale Multi-GPU Training • Model parallel

  30. Deep Learning at Scale Multi-GPU Training • Pipeline-parallel

  31. Deep Learning at Scale Multi-GPU Training Bottleneck: interconnects

  32. Deep Learning at Scale Multi-Machine Training • Multi-machine SGD Send gradients

  33. Deep Learning at Scale Multi-Machine Training • Multi-machine SGD Send Weights

  34. Deep Learning at Scale Multi-Machine Training • Elastic Averaging SGD! (Sixin Zhang, Anna Choromanska, Yann LeCun)

  35. Deep Learning at Scale Multi-Machine Training • Elastic Averaging SGD! Train synchronously Occasionally, check with master Don't go too far from everyone else

  36. Deep Learning at Scale Multi-Machine Training • Elastic Averaging SGD! Train synchronously Occasionally, check with neighbors Don't go too far from everyone else

  37. Deep Learning at Scale Multi-Machine Training • Elastic Averaging SGD! • Empirical speedup of SquareRoot(N) • N = number of nodes • No communication overhead with pre-fetching • 128 GPUs (32 clients * 4 GPUs) • Sharded parameters over 64 CPU servers • Tau = 10, prefetch = 5 • zero overhead
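
The "occasionally check with master, don't go too far from everyone else" idea can be sketched as follows. This follows the elastic update from the Elastic Averaging SGD paper: every `tau` local steps a worker and the center variable pull toward each other by a fraction `alpha` of their difference, and between exchanges each worker runs plain SGD. The toy loss `0.5 * (x - target)**2`, the sequential simulation of workers, and all default values except `tau=10` (taken from the slide) are assumptions for illustration.

```python
def easgd(num_workers=4, tau=10, steps=200, eta=0.1, alpha=0.2, target=3.0):
    """Simulate elastic averaging on the toy loss 0.5 * (x - target)**2."""
    workers = [float(i) for i in range(num_workers)]  # different starting points
    center = 0.0  # the "master" copy of the parameter
    for t in range(1, steps + 1):
        for i in range(num_workers):
            if t % tau == 0:
                # Elastic exchange: worker and center move toward each other.
                diff = alpha * (workers[i] - center)
                workers[i] -= diff
                center += diff
            # Local SGD step on the worker's own copy of the parameter.
            grad = workers[i] - target
            workers[i] -= eta * grad
    return workers, center
```

Calling `easgd()` returns worker values and a center value that all sit close to `target`, even though the workers communicate only every `tau` steps. As a rough reading of the slide's empirical SquareRoot(N) speedup: the square root of 32 is about 5.7 and the square root of 128 is about 11.3, depending on whether N counts the 32 clients or the 128 GPUs.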

  38. Deep Learning at Scale Multi-Machine Training • Elastic Averaging SGD! • Fun fact: Trained AlexNet in 5 epochs of Imagenet data • Good success in training Vision and Text networks

  39. Big Sur Open Compute for Deep Learning • Serviceability • Thermal Efficiency • Performance

  40. Big Sur Open Compute for Deep Learning • Hot swappable fan modules • Removable GPU baseboard • GPU removal using 2 thumb screws • Swap PCI-e topologies: cables to change topologies with incredible ease • Removable motherboard tray • Rails for in-rack servicing • 2.5” drive carriers

  41. Big Sur PCI-e Topologies — Matter!

  42. Big Sur PCI-e Topologies — Matter!

  43. Torch

  44. Emerging Trends

  45. Emerging Trends Efficient Collectives + Imperative Programs • Data / Model / Pipeline parallel seems sufficient • Torch (nn / autograd / distlearn) • Caffe

  46. Emerging Trends Computational Graph Toolkits • Intel CnC, Caffe, TensorFlow, MXNet, Theano • Graph placement hints + execution • DSLs to write the computation graphs

  47. Silver Bullet Imperative Language + Graph Compiler • Best of both worlds • Hard problem of automatic graph placement • Limited heuristic-driven success

  48. Presence at GTC 2016 If you want to chat in-person, drop us an email • Big Sur Hardware: Kevin Lee kevinlee@fb.com, Doug Wimer dwimer@fb.com, Soumith Chintala soumith@fb.com • Multi-GPU / Multi-machine Training: Nicolas Vasilache ntv@fb.com, Jeff Johnson jhj@fb.com, Soumith Chintala soumith@fb.com • Computation Graphs, Automatic Placement: Jeff Johnson jhj@fb.com, Andrew Tulloch tulloch@fb.com, Yangqing Jia jiayq@fb.com, Soumith Chintala soumith@fb.com
