Transparent parallelization of neural network training


  1. Transparent parallelization of neural network training Cyprien Noel Flickr / Yahoo - GTC 2015 by overthemoon

  2. Outline ▪ Neural Nets at Flickr ▪ Training Fast ▪ Parallel ▪ Distributed ▪ Q&A

  3. Tagging Photos ▪ Classifiers assign each photo a probability per class ▪ Flowers 0.98 ▪ Outdoors 0.95 ▪ Grass 0.6 ▪ Cat 0.001 ▪ Any photo on Flickr is classified using computer vision

  4. Auto Tags Feeding Search

  5. Tagging the Flickr corpus ▪ Classify millions of new photos per day ▪ Apply new models to billions of photos ▪ Train new models using Caffe

  6. Training new models ▪ Manual experimentation ▪ Hyperparameter search ▪ Limitation is training time → Parallelize Caffe

  7. Goals ▪ “Transparent” ▪ Code Isolation ▪ Existing Models ▪ Globally connected layers ▪ Existing Infrastructure

  8. Outline ▪ Neural Nets at Flickr ▪ Training Fast ▪ Parallel ▪ Distributed ▪ Q&A

  9. GoogLeNet, 2014

  10. Ways to Parallelize ▪ Model ▪ Caffe team enabling this now ▪ Data ▪ Synchronous ▪ Asynchronous
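A minimal sketch of the synchronous data-parallel case named above (illustrative only, not Caffe's implementation): each worker computes a gradient on its own slice of the batch, a barrier waits for all of them, and the averaged gradient is applied so every replica keeps identical weights.

```cpp
#include <thread>
#include <vector>

int main() {
  const int kWorkers = 4, kParams = 1000;
  std::vector<float> weights(kParams, 0.0f);
  std::vector<std::vector<float>> grads(kWorkers, std::vector<float>(kParams, 0.0f));

  // Each worker computes a gradient on its own slice of the batch (stubbed here).
  std::vector<std::thread> workers;
  for (int w = 0; w < kWorkers; ++w)
    workers.emplace_back([&, w] {
      for (int i = 0; i < kParams; ++i) grads[w][i] = 0.01f;  // stand-in for backprop
    });
  for (auto& t : workers) t.join();  // barrier: wait for every worker's gradient

  // Average the gradients and apply one SGD step; all replicas see the same update.
  const float lr = 0.01f;
  for (int i = 0; i < kParams; ++i) {
    float g = 0.0f;
    for (int w = 0; w < kWorkers; ++w) g += grads[w][i];
    weights[i] -= lr * g / kWorkers;
  }
  return 0;
}
```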

  11. Outline ▪ Neural Nets at Flickr ▪ Training Fast ▪ Parallel ▪ Distributed ▪ Q&A

  12. First Approach: CPU ▪ Hogwild! (2011) ▪ Cores read and write from shared buffer ▪ No synchronization ▪ Data races surprisingly low
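A minimal Hogwild!-style sketch of the idea on this slide (illustrative; the constants and the dense update are assumptions, and the original algorithm targets sparse gradients): several CPU threads update one shared parameter vector with no locks, tolerating the occasional lost update. Relaxed atomics are used so the races stay benign from the language's point of view.

```cpp
#include <atomic>
#include <thread>
#include <vector>

int main() {
  const int kThreads = 8, kParams = 1 << 16, kSteps = 10000;
  std::vector<std::atomic<float>> w(kParams);
  for (auto& x : w) x.store(0.0f, std::memory_order_relaxed);

  auto worker = [&](unsigned seed) {
    const float lr = 0.01f;
    for (int s = 0; s < kSteps; ++s) {
      int i = (seed = seed * 1664525u + 1013904223u) % kParams;  // pick a coordinate
      float g = 1.0f;                        // stand-in for a sparse gradient component
      // Read-modify-write with no lock: another thread may interleave and win.
      w[i].store(w[i].load(std::memory_order_relaxed) - lr * g,
                 std::memory_order_relaxed);
    }
  };

  std::vector<std::thread> pool;
  for (int t = 0; t < kThreads; ++t) pool.emplace_back(worker, t + 1);
  for (auto& t : pool) t.join();
  return 0;
}
```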

  13. MNIST CPU

  14. Hogwild ▪ Gains plateau as core count grows ▪ Some potential remains: on a grid, or with model parallelism

  15. But we are at GTC

  16. GPU Cluster ▪ A lot of time spent preparing experiments ▪ Code Deployment ▪ Data Handling ▪ On-the-fly datasets for “big data”

  17. Outline ▪ Neural Nets at Flickr ▪ Training Fast ▪ Parallel ▪ Distributed ▪ Q&A

  18. Second Approach: Lots of Boxes

  19. Second Approach: Lots of Boxes ▪ Exchange gradients between nodes ▪ Parameter server setup ▪ Easy: move data fast
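A minimal in-process sketch of the parameter-server pattern mentioned above (illustrative; the ParamServer class and its Push/Pull methods are invented for this example, and in the real setup the exchanges cross the network): workers push gradients to a server that owns the weights and pull fresh weights back.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Toy parameter server: owns the weights, applies pushed gradients, serves pulls.
class ParamServer {
 public:
  explicit ParamServer(std::size_t n) : w_(n, 0.0f) {}
  void Push(const std::vector<float>& grad) {              // worker -> server
    std::lock_guard<std::mutex> lock(mu_);
    for (std::size_t i = 0; i < w_.size(); ++i) w_[i] -= 0.01f * grad[i];
  }
  std::vector<float> Pull() {                               // server -> worker
    std::lock_guard<std::mutex> lock(mu_);
    return w_;
  }
 private:
  std::mutex mu_;
  std::vector<float> w_;
};

int main() {
  ParamServer ps(1000);
  std::vector<std::thread> workers;
  for (int t = 0; t < 4; ++t)
    workers.emplace_back([&ps] {
      for (int step = 0; step < 100; ++step) {
        std::vector<float> w = ps.Pull();              // fetch current weights
        std::vector<float> grad(w.size(), 0.001f);     // stand-in for backprop
        ps.Push(grad);                                 // send the update back
      }
    });
  for (auto& t : workers) t.join();
  return 0;
}
```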

  20. Data path: GPU memory → PCI → Ethernet

  21. Second Approach: Lots of Boxes ▪ 230MB * 2 * N per batch ▪ TCP/UDP chokes ▪ Machines unreachable ▪ No InfiniBand or RoCE
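To make the bandwidth problem concrete (the 4-node count and the link speeds here are illustrative assumptions, not figures from the talk): with roughly 230 MB of parameters, 230 MB × 2 × 4 nodes is about 1.8 GB of parameter traffic per batch. A 10 Gb/s link moves roughly 1.25 GB/s, so communication alone would cost over a second per batch, and plain gigabit Ethernet about ten times that.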

  22. Second Approach: Lots of Boxes ▪ Modify Caffe: chunk parameters
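A minimal sketch of the chunking idea (assumed details; send_chunk and stream_params are hypothetical names, not Caffe functions): walk the flat parameter buffer in fixed-size slices so each message stays small and transfers can overlap with compute.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical transport hook; a real system would hand the slice to a socket or NIC ring.
void send_chunk(const float* data, std::size_t count, std::size_t offset) {
  (void)data; (void)count; (void)offset;   // stub
}

// Stream the flat parameter buffer as fixed-size chunks instead of one huge message.
void stream_params(const std::vector<float>& params, std::size_t chunk_elems) {
  for (std::size_t off = 0; off < params.size(); off += chunk_elems) {
    std::size_t n = std::min(chunk_elems, params.size() - off);
    send_chunk(params.data() + off, n, off);   // receiver applies the slice at `off`
  }
}

int main() {
  std::vector<float> params(60 * 1000 * 1000, 0.0f);   // ~230 MB of float32 weights
  stream_params(params, 64 * 1024);                    // 256 KB chunks
  return 0;
}
```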

  23. packet_mmap ▪ Buffer shared between app and kernel
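packet_mmap is the Linux mechanism behind this slide: the kernel and the application share an mmap'd ring of packet frames, so packets move without a per-packet copy or syscall. A minimal RX-ring setup sketch (illustrative; ring sizes are arbitrary, requires CAP_NET_RAW, error handling mostly omitted):

```cpp
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <arpa/inet.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <unistd.h>

int main() {
  // Raw packet socket that receives every frame.
  int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
  if (fd < 0) return 1;

  // Ask the kernel for an RX ring: 64 blocks of 4 KB, each holding two 2 KB frames.
  tpacket_req req = {};
  req.tp_block_size = 4096;
  req.tp_frame_size = 2048;
  req.tp_block_nr   = 64;
  req.tp_frame_nr   = req.tp_block_nr * (req.tp_block_size / req.tp_frame_size);
  if (setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof(req)) != 0) return 1;

  // Map the ring into user space; the kernel writes frames here, the app reads in place.
  size_t ring_bytes = static_cast<size_t>(req.tp_block_size) * req.tp_block_nr;
  void* ring = mmap(nullptr, ring_bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (ring == MAP_FAILED) return 1;

  // ... poll(fd) and walk the tpacket_hdr frames in `ring` to consume packets ...

  munmap(ring, ring_bytes);
  close(fd);
  return 0;
}
```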

  24. MNIST

  25. ImageNet

  26. NVIDIA ▪ Large Machines ▪ 4 or 8 GPUs ▪ Root PCI switches ▪ InfiniBand

  27. Third Approach: CUDA P2P ▪ GPUs on single machine ▪ Data Feeding ▪ Caffe Pipeline ▪ Async Streams
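A minimal sketch of the CUDA peer-to-peer piece (illustrative; Caffe's actual multi-GPU pipeline is more involved): enable peer access between two GPUs under the same PCI root, then copy a gradient buffer from one device to the other on an asynchronous stream without staging through host memory.

```cpp
#include <cuda_runtime.h>

int main() {
  int can01 = 0, can10 = 0;
  cudaDeviceCanAccessPeer(&can01, 0, 1);
  cudaDeviceCanAccessPeer(&can10, 1, 0);
  if (!can01 || !can10) return 1;      // no common root switch: fall back to host copies

  cudaSetDevice(0); cudaDeviceEnablePeerAccess(1, 0);
  cudaSetDevice(1); cudaDeviceEnablePeerAccess(0, 0);

  const size_t bytes = 230u << 20;     // roughly one full parameter buffer
  float *grad0, *grad1;
  cudaSetDevice(0); cudaMalloc(&grad0, bytes);
  cudaSetDevice(1); cudaMalloc(&grad1, bytes);

  cudaStream_t stream;
  cudaSetDevice(0); cudaStreamCreate(&stream);
  // Pull device 1's gradients into device 0 while other streams keep computing.
  cudaMemcpyPeerAsync(grad0, 0, grad1, 1, bytes, stream);
  cudaStreamSynchronize(stream);

  cudaStreamDestroy(stream);
  cudaFree(grad0);
  cudaSetDevice(1); cudaFree(grad1);
  return 0;
}
```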

  28. State of Things ▪ Async ~8x but no momentum ▪ Sync ~2x ▪ Combining both, and model parallelism ▪ Working on auto-tuning of parameters (batch size, learning rate) ▪ Different ratios of compute vs. I/O

  29. Takeaway ▪ Check out Caffe, including Flickr’s contributions ▪ CUDA + Docker = Love ▪ Small SoC servers might be interesting for ML

  30. Thanks! Flickr vision team Flickr backend team Yahoo labs cypof@yahoo-inc.com
