

SLIDE 1

Distributed Deep Learning

Mathew Salvaris

SLIDE 2

What will be covered

  • Overview of Distributed Training
  • What affects distributed training
  • Network
  • Model
  • Data location
  • Data format
SLIDE 3

Deep Learning Model (CNN)

[Figure: an RGB input image passes through convolution layers with kernels, pooling layers and a fully connected layer; the penultimate layer feeds the output classes Cat, Dog and Mouse.]

SLIDE 4

Distributed training mode: Data parallelism

[Diagram: the dataset is split into subsets; each worker (Worker 1, Worker 2) holds a full copy of the CNN model and trains on its own subset, coordinated by a job manager.]
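In practice, each worker runs the same training script in its own process and gradients are averaged across workers after every batch. A minimal sketch of this pattern using PyTorch DistributedDataParallel (used here only as an illustration; the benchmarks in this deck use Horovod and the TensorFlow benchmark scripts), with a toy CNN and random tensors standing in for the real model and dataset:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

dist.init_process_group(backend="nccl")                   # one process per GPU, e.g. launched with torchrun
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

# Synthetic stand-in for the dataset: 1024 random 224x224 RGB images, 3 classes
dataset = TensorDataset(torch.randn(1024, 3, 224, 224), torch.randint(0, 3, (1024,)))
sampler = DistributedSampler(dataset)                     # each worker sees only its own subset
loader = DataLoader(dataset, batch_size=64, sampler=sampler)

# Every worker holds a full replica of the model (a toy CNN standing in for ResNet50)
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, stride=2), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(), torch.nn.Linear(16, 3),
).cuda()
model = DDP(model, device_ids=[local_rank])

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
for images, labels in loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(images.cuda()), labels.cuda())
    loss.backward()                                        # gradients are averaged across all workers here
    optimizer.step()
```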

SLIDE 5

Distributed training mode: Model parallelism

[Diagram: the CNN model itself is split across workers; Worker 1 and Worker 2 each hold part of the model and process the same data subset (Subset 1), coordinated by a job manager.]
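A minimal sketch of model parallelism, assuming a single node with two GPUs: the convolutional half of a toy network lives on cuda:0 and the classifier on cuda:1, so each GPU holds only part of the weights and activations, and every batch flows through both devices:

```python
import torch
import torch.nn as nn

class TwoGPUCNN(nn.Module):
    """Toy CNN split across two GPUs: features on cuda:0, classifier on cuda:1."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        ).to("cuda:0")
        self.classifier = nn.Linear(64, 3).to("cuda:1")

    def forward(self, x):
        x = self.features(x.to("cuda:0"))
        return self.classifier(x.to("cuda:1"))   # hand activations over to the second GPU

model = TwoGPUCNN()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

images = torch.randn(64, 3, 224, 224)            # one synthetic batch
labels = torch.randint(0, 3, (64,), device="cuda:1")
loss = nn.functional.cross_entropy(model(images), labels)
loss.backward()                                  # autograd carries gradients back across the device boundary
optimizer.step()
```

Unlike data parallelism, only the activations at the split point cross devices, which is why this scales to models too large for one GPU but keeps each GPU busy only part of the time.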

SLIDE 6

Data parallelism vs model parallelism

Data parallelism

  • Easier implementation
  • Stronger fault tolerance
  • Higher cluster utilization

Model parallelism

  • Better scalability of large models
  • Less memory on each GPU
SLIDE 7

Horovod: Ring All Reduce
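Horovod averages gradients with a ring all-reduce over NCCL or MPI instead of a central parameter server. A minimal sketch of the PyTorch bindings (launched with horovodrun, one process per GPU; the toy model below is only a stand-in for ResNet50):

```python
import torch
import horovod.torch as hvd

hvd.init()                                       # one process per GPU, launched with horovodrun/mpirun
torch.cuda.set_device(hvd.local_rank())

model = torch.nn.Sequential(                     # toy model standing in for ResNet50
    torch.nn.Conv2d(3, 16, 3, stride=2), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(), torch.nn.Linear(16, 3),
).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# DistributedOptimizer hooks into backward() and averages gradients via ring all-reduce
optimizer = hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters())
hvd.broadcast_parameters(model.state_dict(), root_rank=0)   # identical starting weights on every worker
```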

SLIDE 8

Effects of Network, Model and Precision

SLIDE 9

Setup

  • Clusters of 8 nodes using K80, P40, P100 and V100 GPUs (4 GPUs per node + InfiniBand)
  • Two MPI configurations: OpenMPI + NCCL and Intel MPI

SLIDE 10

Experiments

  • 345 experiments across many different models, including ResNet50, MobileNet V2, etc.
  • Synthetic data
  • Batch size of 64 across all models and GPUs
  • Uses the benchmarking scripts from TensorFlow

SLIDE 11

Distributed training with synthetic data

[Diagram: a compute pool of GPU nodes training on synthetic data, with no external storage involved.]

SLIDE 12

Single GPU


SLIDE 13

32 GPUs

SLIDE 14

32 GPUs


SLIDE 15

MobileNet


SLIDE 16

MobileNet


SLIDE 17

[Chart: for each GPU type (K80, P40, P100, V100), the time it takes to transfer weights between GPUs (data transfer) versus the time it takes to process a batch on the GPU (batch execution).]

SLIDE 18

ResNet50 Full Precision vs Mixed Precision [32 V100s]

[Chart: full precision [64] reaches 6,629 images/second at 54% scaling efficiency; mixed precision [256] reaches 23,436 images/second at 82% scaling efficiency.]
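The slides do not show how mixed precision was enabled (the TensorFlow benchmark scripts control it through flags); a minimal sketch using the tf.keras mixed-precision API, assuming TensorFlow 2.4 or later:

```python
import tensorflow as tf

# Layers compute in float16 while keeping float32 master weights; on V100 Tensor Cores
# this roughly halves memory per image, which is what allows the larger batch size.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.applications.ResNet50(weights=None)
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")
```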

SLIDE 19

Effects of Storage

SLIDE 20

Experiments

  • ResNet50 across three frameworks [PyTorch, TensorFlow, Keras]
  • Real and synthetic data; real data on local, NFS and Blob storage
  • Batch size of 64 across all configurations
  • V100 GPUs

SLIDE 21

Distributed training with NFS

[Diagram: the compute pool reads training data from an NFS share mounted on each node; data is copied onto the share from a mounted fileshare.]

SLIDE 22

Distributed training with blob storage

[Diagram: the compute pool reads training data from blob storage mounted on each node; data is copied onto it from a mounted fileshare.]

SLIDE 23

Distributed training with local storage

[Diagram: training data is copied from a mounted fileshare onto each node's local storage, which the compute pool then reads during training.]

SLIDE 24

ResNet50 - Relative performance across storage

[Chart: relative throughput (0 to 1) for TensorFlow, Keras and PyTorch with synthetic data, local (SSD), NFS, premium blob and blob storage.]

SLIDE 25

Data Loaders and Preprocessors

Keras Data Loader

Simple, with no parameters for buffering or parallelizing

PyTorch Data Loader

Specify the number of preprocessing workers with num_workers
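A minimal sketch of the PyTorch loader with parallel workers; the /data/train path is a hypothetical ImageNet-style folder:

```python
import torch
from torchvision import datasets, transforms

# num_workers background processes decode and augment images while the GPU trains
# on the previous batch; pin_memory speeds up host-to-GPU copies.
train_set = datasets.ImageFolder(
    "/data/train",   # hypothetical path
    transforms.Compose([transforms.RandomResizedCrop(224), transforms.ToTensor()]),
)
loader = torch.utils.data.DataLoader(
    train_set, batch_size=64, shuffle=True, num_workers=8, pin_memory=True,
)
```

Too few workers starves the GPU; too many can thrash CPU and I/O, so the right value depends on the node.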

SLIDE 26

TensorFlow

  • Highly configurable
  • Many options: buffer, shuffle, cache and shard
  • Daunting and easy to get wrong

https://www.tensorflow.org/guide/performance/datasets
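A minimal sketch of such a pipeline, assuming TFRecords that hold a JPEG image and an integer label; the path, worker count and feature names are placeholders:

```python
import tensorflow as tf

NUM_WORKERS, WORKER_INDEX = 4, 0                 # would come from the cluster configuration

def parse_example(record):
    # Minimal parser for records with a JPEG 'image' feature and an int64 'label' feature
    features = tf.io.parse_single_example(record, {
        "image": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    })
    image = tf.image.resize(tf.io.decode_jpeg(features["image"], channels=3), [224, 224])
    return image, features["label"]

dataset = (tf.data.Dataset.list_files("/data/train/*.tfrecord")    # hypothetical path
           .shard(NUM_WORKERS, WORKER_INDEX)                       # each worker reads its own shard
           .interleave(tf.data.TFRecordDataset, num_parallel_calls=tf.data.AUTOTUNE)
           .shuffle(10_000)
           .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
           .batch(64)
           .prefetch(tf.data.AUTOTUNE))                            # overlap input pipeline with training
```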

SLIDE 27

Effects of Data Type

SLIDE 28

TensorFlow Records

  • Binary data format created for TensorFlow – recommended format for TensorFlow
  • Can aggregate many examples into a smaller number of TFRecords – efficient for transferring and reading in the cloud
  • Have to export data to the format – has to be tailored to the use case
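A minimal sketch of exporting images into a TFRecord file; the file names, labels and feature names are placeholders to be tailored to the use case:

```python
import tensorflow as tf

def _bytes(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

# Pack many (image, label) pairs into a single TFRecord file
image_paths, labels = ["cat.jpg", "dog.jpg"], [0, 1]                # placeholder data
with tf.io.TFRecordWriter("train-00000.tfrecord") as writer:
    for path, label in zip(image_paths, labels):
        example = tf.train.Example(features=tf.train.Features(feature={
            "image": _bytes(tf.io.read_file(path).numpy()),          # raw JPEG bytes
            "label": _int64(label),
        }))
        writer.write(example.SerializeToString())
```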
SLIDE 29

ResNet50 – Data Type Performance [Average]

[Chart: average images/second for synthetic data, images and TFRecords at 8, 16 and 32 GPUs.]

SLIDE 30

ResNet50 – Data Format Performance [Maximum]

[Chart: maximum images/second for synthetic data, images and TFRecords at 8, 16 and 32 GPUs.]

SLIDE 31

Things not discussed

  • Asynchronous distributed training
  • Tradeoff between batch size and other parameters
  • Optimization of the TensorFlow pipeline
  • Other data formats such as Parquet (Petastorm)
  • Transform libraries [albumentations]
  • Distributed file systems (BeeGFS) and other storage (GlusterFS, Lustre, etc.)
  • Models other than CNNs

SLIDE 32

Summary

  • Do use enhanced networking wherever possible, especially for the latest GPUs
  • Distributed training is not recommended for small models
  • Do use TFRecords or other columnar or row-based data formats
  • Not all data loaders are equal

SLIDE 33

Thanks & Questions?