DEEP LEARNING MODELS – Mathew Salvaris – PowerPoint PPT Presentation

SLIDE 1

DISTRIBUTED TRAINING OF DEEP LEARNING MODELS

Mathew Salvaris @msalvaris Ilia Karmanov @ikdeepl Miguel Fierro @miguelgfierro

SLIDE 2

Mathew Salvaris (@msalvaris) – Ilia Karmanov (@ikdeepl) – Miguel Fierro (@miguelgfierro)

more info: https://github.com/ilkarman/DeepLearningFrameworks

Rosetta Stone of Deep Learning

SLIDE 3

ImageNet Competition

[Chart: ImageNet top-5 error (%) by model]

  AlexNet (2012)             15.3
  VGG (2014)                  7.3
  Inception (2015)            6.7
  ResNet (2015)               3.6
  Inception-ResNet (2016)     3.1
  NASNet (2017)               3.8
  AmoebaNet (2017)            3.8
  ResNeXt Instagram (2018)    2.4
  Human performance           5.1

SLIDE 4

Distributed training mode: Data parallelism

[Diagram: the job manager splits the dataset into subsets; Worker 1 and Worker 2 each hold a full copy of the CNN model and train on their own subset.]
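
The scheme above can be sketched in a few lines of plain Python. The one-weight linear model, squared loss, and learning rate below are illustrative assumptions, not anything from the talk:

```python
# Toy data parallelism: each "worker" holds a full copy of the model
# weight and computes the gradient of 0.5*(w*x - y)^2 on its own shard.

def grad_on_shard(w, shard):
    """Mean gradient of the squared loss over one data shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]   # y = 2x
shards = [data[:2], data[2:]]   # the job manager splits the dataset
w = 0.0                         # every worker starts from the same weight

for step in range(100):
    grads = [grad_on_shard(w, s) for s in shards]   # computed in parallel
    w -= 0.1 * sum(grads) / len(grads)              # aggregate: mean gradient

print(round(w, 3))   # converges towards w = 2
```

Averaging the shard gradients reproduces the full-batch gradient exactly, which is why every worker can safely apply the same update.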

SLIDE 5

Distributed training mode: Model parallelism

[Diagram: the job manager splits the CNN model into submodels; Worker 1 and Worker 2 each train their own submodel, and both read the full dataset.]
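
A matching toy sketch for model parallelism (each "submodel" here is a single multiplication, an illustrative assumption): the only data crossing the worker boundary is the intermediate activation.

```python
# Toy model parallelism: the model w2 * (w1 * x) is split across two
# workers; Worker 1 computes the lower half, Worker 2 the upper half.

def submodel_1(x, w1):   # held by Worker 1 (e.g. lower CNN layers)
    return w1 * x

def submodel_2(h, w2):   # held by Worker 2 (e.g. upper/FC layers)
    return w2 * h

w1, w2 = 3.0, 2.0
x = 5.0
h = submodel_1(x, w1)    # activation sent from Worker 1 to Worker 2
y = submodel_2(h, w2)
print(y)                 # 30.0, identical to running the whole model on one device
```

Each worker only ever stores its own submodel's weights, which is how model parallelism reduces per-GPU memory.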

SLIDE 6

Data parallelism vs model parallelism

Data parallelism
▪ Easier implementation
▪ Stronger fault tolerance
▪ Higher cluster utilization

Model parallelism
▪ Better scalability of large models
▪ Less memory on each GPU

Why not both? Data parallelism for the CNN layers and model parallelism in the FC layers.

Source: Alex Krizhevsky. 2014. One weird trick for parallelizing convolutional neural networks. https://arxiv.org/abs/1404.5997

SLIDE 7

Training strategies: parameter averaging

[Diagram: Worker 1 and Worker 2 each train a CNN model on their own subset; the new global model is the average of the weights from each worker.]
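
The strategy can be sketched as follows (toy one-parameter model, learning rate, and step counts are illustrative assumptions): each worker runs a few local SGD steps on its own subset, then all weights are replaced by their average.

```python
# Toy parameter averaging: workers train independently for a few
# steps, then the job manager averages their weights.

def local_sgd(w, shard, steps=5, lr=0.1):
    """A few local SGD steps on one worker's subset."""
    for _ in range(steps):
        g = sum((w * x - y) * x for x, y in shard) / len(shard)
        w -= lr * g
    return w

shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]   # y = 2x
w = 0.0
for communication_round in range(20):
    local_weights = [local_sgd(w, s) for s in shards]   # independent workers
    w = sum(local_weights) / len(local_weights)         # average of weights

print(round(w, 3))   # approaches w = 2
```

Unlike gradient averaging, the workers' weights drift apart between communication rounds, which is why averaging frequency matters in practice.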

SLIDE 8

Training strategies: distributed gradient based

[Diagram: Worker 1 and Worker 2 each train a CNN model on their own subset and send their gradients to be aggregated.]

Gradient updates can be applied synchronously or asynchronously.
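
A toy contrast of the two update modes (pure Python; the single-process loop standing in for a parameter server and two workers is an illustrative assumption):

```python
# Synchronous vs asynchronous gradient updates on a shared weight.

def grad(w, x, y):
    return (w * x - y) * x          # gradient of 0.5*(w*x - y)^2

data = [(1.0, 2.0), (2.0, 4.0)]     # one example per worker, y = 2x

# Synchronous: wait for every worker, then apply the mean gradient.
w_sync = 0.0
for _ in range(50):
    gs = [grad(w_sync, x, y) for x, y in data]   # all see the same weights
    w_sync -= 0.1 * sum(gs) / len(gs)

# Asynchronous: each worker updates the shared weight as soon as it is
# done, so later workers see weights already moved by earlier ones.
w_async = 0.0
for _ in range(50):
    for x, y in data:               # no barrier between the workers
        w_async -= 0.1 * grad(w_async, x, y)

print(round(w_sync, 3), round(w_async, 3))   # both approach w = 2
```

Synchronous updates are easier to reason about but wait for the slowest worker; asynchronous updates avoid the barrier at the cost of applying slightly stale gradients.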

SLIDE 9

Overview of distributed training

▪ Install software and containers
▪ Provision clusters of VMs
▪ Schedule jobs
▪ Distribute data
▪ Share results
▪ Handle failures
▪ Scale resources

SLIDE 10

Azure Distributed Platforms
▪ Batch AI
▪ Batch Shipyard
▪ DL Workspace

Horovod

SLIDE 11

Batch Shipyard

https://github.com/Azure/batch-shipyard

  • Supports Docker and Singularity: run your Docker and Singularity containers within the same job, side by side or even concurrently
  • Move data easily between locally accessible storage systems, remote filesystems, Azure Blob or File Storage, and compute nodes
  • Supports local storage, Azure Blob or File Storage, and NFS
  • Low priority nodes
SLIDE 12

Batch AI

https://github.com/Azure/BatchAI

  • Supports running in Docker containers as well as on the Data Science Virtual Machine
  • Supports local storage, Azure Blob or File Storage, and NFS
  • Low priority nodes
SLIDE 13

DL Workspace

https://github.com/Microsoft/DLWorkspace

  • Runs jobs inside Docker
  • Uses Kubernetes
  • Can be deployed anywhere, not just Azure
  • Supports local storage and NFS
SLIDE 14

Training with Batch AI

1) Create the scripts to run on Batch AI and transfer them to file storage
2) Write the data to storage
3) Create the Docker containers for each DL framework and transfer them to a container registry

SLIDE 15

1) Create a Batch AI Pool
2) Each job pulls in the appropriate container and script, and loads its data from the chosen storage
3) Once the job is completed, all the results are written to the file share

SLIDE 16

Batch AI Interface

CLI

az batchai cluster create

  --name nc24r
  --image UbuntuLTS
  --vm-size Standard_NC24rs_v3
  --min 8 --max 8
  --afs-name $FILESHARE_NAME
  --afs-mount-path extfs
  --storage-account-name $STORAGE_ACCOUNT_NAME
  --storage-account-key $storage_account_key
  --nfs $NFS_NAME
  --nfs-mount-path nfs

Python SDK

SLIDE 17

Distributed training with NFS

▪ Batch AI cluster configuration with NFS share

[Diagram: Batch AI Pool with the NFS share and a mounted file share; the data is copied to the NFS share.]

az batchai cluster create

  --name nc24r
  --image UbuntuLTS
  --vm-size Standard_NC24rs_v3
  --min 8 --max 8
  --afs-name $FILESHARE_NAME
  --afs-mount-path extfs
  --storage-account-name $STORAGE_ACCOUNT_NAME
  --storage-account-key $storage_account_key
  --nfs $NFS_NAME
  --nfs-mount-path nfs
SLIDE 18

Distributed training with blob storage

▪ Batch AI cluster configuration with mounted blob

[Diagram: Batch AI Pool with a mounted blob container and a mounted file share; the data is copied to blob storage.]

az batchai cluster create

  --name nc24r
  --image UbuntuLTS
  --vm-size Standard_NC24rs_v3
  --min 8 --max 8
  --afs-name $FILESHARE_NAME
  --afs-mount-path extfs
  --container-name $CONTAINER_NAME
  --container-mount-path extcn
  --storage-account-name $STORAGE_ACCOUNT_NAME
  --storage-account-key $storage_account_key
SLIDE 19

Distributed training with local storage

▪ Batch AI cluster configuration with copying the data to the nodes

[Diagram: Batch AI Pool with a mounted file share; the node-preparation configuration copies the data onto each node's local storage.]

az batchai cluster create

  --name nc24r
  --image UbuntuLTS
  --vm-size Standard_NC24r
  --min 8 --max 8
  --afs-name $FILESHARE_NAME
  --afs-mount-path extfs
  --container-name $CONTAINER_NAME
  --container-mount-path extcn
  --storage-account-name $STORAGE_ACCOUNT_NAME
  --storage-account-key $storage_account_key
  -c cluster.json


SLIDE 20

Distributed training Results

[Chart: training throughput in images/second]

SLIDE 21

Distributed training Results

[Chart: training throughput in images/second]

SLIDE 22

Distributed training Results

[Chart: training throughput in images/second]

SLIDE 23

Distributed training with Horovod
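
Horovod's core primitive is ring-allreduce: each worker exchanges gradient chunks only with its ring neighbour, so per-worker traffic stays constant as workers are added. The pure-Python simulation below illustrates the algorithm's two phases, reduce-scatter and all-gather; it is a sketch of the idea, not Horovod's actual implementation.

```python
def ring_allreduce(vectors):
    """Sum equal-length vectors across 'workers' with the ring algorithm."""
    p = len(vectors)
    assert all(len(v) == p for v in vectors)   # one chunk per worker, for brevity
    chunks = [list(v) for v in vectors]        # chunks[i][c]: worker i's copy of chunk c

    # Phase 1 -- reduce-scatter: after p-1 rounds, worker i holds the
    # complete sum of chunk (i + 1) % p.
    for s in range(p - 1):
        snapshot = [row[:] for row in chunks]  # all sends happen "simultaneously"
        for i in range(p):
            c = (i - s) % p                    # chunk worker i forwards this round
            chunks[(i + 1) % p][c] += snapshot[i][c]

    # Phase 2 -- all-gather: completed chunks travel once around the ring.
    for s in range(p - 1):
        snapshot = [row[:] for row in chunks]
        for i in range(p):
            c = (i + 1 - s) % p                # completed chunk worker i forwards
            chunks[(i + 1) % p][c] = snapshot[i][c]
    return chunks

grads = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0], [100.0, 200.0, 300.0]]
print(ring_allreduce(grads)[0])   # [111.0, 222.0, 333.0] on every worker
```

Each worker sends 2(p-1) chunks in total, i.e. roughly twice the gradient size regardless of p, which is what makes the scheme bandwidth-optimal.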

SLIDE 24

Distributed training with Horovod

SLIDE 25

Distributed training with Horovod

SLIDE 26

Distributed training with PyTorch

SLIDE 27

Distributed training with Chainer

SLIDE 28

Distributed training with CNTK

▪ 1-bit SGD with MPI
▪ Block Momentum with MPI
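
The idea behind 1-bit SGD is to quantize every gradient element down to one bit (its sign, times a shared scale) before communication, and to carry the quantization error forward into the next minibatch so nothing is lost on average. A plain-Python sketch of that error-feedback trick (a simplification, not CNTK's code; the scale rule and values are illustrative assumptions):

```python
# 1-bit gradient quantization with error feedback (toy sketch).

def one_bit_quantize(grad, residual):
    g = [gi + ri for gi, ri in zip(grad, residual)]   # add carried-over error
    scale = sum(abs(v) for v in g) / len(g)           # one float per message
    q = [scale if v >= 0 else -scale for v in g]      # 1 bit per element
    new_residual = [v - qv for v, qv in zip(g, q)]    # error carried forward
    return q, new_residual

res = [0.0, 0.0, 0.0]
total_true = [0.0, 0.0, 0.0]   # what full-precision SGD would have sent
total_sent = [0.0, 0.0, 0.0]   # what was actually transmitted
for g in ([0.5, -1.0, 0.1], [0.4, -0.8, 0.2], [0.6, -1.2, 0.0]):
    q, res = one_bit_quantize(g, res)
    total_true = [a + b for a, b in zip(total_true, g)]
    total_sent = [a + b for a, b in zip(total_sent, q)]

# Transmitted total plus the leftover residual recovers the true total
# (up to float rounding), even though each message was 1 bit/element.
recovered = [s + r for s, r in zip(total_sent, res)]
print(recovered, total_true)
```

The error feedback is what keeps such aggressive quantization from biasing training: over many minibatches the transmitted gradients telescope back to the full-precision sum.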

SLIDE 29

Demo

SLIDE 30

Acknowledgements

Hongzhi Li Alex Sutton Alex Yukhanov

Attribution of some images: http://morguefile.com/

SLIDE 31

Thanks!

Mathew Salvaris @msalvaris Ilia Karmanov @ikdeepl Miguel Fierro @miguelgfierro