
SLIDE 1

Containerizing Deep Learning Frameworks with Singularity

Rengan Xu, Frank Han, Nishanth Dandapanthula HPC & AI Solutions Engineering, Dell EMC

SLIDE 2

Agenda

  • Dell EMC HPC & AI Solutions Engineering
  • Why use containers?
  • Singularity Containers
  • Singularity vs Docker
  • Interoperability between Singularity and Docker
  • Singularity workflow
  • Containerizing DL frameworks
  • Issues and workarounds
  • e.g., Caffe2
  • Performance Results
  • Horovod + TensorFlow
  • MXNet
  • Caffe2
SLIDE 3

Dell EMC HPC & AI Solutions Engineering


  • Design, develop and integrate HPC systems
  • Act as the focal point for joint R&D activities

HPC & AI Innovation Lab

  • Prototype and evaluate advanced technologies
  • Conduct application performance studies and develop best practices

SLIDE 4

Containers and Virtual Machines: A Recap

source: https://www.docker.com/what-container

  • Containers have no hypervisor
  • Containers have no guest OS
SLIDE 5

Need for Containerization

  • Why do we need containers?
    – Simplify application building
    – Application isolation
    – Faster application deployment
    – Validate and reproduce results
    – Server consolidation / server efficiency
    – Can be deployed on bare metal or on virtual machines
  • Benefits of containers
    – Lightweight
    – Low overhead
    – Easier application sharing among users
    – Reproducibility
  • Example containers
    – LXC
    – Docker
    – Singularity

SLIDE 6

Singularity vs Docker

| Feature | Singularity | Docker |
| --- | --- | --- |
| Multiple containers can run on the same hardware | Yes | Yes |
| Can be created and destroyed more quickly than VMs | Yes | Yes |
| Does not need an entire OS, only a core runtime | Yes | Yes |
| Easily transferable to other machines | Yes | Yes |
| Image format | Single file | Layered image |
| Use with HPC schedulers | Yes | No |
| Native support for MPI | Yes | No |
| Support for GPUs | Yes | No |
| Root-owned daemon process | No | Yes |

SLIDE 7

Singularity: Workflow Summary

source: http://singularity.lbl.gov/docs-flow
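The workflow diagram from the linked page did not survive extraction. A hedged summary of the Singularity 2.x development flow it describes (command syntax per Singularity 2.4; the recipe and image names are illustrative):

```bash
# Development cycle: build a writable sandbox, modify it interactively,
# then produce a compressed read-only image for production runs.
sudo singularity build --sandbox dev_sandbox/ caffe2.def   # writable dev tree
sudo singularity shell --writable dev_sandbox/             # tweak interactively
sudo singularity build production.simg dev_sandbox/        # immutable image
```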

SLIDE 8

Interoperability between Singularity and Docker

  • Create a Singularity image from Docker Hub:
    $ singularity pull docker://tensorflow/tensorflow
  • Create a Singularity image from the Nvidia GPU Cloud (NGC) Docker registry:
    $ export SREGISTRY_NVIDIA_BASE="nvcr.io"
    $ export SREGISTRY_CLIENT=nvidia
    $ export SREGISTRY_NVIDIA_USERNAME='$oauthtoken'
    $ export SREGISTRY_NVIDIA_TOKEN='[NGC_API_KEY]'
    $ sregistry pull nvidia://tensorflow:17.11
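A quick way to verify the pulled image (a hedged example; the .simg file name depends on the defaults of your Singularity and sregistry versions):

```bash
# Run Python inside the image pulled from Docker Hub to confirm that
# TensorFlow imports; adjust the file name to whatever the pull produced.
$ singularity exec tensorflow.simg python -c "import tensorflow as tf; print(tf.__version__)"
```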
SLIDE 9

Singularity MPI

  • Has built-in support for all MPI implementations (OpenMPI, MPICH, Intel MPI, etc.)
  • Host MPI version must be newer than or equal to the version inside the container

  • Example:
    – mpirun -np 4 singularity exec centos_ompi.img /usr/bin/mpi_ring

source: https://wikihub.berkeley.edu/download/attachments/129695919/Containers_in_HPC_summary_Singularity.pdf

Here, mpirun runs on the host, while mpi_ring runs inside the container.
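A hedged multi-node sketch of the same hybrid model (hostfile and rank count are illustrative): the host's mpirun launches one singularity process per rank, and each rank executes the MPI binary inside the image:

```bash
# 2 nodes x 4 ranks: the host MPI does the launching and the network
# transport; the application binary itself lives in the container.
mpirun -np 8 -hostfile ./hosts \
    singularity exec centos_ompi.img /usr/bin/mpi_ring
```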

SLIDE 10

Challenges and Workarounds

  • Why containerize DL frameworks?
    – Every DL framework has too many dependencies
    – Each dependent library has specific version requirements
    – All DL frameworks change frequently
    – The best-supported OS for most DL frameworks is Ubuntu, whereas datacenter deployments are RHEL/CentOS

  • Why we moved to Singularity
    – To scale containerized deep learning frameworks past a single node

  • Issues faced with Singularity
    – PCIe device driver mismatch
  • Workarounds (see the sketch after this list)
    – GPUs
      › The container should always use the host GPU driver
      › Create symbolic links for all GPU-driver-related files and bind them into the container
      › Update to the latest drivers, since they are backward compatible
    – InfiniBand
      › The InfiniBand driver is kernel dependent; the solution is to make the container OS and host OS compatible so that the container reuses the host's InfiniBand driver and libraries
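A hedged sketch of the GPU workaround (the host library path, bind targets, and image name are assumptions; the exact file list depends on how the driver was installed):

```bash
# Bind the host's GPU driver user-space libraries (and the nvidia-smi
# binary) into the container so the container always uses the host
# driver, then sanity-check with nvidia-smi. /usr/lib64/nvidia is a
# common runfile/RPM install location; adjust to your host.
export SINGULARITYENV_LD_LIBRARY_PATH=/host_nvidia_libs
singularity exec \
    -B /usr/lib64/nvidia:/host_nvidia_libs \
    -B /usr/bin/nvidia-smi:/usr/bin/nvidia-smi \
    centos7_caffe2_dev_sandbox nvidia-smi
```

Newer Singularity releases (2.3+) automate much of this with the --nv flag, which locates the host's GPU libraries and binds them in automatically.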

SLIDE 11

Singularity recipe for Caffe2
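The recipe appears only as a screenshot in the original deck. A minimal sketch of such a recipe (Singularity 2.x syntax; the base image, package list, and paths are assumptions, not the authors' exact file):

```
# caffe2.def - hedged sketch of a Singularity recipe for Caffe2
Bootstrap: docker
From: nvidia/cuda:9.0-cudnn7-devel-centos7

%post
    yum -y install epel-release
    yum -y install git cmake3 make gcc-c++ python-devel python-pip \
        protobuf-devel leveldb-devel lmdb-devel opencv-devel
    pip install numpy protobuf future
    # Clone the era-appropriate Caffe2 repository (it later merged into
    # pytorch/pytorch); it is compiled in a later step, inside the sandbox.
    git clone --recursive https://github.com/caffe2/caffe2.git /opt/caffe2

%environment
    export PYTHONPATH=/opt/caffe2/build:$PYTHONPATH
```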

SLIDE 12

Building the container
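The build command is likewise shown only as a screenshot. A hedged reconstruction (Singularity 2.4 syntax; the sandbox name matches the run command on a later slide, the recipe name is assumed):

```bash
# Build a writable sandbox directory from the recipe so that Caffe2
# can be compiled inside it in the next step (root is required).
sudo singularity build --sandbox centos7_caffe2_dev_sandbox caffe2.def
```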

SLIDE 13

Build Caffe2 inside the container
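Again shown only as a screenshot; a hedged sketch of compiling Caffe2 inside the writable sandbox (the /opt/caffe2 path follows the recipe sketch above):

```bash
# Open the sandbox writable and compile Caffe2 with CUDA and NCCL support.
sudo singularity exec --writable centos7_caffe2_dev_sandbox /bin/bash -c '
    cd /opt/caffe2 && mkdir -p build && cd build &&
    cmake3 -DUSE_CUDA=ON -DUSE_NCCL=ON .. &&
    make -j"$(nproc)" install
'
```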

SLIDE 14

Run the container

    (${mpirun_options} ${profile_options} \
        singularity exec -s /bin/bash \
        -B $host_paths -B $PWD:/mnt \
        -B /usr/lib64:/ibverb_libs -B /etc/libibverbs.d -B /sys/class/infiniband_verbs \
        centos7_caffe2_dev_sandbox /mnt/caffe2_singularity_cmd.sh \
        ${WORK_DIR} ${gpu_arch} ${gpus_per_node} $network ${run_id} ${num_nodes} \
        $epochs $profile $debug $mpi) >& $train_log
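The -B flags bind the host paths, the working directory, and the host's InfiniBand libraries and configuration into the container. The contents of caffe2_singularity_cmd.sh are not shown in the deck; a purely illustrative sketch of what such a wrapper might do:

```bash
#!/bin/bash
# caffe2_singularity_cmd.sh - hypothetical wrapper (the authors' real
# script is not shown). It receives the positional arguments from the
# singularity exec line and launches the Caffe2 ResNet-50 trainer.
WORK_DIR=$1; gpu_arch=$2; gpus_per_node=$3; network=$4; run_id=$5
# Make the host InfiniBand libraries bound at /ibverb_libs visible.
export LD_LIBRARY_PATH=/ibverb_libs:$LD_LIBRARY_PATH
cd "$WORK_DIR"
python caffe2/python/examples/resnet50_trainer.py --num_gpus "$gpus_per_node"
```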

SLIDE 15

Testbed

  • 8 Dell EMC PowerEdge C4140 nodes
    – In the process of updating to 32 nodes with NVLink
  • Nvidia V100-PCIe GPUs
  • Intel Xeon Skylake CPUs
  • Mellanox 100 Gbps EDR InfiniBand
  • CUDA 9.0, cuDNN 7.0, NCCL 2.0
  • Dataset: ILSVRC 2012
SLIDE 16

Performance Results – MXNet

  • In FP32 mode, batch size: 64 per GPU
  • In FP16 mode, batch size: 128 per GPU
  • IPoIB and rsync are used for inter-node communication
  • Speedup on 32 V100s is 29.4x in FP32 and 25.8x in FP16

Performance difference between Singularity and bare-metal: 0.2% to 1.9% across all configurations.

[Chart: MXNet ResNet-50 throughput (images/sec) for 1 to 32 V100 GPUs, FP32 and FP16, Singularity vs bare-metal]

SLIDE 17

Performance Results – Horovod + TensorFlow

  • In FP32 mode, batch size: 128 per GPU
  • In FP16 mode, batch size: 256 per GPU
  • MPI is used for multi-node communication
  • Speedup on 32 V100s is 22.4x in FP32 and 23.7x in FP16

Performance difference between Singularity and bare-metal: 0.0% to 1.6% across all configurations.

[Chart: Horovod+TensorFlow ResNet-50 throughput (images/sec) for 1 to 32 V100 GPUs, FP32 and FP16, Singularity vs bare-metal]

SLIDE 18

Performance Results – Caffe2

  • In FP32 mode, batch size: 64 per GPU
  • In FP16 mode, batch size: 128 per GPU
  • Redis and IPoIB are used for inter-node communication
  • Caffe2 performance is unstable on multiple nodes

Performance difference between Singularity and bare-metal: 0.0% to 5.1% across all configurations.

[Chart: Caffe2 ResNet-50 throughput (images/sec) for 1 to 32 V100 GPUs, FP32 and FP16, Singularity vs bare-metal]

SLIDE 19

Conclusions and Future Work

  • Conclusions
    – Singularity simplifies building and deploying DL frameworks on both single-node and multi-node systems
    – Easy to use Singularity on GPU servers
    – Straightforward to run MPI over the InfiniBand interconnect
    – No performance loss compared to bare-metal
  • Future Work
    – File system impact on DL models
    – Scaling impact on DL model accuracy
    – Research on neural networks with model parallelism
    – Case studies with appropriate DL models
  • Build optimal solutions targeted to the DL vertical
SLIDE 20

www.hpcatdell.com

{Rengan.Xu,Frank.Han1,Nishanth.Dandapanthu}@Dell.com