BREAKING THE BARRIERS TO AI-SCALE IN THE ENTERPRISE
Charlie Boyle, Senior Director, DGX Systems
DEEP LEARNING DATA CENTER
Reference Architecture
DISTRIBUTED DEEP LEARNING
SINGLE GPU (1x GPU)
DATA PARALLEL (4x GPU)
MODEL PARALLEL (4x GPU)
DATA AND MODEL PARALLEL (16x GPU)
Data and model parallel training yields increasingly faster time-to-solution
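The data parallel case above can be illustrated with a minimal sketch: each worker computes a gradient on its own shard of the batch, and the averaged result matches the single-GPU gradient. The toy model y = w * x, the helper names, and the 4-worker split are assumptions chosen for illustration; they are not from the slides, and no real GPUs or communication libraries are involved.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean squared error for the toy model y = w * x."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def data_parallel_grad(w, xs, ys, num_workers):
    """Split the batch into equal shards, one per worker, and average the
    per-shard gradients (the role an all-reduce plays on real hardware)."""
    shard = len(xs) // num_workers
    grads = [
        grad_mse(w, xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
        for i in range(num_workers)
    ]
    return sum(grads) / num_workers


# Hypothetical batch of 8 samples generated from y = 3 * x
xs = [float(i) for i in range(1, 9)]
ys = [3.0 * x for x in xs]
w = 1.0

full = grad_mse(w, xs, ys)                               # "single GPU"
parallel = data_parallel_grad(w, xs, ys, num_workers=4)  # "4x GPU"
print(full, parallel)
```

With equal-sized shards the two gradients agree, which is why data parallelism can shorten training time without changing the update each step applies.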
Today’s business needs to scale-out AI, without scaling-up cost or complexity
effortless operations
time-to-solution
NVIDIA Tesla V100 32GB
Two GPU Boards: 8 V100 32GB GPUs per board, 6 NVSwitches per board, 512GB Total HBM2 Memory, interconnected by Plane Card
Twelve NVSwitches: 2.4 TB/sec bi-section bandwidth
Eight EDR InfiniBand/100 GigE: 1600 Gb/sec Total Bi-directional Bandwidth
PCIe Switch Complex
Two Intel Xeon Platinum CPUs
1.5 TB System Memory
30 TB NVMe SSDs Internal Storage
Dual 10/25 Gb/sec Ethernet
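The totals quoted above can be checked with a few lines of arithmetic. This is only a sketch: the per-board and per-port figures come from the slide, while the 300 GB/sec of NVLink bandwidth per V100 is an assumption brought in to reproduce the 2.4 TB/sec bi-section figure.

```python
# Figures quoted on the slide
GPU_BOARDS = 2
GPUS_PER_BOARD = 8
HBM2_PER_GPU_GB = 32
NETWORK_PORTS = 8
PORT_SPEED_GBPS = 100  # EDR InfiniBand / 100 GigE, per direction

# Assumption (not stated on the slide): NVLink bandwidth per V100 in GB/sec
NVLINK_PER_GPU_GBS = 300

total_gpus = GPU_BOARDS * GPUS_PER_BOARD
total_hbm2_gb = total_gpus * HBM2_PER_GPU_GB
network_bidir_gbps = NETWORK_PORTS * PORT_SPEED_GBPS * 2  # both directions
# Bi-section: traffic between the two boards, 8 GPUs on each side
bisection_tbs = GPUS_PER_BOARD * NVLINK_PER_GPU_GBS / 1000

print(f"GPUs: {total_gpus}")
print(f"Total HBM2: {total_hbm2_gb} GB")
print(f"Network, bi-directional: {network_bidir_gbps} Gb/sec")
print(f"Bi-section bandwidth: {bisection_tbs} TB/sec")
```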
10 Times Faster
Time to Train (days): DGX-1 with V100 (Sep '17): 15 days; DGX-2 (Q3 '18): 1.5 days
Workload: FairSeq, 55 epochs to solution. PyTorch training performance.
software improvements across the stack including NCCL, cuDNN, etc.
package design
interconnect
per GPU
that demands unrestricted model parallelism
broadband, NVSwitch delivers a networking fabric for the future, today
bandwidth, equivalent to a PCIe bus with 1,200 lanes
HD <45s
Rapidly ingest the largest datasets into cache
transferring huge datasets
storage technologies
bandwidth (vs. 2 GB/sec for 7TB
Faster, more resilient, boot and storage management
DL and HPC applications
2.7GHz, 24 cores
Grow your DL cluster effortlessly, using the connectivity you prefer
Ethernet (RoCE)
Bandwidth with low-latency
Dual 10/25 Gb/sec
Enable your own private DL Training Cloud for your Enterprise
simultaneously access DGX-2
each user and their experiments
within VMs, either all GPUs or as few as 1
NVIDIA GPUs
AWS-EC2 | GCP | Azure | DGX
NVIDIA CONTAINER RUNTIME
KUBERNETES
NVIDIA GPU CLOUD
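As a rough illustration of how the layers above might meet in practice, the sketch below builds a Kubernetes Pod manifest that requests GPUs for a container. It assumes a cluster where the NVIDIA device plugin exposes the `nvidia.com/gpu` resource; the pod name, the helper function, and the image tag placeholder `TAG` are hypothetical and not taken from the slides.

```python
import json


def gpu_pod_manifest(name, image, gpus):
    """Return a Pod manifest (as a dict) that requests `gpus` NVIDIA GPUs.

    Assumes the NVIDIA device plugin is installed in the cluster, which is
    what makes the `nvidia.com/gpu` resource name available.
    """
    if gpus < 1:
        raise ValueError("request at least one GPU")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


# Hypothetical names: "dl-train" and the image tag "TAG" are placeholders.
manifest = gpu_pod_manifest("dl-train", "nvcr.io/nvidia/pytorch:TAG", gpus=1)
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and submitted with `kubectl apply -f`, since Kubernetes accepts JSON manifests as well as YAML.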
LIMITLESS DEEP LEARNING FOR EXPLORATION WITHOUT BOUNDARIES
The World’s Most Powerful Deep Learning System for the Most Complex Deep Learning Challenges
For More Information: nvidia.com/dgx-2