Building On-prem GPU Training Infrastructure By Stephen Balaban - - PowerPoint PPT Presentation


SLIDE 1

Building On-prem GPU Training Infrastructure

By Stephen Balaban CEO, Lambda

SLIDE 2
SLIDE 3

Lambda Customers

SLIDE 4

About Me

  • Started using CNNs for face recognition in 2012.
  • First employee at Perceptio. We developed image recognition CNNs that ran locally on the iPhone. Acquired by Apple in 2015.

  • Published in SPIE and NeurIPS.
SLIDE 5

Workshop Structure

  • Audience survey
  • Presentation w/ Q&A
  • Q&A + Workshop
SLIDE 6

5 Stages of GPU Cloud Grief

SLIDE 7

It all starts with the Shock of an expensive AWS bill.

SLIDE 8

Stage 1 - Denial “This won’t happen again next month.”

SLIDE 9

Stage 2 - Anger “The bill doubled again!”

SLIDE 10

Stage 3 - Bargaining with your account manager.

SLIDE 11

Stage 4 - Depression “Spot instances and reserved instances aren’t enough; this is hopeless.”

SLIDE 12

Stage 5 - Acceptance “GPU cloud services are expensive. Managing hardware is scary.”

SLIDE 13

Hardware: A Quick Rundown

  1. GPUs
  2. CPUs
  3. GPU-GPU Bandwidth & PCIe Topology
SLIDE 14

GPUs

SLIDE 15

GPU Speed Comparisons

Source: https://lambdalabs.com/blog/titan-rtx-tensorflow-benchmarks/
SLIDE 16

Performance / $

Source: https://lambdalabs.com/blog/best-gpu-tensorflow-2080-ti-vs-v100-vs-titan-v-vs-1080-ti-benchmark/
SLIDE 17

CPUs

SLIDE 18

What to look for

Source: https://lambdalabs.com/blog/best-gpu-tensorflow-2080-ti-vs-v100-vs-titan-v-vs-1080-ti-benchmark/
  1. Number of PCIe lanes. (Affects total bandwidth.)
  2. NUMA node topology. (Affects GPU peering.)
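As a rough illustration of why these bullets matter: eight GPUs each wanting a full 16x link need far more lanes than one CPU provides, which is why the systems later in this deck use PCIe switches. The lane arithmetic can be sketched below; the 40-lane CPU budget is an assumed typical figure, not a number from the slides.

```python
# Back-of-the-envelope lane budget (illustrative numbers, not from the slides):
# why 8-GPU systems need PCIe switches such as the PEX 8748/8796.
GPUS = 8
LANES_PER_GPU = 16      # each GPU wants a full 16x link
CPU_LANES = 40          # assumed typical PCIe lane count for one CPU

lanes_wanted = GPUS * LANES_PER_GPU
print(f"Lanes wanted: {lanes_wanted}, lanes available per CPU: {CPU_LANES}")
print(f"Oversubscription without switches: {lanes_wanted / CPU_LANES:.1f}x")
```

With these assumptions the GPUs want 128 lanes against a 40-lane budget, so switches (or a second CPU socket) are unavoidable.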
SLIDE 19

GPU Peering & PCIe Topology

SLIDE 20

PCIe Topology

[Diagram: PCIe topology; each link is a 16x PCIe connection]
SLIDE 21

Dual Root PCIe Topology

Each arrow is a 16x PCIe connection.

[Diagram: two CPUs joined by a CPU-CPU interconnect. Each CPU connects to two PEX 8748 switches; GPUs 0–3 sit behind CPU 0's switches and GPUs 4–7 behind CPU 1's. Source: Lambda]
SLIDE 22 Source: Lambda

Single Root PCIe Topology

[Diagram: a single CPU connects to two PEX 8796 switches; GPUs 0–3 sit behind one switch and GPUs 4–7 behind the other. Each arrow is a 16x PCIe connection.]
SLIDE 23 Source: Lambda

Cascaded PCIe Topology

[Diagram: the CPU connects to one PEX 8796 switch serving GPUs 0–3; a second PEX 8796 switch is cascaded off the first and serves GPUs 4–7. Each arrow is a 16x PCIe connection.]
SLIDE 24 Source: Lambda

NVLink System Topology

Each plain arrow is a 16x PCIe connection; a green double arrow is an NVLink connection; an open circle marks CPU-CPU communication.

[Diagram: the same dual root layout as before (two CPUs joined by an interconnect, each with two PEX 8748 switches; GPUs 0–3 behind CPU 0, GPUs 4–7 behind CPU 1), with NVLink connections added between the GPUs.]
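The practical difference between these layouts can be sketched as a toy model; this is my own illustration, not Lambda's tooling. The rule of thumb it encodes: GPUDirect peer-to-peer over PCIe generally requires both GPUs to sit under the same root complex (the same CPU).

```python
# Toy model (my own sketch, not from the slides): represent each topology as
# a parent map and check whether two GPUs reach the same root complex.

def root_of(node, parent):
    """Follow parent links up to the root complex (a CPU)."""
    while node in parent:
        node = parent[node]
    return node

def can_peer(a, b, parent):
    """PCIe P2P is possible only when both GPUs share a root complex."""
    return root_of(a, parent) == root_of(b, parent)

# Dual root: GPUs 0-3 behind CPU 0's switches, GPUs 4-7 behind CPU 1's.
dual_root = {f"gpu{i}": f"pex{i // 2}" for i in range(8)}
dual_root.update({"pex0": "cpu0", "pex1": "cpu0", "pex2": "cpu1", "pex3": "cpu1"})

# Single root: two PEX 8796 switches under one CPU, four GPUs behind each.
single_root = {f"gpu{i}": f"pex{i // 4}" for i in range(8)}
single_root.update({"pex0": "cpu0", "pex1": "cpu0"})

print(can_peer("gpu0", "gpu7", dual_root))    # False: different root complexes
print(can_peer("gpu0", "gpu7", single_root))  # True: same root complex
```

On a real machine, `nvidia-smi topo -m` prints the actual GPU-to-GPU link matrix.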
SLIDE 25

Real Life Examples

SLIDE 26 Source: ASUS
SLIDE 27 Source: Supermicro

Single Root Complex vs Dual Root Complex

  • Single Root Complex: 4029GP-TRT2
  • Dual Root Complex: 4028GR-TRT
SLIDE 28

1080 Ti GPUDirect Peer-to-Peer Bandwidth Benchmark

[Chart: GPUDirect peer-to-peer bandwidth over 16x PCIe links. Source: Lambda]
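For context when reading a P2P bandwidth benchmark, the theoretical one-direction ceiling of a 16x PCIe 3.0 link can be computed from the per-lane transfer rate and the 128b/130b line encoding. This is a standard back-of-the-envelope figure, not a number taken from the slide.

```python
# Rough theoretical ceiling for one direction of a PCIe 3.0 16x link,
# a useful yardstick when reading P2P bandwidth benchmark numbers.
GT_PER_S = 8.0          # PCIe 3.0 transfer rate per lane (GT/s)
ENCODING = 128 / 130    # 128b/130b line encoding efficiency
LANES = 16

gb_per_s = GT_PER_S * ENCODING * LANES / 8  # divide by 8: bits -> bytes
print(f"Theoretical 16x PCIe 3.0 bandwidth: {gb_per_s:.2f} GB/s per direction")
```

Measured P2P numbers land below this ceiling because of protocol and switch-hop overhead.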
SLIDE 29 Source: Lambda

No Peering on the new 2080 Ti

Topology used in this experiment. (For the 1080 Ti, no NVLink.)

SLIDE 30

Lambda Stack = GPU-enabled Frameworks

Source: https://lambdalabs.com/lambda-stack-deep-learning-software

For Ubuntu 16.04 or 18.04. One command:

LAMBDA_REPO=$(mktemp) && \
wget -O${LAMBDA_REPO} https://lambdalabs.com/static/misc/lambda-stack-repo.deb && \
sudo dpkg -i ${LAMBDA_REPO} && rm -f ${LAMBDA_REPO} && \
sudo apt-get update && sudo apt-get install -y lambda-stack-cuda

Also comes as a Docker Container.

SLIDE 31

Cost Comparison: On-prem vs. Cloud

p3dn.24xlarge Instance

  • AWS: $160,308 / year with reserved pricing
  • Lambda Hyperplane: $109,008 once

(Add $15,000 / year if you want to co-locate instead.)
SLIDE 32

Cost Comparison: On-prem vs. Cloud

p3.16xlarge Instance

  • AWS: $139,371 / year with reserved pricing
  • Lambda Blade: $28,389 once

(Add $15,000 / year if you want to co-locate instead.)
SLIDE 33

Cost Comparison: On-prem vs. Cloud

p3.8xlarge Instance

  • AWS: $69,729 / year with reserved pricing
  • Lambda Quad: $12,472 once
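Using the prices quoted across the three cost-comparison slides, a simple break-even sketch shows how quickly each machine pays for itself against AWS reserved pricing. This ignores co-location, power, and depreciation, so treat it as a first approximation.

```python
# Break-even sketch from the prices quoted on the cost-comparison slides:
# months of AWS reserved pricing needed to cover the one-time machine cost.
comparisons = {
    "Hyperplane vs p3dn.24xlarge": (109_008, 160_308),
    "Blade vs p3.16xlarge":        (28_389, 139_371),
    "Quad vs p3.8xlarge":          (12_472, 69_729),
}
for name, (once, per_year) in comparisons.items():
    months = once / per_year * 12
    print(f"{name}: pays for itself in {months:.1f} months")
```

Even the worst case here (the Hyperplane) breaks even in well under a year of equivalent reserved-instance spend.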

SLIDE 34

Thank You!

Tweet @LambdaAPI @stephenbalaban LAMBDALABS.COM/BLOG