Building On-prem GPU Training Infrastructure
By Stephen Balaban, CEO, Lambda
Lambda Customers
About Me
Started using CNNs for face recognition in 2012. First employee at Perceptio. We developed image recognition CNNs that ran locally on the iPhone. Acquired by Apple in 2015.
Workshop Structure
5 Stages of GPU Cloud Grief
It all starts with the Shock of an expensive AWS bill.
Stage 1 - Denial “This won’t happen again next month.”
Stage 2 - Anger “The bill doubled again!”
Stage 3 - Bargaining with your account manager.
Stage 4 - Depression “Spot instances and reserved instances aren’t enough, this is hopeless.”
Stage 5 - Acceptance “GPU cloud services are expensive. Managing hardware is scary.”
Hardware: A Quick Rundown
GPU Speed Comparisons
Source: https://lambdalabs.com/blog/titan-rtx-tensorflow-benchmarks/
Performance / $
Source: https://lambdalabs.com/blog/best-gpu-tensorflow-2080-ti-vs-v100-vs-titan-v-vs-1080-ti-benchmark/
What to look for
Source: https://lambdalabs.com/blog/best-gpu-tensorflow-2080-ti-vs-v100-vs-titan-v-vs-1080-ti-benchmark/
PCIe Topology
Diagram: 16x PCIe links.
Dual Root PCIe Topology
Diagram: two CPUs joined by a CPU-CPU interconnect, four PEX 8748 switches, GPUs 0-7. Each arrow is a 16x PCIe connection. Source: Lambda
Single Root PCIe Topology
Diagram: one CPU, two PEX 8796 switches, GPUs 0-7. Each arrow is a 16x PCIe connection.
Cascaded PCIe Topology
Diagram: one CPU, two PEX 8796 switches, GPUs 0-7. Each arrow is a 16x PCIe connection.
NVLink System Topology
Diagram: two CPUs joined by a CPU-CPU interconnect, four PEX 8748 switches, GPUs 0-7. Each arrow is a 16x PCIe connection; each green double arrow is NVLink; the open circle is CPU-CPU communication.
Single Root Complex vs Dual Root Complex
Single Root Complex (4029GP-TRT2)
Dual Root Complex (4028GR-TRT)
1080 Ti GPUDirect Peer-to-Peer Bandwidth Benchmark
Diagram: 16x PCIe links. Source: Lambda
No Peering on the new 2080 Ti
Topology used in this experiment. (For the 1080 Ti, no NVLink.)
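One way to see which of the topologies above a given machine actually has is to ask the NVIDIA driver tools. The sketch below is not from the slides: it assumes `nvidia-smi` is installed and simply returns the text of `nvidia-smi topo -m`, which prints a GPU-to-GPU connection matrix along with the tool's own legend for the link labels. The function name `gpu_topology_matrix` is made up for this example.

```python
import shutil
import subprocess


def gpu_topology_matrix():
    """Return the text printed by `nvidia-smi topo -m`, or None if unavailable.

    Hypothetical helper for illustration; it does not interpret the matrix.
    """
    if shutil.which("nvidia-smi") is None:
        return None  # NVIDIA driver tools are not on PATH
    try:
        result = subprocess.run(
            ["nvidia-smi", "topo", "-m"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return result.stdout


if __name__ == "__main__":
    matrix = gpu_topology_matrix()
    print(matrix if matrix is not None else "nvidia-smi not available")
```

Reading the matrix is left to the legend that `nvidia-smi` prints underneath it, since the exact labels depend on the driver version.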
Lambda Stack = GPU-enabled Frameworks
Source: https://lambdalabs.com/lambda-stack-deep-learning-software
For Ubuntu 16.04 or 18.04. One command:
LAMBDA_REPO=$(mktemp) && \
wget -O${LAMBDA_REPO} https://lambdalabs.com/static/misc/lambda-stack-repo.deb && \
sudo dpkg -i ${LAMBDA_REPO} && rm -f ${LAMBDA_REPO} && \
sudo apt-get update && sudo apt-get install -y lambda-stack-cuda
Also comes as a Docker container.
Cost Comparison: On-prem vs. Cloud
p3dn.24xlarge Instance
AWS: $160,308/year with reserved pricing
Lambda Hyperplane: $109,008 once
(Add $15,000 / year if you want to co-locate instead.)
Cost Comparison: On-prem vs. Cloud
p3.16xlarge Instance
AWS: $139,371/year with reserved pricing
Lambda Blade: $28,389 once
(Add $15,000 / year if you want to co-locate instead.)
Cost Comparison: On-prem vs. Cloud
p3.8xlarge Instance
AWS: $69,729/year with reserved pricing
Lambda Quad: $12,472 once
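The three comparisons above can be turned into a rough break-even estimate. The sketch below uses only the prices quoted on the slides and assumes, for simplicity, that cloud spend is spread evenly over the year and that power, staffing, and depreciation are ignored; those simplifications, and the function name `breakeven_months`, are assumptions of this example rather than part of the talk.

```python
def breakeven_months(cloud_per_year, onprem_once, colo_per_year=0.0):
    """Months until cumulative cloud spend matches the one-time on-prem price.

    Assumes even monthly cloud billing; an optional co-location fee reduces
    the monthly saving. Power, staff, and depreciation are not modeled.
    """
    monthly_saving = (cloud_per_year - colo_per_year) / 12.0
    if monthly_saving <= 0:
        return float("inf")  # on-prem never pays for itself under these inputs
    return onprem_once / monthly_saving


# (label, AWS reserved price per year, Lambda one-time price), from the slides
COMPARISONS = [
    ("p3dn.24xlarge vs Lambda Hyperplane", 160308, 109008),
    ("p3.16xlarge vs Lambda Blade", 139371, 28389),
    ("p3.8xlarge vs Lambda Quad", 69729, 12472),
]

if __name__ == "__main__":
    for label, aws_per_year, lambda_once in COMPARISONS:
        months = breakeven_months(aws_per_year, lambda_once)
        print(f"{label}: about {months:.1f} months to break even")
```

Passing `colo_per_year=15000` models the co-location option mentioned on the Hyperplane and Blade slides.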
Thank You!
Tweet @LambdaAPI @stephenbalaban LAMBDALABS.COM/BLOG