Building On-prem GPU Training Infrastructure By Stephen Balaban - - PowerPoint PPT Presentation


SLIDE 1

Building On-prem GPU Training Infrastructure

By Stephen Balaban CEO, Lambda

SLIDE 2
SLIDE 3

Lambda Customers

SLIDE 4

About Me

  • Started using CNNs for face recognition in 2012.
  • First employee at Perceptio. We developed image recognition CNNs that ran locally on the iPhone. Acquired by Apple in 2015.

  • Published in SPIE and NeurIPS.
SLIDE 5

Workshop Structure

  • Audience survey
  • Presentation w/ Q&A
  • Q&A + Workshop
SLIDE 6

5 Stages of GPU Cloud Grief

SLIDE 7

It all starts with the Shock of an expensive AWS bill.

SLIDE 8

Stage 1 - Denial “This won’t happen again next month.”

SLIDE 9

Stage 2 - Anger “The bill doubled again!”

SLIDE 10

Stage 3 - Bargaining with your account manager.

SLIDE 11

Stage 4 - Depression “Spot instances and reserved instances aren’t enough; this is hopeless.”

SLIDE 12

Stage 5 - Acceptance “GPU cloud services are expensive. Managing hardware is scary.”

SLIDE 13

Hardware: A Quick Rundown

  1. GPUs
  2. CPUs
  3. GPU-GPU Bandwidth & PCIe Topology
SLIDE 14

GPUs

SLIDE 15

GPU Speed Comparisons

Source: https://lambdalabs.com/blog/titan-rtx-tensorflow-benchmarks/
SLIDE 16

Performance / $

Source: https://lambdalabs.com/blog/best-gpu-tensorflow-2080-ti-vs-v100-vs-titan-v-vs-1080-ti-benchmark/
SLIDE 17

CPUs

SLIDE 18

What to look for

Source: https://lambdalabs.com/blog/best-gpu-tensorflow-2080-ti-vs-v100-vs-titan-v-vs-1080-ti-benchmark/
  1. Number of PCIe lanes. (Affects total bandwidth.)
  2. NUMA node topology. (Affects GPU peering.)
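As a rough illustration of why these bullets matter: eight GPUs each wanting a full 16x link need far more lanes than one CPU provides, which is why the systems later in this deck use PCIe switches. The lane arithmetic can be sketched below; the 40-lane CPU budget is an assumed typical figure, not a number from the slides.

```python
# Back-of-the-envelope lane budget (illustrative numbers, not from the slides):
# why 8-GPU systems need PCIe switches such as the PEX 8748/8796.
GPUS = 8
LANES_PER_GPU = 16      # each GPU wants a full 16x link
CPU_LANES = 40          # assumed typical PCIe lane count for one CPU

lanes_wanted = GPUS * LANES_PER_GPU
print(f"Lanes wanted: {lanes_wanted}, lanes available per CPU: {CPU_LANES}")
print(f"Oversubscription without switches: {lanes_wanted / CPU_LANES:.1f}x")
```

With these assumptions the GPUs want 128 lanes against a 40-lane budget, so switches (or a second CPU socket) are unavoidable.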
SLIDE 19

GPU Peering & PCIe Topology

SLIDE 20

PCIe Topology

[Diagram: PCIe topology; each link is a 16x PCIe connection]
SLIDE 21

Dual Root PCIe Topology

Each arrow is a 16x PCIe connection.

[Diagram: two CPUs joined by a CPU-CPU interconnect. Each CPU connects to two PEX 8748 switches; GPUs 0–3 sit behind CPU 0's switches and GPUs 4–7 behind CPU 1's. Source: Lambda]
SLIDE 22 Source: Lambda

Single Root PCIe Topology

[Diagram: a single CPU connects to two PEX 8796 switches; GPUs 0–3 sit behind one switch and GPUs 4–7 behind the other. Each arrow is a 16x PCIe connection.]
SLIDE 23 Source: Lambda

Cascaded PCIe Topology

[Diagram: the CPU connects to one PEX 8796 switch serving GPUs 0–3; a second PEX 8796 switch is cascaded off the first and serves GPUs 4–7. Each arrow is a 16x PCIe connection.]
SLIDE 24 Source: Lambda

NVLink System Topology

Each plain arrow is a 16x PCIe connection; a green double arrow is an NVLink connection; an open circle marks CPU-CPU communication.

[Diagram: the same dual root layout as before (two CPUs joined by an interconnect, each with two PEX 8748 switches; GPUs 0–3 behind CPU 0, GPUs 4–7 behind CPU 1), with NVLink connections added between the GPUs.]
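The practical difference between these layouts can be sketched as a toy model; this is my own illustration, not Lambda's tooling. The rule of thumb it encodes: GPUDirect peer-to-peer over PCIe generally requires both GPUs to sit under the same root complex (the same CPU).

```python
# Toy model (my own sketch, not from the slides): represent each topology as
# a parent map and check whether two GPUs reach the same root complex.

def root_of(node, parent):
    """Follow parent links up to the root complex (a CPU)."""
    while node in parent:
        node = parent[node]
    return node

def can_peer(a, b, parent):
    """PCIe P2P is possible only when both GPUs share a root complex."""
    return root_of(a, parent) == root_of(b, parent)

# Dual root: GPUs 0-3 behind CPU 0's switches, GPUs 4-7 behind CPU 1's.
dual_root = {f"gpu{i}": f"pex{i // 2}" for i in range(8)}
dual_root.update({"pex0": "cpu0", "pex1": "cpu0", "pex2": "cpu1", "pex3": "cpu1"})

# Single root: two PEX 8796 switches under one CPU, four GPUs behind each.
single_root = {f"gpu{i}": f"pex{i // 4}" for i in range(8)}
single_root.update({"pex0": "cpu0", "pex1": "cpu0"})

print(can_peer("gpu0", "gpu7", dual_root))    # False: different root complexes
print(can_peer("gpu0", "gpu7", single_root))  # True: same root complex
```

On a real machine, `nvidia-smi topo -m` prints the actual GPU-to-GPU link matrix.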
SLIDE 25

Real Life Examples

SLIDE 26 Source: ASUS
SLIDE 27 Source: Supermicro

Single Root Complex vs Dual Root Complex

  • Single Root Complex: 4029GP-TRT2
  • Dual Root Complex: 4028GR-TRT
SLIDE 28

1080 Ti GPUDirect Peer-to-Peer Bandwidth Benchmark

[Chart: GPUDirect peer-to-peer bandwidth over 16x PCIe links. Source: Lambda]
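For context when reading a P2P bandwidth benchmark, the theoretical one-direction ceiling of a 16x PCIe 3.0 link can be computed from the per-lane transfer rate and the 128b/130b line encoding. This is a standard back-of-the-envelope figure, not a number taken from the slide.

```python
# Rough theoretical ceiling for one direction of a PCIe 3.0 16x link,
# a useful yardstick when reading P2P bandwidth benchmark numbers.
GT_PER_S = 8.0          # PCIe 3.0 transfer rate per lane (GT/s)
ENCODING = 128 / 130    # 128b/130b line encoding efficiency
LANES = 16

gb_per_s = GT_PER_S * ENCODING * LANES / 8  # divide by 8: bits -> bytes
print(f"Theoretical 16x PCIe 3.0 bandwidth: {gb_per_s:.2f} GB/s per direction")
```

Measured P2P numbers land below this ceiling because of protocol and switch-hop overhead.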
SLIDE 29 Source: Lambda

No Peering on the new 2080 Ti

Topology used in this experiment. (For the 1080 Ti, no NVLink.)

SLIDE 30

Lambda Stack = GPU-enabled Frameworks

Source: https://lambdalabs.com/lambda-stack-deep-learning-software

For Ubuntu 16.04 or 18.04. One command:

LAMBDA_REPO=$(mktemp) && \
wget -O${LAMBDA_REPO} https://lambdalabs.com/static/misc/lambda-stack-repo.deb && \
sudo dpkg -i ${LAMBDA_REPO} && rm -f ${LAMBDA_REPO} && \
sudo apt-get update && sudo apt-get install -y lambda-stack-cuda

Also comes as a Docker Container.

SLIDE 31

Cost Comparison: On-prem vs. Cloud

p3dn.24xlarge Instance

  • AWS: $160,308 / year with reserved pricing
  • Lambda Hyperplane: $109,008 once

(Add $15,000 / year if you want to co-locate instead.)
SLIDE 32

Cost Comparison: On-prem vs. Cloud

p3.16xlarge Instance

  • AWS: $139,371 / year with reserved pricing
  • Lambda Blade: $28,389 once

(Add $15,000 / year if you want to co-locate instead.)
SLIDE 33

Cost Comparison: On-prem vs. Cloud

p3.8xlarge Instance

  • AWS: $69,729 / year with reserved pricing
  • Lambda Quad: $12,472 once
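Using the prices quoted across the three cost-comparison slides, a simple break-even sketch shows how quickly each machine pays for itself against AWS reserved pricing. This ignores co-location, power, and depreciation, so treat it as a first approximation.

```python
# Break-even sketch from the prices quoted on the cost-comparison slides:
# months of AWS reserved pricing needed to cover the one-time machine cost.
comparisons = {
    "Hyperplane vs p3dn.24xlarge": (109_008, 160_308),
    "Blade vs p3.16xlarge":        (28_389, 139_371),
    "Quad vs p3.8xlarge":          (12_472, 69_729),
}
for name, (once, per_year) in comparisons.items():
    months = once / per_year * 12
    print(f"{name}: pays for itself in {months:.1f} months")
```

Even the worst case here (the Hyperplane) breaks even in well under a year of equivalent reserved-instance spend.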

SLIDE 34

Thank You!

Tweet @LambdaAPI @stephenbalaban LAMBDALABS.COM/BLOG