DEEP LEARNING INFRASTRUCTURE FOR AUTONOMOUS VEHICLES Pradeep Gupta - PowerPoint PPT Presentation

DEEP LEARNING INFRASTRUCTURE FOR AUTONOMOUS VEHICLES
Pradeep Gupta | Solutions Architecture, Autonomous Driving
Poonam Chitale | AI Infra Product Manager

Deep Learning has changed the way we think about developing software

NVIDIA DRIVE END-TO-END PLATFORM
Collect data, train models, simulate, drive
[Figure: perception outputs for cars, pedestrians, lanes, path, signs, and lights]

INDUSTRY GRADE DEEP LEARNING
What does it take to get DNNs into production?
• Data: data scale and management
• Compute: how to build compute, storage and other infrastructure to enable training
• Inference: DL deployment infrastructure
Infrastructure progression for the path to production

GENERIC DEEP LEARNING WORKFLOW FOR AUTONOMOUS VEHICLES

DL FOR AUTONOMOUS VEHICLES
PBs of data, large-scale labeling, large-scale training, etc.
[Workflow diagram: Datasets (POST /datasets/{id}), Manually selected data, Labeling, Labels, Train/test data, Deep Learning, Metrics, Inference optimized DNN (TensorRT), Simulation and verification results]

DL FOR AUTONOMOUS VEHICLES
Active learning strategies to meet business needs
Mine highly confused / most informative data
[Workflow diagram: Datasets (POST /datasets/{id}), Intelligently selected data, Labeling, Labels, Train/test data, Deep Learning, Trained Models, Inference optimized DNN (TensorRT)]
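Mining "highly confused / most informative data" is often done with uncertainty sampling. The following is a minimal sketch of that idea, assuming each unlabeled frame already has a class-probability vector from the current model. The frame IDs, the `select_for_labeling` helper, and the choice of entropy as the confusion score are illustrative assumptions, not the selection strategy used in the presented pipeline.

```python
import math
from typing import Dict, List, Sequence


def prediction_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_labeling(
    predictions: Dict[str, Sequence[float]], budget: int
) -> List[str]:
    """Return the `budget` frame IDs whose predictions are most uncertain."""
    ranked = sorted(
        predictions,
        key=lambda frame_id: prediction_entropy(predictions[frame_id]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical model outputs over three classes (car, pedestrian, sign).
example_predictions = {
    "frame_001": [0.98, 0.01, 0.01],  # confident prediction
    "frame_002": [0.34, 0.33, 0.33],  # highly confused prediction
    "frame_003": [0.70, 0.20, 0.10],  # somewhat uncertain prediction
}

selected = select_for_labeling(example_predictions, budget=2)
print(selected)
```

With a labeling budget of two frames, the confident frame is left out and the two most uncertain frames are sent to labeling.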

“Autonomous vehicles need to be driven more than 11 billion miles to be 20% better than humans. With a fleet of 100 vehicles, 24 hours a day, 365 days a year, at 25 miles per hour, this would take 518 years.”
Rand Corporation, Driving to Safety
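The arithmetic behind the quote can be checked directly. This is a small sketch using only the figures in the quote; the helper name `years_to_drive` is an assumption for illustration. Exactly 11 billion miles works out to about 502 years, so the quoted 518 years corresponds to roughly 11.3 billion miles, consistent with "more than 11 billion".

```python
HOURS_PER_YEAR = 24 * 365


def years_to_drive(total_miles: float, vehicles: int, mph: float) -> float:
    """Years a fleet driving around the clock needs to cover total_miles."""
    miles_per_year = vehicles * HOURS_PER_YEAR * mph
    return total_miles / miles_per_year


# Miles covered per year by 100 vehicles at 25 mph, driving non-stop.
fleet_miles_per_year = 100 * HOURS_PER_YEAR * 25

years_at_11b = years_to_drive(11e9, vehicles=100, mph=25)
miles_for_518_years = 518 * fleet_miles_per_year

print(f"Fleet covers {fleet_miles_per_year:,} miles per year")
print(f"Exactly 11 billion miles takes {years_at_11b:.0f} years")
print(f"518 years covers {miles_for_518_years / 1e9:.2f} billion miles")
```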

DL FOR AUTONOMOUS VEHICLES
Assumptions regarding scale
• Data collection fleet == 100 cars
• 2000 h of data collected per car, per year
• Assuming five 2 MP cameras per car, radar data, etc. => 1 TB / h / car
• Grand total of 200 PB collected per year!
• Only 1/1000 likely to be used for training (curated, labeled data)
• 12.1 years training a ResNet50-like network on Pascal, 1.5 years on DGX-1 w/ Volta
• Today, with 8 DGX-1s and 1/10th of that training data, can train in 1 week
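The scale estimate follows from simple multiplication. The sketch below reproduces it from the slide's own inputs; decimal units (1 PB = 1000 TB), ideal linear scaling across DGX-1 nodes, and training time proportional to dataset size are simplifying assumptions made here, not claims from the presentation.

```python
CARS = 100
HOURS_PER_CAR_PER_YEAR = 2000
TB_PER_HOUR_PER_CAR = 1
TB_PER_PB = 1000  # decimal units, an assumption

# Raw data collected by the whole fleet in one year.
collected_tb = CARS * HOURS_PER_CAR_PER_YEAR * TB_PER_HOUR_PER_CAR
collected_pb = collected_tb / TB_PER_PB

# Only about 1/1000 of it ends up curated and labeled for training.
training_tb = collected_tb / 1000

# Training-time estimate, assuming ideal linear scaling over nodes and
# training time proportional to the amount of training data.
YEARS_ON_ONE_DGX1_VOLTA = 1.5
nodes = 8
data_fraction = 1 / 10
training_days = YEARS_ON_ONE_DGX1_VOLTA * 365 * data_fraction / nodes

print(f"Collected per year: {collected_pb:.0f} PB")
print(f"Curated training data: {training_tb:.0f} TB")
print(f"Estimated training time: {training_days:.1f} days")
```

Under these assumptions the estimate lands just under seven days, which matches the "1 week" figure.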

Challenges for building DL workflows for Autonomous Vehicles
• Managing Datasets: tracking large, continuously evolving datasets
• Scaling Workflows: optimal scheduling and automation of AI workflows
• Tracking Experiments: reproducible research, performance tracking
• Best Practices: collaborating on datasets, workflows and experiments

OVERALL WORKFLOW
Data Platform
• Ingest petabytes of recorded data
• Transcode and index raw data
• Label data and export for training
Application Platform
• Build training workflows
• Validate with re-simulation
• Deploy to TensorRT and run with NVIDIA DRIVE
Continuous Optimization
• Inspect recorded workflows
• Discover best model
• Generate metrics
• Guide selection of data

DATA PLATFORM

DL DATA PLATFORM
Continuously process, validate, repeat
[Diagram: Collect Data, Process (Ingestion), Curate, Label (Annotate), Export, Analyze (Dashboard, Metrics), on top of a storage cluster with data management and services]

DATA – COLLECTION AND INGESTION
Collecting data and processing
➢ Continuously ingest data, at roughly 1 TB/hour/car
➢ Data ingestion increases linearly with the number of cars
➢ Diverse datasets produce better DNNs
➢ Dedicated systems for ingestion
➢ Transcoding of raw data to consumable formats
➢ Data compression and caching

DATA COMPRESSION
A discussion
A couple of factors:
➢ Data compression: in the car and/or in the cloud
➢ Data environment: day/night, urban vs highway
➢ Lossless vs lossy compression
NVIDIA’s experience:
➢ DW exposes lossless compression today (LRAW, ~2x compression)
➢ Lossy compression is an active area of R&D: how does AI work on compressed data?
NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.
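A lossless codec can be characterized by its compression ratio and by a byte-exact round trip. The sketch below measures both using Python's standard `zlib` module as a stand-in; it does not implement LRAW, and the synthetic "frame" is a made-up gradient, so the ratio it reports says nothing about real camera data.

```python
import zlib


def lossless_ratio(raw: bytes, level: int = 6) -> float:
    """Compress with zlib, verify the round trip, return raw/compressed size."""
    compressed = zlib.compress(raw, level)
    # Lossless means decompression reproduces the input byte for byte.
    if zlib.decompress(compressed) != raw:
        raise ValueError("round trip did not reproduce the input")
    return len(raw) / len(compressed)


# Synthetic stand-in for a raw frame: 64 identical rows of a smooth gradient.
row = bytes((x // 4) % 256 for x in range(1024))
frame = row * 64

ratio = lossless_ratio(frame)
print(f"Compression ratio on the synthetic frame: {ratio:.1f}x")
```

The same measurement applied to a lossy codec would need a quality metric in place of the exact round-trip check, which is where the open question about running DNNs on compressed data comes in.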

DATA – COLLECTION AND INGESTION
Useful data for AI & applying DNNs
[Chart: Raw Data, Compressed Data, Useful Data, and Labeled Data for test fleets of 10, 30, 50 and 100 cars. Annotations: 100s of PBs of data; 10s of PBs; 20% to 50% of data may not be useful; labelling throughput; DNNs]
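The funnel from raw to useful data can be estimated per fleet size. The sketch below combines figures that appear across the deck (1 TB/hour/car, 2000 hours per car per year, ~2x lossless compression, 20% to 50% of data not useful); combining them into one per-year funnel, and the `funnel_pb` helper itself, are assumptions made for illustration.

```python
from typing import Dict

HOURS_PER_CAR_PER_YEAR = 2000  # from the scale assumptions
TB_PER_HOUR_PER_CAR = 1.0
COMPRESSION_RATIO = 2.0        # ~2x lossless compression
TB_PER_PB = 1000.0             # decimal units, an assumption


def funnel_pb(cars: int, not_useful_fraction: float) -> Dict[str, float]:
    """Yearly raw, compressed, and useful data volumes in PB for a fleet."""
    raw = cars * HOURS_PER_CAR_PER_YEAR * TB_PER_HOUR_PER_CAR / TB_PER_PB
    compressed = raw / COMPRESSION_RATIO
    useful = compressed * (1.0 - not_useful_fraction)
    return {"raw": raw, "compressed": compressed, "useful": useful}


for cars in (10, 30, 50, 100):
    low = funnel_pb(cars, not_useful_fraction=0.5)   # pessimistic case
    high = funnel_pb(cars, not_useful_fraction=0.2)  # optimistic case
    print(
        f"{cars:>3} cars: raw {high['raw']:.0f} PB, "
        f"compressed {high['compressed']:.0f} PB, "
        f"useful {low['useful']:.0f}-{high['useful']:.0f} PB"
    )
```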

DATA – CURATION AND INDEXING
Selecting the most interesting data for labeling
[Screenshots: search from recorded sessions; frame selection]

DATA – LABELING & EXPORT
Ensuring quality of labels
[Figure: unlabeled frame, labeled frame, dataset export]
➢ Standard guidelines and processes are required to annotate frames correctly
➢ High-quality labeled data is exported for model training
➢ QA and double labeling are important
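One common way to use double labeling for QA is to compare the two annotators' bounding boxes and flag frames where they disagree. The sketch below uses intersection over union (IoU) for that comparison; the `Box` type, the `needs_review` helper, and the 0.8 threshold are hypothetical choices for illustration, not the QA process described in the presentation.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box in pixel coordinates."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes, in the range [0, 1]."""
    inter_w = max(0.0, min(a.x_max, b.x_max) - max(a.x_min, b.x_min))
    inter_h = max(0.0, min(a.y_max, b.y_max) - max(a.y_min, b.y_min))
    inter = inter_w * inter_h
    area_a = (a.x_max - a.x_min) * (a.y_max - a.y_min)
    area_b = (b.x_max - b.x_min) * (b.y_max - b.y_min)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def needs_review(first: Box, second: Box, threshold: float = 0.8) -> bool:
    """Flag an object for QA when two annotators' boxes disagree too much."""
    return iou(first, second) < threshold


annotator_a = Box(0, 0, 10, 10)
close_match = Box(1, 0, 10, 10)   # slightly narrower box
poor_match = Box(0, 0, 10, 5)     # only half the height

print(needs_review(annotator_a, close_match))
print(needs_review(annotator_a, poor_match))
```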

APPLICATION/COMPUTE PLATFORM

DL BUILDING AUTONOMOUS VEHICLES

Step: Build the Model
  Goal: Build a promising model
  Iteration time: Hours
  # of machines: 1
  GPUs: 2-4 TitanX / Tesla P/V100

Step: Continuous Integration
  Goal: Make sure that the code base remains bug free
  Iteration time: Hours
  # of machines: 10s
  GPUs: 4-8 Tesla P/V100

Step: Train the model on real data (hyperparameter tuning)
  Goal: Make the model work with real data and optimize it
  Iteration time: Days - Weeks
  # of machines: 10s – 100s
  GPUs: 4-8 Tesla P/V100

Step: Optimize and Validate the Model
  Goal: Prepare the model for serving and validate it
  Iteration time: Hours - Weeks
  # of machines: 10s
  GPUs: 4-8 Tesla P/V100

Step: Deploy the Model
  Goal: Provide functionality using the model
  Iteration time: Milliseconds
  # of machines: Hundreds (test fleet), Millions (live fleet)
  GPUs: Xavier GPU

DL APPLICATION PLATFORM
Test, validate, repeat
[Diagram: Use Datasets, Run Training, Analyze Results, Build Experiments; Workflow; Dataset Manager Service, Experiment Service, Model Store; Training Cluster (10s of thousands of GPUs)]

AV CLUSTER
On Premises Infrastructure
➢ Cluster using NVIDIA DGX-1 with Volta
➢ Every DGX-1 connected via InfiniBand for multi-node training
➢ Hierarchical storage: local SSD in DGX-1 and high-bandwidth storage for training data cache
➢ Multiple levels of storage hierarchies
➢ Dedicated connection between on-premises and cloud infrastructure for dedicated bandwidth
[Diagram: Level0 Storage, 7 TB SSD in DGX-1; Level1 Storage, hundreds of TBs of high-bandwidth storage]

STORAGE REQUIREMENTS
Tiered Planning for Storage
• Storage architecture should have multiple tiers
• On premises
  ▪ Level0 Storage: 7 TB SSD per DGX-1
  ▪ Level1 Storage: hundreds of TBs of high-bandwidth storage
• Private/public cloud
  ▪ Level2 Storage: highly available, replicated storage, 10s of PBs
  ▪ Level3 Storage: cold storage for archival, possibly 50s of PBs
[Diagram: on-premises infrastructure (Level0, Level1) linked to the cloud (Level2, Level3) by a dedicated connection]
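The tiers can be written down as a small data structure, which makes it easy to ask which tier is the fastest one currently holding a dataset. This is a sketch only: the `StorageTier` type, the `fastest_tier` lookup, and the dataset names are hypothetical, and the capacities are the rough figures from the slide.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class StorageTier:
    level: int
    location: str
    role: str
    capacity: str


TIERS = [
    StorageTier(0, "on premises", "local SSD in each DGX-1", "7 TB per DGX-1"),
    StorageTier(1, "on premises", "high-bandwidth storage", "hundreds of TBs"),
    StorageTier(2, "private/public cloud", "highly available, replicated", "10s of PBs"),
    StorageTier(3, "private/public cloud", "cold storage for archival", "possibly 50s of PBs"),
]


def fastest_tier(placement: Dict[int, Set[str]], dataset: str) -> Optional[int]:
    """Return the lowest (fastest) tier level that holds `dataset`, if any."""
    for level in sorted(placement):
        if dataset in placement[level]:
            return level
    return None


# Hypothetical placement of datasets across the tiers.
placement = {
    0: {"highway_day_v2"},
    1: {"highway_day_v2", "urban_night_v1"},
    2: {"highway_day_v2", "urban_night_v1", "rain_v4"},
    3: {"archive_2016"},
}

for name in ("highway_day_v2", "rain_v4", "archive_2016", "missing"):
    print(name, fastest_tier(placement, name))
```

A training job would read from the tier this lookup returns, and a cache miss at Level0 or Level1 would trigger a copy from a slower tier.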

DL FOR AUTONOMOUS VEHICLES
Infrastructure
• 960 TFLOPS per DGX-1 (FP16)
• 7 TB SSD per DGX-1
• High-speed external storage (multi-PB)
• InfiniBand as interconnect
• NCCL 2.0
• Data + model management

CONTINUOUS OPTIMIZATION

WORKFLOW AUTOMATION & OPTIMIZATION
Workflow Automation
• Self-documenting workflows
  ▪ Traceability of data
  ▪ Models
  ▪ Experiment sets
  ▪ Datasets
  ▪ Versioning
• Compute
  ▪ Automated scheduling
  ▪ Optimal GPU selection
• Collaboration
  ▪ Best practices
  ▪ Modular, flexible, extensible APIs
Continuous Optimization
• Ease of training models with new data
  ▪ Integration with Data Platform
• Testing and validating
  ▪ Rigorous testing
  ▪ Simulation
• Metrics calculation
  ▪ Data diversity
  ▪ KPIs tracking
  ▪ Accuracy
  ▪ Performance

BEST MODEL DISCOVERY
Hyperparameters
• Training parameters
  ▪ Learning rate, batch size, optimizer, weight decay, regularization strength
• Model architecture
  ▪ Batch-norm, activation functions, convolution stride, filter size
• Data augmentation
  ▪ Max translation, color augmentations, potentially shearing, flips, crops
• Post-processing
  ▪ Clustering
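A search over these hyperparameters can be sketched with plain random search, one of several possible strategies; the presentation does not say which search method is used. The parameter names, ranges, and the `sample_configs` helper below are assumptions chosen to mirror the categories listed, and each sampled configuration would still have to be trained and scored by a real experiment.

```python
import random
from typing import Any, Callable, Dict, List

# Hypothetical search space: one sampler per hyperparameter.
SEARCH_SPACE: Dict[str, Callable[[random.Random], Any]] = {
    # Training parameters (log-uniform for scale-sensitive values).
    "learning_rate": lambda rng: 10 ** rng.uniform(-5, -1),
    "batch_size": lambda rng: rng.choice([32, 64, 128, 256]),
    "optimizer": lambda rng: rng.choice(["sgd", "adam"]),
    "weight_decay": lambda rng: 10 ** rng.uniform(-6, -3),
    # Model architecture.
    "activation": lambda rng: rng.choice(["relu", "elu"]),
    "conv_stride": lambda rng: rng.choice([1, 2]),
    # Data augmentation.
    "max_translation_px": lambda rng: rng.randint(0, 16),
    "horizontal_flip": lambda rng: rng.random() < 0.5,
}


def sample_configs(n: int, seed: int = 0) -> List[Dict[str, Any]]:
    """Draw n independent configurations; a fixed seed keeps runs reproducible."""
    rng = random.Random(seed)
    return [
        {name: draw(rng) for name, draw in SEARCH_SPACE.items()}
        for _ in range(n)
    ]


configs = sample_configs(5, seed=0)
for config in configs:
    print(config)
```

Seeding the sampler ties back to the reproducibility and experiment-tracking concerns raised earlier: the same seed always yields the same set of trial configurations.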

EXAMPLE WORKFLOW
From Data to Training to Deployment
[Diagram: Get Data (dataset exported from labeling software), Train & Test (trained model), Adjust (fine-tuned model), Export (exported model), Test & Validate (at the edge); continuously optimize, fine-tune, and repeat]

DEPLOYMENT - INFERENCE

TENSORRT DEPLOYMENT WORKFLOW
Step 1: Optimize trained model
• The trained neural network goes through the TensorRT Optimizer (platform, batch size, precision)
• The optimized plans (Plan 1, Plan 2, Plan 3) are serialized to disk
Step 2: Deploy optimized plans with runtime
• The TensorRT Runtime Engine loads Plan 1, Plan 2, Plan 3
• Targets: data center, automotive, embedded
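Because each plan is optimized for one combination of platform, batch size, and precision, a deployment needs a way to store and look up plans by that combination. The sketch below shows only this bookkeeping: it does not call TensorRT, the plan contents are placeholder bytes, and `PlanKey`, `save_plan`, `load_plan`, and the file-naming scheme are all hypothetical.

```python
import tempfile
from pathlib import Path
from typing import NamedTuple


class PlanKey(NamedTuple):
    """The optimizer settings that one serialized plan was built for."""
    platform: str
    batch_size: int
    precision: str

    def filename(self) -> str:
        return f"{self.platform}_b{self.batch_size}_{self.precision}.plan"


def save_plan(directory: Path, key: PlanKey, plan_bytes: bytes) -> Path:
    """Step 1 (after optimization): serialize one plan to disk."""
    path = directory / key.filename()
    path.write_bytes(plan_bytes)
    return path


def load_plan(directory: Path, key: PlanKey) -> bytes:
    """Step 2: fetch the plan matching the deployment target."""
    return (directory / key.filename()).read_bytes()


keys = [
    PlanKey("datacenter", 64, "fp16"),
    PlanKey("automotive", 1, "int8"),
    PlanKey("embedded", 1, "fp16"),
]

with tempfile.TemporaryDirectory() as tmp:
    plan_dir = Path(tmp)
    for key in keys:
        # Placeholder bytes stand in for a real optimized plan.
        save_plan(plan_dir, key, f"placeholder for {key}".encode())
    saved_files = sorted(p.name for p in plan_dir.iterdir())
    loaded = load_plan(plan_dir, keys[1])

print(saved_files)
```

Keying plans by target keeps a data center plan from being loaded on an automotive or embedded device by mistake.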

NVIDIA’S END-TO-END PRODUCT FAMILY
TRAINING
• Fully integrated DL supercomputer: DGX-1
• Desk side: DGX Station
• Data center: Tesla V100, Tesla P100
INFERENCE
• Data center: Tesla P4, Tesla V100/P40
• Automotive: Drive PX2
• Embedded: Jetson TX1

HOW GPU BASED INFRA IS HELPING

AI IS YOUR COMPETITIVE ADVANTAGE
Significant Return on Investment
• Reduced time to market (TTM)
• Competitive advantage
• Revenues
• Overall lower datacenter TCO
• Avoid fines and settlements

NEXT STEPS
Identify and enable the right scale and capabilities
• Deep dive on your current and future state use of AI for self-driving
• Understand and discuss your goals and objectives, frame the approach, and size the scale
• Develop a phased roadmap for AI computational scale
• Leverage the NVIDIA Deep Learning Institute to train and develop your team

THANK YOU
