DEEP LEARNING INFRASTRUCTURE FOR AUTONOMOUS VEHICLES | Pradeep Gupta



SLIDE 1

Pradeep Gupta | Solutions Architecture, Autonomous Driving Poonam Chitale | AI Infra Product Manager

DEEP LEARNING INFRASTRUCTURE FOR AUTONOMOUS VEHICLES

SLIDE 2

Deep Learning has changed the way we think about developing software

SLIDE 3

NVIDIA DRIVE END-TO-END PLATFORM

COLLECT DATA → TRAIN MODELS → SIMULATE → DRIVE

Detection targets: Pedestrians, Cars, Lanes, Path, Lights, Signs
SLIDE 4

INDUSTRY GRADE DEEP LEARNING

What does it take to get DNNs into production?

➢ Data: scale and management
➢ Compute: how to build compute, storage and other infrastructure to enable training
➢ Inference: DL deployment infrastructure

Infrastructure progression for the path to production: Data → Compute → Inference

SLIDE 5

GENERIC DEEP LEARNING WORKFLOW FOR AUTONOMOUS VEHICLES

SLIDE 6

DL FOR AUTONOMOUS VEHICLES

PBs of data, large-scale labeling, large-scale training, etc.: manually selected data is labeled and exported as train/test datasets (POST /datasets/{id}); deep learning training produces an inference-optimized DNN (TensorRT), with metrics coming from simulation and verification results.

SLIDE 7

DL FOR AUTONOMOUS VEHICLES

Active learning strategies to meet business needs: trained models are used to mine highly confused / most informative data; that intelligently selected data is labeled (deep learning-assisted labeling) and exported as train/test datasets (POST /datasets/{id}), ultimately producing an inference-optimized DNN (TensorRT).
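The "mine highly confused / most informative data" step on this slide can be sketched as an entropy-based ranking over model predictions. This is a minimal illustrative heuristic, not the deck's actual mining pipeline; the frame format and function names are our assumptions:

```python
import math

def entropy(probs):
    """Shannon entropy of a class-probability vector (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def mine_informative(frames, top_k):
    """Rank frames by model uncertainty and return the top_k most
    confused ones, i.e. the best candidates to send for labeling."""
    scored = sorted(frames, key=lambda f: entropy(f["probs"]), reverse=True)
    return [f["id"] for f in scored[:top_k]]

frames = [
    {"id": "frame_001", "probs": [0.98, 0.01, 0.01]},  # confident prediction
    {"id": "frame_002", "probs": [0.40, 0.35, 0.25]},  # highly confused
    {"id": "frame_003", "probs": [0.70, 0.20, 0.10]},
]
print(mine_informative(frames, top_k=1))  # ['frame_002']
```

Production systems typically combine such uncertainty scores with diversity sampling, but the loop structure (score, rank, select, label) is the same.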

SLIDE 8

“Autonomous vehicles need to be driven more than 11 billion miles to be 20% better than humans. With a fleet of 100 vehicles, 24 hours a day, 365 days a year, at 25 miles per hour, this would take 518 years.”

Rand Corporation, Driving to Safety

SLIDE 9

DL FOR AUTONOMOUS VEHICLES

➢ Data collection fleet == 100 cars
➢ 2,000 h of data collected per car, per year
➢ Assuming 5x 2MP cameras per car, radar data, etc. => 1 TB/h/car
➢ Grand total of 200 PB collected per year!
➢ Only ~1/1000 likely to be used for training (curated, labeled data)
➢ 12.1 years to train a ResNet50-like network on Pascal; 1.5 years on a DGX-1 with Volta
➢ Today, with 8 DGX-1s and 1/10th of that training data, training takes about 1 week

Assumptions regarding scale
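The slide's data-volume arithmetic can be sanity-checked in a few lines (constants are taken from the slide; the variable names are ours):

```python
cars = 100                  # data-collection fleet size
hours_per_car_year = 2000   # hours of driving data per car per year
tb_per_hour = 1             # 5x 2MP cameras + radar etc. => ~1 TB/h/car

total_pb_per_year = cars * hours_per_car_year * tb_per_hour / 1000
print(total_pb_per_year)    # 200.0 PB collected per year

# Only ~1/1000 of that ends up as curated, labeled training data
training_pb = total_pb_per_year / 1000
print(training_pb)          # 0.2 PB, i.e. ~200 TB of training data
```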

SLIDE 10

Challenges for building DL workflows for Autonomous Vehicles, and best practices:

Collaborating on datasets, workflows and experiments
➢ Managing Datasets: tracking large, continuously evolving datasets
➢ Tracking Experiments: reproducible research; performance tracking
➢ Scaling Workflows: optimal scheduling and automation of AI workflows

SLIDE 11

OVERALL WORKFLOW

Application Platform
➢ Build training workflows
➢ Discover best model
➢ Validate with re-simulation
➢ Deploy to TensorRT and run with NVIDIA DRIVE

Data Platform
➢ Ingest petabytes of recorded data
➢ Transcode and index raw data
➢ Label data and export for training
➢ Guide selection of data

Continuous Optimization
➢ Inspect recorded workflows
➢ Generate metrics

SLIDE 12

DATA PLATFORM

SLIDE 13

DL DATA PLATFORM

Loop: Collect → Process → Curate → Annotate/Label → Export → Analyze (dashboard, metrics); continuously validate and repeat.

Built on a storage cluster with data management and services.

SLIDE 14

DATA – COLLECTION AND INGESTION

Collecting data and processing

➢ Continuously ingest data, at roughly 1 TB/hour/car
➢ Data ingestion increases linearly with the number of cars
➢ Diverse datasets yield better DNNs
➢ Dedicated systems for ingestion
➢ Transcoding of raw data to consumable formats
➢ Data compression and caching

SLIDE 15

NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.

DATA COMPRESSION

A few factors to consider:
➢ Where to compress: in the car and/or in the cloud
➢ Data environment: day vs. night, urban vs. highway
➢ Lossless vs. lossy compression
➢ NVIDIA’s experience:
  ▪ DW exposes lossless compression today (LRAW, ~2x compression)
  ▪ Lossy compression is an active area of R&D: how well do DNNs work on compressed data?

A discussion
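The ~2x lossless figure above is a property of the codec and the data; the measurement itself is simple. The LRAW codec is not public, so as an illustrative stand-in this sketch uses zlib on a synthetic byte buffer to show how a compression ratio is measured and that the round trip is lossless:

```python
import zlib

def compression_ratio(raw: bytes, level: int = 6) -> float:
    """Ratio of raw size to losslessly compressed size."""
    return len(raw) / len(zlib.compress(raw, level))

# Synthetic stand-in for a raw camera frame: smooth, repetitive data
# compresses well losslessly; high-entropy noise barely compresses.
smooth = bytes((x // 8) % 256 for x in range(1 << 16))
ratio = compression_ratio(smooth)
print(f"{ratio:.1f}x")

# Lossless means the round trip is exact
assert zlib.decompress(zlib.compress(smooth)) == smooth
```

Real sensor frames sit between these extremes, which is why the achievable ratio (here ~2x for LRAW) has to be measured on fleet data rather than assumed.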

SLIDE 16

DATA – COLLECTION AND INGESTION

Useful data for AI & Applying DNNs

Data funnel: Raw Data (100s of PBs) → Compressed Data (10s of PBs) → Useful Data (20% to 50% of data may not be useful) → Labeled Data (bounded by labeling throughput) → DNNs

Data from test fleets of 10, 30, 50 and 100 cars.

SLIDE 17

DATA – CURATION AND INDEXING

Selecting the most interesting data for labeling

➢ Search from recorded sessions
➢ Frame selection

SLIDE 18

DATA – LABELING & EXPORT

Ensuring quality of labels

➢ Standard guidelines and processes are required to correctly annotate frames
➢ Produce high-quality labeled data, exported for model training
➢ QA and double labeling are important

Flow: unlabeled frame → labeled frame → dataset export
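The double-labeling QA mentioned here is often implemented by comparing two annotators' boxes with intersection-over-union and flagging disagreements for review. The 0.7 threshold and the sample boxes below are illustrative assumptions, not values from the deck:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def needs_review(label_a, label_b, threshold=0.7):
    """Flag a double-labeled frame when the two annotators disagree."""
    return iou(label_a, label_b) < threshold

# Two annotators label the same pedestrian in one frame
ann1 = (10, 10, 50, 90)
ann2 = (12, 11, 52, 92)
print(needs_review(ann1, ann2))  # False: the labels agree closely
```

Frames that trip the threshold go back through the guidelines/QA process rather than straight into the exported dataset.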

SLIDE 19

APPLICATION/COMPUTE PLATFORM

SLIDE 20

DL FOR BUILDING AUTONOMOUS VEHICLES

Steps

Step | Goal | Iteration Time | # of Machines | GPU
Build the Model | Build a promising model | Hours | 1 | 2-4 TitanX / Tesla P/V100
Continuous Integration | Make sure that the code base remains bug free | Hours | 10s | 4-8 Tesla P/V100
Train the Model (hyperparameter tuning) | Make the model work with real data and optimize it | Days - Weeks | 10s - 100s | 4-8 Tesla P/V100
Optimize and Validate the Model | Prepare the model for serving and validate it | Hours - Weeks | 10s | 4-8 Tesla P/V100
Deploy the Model | Provide functionality using the model | Milliseconds | Hundreds (test fleet) / Millions (live fleet) | Xavier

SLIDE 21

DL APPLICATION PLATFORM

Services: Model Store, Workflow Manager, Dataset Service, Experiment Service
Loop: build experiments → use datasets → run training → analyze results; test, validate, repeat.
Runs on a training cluster (10s of thousands of GPUs).

SLIDE 22

AV CLUSTER

On Premises Infrastructure

➢ Cluster using NVIDIA DGX-1 with Volta
➢ Every DGX-1 connected via InfiniBand for multi-node training
➢ Hierarchical storage: Level0 is the 7 TB local SSD in each DGX-1; Level1 is hundreds of TBs of high-bandwidth storage used as a training-data cache
➢ Multiple levels of storage hierarchy
➢ Dedicated connection between on-premises and cloud infrastructure for dedicated bandwidth

SLIDE 23

STORAGE REQUIREMENTS

Tiered planning for storage

• Storage architecture should be of multiple tiers.
• On premises:
  • Level0 Storage: 7 TB SSD per DGX-1
  • Level1 Storage: hundreds of TBs of high-bandwidth storage
• Private/public cloud:
  • Level2 Storage: highly available, replicated storage, 10s of PBs
  • Level3 Storage: cold storage for archival, maybe 50s of PBs

A dedicated connection links the on-premises infrastructure and the cloud.
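The tiers above form a cache hierarchy: hot training data migrates toward the Level0 SSD next to the GPUs. This toy model illustrates one plausible promote-on-read policy; the tier names follow the slide, but the policy and class are our illustrative assumptions:

```python
# Tier order mirrors the slide: local SSD down to cold archival storage
TIERS = ["level0_ssd", "level1_hb", "level2_cloud", "level3_cold"]

class TieredStore:
    """Toy model: each read serves a dataset from its current tier,
    then promotes it one tier toward the local SSD, so frequently
    used training data ends up close to the GPUs."""
    def __init__(self):
        self.location = {}          # dataset name -> tier index

    def ingest(self, name, tier="level2_cloud"):
        self.location[name] = TIERS.index(tier)

    def read(self, name):
        i = self.location[name]
        if i > 0:                   # promote toward Level0 SSD
            self.location[name] = i - 1
        return TIERS[i]

store = TieredStore()
store.ingest("highway_day_2017q3")
print(store.read("highway_day_2017q3"))  # served from level2_cloud
print(store.read("highway_day_2017q3"))  # now cached at level1_hb
```

Real systems add eviction and capacity limits per tier, but the promote-on-access shape is the same.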

SLIDE 24

DL FOR AUTONOMOUS VEHICLES

Infrastructure:
➢ 960 TFLOPS per DGX-1 (FP16)
➢ 7 TB SSD per DGX-1
➢ High-speed external storage (multi-PB)
➢ InfiniBand as interconnect
➢ NCCL 2.0
➢ Data + model management
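NCCL's allreduce is the primitive behind the multi-node training this infrastructure enables: every worker contributes its local gradients and receives the mean. This pure-Python stand-in only illustrates the semantics, not NCCL's API or performance:

```python
def allreduce_mean(worker_grads):
    """Toy stand-in for an NCCL-style allreduce: average per-parameter
    gradients across workers so every node applies the same update."""
    n = len(worker_grads)
    summed = [sum(g) for g in zip(*worker_grads)]   # elementwise sum
    mean = [s / n for s in summed]
    return [mean[:] for _ in range(n)]              # every worker gets the mean

grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 workers, 2 parameters each
print(allreduce_mean(grads)[0])  # [3.0, 4.0]
```

In the real cluster this exchange runs over InfiniBand via NCCL 2.0, overlapped with backpropagation, which is what makes near-linear scaling across DGX-1 nodes possible.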

SLIDE 25

CONTINUOUS OPTIMIZATION

SLIDE 26

WORKFLOW AUTOMATION & OPTIMIZATION

Workflow Automation
• Self-documenting workflows
  ▪ Traceability of data
  ▪ Models
  ▪ Experiment sets
  ▪ Datasets
  ▪ Versioning
• Compute
  ▪ Automated scheduling
  ▪ Optimal GPU selection
• Collaboration
  ▪ Best practices
  ▪ Modular, flexible, extensible APIs

Continuous Optimization
• Ease of training models with new data
  ▪ Integration with the Data Platform
• Testing and validating
  ▪ Rigorous testing
  ▪ Simulation
• Metrics calculation
  ▪ Data diversity
  ▪ KPI tracking
  ▪ Accuracy
  ▪ Performance

SLIDE 27

BEST MODEL DISCOVERY

Hyperparameters

• Training parameters
  ▪ Learning rate, batch size, optimizer, weight decay, regularization strength
• Model architecture
  ▪ Batch norm, activation functions, convolution stride, filter size
• Data augmentation
  ▪ Max translation, color augmentations, potentially shearing, flips, crops
• Post-processing
  ▪ Clustering
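Sweeps over a space like the one above are commonly run as random search across the cluster. The search-space bounds and toy objective below are illustrative assumptions, not values from the deck:

```python
import random

# Hypothetical search space drawn from the slide's parameter list
SPACE = {
    "learning_rate": lambda: 10 ** random.uniform(-4, -1),
    "batch_size":    lambda: random.choice([32, 64, 128, 256]),
    "weight_decay":  lambda: 10 ** random.uniform(-6, -3),
}

def random_search(objective, n_trials, seed=0):
    """Evaluate n_trials random configurations; return (score, config)
    for the best one found."""
    random.seed(seed)
    best = None
    for _ in range(n_trials):
        cfg = {k: draw() for k, draw in SPACE.items()}
        score = objective(cfg)
        if best is None or score > best[0]:
            best = (score, cfg)
    return best

# Toy objective standing in for validation accuracy after a training run
def toy_accuracy(cfg):
    return -abs(cfg["learning_rate"] - 0.01) - 0.001 * abs(cfg["batch_size"] - 128)

score, cfg = random_search(toy_accuracy, n_trials=50)
print(cfg)
```

In practice each trial is a full training job dispatched by the workflow manager, and the experiment service records every (config, score) pair for reproducibility.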

SLIDE 28

EXAMPLE WORKFLOW

From data to training to deployment: dataset exported from labeling software → trained model → fine-tuned model → exported model → at the edge.

Loop: get data → train & test → adjust → export → test & validate, and repeat; continuously optimize and fine-tune.

SLIDE 29

DEPLOYMENT - INFERENCE

SLIDE 30

TENSORRT DEPLOYMENT WORKFLOW

Step 1: Optimize the trained model. The TensorRT Optimizer takes a trained neural network and, given the target platform, batch size and precision, produces optimized plans (Plan 1, Plan 2, Plan 3) that are serialized to disk.

Step 2: Deploy the optimized plans with the TensorRT Runtime Engine, on embedded, automotive, or data center targets.

SLIDE 31

SLIDE 32

NVIDIA’S END-TO-END PRODUCT FAMILY

TRAINING
➢ Data center: Tesla P100, Tesla V100, DGX-1
➢ Desk side: DGX Station, a fully integrated DL supercomputer

INFERENCE
➢ Embedded: Jetson TX1
➢ Data center: Tesla P4, Tesla V100/P40
➢ Automotive: Drive PX2

SLIDE 33

HOW GPU-BASED INFRA IS HELPING

SLIDE 34

AI IS YOUR COMPETITIVE ADVANTAGE

Significant return on investment:
➢ Reduced time to market (TTM)
➢ Competitive advantage and revenues
➢ Overall lower datacenter TCO
➢ Avoid fines and settlements

SLIDE 35

NEXT STEPS

➢ Deep dive on your current and future use of AI for self-driving
➢ Understand and discuss your goals and objectives; frame the approach and size the scale
➢ Develop a phased roadmap for AI computational scale
➢ Identify and enable the right scale and capabilities
➢ Leverage the NVIDIA Deep Learning Institute to train and develop your team

SLIDE 36

THANK YOU