Neural Network Deployment with DIGITS and TensorRT

Twin Karmakharm, Certified Instructor, NVIDIA Deep Learning Institute

DEEP LEARNING INSTITUTE
DLI Mission: Helping people solve challenging problems using AI and deep learning.
• Developers, data scientists and engineers
• Self-driving cars, healthcare and robotics
• Training, optimizing, and deploying deep neural networks

TOPICS
• Caffe
• NVIDIA’S DIGITS
• Deep Learning Approach
• NVIDIA’S TensorRT
• Lab
• Lab Details
• Launching the Lab Environment
• Review / Next Steps

CAFFE

Frameworks
Many Deep Learning Tools …

WHAT IS CAFFE?
An open framework for deep learning developed by the Berkeley Vision and Learning Center (BVLC)
caffe.berkeleyvision.org
http://github.com/BVLC/caffe
• Pure C++/CUDA architecture
• Command line, Python, MATLAB interfaces
• Fast, well-tested code
• Pre-processing and deployment tools, reference models and examples
• Image data management
• Seamless GPU acceleration
• Large community of contributors to the open-source project

CAFFE FEATURES
Deep Learning model definition

Protobuf model format
• Strongly typed format
• Human readable
• Auto-generates and checks Caffe code
• Developed by Google
• Used to define network architecture and training parameters
• No coding required!

    name: "conv1"
    type: "Convolution"
    bottom: "data"
    top: "conv1"
    convolution_param {
      num_output: 20
      kernel_size: 5
      stride: 1
      weight_filler {
        type: "xavier"
      }
    }
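The conv1 definition above sets num_output: 20, kernel_size: 5 and stride: 1. As a rough sketch of what those three parameters imply, the snippet below computes the spatial size of the layer's output and its number of learnable parameters. The 1-channel 28x28 input is an assumption made only for illustration (the slide does not state the input shape), and the function names are hypothetical helpers, not part of Caffe.

```python
def conv_output_size(in_size, kernel_size, stride=1, pad=0):
    """Spatial output size of a convolution along one dimension."""
    return (in_size + 2 * pad - kernel_size) // stride + 1


def conv_param_count(in_channels, num_output, kernel_size, bias=True):
    """Number of learnable values: one k x k filter per (input, output) channel pair, plus biases."""
    weights = num_output * in_channels * kernel_size * kernel_size
    return weights + (num_output if bias else 0)


# Assumed input: 1 channel, 28 x 28 pixels (not stated on the slide).
out_size = conv_output_size(28, kernel_size=5, stride=1)        # 24
n_params = conv_param_count(1, num_output=20, kernel_size=5)    # 20*1*5*5 + 20 = 520
print("conv1 output: 20 x %d x %d, parameters: %d" % (out_size, out_size, n_params))
```

Under that assumed input, conv1 would produce 20 feature maps of 24 x 24 from 520 learnable values.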

NVIDIA’S DIGITS

NVIDIA’S DIGITS
Interactive Deep Learning GPU Training System
• Process Data
• Configure DNN
• Monitor Progress
• Visualization

NVIDIA’S DIGITS
[Figure: training graph showing accuracy obtained from the validation dataset, loss function (validation) and loss function (training)]
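The quantities plotted on the DIGITS graph (accuracy on the validation set, and the loss on training and validation data) can be sketched in a few lines. This is a minimal illustration assuming a classification network whose output is a vector of class probabilities and a cross-entropy loss; the function names and the sample numbers are made up for the example and are not taken from DIGITS.

```python
import math


def accuracy(probs, labels):
    """Fraction of samples whose highest-probability class equals the true label."""
    correct = 0
    for p, y in zip(probs, labels):
        predicted = max(range(len(p)), key=lambda i: p[i])
        correct += int(predicted == y)
    return correct / len(labels)


def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for p, y in zip(probs, labels):
        total -= math.log(max(p[y], eps))
    return total / len(labels)


# Hypothetical validation batch: three samples, three classes.
val_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.4, 0.3]]
val_labels = [0, 1, 2]
print("accuracy:", accuracy(val_probs, val_labels))
print("loss:", cross_entropy(val_probs, val_labels))
```

A validation loss that rises while the training loss keeps falling is the usual sign of overfitting, which is why both curves are shown together.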

DEEP LEARNING APPROACH

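Deep Learning Approach
[Figure: Train: labelled images (dog, cat, honey badger) pass through the DNN, and wrong predictions (e.g. raccoon) produce errors that are fed back. Deploy: the trained DNN labels a new image (dog).]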

Deep Learning Approach
Convolutional Neural Network
[Figure: IMAGES pass through stacked Conv and Pool layers, a 1x1 Conv and Fully connected layers to give CLASS PREDICTIONS: CAR, TRUCK, DIGGER, BACKGROUND]

Deep Learning Approach
Neural network training and inference

NVIDIA’S TENSORRT

TensorRT
• Inference engine for production deployment of deep learning applications
• Allows developers to focus on developing AI powered applications
• TensorRT ensures optimal inference performance

TensorRT Optimizer
TRAINED NEURAL NETWORK → OPTIMIZED INFERENCE RUNTIME
• Fuse network layers
• Eliminate concatenation layers
• Kernel specialization
• Auto-tuning for target platform
• Select optimal tensor layout
• Batch size tuning
developer.nvidia.com/tensorrt

TensorRT Optimizer
Vertical Layer Fusion
CBR = Convolution, Bias and ReLU
developer.nvidia.com/tensorrt
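The idea behind a fused CBR block can be illustrated with a toy example: instead of writing the convolution result to memory, reading it back to add the bias, and reading it again to apply ReLU, a fused kernel computes all three per output element in one pass. The sketch below uses a 1-D convolution in plain Python purely to show that the two forms give the same result. It is an illustration of the concept, not TensorRT code, and all names are hypothetical.

```python
def conv1d(x, w):
    """Valid 1-D convolution (cross-correlation, as in most DL frameworks)."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]


def add_bias(y, b):
    return [v + b for v in y]


def relu(y):
    return [max(0.0, v) for v in y]


def fused_cbr(x, w, b):
    """Convolution, bias and ReLU computed in a single pass over the output."""
    k = len(w)
    return [
        max(0.0, sum(x[i + j] * w[j] for j in range(k)) + b)
        for i in range(len(x) - k + 1)
    ]


x = [1.0, -2.0, 3.0, 0.5, -1.0]
w = [0.5, -1.0, 0.25]
b = 0.1

unfused = relu(add_bias(conv1d(x, w), b))   # three passes, two intermediate buffers
fused = fused_cbr(x, w, b)                  # one pass, no intermediate buffers
print(unfused == fused)
```

The arithmetic is unchanged; what the fusion saves is the intermediate buffers and the extra kernel launches.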

TensorRT Optimizer
Horizontal Layer Fusion (Layer Aggregation)
CBR = Convolution, Bias and ReLU
developer.nvidia.com/tensorrt
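Horizontal fusion applies when several layers of the same kind read the same input tensor, as with parallel 1x1 convolution branches. A hedged sketch of the idea: stack the branch weights into one wider layer, run it once, then split the output back into per-branch results. The example treats a 1x1 convolution at a single pixel as a matrix-vector product. The helper names and the weights are invented for illustration and do not come from TensorRT.

```python
def matvec(weights, x):
    """1x1 convolution at one pixel: each row of weights is one output channel."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def fuse_horizontally(weight_mats):
    """Stack the rows of several branches into one wider weight matrix."""
    fused = [row for weights in weight_mats for row in weights]
    sizes = [len(weights) for weights in weight_mats]
    return fused, sizes


def split_outputs(y, sizes):
    """Cut the fused output back into one slice per original branch."""
    outputs, start = [], 0
    for n in sizes:
        outputs.append(y[start:start + n])
        start += n
    return outputs


x = [1.0, 2.0, -1.0]                           # shared input (3 channels)
branch_a = [[0.5, 0.0, 1.0], [1.0, -1.0, 0.0]]  # 2 output channels
branch_b = [[2.0, 0.5, 0.5]]                    # 1 output channel

separate = [matvec(branch_a, x), matvec(branch_b, x)]   # two layer launches
fused, sizes = fuse_horizontally([branch_a, branch_b])
combined = split_outputs(matvec(fused, x), sizes)       # one wider launch
print(separate == combined)
```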

TensorRT Optimizer
Supported layers
• Convolution: 2D
• Activation: ReLU, tanh and sigmoid
• Pooling: max and average
• ElementWise: sum, product or max of two tensors
• LRN: cross-channel only
• Fully-connected: with or without bias
• SoftMax: cross-channel only
• Deconvolution

TensorRT Optimizer
• Scalability:
  • Output/Input Layers can connect with other deep learning frameworks directly
  • Caffe, Theano, Torch, TensorFlow
• Reduced Latency:
  • INT8 or FP16
  • INT8 delivers 3X more throughput compared to FP32
  • INT8 uses 61% less memory compared to FP32
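To make the INT8 option more concrete, here is a minimal sketch of symmetric linear quantization: each FP32 value is divided by a scale and rounded to an integer in [-127, 127], so it fits in one byte instead of four. Choosing the scale from the largest absolute value is an assumption made for simplicity. TensorRT selects its scales through its own calibration procedure, which is not reproduced here, and the function names are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using a single max-abs scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the INT8 codes."""
    return [q * scale for q in quantized]


values = [0.5, -1.27, 0.0, 1.0]
codes, scale = quantize_int8(values)
restored = dequantize(codes, scale)

# Rounding error is at most half a quantization step per value.
max_error = max(abs(a - b) for a, b in zip(values, restored))
print(codes, scale, max_error)
```

The trade-off is precision for bandwidth: every value loses up to half a step of accuracy, which is why the choice of scale matters.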

TensorRT Runtime
Two Phases
• Build: runs optimizations on the network configuration and generates an optimized plan for computing the forward pass
• Deploy: runs the forward pass and outputs the inference result
[Figure: Build takes the model file, the I/O layers and the max batch size and produces a plan file; Deploy takes the plan file, the inputs and the batch size and produces the output]
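The two-phase split can be mimicked with a toy example. The sketch below is not the TensorRT API: it is a stand-in written in plain Python, with invented function names and a made-up "layer" format (an element-wise scale, bias and ReLU), meant only to show the shape of the workflow. Build produces a serialized plan with a fixed maximum batch size, and deploy needs nothing but that plan and the inputs.

```python
import json


def build(layers, max_batch_size):
    """Build phase (toy): validate the configuration and serialize a plan."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    plan = {"max_batch_size": max_batch_size, "layers": layers}
    return json.dumps(plan).encode("utf-8")


def deploy(plan_bytes, batch):
    """Deploy phase (toy): load the plan and run the forward pass on a batch."""
    plan = json.loads(plan_bytes.decode("utf-8"))
    if len(batch) > plan["max_batch_size"]:
        raise ValueError("batch is larger than the plan's max batch size")
    outputs = []
    for x in batch:
        for layer in plan["layers"]:
            # Hypothetical layer: element-wise scale, bias and ReLU.
            x = [max(0.0, layer["scale"] * v + layer["bias"]) for v in x]
        outputs.append(x)
    return outputs


plan_file = build([{"scale": 2.0, "bias": -1.0}], max_batch_size=2)
result = deploy(plan_file, [[1.0, 0.0]])
print(result)
```

The point of the split is that the expensive optimization work happens once, at build time, and the deployment side only needs the plan.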

TensorRT Runtime
• No need to install and run a deep learning framework on the deployment hardware
• Plan = runtime (serialized) object
  • Plan will be smaller than the combination of model and weights
  • Ready for immediate use
  • Alternatively, state can be serialized and saved to disk or to an object store for distribution
• Three files needed to deploy a classification neural network:
  • Network architecture file (deploy.prototxt)
  • Trained weights (net.caffemodel)
  • Label file to provide a name for each output class
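The role of the label file can be sketched as follows: the network produces one score per output class, and the label file turns the index of each score into a readable name. The example assumes a label file with one class name per line, in output order. That layout is an assumption, since the slide does not specify the format, and the helper names and scores are invented.

```python
import math


def parse_labels(text):
    """One class name per non-empty line, in the order of the network outputs (assumed format)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def softmax(scores):
    """Convert raw scores to probabilities, shifting by the max for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, labels, k=5):
    """Return the k most probable (label, probability) pairs, best first."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(labels[i], probs[i]) for i in ranked[:k]]


labels = parse_labels("car\ntruck\ndigger\nbackground\n")
probs = softmax([2.0, 0.5, 3.0, -1.0])   # hypothetical raw network outputs
for name, p in top_k(probs, labels, k=2):
    print("%-10s %.3f" % (name, p))
```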

LAB DETAILS

Lab Architectures / Datasets
• GoogLeNet
  • CNN architecture trained for image classification using the ILSVRC12 ImageNet dataset
  • Assigns one of 1000 class labels to an entire image based on the dominant object present
• pedestrian_detectNet
  • CNN architecture able to assign a global classification to an image, detect multiple objects within the image and draw bounding boxes around them
  • Pre-trained model provided has been trained for the task of pedestrian detection using a large dataset of pedestrians in a variety of indoor and outdoor scenes

Lab Tasks
• GPU Inference Engine (GIE) = TensorRT
• Part 1: Inference using DIGITS
  • Will use existing model in DIGITS to perform inference on a single image
• Part 2: Inference using Pycaffe
  • Programming production-like deployable inference code
• Part 3: NVIDIA TensorRT
  • Will run TensorRT Optimizer to build a plan
  • Deploy the plan using TensorRT Runtime
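For orientation, Part 2 style inference with pycaffe typically looks something like the sketch below. It is not the lab's code. It assumes the input blob is named "data", that the image needs the usual channel reordering and 0-255 scaling, and that no mean subtraction is required, all of which depend on how the model was trained. The function name and its parameters are hypothetical, and caffe is imported inside the function so the definition can be loaded on a machine without Caffe installed.

```python
def classify_image(model_def, weights, image_path, use_gpu=True):
    """Run one image through a Caffe classification network and return the top class index."""
    import caffe  # imported lazily: only needed when the function is called

    if use_gpu:
        caffe.set_mode_gpu()
    else:
        caffe.set_mode_cpu()

    # Load the architecture (deploy.prototxt) and trained weights (.caffemodel).
    net = caffe.Net(model_def, weights, caffe.TEST)

    # Assumed preprocessing: HxWxC -> CxHxW, [0, 1] -> [0, 255], RGB -> BGR.
    transformer = caffe.io.Transformer({"data": net.blobs["data"].data.shape})
    transformer.set_transpose("data", (2, 0, 1))
    transformer.set_raw_scale("data", 255)
    transformer.set_channel_swap("data", (2, 1, 0))

    image = caffe.io.load_image(image_path)
    net.blobs["data"].data[...] = transformer.preprocess("data", image)

    # Forward pass; take the first output blob and the first item in the batch.
    output = net.forward()
    scores = output[net.outputs[0]][0]
    return int(scores.argmax())
```

A call such as classify_image("deploy.prototxt", "net.caffemodel", "image.jpg") would return an index that the label file maps to a class name. The image file name here is a placeholder.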

LAUNCHING THE LAB ENVIRONMENT

NAVIGATING TO QWIKLABS
1. Navigate to: https://nvlabs.qwiklab.com
2. Login or create a new account
Please use the email address used to register for the session

ACCESSING LAB ENVIRONMENT
3. Select the event specific In-Session Class in the upper left
4. Click the “Deep Learning Network Deployment” Class from the list

LAUNCHING THE LAB ENVIRONMENT
5. Click on the Select button to launch the lab environment
• After a short wait, lab Connection information will be shown
• Please ask Lab Assistants for help!

LAUNCHING THE LAB ENVIRONMENT
6. Click on the Start Lab button

LAUNCHING THE LAB ENVIRONMENT
You should see that the lab environment is “launching” towards the upper-right corner

CONNECTING TO THE LAB ENVIRONMENT
7. Click on “here” to access your lab environment / Jupyter notebook

CONNECTING TO THE LAB ENVIRONMENT
You should see your “Deep Learning Network Deployment” Jupyter notebook

Jupyter Notebook Introduction
Interface: Run

STARTING DIGITS
Instructions in the Jupyter notebook will link you to DIGITS

ACCESSING DIGITS
• Will be prompted to enter a username to access DIGITS
• Can enter any username
• Use lower case letters

REVIEW / NEXT STEPS

WHAT’S NEXT
• Use / practice what you learned
• Discuss with peers practical applications of DNN
• Reach out to NVIDIA and the Deep Learning Institute
• Look for local meetups
• Follow people like Andrej Karpathy and Andrew Ng

WHAT’S NEXT
• TAKE SURVEY: …for the chance to win an NVIDIA SHIELD TV. Check your email for a link.
• ACCESS ONLINE LABS: Check your email for details to access more DLI training online.
• ATTEND WORKSHOP: Visit www.nvidia.com/dli for workshops in your area.
• JOIN DEVELOPER PROGRAM: Visit https://developer.nvidia.com/join for more.

Instructor: Twin Karmakharm
www.nvidia.com/dli

Join the Conversation #GTC18
Join us at Europe’s premier conference on artificial intelligence. 9-11 October 2018 at the International Congress Centre, Munich.
• CONNECT: Connect with technology experts from NVIDIA and other leading organisations.
• LEARN: Gain insight and valuable hands-on training through hundreds of sessions and research posters.
• DISCOVER: Discover the latest breakthroughs in fields such as autonomous vehicles, HPC, smart cities, VR, robotics, and more.
• INNOVATE: Hear about disruptive innovations as startups and researchers present their work.
USE CODE NVMDIERINGER TO SAVE 25% | REGISTER AT WWW.GPUTECHCONF.EU

APPENDIX

Lab Debug
Can’t display IPython Notebook?

Lab Debug
Don’t know if cell is running?
1. You should see In[*] and not In[ ] or In[<some number>].
2. Solid grey circle in the top-right of the browser window
If you only see #1 and not #2, then you need to try the following in order:
• Press the stop button on the toolbar. Try again.
• Click Kernel -> Restart. Try again.
• Save the Notebook and refresh the page. Try again.
• End the lab from the qwikLABS page and start a new instance. All work will be lost. (Please let me know before you do this)

Lab Debug
Revert to some checkpoint
