

  1. NVIDIA GIE: HIGH-PERFORMANCE GPU INFERENCE ENGINE
     Michael Andersch, 7th April 2016
     April 4-7, 2016 | Silicon Valley

  2. WHAT IS INFERENCE, ANYWAYS?
     Building a deep neural network based application:
     - Step 1: Use data to train the neural network (training)
     - Step 2: Use the neural network to process unseen data (inference)

  3. INFERENCE VS TRAINING
     How is inference different from training?
     1. No backpropagation / static weights: enables graph optimizations, simplifies memory management
     2. Tendency towards smaller batch sizes: harder to amortize weight loading and achieve high GPU utilization
     3. Reduced precision requirements: provides opportunity for bandwidth savings and accelerated arithmetic

  4. OPTIMIZING SOFTWARE FOR INFERENCE
     Extracting every bit of performance.
     - What's running on the GPU: cuDNN optimizations. Support for standard tensor layouts and major frameworks; available automatically and "for free".
     - How you use it: framework optimizations. Every last bit of performance matters, but this is challenging due to framework structure, and changes to one framework don't propagate to others.

  5. OPTIMIZING SOFTWARE FOR INFERENCE
     Challenge: Efficient small batch convolutions.
     The optimal convolution algorithm depends on the convolution layer dimensions.
     Winograd speedup over GEMM-based convolution (VGG-E layers, N=1):

     Layer:    conv 1.1  conv 1.2  conv 2.1  conv 2.2  conv 3.1  conv 3.2  conv 4.1  conv 4.2  conv 5.0
     Speedup:  2.26      2.07      2.03      1.98      1.92      1.84      1.83      1.25      0.73

     Meta-parameters (data layouts, texture memory) afford higher performance.
     Using texture memory for convolutions: 13% inference speedup (GoogLeNet, batch size 1).
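
     Slide 5's point, that the best algorithm depends on the layer's dimensions, is something cuDNN exposes directly: you can benchmark the candidate algorithms (GEMM-based, Winograd, FFT, ...) per layer and pick the fastest. A minimal sketch follows; the descriptor-setter signatures are the later cuDNN v6/v7 forms (the 2016-era v5 API differs slightly), and the layer shape is an arbitrary example.

```cuda
// Sketch: per-layer convolution algorithm selection with cuDNN.
#include <cudnn.h>
#include <cstdio>

int main() {
    cudnnHandle_t handle;
    cudnnCreate(&handle);

    // Describe one conv layer at batch size 1, e.g. 3x3 conv, 64 -> 64 channels.
    cudnnTensorDescriptor_t x, y;
    cudnnFilterDescriptor_t w;
    cudnnConvolutionDescriptor_t conv;
    cudnnCreateTensorDescriptor(&x);
    cudnnCreateTensorDescriptor(&y);
    cudnnCreateFilterDescriptor(&w);
    cudnnCreateConvolutionDescriptor(&conv);
    cudnnSetTensor4dDescriptor(x, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, 1, 64, 56, 56);
    cudnnSetFilter4dDescriptor(w, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW, 64, 64, 3, 3);
    cudnnSetConvolution2dDescriptor(conv, 1, 1, 1, 1, 1, 1,
                                    CUDNN_CROSS_CORRELATION, CUDNN_DATA_FLOAT);
    cudnnSetTensor4dDescriptor(y, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, 1, 64, 56, 56);

    // Benchmark the available algorithms and pick the fastest for exactly
    // these dimensions; for some layers that is Winograd, for others GEMM.
    cudnnConvolutionFwdAlgoPerf_t perf[8];
    int returned = 0;
    cudnnFindConvolutionForwardAlgorithm(handle, x, w, conv, y, 8, &returned, perf);
    printf("fastest algo: %d (%.3f ms)\n", (int)perf[0].algo, perf[0].time);

    cudnnDestroy(handle);
    return 0;
}
```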

  6. OPTIMIZING SOFTWARE FOR INFERENCE
     Challenge: Graph optimization.
     [Diagram: GoogLeNet inception module - the input feeds a 1x1 conv., a 1x1 conv. followed by a 3x3 conv., a 1x1 conv. followed by a 5x5 conv., and a max pool followed by a 1x1 conv.; the branch outputs are joined by a tensor concat.]

  7. OPTIMIZING SOFTWARE FOR INFERENCE
     Challenge: Graph optimization.
     [Diagram: the same module drawn at kernel granularity, between input concat and next input - each convolution is followed by separate bias and relu nodes, plus the max pool branch, so every module turns into many small operations.]

  8. OPTIMIZING SOFTWARE FOR INFERENCE
     Graph optimization: Vertical fusion.
     [Diagram: each convolution + bias + relu chain is collapsed into a single fused "CBR" node (1x1 CBR, 3x3 CBR, 5x5 CBR), shrinking the module to CBR nodes, max pool, and the concats.]
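
     Vertical fusion turns each conv -> bias -> ReLU chain into one "CBR" node so the activation tensor makes one trip through memory instead of three. The sketch below shows just the epilogue half of that idea: a single hypothetical kernel applying bias and ReLU in one pass over a convolution's NCHW output, rather than as two separate kernels each streaming the whole tensor.

```cuda
// Hypothetical fused epilogue: bias add + ReLU in one pass over the
// convolution output (NCHW layout), instead of two separate kernels.
__global__ void bias_relu_fused(float* y, const float* bias,
                                int c, int hw, int total) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < total) {
        int channel = (i / hw) % c;       // recover channel index in NCHW
        float v = y[i] + bias[channel];   // bias add
        y[i] = v > 0.0f ? v : 0.0f;       // ReLU in the same pass
    }
}
// Launched once per conv output, this halves the epilogue's memory traffic
// compared with running bias and ReLU as separate elementwise kernels.
```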

  9. OPTIMIZING SOFTWARE FOR INFERENCE
     Graph optimization: Horizontal fusion.
     [Diagram: the parallel 1x1 CBR nodes reading the same input are merged into one wider 1x1 CBR, leaving 1x1/3x3/5x5 CBR nodes, max pool, and the concats.]

  10. OPTIMIZING SOFTWARE FOR INFERENCE
      Graph optimization: Concat elision.
      [Diagram: the concat nodes are removed - each branch (1x1 CBR, 3x3 CBR, 5x5 CBR, max pool + 1x1 CBR) writes its output directly into the downstream buffer, between input and next input.]
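
      One way to picture concat elision: concatenation along channels in NCHW is just an address calculation, so if each branch writes directly into its channel slice of one pre-allocated buffer, the copy disappears. A hypothetical sketch of that buffer plan (names and the batch-1 simplification are assumptions):

```cuda
#include <cuda_runtime.h>

// Sketch: instead of materializing four branch tensors and copying them
// into a concat output, point each branch's output at its channel offset
// inside one pre-allocated buffer (NCHW, batch 1 for simplicity).
void plan_concat_elision(int hw, const int branch_channels[4],
                         float** branch_out /* out: 4 device pointers */) {
    int c_total = branch_channels[0] + branch_channels[1]
                + branch_channels[2] + branch_channels[3];
    float* concat_buf = nullptr;
    cudaMalloc(&concat_buf, (size_t)c_total * hw * sizeof(float));

    int offset_c = 0;
    for (int b = 0; b < 4; ++b) {
        branch_out[b] = concat_buf + (size_t)offset_c * hw;  // direct target
        offset_c += branch_channels[b];
    }
    // Each branch's kernels are launched with branch_out[b] as destination;
    // the concat node then costs nothing at runtime.
}
```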

  11. OPTIMIZING SOFTWARE FOR INFERENCE
      Graph optimization: Concurrency.
      [Diagram: the now-independent branches (1x1 CBR, 3x3 CBR, 5x5 CBR, max pool + 1x1 CBR) are scheduled to run concurrently between input and next input.]
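
      Once the concat is elided, the branches share only their input, so they can be issued into separate CUDA streams and small-batch kernels that individually underutilize the GPU can overlap. A minimal sketch, with a placeholder kernel standing in for each branch's real work:

```cuda
#include <cuda_runtime.h>

// Placeholder standing in for a branch's kernels (1x1/3x3/5x5/pool paths).
__global__ void branch_kernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];  // the branch's real work would go here
}

void launch_branches_concurrently(const float* in, float* out[4], int n) {
    cudaStream_t s[4];
    for (int b = 0; b < 4; ++b) cudaStreamCreate(&s[b]);

    // The four branches only depend on the shared input, so they can
    // overlap across streams when none of them fills the GPU on its own.
    for (int b = 0; b < 4; ++b)
        branch_kernel<<<(n + 255) / 256, 256, 0, s[b]>>>(in, out[b], n);

    for (int b = 0; b < 4; ++b) {
        cudaStreamSynchronize(s[b]);  // join before the (elided) concat
        cudaStreamDestroy(s[b]);
    }
}
```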

  12. OPTIMIZING SOFTWARE FOR INFERENCE
      Challenge: Effective use of cuBLAS.
      - Run GEMV instead of GEMM: small batch sizes degrade the N dimension, and the B matrix becomes narrow.
      - Pre-transpose weight matrices: allows using NN/NT GEMM, where the performance ordering is NT > NN > TN.
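
      In cuBLAS terms: at batch size 1 the activation matrix B collapses to a single column, so a fully connected layer maps to GEMV rather than a degenerate GEMM, and storing the weight matrix pre-transposed (column-major from cuBLAS's point of view) keeps the call on the non-transposed path. A sketch, with the layer dimensions and names as assumptions:

```cuda
#include <cublas_v2.h>

// Fully connected layer y = W * x at batch size 1 (illustrative sketch).
// cuBLAS is column-major, so "pre-transposing" a framework's row-major
// weight matrix means storing it column-major: the call can then use
// CUBLAS_OP_N instead of the slower transposed path.
void fc_batch1(cublasHandle_t h,
               const float* W_colmajor,  // m x k, column-major (pre-transposed)
               const float* x,           // k inputs
               float* y,                 // m outputs
               int m, int k) {
    const float alpha = 1.0f, beta = 0.0f;
    // Batch size 1 makes the activation matrix a vector: run GEMV rather
    // than a degenerate GEMM whose B matrix has a single column.
    cublasSgemv(h, CUBLAS_OP_N, m, k, &alpha, W_colmajor, m,
                x, 1, &beta, y, 1);
}
```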

  13. ACCELERATED INFERENCE ON PASCAL
      Support for fast mixed precision arithmetic.
      - Inference products will support a new dedicated vector math instruction: a multi-element dot product with 8-bit integer inputs and a 32-bit accumulator.
      - 4x the rate of equivalent FP32 operations.
      - Full-speed FP32 processing for any layers that require higher precision.
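
      The instruction described here became Pascal's dp4a operation; CUDA 8 (released after this talk) exposes it on sm_61+ hardware as the __dp4a intrinsic. A minimal sketch of an int8 dot product built on it:

```cuda
#include <cuda_runtime.h>

// Sketch: int8 dot product using Pascal's dp4a instruction via the
// __dp4a intrinsic (CUDA 8+, sm_61+). Each call multiplies four packed
// int8 pairs and adds all four products into a 32-bit accumulator,
// giving 4x the rate of the equivalent FP32 FMAs.
__global__ void int8_dot(const int* a, const int* b, int n4, int* result) {
    int acc = 0;
    for (int i = threadIdx.x; i < n4; i += blockDim.x)
        acc = __dp4a(a[i], b[i], acc);   // a[i], b[i] each pack 4 int8 values
    atomicAdd(result, acc);
}
```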

  14. BUT WHO WILL IMPLEMENT IT?
      Introducing NVIDIA GIE: GPU Inference Engine.
      [Diagram: GIE's OPTIMIZATION ENGINE produces a STRATEGY that the EXECUTION ENGINE runs.]

  15. GPU INFERENCE ENGINE WORKFLOW
      [Diagram: trained models from DIGITS / training tools feed the OPTIMIZATION ENGINE, which emits a STRATEGY that the EXECUTION ENGINE runs at deployment.]
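
      For orientation, this is roughly how that workflow looked in the early GIE/TensorRT C++ samples: a build phase that parses the trained model and optimizes it into an engine, then a deploy phase that executes it. Names and signatures changed across releases, and gLogger, the file paths, the "prob" output blob, and buffers are assumptions here, so treat this purely as a sketch:

```cuda
// Sketch of the GIE build-then-deploy flow, modeled on the early
// GIE/TensorRT C++ samples; names and signatures vary across releases.
#include "NvInfer.h"
#include "NvCaffeParser.h"
using namespace nvinfer1;
using namespace nvcaffeparser1;

ICudaEngine* buildEngine(ILogger& gLogger) {
    // Build phase (optimization engine): import the trained Caffe model and
    // let the builder emit an optimized engine ("strategy") for this GPU.
    IBuilder* builder = createInferBuilder(gLogger);
    INetworkDefinition* network = builder->createNetwork();
    ICaffeParser* parser = createCaffeParser();
    auto blobs = parser->parse("deploy.prototxt", "net.caffemodel",
                               *network, DataType::kFLOAT);
    network->markOutput(*blobs->find("prob"));  // "prob" is an assumption
    builder->setMaxBatchSize(1);
    builder->setMaxWorkspaceSize(1 << 20);
    return builder->buildCudaEngine(*network);
}

void infer(ICudaEngine* engine, void** buffers /* device in/out pointers */) {
    // Deploy phase (execution engine): run the optimized plan.
    IExecutionContext* context = engine->createExecutionContext();
    context->execute(/*batchSize=*/1, buffers);
}
```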

  16. SUMMARY
      Inference on the GPU (Tesla M4 Hyperscale Accelerator).
      - GPUs are a great platform for inference.
      - Efficiency: great performance/watt. Scalability: from 3W to 300W.
      - GPU-based inference affords the same performance in a much tighter power envelope, freeing up the CPU to do other work.
      Questions: mandersch@nvidia.com, or find me after the talk!

  17. THANK YOU
      April 4-7, 2016 | Silicon Valley
      JOIN THE NVIDIA DEVELOPER PROGRAM AT developer.nvidia.com/join
