S9545 - USING THE DEEPSTREAM SDK FOR AI-BASED VIDEO ANALYTICS - PowerPoint PPT Presentation

S9545 - USING THE DEEPSTREAM SDK FOR AI-BASED VIDEO ANALYTICS Anudeep Nallamothu - NVIDIA Solutions Architect Andrew Bull - NVIDIA Solutions Architect

Realtime Streaming Video Analytics • Framework for Analyzing Video • • Understand the Basics: DeepStream SDK 3.0 • Hardware Platforms AGENDA An Overview of TensorRT 5.0 • Transfer Learning Toolkit • Build with DeepStream: Example Applications • Getting Started Resources • 2

REALTIME STREAMING VIDEO ANALYTICS 3

REALTIME STREAMING VIDEO ANALYTICS FROM EDGE TO CLOUD Parking Management Traffic Engineering Access Control Managing operations Managing Logistics Retail Analytics Optical Inspection Content Filtering 4

FRAMEWORK FOR ANALYZING VIDEO 5

FRAMEWORK FOR ANALYZING VIDEO MULTIMEDIA APIs COMPOSITE STREAM &BATCH MULTIMEDIA APIs CUDA TENSORRT, CUDA PROCESSING LOCAL DISPLAY TRACK, METADATA Metadata DECODE PRE-PROCESS DETECT, PROCESSING CLASSIFY REMOTE DISPLAY Data Analytics Perception 6

DEEPSTREAM FOR AI APPLICATION PERFORMANCE AND SCALE Perception and Analytics NEXT • Multi-GPU containerized applications • 360D cameras Solution framework Perception – edge to cloud • Dynamic stream management • Optical flow Scalability IOT services • Unified APIs across platforms • • Remote display Perception Multi-streams/ multi-DNNs • • Multi-GPU dynamic • Platform specific APIs orchestration • Custom graphs Indexed video storage and • Streams: Multi (Tesla), single(Jetson) • retrieval v3.0 • Workflow templates for full solutions v2.0 v1.0 NVIDIA Other Other Other Other DeepStream Next – POR can change 7

DEEPSTREAM 3.0 8

DEEP LEARNING FOR IVA End-to-end workflow Accelerate building and deploying heterogeneous applications for IVA use cases with TLT & DeepStream 3.0 9

DEEPSTREAM SDK 10

NVIDIA IVA PLATFORM Deploy from the edge to the cloud EDGE / ON-PREMISE CORE/CLOUD Inference Training and Inference NVR Camera Data center NVR / APPLIANCE SERVER DEEPSTREAM  TENSORRT  JETPACK QUADRO / TESLA TESLA / DGX JETSON 11

WHAT’S NEW IN DEEPSTREAM 3.0 LATEST GPUs - TESLA T4, DYNAMIC STREAM NEW PLUGINS JETSON XAVIER MANAGMENT PLUGIN LOW LEVEL LIB GPU Add, remove, modify Increased capability TensorRT 5, CUDA 10 streams on the fly and throughput EASY TO SCALE AND HIGH EFFIENCY AND CONNECT EDGE TO CLOUD THROUGHPUT WITH TLT MANAGE Deploy in Docker TLT model files are plug-n- Stream and Batch Analytics Containers play on Metadata 12

DEEPSTREAM STREAMING ARCHITECTURE IMAGE DISPLAY/ RTSP/RAW DECODE/ISP DNN(s) BATCHING TRACKING VIZULIZATION PROCESSING STORAGE DECODE, SCALE, STREAM DETECT & ON SCREEN CAPTURE CAMERA DEWARP, TRACKING OUTPUT MGMT CLASSIFY DISPLAY PROCESS CROP GigE NVDEC GPU CPU GPU GPU GPU HDMI ISP ISP VPA TC VPA VIC SATA VIC DLA CPU 13

DEEPSTREAM BUILDING BLOCK A plugin model based pipeline architecture • • Graph-based pipeline interface to allow high-level component interconnect Input + Output + PLUGIN [Metadata] Metadata • Heterogenous processing on GPU and CPU Hides parallelization and synchronization • under the hood Low Level API LOW LEVEL LIB Inherently multi-threaded • Hardware GPU 14

NVIDIA-ACCELERATED PLUGINS Plugin Name Functionality Accelerated video decoders gst-nvvideocodecs Stream aggregator - muxer and batching gst-nvstreammux TensorRT based inference for detection & classification gst-nvinfer Reference KLT tracker implementation gst-nvtracker On-Screen Display API to draw boxes and text overlay gst-nvosd Renders frames from multi-source into 2D grid array gst-tiler Accelerated X11 / EGL based renderer plugin gst-eglglessink Scaling, format conversion, rotation gst-nvvidconv Dewarping for 360 Degree camera input Gst-nvdewarp Meta data generation Gst-nvmsgconv Gst-nvmsgbroker Messaging to Cloud 15

SCALE WITH DEEPSTREAM IN DOCKER Discover GPU-Accelerated Containers Innovate in Minutes, Not Weeks Stay Up to Date https://www.nvidia.com/en-us/gpu-cloud/ 16

DEEPSTREAM IOT 17

DEEPSTREAM WITH AZURE IOT EDGE APPLIANCE Azure CLOUD DeepStream container IoT DPS IoT Edge Runtime IoT Edge Agent IoT Edge Hub IoT Hub Web Client Storage and Indexer Service Docker CUDA DRIVER IoT Edge Daemon Search & Query NVIDIA GPU HSM 18

HARDWARE PLATFORMS 19

NVIDIA T4 UNIVERSAL INFERENCE ACCELERATOR 320 Turing Tensor Cores 2,560 CUDA Cores H.264 Decode Throughput H.265 Decode Throughput 65 FP16 TFLOPS | 130 INT8 TOPS | 260 INT4 TOPS (Streams) (Streams) 16GB | 320GB/s 80 120 70 W 100 60 80 40 60 40 20 20 0 0 720p30 1080p30 4K30 720p30 1080p30 4K30 P4 T4 P4 T4 20

THE JETSON FAMILY From AI at the Edge to Autonomous Machines JETSON NANO JETSON TX2 JETSON AGX XAVIER 5 - 10W 7 – 15W 10 – 30W 0.5 TFLOPS (FP16) 1.3 TFLOPS (FP16) 10 TFLOPS (FP16) | 32 TOPS (INT8) 45mm x 70mm 50mm x 87mm 100mm x 87mm $129 AVAIABLE IN Q2 $299 - $749 $1099 AI at the edge Fully autonomous machines Multiple devices - Same software 21

JETSON NANO JETSON TX2 JETSON AGX XAVIER 128 Core Maxwell 256 Core Pascal NVIDIA Volta architecture with 512 NVIDIA CUDA GPU 0.5 TFLOPs (FP16) 1.3 TFLOPS (FP16) cores and 64 Tensor cores CPU 4 core ARM A57 @ 1.43 GHz 6 core Denver and A57 @ 2GHz 8-core ARM v8.2 64-bit CPU, 8 MB L2 + 4 MB L3 8 GB 128 bit 4 GB 128 bit LPDDR4 Memory 4 GB 64 bit LPDDR4 25.6 GB/s LPDDR4 16 GB 256-bit LPDDR4x 51 GB/s 58 GB/s Softwa Storage 16 GB eMMC 16 GB eMMC 32 GB eMMC 32 GB eMMC 5.1 re 2x1000MP/sec | 4x 4K @ 60 (HEVC) 4K @ 30 | 4x 1080p @ 30 | 8x 720p @ 30 2x 4K @ 60 | 4x 4K @ 30| 14x 1080p @ 30 Video Encode 8x 4K @ 30 (HEVC)| 16x 1080p @ 60 (HEVC) (H.264/H.265) (H.264/H.265) 32x 1080p @ 30 (HEVC) Power mode 5W|10W 7.5W|15W 7.5W|15W 10W|20W 2x1500MP/sec| 2x 8K @ 30 (HEVC) 4K @ 60 | 2x 4K @ 30 | 8x 1080p @ 30 | 16x 2x 4K @ 60| 4x 4K @ 30| 14x 1080p @ 30 Video Decode 6x 4K @ 60 (HEVC) | 12x 4K @ 30 (HEVC) 720p @ 30 | (H.264/H.265) (H.264/H.265) 26x 1080p @ 60 (HEVC) |52x 1080p @ 30 (HEVC) 16 lanes MIPI CSI-2 | 8 SLVS-EC 12 (3x4 or 4x2) MIPI CSI-2 DPHY 1.1 lanes 12 (3x4 or 6x2) MIPI CSI-2 D-PHY 1.2 lanes D-PHY 1.2 (2.5Gb/s per pair, total up to 40 Gbps) Camera (1.5 Gbps) (30 Gbps) C-PHY 1.1(2.5Gsym/s per trio, total up to 109 Gbps) WiFi/BT Requires external chip Requires external chip Onboard Requires external chip HDMI 2.0 or DP1.2 | eDP 1.4 | DSI (1 x2) HDMI 2.0 or DP 1.2 | eDP 1.4 | DSI (2 x4) Three multi-mode DP 1.2a/eDP 1.4/HDMI 2.0 a/b Display 2 simultaneous 3 simultaneous No DSI support UPHY 1 x1/2/4 PCIE | 1 USB 3.0 1+ 1 x4 or 1+1+1 x1/x2 PCIe or 3xUSB 3.0 16 lanes PCIe Gen 4 1x8 + 1x4 + 1x2 + 2x1 SATA None 1x SATA through PCIe x1 Bridge Power mode 5W|10W 7.5W|15W 7.5W|15W 10W|20W USB OTG Not supported Not Supported Not Supported Not Supported 69.6mm x 45mm 260 pin edge connector, No 87mm x 50mm 400 pin connector, Integrated 100 mm x87 mm Mechanical 22 TTP TTP 699-pin connector

JETSON NANO RUNS MODERN AI Inference 50 40 30 Img/sec 20 10 0 Resnet50 Inception v4 VGG-19 SSD SSD SSD Tiny Yolo Unet Super OpenPose Mobilenet-v2 Mobilenet-v2 Mobilenet-v2 resolution (300x300) (960x544) (1920x1080) Coral dev board (Edge TPU) Raspberry Pi 3 + Intel Neural Compute Stick 2 Jetson Nano Not supported/DNR 24

TENSORRT 26

NVIDIA TensorRT From Every Framework, Optimized For Each Target Platform TESLA T4 JETSON Xavier TensorRT TensorRT DRIVE AGX NVIDIA DLA TESLA V100 27

TENSORRT OVERVIEW High-performance Deep Learning Inference Engine for Production Deployment ONNX ONNX ONNX We Are Here 28

FRAMEWORKS GPU PLATFORMS Inference Optimizer and Runtime TESLA T4 TensorRT NVIDIA TensorRT 5 Optimizer Runtime DRIVE PX 2 Data center, embedded & automotive NVIDIA DLA In-framework support for TensorFlow TESLA V100 Support for all other frameworks and ONNX TensorRT inference server microservice with Docker and Kubernetes integration New layers and APIs New OS support for Windows and CentOS *New in TRT5 29

MODEL IMPORTING AI Researchers  Data Scientists  Example: Importing a TensorFlow model Other Frameworks Python/C++ API Python/C++ API Network Model Importer Definition API Runtime inference C++ or Python API 30 developer.nvidia.com/tensorrt

FP16, INT8 PRECISION CALIBRATION Reduced Precision Inference Performance Precision Dynamic Range (ResNet50) FP32 INT8 Difference 38 ~ +3.4x10 6,000 Top 1 38 Top 1 Training precision FP32 -3.4x10 FP16 Googlenet 68.87% 68.49% 0.38% Tensor Core No calibration required FP16 -65504 ~ +65504 5,000 VGG 68.56% 68.45% 0.11% Requires calibration INT8 -128 ~ +127 Resnet-50 73.11% 72.54% 0.57% 4,000 Images/Second Resnet-152 75.18% 74.56% 0.61% 3,000 Precision calibration for INT8 inference: 2,000 INT8  Minimizes information loss between FP32 and FP32 1,000 INT8 inference on a calibration dataset FP32 FP32  Completely automatic 0 CPU-Only P4 V100 31

S9545 - USING THE DEEPSTREAM SDK FOR AI-BASED VIDEO ANALYTICS - PowerPoint PPT Presentation

S9545 - USING THE DEEPSTREAM SDK FOR AI-BASED VIDEO ANALYTICS Anudeep Nallamothu - NVIDIA Solutions Architect Andrew Bull - NVIDIA Solutions Architect Realtime Streaming Video Analytics Framework for Analyzing Video Understand

AN INTRODUCTION TO DEEPSTREAM SDK Kaustubh Purandare March 2018 AGENDA Introduction to

Android SDK Tools in Debian Kai-Chung Yan <seamlikok@gmail.com> Why Android SDK in Debian?

NVIDIA VIDEO TECHNOLOGIES Abhijit Patait, 5/8/2017 NVIDIA Video Technologies New SDK Release

NVIDIA VIDEO TECHNOLOGIES Abhijit Patait, 3/26/2018 NVIDIA Video Technologies Overview Video

pan effects for slides in VBScript with ByteScout Image To Video SDK How To: tutorial on pan

GET TO KNOW THE NVIDIA GRID TM SDK Shounak Deshpande, NVIDIA Background NVIDIA GRID SDK AGENDA

CONVERTIGO SDK THE ULTIMATE CLIENT MOBILE API FOR CONVERTIGO MBAAS WHY CONVERTIGO SDK ?

Hello World! The Microsoft Bot Ecosystem Bot Service / Bot Builder SDK Bot Builder SDK

ARC SDK overview ARC SDK overview Martin Skou Andersen University of Copenhagen skou@nbi.ku.dk

Qt5 & Yocto: SDK and app migration Denys Dmytriyenko LCPD, Arago Project Texas Instruments

NVIDIA VIDEO TECHNOLOGIES Abhijit Patait, 3/20/2019 NVIDIA Video Technologies Overview Turing

ECP SW deployment SDK / Spack in production environments Scalable tools Workshop 7/11/18

Video Games Written and Researched by: Patrick Kania First Video Game The first Video Game made

DactyMatch Green Bit Green Bit Fingerprint Recognition Recognition Fingerprint SDK v.2.2

Itron Idea Labs Smart City Demo Presented by 1 Itron SDK Sequans MWC Demo - Description By

Luca Bedogni Dipartimento di Informatica: Scienza e Ingegneria Universit di Bologna SDK and

Distributed Shared Memory (DSM) Robert Gasparyan, Angela Gong, Judson Wilson CS 240, Spring 2015

LTTng Project Updates Outline Outline LTTng 2.11 Upcoming LTTng features LTTng 2.12

Shrug Daemon Doing DNS the Hard Way Since 2015

OpenStack Storlets Project Update, OpenStack Summit Vancouver Kota Tsuyuzaki [kota_@irc,

Project Plan Kubernetes Cluster Inspection Tool The Capstone Experience Team Google Dave Ackley

DBIS: Directory-Based Information Services A replacement for NIS and RFC2307 by Mark R.

Linux Plumbers Conference 2011 LTTng 2.0 : Application, Library and Kernel tracing within your

IPsec real end-to-end security without VPNs FUKT Computer Society Teddy Hogeborn Bjrn

S9545 - USING THE DEEPSTREAM SDK FOR AI-BASED VIDEO ANALYTICS - PowerPoint PPT Presentation

S9545 - USING THE DEEPSTREAM SDK FOR AI-BASED VIDEO ANALYTICS Anudeep Nallamothu - NVIDIA Solutions Architect Andrew Bull - NVIDIA Solutions Architect Realtime Streaming Video Analytics Framework for Analyzing Video Understand

AN INTRODUCTION TO DEEPSTREAM SDK Kaustubh Purandare March 2018 AGENDA Introduction to

Android SDK Tools in Debian Kai-Chung Yan &lt;seamlikok@gmail.com&gt; Why Android SDK in Debian?

NVIDIA VIDEO TECHNOLOGIES Abhijit Patait, 5/8/2017 NVIDIA Video Technologies New SDK Release

NVIDIA VIDEO TECHNOLOGIES Abhijit Patait, 3/26/2018 NVIDIA Video Technologies Overview Video

pan effects for slides in VBScript with ByteScout Image To Video SDK How To: tutorial on pan

GET TO KNOW THE NVIDIA GRID TM SDK Shounak Deshpande, NVIDIA Background NVIDIA GRID SDK AGENDA

CONVERTIGO SDK THE ULTIMATE CLIENT MOBILE API FOR CONVERTIGO MBAAS WHY CONVERTIGO SDK ?

Hello World! The Microsoft Bot Ecosystem Bot Service / Bot Builder SDK Bot Builder SDK

ARC SDK overview ARC SDK overview Martin Skou Andersen University of Copenhagen skou@nbi.ku.dk

Qt5 &amp; Yocto: SDK and app migration Denys Dmytriyenko LCPD, Arago Project Texas Instruments

NVIDIA VIDEO TECHNOLOGIES Abhijit Patait, 3/20/2019 NVIDIA Video Technologies Overview Turing

ECP SW deployment SDK / Spack in production environments Scalable tools Workshop 7/11/18

Video Games Written and Researched by: Patrick Kania First Video Game The first Video Game made

DactyMatch Green Bit Green Bit Fingerprint Recognition Recognition Fingerprint SDK v.2.2

Itron Idea Labs Smart City Demo Presented by 1 Itron SDK Sequans MWC Demo - Description By

Luca Bedogni Dipartimento di Informatica: Scienza e Ingegneria Universit di Bologna SDK and

Distributed Shared Memory (DSM) Robert Gasparyan, Angela Gong, Judson Wilson CS 240, Spring 2015

LTTng Project Updates Outline Outline LTTng 2.11 Upcoming LTTng features LTTng 2.12

Shrug Daemon Doing DNS the Hard Way Since 2015

OpenStack Storlets Project Update, OpenStack Summit Vancouver Kota Tsuyuzaki [kota_@irc,

Project Plan Kubernetes Cluster Inspection Tool The Capstone Experience Team Google Dave Ackley

DBIS: Directory-Based Information Services A replacement for NIS and RFC2307 by Mark R.

Linux Plumbers Conference 2011 LTTng 2.0 : Application, Library and Kernel tracing within your

IPsec real end-to-end security without VPNs FUKT Computer Society Teddy Hogeborn Bjrn

Android SDK Tools in Debian Kai-Chung Yan <seamlikok@gmail.com> Why Android SDK in Debian?

Qt5 & Yocto: SDK and app migration Denys Dmytriyenko LCPD, Arago Project Texas Instruments