S9391 GstCUDA: Easy GStreamer and CUDA Integration Eng. Daniel Garbanzo MSc. Michael Grüner GTC March 2019

Agenda ● About RidgeRun ● GStreamer Overview ● CUDA Overview ● GstCUDA Introduction ● Application Examples ● Performance Statistics ● GstCUDA Demo on TX2 ● Q&A 2

About Us ● US Company - R&D Lab in Costa Rica ● 15 years of experience ● Embedded Linux and GStreamer experts ● Custom multimedia solutions ● Digital signal/image processing ● AI and Machine Learning solutions ● System optimization: CUDA, GStreamer, OpenCL, OpenGL, OpenVX, Vulkan ● Support for embedded and resource constrained systems ● Professional services, dedicated teams and specialized tools 3

Medical Industry Automotive Industry Smart Devices Computer Vision ● Complex multimedia applications require a lot of processing resources ● GStreamer offers a flexible way for creating multimedia applications ● CUDA offers high performance accelerated processing capabilities 4

● Open source framework for audio and video applications ● Based on a pipeline architecture ● Extensible design based on plugins (more than 1000 freely available) ● Automatic format and synchronization handling ● Tools for easy prototyping Modularity Portability Flexibility 5

Basic MP4 player GStreamer Pipeline ● Each plugin represents a different processing module ● The plugins are linked and arranged in a pipeline ● Freedom to build arbitrary pipelines for different applications 6

Modular design lets you change your application easily! ● Easily change from SW to HW accelerated processing ● Easily change your application end use 7

Modular design lets you change your application easily!
Code equivalent: gst-launch-1.0 v4l2src ! videoconvert ! x265enc ! mpegtsmux ! filesink
Code equivalent: gst-launch-1.0 v4l2src ! videoconvert ! omxh265enc ! mpegtsmux ! udpsink 8
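The swap between the two pipelines above can be sketched in C++ as plain string assembly. This is only an illustration: the helper names (`build_pipeline`, `software_recording`, `hardware_streaming`) are hypothetical, the element names come from the slide (with `videoconvert` as the usual name of the conversion element), and sink properties are omitted just as they are on the slide.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Join element names into a gst-launch style description ("a ! b ! c").
std::string build_pipeline(const std::vector<std::string>& elements) {
    std::string description;
    for (std::size_t i = 0; i < elements.size(); ++i) {
        if (i > 0) description += " ! ";
        description += elements[i];
    }
    return description;
}

// Software-encoded recording to a file, element names as on the slide.
std::vector<std::string> software_recording() {
    return {"v4l2src", "videoconvert", "x265enc", "mpegtsmux", "filesink"};
}

// Same capture and muxing stages, with only the encoder and the sink swapped
// for hardware encoding and network output.
std::vector<std::string> hardware_streaming() {
    std::vector<std::string> elements = software_recording();
    elements[2] = "omxh265enc";
    elements[4] = "udpsink";
    return elements;
}
```

A description built this way could be handed to `gst_parse_launch()` in a GStreamer application; that step is left out so the sketch builds without GStreamer installed.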

GstCUDA 10

GstCUDA 11

What Does GstCUDA Solve? 12

Integration Complexities 13

Development Time
Without GstCUDA: ● Create GStreamer plugin with CUDA support: 3 months ● Generate CUDA algorithm: 10 days ● Integrate CUDA algorithm: 5 days ● Total = 3.5 months
With GstCUDA: ● Generate CUDA algorithm: 10 days ● Integrate CUDA algorithm: 0.1 day ● Total = 10.1 days
● Reduce development time ● Focus on the CUDA logic ● Minimize time to market 14

Performance Bottleneck [diagram labels: Memcpy, Memcpy] 15

Performance Bottleneck
Without GstCUDA: ● Data transfer bottlenecks cause poor performance ● Limited framerate at high resolutions
With GstCUDA: ● Efficient memory handling improves performance ● Up to 2x 4K@60fps 16

Supported Platforms ● Focused on NVIDIA embedded platforms: Jetson TX1, TX2, TX2i, Jetson AGX Xavier and Jetson Nano 17

GstCUDA Key Features 18

GstCUDA Key Features 19

Framework Overview 20

Quick Prototyping Elements 21

Cudafilter Element location = median_filter.so 22

Cudamux Element IR location = thermal_overlay.so 23
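A multi-input algorithm of the kind the cudamux element loads takes several frames and produces one. The following is a CPU-only sketch of that shape, not the thermal overlay algorithm itself, which the slides do not show. `FrameBuffer` and `mix_frames` are hypothetical names, and the per-pixel average is just a stand-in for whatever blending a real algorithm performs.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a video buffer (8-bit grayscale samples).
struct FrameBuffer {
    std::vector<std::uint8_t> pixels;
};

// Blend several equally sized inputs into one output by averaging each pixel.
// Returns false when there are no inputs or their sizes differ.
bool mix_frames(const std::vector<FrameBuffer>& inputs, FrameBuffer& output) {
    if (inputs.empty()) return false;
    const std::size_t size = inputs[0].pixels.size();
    for (const FrameBuffer& in : inputs) {
        if (in.pixels.size() != size) return false;
    }
    output.pixels.assign(size, 0);
    for (std::size_t i = 0; i < size; ++i) {
        unsigned sum = 0;
        for (const FrameBuffer& in : inputs) sum += in.pixels[i];
        output.pixels[i] = static_cast<std::uint8_t>(sum / inputs.size());
    }
    return true;
}
```

In a CUDA version each output pixel would typically be computed by its own thread; the size check stays on the host side.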

CUDA Algorithm Interface ● Make your CUDA algorithm compatible by implementing these interfaces
Cudafilter Interface:
bool open();
bool close();
bool process (const GstCudaData &inbuf, GstCudaData &outbuf);
bool process_ip (const GstCudaData &inbuf, GstCudaData &outbuf);
Cudamux Interface:
bool open();
bool close();
bool process (vector<GstCudaData> &inbufs, GstCudaData &outbuf);
bool process_ip (vector<GstCudaData> &inbufs, GstCudaData &outbuf); 24
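To make the cudafilter interface concrete, here is a minimal CPU-only sketch of a class exposing the four methods. Several things are assumptions: the slides name `GstCudaData` but do not show its fields, so the struct below is a hypothetical stand-in; `InvertFilter` is an invented example algorithm; and the in-place method assumes that both parameters refer to the same buffer, which the slides do not state.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in: the real GstCudaData layout is not shown on the slides.
struct GstCudaData {
    std::vector<std::uint8_t> data;
};

// CPU-only illustration of the four cudafilter entry points.
class InvertFilter {
public:
    bool open() { opened_ = true; return true; }
    bool close() { opened_ = false; return true; }

    // Not in place: read inbuf and write the result to a separate outbuf.
    bool process(const GstCudaData& inbuf, GstCudaData& outbuf) {
        if (!opened_) return false;
        outbuf.data.resize(inbuf.data.size());
        for (std::size_t i = 0; i < inbuf.data.size(); ++i) {
            outbuf.data[i] = static_cast<std::uint8_t>(255 - inbuf.data[i]);
        }
        return true;
    }

    // In place: overwrite the frame where it already lives.
    // Assumption: in-place calls pass the same buffer as input and output.
    bool process_ip(const GstCudaData& inbuf, GstCudaData& outbuf) {
        if (!opened_ || &inbuf != &outbuf) return false;
        for (std::uint8_t& sample : outbuf.data) {
            sample = static_cast<std::uint8_t>(255 - sample);
        }
        return true;
    }

private:
    bool opened_ = false;
};
```

A real algorithm would launch a CUDA kernel inside `process` and `process_ip` instead of looping on the CPU, and would allocate and release any device resources in `open` and `close`.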

Buffer Processing Methods process_ip (In place) process (Not in place) 25

Create Your Custom Element ● Some applications may require specialized elements ● GstCUDA provides base classes to simplify development 26

GstCUDA Framework Usage Example 27

GstCUDA Framework Summary ● The framework includes:
Quick prototyping elements: ● Generic elements to evaluate custom algorithms ● Runtime loading of the CUDA algorithms
GstCUDA API: ● Utils to handle memory interfaces ● GStreamer Unified Memory allocators ● Parent classes for different topologies
Set of examples: ● Complete GstCUDA element boilerplate ● CUDA algorithms for prototyping elements 28

GstCUDA Application Areas Examples Video 29

Industrial Applications: Border Enhancement 30

Automation Applications: Hough Transform 31

Security Applications: Motion Detection/Estimation 32

Performance Statistics 33

Varying Algorithm / Fixed Image Size Test Conditions ● Image convolution algorithm location = convolution.so ● Stressing compute capabilities ● Variable convolution kernel size ● 1080p@240fps / 1080p@60fps stream input ● Cudafilter element ● Unified Memory allocator ● Jetson TX2 platform ● Not In-place 34
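The reason a variable kernel size stresses compute can be seen in a CPU reference of 2D convolution: every output pixel costs k x k multiply-adds. The sketch below is not the `convolution.so` algorithm used in the test, which the slides do not show. The function names are hypothetical, the image is assumed to be row-major grayscale floats, and pixels outside the image are taken from the nearest edge.

```cpp
#include <algorithm>
#include <vector>

// Convolve a row-major grayscale image with a square k x k kernel (k odd).
// The kernel is flipped, as in true convolution; borders clamp to the edge.
std::vector<float> convolve(const std::vector<float>& image, int width, int height,
                            const std::vector<float>& kernel, int k) {
    std::vector<float> result(image.size(), 0.0f);
    const int radius = k / 2;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float acc = 0.0f;
            for (int ky = 0; ky < k; ++ky) {
                for (int kx = 0; kx < k; ++kx) {
                    const int sy = std::max(0, std::min(y + ky - radius, height - 1));
                    const int sx = std::max(0, std::min(x + kx - radius, width - 1));
                    const float weight = kernel[(k - 1 - ky) * k + (k - 1 - kx)];
                    acc += image[sy * width + sx] * weight;
                }
            }
            result[y * width + x] = acc;
        }
    }
    return result;
}

// Multiply-adds needed for one frame: one per kernel tap per pixel.
unsigned long long multiply_adds_per_frame(int width, int height, int k) {
    return static_cast<unsigned long long>(width) * height * k * k;
}
```

Going from a 3x3 to a 9x9 kernel multiplies the work per frame by nine while the amount of data moved stays the same, which is why this test isolates compute from data transfer.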

Varying Algorithm / Fixed Image Size Framerate Stats 35

Varying Algorithm / Fixed Image Size Processing Time Stats 36

Varying Algorithm / Fixed Image Size CPU Load Stats GPU Load Stats *baseline = simple capture pipeline (without GstCUDA) 37

Fixed Algorithm / Varying Image Size Test Conditions ● Memory copy algorithm location = memcpy.so ● Stressing data transfer ● Variable input resolution ● Cudafilter element ● Unified Memory allocator ● Jetson TX2 platform ● In-place vs. not in-place 38
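To get a feel for how much data such a test moves, the raw stream rate can be worked out from resolution and framerate. The sketch assumes I420 frames (12 bits per pixel), the format that appears in the demo pipeline caps; the slides do not state the format used in this test, and the function names are hypothetical.

```cpp
#include <cstdint>

// Bytes in one I420 frame: a full-resolution luma plane plus two chroma
// planes at quarter resolution, i.e. 1.5 bytes per pixel.
// Assumes even width and height.
std::uint64_t i420_frame_bytes(std::uint64_t width, std::uint64_t height) {
    return width * height * 3 / 2;
}

// Raw bytes per second for a stream at the given framerate.
std::uint64_t i420_bytes_per_second(std::uint64_t width, std::uint64_t height,
                                    std::uint64_t fps) {
    return i420_frame_bytes(width, height) * fps;
}
```

Under these assumptions a 1920x1080 stream at 60 fps carries 186,624,000 bytes per second and a 3840x2160 stream at 60 fps carries 746,496,000, so every extra copy of the frame adds that much memory traffic again.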

Fixed Algorithm / Varying Image Size Framerate Stats Note: Maximum Framerate limited to 245 fps by the video source 39

Fixed Algorithm / Varying Image Size Processing Time Stats 40

Fixed Algorithm / Varying Image Size CPU Load Stats GPU Load Stats *baseline = simple capture pipeline (without GstCUDA) 41

Fixed Algorithm / Varying Image Size Test Conditions ● Simple image mixing algorithm location = mixer.so ● Stressing data transfer ● Variable input resolution ● Cudamux element ● Unified Memory allocator ● In-place=True ● Jetson TX2 platform 42

Fixed Algorithm / Varying Image Size Framerate Stats Note: Maximum Framerate limited to 240fps by the video source 43

Fixed Algorithm / Varying Image Size CPU Load Stats GPU Load Stats *baseline = simple capture pipeline (without GstCUDA) 44

GstCUDA Live Demo on Jetson TX2 Sobel Filter 1080p@60fps Code equivalent: gst-launch-1.0 nvcamerasrc sensor-id=2 fpsRange=60,60 ! "video/x-raw(memory:NVMM),width=1920,height=1080,framerate=60/1,format=I420" ! nvvidconv ! "video/x-raw" ! queue ! cudafilter in-place=false location=/borders.so ! queue ! nvoverlaysink 45
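For readers unfamiliar with the Sobel filter named in the demo, here is a CPU reference of the operator on an 8-bit grayscale image. It is a sketch only and not the contents of `borders.so`, which the slides do not show. The function name is hypothetical, the gradient magnitude is approximated as |gx| + |gy| and clamped to 255, and border pixels are left at zero.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Sobel edge magnitude for a row-major 8-bit grayscale image.
std::vector<std::uint8_t> sobel(const std::vector<std::uint8_t>& img,
                                int width, int height) {
    std::vector<std::uint8_t> edges(img.size(), 0);
    for (int y = 1; y + 1 < height; ++y) {
        for (int x = 1; x + 1 < width; ++x) {
            // Read the neighbour at offset (dx, dy) from the current pixel.
            auto at = [&](int dx, int dy) {
                return static_cast<int>(img[(y + dy) * width + (x + dx)]);
            };
            // Horizontal gradient: right column minus left column.
            const int gx = (at(1, -1) + 2 * at(1, 0) + at(1, 1)) -
                           (at(-1, -1) + 2 * at(-1, 0) + at(-1, 1));
            // Vertical gradient: bottom row minus top row.
            const int gy = (at(-1, 1) + 2 * at(0, 1) + at(1, 1)) -
                           (at(-1, -1) + 2 * at(0, -1) + at(1, -1));
            edges[y * width + x] =
                static_cast<std::uint8_t>(std::min(255, std::abs(gx) + std::abs(gy)));
        }
    }
    return edges;
}
```

Each output pixel depends only on its 3x3 neighbourhood in the input, so the loop body maps naturally onto one CUDA thread per pixel. Because neighbours are read while outputs are written, a separate output buffer is the straightforward choice, which is consistent with the `in-place=false` setting in the demo pipeline.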

Resources ● GstCUDA wiki page: ○ gstcuda.ridgerun.com ● RidgeRun Website: ○ ridgerun.com ● RidgeRun Contact: ○ ridgerun.com/contact 46
