VISIONWORKS: A CUDA ACCELERATED COMPUTER VISION LIBRARY (S6783, Elif Albuz)



SLIDE 1

April 4-7, 2016 | Silicon Valley

S6783 Elif Albuz, April 4, 2016

VISIONWORKS™

A CUDA ACCELERATED COMPUTER VISION LIBRARY

SLIDE 2

AGENDA

  • Motivation
  • Introduction to VisionWorks™
  • VisionWorks™ Software Stack
  • VisionWorks™ Programming Model
  • Conclusion
  • Demo

SLIDE 3

COMPUTER VISION

  • Intelligent Video Analytics
  • Drones
  • Autonomous Driving
  • Robotics
  • Augmented Reality

SLIDE 4

COMPUTER VISION

SLIDE 5

COMPUTER VISION APP DEVELOPMENT

  • Concept
  • Reference Implementation
  • Product
  • Port to target & optimize

SLIDE 6

VISIONWORKS™ MOTIVATION

  • Deliver high-performance, robust computer vision primitives
  • Ease development of computer vision applications on Tegra platforms
  • Accelerate the prototype-to-product cycle

Illustrated examples: Depth Map, Optical Flow, Corner detection

SLIDE 7

VISIONWORKS™ AT A GLANCE

  • CUDA accelerated library (OpenVX primitives + NVIDIA extensions + Plus Algorithms)
  • Flexible framework for seamlessly adding user-defined primitives
  • Interoperability with OpenCV
  • Thread-safe API
  • Documentation, tutorials, and sample software pipelines that teach use of primitives and framework

SLIDE 8

VISIONWORKS™ SUPPORTED PLATFORMS

  • Automotive: Jetson TK1 Pro, Drive PX, Drive PX2
  • Embedded: Jetson TK1, Jetson TX1
  • Desktop: Ubuntu Linux 14.04, Windows 8

SLIDE 9

VISIONWORKS™ TOOLKIT SOFTWARE STACK

Diagram blocks:

  • VisionWorks Source Samples: Feature Tracking, Hough Transform, Stereo Depth Extraction, Camera Hist Equalization, ...
  • NVXIO Multimedia Abstraction
  • VisionWorks-Plus: VisionWorks SfM, VisionWorks Object Tracker, ...
  • VisionWorks Core Library
  • Source Samples
  • OpenVX™ Framework & Primitives (Khronos)
  • NVIDIA VisionWorks Framework & Primitive Extensions (NVIDIA)
  • VisionWorks CUDA API
  • CUDA Acceleration Framework

SLIDE 10

VISIONWORKS™ PRIMITIVES

IMAGE ARITHMETIC: Absolute Difference, Accumulate Image, Accumulate Squared, Accumulate Weighted, Add / Subtract / Multiply +, Channel Combine, Channel Extract, Color Convert +, CopyImage, Convert Depth, Magnitude, MultiplyByScalar, Not / Or / And / Xor, Phase, Table Lookup, Threshold

FLOW & DEPTH: Median Flow, Optical Flow (LK) +, Semi-Global Matching, Stereo Block Matching, IME Create Motion Field, IME Refine Motion Field, IME Partition Motion Field

GEOMETRIC TRANSFORMS: Affine Warp +, Warp Perspective +, Flip Image, Remap, Scale Image +

FILTERS: BoxFilter, Convolution, Dilation Filter, Erosion Filter, Gaussian Filter, Gaussian Pyramid, Laplacian3x3, Median Filter, Scharr3x3, Sobel 3x3

FEATURES: Canny Edge Detector, FAST Corners +, FAST Track, Harris Corners +, Harris Track, Hough Circles, Hough Lines

ANALYSIS: Histogram, Histogram Equalization, Integral Image, Mean Std Deviation, Min Max Locations

Legend: All OpenVX Primitives; NVIDIA extension primitives; + = type/mode extension by NVIDIA

SLIDE 11

VISIONWORKS™ PRIMITIVES

  • VisionWorks primitives are CUDA optimized (except the MedianFlow and FindHomography extensions).
  • 85% of the VisionWorks OpenVX API is also accelerated with NEON. The NEON-optimized primitives are listed in a table in the VisionWorks Toolkit Reference (go to "VisionWorks API" -> "NVIDIA Extensions API" -> "Vision Primitives API").
  • Primitive acceleration with VisionWorks
    – Up to 92x speedup compared to OpenCV CPU kernels on Drive PX (average 8x)
    – Up to 13x speedup compared to OpenCV CUDA kernels on Drive PX (average 2x)

(Measured on Drive PX: OS='V4L', Linux Kernel='3.18.21-tegra-g06aec38', CPU Rate='1632 MHz', GPU Rate='844 MHz', EMC Rate='1600 MHz')

SLIDE 12

VISIONWORKS™ SAMPLE APPLICATIONS

  • Feature Tracker
  • Stereo Depth Extraction
  • OpenCV-NPP-OpenVX Interop
  • Hough Lines & Circles
  • Video stabilization
  • Iterative Motion Estimation/Flow and other platform-specific samples (available only on certain platforms)
  • Camera Capture, OpenGL interop, Video playback

SLIDE 13

VISIONWORKS SAMPLE APPLICATIONS

NVXIO MULTIMEDIA ABSTRACTION

NVXIO diagram blocks:

  • Camera input (CSI), ISP & Camera Processing
  • Video/image file input, Streamed video/image input
  • Image/Video Decode, Image/Video Encode
  • Interop/EGLStreams
  • Vision processing (CUDA)
  • GFX Render

SoC diagram blocks: CPU COMPLEX (Multi-core ARM v8), GPU, SECURITY ENGINE, 2D ENGINE (VIC), VIDEO ENCODER, VIDEO DECODER, AUDIO ENGINE (APE), SAFETY ENGINE (SCE), IMAGE PROC (ISP), SAFETY MANAGER (HSM), BOOT PROC (BPMP), CAN PROC (SPE), I/O

SLIDE 14

VISIONWORKS™ PLUS ALGORITHMS

  • Structure From Motion
  • Object Tracker

SLIDE 15

Programming with VisionWorks Library

SLIDE 16

VISIONWORKS™ PROGRAMMING MODEL

  • VisionWorks OpenVX™ Immediate Mode: standard-specified heterogeneous compute API with individual function calls
  • VisionWorks OpenVX™ Graph Mode: heterogeneous compute API with graph optimizations; extensible with user-defined nodes
  • VisionWorks CUDA API: direct CUDA API for advanced CUDA developers

SLIDE 17

VISIONWORKS OPENVX™ IMMEDIATE MODE

VIDEO STABILIZATION SAMPLE

  • The OpenVX immediate mode API lets developers port their applications easily.
  • OpenVX immediate mode API calls are prefixed with "vxu".
  • The OpenCV video stabilization algorithm was ported to VisionWorks Immediate Mode.

Diagram blocks: OpenCV image source, cv::Mat to vx_image, Color Conversion, Image Pyramid, Feature detection, Optical Flow, Process points & Find Homography, Warp Perspective, Stabilized frames

SLIDE 18

VISIONWORKS OPENVX™ IMMEDIATE MODE

VIDEO STABILIZATION SAMPLE

Performance boost: the video stabilization application is accelerated by 2.6x (including the overhead of the cv::Mat to vx_image conversions).

Diagram: the video stabilization pipeline from Slide 17, annotated with per-stage speedups.

Per-stage speedups shown in the diagram: 0.6x, 1.4x, 1.7x, 4.9x, 2.3x, 4.6x
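
How six per-stage speedups combine into one application-level figure depends on how much time each stage takes. The sketch below is an illustration only. The six speedups are the values from the slide, but the baseline stage times are invented placeholders (the slides do not give them), and `Stage` and `overall_speedup` are hypothetical names. It contrasts the plain arithmetic mean of the stage speedups, which is about 2.58x and rounds to the quoted 2.6x, with a time-weighted end-to-end speedup. The slides do not state which way the 2.6x was obtained.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    baseline_ms: float  # hypothetical baseline time, NOT taken from the slides
    speedup: float      # per-stage speedup; values below 1 mean a slowdown


def overall_speedup(stages):
    """Return total baseline time divided by total accelerated time."""
    before = sum(s.baseline_ms for s in stages)
    after = sum(s.baseline_ms / s.speedup for s in stages)
    return before / after


# The six speedups are the values from the slide; the baseline times are
# made-up placeholders chosen only to illustrate the arithmetic.
stages = [
    Stage(1.0, 0.6), Stage(2.0, 1.4), Stage(3.0, 1.7),
    Stage(6.0, 4.9), Stage(4.0, 2.3), Stage(4.0, 4.6),
]

mean_of_speedups = sum(s.speedup for s in stages) / len(stages)
print(f"arithmetic mean of stage speedups: {mean_of_speedups:.2f}x")
print(f"time-weighted overall speedup:     {overall_speedup(stages):.2f}x")
```

With different baseline times the time-weighted result changes while the arithmetic mean does not, so a stage that runs slower than before (0.6x) can matter a great deal or hardly at all depending on its share of the runtime.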

SLIDE 19

VISIONWORKS OPENVX™ GRAPH MODE

VIDEO STABILIZATION SAMPLE

OpenVX graph mode API calls are prefixed with "vx". The OpenVX graph enables advanced optimizations:

  • Buffer reuse, kernel fusion
  • Efficient use of streaming and CUDA textures
  • Automatic scheduling across processing units based on various factors (safety, performance, ...)
  • Tiling and pipelining vision functions at sub-frame level
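
Why a graph can optimize where individual calls cannot is easier to see in a toy model. The sketch below is not the OpenVX or VisionWorks API: `ToyGraph`, `run_immediate`, and the three pixel operations are hypothetical names, and images are plain Python lists. It only mirrors the verify-then-process life cycle that OpenVX graphs follow (`vxVerifyGraph`, then `vxProcessGraph`). Because the whole pipeline is known at verify time, the toy graph fuses three per-pixel operations into a single pass with no intermediate images, while the immediate-style version makes one full pass per call.

```python
class ToyGraph:
    """Toy deferred-execution graph; NOT the OpenVX or VisionWorks API."""

    def __init__(self):
        self.nodes = []      # per-pixel functions, in execution order
        self.fused = None    # single fused function, built by verify()
        self.passes_run = 0  # counts full-image passes, for illustration

    def add_node(self, pixel_fn):
        self.nodes.append(pixel_fn)

    def verify(self):
        """Plan once: fuse all per-pixel nodes into one function."""
        if not self.nodes:
            raise ValueError("graph has no nodes")
        fns = tuple(self.nodes)

        def fused(pixel):
            for fn in fns:
                pixel = fn(pixel)
            return pixel

        self.fused = fused

    def process(self, image):
        """Run the fused plan: one pass, no intermediate images."""
        if self.fused is None:
            raise RuntimeError("call verify() before process()")
        self.passes_run += 1
        return [self.fused(p) for p in image]


def run_immediate(image, pixel_fns):
    """Immediate style: every call runs now and builds a full intermediate."""
    passes = 0
    for fn in pixel_fns:
        image = [fn(p) for p in image]
        passes += 1
    return image, passes


def brighten(p):
    return min(p + 10, 255)


def invert(p):
    return 255 - p


def threshold(p):
    return 255 if p > 128 else 0


frame = [0, 50, 120, 200, 250]  # hypothetical 5-pixel grayscale image
ops = [brighten, invert, threshold]

immediate_out, immediate_passes = run_immediate(frame, ops)

graph = ToyGraph()
for op in ops:
    graph.add_node(op)
graph.verify()                    # done once, up front
graph_out = graph.process(frame)  # repeated for every frame in a real loop

print(immediate_out == graph_out, immediate_passes, graph.passes_run)
```

Both styles produce the same pixels; the difference is that the graph version pays the planning cost once and then touches each pixel a single time per frame.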

Diagram blocks: Image Source, Color Conversion, Image Pyramid, Feature detection, Optical Flow, Process points & Find Homography, Warp Perspective, Stabilized frames

SLIDE 20

VISIONWORKS OPENVX™ GRAPH MODE

VIDEO STABILIZATION SAMPLE

Performance boost: the video stabilization application is further accelerated compared to immediate mode.

Diagram: the graph-mode video stabilization pipeline from Slide 19.

SLIDE 21

VISIONWORKS CUDA API

FEATURE TRACKING SAMPLE

The VisionWorks CUDA API gives developers low-level access. The developer manages:

  • Data allocations and transfer
  • Scheduling and pipelining
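
One common way to handle allocation by hand in a frame-to-frame tracking loop is to allocate two buffers once and swap their roles every frame, so the previous frame stays available without any per-frame allocation. The sketch below is a toy model of that pattern under stated assumptions: `PingPongBuffers` and `frame_difference` are hypothetical names, host `bytearray` objects stand in for CUDA device buffers, and a sum of absolute differences stands in for a real tracking kernel. It does not use the VisionWorks CUDA API.

```python
class PingPongBuffers:
    """Two fixed-size buffers allocated once and swapped every frame."""

    def __init__(self, num_pixels):
        self.num_pixels = num_pixels
        self.prev = bytearray(num_pixels)  # previous frame
        self.curr = bytearray(num_pixels)  # frame being processed
        self.has_prev = False

    def load(self, pixels):
        """Copy a new frame into the current buffer without reallocating."""
        if len(pixels) != self.num_pixels:
            raise ValueError("frame size does not match the buffers")
        self.curr[:] = pixels

    def swap(self):
        """The current frame becomes the previous one for the next frame."""
        self.prev, self.curr = self.curr, self.prev
        self.has_prev = True


def frame_difference(prev, curr):
    """Stand-in for a tracking kernel: sum of absolute pixel differences."""
    return sum(abs(a - b) for a, b in zip(prev, curr))


# Hypothetical 4-pixel grayscale frames.
frames = [
    bytes([10, 10, 10, 10]),
    bytes([12, 9, 10, 14]),
    bytes([12, 9, 11, 14]),
]

buffers = PingPongBuffers(num_pixels=4)
buffer_ids = {id(buffers.prev), id(buffers.curr)}
motion = []

for frame in frames:
    buffers.load(frame)        # "transfer" step
    if buffers.has_prev:       # two frames are needed for a comparison
        motion.append(frame_difference(buffers.prev, buffers.curr))
    buffers.swap()             # reuse the same storage for the next frame

print(motion)
```

The same two objects serve every frame, which is the point of the pattern: allocation cost is paid once, outside the processing loop.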

Diagram blocks: Camera/image/video input data, RGB frame (CUDA buffer), nvxcuColorConvert, YUV frame, nvxcuChannelExtract, Gray frame, nvxcuGaussianPyramid, nvxcuHarrisTrack, nvxcuOpticalFlowPyrLK, Array of keypoints, Rendering/Output

SLIDE 22

VISIONWORKS™ API SELECTION

  • VisionWorks OpenVX™ Immediate Mode: quick port from other libraries; ability to reassign CPU and GPU tasks based on performance
  • VisionWorks OpenVX™ Graph Mode: let the graph manager hide overheads, optimize, and manage data; ability to reassign CPU and GPU tasks based on performance
  • VisionWorks CUDA API: low-level CUDA API access for advanced CUDA developers

SLIDE 23

DEBUGGING WITH VISIONWORKS™

Enable VisionWorks debug markers with “export NVX_PROF=nvtx”

SLIDE 24

VISIONWORKS™ DOCUMENTATION

Installed location: /usr/share/visionWorks/docs

SLIDE 25

VISIONWORKS™ FACTS

  • First Khronos OpenVX™ 1.0 compliant library (Jan 2015)
  • VisionWorks enables key demos (CES'16 and more at GTC)
  • 27K downloads (embedded) since release in Nov 2015, plus installed by default on all automotive platforms

Chart: Weekly VisionWorks downloads for various platforms

SLIDE 26

CONCLUSION

  • VisionWorks Toolkit delivers multiple levels of API
    – OpenVX Immediate Mode, OpenVX Graph Mode, VisionWorks CUDA API
  • Heterogeneous API enables switching from GPU to CPU
    – This is very powerful, reducing productization time
  • Delivers high performance
    – Offers significant speedup over CUDA-optimized OpenCV functions
  • Adopts native media APIs on Tegra platforms and delivers ready-to-use code samples

Related sessions:

  • S6739: VisionWorks™ Toolkit Programming Tutorial (Room LL20A)
  • L6129: VisionWorks™ Toolkit LAB Session (Room 210C)
  • H6115: Designing Computer Vision Applications with VisionWorks™ (Pod B)

SLIDE 27

RESOURCES & USEFUL LINKS

  • http://www.embedded-vision.com/
  • https://www.khronos.org/openvx/
  • https://developer.nvidia.com/embedded/visionworks
  • VisionWorks Webinars: https://developer.nvidia.com/embedded/learn/tutorials

SLIDE 28

VISIONWORKS WITH DEEP LEARNING DEMO: FULLY CONVOLUTIONAL NETWORK

[1] Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.

[2] "Efficient Convolutional Patch Networks for Scene Understanding." CVPR Workshop on Scene Understanding (CVPR-WS). 2015.

[3] M. Cordts, M. Omran, S. Ramos, T. Scharwächter, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele. "The Cityscapes Dataset." CVPR Workshop on The Future of Datasets in Vision. 2015.

SLIDE 29

DEEP LEARNING & VISION DEMO: FULLY CONVOLUTIONAL NETWORK

[1] Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.

[2] "Efficient Convolutional Patch Networks for Scene Understanding." CVPR Workshop on Scene Understanding (CVPR-WS). 2015.

SLIDE 30

Introduction VisionWorks API OpenVX Sample Overview

SLIDE 31

VISIONWORKS™ Sample Applications

  • Source Samples with multimedia I/O: Histogram Eq w/ Camera input, Feature tracking with compressed images, Hough Lines with decoded video, ...
  • NVXIO (Multimedia Abstraction)
  • Platform Software Stack (Multimedia, Interop, GL, UI, System)

SLIDE 32

PLATFORMS & MULTIMEDIA API

Platform | Camera | Decode | Interop | Render | Encode
Android | Android Camera HAL v3.0 | Android API | CUDA-OpenGL interop? | OpenGLES 3.0 | (?)
Vibrante | NvMedia capture | NvMedia + Gst NvMedia h264 ES | EGLStreams | OpenGLES (GLFW) | Gst
Linux4Tegra | Gst-capture | Gst+OpenMAX | EGLStreams | OpenGLES | Gst+OpenMAX
Ubuntu Linux 14.04 | V4L through OpenCV4Tegra | Gst+VDPAU | CUDA-OpenGL Interop | OpenGL | Gst
Windows x64 | V4W/OpenCV | NVCUVID (Gst?) | CUDA-OpenGL Interop | OpenGL | FFmpeg/OpenCV

Gst = GStreamer
