PROGRAMMING TUTORIAL, Thierry Lepley, April 4th 2016



slide-1
SLIDE 1

April 4-7, 2016 | Silicon Valley

Thierry Lepley, April 4th 2016

PROGRAMMING TUTORIAL

slide-2
SLIDE 2

2 NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.

TUTORIAL GOAL

Intermediate Tutorial for Developers

  • Understand the philosophy of the API
  • Understand the main features of the API
  • Start developing with VisionWorks

Extra Credit: come and ask more questions at the VisionWorks hangout (H6115)

slide-3
SLIDE 3


INTRODUCTION

slide-4
SLIDE 4

VISIONWORKS API

What Does It Give Access To?

  • Core data objects: images, arrays, pyramids, etc.
  • Execution framework: graphs, nodes, delays, etc.
  • Computer vision primitives:
      • Image filtering functions
      • Image arithmetic and analysis
      • Geometric transformations
      • Feature extraction and tracking
      • Depth and flow
  • User extensibility: user kernels
  • CUDA interop

[Diagram: color convert → Gaussian pyramid → pyrLK optical flow, with pyramids pyr 0 / pyr -1 and point arrays pts 0 / pts -1]

slide-5
SLIDE 5

VISIONWORKS SOFTWARE STACK

[Diagram, layers from top to bottom; the legend distinguishes NVIDIA, Khronos and User components]

  • Computer Vision Application
  • Extended OpenVX™ API and low-level NVXCU API (alpha)
  • OpenVX Framework and Primitives; VisionWorks Framework and Primitive Extensions; CUDA Interop
  • CUDA Acceleration Framework
  • Tegra K1/X1, Kepler/Maxwell GPU

slide-6
SLIDE 6

Overview

Open consortium creating a royalty-free, open standard.

Main OpenVX goals:

  1. Define a subset of relevant primitives and image/data formats
  2. Enable acceleration on modern heterogeneous architectures
  3. Provide portability and target performance portability across systems

slide-7
SLIDE 7

Timeline

  • Early 2012: OpenVX working group formed
  • October 2014: OpenVX 1.0 released
  • January 2015: first conformant implementation
  • June 2015: OpenVX 1.0.1 released
  • November 2015: first public implementation

slide-8
SLIDE 8

AGENDA

  • Programming Basics
  • Efficient IO
  • Graph and Delay

slide-9
SLIDE 9

AGENDA

  • Programming Basics
      • General Philosophy
      • Primitives
      • Data Objects
      • Code Example

slide-10
SLIDE 10

AGENDA

  • Programming Basics
      • General Philosophy
      • Primitives
      • Data Objects
      • Code Example

slide-11
SLIDE 11

VISIONWORKS: C API

  • Can interop with any language
  • No portability issue across compilers

[Diagram: Java, C++ and C applications call the C API; the API itself can have a C or a C++ implementation]

slide-12
SLIDE 12

CONTEXT

An OpenVX World

  • Needs to be created first
  • Objects are created in a context

    vx_context context = vxCreateContext();
    vx_image img = vxCreateImage(context, 640, 480, VX_DF_IMAGE_RGB);

slide-13
SLIDE 13

OBJECTS

Reference Counted

  • The application gets object references
  • An object is not destroyed until ref_count == 0
  • Safe memory management

    vx_image img = vxCreateImage(context, ...);
    vx_graph graph = vxCreateGraph(context);
    vxBox3x3Node(graph, img, out);
    vxReleaseImage(&img);

[Diagram: in the application world, "vx_image img" and "vx_graph graph" are references to an Image Object and a Graph Object that live in the VisionWorks world (context)]
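The reference-counting rule on this slide can be sketched in plain C. This is only an illustration of the idea, not how VisionWorks is implemented: `demo_object`, `demo_create`, `demo_retain`, `demo_release` and `demo_live_objects` are hypothetical names, and the only thing borrowed from the API is the calling convention of `vxReleaseImage(&img)`, which takes the address of the handle.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an object living in the "VisionWorks world". */
typedef struct {
    int ref_count;   /* number of references currently held */
    int payload;     /* placeholder for the real object data */
} demo_object;

/* Number of objects currently alive, tracked only for this demonstration. */
static int demo_live_objects = 0;

/* Create an object; the caller receives the first reference. */
static demo_object *demo_create(int payload)
{
    demo_object *obj = malloc(sizeof *obj);
    if (obj == NULL) return NULL;
    obj->ref_count = 1;
    obj->payload = payload;
    demo_live_objects++;
    return obj;
}

/* Take an additional reference, as something that keeps using the object would. */
static demo_object *demo_retain(demo_object *obj)
{
    assert(obj != NULL && obj->ref_count > 0);
    obj->ref_count++;
    return obj;
}

/* Drop one reference and clear the caller's handle. The object is freed
 * only when the last reference goes away. */
static void demo_release(demo_object **handle)
{
    assert(handle != NULL && *handle != NULL);
    demo_object *obj = *handle;
    *handle = NULL;
    if (--obj->ref_count == 0) {
        free(obj);
        demo_live_objects--;
    }
}
```

In this sketch, releasing the application's handle while another holder still has a reference leaves the object alive, which is the behavior the slide describes for an image still used by a graph.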

slide-14
SLIDE 14

OBJECT REFERENCES

vx_reference

  • One reference type per object type: vx_image, vx_array, etc.
  • Some functions work on any object reference: cast to the generic vx_reference type

    vx_status status = vxGetStatus((vx_reference)array);
    vxSetParameterByIndex(node, 0, (vx_reference)input_image);

    vx_array array = vxCreateArray(context, ...);
    vx_image img = vxCreateImage(array, ...);  // Compile time error

slide-15
SLIDE 15

ERROR MANAGEMENT

Status Code

  • Most API calls return a vx_status code
  • Object creation: use vxGetStatus to check the object

    if (vxuColorConvert(context, input, output) != VX_SUCCESS) { /* Error */ }

    vx_image img = vxCreateImage(context, 640, 480, VX_DF_IMAGE_RGB);
    if (vxGetStatus((vx_reference)img) != VX_SUCCESS) { /* Error */ }

slide-16
SLIDE 16

ERROR MANAGEMENT

Textual Information: Log Callback

  • Registered in a context
  • Called each time an error occurs

    void logCallback(vx_context c, vx_reference r, vx_status s, const vx_char message[])
    {
        /* Do something */
    }

    vxRegisterLogCallback(context, logCallback, vx_false_e);

slide-17
SLIDE 17

THREAD SAFETY

  • Functions: the same API function can be called concurrently from multiple threads
  • Objects: a context and its objects can be shared across threads
  • The application must ensure there is no data race (e.g. with synchronization)

[Diagram: threads T1 and T2 read and write the same image of a shared context; a warning sign marks the conflicting accesses]

slide-18
SLIDE 18

ANY QUESTIONS SO FAR?

slide-19
SLIDE 19

AGENDA

  • Programming Basics
      • General Philosophy
      • Primitives
      • Data Objects
      • Code Example

slide-20
SLIDE 20

COMPUTER VISION PRIMITIVES

IMAGE ARITHMETIC
  Absolute Difference, Accumulate Image, Accumulate Squared, Accumulate Weighted, Add / Subtract / Multiply +, Channel Combine, Channel Extract, Color Convert +, CopyImage, Convert Depth, Magnitude, Not / Or / And / Xor, Phase, Table Lookup, Threshold

FLOW & DEPTH
  Median Flow, Optical Flow (LK) +, Semi-Global Matching, Stereo Block Matching, IME Create Motion Field, IME Refine Motion Field, IME Partition Motion Field

GEOMETRIC TRANSFORMS
  Warp Affine +, Warp Perspective +, Flip Image, Remap, Scale Image +

FILTERS
  BoxFilter, Convolution, Dilation Filter, Erosion Filter, Gaussian Filter, Gaussian Pyramid, Laplacian 3x3, Median 3x3, Scharr 3x3, Sobel 3x3

FEATURES
  Canny Edge Detector, Fast Corners +, Fast Track, Harris Corners +, Harris Track, Hough Circles, Hough Lines

ANALYSIS
  Histogram, Histogram Equalization, Integral Image, Mean Std Deviation, Min Max Locations

Legend: + standard with NVIDIA extensions; NVIDIA proprietary (marking lost in extraction)

slide-21
SLIDE 21

PRIMITIVES EXECUTION

A primitive can be executed in 2 ways:

  • Immediate mode
  • Graph mode

slide-22
SLIDE 22

PRIMITIVES EXECUTION

Immediate Mode

  • Blocking calls, similar to the OpenCV usage model
  • Prefixed with 'vxu'
  • Useful for fast prototyping

    // 3x3 box filter
    vxuBox3x3(context, src0, tmp);
    // Absolute difference of two images
    vxuAbsDiff(context, tmp, src1, dest);

slide-23
SLIDE 23

PRIMITIVES EXECUTION

Graph Mode

  • Workload given ahead of time
  • More optimization opportunities
  • Good fit with video stream processing

    vx_graph graph = vxCreateGraph(context);
    // Create nodes and check the graph ahead of time (errors detected here)
    vxBox3x3Node(graph, src0, tmp);
    vxAbsDiffNode(graph, tmp, src1, dest);
    vxVerifyGraph(graph);
    // Execute the graph at runtime
    vxProcessGraph(graph);

slide-24
SLIDE 24

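BORDER MANAGEMENT

Supported Modes

[Diagram: a 3x3 box filter applied at the image border, where some input pixels fall outside the image]

  • Undefined (default): the values at the border are undefined ('?')
  • Constant (n): pixels outside the image take the constant value n
  • Replicate: pixels outside the image take the value of the nearest image pixel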

slide-25
SLIDE 25

BORDER MODES

API

  • Enum: VX_BORDER_MODE_[UNDEFINED | CONSTANT | REPLICATE]
  • Immediate execution: context attribute (state)
  • Graph: node attribute

    // Immediate mode
    vx_border_mode_t mode = { VX_BORDER_MODE_CONSTANT, 0 };
    vxSetContextAttribute(context, VX_CONTEXT_ATTRIBUTE_IMMEDIATE_BORDER_MODE, &mode, sizeof(mode));
    vxuBox3x3(context, src, dest);

    // Graph mode
    vx_border_mode_t mode = { VX_BORDER_MODE_CONSTANT, 0 };
    vx_node node = vxBox3x3Node(graph, src, tmp);
    vxSetNodeAttribute(node, VX_NODE_ATTRIBUTE_BORDER_MODE, &mode, sizeof(mode));

slide-26
SLIDE 26

TARGET COMPUTE DEVICE

Functionality

  • Most primitives have both CPU and GPU implementations
  • Target controllable with the API
  • Default: automatic assignment
  • Context GPU device ID controllable

[Diagram: primitive execution dispatched to the CPU, GPU 1 or GPU 2]

slide-27
SLIDE 27

TARGET COMPUTE DEVICE

API

  • Options: NVX_DEVICE_GPU, NVX_DEVICE_CPU, NVX_DEVICE_ANY
  • Immediate execution: context attribute (state)
  • Graph: node setter function

    // Immediate mode
    nvx_device_type_e target = NVX_DEVICE_GPU;
    vxSetContextAttribute(context, VX_CONTEXT_ATTRIBUTE_IMMEDIATE_TARGET_DEVICE, &target, sizeof(target));
    vxuBox3x3(context, src, dest);

    // Graph mode
    vx_node node = vxBox3x3Node(graph, src, tmp);
    nvxSetNodeTargetDevice(node, NVX_DEVICE_GPU);

slide-28
SLIDE 28

ANY QUESTIONS SO FAR?

slide-29
SLIDE 29

AGENDA

  • Programming Basics
      • General Philosophy
      • Primitives
      • Data Objects
          a) Data Object philosophy
          b) Focus: Images
          c) Focus: Pyramids
          d) Focus: Arrays
      • Code Example

slide-30
SLIDE 30

DATA OBJECTS

  • Images
      • Image: vx_image +
      • Image pyramid: vx_pyramid +
  • Arrays
      • Array: vx_array +
      • Distribution: vx_distribution +
      • Look-up table: vx_lut +
  • Matrices
      • Matrix: vx_matrix +
      • Convolution: vx_convolution +
      • Remap: vx_remap +
  • Scalars
      • Scalar: vx_scalar +
      • Threshold: vx_threshold +

Legend: + standard OpenVX object with NVIDIA extensions (e.g. access from CUDA)

slide-31
SLIDE 31

DATA OBJECT ACCESS

Semi-opaque Objects

  • No permanent pointer to data content
  • VisionWorks optimizes the memory management
  • Data is synchronized with the application only when needed
  • Data synchronization between CPU and GPU is minimized

    vxAccessImagePatch(img, &rect, 0, &addr, &ptr, VX_READ_AND_WRITE);
    // Access data at address 'ptr'
    vxCommitImagePatch(img, &rect, 0, &addr, ptr);
    // 'ptr' is now invalid
    vxuBox3x3(context, img, out_img);

[Diagram: the application world holds the reference "vx_image img" and, between access and commit, a pointer "vx_uint8 *ptr" to the pixels owned by the VisionWorks world (context)]

slide-32
SLIDE 32

DATA OBJECT ACCESS

Access Modes

  • Map mode (direct access)
      • Host memory (CPU)
      • CUDA memory
  • Copy mode
      • Host memory (CPU)
      • CUDA memory

[Diagram: the pixels of "vx_image img" can live in host memory or in CUDA memory inside the VisionWorks world (context); the application reaches them through "vx_uint8 *ptr" (host buffer) or "vx_uint8 *dev_ptr" (CUDA buffer)]

slide-33
SLIDE 33

DATA OBJECT ACCESS

API

  • vxAccess<Object>(…): access the content
      • Host: VX_READ_ONLY, VX_WRITE_ONLY, VX_READ_AND_WRITE
      • CUDA: NVX_READ_ONLY_CUDA, NVX_WRITE_ONLY_CUDA, NVX_READ_AND_WRITE_CUDA
  • vxCommit<Object>(…): release the access and commit changes

    vx_uint8 *pLut_cu = NULL;  // NULL means 'map', non-NULL means 'copy'
    vxAccessLUT(lut, (void **)&pLut_cu, NVX_READ_ONLY_CUDA);
    // 'pLut_cu' is a CUDA device pointer that can be used by CUDA kernels
    vxCommitLUT(lut, pLut_cu);
    // 'pLut_cu' is now invalid

slide-34
SLIDE 34

FOCUS: IMAGES

Formats

  • Mono or multiplanar
  • Color formats: RGB, RGBX, RGB16, NV12, NV21, UYVY, YUYV, IYUV, YUV4
  • 'Gray' scale: U8, U16, S16, 2S16, U32, S32, F32, 2F32

    vx_image img = vxCreateImage(context, 640, 480, VX_DF_IMAGE_RGB);

slide-35
SLIDE 35

FOCUS: IMAGES

Uniform Image

  • Constant image: all pixels have the same value (no allocation needed)
  • Enables performance optimizations without duplicating the primitive API

    vx_uint8 pix[3] = { 0x0, 0x33, 0xCC };
    vx_image img = vxCreateUniformImage(context, 640, 480, VX_DF_IMAGE_RGB, pix);

slide-36
SLIDE 36

FOCUS: IMAGES

Region of Interest (ROI)

  • Rectangular sub-image
  • Same format as the parent image
  • Shares pixels with the parent image (same memory)

[Diagram: the ROI rectangle inside the parent image; the start corner is inside the rectangle, the end corner is outside]

struct vx_rectangle_t

    vx_uint32 start_x    The start X coordinate.
    vx_uint32 start_y    The start Y coordinate.
    vx_uint32 end_x      The end X coordinate.
    vx_uint32 end_y      The end Y coordinate.

slide-37
SLIDE 37

FOCUS: IMAGES

ROI Example: Stereo Images

    vx_rectangle_t left_rect = { 0, 0, width, height };
    vx_image leftROI = vxCreateImageFromROI(inputRGB, &left_rect);

    vx_rectangle_t right_rect = { width, 0, 2*width, height };
    vx_image rightROI = vxCreateImageFromROI(inputRGB, &right_rect);

[Figure: side-by-side stereo input image of size 2*width x height. Input images from the Middlebury stereo dataset (http://vision.middlebury.edu/stereo/data)]

slide-38
SLIDE 38

FOCUS: IMAGES

Access

  • Map: VisionWorks returns the address and memory layout
  • Copy: the application provides the address and memory layout

    // Map
    void *ptr = NULL;                 // NULL means 'map'
    vx_imagepatch_addressing_t addr;  // The memory layout will be returned here
    vx_rectangle_t rect = { 0u, 0u, width, height };
    vxAccessImagePatch(img, &rect, 0, &addr, &ptr, VX_READ_AND_WRITE);
    // Access data at address 'ptr' with the layout specified in 'addr'
    vxCommitImagePatch(img, &rect, 0, &addr, ptr);

    // Copy
    void *ptr = &my_image[0];         // non-NULL means 'copy'
    vx_imagepatch_addressing_t addr = { /* to fill */ };
    vx_rectangle_t rect = { 0u, 0u, width, height };
    vxAccessImagePatch(vx_src, &rect, 0, &addr, &ptr, VX_READ_AND_WRITE);
    // Access/modify data in my_image
    vxCommitImagePatch(vx_src, &rect, 0, &addr, ptr);

slide-39
SLIDE 39

FOCUS: IMAGES

Memory Layout

Row Major Ordering / Pitch Linear

    typedef struct {
        vx_uint32 dim_x;     // Number of (logical) pixels
        vx_uint32 dim_y;
        vx_int32  stride_x;
        vx_int32  stride_y;
        vx_uint32 scale_x;
        vx_uint32 scale_y;
        vx_uint32 step_x;    // Sub-sampled planes: 1 physical pixel every 'step' logical pixels
        vx_uint32 step_y;
    } vx_imagepatch_addressing_t;

[Diagram: a WxH image stored as rows 0 to H-1 of columns 0 to W-1; stride_x separates two consecutive pixels of a row, stride_y separates two consecutive rows]
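As a worked example of the pitch-linear layout, the address of pixel (x, y) is the base address plus `y * stride_y + x * stride_x` bytes. The sketch below assumes a plane that is not sub-sampled (it ignores `step` and `scale`) and uses a hypothetical `demo_addressing` struct carrying only the fields it needs; `demo_pixel_ptr` is likewise a name made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the addressing fields shown on the slide. */
typedef struct {
    uint32_t dim_x, dim_y;       /* patch size in pixels */
    int32_t  stride_x, stride_y; /* bytes between two pixels / two rows */
} demo_addressing;

/* Address of pixel (x, y) in a pitch-linear plane without sub-sampling. */
static uint8_t *demo_pixel_ptr(void *base, const demo_addressing *addr,
                               uint32_t x, uint32_t y)
{
    assert(base != NULL && addr != NULL);
    assert(x < addr->dim_x && y < addr->dim_y);
    return (uint8_t *)base
         + (ptrdiff_t)y * addr->stride_y
         + (ptrdiff_t)x * addr->stride_x;
}
```

Using `stride_y` rather than `dim_x * stride_x` matters whenever rows are padded: in a 2-pixel-wide RGB image whose rows are padded to 8 bytes, pixel (1, 1) starts at byte 8 + 3 = 11, not at byte 9.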

slide-40
SLIDE 40

FOCUS: PYRAMIDS

Multi-Resolution Image

  • Formats: same as images
  • Configurable number of levels
  • Predefined scales: VX_SCALE_PYRAMID_HALF, VX_SCALE_PYRAMID_ORB

    vx_pyramid pyr = vxCreatePyramid(context, 5, 0.6f, 640, 480, VX_DF_IMAGE_RGB);

[Diagram: levels 0 to 4, each level obtained from the previous one by applying the scale factor]

slide-41
SLIDE 41

FOCUS: PYRAMIDS

More About Pyramids

  • Pyramid 'levels' are image objects
  • Pyramids are currently used for tracking:
      • GaussianPyramid: generate a pyramid from an image
      • OpticalFlowPyrLK: Lucas-Kanade tracking

    vx_image level1 = vxGetPyramidLevel(pyr, 1);  // Increment the ref count
    vxuBox3x3(context, level1, out_img);
    vxReleaseImage(&level1);                      // Decrement the ref count

slide-42
SLIDE 42

FOCUS: ARRAYS

Array Creation

  • Fixed capacity (used by primitives to avoid overflow)
  • Variable number of items
  • Item types: rectangles, keypoints, 2D/3D coordinates

    vx_array array = vxCreateArray(context, VX_TYPE_KEYPOINT, 1000);

Warning: too large a capacity can negatively impact performance.

slide-43
SLIDE 43

FOCUS: ARRAYS

Usage Example: Dynamic ROIs

  • Array of rectangles for dynamic ROIs (e.g. object bounding boxes)
  • Image created from ROI for static ROIs (stereo images, ROIs for static surveillance cameras)

    vx_array array = vxCreateArray(context, VX_TYPE_RECTANGLE, 50);

slide-44
SLIDE 44

FOCUS: ARRAYS

Access Example

  • Map: VisionWorks returns the address and stride
  • Copy: the application provides the address and stride

    // Map
    void *base = NULL;  // NULL means 'map'
    vx_size stride;
    vxAccessArrayRange(array, 2, 10, &stride, &base, VX_READ_AND_WRITE);
    // Access data of range [2, 10) at address base
    vxCommitArrayRange(array, 2, 10, base);

    // Copy
    void *base = &my_buffer[0];  // non-NULL means 'copy'
    vx_size stride = sizeof(element_type);
    vxAccessArrayRange(array, 2, 10, &stride, &base, VX_READ_AND_WRITE);
    // Access data of range [2, 10) in my_buffer
    vxCommitArrayRange(array, 2, 10, base);

slide-45
SLIDE 45

FOCUS: ARRAYS

Memory Layout

[Diagram: the accessed range within the array; items are separated by 'stride' bytes, the start index (2) is inside the range and the end index (10) is outside]

    vxAccessArrayRange(array, 2, 10, &stride, &base, VX_READ_AND_WRITE);
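Walking a mapped range therefore means visiting items `start` to `end - 1` and advancing by `stride` bytes each time, not by the size of the item type. A plain-C sketch, under the assumption that the base pointer designates the first item of the range; `demo_point` and `demo_sum_x` are hypothetical names used only here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical item type standing in for a keypoint or coordinate. */
typedef struct { int x, y; } demo_point;

/* Sum the x fields of the items in the half-open range [start, end).
 * 'base' is assumed to point at item 'start'; consecutive items are
 * 'stride' bytes apart, which may be more than sizeof(demo_point). */
static long demo_sum_x(const void *base, size_t stride, size_t start, size_t end)
{
    assert(base != NULL && start <= end && stride >= sizeof(demo_point));
    const char *p = base;
    long sum = 0;
    for (size_t i = start; i < end; i++, p += stride) {
        demo_point pt;
        memcpy(&pt, p, sizeof pt); /* read the item without alignment assumptions */
        sum += pt.x;
    }
    return sum;
}
```

The loop bound `i < end` encodes the "end index is outside" rule, so an empty range (`start == end`) simply visits nothing.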

slide-46
SLIDE 46

AGENDA

  • Programming Basics
      • General Philosophy
      • Primitives
      • Data Objects
      • Code Example

slide-47
SLIDE 47


FEATURE DETECTION

slide-48
SLIDE 48

FEATURE DETECTION

What Is in the Toolkit?

Feature detector primitives

  • HarrisCorner: strongest Harris points in the image
  • FastCorner: strongest FAST points in the image
  • HarrisTrack: balanced (per cell) Harris (re)detection
  • FastTrack: balanced (per cell) FAST (re)detection

Keypoint structures

  • vx_keypoint_t: int coordinates, strength, tracking error & status, …
  • nvx_keypointf_t: same as vx_keypoint_t except float coordinates
  • nvx_point2f_t: lightweight structure, only float coordinates

slide-49
SLIDE 49

FEATURE DETECTION

Processing

[Diagram: frame (RGB image) → color convert → frame_gray (U8 image) → Harris corner → points (array of vx_keypoint_t)]

slide-50
SLIDE 50

FEATURE DETECTION

Prepare Data

    // RGB image data: address = pImage, width = W, height = H, compact memory layout

    // Create data objects
    vx_image frame = vxCreateImage(context, W, H, VX_DF_IMAGE_RGB);
    vx_image frame_gray = vxCreateImage(context, W, H, VX_DF_IMAGE_U8);
    vx_array points = vxCreateArray(context, VX_TYPE_KEYPOINT, 1000);

    // Copy the input data into the vx_image object
    vx_imagepatch_addressing_t addr;
    addr.stride_x = 3*sizeof(vx_uint8);  // R + G + B
    addr.stride_y = addr.stride_x * W;
    void *p = pImage;                    // Non-NULL pointer means a 'copy'
    vx_rectangle_t rect = { 0, 0, W, H };  // Entire image
    vxAccessImagePatch(frame, &rect, 0, &addr, &p, VX_WRITE_ONLY);
    vxCommitImagePatch(frame, &rect, 0, &addr, p);

[Diagram: this step fills "frame" (RGB image), the input of: frame → color convert → frame_gray (U8 image) → Harris corner → points (array of vx_keypoint_t)]

slide-51
SLIDE 51

FEATURE DETECTION

Computation and Get Outputs

    // RGB to U8 conversion
    vxuColorConvert(context, frame, frame_gray);

    // Keypoint detection: Harris corner
    vxuHarrisCorners(context, frame_gray, s_strength_thresh, min_dist, k_sensivity,
                     gradientSize, blockSize, points, 0);

    // Access keypoints
    vx_size nb_kp;
    vxQueryArray(points, VX_ARRAY_ATTRIBUTE_NUMITEMS, &nb_kp, sizeof(nb_kp));
    vx_size stride;              // Returned by the access function
    vx_keypoint_t *base = NULL;  // NULL means 'map' (direct access)
    vxAccessArrayRange(points, 0, nb_kp, &stride, (void **)&base, VX_READ_ONLY);

    // Access keypoints starting from address 'base' with 'stride'
    vx_keypoint_t *p = base;
    for (vx_size i = 0; i < nb_kp; i++, p = (vx_keypoint_t *)((char *)p + stride)) {
        // ...
    }

    // Commit with the pointer returned by the access, not the advanced iterator
    vxCommitArrayRange(points, 0, nb_kp, base);

[Diagram: frame (RGB image) → color convert → frame_gray (U8 image) → Harris corner → points (array of vx_keypoint_t)]

slide-52
SLIDE 52

STANDARD KEYPOINT

vx_keypoint_t Structure

    Data field                 Description                                                              Detector  Tracker
    vx_int32 x                 The x coordinate.                                                        X         X
    vx_int32 y                 The y coordinate.                                                        X         X
    vx_float32 strength        The strength of the keypoint. Its definition is specific to the          X
                               corner detector.
    vx_float32 scale           Initialized to 0 by (current) corner detectors.                          (x)
    vx_float32 orientation     Initialized to 0 by (current) corner detectors.                          (x)
    vx_int32 tracking_status   A zero indicates a lost point. Initialized to 1 by corner detectors.     X         X
    vx_float32 error           A tracking method specific error. Initialized to 0 by corner detectors.            X

slide-53
SLIDE 53

MORE PRECISION NEEDED?

Simply Change the Output Array

  • nvx_keypointf_t: same as vx_keypoint_t, with float coordinates
  • nvx_point2f_t: lightweight, no error computation
  • The rest of the code is unchanged

    vx_array points = vxCreateArray(context, NVX_TYPE_KEYPOINTF, 1000);
    vx_array points = vxCreateArray(context, NVX_TYPE_POINT2F, 1000);

Data fields

    vx_float32 x    The X coordinate (-1.0f when tracking is lost)
    vx_float32 y    The Y coordinate (-1.0f when tracking is lost)

slide-54
SLIDE 54

ANY QUESTIONS SO FAR?

slide-55
SLIDE 55

AGENDA

  • Programming Basics
  • Efficient IO
  • Graph and Delay

slide-56
SLIDE 56

EFFICIENT IO BY AVOIDING COPIES

[Diagram: frames coming from GStreamer, NVMedia, OpenCV, CSI / GMSL cameras ** and USB cameras arrive as images in CUDA memory or in host memory, and reach the processing stage as images created from handle]

** NVIDIA Embedded platforms support CSI cameras; NVIDIA Automotive platforms support GMSL cameras

slide-57
SLIDE 57

IMAGE CREATED FROM HANDLE

Principle

  • The application pixel buffer (vx_uint8 *handle) moves from the application world to the VisionWorks world
  • The application then uses the image through its reference (vx_image img)

Warning: the handle must NOT be used directly by the application after the image object creation.

slide-58
SLIDE 58

IMAGE CREATED FROM HANDLE

Creation API

  • CPU or CUDA memory: VX_IMPORT_TYPE_HOST, NVX_IMPORT_TYPE_CUDA
  • Useful for both input and output images

    vx_image img = vxCreateImageFromHandle(
        context,
        VX_DF_IMAGE_RGB,
        &addr[0],  // Plane layouts
        &ptrs[0],  // Plane handles
        VX_IMPORT_TYPE_HOST
    );

slide-59
SLIDE 59

IMAGE CREATED FROM HANDLE

Access with Standard Access/Commit

  • Image access like other images: Map/Copy, Host/CUDA
  • Mapped at its original address and memory layout
  • Ownership of the memory returns to the application at image destruction

slide-60
SLIDE 60

IMAGE CREATED FROM HANDLE EXAMPLE

OpenCV Interop

Import a webcam image into VisionWorks directly from the host memory.

[Diagram: USB webcam → OpenCV → image in host memory]

slide-61
SLIDE 61

OPENCV INTEROP EXAMPLE

Import a Webcam Image

    // Create a video capture from OpenCV
    cv::VideoCapture inputVideo;
    inputVideo.open(0);  // Grab data from the default webcam

    // VideoCapture always returns a BGR image, transform it into RGB
    cv::Mat cv_bgr, cv_rgb;
    inputVideo.read(cv_bgr);
    cv::cvtColor(cv_bgr, cv_rgb, cv::COLOR_BGR2RGB);

    // Import into VisionWorks
    vx_imagepatch_addressing_t addr;
    addr.dim_x = cv_rgb.cols;
    addr.dim_y = cv_rgb.rows;
    addr.stride_x = 3*sizeof(vx_uint8);
    addr.stride_y = cv_rgb.step;
    void *ptrs[] = { cv_rgb.data };
    vx_image img = vxCreateImageFromHandle(context, VX_DF_IMAGE_RGB, &addr, ptrs, VX_IMPORT_TYPE_HOST);

slide-62
SLIDE 62

OPENCV INTEROP EXAMPLE

Refresh an Image

    // Mapping an image created from handle will map at the
    // exact same address and with the same memory layout
    void *base = NULL;  // NULL means 'map'
    vx_imagepatch_addressing_t addr;
    vx_rectangle_t rect = { 0u, 0u, cv_rgb.cols, cv_rgb.rows };
    vxAccessImagePatch(img, &rect, 0, &addr, &base, VX_WRITE_ONLY);

    // Refresh the OpenCV image
    inputVideo.read(cv_bgr);
    cv::cvtColor(cv_bgr, cv_rgb, cv::COLOR_BGR2RGB);

    // Commit back changes
    vxCommitImagePatch(img, &rect, 0, &addr, base);

slide-63
SLIDE 63

ANY QUESTIONS SO FAR?

slide-64
SLIDE 64

AGENDA

  • Programming Basics
  • Efficient IO
  • Graph and Delay
      • Graphs
      • Delay Object
      • Code Example
  • User Kernel
  • Low-Level CUDA API (alpha)

slide-65
SLIDE 65

AGENDA

  • Programming Basics
  • Efficient IO
  • Graph and Delay
      • Graphs
      • Delay Object
      • Code Example
  • User Kernel
  • Low-Level CUDA API (alpha)

slide-66
SLIDE 66

WHAT IS A GRAPH?

  • Computer vision pipeline specified ahead of time
  • Adapted to the processing of video streams
  • Best for performance: enables global optimizations

[Diagram: example graph in which nodes n1 to n4 are connected through data objects data1 to data5]

slide-67
SLIDE 67

ENSURE BEST PERFORMANCE

[Figure: immediate mode compared with graph mode]

slide-68
SLIDE 68

GRAPH LIFE CYCLE

  • Ahead of time / boot time: graph creation, then graph verification
  • Runtime: graph execution
  • Shutdown time: graph release

slide-69
SLIDE 69

A. GRAPH CREATION

Graph and Nodes

  • Any number of graphs can be created
  • Node = instance of a vision primitive with well-defined parameters

    vx_graph graph = vxCreateGraph(context);
    vx_image frameRGB, frameYUV;  // Already created
    vx_node color = vxColorConvertNode(graph, frameRGB, frameYUV);

[Diagram: frameRGB → Color Convert → frameYUV]

slide-70
SLIDE 70

A. GRAPH CREATION

Edges

  • Implicit (no edge object)
  • Determined from node parameters (write → read relationships)
  • Object hierarchy considered in the dependency analysis (pyramid levels, ROIs)

    vxColorConvertNode(graph, frameRGB, frameGray);
    vxGaussianPyramidNode(graph, frameGray, pyramid);

    vx_image level1 = vxGetPyramidLevel(pyramid, 1);
    vxBox3x3Node(graph, level1, box1);

[Diagram: frameRGB → Color Convert → frameGray → Gaussian Pyramid → pyramid; level1 (a level of pyramid) → Box3x3 → box1]
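The write → read rule that defines implicit edges can be sketched in a few lines of plain C. This is a simplified model, not the VisionWorks dependency analysis: data objects are reduced to integer ids, each node writes exactly one object, and the object hierarchy (pyramid levels, ROIs) is not modeled. `demo_node` and `demo_depends_on` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_INPUTS 4

/* Hypothetical node description: which data objects it reads and writes. */
typedef struct {
    const char *name;
    int inputs[DEMO_MAX_INPUTS]; /* ids of the data objects read */
    size_t num_inputs;
    int output;                  /* id of the data object written */
} demo_node;

/* Non-zero when 'consumer' reads the object that 'producer' writes,
 * i.e. when there is an implicit edge producer -> consumer. */
static int demo_depends_on(const demo_node *consumer, const demo_node *producer)
{
    assert(consumer != NULL && producer != NULL);
    assert(consumer->num_inputs <= DEMO_MAX_INPUTS);
    for (size_t i = 0; i < consumer->num_inputs; i++)
        if (consumer->inputs[i] == producer->output)
            return 1;
    return 0;
}
```

Applied to the slide's example, the Gaussian pyramid node reads frameGray, which the color convert node writes, so the pyramid node depends on the conversion without any edge object being declared.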

slide-71
SLIDE 71

B. GRAPH VERIFICATION

What Does It Do?

  • Error checking (parameter consistency, graph cycles)
  • Memory allocation
  • Node or device related initialization
  • Optimizations

Better done at setup time.

slide-72
SLIDE 72

B. GRAPH VERIFICATION

Must Be Done Before Execution

  • Explicitly by the application (preferred)
  • Implicitly by VisionWorks when needed

    // Explicit
    vx_status status = vxVerifyGraph(graph);

    // Implicit: the graph will be automatically verified if needed at that time
    vx_status status = vxProcessGraph(graph);

slide-73
SLIDE 73

C. GRAPH EXECUTION

Two Modes

  • Synchronous
  • Asynchronous
  • Accessing objects used by the graph concurrently with its execution is forbidden

    // Synchronous
    vx_status status = vxProcessGraph(graph);

    // Asynchronous
    vx_status status = vxScheduleGraph(graph);
    // Do something on CPU
    status = vxWaitGraph(graph);

slide-74
SLIDE 74

VIRTUAL DATA OBJECTS

Intermediate Graph Data

  • The application is not allowed to access the object content
  • Images, arrays and pyramids can be virtual
  • Enables more optimizations (example: kernel fusion)

    vx_image in = vxCreateImage(context, 1920, 1080, VX_DF_IMAGE_NV12);
    vx_image out = vxCreateImage(context, 1920, 1080, VX_DF_IMAGE_U8);
    // A virtual image belongs to a graph, not to the context
    vx_image tmp = vxCreateVirtualImage(graph, 1920, 1080, VX_DF_IMAGE_U8);

    // Create, verify and execute the graph
    vx_node nExtract = vxChannelExtractNode(graph, in, VX_CHANNEL_Y, tmp);
    vx_node nBox = vxBox3x3Node(graph, tmp, out);

[Diagram: in → Channel Extract → tmp → Box3x3 → out]
slide-75
SLIDE 75

AGENDA

  • Programming Basics
  • Efficient IO
  • Graph and Delay
      • Graphs
      • Delay Object
      • Code Example
  • User Kernel
  • Low-Level CUDA API (alpha)

slide-76
SLIDE 76

DELAY

Collection of Data Objects

[Diagram: a 4-slot delay containing images, in slots -3, -2, -1 and 0]

slide-77
SLIDE 77

DELAY

Rolling Buffer, Delay Aging

[Diagram: a delay with slots -3, -2, -1 and 0]

slide-78
SLIDE 78

DELAY

Rolling Buffer, Delay Aging

[Diagram: a delay with slots -3, -2, -1 and 0]

slide-79
SLIDE 79

DELAY API

Creation

  • The exemplar object is replicated in each delay slot (meta-data)
  • The exemplar object:
      • Can be any data object (image, array, pyramid, …)
      • Is not affected by the delay creation and is not linked to the delay
      • Needs no memory allocation if created and released just for the delay

    vx_image exemplar = vxCreateImage(context, 640, 480, VX_DF_IMAGE_RGB);
    // Create a 3-slot delay containing 640x480 VX_DF_IMAGE_RGB images
    vx_delay delay = vxCreateDelay(context, (vx_reference)exemplar, 3);
    // The exemplar can now be released immediately
    vxReleaseImage(&exemplar);

slide-80
SLIDE 80

DELAY API

Get a Reference of Objects in Slots

  • A delay slot can be used like any object
  • Exception: a delay slot must not be released

    // Get references to delay slots
    vx_image prev_img = (vx_image)vxGetReferenceFromDelay(delay, -1);
    vx_image cur_img = (vx_image)vxGetReferenceFromDelay(delay, 0);
    // Add a node to the graph
    vxAbsDiffNode(graph, prev_img, cur_img, out);

slide-81
SLIDE 81

DELAY API

Aging

  • Aging a delay moves the data of the object in slot n to the object in slot n-1
  • Only the data content shifts, not the objects
  • Zero copy: internal handle switch

    // Age the delay
    vxAgeDelay(delay);

[Diagram: the image objects of slots 0, -1 and -2 and their pixel buffers, before and after aging]
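The "zero copy: internal handle switch" idea can be sketched in plain C as a rotation of buffer pointers. This is an illustration only: `demo_delay`, `demo_delay_get` and `demo_delay_age` are hypothetical names, and recycling the oldest buffer into slot 0 is an assumption of the sketch, not a statement about the VisionWorks internals.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_SLOTS 8

/* Hypothetical delay: buffers[i] backs slot -i (slot 0 is the newest). */
typedef struct {
    void  *buffers[DEMO_MAX_SLOTS];
    size_t count;
} demo_delay;

/* Return the buffer behind a slot; slot indices are 0, -1, -2, ... */
static void *demo_delay_get(const demo_delay *d, int slot)
{
    assert(d != NULL && slot <= 0 && (size_t)(-slot) < d->count);
    return d->buffers[-slot];
}

/* Age the delay: the data of slot n becomes the data of slot n-1.
 * Only pointers are exchanged, no pixel or item is copied. The oldest
 * buffer is recycled into slot 0, ready to be overwritten. */
static void demo_delay_age(demo_delay *d)
{
    assert(d != NULL && d->count > 0 && d->count <= DEMO_MAX_SLOTS);
    void *oldest = d->buffers[d->count - 1];
    for (size_t i = d->count - 1; i > 0; i--)
        d->buffers[i] = d->buffers[i - 1];
    d->buffers[0] = oldest;
}
```

The cost of aging depends on the number of slots, not on the size of the images or arrays stored in them, which is why a delay suits per-frame use in a video loop.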

slide-82
SLIDE 82

AGENDA

  • Programming Basics
  • Efficient IO
  • Graph and Delay
      • Graphs
      • Delay Object
      • Code Example
  • User Kernel
  • Low-Level CUDA API (alpha)

slide-83
SLIDE 83


FEATURE TRACKING

slide-84
SLIDE 84

FEATURE TRACKING

What Is in the Toolkit?

Feature detector primitives

  • HarrisCorner: strongest Harris points in the image
  • FastCorner: strongest FAST points in the image
  • HarrisTrack: balanced (per cell) Harris (re)detection
  • FastTrack: balanced (per cell) FAST (re)detection

Optical flow / tracking primitives

  • OpticalFlowPyrLK: sparse pyramidal Lucas-Kanade optical flow
  • GaussianPyramid: generate a pyramid from an image

Keypoint structures

  • vx_keypoint_t: int coordinates, strength, tracking error & status, …
  • nvx_keypointf_t: same as vx_keypoint_t except float coordinates
  • nvx_point2f_t: lightweight structure, only float coordinates

slide-85
SLIDE 85

FEATURE TRACKING PROCESSING

[Diagram]

  • Detection: color convert → Harris corner → array of vx_keypoint_t
  • Tracking (repeated for each following frame): color convert → Gaussian pyramid → pyrLK optical flow, which uses the pyramids pyr 0 and pyr -1 and the arrays of vx_keypoint_t pts 0 and pts -1

slide-86
SLIDE 86

TRACKING GRAPH

Data Objects

    // Import the input data into VisionWorks
    vx_image inputRGB = vxCreateImageFromHandle(context, VX_DF_IMAGE_RGB, &addr, ptrs, NVX_IMPORT_TYPE_CUDA);

    // Create the intermediate image
    vx_image inputGray = vxCreateImage(context, width, height, VX_DF_IMAGE_U8);

    // Image pyramids for two successive frames (2-slot delay object)
    vx_pyramid pyramid_exemplar = vxCreatePyramid(context, 4, VX_SCALE_PYRAMID_HALF, width, height, VX_DF_IMAGE_U8);
    vx_delay pyramid_delay = vxCreateDelay(context, (vx_reference)pyramid_exemplar, 2);
    vxReleasePyramid(&pyramid_exemplar);

    // Tracked points need to be stored for two successive frames (2-slot delay object)
    vx_array keypoint_exemplar = vxCreateArray(context, VX_TYPE_KEYPOINT, 2000);
    vx_delay keypoint_delay = vxCreateDelay(context, (vx_reference)keypoint_exemplar, 2);
    vxReleaseArray(&keypoint_exemplar);

[Diagram: inputRGB → color convert → inputGray → Gaussian pyramid → pyr 0; pyrLK optical flow reads pyr 0 and pyr -1 (pyramid_delay) and pts -1, and writes pts 0 (keypoint_delay, arrays of vx_keypoint_t)]

slide-87
SLIDE 87

TRACKING GRAPH

Nodes and Verification

    vx_graph graph = vxCreateGraph(context);

    // RGB to Y conversion node
    vxColorConvertNode(graph, inputRGB, inputGray);

    // Pyramid node
    vx_node pyr_node = vxGaussianPyramidNode(graph, inputGray,
        (vx_pyramid)vxGetReferenceFromDelay(pyramid_delay, 0));

    // Lucas-Kanade optical flow node, previous keypoints given as new estimates
    vxOpticalFlowPyrLKNode(graph,
        (vx_pyramid)vxGetReferenceFromDelay(pyramid_delay, -1),  // previous pyramid
        (vx_pyramid)vxGetReferenceFromDelay(pyramid_delay, 0),   // current pyramid
        (vx_array)vxGetReferenceFromDelay(keypoint_delay, -1),   // previous keypoints
        (vx_array)vxGetReferenceFromDelay(keypoint_delay, -1),   // new keypoints estimate
        (vx_array)vxGetReferenceFromDelay(keypoint_delay, 0),    // new keypoints
        VX_TERM_CRITERIA_BOTH, s_lk_epsilon, s_lk_num_iters, s_lk_use_init_est, 10);

    // Graph verification
    vx_status status = vxVerifyGraph(graph);

[Diagram: inputRGB → color convert → inputGray → Gaussian pyramid → pyr 0; pyrLK optical flow reads pyr 0 and pyr -1 (pyramid_delay) and pts -1, and writes pts 0 (keypoint_delay, arrays of vx_keypoint_t)]

slide-88
SLIDE 88

TRACKING EXECUTION

    // Data objects creation
    // <…>
    // Graph creation & verification
    // <…>
    // Process the first frame (keypoints detection)
    // <…>

    // Main processing loop
    for (;;) {
        void *devptr = 0;
        vx_rectangle_t rect = { 0, 0, width, height };
        vxAccessImagePatch(inputRGB, &rect, 0, &addr, &devptr, NVX_WRITE_ONLY_CUDA);
        // Get next frame data into 'devptr' here
        // <…>
        vxCommitImagePatch(inputRGB, &rect, 0, &addr, devptr);

        // Process graph
        vxProcessGraph(graph);

        // 'Age' delay objects for the next frame processing
        vxAgeDelay(pyramid_delay);
        vxAgeDelay(keypoint_delay);
    }

[Diagram: new_frame → inputRGB → color convert → inputGray → Gaussian pyramid → pyr 0; pyrLK optical flow reads pyr 0 and pyr -1 (pyramid_delay) and pts -1, and writes pts 0 (keypoint_delay, arrays of vx_keypoint_t)]

slide-89
SLIDE 89

QUESTIONS?

slide-90
SLIDE 90

INTERESTED IN OTHER DEVELOPMENT ASPECTS?

VisionWorks Toolkit Hands-on lab (L6129)

  • Samples
  • Documentation
  • Debug and profiling

slide-91
SLIDE 91

MORE QUESTIONS?

VisionWorks Hangout (H6115)

slide-92
SLIDE 92

April 4-7, 2016 | Silicon Valley

THANK YOU

JOIN THE NVIDIA DEVELOPER PROGRAM AT developer.nvidia.com/join