BLINK: A GPU-Enabled Image Processing Framework

Jan 17, 2024

SLIDE 1

BLINK: A GPU-Enabled Image Processing Framework

Mark Davey Lead HPC Engineer The Foundry

SLIDE 2

The Foundry and HPC

The Foundry

– Founded in 1996
– We develop award-winning visual effects, computer graphics and design software used globally by leading artists and designers

– We create frameworks to make best use of all available compute devices
– “make things go faster”
– Initial target: 2D Image Processing

SLIDE 3

2D Image Processing

A fundamental component in many Foundry products.

Used in such effects as:

Noise reduction
Keying
Motion and disparity estimation
Colour correction/grading
Panoramic stitching
3D texture creation

Need to make it as fast as possible!

SLIDE 4

Moving to GPUs

Traditionally used the CPU for image processing
Lots of legacy code
GPUs are great at image processing
Our customers often have powerful GPUs, but not always (e.g. render farms)

Need a fallback CPU path
Do not want to write the same code multiple times (debugging, maintenance, new hardware, etc.)

SLIDE 5

The Solution - BLINK

“Write once, deploy everywhere”
Image processing algorithms expressed as kernels
Kernels written in a C++ like, domain-specific language
Kernels run over an iteration space
Metadata expresses access patterns, image formats, boundary conditions, etc.

Kernels are translated into different back-ends
JIT Compilation for many paths
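
To make "kernels run over an iteration space" concrete, here is a minimal C++ sketch of a CPU fallback path. It is an illustration only: `IterationSpace`, `run_on_cpu` and `invert` are hypothetical names and do not come from BLINK, and a single-channel float image stored row by row is assumed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical names throughout; this is not the BLINK API.
struct IterationSpace {
    std::size_t width;
    std::size_t height;
};

// CPU fallback path: call the kernel body once per point of the iteration space.
template <typename KernelBody>
void run_on_cpu(const IterationSpace& space, KernelBody body) {
    for (std::size_t y = 0; y < space.height; ++y) {
        for (std::size_t x = 0; x < space.width; ++x) {
            body(x, y);
        }
    }
}

// Example kernel body: invert a single-channel float image stored row by row.
std::vector<float> invert(const std::vector<float>& src, const IterationSpace& space) {
    std::vector<float> dst(src.size());
    run_on_cpu(space, [&](std::size_t x, std::size_t y) {
        const std::size_t i = y * space.width + x;
        dst[i] = 1.0f - src[i];
    });
    return dst;
}
```

The kernel body only says what happens at one point. How the points are visited (serially, vectorised, or on a GPU) is left to the code that runs it, which is the separation the slide describes.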

SLIDE 6

BLINK - Features

Multiple back-ends supported
Consistent results across devices
Range of image formats and layouts available
Kernel execution strategy left to framework
Profiling (execute and transfer)

SLIDE 7

BLINK Back-ends

CUDA (4.2, Compute Capability 2.0)
OpenCL (1.1)
GLSL (1.2)
CPU (Scalar, SSE2, SSE4.1, AVX, AVX2)

SLIDE 8

BLINK Example

class GainImage : ImageComputationKernel<eComponentWise>
{
param:
  Image<eRead, ePoint> src;
  Image<eWrite, ePoint> dst;
  float gain;

  void define() {
    defineParam(gain, "myGain", 1.0f);
  }

  void process() {
    dst() = src() * gain;
  }
};
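
For readers unfamiliar with the kernel language, the effect of the kernel above can be approximated in plain C++. This is a rough stand-in, not generated or official code: `apply_gain` is a hypothetical helper, and the image is assumed to be a flat array of float components.

```cpp
#include <cstddef>
#include <vector>

// Plain C++ stand-in for the GainImage kernel: one multiply per component.
// `apply_gain` is a hypothetical helper, not part of BLINK.
std::vector<float> apply_gain(const std::vector<float>& src, float gain = 1.0f) {
    std::vector<float> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = src[i] * gain;  // mirrors: dst() = src() * gain;
    }
    return dst;
}
```

The default argument of 1.0f mirrors the default that the kernel passes to defineParam.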

SLIDE 11

BLINK - The Foundry

Nuke – Post Production Compositing Software

Many key plug-ins written using BLINK
BlinkScript

– Customers can create kernels within Nuke for GPU and CPU
– Multi-GPU support on selected configurations

OCULA 4 – Stereoscopic Toolset

Projects

ASAP – A Scalable 2D/3D Architecture for Cross Media Virtual Production
Dreamspace – Advancements in Virtual Production Frameworks

SLIDE 12

OCULA

A collection of Nuke tools to handle stereoscopic imagery
Vector Disparity Generator at its heart

– Correct colour and focus, automatically correct alignment, retime

Latest version (4) written using BLINK
Over 12K kernel calls per frame!

SLIDE 13

OCULA 4 – Disparity Generation

SLIDE 14

OCULA 4 – Different Devices

SLIDE 15

Numerical Identity I

Our customers need visually identical results when processing on different devices.
Some algorithms are extremely sensitive to small differences in mathematical results (e.g. OCULA!)
Need to ensure numerical identity to guarantee visual identity

SLIDE 16

Numerical Identity – General Overview

Disable fast math - to prevent compiler from reordering math operations.
Force floating point literals to single precision - different compilers treat double literals differently, giving inconsistent results.

Disable Fused-Multiply-Add (FMA)
Implement unified math library for all code paths

– Algebraic functions: sqrt, hypot …
– Transcendental functions: sin, exp …
– Integral rounding functions: ceil, floor …
– IEEE standard functions: fmod, fabs …
– Matrices and operators: transpose, inverse …
– Vectors and operators: dot, cross …
– Others: min, max …
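
Two of the points above (FMA and literal precision) can be demonstrated in a few lines of standard C++. The sketch below is illustrative only; the function names are hypothetical and it says nothing about how BLINK implements its unified math library.

```cpp
#include <cmath>

// Multiply then add with two roundings: the product is rounded to float
// (the volatile store stops the compiler from fusing it) before the add.
float mul_then_add(float a, float b, float c) {
    volatile float product = a * b;
    return product + c;
}

// Fused multiply-add: a single rounding of the exact value a*b + c.
float fused_mul_add(float a, float b, float c) {
    return std::fma(a, b, c);
}

// The same decimal literal denotes different values in single and double precision.
bool literal_precision_matters() {
    return static_cast<double>(0.1f) != 0.1;
}
```

With a = b = 1 + 2^-12 and c = -(1 + 2^-11), the exact product is 1 + 2^-11 + 2^-24. The unfused path rounds the product to 1 + 2^-11 and returns 0, while the fused path returns 2^-24. A device that fuses and one that does not therefore disagree on this input, which is why FMA has to be either on everywhere or off everywhere.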

SLIDE 17

Numerical Identity – Platform Specifics

CUDA (nvcc flags)

Disable “Flush Denormals To Zero” (--ftz=false)
Disable “Fused Multiply Add” (--fmad=false)
Enable precise square root and divide (--prec-sqrt=true --prec-div=true)

CPU:

Precisely control FPU control register for rounding, denormal handling, etc. (using _mm_setcsr intrinsic)

Implement vector types (float1..float4, int1..int4,...)

Also supported for OpenCL (NVIDIA GPUs only)
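
As a sketch of the CPU-side control mentioned above, the following x86-only C++ puts the SSE control/status register (MXCSR) into a known state using the `_mm_getcsr` and `_mm_setcsr` intrinsics. The function name and the choice of settings (round to nearest, denormals preserved, matching `--ftz=false` on the CUDA side) are assumptions made for illustration, not the settings BLINK is known to use.

```cpp
#include <xmmintrin.h>  // _mm_getcsr, _mm_setcsr (x86 SSE only)

// MXCSR bit fields, as documented in the Intel architecture manuals.
constexpr unsigned int kDenormalsAreZero = 1u << 6;   // DAZ
constexpr unsigned int kRoundingMask     = 3u << 13;  // RC field; 00 = round to nearest
constexpr unsigned int kFlushToZero      = 1u << 15;  // FTZ

// Put the calling thread's SSE unit into a known state:
// round to nearest, with denormals preserved on input and output.
// Returns the previous MXCSR value so the caller can restore it.
unsigned int set_strict_float_mode() {
    const unsigned int previous = _mm_getcsr();
    const unsigned int strict =
        previous & ~(kDenormalsAreZero | kRoundingMask | kFlushToZero);
    _mm_setcsr(strict);
    return previous;
}
```

MXCSR is per-thread state, so a framework that runs kernels on worker threads would need to apply this on every thread, and restore the saved value before handing control back to host code that expects different settings.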

SLIDE 18

OCULA 4 - Results

Disparity generation
3.3MPixel (2560x1350) frames
End-to-end processing cost
Only 5% overhead for Numerical Identity
Many kernels are memory bound

[Bar chart: OCULA 4 Disparity, 3.3MPixel, Unified Math. Time (s) for K5000 Unified Math vs. K5000 Optimised.]

SLIDE 19

OCULA 4 - Results

[Bar chart: Ocula 4 Disparity, 3.3MPixel Stereo. Time (s) for CPU (2x 6-Core Xeon), K5000 Unified Math and K5000 Optimised.]

~ 5 times faster on the GPU … and more speed to come!

SLIDE 20

Under Development…Examples

Heterogeneous Compute

– Run graphs of kernels using scheduler
– Target all available compute devices
– Target data parallelism
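
The slides give no detail on the scheduler, so the following is only a generic sketch of what running a graph of kernels involves: ordering kernels so that each one runs after the kernels it depends on. It uses Kahn's topological sort. The `schedule` function and the adjacency-list representation are assumptions, not the design described in the talk.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical kernel graph: edges[i] lists the kernels that consume kernel i's output.
// Returns one valid execution order (Kahn's algorithm), or an empty vector on a cycle.
std::vector<std::size_t> schedule(const std::vector<std::vector<std::size_t>>& edges) {
    const std::size_t n = edges.size();
    std::vector<std::size_t> pending_inputs(n, 0);
    for (const auto& consumers : edges) {
        for (std::size_t c : consumers) ++pending_inputs[c];
    }

    std::queue<std::size_t> ready;
    for (std::size_t i = 0; i < n; ++i) {
        if (pending_inputs[i] == 0) ready.push(i);
    }

    std::vector<std::size_t> order;
    while (!ready.empty()) {
        const std::size_t k = ready.front();
        ready.pop();
        order.push_back(k);  // a real scheduler would dispatch k to a device here
        for (std::size_t c : edges[k]) {
            if (--pending_inputs[c] == 0) ready.push(c);
        }
    }
    if (order.size() != n) order.clear();  // cycle: no valid order exists
    return order;
}
```

Every kernel sitting in the ready queue at the same time is independent of the others, so a heterogeneous scheduler could hand those kernels to different devices at once.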

BLINK for Real-time

– Export BLINK graphs from Nuke to run in BLINKPlayer
– Kernels can be modified in BLINKPlayer
– Parameters can be introspected from kernels and presented as GUI widgets
– Composite live and rendered imagery

SLIDE 21

Thank You

Questions?