BLINK: A GPU-Enabled Image Processing Framework


SLIDE 1

BLINK: A GPU-Enabled Image Processing Framework

Mark Davey Lead HPC Engineer The Foundry

SLIDE 2

The Foundry and HPC

  • The Foundry

– Founded in 1996
– We develop award-winning visual effects, computer graphics and design software used globally by leading artists and designers

  • HPC

– We create frameworks to make best use of all available compute devices
– “make things go faster”
– Initial target: 2D Image Processing

SLIDE 3

2D Image Processing

  • A fundamental component in many Foundry products.

Used in such effects as:

  • Noise reduction
  • Keying
  • Motion and disparity estimation
  • Colour correction/grading
  • Panoramic stitching
  • 3D texture creation

Need to make it as fast as possible!

SLIDE 4

Moving to GPUs

  • Traditionally used the CPU for image processing
  • Lots of legacy code
  • GPUs are great at image processing
  • Our customers often have powerful GPUs, but not always (e.g. render farms)

  • Need a fallback CPU path
  • Do not want to write the same code multiple times (debugging, maintenance, new hardware, etc.)

SLIDE 5

The Solution - BLINK

  • “Write once, deploy everywhere”
  • Image processing algorithms expressed as kernels
  • Kernels written in a C++-like, domain-specific language
  • Kernels run over an iteration space
  • Metadata expresses access patterns, image formats, boundary conditions, etc.

  • Kernels are translated into different back-ends
  • JIT Compilation for many paths
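
To make "kernels run over an iteration space" concrete, here is a minimal C++ sketch of what a scalar CPU path could look like. It is not BLINK code: the `Image` struct, `run_over_iteration_space`, and `box3` are hypothetical names invented for illustration, and edge clamping is just one possible boundary condition that the metadata might select.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical single-channel image; not a BLINK type.
struct Image {
    int width;
    int height;
    std::vector<float> pixels;  // row-major, width * height values

    // Edge-clamped read: out-of-range coordinates snap to the nearest valid pixel.
    float at_clamped(int x, int y) const {
        x = std::max(0, std::min(x, width - 1));
        y = std::max(0, std::min(y, height - 1));
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// Calls kernel(src, x, y) once for every position in the iteration space
// (here: every pixel of the output image) and stores the result.
template <typename Kernel>
Image run_over_iteration_space(const Image& src, Kernel kernel) {
    Image dst{src.width, src.height, std::vector<float>(src.pixels.size())};
    for (int y = 0; y < src.height; ++y) {
        for (int x = 0; x < src.width; ++x) {
            dst.pixels[static_cast<std::size_t>(y) * src.width + x] = kernel(src, x, y);
        }
    }
    return dst;
}

// Example kernel body: 3-tap horizontal box filter using clamped reads.
inline float box3(const Image& src, int x, int y) {
    return (src.at_clamped(x - 1, y) + src.at_clamped(x, y) + src.at_clamped(x + 1, y)) / 3.0f;
}
```

The kernel body only describes the work for one position; how the loop is ordered, tiled, or mapped to threads is left to the caller, which is the separation the bullets above describe.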
SLIDE 6

BLINK - Features

  • Multiple back-ends supported
  • Consistent results across devices
  • Range of image formats and layouts available
  • Kernel execution strategy left to framework
  • Profiling (execute and transfer)
SLIDE 7

BLINK Back-ends

  • CUDA (4.2, Compute Capability 2.0)
  • OpenCL (1.1)
  • GLSL (1.2)
  • x86 (Scalar, SSE2, SSE4.1, AVX, AVX2)

SLIDE 8

BLINK Example

class GainImage: ImageComputationKernel<eComponentWise>
{
  param:
    Image<eRead, ePoint> src;
    Image<eWrite, ePoint> dst;
    float gain;

  void define() {
    defineParam(gain, "myGain", 1.0f);
  }

  void process() {
    dst() = src() * gain;
  }
};
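
For comparison, a plain C++ function with roughly the same effect as the GainImage kernel might look like the sketch below. It assumes that `eComponentWise` means `process()` runs once per component value and that `ePoint` access touches only the current position; `apply_gain` and the flat `std::vector<float>` buffer are hypothetical stand-ins, not part of BLINK.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU equivalent of the GainImage kernel.
// Each element is one component (e.g. R, G, B or A) of one pixel.
// The default of 1.0f mirrors the default given to defineParam in the kernel.
std::vector<float> apply_gain(const std::vector<float>& src, float gain = 1.0f) {
    std::vector<float> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = src[i] * gain;  // same arithmetic as dst() = src() * gain
    }
    return dst;
}
```

In the kernel form the loop disappears: only the per-component expression is written, and the iteration is supplied by whichever back-end runs it.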

SLIDE 11

BLINK - The Foundry

Nuke – Post Production Compositing Software

  • Many key plug-ins written using BLINK
  • BlinkScript

– Customers can create kernels within Nuke for GPU and CPU
– Multi-GPU support on selected configurations

  • OCULA 4 – Stereoscopic Toolset

Projects

  • ASAP – A Scalable 2D/3D Architecture for Cross Media Virtual Production
  • Dreamspace – Advancements in Virtual Production Frameworks
SLIDE 12

OCULA

  • A collection of Nuke tools to handle stereoscopic imagery
  • Vector Disparity Generator at its heart

– Correct colour and focus, automatically correct alignment, retime

  • Latest version (4) written using BLINK
  • Over 12K kernel calls per frame!
SLIDE 13

OCULA 4 – Disparity Generation

SLIDE 14

OCULA 4 – Different Devices

SLIDE 15

Numerical Identity I

  • Our customers need visually identical results when processing on different devices.
  • Some algorithms are extremely sensitive to small differences in mathematical results (e.g. OCULA!)
  • Need to ensure numerical identity to guarantee visual identity
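
One way to check for numerical identity, as opposed to "close enough", is to compare the raw bit patterns of two result buffers. The sketch below is an assumption about how such a check could be written, not a description of The Foundry's test tooling; `first_bit_mismatch` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");

// Returns the index of the first element whose bit pattern differs between
// the two buffers, or -1 if the buffers are bit-for-bit identical.
// A length mismatch counts as a difference at the shorter buffer's size.
std::ptrdiff_t first_bit_mismatch(const std::vector<float>& a,
                                  const std::vector<float>& b) {
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        std::uint32_t bits_a = 0;
        std::uint32_t bits_b = 0;
        std::memcpy(&bits_a, &a[i], sizeof bits_a);  // well-defined type punning
        std::memcpy(&bits_b, &b[i], sizeof bits_b);
        if (bits_a != bits_b) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    return a.size() == b.size() ? -1 : static_cast<std::ptrdiff_t>(n);
}
```

Comparing bits rather than using `==` is deliberately strict: `0.0f` and `-0.0f` compare equal as floats but are reported as a mismatch here.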

SLIDE 16

Numerical Identity – General Overview

  • Disable fast math - to prevent compiler from reordering math operations.
  • Force floating point literals to single precision - different compilers treat double literals differently giving inconsistent results.

  • Disable Fused-Multiply-Add (FMA)
  • Implement unified math library for all code paths

– Algebraic functions: sqrt, hypot …
– Transcendental functions: sin, exp …
– Integral rounding functions: ceil, floor …
– IEEE standard functions: fmod, fabs …
– Matrices and operators: transpose, inverse …
– Vectors and operators: dot, cross …
– Others: min, max …
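
The FMA bullet is easier to see with numbers. A fused multiply-add rounds once, while a separate multiply and add rounds twice, so the two can give different answers for the same inputs. The sketch below uses only standard C++ (`std::fma`); the function names are hypothetical, and the `volatile` is there solely to stop a compiler from fusing the two operations itself.

```cpp
#include <cmath>

// Multiply, round to float, then add and round again (two roundings).
float mul_then_add(float a, float b, float c) {
    volatile float product = a * b;  // volatile forces the rounded product to be stored
    return product + c;
}

// Fused multiply-add: a * b + c computed exactly, then rounded once.
float fused_mul_add(float a, float b, float c) {
    return std::fma(a, b, c);
}
```

With a = b = 1 + 2^-12 and c = -(1 + 2^-11), the exact product is 1 + 2^-11 + 2^-24. Rounded to single precision that becomes 1 + 2^-11, so the separate path returns 0, while the fused path keeps the 2^-24 term. A device that fuses and one that does not would therefore disagree on this input.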

SLIDE 17

Numerical Identity – Platform Specifics

CUDA (nvcc flags)

  • Disable “Flush Denormals To Zero” (--ftz=false)
  • Disable “Fused Multiply Add” (--fmad=false)
  • Enable precise square root and divide (--prec-sqrt=true --prec-div=true)

CPU:

  • Precisely control FPU control register for rounding, denormal handling, etc. (using _mm_setcsr intrinsic)

  • Implement vector types (float1..float4, int1..int4,...)

Also supported for OpenCL (NVIDIA GPUs only)
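
As an illustration of the CPU bullet, the sketch below pins the SSE control/status register (MXCSR) to round-to-nearest with denormals preserved, which would match the `--ftz=false` choice on the CUDA side. The slide does not say which settings BLINK actually uses, so the chosen bits are an assumption; `pinned_mxcsr`, `pin_float_environment`, and the `SKETCH_HAVE_MXCSR` macro are hypothetical names. `_mm_getcsr` and `_mm_setcsr` are the real intrinsics from `<xmmintrin.h>`.

```cpp
#include <cstdint>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAVE_MXCSR 1
#endif

// MXCSR bit fields.
constexpr std::uint32_t kFlushToZero   = 0x8000u;  // FTZ, bit 15
constexpr std::uint32_t kDenormalsZero = 0x0040u;  // DAZ, bit 6
constexpr std::uint32_t kRoundingMask  = 0x6000u;  // RC, bits 13-14 (00 = nearest-even)

// Returns csr with FTZ and DAZ cleared and rounding set to nearest-even.
// All other bits (exception masks and flags) are left untouched.
constexpr std::uint32_t pinned_mxcsr(std::uint32_t csr) {
    return csr & ~(kFlushToZero | kDenormalsZero | kRoundingMask);
}

// Applies the pinned settings to the calling thread; a no-op on non-SSE targets.
void pin_float_environment() {
#ifdef SKETCH_HAVE_MXCSR
    _mm_setcsr(pinned_mxcsr(_mm_getcsr()));
#endif
}
```

MXCSR is per-thread state, so in a multi-threaded CPU path a call like this would be needed on every worker thread before it runs kernels.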

SLIDE 18

OCULA 4 - Results

  • Disparity generation
  • 3.3MPixel (2560x1350) frames
  • End-to-end processing cost
  • Only 5% overhead for Numerical Identity
  • Many kernels are memory bound

[Chart: OCULA 4 Disparity, 3.3MPixel, Unified Math. Series: K5000 - Unified Math, K5000 - Optimised. Axis: Time (s).]

SLIDE 19

OCULA 4 - Results

[Chart: OCULA 4 Disparity, 3.3MPixel Stereo. Series: CPU - 2x 6-Core Xeon, K5000 - Unified Math, K5000 - Optimised. Axis: Time (s).]

~ 5 times faster on the GPU … and more speed to come!

SLIDE 20

Under Development…Examples

  • Heterogeneous Compute

– Run graphs of kernels using scheduler
– Target all available compute devices
– Target data parallelism

  • BLINK for Real-time

– Export BLINK graphs from Nuke to run in BLINKPlayer
– Kernels can be modified in BLINKPlayer
– Parameters can be introspected from kernels and presented as GUI widgets
– Composite live and rendered imagery

SLIDE 21

Thank You

Questions?