High Quality Real Time Image Processing Framework on Mobile Platforms - PowerPoint PPT Presentation

High Quality Real Time Image Processing Framework on Mobile Platforms using Tegra K1 Eyal Hirsch

SagivTech Snapshot
• Established in 2009 and headquartered in Israel
• Core domain expertise: GPU Computing and Computer Vision
• What we do:
  - Technology
  - Solutions
  - Projects
  - EU Research
  - Training
• GPU expertise:
  - Hard core optimizations
  - Efficient streaming for single or multiple GPU systems
  - Mobile GPUs
SagivTech Ltd. proprietary information - for internal use only

Mobile is everywhere
• The new era of mobile

As mobile devices get smarter
• In the beginning: I can talk from anywhere!
• A bit later: My phone can take pictures!
• Now:
  – Advanced camera
  – More compute power
  – Fast device-to-cloud communication
• What can be done with those advancements?

Project Tango
• Mission: Running a depth sensing technology on a mobile platform
• Challenge: First time on NVIDIA’s Tegra K1
• Extreme optimizations on a CPU-GPU platform to allow the device to handle other tasks in parallel
• Expertise:
  – Mantis Vision: the algorithms
  – NVIDIA: the Tegra K1 platform
  – SagivTech: the GPU computing expertise
• Bottom line: Depth sensing running in real time, in parallel with other compute-intensive applications!

Project Tango Credits: http://techaeris.com

Mobile Crowdsourcing Video Scene Reconstruction
• If you’ve been to a concert recently, you’ve probably seen how many people take videos of the event with mobile phone cameras
• Each user has only one video, taken from one angle and location and of only moderate quality

The Idea behind SceneNet
• Leverage the power of multiple mobile phone cameras
• Create a high-quality 3D video experience that is shareable via social networks

Creation of the 3D Video Sequence
Steps, in time order:
• The scene is photographed by several people using their cell phone cameras.
• The video data is transmitted via the cellular network to a High Performance Computing server.
• Following time synchronization, resolution normalization and spatial registration, the several videos are merged into a 3-D video cube.

Algorithms implemented on the TK1
• Enabling the 3D reconstruction for SceneNet required various algorithms to run on the TK1 GPU:
  – FREAK: Fast Retina Keypoint
  – BRISK: Binary Robust Invariant Scalable Keypoints
  – DoG: Difference of Gaussians
• The algorithms had to run in real time
• The algorithms are building blocks for various image processing tasks

FREAK & DoG performance on the TK1
• DoG:
  – Input: 480 x 640 RGB image
  – Output: ~32K keypoints
• FREAK:
  – Input: ~32K keypoints, image
  – Output: Descriptor per keypoint
• Majority of the code runs on the GPU
• Offloading to the GPU allows for real-time processing, which is not possible on the CPU

DoG performance on the TK1
• DoG flow:
  – Gaussian
  – DiffImage
  – Find keypoints
• Total: 10.83 ms

Kernel                         Avg. time (ms)
Misc                           0.3
Gaussian: Conv2D               4.8
Gaussian: DownSampleBilinear   0.6
DiffImage                      1.7
FindKeyPoints                  3.43
Total DoG                      10.83
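The DoG flow above (Gaussian, DiffImage, find keypoints) can be sketched in plain Python. This is a minimal single-scale illustration and not SagivTech's GPU implementation: the function names, the sigma values, the clamped borders and the 3x3 extremum test are all assumptions made for the example, and the downsampling stage listed in the table is left out.

```python
import math

def gaussian_kernel(sigma, radius):
    """Return a normalized 1D Gaussian kernel with 2 * radius + 1 taps."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def convolve_rows(image, kernel):
    """Filter every row with a 1D kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [[sum(kernel[k] * row[min(max(x + k - radius, 0), width - 1)]
                 for k in range(len(kernel)))
             for x in range(width)]
            for row in image]

def transpose(image):
    return [list(column) for column in zip(*image)]

def gaussian_blur(image, sigma):
    """Separable Gaussian blur: one pass over rows, one over columns."""
    radius = max(1, int(math.ceil(3 * sigma)))  # assumed 3-sigma support
    kernel = gaussian_kernel(sigma, radius)
    horizontal = convolve_rows(image, kernel)
    return transpose(convolve_rows(transpose(horizontal), kernel))

def diff_image(a, b):
    """Pixel-wise difference of two equally sized images."""
    return [[pa - pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

def find_keypoints(dog, threshold):
    """Return (x, y) of interior pixels that are strict 3x3 extrema
    with an absolute response of at least `threshold`."""
    points = []
    for y in range(1, len(dog) - 1):
        for x in range(1, len(dog[0]) - 1):
            value = dog[y][x]
            if abs(value) < threshold:
                continue
            neighbours = [dog[y + dy][x + dx]
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                          if (dx, dy) != (0, 0)]
            if value > max(neighbours) or value < min(neighbours):
                points.append((x, y))
    return points

def dog_keypoints(image, sigma1=1.0, sigma2=1.6, threshold=0.05):
    """Blur at two scales, subtract, and pick local extrema."""
    dog = diff_image(gaussian_blur(image, sigma1), gaussian_blur(image, sigma2))
    return find_keypoints(dog, threshold)
```

Calling `dog_keypoints` on a grayscale image stored as a list of rows of floats returns candidate keypoint coordinates. A full detector would repeat this across several scales and octaves.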

FREAK performance on the TK1
• FREAK flow:
  – IntegralImage
  – Extract descriptors
• Total: 2.4 ms
• Total DoG + FREAK: 13.23 ms

Kernel           Avg. time (ms)
IntegralImage    1.5
GetDescriptors   0.9
Total FREAK      2.4

FREAK & DoG performance on the TK1
• 13 ms means real-time processing on the Ardbeg development board!
• Room for more tasks to run in the background
• Opens up possibilities for many mobile applications
• Having real-time performance is not enough
• Need to evaluate power consumption as well
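The "13 ms" figure and the "room for more tasks" claim can be checked with a little arithmetic on the per-kernel times from the two preceding slides. The slides do not state a target frame rate, so the 30 fps used here is an assumption, as are the helper names.

```python
# Per-kernel average times in milliseconds, copied from the slides.
DOG_MS = {
    "Misc": 0.3,
    "Gaussian: Conv2D": 4.8,
    "Gaussian: DownSampleBilinear": 0.6,
    "DiffImage": 1.7,
    "FindKeyPoints": 3.43,
}
FREAK_MS = {"IntegralImage": 1.5, "GetDescriptors": 0.9}

def pipeline_ms(*stages):
    """Total time of all kernels in all stages, in milliseconds."""
    return sum(sum(stage.values()) for stage in stages)

def headroom_ms(frame_ms, target_fps):
    """Time left in each frame period after the pipeline has run."""
    return 1000.0 / target_fps - frame_ms

total = pipeline_ms(DOG_MS, FREAK_MS)
print(f"DoG + FREAK: {total:.2f} ms per frame")
# 30 fps is an assumed target, not a number from the presentation.
print(f"Headroom at 30 fps: {headroom_ms(total, 30):.2f} ms")
```

At the assumed 30 fps, each frame period is about 33.3 ms, so roughly 20 ms per frame remains for other work.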

Performance is also GFLOPS/Watt

Programming the TK1 GPU
• CUDA – NVIDIA
• OpenCL – Khronos
• RenderScript – Developed by Google

Programming the TK1 - CUDA
• Most rules and methods that apply to discrete cards also apply to the TK1 GPU
• Code and libraries (such as cuFFT, cuBLAS, cuSPARSE, CUB, Thrust, etc.) should work out of the box on the TK1
• Develop on Windows/Linux with a discrete card and then migrate to the TK1
• Use the profiler

Programming the TK1 - OpenCL
• Most of the tips for CUDA apply to OpenCL
• Runs nicely and shows nice performance
• Migrated the in-house bilateral filter from CUDA to OpenCL in less than a day
• 2D separable convolution yields nice performance gains (compared to an optimized NEON implementation)

2D separable convolution on the TK1
• Used 4 test configurations to evaluate performance:
  – Highly optimized reference library utilizing NEON (CPU)
  – SagivTech’s in-house NEON implementation (CPU), on a single core and on 4 cores
  – SagivTech’s in-house OpenCL implementation (GPU)

Test configuration     1K x 1K   2K x 2K
Reference library      22        97
ST single core NEON    23.5      99
ST 4 cores NEON        10.8      48
ST OpenCL              4         9
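For readers unfamiliar with the technique being benchmarked, the following sketch shows why a separable filter is cheaper than a direct 2D one. A K x K kernel that factors into a column vector and a row vector can be applied as two 1D passes, costing about 2K multiplications per pixel instead of K * K. The code is a plain-Python illustration with zero padding and no kernel flip. The function names are hypothetical, and it says nothing about how the NEON or OpenCL versions were written.

```python
def pixel(image, x, y):
    """Read a pixel, treating everything outside the image as zero."""
    if 0 <= y < len(image) and 0 <= x < len(image[0]):
        return image[y][x]
    return 0

def convolve_2d(image, kernel2d):
    """Direct 2D filtering: K * K multiplications per output pixel."""
    ry, rx = len(kernel2d) // 2, len(kernel2d[0]) // 2
    return [[sum(kernel2d[j][i] * pixel(image, x + i - rx, y + j - ry)
                 for j in range(len(kernel2d))
                 for i in range(len(kernel2d[0])))
             for x in range(len(image[0]))]
            for y in range(len(image))]

def convolve_separable(image, kx, ky):
    """Two 1D passes (rows with kx, then columns with ky): about 2K
    multiplications per output pixel."""
    rx, ry = len(kx) // 2, len(ky) // 2
    height, width = len(image), len(image[0])
    rows = [[sum(kx[i] * pixel(image, x + i - rx, y) for i in range(len(kx)))
             for x in range(width)]
            for y in range(height)]
    return [[sum(ky[j] * pixel(rows, x, y + j - ry) for j in range(len(ky)))
             for x in range(width)]
            for y in range(height)]

def outer(ky, kx):
    """Build the 2D kernel that the pair of 1D kernels is equivalent to."""
    return [[a * b for b in kx] for a in ky]
```

With integer inputs the two paths give identical results, for example `convolve_separable(img, [1, 2, 1], [1, 2, 1]) == convolve_2d(img, outer([1, 2, 1], [1, 2, 1]))`. With floats they agree up to rounding.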

Programming the TK1 – RenderScript - 1
• Google’s way of doing compute on a mobile platform
• Quick CUDA to RenderScript terminology translation:
  – User manages allocations (a.k.a. buffers)
  – User manages data transfer/copies to/from allocations
  – User sets runtime parameters (a.k.a. kernel params)
  – User launches kernels much like OpenCL/CUDA
• Code ran on the GPU and yielded an impressive performance boost (still lags behind CUDA)
• CUDA to RS migration is fairly easy

Programming the TK1 – RenderScript - 2
• Google does NOT mandate which SoC component will run the RS code
• Developer has no control over where RS code will run
• Depends on specific hardware, vendors, code, etc.
• To test RS on the TK1, we locked the GPU clocks in different configurations and ran an RS sparse matrix-vector multiplication benchmark
• The performance of the RS code under different clocks would reveal which component ran the RS code

Programming the TK1 – RenderScript - 3
• Sparse matrix-vector multiplication using RenderScript
• Used 3 test configurations:
  – Naive C++ CPU code
  – SagivTech RS
  – NVIDIA’s cuSPARSE
• RS running on GPU
• RS shows nice performance
[Chart: Naive C++, SagivTech RS and NVIDIA cuSPARSE compared at GPU full, half and quarter clocks]
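As a reminder of what the benchmark computes, here is a naive single-threaded sparse matrix-vector product, in the spirit of the "naive CPU code" baseline. The slides do not say which storage format was used, so the CSR (compressed sparse row) layout and the function names below are assumptions for illustration only.

```python
def dense_to_csr(matrix):
    """Convert a dense row-major matrix to CSR arrays
    (values, column indices, row pointers)."""
    values, col_indices, row_ptr = [], [], [0]
    for row in matrix:
        for col, value in enumerate(row):
            if value != 0:
                values.append(value)
                col_indices.append(col)
        row_ptr.append(len(values))
    return values, col_indices, row_ptr

def csr_spmv(values, col_indices, row_ptr, x):
    """Compute y = A * x for a CSR matrix A, one row at a time."""
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_indices[k]]
        y.append(acc)
    return y
```

Each output row is independent of the others, which is why this loop maps well onto one GPU thread (or group of threads) per row.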

Programming the TK1 – Optimization tips
• Only one SMX
• We’ve seen cases where optimizations behave differently on the TK1 than on an equivalent discrete card (such as __ldg, etc.)
• Try various optimizations; in some cases we got better performance when using atomics rather than shared memory
• Always optimize on the TK1 itself and not on the discrete card used for the development phase

The future
• Real-time image processing of even complex algorithms is achievable on the TK1
• Easy migration from mature discrete GPU code to the new and exciting field of mobile compute
• Maxwell is already planned for the next mobile generation, bringing more power efficiency and performance
• It works!

Thank You
For more information please contact Eyal Hirsch, eyal@sagivtech.com

Programming the TK1 – General tips
• TK1 hardware compute capability (CC) is 3.2
• The tools and compilation chain are quite different; it takes some time to get started
• Strive to develop the CUDA code and the managing app/code on Windows/Linux using a discrete card and then migrate to Android
• Always have reference code in naive, single-threaded C++ to compare against the results of the parallel algorithm
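The last tip, keeping a naive single-threaded reference to validate the parallel result, can be illustrated as follows. The slide recommends C++ for the reference; this sketch uses Python only to keep the example short. The blocked sum stands in for a parallel reduction, which adds floating-point values in a different order, and the tolerance values are assumptions rather than recommendations from the talk.

```python
import math

def reference_sum(values):
    """Naive single-threaded reference: add the values left to right."""
    total = 0.0
    for value in values:
        total += value
    return total

def blocked_sum(values, block_size):
    """Stand-in for a parallel reduction: sum fixed-size blocks first,
    then sum the partial results. The additions happen in a different
    order than in the reference, so the result may differ by rounding."""
    partials = [reference_sum(values[i:i + block_size])
                for i in range(0, len(values), block_size)]
    return reference_sum(partials)

def matches_reference(reference, candidate, rel_tol=1e-9, abs_tol=1e-12):
    """Compare with a tolerance instead of exact equality, because
    reordered floating-point additions are not bit-identical."""
    return math.isclose(reference, candidate, rel_tol=rel_tol, abs_tol=abs_tol)
```

A check such as `matches_reference(reference_sum(data), blocked_sum(data, 32))` passes despite the different summation order, while a genuinely wrong result (for example, one that drops an element) fails.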

Computational Photography: examples …
• Background substitution

FREAK – Fast Retina Keypoint
• Binary feature descriptor
• Hamming distance matcher
• Sampling pattern
• Overlapping receptive fields
• Exponential change in size
• Rotation invariant

BRISK – Binary Robust Invariant Scalable Keypoints
• Binary feature descriptor
• Hamming distance matcher
• Sampling pattern
• Equally spaced in circles
• Gaussian kernel size relative to distance from feature
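Both FREAK and BRISK produce binary descriptors that are compared with the Hamming distance, i.e. the number of differing bits. Below is a minimal brute-force matcher, assuming descriptors are stored as equal-length `bytes` objects. The function names and the `max_distance` cut-off are hypothetical and not taken from the presentation.

```python
def hamming_distance(a, b):
    """Number of bit positions in which two equal-length byte strings differ."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def match_descriptors(query, train, max_distance):
    """For each query descriptor, find the nearest train descriptor and
    keep the pair if its distance is at most `max_distance`.
    Returns a list of (query_index, train_index, distance) tuples."""
    matches = []
    if not train:
        return matches
    for qi, q in enumerate(query):
        distances = [hamming_distance(q, t) for t in train]
        best = min(range(len(train)), key=distances.__getitem__)
        if distances[best] <= max_distance:
            matches.append((qi, best, distances[best]))
    return matches
```

Because the distance is an XOR followed by a bit count, it is cheap to compute, which is one reason binary descriptors suit real-time matching.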
