Accelerate Framework & the Armadillo Library




  1. Accelerate Framework & the Armadillo Library Instructor - Simon Lucey 16-623 - Designing Computer Vision Apps

  2. Today • Motivation • Accelerate Framework • BLAS & LAPACK • Armadillo Library

  3. Algorithm → Software → Architecture → SOC Hardware

  4. Example: “Correlation Filters with Limited Boundaries”, Hamed Kiani Galoogahi (Istituto Italiano di Tecnologia, Genova, Italy), Terence Sim (National University of Singapore), Simon Lucey (Carnegie Mellon University, Pittsburgh, USA).
Abstract: Correlation filters take advantage of specific properties in the Fourier domain that allow them to be estimated efficiently: O(ND log D) in the frequency domain versus O(D^3 + ND^2) spatially, where D is the signal length and N is the number of signals. Recent extensions to correlation filters, such as MOSSE, have reignited interest in their use in the vision community due to their robustness and attractive computational properties. In this paper we demonstrate, however, that this computational efficiency comes at a cost: only a 1/D proportion of shifted examples are unaffected by boundary effects, which has a dramatic effect on detection/tracking performance. We propose a novel approach to correlation filter estimation that (i) takes advantage of inherent computational redundancies in the frequency domain, (ii) dramatically reduces boundary effects, and (iii) implicitly exploits all possible patches densely extracted from the training examples during learning. Impressive object tracking and detection results are presented in terms of both accuracy and computational efficiency.
Figure 1: (a) the fixed spatial support within the image from which the peak correlation output should occur; (b) the desired output response, based on (a), of the correlation filter when applied to the entire image; (c) a subset of patch examples used in a canonical correlation filter, where green denotes a non-zero correlation output and red denotes a zero correlation output, in direct accordance with (b); (d) a subset of patch examples used in the proposed correlation filter, which uses all possible patches stemming from different parts of the image, whereas the canonical correlation filter simply employs circularly shifted versions of the same single patch. The last two patches of (d) show that D − 1 patches near the image border are affected by circular shift, which can be greatly diminished by choosing D ≪ T, where D and T indicate the length of the vectorized face patch in (a) and the whole image in (a), respectively. The central dilemma of the paper is how to perform (d) efficiently in the Fourier domain.

  5. Algorithm → Software → Architecture → Hardware

  6. SIMD (Single Instruction, Multiple Data): one instruction operates on short vectors (length 2, 4, 8, …) of integers or floats, e.g. a 4-way add. Instruction-set names: MMX, SSE, SSE2, …

  7. Reminder: the CPU clock is stuck! • CPU clock speeds have been stuck at about 3 GHz since 2006 due to high power consumption (up to 130 W per chip) • chip circuitry is still doubling every 18-24 months ⇒ more on-chip memory and MMUs (memory management units) ⇒ specialised hardware (e.g. multimedia, encryption) ⇒ multi-core (multiple CPUs on one chip) • peak chip performance is still doubling every 18-24 months. Taken from http://people.maths.ox.ac.uk/gilesm/cuda/lecs/lec0.pdf

  8. (Figure: comparison between 2010 and 2015.)

  9. (Taken from http://bgr.com/2016/08/22/galaxy-note-7-vs-iphone-6-speed-test/)

  10. Architecture Considerations • Memory hierarchy. • Vector instructions: SIMD (Single Instruction, Multiple Data), e.g. a 4-way add on vectors (length 2, 4, 8, …) of integers or floats; names: MMX, SSE, SSE2, … • Multiple threads. • Branch prediction.

  11. Writing fast vision code • In general you should NOT be trying to do these optimizations yourself. • BUT you should use tools to find where the biggest performance losses are coming from. • Xcode comes with an excellent tool for this called “Instruments”. • Ray Wenderlich has a useful tutorial (see link) on using Instruments in Xcode. • More on this in later lectures.

  12. Emerging Alternatives to OpenCV • FastCV (https://developer.qualcomm.com/software/fastcv-sdk) • OpenVX (https://www.khronos.org/openvx/) • Itseez accelerated CV library (http://opencv.org/itseez-announces-release-of-accelerated-cv-library.html) • GPUImage (https://github.com/BradLarson/GPUImage)

  13. OpenVX versus OpenCV
• Implementation: OpenCV is a community-driven open source library; OpenVX is an open standard API designed to be implemented by hardware vendors.
• Conformance: OpenCV has an extensive test suite but no formal adopters program; OpenVX implementations must pass a defined conformance test suite to use the trademark.
• Consistency: OpenCV's available functions can vary depending on implementation/platform; in OpenVX, all core functions must be available in all conformant implementations.
• Scope: OpenCV is very wide (1000s of imaging and vision functions) with multiple camera APIs/interfaces; OpenVX has a tight focus on core hardware-accelerated functions for mobile vision (but is extensible) and uses the external/native camera API.
• Efficiency: OpenCV has a memory-based architecture (each operation reads and writes to memory); OpenVX uses graph-based execution (optimizable computation and data transfer).
• Typical use case: OpenCV suits rapid experimentation and prototyping, especially on desktop; OpenVX suits production development and deployment on mobile and embedded devices.
• Embedded deployment: OpenCV as re-usable code; OpenVX as a callable library.
(Taken from https://www.khronos.org/openvx/)

  14. Today • Motivation • Accelerate Framework • BLAS & LAPACK • Armadillo Library

  15. Accelerate Framework

  16. Accelerate Framework: a timeline from OS X Jaguar through Tiger on the Mac, and from iOS 4 to iOS 5 (which added vForce). Taken from: http://www.mactech.com/sites/default/files/Biggus-Accelerate_IV.pdf

  17. Accelerate Framework timeline, 1980-2010: LAPACK and BLAS (the underlying open libraries), plus Apple's vForce, vMathLib, vDSP, vBasicOps, vBigNum, and vImage. Taken from: http://www.mactech.com/sites/default/files/Biggus-Accelerate_IV.pdf

  18. Accelerate Framework: vImage (“image operations”), BLAS/LAPACK (“matrix operations”), vDSP (“signal processing”), vForce/vMathLib (“misc math”), and, new in 2016, BNNS: “basic neural network subroutines”. (Taken from https://www.bignerdranch.com/blog/neural-networks-in-ios-10-and-macos/ )

  19. Today • Motivation • Accelerate Framework • BLAS & LAPACK • Armadillo Library

  20. Matrix-Matrix Multiplication (MMM) on a 2 x Core 2 Duo, 3 GHz: performance [Gflop/s] of MMM kernel functions versus matrix size (up to ~9,000). Starting from a naive implementation, exploiting the memory hierarchy buys ~20x, vector instructions a further ~4x, and multiple threads another ~4x (approaching ~45 Gflop/s). The compiler doesn't do this job for you; >> A*B (in MATLAB) reaches the fast curve because it calls tuned library code. Taken from Markus Püschel - “How to Write Fast Numerical Code”.

  21. BLAS • Basic Linear Algebra Subprograms • Level 1: vector-vector operations (70s) • Level 2: matrix-vector operations (mid 80s) • Level 3: matrix-matrix operations (late 80s) • BLAS was originally used to implement the linear algebra subroutine library LINPACK.

  22. The Path to LAPACK • EISPACK and LINPACK (early 70s) • Libraries for linear algebra algorithms • Jack Dongarra, Jim Bunch, Cleve Moler, Gilbert Stewart • LINPACK is still the name of the benchmark for the TOP500 (Wiki) list of the most powerful supercomputers • Problem: the implementation was vector-based, giving low operational intensity (e.g., MMM as a double loop over scalar products of vectors) and hence low performance on computers with a deep memory hierarchy (in the 80s) • Solution: LAPACK • Reimplement the algorithms “block-based,” i.e., with locality • Developed late 1980s, early 1990s • Jim Demmel, Jack Dongarra et al. Taken from Markus Püschel - “How to Write Fast Numerical Code”.

  23. Availability of LAPACK • LAPACK is available on nearly all platforms, via numerous implementations: • Intel MKL (Windows, Linux, OS X) • AMD ACML • OpenBLAS (Windows, Linux, Android, OS X) • Apple Accelerate (OS X, iOS)

  24. Which is Easier to Follow?
