Accelerate Framework & the Armadillo Library
Instructor - Simon Lucey
16-623 - Designing Computer Vision Apps
Today
Motivation
Accelerate Framework
BLAS & LAPACK
Armadillo Library
Algorithm, Software, Architecture
Correlation Filters with Limited Boundaries
Hamed Kiani Galoogahi, Istituto Italiano di Tecnologia, Genova, Italy (hamed.kiani@iit.it)
Terence Sim, National University of Singapore, Singapore (tsim@comp.nus.edu.sg)
Simon Lucey, Carnegie Mellon University, Pittsburgh, USA (slucey@cs.cmu.edu)
Abstract
Correlation filters take advantage of specific properties in the Fourier domain allowing them to be estimated efficiently: O(ND log D) in the frequency domain, versus O(D^3 + ND^2) spatially where D is signal length, and N is the number of signals. Recent extensions to correlation filters, such as MOSSE, have reignited interest of their use in the vision community due to their robustness and attractive computational properties. In this paper we demonstrate, however, that this computational efficiency comes at a cost. Specifically, we demonstrate that only 1/D proportion of shifted examples are unaffected by boundary effects which has a dramatic effect on detection/tracking ...
Names: MMX, SSE, SSE2, …
4-way SIMD (Single Instruction, Multiple Data)
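To make "4-way SIMD" concrete, here is a minimal sketch in which one instruction multiplies four packed floats at a time. It assumes an x86 target with SSE and falls back to a plain scalar loop elsewhere (iOS devices use ARM NEON rather than SSE). The helper name multiply4 is hypothetical and is not taken from the slides or from any library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#if defined(__SSE__)
#include <xmmintrin.h>  // SSE intrinsics (x86 only)
#endif

// Element-wise product out[i] = a[i] * b[i].
// multiply4 is a hypothetical helper name used only for illustration.
std::vector<float> multiply4(const std::vector<float>& a,
                             const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    std::size_t i = 0;
#if defined(__SSE__)
    // Each iteration loads, multiplies and stores four floats at once.
    for (; i + 4 <= a.size(); i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);
        __m128 vb = _mm_loadu_ps(&b[i]);
        _mm_storeu_ps(&out[i], _mm_mul_ps(va, vb));
    }
#endif
    // Scalar tail, and the full fallback on targets without SSE.
    for (; i < a.size(); ++i) out[i] = a[i] * b[i];
    return out;
}
```

In practice a library such as Accelerate hides this kind of code behind a portable interface, so hand-written intrinsics are rarely needed.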
power consumption (up to 130W per chip)
units)
multi-core (multiple CPU’s on one chip)
Taken from http://people.maths.ox.ac.uk/gilesm/cuda/lecs/lec0.pdf
(Taken from http://bgr.com/2016/08/22/galaxy-note-7-vs-iphone-6-speed-test/)
losses in performance are coming from.
called “Instruments”.
Instruments in Xcode.
(https://developer.qualcomm.com/software/fastcv-sdk)
(https://www.khronos.org/openvx/)
(http://opencv.org/itseez-announces-release-of-accelerated-cv-library.html)
(https://github.com/BradLarson/GPUImage)
OpenCV vs. OpenVX
Implementation
  OpenCV: Community-driven open source library
  OpenVX: Open standard API designed to be implemented by hardware vendors
Conformance
  OpenCV: Extensive OpenCV Test Suite but no formal Adopters program
  OpenVX: Implementations must pass defined conformance test suite to use trademark
Consistency
  OpenCV: Available functions can vary depending on implementation / platform
  OpenVX: All core functions must be available in all conformant implementations
Scope
  OpenCV: Very wide. 1000s of imaging and vision functions. Multiple camera APIs/interfaces
  OpenVX: Tight focus on core hardware accelerated functions for mobile vision, but extensible. Uses external/native camera API
Efficiency
  OpenCV: Memory-based architecture. Each operation reads and writes to memory
  OpenVX: Graph-based execution. Optimizable computation and data transfer
Typical Use Case
  OpenCV: Rapid experimentation and prototyping, especially on desktop
  OpenVX: Production development & deployment on mobile and embedded devices
Embedded Deployment
  OpenCV: Re-usable code
  OpenVX: Callable library
(Taken from https://www.khronos.org/openvx/)
Taken from: http://www.mactech.com/sites/default/files/Biggus-Accelerate_IV.pdf
Figure: timeline (1980 to 2010) of Accelerate components: LAPACK, vDSP, vImage, vForce, vMathLib, vBasicOps, vBigNum, BLAS
Accelerate covers: “image operations”, “matrix operations”, “signal processing”, “misc math”, “basic neural network subroutines”
(Taken from https://www.bignerdranch.com/blog/neural-networks-in-ios-10-and-macos/ )
Figure: Matrix-Matrix Multiplication (MMM) on 2 x Core 2 Duo 3 GHz. Axes: performance [Gflop/s] against matrix size. Annotated speedups: memory hierarchy 20x, vector instructions 4x, multiple threads 4x.
>> A*B (in MATLAB)
Taken from Markus Püschel - “How to Write Fast Numerical Code”.
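The "memory hierarchy" factor in the plot comes from reusing data while it is still in cache. As a rough sketch of the idea (not the implementation behind the plotted numbers), the code below compares a reference triple loop with a tiled version that does the same arithmetic one block at a time. The names matmul_naive and matmul_blocked and the tile size of 32 are assumptions chosen for illustration; a tuned BLAS does considerably more than this.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<double>;  // row-major n x n storage

// Reference triple loop: C = A * B.
Matrix matmul_naive(const Matrix& A, const Matrix& B, std::size_t n) {
    assert(A.size() == n * n && B.size() == n * n);
    Matrix C(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
    return C;
}

// Same arithmetic, computed tile by tile so that each tile of A, B and C
// is reused while it is likely still in cache. The default tile size is an
// untuned guess, not a recommended value.
Matrix matmul_blocked(const Matrix& A, const Matrix& B, std::size_t n,
                      std::size_t tile = 32) {
    assert(A.size() == n * n && B.size() == n * n && tile > 0);
    Matrix C(n * n, 0.0);
    for (std::size_t ii = 0; ii < n; ii += tile)
        for (std::size_t kk = 0; kk < n; kk += tile)
            for (std::size_t jj = 0; jj < n; jj += tile)
                for (std::size_t i = ii; i < std::min(ii + tile, n); ++i)
                    for (std::size_t k = kk; k < std::min(kk + tile, n); ++k) {
                        const double a = A[i * n + k];
                        for (std::size_t j = jj; j < std::min(jj + tile, n); ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
    return C;
}
```

Both functions return the same matrix; only the order in which memory is touched differs.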
subroutine library (LINPACK).
most powerful supercomputers
(e.g., MMM as double loop over scalar products of vectors)
80s)
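The phrase "MMM as double loop over scalar products of vectors" can be sketched as follows: every entry of C is one dot product between a row of A and a column of B, computed by a vector-vector kernel. The function names dot_strided and mmm_by_dots are hypothetical; the stride arguments only mimic the increment parameters found in BLAS-style interfaces and this is not a BLAS call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Vector-vector kernel: scalar product of two strided vectors of length n.
double dot_strided(std::size_t n, const double* x, std::size_t incx,
                   const double* y, std::size_t incy) {
    double s = 0.0;
    for (std::size_t k = 0; k < n; ++k) s += x[k * incx] * y[k * incy];
    return s;
}

// MMM as a double loop over scalar products:
// C(i,j) = (row i of A) . (column j of B), all matrices row-major n x n.
std::vector<double> mmm_by_dots(const std::vector<double>& A,
                                const std::vector<double>& B, std::size_t n) {
    assert(A.size() == n * n && B.size() == n * n);
    std::vector<double> C(n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            C[i * n + j] = dot_strided(n, &A[i * n], 1, &B[j], n);
    return C;
}
```

Each dot product re-reads a full row and a full column, so there is little data reuse between calls.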
you operate with matrices and vectors and do not write your
Figure labels: “Computer Vision Algorithms”, BLAS, LAPACK
compressed!!!).
60 MB (see link).
link).
vision app is its size.
Accelerate Framework comes “built in” to all iOS devices. NOTHING TO DOWNLOAD!!
Mac OS X).
Objective-C in iOS and other mobile platforms.
comparison to OpenCV.
https://github.com/slucey-cs-cmu-edu/OpenCV_vs_Armadillo
command line.
$ git clone https://github.com/slucey-cs-cmu-edu/OpenCV_vs_Armadillo.git
multiplication, SVD, Backslash, and FFT.
https://github.com/slucey-cs-cmu-edu/Intro_iOS_Armadillo
command line.
$ git clone https://github.com/slucey-cs-cmu-edu/Intro_iOS_Armadillo.git