

SLIDE 1

Fast Convolution Algorithms for deep learning and computer vision

Sample slides only

Presenter: Prof. Ioannis Pitas, Aristotle University of Thessaloniki, pitas@csd.auth.gr

SLIDE 2

Outline

  • 1D convolutions
    • Linear & cyclic 1D convolutions
    • Discrete Fourier Transform, Fast Fourier Transform
    • Winograd algorithm
  • Linear & cyclic 2D convolutions
  • Applications in deep learning
    • Convolutional neural networks

SLIDE 3

Motivation

  • Fast implementation of 1D and 2D digital filters
    • Image filtering
    • Image feature calculation
    • Gabor filters
  • Fast implementation of 1D and 2D correlation
    • Template matching
    • Correlation tracking
  • Machine learning
    • Convolutional Neural Networks

SLIDE 4

Linear 1D convolution

  • The one-dimensional (linear) convolution of:
    • an input signal $y$ and
    • a convolution kernel $h$ (filter finite impulse response) of length $N$:

$$z(l) = h(l) * y(l) = \sum_{j=0}^{N-1} h(j)\, y(l-j)$$

  • For a convolution kernel centered around $0$ and $N = 2M + 1$, it takes the form:

$$z(l) = h(l) * y(l) = \sum_{j=-M}^{M} h(j)\, y(l-j)$$
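As an illustration (a hypothetical NumPy sketch, not part of the slides), the first summation above can be implemented directly; NumPy's `np.convolve` serves as the reference result:

```python
import numpy as np

def linear_conv1d(h, y):
    """Direct 1D linear convolution: z(l) = sum_j h(j) y(l - j)."""
    N, L = len(h), len(y)
    z = np.zeros(N + L - 1)          # linear convolution has length N + L - 1
    for l in range(N + L - 1):
        for j in range(N):
            if 0 <= l - j < L:       # skip samples outside the signal support
                z[l] += h[j] * y[l - j]
    return z

h = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 1.0, 1.0, 1.0])
print(linear_conv1d(h, y))           # [1. 3. 6. 6. 5. 3.]
```

The double loop makes the $O(NL)$ cost of direct convolution explicit, which is what the fast algorithms on the following slides reduce.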

SLIDE 5

Linear 1D convolution - Example

Image source: http://electricalacademia.com/signals-and-systems/example-of-discrete-time-graphical-convolution/

SLIDE 6

Linear 1D convolution - Example

Image source: http://electricalacademia.com/signals-and-systems/example-of-discrete-time-graphical-convolution/

SLIDE 7

Linear 1D correlation

  • Correlation of template $h$ and input signal $y(l)$:

$$s(l) = \sum_{j=0}^{N-1} h(j)\, y(l+j)$$

  • The input signal is not flipped.
  • It is used for template matching and for object tracking in video.
  • It is often confused with convolution: they are identical only if $h$ is centered at and symmetric about $j = 0$.
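A minimal sketch (hypothetical, not from the slides) of sliding correlation, together with the flipped-kernel identity mentioned above:

```python
import numpy as np

def correlate1d(h, y):
    """s(l) = sum_{j=0}^{N-1} h(j) y(l + j): the template is not flipped."""
    N, L = len(h), len(y)
    s = np.zeros(L - N + 1)                  # valid positions of the template
    for l in range(L - N + 1):
        s[l] = np.sum(h * y[l:l + N])        # inner product with a signal window
    return s

h = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 1.0, 2.0, 0.0, 1.0, 2.0])
print(correlate1d(h, y))                     # [5. 2. 2. 5.], peaks at matches
```

Correlation equals convolution with the time-reversed template, i.e. `correlate1d(h, y)` matches `np.convolve(h[::-1], y, mode='valid')`; this is why the two are identical only for symmetric kernels.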
SLIDE 8

Cyclic 1D convolution

  • One-dimensional cyclic convolution of length $N$, with $(l)_N = l \bmod N$:

$$z(l) = y(l) \circledast h(l) = \sum_{j=0}^{N-1} h(j)\, y((l-j)_N)$$

  • A linear convolution of sequences of lengths $L$ and $M$ can be embedded in a cyclic convolution by zero-padding both sequences to length $N \geq L + M - 1$ and then performing a cyclic convolution of length $N$:

$$z(l) = y(l) \circledast h(l) = \sum_{j=0}^{N-1} y_N(j)\, h_N((l-j)_N)$$

SLIDE 9

Cyclic Convolution via DFT

Cyclic convolution can also be calculated using the 1D DFT:

$$\mathbf{z} = \mathrm{IDFT}(\mathrm{DFT}(\mathbf{y}) \odot \mathrm{DFT}(\mathbf{h}))$$

SLIDE 10

1D FFT

  • There are a few algorithms to speed up the calculation of the DFT.
  • The most well known is the radix-2 decimation-in-time (DIT) Fast Fourier Transform (FFT) (Cooley-Tukey).
  • The DFT of a sequence $y(n)$ of length $N$ is:

$$Y(k) = \sum_{n=0}^{N-1} y(n)\, e^{-\frac{2\pi i}{N} nk}$$

where $k$ is an integer ranging from $0$ to $N - 1$.
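The DFT sum above can be written as a matrix-vector product (a hypothetical sketch; `np.fft.fft` is used only to verify the result):

```python
import numpy as np

def dft(y):
    """Y(k) = sum_n y(n) e^{-2 pi i n k / N}, computed in O(N^2)."""
    N = len(y)
    n = np.arange(N)
    W = np.exp(-2j * np.pi * np.outer(n, n) / N)   # DFT matrix W[k, n]
    return W @ y

y = np.array([1.0, 2.0, 3.0, 4.0])
print(dft(y))                                      # matches np.fft.fft(y)
```

The $O(N^2)$ cost of this direct evaluation is exactly what the FFT reduces to $O(N \log N)$.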

SLIDE 11

1D FFT

  • The radix-2 FFT breaks a length-$N$ DFT into many size-2 DFTs called "butterfly" operations.
  • There are $\log_2 N$ stages.
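The even/odd decimation and the butterfly stage can be sketched recursively (hypothetical code, power-of-two lengths assumed):

```python
import numpy as np

def fft_radix2(y):
    """Recursive radix-2 DIT FFT; len(y) must be a power of two."""
    N = len(y)
    if N == 1:
        return np.asarray(y, dtype=complex)
    even = fft_radix2(y[0::2])                       # DFT of even-indexed samples
    odd = fft_radix2(y[1::2])                        # DFT of odd-indexed samples
    tw = np.exp(-2j * np.pi * np.arange(N // 2) / N) * odd   # twiddle factors
    return np.concatenate([even + tw, even - tw])    # butterfly combine

y = np.arange(8, dtype=float)
print(fft_radix2(y))                                 # matches np.fft.fft(y)
```

Each recursion level corresponds to one of the $\log_2 N$ butterfly stages mentioned above.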
SLIDE 12

Z-transform

π‘Œ(𝑨) = ෍

π‘œ=0 π‘‚βˆ’1

𝑦(π‘œ)π‘¨βˆ’π‘œ

The Z-transform of a signal (function) x(n) having domain [0,…,N] is given by: The domain of Z-transform is the complex plane, since z is a complex number. The following relation holds for the Z-transform:

𝑧(π‘œ) = 𝑦(π‘œ) βˆ— β„Ž(π‘œ) ⇔ 𝑍(𝑨) = π‘Œ(𝑨)𝐼(𝑨)

SLIDE 13

Cyclic convolution and Z-transform

Cyclic convolution corresponds to polynomial multiplication modulo $z^N - 1$:

$$Z(z) = Y(z)\, H(z) \bmod (z^N - 1)$$

where $(l)_N = l \bmod N$.
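The $\bmod (z^N - 1)$ reduction simply wraps polynomial coefficients around with period $N$, which can be sketched as (hypothetical code):

```python
import numpy as np

def polymod_cyclic(h, y):
    """Cyclic convolution as Y(z)H(z) mod (z^N - 1):
    coefficients of the linear product wrap around with period N."""
    N = len(y)
    full = np.convolve(h, y)          # linear convolution = product Y(z)H(z)
    z = np.zeros(N)
    for i, c in enumerate(full):
        z[i % N] += c                 # reduce exponents modulo N
    return z

h = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([4.0, 3.0, 2.0, 1.0])
print(polymod_cyclic(h, y))           # [24. 22. 24. 30.]
```

This matches the direct cyclic convolution of the same sequences, confirming the polynomial view.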

SLIDE 14

Winograd algorithm Fast 1D cyclic convolution with minimal complexity

  • The Winograd algorithm works on small tiles of the input image.
  • The input tile and the filter are transformed.
  • The outputs of the transforms are multiplied together in an element-wise fashion.
  • The result is transformed back to obtain the outputs of the convolution.
SLIDE 15

Winograd algorithm Fast 1D cyclic convolution with minimal complexity

  • Winograd convolution algorithms, or fast filtering algorithms: $\mathbf{z} = \mathbf{D}(\mathbf{B}\mathbf{y} \odot \mathbf{C}\mathbf{h})$
  • They require only $2N - \xi$ multiplications in their middle vector product, thus having minimal complexity.
  • $\xi$: number of cyclotomic polynomial factors of the polynomial $z^N - 1$ over the rational numbers $\mathbb{Q}$.
  • GEneral Matrix Multiplication (GEMM) BLAS or cuBLAS routines can be used.
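As a concrete instance of the transform-multiply-transform pattern above, here is the widely used $F(2,3)$ Winograd algorithm (the Lavin-Gray matrices, an assumption since the slides do not give specific matrices): two outputs of a 3-tap filter from four input samples, with 4 multiplications instead of 6:

```python
import numpy as np

# Winograd F(2,3) transform matrices (Lavin & Gray formulation).
BT = np.array([[1, 0, -1, 0],
               [0, 1,  1, 0],
               [0, -1, 1, 0],
               [0, 1,  0, -1]], dtype=float)   # input transform
G = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0, 0.0, 1.0]])                # filter transform
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)   # output transform

def winograd_f23(g, d):
    """out[k] = sum_i g[i] * d[k + i] for k = 0, 1 (correlation form)."""
    return AT @ ((G @ g) * (BT @ d))           # only 4 element-wise multiplies

g = np.array([1.0, 2.0, 3.0])
d = np.array([1.0, 2.0, 3.0, 4.0])
print(winograd_f23(g, d))                      # [14. 20.]
```

The element-wise product in the middle is where GEMM routines take over when many tiles and channels are batched together.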
SLIDE 16

Linear and cyclic 2D convolutions

  • Two-dimensional linear convolution with a convolutional kernel $h$ of size $N_1 \times N_2$ is given by:

$$z(l_1, l_2) = h(l_1, l_2) \ast\ast\, y(l_1, l_2) = \sum_{j_1=0}^{N_1-1} \sum_{j_2=0}^{N_2-1} h(j_1, j_2)\, y(l_1 - j_1, l_2 - j_2)$$

  • Its two-dimensional cyclic convolution counterpart of support $N_1 \times N_2$ is defined as:

$$z(l_1, l_2) = h(l_1, l_2) \circledast\circledast\, y(l_1, l_2) = \sum_{j_1=0}^{N_1-1} \sum_{j_2=0}^{N_2-1} h(j_1, j_2)\, y((l_1 - j_1)_{N_1}, (l_2 - j_2)_{N_2})$$
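The 2D cyclic definition can be sketched directly and checked against the 2D DFT route (hypothetical NumPy code, kernel zero-padded to the signal support):

```python
import numpy as np

def cyclic_conv2d(h, y):
    """Direct 2D cyclic convolution; h and y both of support N1 x N2."""
    N1, N2 = y.shape
    z = np.zeros((N1, N2))
    for l1 in range(N1):
        for l2 in range(N2):
            for j1 in range(N1):
                for j2 in range(N2):
                    z[l1, l2] += h[j1, j2] * y[(l1 - j1) % N1, (l2 - j2) % N2]
    return z

y = np.arange(16, dtype=float).reshape(4, 4)
h = np.zeros((4, 4))
h[:2, :2] = [[1.0, 2.0], [3.0, 4.0]]          # small kernel, zero-padded
z_direct = cyclic_conv2d(h, y)
z_fft = np.real(np.fft.ifft2(np.fft.fft2(y) * np.fft.fft2(h)))  # 2D DFT route
print(np.allclose(z_direct, z_fft))           # True
```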

SLIDE 17

2D Convolution - Example

  • With Padding
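A padded 2D convolution along the lines of this example can be sketched as follows (hypothetical NumPy code; a 'same'-size output and an odd-sized kernel are assumptions):

```python
import numpy as np

def conv2d_same(h, y):
    """2D linear convolution with zero padding ('same' output size),
    for a kernel h of odd size in both dimensions."""
    M1, M2 = h.shape
    r1, r2 = M1 // 2, M2 // 2
    yp = np.pad(y, ((r1, r1), (r2, r2)))       # zero-pad the borders
    z = np.zeros_like(y, dtype=float)
    hf = h[::-1, ::-1]                         # kernel flip: true convolution
    for l1 in range(y.shape[0]):
        for l2 in range(y.shape[1]):
            z[l1, l2] = np.sum(hf * yp[l1:l1 + M1, l2:l2 + M2])
    return z

y = np.arange(9, dtype=float).reshape(3, 3)
identity = np.zeros((3, 3))
identity[1, 1] = 1.0                           # identity kernel sanity check
print(np.allclose(conv2d_same(identity, y), y))   # True
```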
SLIDE 18

Applications

  • Convolutional neural networks
  • Signal processing
    • Signal filtering
    • Signal restoration
    • Signal deconvolution
  • Signal analysis
    • Time delay estimation
    • Distance calculation (e.g., sonar)
    • 1D template matching

SLIDE 19

Convolutional Neural Networks

  • Two-step architecture:
    • First layers with sparse NN connections: convolutions.
    • Fully connected final layers.
  • Need for fast convolution calculations.

Convergence of machine learning and signal processing.

SLIDE 20

Convolutional Layer

  • For a convolutional layer $m$ with an activation function $g_m(\cdot)$, $d_{in}$ incoming features and $d_{out}$ output features, the transformation of multiple input features to a single output feature $p$ is:

$$z_m(j, k, p) = g_m\!\left( b_m(p) + \sum_{s=1}^{d_{in}} \sum_{l_1=-r_1^{(m)}}^{r_1^{(m)}} \sum_{l_2=-r_2^{(m)}}^{r_2^{(m)}} w_m(l_1, l_2, s, p)\, y_m(j - l_1, k - l_2, s) \right)$$

  • The activation volume (3D tensor) of the convolutional layer is:

$$a_{jk}^{m}(p) = g_m\!\left( b_m(p) + \sum_{s=1}^{d_{in}} \mathbf{W}_m(s, p) * \mathbf{Y}_{jk}^{m}(s) \right)$$

$$\mathbf{A}_m = \{ a_{jk}^{m}(p) : j = 1, \ldots, n_m,\; k = 1, \ldots, m_m,\; p = 1, \ldots, d_{out} \}$$

where $\mathbf{A}_m$ is the activation volume for the convolutional layer $m$, $\mathbf{W}_m(s, p)$ is a 2D slice of the convolutional kernel $\mathbf{W}_m \in \mathbb{R}^{(2r_1+1) \times (2r_2+1) \times d_{in} \times d_{out}}$ for input feature $s$ and output feature $p$, $b_m(p)$ is a scalar bias and $\mathbf{Y}_{jk}^{m}(s)$ is a region of input feature $s$ centered at $(j, k)^T$; e.g., for RGB images, $\mathbf{Y}_{jk}^{1}(1)$ is the R channel of an image with $d_{in} = C = 3$.

SLIDE 21

Deep Learning Frameworks

Image Source: Heehoon Kim, Hyoungwook Nam, Wookeun Jung, and Jaejin Lee - Performance Analysis of CNN Frameworks for GPUs

SLIDE 22

Deep Learning Frameworks

  • All five frameworks use cuDNN as their backend.
  • cuDNN is unfortunately not open source.
  • cuDNN supports FFT-based and Winograd-based convolution.

Image Source: Heehoon Kim, Hyoungwook Nam, Wookeun Jung, and Jaejin Lee - Performance Analysis of CNN Frameworks for GPUs

SLIDE 23

The Neon story

  • Developed by Nervana in 2015.
  • Written in Python and C.
  • Does not support Windows.
  • Uses MKL for CPU (highly optimized by Intel).
  • Supports CUDA for GPU.
  • Known mostly for being the first to implement Winograd convolution faster than others.
SLIDE 24

Q & A

Thank you very much for your attention!

Contact: Prof. I. Pitas pitas@csd.auth.gr www.multidrone.eu