

SLIDE 1

Fast Convolution Algorithms for deep learning and computer vision

Sample slides only

Presenter: Prof. Ioannis Pitas, Aristotle University of Thessaloniki, pitas@csd.auth.gr

SLIDE 2

Outline

  • 1D convolutions
    • Linear & cyclic 1D convolutions
    • Discrete Fourier Transform, Fast Fourier Transform
    • Winograd algorithm
  • Linear & cyclic 2D convolutions
  • Applications in deep learning
    • Convolutional neural networks

SLIDE 3

Motivation

  • Fast implementation of 1D and 2D digital filters
    • Image filtering
    • Image feature calculation
    • Gabor filters
  • Fast implementation of 1D and 2D correlation
    • Template matching
    • Correlation tracking
  • Machine learning
    • Convolutional Neural Networks

SLIDE 4

Linear 1D convolution

  • The one-dimensional (linear) convolution of:
    • an input signal $y$ and
    • a convolution kernel $h$ (filter finite impulse response) of length $N$:

$$z(l) = h(l) * y(l) = \sum_{j=0}^{N-1} h(j)\, y(l-j)$$

  • For a convolution kernel centered around $0$ and $N = 2M + 1$, it takes the form:

$$z(l) = h(l) * y(l) = \sum_{j=-M}^{M} h(j)\, y(l-j)$$
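As an illustration (a hypothetical NumPy sketch, not part of the slides), the first summation above can be implemented directly; NumPy's `np.convolve` serves as the reference result:

```python
import numpy as np

def linear_conv1d(h, y):
    """Direct 1D linear convolution: z(l) = sum_j h(j) y(l - j)."""
    N, L = len(h), len(y)
    z = np.zeros(N + L - 1)          # linear convolution has length N + L - 1
    for l in range(N + L - 1):
        for j in range(N):
            if 0 <= l - j < L:       # skip samples outside the signal support
                z[l] += h[j] * y[l - j]
    return z

h = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 1.0, 1.0, 1.0])
print(linear_conv1d(h, y))           # [1. 3. 6. 6. 5. 3.]
```

The double loop makes the $O(NL)$ cost of direct convolution explicit, which is what the fast algorithms on the following slides reduce.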

SLIDE 5

Linear 1D convolution - Example

Image source: http://electricalacademia.com/signals-and-systems/example-of-discrete-time-graphical-convolution/

SLIDE 6

Linear 1D convolution - Example

Image source: http://electricalacademia.com/signals-and-systems/example-of-discrete-time-graphical-convolution/

SLIDE 7

Linear 1D correlation

  • Correlation of template $h$ and input signal $y(l)$:

$$s(l) = \sum_{j=0}^{N-1} h(j)\, y(l+j)$$

  • The input signal is not flipped.
  • It is used for template matching and for object tracking in video.
  • It is often confused with convolution: they are identical only if $h$ is centered at and symmetric about $j = 0$.
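A minimal sketch (hypothetical, not from the slides) of sliding correlation, together with the flipped-kernel identity mentioned above:

```python
import numpy as np

def correlate1d(h, y):
    """s(l) = sum_{j=0}^{N-1} h(j) y(l + j): the template is not flipped."""
    N, L = len(h), len(y)
    s = np.zeros(L - N + 1)                  # valid positions of the template
    for l in range(L - N + 1):
        s[l] = np.sum(h * y[l:l + N])        # inner product with a signal window
    return s

h = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 1.0, 2.0, 0.0, 1.0, 2.0])
print(correlate1d(h, y))                     # [5. 2. 2. 5.], peaks at matches
```

Correlation equals convolution with the time-reversed template, i.e. `correlate1d(h, y)` matches `np.convolve(h[::-1], y, mode='valid')`; this is why the two are identical only for symmetric kernels.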
SLIDE 8

Cyclic 1D convolution

  • One-dimensional cyclic convolution of length $N$, with $(l)_N = l \bmod N$:

$$z(l) = y(l) \circledast h(l) = \sum_{j=0}^{N-1} h(j)\, y((l-j)_N)$$

  • A linear convolution of sequences of lengths $L$ and $M$ can be embedded in a cyclic convolution by zero-padding both sequences to length $N \geq L + M - 1$ and then performing a cyclic convolution of length $N$:

$$z(l) = y(l) \circledast h(l) = \sum_{j=0}^{N-1} y_N(j)\, h_N((l-j)_N)$$

SLIDE 9

Cyclic Convolution via DFT

Cyclic convolution can also be calculated using the 1D DFT:

$$\mathbf{z} = \mathrm{IDFT}(\mathrm{DFT}(\mathbf{y}) \odot \mathrm{DFT}(\mathbf{h}))$$

SLIDE 10

1D FFT

  • There are a few algorithms to speed up the calculation of the DFT.
  • The most well known is the radix-2 decimation-in-time (DIT) Fast Fourier Transform (FFT) (Cooley-Tukey).
  • The DFT of a sequence $y(n)$ of length $N$ is:

$$Y(k) = \sum_{n=0}^{N-1} y(n)\, e^{-\frac{2\pi i}{N} nk}$$

where $k$ is an integer ranging from $0$ to $N - 1$.
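The DFT sum above can be written as a matrix-vector product (a hypothetical sketch; `np.fft.fft` is used only to verify the result):

```python
import numpy as np

def dft(y):
    """Y(k) = sum_n y(n) e^{-2 pi i n k / N}, computed in O(N^2)."""
    N = len(y)
    n = np.arange(N)
    W = np.exp(-2j * np.pi * np.outer(n, n) / N)   # DFT matrix W[k, n]
    return W @ y

y = np.array([1.0, 2.0, 3.0, 4.0])
print(dft(y))                                      # matches np.fft.fft(y)
```

The $O(N^2)$ cost of this direct evaluation is exactly what the FFT reduces to $O(N \log N)$.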

SLIDE 11

1D FFT

  • The radix-2 FFT breaks a length-$N$ DFT into many size-2 DFTs called "butterfly" operations.
  • There are $\log_2 N$ stages.
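The even/odd decimation and the butterfly stage can be sketched recursively (hypothetical code, power-of-two lengths assumed):

```python
import numpy as np

def fft_radix2(y):
    """Recursive radix-2 DIT FFT; len(y) must be a power of two."""
    N = len(y)
    if N == 1:
        return np.asarray(y, dtype=complex)
    even = fft_radix2(y[0::2])                       # DFT of even-indexed samples
    odd = fft_radix2(y[1::2])                        # DFT of odd-indexed samples
    tw = np.exp(-2j * np.pi * np.arange(N // 2) / N) * odd   # twiddle factors
    return np.concatenate([even + tw, even - tw])    # butterfly combine

y = np.arange(8, dtype=float)
print(fft_radix2(y))                                 # matches np.fft.fft(y)
```

Each recursion level corresponds to one of the $\log_2 N$ butterfly stages mentioned above.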
SLIDE 12

Z-transform

π‘Œ(𝑨) = ෍

π‘œ=0 π‘‚βˆ’1

𝑦(π‘œ)π‘¨βˆ’π‘œ

The Z-transform of a signal (function) x(n) having domain [0,…,N] is given by: The domain of Z-transform is the complex plane, since z is a complex number. The following relation holds for the Z-transform:

𝑧(π‘œ) = 𝑦(π‘œ) βˆ— β„Ž(π‘œ) ⇔ 𝑍(𝑨) = π‘Œ(𝑨)𝐼(𝑨)

SLIDE 13

Cyclic convolution and Z-transform

Cyclic convolution corresponds to polynomial multiplication modulo $z^N - 1$:

$$Z(z) = Y(z)\, H(z) \bmod (z^N - 1)$$

where $(l)_N = l \bmod N$.
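The $\bmod (z^N - 1)$ reduction simply wraps polynomial coefficients around with period $N$, which can be sketched as (hypothetical code):

```python
import numpy as np

def polymod_cyclic(h, y):
    """Cyclic convolution as Y(z)H(z) mod (z^N - 1):
    coefficients of the linear product wrap around with period N."""
    N = len(y)
    full = np.convolve(h, y)          # linear convolution = product Y(z)H(z)
    z = np.zeros(N)
    for i, c in enumerate(full):
        z[i % N] += c                 # reduce exponents modulo N
    return z

h = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([4.0, 3.0, 2.0, 1.0])
print(polymod_cyclic(h, y))           # [24. 22. 24. 30.]
```

This matches the direct cyclic convolution of the same sequences, confirming the polynomial view.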

SLIDE 14

Winograd algorithm Fast 1D cyclic convolution with minimal complexity

  • The Winograd algorithm works on small tiles of the input image.
  • The input tile and the filter are transformed.
  • The outputs of the transforms are multiplied together in an element-wise fashion.
  • The result is transformed back to obtain the outputs of the convolution.
SLIDE 15

Winograd algorithm Fast 1D cyclic convolution with minimal complexity

  • Winograd convolution algorithms, or fast filtering algorithms: $\mathbf{z} = \mathbf{D}(\mathbf{B}\mathbf{y} \odot \mathbf{C}\mathbf{h})$
  • They require only $2N - \xi$ multiplications in their middle vector product, thus having minimal complexity.
  • $\xi$: number of cyclotomic polynomial factors of the polynomial $z^N - 1$ over the rational numbers $\mathbb{Q}$.
  • GEneral Matrix Multiplication (GEMM) BLAS or cuBLAS routines can be used.
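As a concrete instance of the transform-multiply-transform pattern above, here is the widely used $F(2,3)$ Winograd algorithm (the Lavin-Gray matrices, an assumption since the slides do not give specific matrices): two outputs of a 3-tap filter from four input samples, with 4 multiplications instead of 6:

```python
import numpy as np

# Winograd F(2,3) transform matrices (Lavin & Gray formulation).
BT = np.array([[1, 0, -1, 0],
               [0, 1,  1, 0],
               [0, -1, 1, 0],
               [0, 1,  0, -1]], dtype=float)   # input transform
G = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0, 0.0, 1.0]])                # filter transform
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)   # output transform

def winograd_f23(g, d):
    """out[k] = sum_i g[i] * d[k + i] for k = 0, 1 (correlation form)."""
    return AT @ ((G @ g) * (BT @ d))           # only 4 element-wise multiplies

g = np.array([1.0, 2.0, 3.0])
d = np.array([1.0, 2.0, 3.0, 4.0])
print(winograd_f23(g, d))                      # [14. 20.]
```

The element-wise product in the middle is where GEMM routines take over when many tiles and channels are batched together.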
SLIDE 16

Linear and cyclic 2D convolutions

  • Two-dimensional linear convolution with a convolutional kernel $h$ of size $N_1 \times N_2$ is given by:

$$z(l_1, l_2) = h(l_1, l_2) \ast\ast\, y(l_1, l_2) = \sum_{j_1=0}^{N_1-1} \sum_{j_2=0}^{N_2-1} h(j_1, j_2)\, y(l_1 - j_1, l_2 - j_2)$$

  • Its two-dimensional cyclic convolution counterpart of support $N_1 \times N_2$ is defined as:

$$z(l_1, l_2) = h(l_1, l_2) \circledast\circledast\, y(l_1, l_2) = \sum_{j_1=0}^{N_1-1} \sum_{j_2=0}^{N_2-1} h(j_1, j_2)\, y((l_1 - j_1)_{N_1}, (l_2 - j_2)_{N_2})$$
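The 2D cyclic definition can be sketched directly and checked against the 2D DFT route (hypothetical NumPy code, kernel zero-padded to the signal support):

```python
import numpy as np

def cyclic_conv2d(h, y):
    """Direct 2D cyclic convolution; h and y both of support N1 x N2."""
    N1, N2 = y.shape
    z = np.zeros((N1, N2))
    for l1 in range(N1):
        for l2 in range(N2):
            for j1 in range(N1):
                for j2 in range(N2):
                    z[l1, l2] += h[j1, j2] * y[(l1 - j1) % N1, (l2 - j2) % N2]
    return z

y = np.arange(16, dtype=float).reshape(4, 4)
h = np.zeros((4, 4))
h[:2, :2] = [[1.0, 2.0], [3.0, 4.0]]          # small kernel, zero-padded
z_direct = cyclic_conv2d(h, y)
z_fft = np.real(np.fft.ifft2(np.fft.fft2(y) * np.fft.fft2(h)))  # 2D DFT route
print(np.allclose(z_direct, z_fft))           # True
```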

SLIDE 17

2D Convolution - Example

  • With Padding
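A padded 2D convolution along the lines of this example can be sketched as follows (hypothetical NumPy code; a 'same'-size output and an odd-sized kernel are assumptions):

```python
import numpy as np

def conv2d_same(h, y):
    """2D linear convolution with zero padding ('same' output size),
    for a kernel h of odd size in both dimensions."""
    M1, M2 = h.shape
    r1, r2 = M1 // 2, M2 // 2
    yp = np.pad(y, ((r1, r1), (r2, r2)))       # zero-pad the borders
    z = np.zeros_like(y, dtype=float)
    hf = h[::-1, ::-1]                         # kernel flip: true convolution
    for l1 in range(y.shape[0]):
        for l2 in range(y.shape[1]):
            z[l1, l2] = np.sum(hf * yp[l1:l1 + M1, l2:l2 + M2])
    return z

y = np.arange(9, dtype=float).reshape(3, 3)
identity = np.zeros((3, 3))
identity[1, 1] = 1.0                           # identity kernel sanity check
print(np.allclose(conv2d_same(identity, y), y))   # True
```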
SLIDE 18

Applications

  • Convolutional neural networks
  • Signal processing
    • Signal filtering
    • Signal restoration
    • Signal deconvolution
  • Signal analysis
    • Time delay estimation
    • Distance calculation (e.g., sonar)
    • 1D template matching

SLIDE 19

Convolutional Neural Networks

  • Two-step architecture:
    • First layers with sparse NN connections: convolutions.
    • Fully connected final layers.
  • Need for fast convolution calculations.

Convergence of machine learning and signal processing.

SLIDE 20

Convolutional Layer

  • For a convolutional layer $m$ with an activation function $g_m(\cdot)$, $d_{in}$ incoming features and $d_{out}$ output features, the transformation of multiple input features to a single output feature $p$ is:

$$z_m(j, k, p) = g_m\!\left( b_m(p) + \sum_{s=1}^{d_{in}} \sum_{l_1=-r_1^{(m)}}^{r_1^{(m)}} \sum_{l_2=-r_2^{(m)}}^{r_2^{(m)}} w_m(l_1, l_2, s, p)\, y_m(j - l_1, k - l_2, s) \right)$$

  • The activation volume (3D tensor) of the convolutional layer is:

$$a_{jk}^{m}(p) = g_m\!\left( b_m(p) + \sum_{s=1}^{d_{in}} \mathbf{W}_m(s, p) * \mathbf{Y}_{jk}^{m}(s) \right)$$

$$\mathbf{A}_m = \{ a_{jk}^{m}(p) : j = 1, \ldots, n_m,\; k = 1, \ldots, m_m,\; p = 1, \ldots, d_{out} \}$$

where $\mathbf{A}_m$ is the activation volume for the convolutional layer $m$, $\mathbf{W}_m(s, p)$ is a 2D slice of the convolutional kernel $\mathbf{W}_m \in \mathbb{R}^{(2r_1+1) \times (2r_2+1) \times d_{in} \times d_{out}}$ for input feature $s$ and output feature $p$, $b_m(p)$ is a scalar bias and $\mathbf{Y}_{jk}^{m}(s)$ is a region of input feature $s$ centered at $(j, k)^T$; e.g., for RGB images, $\mathbf{Y}_{jk}^{1}(1)$ is the R channel of an image with $d_{in} = C = 3$.

SLIDE 21

Deep Learning Frameworks

Image Source: Heehoon Kim, Hyoungwook Nam, Wookeun Jung, and Jaejin Lee - Performance Analysis of CNN Frameworks for GPUs

SLIDE 22

Deep Learning Frameworks

  • All five frameworks use cuDNN as their backend.
  • cuDNN is unfortunately not open source.
  • cuDNN supports FFT-based and Winograd-based convolution.

Image Source: Heehoon Kim, Hyoungwook Nam, Wookeun Jung, and Jaejin Lee - Performance Analysis of CNN Frameworks for GPUs

SLIDE 23

The Neon story

  • Developed by Nervana in 2015.
  • Written in Python and C.
  • Does not support Windows.
  • Uses MKL for CPU (highly optimized by Intel).
  • Supports CUDA for GPU.
  • Known mostly for being the first to implement Winograd convolution faster than others.
SLIDE 24

Q & A

Thank you very much for your attention!

Contact: Prof. I. Pitas pitas@csd.auth.gr www.multidrone.eu