  1. Fast Convolution Algorithms for deep learning and computer vision. Sample slides only. Presenter: Prof. Ioannis Pitas, Aristotle University of Thessaloniki, pitas@csd.auth.gr

  2. Outline • 1D convolutions: linear & cyclic 1D convolutions, Discrete Fourier Transform, Fast Fourier Transform, Winograd algorithm • Linear & cyclic 2D convolutions • Applications in deep learning: convolutional neural networks

  3. Motivation • Fast implementation of 1D and 2D digital filters: image filtering, image feature calculation (e.g., Gabor filters) • Fast implementation of 1D and 2D correlation: template matching, correlation tracking • Machine learning: convolutional neural networks

  4. Linear 1D convolution • The one-dimensional (linear) convolution of an input signal x and a convolution kernel h (filter finite impulse response) of length N is: y(k) = h(k) * x(k) = Σ_{i=0}^{N−1} h(i) x(k − i) • For a convolution kernel centered around 0, with N = 2v + 1, it takes the form: y(k) = h(k) * x(k) = Σ_{i=−v}^{v} h(i) x(k − i)
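As a concrete illustration, the defining sum can be sketched in plain Python (a minimal direct implementation; function and variable names are mine, not from the slides):

```python
def linear_convolve(x, h):
    """Direct 1D linear convolution: y(k) = sum_i h(i) x(k - i).

    Output length is len(x) + len(h) - 1, treating samples
    outside the signal support as zero.
    """
    L, N = len(x), len(h)
    y = [0.0] * (L + N - 1)
    for k in range(L + N - 1):
        for i in range(N):
            if 0 <= k - i < L:          # zero outside the signal support
                y[k] += h[i] * x[k - i]
    return y

print(linear_convolve([1, 2, 3], [1, 1]))  # [1.0, 3.0, 5.0, 3.0]
```

The double loop costs O(LN) operations, which is exactly what the fast algorithms in the following slides reduce.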

  5. Linear 1D convolution - Example Image source: http://electricalacademia.com/signals-and-systems/example-of-discrete-time-graphical-convolution/

  6. Linear 1D convolution - Example Image source: http://electricalacademia.com/signals-and-systems/example-of-discrete-time-graphical-convolution/

  7. Linear 1D correlation • Correlation of a template h and an input signal x(k): r(k) = Σ_{i=0}^{N−1} h(i) x(k + i) • The input signal is not flipped. • It is used for template matching and for object tracking in video. • It is often confused with convolution: they are identical only if h is centered at, and symmetric about, i = 0.
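Correlation-based template matching can be sketched as follows (the toy signal and template are hypothetical examples of mine; the peak of r marks the best match):

```python
def correlate(h, x):
    """1D correlation r(k) = sum_i h(i) x(k + i); the template h is not flipped."""
    N, L = len(h), len(x)
    return [sum(h[i] * x[k + i] for i in range(N)) for k in range(L - N + 1)]

# Template matching: the template occurs at position 1 of the signal.
x = [0, 1, 3, 1, 0, 0, 2, 0, 0]
h = [1, 3, 1]
r = correlate(h, x)
print(r)                 # [6, 11, 6, 1, 2, 6, 2]
print(r.index(max(r)))   # 1  <- location of the best match
```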

  8. Cyclic 1D convolution • One-dimensional cyclic convolution of length N, where (k)_N = k mod N: y(k) = x(k) ⊛ h(k) = Σ_{i=0}^{N−1} h(i) x((k − i)_N) • A linear convolution of a length-L signal with a length-M kernel can be embedded in a cyclic convolution by zero-padding both sequences to a length N ≥ L + M − 1 and then performing the cyclic convolution of length N: y(k) = x_N(k) ⊛ h_N(k) = Σ_{i=0}^{N−1} h_N(i) x_N((k − i)_N)
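The embedding can be sketched directly: zero-pad both sequences to a common length N, then wrap indices modulo N (a minimal sketch; names are mine):

```python
def cyclic_convolve(x, h, N):
    """Cyclic convolution of length N: y(k) = sum_i h(i) x((k - i) mod N).
    Both sequences are zero-padded to length N first."""
    xp = list(x) + [0] * (N - len(x))
    hp = list(h) + [0] * (N - len(h))
    return [sum(hp[i] * xp[(k - i) % N] for i in range(N)) for k in range(N)]

x, h = [1, 2, 3], [1, 1]
# With N >= len(x) + len(h) - 1 the wrap-around does not alias, so the
# cyclic result equals the linear convolution.
print(cyclic_convolve(x, h, 4))  # [1, 3, 5, 3]
# With N too small, the tail wraps onto the head (aliasing).
print(cyclic_convolve(x, h, 3))  # [4, 3, 5]
```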

  9. Cyclic convolution via DFT • Cyclic convolution can also be calculated using the 1D DFT: y = IDFT(DFT(x) ⊙ DFT(h)), where ⊙ denotes element-wise multiplication.
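Computing a cyclic convolution through the DFT can be sketched with NumPy (a sketch under the assumption that both sequences already share the same length N; names are mine):

```python
import numpy as np

def cyclic_convolve_dft(x, h):
    """Cyclic convolution via the DFT: y = IDFT(DFT(x) * DFT(h)) element-wise.
    x and h must already have the same length N."""
    X = np.fft.fft(x)
    H = np.fft.fft(h)
    return np.real(np.fft.ifft(X * H))   # imaginary part is round-off noise

x = np.array([1.0, 2.0, 3.0, 0.0])       # zero-padded to N = 4
h = np.array([1.0, 1.0, 0.0, 0.0])
print(cyclic_convolve_dft(x, h))          # ~[1. 3. 5. 3.]
```

With an FFT for the transforms, this route costs O(N log N) instead of O(N^2) for the direct sum.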

  10. 1D FFT • There are several algorithms that speed up the calculation of the DFT. • The best known is the radix-2 decimation-in-time (DIT) Fast Fourier Transform (FFT) (Cooley-Tukey). • The DFT of a sequence x(n) of length N is: X(k) = Σ_{n=0}^{N−1} x(n) e^{−2πi·nk/N}, where k is an integer ranging from 0 to N − 1.

  11. 1D FFT • The radix-2 FFT recursively breaks a length-N DFT down into size-2 DFTs called "butterfly" operations. • There are log2(N) stages.
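A compact recursive sketch of the radix-2 DIT FFT (assuming the length is a power of two; function name and structure are my choice):

```python
import cmath

def fft(x):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    even = fft(x[0::2])                  # DFT of even-indexed samples
    odd = fft(x[1::2])                   # DFT of odd-indexed samples
    X = [0j] * N
    for k in range(N // 2):              # butterfly: combine the two halves
        t = cmath.exp(-2j * cmath.pi * k / N) * odd[k]
        X[k] = even[k] + t
        X[k + N // 2] = even[k] - t
    return X

print(fft([1, 1, 1, 1]))  # [(4+0j), 0j, 0j, 0j]
```

Each of the log2(N) recursion levels does O(N) butterfly work, giving the familiar O(N log N) total.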

  12. Z-transform • The Z-transform of a signal (function) x(n) with domain [0, …, N − 1] is given by: X(z) = Σ_{n=0}^{N−1} x(n) z^{−n} • The domain of the Z-transform is the complex plane, since z is a complex number. • The following convolution property holds for the Z-transform: y(n) = x(n) * h(n) ⇔ Y(z) = X(z)H(z)
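Since finite signals map to polynomials in z^{-1}, multiplying the coefficient sequences is the same computation as linear convolution, which is exactly the convolution property. A sketch (names mine):

```python
def poly_mul(a, b):
    """Coefficient sequence of the product X(z)H(z), where a[n] is the
    coefficient of z^-n. Collecting terms z^-(n+m) is exactly
    linear convolution of the coefficient sequences."""
    c = [0] * (len(a) + len(b) - 1)
    for n, an in enumerate(a):
        for m, bm in enumerate(b):
            c[n + m] += an * bm
    return c

print(poly_mul([1, 2, 3], [1, 1]))  # [1, 3, 5, 3] - same as h * x
```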

  13. Cyclic convolution and Z-transform • Cyclic convolution corresponds to a polynomial product taken modulo z^N − 1: y(k) = x(k) ⊛ h(k) ⇔ Y(z) = X(z)H(z) mod (z^N − 1), where (k)_N = k mod N.
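Reducing the product polynomial modulo z^N − 1 uses z^N ≡ 1, so every coefficient of z^-(N+m) folds back onto z^-m; this wrap-around is precisely cyclic convolution. A sketch (names and example data are mine):

```python
def poly_mod_zN_minus_1(c, N):
    """Reduce polynomial coefficients modulo z^N - 1: since z^N = 1,
    the coefficient of z^-(N+m) folds back onto z^-m (cyclic wrap-around)."""
    out = [0] * N
    for n, cn in enumerate(c):
        out[n % N] += cn
    return out

# Linear convolution coefficients of x = [1, 2, 3], h = [1, 1] are [1, 3, 5, 3];
# reducing mod z^3 - 1 gives the length-3 cyclic convolution of x and h.
print(poly_mod_zN_minus_1([1, 3, 5, 3], 3))  # [4, 3, 5]
```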

  14. Winograd algorithm • Fast 1D cyclic convolution with minimal complexity. • The Winograd algorithm works on small tiles of the input image. • The input tile and the filter are transformed. • The outputs of the transforms are multiplied together in an element-wise fashion. • The result is transformed back to obtain the outputs of the convolution.

  15. Winograd algorithm • Fast 1D cyclic convolution with minimal complexity. • Winograd convolution algorithms, or fast filtering algorithms, have the form: y = C[(Ax) ⊙ (Bh)] • They require only 2N − ν multiplications in their middle element-wise vector product, thus having minimal multiplicative complexity. • ν: number of cyclotomic polynomial factors of the polynomial z^N − 1 over the rational numbers Q. • GEneral Matrix Multiplication (GEMM) BLAS or cuBLAS routines can be used.
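The transform-multiply-transform-back pattern from the previous slide can be made concrete with the widely used F(2,3) case (two outputs of a 3-tap filter from a 4-sample tile, using 4 multiplications instead of 6). The specific transform matrices below are one standard choice (the Lavin-Gray formulation common in CNN libraries), not taken from the slides:

```python
import numpy as np

# F(2,3): 2 outputs of a 3-tap FIR filter from a 4-sample tile, 4 multiplies.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)   # input-tile transform
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])                # filter transform
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)    # inverse (output) transform

def winograd_f23(d, g):
    """y = AT @ ((G @ g) * (BT @ d)): element-wise product of the
    transformed 4-sample input tile d and the 3-tap filter g."""
    return AT @ ((G @ g) * (BT @ d))

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 2.0, 1.0])
# Direct result (correlation form, as used in CNNs):
# y0 = g.d[0:3] = 8, y1 = g.d[1:4] = 12
print(winograd_f23(d, g))  # [ 8. 12.]
```

Note the element-wise product has only 4 entries; this is where the 2N − ν multiplication count of the slide shows up in practice.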

  16. Linear and cyclic 2D convolutions • Two-dimensional linear convolution with a convolution kernel h of size N1 × N2 is given by: y(k1, k2) = h(k1, k2) ** x(k1, k2) = Σ_{i1=0}^{N1−1} Σ_{i2=0}^{N2−1} h(i1, i2) x(k1 − i1, k2 − i2) • Its two-dimensional cyclic convolution counterpart of support N1 × N2 is defined as: y(k1, k2) = h(k1, k2) ⊛⊛ x(k1, k2) = Σ_{i1=0}^{N1−1} Σ_{i2=0}^{N2−1} h(i1, i2) x((k1 − i1)_{N1}, (k2 − i2)_{N2})
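The 2D linear convolution sum extends the 1D sketch with a second index pair (a minimal direct implementation with hypothetical toy data; names mine):

```python
def convolve2d(x, h):
    """Direct 2D linear convolution y = h ** x ("full" output,
    zero outside the image support)."""
    M1, M2 = len(x), len(x[0])
    N1, N2 = len(h), len(h[0])
    y = [[0.0] * (M2 + N2 - 1) for _ in range(M1 + N1 - 1)]
    for k1 in range(M1 + N1 - 1):
        for k2 in range(M2 + N2 - 1):
            for i1 in range(N1):
                for i2 in range(N2):
                    if 0 <= k1 - i1 < M1 and 0 <= k2 - i2 < M2:
                        y[k1][k2] += h[i1][i2] * x[k1 - i1][k2 - i2]
    return y

x = [[1, 2], [3, 4]]
h = [[1, 0], [0, 1]]   # identity plus a one-pixel diagonal shift
print(convolve2d(x, h))  # [[1.0, 2.0, 0.0], [3.0, 5.0, 2.0], [0.0, 3.0, 4.0]]
```

The quadruple loop costs O(M1·M2·N1·N2), which is why fast FFT- and Winograd-based variants matter for images.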

  17. 2D Convolution - Example • With Padding

  18. Applications • Convolutional neural networks • Signal processing: signal filtering, signal restoration, signal deconvolution • Signal analysis: time delay estimation, distance calculation (e.g., sonar), 1D template matching
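One of the listed applications, time delay estimation, can be sketched as picking the lag that maximizes the cross-correlation between a reference signal and its delayed echo (the toy signals below are hypothetical):

```python
def estimate_delay(x, y):
    """Return the lag d maximizing the cross-correlation
    r(d) = sum_k x(k) y(k + d)."""
    best_d, best_r = 0, float("-inf")
    for d in range(len(y) - len(x) + 1):
        r = sum(x[k] * y[k + d] for k in range(len(x)))
        if r > best_r:
            best_d, best_r = d, r
    return best_d

x = [1, 4, 2, 1]                     # reference pulse
y = [0, 0, 0, 1, 4, 2, 1, 0]         # echo delayed by 3 samples
print(estimate_delay(x, y))          # 3
```

In a sonar-style setup, the estimated lag times the sampling period and the propagation speed gives the round-trip distance.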

  19. Convolutional Neural Networks • Convergence of machine learning and signal processing. • Two-step architecture: first layers with sparse NN connections (convolutions), fully connected final layers. • Need for fast convolution calculations.

  20. Convolutional layer (for RGB images) • For a convolutional layer l with an activation function f_l(·), multiple incoming features d_in and one single output feature p, the multiple-input-features to single-output-feature p transformation is:
  y_l(i, j, p) = f_l( b_l(p) + Σ_{r=1}^{d_in} Σ_{k1=−q1}^{q1} Σ_{k2=−q2}^{q2} w_l(k1, k2, r, p) x_l(i − k1, j − k2, r) )
  • The activation volume (3D tensor) A_l = [a_ij^l(p)] of the convolutional layer l is given by:
  a_ij^l(p) = f_l( b_l(p) + Σ_{r=1}^{d_in} W_l(r, p) * X_ij^l(r) ), i = 1, …, n_l, j = 1, …, m_l, p = 1, …, d_out,
  where A_l is the activation volume of the convolutional layer l, W_l(r, p) is a 2D slice of the convolution kernel W^(l) ∈ ℝ^{h1×h2×d_in×d_out} for input feature r and output feature p, b_l(p) is a scalar bias and X_ij^l(r) is a region of input feature r centered at (i, j)^T, e.g., X_ij^1(1) is the R channel of an RGB image (d_in = C = 3).
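The layer equation above can be sketched as a sum of per-channel 2D correlations plus a bias, followed by the activation (a minimal NumPy sketch with valid padding and stride 1; the ReLU default and all names are my choices, not from the slides; note that, as in CNN practice, the kernel is not flipped):

```python
import numpy as np

def conv_layer(x, w, b, f=lambda z: np.maximum(z, 0.0)):
    """Single convolutional layer (valid padding, stride 1):
    y[i, j, p] = f(b[p] + sum_r sum_k1 sum_k2 w[k1, k2, r, p] x[i+k1, j+k2, r])

    x: input volume  (H, W, d_in)      e.g. an RGB image with d_in = 3
    w: kernel tensor (h1, h2, d_in, d_out)
    b: bias vector   (d_out,)
    f: activation function (ReLU by default)
    """
    H, W, d_in = x.shape
    h1, h2, _, d_out = w.shape
    out = np.zeros((H - h1 + 1, W - h2 + 1, d_out))
    for p in range(d_out):                        # one output feature map per p
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                region = x[i:i + h1, j:j + h2, :]  # X_ij: input region at (i, j)
                out[i, j, p] = np.sum(region * w[:, :, :, p]) + b[p]
    return f(out)

x = np.ones((4, 4, 3))        # toy 4x4 RGB "image"
w = np.ones((3, 3, 3, 2))     # 3x3 kernels, 3 input channels, 2 output features
b = np.zeros(2)
print(conv_layer(x, w, b).shape)  # (2, 2, 2)
```

Each output position is an inner product between a kernel slice and an input region; production frameworks lower exactly this computation to GEMM, FFT, or Winograd routines, as the following slides discuss.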

  21. Deep Learning Frameworks • Image source: Heehoon Kim, Hyoungwook Nam, Wookeun Jung, and Jaejin Lee - Performance Analysis of CNN Frameworks for GPUs

  22. Deep Learning Frameworks • All five frameworks use cuDNN as a backend. • cuDNN is unfortunately not open source. • cuDNN supports FFT- and Winograd-based convolutions. • Image source: Heehoon Kim, Hyoungwook Nam, Wookeun Jung, and Jaejin Lee - Performance Analysis of CNN Frameworks for GPUs

  23. The Neon story • Developed by Nervana in 2015. • Written in Python and C. • Does not support Windows. • Uses MKL for the CPU (highly optimized by Intel). • Supports CUDA for GPUs. • Best known as the first framework to implement Winograd convolution faster than the others.

  24. Q & A Thank you very much for your attention! Contact: Prof. I. Pitas pitas@csd.auth.gr www.multidrone.eu
