Convolutional Neural Networks
Presented by Tristan Maidment
Adapted from Ke Yu's slides

Outline
– Neural Network recap
– Building blocks of CNNs
– Architecture of CNNs
– Visualizing and understanding
Fully-connected (FC) layer
A network with input x, two hidden layers a[1] and a[2] (4 units each), and output ŷ:

a[1] = g(W[1] x + b[1]),    W[1] ~ (4, 3),  x ~ (3, m),     a[1] ~ (4, m)
a[2] = g(W[2] a[1] + b[2]), W[2] ~ (4, 4),  a[1] ~ (4, m),  a[2] ~ (4, m)
ŷ    = g(W[3] a[2] + b[3]), W[3] ~ (1, 4),  a[2] ~ (4, m),  ŷ ~ (1, m)
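As a concrete check of the shapes above, here is a minimal numpy sketch of the forward pass. The weights are random placeholders, and taking g to be ReLU is an assumption (the slide leaves the activation generic):

```python
import numpy as np

def g(z):
    return np.maximum(0, z)  # ReLU, standing in for a generic activation

m = 5                        # batch size: each column is one example
x = np.random.randn(3, m)    # input x ~ (3, m)

W1, b1 = np.random.randn(4, 3), np.random.randn(4, 1)
W2, b2 = np.random.randn(4, 4), np.random.randn(4, 1)
W3, b3 = np.random.randn(1, 4), np.random.randn(1, 1)

a1 = g(W1 @ x + b1)          # (4, m)
a2 = g(W2 @ a1 + b2)         # (4, m)
y_hat = g(W3 @ a2 + b3)      # (1, m)

print(a1.shape, a2.shape, y_hat.shape)
```

Each layer's output shape matches the annotations above: the bias broadcasts across the m columns.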
Activation functions:
– sigmoid: g(y) = 1 / (1 + e^(−y))
– tanh: g(y) = (e^y − e^(−y)) / (e^y + e^(−y))
– ReLU: max(0, y)
– leaky ReLU: max(0.1y, y)
Gradient descent update (learning rate α):
W[2] := W[2] − α d[W[2]]
b[2] := b[2] − α d[b[2]]
Forward propagation:
z[1] = W[1] x + b[1]
a[1] = g(z[1])
z[2] = W[2] a[1] + b[2]
ŷ = σ(z[2])
Loss: L(ŷ, y)

Backward propagation (binary cross-entropy):
L(ŷ, y) = −(y log ŷ + (1 − y) log(1 − ŷ))
d[ŷ] = ∂L/∂ŷ = −y/ŷ + (1 − y)/(1 − ŷ)
d[z[2]] = ŷ − y
d[W[2]] = d[z[2]] a[1]ᵀ
d[b[2]] = d[z[2]]
d[a[1]] = W[2]ᵀ d[z[2]]
d[z[1]] = W[2]ᵀ d[z[2]] ∘ g′(z[1])
d[W[1]] = d[z[1]] xᵀ
d[b[1]] = d[z[1]]
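The backpropagation steps above can be sketched in numpy for a two-layer network. The sizes and random weights are illustrative assumptions; the hidden activation is taken to be ReLU and the output sigmoid, matching the binary cross-entropy loss:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

np.random.seed(0)
m = 4                                      # batch size
x = np.random.randn(3, m)                  # inputs
y = np.random.randint(0, 2, (1, m))        # binary labels

W1, b1 = np.random.randn(4, 3) * 0.1, np.zeros((4, 1))
W2, b2 = np.random.randn(1, 4) * 0.1, np.zeros((1, 1))

# Forward pass
z1 = W1 @ x + b1
a1 = np.maximum(0, z1)                     # ReLU hidden layer
z2 = W2 @ a1 + b2
y_hat = sigmoid(z2)

# Backward pass: dz2 = y_hat - y follows from the cross-entropy loss
dz2 = y_hat - y
dW2 = dz2 @ a1.T / m
db2 = dz2.sum(axis=1, keepdims=True) / m
dz1 = (W2.T @ dz2) * (z1 > 0)              # (z1 > 0) is the ReLU derivative
dW1 = dz1 @ x.T / m
db1 = dz1.sum(axis=1, keepdims=True) / m

print(dW1.shape, dW2.shape)                # gradients match parameter shapes
```

Averaging over the batch (the 1/m factors) is the usual convention when the loss is a mean over examples.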
Optimizers beyond plain gradient descent:
– Momentum:
  v_dW = γ v_dW + (1 − γ) dW,  v_db = γ v_db + (1 − γ) db
  W := W − α v_dW,  b := b − α v_db
– RMSProp:
  s_dW = γ s_dW + (1 − γ) dW²
  W := W − α dW / √(s_dW)
– Adam (momentum + RMSProp, with bias correction):
  v_dW = γ₁ v_dW + (1 − γ₁) dW
  s_dW = γ₂ s_dW + (1 − γ₂) dW²
  v_dW^corr = v_dW / (1 − γ₁ᵗ),  s_dW^corr = s_dW / (1 − γ₂ᵗ)
  W := W − α v_dW^corr / (√(s_dW^corr) + ε)
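The Adam update can be sketched directly from these equations. This is a minimal illustration, not a production optimizer; the default hyperparameter values are the commonly used ones and are an assumption here:

```python
import numpy as np

def adam_step(W, dW, v, s, t, alpha=0.001, gamma1=0.9, gamma2=0.999, eps=1e-8):
    """One Adam update: momentum term v, RMSProp term s, bias correction by t."""
    v = gamma1 * v + (1 - gamma1) * dW          # momentum (first moment)
    s = gamma2 * s + (1 - gamma2) * dW ** 2     # RMSProp (second moment)
    v_corr = v / (1 - gamma1 ** t)              # bias correction
    s_corr = s / (1 - gamma2 ** t)
    W = W - alpha * v_corr / (np.sqrt(s_corr) + eps)
    return W, v, s

# Toy objective ||W||^2, whose gradient is 2W
W = np.array([1.0, -2.0])
v = np.zeros_like(W)
s = np.zeros_like(W)
for t in range(1, 4):
    dW = 2 * W
    W, v, s = adam_step(W, dW, v, s, t)
print(W)  # both entries move toward 0
```

Note how early steps have size close to α regardless of gradient magnitude, because the bias-corrected ratio v_corr/√s_corr is roughly the gradient's sign.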
– Adding L1 (Lasso), L2 (Ridge), or sometimes a combination of both (Elastic Net) to the cost function
– Other norms are computationally inefficient
– Forward: multiply the output of the hidden layer by a mask of 0s and 1s randomly drawn from a Bernoulli distribution, removing all links to the dropped-out nodes
– Backward: do gradient descent through the diminished network
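A minimal sketch of the dropout forward/backward masking described above. The 1/keep_prob rescaling ("inverted dropout") is an addition not stated on the slide; it keeps the expected activation unchanged so no scaling is needed at test time:

```python
import numpy as np

np.random.seed(1)
keep_prob = 0.8
a = np.random.randn(4, 5)                 # hidden-layer output

# Forward: Bernoulli mask of 0s and 1s; rescale to preserve expectations
mask = (np.random.rand(*a.shape) < keep_prob).astype(a.dtype)
a_dropped = a * mask / keep_prob

# Backward: the same mask gates the gradient through the diminished network
da = np.ones_like(a)                      # placeholder upstream gradient
da_dropped = da * mask / keep_prob

print(int(mask.sum()), "of", mask.size, "units kept")
```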
General form: s(u) = ∫ f(a) g(u − a) da
Denoted by: s(u) = (f ∗ g)(u)
Network terminology:
– f: input, usually a multidimensional array
– g: kernel or filter
– s: output, referred to as the feature map
The Fourier transform of a convolution is the pointwise product of the Fourier transforms of its inputs:
ℱ{y ∗ x} = ℱ{y} · ℱ{x}
y ∗ x = ℱ⁻¹{ℱ{y} · ℱ{x}}
The FFT computes this in O(n log n), and GPU implementations (via NVIDIA CUDA) greatly outperform CPU-only alternatives.
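The convolution theorem can be verified numerically: convolving via the FFT gives the same result as direct convolution. The signal sizes here are arbitrary assumptions:

```python
import numpy as np

np.random.seed(0)
x = np.random.randn(64)
y = np.random.randn(16)

# Direct convolution: O(n^2)
direct = np.convolve(x, y)

# Via the convolution theorem: transform, multiply pointwise, invert
n = len(x) + len(y) - 1          # length of the full linear convolution
fast = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(y, n), n)

print(np.allclose(direct, fast))  # True
```

Zero-padding both transforms to length n is what turns the FFT's circular convolution into the linear convolution computed by `np.convolve`.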
An example of 2D convolution without kernel flipping. Boxes connected by arrows indicate how the upper-left element of the output is formed by applying the kernel to the corresponding upper-left region of the input. This process is called template matching: the inner product between the kernel and a piece of the image is maximized exactly when those two vectors match up.
Identity
Edge detection 1
Edge detection 2
Box blur
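The kernels listed above can be applied with a naive 2D convolution without kernel flipping (i.e. cross-correlation, as in the slide's example). The 6×6 test image is an illustrative assumption:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D convolution without kernel flipping (cross-correlation)."""
    H, W = image.shape
    k = kernel.shape[0]
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Inner product of the kernel with one image patch
            out[i, j] = np.sum(image[i:i+k, j:j+k] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
identity = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=float)
box_blur = np.ones((3, 3)) / 9.0

print(conv2d(image, identity).shape)   # (4, 4): a 6x6 input, 3x3 kernel
```

The identity kernel simply copies each patch's center pixel, so its output equals the interior of the input image.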
every input unit.
input, CNNs typically have sparse interactions.
the memory requirements and improves statistical efficiency.
deeper layers may indirectly interact with a larger portion of the input.
complicated interactions from constructing simple building blocks that each describe only sparse interactions.
variables, while g3 is connected to all 5 input variables through indirect connections
the weight matrix is used exactly once when computing the output of a layer.
(except some of the boundary pixels).
learning a separate set of parameters for every location, we learn only one set.
model parameters. Thus convolution is dramatically more efficient than dense matrix multiplication in terms of memory requirements and statistical efficiency.
and subtracting the value of its neighboring
pixels wide. The output image is 319 pixels wide.
the same transformation with a matrix multiplication would need 320 × 280 × 319 × 280 > 8e9 weights
Input size: 320 by 280 Kernel size: 2 by 1 Output size: 319 by 280
translation.
input, its representation will move the same amount in the output.
The same edges appear everywhere in the image, so the same kernel can be used to extract features throughout.
Downsides of convolution
steps, we end up with a very small output.
from the edges of the image are thrown away.
Example: convolving a 6 by 6 input with a 3 by 3 kernel of ones produces a 4 by 4 output. With zero padding, the 6 by 6 input is extended to 8 by 8, and the same 3 by 3 convolution then produces a 6 by 6 output.
every layer
pixels, without zero padding, we are only able to have three convolutional layers
prevents the representation from shrinking with depth
want smaller output dimensions.
Example: convolving a 7 by 7 input with a 3 by 3 kernel at stride 2 produces a 3 by 3 output: (7 − 3)/2 + 1 = 3.
sampling strategy
involves downsampling is computationally wasteful.
Conv layer, 2D case:
Accepts an input of size W1 × H1
Hyperparameters:
– filter size: F × F
– amount of zero padding: P
– stride: S
Produces an output of size W2 × H2 where:
W2 = (W1 − F + 2P)/S + 1
H2 = (H1 − F + 2P)/S + 1

Conv layer, 3D case:
Accepts an input volume of size W1 × H1 × D1
Hyperparameters:
– filter size: F × F × D1
– amount of zero padding: P
– stride: S
– number of filters: K
Produces an output volume of size W2 × H2 × D2 where:
W2 = (W1 − F + 2P)/S + 1
H2 = (H1 − F + 2P)/S + 1
D2 = K
Number of parameters:
– Weights: F × F × D1 × K
– Bias: K
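The output-size and parameter-count formulas can be wrapped in a small helper. As a sketch, it is exercised on AlexNet's CONV1 (96 filters of 11×11 on a 227×227×3 input at stride 4, pad 0, per the AlexNet slide later in this deck):

```python
def conv_output_shape(W1, H1, D1, F, P, S, K):
    """Output volume and parameter count of a conv layer, per the formulas above."""
    W2 = (W1 - F + 2 * P) // S + 1
    H2 = (H1 - F + 2 * P) // S + 1
    D2 = K                              # one output channel per filter
    n_params = F * F * D1 * K + K       # weights plus one bias per filter
    return (W2, H2, D2), n_params

# AlexNet CONV1
shape, n_params = conv_output_shape(227, 227, 3, F=11, P=0, S=4, K=96)
print(shape, n_params)  # (55, 55, 96) 34944
```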
http://cs231n.github.io/assets/conv-demo/index.html
representation
layer
up the computation.
– Max Pooling (most popular)
– Average Pooling
– L2 norm of a rectangular neighborhood
Hyperparameters:
Common choices:
representation approximately invariant to small translations of the input.
property if we care more about whether some feature is present than exactly where it is
row of the lower network has changed, but only half of the values in the top pooling layer have changed, because the max pooling units are sensitive only to the maximum value in the neighborhood, not its exact location.
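A small numpy sketch of this invariance: after shifting the input one pixel, many pooled values survive because each max pooling unit only reports the maximum of its neighborhood. The 4×4 input values are illustrative assumptions:

```python
import numpy as np

def max_pool(x, f=2, s=2):
    """Max pooling with filter size f and stride s."""
    H = (x.shape[0] - f) // s + 1
    W = (x.shape[1] - f) // s + 1
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = x[i*s:i*s+f, j*s:j*s+f].max()
    return out

x = np.array([[1., 3., 2., 1.],
              [4., 6., 5., 2.],
              [0., 1., 2., 3.],
              [1., 0., 4., 1.]])
shifted = np.roll(x, 1, axis=1)   # shift every row one pixel to the right

print(max_pool(x))
print(max_pool(shifted))          # several pooled maxima survive the shift
```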
learn to be invariant to transformations of the input, such as rotation.
detect a handwritten 5 and each filter attempts to match a slightly different
a large activation regardless of which filter unit was activated.
with different spatial sizes (e.g. 1×1, 3×3, 5×5, etc.)
28×28×192 input → 1×1×192 conv, 16 filters → 28×28×16 output
Bottleneck layer
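The point of the bottleneck can be seen by counting multiplications. The slide shows the 1×1 conv reducing 192 channels to 16; the follow-up 5×5 conv with 32 filters used below is an assumed example (the standard GoogLeNet illustration), not stated on the slide:

```python
def mults(out_hw, out_d, f, in_d):
    """Multiplications in a conv layer: out_H * out_W * out_D * (F * F * in_D)."""
    return out_hw * out_hw * out_d * f * f * in_d

# Direct 5x5 conv: 28x28x192 -> 28x28x32
direct = mults(28, 32, 5, 192)

# Bottleneck: 1x1 conv to 16 channels (as on the slide), then 5x5 conv to 32
bottleneck = mults(28, 16, 1, 192) + mults(28, 32, 5, 16)

print(f"{direct:,} vs {bottleneck:,}")  # roughly a 10x reduction
```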
LeNet-5: 32×32×1 → 28×28×6 → 14×14×6 → 10×10×16 → 5×5×16 → FC 120 → FC 84 → output
CONV1: 5 × 5, s = 1, d = 1, k = 6; POOL1: f = 2, s = 2
CONV2: 5 × 5, s = 1, d = 6, k = 16; POOL2: f = 2, s = 2
Handwritten character recognition
conv conv FC FC
Common pattern
28×28 → 14×14 → 10×10 → 5×5), while the number of channels will increase (e.g. 1 → 6 → 16)
LeCun, Bottou, Bengio, Haffner, โGradient-Based Learning Applied to Document Recognitionโ, IEEE 1998
Layer              Activation shape   Activation size   # parameters
Input              (32, 32, 1)        1,024             0
CONV1 (f=5, s=1)   (28, 28, 6)        4,704             156
POOL1              (14, 14, 6)        1,176             0
CONV2 (f=5, s=1)   (10, 10, 16)       1,600             416
POOL2              (5, 5, 16)         400               0
FC3                (120, 1)           120               48,001
FC4                (84, 1)            84                10,081
Softmax            (10, 1)            10                841
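A quick sanity check of the table's activation sizes (each is the product of the activation shape) and the CONV1 parameter count; the remaining parameter counts are reproduced from the slide and not recomputed here:

```python
import numpy as np

# Activation size = product of the activation shape
shapes = {"Input": (32, 32, 1), "CONV1": (28, 28, 6), "POOL1": (14, 14, 6),
          "CONV2": (10, 10, 16), "POOL2": (5, 5, 16)}
sizes = {name: int(np.prod(shp)) for name, shp in shapes.items()}
print(sizes)  # 1024, 4704, 1176, 1600, 400

# CONV1 parameters: (5*5*1 weights + 1 bias) per filter, 6 filters
conv1_params = (5 * 5 * 1 + 1) * 6
print(conv1_params)  # 156
```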
ILSVRC top-5 classification error by year (chart):
2010 Lin et al: 28.2%
2011 Sanchez & Perronnin: 25.8%
2012 Krizhevsky et al. (AlexNet, 8 layers): 16.4%
2013 Zeiler & Fergus (8 layers): 11.7%
2014 Simonyan (VGG, 19 layers): 7.3%
2014 Szegedy et al. (GoogLeNet, 22 layers): 6.7%
2015 He et al. (ResNet, 152 layers): 3.6%
2016 Shao et al. (152 layers): 3.0%
2017 Hu et al. (SENet, 152 layers): 2.3%
Human error rate: 5.1%
classification and detection
algorithm produce a list of object categories present in the image. The quality of a labeling is evaluated based on the label that best matches the ground truth label for the image.
~1.2 million training images, 50,000 validation images and 150,000 testing images
1st CNN-based winner
– [227×227×3] INPUT
– [55×55×96] CONV1: 96 11×11 filters at stride 4, pad 0
– [27×27×96] MAX POOL1: 3×3 filters at stride 2
– [27×27×96] NORM1: Normalization layer
– [27×27×256] CONV2: 256 5×5 filters at stride 1, pad 2
– [13×13×256] MAX POOL2: 3×3 filters at stride 2
– [13×13×256] NORM2: Normalization layer
– [13×13×384] CONV3: 384 3×3 filters at stride 1, pad 1
– [13×13×384] CONV4: 384 3×3 filters at stride 1, pad 1
– [13×13×256] CONV5: 256 3×3 filters at stride 1, pad 1
– [6×6×256] MAX POOL3: 3×3 filters at stride 2
– [4096] FC6: 4096 neurons
– [4096] FC7: 4096 neurons
– [1000] FC8: 1000 neurons (class scores)
Krizhevsky, Sutskever, Hinton, โImageNet Classification with Deep Convolutional Neural Networksโ, NIPS 2012
Deeper Networks
– 3×3 CONV, stride 1
– 2×2 MAX POOLING, stride 2
receptive field as one 7×7 layer, with fewer parameters: 3×(3×3×C²) = 27C² versus 7×7×C² = 49C², for C channels per layer
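The parameter savings from stacking 3×3 layers can be checked with a line of arithmetic. The channel count C = 64 is an illustrative assumption, and biases are omitted:

```python
# Parameters for conv layers with C input and C output channels (no biases)
C = 64                              # example channel count (an assumption)
stack_3x3 = 3 * (3 * 3 * C * C)     # three stacked 3x3 layers: 27*C^2
single_7x7 = 7 * 7 * C * C          # one 7x7 layer, same receptive field: 49*C^2

print(stack_3x3, single_7x7)        # the 3x3 stack needs ~45% fewer weights
```

The stack also interposes two extra nonlinearities, which is part of VGG's argument for preferring it.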
Simonyan, Zisserman, โVery Deep Convolutional Networks For Large-Scale Image Recognitionโ, ICLR 2015
Revolution of Depth
โ 56-layer model performs worse on both training and test error โ It is not caused by overfitting
He, Zhang, Ren, and Sun, โDeep Residual Learning for Image Recognitionโ, CVPR 2015
Plain block:
a[l] → a[l+1] → a[l+2]
z[l+1] = W[l+1] a[l] + b[l+1],  a[l+1] = g(z[l+1])
z[l+2] = W[l+2] a[l+1] + b[l+2],  a[l+2] = g(z[l+2])

Residual block (shortcut from a[l] into the second activation):
a[l] → a[l+1] → a[l+2]
z[l+1] = W[l+1] a[l] + b[l+1],  a[l+1] = g(z[l+1])
z[l+2] = W[l+2] a[l+1] + b[l+2],  a[l+2] = g(z[l+2] + a[l])
If W[l+2] = 0 and b[l+2] = 0, then a[l+2] = g(a[l]) = a[l] (for ReLU, since a[l] ≥ 0).
If the added layers fail to learn any useful information (i.e. zero weights), the block reduces to the identity, so the extra layers don't hurt the network's overall performance.
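The identity behavior of a residual block can be demonstrated in a few lines of numpy. This is a sketch of the equations above, not the paper's implementation; the layer width is an arbitrary assumption:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def residual_block(a_l, W1, b1, W2, b2):
    """Two-layer residual block: the input a_l skips ahead and is added
    to z[l+2] before the final nonlinearity."""
    z1 = W1 @ a_l + b1
    a1 = relu(z1)
    z2 = W2 @ a1 + b2
    return relu(z2 + a_l)              # skip connection

n = 4
a_l = np.abs(np.random.randn(n, 1))    # ReLU outputs are non-negative

# With zero weights and biases the block reduces to the identity
zeros_W = np.zeros((n, n))
zeros_b = np.zeros((n, 1))
out = residual_block(a_l, zeros_W, zeros_b, zeros_W, zeros_b)
print(np.allclose(out, a_l))           # True
```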
He, Zhang, Ren, and Sun, โDeep Residual Learning for Image Recognitionโ, CVPR 2015
https://arxiv.org/pdf/1712.09913.pdf
– Multi-scale ensembling of Inception, Inception-ResNet, ResNet, and Wide ResNet models
– ILSVRC'16 classification winner
– Add a "feature recalibration" module that learns to adaptively reweight feature maps
– Global average pooling layer + 2 FC layers used to determine feature map weights
– ILSVRC'17 classification winner
convolutional network from scratch.
related new task.
– When the new dataset is small and similar to the original dataset
the new dataset.
– When the new dataset is large and similar to the original dataset
some of the earlier layers fixed and only fine-tune some deeper portion of the network
– When the new dataset is large and different from the original dataset
fine-tune through the entire network.
– Horizontal flips
– Random crops/scales
– Translation
– Color jitter
– Rotation
– etc.
Horizontal flips Crops/scales Jitter contrast
looking for?
What's going on inside a CNN?
96 convolutional kernels of size 11 by 11 by 3 learned by the first convolutional layer on the 224 by 224 by 3 input images. Why visualize the weights of the first layer? Template matching: the inner product between a kernel and a piece of the image is maximized exactly when those two vectors match up.
2nd-layer convolutional filters are not very interpretable: they are connected to the nonlinear output of the first layer, so visualizing them shows which pattern of first-layer activations would cause a second-layer filter to be maximally activated. 3rd and deeper convolutional filters become more and more difficult to interpret directly.
Nearest Neighbors in Pixel Space Nearest Neighbors in Feature Space
channel 20 in AlexNet
AlexNet and record values of the chosen channel
with maximal activations
and look at larger objects
Each row is a different channel
Springenberg et al. โStriving for Simplicity: The All Convolutional Netโ, ICLR Workshop 2015
Deeper Shallower
channel 20 in AlexNet
respect to image pixels
Zeiler and Fergus, โVisualizing and Understanding Convolutional Networksโ, ECCV 2014
fixed image and tries to find which part of the image, or which set of pixels, influences the output of the selected neuron
input in general would cause this neuron to activate
– Fix the weights of the trained network
– Synthesize an image by performing gradient ascent
– Maximize the score of a given class or an intermediate neuron
Simonyan, Vedaldi, and Zisserman, โDeep Inside Convolutional Networks: Visualizing Image Classification Models and Saliency Mapsโ, ICLR Workshop 2014
– Trained a deep 34-layer CNN which maps a sequence of ECG samples to a sequence of rhythm classes
– Optimized with residual blocks
– Achieved cardiologist-level accuracy
One dimensional filters looking at local patterns
attendance)
matching)
proteins)