Neural Networks Part 3, Yingyu Liang (yliang@cs.wisc.edu), Computer Sciences Department, University of Wisconsin-Madison



SLIDE 1

Neural Networks

Part 3

Yingyu Liang (yliang@cs.wisc.edu), Computer Sciences Department, University of Wisconsin-Madison

SLIDE 2

Convolutional neural networks

  • Strong empirical application performance
  • Convolutional networks: neural networks that use convolution in place of general matrix multiplication in at least one of their layers, i.e., with a specific kind of weight matrix π‘Š:

β„Ž = 𝜎(π‘Šπ‘₯ + 𝑏)

SLIDE 3

Convolution

SLIDE 4

Convolution: discrete version

  • Given arrays 𝑣 and π‘₯, their convolution is a function 𝑠
  • Written as 𝑠 = 𝑣 βˆ— π‘₯, i.e., 𝑠_𝑑 = (𝑣 βˆ— π‘₯)_𝑑
  • When 𝑣_π‘Ž or π‘₯_{π‘‘βˆ’π‘Ž} is not defined, it is assumed to be 0

𝑠_𝑑 = βˆ‘_{π‘Ž=βˆ’βˆž}^{+∞} 𝑣_π‘Ž π‘₯_{π‘‘βˆ’π‘Ž}
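As a sketch, the sum above can be written directly in pure Python (the function name `conv1d` and the example values are illustrative, not from the slides):

```python
def conv1d(v, x):
    """Full discrete convolution: s[t] = sum over a of v[a] * x[t - a].
    Indices outside either array are treated as 0, as stated above."""
    n = len(v) + len(x) - 1
    s = [0] * n
    for t in range(n):
        for a in range(len(v)):
            if 0 <= t - a < len(x):
                s[t] += v[a] * x[t - a]
    return s

# Convolving with [0, 1] shifts the array by one position.
print(conv1d([1, 2, 3], [0, 1]))  # [0, 1, 2, 3]
```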

SLIDE 5

Illustration 1

𝑣 = [a, b, c, d, e, f], kernel π‘₯ = [z, y, x]

Flipped kernel (x_3, x_2, x_1) aligned over (v_2, v_3, v_4):

𝑠_3 = xb + yc + zd

SLIDE 6

Illustration 1

Flipped kernel now aligned over (v_3, v_4, v_5):

𝑠_4 = xc + yd + ze

SLIDE 7

Illustration 1

Flipped kernel now aligned over (v_4, v_5, v_6):

𝑠_5 = xd + ye + zf

SLIDE 8

Illustration 1: boundary case

Kernel partially off the edge (x_3, x_2 over v_5, v_6); missing entries count as 0:

𝑠_6 = xe + yf

SLIDE 9

Illustration 1 as matrix multiplication

The convolution equals multiplication by a banded (Toeplitz) matrix:

| y z 0 0 0 0 |   | a |
| x y z 0 0 0 |   | b |
| 0 x y z 0 0 | Γ— | c |
| 0 0 x y z 0 |   | d |
| 0 0 0 x y z |   | e |
| 0 0 0 0 x y |   | f |
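The banded matrix can also be built programmatically; a sketch for a length-3 kernel in pure Python (numeric values are hypothetical):

```python
def conv_same_matrix(kernel, n):
    """n x n banded matrix whose row i computes x*v[i-1] + y*v[i] + z*v[i+1]
    for a length-3 kernel (x, y, z); entries off the band are 0."""
    x, y, z = kernel
    M = [[0] * n for _ in range(n)]
    for i in range(n):
        if i > 0:
            M[i][i - 1] = x
        M[i][i] = y
        if i + 1 < n:
            M[i][i + 1] = z
    return M

def matvec(M, v):
    """Plain matrix-vector product."""
    return [sum(m * u for m, u in zip(row, v)) for row in M]

v = [1, 2, 3, 4, 5, 6]
M = conv_same_matrix([10, 20, 30], len(v))
print(matvec(M, v))  # [80, 140, 200, 260, 320, 170]
```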

SLIDE 10

Illustration 2: two dimensional case

Input (3Γ—4):      Kernel (2Γ—2):
a b c d           w x
e f g h           y z
i j k l

First output: wa + bx + ey + fz

SLIDE 11

Illustration 2

Input (3Γ—4):      Kernel (2Γ—2):
a b c d           w x
e f g h           y z
i j k l

Sliding one step right: wa + bx + ey + fz, then bw + cx + fy + gz

SLIDE 12

Illustration 2

Input (3Γ—4):      Kernel, or filter (2Γ—2):
a b c d           w x
e f g h           y z
i j k l

Feature map: wa + bx + ey + fz, bw + cx + fy + gz
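The sliding-window rule above can be sketched as a "valid" 2D convolution in pure Python (element-wise products summed without flipping the kernel, matching the slide's wa + bx + ey + fz; numeric values are illustrative):

```python
def conv2d_valid(inp, kernel):
    """Slide the kernel over every position where it fits entirely,
    summing element-wise products into the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(kernel[di][dj] * inp[i + di][j + dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(len(inp[0]) - kw + 1)]
            for i in range(len(inp) - kh + 1)]

inp = [[1, 2, 3, 4],      # a b c d
       [5, 6, 7, 8],      # e f g h
       [9, 10, 11, 12]]   # i j k l
kernel = [[1, 10],        # w x
          [100, 1000]]    # y z
fmap = conv2d_valid(inp, kernel)
# Top-left entry is w*a + x*b + y*e + z*f.
assert fmap[0][0] == 1*1 + 10*2 + 100*5 + 1000*6
```

A 3Γ—4 input with a 2Γ—2 kernel yields a 2Γ—3 feature map.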

SLIDE 13

Advantage: sparse interaction

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

Fully connected layer: 𝑛 Γ— π‘œ edges (𝑛 output nodes, π‘œ input nodes)

SLIDE 14

Advantage: sparse interaction

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

Convolutional layer: ≀ 𝑛 Γ— 𝑙 edges (𝑛 output nodes, π‘œ input nodes, kernel size 𝑙)
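A quick count with hypothetical sizes makes the saving concrete:

```python
# Hypothetical layer sizes: o input nodes, n output nodes, kernel size l.
o, n, l = 1000, 1000, 3

fully_connected_edges = n * o   # every output connects to every input
convolutional_edges = n * l     # each output sees at most l inputs
print(fully_connected_edges, convolutional_edges)  # 1000000 3000
```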

SLIDE 15

Advantage: sparse interaction

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

Multiple convolutional layers: larger receptive field

SLIDE 16

Advantage: parameter sharing

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

The same kernel is used repeatedly. E.g., the black edges all carry the same weight in the kernel.

SLIDE 17

Advantage: equivariant representations

  • Equivariant: transforming the input transforms the output in the same way
  • Example: the input is an image and the transformation is a shift
  • Convolution(shift(input)) = shift(Convolution(input))
  • Useful when we care only about whether a pattern exists, rather than where it is located

SLIDE 18

Pooling

  • Summarizing the input (e.g., outputting the max of the input)

Figure from Deep Learning, by Goodfellow, Bengio, and Courville
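A minimal max-pooling sketch (names and values are illustrative):

```python
def max_pool(v, size=2, stride=2):
    """Summarize each window of the input by its maximum."""
    return [max(v[i:i + size]) for i in range(0, len(v) - size + 1, stride)]

v = [1, 5, 2, 4, 3, 3]
print(max_pool(v))  # [5, 4, 3]

# Rearranging values inside a window leaves the summary unchanged,
# which is what lets pooling induce invariance.
assert max_pool([5, 1, 2, 4, 3, 3]) == max_pool(v)
```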

SLIDE 19

Advantage

Induce invariance

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

SLIDE 20

Motivation from neuroscience

  • David Hubel and Torsten Wiesel studied the early visual system (V1, the primary visual cortex) and won a Nobel Prize for this work
  • V1 properties:
    • 2D spatial arrangement
    • Simple cells: inspire convolution layers
    • Complex cells: inspire pooling layers
SLIDE 21

Variants of convolution and pooling

SLIDE 22

Variants of convolutional layers

  • Multi-dimensional convolution
  • Input and kernel can be 3D
  • E.g., images have (width, height, RGB channels)
  • Multiple kernels lead to multiple feature maps (also called channels)
  • A mini-batch of images is 4D: (image_id, width, height, RGB channels)
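Shape bookkeeping for this 4D layout (all sizes hypothetical; assumes a "valid" convolution, where the spatial size shrinks by kernel_size βˆ’ 1):

```python
batch, width, height, channels = 32, 28, 28, 3   # (image_id, width, height, RGB channels)
kernel_size, num_kernels = 5, 6                  # each 3D kernel spans all input channels

out_w = width - kernel_size + 1
out_h = height - kernel_size + 1
output_shape = (batch, out_w, out_h, num_kernels)  # one feature map per kernel
print(output_shape)  # (32, 24, 24, 6)
```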

SLIDE 23

Variants of convolutional layers

  • Padding: valid (output computed only where the kernel fully overlaps the input)

𝑣 = [a, b, c, d, e, f], π‘₯ = [z, y, x]; last output: xd + ye + zf

SLIDE 24

Variants of convolutional layers

  • Padding: same (input zero-padded so the output length equals the input length)

𝑣 = [a, b, c, d, e, f], π‘₯ = [z, y, x]; boundary output: xe + yf

SLIDE 25

Variants of convolutional layers

  • Stride

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

SLIDE 26

Variants of pooling

  • Stride and padding

Figure from Deep Learning, by Goodfellow, Bengio, and Courville

SLIDE 27

Variants of pooling

  • Max pooling: 𝑧 = max{𝑦_1, 𝑦_2, …, 𝑦_𝑙}
  • Average pooling: 𝑧 = mean{𝑦_1, 𝑦_2, …, 𝑦_𝑙}
  • Others, like max-out
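The two formulas differ only in the summary operation, which a sketch can take as a parameter (values are illustrative):

```python
def pool1d(v, size, stride, op):
    """Pooling with a configurable window size, stride, and summary operation."""
    return [op(v[i:i + size]) for i in range(0, len(v) - size + 1, stride)]

v = [1, 5, 2, 4, 3, 3]
print(pool1d(v, 2, 2, max))                        # max pooling: [5, 4, 3]
print(pool1d(v, 2, 2, lambda w: sum(w) / len(w)))  # average pooling: [3.0, 3.0, 3.0]
```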
SLIDE 28

Case study: LeNet-5

SLIDE 31

LeNet-5

  • Proposed in β€œGradient-based learning applied to document recognition”, by Yann LeCun, LΓ©on Bottou, Yoshua Bengio, and Patrick Haffner, Proceedings of the IEEE, 1998

  • Apply convolution on 2D images (MNIST) and use backpropagation
  • Structure: 2 convolutional layers (with pooling) + 3 fully connected layers
  • Input size: 32x32x1
  • Convolution kernel size: 5x5
  • Pooling: 2x2
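The sizes on the following slides can be traced layer by layer (a sketch; the helper names are illustrative):

```python
def conv_out(size, kernel=5, stride=1):
    """Output size of a 'valid' convolution."""
    return (size - kernel) // stride + 1

def pool_out(size, window=2, stride=2):
    """Output size of a pooling layer."""
    return (size - window) // stride + 1

s = 32              # input: 32x32x1
s = conv_out(s)     # C1: 28x28, 6 feature maps
s = pool_out(s)     # S2: 14x14
s = conv_out(s)     # C3: 10x10, 16 feature maps
s = pool_out(s)     # S4: 5x5
flat = s * s * 16   # 400 values feed the fully connected layers: 400 -> 120 -> 84 -> 10
print(flat)         # 400
```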
SLIDE 32

LeNet-5

Figure from Gradient-based learning applied to document recognition, by Y. LeCun, L. Bottou, Y. Bengio and P. Haffner


SLIDE 34

LeNet-5

Filter: 5x5, stride: 1x1, #filters: 6

SLIDE 35

LeNet-5

Pooling: 2x2, stride: 2

SLIDE 36

LeNet-5

Filter: 5x5x6, stride: 1x1, #filters: 16

SLIDE 37

LeNet-5

Pooling: 2x2, stride: 2

SLIDE 38

LeNet-5

Weight matrix: 400x120

SLIDE 39

LeNet-5

Weight matrices: 120x84, then 84x10