SLIDE 1

Lecture 8: Convolutional Neural Networks 1
CS109B Data Science 2
Pavlos Protopapas and Mark Glickman

SLIDE 2

Outline


SLIDE 3

Main drawbacks of MLPs

  • MLPs use one perceptron for each input (e.g., one per pixel in an image, multiplied by 3 in the RGB case). The number of weights rapidly becomes unmanageable for large images.
  • Training difficulties arise, and overfitting can appear.
  • MLPs react differently to an input (image) and its shifted version: they are not translation invariant.

SLIDE 4

Latest advances in Image Recognition

You Only Look Once (YOLO) - 2016


SLIDE 5

Latest advances in Image Recognition

Mask R-CNN - 2017


SLIDE 6

Latest advances in Image Recognition

NVIDIA Video to Video Synthesis - 2018


SLIDE 7

Image analysis

Imagine that we want to recognize swans in an image:

  • Round, elongated oval with orange protuberance
  • Long white rectangular shape (neck)
  • Oval-shaped white blob (body)

SLIDE 8

Cases can be a bit more complex…

  • Round, elongated head with orange or black beak
  • Long white neck, square shape
  • Oval-shaped white body with or without large white symmetric blobs (wings)

SLIDE 9

Now what?

  • Round, elongated head with orange or black beak, can be turned backwards
  • Long white neck, can bend around, not necessarily straight
  • White tail, generally far from the head, looks feathery
  • White, oval-shaped body, with or without wings visible
  • Black feet, under body, can have different shapes
  • Small black circles, can be facing the camera, sometimes can see both
  • Black triangular shaped form, on the head, can have different sizes
  • White elongated piece, can be squared or more triangular, can be obstructed sometimes

Luckily, the color is consistent…

SLIDE 10


SLIDE 11

We need to be able to deal with these cases.


SLIDE 12

Image features

  • We’ve basically been talking about detecting features in images, in a very naïve way.
  • Researchers built multiple computer vision techniques to deal with these issues: SIFT, FAST, SURF, BRIEF, etc.
  • However, similar problems arose: the detectors were either too general or too over-engineered. Humans were designing these feature detectors, and that made them either too simple or hard to generalize.

[Figures: FAST corner detection algorithm; SIFT feature descriptor]

SLIDE 13

Image features (cont)

  • What if we learned the features to detect?
  • We need a system that can do Representation Learning (or Feature Learning).
  • Representation Learning: a technique that allows a system to automatically find the relevant features for a given task. It replaces manual feature engineering.
  • Multiple techniques exist for this:
    • Unsupervised (K-means, PCA, …)
    • Supervised (supervised dictionary learning, Neural Networks!)

SLIDE 14

Drawbacks

Imagine we want to build a cat detector with an MLP. If the cat appears in one part of the image, one set of weights (red in the figure) will be modified to better recognize cats; if it appears elsewhere, a different set (green) will be modified. We are learning redundant features, and the approach is not robust, as cats could appear in yet another position.

SLIDE 15

Drawbacks

Example: CIFAR-10. Simple 32x32 color images (3 channels). Each pixel is a feature: an MLP would have 32x32x3 + 1 = 3,073 weights per neuron!


SLIDE 16

Drawbacks

Example: ImageNet Images are usually 224x224x3: an MLP would have 150129 weights per neuron. If the first layer of the MLP is around 128 nodes, which is small, this already becomes very heavy to calculate. Model complexity is extremely high:

  • verfitting.

16

SLIDE 17

Images are Local and Hierarchical

SLIDE 18

Images are Invariant

SLIDE 19

“Convolution” Operation

SLIDE 20

“Convolution” Operation

Edge detection:

$$\begin{bmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{bmatrix} * \text{image}$$

Sharpen:

$$\begin{bmatrix} 0 & -1 & 0 \\ -1 & 5 & -1 \\ 0 & -1 & 0 \end{bmatrix} * \text{image}$$

The 3x3 matrix is the kernel. (Kernels from wikipedia.org)
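To make the operation concrete, here is a minimal sketch (assuming NumPy and SciPy are available; the 5x5 toy image is made up for illustration) that applies the edge-detection kernel above via 2D convolution:

```python
import numpy as np
from scipy.signal import convolve2d

# Edge-detection kernel from the slide
kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]])

# A toy 5x5 grayscale "image": a bright square on a dark background
image = np.zeros((5, 5))
image[1:4, 1:4] = 1.0

# 'valid' keeps only positions where the kernel fully overlaps the image
feature_map = convolve2d(image, kernel, mode='valid')
print(feature_map)  # large values mark the edges of the bright square
```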

SLIDE 21

A Convolutional Network

[Figure: alternating convolution + ReLU stages]

SLIDE 22

Basics of CNNs

We know that MLPs:

  • Do not scale well for images
  • Ignore the information brought by pixel position and correlation with neighbors
  • Cannot handle translations

The general idea of CNNs is to intelligently adapt to the properties of images:

  • Pixel position and neighborhood have semantic meaning.
  • Elements of interest can appear anywhere in the image.

SLIDE 23

Basics of CNNs

[Figure: MLP vs. CNN connectivity]

CNNs are also composed of layers, but those layers are not fully connected: they have filters, sets of cube-shaped weights that are applied throughout the image. Each 2D slice of a filter is called a kernel. These filters introduce translation invariance and parameter sharing. How are they applied? Convolutions!

SLIDE 24

Convolution and cross-correlation

  • A convolution of g and h (written g ∗ h) is defined as the integral of the product, having one of the functions inverted and shifted:

$$(g * h)(u) = \int_{-\infty}^{\infty} g(b)\, h(u - b)\, db$$

  • Discrete convolution:

$$(g * h)(u) = \sum_{b=-\infty}^{\infty} g(b)\, h(u - b)$$

  • Discrete cross-correlation:

$$(g \star h)(u) = \sum_{b=-\infty}^{\infty} g(b)\, h(u + b)$$

Note: in the convolution, the second function is inverted and shifted; cross-correlation only shifts it.
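A quick numeric check of the two discrete formulas, as a sketch using NumPy (np.convolve flips one sequence before sliding; np.correlate does not):

```python
import numpy as np

g = np.array([1, 2, 3])
h = np.array([0, 1, 0.5])

# Convolution: h is flipped before sliding over g
conv = np.convolve(g, h)              # [0. , 1. , 2.5, 4. , 1.5]

# Cross-correlation: h slides over g without flipping
corr = np.correlate(g, h, mode='full')

print(conv)
print(corr)  # differs from conv unless h is symmetric
```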

SLIDE 25

Convolutions – step by step

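The step-by-step operation shown on this slide can also be written out directly. A minimal sketch (the 3x3 input and 2x2 kernel here are made-up examples):

```python
import numpy as np

image = np.array([[1, 2, 0],
                  [0, 1, 3],
                  [4, 0, 1]])
kernel = np.array([[1, 0],
                   [0, 1]])   # a made-up 2x2 kernel

out_h = image.shape[0] - kernel.shape[0] + 1  # 2
out_w = image.shape[1] - kernel.shape[1] + 1  # 2
feature_map = np.zeros((out_h, out_w))

# Slide the kernel over every valid position and take the elementwise
# product-sum (this is cross-correlation, which is what deep learning
# frameworks actually implement under the name "convolution")
for i in range(out_h):
    for j in range(out_w):
        patch = image[i:i + kernel.shape[0], j:j + kernel.shape[1]]
        feature_map[i, j] = np.sum(patch * kernel)

print(feature_map)  # [[2. 5.]
                    #  [0. 2.]]
```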

SLIDE 26

Convolutions – another example


SLIDE 27

Convolutions – 3D input


SLIDE 28

Convolutions – what happens at the edges?

If we apply convolutions on a normal image, the result will be down-sampled by an amount depending on the size of the filter. We can avoid this by padding the edges in different ways.


SLIDE 29

Padding

Full padding: introduces zeros such that all pixels are visited the same number of times by the filter. Increases the size of the output.

Same padding: ensures that the output has the same size as the input.
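To see the effect on output size, a minimal Keras sketch (assuming TensorFlow/Keras) contrasting 'valid' (no padding) with 'same' padding:

```python
import tensorflow as tf

x = tf.random.normal((1, 32, 32, 3))  # one 32x32 RGB image

# No padding: a 3x3 filter shrinks each spatial dimension by 2
valid = tf.keras.layers.Conv2D(8, 3, padding='valid')(x)
print(valid.shape)  # (1, 30, 30, 8)

# Same padding: zeros added at the edges, spatial size preserved
same = tf.keras.layers.Conv2D(8, 3, padding='same')(x)
print(same.shape)   # (1, 32, 32, 8)
```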

SLIDE 30

Convolutional layers

A convolutional layer with four 3x3 filters on a black and white image (just one channel), versus a convolutional layer with four 3x3 filters on an RGB image. As you can see, in the RGB case the filters are now cubes, and they are applied on the full depth of the image.

SLIDE 31

Convolutional layers (cont)

  • To be clear: each filter is convolved with the entirety of the 3D input cube, but generates a 2D feature map.
  • Because we have multiple filters, we end up with a 3D output: one 2D feature map per filter.
  • The feature-map dimension can change drastically from one conv layer to the next: we can enter a layer with a 32x32x16 input and exit with a 32x32x128 output if that layer has 128 filters.

SLIDE 32

Why does this make sense?

An image is just a matrix of pixels. Convolving the image with a filter produces a feature map that highlights the presence of a given feature in the image.

SLIDE 33


SLIDE 34

Learning CNN

In a convolutional layer, we are basically applying multiple filters over the image to extract different features. But most importantly, we are learning those filters! One thing we’re missing: non-linearity.

SLIDE 35

Introducing ReLU

The most successful non-linearity for CNNs is the Rectified Linear Unit (ReLU). It combats the vanishing gradient problem that occurs with sigmoids, is easier to compute, and generates sparsity (not always beneficial).
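In formula form (the gradient being exactly 1 for positive inputs is what counters the vanishing gradient of saturating sigmoids):

$$\mathrm{ReLU}(x) = \max(0, x), \qquad \mathrm{ReLU}'(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x < 0 \end{cases}$$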


SLIDE 36

Convolutional layers so far

  • A convolutional layer convolves each of its filters with the input.
  • Input: a 3D tensor, where the dimensions are Width, Height and Channels (or Feature Maps).
  • Output: a 3D tensor, with dimensions Width, Height and Feature Maps (one for each filter).
  • Applies a non-linear activation function (usually ReLU) over each value of the output.
  • Multiple parameters to define: number of filters, size of filters, stride, padding, activation function to use, regularization (see the sketch below).
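As a concrete illustration, a hedged Keras sketch (assuming TensorFlow/Keras; the specific values are arbitrary) that sets every option listed above:

```python
import tensorflow as tf

conv = tf.keras.layers.Conv2D(
    filters=32,                # number of filters
    kernel_size=(3, 3),        # size of filters (W and H; depth follows the input)
    strides=(1, 1),            # stride
    padding='same',            # padding
    activation='relu',         # activation function
    kernel_regularizer=tf.keras.regularizers.l2(1e-4),  # regularization
)

# Applied to a batch of 32x32 RGB images, the output keeps the
# spatial size ('same' padding) and has one feature map per filter.
x = tf.random.normal((8, 32, 32, 3))
print(conv(x).shape)  # (8, 32, 32, 32)
```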

SLIDE 37

Building a CNN

A convolutional neural network is built by stacking layers, typically of 3 types:

  • Convolutional Layers
  • Pooling Layers
  • Fully connected Layers

SLIDE 38

Building a CNN: Convolutional Layers

I/O
  • Input: 3D cube, the previous set of feature maps
  • Output: 3D cube, one 2D map per filter

Action
  • Apply filters to extract features
  • Filters are composed of small kernels, which are learned
  • One bias per filter
  • Apply the activation function on every value of the feature map

Parameters
  • Number of kernels
  • Size of kernels (W and H only; D is defined by the input cube)
  • Activation function
  • Stride
  • Padding
  • Regularization type and value


SLIDE 40

Building a CNN: Pooling Layers

I/O
  • Input: 3D cube, the previous set of feature maps
  • Output: 3D cube, one 2D map per filter, with reduced spatial dimensions

Action
  • Reduce dimensionality
  • Extract the maximum or average of a region
  • Sliding-window approach

Parameters
  • Stride
  • Size of window
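For instance, a minimal Keras sketch (assuming TensorFlow/Keras) of max pooling with a 2x2 window and stride 2:

```python
import tensorflow as tf

x = tf.random.normal((1, 32, 32, 8))
pooled = tf.keras.layers.MaxPooling2D(pool_size=2, strides=2)(x)
print(pooled.shape)  # (1, 16, 16, 8): spatial size halved, depth unchanged
```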

SLIDE 42

Building a CNN: Fully Connected Layers

I/O
  • Input: FLATTENED 3D cube, the previous set of feature maps
  • Output: 1D vector of node activations (for the last layer, the classification scores)

Action
  • Aggregate information from the final feature maps
  • Generate the final classification

Parameters
  • Number of nodes
  • Activation function: usually changes depending on the role of the layer. If aggregating info, use ReLU; if producing the final classification, use Softmax.

SLIDE 43

Fully built CNN (VGG)


SLIDE 44

What do CNN layers learn?

  • Each CNN layer learns filters of increasing complexity.
  • The first layers learn basic feature-detection filters: edges, corners, etc.
  • The middle layers learn filters that detect parts of objects. For faces, they might learn to respond to eyes, noses, etc.
  • The last layers have higher representations: they learn to recognize full objects, in different shapes and positions.

SLIDE 45


SLIDE 46

Examples

  • I have a convolutional layer with 16 3x3 filters that takes an RGB image as input.
  • What else can we define about this layer?
    • Activation function
    • Stride
    • Padding type
  • How many parameters does the layer have?

16 x 3 x 3 x 3 + 16 = 448
(number of filters x filter width x filter height x number of channels of the previous layer, plus one bias per filter)
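A quick way to verify this count, as a sketch assuming TensorFlow/Keras:

```python
import tensorflow as tf

# 16 filters of size 3x3 over a 3-channel (RGB) input
layer = tf.keras.layers.Conv2D(16, (3, 3))
layer.build((None, 32, 32, 3))   # any spatial size; only the 3 channels matter
print(layer.count_params())      # 448 = 16*3*3*3 + 16
```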

SLIDE 47

Examples

  • Let C be a CNN with the following disposition:
    • Input: 32x32x3 images
    • Conv1: 8 3x3 filters, stride 1, padding=same
    • Conv2: 16 5x5 filters, stride 2, padding=same
    • Flatten layer
    • Dense1: 512 nodes
    • Dense2: 4 nodes
  • How many parameters does this network have?

(8 x 3 x 3 x 3 + 8) + (16 x 5 x 5 x 8 + 16) + (16 x 16 x 16 x 512 + 512) + (512 x 4 + 4)
Conv1: 224. Conv2: 3,216. Dense1: 2,097,664 (the flattened 16x16x16 = 4,096 values feed 512 nodes). Dense2: 2,052.
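The same network written as a Keras model (a sketch assuming TensorFlow/Keras); model.summary() reproduces the counts above:

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input((32, 32, 3)),
    layers.Conv2D(8, 3, strides=1, padding='same'),   # 224 params
    layers.Conv2D(16, 5, strides=2, padding='same'),  # 3,216 params
    layers.Flatten(),                                 # 16*16*16 = 4,096 values
    layers.Dense(512),                                # 2,097,664 params
    layers.Dense(4),                                  # 2,052 params
])
model.summary()  # total: 2,103,156 parameters
```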

SLIDE 48

3D visualization of networks in action:
http://scs.ryerson.ca/~aharley/vis/conv/
https://www.youtube.com/watch?v=3JQ3hYko51Y


SLIDE 49

EVOLUTION OF CNNS

A bit of history


SLIDE 50

Initial ideas

  • The first piece of research proposing something similar to a Convolutional Neural Network was authored by Kunihiko Fukushima in 1980, and was called the Neocognitron1.
  • Inspired by discoveries on the visual cortex of mammals.
  • Fukushima applied the Neocognitron to hand-written character recognition.
  • End of the 80’s: several papers advanced the field:
    • Backpropagation published in French by Yann LeCun in 1985 (independently discovered by other researchers as well)
    • TDNN by Waibel et al., 1989: a convolutional-like network trained with backprop.
    • Backpropagation applied to handwritten zip code recognition by LeCun et al., 1989

1 K. Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4): 193-202, 1980.

SLIDE 51

LeNet

  • November 1998: LeCun publishes one of his most recognized papers, describing a “modern” CNN architecture for document recognition, called LeNet1.
  • It was not his first iteration (this was in fact LeNet-5), but this paper is the commonly cited publication when talking about LeNet.

1 LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.

SLIDE 52

AlexNet

  • Developed by Alex Krizhevsky, Ilya Sutskever and Geoffrey Hinton at the University of Toronto in 2012. More than 25,000 citations.
  • Destroyed the competition in the 2012 ImageNet Large Scale Visual Recognition Challenge. Showed the benefits of CNNs and kickstarted the AI revolution.
  • Top-5 error of 15.3%, more than 10.8 percentage points lower than the runner-up.
  • Main contributions:
    • Trained on ImageNet with data augmentation
    • Increased depth of model, GPU training (five to six days)
    • Smart optimizer and Dropout layers
    • ReLU activation!
SLIDE 53

ZFNet

  • Introduced by Matthew Zeiler and Rob Fergus from NYU; won ILSVRC 2013 with an 11.2% error rate. Decreased the sizes of filters.
  • Trained for 12 days.
  • The paper presented a visualization technique named Deconvolutional Network, which helps to examine different feature activations and their relation to the input space.

SLIDE 54

VGG

  • Introduced by Simonyan and Zisserman (Oxford) in 2014.
  • Simplicity and depth as main points. Used 3x3 filters exclusively and 2x2 MaxPool layers with stride 2.
  • Showed that two stacked 3x3 filters have an effective receptive field of 5x5, with fewer parameters than a single 5x5 filter.
  • As spatial size decreases, depth increases.
  • Trained for two to three weeks.
  • Still used today.

SLIDE 55

GoogLeNet (Inception-v1)

  • Introduced by Szegedy et al. (Google), 2014. Winners of ILSVRC 2014.
  • Introduced the inception module: parallel conv. layers with different filter sizes. Motivation: we don’t know which filter size is best, so let the network decide. A key idea for future architectures.
  • No fully connected layer at the end; AvgPool instead. 12x fewer parameters than AlexNet.
  • Uses 1x1 convolutions to reduce the number of parameters.

[Figure: proto inception module vs. inception module]

SLIDE 56

ResNet

  • Presented by He et al. (Microsoft), 2015. Won ILSVRC 2015 in multiple categories.
  • Main idea: the Residual Block. Allows for extremely deep networks.
  • The authors believe that it is easier to optimize the residual mapping than the original one. Furthermore, a residual block can decide to “shut itself down” if needed.

[Figure: Residual Block]
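A minimal sketch of an identity residual block in Keras (the exact layer arrangement is an assumption; ResNet variants differ in details such as batch normalization placement):

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    """Identity residual block: output = F(x) + x."""
    shortcut = x
    y = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    y = layers.Add()([y, shortcut])   # the skip connection
    return layers.Activation('relu')(y)

inputs = tf.keras.Input((32, 32, 16))
outputs = residual_block(inputs, 16)
model = tf.keras.Model(inputs, outputs)
```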


SLIDE 58

DenseNet

  • Proposed by Huang et al., 2016. A radical extension of the ResNet idea.
  • Each block uses every previous feature map as input.
  • Idea: no computation of redundant features; all the previous information is available at each point.
  • Counter-intuitively, it reduces the number of parameters needed.


SLIDE 60

MobileNet

  • Published by Howard et al., 2017.
  • Extremely efficient network with decent accuracy.
  • Main concept: depthwise-separable convolutions. Convolve each feature map with its own kernel, then use a 1x1 convolution to aggregate the result.
  • This approximates vanilla convolutions without having to convolve large kernels through channels.
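A rough sketch of the parameter savings, ignoring biases (plain Python; the layer sizes are made-up examples):

```python
# Standard convolution: every filter spans all input channels
def standard_conv_params(k, c_in, c_out):
    return k * k * c_in * c_out

# Depthwise-separable: one k x k kernel per input channel,
# then a 1x1 convolution to mix channels
def separable_conv_params(k, c_in, c_out):
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 64, 128                   # made-up example sizes
print(standard_conv_params(k, c_in, c_out))   # 73728
print(separable_conv_params(k, c_in, c_out))  # 8768, roughly 8x fewer
```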

SLIDE 61

Beyond

  • MobileNetV2 (https://arxiv.org/abs/1801.04381)
  • Inception-ResNet, v1 and v2 (https://arxiv.org/abs/1602.07261)
  • Wide-ResNet (https://arxiv.org/abs/1605.07146)
  • Xception (https://arxiv.org/abs/1610.02357)
  • ResNeXt (https://arxiv.org/pdf/1611.05431)
  • ShuffleNet, v1 and v2 (https://arxiv.org/abs/1707.01083)
  • Squeeze-and-Excitation Nets (https://arxiv.org/abs/1709.01507)