Lecture 8: Convolutional Neural Networks 1 (CS109B Data Science 2)


  1. Lecture 8: Convolutional Neural Networks 1. CS109B Data Science 2. Pavlos Protopapas and Mark Glickman.

  2. Outline (CS109B, Protopapas, Glickman)

  3. Main drawbacks of MLPs • MLPs use one perceptron per input (e.g. one per pixel in an image, multiplied by 3 in the RGB case). The number of weights rapidly becomes unmanageable for large images. • Training difficulties arise, and overfitting can appear. • MLPs react differently to an input (image) and its shifted version: they are not translation invariant.

  4. Recent advances in Image Recognition: You Only Look Once (YOLO), 2016

  5. Recent advances in Image Recognition: Mask R-CNN, 2017

  6. Recent advances in Image Recognition: NVIDIA Video-to-Video Synthesis, 2018

  7. Image analysis. Imagine that we want to recognize swans in an image. We might look for: a round, elongated oval head with an orange protuberance (beak); an oval-shaped white blob (body); a long white rectangular shape (neck).

  8. Cases can be a bit more complex… A round, elongated head with an orange or black beak; an oval-shaped white body, with or without large white symmetric blobs (wings); a long white neck, square shape.

  9. Now what? Small black circles (eyes), can be facing the camera, sometimes different sizes. Round, elongated head with an orange or black beak, can be turned backwards. Black triangular shaped form on the head. Long white neck, can bend around, not necessarily straight. White, oval shaped body, can have different shapes, with or without wings visible. White elongated piece (wing), can be squared or more triangular, can be obstructed. White tail, generally far from the head, looks feathery. Black feet, under the body. Luckily, the color is consistent…

  10. [image-only slide]

  11. We need to be able to deal with these cases.

  12. Image features • We've basically been talking about detecting features in images, in a very naïve way. • Researchers built multiple computer vision techniques to deal with these issues: SIFT, FAST, SURF, BRIEF, etc. • However, similar problems arose: the detectors were either too general or too over-engineered. Humans were designing these feature detectors, and that made them either too simple or hard to generalize. (Figures: FAST corner detection algorithm; SIFT feature descriptor.)

  13. Image features (cont) • What if we learned the features to detect? • We need a system that can do Representation Learning (or Feature Learning). Representation Learning: a technique that allows a system to automatically find relevant features for a given task, replacing manual feature engineering. Multiple techniques exist: • Unsupervised (K-means, PCA, …) • Supervised (supervised dictionary learning, neural networks!)

  14. Drawbacks. Imagine we want to build a cat detector with an MLP. If the cat appears in one part of the image, the red weights will be modified to better recognize cats; if it appears elsewhere, the green weights will be modified instead. We are learning redundant features, and the approach is not robust, as cats could appear in yet another position.

  15. Drawbacks. Example: CIFAR10. Simple 32x32 color images (3 channels). Each pixel is a feature: an MLP would have 32x32x3 + 1 = 3073 weights per neuron!

  16. Drawbacks. Example: ImageNet. Images are usually 224x224x3: an MLP would have 224x224x3 + 1 = 150,529 weights per neuron. If the first layer of the MLP has around 128 nodes, which is small, this already becomes very heavy to compute. Model complexity is extremely high: overfitting.
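The counts above can be checked in a few lines of Python (a sketch, not from the slides; `weights_per_neuron` is our own helper name, and the 128-node first layer is the example mentioned above):

```python
# Weights per neuron for a fully connected layer over a flattened image,
# counting one weight per input value plus one bias term.
def weights_per_neuron(h, w, c):
    return h * w * c + 1

print(weights_per_neuron(32, 32, 3))     # 3073   (CIFAR10)
print(weights_per_neuron(224, 224, 3))   # 150529 (ImageNet-sized input)

# A "small" first layer of 128 such neurons:
print(128 * weights_per_neuron(224, 224, 3))  # 19267712, ~19.3M parameters
```

About 19 million parameters in just the first layer, before the network has learned anything about spatial structure.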

  17. Images are Local and Hierarchical

  18. Images are Invariant

  19. The “Convolution” Operation

  20. The “Convolution” Operation. Example kernels (wikipedia.org):

      Edge detection:          Sharpen:
      [ -1  -1  -1 ]           [  0  -1   0 ]
      [ -1   8  -1 ]           [ -1   5  -1 ]
      [ -1  -1  -1 ]           [  0  -1   0 ]
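As an illustrative sketch (not from the slides), here is a naive NumPy 2D convolution applied with the two kernels above; `conv2d` is our own name. On a flat (constant) image the edge-detection kernel responds with zeros everywhere, because its entries sum to 0, while the sharpen kernel leaves the values unchanged, because its entries sum to 1:

```python
import numpy as np

def conv2d(img, kernel):
    # "Valid" sliding-window correlation of a 2D image with a 2D kernel.
    K = kernel.shape[0]
    H, W = img.shape
    out = np.zeros((H - K + 1, W - K + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + K, j:j + K] * kernel)
    return out

edge = np.array([[-1, -1, -1],
                 [-1,  8, -1],
                 [-1, -1, -1]], dtype=float)
sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]], dtype=float)

flat = np.full((5, 5), 7.0)     # a featureless, constant image
print(conv2d(flat, edge))       # all zeros: no edges to detect
print(conv2d(flat, sharpen))    # all 7s: a flat region is preserved
```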

  21. A Convolutional Network [diagram: stacked convolution + ReLU blocks]

  22. Basics of CNNs. We know that MLPs: • do not scale well for images; • ignore the information brought by pixel position and correlation with neighbors; • cannot handle translations. The general idea of CNNs is to intelligently adapt to the properties of images: • pixel position and neighborhood have semantic meaning; • elements of interest can appear anywhere in the image.

  23. Basics of CNNs (MLP vs. CNN). CNNs are also composed of layers, but those layers are not fully connected: they have filters, sets of cube-shaped weights that are applied throughout the image. Each 2D slice of a filter is called a kernel. These filters introduce translation invariance and parameter sharing. How are they applied? Convolutions!

  24. Convolution and cross-correlation • A convolution of f and g, (f ∗ g), is defined as the integral of their product, with one of the functions inverted and shifted:

      (f ∗ g)(t) = ∫ f(a) g(t − a) da

      • Discrete convolution (the function is inverted and shifted left by t):

      (f ∗ g)(t) = Σ_{a=−∞}^{+∞} f(a) g(t − a)

      • Discrete cross-correlation:

      (f ⋆ g)(t) = Σ_{a=−∞}^{+∞} f(a) g(t + a)
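NumPy has both operations built in, so the flip-versus-no-flip difference between the two definitions can be seen directly (an aside, not part of the slides):

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0])
g = np.array([0.0, 1.0, 0.5])

# Convolution flips g before sliding it over f.
print(np.convolve(f, g))                 # [0.  1.  2.5 4.  1.5]

# Cross-correlation slides g over f without flipping.
print(np.correlate(f, g, mode='full'))   # [0.5 2.  3.5 3.  0. ]

# Convolving with g equals cross-correlating with the reversed g.
print(np.allclose(np.convolve(f, g),
                  np.correlate(f, g[::-1], mode='full')))  # True
```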

  25. Convolutions – step by step

  26. Convolutions – another example

  27. Convolutions – 3D input

  28. Convolutions – what happens at the edges? If we apply convolutions to a normal image, the result will be down-sampled by an amount that depends on the size of the filter. We can avoid this by padding the edges in different ways.

  29. Padding. Full padding: introduces zeros such that all pixels are visited the same number of times by the filter; increases the size of the output. Same padding: ensures that the output has the same size as the input.
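These padding regimes follow from the standard output-size formula, output = (W − K + 2P)/S + 1, for input width W, kernel size K, padding P, and stride S. A small helper (our own sketch, not from the slides) shows all three cases for a 32-pixel input and a 3x3 kernel:

```python
def conv_output_size(w, k, p, s=1):
    # Standard convolution output-size formula: floor((W - K + 2P) / S) + 1.
    return (w - k + 2 * p) // s + 1

print(conv_output_size(32, 3, p=0))  # 30: no padding, output shrinks
print(conv_output_size(32, 3, p=1))  # 32: "same" padding for a 3x3 kernel
print(conv_output_size(32, 3, p=2))  # 34: "full" padding, output grows
```

For a kernel of odd size K, "same" padding is P = (K − 1)/2 and "full" padding is P = K − 1.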

  30. Convolutional layers. A convolutional layer with four 3x3 filters on a black-and-white image (just one channel), versus a convolutional layer with four 3x3 filters on an RGB image. As you can see, in the RGB case the filters are cubes, and they are applied over the full depth of the image.

  31. Convolutional layers (cont) • To be clear: each filter is convolved with the entire 3D input cube but generates a 2D feature map. • Because we have multiple filters, we end up with a 3D output: one 2D feature map per filter. • The feature-map dimension can change drastically from one conv layer to the next: we can enter a layer with a 32x32x16 input and exit with a 32x32x128 output if that layer has 128 filters.
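The channel change described above (16 input channels in, 128 feature maps out) can be reproduced with a naive NumPy forward pass. This is our own sketch, not the course's code; `conv_layer` is a made-up name, and we use an 8x8 spatial size instead of 32x32 to keep the loop fast:

```python
import numpy as np

def conv_layer(x, filters):
    # x: (H, W, C_in) input cube; filters: (N, K, K, C_in).
    # Stride 1 with "same" zero padding, so H and W are preserved.
    H, W, C_in = x.shape
    N, K, _, _ = filters.shape
    p = K // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    out = np.zeros((H, W, N))
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + K, j:j + K, :]  # (K, K, C_in) window
            # Each filter is dotted with the full-depth patch -> one scalar
            # per filter, i.e. one value in each of the N feature maps.
            out[i, j, :] = np.tensordot(filters, patch,
                                        axes=([1, 2, 3], [0, 1, 2]))
    return out

x = np.random.randn(8, 8, 16)          # input: H x W x 16 channels
w = np.random.randn(128, 3, 3, 16)     # 128 filters of size 3x3x16
y = conv_layer(x, w)
print(y.shape)                         # (8, 8, 128): one map per filter
```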

  32. Why does this make sense? An image is just a matrix of pixels. Convolving the image with a filter produces a feature map that highlights the presence of a given feature in the image.
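A tiny illustration (not from the slides): convolving an image that contains a vertical step edge with a horizontal-difference kernel produces a feature map that lights up exactly along the edge:

```python
import numpy as np

# A 3x4 image with a vertical step edge between columns 1 and 2.
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 0, 1, 1]], dtype=float)
kernel = np.array([[-1, 1]])  # responds to a left-to-right intensity jump

H, W = img.shape
out = np.zeros((H, W - 1))
for i in range(H):
    for j in range(W - 1):
        out[i, j] = np.sum(img[i, j:j + 2] * kernel)
print(out)
# [[0. 1. 0.]
#  [0. 1. 0.]
#  [0. 1. 0.]]  <- the feature map is nonzero only where the edge is
```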

  33. [image-only slide]

  34. Learning CNNs. In a convolutional layer, we are basically applying multiple filters over the image to extract different features. But most importantly, we are learning those filters! One thing we're missing: non-linearity.

  35. Introducing ReLU. The most successful non-linearity for CNNs is the Rectified Linear Unit (ReLU): it combats the vanishing gradient problem that occurs with sigmoids, is easier to compute, and generates sparsity (not always beneficial).
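ReLU is one line in NumPy (a sketch, not from the slides): it zeroes out negative activations and passes positive ones through unchanged, which is also where the sparsity mentioned above comes from.

```python
import numpy as np

def relu(x):
    # ReLU: elementwise max(0, x).
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))  # [0.  0.  0.  1.5 3. ]
```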

  36. Convolutional layers so far • A convolutional layer convolves each of its filters with the input. • Input: a 3D tensor with dimensions Width, Height, and Channels (or Feature Maps). • Output: a 3D tensor with dimensions Width, Height, and Feature Maps (one for each filter). • Applies a non-linear activation function (usually ReLU) over each value of the output. • Multiple parameters to define: number of filters, size of filters, stride, padding, activation function to use, regularization.

  37. Building a CNN. A convolutional neural network is built by stacking layers, typically of 3 types: convolutional layers, pooling layers, and fully connected layers.
