CNNs Applications M. Soleymani Sharif University of Technology - - PowerPoint PPT Presentation




slide-1
SLIDE 1

CNNs Applications

  • M. Soleymani

Sharif University of Technology, Spring 2019. Most slides have been adapted from lectures by Fei-Fei Li and colleagues (cs231n, Stanford 2017).

slide-2
SLIDE 2

AlexNet

  • ImageNet Classification with Deep Convolutional Neural Networks

[Krizhevsky, Sutskever, Hinton, 2012]

slide-3
SLIDE 3

Image classification

slide-4
SLIDE 4

Other Computer Vision Tasks

slide-5
SLIDE 5

Semantic Segmentation

slide-6
SLIDE 6

Semantic Segmentation Idea: Sliding Window

Problem: Very inefficient! Not reusing shared features between overlapping patches

slide-7
SLIDE 7

Semantic Segmentation Idea: Fully Convolutional

slide-8
SLIDE 8

Semantic Segmentation Idea: Fully Convolutional

slide-9
SLIDE 9

Semantic Segmentation Idea: Fully Convolutional

slide-10
SLIDE 10

In-Network upsampling: “Unpooling”

slide-11
SLIDE 11

In-Network upsampling: “Max Unpooling”
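
As a rough illustration of max unpooling, the NumPy sketch below assumes a single-channel 2D map with even height and width and 2 x 2, stride-2 pooling; the function names `max_pool_2x2_with_switches` and `max_unpool_2x2` are hypothetical. The pooling step remembers which element was the max in each window, and the paired unpooling step writes values back to those remembered positions and fills everything else with zeros.

```python
import numpy as np

def max_pool_2x2_with_switches(x):
    """2 x 2, stride-2 max pooling that also records where each max came from."""
    h, w = x.shape
    pooled = np.zeros((h // 2, w // 2), dtype=x.dtype)
    switches = np.zeros((h // 2, w // 2, 2), dtype=int)  # (row, col) of each max in x
    for i in range(h // 2):
        for j in range(w // 2):
            window = x[2 * i:2 * i + 2, 2 * j:2 * j + 2]
            r, c = np.unravel_index(np.argmax(window), window.shape)
            pooled[i, j] = window[r, c]
            switches[i, j] = (2 * i + r, 2 * j + c)
    return pooled, switches

def max_unpool_2x2(y, switches, out_shape):
    """Write each value of y back at its remembered position; zeros elsewhere."""
    out = np.zeros(out_shape, dtype=y.dtype)
    for i in range(y.shape[0]):
        for j in range(y.shape[1]):
            r, c = switches[i, j]
            out[r, c] = y[i, j]
    return out

x = np.array([[1, 2, 6, 3],
              [3, 5, 2, 1],
              [1, 2, 2, 1],
              [7, 3, 4, 8]])
pooled, switches = max_pool_2x2_with_switches(x)   # pooled: [[5, 6], [7, 8]]
restored = max_unpool_2x2(pooled, switches, x.shape)
# restored: [[0, 0, 6, 0], [0, 5, 0, 0], [0, 0, 0, 0], [7, 0, 0, 8]]
```

In a segmentation network the switches would come from a downsampling layer and be reused by the corresponding upsampling layer on different (decoder) values; here the pooled values are unpooled directly only to show where each value lands.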

slide-12
SLIDE 12

Learnable Upsampling: Transpose Convolution

slide-13
SLIDE 13

Learnable Upsampling: Transpose Convolution

slide-14
SLIDE 14

Learnable Upsampling: Transpose Convolution

Other names:
  • Deconvolution (bad)
  • Upconvolution
  • Fractionally strided convolution
  • Backward strided convolution

slide-15
SLIDE 15

Transpose Convolution: 1D Example
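
A minimal NumPy sketch of the 1D case, assuming no padding and no output cropping (deep-learning frameworks usually expose options for both); `transpose_conv1d` is a hypothetical name. Each input element scales a copy of the filter, the copies are placed `stride` positions apart in the output, and overlapping contributions are summed.

```python
import numpy as np

def transpose_conv1d(x, k, stride):
    """1D transpose convolution with no padding or cropping.

    Each input element scales a copy of the kernel; copies start `stride`
    positions apart in the output and overlapping values are summed.
    """
    x = np.asarray(x, dtype=float)
    k = np.asarray(k, dtype=float)
    out = np.zeros((len(x) - 1) * stride + len(k))
    for i, xi in enumerate(x):
        out[i * stride:i * stride + len(k)] += xi * k
    return out

# Input [a, b] = [1, 2], filter [x, y, z] = [1, 10, 100], stride 2:
# output is [ax, ay, az + bx, by, bz] = [1, 10, 102, 20, 200]
y = transpose_conv1d([1, 2], [1, 10, 100], stride=2)
```

The output length is (n - 1) x stride + filter length, so a stride of 2 roughly doubles the spatial size; the overlap (here az + bx) is where contributions from neighboring inputs are summed.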

slide-16
SLIDE 16

Semantic Segmentation Idea: Fully Convolutional

slide-17
SLIDE 17

Computer Vision Tasks

slide-18
SLIDE 18

Classification + Localization

  • Classification: C classes
    – Input: image
    – Output: class label
    – Evaluation metric: accuracy
  • Localization:
    – Input: image
    – Output: box in the image (x, y, w, h)
    – Evaluation metric: Intersection over Union (IoU)
  • Classification + Localization: do both (e.g., output the label CAT together with its box (x, y, w, h))
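
To make the localization metric concrete, here is a small Python sketch of Intersection over Union for two boxes in the (x, y, w, h) format above. It assumes (x, y) is the top-left corner, which the slide does not specify; some datasets use the box center instead. The function name `iou` is illustrative.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x, y, w, h),
    assuming (x, y) is the top-left corner."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap along each axis (clamped to zero when the boxes are disjoint)
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

score = iou((0, 0, 2, 2), (1, 1, 2, 2))  # overlap 1, union 7 -> 1/7 ≈ 0.143
```

A predicted box is then typically counted as correct when its IoU with the ground-truth box exceeds a threshold (PASCAL VOC, for example, uses 0.5).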

slide-19
SLIDE 19

Classification + Localization

Often pretrained on ImageNet (Transfer learning)

slide-20
SLIDE 20

Simple Recipe for Classification + Localization

  • Step 1: Train (or download) a classification model (e.g., VGG)

[Figure: image → convolution and pooling → final conv feature map → fully-connected layers → class scores → softmax loss]

slide-21
SLIDE 21

Localization as Regression

  • Input: image
  • Output: box coordinates (4 numbers), predicted by the neural net
  • Correct output: box coordinates (4 numbers)
  • Loss: L2 distance between the predicted and correct coordinates
  • Only one object, so simpler than detection

slide-22
SLIDE 22

Simple Recipe for Classification + Localization

  • Step 1: Train (or download) a classification model (e.g., VGG)
  • Step 2: Attach new fully-connected “regression head” to the network

[Figure: image → convolution and pooling → final conv feature map, which feeds two branches: fully-connected layers → class scores (“classification head”); fully-connected layers → box coordinates (“regression head”)]

slide-23
SLIDE 23

Simple Recipe for Classification + Localization

  • Step 1: Train (or download) a classification model (e.g., VGG)
  • Step 2: Attach new fully-connected “regression head” to the network
  • Step 3: Train the regression head only with SGD and L2 loss

[Figure: image → convolution and pooling → final conv feature map, which feeds two branches: fully-connected layers → class scores (“classification head”); fully-connected layers → box coordinates → L2 loss]

slide-24
SLIDE 24

Simple Recipe for Classification + Localization

  • Step 1: Train (or download) a classification model (e.g., VGG)
  • Step 2: Attach new fully-connected “regression head” to the network
  • Step 3: Train the regression head only with SGD and L2 loss

[Figure: image → convolution and pooling → final conv feature map, which feeds two branches: fully-connected layers → class scores (“classification head”); fully-connected layers → box coordinates → L2 loss]

slide-25
SLIDE 25

Simple Recipe for Classification + Localization

  • Step 1: Train (or download) a classification model (e.g., VGG)
  • Step 2: Attach new fully-connected “regression head” to the network
  • Step 3: Train the regression head only with SGD and L2 loss

[Figure: image → convolution and pooling → final conv feature map, which feeds two branches: fully-connected layers → class scores → softmax loss; fully-connected layers → box coordinates → L2 loss]

slide-26
SLIDE 26

Classification + Localization

Often pretrained on ImageNet (Transfer learning)

slide-27
SLIDE 27

Classification + Localization

[Figure: image → convolution and pooling → final conv feature map, which feeds two branches: fully-connected layers → class scores → softmax loss; fully-connected layers → box coordinates → L2 loss]

  • Classification head: C numbers (one per class)
  • Regression head:
    – Class agnostic: 4 numbers (one box)
    – Class specific: C x 4 numbers (one box per class)
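
The two heads and their combined loss can be sketched in NumPy as follows. This is an illustration under assumptions, not the course's code: both heads are single linear layers without biases on a shared feature vector, the L2 loss is taken as the squared distance, and the weight `lam` balancing the two terms is a hypothetical hyperparameter (as is the name `two_head_loss`). With a class-specific regression head, only the box predicted for the ground-truth class enters the loss.

```python
import numpy as np

def two_head_loss(features, W_cls, W_box, label, gt_box, class_specific=False, lam=1.0):
    """Softmax loss on C class scores plus squared-L2 loss on box coordinates.

    features: (D,) final feature vector shared by both heads
    W_cls:    (C, D) classification head -> C numbers
    W_box:    (4, D) class-agnostic regression head -> 4 numbers,
              or (4*C, D) class-specific regression head -> C x 4 numbers
    """
    scores = W_cls @ features                    # (C,) class scores
    scores = scores - scores.max()               # shift for numerical stability
    log_probs = scores - np.log(np.exp(scores).sum())
    cls_loss = -log_probs[label]                 # softmax (cross-entropy) loss

    boxes = W_box @ features
    if class_specific:
        boxes = boxes.reshape(-1, 4)[label]      # box predicted for the true class
    box_loss = np.sum((boxes - gt_box) ** 2)     # squared L2 distance
    return cls_loss + lam * box_loss, cls_loss, box_loss
```

The class-agnostic variant shares one box across classes; the class-specific variant costs C x 4 outputs but lets each class learn its own box adjustments.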

slide-28
SLIDE 28

Aside: Human Pose Estimation

slide-29
SLIDE 29

Aside: Human Pose Estimation

slide-30
SLIDE 30

Where to attach the regression head?

[Figure: image → convolution and pooling → final conv feature map → fully-connected layers → class scores → softmax loss]

  • After conv layers: OverFeat, VGG
  • After last FC layer: DeepPose, R-CNN

slide-31
SLIDE 31

Object detection

slide-32
SLIDE 32

Object Detection: Impact of Deep Learning

slide-33
SLIDE 33

Object Detection as Regression?

Each image needs a different number of outputs!

slide-34
SLIDE 34

Object Detection as Classification: Sliding Window

slide-35
SLIDE 35

Object Detection as Classification: Sliding Window

slide-36
SLIDE 36

Object Detection as Classification: Sliding Window

Problem: Need to apply the CNN to a huge number of locations and scales, which is very computationally expensive!
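
A back-of-the-envelope count shows why. The sketch below counts the square crops a sliding-window detector would have to classify, one CNN forward pass each; the image size, window sizes, and stride in the example are hypothetical, and trying several aspect ratios as well would multiply the count further. `count_windows` is an illustrative name.

```python
def count_windows(img_h, img_w, window_sizes, stride):
    """Number of square crops a sliding-window detector would classify,
    summed over all window sizes (scales)."""
    total = 0
    for s in window_sizes:
        if s > img_h or s > img_w:
            continue  # window does not fit in the image at this scale
        rows = (img_h - s) // stride + 1
        cols = (img_w - s) // stride + 1
        total += rows * cols
    return total

# Hypothetical setting: 600 x 800 image, three window sizes, stride 8
n = count_windows(600, 800, [64, 128, 256], stride=8)  # 14460 crops
```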

slide-37
SLIDE 37

Region Proposals

slide-38
SLIDE 38

R-CNN

slide-39
SLIDE 39

R-CNN

slide-40
SLIDE 40

R-CNN

slide-41
SLIDE 41

R-CNN

slide-42
SLIDE 42

R-CNN Training

  • Step 1: Train (or download) a classification model for ImageNet (e.g., VGG)

[Figure: image → convolution and pooling → final conv feature map → fully-connected layers → class scores (1000 classes) → softmax loss]

slide-43
SLIDE 43

R-CNN Training

  • Step 2: Fine-tune model for detection
    – Instead of 1000 ImageNet classes, want 20 object classes + background
    – Throw away final fully-connected layer, reinitialize from scratch
    – Keep training model using positive / negative regions from detection images

[Figure: image → convolution and pooling → final conv feature map → fully-connected layers → class scores (21 classes) → softmax loss. Re-initialize the final layer: was 4096 x 1000, now will be 4096 x 21]

slide-44
SLIDE 44

R-CNN Training

  • Step 3: Extract features
    – Extract region proposals for all images
    – For each region: warp to CNN input size, run forward through CNN, save pool5 features to disk
    – Have a big hard drive: features are ~200GB for PASCAL dataset!

[Figure: image → region proposals → crop + warp → forward pass through convolution and pooling → pool5 features → save to disk]

slide-45
SLIDE 45

R-CNN Training

  • Step 4: Train one binary SVM per class to classify region features

[Figure: training image regions → cached region features → positive and negative samples for the cat SVM]

slide-46
SLIDE 46

R-CNN Training

  • Step 5 (bbox regression): For each class, train a linear regression model to map from cached features to offsets to GT boxes, to make up for “slightly wrong” proposals

Regression targets (dx, dy, dw, dh) in normalized coordinates, for example:
    – (0, 0, 0, 0): proposal is good
    – (.25, 0, 0, 0): proposal too far to left
    – (0, 0, -0.125, 0): proposal too wide
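
The offsets can be made concrete with the parameterization from the R-CNN paper, sketched below in Python: center shifts are measured in units of the proposal's width and height, and size changes are measured in log space. The (center x, center y, width, height) box format and the function names `bbox_targets` and `apply_deltas` are assumptions for illustration. Under this parameterization a dx of 0.25 means the ground-truth box lies a quarter of the proposal's width to the right (the proposal is too far to the left), and a negative dw means the ground-truth box is narrower (the proposal is too wide).

```python
import math

def bbox_targets(proposal, gt):
    """Regression targets (dx, dy, dw, dh) that map a proposal onto a
    ground-truth box. Boxes are (center_x, center_y, width, height)."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = gt
    return ((gx - px) / pw,      # horizontal shift in units of proposal width
            (gy - py) / ph,      # vertical shift in units of proposal height
            math.log(gw / pw),   # log-space width change
            math.log(gh / ph))   # log-space height change

def apply_deltas(proposal, deltas):
    """Inverse of bbox_targets: refine a proposal with predicted offsets."""
    px, py, pw, ph = proposal
    dx, dy, dw, dh = deltas
    return (px + pw * dx, py + ph * dy, pw * math.exp(dw), ph * math.exp(dh))

# Ground truth is 25 px (a quarter of the width) to the right of the proposal
t = bbox_targets((100, 100, 100, 50), (125, 100, 100, 50))  # (0.25, 0.0, 0.0, 0.0)
```

At test time the per-class regressor predicts (dx, dy, dw, dh) from the cached features, and `apply_deltas` turns them into a refined box.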

slide-47
SLIDE 47

R-CNN: Problems

  • Ad hoc training objectives
    – Fine-tune network with softmax classifier (log loss)
    – Train post-hoc linear SVMs (hinge loss)
    – Train post-hoc bounding-box regressions (least squares)
  • Training is slow (84h), takes a lot of disk space
  • Inference (detection) is slow
    – 47s / image with VGG16 [Simonyan & Zisserman. ICLR15]
    – Fixed by SPP-net [He et al. ECCV14]

slide-48
SLIDE 48

Fast R-CNN

slide-49
SLIDE 49

Fast R-CNN

Share computation of convolutional layers between proposals for an image

slide-50
SLIDE 50

Fast R-CNN: Region of Interest Pooling

[Figure: hi-res input image (3 x 800 x 600) with region proposal → convolution and pooling → hi-res conv features (C x H x W) with region proposal → fully-connected layers]

Problem: Fully-connected layers expect low-res conv features: C x h x w

slide-51
SLIDE 51

Fast R-CNN: Region of Interest Pooling

[Figure: hi-res input image (3 x 800 x 600) with region proposal → convolution and pooling → hi-res conv features (C x H x W) with region proposal → fully-connected layers]

Project region proposal onto conv feature map

Problem: Fully-connected layers expect low-res conv features: C x h x w

slide-52
SLIDE 52

Fast R-CNN: Region of Interest Pooling

[Figure: hi-res input image (3 x 800 x 600) with region proposal → convolution and pooling → hi-res conv features (C x H x W) with region proposal → fully-connected layers]

Problem: Fully-connected layers expect low-res conv features: C x h x w

Divide projected region into h x w grid

slide-53
SLIDE 53

Fast R-CNN: Region of Interest Pooling

[Figure: hi-res input image (3 x 800 x 600) with region proposal → convolution and pooling → hi-res conv features (C x H x W) with region proposal → RoI conv features → fully-connected layers]

Max-pool within each grid cell → RoI conv features: C x h x w for region proposal, which is the low-res size the fully-connected layers expect

slide-54
SLIDE 54

Fast R-CNN: Region of Interest Pooling

[Figure: hi-res input image (3 x 800 x 600) with region proposal → convolution and pooling → hi-res conv features (C x H x W) with region proposal → RoI conv features → fully-connected layers]

RoI conv features: C x h x w for region proposal, the low-res size the fully-connected layers expect. Can back propagate similar to max pooling
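
Putting the three steps together (project the proposal, divide it into an h x w grid, max-pool each cell), single-region RoI max pooling can be sketched in NumPy as below. Assumptions: boxes are (x1, y1, x2, y2) in input-image pixels with inclusive corners, `spatial_scale` is the ratio of feature-map size to image size (for example 1/16 for a backbone that downsamples by 16), and projected coordinates are rounded to whole feature cells. `roi_max_pool` is a hypothetical name, not a library function.

```python
import numpy as np

def roi_max_pool(features, roi, out_h, out_w, spatial_scale):
    """RoI max pooling for one region.

    features:      (C, H, W) conv feature map of the whole image
    roi:           (x1, y1, x2, y2) proposal in input-image pixels (inclusive)
    spatial_scale: feature-map size / image size (assumed, e.g. 1/16)
    returns:       (C, out_h, out_w) fixed-size features
    """
    C, H, W = features.shape
    # 1) Project the proposal onto the feature map, rounding to whole cells
    x1, y1, x2, y2 = (int(round(v * spatial_scale)) for v in roi)
    x1, y1 = max(0, min(x1, W - 1)), max(0, min(y1, H - 1))
    x2, y2 = max(x1, min(x2, W - 1)), max(y1, min(y2, H - 1))
    roi_h, roi_w = y2 - y1 + 1, x2 - x1 + 1
    out = np.zeros((C, out_h, out_w), dtype=features.dtype)
    # 2) Divide the projected region into an out_h x out_w grid
    for i in range(out_h):
        ys = y1 + (i * roi_h) // out_h
        ye = y1 + -(-((i + 1) * roi_h) // out_h)   # ceiling division
        for j in range(out_w):
            xs = x1 + (j * roi_w) // out_w
            xe = x1 + -(-((j + 1) * roi_w) // out_w)
            # 3) Max-pool within each grid cell (cells are never empty)
            out[:, i, j] = features[:, ys:ye, xs:xe].max(axis=(1, 2))
    return out

f = np.arange(16, dtype=float).reshape(1, 4, 4)
pooled_roi = roi_max_pool(f, (0, 0, 3, 3), 2, 2, spatial_scale=1.0)
# pooled_roi: [[[5., 7.], [13., 15.]]]
```

Because each output cell is a plain max over its grid cell, the backward pass routes gradients only to the arg-max positions, just as ordinary max pooling does.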

slide-55
SLIDE 55

Fast R-CNN: RoI Pooling

slide-56
SLIDE 56

Fast R-CNN

Share computation of convolutional layers between proposals for an image

slide-57
SLIDE 57

R-CNN vs SPP vs Fast R-CNN

Problem: Runtime dominated by region proposals!

slide-58
SLIDE 58

Faster R-CNN

  • Make CNN do proposals!
    – Solely based on CNN
    – No external modules
  • Each step is end-to-end

Shaoqing Ren, Kaiming He, Ross Girshick, & Jian Sun. “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”. NIPS 2015

slide-59
SLIDE 59

Faster R-CNN

  • Insert Region Proposal Network (RPN) to predict proposals from features
  • Jointly train with 4 losses:
    – RPN classify object / not object
    – RPN regress box coordinates
    – Final classification score (object classes)
    – Final box coordinates
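
The joint objective can be sketched in NumPy as a sum of the four terms. Assumptions to note: the classification terms use softmax cross-entropy, the box terms use the smooth L1 loss that Fast/Faster R-CNN use for box regression, all four terms are weighted equally, and box losses are averaged over every sample passed in (the paper applies them only to positive samples and uses balancing weights). Anchor and proposal sampling are omitted, and all function names are hypothetical.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy; logits (N, K), integer labels (N,)."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def smooth_l1(pred, target):
    """Smooth L1 box loss: 0.5 d^2 if |d| < 1, else |d| - 0.5; (N, 4) inputs."""
    d = np.abs(pred - target)
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum(axis=1).mean()

def faster_rcnn_loss(rpn_logits, rpn_labels, rpn_deltas, rpn_targets,
                     det_logits, det_labels, det_deltas, det_targets):
    """Unweighted sum of the four losses that are trained jointly."""
    return (cross_entropy(rpn_logits, rpn_labels)     # RPN: object / not object
            + smooth_l1(rpn_deltas, rpn_targets)      # RPN: box coordinates
            + cross_entropy(det_logits, det_labels)   # final class scores
            + smooth_l1(det_deltas, det_targets))     # final box coordinates
```

Since all four terms share the same backbone features, a single backward pass through this sum updates the RPN and the detection head together.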

slide-60
SLIDE 60

Faster R-CNN: Make CNN do proposals!

slide-61
SLIDE 61

Object Detection: Lots of Variables …

slide-62
SLIDE 62

Object Detection

Source: http://icml.cc/2016/tutorials/icml2016_tutorial_deep_residual_networks_kaiminghe.pdf

slide-63
SLIDE 63
slide-64
SLIDE 64

Instance Segmentation

slide-65
SLIDE 65

Mask R-CNN

slide-66
SLIDE 66

Mask R-CNN: Very Good Results!

slide-67
SLIDE 67

Mask R-CNN: Also does pose

slide-68
SLIDE 68

Open Source Frameworks

  • Lots of good implementations on GitHub!
  • TensorFlow Detection API:
    – https://github.com/tensorflow/models/tree/master/research/object_detection
    – Faster R-CNN, SSD, R-FCN, Mask R-CNN
  • Caffe2 Detectron:
    – https://github.com/facebookresearch/Detectron
    – Mask R-CNN, RetinaNet, Faster R-CNN, RPN, Fast R-CNN, R-FCN
  • Fine-tune on your own dataset with pre-trained models