SLIDE 1 CNNs for Segmentation, Localization, and Detection
Sharif University of Technology, Fall 2017. Most slides have been adapted from Fei-Fei Li and colleagues' lectures, CS231n, Stanford, 2017, and some from John Canny's lectures, CS294-129, Berkeley, 2016.
SLIDE 2 AlexNet
- ImageNet Classification with Deep Convolutional Neural Networks
[Krizhevsky, Sutskever, Hinton, 2012]
SLIDE 3
Image classification
SLIDE 4
Other Computer Vision Tasks
SLIDE 5 Classification and localization
What & where
SLIDE 6 Classification + Localization
Classification:
- Input: Image
- Output: Class label (one of C classes)
- Evaluation metric: Accuracy
Localization:
- Input: Image
- Output: Box in the image (x, y, w, h)
- Evaluation metric: Intersection over Union (IoU)
Classification + Localization: do both, e.g. CAT + (x, y, w, h)
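The IoU evaluation metric can be sketched in a few lines (a minimal sketch; boxes are assumed to be (x, y, w, h) tuples with (x, y) the top-left corner):

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x, y, w, h)."""
    ax1, ay1 = box_a[0], box_a[1]
    ax2, ay2 = box_a[0] + box_a[2], box_a[1] + box_a[3]
    bx1, by1 = box_b[0], box_b[1]
    bx2, by2 = box_b[0] + box_b[2], box_b[1] + box_b[3]
    # Overlap rectangle (empty if the boxes do not intersect).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0
```

A perfect prediction gives IoU = 1.0; disjoint boxes give 0.0, and a detection is typically counted correct above a threshold such as 0.5.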
SLIDE 7 Idea #1: Localization as Regression
Input: image
Output: box coordinates (4 numbers), predicted by a neural net
Correct output: ground-truth box coordinates (4 numbers)
Loss: L2 distance between predicted and correct coordinates
Only one object, so this is simpler than detection
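The L2 loss over the four predicted coordinates is just a sum of squared differences; a minimal sketch:

```python
def l2_box_loss(pred, target):
    """L2 (squared-distance) loss over the 4 box coordinates (x, y, w, h)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))

# Predicted box is off by 2 pixels in x and 2 in width.
loss = l2_box_loss((10.0, 12.0, 50.0, 40.0), (8.0, 12.0, 48.0, 40.0))
print(loss)  # 8.0
```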
SLIDE 8 Simple Recipe for Classification + Localization
- Step 1: Train (or download) a classification model (e.g., VGG)
[Diagram: Image → Convolution and Pooling → Final conv feature map → Fully-connected layers → Class scores → Softmax loss]
SLIDE 9 Simple Recipe for Classification + Localization
- Step 1: Train (or download) a classification model (e.g., VGG)
- Step 2: Attach new fully-connected “regression head” to the network
[Diagram: Image → Convolution and Pooling → Final conv feature map, feeding two heads: Fully-connected layers → Class scores ("classification head"), and Fully-connected layers → Box coordinates ("regression head")]
SLIDE 10 Simple Recipe for Classification + Localization
- Step 1: Train (or download) a classification model (e.g., VGG)
- Step 2: Attach new fully-connected “regression head” to the network
- Step 3: Train the regression head only with SGD and L2 loss
[Diagram: Image → Convolution and Pooling → Final conv feature map, feeding two heads: Fully-connected layers → Class scores ("classification head"), and Fully-connected layers → Box coordinates → L2 loss ("regression head")]
SLIDE 12 Simple Recipe for Classification + Localization
- Step 1: Train (or download) a classification model (e.g., VGG)
- Step 2: Attach new fully-connected “regression head” to the network
- Step 3: Train the regression head only with SGD and L2 loss
[Diagram: Image → Convolution and Pooling → Final conv feature map, feeding two heads: Fully-connected layers → Class scores → Softmax loss, and Fully-connected layers → Box coordinates → L2 loss]
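The two-headed recipe above can be sketched as a single forward pass in NumPy (a toy sketch, not a real network: the 512-dim feature vector, the weight shapes, and the random initialization are all illustrative assumptions standing in for a pretrained backbone):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: a 512-dim feature vector and C = 20 classes.
feat_dim, num_classes = 512, 20
feat = rng.standard_normal(feat_dim)  # stands in for the flattened final conv feature map

# "Classification head": fully-connected layer -> class scores, softmax loss.
W_cls = rng.standard_normal((num_classes, feat_dim)) * 0.01
scores = W_cls @ feat
true_class = 3
cls_loss = -(scores[true_class] - np.log(np.sum(np.exp(scores))))

# "Regression head": fully-connected layer -> 4 box coordinates, L2 loss.
W_box = rng.standard_normal((4, feat_dim)) * 0.01
box = W_box @ feat
true_box = np.array([0.2, 0.3, 0.5, 0.4])
box_loss = np.sum((box - true_box) ** 2)

# Training the regression head minimizes box_loss; a fully joint setup
# would minimize a weighted sum of the two losses.
total_loss = cls_loss + box_loss
```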
SLIDE 13 Classification + Localization
Often pretrained on ImageNet (Transfer learning)
SLIDE 14
Aside: Human Pose Estimation
SLIDE 15
Aside: Human Pose Estimation
SLIDE 16 Where to attach the regression head?
[Diagram: Image → Convolution and Pooling → Final conv feature map → Fully-connected layers → Class scores → Softmax loss]
After the conv layers: Overfeat, VGG
After the last FC layer: DeepPose, R-CNN
SLIDE 17
Object detection
SLIDE 18
Object Detection: Impact of Deep Learning
SLIDE 19 Object Detection as Regression?
Each image needs a different number of outputs!
SLIDE 20
Object Detection as Classification: Sliding Window
SLIDE 21
Object Detection as Classification: Sliding Window
SLIDE 22 Object Detection as Classification: Sliding Window
Problem: Need to apply CNN to huge number of locations and scales, very computationally expensive!
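The cost problem is easy to see by just counting windows (a minimal sketch; the image size, window size, and stride below are illustrative):

```python
def sliding_windows(img_h, img_w, win, stride):
    """Enumerate every square crop a classifier would have to be run on."""
    boxes = []
    for y in range(0, img_h - win + 1, stride):
        for x in range(0, img_w - win + 1, stride):
            boxes.append((x, y, win, win))
    return boxes

# One small image, one window size, one stride already gives hundreds of
# crops, and a real detector must also sweep window sizes and image scales.
print(len(sliding_windows(224, 224, 64, 8)))  # 441 windows
```

Running a full CNN forward pass on each of these crops, at every scale, is what makes the naive sliding-window detector so expensive.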
SLIDE 23
Region Proposals
SLIDE 24
R-CNN
SLIDE 25
R-CNN
SLIDE 26
R-CNN
SLIDE 27
R-CNN
SLIDE 28 [Diagram: Image → Convolution and Pooling → Final conv feature map → Fully-connected layers → Class scores (1000 classes) → Softmax loss]
R-CNN Training
- Step 1: Train (or download) a classification model for ImageNet
(AlexNet)
SLIDE 29 [Diagram: Image → Convolution and Pooling → Final conv feature map → Fully-connected layers → Class scores (21 classes) → Softmax loss. Re-initialize the last layer: was 4096 x 1000, now will be 4096 x 21]
R-CNN Training
- Step 2: Fine-tune model for detection
- Instead of 1000 ImageNet classes, want 20 object classes + background
- Throw away final fully-connected layer, reinitialize from scratch
- Keep training model using positive / negative regions from detection images
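The layer surgery in Step 2 can be sketched as follows (a toy sketch; only the weight shapes matter here, and the random values are placeholders for pretrained weights):

```python
import numpy as np

rng = np.random.default_rng(0)

# The pretrained ImageNet model ends in a 4096 -> 1000 fully-connected layer.
W_imagenet_fc = rng.standard_normal((4096, 1000)) * 0.01

# For PASCAL-style detection we want 20 object classes + 1 background class.
# Throw the old layer away and reinitialize a fresh 4096 x 21 layer from
# scratch; every earlier layer keeps its pretrained weights and is fine-tuned.
num_det_outputs = 20 + 1
W_detection_fc = rng.standard_normal((4096, num_det_outputs)) * 0.01
```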
SLIDE 30 [Diagram: Image → Convolution and Pooling → pool5 features; Region Proposals → Crop + Warp → Forward pass → Save to disk]
R-CNN Training
Step 3: Extract features
- Extract region proposals for all images
- For each region: warp to CNN input size, run forward through CNN, save pool5 features to disk
- You need a big hard drive: the features are ~200 GB for the PASCAL dataset!
SLIDE 31 [Diagram: training image regions and their cached region features, split into positive and negative samples for the cat SVM]
R-CNN Training
- Step 4: Train one binary SVM per class to classify region features
SLIDE 32 Training image regions with cached region features; regression targets (dx, dy, dw, dh) in normalized coordinates:
- (0, 0, 0, 0): proposal is good
- (0.25, 0, 0, 0): proposal is too far to the left
- (0, 0, -0.125, 0): proposal is too wide
R-CNN Training
- Step 5 (bbox regression): For each class, train a linear regression model that maps cached features to offsets relative to the ground-truth boxes, to compensate for "slightly wrong" proposals
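One common way to parameterize these targets is the scheme from the R-CNN paper: center offsets normalized by the proposal size, plus log ratios for the scale change (the slide's examples use plain normalized offsets, which behave the same for small corrections). A minimal sketch, with boxes as (x, y, w, h) and (x, y) the top-left corner:

```python
import math

def regression_targets(proposal, gt):
    """R-CNN-style box regression targets (dx, dy, dw, dh)."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = gt
    # Compare box centers, normalized by the proposal width/height.
    dx = ((gx + gw / 2) - (px + pw / 2)) / pw
    dy = ((gy + gh / 2) - (py + ph / 2)) / ph
    # Scale changes as log ratios, so "no change" is exactly 0.
    dw = math.log(gw / pw)
    dh = math.log(gh / ph)
    return dx, dy, dw, dh

# A proposal shifted to the left of the ground truth by a quarter width:
print(regression_targets((0, 0, 100, 100), (25, 0, 100, 100)))  # dx = 0.25
```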
SLIDE 33 R-CNN: Problems
- Ad hoc training objectives
– Fine-tune network with softmax classifier (log loss)
– Train post-hoc linear SVMs (hinge loss)
– Train post-hoc bounding-box regressions (least squares)
- Training is slow (84h), takes a lot of disk space
- Inference (detection) is slow
– 47s / image with VGG16 [Simonyan & Zisserman, ICLR 2015]
– Fixed by SPP-net [He et al., ECCV 2014]
SLIDE 34
Fast R-CNN
SLIDE 35 Fast R-CNN
Share computation of convolutional layers between proposals for an image
SLIDE 36 Fast R-CNN: Region of Interest Pooling
[Diagram: hi-res input image (3 x 800 x 600) with region proposal → Convolution and Pooling → hi-res conv features (C x H x W) with region proposal → Fully-connected layers] Problem: the fully-connected layers expect low-res conv features (C x h x w).
SLIDE 37 Fast R-CNN: Region of Interest Pooling
[Diagram: hi-res input image (3 x 800 x 600) with region proposal → Convolution and Pooling → hi-res conv features (C x H x W) with region proposal → Fully-connected layers] First, project the region proposal onto the conv feature map. (The fully-connected layers expect low-res conv features, C x h x w.)
SLIDE 38 Fast R-CNN: Region of Interest Pooling
[Diagram: hi-res input image (3 x 800 x 600) with region proposal → Convolution and Pooling → hi-res conv features (C x H x W) with region proposal → Fully-connected layers] Next, divide the projected region into an h x w grid. (The fully-connected layers expect low-res conv features, C x h x w.)
SLIDE 39 Fast R-CNN: Region of Interest Pooling
[Diagram: hi-res input image (3 x 800 x 600) with region proposal → Convolution and Pooling → hi-res conv features (C x H x W) with region proposal → Fully-connected layers] Then, max-pool within each grid cell, producing RoI conv features of size C x h x w for the region proposal, exactly what the fully-connected layers expect.
SLIDE 40 Fast R-CNN: Region of Interest Pooling
[Diagram: hi-res input image (3 x 800 x 600) with region proposal → Convolution and Pooling → hi-res conv features (C x H x W) with region proposal → RoI conv features (C x h x w) → Fully-connected layers] We can backpropagate through this, similar to backpropagation through max pooling.
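The RoI pooling operation built up over the last few slides can be sketched in NumPy (a simplified sketch: the RoI is assumed to be already projected onto feature-map coordinates, and each grid cell is assumed non-empty):

```python
import numpy as np

def roi_pool(feat, roi, out_h, out_w):
    """Max-pool an RoI of a conv feature map down to a fixed (out_h, out_w) grid.

    feat: (C, H, W) conv feature map.
    roi:  (x1, y1, x2, y2) already projected into feature-map coordinates.
    """
    C = feat.shape[0]
    x1, y1, x2, y2 = roi
    region = feat[:, y1:y2, x1:x2]
    h, w = region.shape[1], region.shape[2]
    # Split the projected region into an out_h x out_w grid of cells.
    ys = np.linspace(0, h, out_h + 1).astype(int)
    xs = np.linspace(0, w, out_w + 1).astype(int)
    out = np.empty((C, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            cell = region[:, ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            out[:, i, j] = cell.max(axis=(1, 2))  # max-pool within the cell
    return out

feat = np.arange(1 * 8 * 8, dtype=float).reshape(1, 8, 8)
pooled = roi_pool(feat, (0, 0, 8, 8), 2, 2)
print(pooled.shape)  # (1, 2, 2): fixed size, whatever the RoI size was
```

Because every RoI comes out as the same C x h x w block, a single shared conv pass can feed many proposals into the same fully-connected layers.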
SLIDE 41
Fast R-CNN: RoI Pooling
SLIDE 42 Fast R-CNN
Share computation of convolutional layers between proposals for an image
SLIDE 43 R-CNN vs SPP vs Fast R-CNN
Problem: Runtime dominated by region proposals!
SLIDE 44 Faster R-CNN
– Solely CNN-based
– No external modules
Shaoqing Ren, Kaiming He, Ross Girshick, & Jian Sun. “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”. NIPS 2015
SLIDE 45 Faster R-CNN
Insert a Region Proposal Network (RPN) to predict proposals from the conv features. Train jointly with four losses:
– RPN classifies object / not object
– RPN regresses box coordinates
– Final classification score (object classes)
– Final box coordinates
SLIDE 46
Faster R-CNN: Make CNN do proposals!
SLIDE 47 Object Detection
Source: http://icml.cc/2016/tutorials/icml2016_tutorial_deep_residual_networks_kaiminghe.pdf
SLIDE 48
SLIDE 49
Semantic Segmentation
SLIDE 50 Semantic Segmentation Idea: Sliding Window
Problem: Very inefficient! Not reusing shared features between overlapping patches
SLIDE 51
Semantic Segmentation Idea: Fully Convolutional
SLIDE 52
Semantic Segmentation Idea: Fully Convolutional
SLIDE 53
Semantic Segmentation Idea: Fully Convolutional
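The fully convolutional idea, a stack of same-padded convolutions that keeps full spatial resolution and ends in a map of class scores with a per-pixel argmax, can be sketched naively in NumPy (a toy sketch: layer sizes and random weights are illustrative, and real networks add downsampling plus upsampling because full-resolution convolutions are expensive):

```python
import numpy as np

def conv2d_same(x, w):
    """Naive 'same'-padded 2D convolution: x is (Cin, H, W), w is (Cout, Cin, k, k)."""
    cout, cin, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    H, W = x.shape[1], x.shape[2]
    out = np.zeros((cout, H, W))
    for i in range(H):
        for j in range(W):
            patch = xp[:, i:i + k, j:j + k]
            # Contract (cin, k, k) of each filter against the patch.
            out[:, i, j] = np.tensordot(w, patch, axes=3)
    return out

rng = np.random.default_rng(0)
img = rng.standard_normal((3, 16, 16))        # input image, C x H x W
w1 = rng.standard_normal((8, 3, 3, 3)) * 0.1  # hidden conv layer, full resolution
w2 = rng.standard_normal((5, 8, 3, 3)) * 0.1  # final layer: 5 class-score maps
scores = conv2d_same(np.maximum(conv2d_same(img, w1), 0), w2)
seg = scores.argmax(axis=0)                    # per-pixel class prediction
print(scores.shape, seg.shape)  # (5, 16, 16) (16, 16)
```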
SLIDE 54
In-Network upsampling: “Unpooling”
SLIDE 55
In-Network upsampling: “Max Unpooling”
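Max unpooling pairs each unpooling layer with an earlier max-pooling layer and reuses the remembered argmax positions; a minimal 2D sketch:

```python
import numpy as np

def max_pool_with_indices(x, s=2):
    """s x s max pool over a 2D array that also remembers where each max came from."""
    H, W = x.shape
    out = np.zeros((H // s, W // s))
    idx = np.zeros((H // s, W // s, 2), dtype=int)
    for i in range(H // s):
        for j in range(W // s):
            block = x[i * s:(i + 1) * s, j * s:(j + 1) * s]
            k = np.unravel_index(block.argmax(), block.shape)
            out[i, j] = block[k]
            idx[i, j] = (i * s + k[0], j * s + k[1])
    return out, idx

def max_unpool(y, idx, shape):
    """Place each value back at the position its max came from; zeros elsewhere."""
    out = np.zeros(shape)
    for i in range(y.shape[0]):
        for j in range(y.shape[1]):
            r, c = idx[i, j]
            out[r, c] = y[i, j]
    return out

x = np.array([[1., 2., 6., 3.],
              [3., 5., 2., 1.],
              [1., 2., 2., 1.],
              [7., 3., 4., 8.]])
pooled, idx = max_pool_with_indices(x)       # 4x4 -> 2x2, indices saved
restored = max_unpool(pooled, idx, x.shape)  # 2x2 -> 4x4, sparse output
```

The upsampled output is mostly zeros, with each pooled value restored to the exact position of its original maximum.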
SLIDE 56
Learnable Upsampling: Transpose Convolution
SLIDE 57
Learnable Upsampling: Transpose Convolution
SLIDE 58 Learnable Upsampling: Transpose Convolution
Other names:
– Deconvolution (bad)
– Upconvolution
– Fractionally strided convolution
– Backward strided convolution
SLIDE 59
Transpose Convolution: 1D Example
SLIDE 60
Convolution as Matrix Multiplication (1D Example)
SLIDE 61
Convolution as Matrix Multiplication (1D Example)
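The 1D construction from these slides can be written out directly: build the matrix X whose rows are shifted copies of the kernel, so that X @ a is the ordinary strided convolution, and multiplying by X.T performs the transpose convolution (a minimal sketch with an illustrative kernel):

```python
import numpy as np

def conv_matrix(kernel, n, stride=1):
    """Build X such that X @ a equals the 1D strided convolution of a (no padding)."""
    k = len(kernel)
    rows = []
    for start in range(0, n - k + 1, stride):
        row = np.zeros(n)
        row[start:start + k] = kernel  # kernel slid to this position
        rows.append(row)
    return np.array(rows)

a = np.array([1., 2., 3., 4.])
x = np.array([1., 0., -1.])   # illustrative kernel
X = conv_matrix(x, len(a))
print(X @ a)                  # ordinary convolution: 4 inputs -> 2 outputs
print(X.T @ (X @ a))          # transpose convolution: 2 -> 4, upsampling
```

Multiplying by X.T scatters each input value back across a kernel-sized window, which is exactly the "fractionally strided" upsampling used in segmentation networks.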
SLIDE 62
Semantic Segmentation Idea: Fully Convolutional
SLIDE 63
Computer Vision Tasks