CNNs Applications M. Soleymani Sharif University of Technology - PowerPoint PPT Presentation

CNNs Applications. M. Soleymani, Sharif University of Technology, Spring 2019. Most slides have been adapted from the lectures of Fei-Fei Li and colleagues, cs231n, Stanford 2017.

AlexNet [Krizhevsky, Sutskever, Hinton, 2012] • ImageNet Classification with Deep Convolutional Neural Networks

Image classification

Other Computer Vision Tasks

Semantic Segmentation

Semantic Segmentation Idea: Sliding Window. Problem: Very inefficient! Shared features between overlapping patches are not reused.

Semantic Segmentation Idea: Fully Convolutional

In-Network upsampling: “Unpooling”

In-Network upsampling: “Max Unpooling”
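Max unpooling can be illustrated with a minimal sketch. It assumes 2x2 non-overlapping pooling on a single-channel feature map with even height and width, stored as nested lists. The function names `max_pool_2x2` and `max_unpool_2x2` are illustrative and do not come from the slides.

```python
def max_pool_2x2(x):
    """2x2 max pooling with stride 2; also returns the argmax position of each window.

    Assumes the height and width of x are even.
    """
    h, w = len(x), len(x[0])
    pooled, indices = [], []
    for i in range(0, h, 2):
        row_vals, row_idx = [], []
        for j in range(0, w, 2):
            # Candidate positions inside the current 2x2 window.
            window = [(i + di, j + dj) for di in range(2) for dj in range(2)]
            best = max(window, key=lambda p: x[p[0]][p[1]])
            row_vals.append(x[best[0]][best[1]])
            row_idx.append(best)
        pooled.append(row_vals)
        indices.append(row_idx)
    return pooled, indices


def max_unpool_2x2(y, indices, out_h, out_w):
    """Put each value back at the position remembered from pooling; zeros elsewhere."""
    out = [[0] * out_w for _ in range(out_h)]
    for row_vals, row_idx in zip(y, indices):
        for value, (r, c) in zip(row_vals, row_idx):
            out[r][c] = value
    return out


x = [[1, 2, 6, 3],
     [3, 5, 2, 1],
     [1, 2, 2, 1],
     [7, 3, 4, 8]]
pooled, idx = max_pool_2x2(x)
restored = max_unpool_2x2(pooled, idx, 4, 4)
```

In a real network the unpooling layer would receive the output of later layers rather than `pooled` itself; the remembered positions from the matching pooling layer are what make the upsampling place values where the maxima originally were.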

Learnable Upsampling: Transpose Convolution

Learnable Upsampling: Transpose Convolution. Other names: Deconvolution (bad), Upconvolution, Fractionally strided convolution, Backward strided convolution.

Transpose Convolution: 1D Example
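A 1D transpose convolution can be sketched in a few lines. The sketch assumes no padding and a default stride of 2; the function name `conv_transpose_1d` and the example numbers are illustrative. Each input value scales a copy of the filter, the copies are placed `stride` positions apart in the output, and overlapping contributions are summed.

```python
def conv_transpose_1d(inputs, kernel, stride=2):
    """1D transpose convolution without padding.

    Output length is (len(inputs) - 1) * stride + len(kernel).
    """
    out_len = (len(inputs) - 1) * stride + len(kernel)
    out = [0.0] * out_len
    for i, value in enumerate(inputs):
        for k, weight in enumerate(kernel):
            # The input value weights the filter; overlaps are summed.
            out[i * stride + k] += value * weight
    return out


# Input [a, b] = [1, 2], filter [x, y, z] = [1, 10, 100], stride 2.
result = conv_transpose_1d([1.0, 2.0], [1.0, 10.0, 100.0], stride=2)
```

With these numbers the middle output element receives contributions from both inputs (a·z + b·x), which is the overlap a stride smaller than the filter width produces.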

Semantic Segmentation Idea: Fully Convolutional

Computer Vision Tasks

Classification + Localization. Classification: C classes. Input: image. Output: class label (e.g., CAT). Evaluation metric: accuracy. Localization: Input: image. Output: box in the image (x, y, w, h). Evaluation metric: Intersection over Union. Classification + Localization: do both.
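Intersection over Union for two boxes in the (x, y, w, h) format can be sketched as follows. The sketch assumes (x, y) is the top-left corner of an axis-aligned box; the slide does not say which corner convention is used, and the function name `iou` is illustrative.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x, y, w, h).

    Assumes (x, y) is the top-left corner; w and h are non-negative.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```

A prediction that matches the ground-truth box exactly scores 1, and a prediction that does not overlap it at all scores 0.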

Classification + Localization Often pretrained on ImageNet (Transfer learning)

Simple Recipe for Classification + Localization • Step 1: Train (or download) a classification model (e.g., VGG). [Figure: image → convolution and pooling → final conv feature map → fully-connected layers → class scores → softmax loss]

Localization as Regression. Input: image. Neural net output: box coordinates (4 numbers). Correct output: box coordinates (4 numbers). Loss: L2 distance. Only one object, simpler than detection.

Simple Recipe for Classification + Localization • Step 1: Train (or download) a classification model (e.g., VGG) • Step 2: Attach new fully-connected “regression head” to the network. [Figure: image → convolution and pooling → final conv feature map, feeding two branches: fully-connected layers (“classification head”) → class scores, and fully-connected layers (“regression head”) → box coordinates]

Simple Recipe for Classification + Localization • Step 1: Train (or download) a classification model (e.g., VGG) • Step 2: Attach new fully-connected “regression head” to the network • Step 3: Train the regression head only with SGD and L2 loss. [Figure: image → convolution and pooling → final conv feature map, feeding two branches: fully-connected layers (“classification head”) → class scores, and fully-connected layers → box coordinates → L2 loss]

Simple Recipe for Classification + Localization • Step 1: Train (or download) a classification model (e.g., VGG) • Step 2: Attach new fully-connected “regression head” to the network • Step 3: Train the regression head only with SGD and L2 loss. [Figure: image → convolution and pooling → final conv feature map, feeding two branches: fully-connected layers → class scores → softmax loss, and fully-connected layers → box coordinates → L2 loss]

Classification + Localization Often pretrained on ImageNet (Transfer learning)

Classification + Localization. Classification head: C numbers (one per class), softmax loss. Regression head, class agnostic: 4 numbers (one box); class specific: C x 4 numbers (one box per class); L2 loss. [Figure: image → convolution and pooling → final conv feature map, feeding two branches: fully-connected layers → class scores, and fully-connected layers → box coordinates]
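The difference between the two regression-head variants shows up in how the output is read. The sketch below assumes a class-specific head emits a flat list of C x 4 numbers ordered class by class, and that the box of the top-scoring class is the one used; both the layout and that selection rule are assumptions, as are the names `l2_loss` and `select_class_specific_box`.

```python
def l2_loss(predicted, target):
    """Squared L2 distance between predicted and correct box coordinates."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target))


def select_class_specific_box(class_scores, box_outputs):
    """Pick the 4 box numbers that belong to the top-scoring class.

    Assumes box_outputs holds C x 4 numbers laid out class by class.
    """
    num_classes = len(class_scores)
    assert len(box_outputs) == 4 * num_classes
    best = max(range(num_classes), key=lambda c: class_scores[c])
    return best, box_outputs[4 * best: 4 * best + 4]


# Three classes, so the class-specific head emits 3 x 4 = 12 numbers.
scores = [0.1, 2.0, -1.0]
boxes = [0, 0, 1, 1,
         10, 20, 30, 40,
         5, 5, 5, 5]
label, box = select_class_specific_box(scores, boxes)
loss = l2_loss(box, [12, 20, 30, 39])
```

A class-agnostic head would skip the selection step entirely, since it emits a single box of 4 numbers regardless of the predicted class.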

Aside: Human Pose Estimation

Where to attach the regression head? After last FC layer: DeepPose, R-CNN. After conv layers: Overfeat, VGG. [Figure: image → convolution and pooling → final conv feature map → fully-connected layers → class scores → softmax loss]

Object detection

Object Detection: Impact of Deep Learning

Object Detection as Regression? Each image needs a different number of outputs!

Object Detection as Classification: Sliding Window

Object Detection as Classification: Sliding Window Problem: Need to apply CNN to huge number of locations and scales, very computationally expensive!
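A rough count shows why the sliding-window approach is expensive. The sketch assumes square windows moved with a fixed stride; the image size, window sizes, and stride below are hypothetical values chosen for illustration, and `count_windows` is an illustrative name.

```python
def count_windows(image_w, image_h, window, stride):
    """Number of square window positions that fit fully inside the image."""
    if window > image_w or window > image_h:
        return 0
    across = (image_w - window) // stride + 1
    down = (image_h - window) // stride + 1
    return across * down


# Hypothetical setup: an 800 x 600 image, three window sizes, stride 16.
total = sum(count_windows(800, 600, size, stride=16) for size in (64, 128, 256))
```

Every counted position would need its own forward pass through the CNN, and covering more scales or aspect ratios multiplies the count further.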

Region Proposals

R-CNN Training • Step 1: Train (or download) a classification model for ImageNet (e.g., VGG). [Figure: image → convolution and pooling → final conv feature map → fully-connected layers → class scores (1000 classes) → softmax loss]

R-CNN Training • Step 2: Fine-tune model for detection. Instead of 1000 ImageNet classes, want 20 object classes + background. – Throw away final fully-connected layer, reinitialize from scratch: the layer was 4096 x 1000, now will be 4096 x 21. – Keep training model using positive / negative regions from detection images. [Figure: image → convolution and pooling → final conv feature map → fully-connected layers → class scores (21 classes) → softmax loss]
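Re-initializing the final layer amounts to creating a fresh 4096 x 21 weight matrix while every earlier layer keeps its pretrained weights. The sketch below shows only the new layer, in plain Python; the Gaussian initialization with standard deviation 0.01 and the name `init_linear` are assumptions and are not taken from the slides.

```python
import random


def init_linear(in_features, out_features, std=0.01, seed=0):
    """Create freshly initialized weights and biases for a fully-connected layer.

    Gaussian initialization with a small std is an assumed, common choice.
    """
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, std) for _ in range(out_features)]
               for _ in range(in_features)]
    biases = [0.0] * out_features
    return weights, biases


NUM_DETECTION_CLASSES = 20  # object classes; one extra output is for background

# The old 4096 x 1000 ImageNet layer is discarded and replaced by 4096 x 21.
new_weights, new_biases = init_linear(4096, NUM_DETECTION_CLASSES + 1)
```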

R-CNN Training • Step 3: Extract features. – Extract region proposals for all images. – For each region: warp to CNN input size, run forward through CNN, save pool5 features to disk. – Have a big hard drive: features are ~200GB for PASCAL dataset! [Figure: image → region proposals → crop + warp → convolution and pooling (forward pass) → pool5 features → save to disk]

R-CNN Training • Step 4: Train one binary SVM per class to classify region features. [Figure: training image regions and their cached region features, labeled as positive or negative samples for the cat SVM]

R-CNN Training • Step 5 (bbox regression): For each class, train a linear regression model to map from cached features to offsets to GT boxes, to make up for “slightly wrong” proposals. Regression targets (dx, dy, dw, dh) are in normalized coordinates. Examples from the figure (training image regions with cached region features): (0, 0, 0, 0) when the proposal is good; (.25, 0, 0, 0) when the proposal is too far to the left; (0, 0, -0.125, 0) when the proposal is too wide.
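The regression targets can be computed as in the sketch below. It assumes the parameterization used in the R-CNN paper, with boxes given as (center x, center y, width, height): center offsets are normalized by the proposal size, and size offsets are log ratios. The slide does not spell the formula out, so treat this as an assumption; `regression_targets` is an illustrative name.

```python
import math


def regression_targets(proposal, ground_truth):
    """Offsets (dx, dy, dw, dh) that map a proposal onto its ground-truth box.

    Boxes are (center_x, center_y, width, height). Assumed parameterization:
    center shifts divided by proposal size, log ratios for width and height.
    """
    px, py, pw, ph = proposal
    gx, gy, gw, gh = ground_truth
    dx = (gx - px) / pw
    dy = (gy - py) / ph
    dw = math.log(gw / pw)
    dh = math.log(gh / ph)
    return dx, dy, dw, dh


# Proposal shifted left of the ground truth by a quarter of its width.
too_far_left = regression_targets((50, 50, 40, 40), (60, 50, 40, 40))
# Proposal that already matches the ground truth.
good = regression_targets((50, 50, 40, 40), (50, 50, 40, 40))
```

Under this parameterization a proposal that is too wide gets a negative `dw`, because the ground-truth width is smaller than the proposal width.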

R-CNN: Problems • Ad hoc training objectives: – Fine-tune network with softmax classifier (log loss); – Train post-hoc linear SVMs (hinge loss); – Train post-hoc bounding-box regressions (least squares). • Training is slow (84h) and takes a lot of disk space. • Inference (detection) is slow: – 47s / image with VGG16 [Simonyan & Zisserman, ICLR15]; – Fixed by SPP-net [He et al., ECCV14].

Fast R-CNN

Fast R-CNN Share computation of convolutional layers between proposals for an image

Fast R-CNN: Region of Interest Pooling. Problem: Fully-connected layers expect low-res conv features: C x h x w. [Figure: hi-res input image (3 x 800 x 600, with region proposal) → convolution and pooling → hi-res conv features (C x H x W, with region proposal); fully-connected layers]

Fast R-CNN: Region of Interest Pooling. Project region proposal onto conv feature map. Problem: Fully-connected layers expect low-res conv features: C x h x w. [Figure: hi-res input image (3 x 800 x 600, with region proposal) → convolution and pooling → hi-res conv features (C x H x W, with region proposal); fully-connected layers]

Fast R-CNN: Region of Interest Pooling. Divide projected region into h x w grid. Problem: Fully-connected layers expect low-res conv features: C x h x w. [Figure: hi-res input image (3 x 800 x 600, with region proposal) → convolution and pooling → hi-res conv features (C x H x W, with region proposal); fully-connected layers]

Fast R-CNN: Region of Interest Pooling. Max-pool within each grid cell. Fully-connected layers expect low-res conv features: C x h x w. [Figure: hi-res input image (3 x 800 x 600, with region proposal) → convolution and pooling → hi-res conv features (C x H x W, with region proposal) → RoI conv features (C x h x w, for region proposal) → fully-connected layers]

Fast R-CNN: Region of Interest Pooling. Can back propagate similar to max pooling. Fully-connected layers expect low-res conv features: C x h x w. [Figure: hi-res input image (3 x 800 x 600, with region proposal) → convolution and pooling → hi-res conv features (C x H x W, with region proposal) → RoI conv features (C x h x w, for region proposal) → fully-connected layers]
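The divide-and-max-pool steps can be sketched for a single channel. The sketch assumes the region has already been projected onto the feature map and is given in feature-map cells as (x0, y0, x1, y1), with exclusive end coordinates. The floor/ceil bin boundaries are one common choice rather than something the slides specify, and `roi_max_pool` is an illustrative name.

```python
import math


def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool a projected region into a fixed out_h x out_w grid.

    feature_map is a single channel as nested lists; roi is (x0, y0, x1, y1)
    in feature-map cells, with x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Row range of grid cell i (floor/ceil keeps every cell non-empty).
        r_start = y0 + math.floor(i * roi_h / out_h)
        r_end = y0 + math.ceil((i + 1) * roi_h / out_h)
        row = []
        for j in range(out_w):
            c_start = x0 + math.floor(j * roi_w / out_w)
            c_end = x0 + math.ceil((j + 1) * roi_w / out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r_start, r_end)
                           for c in range(c_start, c_end)))
        pooled.append(row)
    return pooled


feature_map = [[1, 2, 3, 4, 5, 6],
               [7, 8, 9, 10, 11, 12],
               [13, 14, 15, 16, 17, 18],
               [19, 20, 21, 22, 23, 24]]
# A 3 x 5 region of the feature map pooled to a fixed 2 x 2 output.
pooled = roi_max_pool(feature_map, (1, 0, 6, 3), out_h=2, out_w=2)
```

Whatever the size of the projected region, the output is always out_h x out_w, which is what lets the fully-connected layers accept proposals of arbitrary shape. As with ordinary max pooling, the gradient flows only to the cell that produced each maximum.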

Fast R-CNN: RoI Pooling

Fast R-CNN Share computation of convolutional layers between proposals for an image

R-CNN vs SPP vs Fast R-CNN Problem: Runtime dominated by region proposals!

Faster R-CNN • Make the CNN do proposals! – Solely based on CNN – No external modules • Each step is end-to-end. Shaoqing Ren, Kaiming He, Ross Girshick, & Jian Sun. “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”. NIPS 2015.
