CS6501: Deep Learning for Visual Recognition Object Detection: RCNN, Fast-RCNN, Faster-RCNN

Today’s Class
• Object Detection
• The RCNN Object Detector (2014)
• The Fast RCNN Object Detector (2015)
• The Faster RCNN Object Detector (2015)
• YOLO (CVPR 2016)
• SSD (ECCV 2016)

Object Detection deer cat

Object Detection
• Class Scores (Fully Connected: 4096 to k): Deer: 0.9, Cat: 0.05, Umbrella: 0.01, …
• Box Coordinates (Fully Connected: 4096 to 4): (x, y, w, h)
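The two heads on this slide can be sketched as follows. This is only an illustration: the weights are random rather than trained, k = 3 is assumed to match the three example classes, and the names `W_cls`, `W_box`, and `softmax` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3                                  # assumed number of classes (deer, cat, umbrella)
feature = rng.standard_normal(4096)    # stand-in for the 4096-d CNN feature vector

# Two fully connected heads that share the same feature vector (random weights).
W_cls, b_cls = rng.standard_normal((k, 4096)) * 0.01, np.zeros(k)
W_box, b_box = rng.standard_normal((4, 4096)) * 0.01, np.zeros(4)

def softmax(z):
    z = z - z.max()                    # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

class_scores = softmax(W_cls @ feature + b_cls)  # shape (k,), sums to 1
box = W_box @ feature + b_box                    # shape (4,): (x, y, w, h)
```

A single 4-number box head only fits images with one object, which is the limitation the next slides illustrate.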

Object Detection Deer: (x, y, w, h) 4096 Cat: (x, y, w, h)

Object Detection Penguin: (x, y, w, h) Penguin: (x, y, w, h) 4096 Penguin: (x, y, w, h) Penguin: (x, y, w, h) …

Object Detection as Classification deer? CNN cat? background?

Object Detection as Classification with Sliding Window deer? CNN cat? background?
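A minimal sketch of the sliding-window idea, assuming a single fixed window size and stride. The function name `sliding_windows` and the toy image size are hypothetical; each yielded box would be cropped and sent to the CNN classifier (deer? cat? background?).

```python
def sliding_windows(img_w, img_h, win_w, win_h, stride):
    """Yield (x, y, w, h) boxes that cover the image with a fixed-size window."""
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield (x, y, win_w, win_h)

# Toy example: an 8x6 image, a 4x4 window, stride 2.
boxes = list(sliding_windows(img_w=8, img_h=6, win_w=4, win_h=4, stride=2))
print(len(boxes))  # one CNN evaluation per window
```

With several window sizes and aspect ratios, the number of windows, and so the number of CNN evaluations, grows quickly.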

Object Detection as Classification with Box Proposals

RCNN https://people.eecs.berkeley.edu/~rbg/papers/r-cnn-cvpr.pdf Rich feature hierarchies for accurate object detection and semantic segmentation. Girshick et al. CVPR 2014.

RCNN
First stage: generate category-independent region proposals.
• 2000 region proposals for every image
• Selective Search: combines the strength of both an exhaustive search and segmentation. Uijlings et al. IJCV 2013.

RCNN
First stage: generate category-independent region proposals.
• 2000 region proposals for every image
Second stage: extract a fixed-length feature vector from each region.
• A 4096-dimensional feature vector from each region proposal
• Arbitrary rectangles? A fixed-size input? Each region is warped to 227 x 227.
• CNN: 5 conv layers + 2 fully connected layers
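The warp step can be sketched as below, assuming nearest-neighbor sampling and boxes given as (x, y, w, h) in pixels. The helper `warp_region` is hypothetical and leaves out details of the original pipeline such as interpolation quality.

```python
import numpy as np

def warp_region(image, box, out_size=227):
    """Crop box = (x, y, w, h) from an H x W x C image and warp it to
    out_size x out_size with nearest-neighbor sampling (aspect ratio is not kept)."""
    x, y, w, h = box
    crop = image[y:y + h, x:x + w]
    rows = np.arange(out_size) * h // out_size  # source row for each output row
    cols = np.arange(out_size) * w // out_size  # source column for each output column
    return crop[rows[:, None], cols[None, :]]

image = np.zeros((300, 400, 3), dtype=np.uint8)     # dummy image
patch = warp_region(image, box=(50, 20, 120, 80))   # arbitrary rectangle
print(patch.shape)                                  # fixed-size CNN input
```

Every proposal, whatever its shape, becomes an input of the same size, so the CNN runs once per proposal.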

RCNN
First stage: generate category-independent region proposals.
• 2000 region proposals for every image
Second stage: extract a fixed-length feature vector from each region.
• A 4096-dimensional feature vector from each region proposal
Third stage: a set of class-specific linear SVMs.
• Each feature vector is scored by the linear SVMs: people? horse? background?
• Bounding box regression: from the proposal location, predict the object category and location (x, y, w, h)
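A sketch of the bounding box regression targets, assuming the scale-invariant parameterization from the R-CNN paper with (x, y) taken as the box center. The function names `box_targets` and `apply_targets` are hypothetical.

```python
import math

def box_targets(proposal, ground_truth):
    """Regression targets that map a proposal (x, y, w, h) onto a ground-truth box."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = ground_truth
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))

def apply_targets(proposal, t):
    """Invert box_targets: shift and rescale the proposal by predicted offsets."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = t
    return (px + pw * tx, py + ph * ty, pw * math.exp(tw), ph * math.exp(th))

proposal = (100.0, 100.0, 50.0, 40.0)
ground_truth = (110.0, 95.0, 60.0, 50.0)
t = box_targets(proposal, ground_truth)   # what the regressor is trained to predict
refined = apply_targets(proposal, t)      # recovers the ground-truth box
```

The regressor learns to predict `t` from the region's features, and `apply_targets` then refines the proposal location at test time.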

RCNN
• Simple and scalable.
• Improves mAP.
• A multistage pipeline.
• Training is expensive in space and time (features are extracted from each region proposal in each image and written to disk).
• Object detection is slow.
Fast-RCNN
• ?

Fast-RCNN Idea: No need to recompute features for every box independently https://arxiv.org/abs/1504.08083 Fast R-CNN. Girshick. ICCV 2015.

Fast-RCNN
• Process the whole image with several convolutional (conv) and max pooling layers to produce a conv feature map.
• A region of interest (RoI) pooling layer extracts a fixed-length feature vector from the region feature map.
• FC + softmax: K + 1 categories.
• FC + regressor: four real-valued numbers for each of the K object classes.
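RoI pooling can be sketched as below for a single-channel feature map. Assumptions: the RoI is already given in feature-map cells as (x, y, w, h), it is at least as large as the output grid, and bin edges are simply rounded down. The helper `roi_max_pool` is hypothetical.

```python
import numpy as np

def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool the RoI (x, y, w, h) into a fixed out_h x out_w grid."""
    x, y, w, h = roi
    region = feature_map[y:y + h, x:x + w]
    # Bin edges split the region into roughly equal sub-windows.
    ys = np.linspace(0, h, out_h + 1).astype(int)
    xs = np.linspace(0, w, out_w + 1).astype(int)
    out = np.empty((out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = region[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].max()
    return out

feature_map = np.arange(36).reshape(6, 6)        # conv feature map, computed once per image
pooled_a = roi_max_pool(feature_map, (1, 1, 4, 4))
pooled_b = roi_max_pool(feature_map, (0, 0, 6, 3))
print(pooled_a.shape, pooled_b.shape)            # same size for differently shaped RoIs
```

Both RoIs read from the same feature map, so the conv layers run once per image instead of once per box.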

RCNN vs Fast-RCNN Figure adapted from: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf

RCNN
• Simple and scalable.
• Improves mAP.
• A multistage pipeline.
• Training is expensive in space and time (features are extracted from each region proposal in each image and written to disk).
• Object detection is slow.
Fast-RCNN
• Higher mAP.
• Single stage, end-to-end training.
• No disk storage is required for feature caching.
• Proposals are the computational bottleneck in detection systems.
Faster-RCNN
• ?

Faster-RCNN Idea: Integrate the Bounding Box Proposals as part of the CNN predictions https://arxiv.org/abs/1506.01497 Ren et al. NIPS 2015.

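Faster-RCNN
Region Proposal Networks (RPN):
• Shared conv layers produce the feature map, which is also used by Fast-RCNN.
• An nxn conv layer is applied as an nxn sliding window over the feature map.
• k anchor boxes at each sliding-window position.
• cls layer (1x1 conv layer): 2k scores (object or not object).
• reg layer (1x1 conv layer): 4k coordinates (bounding box proposal).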
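A sketch of anchor generation, assuming anchors are centered on each sliding-window position and defined by a set of scales and aspect ratios. The stride, scales, ratios, and the helper `make_anchors` are illustrative assumptions, not values taken from the slide.

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride, scales, ratios):
    """Return an array of shape (feat_h * feat_w * k, 4) holding (cx, cy, w, h)
    anchors, where k = len(scales) * len(ratios)."""
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            # Center of this sliding-window position in image pixels.
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            for s in scales:
                for r in ratios:  # r = height / width; the area stays s * s
                    w, h = s / np.sqrt(r), s * np.sqrt(r)
                    anchors.append((cx, cy, w, h))
    return np.array(anchors)

scales, ratios = (64, 128), (0.5, 1.0, 2.0)
k = len(scales) * len(ratios)
anchors = make_anchors(feat_h=2, feat_w=3, stride=16, scales=scales, ratios=ratios)
print(anchors.shape, "cls outputs per position:", 2 * k, "reg outputs per position:", 4 * k)
```

For each anchor, the cls layer scores object versus not object, and the reg layer predicts four coordinate offsets relative to that anchor.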

RCNN vs Fast-RCNN Figure adapted from: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf

RCNN
• Simple and scalable.
• Improves mAP.
• A multistage pipeline.
• Training is expensive in space and time (features are extracted from each region proposal in each image and written to disk).
• Object detection is slow.
Fast-RCNN
• Higher mAP.
• Single stage, end-to-end training.
• No disk storage is required for feature caching.
• Proposals are the computational bottleneck in detection systems.
Faster-RCNN
• Compute proposals with a deep convolutional neural network, the Region Proposal Network (RPN).
• Merge RPN and Fast R-CNN into a single network, enabling nearly cost-free region proposals.
?

YOLO - You Only Look Once
Idea: No bounding box proposal. A single regression problem, straight from image pixels to bounding box coordinates and class probabilities.
• extremely fast
• reason globally
• learn generalizable representations
https://arxiv.org/abs/1506.02640 Redmon et al. CVPR 2016.

YOLO- You Only Look Once Divide the image into 7x7 cells. Each cell trains a detector. The detector needs to predict the object’s class distributions. The detector has 2 bounding-box predictors to predict bounding-boxes and confidence scores.
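A sketch of how one cell's prediction could be read out. The 7x7 grid and 2 box predictors come from the slide; the 20 classes, the memory layout of the output vector, and the helper `decode_cell` are assumptions made for illustration. Random numbers stand in for a real network output.

```python
import numpy as np

S, B, C = 7, 2, 20                     # grid size, boxes per cell, classes (C = 20 is assumed)
rng = np.random.default_rng(0)
pred = rng.random((S, S, B * 5 + C))   # stand-in for the network output

def decode_cell(pred, row, col):
    """Return (class_id, score, box) for the most confident box predictor of one cell."""
    cell = pred[row, col]
    boxes = cell[:B * 5].reshape(B, 5)   # each row: (x, y, w, h, confidence)
    class_probs = cell[B * 5:]           # one class distribution per cell
    best = boxes[:, 4].argmax()          # box predictor with the highest confidence
    class_id = class_probs.argmax()
    score = boxes[best, 4] * class_probs[class_id]  # class-specific confidence
    return int(class_id), float(score), boxes[best, :4]

class_id, score, box = decode_cell(pred, row=3, col=3)
print(pred.shape)
```

All cells are predicted in one forward pass, which is why no separate proposal stage is needed.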

SSD: Single Shot Detector
Idea: Similar to YOLO, but denser grid map, multiscale grid maps.
+ Data augmentation
+ Hard negative mining
+ Other design choices in the network.
Liu et al. ECCV 2016.
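Hard negative mining can be sketched as below, assuming a per-box loss is already available and that at most 3 negatives are kept per positive. The ratio is passed as a parameter and is an assumption here, as is the helper name `hard_negative_mask`.

```python
import numpy as np

def hard_negative_mask(losses, is_positive, neg_pos_ratio=3):
    """Keep all positives plus the highest-loss negatives,
    at most neg_pos_ratio negatives per positive."""
    num_neg = int(neg_pos_ratio * is_positive.sum())
    neg_losses = np.where(is_positive, -np.inf, losses)  # positives are never picked as negatives
    hardest = np.argsort(-neg_losses)[:num_neg]          # indices of the largest negative losses
    hardest = hardest[~is_positive[hardest]]             # guard for when negatives are scarce
    mask = is_positive.copy()
    mask[hardest] = True
    return mask

losses = np.array([0.1, 2.0, 0.3, 1.5, 0.05, 0.9, 0.2, 0.7])
is_positive = np.array([True, False, False, False, False, False, False, False])
mask = hard_negative_mask(losses, is_positive)  # boxes that contribute to the training loss
```

Most default boxes on a dense grid are background, so training on only the hardest negatives keeps the two classes balanced.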

Questions?
