Two-stage object detectors
CV3DST | Prof. Leal-Taixé

Types of object detectors
• One-stage detectors
  – Image → Feature extraction
  – Classification: class score (cat, dog, person)
  – Localization: bounding box (x, y, w, h)
• Two-stage detectors
  – Image → Feature extraction → Extraction of object proposals
  – Classification: class score (cat, dog, person)
  – Localization: refine bounding box (Δx, Δy, Δw, Δh)

Localization
• Bounding box regression
  – Image → Feature extraction (this time with a Convolutional Neural Network) → Output: box coordinates (x, y, w, h)
  – L2 loss function between the output and the ground truth box coordinates

Localization and classification
• Bounding box regression
  – Image → Convolutional Neural Network → Fully connected layers
  – Output: box coordinates (x, y, w, h), trained with an L2 loss
  – Output: class scores, trained with a softmax loss

Localization and classification
• Bounding box regression
  – Image → Convolutional Neural Network
  – Regression head → Output: box coordinates (x, y, w, h)
  – Classification head → Output: class scores
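
The two heads are trained with different losses: an L2 loss on the box coordinates and a softmax loss on the class scores. A minimal sketch of both, in plain Python. The function names and the tuple layout are illustrative assumptions, not taken from the slides.

```python
import math

def l2_loss(pred_box, gt_box):
    """Sum of squared differences between predicted and ground-truth (x, y, w, h)."""
    return sum((p - g) ** 2 for p, g in zip(pred_box, gt_box))

def softmax_loss(scores, gt_class):
    """Cross-entropy of the softmax over raw class scores."""
    m = max(scores)  # subtract the max score for numerical stability
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[gt_class]

# Regression head: predicted vs. ground-truth box
box_loss = l2_loss((48.0, 30.0, 100.0, 80.0), (50.0, 32.0, 96.0, 84.0))

# Classification head: raw scores for (cat, dog, person), ground truth is "dog"
cls_loss = softmax_loss([1.2, 3.1, 0.4], gt_class=1)
```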

Localization and classification
• It was typical to train the classification head first and then freeze the layers
• Then train the regression head
• At test time, we use both!
Sermanet et al., “Integrated Recognition, Localization and Detection using Convolutional Networks”, ICLR 2014

Overfeat
• Sliding window + box regression + classification
  – Image (221 x 221 x 3) → Convolutional Neural Network → Feature map (5 x 5 x 1024)
  – Outputs: boxes (1000 x 4) and class scores (1000)
Sermanet et al., “Integrated Recognition, Localization and Detection using Convolutional Networks”, ICLR 2014

Overfeat
• Sliding window + box regression + classification
  – Image (468 x 356 x 3)
  – We end up with many predictions and we have to combine them for a final detection (in Overfeat they have a greedy method)
Sermanet et al., “Integrated Recognition, Localization and Detection using Convolutional Networks”, ICLR 2014
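
To illustrate how many overlapping predictions can be reduced to a few final detections, here is a sketch of greedy non-maximum suppression. Note that this is a commonly used stand-in and not Overfeat's own greedy merging procedure. The corner format (x1, y1, x2, y2) and the threshold of 0.5 are assumptions made for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # keep a box only if it does not overlap too much with a better one
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)  # the second box overlaps the first and is dropped
```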

Overfeat
• In practice: use many sliding window locations and multiple scales
  – Window positions + score maps → Box regression outputs → Final predictions
Sermanet et al., “Integrated Recognition, Localization and Detection using Convolutional Networks”, ICLR 2014
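
A rough sketch of what "many sliding window locations and multiple scales" means in terms of enumeration. The 221-pixel window and the 468 x 356 image come from the slides. The stride and the scale factors are hypothetical values chosen only for illustration.

```python
def sliding_windows(img_w, img_h, win=221, stride=32, scales=(1.0, 0.75, 0.5)):
    """Yield (scale, x, y) for every window position at every image scale."""
    for s in scales:
        w, h = int(img_w * s), int(img_h * s)  # size of the rescaled image
        # scales where the image is smaller than the window yield nothing
        for y in range(0, h - win + 1, stride):
            for x in range(0, w - win + 1, stride):
                yield s, x, y

windows = list(sliding_windows(468, 356))
num_windows = len(windows)  # each window needs a class score and a box
```

Each window is classified and regressed independently here. The count grows quickly as the stride shrinks or more scales are added.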

Overfeat
• Sliding window + box regression + classification
  – Image (221 x 221 x 3) → Convolutional Neural Network → Feature map (5 x 5 x 1024)
  – Outputs: boxes (1000 x 4) and class scores (1000)
• What prevents us from dealing with any image size?
Sermanet et al., “Integrated Recognition, Localization and Detection using Convolutional Networks”, ICLR 2014

What about multiple objects?
• Localization: Regression
• How about detection?
  – 3 objects means having an output of 12 numbers (3 x 4)
  – 14 objects means having an output of 56 numbers (14 x 4)

What about multiple objects?
• Localization: Regression
• How about detection?
• Having a variable-sized output is not optimal for neural networks
• There are a couple of workarounds:
  – RNN: Romera-Paredes and Torr. Recurrent Instance Segmentation. ECCV 2016.
  – Set prediction: Rezatofighi, Kaskman, Motlagh, Shi, Cremers, Leal-Taixé, Reid. Deep Perm-Set Net: Learn to predict sets with unknown permutation and cardinality using deep neural networks. arXiv:1805.00613

Detection as classification?
• Localization: Regression
• How about detection?
  – Is this a flamingo? NO
  – Is this a flamingo? NO
  – Is this a flamingo? YES!

Detection as classification?
• Localization: Regression
• How about detection? Classification
• Problem:
  – Expensive to try all possible positions, scales and aspect ratios
  – How about trying only on a subset of boxes with most potential?
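
To get a feeling for why trying all positions, scales and aspect ratios is expensive, one can count the axis-aligned boxes that fit in an image. The sketch below assumes boxes aligned to whole pixels. The 468 x 356 size is the example image used in the Overfeat slides.

```python
def count_boxes(width, height):
    """Number of axis-aligned, pixel-aligned boxes inside a width x height image."""
    # choose a (left, right) column pair and a (top, bottom) row pair
    horizontal = width * (width + 1) // 2
    vertical = height * (height + 1) // 2
    return horizontal * vertical

n_boxes = count_boxes(468, 356)  # roughly 7 billion candidate boxes
```

Running a classifier on every one of these boxes is out of reach, which motivates evaluating only a small subset of promising boxes.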

Region proposals
• We have already seen a method that gives us “interesting” regions in an image that potentially contain an object
• Step 1: Obtain region proposals
• Step 2: Classify them

The R-CNN family

R-CNN
Girshick et al., “Rich feature hierarchies for accurate object detection and semantic segmentation”, CVPR 2014

R-CNN
• Warp each region proposal to a fixed size of 227 x 227
• Extract features
• Classification head
• Regression head to refine the bounding box location
Girshick et al., “Rich feature hierarchies for accurate object detection and semantic segmentation”, CVPR 2014
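
The warping step can be sketched as a crop followed by a resize that ignores the aspect ratio. The version below uses nearest-neighbour sampling on a plain list-of-rows image to stay dependency-free. The function name, the (x, y, w, h) box layout with (x, y) as the top-left corner, and the sampling scheme are assumptions for illustration, not R-CNN's actual implementation.

```python
def warp_region(image, box, out_size=227):
    """Crop box = (x, y, w, h) from image and resize it to out_size x out_size.

    image is a list of rows; (x, y) is the top-left corner of the box.
    The aspect ratio is not preserved, so the content gets stretched.
    """
    x, y, w, h = box
    out = []
    for r in range(out_size):
        src_row = image[y + r * h // out_size]  # nearest source row
        out.append([src_row[x + c * w // out_size] for c in range(out_size)])
    return out

# Toy 4 x 4 "image" whose pixel value encodes its position
toy = [[r * 4 + c for c in range(4)] for r in range(4)]
patch = warp_region(toy, box=(1, 1, 2, 2), out_size=4)
```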

R-CNN
• Training scheme:
  – 1. Pre-train the CNN on ImageNet
  – 2. Fine-tune the CNN for the classes the detector aims to classify (softmax loss)
  – 3. Train a linear Support Vector Machine classifier to classify image regions. One SVM per class! (hinge loss)
  – 4. Train the bounding box regressor (L2 loss)
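
For step 4, the regressor does not predict absolute coordinates but offsets (Δx, Δy, Δw, Δh) relative to the proposal. The sketch below follows the parameterization described in the R-CNN paper, with boxes given as (centre x, centre y, width, height). The function names are illustrative.

```python
import math

def encode_deltas(proposal, gt):
    """Regression targets (dx, dy, dw, dh) mapping a proposal onto a ground-truth box."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = gt
    # centre shifts are scale-invariant, sizes are regressed in log space
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))

def apply_deltas(proposal, deltas):
    """Refine a proposal with predicted (dx, dy, dw, dh)."""
    px, py, pw, ph = proposal
    dx, dy, dw, dh = deltas
    return (px + pw * dx, py + ph * dy, pw * math.exp(dw), ph * math.exp(dh))

proposal = (100.0, 80.0, 50.0, 40.0)
gt = (110.0, 76.0, 60.0, 30.0)
targets = encode_deltas(proposal, gt)      # what the L2 loss is computed against
refined = apply_deltas(proposal, targets)  # recovers the ground-truth box
```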

R-CNN
• PROS:
  – The pipeline of proposals, feature extraction and SVM classification is well-known and tested. Only the features are changed (CNN instead of HOG).
  – The CNN summarizes each proposal into a 4096-dimensional vector (a much more compact representation compared to HOG)
  – Leverage transfer learning: the CNN can be pre-trained for image classification with C classes. One only needs to change the FC layers to deal with Z classes.

R-CNN
• CONS:
  – Slow! 47 s/image with a VGG16 backbone. Around 2000 proposals are considered per image, and each needs to be warped and forwarded through the CNN. (Let us try to solve this first.)
  – Training is also slow and complex
  – The object proposal algorithm is fixed. Feature extraction and the SVM classifier are trained separately → not exploiting learning to its full potential.
