Fast R-CNN
Ross Girshick, Facebook AI Research (FAIR)

Reproducible research – get the code! http://git.io/vBqm5
Fast R-CNN
Ross Girshick
Facebook AI Research (FAIR)
Work done at Microsoft Research

Fast Region-based ConvNets (R-CNNs) for Object Detection
• Localization: Where?
• Recognition: What?
[Figure: example detections with scores: person: 0.992, horse: 0.993, car: 1.000, person: 0.979, dog: 0.997. Figure adapted from Kaiming He.]

Object detection renaissance (2013-present)
[Figure: PASCAL VOC mean Average Precision (mAP), 0% to 80%, by year, 2006 to 2016, split into "Before deep convnets" and "Using deep convnets". Annotated points: R-CNN v1 (+ Accurate, - Slow, - Inelegant) and Fast R-CNN (+ Accurate, + Fast, + Streamlined).]

Region-based convnets (R-CNNs)
• R-CNN (aka "slow R-CNN") [Girshick et al. CVPR14]
• SPP-net [He et al. ECCV14]

Slow R-CNN (Girshick et al. CVPR14)
Pipeline, from input to output:
1. Input image
2. Regions of Interest (RoIs) from a proposal method (~2k)
3. Warped image regions
4. Forward each region through the ConvNet
5. Classify regions with SVMs (post hoc component)
6. Apply bounding-box regressors (post hoc component)

What's wrong with slow R-CNN?
• Ad hoc training objectives
  • Fine-tune network with softmax classifier (log loss)
  • Train post-hoc linear SVMs (hinge loss)
  • Train post-hoc bounding-box regressors (squared loss)
• Training is slow (84h), takes a lot of disk space
• Inference (detection) is slow
  • 47s / image with VGG16 [Simonyan & Zisserman. ICLR15]
  • ~2000 ConvNet forward passes per image
  • Fixed by SPP-net [He et al. ECCV14]

SPP-net (He et al. ECCV14)
Pipeline, from input to output:
1. Input image
2. Forward the whole image through the ConvNet to get the "conv5" feature map of the image
3. Regions of Interest (RoIs) from a proposal method
4. Spatial Pyramid Pooling (SPP) layer
5. Fully-connected layers (FCs)
6. Classify regions with SVMs (post hoc component)
7. Apply bounding-box regressors (post hoc component)

What's good about SPP-net?
• Fixes one issue with R-CNN: makes testing fast
[Figure: the ConvNet is image-wise computation (shared); the FCs, SVMs, and Bbox reg are region-wise computation; the SVMs and Bbox reg are post hoc components.]

What's wrong with SPP-net?
• Inherits the rest of R-CNN's problems
  • Ad hoc training objectives
  • Training is slow (25h, though faster than slow R-CNN), takes a lot of disk space
• Introduces a new problem: cannot update parameters below the SPP layer during training

SPP-net: the main limitation (He et al. ECCV14)
[Figure: the ConvNet (13 layers) is frozen; the FCs (3 layers) are trainable; the SVMs and Bbox reg are post hoc components.]

Fast R-CNN
• Fast test-time, like SPP-net
• One network, trained in one stage
• Higher mean average precision than slow R-CNN and SPP-net

Fast R-CNN (test time)
Pipeline, from input to output:
1. Input image
2. Forward the whole image through the ConvNet to get the "conv5" feature map of the image
3. Regions of Interest (RoIs) from a proposal method
4. "RoI Pooling" (single-level SPP) layer
5. Fully-connected layers (FCs)
6. Two sibling outputs: a softmax classifier (Linear + softmax) and bounding-box regressors (Linear)
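The "RoI Pooling" step can be illustrated with a small pure-Python sketch. It max-pools one RoI of a single-channel feature map into a fixed grid and records which input won each bin. The function name `roi_max_pool`, the floor/ceil bin boundaries, and the 2x2 output size used in the example are assumptions made for illustration, not details taken from the slides or from the released code.

```python
def roi_max_pool(feature, roi, out_h, out_w):
    """Max-pool one RoI of a 2-D feature map into an out_h x out_w grid.

    feature: list of rows (one channel of a "conv5"-style feature map).
    roi: (r0, c0, r1, c1), rows r0..r1-1 and cols c0..c1-1, in feature-map
         coordinates (assumed to be already projected from image coordinates).
    Returns (pooled, argmax); argmax[i][j] is the (row, col) of the winner.
    """
    r0, c0, r1, c1 = roi
    h, w = r1 - r0, c1 - c0
    pooled = [[0.0] * out_w for _ in range(out_h)]
    argmax = [[None] * out_w for _ in range(out_h)]
    for i in range(out_h):
        # Bin i spans rows [floor(i*h/out_h), ceil((i+1)*h/out_h)) of the RoI.
        rs = r0 + (i * h) // out_h
        re = r0 + -(-((i + 1) * h) // out_h)
        for j in range(out_w):
            cs = c0 + (j * w) // out_w
            ce = c0 + -(-((j + 1) * w) // out_w)
            best = None
            for r in range(rs, re):
                for c in range(cs, ce):
                    if best is None or feature[r][c] > feature[best[0]][best[1]]:
                        best = (r, c)
            pooled[i][j] = feature[best[0]][best[1]]
            argmax[i][j] = best
    return pooled, argmax


feature_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pooled, argmax = roi_max_pool(feature_map, (0, 0, 4, 4), 2, 2)
print(pooled)
```

Whatever the RoI's size, the output grid has a fixed shape, which is what lets RoIs of different sizes feed the same fully-connected layers.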

Fast R-CNN (training)
• Multi-task loss: log loss + smooth L1 loss
• The ConvNet is trainable (unlike SPP-net's frozen ConvNet)
[Figure: ConvNet, then FCs, then two sibling outputs (Linear + softmax, and Linear), both feeding the multi-task loss.]
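A minimal sketch of a "log loss + smooth L1 loss" objective for a single RoI follows. The slide only names the two terms; the weighting factor `lam`, the convention that class 0 is background and contributes no box loss, and the function names are assumptions for illustration.

```python
import math


def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def multitask_loss(class_probs, true_class, pred_box, target_box, lam=1.0):
    """Log loss on the class plus smooth L1 on the box offsets for one RoI.

    class_probs: softmax output over classes (assumed: index 0 = background).
    pred_box, target_box: 4 box-regression values for the true class.
    lam: hypothetical weight balancing the two terms.
    """
    cls_loss = -math.log(class_probs[true_class])
    if true_class == 0:
        # Background RoIs have no ground-truth box, so no localization term.
        return cls_loss
    loc_loss = sum(smooth_l1(p - t) for p, t in zip(pred_box, target_box))
    return cls_loss + lam * loc_loss


loss = multitask_loss([0.2, 0.8], 1, [0.5, 0.0, 0.0, 2.0], [0.0, 0.0, 0.0, 0.0])
print(loss)
```

Because both terms are computed from the same forward pass, one backward pass updates the classifier, the regressors, and the shared layers together, which is what "one network, trained in one stage" relies on.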

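Obstacle #1: Differentiable RoI pooling
Region of Interest (RoI) pooling must be (sub-)differentiable to train conv layers
[Figure: two RoIs, r_0 and r_1, are RoI-pooled from the same feature map. Input x_23 is the max for output y_{0,2} of r_0 and for output y_{1,0} of r_1, so i*(0,2) = 23 and i*(1,0) = 23.]
$$\frac{\partial L}{\partial x_i} = \sum_r \sum_j \left[\, i = i^*(r, j) \,\right] \frac{\partial L}{\partial y_{rj}}$$
• $\partial L / \partial x_i$: partial for x_i
• $\sum_r \sum_j$: over regions r, locations j
• $[\, i = i^*(r, j) \,]$: max pooling "switch" (i.e. argmax back-pointer); 1 if r, j "pooled" input i; 0 o/w
• $\partial L / \partial y_{rj}$: partial from next layer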
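The backward rule on this slide can be sketched as a gradient scatter: each pooled output sends its gradient back to the one input that won the max, and inputs picked by several RoIs accumulate the sum. The function name and the data layout (one argmax grid and one gradient grid per RoI) are assumptions for illustration.

```python
def roi_pool_backward(feature_shape, argmaxes, grad_outputs):
    """Route pooled-output gradients back to the feature map.

    feature_shape: (rows, cols) of the feature map.
    argmaxes: per RoI r, a grid whose entry at location j is the (row, col)
              of the input that was the max, i.e. the back-pointer i*(r, j).
    grad_outputs: per RoI r, a grid of dL/dy_rj with the same layout.
    Returns dL/dx as a rows x cols grid.
    """
    rows, cols = feature_shape
    grad_input = [[0.0] * cols for _ in range(rows)]
    for argmax, grad_out in zip(argmaxes, grad_outputs):  # over regions r
        for i, argmax_row in enumerate(argmax):           # over locations j
            for j, (r, c) in enumerate(argmax_row):
                # The "switch" is 1 only for the winning input, so only that
                # input receives this output's gradient.
                grad_input[r][c] += grad_out[i][j]
    return grad_input


# Two RoIs whose first bin both selected input (1, 1).
argmaxes = [[[(1, 1), (0, 2)]], [[(1, 1), (2, 2)]]]
grad_outputs = [[[1.0, 2.0]], [[10.0, 20.0]]]
grad = roi_pool_backward((3, 3), argmaxes, grad_outputs)
print(grad[1][1])
```

In the example, input (1, 1) plays the role of x_23 on the slide: it is the max for one output in each of two RoIs, so it receives both gradients.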

Obstacle #2: efficient SGD steps
Slow R-CNN and SPP-net use region-wise sampling to make mini-batches
• Sample 128 example RoIs uniformly at random
• Examples will come from different images with high probability
[Figure: RoIs drawn from many different images form one SGD mini-batch.]

Obstacle #2: efficient SGD steps
Note the receptive field for one example RoI is often very large
• Worst case: the receptive field is the entire image
[Figure: an example RoI and the RoI's receptive field.]

Obstacle #2: efficient SGD steps
Worst case cost per mini-batch (crude model of computational complexity)
• (input size for Fast R-CNN) / (input size for slow R-CNN) = 128*600*1000 / (128*224*224) = 12x more computation than slow R-CNN
[Figure: an example RoI and the RoI's receptive field.]

Obstacle #2: efficient SGD steps
Solution: use hierarchical sampling to build mini-batches
• Sample a small number of images (2)
• Sample many examples from each image (64)
[Figure: sampled images feed one SGD mini-batch.]
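Hierarchical sampling can be sketched as a two-level draw: first images, then RoIs within each chosen image. The defaults of 2 images and 64 RoIs per image come from the slide; the function name, the dictionary layout, and uniform sampling within each image are assumptions for illustration.

```python
import random


def hierarchical_minibatch(rois_by_image, num_images=2, rois_per_image=64, rng=None):
    """Build a mini-batch of (image_id, roi) pairs by sampling images first.

    rois_by_image: dict mapping an image id to a list of candidate RoIs.
    rng: optional random.Random instance, for reproducibility.
    """
    rng = rng or random.Random()
    # Level 1: pick a small number of images.
    image_ids = rng.sample(sorted(rois_by_image), num_images)
    batch = []
    for image_id in image_ids:
        rois = rois_by_image[image_id]
        # Level 2: pick many RoIs from each chosen image (all of them if
        # the image has fewer candidates than requested).
        k = min(rois_per_image, len(rois))
        batch.extend((image_id, roi) for roi in rng.sample(rois, k))
    return batch


# Toy data: 10 images with 2000 placeholder RoI indices each.
proposals = {image_id: list(range(2000)) for image_id in range(10)}
batch = hierarchical_minibatch(proposals, rng=random.Random(0))
print(len(batch), len({image_id for image_id, _ in batch}))
```

The mini-batch still holds 128 RoIs, but they touch only 2 images, so only 2 images need a ConvNet forward and backward pass per SGD step.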

Obstacle #2: efficient SGD steps
Use the test-time trick from SPP-net during training
• Share computation between overlapping examples from the same image
[Figure: example RoIs 1, 2, and 3 in one image; the union of the RoIs' receptive fields is computed once (shared computation).]

Obstacle #2: efficient SGD steps
Cost per mini-batch compared to slow R-CNN (same crude cost model)
• (input size for Fast R-CNN) / (input size for slow R-CNN) = 2*600*1000 / (128*224*224) = 0.19x the computation of slow R-CNN (about 5x less)
[Figure: example RoIs 1, 2, and 3 in one image; the union of the RoIs' receptive fields is computed once (shared computation).]
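The crude cost model from these slides is plain arithmetic on input pixel counts, and the two ratios can be checked directly. The image size (600x1000), the warped region size (224x224), and the batch sizes are the slides' numbers; the helper name `input_pixels` is made up for this sketch.

```python
def input_pixels(num_inputs, height, width):
    """Crude cost proxy: total input pixels pushed through the ConvNet."""
    return num_inputs * height * width


# Slow R-CNN: 128 warped regions of 224x224 each.
slow_rcnn = input_pixels(128, 224, 224)

# Region-wise sampling, worst case: 128 RoIs from 128 different images,
# each RoI's receptive field covering a whole 600x1000 image.
region_wise = input_pixels(128, 600, 1000)

# Hierarchical sampling: the same 128 RoIs come from only 2 images.
hierarchical = input_pixels(2, 600, 1000)

worst_case_ratio = region_wise / slow_rcnn
hierarchical_ratio = hierarchical / slow_rcnn

print(f"region-wise sampling:  {worst_case_ratio:.2f}x slow R-CNN")
print(f"hierarchical sampling: {hierarchical_ratio:.2f}x slow R-CNN")
```

The two ratios differ by a factor of 64, which is simply the number of RoIs that share each image's computation under hierarchical sampling.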
