
Fast R-CNN. Ross Girshick, Facebook AI Research (FAIR)


  1. Reproducible research: get the code! http://git.io/vBqm5 Fast R-CNN. Ross Girshick, Facebook AI Research (FAIR). Work done at Microsoft Research.

  2. Fast Region-based ConvNets (R-CNNs) for Object Detection. Recognition (What?) and Localization (Where?). Example detections: person: 0.992, horse: 0.993, car: 1.000, person: 0.979, dog: 0.997. Figure adapted from Kaiming He.

  3. Object detection renaissance (2013-present). [Plot: PASCAL VOC mean Average Precision (mAP), 0% to 80%, by year, 2006 to 2016; "Before deep convnets" vs. "Using deep convnets".]

  4. Object detection renaissance (2013-present). [Same plot, with R-CNNv1 marked.]

  5. Object detection renaissance (2013-present). [Same plot, with R-CNNv1 and Fast R-CNN marked.] R-CNNv1: + Accurate, - Slow, - Inelegant. Fast R-CNN: + Accurate, + Fast, + Streamlined.

  6. Region-based convnets (R-CNNs) • R-CNN (aka “slow R-CNN”) [Girshick et al. CVPR14] • SPP-net [He et al. ECCV14]

  7. Slow R-CNN. Input image. Girshick et al. CVPR14.

  8. Slow R-CNN. Input image → Regions of Interest (RoIs) from a proposal method (~2k). Girshick et al. CVPR14.

  9. Slow R-CNN. Input image → Regions of Interest (RoIs) from a proposal method (~2k) → Warped image regions. Girshick et al. CVPR14.

  10. Slow R-CNN. Input image → Regions of Interest (RoIs) from a proposal method (~2k) → Warped image regions → Forward each region through ConvNet. Girshick et al. CVPR14.

  11. Slow R-CNN. Input image → Regions of Interest (RoIs) from a proposal method (~2k) → Warped image regions → Forward each region through ConvNet → Classify regions with SVMs (post hoc component). Girshick et al. CVPR14.

  12. Slow R-CNN. Input image → Regions of Interest (RoIs) from a proposal method (~2k) → Warped image regions → Forward each region through ConvNet → Classify regions with SVMs → Apply bounding-box regressors (both post hoc components). Girshick et al. CVPR14.

  13. What’s wrong with slow R-CNN?

  14. What’s wrong with slow R-CNN? • Ad hoc training objectives: fine-tune network with softmax classifier (log loss); train post-hoc linear SVMs (hinge loss); train post-hoc bounding-box regressors (squared loss)

  15. What’s wrong with slow R-CNN? • Ad hoc training objectives: fine-tune network with softmax classifier (log loss); train post-hoc linear SVMs (hinge loss); train post-hoc bounding-box regressors (squared loss) • Training is slow (84h), takes a lot of disk space

  16. What’s wrong with slow R-CNN? • Ad hoc training objectives: fine-tune network with softmax classifier (log loss); train post-hoc linear SVMs (hinge loss); train post-hoc bounding-box regressions (least squares) • Training is slow (84h), takes a lot of disk space • Inference (detection) is slow: 47s / image with VGG16 [Simonyan & Zisserman. ICLR15]; ~2000 ConvNet forward passes per image; fixed by SPP-net [He et al. ECCV14]

  17. SPP-net. Input image. He et al. ECCV14.

  18. SPP-net. Input image → Forward whole image through ConvNet → “conv5” feature map of image. He et al. ECCV14.

  19. SPP-net. Input image → Forward whole image through ConvNet → “conv5” feature map of image, with Regions of Interest (RoIs) from a proposal method. He et al. ECCV14.

  20. SPP-net. Input image → Forward whole image through ConvNet → “conv5” feature map of image, with Regions of Interest (RoIs) from a proposal method → Spatial Pyramid Pooling (SPP) layer. He et al. ECCV14.

  21. SPP-net. Input image → Forward whole image through ConvNet → “conv5” feature map of image, with Regions of Interest (RoIs) from a proposal method → Spatial Pyramid Pooling (SPP) layer → Fully-connected layers (FCs) → Classify regions with SVMs (post hoc component). He et al. ECCV14.

  22. SPP-net. Input image → Forward whole image through ConvNet → “conv5” feature map of image, with Regions of Interest (RoIs) from a proposal method → Spatial Pyramid Pooling (SPP) layer → Fully-connected layers (FCs) → Classify regions with SVMs → Apply bounding-box regressors (both post hoc components). He et al. ECCV14.

  23. What’s good about SPP-net? • Fixes one issue with R-CNN: makes testing fast. [Diagram: image-wise computation (shared): ConvNet; region-wise computation: FCs, SVMs, Bbox reg; SVMs and Bbox reg are post hoc components.]

  24. What’s wrong with SPP-net? • Inherits the rest of R-CNN’s problems: ad hoc training objectives; training is slow (25h), takes a lot of disk space

  25. What’s wrong with SPP-net? • Inherits the rest of R-CNN’s problems: ad hoc training objectives; training is slow (though faster), takes a lot of disk space • Introduces a new problem: cannot update parameters below SPP layer during training

  26. SPP-net: the main limitation. Frozen: ConvNet (13 layers). Trainable: FCs (3 layers). Post hoc components: SVMs, Bbox reg. He et al. ECCV14.

  27. Fast R-CNN • Fast test-time, like SPP-net

  28. Fast R-CNN • Fast test-time, like SPP-net • One network, trained in one stage

  29. Fast R-CNN • Fast test-time, like SPP-net • One network, trained in one stage • Higher mean average precision than slow R-CNN and SPP-net

  30. Fast R-CNN (test time). Input image → Forward whole image through ConvNet → “conv5” feature map of image, with Regions of Interest (RoIs) from a proposal method.

  31. Fast R-CNN (test time). Input image → Forward whole image through ConvNet → “conv5” feature map of image, with Regions of Interest (RoIs) from a proposal method → “RoI Pooling” (single-level SPP) layer.

  32. Fast R-CNN (test time). Input image → Forward whole image through ConvNet → “conv5” feature map of image, with Regions of Interest (RoIs) from a proposal method → “RoI Pooling” (single-level SPP) layer → Fully-connected layers (FCs) → Linear + softmax (softmax classifier).

  33. Fast R-CNN (test time). Input image → Forward whole image through ConvNet → “conv5” feature map of image, with Regions of Interest (RoIs) from a proposal method → “RoI Pooling” (single-level SPP) layer → Fully-connected layers (FCs) → two outputs: Linear + softmax (softmax classifier) and Linear (bounding-box regressors).
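
The RoI Pooling step in the test-time pipeline turns a variable-sized region of the feature map into a fixed-size grid by max-pooling over bins. The following is a minimal pure-Python sketch for a single channel, not the author's implementation. The function names, the inclusive `(y0, x0, y1, x1)` RoI convention, and the floor/ceil bin boundaries are assumptions made for illustration.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one RoI of a single-channel feature map into an out_h x out_w grid.

    feature_map: list of rows (H x W) of numbers.
    roi: (y0, x0, y1, x1) in feature-map coordinates, inclusive (assumed convention).
    """
    y0, x0, y1, x1 = roi
    roi_h, roi_w = y1 - y0 + 1, x1 - x0 + 1
    pooled = []
    for py in range(out_h):
        # Floor for the bin start, ceil for the bin end, so no bin is empty.
        ys = y0 + (py * roi_h) // out_h
        ye = y0 + ceil_div((py + 1) * roi_h, out_h)
        row = []
        for px in range(out_w):
            xs = x0 + (px * roi_w) // out_w
            xe = x0 + ceil_div((px + 1) * roi_w, out_w)
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled


# A 4 x 4 feature map with values 0..15, pooled to 2 x 2 for two different RoIs.
fmap = [[4 * y + x for x in range(4)] for y in range(4)]
whole = roi_max_pool(fmap, (0, 0, 3, 3), 2, 2)    # [[5, 7], [13, 15]]
corner = roi_max_pool(fmap, (1, 1, 3, 3), 2, 2)   # [[10, 11], [14, 15]]
```

Both RoIs produce a 2 x 2 output even though they cover 4 x 4 and 3 x 3 cells respectively, which is what lets the fully-connected layers accept regions of any size.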

  34. Fast R-CNN (training). ConvNet → FCs → two outputs: Linear + softmax, and Linear.

  35. Fast R-CNN (training). ConvNet → FCs → two outputs: Linear + softmax, and Linear. Multi-task loss: log loss + smooth L1 loss.

  36. Fast R-CNN (training). ConvNet → FCs → two outputs: Linear + softmax, and Linear. Multi-task loss: log loss + smooth L1 loss. Trainable: the whole network, including the ConvNet.
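
The multi-task loss named on the training slides adds a log loss on the class output to a smooth L1 loss on the box output. Below is a hedged sketch for a single RoI, using the usual smooth L1 definition (0.5x^2 for |x| < 1, |x| - 0.5 otherwise). The function names, the weight `lam`, and the convention that class 0 is background and contributes no box loss are assumptions for illustration; the slides do not specify them.

```python
import math


def smooth_l1(x):
    """Quadratic near zero, linear in the tails."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def multi_task_loss(class_probs, true_class, pred_box, target_box, lam=1.0):
    """Log loss on the class plus smooth L1 loss on the box offsets, for one RoI.

    class_probs: softmax output, one probability per class.
    true_class: index of the ground-truth class (0 = background, by assumption).
    pred_box, target_box: equal-length sequences of box regression values.
    lam: hypothetical weight balancing the two terms.
    """
    cls_loss = -math.log(class_probs[true_class])
    loc_loss = 0.0
    if true_class != 0:  # assumed: background RoIs have no box target
        loc_loss = sum(smooth_l1(p - t) for p, t in zip(pred_box, target_box))
    return cls_loss + lam * loc_loss
```

Because both terms come from the same forward pass, one backward pass updates the classifier, the regressor, and the shared layers together, in contrast to the separate post hoc stages of slow R-CNN and SPP-net.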

  37. Obstacle #1: Differentiable RoI pooling. Region of Interest (RoI) pooling must be (sub-)differentiable to train conv layers.

  38. Obstacle #1: Differentiable RoI pooling. Example: RoI pooling over regions r_0 and r_1 with i*(0,2) = 23 and i*(1,0) = 23, so input x_23 is pooled into both y_{0,2} and y_{1,0}.
      \[ \frac{\partial L}{\partial x_i} = \sum_r \sum_j \left[ i = i^*(r, j) \right] \frac{\partial L}{\partial y_{rj}} \]
      The left side is the partial for x_i; the sums run over regions r and locations j; [i = i*(r, j)] is the max pooling “switch” (i.e. argmax back-pointer), 1 if r, j “pooled” input i and 0 otherwise; ∂L/∂y_{rj} is the partial from the next layer.
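
The backward pass of RoI pooling routes each pooled output's gradient to the one input that won the max, and sums where several outputs picked the same input. A minimal sketch under assumed names and data layout: inputs are flattened to a single index i, and `argmax[r][j]` stores the back-pointer i*(r, j) recorded during the forward pass.

```python
def roi_pool_backward(num_inputs, argmax, grad_y):
    """Accumulate dL/dx_i over all regions r and output locations j.

    num_inputs: number of (flattened) feature-map inputs x_i.
    argmax: argmax[r][j] = i*(r, j), the input index chosen for output y_rj.
    grad_y: grad_y[r][j] = dL/dy_rj, the partial from the next layer.
    """
    grad_x = [0.0] * num_inputs
    for r, back_pointers in enumerate(argmax):
        for j, i_star in enumerate(back_pointers):
            # The "switch" is 1 only for the argmax input, so only it gets the gradient.
            grad_x[i_star] += grad_y[r][j]
    return grad_x


# Mirrors the slide's example: i*(0, 2) = 23 and i*(1, 0) = 23.
argmax = [[0, 5, 23], [23, 7, 8]]
grad_y = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
grad_x = roi_pool_backward(32, argmax, grad_y)
# grad_x[23] collects the gradients of both y_{0,2} and y_{1,0}.
```

Inputs that were never selected by any bin keep a zero gradient, which is why the operation is sub-differentiable rather than smooth.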

  39. Obstacle #2: efficient SGD steps. Slow R-CNN and SPP-net use region-wise sampling to make mini-batches • Sample 128 example RoIs uniformly at random • Examples will come from different images with high probability

  40. Obstacle #2: efficient SGD steps. Note the receptive field for one example RoI is often very large • Worst case: the receptive field is the entire image

  41. Obstacle #2: efficient SGD steps. Worst case cost per mini-batch (crude model of computational complexity): (input size for Fast R-CNN) / (input size for slow R-CNN) = 128*600*1000 / (128*224*224) = 12x more computation than slow R-CNN

  42. Obstacle #2: efficient SGD steps. Solution: use hierarchical sampling to build mini-batches

  43. Obstacle #2: efficient SGD steps. Solution: use hierarchical sampling to build mini-batches • Sample a small number of images (2)

  44. Obstacle #2: efficient SGD steps. Solution: use hierarchical sampling to build mini-batches • Sample a small number of images (2) • Sample many examples from each image (64)
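
Hierarchical sampling can be sketched in two steps: pick a few images first, then pick many RoIs from each. The version below is an illustration only; the function name, the dict-of-lists input, and uniform sampling within each image are assumptions, and it requires every image to have at least `rois_per_image` proposals.

```python
import random


def sample_minibatch(rois_by_image, num_images=2, rois_per_image=64, rng=random):
    """Build one SGD mini-batch of (image_id, roi) pairs.

    rois_by_image: dict mapping image_id -> list of candidate RoIs (hypothetical layout).
    rng: the random module or a random.Random instance, for reproducibility.
    """
    # Step 1: sample a small number of images.
    image_ids = rng.sample(sorted(rois_by_image), num_images)
    batch = []
    for image_id in image_ids:
        # Step 2: sample many example RoIs from each chosen image.
        rois = rng.sample(rois_by_image[image_id], rois_per_image)
        batch.extend((image_id, roi) for roi in rois)
    return batch


# Toy data: 10 images with 2000 placeholder proposals each.
proposals = {img: list(range(2000)) for img in range(10)}
batch = sample_minibatch(proposals, rng=random.Random(0))
```

With the defaults this yields 2 x 64 = 128 examples, the same mini-batch size as region-wise sampling, but drawn from only two images so their convolutional features can be shared.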

  45. Obstacle #2: efficient SGD steps. Use the test-time trick from SPP-net during training • Share computation between overlapping examples from the same image. [Figure: example RoIs 1, 2, and 3 and the union of the RoIs’ receptive fields (shared computation).]

  46. Obstacle #2: efficient SGD steps. Cost per mini-batch compared to slow R-CNN (same crude cost model): (input size for Fast R-CNN) / (input size for slow R-CNN) = 2*600*1000 / (128*224*224) = 0.19x the computation of slow R-CNN (about 5x less)
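
The two cost ratios quoted for Obstacle #2 can be checked with a few lines of arithmetic. This sketch only reproduces the slides' crude cost model, which counts input pixels per mini-batch; the function and variable names are illustrative.

```python
def input_pixels(num_inputs, height, width):
    """Crude cost model: total input pixels pushed through the ConvNet."""
    return num_inputs * height * width


# Slow R-CNN: 128 warped regions of 224 x 224 per mini-batch.
slow_rcnn = input_pixels(128, 224, 224)

# Fast R-CNN, worst case with region-wise sampling: 128 RoIs from 128 different
# 600 x 1000 images, each RoI's receptive field covering the entire image.
regionwise = input_pixels(128, 600, 1000)

# Fast R-CNN with hierarchical sampling: 128 RoIs drawn from only 2 images.
hierarchical = input_pixels(2, 600, 1000)

ratio_regionwise = regionwise / slow_rcnn        # about 12
ratio_hierarchical = hierarchical / slow_rcnn    # about 0.19
```

Moving from 128 images to 2 images per mini-batch cuts the modeled cost by a factor of 64, which is what takes the ratio from roughly 12x down to roughly 0.19x.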
