Deep Object Detection: Ali Farhadi, Mohammad Rastegari, CSE 576

Apr 01, 2024

SLIDE 1

Deep Object Detection

Ali Farhadi, Mohammad Rastegari
CSE 576

SLIDE 2

So Far

Backpropagation
Convolutional Neural Networks (CNN)
AlexNet
Deeper Architectures

[Figure: backpropagation diagram with layer activations A2, A3, A4, A5 and an output sized by the number of classes]

AlexNet layers: 11x11 conv, 96, /4, pool/2; 5x5 conv, 256, pool/2; 3x3 conv, 384; 3x3 conv, 384; 3x3 conv, 256, pool/2; fc, 4096; fc, 4096; fc, 1000

Revolution of Depth

AlexNet, 8 layers (ILSVRC 2012)
VGG, 19 layers (ILSVRC 2014)
ResNet, 152 layers (ILSVRC 2015)

[Figure: layer-by-layer diagrams of the three networks. VGG: stacked 3x3 conv blocks (64, 128, 256, 512, 512 channels, each ending in pool/2), then fc 4096, fc 4096, fc 1000. ResNet: 7x7 conv, 64, /2, pool/2, then repeated 1x1 / 3x3 / 1x1 conv blocks (64-64-256, 128-128-512, 256-256-1024, 512-512-2048), then ave pool, fc 1000.]

Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Re

Revolution of Depth

ImageNet classification top-5 error (%):

Year        Model       Depth        Error
ILSVRC'10               shallow      28.2
ILSVRC'11               shallow      25.8
ILSVRC'12   AlexNet     8 layers     16.4
ILSVRC'13               8 layers     11.7
ILSVRC'14   VGG         19 layers    7.3
ILSVRC'14   GoogleNet   22 layers    6.7
ILSVRC'15   ResNet      152 layers   3.57

SLIDE 3

Deep Learning Practical Tips

Use off-the-shelf architectures.
Verify the correctness of your network by training over a single batch.

– Overfits: good to go!
– Does not converge: something is wrong with the forward/backward functions or the data!

Use a proper learning rate regime.
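The single-batch check above can be sketched in a few lines. This is only an illustration, not the course's code: the model is a tiny logistic regression standing in for a real network, and the function name `train_on_single_batch`, the toy AND-gate batch, the learning rate, and the 0.1 loss threshold are all assumptions made for the example.

```python
import math
import random


def train_on_single_batch(batch, labels, lr=0.5, steps=2000, seed=0):
    """Run gradient descent on one fixed batch and return the loss per step."""
    rng = random.Random(seed)
    n_features = len(batch[0])
    w = [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
    b = 0.0
    n = len(batch)
    losses = []
    for _ in range(steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        loss = 0.0
        for x, y in zip(batch, labels):
            # Forward pass: linear score followed by a sigmoid.
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
            loss += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            # Backward pass: gradient of the cross-entropy loss.
            for j in range(n_features):
                grad_w[j] += (p - y) * x[j]
            grad_b += p - y
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
        losses.append(loss / n)
    return losses


# Toy batch (assumed for illustration): the AND gate, which is linearly separable.
batch = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
labels = [0, 0, 0, 1]
losses = train_on_single_batch(batch, labels)
overfits = losses[-1] < 0.1  # hypothetical threshold for "good to go"
```

If `overfits` is false on a batch this small, the forward pass, the gradients, or the data pipeline is the first place to look.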

SLIDE 4

Lr=0.1    Lr=0.01    Lr=0.001

SLIDE 5

Object Detection

SLIDE 6

Sliding Window

SLIDE 7

Sliding Window

SLIDE 8

Sliding Window

[Figure label: Number of Classes]
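A sliding-window detector scores a classifier on every window position, usually at several window sizes. The sketch below only enumerates the windows; the function names, window sizes, and stride are assumptions for illustration, and the classifier itself is left out.

```python
def sliding_windows(image_w, image_h, window_w, window_h, stride):
    """Yield (x, y, w, h) boxes left-to-right, top-to-bottom,
    keeping every window fully inside the image."""
    for y in range(0, image_h - window_h + 1, stride):
        for x in range(0, image_w - window_w + 1, stride):
            yield (x, y, window_w, window_h)


def count_windows(image_w, image_h, window_sizes, stride):
    """Total number of classifier evaluations over all window sizes."""
    return sum(
        1
        for (w, h) in window_sizes
        for _ in sliding_windows(image_w, image_h, w, h, stride)
    )


# Tiny example: 4x4 windows with stride 2 on an 8x6 image.
boxes = list(sliding_windows(8, 6, 4, 4, 2))

# A more realistic setting (assumed sizes): three scales on a 640x480 image.
total = count_windows(640, 480, [(64, 64), (128, 128), (256, 256)], 16)
```

Even with a coarse stride and only three square scales, `total` runs into the thousands, which is the cost that object proposals try to cut down.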

SLIDE 9

Object Proposal

SLIDE 10

Selective Search

Uijlings, Jasper RR, et al. "Selective search for object recognition." International Journal of Computer Vision (2013).

SLIDE 11

Girshick et al. [CVPR’14]

Region-Based CNN (R-CNN)

SLIDE 12

[Figure label: Reshape]

Object Detection by R-CNN

SLIDE 13

[Figure label: Reshape]

Object Detection by R-CNN

SLIDE 14

[Figure label: Reshape]

Object Detection by R-CNN

Depends on the region proposals.
Needs to apply the CNN ~2K times per image.
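The "Reshape" step and the per-proposal cost can be sketched as follows. This is a toy illustration under stated assumptions: the image is a 2D list of numbers, resizing uses nearest-neighbour sampling, and `fake_cnn` is a placeholder for the real network, not part of R-CNN.

```python
def crop_and_warp(image, box, out_size):
    """Crop box=(x, y, w, h) from a 2D image (list of rows) and resize it
    to out_size x out_size with nearest-neighbour sampling."""
    x, y, w, h = box
    return [
        [image[y + (i * h) // out_size][x + (j * w) // out_size]
         for j in range(out_size)]
        for i in range(out_size)
    ]


def fake_cnn(patch):
    """Placeholder for a CNN forward pass (assumption): sums the pixels."""
    return sum(sum(row) for row in patch)


# 4x4 toy image whose pixel at (row, col) holds row * 4 + col.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
proposals = [(0, 0, 4, 4), (1, 1, 2, 2), (2, 0, 2, 4)]

# One warp and one forward pass per proposal.
features = [fake_cnn(crop_and_warp(image, box, 2)) for box in proposals]
forward_passes = len(features)
```

With roughly 2K proposals per image, `forward_passes` is roughly 2K as well, which is why R-CNN is slow at test time.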

SLIDE 15

[Figure labels: Reshape, ROI Info, ROI Pooling]

Fast R-CNN
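ROI pooling lets Fast R-CNN run the convolutional layers once per image and then cut a fixed-size feature grid out of the shared feature map for each region. The sketch below is a minimal single-channel version; the ROI format (x0, y0, x1, y1 in feature-map cells, end-exclusive) and the bin-boundary rounding are assumptions chosen for the example.

```python
def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool the region roi=(x0, y0, x1, y1) of a 2D feature map
    into an out_h x out_w grid. x1 and y1 are exclusive."""
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Bin rows: floor of the start, ceiling of the end, so no bin is empty.
        r_start = y0 + (i * roi_h) // out_h
        r_end = y0 + -(-((i + 1) * roi_h) // out_h)
        row = []
        for j in range(out_w):
            c_start = x0 + (j * roi_w) // out_w
            c_end = x0 + -(-((j + 1) * roi_w) // out_w)
            row.append(max(
                feature_map[r][c]
                for r in range(r_start, r_end)
                for c in range(c_start, c_end)
            ))
        pooled.append(row)
    return pooled


# 4x4 toy feature map whose cell at (row, col) holds row * 4 + col.
feature_map = [[r * 4 + c for c in range(4)] for r in range(4)]
whole = roi_max_pool(feature_map, (0, 0, 4, 4), 2, 2)
corner = roi_max_pool(feature_map, (1, 1, 4, 4), 2, 2)
```

Both regions yield a 2x2 output regardless of their size, so the layers that follow can have a fixed input shape.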

SLIDE 16

[Figure labels: Reshape, ROI Info]

Fast R-CNN

SLIDE 17

Bounding Box Regression

SLIDE 18

Bounding Box Regression
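The slides do not spell out the regression targets. A common choice, used in the R-CNN paper, is to predict a scale-invariant shift of the box center and a log-scale change of its width and height. The sketch below assumes that parameterization and boxes given as (center_x, center_y, width, height); the function names are hypothetical.

```python
import math


def encode_box(proposal, ground_truth):
    """Regression targets (tx, ty, tw, th) that move proposal onto ground_truth."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = ground_truth
    return (
        (gx - px) / pw,      # center shift, in units of proposal width
        (gy - py) / ph,      # center shift, in units of proposal height
        math.log(gw / pw),   # log width ratio
        math.log(gh / ph),   # log height ratio
    )


def decode_box(proposal, deltas):
    """Apply predicted (tx, ty, tw, th) to a proposal to get the refined box."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = deltas
    return (px + tx * pw, py + ty * ph, pw * math.exp(tw), ph * math.exp(th))


proposal = (50.0, 50.0, 20.0, 40.0)
ground_truth = (55.0, 46.0, 30.0, 40.0)
deltas = encode_box(proposal, ground_truth)
refined = decode_box(proposal, deltas)  # recovers ground_truth up to rounding
```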

SLIDE 19

[Figure labels: Reshape, ROI Info]

Bbox Regression

Classification    Regression

SLIDE 20

[Figure label: Reshape]

Faster R-CNN

Classification    Regression

x1 y1 w1 h1
x2 y2 w2 h2
…
xk yk wk hk

Fewer proposals than Selective Search: 300 vs. 2000

SLIDE 21

Method         mAP    Sec/im
R-CNN          59.2   20
Fast R-CNN     68.4   2
Faster R-CNN   72.1   0.5

Pascal 2012

SLIDE 22

[Figure label: Reshape]

Direct Regression
No Proposal

We do not know the number of objects in an image.

SLIDE 23

SLIDE 24

SLIDE 25

SLIDE 26

[Figure label: Reshape]

YOLO

x,y,w,h,c    c1,c2,…,cN
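The output layout on this slide can be made concrete: each grid cell predicts one or more boxes (x, y, w, h, c), where c is a confidence, plus N class scores c1..cN. The grid size 7, the 2 boxes per cell, and the 20 classes used below come from the original YOLO paper's Pascal VOC setup, not from the slide, and the function names are hypothetical.

```python
def yolo_output_size(grid_size, boxes_per_cell, num_classes):
    """Length of the flat output vector: every cell predicts
    boxes_per_cell * (x, y, w, h, c) plus num_classes class scores."""
    return grid_size * grid_size * (boxes_per_cell * 5 + num_classes)


def split_cell_prediction(cell, boxes_per_cell, num_classes):
    """Split one cell's slice of the output into boxes and class scores."""
    boxes = [tuple(cell[5 * b:5 * b + 5]) for b in range(boxes_per_cell)]
    class_scores = list(cell[5 * boxes_per_cell:5 * boxes_per_cell + num_classes])
    return boxes, class_scores


# Assumed configuration: 7x7 grid, 2 boxes per cell, 20 classes.
output_size = yolo_output_size(7, 2, 20)

# Dummy values 0..29 standing in for one cell's predictions.
cell = list(range(30))
boxes, class_scores = split_cell_prediction(cell, 2, 20)
```

Because the output has a fixed size, the network sidesteps the "unknown number of objects" problem: low-confidence boxes are simply discarded afterwards.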

SLIDE 27

Method         mAP    Sec/im
R-CNN          59.2   20
Fast R-CNN     68.4   2
Faster R-CNN   72.1   0.5
YOLO           57.9   0.02

Pascal 2012

SLIDE 28

Source Code

Fast R-CNN

– https://github.com/mahyarnajibi/fast-rcnn-torch
– https://github.com/rbgirshick/fast-rcnn

YOLO

– https://github.com/pjreddie/darknet/blob/master/src/yolo.c
