From Rigid Templates to Grammars: Object Detection with Structured Models
Ross B. Girshick
Dissertation defense April 20, 2012
The question: What objects are where?
Why it matters: Intellectual curiosity - How do we extract this
person motorbike
Image credits: PASCAL VOC
overlap(B, B_gt) = |B ∩ B_gt| / |B ∪ B_gt|
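The overlap measure on this slide, the area of the intersection of two boxes divided by the area of their union, can be sketched as follows. This is a minimal illustration, not the evaluation code behind the talk: the function names are hypothetical, and boxes are assumed to be `(x1, y1, x2, y2)` tuples in continuous coordinates.

```python
def box_area(box):
    """Area of an axis-aligned box (x1, y1, x2, y2); zero if degenerate."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def overlap(a, b):
    """Intersection area divided by union area of boxes a and b."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = box_area((ix1, iy1, ix2, iy2))
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes shifted by 5 pixels: intersection 50, union 150.
detection = (0.0, 0.0, 10.0, 10.0)
ground_truth = (5.0, 0.0, 15.0, 10.0)
score = overlap(detection, ground_truth)
```

Under the PASCAL VOC detection protocol, a detection counts as correct when this ratio exceeds 0.5, so the example pair above (ratio 1/3) would not match.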
Image credit: http://i173.photobucket.com/albums/w78/yahoozy/MultipleExposures2.jpg
max_{z ∈ Z(x)} w · ψ(x, (y, z))
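Scoring with latent variables, as in the expression above, means maximizing a linear score w · ψ over a set Z of latent values. The sketch below assumes Z is a small finite set that can be enumerated; the names `latent_score` and `features_by_z` and the example latent values are hypothetical, and real detectors search far larger spaces with dynamic programming instead of enumeration.

```python
def dot(w, v):
    """Inner product of two equal-length sequences of numbers."""
    return sum(wi * vi for wi, vi in zip(w, v))


def latent_score(w, features_by_z):
    """Return (best score, best z) for the max over z of w . psi(x, z).

    features_by_z maps each candidate latent value z to its feature
    vector psi(x, z). Both are illustrative stand-ins.
    """
    best_z = max(features_by_z, key=lambda z: dot(w, features_by_z[z]))
    return dot(w, features_by_z[best_z]), best_z


w = [1.0, -2.0, 0.5]
features_by_z = {
    "left-facing": [1.0, 0.0, 2.0],   # score 1.0 + 0.0 + 1.0 = 2.0
    "right-facing": [3.0, 1.0, 0.0],  # score 3.0 - 2.0 + 0.0 = 1.0
}
score, z_star = latent_score(w, features_by_z)
```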
(“pushmi-pullyu” instead of horse) (right-facing horse)
(618 ms per image)
(454 ms per image)
[Figure: two precision-recall plots, PASCAL 2007 comp3, class: motorbike. Axes: recall vs. precision, 0.1 to 1. Plot 1: baseline (AP 48.7), cascade (AP 48.9). Plot 2: baseline (AP 48.7), cascade (AP 41.8).]
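The AP numbers in these plots summarize a precision-recall curve as a single value. A minimal sketch, assuming the 11-point interpolated average precision used by the PASCAL VOC 2007 development kit: the function name and the sample curve are made up for illustration, and the result is a fraction in [0, 1], whereas the slides report AP in percent.

```python
def average_precision_11pt(recall, precision):
    """11-point interpolated AP.

    For each threshold t in {0.0, 0.1, ..., 1.0}, take the maximum
    precision among curve points with recall >= t (0 if there are
    none), then average the 11 values.
    """
    total = 0.0
    for k in range(11):
        t = k / 10.0
        candidates = [p for r, p in zip(recall, precision) if r >= t]
        total += max(candidates) if candidates else 0.0
    return total / 11.0


# Hypothetical curve with four operating points.
recall = [0.2, 0.4, 0.6, 0.8]
precision = [1.0, 0.8, 0.6, 0.5]
ap = average_precision_11pt(recall, precision)
```

For this curve the 11 interpolated precisions sum to 6.8, giving an AP of about 0.618.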
Helmet,
Ski cap, no face, truncated
Pirate hat, dresses, long hair
Truncation, holding glass, heavy occlusion
[DT’05] AP 0.12
[FMR’08] AP 0.27
[FGMR’10] AP 0.36
[GFM voc-release4] AP 0.42
∈T
p = (x,y,l)
[Figure: Example detections and derived filters. Panels: Parts 1-6 (no occlusion); Parts 1-4 & occluder; Parts 1-2 & occluder. Filter labels: Subtype 1, Subtype 2; Part 1 through Part 6; Occluder.]
[Figure: training. Labels: Subtype 1, Subtype 2; Part 1 through Part 6; Occluder.]
s ∈ S(x)
max_{s ∈ S(x)} w · ψ(x, s)
max_{(ŷ, ẑ) ∈ Y×Z} [w · ψ(x, ŷ, ẑ) + …]
max_{ẑ ∈ Z} w · ψ(x, y, ẑ)
[Yu and Joachims]
Image pyramid HOG feature pyramid
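An image pyramid resamples the image at a sequence of scales, and a feature map is computed at each scale to form the feature pyramid. The sketch below covers only the scale schedule and the resulting feature-map sizes, not the HOG computation itself. All names and parameter values (`levels_per_octave=5`, `cell_size=8`, `min_side=40`) are illustrative assumptions, not settings taken from the talk.

```python
def pyramid_scales(height, width, levels_per_octave=5, min_side=40):
    """Scales 2 ** (-k / levels_per_octave), k = 0, 1, 2, ...

    The schedule stops before the shorter side of the resized image
    would drop below min_side pixels.
    """
    scales = []
    k = 0
    while True:
        s = 2.0 ** (-k / levels_per_octave)
        if min(height, width) * s < min_side:
            break
        scales.append(s)
        k += 1
    return scales


def feature_map_shape(height, width, scale, cell_size=8):
    """Rows and columns of cell_size x cell_size cells at this scale."""
    return (int(height * scale) // cell_size,
            int(width * scale) // cell_size)


scales = pyramid_scales(480, 640)
shapes = [feature_map_shape(480, 640, s) for s in scales]
```

With five levels per octave, every fifth level halves the resolution, so `scales[5]` is 0.5 and the feature map there has half as many cells per side as at `scales[0]`.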
[Figure labels: shoe, lower-part, person, eyes, face, legs, shoe, nose, mouth, pants, trunk, arms, person]
max_{s ∈ S(x)} [w · ψ(x, s) + L_margin(y, s)]
− max_{s ∈ S(x)} [w · ψ(x, s) − L_output(y, s)]
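The two bracketed terms above combine as a difference: a maximization with the margin loss added, minus a maximization with the output loss subtracted. The sketch below assumes this is the weak-label structural SVM loss and that the candidate set S is small enough to enumerate. The function names, candidates, and loss values are hypothetical.

```python
def dot(w, v):
    """Inner product of two equal-length sequences of numbers."""
    return sum(wi * vi for wi, vi in zip(w, v))


def wl_ssvm_loss(w, candidates, margin_loss, output_loss):
    """Loss-augmented max minus loss-penalized max over candidates.

    candidates is a list of (s, psi) pairs, where psi is the feature
    vector psi(x, s). margin_loss and output_loss map a candidate s to
    a non-negative number; the label y is baked into them here.
    """
    augmented = max(dot(w, psi) + margin_loss(s) for s, psi in candidates)
    penalized = max(dot(w, psi) - output_loss(s) for s, psi in candidates)
    return augmented - penalized


w = [1.0, 0.0]
candidates = [("good", [2.0, 0.0]), ("bad", [1.5, 1.0])]


def zero_one(s):
    """Illustrative 0/1 loss: 0 for the good output, 1 otherwise."""
    return 0.0 if s == "good" else 1.0


# augmented = max(2.0 + 0, 1.5 + 1) = 2.5
# penalized = max(2.0 - 0, 1.5 - 1) = 2.0
loss = wl_ssvm_loss(w, candidates, zero_one, zero_one)
```

The good output already scores highest, but only by 0.5, which is less than the margin of 1 demanded by the loss, so the example yields a positive loss of 0.5.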
0.45617 0.04390 0.02462 0.01339 0.00629 0.00556 0.00456 0.00391 0.00367 0.00353 0.00310 0.00063 0.00030 0.00020 0.00018 0.00018 0.00017 0.00014 0.00013 0.00011 0.00010 0.00010 0.00009 0.00009 0.00008 0.00008 0.00007 0.00006 0.00005 0.00004 0.00004 0.00003 0.00003 0.00003 0.00002 0.00002
[Figure: bar chart of Average Precision (axis 0 to 70) per class: aero, bike, bird, boat, bottle, bus, car, cat, chair, cow, table, dog, horse, motorbike, person, plant, sheep, sofa, train, tvmonitor, and mAP. Legend: LSVM CVPR star model MIX …+ORIENT …+CONTEXT. Values shown: 22.3 26.0 29.7 32.3 34.1]
[DT’05] 0.12
[FMR’08] 0.27
[FGMR’10] 0.36
[GFM voc-release4] 0.43
[GFM’11] 0.47
Prior work
max_{ŷ ∈ Y} [w · ψ(x, ŷ) + …]
[Tsochantaridis et al., Taskar et al.]
max_{s ∈ S(x)} [w · ψ(x, s) + L_margin(y, s)]
− max_{s ∈ S(x)} [w · ψ(x, s) − L_output(y, s)]
w
max_{s ∈ S(x)} [w · ψ(x, s) + L_margin(y, s)] − w · ψ(x, s(w_t))
Image pyramid HOG feature pyramid
[FMR’08]
HOG feature pyramid