Human Detection Greg Mori CMPT888
Outline
• Human detection in images
  – Histograms of Oriented Gradients (HOG)
    • Dalal and Triggs CVPR 2005
  – Latent SVM (L-SVM)
    • Part-based model
    • Felzenszwalb et al. CVPR 2008
• Human detection in videos
  – Cascade of boosted classifiers
    • Viola et al. ICCV 2003
  – Motion HOG
    • Dalal et al. ECCV 2006
HISTOGRAMS OF ORIENTED GRADIENTS FOR HUMAN DETECTION Slides from Navneet Dalal

Goals & Applications
Goal: Detect and localise people in images and videos
Applications:
• Images, films & multi-media analysis
• Pedestrian detection for smart cars
• Visual surveillance, behavior analysis

Difficulties
• Wide variety of articulated poses
• Variable appearance and clothing
• Complex backgrounds
• Unconstrained illumination
• Occlusions, different scales
• Video sequences involve motion of the subject, the camera and the objects in the background
Main assumption: upright, fully visible people

Static Feature Extraction
Input image, detection window
1) Normalise gamma
2) Compute gradients
3) Weighted vote in spatial & orientation cells
4) Contrast normalise over overlapping spatial cells
5) Collect HOGs over detection window
6) Linear SVM
Diagram labels: Cell, Block, Overlap of Blocks
Feature vector f = [ ..., ..., ... ]
N. Dalal and B. Triggs. Histograms of Oriented Gradients for Human Detection. CVPR, 2005

Overview of Learning Phase
Learning phase
1) Input: Annotations on training images
2) Create fixed-resolution normalised training image data set
3) Encode images into feature spaces
4) Learn binary classifier
5) Resample negative training images to create hard examples
6) Encode images into feature spaces
7) Learn binary classifier
Output: Object/Non-object decision
Retraining reduces false positives by an order of magnitude!
HOG Descriptors
Parameters
• Gradient scale
• Orientation bins
• Percentage of block overlap
Schemes
• RGB or Lab, colour/gray-space
• Block normalisation: L2-norm or L1-norm
Diagram labels: R-HOG/SIFT, C-HOG, Block, Cell, Center bin

Evaluation Data Sets
MIT pedestrian database
• Train: 507 positive windows; negative data unavailable
• Test: 200 positive windows; negative data unavailable
• Overall 709 annotations + reflections
INRIA person database
• Train: 1208 positive windows; 1218 negative images
• Test: 566 positive windows; 453 negative images
• Overall 1774 annotations + reflections
Overall Performance
Plot panels: MIT pedestrian database, INRIA person database
• R/C-HOG give near perfect separation on the MIT database
• They have 1-2 orders of magnitude fewer false positives than other descriptors
Performance on INRIA Database
Effect of Parameters
Gradient smoothing, σ
• Reducing gradient scale from 3 to 0 decreases false positives by 10 times
Orientation bins, β
• Increasing orientation bins from 4 to 9 decreases false positives by 10 times
Normalisation Method & Block Overlap
Normalisation method
• Strong local normalisation is essential
Block overlap
• Overlapping blocks improve performance, but descriptor size increases
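Local contrast normalisation of one block can be sketched as below. This is a minimal illustration, assuming the L2-norm form v / sqrt(||v||^2 + eps^2) described by Dalal and Triggs; the function name `l2_normalise` and the value of `eps` are assumptions made for the example, not taken from the slides.

```python
import math

def l2_normalise(block, eps=1e-5):
    """Scale a block descriptor (flat list of cell-histogram values) to roughly unit L2 length.

    eps is a small constant (value assumed here) that avoids division by zero
    for blocks that contain no gradient energy.
    """
    norm = math.sqrt(sum(v * v for v in block) + eps * eps)
    return [v / norm for v in block]

block = [3.0, 4.0, 0.0, 0.0]
normalised = l2_normalise(block)
print(normalised)                  # close to [0.6, 0.8, 0.0, 0.0]
print(l2_normalise([0.0, 0.0]))    # stays all zeros, no division error
```

Because blocks overlap, each cell contributes to several blocks and is normalised differently in each one, which is why the descriptor grows with the amount of overlap.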
Effect of Block and Cell Size
Trade off between need for local spatial invariance and need for finer spatial resolution
Descriptor Cues
Figure panels: Input example; Average gradients; Weighted pos wts; Weighted neg wts; Outside-in weights
• Most important cues are head, shoulder, leg silhouettes
• Vertical gradients inside a person are counted as negative
• Overlapping blocks just outside the contour are most important

Overview of Methodology
Detection Phase
1) Scan image(s) at all scales and locations
2) Extract features over windows
3) Run linear SVM classifier on all locations
4) Fuse multiple detections in 3-D position & scale space
5) Object detections with bounding boxes
Diagram labels: Scale-space pyramid, Detection window
Focus on building robust feature sets (static & motion)
Multi-Scale Object Localisation
1) Multi-scale dense scan of detection window
2) Clip detection score (bias, threshold)
3) Apply robust mode detection, like mean shift
4) Final detections
[Equations on this slide are not recoverable from the source.]
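The mode-detection step can be illustrated with a small mean shift routine. This is a sketch under stated assumptions, not the slide's formulation: it uses a Gaussian kernel with a single isotropic `bandwidth`, and the names `mean_shift_mode`, `detections` and `scores` are hypothetical. Each detection is a point in (x, y, log-scale) space weighted by its classifier score.

```python
import math

def mean_shift_mode(start, points, weights, bandwidth=1.0, iters=50):
    """Climb from `start` to a nearby mode of a weighted Gaussian kernel density.

    `start` should lie near at least one point, otherwise all kernel
    weights underflow and the update divides by zero.
    """
    mode = list(start)
    for _ in range(iters):
        num = [0.0] * len(mode)
        den = 0.0
        for p, w in zip(points, weights):
            d2 = sum((a - b) ** 2 for a, b in zip(p, mode))
            k = w * math.exp(-0.5 * d2 / bandwidth ** 2)
            den += k
            num = [n + k * a for n, a in zip(num, p)]
        mode = [n / den for n in num]   # weighted mean of the points
    return mode

# Hypothetical detections as (x, y, log_scale); three overlap, one is far away.
detections = [(10.0, 20.0, 0.0), (11.0, 21.0, 0.1), (9.0, 19.0, -0.1), (60.0, 40.0, 0.5)]
scores = [1.0, 1.0, 1.0, 1.0]
print(mean_shift_mode(detections[0], detections, scores, bandwidth=2.0))
```

Starting the climb from every detection and merging climbs that end at the same mode yields one fused detection per person; the distant fourth point has essentially no influence on the first cluster.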
Effect of Spatial Smoothing
• Spatial smoothing aspect ratio as per window shape, smallest sigma approx. equal to stride/cell size
• Relatively independent of scale smoothing, sigma equal to 0.4 to 0.7 octaves gives good results
Effect of Other Parameters
Different mappings
• Hard clipping of SVM scores gives better results than simple probabilistic mapping of these scores
Effect of scale-ratio
• Fine scale sampling helps improve recall
DETECTING HUMANS USING A PART‐BASED MODEL Felzenszwalb et al., A Discriminatively Trained, Multiscale, Deformable Part Model, CVPR 2008 Slides from Pedro Felzenszwalb
PASCAL Challenge
• ~10,000 images, with ~25,000 target objects
  - Objects from 20 categories (person, car, bicycle, cow, table...)
  - Objects are annotated with labeled bounding boxes
Why is it hard?
• Objects in rich categories exhibit significant variability
  - Photometric variation
  - Viewpoint variation
  - Intra-class variability
    - Cars come in a variety of shapes (sedan, minivan, etc.)
    - People wear different clothes and take different poses
We need rich object models
But this leads to difficult matching and training problems
Starting point: sliding window classifiers
Feature vector x = [ ... , ... , ... , ... ]
• Detect objects by testing each subwindow
  - Reduces object detection to binary classification
  - Dalal & Triggs: HOG features + linear SVM classifier
  - Previous state of the art for detecting people
Histogram of Gradient (HOG) features
• Image is partitioned into 8x8 pixel blocks
• In each block we compute a histogram of gradient orientations
  - Invariant to changes in lighting, small deformations, etc.
• Compute features at different resolutions (pyramid)
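The per-block histogram of gradient orientations can be sketched as follows. This is a simplified illustration, assuming centred differences, unsigned orientations over 0 to 180 degrees, 9 bins and hard binning (no interpolation between neighbouring bins); the function name `cell_histogram` and the tiny 4x4 patch are choices made for the example rather than details from the slides.

```python
import math

def cell_histogram(cell, n_bins=9):
    """Orientation histogram for one block of grayscale pixels (list of rows).

    Gradients use centred [-1, 0, 1] differences on interior pixels; each pixel
    votes its gradient magnitude into one unsigned (0-180 degree) orientation bin.
    """
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, len(cell) - 1):
        for x in range(1, len(cell[0]) - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist

# A vertical step edge: intensity jumps from 0 to 10 between columns 1 and 2.
patch = [[0, 0, 10, 10] for _ in range(4)]
print(cell_histogram(patch))   # all gradient energy (40.0) falls in bin 0
```

Adding a constant to every pixel of `patch` leaves the histogram unchanged, which is one concrete sense in which the feature tolerates changes in lighting.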
HOG Filters
• Array of weights for features in subwindow of HOG pyramid
• Score is dot product of filter and feature vector
Score of filter F at position p is $F \cdot \phi(p, H)$
$\phi(p, H)$ = concatenation of HOG features from the subwindow of HOG pyramid H specified by p
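The filter score is a plain dot product, which the sketch below spells out on a toy feature map. The nested-list layout `[row][col][orientation_bin]` and the names `filter_score`, `hog` and `F` are assumptions for illustration only.

```python
def filter_score(filter_weights, hog, top, left):
    """Dot product of a filter with the HOG subwindow whose top-left cell is (top, left).

    Both arguments are nested lists indexed [row][col][orientation_bin].
    """
    score = 0.0
    for r, filter_row in enumerate(filter_weights):
        for c, filter_cell in enumerate(filter_row):
            hog_cell = hog[top + r][left + c]
            score += sum(w * h for w, h in zip(filter_cell, hog_cell))
    return score

# Toy 2x3 HOG map with 2 orientation bins per cell, and a 1x2 filter.
hog = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [2.0, 0.0], [0.0, 3.0]],
]
F = [[[1.0, 0.0], [0.0, 1.0]]]
scores = [[filter_score(F, hog, top, left) for left in range(2)] for top in range(2)]
print(scores)   # [[2.0, 1.0], [0.0, 5.0]]
```

Evaluating the filter at every position of every pyramid level gives the response maps that a sliding window detector thresholds.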
Dalal & Triggs: HOG + linear SVMs
Figure labels: $\phi(p, H)$, $\phi(q, H)$
There is much more background than objects
Start with random negatives and repeat:
1) Train a model
2) Harvest false positives to define "hard negatives"
Typical form of a model
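The train-and-harvest loop can be sketched as below. To keep the example dependency-free, a perceptron stands in for the linear SVM; that substitution, the function names, the first-`n_initial` selection (in place of a random draw) and the toy 2-D features are all assumptions made for illustration.

```python
def train_perceptron(samples, labels, epochs=100):
    """Stand-in linear trainer (a perceptron, not an SVM); returns (weights, bias)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

def score(model, x):
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def mine_hard_negatives(positives, negative_pool, n_initial=2, rounds=3):
    """Retrain repeatedly, adding pool negatives that the current model scores as positive."""
    negatives = list(negative_pool[:n_initial])   # initial subset of the background pool
    model = None
    for _ in range(rounds):
        samples = positives + negatives
        labels = [1] * len(positives) + [-1] * len(negatives)
        model = train_perceptron(samples, labels)
        # False positives on the pool become "hard negatives" for the next round.
        hard = [x for x in negative_pool if score(model, x) > 0 and x not in negatives]
        if not hard:
            break
        negatives += hard
    return model, negatives

positives = [[2.0, 2.0], [3.0, 1.0]]
pool = [[-1.0, -1.0], [-2.0, 0.5], [1.0, -2.0], [0.5, 0.5]]
model, used = mine_hard_negatives(positives, pool)
print(len(used), [score(model, x) for x in pool])
```

In this toy run the first model misclassifies the background sample `[0.5, 0.5]`, so it is added as a hard negative and the retrained model rejects the whole pool while training on only three of the four negatives.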
Overview of our models
• Mixture of deformable part models
• Each component has global template + deformable parts
• Fully trained from bounding boxes alone
2 component bicycle model
Figure columns: root filters (coarse resolution), part filters (finer resolution), deformation models
Each component has a root filter $F_0$ and n part models $(F_i, v_i, d_i)$
Object hypothesis
$z = (p_0, \dots, p_n)$
$p_0$: location of root
$p_1, \dots, p_n$: location of parts
Score is sum of filter scores minus deformation costs
Figure labels: Image pyramid, HOG feature pyramid
Multiscale model captures features at two resolutions
Score of a hypothesis
\[ \mathrm{score}(p_0, \dots, p_n) = \underbrace{\sum_{i=0}^{n} F_i \cdot \phi(H, p_i)}_{\text{data term}} - \underbrace{\sum_{i=1}^{n} d_i \cdot (dx_i^2, dy_i^2)}_{\text{spatial prior}} \]
$F_i$: filters; $d_i$: deformation parameters; $(dx_i, dy_i)$: displacements
\[ \mathrm{score}(z) = \beta \cdot \Psi(H, z) \]
$\beta$: concatenation of filters and deformation parameters
$\Psi(H, z)$: concatenation of HOG features and part displacement features
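A small numeric sketch of this score follows. It assumes the filter responses $F_i \cdot \phi(H, p_i)$ have already been computed and that each part's displacement is measured from its anchor position; the function and argument names are hypothetical.

```python
def hypothesis_score(filter_responses, deformation_params, displacements):
    """Sum of filter responses minus quadratic deformation costs.

    filter_responses[i]   : response of filter i, for i = 0..n (root first)
    deformation_params[i] : (d_x, d_y) weights for part i + 1
    displacements[i]      : (dx, dy) offset of part i + 1 from its anchor
    """
    data_term = sum(filter_responses)
    spatial_prior = sum(
        d_x * dx ** 2 + d_y * dy ** 2
        for (d_x, d_y), (dx, dy) in zip(deformation_params, displacements)
    )
    return data_term - spatial_prior

# Root plus two parts: responses 2.0, 1.5, 0.5; the second part is displaced further.
s = hypothesis_score([2.0, 1.5, 0.5], [(0.1, 0.1), (0.2, 0.05)], [(1, 0), (2, -2)])
print(s)   # approximately 2.9 (data term 4.0 minus deformation cost 1.1)
```

Since the score is linear in the filters and deformation parameters, it can be written as the single dot product $\beta \cdot \Psi(H, z)$, which is what makes SVM-style training applicable.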
Matching
• Define an overall score for each root location
  - Based on best placement of parts
\[ \mathrm{score}(p_0) = \max_{p_1, \dots, p_n} \mathrm{score}(p_0, \dots, p_n) \]
• High scoring root locations define detections
  - "sliding window approach"
• Efficient computation: dynamic programming + generalized distance transforms (max-convolution)
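The maximisation over part placements can be illustrated in one dimension with a brute-force search. This sketch is quadratic in the number of positions and only shows what is being computed; the generalized distance transform mentioned above obtains the same maxima far more efficiently. The names `best_part_score`, `root_scores`, `anchors` and `d_params` are assumptions for the example.

```python
def best_part_score(responses, anchor, d):
    """Max over placements x of responses[x] - d * (x - anchor)**2 (1-D, brute force)."""
    return max(r - d * (x - anchor) ** 2 for x, r in enumerate(responses))

def root_scores(root_responses, part_responses, anchors, d_params):
    """Overall score of each root location: root response plus best placement of every part."""
    scores = []
    for p0, root in enumerate(root_responses):
        total = root
        for resp, v, d in zip(part_responses, anchors, d_params):
            total += best_part_score(resp, p0 + v, d)
        scores.append(total)
    return scores

root = [0.0, 1.0, 0.0, 0.0]
parts = [[0.0, 0.0, 3.0, 0.0]]   # one part with a strong response at x = 2
print(root_scores(root, parts, anchors=[0], d_params=[1.0]))   # [0.0, 3.0, 3.0, 2.0]
```

Root location 1 reaches a score of 3.0 even though the part's best response lies one cell away from its anchor, because the part can shift there at a deformation cost of 1.0.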