Human Detection, Greg Mori, CMPT 888: PowerPoint PPT Presentation



slide-1
SLIDE 1

Human Detection

Greg Mori
CMPT 888


slide-2
SLIDE 2

Outline

  • Human detection in images
    – Histograms of Oriented Gradients (HOG)
      • Dalal and Triggs, CVPR 2005
    – Latent SVM (L-SVM)
      • Part-based model
      • Felzenszwalb et al., CVPR 2008
  • Human detection in videos
    – Cascade of boosted classifiers
      • Viola et al., ICCV 2003
    – Motion HOG
      • Dalal et al., ECCV 2006

slide-3
SLIDE 3

HISTOGRAMS OF ORIENTED GRADIENTS FOR HUMAN DETECTION

Slides from Navneet Dalal


slide-4
SLIDE 4

Goals & Applications

Goal: Detect and localise people in images and videos

Applications:

  • Images, films & multi-media analysis
  • Pedestrian detection for smart cars
  • Visual surveillance, behavior analysis

slide-5
SLIDE 5

Difficulties

  • Wide variety of articulated poses
  • Variable appearance and clothing
  • Complex backgrounds
  • Unconstrained illumination
  • Occlusions, different scales
  • Video sequences involve motion of the subject, the camera and the objects in the background
  • Main assumption: upright, fully visible people

slide-6
SLIDE 6

Static Feature Extraction

Processing chain:

  • Input image
  • Normalise gamma
  • Compute gradients
  • Weighted vote in spatial & orientation cells
  • Contrast normalise over overlapping spatial cells
  • Collect HOGs over detection window
  • Linear SVM

Feature vector f = [..., ..., ...]

Figure labels: Detection window, Block, Cell, Overlap of Blocks

N. Dalal and B. Triggs. Histograms of Oriented Gradients for Human Detection. CVPR, 2005

slide-7
SLIDE 7

Overview of Learning Phase

Learning phase:

  • Input: Annotations on training images
  • Create fixed-resolution normalised training image data set
  • Encode images into feature spaces
  • Learn binary classifier
  • Resample negative training images to create hard examples
  • Encode images into feature spaces
  • Learn binary classifier
  • Object/Non-object decision

Retraining reduces false positives by an order of magnitude!

slide-8
SLIDE 8

HOG Descriptors

Parameters:

  • Gradient scale
  • Orientation bins
  • Percentage of block overlap

Schemes:

  • RGB or Lab, colour/gray-space
  • Block normalisation: L2-norm or L1-norm (the normalisation formulas did not survive extraction)

Figure labels: Cell, Block, Center bin, R-HOG/SIFT, C-HOG
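Block normalisation with the L2-norm can be sketched as follows. This is a minimal illustration, not the exact formula from the slide (which was lost): the small constant `eps` and the function name `l2_normalise` are assumptions introduced here to keep the division well defined.

```python
import math

def l2_normalise(block, eps=1e-5):
    """Scale a block's concatenated histogram values to (roughly) unit L2 length.

    `eps` is an assumed small constant that avoids division by zero
    for blocks with no gradient energy.
    """
    norm = math.sqrt(sum(v * v for v in block) + eps * eps)
    return [v / norm for v in block]
```

An L1 variant would divide by the sum of absolute values plus `eps` instead.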

slide-9
SLIDE 9

Evaluation Data Sets

MIT pedestrian database (overall 709 annotations + reflections)

  • Train: 507 positive windows; negative data unavailable
  • Test: 200 positive windows; negative data unavailable

INRIA person database (overall 1774 annotations + reflections)

  • Train: 1208 positive windows; 1218 negative images
  • Test: 566 positive windows; 453 negative images

slide-10
SLIDE 10

Overall Performance

Plot panels: MIT pedestrian database, INRIA person database

  • R/C-HOG give near perfect separation on MIT database
  • Have 1-2 orders of magnitude lower false positives than other descriptors

slide-11
SLIDE 11

Performance on INRIA Database

slide-12
SLIDE 12

Effect of Parameters

Gradient smoothing, σ

  • Reducing gradient scale from 3 to 0 decreases false positives by 10 times

Orientation bins, β

  • Increasing orientation bins from 4 to 9 decreases false positives by 10 times

slide-13
SLIDE 13

Normalisation Method & Block Overlap

Normalisation method

  • Strong local normalisation is essential

Block overlap

  • Overlapping blocks improve performance, but descriptor size increases

slide-14
SLIDE 14

Effect of Block and Cell Size

Trade off between need for local spatial invariance and need for finer spatial resolution

Figure labels: 128, 64

slide-15
SLIDE 15

Descriptor Cues

Figure panels: Input example; Average gradients; Weighted pos wts; Weighted neg wts; Outside-in weights

  • Most important cues are head, shoulder, leg silhouettes
  • Vertical gradients inside a person are counted as negative
  • Overlapping blocks just outside the contour are most important

slide-16
SLIDE 16

Overview of Methodology

Focus on building robust feature sets (static & motion)

Detection Phase:

  • Scan image(s) at all scales and locations
  • Extract features over windows
  • Run linear SVM classifier on all locations
  • Fuse multiple detections in 3-D position & scale space
  • Object detections with bounding boxes

Figure labels: Scale-space pyramid, Detection window

slide-17
SLIDE 17

Multi-Scale Object Localisation

Apply robust mode detection, like mean shift

Figure labels: Multi-scale dense scan of detection window; Clip Detection Score (Threshold, Bias); Final detections

(The equations on this slide did not survive extraction.)

slide-18
SLIDE 18

Effect of Spatial Smoothing

  • Spatial smoothing aspect ratio as per window shape, smallest sigma approx. equal to stride/cell size
  • Relatively independent of scale smoothing; a sigma within a narrow range of octaves gives good results (the two numeric bounds did not survive extraction)

slide-19
SLIDE 19

Effect of Other Parameters

Different mappings

  • Hard clipping of SVM scores gives better results than simple probabilistic mapping of these scores

Effect of scale-ratio

  • Fine scale sampling helps improve recall

slide-20
SLIDE 20

DETECTING HUMANS USING A PART-BASED MODEL

Felzenszwalb et al., A Discriminatively Trained, Multiscale, Deformable Part Model, CVPR 2008
Slides from Pedro Felzenszwalb


slide-21
SLIDE 21

PASCAL Challenge

  • ~10,000 images, with ~25,000 target objects
  • Objects from 20 categories (person, car, bicycle, cow, table...)
  • Objects are annotated with labeled bounding boxes
slide-22
SLIDE 22
slide-23
SLIDE 23

Why is it hard?

  • Objects in rich categories exhibit significant variability
  • Photometric variation
  • Viewpoint variation
  • Intra-class variability
  • Cars come in a variety of shapes (sedan, minivan, etc)
  • People wear different clothes and take different poses

We need rich object models.
But this leads to difficult matching and training problems.

slide-24
SLIDE 24

Starting point: sliding window classifiers

Feature vector x = [... , ... , ... , ... ]

  • Detect objects by testing each subwindow
  • Reduces object detection to binary classification
  • Dalal & Triggs: HOG features + linear SVM classifier
  • Previous state of the art for detecting people
slide-25
SLIDE 25

Histogram of Gradient (HOG) features

  • Image is partitioned into 8x8 pixel blocks
  • In each block we compute a histogram of gradient orientations
  • Invariant to changes in lighting, small deformations, etc.
  • Compute features at different resolutions (pyramid)
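The per-block histogram step above can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation: the 9 unsigned orientation bins, the centred [-1, 0, 1] gradient mask, magnitude-weighted voting without interpolation, and the function name `cell_histogram` are all assumptions made for the example.

```python
import math

def cell_histogram(cell, n_bins=9):
    """Histogram of gradient orientations for one block of grayscale pixels.

    `cell` is a list of rows (e.g. 8x8). Border pixels are skipped because
    the centred difference needs both neighbours.
    """
    rows, cols = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins                      # unsigned gradients: 0..180 degrees
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]    # horizontal centred difference
            gy = cell[y + 1][x] - cell[y - 1][x]    # vertical centred difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The trailing modulo guards against an angle that rounds to 180.0.
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist
```

A vertical edge puts all of its votes in the first bin, and a horizontal edge puts them in the middle bin, which is the sense in which the histogram summarises local shape.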
slide-26
SLIDE 26

HOG Filters

  • Array of weights for features in subwindow of HOG pyramid
  • Score is dot product of filter and feature vector

HOG pyramid H

Filter F

Score of F at position p is F · φ(p, H)

φ(p, H) = concatenation of HOG features from subwindow specified by p
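As a rough sketch of the filter score, the dot product can be evaluated at every placement of the filter over one level of the feature pyramid. The nested-list layout (a grid of per-block feature vectors) and the names `filter_score` and `score_map` are assumptions for illustration only.

```python
def filter_score(filt, features, top, left):
    """Dot product of a filter with the feature subwindow whose corner is (top, left).

    `filt` and `features` are 2-D grids whose entries are equal-length
    feature vectors (one vector per block).
    """
    score = 0.0
    for dy, filt_row in enumerate(filt):
        for dx, weights in enumerate(filt_row):
            block = features[top + dy][left + dx]
            score += sum(w * f for w, f in zip(weights, block))
    return score

def score_map(filt, features):
    """Score the filter at every position where it fits entirely inside the grid."""
    rows = len(features) - len(filt) + 1
    cols = len(features[0]) - len(filt[0]) + 1
    return [[filter_score(filt, features, y, x) for x in range(cols)]
            for y in range(rows)]
```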

slide-27
SLIDE 27

Dalal & Triggs: HOG + linear SVMs

Typical form of a model (figure labels: (p, H), (q, H))

There is much more background than objects

Start with random negatives and repeat:
1) Train a model
2) Harvest false positives to define “hard negatives”
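The train-and-harvest loop can be sketched generically. Everything named here is hypothetical: `train` and `score` stand for whatever learner is used (a linear SVM in the slides), and the toy threshold model exists only to make the example runnable. The slide starts from random negatives; the sketch takes the first few from the pool so the result is deterministic.

```python
def mine_hard_negatives(train, score, positives, negative_pool,
                        n_rounds=3, n_initial=2):
    """Alternate between training and harvesting false positives."""
    negatives = list(negative_pool[:n_initial])     # stand-in for a random sample
    model = train(positives, negatives)
    for _ in range(n_rounds):
        # Hard negatives: background examples the current model accepts.
        hard = [x for x in negative_pool
                if score(model, x) > 0 and x not in negatives]
        if not hard:
            break
        negatives.extend(hard)
        model = train(positives, negatives)
    return model, negatives

# Toy 1-D learner (an assumption for illustration): threshold midway
# between the smallest positive and the largest negative.
def train_threshold(positives, negatives):
    return (min(positives) + max(negatives)) / 2.0

def score_threshold(model, x):
    return x - model
```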

slide-28
SLIDE 28

Overview of our models

  • Mixture of deformable part models
  • Each component has global template + deformable parts
  • Fully trained from bounding boxes alone
slide-29
SLIDE 29

2 component bicycle model

Figure labels: root filters (coarse resolution), part filters (finer resolution), deformation models

Each component has a root filter F0 and n part models (Fi, vi, di)

slide-30
SLIDE 30

Object hypothesis

Figure labels: Image pyramid, HOG feature pyramid

Multiscale model captures features at two resolutions

Score is sum of filter scores minus deformation costs

p0: location of root
p1, ..., pn: location of parts
z = (p0, ..., pn)

slide-31
SLIDE 31

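Score of a hypothesis

\[ \mathrm{score}(p_0, \ldots, p_n) = \sum_{i=0}^{n} F_i \cdot \phi(H, p_i) - \sum_{i=1}^{n} d_i \cdot (dx_i^2, dy_i^2) \]

The first sum is the “data term” (filters F_i); the second is the “spatial prior” (deformation parameters d_i, displacements dx_i, dy_i).

\[ \mathrm{score}(z) = \beta \cdot \Psi(H, z) \]

  • β: concatenation of filters and deformation parameters
  • Ψ(H, z): concatenation of HOG features and part displacement features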
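A small numeric sketch of the hypothesis score, assuming the filter responses F_i · φ(H, p_i) have already been computed. The argument layout and the name `hypothesis_score` are assumptions for illustration.

```python
def hypothesis_score(filter_responses, deformations, displacements):
    """Sum of filter responses minus quadratic deformation costs.

    filter_responses: responses for i = 0..n (root filter first).
    deformations:     d_i = (a_i, b_i) for parts i = 1..n.
    displacements:    (dx_i, dy_i) for parts i = 1..n.
    """
    data_term = sum(filter_responses)
    spatial_cost = sum(a * dx * dx + b * dy * dy
                       for (a, b), (dx, dy) in zip(deformations, displacements))
    return data_term - spatial_cost
```

For example, responses of 2.0, 1.0 and 0.5 with deformation costs of 0.5 and 1.8 give a score of 1.2.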

slide-32
SLIDE 32

Matching

  • Define an overall score for each root location
  • Based on best placement of parts
  • High scoring root locations define detections
  • “sliding window approach”
  • Efficient computation: dynamic programming + generalized distance transforms (max-convolution)

\[ \mathrm{score}(p_0) = \max_{p_1, \ldots, p_n} \mathrm{score}(p_0, \ldots, p_n) \]

slide-33
SLIDE 33

Response of filter in l-th pyramid level (cross-correlation):

\[ R_l(x, y) = F \cdot \phi(H, (x, y, l)) \]

Transformed response:

\[ D_l(x, y) = \max_{dx, dy} \left( R_l(x + dx, y + dy) - d_i \cdot (dx^2, dy^2) \right) \]

  • Max-convolution, computed in linear time (spreading, local max, etc.)

Figure labels: input image, head filter
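The transformed response can be written directly from its definition. Note that this sketch is a brute-force search over a bounded displacement window, not the linear-time generalized distance transform the slide refers to; the `radius` bound and the name `transformed_response` are assumptions.

```python
def transformed_response(response, d, radius=2):
    """D(x, y) = max over (dx, dy) of R(x + dx, y + dy) - d . (dx^2, dy^2).

    `response` is a 2-D list R[y][x]; `d` is the pair of deformation
    weights. Displacements are limited to |dx|, |dy| <= radius.
    """
    rows, cols = len(response), len(response[0])
    out = [[float("-inf")] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < rows and 0 <= xx < cols:
                        cost = d[0] * dx * dx + d[1] * dy * dy
                        out[y][x] = max(out[y][x], response[yy][xx] - cost)
    return out
```

A strong response spreads to its neighbours with a penalty that grows with squared distance, which is what lets a part score well when it is slightly displaced from its anchor.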

slide-34
SLIDE 34

Figure: matching pipeline. Labels: model; feature map; feature map at twice the resolution; response of root filter; response of part filters; transformed responses; combined score of root locations; color encoding of filter response values

slide-35
SLIDE 35

Matching results

(after non-maximum suppression)
~1 second to search all scales
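The slide does not say how non-maximum suppression is performed. One common choice, shown here purely as an assumption, is greedy suppression by bounding-box overlap: keep the best-scoring box, drop boxes that overlap it too much, and repeat. The 0.5 overlap limit and the function names are illustrative.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def non_max_suppression(detections, max_overlap=0.5):
    """detections: list of (score, box). Returns the kept (score, box) pairs."""
    kept = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        if all(iou(box, kept_box) <= max_overlap for _, kept_box in kept):
            kept.append((score, box))
    return kept
```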

slide-36
SLIDE 36

Training

  • Training data consists of images with labeled bounding boxes.
  • Need to learn the model structure, filters and deformation costs.


slide-37
SLIDE 37

Latent SVM (MI-SVM)

Classifiers that score an example x using

\[ f_\beta(x) = \max_{z \in Z(x)} \beta \cdot \Phi(x, z) \]

β are model parameters, z are latent values

Training data

\[ D = ((x_1, y_1), \ldots, (x_n, y_n)), \quad y_i \in \{-1, 1\} \]

We would like to find β such that: y_i f_β(x_i) > 0

Minimize

\[ L_D(\beta) = \frac{1}{2} \|\beta\|^2 + C \sum_{i=1}^{n} \max(0, 1 - y_i f_\beta(x_i)) \]
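The objective can be evaluated directly when Z(x) is small enough to enumerate. In this sketch each example carries an explicit list of feature vectors Φ(x, z), one per latent value; that representation and the function names are assumptions for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def latent_score(beta, candidate_features):
    """f_beta(x): best score over all latent values z in Z(x)."""
    return max(dot(beta, phi) for phi in candidate_features)

def latent_svm_loss(beta, examples, C=1.0):
    """L_D(beta) = 0.5 * ||beta||^2 + C * sum of hinge losses.

    examples: list of (candidate_features, y) with y in {-1, +1}.
    """
    regulariser = 0.5 * dot(beta, beta)
    hinge = sum(max(0.0, 1.0 - y * latent_score(beta, feats))
                for feats, y in examples)
    return regulariser + C * hinge
```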

slide-38
SLIDE 38

Semi-convexity

  • Maximum of convex functions is convex
  • \( f_\beta(x) = \max_{z \in Z(x)} \beta \cdot \Phi(x, z) \) is convex in β
  • \( \max(0, 1 - y_i f_\beta(x_i)) \) is convex for negative examples

\[ L_D(\beta) = \frac{1}{2} \|\beta\|^2 + C \sum_{i=1}^{n} \max(0, 1 - y_i f_\beta(x_i)) \]

Convex if latent values for positive examples are fixed

slide-39
SLIDE 39

Latent SVM training

\[ L_D(\beta) = \frac{1}{2} \|\beta\|^2 + C \sum_{i=1}^{n} \max(0, 1 - y_i f_\beta(x_i)) \]

  • Convex if we fix z for positive examples
  • Optimization:
    • Initialize β and iterate:
      • Pick best z for each positive example
      • Optimize β via gradient descent with data-mining

slide-40
SLIDE 40

Training Models

  • Reduce to Latent SVM training problem
  • Positive example specifies some z should have high score
  • Bounding box defines range of root locations
  • Parts can be anywhere
  • This defines Z(x)
slide-41
SLIDE 41

Background

  • Negative example specifies no z should have high score
  • One negative example per root location in a background image
  • Huge number of negative examples
  • Consistent with requiring low false-positive rate
slide-42
SLIDE 42

Training algorithm, nested iterations

Inner loop:

  • Fix “best” latent values for positives
  • Harvest high scoring (x,z) pairs from background images
  • Update model using gradient descent
  • Throw away (x,z) pairs with low score

  • Sequence of training rounds
  • Train root filters
  • Initialize parts from root
  • Train final model
slide-43
SLIDE 43

Person model

Figure labels: root filters (coarse resolution), part filters (finer resolution), deformation models

slide-44
SLIDE 44

Person detections

Figure panels: high scoring true positives; high scoring false positives (not enough overlap)

slide-45
SLIDE 45

Quantitative results

  • 7 systems competed in the 2008 challenge
  • Out of 20 classes we got:
  • First place in 7 classes
  • Second place in 8 classes
  • Some statistics:
  • It takes ~2 seconds to evaluate a model in one image
  • It takes ~4 hours to train a model
  • MUCH faster than most systems.
slide-46
SLIDE 46

HUMAN DETECTION IN VIDEO


slide-47
SLIDE 47

Motion is Helpful!

  • Humans can perceive human figure presence and action in videos
    – Even solely from body joint positions
    – Even in clutter
  • Moving light displays
    – Johansson, Perception and Psychophysics 1973
    – Ideas used by Song et al. CVIU 2000


slide-48
SLIDE 48
slide-49
SLIDE 49
slide-50
SLIDE 50

CASCADE OF BOOSTED FEATURES FOR DETECTING PEDESTRIANS

Viola, Jones, and Snow, Detecting pedestrians using patterns of motion and appearance, ICCV 2003


slide-51
SLIDE 51

Viola-Jones

  • Viola-Jones face detector
    – Viola and Jones CVPR 2001
    – Window-scanning approach
  • Two nice ideas
    – Define many, efficient-to-compute features
      • AdaBoost to select good ones from them
    – Cascade architecture to quickly eliminate non-face sub-windows


slide-52
SLIDE 52

AdaBoost Algorithm

  • Given a set of “weak learners”

\[ h_i(x) \in \{+1, -1\} \]

  • Build “strong learner”

\[ h(x) = \sum_{t=1}^{T} \alpha_t h_t(x) \]

    – Greedy selection of weak learners
    – Each iteration, choose best weak learner
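The greedy selection can be sketched with the standard discrete AdaBoost update. The slides give only the form of the strong learner, so the reweighting rule, the choice of α_t = ½ ln((1 − ε)/ε), and the function names are assumptions taken from the textbook algorithm rather than from this deck.

```python
import math

def adaboost(samples, labels, weak_learners, n_rounds):
    """Greedily pick weak learners h(x) in {+1, -1}; returns [(alpha_t, h_t)]."""
    n = len(samples)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        def weighted_error(h):
            return sum(w for w, x, y in zip(weights, samples, labels) if h(x) != y)
        best = min(weak_learners, key=weighted_error)   # best weak learner this round
        err = weighted_error(best)
        if err >= 0.5:                                  # no better than chance: stop
            break
        err = max(err, 1e-10)                           # avoid log(0) for a perfect learner
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, best))
        # Increase the weight of misclassified samples, then renormalise.
        weights = [w * math.exp(-alpha * y * best(x))
                   for w, x, y in zip(weights, samples, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def strong_classify(ensemble, x):
    """Sign of h(x) = sum_t alpha_t * h_t(x)."""
    return 1 if sum(alpha * h(x) for alpha, h in ensemble) >= 0 else -1
```

With three simple learners, none of which is correct on more than two thirds of a small 1-D data set, three rounds are enough for the weighted vote to classify every sample correctly.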

slide-53
SLIDE 53

AdaBoost Algorithm

slide-54
SLIDE 54

Face Features

  • Features – Haar-like rectangle features
  • Each weak learner examines a single feature


slide-55
SLIDE 55

Integral Images

  • Fast computation of features possible using Integral Images
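A minimal sketch of why integral images make rectangle features cheap: after one pass over the image, the sum over any rectangle takes four lookups. The zero-padded layout and the particular two-rectangle feature are illustrative choices, and all function names are assumptions.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (zero-padded)."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for y in range(rows):
        for x in range(cols):
            ii[y + 1][x + 1] = img[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x]
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Sum over rows top..bottom-1 and columns left..right-1, in four lookups."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def two_rect_feature(ii, top, left, height, width):
    """Haar-like feature: left half of the rectangle minus its right half."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, top + height, left + half)
    right_sum = rect_sum(ii, top, left + half, top + height, left + 2 * half)
    return left_sum - right_sum
```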


slide-56
SLIDE 56

Cascade of Classifiers

  • Most image sub-windows don’t contain a face
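Because most sub-windows are negatives, a cascade rejects them as early as possible. The sketch below assumes each stage is a scoring function with a threshold; that representation and the name `cascade_classify` are illustrative, and the stages in a real detector would be boosted classifiers.

```python
def cascade_classify(stages, window):
    """Run stages in order and reject at the first failure.

    stages: list of (score_fn, threshold) pairs.
    Returns (accepted, number_of_stages_evaluated).
    """
    for n_evaluated, (score_fn, threshold) in enumerate(stages, start=1):
        if score_fn(window) < threshold:
            return False, n_evaluated       # early exit: later stages never run
    return True, len(stages)
```

The second return value makes the saving visible: an easy negative costs one stage, while only windows that pass every stage pay the full price.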

slide-57
SLIDE 57

Learned Classifier

  • First two weak learners chosen:

slide-58
SLIDE 58

And People?

  • Same algorithm, slightly different features
  • Diagonal to capture legs
  • Frame differencing for motion
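Frame differencing in its simplest form is the per-pixel absolute difference between two consecutive frames. This sketch shows only that basic operation; how the detector combines such difference images with rectangle features is not specified on the slide, and the name `frame_difference` is an assumption.

```python
def frame_difference(previous, current):
    """Per-pixel absolute difference of two equally sized grayscale frames."""
    return [[abs(c - p) for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(previous, current)]
```

Static background pixels come out as zero, so rectangle sums over the difference image respond mainly to moving regions.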


slide-59
SLIDE 59

MOTION HOG

Dalal, Triggs, and Schmid, Human Detection Using Oriented Histograms of Flow and Appearance, ECCV 2006

Slides from Navneet Dalal


slide-60
SLIDE 60

Motion HOG Processing Chain

  • Input image, Consecutive image
  • Normalise gamma & colour
  • Compute optical flow (Flow field, Magnitude of flow)
  • Compute differential flow (Differential flow X, Differential flow Y)
  • Accumulate votes for differential flow orientation over spatial cells
  • Normalise contrast within overlapping blocks of cells
  • Collect HOGs for all blocks over detection window

Figure labels: Detection windows, Block, Cell, Overlap of Blocks

slide-61
SLIDE 61

Overview of Feature Extraction

  • Appearance Channel: Input image, Static HOG Encoding
  • Motion Channel: Consecutive image(s), Motion HOG Encoding
  • Collect HOGs over detection window
  • Linear SVM
  • Object/Non-object decision

Data Set

  • Train: 5 DVDs, 182 shots, 5562 positive windows
  • Test 1: Same 5 DVDs, 50 shots, 1704 positive windows
  • Test 2: 6 new DVDs, 128 shots, 2700 positive windows

slide-62
SLIDE 62

Coding Motion Boundaries

Figure panels: First frame; Second frame; Estd. flow; Flow mag.; x-flow diff; y-flow diff; Avg. x-flow diff; Avg. y-flow diff

  • Treat the x- and y-flow components as independent images
  • Take their local gradients separately, and compute HOGs as in static images

Motion Boundary Histograms (MBH) encode depth and motion boundaries

slide-63
SLIDE 63

Coding Internal Dynamics

  • Ideally compute relative displacements of different limbs
  • Requires reliable part detectors
  • Parts are relatively localised in our detection windows
  • Allows different coding schemes based on fixed spatial differences

Internal Motion Histograms (IMH) encode relative dynamics of different regions
slide-64
SLIDE 64

IMH Continued

Simple difference

  • Take x, y differentials of flow vector images [Ix, Iy]
  • Variants may use larger spatial displacements while differencing, e.g. [1 0 0 0 -1]

Center cell difference

Wavelet-style cell differences

(The mask diagrams for these schemes did not survive extraction.)

slide-65
SLIDE 65

SUMMARY


slide-66
SLIDE 66

Summary

  • Large literature on human detection
    – These are a few widely used examples
  • Code is available
    – Ask me for reading list of others
  • Encode shape and motion
    – Gradient filters
    – Motion histograms
  • Encode spatial variability
    – Part-based models