SLIDE 1 Applied Bayesian
Hierarchical Models
Tutorial at CVPR 2012
Erik Sudderth
Brown University
Work by E. Sudderth, A. Torralba, W. Freeman, & A. Willsky
IJCV 2008: Describing Visual Scenes using Transformed Objects & Parts
CVPR 2006: Depth from Familiar Objects: A Hierarchical Model for 3D Scenes
NIPS 2005: Describing Visual Scenes using Transformed Dirichlet Processes
Building on work by Y. W. Teh, M. Jordan, M. Beal, & D. Blei
JASA 2006: Hierarchical Dirichlet Processes
SLIDE 2 Learning with Topic Models
model neural stochastic recognition nonparametric gradient dynamical Bayesian ...
Framework for unsupervised discovery of low-dimensional latent structure from bag of word representations
Algorithms Neuroscience Statistics Vision ...
- pLSA: Probabilistic Latent Semantic Analysis (Hofmann 2001)
- LDA: Latent Dirichlet Allocation (Blei, Ng, & Jordan 2003)
- HDP: Hierarchical Dirichlet Processes (Teh, Jordan, Beal, & Blei 2006)
SLIDE 3 Hierarchical Dirichlet Process
E[πj] = β
G0 ∼ DP(γ, H)
Gj ∼ DP(α, G0)
J groups of data: documents, images, ...
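The relation E[πj] = β can be checked numerically. The sketch below is a truncated approximation, not the tutorial's own code: it assumes a finite truncation level K, draws β by stick-breaking with concentration γ, and draws each group's weights πj from Dirichlet(αβ), the finite analogue of DP(α, β). The function names and the truncation are assumptions made for illustration.

```python
import random

def stick_breaking(gamma, K, rng):
    """Truncated stick-breaking weights beta_1..beta_K with concentration gamma."""
    weights, remaining = [], 1.0
    for _ in range(K - 1):
        b = rng.betavariate(1.0, gamma)   # fraction of the remaining stick
        weights.append(b * remaining)
        remaining *= 1.0 - b
    weights.append(remaining)             # leftover mass goes to the last component
    return weights

def group_weights(alpha, beta, rng):
    """Draw pi_j ~ Dirichlet(alpha * beta) by normalizing gamma variates."""
    # A tiny floor keeps the gamma shape parameter strictly positive.
    draws = [rng.gammavariate(max(alpha * b, 1e-12), 1.0) for b in beta]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(0)
beta = stick_breaking(gamma=2.0, K=10, rng=rng)
groups = [group_weights(alpha=5.0, beta=beta, rng=rng) for _ in range(2000)]
# Averaging pi_j over many groups should approximately recover beta.
mean_pi = [sum(g[k] for g in groups) / len(groups) for k in range(len(beta))]
```

Larger α concentrates each πj more tightly around the shared weights β, while smaller α lets groups differ more in which components they use.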
SLIDE 4
Chinese Restaurant Franchise
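The Chinese restaurant franchise is the sequential-sampling view of the HDP: each group (restaurant) seats customers at tables, and each new table orders a dish from a menu shared by all restaurants. The following is a minimal simulation sketch; the function names, parameter values, and data layout are hypothetical and chosen only for illustration.

```python
import random

def crp_pick(counts, concentration, rng):
    """Pick existing index k with prob. proportional to counts[k],
    or the new index len(counts) with prob. proportional to concentration."""
    weights = list(counts) + [concentration]
    return rng.choices(range(len(weights)), weights=weights)[0]

def franchise(group_sizes, alpha, gamma, rng):
    dish_tables = []        # number of tables, over all groups, serving each dish
    assignments = []        # per group: dish index for each customer
    for n in group_sizes:
        table_counts, table_dish, dishes = [], [], []
        for _ in range(n):
            t = crp_pick(table_counts, alpha, rng)
            if t == len(table_counts):           # open a new table
                k = crp_pick(dish_tables, gamma, rng)
                if k == len(dish_tables):        # order a brand-new dish
                    dish_tables.append(0)
                dish_tables[k] += 1
                table_counts.append(0)
                table_dish.append(k)
            table_counts[t] += 1
            dishes.append(table_dish[t])
        assignments.append(dishes)
    return assignments, dish_tables

assignments, dish_tables = franchise([30, 30, 30], alpha=1.0, gamma=1.0,
                                     rng=random.Random(0))
```

Because new tables choose dishes in proportion to how many tables already serve them, popular dishes (shared clusters) tend to be reused across groups.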
SLIDE 5 Local Visual Features: Superpixels
- Partition image into ~1,000 superpixels
- Goal: Reduce dimensionality, aggregate information spatially (hopefully not across object boundaries!)
Inspired by the successes of topic models for text data, some have proposed learning from local image features
SLIDE 6 Local Visual Features: Interest Regions
Affinely Adapted Harris Corners
Maximally Stable Extremal Regions
Linked Sequences of Canny Edges
- Some invariance to lighting & pose variations
- Dense, multiscale over-segmentation of image
SLIDE 7 A Discrete Feature Vocabulary
- Normalized histograms of orientation energy
- Dictionary via K-means; each feature maps to the nearest visual word
appearance of feature i in image j 2D position of feature i in image j
SIFT Descriptors
Lowe, IJCV 2004
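Building a dictionary with K-means and mapping each descriptor to its nearest visual word can be sketched as follows. This is an illustrative toy under stated assumptions: descriptors are short tuples rather than 128-dimensional SIFT vectors, and the function names are hypothetical rather than taken from the tutorial.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_word(descriptor, vocabulary):
    """Index of the closest dictionary entry (the visual word)."""
    return min(range(len(vocabulary)),
               key=lambda k: sq_dist(descriptor, vocabulary[k]))

def kmeans(descriptors, K, iters=10, seed=0):
    """Plain Lloyd iterations, initialized from K randomly chosen descriptors."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(descriptors, K)]
    for _ in range(iters):
        clusters = [[] for _ in range(K)]
        for d in descriptors:
            clusters[nearest_word(d, centers)].append(d)
        for k, members in enumerate(clusters):
            if members:  # keep the old center if a cluster is empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers

blob_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
blob_b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
vocabulary = kmeans(blob_a + blob_b, K=2)
words_a = [nearest_word(d, vocabulary) for d in blob_a]
words_b = [nearest_word(d, vocabulary) for d in blob_b]
```

After quantization, each feature is represented by a discrete word index plus its 2D position, which is the input the later models work with.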
SLIDE 8 The World as a Bag of Visual Words
Fei-Fei & Perona, CVPR 2005: Topics as visual themes composing a known set of scene categories
Sivic, Russell, Efros, Zisserman, & Freeman, ICCV 2005: Topics as visual object classes within a (carefully chosen) image collection
SLIDE 9 Images as more than Bags of Features
- How do I know this is ocean beneath a clear sky?
- How many bicycles and tricycles am I looking at?
Why are we trying to squeeze images into topic models? There are many more tools available by adapting nonparametric and hierarchical Bayesian models.
SLIDE 10 Visual Object Categorization
- GOAL: Visually recognize and localize object categories
- Robustly learn appearance models from few examples
SLIDE 11 Part-Based Models for Objects
Pictorial Structures
Fischler & Elschlager, 1973
Generalized Cylinders
Marr & Nishihara, 1978
Recognition by Components
Biederman, 1987
Constellation Model
Perona, Weber, Welling, Fergus, Fei-Fei, 2000 to ...
Efficient Matching
Felzenszwalb & Huttenlocher, 2005
Discriminative Parts
Felzenszwalb, McAllester, Ramanan, 2008 to ...
SLIDE 12 Counting Objects & Parts
How many parts? How many objects?
SLIDE 13
Generative Model for Objects
For each image:
- Sample a reference position
- For each feature:
  - Randomly choose one part
  - Sample from that part's feature distribution
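The generative steps above can be sketched in a few lines. This is a toy illustration, not the tutorial's implementation: it assumes a fixed, finite set of parts, a categorical distribution over visual words for appearance, and an isotropic Gaussian offset from the reference position. All parameter values and names (`PARTS`, `sample_image`) are hypothetical.

```python
import random

# Hypothetical toy parts: mixing weight, visual-word probabilities,
# mean offset from the reference position, and positional spread.
PARTS = [
    {"weight": 0.6, "word_probs": [0.8, 0.1, 0.1], "offset": (0.0, -1.0), "std": 0.2},
    {"weight": 0.4, "word_probs": [0.1, 0.2, 0.7], "offset": (1.0, 0.5), "std": 0.3},
]

def sample_image(parts, n_features, rng):
    # Sample a reference position for the object in this image.
    ref = (rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0))
    features = []
    for _ in range(n_features):
        # Randomly choose one part.
        part = rng.choices(parts, weights=[p["weight"] for p in parts])[0]
        # Sample appearance (a visual word) from that part's distribution.
        word = rng.choices(range(len(part["word_probs"])),
                           weights=part["word_probs"])[0]
        # Sample position relative to the reference position.
        pos = tuple(r + o + rng.gauss(0.0, part["std"])
                    for r, o in zip(ref, part["offset"]))
        features.append((word, pos))
    return ref, features

ref, features = sample_image(PARTS, n_features=50, rng=random.Random(1))
```

Because every position is sampled relative to the per-image reference position, the same parts can explain an object wherever it appears in the image.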
SLIDE 14 Objects as Distributions
- Parts are defined by parameters, which encode distributions on visual features:
Feature appearance: Pr(appearance | part)
Feature position: Pr(position | part)
- Objects are defined by distributions on the infinitely many potential part parameters:
Pr(part)
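Combining these three distributions gives a mixture likelihood for each feature. The expression below is a sketch under two assumptions not stated on the slide: appearance and position are conditionally independent given the part, and position is measured relative to a per-image reference position, written here as \rho_j (a hypothetical symbol). Here w_{ji} and v_{ji} denote the appearance and 2D position of feature i in image j.

```latex
p(w_{ji}, v_{ji} \mid \rho_j)
  = \sum_{k=1}^{\infty}
    \Pr(\text{part } k)\,
    \Pr(w_{ji} \mid \text{part } k)\,
    \Pr(v_{ji} - \rho_j \mid \text{part } k)
```

The infinite sum reflects the nonparametric prior: only finitely many parts receive appreciable weight for any finite dataset.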
SLIDE 15 A Nonparametric Part-Based Model
[Figure: panels for 4, 16, and 64 images; plot axes: # images, # parts]
SLIDE 16 Generalizing Across Categories
Can we transfer knowledge from one object category to another?
SLIDE 17 Learning Shared Parts
- Objects are often locally similar in appearance
- Discover parts shared across categories
  - How many total parts should we share?
  - How many parts should each category use?
SLIDE 18 Hierarchical DP Object Model
[Figure: graphical model; recoverable labels: G, G1, G2, H, R, w, v; plates J, N, L]
SLIDE 19 Hierarchical DP Object Model
[Figure: graphical model; recoverable labels: G, G1, G2, H, R, w, v; plates J, N, L]
Discrete Data: Teh et al., 2004
SLIDE 20
Chinese Restaurant Franchise
SLIDE 21 Sharing Parts: 16 Categories
- Caltech 101 Dataset (Li & Perona)
- Horses (Borenstein & Ullman)
- Cat & dog faces (Vidal-Naquet & Ullman)
- Bikes from Graz-02 (Opelt & Pinz)
- Google...
SLIDE 22 Visualization of Shared Parts
Pr(appearance | part) Pr(position | part)
SLIDE 23 Visualization of Shared Parts
Pr(appearance | part) Pr(position | part)
SLIDE 24 Visualization of Shared Parts
Pr(appearance | part) Pr(position | part)
SLIDE 25 Visualization of Part Densities
Hierarchical Clustering of Pr(part | object)
SLIDE 26
Detection Task
SLIDE 27 Detection Results
6 Training Images per Category
(ROC Curves)
Shared Parts more accurate than Unshared Parts
Modeling feature positions improves shared detection, but hurts unshared detection
SLIDE 28 Detection Results
6 Training Images per Category
(ROC Curves)
Detection vs. Training Set Size
(Area Under ROC)
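The results above are summarized by ROC curves and the area under them. As a reminder of what that number measures, here is a minimal sketch that computes the area under the ROC curve as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one (ties count one half). The function name and the toy scores are hypothetical, not the tutorial's data.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy detector scores: one positive (0.3) ranks below one negative (0.8).
auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

A value of 1.0 means every positive outranks every negative, and 0.5 corresponds to chance-level ranking.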
SLIDE 29
Sharing Simplifies Models
SLIDE 30
Scenes, Objects, and Parts
Features Parts Objects Scene
SLIDE 31
Contextual Transfer Learning
SLIDE 32 Object vs. Visual Categories
- Assume training data contains object category labels
- Discover underlying visual categories automatically
SLIDE 33 Multiple Object Scenes
- How many cars are there?
- Where are those cars in the scene?
Standard dependent Dirichlet process models (Gelfand et al., 2005) are inappropriate
SLIDE 34 Spatial Transformations
- Let global DP clusters model objects in a canonical coordinate frame
- Generate images via a random set of transformations:
Parameterized family
Shift cluster from canonical coordinate frame to object location in a given image
Layered Motion Models (Darrell & Pentland 1991, Wang & Adelson 1994, Jojic & Frey 2001)
Nonparametric Transformation Densities (Learned-Miller & Viola 2000)
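Shifting a cluster from the canonical frame to an object's location can be illustrated with the simplest parameterized family, a 2D translation. The sketch below assumes Gaussian clusters and pure translations; both are illustrative choices, and the function names and numbers are hypothetical.

```python
def shift_cluster(mean, cov, translation):
    """Translate a Gaussian cluster: the mean moves, the covariance is unchanged."""
    shifted_mean = tuple(m + t for m, t in zip(mean, translation))
    return shifted_mean, cov

def to_canonical(position, translation):
    """Map an observed image position back into the canonical coordinate frame."""
    return tuple(p - t for p, t in zip(position, translation))

# One canonical "car" cluster, reused for two instances in the same image.
canonical_mean = (0.0, 0.0)
canonical_cov = ((1.0, 0.0), (0.0, 0.25))
car_locations = [(3.0, 1.0), (8.0, 1.5)]
instances = [shift_cluster(canonical_mean, canonical_cov, loc)
             for loc in car_locations]
```

Each instance reuses the same canonical shape parameters, so adding another car to a scene adds only one more transformation rather than a new cluster.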
SLIDE 35
A Toy World: Bars & Blobs
SLIDE 36 Transformed Dirichlet Process
[Figure: graphical model; recoverable labels: G, G1, G2, G3, Gj, H, R, v; plates J, N; annotations: Mixture Parameters, Transformations]
SLIDE 37
Importance of Transformations
HDP TDP
SLIDE 38 Counting & Locating Objects
- How many cars are there?
- Where are those cars in the scene?
Dirichlet Processes Transformations
SLIDE 39 Visual Scene TDP
[Figure: graphical model with base measures H and R and plates J, N]
Global Density G: object category, part size & shape, transformation prior
Transformed Densities Gj: object category, part size & shape, instance locations
2D Image Features: appearance w, location v
SLIDE 40
Street Scene Visual Categories
SLIDE 41
Street Scene Segmentations
SLIDE 42
Segmentation Performance
SLIDE 43 Extension: 3D Scenes
- Segmentation easier in 3D
- Identifying known objects regularizes depth estimation
[Figure: office scene with depth map; red = far, green = near]
SLIDE 44 3D Structure from Stereo
[Figure panels: Reference (left) Image; Potential Matches; Depth Densities; Overhead View]
Depth = Disparity
SLIDE 45 Greedy Depth Estimates
[Figure panels: Reference (left) Image; Potential Matches; Depth Densities; red = far, green = near]
SLIDE 46 3D Transformed DP: Office Scenes
Computer Screen, Desk, Bookshelves, Background
SLIDE 47
Stereo Test Image
Simultaneous object recognition & coarse 3D reconstruction