Applied Bayesian Nonparametrics: Hierarchical Models
Tutorial at CVPR 2012
Erik Sudderth, Brown University

Work by E. Sudderth, A. Torralba, W. Freeman, & A. Willsky
• IJCV 2008: Describing Visual Scenes using Transformed Objects & Parts
• CVPR 2006: Depth from Familiar Objects: A Hierarchical Model for 3D Scenes
• NIPS 2005: Describing Visual Scenes using Transformed Dirichlet Processes

Building on work by Y. W. Teh, M. Jordan, M. Beal, & D. Blei
• JASA 2006: Hierarchical Dirichlet Processes

Learning with Topic Models
Framework for unsupervised discovery of low-dimensional latent structure from bag-of-words representations
[Figure: example words (model, neural, stochastic, recognition, nonparametric, gradient, dynamical, Bayesian) with topic labels (Algorithms, Neuroscience, Statistics, Vision)]
• pLSA: Probabilistic Latent Semantic Analysis (Hofmann 2001)
• LDA: Latent Dirichlet Allocation (Blei, Ng, & Jordan 2003)
• HDP: Hierarchical Dirichlet Processes (Teh, Jordan, Beal, & Blei 2006)

Hierarchical Dirichlet Process
G_0 ∼ DP(γ, H)
G_j ∼ DP(α, G_0)
J groups of data: documents, images, …
E[π_j] = β

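The relation E[π_j] = β can be checked numerically with a small sketch. This is not the tutorial's code: it assumes a truncated stick-breaking approximation of the global weights β, under which each group's weights follow π_j ∼ Dirichlet(αβ). The function names, truncation level, and hyperparameter values are illustrative choices.

```python
import random

def stick_breaking(gamma, truncation, rng):
    """Truncated GEM(gamma) weights; the last stick takes the remaining mass."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, gamma)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights

def group_weights(alpha, beta, rng):
    """Finite approximation of G_j ~ DP(alpha, G_0): pi_j ~ Dirichlet(alpha * beta)."""
    # A Dirichlet draw is a vector of independent gamma draws, normalized.
    draws = [rng.gammavariate(max(alpha * b, 1e-12), 1.0) for b in beta]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(0)
beta = stick_breaking(gamma=2.0, truncation=8, rng=rng)          # global weights
groups = [group_weights(alpha=5.0, beta=beta, rng=rng) for _ in range(4000)]

# Averaging the group-specific weights should approximately recover beta.
mean_pi = [sum(g[k] for g in groups) / len(groups) for k in range(len(beta))]
for b, m in zip(beta, mean_pi):
    print(f"beta={b:.3f}  mean pi_j={m:.3f}")
```

Larger α concentrates each π_j more tightly around β; smaller α lets groups differ more while still sharing the same global components.
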
Chinese Restaurant Franchise

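A minimal simulation of the Chinese restaurant franchise metaphor, assuming the standard construction: within each group (restaurant), a customer joins an existing table with probability proportional to its occupancy or opens a new table with weight α; each new table orders an existing dish with probability proportional to the number of tables already serving it, or a new dish with weight γ. The function name and the settings below are hypothetical, and this samples from the prior only; it is not an inference procedure.

```python
import random

def chinese_restaurant_franchise(group_sizes, alpha, gamma, seed=0):
    rng = random.Random(seed)
    tables_per_dish = []   # global: number of tables (all groups) serving each dish
    dishes_by_group = []   # dishes_by_group[j][i] = dish of customer i in group j
    for size in group_sizes:
        customers_per_table, dish_of_table, dishes = [], [], []
        for _ in range(size):
            # Existing table with weight = occupancy, new table with weight alpha.
            weights = customers_per_table + [alpha]
            t = rng.choices(range(len(weights)), weights=weights)[0]
            if t == len(customers_per_table):
                # New table: existing dish with weight = #tables, new dish with weight gamma.
                dish_weights = tables_per_dish + [gamma]
                k = rng.choices(range(len(dish_weights)), weights=dish_weights)[0]
                if k == len(tables_per_dish):
                    tables_per_dish.append(0)
                tables_per_dish[k] += 1
                customers_per_table.append(0)
                dish_of_table.append(k)
            customers_per_table[t] += 1
            dishes.append(dish_of_table[t])
        dishes_by_group.append(dishes)
    return dishes_by_group, tables_per_dish

dishes_by_group, tables_per_dish = chinese_restaurant_franchise(
    [30, 30, 30], alpha=1.0, gamma=1.0)
print("dishes shared across groups:", len(tables_per_dish))
```

Because dishes are chosen from a global pool, the same dish (mixture component) can appear in several groups, which is the sharing mechanism the hierarchy provides.
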
Local Visual Features: Superpixels
Inspired by the successes of topic models for text data, some have proposed learning from local image features
• Partition image into ~1,000 superpixels
• Goal: Reduce dimensionality, aggregate information spatially
  – hopefully not across object boundaries!

Local Visual Features: Interest Regions
[Figure panels: Affinely Adapted Harris Corners; Maximally Stable Extremal Regions; Linked Sequences of Canny Edges]
• Some invariance to lighting & pose variations
• Dense, multiscale over-segmentation of image

A Discrete Feature Vocabulary
SIFT Descriptors (Lowe, IJCV 2004)
• Normalized histograms of orientation energy
• Compute ~1,000 word dictionary via K-means
• Map each feature to nearest visual word
w_ji: appearance of feature i in image j
v_ji: 2D position of feature i in image j

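The two vocabulary steps above (build a dictionary with K-means, then map each descriptor to its nearest visual word) can be sketched in plain Python. This is a toy illustration, not the tutorial's pipeline: it uses 2-D points in place of 128-D SIFT descriptors and 2 words in place of ~1,000, and the function names are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_word(descriptor, dictionary):
    """Index of the dictionary entry closest to the descriptor."""
    return min(range(len(dictionary)),
               key=lambda k: sq_dist(descriptor, dictionary[k]))

def kmeans(descriptors, num_words, iterations=20, seed=0):
    """Plain Lloyd iterations; centers start at randomly chosen descriptors."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(descriptors, num_words)]
    for _ in range(iterations):
        labels = [nearest_word(d, centers) for d in descriptors]
        for k in range(num_words):
            members = [d for d, z in zip(descriptors, labels) if z == k]
            if members:  # keep the old center if a cluster becomes empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers

# Toy "descriptors": two well-separated groups of 2-D points.
descriptors = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
               (10.0, 10.1), (9.9, 10.0), (10.2, 9.8)]
dictionary = kmeans(descriptors, num_words=2)
words = [nearest_word(d, dictionary) for d in descriptors]
print(words)
```

After this quantization each feature is represented by a discrete word index, which is what lets topic-model machinery designed for text be applied to images.
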
The World as a Bag of Visual Words
• Fei-Fei & Perona, CVPR 2005: Topics as visual themes composing a known set of scene categories
• Sivic, Russell, Efros, Zisserman, & Freeman, ICCV 2005: Topics as visual object classes within a (carefully chosen) image collection

Images as more than Bags of Features
• How do I know this is ocean beneath a clear sky?
• How many bicycles and tricycles am I looking at?
Why are we trying to squeeze images into topic models?
There are many more tools available by adapting nonparametric and hierarchical Bayesian models.

Visual Object Categorization
• GOAL: Visually recognize and localize object categories
• Robustly learn appearance models from few examples

Part-Based Models for Objects
• Pictorial Structures: Fischler & Elschlager, 1973
• Generalized Cylinders: Marr & Nishihara, 1978
• Recognition by Components: Biederman, 1987
• Constellation Model: Perona, Weber, Welling, Fergus, Fei-Fei, 2000 to …
• Efficient Matching: Felzenszwalb & Huttenlocher, 2005
• Discriminative Parts: Felzenszwalb, McAllester, Ramanan, 2008 to …

Counting Objects & Parts
How many parts? How many objects?

Generative Model for Objects
For each image: Sample a reference position
For each feature:
• Randomly choose one part
• Sample from that part's feature distribution

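The sampling steps above can be written out as a short sketch. Everything numeric here is an assumption made for illustration: a fixed number of parts, a discrete distribution over visual words for appearance, and an isotropic Gaussian for position measured relative to the reference position. The parameter names are hypothetical.

```python
import random

def sample_image(part_weights, appearance_probs, position_offsets,
                 position_std, num_features, rng):
    """Sample one image's features as (part, visual word, 2D position)."""
    # For each image: sample a reference position (uniform here, by assumption).
    reference = (rng.uniform(0.0, 100.0), rng.uniform(0.0, 100.0))
    features = []
    for _ in range(num_features):
        # Randomly choose one part.
        part = rng.choices(range(len(part_weights)), weights=part_weights)[0]
        # Sample appearance and position from that part's feature distribution.
        probs = appearance_probs[part]
        word = rng.choices(range(len(probs)), weights=probs)[0]
        dx, dy = position_offsets[part]
        position = (reference[0] + rng.gauss(dx, position_std),
                    reference[1] + rng.gauss(dy, position_std))
        features.append((part, word, position))
    return reference, features

rng = random.Random(1)
reference, features = sample_image(
    part_weights=[0.5, 0.3, 0.2],
    appearance_probs=[[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]],
    position_offsets=[(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)],
    position_std=1.0,
    num_features=20,
    rng=rng,
)
print(reference, features[:3])
```

Expressing positions relative to a per-image reference is what makes the part layout reusable when the object appears at different image locations.
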
Objects as Distributions
[Figure: Pr(part); Pr(appearance | part) over feature appearance; Pr(position | part) over feature position]
• Parts are defined by parameters that encode distributions on visual features
• Objects are defined by distributions on the infinitely many potential part parameters

A Nonparametric Part-Based Model
[Figure: number of parts vs. number of images; panels for 4 Images, 16 Images, 64 Images]

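One way to see why a Dirichlet process prior lets the number of parts grow with the amount of data is to look at the number of occupied clusters under a Chinese restaurant process. The sketch below is a generic CRP illustration, not the part-based model itself: it counts clusters over n observations rather than images, and the concentration value 2.0 is an arbitrary choice. The closed form used is the standard expectation, the sum over i from 0 to n-1 of α / (α + i).

```python
import random

def expected_num_clusters(alpha, n):
    """Expected number of occupied clusters after n CRP draws."""
    return sum(alpha / (alpha + i) for i in range(n))

def simulate_num_clusters(alpha, n, rng):
    """Simulate n CRP draws and return the number of occupied clusters."""
    counts = []
    for _ in range(n):
        # Existing cluster with weight = its size, new cluster with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(0)
        counts[k] += 1
    return len(counts)

rng = random.Random(0)
for n in (4, 16, 64):
    print(n, round(expected_num_clusters(2.0, n), 2),
          simulate_num_clusters(2.0, n, rng))
```

The expected count grows roughly logarithmically in n, so more data supports more clusters without the number ever being fixed in advance.
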
Generalizing Across Categories
Can we transfer knowledge from one object category to another?

Learning Shared Parts
• Objects are often locally similar in appearance
• Discover parts shared across categories
  – How many total parts should we share?
  – How many parts should each category use?

Hierarchical DP Object Model
[Figure: graphical model with nodes H, R, G_0, G_1, G_2, v, w and plates N, J, L]

Hierarchical DP Object Model
Discrete Data: Teh et al., 2004
[Figure: graphical model with nodes H, R, G_0, G_1, G_2, v, w and plates N, J, L]

Chinese Restaurant Franchise

Sharing Parts: 16 Categories
• Caltech 101 Dataset (Li & Perona)
• Bikes from Graz-02 (Opelt & Pinz)
• Horses (Borenstein & Ullman)
• Cat & dog faces (Vidal-Naquet & Ullman)
• Google …

Visualization of Shared Parts
[Figure: Pr(position | part) and Pr(appearance | part)]

Visualization of Part Densities
Hierarchical Clustering of Pr(part | object)

Detection Task

Detection Results
6 Training Images per Category (ROC Curves)
• Shared Parts more accurate than Unshared Parts
• Modeling feature positions improves shared detection, but hurts unshared detection

Detection Results
[Figure panels: 6 Training Images per Category (ROC Curves); Detection vs. Training Set Size (Area Under ROC)]

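For readers unfamiliar with the summary statistic in the second panel, the area under an ROC curve equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, counting ties as one half. A minimal sketch with made-up scores (not results from the tutorial):

```python
def area_under_roc(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical detector scores for images with and without the object.
print(area_under_roc([0.9, 0.4], [0.5, 0.1]))   # 3 of 4 pairs ranked correctly: 0.75
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which makes it a convenient single number for comparing detectors across training set sizes.
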
Sharing Simplifies Models

Scenes, Objects, and Parts
Scene → Objects → Parts → Features

Contextual Transfer Learning

Object vs. Visual Categories
• Assume training data contains object category labels
• Discover underlying visual categories automatically

Multiple Object Scenes
• How many cars are there?
• Where are those cars in the scene?
Standard dependent Dirichlet process models (Gelfand et al., 2005) are inappropriate

Spatial Transformations
• Let global DP clusters model objects in a canonical coordinate frame
• Generate images via a random set of transformations:
  – Parameterized family of transformations
  – Shift cluster from canonical coordinate frame to object location in a given image
Related work:
• Layered Motion Models (Darrell & Pentland 1991, Wang & Adelson 1994, Jojic & Frey 2001)
• Nonparametric Transformation Densities (Learned-Miller & Viola 2000)

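The idea of shifting a cluster from its canonical frame to an object location can be illustrated with the simplest parameterized family, 2D translations. The sketch assumes each cluster is an isotropic Gaussian over feature positions, so a translation only moves its mean; the function names and numbers are hypothetical.

```python
import random

def transform(canonical_mean, shift):
    """Translate a cluster mean from the canonical frame into image coordinates."""
    return (canonical_mean[0] + shift[0], canonical_mean[1] + shift[1])

def sample_instance(canonical_mean, std, shift, num_features, rng):
    """Sample feature positions for one object instance placed at `shift`."""
    mx, my = transform(canonical_mean, shift)
    return [(rng.gauss(mx, std), rng.gauss(my, std)) for _ in range(num_features)]

rng = random.Random(0)
canonical_mean = (1.0, 2.0)                   # cluster in the canonical frame
shifts = [(10.0, -3.0), (40.0, 25.0)]         # two instances in one image
image = [sample_instance(canonical_mean, 0.5, s, 5, rng) for s in shifts]
print(transform(canonical_mean, shifts[0]))   # (11.0, -1.0)
```

Reusing one canonical cluster with several sampled shifts is what allows the same object model to explain multiple instances at different locations in a scene.
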
A Toy World: Bars & Blobs

Transformed Dirichlet Process
[Figure: graphical model with nodes H, R, G_0, G_j, G_1, G_2, G_3, v and plates N, J; labels: Mixture Parameters, Transformations]

Importance of Transformations
[Figure panels: TDP; HDP]

Counting & Locating Objects
• How many cars are there? Dirichlet Processes
• Where are those cars in the scene? Transformations

Visual Scene TDP
[Figure: graphical model with nodes H, R, G_0, G_j, o, w, v and plates N, J]
• Global Density G_0: object category, part size & shape, transformation prior
• Transformed Densities G_j: object category, part size & shape, instance locations
• 2D Image Features: appearance w, location v

Street Scene Visual Categories

Street Scene Segmentations

Segmentation Performance

Extension: 3D Scenes
[Figure: office scene depth map; green = near, red = far]
• Segmentation easier in 3D
• Identifying known objects regularizes depth estimation

3D Structure from Stereo
[Figure panels: Reference (left) Image; Potential Matches; Depth Densities; Overhead View; annotation relating depth to disparity]

Greedy Depth Estimates
[Figure panels: Reference (left) Image; Potential Matches; Depth Densities; green = near, red = far]

3D Transformed DP: Office Scenes
[Figure labels: Computer Screen; Background; Bookshelves; Desk]

Stereo Test Image
Simultaneous object recognition & coarse 3D reconstruction