Applied Bayesian Nonparametrics - PowerPoint PPT Presentation



SLIDE 1

Applied Bayesian Nonparametrics

Hierarchical Models

Tutorial at CVPR 2012
Erik Sudderth

Brown University

Work by E. Sudderth, A. Torralba, W. Freeman, & A. Willsky:
  • IJCV 2008: Describing Visual Scenes using Transformed Objects & Parts
  • CVPR 2006: Depth from Familiar Objects: A Hierarchical Model for 3D Scenes
  • NIPS 2005: Describing Visual Scenes using Transformed Dirichlet Processes

Building on work by Y. W. Teh, M. Jordan, M. Beal, & D. Blei:
  • JASA 2006: Hierarchical Dirichlet Processes

SLIDE 2

Learning with Topic Models

[Figure: example topics (Algorithms, Neuroscience, Statistics, Vision) with words such as model, neural, stochastic, recognition, nonparametric, gradient, dynamical, Bayesian, …]

Framework for unsupervised discovery of low-dimensional latent structure from bag-of-words representations

  • pLSA: Probabilistic Latent Semantic Analysis (Hofmann 2001)
  • LDA: Latent Dirichlet Allocation (Blei, Ng, & Jordan 2003)
  • HDP: Hierarchical Dirichlet Processes (Teh, Jordan, Beal, & Blei 2006)

SLIDE 3

Hierarchical Dirichlet Process

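$$G_0 \sim \mathrm{DP}(\gamma, H), \qquad G_j \sim \mathrm{DP}(\alpha, G_0), \quad j = 1, \dots, J$$

$$\mathbb{E}[\pi_j] = \beta$$

Here β are the global weights of G_0, and π_j are the weights that group j places on the same shared atoms.

J groups of data: documents, images, …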
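The identity E[π_j] = β can be checked numerically. The Python sketch below makes several illustrative assumptions that are not from the slides: the global weights β ~ GEM(γ) are truncated to K atoms, and the function names and hyperparameter values are hypothetical. Once β lives on K atoms, a DP(α, β) draw is exactly a Dirichlet(αβ_1, …, αβ_K) draw. Averaging many group-level weight vectors should therefore recover β.

```python
import random


def gem_weights(gamma, K, rng):
    """Truncated stick-breaking draw of global weights beta ~ GEM(gamma).

    Truncation to K atoms is an approximation: the final weight takes the
    remaining stick length so that the K weights sum to one.
    """
    beta, remaining = [], 1.0
    for _ in range(K - 1):
        v = rng.betavariate(1.0, gamma)  # fraction of the remaining stick to break off
        beta.append(remaining * v)
        remaining *= 1.0 - v
    beta.append(remaining)
    return beta


def group_weights(alpha, beta, rng):
    """Draw pi_j ~ DP(alpha, beta) when the base measure sits on K shared atoms.

    A DP with a discrete base measure on K atoms reduces to
    Dirichlet(alpha * beta), sampled here by normalizing gamma variates.
    """
    draws = [rng.gammavariate(alpha * b, 1.0) if b > 0 else 0.0 for b in beta]
    total = sum(draws)
    return [d / total for d in draws]


def mean_group_weights(alpha, beta, num_groups, rng):
    """Monte Carlo estimate of E[pi_j] from num_groups independent groups."""
    mean = [0.0] * len(beta)
    for _ in range(num_groups):
        pi = group_weights(alpha, beta, rng)
        mean = [m + p / num_groups for m, p in zip(mean, pi)]
    return mean


if __name__ == "__main__":
    rng = random.Random(0)
    beta = gem_weights(gamma=1.0, K=10, rng=rng)
    mean = mean_group_weights(alpha=5.0, beta=beta, num_groups=20000, rng=rng)
    for b, m in zip(beta, mean):
        print(f"beta = {b:.3f}   average pi_j = {m:.3f}")
```

A larger α concentrates each π_j more tightly around β. A smaller α lets individual groups deviate more, while the average still matches β.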

SLIDE 4

Chinese Restaurant Franchise
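The Chinese restaurant franchise describes the HDP prior through a predictive seating rule:
  • In restaurant (group) j, each customer (data point) joins an occupied table with probability proportional to its occupancy, or starts a new table with probability proportional to α.
  • Each new table orders an existing dish (a shared cluster) with probability proportional to the number of tables, across all restaurants, already serving it. It orders a new dish with probability proportional to γ.

The Python sketch below simulates that prior. The function and variable names are illustrative assumptions.

```python
import random


def weighted_index(rng, weights):
    """Return index i with probability weights[i] / sum(weights)."""
    return rng.choices(range(len(weights)), weights=weights)[0]


def sample_crf(group_sizes, alpha, gamma, seed=0):
    """Simulate seating under the Chinese restaurant franchise prior.

    Returns the dish index eaten by each customer in every restaurant, plus
    the franchise-wide number of tables serving each dish.
    """
    rng = random.Random(seed)
    dish_tables = []  # dish_tables[k]: tables (in all restaurants) serving dish k
    dishes_per_group = []
    for n_j in group_sizes:
        table_sizes = []  # customers seated at each table of this restaurant
        table_dish = []   # dish served at each table of this restaurant
        eaten = []
        for _ in range(n_j):
            # Existing table t has weight table_sizes[t]; a new table has weight alpha.
            t = weighted_index(rng, table_sizes + [alpha])
            if t == len(table_sizes):
                # New table: existing dish k has weight dish_tables[k]; a new dish has weight gamma.
                k = weighted_index(rng, dish_tables + [gamma])
                if k == len(dish_tables):
                    dish_tables.append(0)
                dish_tables[k] += 1
                table_sizes.append(0)
                table_dish.append(k)
            table_sizes[t] += 1
            eaten.append(table_dish[t])
        dishes_per_group.append(eaten)
    return dishes_per_group, dish_tables


if __name__ == "__main__":
    groups, dish_tables = sample_crf([50, 50, 50], alpha=2.0, gamma=1.5)
    for j, eaten in enumerate(groups):
        print(f"restaurant {j}: dishes used = {sorted(set(eaten))}")
    print("tables per dish:", dish_tables)
```

Dishes that are popular franchise-wide get reused by new restaurants. This is how the HDP shares clusters (for example, object parts) across groups.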

SLIDE 5

Local Visual Features: Superpixels

  • Partition image into ~1,000 superpixels
  • Goal: Reduce dimensionality, aggregate information spatially – hopefully not across object boundaries!

Inspired by the successes of topic models for text data, some have proposed learning from local image features

SLIDE 6

Local Visual Features: Interest Regions

[Figure panels: Affinely Adapted Harris Corners; Maximally Stable Extremal Regions; Linked Sequences of Canny Edges]

  • Some invariance to lighting & pose variations
  • Dense, multiscale over-segmentation of image

SLIDE 7

A Discrete Feature Vocabulary

  • Normalized histograms of orientation energy
  • Compute ~1,000 word dictionary via K-means
  • Map each feature to nearest word

[Figure labels: appearance of feature i in image j; 2D position of feature i in image j]

SIFT Descriptors

Lowe, IJCV 2004
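The dictionary-building and mapping steps can be sketched in plain Python. This toy illustration rests on stated assumptions:
  • Descriptors are short lists of floats; real SIFT descriptors are 128-dimensional.
  • The dictionary comes from basic Lloyd-style K-means with random initial centers.
  • The helper names are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, dictionary):
    """Index of the dictionary center (visual word) closest to a descriptor."""
    return min(range(len(dictionary)), key=lambda k: sq_dist(descriptor, dictionary[k]))


def kmeans_dictionary(descriptors, k, iters=20, seed=0):
    """Learn k visual words with basic Lloyd iterations (random initial centers)."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(descriptors, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest_word(d, centers)].append(d)
        for idx, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                dim = len(members[0])
                centers[idx] = [sum(m[i] for m in members) / len(members) for i in range(dim)]
    return centers


if __name__ == "__main__":
    # Toy 2D "descriptors" forming two well-separated groups.
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
    words = kmeans_dictionary(data, k=2)
    print([nearest_word(d, words) for d in data])
```

After quantization, each feature keeps only its word index (its appearance) and its 2D position. That discrete representation is what the topic and part models operate on.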

SLIDE 8

The World as a Bag of Visual Words

  • Fei-Fei & Perona, CVPR 2005: topics as visual themes composing a known set of scene categories
  • Sivic, Russell, Efros, Zisserman, & Freeman, ICCV 2005: topics as visual object classes within a (carefully chosen) image collection

SLIDE 9

Images as more than Bags of Features

  • How do I know this is ocean beneath a clear sky?
  • How many bicycles and tricycles am I looking at?

Why are we trying to squeeze images into topic models? There are many more tools available by adapting nonparametric and hierarchical Bayesian models.

SLIDE 10

Visual Object Categorization

  • GOAL: Visually recognize and localize object categories
  • Robustly learn appearance models from few examples

SLIDE 11

Part-Based Models for Objects

Pictorial Structures

Fischler & Elschlager, 1973

Generalized Cylinders

Marr & Nishihara, 1978

Recognition by Components

Biederman, 1987

Constellation Model

Perona, Weber, Welling, Fergus, Fei-Fei, 2000 to …

Efficient Matching

Felzenszwalb & Huttenlocher, 2005

Discriminative Parts

Felzenszwalb, McAllester, Ramanan, 2008 to …

SLIDE 12

Counting Objects & Parts

How many parts? How many objects?

SLIDE 13

Generative Model for Objects

For each image:
  • Sample a reference position
  • For each feature:
    – Randomly choose one part
    – Sample from that part's feature distribution
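The sampling steps above can be written as a short Python sketch. For illustration it makes several assumptions:
  • There is a finite set of three hypothetical parts. The nonparametric model instead places a prior over infinitely many parts.
  • Appearance is a discrete distribution over a small visual-word vocabulary.
  • Position is an isotropic Gaussian offset from the reference position.

All numbers here are made up.

```python
import random

# Hypothetical finite object model (values are illustrative only). Each part has
#   weight: Pr(part)
#   words:  Pr(appearance | part) over a 3-word visual vocabulary
#   offset: mean position relative to the object's reference position
#   std:    isotropic standard deviation of Pr(position | part)
PARTS = [
    {"weight": 0.5, "words": [0.7, 0.2, 0.1], "offset": (-5.0, 0.0), "std": 1.0},
    {"weight": 0.3, "words": [0.1, 0.8, 0.1], "offset": (5.0, 0.0), "std": 1.0},
    {"weight": 0.2, "words": [0.1, 0.1, 0.8], "offset": (0.0, 8.0), "std": 2.0},
]


def sample_image(parts, num_features, rng, width=100.0, height=100.0):
    """Sample (part, appearance word, 2D position) triples for one image."""
    # Sample a reference position for the object, uniformly within the image.
    ref = (rng.uniform(0.0, width), rng.uniform(0.0, height))
    part_weights = [p["weight"] for p in parts]
    features = []
    for _ in range(num_features):
        # Randomly choose one part according to Pr(part).
        z = rng.choices(range(len(parts)), weights=part_weights)[0]
        part = parts[z]
        # Sample the appearance (a visual-word index) from Pr(appearance | part).
        w = rng.choices(range(len(part["words"])), weights=part["words"])[0]
        # Sample the position from a Gaussian centred at reference + part offset.
        v = tuple(r + o + rng.gauss(0.0, part["std"]) for r, o in zip(ref, part["offset"]))
        features.append((z, w, v))
    return ref, features


if __name__ == "__main__":
    rng = random.Random(0)
    ref, features = sample_image(PARTS, num_features=5, rng=rng)
    print("reference position:", ref)
    for z, w, v in features:
        print(f"part {z}  word {w}  position ({v[0]:.1f}, {v[1]:.1f})")
```

Positions are measured relative to the sampled reference. The same part therefore produces features in a consistent spatial arrangement wherever the object appears.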

SLIDE 14

Objects as Distributions

  • Parts are defined by parameters, which encode distributions on visual features:
    – Feature appearance: Pr(appearance | part)
    – Feature position: Pr(position | part)
  • Objects are defined by distributions on the infinitely many potential part parameters: Pr(part)

SLIDE 15

A Nonparametric Part-Based Model

[Figure: learned models from 4, 16, and 64 training images; plot of the number of parts versus the number of images]

SLIDE 16

Generalizing Across Categories

Can we transfer knowledge from one object category to another?

SLIDE 17

Learning Shared Parts

  • Objects are often locally similar in appearance
  • Discover parts shared across categories
    – How many total parts should we share?
    – How many parts should each category use?

SLIDE 18

Hierarchical DP Object Model

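[Figure: graphical model for the hierarchical DP object model: a global measure G_0 with base measures H and R, object-level measures G_1, G_2, …, plates over L object categories, J images, and N features, with observed feature appearance w and position v]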

SLIDE 19

Hierarchical DP Object Model

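[Figure: graphical model for the hierarchical DP object model: a global measure G_0 with base measures H and R, object-level measures G_1, G_2, …, plates over L object categories, J images, and N features, with observed feature appearance w and position v]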

Discrete Data: Teh et al., 2004

SLIDE 20

Chinese Restaurant Franchise

SLIDE 21

Sharing Parts: 16 Categories

  • Caltech 101 Dataset (Li & Perona)
  • Horses (Borenstein & Ullman)
  • Cat & dog faces (Vidal-Naquet & Ullman)
  • Bikes from Graz-02 (Opelt & Pinz)
  • Google!

SLIDE 22

Visualization of Shared Parts

Pr(appearance | part) Pr(position | part)

SLIDE 23

Visualization of Shared Parts

Pr(appearance | part) Pr(position | part)

SLIDE 24

Visualization of Shared Parts

Pr(appearance | part) Pr(position | part)

SLIDE 25

Visualization of Part Densities

Hierarchical Clustering of Pr(part | object)

SLIDE 26

Detection Task

SLIDE 27

Detection Results

6 Training Images per Category (ROC Curves)

Shared parts are more accurate than unshared parts.

Modeling feature positions improves shared detection, but hurts unshared detection.

SLIDE 28

Detection Results

6 Training Images per Category (ROC Curves)

Detection vs. Training Set Size (Area Under ROC)
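The area under the ROC curve condenses a detector's whole ROC curve into one number. One way to compute it is to sweep a detection threshold over the classifier scores and integrate the resulting curve with the trapezoid rule. The Python sketch below is a generic illustration with made-up scores and labels, not the evaluation code behind these results. It also checks a standard identity: the area equals the fraction of positive/negative pairs ranked correctly, with ties counted as one half.

```python
def roc_points(scores, labels):
    """ROC curve as (false positive rate, true positive rate) points.

    The threshold sweeps from the highest score down; tied scores are
    processed together, which yields a diagonal segment for ties.
    """
    num_pos = sum(1 for y in labels if y)
    num_neg = len(labels) - num_pos
    if num_pos == 0 or num_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / num_neg, tp / num_pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve (trapezoid rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2.0 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_pairwise(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up detector scores; label 1 = object present, 0 = absent.
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.54]
    labels = [1, 1, 0, 1, 0, 0]
    print(auc_trapezoid(roc_points(scores, labels)), auc_pairwise(scores, labels))
```

An area of 1.0 means every positive outscores every negative, and 0.5 is chance level. Plotting the area against training set size gives a single learning curve per method.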

SLIDE 29

Sharing Simplifies Models

SLIDE 30

Scenes, Objects, and Parts

[Figure: Features, Parts, Objects, Scene]

SLIDE 31

Contextual Transfer Learning

SLIDE 32

Object vs. Visual Categories

  • Assume training data contains object category labels
  • Discover underlying visual categories automatically

SLIDE 33

Multiple Object Scenes

  • How many cars are there?
  • Where are those cars in the scene?

Standard dependent Dirichlet process models (Gelfand et al., 2005) are inappropriate.

SLIDE 34

Spatial Transformations

  • Let global DP clusters model objects in a canonical coordinate frame
  • Generate images via a random set of transformations:
    – Parameterized family of transformations
    – Shift cluster from canonical coordinate frame to object location in a given image

Layered Motion Models (Darrell & Pentland 1991, Wang & Adelson 1994, Jojic & Frey 2001)
Nonparametric Transformation Densities (Learned-Miller & Viola 2000)
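Here is a minimal Python sketch of the transformation idea. For illustration it assumes that each cluster is an isotropic 2D Gaussian over feature positions and that the transformation family is pure translation. Shifting a canonical cluster by ρ then moves its mean to μ + ρ and leaves its shape unchanged. The function names and numbers are hypothetical.

```python
import math
import random


def translate(mean, shift):
    """Move a cluster mean from the canonical frame to an object location."""
    return tuple(m + s for m, s in zip(mean, shift))


def gauss_logpdf(x, mean, var):
    """Log density of an isotropic Gaussian with per-dimension variance var."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * (d * math.log(2.0 * math.pi * var) + sq / var)


def sample_instance(canonical_mean, var, shift, n, rng):
    """Sample n feature positions from one transformed (shifted) cluster."""
    mean = translate(canonical_mean, shift)
    sd = math.sqrt(var)
    return [tuple(rng.gauss(m, sd) for m in mean) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    car = (0.0, 0.0)  # canonical-frame mean of a hypothetical "car" cluster
    # Two instances of the same cluster placed at different image locations.
    for shift in [(20.0, 50.0), (70.0, 55.0)]:
        pts = sample_instance(car, var=4.0, shift=shift, n=3, rng=rng)
        print(shift, [(round(x, 1), round(y, 1)) for x, y in pts])
```

Because the shapes are shared, two cars at different locations reuse one canonical cluster and differ only in their translations. Under translation, scoring a point against a shifted cluster is the same as scoring the un-shifted point against the canonical one.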

SLIDE 35

A Toy World: Bars & Blobs

SLIDE 36

Transformed Dirichlet Process

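[Figure: graphical model for the transformed Dirichlet process, with a global measure G_0 and base measures H and R, transformed group measures G_1, G_2, G_3, …, G_j for J groups, panels labeled "Mixture Parameters" and "Transformations", and N observations v per group]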

SLIDE 37

Importance of Transformations

[Figure panels: HDP, TDP]

SLIDE 38

Counting & Locating Objects

  • How many cars are there?
  • Where are those cars in the scene?

Dirichlet Processes Transformations

SLIDE 39

Visual Scene TDP

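[Figure: graphical model for the visual scene TDP, with three levels:
  • Global density G_0 (base measures H and R): object category, part size & shape, transformation prior
  • Transformed densities G_j for each of J images: object category, part size & shape, instance locations
  • 2D image features (N per image): appearance w, location v]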

SLIDE 40

Street Scene Visual Categories

SLIDE 41

Street Scene Segmentations

SLIDE 42

Segmentation Performance

SLIDE 43

Extension: 3D Scenes

  • Segmentation easier in 3D
  • Identifying known objects regularizes depth estimation

[Figure: office scene, with depth shown from red (far) to green (near)]

SLIDE 44

3D Structure from Stereo

[Figure panels: Reference (left) Image; Potential Matches; Depth Densities]

Overhead View (depth ∝ 1/disparity)
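For a rectified stereo pair, the standard pinhole relation between depth and disparity is Z = f·B/d. Here f is the focal length in pixels, B the baseline between the two cameras, and d the disparity in pixels. The Python sketch below assumes this idealized geometry. The function name and example camera values are hypothetical.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth (in the units of baseline_m) for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Hypothetical camera: 700 px focal length, 10 cm baseline.
    for d in [70.0, 35.0, 7.0]:
        print(f"disparity {d:5.1f} px -> depth {depth_from_disparity(d, 700.0, 0.1):.2f} m")
```

Halving the disparity doubles the depth. As a result, a fixed one-pixel matching error causes much larger depth errors for distant surfaces than for nearby ones.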

SLIDE 45

Greedy Depth Estimates

[Figure panels: Reference (left) Image; Potential Matches; Depth Densities (Red = Far, Green = Near)]

SLIDE 46

3D Transformed DP: Office Scenes

[Figure labels: Computer Screen; Desk; Bookshelves; Background]

SLIDE 47

Stereo Test Image

Simultaneous object recognition & coarse 3D reconstruction