SLIDE 1 Instance-level Recognition
Pingmei Xu
SLIDE 2 Object Recognition
Friends SE01EP02
SLIDE 3 Recognition: Find the Ring!
Friends SE01EP02
SLIDE 4 Recognition: Find the Ring!
Friends SE01EP02
Instance Recognition
SLIDE 5 Recognition: Find the Ring!
Friends SE01EP02
Category Recognition
SLIDE 6 Recognition: Find the Ring!
Friends SE01EP02
Recognition Algorithm
SLIDE 7 Recognition: Find the Ring!
Friends SE01EP02
Scene Understanding
SLIDE 8 History of Ideas in Recognition
Some slides are borrowed from Svetlana Lazebnik
SLIDE 9 The Religious Wars
- …and the standard answer: probably both or neither
Alexei Efros
Geometry vs. Appearance; Parts vs. The Whole
SLIDE 10
1960s ~ late 1990s
the Geometric Era
SLIDE 11 Blocks World (1960s)
- Constrained 3D scene models to allow object recognition
from very simple image features.
L.G. Roberts
SLIDE 12 Generalized Cylinders (1970s)
- Representing 3D shapes and parts in terms of
“Generalized Cylinders”.
SLIDE 13 Recognition by Parts
- Geons: shape primitives + deformations, with predictable
edge properties under perspective.
Biederman (1987) Pentland (1986)
SLIDE 14 Recognition by Parts
- Hypothesis: there is a small number of geometric components
that constitute the primitive elements of the object recognition system.
Biederman (1987)
Primitives (geons) Objects
SLIDE 15 Parts + Spatial Configurations
- There is more to shape than just the right part primitives,
i.e., their spatial relationships.
Fischler & Elschlager (1973)
SLIDE 16 Alignment
Huttenlocher & Ullman (1987)
SLIDE 17
1990s
Appearance-Based
SLIDE 18 Eigenfaces
Turk & Pentland (1991)
SLIDE 19 Color Histogram
Swain & Ballard (1991)
SLIDE 20
1990s ~ present
Sliding Window
SLIDE 21 Sliding Window Approaches
Viola & Jones (2000) Dalal & Triggs (2005)
HOG feature map, Template, Response map
SLIDE 22
late 1990s ~ present
Local Features
SLIDE 24
early 2000s ~ present
Parts and Shape
SLIDE 25 Constellation Models
Weber, Welling & Perona (2000), Fergus, Perona & Zisserman (2003)
SLIDE 26 Deformable Part Model
Felzenszwalb, Girshick, McAllester & Ramanan (2009)
SLIDE 27
mid 2000s ~ present
Bag of Features
SLIDE 28 Bag-of-features Models
Objects Bag of “words”
SLIDE 29
Present Trends
SLIDE 30 Data-driven Method
Malisiewicz et al. (2011)
SLIDE 31 Recognition from RGBD Images
Shotton et al. (2011)
SLIDE 32 Deep Learning
http://deeplearning.net/
SLIDE 33 History of Ideas in Recognition
- 1960s – late 1990s: the geometric era
- 1990s: appearance-based models
- 1990s – present: sliding window approaches
- late 1990s – present: local features
- early 2000s – present: parts-and-shape models
- mid 2000s – present: bags of features
- present trends: “big data”, context, attributes, combining
geometry and recognition, advanced scene understanding tasks, deep learning
SLIDE 34 History of Ideas in Recognition
?
THE COMPUTER VISION EVOLUTION
SLIDE 35 3D Object Modeling and Recognition
A test image Instances of 5 models
Rothganger, Lazebnik, Schmid, & Ponce (2006)
SLIDE 36
3D Modeling: Pairwise Matching
SLIDE 37 Affine Patches
- Idea: although smooth surfaces are almost never planar
in the large, they are always planar in the small.
STEP 1: Detection. Detect salient image regions.
STEP 2: Description. Extract a descriptor.
SLIDE 38 Affine Patches
Harris-Laplacian DoG
SLIDE 39
Affine Patches
SLIDE 40
Affine Patches
SLIDE 41
Affine Patches
SLIDE 42
Affine Patches
SLIDE 44
Geometric Constraints
SLIDE 45 3D Object Modeling: Matching Procedure
RANSAC: 1) sampling stage 2) consensus stage
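The two stages can be illustrated on a toy problem. The sketch below is not the matching procedure of the paper: it assumes the model is a plain 2D translation, so a single match is a sufficient seed, and the function name `ransac_translation` and its parameters are hypothetical.

```python
import random

def ransac_translation(matches, n_iters=100, tol=2.0, seed=0):
    """Toy RANSAC: estimate a 2D translation from noisy point matches.

    matches: list of ((x, y), (x2, y2)) candidate correspondences.
    Returns (translation, inliers) for the largest consensus set found.
    """
    rng = random.Random(seed)
    best_t, best_inliers = None, []
    for _ in range(n_iters):
        # 1) Sampling stage: draw a minimal seed (one match fixes a translation).
        (x, y), (x2, y2) = rng.choice(matches)
        t = (x2 - x, y2 - y)
        # 2) Consensus stage: collect every match that agrees with the hypothesis.
        inliers = [
            m for m in matches
            if abs(m[1][0] - m[0][0] - t[0]) <= tol
            and abs(m[1][1] - m[0][1] - t[1]) <= tol
        ]
        if len(inliers) > len(best_inliers):
            best_t, best_inliers = t, inliers
    return best_t, best_inliers


good = [((i, 2 * i), (i + 5, 2 * i - 3)) for i in range(8)]  # true shift (5, -3)
bad = [((0, 0), (40, 40)), ((1, 1), (-30, 7))]               # outliers
t, inliers = ransac_translation(good + bad)
```

A richer model (for example an affine or projective transform) would need a larger seed and a geometric residual in the consensus test, but the loop structure stays the same.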
SLIDE 46
Application: 3D Object Modeling
SLIDE 47 3D Modeling: Input Images
- The 20 images used to construct the teddy bear model.
SLIDE 48 3D Modeling: Partial Model from Image Pairs
- Matches between two images.
SLIDE 49 3D Modeling: Partial Model→Composite Ones
Convert a collection of matches to a 3D model
- 1. Chaining
- 2. Stitching
- 3. Bundle adjustment
- 4. Euclidean upgrade
SLIDE 50 3D Modeling: Partial→Composite: Chaining
Chaining: link matches across multiple images.
- Construction of the patch-view matrix.
A (subsampled) patch-view matrix for the teddy bear. Each black square indicates the presence of a given patch in a given image.
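One way to picture chaining is as a connected-components problem over pairwise matches. The sketch below is an assumption about how such a matrix could be assembled, not the paper's implementation; `build_patch_view_matrix` and the `(image, feature)` tuple encoding are hypothetical.

```python
def build_patch_view_matrix(pairwise_matches, n_images):
    """Chain pairwise matches into tracks and build a patch-view matrix.

    pairwise_matches: list of ((image_a, feature_a), (image_b, feature_b)).
    Returns one row per chained patch; row[j] is the feature id of that
    patch in image j, or None if the patch is not seen in image j.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # Link the two endpoints of every pairwise match (union step).
    for a, b in pairwise_matches:
        parent[find(a)] = find(b)

    # Group all features by the root of their chain.
    tracks = {}
    for node in parent:
        tracks.setdefault(find(node), []).append(node)

    matrix = []
    for members in tracks.values():
        row = [None] * n_images
        for image, feature in members:
            row[image] = feature  # an inconsistent chain would overwrite here
        matrix.append(row)
    return matrix


matches = [((0, 3), (1, 7)), ((1, 7), (2, 1)), ((0, 4), (2, 9))]
matrix = build_patch_view_matrix(matches, n_images=3)
```

Rows with `None` entries correspond to the missing data that the stitching step has to cope with.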
SLIDE 51 3D Modeling: Partial→Composite: Stitching
Stitching: solve for the affine structure and motion while coping with missing data.
Common patches of adjacent modeling views presented in a common coordinate frame.
SLIDE 52 3D Modeling: Partial→Composite: Bundle Adjustment
Bundle adjustment: refine the model using non-linear least squares.
The bear model along with the recovered affine camera configurations.
SLIDE 53
3D Modeling: Object Gallery
SLIDE 54
Application: 3D Object Recognition
SLIDE 55 3D Object Recognition: Select Potential Matches
Features: 1) a measure of the contrast (average squared gradient norm) in the patch 2) a 10 × 10 color histogram drawn from the UV portion of YUV space, and 3) SIFT
SLIDE 56 3D Object Recognition: Robust Estimation
Here |P| denotes the size of the set P of match hypotheses, K is the number of best matches kept per model patch, M is the number of samples drawn, and N is the size of one seed.
SLIDE 57 3D Object Recognition: Object Detection
- Criteria used to decide whether the object is present or not:
- Measure of distortion: reflects how close this matrix is to
the top part of a scaled rotation matrix.
SLIDE 58 3D Object Recognition: Successful Examples
SLIDE 59 3D Object Recognition: Failed Examples
SLIDE 60 3D Object Modeling and Recognition
- Paper: 3D Object Modeling and Recognition Using Local Affine-Invariant
Image Descriptors and Multi-View Spatial Constraints. F. Rothganger, S. Lazebnik, C. Schmid, and J. Ponce, IJCV 2006
Input → Pairwise Matching → 3D Modeling; Test Image → 3D Recognition
SLIDE 61
Large Scale Retrieval
SLIDE 62 Large Scale Image Retrieval
- Combining local features, indexing, and spatial
constraints
SLIDE 63 Video Google (VG)
- Is the text retrieval approach applicable to object
recognition?
Query Search Results
Sivic et al. (2003), Philbin et al. (2007, 2008), Chum et al. (2007)
SLIDE 64 Text Retrieval
Word Stem Document Corpus
SLIDE 65 The Visual Analogy
Word → ?, Stem → ?, Document → Frame/Image, Corpus → Film/Image Set
SLIDE 66 VG: Local Descriptor
- Viewpoint covariant regions:
1) ’Maximally Stable’ (yellow) 2) ’Shape Adapted’ (cyan)
SLIDE 67 The Visual Analogy
Word → Descriptor, Stem → ?, Document → Frame/Image, Corpus → Film/Image Set
SLIDE 68 VG: Visual Vocabulary
- Vector quantize the descriptors into clusters by k-means.
Affine covariant regions Clusters
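A minimal k-means sketch shows how descriptors become visual words. It uses tiny 2D points in place of real 128-D SIFT descriptors; the function names, the iteration count, and the random initialisation are assumptions made for illustration.

```python
import random

def nearest_word(descriptor, centroids):
    """Index of the closest centroid (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(descriptor, centroids[j])),
    )

def kmeans(descriptors, k, n_iters=20, seed=0):
    """Plain Lloyd iterations; returns the k centroids (the vocabulary)."""
    rng = random.Random(seed)
    centroids = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(n_iters):
        # Assignment step: each descriptor goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest_word(d, centroids)].append(d)
        # Update step: move each centroid to the mean of its cluster.
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


descriptors = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
vocabulary = kmeans(descriptors, k=2)
word_a = nearest_word((0.2, 0.1), vocabulary)   # quantize a new descriptor
word_b = nearest_word((10, 10.5), vocabulary)
```

Quantizing a new descriptor is then just a nearest-centroid lookup, which is what `nearest_word` does.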
SLIDE 69 VG: Visual Words
- Each group of patches belongs to the same visual word.
SLIDE 70 The Visual Analogy
Word → Descriptor, Stem → Centroid, Document → Frame/Image, Corpus → Film/Image Set
SLIDE 71 Image Retrieval Using Visual Words
- Vocabulary construction (offline)
- Database construction (offline)
- Image retrieval (online)
SLIDE 72 VG: Stop List
- The most frequent visual words that occur in almost all
images are suppressed.
Before stop list→ After stop list→
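A stop list can be sketched as a document-frequency cutoff. The threshold `top_fraction` below is an assumed parameter, not a value taken from the slide, and both function names are hypothetical.

```python
from collections import Counter

def build_stop_list(documents, top_fraction=0.05):
    """Return the most frequent visual words, measured by document frequency.

    documents: list of lists of visual word ids (one list per image).
    """
    doc_freq = Counter()
    for words in documents:
        doc_freq.update(set(words))  # count each word once per image
    n_stop = max(1, int(len(doc_freq) * top_fraction))
    return {word for word, _ in doc_freq.most_common(n_stop)}

def apply_stop_list(words, stop_list):
    """Drop stopped words from one image's word list."""
    return [w for w in words if w not in stop_list]


documents = [[1, 2, 3], [1, 4], [1, 2, 5], [1, 6]]
stop_list = build_stop_list(documents, top_fraction=0.2)
filtered = apply_stop_list(documents[0], stop_list)
```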
SLIDE 73 VG: Soft Assignment
- Count in one bin is spread to neighbouring bins.
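One possible reading of soft assignment is sketched below: each descriptor contributes to its few nearest visual words with weights that decay with distance. The Gaussian weighting, `n_nearest`, and `sigma` are assumptions chosen for illustration; the slide does not specify them.

```python
import math

def soft_assign(descriptor, centroids, n_nearest=3, sigma=1.0):
    """Spread one descriptor's unit count over its nearest visual words.

    Returns {word_id: weight} with weights summing to 1.
    """
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(descriptor, c)), word_id)
        for word_id, c in enumerate(centroids)
    )[:n_nearest]
    d_min = dists[0][0]  # subtracting the minimum avoids underflow to all zeros
    weights = {
        word_id: math.exp(-(d2 - d_min) / (2 * sigma ** 2))
        for d2, word_id in dists
    }
    total = sum(weights.values())
    return {word_id: w / total for word_id, w in weights.items()}


centroids = [(0, 0), (1, 0), (5, 5)]
weights = soft_assign((0.4, 0.0), centroids, n_nearest=2)
```

Hard assignment is the special case `n_nearest=1`, where the single nearest word receives the whole count.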
SLIDE 74 Vocabulary Construction Summary
- Select a subset of 48 shots (10k frames = 10% of the movie)
- Select regions (SA+MS) (10k frames × 1600 = 16,000,000 regions)
- Frame tracking
- Reject unstable regions (~200k regions)
- SIFT descriptors
- Cluster descriptors using k-means
- Parameter tuning
SLIDE 75 Image Retrieval Using Visual Words
- Vocabulary construction (offline)
- Database construction (offline)
- Image retrieval (online)
SLIDE 76 tf-idf Vector
- Documents -> vectors of word frequencies
- Term frequency – inverse document frequency
- Downweight words that appear often in the database
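$$t_i = \frac{n_{id}}{n_d} \log \frac{N}{n_i}$$
where n_id is the number of occurrences of word i in document d, n_d is the number of words in document d, N is the total number of documents in the database, and n_i is the total number of occurrences of word i in the database.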
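The weighting can be sketched in a few lines, taking the weight of word i in document d to be (n_id / n_d) · log(N / n_i), with n_i counted as total occurrences in the database as labelled on the slide. The function name `tfidf_vectors` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """Compute a sparse tf-idf vector (dict word -> weight) per document.

    documents: list of lists of visual word ids.
    """
    n_docs = len(documents)  # N
    # n_i: total occurrences of word i in the database (as on the slide).
    # Many implementations use the number of documents containing i instead.
    occurrences = Counter(w for words in documents for w in words)
    vectors = []
    for words in documents:
        counts = Counter(words)  # n_id for every word i in this document
        n_d = len(words)
        vectors.append({
            w: (n_id / n_d) * math.log(n_docs / occurrences[w])
            for w, n_id in counts.items()
        })
    return vectors


documents = [[1, 1, 2], [2, 3], [2, 4]]
vectors = tfidf_vectors(documents)
```

In this toy database word 2 occurs three times across three documents, so its weight is zero everywhere: a word that appears that often carries no discriminative information.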
SLIDE 77 VG: Inverted File Index
- Word -> a list of all documents (with frequencies)
Figure: inverted file mapping word IDs 1 … N to lists of document IDs 1 … K.
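An inverted file can be sketched as a dictionary from word id to a posting list. The layout below, a list of `(doc_id, frequency)` pairs per word, is one reasonable assumption; `build_inverted_index` is a hypothetical name.

```python
from collections import Counter, defaultdict

def build_inverted_index(documents):
    """Map each visual word to the documents that contain it.

    documents: list of lists of visual word ids; the list position is the doc id.
    Returns {word_id: [(doc_id, frequency), ...]} with doc ids in ascending order.
    """
    index = defaultdict(list)
    for doc_id, words in enumerate(documents):
        for word_id, freq in Counter(words).items():
            index[word_id].append((doc_id, freq))
    return dict(index)


documents = [[2, 2, 7], [7], [2, 9]]
index = build_inverted_index(documents)
```

At query time only the posting lists of the words present in the query are touched, which is why lookup cost depends on the query rather than on the database size.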
SLIDE 78 Crawling Movies Summary
Select key frames → Select regions (SA+MS) → Frame tracking → Reject unstable regions → SIFT descriptors → Vector quantization → Stop list → tf-idf weighting → Inverted index
Stages: Vocabulary Construction, Database Construction
SLIDE 79 Image Retrieval Using Visual Words
- Vocabulary construction (offline)
- Database construction (offline)
- Image retrieval (online)
- Building query vector
- Ranking results
SLIDE 80 Querying
Figure: visual words of the query image (w2, w7, w9, …) are looked up in the inverted file to find database images (1, 2, 3, 17, …).
Inverted file:
word id   image ids
1         8
2         1, 17
3         12
4         3, 10
5         6, 9
6         1, 4, 5, 7
7         9, 16
…
N
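The lookup in the figure can be sketched as simple voting. The inverted file entries below are copied from the slide; the query word list and the function `vote` are hypothetical, and a full system would rank by tf-idf similarity rather than raw vote counts.

```python
from collections import Counter

# Inverted file from the slide: word id -> image ids containing that word.
inverted_file = {
    1: [8],
    2: [1, 17],
    3: [12],
    4: [3, 10],
    5: [6, 9],
    6: [1, 4, 5, 7],
    7: [9, 16],
}

def vote(query_words, inverted_file):
    """Rank database images by how many distinct query words they share."""
    votes = Counter()
    for word in set(query_words):
        for image_id in inverted_file.get(word, []):
            votes[image_id] += 1
    return votes.most_common()


ranking = vote([2, 6, 7], inverted_file)  # hypothetical query words
```

Image 1 ranks first here because it is the only image listed under two of the query words (2 and 6).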
SLIDE 81 VG: Rerank by Spatial Consistency
- Neighbouring matches in the query region lie in a
surrounding area in the retrieved frame.
Before → After →
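A simplified version of this check is sketched below: each match receives one vote for every match that is among its k nearest neighbours both in the query and in the retrieved frame. This is an assumed simplification for illustration, with hypothetical names, not the exact scoring used in Video Google.

```python
def spatial_consistency_votes(matches, k=2):
    """Score matches by neighbourhood agreement between query and frame.

    matches: list of ((qx, qy), (fx, fy)) matched region positions.
    Returns a list with one vote count per match.
    """
    def knn(index, side):
        # Indices of the k matches closest to `index` on the given side
        # (side 0 = query positions, side 1 = frame positions).
        px, py = matches[index][side]
        others = [j for j in range(len(matches)) if j != index]
        others.sort(
            key=lambda j: (matches[j][side][0] - px) ** 2
            + (matches[j][side][1] - py) ** 2
        )
        return set(others[:k])

    return [len(knn(i, 0) & knn(i, 1)) for i in range(len(matches))]


matches = [
    ((0, 0), (10, 5)), ((1, 0), (11, 5)), ((2, 0), (12, 5)), ((3, 0), (13, 5)),
    ((-1, 0), (20, 5)),  # mismatch: lands on the wrong side in the frame
]
votes = spatial_consistency_votes(matches)
frame_score = sum(votes)
```

Matches with zero votes can be rejected, and frames can be re-ranked by the total number of votes.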
SLIDE 82 VG: Query Expansion
Query image, New query, Results, Spatial verification, New results
Ondrej Chum
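One simple form of query expansion averages the original query vector with the vectors of spatially verified results and re-issues the query. The sketch assumes sparse vectors stored as dicts; `expand_query` is a hypothetical name and averaging is only one of several possible expansion strategies.

```python
def expand_query(query_vec, verified_vecs):
    """Average the query vector with the vectors of verified results.

    query_vec: dict word_id -> weight for the original query.
    verified_vecs: list of such dicts for spatially verified results.
    """
    total = dict(query_vec)
    for vec in verified_vecs:
        for word_id, weight in vec.items():
            total[word_id] = total.get(word_id, 0.0) + weight
    n = 1 + len(verified_vecs)
    return {word_id: weight / n for word_id, weight in total.items()}


new_query = expand_query({1: 1.0}, [{1: 1.0, 2: 2.0}])
```

The new query contains word 2, which was absent from the original query, so it can retrieve images that the first pass missed.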
SLIDE 83 “Google Like” Object Query
- 1. Compute descriptors
- 2. Build query vector
- 3. Use the inverted index to find relevant frames
- 4. Calculate distance to relevant frames
- 5. Rank results
- 6. Re-rank by spatial consistency
- 7. Query expansion (optional)
SLIDE 84 Large Scale Image Retrieval
- Vocabulary construction (offline)
- extract affine covariant regions
- compute descriptor
- cluster the descriptors into visual words (k-means)
- Database construction (offline)
- tf-idf vectors for each document
- inverted indices from visual words to images
- Image retrieval (online)
- extract visual words and compute a tf-idf vector for the query image
- retrieve the top image candidates
- (optional) re-rank matches using spatial consistency
- (optional) expand the answer set by re-submitting highly ranked matches
Richard Szeliski, 2010
SLIDE 85
Instance Recognition Application
SLIDE 86 Planar Object Recognition
- Recognition of flat textured objects (CD covers, book
covers, etc.)
SLIDE 87
Fingerprint Recognition
SLIDE 88 Automatic Discovery of Landmarks
Quack, Leibe & VanGool (2008)
SLIDE 89 Cross-domain Image Matching
Shrivastava et al. (2011)
SLIDE 90 Summary
- History of ideas in Recognition
- 3D object modeling and recognition
- Large scale image retrieval
- Instance recognition application
SLIDE 91
Thank You