3D Object Representations for Fine-Grained Categorization
Jonathan Krause, Michael Stark, Jia Deng, Li Fei-Fei
What is this?
- Car
- Sedan
- BMW Sedan
- BMW 3-Series Sedan
- 2013 BMW 3-Series Sedan
- 2013 BMW 3-Series Sedan 328i
Difficulty
How many classes are there?
Why 3D?
Related Work
- Many works on fine-grained recognition and 3D recognition
- Birdlets
– 3D volumetric bird model
– Pose normalization
– Extensive training annotations
Birdlets: Subordinate Categorization Using Volumetric Primitives and Pose-Normalized Appearance. R. Farrell, O. Oza, N. Zhang, V. I. Morariu, T. Darrell, L. S. Davis. ICCV 2011
Method Overview
- 1. Estimate 3D geometry
- 2. Calculate appearance w.r.t. geometry
- 3. Use appearance in 3D representation
Getting 3D Geometry
- Train geometry classifier from synthetic data
– Generate synthetic data from CAD models
– Group synthetic data by azimuth, elevation, and coarse type (sedan, coupe, convertible, SUV, pickup, hatchback, station wagon)
– SVM
- At test time use multiple hypotheses
[Figure panels: base HOG features; learned classifier]
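A minimal sketch of the "multiple hypotheses" step, assuming the geometry classifier yields one score per (azimuth, elevation, coarse type) class. The function name, the score values, and the choice of k are hypothetical; the slides do not state how many hypotheses are kept.

```python
import heapq

def top_k_hypotheses(scores, k=3):
    """Return the k highest-scoring geometry classes.

    scores maps (azimuth_deg, elevation_deg, coarse_type) -> classifier score.
    This layout is an assumption made for illustration.
    """
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])

# Made-up scores for four geometry classes.
scores = {
    (0, 10, "sedan"): 1.2,
    (10, 10, "sedan"): 0.9,
    (180, 10, "sedan"): 0.7,
    (0, 10, "coupe"): -0.3,
}
hypotheses = top_k_hypotheses(scores, k=2)
for geometry, score in hypotheses:
    print(geometry, score)
```

Keeping several candidates lets later stages compute appearance under each geometry instead of committing to a single, possibly wrong, viewpoint estimate.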
Synthetic Data
- 41 CAD models
- 36 azimuths
- 4 elevations
- 10 backgrounds
- 59,040 synthetic images (41 × 36 × 4 × 10) with full 3D annotations
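The image count follows from rendering every combination of the four factors. A short sketch that enumerates the render configurations; the even 10-degree azimuth spacing is an assumption, since the slides give only the number of azimuths.

```python
from itertools import product

N_CAD_MODELS = 41
AZIMUTHS = [i * 10 for i in range(36)]  # assumed: evenly spaced, 10 degrees apart
N_ELEVATIONS = 4
N_BACKGROUNDS = 10

# One render per (model, azimuth, elevation, background) combination.
render_configs = list(
    product(range(N_CAD_MODELS), AZIMUTHS, range(N_ELEVATIONS), range(N_BACKGROUNDS))
)
print(len(render_configs))  # 59040
```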
Appearance
- Sample patches directly from 3D surface
- Rectify patches for viewpoint invariance
3D Representation 1: SPM-3D
- Extension of Spatial Pyramid Matching to 3D
- 1. Compute features for each patch
- 2. Pool over regions on object surface
We use 1x1, 2x2, and 4x4 pooling regions
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. S. Lazebnik, C. Schmid, J. Ponce. CVPR 2006
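The two steps above can be sketched as max pooling over a pyramid of grids laid out on the object surface. This is an illustration under stated assumptions, not the authors' implementation: the (u, v) surface parameterization in [0, 1), the use of max pooling, and non-negative features (so that 0 is a safe empty-cell value) are all assumed here.

```python
def spm_pool(patches, levels=(1, 2, 4)):
    """Max-pool patch features over 1x1, 2x2, and 4x4 grids.

    patches: list of (u, v, feature), with u, v in [0, 1) as hypothetical
    surface coordinates and feature a list of non-negative floats.
    Returns the concatenation of pooled features over all grid cells.
    """
    dim = len(patches[0][2])
    pooled = []
    for n in levels:
        cells = [[0.0] * dim for _ in range(n * n)]
        for u, v, feat in patches:
            # Row-major cell index; clamp so u or v == 1.0 stays in range.
            idx = min(int(v * n), n - 1) * n + min(int(u * n), n - 1)
            cells[idx] = [max(a, b) for a, b in zip(cells[idx], feat)]
        for cell in cells:
            pooled.extend(cell)
    return pooled

patches = [(0.1, 0.1, [1.0, 0.0]), (0.9, 0.9, [0.0, 2.0])]
pooled = spm_pool(patches)
print(len(pooled))  # (1 + 4 + 16) cells x 2 dims = 42
```

Because the grid lives on the surface rather than in the image, the same cell covers the same physical part of the car regardless of viewpoint.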
3D Representation 2: BB-3D
- 3D version of randomized BubbleBank [Deng et al. CVPR 2013]
- BB-2D: random templates + local pooling regions
Fine-Grained Crowdsourcing for Fine-Grained Recognition. J. Deng, J. Krause, L. Fei-Fei. CVPR 2013
BubbleBank-3D
- 1. Randomly sample templates
- 2. Pool over local 3D region
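A rough sketch of the two steps, assuming each patch and each template carries a 3D surface location and a descriptor, and that the template response is a dot product. The helper names and the `radius` parameter are hypothetical; passing `radius=None` is this sketch's stand-in for pooling over the whole surface.

```python
import math
import random

def sample_templates(training_patches, n, seed=0):
    """Step 1: pick n templates at random from training patches."""
    return random.Random(seed).sample(training_patches, n)

def bb3d_features(patches, templates, radius=None):
    """Step 2: for each template, max response over nearby patches.

    patches, templates: lists of (xyz, descriptor).
    radius: pooling radius in 3D around the template location;
            None pools over every patch on the surface.
    """
    feats = []
    for t_xyz, t_desc in templates:
        responses = [
            sum(a * b for a, b in zip(t_desc, p_desc))
            for p_xyz, p_desc in patches
            if radius is None or math.dist(t_xyz, p_xyz) <= radius
        ]
        feats.append(max(responses, default=0.0))
    return feats

patches = [((0, 0, 0), [1.0, 0.0]), ((1, 0, 0), [0.0, 1.0])]
templates = [((0, 0, 0), [0.0, 2.0])]
local_feats = bb3d_features(patches, templates, radius=0.5)
global_feats = bb3d_features(patches, templates)
print(local_feats, global_feats)
```

Each template contributes one feature dimension, so the descriptor length equals the number of sampled templates.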
Fine-Grained Car Datasets
- Existing datasets are small and not very fine-grained
– car-types: 14 classes, variety of coarse categories
- Two new datasets:
– BMW-10: Ten classes, ultra-fine-grained
– car-197: 197 classes, much bigger
- In terms of # images:
[Figure: dataset sizes by number of images (car-types, car-197)]
Fine-Grained Categorization for Scene Understanding. M. Stark, J. Krause, B. Pepik, D. Meger, J. J. Little, B. Schiele, D. Koller. BMVC 2012
BMW-10
- 10 types of BMWs, 512 images, many viewpoints, bounding boxes, hand-curated
Car-197
- 197 car models, 16,185 images
- Collected very carefully on Amazon Mechanical Turk (AMT)
- Slightly modified version in FGComp
- Standalone dataset out soon
Fine-Grained Challenge 2013. http://sites.google.com/site/fgcomp2013
Experiments: BMW-10
[Figure: accuracy on BMW-10]
3D works!
BB-3D: Local vs. Global
- BB-3D-L: 64.7%, BB-3D-G: 66.1%
- Why global pooling can work:
– More robust w.r.t. difficult viewpoints
– Left-right symmetry
Experiments: car-types
[Figure: accuracy on car-types]
Still works!
Experiments: car-197
[Figure: accuracy on car-197 for LLC+SPM, SPM-3D, BB, BB-3D-G, and Stacked]
- The problems:
– Underrepresentation of some types of CAD models
– Template vs. codebook approaches with many classes
- The silver lining: Stacking helps a lot :)
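The slides do not say how the stacking was done. As a minimal stand-in, the sketch below learns a single mixing weight between two classifiers' per-class scores on held-out data; the function names and the toy scores are hypothetical, and a real stacked model would more likely train a second-level classifier on the concatenated scores.

```python
def fuse(scores_a, scores_b, w):
    """Convex combination of two per-class score vectors."""
    return [w * a + (1 - w) * b for a, b in zip(scores_a, scores_b)]

def accuracy(rows_a, rows_b, labels, w):
    """Fraction of examples whose fused argmax matches the label."""
    correct = 0
    for sa, sb, y in zip(rows_a, rows_b, labels):
        fused = fuse(sa, sb, w)
        correct += fused.index(max(fused)) == y
    return correct / len(labels)

def fit_stacking_weight(val_a, val_b, val_labels, steps=10):
    """Pick the weight with the best held-out accuracy via grid search."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda w: accuracy(val_a, val_b, val_labels, w))

# Toy held-out scores: each classifier alone gets one of two examples right.
val_a = [[0.9, 0.1], [0.6, 0.4]]
val_b = [[0.4, 0.6], [0.2, 0.8]]
val_labels = [0, 1]
w = fit_stacking_weight(val_a, val_b, val_labels)
print(w, accuracy(val_a, val_b, val_labels, w))
```

In the toy data the fused scores classify both examples correctly even though each base classifier alone gets only one right, which is the effect stacking relies on.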
Discriminative Bubbles
Discriminative power of templates in BB-3D (BMW-10): Discriminative features at front/back!
Size/color proportional to
Bonus: Ultra-Wide Baseline Matching
- Measures ability to localize 3D points across viewpoints
- Use BB-3D-L + RANSAC for correspondences
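To illustrate the RANSAC step, here is a deliberately simplified sketch that fits a 2D translation to putative correspondences and keeps the largest consensus set. The translation model, tolerance, and iteration count are assumptions chosen for brevity; matching across very different viewpoints would need a richer geometric model than this.

```python
import random

def ransac_translation(matches, n_iters=100, tol=2.0, seed=0):
    """Fit a 2D translation to matches with RANSAC.

    matches: list of ((x1, y1), (x2, y2)) putative correspondences.
    Returns (best_translation, inlier_indices).
    """
    rng = random.Random(seed)
    best_t, best_inliers = None, []
    for _ in range(n_iters):
        # A translation is fully determined by a single correspondence.
        (x1, y1), (x2, y2) = rng.choice(matches)
        t = (x2 - x1, y2 - y1)
        inliers = [
            i for i, ((ax, ay), (bx, by)) in enumerate(matches)
            if abs(ax + t[0] - bx) <= tol and abs(ay + t[1] - by) <= tol
        ]
        if len(inliers) > len(best_inliers):
            best_t, best_inliers = t, inliers
    return best_t, best_inliers

matches = [
    ((0, 0), (5, -3)), ((10, 2), (15, -1)), ((3, 7), (8, 4)), ((6, 1), (11, -2)),
    ((4, 4), (40, 40)), ((9, 9), (0, 20)),  # two gross outliers
]
t, inliers = ransac_translation(matches)
print(t, inliers)
```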
Experiments: Ultra-Wide Baseline Matching
- On 3D Object Classes
- Works well, state of the art for some baselines
3D Generic Object Categorization, Localization, and Pose Estimation. S. Savarese, L. Fei-Fei. ICCV 2007
[24] 3D²PM – 3D Deformable Part Models. B. Pepik, P. Gehler, M. Stark, B. Schiele. ECCV 2012
[37] Revisiting 3D Geometric Models for Accurate Object Shape and Pose. M. Z. Zia, M. Stark, B. Schiele, K. Schindler. 3DRR 2011
BB-3D-S: Single geometry hypothesis
BB-3D-M: Multiple geometry hypotheses
But Wait, There’s More: Reconstruction of Category
- Same fine-grained category, different instances, backgrounds, lighting, etc.
- Pipeline: BB-3D-L for point correspondences → VisualSFM for bundle adjustment
Conclusion
- Lifted two representations to 3D (SPM-3D, BB-3D), which are state of the art on two fine-grained datasets
- Two new fine-grained datasets of cars
- Promising initial results on ultra-wide baseline