Render for CNN:
Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views
Hao Su* Charles R. Qi* Yangyan Li Leonidas J. Guibas
[Chart: ILSVRC image classification top-5 error (%): 2010: 28.2, 2011: 25.8, 2012: 16.4, 2013: 11.7, 2014: 6.7, 2015: 3.6]
Viewpoint parameters: azimuth, elevation, in-plane rotation
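As a sketch of how these three angles determine a camera rotation (the axis conventions and function name here are assumptions for illustration, not the paper's exact parameterization):

```python
import numpy as np

def viewpoint_to_rotation(azimuth, elevation, theta):
    """Compose a camera rotation from azimuth, elevation, and
    in-plane rotation (all in radians). Axis conventions vary
    between datasets; this follows one common choice."""
    # Rotation about the world up-axis (azimuth)
    Ra = np.array([[np.cos(azimuth), -np.sin(azimuth), 0.0],
                   [np.sin(azimuth),  np.cos(azimuth), 0.0],
                   [0.0, 0.0, 1.0]])
    # Rotation lifting the camera by the elevation angle
    Re = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(elevation), -np.sin(elevation)],
                   [0.0, np.sin(elevation),  np.cos(elevation)]])
    # In-plane rotation about the viewing axis
    Rt = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
    return Rt @ Re @ Ra
```

The composition order matters; swapping it changes which axis each angle rotates about.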
Images in the wild: 3D models unknown
AlexNet [Krizhevsky et al.]
PASCAL3D+ dataset [Xiang et al.]
Step 1: Choose a similar model
Step 2: Coarse viewpoint labeling
Step 3: Label keypoints for alignment
Annotation takes ~1 min per object
30K images with viewpoint labels in the PASCAL3D+ dataset [Xiang et al.]
AlexNet [Krizhevsky et al.]: 60M parameters
Manual alignment by annotators vs. automatic alignment through rendering
[Chart: growth of 3D shape repositories, from ~1,000 models (PSB '05) through SHREC '14 and ModelNet '15 toward ~1,000,000 (ShapeNet, ongoing)]
Training: ShapeNet models -> rendering -> synthetic images with viewpoint labels
Testing: real images -> predicted viewpoint
I want data! How can we render data with both quantity and quality?
[Chart: rendering quality vs. quantity; previous works fall short of the ideal sweet spot of both high quality and high quantity]
Story Time!
Randomize lighting: 47% -> 74%
Add backgrounds: 74% -> 86%
Bbox crop + texture: 86% -> 93%
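The "add backgrounds" step amounts to alpha-blending the rendering over a sampled background image. A minimal NumPy sketch (the function name and float-in-[0,1] array conventions are assumptions):

```python
import numpy as np

def composite(render_rgba, background_rgb):
    """Alpha-blend an RGBA rendering over a background image.
    Both are float arrays in [0, 1] with matching spatial size."""
    rgb = render_rgba[..., :3]
    alpha = render_rgba[..., 3:4]  # keep last dim for broadcasting
    return alpha * rgb + (1.0 - alpha) * background_rgb
```

Fully opaque pixels keep the rendered object; fully transparent pixels show the background, so the object appears naturally embedded in a real scene.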
Synthesis pipeline: 3D model -> rendering -> add background -> crop
- Sample lighting and camera parameters: camera parameters from a KDE fit on the PASCAL3D+ train set; lighting parameters randomly sampled
- Sample a background image; alpha-blend the rendering over it
- Sample cropping parameters: cropping patterns from a KDE fit on the PASCAL3D+ train set
Hyper-parameters are estimated from real images.
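Sampling camera or cropping parameters from a KDE fit to real-image statistics can be sketched as follows (the (azimuth, elevation, distance) tuple and the placeholder data are assumptions; the paper fits its KDEs to PASCAL3D+ annotations):

```python
import numpy as np
from scipy.stats import gaussian_kde

# Placeholder (azimuth, elevation, distance) tuples standing in for
# statistics measured on the PASCAL3D+ train set.
rng = np.random.default_rng(0)
real_params = rng.normal(loc=[180.0, 10.0, 3.0],
                         scale=[60.0, 5.0, 0.5], size=(500, 3))

kde = gaussian_kde(real_params.T)   # gaussian_kde expects (dims, samples)
samples = kde.resample(10, seed=1)  # draw 10 new parameter tuples
for azimuth, elevation, distance in samples.T:
    pass  # feed each sampled tuple to the renderer
```

Drawing parameters from a KDE rather than a uniform grid makes the synthetic viewpoint distribution mimic the real test-time distribution.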
Metric: viewpoint accuracy and median angle error (lower is better). Real test images from the PASCAL3D+ dataset.
Our model trained on rendered images outperforms the state-of-the-art model trained on real images in PASCAL3D+.
[Chart: viewpoint median error, Vps&Kps (CVPR'15) vs. RenderForCNN (ours)]
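The median-angle-error metric is the geodesic distance between ground-truth and predicted rotation matrices, aggregated by the median over the test set; a small sketch (the helper names are assumptions):

```python
import numpy as np

def Rz(deg):
    """Rotation about the z-axis by `deg` degrees."""
    r = np.radians(deg)
    return np.array([[np.cos(r), -np.sin(r), 0.0],
                     [np.sin(r),  np.cos(r), 0.0],
                     [0.0, 0.0, 1.0]])

def angle_error_deg(R_gt, R_pred):
    """Geodesic distance between two rotation matrices, in degrees."""
    cos = (np.trace(R_gt.T @ R_pred) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# A pure 30-degree azimuth mistake gives a 30-degree geodesic error;
# the reported metric is the median of this error over the test set.
print(angle_error_deg(Rz(0), Rz(30)))  # ≈ 30.0
```

The clip guards against `arccos` receiving values slightly outside [-1, 1] from floating-point round-off.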
[Chart: viewpoint accuracy vs. number of 3D models per category (10, 91, 1,000, 6,928); going from 10 to 1,000 models yields a 20%+ accuracy gain]
[Qualitative results: estimated-view confidence over 0-360 degrees vs. ground-truth view for airplane, bicycle, boat, motorbike, car, table, chair, monitor, sofa]
Failure cases: sofa occluded by people; car occluded by a motorbike; ambiguous car viewpoint; ambiguous chair viewpoint; multiple cars; multiple chairs
Images rendered from 3D models can be used effectively to train CNNs, especially for 3D tasks; state-of-the-art results have been achieved. Keys to success: large model collections (ShapeNet) and a carefully designed synthesis pipeline (lighting, backgrounds, cropping).
http://shapenet.cs.stanford.edu