

  1. Synthesizing 3D Shapes via Modeling Multi-View Depth Maps and Silhouettes with Deep Generative Networks ● Amir A. Soltani, Haibin Huang, Jiajun Wu, Tejas Kulkarni, Josh Tenenbaum ● 07/21/2017 (figure panels: Samples; Out-of-Sample Generalization)

  2. Motivation - Autonomous Vehicles

  3. Motivation - Robotics

  4. Motivation ● Computer vision cannot rely on 2D data alone to solve 3D problems ● Good 3D representations are needed to solve inverse problems ● A generative model for 3D is a good starting point (though a lot more is needed) ● Good progress has been made in the past 2 or 3 years ● Still, the choice of 3D representation is debated ● Each representation has advantages and disadvantages ● So far there is no consensus on which representation to use

  5. Choice of Representation ● Voxels ● Multi-view ● Meshes ● Point clouds ● Template-based

  6. 3D Representation - Voxels ● Computational complexity is very high (O(n^3) for an n x n x n grid) if used naively ● Cannot Model High-Res Shapes ● Details can easily get lost ● Highly sparse at higher resolutions ● Cannot model regular structures easily
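To make the cubic cost and the sparsity concrete, here is a minimal NumPy sketch (not from the talk): it counts the cells of a dense n x n x n grid and measures what fraction of them a thin spherical surface occupies. The sphere, its radius of 0.4, and the half-voxel shell test are illustrative assumptions, a crude stand-in for real surface voxelization.

```python
import numpy as np

def voxel_cells(n: int) -> int:
    """Cells in a dense n x n x n occupancy grid: grows as O(n^3)."""
    return n ** 3

def sphere_shell_occupancy(n: int, radius: float = 0.4) -> float:
    """Fraction of grid cells touched by a thin sphere surface in the unit cube.

    A cell counts as occupied if its center lies within half a voxel of the
    surface: a crude stand-in for surface voxelization.
    """
    c = (np.arange(n) + 0.5) / n  # cell-center coordinates in (0, 1)
    x, y, z = np.meshgrid(c, c, c, indexing="ij", sparse=True)
    dist = np.sqrt((x - 0.5) ** 2 + (y - 0.5) ** 2 + (z - 0.5) ** 2)
    return float((np.abs(dist - radius) <= 0.5 / n).mean())

for n in (32, 64, 128):
    print(f"n={n}: cells={voxel_cells(n):,}  occupied={sphere_shell_occupancy(n):.4f}")
```

A surface covers on the order of n^2 cells while the grid has n^3, so the occupied fraction shrinks roughly like 1/n as resolution grows: most of the memory is spent on empty space.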

  7. 3D Representation - Voxels ● Directly predicting high-res voxel-based outputs is very hard ● Highest resolution so far is 64 x 64 x 64 ● One model per object category (Wu et al., NIPS 2016)

  8. 3D Representation - Point Clouds ● Things start to get mathematically involved from here ● E.g., the choice of loss function and non-differentiability issues ● Not obvious how many points to use ● Details Will Be Missing ● Not a lot of work has used point clouds so far Image courtesy: Hao Su
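One common answer to the loss-function question is the Chamfer distance, which compares two point sets of possibly different sizes without needing a point-to-point correspondence. The NumPy sketch below is a generic illustration, not the loss used in the talk or in the cited work; building the full N x M distance matrix is only practical for small sets.

```python
import numpy as np

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    For every point, take the squared distance to its nearest neighbor in the
    other set, average per direction, and sum the two directions.
    """
    diff = a[:, None, :] - b[None, :, :]          # (N, M, 3) pairwise offsets
    d2 = np.einsum("nmk,nmk->nm", diff, diff)     # (N, M) squared distances
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

rng = np.random.default_rng(0)
pred = rng.standard_normal((500, 3))
target = rng.standard_normal((800, 3))  # different point counts are fine
print(chamfer_distance(pred, target))
```

Because the loss only uses nearest neighbors, it is insensitive to point ordering, which is one reason it is popular for unordered point sets.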

  9. 3D Representation - Point Clouds (Su et al., CVPR 2017)

  10. 3D Representation - Meshes ● Out-of-the-box models cannot be applied directly ● Need to Construct Special Kinds of Kernels for CNNs ● Mathematically Involved ● Can be seen as a graph as well Image courtesy: Hao Su
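Viewing a mesh as a graph makes the "special kernels" point concrete: neighborhoods come from the face list and vary in size from vertex to vertex. The plain-Python sketch below builds vertex adjacency from triangles and applies one step of mean neighbor aggregation, a deliberately simple stand-in for graph or mesh convolutions; it is not a method from the talk.

```python
from collections import defaultdict

def mesh_adjacency(faces):
    """Vertex adjacency (1-ring neighbors) from a list of triangle faces."""
    adj = defaultdict(set)
    for tri in faces:
        for i in range(3):
            a, b = tri[i], tri[(i + 1) % 3]  # each triangle edge, both directions
            adj[a].add(b)
            adj[b].add(a)
    return {v: sorted(nbrs) for v, nbrs in adj.items()}

def neighbor_mean(features, adj):
    """One step of mean aggregation over each vertex's neighbors.

    Unlike an image convolution, the neighborhood size varies per vertex,
    so there is no fixed kernel grid to slide over the mesh.
    """
    return {v: sum(features[u] for u in nbrs) / len(nbrs) for v, nbrs in adj.items()}

# A tetrahedron: every vertex touches the other three.
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
adj = mesh_adjacency(faces)
print(neighbor_mean({0: 0.0, 1: 3.0, 2: 6.0, 3: 9.0}, adj))
# → {0: 6.0, 1: 5.0, 2: 4.0, 3: 3.0}
```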

  11. 3D Representation - Template-Based (CAD) ● Again, Out-of-the-Box Models Cannot Easily Be Applied ● Data Is Very Hard to Obtain ● Hard to Model Shapes Never Seen Before ● Offers Compositionality Intrinsically and Explicitly ● Might Be a Good Option for Learning Functionality Image courtesy: Haibin Huang

  12. 3D Representation - Multi-View ● Multi-view representation is very lightweight ● Offers Flexibility (Depth Maps) and Eases the Computation Significantly ● Although 2D, Still Explicitly Models 3D Shapes ● Allows Generating Hi-Res, Detailed, Novel Objects ● Without the machinery required for new voxel-based models ● Out-of-the-box CNN models can be applied directly ● Not Mathematically Involved ● More Intuitive
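A quick count of raw values shows why the multi-view representation is lightweight. This arithmetic sketch assumes 20 single-channel depth maps of side res compared against a dense voxel grid of the same side; the 20 views and the 224 resolution are taken from elsewhere in the deck, and the comparison ignores what the network does with those values.

```python
def dense_voxel_values(res: int) -> int:
    """Values in a dense res x res x res occupancy grid."""
    return res ** 3

def multiview_values(res: int, num_views: int = 20, channels: int = 1) -> int:
    """Values in num_views images of res x res (channels=2 would add silhouettes)."""
    return num_views * channels * res * res

for res in (64, 128, 224):
    v, m = dense_voxel_values(res), multiview_values(res)
    print(f"res={res}: voxels={v:,}  20 depth maps={m:,}  ratio={v / m:.1f}x")
```

The ratio is res / 20 for single-channel views, so the savings grow linearly with resolution: about 11x at 224.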

  13. Motivations ● Synthesize/Generate Hi-Res, Detailed and Novel Shapes ● Use a Representation Whose Data Is Easily Obtainable ● 2D Images, RGB-D, or Depth-Only Data Are Easy to Obtain ● Generalize Out of Sample ● A Step Forward Towards Obtaining 3D Concepts Efficiently to Solve Inverse Vision Problems ● Model 3D via 2D (inspired by biological vision) ● Share the Same Representations Across All Categories

  14. Pipeline - Data Set ● Used ShapeNet Core ● Contains Aligned, Normalized Shapes ● ~37k for Train, ~3k for Test ● Render Depth Maps from 20 Views ● Camera Positions Fixed
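The slides state only that depth maps are rendered from 20 fixed camera positions. The sketch below assumes, purely for illustration, cameras on the 20 vertices of a regular dodecahedron (one way to spread 20 viewpoints evenly) and a simple orthographic z-buffer over a point-sampled surface; the renderer, camera layout, distance, and image extent used for the paper may all differ.

```python
import numpy as np

PHI = (1 + 5 ** 0.5) / 2  # golden ratio

def dodecahedron_cameras(distance: float = 2.0) -> np.ndarray:
    """20 camera positions on the vertices of a regular dodecahedron (assumed layout)."""
    verts = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]
    for a in (-1, 1):
        for b in (-1, 1):
            verts += [(0, a / PHI, b * PHI), (a / PHI, b * PHI, 0), (a * PHI, 0, b / PHI)]
    v = np.array(verts, dtype=float)
    return distance * v / np.linalg.norm(v, axis=1, keepdims=True)

def look_at_basis(cam):
    """Orthonormal (right, up, forward) frame for a camera looking at the origin."""
    f = -cam / np.linalg.norm(cam)
    up = np.array([0.0, 0.0, 1.0])
    if abs(f @ up) > 0.99:  # avoid a degenerate cross product
        up = np.array([0.0, 1.0, 0.0])
    r = np.cross(f, up)
    r /= np.linalg.norm(r)
    return r, np.cross(r, f), f

def render_depth(points, cam, res=64, extent=1.0):
    """Orthographic z-buffer: nearest depth per pixel, inf where nothing projects."""
    r, u, f = look_at_basis(cam)
    rel = points - cam
    px, py, depth = rel @ r, rel @ u, rel @ f
    ix = np.floor((px + extent) / (2 * extent) * res).astype(int)
    iy = np.floor((py + extent) / (2 * extent) * res).astype(int)
    ok = (ix >= 0) & (ix < res) & (iy >= 0) & (iy < res) & (depth > 0)
    img = np.full((res, res), np.inf)
    np.minimum.at(img, (iy[ok], ix[ok]), depth[ok])  # keep the closest point
    return img

rng = np.random.default_rng(0)
pts = rng.standard_normal((20000, 3))
pts = 0.5 * pts / np.linalg.norm(pts, axis=1, keepdims=True)  # sphere of radius 0.5
views = [render_depth(pts, c) for c in dodecahedron_cameras()]
silhouettes = [np.isfinite(d) for d in views]  # silhouette = covered pixels
```

Fixing the cameras means view k always shows the object from the same direction, so a network can learn view-specific features for each of the 20 inputs.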

  15. Pipeline - Architectures ● Train 3 Different VAE Models ● AllVPNet: Trained with All 20 Views ● DropoutNet: Trained with 2-5 Randomly Chosen Views ● SingleVPNet: Trained with 1 Randomly Chosen View ● Latent (Z) Layer Has 100 Units for the Unconditional and 40 for the Conditional Models ● L1 Loss Is Used During Training
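The three training regimes differ in which of the 20 views the encoder sees. The sketch below assumes that unselected views are simply zeroed out before encoding; the slides do not say how dropped views are represented, and the variant names are used here only as labels.

```python
import numpy as np

NUM_VIEWS = 20

def view_mask(variant: str, rng: np.random.Generator) -> np.ndarray:
    """Boolean mask over the 20 views for one training example."""
    mask = np.zeros(NUM_VIEWS, dtype=bool)
    if variant == "AllVPNet":
        mask[:] = True
    elif variant == "DropoutNet":
        k = rng.integers(2, 6)  # 2 to 5 views, inclusive
        mask[rng.choice(NUM_VIEWS, size=k, replace=False)] = True
    elif variant == "SingleVPNet":
        mask[rng.integers(NUM_VIEWS)] = True
    else:
        raise ValueError(f"unknown variant: {variant}")
    return mask

def apply_mask(views: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero out unselected views; views has shape (20, H, W). Zeroing is an assumption."""
    return views * mask[:, None, None]

rng = np.random.default_rng(0)
batch_views = rng.random((NUM_VIEWS, 64, 64))  # placeholder depth maps
encoder_input = apply_mask(batch_views, view_mask("DropoutNet", rng))
```

Resampling the mask on every example pushes the encoder to map any subset of views to a similar latent code, which is what makes single-view inference possible later.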

  16. Pipeline - Architectures (figure: architecture diagram with two L1 loss terms)
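A VAE trained with an L1 reconstruction term minimizes the L1 error plus the KL divergence between the approximate posterior N(mu, sigma^2) and a standard normal prior. The NumPy sketch below shows that objective with the 100-unit unconditional latent from the slides. Reading the two L1 labels in the diagram as one term for depth maps and one for silhouettes, and weighting all terms equally, are assumptions.

```python
import numpy as np

LATENT_DIM = 100  # unconditional model, per the slides

def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, I), so sampling stays differentiable."""
    return mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent units."""
    return float(-0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var)))

def vae_l1_loss(depth, depth_hat, sil, sil_hat, mu, log_var):
    """L1 on depth maps + L1 on silhouettes + KL term (equal weights assumed)."""
    recon = np.abs(depth - depth_hat).sum() + np.abs(sil - sil_hat).sum()
    return float(recon) + kl_to_standard_normal(mu, log_var)

rng = np.random.default_rng(0)
mu, log_var = np.zeros(LATENT_DIM), np.zeros(LATENT_DIM)
z = reparameterize(mu, log_var, rng)  # would be fed to a (hypothetical) decoder
```

Compared with L2, an L1 reconstruction term penalizes large depth errors less harshly, which tends to keep sharp depth discontinuities rather than blurring them.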

  17. Pipeline - 3D Reconstruction ● A Deterministic Function Is Used to Generate the Final 3D Point Cloud ● Number of Points Ranges from ~30k to ~400k Depending on Shape Complexity ● Not Fixed!
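The slides do not spell out the deterministic function. One natural choice, sketched here under the assumption of orthographic cameras looking at the origin, is to back-project every foreground pixel of every view into 3D and take the union. Because only foreground pixels contribute, the number of points varies with the shape, which is consistent with the "not fixed" remark. The camera frame, extent, and the use of inf for background are illustrative assumptions.

```python
import numpy as np

def look_at_basis(cam):
    """Orthonormal (right, up, forward) frame for a camera looking at the origin."""
    f = -cam / np.linalg.norm(cam)
    up = np.array([0.0, 0.0, 1.0])
    if abs(f @ up) > 0.99:  # avoid a degenerate cross product
        up = np.array([0.0, 1.0, 0.0])
    r = np.cross(f, up)
    r /= np.linalg.norm(r)
    return r, np.cross(r, f), f

def backproject(depth, cam, extent=1.0):
    """Lift every finite-depth pixel of one orthographic view to a 3D point."""
    res = depth.shape[0]
    r, u, f = look_at_basis(cam)
    iy, ix = np.nonzero(np.isfinite(depth))       # foreground pixels only
    px = (ix + 0.5) / res * 2 * extent - extent   # pixel centers on the image plane
    py = (iy + 0.5) / res * 2 * extent - extent
    d = depth[iy, ix]
    return cam + np.outer(px, r) + np.outer(py, u) + np.outer(d, f)

def fuse_views(depth_maps, cams, extent=1.0):
    """Union of back-projected points from all views; size depends on the shape."""
    return np.concatenate([backproject(d, c, extent) for d, c in zip(depth_maps, cams)])

depth = np.full((4, 4), np.inf)
depth[1, 2] = 2.0                # a single foreground pixel
cam = np.array([0.0, 0.0, 2.0])  # camera on the +z axis
print(backproject(depth, cam))
```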

  18. Results - Sampling Random Sampling

  19. Results - Sampling Random Samples’ Nearest Neighbors (figure labels: Training Set Reconstruction; Random Sample)
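Showing each sample next to its nearest training shape is a standard check that the model is generating novel shapes rather than memorizing. A minimal sketch, assuming shapes are compared as flattened multi-view depth maps under Euclidean distance; the slides do not state which metric was used.

```python
import numpy as np

def nearest_training_shapes(samples: np.ndarray, train: np.ndarray, k: int = 1) -> np.ndarray:
    """Indices of the k nearest training shapes for each sample.

    samples: (S, V, H, W) and train: (T, V, H, W) multi-view depth maps.
    """
    s = samples.reshape(len(samples), -1)
    t = train.reshape(len(train), -1)
    # ||s - t||^2 = ||s||^2 - 2 s.t + ||t||^2, computed for all pairs at once
    d2 = (s ** 2).sum(1)[:, None] - 2 * s @ t.T + (t ** 2).sum(1)[None, :]
    return np.argsort(d2, axis=1)[:, :k]

rng = np.random.default_rng(0)
train = rng.random((10, 20, 8, 8))  # placeholder "training set": 10 shapes, 20 views
samples = train[[3, 7]] + 1e-3 * rng.standard_normal((2, 20, 8, 8))
print(nearest_training_shapes(samples, train))
```

A sample that is visibly different from its nearest neighbor is evidence of novelty; one that is nearly identical suggests memorization.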

  20. Results - Sampling Random Samples

  21. Results - Sampling More Random Samples

  22. Results - Sampling More Random Samples

  23. Results - Sampling Conditional Sampling

  24. Results - Sampling Conditional Sampling
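Random samples come from drawing a latent vector from the prior and decoding it; conditional samples additionally fix a class label. The sketch below only builds decoder inputs. The 100 and 40 latent sizes come from the slides, while concatenating a one-hot class vector and the number of classes are assumptions, and the decoder itself is omitted.

```python
import numpy as np

def unconditional_latents(n: int, rng: np.random.Generator, dim: int = 100) -> np.ndarray:
    """n latent vectors drawn from the N(0, I) prior."""
    return rng.standard_normal((n, dim))

def conditional_latents(n: int, class_id: int, num_classes: int,
                        rng: np.random.Generator, dim: int = 40) -> np.ndarray:
    """n prior samples with a one-hot class label appended (assumed conditioning)."""
    z = rng.standard_normal((n, dim))
    onehot = np.zeros((n, num_classes))
    onehot[:, class_id] = 1.0
    return np.concatenate([z, onehot], axis=1)

rng = np.random.default_rng(0)
z_uncond = unconditional_latents(8, rng)
z_cond = conditional_latents(8, class_id=2, num_classes=10, rng=rng)  # hypothetical class count
```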

  25. Results - Sampling Conditional Samples

  26. Results - Conditional Sampling Nearest Neighbors (figure labels, repeated per row: Training Set Reconstruction; Cond. Sample)

  27. Results - Conditional Sampling Nearest Neighbors (figure labels: Training Set Reconstruction; Cond. Sample)

  28. Results - Reconstruction

  29. Results - Classification (table: classification accuracy and reconstruction error)

  30. Results - Reconstruction Out-of-Sample Generalization ● Silhouettes/Depth Maps Placed into 224 x 224 Canvases ● Images Scaled to Fit ● Camera Pose Not Fixed ● Objects Differ in Size and Orientation ● Inputs: NYUD and Silhouettes from the Internet ● All Remaining Results Are Obtained with the SingleVPNet Model
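Placing an arbitrary silhouette or depth crop on a fixed 224 x 224 canvas can be done by cropping to the object's bounding box, scaling so the longer side fits, and centering. A NumPy sketch, assuming preserved aspect ratio, nearest-neighbor resampling, centering, and that zero marks background; the slides only say that images are scaled to fit.

```python
import numpy as np

def fit_to_canvas(img: np.ndarray, size: int = 224, background: float = 0.0) -> np.ndarray:
    """Crop to the foreground bounding box, scale to fit, and center on a square canvas."""
    canvas = np.full((size, size), background, dtype=img.dtype)
    ys, xs = np.nonzero(img != background)
    if ys.size == 0:
        return canvas  # nothing to place
    crop = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    h, w = crop.shape
    scale = size / max(h, w)  # longer side fills the canvas, aspect ratio kept
    nh, nw = max(1, round(h * scale)), max(1, round(w * scale))
    # Nearest-neighbor resampling via integer index maps
    rows = np.minimum((np.arange(nh) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(nw) / scale).astype(int), w - 1)
    top, left = (size - nh) // 2, (size - nw) // 2
    canvas[top:top + nh, left:left + nw] = crop[rows[:, None], cols[None, :]]
    return canvas

silhouette = np.ones((10, 20))  # a wide 10 x 20 blob
print(fit_to_canvas(silhouette).sum())
```

For depth maps, nearest-neighbor resampling avoids inventing intermediate depth values at object boundaries, which interpolation would do.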

  31. Results - Reconstruction Out-of-Sample Generalization (NYUD)

  32. Results - Reconstruction Out-of-Sample Generalization (Uncond. SingleVPNet - NYUD Silhouettes)

  33. Results - Reconstruction Out-of-Sample Generalization (Uncond. SingleVPNet - NYUD Silhouettes)

  34. Results - Reconstruction Out-of-Sample Generalization (Silhouettes From Web)

  35. Results - Representation Analysis Consistent Representation ● Ideally, the Same Shape Should Be Obtained from Every View ● Intuitively, Uncertainty Is Actually Part of Consistency ● Obtaining Good Priors Is Important!
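One way to quantify this consistency, sketched here as an assumption rather than the talk's metric, is to reconstruct the shape from each single view separately and average the pairwise Chamfer distances between the resulting point clouds; lower values mean more view-consistent reconstructions.

```python
import itertools
import numpy as np

def chamfer(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3)."""
    diff = a[:, None, :] - b[None, :, :]
    d2 = np.einsum("nmk,nmk->nm", diff, diff)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

def view_consistency(clouds) -> float:
    """Mean pairwise Chamfer distance between reconstructions from different views.

    clouds: list of (N_i, 3) point clouds, one per single-view input.
    """
    if len(clouds) < 2:
        raise ValueError("need reconstructions from at least two views")
    pairs = itertools.combinations(range(len(clouds)), 2)
    return float(np.mean([chamfer(clouds[i], clouds[j]) for i, j in pairs]))

rng = np.random.default_rng(0)
base = rng.standard_normal((300, 3))
per_view = [base + 0.05 * rng.standard_normal(base.shape) for _ in range(4)]
print(view_consistency(per_view))
```

Such a score does not separate genuine ambiguity (an occluded part that could plausibly take several shapes) from model error, which is the sense in which uncertainty is part of consistency.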

  36. Results - Analysis Consistent Representation

  37. Results - Analysis Consistent Representation

  38. Results - Analysis Priors Matter!

  39. Results - Analysis What 3D shape is this?

  40. Results - Analysis

  41. Results - Analysis ● Model’s Prediction: “airplane” ● Quite meaningful and intuitive ● Obtaining good inductive biases is hard but helps a lot! ● Behaves like a hierarchical prior

  42. Results - Analysis Implicitly Learning About Parts

  43. Results - Analysis Implicitly Learning About Parts

  44. Conclusion ● We showed an effective paradigm for learning 3D shapes using a multi-view representation ● Samples obtained look realistic, novel and detailed ● Out-of-sample generalization is attainable via good generative models + meaningful priors ● Hierarchical priors can effectively induce enough bias to generate meaningful results ● Strong inductive biases help get meaningful 3D shapes from highly occluded inputs ● Parts can be learned implicitly; explicitly learning parts for real-world tasks is hard

  45. Future Directions and Challenges ● Current data sets are not sufficient to learn about 3D vision ● 3D shapes are the end product of an underlying process: physics ● Current data-driven approaches do not get us to where we want to be ● 3D shapes are composed of things like material, mass, etc. ● To meaningfully interact with 3D shapes we need to do more! ● Learning fast and accurate physics simulators might be a good starting point

  46. Future Directions and Challenges Thank you!

  47. Results - Conditional Sampling Conditional Samples

  48. Results - Classification, Recon. Err. ● The Goal Is Not Classification or Reconstruction per se, but Learning Hierarchical Priors ● Strong Regularization

  49. Results - IoU IoU numbers for ShapeNet Core
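IoU on voxel grids is the number of cells occupied in both grids divided by the number occupied in either. The slides report IoU numbers but not how point clouds were voxelized for evaluation; the sketch below shows the ratio on boolean grids plus a simple point-to-voxel conversion whose bounds and 32^3 resolution are assumptions.

```python
import numpy as np

def voxelize(points: np.ndarray, res: int = 32, lo: float = -0.5, hi: float = 0.5) -> np.ndarray:
    """Mark every cell of a res^3 grid over [lo, hi)^3 that contains a point."""
    idx = np.floor((points - lo) / (hi - lo) * res).astype(int)
    idx = idx[np.all((idx >= 0) & (idx < res), axis=1)]  # drop out-of-bounds points
    grid = np.zeros((res, res, res), dtype=bool)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid

def voxel_iou(a: np.ndarray, b: np.ndarray) -> float:
    """|a AND b| / |a OR b| for boolean occupancy grids; two empty grids count as 1.0."""
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(a, b).sum() / union)

a = np.zeros((4, 4, 4), dtype=bool)
a[0] = True   # 16 cells
b = np.zeros((4, 4, 4), dtype=bool)
b[:2] = True  # 32 cells, containing a
print(voxel_iou(a, b))  # → 0.5
```

Treating two empty grids as a perfect match is a convention; other evaluations skip such cases instead.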

  50. Results - Conditional Sampling More Conditional Samples

  51. Results - Conditional Sampling More Conditional Samples

  52. Results - Analysis What about this?

  53. Results - Analysis

  54. Results - Analysis

  55. Results - Analysis
