 
              Advanced Approaches to Object Recognition and 3D Model Construction from Heterogeneous Data Evgeny Burnaev Skoltech, ADASE group Joint with Alexander Notchenko
Supervised Deep Learning data Type Supervision 2D Image classification, Class label, object detection box, detection segmentation segmentation contours Pose Estimation Structure of “skeleton” on image - Deep learning based methods - Feature based methods 2 - Human performance
But world is in 3D 3
Autonomous Vehicles 4
Autonomous Vehicles 5
Augmented (Mixed) Reality 6
Augmented (Mixed) Reality 7
Robotics in Human Environments 8
Commodity sensors to create 2.5D images In Intel el R Real ealSen ense S e Ser eries es Asus Xt As Xtion Pr Pro Mi Micr crosoft K t Kinect v ect v2 St Structure Se Sensor 9
3D Deep Learning is gaining popularity Go Google Scholar when searched for " 3D 3D " " De Deep Le Learni ning ng " returns: Workshops: year # articles ● Deep Learning for Robotic Vision Workshop 2012 410 CVPR 2017 ● Geometry Meets Deep Learning ECCV 2016 2013 627 ● 3D Deep Learning Workshop @ NIPS 2016 2014 1210 ● Large Scale 3D Data: Acquisition, Modelling 2015 2570 and Analysis CVPR 2016 ● 3D from a Single Image CVPR 2015 2016 5440 10
Representation of 3D data for Deep Learning Me Meth thod Pros (+) +) Co Cons (-) Ma Many 2D su sust stain su surface texture, Redundant representation, Re pr projections The There is a lot ot of of 2D DL L method hods vu vulnerable to optic illusi sions Vo Voxels si simple, can be sp sparse se, has s losin lo ing surface propertie ies volumetric properties vo Point Po Ca Can be sparse lo losin ing surface propertie ies Clo Cloud an and volumet etric c proper erties es 2.5D 2. 5D Ch Cheap measurement devic ices, self occlusi se sion of bodies s in a im images sense se ses s depth sc scene, a lot of Noise se in measureme me ments 1 1
Multi-view CNN for 3D Shape Recognition 12
3D ShapeNets: a Deep Representation for Volumetric Shapes 1 3
Previous work “Sparse 3D Convolutional Neural Networks for Large-Scale Shape Retrieval” Alexandr Notchenko, Ermek Kapushev, Evgeny Burnaev 14
3D Design Phase Designers spend about 60% % of their time • searching for the right information Massive and complex CAD models are usually Ma • derly archived in enterprises, which makes disorde di design reuse a difficult task al can significantly shorten the product lifecycles 3D 3D Mo Model el ret etriev eval 15
Sparsity of voxel representation 30 3 voxels is already enough to understand simple shape But with texture information it would be even easier Sparsity for all classes of ModelNet40 train dataset at voxel resolution 40 is only 5.5% 16
Shape Retrieval Sp Sparse 3D 3D CNN CNN Precomputed V plane - feature feature vector vector of plane of dataset. (V car , V person ,...) Query Cosine distance 17 Retrieved Items
Triplet loss Learning representation by minimizing triplet loss Triplet is a set ( a , p , n ), where a - anchor object p - positive object that is similar to anchor object n - negative object that is not similar to anchor object 𝜇 𝜀 # , 𝜀 % = max 0, 𝜈 + 𝜀 # − 𝜀 % , where is a margin parameter, and are distances between p 18 and a , n and a
Conclusions Amount of layers needed for shape recognition changes with • different resolutions (hierarchical nature of complex shapes) You don’t need a lot of details to discriminate classes in small • dataset (40 classes in ModelNet40) No point in 3D if you only want the surfaces of shapes, might • be beneficial for volumetric images like (MRI, CT, density maps) Getting accuracy over 91% is unreasonable because of label • noise 19
Typical representation: 3D shape as a dense Point Cloud 20 Robotics AR Autonomous Vehicles
What do they have in common? They require understanding the whole scene 21
Probabilistic Inverse Graphics Long term view 22
Conceptual / Physical Heterogeneous Data Physical Representation: “Semantic Conditional Domain projection Shape, Albedo, Location Structure with generators of parameters” objects approximate DR 3D to 2D Object and relationship Furniture params generator approximate DR Physical 3D to 2.5D dynamics Humans Noise generator Learnable Projection (3D Scanner output) ... ...
Probabilistic Inverse Graphics problem ü Given many heterogeneous images of scenes - I k , D k , G k ü Find distribution over disentangled features - P ( x 1 , x 2 , …, x n | I k , D k , G k ) ü Learning generative models along the way - P ( s i , a i , l i | x i ) Main problems: Ø Physical state of scenes is a union of unknown number of objects Ø Some objects have hierarchical structure 24
What can we do in short term: Reconstruct 3D scenes from Point Cloud data while learning generative models for shapes of objects, features, and priors on scenes ü Segmentation of separate objects (Applying DNNs to 3D shapes defined by point clouds) ü Be able to retrieve objects from some database by their features ü Use context as a prior distribution for scenes with multiple objects ü Give meaningful latent representations for objects without need for “human labels”, based only on shapes 25
What approaches researchers are using? PointNet Multi-view Stereo Machine Efficient Point Cloud Generation PointNet++ 26
PointNet module A way to learn features for a set Column-wise Maximum operation of points of variable size Feature representation of Point set Fully-Connected Deep Neural Networks Qi, Charles R., et al. "Pointnet: Deep learning on point sets for 3d classification and segmentation." 2 arXiv preprint arXiv:1612.00593 (2016). 7
Point Cloud 3D reconstruction Architecture Prior on objects and Classification loss their relative positions (CrossEntropy) reconstruction Point Cloud Parser (3D CNNs, Graph NN - (RNN + PointNet) approximate renderer) Feature representation 28 3D Reconstruction loss
Point Cloud 3D reconstruction Architecture Sub-task #3 Sub-task #1 Prior on objects and Classification loss their relative positions (CrossEntropy) reconstruction Point Cloud Parser (3D CNNs, Graph NN - (RNN + PointNet) approximate renderer) Feature representation 29 Sub-task #2 3D Reconstruction loss
Sub-task #1: Point cloud segmentation and feature extraction Ø Point Clouds by their nature are variable size data - reduces problems with representation Ø PointNet modules - used to capture advantages of variable amount of data for different objects Ø Several ways to measure reconstruction quality from Point Cloud: Earth Mover's Distance Chamfer distance 30 Φ - is some bijection from one point cloud to another
Sub-task #2: Mesh reconstruction There exist several ways to do that: Ø Gr Graph ne neur ural ne networ orks - GNNs perform convolutions over nodes and edges ions - combining basis meshes with Ø De Deep Spectral l De Decomposit itio some weights and nonlinearities Volumetric Networks - is a kind of Spatial Transformer Ø Sp Spatial Vo Networks that iteratively populates an output shape Ø Fr Free-For Form De Deformatio ion Ne Network - additive deformation of a template mesh until the loss is converges 31 Requirement - generator have to generate a closed mesh, e.g. (|V| + |F| - |E| = 2)
Sub-task #3: Impose prior on feature representation Prior knowledge - helps solve ill-posed problem, makes learning and inference better Joint probability for sets of objects: - shape and Location of i-th object - с an be modeled by CRFs or RNNs 3 2
New available datasets of aligned 2D-to-3D data (mostly point clouds) 3 3
2D-3D-Semantics dataset Size: 766 Gigabyte! 34 http://buildingparser.stanford.edu/dataset.html
ScanNet http://www.scan-net.org/ Size: 1.3 Terabyte! 3 5
Summary Ø Reconstruction of meshes for wide variety of objects found indoors (2 Tb of data in total) Ø Objects have explicit hierarchical structure (parts comprise the objects) Ø Feature representation captures variable amount of information, can be applied for compression Ø All of the above trained in one architecture! 3 6
Conclusions Ø 3D/4D multimodal data processing Ø Probabilistic Inverse Graphics Ø Applications 3 7
Recommend
More recommend