Advanced Approaches to Object Recognition and 3D Model Construction - PowerPoint PPT Presentation



SLIDE 1

Advanced Approaches to Object Recognition and 3D Model Construction from Heterogeneous Data

Evgeny Burnaev
Skoltech, ADASE group

Joint with Alexander Notchenko

SLIDE 2

Supervised Deep Learning Data

Type | Supervision
2D image classification, detection, segmentation | class label, object detection box, segmentation contours
Pose estimation | structure of a "skeleton" on the image

  • Deep learning based methods
  • Feature based methods
  • Human performance
SLIDE 3

But the world is in 3D

SLIDE 4

Autonomous Vehicles

SLIDE 5

Autonomous Vehicles

SLIDE 6

Augmented (Mixed) Reality

SLIDE 7

Augmented (Mixed) Reality

SLIDE 8

Robotics in Human Environments

SLIDE 9

Commodity sensors to create 2.5D images

Intel RealSense Series, Asus Xtion Pro, Microsoft Kinect v2, Structure Sensor

SLIDE 10

3D Deep Learning is gaining popularity

Google Scholar, when searched for "3D" "Deep Learning", returns:

Year | # articles
2012 |   410
2013 |   627
2014 |  1210
2015 |  2570
2016 |  5440

Workshops:

  • Deep Learning for Robotic Vision Workshop @ CVPR 2017
  • Geometry Meets Deep Learning @ ECCV 2016
  • 3D Deep Learning Workshop @ NIPS 2016
  • Large Scale 3D Data: Acquisition, Modelling and Analysis @ CVPR 2016
  • 3D from a Single Image @ CVPR 2015
SLIDE 11

Representation of 3D data for Deep Learning

Method | Pros (+) | Cons (-)
Many 2D projections | sustain surface texture; there are a lot of 2D DL methods | redundant representation; vulnerable to optic illusions
Voxels | simple; can be sparse; has volumetric properties | losing surface properties
Point Cloud | can be sparse | losing surface properties and volumetric properties
2.5D images | cheap measurement devices; senses depth | self-occlusion of bodies in a scene; a lot of noise in measurements

SLIDE 12

Multi-view CNN for 3D Shape Recognition

SLIDE 13

3D ShapeNets: a Deep Representation for Volumetric Shapes

SLIDE 14

Previous work

"Sparse 3D Convolutional Neural Networks for Large-Scale Shape Retrieval"
Alexandr Notchenko, Ermek Kapushev, Evgeny Burnaev

SLIDE 15

3D Design Phase

  • Designers spend about 60% of their time searching for the right information
  • Massive and complex CAD models are usually archived in a disorderly way in enterprises, which makes design reuse a difficult task

3D model retrieval can significantly shorten product lifecycles.

SLIDE 16

Sparsity of voxel representation

30³ voxels is already enough to understand a simple shape. The sparsity over all classes of the ModelNet40 train dataset at voxel resolution 40 is only 5.5%.

But with texture information it would be even easier.
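To make the sparsity figure concrete, here is a minimal numpy sketch (a synthetic sphere and a hypothetical `voxelize` helper, not from the talk) that maps a surface point cloud onto a 40³ occupancy grid and measures the fraction of occupied voxels:

```python
import numpy as np

def voxelize(points, resolution=40):
    """Map a point cloud in [0, 1]^3 onto a boolean occupancy grid.
    Hypothetical helper for illustration, not the talk's implementation."""
    grid = np.zeros((resolution,) * 3, dtype=bool)
    idx = np.clip((points * resolution).astype(int), 0, resolution - 1)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid

def occupancy(grid):
    """Fraction of occupied voxels in the grid."""
    return grid.sum() / grid.size

# Synthetic example: points sampled on a sphere surface inside the unit cube.
rng = np.random.default_rng(0)
v = rng.normal(size=(20000, 3))
v /= np.linalg.norm(v, axis=1, keepdims=True)
points = 0.5 + 0.45 * v
grid = voxelize(points, 40)
print(f"occupancy: {occupancy(grid):.1%}")  # surface shapes touch few voxels
```

Since only the surface voxels are occupied, the occupancy stays in the single-digit percent range, consistent with the 5.5% figure quoted for ModelNet40.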

SLIDE 17

Shape Retrieval

Precomputed feature vectors of the dataset: (V_car, V_person, ...); V_plane is the feature vector of a plane. The query is embedded by the Sparse 3D CNN and items are retrieved by cosine distance.
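The retrieval step can be sketched as cosine-similarity ranking over precomputed feature vectors. The toy gallery values below are illustrative; in the talk the features would come from the Sparse 3D CNN:

```python
import numpy as np

def retrieve(query_vec, gallery, k=3):
    """Rank gallery feature vectors by cosine similarity to the query and
    return the top-k indices with their similarities."""
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    sims = g @ q                      # cosine similarity per gallery item
    order = np.argsort(-sims)[:k]     # highest similarity first
    return order, sims[order]

# Toy 4-D features (hypothetical values, not real model outputs).
gallery = np.array([[1.0, 0.0, 0.0, 0.0],   # "plane"
                    [0.9, 0.1, 0.0, 0.0],   # another "plane"
                    [0.0, 1.0, 0.0, 0.0]])  # "car"
idx, sims = retrieve(np.array([1.0, 0.05, 0.0, 0.0]), gallery, k=2)
print(idx)  # the two plane-like vectors rank first
```

Cosine distance ignores vector magnitude, so only the direction of the learned embedding matters for ranking.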

SLIDE 18

Triplet loss

A triplet is a set (a, p, n), where:
  a is the anchor object,
  p is a positive object that is similar to the anchor,
  n is a negative object that is not similar to the anchor.

The representation is learned by minimizing the triplet loss

  μ(ε₁, ε₂) = max(0, ν + ε₁ − ε₂),

where ν is a margin parameter and ε₁, ε₂ are the distances between p and a and between n and a, respectively.
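The loss above translates directly into code; this is a minimal numpy sketch with Euclidean distances and an illustrative margin value:

```python
import numpy as np

def triplet_loss(a, p, n, margin=0.2):
    """mu(e1, e2) = max(0, margin + e1 - e2), where e1 = ||a - p|| is the
    anchor-positive distance and e2 = ||a - n|| the anchor-negative distance.
    The margin value is illustrative."""
    e1 = np.linalg.norm(a - p)
    e2 = np.linalg.norm(a - n)
    return max(0.0, margin + e1 - e2)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # close to the anchor
n = np.array([1.0, 0.0])   # far from the anchor
print(triplet_loss(a, p, n))  # 0.0: negative already beats positive by the margin
```

The loss is zero once the negative is at least `margin` further from the anchor than the positive, so easy triplets contribute no gradient.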

SLIDE 19

Conclusions

  • The number of layers needed for shape recognition changes with resolution (hierarchical nature of complex shapes)
  • You don't need a lot of detail to discriminate classes in a small dataset (40 classes in ModelNet40)
  • There is no point in 3D if you only want the surfaces of shapes; it might be beneficial for volumetric images (MRI, CT, density maps)
  • Getting accuracy over 91% is unreasonable because of label noise

SLIDE 20

Typical representation: 3D shape as a dense Point Cloud

Autonomous Vehicles, AR, Robotics

SLIDE 21

What do they have in common?

They require understanding the whole scene.

SLIDE 22

Probabilistic Inverse Graphics

Long term view

SLIDE 23

(Diagram) Long-term pipeline; labels recovered from the slide:
  • Domain projection: approximate DR 3D to 2D, approximate DR 3D to 2.5D
  • Learnable projection (3D scanner output)
  • Conceptual / physical levels
  • Heterogeneous data
  • Object and relationship params
  • Physical dynamics, noise
  • Physical representation: shape, albedo, location
  • Conditional generators of objects: furniture generator, humans generator, ...
  • "Semantic structure with parameters"

SLIDE 24

Probabilistic Inverse Graphics problem

  • Given many heterogeneous images of scenes: Ik, Dk, Gk
  • Find a distribution over disentangled features: P(x1, x2, ..., xn | Ik, Dk, Gk)
  • Learn generative models along the way: P(si, ai, li | xi)

Main problems:

  • The physical state of a scene is a union of an unknown number of objects
  • Some objects have hierarchical structure

SLIDE 25

What we can do in the short term:

Reconstruct 3D scenes from Point Cloud data while learning generative models for shapes of objects, features, and priors on scenes:
  • Segmentation of separate objects (applying DNNs to 3D shapes defined by point clouds)
  • Retrieve objects from a database by their features
  • Use context as a prior distribution for scenes with multiple objects
  • Give meaningful latent representations for objects without the need for "human labels", based only on shapes

SLIDE 26

What approaches are researchers using?

  • PointNet
  • PointNet++
  • Multi-view Stereo Machine
  • Efficient Point Cloud Generation

SLIDE 27

PointNet module

A way to learn features for a set of points of variable size: fully-connected deep neural networks applied per point, followed by a column-wise maximum operation, yield a feature representation of the point set.

Qi, Charles R., et al. "PointNet: Deep learning on point sets for 3D classification and segmentation." arXiv preprint arXiv:1612.00593 (2016).
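The per-point MLP plus column-wise max can be sketched in a few lines of numpy. This is a toy illustration of the symmetric-function idea, not the full PointNet architecture; the weight matrices stand in for learned parameters:

```python
import numpy as np

def pointnet_features(points, w1, w2):
    """Minimal PointNet-style feature extractor: a shared per-point MLP
    followed by a column-wise max over points. Because max is symmetric,
    the output does not depend on the order of the points."""
    h = np.maximum(points @ w1, 0.0)   # shared FC layer + ReLU, applied per point
    h = np.maximum(h @ w2, 0.0)        # second shared layer
    return h.max(axis=0)               # column-wise maximum: one global feature

rng = np.random.default_rng(0)
w1, w2 = rng.normal(size=(3, 16)), rng.normal(size=(16, 32))
cloud = rng.normal(size=(100, 3))      # a cloud of 100 points, any size works
f = pointnet_features(cloud, w1, w2)
# Permutation invariance: shuffling the points leaves the feature unchanged.
f_shuf = pointnet_features(cloud[rng.permutation(100)], w1, w2)
print(np.allclose(f, f_shuf))  # True
```

The max pooling is what lets the module consume variable-size point sets: a cloud of 50 or 5000 points produces the same fixed-length feature vector.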

SLIDE 28

Point Cloud 3D reconstruction Architecture

(Diagram) Point Cloud → Parser (RNN + PointNet) → feature representation → reconstruction (3D CNNs, Graph NN as an approximate renderer) → 3D reconstruction loss. A classification loss (cross-entropy) and a prior on objects and their relative positions are also applied.

SLIDE 29

Point Cloud 3D reconstruction Architecture

(Diagram) Point Cloud → Parser (RNN + PointNet) → feature representation → reconstruction (3D CNNs, Graph NN as an approximate renderer); 3D reconstruction loss, classification loss (cross-entropy), and a prior on objects and their relative positions. The stages are labelled Sub-task #1, Sub-task #2, and Sub-task #3.

SLIDE 30

Sub-task #1: Point cloud segmentation and feature extraction

  • Point clouds are by nature variable-size data, which reduces problems with representation
  • PointNet modules are used to exploit the variable amount of data available for different objects
  • There are several ways to measure reconstruction quality from a point cloud: the Earth Mover's Distance (where Φ is some bijection from one point cloud to another) and the Chamfer distance
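Of the two, the Chamfer distance is the simpler to implement, since it needs only nearest neighbours rather than solving an assignment problem for the bijection Φ. A minimal O(nm) numpy sketch:

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point clouds p (n, 3) and q (m, 3):
    the mean distance from each point to its nearest neighbour in the other
    cloud, summed over both directions. Brute-force O(n*m) for clarity."""
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)  # (n, m) pairwise
    return d.min(axis=1).mean() + d.min(axis=0).mean()

p = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
q = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(chamfer_distance(p, q))  # 0.0 for identical clouds
```

Unlike the Earth Mover's Distance, Chamfer allows many-to-one matches, so it is cheaper but less strict about the point distribution.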

SLIDE 31

Sub-task #2: Mesh reconstruction

There exist several ways to do this:
  • Graph neural networks (GNNs) perform convolutions over nodes and edges
  • Deep Spectral Decompositions combine basis meshes with some weights and nonlinearities
  • Spatial Volumetric Networks, a kind of Spatial Transformer Network that iteratively populates an output shape
  • Free-Form Deformation Network: additive deformation of a template mesh until the loss converges

Requirement: the generator has to generate a closed mesh, e.g. |V| + |F| − |E| = 2.
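The closed-mesh requirement is the Euler characteristic of a genus-0 surface, and it is cheap to verify. A small sketch, assuming faces are given as tuples of vertex indices:

```python
def is_closed_genus0(vertices, faces):
    """Check the Euler characteristic |V| + |F| - |E| == 2 that a closed
    genus-0 mesh must satisfy. Each undirected edge is counted once."""
    edges = set()
    for f in faces:
        for i in range(len(f)):
            a, b = f[i], f[(i + 1) % len(f)]   # consecutive vertices share an edge
            edges.add((min(a, b), max(a, b)))  # store edges as unordered pairs
    return len(vertices) + len(faces) - len(edges) == 2

# A tetrahedron: 4 vertices, 4 triangular faces, 6 edges -> 4 + 4 - 6 = 2.
verts = [0, 1, 2, 3]
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
print(is_closed_genus0(verts, faces))  # True
```

Removing any face breaks the invariant, which is one way such a check can flag a generator that produces meshes with holes.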

SLIDE 32

Sub-task #3: Impose a prior on the feature representation

Prior knowledge helps to solve the ill-posed problem and makes learning and inference better.

Shape and location of the i-th object. The joint probability for sets of objects can be modeled by CRFs or RNNs.
SLIDE 33

New available datasets of aligned 2D-to-3D data (mostly point clouds)

SLIDE 34

2D-3D-Semantics dataset

Size: 766 GB! http://buildingparser.stanford.edu/dataset.html

SLIDE 35

ScanNet

Size: 1.3 TB! http://www.scan-net.org/

SLIDE 36

Summary

  • Reconstruction of meshes for a wide variety of objects found indoors (2 TB of data in total)
  • Objects have explicit hierarchical structure (parts comprise the objects)
  • The feature representation captures a variable amount of information and can be applied for compression
  • All of the above is trained in one architecture!

SLIDE 37

Conclusions

  • 3D/4D multimodal data processing
  • Probabilistic Inverse Graphics
  • Applications
