Advanced Approaches to Object Recognition and 3D Model Construction - PowerPoint PPT Presentation



SLIDE 1

Advanced Approaches to Object Recognition and 3D Model Construction from Heterogeneous Data

Evgeny Burnaev
Skoltech, ADASE group

Joint with Alexander Notchenko

SLIDE 2

Supervised Deep Learning Data

Type | Supervision
2D image classification, detection, segmentation | class label, object detection box, segmentation contours
Pose estimation | structure of a "skeleton" on the image

  • Deep learning based methods
  • Feature based methods
  • Human performance
SLIDE 3

But the world is in 3D

SLIDE 4

Autonomous Vehicles

SLIDE 5

Autonomous Vehicles

SLIDE 6

Augmented (Mixed) Reality

SLIDE 7

Augmented (Mixed) Reality

SLIDE 8

Robotics in Human Environments

SLIDE 9

Commodity sensors to create 2.5D images

Intel RealSense Series, Asus Xtion Pro, Microsoft Kinect v2, Structure Sensor

SLIDE 10

3D Deep Learning is gaining popularity

Google Scholar, when searched for "3D" "Deep Learning", returns:

Year | # articles
2012 |   410
2013 |   627
2014 |  1210
2015 |  2570
2016 |  5440

Workshops:

  • Deep Learning for Robotic Vision Workshop @ CVPR 2017
  • Geometry Meets Deep Learning @ ECCV 2016
  • 3D Deep Learning Workshop @ NIPS 2016
  • Large Scale 3D Data: Acquisition, Modelling and Analysis @ CVPR 2016
  • 3D from a Single Image @ CVPR 2015
SLIDE 11

Representation of 3D data for Deep Learning

Method | Pros (+) | Cons (-)
Many 2D projections | sustain surface texture; there are a lot of 2D DL methods | redundant representation; vulnerable to optic illusions
Voxels | simple; can be sparse; has volumetric properties | losing surface properties
Point Cloud | can be sparse | losing surface properties and volumetric properties
2.5D images | cheap measurement devices; senses depth | self-occlusion of bodies in a scene; a lot of noise in measurements

SLIDE 12

Multi-view CNN for 3D Shape Recognition

SLIDE 13

3D ShapeNets: a Deep Representation for Volumetric Shapes

SLIDE 14

Previous work

"Sparse 3D Convolutional Neural Networks for Large-Scale Shape Retrieval"
Alexandr Notchenko, Ermek Kapushev, Evgeny Burnaev

SLIDE 15

3D Design Phase

  • Designers spend about 60% of their time searching for the right information
  • Massive and complex CAD models are usually archived in a disorderly way in enterprises, which makes design reuse a difficult task

3D model retrieval can significantly shorten product lifecycles.

SLIDE 16

Sparsity of voxel representation

30³ voxels is already enough to understand a simple shape. The sparsity over all classes of the ModelNet40 train dataset at voxel resolution 40 is only 5.5%.

But with texture information it would be even easier.
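To make the sparsity figure concrete, here is a minimal numpy sketch (a synthetic sphere and a hypothetical `voxelize` helper, not from the talk) that maps a surface point cloud onto a 40³ occupancy grid and measures the fraction of occupied voxels:

```python
import numpy as np

def voxelize(points, resolution=40):
    """Map a point cloud in [0, 1]^3 onto a boolean occupancy grid.
    Hypothetical helper for illustration, not the talk's implementation."""
    grid = np.zeros((resolution,) * 3, dtype=bool)
    idx = np.clip((points * resolution).astype(int), 0, resolution - 1)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return grid

def occupancy(grid):
    """Fraction of occupied voxels in the grid."""
    return grid.sum() / grid.size

# Synthetic example: points sampled on a sphere surface inside the unit cube.
rng = np.random.default_rng(0)
v = rng.normal(size=(20000, 3))
v /= np.linalg.norm(v, axis=1, keepdims=True)
points = 0.5 + 0.45 * v
grid = voxelize(points, 40)
print(f"occupancy: {occupancy(grid):.1%}")  # surface shapes touch few voxels
```

Since only the surface voxels are occupied, the occupancy stays in the single-digit percent range, consistent with the 5.5% figure quoted for ModelNet40.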

SLIDE 17

Shape Retrieval

Precomputed feature vectors of the dataset: (V_car, V_person, ...); V_plane is the feature vector of a plane. The query is embedded by the Sparse 3D CNN and items are retrieved by cosine distance.
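The retrieval step can be sketched as cosine-similarity ranking over precomputed feature vectors. The toy gallery values below are illustrative; in the talk the features would come from the Sparse 3D CNN:

```python
import numpy as np

def retrieve(query_vec, gallery, k=3):
    """Rank gallery feature vectors by cosine similarity to the query and
    return the top-k indices with their similarities."""
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    sims = g @ q                      # cosine similarity per gallery item
    order = np.argsort(-sims)[:k]     # highest similarity first
    return order, sims[order]

# Toy 4-D features (hypothetical values, not real model outputs).
gallery = np.array([[1.0, 0.0, 0.0, 0.0],   # "plane"
                    [0.9, 0.1, 0.0, 0.0],   # another "plane"
                    [0.0, 1.0, 0.0, 0.0]])  # "car"
idx, sims = retrieve(np.array([1.0, 0.05, 0.0, 0.0]), gallery, k=2)
print(idx)  # the two plane-like vectors rank first
```

Cosine distance ignores vector magnitude, so only the direction of the learned embedding matters for ranking.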

SLIDE 18

Triplet loss

A triplet is a set (a, p, n), where:
  a is the anchor object,
  p is a positive object that is similar to the anchor,
  n is a negative object that is not similar to the anchor.

The representation is learned by minimizing the triplet loss

  μ(ε₁, ε₂) = max(0, ν + ε₁ − ε₂),

where ν is a margin parameter and ε₁, ε₂ are the distances between p and a and between n and a, respectively.
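The loss above translates directly into code; this is a minimal numpy sketch with Euclidean distances and an illustrative margin value:

```python
import numpy as np

def triplet_loss(a, p, n, margin=0.2):
    """mu(e1, e2) = max(0, margin + e1 - e2), where e1 = ||a - p|| is the
    anchor-positive distance and e2 = ||a - n|| the anchor-negative distance.
    The margin value is illustrative."""
    e1 = np.linalg.norm(a - p)
    e2 = np.linalg.norm(a - n)
    return max(0.0, margin + e1 - e2)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # close to the anchor
n = np.array([1.0, 0.0])   # far from the anchor
print(triplet_loss(a, p, n))  # 0.0: negative already beats positive by the margin
```

The loss is zero once the negative is at least `margin` further from the anchor than the positive, so easy triplets contribute no gradient.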

SLIDE 19

Conclusions

  • The number of layers needed for shape recognition changes with resolution (hierarchical nature of complex shapes)
  • You don't need a lot of detail to discriminate classes in a small dataset (40 classes in ModelNet40)
  • There is no point in 3D if you only want the surfaces of shapes; it might be beneficial for volumetric images (MRI, CT, density maps)
  • Getting accuracy over 91% is unreasonable because of label noise

SLIDE 20

Typical representation: 3D shape as a dense Point Cloud

Autonomous Vehicles, AR, Robotics

SLIDE 21

What do they have in common?

They require understanding the whole scene.

SLIDE 22

Probabilistic Inverse Graphics

Long term view

SLIDE 23

(Diagram) Long-term pipeline; labels recovered from the slide:
  • Domain projection: approximate DR 3D to 2D, approximate DR 3D to 2.5D
  • Learnable projection (3D scanner output)
  • Conceptual / physical levels
  • Heterogeneous data
  • Object and relationship params
  • Physical dynamics, noise
  • Physical representation: shape, albedo, location
  • Conditional generators of objects: furniture generator, humans generator, ...
  • "Semantic structure with parameters"

SLIDE 24

Probabilistic Inverse Graphics problem

  • Given many heterogeneous images of scenes: Ik, Dk, Gk
  • Find a distribution over disentangled features: P(x1, x2, ..., xn | Ik, Dk, Gk)
  • Learn generative models along the way: P(si, ai, li | xi)

Main problems:

  • The physical state of a scene is a union of an unknown number of objects
  • Some objects have hierarchical structure

SLIDE 25

What we can do in the short term:

Reconstruct 3D scenes from Point Cloud data while learning generative models for shapes of objects, features, and priors on scenes:
  • Segmentation of separate objects (applying DNNs to 3D shapes defined by point clouds)
  • Retrieve objects from a database by their features
  • Use context as a prior distribution for scenes with multiple objects
  • Give meaningful latent representations for objects without the need for "human labels", based only on shapes

SLIDE 26

What approaches are researchers using?

  • PointNet
  • PointNet++
  • Multi-view Stereo Machine
  • Efficient Point Cloud Generation

SLIDE 27

PointNet module

A way to learn features for a set of points of variable size: fully-connected deep neural networks applied per point, followed by a column-wise maximum operation, yield a feature representation of the point set.

Qi, Charles R., et al. "PointNet: Deep learning on point sets for 3D classification and segmentation." arXiv preprint arXiv:1612.00593 (2016).
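The per-point MLP plus column-wise max can be sketched in a few lines of numpy. This is a toy illustration of the symmetric-function idea, not the full PointNet architecture; the weight matrices stand in for learned parameters:

```python
import numpy as np

def pointnet_features(points, w1, w2):
    """Minimal PointNet-style feature extractor: a shared per-point MLP
    followed by a column-wise max over points. Because max is symmetric,
    the output does not depend on the order of the points."""
    h = np.maximum(points @ w1, 0.0)   # shared FC layer + ReLU, applied per point
    h = np.maximum(h @ w2, 0.0)        # second shared layer
    return h.max(axis=0)               # column-wise maximum: one global feature

rng = np.random.default_rng(0)
w1, w2 = rng.normal(size=(3, 16)), rng.normal(size=(16, 32))
cloud = rng.normal(size=(100, 3))      # a cloud of 100 points, any size works
f = pointnet_features(cloud, w1, w2)
# Permutation invariance: shuffling the points leaves the feature unchanged.
f_shuf = pointnet_features(cloud[rng.permutation(100)], w1, w2)
print(np.allclose(f, f_shuf))  # True
```

The max pooling is what lets the module consume variable-size point sets: a cloud of 50 or 5000 points produces the same fixed-length feature vector.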

SLIDE 28

Point Cloud 3D reconstruction Architecture

(Diagram) Point Cloud → Parser (RNN + PointNet) → feature representation → reconstruction (3D CNNs, Graph NN as an approximate renderer) → 3D reconstruction loss. A classification loss (cross-entropy) and a prior on objects and their relative positions are also applied.

SLIDE 29

Point Cloud 3D reconstruction Architecture

(Diagram) Point Cloud → Parser (RNN + PointNet) → feature representation → reconstruction (3D CNNs, Graph NN as an approximate renderer); 3D reconstruction loss, classification loss (cross-entropy), and a prior on objects and their relative positions. The stages are labelled Sub-task #1, Sub-task #2, and Sub-task #3.

SLIDE 30

Sub-task #1: Point cloud segmentation and feature extraction

  • Point clouds are by nature variable-size data, which reduces problems with representation
  • PointNet modules are used to exploit the variable amount of data available for different objects
  • There are several ways to measure reconstruction quality from a point cloud: the Earth Mover's Distance (where Φ is some bijection from one point cloud to another) and the Chamfer distance
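Of the two, the Chamfer distance is the simpler to implement, since it needs only nearest neighbours rather than solving an assignment problem for the bijection Φ. A minimal O(nm) numpy sketch:

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point clouds p (n, 3) and q (m, 3):
    the mean distance from each point to its nearest neighbour in the other
    cloud, summed over both directions. Brute-force O(n*m) for clarity."""
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)  # (n, m) pairwise
    return d.min(axis=1).mean() + d.min(axis=0).mean()

p = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
q = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(chamfer_distance(p, q))  # 0.0 for identical clouds
```

Unlike the Earth Mover's Distance, Chamfer allows many-to-one matches, so it is cheaper but less strict about the point distribution.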

SLIDE 31

Sub-task #2: Mesh reconstruction

There exist several ways to do this:
  • Graph neural networks (GNNs) perform convolutions over nodes and edges
  • Deep Spectral Decompositions combine basis meshes with some weights and nonlinearities
  • Spatial Volumetric Networks, a kind of Spatial Transformer Network that iteratively populates an output shape
  • Free-Form Deformation Network: additive deformation of a template mesh until the loss converges

Requirement: the generator has to generate a closed mesh, e.g. |V| + |F| − |E| = 2.
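The closed-mesh requirement is the Euler characteristic of a genus-0 surface, and it is cheap to verify. A small sketch, assuming faces are given as tuples of vertex indices:

```python
def is_closed_genus0(vertices, faces):
    """Check the Euler characteristic |V| + |F| - |E| == 2 that a closed
    genus-0 mesh must satisfy. Each undirected edge is counted once."""
    edges = set()
    for f in faces:
        for i in range(len(f)):
            a, b = f[i], f[(i + 1) % len(f)]   # consecutive vertices share an edge
            edges.add((min(a, b), max(a, b)))  # store edges as unordered pairs
    return len(vertices) + len(faces) - len(edges) == 2

# A tetrahedron: 4 vertices, 4 triangular faces, 6 edges -> 4 + 4 - 6 = 2.
verts = [0, 1, 2, 3]
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
print(is_closed_genus0(verts, faces))  # True
```

Removing any face breaks the invariant, which is one way such a check can flag a generator that produces meshes with holes.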

SLIDE 32

Sub-task #3: Impose a prior on the feature representation

Prior knowledge helps to solve the ill-posed problem and makes learning and inference better.

Shape and location of the i-th object. The joint probability for sets of objects can be modeled by CRFs or RNNs.
SLIDE 33

New available datasets of aligned 2D-to-3D data (mostly point clouds)

SLIDE 34

2D-3D-Semantics dataset

Size: 766 GB! http://buildingparser.stanford.edu/dataset.html

SLIDE 35

ScanNet

Size: 1.3 TB! http://www.scan-net.org/

SLIDE 36

Summary

  • Reconstruction of meshes for a wide variety of objects found indoors (2 TB of data in total)
  • Objects have explicit hierarchical structure (parts comprise the objects)
  • The feature representation captures a variable amount of information and can be applied for compression
  • All of the above is trained in one architecture!

SLIDE 37

Conclusions

  • 3D/4D multimodal data processing
  • Probabilistic Inverse Graphics
  • Applications
