for 3D perception Chris Choy, Ph.D. candidate @ Stanford Vision and - PowerPoint PPT Presentation

High Dimensional Convolutional Neural Networks for 3D Perception
Chris Choy, Ph.D. candidate @ Stanford Vision and Learning Lab

The Success of Convolutional Networks
● FCN [Long et al.]
● AlexNet [Krizhevsky et al.]
● R-CNN [Girshick et al.]
● GAN [Goodfellow et al.]

The Success of Convolutional Networks
● Versatility: speech recognition [Abdel-Hamid et al.], object detection, semantic segmentation, machine translation
● Experience
● Efficiency

Examples of 3D Vision Tasks
● 3D Registration
● 3D Reconstruction
● 3D Object Pose Estimation
● 3D Object Tracking

3D Vision in Action
● Microsoft HoloLens
● Amazon AR View
● Nvidia Research, 2019

● 3D Reconstruction
  ○ Supervised Reconstruction
● 3D Perception
  ○ 3D Semantic Segmentation
  ○ 3D Feature Learning
● Perception on a Set of 3D Data
  ○ 4D Spatio-Temporal Perception
  ○ 4D and 6D for Registration

3D Reconstruction
● 3D-Recurrent Reconstruction Neural Networks, Chris, Danfei, JunYoung, Kevin, Silvio, ECCV’16
● Universal Correspondence Networks, Chris, JunYoung, Silvio, Manmohan, NIPS’16
● Weakly Supervised 3D Reconstruction with Adversarial Constraint, JunYoung, Chris, Manmohan, Animesh, Silvio, 3DV’17
● DeformNet: Free-Form Deformation Network for 3D Shape Reconstruction from a Single Image, Andrey, Jingwei, Animesh, Viraj, JunYoung, Chris, Silvio, WACV’18
● Text2Shape: Generating Shapes from Natural Language by Learning Joint Embeddings, Kevin, Chris, Manolis, Angel, Thomas, Silvio, ACCV’18
● 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, Chris, JunYoung, Silvio, CVPR’19

3D Reconstruction from Few Images
● Single-view or multi-view images of an object
● Application: online retail stores
(Figure: input images → 3D reconstruction)

3D Reconstruction from Few Images: Challenges
● Wide baseline
● Specular / texture-less regions
● Single view

3D Reconstruction
Observations (images) → Algorithms → 3D representation
● Depth estimation [Eigen et al., Saxena et al., …]
● Structure from Motion [Longuet-Higgins, Haming et al., Snavely et al., …]
● MVS
● Tomography
● Object-centric reconstruction
● …

3D Recurrent Reconstruction Neural Networks
● End-to-end 3D reconstruction
● Unified framework for single-view and multi-view reconstruction
● 3D convolutional LSTM that updates its hidden states with each new view
Chris, Danfei, JunYoung, Kevin, Silvio, 3D-Recurrent Reconstruction Neural Networks, ECCV’16

Update / maintain prediction
(Figure: as the number of images grows, confidence on the armrests increases)
Chris, Danfei, JunYoung, Kevin, Silvio, 3D-Recurrent Reconstruction Neural Networks, ECCV’16

Robustness to texture and number of views
Chris, Danfei, JunYoung, Kevin, Silvio, 3D-Recurrent Reconstruction Neural Networks, ECCV’16

3D Perception
● SegCloud: Semantic Segmentation of 3D Point Clouds, Lyne, Chris, Iro, JunYoung, Silvio, 3DV’17
● 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, Chris, JunYoung, Silvio, CVPR’19
● Fully Convolutional Geometric Features, Chris, Jaesik, Vladlen, ICCV’19

Sparsity of 3D Data: $O(N^2)$ surface vs. $O(N^3)$ volume
Percentage of occupied voxels at each voxel size:
● 20cm voxel: 18%
● 10cm voxel: 9%
● 5cm voxel: 4.5%
● 2.5cm voxel: 1.8%
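To see why finer voxels make the data sparser, the sketch below voxelizes a synthetic surface and counts the occupied cells. It assumes a unit sphere sampled with a Fibonacci lattice as a stand-in for a real scan, and the helper names are hypothetical. Occupied cells scale with the surface ($O(N^2)$) while the grid scales with the volume ($O(N^3)$), so halving the voxel size should roughly halve the occupied fraction; the exact percentages differ from the slides, which come from real scans.

```python
import math


def surface_points(radius, n=100_000):
    """Sample n points roughly uniformly on a sphere (Fibonacci lattice)."""
    golden_angle = math.pi * (3 - math.sqrt(5))
    points = []
    for i in range(n):
        y = 1 - 2 * (i + 0.5) / n          # y stays strictly inside (-1, 1)
        ring = math.sqrt(1 - y * y)         # radius of the horizontal circle at height y
        theta = golden_angle * i
        points.append((radius * ring * math.cos(theta),
                       radius * y,
                       radius * ring * math.sin(theta)))
    return points


def occupancy(points, voxel_size, extent):
    """Fraction of cells in a cubic grid (side `extent`, centred on the origin)
    that contain at least one point."""
    occupied = {tuple(math.floor(c / voxel_size) for c in p) for p in points}
    cells_per_axis = round(extent / voxel_size)
    return len(occupied) / cells_per_axis ** 3


if __name__ == "__main__":
    pts = surface_points(radius=1.0)
    for vs in (0.2, 0.1, 0.05, 0.025):
        print(f"{vs * 100:g}cm voxel: {100 * occupancy(pts, vs, extent=2.0):.1f}% occupied")
```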

Sparse Representations and Convolution
● Continuous representation
  ○ Points and PointNet [Qi et al.]
  ○ Occupancy Net [Mescheder et al.]
  ○ Deep SDF [Park et al.]
  ○ Deep Level Sets [Michalkiewicz et al.]
  ○ …
● Discrete representation
  ○ OctNet and Octree [Riegler et al.]
  ○ Sparse Tensor [Graham et al., Choy et al.]
● Graph representation
  ○ Graph Net [Kipf & Welling]
  ○ Conv on Graph [Defferrard et al.]
  ○ …
● Hybrid representation (continuous + graph)
  ○ Continuous convolution: PointCNN, Monte Carlo Conv, Surface / Tangent Conv, …

Sparse Matrix
● The majority of elements are 0
● Efficient representation: store the non-zero elements only
  ○ Compressed sparse row (CSR)
  ○ List of lists
  ○ COOrdinate list (COO)
  ○ Etc.
● Example: COOrdinate (COO) representation of a 2x2 matrix
  ○ 4 at (0, 0)
  ○ 1 at (1, 1)
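As a concrete illustration of two of the formats listed above, the following pure-Python sketch converts the 2x2 COO example into CSR. The function names are hypothetical, and the sketch assumes no duplicate coordinates (libraries such as SciPy sum duplicates instead).

```python
def coo_to_csr(n_rows, rows, cols, vals):
    """Convert COO triplets (row, col, value) to CSR arrays.

    Assumes no duplicate (row, col) pairs.
    Returns (indptr, indices, data): the non-zeros of row r are
    indices[indptr[r]:indptr[r + 1]] and data[indptr[r]:indptr[r + 1]].
    """
    order = sorted(range(len(vals)), key=lambda k: (rows[k], cols[k]))
    indices = [cols[k] for k in order]
    data = [vals[k] for k in order]
    indptr = [0] * (n_rows + 1)
    for r in rows:                 # count non-zeros per row
        indptr[r + 1] += 1
    for r in range(n_rows):        # prefix sum -> start offset of each row
        indptr[r + 1] += indptr[r]
    return indptr, indices, data


def coo_to_dense(n_rows, n_cols, rows, cols, vals):
    """Materialize COO triplets as a dense list of lists."""
    dense = [[0] * n_cols for _ in range(n_rows)]
    for r, c, v in zip(rows, cols, vals):
        dense[r][c] = v
    return dense


# The 2x2 example: 4 at (0, 0), 1 at (1, 1)
rows, cols, vals = [0, 1], [0, 1], [4, 1]
print(coo_to_csr(2, rows, cols, vals))       # ([0, 1, 2], [0, 1], [4, 1])
print(coo_to_dense(2, 2, rows, cols, vals))  # [[4, 0], [0, 1]]
```

COO is convenient for building and for arbitrary dimensions; CSR trades that flexibility for fast row slicing.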

Sparse Tensor
● High-dimensional extension of a sparse matrix
● COOrdinate representation, e.g.
  ○ 4 at (0, 0, 0)
  ○ 1 at (1, 1, 0)
  ○ 9 at (1, 1, 1)
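A sparse tensor in COO form can be sketched as a hash map from integer coordinate tuples to values: lookups cost O(1) on average, and every coordinate that is not stored is an implicit zero. `CooTensor` below is a hypothetical illustration, not the Minkowski Engine API; it reproduces the three-entry example above.

```python
from typing import Dict, Tuple

Coord = Tuple[int, ...]


class CooTensor:
    """Hypothetical minimal sparse tensor: a hash map from integer coordinates
    to values. Illustrative only; not the Minkowski Engine SparseTensor class."""

    def __init__(self, entries: Dict[Coord, float]):
        if len({len(c) for c in entries}) > 1:
            raise ValueError("all coordinates must have the same dimension")
        self.entries = dict(entries)

    def __getitem__(self, coord) -> float:
        # Coordinates that are not stored are implicit zeros.
        return self.entries.get(tuple(coord), 0)

    @property
    def nnz(self) -> int:
        return len(self.entries)

    def to_dense(self, shape):
        """Materialize as nested lists; only sensible for small shapes."""
        def build(prefix, depth):
            if depth == len(shape):
                return self[prefix]
            return [build(prefix + (i,), depth + 1) for i in range(shape[depth])]
        return build((), 0)


t = CooTensor({(0, 0, 0): 4, (1, 1, 0): 1, (1, 1, 1): 9})
print(t.nnz)                    # 3
print(t[1, 1, 1], t[0, 1, 0])   # 9 0
print(t.to_dense((2, 2, 2)))    # [[[4, 0], [0, 0]], [[0, 0], [1, 9]]]
```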

Convolution on a Sparse Tensor
● Sparse convolution [Graham et al., Submanifold Sparse ConvNet, 2017] [Graham and van der Maaten, 3D Sparse ConvNet, 2018]
  ○ Dense tensor kernel
  ○ Static sparsity pattern
  ○ Cannot support arbitrary sparsity

Generalized Convolution
● Sparse tensor kernel
● Dynamic sparsity pattern
● Can support arbitrary sparsity
[Graham et al.] [Choy et al.]
Choy et al., 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR’19
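Our reading of the generalized convolution in the CVPR’19 paper is that, for each output coordinate $u \in C^{out}$, it sums $W_i x^{in}_{u+i}$ only over kernel offsets $i$ for which $u + i$ is an input coordinate. The single-channel Python sketch below implements that rule; the function name and the toy kernel are assumptions for illustration. The output coordinates are an explicit argument, which is what lets the sparsity pattern change between layers.

```python
from typing import Dict, Iterable, Tuple

Coord = Tuple[int, ...]


def generalized_conv(x_in: Dict[Coord, float],
                     kernel: Dict[Coord, float],
                     out_coords: Iterable[Coord]) -> Dict[Coord, float]:
    """Single-channel generalized convolution on a sparse tensor (sketch).

    x_in:       input sparse tensor, coordinate -> feature
    kernel:     sparse kernel, offset i -> weight W_i (any set of offsets)
    out_coords: output coordinates C_out, chosen independently of C_in
    """
    x_out = {}
    for u in out_coords:
        acc = 0.0
        for offset, w in kernel.items():
            v = tuple(a + b for a, b in zip(u, offset))
            if v in x_in:            # skip offsets that land on an implicit zero
                acc += w * x_in[v]
        x_out[u] = acc
    return x_out


x_in = {(0, 0): 1.0, (1, 0): 2.0, (0, 1): 3.0}
kernel = {(0, 0): 1.0, (1, 0): 10.0, (0, 1): 100.0}   # toy, non-square kernel
# Output at the input coordinates plus one new coordinate (-1, 0)
print(generalized_conv(x_in, kernel, [(0, 0), (1, 0), (0, 1), (-1, 0)]))
# {(0, 0): 321.0, (1, 0): 2.0, (0, 1): 3.0, (-1, 0): 10.0}
```

Multi-channel features would replace each scalar weight with a matrix and each scalar feature with a vector; the coordinate logic stays the same.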

Generalized Convolution
● Sparse tensor kernel
  ○ Can support arbitrary sparsity
  ○ High-dimensional ConvNets: for kernel size N in D dimensions, the volume of a dense convolution kernel is $O(N^D)$, while a sparse convolution kernel can be $O(D)$
● Dynamic sparsity pattern
  ○ Sparsity pattern manipulation, e.g. C = A + B, pruning
  ○ Generative tasks
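The kernel-volume claim can be checked by counting offsets. This sketch assumes that the $O(D)$ sparse kernel means a cross-shaped ("hyper-cross") kernel that extends only along the coordinate axes; that interpretation and the helper names are ours. For an odd kernel size $N$, the hypercube has $N^D$ offsets while the cross has $D(N-1)+1$.

```python
from itertools import product


def hypercube_offsets(size, dim):
    """All offsets of a dense size^dim kernel (size assumed odd)."""
    r = size // 2
    return list(product(range(-r, r + 1), repeat=dim))


def hypercross_offsets(size, dim):
    """Centre plus offsets along each coordinate axis only (size assumed odd)."""
    r = size // 2
    offsets = [(0,) * dim]
    for axis in range(dim):
        for step in range(1, r + 1):
            for sign in (-1, 1):
                o = [0] * dim
                o[axis] = sign * step
                offsets.append(tuple(o))
    return offsets


for dim in range(1, 5):
    print(dim, len(hypercube_offsets(3, dim)), len(hypercross_offsets(3, dim)))
# 1 3 3
# 2 9 5
# 3 27 7
# 4 81 9
```

The gap grows quickly with dimension, which is one reason a sparse kernel makes 4D and higher-dimensional ConvNets affordable.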

Generalized Convolution: Special Cases
With a sparse tensor kernel and a dynamic sparsity pattern, generalized convolution supports arbitrary sparsity and includes as special cases:
● Octree generative networks
● Dilated convolution
● Separable convolution
● Sparse convolution
● Dense convolution
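One way to see dilated convolution as a special case: it is an ordinary kernel whose offsets are multiplied by the dilation, so it has the same number of weights spread over a larger, sparser footprint. A minimal sketch with hypothetical helper names follows. Under the same view, submanifold sparse convolution evaluates outputs only at the input coordinates, whereas dense convolution evaluates every cell of the grid.

```python
from itertools import product


def kernel_offsets(size, dim, dilation=1):
    """Offsets of a size^dim kernel, scaled by `dilation` (size assumed odd)."""
    r = size // 2
    return [tuple(dilation * c for c in o)
            for o in product(range(-r, r + 1), repeat=dim)]


def extent(offsets, axis=0):
    """Number of grid cells spanned by the offsets along one axis."""
    xs = [o[axis] for o in offsets]
    return max(xs) - min(xs) + 1


for d in (1, 2, 3):
    offs = kernel_offsets(3, 2, dilation=d)
    print(f"dilation {d}: {len(offs)} weights spanning {extent(offs)} cells per axis")
# dilation 1: 9 weights spanning 3 cells per axis
# dilation 2: 9 weights spanning 5 cells per axis
# dilation 3: 9 weights spanning 7 cells per axis
```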

Minkowski Engine
A convolutional neural network library for sparse tensors
● Convolution
● [Max/Avg/Global] pooling
● Broadcast
● [Batch/Instance] normalization
● Tensor arithmetic
● Pruning
● …
Choy et al., 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR’19

Minkowski Network
● Very deep convolutional neural networks are possible in 3D
  ○ 42-layer networks for semantic segmentation
  ○ 101 layers for classification
● Reuse network architectures from years of research in 2D
  ○ e.g. ResNet18 → 4D MinkNet18
Choy et al., 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR’19

Minkowski Engine for Other Applications
Choy et al., 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR’19

Sparsity Pattern Reconstruction
Choy et al., 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR’19

3D Perception: Semantic Segmentation
● Partition 3D scans or data into semantic parts
● Label each voxel or 3D point with one of the semantic labels
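Because a sparse-tensor network labels voxels, point-level predictions are usually recovered by remembering which voxel each point fell into. The sketch below assumes majority voting to assign a training label to each voxel; that choice and the helper names are illustrative assumptions, not something the slides specify.

```python
import math
from collections import Counter, defaultdict


def voxelize_with_labels(points, labels, voxel_size):
    """Quantize points to voxels; each voxel takes the majority label of its points.

    Returns (voxel -> label, list of each point's voxel) so that voxel
    predictions can later be mapped back to the original points.
    """
    voxel_of_point = []
    votes = defaultdict(Counter)
    for p, lab in zip(points, labels):
        v = tuple(math.floor(c / voxel_size) for c in p)
        voxel_of_point.append(v)
        votes[v][lab] += 1
    voxel_labels = {v: c.most_common(1)[0][0] for v, c in votes.items()}
    return voxel_labels, voxel_of_point


def labels_to_points(voxel_pred, voxel_of_point):
    """Give every point the label predicted for its voxel."""
    return [voxel_pred[v] for v in voxel_of_point]


pts = [(0.01, 0.02, 0.0), (0.03, 0.01, 0.04), (0.04, 0.04, 0.02), (0.12, 0.0, 0.0)]
labs = ["chair", "chair", "table", "floor"]
vox_labels, vox_of_pt = voxelize_with_labels(pts, labs, voxel_size=0.05)
print(vox_labels)                               # {(0, 0, 0): 'chair', (2, 0, 0): 'floor'}
print(labels_to_points(vox_labels, vox_of_pt))  # ['chair', 'chair', 'chair', 'floor']
```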

3D Semantic Segmentation on Sparse Tensors
● Sparse tensors for all input/output feature maps
● U-shaped network
  ○ Hierarchical feature maps
  ○ Receptive field size increases exponentially with depth
Choy et al., 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR’19
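The exponential growth follows from the standard receptive-field recurrence $r \leftarrow r + (k-1)\,j$, $j \leftarrow j \cdot s$, where $k$ is the kernel size, $s$ the stride, and $j$ the cumulative stride. The sketch assumes each encoder level is two size-3 convolutions followed by a size-2, stride-2 downsampling; this layout is hypothetical, not the published MinkowskiNet architecture. It also shows how a stride-2 layer coarsens the coordinates of a sparse tensor.

```python
def receptive_field(layers):
    """layers: sequence of (kernel_size, stride); returns receptive field in input voxels."""
    r, jump = 1, 1
    for k, s in layers:
        r += (k - 1) * jump   # each layer widens the field by (k - 1) steps of the current jump
        jump *= s             # cumulative stride relative to the input grid
    return r


def encoder_layers(levels, k=3, convs_per_level=2):
    """Hypothetical U-Net encoder: per level, stride-1 convs then one stride-2 downsampling."""
    layers = []
    for _ in range(levels):
        layers += [(k, 1)] * convs_per_level
        layers.append((2, 2))
    return layers


def downsample_coords(coords, stride=2):
    """Coordinates after a stride-`stride` layer: snap each coordinate to the coarse grid."""
    return {tuple((c // stride) * stride for c in coord) for coord in coords}


for levels in range(1, 5):
    print(levels, receptive_field(encoder_layers(levels)))  # 6, 16, 36, 76 for levels 1..4
print(receptive_field([(3, 1)] * 8))                        # 17: linear growth without downsampling
print(downsample_coords({(0, 0, 0), (1, 0, 0), (2, 3, 1)}))
```

With this layout the receptive field after $L$ levels is $5 \cdot 2^L - 4$ voxels, versus linear growth for a plain stack of stride-1 convolutions.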

Results: ScanNet
Choy et al., 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR’19

Results: Stanford 3D
Choy et al., 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR’19

3D Feature Learning
● Universal Correspondence Network, Chris, JunYoung, Silvio, Manmohan, NIPS’16
● Fully Convolutional Geometric Features, Chris, Jaesik, Vladlen, ICCV’19

3D Geometric Feature
● A vector representation of the local / global 3D geometry
  ○ Used for correspondence, registration, tracking, scene flow, ...
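A typical use of such features is finding correspondences between two scans by nearest-neighbor search in feature space. The sketch assumes mutual-nearest-neighbor filtering, a common heuristic that the slides do not prescribe, and uses brute-force search for clarity (a KD-tree or GPU search would be used at scale). The function names and feature values are hypothetical.

```python
import math


def nearest(f, feats):
    """Index of the feature in `feats` closest to `f` (Euclidean, brute force)."""
    return min(range(len(feats)), key=lambda k: math.dist(f, feats[k]))


def mutual_nn_matches(feats_a, feats_b):
    """Keep (i, j) only if b_j is a_i's nearest neighbor and vice versa."""
    matches = []
    for i, f in enumerate(feats_a):
        j = nearest(f, feats_b)
        if nearest(feats_b[j], feats_a) == i:
            matches.append((i, j))
    return matches


feats_a = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7)]
feats_b = [(0.0, 0.9), (0.95, 0.05)]
print(mutual_nn_matches(feats_a, feats_b))  # [(0, 1), (1, 0)]; a_2 is rejected
```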

Prior Works in 3D Geometric Features
● Learned features: 3DMatch, CGF, PointNet, PPF, FoldNet, PPFFold, CapsuleNet, DirectReg, SmoothNet
● Hand-designed features: Spin Image, USC, SHOT, PFH, FPFH
● Limitations
  ○ Extract a small 3D patch: limits context and receptive field; features are extracted separately for each patch
  ○ Preprocessing: normals, signed distance function, curvatures
Choy et al., Fully Convolutional Geometric Features, ICCV’19

Fully Convolutional Metric Learning
● No preprocessing, no patch extraction
  ○ No receptive field limit imposed by crop size
  ○ Efficient reuse of shared computation
● Hardest negative mining
Choy et al., Universal Correspondence Network, NIPS’16
Choy et al., Fully Convolutional Geometric Features, ICCV’19
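Hardest negative mining can be illustrated with a contrastive loss that, for every positive pair, pulls the matched features together and pushes away only the closest non-matching feature. This is a simplified one-directional sketch: the function name and the margins `m_pos=0.1`, `m_neg=1.4` are assumptions for illustration, and the loss used in FCGF differs in details such as mining negatives for both scans.

```python
import math


def hardest_contrastive_loss(feats_a, feats_b, positives, m_pos=0.1, m_neg=1.4):
    """Contrastive loss with hardest-negative mining (simplified sketch).

    positives: list of (i, j) pairs where feats_a[i] corresponds to feats_b[j].
    For each positive pair, the negative term uses only the closest feature in
    feats_b that is not a known match of i (the "hardest" negative).
    """
    positive_set = set(positives)
    pos_loss, neg_loss = 0.0, 0.0
    for i, j in positives:
        d_pos = math.dist(feats_a[i], feats_b[j])
        pos_loss += max(d_pos - m_pos, 0.0) ** 2          # pull matches within m_pos
        d_neg = min((math.dist(feats_a[i], feats_b[k])
                     for k in range(len(feats_b)) if (i, k) not in positive_set),
                    default=math.inf)
        neg_loss += max(m_neg - d_neg, 0.0) ** 2          # push hardest negative beyond m_neg
    n = len(positives)
    return pos_loss / n + neg_loss / n


# Matches are close, but negatives are only ~1.0 apart (< m_neg), so the loss is positive
print(hardest_contrastive_loss([(0, 0), (1, 0)], [(0, 0.05), (1, 0)], [(0, 0), (1, 1)]))
```

Mining only the hardest negative keeps the loss focused on the confusable pairs instead of averaging over many easy ones.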
