SLIDE 1

GPU-ACCELERATED 3D POINT CLOUD PROCESSING WITH HIERARCHICAL GAUSSIAN MIXTURES

Ben Eckart, NVIDIA Research, Learning and Perception Group, 3/20/2019

SLIDE 2

3D POINT CLOUD DATA

The basic data type for unstructured 3D data. The emergence of commercial depth sensors has made it ubiquitous.


SLIDE 3

POINT CLOUD PROCESSING CHALLENGES

  • Points are non-differentiable and non-probabilistic
  • Large amounts of often noisy data
  • Often spatially redundant, with wide-ranging density variance

SLIDE 4

PREVIOUS APPROACHES

What have people done before?

Discrete approaches: voxel grids/lists, octrees, TSDFs. Though efficient, they inherit the same non-differentiable, non-probabilistic problems as point clouds.

[Figure: OctoMap]

SLIDE 5

PREVIOUS APPROACHES

What have people done before?

Continuous approaches: Gaussian mixture models, Gaussian processes. Though theoretically attractive, in practice they tend to be too slow for many applications. [Figures: GMM; Gaussian process]

SLIDE 6

Proposal: Hierarchical Gaussian Mixture

[Figure: a tree of GMM nodes, each with J=8 components, shown at “Level 2,” “Level 3,” and “Level 4”]

Goals: the efficiency benefits of hierarchical structures like octrees, combined with the theoretical benefits of a probabilistic generative model.

SLIDE 7

Talk Overview

  • Background
    – Theory of generative modeling for point clouds
  • Single-Layer Model (GMMs)
    – GPU-Accelerated Construction Algorithm
    – Benefits: Compact and Data-Parallel
    – Limitations: Scaling with model size, lack of memory coherence
  • Hierarchical Models (HGMMs)
    – GPU-Accelerated Construction Algorithm
    – Benefits: Fast and Parallelizable on GPU
    – Application: Registration

SLIDE 8

STATISTICAL / GENERATIVE MODELS

Interpret point cloud data (PCD) as an i.i.d. sampling of some unknown latent spatial probabilistic function. Generative property: the full joint probability space is represented.


SLIDE 9
Modeling as an MLE Optimization

Given a set of parameters describing the model, find the parameters that best “explain” the data (maximum data likelihood).
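
In symbols, a minimal sketch of this objective (the notation is assumed; the slide shows it only as a diagram): find the parameters θ that maximize the likelihood of the point set Z = {z_i}:

```latex
\hat{\theta} \;=\; \arg\max_{\theta}\; p(Z \mid \theta)
            \;=\; \arg\max_{\theta}\; \sum_{i=1}^{N} \log p(z_i \mid \theta)
```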

SLIDE 10

Parametric Model as a Modified GMM

Interpret point cloud data as an i.i.d. sampling from a small number (J << N) of Gaussian and uniform distributions:
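
A plausible form of that density, with J Gaussian components plus one uniform component to absorb outliers (the exact weighting convention is an assumption; the slide's formula is an image):

```latex
p(z \mid \theta) \;=\; \sum_{j=1}^{J} \pi_j \,\mathcal{N}\!\left(z \mid \mu_j, \Sigma_j\right)
                 \;+\; \pi_{J+1}\,\mathcal{U}(z),
\qquad \sum_{j=1}^{J+1} \pi_j = 1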
SLIDE 11

GMM for Point Clouds: Intuition

Point samples representing pieces of the same local geometry can be aggregated into clusters, with the local geometry encoded in the covariance of each cluster.

SLIDE 12

SOLVING FOR THE MLE GMM PARAMETERS

Typically done via the Expectation-Maximization (EM) algorithm.

[Figure: the EM loop on a point cloud. Starting from Θ_init, the E Step updates point-cluster associations and the M Step updates Θ, iterating until convergence at Θ_final]

SLIDE 13

E Step: A Single Point

For each point z_i, we want to find the relative likelihood (expectation) of it having been generated by each cluster.

SLIDE 14

E Step: Expectation Vector

We calculate the probability of each point with respect to each of the J Gaussian clusters. The expected associations are denoted by the N×J matrix γ.
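
This corresponds to the standard EM responsibility computation (a sketch; if the uniform "noise" component is included, it adds one more term to the denominator):

```latex
\gamma_{ij} \;=\; \frac{\pi_j \,\mathcal{N}(z_i \mid \mu_j, \Sigma_j)}
                       {\sum_{k=1}^{J} \pi_k \,\mathcal{N}(z_i \mid \mu_k, \Sigma_k)},
\qquad i = 1,\dots,N,\quad j = 1,\dots,J
```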

SLIDE 15

M STEP: CLOSED FORM WEIGHTED SUMS

For the GMM case, the M Step has closed-form solutions given the N×J matrix γ: a “probabilistic generalization of k-means clustering.”
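
These are the standard closed-form weighted-sum updates (a sketch; the uniform component's weight update is omitted):

```latex
\pi_j = \frac{1}{N}\sum_{i=1}^{N} \gamma_{ij}, \qquad
\mu_j = \frac{\sum_i \gamma_{ij}\, z_i}{\sum_i \gamma_{ij}}, \qquad
\Sigma_j = \frac{\sum_i \gamma_{ij}\,(z_i - \mu_j)(z_i - \mu_j)^{\top}}{\sum_i \gamma_{ij}}
```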
SLIDE 16

GPU Data Parallelism
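
The slide itself is a diagram; as an illustration of the data parallelism, here is a minimal CUDA sketch of the E Step with one thread per point. All names are hypothetical, and the Gaussians are simplified to isotropic covariances for brevity (the real model uses full 3×3 covariances):

```cuda
#include <cuda_runtime.h>

// One thread per point: evaluate all J (isotropic) Gaussians for point i,
// then normalize to produce the expectation row gamma[i, :].
__global__ void eStepKernel(const float3* pts, int N,
                            const float3* mu, const float* sigma2,
                            const float* pi, int J,
                            float* gamma /* N x J, row-major */)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= N) return;
    float3 z = pts[i];
    float sum = 0.0f;
    for (int j = 0; j < J; ++j) {   // note: linear scan over ALL J clusters
        float dx = z.x - mu[j].x, dy = z.y - mu[j].y, dz = z.z - mu[j].z;
        float d2 = dx*dx + dy*dy + dz*dz;
        float n = rsqrtf(6.2831853f * sigma2[j]);       // (2*pi*sigma2)^(-1/2)
        float w = pi[j] * n*n*n * expf(-0.5f * d2 / sigma2[j]);
        gamma[i * J + j] = w;
        sum += w;
    }
    for (int j = 0; j < J; ++j)     // normalize into expectations
        gamma[i * J + j] /= sum;
}
```

The inner loop over all J clusters, reading their parameters from global memory, is exactly the limitation the next slide calls out.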

SLIDE 17

GMM Model Limitations

  • Each point needs to access all J cluster parameters in CUDA (poor memory locality and linear scaling with J)
  • The N×J expectation matrix is mostly sparse (thus wasted computation)
  • A static number of Gaussians must be set a priori

SLIDE 18

HIERARCHICAL GAUSSIAN MIXTURE

Suppose we restrict J to be only 8 Gaussians. The model would then fit entirely in shared memory for each CUDA threadblock, removing the need for global memory accesses, and the expectation matrix would be dense (N×8).
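
A sketch of what that buys in CUDA terms, reusing the hypothetical isotropic layout from the earlier kernel; the whole J=8 mixture is staged once into shared memory per threadblock:

```cuda
#include <cuda_runtime.h>

// Hypothetical J=8 variant: the entire mixture fits in shared memory.
struct Gauss8 { float3 mu[8]; float sigma2[8]; float pi[8]; };

__global__ void eStep8Kernel(const float3* pts, int N,
                             const Gauss8* model, float* gamma /* N x 8 */)
{
    __shared__ Gauss8 m;              // whole model: well under 1 KB
    if (threadIdx.x == 0) m = *model; // stage once per threadblock
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= N) return;
    float3 z = pts[i];
    float w[8], sum = 0.0f;
    for (int j = 0; j < 8; ++j) {     // all model reads now hit shared memory
        float dx = z.x - m.mu[j].x, dy = z.y - m.mu[j].y, dz = z.z - m.mu[j].z;
        float n = rsqrtf(6.2831853f * m.sigma2[j]);
        w[j] = m.pi[j] * n*n*n * expf(-0.5f * (dx*dx + dy*dy + dz*dz) / m.sigma2[j]);
        sum += w[j];
    }
    for (int j = 0; j < 8; ++j)       // dense N x 8 expectation row
        gamma[i * 8 + j] = w[j] / sum;
}
```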


SLIDE 19

HIERARCHICAL GAUSSIAN MIXTURE

After convergence of the J=8 GMM, we can use the N×8 expectation matrix as a partition function. Each point is partitioned via its maximum expectation. Now we have 8 partitions of roughly size N/8.


SLIDE 20

HIERARCHICAL GAUSSIAN MIXTURE

We can now run the algorithm recursively on each partition. Each partition contains ~N/8 points that will be modeled as another J=8 GMM. Note that this produces 64 clusters in total.


SLIDE 21

PARALLEL PARTITIONING USING CUDA

Given each point's max expectation and associated cluster index, we can "invert" this index using parallel scans to group together point IDs having the same partition number: [0 0 1 0 1 1 1 2 0 2 2 2] ➔ [[0 1 3 8] [2 4 5 6] [7 9 10 11]]. Now we can run a 2D CUDA kernel where dimension 1 indexes into the original point cloud and dimension 2 is the cluster of the parent. E.g., with 3 clusters, 12 points, and 2 threads per threadblock, the grid size is (2, 3).
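
One way to realize this inversion with off-the-shelf parallel primitives is a key-value sort (a sketch using Thrust, not necessarily the talk's actual implementation):

```cuda
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>

// Group point IDs by partition label, e.g. labels = [0 0 1 0 1 1 1 2 0 2 2 2]
// yields ids = [0 1 3 8 | 2 4 5 6 | 7 9 10 11] stored contiguously.
void invertIndex(thrust::device_vector<int>& labels,
                 thrust::device_vector<int>& ids)
{
    ids.resize(labels.size());
    thrust::sequence(ids.begin(), ids.end());  // ids = 0, 1, ..., N-1
    // Stable key-value sort: keys are partition labels, values are point IDs.
    thrust::stable_sort_by_key(labels.begin(), labels.end(), ids.begin());
    // Per-partition offsets can then be found with thrust::lower_bound over
    // the sorted labels, giving the 2D kernel its launch geometry.
}
```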

SLIDE 22

HGMM COMPLEXITY

Even though we now have 64 clusters, we only need to query 8 clusters for each point (avoiding the computation of all N×J (sparse) expectations). Due to the 2D CUDA grid and indexing structure, this segmentation of the points into 64 clusters has the exact same complexity/speed as the original "simple" J=8 GMM. Thus, we can keep increasing the complexity of the model eightfold while incurring only a linear time penalty.

SLIDE 23

HGMM ALGORITHM

Small EM algorithms (8 clusters at a time) are recursively performed on increasingly smaller partitions of the point cloud data (see the sketch below):
  • E Step: associate points to clusters
  • M Step: update mixture means, covariances, and weights
  • Partition Step: before each recursion, new point partitions are determined by the maximum-likelihood point-cluster associations from the last E Step
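
A host-side skeleton of how those three steps might compose recursively. All types and function names here are illustrative assumptions: runEM8 and partitionByMaxExpectation stand in for the kernels sketched earlier, and the talk's actual implementation batches an entire level into one 2D kernel launch rather than recursing node by node:

```cuda
#include <vector>

// Illustrative node layout: each node holds a J=8 GMM.
struct Gaussian { float mu[3], cov[9], weight; };
struct Node { Gaussian g[8]; Node* child[8] = {}; };

// Stand-ins for the CUDA E/M and partition kernels sketched earlier.
void runEM8(const std::vector<int>& ids, Gaussian g[8]);
void partitionByMaxExpectation(const std::vector<int>& ids,
                               const Gaussian g[8], std::vector<int> parts[8]);

Node* buildHGMM(const std::vector<int>& ids, int level, int maxLevel)
{
    if (level > maxLevel || ids.size() < 8) return nullptr;
    Node* n = new Node;
    runEM8(ids, n->g);                             // E Step + M Step to convergence
    std::vector<int> parts[8];
    partitionByMaxExpectation(ids, n->g, parts);   // Partition Step
    for (int j = 0; j < 8; ++j)                    // recurse on ~N/8 points each
        n->child[j] = buildHGMM(parts[j], level + 1, maxLevel);
    return n;
}
```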

SLIDE 24

HGMM DATA STRUCTURE

[Figure: the HGMM data structure as a tree of J=8 GMM nodes at “Level 2,” “Level 3,” and “Level 4”]

Efficiency benefits of hierarchical structures like octrees; theoretical benefits of a probabilistic generative model.

SLIDE 25

E Step Performance

SLIDE 26

Compactness vs Fidelity

SLIDE 27

COMPACTNESS VS FIDELITY

[Plot: reconstruction error (PSNR) vs. model size (kB), with an annotation at 20 kB]

SLIDE 28

MODELING LARGE POINT CLOUDS

  • HGMM Level 6 model: <12 MB
  • Volume created from stochastically sampled Marching Cubes
  • Visualization is real-time: ~20 fps on a Titan X
  • Endeavor snapshots: ~80 GB of point cloud data each

SLIDE 29

ENDEAVOR DATA: BILLIONS OF POINTS

SLIDE 30

APPLICATION: RIGID REGISTRATION

Point-sampled surfaces are displaced by some rigid transformation. Recover the translation and rotation that best overlap the point clouds.

SLIDE 31

Registration as EM with HGMM

Goal: maximize data likelihood over T given some probability model θ (MLE over the space of rotations and translations).
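
In symbols, a sketch of the objective (notation assumed): with the HGMM parameters θ held fixed, registration seeks the rigid transform

```latex
\hat{T} \;=\; \arg\max_{T \,\in\, SE(3)} \; \sum_{i=1}^{N} \log p\big(T(z_i) \mid \theta\big)
```

which is itself optimized with EM, treating the point-cluster associations as latent variables.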

SLIDE 32

Outdoor Urban Velodyne Data

  • Velodyne VLP-16
    – ~15k pts/frame
    – ~10 frames/sec
  • Frame-to-frame model-building and registration with overlap estimation
SLIDE 33

HGMM-Based Registration

  • Average frame-to-frame error: 0.0960

SLIDE 34

Robust Point-to-Plane ICP

  • Average frame-to-frame error: 0.1519
  • Best result among the libpointmatcher variants

SLIDE 35

Speed vs Accuracy Trade-Off

Test: random transformations of point cloud pairs while varying the subsampling rate. Less subsampling yields better accuracy but slower speeds; the bottom-left of the plot is fastest and most accurate. Our proposed methods are shown in red/teal/black.

SLIDE 36

HGMM COMING TO ISAAC

  • ~350 fps on Titan Xp
  • ~30 fps on Xavier
  • Error: ~0.05° yaw (median, 4 Hz updates)

SLIDE 37

DRIVEWORKS (Future Release)

With a Velodyne HDL-64E:
  • ~300 fps on Titan Xp
  • ~30 fps on Xavier

SLIDE 38

DNN-BASED STEREO DEPTH MAPS

SLIDE 39

FINAL REMARKS

HGMMs have many nice properties for modeling point clouds:

  • Efficient: fast to compute via CUDA/GPU, scaling even to billions of points
  • Multi-level: can model the data distribution well at multiple levels simultaneously
  • Probabilistic: allows Bayesian optimization for applications like registration
  • Compact and continuous: no voxels and no aliasing artifacts, easy to transform

SLIDE 40

QUESTIONS?

SLIDE 41

SLIDE 42

REGISTRATION FROM DNN-BASED STEREO

Noisy point cloud output is well-suited for HGMM representation

SLIDE 43

Stanford Lounge Dataset (Kinect)

Frame-to-frame registration from point cloud data only (no depth maps), subsampled to 2000 points, first 100 frames. Histograms of average Euler-angle error per frame are shown for GMM-based, ICP-based, and the proposed methods.

SLIDE 44

SLIDE 45

Noise Handling

  • Test: random (uniform) noise injected in increasing amounts
  • Result: mixture components “stick” to geometrically coherent, dense areas, disregarding areas of noise

SLIDE 46

SAMPLING FOR PROBABILISTIC OCCUPANCY

Points are sampled from each mixture component by transforming standard normal draws: $\hat{q} = M_\Sigma\, q + \nu$ for all $(\nu, \Sigma) \in \Theta$, where $q \sim \mathcal{N}(0, I)$ and $M_\Sigma M_\Sigma^\top = \Sigma$.
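
A sketch of that sampling step on the GPU (the cuRAND calls are the real CUDA API; the memory layout and the kernel itself are assumptions):

```cuda
#include <curand_kernel.h>

// One thread per sample: q_hat = M * q + nu, with q ~ N(0, I) and M a
// 3x3 square root of the component covariance (e.g. its Cholesky factor).
__global__ void sampleHGMM(const float* M   /* K x 9, row-major factors */,
                           const float3* nu /* K component means */,
                           const int* comp  /* component index per sample */,
                           float3* out, int nSamples, unsigned long long seed)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nSamples) return;
    curandState st;
    curand_init(seed, i, 0, &st);        // independent stream per thread
    float q0 = curand_normal(&st);
    float q1 = curand_normal(&st);
    float q2 = curand_normal(&st);
    const float* F = &M[comp[i] * 9];
    float3 m = nu[comp[i]];
    out[i] = make_float3(F[0]*q0 + F[1]*q1 + F[2]*q2 + m.x,
                         F[3]*q0 + F[4]*q1 + F[5]*q2 + m.y,
                         F[6]*q0 + F[7]*q1 + F[8]*q2 + m.z);
}
```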

SLIDE 47

MESHING UNDER NOISE

SLIDE 48

ADAPTIVE MULTI-SCALE

SLIDE 49

MULTI-SCALE MODELING

Multilevel cross-sections can be adaptively chosen for robustness

SLIDE 50

E Step: Parallelized Tree Search

Point-model associations are found through parallelized adaptive tree search in CUDA. The descent is governed by a Complexity(·) measure on each node, though other suitable heuristics are possible.

Adaptive Thresholding Finds the Most Appropriate Scale to Associate Point Data to the Point Cloud Model
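
A device-side sketch of such an adaptive descent (the node layout, the isotropic log-likelihood, and the per-node complexity value are all illustrative assumptions; the talk's actual Complexity(·) formula is not reproduced here):

```cuda
#include <math.h>

// Minimal node layout (an assumption, not the talk's actual structure).
struct HNode { float mu[3]; float sigma2; float complexity; int firstChild; };

// Isotropic Gaussian log-likelihood, up to an additive constant.
__device__ float gaussLogLik(const HNode& n, float x, float y, float z)
{
    float dx = x - n.mu[0], dy = y - n.mu[1], dz = z - n.mu[2];
    return -0.5f * (dx*dx + dy*dy + dz*dz) / n.sigma2 - 1.5f * logf(n.sigma2);
}

// Per-point adaptive descent: follow the max-likelihood child until the
// node's complexity falls below a threshold (i.e., its scale fits the point).
__device__ int adaptiveSearch(const HNode* nodes, float x, float y, float z,
                              float threshold)
{
    int cur = 0;                                   // root node
    while (nodes[cur].firstChild >= 0 && nodes[cur].complexity > threshold) {
        int best = nodes[cur].firstChild;
        float bestLik = -1e30f;
        for (int j = 0; j < 8; ++j) {              // probe the 8 children
            int c = nodes[cur].firstChild + j;
            float lik = gaussLogLik(nodes[c], x, y, z);
            if (lik > bestLik) { bestLik = lik; best = c; }
        }
        cur = best;
    }
    return cur;                                    // scale-appropriate cluster
}
```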

SLIDE 51

M-Step: Mahalanobis Estimation

We seek the transformation that maximizes the expected joint log-likelihood of our data and latent associations (binary indicators in {0,1}) with respect to the posterior over our current association estimates. The resulting form (1) is a weighted sum of squared Mahalanobis distances, further reduced to (2) by writing it in terms of the sufficient statistics M_k. Lastly, covariance eigendecomposition produces an equivalent weighted point-to-plane distance measure (3), which we can solve efficiently with least squares.
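
The equations themselves are images in the source; a plausible form of (1), using the E-step responsibilities γ_ij (an assumption based on standard MLE-registration formulations, not the slide's exact notation):

```latex
\hat{T} \;=\; \arg\min_{T}\; \sum_{i=1}^{N} \sum_{j} \gamma_{ij}\,
\big(T(z_i) - \mu_j\big)^{\!\top} \Sigma_j^{-1} \big(T(z_i) - \mu_j\big)
\tag{1}
```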