MIXED REALITY FUSION
Sven Middelberg, Developer Technology Engineer


slide-1
SLIDE 1

Sven Middelberg, Developer Technology Engineer smiddelberg@nvidia.com

MIXED REALITY FUSION

slide-2
SLIDE 2

2

VIRTUAL REALITY DEPTH FUSION

slide-3
SLIDE 3

3

THE SETUP

Intel RealSense D435 + Vive Tracker: 90 FPS, 848x480 depth stream
NVIDIA GP100: 3584 CUDA cores, 16 GB HBM2 memory
Vive: 90 Hz update rate

slide-4
SLIDE 4

4

MIXED REALITY FUSION

slide-5
SLIDE 5

5

MIXED REALITY FUSION

slide-6
SLIDE 6

6

MIXED REALITY FUSION

Symbiosis of VR and depth fusion

TAKEAWAYS

How can we take advantage of the VR system to make depth fusion more robust?
Which optimizations are necessary to simultaneously reconstruct a 90 fps depth stream and visualize it in stereo VR?

slide-7
SLIDE 7

7

AGENDA

DEPTH FUSION IN A NUTSHELL
ROBUST MIXED REALITY FUSION
CUDA IMPLEMENTATION & OPTIMIZATIONS

slide-8
SLIDE 8

8

DEPTH FUSION IN A NUTSHELL

slide-9
SLIDE 9

12

[Pipeline diagram: New Frame → Pose Estimation → Volumetric Fusion → Volumetric Reconstruction → Raycasting → Vertex & Normal Map, which feeds the next pose estimate]

slide-10
SLIDE 10

13

RECONSTRUCTION DATA STRUCTURE

Voxel Grid Vi, Truncated Signed Distance Field Di, Truncation Size μ

[Figure: 2D slice of the voxel grid, each voxel annotated with its signed distance Di to the surface]

slide-11
SLIDE 11

14

RECONSTRUCTION DATA STRUCTURE

Voxel Grid Vi, Truncated Signed Distance Field Di, Truncation Size μ

[Figure: the same slice with μ = 1.0]

slide-12
SLIDE 12

15

RECONSTRUCTION DATA STRUCTURE

Voxel Grid Vi, Truncated Signed Distance Field Di, Truncation Size μ, Number of samples Ci

6 x 6 x 4 m³, 4 mm voxel size ⇒ 1500 x 1500 x 1000 voxel grid ⇒ 2.25 billion voxels ⇒ 16.76 GByte

[Figure: the truncated slice; all distances beyond μ = 1.0 are clamped to 1.0]

slide-13
SLIDE 13

16

SPARSE VOXEL HASHING

Voxel brick: N³ voxel cube
Hash function h(bx, by, bz) that maps from brick space to a hash bucket
Hash entry references actual brick memory within a preallocated brick atlas

Nießner et al., 2013

[Figure: hash entries (3D brick position, overflow list offset, brick atlas pointer) referencing bricks in the brick atlas]

slide-14
SLIDE 14

17

VOLUMETRIC FUSION

slide-15
SLIDE 15

18

VOLUMETRIC FUSION

Di, Ci

slide-16
SLIDE 16

19

VOLUMETRIC FUSION

Project voxel onto image plane

Di, Ci

slide-17
SLIDE 17

20

VOLUMETRIC FUSION

Project voxel onto image plane Find nearest depth

Di, Ci

slide-18
SLIDE 18

21

VOLUMETRIC FUSION

Project voxel onto image plane Find nearest depth Compute TSDF sample

d Di, Ci

slide-19
SLIDE 19

22

VOLUMETRIC FUSION

Project voxel onto image plane
Find nearest depth
Compute TSDF sample d
Update Di, Ci:
Di ← (Ci · Di + d) / (Ci + 1)
Ci ← min(Ci + 1, Cmax)

slide-20
SLIDE 20

33

VERTEX & NORMAL MAP CONSTRUCTION

Two-stage raycasting
1st stage: March ray in steps of truncation region size μ
2nd stage: March voxel by voxel

[Figure: ray marching in μ-sized steps through empty space]
slide-21
SLIDE 21

34

VERTEX & NORMAL MAP CONSTRUCTION

Two-stage raycasting
1st stage: March ray in steps of truncation region size μ
2nd stage: March voxel by voxel
V*: Ray position at zero-crossing
N*: Gradient of TSDF at V*

[Figure: voxel-by-voxel refinement to the zero-crossing V* with normal N*]

slide-22
SLIDE 22

39

POSE ESTIMATION

Given: Depth image I, raycast pose P*, V*, N*
Find pose P = (R|t) of I
Construct depth pyramid Ij, 0 ≤ j < L
Extract camera-space vertices & normals Vj, Nj
Iterative coarse-to-fine minimization of the distance between Vj, Nj and V*, N*
Initialize P with P*

[Figure: depth pyramid levels V0, N0 ... V2, N2 aligned against the raycast V*, N*]

slide-23
SLIDE 23

44

POSE ESTIMATION

Point-Plane ICP

Find correspondences (Vl, Nl, Vl*, Nl*)
Minimize the sum of squared point-plane distances for the pose increment PΔ:
F(PΔ) = Σl dist(PΔ P Vl, Vl*, Nl*)²
where dist(v, v*, n*) = n* · (v − v*) is the point-plane distance
Update P ← PΔ P

[Figure: point-plane distance dist(PΔ P Vl, Vl*, Nl*) between a transformed vertex and the tangent plane at its correspondence]

slide-24
SLIDE 24

48

POSE ESTIMATION

Lie-Algebraic Parameterization

SE(3): Lie group of poses (12 parameters)
se(3): Lie algebra (6 parameters), a minimal parameterization!
Mapping between SE(3) and se(3):
exp: se(3) → SE(3)
log: SE(3) → se(3)

Substitute PΔ = exp(ε):
F(ε) = Σl dist(exp(ε) P Vl, Vl*, Nl*)²

slide-25
SLIDE 25

49

[Pipeline overview diagram, as on slide 9]

slide-26
SLIDE 26

50

[Pipeline overview diagram, as on slide 9]

slide-27
SLIDE 27

51

ROBUST MIXED REALITY FUSION

slide-28
SLIDE 28

52

[Pipeline diagram: New Frame → Pose Estimation → Volumetric Fusion → Raycasting, now with a TRACKED POSE and a DISPLAY POSE supplied by the VR system]

slide-29
SLIDE 29

53

[Pipeline overview diagram, as above]

slide-30
SLIDE 30

54

[Pipeline overview diagram, as above]

slide-31
SLIDE 31

55

[Pipeline diagram: raycasting split into the reconstruction raycast (tracked pose) and a stereo raycast for the HMD (display pose)]

slide-32
SLIDE 32

56

ROBUST MIXED REALITY FUSION

Cheap relocalization using the VR-tracked pose P_VR
Requires registration of reconstruction and VR coordinate systems
Why not use P_VR directly?

slide-33
SLIDE 33

57

ROBUST MIXED REALITY FUSION

Using P_VR directly vs. regularized optimization of P_VR

slide-34
SLIDE 34

59

REGISTRATION

Find transformation T such that Pi = T · Pi_VR

Lie-algebraic approach: Lie-Algebraic Averaging for Globally Consistent Motion Estimation, Govindu, CVPR 2004

[Figure: tracked poses P0 ... P3 and VR-tracked poses P0_VR ... P3_VR related by T]

slide-35
SLIDE 35

61

RELOCALIZATION & REGULARIZATION

Initialization of pose estimation with P_VR
Penalize the distance of P_VR and exp(ε)P:
dist(ε) = log(exp(ε) P P_VR⁻¹) ≈ ε + log(P P_VR⁻¹)
F′(ε) = F(ε) + dist(ε)ᵀ S⁻¹ dist(ε)
Low computational overhead, using for x, y ≈ 0: log(exp(x) exp(y)) ≈ x + y

slide-36
SLIDE 36

62

CUDA IMPLEMENTATION & OPTIMIZATIONS

slide-37
SLIDE 37

63

[Pipeline overview diagram, as above]

slide-38
SLIDE 38

64

[Diagram: pose estimation loop. GPU system setup consumes Vj, Nj and V*, N*; regularization and solving use P_VR to update P]

POSE ESTIMATION

System Setup - GPU
Find correspondences
Set up the linear system: (Σl Jlᵀ Jl) ε = − Σl Jlᵀ rl

Regularization & Solving - CPU
Add regularization term to the system
Solve for ε
Update P ← exp(ε) P

slide-39
SLIDE 39

73

POSE ESTIMATION

Baseline

__global__ void setupSystem(float* gSys, ...)
{
    int x = getPixX();
    int y = getPixY();
    if (findCorrespondence(x, y, ...)) {
        float lSys[27];
        computeLocalSystem(lSys, ...);
        #pragma unroll
        for (int i = 0; i < 27; ++i) {
            atomicAdd(gSys + i, lSys[i]);
        }
    }
}

void solve(float* gSys, ...)
{
    float hostSys[27];
    cudaMemcpyAsync(hostSys, gSys, ...);
    cudaStreamSynchronize(stream);
    cudaMemsetAsync(gSys, 0, ...);
    addRegularization(hostSys, ...);
    float delta[6];
    solve(hostSys, delta);
    pose = exp(delta) * pose;
}

slide-40
SLIDE 40

74

POSE ESTIMATION

Baseline

[Bar chart - BASELINE: 19.5 / 26.5 / 36.4]

slide-41
SLIDE 41

81

POSE ESTIMATION

Warp-Aggregated Atomics

__global__ void setupSystem(float* gSys, ...)
{
    int x = getPixX();
    int y = getPixY();
    float lSys[27];
    initZero(lSys);
    if (findCorrespondence(x, y, ...)) {
        computeLocalSystem(lSys, ...);
    }
    int lane = getLane();
    warpReduceSystem(lSys, lane);
    if (lane < 27) {
        atomicAdd(gSys + lane, lSys[0]);
    }
}

__device__ __forceinline__ void warpReduceSystem(float* lSys, int lane)
{
    #pragma unroll
    for (int i = 0; i < 27; ++i) {
        warpReduce(lSys[i], lane);
        if (lane == i) lSys[0] = lSys[i];
    }
}

slide-42
SLIDE 42

82

POSE ESTIMATION

Warp-Aggregated Atomics

[Bar chart - BASELINE: 19.5 / 26.5 / 36.4; WARP-AGGREGATED ATOMICS: 3.7 / 3.0 / 3.6; SPEEDUP 8.0]

slide-43
SLIDE 43

83

POSE ESTIMATION

Minimizing CPU Overhead

126.5 μs

slide-44
SLIDE 44

84

POSE ESTIMATION

Minimizing CPU Overhead

79.5 μs Launch of next system setup kernel

slide-45
SLIDE 45

85

POSE ESTIMATION

Regularize & solve on GPU ✓ Removes CPU ↔ GPU synchronization ✓ Removes CPU ↔ GPU copies ✓ Keeps GPU busy

Minimizing CPU Overhead

__constant__ SE3 cPose;

void trackingStep(float* gSys, SE3* gPose, ...)
{
    cudaMemcpyToSymbolAsync(cPose, gPose, ...);
    setupSystem<<<...>>>(gSys, ...);
    solveAndUpdate<<<...>>>(gSys, gPose);
}

slide-46
SLIDE 46

86

POSE ESTIMATION

Minimizing CPU Overhead

43.3 μs

slide-47
SLIDE 47

87

POSE ESTIMATION

Minimizing CPU Overhead

[Bar chart - BASELINE: 19.5 / 26.5 / 36.4; WARP-AGGREGATED ATOMICS: 3.7 / 3.0 / 3.6; GPU SOLVER: 3.0 / 2.4 / 3.0; SPEEDUP 8.0 → 9.7]

slide-48
SLIDE 48

88

[Pipeline overview diagram, as above]

slide-49
SLIDE 49

89

DATA STRUCTURE

Three raycasts per frame
Vast number of trilinear interpolations → brick atlas in texture memory
Problem: interpolation at brick boundaries

[Figure: reconstruction space vs. texture atlas]

slide-50
SLIDE 50

90

DATA STRUCTURE

Three raycasts per frame
Vast number of trilinear interpolations → brick atlas in texture memory
Problem: interpolation at brick boundaries
Solution: apron voxels

[Figure: reconstruction space vs. texture atlas with apron voxels]

slide-51
SLIDE 51

91

[Pipeline overview diagram, as above]

slide-52
SLIDE 52

92

VOLUMETRIC FUSION

Brick Allocation: Reserve texture atlas memory
Brick Selection: Select bricks that are within the new frame's camera frustum
Brick Update: Fuse selected bricks' TSDFs and weights with the new frame
Apron Update: Propagate boundary voxels of updated bricks to neighboring bricks' aprons

[Diagram: New Frame & Pose → Brick Allocation → Brick Selection → Brick Update → Apron Update]

slide-53
SLIDE 53

93

VOLUMETRIC FUSION

Update horizontally

slide-54
SLIDE 54

94

VOLUMETRIC FUSION

Update horizontally Update vertically

slide-55
SLIDE 55

95

VOLUMETRIC FUSION

Update horizontally Update vertically

slide-56
SLIDE 56

96

VOLUMETRIC FUSION

Baseline

[Bar chart - BASELINE: 3.4 / 7.2 / 8.1]

slide-57
SLIDE 57

97

VOLUMETRIC FUSION

8³ bricks ⇒ 600 updates for 488 apron voxels ⇒ 23% overhead (112 of the 600 updates are redundant)

Update horizontally Update vertically

slide-58
SLIDE 58

98

VOLUMETRIC FUSION

Works only for 8³ bricks; requires at least 488 threads / block
Classify apron voxels into three types; each warp updates a single type

Low Overhead Apron Update

Face aprons: 64 threads / face, updated by warps 1-12
Edge aprons: 8 threads / edge, updated by warps 13-15
Corner aprons: 1 thread / corner, updated by warp 16

slide-59
SLIDE 59

99

VOLUMETRIC FUSION

Low Overhead Apron Update

[Bar chart - BASELINE: 3.4 / 7.2 / 8.2; LOW OVERHEAD APRON UPDATE: 2.5 / 4.8 / 5.4; SPEEDUP 1.5]

slide-60
SLIDE 60

106

VOLUMETRIC FUSION

Hiding Latency

▪ Kernel latency-bound
▪ Selected blocks written once, loaded twice from global memory
▪ Boundary voxels' TSDFs loaded twice from the atlas

__global__ void updateAprons(Brick* sel, int nSel)
{
    for (int i = 0; i < nSel; ++i) {
        Brick b = loadSelected(sel, i);     // Latency
        Brick nb = loadNeighbor(b, ...);
        float tsdf = loadSrcVoxel(b, ...);  // Latency
        writeApron(nb, tsdf, ...);
    }
}

slide-61
SLIDE 61

113

VOLUMETRIC FUSION

Combined Selection / Update

✓ Much of the latency hidden
✓ Selected blocks neither written to nor read from global memory
✓ Boundary voxels' TSDFs loaded only once

__global__ void combinedSelectAndUpdate(...)
{
    __shared__ Brick sel[MAX_SEL_PER_BLOCK];
    __shared__ int nSel;
    __shared__ float brickTsdfs[512];
    select(sel, nSel);
    for (int i = 0; i < nSel; ++i) {
        Brick b = sel[i];
        Brick nb = loadNeighbor(b, ...);
        brickTsdfs[tid()] = updateVoxel(b, ...);
        __syncthreads();
        float apronTsdf = getApronTsdf(brickTsdfs, tid());
        writeApron(nb, apronTsdf, ...);
    }
}

slide-62
SLIDE 62

114

VOLUMETRIC FUSION

Combined Selection / Update

[Bar chart - BASELINE: 3.4 / 7.2 / 8.2; LOW OVERHEAD APRON UPDATE: 2.5 / 4.8 / 5.4; COMBINED SELECT & UPDATE: 1.4 / 2.4 / 2.7; SPEEDUP 1.5 → 2.9]

slide-63
SLIDE 63

115

[Pipeline overview diagram, as above]

slide-64
SLIDE 64

116

RAYCASTING

One raycast for the next frame (848x480)
Two raycasts for the stereo HMD (540x600)
Two-stage approach
Full use of GPU texture units
slide-65
SLIDE 65

117

RAYCASTING

Baseline

[Bar chart - BASELINE: 2.3 / 4.4 / 4.1]

slide-66
SLIDE 66

119

ACCELERATION STRUCTURE

Empty-Space Skipping

Fast leaping over empty space
Define a supergrid of octrees
Efficient update and query?

[Figure: supergrid of octrees Oi,j and their neighbors]

slide-67
SLIDE 67

120

ACCELERATION STRUCTURE

Binary Encoded 3-Level Occupancy Octree

slide-68
SLIDE 68

121

ACCELERATION STRUCTURE

Binary Encoded 3-Level Occupancy Octree

Level 2 Cell

slide-69
SLIDE 69

122

ACCELERATION STRUCTURE

Binary Encoded 3-Level Occupancy Octree

Level 1 Cells

slide-70
SLIDE 70

123

ACCELERATION STRUCTURE

Binary Encoded 3-Level Occupancy Octree

Level 0 Cell

slide-71
SLIDE 71

124

[Figure: octree-ordered numbering 1-16 of the level 0 cells in a 2D slice]

ACCELERATION STRUCTURE

Binary Encoded 3-Level Occupancy Octree

64-bit descriptor d
Bit i is set if level 0 cell i is occupied
Bits ordered corresponding to the octree hierarchy
Level 2: Check if d is zero
Level 1: Check if the corresponding byte of d is zero
Level 0: Check if the corresponding bit of d is zero
Update: atomicOr during brick allocation

slide-72
SLIDE 72

130

RAYCASTING

Binary Encoded Occupancy Octree

[Bar chart - BASELINE: 2.3 / 4.4 / 4.1; BINARY ENCODED OCCUPANCY OCTREE: 1.8 / 3.4 / 3.5; SPEEDUP 1.3]

slide-73
SLIDE 73

132

OVERALL TIMINGS

[Chart - AVG TIME / MAX TIME: 6.2 / 10.2, 7.9 / 12.2, 9.1 / 17; 100% / 97.2% / 78.6%]

slide-74
SLIDE 74

133

NEXT STEPS

COLOR INTEGRATION: RGB data for reconstruction & texturing
MULTI-GPU OPTIMIZATION: 2nd GPU to perform on-the-fly optimization of the reconstruction
VR INTERACTION: VR controller for interactive reconstruction
DEEP LEARNING: Filtering & optimization

slide-75
SLIDE 75

Sven Middelberg, Developer Technology Engineer smiddelberg@nvidia.com

THANK YOU!