RGB-D Mapping: Using Depth Cameras for Dense 3D Modeling of Indoor Environments

SLIDE 1

RGB-D Mapping: Using Depth Cameras for Dense 3D Modeling of Indoor Environments

Peter Henry¹, Michael Krainin¹, Evan Herbst¹, Hao Du³, Marvin Cheng¹, Xiaofeng Ren², and Dieter Fox¹,²

¹ University of Washington, Computer Science & Engineering

² Intel Labs Seattle (now ISTC at UW)

³ Google

SLIDE 2

The Kinect

SLIDE 3

PrimeSense Technology

Red, Green, Blue, Depth

SLIDE 4

RGB-D Data

SLIDE 5

The Goal

Align the “frames” from a Kinect to create a single 3D map (or model) of the environment.

Like this…

SLIDE 6

SLIDE 7

Related Work

• SLAM
  • [Davison et al., PAMI 2007] (monocular)
  • [Konolige et al., IJR 2010] (stereo)
  • [Pollefeys et al., IJCV 2007] (multi-view stereo)
  • [Borrmann et al., RAS 2008] (3D laser)
  • [May et al., JFR 2009] (ToF sensor)
• Loop Closure Detection
  • [Nister et al., CVPR 2006]
  • [Paul et al., ICRA 2010]
• Photo collections
  • [Snavely et al., SIGGRAPH 2006]
  • [Furukawa et al., ICCV 2009]

SLIDE 8

System Overview

  1. Frame-to-frame alignment
  2. Global Optimization (Loop Closure)
  3. Map representation

SLIDE 9

RANSAC

(Random Sample Consensus)

• Visual features (from image) in 3D (from depth)
• Figure out how the camera moved by matching these features

SLIDE 10

What is RANSAC?

• For each feature point, find the most similar descriptor in the other frame
• Find largest set of consistent matches
• Move the new frame to align these matches
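
The three steps above can be sketched in a few lines. This is a minimal illustration, not the presenters' implementation: it assumes the descriptor matching has already produced paired 3D points (`src[i]` matched to `dst[i]`), and the function names, the 3 cm inlier threshold, and the iteration count are all hypothetical choices.

```python
import numpy as np

def rigid_transform(src, dst):
    """Least-squares R, t such that dst ≈ src @ R.T + t (Kabsch / SVD method)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

def ransac_align(src, dst, iters=200, inlier_dist=0.03, seed=0):
    """src[i], dst[i]: putatively matched 3D feature points (assumed metres)."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        # Three point pairs are the minimum needed to fix a rigid transform
        sample = rng.choice(len(src), size=3, replace=False)
        R, t = rigid_transform(src[sample], dst[sample])
        residuals = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = residuals < inlier_dist
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers               # largest consistent set so far
    if best_inliers.sum() < 3:
        raise ValueError("no consistent set of matches found")
    # Refit on all inliers: this is the motion that aligns the new frame
    R, t = rigid_transform(src[best_inliers], dst[best_inliers])
    return R, t, best_inliers
```

The returned `R` and `t` move points of the new frame into the previous frame; the inlier mask shows which matches were judged consistent.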

SLIDE 11

Alignment (RANSAC)

SLIDE 12

RANSAC Details

• Feature Detector / Descriptor Options
  • SIFT (SiftGPU)
  • SURF
  • FAST Detector / Calonder Descriptor
  • (All available in OpenCV)
• Matching:
  • L2 descriptor distance
  • Either SIFT style matching or window matching
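
As a rough sketch of the matching step, the snippet below pairs descriptors by L2 distance and filters them with a nearest-neighbour ratio test. Reading “SIFT style matching” as that ratio test is an assumption, as are the function name and the 0.8 ratio; window matching is not shown.

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Return (i, j) pairs: desc_a[i] matched to its nearest desc_b[j].

    desc_a, desc_b: 2D float arrays, one descriptor per row.
    desc_b needs at least two rows so a runner-up exists.
    """
    # Pairwise L2 distances, shape (len(desc_a), len(desc_b))
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    matches = []
    for i, row in enumerate(dists):
        order = np.argsort(row)
        best, second = order[0], order[1]
        # Ratio test (assumed): keep a match only if it clearly beats the runner-up
        if row[best] < ratio * row[second]:
            matches.append((i, int(best)))
    return matches
```

Ambiguous descriptors, whose two nearest neighbours are about equally far away, are dropped rather than passed on to RANSAC.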

SLIDE 13

RANSAC Failure Cases

• Low light
• Lack of visual “texture” or features
• Kinect still provides depth or “shape” information

SLIDE 14

ICP (Iterative Closest Point)

• Iterative Closest Point (ICP) uses shape to align frames
• Does not require the RGB image
• Does need a good initial “guess”
• Repeat the following two steps:
  • For each point in cloud 1, find the closest point in cloud 2
  • Compute the transformation that best aligns this set of corresponding pairs
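
The two repeated steps can be sketched as a basic point-to-point ICP loop. This is a simplified illustration under stated assumptions: brute-force nearest-neighbour search, an SVD-based rigid fit, and hypothetical names and stopping tolerance. A practical system would use a spatial index such as a k-d tree and may use a different error metric.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares R, t such that dst ≈ src @ R.T + t (Kabsch / SVD method)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - src_c).T @ (dst - dst_c))
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, dst_c - R @ src_c

def icp(source, target, init_R=None, init_t=None, iters=30, tol=1e-9):
    """Align `source` (cloud 1) to `target` (cloud 2), starting from an initial guess."""
    R = np.eye(3) if init_R is None else init_R
    t = np.zeros(3) if init_t is None else init_t
    prev_err = np.inf
    for _ in range(iters):
        moved = source @ R.T + t
        # Step 1: for each point in cloud 1, find the closest point in cloud 2
        d2 = ((moved[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        # Step 2: transformation that best aligns the current pairs
        R, t = best_rigid_transform(source, target[nearest])
        err = np.sqrt(d2[np.arange(len(source)), nearest]).mean()
        if abs(prev_err - err) < tol:            # mean distance stopped changing
            break
        prev_err = err
    return R, t
```

With a poor initial guess, step 1 pairs the wrong points and the loop can settle on a wrong alignment, which is why the starting estimate matters.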

SLIDE 15

SLIDE 16

ICP Failure Cases

• Not enough distinctive shape
• Don’t have a close enough initial “guess”
• Here the shape is basically a simple plane…

SLIDE 17

Joint Optimization (RGBD-ICP)

SLIDE 18

Optimal Transformation

SLIDE 19

Optimal Transformation

SCARY MATH!?!?
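
As a hedged sketch of what a joint visual-plus-shape objective can look like, one possible form is given below. Every symbol is an assumption introduced here for illustration: A_f is a set of matched feature pairs, A_d a set of dense point pairs, f and p are 3D feature and cloud points in the source (s) and target (t) frames, n_t is a target surface normal, w are per-pair weights, and α balances the two terms.

```latex
T^{*} = \arg\min_{T} \left[
  \alpha \, \frac{1}{|A_f|} \sum_{i \in A_f} w_i \left\| T(f_s^{i}) - f_t^{i} \right\|^{2}
  + (1 - \alpha) \, \frac{1}{|A_d|} \sum_{j \in A_d} w_j \left| \left( T(p_s^{j}) - p_t^{j} \right) \cdot n_t^{j} \right|^{2}
\right]
```

The first term is the feature-alignment error that RANSAC works with; the second is a point-to-plane ICP error. Setting α to 1 or 0 recovers a purely visual or purely shape-based alignment.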

SLIDE 20

Two-Stage Alternative

SLIDE 21

Loop Closure

• Sequential alignments accumulate error
• Revisiting a previous location results in an inconsistent map

SLIDE 22

SLIDE 23

Loop Closure Detection

• Detect by running RANSAC against previous frames
• Pre-filter options (for efficiency):
  • Only a subset of frames (keyframes)
  • Only keyframes with similar estimated 3D pose
  • Place recognition using vocabulary tree
• Post-filter (avoid false positives)
  • Estimate maximum expected drift and reject detections changing pose too greatly
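
The pose-based pre-filter and the drift-based post-filter might look roughly like this. It is a sketch only: the `Keyframe` type, the thresholds, and the linear drift model are hypothetical, orientation is ignored, and the vocabulary-tree option is not shown.

```python
import math
from dataclasses import dataclass

@dataclass
class Keyframe:
    index: int        # frame number at which the keyframe was taken
    position: tuple   # estimated camera position (x, y, z), assumed metres

def loop_closure_candidates(current, keyframes, max_dist=2.0, min_gap=10):
    """Pre-filter: keep older keyframes whose estimated position is nearby."""
    return [
        kf for kf in keyframes
        if current.index - kf.index >= min_gap                     # skip recent frames
        and math.dist(current.position, kf.position) <= max_dist  # similar pose
    ]

def accept_closure(pose_change, frames_apart, drift_per_frame=0.02):
    """Post-filter: reject a detection that would move the pose further
    than the drift expected to accumulate over `frames_apart` frames."""
    return pose_change <= drift_per_frame * frames_apart
```

Only the keyframes returned by `loop_closure_candidates` would then be passed to the more expensive RANSAC check.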

SLIDE 24

Loop Closure Correction (TORO)

• TORO [Grisetti 2007]:
  • Constraints between camera locations in pose graph
  • Maximum likelihood global camera poses
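
To illustrate how pose-graph constraints yield globally consistent poses, here is a deliberately simplified sketch. It is not TORO: it handles translations only, assumes equally trusted constraints (so least squares coincides with maximum likelihood under Gaussian noise), and solves one linear system. The function name and the constraint format are hypothetical.

```python
import numpy as np

def optimize_positions(n, constraints):
    """Least-squares positions for n poses.

    constraints: list of (i, j, offset), meaning position[j] - position[i] ≈ offset,
    with offset a sequence of length dim. Pose 0 is pinned at the origin.
    Returns an array of shape (n, dim).
    """
    dim = len(constraints[0][2])
    A = np.zeros((len(constraints) + 1, n))
    b = np.zeros((len(constraints) + 1, dim))
    for row, (i, j, offset) in enumerate(constraints):
        A[row, i], A[row, j] = -1.0, 1.0   # one graph edge per row
        b[row] = offset
    A[-1, 0] = 1.0                         # fix the free global shift: x_0 = 0
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x
```

For example, two odometry steps of 1.0 plus a loop-closure constraint saying the total offset is 2.3 disagree by 0.3; the solution spreads that error evenly instead of leaving it all at the end of the chain.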

SLIDE 25

Loop Closure Correction (SBA)

• Minimize reprojection error of features
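
Reprojection error can be made concrete with a pinhole camera model. The sketch below only evaluates the error for points already expressed in the camera frame; a bundle adjuster would additionally optimize camera poses and point positions to drive it down. The function names and intrinsics parameters (`fx`, `fy`, `cx`, `cy`) are assumptions for illustration.

```python
def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3D point (x, y, z) to a pixel (u, v)."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)

def reprojection_error(points_cam, observations, fx, fy, cx, cy):
    """Sum of squared pixel distances between projected points and
    the pixel locations where the features were actually observed."""
    total = 0.0
    for p, (u_obs, v_obs) in zip(points_cam, observations):
        u, v = project(p, fx, fy, cx, cy)
        total += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return total
```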

SLIDE 26

Comparison (TORO)

SLIDE 27

Comparison (SBA)

SLIDE 28

A Second Comparison

TORO SBA

SLIDE 29

SLIDE 30

Resulting Map

SLIDE 31

Map Representation: Surfels

• Surface Elements [Pfister 2000, Weise 2009, Krainin 2010]
• Circular surface patches
• Accumulate color / orientation / size information
• Incremental, independent updates
• Incorporate occlusion reasoning
• 750 million points reduced to 9 million surfels
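
A surfel that accumulates measurements incrementally could be sketched as below. The field names, the weighted running average, and the rule of keeping the smallest radius are illustrative assumptions, not the update rules of the cited work; occlusion reasoning is omitted.

```python
from dataclasses import dataclass

@dataclass
class Surfel:
    position: tuple   # patch centre (x, y, z)
    normal: tuple     # unit surface normal (orientation)
    color: tuple      # (r, g, b)
    radius: float     # patch size
    weight: float = 1.0  # accumulated measurement weight

def _blend(old, new, w_old, w_new):
    """Weighted average of two equal-length tuples."""
    total = w_old + w_new
    return tuple((a * w_old + b * w_new) / total for a, b in zip(old, new))

def update_surfel(s, position, normal, color, radius, weight=1.0):
    """Fold one new measurement into surfel `s`; no other surfel is touched."""
    s.position = _blend(s.position, position, s.weight, weight)
    n = _blend(s.normal, normal, s.weight, weight)
    norm = sum(c * c for c in n) ** 0.5
    s.normal = tuple(c / norm for c in n)    # re-normalise the averaged normal
    s.color = _blend(s.color, color, s.weight, weight)
    s.radius = min(s.radius, radius)         # assumed: keep the finest estimate
    s.weight += weight
    return s
```

Because each update touches a single surfel, many raw depth points can be merged into one patch, which is how the point count shrinks.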

SLIDE 32

SLIDE 33

Experiments

• Reprojection error is better for RANSAC
• Errors for variations of the algorithm
• Timing for variations of the algorithm

SLIDE 34

Experiments: Overlay 1

SLIDE 35

Experiments: Overlay 2

SLIDE 36

Application: Measurements

SLIDE 37

Application: Quadrocopter

• Collaboration with Albert Huang, Abe Bachrach, and Nicholas Roy from MIT

SLIDE 38

SLIDE 39

SLIDE 40

SLIDE 41

SLIDE 42

Occupancy Map

SLIDE 43

SLIDE 44

SLIDE 45

SLIDE 46

SLIDE 47

Application: Interactive Mapping

• Allow anyone to construct maps with a Kinect
• Uses for these maps
  • Localization
  • Measurements
  • Remodeling
  • Buy new furniture
  • Video game levels???

SLIDE 48

SLIDE 49

Conclusion

• Kinect-style depth cameras have recently become available as consumer products
• RGB-D Mapping can generate rich 3D maps using these cameras
• RGBD-ICP combines visual and shape information for robust frame-to-frame alignment
• Global consistency achieved via loop closure detection and optimization (RANSAC, TORO, SBA)
• Surfels provide a compact map representation
• ROS + OpenCV are powerful tools to enable these applications

SLIDE 50

Open Questions

• Which are the best features to use?
• How to find more loop closure constraints between frames?
• What is the right representation (point clouds, surfels, meshes, volumetric, geometric primitives, objects)?
• How to generate increasingly photorealistic maps?
• Autonomous exploration for map completeness?
• Can we use these rich 3D maps for semantic mapping?

SLIDE 51

Links

• www.cs.washington.edu/robotics/projects/rgbd-3d-mapping/
• www.ros.org
• The following have nice ROS integration but also work separately:
  • http://opencv.willowgarage.com/wiki/
  • http://www.pointclouds.org/
• peter@cs.washington.edu
