Aron Yu Nov 2, 2012 1 Depth Image Body Parts 3D Joint Est. 2 - - PowerPoint PPT Presentation



SLIDE 1

Aron Yu Nov 2, 2012

SLIDE 2

Depth Image → Body Parts → 3D Joint Est.

Image Credit: Shotton et al. – Real-Time Human Pose Recognition in Parts from Single Depth Images

SLIDE 3

▪ Released: Nov 4, 2010
▪ Color: 640 x 480 @ 32 bits
▪ Depth: 640 x 480 @ 16 bits
▪ Frame Rate: 30 frames/sec
▪ Ideal Range: 1.2 m ~ 3.5 m
▪ Operational Range: 0.7 m ~ 6.0 m
▪ Tracking: up to 6 people, including 2 active players
▪ Method: 20-point joint tracking per player
▪ Opened doors to new research (and games)!

Source: www.xbox.com/en-US/kinect

SLIDE 4

Image Credit: www.gamerant.com

SLIDE 5

▪ Windows SDK 1.5 & Toolkit 1.6

Demo Time!

SLIDE 6

▪ Depth Comparison Feature

  • weak but efficient
  • offsets given in pixel distance
  • offsets normalized by depth, so the feature is depth invariant

Image Credit: Shotton et al. – Real-Time Human Pose Recognition in Parts from Single Depth Images
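The depth comparison feature can be sketched in a few lines. This is a minimal illustration that assumes the form given by Shotton et al., $f_\theta(I, x) = d_I(x + u/d_I(x)) - d_I(x + v/d_I(x))$, with the depth image stored as a list of rows and the pixel $x$ lying on a valid depth reading. The names `depth_feature`, `probe`, and `BACKGROUND_DEPTH`, and the `normalize` switch, are illustrative and do not come from the slides.

```python
from typing import List, Tuple

# Assumed constant returned when a probe falls off the image or on a
# missing depth reading (the value itself is an arbitrary large number).
BACKGROUND_DEPTH = 1e6

Offset = Tuple[float, float]  # (row offset, column offset)


def probe(depth: List[List[float]], row: int, col: int) -> float:
    """Depth at (row, col), or BACKGROUND_DEPTH if outside or invalid."""
    inside = 0 <= row < len(depth) and 0 <= col < len(depth[0])
    if inside and depth[row][col] > 0:
        return depth[row][col]
    return BACKGROUND_DEPTH


def depth_feature(depth: List[List[float]], x: Tuple[int, int],
                  u: Offset, v: Offset, normalize: bool = True) -> float:
    """Difference of the depths probed at x + u and x + v.

    With normalize=True both offsets are divided by the depth at x,
    which is the depth-invariant variant; with normalize=False the
    offsets are used as raw pixel distances.
    """
    row, col = x
    scale = 1.0 / depth[row][col] if normalize else 1.0

    def at(offset: Offset) -> float:
        r = row + int(round(offset[0] * scale))
        c = col + int(round(offset[1] * scale))
        return probe(depth, r, c)

    return at(u) - at(v)
```

With `normalize=True` a given offset pair spans fewer pixels on a far object and more on a near one, so the response depends far less on how far the object is from the camera.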

SLIDE 7

▪ Randomly generate splitting candidates at each node
▪ Partition training pixels and check for entropy gain
▪ Repeat until gain is minimal

[Figure: binary decision tree with nodes numbered 1 to 15. Training pixels enter at node 1 and are routed by the test $f_{\theta_1}(I, x) < \tau_1$ versus $f_{\theta_1}(I, x) \ge \tau_1$. Splitting candidates: $\phi_{1:N} = (\theta, \tau)$.]

Source: www.iis.ee.ic.ac.uk/~tkkim/iccv09_tutorial
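The three training steps can be sketched as follows. This is a simplified illustration, not the code behind the slides: it assumes Shannon entropy as the impurity measure, that each training pixel already carries a vector of feature responses, and that a candidate $(\theta, \tau)$ is drawn by picking a feature index and a threshold uniformly at random. The names `entropy`, `information_gain`, `best_split`, and `grow_tree` are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(labels, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(labels)
    return (entropy(labels)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))


def best_split(samples, num_candidates, rng):
    """Try random (theta, tau) candidates; return (best candidate, its gain).

    samples is a list of (feature_vector, label) pairs.
    """
    labels = [y for _, y in samples]
    best = (None, 0.0)
    for _ in range(num_candidates):
        theta = rng.randrange(len(samples[0][0]))      # which feature
        values = [f[theta] for f, _ in samples]
        tau = rng.uniform(min(values), max(values))    # which threshold
        left = [y for f, y in samples if f[theta] < tau]
        right = [y for f, y in samples if f[theta] >= tau]
        gain = information_gain(labels, left, right)
        if gain > best[1]:
            best = ((theta, tau), gain)
    return best


def grow_tree(samples, num_candidates, min_gain, rng):
    """Split recursively; stop with a class histogram once the gain is too small."""
    candidate, gain = best_split(samples, num_candidates, rng)
    if candidate is None or gain < min_gain:
        return Counter(y for _, y in samples)          # leaf node
    theta, tau = candidate
    left = [s for s in samples if s[0][theta] < tau]
    right = [s for s in samples if s[0][theta] >= tau]
    return (theta, tau,
            grow_tree(left, num_candidates, min_gain, rng),
            grow_tree(right, num_candidates, min_gain, rng))
```

A leaf here stores a class histogram, which can be normalized into the per-leaf distribution that a forest later averages.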

SLIDE 8

▪ Ensemble of random decision trees

  • final distributions are averaged

[Figure: the pixel $I(x)$ is passed down each tree $t_1, \ldots, t_T$; tree $t_i$ ends in a leaf holding a distribution $P_i(c)$ over categories $c$.]

Source: www.iis.ee.ic.ac.uk/~tkkim/iccv09_tutorial
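Averaging the per-tree distributions amounts to $P(c) = \frac{1}{T} \sum_{t=1}^{T} P_t(c)$. A minimal sketch, assuming each tree reports its leaf distribution as a dictionary from category name to probability (the function name `average_distributions` is illustrative):

```python
def average_distributions(tree_outputs):
    """Average a list of per-tree {category: probability} dictionaries.

    A category missing from one tree's output counts as probability 0
    for that tree.
    """
    num_trees = len(tree_outputs)
    categories = set().union(*tree_outputs)
    return {c: sum(p.get(c, 0.0) for p in tree_outputs) / num_trees
            for c in categories}
```

The predicted category for the pixel is then the one with the largest averaged probability, for example `max(avg, key=avg.get)`.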

SLIDE 9

Image Credit: www.iis.ee.ic.ac.uk/~tkkim/iccv09_tutorial

SLIDE 10

Image Credit: www.iis.ee.ic.ac.uk/~tkkim/iccv09_tutorial

SLIDE 11

Forest of 50 Trees

Image Credit: www.iis.ee.ic.ac.uk/~tkkim/iccv09_tutorial

SLIDE 12

▪ B3DO dataset with objects (vs. Shotton et al.: synthetic & real depth data)

  • bounding box ground truth (vs. pixel-level ground truth)

▪ 300~350 training images (vs. 350k~1M images)

  • 2000~3000 pixels per image

▪ Fixed and random features (uv pairs)

  • 4~16 fixed, 50~150 random (vs. 2000 random features)

▪ TreeBagger function from Matlab

  • 16 trees, 80% of the samples used per tree
  • quad core computer w/ 16GB RAM (vs. 1000-core cluster)
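For scale, 300 to 350 images at 2000 to 3000 pixels each give a pool of roughly 600,000 to 1,050,000 training pixels. The per-tree sampling (16 trees, 80% of the samples each) can be sketched as below. This is not TreeBagger itself: the helper name `bagging_indices` is hypothetical, and since the slide does not say whether samples are drawn with replacement, the sketch exposes that choice as a flag.

```python
import random


def bagging_indices(num_samples, num_trees=16, fraction=0.8,
                    replace=True, seed=0):
    """Return one list of training-sample indices per tree.

    Each tree receives round(fraction * num_samples) indices, drawn with
    or without replacement depending on the replace flag.
    """
    rng = random.Random(seed)
    per_tree = int(round(fraction * num_samples))
    bags = []
    for _ in range(num_trees):
        if replace:
            bags.append([rng.randrange(num_samples) for _ in range(per_tree)])
        else:
            bags.append(rng.sample(range(num_samples), per_tree))
    return bags
```

Each index list selects the training pixels for one tree, so the trees differ in the data they see as well as in their random splitting candidates.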

SLIDE 13

▪ Berkeley 3D Object Dataset

  • household object detection
  • 849 images (color, raw depth, smoothed depth)
  • 89 object classes

[Figure panels: Color, Raw Depth, Smoothed]

Source: Berkeley 3D Object Dataset (www.kinectdata.com)

SLIDE 14

8 categories

Source: Berkeley 3D Object Dataset (www.kinectdata.com)

SLIDE 15

▪ VOC format bounding box

  • used to create pixel-level ground truth
  • overlaps between boxes are inevitable

[Figure labels: bottle, keyboard, bowl, monitor, chair, pillow, cup, sofa]

SLIDE 16

▪ VOC format bounding box

  • used to create pixel-level ground truth
  • overlaps between boxes are inevitable

[Figure labels: bottle, keyboard, bowl, monitor, chair, pillow, cup, sofa]
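Turning bounding boxes into a per-pixel label map can be sketched as below. The slides do not say how overlapping boxes were resolved, so the rule used here (paint large boxes first so that smaller boxes win where they overlap) is purely an assumption, as are the function name `boxes_to_label_map`, the 0-based inclusive coordinates, and the `"background"` label.

```python
def boxes_to_label_map(height, width, boxes, background="background"):
    """Build a height x width grid of category labels from bounding boxes.

    boxes is a list of (name, xmin, ymin, xmax, ymax) tuples with
    inclusive, 0-based pixel coordinates. Where boxes overlap, the
    smaller box overwrites the larger one (an assumed tie-break rule).
    """
    labels = [[background] * width for _ in range(height)]

    def area(box):
        _, xmin, ymin, xmax, ymax = box
        return (xmax - xmin + 1) * (ymax - ymin + 1)

    # Largest boxes are painted first, so smaller ones end up on top.
    for name, xmin, ymin, xmax, ymax in sorted(boxes, key=area, reverse=True):
        for row in range(max(ymin, 0), min(ymax, height - 1) + 1):
            for col in range(max(xmin, 0), min(xmax, width - 1) + 1):
                labels[row][col] = name
    return labels
```

Every pixel inside a box takes the box's label even when it actually belongs to the background or another object, which is one reason box-derived ground truth is noisier than true pixel-level annotation.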

SLIDE 17

▪ Random features

  • body parts are deformable, each with unique shapes
  • find the best features from a large sample of random ones

▪ Fixed features

  • household objects are rigid with defined shapes
  • a few known features might be sufficient
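The two kinds of feature sets can be illustrated by how their (u, v) offset pairs are generated. The slides do not specify the layout of the fixed features, so the circular layout below is a hypothetical choice, as are the function names `random_uv_pairs` and `fixed_uv_pairs` and the uniform sampling of random offsets.

```python
import math
import random


def random_uv_pairs(count, max_offset, seed=0):
    """Draw `count` (u, v) pairs with each coordinate uniform in
    [-max_offset, max_offset]."""
    rng = random.Random(seed)

    def offset():
        return (rng.uniform(-max_offset, max_offset),
                rng.uniform(-max_offset, max_offset))

    return [(offset(), offset()) for _ in range(count)]


def fixed_uv_pairs(count, radius):
    """Hypothetical fixed layout: u evenly spaced on a circle of the
    given radius, v at the pixel itself."""
    pairs = []
    for k in range(count):
        angle = 2 * math.pi * k / count
        u = (radius * math.sin(angle), radius * math.cos(angle))
        pairs.append((u, (0.0, 0.0)))
    return pairs
```

For example, `random_uv_pairs(150, 60.0)` gives a set of 150 random pairs and `fixed_uv_pairs(8, 40.0)` a set of 8 fixed ones, matching the feature counts quoted elsewhere in the deck.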

SLIDE 18

[Figure panels: Color Image, Depth Image, Not Normalized, Normalized]

SLIDE 19

[Figure panels: Color Image, Depth Image, Not Normalized, Normalized]

SLIDE 20

[Figure panels: Color Image, Depth Image, Not Normalized, Normalized]

SLIDE 21

[Figure panels: Color Image, Depth Image, Not Normalized, Normalized]

SLIDE 22

SLIDE 23

[Figure panels: Ground Truth, 50 Features, 100 Features, 150 Features]

SLIDE 24

[Figure panels: Ground Truth, 50 Features, 100 Features, 150 Features]

SLIDE 25

[Figure panels: Ground Truth, 50 Features, 100 Features, 150 Features]

SLIDE 26

[Figure panels: Ground Truth, 4 Features, 8 Features, 16 Features]

SLIDE 27

[Figure panels: Ground Truth, 4 Features, 8 Features, 16 Features]

SLIDE 28

[Figure panels: Ground Truth, 4 Features, 8 Features, 16 Features]

SLIDE 29

[Figure panels: Ground Truth, 10 Pixel Meters, 40 Pixel Meters, 60 Pixel Meters]

SLIDE 30

[Figure panels: Ground Truth, 10 Pixel Meters, 40 Pixel Meters, 60 Pixel Meters]

SLIDE 31

[Figure panels: Ground Truth, 10 Pixel Meters, 40 Pixel Meters, 60 Pixel Meters]

SLIDE 32

[Figure panels: Ground Truth vs. Not Normalized, Ground Truth vs. Normalized]

SLIDE 33

[Figure panels: Ground Truth vs. Not Normalized, Ground Truth vs. Normalized]

SLIDE 34

[Figure panels: Ground Truth vs. Not Normalized, Ground Truth vs. Normalized]

SLIDE 35

SLIDE 36

[1] Microsoft Kinect SDK & Toolkit (www.microsoft.com/en-us/kinectforwindows/develop)
[2] “Real-Time Human Pose Recognition in Parts from Single Depth Images”, J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, A. Blake (CVPR 2011)
[3] “Randomized Trees for Real-Time Keypoint Recognition”, V. Lepetit, P. Lagger, P. Fua (CVPR 2005)
[4] “Boosting & Randomized Forests for Visual Recognition”, J. Shotton (www.iis.ee.ic.ac.uk/~tkkim/iccv09_tutorial)
[5] “A Category-Level 3D Object Dataset: Putting the Kinect to Work”, A. Janoch, S. Karayev, Y. Jia, J. Barron, M. Fritz, K. Saenko, T. Darrell (www.kinectdata.com)