Real-Time Human Pose Recognition in Parts from Single Depth Images


SLIDE 1

Real-Time Human Pose Recognition in Parts from Single Depth Images

Jamie Shotton, Andrew Fitzgibbon, Mat Cook, Toby Sharp, Mark Finocchio, Richard Moore, Alex Kipman, Andrew Blake CVPR 2011

PRESENTER: AHSAN ABDULLAH

SLIDE 2

PROBLEM

SLIDE 3

[Figure: depth image with labelled parts — right elbow, right hand, left shoulder, neck]

APPROACH

  • Partitioning the body into parts helps localize the joints

Shotton et al., CVPR 2011

SLIDE 4

1. capture depth image & remove bg
2. infer body parts per pixel
3. cluster pixels to hypothesize body joint positions
4. fit model & track skeleton

PIPELINE

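The pipeline's first three stages (capture & background removal, per-pixel part inference, joint hypothesis clustering) can be sketched as one function per stage. This is only an illustration under stated assumptions: all function names and the threshold are hypothetical, and the classifier and clustering are trivial stubs standing in for the decision forest and mean-shift stages described on later slides; stage 4 (fit model & track skeleton) lives in the Kinect system proper.

```python
import numpy as np

BG = 1e6  # large constant depth assigned to background pixels

def remove_background(depth, thresh=4000.0):
    """1. capture depth image & remove bg: push far pixels to a constant."""
    out = depth.astype(float).copy()
    out[out > thresh] = BG
    return out

def infer_parts(depth, classify_pixel):
    """2. infer body parts per pixel (classify_pixel stands in for the forest)."""
    rows, cols = depth.shape
    return np.array([[classify_pixel(depth, (r, c)) for c in range(cols)]
                     for r in range(rows)])

def hypothesize_joints(parts):
    """3. cluster pixels of each part into a joint hypothesis.

    Centroid stub only; the actual system uses mean shift on a 3D density.
    """
    return {int(p): tuple(np.mean(np.nonzero(parts == p), axis=1))
            for p in np.unique(parts)}
```

For example, a toy 4x4 frame whose left half is classified as part 0 and right half as part 1 yields one centroid hypothesis per part.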

Design Goals

  • Efficiency
  • Robustness
SLIDE 5

Compute P(c_i | w_i)

  • pixel i = (x, y)
  • body part c_i
  • image window w_i

Discriminative approach

  • learn classifier P(c_i | w_i) from training data

image windows move with the classifier

BODY PART CLASSIFICATION


SLIDE 6

LEARNING DATA

synthetic (train & test) real (test)


SLIDE 7

LEARNING – DATA SYNTHESIS

  • Record mocap: 500k frames distilled to 100k poses
  • Retarget to several body models
  • Render (depth, body parts) image pairs


SLIDE 8
  • Depth comparisons
  • very fast to compute

[Figure: input depth image with probe pairs x and x + Δ at several body locations]

g(J, x) = e_J(x) − e_J(x + Δ)

  • J: depth image; x: image coordinate
  • Δ: offset; e_J(x): depth at pixel x
  • g: feature response

Background pixels: depth set to a large constant.
The offset scales inversely with depth: Δ = w / e_J(x).

FEATURE SET

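A minimal NumPy sketch of this feature — a reading of the slide, not the authors' code; the array layout, the background constant, and the rounding of the scaled offset are assumptions:

```python
import numpy as np

BG = 1e6  # background pixels get a large constant depth

def depth_feature(depth, x, w):
    """g(J, x) = e_J(x) - e_J(x + Delta), with Delta = w / e_J(x).

    depth : 2-D array e_J of depths, indexed (row, col)
    x     : pixel coordinate (row, col)
    w     : base 2-D offset; dividing by the depth at x makes the probe
            spacing depth-invariant (closer body -> larger pixel offset)
    """
    r, c = x
    d_x = depth[r, c]
    dr = int(round(w[0] / d_x))  # offset scales inversely with depth
    dc = int(round(w[1] / d_x))
    pr, pc = r + dr, c + dc
    # probes that land off-image behave like background
    if 0 <= pr < depth.shape[0] and 0 <= pc < depth.shape[1]:
        d_probe = depth[pr, pc]
    else:
        d_probe = BG
    return d_x - d_probe
```

Large responses fire near depth discontinuities (e.g. silhouette boundaries), while probes into the background return strongly negative values.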

SLIDE 9

 Aggregation of decision trees

DECISION FORESTS

SLIDE 10

At node n, with Q_n the set of pixel examples (I, x) reaching it, evaluate
the test f(I, x; Δ_n) > θ_n for all pixels:

  • no  → left child l, with body-part distribution P_l(c)
  • yes → right child r, with body-part distribution P_r(c)

Take the (Δ, θ) that maximises information gain, i.e. that most reduces
the entropy of P_l(c) and P_r(c) relative to the node's own P_n(c).

[Breiman et al. 84]


TRAINING DECISION TREES
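Split selection can be sketched as follows — a toy version only: the real training samples many random (Δ, θ) candidates over hundreds of thousands of pixels, and the function names here are mine:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a body-part label set."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def info_gain(labels, responses, theta):
    """Gain of splitting pixels by the test f(I, x; Delta) > theta."""
    go = responses > theta
    left, right = labels[~go], labels[go]
    if len(left) == 0 or len(right) == 0:
        return 0.0  # degenerate split: no information gained
    n = len(labels)
    return entropy(labels) - (len(left) / n * entropy(left)
                              + len(right) / n * entropy(right))

def best_split(labels, responses, thetas):
    """Take the theta that maximises information gain."""
    gains = [info_gain(labels, responses, t) for t in thetas]
    return thetas[int(np.argmax(gains))], max(gains)
```

A threshold that cleanly separates two body-part labels achieves the maximum gain of 1 bit for a balanced two-class node.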

SLIDE 11

Toy example: distinguish left (L) and right (R) sides of the body.

An image window centred at x is passed down the tree. The root tests
f(I, x; Δ_1) > θ_1: "no" reaches a leaf with its distribution P(c) over
{L, R}; "yes" continues to the test f(I, x; Δ_2) > θ_2, whose "no" and
"yes" branches reach leaves with their own P(c) over {L, R}.


DECISION TREE CLASSIFICATION

SLIDE 12

  • Each tree is trained on a different random subset of images
  • “bagging” helps avoid over-fitting
  • Average the tree posteriors

[Amit & Geman 97] [Breiman 01] [Geurts et al. 06]

tree 1 … tree T

Each tree t maps the input (J, x) to a posterior P_t(c) over body parts;
the forest averages them:

P(c | J, x) = (1/T) Σ_{t=1}^{T} P_t(c | J, x)


DECISION FOREST CLASSIFIER
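The averaging step is essentially one line — a sketch in which `tree_posteriors` stands in for the per-tree distributions P_1(c) … P_T(c) at a given pixel:

```python
import numpy as np

def forest_posterior(tree_posteriors):
    """Average the per-tree posteriors: P(c|J,x) = (1/T) * sum_t P_t(c|J,x).

    tree_posteriors : sequence of T rows, each a distribution over parts c.
    """
    return np.mean(np.asarray(tree_posteriors), axis=0)

def classify(tree_posteriors):
    """Most likely body part under the averaged posterior."""
    return int(np.argmax(forest_posterior(tree_posteriors)))
```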

SLIDE 13

ground truth

1 tree 3 trees 6 trees

inferred body parts (most likely)

[Plot: average per-class accuracy (≈40–55%) vs. number of trees (1–6)]


NUMBER OF TREES

SLIDE 14

[Plot: average per-class accuracy (≈30–65%) vs. depth of trees (8–20)]

[Plot: average per-class accuracy vs. depth of trees (5–15), synthetic vs. real test data]


TREE DEPTH

SLIDE 15
  • Define a 3D world-space density
  • Mean shift for mode detection

Body parts to joint hypotheses (pipeline step 3: hypothesize body joints)

f_c(x̂) ∝ Σ_i w_ic · exp(−‖(x̂ − x̂_i) / b_c‖²)

  • x̂_i: 3D world coordinate of pixel i
  • b_c: per-part bandwidth
  • w_ic: pixel weight, combining the inferred probability and the depth at
    the i-th pixel — w_ic = P(c | J, x_i) · e_J(x_i)²
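Weighted mean shift over a density of this form can be sketched as follows — a minimal version in which the Gaussian kernel, the fixed iteration cap, and the convergence test are my assumptions, not the system's exact settings:

```python
import numpy as np

def mean_shift_mode(points, weights, bandwidth, start, iters=50):
    """Climb a weighted Gaussian kernel density to a local mode.

    points  : (N, 3) 3D world coordinates of pixels for one body part
    weights : (N,) pixel weights w_i (probability times squared depth)
    """
    x = np.asarray(start, dtype=float)
    for _ in range(iters):
        # kernel responses of all points at the current estimate
        d2 = np.sum((points - x) ** 2, axis=1) / bandwidth ** 2
        k = weights * np.exp(-d2)
        # shift to the kernel-weighted mean of the points
        x_new = (k[:, None] * points).sum(axis=0) / k.sum()
        if np.allclose(x_new, x):
            break
        x = x_new
    return x
```

Starting near a cluster of high-weight pixels, the estimate converges to that cluster's mode and ignores distant pixels, which is what makes the hypotheses robust to a few misclassified pixels elsewhere on the body.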

SLIDE 16

front view top view side view

input depth inferred body parts inferred joint positions


No tracking or smoothing

SLIDE 17

front view top view side view

input depth inferred body parts inferred joint positions


No tracking or smoothing

SLIDE 18

[Bar chart: average precision (0.0–1.0) per joint — head, neck, L/R shoulder, elbow, wrist, hand, knee, ankle, foot — plus mean AP]


JOINT PREDICTION ACCURACY

SLIDE 19

[Bar chart: average precision (0.0–1.0) per joint, comparing joint prediction from ground-truth body parts vs. from inferred body parts]

JOINT PREDICTION ACCURACY

SLIDE 20
  • No temporal information
  • frame-by-frame
  • Very fast
  • simple depth image feature
  • parallel decision forest classifier


ANALYSIS

SLIDE 21

Uses…

  • 3D joint hypotheses
  • kinematic constraints
  • temporal coherence

… to give

  • full skeleton
  • higher accuracy
  • invisible joints
  • multi-player
  • pipeline step 4: track skeleton (building on steps 1–3)

KINECT SYSTEM

SLIDE 22
  • Frame-by-frame gives robustness
  • Body parts representation for efficiency
  • Fast, simple machine learning
  • Significant engineering to scale to a massive, varied training data set


SUMMARY

SLIDE 23

QUESTIONS?