SLIDE 1

Lifting from the Deep: Convolutional 3D Pose Estimation from a Single Image

Denis Tomè, Chris Russell, Lourdes Agapito

SLIDE 2

We introduce a novel approach to solve the problem of 3D human pose estimation from a single RGB image.

[Figure: input image → output 3D pose]

SLIDE 3

Our method reasons jointly about 2D joint estimation and 3D pose reconstruction to improve both tasks.

SLIDE 4

Our approach

  • First, we learn a probabilistic model of 3D human pose from 3D mocap data.

  • We integrate this model within a novel end-to-end CNN architecture for joint 2D and 3D human pose estimation.

  • Our method achieves state-of-the-art results on the Human3.6M dataset.

This model lifts 2D joint positions (landmarks) into 3D.

[Diagram: 2D landmarks → probabilistic 3D pose model → 3D pose]
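The slides do not give the form of the probabilistic 3D pose model. The sketch below assumes a single linear Gaussian (PCA-style) pose model, a known camera rotation and an orthographic camera, and shows how such a model can lift 2D landmarks into 3D: the MAP pose coefficients solve a regularized least-squares problem. The function and parameter names (`lift_to_3d`, `mean_pose`, `basis`, `variances`, `noise_var`) are hypothetical.

```python
import numpy as np


def lift_to_3d(landmarks_2d, mean_pose, basis, variances, noise_var=1.0, rotation=None):
    """MAP estimate of a 3D pose from 2D landmarks under a linear Gaussian pose model.

    Hypothetical model (an assumption, not taken from the slides):
      X = mean_pose + sum_k a_k * basis[k],   a_k ~ N(0, variances[k])
      landmarks_2d = first two rows of (rotation @ X) + Gaussian noise (orthographic camera)

    Shapes: landmarks_2d (2, J), mean_pose (3, J), basis (K, 3, J), variances (K,).
    """
    if rotation is None:
        rotation = np.eye(3)
    proj = rotation[:2]                                    # (2, 3) orthographic projection
    num_basis = basis.shape[0]
    # Project the mean shape and every basis shape into the image plane.
    proj_mean = proj @ mean_pose                           # (2, J)
    proj_basis = np.einsum("ij,kjl->kil", proj, basis)     # (K, 2, J)
    design = proj_basis.reshape(num_basis, -1).T           # (2J, K)
    residual = (landmarks_2d - proj_mean).ravel()          # (2J,)
    # Gaussian likelihood + Gaussian prior on the coefficients => ridge normal equations.
    lhs = design.T @ design / noise_var + np.diag(1.0 / variances)
    rhs = design.T @ residual / noise_var
    coeffs = np.linalg.solve(lhs, rhs)                     # (K,)
    # Rebuild the full 3D pose, including the unobserved depth row.
    return mean_pose + np.tensordot(coeffs, basis, axes=1)  # (3, J)
```

Only x and y are observed; the depth row of the returned pose is filled in by the learned basis, which is what lets a pose model trained on 3D mocap data "lift" 2D landmarks.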

SLIDE 5

Our approach

  • Next, we train a novel end-to-end multi-stage CNN for 2D landmark estimation.

[Diagram: Stage 1 → Stage 2 → … → Stage 6]

SLIDE 6

Our approach

  • Next, we train a novel end-to-end multi-stage CNN for 2D landmark estimation.

  • Each stage includes a new layer, based on our probabilistic 3D model of human pose, that enforces 3D pose constraints.

[Diagram: Stage 1 → Stage 2 → … → Stage 6]

SLIDE 7

Detailed architecture

SLIDE 8

[Diagram: Feature extraction (convolutional layers "C": one 5 × 5 and three 9 × 9; three "P ×2" pooling layers) → 2D joint prediction (convolutional layers: two 1 × 1 and one 9 × 9)]
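As a rough sanity check on the diagram, the helper below computes the spatial size of the belief maps. It assumes (hypothetically) that every convolution uses stride 1 with "same" padding and that each "P ×2" layer halves the resolution with floor rounding; under those assumptions only the three pooling layers change the size, so the maps come out at about 1/8 of the input side length. The layer order in `FEATURE_EXTRACTION` is a guess, since it cannot be recovered from the slide, but it does not affect the result.

```python
# Layer specs as (type, size): ("C", k) = k x k convolution, ("P", s) = pooling with stride s.
# Hypothetical ordering; only the layer counts and kernel sizes come from the slide.
FEATURE_EXTRACTION = [("C", 9), ("P", 2), ("C", 9), ("P", 2), ("C", 9), ("P", 2), ("C", 5)]
JOINT_PREDICTION = [("C", 9), ("C", 1), ("C", 1)]


def belief_map_size(input_size, layers):
    """Side length of the output maps for a square input of side `input_size`."""
    size = input_size
    for kind, value in layers:
        if kind == "P":
            size //= value  # stride-s pooling divides the resolution (floor rounding)
        elif kind != "C":
            raise ValueError(f"unknown layer type: {kind!r}")
        # "same"-padded, stride-1 convolutions leave the size unchanged
    return size


print(belief_map_size(368, FEATURE_EXTRACTION + JOINT_PREDICTION))  # 46
```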

SLIDE 9

[Diagram: feature extraction → 2D joint prediction → belief maps]

For each landmark, a 2D belief map is generated. It encodes how confident the architecture is that the landmark occurs at each pixel (u, v) of the input image.
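A common way to read 2D landmark positions off such belief maps is to take the location of each map's maximum. The sketch below does that with NumPy; treating (u, v) as (column, row) and reporting the peak value as a confidence are assumptions, and `landmarks_from_belief_maps` is a hypothetical helper.

```python
import numpy as np


def landmarks_from_belief_maps(belief_maps):
    """Return the (u, v) pixel of maximum belief and its value for each landmark.

    belief_maps: array of shape (J, H, W), one map per landmark.
    Returns coords of shape (J, 2) as (u, v) = (column, row) and confidences of shape (J,).
    """
    num_joints, height, width = belief_maps.shape
    flat = belief_maps.reshape(num_joints, -1)
    best = flat.argmax(axis=1)                     # flat index of the peak in each map
    rows, cols = np.unravel_index(best, (height, width))
    coords = np.stack([cols, rows], axis=1)        # u = column, v = row
    confidences = flat[np.arange(num_joints), best]
    return coords, confidences
```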

SLIDE 10

[Diagram: feature extraction → 2D joint prediction → belief maps → probabilistic 3D pose model → 3D pose]

Our pre-learned probabilistic model lifts the 2D landmarks into 3D, injecting 3D pose information into the network.

SLIDE 11

[Diagram: feature extraction → 2D joint prediction → belief maps → probabilistic 3D pose model → 3D pose → belief maps]

The 3D pose is used to generate a new set of 2D belief maps.
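The slides do not say how the new belief maps are drawn. One plausible reading, sketched below under stated assumptions, is to project the 3D pose into the image and place an isotropic Gaussian at each projected joint. The orthographic projection, the assumption that the pose's x/y coordinates are already in pixel units, the `sigma` value and the name `render_belief_maps` are all hypothetical.

```python
import numpy as np


def render_belief_maps(pose_3d, height, width, sigma=2.0, rotation=None):
    """Project a 3D pose of shape (3, J) and draw one Gaussian belief map per joint.

    Hypothetical choices: orthographic projection after an optional rotation,
    projected x/y already in pixels, isotropic Gaussian of width `sigma` pixels.
    Returns an array of shape (J, height, width) with value 1.0 at each projected joint.
    """
    if rotation is None:
        rotation = np.eye(3)
    us, vs = (rotation @ pose_3d)[:2]              # projected u (x) and v (y) per joint
    grid_v, grid_u = np.mgrid[0:height, 0:width]   # row and column index of every pixel
    # Squared pixel distance from every joint to every pixel, shape (J, H, W).
    d2 = (grid_u[None] - us[:, None, None]) ** 2 + (grid_v[None] - vs[:, None, None]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))
```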

SLIDE 12

[Diagram: feature extraction → 2D joint prediction → belief maps → probabilistic 3D pose model → 3D pose → belief maps → 2D fusion]

The two sets of belief maps, from the 2D joint prediction and from the 3D pose, are fused together.
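The fusion rule is not given on the slide. As an illustration only, the sketch assumes a per-landmark convex combination of the CNN belief maps and the maps generated from the 3D pose; `fuse_belief_maps` and its `weights` argument are hypothetical, and in a trained network such weights would typically be learned.

```python
import numpy as np


def fuse_belief_maps(cnn_maps, projected_maps, weights):
    """Fuse two stacks of belief maps of shape (J, H, W) with one weight per landmark.

    Assumed form (not specified on the slide):
        fused_j = w_j * cnn_j + (1 - w_j) * projected_j,   w_j clipped to [0, 1]
    """
    w = np.clip(np.asarray(weights, dtype=float), 0.0, 1.0)[:, None, None]
    return w * cnn_maps + (1.0 - w) * projected_maps
```

With `w_j = 1` a landmark keeps the purely 2D prediction; with `w_j = 0` it relies entirely on the maps generated from the 3D pose.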

SLIDE 13

[Diagram: feature extraction → 2D joint prediction → belief maps → probabilistic 3D pose model → 3D pose → belief maps → 2D fusion]

SLIDE 14

[Diagram: stage t = 1: feature extraction → 2D joint prediction → belief maps → probabilistic 3D pose model → 3D pose → 2D fusion → belief maps]

SLIDE 15

[Diagram: stages t = 1, 2, 3, …, 6; each later stage uses the shared feature extraction, 2D joint prediction, the probabilistic 3D pose model and 2D fusion, producing belief maps and a 3D pose]

The 2D belief maps from each stage are used as input to the next stage, and their accuracy increases progressively through the stages.
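The stage-to-stage data flow described above can be written as a short loop. Every callable passed in (`predict_2d`, `lift`, `project`, `fuse`) is a hypothetical stand-in for the corresponding block in the diagram, so the sketch only fixes the order of operations, not their implementation; passing the fused maps on to the next stage is an assumption.

```python
def run_stages(features, predict_2d, lift, project, fuse, num_stages=6):
    """Chain the per-stage steps; every stage reuses the same shared features.

    Hypothetical callables:
      predict_2d(features, prev_maps) -> belief maps (prev_maps is None at stage 1)
      lift(maps) -> 3D pose;  project(pose_3d) -> belief maps;  fuse(a, b) -> belief maps
    Returns the fused belief maps of every stage and the 3D pose lifted from the last one.
    """
    prev_maps = None
    stage_outputs = []
    for _ in range(num_stages):
        maps = predict_2d(features, prev_maps)   # 2D joint prediction
        pose_3d = lift(maps)                     # probabilistic 3D pose model
        fused = fuse(maps, project(pose_3d))     # 2D fusion
        stage_outputs.append(fused)
        prev_maps = fused                        # becomes the input of the next stage
    final_pose = lift(prev_maps)                 # final lifting of the last stage's maps
    return stage_outputs, final_pose
```

A toy run with scalars in place of maps shows the ordering: with `predict_2d = lambda f, prev: f if prev is None else prev + f`, `lift = lambda m: 10 * m`, `project = lambda p: p / 10` and an averaging `fuse`, three stages yield `[1.0, 2.0, 3.0]` and a final pose of `30.0` for `features = 1.0`.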

SLIDE 16

[Diagram: stages t = 1, 2, 3, …, 6 trained with end-to-end learning; the output belief maps of the final stage pass through the probabilistic 3D pose model (final lifting) to give the output 3D pose]
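The slide says the stages are trained end-to-end but does not give the loss. A common choice for multi-stage pose CNNs is intermediate supervision: compare every stage's belief maps with the ground-truth maps and sum the errors. The sketch assumes exactly that, with a mean squared error; `multistage_loss` is hypothetical, and in practice the loss would be computed in a deep-learning framework so gradients can flow back through all stages.

```python
import numpy as np


def multistage_loss(stage_maps, target_maps):
    """Sum over stages of the mean squared error against ground-truth maps (J, H, W).

    Assumption: every stage is supervised with the same target belief maps.
    """
    return sum(float(np.mean((maps - target_maps) ** 2)) for maps in stage_maps)
```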

SLIDE 17

Our approach achieves state-of-the-art results

  • on the Human3.6M dataset

SLIDE 18

Example results on the Human3.6M dataset
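The slides do not state which error metric the Human3.6M comparison uses. 3D pose accuracy on Human3.6M is commonly reported as mean per-joint position error (MPJPE), the average Euclidean distance between predicted and ground-truth joints. The sketch below shows only that computation, for poses stored as (3, J) arrays; any alignment or normalization applied before it is left out, and `mpjpe` is a hypothetical helper name.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error between two poses of shape (3, J).

    Both poses must use the same units (e.g. millimetres); the result is in those units.
    """
    return float(np.mean(np.linalg.norm(pred - gt, axis=0)))
```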