[PPT] - DeepCap: Monocular Human Performance Capture Using Weak Supervision PowerPoint Presentation

SLIDE 1

Marc Habermann

DeepCap: Monocular Human Performance Capture Using Weak Supervision

Marc Habermann, Weipeng Xu, Michael Zollhoefer, Gerard Pons-Moll, and Christian Theobalt

SLIDE 2

DeepCap

Human performance capture from a monocular camera

SLIDE 3

Challenges

- Monocular setting is inherently ambiguous
- High-dimensional problem
  – Pose and surface deformation

Source: https://www.fiylo.de/

SLIDE 4

Related Work

- Capture using parametric models

Kanazawa et al. 2018; Xiang et al. 2018

Metaxas et al. 1993, Plaenkers et al. 2001, Sminchisescu et al. 2003, Sigal et al. 2004, Joo et al. 2018, Pavlakos et al. 2018, Kanazawa et al. 2019, Pavlakos et al. 2019, …

SLIDE 5

Related Work

- Monocular template-free capture

Zheng et al. 2019; Saito et al. 2019

Huang et al. 2018, Varol et al. 2018, Natsume et al. 2019, …

SLIDE 6

Related Work

- Template-based capture

Habermann et al. 2019; Xu et al. 2018

Carranza et al. 2003, Bray et al. 2006, Starck et al. 2007, De Aguiar et al. 2008, Brox et al. 2010, Cagniart et al. 2010, …

SLIDE 7

DeepCap

- Learning-based approach
- Pose + surface deformation
- Weak multi-view supervision

SLIDE 8

Personalized Character Model

- Template mesh
- Embedded graph
- Skeleton

Fully automatic

SLIDE 9

Inference Time

SLIDE 10

Direct Supervision?

Difficult to obtain:
- Ground-truth 3D pose
- Ground-truth 3D surface

SLIDE 11

Weak Supervision

- Multi-view 2D detections
- Multi-view foreground masks

Differentiable 3D to 2D modules

SLIDE 12

Training Data – Weak Multi-View

- Calibrated multi-view images
- 2D keypoints: OpenPose (Cao et al. 2019)
- Foreground mask: color keying

SLIDE 13

Pipeline

SLIDE 14

PoseNet

[Diagram labels: Segmented Input Image; PoseNet; Rotation $\alpha$; Joint Angles $\theta$; Kinematics Layer; Root-Relative Landmarks; Global Alignment Layer; Global Landmarks $P$; Joint Detections $p_{c,m}$; Multi-view Sparse Keypoint Loss; Pose Prior Loss]

PoseNet:
- Root rotation $\alpha \in \mathbb{R}^{3}$
- Joint angles $\theta \in \mathbb{R}^{27}$

SLIDE 15

PoseNet

Kinematics Layer
- Skeletal pose function $f_m(\alpha, \theta) : \mathbb{R}^{30} \to \mathbb{R}^{3}$ per landmark $m$
- Camera and root relative 3D landmark positions $P_{c',m}$

SLIDE 16

PoseNet

Global Alignment Layer
- Rigid transform for landmark $P_{c',m}$
- Camera and root relative 3D space → Global 3D space

$P_m = R_{c'}^{T} P_{c',m} + t$

- $R_{c'}^{T}$: inverse extrinsic rotation of the input camera $c'$
- $t$: global translation
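The rigid transform on this slide can be sketched in a few lines of plain Python. This is an illustrative sketch only: the function names (`transpose`, `mat_vec`, `global_alignment`) and the toy values are assumptions, not taken from the talk. It relies on the fact that the inverse of a rotation matrix is its transpose.

```python
def transpose(R):
    """Transpose of a 3x3 matrix given as nested lists."""
    return [[R[j][i] for j in range(3)] for i in range(3)]

def mat_vec(R, v):
    """3x3 matrix times 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

def global_alignment(R_cam, landmarks_cam, t):
    """Apply P_m = R^T P_{c',m} + t to every camera- and root-relative landmark."""
    R_inv = transpose(R_cam)  # inverse of a rotation matrix is its transpose
    return [[a + b for a, b in zip(mat_vec(R_inv, p), t)] for p in landmarks_cam]

# Toy example (hypothetical values): extrinsic rotation of 90 degrees about the z-axis.
R_cam = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
landmarks_cam = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
t = [0.0, 0.0, 3.0]
global_landmarks = global_alignment(R_cam, landmarks_cam, t)
```

In this toy case the two landmarks land at (0, -1, 3) and (2, 0, 3) in global space.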

SLIDE 17

PoseNet

Multi-view Sparse Keypoint Loss
- Projecting ($\pi$) 3D landmark $P_m$ into camera view $c$
- Comparing to 2D joint detection $p_{c,m}$

$\mathcal{L}_{kp}(P) = \sum_{c} \sum_{m} \left\| \pi_c(P_m) - p_{c,m} \right\|^{2}$
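A minimal sketch of the multi-view sparse keypoint loss follows. It assumes a simple pinhole camera for the projection $\pi_c$. The camera dictionary layout (`R`, `t`, `f`, `cx`, `cy`) and the function names are hypothetical, and the per-landmark weights a real training setup might use are omitted.

```python
def project(cam, P):
    """Assumed pinhole projection pi_c of a global 3D point P into camera `cam`."""
    R, t, f, cx, cy = cam["R"], cam["t"], cam["f"], cam["cx"], cam["cy"]
    # World-to-camera transform: X = R P + t
    X = [sum(R[i][j] * P[j] for j in range(3)) + t[i] for i in range(3)]
    # Perspective division followed by the intrinsics
    return [f * X[0] / X[2] + cx, f * X[1] / X[2] + cy]

def keypoint_loss(cameras, landmarks, detections):
    """Sum over cameras c and landmarks m of ||pi_c(P_m) - p_{c,m}||^2."""
    loss = 0.0
    for c, cam in enumerate(cameras):
        for m, P in enumerate(landmarks):
            u, v = project(cam, P)
            du = u - detections[c][m][0]
            dv = v - detections[c][m][1]
            loss += du * du + dv * dv
    return loss

# Toy example (hypothetical values): two cameras looking down the z-axis.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cameras = [
    {"R": identity, "t": [0.0, 0.0, 0.0], "f": 100.0, "cx": 0.0, "cy": 0.0},
    {"R": identity, "t": [0.0, 0.0, 2.0], "f": 100.0, "cx": 0.0, "cy": 0.0},
]
landmarks = [[0.0, 0.0, 2.0], [1.0, 0.0, 2.0]]
detections = [
    [[0.0, 0.0], [53.0, 4.0]],  # camera 0: second detection is off by (3, 4) pixels
    [[0.0, 1.0], [25.0, 0.0]],  # camera 1: first detection is off by 1 pixel
]
loss = keypoint_loss(cameras, landmarks, detections)
```

Because every term only needs 2D detections and calibrated cameras, no 3D ground truth enters the loss, which is the point of the weak supervision.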

SLIDE 18

DefNet

[Diagram labels: Segmented Input Image; PoseNet; Rotation $\alpha$; Joint Angles $\theta$; DefNet; Rotation $A$; Translation $T$; Deformation Layer; Root-Relative Landmarks; Root-Relative Vertices; Global Alignment Layer; Global Landmarks $M$; Global Vertices $V$; Joint Detections $p_{c,m}$; Foreground Masks; Multi-view Sparse Keypoint Graph Loss; Multi-view Non-rigid Silhouette Loss; ARAP Loss]

DefNet:
- Regresses embedded deformation* in canonical pose
- Per node $k$: rotation angles $A_k$ and translation $T_k$

*(Sumner et al. 2007, Sorkine et al. 2007)
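As a rough illustration of embedded deformation in the sense of Sumner et al. 2007, the sketch below blends per-node rigid transforms to move one vertex. The function name, the use of ready-made rotation matrices (instead of the regressed angles) and the toy weights are assumptions made for this example.

```python
def embedded_deformation(vertex, nodes, rotations, translations, weights):
    """Deform one vertex as sum_k w_k * (R_k (v - g_k) + g_k + t_k)."""
    out = [0.0, 0.0, 0.0]
    for g, R, t, w in zip(nodes, rotations, translations, weights):
        d = [vertex[i] - g[i] for i in range(3)]  # vertex relative to graph node g_k
        rotated = [sum(R[i][j] * d[j] for j in range(3)) for i in range(3)]
        for i in range(3):
            out[i] += w * (rotated[i] + g[i] + t[i])
    return out

# Toy example (hypothetical values): two graph nodes, no rotation, different translations.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
nodes = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
rotations = [identity, identity]
translations = [[0.0, 1.0, 0.0], [0.0, 3.0, 0.0]]
weights = [0.5, 0.5]  # assumed to be normalized so they sum to 1
deformed = embedded_deformation([1.0, 0.0, 0.0], nodes, rotations, translations, weights)
```

The vertex halfway between the two nodes moves by the average of their translations, ending at (1, 2, 0).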

SLIDE 19

DefNet

Deformation Layer
- Pose, deformation
- Embedded deformation; Dual Quaternion Skinning (Kavan et al. 2007)
- Posed and deformed landmarks $M_{c',m}$ and vertices $V_{c',i}$

SLIDE 20

DefNet

Global Alignment Layer
- Rigid transform for landmark $m$ and vertex $i$
- Camera and root relative 3D landmark $M_{c',m}$ and vertex $V_{c',i}$ → Global 3D landmark $M_m$ and vertex $V_i$

SLIDE 21

DefNet

Multi-view Sparse Keypoint Graph Loss

$\mathcal{L}_{kpg}(M) = \sum_{c} \sum_{m} \left\| \pi_c(M_m) - p_{c,m} \right\|^{2}$

- $M_m$: global 3D landmark

SLIDE 22

DefNet

Non-rigid Silhouette Loss

$\mathcal{L}_{sil}(V) = \sum_{c} \sum_{i \in \mathcal{B}_c} \left\| D_c\left(\pi_c(V_i)\right) \right\|^{2}$

- $D_c$: distance transform image
- $\mathcal{B}_c$: set of boundary vertices for camera $c$
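The silhouette term can be sketched as a lookup into a distance transform image. The example below is a sketch under assumptions: the distance transform is computed by brute force as the Euclidean distance to the nearest foreground pixel, the boundary vertices are assumed to be already projected to pixel coordinates, and the lookup uses nearest-pixel rounding. All function names are hypothetical.

```python
import math

def distance_transform(mask):
    """Brute-force Euclidean distance from each pixel to the nearest foreground pixel."""
    fg = [(y, x) for y, row in enumerate(mask) for x, val in enumerate(row) if val]
    return [[min(math.hypot(y - fy, x - fx) for fy, fx in fg)
             for x in range(len(mask[0]))] for y in range(len(mask))]

def silhouette_loss(dist_images, projected_boundary):
    """Sum over cameras c and boundary vertices i of D_c(pi_c(V_i))^2."""
    loss = 0.0
    for D, points in zip(dist_images, projected_boundary):
        for u, v in points:  # (u, v) = (column, row) in pixels
            # Nearest-pixel lookup, clamped to the image bounds
            x = min(max(int(round(u)), 0), len(D[0]) - 1)
            y = min(max(int(round(v)), 0), len(D) - 1)
            loss += D[y][x] ** 2
    return loss

# Toy example (hypothetical values): one camera with a 3x4 foreground mask.
mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
D = distance_transform(mask)
# Three projected boundary vertices: on the silhouette, 1 px away, sqrt(2) px away.
loss = silhouette_loss([D], [[(1.0, 1.0), (3.0, 1.0), (0.0, 0.0)]])
```

A boundary vertex that projects onto the silhouette contributes nothing, while vertices that project away from it are penalized by their squared distance.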

SLIDE 23

Qualitative Evaluation

[Figure panels: Ours; Habermann et al. 2019. Views: overlay on input image; overlay on reference view]

SLIDE 24

Qualitative Evaluation

[Figure panels: Ours; Zheng et al. 2019; Saito et al. 2019. Views: overlay on input image; 3D view]

SLIDE 25

Quantitative Evaluation

Surface reconstruction accuracy

Method (on S4)                     Multi-view IoU* (in %)
HMR (Kanazawa et al. 2018)         65.1
HMMR (Kanazawa et al. 2019)        63.79
LiveCap (Habermann et al. 2019)    59.96
Ours                               82.53

*IoU = Intersection over Union

Legend: Person-specific; Person-unspecific
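For readers unfamiliar with the metric, here is a minimal sketch of a multi-view IoU computation on binary masks. Averaging the per-view IoU and the convention for empty masks are assumptions of this example, and the function names are hypothetical. The talk does not spell out the exact evaluation protocol.

```python
def iou(pred, gt):
    """Intersection over Union of two binary masks given as nested lists."""
    pairs = [(p, g) for pred_row, gt_row in zip(pred, gt) for p, g in zip(pred_row, gt_row)]
    inter = sum(1 for p, g in pairs if p and g)
    union = sum(1 for p, g in pairs if p or g)
    return inter / union if union else 1.0  # assumed convention: two empty masks match

def multi_view_iou(preds, gts):
    """Mean IoU over all views, in percent (assumed aggregation)."""
    scores = [iou(p, g) for p, g in zip(preds, gts)]
    return 100.0 * sum(scores) / len(scores)

# Toy example (hypothetical masks): view 0 overlaps by half, view 1 matches exactly.
preds = [[[1, 1], [0, 0]], [[1, 1], [1, 1]]]
gts = [[[1, 0], [0, 0]], [[1, 1], [1, 1]]]
score = multi_view_iou(preds, gts)
```

Here the two views score 50% and 100%, giving a multi-view IoU of 75%.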

SLIDE 26

More results

SLIDE 27

Thank you!

Weipeng Xu, Michael Zollhoefer, Gerard Pons-Moll, Christian Theobalt, Marc Habermann