Fields of Parts & Friends (peter.gehler.net): PowerPoint PPT Presentation



slide-1
SLIDE 1

Fields of Parts & Friends

peter.gehler.net

slide-2
SLIDE 2
slide-3
SLIDE 3

Detection + Geometry


slide-4
SLIDE 4


slide-5
SLIDE 5

Human Pose Estimation

Observation → Predict: Bounding Boxes

or

Observation → Predict: Joint Locations
slide-6
SLIDE 6

Human Pose Estimation

Observation Desired Output

  • P. Felzenszwalb, D. Huttenlocher, Pictorial Structures for Object Recognition, International Journal of Computer Vision (IJCV), 2005

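[Figure: factor graph over the part variables Y_top, Y_head, Y_torso, Y_rarm, Y_rhnd, Y_rleg, Y_rfoot, Y_lfoot, Y_lleg, Y_larm, Y_lhnd and the image observation X, with unary factors such as F^(1)_top and pairwise factors such as F^(2)_top,head]

$$p(y \mid I, w) \propto \sum_{p} \psi(y_p, I; w) + \sum_{p \sim p'} \psi(y_p, y_{p'}; w)$$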

slide-7
SLIDE 7

Pictorial Structures

[Figure: factor graph over the part variables Y_top, Y_head, Y_torso, Y_rarm, Y_rhnd, Y_rleg, Y_rfoot, Y_lfoot, Y_lleg, Y_larm, Y_lhnd and the image observation X, with factors F^(1)_top and F^(2)_top,head; pairwise factors are parametrized by displacement (∆x, ∆y) and rotation θ]

$$p(y \mid I, w) \propto \sum_{p} \psi(y_p; I, w) + \sum_{p \sim p'} \psi(y_p, y_{p'}; I, w)$$

[Johnson&Everingham, BMVC’10], [Yang&Ramanan, CVPR’11], [Eichner&Ferrari, ACCV’12], [Sapp et al., ECCV’10], [Tran&Forsyth, ECCV’10], [Wang et al., CVPR’11], [Agarwal&Triggs, PAMI’02], [Urtasun&Darrell, ICCV’09], [Ionescu et al., ICCV’11]
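
The score on this slide can be evaluated for a candidate pose roughly as in the minimal sketch below. It is not the authors' implementation: the part names, the hand-written unary tables, the single tree edge, and the quadratic displacement penalty with weight `stiffness` are all assumptions made for illustration, and the rotation θ is left out.

```python
# Hypothetical toy example: score a pose under a tree-structured model.
# Unary scores would normally come from part detectors; here they are
# hand-written tables (an assumption for illustration).

def pairwise_score(y_p, y_q, offset, stiffness=0.1):
    """Quadratic displacement penalty around an expected offset (dx, dy)."""
    dx = y_q[0] - y_p[0] - offset[0]
    dy = y_q[1] - y_p[1] - offset[1]
    return -stiffness * (dx * dx + dy * dy)

def pose_score(pose, unary, edges):
    """Sum of unary terms over parts plus pairwise terms over tree edges."""
    total = sum(unary[p][pose[p]] for p in pose)
    for (p, q), offset in edges.items():
        total += pairwise_score(pose[p], pose[q], offset)
    return total

unary = {
    "head": {(5, 2): 1.5, (5, 3): 0.5},
    "torso": {(5, 6): 2.0, (9, 6): 2.5},
}
edges = {("head", "torso"): (0, 4)}  # torso expected 4 cells below the head

# The second pose has the stronger torso detection but violates the geometry.
good = pose_score({"head": (5, 2), "torso": (5, 6)}, unary, edges)
bad = pose_score({"head": (5, 2), "torso": (9, 6)}, unary, edges)
print(good, bad)
```

The pairwise term is what lets a geometrically consistent pose outrank a pose built from individually stronger detections.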

slide-8
SLIDE 8

Extensions

  • Since pictorial structures were introduced, many extensions have been proposed:

  • loopy …
  • mixture …
  • holistic approaches…

[Johnson&Everingham, BMVC’10] [Yang&Ramanan, CVPR’11] [Eichner&Ferrari, ACCV’12] [Sapp et al., ECCV’10] [Tran&Forsyth, ECCV’10] [Wang et al., CVPR’11] [Agarwal&Triggs, PAMI’02] [Urtasun&Darrell, ICCV’09] [Ionescu et al., ICCV’11] …

slide-9
SLIDE 9

Poselet Conditioned Pictorial Structures

[Figure: model overview in four stages labelled I to IV. Panel labels: poselets, pairwise conditioning, kinematic tree, extra unary factors, appearance, position/rotation, result.]

  • L. Pishchulin, M. Andriluka, P. Gehler, B. Schiele, Poselet Conditioned Pictorial Structures, CVPR 2013

slide-10
SLIDE 10
  • “Clusters” of more parts
  • Capture non-adjacent part dependencies

Poselets

[Figure: top detections and poselet cluster medoids]

  • L. Bourdev, J. Malik, Poselets: Body Part Detectors Trained Using 3D Human Pose Annotations, ICCV 2009
slide-11
SLIDE 11

Conditioning Pairwise Terms

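[Figure: possible pairwise factors, parametrized by displacement (∆x, ∆y) and rotation θ]

[Figure: possible body models, with variables Y_head and Y_torso, observation X, unary factors ψ(y_head, x) and ψ(y_torso, x), and pairwise factor ψ(y_head, y_torso)]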

slide-12
SLIDE 12

Results

[Figure: qualitative comparison. Panels: top poselet detections, cluster medoids, poselet conditioned prediction and result, baseline PS generic tree and result.]

slide-13
SLIDE 13

Results on Leeds Sports Poses

1000 training, 1000 testing images

Observer-centric annotation [Eichner&Ferrari, ACCV’12]

Error measure: PCP (percentage of correct parts)

  • S. Johnson, M. Everingham, Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation, BMVC 2010
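
PCP is usually computed per part: a predicted part counts as correct when both of its endpoints lie within a fraction of the ground-truth part length from the corresponding ground-truth endpoints. The sketch below assumes the common threshold of 0.5 and this "both endpoints" variant. The slide does not state which protocol was used for the numbers in this talk, and the function names and toy coordinates are hypothetical.

```python
import math

def part_correct(pred, truth, alpha=0.5):
    """pred and truth are ((x1, y1), (x2, y2)) endpoint pairs of one part."""
    length = math.dist(truth[0], truth[1])
    return all(math.dist(p, t) <= alpha * length for p, t in zip(pred, truth))

def pcp(predictions, ground_truth, alpha=0.5):
    """Percentage of parts judged correct over a list of parts."""
    hits = sum(part_correct(p, t, alpha) for p, t in zip(predictions, ground_truth))
    return 100.0 * hits / len(ground_truth)

# Two parts of length 10: the first prediction is close at both endpoints,
# the second misses one endpoint by 8 (more than half the part length).
truth = [((0, 0), (0, 10)), ((0, 10), (0, 20))]
pred = [((1, 0), (0, 12)), ((0, 10), (8, 20))]
score = pcp(pred, truth)
print(score)
```

Because the threshold scales with part length, short parts such as forearms tolerate less absolute error than the torso.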
slide-14
SLIDE 14

Results (PCP)

[Figure: model overview in four stages labelled I to IV. Panel labels: poselets, pairwise conditioning, kinematic tree, unary factors, appearance, position/rotation, result.]

Setting             PCP [%]
Baseline PS         55.7
pairwise            60.9
unary               60.8
pairwise + unary    62.9

slide-15
SLIDE 15


slide-16
SLIDE 16


slide-17
SLIDE 17


slide-18
SLIDE 18

Results

[Figure: MAP estimates and part marginals for the full model and for plain pictorial structures]

slide-19
SLIDE 19

Only 62.9% ??? Why not 100%?

  • What are we missing?
slide-20
SLIDE 20

Expressive Spatial Models…

Joint model for body parts and body joints

Mid-level representation

  • L. Pishchulin, M. Andriluka, P. Gehler, B. Schiele, Strong Appearance and Expressive Spatial Models for Human Pose Estimation, ICCV 2013

slide-21
SLIDE 21
… and Strong Appearance

Mixtures of DPM for local appearance

Rotation-dependent part detectors

  • L. Pishchulin, M. Andriluka, P. Gehler, B. Schiele, Strong Appearance and Expressive Spatial Models for Human Pose Estimation, ICCV 2013

slide-22
SLIDE 22

Empirical Results

Setting                           PCP [%]
model so far                      62.9
Andriluka et al., CVPR 09         55.7
+ flexible body model             56.9
+ local mixtures                  65.2
+ Poselet conditioned unaries     68.5
+ Poselet conditioned pairwise    69.0
Yang & Ramanan, CVPR 11           60.8
Eichner & Ferrari, ACCV 12        64.3
Ramakrishna et al., ECCV 14       67.6    (Pose Inference Machines)
Chen & Yuille, arXiv 14           76.6    (CNNs)

slide-23
SLIDE 23

Still not perfect … ?

  • All remaining failure cases are of these types: rare poses, self-occlusion, strong foreshortening

slide-24
SLIDE 24

Only detection!

Same color! Explain this then!

slide-25
SLIDE 25

Challenging Pose Dataset

  • 400 activities
  • 40000 examples
  • multiple people
  • video
  • M. Andriluka, L. Pishchulin, P. Gehler, B. Schiele, Human Pose Estimation: A new Benchmark and State of the Art Analysis, CVPR 2014

Annotations: joint positions and occlusions, 3D torso and head orientation, part occlusions, activity labels

slide-26
SLIDE 26

Fields of Parts — Parametrization

  • for every body part…
  • …and every possible state
  • … a binary random variable

$$x^p_i \in \{0, 1\}, \quad i = 1, \dots, |Y_p|, \quad p = 1, \dots, P$$

Kiefel & Gehler, Human Pose Estimation with Fields of Parts, ECCV 2014

slide-27
SLIDE 27

Fields of Parts — Energy

  • Pairwise binary CRF (looooooopy)

Kiefel & Gehler, Human Pose Estimation with Fields of Parts, ECCV 2014

slide-28
SLIDE 28

Fields of Parts — Factors

  • Unary Factors — your usual HOG filter
  • Pairwise Factors — your usual displacement factor (and more)

[Figure: pairwise factor parametrized by displacement (∆x, ∆y) and rotation θ]

slide-29
SLIDE 29
Comparison to PS

  • Number of (body) parts: $p = 1, \dots, P$
  • Pictorial Structures: few parts, huge state space

$$y_p \in \{1, \dots, M\} \times \{1, \dots, N\} = Y_p$$

  • Fields of Parts: many parts, small state space

$$x^p_i \in \{0, 1\}, \quad i = 1, \dots, |Y_p|$$
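
To make the two parametrizations concrete, the sketch below maps a pictorial-structures pose (one grid position per part) onto Fields of Parts indicator variables (one binary variable per part and grid cell). The grid size, the three toy parts, and the row-major index convention are assumptions for illustration only.

```python
def to_fields(pose, M, N):
    """Map positions y_p in {1..M} x {1..N} to binary fields x^p of length M*N."""
    fields = []
    for (m, n) in pose:
        x = [0] * (M * N)
        x[(m - 1) * N + (n - 1)] = 1  # row-major index, 1-based grid
        fields.append(x)
    return fields

M, N = 3, 4
pose = [(1, 1), (2, 3), (3, 4)]  # P = 3 parts (toy values)
fields = to_fields(pose, M, N)

num_ps_variables = len(pose)            # P variables, each with M*N states
num_fop_variables = len(pose) * M * N   # P*M*N variables, each binary
print(num_ps_variables, num_fop_variables)
```

The total number of part-position hypotheses is the same in both cases; what changes is whether they are states of a few variables or separate binary variables.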

slide-30
SLIDE 30

Gain: Bilateral

  • Locally image-conditioned pairwise factors (bilateral, segmentation)
  • Not possible with the distance transform used for pictorial structures
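
An image-conditioned (bilateral) pairwise weight can be sketched as a spatial Gaussian multiplied by an appearance Gaussian, in the style of the dense CRF kernels of Krähenbühl & Koltun. The talk does not give the exact kernel, so the bandwidths `sigma_xy` and `sigma_rgb` and the toy pixel values below are assumptions.

```python
import math

def bilateral_weight(pos_i, pos_j, color_i, color_j, sigma_xy=3.0, sigma_rgb=10.0):
    """Gaussian in position times Gaussian in color difference."""
    d_pos = sum((a - b) ** 2 for a, b in zip(pos_i, pos_j))
    d_col = sum((a - b) ** 2 for a, b in zip(color_i, color_j))
    return math.exp(-d_pos / (2 * sigma_xy ** 2) - d_col / (2 * sigma_rgb ** 2))

# Two neighbouring pixels with similar color interact strongly ...
same = bilateral_weight((10, 10), (11, 10), (200, 30, 30), (198, 32, 29))
# ... while an equally close pixel across a color edge is nearly decoupled.
edge = bilateral_weight((10, 10), (11, 10), (200, 30, 30), (20, 180, 40))
print(same, edge)
```

A distance transform handles only pairwise costs that depend on relative position, which is why a color-dependent weight like this does not fit that machinery.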

slide-31
SLIDE 31

More connections

  • Block-dense connections already
  • New connections scale linearly
slide-32
SLIDE 32

Inference

  • Intractable Inference
  • Mean Field Approximation
  • Update Equation — Bilateral Filtering Operation (linear complexity)

Krähenbühl & Koltun, Efficient inference in fully connected CRFs with Gaussian edge potentials, NIPS 2011

slide-33
SLIDE 33

Fields of Parts — Inference

  • Mean Field updates (here 10)
  • Predict the maximum marginal state
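$$\hat{i}_p = \operatorname*{argmax}_{i \in Y_p} Q_{10}(x^p_i = 1 \mid I)$$

$$Q_0(x \mid I, \theta) \to Q_1(x \mid I, \theta) \to \cdots \to Q_{10}(x \mid I, \theta)$$

[Figure: marginals at unaries (step 0) → Q_5(x | I, θ) → Q_10(x | I, θ)]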
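
The inference loop can be sketched as follows. This naive version uses an explicit weight matrix and is therefore quadratic in the number of variables, whereas the talk computes the message sum with bilateral filtering in linear time. The energy form (log-odds `u_i` plus weights `w[i][j]` on co-active variables), the toy numbers, and the parallel update schedule are assumptions, not the model from the paper.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def mean_field(unary, weights, steps=10):
    """Parallel mean-field updates for binary variables x_i in {0, 1}.

    Assumed model: p(x) ∝ exp(sum_i u_i x_i + sum_{i<j} w_ij x_i x_j).
    Returns the list [Q_0, Q_1, ..., Q_steps] of marginals Q(x_i = 1).
    """
    q = [sigmoid(u) for u in unary]  # step 0: unaries only
    history = [q]
    for _ in range(steps):
        q = [
            sigmoid(u + sum(w * qj for j, (w, qj) in enumerate(zip(row, q)) if j != i))
            for i, (u, row) in enumerate(zip(unary, weights))
        ]
        history.append(q)
    return history

# Toy problem: one part with three candidate states; states 0 and 1 support
# each other, state 2 competes with both (hypothetical numbers).
unary = [0.2, 0.1, 0.3]
weights = [
    [0.0, 1.0, -1.0],
    [1.0, 0.0, -1.0],
    [-1.0, -1.0, 0.0],
]
history = mean_field(unary, weights, steps=10)
q10 = history[-1]
best_state = max(range(len(q10)), key=lambda i: q10[i])  # maximum marginal state
print(best_state, [round(v, 3) for v in q10])
```

In this toy case the state with the highest unary score is not the one selected after ten updates, because the pairwise terms shift the marginals.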

slide-34
SLIDE 34
  • Objective: Max-Margin Max-Marginal (structured SVM)
  • Back-propagation through mean field: autodiff through bilateral filtering

Fields of Parts — Objective

[Figure: marginals at unaries (step 0) → Q_5(x | I, θ) → Q_10(x | I, θ)]

$$Q_0(x \mid I, \theta) \to Q_1(x \mid I, \theta) \to \cdots \to Q_{10}(x \mid I, \theta)$$

  • J. Domke, Learning Graphical Model Parameters with Approximate Marginal Inference, PAMI 2013
  • P. Krähenbühl & V. Koltun, Parameter Learning and Convergent Inference for Dense Random Fields, ICML 2013
slide-35
SLIDE 35
  • Non-linear convolutional filter defined by dense graphical model and mean field inference
  • Neural network interpretation

$$Q_{i+1}(x \mid I, \theta) = F(Q_i(x \mid I, \theta))$$

[Figure: marginals at unaries (step 0) → Q_5(x | I, θ) → Q_10(x | I, θ)]

slide-36
SLIDE 36

Results — APK

  • On equal ground: same features, same “pairwise” terms
  • Pairwise conditionals improve
slide-37
SLIDE 37

Disclaimer: Not state-of-the-art

  • PCP error measure
slide-38
SLIDE 38

Conclusion & Future Work

  • Parts are important for better models/understanding, not necessarily for performance
  • Richer image interpretation: joint pose estimation & image segmentation
  • More output: 3D pose, clothing, body measurements, etc.
  • Robustness and speed
  • Will see more models that put tractable inference first
slide-39
SLIDE 39

Reference List

  • Teaching Geometry to Deformable Part Models, CVPR12
  • 3D2DPM — 3D Deformable Part Models, ECCV12
  • Poselet Conditioned Pictorial Structures, CVPR13
  • Strong Appearance and Expressive Spatial Models for Human Pose Estimation, ICCV13
  • Human Pose Estimation: A new Benchmark and State of the Art Analysis, CVPR14

  • Human Pose Estimation with Fields of Parts, ECCV14
slide-40
SLIDE 40

Thank You! Feedback Welcome!

Bernt Schiele, Martin Kiefel, Micha Andriluka, Leonid Pishchulin